1st Edition
Big Data Analytics: A Guide to Data Science Practitioners Making the Transition to Big Data
Part 1. Setting the Scene: Analyzing Big Data
1. What is Big in "Big Data"?
2. Approaches to Analyzing Big Data
3. The Two Domains of Big Data Analytics
Part 2. Platform: Software and Computing Resources
4. Software: Programming with (Big) Data
5. Hardware: Computing Resources
6. Distributed Systems
7. Cloud Computing
Part 3. Components of Big Data Analytics
8. Data Collection and Data Storage
9. Big Data Cleaning and Transformation
10. Descriptive Statistics and Aggregation
11. (Big) Data Visualization
Part 4. Application: Topics in Big Data Econometrics
12. Bottlenecks in Everyday Data Analytics Tasks
13. Econometrics with GPUs
14. Regression Analysis and Categorization with Spark and R
15. Large-scale Text Analysis with sparklyr
Part 5. Appendices
Appendix A. GitHub
Appendix B. R Basics
Appendix C. Install Hadoop
Biography
Ulrich Matter is an Assistant Professor of Economics at the University of St. Gallen. His primary research interests lie at the intersection of data science, political economics, and media economics. His teaching activities cover topics in data science, applied econometrics, and data analytics. Before joining the University of St. Gallen, he was a Visiting Researcher at the Berkman Klein Center for Internet & Society at Harvard University and a postdoctoral researcher and lecturer at the Faculty for Business and Economics, University of Basel.
“This book is a superb practical guide for data scientists and graduate students in business and economics interested in data analytics. The combination of a clear introduction to the concepts and techniques of big data analytics with examples of how to code these tools makes this book both accessible and practical. I highly recommend this book to anyone seeking to prepare themselves for the ever-evolving world of data analytics in business and economics research.”
- Oded Netzer, Vice Dean for Research, Columbia Business School
“Ulrich Matter’s book on Big Data Analytics is an ideal resource for academics and corporate practitioners who have had some exposure to data analytics and want to enrich their toolbox to handle Big Data. This monograph sets the scene from many points of view: programming techniques, databases, distributed computing, Big Data handling, visualization, machine learning, and GPU deployment. Even though R has been chosen as the programming language, many techniques discussed in the book are not R-dependent and can be easily translated into other languages and computing environments. The writing style makes this handbook useful both as a main reference in the teaching of a course in related topics as well as an aid for those who want to learn the material independently. The author’s approach is 100% hands-on. Not much attention is paid to the technical aspects involving algorithms; all the focus goes to implementation strategies and to the specificities of the interplay between programming, hardware, databases, and visualization problems that arises in Big Data contexts. The book has been thoroughly tested in classes that the author has been teaching for a number of years, which makes it a safe bet for those looking for a textbook on the topic. I highly recommend it!”
- Juan-Pablo Ortega, Head, Division of Mathematical Sciences, Nanyang Technological University






