1st Edition

Big Data Analytics: A Guide to Data Science Practitioners Making the Transition to Big Data

By Ulrich Matter
Copyright 2024
    328 Pages, 40 Color & 22 B/W Illustrations
    by Chapman & Hall

    Successfully navigating the data-driven economy presupposes an understanding of the technologies and methods used to gain insights from Big Data. This book aims to help data science practitioners successfully manage the transition to Big Data.
    Building on familiar content from applied econometrics and business analytics, this book introduces the reader to the basic concepts of Big Data Analytics. The focus of the book is on how to productively apply econometric and machine learning techniques with large, complex data sets, as well as on all the steps involved before analysing the data (data storage, data import, data preparation). The book combines conceptual and theoretical material with the practical application of the concepts using R and SQL. The reader will thus acquire the skills to analyse large data sets, both locally and in the cloud. Various code examples and tutorials, focused on empirical economic and business research, illustrate practical techniques to handle and analyse Big Data.  

    Key Features: 
     
    - Includes many code examples in R and SQL, with R/SQL scripts freely provided online.  
    - Extensive use of real datasets from empirical economic research and business analytics, with data files freely provided online.  
    - Leads students and practitioners to think critically about where the bottlenecks are in practical data analysis tasks with large data sets, and how to address them.  
     

    The book is a valuable resource for data science practitioners, graduate students, and researchers who aim to gain insights from Big Data in the context of research questions in business, economics, and the social sciences.

    Part 1. Setting the Scene: Analyzing Big Data
    1. What is Big in "Big Data"?
    2. Approaches to Analyzing Big Data
    3. The Two Domains of Big Data Analytics

    Part 2. Platform: Software and Computing Resources
    4. Software: Programming with (Big) Data
    5. Hardware: Computing Resources
    6. Distributed Systems
    7. Cloud Computing

    Part 3. Components of Big Data Analytics
    8. Data Collection and Data Storage
    9. Big Data Cleaning and Transformation
    10. Descriptive Statistics and Aggregation
    11. (Big) Data Visualization

    Part 4. Application: Topics in Big Data Econometrics
    12. Bottlenecks in Everyday Data Analytics Tasks
    13. Econometrics with GPUs
    14. Regression Analysis and Categorization with Spark and R
    15. Large-scale Text Analysis with sparklyr

    Part 5. Appendices
    Appendix A. GitHub
    Appendix B. R Basics
    Appendix C. Install Hadoop

    Biography

    Ulrich Matter is an Assistant Professor of Economics at the University of St.Gallen. His primary research interests lie at the intersection of data science, political economics, and media economics. His teaching activities cover topics in data science, applied econometrics, and data analytics. Before joining the University of St.Gallen, he was a Visiting Researcher at the Berkman Klein Center for Internet & Society at Harvard University and a postdoctoral researcher and lecturer at the Faculty of Business and Economics, University of Basel.

    “This book is a superb practical guide for data scientists and graduate students in business and economics interested in data analytics. The combination of a clear introduction to the concepts and techniques of big data analytics with examples of how to code these tools makes this book both accessible and practical. I highly recommend this book to anyone seeking to prepare themselves for the ever-evolving world of data analytics in business and economics research.”
    - Oded Netzer, Vice Dean for Research, Columbia Business School

    "Ulrich Matter’s book on Big Data Analytics is an ideal resource for academics and corporate practitioners who have had some exposure to data analytics and want to enrich their toolbox to handle Big Data. This monograph sets the scene from many points of view: programming techniques, databases, distributed computing, Big Data handling, visualization, machine learning, and GPU deployment. Even though R has been chosen as the programming language, many techniques discussed in the book are not R-dependent and can be easily translated into other languages and computing environments.  The writing style makes this handbook useful both as a main reference in the teaching of a course in related topics as well as an aid for those who want to learn the material independently. The author’s approach is 100% hands-on. Not much attention is paid to the technical aspects involving algorithms; all the focus goes to implementation strategies and to the specificities of the interplay between programming, hardware, databases, and visualization problems that arises in Big Data contexts. The book has been thoroughly tested in classes that the author has been teaching for a number of years, which makes it a safe bet for those looking for a textbook on the topic. I highly recommend it!"
    - Juan-Pablo Ortega, Head, Division of Mathematical Sciences, Nanyang Technological University