Statistical Inference via Data Science: A ModernDive into R and the Ti

Description

Statistical Inference via Data Science: A ModernDive into R and the Tidyverse, Second Edition offers a comprehensive guide to learning statistical inference with data science tools widely used in industry, academia, and government. The first part of this book introduces the tidyverse suite of R packages, including ggplot2 for data visualization and dplyr for data wrangling. The second part introduces data modeling via simple and multiple linear regression. The third part presents statistical inference using simulation-based methods within a general framework implemented in R via the infer package, a suitable complement to the tidyverse. By working with these methods, readers can implement effective exploratory data analyses, conduct statistical modeling with data, and carry out statistical inference via confidence intervals and hypothesis testing. All of these tasks are performed by strongly emphasizing data visualization.

Key Features in the Second Edition:

Minimal Prerequisites: No prior calculus or coding experience is needed, making the content accessible to a wide audience.
Real-World Data: Learn with real-world datasets, including all domestic flights leaving New York City in 2023, the Gapminder project, FiveThirtyEight.com data, and new datasets on health, global development, music, coffee quality, and geyser eruptions.
Simulation-Based Inference: Statistical inference through simulation-based methods.
Expanded Theoretical Discussions: Includes deeper coverage of theory-based approaches, their connection with simulation-based approaches, and a presentation of intuitive and formal aspects of these methods.
Enhanced Use of the infer Package: Leverages the infer package for “tidy” and transparent statistical inference, enabling readers to construct confidence intervals and conduct hypothesis tests through multiple linear regression and beyond.
Dynamic Online Resources: All code and output are embedded in the text, with additional interactive exercises, discussions, and solutions available online.
Broadened Applications: Suitable for undergraduate and graduate courses, including statistics, data science, and courses emphasizing reproducible research.

The first edition of the book has been used in many different ways: in courses on statistical inference, statistical programming, business analytics, and data science for social policy, and by professionals in many other settings. Ideal for those new to statistics or looking to deepen their knowledge, this edition provides a clear entry point into data science and modern statistical methods.

Author(s)

Biography

Chester Ismay is Vice President of Data and Automation at MATE Seminars and is a freelance data science consultant and instructor. He also teaches in the Center for Executive and Professional Education at Portland State University. He completed his PhD in statistics at Arizona State University in 2013. He has previously worked in various roles, including as an actuary at Scottsdale Insurance Company (now Nationwide E&S/Specialty) and at Ripon College, Reed College, and Pacific University. He has experience working in online education and was previously a Data Science Evangelist at DataRobot, where he led in-person and virtual workshops on data science, machine learning, and data engineering for DataRobot University. In addition to his work for *ModernDive*, he contributed as the initial developer of the `infer` R package and is the author and maintainer of the `thesisdown` R package.

Albert Y. Kim is an Associate Professor of Statistical & Data Sciences at Smith College in Northampton, MA, USA. He completed his PhD in statistics at the University of Washington in 2011. Previously he worked in the Search Ads Metrics Team at Google Inc. as well as at Reed, Middlebury, and Amherst Colleges. In addition to his work for *ModernDive*, he is a co-author of the `resampledata` and `SpatialEpi` R packages. Both Dr. Kim and Dr. Ismay, along with Jennifer Chunn, are co-authors of the `fivethirtyeight` package of code and datasets published by the data journalism website FiveThirtyEight.com.

Arturo Valdivia is a Senior Lecturer in the Department of Statistics at Indiana University, Bloomington. He earned his PhD in Statistics from Arizona State University in 2013. His research interests focus on statistical education, exploring innovative approaches to help students grasp complex ideas with clarity. Over his career, he has taught a wide range of statistics courses, from introductory to advanced levels, to more than 1,800 undergraduate students and over 900 graduate students pursuing master's and Ph.D. programs in statistics, data science, and other disciplines. In recognition of his teaching excellence, he received Indiana University’s Trustees Teaching Award in 2023.

Critics' Reviews

Praise for the First Edition:

"My overall impression of the book is very positive. If you want to learn R programming and statistics at the same time, this is a good book for you. I like the intertwining of the two since I think modern data analysis requires computing. Focusing on resampling techniques for the creation of confidence intervals and the conducting of hypothesis tests is a deviation from typical introductory books. I think that focus helps solidify a student’s understanding of sampling variability and its central role in statistical inference."
~Adam L. Pintar, Journal of Quality Technology

"Through the use of analogies, hands-on exercises, and abundant opportunities to get coding, this book delivers on its promise to give a reader, without a background in statistics or programming the tools necessary for understanding and conducting real-world statistical inference and data analysis. With an emphasis on learning new concepts first 'by hand,' before turning to the code, it would make a particularly useful classroom companion. However, the 'learning checks' provided throughout also make it a great guide for self-study. Students and teachers alike will benefit from this thoughtful introduction, as it addresses even the smallest of details that can trip beginners (and up), and often prevent them from getting to the more fruitful parts of data analysis."
~Mara Averick, Developer Advocate, RStudio, Inc.

"This is a comprehensive, modern resource for teaching and learning data science. ModernDive couples the introduction of core statistical concepts directly with learning how to apply data science methods to realistic data sets using the R programming language. The pedagogical approach of ModernDive is thoughtful and highly effective. The text engages learners early with tangible and practical concepts (such as creating data visualizations) that enable students to see early returns on their investment in learning R. The authors have created a guide to learning data science that increases students’ engagement and enthusiasm, while simultaneously providing students with the depth of understanding needed to conduct meaningful and reproducible data analyses. ModernDive is my go-to resource for teaching data science. I use it in all of my courses and workshops and I have found it to be the most effective and comprehensive introduction to data science in R available."
~Rich Majerus, Queens University of Charlotte

"With its emphasis on visualization, real world data, and simulation, along with clear instructions about how to work with R and the Tidyverse, ModernDive is the most accessible and student-friendly statistics textbook I have taught from. The book's early chapters on data wrangling and visualization provide students with hands-on experience with real data and get them excited about making beautiful and informative figures with modern statistical tools like R and the Tidyverse. Where the book especially shines is its simulation-based approach to modeling, confidence intervals, and hypothesis testing. Instead of teaching a complicated flowchart with dozens of types of statistical tests, the book is instead centered around linear modeling and simulation. The chapters on hypothesis testing use simulation to teach about p-values, an approach that students find eminently intuitive. Overall, ModernDive is a phenomenal modern introduction to statistical inference—it is an essential book for any statistics instructor!"
~Dr. Andrew Heiss, Andrew Young School of Policy Studies, Georgia State University

"The monograph belongs to The R series, and it can serve as a convenient way for learning data science and statistics simultaneously with the R language. The textbook consists of four parts, eleven chapters, and each chapter contains sections and subsections. In the Preface, the authors describe the book structure and illustrate it with a pipeline going from importing data to making its tidy version, which is applied in a loop of transforming-modeling-visualizing, and finally is used for communication, or interpretation and reporting of the modeling results...The monograph supplies multiple links to the websites of the R packages and related statistical methods, and the online version of the book with all the codes and outputs is available at moderndive.com. The textbook presents to students and researchers a very useful introduction to the data science and contemporary R programming, with numerous examples of R implementation for solving various problems of statistical estimation and inference."
~Stan Lipovetsky, Technometrics, Vol 62

"One of the great things about this textbook is that the authors provide great learning checks and helpful hints scattered throughout the chapters, with links in the text to references that can help the reader along if they get stuck. Although this textbook sticks to the simpler world of simple and multiple linear regression (foregoing the complexities of other regressions like logistic and Poisson), the take-home messages really apply to all types of regression for inference, especially considering the intended audience for this book is for instructors teaching introductory statistical inference courses (particularly those interested in using R). If you are an instructor, and are teaching an introductory course to statistical inference (and particularly want to teach it in R), I highly recommend this text for its adaptability, availability, and ease of use."
~Zachary Fusfeld, Biometrics

"The new ModernDive (Statistical Inference via Data Science) textbook is simply wonderful! It uses accessible language to introduce the topics of data science and statistics, as well as an intuitive simulation-based inference first approach. Importantly, it does not stop there. It also places great emphasis on how to do all of this in the R programming language! True to the book's name, the R code taught and demonstrated in the book uses a modern, tidy approach for data wrangling, visualization and statistics. I have used it successfully in an introductory statistics setting at both the undergraduate level and the professional Master's level. Furthermore, I would choose to do this again."
~Tiffany Timbers, University of British Columbia

"With the help of visualization, the authors give examples of identifying outliers and identifying relationships between continuous numerical data. Based on this, we can conclude that the authors very well describe one of the steps of data analysis – pre-processing. This step is important because it is a main milestone in the identification of the relationship between variables in the data...The authors also provide a detailed review of the main methods of presenting the classical results based on linear models. This part is very important in the preparation of articles or books and greatly simplifies the work on the preparation."
~Igor Malyk, ISCB News

“The aforementioned book is a successful attempt to help convert classical statisticians into modern data scientists. This book aims and provides an excellent exposition of data-driven statistical tools to draw statistical inferences from data, all while using the R software and its ‘tidyverse’ package…This book is designed for those who want to understand and know how to retrieve the information hidden inside the provided data, using R software using the tools of classical statistics. The authors have tried to keep the readers away from in-depth mathematical details while presenting the material in this book. The authors assume that the readers have a good grasp of the statistical tools and methodologies…The topics are accompanied and explained with data-based examples.”
~Shalabh, IIT Kanpur
