Statistical Inference via Data Science: A ModernDive into R and the Tidyverse provides a pathway for learning about statistical inference using data science tools widely used in industry, academia, and government. It introduces the tidyverse suite of R packages, including the ggplot2 package for data visualization, and the dplyr package for data wrangling. After equipping readers with just enough of these data science tools to perform effective exploratory data analyses, the book covers traditional introductory statistics topics like confidence intervals, hypothesis testing, and multiple regression modeling, while focusing on visualization throughout.
● Assumes minimal prerequisites: notably, no prior calculus or coding experience
● Motivates theory using real-world data, including all domestic flights leaving New York City in 2013, the Gapminder project, and the data journalism website, FiveThirtyEight.com
● Centers on simulation-based approaches to statistical inference rather than mathematical formulas
● Uses the infer package for "tidy" and transparent statistical inference to construct confidence intervals and conduct hypothesis tests via the bootstrap and permutation methods
● Provides all code and output embedded directly in the text; also available in the online version at moderndive.com
This book is intended for individuals who would like to simultaneously start developing their data science toolbox and start learning about the inferential and modeling tools used in much of modern-day research. The book can be used in methods and data science courses and first courses in statistics, at both the undergraduate and graduate levels.
Table of Contents
1 Getting Started with Data in R
I Data Science via the tidyverse
2 Data Visualization
3 Data Wrangling
4 Data Importing & “Tidy” Data
II Data Modeling via moderndive
5 Basic Regression
6 Multiple Regression
III Statistical Inference via infer
7 Sampling
8 Bootstrapping & Confidence Intervals
9 Hypothesis Testing
10 Inference for Regression
11 Tell the Story with Data
A Statistical Background
B Information about R packages Used
• Chester Ismay is a Data Science Evangelist for DataRobot and is based in Portland, Oregon, USA.
• Albert Y. Kim is an Assistant Professor of Statistical and Data Sciences at Smith College in Northampton, Massachusetts, USA.
"Through apt use of analogies, hands-on exercises, and abundant opportunities to get coding, this book delivers on its promise to give a reader without a background in statistics or programming the tools necessary for understanding and conducting real-world statistical inference and data analysis. With an emphasis on learning new concepts first "by hand," before turning to the code, it would make a particularly useful classroom companion. However, the "learning checks" provided throughout also make it a great guide for self-study. Students and teachers alike will benefit from this thoughtful introduction, as it addresses even the smallest of details that can trip beginners up, and keep them from getting to the more fruitful parts of data analysis."
- Mara Averick, Developer Advocate, RStudio, Inc.
"This is a comprehensive, modern resource for teaching and learning data science. ModernDive couples the introduction of core statistical concepts directly with learning how to apply data science methods to realistic data sets using the R programming language. The pedagogical approach of ModernDive is thoughtful and highly effective. The text engages learners early with tangible and practical concepts, such as creating data visualizations, that enable students to see early returns on their investment in learning R. The authors have created a guide to learning data science that increases students’ engagement and enthusiasm, while simultaneously providing students with the depth of understanding needed to conduct meaningful and reproducible data analyses. ModernDive is my go-to resource for teaching data science. I use it in all of my courses and workshops and I have found it to be the most effective and comprehensive introduction to data science in R available."
- Rich Majerus, Queens University of Charlotte
"With its emphasis on visualization, real world data, and simulation, along with clear instructions about how to work with R and the Tidyverse, ModernDive is the most accessible and student-friendly statistics textbook I have taught from. The book's early chapters on data wrangling and visualization provide students with hands-on experience with real data and get them excited about making beautiful and informative figures with modern statistical tools like R and the Tidyverse. Where the book especially shines is its simulation-based approach to modeling, confidence intervals, and hypothesis testing. Instead of teaching a complicated flowchart with dozens of types of statistical tests, the book is instead centered around linear modeling and simulation. The chapters on hypothesis testing use simulation to teach about p-values, an approach that students find eminently intuitive. Overall, ModernDive is a phenomenal modern introduction to statistical inference—it is an essential book for any statistics instructor!"
- Dr. Andrew Heiss, Andrew Young School of Policy Studies, Georgia State University
"The monograph belongs to The R Series, and it can serve as a convenient way for learning data science and statistics simultaneously with the R language. The textbook consists of four parts, eleven chapters, and each chapter contains sections and subsections. In the Preface, the authors describe the book structure and illustrate it with a pipeline going from importing data to making its tidy version, which is applied in a loop of transforming-modeling-visualizing, and finally is used for communication, or interpretation and reporting of the modeling results... The monograph supplies multiple links to the websites of the R packages and related statistical methods, and the online version of the book with all the codes and outputs is available at moderndive.com. The textbook presents to students and researchers a very useful introduction to the data science and contemporary R programming, with numerous examples of R implementation for solving various problems of statistical estimation and inference."
- Stan Lipovetsky, Technometrics, Vol 62