This book introduces the reader to data science using R and the tidyverse. No prior knowledge of college-level programming or mathematics (e.g., calculus or statistics) is required. The book is self-contained, so readers can begin building data science workflows immediately without consulting extensive external resources to get started. The contents are targeted at undergraduate students but are equally applicable to students at the graduate level and beyond. The book develops concepts through many real-world examples that motivate the reader.
Upon completion of the text, the reader will be able to:
- Program proficiently in R
- Load and manipulate data frames, and "tidy" them using tidyverse tools
- Conduct statistical analyses and draw meaningful inferences from them
- Build models from numerical and textual data
- Create numerical and spatial data visualizations using ggplot2 and interpret what they represent
An accompanying R package, "edsdata", contains the synthetic and real datasets used in the textbook and is intended for further practice. An exercise set, designed to be compatible with automated grading tools, is also available for instructors.
Contents:
1. Data Types
2. Data Transformation
3. Data Visualization
4. Building Simulations
5. Sampling
6. Hypothesis Testing
7. Quantifying Uncertainty
8. Towards Normality
9. Regression
10. Text Analysis