Data Science: A First Introduction focuses on using the R programming language in Jupyter notebooks to perform data manipulation and cleaning, create effective visualizations, and extract insights from data using classification, regression, clustering, and inference.
The text emphasizes workflows that are clear, reproducible, and shareable, and includes coverage of the basics of version control. All source code is available online, demonstrating the use of good reproducible project workflows.
Based on educational research and active learning principles, the book uses a modern approach to R and includes accompanying autograded Jupyter worksheets for interactive, self-directed learning. The book will leave readers well-prepared for data science projects.
The book is designed for learners from all disciplines with minimal prior knowledge of mathematics and programming. The authors have honed the material through years of experience teaching thousands of undergraduates in the University of British Columbia’s DSCI100: Introduction to Data Science course.
Table of Contents
1. R and the tidyverse
2. Reading in data locally and from the web
3. Cleaning and wrangling data
4. Effective data visualization
5. Classification I: training & predicting
6. Classification II: evaluation & tuning
7. Regression I: K-nearest neighbors
8. Regression II: linear regression
9. Clustering
10. Statistical inference
11. Combining code and text with Jupyter
12. Collaboration with version control
13. Setting up your computer
Tiffany Timbers is an Assistant Professor of Teaching in the Department of Statistics and Co-Director for the Master of Data Science program (Vancouver Option) at the University of British Columbia.
Trevor Campbell is an Assistant Professor in the Department of Statistics at the University of British Columbia.
Melissa Lee is an Assistant Professor of Teaching in the Department of Statistics at the University of British Columbia.
'Many students leave school with a thorough understanding of core statistical theories and machine learning algorithms but a limited sense for how to put these ideas into practice. Real data science work entails a far broader set of skills including communication, collaboration, technical project management, and rapid iteration. Data Science: A First Introduction targets this gap by previewing this broader set of topics. By including less often discussed concepts like version control and modeling pipelines, that are often neglected at the introductory level, this book will help students build the right 'muscles' from the beginning of their studies and convert their knowledge into practice.'
-Emily Riederer, Capital One
This book provides a sophisticated first introduction to the field of data science and provides a balanced mix of practical skills along with generalizable principles. As we continue to introduce students to data science and train them to confront an expanding array of data science problems, they will be well-served by the ideas presented here.
-Roger Peng, Johns Hopkins University (from the Foreword)
[…] The authors provide a friendly, effective on-ramp to programmatic data analysis with R and the tidyverse. I appreciate the coverage of critical practical matters, which are often neglected or written off as “out of scope”, such as navigating the file system, developing a sustainable workflow, and using version control. […] Although it’s aimed at an introductory level, more experienced readers will enjoy dipping into this book for accessible content on a variety of modern data science tools and topics.
-Jenny Bryan, RStudio
This book offers a clear, thoughtful, and systematic treatment of the fundamentals of data science, with accompanying R code. As its name implies, it is truly an introduction, and is suitable for those who wish to self-teach R and data science, as well as to college instructors teaching a first course in data science. With a diverse set of topics […] this book is a one-stop shop that will be a valuable resource for years to come.
-Daniela Witten, University of Washington
This book is a comprehensive introduction to data science […]. In addition to data wrangling and visualization with the tidyverse, the book also provides a deep dive into statistical modeling and inference with the tidymodels framework, which makes this book an incredibly valuable addition to the landscape of introductory data science books.
-Mine Çetinkaya-Rundel, Professor of the Practice at Duke University and Educator at RStudio
The authors of this new textbook are expert teachers as well as data scientists, and that expertise is reflected in each chapter and every exercise. Topics are introduced in a digestible order, examples are approachable and well-motivated, and all the code is presented in digestible, carefully-explained pieces. If you are using R to introduce students to reproducible quantitative analysis, this "First Introduction" should be your first choice.
-Greg Wilson, Third-Bit Inc.
[…] This book starts off by working with data and visualizing it, then levels up quickly to high impact topics like predictive modeling, inference, and collaboration with version control. […] These are topics that are tricky to squeeze into an intro text. But this book is not intimidating – each topic is framed in a way that is approachable yet advanced, and the authors give readers a lot of support along the way. […] This book has also been field-tested by a highly respected data science education program at University of British Columbia[…], making it an ideal resource that educators can trust and rely on to freshen up their own materials and workflows.
-Alison Hill, IBM
'This book is superb. It is written in a lively and engaging style, which grabs the reader’s attention. It sees the goal of data analysis as that of finding answers to study questions, which in turn can lead to knowledge discovery. That paradigm for inquiry is well demonstrated by step-by-step demonstrations using novel, interesting datasets, e.g., concerning indigenous languages. For that reason, I would strongly recommend the book for self-study, as well as for courses on data analysis.'
-Jim Zidek, University of British Columbia
'The book is well written, organized, and focused. Readers will appreciate the level of detail given and the intuitive explanations and graphics. I applaud the authors for writing such an excellent introductory text.'
-Adam Loy, Carleton University
'More than anything else, what I like about this book is the thoughtful ordering of the chapters… the mindful order in which topics are presented in this textbook aligns with how I think they should be taught to the students of today. This is in contrast to how these topics were learned by the textbook authors of today who were the students of yesterday. As textbook authors, it’s hard to break free of this “curse of knowledge” and cover topics from the perspective of someone starting with a clean slate. This book breaks free of this curse and presents the freshest perspective in introductory data science I've seen to date.'
-Albert Kim, Smith College
'I made it 10 per cent of the way through Timbers et al before I learnt something new. Frankly I was surprised I made it so far. Data science pedagogy has been so disjoint and so many of us are self-taught that it is refreshing to have a class-room-tested textbook that is focused on workflows and reproducibility. The approaches are rigorous and opinionated, and the text is filled with kindness and warmth. It is the book that I wish I had when I first came to learn this material. The book is unashamedly focused on the newest innovations including `tidymodels` and the native pipe operator, and I soon found myself learning things, on average, at roughly one-thing-per-page, which was an exciting experience for someone who spends his days doing and teaching data science in R. This is a text that I can see myself coming back to regularly, not just in my teaching, but as a reference. I am hopeful that the authors will go on to write "Data Science", and "Advanced Data Science", without too much delay!'
-Rohan Alexander, University of Toronto