464 Pages 224 Color & 3 B/W Illustrations
    by Chapman & Hall

    Data Science: A First Introduction with Python focuses on using the Python programming language in Jupyter notebooks to perform data manipulation and cleaning, create effective visualizations, and extract insights from data using classification, regression, clustering, and inference. It emphasizes workflows that are clear, reproducible, and shareable, and includes coverage of the basics of version control. Based on educational research and active learning principles, the book uses a modern approach to Python and includes accompanying autograded Jupyter worksheets for interactive, self-directed learning. The text will leave readers well-prepared for data science projects. It is designed for learners from all disciplines with minimal prior knowledge of mathematics and programming. The authors have honed the material through years of experience teaching thousands of undergraduates at the University of British Columbia.

    Key Features:

    • Includes autograded worksheets for interactive, self-directed learning.
    • Introduces readers to modern data analysis and workflow tools such as Jupyter notebooks and GitHub, and covers cutting-edge data analysis and manipulation Python libraries such as pandas, scikit-learn, and altair.
    • Is designed for a broad audience of learners from all backgrounds and disciplines.

    Preface

    Foreword

    Acknowledgments

    1. Python and Pandas

    2. Reading in data locally and from the web

    3. Cleaning and wrangling data

    4. Effective data visualization

    5. Classification I: training & predicting

    6. Classification II: evaluation & tuning

    7. Regression I: K-nearest neighbors

    8. Regression II: linear regression

    9. Clustering

    10. Statistical inference

    11. Combining code and text with Jupyter

    12. Collaboration with version control

    13. Setting up your computer

    Bibliography

    Index

    Biography

    Tiffany Timbers is an Associate Professor of Teaching in the Department of Statistics and Co-Director for the Master of Data Science program (Vancouver Option) at the University of British Columbia. In these roles she teaches and develops curriculum around the responsible application of Data Science to solve real-world problems. One of her favourite courses to teach is a graduate course on collaborative software development, which focuses on creating R and Python packages using modern tools and workflows.

    Trevor Campbell is an Associate Professor in the Department of Statistics at the University of British Columbia. His research focuses on automated, scalable Bayesian inference algorithms, Bayesian nonparametrics, streaming data, and Bayesian theory. He was previously a postdoctoral associate in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and Institute for Data, Systems, and Society (IDSS) at MIT and a Ph.D. candidate in the Laboratory for Information and Decision Systems (LIDS) at MIT.

    Melissa Lee is an Assistant Professor of Teaching in the Department of Statistics at the University of British Columbia. She teaches and develops curriculum for undergraduate statistics and data science courses. Her work focuses on student-centered approaches to teaching, developing and assessing open educational resources, and promoting equity, diversity, and inclusion initiatives.

    Joel Ostblom is an Assistant Professor of Teaching in the Statistics Department at the University of British Columbia. He teaches and develops data science courses at the graduate and undergraduate level, with a focus on data visualization, data science ethics, and machine learning. Joel cares deeply about spreading data literacy and excitement over programmatic data analysis, which is reflected in his contributions to open source projects and openly accessible data science learning resources.

    Lindsey Heagy is an Assistant Professor in the Department of Earth, Ocean and Atmospheric Sciences and Director of the Geophysical Inversion Facility at UBC. Her research combines computational methods in numerical simulations, inversions, and machine learning to characterize the subsurface from geophysical data. Primary applications of interest include mineral exploration, carbon sequestration, groundwater, and environmental studies.

    "This book offers a clear, thoughtful, and systematic treatment of the fundamentals of data science, with accompanying Python code. As its name implies, it is truly an introduction, and is suitable for those who wish to self-teach Python and data science, as well as to college instructors teaching a first course in data science. With a diverse set of topics that includes (among others) getting data from the web, visualization, cross-validation, clustering, and version control, this book is a one-stop shop that will be a valuable resource for years to come."
    Daniela Witten, University of Washington

    "The authors of this new textbook are expert teachers as well as data scientists, and that expertise is reflected in each chapter and every exercise. Topics are introduced in a digestible order, examples are approachable and well-motivated, and all the code is presented in digestible, carefully-explained pieces. If you are using Python to introduce students to reproducible quantitative analysis, this "First Introduction" should be your first choice."
    Greg Wilson, Third-Bit Inc.

    "This book provides a sophisticated first introduction to the field of data science and provides a balanced mix of practical skills along with generalizable principles. As we continue to introduce students to data science and train them to confront an expanding array of data science problems, they will be well-served by the ideas presented here."
    Roger Peng, Johns Hopkins University (from the Foreword)

    "… The authors provide a friendly, effective on-ramp to programmatic data analysis with Python and key packages for data analysis (e.g., pandas, altair, and scikit-learn). I appreciate the coverage of critical practical matters, which are often neglected or written off as “out of scope”, such as navigating the file system, developing a sustainable workflow, and using version control..."
    Jenny Bryan, Posit

    "This book is a comprehensive introduction to data science … In addition to data wrangling and visualization with pandas and altair, the book also provides a deep dive into statistical modeling and inference with the scikit-learn framework, which makes this book an incredibly valuable addition to the landscape of introductory data science books."
    Mine Çetinkaya-Rundel, Professor of the Practice at Duke University and Educator at Posit

    " … This book starts off by working with data and visualizing it, then levels up quickly to high impact topics like predictive modeling, inference, and collaboration with version control … These are topics that are tricky to squeeze into an intro text. But this book is not intimidating … This book has also been field-tested by a highly respected data science education program at the University of British Columbia… making it an ideal resource that educators can trust and rely on to freshen up their own materials and workflows."
    Alison Hill, IBM

    "I made it 10 per cent of the way through Timbers et al before I learnt something new. Frankly I was surprised I made it so far. Data science pedagogy has been so disjoint and so many of us are self-taught that it is refreshing to have a class-room-tested textbook that is focused on workflows and reproducibility. The approaches are rigorous and opinionated, and the text is filled with kindness and warmth. It is the book that I wish I had when I first came to learn this material. The book is unashamedly focused on the newest innovations, and the use of Python makes it especially widely applicable. Going through the book I found myself learning things, on average, at roughly one-thing-per-page, which was an exciting experience for someone who spends his days doing and teaching data science. This is a text that I can see myself coming back to regularly, not just in my teaching, but as a reference. I am hopeful that the authors will go on to write "Advanced Data Science", without too much delay!"
    Rohan Alexander, University of Toronto

    "This is a truly introductory data science book. There are few books on the market that don’t (at least tacitly) assume a somewhat high level of familiarity with coding, or that the reader will pick it up quickly. This book meets the truly novice data scientist where they are and guides them through the basics of many important concepts. While this book won’t make anyone an expert data scientist, it will give the reader a flavor for what they can do, and it will give them the tools necessary to tackle a variety of simple projects and the knowledge to read more other introductory and intermediate texts. The book is well written, organized, and focused. Readers will appreciate the level of detail given and the intuitive explanations and graphics. I applaud the authors for writing such an excellent introductory text."
    Adam Loy, Carleton University

    "More than anything else, what I like about this book is the thoughtful ordering of the chapters. The order in which topics are presented in this textbook aligns with how I think they should be taught to the students of today; in contrast to how these topics were learned by the textbook authors of today who were the students of yesterday. As textbook authors, it’s hard to break free of this “curse of knowledge” and cover topics from the perspective of someone starting with a clean slate. This book breaks free of this curse and presents the freshest perspective in introductory data science I've seen to date."
    Albert Y. Kim, Smith College

    "Many students leave school with a thorough understanding of core statistical theories and machine learning algorithms but a limited sense for how to put these ideas into practice. Real data science work entails a far broader set of skills including communication, collaboration, technical project management, and rapid iteration. Data Science: a First Introduction targets this gap by previewing this broader set of topics. By including less often discussed concepts like version control and modeling pipelines, that are often neglected at the introductory level, this book will help students build the right 'muscles' from the beginning of their studies and convert their knowledge into practice."
    Emily Riederer, Capital One

    "This book is superb. It is written in a lively and engaging style, which grabs the reader’s attention. It sees the goal of data analysis as that of finding answers to study questions, which in turn can lead to knowledge discovery. That paradigm for inquiry is well demonstrated by step-by-step demonstrations using novel, interesting datasets e.g. concerning indigenous languages. For that reason, I would strongly recommend the book for self-study, as well as for courses on data analysis."
    Jim Zidek, University of British Columbia