1st Edition

From Concepts to Code: Introduction to Data Science

By Adam P. Tashman
Copyright 2024
    385 Pages 78 B/W Illustrations
    by Chapman & Hall

    The breadth of problems that can be solved with data science is astonishing, and this book provides the required tools and skills for a broad audience. The reader takes a journey into the forms, uses, and abuses of data and models, and learns how to critically examine each step. Python coding and data analysis skills are built from the ground up, with no prior coding experience assumed. The necessary background in computer science, mathematics, and statistics is provided in an approachable manner.

    Each step of the machine learning lifecycle is discussed, from business objective planning to monitoring a model in production. This end-to-end approach supplies the broad view necessary to sidestep many of the pitfalls that can sink a data science project. Detailed examples are provided from a wide range of applications and fields, from fraud detection in banking to breast cancer classification in healthcare. The reader will learn the techniques to accomplish tasks that include predicting outcomes, explaining observations, and detecting patterns.

    Improper use of data and models can introduce unwanted effects and dangers to society. A chapter on model risk provides a framework for comprehensively challenging a model and mitigating weaknesses. When data is collected, stored, and used, it may misrepresent reality and introduce bias. Strategies for addressing bias are discussed.

    From Concepts to Code: Introduction to Data Science leverages content developed by the author for a full-year data science course suitable for advanced high school or early undergraduate students. This course is freely available and it includes weekly lesson plans.

    1. Introduction

    2. Communicating Effectively and Earning Trust

    3. Data Science Project Planning

    4. An Overview of Data

    5. Computing Preliminaries and Setup

    6. Data Processing

    7. Data Storage and Retrieval

    8. Mathematics Preliminaries

    9. Statistics Preliminaries

    10. Data Transformation

    11. Exploratory Data Analysis

    12. An Overview of Machine Learning

    13. Modeling with Linear Regression

    14. Classification with Logistic Regression

    15. Clustering with K-Means

    16. Elements of Reproducible Data Science

    17. Model Risk

    18. Next Steps

    Symbols

    Biography

    Adam P. Tashman has been working in data science for over twenty years. He is Associate Professor of Data Science at the University of Virginia School of Data Science. He is currently Director of the Capstone program, and he was formerly Director of the Online Master's of Data Science program. He was the School of Data Science Capital One Fellow for the 2023-2024 academic year. Dr. Tashman won multiple awards from Amazon Web Services, where he advised education and government technology companies on best practices in machine learning and artificial intelligence. Dr. Tashman lives in Charlottesville, VA with his wonderful wife Elle and daughter Callie.