1st Edition

Data Science and Machine Learning: Mathematical and Statistical Methods

    538 Pages
    by Chapman & Hall

    "This textbook is a well-rounded, rigorous, and informative work presenting the mathematics behind modern machine learning techniques. It hits all the right notes: the choice of topics is up-to-date and perfect for a course on data science for mathematics students at the advanced undergraduate or early graduate level. This book fills a sorely-needed gap in the existing literature by not sacrificing depth for breadth, presenting proofs of major theorems and subsequent derivations, as well as providing a copious amount of Python code. I only wish a book like this had been around when I first began my journey!" -Nicholas Hoell, University of Toronto

    "This is a well-written book that provides a deeper dive into data-scientific methods than many introductory texts. The writing is clear, and the text logically builds up regularization, classification, and decision trees. Compared to its probable competitors, it carves out a unique niche." -Adam Loy, Carleton College

    The purpose of Data Science and Machine Learning: Mathematical and Statistical Methods is to provide an accessible, yet comprehensive textbook intended for students interested in gaining a better understanding of the mathematics and statistics that underpin the rich variety of ideas and machine learning algorithms in data science.

    Key Features:

    • Focuses on mathematical understanding.
    • Presentation is self-contained, accessible, and comprehensive.
    • Extensive list of exercises and worked-out examples.
    • Many concrete algorithms with Python code.
    • Full color throughout.


    Further resources can be found on the authors' website: https://github.com/DSML-book/Lectures



    Importing, Summarizing, and Visualizing Data


    Structuring Features According to Type

    Summary Tables

    Summary Statistics

    Visualizing Data

    Plotting Qualitative Variables

    Plotting Quantitative Variables

    Data Visualization in a Bivariate Setting


    Statistical Learning


    Supervised and Unsupervised Learning

    Training and Test Loss

    Tradeoffs in Statistical Learning

    Estimating Risk

    In-Sample Risk


    Modeling Data

    Multivariate Normal Models

    Normal Linear Models

    Bayesian Learning


    Monte Carlo Methods

    Introduction

    Monte Carlo Sampling

    Generating Random Numbers

    Simulating Random Variables

    Simulating Random Vectors and Processes


    Markov Chain Monte Carlo

    Monte Carlo Estimation

    Crude Monte Carlo

    Bootstrap Method

    Variance Reduction

    Monte Carlo for Optimization

    Simulated Annealing

    Cross-Entropy Method

    Splitting for Optimization

    Noisy Optimization


    Unsupervised Learning


    Risk and Loss in Unsupervised Learning

    Expectation–Maximization (EM) Algorithm

    Empirical Distribution and Density Estimation

    Clustering via Mixture Models

    Mixture Models

    EM Algorithm for Mixture Models

    Clustering via Vector Quantization


    Clustering via Continuous Multiextremal Optimization

    Hierarchical Clustering

    Principal Component Analysis (PCA)

    Motivation: Principal Axes of an Ellipsoid

    PCA and Singular Value Decomposition (SVD)




    Linear Regression

    Analysis via Linear Models

    Parameter Estimation

    Model Selection and Prediction

    Cross-Validation and Predictive Residual Sum of Squares

    In-Sample Risk and Akaike Information Criterion

    Categorical Features

    Nested Models

    Coefficient of Determination

    Inference for Normal Linear Models

    Comparing Two Normal Linear Models

    Confidence and Prediction Intervals

    Nonlinear Regression Models

    Linear Models in Python



    Analysis of Variance (ANOVA)

    Confidence and Prediction Intervals

    Model Validation

    Variable Selection

    Generalized Linear Models


    Regularization and Kernel Methods



    Reproducing Kernel Hilbert Spaces

    Construction of Reproducing Kernels

    Reproducing Kernels via Feature Mapping

    Kernels from Characteristic Functions

    Reproducing Kernels Using Orthonormal Features

    Kernels from Kernels

    Representer Theorem

    Smoothing Cubic Splines

    Gaussian Process Regression

    Kernel PCA




    Classification Metrics

    Classification via Bayes’ Rule

    Linear and Quadratic Discriminant Analysis

    Logistic Regression and Softmax Classification

    K-Nearest Neighbors Classification

    Support Vector Machine

    Classification with Scikit-Learn


    Decision Trees and Ensemble Methods


    Top-Down Construction of Decision Trees

    Regional Prediction Functions

    Splitting Rules

    Termination Criterion

    Basic Implementation

    Additional Considerations

    Binary Versus Non-Binary Trees

    Data Preprocessing

    Alternative Splitting Rules

    Categorical Variables

    Missing Values

    Controlling the Tree Shape

    Cost-Complexity Pruning

    Advantages and Limitations of Decision Trees

    Bootstrap Aggregation

    Random Forests



    Deep Learning


    Feed-Forward Neural Networks


    Methods for Training

    Steepest Descent

    Levenberg–Marquardt Method

    Limited-Memory BFGS Method

    Adaptive Gradient Methods

    Examples in Python

    Simple Polynomial Regression

    Image Classification


    Linear Algebra and Functional Analysis

    Vector Spaces, Bases, and Matrices

    Inner Product

    Complex Vectors and Matrices

    Orthogonal Projections

    Eigenvalues and Eigenvectors

    Left- and Right-Eigenvectors

    Matrix Decompositions

    (P)LU Decomposition

    Woodbury Identity

    Cholesky Decomposition

    QR Decomposition and the Gram–Schmidt Procedure

    Singular Value Decomposition

    Solving Structured Matrix Equations

    Functional Analysis

    Fourier Transforms

    Discrete Fourier Transform

    Fast Fourier Transform

    Multivariate Differentiation and Optimization

    Multivariate Differentiation

    Taylor Expansion

    Chain Rule

    Optimization Theory

    Convexity and Optimization

    Lagrangian Method


    Numerical Root-Finding and Minimization

    Newton-Like Methods

    Quasi-Newton Methods

    Normal Approximation Method

    Nonlinear Least Squares

    Constrained Minimization via Penalty Functions

    Probability and Statistics

    Random Experiments and Probability Spaces

    Random Variables and Probability Distributions


    Joint Distributions

    Conditioning and Independence

    Conditional Probability


    Expectation and Covariance

    Conditional Density and Conditional Expectation

    Functions of Random Variables

    Multivariate Normal Distribution

    Convergence of Random Variables

    Law of Large Numbers and Central Limit Theorem

    Markov Chains



    Method of Moments

    Maximum Likelihood Method

    Confidence Intervals

    Hypothesis Testing

    Python Primer

    Getting Started

    Python Objects

    Types and Operators

    Functions and Methods


    Flow Control





    Creating and Shaping Arrays


    Array Operations

    Random Numbers


    Creating a Basic Plot


    Series and DataFrame

    Manipulating Data Frames

    Extracting Information



    Partitioning the Data


    Fitting and Prediction

    Testing the Model

    System Calls, URL Access, and Speed-Up




    Dirk P. Kroese, PhD, is a Professor of Mathematics and Statistics at The University of Queensland. He has published over 120 articles and five books in a wide range of areas in mathematics, statistics, data science, machine learning, and Monte Carlo methods. He is a pioneer of the well-known Cross-Entropy method, an adaptive Monte Carlo technique that is used around the world to help solve difficult estimation and optimization problems in science, engineering, and finance.

    Zdravko Botev, PhD, is an Australian Mathematical Sciences Institute Lecturer in Data Science and Machine Learning with an appointment at the University of New South Wales in Sydney, Australia. He is the recipient of the 2018 Christopher Heyde Medal of the Australian Academy of Science for distinguished research in the Mathematical Sciences.

    Thomas Taimre, PhD, is a Senior Lecturer of Mathematics and Statistics at The University of Queensland. His research interests range from applied probability and Monte Carlo methods to applied physics and the remarkably universal self-mixing effect in lasers. He has published over 100 articles, holds a patent, and is the coauthor of the Handbook of Monte Carlo Methods (Wiley).

    Radislav Vaisman, PhD, is a Lecturer of Mathematics and Statistics at The University of Queensland. His research interests lie at the intersection of applied probability, machine learning, and computer science. He has published over 20 articles and two books.





    "The first impression when handling and opening this book at a random page is superb. A big format (A4) and heavy weight, because the paper quality is high, along with a spectacular style and large font, much colour and many plots, and blocks of python code enhanced in colour boxes. This makes the book attractive and easy to study...The book is a very well-designed data science course, with mathematical rigor in mind. Key concepts are highlighted in red in the margins, often with links to other parts of the book...This book will be excellent for those that want to build a strong mathematical foundation for their knowledge on the main machine learning techniques, and at the same time get python recipes on how to perform the analyses for worked examples."
    - Victor Moreno, ISCB News, December 2020