1st Edition

Data Science and Machine Learning
Mathematical and Statistical Methods

ISBN 9781138492530
Published November 22, 2019 by Chapman and Hall/CRC
532 Pages

USD $105.00


Book Description

"This textbook is a well-rounded, rigorous, and informative work presenting the mathematics behind modern machine learning techniques. It hits all the right notes: the choice of topics is up-to-date and perfect for a course on data science for mathematics students at the advanced undergraduate or early graduate level. This book fills a sorely-needed gap in the existing literature by not sacrificing depth for breadth, presenting proofs of major theorems and subsequent derivations, as well as providing a copious amount of Python code. I only wish a book like this had been around when I first began my journey!" -Nicholas Hoell, University of Toronto

"This is a well-written book that provides a deeper dive into data-scientific methods than many introductory texts. The writing is clear, and the text logically builds up regularization, classification, and decision trees. Compared to its probable competitors, it carves out a unique niche." -Adam Loy, Carleton College

The purpose of Data Science and Machine Learning: Mathematical and Statistical Methods is to provide an accessible, yet comprehensive textbook intended for students interested in gaining a better understanding of the mathematics and statistics that underpin the rich variety of ideas and machine learning algorithms in data science.

Key Features:

  • Focuses on mathematical understanding.
  • Presentation is self-contained, accessible, and comprehensive.
  • Extensive list of exercises and worked-out examples.
  • Many concrete algorithms with Python code.
  • Full color throughout.


Further resources can be found on the authors' website.

Table of Contents



Importing, Summarizing, and Visualizing Data


Structuring Features According to Type

Summary Tables

Summary Statistics

Visualizing Data

Plotting Qualitative Variables

Plotting Quantitative Variables

Data Visualization in a Bivariate Setting


Statistical Learning


Supervised and Unsupervised Learning

Training and Test Loss

Tradeoffs in Statistical Learning

Estimating Risk

In-Sample Risk


Modeling Data

Multivariate Normal Models

Normal Linear Models

Bayesian Learning


Monte Carlo Methods

Introduction

Monte Carlo Sampling

Generating Random Numbers

Simulating Random Variables

Simulating Random Vectors and Processes


Markov Chain Monte Carlo

Monte Carlo Estimation

Crude Monte Carlo

Bootstrap Method

Variance Reduction

Monte Carlo for Optimization

Simulated Annealing

Cross-Entropy Method

Splitting for Optimization

Noisy Optimization


Unsupervised Learning


Risk and Loss in Unsupervised Learning

Expectation–Maximization (EM) Algorithm

Empirical Distribution and Density Estimation

Clustering via Mixture Models

Mixture Models

EM Algorithm for Mixture Models

Clustering via Vector Quantization


Clustering via Continuous Multiextremal Optimization

Hierarchical Clustering

Principal Component Analysis (PCA)

Motivation: Principal Axes of an Ellipsoid

PCA and Singular Value Decomposition (SVD)




Regression

Linear Regression

Analysis via Linear Models

Parameter Estimation

Model Selection and Prediction

Cross-Validation and Predictive Residual Sum of Squares

In-Sample Risk and Akaike Information Criterion

Categorical Features

Nested Models

Coefficient of Determination

Inference for Normal Linear Models

Comparing Two Normal Linear Models

Confidence and Prediction Intervals

Nonlinear Regression Models

Linear Models in Python



Analysis of Variance (ANOVA)

Confidence and Prediction Intervals

Model Validation

Variable Selection

Generalized Linear Models


Regularization and Kernel Methods



Reproducing Kernel Hilbert Spaces

Construction of Reproducing Kernels

Reproducing Kernels via Feature Mapping

Kernels from Characteristic Functions

Reproducing Kernels Using Orthonormal Features

Kernels from Kernels

Representer Theorem

Smoothing Cubic Splines

Gaussian Process Regression

Kernel PCA




Classification

Classification Metrics

Classification via Bayes’ Rule

Linear and Quadratic Discriminant Analysis

Logistic Regression and Softmax Classification

K-nearest Neighbors Classification

Support Vector Machine

Classification with Scikit-Learn


Decision Trees and Ensemble Methods


Top-Down Construction of Decision Trees

Regional Prediction Functions

Splitting Rules

Termination Criterion

Basic Implementation

Additional Considerations

Binary Versus Non-Binary Trees

Data Preprocessing

Alternative Splitting Rules

Categorical Variables

Missing Values

Controlling the Tree Shape

Cost-Complexity Pruning

Advantages and Limitations of Decision Trees

Bootstrap Aggregation

Random Forests



Deep Learning


Feed-Forward Neural Networks


Methods for Training

Steepest Descent

Levenberg–Marquardt Method

Limited-Memory BFGS Method

Adaptive Gradient Methods

Examples in Python

Simple Polynomial Regression

Image Classification


Linear Algebra and Functional Analysis

Vector Spaces, Bases, and Matrices

Inner Product

Complex Vectors and Matrices

Orthogonal Projections

Eigenvalues and Eigenvectors

Left- and Right-Eigenvectors

Matrix Decompositions

(P)LU Decomposition

Woodbury Identity

Cholesky Decomposition

QR Decomposition and the Gram–Schmidt Procedure

Singular Value Decomposition

Solving Structured Matrix Equations

Functional Analysis

Fourier Transforms

Discrete Fourier Transform

Fast Fourier Transform

Multivariate Differentiation and Optimization

Multivariate Differentiation

Taylor Expansion

Chain Rule

Optimization Theory

Convexity and Optimization

Lagrangian Method


Numerical Root-Finding and Minimization

Newton-Like Methods

Quasi-Newton Methods

Normal Approximation Method

Nonlinear Least Squares

Constrained Minimization via Penalty Functions

Probability and Statistics

Random Experiments and Probability Spaces

Random Variables and Probability Distributions


Joint Distributions

Conditioning and Independence

Conditional Probability


Expectation and Covariance

Conditional Density and Conditional Expectation

Functions of Random Variables

Multivariate Normal Distribution

Convergence of Random Variables

Law of Large Numbers and Central Limit Theorem

Markov Chains



Method of Moments

Maximum Likelihood Method

Confidence Intervals

Hypothesis Testing

Python Primer

Getting Started

Python Objects

Types and Operators

Functions and Methods


Flow Control





NumPy

Creating and Shaping Arrays


Array Operations

Random Numbers


Matplotlib

Creating a Basic Plot


Pandas

Series and DataFrame

Manipulating Data Frames

Extracting Information



Scikit-learn

Partitioning the Data


Fitting and Prediction

Testing the Model

System Calls, URL Access, and Speed-Up






Dirk P. Kroese, PhD, is a Professor of Mathematics and Statistics at The University of Queensland. He has published over 120 articles and five books across mathematics, statistics, data science, machine learning, and Monte Carlo methods. He is a pioneer of the well-known Cross-Entropy method, an adaptive Monte Carlo technique used around the world to help solve difficult estimation and optimization problems in science, engineering, and finance.

Zdravko Botev, PhD, is an Australian Mathematical Science Institute Lecturer in Data Science and Machine Learning with an appointment at the University of New South Wales in Sydney, Australia. He is the recipient of the 2018 Christopher Heyde Medal of the Australian Academy of Science for distinguished research in the Mathematical Sciences.

Thomas Taimre, PhD, is a Senior Lecturer of Mathematics and Statistics at The University of Queensland. His research interests range from applied probability and Monte Carlo methods to applied physics and the remarkably universal self-mixing effect in lasers. He has published over 100 articles, holds a patent, and is the coauthor of Handbook of Monte Carlo Methods (Wiley).

Radislav Vaisman, PhD, is a Lecturer of Mathematics and Statistics at The University of Queensland. His research interests lie at the intersection of applied probability, machine learning, and computer science. He has published over 20 articles and two books.






"The first impression when handling and opening this book at a random page is superb. A big format (A4) and heavy weight, because the paper quality is high, along with a spectacular style and large font, much colour and many plots, and blocks of python code enhanced in colour boxes. This makes the book attractive and easy to study...The book is a very well-designed data science course, with mathematical rigor in mind. Key concepts are highlighted in red in the margins, often with links to other parts of the book...This book will be excellent for those that want to build a strong mathematical foundation for their knowledge on the main machine learning techniques, and at the same time get python recipes on how to perform the analyses for worked examples."
- Victor Moreno, ISCB News, December 2020