1st Edition

# Data Science and Machine Learning: Mathematical and Statistical Methods


"This textbook is a well-rounded, rigorous, and informative work presenting the mathematics behind modern machine learning techniques. It hits all the right notes: the choice of topics is up-to-date and perfect for a course on data science for mathematics students at the advanced undergraduate or early graduate level. This book fills a sorely-needed gap in the existing literature by not sacrificing depth for breadth, presenting proofs of major theorems and subsequent derivations, as well as providing a copious amount of Python code. I only wish a book like this had been around when I first began my journey!" **-Nicholas Hoell, University of Toronto**

*"This is a well-written book that provides a deeper dive into data-scientific methods than many introductory texts. The writing is clear, and the text logically builds up regularization, classification, and decision trees. Compared to its probable competitors, it carves out a unique niche."* **-Adam Loy, Carleton College**

The purpose of **Data Science and Machine Learning: Mathematical and Statistical Methods** is to provide an accessible yet comprehensive textbook for students interested in gaining a better understanding of the mathematics and statistics that underpin the rich variety of ideas and machine learning algorithms in data science.

**Key Features:**

- Focuses on mathematical understanding.
- Presentation is self-contained, accessible, and comprehensive.
- Extensive list of exercises and worked-out examples.
- Many concrete algorithms with Python code.
- Full color throughout.

Further resources can be found on the authors' website: https://github.com/DSML-book/Lectures

**Preface**

**Notation**

**Importing, Summarizing, and Visualizing Data**

Introduction

Structuring Features According to Type

Summary Tables

Summary Statistics

Visualizing Data

Plotting Qualitative Variables

Plotting Quantitative Variables

Data Visualization in a Bivariate Setting

Exercises

**Statistical Learning**

Introduction

Supervised and Unsupervised Learning

Training and Test Loss

Tradeoffs in Statistical Learning

Estimating Risk

In-Sample Risk

Cross-Validation

Modeling Data

Multivariate Normal Models

Normal Linear Models

Bayesian Learning

Exercises

**Monte Carlo Methods**

Introduction

Monte Carlo Sampling

Generating Random Numbers

Simulating Random Variables

Simulating Random Vectors and Processes

Resampling

Markov Chain Monte Carlo

Monte Carlo Estimation

Crude Monte Carlo

Bootstrap Method

Variance Reduction

Monte Carlo for Optimization

Simulated Annealing

Cross-Entropy Method

Splitting for Optimization

Noisy Optimization

Exercises

**Unsupervised Learning**

Introduction

Risk and Loss in Unsupervised Learning

Expectation–Maximization (EM) Algorithm

Empirical Distribution and Density Estimation

Clustering via Mixture Models

Mixture Models

EM Algorithm for Mixture Models

Clustering via Vector Quantization

K-Means

Clustering via Continuous Multiextremal Optimization

Hierarchical Clustering

Principal Component Analysis (PCA)

Motivation: Principal Axes of an Ellipsoid

PCA and Singular Value Decomposition (SVD)

Exercises

**Regression**

Introduction

Linear Regression

Analysis via Linear Models

Parameter Estimation

Model Selection and Prediction

Cross-Validation and Predictive Residual Sum of Squares

In-Sample Risk and Akaike Information Criterion

Categorical Features

Nested Models

Coefficient of Determination

Inference for Normal Linear Models

Comparing Two Normal Linear Models

Confidence and Prediction Intervals

Nonlinear Regression Models

Linear Models in Python

Modeling

Analysis

Analysis of Variance (ANOVA)

Confidence and Prediction Intervals

Model Validation

Variable Selection

Generalized Linear Models

Exercises

**Regularization and Kernel Methods**

Introduction

Regularization

Reproducing Kernel Hilbert Spaces

Construction of Reproducing Kernels

Reproducing Kernels via Feature Mapping

Kernels from Characteristic Functions

Reproducing Kernels Using Orthonormal Features

Kernels from Kernels

Representer Theorem

Smoothing Cubic Splines

Gaussian Process Regression

Kernel PCA

Exercises

**Classification**

Introduction

Classification Metrics

Classification via Bayes’ Rule

Linear and Quadratic Discriminant Analysis

Logistic Regression and Softmax Classification

K-Nearest Neighbors Classification

Support Vector Machine

Classification with Scikit-Learn

Exercises

**Decision Trees and Ensemble Methods**

Introduction

Top-Down Construction of Decision Trees

Regional Prediction Functions

Splitting Rules

Termination Criterion

Basic Implementation

Additional Considerations

Binary Versus Non-Binary Trees

Data Preprocessing

Alternative Splitting Rules

Categorical Variables

Missing Values

Controlling the Tree Shape

Cost-Complexity Pruning

Advantages and Limitations of Decision Trees

Bootstrap Aggregation

Random Forests

Boosting

Exercises

**Deep Learning**

Introduction

Feed-Forward Neural Networks

Back-Propagation

Methods for Training

Steepest Descent

Levenberg–Marquardt Method

Limited-Memory BFGS Method

Adaptive Gradient Methods

Examples in Python

Simple Polynomial Regression

Image Classification

Exercises

**Linear Algebra and Functional Analysis**

Vector Spaces, Bases, and Matrices

Inner Product

Complex Vectors and Matrices

Orthogonal Projections

Eigenvalues and Eigenvectors

Left- and Right-Eigenvectors

Matrix Decompositions

(P)LU Decomposition

Woodbury Identity

Cholesky Decomposition

QR Decomposition and the Gram–Schmidt Procedure

Singular Value Decomposition

Solving Structured Matrix Equations

Functional Analysis

Fourier Transforms

Discrete Fourier Transform

Fast Fourier Transform

**Multivariate Differentiation and Optimization**

Multivariate Differentiation

Taylor Expansion

Chain Rule

Optimization Theory

Convexity and Optimization

Lagrangian Method

Duality

Numerical Root-Finding and Minimization

Newton-Like Methods

Quasi-Newton Methods

Normal Approximation Method

Nonlinear Least Squares

Constrained Minimization via Penalty Functions

**Probability and Statistics**

Random Experiments and Probability Spaces

Random Variables and Probability Distributions

Expectation

Joint Distributions

Conditioning and Independence

Conditional Probability

Independence

Expectation and Covariance

Conditional Density and Conditional Expectation

Functions of Random Variables

Multivariate Normal Distribution

Convergence of Random Variables

Law of Large Numbers and Central Limit Theorem

Markov Chains

Statistics

Estimation

Method of Moments

Maximum Likelihood Method

Confidence Intervals

Hypothesis Testing

**Python Primer**

Getting Started

Python Objects

Types and Operators

Functions and Methods

Modules

Flow Control

Iteration

Classes

Files

NumPy

Creating and Shaping Arrays

Slicing

Array Operations

Random Numbers

Matplotlib

Creating a Basic Plot

Pandas

Series and DataFrame

Manipulating Data Frames

Extracting Information

Plotting

Scikit-learn

Partitioning the Data

Standardization

Fitting and Prediction

Testing the Model

System Calls, URL Access, and Speed-Up

**Bibliography**

**Index**

### Biography

**Dirk P. Kroese, PhD**, is a Professor of Mathematics and Statistics at The University of Queensland. He has published over 120 articles and five books in a wide range of areas in mathematics, statistics, data science, machine learning, and Monte Carlo methods. He is a pioneer of the well-known Cross-Entropy method—an adaptive Monte Carlo technique, which is being used around the world to help solve difficult estimation and optimization problems in science, engineering, and finance.

**Zdravko Botev, PhD**, is an Australian Mathematical Sciences Institute Lecturer in Data Science and Machine Learning with an appointment at the University of New South Wales in Sydney, Australia. He is the recipient of the 2018 Christopher Heyde Medal of the Australian Academy of Science for distinguished research in the Mathematical Sciences.

**Thomas Taimre, PhD**, is a Senior Lecturer of Mathematics and Statistics at The University of Queensland. His research interests range from applied probability and Monte Carlo methods to applied physics and the remarkably universal self-mixing effect in lasers. He has published over 100 articles, holds a patent, and is the coauthor of *Handbook of Monte Carlo Methods* (Wiley).

**Radislav Vaisman, PhD**, is a Lecturer of Mathematics and Statistics at The University of Queensland. His research interests lie at the intersection of applied probability, machine learning, and computer science. He has published over 20 articles and two books.

"The first impression when handling and opening this book at a random page is superb. A big format (A4) and heavy weight, because the paper quality is high, along with a spectacular style and large font, much colour and many plots, and blocks of python code enhanced in colour boxes. This makes the book attractive and easy to study...The book is a very well-designed data science course, with mathematical rigor in mind. Key concepts are highlighted in red in the margins, often with links to other parts of the book...This book will be excellent for those that want to build a strong mathematical foundation for their knowledge on the main machine learning techniques, and at the same time get python recipes on how to perform the analyses for worked examples."

- Victor Moreno, ISCB News, December 2020