Chapman and Hall/CRC

397 pages | 186 B/W Illus.

"**A First Course in Machine Learning** by Simon Rogers and Mark Girolami is the best introductory book for ML currently available. It combines rigor and precision with accessibility, starts from a detailed explanation of the basic foundations of Bayesian analysis in the simplest of settings, and goes all the way to the frontiers of the subject such as infinite mixture models, GPs, and MCMC."

—Devdatt Dubhashi, Professor, Department of Computer Science and Engineering, Chalmers University, Sweden

"This textbook manages to be easier to read than other comparable books in the subject while retaining all the rigorous treatment needed. The new chapters put it at the forefront of the field by covering topics that have become mainstream in machine learning over the last decade."

—Daniel Barbara, George Mason University, Fairfax, Virginia, USA

"The first edition of this book was already an excellent introductory text on machine learning for an advanced undergraduate or taught masters level course, or indeed for anybody who wants to learn about an interesting and important field of computer science. The additional chapters of advanced material on Gaussian process, MCMC and mixture modeling provide an ideal basis for practical projects, without disturbing the very clear and readable exposition of the basics contained in the first part of the book."

—Gavin Cawley, Senior Lecturer, School of Computing Sciences, University of East Anglia, UK

"This book could be used for junior/senior undergraduate students or first-year graduate students, as well as individuals who want to explore the field of machine learning. The prerequisites on math or statistics are minimal and following the content is a fairly easy process. The book introduces not only the concepts but the underlying ideas on algorithm implementation from a critical thinking perspective."

—Guangzhi Qu, Oakland University, Rochester, Michigan, USA

"The new edition of **A First Course in Machine Learning** by Rogers and Girolami is an excellent introduction to the use of statistical methods in machine learning. The book introduces concepts such as mathematical modeling, inference, and prediction, providing ‘just in time’ the essential background on linear algebra, calculus, and probability theory that the reader needs to understand these concepts. One of the strengths of the book is its practical approach. An extensive collection of code written in MATLAB/Octave, R, and Python is available from an associated web page that allows the reader to change models and parameter values to make [it] easier to understand and apply these models in real applications. The authors [also] introduce more advanced, state-of-the-art machine learning methods, such as Gaussian process models and advanced mixture models, which are used across machine learning. This makes the book interesting not only to students with little or no background in machine learning but also to more advanced graduate students interested in statistical approaches to machine learning."

—Daniel Ortiz-Arroyo, Associate Professor, Aalborg University Esbjerg, Denmark

"I was impressed by how closely the material aligns with the needs of an introductory course on machine learning, which is its greatest strength. While there are other books available that aim for completeness, with exhaustively comprehensive introductions to every branch of machine learning, the book by Rogers and Girolami starts with the basics, builds a solid and logical foundation of methodology, before introducing some more advanced topics. The essentials of the model construction, validation, and evaluation process are communicated clearly and in such a manner as to be accessible to the student taking such a course. I was also pleased to see that the authors have not shied away from producing algebraic derivations throughout, which are for many students an essential part of the learning process—many other texts omit such details, leaving them as ‘an exercise for the reader.’ Being shown the explicit steps required for such derivations is an important part of developing a sense of confidence in the student. Overall, this is a pragmatic and helpful book, which is well-aligned to the needs of an introductory course and one that I will be looking at for my own students in coming months."

—David Clifton, University of Oxford, UK

**Linear Modelling: A Least Squares Approach**

LINEAR MODELLING

Defining the model

Modelling assumptions

Defining a good model

The least squares solution—a worked example

Worked example

Least squares fit to the Olympic data

Summary

MAKING PREDICTIONS

A second Olympic dataset

Summary

VECTOR/MATRIX NOTATION

Example

Numerical example

Making predictions

Summary

NON-LINEAR RESPONSE FROM A LINEAR MODEL

GENERALISATION AND OVER-FITTING

Validation data

Cross-validation

Computational scaling of K-fold cross-validation

REGULARISED LEAST SQUARES

EXERCISES

FURTHER READING

**Linear Modelling: A Maximum Likelihood Approach**

ERRORS AS NOISE

Thinking generatively

RANDOM VARIABLES AND PROBABILITY

Random variables

Probability and distributions

Adding probabilities

Conditional probabilities

Joint probabilities

Marginalisation

Aside—Bayes' rule

Expectations

POPULAR DISCRETE DISTRIBUTIONS

Bernoulli distribution

Binomial distribution

Multinomial distribution

CONTINUOUS RANDOM VARIABLES—DENSITY FUNCTIONS

POPULAR CONTINUOUS DENSITY FUNCTIONS

The uniform density function

The beta density function

The Gaussian density function

Multivariate Gaussian

SUMMARY

THINKING GENERATIVELY…CONTINUED

LIKELIHOOD

Dataset likelihood

Maximum likelihood

Characteristics of the maximum likelihood solution

Maximum likelihood favours complex models

THE BIAS-VARIANCE TRADE-OFF

Summary

EFFECT OF NOISE ON PARAMETER ESTIMATES

Uncertainty in estimates

Comparison with empirical values

Variability in model parameters—Olympic data

VARIABILITY IN PREDICTIONS

Predictive variability—an example

Expected values of the estimators

CHAPTER SUMMARY

EXERCISES

FURTHER READING

**The Bayesian Approach to Machine Learning**

A COIN GAME

Counting heads

The Bayesian way

THE EXACT POSTERIOR

THE THREE SCENARIOS

No prior knowledge

The fair coin scenario

A biased coin

The three scenarios—a summary

Adding more data

MARGINAL LIKELIHOODS

Model comparison with the marginal likelihood

HYPERPARAMETERS

GRAPHICAL MODELS

SUMMARY

A BAYESIAN TREATMENT OF THE OLYMPIC 100m DATA

The model

The likelihood

The prior

The posterior

A first-order polynomial

Making predictions

MARGINAL LIKELIHOOD FOR POLYNOMIAL MODEL

ORDER SELECTION

CHAPTER SUMMARY

EXERCISES

FURTHER READING

**Bayesian Inference**

NON-CONJUGATE MODELS

BINARY RESPONSES

A model for binary responses

A POINT ESTIMATE—THE MAP SOLUTION

THE LAPLACE APPROXIMATION

Laplace approximation example: Approximating a gamma density

Laplace approximation for the binary response model

SAMPLING TECHNIQUES

Playing darts

The Metropolis–Hastings algorithm

The art of sampling

CHAPTER SUMMARY

EXERCISES

FURTHER READING

**Classification**

THE GENERAL PROBLEM

PROBABILISTIC CLASSIFIERS

The Bayes classifier

Likelihood—class-conditional distributions

Prior class distribution

Example—Gaussian class-conditionals

Making predictions

The naive-Bayes assumption

Example—classifying text

Smoothing

Logistic regression

Motivation

Non-linear decision functions

Non-parametric models—the Gaussian process

NON-PROBABILISTIC CLASSIFIERS

K-nearest neighbours

Choosing K

Support vector machines and other kernel methods

The margin

Maximising the margin

Making predictions

Support vectors

Soft margins

Kernels

Summary

ASSESSING CLASSIFICATION PERFORMANCE

Accuracy—0/1 loss

Sensitivity and specificity

The area under the ROC curve

Confusion matrices

DISCRIMINATIVE AND GENERATIVE CLASSIFIERS

CHAPTER SUMMARY

EXERCISES

FURTHER READING

**Clustering**

THE GENERAL PROBLEM

K-MEANS CLUSTERING

Choosing the number of clusters

Where K-means fails

Kernelised K-means

Summary

MIXTURE MODELS

A generative process

Mixture model likelihood

The EM algorithm

Updating πk

Updating μk

Updating Σk

Updating qnk

Some intuition

Example

EM finds local optima

Choosing the number of components

Other forms of mixture component

MAP estimates with EM

Bayesian mixture models

CHAPTER SUMMARY

EXERCISES

FURTHER READING

**Principal Components Analysis and Latent Variable Models**

THE GENERAL PROBLEM

Variance as a proxy for interest

PRINCIPAL COMPONENTS ANALYSIS

Choosing D

Limitations of PCA

LATENT VARIABLE MODELS

Mixture models as latent variable models

Summary

VARIATIONAL BAYES

Choosing Q(θ)

Optimising the bound

A PROBABILISTIC MODEL FOR PCA

Qτ(τ)

Qxn(xn)

Qwm(wm)

The required expectations

The algorithm

An example

MISSING VALUES

Missing values as latent variables

Predicting missing values

NON-REAL-VALUED DATA

Probit PPCA

Visualising parliamentary data

Aside—relationship to classification

CHAPTER SUMMARY

EXERCISES

FURTHER READING

**Advanced Topics**

**Gaussian Processes**

PROLOGUE—NON-PARAMETRIC MODELS

GAUSSIAN PROCESS REGRESSION

The Gaussian process prior

Noise-free regression

Noisy regression

Summary

Noisy regression—an alternative route

Alternative covariance functions

Linear

Polynomial

Neural network

ARD

Composite covariance functions

Summary

GAUSSIAN PROCESS CLASSIFICATION

A classification likelihood

A classification roadmap

The point estimate approximation

Propagating uncertainty through the sigmoid

The Laplace approximation

Summary

HYPERPARAMETER OPTIMISATION

EXTENSIONS

Non-zero mean

Multiclass classification

Other likelihood functions and models

Other inference schemes

CHAPTER SUMMARY

EXERCISES

FURTHER READING

**Markov Chain Monte Carlo Sampling**

GIBBS SAMPLING

EXAMPLE: GIBBS SAMPLING FOR GP CLASSIFICATION

Conditional densities for GP classification via Gibbs sampling

Summary

WHY DOES MCMC WORK?

SOME SAMPLING PROBLEMS AND SOLUTIONS

Burn-in and convergence

Autocorrelation

Summary

ADVANCED SAMPLING TECHNIQUES

Adaptive proposals and Hamiltonian Monte Carlo

Approximate Bayesian computation

Population MCMC and temperature schedules

Sequential Monte Carlo

CHAPTER SUMMARY

EXERCISES

FURTHER READING

**Advanced Mixture Modelling**

A GIBBS SAMPLER FOR MIXTURE MODELS

COLLAPSED GIBBS SAMPLING

AN INFINITE MIXTURE MODEL

The Chinese restaurant process

Inference in the infinite mixture model

Summary

DIRICHLET PROCESSES

Hierarchical Dirichlet processes

Summary

BEYOND STANDARD MIXTURES—TOPIC MODELS

CHAPTER SUMMARY

EXERCISES

FURTHER READING

Glossary

Index

- BUS061000: BUSINESS & ECONOMICS / Statistics
- COM012040: COMPUTERS / Programming / Games
- COM021030: COMPUTERS / Database Management / Data Mining
- COM037000: COMPUTERS / Machine Theory