Statistical Regression and Classification

From Linear Models to Machine Learning, 1st Edition

By Norman Matloff

Chapman and Hall/CRC

490 pages

Purchasing Options: $ = USD
Paperback: 9781498710916
pub: 2017-08-01
Hardback: 9781138066465
pub: 2017-07-20
eBook (VitalSource): 9781315119588
pub: 2017-09-19
from $34.98


Statistical Regression and Classification: From Linear Models to Machine Learning takes an innovative look at the traditional statistical regression course, presenting a contemporary treatment in line with today's applications and users. The text takes a modern look at regression:

* A thorough treatment of classical linear and generalized linear models, supplemented with introductory material on machine learning methods.

* Since classification is the focus of many contemporary applications, the book covers this topic in detail, especially the multiclass case.

* In view of the voluminous nature of many modern datasets, there is a chapter on Big Data.

* Special Mathematical and Computational Complements sections appear at the ends of chapters, and exercises are partitioned into Data, Math and Complements problems.

* Instructors can tailor coverage for specific audiences such as majors in Statistics, Computer Science, or Economics.

* More than 75 examples using real data.

The book treats classical regression methods in an innovative, contemporary manner. Though some statistical learning methods are introduced, the primary methodology used is linear and generalized linear parametric models, covering both the Description and Prediction goals of regression methods. The author is just as interested in Description applications of regression, such as measuring the gender wage gap in Silicon Valley, as in forecasting tomorrow's demand for bike rentals. An entire chapter is devoted to measuring such effects, including discussion of Simpson's Paradox, multiple inference, and causation issues. Similarly, there is an entire chapter on parametric model fit, making use of both residual analysis and assessment via nonparametric analysis.

Norman Matloff is a professor of computer science at the University of California, Davis, and was a founder of the Statistics Department at that institution. His current research focus is on recommender systems, and applications of regression methods to small area estimation and bias reduction in observational studies. He is on the editorial boards of the Journal of Statistical Software and The R Journal. An award-winning teacher, he is the author of The Art of R Programming and Parallel Computation in Data Science: With Examples in R, C++ and CUDA.


" . . . Matloff delivers a well-balanced book for advanced beginners. Besides the mathematical formulas, he also presents many chunks of R code, and if the reader is able to read R code, the formulas and calculations become clearer. Due to the computational R code, the well-written Appendix, and an overall clear English, the book will help students and autodidacts. Matloff has written a textbook of the best kind for such a broad topic." ~ Jochen Kruppa, Biometric Journal

". . . the book is well suitable for a wide audience: For practitioners interested in applying the methodology, for students in statistics as well as economics/social sciences and computer science. Even in more mathematically oriented classes it can be used as a complimentary text to the usual theoretic textbooks deepening students ability to interpret and question statistical results. ~ Claudia Kirch, Magdeburg

"This is an application-oriented book introducing frequently used classification and regression methods and the principles behind them. This book tries to keep a balance between theory and practice. It not only elaborates the theories of statistical regression and classification, but also provides large amount of real world examples and R codes to help the reader practice what they learned. As stated in the preface, the targeted readers are data analysts and college students. The style of the book fits well to the anticipated audience." ~ Quanquan Gu, University of Virginia

Table of Contents

*Statistical Regression and Classification: From Linear Models to Machine Learning was awarded the 2017 Ziegel Award for the best book reviewed in Technometrics in 2017.*

Setting the Stage

Example: Predicting Bike-Sharing Activity

Example of the Prediction Goal: Body Fat

Example of the Description Goal: Who Clicks Web Ads?

Optimal Prediction

A Note About E(), Samples and Populations

Example: Do Baseball Players Gain Weight As They Age?

Prediction vs Description

A First Estimator

A Possibly Better Estimator, Using a Linear Model

Parametric vs Nonparametric Models

Example: Click-Through Rate

Several Predictor Variables

Multipredictor Linear Models

Estimation of Coefficients

The Description Goal

Nonparametric Regression Estimation: k-NN

Looking at Nearby Points

Measures of Nearness

The k-NN Method, and Tuning Parameters

Nearest-Neighbor Analysis in the regtools Package


Example: Baseball Player Data

After Fitting a Model, How Do We Use It for Prediction?

Parametric Settings

Nonparametric Settings

The Generic predict() Function

Overfitting, and the Variance-Bias Tradeoff


Example: Student Evaluations of Instructors


Linear Model Case

The Code

Applying the Code

k-NN Case

Choosing the Partition Sizes

Important Note on Tuning Parameters

Rough Rule of Thumb

Example: Bike-Sharing Data

Linear Modeling of μ(t)

Nonparametric Analysis

Interaction Terms, Including Quadratics

Example: Salaries of Female Programmers and Engineers

Saving Your Work

Higher-Order Polynomial Models

Classification Techniques

It's a Regression Problem!

Example: Bike-Sharing Data

Crucial Advice: Don't Automate, Participate!

Mathematical Complements

Indicator Random Variables

Mean Squared Error of an Estimator

μ(t) Minimizes Mean Squared Prediction Error

μ(t) Minimizes the Misclassification Rate

Kernel-Based Nonparametric Estimation of Regression


General Nonparametric Regression

Some Properties of Conditional Expectation

Conditional Expectation As a Random Variable

The Law of Total Expectation

Law of Total Variance

Tower Property

Geometric View

Computational Complements

CRAN Packages

The Function tapply() and Its Cousins

The Innards of the k-NN Code

Function Dispatch

Centering and Scaling

Further Exploration: Data, Code and Math Problems

Linear Regression Models


The "Error Term"

Random- vs Fixed-X Cases

Least-Squares Estimation


Matrix Formulations

() in Matrix Terms

Using Matrix Operations to Minimize ()

Models Without an Intercept Term

A Closer Look at lm() Output

Statistical Inference



Motivation: the Multivariate Normal Distribution Family

Unbiasedness and Consistency

β̂ Is Unbiased

Bias As an Issue/Nonissue

β̂ Is Statistically Consistent

Inference under Homoscedasticity

Review: Classical Inference on a Single Mean

Back to Reality

The Concept of a Standard Error

Extension to the Regression Case

Example: Bike-Sharing Data

Collective Predictive Strength of the X(j)

Basic Properties

Definition of R²

Bias Issues


The Leaving-One-Out Method

Extensions of LOOM

LOOM for k-NN

Other Measures

The Practical Value of p-Values: Small OR Large

Misleadingly Small p-Values

Example: Forest Cover Data

Example: Click Through Data

Misleadingly LARGE p-Values

The Verdict

Missing Values

Mathematical Complements

Covariance Matrices

The Multivariate Normal Distribution Family

The Central Limit Theorem

Details on Models Without a Constant Term

Unbiasedness of the Least-Squares Estimator

Consistency of the Least-Squares Estimator

Biased Nature of S

The Geometry of Conditional Expectation

Random Variables As Inner Product Spaces


Conditional Expectations As Projections

Predicted Values and Error Terms Are Uncorrelated

Classical "Exact" Inference

Asymptotic (p + 1)-Variate Normality of β̂

Computational Complements

Details of the Computation of ()

R Functions Relating to the Multivariate Normal Distribution


Example: Simulation Computation of a Bivariate Normal Quantity

More Details of 'lm' Objects

Homoscedasticity and Other Assumptions in Practice

Normality Assumption

Independence Assumption: Don't Overlook It

Estimation of a Single Mean

Inference on Linear Regression Coefficients

What Can Be Done?

Example: MovieLens Data

Dropping the Homoscedasticity Assumption

Robustness of the Homoscedasticity Assumption

Weighted Least Squares

A Procedure for Valid Inference

The Methodology

Example: Female Wages

Simulation Test

Variance-Stabilizing Transformations

The Verdict

Further Reading

Computational Complements

The R merge() Function

Mathematical Complements

The Delta Method

Distortion Due to Transformation

Further Exploration: Data, Code and Math Problems

Generalized Linear and Nonlinear Models

Example: Enzyme Kinetics Model

The Generalized Linear Model (GLM)


Poisson Regression

Exponential Families

GLM Computation

R's glm() Function

GLM: the Logistic Model


Example: Pima Diabetes Data

Interpretation of Coefficients

The predict() Function Again

Overall Prediction Accuracy

Example: Predicting Spam E-mail

Linear Boundary

GLM: the Poisson Regression Model

Least-Squares Computation for Nonlinear Models

The Gauss-Newton Method

Eicker-White Asymptotic Standard Errors

Example: Bike Sharing Data

The "Elephant in the Room": Convergence Issues

Further Reading

Computational Complements

R Factors

Mathematical Complements

Maximum Likelihood Estimation

Further Exploration: Data, Code and Math Problems

Multiclass Classification Problems

Key Notation

Key Equations

Estimating the Functions μ_i(t)

How Do We Use Models for Prediction?

One vs All or All vs All?

Which Is Better?

Example: Vertebrae Data


Example: Letter Recognition Data

Example: k-NN on the Letter Recognition Data

The Verdict

The Classical Approach: Fisher Linear Discriminant Analysis



Example: Vertebrae Data

LDA Code and Results

Multinomial Logistic Model



Example: Vertebrae Data

The Issue of "Unbalanced" (and Balanced) Data

Why the Concern Regarding Balance?

A Crucial Sampling Issue

It All Depends on How We Sample


Example: Letter Recognition

Going Beyond Using the 0.5 Threshold

Unequal Misclassification Costs

Revisiting the Problem of Unbalanced Data

The Confusion Matrix and the ROC Curve


Example: Spam Data

Mathematical Complements

Classification via Density Estimation

Methods for Density Estimation

Time Complexity Comparison, OVA vs AVA

Optimal Classification Rule for Unequal Error Costs

Computational Complements

R Code for OVA and AVA Logit Analysis

ROC Code

Further Exploration: Data, Code and Math Problems

Model Fit: Assessment and Improvement

Aims of This Chapter



Goals of Model Fit-Checking

Prediction Context

Description Context

Center vs Fringes of the Data Set

Example: Currency Data

Overall Measures of Model Fit

R-Squared, Revisited

Cross-Validation, Revisited

Plotting Parametric Fit Against Nonparametric One

Residuals vs Smoothing

Diagnostics Related to Individual Predictors

Partial Residual Plots

Plotting Nonparametric Fit Against Each Predictor

The freqparcoord Package

Parallel Coordinates

The regdiag() Function

Effects of Unusual Observations on Model Fit

The influence() Function

Example: Currency Data

Use of freqparcoord for Outlier Detection

Automated Outlier Resistance

Median Regression

Example: Currency Data

Example: Vocabulary Acquisition

Classification Settings

Example: Pima Diabetes Study

Improving Fit

Deleting Terms from the Model

Adding Polynomial Terms

Example: Currency Data

Example: Programmer/Engineer Census Data


View from the 30,000 Foot Level


A Tool to Aid Model Selection

Special Note on the Description Goal

Computational Complements

Data Wrangling for the Word Bank Dataset

Mathematical Complements

The Hat Matrix

Matrix Inverse Update

The Median Minimizes Mean Absolute Deviation

Further Exploration: Data, Code and Math Problems

Disaggregating Regressor Effects

A Small Analytical Example

Example: Baseball Player Data

Simpson's Paradox

Example: UCB Admissions Data (Logit)

The Verdict

Unobserved Predictor Variables

Instrumental Variables (IVs)

The IV Method

Two-Stage Least Squares

Example: Years of Schooling

Multiple Predictors

The Verdict

Random Effects Models

Example: Movie Ratings Data, Random Effects

Multiple Random Effects

Why Use Random/Mixed Effects Models?

Regression Function Averaging

Estimating the Counterfactual

Example: Job Training

Small Area Estimation: "Borrowing from Neighbors"

The Verdict

Multiple Inference

The Frequent Occurrence of Extreme Events

Relation to Statistical Inference

The Bonferroni Inequality

Scheffé's Method

Example: MovieLens Data

The Verdict

Computational Complements

MovieLens Data Wrangling

More Data Wrangling in the MovieLens Example

Mathematical Complements

Iterated Projections

Standard Errors for RFA

Asymptotic Chi-Square Distributions

Further Exploration: Data, Code and Math Problems

Shrinkage Estimators

Relevance of James-Stein to Regression Estimation


What's All the Fuss About?

A Simple Guiding Model

"Wrong" Signs in Estimated Coefficients

Checking for Multicollinearity

The Variance Inflation Factor

Example: Currency Data

What Can/Should One Do?

Do Nothing

Eliminate Some Predictors

Employ a Shrinkage Method

Ridge Regression

Alternate Definitions

Yes, It Is Smaller

Choosing the Value of λ

Example: Currency Data



The lars Package

Example: Currency Data

The Elastic Net

Cases of Exact Multicollinearity, Including p > n

Why It May Work

Example: R mtcars Data

Additional Motivation for the Elastic Net

Bias, Standard Errors and Significance Tests

Generalized Linear Models

Example: Vertebrae Data

Other Terminology

Further Reading

Mathematical Complements

James-Stein Theory


Theoretical Properties

When Might Shrunken Estimators Be Helpful?

Ridge Action Increases Eigenvalues

Computational Complements

Code for ridgelm()

Further Exploration: Data, Code and Math Problems

Variable Selection and Dimension Reduction

A Closer Look at Under/Overfitting

A Simple Guiding Example

How Many Is Too Many?

Fit Criteria

Some Common Measures

No Panacea!

Variable Selection Methods

Simple Use of p-Values: Pitfalls

Asking "What If" Questions

Stepwise Selection

Basic Notion

Forward vs Backward Selection

R Functions for Stepwise Regression

Example: Bodyfat Data

Classification Settings

Example: Bank Marketing Data

Example: Vertebrae Data

Nonparametric Settings

Is Dimension Reduction Important in the Nonparametric Setting?


Why the LASSO Often Performs Subsetting

Example: Bodyfat Data

Post-Selection Inference

Direct Methods for Dimension Reduction

Informal Nature

Role in Regression Analysis



Example: Bodyfat Data

Example: Instructor Evaluations

Nonnegative Matrix Factorization (NMF)



Sum-of-Parts Property

Example: Spam Detection

Use of freqparcoord for Dimension Reduction

Example: Student Evaluations of Instructors

Dimension Reduction for Dummy/R Factor Variables


The Verdict

Further Reading

Computational Complements

Computation for NMF

Mathematical Complements

MSEs for the Simple Example

Further Exploration: Data, Code and Math Problems

Partition-Based Methods


Example: Vertebral Column Data

Technical Details

Statistical Consistency

Tuning Parameters

Random Forests


Example: Vertebrae Data

Example: Letter Recognition

Other Implementations of CART

Further Exploration: Data, Code and Math Problems

Semi-Linear Methods

k-NN with Linear Smoothing

Extrapolation Via lm()

Multicollinearity Issues

Example: Bodyfat Data

Tuning Parameter

Linear Approximation of Class Boundaries


Geometric Motivation

Reduced Convex Hulls

Tuning Parameter

Nonlinear Boundaries

Statistical Consistency

Example: Letter Recognition Data

Neural Networks

Example: Vertebrae Data

Tuning Parameters and Other Technical Details

Dimension Reduction

Statistical Consistency

The Verdict

Mathematical Complements

Edge Bias with k-NN and Kernel Methods

Dual Formulation for SVM

The Kernel Trick

Further Reading

Further Exploration: Data, Code and Math Problems

Regression and Classification in Big Data

Solving the Big-n Problem

Software Alchemy

Example: Flight Delay Data

More on the Insufficient Memory Issue

Deceivingly Big-n

The Independence Assumption in Big-n Data

Addressing Big-p

How Many Is Too Many?

Toy Model

Results from the Research Literature

A Much Simpler and More Direct Approach

Nonparametric Case

The Curse of Dimensionality

Example: Currency Data

Example: Quiz Documents

The Verdict

Mathematical Complements

Speedup from Software Alchemy

Computational Complements

The partools Package

Use of the tm Package

Further Exploration: Data, Code and Math Problems

About the Series

Chapman & Hall/CRC Texts in Statistical Science

Subject Categories

BISAC Subject Codes/Headings:
COMPUTERS / Machine Theory
MATHEMATICS / Probability & Statistics / General