Chapman and Hall/CRC

490 pages


Statistical Regression and Classification: From Linear Models to Machine Learning takes an innovative look at the traditional statistical regression course, presenting a contemporary treatment in line with today's applications and users. The text takes a modern look at regression:

* A thorough treatment of classical linear and generalized linear models, supplemented with introductory material on machine learning methods.

* Since classification is the focus of many contemporary applications, the book covers this topic in detail, especially the multiclass case.

* In view of the voluminous nature of many modern datasets, there is a chapter on Big Data.

* Special Mathematical and Computational Complements sections appear at the ends of chapters, and exercises are partitioned into Data, Math and Complements problems.

* Instructors can tailor coverage for specific audiences such as majors in Statistics, Computer Science, or Economics.

* More than 75 examples using real data.

The book treats classical regression methods in an innovative, contemporary manner. Though some statistical learning methods are introduced, the primary methodology used is linear and generalized linear parametric models, covering both the Description and Prediction goals of regression methods. The author is just as interested in Description applications of regression, such as measuring the gender wage gap in Silicon Valley, as in forecasting tomorrow's demand for bike rentals. An entire chapter is devoted to measuring such effects, including discussion of Simpson's Paradox, multiple inference, and causation issues. Similarly, there is an entire chapter on parametric model fit, making use of both residual analysis and assessment via nonparametric analysis.

**Norman Matloff** is a professor of computer science at the University of California, Davis, and was a founder of the Statistics Department at that institution. His current research focus is on recommender systems, and applications of regression methods to small area estimation and bias reduction in observational studies. He is on the editorial boards of the *Journal of Statistical Computation* and the *R Journal*. An award-winning teacher, he is the author of *The Art of R Programming* and *Parallel Computation in Data Science: With Examples in R, C++ and CUDA*.

" . . . Matloff delivers a well-balanced book for advanced beginners. Besides the mathematical formulas, he also presents many chunks of R code, and if the reader is able to read R code, the formulas and calculations become clearer. Due to the computational R code, the well-written Appendix, and an overall clear English, the book will help students and autodidacts. Matloff has written a textbook of the best kind for such a broad topic." *~ Jochen Kruppa, Biometric Journal*

". . . the book is well suitable for a wide audience: For practitioners interested in applying the methodology, for students in statistics as well as economics/social sciences and computer science. Even in more mathematically oriented classes it can be used as a complimentary text to the usual theoretic textbooks deepening students ability to interpret and question statistical results." *~ Claudia Kirch, Magdeburg*

"This is an application-oriented book introducing frequently used classification and regression methods and the principles behind them. This book tries to keep a balance between theory and practice. It not only elaborates the theories of statistical regression and classification, but also provides large amount of real world examples and R codes to help the reader practice what they learned. As stated in the preface, the targeted readers are data analysts and college students. The style of the book fits well to the anticipated audience." *~ Quanquan Gu, University of Virginia*

***Statistical Regression and Classification: From Linear Models to Machine Learning*** **was awarded the 2017 Ziegel Award for the best book reviewed in Technometrics in 2017.**

**Setting the Stage**

*Example: Predicting Bike-Sharing Activity*

*Example of the Prediction Goal: Body Fat*

*Example of the Description Goal: Who Clicks Web Ads?*

*Optimal Prediction*

*A Note About E(), Samples and Populations*

*Example: Do Baseball Players Gain Weight As They Age?*

Prediction vs Description

A First Estimator

A Possibly Better Estimator, Using a Linear Model

*Parametric vs Nonparametric Models*

*Example: Click-Through Rate*

*Several Predictor Variables*

Multipredictor Linear Models

Estimation of Coefficients

The Description Goal

Nonparametric Regression Estimation: k-NN

Looking at Nearby Points

Measures of Nearness

The k-NN Method, and Tuning Parameters

Nearest-Neighbor Analysis in the regtools Package

Example: Baseball Player Data

*After Fitting a Model, How Do We Use It for Prediction?*

Parametric Settings

Nonparametric Settings

The Generic predict() Function

*Overfitting, and the Variance-Bias Tradeoff*

Intuition

Example: Student Evaluations of Instructors

*Cross-Validation*

Linear Model Case

The Code

Applying the Code

k-NN Case

Choosing the Partition Sizes

*Important Note on Tuning Parameters*

*Rough Rule of Thumb*

*Example: Bike-Sharing Data*

Linear Modeling of μ(t)

Nonparametric Analysis

*Interaction Terms, Including Quadratics*

Example: Salaries of Female Programmers and Engineers

*Saving Your Work*

Higher-Order Polynomial Models

*Classification Techniques*

It's a Regression Problem!

Example: Bike-Sharing Data

*Crucial Advice: Don't Automate, Participate!*

*Mathematical Complements*

Indicator Random Variables

Mean Squared Error of an Estimator

μ(t) Minimizes Mean Squared Prediction Error

μ(t) Minimizes the Misclassification Rate

Kernel-Based Nonparametric Estimation of Regression Functions

General Nonparametric Regression

Some Properties of Conditional Expectation

Conditional Expectation As a Random Variable

The Law of Total Expectation

Law of Total Variance

Tower Property

Geometric View

*Computational Complements*

CRAN Packages

The Function tapply() and Its Cousins

The Innards of the k-NN Code

Function Dispatch

*Centering and Scaling*

*Further Exploration: Data, Code and Math Problems*

**Linear Regression Models**

*Notation*

*The "Error Term"*

*Random- vs Fixed-X Cases*

*Least-Squares Estimation*

Motivation

Matrix Formulations

() in Matrix Terms

Using Matrix Operations to Minimize ()

Models Without an Intercept Term

*A Closer Look at lm() Output*

Statistical Inference

*Assumptions*

Classical

Motivation: the Multivariate Normal Distribution Family

*Unbiasedness and Consistency*

β̂ Is Unbiased

Bias As an Issue/Nonissue

β̂ Is Statistically Consistent

*Inference under Homoscedasticity*

Review: Classical Inference on a Single Mean

Back to Reality

The Concept of a Standard Error

Extension to the Regression Case

Example: Bike-Sharing Data

*Collective Predictive Strength of the X(j)*

Basic Properties

Definition of R²

Bias Issues

Adjusted-R²

The "Leaving-One-Out Method"

Extensions of LOOM

LOOM for k-NN

Other Measures

*The Practical Value of p-Values: Small OR Large*

Misleadingly Small p-Values

Example: Forest Cover Data

Example: Click Through Data

Misleadingly LARGE p-Values

The Verdict

*Missing Values*

*Mathematical Complements*

Covariance Matrices

The Multivariate Normal Distribution Family

The Central Limit Theorem

Details on Models Without a Constant Term

Unbiasedness of the Least-Squares Estimator

Consistency of the Least-Squares Estimator

Biased Nature of S

The Geometry of Conditional Expectation

Random Variables As Inner Product Spaces

Projections

Conditional Expectations As Projections

Predicted Values and Error Terms Are Uncorrelated

Classical "Exact" Inference

Asymptotic (p + 1)-Variate Normality of β̂

*Computational Complements*

Details of the Computation of ()

R Functions Relating to the Multivariate Normal Distribution Family

Example: Simulation Computation of a Bivariate Normal Quantity

More Details of 'lm' Objects

**Homoscedasticity and Other Assumptions in Practice**

*Normality Assumption*

*Independence Assumption: Don't Overlook It*

Estimation of a Single Mean

Inference on Linear Regression Coefficients

What Can Be Done?

Example: MovieLens Data

*Dropping the Homoscedasticity Assumption*

Robustness of the Homoscedasticity Assumption

Weighted Least Squares

A Procedure for Valid Inference

The Methodology

Example: Female Wages

Simulation Test

Variance-Stabilizing Transformations

The Verdict

*Further Reading*

*Computational Complements*

The R merge() Function

*Mathematical Complements*

The Delta Method

Distortion Due to Transformation

*Further Exploration: Data, Code and Math Problems*

**Generalized Linear and Nonlinear Models**

*Example: Enzyme Kinetics Model*

*The Generalized Linear Model (GLM)*

Definition

Poisson Regression

Exponential Families

GLM Computation

R's glm() Function

*GLM: the Logistic Model*

Motivation

Example: Pima Diabetes Data

Interpretation of Coefficients

The predict() Function Again

Overall Prediction Accuracy

Example: Predicting Spam E-mail

Linear Boundary

*GLM: the Poisson Regression Model*

*Least-Squares Computation for Nonlinear Models*

The Gauss-Newton Method

Eicker-White Asymptotic Standard Errors

Example: Bike Sharing Data

The "Elephant in the Room": Convergence Issues

*Further Reading*

*Computational Complements*

R Factors

*Mathematical Complements*

Maximum Likelihood Estimation

*Further Exploration: Data, Code and Math Problems*

**Multiclass Classification Problems**

*Key Notation*

*Key Equations*

*Estimating the Functions μᵢ(t)*

*How Do We Use Models for Prediction?*

*One vs All or All vs All?*

Which Is Better?

Example: Vertebrae Data

Intuition

Example: Letter Recognition Data

Example: k-NN on the Letter Recognition Data

The Verdict

*The Classical Approach: Fisher Linear Discriminant Analysis*

Background

Derivation

Example: Vertebrae Data

LDA Code and Results

*Multinomial Logistic Model*

Model

Software

Example: Vertebrae Data

*The Issue of "Unbalanced" (and Balanced) Data*

Why the Concern Regarding Balance?

A Crucial Sampling Issue

It All Depends on How We Sample

Remedies

Example: Letter Recognition

*Going Beyond Using the 0.5 Threshold*

Unequal Misclassification Costs

Revisiting the Problem of Unbalanced Data

The Confusion Matrix and the ROC Curve

Code

Example: Spam Data

*Mathematical Complements*

Classification via Density Estimation

Methods for Density Estimation

Time Complexity Comparison, OVA vs AVA

Optimal Classification Rule for Unequal Error Costs

*Computational Complements*

R Code for OVA and AVA Logit Analysis

ROC Code

*Further Exploration: Data, Code and Math Problems*

**Model Fit: Assessment and Improvement**

*Aims of This Chapter*

*Methods*

*Notation*

*Goals of Model Fit-Checking*

Prediction Context

Description Context

Center vs Fringes of the Data Set

*Example: Currency Data*

*Overall Measures of Model Fit*

R-Squared, Revisited

Cross-Validation, Revisited

Plotting Parametric Fit Against Nonparametric One

Residuals vs Smoothing

*Diagnostics Related to Individual Predictors*

Partial Residual Plots

Plotting Nonparametric Fit Against Each Predictor

The freqparcoord Package

Parallel Coordinates

The regdiag() Function

*Effects of Unusual Observations on Model Fit*

The influence() Function

Example: Currency Data

Use of freqparcoord for Outlier Detection

*Automated Outlier Resistance*

Median Regression

Example: Currency Data

*Example: Vocabulary Acquisition*

*Classification Settings*

Example: Pima Diabetes Study

*Improving Fit*

Deleting Terms from the Model

Adding Polynomial Terms

Example: Currency Data

Example: Programmer/Engineer Census Data

Boosting

View from the 30,000 Foot Level

Performance

*A Tool to Aid Model Selection*

*Special Note on the Description Goal*

*Computational Complements*

Data Wrangling for the Word Bank Dataset

*Mathematical Complements*

The Hat Matrix

Matrix Inverse Update

The Median Minimizes Mean Absolute Deviation

*Further Exploration: Data, Code and Math Problems*

**Disaggregating Regressor Effects**

*A Small Analytical Example*

*Example: Baseball Player Data*

*Simpson's Paradox*

Example: UCB Admissions Data (Logit)

The Verdict

*Unobserved Predictor Variables*

Instrumental Variables (IVs)

The IV Method

Two-Stage Least Squares

Example: Years of Schooling

Multiple Predictors

The Verdict

Random Effects Models

Example: Movie Ratings Data, Random Effects

Multiple Random Effects

Why Use Random/Mixed Effects Models?

*Regression Function Averaging*

Estimating the Counterfactual

Example: Job Training

Small Area Estimation: "Borrowing from Neighbors"

The Verdict

*Multiple Inference*

The Frequent Occurrence of Extreme Events

Relation to Statistical Inference

The Bonferroni Inequality

Scheffé's Method

Example: MovieLens Data

The Verdict

*Computational Complements*

MovieLens Data Wrangling

More Data Wrangling in the MovieLens Example

*Mathematical Complements*

Iterated Projections

Standard Errors for RFA

Asymptotic Chi-Square Distributions

*Further Exploration: Data, Code and Math Problems*

**Shrinkage Estimators**

*Relevance of James-Stein to Regression Estimation*

*Multicollinearity*

What's All the Fuss About?

A Simple Guiding Model

"Wrong" Signs in Estimated Coefficients

Checking for Multicollinearity

The Variance Inflation Factor

Example: Currency Data

What Can/Should One Do?

Do Nothing

Eliminate Some Predictors

Employ a Shrinkage Method

*Ridge Regression*

Alternate Definitions

Yes, It Is Smaller

Choosing the Value of λ

Example: Currency Data

*The LASSO*

Definition

The lars Package

Example: Currency Data

The Elastic Net

*Cases of Exact Multicollinearity, Including p > n*

Why It May Work

Example: R mtcars Data

Additional Motivation for the Elastic Net

*Bias, Standard Errors and Significance Tests*

*Generalized Linear Models*

Example: Vertebrae Data

*Other Terminology*

*Further Reading*

*Mathematical Complements*

James-Stein Theory

Definition

Theoretical Properties

When Might Shrunken Estimators Be Helpful?

Ridge Action Increases Eigenvalues

*Computational Complements*

Code for ridgelm()

*Further Exploration: Data, Code and Math Problems*

**Variable Selection and Dimension Reduction**

*A Closer Look at Under/Overfitting*

A Simple Guiding Example

*How Many Is Too Many?*

*Fit Criteria*

Some Common Measures

No Panacea!

*Variable Selection Methods*

*Simple Use of p-Values: Pitfalls*

*Asking "What If" Questions*

*Stepwise Selection*

Basic Notion

Forward vs Backward Selection

R Functions for Stepwise Regression

Example: Bodyfat Data

Classification Settings

Example: Bank Marketing Data

Example: Vertebrae Data

Nonparametric Settings

Is Dimension Reduction Important in the Nonparametric Setting?

The LASSO

Why the LASSO Often Performs Subsetting

Example: Bodyfat Data

*Post-Selection Inference*

*Direct Methods for Dimension Reduction*

Informal Nature

Role in Regression Analysis

PCA

Issues

Example: Bodyfat Data

Example: Instructor Evaluations

Nonnegative Matrix Factorization (NMF)

Overview

Interpretation

Sum-of-Parts Property

Example: Spam Detection

Use of freqparcoord for Dimension Reduction

Example: Student Evaluations of Instructors

Dimension Reduction for Dummy/R Factor Variables

*The Verdict*

*Further Reading*

*Computational Complements*

Computation for NMF

*Mathematical Complements*

MSEs for the Simple Example

*Further Exploration: Data, Code and Math Problems*

**Partition-Based Methods**

*CART*

*Example: Vertebral Column Data*

*Technical Details*

*Statistical Consistency*

*Tuning Parameters*

*Random Forests*

Bagging

Example: Vertebrae Data

Example: Letter Recognition

*Other Implementations of CART*

*Further Exploration: Data, Code and Math Problems*

**Semi-Linear Methods**

*k-NN with Linear Smoothing*

Extrapolation Via lm()

Multicollinearity Issues

Example: Bodyfat Data

Tuning Parameter

*Linear Approximation of Class Boundaries*

SVMs

Geometric Motivation

Reduced Convex Hulls

Tuning Parameter

Nonlinear Boundaries

Statistical Consistency

Example: Letter Recognition Data

Neural Networks

Example: Vertebrae Data

Tuning Parameters and Other Technical Details

Dimension Reduction

Statistical Consistency

*The Verdict*

*Mathematical Complements*

Edge Bias with k-NN and Kernel Methods

Dual Formulation for SVM

The Kernel Trick

*Further Reading*

*Further Exploration: Data, Code and Math Problems*

**Regression and Classification in Big Data**

*Solving the Big-n Problem*

Software Alchemy

Example: Flight Delay Data

More on the Insufficient Memory Issue

Deceivingly Big-n

The Independence Assumption in Big-n Data

*Addressing Big-p*

How Many Is Too Many?

Toy Model

Results from the Research Literature

A Much Simpler and More Direct Approach

Nonparametric Case

The Curse of Dimensionality

Example: Currency Data

Example: Quiz Documents

The Verdict

*Mathematical Complements*

Speedup from Software Alchemy

*Computational Complements*

The partools Package

Use of the tm Package

*Further Exploration: Data, Code and Math Problems*

- BUS061000: BUSINESS & ECONOMICS / Statistics
- COM037000: COMPUTERS / Machine Theory
- MAT029000: MATHEMATICS / Probability & Statistics / General