Hands-On Machine Learning with R

1st Edition

By Brad Boehmke, Brandon M. Greenwell

Chapman and Hall/CRC

457 pages

Purchasing Options ($ = USD)

Hardback: ISBN 9781138495685
Publication date: 2019-11-21
Available for pre-order
List price: $99.95; pre-order price: $79.96 (save ~$19.99)

Description

Hands-On Machine Learning with R provides a practical, applied approach to learning and developing intuition for today’s most popular machine learning methods. The book serves as a practitioner’s guide to the machine learning process, helping the reader apply the machine learning stack within R, using packages such as glmnet, h2o, ranger, xgboost, and keras to effectively model and gain insight from data. It favors a hands-on approach, building an intuitive understanding of machine learning concepts through concrete examples and just a little bit of theory.

Throughout the book, the reader works through the entire machine learning process, including feature engineering, resampling, hyperparameter tuning, model evaluation, and interpretation, and is introduced to powerful algorithms such as regularized regression, random forests, gradient boosting machines, deep learning, generalized low rank models, and more. By favoring a hands-on approach with real-world data, the reader gains an intuitive understanding of the architectures and engines that drive these algorithms and packages, learns when and how to tune the various hyperparameters, and learns to interpret model results. By the end of the book, the reader should have a firm grasp of R’s machine learning stack and be able to implement a systematic approach for producing high-quality modeling results.
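
To give a feel for the style of workflow the book develops, here is a minimal sketch (an illustrative example, not code taken from the book) that uses the AmesHousing, rsample, and ranger packages to create a stratified train/test split, fit a default random forest, and compute a test-set RMSE:

    # Illustrative sketch: stratified splitting, model fitting, and evaluation
    library(AmesHousing)  # Ames, Iowa housing data
    library(rsample)      # data splitting
    library(ranger)       # fast random forest implementation

    ames <- make_ames()

    # Stratified 70/30 train/test split on the response
    set.seed(123)
    ames_split <- initial_split(ames, prop = 0.7, strata = "Sale_Price")
    ames_train <- training(ames_split)
    ames_test  <- testing(ames_split)

    # Fit a random forest with default hyperparameters
    fit <- ranger(Sale_Price ~ ., data = ames_train, seed = 123)

    # Assess generalization error with test-set RMSE
    pred <- predict(fit, data = ames_test)$predictions
    sqrt(mean((pred - ames_test$Sale_Price)^2))

The same pattern, splitting the data, resampling and tuning on the training set, then evaluating on held-out data, recurs throughout the book across different algorithms and packages.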

Features:

  • Offers a practical and applied introduction to the most popular machine learning methods.
  • Takes readers through the entire modeling process, from data prep to hyperparameter tuning, model evaluation, and interpretation.
  • Introduces readers to a wide variety of packages that make up R’s machine learning stack.
  • Uses a hands-on approach and real-world data.

Table of Contents

FUNDAMENTALS

Introduction to Machine Learning

Supervised learning

Regression problems

Classification problems

Unsupervised learning

Roadmap

The data sets

Modeling Process

Prerequisites

Data splitting

Simple random sampling

Stratified sampling

Class imbalances

Creating models in R

Many formula interfaces

Many engines

Resampling methods

k-fold cross validation

Bootstrapping

Alternatives

Bias variance trade-off

Bias

Variance

Hyperparameter tuning

Model evaluation

Regression models

Classification models

Putting the processes together

Feature & Target Engineering

Prerequisites

Target engineering

Dealing with missingness

Visualizing missing values

Imputation

Feature filtering

Numeric feature engineering

Skewness

Standardization

Categorical feature engineering

Lumping

One-hot & dummy encoding

Label encoding

Alternatives

Dimension reduction

Proper implementation

Sequential steps

Data leakage

Putting the process together

SUPERVISED LEARNING

Linear Regression

Prerequisites

Simple linear regression

Estimation

Inference

Multiple linear regression

Assessing model accuracy

Model concerns

Principal component regression

Partial least squares

Feature interpretation

Final thoughts

Logistic Regression

Prerequisites

Why logistic regression

Simple logistic regression

Multiple logistic regression

Assessing model accuracy

Model concerns

Feature interpretation

Final thoughts

Regularized Regression

Prerequisites

Why regularize?

Ridge penalty

Lasso penalty

Elastic nets

Implementation

Tuning

Feature interpretation

Attrition data

Final thoughts

Multivariate Adaptive Regression Splines

Prerequisites

The basic idea

Multivariate adaptive regression splines

Fitting a basic MARS model

Tuning

Feature interpretation

Attrition data

Final thoughts

K-Nearest Neighbors

Prerequisites

Measuring similarity

Distance measures

Pre-processing

Choosing k

MNIST example

Final thoughts

Decision Trees

Prerequisites

Structure

Partitioning

How deep?

Early stopping

Pruning

Ames housing example

Feature interpretation

Final thoughts

Bagging

Prerequisites

Why and when bagging works

Implementation

Easily parallelize

Feature interpretation

Final thoughts

Random Forests

Prerequisites

Extending bagging

Out-of-the-box performance

Hyperparameters

Number of trees

mtry

Tree complexity

Sampling scheme

Split rule

Tuning strategies

Feature interpretation

Final thoughts

Gradient Boosting

Prerequisites

How boosting works

A sequential ensemble approach

Gradient descent

Basic GBM

Hyperparameters

Implementation

General tuning strategy

Stochastic GBMs

Stochastic hyperparameters

Implementation

XGBoost

XGBoost hyperparameters

Tuning strategy

Feature interpretation

Final thoughts

Deep Learning

Prerequisites

Why deep learning

Feedforward DNNs

Network architecture

Layers and nodes

Activation

Backpropagation

Model training

Model tuning

Model capacity

Batch normalization

Regularization

Adjust learning rate

Grid Search

Final thoughts

Support Vector Machines

Prerequisites

Optimal separating hyperplanes

The hard margin classifier

The soft margin classifier

The support vector machine

More than two classes

Support vector regression

Job attrition example

Class weights

Class probabilities

Feature interpretation

Final thoughts

Stacked Models

Prerequisites

The idea

Common ensemble methods

Super learner algorithm

Available packages

Stacking existing models

Stacking a grid search

Automated machine learning

Final thoughts

Interpretable Machine Learning

Prerequisites

The idea

Global interpretation

Local interpretation

Model-specific vs. model-agnostic

Permutation-based feature importance

Concept

Implementation

Partial dependence

Concept

Implementation

Alternative uses

Individual conditional expectation

Concept

Implementation

Feature interactions

Concept

Implementation

Alternatives

Local interpretable model-agnostic explanations

Concept

Implementation

Tuning

Alternative uses

Shapley values

Concept

Implementation

XGBoost and built-in Shapley values

Localized step-wise procedure

Concept

Implementation

Final thoughts

DIMENSION REDUCTION

Principal Components Analysis

Prerequisites

The idea

Finding principal components

Performing PCA in R

Selecting the number of principal components

Eigenvalue criterion

Proportion of variance explained criterion

Scree plot criterion

Final thoughts

Generalized Low Rank Models

Prerequisites

The idea

Finding the lower ranks

Alternating minimization

Loss functions

Regularization

Selecting k

Fitting GLRMs in R

Basic GLRM model

Tuning to optimize for unseen data

Final thoughts

Autoencoders

Prerequisites

Undercomplete autoencoders

Comparing PCA to an autoencoder

Stacked autoencoders

Visualizing the reconstruction

Sparse autoencoders

Denoising autoencoders

Anomaly detection

Final thoughts

CLUSTERING

K-means Clustering

Prerequisites

Distance measures

Defining clusters

k-means algorithm

Clustering digits

How many clusters?

Clustering with mixed data

Alternative partitioning methods

Final thoughts

Hierarchical Clustering

Prerequisites

Hierarchical clustering algorithms

Hierarchical clustering in R

Agglomerative hierarchical clustering

Divisive hierarchical clustering

Determining optimal clusters

Working with dendrograms

Final thoughts

Model-based Clustering

Prerequisites

Measuring probability and uncertainty

Covariance types

Model selection

My basket example

Final thoughts

About the Authors

Brad Boehmke is a data scientist at 84.51° where he wears both software developer and machine learning engineer hats. He is an Adjunct Professor at the University of Cincinnati, author of Data Wrangling with R, and creator of multiple public and private enterprise R packages.

Brandon Greenwell is a data scientist at 84.51° where he works on a diverse team to enable, empower, and encourage others to successfully apply machine learning to solve real business problems. He’s part of the Adjunct Graduate Faculty at Wright State University, an Adjunct Instructor at the University of Cincinnati, and the author of several R packages available on CRAN.

About the Series

Chapman & Hall/CRC The R Series

Subject Categories

BISAC Subject Codes/Headings:
BUS061000: BUSINESS & ECONOMICS / Statistics
COM021030: COMPUTERS / Database Management / Data Mining
MAT029000: MATHEMATICS / Probability & Statistics / General