484 Pages
    Published by Chapman & Hall

    Hands-On Machine Learning with R provides a practical and applied approach to learning and developing an intuition for today’s most popular machine learning methods. It serves as a practitioner’s guide to the machine learning process and is meant to help the reader apply the machine learning stack within R, using packages such as glmnet, h2o, ranger, xgboost, keras, and others to effectively model and gain insight from data. The book favors a hands-on approach, building an intuitive understanding of machine learning concepts through concrete examples and just a little bit of theory.

    Throughout the book, the reader is exposed to the entire machine learning process, including feature engineering, resampling, hyperparameter tuning, model evaluation, and interpretation, as well as to powerful algorithms such as regularized regression, random forests, gradient boosting machines, deep learning, generalized low rank models, and more. By favoring a hands-on approach and using real-world data, the reader will gain an intuitive understanding of the architectures and engines that drive these algorithms and packages, understand when and how to tune the various hyperparameters, and be able to interpret model results. By the end of this book, the reader should have a firm grasp of R’s machine learning stack and be able to implement a systematic approach for producing high-quality modeling results.
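    To give a flavor of that workflow, here is a minimal illustrative sketch (not taken from the book) that splits a built-in data set, fits a random forest with the ranger package mentioned above, and checks hold-out accuracy; the data set, split ratio, and settings are assumptions chosen only for illustration.

        library(ranger)  # one of the modeling engines listed above

        set.seed(123)

        # simple random train/test split of the built-in iris data
        train_idx <- sample(nrow(iris), size = floor(0.7 * nrow(iris)))
        train     <- iris[train_idx, ]
        test      <- iris[-train_idx, ]

        # fit a random forest classifier and evaluate it on the hold-out set
        fit  <- ranger(Species ~ ., data = train, num.trees = 500)
        pred <- predict(fit, data = test)$predictions
        mean(pred == test$Species)  # hold-out accuracy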

    Features:

    ·   Offers a practical and applied introduction to the most popular machine learning methods.

    ·   Topics covered include feature engineering, resampling, deep learning, and more.

    ·   Uses a hands-on approach and real-world data.

    Table of Contents

    FUNDAMENTALS

    Introduction to Machine Learning

    Supervised learning

    Regression problems

    Classification problems

    Unsupervised learning

    Roadmap

    The data sets

    Modeling Process

    Prerequisites

    Data splitting

    Simple random sampling

    Stratified sampling

    Class imbalances

    Creating models in R

    Many formula interfaces

    Many engines

    Resampling methods

    k-fold cross validation

    Bootstrapping

    Alternatives

    Bias variance trade-off

    Bias

    Variance

    Hyperparameter tuning

    Model evaluation

    Regression models

    Classification models

    Putting the processes together

    Feature & Target Engineering

    Prerequisites

    Target engineering

    Dealing with missingness

    Visualizing missing values

    Imputation

    Feature filtering

    Numeric feature engineering

    Skewness

    Standardization

    Categorical feature engineering

    Lumping

    One-hot & dummy encoding

    Label encoding

    Alternatives

    Dimension reduction

    Proper implementation

    Sequential steps

    Data leakage

    Putting the process together

    SUPERVISED LEARNING

    Linear Regression

    Prerequisites

    Simple linear regression

    Estimation

    Inference

    Multiple linear regression

    Assessing model accuracy

    Model concerns

    Principal component regression

    Partial least squares

    Feature interpretation

    Final thoughts

    Logistic Regression

    Prerequisites

    Why logistic regression

    Simple logistic regression

    Multiple logistic regression

    Assessing model accuracy

    Model concerns

    Feature interpretation

    Final thoughts

    Regularized Regression

    Prerequisites

    Why regularize?

    Ridge penalty

    Lasso penalty

    Elastic nets

    Implementation

    Tuning

    Feature interpretation

    Attrition data

    Final thoughts

    Multivariate Adaptive Regression Splines

    Prerequisites

    The basic idea

    Multivariate regression splines

    Fitting a basic MARS model

    Tuning

    Feature interpretation

    Attrition data

    Final thoughts

    K-Nearest Neighbors

    Prerequisites

    Measuring similarity

    Distance measures

    Pre-processing

    Choosing k

    MNIST example

    Final thoughts

    Decision Trees

    Prerequisites

    Structure

    Partitioning

    How deep?

    Early stopping

    Pruning

    Ames housing example

    Feature interpretation

    Final thoughts

    Bagging

    Prerequisites

    Why and when bagging works

    Implementation

    Easily parallelize

    Feature interpretation

    Final thoughts

    Random Forests

    Prerequisites

    Extending bagging

    Out-of-the-box performance

    Hyperparameters

    Number of trees

    mtry

    Tree complexity

    Sampling scheme

    Split rule

    Tuning strategies

    Feature interpretation

    Final thoughts

    Gradient Boosting

    Prerequisites

    How boosting works

    A sequential ensemble approach

    Gradient descent

    Basic GBM

    Hyperparameters

    Implementation

    General tuning strategy

    Stochastic GBMs

    Stochastic hyperparameters

    Implementation

    XGBoost

    XGBoost hyperparameters

    Tuning strategy

    Feature interpretation

    Final thoughts

    Deep Learning

    Prerequisites

    Why deep learning

    Feedforward DNNs

    Network architecture

    Layers and nodes

    Activation

    Backpropagation

    Model training

    Model tuning

    Model capacity

    Batch normalization

    Regularization

    Adjust learning rate

    Grid search

    Final thoughts

    Support Vector Machines

    Prerequisites

    Optimal separating hyperplanes

    The hard margin classifier

    The soft margin classifier

    The support vector machine

    More than two classes

    Support vector regression

    Job attrition example

    Class weights

    Class probabilities

    Feature interpretation

    Final thoughts

    Stacked Models

    Prerequisites

    The idea

    Common ensemble methods

    Super learner algorithm

    Available packages

    Stacking existing models

    Stacking a grid search

    Automated machine learning

    Final thoughts

    Interpretable Machine Learning

    Prerequisites

    The idea

    Global interpretation

    Local interpretation

    Model-specific vs. model-agnostic

    Permutation-based feature importance

    Concept

    Implementation

    Partial dependence

    Concept

    Implementation

    Alternative uses

    Individual conditional expectation

    Concept

    Implementation

    Feature interactions

    Concept

    Implementation

    Alternatives

    Local interpretable model-agnostic explanations

    Concept

    Implementation

    Tuning

    Alternative uses

    Shapley values

    Concept

    Implementation

    XGBoost and built-in Shapley values

    Localized step-wise procedure

    Concept

    Implementation

    Final thoughts

    DIMENSION REDUCTION

    Principal Components Analysis

    Prerequisites

    The idea

    Finding principal components

    Performing PCA in R

    Selecting the number of principal components

    Eigenvalue criterion

    Proportion of variance explained criterion

    Scree plot criterion

    Final thoughts

    Generalized Low Rank Models

    Prerequisites

    The idea

    Finding the lower ranks

    Alternating minimization

    Loss functions

    Regularization

    Selecting k

    Fitting GLRMs in R

    Basic GLRM model

    Tuning to optimize for unseen data

    Final thoughts

    Autoencoders

    Prerequisites

    Undercomplete autoencoders

    Comparing PCA to an autoencoder

    Stacked autoencoders

    Visualizing the reconstruction

    Sparse autoencoders

    Denoising autoencoders

    Anomaly detection

    Final thoughts

    CLUSTERING

    K-means Clustering

    Prerequisites

    Distance measures

    Defining clusters

    k-means algorithm

    Clustering digits

    How many clusters?

    Clustering with mixed data

    Alternative partitioning methods

    Final thoughts

    Hierarchical Clustering

    Prerequisites

    Hierarchical clustering algorithms

    Hierarchical clustering in R

    Agglomerative hierarchical clustering

    Divisive hierarchical clustering

    Determining optimal clusters

    Working with dendrograms

    Final thoughts

    Model-based Clustering

    Prerequisites

    Measuring probability and uncertainty

    Covariance types

    Model selection

    My basket example

    Final thoughts

    Biographies

    Brad Boehmke is a data scientist at 84.51°, where he wears both software developer and machine learning engineer hats. He is an Adjunct Professor at the University of Cincinnati, the author of Data Wrangling with R, and the creator of multiple public and private enterprise R packages.

    Brandon Greenwell is a data scientist at 84.51°, where he works on a diverse team to enable, empower, and encourage others to successfully apply machine learning to solve real business problems. He is part of the Adjunct Graduate Faculty at Wright State University, an Adjunct Instructor at the University of Cincinnati, and the author of several R packages available on CRAN.

    "Hands-On Machine Learning with R is a great resource for understanding and applying models. Each section provides descriptions and instructions using a wide range of R packages."
    - Max Kuhn, Machine Learning Software Engineer, RStudio

    "You can't find a better overview of practical machine learning methods implemented with R."
    - JD Long, co-author of R Cookbook

    "Simultaneously approachable, accessible, and rigorous, Hands-On Machine Learning with R offers a balance of theory and implementation that can actually bring you from relative novice to competent practitioner."
    - Mara Averick, RStudio Dev Advocate

    "...The book describes in detail the various methods for solving classification and clustering problems. Functions from many R libraries are compared, which enables the reader to understand their respective advantages and disadvantages. The authors have developed a clear structure to the book that includes a brief description of each model, examples of using the model for specific real-life examples, and discussion of the advantages and disadvantages of the model. This structure is one of the book’s main advantages."
    - Igor Malyk, ISCB News, July 2020