1st Edition

Medical Risk Prediction Models: With Ties to Machine Learning

By Thomas A. Gerds, Michael W. Kattan
Copyright 2022
312 Pages
by Chapman & Hall

Medical Risk Prediction Models: With Ties to Machine Learning is a hands-on book for clinicians, epidemiologists, and professional statisticians who need to make or evaluate a statistical prediction model based on data. The subject of the book is the patient’s individualized probability of a medical event within a given time horizon. Gerds and Kattan describe the mathematical details of making...

  • Software

Why should I care about statistical prediction models?

The many uses of prediction models in medicine

The unique messages of this book

Prognostic factor modeling philosophy

The rest of this book

  • I am going to make a prediction model. What do I need to know?

Prediction model framework

Target population

The time origin

The event of interest

The prediction time horizon and follow-up

Landmarking

Risks and risk predictions

Classification of risk

Predictor variables

Checklist

Prediction performance

Proper scoring rules

Calibration

Discrimination

Explained variation

Variability and uncertainty

The interpretation is relative

Utility

Average versus subgroups

Study design

Study design and sources of information

Cohort

Multi-center study

Randomized clinical trial

Case-control

Given treatment and treatment options

Sample size calculation

Data

Purpose dataset

Data dictionary

Measurement error

Missing values

Censored data

Competing risks

Modeling

Risk prediction model

Risk classifier

How is prediction modeling different from statistical inference?

  • Regression model

Linear predictor

Expert selects the candidate predictors

How to select variables for inclusion in the final model

All possible interactions

Checklist

Machine learning

Validation

The conventional model

Internal and external validation

Conditional versus expected performance

Cross-validation

Data splitting

Bootstrap

Model checking and goodness of fit

Reproducibility

Pitfalls

Age as time scale

Odds ratios and hazard ratios are not predictions of risks

Do not blame the metric

Censored data versus competing risks

Disease-specific survival

Overfitting

Data-dependent decisions

Balancing data

Independent predictor

Automated variable selection

  • How should I prepare for modeling?

Definition of subjects

Choice of time scale

Pre-selection of predictor variables

Preparation of predictor variables

Categorical variables

Continuous variables

Derived predictor variables

Repeated measurements

Measurement error

Missing values

Preparation of event time outcome

Illustration without competing risks

Illustration with competing risks

Artificial censoring at the prediction time horizon

  • I am ready to build a prediction model

Specifying the model type

Uncensored binary outcome

Right-censored time-to-event outcome (no competing risks)

Right-censored time-to-event outcome with competing risks

Benchmark model

Uncensored binary outcome

Right-censored time-to-event outcome (without competing risks)

Right-censored time-to-event with competing risks

Including predictor variables

Categorical predictor variables

Continuous predictor variables

Interaction effects

Modeling strategy

Variable selection

Conventional model strategy

Whether to use a standard regression model or something else

Advanced topics

How to prevent overfitting the data

How to deal with missing values

How to deal with non-converging models

What you should put in your manuscript

Baseline tables

Follow-up tables

Regression tables

Risk plots

Nomograms

Deployment

Risk charts

Internet calculator

Cost-benefit analysis (waiting lists)

  • Does my model predict accurately?

Model assessment roadmap

Visualization of the predictions

Calculation of model performance

Visualization of model performance

Uncensored binary outcome

Distribution of the predicted risks

Brier score

AUC

Calibration curves

Right-censored time-to-event outcome (without competing risks)

Distribution of the predicted risks

Brier score with censored data

Time-dependent AUC for censored data

Calibration curve for censored data

Competing risks

Distribution of the predicted risks

Brier score with competing risks

Time-dependent AUC for competing risks

Calibration curve for competing risks

The Index of Prediction Accuracy (IPA)

Choice of prediction time horizon

Time-dependent prediction performance

  • How do I decide between rival models?

Model comparison roadmap

Analysis of rival prediction models

Uncensored binary outcome

Right-censored time-to-event outcome (without competing risks)

Competing risks

Clinically relevant change of prediction

Does a new marker improve prediction?

Many new predictors

Updating a subject's prediction

What would make me an expert?

Multiple cohorts / Multi-center studies

The role of treatment for making a prediction model

Modeling treatment

Comparative effectiveness tables

Learning curve paradigm

Internal validation (data splitting)

Single split

Calendar split

Multiple splits (cross-validation)

Dilemma of internal validation

The apparent and the .632+ estimator

Tips and tricks

Missing values

Missing values in the learning data

Missing values in the validation data

Time-varying coefficient models

Time-varying predictor variables

  • Can't the computer just take care of all of this?

Zero layers of cross-validation

What may happen if you do not look at the data

Unsupervised modeling steps

Final model

One layer of cross-validation

Penalized regression

Supervised spline selection

Machine learning (two levels of cross-validation)

Random forest

Deep learning and artificial neural networks

The super learner

  • Things you might have expected in our book

Threshold selection for decision making

Number of events per variable

Confidence intervals for predicted probabilities

Models developed from case-control data

Hosmer-Lemeshow test

Backward elimination and stepwise selection

Rank correlation (c-index) for survival outcome

Integrated Brier score

Net reclassification index and the integrated discrimination improvement

Re-classification tables

Boxplots of rival models conditional on the outcome

Biography

Thomas A. Gerds is a professor in the biostatistics unit at the University of Copenhagen. He is affiliated with the Danish Heart Foundation. He is the author of several R packages on CRAN and has taught statistics courses to non-statisticians for many years.

Michael Kattan is a highly cited author and Chair of the Department of Quantitative Health Sciences at Cleveland Clinic. He is a Fellow of the American Statistical Association and has received two awards from the Society for Medical Decision Making: the Eugene L. Saenger Award for Distinguished Service, and the John M. Eisenberg Award for Practical Application of Medical Decision Making Research.

"Two of the top researchers in the field of clinical prediction models have produced a highly innovative book that brings a very technical topic to public grasp by throwing out the formulas and just talking straight from the heart of practical experience. While clinicians and medical residents can now learn how to build, diagnose and validate risk models themselves, all public health researchers, old and new, will reap the benefits and enjoyment from reading this book."
~Donna Ankerst, Technical University of Munich