1st Edition

Medical Risk Prediction Models
With Ties to Machine Learning

ISBN 9781138384477
December 18, 2020 Forthcoming by Chapman and Hall/CRC
304 Pages

USD $99.95



Book Description

Medical Risk Prediction Models: With Ties to Machine Learning is a hands-on book for clinicians, epidemiologists, and professional statisticians who need to make or evaluate a statistical prediction model based on data. The subject of the book is the patient's individualized probability of a medical event within a given time horizon. Gerds & Kattan describe the mathematical details of making and evaluating a statistical prediction model in a highly pedagogical manner while avoiding mathematical notation. Read this book when you are in doubt about whether a Cox regression model predicts better than a random survival forest.

- All you need to know to correctly make an online risk calculator from scratch
- Discrimination, calibration, predictive performance with censored data and competing risks
- R code and illustrative examples
- Interpretation of prediction performance via benchmarks
- Comparison and combination of rival modeling strategies via cross-validation

Table of Contents

  1. Software
  2. Why should I care about statistical prediction models?

    The many uses of prediction models in medicine

    The unique messages of this book

    Prognostic factor modeling philosophy

    The rest of this book

  3. I am going to make a prediction model. What do I need to know?
  4. Prediction model framework

    Target population

    The time origin

    The event of interest

    The prediction time horizon and follow-up


    Risks and risk predictions

    Classification of risk

    Predictor variables


    Prediction performance

    Proper scoring rules



    Explained variation

    Variability and uncertainty

    The interpretation is relative


    Average versus subgroups

    Study design

    Study design and sources of information


    Multi-center study

    Randomized clinical trial


    Given treatment and treatment options

    Sample size calculation


    Purpose dataset

    Data dictionary

    Measurement error

    Missing values

    Censored data

    Competing risks


    Risk prediction model

    Risk classifier

    How is prediction modeling different from statistical inference?

  5. Regression model
  6. Linear predictor

    Expert selects the candidate predictors

    How to select variables for inclusion in the final model

    All possible interactions


    Machine learning


    The conventional model

    Internal and external validation

    Conditional versus expected performance


    Data splitting


    Model checking and goodness of fit



    Age as time scale

    Odds ratios and hazard ratios are not predictions of risks

    Do not blame the metric

    Censored data versus competing risks

    Disease-specific survival


    Data-dependent decisions

    Balancing data

    Independent predictor

    Automated variable selection

  7. How should I prepare for modeling?
  8. Definition of subjects

    Choice of time scale

    Pre-selection of predictor variables

    Preparation of predictor variables

    Categorical variables

    Continuous variables

    Derived predictor variables

    Repeated measurements

    Measurement error

    Missing values

    Preparation of event time outcome

    Illustration without competing risks

    Illustration with competing risks

    Artificial censoring at the prediction time horizon

  9. I am ready to build a prediction model
  10. Specifying the model type

    Uncensored binary outcome

    Right-censored time-to-event outcome (no competing risks)

    Right-censored time-to-event outcome with competing risks

    Benchmark model

    Uncensored binary outcome

    Right-censored time-to-event outcome (without competing risks)

    Right-censored time-to-event with competing risks

    Including predictor variables

    Categorical predictor variables

    Continuous predictor variables

    Interaction effects

    Modeling strategy

    Variable selection

    Conventional model strategy

    Whether to use a standard regression model or something else

    Advanced topics

    How to prevent overfitting the data

    How to deal with missing values

    How to deal with non-converging models

    What you should put in your manuscript

    Baseline tables

    Follow-up tables

    Regression tables

    Risk plots



    Risk charts

    Internet calculator

    Cost-benefit analysis (waiting lists)

  11. Does my model predict accurately?
  12. Model assessment roadmap

    Visualization of the predictions

    Calculation of model performance

    Visualization of model performance

    Uncensored binary outcome

    Distribution of the predicted risks

    Brier score


    Calibration curves

    Right-censored time-to-event outcome (without competing risks)

    Distribution of the predicted risks

    Brier score with censored data

    Time-dependent AUC for censored data

    Calibration curve for censored data

    Competing risks

    Distribution of the predicted risks

    Brier score with competing risks

    Time-dependent AUC for competing risks

    Calibration curve for competing risks

    The Index of Prediction Accuracy (IPA)

    Choice of prediction time horizon

    Time-dependent prediction performance

  13. How do I decide between rival models?
  14. Model comparison roadmap

    Analysis of rival prediction models

    Uncensored binary outcome

    Right-censored time-to-event outcome (without competing risks)

    Competing risks

    Clinically relevant change of prediction

    Does a new marker improve prediction?

    Many new predictors

    Updating a subject's prediction

    What would make me an expert?

    Multiple cohorts / Multi-center studies

    The role of treatment for making a prediction model

    Modeling treatment

    Comparative effectiveness tables

    Learning curve paradigm

    Internal validation (data splitting)

    Single split

    Calendar split

    Multiple splits (cross-validation)

    Dilemma of internal validation

    The apparent and the + estimator

    Tips and tricks

    Missing values

    Missing values in the learning data

    Missing values in the validation data

    Time-varying coefficient models

    Time-varying predictor variables

  15. Can't the computer just take care of all of this?
  16. Zero layers of cross-validation

    What may happen if you do not look at the data

    Unsupervised modeling steps

    Final model

    One layer of cross-validation

    Penalized regression

    Supervised spline selection

    Machine learning (two levels of cross-validation)

    Random forest

    Deep learning and artificial neural networks

    The super learner

  17. Things you might have expected in our book

    Threshold selection for decision making

    Number of events per variable

    Confidence intervals for predicted probabilities

    Models developed from case-control data

    Hosmer-Lemeshow test

    Backward elimination and stepwise selection

    Rank correlation (c-index) for survival outcome

    Integrated Brier score

    Net reclassification index and the integrated discrimination improvement

    Re-classification tables

    Boxplots of rival models conditional on the outcome




Thomas A. Gerds is a professor in the biostatistics unit at the University of Copenhagen. He is affiliated with the Danish Heart Foundation. He is the author of several R packages on CRAN and has taught statistics courses to non-statisticians for many years.

Michael Kattan is a highly cited author and Chair of the Department of Quantitative Health Sciences at Cleveland Clinic. He is a Fellow of the American Statistical Association and has received two awards from the Society for Medical Decision Making: the Eugene L. Saenger Award for Distinguished Service, and the John M. Eisenberg Award for Practical Application of Medical Decision Making Research.


"Two of the top researchers in the field of clinical prediction models have produced a highly innovative book that brings a very technical topic to public grasp by throwing out the formulas and just talking straight from the heart of practical experience. While clinicians and medical residents can now learn how to build, diagnose and validate risk models themselves, all public health researchers, old and new, will reap the benefits and enjoyment from reading this book."
~Donna Ankerst, Technical University of Munich