1st Edition

Medical Risk Prediction Models: With Ties to Machine Learning

By Thomas A. Gerds, Michael W. Kattan
Copyright 2022
    312 Pages
    by Chapman & Hall

    Medical Risk Prediction Models: With Ties to Machine Learning is a hands-on book for clinicians, epidemiologists, and professional statisticians who need to make or evaluate a statistical prediction model based on data. The subject of the book is the patient’s individualized probability of a medical event within a given time horizon. Gerds and Kattan describe the mathematical details of making and evaluating a statistical prediction model in a highly pedagogical manner while avoiding mathematical notation. Read this book when you are in doubt about whether a Cox regression model predicts better than a random survival forest.

    Features:

    • All you need to know to correctly make an online risk calculator from scratch
    • Discrimination, calibration, and predictive performance with censored data and competing risks
    • R-code and illustrative examples
    • Interpretation of prediction performance via benchmarks
    • Comparison and combination of rival modeling strategies via cross-validation

    Thomas A. Gerds is a professor at the Biostatistics Unit at the University of Copenhagen and is affiliated with the Danish Heart Foundation. He is the author of several R-packages on CRAN and has taught statistics courses to non-statisticians for many years.

    Michael W. Kattan is a highly cited author and Chair of the Department of Quantitative Health Sciences at Cleveland Clinic. He is a Fellow of the American Statistical Association and has received two awards from the Society for Medical Decision Making: the Eugene L. Saenger Award for Distinguished Service, and the John M. Eisenberg Award for Practical Application of Medical Decision-Making Research.

    1. Software
    2. Why should I care about statistical prediction models?

      The many uses of prediction models in medicine

      The unique messages of this book

      Prognostic factor modeling philosophy

      The rest of this book

    3. I am going to make a prediction model. What do I need to know?
    4. Prediction model framework

      Target population

      The time origin

      The event of interest

      The prediction time horizon and follow-up

      Landmarking

      Risks and risk predictions

      Classification of risk

      Predictor variables

      Checklist

      Prediction performance

      Proper scoring rules

      Calibration

      Discrimination

      Explained variation

      Variability and uncertainty

      The interpretation is relative

      Utility

      Average versus subgroups

      Study design

      Study design and sources of information

      Cohort

      Multi-center study

      Randomized clinical trial

      Case-control

      Given treatment and treatment options

      Sample size calculation

      Data

      Purpose dataset

      Data dictionary

      Measurement error

      Missing values

      Censored data

      Competing risks

      Modeling

      Risk prediction model

      Risk classifier

      How is prediction modeling different from statistical inference?

    5. Regression model
    6. Linear predictor

      Expert selects the candidate predictors

      How to select variables for inclusion in the final model

      All possible interactions

      Checklist

      Machine learning

      Validation

      The conventional model

      Internal and external validation

      Conditional versus expected performance

      Cross-validation

      Data splitting

      Bootstrap

      Model checking and goodness of fit

      Reproducibility

      Pitfalls

      Age as time scale

      Odds ratios and hazard ratios are not predictions of risks

      Do not blame the metric

      Censored data versus competing risks

      Disease-specific survival

      Overfitting

      Data-dependent decisions

      Balancing data

      Independent predictor

      Automated variable selection

    7. How should I prepare for modeling?
    8. Definition of subjects

      Choice of time scale

      Pre-selection of predictor variables

      Preparation of predictor variables

      Categorical variables

      Continuous variables

      Derived predictor variables

      Repeated measurements

      Measurement error

      Missing values

      Preparation of event time outcome

      Illustration without competing risks

      Illustration with competing risks

      Artificial censoring at the prediction time horizon

    9. I am ready to build a prediction model
    10. Specifying the model type

      Uncensored binary outcome

      Right-censored time-to-event outcome (no competing risks)

      Right-censored time-to-event outcome with competing risks

      Benchmark model

      Uncensored binary outcome

      Right-censored time-to-event outcome (without competing risks)

      Right-censored time-to-event with competing risks

      Including predictor variables

      Categorical predictor variables

      Continuous predictor variables

      Interaction effects

      Modeling strategy

      Variable selection

      Conventional model strategy

      Whether to use a standard regression model or something else

      Advanced topics

      How to prevent overfitting the data

      How to deal with missing values

      How to deal with non-converging models

      What you should put in your manuscript

      Baseline tables

      Follow-up tables

      Regression tables

      Risk plots

      Nomograms

      Deployment

      Risk charts

      Internet calculator

      Cost-benefit analysis (waiting lists)

    11. Does my model predict accurately?
    12. Model assessment roadmap

      Visualization of the predictions

      Calculation of model performance

      Visualization of model performance

      Uncensored binary outcome

      Distribution of the predicted risks

      Brier score

      AUC

      Calibration curves

      Right-censored time-to-event outcome (without competing risks)

      Distribution of the predicted risks

      Brier score with censored data

      Time-dependent AUC for censored data

      Calibration curve for censored data

      Competing risks

      Distribution of the predicted risks

      Brier score with competing risks

      Time-dependent AUC for competing risks

      Calibration curve for competing risks

      The Index of Prediction Accuracy (IPA)

      Choice of prediction time horizon

      Time-dependent prediction performance

    13. How do I decide between rival models?
    14. Model comparison roadmap

      Analysis of rival prediction models

      Uncensored binary outcome

      Right-censored time-to-event outcome (without competing risks)

      Competing risks

      Clinically relevant change of prediction

      Does a new marker improve prediction?

      Many new predictors

      Updating a subject's prediction

      What would make me an expert?

      Multiple cohorts / Multi-center studies

      The role of treatment for making a prediction model

      Modeling treatment

      Comparative effectiveness tables

      Learning curve paradigm

      Internal validation (data splitting)

      Single split

      Calendar split

      Multiple splits (cross-validation)

      Dilemma of internal validation

      The apparent and the + estimator

      Tips and tricks

      Missing values

      Missing values in the learning data

      Missing values in the validation data

      Time-varying coefficient models

      Time-varying predictor variables

    15. Can't the computer just take care of all of this?
    16. Zero layers of cross-validation

      What may happen if you do not look at the data

      Unsupervised modeling steps

      Final model

      One layer of cross-validation

      Penalized regression

      Supervised spline selection

      Machine learning (two levels of cross-validation)

      Random forest

      Deep learning and artificial neural networks

      The super learner

    17. Things you might have expected in our book

      Threshold selection for decision making

      Number of events per variable

      Confidence intervals for predicted probabilities

      Models developed from case-control data

      Hosmer-Lemeshow test

      Backward elimination and stepwise selection

      Rank correlation (c-index) for survival outcome

      Integrated Brier score

      Net reclassification index and the integrated discrimination improvement

      Re-classification tables

      Boxplots of rival models conditional on the outcome

    Biography

    Thomas A. Gerds is a professor at the Biostatistics Unit at the University of Copenhagen and is affiliated with the Danish Heart Foundation. He is the author of several R-packages on CRAN and has taught statistics courses to non-statisticians for many years.

    Michael W. Kattan is a highly cited author and Chair of the Department of Quantitative Health Sciences at Cleveland Clinic. He is a Fellow of the American Statistical Association and has received two awards from the Society for Medical Decision Making: the Eugene L. Saenger Award for Distinguished Service, and the John M. Eisenberg Award for Practical Application of Medical Decision Making Research.

    "Two of the top researchers in the field of clinical prediction models have produced a highly innovative book that brings a very technical topic to public grasp by throwing out the formulas and just talking straight from the heart of practical experience. While clinicians and medical residents can now learn how to build, diagnose and validate risk models themselves, all public health researchers, old and new, will reap the benefits and enjoyment from reading this book."
    ~Donna Ankerst, Technical University of Munich