1st Edition

Multiple Imputation of Missing Data in Practice
Basic Theory and Analysis Strategies

ISBN 9781498722063
Published November 26, 2021 by Chapman and Hall/CRC
494 Pages 47 B/W Illustrations

USD $99.95

Book Description

Multiple Imputation of Missing Data in Practice: Basic Theory and Analysis Strategies provides a comprehensive introduction to the multiple imputation approach to the missing data problems often encountered in data analysis. Over the past 40 years or so, multiple imputation has developed rapidly in both theory and application. It is now the most versatile, popular, and effective missing-data strategy used by researchers and practitioners across different fields. There is a strong need in the research and practitioner communities to better understand multiple imputation.

Accessible to a broad audience, this book explains the statistical concepts behind missing data problems and the associated terminology. It focuses on how to address missing data problems using multiple imputation, describing the basic theory behind it along with many commonly used models and methods. These ideas are illustrated by examples from a wide variety of missing data problems. Real data from studies with different designs and features (e.g., cross-sectional data, longitudinal data, complex surveys, survival data, and studies subject to measurement error) are used to demonstrate the methods. So that readers learn not only how to use the methods but also why multiple imputation works and how to choose appropriate methods, simulation studies are used to assess the performance of the multiple imputation methods. Example datasets and sample programming code are either included in the book or available at a GitHub site (

Key Features

  1. Provides an overview of statistical concepts that are useful for better understanding missing data problems and multiple imputation analysis
  2. Provides a detailed discussion of multiple imputation models and methods targeted to different types of missing data problems (e.g., univariate and multivariate missing data problems, and missing data in survival analysis, longitudinal data, and complex surveys)
  3. Explores measurement error problems with multiple imputation
  4. Discusses analysis strategies for multiple imputation diagnostics
  5. Discusses data production issues when the goal of multiple imputation is to release datasets for public use, as done by organizations that process and manage large-scale surveys with nonresponse problems
  6. Includes illustrative datasets and sample programming code from popular statistical packages (e.g., SAS, R, WinBUGS) in the book for some examples; for others, these are available at a GitHub site (

Table of Contents

  1. Introduction
    A Motivating Example
    Definition of Missing Data
    Missing Data Patterns
    Missing Data Mechanisms
    Structure of the Book

  2. Statistical Background
    Introduction
    Frequentist Theory
    Sampling Experiment
    Model, Parameter, and Estimation
    Hypothesis Testing
    Resampling Methods: the Bootstrap Approach
    Bayesian Analysis
    Prior Distribution
    Bayesian Computation
    Asymptotic Equivalence between Frequentist and Bayesian
    Likelihood-Based Approaches to Missing Data Analysis
    Ad-Hoc Missing Data Methods
    Use of Monte Carlo Simulation Study

  3. Multiple Imputation Analysis: Basics
    Introduction
    Basic Ideas
    Bayesian Motivation
    Basic Combining Rules and Their Justifications
    Why Does Multiple Imputation Work
    Statistical Inference on Multiply Imputed Data
    Scalar Inference
    Multi-Parameter Inference
    How to Choose the Number of Imputations
    How to Create Multiple Imputations
    Bayesian Imputation Algorithm
    Proper Multiple Imputation
    Alternative Strategies
    Practical Implementation

  4. Multiple Imputation for Univariate Missing Data: Parametric Methods
    Overview
    Imputation for Continuous Data Based on Normal Linear Models
    Imputation for Non-Continuous Data Based on Generalized Linear Models
    Generalized Linear Models
    Imputation for Binary Data
    Logistic Regression Model Imputation
    Discriminant Analysis Imputation
    Data Separation
    Imputation for Non-Binary Categorical Data
    Imputation for Other Types of Data
    Imputation for a Missing Covariate in a Regression Analysis

  5. Multiple Imputation for Univariate Missing Data: Robust Methods
    Overview
    Data Transformation
    Transforming or Not
    How to Apply Transformation in Multiple Imputation
    Imputation Based on Smoothing Methods
    Main Idea
    Practical Use
    Adjustments for Continuous Data with Range Restrictions
    Predictive Mean Matching
    Hot-Deck Imputation
    Basic Idea and Procedure
    PMM for Non-Continuous Data
    Additional Discussion
    Inclusive Imputation Strategy
    Basic Idea
    Dual Modeling Strategy
    Propensity Score
    Calibration Estimation and Doubly Robust Imputation Methods

  6. Multiple Imputation for Multivariate Missing Data: the Joint Modeling Approach
    Introduction
    Imputation for Monotone Missing Data
    Multivariate Continuous Data
    Multivariate Normal Models
    Nonnormal Continuous Data
    Multivariate Categorical Data
    Log-Linear Models
    Latent Variable Models
    Mixed Categorical and Continuous Variables
    One Continuous Variable and One Binary Variable
    General Location Models
    Latent Variable Models
    Missing Outcome and Covariates in a Regression Analysis
    General Strategy
    Conditional Modeling Framework
    Using WinBUGS
    Missing Interactions and Squared Terms of Covariates in Regression Analysis
    Imputation Using Flexible Distributions

  7. Multiple Imputation for Multivariate Missing Data: the Fully Conditional Specification Approach
    Introduction
    Basic Idea
    Specification of Conditional Models
    Handling Complex Data Features
    Data Subject to Bounds or Restricted Ranges
    Skip Patterns
    General Algorithm
    Using WinBUGS
    Subtle Issues
    Performance under Model Misspecifications
    A Practical Example

  8. Multiple Imputation in Survival Data Analysis
    Introduction
    Imputation for Censored Data
    Theoretical Basis
    Parametric Imputation for Censored Event Times
    Semiparametric Imputation for Censored Event Times
    Merits of Imputing Censored Event Times
    Survival Analysis with Missing Covariates
    Joint Modeling
    Fully Conditional Specification
    Semiparametric Methods

  9. Multiple Imputation for Longitudinal Data
    Introduction
    Mixed Models for Longitudinal Data
    Imputation Based on Mixed Models
    Why Using Mixed Models
    General Imputation Algorithm
    Wide Format Imputation
    Multilevel Data

  10. Multiple Imputation Analysis for Complex Survey Data
    Introduction
    Design-Based Inference for Survey Data
    Imputation Strategies for Complex Survey Data
    General Principles
    Incorporating the Survey Sampling Design
    Assuming MAR
    Using FCS
    Modeling Options
    Some Examples from the Literature
    Database Construction and Release
    Data Editing
    Documentation and Release

  11. Multiple Imputation for Data Subject to Measurement Error
    Introduction
    Imputation Strategies
    True Values Partially Observed
    Basic Setup
    Direct Imputation
    Accommodating a Specific Analysis
    Using FCS
    Predictors under Detection Limits
    True Values Fully Unobserved
    Data Harmonization Using Bridge Studies
    Combining Information from Multiple Data Sources
    Imputation for a Composite Variable

  12. Multiple Imputation Diagnostics
    Overview
    Imputation Model Development
    Inclusion of Variables
    Forming Imputation Models
    Comparison between Observed and Imputed Values
    Comparison on Marginal Distributions
    Comparison on Conditional Distributions
    Basic Idea
    Using Propensity Score
    Checking Completed Data
    Posterior Predictive Checking
    Comparing Completed Data with Their Replicates
    Assessing the Fraction of Missing Information
    Relating the Fraction of Missing Information with Model Predictability
    Prediction Accuracy
    Comparison among Different Methods

  13. Multiple Imputation Analysis for Nonignorable Missing Data
    Introduction
    The Implication of Missing Not at Random
    Using the Inclusive Imputation to Rescue
    Missing Not at Random Models
    Selection Models
    Pattern Mixture Models
    Shared Parameter Models
    Analysis Strategies
    Direct Imputation
    Sensitivity Analysis

  14. Some Advanced Topics
    Uncongeniality in Multiple Imputation Analysis
    Combining Analysis Results from Multiply Imputed Datasets: Further Considerations
    Normality Assumption in Question
    Beyond Sufficient Statistics
    Complicated Completed-Data Analyses: Variable Selection
    High-Dimensional Data
    Final Thoughts


Yulei He and Guangyu Zhang are mathematical statisticians at the National Center for Health Statistics, U.S. Centers for Disease Control and Prevention. Chiu-Hsieh Hsu is a Professor of Biostatistics at the University of Arizona. All three authors have researched, taught, and consulted on multiple imputation and missing data analysis over the past 20 years.