1st Edition

Multiple Imputation of Missing Data in Practice: Basic Theory and Analysis Strategies

By Yulei He, Guangyu Zhang, Chiu-Hsieh Hsu
Copyright 2022
    494 Pages 47 B/W Illustrations
    by Chapman & Hall


    Multiple Imputation of Missing Data in Practice: Basic Theory and Analysis Strategies provides a comprehensive introduction to the multiple imputation approach to missing data problems that are often encountered in data analysis. Over the past 40 years or so, multiple imputation has developed rapidly in both theory and application. It is now the most versatile, popular, and effective missing-data strategy used by researchers and practitioners across different fields. There is a strong need among researchers and practitioners to better understand multiple imputation.

    Accessible to a broad audience, this book explains the statistical concepts of missing data problems and the associated terminology. It focuses on how to address missing data problems using multiple imputation. It describes the basic theory behind multiple imputation and many commonly used models and methods. These ideas are illustrated by examples from a wide variety of missing data problems. Real data from studies with different designs and features (e.g., cross-sectional data, longitudinal data, complex surveys, survival data, and studies subject to measurement error) are used to demonstrate the methods. So that readers not only know how to use the methods but also understand why multiple imputation works and how to choose appropriate methods, simulation studies are used to assess the performance of the multiple imputation methods. Example datasets and sample programming code are either included in the book or available at a GitHub site (https://github.com/he-zhang-hsu/multiple_imputation_book).

    Key Features

    1. Provides an overview of statistical concepts that are useful for better understanding missing data problems and multiple imputation analysis
    2. Provides a detailed discussion of multiple imputation models and methods targeted to different types of missing data problems (e.g., univariate and multivariate missing data problems, missing data in survival analysis, longitudinal data, and complex surveys)
    3. Explores measurement error problems with multiple imputation
    4. Discusses analysis strategies for multiple imputation diagnostics
    5. Discusses data production issues when the goal of multiple imputation is to release datasets for public use, as done by organizations that process and manage large-scale surveys with nonresponse problems
    6. For some examples, illustrative datasets and sample programming code from popular statistical packages (e.g., SAS, R, WinBUGS) are included in the book. For others, they are available at a GitHub site (https://github.com/he-zhang-hsu/multiple_imputation_book)

    1. Introduction

      A Motivating Example

      Definition of Missing Data

      Missing Data Patterns

      Missing Data Mechanisms

      Structure of the Book

    2. Statistical Background

      Introduction

      Frequentist Theory

      Sampling Experiment

      Model, Parameter, and Estimation

      Hypothesis Testing

      Resampling Methods: the Bootstrap Approach

      Bayesian Analysis

      Prior Distribution

      Bayesian Computation

      Asymptotic Equivalence between Frequentist and Bayesian

      Likelihood-Based Approaches to Missing Data Analysis

      Ad-Hoc Missing Data Methods

      Use of Monte Carlo Simulation Study

    3. Multiple Imputation Analysis: Basics

      Introduction

      Basic Ideas

      Bayesian Motivation

      Basic Combining Rules and Their Justifications

      Why Does Multiple Imputation Work

      Statistical Inference on Multiply Imputed Data

      Scalar Inference

      Multi-Parameter Inference

      How to Choose the Number of Imputations

      How to Create Multiple Imputations

      Bayesian Imputation Algorithm

      Proper Multiple Imputation

      Alternative Strategies

      Practical Implementation

    4. Multiple Imputation for Univariate Missing Data: Parametric Methods

      Overview

      Imputation for Continuous Data Based on Normal Linear Models

      Imputation for Non-Continuous Data Based on Generalized Linear Models

      Generalized Linear Models

      Imputation for Binary Data

      Logistic Regression Model Imputation

      Discriminant Analysis Imputation

      Data Separation

      Imputation for Non-Binary Categorical Data

      Imputation for Other Types of Data

      Imputation for a Missing Covariate in a Regression Analysis

    5. Multiple Imputation for Univariate Missing Data: Robust Methods

      Overview

      Data Transformation

      Transforming or Not

      How to Apply Transformation in Multiple Imputation

      Imputation Based on Smoothing Methods

      Main Idea

      Practical Use

      Adjustments for Continuous Data with Range Restrictions

      Predictive Mean Matching

      Hot-Deck Imputation

      Basic Idea and Procedure

      PMM for Non-Continuous Data

      Additional Discussion

      Inclusive Imputation Strategy

      Basic Idea

      Dual Modeling Strategy

      Propensity Score

      Calibration Estimation and Doubly Robust Imputation Methods

    6. Multiple Imputation for Multivariate Missing Data: the Joint Modeling Approach

      Introduction

      Imputation for Monotone Missing Data

      Multivariate Continuous Data

      Multivariate Normal Models

      Nonnormal Continuous Data

      Multivariate Categorical Data

      Log-Linear Models

      Latent Variable Models

      Mixed Categorical and Continuous Variables

      One Continuous Variable and One Binary Variable

      General Location Models

      Latent Variable Models

      Missing Outcome and Covariates in a Regression Analysis

      General Strategy

      Conditional Modeling Framework

      Using WinBUGS

      Missing Interactions and Squared Terms of Covariates in Regression Analysis

      Imputation Using Flexible Distributions

    7. Multiple Imputation for Multivariate Missing Data: the Fully Conditional Specification Approach

      Introduction

      Basic Idea

      Specification of Conditional Models

      Handling Complex Data Features

      Data Subject to Bounds or Restricted Ranges

      Skip Patterns

      General Algorithm

      Using WinBUGS

      Subtle Issues

      Performance under Model Misspecifications

      A Practical Example

    8. Multiple Imputation in Survival Data Analysis

      Introduction

      Imputation for Censored Data

      Theoretical Basis

      Parametric Imputation for Censored Event Times

      Semiparametric Imputation for Censored Event Times

      Merits of Imputing Censored Event Times

      Survival Analysis with Missing Covariates

      Joint Modeling

      Fully Conditional Specification

      Semiparametric Methods

    9. Multiple Imputation for Longitudinal Data

      Introduction

      Mixed Models for Longitudinal Data

      Imputation Based on Mixed Models

      Why Using Mixed Models

      General Imputation Algorithm

      Wide Format Imputation

      Multilevel Data

    10. Multiple Imputation Analysis for Complex Survey Data

      Introduction

      Design-Based Inference for Survey Data

      Imputation Strategies for Complex Survey Data

      General Principles

      Incorporating the Survey Sampling Design

      Assuming MAR

      Using FCS

      Modeling Options

      Some Examples from the Literature

      Database Construction and Release

      Data Editing

      Documentation and Release

    11. Multiple Imputation for Data Subject to Measurement Error

      Introduction

      Imputation Strategies

      True Values Partially Observed

      Basic Setup

      Direct Imputation

      Accommodating a Specific Analysis

      Using FCS

      Predictors under Detection Limits

      True Values Fully Unobserved

      Data Harmonization Using Bridge Studies

      Combining Information from Multiple Data Sources

      Imputation for a Composite Variable

    12. Multiple Imputation Diagnostics

      Overview

      Imputation Model Development

      Inclusion of Variables

      Forming Imputation Models

      Comparison between Observed and Imputed Values

      Comparison on Marginal Distributions

      Comparison on Conditional Distributions

      Basic Idea

      Using Propensity Score

      Checking Completed Data

      Posterior Predictive Checking

      Comparing Completed Data with Their Replicates

      Assessing the Fraction of Missing Information

      Relating the Fraction of Missing Information with Model Predictability

      Prediction Accuracy

      Comparison among Different Methods

    13. Multiple Imputation Analysis for Nonignorable Missing Data

      Introduction

      The Implication of Missing Not at Random

      Using the Inclusive Imputation to Rescue

      Missing Not at Random Models

      Selection Models

      Pattern Mixture Models

      Shared Parameter Models

      Analysis Strategies

      Direct Imputation

      Sensitivity Analysis

    14. Some Advanced Topics

      Uncongeniality in Multiple Imputation Analysis

      Combining Analysis Results from Multiply Imputed Datasets: Further Considerations

      Normality Assumption in Question

      Beyond Sufficient Statistics

      Complicated Completed-Data Analyses: Variable Selection

      High-Dimensional Data

      Final Thoughts


    Yulei He and Guangyu Zhang are mathematical statisticians at the National Center for Health Statistics, the U.S. Centers for Disease Control and Prevention. Chiu-Hsieh Hsu is a Professor of Biostatistics at the University of Arizona. All authors have researched, taught, and consulted in multiple imputation and missing data analysis over the past 20 years.