1st Edition
Multiple Imputation of Missing Data in Practice: Basic Theory and Analysis Strategies
Multiple Imputation of Missing Data in Practice: Basic Theory and Analysis Strategies provides a comprehensive introduction to the multiple imputation approach to missing data problems that are often encountered in data analysis. Over the past 40 years or so, multiple imputation has developed rapidly in both theory and application. It is now the most versatile, popular, and effective missing-data strategy used by researchers and practitioners across different fields. There is a strong need in both the research and applied communities to learn about and better understand multiple imputation.
Accessible to a broad audience, this book explains statistical concepts of missing data problems and the associated terminology. It focuses on how to address missing data problems using multiple imputation. It describes the basic theory behind multiple imputation and many commonly used models and methods. These ideas are illustrated by examples from a wide variety of missing data problems. Real data from studies with different designs and features (e.g., cross-sectional data, longitudinal data, complex surveys, survival data, and studies subject to measurement error) are used to demonstrate the methods. So that readers not only know how to use the methods but also understand why multiple imputation works and how to choose appropriate methods, simulation studies are used to assess the performance of the multiple imputation methods. Example datasets and sample programming code are either included in the book or available at a GitHub site (https://github.com/he-zhang-hsu/multiple_imputation_book).
Key Features
- Provides an overview of statistical concepts that are useful for better understanding missing data problems and multiple imputation analysis
- Provides a detailed discussion on multiple imputation models and methods targeted to different types of missing data problems (e.g., univariate and multivariate missing data problems, missing data in survival analysis, longitudinal data, complex surveys, etc.)
- Explores measurement error problems with multiple imputation
- Discusses analysis strategies for multiple imputation diagnostics
- Discusses data production issues when the goal of multiple imputation is to release datasets for public use, as done by organizations that process and manage large-scale surveys with nonresponse problems
- Includes illustrative datasets and sample programming code from popular statistical packages (e.g., SAS, R, WinBUGS) for some examples; for the others, these materials are available at a GitHub site (https://github.com/he-zhang-hsu/multiple_imputation_book)
- Introduction
- Statistical Background
- Multiple Imputation Analysis: Basics
- Multiple Imputation for Univariate Missing Data: Parametric Methods
- Multiple Imputation for Univariate Missing Data: Robust Methods
- Multiple Imputation for Multivariate Missing Data: the Joint Modeling Approach
- Multiple Imputation for Multivariate Missing Data: the Fully Conditional Specification Approach
- Multiple Imputation in Survival Data Analysis
- Multiple Imputation for Longitudinal Data
- Multiple Imputation Analysis for Complex Survey Data
- Multiple Imputation for Data Subject to Measurement Error
- Multiple Imputation Diagnostics
- Multiple Imputation Analysis for Nonignorable Missing Data
- Some Advanced Topics
A Motivating Example
Definition of Missing Data
Missing Data Patterns
Missing Data Mechanisms
Structure of the Book
Introduction
Frequentist Theory
Sampling Experiment
Model, Parameter, and Estimation
Hypothesis Testing
Resampling Methods: the Bootstrap Approach
Bayesian Analysis
Rudiments
Prior Distribution
Bayesian Computation
Asymptotic Equivalence between Frequentist and Bayesian Estimates
Likelihood-Based Approaches to Missing Data Analysis
Ad-Hoc Missing Data Methods
Use of Monte Carlo Simulation Study
Summary
Introduction
Basic Ideas
Bayesian Motivation
Basic Combining Rules and Their Justifications
Why Does Multiple Imputation Work
Statistical Inference on Multiply Imputed Data
Scalar Inference
Multi-Parameter Inference
How to Choose the Number of Imputations
How to Create Multiple Imputations
Bayesian Imputation Algorithm
Proper Multiple Imputation
Alternative Strategies
Practical Implementation
Summary
Overview
Imputation for Continuous Data Based on Normal Linear Models
Imputation for Non-Continuous Data Based on Generalized Linear Models
Generalized Linear Models
Imputation for Binary Data
Logistic Regression Model Imputation
Discriminant Analysis Imputation
Rounding
Data Separation
Imputation for Non-Binary Categorical Data
Imputation for Other Types of Data
Imputation for a Missing Covariate in a Regression Analysis
Summary
Overview
Data Transformation
Transforming or Not
How to Apply Transformation in Multiple Imputation
Imputation Based on Smoothing Methods
Main Idea
Practical Use
Adjustments for Continuous Data with Range Restrictions
Predictive Mean Matching
Hot-Deck Imputation
Basic Idea and Procedure
PMM for Non-Continuous Data
Additional Discussion
Inclusive Imputation Strategy
Basic Idea
Dual Modeling Strategy
Propensity Score
Calibration Estimation and Doubly Robust Imputation Methods
Summary
Introduction
Imputation for Monotone Missing Data
Multivariate Continuous Data
Multivariate Normal Models
Nonnormal Continuous Data
Multivariate Categorical Data
Log-Linear Models
Latent Variable Models
Mixed Categorical and Continuous Variables
One Continuous Variable and One Binary Variable
General Location Models
Latent Variable Models
Missing Outcome and Covariates in a Regression Analysis
General Strategy
Conditional Modeling Framework
Using WinBUGS
Background
Missing Interactions and Squared Terms of Covariates in Regression Analysis
Imputation Using Flexible Distributions
Summary
Introduction
Basic Idea
Specification of Conditional Models
Handling Complex Data Features
Data Subject to Bounds or Restricted Ranges
Skip Patterns
Implementation
General Algorithm
Software
Using WinBUGS
Subtle Issues
Compatibility
Performance under Model Misspecifications
A Practical Example
Summary
Introduction
Imputation for Censored Data
Theoretical Basis
Parametric Imputation for Censored Event Times
Semiparametric Imputation for Censored Event Times
Merits of Imputing Censored Event Times
Survival Analysis with Missing Covariates
Overview
Joint Modeling
Fully Conditional Specification
Semiparametric Methods
Summary
Introduction
Mixed Models for Longitudinal Data
Imputation Based on Mixed Models
Why Using Mixed Models
General Imputation Algorithm
Examples
Wide Format Imputation
Multilevel Data
Summary
Introduction
Design-Based Inference for Survey Data
Imputation Strategies for Complex Survey Data
General Principles
Incorporating the Survey Sampling Design
Assuming MAR
Using FCS
Modeling Options
Some Examples from the Literature
Database Construction and Release
Data Editing
Documentation and Release
Summary
Introduction
Rationale
Imputation Strategies
True Values Partially Observed
Basic Setup
Direct Imputation
Accommodating a Specific Analysis
Using FCS
Predictors under Detection Limits
True Values Fully Unobserved
Data Harmonization Using Bridge Studies
Combining Information from Multiple Data Sources
Imputation for a Composite Variable
Summary
Overview
Imputation Model Development
Inclusion of Variables
Forming Imputation Models
Comparison between Observed and Imputed Values
Comparison on Marginal Distributions
Comparison on Conditional Distributions
Basic Idea
Using Propensity Score
Checking Completed Data
Posterior Predictive Checking
Comparing Completed Data with Their Replicates
Assessing the Fraction of Missing Information
Relating the Fraction of Missing Information with Model Predictability
Prediction Accuracy
Comparison among Different Methods
Summary
Introduction
The Implication of Missing Not at Random
Using the Inclusive Imputation to Rescue
Missing Not at Random Models
Selection Models
Pattern Mixture Models
Shared Parameter Models
Analysis Strategies
Direct Imputation
Sensitivity Analysis
Summary
Overview
Uncongeniality in Multiple Imputation Analysis
Combining Analysis Results from Multiply Imputed Datasets: Further Considerations
Normality Assumption in Question
Beyond Sufficient Statistics
Complicated Completed-Data Analyses: Variable Selection
High-Dimensional Data
Final Thoughts
Biography
Yulei He and Guangyu Zhang are mathematical statisticians at the National Center for Health Statistics, the U.S. Centers for Disease Control and Prevention. Chiu-Hsieh Hsu is a Professor of Biostatistics at the University of Arizona. All authors have researched, taught, and consulted on multiple imputation and missing data analysis over the past 20 years.