1st Edition

# Multiple Imputation of Missing Data in Practice: Basic Theory and Analysis Strategies

By Yulei He, Guangyu Zhang, Chiu-Hsieh Hsu Copyright 2022
494 Pages 47 B/W Illustrations
by Chapman & Hall

Multiple Imputation of Missing Data in Practice: Basic Theory and Analysis Strategies provides a comprehensive introduction to the multiple imputation approach to the missing data problems often encountered in data analysis. Over the past 40 years or so, multiple imputation has developed rapidly in both theory and application. It is now the most versatile, popular, and effective missing-data strategy used by researchers and practitioners across different fields. There is a strong need in the research and practitioner communities to better understand multiple imputation.

Accessible to a broad audience, this book explains the statistical concepts of missing data problems and the associated terminology. It focuses on how to address missing data problems using multiple imputation. It describes the basic theory behind multiple imputation and many commonly used models and methods. These ideas are illustrated by examples from a wide variety of missing data problems. Real data from studies with different designs and features (e.g., cross-sectional data, longitudinal data, complex surveys, survival data, and studies subject to measurement error) are used to demonstrate the methods. So that readers learn not only how to use the methods but also why multiple imputation works and how to choose appropriate methods, simulation studies are used to assess the performance of the multiple imputation methods. Example datasets and sample programming code are either included in the book or available in a GitHub repository (https://github.com/he-zhang-hsu/multiple_imputation_book).

Key Features

1. Provides an overview of statistical concepts that are useful for better understanding missing data problems and multiple imputation analysis
2. Provides a detailed discussion of multiple imputation models and methods targeted at different types of missing data problems (e.g., univariate and multivariate missing data problems, missing data in survival analysis, longitudinal data, and complex surveys)
3. Explores measurement error problems with multiple imputation
4. Discusses analysis strategies for multiple imputation diagnostics
5. Discusses data production issues when the goal of multiple imputation is to release datasets for public use, as done by organizations that process and manage large-scale surveys with nonresponse problems
6. For some examples, illustrative datasets and sample programming code for popular statistical packages (e.g., SAS, R, WinBUGS) are included in the book. For others, they are available in a GitHub repository (https://github.com/he-zhang-hsu/multiple_imputation_book)

1. Introduction

A Motivating Example

Definition of Missing Data

Missing Data Patterns

Missing Data Mechanisms

Structure of the Book

2. Statistical Background

Introduction

Frequentist Theory

Sampling Experiment

Model, Parameter, and Estimation

Hypothesis Testing

Resampling Methods: the Bootstrap Approach

Bayesian Analysis

Rudiments

Prior Distribution

Bayesian Computation

Asymptotic Equivalence between Frequentist and Bayesian Estimates

Likelihood-Based Approaches to Missing Data Analysis

Use of Monte Carlo Simulation Study

Summary

3. Multiple Imputation Analysis: Basics

Introduction

Basic Ideas

Bayesian Motivation

Basic Combining Rules and Their Justifications

Why Does Multiple Imputation Work

Statistical Inference on Multiply Imputed Data

Scalar Inference

Multi-Parameter Inference

How to Choose the Number of Imputations

How to Create Multiple Imputations

Bayesian Imputation Algorithm

Proper Multiple Imputation

Alternative Strategies

Practical Implementation

Summary

4. Multiple Imputation for Univariate Missing Data: Parametric Methods

Overview

Imputation for Continuous Data Based on Normal Linear Models

Imputation for Non-Continuous Data Based on Generalized Linear Models

Generalized Linear Models

Imputation for Binary Data

Logistic Regression Model Imputation

Discriminant Analysis Imputation

Rounding

Data Separation

Imputation for Non-Binary Categorical Data

Imputation for Other Types of Data

Imputation for a Missing Covariate in a Regression Analysis

Summary

5. Multiple Imputation for Univariate Missing Data: Robust Methods

Overview

Data Transformation

Transforming or Not

How to Apply Transformation in Multiple Imputation

Imputation Based on Smoothing Methods

Main Idea

Practical Use

Adjustments for Continuous Data with Range Restrictions

Predictive Mean Matching

Hot-Deck Imputation

Basic Idea and Procedure

PMM for Non-Continuous Data

Inclusive Imputation Strategy

Basic Idea

Dual Modeling Strategy

Propensity Score

Calibration Estimation and Doubly Robust Imputation Methods

Summary

6. Multiple Imputation for Multivariate Missing Data: the Joint Modeling Approach

Introduction

Imputation for Monotone Missing Data

Multivariate Continuous Data

Multivariate Normal Models

Nonnormal Continuous Data

Multivariate Categorical Data

Log-Linear Models

Latent Variable Models

Mixed Categorical and Continuous Variables

One Continuous Variable and One Binary Variable

General Location Models

Latent Variable Models

Missing Outcome and Covariates in a Regression Analysis

General Strategy

Conditional Modeling Framework

Using WinBUGS

Background

Missing Interactions and Squared Terms of Covariates in Regression Analysis

Imputation Using Flexible Distributions

Summary

7. Multiple Imputation for Multivariate Missing Data: the Fully Conditional Specification Approach

Introduction

Basic Idea

Specification of Conditional Models

Handling Complex Data Features

Data Subject to Bounds or Restricted Ranges

Skip Patterns

Implementation

General Algorithm

Software

Using WinBUGS

Subtle Issues

Compatibility

Performance under Model Misspecifications

A Practical Example

Summary

8. Multiple Imputation in Survival Data Analysis

Introduction

Imputation for Censored Data

Theoretical Basis

Parametric Imputation for Censored Event Times

Semiparametric Imputation for Censored Event Times

Merits of Imputing Censored Event Times

Survival Analysis with Missing Covariates

Overview

Joint Modeling

Fully Conditional Specification

Semiparametric Methods

Summary

9. Multiple Imputation for Longitudinal Data

Introduction

Mixed Models for Longitudinal Data

Imputation Based on Mixed Models

Why Using Mixed Models

General Imputation Algorithm

Examples

Wide Format Imputation

Multilevel Data

Summary

10. Multiple Imputation Analysis for Complex Survey Data

Introduction

Design-Based Inference for Survey Data

Imputation Strategies for Complex Survey Data

General Principles

Incorporating the Survey Sampling Design

Assuming MAR

Using FCS

Modeling Options

Some Examples from the Literature

Database Construction and Release

Data Editing

Documentation and Release

Summary

11. Multiple Imputation for Data Subject to Measurement Error

Introduction

Rationale

Imputation Strategies

True Values Partially Observed

Basic Setup

Direct Imputation

Accommodating a Specific Analysis

Using FCS

Predictors under Detection Limits

True Values Fully Unobserved

Data Harmonization Using Bridge Studies

Combining Information from Multiple Data Sources

Imputation for a Composite Variable

Summary

12. Multiple Imputation Diagnostics

Overview

Imputation Model Development

Inclusion of Variables

Forming Imputation Models

Comparison between Observed and Imputed Values

Comparison on Marginal Distributions

Comparison on Conditional Distributions

Basic Idea

Using Propensity Score

Checking Completed Data

Posterior Predictive Checking

Comparing Completed Data with Their Replicates

Assessing the Fraction of Missing Information

Relating the Fraction of Missing Information with Model Predictability

Prediction Accuracy

Comparison among Different Methods

Summary

13. Multiple Imputation Analysis for Nonignorable Missing Data

Introduction

The Implication of Missing Not at Random

Using the Inclusive Imputation to Rescue

Missing Not at Random Models

Selection Models

Pattern Mixture Models

Shared Parameter Models

Analysis Strategies

Direct Imputation

Sensitivity Analysis

Summary

Overview

Uncongeniality in Multiple Imputation Analysis

Combining Analysis Results from Multiply Imputed Datasets: Further Considerations

Normality Assumption in Question

Beyond Sufficient Statistics

Complicated Completed-Data Analyses: Variable Selection

High-Dimensional Data

Final Thoughts

### Biography

Yulei He and Guangyu Zhang are mathematical statisticians at the National Center for Health Statistics, U.S. Centers for Disease Control and Prevention. Chiu-Hsieh Hsu is a Professor of Biostatistics at the University of Arizona. All three authors have researched, taught, and consulted on multiple imputation and missing data analysis over the past 20 years.