Flexible Imputation of Missing Data

By Stef van Buuren

© 2012 – Chapman and Hall/CRC

342 pages | 58 B/W Illus.

Purchasing Options:
Hardback: 9781439868249
pub: 2012-03-28
US Dollars $97.95


About the Book

Missing data form a problem in every scientific discipline, yet the techniques required to handle them are complicated and often lacking. One of the great ideas in statistical science—multiple imputation—fills gaps in the data with plausible values, the uncertainty of which is coded in the data itself. It also solves other problems, many of which are missing data problems in disguise.

Flexible Imputation of Missing Data is supported by many examples using real data taken from the author's vast experience of collaborative research, and presents a practical guide for handling missing data under the framework of multiple imputation. Furthermore, detailed guidance on implementation in R using the author’s package MICE is included throughout the book.

Assuming familiarity with basic statistical concepts and multivariate methods, Flexible Imputation of Missing Data is intended for two audiences:

  • (Bio)statisticians, epidemiologists, and methodologists in the social and health sciences
  • Substantive researchers who do not call themselves statisticians, but who possess the necessary skills to understand the principles and to follow the recipes

This graduate-tested book avoids mathematical and technical details as much as possible: each formula is accompanied by a verbal statement that explains it in layperson's terms. Readers less concerned with the theoretical underpinnings will be able to pick up the general idea, and technical material is available for those who desire deeper understanding. The analyses can be replicated in R using a dedicated package developed by the author.


"The opening chapters of this book will be useful to the newcomer to missing data, including the nonstatistician. Many of the recommendations in the ‘Do’s and don’ts’ section will be useful to the researcher who encounters missing data and wishes to deal with it responsibly. Finally, the code examples provide a reassuring companion to the user of the mice software package."

Biometrical Journal, 2014

"This book would be well suited as a textbook, especially at the graduate level, possibly for biostatisticians, epidemiologists, or applied scientists and users of statistical methodology. …a very enjoyable read, and—at least in my opinion—it is a book that belongs on everyone’s shelf as it does open one’s eyes to a problem that has surrounded us (and that many of us have ignored!) for a very long time."

—Wolfgang S. Jank, Journal of the American Statistical Association, June 2013

"From the first lines of Chapter 1 throughout the entire monograph, the author presents numerous R language codes, so the book also serves as a good introduction to R. Each chapter is complete with various examples and exercises. The book is very useful to graduate students and researchers for solving practical problems with real data."

Technometrics, February 2013

"It’s excellent and I highly recommend it. … van Buuren’s book is great even if you don’t end up using the algorithm described in the book … he supplies lots of intuition, examples, and graphs."

—Andrew Gelman, Columbia University

"… a beautiful book that is so full of guidance for statisticians … exceptionally up to date and has more useful wisdom about dealing with common missing data problems than any other source I've seen."

—Frank Harrell, Vanderbilt University

"I’m delighted to see this new book on multiple imputation by Stef van Buuren …This book represents a 'no nonsense' straightforward approach to the application of multiple imputation. I particularly like Stef’s use of graphical displays … It’s great to have Stef’s book on multiple imputation, and I look forward to seeing more editions as this rapidly developing methodology continues to become even more effective at handling missing data problems in practice."

—From the Foreword by Donald B. Rubin

Table of Contents



The problem of missing data

Concepts of MCAR, MAR and MNAR

Simple solutions that do not (always) work

Multiple imputation in a nutshell

Goal of the book

What the book does not cover

Structure of the book


Multiple imputation

Historic overview

Incomplete data concepts

Why and when multiple imputation works

Statistical intervals and tests

Evaluation criteria

When to use multiple imputation

How many imputations?


Univariate missing data

How to generate multiple imputations

Imputation under the normal linear model

Imputation under non-normal distributions

Predictive mean matching

Categorical data

Other data types

Classification and regression trees

Multilevel data

Non-ignorable methods


Multivariate missing data

Missing data pattern

Issues in multivariate imputation

Monotone data imputation

Joint Modeling

Fully Conditional Specification

FCS and JM



Imputation in practice

Overview of modeling choices

Ignorable or non-ignorable?

Model form and predictors

Derived variables

Algorithmic options




Analysis of imputed data

What to do with the imputed data?

Parameter pooling

Statistical tests for multiple imputation

Stepwise model selection



Case studies

Measurement issues

Too many columns

Sensitivity analysis

Correct prevalence estimates from self-reported data

Enhancing comparability


Selection issues

Correcting for selective drop-out

Correcting for non-response


Longitudinal data

Long and wide format

SE Fireworks Disaster Study

Time raster imputation





Some dangers, some do's and some don'ts


Other applications

Future developments


Appendices: Software






Other software


Author Index

Subject Index

About the Series

Chapman & Hall/CRC Interdisciplinary Statistics


Subject Categories

BISAC Subject Codes/Headings:
MATHEMATICS / Probability & Statistics / General