2nd Edition

Flexible Imputation of Missing Data, Second Edition

By Stef van Buuren. Copyright 2018
    444 Pages 76 Color Illustrations
    by Chapman & Hall

    Missing data pose challenges to real-life data analysis. Simple ad-hoc fixes, like deletion or mean imputation, only work under highly restrictive conditions, which are often not met in practice. Multiple imputation replaces each missing value by multiple plausible values. The variability between these replacements reflects our ignorance of the true (but missing) value. Each of the completed data sets is then analyzed by standard methods, and the results are pooled to obtain unbiased estimates with correct confidence intervals. Multiple imputation is a general approach that also inspires novel solutions to old problems by reformulating the task at hand as a missing-data problem.
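
    The pooling step described above can be illustrated with a minimal sketch of Rubin's rules. It is written in plain Python rather than with the book's R package, and it is not the book's own code. The function name pool_rubin and the numbers in the example are hypothetical and serve only as an illustration. The pooled estimate is the mean of the per-data-set estimates, and the total variance adds the between-imputation variance, inflated by (1 + 1/m), to the average within-imputation variance.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation results with Rubin's rules.

    estimates: point estimates, one per completed data set
    variances: squared standard errors, one per completed data set
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)   # pooled point estimate
    u_bar = mean(variances)   # average within-imputation variance
    b = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total


# Hypothetical results from m = 3 completed data sets (illustration only).
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled_estimate, total_variance)
```

    The between-imputation term is what carries the uncertainty about the missing values into the final standard error. Analyzing a single imputed data set would leave that term out and understate the variance.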

    This is the second edition of a popular book on multiple imputation, focused on explaining the application of methods through detailed worked examples using the MICE package as developed by the author. This new edition incorporates the recent developments in this fast-moving field.

    This class-tested book avoids mathematical and technical details as much as possible: formulas are accompanied by verbal statements that explain the formula in accessible terms. The book sharpens the reader’s intuition on how to think about missing data, and provides all the tools needed to execute a well-grounded quantitative analysis in the presence of missing data.


    Multiple imputation

    Univariate missing data

    Multivariate missing data

    Analysis of imputed data

    Imputation in practice

    Multilevel multiple imputation

    Individual causal effects

    Measurement issues

    Selection issues

    Longitudinal data



    Stef van Buuren is a statistical consultant at the Netherlands Organisation for Applied Scientific Research TNO in Leiden with a broad knowledge of quantitative issues in public health. Since 2015, Van Buuren has been the world's first Professor of Missing Data at the department of Methodology & Statistics, FSS, University of Utrecht. He is the originator of various new statistical tools.

    Praise for the first edition:

    "As an applied biostatistician, the introductory chapter spoke directly to me. It began motivating the issues in multiple imputation from the perspective of applied data problems and problematic approaches to them…Foundational examples start with simple scenarios that are gradually and clearly expanded upon. At each step, R code is shown and illustrations visually show the effects of different approaches. For every major concept there is a half-page "Algorithm Box", a short summary in pseudo-code of the algorithm being discussed. These, in conjunction with the explanatory text, made things extremely clear and easy to grasp… Overall, this book does an excellent job of bringing one from no knowledge of multiple imputation to a working knowledge of multiple imputation."
    —ISCB News, July 2016

    "The opening chapters of this book will be useful to the newcomer to missing data, including the nonstatistician. Many of the recommendations in the ‘Do’s and don’ts’ section will be useful to the researcher who encounters missing data and wishes to deal with it responsibly. Finally, the code examples provide a reassuring companion to the user of the mice software package."
    —Biometrical Journal, 2014

    "This book would be well suited as a textbook, especially at the graduate level, possibly for biostatisticians, epidemiologists, or applied scientists and users of statistical methodology. …a very enjoyable read, and—at least in my opinion—it is a book that belongs on everyone’s shelf as it does open one’s eyes to a problem that has surrounded us (and that many of us have ignored!) for a very long time."
    —Wolfgang S. Jank, Journal of the American Statistical Association, June 2013

    "From the first lines of Chapter 1 throughout the entire monograph, the author presents numerous R language codes, so the book also serves as a good introduction to R. Each chapter is complete with various examples and exercises. The book is very useful to graduate students and researchers for solving practical problems with real data."
    —Technometrics, February 2013

    "It’s excellent and I highly recommend it. … van Buuren’s book is great even if you don’t end up using the algorithm described in the book … he supplies lots of intuition, examples, and graphs."
    —Andrew Gelman, Columbia University

    "… a beautiful book that is so full of guidance for statisticians … exceptionally up to date and has more useful wisdom about dealing with common missing data problems than any other source I've seen."
    —Frank Harrell, Vanderbilt University

    "I’m delighted to see this new book on multiple imputation by Stef van Buuren …This book represents a 'no nonsense' straightforward approach to the application of multiple imputation. I particularly like Stef’s use of graphical displays … It’s great to have Stef’s book on multiple imputation, and I look forward to seeing more editions as this rapidly developing methodology continues to become even more effective at handling missing data problems in practice."
    —From the Foreword by Donald B. Rubin

    "Flexible Imputation of Missing Data (2nd Edition) will definitely appeal to practitioners who analyze real world data with missing values, particularly clinical and health data. The book covers all types of missing data and missing data patterns...The most prominent feature of this book is the clarity of exposition achieved by presenting clear description, examples, plots, and code. In addition, the example data sets used in the book are very familiar to researchers and practitioners in clinical and health data analysis, creating a tangible connection between the text and the practice. Students can use the example datasets to understand very clearly the process of missing data management and analysis. Another great feature of the book is the use of the R programming language and focus on one package. This structure gives the book coherence in terms of methods and tools used. In addition, on the theory side there is enough information, challenging questions, and reference to literature that make this book a rich resource for theoretical researchers. The intended audience of this book are practitioners in data analysis (especially biostatisticians), advanced graduate students, and theoretical researchers."
    - Abdolvahab Khademi, JSS, April 2020