Implementing Reproducible Research
In computational science, reproducibility requires that researchers make code and data available to others so that the data can be analyzed in a manner similar to that of the original publication. Code must be available for distribution, data must be accessible in a readable format, and a platform must exist for distributing the data and code widely. In addition, both data and code need to be licensed permissively enough that others can reproduce the work without a substantial legal burden.
Implementing Reproducible Research covers many of the elements necessary for conducting and distributing reproducible research. It explains how to accurately reproduce a scientific result.
Divided into three parts, the book discusses the tools, practices, and dissemination platforms for ensuring reproducibility in computational science. It describes:
- Computational tools, such as Sweave, knitr, VisTrails, Sumatra, CDE, and the Declaratron system
- Open source practices, good programming practices, trends in open science, and the role of cloud computing in reproducible research
- Software and methodological platforms, including open source software packages, the RunMyCode platform, and open access journals
Each part presents contributions from leaders who have developed software and other products that have advanced the field. Supplementary material is available at www.ImplementingRR.org.
Tools:
- knitr: A Comprehensive Tool for Reproducible Research in R
- Reproducibility Using VisTrails
- Sumatra: A Toolkit for Reproducible Research
- CDE: Automatically Package and Reproduce Computational Experiments
- Reproducible Physical Science and the Declaratron
Practices and Guidelines:
- Developing Open-Source Scientific Practice
- Reproducible Bioinformatics Research for Biologists
- Reproducible Research for Large-Scale Data Analysis
- Practicing Open Science
- Reproducibility, Virtual Appliances, and Cloud Computing
- The Reproducibility Project: A Model of Large-Scale Collaboration for Empirical Research on Reproducibility (Open Science Collaboration)
- What Computational Scientists Need to Know about Intellectual Property Law: A Primer
Platforms:
- Open Science in Machine Learning
- RunMyCode.org: A Research-Reproducibility Tool for Computational Sciences
- Open Science and the Role of Publishers in Reproducible Research
Index
"This collection brings together the expertise and experience of numerous authors and is likely to be valuable to scientists and statisticians alike. … This book should have broad appeal … introduces some extremely useful tools and practices from leaders in the field. On top of that, it also contains an exciting vision for the future of scientific research. … The challenge of reproducibility in the computational era is being confronted across the sciences, with each field developing its own tools and best practices. This book is an important step in bringing together a broad group of scientists to share what has been learned."
—Journal of the American Statistical Association, June 2015
"The book as a whole has something for everybody and provides an interesting snapshot of the available tools, platforms, and good practices for researchers as the scientific community aims to be more self-correcting."
—Journal of Statistical Software, October 2014
"Three recent books have significantly influenced how I use R in reproducible work: Dynamic Documents with R and knitr by Yihui Xie, Reproducible Research with R and RStudio by Christopher Gandrud, and Implementing Reproducible Research edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng … I recommend all three books to R users at any level. There really is something here for everyone."
—Richard Layton, PhD, PE, Rose-Hulman Institute of Technology, Terre Haute, Indiana, USA
"In total, this book provides information on almost all aspects of reproducible research in the open science environment … I would recommend this book to anybody who wants to learn more about reproducible research in the context of open science."