3rd Edition

Reproducible Research with R and RStudio

By Christopher Gandrud Copyright 2020
    298 Pages
    by Chapman & Hall

    Praise for previous editions:
    "Gandrud has written a great outline of how a fully reproducible research project should look from start to finish, with brief explanations of each tool that he uses along the way… Advanced undergraduate students in mathematics, statistics, and similar fields as well as students just beginning their graduate studies would benefit the most from reading this book. Many more experienced R users or second-year graduate students might find themselves thinking, ‘I wish I’d read this book at the start of my studies, when I was first learning R!’…This book could be used as the main text for a class on reproducible research …" (The American Statistician)

    Reproducible Research with R and RStudio, Third Edition brings together the skills and tools needed for doing and presenting computational research. Using straightforward examples, the book takes you through an entire reproducible research workflow. This practical workflow enables you to gather and analyze data as well as dynamically present results in print and on the web. Supplementary materials and examples are available on the author’s website.

    New to the Third Edition

    • Updated package recommendations, examples, and URLs; removed technologies no longer in regular use.
    • More advanced R Markdown (and less LaTeX) in discussions of markup languages and examples.
    • Stronger focus on reproducible working directory tools.
    • Updated discussion of cloud storage services and persistent reproducible material citation.
    • Added discussion of Jupyter notebooks and reproducible practices in industry.
    • Examples of data manipulation with Tidyverse tibbles (in addition to standard data frames) and pivot_longer() and pivot_wider() functions for pivoting data.

    Features

    • Incorporates the most important advances that have been developed since the previous editions were published
    • Describes a complete reproducible research workflow, from data gathering to the presentation of results
    • Shows how to automatically generate tables and figures using R
    • Includes instructions on formatting a presentation document via markup languages
    • Discusses cloud storage and versioning services, particularly GitHub
    • Explains how to use Unix-like shell programs for working with large research projects

    I Getting Started

    1 Introducing Reproducible Research

    What Is Reproducible Research?

    Why Should Research Be Reproducible?

    For science

    For you

    Who Should Read This Book?

    Academic researchers

    Students

    Instructors

    Editors

    Private sector researchers

    The Tools of Reproducible Research

    Why Use R, knitr/R Markdown, and RStudio for Reproducible Research?

    Installing the main software

    Installing markup languages

    GNU Make

    Other Tools

    Book Overview

    How to read this book

    Reproduce this book

    Contents overview

    2 Getting Started with Reproducible Research

    The Big Picture: A Workflow for Reproducible Research

    Reproducible theory

    Practical Tips for Reproducible Research

    Document everything!

    Everything is a (text) file

    All files should be human readable

    Explicitly tie your files together

    Have a plan to organize, store, and make your files available

    3 Getting Started with R, RStudio, and knitr/R Markdown

    Using R: The Basics

    Objects

    Functions

    The workspace & history

    R history

    Global R options

    Installing new packages and loading functions

    Using RStudio

    Using knitr and R Markdown: The basics

    What knitr does

    What rmarkdown does

    File extensions

    Code chunks

    Global chunk options

    knitr package options

    Hooks

    knitr, R Markdown, & RStudio

    knitr & R

    R Markdown and R

    4 Getting Started with File Management

    File Paths & Naming Conventions

    Root directories

    Sub-directories & parent directories

    Working directories

    Absolute vs relative paths

    Spaces in directory & file names

    Organizing Your Research Project

    Organizing Research with RStudio Projects

    R File Manipulation Functions

    Unix-like Shell Commands for File Management

    File Navigation in RStudio

    II Data Gathering and Storage

    5 Storing, Collaborating, Accessing Files, and Versioning

    Saving Data in Reproducible Formats

    Storing Your Files in the Cloud: Dropbox

    Storage

    Accessing data

    Collaboration

    Version control

    Storing Your Files in the Cloud: GitHub

    Setting up GitHub: Basic

    Version control with Git

    Remote storage on GitHub

    Accessing on GitHub

    Summing up the GitHub workflow

    RStudio & GitHub

    Setting up Git/GitHub with Projects

    Using Git in RStudio Projects

    6 Gathering Data with R

    Organize Your Data Gathering: Makefiles

    R Make-like files

    GNU Make

    Importing Locally Stored Data Sets

    Importing Data Sets from the Internet

    Data from non-secure (http) URLs

    Data from secure (https) URLs

    Compressed data stored online

    Data APIs & feeds

    Advanced Automatic Data Gathering: Web Scraping

    7 Preparing Data for Analysis

    Cleaning Data for Merging

    Get a handle on your data

    Reshaping data

    Renaming variables

    Ordering data

    Subsetting data

    Recoding string/numeric variables

    Creating new variables from old

    Changing variable types

    Merging Data Sets

    Binding

    Merging data frames

    Duplicate columns

    8 Statistical Modeling and knitr/R Markdown

    Incorporating Analyses into the Markup

    Full code chunks

    Showing code & results inline

    Dynamically including non-R code in code chunks

    Dynamically Including Modular Analysis Files

    Source from a local file

    Source from a URL

    Reproducibly Random: set.seed()

    Computationally Intensive Analyses

    9 Showing Results with Tables

    Basic knitr Syntax for Tables

    Table Basics

    Tables in LaTeX

    Tables in Markdown/HTML

    Creating Tables from Supported Class R Objects

    kable for Markdown and LaTeX

    xtable for LaTeX and HTML

    Fitting Large Tables in LaTeX

    xtable with non-supported class objects

    Creating variable description documents with xtable

    10 Showing Results with Figures

    Including Non-knitted Graphics

    Including graphics in LaTeX

    Including graphics in Markdown/HTML

    Non-knitted graphics with knitr/rmarkdown

    Basic knitr/rmarkdown Figure Options

    Chunk options

    Global options

    Knitting R’s Default Graphics

    Including ggplot Graphics

    Showing regression results with caterpillar plots

    JavaScript Graphs with googleVis

    Basic googleVis figures

    Including googleVis in knitted documents

    JavaScript Graphs with htmlwidgets-based packages

    11 Presenting with LaTeX

    The Basics

    Getting started with LaTeX editors

    Basic LaTeX command syntax

    The LaTeX preamble & body

    Headings

    Paragraphs & spacing

    Horizontal lines

    Text formatting

    Math

    Lists

    Footnotes

    Cross-references

    Bibliographies with BibTeX

    The bib file

    Including citations in LaTeX documents

    Generating a BibTeX file of R package citations

    Presentations with LaTeX Beamer

    Beamer basics

    knitr with LaTeX slideshows

    12 Presenting in a Variety of Formats with R Markdown

    The Basics

    Getting started with Markdown editors

    Preamble and document structure

    Headings

    Horizontal lines

    Paragraphs and new lines

    Italics and bold

    Links

    Lists

    Math with MathJax

    Further Customizability with rmarkdown

    CSS style files and Markdown

    Slideshows with Markdown, R Markdown, and HTML

    HTML Slideshows with rmarkdown

    LaTeX Beamer Slideshows with rmarkdown

    Slideshows with Markdown and RStudio’s R Presentations

    Publishing HTML Documents Created with R Markdown

    Further information on R Markdown

    13 Conclusion

    Citing Reproducible Research

    Licensing Your Reproducible Research

    Sharing Your Code in Packages

    Project Development: Public or Private?

    Is it Possible to Completely Future-Proof Your Research?

    Biography

    Christopher Gandrud is Head of Economics and Experimentation at Zalando SE, where he leads teams of social data scientists and software engineers building large-scale automated decision-making systems. He was previously a research fellow at the Institute for Quantitative Social Science at Harvard University, developing statistical software for the social and physical sciences. He has published many articles in peer-reviewed journals, including the Journal of Common Market Studies, Review of International Political Economy, Political Science Research and Methods, Journal of Statistical Software, and International Political Science Review. He earned a PhD in quantitative political science from the London School of Economics.

    I recommend this book for students studying statistical sciences, individuals beginning their research career, and advanced researchers looking to up their reproducibility game. I am thrilled to have this resource for my own lab and intend on having my students follow the recommendations within closely.

    - Lucy D’Agostino McGowan, Biometrics, 2020, Volume 76, Issue 4

    In summary, I found this book to be a very good introduction to R and reproducible research, one that I can certainly recommend.

    - Anikó Lovik, International Society for Clinical Biostatistics, June 2021, Number 71