Reproducible Research with R and RStudio

3rd Edition

By Christopher Gandrud

Chapman and Hall/CRC

272 pages

Purchasing Options ($ = USD)
Paperback: 9780367143985
pub: 2020-02-03
Available for pre-order. Item will ship after 3rd February 2020
$69.95
Hardback: 9780367144029
pub: 2020-02-03
Available for pre-order. Item will ship after 3rd February 2020
$175.00


Description

Praise for previous editions:

"Gandrud has written a great outline of how a fully reproducible research project should look from start to finish, with brief explanations of each tool that he uses along the way… Advanced undergraduate students in mathematics, statistics, and similar fields as well as students just beginning their graduate studies would benefit the most from reading this book. Many more experienced R users or second-year graduate students might find themselves thinking, ‘I wish I’d read this book at the start of my studies, when I was first learning R!’…This book could be used as the main text for a class on reproducible research …" (The American Statistician)

Reproducible Research with R and RStudio, Third Edition brings together the skills and tools needed for doing and presenting computational research. Using straightforward examples, the book takes you through an entire reproducible research workflow. This practical workflow enables you to gather and analyze data as well as dynamically present results in print and on the web. Supplementary materials and examples are available on the author’s website.

New to the Third Edition

  • Updated package recommendations, examples, and URLs, and removed technologies no longer in regular use.
  • More advanced R Markdown (and less LaTeX) in discussions of markup languages and examples.
  • Stronger focus on reproducible working directory tools.
  • Updated discussion of cloud storage services and persistent reproducible material citation.
  • Added discussion of Jupyter notebooks and reproducible practices in industry.
  • Examples of data manipulation with Tidyverse tibbles (in addition to standard data frames) and pivot_longer() and pivot_wider() functions for pivoting data.

Features

  • Incorporates the most important advances that have been developed since the previous editions were published
  • Describes a complete reproducible research workflow, from data gathering to the presentation of results
  • Shows how to automatically generate tables and figures using R
  • Includes instructions on formatting a presentation document via markup languages
  • Discusses cloud storage and versioning services, particularly GitHub
  • Explains how to use Unix-like shell programs for working with large research projects

Table of Contents

I Getting Started

1 Introducing Reproducible Research

What Is Reproducible Research?

Why Should Research Be Reproducible?

For science

For you

Who Should Read This Book?

Academic researchers

Students

Instructors

Editors

Private sector researchers

The Tools of Reproducible Research

Why Use R, knitr/R Markdown, and RStudio for Reproducible Research?

Installing the main software

Installing markup languages

GNU Make

Other Tools

Book Overview

How to read this book

Reproduce this book

Contents overview

2 Getting Started with Reproducible Research

The Big Picture: A Workflow for Reproducible Research

Reproducible theory

Practical Tips for Reproducible Research

Document everything!

Everything is a (text) file

All files should be human readable

Explicitly tie your files together

Have a plan to organize, store, and make your files available

3 Getting Started with R, RStudio, and knitr/R Markdown

Using R: The Basics

Objects

Functions

The workspace & history

R history

Global R options

Installing new packages and loading functions

Using RStudio

Using knitr and R Markdown: The basics

What knitr does

What rmarkdown does

File extensions

Code chunks

Global chunk options

knitr package options

Hooks

knitr, R Markdown, & RStudio

knitr & R

R Markdown and R

4 Getting Started with File Management

File Paths & Naming Conventions

Root directories

Sub-directories & parent directories

Working directories

Absolute vs relative paths

Spaces in directory & file names

Organizing Your Research Project

Organizing Research with RStudio Projects

R File Manipulation Functions

Unix-like Shell Commands for File Management

File Navigation in RStudio

II Data Gathering and Storage

5 Storing, Collaborating, Accessing Files, and Versioning

Saving Data in Reproducible Formats

Storing Your Files in the Cloud: Dropbox

Storage

Accessing data

Collaboration

Version control

Storing Your Files in the Cloud: GitHub

Setting up GitHub: Basic

Version control with Git

Remote storage on GitHub

Accessing on GitHub

Summing up the GitHub workflow

RStudio & GitHub

Setting up Git/GitHub with Projects

Using Git in RStudio Projects

6 Gathering Data with R

Organize Your Data Gathering: Makefiles

R Make-like files

GNU Make

Importing Locally Stored Data Sets

Importing Data Sets from the Internet

Data from non-secure (http) URLs

Data from secure (https) URLs

Compressed data stored online

Data APIs & feeds

Advanced Automatic Data Gathering: Web Scraping

7 Preparing Data for Analysis

Cleaning Data for Merging

Get a handle on your data

Reshaping data

Renaming variables

Ordering data

Subsetting data

Recoding string/numeric variables

Creating new variables from old

Changing variable types

Merging Data Sets

Binding

Merging data frames

Duplicate columns

8 Statistical Modeling and knitr/R Markdown

Incorporating Analyses into the Markup

Full code chunks

Showing code & results inline

Dynamically including non-R code in code chunks

Dynamically Including Modular Analysis Files

Source from a local file

Source from a URL

Reproducibly Random: set.seed()

Computationally Intensive Analyses

9 Showing Results with Tables

Basic knitr Syntax for Tables

Table Basics

Tables in LaTeX

Tables in Markdown/HTML

Creating Tables from Supported Class R Objects

kable for Markdown and LaTeX

xtable for LaTeX and HTML

Fitting Large Tables in LaTeX

xtable with non-supported class objects

Creating variable description documents with xtable

10 Showing Results with Figures

Including Non-knitted Graphics

Including graphics in LaTeX

Including graphics in Markdown/HTML

Non-knitted graphics with knitr/rmarkdown

Basic knitr/rmarkdown Figure Options

Chunk options

Global options

Knitting R’s Default Graphics

Including ggplot Graphics

Showing regression results with caterpillar plots

JavaScript Graphs with googleVis

Basic googleVis figures

Including googleVis in knitted documents

JavaScript Graphs with htmlwidgets-based packages

11 Presenting with LaTeX

The Basics

Getting started with LaTeX editors

Basic LaTeX command syntax

The LaTeX preamble & body

Headings

Paragraphs & spacing

Horizontal lines

Text formatting

Math

Lists

Footnotes

Cross-references

Bibliographies with BibTeX

The bib file

Including citations in LaTeX documents

Generating a BibTeX file of R package citations

Presentations with LaTeX Beamer

Beamer basics

knitr with LaTeX slideshows

12 Presenting in a Variety of Formats with R Markdown

The Basics

Getting started with Markdown editors

Preamble and document structure

Headings

Horizontal lines

Paragraphs and new lines

Italics and bold

Links

Lists

Math with MathJax

Further Customizability with rmarkdown

CSS style files and Markdown

Slideshows with Markdown, R Markdown, and HTML

HTML Slideshows with rmarkdown

LaTeX Beamer Slideshows with rmarkdown

Slideshows with Markdown and RStudio’s R Presentations

Publishing HTML Documents Created with R Markdown

Further information on R Markdown

13 Conclusion

Citing Reproducible Research

Licensing Your Reproducible Research

Sharing Your Code in Packages

Project Development: Public or Private?

Is it Possible to Completely Future-Proof Your Research?

About the Author

Christopher Gandrud is Head of Economics and Experimentation at Zalando SE, where he leads teams of social data scientists and software engineers building large-scale automated decision-making systems. He was previously a research fellow at the Institute for Quantitative Social Science, Harvard University, developing statistical software for the social and physical sciences. He has published many articles in peer-reviewed journals, including the Journal of Common Market Studies, Review of International Political Economy, Political Science Research and Methods, Journal of Statistical Software, and International Political Science Review. He earned a PhD in quantitative political science from the London School of Economics.

About the Series

Chapman & Hall/CRC The R Series

Subject Categories

BISAC Subject Codes/Headings:
BUS061000
BUSINESS & ECONOMICS / Statistics
MAT029000
MATHEMATICS / Probability & Statistics / General