
Using R and RStudio for Data Management, Statistical Analysis, and Graphics

2nd Edition

By Nicholas J. Horton, Ken Kleinman

Chapman and Hall/CRC

313 pages | 50 B/W Illus.

Hardback: 9781482237368
pub: 2015-03-10
eBook (VitalSource): 9780429161568
pub: 2015-03-10


Improve Your Analytical Skills

Incorporating the latest R packages as well as new case studies and applications, Using R and RStudio for Data Management, Statistical Analysis, and Graphics, Second Edition covers the aspects of R most often used by statistical analysts. New users of R will find the book’s simple approach easy to understand while more sophisticated users will appreciate the invaluable source of task-oriented information.

New to the Second Edition

  • The use of RStudio, which increases the productivity of R users and helps users avoid error-prone cut-and-paste workflows
  • New chapter of case studies illustrating examples of useful data management tasks, reading complex files, making and annotating maps, "scraping" data from the web, mining text files, and generating dynamic graphics
  • New chapter on special topics that describes key features, such as processing by group, and explores important areas of statistics, including Bayesian methods, propensity scores, and bootstrapping
  • New chapter on simulation that includes examples of data generated from complex models and distributions
  • A detailed discussion of the philosophy and use of the knitr and markdown packages for R
  • New packages that extend the functionality of R and facilitate sophisticated analyses
  • Reorganized and enhanced chapters on data input and output, data management, statistical and mathematical functions, programming, high-level graphics plots, and the customization of plots

Easily Find Your Desired Task

Conveniently organized by short, clear descriptive entries, this edition continues to show users how to easily perform an analytical task in R. Users can quickly find and implement the material they need through the extensive indexing, cross-referencing, and worked examples in the text. Datasets and code are available for download on a supplementary website.


"The second edition of the book preserves the many good points of the first, and makes some improvements to the structure, e.g., on the graphical compendium. It also contains added material on more recent possibilities…is a good buy, if the goal is to have a reference book which allows to quickly find a way of accomplishing a task at hand in R, be it with or without RStudio."

— Ulrike Grömping, Beuth University of Applied Sciences Berlin, Journal of Statistical Software, November 2015

"… the book is easy to use. I have had it on my desk for the past few weeks and it has become invaluable. For those, like me, who find themselves regularly switching between R, MATLAB, and Python—or similar packages—it can save a lot of time."

Significance Magazine, February 2016

Praise for the First Edition:

This book is an excellent reference resource. Used this way, it can be helpful for years to come for both experienced and novice users. The organization of the material makes it easy to find the relevant piece of information either by topic (from the table of contents) or using one of the indexes. The task entries are self-contained. Users with experience in technical computing may use it as a quick starter in R, as well.

—Georgi N. Boshnakov, Journal of Applied Statistics, June 2012

This book provides a concise reference and annotated examples for R … . It is needed because R does not come with a coordinated manual … It is much easier to find information in Horton and Kleinman’s book because of their more detailed indices and table of contents. … Horton and Kleinman have succeeded very well in their goal of providing a concise reference manual and annotated examples. If you know the statistics (or can look them up) and have some experience using R, it is an extremely useful reference, and it has become my most consulted R book. … it would be an excellent reference for those wanting look up the syntax of a command together with an example of how to use it. It is also very useful if you cannot remember the command and want to know how to do it in R.

—Paul H. Geissler, The American Statistician, November 2011

The interesting aspect of the book is that it does not only describe the basic statistics and graphics function of the basic R system but it describes the use of 40 additional [packages] available from the CRAN website. The website contains also the R code to install all the packages that contain the described features. In summary, the book is a useful complement to introductory statistics books and lectures … Those who know R might get additional hints on new features of statistical analyses.

International Statistical Review (2011), 79

Table of Contents

Data Input and Output



Further resources

Data Management

Structure and metadata

Derived variables and data manipulation

Merging, combining, and subsetting datasets

Date and time variables

Further resources


Statistical and Mathematical Functions

Probability distributions and random number generation

Mathematical functions

Matrix operations


Programming and Operating System Interface

Control flow, programming, and data generation


Interactions with the operating system

Common Statistical Procedures

Summary statistics

Bivariate statistics

Contingency tables

Tests for continuous variables

Analytic power and sample size calculations

Further resources


Linear Regression and ANOVA

Model fitting

Tests, contrasts, and linear functions of parameters

Model results and diagnostics

Model parameters and results

Further resources


Regression Generalizations and Modeling

Generalized linear models

Further generalizations

Robust methods

Models for correlated data

Survival analysis

Multivariate statistics and discriminant procedures

Complex survey design

Model selection and assessment

Further resources


A Graphical Compendium

Univariate plots

Univariate plots by grouping variable

Bivariate plots

Multivariate plots

Special-purpose plots

Further resources


Graphical Options and Configuration

Adding elements

Options and parameters

Saving graphs


Simulation

Generating data

Simulation applications

Further resources

Special Topics

Processing by group

Simulation-based power calculations

Reproducible analysis and output

Advanced statistical methods

Further resources

Case Studies

Data management and related tasks

Read variable format files

Plotting maps

Data scraping

Text mining

Interactive visualization

Manipulating bigger datasets

Constrained optimization: the knapsack problem

Appendix A: Introduction to R and RStudio

Appendix B: The HELP Study Dataset

Appendix C: References

Appendix D: Indices

About the Authors

Nicholas J. Horton is a professor of statistics at Amherst College. His research interests include longitudinal regression models and missing data methods, with applications in psychiatric epidemiology and substance abuse research.

Ken Kleinman is an associate professor in the Department of Population Medicine at Harvard Medical School. His research deals with clustered data analysis, surveillance, and epidemiological applications in projects ranging from vaccine and bioterrorism surveillance to observational epidemiology to individual-, practice-, and community-randomized interventions.

Subject Categories

BISAC Subject Codes/Headings:
MATHEMATICS / Probability & Statistics / General