Data Science in R

A Case Studies Approach to Computational Reasoning and Problem Solving, 1st Edition

By Deborah Nolan, Duncan Temple Lang

Chapman and Hall/CRC

539 pages | 79 B/W Illus.

Purchasing Options:
Paperback: 9781482234817
pub: 2015-04-21
Hardback: 9781138469297
pub: 2017-11-15
eBook (VitalSource): 9781498759878
pub: 2015-09-15


Effectively Access, Transform, Manipulate, Visualize, and Reason about Data and Computation

Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving illustrates the details involved in solving real computational problems encountered in data analysis. It reveals the dynamic and iterative process by which data analysts approach a problem and reason about different ways of implementing solutions.

The book’s collection of projects, comprehensive sample solutions, and follow-up exercises encompass practical topics pertaining to data processing, including:

  • Non-standard, complex data formats, such as robot logs and email messages
  • Text processing and regular expressions
  • Newer technologies, such as Web scraping, Web services, Keyhole Markup Language (KML), and Google Earth
  • Statistical methods, such as classification trees, k-nearest neighbors, and naïve Bayes
  • Visualization and exploratory data analysis
  • Relational databases and Structured Query Language (SQL)
  • Simulation
  • Algorithm implementation
  • Large data and efficiency

Suitable for self-study or as supplementary reading in a statistical computing course, the book enables instructors to incorporate interesting problems into their courses so that students gain valuable experience and data science skills. Students learn how to acquire and work with unstructured or semistructured data as well as how to narrow down and carefully frame the questions of interest about the data.

Blending computational details with statistical and data analysis concepts, this book provides readers with an understanding of how professional data scientists think about daily computational tasks. It will improve readers’ ability to reason computationally about real-world data analyses.

Table of Contents

Data Manipulation and Modeling

Predicting Location via Indoor Positioning Systems (Deborah Nolan and Duncan Temple Lang)

Modeling Runners’ Times in the Cherry Blossom Race (Daniel Kaplan and Deborah Nolan)

Using Statistics to Identify Spam (Deborah Nolan and Duncan Temple Lang)

Processing Robot and Sensor Log Files: Seeking a Circular Target (Samuel E. Buttrey, Timothy H. Chung, James N. Eagle, and Duncan W. Temple Lang)

Strategies for Analyzing a 12 Gigabyte Data Set: Airline Flight Delays (Michael Kane)

Simulation Studies

Pairs Trading (Cari Kaufman and Duncan Temple Lang)

Simulation Study of a Branching Process (Deborah Nolan and Duncan Temple Lang)

A Self-Organizing Dynamic System with a Phase Transition (Deborah Nolan and Duncan Temple Lang)

Simulating Blackjack (Hadley Wickham)

Data- and Web-Technologies

Baseball: Exploring Data in a Relational Database (Deborah Nolan and Duncan Temple Lang)

CIA Factbook Mashup (Deborah Nolan and Duncan Temple Lang)

Exploring Data Science Jobs with Web Scraping and Text Mining (Deborah Nolan and Duncan Temple Lang)


Exercises appear at the end of most chapters.

About the Authors

Deborah Nolan holds the Zaffaroni Family Chair in Undergraduate Education at the University of California, Berkeley. She is a fellow of the American Statistical Association and the Institute of Mathematical Statistics. Her research has involved the empirical process, high-dimensional modeling, and, more recently, technology in education and reproducible research.

Duncan Temple Lang is the director of the Data Science Initiative at the University of California, Davis. He has been involved in the development of R and S for 20 years and has developed over 100 R packages. His research focuses on statistical computing, data technologies, meta-computing, reproducibility, and visualization.

About the Series

Chapman & Hall/CRC The R Series

Subject Categories

BISAC Subject Codes/Headings:
COMPUTERS / Database Management / Data Mining
MATHEMATICS / Probability & Statistics / General