1st Edition

Probability, Statistics, and Data
A Fresh Approach Using R

ISBN 9780367436674
Published November 26, 2021 by Chapman and Hall/CRC
512 Pages 130 Color & 84 B/W Illustrations



Book Description

This book is a fresh approach to a calculus-based first course in probability and statistics, using R throughout to give a central role to data and simulation.

The book introduces probability with Monte Carlo simulation as an essential tool. Simulation makes challenging probability questions quickly accessible and easily understandable. Mathematical approaches are included, using calculus when appropriate, but are always connected to experimental computations.

Using R and simulation gives a nuanced understanding of statistical inference. The impact of departure from assumptions in statistical tests is emphasized, quantified using simulations, and demonstrated with real data. The book compares parametric and non-parametric methods through simulation, allowing for a thorough investigation of testing error and power. The text builds R skills from the outset, allowing modern methods of resampling and cross-validation to be introduced along with traditional statistical techniques.

Fifty-two data sets are included in the complementary R package fosdata. Most of these data sets are from recently published papers, so that you are working with current, real data, which is often large and messy. Two central chapters use powerful tidyverse tools (dplyr, ggplot2, tidyr, stringr) to wrangle data and produce meaningful visualizations. Preliminary versions of the book have been used for five semesters at Saint Louis University, and the majority of the more than 400 exercises have been classroom tested.

Table of Contents

1. Data in R. 2. Probability. 3. Discrete Random Variables. 4. Continuous Random Variables. 5. Simulation of Random Variables. 6. Data Manipulation. 7. Data Visualization with ggplot. 8. Inference on the Mean. 9. Rank Based Tests. 10. Tabular Data. 11. Simple Linear Regression. 12. Analysis of Variance and Comparison of Multiple Groups. 13. Multiple Regression.




Darrin Speegle has 25 years of experience teaching probability and statistics at Saint Louis University, where he is a Professor and the Director of Data Science. He served as the program committee chair on the organizing team for useR! 2020 in St. Louis. His research has been supported by the National Science Foundation and the Simons Foundation.

Bryan Clair is the Chair of the Mathematics and Statistics Department at Saint Louis University. His research is in topology and combinatorics. His work writing mathematics for general audiences has appeared in the New York Times, Washington Post, Math Horizons, and the SF magazine Strange Horizons.


"The manuscript is technically correct, clearly written and appropriate for a first course in probability and statistics and also a second course. The strengths are that it is relevant, modern, and uses R. It has a lot of sample problems and they are more modern than is typical of other texts."
~Kathy Gray, Cal State University Chico

"I have been already employing the online version of this book as a reference during the classes I taught in 2020. It is an excellent complement to the material that I have been using in my lectures along the last years. I do not know any similar text, which introduces R from scratch, supplying at the same time a simulation-oriented probability course in R. I believe it represents a major contribution to the existing literature."
~Mariela Sued, Universidad de Buenos Aires, Argentina

"I think the manuscript is technically correct, clearly written, and at an appropriate level of difficulty. A particular strength of the book is that it is fully integrated with R. I believe students will benefit from this feature by learning statistical techniques and practicing R simultaneously."
~Haomiao Jin, University of Southern California

"The department is working on setting up a data science minor…I could see such a book being used as the textbook for a one semester course in probability and statistics. It would be appropriate for such a course, where the theory is somewhat deemphasized and simulations are used sometimes to justify the theorems rather than formal proofs."
~Daniel Chambers, Boston College

"I would adopt this book for my class. I like that it is so easy to read and provides some of the theory and derivations that the students need (not quite as much as would be ideal, but when combined with the coverage of the material during class periods this would be ok) but also has a heavy emphasis on the practical applications of the material. It can be very difficult to explain to students how and why the material matters and can be applied, having that built into the textbook is a hugely useful resource. I can teach my students how to work through the derivation of the pmf for a distribution or look up values in a z-table, but being able to engage them with current problems and interesting data sets is a much larger task."
~Erin Garcia, Auburn University

"I think quantitatively skilled students who would be bored in a basic intro course but don’t need the full theory are the right audience. This is about the level I would want to use with the engineering-bound students."
~Aimee Schwab-McCoy, Technology Sligo, Ireland

"I would use this book. It is a good beginning book for students who want to learn about probability simulation applications in Statistics…Someone with one semester of calculus would be fine for almost everything presented in the book…This book fills a need at the undergraduate level as I am not aware of any good book existing."
~Eric Suess, California State University, East Bay