1st Edition

Probability, Statistics, and Data: A Fresh Approach Using R

By Darrin Speegle, Bryan Clair
Copyright 2022
    512 Pages 130 Color & 84 B/W Illustrations
    by Chapman & Hall

    This book is a fresh approach to a calculus-based first course in probability and statistics, using R throughout to give a central role to data and simulation.

    The book introduces probability with Monte Carlo simulation as an essential tool. Simulation makes challenging probability questions quickly accessible and easily understandable. Mathematical approaches are included, using calculus when appropriate, but are always connected to experimental computations.
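
    As a rough illustration of the simulation-first idea described above (a minimal sketch, not an excerpt from the book), the following R code estimates the probability that two fair dice sum to 7 and compares the estimate with the exact value 1/6. The seed, the number of trials, and all variable names are assumptions chosen for the example.

```r
# Estimate P(sum of two fair dice = 7) by Monte Carlo simulation.
set.seed(1)                 # arbitrary seed, used only for reproducibility
n_trials <- 100000          # number of simulated rolls (assumed value)

die1 <- sample(1:6, n_trials, replace = TRUE)  # first die
die2 <- sample(1:6, n_trials, replace = TRUE)  # second die

# The proportion of trials in which the event occurs estimates its probability.
estimate <- mean(die1 + die2 == 7)
exact <- 1 / 6              # 6 favorable outcomes out of 36

print(c(estimate = estimate, exact = exact))
```

    Increasing n_trials tightens the estimate around the exact value, which is the sense in which the experimental computation and the mathematical answer are connected.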

    Using R and simulation gives a nuanced understanding of statistical inference. The impact of departure from assumptions in statistical tests is emphasized, quantified using simulations, and demonstrated with real data. The book compares parametric and non-parametric methods through simulation, allowing for a thorough investigation of testing error and power. The text builds R skills from the outset, allowing modern methods of resampling and cross validation to be introduced along with traditional statistical techniques.
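
    To make the idea of quantifying departures from assumptions concrete, here is a hedged sketch (again, not taken from the book) that simulates the type I error rate of a one-sample t-test when the data are exponential rather than normal. The sample size, number of simulations, and significance level are assumed values.

```r
# Estimate the type I error rate of a one-sample t-test on skewed data.
set.seed(1)                 # arbitrary seed, used only for reproducibility
n_sims <- 10000             # number of simulated experiments (assumed value)
n <- 10                     # sample size per experiment (assumed value)
alpha <- 0.05               # nominal significance level

# Exponential data with rate 1 has true mean 1, so H0: mu = 1 is true
# and every rejection is a type I error.
p_values <- replicate(n_sims, t.test(rexp(n, rate = 1), mu = 1)$p.value)

type1_rate <- mean(p_values < alpha)  # simulated rejection rate under H0
print(type1_rate)
```

    Comparing type1_rate with alpha shows how far the test's actual error rate drifts from its nominal level when the normality assumption fails; swapping rexp for rnorm gives the baseline case in which the assumption holds.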

    Fifty-two data sets are included in the complementary R package fosdata. Most of these data sets are from recently published papers, so that you are working with current, real data, which is often large and messy. Two central chapters use powerful tidyverse tools (dplyr, ggplot2, tidyr, stringr) to wrangle data and produce meaningful visualizations. Preliminary versions of the book have been used for five semesters at Saint Louis University, and the majority of the more than 400 exercises have been classroom tested.

    1. Data in R. 2. Probability. 3. Discrete Random Variables. 4. Continuous Random Variables. 5. Simulation of Random Variables. 6. Data Manipulation. 7. Data Visualization with ggplot. 8. Inference on the Mean. 9. Rank Based Tests. 10. Tabular Data. 11. Simple Linear Regression. 12. Analysis of Variance and Comparison of Multiple Groups. 13. Multiple Regression.


    Darrin Speegle has 25 years of experience teaching probability and statistics at Saint Louis University, where he is a Professor and the Director of Data Science. He served as the program committee chair on the organizing team for UseR!2020 in St. Louis. His research has been supported by the National Science Foundation and the Simons Foundation.

    Bryan Clair is the Chair of the Mathematics and Statistics Department at Saint Louis University. His research is in topology and combinatorics. His work writing mathematics for general audiences has appeared in the New York Times, Washington Post, Math Horizons, and the SF magazine Strange Horizons.

    "Overall, this textbook is an excellent resource for working with applied statistics in R. As an instructor in a statistics department, I would highly recommend it as a complementary text for any student with an interest in either statistical inference or R; the combination works great together."
    - Scott A. Roths in The American Statistician, November 2022

    "This book provides a new approach to teaching classical statistics in data science flavour without offering many mathematical details of the results and their proofs. Still, it gives an excellent exposition on how to teach statistics to those who are more interested in the applications of statistical tools but do not have a strong background in mathematics. It considers many numerical examples and explains the concepts using simulations, a strength of the book."
    - Shalabh in Journal of the Royal Statistical Society Series A, September 2022

    "After reading the book, I think that book is an excellent choice for applied statisticians, data analysts, and data scientists. This book permits the study of data while developing a deeper comprehension of the R programming language. The examples are frequently repeated with modest modifications in R so that the reader may observe how the results vary."
    - Luca Bertolaccini in ISCB News, June 2022

    "The manuscript is technically correct, clearly written and appropriate for a first course in probability and statistics and also a second course. The strengths are that it is relevant, modern, and uses R. It has a lot of sample problems and they are more modern than is typical of other texts."
    - Kathy Gray, Cal State University Chico

    "I have been already employing the online version of this book as a reference during the classes I taught in 2020. It is an excellent complement to the material that I have been using in my lectures along the last years. I do not know any similar text, which introduces R from scratch, supplying at the same time a simulation-oriented probability course in R. I believe it represents a major contribution to the existing literature."
    - Mariela Sued, Universidad de Buenos Aires Argentina

    "I think the manuscript is technically correct, clearly written, and at an appropriate level of difficulty. A particular strength of the book is that it is fully integrated with R. I believe students will benefit from this feature by learning statistical techniques and practicing R simultaneously."
    - Haomiao Jin, University of Southern California

    "The department is working on setting up a data science minor…I could see such a book being used as the textbook for a one semester course in probability and statistics. It would be appropriate for such a course, where the theory is somewhat deemphasized and simulations are used sometimes to justify the theorems rather than formal proofs."
    - Daniel Chambers, Boston College

    "I would adopt this book for my class. I like that it is so easy to read and provides some of the theory and derivations that the students need (not quite as much as would be ideal, but when combined with the coverage of the material during class periods this would be ok) but also has a heavy emphasis on the practical applications of the material. It can be very difficult to explain to students how and why the material matters and can be applied, having that built into the textbook is a hugely useful resource. I can teach my students how to work through the derivation of the pmf for a distribution or look up values in a z-table, but being able to engage them with current problems and interesting data sets is a much larger task."
    - Erin Garcia, Auburn University

    "I think quantitatively skilled students who would be bored in a basic intro course but don’t need the full theory are the right audience. This is about the level I would want to use with the engineering-bound students."
    - Aimee Schwab-McCoy, Technology Sligo, Ireland

    "I would use this book. It is a good beginning book for students who want to learn about probability simulation applications in Statistics…Someone with one semester of calculus would be fine for almost everything presented in the book…This book fills a need at the undergraduate level as I am not aware of any good book existing."
    - Eric Suess, California State University, East Bay