3rd Edition

Analyzing Baseball Data with R

    418 Pages 56 Color & 32 B/W Illustrations
    by Chapman & Hall

    “Our community has continued to grow exponentially, thanks to those who inspire the next generation. And inspiring the next generation is what the authors of Analyzing Baseball Data with R are doing. They are setting the career path for still thousands more. We all need some sort of kickstart to take that first or second step. You may be a beginner R coder, but you need access to baseball data. How do you access this data, how do you manipulate it, how do you analyze it? This is what this book does for you. But it does more, by doing what sabermetrics does best: it asks baseball questions. Throughout the book, baseball questions are asked, some straightforward, and others more thought-provoking.”

    - From the Foreword by Tom Tango

    Analyzing Baseball Data with R, Third Edition introduces R to sabermetricians, baseball enthusiasts, and students interested in exploring the richness of baseball data. It equips you with the skills and software tools needed to perform every step of an analysis: importing the data, transforming them into an appropriate format, visualizing them with graphs, and performing a statistical analysis.

    The authors first present an overview of publicly available baseball datasets and a gentle introduction to the types of data structures and the exploratory and data management capabilities of R. They also cover the ggplot2 graphics functions and employ a tidyverse-friendly workflow throughout. Much of the book illustrates the use of R through popular sabermetrics topics, including the Pythagorean formula, run expectancy, catcher framing, career trajectories, simulation of games and seasons, patterns of streaky behavior of players, and launch angles and exit velocities. All the datasets and R code used in the text are available for download online.

    New to the third edition is revised R code that makes use of new functions available through the tidyverse. The third edition also introduces three chapters of new material, covering how to communicate results through presentations built with the Quarto publishing system, how to build web applications with the Shiny package, and how to work with large data files. An online version of this book is hosted at https://beanumber.github.io/abdwr3e/.

    Foreword

    Preface

    1. The Baseball Datasets

    2. Introduction to R

    3. Graphics

    4. The Relation Between Runs and Wins

    5. Value of Plays Using Run Expectancy

    6. Balls and Strikes Effects

    7. Catcher Framing

    8. Career Trajectories

    9. Simulation

    10. Exploring Streaky Performances

    11. Using a Database to Compute Park Factors

    12. Working with Large Data

    13. Home Run Hitting

    14. Making a Scientific Presentation using Quarto

    15. Using Shiny for Baseball Applications

    Appendices

    A. Retrosheet Files Reference

    B. Historical Notes on PITCHf/x Data

    C. Statcast Data Reference

    References

    Indices

    Subject index

    R index

    Biography

    Jim Albert is a Distinguished University Professor of Statistics at Bowling Green State University. He has authored or co-authored several books including Curve Ball and Visualizing Baseball and was the editor of the Journal of Quantitative Analysis of Sports. He received the Significant Contributor to Statistics in Sports award in 2003 from the Section of Statistics in Sports of the American Statistical Association.

    Ben Baumer is a Professor of Statistical and Data Sciences at Smith College. Previously a statistical analyst for the New York Mets, he is a co-author of The Sabermetric Revolution and Modern Data Science with R. He has received the Waller Education Award from the ASA Section on Statistics and Data Science Education, the Significant Contributor Award from the ASA Section on Statistics in Sports, and the Contemporary Baseball Analysis Award from the Society for American Baseball Research.

    Max Marchi is a Baseball Analytics Analyst for the Cleveland Indians. He was a regular contributor to The Hardball Times and Baseball Prospectus websites and previously consulted for other MLB clubs.