Statistics Author of the Month - January: Jim Albert
CRC Press is pleased to share with you our author Q&A session with Jim Albert!
Analyzing Baseball Data with R, Second Edition
Author(s): Max Marchi, Jim Albert, Benjamin S. Baumer
Cat. #: K346109
Publication Date: December 04, 2018
Q&A with Jim Albert
What inspired you to write Visualizing Baseball?
The main point of Visualizing Baseball was to communicate interesting patterns in baseball through the use of graphs. It seems that tables rather than graphs are typically used to communicate statistics about players and teams, and I think graphs can be used to describe interesting discoveries that are less obvious in tables of numbers.
Nowadays we have the opportunity to work with large datasets, and in the future it will be easier to gain access to massive datasets through the open data initiative. For example, large crime datasets are currently available for many of the large cities in the U.S. These massive datasets present many challenges from a statistical perspective. One needs to download, clean, and manipulate this data (so-called data wrangling) to get it into a format that is easier to analyse. In addition, we have to modify our statistical methods to accommodate these large datasets.
As in many other disciplines, we are collecting sports data in more detailed ways, and this new data leads to opportunities for analysis. In the past, we only collected baseball data at the game or player level. Currently, we collect a variety of variables on each pitch that is thrown in a game. Teams have access to the locations of each player on a baseball field every split second. This new baseball data leads to challenges and opportunities for analysis. Both Visualizing Baseball and the 2nd Edition of Analyzing Baseball Data with R illustrate how to show patterns using these new types of baseball data.
Visualizing Baseball is written for two potential audiences. The first is baseball fans who want to learn about baseball from a statistical perspective. The second is statisticians or other math-oriented fans who are interested in graphing data using a modern graphics system such as ggplot2. The 2nd Edition of Analyzing Baseball Data with R is written for the mathematically oriented baseball fan who would like to use the R statistical system to perform his or her own baseball studies. In fact, one can use the books to learn data science methods in an interesting context, namely baseball.
Both Visualizing Baseball and the 2nd Edition of Analyzing Baseball Data with R are unique in illustrating the use of statistical thinking and graphical displays to learn more about baseball.
One thing that is fascinating is that with the newer types of baseball data, one can discover some interesting player performances that really were not well known. For example, a baseball catcher has the ability to adjust how an umpire calls a pitch (either ball or strike) by the way he catches the ball. This “pitch-framing” is now considered to be an important skill of a catcher.
What did you enjoy about writing these books?
The famous statistician John Tukey once said that one of the neat things about being a statistician is that you can play in other people’s backyards. That is, one has the opportunity to work with people from a wide variety of disciplines. I actually can play in my own backyard – that is, I am able to write and do statistical work in a sport (baseball) that I have loved my whole life.
I was a math major at Bucknell University and received my doctorate in statistics from Purdue University.
I am a Bayesian statistician and I tend to view modelling from a Bayesian perspective. Due to the advances in computing, Bayesian methods are much easier to implement. For many years I have had the desire to introduce Bayesian thinking at the undergraduate level. And of course, my work is innovative since I like the opportunity to work on Bayesian modelling for problems in sports.
I was very fortunate to have Jim Berger, who is one of the leading researchers in Bayesian statistics, as my advisor. My work in sports was preceded by that of many statisticians, such as Fred Mosteller, Brad Efron, and Carl Morris, who used sports data to illustrate statistical results.
What do you think are your most significant research accomplishments?
Bayesian methods quickly became more popular due to the new computational algorithms, known as Markov Chain Monte Carlo (MCMC), for simulating from Bayesian posterior distributions. My most cited research paper, co-written with Sid Chib, described how one can use MCMC methods to simulate from binary response regression models and their extensions.
I was fortunate to interact with statisticians at Bucknell University who helped me learn that statistics was an interesting discipline where one can use one’s math knowledge to explore patterns in data.
Tell us an unusual fact about yourself and your teaching or writing style.
I really think that I am more effective not in lecturing but in interacting with students on a one-to-one basis. So I am in favor of active-learning strategies in teaching. For example, in my data science class, students work together on instructional modules that illustrate a particular data science concept.
What advice would you give to an aspiring researcher in your field?
I think one learns statistics best not through coursework but through a research study that addresses a real problem. For example, if a student wishes to work as an analyst for a professional sports team, then I would encourage the student to work on her or his own problem and present the findings on a blog site. Sports teams are not only interested in an applicant’s background in statistics or data science, but also in the applicant’s ability to communicate statistical results.
I am currently working on a Bayesian text, Probability and Bayesian Modeling, with Monika Hu. This would serve as the introduction to probability and Bayesian statistical reasoning for an undergraduate with a calculus background. There are currently few books available at that level that introduce Bayesian thinking.
As a statistician, one typically publishes papers in statistics journals, and one's community of readers consists of fellow statisticians. Curve Ball, co-authored with Jay Bennett, was our first attempt to write for a general baseball audience, and we were gratified by the positive reception of the book. This book encouraged me to continue working and publishing on statistical thinking in sports.
I plan on retiring as a professor at Bowling Green State University soon. I anticipate that I will continue working on some of my statistical interests, such as statistical thinking and applications to sports, but my wife and I also plan on enjoying retirement and traveling.
The last book I read was Power Ball by Rob Neyer. It is an interesting look at the modern game of baseball in the context of watching a single nine-inning game.