Understanding Regression Analysis unifies diverse regression applications, including the classical model, ANOVA models, generalized models (Poisson, negative binomial, logistic, and survival), neural networks, and decision trees, under a common umbrella: the conditional distribution model. It explains why the conditional distribution model is the correct model, and it also explains (proves) why the assumptions of the classical regression model are wrong. Unlike other regression books, this one takes the realistic view from the outset that all models are just approximations. Hence, the emphasis is on modeling Nature’s processes realistically, rather than assuming (incorrectly) that Nature works in particular, constrained ways.
Key features of the book include:
- Numerous worked examples using the R software
- Key points and self-study questions displayed "just-in-time" within chapters
- Simple mathematical explanations ("baby proofs") of key concepts
- Clear explanations and applications of statistical significance (p-values), incorporating the American Statistical Association guidelines
- Use of "data-generating process" terminology rather than "population"
- Random-X framework is assumed throughout (the fixed-X case is presented as a special case of the random-X case)
- Clear explanations of probabilistic modelling, including likelihood-based methods
- Use of simulations throughout to explain concepts and to perform data analyses
This book is strongly oriented towards science in general and includes chapter-review and self-study questions, so it can be used as a textbook for research-oriented students in the social, biological and medical, and physical and engineering sciences. Its mathematical emphasis also makes it ideal as a text for mathematics and statistics courses. With its numerous worked examples, it is also ideally suited to be a reference book for all scientists.
Table of Contents
1. Introduction to Regression Models
2. Estimating Regression Model Parameters
3. The Classical Model and Its Consequences
4. Evaluating Assumptions
6. The Multiple Regression Model
7. Multiple Regression from the Matrix Point of View
8. R-squared, Adjusted R-Squared, the F Test, and Multicollinearity
9. Polynomial Models and Interaction (Moderator) Analysis
10. ANOVA, ANCOVA, and Other Applications of Indicator Variables
11. Variable Selection
12. Heteroscedasticity and Non-independence
13. Models for Binary, Nominal, and Ordinal Response Variables
14. Models for Poisson and Negative Binomial Response
15. Censored Data Models
16. Outliers: Identification, Problems, and Remedies (Good and Bad)
17. Neural Network Regression
18. Regression Trees
Peter H. Westfall has a Ph.D. in Statistics from the University of California at Davis, as well as many years of teaching, research, and consulting experience in a variety of statistics-related disciplines. He has published over 100 papers on statistical theory, methods, and applications, and he has written several books spanning academic, practitioner, and textbook genres. He is a former editor of The American Statistician and a Fellow of the American Statistical Association.
Andrea L. Arias is a Senior Operations Research Specialist at BNSF Railway. She has a Ph.D. in Industrial Engineering with a minor in Business Statistics from Texas Tech University, and a doctoral degree in Industrial Engineering from Pontificia Universidad Católica de Valparaíso, Chile. Her main areas of expertise include Mathematical Programming, Network Optimization, Statistics, and Simulation. She is an active member of the Institute for Operations Research and the Management Sciences (INFORMS).
"...The authors suggest their book is suitable for those who are “research-oriented”, regardless of any prior advanced training in statistics...I particularly like the emphasis on assumptions. Rather than discuss regression in idealized terms, Westfall and Arias are upfront about why assumptions are often wrong in practice, and what an analyst can do about violations. These discussions are woven into many of the chapters, and in some cases, they are featured in stand-alone chapters...I am a fan of learning statistics by doing, so the large amount of R code woven into the book’s chapters and the hands-on exercises at the end of each chapter are valuable and a welcomed feature of the book...To me, this textbook would be most suitable for a one-semester survey course in statistical methods for students outside of biostatistics or statistics. A motivated student could even use this book for self-study...Overall, I believe this is a worthwhile addition to the literature."
- Ryan Andrews, ISCB News, June 2021