1st Edition
Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R
Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R is designed for undergraduate students who have successfully completed a multiple linear regression course, helping them develop an expanded modeling toolkit that includes non-normal responses and correlated structure. Although there is no mathematical prerequisite, the authors introduce fairly sophisticated topics such as likelihood theory, zero-inflated Poisson, and parametric bootstrapping in an intuitive and applied manner. The case studies and exercises feature real data and real research questions; most of the data in the textbook come from collaborative research conducted by the authors and their students, or from student projects. Every chapter features a variety of conceptual exercises, guided exercises, and open-ended exercises using real data. After working through this material, students will have a greater appreciation for the wider world of data and statistical modeling.
A solutions manual for all exercises is available to qualified instructors at the book’s website at www.routledge.com, and data sets and Rmd files for all case studies and exercises are available at the authors’ GitHub repo (https://github.com/proback/BeyondMLR).
- Review of Multiple Linear Regression
- Beyond Least Squares: Using Likelihoods to Fit and Compare Models
- Distribution Theory
- Poisson Regression
- Generalized Linear Models (GLMs): A Unifying Theory
- Logistic Regression
- Correlated Data
- Introduction to Multilevel Models
- Two Level Longitudinal Data
- Multilevel Data With More Than Two Levels
- Multilevel Generalized Linear Models
Learning Objectives
Introduction to Beyond Multiple Linear Regression
Assumptions for Linear Least Squares Regression (LLSR)
Cases that do not violate assumptions for inference in LLSR
Cases where assumptions for inference in LLSR are violated
Review of Multiple Linear Regression
Case Study: Kentucky Derby
Initial Exploratory Analyses
Data Organization
Univariate Summaries
Bivariate Summaries
Multiple linear regression modeling
Simple linear regression with a continuous predictor
Linear regression with a binary predictor
Multiple linear regression with two predictors
Inference in multiple linear regression: normal theory
Inference in multiple linear regression: bootstrapping
Multiple linear regression with an interaction term
Building a multiple linear regression model
Preview of remaining chapters
Soccer
Elephant Mating
Parenting and Gang Activity
Crime
Exercises
Conceptual Exercises
Guided Exercises
Open-ended Exercises
Learning Objectives
Case Study: Does sex run in families?
Research Questions
Model: Sex Unconditional Model (Equal probabilities, Independence)
Model: Sex Unconditional Model (Any Probability, Independence)
What is a likelihood?
Finding MLEs
Summary
Is a likelihood a probability function? (Optional)
Model: Sex Conditional Model (Sex Bias)
Model Specification
Application to Hypothetical Data
Case Study: Analysis of the NLSY data
Model Building Plan
Family Composition of Boys and Girls, NLSY: Exploratory Data Analysis
Likelihood for the Sex Unconditional Model: the NLSY data
Likelihood for the Sex Conditional Model
Comparing the Sex Unconditional to the Sex Conditional Model
Model: Stopping Rule Model (Waiting for a boy)
Non-nested Models
Summary of Model Building
Likelihood-based Methods
Likelihoods and this Course
Exercises
Conceptual Exercises
Guided Exercises
Open-ended Exercise
Learning Objectives
Introduction
Discrete Random Variables
Binary Random Variable
Binomial Random Variable
Geometric Random Variable
Negative Binomial Random Variable
Hypergeometric Random Variable
Poisson Random Variable
Continuous Random Variables
Exponential Random Variable
Gamma Random Variable
Normal (Gaussian) Random Variable
Beta Random Variable
Distributions used in Testing
χ² Distribution
Student’s t-Distribution
F-Distribution
Additional Resources
Exercises
Conceptual Exercises
Guided Exercises
Learning Objectives
Introduction to Poisson Regression
Poisson Regression Assumptions
A Graphical Look at Poisson Regression
Case Studies Overview
Case Study: Household Size in the Philippines
Data Organization
Exploratory Data Analyses
Estimation and Inference
Using Deviances to Compare Models
Using Likelihoods to fit Poisson Regression Models (Optional)
Second Order Model
Adding a covariate
Residuals for Poisson Models (Optional)
Goodness-of-fit
Linear Least Squares Regression vs Poisson Regression
Case Study: Campus Crime
Data Organization
Exploratory Data Analysis
Accounting for Enrollment
Modeling Assumptions
Initial Models
Tukey’s Honestly Significant Differences
Overdispersion
Dispersion parameter adjustment
No dispersion vs overdispersion
Negative binomial modeling
Case Study: Weekend drinking
Research Question
Data Organization
Exploratory Data Analysis
Modeling
Fitting a ZIP Model
Comparing ZIP to ordinary Poisson with the Vuong Test (Optional)
Residual Plot
Limitations
Exercises
Conceptual Exercises
Guided Exercises
Open-ended Exercises
Learning Objectives
One parameter exponential families
One Parameter Exponential Family: Poisson
One parameter exponential family: Normal
Generalized Linear Modeling
Exercises
Learning Objectives
Introduction to Logistic Regression
Logistic Regression Assumptions
A Graphical Look at Logistic Regression
Case Studies Overview
Case Study: Soccer Goalkeepers
Modeling Odds
Logistic Regression Models for Binomial Responses
Theoretical rationale for logistic regression models (Optional)
Case Study: Reconstructing Alabama
Data Organization
Exploratory Analyses
Initial Models
Tests for significance of model coefficients
Confidence intervals for model coefficients
Testing for goodness of fit
Residuals for Binomial Regression
Overdispersion
Summary
Linear Least Squares Regression vs Binomial Logistic Regression
Case Study: Trying to Lose Weight
Data Organization
Exploratory Data Analysis
Initial Models
Drop-in-deviance Tests
Model Discussion and Summary
Exercises
Conceptual Exercises
Guided Exercises
Open-ended Exercises
Learning Objectives
Introduction
Recognizing correlation
Case Study: Dams and pups
Sources of Variability
Scenario: No covariates
Scenario: Dose effect
Case Study: Tree Growth
Format of the data set
Sources of variability
Analysis preview: accounting for correlation within transect
Summary
Exercises
Conceptual Exercises
Guided Exercises
Note on Correlated Binary Outcomes
Learning Objectives
Case Study: Music Performance Anxiety
Initial Exploratory Analyses
Data Organization
Exploratory Analyses: Univariate Summaries
Exploratory Analyses: Bivariate Summaries
Two level modeling: preliminary considerations
Ignoring the two level structure (not recommended)
A two-stage modeling approach (better but imperfect)
Two level modeling: a unified approach
Our framework
Random vs fixed effects
Distribution of errors: the multivariate normal distribution
Technical issues when estimating and testing parameters (Optional)
An initial model with parameter interpretations
Building a multilevel model
Model building strategy
An initial model: unconditional means or random intercepts
Binary covariates at Level One and Level Two
Random slopes and intercepts model
Pseudo R² values
Adding a covariate at Level Two
Additional covariates: model comparison and interpretability
Interpretation of parameter estimates
Model comparisons
Center covariates
A potential final model for music performance anxiety
Modeling the multilevel structure: is it really necessary?
Notes on Using R (Optional)
Exercises
Conceptual Exercises
Guided Exercise
Open-ended Exercises
Learning objectives
Case study: Charter schools
Initial Exploratory Analyses
Data organization
Missing data
Exploratory analyses for general multilevel models
Exploratory analyses for longitudinal data
Preliminary two-stage modeling
Linear trends within schools
Effects of level two covariates on linear time trends
Error structure within schools
Initial models
Unconditional means model
Unconditional growth model
Modeling other trends over time
Building to a final model
Uncontrolled effects of school type
Add percent free and reduced lunch as a covariate
A potential final model with three Level Two covariates
Parametric bootstrap testing
Covariance structure among observations
Standard covariance structure
Alternative covariance structures
Covariance structure in non-longitudinal multilevel models
Final thoughts regarding covariance structures
Details of covariance structures (Optional)
Notes on Using R (Optional)
Exercises
Conceptual Exercises
Guided Exercise
Open-ended Exercises
Learning Objectives
Case Studies: Seed Germination
Initial Exploratory Analyses
Data Organization
Exploratory Analyses
Initial models: unconditional means and unconditional growth
Encountering boundary constraints
Parametric bootstrap testing
Exploding variance components
Building to a final model
Covariance structure (Optional)
Details of covariance structures
Notes on Using R (Optional)
Exercises
Conceptual Exercises
Guided Exercises
Open-ended Exercises
Learning Objectives
Case Study: College Basketball Referees
Initial Exploratory Analyses
Data organization
Exploratory analyses
Two level Modeling with a Generalized Response
A GLM approach (correlation not accounted for)
A two-stage modeling approach (provides the basic idea for multilevel modeling)
A unified multilevel approach (the framework we’ll use)
Crossed Random Effects
Model Comparisons Using the Parametric Bootstrap
A Potential Final Model for Examining Referee Bias
Estimated Random Effects
Notes on Using R (Optional)
Exercises
Conceptual Exercises
Open-ended Exercises
Biography
Authors
Paul Roback is the Kenneth O. Bjork Distinguished Professor of Statistics and Data Science and Julie Legler is Professor Emeritus of Statistics at St. Olaf College in Northfield, MN. Both are Fellows of the American Statistical Association and are founders of the Center for Interdisciplinary Research at St. Olaf. Dr. Roback is the past Chair of the ASA Section on Statistics and Data Science Education, conducts applied research using multilevel modeling, text analysis, and Bayesian methods, and has been a statistical consultant in the pharmaceutical, health care, and food processing industries. Dr. Legler is past Chair of the ASA/MAA Joint Committee on Undergraduate Statistics, is a co-author of Stat2: Modelling with Regression and ANOVA, and was a biostatistician at the National Cancer Institute.
"Overall, this is an excellent text that is highly appropriate for undergraduate students. I am a really big fan of Chapter 2. The authors introduce the concepts of likelihood and model comparisons via likelihood in a very gentle and intuitive way. It will be very useful for the wide audience anticipated for the course we are designing. In Chapter 4, the authors do an excellent job discussing some of the common ‘extensions’ of Poisson regression that are likely to be observed in practice (overdispersion and ZIP). In particular, they do an excellent job describing situations that might lead to zero-inflated Poissons. The use of case studies across all chapters is a major strength of the textbook."
-Jessica Chapman, St. Lawrence University
"This text would be ideal for statistics undergrad majors & minors as a 2nd or 3rd course in statistics… In particular, this book intuitively covers many topics without delving into technical proofs and details which are not needed for successful application of the methods described. It is a strength that it uses the software R. Use of R is a skill welcomed in any industry, and is not a burden for students to obtain. The book emphasizes methods as well as numerical literacy. For example, it guides the student in how to assess the appropriateness of methods (e.g. assumptions of linear model), not just the use and interpretation of the results. There is a strong focus on understanding and checking assumptions, as well as the effect violations of those assumptions will have on the result. I think this may be an effective way to train the reader to think like a statistician, without overwhelming the reader with technical details."
-Kirsten Eilertson, Colorado State University