1st Edition

# Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R


**Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R** is designed for undergraduate students who have successfully completed a multiple linear regression course, helping them develop an expanded modeling toolkit that includes non-normal responses and correlated structure. Even though there is no mathematical prerequisite, the authors still introduce fairly sophisticated topics such as likelihood theory, zero-inflated Poisson, and parametric bootstrapping in an intuitive and applied manner. The case studies and exercises feature real data and real research questions; thus, most of the data in the textbook comes from collaborative research conducted by the authors and their students, or from student projects. Every chapter features a variety of conceptual exercises, guided exercises, and open-ended exercises using real data. After working through this material, students will develop an expanded toolkit and a greater appreciation for the wider world of data and statistical modeling.

A solutions manual for all exercises is available to qualified instructors at the book’s website at www.routledge.com, and data sets and Rmd files for all case studies and exercises are available at the authors’ GitHub repo (https://github.com/proback/BeyondMLR).

- Review of Multiple Linear Regression
- Beyond Least Squares: Using Likelihoods to Fit and Compare Models
- Distribution Theory
- Poisson Regression
- Generalized Linear Models (GLMs): A Unifying Theory
- Logistic Regression
- Correlated Data
- Introduction to Multilevel Models
- Two Level Longitudinal Data
- Multilevel Data With More Than Two Levels
- Multilevel Generalized Linear Models

Learning Objectives

Introduction to Beyond Multiple Linear Regression

Assumptions for Linear Least Squares Regression (LLSR)

Cases that do not violate assumptions for inference in LLSR

Cases where assumptions for inference in LLSR are violated

Review of Multiple Linear Regression

Case Study: Kentucky Derby

Initial Exploratory Analyses

Data Organization

Univariate Summaries

Bivariate Summaries

Multiple linear regression modeling

Simple linear regression with a continuous predictor

Linear regression with a binary predictor

Multiple linear regression with two predictors

Inference in multiple linear regression: normal theory

Inference in multiple linear regression: bootstrapping

Multiple linear regression with an interaction term

Building a multiple linear regression model

Preview of remaining chapters

Soccer

Elephant Mating

Parenting and Gang Activity

Crime

Exercises

Conceptual Exercises

Guided Exercises

Open-ended Exercises

Learning Objectives

Case Study: Does sex run in families?

Research Questions

Model: Sex Unconditional Model (Equal probabilities, Independence)

Model: Sex Unconditional Model (Any Probability, Independence)

What is a likelihood?

Finding MLEs

Summary

Is a likelihood a probability function? (Optional)

Model: Sex Conditional Model (Sex Bias)

Model Specification

Application to Hypothetical Data

Case Study: Analysis of the NLSY data

Model Building Plan

Family Composition of Boys and Girls, NLSY: Exploratory Data Analysis

Likelihood for the Sex Unconditional Model: the NLSY data

Likelihood for the Sex Conditional Model

Comparing the Sex Unconditional to the Sex Conditional Model

Model: Stopping Rule Model (Waiting for a boy)

Non-nested Models

Summary of Model Building

Likelihood-based Methods

Likelihoods and this Course

Exercises

Conceptual Exercises

Guided Exercises

Open-ended Exercise

Learning Objectives

Introduction

Discrete Random Variables

Binary Random Variable

Binomial Random Variable

Geometric Random Variable

Negative Binomial Random Variable

Hypergeometric Random Variable

Poisson Random Variable

Continuous Random Variables

Exponential Random Variable

Gamma Random Variable

Normal (Gaussian) Random Variable

Beta Random Variable

Distributions used in Testing

χ² Distribution

Student’s t-Distribution

F-Distribution

Additional Resources

Exercises

Conceptual Exercises

Guided Exercises

Learning Objectives

Introduction to Poisson Regression

Poisson Regression Assumptions

A Graphical Look at Poisson Regression

Case Studies Overview

Case Study: Household Size in the Philippines

Data Organization

Exploratory Data Analyses

Estimation and Inference

Using Deviances to Compare Models

Using Likelihoods to fit Poisson Regression Models (Optional)

Second Order Model

Adding a covariate

Residuals for Poisson Models (Optional)

Goodness-of-fit

Linear Least Squares Regression vs Poisson Regression

Case Study: Campus Crime

Data Organization

Exploratory Data Analysis

Accounting for Enrollment

Modeling Assumptions

Initial Models

Tukey’s Honestly Significant Differences

Overdispersion

Dispersion parameter adjustment

No dispersion vs overdispersion

Negative binomial modeling

Case Study: Weekend drinking

Research Question

Data Organization

Exploratory Data Analysis

Modeling

Fitting a ZIP Model

Comparing ZIP to ordinary Poisson with the Vuong Test (Optional)

Residual Plot

Limitations

Exercises

Conceptual Exercises

Guided Exercises

Open-ended Exercises

Learning Objectives

One parameter exponential families

One Parameter Exponential Family: Poisson

One parameter exponential family: Normal

Generalized Linear Modeling

Exercises

Learning Objectives

Introduction to Logistic Regression

Logistic Regression Assumptions

A Graphical Look at Logistic Regression

Case Studies Overview

Case Study: Soccer Goalkeepers

Modeling Odds

Logistic Regression Models for Binomial Responses

Theoretical rationale for logistic regression models (Optional)

Case Study: Reconstructing Alabama

Data Organization

Exploratory Analyses

Initial Models

Tests for significance of model coefficients

Confidence intervals for model coefficients

Testing for goodness of fit

Residuals for Binomial Regression

Overdispersion

Summary

Linear Least Squares Regression vs Binomial Logistic Regression

Case Study: Trying to Lose Weight

Data Organization

Exploratory Data Analysis

Initial Models

Drop-in-deviance Tests

Model Discussion and Summary

Exercises

Conceptual Exercises

Guided Exercises

Open-ended Exercises

Learning Objectives

Introduction

Recognizing correlation

Case Study: Dams and pups

Sources of Variability

Scenario: No covariates

Scenario: Dose effect

Case Study: Tree Growth

Format of the data set

Sources of variability

Analysis preview: accounting for correlation within transect

Summary

Exercises

Conceptual Exercises

Guided Exercises

Note on Correlated Binary Outcomes

Learning Objectives

Case Study: Music Performance Anxiety

Initial Exploratory Analyses

Data Organization

Exploratory Analyses: Univariate Summaries

Exploratory Analyses: Bivariate Summaries

Two level modeling: preliminary considerations

Ignoring the two level structure (not recommended)

A two-stage modeling approach (better but imperfect)

Two level modeling: a unified approach

Our framework

Random vs fixed effects

Distribution of errors: the multivariate normal distribution

Technical issues when estimating and testing parameters (Optional)

An initial model with parameter interpretations

Building a multilevel model

Model building strategy

An initial model: unconditional means or random intercepts

Binary covariates at Level One and Level Two

Random slopes and intercepts model

Pseudo R² values

Adding a covariate at Level Two

Additional covariates: model comparison and interpretability

Interpretation of parameter estimates

Model comparisons

Center covariates

A potential final model for music performance anxiety

Modeling the multilevel structure: is it really necessary?

Notes on Using R (Optional)

Exercises

Conceptual Exercises

Guided Exercise

Open-ended Exercises

Learning objectives

Case study: Charter schools

Initial Exploratory Analyses

Data organization

Missing data

Exploratory analyses for general multilevel models

Exploratory analyses for longitudinal data

Preliminary two-stage modeling

Linear trends within schools

Effects of level two covariates on linear time trends

Error structure within schools

Initial models

Unconditional means model

Unconditional growth model

Modeling other trends over time

Building to a final model

Uncontrolled effects of school type

Add percent free and reduced lunch as a covariate

A potential final model with three Level Two covariates

Parametric bootstrap testing

Covariance structure among observations

Standard covariance structure

Alternative covariance structures

Covariance structure in non-longitudinal multilevel models

Final thoughts regarding covariance structures

Details of covariance structures (Optional)

Notes on Using R (Optional)

Exercises

Conceptual Exercises

Guided Exercise

Open-ended Exercises

Learning Objectives

Case Studies: Seed Germination

Initial Exploratory Analyses

Data Organization

Exploratory Analyses

Initial models: unconditional means and unconditional growth

Encountering boundary constraints

Parametric bootstrap testing

Exploding variance components

Building to a final model

Covariance structure (Optional)

Details of covariance structures

Notes on Using R (Optional)

Exercises

Conceptual Exercises

Guided Exercises

Open-ended Exercises

Learning Objectives

Case Study: College Basketball Referees

Initial Exploratory Analyses

Data organization

Exploratory analyses

Two level Modeling with a Generalized Response

A GLM approach (correlation not accounted for)

A two-stage modeling approach (provides the basic idea for multilevel modeling)

A unified multilevel approach (the framework we’ll use)

Crossed Random Effects

Model Comparisons Using the Parametric Bootstrap

A Potential Final Model for Examining Referee Bias

Estimated Random Effects

Notes on Using R (Optional)

Exercises

Conceptual Exercises

Open-ended Exercises

### Biography

**Authors**

**Paul Roback** is the Kenneth O. Bjork Distinguished Professor of Statistics and Data Science and **Julie Legler** is Professor Emeritus of Statistics at St. Olaf College in Northfield, MN. Both are Fellows of the American Statistical Association and are founders of the Center for Interdisciplinary Research at St. Olaf. Dr. Roback is the past Chair of the ASA Section on Statistics and Data Science Education, conducts applied research using multilevel modeling, text analysis, and Bayesian methods, and has been a statistical consultant in the pharmaceutical, health care, and food processing industries. Dr. Legler is past Chair of the ASA/MAA Joint Committee on Undergraduate Statistics, is a co-author of *Stat2: Modelling with Regression and ANOVA*, and was a biostatistician at the National Cancer Institute.

"Overall, this is an excellent text that is highly appropriate for undergraduate students. I am a really big fan of Chapter 2. The authors introduce the concepts of likelihood and model comparisons via likelihood in a very gentle and intuitive way. It will be very useful for the wide audience anticipated for the course we are designing. In Chapter 4, the authors do an excellent job discussing some of the common ‘extensions’ of Poisson regression that are likely to be observed in practice (overdispersion and ZIP). In particular, they do an excellent job describing situations that might lead to zero-inflated Poissons. The use of case studies across all chapters is a major strength of the textbook."

-Jessica Chapman, St. Lawrence University

"This text would be ideal for statistics undergrad majors & minors as a 2nd or 3rd course in statistics…In particular, this book intuitively covers many topics without delving into technical proofs and details which are not needed for successful application of the methods described. It is a strength that it uses the software R. Use of R is a skill welcomed in any industry, and is not a burden for students to obtain. The book emphasizes methods as well as numerical literacy. For example, it guides the student in how to assess the appropriateness of methods (e.g. assumptions of linear model), not just the use and interpretation of the results. There is a strong focus on understanding and checking assumptions, as well as the effect violations of those assumptions will have on the result. I think this may be an effective way to train the reader to think like a statistician, without overwhelming the reader with technical details."

-Kirsten Eilertson, Colorado State University