1st Edition

Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R

By Paul Roback, Julie Legler
Copyright 2021
    436 Pages
    by Chapman & Hall

    Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R is designed for undergraduate students who have successfully completed a multiple linear regression course, helping them develop an expanded modeling toolkit that includes non-normal responses and correlated structure. Even though there is no mathematical prerequisite, the authors introduce fairly sophisticated topics such as likelihood theory, zero-inflated Poisson models, and parametric bootstrapping in an intuitive and applied manner. The case studies and exercises feature real data and real research questions: most of the data in the textbook comes from collaborative research conducted by the authors and their students, or from student projects. Every chapter features a variety of conceptual exercises, guided exercises, and open-ended exercises using real data. After working through this material, students will have an expanded toolkit and a greater appreciation for the wider world of data and statistical modeling.

    A solutions manual for all exercises is available to qualified instructors at the book’s website at www.routledge.com, and data sets and Rmd files for all case studies and exercises are available at the authors’ GitHub repo (https://github.com/proback/BeyondMLR).

    1. Review of Multiple Linear Regression

      Learning Objectives

      Introduction to Beyond Multiple Linear Regression

      Assumptions for Linear Least Squares Regression (LLSR)

      Cases that do not violate assumptions for inference in LLSR

      Cases where assumptions for inference in LLSR are violated

      Review of Multiple Linear Regression

      Case Study: Kentucky Derby

      Initial Exploratory Analyses

      Data Organization

      Univariate Summaries

      Bivariate Summaries

      Multiple linear regression modeling

      Simple linear regression with a continuous predictor

      Linear regression with a binary predictor

      Multiple linear regression with two predictors

      Inference in multiple linear regression: normal theory

      Inference in multiple linear regression: bootstrapping

      Multiple linear regression with an interaction term

      Building a multiple linear regression model

      Preview of remaining chapters

      Soccer

      Elephant Mating

      Parenting and Gang Activity

      Crime

      Exercises

      Conceptual Exercises

      Guided Exercises

      Open-ended Exercises

    2. Beyond Least Squares: Using Likelihoods to Fit and Compare Models

      Learning Objectives

      Case Study: Does sex run in families?

      Research Questions

      Model: Sex Unconditional Model (Equal probabilities, Independence)

      Model: Sex Unconditional Model (Any Probability, Independence)

      What is a likelihood?

      Finding MLEs

      Summary

      Is a likelihood a probability function? (Optional)

      Model: Sex Conditional Model (Sex Bias)

      Model Specification

      Application to Hypothetical Data

      Case Study: Analysis of the NLSY data

      Model Building Plan

      Family Composition of Boys and Girls, NLSY: Exploratory Data Analysis

      Likelihood for the Sex Unconditional Model: the NLSY data

      Likelihood for the Sex Conditional Model

      Comparing the Sex Unconditional to the Sex Conditional Model

      Model: Stopping Rule Model (Waiting for a boy)

      Non-nested Models

      Summary of Model Building

      Likelihood-based Methods

      Likelihoods and this Course

      Exercises

      Conceptual Exercises

      Guided Exercises

      Open-ended Exercise

    3. Distribution Theory

      Learning Objectives

      Introduction

      Discrete Random Variables

      Binary Random Variable

      Binomial Random Variable

      Geometric Random Variable

      Negative Binomial Random Variable

      Hypergeometric Random Variable

      Poisson Random Variable

      Continuous Random Variables

      Exponential Random Variable

      Gamma Random Variable

      Normal (Gaussian) Random Variable

      Beta Random Variable

      Distributions used in Testing

      χ² Distribution

      Student’s t-Distribution

      F-Distribution

      Additional Resources

      Exercises

      Conceptual Exercises

      Guided Exercises

    4. Poisson Regression

      Learning Objectives

      Introduction to Poisson Regression

      Poisson Regression Assumptions

      A Graphical Look at Poisson Regression

      Case Studies Overview

      Case Study: Household Size in the Philippines

      Data Organization

      Exploratory Data Analyses

      Estimation and Inference

      Using Deviances to Compare Models

      Using Likelihoods to fit Poisson Regression Models (Optional)

      Second Order Model

      Adding a covariate

      Residuals for Poisson Models (Optional)

      Goodness-of-fit

      Linear Least Squares Regression vs Poisson Regression

      Case Study: Campus Crime

      Data Organization

      Exploratory Data Analysis

      Accounting for Enrollment

      Modeling Assumptions

      Initial Models

      Tukey’s Honestly Significant Differences

      Overdispersion

      Dispersion parameter adjustment

      No dispersion vs overdispersion

      Negative binomial modeling

      Case Study: Weekend drinking

      Research Question

      Data Organization

      Exploratory Data Analysis

      Modeling

      Fitting a ZIP Model

      Comparing ZIP to ordinary Poisson with the Vuong Test (Optional)

      Residual Plot

      Limitations

      Exercises

      Conceptual Exercises

      Guided Exercises

      Open-ended Exercises

    5. Generalized Linear Models (GLMs): A Unifying Theory

      Learning Objectives

      One parameter exponential families

      One Parameter Exponential Family: Poisson

      One parameter exponential family: Normal

      Generalized Linear Modeling

      Exercises

    6. Logistic Regression

      Learning Objectives

      Introduction to Logistic Regression

      Logistic Regression Assumptions

      A Graphical Look at Logistic Regression

      Case Studies Overview

      Case Study: Soccer Goalkeepers

      Modeling Odds

      Logistic Regression Models for Binomial Responses

      Theoretical rationale for logistic regression models (Optional)

      Case Study: Reconstructing Alabama

      Data Organization

      Exploratory Analyses

      Initial Models

      Tests for significance of model coefficients

      Confidence intervals for model coefficients

      Testing for goodness of fit

      Residuals for Binomial Regression

      Overdispersion

      Summary

      Linear Least Squares Regression vs Binomial Logistic Regression

      Case Study: Trying to Lose Weight

      Data Organization

      Exploratory Data Analysis

      Initial Models

      Drop-in-deviance Tests

      Model Discussion and Summary

      Exercises

      Conceptual Exercises

      Guided Exercises

      Open-ended Exercises

    7. Correlated Data

      Learning Objectives

      Introduction

      Recognizing correlation

      Case Study: Dams and pups

      Sources of Variability

      Scenario: No covariates

      Scenario: Dose effect

      Case Study: Tree Growth

      Format of the data set

      Sources of variability

      Analysis preview: accounting for correlation within transect

      Summary

      Exercises

      Conceptual Exercises

      Guided Exercises

      Note on Correlated Binary Outcomes

    8. Introduction to Multilevel Models

      Learning Objectives

      Case Study: Music Performance Anxiety

      Initial Exploratory Analyses

      Data Organization

      Exploratory Analyses: Univariate Summaries

      Exploratory Analyses: Bivariate Summaries

      Two level modeling: preliminary considerations

      Ignoring the two level structure (not recommended)

      A two-stage modeling approach (better but imperfect)

      Two level modeling: a unified approach

      Our framework

      Random vs fixed effects

      Distribution of errors: the multivariate normal distribution

      Technical issues when estimating and testing parameters (Optional)

      An initial model with parameter interpretations

      Building a multilevel model

      Model building strategy

      An initial model: unconditional means or random intercepts

      Binary covariates at Level One and Level Two

      Random slopes and intercepts model

      Pseudo R² values

      Adding a covariate at Level Two

      Additional covariates: model comparison and interpretability

      Interpretation of parameter estimates

      Model comparisons

      Center covariates

      A potential final model for music performance anxiety

      Modeling the multilevel structure: is it really necessary?

      Notes on Using R (Optional)

      Exercises

      Conceptual Exercises

      Guided Exercise

      Open-ended Exercises

    9. Two Level Longitudinal Data

      Learning objectives

      Case study: Charter schools

      Initial Exploratory Analyses

      Data organization

      Missing data

      Exploratory analyses for general multilevel models

      Exploratory analyses for longitudinal data

      Preliminary two-stage modeling

      Linear trends within schools

      Effects of level two covariates on linear time trends

      Error structure within schools

      Initial models

      Unconditional means model

      Unconditional growth model

      Modeling other trends over time

      Building to a final model

      Uncontrolled effects of school type

      Add percent free and reduced lunch as a covariate

      A potential final model with three Level Two covariates

      Parametric bootstrap testing

      Covariance structure among observations

      Standard covariance structure

      Alternative covariance structures

      Covariance structure in non-longitudinal multilevel models

      Final thoughts regarding covariance structures

      Details of covariance structures (Optional)

      Notes on Using R (Optional)

      Exercises

      Conceptual Exercises

      Guided Exercise

      Open-ended Exercises

    10. Multilevel Data With More Than Two Levels

      Learning Objectives

      Case Studies: Seed Germination

      Initial Exploratory Analyses

      Data Organization

      Exploratory Analyses

      Initial models: unconditional means and unconditional growth

      Encountering boundary constraints

      Parametric bootstrap testing

      Exploding variance components

      Building to a final model

      Covariance structure (Optional)

      Details of covariance structures

      Notes on Using R (Optional)

      Exercises

      Conceptual Exercises

      Guided Exercises

      Open-ended Exercises

    11. Multilevel Generalized Linear Models

      Learning Objectives

      Case Study: College Basketball Referees

      Initial Exploratory Analyses

      Data organization

      Exploratory analyses

      Two level Modeling with a Generalized Response

      A GLM approach (correlation not accounted for)

      A two-stage modeling approach (provides the basic idea for multilevel modeling)

      A unified multilevel approach (the framework we’ll use)

      Crossed Random Effects

      Model Comparisons Using the Parametric Bootstrap

      A Potential Final Model for Examining Referee Bias

      Estimated Random Effects

      Notes on Using R (Optional)

      Exercises

      Conceptual Exercises

      Open-ended Exercises

    Biography

    Authors

    Paul Roback is the Kenneth O. Bjork Distinguished Professor of Statistics and Data Science and Julie Legler is Professor Emeritus of Statistics at St. Olaf College in Northfield, MN. Both are Fellows of the American Statistical Association and are founders of the Center for Interdisciplinary Research at St. Olaf. Dr. Roback is the past Chair of the ASA Section on Statistics and Data Science Education, conducts applied research using multilevel modeling, text analysis, and Bayesian methods, and has been a statistical consultant in the pharmaceutical, health care, and food processing industries. Dr. Legler is past Chair of the ASA/MAA Joint Committee on Undergraduate Statistics, is a co-author of Stat2: Modelling with Regression and ANOVA, and was a biostatistician at the National Cancer Institute.

    "Overall, this is an excellent text that is highly appropriate for undergraduate students. I am a really big fan of Chapter 2. The authors introduce the concepts of likelihood and model comparisons via likelihood in a very gentle and intuitive way. It will be very useful for the wide audience anticipated for the course we are designing. In Chapter 4, the authors do an excellent job discussing some of the common ‘extensions’ of Poisson regression that are likely to be observed in practice (overdispersion and ZIP). In particular, they do an excellent job describing situations that might lead to zero-inflated Poissons. The use of case studies across all chapters is a major strength of the textbook."
    -Jessica Chapman, St. Lawrence University

    "This text would be ideal for statistics undergrad majors & minors as a 2nd or 3rd course in statistics…In particular, this book intuitively covers many topics without delving into technical proofs and details which are not needed for successful application of the methods described. It is a strength that it uses the software R. Use of R is a skill welcomed in any industry, and is not a burden for students to obtain. The book emphasizes methods as well as numerical literacy. For example, it guides the student in how to assess the appropriateness of methods (e.g. assumptions of linear model), not just the use and interpretation of the results. There is a strong focus on understanding and checking assumptions, as well as the effect violations of those assumptions will have on the result. I think this may be an effective way to train the reader to think like a statistician, without overwhelming the reader with technical details."
    -Kirsten Eilertson, Colorado State University