1st Edition

Beyond Multiple Linear Regression
Applied Generalized Linear Models and Multilevel Models in R

ISBN 9781439885383
December 18, 2020 Forthcoming by Chapman and Hall/CRC
440 Pages

USD $99.95

Book Description

Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R is designed for undergraduate students who have successfully completed a multiple linear regression course, helping them develop an expanded modeling toolkit that includes non-normal responses and correlated structure. Even though there is no mathematical prerequisite, Beyond Multiple Linear Regression still introduces fairly sophisticated topics such as likelihood theory, zero-inflated Poisson, and parametric bootstrapping in an intuitive and applied manner. The case studies and exercises feature real data and real research questions; most of the data in the textbook comes from collaborative research conducted by the authors and their students, or from student projects. Every chapter features a variety of conceptual exercises, guided exercises, and open-ended exercises using real data. After working through this material, students will develop an expanded toolkit and a greater appreciation for the wider world of data and statistical modeling.

Beyond Multiple Linear Regression is organized as follows:

  • Chapter 1 – review of multiple linear regression and introduction to our approach to exploratory data analysis and model building
  • Chapter 2 – build intuition for likelihoods and their usefulness in testing and estimation
  • Chapters 3-6 – Generalized Linear Models, featuring Poisson, binomial, logistic, and negative binomial regression
  • Chapter 7 – build intuition and vocabulary about correlated data through an extended simulation and a real case study
  • Chapters 8-10 – Multilevel Models, with extensions to longitudinal data and more than two levels
  • Chapter 11 – Multilevel Generalized Linear Models, where everything is brought together: multilevel data with non-normal responses
  • Supplemental material – a solutions manual for all exercises, available to qualified instructors at our book’s website (www.routledge.com), and data sets and Rmd files for all case studies and exercises, available at our GitHub repo (https://github.com/proback/BYSH)

Table of Contents

  1. Review of Multiple Linear Regression

    Learning Objectives

    Introduction to Beyond Multiple Linear Regression

    Assumptions for Linear Least Squares Regression (LLSR)

    Cases that do not violate assumptions for inference in LLSR

    Cases where assumptions for inference in LLSR are violated

    Review of Multiple Linear Regression

    Case Study: Kentucky Derby

    Initial Exploratory Analyses

    Data Organization

    Univariate Summaries

    Bivariate Summaries

    Multiple linear regression modeling

    Simple linear regression with a continuous predictor

    Linear regression with a binary predictor

    Multiple linear regression with two predictors

    Inference in multiple linear regression: normal theory

    Inference in multiple linear regression: bootstrapping

    Multiple linear regression with an interaction term

    Building a multiple linear regression model

    Preview of remaining chapters

    Soccer

    Elephant Mating

    Parenting and Gang Activity

    Crime

    Exercises

    Conceptual Exercises

    Guided Exercises

    Open-ended Exercises

  2. Beyond Least Squares: Using Likelihoods to Fit and Compare Models

    Learning Objectives

    Case Study: Does sex run in families?

    Research Questions

    Model: Sex Unconditional Model (Equal probabilities, Independence)

    Model: Sex Unconditional Model (Any Probability, Independence)

    What is a likelihood?

    Finding MLEs

    Summary

    Is a likelihood a probability function? (Optional)

    Model: Sex Conditional Model (Sex Bias)

    Model Specification

    Application to Hypothetical Data

    Case Study: Analysis of the NLSY data

    Model Building Plan

    Family Composition of Boys and Girls, NLSY: Exploratory Data Analysis

    Likelihood for the Sex Unconditional Model: the NLSY data

    Likelihood for the Sex Conditional Model

    Comparing the Sex Unconditional to the Sex Conditional Model

    Model: Stopping Rule Model (Waiting for a boy)

    Non-nested Models

    Summary of Model Building

    Likelihood-based Methods

    Likelihoods and this Course

    Exercises

    Conceptual Exercises

    Guided Exercises

    Open-ended Exercise

  3. Distribution Theory

    Learning Objectives

    Introduction

    Discrete Random Variables

    Binary Random Variable

    Binomial Random Variable

    Geometric Random Variable

    Negative Binomial Random Variable

    Hypergeometric Random Variable

    Poisson Random Variable

    Continuous Random Variables

    Exponential Random Variable

    Gamma Random Variable

    Normal (Gaussian) Random Variable

    Beta Random Variable

    Distributions used in Testing

    χ² Distribution

    Student’s t-Distribution

    F-Distribution

    Additional Resources

    Exercises

    Conceptual Exercises

    Guided Exercises

  4. Poisson Regression

    Learning Objectives

    Introduction to Poisson Regression

    Poisson Regression Assumptions

    A Graphical Look at Poisson Regression

    Case Studies Overview

    Case Study: Household Size in the Philippines

    Data Organization

    Exploratory Data Analyses

    Estimation and Inference

    Using Deviances to Compare Models

    Using Likelihoods to fit Poisson Regression Models (Optional)

    Second Order Model

    Adding a covariate

    Residuals for Poisson Models (Optional)

    Goodness-of-fit

    Linear Least Squares Regression vs Poisson Regression

    Case Study: Campus Crime

    Data Organization

    Exploratory Data Analysis

    Accounting for Enrollment

    Modeling Assumptions

    Initial Models

    Tukey’s Honestly Significant Differences

    Overdispersion

    Dispersion parameter adjustment

    No dispersion vs overdispersion

    Negative binomial modeling

    Case Study: Weekend drinking

    Research Question

    Data Organization

    Exploratory Data Analysis

    Modeling

    Fitting a ZIP Model

    Comparing ZIP to ordinary Poisson with the Vuong Test (Optional)

    Residual Plot

    Limitations

    Exercises

    Conceptual Exercises

    Guided Exercises

    Open-ended Exercises

  5. Generalized Linear Models (GLMs): A Unifying Theory

    Learning Objectives

    One parameter exponential families

    One Parameter Exponential Family: Poisson

    One parameter exponential family: Normal

    Generalized Linear Modeling

    Exercises

  6. Logistic Regression

    Learning Objectives

    Introduction to Logistic Regression

    Logistic Regression Assumptions

    A Graphical Look at Logistic Regression

    Case Studies Overview

    Case Study: Soccer Goalkeepers

    Modeling Odds

    Logistic Regression Models for Binomial Responses

    Theoretical rationale for logistic regression models (Optional)

    Case Study: Reconstructing Alabama

    Data Organization

    Exploratory Analyses

    Initial Models

    Tests for significance of model coefficients

    Confidence intervals for model coefficients

    Testing for goodness of fit

    Residuals for Binomial Regression

    Overdispersion

    Summary

    Linear Least Squares Regression vs Binomial Logistic Regression

    Case Study: Trying to Lose Weight

    Data Organization

    Exploratory Data Analysis

    Initial Models

    Drop-in-deviance Tests

    Model Discussion and Summary

    Exercises

    Conceptual Exercises

    Guided Exercises

    Open-ended Exercises

  7. Correlated Data

    Learning Objectives

    Introduction

    Recognizing correlation

    Case Study: Dams and pups

    Sources of Variability

    Scenario: No covariates

    Scenario: Dose effect

    Case Study: Tree Growth

    Format of the data set

    Sources of variability

    Analysis preview: accounting for correlation within transect

    Summary

    Exercises

    Conceptual Exercises

    Guided Exercises

    Note on Correlated Binary Outcomes

  8. Introduction to Multilevel Models

    Learning Objectives

    Case Study: Music Performance Anxiety

    Initial Exploratory Analyses

    Data Organization

    Exploratory Analyses: Univariate Summaries

    Exploratory Analyses: Bivariate Summaries

    Two level modeling: preliminary considerations

    Ignoring the two level structure (not recommended)

    A two-stage modeling approach (better but imperfect)

    Two level modeling: a unified approach

    Our framework

    Random vs fixed effects

    Distribution of errors: the multivariate normal distribution

    Technical issues when estimating and testing parameters (Optional)

    An initial model with parameter interpretations

    Building a multilevel model

    Model building strategy

    An initial model: unconditional means or random intercepts

    Binary covariates at Level One and Level Two

    Random slopes and intercepts model

    Pseudo R² values

    Adding a covariate at Level Two

    Additional covariates: model comparison and interpretability

    Interpretation of parameter estimates

    Model comparisons

    Center covariates

    A potential final model for music performance anxiety

    Modeling the multilevel structure: is it really necessary?

    Notes on Using R (Optional)

    Exercises

    Conceptual Exercises

    Guided Exercise

    Open-ended Exercises

  9. Two Level Longitudinal Data

    Learning Objectives

    Case study: Charter schools

    Initial Exploratory Analyses

    Data organization

    Missing data

    Exploratory analyses for general multilevel models

    Exploratory analyses for longitudinal data

    Preliminary two-stage modeling

    Linear trends within schools

    Effects of level two covariates on linear time trends

    Error structure within schools

    Initial models

    Unconditional means model

    Unconditional growth model

    Modeling other trends over time

    Building to a final model

    Uncontrolled effects of school type

    Add percent free and reduced lunch as a covariate

    A potential final model with three Level Two covariates

    Parametric bootstrap testing

    Covariance structure among observations

    Standard covariance structure

    Alternative covariance structures

    Covariance structure in non-longitudinal multilevel models

    Final thoughts regarding covariance structures

    Details of covariance structures (Optional)

    Notes on Using R (Optional)

    Exercises

    Conceptual Exercises

    Guided Exercise

    Open-ended Exercises

  10. Multilevel Data With More Than Two Levels

    Learning Objectives

    Case Studies: Seed Germination

    Initial Exploratory Analyses

    Data Organization

    Exploratory Analyses

    Initial models: unconditional means and unconditional growth

    Encountering boundary constraints

    Parametric bootstrap testing

    Exploding variance components

    Building to a final model

    Covariance structure (Optional)

    Details of covariance structures

    Notes on Using R (Optional)

    Exercises

    Conceptual Exercises

    Guided Exercises

    Open-ended Exercises

  11. Multilevel Generalized Linear Models

    Learning Objectives

    Case Study: College Basketball Referees

    Initial Exploratory Analyses

    Data organization

    Exploratory analyses

    Two level Modeling with a Generalized Response

    A GLM approach (correlation not accounted for)

    A two-stage modeling approach (provides the basic idea for multilevel modeling)

    A unified multilevel approach (the framework we’ll use)

    Crossed Random Effects

    Model Comparisons Using the Parametric Bootstrap

    A Potential Final Model for Examining Referee Bias

    Estimated Random Effects

    Notes on Using R (Optional)

    Exercises

    Conceptual Exercises

    Open-ended Exercises


Author(s)

Biography

Paul Roback is the Kenneth O. Bjork Distinguished Professor of Statistics and Data Science and Julie Legler is Professor Emeritus of Statistics at St. Olaf College in Northfield, MN. Both are Fellows of the American Statistical Association and were founders of the Center for Interdisciplinary Research at St. Olaf. Dr. Roback is the past Chair of the ASA Section on Statistical and Data Science Education, conducts applied research using multilevel modeling, text analysis, and Bayesian methods, and has been a statistical consultant in the pharmaceutical, health care, and food processing industries. Dr. Legler is past Chair of the ASA/MAA Joint Committee on Undergraduate Statistics, is a co-author of Stat2: Modeling with Regression and ANOVA, and was a biostatistician at the National Cancer Institute.

Reviews

"Overall, this is an excellent text that is highly appropriate for undergraduate students. I am a really big fan of Chapter 2. The authors introduce the concepts of likelihood and model comparisons via likelihood in a very gentle and intuitive way. It will be very useful for the wide audience anticipated for the course we are designing. In Chapter 4, the authors do an excellent job discussing some of the common ‘extensions’ of Poisson regression that are likely to be observed in practice (overdispersion and ZIP). In particular, they do an excellent job describing situations that might lead to zero-inflated Poissons.

The use of case studies across all chapters is a major strength of the textbook." (Jessica Chapman, St. Lawrence University)

"This text would be ideal for statistics undergrad majors & minors as a 2nd or 3rd course in statistics…In particular, this book intuitively covers many topics without delving into technical proofs and details which are not needed for successful application of the methods described. It is a strength that it uses the software R. Use of R is a skill welcomed in any industry, and is not a burden for students to obtain. The book emphasizes methods as well as numerical literacy. For example, it guides the student in how to assess the appropriateness of methods (e.g. assumptions of linear model), not just the use and interpretation of the results. There is a strong focus on understanding and checking assumptions, as well as the effect violations of those assumptions will have on the result. I think this may be an effective way to train the reader to think like a statistician, without overwhelming the reader with technical details." (Kirsten Eilertson, Colorado State University)
