1st Edition

Statistical Methods in Health Disparity Research

By J. Sunil Rao Copyright 2023
    298 Pages 130 Color Illustrations
    by Chapman & Hall


    A health disparity is a higher burden of illness, injury, disability, or mortality experienced by one group relative to others. Such disparities are attributable to multiple factors, including socioeconomic status, environmental factors, insufficient access to health care, individual risk factors and behaviors, and inequalities in education, and may also be due to factors such as age, income, and race. Statistical Methods in Health Disparity Research focuses on their estimation, ranging from classical approaches, including the quantification of a disparity, to more formal modeling, to modern approaches involving more flexible computational methods.


    • Presents an overview of methods and applications of health disparity estimation
    • First book to synthesize research in this field in a unified statistical framework
    • Covers classical approaches, and builds to more modern computational techniques
    • Includes many worked examples and case studies using real data
    • Discusses available software for estimation

    The book is designed primarily for researchers and graduate students in biostatistics, data science, and computer science. It will also be useful to many quantitative modelers in genetics, biology, sociology, and epidemiology.




    1 Basic Concepts

    1.1 What is a health disparity

    1.2 A brief historical perspective

    1.3 Some examples

    1.4 Determinants of Health

    1.4.1 Biology and genetics

    1.4.2 Individual behavior

    1.4.3 Health services

    1.4.4 Social determinants of health

    1.4.5 (Health) policies


    1.5 The challenging issue of race

    1.5.1 Racial segregation as a social determinant of health

    1.5.2 Racism, segregation, and inequality

    1.6 Role of data visualization in health disparities research

    1.7 A note on notation adopted in this book

    2 Overall Estimation of Health Disparities

    2.1 Data and Measurement

    2.2 Disparity indices

    2.2.1 Total disparity indices

    2.2.2 Disparity indices measuring differences between groups

    2.2.3 Disparity indices from complex surveys

    2.3 Randomized experiments: an idealized estimate of disparity

    2.4 Model-based estimation: adjusting for confounders

    2.4.1 Regression approach

    2.4.1.1 Model-assisted survey regression

    2.4.2 Peters-Belson approach

    2.4.2.1 Peters-Belson approach for complex survey

    2.4.2.2 Peters-Belson approach for clustered data

    2.4.3 Disparity drivers

    2.4.3.1 Disparity drivers for complex survey data

    2.5 Matching and propensity scoring

    2.6 Discrete outcomes

    2.6.1 Binary outcomes

    2.6.2 Nominal and ordinal outcomes

    2.6.3 Poisson regression and log-linear models

    2.7 Survival analysis

    2.7.1 Survivor and hazard functions

    2.7.2 Common parametric models

    2.7.3 Estimation

    2.7.4 Inference

    2.7.5 Non-parametric estimation of S(y)

    2.7.6 Cox proportional hazards model

    2.8 Multi-level modeling

    2.8.1 Estimation and inference

    2.9 Generalized estimating equations

    2.9.1 Pseudo GEE for complex survey data

    2.10 Bayesian methods

    2.10.1 Intuitive motivation and practical advantages to Bayesian analyses

    2.10.2 An overview of Bayesian inference

    2.10.3 Markov Chain Monte Carlo (MCMC)

    3 Domain-specific Estimates

    3.1 What is a domain?

    3.2 Direct estimates

    3.3 Indirect estimates

    3.4 Small area model-based estimates

    3.4.1 Small area estimation models

    3.4.2 Estimation

    3.4.3 Inference

    3.5 Bayesian approaches

    3.6 Observed best prediction (OBP)

    3.7 OBP versus the BLUP

    3.7.1 Nonparametric/semi-parametric small area estimation

    3.7.2 Model selection and diagnostics

    3.7.2.1 A simplified adaptive fence procedure

    4 Causality, Moderation and Mediation

    4.1 Socioecological framework for health disparities

    4.2 Causal inference in health disparities

    4.2.1 Experimental versus observational studies

    4.2.2 Challenges with certain variables to be treated as causal factors

    4.3 Average treatment effects

    4.3.1 What are we trying to estimate and are these identifiable?

    4.3.2 Estimation of ATE and related quantities

    4.3.2.1 Regression estimators

    4.3.2.2 Matching estimators

    4.3.2.3 Propensity score methods

    4.3.2.4 Combination methods

    4.3.2.5 Bayesian methods

    4.3.2.6 Uncertainty estimation for ATE estimators

    4.3.3 Assessing the assumptions

    4.4 Use of instrumental variables

    4.4.1 Verifying the assumptions

    4.4.2 Estimation

    4.5 Traditional mediation versus causal mediation

    4.5.1 Effect identification

    4.6 Mediation analysis for health disparities

    4.7 Traditional moderation versus causal moderation

    4.7.1 Parallel estimation framework

    4.7.2 Causal moderation without randomized treatments

    5 Machine Learning Based Approaches to Disparity Estimation

    5.1 What is machine learning (ML)?

    5.1.1 Supervised versus unsupervised machine learning

    5.1.2 Why is ML relevant for health disparity research?

    5.2 Tree-based models

    5.2.1 Understanding the decision boundary

    5.2.2 Bagging trees

    5.3 Tree-based models for health disparity research

    5.4 Tree-based models for complex survey data

    5.5 Random forests

    5.5.1 Hypothesis testing for feature significance

    5.6 Shrinkage estimation

    5.6.1 Generalized Ridge Regression (GRR)

    5.6.1.1 Geometrical and theoretical properties of the GRR estimator in high dimensions

    5.6.2 Ideal variable selection for GRR

    5.6.3 Spike and slab regression

    5.6.3.1 Selective shrinkage and the oracle property

    5.6.4 The elastic net (enet) and lasso

    5.6.5 Model assisted lasso for complex survey data

    5.7 Deep Learning

    5.7.1 Deep architectures

    5.7.2 Forward and backpropagation to train the ANN

    5.8 Proofs

    6 Health Disparity Estimation Under a Precision Medicine Paradigm

    6.1 What is precision medicine?

    6.1.1 The role of genomic data

    6.2 Disparity subtype identification using tree-based methods

    6.2.1 PRISM approximation

    6.2.2 Level set estimation (LSE) for disparity subtypes

    6.3 Classified mixed model prediction

    6.3.1 Prediction of mixed effects associated with new observations

    6.3.2 CMMP without matching assumption

    6.3.3 Prediction of responses of future observations

    6.3.4 Some simulations

    6.4 Assessing the uncertainty in classified mixed model predictions

    6.4.1 Overview of Sumca

    6.4.2 Implementation of Sumca to CMMP

    6.4.3 A simulation study of Sumca

    6.5 Proofs

    7 Extended Topics

    7.1 Correcting for sampling bias in disease surveillance studies

    7.1.1 The model

    7.1.2 Bias correction: Large values of s; Small values of s; Prevalence; Middle values of s; M = 2

    7.1.3 Estimated variance

    7.2 Geocoding

    7.2.1 The Public Health Geocoding Project

    7.2.2 Geocoding considerations

    7.2.3 Pseudo-Bayesian classified mixed model prediction for imputing area-level covariates: Consistency and asymptotic optimality of MPPM search

    7.3 Differential privacy and the impact on health disparities

    7.4 R software for health disparity research

    7.4.1 Typical R software development and testing/debugging workflow

    7.4.2 The need for a health disparity R research repository

    7.4.3 R packages relevant to Chapter 3

    7.4.4 R packages relevant to Chapter 4

    7.4.5 R packages relevant to Chapter 5

    7.4.6 R packages relevant to Chapter 6

    7.4.7 R packages relevant to Chapter 7




    J. Sunil Rao, Ph.D., is Professor of Biostatistics in the School of Public Health at the University of Minnesota, Twin Cities, and Founding Director Emeritus in the Division of Biostatistics at the Miller School of Medicine, University of Miami.

    He has published widely about methods for complex data modeling including high dimensional model selection, mixed model prediction, small area estimation, and bump hunting machine learning, as well as statistical methods for applied cancer biostatistics.

    He is a Fellow of the American Statistical Association and an elected member of the International Statistical Institute.