# Analysis of Incidence Rates

## Book Description

Incidence rates are counts divided by person-time; mortality rates are a well-known example. *Analysis of Incidence Rates* offers a detailed discussion of the practical aspects of analyzing incidence rates. Important pitfalls and areas of controversy are discussed. The text is aimed at graduate students, researchers, and analysts in the disciplines of epidemiology, biostatistics, social sciences, economics, and psychology.
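The definition above (a count divided by person-time) can be illustrated with a short Python sketch. It is not taken from the book, whose analyses are distributed as Stata files. It computes two incidence rates from hypothetical counts and person-years, then their rate ratio and rate difference. Each comparison comes with a large-sample Wald 95% confidence interval that assumes Poisson-distributed counts. The function names and the data are illustrative assumptions only.

```python
import math

def incidence_rate(events, person_time):
    """Incidence rate: number of events divided by person-time at risk."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return events / person_time

def rate_comparison(events1, pt1, events0, pt0, z=1.96):
    """Rate ratio and rate difference (group 1 vs. group 0) with Wald CIs.

    Assumes Poisson counts:
      SE[ln(rate ratio)] = sqrt(1/events1 + 1/events0)
      SE[rate difference] = sqrt(events1/pt1**2 + events0/pt0**2)
    """
    r1 = incidence_rate(events1, pt1)
    r0 = incidence_rate(events0, pt0)

    # Rate ratio: the interval is built on the log scale, then exponentiated.
    rr = r1 / r0
    se_log_rr = math.sqrt(1 / events1 + 1 / events0)
    rr_ci = (rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr))

    # Rate difference: the interval is symmetric on the original scale.
    rd = r1 - r0
    se_rd = math.sqrt(events1 / pt1**2 + events0 / pt0**2)
    rd_ci = (rd - z * se_rd, rd + z * se_rd)

    return {"rate1": r1, "rate0": r0, "rr": rr, "rr_ci": rr_ci,
            "rd": rd, "rd_ci": rd_ci}

# Hypothetical data: 30 events in 1,000 person-years vs. 20 events in 2,000.
result = rate_comparison(30, 1000, 20, 2000)
print(round(result["rr"], 2), [round(x, 2) for x in result["rr_ci"]])
print(round(result["rd"], 4), [round(x, 4) for x in result["rd_ci"]])
```

The same pair of rates yields a ratio (about 3) and a difference (about 20 per 1,000 person-years). The choice between these two scales is a recurring theme of the book.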

Features:

- Compares and contrasts incidence rates with risks, odds, and hazards.
- Shows stratified methods, including standardization, inverse-variance weighting, and Mantel-Haenszel methods.
- Describes Poisson regression methods for adjusted rate ratios and rate differences.
- Examines linear regression for rate differences with an emphasis on common problems.
- Gives methods for correcting confidence intervals.
- Illustrates problems related to collapsibility.
- Explores extensions of count models for rates, including negative binomial regression, methods for clustered data, and the analysis of longitudinal data; also reviews controversies and limitations.
- Presents matched cohort methods in detail.
- Gives marginal methods for converting adjusted rate ratios to rate differences, and vice versa.
- Demonstrates instrumental variable methods.
- Compares Poisson regression with the Cox proportional hazards model and introduces Royston-Parmar models.
- Provides all data and analyses as online Stata files that readers can download.

**Peter Cummings** is Professor Emeritus, Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA. His research was primarily in the field of injuries. He used matched cohort methods to estimate how seat belt use and the presence of airbags were related to death in traffic crashes. He is author or co-author of over 100 peer-reviewed articles.

## Table of Contents

Analysis of Incidence Rates

Peter Cummings, Emeritus Professor, Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA

**Preface**

**1. Do Storks Bring Babies?**

Karl Pearson and spurious correlation

Jerzy Neyman, storks, and babies

Is Poisson regression the solution to the stork problem?

Further reading

**2. Risks and Rates**

What is a rate?

Closed and open populations

Measures of time

Numerators for rates: counts

Numerators that may be mistaken for counts

Prevalence proportions

Denominators for rates: count denominators for incidence proportions (risks)

Denominators for rates: person-time for incidence rates

Rate numerators and denominators for recurrent events

Rate denominators other than person-time

Different incidence rates tell different stories

Potential advantages of incidence rates compared with incidence proportions (risks)

Potential advantages of incidence proportions (risks) compared with incidence rates

Limitations of risks and rates

Radioactive decay: an example of exponential decline

The relevance of exponential decay to human populations

Relationships between rates, risks, and hazards

Further reading

**3. Rate Ratios and Differences**

Estimated associations and causal effects

Sources of bias in estimates of causal effect

Estimation versus prediction

Ratios and differences for risks and rates

Relationships between measures of association in a closed population

The hypothetical TEXCO study

Breaking the rules: Army data for Companies A and B

Relationships between odds ratios, risk ratios, and rate ratios in case-control studies

Symmetry of measures of association

Convergence problems for estimating associations

Some history regarding the choice between ratios and differences

Other influences on the choice between ratios and differences

The data may sometimes be used to choose between a ratio and a difference

**4. The Poisson Distribution**

Alpha particle radiation

The Poisson distribution

Prussian soldiers kicked to death by horses

Variances, standard deviations, and standard errors for counts and rates

An example: mortality from Alzheimer’s disease

Large sample P-values for counts, rates, and their differences using the Wald statistic

Comparisons of rates as differences versus ratios

Large sample P-values for counts, rates, and their differences using the score statistic

Large sample confidence intervals for counts, rates, and their differences

Large sample P-values for counts, rates, and their ratios

Large sample confidence intervals for ratios of counts and rates

A constant rate based on more person-time is more precise

Exact methods

What is a Poisson process?

Simulated examples

What if the data are not from a Poisson process? Part 1, overdispersion

What if the data are not from a Poisson process? Part 2, underdispersion

Must anything be rare?

Bicyclist deaths in 2010 and 2011

**5. Criticism of Incidence Rates**

Florence Nightingale, William Farr, and hospital mortality rates: debate in 1864

Florence Nightingale, William Farr, and hospital mortality rates: debate in 1996-97

Criticism of rates in the British Medical Journal in 1995

Criticism of incidence rates in 2009

**6. Stratified Analysis: Standardized Rates**

Why standardize?

External weights from a standard population: direct standardization

Comparing directly standardized rates

Choice of the standard influences the comparison of standardized rates

Standardized comparisons versus adjusted comparisons from variance-minimizing methods

Stratified analyses

Variations on directly standardized rates

Internal weights from a population: indirect standardization

The standardized mortality ratio (SMR)

Advantages of SMRs compared with SRRs (ratios of directly standardized rates)

Disadvantages of SMRs compared with SRRs (ratios of directly standardized rates)

The terminology of direct and indirect standardization

P-values for directly standardized rates

Confidence intervals for directly standardized rates

P-values and confidence intervals for SRRs (ratios of directly standardized rates)

Large sample P-values and confidence intervals for SMRs

Small sample P-values and confidence intervals for SMRs

Standardized rates should not be used as regression outcomes

Standardization is not always the best choice

**7. Stratified Analysis: Inverse-variance and Mantel-Haenszel Methods**

Inverse-variance methods

Inverse-variance analysis of rate ratios

Inverse-variance analysis of rate differences

Choosing between rate ratios and differences

Mantel-Haenszel methods

Mantel-Haenszel analysis of rate ratios

Mantel-Haenszel analysis of rate differences

P-values for stratified rate ratios or differences

Analysis of sparse data

Maximum-likelihood stratified methods

Stratified methods versus regression

**8. Collapsibility and Confounding**

What is collapsibility?

The British X-Trial: introducing variation in risk

Rate ratios and differences are noncollapsible because exposure influences person-time

Which estimate of the rate ratio should we prefer?

Behavior of risk ratios and differences

Hazard ratios and odds ratios

Comparing risks with other outcome measures

The Italian X-Trial: -levels of risk under no exposure

The American X-Cohort study: -levels of risk in a cohort study

The Swedish X-Cohort study: a collapsible risk ratio in confounded data

A summary of findings

A different view of collapsibility

Practical implications: avoid common outcomes

Practical implications: use risks or survival functions

Practical implications: case-control studies

Practical implications: uniform risk

Practical implications: use all events

**9. Poisson Regression for Rate Ratios**

The Poisson regression model for rate ratios

A short comparison with ordinary linear regression

A Poisson model without variables

A Poisson regression model with one explanatory variable

The iteration log

The header information above the table of estimates

Using a generalized linear model to estimate rate ratios

An alternative parameterization for Poisson models: a regression trick

Further comments about person-time

A short summary

**10. Poisson Regression for Rate Differences**

A regression model for rate differences

Florida and Alaska cancer mortality: regression models that fail

Florida and Alaska cancer mortality: regression models that succeed

A generalized linear model with a power link

A caution

**11. Linear Regression**

Limitations of ordinary least squares linear regression

Florida and Alaska cancer mortality rates

Weighted least squares linear regression

Importance weights for weighted least squares linear regression

Comparison of Poisson, weighted least squares, and ordinary least squares regression

Exposure to a carcinogen: ordinary linear regression ignores the precision of each rate

Differences in homicide rates: simple averages versus population-weighted averages

The place of ordinary least squares linear regression for the analysis of incidence rates

Variance weighted least squares regression

Cautions regarding inverse-variance weights

Why use variance weighted least squares?

A short comparison of weighted Poisson regression, variance weighted least squares, and weighted linear regression

Problems when age-standardized rates are used as outcomes

Ratios and spurious correlation

Linear regression with ln(rate) as the outcome

Predicting negative rates

Summary

**12. Model Fit**

Tabular and graphic displays

Goodness of fit tests: deviance and Pearson statistics

A conditional moment chi-squared test of fit

Limitations of goodness-of-fit statistics

Measures of dispersion

Robust variance estimator as a test of fit

Comparing models using the deviance

Comparing models using the Akaike and Bayesian information criteria

Example 1: using Stata’s generalized linear model command to decide between a rate ratio and a rate difference model for the randomized controlled trial of exercise and falls

Example 2: a rate ratio or a rate difference model for hypothetical data regarding the association between fall rates and age

A test of the model link

Residuals, influence analysis, and other measures

Adding model terms to improve fit

A caution

**13. Adjusting Standard Errors and Confidence Intervals**

Estimating the variance without regression

Poisson regression

Rescaling the variance using the Pearson dispersion statistic

Robust variance

Generalized estimating equations

Using the robust variance to study length of hospital stay

Computer intensive methods

The bootstrap idea

The bootstrap normal method

The bootstrap percentile method

The bootstrap bias-corrected percentile method

The bootstrap bias-corrected and accelerated method

The bootstrap-t method

Which bootstrap CI is best?

Permutation and randomization

Randomization to nearly equal groups

Better randomization using the randomized block design of the original study

A summary

**14. Storks and Babies, Revisited**

Neyman’s approach to his data

Using methods for incidence rates

A model that uses the stork/women ratio

**15. Flexible Treatment of Continuous Variables**

The problem

Quadratic splines

Fractional polynomials

Flexible adjustment for time

Which method is best?

**16. Judging Variation in Size of an Association**

An example: shoes and falls

Problem 1: Using subgroup P-values for interpretation

Problem 2: Failure to include main effect terms when interaction terms are used

Problem 3: Incorrectly concluding that there is no variation in association

Problem 4: Interaction may be present on a ratio scale but not on a difference scale, and vice versa

Problem 5: Failure to report all subgroup estimates in an evenhanded manner

**17. Negative Binomial Regression**

Negative binomial regression is a random effects or mixed model

An example: accidents among workers in a munitions factory

Introducing equal person-time in the homicide data

Letting person-time vary in the homicide data

Estimating a rate ratio for the homicide data

Another example using hypothetical data for five regions

Unobserved heterogeneity

Observing heterogeneity in the shoe data

Underdispersion

A rate difference negative binomial regression model

Conclusion

**18. Clustered Data**

Data from fictitious nursing homes

Results from , data simulations for the nursing homes

A single random set of data for the nursing homes

Variance adjustment methods

Generalized estimating equations (GEE)

Mixed model methods

What do mixed models estimate?

Mixed model estimates for the nursing home intervention

Simulation results for some mixed models

Mixed models weight observations differently than Poisson regression

Which should we prefer for clustered data, variance-adjusted or mixed models?

Additional model commands for clustered data

Further reading

**19. Longitudinal Data**

Just use rates

Using rates to evaluate governmental policies

Study designs for governmental policies

A fictitious water treatment and US mortality 1999-2013

Poisson regression

Population-averaged estimates (GEE)

Conditional Poisson regression, a fixed-effects approach

Negative binomial regression

Which method is best?

Water treatment in only 10 states

Conditional Poisson regression for the 10-state water-treatment data

A published study

**20. Matched Data**

Matching in case-control studies

Matching in randomized controlled trials

Matching in cohort studies

Matching to control confounding in some randomized trials and cohort studies

A benefit of matching: only matched sets with at least one outcome are needed

Study designs that match a person to themselves

A matched analysis can account for matching ratios that are not constant

Choosing between risks and rates for the crash data in Tables 20.1 and 20.2

Stratified methods for estimating risk ratios for matched data

Odds ratios, risk ratios, cell A, and matched data

Regression analysis of matched data for the odds ratio

Regression analysis of matched data for the risk ratio

Matched analysis of rates with one outcome event

Matched analysis of rates for recurrent events

The randomized trial of exercise and falls: some problems revealed

Final words

**21. Marginal Methods**

What are margins?

Converting logistic regression results into risk ratios or risk differences: marginal standardization

Estimating a rate difference from a rate ratio model

Death by age and sex: a short example

Skunk bite data: a long example

Obtaining the rate difference: crude rates

Using the robust variance

Adjusting for age

Full adjustment for age and sex

Marginal commands for interactions

Marginal methods for a continuous variable

Using a rate difference model to estimate a rate ratio: use the ln scale

**22. Bayesian Methods**

Cancer mortality rate in Alaska

The rate ratio for falling in a trial of exercise

**23. Exact Poisson Regression**

A simple example

A perfectly predicted outcome

Memory problems

A caveat

**24. Instrumental Variables**

The problem: what does a randomized controlled trial estimate?

Analysis by treatment received may yield biased estimates of treatment effect

Using an instrumental variable

Two-stage linear regression for instrumental variables

Generalized method of moments

Generalized method of moments for rates

What does an instrumental variable analysis estimate?

There is no free lunch

Final comments

**25. Hazards**

Data for a hypothetical treatment with exponential survival times

Poisson regression and exponential proportional hazards regression

Poisson and Cox proportional hazards regression

Hypothetical data for a rate that changes over time

A piecewise Poisson model

A more flexible Poisson model: quadratic splines

Another flexible Poisson model: restricted cubic splines

Flexibility with fractional polynomials

When should a Poisson model be used? Randomized trial of a terrible treatment

A real randomized trial, the PLCO screening trial

What if events are common?

Cox model or a flexible parametric model?

Collapsibility and survival functions

Relaxing the assumption of proportional hazards in the Cox model

Relaxing the assumption of proportional hazards for the Poisson model

Relaxing proportional hazards for the Royston-Parmar model

The life expectancy difference or ratio

Recurrent or multiple events

A short summary
