Regression Models as a Tool in Medical Research

By Werner Vach

© 2013 – Chapman and Hall/CRC

496 pages | 158 B/W Illus.

Purchasing Options:
Hardback: 9781466517486
pub: 2012-11-26
US Dollars $97.95


About the Book

While regression models have become standard tools in medical research, understanding how to properly apply the models and interpret the results is often challenging for beginners. Regression Models as a Tool in Medical Research presents the fundamental concepts and important aspects of regression models most commonly used in medical research, including the classical regression model for continuous outcomes, the logistic regression model for binary outcomes, and the Cox proportional hazards model for survival data. The text emphasizes adequate use, correct interpretation of results, appropriate presentation of results, and avoidance of potential pitfalls.

After reviewing popular models and basic methods, the book focuses on advanced topics and techniques. It considers the comparison of regression coefficients, the selection of covariates, the modeling of nonlinear and nonadditive effects, and the analysis of clustered and longitudinal data, highlighting the impact of selection mechanisms, measurement error, and incomplete covariate data. The text then covers the use of regression models to construct risk scores and predictors. It also gives an overview of more specific regression models and their applications as well as alternatives to regression modeling. The mathematical details underlying the estimation and inference techniques are provided in the appendices.


"With its focus on conceptual understanding and practical applications, this book is highly recommended to medical and other health science researchers who desire to improve their understanding of regression analysis for a better understanding of medical literature, for the adequate presentation of their own regression outcomes, or for improved interpretation of their results for publications and presentations. … Additionally, this book can serve as supplemental reading for an applied graduate level course on general regression models."

—Journal of Agricultural, Biological, and Environmental Statistics

"The book can be a very helpful contribution especially for researchers in medical sciences when performing their statistical analyses and trying to interpret the results obtained. … This book provides plenty of practical knowledge about these basic models and also some of their extensions that is often not easy to find from statistical textbooks or from software manuals. The basic methods are well explained and illustrated by numerous practical examples, mainly using simulated datasets."

—Tapio Nummi, International Statistical Review

Table of Contents


Why Use Regression Models?

Why using simple regression models?

Why using multiple regression models?

Some basic notation

An Introductory Example

A single line model

Fitting a single line model

Taking uncertainty into account

A two lines model

How to perform these steps with Stata

Exercise 5-HIAA and serotonin

Exercise Haemoglobin

Exercise Scaling of variables

The Classical Multiple Regression Model

Adjusted Effects

Adjusting for confounding

Adjusting for imbalances

Exercise Physical activity in school children

Inference for the Classical Multiple Regression Model

The traditional and the modern way of inference

How to perform the modern way of inference with Stata

How valid and good are least squares estimates?

A note on the use and interpretation of p-values in regression analyses

Logistic Regression

The definition of the logistic regression model

Analyzing a dose response experiment by logistic regression

How to fit a dose response model with Stata

Estimating odds ratios and adjusted odds ratios using logistic regression

How to compute (adjusted) odds ratios using logistic regression in Stata

Exercise Allergy in children

More on logit scale and odds scale

Inference for the Logistic Regression Model

The maximum likelihood principle

Properties of the ML estimates for logistic regression

Inference for a single regression parameter

How to perform Wald tests and likelihood ratio tests in Stata

Categorical Covariates

Incorporating categorical covariates in a regression model

Some technicalities in using categorical covariates

Testing the effect of a categorical covariate

The handling of categorical covariates in Stata

Presenting results of a regression analysis involving categorical covariates in a table

Exercise Physical occupation and back pain

Exercise Odds ratios and categorical covariates

Handling Ordered Categories: A First Lesson in Regression Modeling Strategies

The Cox Proportional Hazards Model

Modeling the risk of dying

Modeling the risk of dying in continuous time

Using the Cox proportional hazards model to quantify the difference in survival between groups

How to fit a Cox proportional hazards model with Stata

Exercise Prognostic factors in breast cancer patients – Part 1

Common Pitfalls in Using Regression Models

Association vs. causation

Difference between subjects vs. difference within subjects

Real world models vs. statistical models

Relevance vs. significance

Exercise Prognostic factors in breast cancer patients – Part 2


Some Useful Technicalities

Illustrating models by using model based predictions

How to work with predictions in Stata

Residuals and the standard deviation of the error term

Working with residuals and the RMSE in Stata

Linear and nonlinear functions of regression parameters

Transformations of regression parameters

Centering of covariate values

Exercise Paternal smoking vs. maternal smoking

Comparing Regression Coefficients

Comparing regression coefficients among continuous covariates

Comparing regression coefficients among binary covariates

Measuring the impact of changing covariate values

Translating regression coefficients

How to compare regression coefficients in Stata

Exercise Health in young people

Power and Sample Size

The power of a regression analysis

Determinants of power in regression models with a single covariate

Determinants of power in regression models with several covariates

Power and sample size calculations when a sample from the covariate distribution is given

Power and sample size calculations given a sample from the covariate distribution with Stata

The choice of the values of the regression parameters in a simulation study

Simulating a covariate distribution

Simulating a covariate distribution with Stata

Choosing the parameters to simulate a covariate distribution

Necessary sample sizes to justify asymptotic methods

Exercise Power considerations for a study on neck pain

Exercise Choosing between two outcomes

The Selection of the Sample

Selection in dependence on the covariates

Selection in dependence on the outcome

Sampling in dependence on covariate values

The Selection of Covariates

Fitting regression models with correlated covariates

The "Adjustment vs. power" dilemma

The "Adjustment makes effects small" dilemma

Adjusting for mediators

Adjusting for confounding - A useful academic game

Adjusting for correlated confounders

Including predictive covariates

Automatic variable selection

How to choose relevant sets of covariates

Preparing the selection of covariates: Analyzing the association among covariates

Preparing the selection of covariates: Univariate analyses?

Exercise Vocabulary size in young children – Part 1

Preprocessing of the covariate space

How to preprocess the covariate space with Stata

Exercise Vocabulary size in young children – Part 2

What is a confounder?

Modeling Nonlinear Effects

Quadratic regression

Polynomial regression


Fractional Polynomials

Gain in power by modeling nonlinear effects?

Demonstrating the effect of a covariate

Demonstrating a nonlinear effect

Describing the shape of a nonlinear effect

Detecting nonlinearity by analysis of residuals

Judging of nonlinearity may require adjustment

How to model nonlinear effects in Stata

The impact of ignoring nonlinearity

Modeling the nonlinear effect of confounders

Nonlinear models

Exercise Serum markers for AMI

Transformation of Covariates

Transformations to obtain a linear relationship

Transformation of skewed covariates

To categorize or not to categorize

Effect Modification and Interactions

Modeling effect modification

Adjusted effect modifications


Modeling effect modifications in several covariates

The effect of a covariate in the presence of interactions

Interactions as deviations from additivity

Scales and interactions

Ceiling effects and interactions

Hunting for interactions

How to analyze effect modification and interactions with Stata

Exercise Treatment interactions in a randomized clinical trial for the treatment of malignant glioma

Applying Regression Models to Clustered Data

Why clustered data can invalidate inference

Robust standard errors

Improving the efficiency

Within and between cluster effects

Some unusual but useful usages of robust standard errors in clustered data

How to take clustering into account in Stata

Applying Regression Models to Longitudinal Data

Analyzing time trends in the outcome

Analyzing time trends in the effect of covariates

Analyzing the effect of covariates

Analyzing individual variation in time trends

Analyzing summary measures

Analyzing the effect of change

How to perform regression modeling of longitudinal data in Stata

Exercise Increase of body fat in adolescents

The Impact of Measurement Error

The impact of systematic and random measurement error

The impact of misclassification

The impact of measurement error in confounders

The impact of differential misclassification and measurement error

Studying the measurement error

Exercise Measurement error and interactions

The Impact of Incomplete Covariate Data

Missing value mechanisms

Properties of a complete case analysis

Bias due to using ad hoc methods

Advanced techniques to handle incomplete covariate data

Handling of partially defined covariates


Risk Scores

What is a risk score?

Judging the usefulness of a risk score

The precision of risk score values

The overall precision of a risk score

Using Stata’s predict command to compute risk scores

Categorization of risk scores

Exercise Computing risk scores for breast cancer patients

Construction of Predictors

From risk scores to predictors

Predictions and prediction intervals for a continuous outcome

Predictions for a binary outcome

Construction of predictions for time to event data

How to construct predictions with Stata

The overall precision of a predictor

Evaluating the Predictive Performance

The predictive performance of an existing predictor

How to assess the predictive performance of an existing predictor in Stata

Estimating the predictive performance of a new predictor

How to assess the predictive performance via cross validation in Stata

Exercise Assessing the predictive performance of a prognostic score in breast cancer patients

Outlook: Construction of Parsimonious Predictors


Alternatives to Regression Modeling


Measures of association: Correlation coefficients

Measures of association: The odds ratio

Propensity scores

Classification and regression trees

Specific Regression Models

Probit regression for binary outcomes

Generalized linear models

Regression models for count data

Regression models for ordinal outcome data

Quantile regression and robust regression

ANOVA and regression

Specific Usages of Regression Models

Logistic regression for the analysis of case control studies

Logistic regression for the analysis of matched case control studies

Adjusting for baseline values in randomized clinical trials

Assessing predictive factors

Incorporating time varying covariates in a Cox model

Time dependent effects in a Cox model

Using the Cox model in the presence of competing risks

Using the Cox model to analyze multi state models

What Is a Good Model?

Does the model fit the data?

How good are predictions?

Explained variation

Goodness of fit

Model stability

The usefulness of a model

Final Remarks on the Role of Prespecified Models and Model Development


Mathematics behind the Classical Linear Regression Model

Computing regression parameters in simple linear regression

Computing regression parameters in the classical multiple regression model

Estimation of the standard error

Construction of confidence intervals and p-values

Mathematics behind the Logistic Regression Model

The least squares principle as a maximum likelihood principle

Maximizing the likelihood of a logistic regression model

Estimating the standard error of the ML estimates

Testing composite hypotheses

The Modern Way of Inference

Robust estimation of standard errors

Robust estimation of standard errors in the presence of clustering

Mathematics for Risk Scores and Predictors

Computing individual survival probabilities after fitting a Cox model

Standard errors for risk scores

The delta rule



About the Author

Werner Vach is a professor of medical informatics and clinical epidemiology at the University of Freiburg. Dr. Vach has co-authored more than 150 publications in medical journals. His research encompasses biostatistics methodology in the areas of incomplete covariate data, prognostic studies, diagnostic studies, and agreement studies.

Subject Categories

BISAC Subject Codes/Headings:
MATHEMATICS / Probability & Statistics / General
MEDICAL / Biostatistics