An Introduction to Survival Analysis Using Stata, Revised Third Edition

4th Edition

By Mario Cleves, William Gould, Yulia Marchenko

Stata Press

428 pages

Paperback, ISBN 9781597181747
Published: 2016-05-10

An Introduction to Survival Analysis Using Stata, Revised Third Edition is the ideal tutorial for professional data analysts who want to learn survival analysis for the first time, or who are well versed in survival analysis but less dexterous in using Stata to analyze survival data. The text also serves as a valuable reference for readers who already have experience using Stata’s survival analysis routines.

The revised third edition has been updated for Stata 14, and it includes a new section on predictive margins and marginal effects, which demonstrates how to obtain and visualize marginal predictions and marginal effects using the margins and marginsplot commands after survival regression models.

Survival analysis is a field of its own that requires specialized data management and analysis procedures. To meet this requirement, Stata provides the st family of commands for organizing and summarizing survival data.

This book provides statistical theory, step-by-step procedures for analyzing survival data, an in-depth usage guide for Stata's most widely used st commands, and a collection of tips for using Stata to analyze survival data and to present the results. This book develops from first principles the statistical concepts unique to survival data and assumes only a knowledge of basic probability and statistics and a working knowledge of Stata.

The first three chapters of the text cover basic theoretical concepts: hazard functions, cumulative hazard functions, and their interpretations; survivor functions; hazard models; and a comparison of nonparametric, semiparametric, and parametric methodologies. Chapter 4 deals with censoring and truncation. The next three chapters cover the formatting, manipulation, stsetting, and error checking involved in preparing survival data for analysis using Stata's st analysis commands. Chapter 8 covers nonparametric methods, including the Kaplan–Meier and Nelson–Aalen estimators and the various nonparametric tests for the equality of survival experience.

Chapters 9–11 discuss Cox regression and include various examples of fitting a Cox model, obtaining predictions, interpreting results, building models, model diagnostics, and regression with survey data. The next four chapters cover parametric models, which are fit using Stata's streg command. These chapters include detailed derivations of all six parametric models currently supported in Stata and methods for determining which model is appropriate, as well as information on stratification, obtaining predictions, and advanced topics such as frailty models. Chapter 16 is devoted to power and sample-size calculations for survival studies. The final chapter covers survival analysis in the presence of competing risks.


"This is an application-oriented introduction to survival analysis using Stata. The authors have focused on intuitions without getting into technical details. For example … the rather mysterious partial likelihood was elegantly illustrated with a small dataset and simple derivations for conditional probabilities. The book provides an excellent coverage of commonly used nonparametric, semiparametric, and parametric analyses of survival data, with ample application examples. The implementation of each survival approach has been carefully laid out in Stata syntax and real data analyses. Moreover, the material covered in the book is surprisingly comprehensive, including Cox models with time-varying covariates, shared frailty models, multiple imputations, and competing risk regression. Those topics are often encountered in practice but usually missing from an introductory book of survival analysis. The revised third edition has been updated to reflect the welcome additions in Stata 14 relative to previous versions. … The revised third edition provides not only an excellent tutorial to anyone who is interested in learning survival models with examples, but also an extremely handy reference to researchers who would like to perform survival analyses in Stata."

—Yu Cheng, University of Pittsburgh, in The American Statistician, April 2018

Table of Contents

The problem of survival analysis

Parametric modeling 

Semiparametric modeling

Nonparametric analysis 

Linking the three approaches

Describing the distribution of failure times

The survivor and hazard functions

The quantile function

Interpreting the cumulative hazard and hazard rate

Means and medians

Hazard models

Parametric models

Semiparametric models

Analysis time (time at risk)

Censoring and truncation

Recording survival data

The desired format 

Other formats

Example: Wide-form snapshot data

Using stset

A short lesson on dates

Purposes of the stset command

Syntax of the stset command

After stset

Look at stset’s output

List some of your data 

Use stdescribe

Use stvary 

Perhaps use stfill 

Example: Hip-fracture data

Nonparametric analysis

Inadequacies of standard univariate methods 

The Kaplan–Meier estimator

The Nelson–Aalen estimator

Estimating the hazard function

Estimating mean and median survival times

Tests of hypothesis

The Cox proportional hazards model

Using stcox

Likelihood calculations

Stratified analysis

Cox models with shared frailty

Cox models with survey data

Cox model with missing data—multiple imputation

Model building using stcox

Indicator variables

Categorical variables

Continuous variables

Time-varying variables

Modeling group effects: fixed-effects, random-effects, stratification, and clustering

The Cox model: Diagnostics

Testing the proportional-hazards assumption

Residuals and diagnostic measures

Reye’s syndrome data

Parametric models

Classes of parametric models

A survey of parametric regression models in Stata

The exponential model

Weibull regression

Gompertz regression (PH metric)

Lognormal regression (AFT metric)

Loglogistic regression (AFT metric)

Generalized gamma regression (AFT metric)

Choosing among parametric models

Postestimation commands for parametric models

Use of predict after streg

Using stcurve

Predictive margins and marginal effects

Generalizing the parametric regression model

Frailty models

Power and sample-size determination for survival analysis

Estimating sample size

Accounting for withdrawal and accrual of subjects 

Estimating power and effect size 

Tabulating or graphing results

Competing risks

Cause-specific hazards

Cumulative incidence functions

Nonparametric analysis

Semiparametric analysis

Parametric analysis

About the Authors

Mario Cleves is Professor and the Biostatistics Section Chief in the Department of Pediatrics at the University of Arkansas for Medical Sciences.

William Gould is the president and head of development at StataCorp.

Yulia Marchenko is a senior statistician at StataCorp.

All three are authors of Stata statistical software, in particular of Stata’s widely used survival analysis suite.
