Biostatistics in Public Health Using STATA

1st Edition

By Erick L. Suárez, Cynthia M. Pérez, Graciela M. Nogueras, Camille Moreno-Gorrín

CRC Press

190 pages | 48 B/W Illus.

Paperback: 9780367341480
pub: 2019-09-08
Hardback: 9781498721998
pub: 2016-03-24
eBook (VitalSource): 9780429257858
pub: 2016-03-24


Striking a balance between theory, application, and programming, Biostatistics in Public Health Using STATA is a user-friendly guide to applied statistical analysis in public health using STATA version 14. The book supplies public health practitioners and students with the opportunity to gain expertise in the application of statistics in epidemiologic studies.

The book shares the authors’ insights gathered over decades of collective experience teaching in academic programs in biostatistics and epidemiology. Maintaining a focus on the application of statistics in public health, it facilitates a clear understanding of the basic STATA commands for reading and saving databases.

The book covers data description, graph construction, significance tests, linear regression models, analysis of variance, categorical data analysis, logistic regression, Poisson regression, survival analysis, analysis of correlated data, and advanced programming in STATA.

Each chapter is based on one or more research problems linked to public health. Additionally, every chapter includes exercise sets for practicing concepts and exercise solutions for self-study or group study. Several examples illustrate applications of statistical methods in the health sciences using epidemiologic study designs.

Presenting high-level statistics in an accessible manner across research fields in public health, this book is suitable as a textbook for biostatistics and epidemiology courses or as a reference for statistical applications in public health.

For readers new to STATA, the first three chapters should be read sequentially, as they form the basis of an introductory course to this software.

Table of Contents

Basic Commands


Entering STATA



STATA Working Directories

Reading a Data File

insheet Procedure

Types of Files

Data Editor

Data Description

Most Useful Commands

list Command

Mathematical and Logical Operators

generate Command

recode Command

drop Command

replace Command

label Command

summarize Command

do-file Editor

Descriptive Statistics and Graphs

tabulate Command

Graph Construction


Box Plot


Bar Chart

Significance Tests


Normality Test

Variance Homogeneity

Student’s t-Test for Independent Samples

Confidence Intervals for Testing the Null Hypothesis

Nonparametric Tests for Unpaired Groups

Sample Size and Statistical Power

Linear Regression Models


Model Assumptions

Parameter Estimation

Hypothesis Testing

Coefficient of Determination

Pearson Correlation Coefficient

Scatter Plot

Running the Model



Multiple Linear Regression Model

Partial Hypothesis


Polynomial Linear Regression Model

Sample Size and Statistical Power

Considerations for the Assumptions of the Linear Regression Model

Analysis of Variance


Data Structure

Example for Fixed Effects

Linear Model with Fixed Effects

Analysis of Variance with Fixed Effects

Programming for ANOVA

Planned Comparisons (before Observing the Data)

Multiple Comparisons: Unplanned Comparisons

Random Effects

Other Measures Related to the Random Effects Model

Example of a Random Effects Model

Sample Size and Statistical Power

Categorical Data Analysis


Cohort Study

Case-Control Study

Sample Size and Statistical Power

Logistic Regression Model

Model Definition

Parameter Estimation

Programming the Logistic Regression Model

Alternative Database

Estimating the Odds Ratio

Significance Tests

Extension of the Logistic Regression Model

Adjusted OR and the Confounding Effect

Effect Modification

Prevalence Ratio

Nominal and Ordinal Outcomes


Sample Size and Statistical Power

Poisson Regression Model

Model Definition

Relative Risk

Parameter Estimation


Programming the Poisson Regression Model

Assessing Interaction Terms


Survival Analysis


Probability of Survival

Components of the Study Design

Kaplan–Meier Method

Programming of S(t)

Hazard Function

Relationship between S(t) and h(t)

Cumulative Hazard Function

Median Survival Time and Percentiles

Comparison of Survival Curves

Proportional Hazards Assumption

Significance Assessment

Cox Proportional Hazards Model

Assessment of the Proportional Hazards Assumption

Survival Function Estimation Using the Cox Proportional Hazards Model

Stratified Cox Proportional Hazards Model

Analysis of Correlated Data

Regression Models with Correlated Data

Mixed Models

Random Intercept

Using the mixed and gllamm Commands with a Random Intercept

Using the mixed Command with Random Intercept and Slope

Mixed Models in a Sampling Design

Introduction to Advanced Programming in STATA



program Command

Log Files

trace Command



Local Macros


Loops (foreach and forvalues)

Application of matrix and local Commands for Prevalence




About the Authors

Erick L. Suárez is a professor of biostatistics in the Department of Biostatistics and Epidemiology at the University of Puerto Rico Graduate School of Public Health. He has more than 25 years of experience teaching biostatistics at the graduate level and has co-authored more than 75 peer-reviewed publications on chronic and infectious diseases. Dr. Suárez has been a co-investigator on several NIH-funded grants related to cancer, HPV, HCV, and diabetes. He has extensive experience in statistical consulting with biomedical researchers, particularly in the analysis of microarray data in breast cancer.

Cynthia M. Pérez is a professor of epidemiology in the Department of Biostatistics and Epidemiology at the University of Puerto Rico Graduate School of Public Health. She has taught epidemiology and biostatistics for over 20 years. She has also directed mentoring and training efforts for public health and medical students at the University of Puerto Rico. She has been the principal investigator or co-investigator of research grants in diverse areas of public health including diabetes, metabolic syndrome, periodontal disease, viral hepatitis, and HPV infection. She is the author or co-author of more than 75 peer-reviewed publications.

Graciela M. Nogueras is a statistical analyst at the University of Texas MD Anderson Cancer Center in Houston, Texas. She is currently enrolled in the PhD program in biostatistics at the University of Texas Graduate School of Public Health. She has co-authored more than 30 peer-reviewed publications. For the past nine years, she has performed statistical analyses for clinical and basic science researchers. She has assisted with the design of clinical trials and animal research studies, performed sample size calculations, and written reports on clinical trial progress and interim analyses of efficacy and safety data for the University of Texas MD Anderson Data and Safety Monitoring Board.

Camille Moreno-Gorrín is a graduate of the Master of Science Program in Epidemiology at the University of Puerto Rico Graduate School of Public Health. During her graduate studies, she was a research assistant at the Comprehensive Cancer Center of the University of Puerto Rico where she co-authored several articles in biomedical journals. She also worked as a research coordinator for the HIV/AIDS Surveillance System of the Puerto Rico Department of Health, where she conducted research on intervention programs to link HIV patients to care.

Subject Categories

BISAC Subject Codes/Headings:
MATHEMATICS / Probability & Statistics / General
MEDICAL / Health Care Delivery
MEDICAL / Public Health