1st Edition
Biostatistics in Public Health Using STATA
Striking a balance between theory, application, and programming, Biostatistics in Public Health Using STATA is a user-friendly guide to applied statistical analysis in public health using STATA version 14. The book supplies public health practitioners and students with the opportunity to gain expertise in the application of statistics in epidemiologic studies.
The book shares the authors’ insights gathered through decades of collective experience teaching in the academic programs of biostatistics and epidemiology. Maintaining a focus on the application of statistics in public health, it facilitates a clear understanding of the basic commands of STATA for reading and saving databases.
The book covers data description, graph construction, significance tests, linear regression models, analysis of variance, categorical data analysis, logistic regression models, Poisson regression models, survival analysis, analysis of correlated data, and advanced programming in STATA.
Each chapter is based on one or more research problems linked to public health. Additionally, every chapter includes exercise sets for practicing concepts and exercise solutions for self or group study. Several examples are presented that illustrate the applications of the statistical method in the health sciences using epidemiologic study designs.
Presenting high-level statistics in an accessible manner across research fields in public health, this book is suitable as a textbook for biostatistics and epidemiology courses or as a reference on statistical applications in public health.
For readers new to STATA, the first three chapters should be read sequentially, as they form the basis of an introductory course to this software.
Basic Commands
Introduction
Entering STATA
Taskbar
Help
STATA Working Directories
Reading a Data File
insheet Procedure
Types of Files
Data Editor
Data Description
Most Useful Commands
list Command
Mathematical and Logical Operators
generate Command
recode Command
drop Command
replace Command
label Command
summarize Command
do-file Editor
Descriptive Statistics and Graphs
tabulate Command
Graph Construction
Introduction
Box Plot
Histogram
Bar Chart
Significance Tests
Introduction
Normality Test
Variance Homogeneity
Student’s t-Test for Independent Samples
Confidence Intervals for Testing the Null Hypothesis
Nonparametric Tests for Unpaired Groups
Sample Size and Statistical Power
Linear Regression Models
Introduction
Model Assumptions
Parameter Estimation
Hypothesis Testing
Coefficient of Determination
Pearson Correlation Coefficient
Scatter Plot
Running the Model
Centering
Bootstrapping
Multiple Linear Regression Model
Partial Hypothesis
Prediction
Polynomial Linear Regression Model
Sample Size and Statistical Power
Considerations for the Assumptions of the Linear Regression Model
Analysis of Variance
Introduction
Data Structure
Example for Fixed Effects
Linear Model with Fixed Effects
Analysis of Variance with Fixed Effects
Programming for ANOVA
Planned Comparisons (before Observing the Data)
Multiple Comparisons: Unplanned Comparisons
Random Effects
Other Measures Related to the Random Effects Model
Example of a Random Effects Model
Sample Size and Statistical Power
Categorical Data Analysis
Introduction
Cohort Study
Case-Control Study
Sample Size and Statistical Power
Logistic Regression Model
Model Definition
Parameter Estimation
Programming the Logistic Regression Model
Alternative Database
Estimating the Odds Ratio
Significance Tests
Extension of the Logistic Regression Model
Adjusted OR and the Confounding Effect
Effect Modification
Prevalence Ratio
Nominal and Ordinal Outcomes
Overdispersion
Sample Size and Statistical Power
Poisson Regression Model
Model Definition
Relative Risk
Parameter Estimation
Example
Programming the Poisson Regression Model
Assessing Interaction Terms
Overdispersion
Survival Analysis
Introduction
Probability of Survival
Components of the Study Design
Kaplan–Meier Method
Programming of S(t)
Hazard Function
Relationship between S(t) and h(t)
Cumulative Hazard Function
Median Survival Time and Percentiles
Comparison of Survival Curves
Proportional Hazards Assumption
Significance Assessment
Cox Proportional Hazards Model
Assessment of the Proportional Hazards Assumption
Survival Function Estimation Using the Cox Proportional Hazards Model
Stratified Cox Proportional Hazards Model
Analysis of Correlated Data
Regression Models with Correlated Data
Mixed Models
Random Intercept
Using the mixed and gllamm Commands with a Random Intercept
Using the mixed Command with Random Intercept and Slope
Mixed Models in a Sampling Design
Introduction to Advanced Programming in STATA
Introduction
do-files
program Command
Log Files
trace Command
Delimiters
Indexing
Local Macros
Scalars
Loops (foreach and forvalues)
Application of matrix and local Commands for Prevalence Estimation
References
Index
Biography
Erick L. Suárez is a professor of biostatistics in the Department of Biostatistics and Epidemiology at the University of Puerto Rico Graduate School of Public Health. He has more than 25 years of experience teaching biostatistics at the graduate level and has co-authored more than 75 peer-reviewed publications on chronic and infectious diseases. Dr. Suárez has been a co-investigator on several NIH-funded grants related to cancer, HPV, HCV, and diabetes. He has extensive experience in statistical consulting with biomedical researchers, particularly in the analysis of microarray data in breast cancer.
Cynthia M. Pérez is a professor of epidemiology in the Department of Biostatistics and Epidemiology at the University of Puerto Rico Graduate School of Public Health. She has taught epidemiology and biostatistics for over 20 years. She has also directed mentoring and training efforts for public health and medical students at the University of Puerto Rico. She has been the principal investigator or co-investigator of research grants in diverse areas of public health, including diabetes, metabolic syndrome, periodontal disease, viral hepatitis, and HPV infection. She is the author or co-author of more than 75 peer-reviewed publications.
Graciela M. Nogueras is a statistical analyst at the University of Texas MD Anderson Cancer Center in Houston, Texas. She is currently enrolled in the PhD program in biostatistics at the University of Texas Graduate School of Public Health. She has co-authored more than 30 peer-reviewed publications. For the past nine years, she has been performing statistical analyses for clinical and basic science researchers. She has been assisting with the design of clinical trials and animal research studies, performing sample size calculations, and writing reports on clinical trial progress and interim analyses of efficacy and safety data for the University of Texas MD Anderson Data and Safety Monitoring Board.
Camille Moreno-Gorrín is a graduate of the Master of Science Program in Epidemiology at the University of Puerto Rico Graduate School of Public Health. During her graduate studies, she was a research assistant at the Comprehensive Cancer Center of the University of Puerto Rico where she co-authored several articles in biomedical journals. She also worked as a research coordinator for the HIV/AIDS Surveillance System of the Puerto Rico Department of Health, where she conducted research on intervention programs to link HIV patients to care.