1st Edition

Data Management Using Stata
A Practical Handbook

ISBN 9781597180764
Published May 24, 2010 by Stata Press
387 Pages


Book Description

Using simple language and illustrative examples, this book comprehensively covers data management tasks that bridge the gap between raw data and statistical analysis. Rather than focusing on clusters of commands, the author takes a modular approach that enables readers to quickly identify and implement the necessary task without having to access background information first. Each section in the chapters presents a self-contained lesson that illustrates a particular data management task via examples, such as creating date variables and automating error checking. The text also discusses common pitfalls and how to avoid them and provides strategic data management advice. Ideal for both beginning statisticians and experienced users, this handy book helps readers solve problems and learn comprehensive data management skills.

Table of Contents

Using this book
Overview of this book
Listing observations in this book

Reading and Writing Datasets
Reading Stata datasets
Saving Stata datasets
Reading comma-separated and tab-separated files
Reading space-separated files
Reading fixed-column files
Reading fixed-column files with multiple lines of raw data per observation
Reading SAS XPORT files
Common errors reading files
Entering data directly into the Stata Data Editor
Saving comma-separated and tab-separated files
Saving space-separated files
Saving SAS XPORT files

Data Cleaning
Double data entry
Checking individual variables
Checking categorical by categorical variables
Checking categorical by continuous variables
Checking continuous by continuous variables
Correcting errors in data
Identifying duplicates
Final thoughts on data cleaning

Labeling Datasets
Describing datasets
Labeling variables
Labeling values
Labeling utilities
Labeling variables and values in different languages
Adding comments to your dataset using notes
Formatting the display of variables
Changing the order of variables in a dataset

Creating Variables
Creating and changing variables
Numeric expressions and functions
String expressions and functions
Coding missing values
Dummy variables
Date variables
Date-and-time variables
Computations across variables
Computations across observations
More examples using the egen command
Converting string variables to numeric variables
Converting numeric variables to string variables
Renaming and ordering variables

Combining Datasets
Appending: Appending datasets
Appending: Problems
Merging: One-to-one match-merging
Merging: One-to-many match-merging
Merging: Merging multiple datasets
Merging: Update merges
Merging: Additional options when merging datasets
Merging: Problems merging datasets
Joining datasets
Crossing datasets

Processing Observations across Subgroups
Obtaining separate results for subgroups
Computing values separately by subgroups
Computing values within subgroups: Subscripting observations
Computing values within subgroups: Computations across observations
Computing values within subgroups: Running sums
Computing values within subgroups: More examples
Comparing the by and tsset commands

Changing the Shape of Your Data
Wide and long datasets
Introduction to reshaping long to wide
Reshaping long to wide: Problems
Introduction to reshaping wide to long
Reshaping wide to long: Problems
Multilevel datasets
Collapsing datasets

Programming for Data Management
Tips on long-term goals in data management
Executing do-files and making log files
Automating data checking
Combining do-files
Introducing Stata macros
Manipulating Stata macros
Repeating commands by looping over variables
Repeating commands by looping over numbers
Repeating commands by looping over anything
Accessing results saved from Stata commands
Saving results of estimation commands as data
Writing Stata programs

Additional Resources
Online resources for this book
Finding and installing additional programs
More online resources

Appendix: Common elements





Michael N. Mitchell is a senior statistician in health services research. For 12 years, he worked in the Statistical Consulting Group of the UCLA Academic Technology Services.


The author uses a "learning by example" approach in the book. Overall this works well …
—Morteza Marzjarani, The American Statistician, November 2011