
Data Management Using Stata

A Practical Handbook, 1st Edition

By Michael N. Mitchell

Stata Press

387 pages

Paperback ISBN: 9781597180764
Published: 2010-05-24


Using simple language and illustrative examples, this book comprehensively covers data management tasks that bridge the gap between raw data and statistical analysis. Rather than focus on clusters of commands, the author takes a modular approach that enables readers to quickly identify and implement the necessary task without having to access background information first. Each section in the chapters presents a self-contained lesson that illustrates a particular data management task via examples, such as creating data variables and automating error checking. The text also discusses common pitfalls and how to avoid them and provides strategic data management advice. Ideal for both beginning statisticians and experienced users, this handy book helps readers solve problems and learn comprehensive data management skills.


The author uses a "learning by example" approach in the book. Overall this works well …

—Morteza Marzjarani, The American Statistician, November 2011

Table of Contents


Using this book

Overview of this book

Listing observations in this book

Reading and Writing Datasets


Reading Stata datasets

Saving Stata datasets

Reading comma-separated and tab-separated files

Reading space-separated files

Reading fixed-column files

Reading fixed-column files with multiple lines of raw data per observation

Reading SAS XPORT files

Common errors reading files

Entering data directly into the Stata Data Editor

Saving comma-separated and tab-separated files

Saving space-separated files

Saving SAS XPORT files

Data Cleaning


Double data entry

Checking individual variables

Checking categorical by categorical variables

Checking categorical by continuous variables

Checking continuous by continuous variables

Correcting errors in data

Identifying duplicates

Final thoughts on data cleaning

Labeling Datasets


Describing datasets

Labeling variables

Labeling values

Labeling utilities

Labeling variables and values in different languages

Adding comments to your dataset using notes

Formatting the display of variables

Changing the order of variables in a dataset

Creating Variables


Creating and changing variables

Numeric expressions and functions

String expressions and functions


Coding missing values

Dummy variables

Date variables

Date-and-time variables

Computations across variables

Computations across observations

More examples using the egen command

Converting string variables to numeric variables

Converting numeric variables to string variables

Renaming and ordering variables

Combining Datasets


Appending: Appending datasets

Appending: Problems

Merging: One-to-one match-merging

Merging: One-to-many match-merging

Merging: Merging multiple datasets

Merging: Update merges

Merging: Additional options when merging datasets

Merging: Problems merging datasets

Joining datasets

Crossing datasets

Processing Observations across Subgroups


Obtaining separate results for subgroups

Computing values separately by subgroups

Computing values within subgroups: Subscripting observations

Computing values within subgroups: Computations across observations

Computing values within subgroups: Running sums

Computing values within subgroups: More examples

Comparing the by and tsset commands

Changing the Shape of Your Data


Wide and long datasets

Introduction to reshaping long to wide

Reshaping long to wide: Problems

Introduction to reshaping wide to long

Reshaping wide to long: Problems

Multilevel datasets

Collapsing datasets

Programming for Data Management


Tips on long-term goals in data management

Executing do-files and making log files

Automating data checking

Combining do-files

Introducing Stata macros

Manipulating Stata macros

Repeating commands by looping over variables

Repeating commands by looping over numbers

Repeating commands by looping over anything

Accessing results saved from Stata commands

Saving results of estimation commands as data

Writing Stata programs

Additional Resources

Online resources for this book

Finding and installing additional programs

More online resources

Appendix: Common elements


About the Author

Michael N. Mitchell is a senior statistician in health services research. For 12 years, he worked in the Statistical Consulting Group of the UCLA Academic Technology Services.

Subject Categories

BISAC Subject Codes/Headings:
MATHEMATICS / Probability & Statistics / General