2nd Edition

Statistical Programming in SAS

By A. John Bailer Copyright 2020
    378 Pages 50 B/W Illustrations
    by Chapman & Hall

    Statistical Programming in SAS, Second Edition provides a foundation for programming to implement statistical solutions using SAS, a system that has been used to solve data-analytic problems for more than 40 years. The author includes motivating examples to inspire readers to generate programming solutions. Upper-level undergraduates, beginning graduate students, and professionals involved in generating programming solutions for data-analytic problems will benefit from this book. The ideal reader has some background in regression modeling and introductory experience with computer programming.

    The coverage of statistical programming in the second edition includes

     Getting data into the SAS system, engineering new features, and formatting variables

     Writing readable and well-documented code

     Structuring, implementing, and debugging programs that are well documented

     Creating solutions to novel problems

     Combining data sources, extracting parts of data sets, and reshaping data sets as needed for other analyses

     Generating general solutions using macros

     Customizing output

     Producing insight-inspiring data visualizations

     Parsing, processing, and analyzing text

     Programming solutions using matrices and connecting to R

     Covering topics that are part of both the SAS Base and Certification exams

    1. Structuring, implementing, and debugging programs to learn about data
        Statistical Programming
        Learning from Constructed, Artificial Data
        Good Programming Practice
        SAS Program Structure
        What Is a SAS Data Set?
        Internally Documenting SAS Program
        Basic Debugging
        Getting Help
        Exercises

    2. Reading, Creating and Formatting Data Sets
        What does a SAS Data Step do?
        Reading Data from External Files
        Reading CSV, Excel and TEXT files
        Temporary versus Permanent Status of Data Sets
        Formatting and Labeling Variables
        User-defined Formatting
        Recoding and Transforming Variables in a DATA Step
        Writing Out a File or Making a Simple Report
        Exercises

    3. Programming a DATA step
        Writing Programs by subdividing tasks
        Ordering How Tasks are Done
        Index-able Lists of variables, aka arrays
        Functions associated with Statistical Distributions
        Generating Variables Using Random Number Generators
        Remembering Variable Values across Observations
        Processing multiple observations for a single observation
        Case Study 1: Is the Two-Sample t-Test Robust to Violations of the Homogeneous Variance assumption?
        Efficiency considerations – how long does it take?
        Case Study 2: Monte Carlo Integration to Estimate an Integral
        Case Study 3: Simple Percentile-Based Bootstrap
        Case Study 4: Randomization Test for the Equality of Two Populations
        Exercises

    4. Combining, extracting and reshaping data
        Adding observations by SET-ing data sets
        Adding variables by MERGE-ing data sets
        Working with tables in PROC SQL
        Converting wide to long formats
        Converting long to wide formats
        Case Study: Reshaping a World Bank data set
        Building training and validation data sets
        Exercises
        Self-Study lab

    5. Macro Programming
        What Is a Macro and Why Would You Use It?
        Motivation for Macros: Numerical Integration to Determine P(0<Z<1.645)
        Processing Macros
        Macro Variables, Parameters, and Functions
        Conditional Execution, Looping, and Macros
        Saving Macros
        Functions and Routines for Macros
        Case Study: Macro for constructing training and test data set for Model Comparison
        Case Study: Processing Multiple Data Sets
        Exercises

    6. Customizing Output and Generating Data Visualizations
        Using the Output Delivery System
        Graphics in SAS
        ODS Statistical Graphics
        Modifying Graphics Using the ODS Graphics Editor
        Graphing with Styles and Templates
        Statistical Graphics—Entering the Land of SG Procedures
        Case Study: Using the SG Procedures
        Enhancing SG displays – options with SG procedure statements
        Using Annotate Data Sets to enhance SG displays
        Using Attribute Maps to enhance SG displays
        Exercises

    7. Processing Text
        Cleaning and Processing Text Data
        Starting with Character Functions
        Processing Text
        Case Study: Sentiment in State of the Union addresses
        Case Study: Reading Text from a Web Page
        Regular Expressions
        Case Study (revisited) – Applying Regular Expressions
        Exercises

    8. Programming with Matrices and Vectors
        Defining a Matrix and Subscripting
        Using Diagonal Matrices and Stacking Matrices
        Using Elementwise Operations, Repeating, and Multiplying Matrices
        Importing a Data Set into SAS/IML and Exporting Matrices from SAS/IML to a Data Set
        Case Study 1: Monte Carlo Integration to Estimate π
        Case Study 2: Bisection Root Finder
        Case Study 3: Randomization Test Using Matrices Imported from PROC PLAN
        Case Study 4: SAS/IML Module to Implement Monte Carlo Integration to Estimate π
        Storing and loading SAS/IML modules
        SAS/IML and R
        Exercises

    References

    Biography

    A. John Bailer, PhD, PStat®, is a University Distinguished Professor and a founding chair of the Department of Statistics at Miami University in Oxford, Ohio, where he is also an affiliate member of the Departments of Biology and Sociology and Gerontology and of the Institute for the Environment and Sustainability. He is President of the International Statistical Institute (2019–2021). He previously served on the Board of Directors of the American Statistical Association. He is a Fellow of the American Statistical Association, the Society for Risk Analysis, and the American Association for the Advancement of Science. His research has focused on quantitative risk estimation, with collaborations addressing problems in toxicology, environmental health, and occupational safety. He received the E. Phillips Knox Distinguished Teaching Award in 2018 after previously receiving the Distinguished Teaching Award for Excellence in Graduate Instruction and Mentoring and the College of Arts and Science Distinguished Teaching Award. He is also a co-founder of, and continuing panelist on, the Stats+Stories podcast (www.statsandstories.net).

    "This book is useful for people who want to learn SAS programing, and assumes the students have knowledge of multiple linear regression and one-way ANOVA models.…The second edition has added a chapter on text processing, and reorganized the chapter order…Some topics that are relevant for the SAS Base and Certifications exams are covered, and a nice feature is the highlighting of programing tips in gray."
    ~Technometrics

    "This is a very complete book for programming SAS in statistical analyses. This second edition offers the possibility to debug some programs and provides new examples and applications, which are very useful. This book is a very useful companion tool for students or beginners in SAS, or for more experienced statisticians who already use SAS for statistical analyses."
    ~ISCB News