1st Edition
Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data
An Applied Treatment of Modern Graphical Methods for Analyzing Categorical Data
Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data presents an applied treatment of modern methods for the analysis of categorical data, both discrete response data and frequency data. It explains how to use graphical methods for exploring data, spotting unusual features, visualizing fitted models, and presenting results.
The book is designed for advanced undergraduate and graduate students in the social and health sciences, epidemiology, economics, business, statistics, and biostatistics as well as researchers, methodologists, and consultants who can use the methods with their own data and analyses. Along with describing the necessary statistical theory, the authors illustrate the practical application of the techniques to a large number of substantive problems, including how to organize data, conduct an analysis, produce informative graphs, and evaluate what the graphs reveal about the data.
The first part of the book contains introductory material on graphical methods for discrete data, basic R skills, and methods for fitting and visualizing one-way discrete distributions. The second part focuses on simple, traditional nonparametric tests and exploratory methods for visualizing patterns of association in two-way and larger frequency tables. The final part of the text discusses model-based methods for the analysis of discrete data.
Web Resource
The data sets and R software used, including the authors’ own vcd and vcdExtra packages, are available at http://cran.r-project.org.
Getting Started
Introduction
Data visualization and categorical data: Overview
What is categorical data?
Strategies for categorical data analysis
Graphical methods for categorical data
Working with Categorical Data
Working with R data: vectors, matrices, arrays, and data frames
Forms of categorical data: case form, frequency form, and table form
Ordered factors and reordered tables
Generating tables: table and xtabs
Printing tables: structable and ftable
Subsetting data
Collapsing tables
Converting among frequency tables and data frames
A complex example: TV viewing data
Fitting and Graphing Discrete Distributions
Introduction to discrete distributions
Characteristics of discrete distributions
Fitting discrete distributions
Diagnosing discrete distributions: Ord plots
Poissonness plots and generalized distribution plots
Fitting discrete distributions as generalized linear models
Exploratory and Hypothesis-Testing Methods
Two-Way Contingency Tables
Introduction
Tests of association for two-way tables
Stratified analysis
Fourfold display for 2 x 2 tables
Sieve diagrams
Association plots
Observer agreement
Trilinear plots
Mosaic Displays for n-Way Tables
Introduction
Two-way tables
The strucplot framework
Three-way and larger tables
Model and plot collections
Mosaic matrices for categorical data
3D mosaics
Visualizing the structure of loglinear models
Related visualization methods
Correspondence Analysis
Introduction
Simple correspondence analysis
Multi-way tables: Stacking and other tricks
Multiple correspondence analysis
Biplots for contingency tables
Model-Building Methods
Logistic Regression Models
Introduction
The logistic regression model
Multiple logistic regression models
Case studies
Influence and diagnostic plots
Models for Polytomous Responses
Ordinal response
Nested dichotomies
Generalized logit model
Loglinear and Logit Models for Contingency Tables
Introduction
Loglinear models for frequencies
Fitting and testing loglinear models
Equivalent logit models
Zero frequencies
Extending Loglinear Models
Models for ordinal variables
Square tables
Three-way and higher-dimensional tables
Multivariate responses
Generalized Linear Models for Count Data
Components of generalized linear models
GLMs for count data
Models for overdispersed count data
Models for excess zero counts
Case studies
Diagnostic plots for model checking
Multivariate response GLM models
A summary and lab exercises appear at the end of each chapter.
Biography
Michael Friendly is a professor of psychology, founding chair of the Graduate Program in Quantitative Methods, and an associate coordinator with the Statistical Consulting Service at York University. He earned a PhD in psychology from Princeton University, specializing in psychometrics and cognitive psychology. In addition to his research interests in psychology, Professor Friendly has broad experience in data analysis, statistics, and computer applications. His main research areas are the development of graphical methods for categorical and multivariate data and the history of data visualization. He is an associate editor of the Journal of Computational and Graphical Statistics and Statistical Science.
David Meyer is a professor of business informatics at the University of Applied Sciences Technikum Wien. He earned a PhD in business administration from the Vienna University of Economics and Business, with an emphasis on computational economics. Dr. Meyer has published numerous papers in various computer science and statistical journals. His research interests include R, business intelligence, data mining, and operations research.
"This is an excellent book, nearly encyclopedic in its coverage. I personally find it very useful and expect that many other readers will as well. The book can certainly serve as a reference. It could also serve as a supplementary text in a course on categorical data analysis that uses R for computation or—because so much statistical detail is provided—even as the main text for a course on the topic that emphasizes graphical methods."
(John Fox, McMaster University)
"For many years, Prof. Friendly has been the most effective promoter in Statistics of graphical methods for categorical data. We owe thanks to Friendly and Meyer for promoting graphical methods and showing how easy it is to implement them in R. This impressive book is a very worthy addition to the library of anyone who spends much time analyzing categorical data." (Alan Agresti, Biometrics)