1st Edition

Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data

By Michael Friendly, David Meyer
Copyright 2016
    564 Pages, 257 Color Illustrations
    by Chapman & Hall

    An Applied Treatment of Modern Graphical Methods for Analyzing Categorical Data

    Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data presents an applied treatment of modern methods for the analysis of categorical data, both discrete response data and frequency data. It explains how to use graphical methods for exploring data, spotting unusual features, visualizing fitted models, and presenting results.

    The book is designed for advanced undergraduate and graduate students in the social and health sciences, epidemiology, economics, business, statistics, and biostatistics as well as researchers, methodologists, and consultants who can use the methods with their own data and analyses. Along with describing the necessary statistical theory, the authors illustrate the practical application of the techniques to a large number of substantive problems, including how to organize data, conduct an analysis, produce informative graphs, and evaluate what the graphs reveal about the data.

    The first part of the book contains introductory material on graphical methods for discrete data, basic R skills, and methods for fitting and visualizing one-way discrete distributions. The second part focuses on simple, traditional nonparametric tests and exploratory methods for visualizing patterns of association in two-way and larger frequency tables. The final part of the text discusses model-based methods for the analysis of discrete data.

    Web Resource
    The data sets and R software used, including the authors’ own vcd and vcdExtra packages, are available at http://cran.r-project.org.

    Getting Started
    Data visualization and categorical data: Overview
    What is categorical data?
    Strategies for categorical data analysis
    Graphical methods for categorical data

    Working with Categorical Data
    Working with R data: vectors, matrices, arrays, and data frames
    Forms of categorical data: case form, frequency form, and table form
    Ordered factors and reordered tables
    Generating tables: table and xtabs
    Printing tables: structable and ftable
    Subsetting data
    Collapsing tables
    Converting among frequency tables and data frames
    A complex example: TV viewing data

    Fitting and Graphing Discrete Distributions
    Introduction to discrete distributions
    Characteristics of discrete distributions
    Fitting discrete distributions
    Diagnosing discrete distributions: Ord plots
    Poissonness plots and generalized distribution plots
    Fitting discrete distributions as generalized linear models

    Exploratory and Hypothesis-Testing Methods
    Two-Way Contingency Tables
    Tests of association for two-way tables
    Stratified analysis
    Fourfold display for 2 × 2 tables
    Sieve diagrams
    Association plots
    Observer agreement
    Trilinear plots

    Mosaic Displays for n-Way Tables
    Two-way tables
    The strucplot framework
    Three-way and larger tables
    Model and plot collections
    Mosaic matrices for categorical data
    3D mosaics
    Visualizing the structure of loglinear models
    Related visualization methods

    Correspondence Analysis
    Simple correspondence analysis
    Multi-way tables: Stacking and other tricks
    Multiple correspondence analysis
    Biplots for contingency tables

    Model-Building Methods
    Logistic Regression Models
    The logistic regression model
    Multiple logistic regression models
    Case studies
    Influence and diagnostic plots

    Models for Polytomous Responses
    Ordinal response
    Nested dichotomies
    Generalized logit model

    Loglinear and Logit Models for Contingency Tables
    Loglinear models for frequencies
    Fitting and testing loglinear models
    Equivalent logit models
    Zero frequencies

    Extending Loglinear Models
    Models for ordinal variables
    Square tables
    Three-way and higher-dimensional tables
    Multivariate responses

    Generalized Linear Models for Count Data
    Components of generalized linear models
    GLMs for count data
    Models for overdispersed count data
    Models for excess zero counts
    Case studies
    Diagnostic plots for model checking
    Multivariate response GLM models

    A summary and lab exercises appear at the end of each chapter.


    Michael Friendly is a professor of psychology, founding chair of the Graduate Program in Quantitative Methods, and an associate coordinator with the Statistical Consulting Service at York University. He earned a PhD in psychology from Princeton University, specializing in psychometrics and cognitive psychology. In addition to his research interests in psychology, Professor Friendly has broad experience in data analysis, statistics, and computer applications. His main research areas are the development of graphical methods for categorical and multivariate data and the history of data visualization. He is an associate editor of the Journal of Computational and Graphical Statistics and of Statistical Science.

    David Meyer is a professor of business informatics at the University of Applied Sciences Technikum Wien. He earned a PhD in business administration from the Vienna University of Economics and Business, with an emphasis on computational economics. Dr. Meyer has published numerous papers in various computer science and statistical journals. His research interests include R, business intelligence, data mining, and operations research.

    "This is an excellent book, nearly encyclopedic in its coverage. I personally find it very useful and expect that many other readers will as well. The book can certainly serve as a reference. It could also serve as a supplementary text in a course on categorical data analysis that uses R for computation or—because so much statistical detail is provided—even as the main text for a course on the topic that emphasizes graphical methods."
    —John Fox, McMaster University

    "For many years, Prof. Friendly has been the most effective promoter in Statistics of graphical methods for categorical data. We owe thanks to Friendly and Meyer for promoting graphical methods and showing how easy it is to implement them in R. This impressive book is a very worthy addition to the library of anyone who spends much time analyzing categorical data." (Alan Agresti, Biometrics)