Robust Methods for Data Reduction gives a non-technical overview of robust data reduction techniques, encouraging the use of these important and useful methods in practical applications. The main areas covered include principal component analysis, sparse principal component analysis, canonical correlation analysis, factor analysis, clustering, double clustering, and discriminant analysis.
The first part of the book illustrates how dimension reduction techniques synthesize available information by reducing the dimensionality of the data. The second part focuses on cluster and discriminant analysis, where the authors explain how to perform sample reduction by finding groups in the data.
Despite considerable theoretical achievements, robust methods are not often used in practice. This book fills the gap between theoretical robust techniques and the analysis of real data sets in the area of data reduction. Using real examples, the authors show how to implement the procedures in R. The code and data for the examples are available on the book’s CRC Press web page.
Table of Contents
Introduction and Overview
What is contamination
What is data reduction
An overview of robust dimension reduction
An overview of robust sample reduction
Multivariate Estimation Methods
Robust univariate methods
Classical multivariate estimation
Robust multivariate estimation
Identification of multivariate outliers
Principal Component Analysis
PCA based on robust covariance estimation
PCA based on projection pursuit
PCA in high dimensions
Outlier identification using principal components
Sparse Robust PCA
Basic concepts and sPCA
Choice of the degree of sparsity
Sparse projection pursuit
Canonical Correlation Analysis
Classical canonical correlation analysis
CCA based on robust covariance estimation
The FA model
Robust factor analysis
k-Means and Model-Based Clustering
A brief overview of applications of cluster analysis
Choosing the number of clusters
Partitioning around medoids
Choosing the trimming and snipping levels
Robust Model-Based Clustering
Robust heterogeneous clustering based on trimming
Robust heterogeneous clustering based on snipping
Trimmed double k-means
Snipped double k-means
Classical discriminant analysis
Robust discriminant analysis
Appendix: Use of the Software R for Data Reduction
Alessio Farcomeni is an assistant professor in the Department of Public Health and Infectious Diseases at the University of Rome Sapienza. His work focuses on robust statistics, longitudinal models, categorical data analysis, cluster analysis, and multiple testing. He also is involved in clinical, ecological, and econometric research.
Luca Greco is an assistant professor in the Department of Law, Economics, Management and Quantitative Methods at the University of Sannio. His research interests include robust statistics, likelihood asymptotics, pseudolikelihood functions, and skew elliptical distributions.
"… this book tries to avoid technicalities and focuses on illustrating the power of robust techniques in action. Additionally, it covers some novel techniques, involving data reduction … An important concept addressed in Part 2 of the book is independent cell-wise contamination. A large number of variables and a relatively small number of cases are commonplace in modern statistical applications. … The proposed snipping methodology is tailored to be applied in the presence of cell-wise contamination, and from my point of view, is one of the principal methodological contributions of the book. …
"In summary, this book is interesting and useful. The book is not an attempt to systematically review all the literature in robust data reduction. However, it proposes a selection of techniques that are simple to understand or to use in practice."
—Luis Angel García Escudero, Dpto. de Estadística e I. O., Universidad de Valladolid, in Biometrics, June 2017
"'Robust Methods for Data Reduction' makes it easy for practitioners of big-data analytics to conduct robust and efficient data reduction. It is a timely topic in which recently prescribed algorithms and methodological research findings are properly assimilated and presented in a lucid fashion. The book serves as a good introductory book that motivates and teaches the art of developing robust frameworks for synthesis and reduction of large, complex datasets … The most appealing aspect of this book is that all of the concepts and algorithms described are inspired by real-data examples. All of the methods presented in this book are accompanied by extensive codes and exhaustive documentation on how to implement them in the R computing environment. Readers can download the data and the computer code used in the book from the publisher’s webpage … The collection of data examples and the pedagogical writing style make it an ideal text for instructors aiming to quickly train students on proper data-reduction techniques … This book will be particularly useful for courses with R labs. It is bound to find a wide and enduring readership and will be a valuable addition to the library of any data scientist."
—Gourab Mukherjee, University of Southern California, in Journal of the American Statistical Association, Volume 111, 2016