1st Edition

Robust Methods for Data Reduction




ISBN 9781466590625
Published April 16, 2015 by Chapman and Hall/CRC
297 Pages, 67 B/W Illustrations

USD $115.00




Book Description

Robust Methods for Data Reduction gives a non-technical overview of robust data reduction techniques and encourages their use in practical applications. The main areas covered are principal component analysis, sparse principal component analysis, canonical correlation analysis, factor analysis, clustering, double clustering, and discriminant analysis.

The first part of the book illustrates how dimension reduction techniques synthesize the available information by reducing the dimensionality of the data. The second part focuses on cluster and discriminant analysis, explaining how to perform sample reduction by finding groups in the data.

Despite considerable theoretical achievements, robust methods are not often used in practice. This book fills the gap between theoretical robust techniques and the analysis of real data sets in the area of data reduction. Using real examples, the authors show how to implement the procedures in R. The code and data for the examples are available on the book’s CRC Press web page.

Table of Contents

Introduction and Overview
What is contamination
Evaluating robustness
What is data reduction
An overview of robust dimension reduction
An overview of robust sample reduction
Example datasets

Multivariate Estimation Methods
Robust univariate methods
Classical multivariate estimation
Robust multivariate estimation
Identification of multivariate outliers
Examples

Dimension Reduction

Principal Component Analysis
Classical PCA
PCA based on robust covariance estimation
PCA based on projection pursuit
Spherical PCA
PCA in high dimensions
Outlier identification using principal components
Examples

Sparse Robust PCA
Basic concepts and sPCA
Robust sPCA
Choice of the degree of sparsity
Sparse projection pursuit
Examples

Canonical Correlation Analysis
Classical canonical correlation analysis
CCA based on robust covariance estimation
Other methods
Examples

Factor Analysis
The FA model
Robust factor analysis
Examples

Sample Reduction

k-Means and Model-Based Clustering
A brief overview of applications of cluster analysis
Basic concepts
k-means
Model-based clustering
Choosing the number of clusters

Robust Clustering
Partitioning around medoids
Trimmed k-means
Snipped k-means
Choosing the trimming and snipping levels
Examples

Robust Model-Based Clustering
Robust heterogeneous clustering based on trimming
Robust heterogeneous clustering based on snipping
Examples

Double Clustering
Double k-means
Trimmed double k-means
Snipped double k-means
Robustness properties

Discriminant Analysis
Classical discriminant analysis
Robust discriminant analysis

Appendix: Use of the Software R for Data Reduction

Bibliography

Index


Author(s)

Biography

Alessio Farcomeni is an assistant professor in the Department of Public Health and Infectious Diseases at the University of Rome Sapienza. His work focuses on robust statistics, longitudinal models, categorical data analysis, cluster analysis, and multiple testing. He is also involved in clinical, ecological, and econometric research.

Luca Greco is an assistant professor in the Department of Law, Economics, Management and Quantitative Methods at the University of Sannio. His research interests include robust statistics, likelihood asymptotics, pseudolikelihood functions, and skew elliptical distributions.

Reviews

"… this book tries to avoid technicalities and focuses on illustrating the power of robust techniques in action. Additionally, it covers some novel techniques, involving data reduction … An important concept addressed in Part 2 of the book is independent cell-wise contamination. A large number of variables and a relatively small number of cases are commonplace in modern statistical applications. … The proposed snipping methodology is tailored to be applied in the presence of cell-wise contamination, and from my point of view, is one of the principal methodological contributions of the book. …
In summary, this book is interesting and useful. The book is not an attempt to systematically review all the literature in robust data reduction. However, it proposes a selection of techniques that are simple to understand or to use in practice."
—Luis Angel García Escudero, Dpto. de Estadística e I. O., Universidad de Valladolid, in Biometrics, June 2017

"'Robust Methods for Data Reduction' makes it easy for practitioners of big-data analytics to conduct robust and efficient data reduction. It is a timely topic in which recently prescribed algorithms and methodological research findings are properly assimilated and presented in a lucid fashion. The book serves as a good introductory book that motivates and teaches the art of developing robust frameworks for synthesis and reduction of large, complex datasets…The most appealing aspect of this book is that all of the concepts and algorithms described are inspired by real-data examples. All of the methods presented in this book are accompanied by extensive codes and exhaustive documentation on how to implement them in the R computing environment. Readers can download the data and the computer code used in the book from the publisher’s webpage…The collection of data examples and the pedagogical writing style make it an ideal text for instructors aiming to quickly train students on proper data-reduction techniques…This book will be particularly useful for courses with R labs. It is bound to find a wide and enduring readership and will be a valuable addition to the library of any data scientist."
—Gourab Mukherjee, University of Southern California, in Journal of the American Statistical Association, Volume 111, 2016
