Robust Methods for Data Reduction

1st Edition

By Alessio Farcomeni, Luca Greco

Chapman and Hall/CRC

297 pages | 67 B/W Illus.

Hardback: 9781466590625
pub: 2015-04-16
eBook (VitalSource) : 9780429167966
pub: 2016-01-13


Robust Methods for Data Reduction gives a non-technical overview of robust data reduction techniques and encourages their use in practical applications. The main areas covered include principal component analysis, sparse principal component analysis, canonical correlation analysis, factor analysis, clustering, double clustering, and discriminant analysis.

The first part of the book illustrates how dimension reduction techniques synthesize available information by reducing the dimensionality of the data. The second part focuses on cluster and discriminant analysis, explaining how to perform sample reduction by finding groups in the data.

Despite considerable theoretical achievements, robust methods are not often used in practice. This book fills the gap between theoretical robust techniques and the analysis of real data sets in the area of data reduction. Using real examples, the authors show how to implement the procedures in R. The code and data for the examples are available on the book’s CRC Press web page.


"… this book tries to avoid technicalities and focuses on illustrating the power of robust techniques in action. Additionally, it covers some novel techniques, involving data reduction … An important concept addressed in Part 2 of the book is independent cell-wise contamination. A large number of variables and a relatively small number of cases are commonplace in modern statistical applications. … The proposed snipping methodology is tailored to be applied in the presence of cell-wise contamination, and from my point of view, is one of the principal methodological contributions of the book. …

In summary, this book is interesting and useful. The book is not an attempt to systematically review all the literature in robust data reduction. However, it proposes a selection of techniques that are simple to understand or to use in practice."

—Luis Angel García Escudero, Dpto. de Estadística e I. O., Universidad de Valladolid, in Biometrics, June 2017

"'Robust Methods for Data Reduction' makes it easy for practitioners of big-data analytics to conduct robust and efficient data reduction. It is a timely topic in which recently prescribed algorithms and methodological research findings are properly assimilated and presented in a lucid fashion. The book serves as a good introductory book that motivates and teaches the art of developing robust frameworks for synthesis and reduction of large, complex datasets … The most appealing aspect of this book is that all of the concepts and algorithms described are inspired by real-data examples. All of the methods presented in this book are accompanied by extensive codes and exhaustive documentation on how to implement them in the R computing environment. Readers can download the data and the computer code used in the book from the publisher’s webpage … The collection of data examples and the pedagogical writing style make it an ideal text for instructors aiming to quickly train students on proper data-reduction techniques … This book will be particularly useful for courses with R labs. It is bound to find a wide and enduring readership and will be a valuable addition to the library of any data scientist."

—Gourab Mukherjee, University of Southern California, in Journal of the American Statistical Association, Volume 111, 2016

Table of Contents

Introduction and Overview

What is contamination

Evaluating robustness

What is data reduction

An overview of robust dimension reduction

An overview of robust sample reduction

Example datasets

Multivariate Estimation Methods

Robust univariate methods

Classical multivariate estimation

Robust multivariate estimation

Identification of multivariate outliers


Dimension Reduction

Principal Component Analysis

Classical PCA

PCA based on robust covariance estimation

PCA based on projection pursuit

Spherical PCA

PCA in high dimensions

Outlier identification using principal components


Sparse Robust PCA

Basic concepts and sPCA

Robust sPCA

Choice of the degree of sparsity

Sparse projection pursuit


Canonical Correlation Analysis

Classical canonical correlation analysis

CCA based on robust covariance estimation

Other methods


Factor Analysis

The FA model

Robust factor analysis


Sample Reduction

k-Means and Model-Based Clustering

A brief overview of applications of cluster analysis

Basic concepts


Model-based clustering

Choosing the number of clusters

Robust Clustering

Partitioning around medoids

Trimmed k-means

Snipped k-means

Choosing the trimming and snipping levels


Robust Model-Based Clustering

Robust heterogeneous clustering based on trimming

Robust heterogeneous clustering based on snipping


Double Clustering

Double k-means

Trimmed double k-means

Snipped double k-means

Robustness properties

Discriminant Analysis

Classical discriminant analysis

Robust discriminant analysis

Appendix: Use of the Software R for Data Reduction



About the Authors

Alessio Farcomeni is an assistant professor in the Department of Public Health and Infectious Diseases at the University of Rome Sapienza. His work focuses on robust statistics, longitudinal models, categorical data analysis, cluster analysis, and multiple testing. He also is involved in clinical, ecological, and econometric research.

Luca Greco is an assistant professor in the Department of Law, Economics, Management and Quantitative Methods at the University of Sannio. His research interests include robust statistics, likelihood asymptotics, pseudolikelihood functions, and skew elliptical distributions.

Subject Categories

BISAC Subject Codes/Headings:
MATHEMATICS / Probability & Statistics / General