1st Edition

Introduction to High-Dimensional Statistics

ISBN 9781482237948
Published December 17, 2014 by Chapman and Hall/CRC
270 Pages 33 B/W Illustrations

USD $84.95


Book Description

Ever more powerful computing technologies have given rise to an exponentially growing volume of data. Today, massive data sets (with potentially thousands of variables) play an important role in almost every branch of modern human activity, including networks, finance, and genetics. However, analyzing such data has presented a challenge for statisticians and data analysts and has required the development of new statistical methods capable of separating the signal from the noise.

Introduction to High-Dimensional Statistics is a concise guide to state-of-the-art models, techniques, and approaches for handling high-dimensional data. The book is intended to expose the reader to the key concepts and ideas in the simplest settings possible while avoiding unnecessary technicalities.

Offering a succinct presentation of the mathematical foundations of high-dimensional statistics, this highly accessible text:

  • Describes the challenges related to the analysis of high-dimensional data
  • Covers cutting-edge statistical methods including model selection, sparsity and the lasso, aggregation, and learning theory
  • Provides detailed exercises at the end of every chapter with collaborative solutions on a wikisite
  • Illustrates concepts with simple but clear practical examples

Introduction to High-Dimensional Statistics is suitable for graduate students and researchers interested in discovering modern statistics for massive data. It can be used as a graduate text or for self-study.

Table of Contents




High-Dimensional Data

Curse of Dimensionality

Lost in the Immensity of High-Dimensional Spaces

Fluctuations Cumulate

An Accumulation of Rare Events May Not Be Rare

Computational Complexity

High-Dimensional Statistics

Circumventing the Curse of Dimensionality

A Paradigm Shift

Mathematics of High-Dimensional Statistics

About This Book

Statistics and Data Analysis

Purpose of This Book


Discussion and References

Take-Home Message



Strange Geometry of High-Dimensional Spaces

Volume of a p-Dimensional Ball

Tails of a Standard Gaussian Distribution

Principal Component Analysis

Basics of Linear Regression

Concentration of the Square Norm of a Gaussian Random Variable

Model Selection

Statistical Setting

To Select among a Collection of Models

Models and Oracle

Model Selection Procedures

Risk Bound for Model Selection

Oracle Risk Bound


Minimax Optimality

Frontier of Estimation in High Dimensions

Minimal Penalties

Computational Issues


An Alternative Point of View on Model Selection

Discussion and References

Take-Home Message



Orthogonal Design

Risk Bounds for the Different Sparsity Settings

Collections of Nested Models

Segmentation with Dynamic Programming

Goldenshluger–Lepski Method

Minimax Lower Bounds

Aggregation of Estimators


Gibbs Mixing of Estimators

Oracle Risk Bound

Numerical Approximation by Metropolis–Hastings

Numerical Illustration

Discussion and References

Take-Home Message



Gibbs Distribution

Orthonormal Setting with Power Law Prior

Group-Sparse Setting

Gain of Combining

Online Aggregation

Convex Criteria

Reminder on Convex Multivariate Functions


Two Useful Properties

Lasso Estimator

Geometric Insights

Analytic Insights

Oracle Risk Bound

Computing the Lasso Estimator

Removing the Bias of the Lasso Estimator

Convex Criteria for Various Sparsity Patterns

Group–Lasso (Group Sparsity)

Sparse–Group Lasso (Sparse–Group Sparsity)

Fused–Lasso (Variation Sparsity)

Discussion and References

Take-Home Message



When Is the Lasso Solution Unique?

Support Recovery via the Witness Approach

Lower Bound on the Compatibility Constant

On the Group–Lasso

Dantzig Selector

Projection on the l1-Ball

Ridge and Elastic-Net

Estimator Selection

Estimator Selection

Cross-Validation Techniques

Complexity Selection Techniques

Coordinate-Sparse Regression

Group-Sparse Regression

Multiple Structures

Scaled-Invariant Criteria

References and Discussion

Take-Home Message



Expected V-Fold CV l2-Risk

Proof of Corollary 5.5

Some Properties of Penalty (5.4)

Selecting the Number of Steps for the Forward Algorithm

Multivariate Regression

Statistical Setting

A Reminder on Singular Values

Low-Rank Estimation

If We Knew the Rank of A*

When the Rank of A* Is Unknown

Low Rank and Sparsity

Row-Sparse Matrices

Criterion for Row-Sparse and Low-Rank Matrices

Convex Criterion for Low Rank Matrices

Convex Criterion for Sparse and Low-Rank Matrices

Discussion and References

Take-Home Message



Hard-Thresholding of the Singular Values

Exact Rank Recovery

Rank Selection with Unknown Variance

Graphical Models

Reminder on Conditional Independence

Graphical Models

Directed Acyclic Graphical Models

Nondirected Models

Gaussian Graphical Models (GGM)

Connection with the Precision Matrix and the Linear Regression

Estimating g by Multiple Testing

Sparse Estimation of the Precision Matrix

Estimation of g by Regression

Practical Issues

Discussion and References

Take-Home Message



Factorization in Directed Models

Moralization of a Directed Graph

Convexity of –log(det(K))

Block Gradient Descent with the l1 / l2 Penalty

Gaussian Graphical Models with Hidden Variables

Dantzig Estimation of Sparse Gaussian Graphical Models

Gaussian Copula Graphical Models

Restricted Isometry Constant for Gaussian Matrices

Multiple Testing

An Introductory Example

Differential Expression of a Single Gene

Differential Expression of Multiple Genes

Statistical Setting


Multiple Testing Setting

Bonferroni Correction

Controlling the False Discovery Rate


Step-Up Procedures

FDR Control under the WPRDS Property


Discussion and References

Take-Home Message



FDR versus FWER

WPRDS Property

Positively Correlated Normal Test Statistics

Supervised Classification

Statistical Modeling

Bayes Classifier

Parametric Modeling

Semi-Parametric Modeling

Nonparametric Modeling

Empirical Risk Minimization

Misclassification Probability of the Empirical Risk Minimizer

Vapnik–Chervonenkis Dimension

Dictionary Selection

From Theoretical to Practical Classifiers

Empirical Risk Convexification

Statistical Properties

Support Vector Machines


Classifier Selection

Discussion and References

Take-Home Message



Linear Discriminant Analysis

VC Dimension of Linear Classifiers in R^d

Linear Classifiers with Margin Constraints

Spectral Kernel

Computation of the SVM Classifier

Kernel Principal Component Analysis (KPCA)

Gaussian Distribution

Gaussian Random Vectors

Chi-Square Distribution

Gaussian Conditioning

Probabilistic Inequalities

Basic Inequalities

Concentration Inequalities

McDiarmid Inequality

Gaussian Concentration Inequality

Symmetrization and Contraction Lemmas

Symmetrization Lemma

Contraction Principle

Birgé’s Inequality

Linear Algebra

Singular Value Decomposition (SVD)

Moore–Penrose Pseudo-Inverse

Matrix Norms

Matrix Analysis

Subdifferentials of Convex Functions

Subdifferentials and Subgradients

Examples of Subdifferentials

Reproducing Kernel Hilbert Spaces







Christophe Giraud was a student of the École Normale Supérieure de Paris, and he received a Ph.D. in probability theory from the University Paris 6. He was assistant professor at the University of Nice from 2002 to 2008. He has been associate professor at the École Polytechnique since 2008 and professor at Paris Sud University (Orsay) since 2012. His current research focuses mainly on the statistical theory of high-dimensional data analysis and its applications to life sciences.


"Introduction to High-Dimensional Statistics by Christophe Giraud succeeds singularly at providing a structured introduction to this active field of research. … it is arguably the most accessible overview yet published of the mathematical ideas and principles that one needs to master to enter the field of high-dimensional statistics. … recommended to anyone interested in the main results of current research in high-dimensional statistics as well as anyone interested in acquiring the core mathematical skills to enter this area of research."
Journal of the American Statistical Association, December 2015

"This is an attractive textbook. It will prove a very useful addition to any library or personal reference collection. … This book achieves well what it sets out to provide, an introduction to the mathematical foundations of high-dimensional statistics. … likely to stand the test of time well."
International Statistical Review, 83, 2015

"There is a real need for this book. It can quickly make someone new to the field familiar with modern topics in high-dimensional statistics and machine learning, and it is great as a textbook for an advanced graduate course."
—Marten H. Wegkamp, Cornell University, Ithaca, New York, USA

"As a mathematician, I am quite charmed by the book and its focus on getting the important ideas through in as short a form as possible, all the while sacrificing none of the mathematical correctness. I certainly plan to use it myself as a support in my own lectures!"
—Gilles Blanchard, University of Potsdam, Germany

"The book Introduction to High-Dimensional Statistics by Christophe Giraud succeeds singularly at providing a structured introduction to this active field of research. It describes a statistical pipeline where statistical principles enable the development of new methods, which, in turn, require a new mathematical analysis. … A striking aspect of this book is the omnipresence of computational considerations across chapters. The author carefully points to potential implementations, R packages and algorithmic details that have now become inherent to modern high-dimensional statistical research. … Giraud also offers informative and fairly comprehensive bibliographical notes that point to the main results of the field as well as connected work. … It should be recommended to anyone interested in the main results of current research in high-dimensional statistics as well as anyone interested in acquiring the core mathematical skills to enter this area of research."
- Philippe Rigollet, Massachusetts Institute of Technology, USA