Introduction to High-Dimensional Statistics

1st Edition

By Christophe Giraud

Chapman and Hall/CRC

270 pages | 33 B/W Illus.

Hardback: 9781482237948
pub: 2014-12-17
eBook (VitalSource): 9780429173899
pub: 2014-12-17

Description

Increasingly powerful computing technologies have given rise to an exponentially growing volume of data. Today, massive data sets (with potentially thousands of variables) play an important role in almost every branch of modern human activity, including networks, finance, and genetics. Analyzing such data is a challenge for statisticians and data analysts, and it has required the development of new statistical methods capable of separating the signal from the noise.

Introduction to High-Dimensional Statistics is a concise guide to state-of-the-art models, techniques, and approaches for handling high-dimensional data. The book is intended to expose the reader to the key concepts and ideas in the simplest possible settings while avoiding unnecessary technicalities.

Offering a succinct presentation of the mathematical foundations of high-dimensional statistics, this highly accessible text:

  • Describes the challenges related to the analysis of high-dimensional data
  • Covers cutting-edge statistical methods including model selection, sparsity and the lasso, aggregation, and learning theory
  • Provides detailed exercises at the end of every chapter, with collaborative solutions on a wiki site
  • Illustrates concepts with simple but clear practical examples

Introduction to High-Dimensional Statistics is suitable for graduate students and researchers interested in discovering modern statistics for massive data. It can be used as a graduate text or for self-study.

Reviews

"Introduction to High-Dimensional Statistics by Christophe Giraud succeeds singularly at providing a structured introduction to this active field of research. … it is arguably the most accessible overview yet published of the mathematical ideas and principles that one needs to master to enter the field of high-dimensional statistics. … recommended to anyone interested in the main results of current research in high-dimensional statistics as well as anyone interested in acquiring the core mathematical skills to enter this area of research."

Journal of the American Statistical Association, December 2015

"This is an attractive textbook. It will prove a very useful addition to any library or personal reference collection. … This book achieves well what it sets out to provide, an introduction to the mathematical foundations of high-dimensional statistics. … likely to stand the test of time well."

International Statistical Review, 83, 2015

"There is a real need for this book. It can quickly make someone new to the field familiar with modern topics in high-dimensional statistics and machine learning, and it is great as a textbook for an advanced graduate course."

—Marten H. Wegkamp, Cornell University, Ithaca, New York, USA

"As a mathematician, I am quite charmed by the book and its focus on getting the important ideas through in as short a form as possible, all the while sacrificing none of the mathematical correctness. I certainly plan to use it myself as a support in my own lectures!"

—Gilles Blanchard, University of Potsdam, Germany

Table of Contents

Preface

Acknowledgments

Introduction

High-Dimensional Data

Curse of Dimensionality

Lost in the Immensity of High-Dimensional Spaces

Fluctuations Cumulate

An Accumulation of Rare Events May Not Be Rare

Computational Complexity

High-Dimensional Statistics

Circumventing the Curse of Dimensionality

A Paradigm Shift

Mathematics of High-Dimensional Statistics

About This Book

Statistics and Data Analysis

Purpose of This Book

Overview

Discussion and References

Take-Home Message

References

Exercises

Strange Geometry of High-Dimensional Spaces

Volume of a p-Dimensional Ball

Tails of a Standard Gaussian Distribution

Principal Component Analysis

Basics of Linear Regression

Concentration of the Square Norm of a Gaussian Random Variable

Model Selection

Statistical Setting

To Select among a Collection of Models

Models and Oracle

Model Selection Procedures

Risk Bound for Model Selection

Oracle Risk Bound

Optimality

Minimax Optimality

Frontier of Estimation in High Dimensions

Minimal Penalties

Computational Issues

Illustration

An Alternative Point of View on Model Selection

Discussion and References

Take-Home Message

References

Exercises

Orthogonal Design

Risk Bounds for the Different Sparsity Settings

Collections of Nested Models

Segmentation with Dynamic Programming

Goldenshluger–Lepski Method

Minimax Lower Bounds

Aggregation of Estimators

Introduction

Gibbs Mixing of Estimators

Oracle Risk Bound

Numerical Approximation by Metropolis–Hastings

Numerical Illustration

Discussion and References

Take-Home Message

References

Exercises

Gibbs Distribution

Orthonormal Setting with Power Law Prior

Group-Sparse Setting

Gain of Combining

Online Aggregation

Convex Criteria

Reminder on Convex Multivariate Functions

Subdifferentials

Two Useful Properties

Lasso Estimator

Geometric Insights

Analytic Insights

Oracle Risk Bound

Computing the Lasso Estimator

Removing the Bias of the Lasso Estimator

Convex Criteria for Various Sparsity Patterns

Group–Lasso (Group Sparsity)

Sparse–Group Lasso (Sparse–Group Sparsity)

Fused–Lasso (Variation Sparsity)

Discussion and References

Take-Home Message

References

Exercises

When Is the Lasso Solution Unique?

Support Recovery via the Witness Approach

Lower Bound on the Compatibility Constant

On the Group–Lasso

Dantzig Selector

Projection on the l1-Ball

Ridge and Elastic-Net

Estimator Selection

Estimator Selection

Cross-Validation Techniques

Complexity Selection Techniques

Coordinate-Sparse Regression

Group-Sparse Regression

Multiple Structures

Scaled-Invariant Criteria

References and Discussion

Take-Home Message

References

Exercises

Expected V-Fold CV l2-Risk

Proof of Corollary 5.5

Some Properties of Penalty (5.4)

Selecting the Number of Steps for the Forward Algorithm

Multivariate Regression

Statistical Setting

A Reminder on Singular Values

Low-Rank Estimation

If We Knew the Rank of A*

When the Rank of A* Is Unknown

Low Rank and Sparsity

Row-Sparse Matrices

Criterion for Row-Sparse and Low-Rank Matrices

Convex Criterion for Low Rank Matrices

Convex Criterion for Sparse and Low-Rank Matrices

Discussion and References

Take-Home Message

References

Exercises

Hard-Thresholding of the Singular Values

Exact Rank Recovery

Rank Selection with Unknown Variance

Graphical Models

Reminder on Conditional Independence

Graphical Models

Directed Acyclic Graphical Models

Nondirected Models

Gaussian Graphical Models (GGM)

Connection with the Precision Matrix and the Linear Regression

Estimating g by Multiple Testing

Sparse Estimation of the Precision Matrix

Estimation of g by Regression

Practical Issues

Discussion and References

Take-Home Message

References

Exercises

Factorization in Directed Models

Moralization of a Directed Graph

Convexity of –log(det(K))

Block Gradient Descent with the l1 / l2 Penalty

Gaussian Graphical Models with Hidden Variables

Dantzig Estimation of Sparse Gaussian Graphical Models

Gaussian Copula Graphical Models

Restricted Isometry Constant for Gaussian Matrices

Multiple Testing

An Introductory Example

Differential Expression of a Single Gene

Differential Expression of Multiple Genes

Statistical Setting

p-Values

Multiple Testing Setting

Bonferroni Correction

Controlling the False Discovery Rate

Heuristics

Step-Up Procedures

FDR Control under the WPRDS Property

Illustration

Discussion and References

Take-Home Message

References

Exercises

FDR versus FWER

WPRDS Property

Positively Correlated Normal Test Statistics

Supervised Classification

Statistical Modeling

Bayes Classifier

Parametric Modeling

Semi-Parametric Modeling

Nonparametric Modeling

Empirical Risk Minimization

Misclassification Probability of the Empirical Risk Minimizer

Vapnik–Chervonenkis Dimension

Dictionary Selection

From Theoretical to Practical Classifiers

Empirical Risk Convexification

Statistical Properties

Support Vector Machines

AdaBoost

Classifier Selection

Discussion and References

Take-Home Message

References

Exercises

Linear Discriminant Analysis

VC Dimension of Linear Classifiers in R^d

Linear Classifiers with Margin Constraints

Spectral Kernel

Computation of the SVM Classifier

Kernel Principal Component Analysis (KPCA)

Gaussian Distribution

Gaussian Random Vectors

Chi-Square Distribution

Gaussian Conditioning

Probabilistic Inequalities

Basic Inequalities

Concentration Inequalities

McDiarmid Inequality

Gaussian Concentration Inequality

Symmetrization and Contraction Lemmas

Symmetrization Lemma

Contraction Principle

Birgé’s Inequality

Linear Algebra

Singular Value Decomposition (SVD)

Moore–Penrose Pseudo-Inverse

Matrix Norms

Matrix Analysis

Subdifferentials of Convex Functions

Subdifferentials and Subgradients

Examples of Subdifferentials

Reproducing Kernel Hilbert Spaces

Notations

Bibliography

Index

About the Author

Christophe Giraud studied at the École Normale Supérieure de Paris and received a Ph.D. in probability theory from the University Paris 6. He was an assistant professor at the University of Nice from 2002 to 2008. He has been an associate professor at the École Polytechnique since 2008 and a professor at Paris Sud University (Orsay) since 2012. His current research focuses mainly on the statistical theory of high-dimensional data analysis and its applications to life sciences.

About the Series

Chapman & Hall/CRC Monographs on Statistics and Applied Probability

Subject Categories

BISAC Subject Codes/Headings:
BUS061000: BUSINESS & ECONOMICS / Statistics
COM037000: COMPUTERS / Machine Theory
MAT029000: MATHEMATICS / Probability & Statistics / General