Handbook of Big Data

1st Edition

Edited by Peter Bühlmann, Petros Drineas, Michael Kane, Mark van der Laan

Chapman and Hall/CRC

464 pages | 41 Color Illus. | 56 B/W Illus.

Purchasing Options ($ = USD):
Paperback: 9780367330736
pub: 2019-09-27
Hardback: 9781482249071
pub: 2016-02-18
eBook (VitalSource) : 9780429162985
pub: 2016-02-22
from $39.98

Handbook of Big Data provides a state-of-the-art overview of the analysis of large-scale datasets. Featuring contributions from well-known experts in statistics and computer science, this handbook presents a carefully curated collection of techniques from both industry and academia. The text thus instills a working understanding of key statistical and computing ideas that can be readily applied in research and practice.

Offering balanced coverage of methodology, theory, and applications, this handbook:

  • Describes modern, scalable approaches for analyzing increasingly large datasets
  • Defines the underlying concepts of the available analytical tools and techniques
  • Details intercommunity advances in computational statistics and machine learning

Handbook of Big Data also identifies areas in need of further development, encouraging greater communication and collaboration between researchers in big data sub-specialties such as genomics, computational biology, and finance.


"The book contains a nice mix of philosophical musings, survey articles and cutting-edge research. It was designed as ‘a useful resource for seasoned practitioners and enthusiastic neophytes alike’ . . . Enthusiastic neophytes are still left with plenty to get their teeth into. In summary, I am happy to recommend the book to those seeking to broaden their understanding of the underpinning methodologies for analysing Big Data." ~ Richard J. Samworth, University of Cambridge, UK

“. . . Handbook of Big Data is the first compilation on this emerging subject in our field and is therefore highly recommended to all statisticians and computer scientists.”

~The International Biometric Society

"The book strikes a great balance between the breadth and depth of recent research-active topics. It is an excellent reference book to keep for both academic researchers and industrial practitioners. It is also a good reference book for whoever teaches in the area of big data analysis."

~Journal of the American Statistical Association

Table of Contents


The Advent of Data Science: Some Considerations on the Unreasonable Effectiveness of Data

Richard Starmans

Big n versus Big p in Big Data

Norman Matloff


Divide and Recombine: Approach for Detailed Analysis and Visualization of Large Complex Data

Ryan Hafen

Integrate Big Data for Better Operation, Control, and Protection of Power Systems

Guang Lin

Interactive Visual Analysis of Big Data

Carlos Scheidegger

A Visualization Tool for Mining Large Correlation Tables: The Association Navigator

Andreas Buja, Abba M. Krieger, and Edward I. George


High-Dimensional Computational Geometry

Alexandr Andoni

IRLBA: Fast Partial SVD Method

James Baglama

Structural Properties Underlying High-Quality Randomized Numerical Linear Algebra Algorithms

Michael W. Mahoney and Petros Drineas

Something for (Almost) Nothing: New Advances in Sublinear-Time Algorithms

Ronitt Rubinfeld and Eric Blais



Elizabeth L. Ogburn and Alexander Volfovsky

Mining Large Graphs

David F. Gleich and Michael W. Mahoney


Estimator and Model Selection Using Cross-Validation

Iván Díaz

Stochastic Gradient Methods for Principled Estimation with Large Datasets

Panos Toulis and Edoardo M. Airoldi

Learning Structured Distributions

Ilias Diakonikolas

Penalized Estimation in Complex Models

Jacob Bien and Daniela Witten

High-Dimensional Regression and Inference

Lukas Meier


Divide and Recombine: Subsemble, Exploiting the Power of Cross-Validation

Stephanie Sapp and Erin LeDell

Scalable Super Learning

Erin LeDell


Tutorial for Causal Inference

Laura Balzer, Maya Petersen, and Mark van der Laan

A Review of Some Recent Advances in Causal Inference

Marloes H. Maathuis and Preetam Nandy


Targeted Learning for Variable Importance

Sherri Rose

Online Estimation of the Average Treatment Effect

Sam Lendle

Mining with Inference: Data-Adaptive Target Parameters

Alan Hubbard and Mark van der Laan

About the Editors

Peter Bühlmann is a professor of statistics at ETH Zürich, Switzerland, fellow of the Institute of Mathematical Statistics, elected member of the International Statistical Institute, and co-author of the book titled Statistics for High-Dimensional Data: Methods, Theory and Applications. He was named a Thomson Reuters’ 2014 Highly Cited Researcher in mathematics, served on various editorial boards and as editor of the Annals of Statistics, and delivered numerous presentations including a Medallion Lecture at the 2009 Joint Statistical Meetings, a read paper to the Royal Statistical Society in 2010, the 14th Bahadur Memorial Lectures at the University of Chicago, Illinois, USA, and other named lectures.

Petros Drineas is an associate professor in the Computer Science Department at Rensselaer Polytechnic Institute, Troy, New York, USA. He is the recipient of an Outstanding Early Research Award from Rensselaer Polytechnic Institute, an NSF CAREER award, and two fellowships from the European Molecular Biology Organization. He has served as a visiting professor at the US Sandia National Laboratories; visiting fellow at the Institute for Pure and Applied Mathematics, University of California, Los Angeles; long-term visitor at the Simons Institute for the Theory of Computing, University of California, Berkeley; program director in two divisions at the US National Science Foundation; and worked for industrial labs. He is a co-organizer of the series of workshops on Algorithms for Modern Massive Datasets and his research has been featured in numerous popular press articles.

Michael Kane is a member of the research faculty at Yale University, New Haven, Connecticut, USA. He is a winner of the American Statistical Association’s Chambers Statistical Software Award for The Bigmemory Project, a set of software libraries that allow the R programming environment to accommodate large datasets for statistical analysis. He is a grantee on the Defense Advanced Research Projects Agency’s XDATA project, part of the White House’s Big Data Initiative, and on the Gates Foundation’s Round 11 Grand Challenges Exploration. He has collaborated with companies including AT&T Labs Research, Paradigm4, Sybase (an SAP company), and Oracle.

Mark van der Laan is the Jiann-Ping Hsu/Karl E. Peace professor of biostatistics and statistics at the University of California, Berkeley, USA. He is the inventor of targeted maximum likelihood estimation, a general semiparametric efficient estimation method that incorporates the state of the art in machine learning through the ensemble method super learning. He is the recipient of the 2005 COPSS Presidents’ and Snedecor Awards, the 2005 van Dantzig Award, and the 2004 Spiegelman Award. He is also the founding editor of the International Journal of Biostatistics and the Journal of Causal Inference, and the co-author of more than 250 publications and various books.

About the Series

Chapman & Hall/CRC Handbooks of Modern Statistical Methods


Subject Categories

BISAC Subject Codes/Headings:
COMPUTERS / Machine Theory
MATHEMATICS / Probability & Statistics / General