1st Edition

Handbook of Big Data





ISBN 9780367330736
Published September 11, 2019 by Chapman and Hall/CRC
464 Pages - 41 Color & 56 B/W Illustrations

USD $79.95




Book Description

Handbook of Big Data provides a state-of-the-art overview of the analysis of large-scale datasets. Featuring contributions from well-known experts in statistics and computer science, this handbook presents a carefully curated collection of techniques from both industry and academia. Thus, the text instills a working understanding of key statistical and computing ideas that can be readily applied in research and practice.

Offering balanced coverage of methodology, theory, and applications, this handbook:

  • Describes modern, scalable approaches for analyzing increasingly large datasets
  • Defines the underlying concepts of the available analytical tools and techniques
  • Details intercommunity advances in computational statistics and machine learning

Handbook of Big Data also identifies areas in need of further development, encouraging greater communication and collaboration between researchers in big data sub-specialties such as genomics, computational biology, and finance.

Table of Contents

GENERAL PERSPECTIVES ON BIG DATA

The Advent of Data Science: Some Considerations on the Unreasonable Effectiveness of Data
Richard Starmans

Big n versus Big p in Big Data
Norman Matloff

DATA-CENTRIC, EXPLORATORY METHODS

Divide and Recombine: Approach for Detailed Analysis and Visualization of Large Complex Data
Ryan Hafen

Integrate Big Data for Better Operation, Control, and Protection of Power Systems
Guang Lin

Interactive Visual Analysis of Big Data
Carlos Scheidegger

A Visualization Tool for Mining Large Correlation Tables: The Association Navigator
Andreas Buja, Abba M. Krieger, and Edward I. George

EFFICIENT ALGORITHMS

High-Dimensional Computational Geometry
Alexandr Andoni

IRLBA: Fast Partial SVD Method
James Baglama

Structural Properties Underlying High-Quality Randomized Numerical Linear Algebra Algorithms
Michael W. Mahoney and Petros Drineas

Something for (Almost) Nothing: New Advances in Sublinear-Time Algorithms
Ronitt Rubinfeld and Eric Blais

GRAPH APPROACHES

Networks
Elizabeth L. Ogburn and Alexander Volfovsky

Mining Large Graphs
David F. Gleich and Michael W. Mahoney

MODEL FITTING AND REGULARIZATION

Estimator and Model Selection Using Cross-Validation
Iván Díaz

Stochastic Gradient Methods for Principled Estimation with Large Datasets
Panos Toulis and Edoardo M. Airoldi

Learning Structured Distributions
Ilias Diakonikolas

Penalized Estimation in Complex Models
Jacob Bien and Daniela Witten

High-Dimensional Regression and Inference
Lukas Meier

ENSEMBLE METHODS

Divide and Recombine: Subsemble, Exploiting the Power of Cross-Validation
Stephanie Sapp and Erin LeDell

Scalable Super Learning
Erin LeDell

CAUSAL INFERENCE

Tutorial for Causal Inference
Laura Balzer, Maya Petersen, and Mark van der Laan

A Review of Some Recent Advances in Causal Inference
Marloes H. Maathuis and Preetam Nandy

TARGETED LEARNING

Targeted Learning for Variable Importance
Sherri Rose

Online Estimation of the Average Treatment Effect
Sam Lendle

Mining with Inference: Data-Adaptive Target Parameters
Alan Hubbard and Mark van der Laan

...

Editor(s)

Biography

Peter Bühlmann is a professor of statistics at ETH Zürich, Switzerland, a fellow of the Institute of Mathematical Statistics, an elected member of the International Statistical Institute, and co-author of the book Statistics for High-Dimensional Data: Methods, Theory and Applications. He was named a 2014 Thomson Reuters Highly Cited Researcher in mathematics, has served on various editorial boards and as editor of the Annals of Statistics, and has delivered numerous presentations, including a Medallion Lecture at the 2009 Joint Statistical Meetings, a read paper to the Royal Statistical Society in 2010, the 14th Bahadur Memorial Lectures at the University of Chicago, Illinois, USA, and other named lectures.

Petros Drineas is an associate professor in the Computer Science Department at Rensselaer Polytechnic Institute, Troy, New York, USA. He is the recipient of an Outstanding Early Research Award from Rensselaer Polytechnic Institute, an NSF CAREER award, and two fellowships from the European Molecular Biology Organization. He has served as a visiting professor at the US Sandia National Laboratories; visiting fellow at the Institute for Pure and Applied Mathematics, University of California, Los Angeles; long-term visitor at the Simons Institute for the Theory of Computing, University of California, Berkeley; and program director in two divisions at the US National Science Foundation. He has also worked for industrial labs. He is a co-organizer of the series of workshops on Algorithms for Modern Massive Datasets, and his research has been featured in numerous popular press articles.

Michael Kane is a member of the research faculty at Yale University, New Haven, Connecticut, USA. He is a winner of the American Statistical Association’s Chambers Statistical Software Award for The Bigmemory Project, a set of software libraries that allow the R programming environment to accommodate large datasets for statistical analysis. He is a grantee on the Defense Advanced Research Projects Agency’s XDATA project, part of the White House’s Big Data Initiative, and on the Gates Foundation’s Round 11 Grand Challenges Exploration. He has collaborated with companies including AT&T Labs Research, Paradigm4, Sybase (an SAP company), and Oracle.

Mark van der Laan is the Jiann-Ping Hsu/Karl E. Peace professor of biostatistics and statistics at the University of California, Berkeley, USA. He is the inventor of targeted maximum likelihood estimation, a general semiparametric efficient estimation method that incorporates the state of the art in machine learning through the ensemble method super learning. He is the recipient of the 2005 COPPS Presidents’ and Snedecor Awards, the 2005 van Dantzig Award, and the 2004 Spiegelman Award. He is also the founding editor of the International Journal of Biostatistics and the Journal of Causal Inference, and the co-author of more than 250 publications and various books.

Reviews

"The book contains a nice mix of philosophical musings, survey articles and cutting-edge research. It was designed as ‘a useful resource for seasoned practitioners and enthusiastic neophytes alike’ . . . Enthusiastic neophytes are still left with plenty to get their teeth into. In summary, I am happy to recommend the book to those seeking to broaden their understanding of the underpinning methodologies for analysing Big Data." ~ Richard J. Samworth, University of Cambridge, UK

". . . Handbook of Big Data is the first compilation on this emerging subject in our field and is therefore highly recommended to all statisticians and computer scientists."
~ The International Biometric Society

"The book strikes a great balance between the breadth and depth of recent research-active topics. It is an excellent reference book to keep for both academic researchers and industrial practitioners. It is also a good reference book for whoever teaches in the area of big data analysis."
~ Journal of the American Statistical Association