Multivariate Kernel Smoothing and Its Applications

1st Edition

By José E. Chacón, Tarn Duong

Chapman and Hall/CRC

226 pages

Purchasing Options ($ = USD):
Hardback: ISBN 9781498763011, pub: 2018-05-08, $99.95
eBook (VitalSource): ISBN 9780429485572, pub: 2018-05-08, from $28.98


Description

Kernel smoothing has evolved greatly since its inception and has become an essential methodology in the data science toolkit for the 21st century. Its widespread adoption is due to its fundamental role in multivariate exploratory data analysis, as well as the crucial part it plays in composite solutions to complex data challenges.

Multivariate Kernel Smoothing and Its Applications offers a comprehensive overview of both aspects. It begins with a thorough exposition of the approaches to the two basic goals of estimating probability density functions and their derivatives. The focus then turns to applications of these approaches to more complex data analysis goals, many with a geometric/topological flavour, such as level set estimation, clustering (unsupervised learning), principal curves, and feature significance. Other topics, which are not direct applications of density (derivative) estimation but share many commonalities with the previous settings, include classification (supervised learning), nearest neighbour estimation, and deconvolution for data observed with error.

For data scientists, each chapter contains illustrative examples based on open data, each analysed with the most appropriate kernel smoothing method. The emphasis is always on an intuitive understanding of the data, supported by the accompanying statistical visualisations. For readers wishing to investigate the underlying statistical reasoning in more detail, a graduated exposition of a unified theoretical framework is provided. Algorithms for efficient software implementation are also discussed.

José E. Chacón is an associate professor at the Department of Mathematics of the Universidad de Extremadura in Spain.

Tarn Duong is a Senior Data Scientist at a start-up that provides short-distance carpooling services in France.

Both authors have made important contributions to kernel smoothing research over the last couple of decades.

Reviews

"I am very impressed with this book. It addresses issues that are not discussed in any detail in any other book on density estimation. Furthermore, it is very well-written and contains a wealth of interesting examples. In fact, this is probably one of the best books I have seen on density estimation. Some topics in this book that are not covered in detail in any other book include: multivariate bandwidth matrices, details of the asymptotic MSE for general bandwidth matrices, derivative estimation, level sets, density clustering and significance testing for modal regions. This makes the book unique. The authors have written the book in such a way that it can be used by two different types of readers: data analysts who are not interested in the mathematical details, and students/researchers who do want the details. The 'how to read this monograph' is very useful."

~Larry Wasserman, Carnegie Mellon University

Table of Contents

Preface

List of Figures

List of Tables

List of Algorithms

Introduction

Exploratory data analysis with density estimation

Exploratory data analysis with density derivatives estimation

Clustering/Unsupervised learning

Classification/Supervised learning

Suggestions on how to read this monograph

Density estimation

Histogram density estimation

Kernel density estimation

Probability contours as multivariate quantiles

Contour colour scales

Gains from unconstrained bandwidth matrices

Advice for practical bandwidth selection

Squared error analysis

Asymptotic squared error formulas

Optimal bandwidths

Convergence of density estimators

Further mathematical analysis of density estimators

Asymptotic expansion of the MISE

Asymptotically optimal bandwidth

Vector versus vector half parametrisations

Bandwidth selectors for density estimation

Normal scale bandwidths

Maximal smoothing bandwidths

Normal mixture bandwidths

Unbiased cross validation bandwidths

Biased cross validation bandwidths

Plug in bandwidths

Smoothed cross validation bandwidths

Empirical comparison of bandwidth selectors

Theoretical comparison of bandwidth selectors

Further mathematical analysis of bandwidth selectors

Relative convergence rates of bandwidth selectors

Optimal pilot bandwidth selectors

Convergence rates with data-based bandwidths

Modified density estimation

Variable bandwidth density estimators

Balloon density estimators

Sample point density estimators

Bandwidth selectors for variable kernel estimation

Transformation density estimators

Boundary kernel density estimators

Beta boundary kernels

Linear boundary kernels

Kernel choice

Higher order kernels

Further mathematical analysis of modified density estimators

Asymptotic error for sample point variable bandwidth estimators

Asymptotic error for linear boundary estimators

Density derivative estimation

Kernel density derivative estimators

Density gradient estimators

Density Hessian estimators

General density derivative estimators

Gains from unconstrained bandwidth matrices

Advice for practical bandwidth selection

Empirical comparison of bandwidths of different derivative orders

Squared error analysis

Bandwidth selection for density derivative estimators

Normal scale bandwidths

Normal mixture bandwidths

Unbiased cross validation bandwidths

Plug in bandwidths

Smoothed cross validation bandwidths

Convergence rates of bandwidth selectors

Case study: the normal density

Exact MISE

Curvature matrix

Asymptotic MISE

Normal scale bandwidth

Asymptotic MSE for curvature estimation

Further mathematical analysis

Taylor expansions for vector-valued functions

Relationship between multivariate normal moments

Applications related to density and density derivative estimation

Level set estimation

Modal region and bump estimation

Density support estimation

Density-based clustering

Stable/unstable manifolds

Mean shift clustering

Choice of the normalising matrix in the mean shift

Density ridge estimation

Feature significance

Supplementary topics in data analysis

Density difference estimation and significance testing

Classification

Density estimation for data measured with error

Classical density deconvolution estimation

Weighted density deconvolution estimation

Manifold estimation

Nearest neighbour estimation

Further mathematical analysis

Squared error analysis for deconvolution kernel density estimators

Optimal selection of the number of nearest neighbours

Computational algorithms

R implementation

Approximate binned estimation

Approximate density estimation

Approximate density derivative and functional estimation

Recursive normal density derivatives

Recursive normal functionals

Numerical optimisation over matrix spaces

About the Series

Chapman & Hall/CRC Monographs on Statistics and Applied Probability

Subject Categories

BISAC Subject Codes/Headings:
MAT029000: MATHEMATICS / Probability & Statistics / General