Exploratory Data Analysis with MATLAB

3rd Edition

By Wendy L. Martinez, Angel R. Martinez, Jeffrey Solka

Chapman and Hall/CRC

590 pages | 16 Color Illus. | 84 B/W Illus.

Purchasing Options ($ = USD)
Hardback: ISBN 9781498776066, pub: 2017-07-27, $100.00 (list price $125.00)
eBook (VitalSource): ISBN 9781315366968, pub: 2017-08-07, from $28.98


Description

Praise for the Second Edition:

"The authors present an intuitive and easy-to-read book. … accompanied by many examples, proposed exercises, good references, and comprehensive appendices that initiate the reader unfamiliar with MATLAB."

—Adolfo Alvarez Pinto, International Statistical Review

"Practitioners of EDA who use MATLAB will want a copy of this book. … The authors have done a great service by bringing together so many EDA routines, but their main accomplishment in this dynamic text is providing the understanding and tools to do EDA."

—David A Huckaby, MAA Reviews

Exploratory Data Analysis (EDA) is an important part of the data analysis process. The methods presented in this text are ones that should be in the toolkit of every data scientist. As computational sophistication has increased and data sets have grown in size and complexity, EDA has become an even more important process for visualizing and summarizing data before making assumptions to generate hypotheses and models.

Exploratory Data Analysis with MATLAB, Third Edition presents EDA methods from a computational perspective and uses numerous examples and applications to show how the methods are used in practice. The authors use MATLAB code, pseudo-code, and algorithm descriptions to illustrate the concepts. The MATLAB code for examples, data sets, and the EDA Toolbox are available for download on the book’s website.

New to the Third Edition

  • Random projections and estimating local intrinsic dimensionality
  • Deep learning autoencoders and stochastic neighbor embedding
  • Minimum spanning tree and additional cluster validity indices
  • Kernel density estimation
  • Plots for visualizing data distributions, such as beanplots and violin plots
  • A chapter on visualizing categorical data

Table of Contents

Part I

Introduction to Exploratory Data Analysis

What is Exploratory Data Analysis

Overview of the Text

A Few Words about Notation

Data Sets Used in the Book

Unstructured Text Documents

Gene Expression Data

Oronsay Data Set

Software Inspection

Transforming Data

Power Transformations

Standardization

Sphering the Data

Further Reading

Exercises

Part II

EDA as Pattern Discovery

Dimensionality Reduction — Linear Methods

Introduction

Principal Component Analysis — PCA

PCA Using the Sample Covariance Matrix

PCA Using the Sample Correlation Matrix

How Many Dimensions Should We Keep?

Singular Value Decomposition — SVD

Nonnegative Matrix Factorization

Factor Analysis

Fisher’s Linear Discriminant

Random Projections

Intrinsic Dimensionality

Nearest Neighbor Approach

Correlation Dimension

Maximum Likelihood Approach

Estimation Using Packing Numbers

Estimation of Local Dimension

Summary and Further Reading

Exercises

Dimensionality Reduction — Nonlinear Methods

Multidimensional Scaling — MDS

Metric MDS

Nonmetric MDS

Manifold Learning

Locally Linear Embedding

Isometric Feature Mapping — ISOMAP

Hessian Eigenmaps

Artificial Neural Network Approaches

Self-Organizing Maps

Generative Topographic Maps

Curvilinear Component Analysis

Autoencoders

Stochastic Neighbor Embedding

Summary and Further Reading

Exercises

Data Tours

Grand Tour

Torus Winding Method

Pseudo Grand Tour

Interpolation Tours

Projection Pursuit

Projection Pursuit Indexes

Posse Chi-Square Index

Moment Index

Independent Component Analysis

Summary and Further Reading

Exercises

Finding Clusters

Introduction

Hierarchical Methods

Optimization Methods — k-Means

Spectral Clustering

Document Clustering

Nonnegative Matrix Factorization — Revisited

Probabilistic Latent Semantic Analysis

Minimal Spanning Trees and Clustering

Definitions

Minimum Spanning Tree Clustering

Evaluating the Clusters

Rand Index

Cophenetic Correlation

Upper Tail Rule

Silhouette Plot

Gap Statistic

Cluster Validity Indices

Summary and Further Reading

Exercises

Model-Based Clustering

Overview of Model-Based Clustering

Finite Mixtures

Multivariate Finite Mixtures

Component Models — Constraining the Covariances

Expectation-Maximization Algorithm

Hierarchical Agglomerative Model-Based Clustering

Model-Based Clustering

MBC for Density Estimation and Discriminant Analysis

Introduction to Pattern Recognition

Bayes Decision Theory

Estimating Probability Densities with MBC

Generating Random Variables from a Mixture Model

Summary and Further Reading

Exercises

Smoothing Scatterplots

Introduction

Loess

Robust Loess

Residuals and Diagnostics with Loess

Residual Plots

Spread Smooth

Loess Envelopes — Upper and Lower Smooths

Smoothing Splines

Regression with Splines

Smoothing Splines

Smoothing Splines for Uniformly Spaced Data

Choosing the Smoothing Parameter

Bivariate Distribution Smooths

Pairs of Middle Smoothings

Polar Smoothing

Curve Fitting Toolbox

Summary and Further Reading

Exercises

Part III

Graphical Methods for EDA

Visualizing Clusters

Dendrogram

Treemaps

Rectangle Plots

ReClus Plots

Data Image

Summary and Further Reading

Exercises

Distribution Shapes

Histograms

Univariate Histograms

Bivariate Histograms

Kernel Density

Univariate Kernel Density Estimation

Multivariate Kernel Density Estimation

Boxplots

The Basic Boxplot

Variations of the Basic Boxplot

Violin Plots

Beeswarm Plot

Bean Plot

Quantile Plots

Probability Plots

Quantile-Quantile Plot

Quantile Plot

Bagplots

Rangefinder Boxplot

Summary and Further Reading

Exercises

Multivariate Visualization

Glyph Plots

Scatterplots

2-D and 3-D Scatterplots

Scatterplot Matrices

Scatterplots with Hexagonal Binning

Dynamic Graphics

Identification of Data

Linking

Brushing

Coplots

Dot Charts

Basic Dot Chart

Multiway Dot Chart

Plotting Points as Curves

Parallel Coordinate Plots

Andrews’ Curves

Andrews’ Images

More Plot Matrices

Data Tours Revisited

Grand Tour

Permutation Tour

Biplots

Summary and Further Reading

Exercises

Visualizing Categorical Data

Discrete Distributions

Binomial Distribution

Poisson Distribution

Exploring Distribution Shapes

Poissonness Plot

Binomialness Plot

Hanging Rootogram

Contingency Tables

Background

Bar Plots

Spine Plots

Mosaic Plots

Sieve Diagrams

Log Odds Plot

Summary and Further Reading

Exercises

Appendix A

Proximity Measures

Appendix B

Software Resources for EDA

Appendix C

Appendix D

MATLAB® Basics

About the Authors

Wendy L. Martinez is a mathematical statistician with the U.S. Bureau of Labor Statistics. She is a fellow of the American Statistical Association, a co-author of several popular Chapman & Hall/CRC books, and a MATLAB® user for more than 20 years. Her research interests include text data mining, probability density estimation, signal processing, scientific visualization, and statistical pattern recognition. She earned an M.S. in aerospace engineering from George Washington University and a Ph.D. in computational sciences and informatics from George Mason University.

Angel R. Martinez is fully retired after a long career with the U.S. federal government and as an adjunct professor at Strayer University, where he taught undergraduate and graduate courses in statistics and mathematics. Before retiring from government service, he worked for the U.S. Navy as an operations research analyst and a computer scientist. He earned an M.S. in systems engineering from the Virginia Polytechnic Institute and State University and a Ph.D. in computational sciences and informatics from George Mason University.

Since 1984, Jeffrey L. Solka has been working in statistical pattern recognition for the Department of the Navy. He has published over 120 journal, conference, and technical papers; has won numerous awards; and holds 4 patents. He earned an M.S. in mathematics from James Madison University, an M.S. in physics from Virginia Polytechnic Institute and State University, and a Ph.D. in computational sciences and informatics from George Mason University.

About the Series

Chapman & Hall/CRC Computer Science & Data Analysis

Subject Categories

BISAC Subject Codes/Headings:
MAT029000: MATHEMATICS / Probability & Statistics / General