Understanding Complex Datasets

Data Mining with Matrix Decompositions, 1st Edition

By David Skillicorn

Chapman and Hall/CRC

258 pages | 18 Color Illus. | 84 B/W Illus.

Purchasing Options ($ = USD):
Hardback: ISBN 9781584888321 | pub: 2007-05-17 | $96.95
eBook (VitalSource): ISBN 9780429140860 | pub: 2007-05-17 | from $28.98


Description

Making obscure knowledge about matrix decompositions widely available, Understanding Complex Datasets: Data Mining with Matrix Decompositions discusses the most common matrix decompositions and shows how they can be used to analyze large datasets in a broad range of application areas. Without requiring you to understand every mathematical detail, the book helps you determine which matrix decomposition is appropriate for your dataset and what the results mean.

Explaining the effectiveness of matrices as data analysis tools, the book illustrates the ability of matrix decompositions to provide more powerful analyses and to produce cleaner data than more mainstream techniques. The author explores the deep connections between matrix decompositions and structures within graphs, relating the PageRank algorithm of Google's search engine to singular value decomposition. He also covers dimensionality reduction, collaborative filtering, clustering, and spectral analysis. With numerous figures and examples, the book shows how matrix decompositions can be used to find documents on the Internet, look for deeply buried mineral deposits without drilling, explore the structure of proteins, detect suspicious emails or cell phone calls, and more.

Concentrating on data mining mechanics and applications, this resource helps you model large, complex datasets and investigate connections between standard data mining techniques and matrix decompositions.

Reviews

… One of this book’s attractive features is that every chapter contains a discussion relating to the algorithmic issues. One scenario is used as a running illustrative example throughout the book. Several other examples are discussed in different chapters. These examples should help the reader understand the advantages as well as the practical problems associated with any of the proposed matrix-based data mining techniques covered in the book. I recommend this book for anyone interested in using matrix methods for data mining.

Technometrics, February 2009, Vol. 51, No. 1

This could be a nice companion book for courses in data mining or applied linear algebra. Producing a clear taxonomy of the use and intentions of matrix decompositions in data analysis is very useful to both students and researchers. … Those working with large-scale complex datasets will definitely find this work useful. … I would definitely use it in my own course in data mining.

—Michael W. Berry, University of Tennessee, Knoxville, USA

[This book] is suffused with insightful suggestions for analytical methods and interpretations, drawn from the author's own research and his reading of the literature. … The book has two great strengths. The first is its attempt to provide a unifying framework from which to view a host of important analytical methodologies based on matrix methods. … Second, the book is extremely strong on interpreting the results of matrix methods. … [It] assembles and explains a diverse set of insights that are otherwise widely scattered in the literature. This alone makes the book an important contribution to the community.

—Bruce Hendrickson, Sandia National Laboratories, Albuquerque, New Mexico, USA

Table of Contents

DATA MINING

What Is Data Like?

Data Mining Techniques

Why Use Matrix Decompositions?

MATRIX DECOMPOSITIONS

Definition

Interpreting Decompositions

Applying Decompositions

Algorithm Issues

SINGULAR VALUE DECOMPOSITION (SVD)

Definition

Interpreting an SVD

Applying SVD

Algorithm Issues

Applications of SVD

Extensions

GRAPH ANALYSIS

Graphs versus Datasets

Adjacency Matrix

Eigenvalues and Eigenvectors

Connections to SVD

Google's PageRank

Overview of the Embedding Process

Datasets versus Graphs

Eigendecompositions

Clustering

Edge Prediction

Graph Substructures

The ATHENS System for Novel Knowledge Discovery

Bipartite Graphs

SEMIDISCRETE DECOMPOSITION (SDD)

Definition

Interpreting an SDD

Applying an SDD

Algorithm Issues

Extensions

USING SVD AND SDD TOGETHER

SVD Then SDD

Applications of SVD and SDD Together

INDEPENDENT COMPONENT ANALYSIS (ICA)

Definition

Interpreting an ICA

Applying an ICA

Algorithm Issues

Applications of ICA

NON-NEGATIVE MATRIX FACTORIZATION (NNMF)

Definition

Interpreting an NNMF

Applying an NNMF

Algorithm Issues

Applications of NNMF

TENSORS

The Tucker3 Tensor Decomposition

The CP Decomposition

Applications of Tensor Decompositions

Algorithmic Issues

CONCLUSION

APPENDIX: MATLAB SCRIPTS

BIBLIOGRAPHY

INDEX

About the Series

Chapman & Hall/CRC Data Mining and Knowledge Discovery Series

Subject Categories

BISAC Subject Codes/Headings:
BUS061000: BUSINESS & ECONOMICS / Statistics
COM021030: COMPUTERS / Database Management / Data Mining
MAT021000: MATHEMATICS / Number Systems