1st Edition

# Understanding Complex Datasets: Data Mining with Matrix Decompositions

266 Pages 84 B/W Illustrations
by Chapman & Hall

260 Pages
by Chapman & Hall

Making obscure knowledge about matrix decompositions widely available, Understanding Complex Datasets: Data Mining with Matrix Decompositions discusses the most common matrix decompositions and shows how they can be used to analyze large datasets in a broad range of application areas. Without requiring you to understand every mathematical detail, the book helps you determine which matrix decomposition is appropriate for your dataset and what the results mean.

Explaining the effectiveness of matrices as data analysis tools, the book illustrates the ability of matrix decompositions to provide more powerful analyses and to produce cleaner data than more mainstream techniques. The author explores the deep connections between matrix decompositions and structures within graphs, relating the PageRank algorithm of Google's search engine to singular value decomposition. He also covers dimensionality reduction, collaborative filtering, clustering, and spectral analysis. With numerous figures and examples, the book shows how matrix decompositions can be used to find documents on the Internet, look for deeply buried mineral deposits without drilling, explore the structure of proteins, detect suspicious emails or cell phone calls, and more.

Concentrating on data mining mechanics and applications, this resource helps you model large, complex datasets and investigate connections between standard data mining techniques and matrix decompositions.

### Table of Contents

DATA MINING
What Is Data Like?
Data Mining Techniques
Why Use Matrix Decompositions?

MATRIX DECOMPOSITIONS
Definition
Interpreting Decompositions
Applying Decompositions
Algorithm Issues

SINGULAR VALUE DECOMPOSITION (SVD)
Definition
Interpreting an SVD
Applying SVD
Algorithm Issues
Applications of SVD
Extensions

GRAPH ANALYSIS
Graphs versus Datasets
Eigenvalues and Eigenvectors
Connections to SVD
Overview of the Embedding Process
Datasets versus Graphs
Eigendecompositions
Clustering
Edge Prediction
Graph Substructures
The ATHENS System for Novel Knowledge Discovery
Bipartite Graphs

SEMIDISCRETE DECOMPOSITION (SDD)
Definition
Interpreting an SDD
Applying an SDD
Algorithm Issues
Extensions

USING SVD AND SDD TOGETHER
SVD Then SDD
Applications of SVD and SDD Together

INDEPENDENT COMPONENT ANALYSIS (ICA)
Definition
Interpreting an ICA
Applying an ICA
Algorithm Issues
Applications of ICA

NON-NEGATIVE MATRIX FACTORIZATION (NNMF)
Definition
Interpreting an NNMF
Applying an NNMF
Algorithm Issues
Applications of NNMF

TENSORS
The Tucker3 Tensor Decomposition
The CP Decomposition
Applications of Tensor Decompositions
Algorithmic Issues

CONCLUSION
APPENDIX: MATLAB SCRIPTS
BIBLIOGRAPHY
INDEX

### Biography

David Skillicorn

### Reviews

… One of this book’s attractive features is that every chapter contains a discussion relating to the algorithmic issues. One scenario is used as a running illustrative example throughout the book. Several other examples are discussed in different chapters. These examples should help the reader understand the advantages as well as the practical problems associated with any of the proposed matrix-based data mining techniques covered in the book. I recommend this book for anyone interested in using matrix methods for data mining.
Technometrics, February 2009, Vol. 51, No. 1

This could be a nice companion book for courses in data mining or applied linear algebra. Producing a clear taxonomy of the use and intentions of matrix decompositions in data analysis is very useful to both students and researchers. … Those working with large-scale complex datasets will definitely find this work useful. … I would definitely use it in my own course in data mining.
—Michael W. Berry, University of Tennessee, Knoxville, USA

[This book] is suffused with insightful suggestions for analytical methods and interpretations, drawn from the author's own research and his reading of the literature. … The book has two great strengths. The first is its attempt to provide a unifying framework from which to view a host of important analytical methodologies based on matrix methods. … Second, the book is extremely strong on interpreting the results of matrix methods. … [It] assembles and explains a diverse set of insights that are otherwise widely scattered in the literature. This alone makes the book an important contribution to the community.
—Bruce Hendrickson, Sandia National Laboratories, Albuquerque, New Mexico, USA