1st Edition

Understanding Complex Datasets: Data Mining with Matrix Decompositions

By David Skillicorn
    Copyright 2008
    266 Pages, 84 B/W Illustrations
    Published by Chapman & Hall

    Making obscure knowledge about matrix decompositions widely available, Understanding Complex Datasets: Data Mining with Matrix Decompositions discusses the most common matrix decompositions and shows how they can be used to analyze large datasets in a broad range of application areas. Without requiring you to understand every mathematical detail, the book helps you determine which matrix decomposition is appropriate for your dataset and what the results mean.

    Explaining the effectiveness of matrices as data analysis tools, the book illustrates the ability of matrix decompositions to provide more powerful analyses and to produce cleaner data than more mainstream techniques. The author explores the deep connections between matrix decompositions and structures within graphs, relating the PageRank algorithm of Google's search engine to singular value decomposition. He also covers dimensionality reduction, collaborative filtering, clustering, and spectral analysis. With numerous figures and examples, the book shows how matrix decompositions can be used to find documents on the Internet, look for deeply buried mineral deposits without drilling, explore the structure of proteins, detect suspicious emails or cell phone calls, and more.

    Concentrating on data mining mechanics and applications, this resource helps you model large, complex datasets and investigate connections between standard data mining techniques and matrix decompositions.

    DATA MINING
    What Is Data Like?
    Data Mining Techniques
    Why Use Matrix Decompositions?

    MATRIX DECOMPOSITIONS
    Definition
    Interpreting Decompositions
    Applying Decompositions
    Algorithm Issues

    SINGULAR VALUE DECOMPOSITION (SVD)
    Definition
    Interpreting an SVD
    Applying SVD
    Algorithm Issues
    Applications of SVD
    Extensions

    GRAPH ANALYSIS
    Graphs versus Datasets
    Adjacency Matrix
    Eigenvalues and Eigenvectors
    Connections to SVD
    Google's PageRank
    Overview of the Embedding Process
    Datasets versus Graphs
    Eigendecompositions
    Clustering
    Edge Prediction
    Graph Substructures
    The ATHENS System for Novel Knowledge Discovery
    Bipartite Graphs

    SEMIDISCRETE DECOMPOSITION (SDD)
    Definition
    Interpreting an SDD
    Applying an SDD
    Algorithm Issues
    Extensions

    USING SVD AND SDD TOGETHER
    SVD Then SDD
    Applications of SVD and SDD Together

    INDEPENDENT COMPONENT ANALYSIS (ICA)
    Definition
    Interpreting an ICA
    Applying an ICA
    Algorithm Issues
    Applications of ICA

    NON-NEGATIVE MATRIX FACTORIZATION (NNMF)
    Definition
    Interpreting an NNMF
    Applying an NNMF
    Algorithm Issues
    Applications of NNMF

    TENSORS
    The Tucker3 Tensor Decomposition
    The CP Decomposition
    Applications of Tensor Decompositions
    Algorithmic Issues

    CONCLUSION
    APPENDIX: MATLAB SCRIPTS
    BIBLIOGRAPHY
    INDEX

    Biography

    David Skillicorn

    … One of this book’s attractive features is that every chapter contains a discussion relating to the algorithmic issues. One scenario is used as a running illustrative example throughout the book. Several other examples are discussed in different chapters. These examples should help the reader understand the advantages as well as the practical problems associated with any of the proposed matrix-based data mining techniques covered in the book. I recommend this book for anyone interested in using matrix methods for data mining.
    Technometrics, February 2009, Vol. 51, No. 1

    This could be a nice companion book for courses in data mining or applied linear algebra. Producing a clear taxonomy of the use and intentions of matrix decompositions in data analysis is very useful to both students and researchers. … Those working with large-scale complex datasets will definitely find this work useful. … I would definitely use it in my own course in data mining.
    —Michael W. Berry, University of Tennessee, Knoxville, USA

    [This book] is suffused with insightful suggestions for analytical methods and interpretations, drawn from the author's own research and his reading of the literature. … The book has two great strengths. The first is its attempt to provide a unifying framework from which to view a host of important analytical methodologies based on matrix methods. … Second, the book is extremely strong on interpreting the results of matrix methods. … [It] assembles and explains a diverse set of insights that are otherwise widely scattered in the literature. This alone makes the book an important contribution to the community.
    —Bruce Hendrickson, Sandia National Laboratories, Albuquerque, New Mexico, USA