1st Edition

Model-Based Clustering, Classification, and Density Estimation Using mclust in R

    268 Pages 72 Color & 28 B/W Illustrations
    by Chapman & Hall

    Model-based clustering and classification methods provide a systematic statistical approach to clustering, classification, and density estimation via mixture modeling. The model-based framework allows the problems of choosing or developing an appropriate clustering or classification method to be understood within the context of statistical modeling. The mclust package for the statistical environment R is a widely adopted platform implementing these model-based strategies. The package includes both summary and visual functionality, complementing procedures for estimating and choosing models.

    Key features of the book:

    • An introduction to the model-based approach and the mclust R package
    • A detailed description of mclust and the underlying modeling strategies
    • An extensive set of examples, color plots, and figures along with the R code for reproducing them
    • Supported by a companion website, including the R code to reproduce the examples and figures presented in the book, errata, and other supplementary material

    Model-Based Clustering, Classification, and Density Estimation Using mclust in R is accessible to quantitatively trained students and researchers with a basic understanding of statistical methods, including inference and computing. In addition to serving as a reference manual for mclust, the book will be particularly useful to those wishing to employ these model-based techniques in research or applications in statistics, data science, clinical research, social science, and many other disciplines.

    1. Introduction. 2. Finite Mixture Models. 3. Model-based Clustering. 4. Mixture-based Classification. 5. Model-based Density Estimation. 6. Visualizing Gaussian Mixture Models. 7. Miscellanea.

    Biography

    Luca Scrucca
    Associate Professor of Statistics at Università degli Studi di Perugia. His research interests include mixture models, model-based clustering and classification, statistical learning, dimension reduction methods, and genetic and evolutionary algorithms. He is currently Associate Editor for the Journal of Statistical Software and for Statistics and Computing. He has developed and maintains several high-profile R packages available on the Comprehensive R Archive Network (CRAN).

    Chris Fraley
    Most recently a lead research staff member at Tableau, she previously held research positions in Statistics at the University of Washington and at Insightful from its early days as Statistical Sciences. She has contributed to computational methods in a number of areas of applied statistics, and is the principal author of several widely used R packages. She was the originator (at Statistical Sciences) of numerical functions such as nlminb that have long been available in the R core stats package.

    T. Brendan Murphy
    Professor of Statistics at University College Dublin. His research interests include model-based clustering, classification, network modeling, and latent variable modeling. He is interested in applications in social science, political science, medicine, food science, and biology. He served as Associate Editor for the journal Statistics and Computing, and he is currently Editor for the Annals of Applied Statistics and Associate Editor for Statistical Analysis and Data Mining.

    Adrian Raftery
    Boeing International Professor of Statistics and Sociology, and Adjunct Professor of Atmospheric Sciences at the University of Washington, Seattle. He is also a faculty affiliate of the Center for Statistics and the Social Sciences and the Center for Studies in Demography and Ecology at the University of Washington. He was one of the founding researchers in model-based clustering, having published in the area since 1984. His research interests include model-based clustering, Bayesian statistics, social network analysis, and statistical demography. He is interested in applications in social, environmental, biological, and health sciences. He is a member of the U.S. National Academy of Sciences and was identified by Thomson Reuters as the most cited researcher in mathematics in the world for the decade 1995–2005. He served as Editor of the Journal of the American Statistical Association (JASA).

    "The book gives an excellent introduction to using the R package mclust for mixture modeling with (multivariate) Gaussian distributions as well as covering the supervised and semi-supervised aspects. A thorough introduction to the theoretic concepts is given, the software implementation described in detail and the application shown on many examples. I particularly enjoyed the in-depth discussion of different visualization methods."
    ~ Bettina Grün, WU (Vienna University of Economics and Business), Austria

    "Cluster analysis, and its sister subjects of density estimation and mixture-model classification, used to be underserved topics in statistical texts. This magisterial book corrects that imbalance and does so comprehensively."
    ~ David Banks (Duke University)

    "mclust is probably the R-package I use most. This book provides a clear, comprehensive, well-illustrated hands-on introduction to its many features. I particularly like the emphasis on various visualization methods and uncertainty quantification."
    ~ Christian Martin Hennig (University of Bologna)

    "The mclust R package has become synonymous with model-based clustering, classification and density estimation, and this book provides an excellent resource for the now millions of users, and future users, of mclust. The book elegantly balances the statistical detail required to have a broad understanding of the methods available in mclust, alongside practical applications of these methods through detailed code and real data examples. The book provides an excellent scaffold to support an mclust user through the concepts and application of model-based clustering, classification and density estimation. The chapter on visualisation in the context of model-based clustering and classification is a unique contribution, collating important topics that to date have received scant attention in this area. This book is essential reading for any practitioner of model-based clustering, classification or density estimation."
    ~ Claire Gormley (University College Dublin)