1st Edition

Model-Based Clustering, Classification, and Density Estimation Using mclust in R




  • Available for pre-order on May 18, 2023. Item will ship after June 8, 2023
ISBN 9781032234953
June 8, 2023 Forthcoming by Chapman & Hall
288 Pages 72 Color & 28 B/W Illustrations

USD $69.95




Book Description

Model-based clustering and classification methods provide a systematic statistical approach to clustering, classification, and density estimation via mixture modeling. The model-based framework allows the problems of choosing or developing methods to be understood within the context of statistical modeling. The mclust package for the statistical environment R is a widely adopted platform implementing these model-based strategies. The package includes both summary and visual functionality, complementing procedures for estimating and choosing models.

Key features of the book:

  • An introduction to the model-based approach and the mclust R package
  • A detailed description of mclust and the underlying modeling strategies
  • An extensive set of examples, color plots and figures along with the R code for reproducing them
  • Supported by a companion website, including the R code to reproduce the examples and figures presented in the book, errata, and other supplementary material

The book is accessible to quantitatively trained students and researchers with a basic understanding of statistical methods, including inference and computing. In addition to serving as a reference manual for mclust, the book will be particularly useful to those wishing to employ these model-based techniques in research or applications in statistics, data science, clinical research, social science, and many other disciplines.

Table of Contents

List of Figures
List of Tables
List of Examples
Preface

1. Introduction
 Model-based Clustering and Finite Mixture Modeling     
 mclust                                
 Overview                              
 Organization of the Book                     

2. Finite Mixture Models
 Finite Mixture Models                       
 Maximum Likelihood Estimation and the EM Algorithm
 Issues in Maximum Likelihood Estimation        
 Gaussian Mixture Models                     
 Parsimonious Covariance Decomposition         
 EM Algorithm for Gaussian Mixtures          
 Initialization of EM Algorithm              
 Maximum A Posteriori (MAP) Classification      
 Model Selection                           
 Information Criteria                    
 Likelihood Ratio Testing                  
 Resampling-Based Inference                    

3. Model-based Clustering
 Gaussian Mixture Models for Cluster Analysis          
 Clustering in mclust                        
 Model Selection                           
 BIC                             
 ICL                             
 Bootstrap Likelihood Ratio Testing            
 Resampling-Based Inference in mclust              
 Clustering Univariate Data                    
 Model-Based Agglomerative Hierarchical Clustering      
 Agglomerative Clustering for Large Datasets      
 Initialization in mclust                       
 EM Algorithm in mclust                      
 Further Considerations                       

4. Mixture-based Classification
 Classification as Supervised Learning               
 Gaussian Mixture Models for Classification           
 Prediction                          
 Estimation                         
 Classification in mclust                       
 Evaluating Classifier Performance                 
 Evaluating Predicted Classes: Classification Error    
 Evaluating Class Probabilities: Brier Score       
 Estimating Classifier Performance: Test Set and Resampling-Based Validation               
 Cross-validation in mclust                 
 Classification with Unequal Costs of Misclassification      
 Classification with Unbalanced Classes              
 Classification of Univariate Data                 
 Semi-supervised Classification                   

5. Model-based Density Estimation
 Density Estimation                         
 Finite Mixture Modeling for Density Estimation with mclust 
 Univariate Density Estimation                  
 Diagnostics for Univariate Density Estimation      
 Density Estimation in Higher Dimensions            
 Density Estimation for Bounded Data              
 Highest Density Regions                      

6. Visualizing Gaussian Mixture Models
 Displays for Univariate Data                   
 Displays for Bivariate Data                    
 Displays for Higher Dimensional Data              
 Coordinate Projections                   
 Random Projections                    
 Discriminant Coordinate Projections           
 Visualizing Model-Based Clustering and Classification on Projection Subspaces
 Projection Subspaces for Visualizing Cluster Separation
 Incorporating Variation in Covariances          
 Projection Subspaces for Classification          
 Relationship to Other Methods              
 Using ggplot with mclust                     
 Using Color-Blind-Friendly Palettes               

7. Miscellanea
 Accounting for Noise and Outliers                
 Using a Prior for Regularization                 
 Adding a Prior in mclust                 
 Non-Gaussian Clusters from GMMs               
 Combining Gaussian Mixture Components for Clustering
 Identifying Connected Components in GMMs      
 Simulation from Mixture Densities                
 Large Datasets                           
 High-Dimensional Data                      
 Missing Data                            

 


Author(s)

Biography

Luca Scrucca
Associate Professor of Statistics at Università degli Studi di Perugia. His research interests include mixture models, model-based clustering and classification, statistical learning, dimension reduction methods, and genetic and evolutionary algorithms. He is currently Associate Editor for the Journal of Statistical Software and for Statistics and Computing. He has developed and maintains several high-profile R packages available on the Comprehensive R Archive Network (CRAN).

Chris Fraley
Most recently a lead research staff member at Tableau, she previously held research positions in Statistics at the University of Washington and at Insightful from its early days as Statistical Sciences. She has contributed to computational methods in a number of areas of applied statistics, and is the principal author of several widely used R packages. She was the originator (at Statistical Sciences) of numerical functions such as nlminb that have long been available in the R core stats package.

T. Brendan Murphy
Professor of Statistics at University College Dublin. His research interests include model-based clustering, classification, network modeling, and latent variable modeling. He is interested in applications in social science, political science, medicine, food science, and biology. He served as Associate Editor for the journal Statistics and Computing; he is currently Editor for the Annals of Applied Statistics and Associate Editor for Statistical Analysis and Data Mining.

Adrian Raftery
Boeing International Professor of Statistics and Sociology, and Adjunct Professor of Atmospheric Sciences at the University of Washington, Seattle. He is also a faculty affiliate of the Center for Statistics and the Social Sciences and the Center for Studies in Demography and Ecology at the University of Washington. He was one of the founding researchers in model-based clustering, having published in the area since 1984. His research interests include model-based clustering, Bayesian statistics, social network analysis, and statistical demography. He is interested in applications in social, environmental, biological, and health sciences. He is a member of the U.S. National Academy of Sciences and was identified by Thomson Reuters as the most cited researcher in mathematics in the world for the decade 1995–2005. He served as Editor of the Journal of the American Statistical Association (JASA).

Reviews

"The book gives an excellent introduction to using the R package mclust for mixture modeling with (multivariate) Gaussian distributions as well as covering the supervised and semi-supervised aspects. A thorough introduction to the theoretic concepts is given, the software implementation described in detail and the application shown on many examples. I particularly enjoyed the in-depth discussion of different visualization methods."
~ Bettina Grün, WU (Vienna University of Economics and Business), Austria

"Cluster analysis, and its sister subjects of density estimation and mixture-model classification, used to be underserved topics in statistical texts. This magisterial book corrects that imbalance and does so comprehensively."
~ David Banks (Duke University)

"mclust is probably the R-package I use most. This book provides a clear, comprehensive, well-illustrated hands-on introduction to its many features. I particularly like the emphasis on various visualization methods and uncertainty quantification."
~ Christian Martin Hennig (University of Bologna)

"The mclust R package has become synonymous with model-based clustering, classification and density estimation, and this book provides an excellent resource for the now millions of users, and future users, of mclust. The book elegantly balances the statistical detail required to have a broad understanding of the methods available in mclust, alongside practical applications of these methods through detailed code and real data examples. The book provides an excellent scaffold to support an mclust user through the concepts and application of model-based clustering, classification and density estimation. The chapter on visualisation in the context of model-based clustering and classification is a unique contribution, collating important topics that to date have received scant attention in this area. This book is essential reading for any practitioner of model-based clustering, classification or density estimation."
~ Claire Gormley (University College Dublin)