As the amount of information recorded and stored electronically grows ever larger, it becomes increasingly useful, if not essential, to develop better and more efficient ways to summarize and extract information from these large, multivariate data sets. The field of classification does just that: it investigates sets of "objects" to see whether they can be summarized into a small number of classes comprising similar objects.
Researchers have made great strides in the field over the last twenty years, and classification is no longer perceived as being concerned solely with exploratory analyses. The second edition of Classification incorporates many of the new and powerful methodologies developed since its first edition. Like its predecessor, this edition describes both clustering and graphical methods of representing data, and offers advice on how to decide which methods of analysis best apply to a particular data set. It goes even further, however, by providing critical overviews of recent developments not widely known, including efficient clustering algorithms, cluster validation, consensus classifications, and the classification of symbolic data.
The author has taken an approach accessible to researchers in the wide variety of disciplines that can benefit from classification analysis and methods. He illustrates the methodologies by applying them to data sets: smaller sets given in the text, larger ones available through a Web site.
Large multivariate data sets can be difficult to comprehend: the sheer volume and complexity can prove overwhelming. Classification methods provide efficient, accurate ways to make them less unwieldy and to extract more information. Classification, Second Edition offers the ideal vehicle for gaining the background, learning the methodologies, and beginning to put these techniques to use.
Table of Contents
Introduction: Classification, Assignment, and Dissection; Aims of Classification; Stages in a Numerical Classification; Data Sets
Measures of Similarity and Dissimilarity: Introduction; Selected Measures of Similarity and Dissimilarity; Some Difficulties; Construction of Relevant Measures
Partitions: Partitioning Criteria; Iterative Relocation Algorithms; Mathematical Programming; Other Partitioning Algorithms; How Many Clusters?; Links with Statistical Models
Hierarchical Classifications: Definitions and Representations; Algorithms; Choice of Clustering Strategy; Consensus Trees; More General Tree Models
Other Clustering Procedures: Fuzzy Clustering; Constrained Classification; Overlapping Classification; Conceptual Clustering; Classification of Symbolic Data; Partitions of Partitions
Graphical Representations: Introduction; Principal Coordinates Analysis; Non-Metric Multidimensional Scaling; Interactive Graphics and Self-Organizing Maps; Biplots
Cluster Validation and Description: Introduction; Cluster Validation; Cluster Description
References
Author Index
Subject Index