Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains.
The book focuses on three primary aspects of data clustering:
- Methods, describing key techniques commonly used for clustering, such as feature selection, agglomerative clustering, partitional clustering, density-based clustering, probabilistic clustering, grid-based clustering, spectral clustering, and nonnegative matrix factorization
- Domains, covering methods used for different domains of data, such as categorical data, text data, multimedia data, graph data, biological data, stream data, uncertain data, time series clustering, high-dimensional clustering, and big data
- Variations and Insights, discussing important variations of the clustering process, such as semisupervised clustering, interactive clustering, multiview clustering, cluster ensembles, and cluster validation
In this book, top researchers from around the world explore the characteristics of clustering problems in a variety of application areas. They also explain how to glean detailed insight from the clustering process—including how to verify the quality of the underlying clusters—through supervision, human intervention, or the automated generation of alternative clusters.
Table of Contents
An Introduction to Cluster Analysis (Charu C. Aggarwal)
Feature Selection for Clustering: A Review (Salem Alelyani, Jiliang Tang, and Huan Liu)
Probabilistic Models for Clustering (Hongbo Deng and Jiawei Han)
A Survey of Partitional and Hierarchical Clustering Algorithms (Chandan K. Reddy and Bhanukiran Vinzamuri)
Density-Based Clustering (Martin Ester)
Grid-Based Clustering (Wei Cheng, Wei Wang, and Sandra Batista)
Non-Negative Matrix Factorizations for Clustering: A Survey (Tao Li and Chris Ding)
Spectral Clustering (Jialu Liu and Jiawei Han)
Clustering High-Dimensional Data (Arthur Zimek)
A Survey of Stream Clustering Algorithms (Charu C. Aggarwal)
Big Data Clustering (Hanghang Tong and U. Kang)
Clustering Categorical Data (Bill Andreopoulos)
Document Clustering: The Next Frontier (David C. Anastasiu, Andrea Tagarelli, and George Karypis)
Clustering Multimedia Data (Shen-Fu Tsai, Guo-Jun Qi, Shiyu Chang, Min-Hsuan Tsai, and Thomas S. Huang)
Time Series Data Clustering (Dimitrios Kotsakos, Goce Trajcevski, Dimitrios Gunopulos, and Charu C. Aggarwal)
Clustering Biological Data (Chandan K. Reddy, Mohammad Al Hasan, and Mohammed J. Zaki)
Network Clustering (Srinivasan Parthasarathy and S.M. Faisal)
A Survey of Uncertain Data Clustering Algorithms (Charu C. Aggarwal)
Concepts of Visual and Interactive Clustering (Alexander Hinneburg)
Semi-Supervised Clustering (Amrudin Agovic and Arindam Banerjee)
Alternative Clustering Analysis: A Review (James Bailey)
Cluster Ensembles: Theory and Applications (Joydeep Ghosh and Ayan Acharya)
Clustering Validation Measures (Hui Xiong and Zhongmou Li)
Educational and Software Resources for Data Clustering (Charu C. Aggarwal and Chandan K. Reddy)
Charu C. Aggarwal is a Research Scientist at the IBM T. J. Watson Research Center in Yorktown Heights, New York. He completed his B.S. at IIT Kanpur in 1993 and his Ph.D. at the Massachusetts Institute of Technology in 1996. His research interest during his Ph.D. years was in combinatorial optimization (network flow algorithms), and his thesis advisor was Professor James B. Orlin. He has since worked in the fields of performance analysis, databases, and data mining. He has published over 200 papers in refereed conferences and journals, and has applied for or been granted over 80 patents. He is author or editor of nine books, including this one. Because of the commercial value of the above-mentioned patents, he has received several invention achievement awards and has thrice been designated a Master Inventor at IBM. He is a recipient of an IBM Corporate Award (2003) for his work on bio-terrorist threat detection in data streams, a recipient of the IBM Outstanding Innovation Award (2008) for his scientific contributions to privacy technology, and a recipient of an IBM Research Division Award (2008) for his scientific contributions to data stream research. He has served on the program committees of most major database/data mining conferences, and served as program vice-chair of the SIAM Conference on Data Mining (2007), the IEEE ICDM Conference (2007), the WWW Conference (2009), and the IEEE ICDM Conference (2009). He served as an associate editor of the IEEE Transactions on Knowledge and Data Engineering Journal from 2004 to 2008. He is an associate editor of the ACM TKDD Journal, an action editor of the Data Mining and Knowledge Discovery Journal, an associate editor of the ACM SIGKDD Explorations, and an associate editor of the Knowledge and Information Systems Journal. He is a fellow of the IEEE for "contributions to knowledge discovery and data mining techniques" and a life member of the ACM.
Chandan K. Reddy is an Assistant Professor in the Department of Computer Science at Wayne State University. He received his Ph.D. from Cornell University and his M.S. from Michigan State University. His primary research interests are in the areas of data mining and machine learning with applications to healthcare, bioinformatics, and social network analysis. His research is funded by the National Science Foundation, the Department of Transportation, and the Susan G. Komen for the Cure Foundation. He has published over 40 peer-reviewed articles in leading conferences and journals. He received the Best Application Paper Award at the ACM SIGKDD conference in 2010 and was a finalist in the INFORMS Franz Edelman Award Competition in 2011. He is a member of IEEE, ACM, and SIAM.