With a DVD of color figures, Clustering in Bioinformatics and Drug Discovery provides an expert guide on extracting the most pertinent information from pharmaceutical and biomedical data. It offers a concise overview of common and recent clustering methods used in bioinformatics and drug discovery.
Setting the stage for subsequent material, the first three chapters of the book introduce statistical learning theory, exploratory data analysis, clustering algorithms, different types of data, graph theory, and various clustering forms. In the following chapters on partitional, cluster sampling, and hierarchical algorithms, the book provides readers with enough detail to obtain a basic understanding of cluster analysis for bioinformatics and drug discovery. The remaining chapters cover more advanced methods, such as hybrid and parallel algorithms, as well as topics tied to specific types of data and results, including asymmetry, ambiguity, validation measures, and visualization.
This book explores the application of cluster analysis in the areas of bioinformatics and cheminformatics as they relate to drug discovery. Clarifying the use and misuse of clustering methods, it helps readers understand the relative merits of these methods and evaluate results so that useful hypotheses can be developed and tested.
Table of Contents
Bioinformatics and Drug Discovery
Statistical Learning Theory and Exploratory Data Analysis
Normalization and Scaling
Measures of Similarity
Dimensionality, Components, Discriminants
Cluster Sampling Algorithms
Self-Organizing Tree Algorithm
Divisive Hierarchical K-Means
Exclusion Region Hierarchies
Discrete Valued Data Types
Ties in Proximity
Measure Probability and Distributions
Algorithm Decision Ambiguity
Overlapping Clustering Algorithms Based on Ambiguity
Large Scale and Parallel Algorithms
Leader and Leader-Follower Algorithms
K-Means and Variants
A Glossary and Exercises appear at the end of each chapter.
John D. MacCuish is the founder and president of Mesa Analytics & Computing, Inc. He has co-authored several software patents and has worked on many image processing, data mining, and statistical modeling applications, including IRS fraud detection, credit card fraud detection, and automated reasoning systems for drug discovery.
Norah E. MacCuish is the chief science officer of Mesa Analytics & Computing, Inc., where she acts as a consultant in the areas of drug design and compound acquisition and as a developer of commercial chemical information software products. She earned her Ph.D. in theoretical physical chemistry from Cornell University.
John trained in computer science and has been involved with data mining and statistical analysis; Norah trained as a theoretical physical chemist and has mostly worked for pharmaceutical companies on drug discovery. They run a company that merges their fields, and it is that overlap that they describe here. They explain how cluster analysis, an exploratory data analysis tool, is used in bioinformatics and cheminformatics as they relate to drug discovery. The goal is for practitioners to be aware of the relative merits of clustering methods with the data they have at hand.
—SciTech Book News, February 2011
… In this volume, the authors present sufficient options so that the user can choose the appropriate method for their data. … Practitioners in the pharmaceutical industry need an expert guide, which the authors of this book provide, to extract the most information from their data. Those of us who learned their clustering from Anderberg, Sokal and Sneath, and Willett now have a valuable additional resource suitable for the 21st century.
—From the Foreword by John Bradshaw, Barley, Hertfordshire, UK