Unstructured Mining Approaches to Solve Complex Scientific Problems
As the volume of scientific data and literature increases exponentially, scientists need more powerful tools and methods to process and synthesize information and to formulate new hypotheses that are most likely to be both true and important. "Accelerating Discovery: Mining Unstructured Information for Hypothesis Generation" describes a novel approach to scientific research that uses unstructured data analysis as a generative tool for new hypotheses.
The author develops a systematic process for leveraging heterogeneous structured and unstructured data sources, data mining, and computational architectures to make the discovery process faster and more effective. This process accelerates human creativity by allowing scientists and inventors to more readily analyze and comprehend the space of possibilities, compare alternatives, and discover entirely new approaches.
Encompassing systematic and practical perspectives, the book provides the necessary motivation and strategies as well as a heterogeneous set of comprehensive, illustrative examples. It reveals the importance of heterogeneous data analytics in aiding scientific discoveries and furthers data science as a discipline.
Table of Contents
Introduction
Why Accelerate Discovery?
Form and Function
Exploring Content to Find Entities
Organization
Relationships
Inference
Taxonomies
Orthogonal Comparison
Visualizing the Data Plane
Networks
Examples and Problems
Problem: Discovery of Novel Properties of Known Entities
Problem: Finding New Treatments for Orphan Diseases from Existing Drugs
Example: Target Selection Based on Protein Network Analysis
Example: Gene Expression Analysis for Alternative Indications
Example: Side Effects
Example: Protein Viscosity Analysis Using Medline Abstracts
Example: Finding Microbes to Clean Up Oil Spills
Example: Drug Repurposing
Example: Adverse Events
Example: Discovering New P53 Kinases
Conclusion and Future Work
Scott Spangler is a principal data scientist, distinguished engineer, and master inventor in the Watson Innovations Group at the IBM Almaden Research Center. He has been involved with knowledge base and data mining research for the past 25 years. His recent work has applied Watson technology to help accelerate cancer research. He holds 45 patents and is the author of over 30 publications. He received a BS in mathematics from MIT and an MS in computer science from the University of Texas.