Data-intensive science has the potential to transform scientific research and quickly translate scientific progress into complete solutions, policies, and economic success. But this collaborative science still lacks effective access to and exchange of knowledge among scientists, researchers, and policy makers across a range of disciplines. Bringing together leaders from multiple scientific disciplines, Data-Intensive Science shows how a comprehensive integration of various techniques and technological advances can effectively harness the vast amount of data being generated and significantly accelerate scientific progress to address some of the world’s most challenging problems.
In the book, a diverse cross-section of application, computer, and data scientists explores the impact of data-intensive science on current research and describes emerging technologies that will enable future scientific breakthroughs. The book identifies best practices used to tackle challenges facing data-intensive science as well as gaps in these approaches. It also focuses on the integration of data-intensive science into standard research practice, explaining how components in the data-intensive science environment need to work together to provide the necessary infrastructure for community-scale scientific collaborations.
Organizing the material based on a high-level, data-intensive science workflow, this book provides an understanding of the scientific problems that would benefit from collaborative research, the current capabilities of data-intensive science, and the solutions to enable the next round of scientific advancements.
Table of Contents
What Is Data-Intensive Science?, Terence Critchlow and Kerstin Kleese van Dam
Where Does All the Data Come From?, Geoffrey Fox, Tony Hey, and Anne Trefethen
Data-Intensive Grand Challenge Science Problems
Large-Scale Microscopy Imaging Analytics for In Silico Biomedicine, Joel Saltz, Fusheng Wang, George Teodoro, Lee Cooper, Patrick Widener, Jun Kong, David Gutman, Tony Pan, Sharath Cholleti, Ashish Sharma, Daniel Brat, and Tahsin Kurc
Answering Fundamental Questions about the Universe, Eric S. Myra and F. Douglas Swesty
Materials of the Future: From Business Suits to Space Suits, Mark F. Horstemeyer
Earth System Grid Federation: Infrastructure to Support Climate Science Analysis as an International Collaboration: A Data-Driven Activity for Extreme-Scale Climate Science, Dean N. Williams, Ian T. Foster, Bryan Lawrence, and Michael Lautenschlager
Data-Intensive Production Grids, Bob Jones and Ian Bird
EUDAT: Toward a Pan-European Collaborative Data Infrastructure, D. Lecarpentier, J. Reetz, and P. Wittenburg
From Challenges to Solutions
Infrastructure for Data-Intensive Science: A Bottom-Up Approach, Eli Dart and William Johnston
A Posteriori Ontology Engineering for Data-Driven Science, Damian D.G. Gessler, Cliff Joslyn, and Karin Verspoor
Transforming Data into the Appropriate Context, Bill Howe
Bridging the Gap between Scientific Data Producers and Consumers: A Provenance Approach, Eric G. Stephan, Paulo Pinheiro, and Kerstin Kleese van Dam
In Situ Exploratory Data Analysis for Scientific Discovery, Kanchana Padmanabhan, Sriram Lakshminarasimhan, Zhenhuan Gong, John Jenkins, Neil Shah, Eric Schendel, Isha Arkatkar, Rob Ross, Scott Klasky, and Nagiza F. Samatova
Interactive Data Exploration, Brian Summa, Attila Gyulassy, Peer-Timo Bremer, and Valerio Pascucci
Linked Science: Interconnecting Scientific Assets, Tomi Kauppinen, Alkyoni Baglatzi, and Carsten Keßler
Summary and Conclusions, Terence Critchlow and Kerstin Kleese van Dam
Terence Critchlow is the chief scientist of the Scientific Data Management Group in the Computational Sciences and Mathematics Division of the Pacific Northwest National Laboratory (PNNL), where he leads projects on data analysis, data dissemination, and workflow systems. A senior member of IEEE and ACM, Dr. Critchlow earned a PhD in computer science from the University of Utah. His research focuses on large-scale data management, metadata, data analysis, online analytical processing, data integration, data dissemination, and scientific workflows.
Kerstin Kleese van Dam is an associate division director and lead of the Scientific Data Management Group at PNNL. In 2006, she received the British Female Innovators and Inventors Silver Award for the effective management of scientific data. Her research focuses on data management and analysis in extreme-scale environments.
"This nicely integrated collection of contributions is an attempt to familiarize readers with this challenging aspect of science in the 21st century. The editors draw a picture of the future of scientific data production along the lines of the grand challenges identified by the National Academy of Engineering. ... This book is elegantly written, and intended for decision-makers. ... It achieves a good balance between technical and strategic thinking. This makes it a good choice for scientific decision-makers such as directors of institutes and universities, who are in fact in a position to shape the future networking structures for global data management."
--Hamid R. Noori, Computing Reviews