Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science.
The book’s introductory part provides context for issues in the astronomical sciences that are also important to the health, social, and physical sciences, particularly the probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications.
With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.
Table of Contents
Part I: Foundational Issues
Classification in Astronomy: Past and Present, Eric Feigelson
Searching the Heavens: Astronomy, Computation, Statistics, Data Mining, and Philosophy, Clark Glymour
Probability and Statistics in Astronomical Machine Learning and Data Mining, Jeffrey D. Scargle
Part II: Astronomical Applications
Automated Science Processing for the Fermi Large Area Telescope, James Chiang
CMB Data Analysis, Paniez Paykari and Jean-Luc Starck
Data Mining and Machine Learning in Time-Domain Discovery and Classification, Joshua S. Bloom and Joseph W. Richards
Cross-Identification of Sources: Theory and Practice, Tamás Budavári
The Sky Pixelization for CMB Mapping, O.V. Verkhodanov and A.G. Doroshkevich
Future Sky Surveys: New Discovery Frontiers, J. Anthony Tyson and Kirk D. Borne
Poisson Noise Removal in Spherical Multichannel Images: Application to Fermi Data, Jérémy Schmitt, Jean-Luc Starck, Jalal Fadili, and Seth Digel
Galaxy Zoo: Morphological Classification and Citizen Science, Lucy Fortson, Karen Masters, Robert Nichol, Kirk D. Borne, Edd Edmondson, Chris Lintott, Jordan Raddick, Kevin Schawinski, and John Wallin
The Utilization of Classifications in High-Energy Astrophysics Experiments, Bill Atwood
Database-Driven Analyses of Astronomical Spectra, Jan Cami
Weak Gravitational Lensing, Sandrine Pires, Jean-Luc Starck, Adrienne Leonard, and Alexandre Réfrégier
Photometric Redshifts: 50 Years After, Tamás Budavári
Galaxy Clusters, Christopher J. Miller
Signal Processing (Time-Series) Analysis
Planet Detection: The Kepler Mission, Jon M. Jenkins, Jeffrey C. Smith, Peter Tenenbaum, Joseph D. Twicken, and Jeffrey Van Cleve
Classification of Variable Objects in Massive Sky Monitoring Surveys, Przemek Woźniak, Lukasz Wyrzykowski, and Vasily Belokurov
Gravitational Wave Astronomy, Lee Samuel Finn
The Largest Data Sets
Virtual Observatory and Distributed Data Mining, Kirk D. Borne
Multitree Algorithms for Large-Scale Astrostatistics, William B. March, Arkadas Ozakin, Dongryeol Lee, Ryan Riegel, and Alexander G. Gray
Part III: Machine Learning Methods
Time–Frequency Learning Machines for Nonstationarity Detection Using Surrogates, Pierre Borgnat, Patrick Flandrin, Cédric Richard, André Ferrari, Hassan Amoud, and Paul Honeine
Classification, Nikunj Oza
On the Shoulders of Gauss, Bessel, and Poisson: Links, Chunks, Spheres, and Conditional Models, William D. Heavlin
Data Clustering, Kiri L. Wagstaff
Ensemble Methods: A Review, Matteo Re and Giorgio Valentini
Parallel and Distributed Data Mining for Astronomy Applications, Kamalika Das and Kanishka Bhaduri
Pattern Recognition in Time Series, Jessica Lin, Sheri Williamson, Kirk D. Borne, and David De Barr
Randomized Algorithms for Matrices and Data, Michael W. Mahoney
Michael J. Way, PhD, is a research scientist at the NASA Goddard Institute for Space Studies in New York and the NASA Ames Research Center in California. He is also an adjunct professor in the Department of Physics and Astronomy at Hunter College. His research focuses on understanding the multiscale structure of our universe, modeling the atmospheres of exoplanets, and applying kernel methods to new areas in astronomy.
Jeffrey D. Scargle, PhD, is an astrophysicist in the Space Science and Astrobiology Division of the NASA Ames Research Center. His main interests encompass the variability of astronomical objects, including the Sun, sources in the Galaxy, and active galactic nuclei; cosmology; plasma astrophysics; planetary detection; and data analysis and statistical methods.
Kamal M. Ali, PhD, is a research scientist in machine learning and data mining. He has a consulting practice and is cofounder of the start-up Metric Avenue. He has carried out research at IBM Almaden, Stanford University, Vividence, Yahoo, and TiVo, where he worked on the TiVo Collaborative Filtering Engine. His current research focuses on combining machine learning in conditional random fields with linguistically rich features to make machines better at reading web pages.
Ashok N. Srivastava, PhD, is the principal scientist for Data Mining and Systems Health Management and leader of the Intelligent Data Understanding group at NASA Ames Research Center. His research includes the development of data mining algorithms for anomaly detection in massive data streams, kernel methods in machine learning, and text mining algorithms.
"The volume is a well-organised collection of articles presenting the importance of modern data mining and machine learning techniques in application to analysis of astronomical data. … A major strength of the volume is its very impressive collection of real examples that can be both inspirational and educational. … The book is particularly successful in showing how collaboration between computer scientists and statisticians on one side and astronomers on the other is needed to search for a scientific discovery in the abundance of data generated by instrumentation and simulations."
—Krzysztof Podgorski, International Statistical Review, 2014