Future microprocessors and large high-performance computing systems will be hybrid/heterogeneous, relying on two major types of components: multicore/manycore central processing units and special-purpose hardware/massively parallel accelerators. While these technologies have numerous benefits, they also pose substantial performance challenges for developers, including scalability, software tuning, and programming issues.
Researchers at the Forefront Reveal Results from Their Own State-of-the-Art Work
Edited by some of the top researchers in the field and with contributions from a variety of international experts, Scientific Computing with Multicore and Accelerators focuses on the architectural design and implementation of multicore and manycore processors and accelerators, including graphics processing units (GPUs) and the Sony Toshiba IBM (STI) Cell Broadband Engine (BE) currently used in the Sony PlayStation 3. The book explains how numerical libraries, such as LAPACK, help solve computational science problems; explores the emerging area of hardware-oriented numerics; and presents the design of a fast Fourier transform (FFT) and a parallel list ranking algorithm for the Cell BE. It covers stencil computations, auto-tuning, optimizations of a computational kernel, sequence alignment and homology, and pairwise computations. The book also evaluates the portability of drug design applications to the Cell BE and illustrates how to successfully exploit the computational capabilities of GPUs for scientific applications. It concludes with chapters on dataflow frameworks, the Charm++ programming model, scan algorithms, and a portable intracore communication framework.
Explores the New Computational Landscape of Hybrid Processors
By offering insight into the process of constructing and effectively using the technology, this volume provides a thorough and practical introduction to the area of hybrid computing. It discusses introductory concepts and simple examples of parallel computing, logical and performance debugging for parallel computing, and advanced topics and issues related to building and using many applications.
Table of Contents
Dense Linear Algebra
Implementing Matrix Multiplication on the Cell B.E., Wesley Alvaro, Jakub Kurzak, and Jack Dongarra
Implementing Matrix Factorizations on the Cell BE, Jakub Kurzak and Jack Dongarra
Dense Linear Algebra for Hybrid GPU-Based Systems, Stanimire Tomov and Jack Dongarra
BLAS for GPUs, Rajib Nath, Stanimire Tomov, and Jack Dongarra
Sparse Linear Algebra
Sparse Matrix-Vector Multiplication on Multicore and Accelerators, Samuel Williams, Nathan Bell, Jee Whan Choi, Michael Garland, Leonid Oliker, and Richard Vuduc
Hardware-Oriented Multigrid Finite Element Solvers on GPU-Accelerated Clusters, Stefan Turek, Dominik Göddeke, Sven H.M. Buijssen, and Hilmar Wobker
Mixed-Precision GPU-Multigrid Solvers with Strong Smoothers, Dominik Göddeke and Robert Strzodka
Fast Fourier Transforms
Designing Fast Fourier Transform (FFT) for the IBM Cell BE, Virat Agarwal and David A. Bader
Implementing FFTs on Multicore Architectures, Alex Chunghen Chow, Gordon C. Fossum, and Daniel A. Brokenshire
Combinatorial Algorithm Design on the Cell/BE Processor, David A. Bader, Virat Agarwal, Kamesh Madduri, and Fabrizio Petrini
Auto-Tuning Stencil Computations on Multicore and Accelerators, Kaushik Datta, Samuel Williams, Vasily Volkov, Jonathan Carter, Leonid Oliker, John Shalf, and Katherine Yelick
Manycore Stencil Computations in Hyperthermia Applications, Matthias Christen, Olaf Schenk, Esra Neufeld, Maarten Paulides, and Helmar Burkhart
Enabling Bioinformatics Algorithms on the Cell/BE Processor, Vipin Sachdeva, Michael Kistler, and Tzy-Hwa Kathy Tzeng
Pairwise Computations on the Cell Processor, Abhinav Sarje, Jaroslaw Zola, and Srinivas Aluru
Drug Design on the Cell BE, Cecilia González-Álvarez, Harald Servat, Daniel Cabrera-Benítez, Xavier Aguilar, Carles Pons, Juan Fernández-Recio, and Daniel Jiménez-González
GPU Algorithms for Molecular Modeling, John E. Stone, David J. Hardy, Barry Isralewitz, and Klaus Schulten
Dataflow Frameworks for Emerging Heterogeneous Architectures and Their Application to Biomedicine, Umit V. Catalyurek, Renato Ferreira, Timothy D.R. Hartley, George Teodoro, and Rafael Sachetto
Accelerator Support in the Charm++ Parallel Programming Model, Laxmikant V. Kalé, David M. Kunzman, and Lukasz Wesolowski
Efficient Parallel Scan Algorithms for Manycore GPUs, Shubhabrata Sengupta, Mark Harris, Michael Garland, and John D. Owens
High Performance Topology-Aware Communication in Multicore Processors, Hari Subramoni, Fabrizio Petrini, Virat Agarwal, and Davide Pasetto
Jakub Kurzak is a research director in the Innovative Computing Laboratory in the Department of Electrical Engineering and Computer Science at the University of Tennessee. Dr. Kurzak is a program committee member for several international conferences and a reviewer for a number of top-ranking journals. His research focuses on utilizing multicore systems and accelerators for scientific computing.
David A. Bader is a professor in the School of Computational Science and Engineering, College of Computing, and executive director for High Performance Computing at the Georgia Institute of Technology. He is a lead scientist in the DARPA Ubiquitous High Performance Computing (UHPC) program, an associate editor for several high-impact journals, and editor of the book Petascale Computing: Algorithms and Applications (CRC Press, 2008). An IEEE Fellow and member of the ACM, Dr. Bader has been an NSF CAREER Award recipient and has received awards from IBM, NVIDIA, Intel, Sun Microsystems, and Microsoft Research. His main areas of research are in parallel algorithms, combinatorial optimization, and computational biology and genomics.
Jack Dongarra is a University Distinguished Professor of Electrical Engineering and Computer Science at the University of Tennessee, where he is the director of the Innovative Computing Laboratory and the director of the Center for Information Technology Research. He also is a member of the Distinguished Research Staff in the Computer Science and Mathematics Division at Oak Ridge National Laboratory, a Turing Fellow at the University of Manchester, and an adjunct professor in the Department of Computer Science at Rice University. A Fellow of the AAAS, ACM, IEEE, and SIAM, Dr. Dongarra has received numerous awards, including the first SIAM Special Interest Group on Supercomputing award for Career Achievement, the first IEEE Medal of Excellence in Scalable Computing, and the IEEE Sidney Fernbach Award. His research encompasses numerical algorithms in linear algebra, parallel computing, the use of advanced computer architectures, programming methodology, and tools for parallel computers.