1st Edition

Multicore Computing: Algorithms, Architectures, and Applications

    452 Pages, 231 B/W Illustrations
    Edited by Sanguthevar Rajasekaran, Lance Fiondella, Mohamed Ahmed, and Reda A. Ammar
    Published by Chapman & Hall

    Every area of science and engineering today must process voluminous data sets. Solving intractable problems in critical areas such as computational biology, whether with exact or even approximate algorithms, takes time that is exponential in some of the underlying parameters. Parallel computing addresses this issue and has become affordable with the advent of multicore architectures. However, programming multicore machines is considerably more difficult because of the idiosyncrasies of these architectures.

    Offering insights into different facets of this area, Multicore Computing: Algorithms, Architectures, and Applications covers the architectures, algorithms, and applications of multicore computing. It helps readers understand the intricacies of these architectures and prepares them to design efficient multicore algorithms.

    Contributors at the forefront of the field cover the memory hierarchy for multicore and manycore processors, the Flexible Set Balancing caching strategy, the main features of the latest SPARC architecture specification, the Cilk and Cilk++ programming languages, the numerical software library Parallel Linear Algebra Software for Multicore Architectures (PLASMA), and the Aho-Corasick algorithm for exact multipattern string matching. They also describe the architecture and programming model of the NVIDIA Tesla GPU, discuss the scheduling of directed acyclic graphs onto multi/manycore processors, and evaluate design trade-offs among Intel and AMD multicore processors, the IBM Cell Broadband Engine, and NVIDIA GPUs. In addition, the book explains how to design algorithms for the Cell Broadband Engine and how to use the backprojection algorithm to generate images from synthetic aperture radar data.

    Memory Hierarchy for Multicore and Manycore Processors
    Mohamed Zahran and Bushra Ahsan
    Design Issues
    Physical Memory
    Cache Hierarchy Organization
    Cache Hierarchy Sharing
    Cache Hierarchy Optimization
    Cache Coherence
    Support for Memory Consistency Models
    Cache Hierarchy in Light of New Technologies
    Concluding Remarks

    FSB: A Flexible Set Balancing Strategy for Last Level Caches
    Mohammad Hammoud, Sangyeun Cho, and Rami Melhem
    Introduction
    Motivation and Background
    Flexible Set Balancing (FSB)
    Quantitative Evaluation
    Related Work
    Conclusions and Future Work

    The SPARC Processor Architecture
    Simone Secchi, Antonino Tumeo, and Oreste Villa
    Introduction
    The SPARC Instruction Set Architecture
    Memory Access
    Synchronization
    The Niagara Processor Architecture
    Core Micro-Architecture
    Core Interconnection
    Memory Subsystem
    Niagara Evolutions

    The Cilk and Cilk++ Programming Languages
    Hans Vandierendonck
    Abstract
    Introduction
    The Cilk Language
    Implementation
    Analyzing Parallelism in Cilk Programs
    Hyperobjects
    Conclusion

    Multithreading in the PLASMA Library
    Jakub Kurzak, Piotr Luszczek, Asim YarKhan, Mathieu Faverge, Julien Langou, Henricus Bouwmeester, and Jack Dongarra
    Introduction
    Multithreading in PLASMA
    Dynamic Scheduling with QUARK
    Parallel Composition
    Task Aggregation
    Nested Parallelism

    Efficient Aho-Corasick String Matching on Emerging Multicore Architectures
    Antonino Tumeo, Oreste Villa, Simone Secchi, and Daniel Chavarría-Miranda
    Introduction
    Related Work
    Preliminaries
    Algorithm Design
    Experimental Results
    Conclusions

    Sorting on a Graphics Processing Unit (GPU)
    Shibdas Bandyopadhyay and Sartaj Sahni
    Graphics Processing Units
    Sorting Numbers on GPUs
    Sorting Records on GPUs

    Scheduling DAG Structured Computations
    Yinglong Xia and Viktor K. Prasanna
    Introduction
    Background
    Related Work
    Lock-Free Collaborative Scheduling
    Hierarchical Scheduling with Dynamic Thread Grouping
    Conclusion

    Evaluating Multicore Processors and Accelerators for Dense Numerical Computations
    Seunghwa Kang, Nitin Arora, Aashay Shringarpure, Richard W. Vuduc, and David A. Bader
    Introduction
    Interarchitectural Design Trade-Offs
    Descriptions and Qualitative Analysis of Computational Statistics Kernels
    Baseline Architecture-Specific Implementations for the Computational Statistics Kernels
    Experimental Results for the Computational Statistics Kernels
    Descriptions and Qualitative Analysis of Direct N-Body Kernels
    Direct N-Body Implementations
    Experimental Results and Discussion for the Direct N-Body Implementations
    Conclusions

    Sorting on the Cell Broadband Engine
    Shibdas Bandyopadhyay, Dolly Sharma, Reda A. Ammar, Sanguthevar Rajasekaran, and Sartaj Sahni
    The Cell Broadband Engine
    High-level Strategies for Sorting
    SPU Vector and Memory Operations
    Sorting Numbers
    Sorting Records

    GPU Matrix Multiplication
    Junjie Li, Sanjay Ranka, and Sartaj Sahni
    Introduction
    GPU Architecture
    Programming Model
    Occupancy
    Single Core Matrix Multiply
    Multicore Matrix Multiply
    GPU Matrix Multiply
    A Comparison

    Backprojection Algorithms for Multicore and GPU Architectures
    William Chapman, Sanjay Ranka, Sartaj Sahni, Mark Schmalz, Linda Moore, Uttam Majumder, and Bracy Elton
    Summary of Backprojection
    Partitioning Backprojection for Implementation on a GPU
    Single Core Backprojection
    GPU Backprojection
    Conclusion
    Acknowledgments

    Index

    Biographies

    Sanguthevar Rajasekaran is the UTC Chair Professor of Computer Science and Engineering and director of the Booth Engineering Center for Advanced Technologies at the University of Connecticut. He received a Ph.D. in computer science from Harvard University. He is a fellow of the IEEE and the AAAS and an elected member of the Connecticut Academy of Science and Engineering. His research interests include bioinformatics, parallel algorithms, data mining, randomized computing, computer simulations, and combinatorial optimization.

    Lance Fiondella is an assistant professor in the Department of Electrical and Computer Engineering at the University of Massachusetts Dartmouth. He received a Ph.D. in computer science and engineering from the University of Connecticut. His research interests include algorithms, reliability engineering, and risk analysis.

    Mohamed Ahmed is a program manager at Microsoft Windows Azure Mobile. He received a Ph.D. in computer science and engineering from the University of Connecticut. His research interests include multicore/manycore technologies, high-performance computing, parallel programming, cloud computing, and GPU programming.

    Reda A. Ammar is a professor and the head of the Department of Computer Science and Engineering at the University of Connecticut. He received a Ph.D. in computer science from the University of Connecticut. He is the president of the International Society of Computers and Their Applications and editor-in-chief of the International Journal on Computers and Their Applications. His primary research interests encompass distributed and high-performance computing and real-time systems.