Programming for Hybrid Multi/Manycore MPP Systems

1st Edition

By John Levesque, Aaron Vose

Chapman and Hall/CRC

305 pages | 74 B/W Illus.

Hardback: 9781439873717
pub: 2017-10-10
eBook (VitalSource): 9781315155944
pub: 2017-10-10


"Ask not what your compiler can do for you, ask what you can do for your compiler."

--John Levesque, Director of Cray’s Supercomputing Centers of Excellence

The next decade of computationally intensive computing lies with more powerful multi/manycore nodes in which processors share a large memory space. These nodes will be the building blocks for systems that range from a single-node workstation up to systems approaching the exaflop regime. The node itself will consist of tens to hundreds of MIMD (multiple instruction, multiple data) processing units with SIMD (single instruction, multiple data) parallel instructions. Because a standard, affordable memory architecture will not be able to supply the bandwidth required by these cores, new memory organizations will be introduced. These new node architectures will pose a significant challenge to application developers.

Programming for Hybrid Multi/Manycore MPP Systems briefly describes the current state of the art in programming these systems and proposes an approach for developing a single performance-portable application that can effectively utilize all of them. The book starts with a strategy for optimizing an application for multi/manycore architectures. It then looks at the three typical architectures, covering their advantages and disadvantages.

The next section of the book explores the other important component of the target: the compiler. The compiler ultimately converts the input language to executable code on the target, and the book shows how to make the compiler do what we want. The book then covers gathering runtime statistics by running the application on the important problem sets previously discussed.

Next, the book covers how best to utilize available memory bandwidth and vectorization, along with hybridization of a program. The last part of the book presents several major applications and examines future hardware advancements and how application developers can prepare for them.

Table of Contents



Chapter Overviews

Determining an Exaflop Strategy

Foreword By John Levesque


Looking At The Application

Degree Of Hybridization Required

Decomposition And I/O

Parallel And Vector Lengths

Productivity And Performance Portability


Target Hybrid Multi/Many Core System

Foreword By John Levesque

Understanding The Architecture

Cache Architectures

Memory Hierarchy

KNL Clustering Modes

KNL MCDRAM Modes

Importance Of Vectorization

Alignment For Vectorization

How Compilers Optimize Programs

Foreword By John Levesque


Memory Allocation

Memory Alignment

Comment-Line Directive

Interprocedural Analysis

Compiler Switches

Fortran 2003 And Inefficiencies

Compiler Scalar Optimizations

Gathering Runtime Statistics for Optimizing

Foreword By John Levesque


What’s Important To Profile


Utilization of Available Memory Bandwidth

Foreword By John Levesque


Importance Of Cache Optimization

Variable Analysis In Multiple Loops

Optimizing For The Cache Hierarchy

Combining Multiple Loops

Vectorization



Foreword By John Levesque


Vectorization Inhibitors

Vectorization Rejection From Inefficiencies

Striding Versus Contiguous Accessing

Wrap-Around Scalar

Loops Saving Maxima And Minima

Multi-Nested Loop Structures

There’s Matmul And Then There’s Matmul

Decision Processes In Loops

Handling Function Calls Within Loops

Rank Expansion

Outer Loop Vectorization

Hybridization of an Application

Foreword By John Levesque


The Node’s NUMA Architecture

First Touch In The Himeno Benchmark

Identifying Which Loops To Thread

SPMD OpenMP

Porting Entire Applications

Foreword By John Levesque


SPEC OpenMP Benchmarks

NASA Parallel Benchmark (NPB) - BT

Refactoring VH-1

Refactoring Leslie3D

Refactoring S3D – 2016 Production Version

Performance Portable – S3D On Titan

Future Hardware Advancements


Future x86 CPUs

Future ARM CPUs

Future Memory Technologies

Future Hardware Conclusions


Supercomputer Cache Architectures

The Translation Look-Aside Buffer

Command Line Options / Compiler Directives

Previously Used Optimizations

I/O Optimization


12-Step Process

About the Authors

John Levesque works in the Chief Technology Office at Cray Inc., where he is responsible for application performance on Cray’s HPC systems. He is also the director of Cray’s Supercomputing Center of Excellence for the Trinity system, installed at the end of 2016 at Los Alamos National Laboratory. Prior to Trinity, he was director of the Center of Excellence at the Oak Ridge National Laboratory (ORNL). ORNL installed a 27 Petaflop Cray XK7 system, Titan, which was the fastest computer in the world according to the Top500 list in 2012, and a 2.7 Petaflop Cray XT5, Jaguar, which was number one in 2009. For the past 50 years, Mr. Levesque has optimized scientific application programs for successful HPC systems. He is an expert in application tuning and compiler analysis of scientific applications. He has written two previous books: one on optimization for the Cray 1 in 1989 [20] and one on optimization for multi-core MPP systems in 2010 [19].

Aaron Vose is an HPC software engineer who spent two years at Cray’s Supercomputing Center of Excellence at Oak Ridge National Laboratory. Aaron helped domain scientists at ORNL port and optimize scientific software to achieve maximum scalability and performance on world-class, high-performance computing resources, such as the Titan supercomputer. Aaron now works for Cray Inc. as a software engineer helping R&D to design next-generation computer systems. Prior to joining Cray, Aaron spent time at the National Institute for Computational Sciences (NICS) as well as the Joint Institute for Computational Sciences (JICS). There, he worked on scaling and porting bioinformatics software to the Kraken supercomputer. Aaron holds a Master’s degree in Computer Science from the University of Tennessee at Knoxville.

About the Series

Chapman & Hall/CRC Computational Science

Subject Categories

BISAC Subject Codes/Headings:
COMPUTERS / Programming / Games
COMPUTERS / Programming Languages / General
MATHEMATICS / Arithmetic