Performance Tuning of Scientific Applications

Edited by David H. Bailey, Robert F. Lucas, Samuel Williams

© 2010 – CRC Press

400 pages | 114 B/W Illus.

Purchasing Options:
Hardback: 9781439815694
pub: 2010-11-23
US Dollars: $113.95

About the Book

With contributions from some of the most notable experts in the field, Performance Tuning of Scientific Applications presents current research in performance analysis. The book focuses on the following areas:

Performance monitoring: Describes the state of the art in hardware and software tools that are commonly used for monitoring and measuring performance and managing large quantities of data

Performance analysis: Discusses modern approaches to computer performance benchmarking and presents results that offer valuable insight into these studies

Performance modeling: Explains how researchers deduce accurate performance models from raw performance data or from other high-level characteristics of a scientific computation

Automatic performance tuning: Explores ongoing research into automatic and semi-automatic techniques for optimizing computer programs to achieve superior performance on any computer platform

Application tuning: Provides examples that show how the appropriate analysis of performance and some deft changes have resulted in extremely high performance

Performance analysis has grown into a full-fledged, sophisticated field of empirical science. Describing useful research in modern performance science and engineering, this book helps real-world users of parallel computer systems to better understand both the performance vagaries arising in scientific applications and the practical means for improving performance.


Table of Contents

Introduction, David H. Bailey


"Twelve Ways to Fool the Masses"

Examples from Other Scientific Fields

Guidelines for Reporting High Performance

Modern Performance Science

Parallel Computer Architecture, Samuel W. Williams and David H. Bailey


Parallel Architectures

Processor (Core) Architecture

Memory Architecture

Network Architecture

Heterogeneous Architectures

Software Interfaces to Hardware Counters, Shirley V. Moore, Daniel K. Terpstra, and Vincent M. Weaver


Processor Counters

Off-Core and Shared Counter Resources

Platform Examples

Operating System Interfaces

PAPI in Detail

Counter Usage Modes

Uses of Hardware Counters

Caveats of Hardware Counters

Measurement and Analysis of Parallel Program Performance using TAU and HPCToolkit, Allen D. Malony, John Mellor-Crummey, and Sameer S. Shende



Measurement Approaches

HPCToolkit Performance Tools

TAU Performance System

Trace-Based Tools, Jesus Labarta


Tracing and Its Motivation


Data Acquisition

Techniques to Identify Structure



The Future

Large-Scale Numerical Simulations on High-End Computational Platforms, Leonid Oliker, Jonathan Carter, Vincent Beckner, John Bell, Harvey Wasserman, Mark Adams, Stéphane Ethier, and Erik Schnetter


HPC Platforms and Evaluated Applications

GTC: Turbulent Transport in Magnetic Fusion

GTC Performance

OLYMPUS: Unstructured FEM in Solid Mechanics

Carpet: Higher-Order AMR in Relativistic Astrophysics

CASTRO: Compressible Astrophysics

MILC: Quantum Chromodynamics

Performance Modeling: The Convolution Approach, David H. Bailey, Allan Snavely, and Laura Carrington


Applications of Performance Modeling

Basic Methodology

Performance Sensitivity Studies

Analytic Modeling for Memory Access Patterns Based on Apex-MAP, Erich Strohmaier, Hongzhang Shan, and Khaled Ibrahim


Memory Access Characterization

Apex-MAP Model to Characterize Memory Access Patterns

Using Apex-MAP to Assess Processor Performance

Apex-MAP Extension for Parallel Architectures

Apex-MAP as an Application Proxy

Limitations of Memory Access Modeling

The Roofline Model, Samuel W. Williams


The Roofline

Bandwidth Ceilings

In-Core Ceilings

Arithmetic Intensity Walls

Alternate Roofline Models

End-to-End Auto-Tuning with Active Harmony, Jeffrey K. Hollingsworth and Ananta Tiwari



Sources of Tunable Data


Auto-Tuning Experience with Active Harmony

Languages and Compilers for Auto-Tuning, Mary Hall and Jacqueline Chame

Language and Compiler Technology

Interaction between Programmers and Compiler


Code Transformation

Higher-Level Capabilities

Empirical Performance Tuning of Dense Linear Algebra Software, Jack Dongarra and Shirley Moore

Background and Motivation


Auto-Tuning for Multicore

Auto-Tuning for GPUs

Auto-Tuning Memory-Intensive Kernels for Multicore, Samuel W. Williams, Kaushik Datta, Leonid Oliker, Jonathan Carter, John Shalf, and Katherine Yelick


Experimental Setup

Computational Kernels

Optimizing Performance

Automatic Performance Tuning


Flexible Tools Supporting a Scalable First-Principles MD Code, Bronis R. de Supinski, Martin Schulz, and Erik W. Draeger


Qbox: A Scalable Approach to First-Principles Molecular Dynamics

Experimental Setup and Baselines

Optimizing Qbox: Step by Step

Customizing Tool Chains with PnMPI

The Community Climate System Model, Patrick H. Worley


CCSM Overview

Parallel Computing and the CCSM

Case Study: Optimizing Interprocess Communication Performance in the Spectral Transform Method

Performance Portability: Supporting Options and Delaying Decisions

Case Study: Engineering Performance Portability into the Community Atmosphere Model

Case Study: Porting the Parallel Ocean Program to the Cray X1

Monitoring Performance Evolution

Performance at Scale

Tuning an Electronic Structure Code, David H. Bailey, Lin-Wang Wang, Hongzhang Shan, Zhengji Zhao, Juan Meza, Erich Strohmaier, and Byounghak Lee


LS3DF Algorithm Description

LS3DF Code Optimizations

Test Systems

Performance Results and Analysis

Science Results



About the Editors

David Bailey is a chief technologist in the High Performance Computational Research Department at the Lawrence Berkeley National Laboratory. Dr. Bailey has published several books and numerous research studies on computational and experimental mathematics. He has been a recipient of the ACM Gordon Bell Prize, the IEEE Sidney Fernbach Award, and the MAA Chauvenet Prize and Merten Hasse Prize.

Robert Lucas is the director of computational sciences in the Information Sciences Institute and a research associate professor in computer science in the Viterbi School of Engineering at the University of Southern California. Dr. Lucas has many years of experience working with high-end defense, national intelligence, and energy applications and simulations. His linear solvers are the computational kernels of electrical and mechanical CAD tools.

Samuel Williams is a researcher in the Future Technologies Group at the Lawrence Berkeley National Laboratory. Dr. Williams has authored or co-authored thirty technical papers, including several award-winning papers. His research interests include high-performance computing, auto-tuning, computer architecture, performance modeling, and VLSI.

About the Series

Chapman & Hall/CRC Computational Science


Subject Categories

BISAC Subject Codes/Headings:
MATHEMATICS / Arithmetic
MATHEMATICS / Number Systems