Speculative Execution in High Performance Computer Architectures

1st Edition

Edited by David Kaeli, Pen-Chung Yew

Chapman and Hall/CRC

456 pages | 94 B/W Illus.

Hardback: 9781584884477
pub: 2005-05-26
eBook (VitalSource): 9780429147043
pub: 2005-05-26

Description

Until now, there were few textbooks that focused on the dynamic subject of speculative execution, a topic that is crucial to the development of high performance computer architectures. Speculative Execution in High Performance Computer Architectures describes many recent advances in speculative execution techniques. It covers cutting-edge research projects, as well as numerous commercial implementations that demonstrate the value of this latency-hiding technique.

The book begins with a review of control speculation techniques that use instruction cache prefetching, branch prediction and predication, and multi-path execution. It then examines dataflow speculation techniques including data cache prefetching, address value and data value speculation, pre-computation, and coherence speculation. This textbook also explores multithreaded approaches, emphasizing profile-guided speculation, speculative microarchitectures, and compiler techniques.

Table of Contents

INTRODUCTION

David R. Kaeli, Northeastern University and Pen C. Yew, University of Minnesota

INSTRUCTION CACHE PREFETCHING

Glenn Reinman, UCLA Computer Science Department

Direct Mapped Cache

Set Associative Cache

Pseudo Associative Cache

Way Prediction Cache

Next Line Prefetching

Target Prefetching

Stream Buffers

Nonblocking Instruction Caches and Out-Of-Order Fetch

Fetch Directed Instruction Prefetching

Integrated Prefetching

Wrong-Path Prefetching

Compiler Strategies

BRANCH PREDICTION

Philip G. Emma, IBM T.J. Watson Research Laboratory

The von Neumann Programming Model vs. ENIAC

Dataflow and Control Flow

The Branch Instruction

The IAS Machine: A Primitive Stored-Program Architecture

Virtuality

Branch Instruction Semantics

General Instruction-Set Architectures and Extensions

Memory Consistency and Observable Order

Branches and Performance

Pipelining

Pipeline Disruptions and Their Penalties

Superscalar Processing

Multithreading

Instruction Prefetching and Autonomy

The Delayed Branch Instruction

Branch Flow in a Pipeline: The "When" of Branch Prediction

Predicting Branches at Decode Time

Predicting Branches at Instruction-Prefetch Time

Static Branch Prediction

Dynamic Branch Prediction

Branch Prediction With Counters

Predicting by Profiling Branch Actions

Group Behaviors vs. Predicting Individual Branches

The Decode History Table (a.k.a. Branch History Table)

Discriminators

Using Multiple Discriminators: A Path-Based Approach

Implementation

A Timing Caveat

Hybrid Predictors

Instruction Prefetching

The Branch History Table (a.k.a. Branch Target Buffer)

Operation of the BTB

Fetch Width and Branch Mispredictions

The Subroutine Call and Return Structure

Predicting Return Addresses by Using a Stack

Recognizing Subroutine Calls and Returns

Taking Advantage of the BTB Structure

Eliminating the Stack

Working Sets and Contexts

The Size of a BTB Entry

The BTB and the Instruction Cache: Economies of Size

More Exotic Prediction for the More Difficult Branches

Branches and the Operand Space

Branches and the Operand-Address Space

Tandem Branch Prediction

Accuracy and the Updating of Tables

Predictor Bandwidth and Anomalous Behaviors

The Importance of Fast Prediction Mechanisms

Superscalar Processing and the Monolithic Prediction of Branch Sequences

Predicting Branches in a Multithreaded Environment

Limitations

Simplicity

Complexity

Two Saving Graces

Implementing Real Branch Prediction Mechanisms

TRACE CACHES

Eric Rotenberg, North Carolina State University

Traces

Core Fetch Unit Based on Instruction Cache

Trace Cache Operation

Path Associativity

Indexing Strategy

Partial Matching

Coupling Branch Prediction with the Trace Cache

Trace Selection Policy

Multi-Phase Trace Construction

Managing Overlap between Instruction Cache and Trace Cache

Speculative vs. Non-Speculative Trace Cache Updates

Powerful vs. Weak Core Fetch Unit

Parallel vs. Serial Instruction Cache Access

L1 vs. L2 Instruction Cache

Loop Caches

BRANCH PREDICATION

David August, Princeton University

Overcoming Branch Problems with Predication

If-Conversion

Predicate Optimization and Analysis

The Predicated Intermediate Representation

Hewlett-Packard Laboratories PD

Cydrome Cydra 5

ARM

Texas Instruments C6X

Systems with Limited Predicated Execution Support

Predication in the Itanium 2 Processor

MULTIPATH EXECUTION

Augustus K. Uht, University of Rhode Island

Branch Tree Geometry

Branch Path/Instruction ID

Phases of Operation

Granularity

With Predication

With Data Speculation

Compiler-Assisted

Hardware: Classically-Based

Hardware: Non Classically-Based

Multiprocessors

Functional or Logic Language Machines

Branch Prediction

Confidence Estimation

Pipeline Depth

Implications of Amdahl's Law - ILP Version

Memory Bandwidth Requirements

DATA CACHE PREFETCHING

Yan Solihin, North Carolina State University, and Donald Yeung, University of Maryland at College Park

Architectural Support

Array Prefetching

Pointer Prefetching

Relationship with Data Locality Optimizations

Stride and Sequential Prefetching

Correlation Prefetching

Content-Based Prefetching

ADDRESS PREDICTION

Avi Mendelson, Intel Mobile Micro-Processor Architect

Terminology and Definitions

Non-Speculative Address Calculation Techniques

Speculative Address Calculation Techniques

Chapter Focus

Characterization of Address Predictability

Address Predictability vs. Value Predictability

Combining Address Prediction with Prefetching Mechanism

Basic Characterization

Load Promotion

Memory Bypassing

Compiler Based Speculative Load Promotion

DATA SPECULATION

Yiannakis Sazeides, University of Cyprus; Pedro Marcuello, Intel-UPC Barcelona Research Center; James E. Smith, University of Wisconsin-Madison; and Antonio González, Universitat Politècnica de Catalunya

Basic Value Predictors

Value Predictor Alternatives

Confidence Estimation

Implementation Issues

Data Dependence Predictors

Verification

Recovery

Other Microarchitectural Implications of Data Value Speculation

Related Work: Data Value Speculation

Related Work: Data Dependence Speculation

INSTRUCTION PRECOMPUTATION: DYNAMICALLY REMOVING REDUNDANT COMPUTATIONS USING PROFILING

Joshua J. Yi, Freescale Semiconductor Inc.; Resit Sendag, University of Rhode Island; and David J. Lilja, University of Minnesota at Twin Cities

A Comparison of Instruction Precomputation and Value Reuse

Upper-Bound - Profile A, Run A

Different Input Sets - Profile B, Run A

Combination of Input Sets - Profile AB, Run A

Frequency versus Frequency and Latency Product

Performance of Instruction Precomputation versus Value Reuse

PROFILE-BASED SPECULATION

Youfeng Wu and Jesse Fang, Intel Microprocessor Technology Labs

Control Flow Profile

Memory Profile

Value Profile

Static Analysis

Instrumentation

Hardware Performance Monitoring

Special Hardware

Software-Hardware Collaborative Profiling

Compile-time Profiling

Runtime Profiling

Continuous Profiling

Trace Scheduling

Hot-Cold Optimizations

Code Layout

Data Layout

Stride Prefetching

Hot Data Stream Prefetching

Mississippi Delta Prefetching

Java Runtime Parallelizing Machine

Speculative Parallel Threading

Speculative Computation Reuse

Software-Based Speculative Precomputation

Stability across Multiple Workloads

Update When Program Changes

Maintenance during optimizations

Perturbation by Profiling Code

COMPILATION AND SPECULATION

Jin Lin, Wei-Chung Hsu, and Pen-Chung Yew, University of Minnesota, Minneapolis

Alias Profiling

Data Dependence Profiling

Speculative Alias and Dataflow Analyses

A Framework for Speculative Alias Analysis and Dataflow Analysis

Overview

Φ-Insertion Step

Rename Step

Downsafety Step

CodeMotion Step

Recovery Code Generation for General Speculative Optimizations

Check Instructions and Recovery Code Representation for Multi-Level Speculation

Interaction of the Early Introduced Recovery Code with Later Optimizations

MULTITHREADING AND SPECULATION

Pedro Marcuello, Jesus Sanchez, and Antonio Gonzalez, Intel-UPC Barcelona Research Center; Intel Labs; Universitat Politecnica de Catalunya; Barcelona (Spain)

Building Helper Threads

Microarchitectural Support for Helper Threads

Thread Spawning Schemes

Microarchitectural Support for Speculative Architectural Threads

References

EXPLOITING LOAD/STORE PARALLELISM VIA MEMORY DEPENDENCE PREDICTION

Andreas Moshovos, University of Toronto

Static Methods

Hybrid Static/Dynamic Methods

Dynamic Methods

Working Example

Multiple Dependences Per Static Load or Store

Methodology

Performance Potential of Load/Store Parallelism

Performance with Naive Memory Dependence Speculation

Using Address-Based Scheduling to Extract Load/Store Parallelism

Speculation/Synchronization

RESOURCE FLOW MICROARCHITECTURES

David A. Morano and David R. Kaeli, Northeastern University

The Operand as a First Class Entity

Dynamic Dependency Ordering

Handling Multipath Execution

Names and Renaming

The Active Station Idea

Register and Memory Operand Storage

Operand Forwarding and Snooping

Result Forwarding Buses and Operand Filtering

A Small Resource-Flow Microarchitecture

A Distributed Scalable Resource-Flow Microarchitecture

About the Series

Chapman & Hall/CRC Computer and Information Science Series

Subject Categories

BISAC Subject Codes/Headings:
COM000000
COMPUTERS / General
COM011000
COMPUTERS / Systems Architecture / General
COM012040
COMPUTERS / Programming / Games
COM046000
COMPUTERS / Operating Systems / General