1st Edition

Speculative Execution in High Performance Computer Architectures




ISBN 9781584884477
Published May 26, 2005 by Chapman and Hall/CRC
456 Pages 94 B/W Illustrations

USD $190.00

Prices & shipping based on shipping country


Preview

Book Description

Until now, there were few textbooks that focused on the dynamic subject of speculative execution, a topic that is crucial to the development of high performance computer architectures. Speculative Execution in High Performance Computer Architectures describes many recent advances in speculative execution techniques. It covers cutting-edge research projects, as well as numerous commercial implementations that demonstrate the value of this latency-hiding technique.

The book begins with a review of control speculation techniques that use instruction cache prefetching, branch prediction and predication, and multi-path execution. It then examines dataflow speculation techniques including data cache prefetching, address value and data value speculation, pre-computation, and coherence speculation. This textbook also explores multithreaded approaches, emphasizing profile-guided speculation, speculative microarchitectures, and compiler techniques.

Table of Contents

INTRODUCTION
David R. Kaeli, Northeastern University and Pen C. Yew, University of Minnesota

INSTRUCTION CACHE PREFETCHING
Glenn Reinman, UCLA Computer Science Department
Direct Mapped Cache
Set Associative Cache
Pseudo Associative Cache
Way Prediction Cache
Next Line Prefetching
Target Prefetching
Stream Buffers
Nonblocking Instruction Caches and Out-Of-Order Fetch
Fetch Directed Instruction Prefetching
Integrated Prefetching
Wrong-Path Prefetching
Compiler Strategies

BRANCH PREDICTION
Philip G. Emma, IBM T.J. Watson Research Laboratory
The von Neumann Programming Model vs. ENIAC
Dataflow and Control Flow
The Branch Instruction
The IAS Machine: A Primitive Stored-Program Architecture
Virtuality
Branch Instruction Semantics
General Instruction-Set Architectures and Extensions
Memory Consistency and Observable Order
Branches and Performance
Pipelining
Pipeline Disruptions and Their Penalties
Superscalar Processing
Multithreading
Instruction Prefetching and Autonomy
The Delayed Branch Instruction
Branch Flow in a Pipeline: The "When" of Branch Prediction
Predicting Branches at Decode Time
Predicting Branches at Instruction-Prefetch Time
Static Branch Prediction
Dynamic Branch Prediction
Branch Prediction With Counters
Predicting by Profiling Branch Actions
Group Behaviors vs. Predicting Individual Branches
The Decode History Table (a.k.a. Branch History Table)
Discriminators
Using Multiple Discriminators: A Path-Based Approach
Implementation
A Timing Caveat
Hybrid Predictors
Instruction Prefetching
The Branch History Table (a.k.a. Branch Target Buffer)
Operation of the BTB
Fetch Width and Branch Mispredictions
The Subroutine Call and Return Structure
Predicting Return Addresses by Using a Stack
Recognizing Subroutine Calls and Returns
Taking Advantage of the BTB Structure
Eliminating the Stack
Working Sets and Contexts
The Size of a BTB Entry
The BTB and the Instruction Cache: Economies of Size
More Exotic Prediction for the More Difficult Branches
Branches and the Operand Space
Branches and the Operand-Address Space
Tandem Branch Prediction
Accuracy and the Updating of Tables
Predictor Bandwidth and Anomalous Behaviors
The Importance of Fast Prediction Mechanisms
Superscalar Processing and the Monolithic Prediction
of Branch Sequences
Predicting Branches in a Multithreaded Environment
Limitations
Simplicity
Complexity
Two Saving Graces
Implementing Real Branch Prediction Mechanisms

TRACE CACHES
Eric Rotenberg North Carolina State University
Traces
Core Fetch Unit Based on Instruction Cache
Trace Cache Operation
Path Associativity
Indexing Strategy
Partial Matching
Coupling Branch Prediction with the Trace Cache
Trace Selection Policy
Multi-Phase Trace Construction
Managing Overlap between Instruction Cache and Trace
Cache
Speculative vs. Non-Speculative Trace Cache Updates
Powerful vs. Weak Core Fetch Unit
Parallel vs. Serial Instruction Cache Access
L1 vs. L2 Instruction Cache
Loop Caches

BRANCH PREDICATION
David August, Princeton University
Overcoming Branch Problems with Predication
If-Conversion
Predicate Optimization and Analysis
The Predicated Intermediate Representation
Hewlett-Packard Laboratories PD
Cydrome Cydra 5
ARM
Texas Instruments C6X
Systems with Limited Predicated Execution Support
Predication in the Itanium 2 Processor

MULTIPATH EXECUTION
Augustus K. Uht, University of Rhode Island
Branch Tree Geometry
Branch Path/Instruction ID
Phases of Operation
Granularity
With Predication
With Data Speculation
Compiler-Assisted
Hardware: Classically-Based
Hardware: Non Classically-Based
Multiprocessors
Functional or Logic Language Machines
Branch Prediction
Confidence Estimation
Pipeline Depth
Implications of Amdahl's Law - ILP Version
Memory Bandwidth Requirements

DATA CACHE PREFETCHING
Yan Solihin, North Carolina State University, and Donald Yeung, University of Maryland at College Park
Architectural Support
Array Prefetching
Pointer Prefetching
Relationship with Data Locality Optimizations
Stride and Sequential Prefetching
Correlation Prefetching
Content-Based Prefetching

ADDRESS PREDICTION
Avi Mendelson, Intel Mobil Micro-Processor Architect
Terminology and Definitions
Non-Speculative Address Calculation Techniques
Speculative Address Calculation Techniques
Chapter Focus
Characterization of Address Predictability
Address Predictability vs. Value Predictability
Combining Address Prediction with Prefetching Mechanism
Basic Characterization
Load Promotion
Memory Bypassing
Compiler Based Speculative Load Promotion

DATA SPECULATION
Yiannakis Sazeides, University of Cyprus; Pedro Marcuello, Intel-UPC Barcelona Research Center; James E. Smith, University
of Wisconsin-Madison; and Antonio González, Universitat Polit`ecnica de Catalunya
Basic Value Predictors
Value Predictor Alternatives
Confidence Estimation
Implementation Issues
Data Dependence Predictors
Verification
Recovery
Other Microarchitectural Implications of Data Value
Speculation
Related Work: Data Value Speculation
Related Work: Data Dependence Speculation

INSTRUCTION PRECOMPUTATION: DYNAMICALLY REMOVING REDUNDANT COMPUTATIONS USING PROFILING
Joshua J. Yi, Freescale Semiconductor Inc.; Resit Sendag, University of Rhode Island; and David J. Lilja, University of Minnesota at Twin Cities
A Comparison of Instruction Precomputation and Value Reuse
Upper-Bound - Profile A, Run A
Different Input Sets - Profile B, Run A
Combination of Input Sets - Profile AB, Run A
Frequency versus Frequency and Latency Product
Performance of Instruction Precomputation versus Value Reuse

PROFILE-BASED SPECULATION
Youfeng Wu and Jesse Fang, Intel Microprocessor Technology Labs
Control Flow Profile
Memory Profile
Value Profile
Static Analysis
Instrumentation
Hardware Performance Monitoring
Special Hardware
Software-Hardware Collaborative Profiling
Compile-time Profiling
Runtime Profiling
Continuous Profiling
Trace Scheduling
Hot-Cold Optimizations
Code Layout
Data Layout
Stride Prefetching
Hot Data Stream Prefetching
Mississippi Delta Prefetching
Java Runtime Parallelizing Machine
Speculative Parallel Threading
Speculative Computation Reuse
Software-Based Speculative Precomputation
Stability across Multiple Workloads
Update When Program Changes
Maintenance during optimizations
Perturbation by Profiling Code

COMPILATION AND SPECULATION
Jin Lin, Wei-Chung Hsu and Pen-Chung Yew University of Minnesota, Minneapolis
Alias Profiling
Data Dependence Profiling
Speculative Alias and Dataflow Analyses
A Framework for Speculative Alias Analysis and Dataflow Analysis
Overview
F-Insertion Step
Rename Step
Downsafety Step
CodeMotion Step
Recovery Code Generation for General Speculative Optimizations
Check Instructions and Recovery Code Representation for Multi-Level
Speculation
Interaction of the Early Introduced Recovery Code with Later
Optimizations

MULTITHREADING AND SPECULATION
Pedro Marcuello, Jesus Sanchez and Antonio Gonzalez Intel-UPC Barcelona Research Center; Intel Labs; Universitat Politecnica de Catalunya; Barcelona (Spain)
Building Helper Threads
Microarchitectural Support for Helper Threads
Thread Spawning Schemes
Microarchitectural Support for Speculative Architectural Threads
References
Andreas Moshovos, University of Toronto

EXPLOITING LOAD/STORE PARALLELISM VIA MEMORY DEPENDENCE PREDICTION
Static Methods
Hybrid Static/Dynamic Methods
Dynamic Methods
Working Example
Multiple Dependences Per Static Load or Store
Methodology
Performance Potential of Load/Store Parallelism
Performance with Naive Memory Dependence Speculation
Using Address-Based Scheduling to Extract Load/Store Parallelism
Speculation/Synchronization

RESOURCE FLOW MICROARCHITECTURES
David A. Morano and David R. Kaeli, Northeastern University
The Operand as a First Class Entity
Dynamic Dependency Ordering
Handling Multipath Execution
Names and Renaming
The Active Station Idea
Register and Memory Operand Storage
Operand Forwarding and Snooping
Result Forwarding Buses and Operand Filtering
A Small Resource-Flow Microarchitecture
A Distributed Scalable Resource-Flow Microarchitecture

...
View More