SLIDE 1

Execution-based Prediction Using Speculative Slices

Craig Zilles and Guri Sohi
University of Wisconsin - Madison
International Symposium on Computer Architecture, July 2001

SLIDE 2

Execution-based Prediction using Speculative Slices - Craig Zilles and Guri Sohi International Symposium on Computer Architecture (ISCA-28), July 2001

The Problem

Two major barriers to achieving high ILP: MISPREDICTED BRANCHES and CACHE MISSES

TRADITIONAL PREDICTION: SOMEWHAT MATURE TECHNOLOGY

correctly anticipates > 90% of instructions
exploit patterns in outcome/address stream
remaining mispredictions still expensive

EXECUTION-BASED PREDICTION

exploit regularity in computations
speculatively compute results early for use as predictions
speedups from 1% to 43% on SPECint2000

SLIDE 3

The Solution

1. Identify frequently mispredicting instructions
2. Extract and pack the dependent computation into code fragments called slices

[Figure: the program's retirement stream over time, with a problem BRANCH (branch mispredict) and a problem LOAD (cache miss), and the branch slice and load slice extracted for them]

SLIDE 4

The Solution

3. Execute slices in helper threads to generate predictions

[Figure: the retirement stream over time; fork points launch the branch slice and the load slice on idle threads, the branch slice delivers a prediction for the mispredicting BRANCH, the load slice turns the LOAD's cache miss into a cache hit, and the stream shortens (speedup)]

SLIDE 5

The Outline

PROBLEM INSTRUCTIONS

[Figure: forked branch slice and load slice supply a prediction for the BRANCH and turn the LOAD's cache miss into a cache hit, giving a speedup]

SLIDE 6

The Outline

PROBLEM INSTRUCTIONS
EXECUTION-BASED PREDICTION

SLIDE 7

The Outline

PROBLEM INSTRUCTIONS
EXECUTION-BASED PREDICTION
PREDICTION CORRELATION

SLIDE 8

The Outline

PROBLEM INSTRUCTIONS
EXECUTION-BASED PREDICTION
PREDICTION CORRELATION
PERFORMANCE RESULTS

SLIDE 9

Problem Instructions

Misses and mispredictions are not evenly distributed. EXAMPLE: PERLBMK

82 static branches: 68% of mispredictions, 9% of dynamic branches
140 static loads: 67% of misses, 2% of dynamic memory instructions

Fixing just the problem instructions gives more than half the performance of a perfect cache and predictor.

OUTCOMES OF THESE INSTRUCTIONS DO NOT EXHIBIT A PREDICTABLE PATTERN...

consistently mispredicted

... BUT SOMETIMES THE COMPUTATION IS REGULAR.

while (...) { ... ptr = ptr->next; }

while (i < n) { if (object[i] != NULL) { ... } }
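The array-scanning loop above can illustrate what a branch slice might look like. The following is only a sketch, not the authors' slice: the function name `branch_slice`, the `pred` buffer, and the `max_preds` bound are assumptions made for illustration. The slice keeps just the computation the problem branch depends on (the load of `object[i]` and the compare) and drops the loop body.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical branch slice for
 *     while (i < n) { if (object[i] != NULL) { ... } }
 * Live-in values: object, i, n. The loop body is omitted; only the
 * branch outcome for each iteration is computed. */
size_t branch_slice(void *const *object, size_t i, size_t n,
                    bool *pred, size_t max_preds)
{
    assert(object != NULL && pred != NULL);
    size_t count = 0;
    while (i < n && count < max_preds) {
        pred[count++] = (object[i] != NULL); /* one prediction per iteration */
        i++;
    }
    return count; /* number of predictions produced */
}
```

In this sketch the predictions are written to an ordinary array; in the scheme the slides describe they would instead be delivered to the branch predictor by a helper thread.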

SLIDE 10

Outline

PROBLEM INSTRUCTIONS
EXECUTION-BASED PREDICTION

O A different pre-execution approach
O Speculative slices and imprecise transformations
O Slice structure
O Slice characterization

PREDICTION CORRELATION
PERFORMANCE RESULTS

SLIDE 11

Previous pre-execution proposal

Speculative Data-driven Multithreading: Roth and Sohi, HPCA’01

Speculatively pre-executes data-driven threads (DDTs)
Register integration matches DDTs to main thread

+ avoids re-execution of DDT instructions
+ early branch resolution (at decode stage)

DDTs must be a subset of the original program

SLIDE 12

Two Observations

benefit comes from prefetches and predictions
strict program subsets not most efficient slices

Our approach: generate predictions/prefetches in as efficient a manner as possible.

OPTIMIZE SLICES:

+ reduce fetch/execution overhead
+ reduce critical path to making prediction

need a new mechanism to correlate predictions

SLIDE 13

Speculative Slices

DON’T ALLOW SLICES TO AFFECT ARCHITECTED STATE

only generate pre-fetches and predictions
need not be 100% accurate

3 CLASSES OF TRANSFORMATIONS: (NOT ORIGINALLY APPLIED BY COMPILER)

Imprecise

O static branch assertion (remove branches/cold code)

Not provably safe

O register allocation in the presence of aliases

Previously unprofitable

O if-conversion (of a subset of a block)
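As a rough illustration of the first class, static branch assertion, the sketch below contrasts a main-thread search loop with a slice that assumes a rarely taken path never executes. The `struct node` layout, the `deleted` flag, and both function names are hypothetical and are not taken from the paper. Because the slice only produces hints, it may disagree with the program when the assumption fails without affecting correctness.

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *next; int key; int deleted; };

/* Main-thread code: must skip nodes marked deleted (assumed rare). */
const struct node *find(const struct node *p, int key)
{
    while (p != NULL) {
        if (!p->deleted && p->key == key)
            return p;
        p = p->next;
    }
    return NULL;
}

/* Slice: statically asserts that `deleted` is never set and removes
 * that check. The result is usable only as a prediction or prefetch
 * hint, since it is wrong whenever the asserted condition fails. */
const struct node *find_slice(const struct node *p, int key)
{
    while (p != NULL && p->key != key)
        p = p->next;
    assert(p == NULL || p->key == key);
    return p;
}
```

When no node is marked deleted the two functions return the same node; when one is, the slice may return a different answer, which in the slides' scheme would show up as an occasional misprediction rather than an error.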

SLIDE 14

Slice Structure

problem instructions frequently in loops
encapsulate loop in slice

[Figure: the program forks a slice]

BENEFITS:

lower overhead
earlier predictions
amortize fork overhead
single helper thread

ISSUES & SOLUTIONS: in paper
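A loop-encapsulating load slice could look roughly like the sketch below, which is forked once and then walks a linked list ahead of the main thread. Everything here is an assumption for illustration: the `struct node` layout, the name `load_slice`, and the `max_nodes` bound. `__builtin_prefetch` is a GCC/Clang builtin used as a stand-in for the prefetch a hardware helper thread would issue.

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *next; int payload; };

/* Hypothetical load slice: one fork covers the whole loop, so the
 * fork overhead is amortized over many prefetches. The bound keeps
 * a stale or cyclic pointer from making the slice run forever. */
size_t load_slice(const struct node *ptr, size_t max_nodes)
{
    size_t visited = 0;
    while (ptr != NULL && visited < max_nodes) {
        __builtin_prefetch(ptr->next); /* hint only; never faults */
        ptr = ptr->next;
        visited++;
    }
    return visited; /* number of nodes walked */
}
```

The single live-in value is the starting pointer, in line with the small live-in counts reported on the slice characterization slide.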

SLIDE 15

Slice Characterization

CONSTRUCTED AND OPTIMIZED SLICES BY HAND

encouraging results

STATISTICS:

85% of slices cover multiple static problem instructions
70% of slices contained loops
small static size

O smaller than 4 * # problem instructions covered

prefetch or prediction generated every ~3 dynamic inst’s.
small number of live-in values

O 80% of slices had 2 or fewer

slices can be very small

SLIDE 16

Outline

PROBLEM INSTRUCTIONS
EXECUTION-BASED PREDICTION
PREDICTION CORRELATION

O difficult problem
O valid regions

RESULTS AND ANALYSIS

SLIDE 17

Prediction Correlation

TO BENEFIT FROM A SLICE-GENERATED PREDICTION

must bind it to fetched branch instruction
overrides hardware branch predictor

HOW ARE PREDICTIONS CORRELATED TO DYNAMIC BRANCHES?

[Figure: a forked branch slice produces a prediction for a later dynamic instance of the problem BRANCH]

SLIDE 18

Prediction Correlation

CHALLENGES:

re-ordering predictions produced out-of-order
recovering from misspeculation by main thread
dealing with conditionally-executed problem branches

Tagged prediction queues

[Figure: a forked branch slice fills queues of taken/not-taken (T/F) predictions, each queue tagged with a BRANCH PC]

Related Work: Farcy, et al, Micro ‘98
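A tagged prediction queue can be pictured with the minimal software model below. It is a sketch under assumptions, not the hardware described in the paper: the queue depth `QCAP`, the field names, and the functions `pq_push` and `pq_predict` are all hypothetical. It models only in-order production and the naive dequeue-on-use policy, and ignores out-of-order slices and main-thread misspeculation recovery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QCAP 8 /* assumed queue depth */

/* One queue of slice-generated outcomes, tagged with the PC of the
 * problem branch it serves. */
struct pred_queue {
    uint64_t branch_pc;
    bool     outcome[QCAP];
    unsigned head, count;
};

/* Slice side: append a prediction; returns false if the queue is full. */
bool pq_push(struct pred_queue *q, bool taken)
{
    if (q->count == QCAP)
        return false;
    q->outcome[(q->head + q->count) % QCAP] = taken;
    q->count++;
    return true;
}

/* Fetch side: when the fetched branch's PC matches the tag and a
 * prediction is waiting, consume it and override the hardware
 * predictor; otherwise fall back to the hardware prediction. */
bool pq_predict(struct pred_queue *q, uint64_t fetch_pc, bool hw_pred)
{
    if (fetch_pc != q->branch_pc || q->count == 0)
        return hw_pred;
    bool taken = q->outcome[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return taken;
}
```

Matching on the branch PC is what binds a queued outcome to a fetched branch; a branch with a different PC, or an empty queue, leaves the hardware predictor in charge.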

SLIDE 19

Conditionally-executed problem branches

MINIMIZE OVERHEAD BY BUILDING SIMPLEST SLICE

compute prediction for each iteration

NAIVE IMPLEMENTATION

predictions dequeued when used
mis-alignment occurs on path CF

CONDITIONALLY GENERATE PREDICTIONS?

include “existence slice” in slice
too much overhead

INSIGHT

existence slice encoded in fetch path

[Figure: the program's CFG with blocks A through G, marking the fork point, the problem branch, and the paths on which the problem branch is not executed in all iterations]

SLIDE 20

Valid Regions

DEFINE REGION WHERE PREDICTION IS VALID

using assumptions from building slice

[Figure: a trace through CFG blocks A-G covering a first and a second iteration, showing the regions in which the 1st and 2nd predictions are valid]

SLIDE 21

Valid Regions

DEFINE REGION WHERE PREDICTION IS VALID

using assumptions from building slice
“markers” to indicate region boundary
implementation discussed in paper

DEQUEUE PREDICTION WHEN MARKER ENCOUNTERED

using a prediction doesn’t dequeue it

greater than 99% correlation accuracy
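The marker-based policy can be sketched in software as follows. This is an illustrative model only: the names `region_queue`, `rq_push`, `rq_use`, and `rq_marker` and the depth `QCAP` are assumptions, and the actual implementation is the one discussed in the paper. The point it shows is that a prediction is read without being removed, and is removed only when a marker ends its valid region.

```c
#include <assert.h>
#include <stdbool.h>

#define QCAP 8 /* assumed queue depth */

struct region_queue {
    bool     outcome[QCAP];
    unsigned head, count;
};

/* Slice side: one prediction per loop iteration. */
bool rq_push(struct region_queue *q, bool taken)
{
    if (q->count == QCAP)
        return false;
    q->outcome[(q->head + q->count) % QCAP] = taken;
    q->count++;
    return true;
}

/* Using a prediction only peeks at the head, so an iteration that
 * skips the problem branch cannot misalign the queue. */
bool rq_use(const struct region_queue *q, bool hw_pred)
{
    return q->count ? q->outcome[q->head] : hw_pred;
}

/* Fetching a marker ends the current valid region and retires the
 * prediction that belonged to it. */
void rq_marker(struct region_queue *q)
{
    if (q->count) {
        q->head = (q->head + 1) % QCAP;
        q->count--;
    }
}
```

For example, if the first iteration takes a path that never reaches the problem branch, its prediction is still retired when the marker is fetched, so the second iteration reads its own prediction rather than a stale one.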

[Figure: the same two-iteration trace through CFG blocks A-G, with the valid regions of the 1st and 2nd predictions and X marks along the path]

SLIDE 22

Outline

PROBLEM INSTRUCTIONS
EXECUTION-BASED PREDICTION
PREDICTION CORRELATION
RESULTS AND ANALYSIS

O Methodology
O Results
O Discussion

SLIDE 23

Methodology

Used SPEC2000 integer benchmarks

spectrum of program behaviors

Identified dominant program phase

selected 100M inst. region for simulation

Built slices (by hand) to cover problem instructions

Warmed up simulator for 100M instructions

SLIDE 24

Methodology, cont.

AGGRESSIVE BASELINE:

4-wide superscalar, 128 entry window, 14 cycle mispredict penalty
2 load/store units, 4 fully pipelined integer/floating point units
64Kb YAGS branch, 32Kb cascaded indirect, RAS predictors
Fetches across basic blocks, perfect BTB for direct branches
2-way associative 64KB L1 caches (64B blocks)
4-way associative 2MB unified L2 cache (128B blocks)
64-entry unified pre-fetch/victim buffer with hardware stream pre-fetcher

Deeply-pipelined, 4-wide, out-of-order superscalar with big predictors, associative caches, hardware stride pre-fetcher, and victim buffers.

SLIDE 25

Results

speedups ranging from 1% to 43%

must be regularity in branch/address computation
speedups proportional to memory, branch stall time
low base IPC → lower opportunity cost of slice execution

[Figure: IPC (axis from 0.0 to 4.0) of the base and slice configurations for each benchmark, labeled with the speedup]

benchmark   speedup
bzip2       16%
crafty      3%
eon         7%
gap         11%
gcc         1%
gzip        16%
mcf         43%
parser      1%
perl        7%
twolf       12%
vpr         35%
vortex      1%

SLIDE 26

Related Work

Pre-execution:

Roth and Sohi: HPCA-2001 and TR-2000

SPECULATIVE SLICES:

Zilles and Sohi: ISCA-2000

Limited forms of pre-execution:

Roth, et al: ASPLOS-1998 and ICS-1999
Farcy, et al: Micro-1998

Slipstream processors:

Sundaramoorthy, et al: ASPLOS-2000

Helper threads:

Chappell, et al: ISCA-1999
Song and Dubois: TR-1998
