SLIDE 1

Carnegie Mellon

School of Computer Science


Architecture

Data Speculation

Adam Wierman Daniel Neill

Lipasti and Shen. Exceeding the dataflow limit, 1996. Sodani and Sohi. Understanding the differences between value prediction and instruction reuse, 1998.

SLIDE 2

A Taxonomy of Speculation

Speculative Execution
  – Control Speculation: Branch Direction, Branch Target
  – Data Speculation: Data Location, Data Value

Question: What makes speculation possible? What can we speculate on?

SLIDE 3

Value Locality

Question: Where does value locality occur? That is, how often does the same value result from the same instruction twice in a row?

Single-cycle Arithmetic (e.g. addq $1 $2): Somewhat
Single-cycle Logical (e.g. bis $1 $2): Yes
Multi-cycle Arithmetic (e.g. mulq $1 $2): No
Register Move (e.g. cmov $1 $2): Yes
Integer Load (e.g. ldq $1 8($2)): Yes
Store with base register update: No
FP Load: Yes
FP Multiply: Somewhat
FP Add: Somewhat
FP Move: Yes
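Value locality can be made concrete as the fraction of dynamic instances of a static instruction whose result matches the result of the previous instance at the same PC (history depth 1). A minimal sketch, assuming a hypothetical trace of (PC, result) pairs:

```python
def value_locality(trace):
    """trace: iterable of (pc, result) pairs for dynamic instructions.
    Returns the fraction of instances whose result matches the previous
    result produced at the same PC (history depth 1)."""
    last = {}            # pc -> last result seen at that PC
    hits = total = 0
    for pc, result in trace:
        if pc in last:
            total += 1
            if last[pc] == result:
                hits += 1
        last[pc] = result
    return hits / total if total else 0.0

# A reloaded constant (PC 0x40) shows locality; a loop counter (PC 0x44) does not:
trace = [(0x40, 7), (0x40, 7), (0x40, 7), (0x44, 1), (0x44, 2), (0x44, 3)]
```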

SLIDE 4

Value Locality

Question: Why is speculation useful?

addq $1 $2 $3
addq $3 $1 $4
addq $3 $2 $5

The second and third instructions depend on the first's result in $3, so they would normally serialize. Speculating on $3 lets all three run in parallel on a superscalar machine.

SLIDE 5

Exploiting Value Locality

Value Prediction (VP): "predict the results of instructions based on previously seen results"

Instruction Reuse (IR): "recognize that a computation chain has been previously performed and therefore need not be performed again"

SLIDE 6

Exploiting Value Locality

Value Prediction (VP): Fetch -> Decode -> Issue -> Execute -> Commit, with the value predicted at fetch and verified before commit; if mispredicted, dependent instructions reissue.

Instruction Reuse (IR): Fetch -> Decode -> Issue -> Execute -> Commit, with a check for previous use at fetch and a check that the arguments are the same at decode; if reused, the execute stage is skipped.

SLIDE 7

Value Prediction

(Lipasti & Shen, 1996)

SLIDE 8

Value Prediction

  • Speculative prediction of register values
    – Values predicted during fetch and dispatch, forwarded to dependent instructions.
    – Dependent instructions can be issued and executed immediately.
    – Before committing a dependent instruction, we must verify the predictions. If wrong: dependent instructions must restart with the correct values.

Pipeline diagram: Fetch -> Decode -> Issue -> Execute -> Commit, with Predict Value at fetch and Verify before commit; if mispredicted, reissue.

SLIDE 9

Overview

Diagram: the PC indexes two structures in parallel: the Classification Table (CT), which tracks prediction history and answers "Should I predict?", and the Value Prediction Table (VPT), which tracks value history and supplies the predicted value.

SLIDE 10

How to predict values?

Value Prediction Table (VPT)

  – Cache indexed by instruction address (PC)
  – Each entry maps to one or more 64-bit values
  – Values replaced (LRU) when an instruction is first encountered or when a prediction is incorrect
  – 32 KB cache: 4K 8-byte entries
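The VPT described above can be sketched as a direct-mapped cache. This is a simplification (the slide allows multiple values per entry with LRU replacement, and the index function here is a hypothetical choice), not the paper's exact design:

```python
MASK64 = (1 << 64) - 1  # results are 64-bit values

class VPT:
    """Sketch of a value prediction table: direct-mapped, indexed by PC,
    one 64-bit value per entry (the real table may hold several)."""
    def __init__(self, entries=4096):
        self.entries = entries
        self.table = [None] * entries        # each slot: (pc, value) or None

    def _index(self, pc):
        return (pc >> 2) % self.entries      # word-aligned PCs assumed

    def predict(self, pc):
        slot = self.table[self._index(pc)]
        if slot is not None and slot[0] == pc:
            return slot[1]
        return None                          # no prediction available

    def update(self, pc, value):
        # Install on first encounter or after an incorrect prediction.
        self.table[self._index(pc)] = (pc, value & MASK64)
```

A caller would consult predict() during fetch/dispatch and call update() at writeback.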

SLIDE 11

Estimating prediction accuracy

Classification Table (CT)

  – Cache indexed by instruction address (PC)
  – Each entry maps to a 2-bit saturating counter, incremented when the prediction is correct and decremented when it is wrong:
      0,1 = don't use prediction
      2 = use prediction
      3 = use prediction, and don't replace the VPT value if wrong
  – 1K entries are sufficient
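The 2-bit policy above can be sketched directly (table sizing and indexing are simplified: direct-mapped, no tags):

```python
class CT:
    """Sketch of the classification table: one 2-bit saturating counter
    per entry, trained on prediction outcomes."""
    def __init__(self, entries=1024):
        self.entries = entries
        self.ctr = [0] * entries

    def _index(self, pc):
        return (pc >> 2) % self.entries

    def use_prediction(self, pc):
        return self.ctr[self._index(pc)] >= 2    # states 2 and 3 predict

    def keep_value_on_miss(self, pc):
        return self.ctr[self._index(pc)] == 3    # state 3: keep the VPT value even if wrong

    def train(self, pc, correct):
        # Saturating increment on a correct prediction, decrement on a wrong one.
        i = self._index(pc)
        if correct:
            self.ctr[i] = min(3, self.ctr[i] + 1)
        else:
            self.ctr[i] = max(0, self.ctr[i] - 1)
```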

SLIDE 12

Verifying predictions

  • The predicted instruction executes normally.
  • A dependent instruction cannot commit until the predicted instruction has finished executing.
  • The computed result is compared to the prediction; if they match, dependent instructions can commit.
  • If not, dependent instructions must reissue and execute with the computed value. Miss penalty = 1 cycle later than no prediction.
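The verify step can be sketched as follows. The dependent-tracking and the reissue callback are hypothetical stand-ins for what the issue/commit logic would do in hardware:

```python
def verify(predicted, computed, dependents, reissue):
    """Compare the computed result to the prediction. On a match the
    dependents may commit; on a mismatch each dependent is reissued
    with the correct value (the 1-cycle miss penalty noted above)."""
    if predicted == computed:
        return True                # dependents may commit
    for insn in dependents:
        reissue(insn, computed)    # re-execute with the correct value
    return False
```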

SLIDE 13

Results

  • A realistic configuration on a simulated (current and near-future) PowerPC gave 4.5-6.8% speedups.
    – 3-4x more speedup than devoting the extra space to cache.
  • Speedups vary between benchmarks (grep: 60%).
  • Potential speedups of up to 70% for idealized configurations.
    – Can exceed the dataflow limit (on an idealized machine).

SLIDE 14

Instruction Reuse

(Sodani & Sohi, 1998)

SLIDE 15

Instruction Reuse

  • Obtain the results of instructions from their previous executions.
    – If the previous results are still valid, don't execute the instruction again; just commit the results!
  • Non-speculative, early verification:
    – Previous results are read in parallel with fetch.
    – The reuse test runs in parallel with decode.
    – Execute only if the reuse test fails.

SLIDE 16

How to reuse instructions?

  • Reuse buffer
    – Cache indexed by instruction address (PC)
    – Stores the result of the instruction along with the information needed to establish reusability:
        operand register names
        pointer chain of dependent instructions
    – Assume 4K entries (each entry takes 4x as much space as a VPT entry, so comparable to a 16K-entry VPT)
    – 4-way set-associative
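A sketch of the reuse-buffer lookup, using a simplified value-based reuse test; the paper's scheme tracks operand register names and dependence chains instead, and the associativity above is omitted here:

```python
class ReuseBuffer:
    """Sketch: maps PC -> (operand values at last execution, result).
    try_reuse() is the reuse test; record() fills the buffer after a
    normal execution."""
    def __init__(self):
        self.table = {}

    def try_reuse(self, pc, operands):
        entry = self.table.get(pc)
        if entry is not None and entry[0] == tuple(operands):
            return entry[1]        # reusable: commit this result, skip execute
        return None                # reuse test failed: execute normally

    def record(self, pc, operands, result):
        self.table[pc] = (tuple(operands), result)
```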

SLIDE 17

Reuse Scheme

  • Dependent chain of results (each entry points to the previous instruction in its chain)
    – An entry is reusable only if the entries on which it depends have been reused (can't reuse out of order).
    – Start of chain: reusable if its "valid" bit is set; invalidated when its operand registers are overwritten.
    – Special handling of loads and stores.
  • An instruction will not be reused if:
    – its inputs are not ready for the reuse test (decode stage), or
    – its operand registers differ.

SLIDE 18

Results

  • Attempts to evaluate "realistic" and "comparable" schemes for VP and IR on a simulated MIPS architecture.
    – Are these really realistic? They assume an oracle or a parallel reuse test.
  • Net performance: VP is better on some benchmarks, IR on others; speedups are typically 5-10%.
  • More interesting question: can the two schemes be combined?
  • Claim: 84-97% of redundant instructions are reusable.

SLIDE 19

Comparing VP and IR

Value Prediction (VP): "predict the results of instructions based on previously seen results"

Instruction Reuse (IR): "recognize that a computation chain has been previously performed and therefore need not be performed again"

SLIDE 20

Comparing VP and IR

Which captures more redundancy?

IR can't predict when:
1. Inputs aren't ready
2. The same result follows from different inputs
3. VP makes a lucky guess
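Case 2 can be seen in a toy simulation: an instruction whose result saturates (here min(x, 10)) keeps producing the same value from different inputs, so a last-value predictor hits while a reuse test that requires identical inputs misses. This is illustrative only, and assumes a value-based reuse test:

```python
def simulate(inputs):
    """Count hits for a last-value predictor (VP) vs. a simplified,
    input-matching reuse test (IR) on the instruction r = min(x, 10)."""
    vp_hits = ir_hits = 0
    last_result = None
    last_input = None
    for x in inputs:
        r = min(x, 10)
        if last_result == r:
            vp_hits += 1     # VP: same result as the previous instance
        if last_input == x:
            ir_hits += 1     # IR: same input as the previous instance
        last_result, last_input = r, x
    return vp_hits, ir_hits

# Distinct inputs all >= 10: the result is always 10.
# simulate([11, 12, 13, 14]) -> (3, 0)
```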

SLIDE 21

Comparing VP and IR

Which handles misprediction better?

IR is non-speculative, so it never mispredicts.

SLIDE 22

Comparing VP and IR

Which integrates better with branches?

IR:
1. Mispredicted branches are detected earlier.
2. Instructions from mispredicted branch paths can be reused.

VP:
1. Causes more misprediction.

SLIDE 23

Comparing VP and IR

Which is better under resource contention?

IR might not even need to execute the instruction.

SLIDE 24

Comparing VP and IR

Which is better for execution latency?

VP causes some instructions to be executed twice (when values are mispredicted); IR executes each instruction once or not at all.

SLIDE 25

Possible class project: can we get the best of both techniques?
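One illustrative way to combine the two schemes (a sketch of the idea only, not a proposal from either paper): try the non-speculative reuse test first, and fall back to value prediction when it fails, so reuse supplies guaranteed results and prediction covers the remaining redundancy. The tables here are plain dicts standing in for the hardware structures:

```python
def lookup(pc, operands, reuse_table, last_value):
    """Hypothetical combined front end.
    reuse_table: pc -> (operands at last execution, result)
    last_value:  pc -> last result (a last-value predictor)
    Returns an action tag plus the value to use, if any."""
    entry = reuse_table.get(pc)
    if entry is not None and entry[0] == operands:
        return ("reuse", entry[1])            # non-speculative: commit, skip execute
    if pc in last_value:
        return ("predict", last_value[pc])    # speculative: must verify before commit
    return ("execute", None)                  # no help: execute normally
```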

SLIDE 27

Notes

  • Value prediction can handle these cases, and thus captures more redundancy.
  • But IR has several advantages…
    – Skips the execute phase when reusing an instruction.
    – Early, non-speculative test; never "mispredicts".