Speculative Execution



slide-1
SLIDE 1

Speculative Execution

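[Figure: control-flow diagram. Block 1 ends in a branch (?) leading to A or B, followed by Block 2. Labels: "Speculatively Executed Block", "Block Also Speculatively Executed".]

  • Outcome unknown
  • Predict future execution path
  • Begin executing instructions from predicted path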

slide-2
SLIDE 2

Speculative Execution

[Figure: same control-flow diagram as Slide 1, now marked "Branch Resolved".]

  • Rollback execution to decision point if incorrectly predicted

slide-3
SLIDE 3

Speculative Execution

[Figure: same control-flow diagram as Slide 1, annotated with the rollback.]

  • Rollback to decision point
  • Begin executing instructions from correct path
  • Rollback mechanism must undo consequences of the speculative tasks

slide-4
SLIDE 4

Handling Mis-Speculation

  • Method 1: Checkpointing and Rollback
      • Save enough system state at decision point to restart execution from that point
  • Method 2: Prevent speculative instructions from updating system state
      • Writes by speculative instructions are stalled till speculative status is resolved
      • Speculative instructions can still execute and make progress
      • Use renaming mechanisms to transfer information between speculative instructions
          • Rename source registers (a la Tomasulo)
      • On resolution
          • Mis-speculation: Squash the speculative instructions
          • Correct execution: Commit (the writes of) the speculative instructions
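The two methods can be contrasted with a minimal Python sketch. The register file is modelled as a plain dict, and the function names `run_with_checkpoint` and `run_with_buffered_writes` are illustrative assumptions, not something taken from the slides.

```python
import copy

def run_with_checkpoint(regs, speculative_writes, prediction_correct):
    """Method 1: save state at the decision point, roll back on mis-speculation."""
    checkpoint = copy.deepcopy(regs)        # saved at the decision point
    for reg, value in speculative_writes:   # speculative writes update state directly
        regs[reg] = value
    if not prediction_correct:
        regs.clear()
        regs.update(checkpoint)             # rollback to the decision point
    return regs

def run_with_buffered_writes(regs, speculative_writes, prediction_correct):
    """Method 2: hold speculative writes back; commit or squash on resolution."""
    pending = list(speculative_writes)      # writes stalled while speculative
    if prediction_correct:
        for reg, value in pending:          # commit the buffered writes in order
            regs[reg] = value
    # On mis-speculation the pending writes are simply dropped (squashed).
    return regs
```

Both functions leave the register file untouched after a mis-speculation; they differ in whether the work of undoing happens at rollback time (Method 1) or is avoided by never writing early (Method 2).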

slide-5
SLIDE 5

Reorder Buffer for Speculation

  • Reorder Buffer (RoB):
      • Mechanism to support the In-Order Writes of Instructions
      • Buffer the results of completing instructions
      • Update destination registers or memory locations in order
      • Later instructions that complete before earlier ones will wait in the RoB till earlier ones update destinations

[Figure: life cycle of an instruction in the RoB]

  • A (normal instruction): completes execution, remains in RoB till preceding instruction(s) complete their write, then commits (updates destination)
  • B (speculative instruction): squash if misspeculated, commit otherwise

slide-6
SLIDE 6

Extending Tomasulo Pipeline with Reorder Buffer

Pipeline stages: ISSUE, DISPATCH, EX, WRITE, COMMIT

  • Issue Unit adds newly issued instructions to tail of Q
  • Commit Unit removes ready-to-commit instructions from head of Q

RoB:

  • Storage to buffer writes until ready to commit
  • Circular queue written and released in FIFO (instruction) order
  • Each entry allocates space for 1 instruction to store its results + identifying information
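The circular-queue behaviour described above can be sketched as follows. The class name `ReorderBuffer`, its method names, and the dict layout of an entry are assumptions made for illustration; a hardware RoB would of course not be written this way.

```python
class ReorderBuffer:
    """Fixed-size circular queue: entries are allocated at the tail, released at the head."""

    def __init__(self, size):
        self.entries = [None] * size
        self.head = 0    # oldest instruction in flight
        self.tail = 0    # next free slot
        self.count = 0

    def allocate(self, dest):
        """Issue Unit: add an entry at the tail. Returns its index, or None to stall."""
        if self.count == len(self.entries):
            return None                               # RoB full: issue must stall
        index = self.tail
        self.entries[index] = {"dest": dest, "done": False, "value": None}
        self.tail = (self.tail + 1) % len(self.entries)
        self.count += 1
        return index

    def write(self, index, value):
        """Write stage: buffer the result in the entry instead of the register."""
        self.entries[index]["done"] = True
        self.entries[index]["value"] = value

    def commit(self):
        """Commit Unit: release the head entry if its result is ready, else return None."""
        if self.count == 0 or not self.entries[self.head]["done"]:
            return None
        entry = self.entries[self.head]
        self.entries[self.head] = None
        self.head = (self.head + 1) % len(self.entries)
        self.count -= 1
        return entry["dest"], entry["value"]

rob = ReorderBuffer(2)
a = rob.allocate("F4")
b = rob.allocate("F8")
stalled = rob.allocate("F10")   # no free slot
rob.write(b, 2.0)               # B completes before A
early = rob.commit()            # head (A) not done yet, nothing commits
rob.write(a, 1.0)
first = rob.commit()
second = rob.commit()
```

In the usage lines, B finishes first but still retires after A, which is the in-order write property the RoB exists to provide.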

slide-7
SLIDE 7

Tomasulo’s Pipeline with RoB based Commit

[Figure: block diagram of the pipeline. Components: Issue, IR, LSQ, RS, Dispatch, EX units, WB, Common Data Bus (CDB), RoB, REG FILE.]

  • COMMIT accesses RoB and REG FILE

slide-8
SLIDE 8

Extending Tomasulo Pipeline with Reorder Buffer

Pipeline stages: ISSUE, DISPATCH, EX, WRITE, COMMIT

Destination registers need to distinguish between 3 possible states:

  • 1. Available (A): No pending write to register. Register has its final value.
  • 2. In Flight (I): Writer instruction is in flight: the last instruction with that destination register has not yet completed its write.
  • 3. Ready (R): Writer has completed write but not yet committed. The value from the reorder buffer will be written to the register when it commits.

Note: A and I are the same two register states of regular Tomasulo. The state of a register is used by an issuing instruction to find out where to get its source operand.

slide-9
SLIDE 9

Key Features (Tomasulo with RoB)

Issue instruction X (ALU instruction):

  • 1. Allocate RS entry: Let RSX denote the RS index allocated to X
  • 2. Allocate RoB entry: Instruction X will be identified using its Reorder Buffer index RBX
  • 3. Update fields for source operand values in RS
  • 4. Update state of destination register

Stall issue until free Reservation Station and Reorder Buffer slots are available.

  • Reservation Station RSX fields exactly the same as regular Tomasulo
  • RBX made up of the following fields: Tag, Destination, State, Value
      • Tag: The identifier for instruction X (usually implicit: the index of the entry in the RoB)
      • Destination: The destination register of instruction X
      • State: Yes/No, whether the RoB entry holds a valid result (Yes: X has completed write; No: X is in flight)
      • Value: Result of X (broadcast on CDB during write by X)
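As a rough illustration of the RoB entry fields and the stall condition, here is a small Python sketch. `RobEntry` and `try_issue` are hypothetical names, and the RoB is an append-only list purely for brevity, so the tag is just the list position.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RobEntry:
    tag: int                       # RoB index identifying instruction X (RBX)
    destination: str               # destination register of X
    done: bool = False             # State: False = in flight, True = write completed
    value: Optional[float] = None  # result of X, filled in from the CDB

def try_issue(dest, free_rs, rob, rob_size):
    """Return (rs_index, rob_entry), or None if issue must stall."""
    if not free_rs or len(rob) >= rob_size:
        return None                                   # no free RS or RoB slot
    rs_index = free_rs.pop(0)                         # step 1: allocate RS entry
    entry = RobEntry(tag=len(rob), destination=dest)  # step 2: allocate RoB entry
    rob.append(entry)
    return rs_index, entry

rob = []
free_rs = [0]                                    # a single free reservation station
issued = try_issue("F4", free_rs, rob, rob_size=4)
stalled = try_issue("F8", free_rs, rob, rob_size=4)
```

Steps 3 and 4 of issue (source operands and destination state) are left out of this sketch on purpose.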

slide-10
SLIDE 10

Example

A: MUL F4, F0, F2

Event: Issue A

Tomasulo’s without RoB:

  • Reservation Station RSA: MUL, v0, v2
  • F4: writer RSA, STATE: I

Tomasulo’s with RoB:

  • Reservation Station RSA: MUL, v0, v2, tag RBA
  • RoB Entry RBA: destination F4, state No, value -
  • F4: writer RBA, STATE: I

slide-11
SLIDE 11

Example

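A: MUL F4, F0, F2
B: ADD F8, F4, F6

Event: Issue B

Tomasulo’s without RoB:

  • Reservation Station RSB: ADD, F4 tagged with RSA, v6
  • F8: writer RSB, STATE: I

Tomasulo’s with RoB:

  • Reservation Station RSB: ADD, F4 tagged with RBA, v6, tag RBB
  • RoB Entry RBA: destination F4, state No, value -
  • RoB Entry RBB: destination F8, state No, value -
  • F8: writer RBB, STATE: I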

slide-12
SLIDE 12

Key Features: Instruction Issue (contd …. )

For each source operand register S:

  • Action depends on state of source register (A, I, R)
  • A: copy value from S immediately to RSX
  • I (pending write by instruction J): tag the source field of RSX with RBJ
  • R (pending update from RBJ): read value from RBJ and copy to RSX
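The three cases above can be sketched as a single lookup. The function name `read_source` and the encoding of register status as a `(state, writer)` tuple are assumptions made for this example.

```python
def read_source(reg, reg_file, reg_status, rob):
    """Return ("value", v) if the operand is available now,
    or ("tag", rob_index) if the reservation station must wait on the CDB."""
    state, writer = reg_status[reg]   # state is "A", "I" or "R"; writer is a RoB index
    if state == "A":
        return ("value", reg_file[reg])          # register holds its final value
    if state == "I":
        return ("tag", writer)                   # wait for RBJ to broadcast
    if state == "R":
        return ("value", rob[writer]["value"])   # written but not yet committed
    raise ValueError(f"unknown register state: {state}")

reg_file = {"F0": 1.5}
rob = {"RBJ": {"value": 7.0}}
from_register = read_source("F0", reg_file, {"F0": ("A", None)}, rob)
from_cdb_later = read_source("F0", reg_file, {"F0": ("I", "RBJ")}, rob)
from_rob = read_source("F0", reg_file, {"F0": ("R", "RBJ")}, rob)
```

Only the I case leaves a tag in the reservation station; in the A and R cases the operand value is copied at issue time.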

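[Figure: source operand F0 of X: ADDD F2, F0, F4, shown for each register state. STATE A: value copied from F0 into RSX. STATE I: source field of RSX tagged with RBJ. STATE R: value copied from RoB entry RoB[RBJ] into RSX.]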

slide-13
SLIDE 13

Key Features: Instruction Issue (contd …. )

For destination register D

  • Make X the writer of D
  • Set the state of D to I (Implicitly cancels the previous write if any).
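A short sketch of the destination update, using the same hypothetical `(state, writer)` encoding for register status; `set_destination` is an illustrative name. The second call shows how a later writer of the same register implicitly replaces the earlier one.

```python
def set_destination(dest, rob_index, reg_status):
    """Make the instruction in RoB slot rob_index the writer of dest.
    Returns the previous status so the overwrite is visible to the caller."""
    previous = reg_status.get(dest, ("A", None))
    reg_status[dest] = ("I", rob_index)   # any earlier pending writer is superseded
    return previous

reg_status = {"F0": ("A", None)}
set_destination("F0", "RBA", reg_status)             # first writer of F0
replaced = set_destination("F0", "RBD", reg_status)  # later instruction also writes F0
```

The earlier instruction still executes and still has its own RoB entry; it simply stops being the instruction that later readers of the register wait on.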


slide-14
SLIDE 14

Instruction Issue Example (contd …)

A: MUL F4, F0, F2
B: ADD F8, F4, F6

After Issue of A and B

Reorder Buffer (with head and tail pointers):

RB index   DEST REG   STATUS   VALUE
RBA        F4         NO       -
RBB        F8         NO       -

Reservation Stations:

  • RSA: MULD, v0, v2, tag RBA
  • RSB: ADDD, F4 tagged with RBA, v6, tag RBB

Registers:

  • F4: writer RBA, STATE: I
  • F8: writer RBB, STATE: I

slide-15
SLIDE 15

Dispatch and Write Units

  • Execute unit executes instructions from RS whose operands are available
  • When execution is complete the Write Unit is notified
  • Write Unit broadcasts the result of completing instruction X on CDB
  • All units that are waiting on the result of X monitor the broadcast value
  • Reservation Stations copy value into source field of RS.
  • Reorder Buffer entry for X copies value into appropriate field of RBX (*)
  • Destination register D changes state from I to R if X is the current writer; it however does not update D with the broadcast value (*)
  • To identify the broadcast destination
      • Write Unit broadcasts the tag (RBX) identifying the completing instruction.
      • Units whose tag matches the broadcast tag are updated as above

(*) The major change from regular Tomasulo. In regular Tomasulo the value is copied into the destination register and the state of the register changes from I to A.
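The broadcast step can be sketched like this. The function name `broadcast` and the dict-based layouts for reservation stations, RoB entries and register status are assumptions for illustration only.

```python
def broadcast(tag, value, reservation_stations, rob, reg_status):
    """Write Unit puts (tag, value) on the CDB; every unit holding that tag updates."""
    # Reservation stations waiting on this tag capture the value.
    for rs in reservation_stations:
        rs["sources"] = [("value", value) if src == ("tag", tag) else src
                         for src in rs["sources"]]
    # The RoB entry of the completing instruction records the result.
    rob[tag]["done"] = True
    rob[tag]["value"] = value
    # A register whose current writer is this tag goes from I to R.
    # The register value itself is NOT updated here; that happens at commit.
    for reg, (state, writer) in list(reg_status.items()):
        if state == "I" and writer == tag:
            reg_status[reg] = ("R", tag)

rob = {"RBA": {"dest": "F4", "done": False, "value": None},
       "RBB": {"dest": "F8", "done": False, "value": None}}
stations = [{"op": "ADD", "tag": "RBB", "sources": [("tag", "RBA"), ("value", 3.0)]}]
reg_file = {"F4": 0.0, "F8": 0.0}
reg_status = {"F4": ("I", "RBA"), "F8": ("I", "RBB")}
broadcast("RBA", 6.0, stations, rob, reg_status)
```

Note that `reg_file` is not even passed to `broadcast`, which mirrors the point marked (*): the architectural register is left alone until commit.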

slide-16
SLIDE 16

Example

A: MUL F4, F0, F2
B: ADD F8, F4, F6

Event: A writes RESULT using CDB

Tomasulo’s without Reorder Buffer: RESULT is copied into F4

  • RSA: MUL, v0, v2
  • F4 = RESULT
  • RSB: ADD, RESULT, v6

Tomasulo’s with Reorder Buffer: RESULT is copied into Reorder Buffer entry RBA; F4 changes state from I to R

  • RSA: MUL, v0, v2, tag RBA
  • RoB entry RBA: destination F4, state Yes, value RESULT
  • F4: writer RBA, STATE: R
  • RSB: ADD, RESULT, v6, tag RBB

slide-17
SLIDE 17

Example

A: MUL F4, F0, F2
B: ADD F8, F4, F6
C: ADD F10, F4, F6 /* Must get value produced by A */

Event: A writes RESULT using CDB

Tomasulo’s without Reorder Buffer: RESULT is copied into F4

  • RSA: MUL, v0, v2
  • F4 = RESULT
  • RSB: ADD, RESULT, v6

Tomasulo’s with Reorder Buffer: RESULT is copied into Reorder Buffer entry RBA; F4 changes state from I to R

  • RSA: MUL, v0, v2, tag RBA
  • RoB entry RBA: destination F4, state Yes, value RESULT
  • F4: writer RBA, STATE: R
  • RSB: ADD, RESULT, v6, tag RBB

slide-18
SLIDE 18

Instruction Issue Example (contd …)

A: MUL F4, F0, F2
B: ADD F8, F4, F6

After Write of A

Reorder Buffer (with head and tail pointers):

RB index   DEST REG   STATUS   VALUE
RBA        F4         YES      RESULTA
RBB        F8         NO       -

Reservation Stations:

  • RSB: ADDD, RESULTA, v6, tag RBB

Registers:

  • F4: writer RBA, STATE: R
  • F8: writer RBB, STATE: I

slide-19
SLIDE 19

Commit Unit

  • Wait till instruction at head of Reorder Buffer completes Write stage (state = YES)
  • Let the instruction at head of the Reorder Buffer be X (assume not speculative), the destination register be D and the value field be v:
      • Update D with value v
      • If the current writer of register D is instruction X, set state of D to A
      • Else leave state of D unchanged
  • If X is misspeculated: Flush all speculated instructions

Event: A commits

TAG   DEST REG   STATUS   VALUE
RBA   F4         YES      RESULTA
RBB   F8         NO       -

F4: RESULTA, STATE: A
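The commit rules can be sketched as a single function. The name `commit_head`, the `misspeculated` flag on an entry, and the use of a Python list with the head at index 0 are all assumptions for this example. Resetting every register to state A on a flush follows the later worked example in this deck.

```python
def commit_head(rob, reg_file, reg_status):
    """Try to commit the instruction at the head of the RoB (index 0).
    Returns True if the head retired, False if it has not completed Write."""
    if not rob or not rob[0]["done"]:
        return False
    entry = rob.pop(0)
    if entry.get("misspeculated", False):
        rob.clear()                              # flush all speculated instructions
        for reg in list(reg_status):
            reg_status[reg] = ("A", None)
        return True
    dest = entry["dest"]
    reg_file[dest] = entry["value"]              # update D with value v
    if reg_status[dest][1] == entry["tag"]:      # X is still the current writer of D
        reg_status[dest] = ("A", None)
    return True                                  # otherwise state of D is unchanged

rob = [{"tag": "RBA", "dest": "F4", "done": True, "value": 6.0},
       {"tag": "RBB", "dest": "F8", "done": False, "value": None}]
reg_file = {"F4": 0.0, "F8": 0.0}
reg_status = {"F4": ("R", "RBA"), "F8": ("I", "RBB")}
a_committed = commit_head(rob, reg_file, reg_status)
b_committed = commit_head(rob, reg_file, reg_status)   # B has not written yet
```

After the first call F4 holds the result and is back in state A; the second call does nothing because B is still in flight.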

slide-20
SLIDE 20

Load/Store Instructions

Load and Store Buffers: Reservation Stations for Load and Store instructions

C: LD F2, 0(R2)

Load buffer entry RSC:

ID    ADDR   OP
RBC   ea     LD

  • The memory address of the Load/Store (ea) is calculated during Issue and stored in the RS.
  • The value read from memory is broadcast and stored in the Reorder Buffer till it is committed, like other ALU instructions.

TAG   DEST REG   STATUS   VALUE
RBC   F2         NO       -

F2: writer RBC, STATE: I

slide-21
SLIDE 21

Load/Store Instructions

D: SD 0(R2), F4

Store buffer entry RSD:

ID    ADDR   DATA   OP
RBD   ea     v4     SD

TAG   DEST REG   STATUS   VALUE
RBD   [SD]*      YES      v4

  • * Destination in this case is a memory address
  • The difference between regular Tomasulo and here is that the store should not begin writing to memory till it is ready to commit.
  • Signaled by the commit unit when SD becomes the head instruction.

Note: Later LOADS to the same address would have to wait till the SD commits and writes to memory. To avoid these stalls, LOAD FORWARDING can be used to bypass memory and pass the result directly from the waiting SD to the later LD.
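Load forwarding can be sketched as a search of the not-yet-committed stores before going to memory. The names `load` and `commit_store`, and the representation of waiting stores as a program-ordered list of `(address, value)` pairs, are assumptions; the sketch also assumes every waiting store already has its data.

```python
def load(address, memory, pending_stores):
    """Return the value a load should see.
    pending_stores: program-ordered (address, value) pairs for stores not yet committed."""
    for st_address, st_value in reversed(pending_stores):  # youngest matching store wins
        if st_address == address:
            return st_value            # forwarded from the waiting SD, bypassing memory
    return memory[address]

def commit_store(memory, pending_stores):
    """Called when the oldest store reaches the head of the RoB: write memory now."""
    address, value = pending_stores.pop(0)
    memory[address] = value

memory = {0x100: 1.0}
pending = [(0x100, 4.0)]                           # SD issued but not yet committed
forwarded = load(0x100, memory, pending)           # gets the store data early
memory_before_commit = memory[0x100]               # memory itself is still unchanged
commit_store(memory, pending)
after_commit = load(0x100, memory, pending)
```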

slide-22
SLIDE 22

Example

A: DIVD F0, F2, F4
B: BEQZ R1, L1
C: ADDD F6, F8, F10
D: MULTD F0, F6, F12

  • Assume Branch is held up waiting for value of R1
  • We predict the branch as not taken.
  • The assumed sequence of events is:
      • Issue A, Issue B, Issue C, Issue D
      • C completes write, B completes write, A completes write
      • A commits, B commits

slide-23
SLIDE 23

Snapshot after A, B, C and D are issued

Initial State of both F0 and F6 is A

Snapshot of ROB

TAG   DEST REG      DONE?   VALUE
RBA   F0            NO      -
RBB   Predict: NT   NO      -
RBC   F6            NO      -
RBD   F0            NO      -

Registers:

  • F6: writer RBC, STATE: I
  • F0: writer RBD, STATE: I

  • A, B and C are executing
  • D is waiting on C due to RAW dependency

slide-24
SLIDE 24

Snapshot after C completes write

C completes Write: F6 changes status to R to indicate that the result of C is now available in the ROB. The F6 value is not updated with the result at this time (this will be done when C commits).

ROB Snapshot

TAG   DEST REG      DONE?   VALUE
RBA   F0            NO      -
RBB   Predict: NT   NO      -
RBC   F6            YES     vC
RBD   F0            NO      -

Registers:

  • F6: writer RBC, STATE: R
  • F0: writer RBD, STATE: I

  • A, B still in execution
  • D ready for execution with value received from C
  • BEQZ is assumed to provide the target address and the true outcome of the branch as its result.

slide-25
SLIDE 25

Snapshot after C, B and A complete writes

TAG   DESTINATION   WRITE DONE   VALUE
RBA   F0            YES          vA
RBB   Predict: NT   YES          target/TAKEN
RBC   F6            YES          vC
RBD   F0            NO           -

Registers:

  • F6: writer RBC, STATE: R
  • F0: writer RBD, STATE: I

  • D in execution.
  • F0 is still in state I
  • BEQZ is assumed to provide the target address and the true outcome of the branch
  • Since the writer for F0 is D, when A completes execution it does not change either the value or state of F0. It merely records the result in the ROB.
  • The value is saved in ROB in order to update F0 when A commits. The update of F0 is needed (even though A is not the writer) since D is speculative, and may need to be flushed.

slide-26
SLIDE 26

Snapshot after A commits.

TAG   DESTINATION   WRITE DONE   VALUE
RBB   Predict: NT   YES          target/TAKEN
RBC   F6            YES          vC
RBD   F0            NO           -

Registers:

  • F6: writer RBC, STATE: R
  • F0: value vA, writer RBD, STATE: I

Value vA copied from ROB entry of A to its destination register F0

slide-27
SLIDE 27

Snapshot after B commits

TAG   DESTINATION   WRITE DONE   VALUE
(ROB is empty)

Registers:

  • F6: STATE: A (value unchanged)
  • F0: vA, STATE: A

Misprediction, so all instructions/status following B are flushed. All registers are set to A state.

Notice that F6 is not updated with the result of C.
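The whole example can be replayed with a very small model. This is only a sketch: the helper names (`issue`, `write`, `commit`), the `(state, writer)` status tuples, the string placeholders "vA" and "vC", and the initial register values are assumptions chosen for illustration. Execution timing is ignored; only the order of events given on the example slide is reproduced.

```python
reg_file = {"F0": 0.0, "F6": 6.0}                      # arbitrary initial values
reg_status = {"F0": ("A", None), "F6": ("A", None)}
rob = []                                               # program order, head at index 0

def issue(tag, dest):
    rob.append({"tag": tag, "dest": dest, "done": False,
                "value": None, "mispredicted": False})
    if dest is not None:
        reg_status[dest] = ("I", tag)                  # X becomes the writer of D

def write(tag, value=None, mispredicted=False):
    entry = next(e for e in rob if e["tag"] == tag)
    entry.update(done=True, value=value, mispredicted=mispredicted)
    dest = entry["dest"]
    if dest is not None and reg_status[dest] == ("I", tag):
        reg_status[dest] = ("R", tag)                  # only if X is the current writer

def commit():
    entry = rob.pop(0)
    assert entry["done"], "head has not completed Write"
    if entry["mispredicted"]:
        rob.clear()                                    # flush everything after the branch
        for reg in list(reg_status):
            reg_status[reg] = ("A", None)
        return
    dest = entry["dest"]
    if dest is not None:
        reg_file[dest] = entry["value"]
        if reg_status[dest][1] == entry["tag"]:
            reg_status[dest] = ("A", None)

# Issue A, B, C, D (B is the branch and has no destination register).
for tag, dest in [("RBA", "F0"), ("RBB", None), ("RBC", "F6"), ("RBD", "F0")]:
    issue(tag, dest)
write("RBC", value="vC")             # C completes write: F6 goes to R
write("RBB", mispredicted=True)      # branch resolves as taken, prediction was NT
write("RBA", value="vA")             # A is not the writer of F0, so F0 stays in I
commit()                             # A commits: F0 = vA, state unchanged
status_after_a = dict(reg_status)
commit()                             # B commits: misprediction, flush C and D
```

At the end F0 holds vA, F6 still holds its initial value, both registers are in state A and the RoB is empty, matching the final snapshot.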

slide-28
SLIDE 28

Schedule

[Figure: pipeline schedule over cycles 1 to 13, one row per instruction]

A: I D / / / / / / / W C
B: I D D D D E W C C
C: I D + + W (flushed)
D: I D D D * * * W (flushed)

slide-29
SLIDE 29

Notes

  • When ROB is flushed some of the incorrectly speculated instructions could still be in flight
      • Either flush them while in-flight or lazily when they arrive at the ROB
  • For performance could (should) flush ROB and fetch from correct target as soon as Branch completes (cycle 8 in example), i.e. why wait for commit of the branch instruction?
      • In this case not all in-flight instructions are speculative (instructions before the branch that are still in-flight) and need to be distinguished
  • What if there are several simultaneously speculative branches? Gets complicated!
  • Speculation is not free.
      • Complexity of circuits
      • Performance:
          • Correct Speculation helps
          • Incorrect Speculation: May not hurt execution time since it replaces stall cycles
      • Power/Energy: Some overhead for speculating even correctly
          • Mis-speculation: Energy is entirely wasted