1 EE 457 Unit 9b In-Order Completion Speculation

2 Credits
• Some of the material in this presentation is taken from:
  – Computer Architecture: A Quantitative Approach
    • John Hennessy & David Patterson
• Some of the material in this presentation is derived from course notes and slides from:
  – Prof. Michel Dubois (USC)
  – Prof. Murali Annavaram (USC)
  – Prof. David Patterson (UC Berkeley)

3 Tomasulo w/ Speculative Execution
• In-order Issue
• Out-of-Order Execution
• In-order Completion
  – Completion = Commit = Graduation

4 OoO Execution w/ ROB
• The ROB allows out-of-order execution but in-order completion
• Consider this sequence (assume mult takes several cycles):
  – mult $5,$6,$7
  – add $2,$3,$4
  – lw $8,0($5)
  – sub $9,$0,$2
• Simplification for EE457: a cache miss can occur for LW only; SW always hits in the cache (without this simplification we would need to cover store buffer design and related issues)
[Figure: I-Cache, Instruction Queue, Branch Prediction Buffer, Dispatch, Register File and ROB (Reorder Buffer, holding 1 mult, 2 add, 3 lw, 4 sub); Int, Mult, Div and L/S issue queues feed the Issue Unit and the Integer/Branch, Mul, Div and address units, with the Address Buffer, L/S Buffer and D-Cache, all connected by the CDB]

5 OoO Execution w/ ROB
• The ROB allows out-of-order execution but in-order completion
• A ROB entry is allocated on dispatch
• When an instruction executes, its result is stored in the ROB, then committed to the register file when the instruction reaches the head of the ROB (in-order completion)
[Figure: same datapath; ROB entries 1 mult (current head), 2 add (completed), 3 lw, 4 sub (completed, current tail); mult and lw are still executing]
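A minimal Python sketch of the behavior on slides 4 and 5, modeling only the ROB: entries are allocated in program order, each instruction is marked completed whenever its result arrives (out of order), and commit retires entries from the head only while the head has completed. The finish cycles are hypothetical (mult is simply given a long latency, and lw and sub finish after their producers), and the sketch assumes commit can retire any number of entries per cycle; a real commit unit has a fixed width.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    finish_cycle: int       # hypothetical cycle when the result is written into the ROB
    completed: bool = False


def simulate_commit(program, max_cycles=20):
    """Return (cycle, name) commit events for a program listed in dispatch order."""
    rob = [Instr(name, fin) for name, fin in program]  # one ROB entry per instruction, in order
    head = 0
    commits = []
    for cycle in range(max_cycles):
        # Out-of-order completion: mark every instruction whose result arrives this cycle.
        for entry in rob:
            if entry.finish_cycle == cycle:
                entry.completed = True
        # In-order commit: retire from the head only while the head has completed.
        while head < len(rob) and rob[head].completed:
            commits.append((cycle, rob[head].name))
            head += 1
    return commits


print(simulate_commit([("mult", 6), ("add", 2), ("lw", 8), ("sub", 3)]))
# [(6, 'mult'), (6, 'add'), (8, 'lw'), (8, 'sub')]
```

Even though add and sub finish at cycles 2 and 3, neither is committed before mult, which is the in-order completion the ROB provides.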

6 Re-Order Buffer (ROB)
• The ROB is a FIFO (let's say 32 locations)
  – WP = Write pointer = used by the Dispatch Unit
    • Each instruction issues in order and "takes a number"
  – RP = Read pointer = used for committing the most senior (oldest) instruction once it has completed without generating an exception
1. WP - RP = number of items in the FIFO (depth); in the table below, 10 - 3 = 7 valid entries
2. It is a circular FIFO/buffer, so the subtraction is taken mod 32

Entry  Valid  Rd    RegWrite
0      0      0     1
1      0      $2    1
2      0      0     0
3      1      $1    1      <- Top (rp)
4      1      $2    1
5      1      $15   1
6      1      $2    1
7      1      $12   1
8      1      $2    0
9      1      $7    0
10     0      $13   1      <- Bottom (wp)
11     0      0     1
12     0      $4    0
13     0      $2    1
14     0      0     1
(Each entry also has a Result field; entries 15 to 31 are not shown.)
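The circular FIFO on slide 6 can be sketched in Python as follows. The class and method names are hypothetical. One assumption to flag: with plain pointers, (WP - RP) mod size is 0 both when the ROB is empty and when it is completely full, so this sketch stalls dispatch while one slot is still free; a hardware design may instead carry an extra wrap bit in each pointer.

```python
class CircularRob:
    """Hypothetical ROB model: a circular FIFO with write pointer wp and read pointer rp."""

    def __init__(self, size=32):
        self.size = size
        self.slots = [None] * size
        self.wp = 0  # dispatch side: each instruction "takes a number"
        self.rp = 0  # commit side: the most senior (oldest) instruction

    def depth(self):
        # Number of in-flight instructions: (WP - RP) mod size.
        return (self.wp - self.rp) % self.size

    def dispatch(self, instr):
        """Allocate the next slot in program order and return its ROB tag."""
        if self.depth() == self.size - 1:  # one slot kept free (see lead-in)
            raise RuntimeError("ROB full: dispatch must stall")
        tag = self.wp
        self.slots[tag] = instr
        self.wp = (self.wp + 1) % self.size
        return tag

    def commit(self):
        """Retire the oldest instruction, i.e. the one at rp."""
        if self.depth() == 0:
            raise RuntimeError("ROB empty: nothing to commit")
        instr = self.slots[self.rp]
        self.slots[self.rp] = None
        self.rp = (self.rp + 1) % self.size
        return instr


rob = CircularRob(size=8)
print([rob.dispatch(i) for i in ["mult", "add", "lw"]], rob.commit(), rob.depth())
# [0, 1, 2] mult 2
```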

7 Dispatch and the ROB
• No more token FIFO (for tagging instructions) as in OoO execution and completion
  – A ROB entry is allocated for an instruction on issue/dispatch
  – When the instruction finishes executing, its result is buffered in the ROB entry until it can be committed safely
• Dispatch does not (and cannot) use the RST (Register Status Table) as before
  – When an instruction is dispatched, the ROB is searched for the producers of its source registers (Rs and/or Rt)
    • If a ROB entry is producing Rs/Rt but has not yet executed, the ROB tag (slot number) of the producer is taken with the dependent instruction
    • If a ROB entry is producing Rs/Rt and its result is there waiting to be committed, that value is taken with the dependent instruction
    • If no ROB entry is producing Rs/Rt, the value in the register file is taken with the dependent instruction
  – Since multiple ROB entries may match Rs/Rt, a priority resolver is necessary (the youngest matching producer must win)
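The three cases above can be sketched as a Python function that decides what a dispatching instruction carries for one source register. The field names (`valid`, `rd`, `reg_write`, `completed`, `result`) are assumptions modeled on the ROB columns in these slides. The hardware compares all entries in parallel and uses a priority resolver; the loop below is only a sequential stand-in that scans from the youngest entry (wp - 1) back to the oldest (rp), so the first match is the most recent producer. An entry whose Rd matches but whose RegWrite is 0 is treated as not producing the register.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    valid: bool
    rd: int                  # destination register number
    reg_write: bool          # does this instruction actually write rd?
    completed: bool = False
    result: Optional[int] = None


def lookup_source(rob, rp, wp, src_reg, reg_file):
    """Return ("tag", slot), ("rob", value) or ("regfile", value) for source src_reg."""
    if src_reg == 0:                      # MIPS $0 is hardwired to zero
        return ("regfile", 0)
    size = len(rob)
    depth = (wp - rp) % size
    # Sequential stand-in for the priority resolver: youngest entry first.
    for k in range(depth):
        slot = (wp - 1 - k) % size
        e = rob[slot]
        if e.valid and e.reg_write and e.rd == src_reg:
            # Case 1: producer not yet executed -> carry its tag.
            # Case 2: result waiting in the ROB -> carry the value.
            return ("rob", e.result) if e.completed else ("tag", slot)
    # Case 3: no in-flight producer -> read the register file.
    return ("regfile", reg_file[src_reg])
```

For example, with producers of $2 in slots 1 (completed) and 3 (still executing), a dispatching instruction that reads $2 takes tag 3, because slot 3 is the younger producer.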

8 Take a Number vs. Take a Token
• The ROB forms a virtual queue!
• ROB tag = the paper token taken by the customer ("take a number")
  – Helps to create a virtual queue
  – Recall that we wrap back to 0 after the maximum tag number
• Contrast: at the State Bank of India, the cashier issues a brass token to each customer trying to draw money, purely as an ID (not at all to put them in any virtual queue/ordering). Token numbers are in random order. The cashier verifies the signature in the record room, returns with the money, calls the token number and issues the money. Tokens are reclaimed and reused.

9 Example 1 Solutions
• Assume "now serving" customer 52
• Case 1
  – Your number is 55 and mine is 65
  – I am 10 numbers after you (65 - 55 = 10)
• Case 2
  – Your number is 55 and mine is 45
  – I am 90 numbers after you: 45 is below "now serving" (52), so the numbers must have wrapped past 99, and (45 - 55) mod 100 = 90

10 Computing Distance
• Assume "now serving" customer 52
• To find how many people are waiting, subtract the "Now Serving" number from the last number pulled
• Example
  – Last number pulled = 92
  – "Now Serving" = 52
  – # Waiting = 92 - 52 = 40
• But suppose the last number pulled is 32
  – Last number pulled = 32
  – "Now Serving" = 52
  – # Waiting = (32 - 52) mod 100 = (-20) mod 100 = 80

11 Computing Distance
• Depth = (WP - RP) mod 8 (for an 8-entry circular FIFO)
  – FIFO initially empty: D = WP - RP = 0 - 0 = 0
  – FIFO depth = 4: D = WP - RP = 4 - 0 = 4
  – FIFO depth = 1: D = WP - RP = 4 - 3 = 1
  – FIFO depth = 7: D = WP - RP = (2 - 3) mod 8 = 7
[Figure: four 8-entry circular FIFOs (entries 0 to 7) showing the RP and WP positions for each case]
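The wrap-around arithmetic on slides 9 to 11 fits in a few lines of Python. `distance` applies the modulo directly (Python's % returns a non-negative result for a positive modulus); `depth_nbit` shows the equivalent for a FIFO with 2^n entries, where keeping only the low n bits of WP - RP is the same as taking it mod 2^n. Both function names are hypothetical.

```python
def distance(later, earlier, modulus):
    """Forward distance from `earlier` to `later` on a counter that wraps at `modulus`."""
    # Python's % always returns a value in [0, modulus) for a positive modulus.
    return (later - earlier) % modulus


def depth_nbit(wp, rp, n_bits):
    """Same result for a 2**n_bits-entry FIFO: subtract, then keep only the low n bits."""
    return (wp - rp) & ((1 << n_bits) - 1)


# Ticket examples (numbers wrap at 100)
print(distance(92, 52, 100), distance(32, 52, 100))   # 40 80
print(distance(65, 55, 100), distance(45, 55, 100))   # 10 90
# 8-entry FIFO examples: Depth = (WP - RP) mod 8
print([depth_nbit(wp, rp, 3) for wp, rp in [(0, 0), (4, 0), (4, 3), (2, 3)]])  # [0, 4, 1, 7]
```

The masked form mirrors what an n-bit subtractor does when its borrow-out is ignored, which is why power-of-two FIFO sizes make the depth computation cheap.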

12 ROB Dispatch for Rs
• $2 is needed by dispatch
• Which entry should be selected by you (the ROB)?

Scenario 0 (Top (rp) = entry 3, Bottom (wp) = entry 10)
Entry  Valid  Rd    RegWrite
0      0      0     1
1      0      $2    1
2      0      0     0
3      1      $1    1
4      1      $2    1
5      1      $15   1
6      1      $2    1
7      1      $12   1
8      1      $2    0
9      1      $7    0
10     0      $13   1
11     0      0     1
12     0      $4    0
13     0      $2    1
14     0      0     1

Scenario 1 (Top (rp) = entry 9, Bottom (wp) = entry 3)
Entry  Valid  Rd    RegWrite
0      1      0     1
1      1      $2    1
2      1      $10   1
3      0      $1    0
4      0      $21   1
5      0      $12   1
6      0      $2    0
7      0      $15   1
8      0      $22   1
9      1      $7    1
10     1      $13   0
11     1      $2    1
12     1      $1    1
13     1      $2    0
14     1      $3    1

13 Dealing with Wrapping
[Figure: Scenario 0 and Scenario 1, each a 32-entry ROB (entries 0 to 31) divided into Set 1 and Set 0, with the Top Pointer (rp) and Bottom Pointer (wp) at different positions]
• In each scenario, which set should be given higher priority of selection to forward the value of a particular register?

14 ROB Dispatch for Rs (similar logic for Rt)
• Each ROB entry holds Rd, RdTag, Instruction Valid, Instruction Completed and RdData
• One priority resolver (passes the highest-priority active input) resolves the highest-priority match of Rd to Rs among all valid instructions between the Top Pointer and the last ROB entry (i.e. entry 31)
• A second priority resolver (passes the highest-priority active input) resolves the highest-priority match of Rd to Rs among all valid instructions between the top ROB entry (i.e. entry 0) and the Bottom Pointer
• A third priority resolver selects the appropriate entry based on the Top and Bottom Pointer locations and outputs Rs Data, Rs Data Valid, Rs Tag and Rs Tag Valid
[Figure: ROB entries 0 to 31, each compared against Rs, feeding the three priority resolvers]
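A Python sketch of the two-resolver scheme on slide 14, applied to Scenario 0 (rp = 3, wp = 10) and Scenario 1 (rp = 9, wp = 3) of slide 12. Each resolver passes the highest-numbered valid entry whose Rd matches Rs and whose RegWrite is 1; when the valid region wraps, the resolver over entries 0 to wp - 1 wins because those entries were dispatched most recently. Function names are hypothetical, and since the slide shows only entries 0 to 14, entries 15 to 31 are filled with placeholder entries (invalid in Scenario 0, valid non-writers in Scenario 1).

```python
def highest_match(rob, lo, hi, src_reg):
    """One priority resolver: highest-numbered valid producer of src_reg in slots lo..hi-1."""
    for slot in range(hi - 1, lo - 1, -1):
        valid, rd, reg_write = rob[slot]
        if valid and reg_write and rd == src_reg:
            return slot
    return None


def select_producer(rob, rp, wp, src_reg):
    """Combine the two resolvers based on where the Top (rp) and Bottom (wp) pointers are."""
    from_top = highest_match(rob, rp, len(rob), src_reg)  # Top Pointer .. last entry
    from_zero = highest_match(rob, 0, wp, src_reg)        # entry 0 .. Bottom Pointer - 1
    if wp > rp:
        # No wrap: every valid entry lies in rp..wp-1, so both resolvers agree.
        return from_top
    # Wrapped: entries 0..wp-1 were dispatched after entries rp..31, so they are younger.
    return from_zero if from_zero is not None else from_top


# (valid, rd, reg_write) per entry; Rd "0" on the slide is stored as 0.
scenario0 = [
    (0, 0, 1), (0, 2, 1), (0, 0, 0), (1, 1, 1), (1, 2, 1),
    (1, 15, 1), (1, 2, 1), (1, 12, 1), (1, 2, 0), (1, 7, 0),
    (0, 13, 1), (0, 0, 1), (0, 4, 0), (0, 2, 1), (0, 0, 1),
] + [(0, 0, 0)] * 17     # entries 15..31: outside rp..wp-1, so invalid

scenario1 = [
    (1, 0, 1), (1, 2, 1), (1, 10, 1), (0, 1, 0), (0, 21, 1),
    (0, 12, 1), (0, 2, 0), (0, 15, 1), (0, 22, 1), (1, 7, 1),
    (1, 13, 0), (1, 2, 1), (1, 1, 1), (1, 2, 0), (1, 3, 1),
] + [(1, 0, 0)] * 17     # entries 15..31: not shown on the slide; placeholder non-writers

print(select_producer(scenario0, rp=3, wp=10, src_reg=2))  # 6
print(select_producer(scenario1, rp=9, wp=3, src_reg=2))   # 1
```

With these inputs Scenario 0 selects entry 6 (entry 8 also names $2 but has RegWrite = 0), and Scenario 1 selects entry 1. Entry 1 would win regardless of what entries 15 to 31 hold, since it was dispatched after all of them.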

15 Issue Queues
• The Dispatch Unit always places an instruction in the top register
• Instructions move forward if there is room at the bottom
• Any instruction is a candidate for execution provided it is "ready"
• The controller chooses the senior-most ready instruction and sends it to the Issue Unit
[Figure: a column of registers between Dispatch (top) and the Issue Unit (bottom), managed by a controller]
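A small Python sketch of the issue queue on slide 15, assuming a list whose index 0 is the bottom (senior-most) register and whose end is the top register where dispatch writes. The capacity of 4 and all names are hypothetical; `ready` stands for "all source operands are available".

```python
from dataclasses import dataclass


@dataclass
class QueueEntry:
    name: str
    ready: bool        # hypothetical flag: all source operands available


def dispatch(queue, entry, capacity=4):
    """Dispatch writes at the top; index 0 is the bottom (senior-most) slot."""
    if len(queue) >= capacity:
        return False   # queue full: dispatch stalls
    queue.append(entry)
    return True


def select_for_issue(queue):
    """Pick the senior-most ready instruction; younger entries then move forward."""
    for i, entry in enumerate(queue):
        if entry.ready:
            return queue.pop(i)   # list.pop closes the gap, like entries shifting down
    return None
```

For example, with lw (not ready), add (ready) and sub (ready) queued in that order, add issues first: lw is older but not ready, and add is older than sub.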

16 SPECULATIVE EXECUTION

17 Branch Prediction + Speculation
• To keep the backend fed with enough work, we need to predict a branch's outcome and perform "speculative" execution beyond the predicted (unresolved) branch
  – Roll-back mechanism (flush) in case of misprediction
[Figure: starting at the head of the ROB, each conditional branch splits the code into a T-path and an NT-path basic block; following the predicted direction at each branch forms the speculative execution path]

18 Speculation Example
• Predict branches and execute the most likely path
  – Simply flush ROB entries after the mispredicted branch
  – Need good prediction capabilities to make this useful
[Figure: ROB contents over time, with the Commit Unit at the ROB head (assume the head is stalled); red entries = predicted branches, black entry = mispredicted branch. Time 1: the ROB fills along the speculative path. Time 2a: the mispredicted branch is found, and the entries behind it are wrong-path execution. Time 2b: the ROB/pipeline is flushed of the instructions behind the mispredicted branch. Time 3: the pipeline begins to fill with the correct path.]

19 Handling Jumps and Branches
• The IFQ (instruction fetch queue) is flushed every time a jump instruction enters the dispatch unit
• When a branch enters the dispatch unit, branch prediction is performed using the BPB (Branch Prediction Buffer)
  – The last n (e.g. 3) bits of the PC are used by the branch predictor
  – Branches are handled aggressively
    • A branch is resolved as soon as its outcome arrives on the CDB, without waiting for it to become the head of the ROB, so that the processor can determine whether the prediction was correct and take appropriate action
  – A selective flushing mechanism is used to flush instructions in the backend in case of a mispredicted branch
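A sketch of a BPB lookup in Python. The slide only says that the last n (e.g. 3) bits of the PC index the predictor; the 2-bit saturating counter in each entry and its initial state are assumptions, not part of the slides. The sketch also assumes a byte-addressed MIPS PC, whose two low bits are always 0 for aligned instructions, so it drops them before taking n bits; if the design's PC is already a word address, index with it directly.

```python
class BranchPredictionBuffer:
    """Hypothetical BPB: 2**n_bits entries, each a 2-bit saturating counter (assumption)."""

    def __init__(self, n_bits=3):
        self.n_bits = n_bits
        self.counters = [1] * (1 << n_bits)  # assumed start: weakly not-taken

    def index(self, pc):
        # Drop the 2 byte-offset bits (assumed byte-addressed PC), keep the last n bits.
        return (pc >> 2) & ((1 << self.n_bits) - 1)

    def predict(self, pc):
        """True = predict taken (counter value 2 or 3)."""
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        """Train the counter with the actual outcome once the branch resolves on the CDB."""
        i = self.index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)
```

With only 3 index bits, branches whose word addresses differ by a multiple of 8 share one counter (aliasing), which is one reason real predictors use more index bits.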

20 Flushing Mechanism
• To flush instructions in the backend, a 'flush' signal is conveyed to the backend along with:
  – The current Top of the ROB (rp)
  – The depth of the branch instruction (its distance from the Top of the ROB)
• All instructions in the backend (as well as in the ROB) with depth greater than that of the mispredicted branch need to leave (be flushed)
• Examples (32-entry ROB):
  – Top Pointer (rp) = 2, taken branch in entry 4: flush depth = 4 - 2 = 2
  – Top Pointer (rp) = 5, taken branch in entry 2: flush depth = (2 - 5) mod 32 = 29
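The depth comparison can be sketched in Python: every backend instruction carries its ROB tag, so from the broadcast Top of ROB (rp) and branch depth it can decide on its own whether to leave. The function names are hypothetical, and the ROB is assumed to have 32 entries, matching the mod 32 in the examples.

```python
def rob_depth(tag, rp, size=32):
    """Depth of a ROB slot = its distance from the current Top of ROB (rp)."""
    return (tag - rp) % size


def must_flush(tag, rp, branch_depth, size=32):
    """Each backend instruction compares its own depth with the broadcast branch depth."""
    return rob_depth(tag, rp, size) > branch_depth


print(rob_depth(4, rp=2), rob_depth(2, rp=5))   # 2 29
```

In the second example, an instruction in entry 3 has depth (3 - 5) mod 32 = 30 > 29 and is flushed, while one in entry 31 has depth 26 and stays, because it is older than the branch.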
