
Lecture 8: Modern Dynamic Instruction Scheduling

Tomasulo weakness, data forwarding, reg mapping table, generic superscalar models, examples

2

Tomasulo Performance

Observed at the EX stage, how many cycles does it take to execute this code?

LW R2,45(R3)
ADD R6,R2,R4
SUB R10,R0,R6
ADD R10,R10,R12

Assume a load takes 1 cycle and an ALU operation takes 1 cycle.

[Figure: block diagram with IM, Fetch Unit, Decode, Rename, Regfile, Reorder Buffer, two RS, FU1, FU2, L-buf, S-buf, DM]

3

Tomasulo vs MIPS Pipeline

How many cycles on the 5-stage MIPS pipeline? Why does the simple pipeline run faster?

[Figure: 5-stage pipeline IF, ID, EX, MEM, WB, with stall check and data forwarding]

4

Tomasulo Complexity and Efficiency

Modern processors employ deep pipelines

=> Can the rename stage be finished in one fast cycle?

=> How are register contents stored?

[Figure: block diagram with IM, Fetch Unit, Decode, Rename, Regfile, Reorder Buffer, two RS, FU1, FU2, L-buf, S-buf, DM]

5

Review Tomasulo Inst Scheduling

Both instructions are in RS; no contention on the CDB or FU

ADD R2,R2,45   # R2 => tag p, result = A
SUB R6,R2,R4   # R4 is ready, = B

Cycle 1: ADD starts at FU, producing A
Cycle 2: ADD broadcasts p + A; SUB matches on p and accepts A
Cycle 3: SUB starts execution; FU calculates A-B

A is produced at cycle 1, but consumed at cycle 3. Is this unavoidable?
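The cycle counts above can be reproduced with a small timing model. This is a minimal sketch, not the lecture's own model: the function name ex_cycles and the parameter wakeup_gap (cycles between a producer's EX and its consumer's EX) are assumptions made for illustration. It assumes 1-cycle FUs, no FU or CDB contention, and that every instruction is already waiting in an RS.

```python
def ex_cycles(program, wakeup_gap):
    """Return the EX cycle of each instruction in program order.

    program    : list of (dest_reg, [source_regs])
    wakeup_gap : assumed number of cycles between a producer's EX and the
                 EX of an instruction that consumes its result
                 (2 = broadcast tag + value in the cycle after EX,
                  1 = back-to-back execution)
    """
    last_writer_ex = {}  # register -> EX cycle of its most recent producer
    cycles = []
    for dest, srcs in program:
        start = 1  # earliest possible EX cycle
        for reg in srcs:
            if reg in last_writer_ex:
                start = max(start, last_writer_ex[reg] + wakeup_gap)
        cycles.append(start)
        last_writer_ex[dest] = start
    return cycles


# ADD R2,R2,45 ; SUB R6,R2,R4 (the two-instruction trace)
pair = [("R2", ["R2"]), ("R6", ["R2", "R4"])]

# LW R2,45(R3) ; ADD R6,R2,R4 ; SUB R10,R0,R6 ; ADD R10,R10,R12
chain = [("R2", ["R3"]), ("R6", ["R2", "R4"]),
         ("R10", ["R0", "R6"]), ("R10", ["R10", "R12"])]

print(ex_cycles(pair, wakeup_gap=2))   # [1, 3]
print(ex_cycles(chain, wakeup_gap=2))  # [1, 3, 5, 7]
print(ex_cycles(chain, wakeup_gap=1))  # [1, 2, 3, 4]
```

With wakeup_gap=2 every dependent instruction loses one cycle to the broadcast, which is the delay the trace points out. The model does not cover the 5-stage MIPS pipeline, where load-use stalls would need separate treatment.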

6

Review Data Forwarding

MIPS pipeline data forwarding: FU/MEM => FU
Why not in Tomasulo?

Cycle 2: forward A from FU output to FU input…
But tag broadcasting has one cycle delay!!

When is it known that A will be ready?

Cycle 1: A is to be ready
Cycle 2: A and its tag are broadcast

If the tag is broadcast one cycle earlier …

[Figure: FU with a bypass path from its output back to its input, alongside REG/ROB]


7

Revise Scheduling*

RS1: ADD R6,R2,R4
RS2: SUB R10,R0,R6
RS3: ADD R12,R10,R6

ADD(1) has been ready and selected

1.

  • ADD(1)’s tag is broadcast, and operands are sent to FU;
  • SUB is woken up and selected;

2.

  • SUB’s tag is broadcast, and operands are sent to FU;
  • forwarding logic replaces the 2nd FU operand with the FU output;
  • ADD(2) is woken up, accepts the FU output, and is selected

3.

So on and so forth…

RS can be centralized or distributed

[Figure: RS 1 to RS 5 feeding SELECT and the FU; tag broadcast happens one cycle earlier]

How to address CDB contention?

*Updated
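The wakeup/select steps above can be traced with a short simulation. This is a sketch under stated assumptions, not the slide's exact hardware: the helper names pick and simulate are hypothetical, destination register names stand in for tags, there is a single 1-cycle FU, and select always takes the oldest ready entry.

```python
def pick(waiting, ready):
    """Select the oldest waiting RS entry whose source tags are all ready."""
    for entry in waiting:
        _, _, srcs = entry
        if all(s in ready for s in srcs):
            waiting.remove(entry)
            return entry
    return None


def simulate(entries, initially_ready):
    """Return {RS name: cycle in which its operands are sent to the FU}.

    entries: list of (name, dest_tag, [source_tags]) in program order.
    The tag of a selected instruction is broadcast in the same cycle its
    operands go to the FU, so a dependent instruction can be woken up and
    selected for the very next cycle; the value itself arrives through
    forwarding from the FU output.
    """
    ready = set(initially_ready)
    waiting = list(entries)
    selected = pick(waiting, ready)  # "ADD(1) has been ready and selected"
    ex_cycle = {}
    cycle = 0
    while selected is not None:
        cycle += 1
        name, dest, _ = selected
        ex_cycle[name] = cycle           # operands sent to the FU
        ready.add(dest)                  # early tag broadcast (wakeup)
        selected = pick(waiting, ready)  # select for the next cycle
    return ex_cycle


rs = [("RS1", "R6", ["R2", "R4"]),    # ADD R6,R2,R4
      ("RS2", "R10", ["R0", "R6"]),   # SUB R10,R0,R6
      ("RS3", "R12", ["R10", "R6"])]  # ADD R12,R10,R6

print(simulate(rs, {"R0", "R2", "R4"}))  # {'RS1': 1, 'RS2': 2, 'RS3': 3}
```

The three dependent instructions execute in consecutive cycles, which is the benefit of broadcasting the tag one cycle before the result is available. CDB contention is not modeled here.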

8

Revise Pipeline Stages

Original stages: FETCH => ISSUE => EXE => WB => COMMIT
Revised stages: FETCH => RENAME => REG/ROB Rd => SCHEDULE => EXE => WB => COMMIT

ISSUE: decode, rename, allocate RS and ROB, and read REG/ROB
EX: wake up and select inst, then fu-execute

9

Examples: Intel P6

[Figure: P6 pipeline stages, including Decode, Decode, Rename, ROB Rd]

  • 40-entry ROB
  • 20-entry RS
  • Register Alias Table

… …

10

Rethink RS and ROB design

Data broadcasting to reservation stations:

  • Broadcasting saves reg-write to reg-read delay
  • n child instructions can receive data simultaneously

However:

  • Data forwarding can be used
  • Not all n child instructions may fu-execute next cycle
  • RS and ROB may store duplicate values

11

Physical Register

[Figure: RS entry (op, Qj, Qk, Vj, Vk, busy); ROB entry (i-type, dest, result, PC, valid); physical registers p1 p2 p3 p_n]

Physical register: collection of all temporary register contents

12

Register Mapping Approach

Rename architectural registers to physical registers
NO real architectural registers (now virtual registers)
RS => issue queue
Rename stage: allocate issue queue entry, allocate ROB entry, allocate physical register
What is the tag now?

[Figure: mapping table from ra, rb, rc to physical registers (pa, pb, pc) among p1 p2 p3 p_n; operand values vala and valb; a free list feeding alloc]
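The rename stage described above can be sketched with a mapping table and a free list. This is a minimal illustration, not a specific processor's design: the class name Renamer and its method names are hypothetical, and issue queue and ROB allocation are left out. In this sketch the physical register name plays the role of the tag.

```python
from collections import deque


class Renamer:
    """Mapping table + free list (illustrative sketch)."""

    def __init__(self, arch_regs, num_phys):
        # Assumption: each architectural register starts out mapped to its
        # own physical register; the remaining ones go on the free list.
        self.table = {r: f"p{i + 1}" for i, r in enumerate(arch_regs)}
        self.free = deque(f"p{i + 1}" for i in range(len(arch_regs), num_phys))

    def rename(self, dest, srcs):
        """Rename one instruction; return (new_dest, phys_srcs, old_dest)."""
        # Sources are looked up before the destination mapping is updated,
        # so an instruction like ADD ra,ra,rb reads the old ra.
        phys_srcs = [self.table[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register: rename must stall")
        new_dest = self.free.popleft()
        old_dest = self.table[dest]
        self.table[dest] = new_dest
        # old_dest is returned so that a later stage can put it back on the
        # free list once it is safe to do so (commonly at commit).
        return new_dest, phys_srcs, old_dest


r = Renamer(["ra", "rb", "rc"], num_phys=6)
print(r.rename("rc", ["ra", "rb"]))  # ('p4', ['p1', 'p2'], 'p3')
print(r.rename("ra", ["rc", "rb"]))  # ('p5', ['p4', 'p2'], 'p1')
```

The second instruction reads rc through p4, the register allocated by the first one, so the dependence is carried purely by physical register names.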


13

Mis-speculation Recovery

RS+ROB: no changes to arch. registers, so just clear pipeline and re-fetch
Fundamental issue: software must not see wrong register contents
Recovery for mapping approach: roll back the mapping table to the mis-speculation point
Architectural registers => virtual registers

[Figure: physical registers p1 p2 p3 p_n; committed mapping, mapping 1, mapping 2; ROB; mapping table status]

How to implement a mapping table supporting recovery?
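One possible answer is to snapshot the mapping table at each speculation point and restore the snapshot on a mis-speculation. The sketch below illustrates that idea only; it is not claimed to be the scheme the lecture has in mind, the class name CheckpointedMap and its methods are hypothetical, and free-list recovery is not modeled.

```python
class CheckpointedMap:
    """Mapping table with per-branch snapshots (illustrative sketch)."""

    def __init__(self, initial):
        self.table = dict(initial)
        self.checkpoints = []  # (branch_id, snapshot), oldest first

    def set(self, arch, phys):
        """Record a new architectural => physical mapping."""
        self.table[arch] = phys

    def checkpoint(self, branch_id):
        """Save the current mapping at a speculation point."""
        self.checkpoints.append((branch_id, dict(self.table)))

    def recover(self, branch_id):
        """Roll back to the mapping saved at branch_id.

        Checkpoints taken at or after that branch are discarded, since the
        instructions that created them are being squashed.
        """
        for i, (bid, snapshot) in enumerate(self.checkpoints):
            if bid == branch_id:
                self.table = dict(snapshot)
                del self.checkpoints[i:]
                return
        raise KeyError(branch_id)


m = CheckpointedMap({"ra": "p1", "rb": "p2"})
m.set("ra", "p3")
m.checkpoint("b1")   # first speculative branch
m.set("rb", "p4")
m.checkpoint("b2")   # second, younger branch
m.set("ra", "p5")
m.recover("b1")      # b1 was mispredicted
print(m.table)       # {'ra': 'p3', 'rb': 'p2'}
```

Copying the whole table per branch is simple but costs storage; walking the ROB backward to undo mappings one by one is a commonly described alternative.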

14

Change of pipeline

FETCH => RENAME => SCHEDULE => REG => EXE => WB => COMMIT

[Figure: block diagram with IM, Fetch Unit, Decode, Rename, issue queue, phy. regfile, FU1, FU2, L-buf, S-buf, DM, ROB]

15

Example: Intel Pentium 4

128 entries

Pipeline stages shown: Alloc => Rename => Rename => Queue => Schd => Schd => Schd => Disp => Disp => Reg => Reg => Ex

16

Alpha 21264 Pipeline

17

Generic Superscalar Processor Models

Issue queue based: Fetch => Rename => Schedule (wakeup, select) => Regfile => execute (FU, FU, D-cache, with bypass) => commit

Reservation based: Fetch => Rename => Reg/ROB => Schedule (wakeup, select) => execute (FU, FU, D-cache, with bypass) => commit

Source: Palacharla PhD thesis 1998

18

Summary of Dynamic Scheduling

Pipeline stages

  • Renaming (in-order)
  • Schedule
  • Commit (in-order)

Two organizations

  • Mapping table + phy reg + issue queue + ROB; REN => SCHD => REG
  • Reg alias table + RS + ROB, reg in RS and ROB; REN => REG => SCHD

Scheduling methods

  • Tag broadcasting vs. scoreboarding (later)

CDC6600: introduces scoreboarding
Tomasulo: introduces renaming and tag broadcasting
Reorder buffer: provides in-order commit

Real OOO processors

  • very complicated (like a vehicle)
  • bring implementation variants
  • but are all rooted in those basic designs