DLX Pipeline 2-stage fully pipelined Adder 4-stage fully pipelined - - PowerPoint PPT Presentation



slide-1
SLIDE 1

DLX Pipeline

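  • 2-stage fully pipelined Adder (A1, A2)
  • 4-stage fully pipelined Multiplier (M1 to M4)
  • 5-cycle non-pipelined Divider (DIV)

[Figure: pipeline diagram. IF and ID feed four parallel execution paths (EX; A1, A2; M1, M2, M3, M4; DIV, 5-cycle non-pipelined), which merge into MEM and WB.]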

slide-2
SLIDE 2

Structural Hazard: WB stage

A: ADD.D F0, F2, F4
B: L.D F18, 100(R4)

Cycle   1    2    3    4    5    6
A       IF   ID   +    +    MEM  WB
B            IF   ID   EX   MEM  WB

Contention for:

  • Write ports in Register File in WB stage (cycle 6)
  • Data paths through MEM stage (cycle 5)

slide-3
SLIDE 3

Structural Hazard: WB stage

A: DIV.D F0, F2, F4
B: MUL.D F6, F8, F10
C: ADD.D F12, F14, F16
D: ADD.D F18, F20, F22
E: L.D F24, 100(R4)

Cycle   1    2    3    4    5    6    7    8    9
A       IF   ID   /    /    /    /    /    MEM  WB
B            IF   ID   *    *    *    *    MEM  WB
C                 IF   ID   +    +    MEM  WB
D                      IF   ID   +    +    MEM  WB
E                           IF   ID   EX   MEM  WB

Contention for:

  • Write ports in Register File in WB stage (cycle 9)
  • Data paths through MEM stage (cycle 8)
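The WB contention in the two examples above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes one instruction is fetched per cycle starting at cycle 1, that nothing stalls, and that the execute latencies are those of the Slide 1 pipeline. The names `EX_CYCLES`, `wb_cycle` and `wb_conflicts` are made up for this example.

```python
from collections import defaultdict

# Execute-stage latencies taken from Slide 1 (dictionary and keys are illustrative).
EX_CYCLES = {"DIV.D": 5, "MUL.D": 4, "ADD.D": 2, "L.D": 1}

def wb_cycle(fetch_cycle, opcode):
    """WB cycle of an unstalled instruction: IF, ID, execute cycles, MEM, WB."""
    return fetch_cycle + 1 + EX_CYCLES[opcode] + 2

def wb_conflicts(opcodes):
    """Map each WB cycle claimed by more than one instruction to their labels."""
    by_cycle = defaultdict(list)
    for i, op in enumerate(opcodes):
        label = chr(ord("A") + i)                    # A, B, C, ...
        by_cycle[wb_cycle(i + 1, op)].append(label)  # fetch cycles 1, 2, 3, ...
    return {c: labels for c, labels in by_cycle.items() if len(labels) > 1}

print(wb_conflicts(["ADD.D", "L.D"]))
# {6: ['A', 'B']}
print(wb_conflicts(["DIV.D", "MUL.D", "ADD.D", "ADD.D", "L.D"]))
# {9: ['A', 'B', 'D', 'E']}
```

In the five-instruction sequence, C reaches WB alone in cycle 8, while A, B, D and E all arrive in cycle 9.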

slide-4
SLIDE 4

Solutions for WB Structural Hazards

1. Multiple write ports in register file

  • Extra hardware. Slowdown
  • Should we design for the peak or the average number of writes per cycle?

2. Buffer requests at WB stage and write one at a time

  • How deep should the buffer queue be?
3. Stall: Allow only 1 write to propagate to the WB stage

In MEM stage (EX/MEM pipeline register)

  • Easy (+)
  • Prioritize based on heuristics (longest latency) (+)
  • Need to propagate stall backwards (-)
  • Two sources of resource stalls (-)

In ID stage: Only release an instruction that won’t cause a hazard in the WB stage

  • Centralized handling of stalls (+)
  • Occurs earlier than necessary (-)

We will allow an S.D and an FP instruction to go through the MEM stage at the same time.

slide-5
SLIDE 5

Stall in MEM stage

[Figure: DLX pipeline (IF, ID; execution paths EX / A1, A2 / M1 to M4 / DIV, 5-cycle non-pipelined; MEM, WB) with a MUX at the MEM stage.]

slide-6
SLIDE 6

Stall in ID stage

Check if the instruction currently in ID will use WB in the same cycle as a previously issued instruction. If so, stall; else issue the instruction.

Simple hardware implementation:

  • Shift register of length L equal to the length of the longest path from ID to WB

– Tracks the usage of WB for the next L cycles
– Bit j of the shift register is True whenever an issued instruction will use WB j cycles from now
– Every cycle, shift the contents by 1 bit (so bit j becomes bit number j-1)

Assume the instruction in ID wants to use the register file in the WB stage:

1. Determine how many cycles later the instruction in ID will use the WB stage (say d). This depends on the FU required by the instruction.
2. Check whether bit d of the shift register is set. If set, stall the current instruction for 1 cycle; else set bit d of the shift register to 1.
3. Shift the register one bit position.
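The three steps above can be sketched in software as follows. This is a minimal model under stated assumptions, not the hardware itself: the class `WBShiftRegister`, the table `DISTANCE` and the helper `wb_cycles` are names invented for this example, the ID-to-WB distances are derived from the Slide 1 latencies (execute cycles plus MEM plus WB), and only the WB conflict is modelled (no RAW, WAW or functional-unit hazards).

```python
class WBShiftRegister:
    """Bit j is True when an already-issued instruction will use WB j cycles from now."""

    def __init__(self, length):
        self.bits = [False] * (length + 1)   # indices 0..L

    def try_issue(self, d):
        """Step 2: reserve WB d cycles ahead, or return False to signal a stall."""
        if self.bits[d]:
            return False
        self.bits[d] = True
        return True

    def shift(self):
        """Step 3: once per cycle, bit j becomes bit j-1."""
        self.bits = self.bits[1:] + [False]


# Assumed ID-to-WB distances for the Slide 1 pipeline.
DISTANCE = {"DIV.D": 7, "MUL.D": 6, "ADD.D": 4, "L.D": 3}

def wb_cycles(opcodes, first_id_cycle=2):
    """In-order issue from ID: return the WB cycle of each instruction."""
    reg = WBShiftRegister(max(DISTANCE.values()))
    cycle, result = first_id_cycle, []
    for op in opcodes:
        while not reg.try_issue(DISTANCE[op]):   # stall in ID for one cycle
            reg.shift()
            cycle += 1
        result.append(cycle + DISTANCE[op])
        reg.shift()                              # next instruction is in ID a cycle later
        cycle += 1
    return result

print(wb_cycles(["DIV.D", "MUL.D", "ADD.D", "ADD.D", "L.D"]))
# [9, 10, 11, 12, 13]
```

With the stalls in place, each of the five instructions gets its own WB cycle instead of four of them colliding in cycle 9.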

slide-7
SLIDE 7

Summary of Design Features to Avoid Structural Hazards

[Figure: contents of the WB shift register over cycles 1 to 7 for DIV, MUL, ADD1, ADD2 and LOAD; the set bits mark the cycles labelled DIV Writes, MUL Writes, ADD1 Writes, ADD2 Writes and LD Writes.]

slide-8
SLIDE 8

Handling WB Conflict with stalls in ID Stage

A: DIV.D F0, F2, F4
B: MUL.D F6, F8, F10
C: ADD.D F12, F14, F16
D: ADD.D F18, F20, F22
E: L.D F24, 100(R4)

Cycle   1    2    3    4    5    6    7    8    9    10   11   12   13
A       IF   ID   /    /    /    /    /    M    WB
B            IF   ID   ID   *    *    *    *    M    WB
C                 IF   IF   ID   ID   ID   +    +    M    WB
D                           IF   IF   IF   ID   +    +    M    WB
E                                          IF   ID   ID   EX   M    WB

(M = MEM; a repeated ID or IF marks a stall cycle.)

slide-9
SLIDE 9

Summary of Design Features to Avoid Structural Hazards

  • Contention for Data Path between ID and EX stages: to allow multiple FUs to be simultaneously active

– Fully pipelined FUs so that they have an initiation latency of 1 cycle
– Allowed multiple instructions to be in the ID/EX pipeline register

  • Created separate pipeline registers for each non-pipelined FU
  • Sequence of DIV, MUL, ADD, ADD can all be simultaneously active
  • Contention for FUs in EX stage:

– Fully pipeline the units
– Replicate the units

  • Contention for Register File in WB stage
  • Contention for Datapaths from EX to WB

– Single write port in Register File
– Only 1 completing instruction will reach WB stage at any cycle
– Implemented by stalling instruction at the ID stage if it will want WB at the same cycle as an in-flight instruction

slide-10
SLIDE 10

FP Pipeline Model

4-stage fully pipelined adder, non-pipelined multiplier and divider

[Figure: pipeline diagram. IF and ID/REG feed four parallel execution paths (EX; A1, A2, A3, A4; MUL, 4-cycle non-pipelined; DIV, 6-cycle non-pipelined), which merge into MEM and WB.]

slide-11
SLIDE 11

RAW Hazards

  • Source instruction produces a value (writes a register), e.g. Arithmetic, Load
  • Target instruction consumes the value (reads the register), e.g. Arithmetic, Store

Example:

A: ADD.D F0, F2, F4
B: ADD.D F6, F8, F0
C: MUL.D F10, F12, F14

Hazard Detection Unit: Checks for RAW hazards for the instruction in the ID stage. Stalls the instruction in ID till data is available.

B stalled in ID stage till F0 written by A at cycle 8

Cycle   1    2    3    4    5    6    7    8    9    10   11   12
A       IF   ID   +    +    +    +    MEM  WB
B            IF   ID   s    s    s    s    s    +    +    +    +
C                 IF   s    s    s    s    s    ID   *    *    *

(s = stall cycle)

slide-12
SLIDE 12

Hazard Detection

Simple signaling mechanism indicating that a register is in the process of being changed.

For each register R:

  • BUSY[R] flag: indicates that R is the destination of an in-flight instruction
  • Set BUSY[R] to TRUE when an instruction that writes to R is issued (leaves the ID/REG stage)
  • Clear BUSY[R] when the instruction writes to R in the WB stage

Issue Stage (ID/REG stage): Let the instruction being issued have source registers S1, S2 and destination register D

    while (BUSY[S1] OR BUSY[S2])
        stall instruction in ID/REG stage;
    BUSY[D] = TRUE;

WB Stage (first half of cycle):

    Write result to destination register D;
    BUSY[D] = FALSE;
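As a rough illustration of the BUSY-flag protocol described on this slide, the sketch below keeps the flags as a Python set of register names. The class `BusyFlags` and its method names are invented for this example; the slide only specifies the flag semantics.

```python
class BusyFlags:
    """One BUSY flag per register, kept here as a set of busy register names."""

    def __init__(self):
        self.busy = set()

    def can_issue(self, s1, s2):
        """Issue check in ID/REG: both source registers must be free."""
        return s1 not in self.busy and s2 not in self.busy

    def issue(self, dest):
        """Instruction leaves ID/REG: BUSY[D] = TRUE."""
        self.busy.add(dest)

    def write_back(self, dest):
        """WB stage, first half of the cycle: BUSY[D] = FALSE."""
        self.busy.discard(dest)


flags = BusyFlags()
flags.issue("F0")                      # LD F0, 0(R1) issues
print(flags.can_issue("F0", "F12"))    # ADD F10, F0, F12 must stall: False
flags.write_back("F0")                 # LD writes F0 in WB
print(flags.can_issue("F0", "F12"))    # ADD may now issue: True
```

A single flag per register cannot tell two in-flight writers of the same register apart, so this sketch has the same limitation as the simple mechanism itself.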

slide-13
SLIDE 13

Hazard Detection

Is the simple signaling mechanism sufficient?

LD F0, 0(R1)
MUL F6, F2, F4
ADD F10, F0, F12

Cycle   1    2    3    4    5    6    7    8    9
LD      IF   ID   EX   MEM  WB
MUL          IF   ID   *    *    *    *    MEM  WB
ADD               IF   ID   ID   +    +    +    +

  • LD sets the BUSY[F0] flag
  • ADD waits for BUSY[F0] to be unset (stall cycle)
  • LD clears the BUSY[F0] flag

slide-14
SLIDE 14

Hazard Detection

Is the simple signaling mechanism sufficient? Is it possible for the reader to read the result of an unintended instruction?

LD F0, 0(R1)
MUL F0, F2, F4
ADD F10, F0, F12

Cycle   1    2    3    4    5    6    7    8    9
LD      IF   ID   EX   MEM  WB
MUL          IF   ID   *    *    *    *    MEM  WB
ADD               IF   ID   ID   +    +    +    +

  • LD sets the BUSY[F0] flag
  • ADD waits for BUSY[F0] to be unset (stall cycle)
  • LD clears the BUSY[F0] flag

Will be handled by the method we will choose to remove WAW hazards

slide-15
SLIDE 15

Forwarding for RAW Hazards

  • Forwarding hardware to directly move output of FU to ID stage

Hazard Detection Unit: Checks for RAW hazards for the instruction in the ID stage. Stalls the instruction in ID till data is available.

Forwarding Unit: Moves data directly from production to consumption points.

Example:

A: ADD.D F0, F2, F4
B: ADD.D F6, F8, F0
C: MUL.D F10, F12, F14

Cycle   1    2    3    4    5    6    7    8
A       IF   ID   +    +    +    +    MEM  WB
B            IF   ID   ID   ID   ID   +    +
C                 IF   IF   IF   IF   ID   *

slide-16
SLIDE 16

Forwarding Hardware

[Figure: FP pipeline (IF, ID; execution paths EX / A1 to A4 / MUL, 4-cycle non-pipelined / DIV, 4-cycle non-pipelined; MEM, WB) with two MUXes. Labels: Operand Select, Data Select, Destination Register, Value, Source Register.]

slide-17
SLIDE 17

Forwarding

Issue Stage (ID/REG stage):

Operand Select: If the destination register of the instruction completing its EX stage this cycle equals S1 or S2 (source registers of the instruction in the ID stage), then forward the value being generated by the EX stage.

At most 1 instruction (besides SD) can complete the EX stage on any cycle (Why?)

[Figure: Data Select MUX. Inputs FU1 .. FUn, selected by the FU of the completing instruction; outputs the Result of Completing Instruction, alongside the Destination Register of Completing Instruction.]
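The two multiplexers on this slide can be mimicked in a few lines. This is a sketch under assumptions: the function names `data_select` and `operand_select`, the dictionary layout and the register values are all hypothetical, and it relies on the slide's statement that at most one instruction completes EX in a cycle.

```python
def data_select(fu_results, completing_fu):
    """Data Select MUX: pick the output of the FU whose instruction completes this cycle."""
    return fu_results[completing_fu]

def operand_select(src_reg, reg_file, completing_dest, forwarded_value):
    """Operand Select MUX for one source register of the instruction in ID."""
    if completing_dest is not None and completing_dest == src_reg:
        return forwarded_value        # take the value straight from the FU
    return reg_file[src_reg]          # otherwise read the register file


# Illustrative values: ADD.D F0, F2, F4 completes EX while ADD.D F6, F8, F0 waits in ID.
reg_file = {"F0": 1.0, "F8": 3.0}                 # F0 still holds its old value
fu_results = {"ADD": 6.0, "MUL": 0.0, "DIV": 0.0, "EX": 0.0}

value = data_select(fu_results, "ADD")
s1 = operand_select("F8", reg_file, "F0", value)
s2 = operand_select("F0", reg_file, "F0", value)
print(s1, s2)
# 3.0 6.0
```

The waiting instruction picks up the new F0 in the cycle the adder finishes, without waiting for the producer to pass through MEM and WB.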

slide-18
SLIDE 18

Forwarding Hardware

[Figure: FP pipeline (IF, ID; execution paths EX / A1 to A4 / MUL, 4-cycle non-pipelined / DIV, 4-cycle non-pipelined; MEM, WB) with two MUXes. Labels: Operand Select, Data Select, Destination Register, Value, Source Register.]

slide-19
SLIDE 19

RAW Hazards

Example:

A: L.D F0, 0(R2)
B: ADD.D F6, F8, F0
C: MUL.D F10, F12, F14

Cycle   1    2    3    4    5    6    7    8
A       IF   ID   EX   MEM  WB
B            IF   ID   s    +    +    +    +
C                 IF   s    ID   *    *    *

(s = stall cycle)

slide-20
SLIDE 20

WAR and WAW Hazards

WAR Hazards: Cannot arise (for same reasons as in integer pipeline)

  • Instruction B issued implies earlier instruction A in-flight
  • Instruction A in-flight implies it has read the registers

WAW Hazards: Possible since path lengths to write stage differ

A: ADD.D F0, F2, F4
B: L.D F0, 0(R2)

Cycle   1    2    3    4    5    6    7    8
A       IF   ID   +    +    +    +    MEM  WB
B            IF   ID   EX   MEM  WB

Writes completed out of order.

Detect WAW hazard in ID stage.

  • Stall instruction in ID stage till safe (in example: 3 cycles)
  • Prevent write by first instruction (A) by disabling its write control bit: it will pass through the WB stage at cycle 8 but will not write the register file

slide-21
SLIDE 21

WAW Hazards

Detecting a WAW hazard is complicated by many possible cases:

Stall instruction in ID if its destination matches that of an in-flight instruction that will write its destination later than the instruction in ID

  • If the path length from ID to WB for the current instruction is d, check for a match of destination registers with all instructions that are more than d cycles away from their WB stage. (Easy in principle!)
  • Unnecessarily complicated for a rare event (How many comparisons needed?)

Compromise:

Stall instruction in ID if its destination matches that of any in-flight instruction

  • May create unnecessary stalls (-)
  • WAW are relatively rare events (+)
  • Hardware is simpler (+) (How to implement? BUSY[D] flag directly provides information)

Note: We are only stalling for instructions that write to a common register without an intervening read to that register; else it’s a RAW stall.
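The difference between the precise check and the compromise can be seen in a small sketch. The function names and the `in_flight` representation, a list of (destination register, cycles until WB) pairs, are assumptions made for this illustration; the compromise version corresponds to simply testing BUSY[D].

```python
def waw_stall_precise(dest, d, in_flight):
    """Stall only if an in-flight writer of dest is more than d cycles from its WB."""
    return any(reg == dest and remaining > d for reg, remaining in in_flight)

def waw_stall_simple(dest, in_flight):
    """Compromise: stall if any in-flight instruction writes dest (the BUSY[D] test)."""
    return any(reg == dest for reg, _ in in_flight)


# Cycle 3 of the ADD.D / L.D example: ADD.D F0 reaches WB in cycle 8 (5 cycles away),
# while L.D F0 sits in ID with an ID-to-WB distance of d = 3.
print(waw_stall_precise("F0", 3, [("F0", 5)]))   # True
print(waw_stall_simple("F0", [("F0", 5)]))       # True

# An older writer that is only 1 cycle from WB would write first anyway:
print(waw_stall_precise("F0", 3, [("F0", 1)]))   # False
print(waw_stall_simple("F0", [("F0", 1)]))       # True (an unnecessary stall)
```

The precise version needs one comparison per in-flight instruction plus the distance test, which is the extra hardware the compromise avoids.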

slide-22
SLIDE 22

Summary

  • If instructions A and B are in-flight (assume A issued before B)

– Will write to WB on different cycles (structural hazard solution)
– Destination registers of A and B are distinct (WAW solution)
– Source registers of B differ from the destination register of A (RAW)
– Can source registers of A match the destination register of B?

  • Stalled instructions are held in the ID stage for RAW and WAW

– Easy implementation

  • In-order issue: Instructions leave ID stage in (dynamic) program order

– Instructions leave ID with their operands from either REG or forwarding path

  • Out-of-order completion: A and B may complete out of program order if they write to different registers
  • Problem if precise exceptions needed

– What if A raises an exception (e.g. arithmetic overflow) after B has completed?
– What if B is an instruction like: ADD.D F0, F0, F2

slide-23
SLIDE 23

Summary

  • What is the performance goal of the pipeline?

– Try to achieve a CPI close to 1
– Stalls for

  • Structural hazard (contention for WB) (expected to be rare)
  • WAW hazard (expected to be rare)
  • RAW hazards

– Reduce number of stalls by forwarding
– ??

  • Hint: Reduce penalty due to stalls