Review: FP Pipeline Model 4-stage fully pipelined adder

SLIDE 1

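Review: FP Pipeline Model

4-stage fully pipelined adder, non-pipelined multiplier and divider

[Figure: pipeline diagram. IF, then ID/REG, then one of four parallel execution paths (EX; adder stages A1, A2, A3, A4; MUL, 4 cycles, non-pipelined; DIV, 6 cycles, non-pipelined), then MEM and WB]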

SLIDE 2

Review: Summary

  • If instructions A and B are in-flight (assume A issued before B)

– They will write to WB on different cycles (structural hazard solution)
– Destination registers of A and B are distinct (WAW solution)
– Source registers of B differ from the destination register of A (RAW)
– Can source registers of A match the destination register of B?

  • Stalled instructions are held in the ID stage for RAW and WAW

– Easy implementation

  • In-order issue: instructions leave the ID stage in (dynamic) program order

– Instructions leave ID with their operands from either REG or a forwarding path

  • Out-of-order completion: A and B may complete out of program (issue) order if they write to different registers

  • Problem if precise exceptions are needed

– What if A raises an exception (e.g. arithmetic overflow) after B has completed?
– What if B is an instruction like: ADD.D F0, F0, F2
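The register conditions in the summary above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the `Instr` record and the `hazards` function are hypothetical names introduced for illustration, and the model only compares register names (it ignores timing and the WB structural hazard).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: opcode, destination, source registers."""
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def hazards(a: Instr, b: Instr) -> set:
    """Data hazards between in-flight instructions a (issued first) and b."""
    found = set()
    if a.dest is not None:
        if a.dest in b.srcs:
            found.add("RAW")   # b reads what a writes
        if a.dest == b.dest:
            found.add("WAW")   # both write the same register
    if b.dest is not None and b.dest in a.srcs:
        found.add("WAR")       # b writes what a reads
    return found


# A is an earlier divide; B is the ADD.D F0, F0, F2 from the slide.
a = Instr("DIV.D", "F0", ("F2", "F4"))
b = Instr("ADD.D", "F0", ("F0", "F2"))
print(sorted(hazards(a, b)))  # ['RAW', 'WAW']
```

The third check corresponds to the open question on the slide (a source register of A matching the destination of B); the sketch only flags that case and does not decide whether it requires a stall.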

SLIDE 3

Summary

  • What is the performance goal of the pipeline?

– Try to achieve a CPI close to 1
– Stalls for

  • Structural hazard (contention for WB) (expected to be rare)
  • WAW hazard (expected to be rare)
  • RAW hazards

– Reduce number of stalls by forwarding
– ??

  • Hint: Reduce penalty due to stalls
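The CPI goal can be made concrete with the usual accounting: effective CPI is the ideal CPI plus the average number of stall cycles per instruction. The sketch below assumes an ideal CPI of 1 for this single-issue pipeline; the function name and the instruction and stall counts are hypothetical and do not come from the slides.

```python
def effective_cpi(instructions: int, stall_cycles: int, ideal_cpi: float = 1.0) -> float:
    """Effective CPI = ideal CPI + stall cycles per instruction."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return ideal_cpi + stall_cycles / instructions


# Hypothetical run: 100 instructions that suffer 25 stall cycles in total.
print(effective_cpi(100, 25))  # 1.25
```

Under this accounting, CPI improves either by having fewer stalls or by making each stall cost fewer cycles.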

SLIDE 4

Instruction Level Parallelism

[Figure: road-traffic analogy with Green Car A at the head of the line]

  • No space in Green Lane for the Green Car A. Waiting for space. (Structural Hazard)
  • Green Car A has a flat tire. Waiting to be fixed. (RAW Hazard)
  • All cars on main road stalled till A can progress

Head-of-Line Blocking

SLIDE 5

Example Head-of-Line Blocking

DIV F0, F2, F4
SD F0, 0(R0)
MUL F6, F8, F10

SLIDE 6

Head-of-Line Blocking: 4-stage fully pipelined adder, non-pipelined multiplier and divider

[Figure: pipeline diagram (IF, then ID/REG, then EX, A1 to A4, 4-cycle non-pipelined MUL, or 6-cycle non-pipelined DIV, then MEM and WB), annotated with the instructions DIV, SD and MUL]

SD will hold up the following MUL even though it is independent

SLIDE 7

Instruction Level Parallelism

[Figure: road-traffic analogy with car A]

  • Provide a separate staging area where cars can wait
  • We did that for structural hazards for the ID/EX pipeline register
  • Can we do it for RAW stalls?

SLIDE 8

Instruction Level Parallelism

[Figure: road-traffic analogy with car A]

  • Provide a separate staging area where cars can wait:
  • Lots more concurrency
  • What is the cost?

SLIDE 9

Instruction Level Parallelism

Motivating Example: RAW dependency between A and B

A: DIVD F0, F2, F4
B: ADDD F10, F0, F8
C: MULTD F12, F8, F14

Current Multi-cycle FP unit design

B stalled in ID stage till A produces result

  • All instructions after B will also be stalled till B’s stall clears

SLIDE 10

Scoreboard

T = 1

A: DIVD F0, F2, F4
B: ADDD F10, F0, F8
C: MULTD F12, F8, F14

[Figure: pipeline snapshot with boxes labeled IR, ID/R, Issue Register, ADD, MUL, DIV, EX, MEM and WB; A and B are shown]

SLIDE 11

Scoreboard

T = 2

[Figure: same pipeline snapshot at T = 2; A and B are shown]

SLIDE 12

Scoreboard

T = 3

[Figure: same pipeline snapshot at T = 3; A and B are shown]

SLIDE 13

Scoreboard

T = 4

[Figure: same pipeline snapshot at T = 4; A and B are shown]

SLIDE 14

Scoreboard

T = 5

[Figure: same pipeline snapshot at T = 5; A and B are shown]

SLIDE 15

Scoreboard

T = 6

[Figure: same pipeline snapshot at T = 6; B and C are shown]

SLIDE 16

Scoreboard

T = 7

[Figure: same pipeline snapshot at T = 7; B and C are shown]

SLIDE 17

Scoreboard

T = 8

[Figure: same pipeline snapshot at T = 8; B and C are shown]

C completes at cycle 12

SLIDE 18

Instruction Level Parallelism

Motivating Example: RAW dependency between A and B

A: DIVD F0, F2, F4
B: ADDD F10, F0, F8
C: MULTD F12, F8, F14

Current Multi-cycle FP unit design

B stalled in ID stage till A produces result

  • All instructions after B will also be stalled till B’s stall clears
  • C has no resource or data conflicts
  • Why not allow C to execute while B waits for data?
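The question above can be explored with a toy timing model. This sketch is an illustration rather than the slides' design: `start_cycles` is a hypothetical helper, the latencies for DIVD and MULTD follow the 6-cycle and 4-cycle units mentioned earlier, the 4-cycle ADDD is taken from the 4-stage adder, and the model assumes one instruction enters per cycle, results are usable as soon as the producing unit finishes, and there are no structural, WAW or WAR hazards. Its cycle numbers are not meant to reproduce the cycle counts quoted on the slides.

```python
# Assumed execution latencies in cycles (see the lead-in for where they come from).
LATENCY = {"DIVD": 6, "ADDD": 4, "MULTD": 4}


def start_cycles(program, in_order):
    """Cycle at which each instruction begins execution.

    program: list of (op, dest, srcs) tuples in program order.
    in_order=True : an instruction cannot start before its predecessor has started
                    (a stalled instruction blocks everything behind it).
    in_order=False: an instruction waits only for its own source operands.
    """
    ready = {}       # register -> first cycle at which its new value is usable
    starts = []
    prev_start = 0
    for position, (op, dest, srcs) in enumerate(program, start=1):
        operands_ready = max((ready.get(r, 0) for r in srcs), default=0)
        earliest = prev_start + 1 if in_order else position
        start = max(earliest, operands_ready)
        starts.append(start)
        ready[dest] = start + LATENCY[op]
        prev_start = start
    return starts


program = [
    ("DIVD", "F0", ("F2", "F4")),     # A
    ("ADDD", "F10", ("F0", "F8")),    # B: needs F0 from A
    ("MULTD", "F12", ("F8", "F14")),  # C: independent of A and B
]
print(start_cycles(program, in_order=True))   # [1, 7, 8]
print(start_cycles(program, in_order=False))  # [1, 7, 3]
```

In this model B starts at cycle 7 either way, because it truly depends on A. Only C changes: it starts at cycle 3 instead of cycle 8 once it no longer has to queue behind B.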

SLIDE 19

Scoreboard

T = 1

A: DIVD F0, F2, F4
B: ADDD F10, F0, F8
C: MULTD F12, F8, F14

[Figure: pipeline snapshot with boxes labeled IR, Issue, Issue Register, four R slots labeled Staging Area, ADD, MUL, DIV, EX, MEM and WB; A and B are shown]

SLIDE 20

Scoreboard

T = 2

[Figure: same pipeline snapshot at T = 2; A, B and C are shown]

SLIDE 21

Scoreboard

T = 3

[Figure: same pipeline snapshot; A, B and C are shown]

SLIDE 22

Scoreboard

T = 3

[Figure: same pipeline snapshot; A, B and C are shown]

SLIDE 23

Scoreboard

T = 3

[Figure: same pipeline snapshot; A, B and C are shown]

SLIDE 24

Scoreboard

T = 6

[Figure: same pipeline snapshot at T = 6; A, B and C are shown]

SLIDE 25

Scoreboard

T = 7

[Figure: same pipeline snapshot at T = 7; B and C are shown]

SLIDE 26

Scoreboard

T = 8

[Figure: same pipeline snapshot at T = 8; B and C are shown]

SLIDE 27

Scoreboard

T = 9

[Figure: same pipeline snapshot at T = 9; B is shown]

All complete in 10 cycles

SLIDE 28

Scoreboard Operation: RAW

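A: DIV.D F0, F2, F4
B: MUL.D F6, F0, F8
C: ADD.D F10, F12, F14

[Figure: pipeline timing diagram for A, B and C, using the stage symbols IF, I (issue), R (dispatch) and W (write), with /, * and + apparently marking divide, multiply and add execution cycles]

B waits in the Issue Register till A writes to F0. Meanwhile C enters, completes the ADD, writes F10 and exits.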

SLIDE 29

Scoreboard Operation: WAW

A: DIV.D F0, F2, F4
B: MUL.D F6, F0, F8
C: ADD.D F0, F10, F12

[Figure: pipeline timing diagrams for A, B and C; both A and C are marked "Writes F0"]

WAW hazard since C’s write will be lost when A completes

Get rid of WAW hazard

  • Do not issue an instruction with the same destination register as an in-flight instruction
  • Do not issue C if a previous in-flight instruction has the same destination register
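The lost write can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions: `final_values` is a hypothetical helper that applies register writes in completion order, and the completion cycles (9 for A, 6 and 12 for C) are made-up numbers chosen only so that the long divide finishes after the short add.

```python
def final_values(writes):
    """Apply register writes in completion order; a later write overwrites an earlier one.

    writes: iterable of (completion_cycle, dest_register, value).
    """
    regs = {}
    for _cycle, dest, value in sorted(writes, key=lambda w: w[0]):
        regs[dest] = value
    return regs


# C is issued while A is still in flight: C writes F0 first, then A overwrites it.
hazard = [(9, "F0", "A: quotient"), (6, "F0", "C: sum")]
print(final_values(hazard)["F0"])  # A: quotient  (wrong: program order says C's value)

# C is held at issue until A has written, so it completes after A.
fixed = [(9, "F0", "A: quotient"), (12, "F0", "C: sum")]
print(final_values(fixed)["F0"])   # C: sum
```

Holding C back costs cycles, but it guarantees that the last writer in program order is also the last writer in time.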

SLIDE 30

Scoreboard Operation: WAR

A: DIV.D F0, F2, F4
B: MUL.D F6, F0, F8
C: ADD.D F8, F10, F12

[Figure: pipeline timing diagram for A, B and C; B is marked "Reads F0 and F8" and C is marked "Writes F8"]

WAR hazard since C’s write to F8 occurs before B reads F8

Get rid of WAR hazard

Need to be careful not to confuse a WAR with a RAW

SLIDE 31

Scoreboard Operation: WAR Hazards

Problem of distinguishing a RAW dependency from a WAR dependency

RAW case:
A: DIV.D F0, F2, F4
B: MUL.D F6, F0, F8
DIV should be allowed to write F0 before MUL reads

WAR case:
C: MUL.D F6, F0, F8
D: DIV.D F0, F2, F4
DIV should not be allowed to write F0 till MUL reads

Combined case:
P: MUL.D F6, F0, F8
Q: DIV.D F0, F2, F4
R: DIV.D F10, F0, F4

  • Q needs to distinguish between P and R, both of which may be stalled waiting to read F0
  • Q must wait for P to read F0 before overwriting it
  • Q must write to F0 before R reads it
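One way to tell P from R is by program order: only readers that come before the writer can block its write. The sketch below illustrates that idea and is not the mechanism described in the slides. The `InFlight` record, its `seq` and `has_read_operands` fields, and the `safe_to_write` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class InFlight:
    """Hypothetical bookkeeping for one in-flight instruction."""
    name: str
    seq: int                      # position in program order
    dest: str
    srcs: Tuple[str, ...]
    has_read_operands: bool = False


def safe_to_write(writer, in_flight):
    """True if no EARLIER instruction still has to read the writer's destination."""
    return all(
        other.has_read_operands or writer.dest not in other.srcs
        for other in in_flight
        if other.seq < writer.seq   # later readers (such as R) must see the new value
    )


p = InFlight("P", 0, "F6", ("F0", "F8"))
q = InFlight("Q", 1, "F0", ("F2", "F4"))
r = InFlight("R", 2, "F10", ("F0", "F4"))

print(safe_to_write(q, [p, q, r]))  # False: P has not read F0 yet (WAR)
p.has_read_operands = True
print(safe_to_write(q, [p, q, r]))  # True: R is later than Q, so it does not block Q
```

R never blocks Q here. R's wait for F0 is a RAW dependency, which is resolved on the reading side rather than at Q's write.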

SLIDE 32

Operation of Issue (I) Stage

Issue Stage: Every cycle

  • Check whether the instruction in the Instruction Register (IR) should be issued or stalled

– A stalled instruction waits in IR and holds up all succeeding instructions
– An issued instruction moves to the Issue Register of the functional unit it needs

  • The instruction in the Instruction Register (IR) is stalled if either:

– there is a structural hazard for an ID/EX register, or
– it has a WAW dependency with some earlier issued instruction

  • Only 1 instruction with the same destination register is issued at any time
  • No WAW hazards
  • Instruction to be issued:

– Update Data Flow Graph

  • Maintains dependency information between instructions
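The issue rules above can be sketched as a small state machine. This is a minimal illustration under assumptions, not the slides' implementation. The class and method names are hypothetical, the "ID/EX register" structural hazard is modeled as one Issue Register per functional unit that must be empty, and the Data Flow Graph is reduced to a set of destination registers that still have a write pending.

```python
class IssueStage:
    """Toy model of the Issue (I) stage: one Issue Register per functional unit."""

    def __init__(self, functional_units):
        self.issue_register = {fu: None for fu in functional_units}
        self.pending_dests = set()   # stand-in for the Data Flow Graph

    def try_issue(self, name, fu, dest):
        """Attempt to move the instruction in IR into the Issue Register of its unit."""
        if self.issue_register[fu] is not None:
            return "stall: structural"   # Issue Register of this unit is occupied
        if dest in self.pending_dests:
            return "stall: WAW"          # an earlier instruction still has to write dest
        self.issue_register[fu] = name
        self.pending_dests.add(dest)
        return "issued"

    def dispatch(self, fu):
        """The instruction leaves the Issue Register for the functional unit."""
        name, self.issue_register[fu] = self.issue_register[fu], None
        return name

    def write(self, dest):
        """The in-flight instruction writes dest; the register is no longer pending."""
        self.pending_dests.discard(dest)


stage = IssueStage(["ADD", "MUL", "DIV"])
print(stage.try_issue("A", "DIV", "F0"))  # issued
print(stage.try_issue("B", "MUL", "F6"))  # issued
print(stage.try_issue("C", "ADD", "F0"))  # stall: WAW
```

Because issue is in order, a stalled instruction such as C stays in IR and every later instruction waits behind it, which matches the first sub-bullet above.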

SLIDE 33

Operation of Dispatch (R) and Write (W) stages

Dispatch Stage: Every cycle

  • Check whether the instruction in the Issue Register can be dispatched or must stall
  • The instruction in the Issue Register is stalled if it has

– a structural hazard for the FU, or
– a RAW dependency with an in-flight instruction
– It waits until the FU is available and all its source operands are ready (RAW dependencies satisfied)

  • The instruction in the Issue Register is dispatched when all its operands are available

– Read the source registers from the Register File
– Dispatch the instruction to the FU stage
– An operand is available when there is no in-flight instruction with a matching destination register

Write Stage: Every cycle

  • Check which instructions in the EX/WB pipeline register are SAFE-TO-WRITE (WAR hazards)

– Select an instruction I that is SAFE-TO-WRITE
– Write the result of I to its destination register
– Update the Data Flow Graph to indicate the write by I
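The Dispatch stage decision described above reduces to two checks per cycle. The following sketch is illustrative only: `dispatch_decision` is a hypothetical function, and `pending_dests` is assumed to hold the destination registers of earlier in-flight instructions that have not yet written their results.

```python
def dispatch_decision(srcs, fu_busy, pending_dests):
    """Decide what happens this cycle to the instruction waiting in an Issue Register.

    srcs          : source registers of the waiting instruction
    fu_busy       : True if the functional unit it needs cannot accept it yet
    pending_dests : destinations of earlier in-flight instructions still to be written
    """
    if fu_busy:
        return "stall: structural"
    if any(src in pending_dests for src in srcs):
        return "stall: RAW"      # some operand has not been produced yet
    return "dispatch"            # read the register file and enter the FU


# B: MUL.D F6, F0, F8 while A: DIV.D F0, F2, F4 is still executing.
print(dispatch_decision(("F0", "F8"), fu_busy=False, pending_dests={"F0"}))  # stall: RAW
# After A writes F0, B can read its operands and go.
print(dispatch_decision(("F0", "F8"), fu_busy=False, pending_dests=set()))   # dispatch
```

The function is evaluated every cycle for each occupied Issue Register, so a stalled instruction is retried automatically once the write it was waiting for has happened.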
