Review: FP Pipeline Model, 4-stage fully pipelined adder

  1. Review: FP Pipeline Model
     • 4-stage fully pipelined adder (A1, A2, A3, A4); non-pipelined multiplier and divider
     • Pipeline stages: IF, ID/REG, EX, MEM, WB
     • DIV: 6 cycles, non-pipelined
     • MUL: 4 cycles, non-pipelined

  2. Review: Summary
     • If instructions A and B are in flight (assume A issued before B):
       – They write to WB on different cycles (structural hazard solution)
       – Destination registers of A and B are distinct (WAW solution)
       – Source registers of B differ from the destination register of A (RAW)
       – Can source registers of A match the destination register of B?
     • Stalled instructions are held in the ID stage for RAW and WAW
       – Easy implementation
     • In-order issue: instructions leave the ID stage in (dynamic) program order
       – Instructions leave ID with their operands from either REG or a forwarding path
     • Out-of-order completion: A and B may complete out of program issue order if they write to different registers
     • Problem if precise exceptions are needed
       – What if A raises an exception (e.g. arithmetic overflow) after B has completed?
       – What if B is an instruction like ADD.D F0, F0, F2?
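The register comparisons in the summary above can be sketched as a small classifier. This is a minimal illustration, not the hardware logic: the `Instr` record and the `hazards` function are hypothetical names introduced here, and an instruction is reduced to one destination register plus its source registers.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: Optional[str]       # destination register, None if nothing is written
    srcs: Tuple[str, ...]     # source registers


def hazards(a: Instr, b: Instr) -> Set[str]:
    """Data hazards between in-flight instructions a and b (a issued before b)."""
    found = set()
    if a.dest is not None and a.dest in b.srcs:
        found.add("RAW")   # b reads what a has not yet written
    if a.dest is not None and a.dest == b.dest:
        found.add("WAW")   # both write the same register
    if b.dest is not None and b.dest in a.srcs:
        found.add("WAR")   # b overwrites a register that a reads
    return found


a = Instr("DIV.D F0, F2, F4", "F0", ("F2", "F4"))
b = Instr("ADD.D F10, F0, F8", "F10", ("F0", "F8"))
c = Instr("ADD.D F0, F0, F2", "F0", ("F0", "F2"))

print(sorted(hazards(a, b)))  # ['RAW']
print(sorted(hazards(a, c)))  # ['RAW', 'WAW']
```

The third check corresponds to the open question on the slide. Since instructions leave ID in program order with their operands already in hand, A has read its sources before B can write, so a match there does not force a stall in this pipeline.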

  3. Summary
     • What is the performance goal of the pipeline?
       – Try to achieve a CPI close to 1
       – Stalls for
         • Structural hazard (contention for WB) (expected to be rare)
         • WAW hazard (expected to be rare)
         • RAW hazards
       – Reduce number of stalls by forwarding
       – ??
         • Hint: Reduce penalty due to stalls
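As a rough way to quantify the CPI goal above, the standard relation is effective CPI = ideal CPI + stall cycles per instruction, with an ideal CPI of 1 for this pipeline. The function name and the sample numbers below are illustrative assumptions, not figures from the slides.

```python
def effective_cpi(instructions: int, stall_cycles: int) -> float:
    """Effective CPI of a pipeline whose ideal CPI is 1."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return 1.0 + stall_cycles / instructions


# Hypothetical run: 100 instructions that incur 25 stall cycles in total.
print(effective_cpi(100, 25))  # 1.25
```

Fewer stalls and a smaller penalty per stall both reduce the second term.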

  4. Instruction Level Parallelism: Head-of-Line Blocking
     • No space in Green Lane for the Green Car A. Waiting for space. (Structural Hazard)
     • Green Car A has a flat tire. Waiting to be fixed. (RAW Hazard)
     • All cars on main road stalled till A can progress

  5. Example: Head-of-Line Blocking
     • DIV F0, F2, F4
     • SD F0, 0(R0)
     • MUL F6, F8, F10

  6. Head-of-Line Blocking
     • 4-stage fully pipelined adder (A1 to A4); non-pipelined multiplier (4 cycles) and divider (6 cycles)
     • [Figure: pipeline diagram (IF, ID/REG, EX, MEM, WB, adder, MUL, DIV) with DIV, SD and MUL in flight]
     • SD will hold up the following MUL even though it is independent
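The blocking effect in this example can be sketched as a scan over the instructions waiting behind the DIV. This is a simplified model with hypothetical names (`head_of_line`, `busy_regs`): only RAW stalls are considered, and `busy_regs` stands for the destination registers of instructions still executing.

```python
from typing import List, Set, Tuple


def head_of_line(program: List[Tuple[str, Tuple[str, ...]]],
                 busy_regs: Set[str]) -> List[Tuple[str, str]]:
    """List the instructions that cannot leave ID under strict in-order issue.

    program   : (name, source registers) pairs in program order
    busy_regs : destination registers of instructions still executing
    """
    blocked = []
    stalled = False
    for name, srcs in program:
        has_raw = any(reg in busy_regs for reg in srcs)
        stalled = stalled or has_raw      # once one stalls, all later ones wait
        if stalled:
            blocked.append((name, "RAW" if has_raw else "independent"))
    return blocked


# DIV F0, F2, F4 is executing, so F0 is not yet available.
waiting = [
    ("SD F0, 0(R0)", ("F0", "R0")),
    ("MUL F6, F8, F10", ("F8", "F10")),
]
print(head_of_line(waiting, {"F0"}))
# [('SD F0, 0(R0)', 'RAW'), ('MUL F6, F8, F10', 'independent')]
```

The MUL is reported as blocked although none of its operands are busy, which is the head-of-line blocking the slide describes.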

  7. Instruction Level Parallelism
     • Provide a separate staging area where cars can wait
     • We did that for structural hazards for the ID/EX pipeline register
     • Can we do it for RAW stalls?

  8. Instruction Level Parallelism
     • Provide a separate staging area where cars can wait
     • Lots more concurrency
     • What is the cost?

  9. Instruction Level Parallelism: Motivating Example
     • RAW dependency between A and B
       – A: DIVD F0, F2, F4
       – B: ADDD F10, F0, F8
       – C: MULTD F12, F8, F14
     • Current multi-cycle FP unit design: B is stalled in the ID stage till A produces its result
     • All instructions after B will also be stalled till B’s stall clears

  10. Scoreboard, T = 1
      • A: DIVD F0, F2, F4; B: ADDD F10, F0, F8; C: MULTD F12, F8, F14
      • [Figure: pipeline snapshot (IR, ID/R, Issue Register, ADD, DIV, MUL, EX, MEM, WB) showing A and B]

  11. Scoreboard, T = 2
      • [Figure: same pipeline and instruction sequence, showing A and B]

  12. Scoreboard, T = 3
      • [Figure: same pipeline and instruction sequence, showing A and B]

  13. Scoreboard, T = 4
      • [Figure: same pipeline and instruction sequence, showing A and B]

  14. Scoreboard, T = 5
      • [Figure: same pipeline and instruction sequence, showing A and B]

  15. Scoreboard, T = 6
      • [Figure: same pipeline and instruction sequence, showing B and C]

  16. Scoreboard, T = 7
      • [Figure: same pipeline and instruction sequence, showing B and C]

  17. Scoreboard, T = 8
      • [Figure: same pipeline and instruction sequence, showing B and C]
      • C completes at cycle 12

  18. Instruction Level Parallelism: Motivating Example
      • RAW dependency between A and B
        – A: DIVD F0, F2, F4
        – B: ADDD F10, F0, F8
        – C: MULTD F12, F8, F14
      • Current multi-cycle FP unit design: B is stalled in the ID stage till A produces its result
      • All instructions after B will also be stalled till B’s stall clears
      • C has no resource or data conflicts
      • Why not allow C to execute while B waits for data?

  19. Scoreboard, T = 1 (with staging area)
      • A: DIVD F0, F2, F4; B: ADDD F10, F0, F8; C: MULTD F12, F8, F14
      • [Figure: pipeline snapshot (IR, Issue, an issue register R in front of each of ADD, DIV and MUL, EX, MEM, WB) showing A and B]

  20. Scoreboard, T = 2
      • [Figure: same pipeline and instruction sequence, showing A, B and C]

  21. Scoreboard, T = 3
      • [Figure: same pipeline and instruction sequence, showing A, B and C]

  22. Scoreboard, T = 4
      • [Figure: same pipeline and instruction sequence, showing A, B and C]

  23. Scoreboard, T = 5
      • [Figure: same pipeline and instruction sequence, showing A, B and C]

  24. Scoreboard, T = 6
      • [Figure: same pipeline and instruction sequence, showing A, B and C]

  25. Scoreboard, T = 7
      • [Figure: same pipeline and instruction sequence, showing B and C]

  26. Scoreboard, T = 8
      • [Figure: same pipeline and instruction sequence, showing B and C]

  27. Scoreboard, T = 9
      • [Figure: same pipeline and instruction sequence, showing B]
      • All complete in 10 cycles

  28. Scoreboard Operation: RAW
      • A: DIV.D F0, F2, F4
      • B: MUL.D F6, F0, F8
      • C: ADD.D F10, F12, F14
      • Timeline (one symbol per cycle; each instruction is fetched one cycle after its predecessor):
        – A: IF I R / / / / / W
        – B: IF I R R R R R R * * * * W
        – C: IF I R + + W
      • B waits in the Issue Register till A writes to F0. Meanwhile C enters, completes the ADD, writes F10 and exits.

  29. Scoreboard Operation: WAW
      • A: DIV.D F0, F2, F4 (writes F0)
      • B: MUL.D F6, F0, F8
      • C: ADD.D F0, F10, F12 (writes F0)
      • Timeline with the hazard:
        – A: IF I R / / / / / W
        – B: IF I R R R R R R * * * * W
        – C: IF I + + W
      • WAW hazard, since C’s write will be lost when A completes
      • Get rid of the WAW hazard
        – Do not issue an instruction with the same destination register as an in-flight instruction
        – Do not issue C while a previous in-flight instruction has the same destination register
      • Timeline with C held in the issue stage:
        – A: IF I R / / / / / W
        – B: IF I R R R R R R * * * * W
        – C: IF I I I I I I I R + + W

  30. Scoreboard Operation: WAR
      • A: DIV.D F0, F2, F4
      • B: MUL.D F6, F0, F8 (reads F0 and F8)
      • C: ADD.D F8, F10, F12 (writes F8)
      • Timeline:
        – A: IF I R / / / / / / W
        – B: IF I R R R R R R R * * * * W
        – C: IF I R + + W
      • WAR hazard, since C’s write to F8 occurs before B reads F8
      • Get rid of the WAR hazard
      • Need to be careful not to confuse a WAR with a RAW

  31. Scoreboard Operation: WAR Hazards
      • Problem of distinguishing a RAW dependency from a WAR dependency
      • A: DIV.D F0, F2, F4 followed by B: MUL.D F6, F0, F8
        – DIV should be allowed to write F0 before MUL reads
      • C: MUL.D F6, F0, F8 followed by D: DIV.D F0, F2, F4
        – DIV should not be allowed to write F0 till MUL reads
      • P: MUL.D F6, F0, F8; Q: DIV.D F0, F2, F4; R: DIV.D F10, F0, F4
        – Q needs to distinguish between P and R, both of which may be stalled waiting to read F0
        – Q must wait for P to read F0 before overwriting it
        – Q must write to F0 before R reads it
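One way to express the P, Q, R rule is to compare program order: a write is held back only by earlier instructions that still need the old value. The sketch below assumes each in-flight instruction carries a sequence number and a "has read its operands" flag; the `Entry` record and `safe_to_write` are hypothetical names, and the slides do not say how the hardware tracks this.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    name: str
    seq: int                   # position in program order
    dest: Optional[str]
    srcs: Tuple[str, ...]
    has_read: bool = False     # True once source operands have been read


def safe_to_write(writer: Entry, window: List[Entry]) -> bool:
    """A write is safe when no EARLIER instruction still has to read the old value."""
    return not any(
        e.seq < writer.seq and writer.dest in e.srcs and not e.has_read
        for e in window
    )


p = Entry("P: MUL.D F6, F0, F8", 0, "F6", ("F0", "F8"))
q = Entry("Q: DIV.D F0, F2, F4", 1, "F0", ("F2", "F4"))
r = Entry("R: DIV.D F10, F0, F4", 2, "F10", ("F0", "F4"))
window = [p, q, r]

print(safe_to_write(q, window))  # False: P has not read F0 yet (WAR)
p.has_read = True
print(safe_to_write(q, window))  # True: R is later than Q, so it waits for Q (RAW)
```

R never blocks Q's write because R comes after Q in program order. It is the reader in a RAW dependency and must see Q's new value.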

  32. Operation of Issue (I) Stage
      • Issue stage, every cycle: check whether the instruction in the Instruction Register (IR) should be issued or stalled
        – A stalled instruction waits in IR and holds up all succeeding instructions
        – An issued instruction moves to the Issue Register of the functional unit it needs
      • The instruction in IR is stalled if there is either:
        – A structural hazard for an ID/EX register, or
        – A WAW dependency with some earlier issued instruction
      • Only one instruction with a given destination register is issued at any time
        – No WAW hazards
      • When an instruction is issued:
        – Update the Data Flow Graph (maintains dependency information between instructions)
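The issue-stage checks can be sketched as follows. This is a simplified model under stated assumptions: the class `IssueStage` and its fields are hypothetical, there is one Issue Register per functional unit, and the Data Flow Graph is reduced to a set of destination registers with a write pending.

```python
from typing import Dict, List, Optional, Set


class IssueStage:
    """Toy model: one Issue Register per functional unit, plus pending writes."""

    def __init__(self, units: List[str]) -> None:
        self.issue_reg: Dict[str, Optional[str]] = {u: None for u in units}
        self.pending_dests: Set[str] = set()   # stand-in for the Data Flow Graph

    def try_issue(self, name: str, unit: str, dest: Optional[str]) -> bool:
        """Move the instruction from IR to its unit's Issue Register, or stall."""
        if self.issue_reg[unit] is not None:
            return False                        # structural hazard: register occupied
        if dest is not None and dest in self.pending_dests:
            return False                        # WAW with an earlier issued instruction
        self.issue_reg[unit] = name
        if dest is not None:
            self.pending_dests.add(dest)        # record the pending write
        return True


stage = IssueStage(["ADD", "MUL", "DIV"])
print(stage.try_issue("A: DIV.D F0, F2, F4", "DIV", "F0"))     # True
print(stage.try_issue("B: MUL.D F6, F0, F8", "MUL", "F6"))     # True
print(stage.try_issue("C: ADD.D F0, F10, F12", "ADD", "F0"))   # False
```

B issues despite its RAW dependency on A, because RAW is resolved later while the instruction waits in its Issue Register. C stalls in IR on the WAW with A. The model does not clear `issue_reg` or `pending_dests`; a fuller version would do so at dispatch and write time.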

  33. Operation of Dispatch (R) and Write (W) stages
      • Dispatch stage, every cycle: check whether the instruction in the Issue Register can be dispatched or must stall
      • The instruction in the Issue Register is stalled if it has:
        – A structural hazard for the FU, or
        – A RAW dependency with an in-flight instruction
        – It waits until the FU is available and all its source operands are ready (RAW dependencies satisfied)
      • The instruction in the Issue Register is dispatched when all its operands are available
        – Read the source registers from the Register File
        – Dispatch the instruction to the FU stage
        – An operand is available when there is no in-flight instruction with a matching destination register
      • Write stage, every cycle: check which instructions in the EX/WB pipeline register are SAFE-TO-WRITE (WAR hazards)
        – Select an instruction I that is SAFE-TO-WRITE
        – Write the result of I to its destination register
        – Update the Data Flow Graph to indicate the write by I
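The dispatch condition can be sketched as a single predicate. The function name and arguments are illustrative assumptions: `in_flight_dests` stands for the destination registers of earlier instructions that have not yet written, and must not include the destination of the instruction being checked.

```python
from typing import Set, Tuple


def can_dispatch(srcs: Tuple[str, ...], fu_busy: bool,
                 in_flight_dests: Set[str]) -> bool:
    """True when the FU is free and no source register has a pending write."""
    if fu_busy:
        return False                                  # structural hazard for the FU
    return all(reg not in in_flight_dests for reg in srcs)   # no RAW outstanding


# B: MUL.D F6, F0, F8 while A: DIV.D F0, F2, F4 has not yet written F0.
print(can_dispatch(("F0", "F8"), fu_busy=False, in_flight_dests={"F0"}))  # False
# After A writes F0, B may read its operands and move to the multiplier.
print(can_dispatch(("F0", "F8"), fu_busy=False, in_flight_dests=set()))   # True
```

Evaluating the predicate every cycle gives the behaviour on the slide: the instruction stays in its Issue Register until both conditions hold.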
