Control (Branch) Hazards

  1. Control (Branch) Hazards
     Example code: A: beqz R2, L1, followed in line by B, C, D; branch target L1: P.
     Naïve (lazy) implementation of a conditional branch instruction in the DLX pipeline:
     • IF: Fetch the branch instruction from IM (instruction memory).
     • ID: Decode the instruction and read the registers to be used in the comparison.
     • EX: Determine the branch outcome: compare the source register values to zero (or to each other) and set a FLAG in the output pipeline register.
     • MEM: Compute the target address: (PC + 4), carried along with the instruction, plus the sign-extended and shifted displacement.
     At the end of the clock cycle, PC is assigned the target address if the instruction now in MEM is a successful branch; otherwise execution continues as normal.
     Successful (or taken) branch: (i) the instruction is a branch and (ii) the branch condition is true.
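The target-address and PC-update rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a 16-bit displacement field, a left shift by 2 (word-aligned instructions), and 32-bit addresses are assumed here, and the function names are hypothetical.

```python
def sign_extend(value, bits=16):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def branch_target(pc, displacement):
    """(PC + 4) + sign-extended, shifted displacement (assumed shift: 2 bits)."""
    return (pc + 4 + (sign_extend(displacement) << 2)) & 0xFFFFFFFF


def next_pc(pc, is_branch, condition_true, displacement):
    """PC gets the target only for a successful (taken) branch."""
    taken = is_branch and condition_true
    return branch_target(pc, displacement) if taken else pc + 4
```

For example, `next_pc(0x100, True, True, 3)` yields `0x110`, while the same call with `condition_true=False` falls through to `0x104`.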

  2. Naive Implementation of Branch Equal Instruction: BEQZ Rs, d
     [Datapath figure across the IF, ID, EX, MEM, WB stages: PC, adder (PC + 4), IM, decode, register file (rs, rt, rd), sign extend (SE) and shift left by 2, zero FLAG, AND gate, and MUX selecting the next PC. Annotations: "Branch outcome known: outcome of the branch known at end of cycle 3"; "Compute target address"; "PC updated with new value (branch target or PC+4 value) at end of cycle 4".]

  3. Control Hazard
     [Pipeline snapshots (IF ID EX MEM WB) at T = 1, T = 2, T = 3, and T = 4: the branch A advances through the pipeline; at T = 4 the next instruction is B (branch NOT TAKEN) or P (TAKEN BRANCH).]

  4. Control (Branch) Hazards: Problems
     • The target address of the branch is not known (at least) until the instruction is decoded. What is the address of instruction P?
     • The outcome of the branch (taken / not taken) is determined deep in the pipeline. Should we execute B or P after A?
     • What should the pipeline (processor) do after fetching the branch instruction?
     SOLUTION 1: Delay the next instruction until we know the outcome of the branch and the address of the next instruction.
     • Software: Add 3 NOPs after every branch instruction.
     • Hardware: A Hazard Detection Unit checks for a branch instruction in the ID, EX, or MEM stage and stalls the PC / inserts NOPs.

  5. Simple Software Solution: Insert NOPs
     Code: A: beqz R2, label; NOP; NOP; NOP; B; -----; label: P
     Possible execution sequences:
     • Branch Not Taken: A, NOP, NOP, NOP, B
     • Branch Taken: A, NOP, NOP, NOP, P
     • Adds 3 cycles to execution time for every branch.
     Cycle   1    2    3    4    5    6    7    8    9
     A       IF   ID   EX   MEM  WB
     NOP          IF   ID   EX   MEM  WB
     NOP               IF   ID   EX   MEM  WB
     NOP                    IF   ID   EX   MEM  WB
     B or P                      IF   ID   EX   MEM  WB
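The software scheme on this slide amounts to a simple assembler or compiler pass. The sketch below is a hypothetical illustration, not a real toolchain API: instructions are plain strings, an optional `label:` prefix is skipped, and the set of branch mnemonics is assumed for the example.

```python
# Assumed branch mnemonics, chosen only for this illustration.
BRANCH_MNEMONICS = ("beqz", "bnez", "beq", "bne")


def insert_branch_nops(instructions, delay_slots=3):
    """Return a copy of the program with `delay_slots` NOPs after every branch."""
    padded = []
    for instr in instructions:
        padded.append(instr)
        # Drop an optional "label:" prefix before reading the mnemonic.
        parts = instr.split(":", 1)[-1].split()
        mnemonic = parts[0].lower() if parts else ""
        if mnemonic in BRANCH_MNEMONICS:
            padded.extend(["nop"] * delay_slots)
    return padded


program = ["A: beqz R2, label", "add R1, R2, R3", "label: sub R4, R5, R6"]
padded_program = insert_branch_nops(program)
```

With the default of 3 delay slots, every branch costs 3 extra cycles whether or not it is taken, which is the cost the slide states.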

  6. Control Hazard
     [Pipeline snapshots (IF ID EX MEM WB) at T = 1 to T = 4: A advances through the pipeline followed by one, two, and then three NOPs; at T = 4 the next instruction is P or B.]

  7. Hardware-Controlled Pipeline Stall
     Code: A: BEQ R1, R2, L1; B: ----; C: ---; L1: P
     Cycle   1    2    3    4    5    6    7    8    9
     A       IF   ID   EX   MEM  WB
     B            IF   IF   IF
     P                           IF   ID   EX   MEM  WB
     Branch Taken: 3 Additional Cycles

  8. Hardware-Controlled Pipeline Stall
     Code: A: BEQ R1, R2, L1; B: ----; C: ---; L1: P
     Cycle   1    2    3    4    5    6    7    8    9
     A       IF   ID   EX   MEM  WB
     B            IF   IF   IF   IF   ID   EX   MEM  WB
     Branch Not Taken: 3 Additional Cycles

  9. Hardware-Controlled Pipeline Stall
     Code: A: BEQ R1, R2, L1; B: ----; C: ---; L1: P
     Cycle   1    2    3    4    5    6    7    8    9
     A       IF   ID   EX   MEM  WB
     B            IF   IF   IF   ID   EX   MEM  WB
     C                           IF   ID   EX   MEM  WB
     Optimized Branch Not Taken: PC gets address of C

  10. Hazard Detection Unit
      [Figure: the HDU attached to the IF ID EX MEM WB pipeline, with control signals "Freeze register: do not update" (to the PC) and "Insert NOP".]
      Stall the PC and insert a NOP into IF/ID if there is a branch instruction in either the IF/ID, ID/EX, or EX/MEM pipeline register.
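The stall condition stated for the Hazard Detection Unit can be modelled with a tiny cycle-by-cycle loop. This is a simplified sketch, not the slide author's hardware: pipeline registers hold only an opcode string, the opcode set is assumed, and the program is treated as straight-line code (the redirect to the branch target is not modelled), so it only counts the stall cycles.

```python
NOP = "nop"
# Assumed branch opcodes, for illustration only.
BRANCH_OPCODES = {"beqz", "bnez", "beq", "bne"}


def must_stall(if_id, id_ex, ex_mem):
    """HDU rule: stall if a branch sits in IF/ID, ID/EX, or EX/MEM."""
    return any(op in BRANCH_OPCODES for op in (if_id, id_ex, ex_mem))


def count_stall_cycles(program):
    """Fetch a list of opcodes through the pipeline and count HDU stalls."""
    if_id = id_ex = ex_mem = NOP
    pc = 0
    stalls = 0
    while pc < len(program):
        stall = must_stall(if_id, id_ex, ex_mem)
        # Advance the pipeline registers by one stage.
        ex_mem, id_ex = id_ex, if_id
        if stall:
            if_id = NOP          # freeze the PC and insert a NOP into IF/ID
            stalls += 1
        else:
            if_id = program[pc]  # normal fetch
            pc += 1
    return stalls
```

In this model each branch holds the next fetch for the three cycles in which it occupies IF/ID, ID/EX, and EX/MEM, so `count_stall_cycles(["beqz", "add"])` gives 3.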

  11. Hardware Controlled Pipeline Stall
      Code: A: BEQ R1, R2, L1; B; C; ---; L1: P
      [Pipeline snapshots (IF ID EX MEM WB) at T = 2 and T = 3: B is stalled while A moves down the pipeline.]
      • Instruction B (address) is held in the PC register until A reaches the WB stage.
      • Internally generated NOPs are propagated forward while B is stalled.

  12. Hardware Controlled Pipeline Stall
      [Pipeline snapshots (IF ID EX MEM WB): T = 4 with B still held behind A; T = 5, TAKEN BRANCH, with P fetched; T = 5, BRANCH NOT TAKEN, with B proceeding; OPTIMIZED, with C fetched behind B.]

  13. Branch Delay Slots
      Software solution:
      • Software must delay the execution of the next-in-line instruction after the branch.
      • The delay depends on the pipeline structure.
      • The microarchitecture is exposed to the software (compiler).
      Branch delay slots:
      • Delay introduced by software to avoid control hazards
      • Dummy instructions following the branch instruction, for the purpose of creating delays until the new PC value can be set
      • Instructions in the branch delay slots are always executed
      • In our design: 3 branch delay slots
      • The microarchitecture might choose not to expose all the delay slots and use some hardware mechanism to provide the remaining delay

  14. Performance of Simple Stall Based Schemes
      • The stall scheme has a branch penalty of 3 cycles (may be 2 in an optimized hardware design).
      • Software-inserted NOPs: 3 cycles
      • Hardware-inserted stall cycles: 3 (non-optimized)
      Example: Suppose the branch frequency is 20% and 60% of branches are taken. Assume the software solution with penalties as above. Assume the compiler is able to fill 20% of the branch delay slots with useful instructions. How is CPI affected?
      Each branch instruction incurs an extra delay of 3 cycles, except for the delay slots filled with useful instructions. Because the delay slots are always executed, the penalty does not depend on whether the branch is taken.
      Branch penalty (per executed instruction) = 20% x 3 (delay slots) x 80% (unfilled delay slots) = 0.48 cycles
      CPI = Nominal CPI + Penalty cycles (per instruction)
      Assuming no other causes of stalls: CPI = 1.0 + 0.48 = 1.48
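The CPI arithmetic in this example can be checked with a short Python sketch. The function and parameter names are hypothetical; the numbers are the ones given on the slide (20% branch frequency, 3 delay slots, 20% of slots filled, nominal CPI of 1.0).

```python
def cpi_with_delay_slots(base_cpi, branch_freq, delay_slots, fill_rate):
    """CPI when every branch is followed by delay slots, some filled usefully."""
    # Only unfilled slots (NOPs) count as penalty cycles.
    penalty = branch_freq * delay_slots * (1.0 - fill_rate)
    return base_cpi + penalty


cpi = cpi_with_delay_slots(base_cpi=1.0, branch_freq=0.20,
                           delay_slots=3, fill_rate=0.20)
print(f"CPI = {cpi:.2f}")
```

The penalty term is 0.20 x 3 x 0.80 = 0.48 cycles per instruction, giving a CPI of 1.48. The 60% taken rate does not enter the formula, since delay-slot instructions execute on both paths.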

  15. Alternate Hardware Solution
      Code: A: beqz R2, label; B; C; D; E; --; label: P
      • Why delay in-line instructions B, C, D, etc.?
      • Let the instructions following A enter the pipeline normally.
      • Works if Branch Not Taken!
      Cycle   1    2    3    4    5    6    7    8    9
      A       IF   ID   EX   MEM  WB
      B            IF   ID   EX   MEM  WB
      C                 IF   ID   EX   MEM  WB
      D                      IF   ID   EX   MEM  WB
      E                           IF   ID   EX   MEM  WB

  16. Control Hazard: No Stall Cycles
      [Pipeline snapshots (IF ID EX MEM WB), BRANCH NOT TAKEN. Instructions shown: T = 1: A, B. T = 2: B, A, C. T = 3: C, B, A, D. T = 4: D, C, B, A, E.]

  17. Speculation: Alternate Hardware Solution
      Code: A: beqz R2, label; B; C; D; E; --; label: P
      What if the branch is taken?
      • B, C, D have not updated machine state at cycle 4.
      • Flush B, C, D at the end of cycle 4.
      Cycle   1    2    3    4    5    6    7    8    9
      A       IF   ID   EX   MEM  WB
      B            IF   ID   EX   (flushed)
      C                 IF   ID   (flushed)
      D                      IF   (flushed)
      P                           IF   ID   EX   MEM  WB

  18. Control Hazard
      [Pipeline snapshots (IF ID EX MEM WB), TAKEN BRANCH. Instructions shown: T = 1: A, B. T = 2: B, A, C. T = 3: C, B, A, D. T = 4: D, C, B, A, P.]

  19. Control Hazard
      [Pipeline snapshots (IF ID EX MEM WB), TAKEN BRANCH without a flush. Instructions shown: T = 4: D, C, B, P, A. T = 5: P, D, C, B, Q. T = 6: Q, P, D, C, R. T = 7: R, Q, P, D, S.]
      TAKEN BRANCH: writes to MEM or REG by B, C, or D will result in an error.

  20. Alternate Hardware Solution: Taken Branch
      [Pipeline snapshots (IF ID EX MEM WB). Instructions shown: T = 2: A, B. T = 3: B, A, C. T = 4: C, B, A, D. Final snapshot (labelled T = 4 in the source): A, P.]
      Insert NOP in IF/ID, ID/EX, EX/MEM.

  21. Branch Penalty in Modified Hardware Scheme
      • More than an optimized implementation of stall: a simple form of control speculation
      • Speculating that it is a NOT TAKEN branch
      • Continue fetching in-line instructions
      • Performance depends on the accuracy of speculation
      • Speculation correct (NOT TAKEN branch): continue with no stalls (0 penalty cycles)
      • Speculation incorrect (TAKEN branch): flush 3 trailing instructions (3 penalty cycles)
      Example: Branch frequency is 20%; 5% of branches are unconditional branches; 70% of conditional branches are NOT TAKEN.
      CPI = Nominal CPI + Penalty cycles for TAKEN branch + Penalty cycles for NOT TAKEN branch
      Penalty cycles for NOT TAKEN branch = 0
      Penalty cycles for TAKEN branch = Penalty cycles for UNCONDITIONAL branch + Penalty cycles for TAKEN CONDITIONAL branch
      = 20% x 5% x 3 + 20% x 95% x 30% x 3 = 0.03 + 0.171 = 0.201
      CPI = 1.0 + 0.201 = 1.201
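The penalty calculation for the predict-not-taken scheme can be reproduced with a small Python sketch. The function and parameter names are hypothetical; the inputs are the slide's figures (20% branches, 5% of them unconditional, 70% of conditional branches not taken, 3-cycle flush).

```python
def cpi_predict_not_taken(base_cpi, branch_freq, uncond_frac,
                          not_taken_frac, flush_penalty):
    """CPI when every branch is speculated NOT TAKEN and taken branches flush."""
    # Unconditional branches are always taken, so they always pay the flush.
    uncond = branch_freq * uncond_frac * flush_penalty
    # Conditional branches pay only when the speculation is wrong (taken).
    taken_cond = (branch_freq * (1.0 - uncond_frac)
                  * (1.0 - not_taken_frac) * flush_penalty)
    return base_cpi + uncond + taken_cond


cpi = cpi_predict_not_taken(base_cpi=1.0, branch_freq=0.20, uncond_frac=0.05,
                            not_taken_frac=0.70, flush_penalty=3)
print(f"CPI = {cpi:.3f}")
```

The two penalty terms are 0.03 and 0.171 cycles per instruction, so the CPI comes out to 1.201.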

  22. Predict branch: Not Taken; Actually Not Taken
      [Pipeline snapshots (IF ID EX MEM WB), no stall cycles. Instructions shown: T = 1: A, B. T = 2: B, A, C. T = 3: C, B, A, D. T = 4: D, C, B, A, E.]
      BRANCH NOT TAKEN: do nothing.

  23. Predict branch: Not Taken; Actually Taken
      [Pipeline snapshots (IF ID EX MEM WB). Instructions shown: T = 1: A, B. T = 2: B, A, C. T = 3: C, B, A, D. T = 4: A, P.]
      Branch actually taken: FLUSH the pipeline; make B, C, D NOPs.

  24. More Control Speculation
      Can we predict the branch as taken?
      • Speculatively fetch and execute instructions at the branch target
      • Useful only if the target address is known earlier than the branch outcome
      • May require stall cycles until the target address is known
      • Flush the pipeline if the prediction is incorrect
      • Must ensure that flushed instructions do not update any machine state
      Assumptions:
      • The target address is computed in the ID stage
      • Stall of 1 cycle until the PC is updated with the target address (ALWAYS!)
      • The branch outcome is known at the end of cycle 3, in the EX stage

  25. Predict branch taken
      [Pipeline snapshots (IF ID EX MEM WB). Instructions shown: T = 1: A, B. T = 2: A, P. T = 3: P, A, Q. T = 4: Q, P, A, R.]
      Branch actually taken: single stall cycle.

  26. Predict branch taken
      [Pipeline snapshots (IF ID EX MEM WB). Instructions shown: T = 1: A, B. T = 2: A, P. T = 3: A, B. T = 4: B, A, C.]
      Branch actually not taken: FLUSH the pipeline; make P a NOP; 2 wasted cycles.
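The per-branch costs of the predict-taken scheme (1 stall cycle when the branch is actually taken, 2 wasted cycles when it is not) can be folded into a CPI estimate in the same way as the earlier examples. The sketch below is illustrative only: the function name is hypothetical, and the 20% branch frequency and 60% taken rate used in the call are assumed inputs, not results from the slides.

```python
def cpi_predict_taken(base_cpi, branch_freq, taken_frac,
                      taken_penalty=1, not_taken_penalty=2):
    """CPI when every branch is predicted TAKEN.

    taken_penalty: stall until the target address reaches the PC.
    not_taken_penalty: cycles wasted when the prediction is wrong.
    """
    per_branch = (taken_frac * taken_penalty
                  + (1.0 - taken_frac) * not_taken_penalty)
    return base_cpi + branch_freq * per_branch


# Assumed workload: 20% branches, 60% of them taken.
cpi = cpi_predict_taken(base_cpi=1.0, branch_freq=0.20, taken_frac=0.60)
print(f"CPI = {cpi:.2f}")
```

Under these assumed inputs the average cost is 1.4 cycles per branch, or 0.28 cycles per instruction. Unlike predict-not-taken, this scheme never reaches a zero penalty, because the 1-cycle stall for the target address is paid on every branch.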
