BRANCH PREDICTORS: Mahdi Nazm Bojnordi, Assistant Professor, School of Computing, University of Utah

  1. BRANCH PREDICTORS
     Mahdi Nazm Bojnordi, Assistant Professor
     School of Computing, University of Utah
     CS/ECE 6810: Computer Architecture

  2. Overview
     - Announcements
       - Homework 2 release: Sept. 26th
     - This lecture
       - Dynamic branch prediction
       - Counter-based branch predictor
       - Correlating branch predictor
       - Global vs. local branch predictors

  3. Big Picture: Why Branch Prediction?
     - Problem: performance is mainly limited by the number of instructions fetched per second
     - Solution: deeper and wider frontend
     - Challenge: handling branch instructions

  4. Big Picture: How to Predict Branch?
     - Static prediction (based on direction or profile)
       - Always not-taken: target = next PC
       - Always taken: target = unknown
     - Dynamic prediction
       - Special hardware using PC
     [Figure: fetch datapath with PC, a +4 adder producing NPC, instruction memory, and predicted direction and target signals]

  5. Recall: Dynamic Branch Prediction
     - Hardware unit capable of learning at runtime
       1. Prediction logic
          - Direction (taken or not-taken)
          - Target address (where to fetch next)
       2. Outcome validation and training
          - Outcome is computed regardless of prediction
       3. Recovery from misprediction
          - Nullify the effect of instructions on the wrong path

  7. Branch Prediction
     - Goal: avoiding stall cycles caused by branches
     - Solution: static or dynamic branch predictor
       1. prediction
       2. validation and training
       3. recovery from misprediction
     - Performance is influenced by the frequency of branches (b), prediction accuracy (a), and misprediction cost (c)

     $$\text{Speedup} = \frac{\text{Old Time}}{\text{New Time}} = \frac{\text{CPI}_{\text{old}}}{\text{CPI}_{\text{new}}} = \frac{1 + bc}{1 + (1 - a)\,bc}$$

  9. Problem
     - A pipelined processor requires 3 stall cycles to compute the outcome of every branch before fetching the next instruction. Due to perfect forwarding/bypassing, no stall cycles are required for data/structural hazards. Every 5th instruction is a branch.
       - Compute the speedup gained by a branch predictor with 90% accuracy.
     - Solution (b = 0.2, a = 0.9, c = 3): Speedup = (1 + 0.2 × 3) / (1 + 0.1 × 0.2 × 3) = 1.6 / 1.06 ≈ 1.5
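
The speedup arithmetic above can be checked with a few lines of C. This is a minimal sketch that assumes a base CPI of 1, as the formula does; the function name `branch_speedup` is chosen here for illustration and does not come from the slides.

```c
#include <assert.h>

/* Speedup of a branch predictor over stalling on every branch,
 * assuming a base CPI of 1 (assumption taken from the formula).
 *   b: fraction of instructions that are branches
 *   a: prediction accuracy in [0, 1]
 *   c: cycles lost on a stalled or mispredicted branch */
double branch_speedup(double b, double a, double c)
{
    assert(b >= 0.0 && b <= 1.0);
    assert(a >= 0.0 && a <= 1.0);
    double cpi_old = 1.0 + b * c;              /* every branch pays c cycles */
    double cpi_new = 1.0 + (1.0 - a) * b * c;  /* only mispredictions pay c  */
    return cpi_old / cpi_new;
}
```

Calling `branch_speedup(0.2, 0.9, 3.0)` reproduces the problem's setup: one branch every five instructions, 90% accuracy, and a 3-cycle penalty.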

  14. Bimodal Branch Predictors
      - One-bit branch predictor
        - Keep track of and use the outcome of the last branch
      - Shared predictor
      - Two mispredictions per loop
      - Accuracy = 26/30 ≈ 0.87
      - How to improve?

      while (1) {
          for (i = 0; i < 10; i++) { }   /* branch-1 */
          for (j = 0; j < 20; j++) { }   /* branch-2 */
      }

      [Figure: two-state diagram with states N and T; a taken outcome moves to T, a not-taken outcome moves to N]
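
The 26/30 figure can be reproduced with a small simulation. The sketch below assumes each loop compiles to one loop-back branch that is taken on every execution except the last (10 executions for branch-1, 20 for branch-2); that modelling choice and the function names are assumptions made here, not something the slides spell out.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of the k-th dynamic branch in one pass of the outer loop:
 * executions 0..9 belong to branch-1, 10..29 to branch-2. Each loop
 * branch is taken except on its final (exit) execution. */
static bool loop_outcome(int k)
{
    return !(k == 9 || k == 29);
}

/* Correct predictions made by one shared one-bit predictor over
 * `passes` passes of the outer loop, after one unscored warm-up pass. */
int one_bit_correct(int passes)
{
    assert(passes > 0);
    bool last = false;   /* the single bit: outcome of the last branch */
    int correct = 0;
    for (int p = 0; p <= passes; p++) {
        for (int k = 0; k < 30; k++) {
            bool taken = loop_outcome(k);
            if (p > 0 && last == taken)
                correct++;
            last = taken;   /* train on the actual outcome */
        }
    }
    return correct;
}
```

Each pass scores 26 correct predictions out of 30: the predictor misses the first and the last execution of each loop branch.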

  18. Bimodal Branch Predictors
      - Two-bit branch predictor
        - Increment if taken
        - Decrement if not taken
      - One misprediction on loop exit
      - Accuracy = 28/30 ≈ 0.93
      - How to improve?
        - 3-bit predictor?
      - Problem?
        - A single predictor shared among many branches

      while (1) {
          for (i = 0; i < 10; i++) { }   /* branch-1 */
          for (j = 0; j < 20; j++) { }   /* branch-2 */
      }

      [Figure: four-state saturating counter with states 00, 01, 10, 11; taken moves toward 11, not-taken moves toward 00]
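
A two-bit saturating counter is short enough to sketch directly. The code below assumes the usual convention that states 10 and 11 predict taken, and it models each loop as one loop-back branch that is taken on all but its last execution; the names `counter2_predict`, `counter2_update`, and `two_bit_correct` are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned char counter2_t;   /* holds 0..3 (binary 00..11) */

/* Assumed convention: the upper half of the range predicts taken. */
bool counter2_predict(counter2_t c)
{
    return c >= 2;
}

/* Increment if taken, decrement if not taken, saturating at 0 and 3. */
counter2_t counter2_update(counter2_t c, bool taken)
{
    assert(c <= 3);
    if (taken)
        return (counter2_t)(c < 3 ? c + 1 : 3);
    return (counter2_t)(c > 0 ? c - 1 : 0);
}

/* Correct predictions of one shared two-bit counter over `passes`
 * passes of the 10 + 20 iteration loops, after one warm-up pass. */
int two_bit_correct(int passes)
{
    assert(passes > 0);
    counter2_t c = 0;
    int correct = 0;
    for (int p = 0; p <= passes; p++) {
        for (int k = 0; k < 30; k++) {
            bool taken = (k != 9 && k != 29);   /* loop exits are not taken */
            if (p > 0 && counter2_predict(c) == taken)
                correct++;
            c = counter2_update(c, taken);
        }
    }
    return correct;
}
```

A single not-taken outcome only moves the counter from 11 to 10, so the first iteration of the next loop is still predicted taken. That leaves one misprediction per loop, at the exit.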

  22. Using Multiple Counters
      - How to assign a branch to each counter?
        1. How many branches are in a program?
        2. How many counters are used?
      - Cost = $n \cdot 2^{a}$ bits
      [Figure: the a-bit PC selects one of the n-bit counters; the program code contains branch-1, branch-2, and branch-3]
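
To see why one counter per possible PC value is impractical, it helps to plug in numbers. The values below (a = 32 address bits, n = 2 counter bits) are assumptions picked for illustration, not figures from the slides.

```latex
\begin{aligned}
\text{Cost} &= n \cdot 2^{a} = 2 \cdot 2^{32}\ \text{bits} = 2^{33}\ \text{bits} \\
            &= 2^{30}\ \text{bytes} = 1\ \text{GiB}
\end{aligned}
```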

  25. Using Multiple Counters
      - How to assign a branch to each counter?
        - Decode History Table (DHT)
          - Reduced HW with aliasing
          - Least significant bits are used to select a counter
          - Cost = $n \cdot 2^{b}$ bits
          - (+) Reduced hardware
          - (-) Branch aliasing
      [Figure: b bits of the PC select one of the n-bit counters; the program code contains branch-1, branch-2, and branch-3]
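
Counter selection in a DHT comes down to masking the PC. This sketch uses the raw low-order b bits, as the slide describes; the function names are hypothetical, and a real design might first drop instruction-alignment bits, which is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Select a counter using the b least significant bits of the PC. */
uint32_t dht_index(uint32_t pc, unsigned b)
{
    assert(b > 0 && b < 32);
    return pc & ((UINT32_C(1) << b) - 1);
}

/* Storage in bits for 2^b counters of n bits each: n * 2^b. */
uint64_t dht_cost_bits(unsigned n, unsigned b)
{
    assert(b < 32);
    return (uint64_t)n << b;
}
```

With b = 8, the addresses 0x1234 and 0x5634 both map to index 0x34. That is the branch aliasing listed as the drawback: two different branches train the same counter.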

  28. Using Multiple Counters
      - How to assign a branch to each counter?
        - Decode History Table (DHT)
          - Reduced HW with aliasing
        - Branch History Table (BHT)
          - Precisely tracking branches
          - Most significant bits are used as tags
          - Cost = $(a - b + n) \cdot 2^{b}$ bits
          - (+) No aliasing
          - (-) Missing entries
      [Figure: b bits of the PC index the tag and counter arrays; the remaining a-b bits are compared (=) with the stored tag to produce hit/miss*]
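
A tagged lookup might be sketched as follows. The table size (`BHT_B` = 4 index bits), the 2-bit counters, and the `valid` flag are assumptions made here for illustration; the slide's cost formula counts only tag and counter bits, so the valid bit is extra.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BHT_B 4                       /* index bits (hypothetical size) */
#define BHT_ENTRIES (1u << BHT_B)

typedef struct {
    bool     valid;    /* assumption: not counted in the slide's cost */
    uint32_t tag;      /* upper a-b bits of the PC                    */
    uint8_t  counter;  /* n-bit counter, here n = 2                   */
} bht_entry_t;

typedef struct {
    bht_entry_t e[BHT_ENTRIES];
} bht_t;

/* Returns true on a tag hit and writes the predicted direction to
 * *taken; returns false on a miss and leaves *taken untouched. */
bool bht_lookup(const bht_t *t, uint32_t pc, bool *taken)
{
    assert(t != 0 && taken != 0);
    const bht_entry_t *e = &t->e[pc & (BHT_ENTRIES - 1)];
    if (!e->valid || e->tag != (pc >> BHT_B))
        return false;
    *taken = e->counter >= 2;
    return true;
}

/* Storage in bits: (a - b + n) * 2^b. */
uint64_t bht_cost_bits(unsigned a, unsigned b, unsigned n)
{
    assert(a >= b && b < 32);
    return (uint64_t)(a - b + n) << b;
}
```

Two branches that share an index no longer share a counter, because the tag comparison rejects the second one. The price is that the second branch gets no prediction at all on a miss.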

  31. Using Multiple Counters
      - How to assign a branch to each counter?
        - Decode History Table (DHT)
          - Reduced HW with aliasing
        - Branch History Table (BHT)
          - Precisely tracking branches
        - Combined BHT and DHT
          - BHT is used on a hit
          - DHT is used/updated on a miss
          - Cost = $(a - b + 2n) \cdot 2^{b}$ bits
          - DHT typically has more entries than BHT
      [Figure: b bits of the PC index both the BHT and the DHT; the remaining a-b bits are compared (=) with the BHT tag to choose between the two n-bit counters]
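
The hit/miss selection between the two tables can be sketched as a single choice. The function names and the 2-bit counter width are assumptions for illustration, and the cost helper assumes both tables have 2^b entries, as the slide's formula does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pick the counter that drives the prediction: the BHT counter on a
 * tag hit, otherwise the DHT counter. Counters are 2-bit (0..3). */
bool combined_predict(bool bht_hit, uint8_t bht_counter, uint8_t dht_counter)
{
    assert(bht_counter <= 3 && dht_counter <= 3);
    uint8_t c = bht_hit ? bht_counter : dht_counter;
    return c >= 2;
}

/* Storage in bits when both tables have 2^b entries:
 * (a - b + 2n) * 2^b. */
uint64_t combined_cost_bits(unsigned a, unsigned b, unsigned n)
{
    assert(a >= b && b < 32);
    return (uint64_t)(a - b + 2 * n) << b;
}
```

How a BHT entry is allocated or replaced after a miss is not specified on the slide, so it is left out of this sketch.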

  34. Correlating Branch Predictor
      - Executed branches of a program stream may be correlated

      while (1) {                while:
          if (x == 0)                   BNEQ R1, R0, skp1    ; branch-1
              y = 0;                    ADDI R2, R0, #0
          …                      skp1:  ...
          if (y == 0)                   BNEQ R2, R0, skp2    ; branch-2
              x = 1;                    ADDI R1, R0, #1
      }                          skp2:  J while

  35. Correlating Branch Predictor
      - Executed branches of a program stream may be correlated
      - Global History Register: an r-bit shift register that maintains outcome history

      while (1) {
          if (x == 0)      /* branch-1 */
              y = 0;
          …
          if (y == 0)      /* branch-2 */
              x = 1;
      }

      [Figure: each branch outcome (taken?) is shifted into the r-bit register]
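
Updating an r-bit global history register is a shift followed by a mask. In this minimal sketch the newest outcome enters at the least significant bit, which is an assumed bit order; the function name `ghr_update` is illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shift the newest branch outcome into an r-bit global history
 * register. The oldest outcome falls off the top. */
uint32_t ghr_update(uint32_t ghr, bool taken, unsigned r)
{
    assert(r > 0 && r < 32);
    uint32_t mask = (UINT32_C(1) << r) - 1;
    return ((ghr << 1) | (taken ? 1u : 0u)) & mask;
}
```

With r = 2, the register always holds the outcomes of the two most recent branches, so when branch-2 is predicted its history includes what branch-1 just did.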
