
1 Branch History Table of 1-bit Predictor 1-bit BHT Weakness BHT - PDF document



  1. Lecture 9: Branch Prediction
     Basic idea, saturating counter, BHT, BTB, return address prediction, correlating prediction

     Reducing Branch Penalty
     Branch penalty in dynamically scheduled processors: wasted cycles due to pipeline flushing on mis-predicted branches.
     Reduce branch penalty:
     1. Predict branch/jump instructions AND branch direction (taken or not taken)
     2. Predict branch/jump target address (for taken branches)
     3. Speculatively execute instructions along the predicted path

     What to Use and What to Predict
     Available info:
     - Current predicted PC
     - Past branch history (direction and target)
     What to predict:
     - Conditional branch inst: branch direction and target address
     - Jump inst: target address
     - Procedure call/return: target address
     May need instruction pre-decoded.
     [Diagram labels: PC, predictors, pred_PC, IM, PC & Inst, pred info, feedback PC]

     Mis-prediction Detections and Feedbacks
     Detections:
     - At the end of decoding: the target address is known at decoding and does not match the prediction. Flush the fetch stage.
     - At commit (most cases): wrong branch direction, or the target address does not match. Flush the whole pipeline (at EXE: MIPS R10000).
     Feedbacks:
     - Any time a mis-prediction is detected
     - At a branch's commit (at EXE: called speculative update)
     [Pipeline diagram stages: FETCH, RENAME, REB/ROB, SCHD, EXE, WB, COMMIT]

     Branch Direction Prediction
     Predict branch direction: taken or not taken (T/NT)
         BNE R1, R2, L1      (taken: continue at L1; not taken: fall through)
         …
     L1: …
     Static prediction: compilers decide the direction
     Dynamic prediction: hardware decides the direction using dynamic information
     1. 1-bit Branch-Prediction Buffer
     2. 2-bit Branch-Prediction Buffer
     3. Correlating Branch Prediction Buffer
     4. Tournament Branch Predictor
     5. and more …

     Predictor for a Single Branch
     General form:
     1. Access (input: PC)
     2. Predict (output: T/NT, from the predictor state)
     3. Feedback (actual outcome T/NT updates the state)
     1-bit prediction: two states. State 1 predicts taken and state 0 predicts not taken. Feedback T moves to (or stays in) state 1; feedback NT moves to (or stays in) state 0.
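
The access / predict / feedback cycle of the 1-bit predictor for a single branch can be sketched in a few lines of Python. This is a minimal illustration, not taken from the slides: the class name `OneBitPredictor`, the helper `count_mispredictions`, the initial state (predict not taken) and the outcome sequence are all assumptions chosen for the example.

```python
class OneBitPredictor:
    """1-bit predictor for a single branch: it remembers only the last outcome."""

    def __init__(self, initial_taken=False):
        # State 1 = predict taken, state 0 = predict not taken.
        # The initial state is an assumption; the slides do not specify one.
        self.state = 1 if initial_taken else 0

    def predict(self):
        """Step 2 (Predict): output T (True) or NT (False) from the state."""
        return self.state == 1

    def feedback(self, taken):
        """Step 3 (Feedback): the actual outcome overwrites the state."""
        self.state = 1 if taken else 0


def count_mispredictions(predictor, outcomes):
    """Run predict-then-feedback over a list of actual outcomes."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.feedback(taken)
    return misses


# Hypothetical outcome sequence: T, T, NT, T, T
outcomes = [True, True, False, True, True]
misses = count_mispredictions(OneBitPredictor(), outcomes)
print(misses)  # 3: the first T, the NT, and the T right after the NT
```

Each change of direction costs a misprediction, and the single NT in the middle costs two, because the predictor flips on every wrong guess.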

  2. Branch History Table of 1-bit Predictor
     BHT is also called the Branch Prediction Buffer in the textbook.
     Can use only one 1-bit predictor, but accuracy is low.
     BHT: use a table of simple predictors, indexed by bits from the PC (a K-bit branch address selects one of 2^k predictors).
     - Similar to a direct mapped cache
     - More entries, more cost, but fewer conflicts, higher accuracy
     - BHT can contain complex predictors
     [Diagram labels: K-bit branch address, 2^k entries, Prediction]

     1-bit BHT Weakness
     Example: in a loop, a 1-bit BHT will cause 2 mispredictions.
     Consider a loop of 9 iterations before exit:
         for (…){
             for (i=0; i<9; i++)
                 a[i] = a[i] * 2.0;
         }
     - End of loop case, when it exits instead of looping as before
     - First time through the loop on the next time through the code, when it predicts exit instead of looping
     - Only 80% accuracy even though the branch loops 90% of the time

     2-bit Saturating Counter
     Solution: 2-bit scheme where the prediction changes only after two mispredictions (Figure 3.7, p. 249).
     States 11 and 10: Predict Taken. States 01 and 00: Predict Not Taken. T and NT outcomes move between the states.
     Blue: stop, not taken. Gray: go, taken.
     Adds hysteresis to the decision making process.

     Branch Target Buffer
     Branch Target Buffer (BTB): the address of the branch indexes the buffer to get the prediction AND the branch target address (if taken).
     - Note: must check for a branch match now, since can't use the wrong branch address
     Example: BTB combined with BHT. The PC of the instruction in FETCH is compared (=?) with the Branch PC stored in the buffer; each entry holds Branch PC, Predicted PC, and extra prediction state bits.
     - Yes: the instruction is a branch; use the predicted PC as the next PC
     - No: the branch is not predicted; proceed normally (Next PC = PC+4)

     Return Addresses Prediction
     Register indirect branch: hard to predict the address
     - Many callers, one callee
     - Jump to multiple return addresses from a single address (no PC-target correlation)
     In SPEC89, 85% of such branches are procedure returns.
     Since procedures follow a stack discipline, save the return address in a small buffer that acts like a stack: 8 to 16 entries has a small miss rate.

     Correlating Branches
     Code example showing the potential:
         if (d==0)
             d=1;
         if (d==1)
             …
     Assembly code:
             BNEZ   R1, L1
             DADDIU R1, R0, #1
         L1: DADDIU R3, R1, #-1
             BNEZ   R3, L2
             …
         L2: …
     Observation: if BNEZ1 is not taken (d==0, so d is set to 1 and R3 becomes 0), then BNEZ2 is not taken either.
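
The 80% figure for the 1-bit scheme and the benefit of the 2-bit scheme can be checked with a short simulation. This is a sketch under stated assumptions: the loop branch is modelled as taken 9 times and then not taken once, the 2-bit scheme is modelled as a plain up/down saturating counter (the exact transitions of Figure 3.7 may differ), and the names `SaturatingCounter` and `run` are made up for the example.

```python
class SaturatingCounter:
    """n-bit saturating counter; with bits=1 it behaves as the 1-bit predictor."""

    def __init__(self, bits, state=0):
        self.max_state = (1 << bits) - 1   # 1 for 1-bit, 3 (binary 11) for 2-bit
        self.threshold = 1 << (bits - 1)   # predict taken at or above this state
        self.state = state

    def predict(self):
        return self.state >= self.threshold

    def feedback(self, taken):
        # Count up on taken, down on not taken, saturating at both ends.
        if taken:
            self.state = min(self.state + 1, self.max_state)
        else:
            self.state = max(self.state - 1, 0)


def run(predictor, outcomes):
    """Return the number of mispredictions over a sequence of outcomes."""
    misses = 0
    for taken in outcomes:
        misses += predictor.predict() != taken
        predictor.feedback(taken)
    return misses


# Assumed branch behaviour: taken 9 times, then not taken once (loop exit).
one_pass = [True] * 9 + [False]

results = {}
for bits in (1, 2):
    counter = SaturatingCounter(bits)
    run(counter, one_pass)                 # warm-up pass, not counted
    misses = run(counter, one_pass * 10)   # 100 branch executions
    results[bits] = misses
    print(f"{bits}-bit: {misses} mispredictions, accuracy {100 - misses}%")
```

With the 1-bit counter, every pass mispredicts both the exit and the first iteration of the next pass. With the 2-bit counter, the single not-taken outcome only moves the state from 11 to 10, so the next pass still starts with a taken prediction and only the exit is mispredicted.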

  3. Correlating Branch Predictor
     Idea: taken/not taken of recently executed branches is related to the behavior of the next branch (as well as the history of that branch's own behavior).
     - Then the behavior of recent branches selects between, say, 2 predictions of the next branch, updating just that prediction
     - (1,1) predictor: 1-bit global, 1-bit local
     [Diagram labels: Branch address (4 bits); 1-bit per branch local predictors; Prediction; 1-bit global branch history (0 = not taken)]

     Correlating Branch Predictor
     General form: (m, n) predictor
     - m bits for global history, n bits for local history
     - Records correlation between m+1 branches
     - Simple implementation: global history can be stored in a shift register
     - Example: (2,2) predictor, 2-bit global, 2-bit local
     [Diagram labels: Branch address (4 bits); 2-bits per branch local predictors; Prediction; 2-bit global branch history (01 = not taken then taken)]

     Accuracy of Different Schemes (Figure 3.15, p. 206)
     [Bar chart: Frequency of Mispredictions (0% to 20%) for nasa7, matrix300, tomcatv, doducd, spice, fpppp, gcc, espresso, eqntott, li. Series: 4,096 entries, 2 bits per entry; unlimited entries, 2 bits per entry; 1,024 entries (2,2).]

     Estimate Branch Penalty
     EX: BHT correct rate is 95%, BTB hit rate is 95%, average miss penalty is 15 cycles.
     How much is the branch penalty?

     Accuracy of Return Address Predictor
     [Figure]
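
The (m, n) scheme described above can be sketched as a table of counters indexed by low PC bits and by an m-bit global history shift register. This is a minimal illustration, not the lecture's implementation: the class name `CorrelatingPredictor`, the branch addresses `0x100` and `0x108`, the all-zero initial state, and the input sequence (d alternating between 2 and 0) are assumptions chosen for the example. The trace models the two BNEZ branches of the `d==0` / `d==1` code.

```python
class CorrelatingPredictor:
    """(m, n) predictor: m global-history bits select one of 2**m n-bit counters."""

    def __init__(self, m, n, index_bits=4):
        self.history_mask = (1 << m) - 1
        self.max_state = (1 << n) - 1
        self.threshold = 1 << (n - 1)
        self.index_mask = (1 << index_bits) - 1
        self.history = 0   # global shift register, newest outcome in bit 0
        # One row per branch-address index, one counter per history pattern.
        self.table = [[0] * (1 << m) for _ in range(1 << index_bits)]

    def _row(self, pc):
        # Assumes 4-byte instructions, so the two low PC bits are dropped.
        return self.table[(pc >> 2) & self.index_mask]

    def predict(self, pc):
        return self._row(pc)[self.history] >= self.threshold

    def update(self, pc, taken):
        row, h = self._row(pc), self.history
        # Update only the counter selected by the current history.
        row[h] = min(row[h] + 1, self.max_state) if taken else max(row[h] - 1, 0)
        # Shift the outcome into the global history register.
        self.history = ((h << 1) | int(taken)) & self.history_mask


def branch_trace(d_values):
    """Yield (pc, taken) for the two branches of the d==0 / d==1 example."""
    b1_pc, b2_pc = 0x100, 0x108   # hypothetical branch addresses
    for d in d_values:
        yield b1_pc, d != 0       # BNEZ R1, L1: taken when d != 0
        if d == 0:
            d = 1
        yield b2_pc, d != 1       # BNEZ R3, L2: taken when d != 1


def mispredictions(predictor, trace):
    misses = 0
    for pc, taken in trace:
        misses += predictor.predict(pc) != taken
        predictor.update(pc, taken)
    return misses


d_values = [2, 0] * 5   # assumed input: d alternates between 2 and 0
plain = mispredictions(CorrelatingPredictor(0, 1), branch_trace(d_values))
correlated = mispredictions(CorrelatingPredictor(1, 1), branch_trace(d_values))
print(plain, correlated)  # 20 2
```

Under these assumptions the plain 1-bit table, written here as a (0,1) predictor, mispredicts all 20 branch executions, while the (1,1) predictor mispredicts only the two branches of the first iteration, because the outcome of the previous branch selects the right counter.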
