



  1. CS252 Graduate Computer Architecture
Lecture 18: Branch Prediction + analysis resources => ILP
April 2, 2002
Prof. David E. Culler
Computer Science 252, Spring 2002
CS252/Culler 4/2/02 Lec 18.1

Today's Big Idea (Lec 18.2)
• Reactive: past actions cause system to adapt use
  – do what you did before better
  – ex: caches
  – TCP windows
  – URL completion, ...
• Proactive: uses past actions to predict future actions
  – optimize speculatively, anticipate what you are about to do
  – branch prediction
  – long cache blocks
  – ???

Review: Case for Branch Prediction when Issue N instructions per clock cycle (Lec 18.3)
1. Branches will arrive up to n times faster in an n-issue processor
2. Amdahl's Law => relative impact of the control stalls will be larger with the lower potential CPI in an n-issue processor
Conversely, need branch prediction to 'see' potential parallelism

Review: 7 Branch Prediction Schemes (Lec 18.4)
1. 1-bit Branch-Prediction Buffer
2. 2-bit Branch-Prediction Buffer
3. Correlating Branch Prediction Buffer
4. Tournament Branch Predictor
5. Branch Target Buffer
6. Integrated Instruction Fetch Units
7. Return Address Predictors

Review: Dynamic Branch Prediction (Lec 18.5)
• Performance = ƒ(accuracy, cost of misprediction)
• Branch History Table: Lower bits of PC address index table of 1-bit values
  – Says whether or not branch taken last time
  – No address check (saves HW, but may not be right branch)
• Problem: in a loop, 1-bit BHT will cause 2 mispredictions (avg is 9 iterations before exit):
  – End of loop case, when it exits instead of looping as before
  – First time through loop on next time through code, when it predicts exit instead of looping
  – Only 80% accuracy even if loop 90% of the time

Review: Dynamic Branch Prediction (Jim Smith, 1981) (Lec 18.6)
• Better Solution: 2-bit scheme where change prediction only if get misprediction twice:
[State diagram: two Predict Taken states and two Predict Not Taken states, connected by T (taken) and NT (not taken) transitions]
• Red: stop, not taken
• Green: go, taken
• Adds hysteresis to decision making process
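
The loop arithmetic on the two review slides can be checked with a small simulation. This is a minimal sketch, not the lecture's own code: it models a single branch with one saturating counter, assumes the counter starts in the strongest "taken" state, and uses a plain saturating counter for the 2-bit case, which may differ in its transitions from the state diagram on the slide. The name `simulate` is made up for illustration.

```python
def simulate(outcomes, bits):
    """Return the prediction accuracy of one saturating counter of the given width."""
    max_state = (1 << bits) - 1   # 1 for a 1-bit counter, 3 for a 2-bit counter
    threshold = 1 << (bits - 1)   # predict taken when state >= threshold
    state = max_state             # assumption: start in the strongest "taken" state
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= threshold
        correct += (predicted_taken == taken)
        # Move one step toward the actual outcome, saturating at both ends.
        state = min(state + 1, max_state) if taken else max(state - 1, 0)
    return correct / len(outcomes)

# A loop branch: taken 9 times, then not taken once on exit, repeated 100 times.
loop = ([True] * 9 + [False]) * 100
one_bit = simulate(loop, bits=1)
two_bit = simulate(loop, bits=2)
print(f"1-bit: {one_bit:.1%}  2-bit: {two_bit:.1%}")
```

Under these assumptions the 1-bit counter misses both the loop exit and the first iteration of the next pass, so it lands at about 80%, while the 2-bit counter misses only the exit and reaches 90%.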

  2. Correlating Branches (Lec 18.7)
Idea: taken/not taken of recently executed branches is related to behavior of next branch (as well as the history of that branch behavior)
  – Then behavior of recent branches selects between, say, 4 predictions of next branch, updating just that prediction
• (2,2) predictor: 2-bit global, 2-bit local
[Diagram: branch address (4 bits) indexes a table of 2-bits-per-branch local predictors; the 2-bit recent global branch history (01 = not taken then taken) selects which prediction is used]

Consider 3 Scenarios (Lec 18.8)
• Branch for loop test
• Check for error or exception
• Alternating taken / not-taken
  – example?
• Your worst-case prediction scenario

Accuracy of Different Schemes (Figure 3.15, p. 206) (Lec 18.9)
[Bar chart: frequency of mispredictions (0% to 20%) for three schemes: 4,096 entries at 2 bits per entry, unlimited entries at 2 bits per entry, and 1,024 entries (2,2)]
What's missing in this picture?

Re-evaluating Correlation (Lec 18.10)
• Several of the SPEC benchmarks have less than a dozen branches responsible for 90% of taken branches:

program     branch %    static    # = 90%
compress    14%         236       13
eqntott     25%         494       5
gcc         15%         9531      2020
mpeg        10%         5598      532
real gcc    13%         17361     3214

• Real programs + OS more like gcc
• Small benefits beyond benchmarks for correlation? problems with branch aliases?

BHT Accuracy (Lec 18.11)
• Mispredict because either:
  – Wrong guess for that branch
  – Got branch history of wrong branch when index the table
• 4096 entry table programs vary from 1% misprediction (nasa7, tomcatv) to 18% (eqntott), with spice at 9% and gcc at 12%
• For SPEC92, 4096 about as good as infinite table

Tournament Predictors (Lec 18.12)
• Motivation for correlating branch predictors is 2-bit predictor failed on important branches; by adding global information, performance improved
• Tournament predictors: use 2 predictors, 1 based on global information and 1 based on local information, and combine with a selector
• Hopes to select right predictor for right branch (or right context of branch)
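
The (2,2) predictor and the "alternating taken / not-taken" scenario can be tried out together. The following is a minimal sketch under stated assumptions, not the design of any particular machine: the class name `CorrelatingPredictor`, the helper `run`, the word-aligned indexing (`pc >> 2`), the example address `0x40`, and the initial counter value (weakly not taken) are all illustrative choices. Setting `history_bits=0` degenerates to a plain 2-bit BHT for comparison.

```python
class CorrelatingPredictor:
    """(m, 2) predictor: m bits of global history pick one of 2**m 2-bit counters per entry."""

    def __init__(self, index_bits=4, history_bits=2):
        self.index_mask = (1 << index_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.history = 0  # most recent outcome in the low bit
        # counters[entry][history] is a 2-bit saturating counter.
        # Assumption: every counter starts at 1 (weakly not taken).
        self.counters = [[1] * (1 << history_bits) for _ in range(1 << index_bits)]

    def predict(self, pc):
        row = self.counters[(pc >> 2) & self.index_mask]
        return row[self.history] >= 2

    def update(self, pc, taken):
        row = self.counters[(pc >> 2) & self.index_mask]
        c = row[self.history]
        # Update only the counter selected by the current global history.
        row[self.history] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def run(predictor, outcomes, pc=0x40):
    """Feed one branch's outcomes through the predictor and return its accuracy."""
    correct = 0
    for taken in outcomes:
        correct += (predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return correct / len(outcomes)


alternating = [True, False] * 500
plain = run(CorrelatingPredictor(history_bits=0), alternating)
correlated = run(CorrelatingPredictor(history_bits=2), alternating)
print(f"plain 2-bit: {plain:.1%}  (2,2): {correlated:.1%}")
```

With these starting values the plain 2-bit counter flips between its two middle states and mispredicts every outcome, which is one candidate for the worst-case scenario the slide asks about. The (2,2) version mispredicts only while its per-history counters warm up.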

  3. Dynamically finding structure in Spaghetti (Lec 18.13)
?

Tournament Predictor in Alpha 21264 (Lec 18.14)
• 4K 2-bit counters to choose from among a global predictor and a local predictor
• Global predictor also has 4K entries and is indexed by the history of the last 12 branches; each entry in the global predictor is a standard 2-bit predictor
  – 12-bit pattern: ith bit 0 => ith prior branch not taken; ith bit 1 => ith prior branch taken
• Local predictor consists of a 2-level predictor:
  – Top level: a local history table consisting of 1024 10-bit entries; each 10-bit entry corresponds to the most recent 10 branch outcomes for the entry. 10-bit history allows patterns of up to 10 branches to be discovered and predicted.
  – Next level: selected entry from the local history table is used to index a table of 1K entries consisting of 3-bit saturating counters, which provide the local prediction
• Total size: 4K*2 + 4K*2 + 1K*10 + 1K*3 = 29K bits! (~180,000 transistors)

% of predictions from local predictor in Tournament Prediction Scheme (Lec 18.15)
[Bar chart: share of predictions taken from the local predictor (0% to 100%) for nasa7, matrix300, tomcatv, doduc, spice, fpppp, gcc, espresso, eqntott, li]

Accuracy of Branch Prediction (Lec 18.16)
[Bar chart, fig 3.40: branch prediction accuracy (0% to 100%) for tomcatv, doduc, fpppp, li, espresso, gcc under three schemes: Profile-based, 2-bit counter, Tournament]
• Profile: branch profile from last execution (static in that it is encoded in instruction, but profile)

Accuracy v. Size (SPEC89) (Lec 18.17)
[Line chart: misprediction rate (0% to 10%) against total predictor size (0 to 128 Kbits) for Local, Correlating, and Tournament predictors]

Need Address at Same Time as Prediction (Lec 18.18)
• Branch Target Buffer (BTB): Address of branch index to get prediction AND branch address (if taken)
  – Note: must check for branch match now, since can't use wrong branch address (Figure 3.19, 3.20)
[Diagram: the PC of the instruction in FETCH is compared (=?) against the Branch PC stored in the BTB; each entry holds Branch PC, Predicted PC, and extra prediction state bits]
  – Yes: instruction is branch and use predicted PC as next PC
  – No: branch not predicted, proceed normally (Next PC = PC+4)
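
The BTB lookup on the last slide can be sketched as a small direct-mapped table with a tag check. This is an illustrative sketch only: the class name `BranchTargetBuffer`, the 16-entry size, the `pc >> 2` indexing, and the policy of storing only taken branches are assumptions, and the extra prediction state bits from the diagram are left out.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB: each slot holds (branch_pc, predicted_pc) or None."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = [None] * entries

    def _index(self, pc):
        # Assumption: 4-byte instructions, so drop the two low bits before indexing.
        return (pc >> 2) % self.entries

    def next_pc(self, pc):
        slot = self.table[self._index(pc)]
        # The "=?" box: the stored branch PC must match the fetch PC exactly,
        # otherwise another branch's target could be used by mistake.
        if slot is not None and slot[0] == pc:
            return slot[1]   # Yes: use the predicted PC as the next PC
        return pc + 4        # No: proceed normally

    def update(self, pc, taken, target):
        i = self._index(pc)
        if taken:
            self.table[i] = (pc, target)   # remember where this branch went
        elif self.table[i] is not None and self.table[i][0] == pc:
            self.table[i] = None           # branch fell through: drop its entry


btb = BranchTargetBuffer()
first_fetch = btb.next_pc(0x100)   # nothing stored yet, falls through to PC+4
btb.update(0x100, taken=True, target=0x80)
second_fetch = btb.next_pc(0x100)  # hit: predicted target
aliased_fetch = btb.next_pc(0x140) # same slot, different branch PC: no match
```

The aliased fetch shows why the address check matters here, in contrast to the BHT, where a wrong-branch hit only costs accuracy: `0x140` maps to the same slot as `0x100`, and without the compare it would be redirected to a target that belongs to a different branch.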
