
Tournament Branch Predictor; Accuracy of Return Address Predictor - PDF document



  1. Lecture 10: Branch Prediction and Instruction Delivery
     Topics: branch target buffer, return address prediction, tournament predictor, high-performance instruction delivery.

     Correlating Branch Predictor
     - General form: (m, n) predictor, with m bits for global history and n bits for local history.
     - Records correlation between m+1 branches.
     - Simple implementation: the global history can be stored in a shift register.
     - Example: (2,2) predictor, with 2-bit global and 2-bit local history.
     [Figure: the branch address (4 bits) selects among the 2-bits-per-branch local predictors; the 2-bit global branch history (01 = not taken, then taken) selects the prediction.]

     Branch Target Buffer
     - Branch Target Buffer (BTB): the address of the branch is the index used to get the prediction AND the branch address (if taken).
     - Note: the entry must now be checked for a branch match, since a wrong branch address can't be used.
     - Example: BTB combined with BHT.
     [Figure: during FETCH, the PC of the instruction is compared (=?) against the Branch PC field; each entry also holds the Predicted PC and extra prediction state bits. Yes: the instruction is a branch, and the predicted PC is used as the next PC. No: the branch is not predicted, proceed normally (Next PC = PC+4).]

     Accuracy of Different Schemes (Figure 3.15, p. 206)
     [Figure: frequency of mispredictions (axis from 0% to 20%) on nasa7, matrix300, tomcatv, doducd, spice, fpppp, gcc, espresso, eqntott, and li for three schemes: 4,096 entries at 2 bits per entry, unlimited entries at 2 bits per entry, and 1,024 entries (2,2).]

     Estimate Branch Penalty
     - Example: the BHT correct rate is 95% and the BTB hit rate is 95%.
     - The average miss penalty is 6 cycles.
     - How much is the branch penalty?

     Return Addresses Prediction
     - The address of a register-indirect branch is hard to predict.
     - Many callers, one callee.
     - Jumps go to multiple return addresses from a single address (no PC-target correlation).
     - In SPEC89, 85% of such branches are procedure returns.
     - Since procedures follow a stack discipline, save the return address in a small buffer that acts like a stack: 8 to 16 entries give a small miss rate.
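The (m, n) correlating predictor from the slides can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the class name `CorrelatingPredictor`, the helper `mispredictions`, the 4-byte instruction alignment, and the initial counter value of 0 are all assumptions made for the example. The global history is kept in a shift register, as the slide suggests, and selects one of 2^m n-bit saturating counters in the row chosen by the branch address.

```python
class CorrelatingPredictor:
    """(m, n) predictor: m bits of global history pick one of 2**m
    n-bit saturating counters in the row indexed by the branch address."""

    def __init__(self, m=2, n=2, index_bits=4):
        self.history_mask = (1 << m) - 1
        self.max_count = (1 << n) - 1        # saturation ceiling (3 when n = 2)
        self.threshold = 1 << (n - 1)        # predict taken at or above this
        self.index_mask = (1 << index_bits) - 1
        self.history = 0                     # m-bit global shift register
        # One row per branch-address index, one counter per history pattern.
        self.table = [[0] * (1 << m) for _ in range(1 << index_bits)]

    def _row(self, pc):
        # Assumption: 4-byte instructions, so the low 2 PC bits are dropped.
        return self.table[(pc >> 2) & self.index_mask]

    def predict(self, pc):
        return self._row(pc)[self.history] >= self.threshold

    def update(self, pc, taken):
        row = self._row(pc)
        count = row[self.history]
        if taken:
            row[self.history] = min(count + 1, self.max_count)
        else:
            row[self.history] = max(count - 1, 0)
        # Shift the newest outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def mispredictions(predictor, pc, outcomes):
    """Predict, then train, on each outcome; return the number of wrong guesses."""
    wrong = 0
    for taken in outcomes:
        wrong += predictor.predict(pc) != taken
        predictor.update(pc, taken)
    return wrong


if __name__ == "__main__":
    pattern = [True, False] * 15             # one branch alternating T, NT
    for m in (0, 2):
        p = CorrelatingPredictor(m=m, n=2)
        mispredictions(p, 0x40, pattern[:10])            # warm-up
        wrong = mispredictions(p, 0x40, pattern[10:])
        print(f"({m},2) predictor: {wrong} of 20 mispredicted")
```

On this alternating branch, the plain 2-bit counter (m = 0) keeps mispredicting every taken outcome, while the (2,2) configuration learns the pattern after warm-up, which is the kind of correlation the slide is pointing at.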

  2. Tournament Branch Predictor
     - Used in the Alpha 21264: tracks both "local" and global history.
     - Intended for mixed types of applications.
     - Global history: T/NT history of the past k branches, e.g. 0 1 0 1 0 1 (NT T NT T NT T).
     [Figure: the PC feeds the local predictor and the global history feeds the global predictor; the choice predictor drives a mux that selects the final NT/T prediction.]

     Accuracy of Return Address Predictor
     [Figure]

     Tournament Branch Predictor
     - Local predictor: uses 10-bit local history and shared 3-bit counters. PC -> local history table (1K x 10) -> 10 bits -> counters (1K x 3) -> 1 bit (NT/T).
     - Global and choice predictors: the 12-bit global history (e.g. 010101010101) indexes two sets of counters (4K x 2 each), one giving 1 bit of NT/T and one giving 1 bit of local/global choice.

     Branch Prediction With n-way Issue
     1. Branches will arrive up to n times faster in an n-issue processor.
     2. Amdahl's Law => the relative impact of the control stalls will be larger with the lower potential CPI in an n-issue processor.

     Integrated Instruction Fetch Units
     1. Integrated branch prediction: the branch predictor becomes part of the instruction fetch unit.
     2. Instruction prefetch: fetch ahead to deliver multiple instructions per cycle.
     3. Instruction memory access and buffering: may access multiple cache lines in one cycle; use prefetch to hide the cost.
     Another approach: trace cache.

     Instruction Fetch Unit
     - Fetch predictor: predicts the next fetch addresses to avoid fetch delay; may pre-predict branch direction; may be integrated with the I-cache.
     - The branch predictor overrides and trains the fetch predictor.
     [Figure: pipeline blocks: fetch predictor, I-cache, fetch, branch predictor, decode/REN, out-of-order execution engine, in-order commit.]
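The tournament organization above can be illustrated with a small Python sketch using the table sizes given on the slide (1K x 10 local histories, 1K x 3 local counters, 4K x 2 global counters, 4K x 2 choice counters). It is a simplified model under stated assumptions, not a description of the actual Alpha 21264 logic: the class and helper names are hypothetical, the choice of PC bits for indexing is assumed, counters start at 0, and the chooser is trained only when the two components disagree.

```python
def _bump(count, up, limit):
    """Saturating increment or decrement within 0..limit."""
    return min(count + 1, limit) if up else max(count - 1, 0)


class TournamentPredictor:
    def __init__(self):
        self.local_hist = [0] * 1024    # 1K x 10-bit per-branch histories
        self.local_ctr = [0] * 1024     # 1K x 3-bit counters, indexed by local history
        self.global_ctr = [0] * 4096    # 4K x 2-bit counters, indexed by global history
        self.choice_ctr = [0] * 4096    # 4K x 2-bit counters; >= 2 means "use global"
        self.ghist = 0                  # 12-bit global history shift register

    def storage_bits(self):
        return 1024 * 10 + 1024 * 3 + 4096 * 2 + 4096 * 2

    def _components(self, pc):
        slot = (pc >> 2) & 0x3FF        # assumption: PC bits 2..11 index the local table
        lhist = self.local_hist[slot]
        local = self.local_ctr[lhist] >= 4
        glob = self.global_ctr[self.ghist] >= 2
        return slot, lhist, local, glob

    def predict(self, pc):
        _, _, local, glob = self._components(pc)
        return glob if self.choice_ctr[self.ghist] >= 2 else local

    def update(self, pc, taken):
        slot, lhist, local, glob = self._components(pc)
        g = self.ghist
        # Assumption: move the chooser toward whichever component was right,
        # and only when the two components disagree.
        if local != glob:
            self.choice_ctr[g] = _bump(self.choice_ctr[g], glob == taken, 3)
        self.local_ctr[lhist] = _bump(self.local_ctr[lhist], taken, 7)
        self.global_ctr[g] = _bump(self.global_ctr[g], taken, 3)
        # Shift the outcome into both history registers.
        self.local_hist[slot] = ((lhist << 1) | int(taken)) & 0x3FF
        self.ghist = ((g << 1) | int(taken)) & 0xFFF


def mispredictions(predictor, pc, outcomes):
    """Predict, then train, on each outcome; return the number of wrong guesses."""
    wrong = 0
    for taken in outcomes:
        wrong += predictor.predict(pc) != taken
        predictor.update(pc, taken)
    return wrong


if __name__ == "__main__":
    tp = TournamentPredictor()
    loop = [True, True, True, False] * 60    # a loop branch: taken 3 times, then exit
    mispredictions(tp, 0x400, loop[:200])    # warm-up
    print("mispredicted after warm-up:", mispredictions(tp, 0x400, loop[200:]))
    print("storage in Kbits:", tp.storage_bits() / 1024)
```

The `storage_bits` helper simply adds up the four tables; with the sizes from the slide the total comes to 29 x 1024 bits, consistent with the 29 Kbits quoted for the 21264 predictor.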

  3. Pitfall: Sometimes bigger and dumber is better
     - The 21264 uses a tournament predictor (29 Kbits).
     - The earlier 21164 uses a simple 2-bit predictor with 2K entries (a total of 4 Kbits).
     - On the SPEC95 benchmarks, the 21264 outperforms:
       - 21264: avg. 11.5 mispredictions per 1000 instructions
       - 21164: avg. 16.5 mispredictions per 1000 instructions
     - Reversed for transaction processing (TP)!
       - 21264: avg. 17 mispredictions per 1000 instructions
       - 21164: avg. 15 mispredictions per 1000 instructions
     - TP code is much larger, and the 21164 holds 2X as many branch predictions based on local behavior (2K vs. the 1K local predictor in the 21264).

     Dynamic Branch Prediction Summary
     - Prediction is becoming an important part of scalar execution.
     - Branch History Table: 2 bits for loop accuracy.
     - Correlation: recently executed branches are correlated with the next branch.
       - Either different branches
       - Or different executions of the same branch
     - Tournament Predictor: give more resources to competitive solutions and pick between them.
     - Branch Target Buffer: includes branch address and prediction.
     - Return address stack for prediction of indirect jumps.
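The return address stack mentioned in the summary can be sketched as a small fixed-size buffer with stack behavior. The following Python example is illustrative only: the names `ReturnAddressStack` and `count_hits`, the policy of discarding the oldest entry on overflow, and the made-up call addresses are assumptions, not details taken from the slides.

```python
from collections import deque


class ReturnAddressStack:
    """Small buffer that predicts procedure-return targets."""

    def __init__(self, entries=8):
        # Assumption: when the buffer is full, the oldest entry is discarded.
        # deque(maxlen=...) does this automatically on append.
        self.stack = deque(maxlen=entries)

    def push(self, return_pc):
        """On a call: remember the address the callee should return to."""
        self.stack.append(return_pc)

    def pop(self):
        """On a return: predict the most recent return address (None if empty)."""
        return self.stack.pop() if self.stack else None


def count_hits(depth, entries=8):
    """Nest `depth` calls, then unwind; count correctly predicted returns."""
    ras = ReturnAddressStack(entries)
    call_sites = [0x1000 + 0x40 * i for i in range(depth)]   # hypothetical call PCs
    for pc in call_sites:
        ras.push(pc + 4)                 # return address = instruction after the call
    return sum(ras.pop() == pc + 4 for pc in reversed(call_sites))


if __name__ == "__main__":
    for depth in (3, 10):
        print(f"depth {depth}: {count_hits(depth)} of {depth} returns predicted")
```

In this model, predictions are lost only when the call depth exceeds the buffer size, which is in line with the slide's remark that 8 to 16 entries already give a small miss rate.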
