
Superscalar Design: Instruction Flow Techniques (Virendra Singh)



  1. Superscalar Design: Instruction Flow Techniques
     Virendra Singh, Associate Professor
     Computer Architecture and Dependable Systems Lab (CADSL)
     Department of Electrical Engineering, Indian Institute of Technology Bombay
     http://www.ee.iitb.ac.in/~viren/
     E-mail: viren@ee.iitb.ac.in
     EE-739: Processor Design, Lecture 26 (19 March 2013)

  2. Disruption of Sequential Control Flow
     [Figure: superscalar pipeline. Fetch → Instruction/Decode Buffer → Decode → Dispatch Buffer → Dispatch → Reservation Stations → Issue → Execute (including the Branch unit) → Finish → Reorder/Completion Buffer → Complete → Store Buffer → Retire]

  3. Branch Prediction
     • Target address generation → target speculation
       – Access register:
         • PC, general-purpose register, link register
       – Perform calculation:
         • +/- offset, autoincrement, autodecrement
     • Condition resolution → condition speculation
       – Access register:
         • Condition code register, general-purpose register
       – Perform calculation:
         • Comparison of data register(s)

  4. Target Address Generation
     [Figure: the pipeline from Fetch to Retire, annotated with the points at which each kind of branch target becomes available: PC-relative, register indirect, and register indirect with offset]

  5. Condition Resolution
     [Figure: the pipeline from Fetch to Retire, annotated with the points at which the branch condition can be resolved: condition code (CC) register, and general-purpose (GP) register value comparison]

  6. Branch/Jump Target Prediction
     Example BTB entry:
       Branch inst. address   Information for predict.   Branch target address (most recent)
       0x0348                 0101 (NTNT)                0x0612
     • Branch Target Buffer (BTB): small cache in the fetch stage
       – Holds previously executed branches: address, taken history, target(s)
     • Fetch stage compares the current fetch address (FA) against the BTB
       – If match, use prediction
       – If predicted taken, use BTB target
     • When a branch executes, the BTB is updated
     • Optimization:
       – Size of BTB: a larger BTB increases the hit rate
       – Prediction algorithm: a better algorithm increases prediction accuracy
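
The BTB lookup and update described on this slide can be sketched roughly as follows. This is a minimal illustration, not the design of any particular processor: the class and method names, the FIFO replacement, the 4-bit history field, and the "repeat the last outcome" prediction rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BTBEntry:
    target: int   # most recent taken-target address
    history: int  # recent outcomes, 1 = taken, newest outcome in bit 0


class BranchTargetBuffer:
    """Hypothetical BTB model: fully associative, FIFO replacement (assumed)."""

    def __init__(self, capacity: int = 64, history_bits: int = 4) -> None:
        self.capacity = capacity
        self.history_mask = (1 << history_bits) - 1
        self.entries: Dict[int, BTBEntry] = {}  # keyed by branch address

    def predict(self, fetch_addr: int, sequential_addr: int) -> int:
        """Return the next fetch address chosen in the fetch stage."""
        entry = self.entries.get(fetch_addr)
        if entry is None:
            return sequential_addr  # BTB miss: keep fetching sequentially
        # Assumed prediction rule: predict the same direction as last time.
        predicted_taken = (entry.history & 1) == 1
        return entry.target if predicted_taken else sequential_addr

    def update(self, branch_addr: int, taken: bool, target: int) -> None:
        """Called when the branch executes and its outcome is known."""
        entry = self.entries.get(branch_addr)
        if entry is None:
            if len(self.entries) >= self.capacity:
                # Evict the oldest allocated entry (dicts keep insertion order).
                del self.entries[next(iter(self.entries))]
            entry = BTBEntry(target=target, history=0)
            self.entries[branch_addr] = entry
        entry.history = ((entry.history << 1) | int(taken)) & self.history_mask
        if taken:
            entry.target = target
```

With this sketch, recording the outcomes NT, T, NT, T for the branch at 0x0348 with target 0x0612 leaves the history field at 0101, matching the example entry on the slide.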

  7. Branch Prediction Function
     • Prediction function F(X1, X2, …)
       – X1: opcode type
       – X2: history
     • Prediction effectiveness (%) based on opcode only, or on history

                      IBM1  IBM2  IBM3  IBM4  DEC  CDC
       Opcode only      66    69    71    55   80   78
       History 0        64    64    70    54   74   78
       History 1        92    95    87    80   97   82
       History 2        93    97    91    83   98   91
       History 3        94    97    91    84   98   94
       History 4        95    97    92    84   98   95
       History 5        95    97    92    84   98   96

  8. Branch Instruction Distribution

                   % of each branch type             % bc with penalty cycles
       Benchmark   b      bl     bc      bcr, bcc    3 cyc   2 cyc   1 cyc
       spice2g6    7.86   0.30   12.58   0.32        13.82   3.12    0.76
       doduc       1.00   0.94    8.22   1.01        10.14   1.76    2.02
       matrix300   0.00   0.00   14.50   0.00         0.68   0.22    0.20
       tomcatv     0.00   0.00    6.10   0.00         0.24   0.02    0.01
       gcc         2.30   1.32   15.50   1.81        22.46   9.48    4.85
       espresso    3.61   0.58   19.85   0.68        37.37   1.77    0.31
       li          2.41   1.92   14.36   1.91        31.55   3.44    1.37

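  9. Branch Instruction Speculation
     [Figure: the fetch stage sends the fetch address (FA) to the I-cache and to a branch predictor (using a BTB). The predictor supplies a prediction, a speculative target, and a speculative condition to the FA-mux, which selects between the sequential PC and the speculative target. The Branch Execute stage sends a BTB update (target address and history) back to the predictor. Remaining stages shown: Decode Buffer, Decode, Dispatch Buffer, Dispatch, Reservation Stations, Issue, Execute, Finish, Completion Buffer]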

  10. BTAC and BHT Design (PPC 604)
      [Figure: the FA-mux output is held in the FAR and sent as the fetch address (FA) to the I-cache, a +4 incrementer, the Branch Target Address Cache (BTAC), and the Branch History Table (BHT). The diagram shows BTAC prediction, BHT prediction, BTAC update, and BHT update paths alongside the Decode and Dispatch stages. Execution units: SFX, SFX, CFX, LS, BRN, FPU]
      • BTAC:
        – 64 entries
        – fully associative
        – hit => predict taken
      • BHT:
        – 512 entries
        – direct mapped
        – 2-bit saturating counter, history-based prediction
        – overrides BTAC prediction
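
The interaction between the two structures on this slide (an early BTAC guess that a later BHT prediction can override) might be modelled along these lines. Only the sizes, the "hit => predict taken" rule, the 2-bit counters, and the override come from the slide. The class names, the word-address BHT index, the FIFO replacement, the removal of BTAC entries for not-taken branches, and the initial counter value are assumptions for illustration, not a description of the actual PPC 604 logic.

```python
BTAC_ENTRIES = 64   # from the slide
BHT_ENTRIES = 512   # from the slide


class Btac:
    """Fully associative target cache: a hit means 'predict taken'."""

    def __init__(self) -> None:
        self.targets = {}  # branch address -> target address

    def lookup(self, fetch_addr):
        return self.targets.get(fetch_addr)

    def update(self, addr: int, taken: bool, target: int) -> None:
        if taken:
            if addr not in self.targets and len(self.targets) >= BTAC_ENTRIES:
                del self.targets[next(iter(self.targets))]  # assumed FIFO
            self.targets[addr] = target
        else:
            self.targets.pop(addr, None)  # assumed: drop not-taken branches


class Bht:
    """Direct-mapped table of 2-bit saturating counters."""

    def __init__(self) -> None:
        self.counters = [1] * BHT_ENTRIES  # assumed start: weakly not taken

    def _index(self, addr: int) -> int:
        return (addr >> 2) % BHT_ENTRIES  # assumed: index by word address

    def predict_taken(self, addr: int) -> bool:
        return self.counters[self._index(addr)] >= 2

    def update(self, addr: int, taken: bool) -> None:
        i = self._index(addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def fetch_stage(btac: Btac, fetch_addr: int) -> int:
    """Early guess: BTAC target on a hit, otherwise the next sequential word."""
    target = btac.lookup(fetch_addr)
    return target if target is not None else fetch_addr + 4


def later_stage(bht: Bht, addr: int, decoded_target: int, early_next: int):
    """BHT prediction overrides the BTAC guess; report whether to redirect."""
    final_next = decoded_target if bht.predict_taken(addr) else addr + 4
    return final_next, final_next != early_next
```

The second return value of `later_stage` signals a fetch redirect whenever the BHT disagrees with the earlier BTAC guess.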

  11. Branch Speculation
      [Figure: tree of taken (T) and not-taken (NT) branch paths, with the speculated branches labeled TAG 1, TAG 2, and TAG 3]
      • Leading Speculation
        – Typically done during the Fetch stage
        – Based on potential branch instruction(s) in the current fetch group
      • Trailing Confirmation
        – Typically done during the Branch Execute stage
        – Based on the next branch instruction to finish execution

  12. Branch Speculation
      • Leading Speculation
        1. Tag speculative instructions
        2. Advance branch and following instructions
        3. Buffer addresses of speculated branch instructions
      • Trailing Confirmation
        1. When branch resolves, remove/deallocate speculation tag
        2. Permit completion of branch and following instructions

  13. Branch Speculation
      [Figure: the same tree of taken (T) and not-taken (NT) branch paths, with speculated branches labeled TAG 1, TAG 2, and TAG 3]
      • Start new correct path
        – Must remember the alternate (non-predicted) path
      • Eliminate incorrect path
        – Must ensure that the mis-speculated instructions produce no side effects

  14. Tracking Instructions
      • Assign branch tags
        – Allocated in circular order
        – Instruction carries this tag throughout processor
      • Track instruction groups
        – Instructions managed in groups, max. one branch per group
        – ROB structured as groups
          • Leads to some inefficiency
          • Simpler tracking of speculative instructions
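
Circular tag allocation, together with the confirm and squash steps from the Branch Speculation slides, could look roughly like the sketch below. The class name, the method names, and the default of four tags are hypothetical, and the sketch tracks only the tags themselves, not the instructions that carry them.

```python
from typing import List, Optional


class BranchTagAllocator:
    """Hypothetical model of speculation tags handed out in circular order."""

    def __init__(self, num_tags: int = 4) -> None:
        self.num_tags = num_tags
        self.next_tag = 0
        self.in_flight: List[int] = []  # unresolved tags, oldest first

    def allocate(self) -> Optional[int]:
        """Tag a newly speculated branch; None means fetch must stall."""
        for _ in range(self.num_tags):
            tag = self.next_tag
            self.next_tag = (tag + 1) % self.num_tags
            if tag not in self.in_flight:
                self.in_flight.append(tag)
                return tag
        return None  # every tag is still attached to an unresolved branch

    def confirm(self, tag: int) -> None:
        """Branch resolved as predicted: deallocate its tag."""
        self.in_flight.remove(tag)

    def squash(self, tag: int) -> List[int]:
        """Branch mispredicted: return its tag and all younger tags.

        Instructions carrying any returned tag are on the wrong path
        and must be discarded before they produce side effects.
        """
        i = self.in_flight.index(tag)
        squashed = self.in_flight[i:]
        self.in_flight = self.in_flight[:i]
        return squashed
```

Keeping `in_flight` ordered by age is what lets a single misprediction invalidate every younger speculation at once.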

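  15. Program Control Flow
      [Figure: the fetch stage sends the fetch address (FA) to the I-cache and to the branch predictor. The predictor supplies a prediction and a speculative target to the FA-mux. The Branch Execute stage sends a branch predictor update back to the predictor. Remaining stages shown: Decode Buffer, Decode, Dispatch Buffer, Dispatch, Reservation Stations, Issue, Execute, Finish, Completion Buffer. Execution units: BRN, SFX, SFX, CFX, LS, FPU]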

  16. Smith Predictor Hardware
      [Figure: m bits of the branch address index a table of 2^m k-bit saturating counters. The most significant bit of the selected counter is the branch prediction. The branch outcome increments or decrements the counter, and the updated counter value is written back]
      • Jim E. Smith. A Study of Branch Prediction Strategies. International Symposium on Computer Architecture, pages 135-148, May 1981
      • Widely employed: Intel Pentium, PowerPC 604, PowerPC 620, etc.
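
A table of 2^m k-bit saturating counters as drawn on this slide can be sketched in a few lines. Using the low m bits of the branch address as the index and starting each counter just below the taken threshold are assumptions of this example. The slide does not specify either.

```python
class SmithPredictor:
    """Sketch of a Smith predictor: 2^m k-bit saturating counters."""

    def __init__(self, m: int = 4, k: int = 2) -> None:
        self.m = m
        self.k = k
        self.max_count = (1 << k) - 1
        # Assumed initial state: highest value that still predicts not taken.
        self.counters = [(1 << (k - 1)) - 1] * (1 << m)

    def _index(self, branch_addr: int) -> int:
        # Assumed hash: the low m bits of the branch address.
        return branch_addr & ((1 << self.m) - 1)

    def predict(self, branch_addr: int) -> bool:
        """Prediction is the most significant bit of the selected counter."""
        counter = self.counters[self._index(branch_addr)]
        return (counter >> (self.k - 1)) == 1

    def update(self, branch_addr: int, taken: bool) -> None:
        """Increment on taken, decrement on not taken, saturating at the ends."""
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(self.max_count, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

With k = 2, a counter that has saturated at 3 needs two consecutive not-taken outcomes before the prediction flips, so a single anomalous outcome (for example a loop exit) does not change the prediction.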

  17. Two-level Branch Prediction
      [Figure: PC = 010110100101 and BHR = 0110. PC bits 01 and the BHR contents form the index 01 0110 into a PHT with entries 000000 to 111111. The selected entry 010110 holds the counter value 10, whose most significant bit (1) is the branch prediction]
      • BHR adds global branch history
        – Provides more context
        – Can differentiate multiple instances of the same static branch
        – Can correlate behavior across multiple static branches
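
One way to model the two-level scheme on this slide is to concatenate a few PC bits with a global branch history register (BHR) and use the result to index a pattern history table (PHT) of 2-bit counters. The choice of the low two PC bits, the 4-bit BHR, the concatenation order, and the initial counter value are assumptions chosen to reproduce the example index 010110. Real designs differ in how they combine PC and history.

```python
class TwoLevelPredictor:
    """Sketch: global BHR plus PC bits indexing a PHT of 2-bit counters."""

    def __init__(self, pc_bits: int = 2, bhr_bits: int = 4) -> None:
        self.pc_mask = (1 << pc_bits) - 1
        self.bhr_bits = bhr_bits
        self.bhr_mask = (1 << bhr_bits) - 1
        self.bhr = 0  # global history, newest outcome in bit 0
        self.pht = [1] * (1 << (pc_bits + bhr_bits))  # assumed: weakly not taken

    def index(self, pc: int) -> int:
        # Assumed layout: selected PC bits in the high part, BHR in the low part.
        return ((pc & self.pc_mask) << self.bhr_bits) | self.bhr

    def predict(self, pc: int) -> bool:
        return self.pht[self.index(pc)] >= 2  # MSB of the 2-bit counter

    def update(self, pc: int, taken: bool) -> None:
        i = self.index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # Shift the outcome into the global history after training the PHT.
        self.bhr = ((self.bhr << 1) | int(taken)) & self.bhr_mask
```

Because the history is part of the index, a branch that strictly alternates between taken and not taken trains two different PHT entries and becomes predictable after a short warm-up, which a single per-address counter cannot achieve.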
