Superscalar Design: An Introduction (Virendra Singh) - PowerPoint PPT Presentation

  1. Superscalar Design: An Introduction
     Virendra Singh, Associate Professor
     Computer Architecture and Dependable Systems Lab (CADSL)
     Department of Electrical Engineering, Indian Institute of Technology Bombay
     http://www.ee.iitb.ac.in/~viren/
     E-mail: viren@ee.iitb.ac.in
     EE-739: Processor Design, Lecture 21 (05 March 2013)

  2. Cache: Advanced Optimizations
     • Small and simple first-level caches
       – Critical timing path:
         • addressing tag memory, then
         • comparing tags, then
         • selecting the correct data item (way) within the set
       – Direct-mapped caches can overlap tag compare and transmission of data
       – Lower associativity reduces power because fewer cache lines are accessed

  3. L1 Size and Associativity
     [Figure: access time vs. size and associativity]

  4. L1 Size and Associativity
     [Figure: energy per read vs. size and associativity]

  5. Way Prediction
     • To improve hit time, predict the way to pre-set the mux
       – Mis-prediction gives longer hit time
       – Prediction accuracy
         • > 90% for two-way
         • > 80% for four-way
         • I-cache has better accuracy than D-cache
       – First used on MIPS R10000 in mid-90s
       – Used on ARM Cortex-A8
     • Extend to predict block as well
       – “Way selection”
       – Increases mis-prediction penalty
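
The trade-off in the way-prediction slide (a faster hit when the prediction is right, a longer hit when it is wrong) can be sketched as a weighted average. The cycle counts below (1 cycle for a correctly predicted way, 2 cycles after a mis-prediction) are illustrative assumptions, not figures from the lecture; only the accuracy bounds come from the slide.

```python
def expected_hit_time(accuracy, predicted_hit_cycles=1, mispredict_hit_cycles=2):
    """Average hit time when a fraction `accuracy` of hits use the predicted way.

    The default cycle counts are hypothetical placeholders.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return (accuracy * predicted_hit_cycles
            + (1.0 - accuracy) * mispredict_hit_cycles)


if __name__ == "__main__":
    # Accuracy lower bounds quoted on the slide for two-way and four-way caches.
    for label, accuracy in [("two-way", 0.90), ("four-way", 0.80)]:
        print(f"{label}: {expected_hit_time(accuracy):.2f} cycles on average")
```

Under these assumed latencies, lower prediction accuracy at higher associativity translates directly into a longer average hit time.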

  6. Pipelining Cache
     • Pipeline cache access to improve bandwidth
       – Examples:
         • Pentium: 1 cycle
         • Pentium Pro through Pentium III: 2 cycles
         • Pentium 4 through Core i7: 4 cycles
     • Increases branch mis-prediction penalty
     • Makes it easier to increase associativity

  7. Multibanked Caches
     • Organize cache as independent banks to support simultaneous access
       – ARM Cortex-A8 supports 1-4 banks for L2
       – Intel i7 supports 4 banks for L1 and 8 banks for L2
     • Interleave banks according to block address
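
One common way to interleave banks by block address is sequential interleaving, where the bank index is the block address modulo the number of banks. The sketch below assumes that scheme; the 64-byte block size and the function name `bank_of` are illustrative choices, not details from the lecture.

```python
def bank_of(byte_address, block_size=64, num_banks=4):
    """Return the bank holding `byte_address` under sequential interleaving.

    block_size and num_banks are assumed values for illustration.
    """
    block_address = byte_address // block_size  # drop the block-offset bits
    return block_address % num_banks            # consecutive blocks rotate through banks


if __name__ == "__main__":
    # Eight consecutive blocks spread across four banks.
    for address in range(0, 8 * 64, 64):
        print(f"byte address {address:4d} -> bank {bank_of(address)}")
```

Because consecutive blocks land in different banks, accesses to neighbouring blocks can proceed simultaneously instead of queuing on one bank.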

  8. Wish list: Highway

  9. Single Lane Traffic

  10. Limits of Pipelining
      • IBM RISC Experience
        – Control and data dependences add 15% to CPI
        – Best case CPI of 1.15, IPC of 0.87
        – Deeper pipelines (higher frequency) magnify dependence penalties
      • This analysis assumes 100% cache hit rates
        – Hit rates approach 100% for some programs
        – Many important programs have much worse hit rates
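
The CPI and IPC figures on the pipelining-limits slide follow from simple arithmetic: an ideal CPI of 1 plus 0.15 stall cycles per instruction gives 1.15, and IPC is its reciprocal. The sketch below reproduces that calculation. The `memory_stalls` parameter is a hypothetical extension showing how a cache-miss term would enter once the 100% hit-rate assumption is dropped; its example value is invented.

```python
def effective_cpi(base_cpi=1.0, dependence_stalls=0.15, memory_stalls=0.0):
    """Cycles per instruction = ideal CPI + stall cycles per instruction."""
    return base_cpi + dependence_stalls + memory_stalls


if __name__ == "__main__":
    cpi = effective_cpi()
    print(f"CPI = {cpi:.2f}, IPC = {1.0 / cpi:.2f}")  # CPI = 1.15, IPC = 0.87

    # Hypothetical: 0.5 extra stall cycles per instruction from cache misses.
    cpi_with_misses = effective_cpi(memory_stalls=0.5)
    print(f"CPI = {cpi_with_misses:.2f}, IPC = {1.0 / cpi_with_misses:.2f}")
```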

  11. Limits on Instruction Level Parallelism (ILP)
      Study                        ILP    Note
      Weiss and Smith [1984]       1.58
      Sohi and Vajapeyam [1987]    1.81
      Tjaden and Flynn [1970]      1.86   Flynn’s bottleneck
      Tjaden and Flynn [1973]      1.96
      Uht [1986]                   2.00
      Smith et al. [1989]          2.00
      Jouppi and Wall [1988]       2.40
      Johnson [1991]               2.50
      Acosta et al. [1986]         2.79
      Wedig [1982]                 3.00
      Butler et al. [1991]         5.8
      Melvin and Patt [1991]       6
      Wall [1991]                  7      Jouppi disagreed
      Kuck et al. [1972]           8
      Riseman and Foster [1972]    51     no control dependences
      Nicolau and Fisher [1984]    90     Fisher’s optimism

  12. Superscalar Proposal
      • Go beyond single instruction pipeline, achieve IPC > 1
      • Dispatch multiple instructions per cycle
      • Provide more generally applicable form of concurrency (not just vectors)
      • Geared for sequential code that is hard to parallelize otherwise
      • Exploit fine-grained or instruction-level parallelism (ILP)

  13. Motivation for Superscalar [Agerwala and Cocke]
      • Speedup jumps from 3 to 4.3 for N=6, f=0.8, but s=2 instead of s=1 (scalar)
      [Figure: speedup plot, with the "Typical Range" marked]
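
The speedup figures on this slide are consistent with an Amdahl-style model in which a fraction f of the work runs with parallelism N and the remaining 1 - f runs with parallelism s: speedup = 1 / ((1 - f)/s + f/N). The formula itself does not survive in the slide text, so treat it as an assumption that happens to reproduce the quoted numbers; the function name is illustrative.

```python
def speedup(f, n, s=1):
    """Amdahl-style speedup: fraction f at parallelism n, the rest at parallelism s.

    Assumed model; it reproduces the slide's 3 and 4.3 for N=6, f=0.8.
    """
    return 1.0 / ((1.0 - f) / s + f / n)


if __name__ == "__main__":
    print(round(speedup(0.8, 6, s=1), 1))  # 3.0 (scalar handling of the non-parallel part)
    print(round(speedup(0.8, 6, s=2), 1))  # 4.3 (two-wide handling of the non-parallel part)
```

Raising s from 1 to 2 speeds up only the 20% of work that was not parallelized, yet it lifts the overall speedup by more than 40%, which is the case the slide makes for superscalar execution.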

  14. Classifying ILP Machines [Jouppi, DECWRL 1991]
      • Baseline scalar RISC
        – Issue parallelism = IP = 1
        – Operation latency = OP = 1
        – Peak IPC = 1
      [Figure: pipeline timing diagram; successive instructions 1-6, each passing through IF, DE, EX, WB, against time in cycles 0-9 (of baseline machine)]
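
The baseline machine's timing diagram can be regenerated with a short sketch: with issue parallelism 1 and operation latency 1, instruction i enters IF one cycle after instruction i-1 and spends one cycle in each stage. The function names and the `issue_parallelism` parameter are illustrative assumptions; values other than 1 model an idealized wider machine with no stalls and are not taken from the slide.

```python
STAGES = ["IF", "DE", "EX", "WB"]


def baseline_schedule(num_instructions=6, issue_parallelism=1):
    """Map each instruction number to the cycle in which it occupies each stage."""
    schedule = {}
    for i in range(num_instructions):
        start = i // issue_parallelism  # cycle in which this instruction is fetched
        schedule[i + 1] = {stage: start + k for k, stage in enumerate(STAGES)}
    return schedule


def render(schedule):
    """Draw one row per instruction, one column per cycle."""
    total_cycles = max(max(s.values()) for s in schedule.values()) + 1
    lines = []
    for instr, stages in schedule.items():
        row = ["  "] * total_cycles
        for stage, cycle in stages.items():
            row[cycle] = stage
        lines.append(f"{instr}: " + " ".join(row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(baseline_schedule()))
```

For six instructions the last WB lands in cycle 8, so the run takes 9 cycles, and the instructions-per-cycle ratio approaches the peak IPC of 1 as the instruction count grows.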
