
Near-Threshold Computing: Reclaiming Moore's Law



  1. Near-Threshold Computing: Reclaiming Moore’s Law. Dr. Ronald G. Dreslinski, Research Fellow, University of Michigan, Ann Arbor. University of Michigan, EnA-HPC, September 7, 2011

  2. Motivation
      [Chart: Transistors (100,000's), Power (W), Performance (GOPS), and Efficiency (GOPS/W) on a log axis from 0.001 to 1,000,000, plotted over 1985 to 2020.]
      • Limits on heat extraction
      • Stagnates performance growth
      • Limits on energy-efficiency of operations

  3. Motivation
      [Chart: Transistors (100,000's), Power (W), Performance (GOPS), and Efficiency (GOPS/W) on a log axis from 0.001 to 1,000,000, plotted over 1985 to 2020. Timeline labels: Era of High Performance Computing, Era of Energy-Efficient Computing, c. 2000.]
      • Goal: To increase energy-efficiency of operations
      • With the help of some better thermal management…
      • Result: Continue scaling trends that fueled the computing revolution

  4. Outline
      • Define a new region of operation, Near-Threshold Computing
      • Explore new architectures enabled by key insights of computing in the NTC region
      • Present an initial design of a 3D stacked NTC system, Centip3De

  5. Power Density Limitations
      • Circuit supply voltages are no longer scaling…
      • Power does not decrease at the same rate that transistor count increases
      • Environmental Concerns
      • Form factor vs. Battery Life
      [Equation annotations: Stagnant; Shrinking; A = gate area, scaling 1/s^2; C = capacitance, scaling < 1/s; Dynamic dominates.]
      • Dark Silicon, the emerging dilemma: more and more gates can fit on a die, but not all can be turned on at the same time

  6. Today: Super-Vth, High Performance, Power Constrained
      [Plot: Energy/Operation and Log(Delay) (normalized power, energy, and performance) vs. supply voltage, with 0, Vth, and Vnom marked; Super-Vth region highlighted. Labels: 3+ GHz, 0.5 mW/MHz, Core i7.]
      • Energy per operation is the key metric for efficiency.
      • Goal: same performance, low energy per operation

  7. Subthreshold Design
      [Plot: Energy/Operation and Log(Delay) vs. supply voltage, with 0, Vth, and Vnom marked; Sub-Vth and Super-Vth regions. Labels: Energy/Operation 12-16X; Log(Delay) 500-1000X.]
      • Operating in the sub-threshold gives us huge power gains at the expense of performance -> OK for sensors!

  8. Evolution of Subthreshold Designs
      • Subliminal 1 Design (2006): 0.13 µm CMOS; used to investigate existence of Vmin; 2.60 µW/MHz
      • Subliminal 2 Design (2007): 0.13 µm CMOS; used to investigate process variation; 3.5 µW/MHz
      • Phoenix 1 Design (2008): 0.18 µm CMOS; used to investigate sleep current; 2.8 µW/MHz, 30 pW sleep power
      • Phoenix 2 Design (2010): 0.18 µm CMOS; commercial ARM M3 core; used to investigate energy harvesting and power management; 37.4 µW/MHz

  9. Near-Threshold Computing (NTC)
      [Plot: Energy/Operation and Log(Delay) vs. supply voltage, with 0, Vth, and Vnom marked; Sub-Vth and Super-Vth regions. Energy/Operation labels: ~6-8X, ~2X. Log(Delay) labels: ~50-100X, ~10X.]
      Near-Threshold Computing (NTC):
      • >60X power reduction
      • 6-8X energy reduction
      • Invest portion of extra transistors from scaling to overcome barriers
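The power figure on this slide can be related to the other two numbers with a short calculation. The sketch below assumes that average power is energy per operation times operations per second, and that the "~10X" delay label is the slowdown at NTC; the function name is illustrative and not from the talk.

```python
def power_reduction(energy_reduction, slowdown):
    """Power = energy per operation x operations per second.

    If each operation costs `energy_reduction` times less energy and the
    circuit completes `slowdown` times fewer operations per second, average
    power falls by the product of the two factors.
    """
    return energy_reduction * slowdown


# Assumed inputs: 6-8X energy reduction, ~10X slowdown (read off the slide).
low = power_reduction(6, 10)
high = power_reduction(8, 10)
print(f"{low}X to {high}X lower power")
```

Under these assumptions the result is in line with the slide's ">60X power reduction" claim.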

  10. Silicon Verification of Trends
      Phoenix 2 Processor, Phoenix 2 Design [Seok’11]:
      • 180nm design
      • 1.8V -> 700mV
      • ~10x NTC performance loss
      • ~7x NTC energy reduction
      (Seok, ISSCC 2011)
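As a rough sanity check on the measured energy figure, dynamic switching energy scales approximately with the square of the supply voltage. The sketch below applies only that first-order rule to the two voltages on the slide; it ignores leakage and any change in switched capacitance, so it is an approximation rather than the analysis used in the paper.

```python
V_NOM = 1.8  # nominal supply voltage from the slide, in volts
V_NTC = 0.7  # near-threshold supply voltage from the slide, in volts

# First-order assumption: dynamic energy per operation ~ C * V^2,
# with the switched capacitance C unchanged between the two operating points.
dynamic_energy_ratio = (V_NOM / V_NTC) ** 2

print(f"Estimated dynamic energy reduction: {dynamic_energy_ratio:.1f}x")
```

The estimate comes out between 6x and 7x, which is close to the "~7x" reduction reported for the silicon.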

  11. NTC: Opportunities and Challenges
      • Opportunities:
          • New architectures
          • Optimized processes
          • 3D integration (fewer thermal restrictions)
      • Challenges:
          • Low-voltage memory
              • New SRAM designs
              • Robustness analysis at near-threshold
          • Variation
              • Razor [Ernst’03] and other in-situ delay monitoring
              • Adaptive body biasing
          • Performance loss
              • Many-core designs to improve parallelism
              • Core boosting to improve single-thread performance

  12. Outline
      • Define a new region of operation, Near-Threshold Computing
      • Explore new architectures enabled by key insights of computing in the NTC region
      • Present an initial design of a 3D stacked NTC system, Centip3De

  13. Minimum Energy SRAM
      [Plot legend: Total, Dynamic, Leakage.]
      • SRAM has a lower activity rate than logic
      • VDD for minimum energy operation (VMIN) is higher
      • Running logic at VMIN for SRAM has a small energy penalty with increased performance
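The link between activity rate and VMIN can be illustrated with a toy energy model. Everything in the sketch below is an assumption made for illustration: the smooth on-current expression, the threshold voltage, the slope factor, the leakage weight, and the two activity values are not taken from the talk. It only shows the qualitative effect that a block with lower switching activity, where leakage is a larger share of total energy, reaches its energy minimum at a higher supply voltage.

```python
import math

N_VT = 0.039  # assumed slope factor times thermal voltage, in volts
V_TH = 0.30   # assumed threshold voltage, in volts
LEAK = 0.05   # assumed leakage weight, arbitrary units


def on_current(vdd):
    # Smooth interpolation (assumed): exponential in vdd below V_TH,
    # roughly quadratic in (vdd - V_TH) well above it.
    return math.log(1.0 + math.exp((vdd - V_TH) / (2.0 * N_VT))) ** 2


def energy_per_op(vdd, activity):
    delay = vdd / on_current(vdd)    # gate delay ~ C*V/I_on, with C = 1
    dynamic = activity * vdd ** 2    # switching energy ~ activity * C * V^2
    leakage = LEAK * vdd * delay     # leakage energy ~ I_leak * V * delay
    return dynamic + leakage


def v_min(activity):
    # Search a 5 mV grid from 0.10 V to 1.20 V for the lowest total energy.
    grid = [0.10 + 0.005 * i for i in range(221)]
    return min(grid, key=lambda v: energy_per_op(v, activity))


vmin_logic = v_min(0.2)    # hypothetical activity factor for logic
vmin_sram = v_min(0.02)    # hypothetical, ten times lower activity for SRAM
print(f"logic VMIN ~ {vmin_logic:.2f} V, SRAM VMIN ~ {vmin_sram:.2f} V")
```

In this model the dynamic term grows with voltage while the leakage term shrinks, so reducing the activity factor moves the crossover point upward, matching the direction stated on the slide.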

  14. New NTC Architectures
      [Diagrams: cores with private L1s, and clusters of cores sharing an L1, each connected through a BUS / Switched Network to Next Level Memory.]
      Key Insight:
      • SRAM is run at a higher VDD than cores with little energy penalty, allowing caches to operate faster than the core
      Design Levers:
      • Operating Voltage
      • L1 Size
      • Number of Cores per Cluster
      • Number of Clusters

  15. L1 Cache Size Tradeoff
      [Diagram: Core, L1, L2 hierarchy shown twice.]
      • Decreased Miss Rate
      • Higher Energy/Access

  16. Results: Energy Optimal L1 Size (Single Core)
      • Energy dependency on L1 size
      • Trade-off between L1 and L2 access
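The trade-off between L1 and L2 access energy can be sketched with a simple model in which every access pays the L1 cost and every miss additionally pays the L2 cost. The scaling rules and constants below are hypothetical placeholders chosen only to show the shape of the curve; they are not the data behind the slide's figure.

```python
def l1_energy_per_access(size_kb):
    # Assumption: L1 access energy grows with the square root of capacity,
    # normalized to 1.0 at 16 kB.
    return (size_kb / 16.0) ** 0.5


def l1_miss_rate(size_kb):
    # Assumption: miss rate falls with the square root of capacity,
    # starting from a hypothetical 5% at 16 kB.
    return 0.05 * (size_kb / 16.0) ** -0.5


L2_ENERGY_PER_ACCESS = 40.0  # hypothetical, in the same normalized units


def energy_per_reference(size_kb):
    # Every reference accesses L1; misses also access L2.
    return l1_energy_per_access(size_kb) + l1_miss_rate(size_kb) * L2_ENERGY_PER_ACCESS


sizes_kb = [8, 16, 32, 64, 128, 256]
best_size_kb = min(sizes_kb, key=energy_per_reference)
for size in sizes_kb:
    print(f"{size:>4} kB: {energy_per_reference(size):.3f}")
print(f"energy-optimal L1 size in this toy model: {best_size_kb} kB")
```

With these placeholder numbers the minimum falls at an intermediate size: smaller caches lose energy to L2 accesses, and larger caches lose it to costlier L1 accesses.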

  17. Clustering Tradeoffs
      [Diagram: CPUs with private L1s versus CPUs sharing L1s in clusters, each above an L2.]
      Tradeoffs:
      + Clustered Sharing
      - Cluster Conflict
      - New Bus
      - L1 Speed

  18. Energy Optimal Cluster-based CMP (Fixed Die Size)

  19. Full Space Analysis

  20. Various Scaling Methods
      • Baseline
          • Single CPU @ 233MHz
      • Simple CMP
          • One core per L1
          • Vdd scaling
      • Proposed cluster-based CMP
          • Multiple cores per L1
          • Vdd scaling
      [Chart: Normalized Energy/Operation (0 to 1), split into Core, L1, and L2, for Uniprocessor, CMP w/ DVFS, and NTC. Labels: 38%, 71%, 53%; 4 Cores, 4 L1's; 2 Cores/Cluster, 3 Clusters.]

  21. Energy Optima for SPLASH2
      • Cluster-based architecture with Vdd and Vth scaling
      • Optimal cluster size is 2 for most of the apps
      • rad chooses a non-clustered CMP
      • Average: 74% energy savings over baseline, 55% energy savings over simple CMP

      app   n_c   k   L1 size (kB)   energy savings    energy savings
                                     over baseline     over simple CMP
      cho     3   2        64            70.8%             52.8%
      fft     2   2        32            72.6%             68.5%
      fmm     8   2       128            79.7%             41.6%
      luc     3   2        32            77.8%             64.4%
      lun     2   2        64            69.2%             58.0%
      rad    16   1       128            84.2%             35.1%
      ray     3   2       128            65.1%             54.9%
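The stated averages can be checked against the seven rows that survive in this transcript. The sketch below hard-codes those rows and takes a plain arithmetic mean; reading n_c as the number of clusters and k as the cores per cluster is an interpretation based on the design levers listed on slide 14, not something the table itself states.

```python
# (app, n_c, k, L1 size in kB, % savings over baseline, % savings over simple CMP)
ROWS = [
    ("cho", 3, 2, 64, 70.8, 52.8),
    ("fft", 2, 2, 32, 72.6, 68.5),
    ("fmm", 8, 2, 128, 79.7, 41.6),
    ("luc", 3, 2, 32, 77.8, 64.4),
    ("lun", 2, 2, 64, 69.2, 58.0),
    ("rad", 16, 1, 128, 84.2, 35.1),
    ("ray", 3, 2, 128, 65.1, 54.9),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


mean_over_baseline = mean(row[4] for row in ROWS)
mean_over_simple_cmp = mean(row[5] for row in ROWS)

# Apps whose optimum uses more than one core per L1 (assumed meaning of k).
clustered_apps = [row[0] for row in ROWS if row[2] > 1]

print(f"mean savings over baseline:   {mean_over_baseline:.1f}%")
print(f"mean savings over simple CMP: {mean_over_simple_cmp:.1f}%")
print(f"apps with k > 1: {clustered_apps}")
```

The mean over baseline is about 74.2%, matching the slide. The mean over simple CMP for these seven rows is about 53.6%, slightly below the stated 55%; the slide may average over applications not shown here or use a different rounding.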

  22. Energy Optima w/ Performance Requirements
      • Cluster-based approach provides best savings
      • Traditional approach only saves energy at high end
      [Figure labels: 53%, 20%, 32%.]

  23. Outline
      • Define a new region of operation, Near-Threshold Computing
      • Explore new architectures enabled by key insights of computing in the NTC region
      • Present an initial design of a 3D stacked NTC system, Centip3De

  24. A Closer Look at Wafer-Level Stacking
      [Cross-section legend: Oxide; Silicon; Dielectric (SiO2/SiN); “Super-Contact”; Gate Poly; STI (Shallow Trench Isolation); W (Tungsten contact & via); Al (M1 – M5); Cu (M6, Top Metal).]
      Illustration from Bob Patti, Tezzaron

  25. Next, Stack a Second Wafer & Thin:

  26. Then, Stack a Third Wafer:
      [Diagram labels: 3rd wafer; 2nd wafer; 1st wafer: controller.]

  27. Centip3De: 3D NTC Prototype
      [Stack diagram labels, in order: Logic - A; Logic - B; F2F Bond; Logic - B; Logic - A; DRAM Sense/Logic, Bond Routing; DRAM; F2F Bond; DRAM.]
      Centip3De Design
      • 130nm, 7-Layer 3D-Stacked Chip
      • 128 ARM M3 Cores
      • 150 mm^2
