
Power-Optimal Pipelining in Deep Submicron Technology, Seongmoo Heo

ISLPED 2004, 8/10/2004. Power-Optimal Pipelining in Deep Submicron Technology. Seongmoo Heo and Krste Asanović, Computer Architecture Group, MIT CSAIL.


  1. ISLPED 2004, 8/10/2004. Power-Optimal Pipelining in Deep Submicron Technology. Seongmoo Heo and Krste Asanović, Computer Architecture Group, MIT CSAIL

  2. Traditional Pipelining • Goal: Maximum performance [Figure: pipeline stages at supply voltage Vdd between clocked (Clk) flip-flops; timing labels: Clk-Q, Propagation Delay, Setup]

  3. Pipelining as a Low-Power Tool • Goal: Low-Power, Fixed Throughput [Figure: pipeline stages at supply voltage Vdd between clocked (Clk) flip-flops; timing labels: Clk-Q, Propagation Delay, Setup, Time Slack]

  4. Pipelining as a Low-Power Tool • Goal: Low-Power, Fixed Throughput [Figure: pipeline stages at supply voltage Vdd between clocked (Clk) flip-flops; timing labels: Clk-Q, Propagation Delay, Setup, Time Slack; the Time Slack is traded for power (supply voltage scaling)]

  5. Pipelining as a Low-Power Tool [Figure: Power vs. Delay, clock frequency fixed; labels: Flip-flop Power (Pipelining Overhead), Time slack]

  6. Pipelining as a Low-Power Tool [Figure: Power vs. Delay, clock frequency fixed; labels: Supply voltage scaling, Power Saving]
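
The trade of time slack for supply voltage on slides 3 to 6 can be sketched numerically. This is a minimal sketch, not the authors' method: it assumes the alpha-power-law delay model (gate delay ∝ Vdd / (Vdd - Vth)^alpha), which the slides do not mention, and the values `vth=0.2`, `alpha=1.3`, and `vnom=1.0` are illustrative assumptions rather than numbers from the talk. It finds the supply voltage at which gates become slower by a chosen factor, which is the amount of slack being spent.

```python
def gate_delay(vdd, vth=0.2, alpha=1.3):
    """Relative gate delay under the alpha-power law (an assumed model)."""
    return vdd / (vdd - vth) ** alpha


def scaled_vdd(slowdown, vnom=1.0, vth=0.2, alpha=1.3, tol=1e-9):
    """Supply voltage whose gate delay is `slowdown` times the delay at vnom.

    Uses bisection; for alpha > 1 the delay falls monotonically as Vdd rises,
    so there is a single crossing between vth and vnom when slowdown >= 1.
    """
    target = slowdown * gate_delay(vnom, vth, alpha)
    lo, hi = vth + 1e-6, vnom
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gate_delay(mid, vth, alpha) > target:
            lo = mid  # still too slow: needs a higher voltage
        else:
            hi = mid  # fast enough: try a lower voltage
    return hi


# Example: a stage with 2x time slack (hypothetical figure).
v = scaled_vdd(2.0)
print(round(v, 3), round(v ** 2, 3))  # scaled Vdd and relative switching power
```

At a fixed clock frequency, switching power scales with Vdd squared, so the second printed value is the relative switching power of the logic after scaling under these assumptions.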

  7. Power-Optimal Pipelining • Power reduction from pipelining is limited by the power overhead of the increased number of flip-flops → Power-Optimal Pipelining

  8. Power-Optimal Pipelining • Power reduction from pipelining is limited by the power overhead of the increased number of flip-flops → Power-Optimal Pipelining [Figure: Power vs. Delay; label: Too shallow pipelining]

  9. Power-Optimal Pipelining • Power reduction from pipelining is limited by the power overhead of the increased number of flip-flops → Power-Optimal Pipelining [Figure: Power vs. Delay; labels: Too deep pipelining, Too shallow pipelining]

  10. Power-Optimal Pipelining • Power reduction from pipelining is limited by the power overhead of the increased number of flip-flops → Power-Optimal Pipelining [Figure: Power vs. Delay; labels: Too deep pipelining, Too shallow pipelining, Optimal pipelining, Optimal Power Saving]

  11. Contribution • Pipelining is an old idea. • Research focus has been on the performance impact of pipelining. • The idea of using pipelining to lower power [Chandrakasan ’92] has not been fully explored in deep submicron technology. • Analysis and circuit-level simulation of Power-Optimal Pipelining for different regimes of Vth, activity factor, and clock gating

  12. Bottom-to-Top Approach 1. Impact of pipelining on each power component 2. Impact of pipelining on total power (with/without clock-gating) [Figure: Total Power (clock-gated) vs. Time over active, active, and inactive periods; labels: Switching Power Component, Leakage Power Component, Idle Power Component]

  13. Bottom-to-Top Approach 1. Impact of pipelining on each power component 2. Impact of pipelining on total power (with/without clock-gating) [Figure: Total Power (not clock-gated) vs. Time over active, active, and inactive periods; labels: Switching Power Component, Leakage Power Component, Idle Power Component] *Idle power = power consumed when circuit is idle and not clock-gated
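
How the three components might combine into total power can be written out explicitly. The function below is a hedged sketch under a simple duty-cycle assumption: the block draws the switching component while active and, while inactive, draws either the leakage component (clock-gated) or the idle component (not clock-gated). The name `average_power` and the weighting by activity factor are illustrative assumptions, not a formula taken from the slides.

```python
def average_power(activity, p_switching, p_leakage, p_idle, clock_gated):
    """Time-averaged power for a block active a fraction `activity` of the time.

    Assumed model (not from the slides): active periods cost p_switching;
    inactive periods cost p_leakage if clock-gated, otherwise p_idle.
    """
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must be between 0 and 1")
    # Without clock gating the flip-flops keep toggling during inactive
    # periods, so the block pays idle power instead of leakage alone.
    p_inactive = p_leakage if clock_gated else p_idle
    return activity * p_switching + (1.0 - activity) * p_inactive


# Hypothetical component values in arbitrary units.
gated = average_power(0.25, 8.0, 1.0, 3.0, clock_gated=True)
ungated = average_power(0.25, 8.0, 1.0, 3.0, clock_gated=False)
print(gated, ungated)
```

In this model the gap between the two cases grows as the activity factor drops, because more of the time is spent in the inactive state.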

  14. Methodology
      • Target digital system: fixed throughput, highly parallel computation, logic-dominant
      • Test bench
        – BPTM (Berkeley Predictive Technology Model) 70nm process: LVT(0.17/-0.2), MVT(0.19/-0.22), HVT(0.21/-0.24)
        – Hspice simulation at 100°C, Clock = 2 GHz
      [Figure: baseline circuit; one pipeline stage is N FO4 inverters (N = 2 ~ 24) between TG flip-flops]

  15. Pipelining and Switching Power: Analytical Trend [Figure: Switching Power vs. Number of FO4 per stage, N. Left of the optimum: flip-flop overhead, O(1/N). Right of the optimum: quadratic reduction of logic switching power, O(N^2), since logic switching power ∝ Vdd^2 ∝ N^2. The minimum marks the Optimal FO4 and the Optimal Saving.]
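
The two trends on this slide imply an interior optimum, which a toy model makes concrete. The sketch below simply adds a logic term growing as N^2 to a flip-flop term falling as 1/N; treating total switching power as this sum, and the coefficients `logic_coeff` and `ff_coeff`, are assumptions for illustration. The default coefficients are not fitted to any data; they are picked only so that the toy optimum lands at 6 FO4, the N* the deck reports for switching power.

```python
def switching_power(n, logic_coeff=1.0, ff_coeff=432.0):
    """Toy relative switching power at n FO4 per stage (assumed model).

    logic_coeff * n^2 : logic switching power after voltage scaling
    ff_coeff / n      : flip-flop overhead, more flip-flops as stages shrink
    """
    return logic_coeff * n ** 2 + ff_coeff / n


def optimal_fo4(logic_coeff=1.0, ff_coeff=432.0):
    """Closed-form minimizer of a*n^2 + b/n: n = (b / (2a))^(1/3)."""
    return (ff_coeff / (2.0 * logic_coeff)) ** (1.0 / 3.0)


# Sweep the same range of logic depths the methodology uses (N = 2 ~ 24).
best_integer_n = min(range(2, 25), key=switching_power)
print(best_integer_n, optimal_fo4())
```

Because the optimum scales as the cube root of the coefficient ratio, it moves slowly: in this toy model, doubling the flip-flop overhead raises the optimal depth by only about 26%.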

  16. Pipelining and Leakage Power: Analytical Trend [Figure: Leakage Power vs. Number of FO4 per stage, N. Left of the optimum: flip-flop overhead, O(1/N). Right of the optimum: superlinear reduction of logic leakage power, O(N^α) (1 < α < 2), since logic leakage power ∝ Vdd * e^(η Vdd) ∝ N^α (DIBL effect). The minimum marks the Optimal FO4 and the Optimal Saving.]
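
The superlinear exponent can be illustrated directly from the proportionality Vdd * e^(η Vdd). This is a rough sketch: `eta=1.0` per volt and `vnom=1.0` are arbitrary illustrative values, and reading the exponent in Vdd as the exponent α in N assumes Vdd scales roughly in proportion to N, as slide 15's Vdd^2 ∝ N^2 suggests.

```python
import math


def relative_leakage(vdd, vnom=1.0, eta=1.0):
    """Leakage power at vdd relative to vnom, using P ∝ Vdd * exp(eta * Vdd)."""
    return (vdd / vnom) * math.exp(eta * (vdd - vnom))


def effective_exponent(vdd, vnom=1.0, eta=1.0):
    """Exponent a such that relative_leakage == (vdd / vnom) ** a.

    Requires vdd > 0 and vdd != vnom.
    """
    return math.log(relative_leakage(vdd, vnom, eta)) / math.log(vdd / vnom)


# Halving the supply voltage under the assumed eta.
print(relative_leakage(0.5), effective_exponent(0.5))
```

With these values the effective exponent falls between 1 (the linear Vdd factor alone) and 2 (the quadratic switching case); a larger `eta` would push it higher, so the range depends on the assumed DIBL strength.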

  17. Pipelining and Idle Power: Analytical Trend
      • Clock-gating is not always possible
        – Increased control complexity
        – Insufficient setup time of clock enable signal
      • Idle power = Leakage Power + Flip-flop Switching Power
        – Its scaling falls between leakage power scaling and flip-flop switching power scaling, depending on leakage level

  18. Pipelining and Idle Power: Analytical Trend [Figure: two panels of Relative Power vs. Number of FO4 per stage, N, each marking the Optimal FO4 and the Optimal Idle Power Saving. Leakage Power Scale panel: O(1/N) on the left, O(N^α) (1 < α < 2) on the right. Flip-flop Switching Power Scale panel: O(1/N) on the left, O(N) on the right, a linear reduction of flip-flop switching power ∝ 1/N * Vdd^2 ∝ N.]

  19. Simulation Results: Power Components (Fixed Throughput @ 2 GHz)

      Power Components        Switching Power     Leakage Power        Idle Power
      Right hand side curve   O(N^2)              O(N^α) (1<α<2)       O(N) or O(N^α) (1<α<2)
      Saving*                 79(HVT)~82(LVT)%    70(LVT)~75(HVT)%     55(HVT)~70(LVT)%
      N*                      6                   6                    8

      N = Number of FO4 inverters per stage
      N* = Optimal N
      Saving* = Optimal power saving by pipelining
      (Not including flip-flop delay)

  20. Optimal Power Saving [Figure: two panels, No Clock Gating and Clock Gating, of relative power vs. activity factor at 2 GHz; annotations: Optimal FO4 = 6, Optimal FO4 = 6~8] *Flip-flop delay not included in optimal FO4

  21. Optimal Power Saving [Figure: two panels, No Clock Gating and Clock Gating, of relative power vs. activity factor; annotations: Optimal FO4 = 6, Optimal FO4 = 6~8; regions labeled Switching Power, Leakage Power, Idle Power]

  22. Optimal Power Saving [Figure: two panels, No Clock Gating and Clock Gating, of relative power vs. activity factor; annotations: Optimal FO4 = 6, Optimal FO4 = 6~8; curve labeled LVT]

  23. Discussion • LVT can be fast and power-efficient – It enables lower Vdd • Flip-flop delay is more important than flip-flop power for power-optimal pipelining

  24. Limitation of This Work

      Factor                               Effect on optimal logic depth   Effect on optimal power saving
      Super-linear growth of flip-flops    ↑                               ↓
      Additional memory                    ↑                               ↓
      Reduced glitches                     ↓                               ↑
      Parasitic wire capacitance           ↑                               ↓

  25. Conclusion
      • Pipelining is an effective low-power tool when used to support voltage scaling in digital systems implementing highly parallel computation.
      • Optimal Logic Depth: 6-8 FO4
        – ~8-10 FO4 including flip-flop delay
      • Optimal Power Saving: 55-80%
        – Depends on Vth, activity factor (AF), and clock-gating
      • Insights:
        – Pipelining is more effective with high AF
          • Pipelining is most effective at saving switching power
        – Pipelining is more effective with lower Vth
          • Except when leakage power is dominant
        – Pipelining is more effective with clock-gating
          • Reduced flip-flop overhead

  26. Acknowledgments • Thanks to SCALE group members and anonymous reviewers • Funded by NSF CAREER award CCR-0093354, NSF ITR award CCR-0219545, and a donation from Intel Corporation.

  27. BACKUP SLIDES
