Power-Optimal Pipelining in Deep Submicron Technology
Seongmoo Heo and Krste Asanović
Computer Architecture Group, MIT CSAIL
ISLPED 2004, 8/10/2004
Traditional Pipelining
- Goal: Maximum performance
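[Figure: one pipeline stage between clocked flip-flops at supply voltage Vdd; the clock period is made up of Clk-Q delay, propagation delay, and setup time.]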
Pipelining as a Low-Power Tool
- Goal: Low power, fixed throughput
[Figure: clock period divided into Clk-Q delay, propagation delay, setup time, and time slack; the time slack is traded for power (supply voltage scaling).]
Pipelining as a Low-Power Tool
[Figure: power versus delay. Pipelining creates time slack at the cost of flip-flop power overhead; supply voltage scaling then turns the slack into a power saving.]
* Clock frequency fixed
Power-Optimal Pipelining
- Power reduction from pipelining is limited by the power overhead of the increased number of flip-flops
→ Power-Optimal Pipelining
[Figure: power versus delay, with regions marked "too shallow pipelining" and "too deep pipelining"; the optimal pipelining point between them gives the optimal power saving.]
Contribution
- Pipelining is an old idea.
- Research focus has been on the performance impact of pipelining.
- The idea of using pipelining to lower power [Chandrakasan ’92] has not been fully explored in deep submicron technology.
- Analysis and circuit-level simulation of Power-Optimal Pipelining for different regimes of Vth, activity factor, and clock gating
Bottom-to-Top Approach
1. Impact of pipelining on each power component
2. Impact of pipelining on total power (with/without clock-gating)
[Figure: total power versus time across active and inactive periods, shown for the clock-gated and the not clock-gated case, broken into switching, leakage, and idle power components.]
* Idle power = power consumed when circuit is idle and not clock-gated
Methodology
- Target digital system: fixed throughput, highly parallel computation, logic-dominant
- Test bench
– BPTM (Berkeley Predictive Technology Model) 70nm process
– LVT (0.17/-0.2), MVT (0.19/-0.22), HVT (0.21/-0.24)
– Hspice simulation at 100°C, Clock = 2 GHz
- Baseline
[Figure: one pipeline stage of N FO4 inverters (N = 2 ~ 24) between TG flip-flops.]
Pipelining and Switching Power: Analytical Trend
- Quadratic reduction of logic switching power: ∝ Vdd^2 ∝ N^2
- Flip-flop overhead: O(1/N)
[Figure: switching power versus number of FO4 per stage, N, with O(N^2) and O(1/N) curves and the optimal FO4 and optimal saving marked.]
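The trade-off between the two curves above can be explored with a toy model. The sketch below is an illustration, not the simulation setup used in the talk: it assumes relative power of the form a*N^p + b/N, where the first term stands for the logic power trend (p = 2 for switching power) and the second for the flip-flop overhead. The function names and the coefficients `a` and `b` are hypothetical and not fitted to any data.

```python
def relative_power(n, p=2.0, a=1.0, b=250.0):
    """Toy model: logic term a*n**p plus flip-flop overhead b/n.

    a and b are arbitrary illustrative weights, not measured values.
    """
    return a * n**p + b / n


def optimal_depth(p=2.0, a=1.0, b=250.0, depths=range(2, 25)):
    """Return the integer depth N (FO4 per stage) that minimizes the toy model."""
    return min(depths, key=lambda n: relative_power(n, p, a, b))


def analytic_optimum(p=2.0, a=1.0, b=250.0):
    """Continuous minimum of a*N**p + b/N.

    Setting the derivative p*a*N**(p-1) - b/N**2 to zero gives
    N = (b / (p*a)) ** (1 / (p + 1)).
    """
    return (b / (p * a)) ** (1.0 / (p + 1.0))
```

With these illustrative weights, `optimal_depth()` returns 5 and `optimal_depth(p=1.5)` returns 8: in this toy model a weaker right-hand-side exponent moves the minimum toward longer stages.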
Pipelining and Leakage Power: Analytical Trend
- Superlinear reduction of logic leakage power: ∝ Vdd * e^(η Vdd) ∝ N^α (1 < α < 2), from the DIBL effect
- Flip-flop overhead: O(1/N)
[Figure: leakage power versus number of FO4 per stage, N, with O(N^α) and O(1/N) curves and the optimal FO4 and optimal saving marked.]
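To see why the leakage trend is superlinear, the proportionality Vdd * e^(η Vdd) can be evaluated directly. The sketch below is a rough illustration under stated assumptions: the value of η and the supply voltages are hypothetical, the function names are made up for this example, and Vdd is treated as scaling in proportion to N, which is how the slides' "Vdd^2 ∝ N^2" reads.

```python
import math


def leakage_ratio(vdd_new, vdd_old, eta):
    """Ratio of logic leakage power, using leakage ∝ Vdd * exp(eta * Vdd)."""
    return (vdd_new / vdd_old) * math.exp(eta * (vdd_new - vdd_old))


def effective_alpha(vdd_new, vdd_old, eta):
    """Exponent alpha for which leakage_ratio equals (vdd_new / vdd_old) ** alpha."""
    return math.log(leakage_ratio(vdd_new, vdd_old, eta)) / math.log(vdd_new / vdd_old)


# Hypothetical operating points: halving Vdd from 1.0 V to 0.5 V with eta = 1.0 per volt.
alpha = effective_alpha(0.5, 1.0, 1.0)
```

With η = 0 the exponential term vanishes and the exponent is exactly 1 (linear in Vdd); a positive η pushes it above 1. The exponent in this sketch depends on the chosen voltages, so it only indicates the direction of the effect.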
Pipelining and Idle Power: Analytical Trend
- Clock-gating is not always possible
– Increased control complexity
– Insufficient setup time of clock enable signal
- Idle power = leakage power + flip-flop switching power
– Scales between leakage power scaling and flip-flop switching power scaling, depending on leakage level
Pipelining and Idle Power: Analytical Trend
- Linear reduction of flip-flop switching power: ∝ 1/N * Vdd^2 ∝ N
[Figure: two plots of relative power versus number of FO4 per stage, N, each with the optimal FO4 and optimal saving marked. Leakage power scale: O(N^α) (1 < α < 2) and O(1/N) curves. Flip-flop switching power scale: O(N) and O(1/N) curves. Idle power falls between the two scales.]
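The statement that idle power scales between the leakage trend and the flip-flop switching trend can be illustrated with a weighted mix of the two. This is a minimal sketch under assumptions that are not in the slides: `leak_share` (the fraction of idle power due to leakage at the reference depth), the default `alpha`, and the function name are all hypothetical, and only the right-hand-side trends O(N^α) and O(N) are modeled.

```python
def idle_power_scale(n, n_ref, leak_share, alpha=1.5):
    """Relative idle power at depth n versus a reference depth n_ref.

    Mixes the leakage trend (r**alpha) and the flip-flop switching trend (r),
    where r = n / n_ref. leak_share is the assumed leakage fraction at n_ref.
    """
    r = n / n_ref
    return leak_share * r**alpha + (1.0 - leak_share) * r


# Halving the logic depth from 24 to 12 FO4 under three assumed leakage levels.
no_leakage = idle_power_scale(12, 24, 0.0)    # pure flip-flop switching trend
half_leakage = idle_power_scale(12, 24, 0.5)  # mixed
all_leakage = idle_power_scale(12, 24, 1.0)   # pure leakage trend
```

In this sketch a higher leakage share gives a larger idle power reduction for the same change in depth.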
Simulation Results: Power Components
Power Components | Leakage Power | Idle Power | Switching Power
Right hand side curve | O(N^α) (1 < α < 2) | O(N) or O(N^α) (1 < α < 2) | O(N^2)
N* | 8 | 6 | 6
Saving* | 70 (LVT) ~ 75 (HVT) % | 55 (HVT) ~ 70 (LVT) % | 79 (HVT) ~ 82 (LVT) %
N = Number of FO4 inverters per stage (not including flip-flop delay)
N* = Optimal N
Saving* = Optimal power saving by pipelining
Fixed throughput @ 2 GHz
Optimal Power Saving
- Clock gating: optimal FO4 = 6
- No clock gating: optimal FO4 = 6~8
[Figure: relative power versus activity factor for the clock-gated and not clock-gated cases, with labels for the switching, leakage, and idle power components and for LVT.]
* 2 GHz
* Flip-flop delay not included in optimal FO4
Discussion
- LVT can be fast and power-efficient
– enables lower Vdd
- Flip-flop delay is more important than flip-flop power for power-optimal pipelining
Limitation of This Work
Factor | Effect on optimal logic depth | Effect on optimal power saving
Reduced glitches | ↓ | ↑
Parasitic wire capacitance | ↑ | ↓
Additional memory | ↑ | ↓
Super-linear growth of flip-flops | ↑ | ↓
Conclusion
- Pipelining is an effective low-power tool when used to support voltage scaling in digital systems implementing highly parallel computation.
- Optimal logic depth: 6-8 FO4
– ~8-10 FO4 including flip-flop delay
- Optimal power saving: 55-80%
– Depends on Vth, AF, and clock-gating
- Insights:
– Pipelining is more effective with high AF: it is most effective at saving switching power
– Pipelining is more effective with lower Vth, except when leakage power is dominant
– Pipelining is more effective with clock-gating: reduced flip-flop overhead
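The logic depth figures above can be related to the 2 GHz clock with simple arithmetic. The sketch below assumes, as a simplification not stated in the slides, that the whole clock period is taken up by logic plus flip-flop delay once the supply voltage has been scaled, so no slack is left. The function name is hypothetical.

```python
def fo4_delay_budget_ps(clock_ghz, fo4_per_cycle):
    """Time available per FO4 delay, in picoseconds, if the cycle is fully used."""
    period_ps = 1000.0 / clock_ghz  # 2 GHz gives a 500 ps period
    return period_ps / fo4_per_cycle


# 8 to 10 FO4 per cycle including flip-flop delay, at the 2 GHz clock.
budget_at_8 = fo4_delay_budget_ps(2.0, 8)    # 62.5 ps per FO4
budget_at_10 = fo4_delay_budget_ps(2.0, 10)  # 50.0 ps per FO4
```

Under that assumption, the difference between 6-8 FO4 of logic and 8-10 FO4 per cycle corresponds to roughly 2 FO4 of flip-flop delay.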
Acknowledgments
- Thanks to SCALE group members and anonymous reviewers
- Funded by NSF CAREER award CCR-