Power-Optimal Pipelining in Deep Submicron Technology (Seongmoo Heo, presentation slides)

slide-1
SLIDE 1

Power-Optimal Pipelining in Deep Submicron Technology

Seongmoo Heo and Krste Asanović, Computer Architecture Group, MIT CSAIL

ISLPED 2004 8/10/2004

slide-2
SLIDE 2

Traditional Pipelining

  • Goal: Maximum performance

[Figure: pipeline stage timing between Clk edges (Clk-Q delay, propagation delay, setup time) at supply voltage Vdd]

slide-3
SLIDE 3

Pipelining as a Low-Power Tool

  • Goal: Low-Power, Fixed Throughput

[Figure: pipeline stage timing between Clk edges (Clk-Q delay, propagation delay, setup time), leaving time slack at supply voltage Vdd]

slide-4
SLIDE 4

Pipelining as a Low-Power Tool

  • Goal: Low-Power, Fixed Throughput

[Figure: pipeline stage timing between Clk edges (Clk-Q delay, propagation delay, setup time); the time slack is traded for power (supply voltage scaling)]

slide-5
SLIDE 5

Pipelining as a Low-Power Tool

[Figure: power vs. delay; pipelining creates time slack at the cost of flip-flop power overhead]

* Clock frequency fixed

slide-6
SLIDE 6

Pipelining as a Low-Power Tool

[Figure: power vs. delay; supply voltage scaling turns the time slack into a power saving]

* Clock frequency fixed
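The slack-for-voltage trade can be sketched numerically. This is a minimal sketch, not the Hspice/BPTM flow used in the talk: it assumes the alpha-power-law delay model, delay ∝ Vdd / (Vdd − Vth)^a, with hypothetical values Vdd,nom = 1.0 V, Vth = 0.2 V and a = 1.3, and bisects for the lowest Vdd at which logic that used only part of the fixed clock period still meets timing. The helper names `relative_delay` and `scaled_vdd` are illustrative.

```python
# Minimal sketch: how far can Vdd drop when pipelining leaves time slack?
# Assumes the alpha-power-law delay model with HYPOTHETICAL parameters;
# this is not the simulation flow used in the talk.


def relative_delay(vdd: float, vth: float = 0.2, a: float = 1.3) -> float:
    """Gate delay up to a constant factor: Vdd / (Vdd - Vth)**a."""
    return vdd / (vdd - vth) ** a


def scaled_vdd(slack_fraction: float, vdd_nom: float = 1.0,
               vth: float = 0.2, a: float = 1.3, iters: int = 60) -> float:
    """Lowest Vdd at which logic that needed (1 - slack_fraction) of the clock
    period at vdd_nom still fits in the full period (clock frequency fixed)."""
    # The logic may become 1 / (1 - slack_fraction) times slower.
    target = relative_delay(vdd_nom, vth, a) / (1.0 - slack_fraction)
    # For a >= 1, delay decreases monotonically with Vdd above Vth, so bisect.
    lo, hi = vth + 1e-6, vdd_nom
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if relative_delay(mid, vth, a) > target:
            lo = mid  # too slow: the supply must stay higher
        else:
            hi = mid  # meets timing: try a lower supply
    return hi


if __name__ == "__main__":
    for slack in (0.0, 0.25, 0.5):
        v = scaled_vdd(slack)
        # At fixed frequency, switching power scales with Vdd**2.
        print(f"slack={slack:.2f}  Vdd={v:.3f} V  switching power x{v * v:.2f}")
```

Bisection is used because, for a ≥ 1, the model's delay is monotonic in Vdd above Vth, so no derivative is needed.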

slide-7
SLIDE 7

Power-Optimal Pipelining

  • Power reduction from pipelining is limited by the power overhead of the increased number of flip-flops

→ Power-Optimal Pipelining

slide-8
SLIDE 8

Power-Optimal Pipelining

[Figure: power vs. delay; too shallow pipelining]

  • Power reduction from pipelining is limited by the power overhead of the increased number of flip-flops

→ Power-Optimal Pipelining

slide-9
SLIDE 9

Power-Optimal Pipelining

[Figure: power vs. delay; too deep and too shallow pipelining]

  • Power reduction from pipelining is limited by the power overhead of the increased number of flip-flops

→ Power-Optimal Pipelining

slide-10
SLIDE 10

Power-Optimal Pipelining

[Figure: power vs. delay; too deep, too shallow, and optimal pipelining, with the optimal power saving marked]

  • Power reduction from pipelining is limited by the power overhead of the increased number of flip-flops

→ Power-Optimal Pipelining
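The "too shallow / too deep" trade-off can be illustrated with a toy cost model. This is a sketch with hypothetical coefficients chosen only to make the curve U-shaped, not the authors' model: logic power is assumed to fall as (N/N0)² as stages get shallower in logic (borrowing the O(N²) trend labelled on the switching-power slide), while flip-flop overhead grows as N0/N because more pipeline registers are needed. N0 = 24 is an assumed unpipelined depth matching the top of the test-bench sweep.

```python
# Toy model of the power-optimal pipelining trade-off (HYPOTHETICAL coefficients).
# Logic power falls as (N / N0)**2 with more voltage scaling;
# flip-flop overhead grows as N0 / N because more pipeline registers are needed.


def toy_relative_power(n: int, n0: int = 24, a: float = 1.0, b: float = 0.05) -> float:
    """Relative power for a logic depth of n FO4 per stage (n0 = unpipelined depth)."""
    logic = a * (n / n0) ** 2
    flip_flops = b * (n0 / n)
    return logic + flip_flops


def optimal_depth(n_min: int = 2, n_max: int = 24, **model) -> int:
    """Sweep the logic depth and return the N with the lowest toy power."""
    return min(range(n_min, n_max + 1),
               key=lambda n: toy_relative_power(n, **model))


if __name__ == "__main__":
    best = optimal_depth()
    baseline = toy_relative_power(24)
    print(f"optimal N = {best} FO4/stage, "
          f"toy saving = {1 - toy_relative_power(best) / baseline:.0%}")
```

Large N (too shallow pipelining) is dominated by the logic term; small N (too deep pipelining) by the flip-flop term. With the flip-flop coefficient set to zero, the toy optimum collapses to the deepest pipeline, which is why the overhead term is what makes an optimum exist.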

slide-11
SLIDE 11

Contribution

  • Pipelining is an old idea.
  • Research has focused on the performance impact of pipelining.
  • The idea of using pipelining to lower power [Chandrakasan ’92] has not been fully explored in deep submicron technology.
  • This work: analysis and circuit-level simulation of Power-Optimal Pipelining for different regimes of Vth, activity factor, and clock gating

slide-12
SLIDE 12

Bottom-to-Top Approach

  1. Impact of pipelining on each power component
  2. Impact of pipelining on total power (with/without clock gating)

[Figure: total power over time with clock gating, across active and inactive periods; components: switching, leakage, and idle power]

slide-13
SLIDE 13

Bottom-to-Top Approach

  1. Impact of pipelining on each power component
  2. Impact of pipelining on total power (with/without clock gating)

[Figure: total power over time without clock gating, across active and inactive periods; components: switching, leakage, and idle power]

*Idle power = power consumed when the circuit is idle and not clock-gated
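One way to read the two timelines as a calculation is sketched below. It assumes (an interpretation, not a formula from the slides) that an active circuit draws switching plus leakage power, a clock-gated inactive circuit draws only leakage, and a non-gated inactive circuit draws idle power. The class, function, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PowerComponents:
    """HYPOTHETICAL per-component powers, in arbitrary units."""
    switching: float  # dynamic power while the circuit is active
    leakage: float    # static power, drawn at all times
    idle: float       # power while inactive and not clock-gated (includes leakage)


def average_power(c: PowerComponents, active_fraction: float, clock_gated: bool) -> float:
    """Time-averaged total power over active and inactive periods."""
    active = c.switching + c.leakage
    # With clock gating an inactive circuit only leaks; without it, it burns idle power.
    inactive = c.leakage if clock_gated else c.idle
    return active_fraction * active + (1.0 - active_fraction) * inactive


if __name__ == "__main__":
    c = PowerComponents(switching=1.0, leakage=0.2, idle=0.5)
    for gated in (True, False):
        print(f"clock_gated={gated}: {average_power(c, 0.5, gated):.2f}")
```

Keeping the components separate mirrors the bottom-to-top approach: pipelining changes each component differently, and the totals are recombined afterwards.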

slide-14
SLIDE 14
Methodology

  • Target digital system: fixed throughput, highly parallel computation, logic-dominant
  • Test bench
– BPTM (Berkeley Predictive Technology Model) 70nm process
– Vth (NMOS/PMOS): LVT (0.17/-0.2), MVT (0.19/-0.22), HVT (0.21/-0.24)
– Hspice simulation at 100°C, Clock = 2 GHz

[Figure: baseline circuit and one pipeline stage: N FO4 inverters (N = 2 ~ 24) between TG flip-flops]

slide-15
SLIDE 15

Pipelining and Switching Power: Analytical Trend

[Figure: switching power vs. number of FO4 per stage, N: logic switching power $O(N^{2})$ and flip-flop overhead $O(1/N)$, with the optimal FO4 and optimal saving marked]

  • Logic switching power $\propto V_{dd}^{2} \propto N^{2}$ (quadratic reduction)
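The proportionality chain on this slide can be written out as follows. This is a sketch under the slide's own simplifications (fixed clock frequency $f$, supply voltage roughly proportional to logic depth per stage, $V_{dd} \propto N$); AF denotes the activity factor, and $A$, $B$ are hypothetical lumped coefficients for the logic and flip-flop-overhead curves.

```latex
\begin{aligned}
P_{\text{sw,logic}} &\propto \mathrm{AF}\, C_{\text{logic}}\, f\, V_{dd}^{2},
  \qquad V_{dd} \propto N \;\Rightarrow\; P_{\text{sw,logic}} \propto N^{2} \\
P(N) &\approx A\,N^{2} + \frac{B}{N}
  \qquad (\text{logic: } O(N^{2}),\ \text{flip-flop overhead: } O(1/N)) \\
\frac{dP}{dN} &= 2AN - \frac{B}{N^{2}} = 0
  \;\Rightarrow\; N^{*} = \left(\frac{B}{2A}\right)^{1/3}
\end{aligned}
```

Since $d^{2}P/dN^{2} = 2A + 2B/N^{3} > 0$, this stationary point is a minimum: the optimal logic depth moves deeper as the flip-flop overhead $B$ shrinks.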

slide-16
SLIDE 16

Pipelining and Leakage Power: Analytical Trend

[Figure: leakage power vs. number of FO4 per stage, N: logic leakage power $O(N^{\alpha})$ $(1 < \alpha < 2)$ and flip-flop overhead $O(1/N)$, with the optimal FO4 and optimal saving marked]

  • Logic leakage power $\propto V_{dd}\, e^{\eta V_{dd}} \propto N^{\alpha}$, $1 < \alpha < 2$ (superlinear reduction; the exponential term reflects the DIBL effect)
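Why the leakage exponent lands between 1 and 2 can be checked with a short sketch. Assuming, as for switching power, that $V_{dd} = kN$ with hypothetical constants $k$ and $\eta$, taking logs of $V_{dd} e^{\eta V_{dd}}$ gives $\ln P = \ln(kN) + \eta k N$, so the local log-log slope is $\alpha = 1 + \eta V_{dd}$, which lies in (1, 2) whenever $0 < \eta V_{dd} < 1$. The code compares a numerical slope against that closed form; the function names are illustrative.

```python
import math

# Leakage trend from the slide, up to a constant: P ∝ Vdd * exp(eta * Vdd),
# where the exponential models DIBL. ASSUMES Vdd = k * N with hypothetical k, eta.


def leakage_trend(vdd: float, eta: float) -> float:
    return vdd * math.exp(eta * vdd)


def local_exponent(n: float, k: float, eta: float, h: float = 1e-5) -> float:
    """Numerical log-log slope d ln P / d ln N (central difference in ln N)."""
    p_hi = leakage_trend(k * n * math.exp(h), eta)
    p_lo = leakage_trend(k * n * math.exp(-h), eta)
    return (math.log(p_hi) - math.log(p_lo)) / (2 * h)


def analytic_exponent(n: float, k: float, eta: float) -> float:
    """ln P = ln(k N) + eta k N, so the slope is 1 + eta * Vdd."""
    return 1.0 + eta * k * n


if __name__ == "__main__":
    # Hypothetical operating point: Vdd = 0.05 * 8 = 0.4 V, eta = 1.5 per volt.
    print(local_exponent(8, 0.05, 1.5), analytic_exponent(8, 0.05, 1.5))
```

Under this model α is not a single constant: it grows with the operating Vdd, so the superlinear reduction is strongest at the shallow-pipeline (high-Vdd) end.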

slide-17
SLIDE 17

Pipelining and Idle Power: Analytical Trend

  • Clock gating is not always possible
– Increased control complexity
– Insufficient setup time for the clock-enable signal

  • Idle power = leakage power + flip-flop switching power
– Scales somewhere between leakage-power scaling and flip-flop switching-power scaling, depending on the leakage level
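The "somewhere between" claim follows from how log-log slopes combine. For a hypothetical idle-power model $P_{idle}(N) = L\,N^{\alpha} + F\,N$ (leakage scaling as $N^{\alpha}$, flip-flop switching scaling linearly), the slope $d\ln P/d\ln N = (\alpha L N^{\alpha} + F N)/(L N^{\alpha} + F N)$ is a power-weighted average of $\alpha$ and 1. The sketch below evaluates it; coefficients and names are assumptions.

```python
def idle_power_exponent(n: float, leak_coeff: float, ff_coeff: float, alpha: float) -> float:
    """Log-log slope of a HYPOTHETICAL idle-power model
    P_idle(N) = leak_coeff * N**alpha + ff_coeff * N.

    d ln P / d ln N = (alpha * leak + ff) / (leak + ff): a power-weighted
    average of the leakage exponent (alpha) and the flip-flop exponent (1).
    """
    leak = leak_coeff * n ** alpha
    ff = ff_coeff * n
    return (alpha * leak + ff) / (leak + ff)


if __name__ == "__main__":
    for share in (0.0, 0.5, 1.0):  # weight given to the leakage term
        s = idle_power_exponent(8.0, share, 1.0 - share, alpha=1.6)
        print(f"leakage weight {share:.1f}: effective exponent {s:.2f}")
```

A leaky design (for example, low Vth) pushes the idle-power trend toward the leakage exponent; a low-leakage design pushes it toward the linear flip-flop trend.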

slide-18
SLIDE 18

Pipelining and Idle Power: Analytical Trend

[Figure: idle power, shown as relative power vs. number of FO4 per stage, N, in two panels, each with the optimal FO4 and optimal saving marked: leakage-power scale, $O(N^{\alpha})$ $(1<\alpha<2)$ vs. $O(1/N)$; flip-flop switching-power scale, $O(N)$ vs. $O(1/N)$]

  • Linear reduction of flip-flop switching power: $P_{\text{sw,FF}} \propto \frac{1}{N} V_{dd}^{2} \propto N$
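The linear trend for flip-flop switching power combines the two proportionalities used on the earlier analytical-trend slides. The sketch again assumes fixed $f$ and $V_{dd} \propto N$, with $n_{\text{FF}}$ the number of flip-flops per unit of logic:

```latex
P_{\text{sw,FF}} \;\propto\; n_{\text{FF}}\, f\, V_{dd}^{2}
  \;\propto\; \frac{1}{N}\, V_{dd}^{2}
  \;\propto\; \frac{1}{N}\, N^{2} \;=\; N
```

So when the circuit idles without clock gating, the clocked flip-flops' contribution shrinks only linearly with $N$, more slowly than the $O(N^{2})$ logic switching term.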

slide-19
SLIDE 19

Simulation Results: Power Components

Fixed throughput @ 2 GHz

  • Switching power: right-hand-side curve $O(N^{2})$; N* = 6; Saving* = 79% (HVT) ~ 82% (LVT)
  • Leakage power: right-hand-side curve $O(N^{\alpha})$ $(1<\alpha<2)$; N* = 6; Saving* = 70% (LVT) ~ 75% (HVT)
  • Idle power: right-hand-side curve $O(N)$ or $O(N^{\alpha})$ $(1<\alpha<2)$; N* = 8; Saving* = 55% (HVT) ~ 70% (LVT)

N = number of FO4 inverters per stage (not including flip-flop delay); N* = optimal N; Saving* = optimal power saving by pipelining

slide-20
SLIDE 20

Optimal Power Saving

[Figure: relative power vs. activity factor in two panels: clock gating (optimal FO4 = 6) and no clock gating (optimal FO4 = 6~8)]

* 2 GHz
* Flip-flop delay not included in optimal FO4

slide-21
SLIDE 21

Optimal Power Saving

[Figure: relative power vs. activity factor, broken into components: clock gating (optimal FO4 = 6), switching and leakage power; no clock gating (optimal FO4 = 6~8), switching and idle power]

slide-22
SLIDE 22

Optimal Power Saving

[Figure: relative power vs. activity factor, clock gating (optimal FO4 = 6) and no clock gating (optimal FO4 = 6~8), highlighting the LVT results]

slide-23
SLIDE 23

Discussion

  • LVT can be fast and power-efficient
– Enables lower Vdd

  • Flip-flop delay is more important than flip-flop power for power-optimal pipelining
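One way to see why flip-flop delay matters is to count how much slowdown, and hence how much Vdd reduction, is left for the logic. The sketch below is an interpretation under stated assumptions, not the authors' analysis: the unpipelined baseline stage is assumed to be N0 = 24 FO4 of logic plus D FO4 of flip-flop overhead (clk-Q plus setup), and a pipelined stage slowed by a factor s must still fit the same period, s·(N + D) ≤ N0 + D. D = 2 FO4 is a hypothetical value, roughly consistent with the conclusion's gap between 6-8 FO4 and ~8-10 FO4 including flip-flop delay.

```python
def max_slowdown(n: float, ff_delay: float, n0: float = 24.0) -> float:
    """Largest allowed gate-delay slowdown (from lowering Vdd) at fixed throughput.

    HYPOTHETICAL setup: the unpipelined baseline stage is n0 FO4 of logic plus
    ff_delay FO4 of flip-flop overhead (clk-Q + setup); a pipelined stage with
    n FO4 of logic, with all delays slowed by a factor s, must fit the same period:
        s * (n + ff_delay) <= n0 + ff_delay
    """
    return (n0 + ff_delay) / (n + ff_delay)


if __name__ == "__main__":
    for n in (24, 12, 6, 3):
        ideal = max_slowdown(n, ff_delay=0.0)
        real = max_slowdown(n, ff_delay=2.0)
        print(f"N={n:2d}: slowdown {ideal:.2f}x without FF delay, "
              f"{real:.2f}x with 2 FO4 FF delay")
```

With D = 0 the allowed slowdown grows without bound as N shrinks; with D > 0 it saturates at (N0 + D)/D, so the flip-flop delay caps the voltage scaling that very deep pipelines can buy, independently of flip-flop power.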

slide-24
SLIDE 24

Limitation of This Work

[Table: qualitative effects (↑/↓) of reduced glitches, parasitic wire capacitance, additional memory, and super-linear growth of flip-flops on the optimal logic depth and on the optimal power saving]

slide-25
SLIDE 25

Conclusion

  • Pipelining is an effective low-power tool when used to support voltage scaling in digital systems implementing highly parallel computation.
  • Optimal logic depth: 6-8 FO4
– ~8-10 FO4 including flip-flop delay
  • Optimal power saving: 55-80%
– Depends on Vth, activity factor (AF), and clock gating
  • Insights:
– Pipelining is more effective with high AF
      • Pipelining is most effective at saving switching power
– Pipelining is more effective with lower Vth
      • Except when leakage power is dominant
– Pipelining is more effective with clock gating
      • Reduced flip-flop overhead

slide-26
SLIDE 26

Acknowledgments

  • Thanks to SCALE group members and anonymous reviewers
  • Funded by NSF CAREER award CCR-0093354, NSF ITR award CCR-0219545, and a donation from Intel Corporation.

slide-27
SLIDE 27

BACKUP SLIDES