A Mathematical Solution to Power Optimal Pipeline Design


A Mathematical Solution to Power Optimal Pipeline Design by Utilizing Soft Edge Flip Flops. M. Ghasemazar, B. Amelifard, M. Pedram. University of Southern California, Department of Electrical Engineering. August 11,


SLIDE 1

A Mathematical Solution to Power Optimal Pipeline Design by Utilizing Soft Edge Flip Flops

  • M. Ghasemazar, B. Amelifard, M. Pedram

University of Southern California

Department of Electrical Engineering

August 11, 2008

ISLPED 2008

SLIDE 2

Outline

  • Soft-Edge Flip Flops
  • Power Optimal Pipeline Design
  • Problem Formulation
  • SEFF Modeling
  • Experimental Results
  • Conclusion

SLIDE 3

Soft Edge Flip Flop

  • Key idea: Allow the data to pass through a flip-flop during a transparency window, instead of on a triggering clock edge
  • Key advantage: Enable slack passing between adjacent pipeline stages which are separated by (master-slave) flip-flops
  • Circuit implementation: Delay the clock of the master latch to create a window during which both the master and slave latches are ON

[Figure: SEFF symbol with D, Q and CLK pins, and a timing diagram of Clk and ClkD showing the transparency window w together with the DATA setup time and hold time]

SLIDE 4

SEFF Implementation

[Figure: schematics of the conventional (hard-edge) master-slave FF, clocked by Clk, and of the soft-edge master-slave FF, clocked by Clk and by ClkD, a copy of Clk produced by a delay element]

SLIDE 5

SEFF Characteristics

  • Setup and hold times, and clock-to-q delay of a soft-edge flip-flop are all functions of the transparency window width, w
  • Simulations show a linear dependency on w

$$
\begin{cases}
t_{s,i}(w_i) = a_1 w_i + a_0 \\
t_{h,i}(w_i) = b_1 w_i + b_0 \\
t_{cq,i}(w_i) = c_1 w_i + c_0
\end{cases}
$$

[Figure: setup and hold time (ps) versus window size (ps), with linear trend lines y = 0.921x - 30.45 and y = -0.651x + 33.54]
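The linear dependency on w can be evaluated directly. The Python sketch below uses the two trend lines printed on the plot. Which line belongs to which curve is an assumption: the rising line is taken as the hold time and the falling line as the setup time. The names `linear_timing`, `hold_time` and `setup_time` are illustrative.

```python
def linear_timing(slope, intercept):
    """Return t(w) = slope * w + intercept, with w and t in picoseconds."""
    def t(w_ps):
        return slope * w_ps + intercept
    return t

# Trend lines printed on the plot. Assigning them to hold/setup is an assumption.
hold_time = linear_timing(0.921, -30.45)
setup_time = linear_timing(-0.651, 33.54)

for w in (40, 80, 120):
    print(f"w = {w} ps: setup = {setup_time(w):.1f} ps, hold = {hold_time(w):.1f} ps")
```

Under this reading, a wider window makes the setup time more negative (data may arrive later) while the hold requirement grows. This is the trade-off that the later hold-time constraints have to absorb.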

SLIDE 6

SEFF Characteristics – cont’d

  • Power consumption of a SEFF is monotonically increasing with its window size (w). This is due to:

– Higher switching activities in the internal nodes in the transparency window
– Higher dynamic and leakage power consumption in the additional delay generation circuitry

  • Experimental evaluation of total power consumption:

$$P_{FF,i} = d_2 w_i^2 + d_1 w_i + d_0$$

[Figure: power dissipation (uW) versus transparency window (ps)]

SLIDE 7

Pipeline Basics

[Figure: linear pipeline of flip-flops FF0, FF1, FF2 driven by CLK and separated by combinational blocks C1 and C2, annotated with d_i, t_cq,i and t_s,i]

  • Timing constraints for a linear pipeline

$$d_i + t_{s,i} + t_{cq,i-1} \le T_{clk}, \quad 1 \le i \le N \qquad (1)$$

$$\delta_i + t_{cq,i-1} \ge t_{h,i}, \quad 1 \le i \le N \qquad (2)$$

  • Substitute FFs with SEFFs

– First and last FFs remain hard-edge ones

  • This is needed to avoid imposing constraints on the sender/receiver of data

– Intermediate stage FFs may be substituted by SEFFs

$$d_i \le T_{clk} - t_{s,i}(w_i) - t_{cq,i-1}(w_{i-1}), \quad 1 \le i \le N$$

$$\delta_i \ge t_{h,i}(w_i) - t_{cq,i-1}(w_{i-1}), \quad 1 \le i \le N$$
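The setup and hold constraints for a linear pipeline can be checked stage by stage. The sketch below is one possible Python checker, not the authors' tool. The stage delays are the TB3 values from the benchmark table; applying a 30 ps setup, hold and clock-to-q time to every flip-flop is an assumption made only for illustration, and `check_timing` is a hypothetical helper name.

```python
def check_timing(d_max, d_min, t_s, t_h, t_cq, t_clk):
    """Evaluate setup constraint (1) and hold constraint (2) per stage.

    d_max[i-1], d_min[i-1]: max/min delay of stage i, for i = 1..N (ps).
    t_s[i], t_h[i], t_cq[i]: timing of flip-flop FFi, for i = 0..N (ps).
    Returns a list of (stage, setup_ok, hold_ok) tuples.
    """
    report = []
    for i in range(1, len(d_max) + 1):
        setup_ok = d_max[i - 1] + t_s[i] + t_cq[i - 1] <= t_clk
        hold_ok = d_min[i - 1] + t_cq[i - 1] >= t_h[i]
        report.append((i, setup_ok, hold_ok))
    return report

# TB3 (max, min) stage delays in ps; the 30 ps flip-flop timing is an assumption.
d_max, d_min = [325, 310, 219], [150, 155, 160]
ff = [30] * 4                      # FF0..FF3, all hard-edge

relaxed = check_timing(d_max, d_min, ff, ff, ff, t_clk=500)
tight = check_timing(d_max, d_min, ff, ff, ff, t_clk=380)
print(relaxed)
print(tight)
```

With a 500 ps clock every stage passes. With a 380 ps clock only stage 1 violates setup, which is the situation in which a soft-edge FF1 could pass slack from stage 2.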

SLIDE 8

Power Optimal Pipeline

  • Main Idea: Passing available slack of some stages to more timing critical stages to provide them with more freedom in power optimization through voltage scaling
  • For example, let Tclk=Tclk,min=560ps and ts=th=tcq=30ps

– If FF1 is replaced with a SEFF with a window size of 50ps

  • the first stage borrows 50ps from the second stage
  • the circuit can be powered with a lower supply voltage level

– Ideally, 10% Vdd reduction -> 19% power saving

[Figure: three-stage linear pipeline FF0, C1, FF1, C2, FF2, C3, FF3 driven by CLK, with stage delays d1=500ps, d2=400ps, d3=450ps]
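The numbers in this example can be reproduced with a few lines of arithmetic. The Python sketch below assumes that the minimum clock period is set by the slowest stage plus the setup and clock-to-q overhead, and that the ideal saving refers to dynamic power scaling with the square of Vdd. Both are readings of the slide rather than statements taken from it.

```python
T_S = T_CQ = 30                                   # ps, from the slide (ts = tcq = 30ps)
stage_delay = {"C1": 500, "C2": 400, "C3": 450}   # ps, from the slide figure

# Hard-edge case: the slowest stage plus flip-flop overhead sets the clock period.
t_clk_min = max(stage_delay.values()) + T_S + T_CQ

# Slack left in each stage at that clock period.
slack = {name: t_clk_min - (d + T_S + T_CQ) for name, d in stage_delay.items()}

# Idealized saving for a 10% supply reduction, assuming power ~ Vdd^2.
vdd_scale = 0.9
power_saving = 1 - vdd_scale ** 2

print(t_clk_min, slack, f"{power_saving:.0%}")
```

Stage C1 has no slack while C2 has 100 ps, so lending 50 ps from the second stage to the first through a soft-edge FF1 leaves both stages with margin that can be traded for a lower supply voltage.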

SLIDE 9

PSLP Problem Statement

  • Power-optimal Soft Linear Pipeline Design

– Goal: Minimize the total power consumption of an N-stage linear pipeline circuit
– Variables:

  • Optimal supply voltage level (1 variable)
  • Transparency window sizes of the individual soft-edge FF-sets (N-1)
  • Delay elements to avoid hold time violations (N)

– Constraints:

  • Setup/hold times
  • Window size limits
  • Single supply voltage

$$
\begin{aligned}
\text{Min} \quad & P_{total} = \sum_{i=1}^{N} P_{Comb,i}(v) + \sum_{i=1}^{N-1} P_{FF,i}(w_i, v) + \sum_{i=1}^{N} P_{DE}(z_i, v) \\
\text{s.t.} \quad & (I)\;\; d_i(v) \le T_{clk} - t_{s,i}(w_i, v) - t_{cq,i-1}(w_{i-1}, v); \quad 1 \le i \le N \\
& (II)\;\; \delta_i(v) + z_i \ge t_{h,i}(w_i, v) - t_{cq,i-1}(w_{i-1}, v); \quad 1 \le i \le N \\
& (III)\;\; w_{min} \le w_i \le w_{max}; \quad 1 \le i \le N-1 \\
& (IV)\;\; v \in \{V_1, \ldots, V_m\}
\end{aligned}
$$

SLIDE 10

SEFF Modeling

  • Setup time, hold time, clock-to-q delay, and power dissipation are functions of both voltage and transparency window size

$$
\begin{cases}
t_{s,i}(w_i, v) = a_1(v) w_i + a_0(v) \\
t_{h,i}(w_i, v) = b_1(v) w_i + b_0(v) \\
t_{cq,i}(w_i, v) = c_1(v) w_i + c_0(v)
\end{cases}
$$

$$P_{FF,i}(w_i, v) = d_2(v) w_i^2 + d_1(v) w_i + d_0(v)$$

– Voltage-dependent coefficients are determined from SPICE simulations

[Figure: three plots of setup time (ps), hold time (ps) and clk-to-Q delay (ps) versus transparency window (ps), each with curves for Vdd=0.9V, 1.0V, 1.1V and 1.2V]

SLIDE 11

Combinational Circuit Modeling

  • Total power consumption at voltage level, v:

$$P_{comb,i}(v) = P_{dyn,i} \left(\frac{v}{V_0}\right)^{2} + P_{leak,i} \left(\frac{v}{V_0}\right)^{3}$$

  • Max and Min combinational logic cell delays (calculated from the alpha power law):

$$d_i(v) = d_i(V_0) \left(\frac{V_0 - V_t}{v - V_t}\right)^{\alpha}$$

$$\delta_i(v) = \delta_i(V_0) \left(\frac{V_0 - V_t}{v - V_t}\right)^{\alpha}$$

  • Power dissipation overhead of a delay element:

$$P_{DE}(z, v) = k(v) \cdot z$$
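The voltage scaling relations on this slide translate into two small functions. In the Python sketch below, `v0` is the reference voltage at which the stage delay and power are characterized, assumed to be the 1.2V nominal supply. The threshold voltage `V_T` and the exponent `ALPHA` are placeholder values, not figures from the presentation.

```python
V_T = 0.3      # threshold voltage (V): placeholder, not from the slides
ALPHA = 1.3    # alpha-power-law exponent: placeholder, not from the slides

def scaled_power(p_dyn, p_leak, v, v0):
    """Combinational power at supply v, given dynamic/leakage power at v0."""
    return p_dyn * (v / v0) ** 2 + p_leak * (v / v0) ** 3

def scaled_delay(delay_at_v0, v, v0, v_t=V_T, alpha=ALPHA):
    """Stage delay at supply v, scaled from its value at v0."""
    return delay_at_v0 * ((v0 - v_t) / (v - v_t)) ** alpha

# 325 ps is the largest TB3 stage delay at nominal voltage.
for v in (1.2, 1.1, 1.0):
    print(f"v = {v:.1f} V: delay = {scaled_delay(325.0, v, 1.2):.1f} ps, "
          f"relative dynamic power = {scaled_power(1.0, 0.0, v, 1.2):.2f}")
```

Lowering the supply cuts power quickly but stretches every stage delay, which is why the slack made available by soft-edge flip-flops can be converted into a power reduction.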

SLIDE 12

Solving the PSLP

  • To solve PSLP

– Enumerate all possible values for v
– PSLP with fixed voltage (PSLP-FV)

  • Pcomb,i terms drop out of the cost function
  • Voltage constraint (IV) disappears
  • All other timing and power parameters become only dependent on wi and zi variables

– For each fixed v, a quadratic program is set up and solved

  • We must minimize a quadratic cost function subject to linear inequality constraints
  • PSLP-FV can be solved optimally in polynomial time
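The enumerate-then-solve structure can be sketched in Python as follows. This is not the authors' implementation: the inner quadratic program is replaced by an exhaustive search over a coarse window grid so that the example needs no solver. The SEFF coefficients are held voltage-independent, and the clock-to-q line, the power coefficients, `VT`, `ALPHA`, `K_DE` and the voltage levels are all placeholders. The numbers it prints are therefore not the reported results.

```python
from itertools import product

V0 = 1.2                 # nominal supply voltage from the slides (V)
VT, ALPHA = 0.3, 1.3     # placeholder threshold voltage and alpha exponent
HARD_FF = 30.0           # assumed ts = th = tcq of a hard-edge FF (ps)
K_DE = 1.0               # placeholder delay-element cost (uW per ps)

# Placeholder SEFF models in w (ps); w=None denotes a hard-edge flip-flop.
def t_setup(w):
    return HARD_FF if w is None else -0.651 * w + 33.54

def t_hold(w):
    return HARD_FF if w is None else 0.921 * w - 30.45

def t_cq(w):
    return HARD_FF if w is None else 0.9 * w + 40.0

def p_ff(w):
    return 0.01 * w * w + 0.5 * w + 50.0          # uW, quadratic in w

def delay_scale(v):
    """Alpha-power-law delay factor relative to V0."""
    return ((V0 - VT) / (v - VT)) ** ALPHA

def p_comb(v, p_dyn=1000.0, p_leak=100.0):
    """Per-stage combinational power (uW); magnitudes are placeholders."""
    return p_dyn * (v / V0) ** 2 + p_leak * (v / V0) ** 3

def solve_fixed_voltage(v, stages, t_clk, w_grid):
    """PSLP-FV by exhaustive search over window sizes (stand-in for the QP)."""
    n, s, best = len(stages), delay_scale(v), None
    for mid in product(w_grid, repeat=n - 1):
        w = (None,) + mid + (None,)               # FF0 and FFN stay hard-edge
        cost = sum(p_ff(x) for x in mid)
        for i, (d_max, d_min) in enumerate(stages, start=1):
            if d_max * s > t_clk - t_setup(w[i]) - t_cq(w[i - 1]):  # (I)
                break
            # Smallest z_i meeting (II), since delay-element power grows with z_i.
            cost += K_DE * max(0.0, t_hold(w[i]) - t_cq(w[i - 1]) - d_min * s)
        else:
            if best is None or cost < best[0]:
                best = (cost, mid)
    return best

def solve_pslp(levels, stages, t_clk, w_grid):
    """Enumerate supply voltages and keep the lowest total power."""
    best = None
    for v in levels:
        fv = solve_fixed_voltage(v, stages, t_clk, w_grid)
        if fv is None:
            continue                               # no feasible windows at this v
        total = fv[0] + len(stages) * p_comb(v)
        if best is None or total < best["power"]:
            best = {"v": v, "windows": fv[1], "power": total}
    return best

# TB3 (max, min) stage delays in ps at nominal voltage, 500 ps clock (2.0 GHz).
tb3 = [(325, 150), (310, 155), (219, 160)]
levels = [0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2]   # placeholder voltage set
result = solve_pslp(levels, tb3, t_clk=500.0, w_grid=range(40, 141, 10))
print(result)
```

Because the combinational power is constant for a fixed v, it is added only in the outer loop, mirroring the remark that the Pcomb,i terms drop out of the fixed-voltage cost function. A real implementation would hand the inner problem to a QP solver instead of a grid.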

SLIDE 13

Experimental Setup

  • Hspice simulations were used to extract parameters that are needed for the problem formulation

– 65nm Predictive Technology Model (PTM)
– Nominal supply voltage 1.2V
– Die temperature 100°C

  • The SIS optimization package was used to synthesize a set of linear pipelines as test-bench circuits
  • The MOSEK toolbox was used to solve the mathematical optimization problem
  • All results were collected on a 2.4GHz Pentium 4 PC with 2GB memory

SLIDE 14

Benchmark Spec

Testbench (# of stages) | (max, min) stage delays at nominal voltage (ps) | Clock freq. (GHz)
TB1 (4) | (320,140), (332,150), (308,150), (320,170) | 2.0
TB2 (5) | (320,140), (332,150), (308,150), (280,145), (320,170) | 2.0
TB3 (3) | (325,150), (310,155), (219,160) | 2.0
TB4 (5) | (275,40), (235,40), (245,60), (275,50), (275,70) | 2.5
TB5 (4) | (310,100), (245,40), (245,50), (245,60) | 2.5

SLIDE 15

Experimental Results

Using slack passing to minimize power without degrading performance

TB | Power Red. (%) | Optimum Vdd (V) | Optimum Window size (ps)
TB1 | 32.1 | 1.0 | 40, 49, 22
TB2 | 33.8 | 1.0 | 40, 49, 46, 21
TB3 | 48.1 | 0.95 | 43, 52
TB4 | 16.3 | 1.10 | 36, 35, 35, 20
TB5 | 25.4 | 1.05 | 60, 41, 36

Utilizing slack passing to improve performance

Testbench | Performance Improvement (%)
TB1 | 14%
TB2 | 15%
TB3 | 20%
TB4 | 5%
TB5 | 10%

  • Area overhead: Negligible compared to size of the rest of the pipeline circuit
  • Runtime for all benchmarks: Less than one second

SLIDE 16

A Case Study: 34-bit Adder

  • Problem: How to partition a 34-bit adder into 4 stages of pipeline to achieve maximum performance?

[Figure: four-stage pipeline FF0, C1, FF1, C2, FF2, C3, FF3, C4, FF4 driven by CLK, with stage widths of X, Y, Z and T bits, X+Y+Z+T=34; example partitions shown: 10-8-8-8, 8-10-8-8, 9-9-8-8 and 9-8-9-8]

SLIDE 17

A Case Study: 34-bit Adder

Maximum Performance

Configuration | Vdd (V) | Min Clock Period (ps) | Power Consumption (mW)
10−8−8−8 | 1.2 | 450 | 6.42
8−10−8−8 | 1.2 | 472 | 6.50
8−8−10−8 | 1.2 | 472 | 6.51
8−8−8−10 | 1.2 | 486 | 6.55
9−9−8−8 | 1.2 | 455 | 6.42
9−8−9−8 | 1.2 | 433 | 6.51
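Selecting the maximum-performance partition from this table is a one-line minimization. The short Python sketch below uses the clock periods listed on the slide. The dictionary and variable names are illustrative, and configurations are written with plain hyphens.

```python
# Minimum clock period (ps) per bit partition, taken from the table on this slide.
min_period_ps = {
    "10-8-8-8": 450, "8-10-8-8": 472, "8-8-10-8": 472,
    "8-8-8-10": 486, "9-9-8-8": 455, "9-8-9-8": 433,
}

fastest = min(min_period_ps, key=min_period_ps.get)
f_max_ghz = 1000.0 / min_period_ps[fastest]   # a period of T ps gives 1000 / T GHz

print(fastest, round(f_max_ghz, 2))           # prints: 9-8-9-8 2.31
```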

SLIDE 18

A Case Study: 34-bit Adder

  • Problem: How to partition a 34-bit adder into 4 stages of pipeline to achieve minimum power at target performance level?

Minimum Power @ 2.0GHz

Configuration | Vdd (V) | Power Consumption (mW)
10−8−8−8 | 1.05 | 4.9
8−10−8−8 | 1.15 | 5.1
9−9−8−8 | 1.05 | 4.9
9−8−8−9 | 1.10 | 4.9

SLIDE 19

Conclusion

  • We presented a new technique to minimize the total power consumption of a linear pipeline circuit by utilizing soft-edge flip-flops and choosing the optimal supply voltage level for the pipeline
  • We formulated the problem as a mathematical program and solved it efficiently
  • Our experimental results demonstrate that this technique is quite effective in reducing the power consumption of a pipeline circuit under a performance constraint
  • Future work will focus on the problem of minimizing the energy cost of throughput in a linear pipeline circuit with dynamic error detection and correction capability