Integrated Retiming and Simultaneous Vdd/Vth Scaling for Total Power Minimization
Mongkol Ekpanyapong
Advisor: Prof. Sung Kyu Lim
School of Electrical and Computer Engineering
Georgia Institute of Technology
May, 2006
Outline
- Introduction and Motivation
- Related Work
- Methodology
- Experimental Results
- Conclusions
Introduction
Both static and dynamic power are important issues in deep-submicron design
Performance is also an important issue
The objective of this work is to minimize total power consumption while maintaining the target clock period
Retiming Algorithm
Linear Programming
Can easily be modified to handle any linear objective
Bellman-Ford Algorithm
Can handle large circuits
Power Minimization
Minimize the total number of flip-flops to reduce flip-flop power
Use dual Vdd and dual Vth to minimize static and dynamic power
Retiming and Voltage Scaling
- C. E. Leiserson and J. B. Saxe, "Retiming synchronous circuitry," Algorithmica, 1991
- K. Usami and M. Horowitz, "Clustered Voltage Scaling Technique for Low-Power Design," ISLPED, 1995
- N. Chabini and W. Wolf, "Reducing Dynamic Power Consumption in Synchronous Sequential Digital Designs Using Retiming and Supply Voltage Scaling," TVLSI, 2004
Power Minimization with Retiming
[Flow diagram: the circuit description and the target clock period feed retiming, followed by voltage scaling (LP); one element is labeled "Fixed"]
Retiming Formulation
Objective: Minimize the number of flip-flops (FFs)
Constraints:
- The number of FFs on every edge has to remain non-negative after retiming:
  r(u) ≤ w(e_{u,v}) + r(v)
- The number of FFs on every critical path has to be greater than zero
Notation:
- V is the set of gates and E is the set of edges, with v ∈ V and e ∈ E
- r(v) is the number of FFs moved from the fanout of node v to the fanin of node v
- w(e_{u,v}) is the FF count on edge u,v; w_r(e) is the FF count on e after retiming
- D(u,v) is the maximum delay on path u,v
- W(u,v) is the minimum number of FFs on path u,v
[Figure: three-gate example (nodes 1, 2, 3) showing r(v), w(e) and w_r(e)]
Retiming Formulation
[Figure: gates u and v with the FFs on edge u,v]
r(u) ≤ w(e_{u,v}) + r(v)
Only these 2 FFs can move out of u
Retiming Formulation
[Figure: three-gate example circuit (nodes 1, 2, 3)]
Cycle time (L) = 2
W(1,2) = 0, W(1,3) = 1, W(2,3) = 1
D(1,2) = 2, D(1,3) = 3, D(2,3) = 2
Since D(1,3) = 3 exceeds L = 2, path 1,3 is critical and must keep at least one FF:
r(1) - r(3) ≤ W(1,3) - 1 = 0, so r(1) ≤ r(3)
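The W and D values of this example can be reproduced with a short sketch. It assumes a hypothetical chain 1 → 2 → 3 with unit gate delays, no FF on edge (1,2) and one FF on edge (2,3). The slide's figure does not survive, so this is only one circuit consistent with the listed numbers, and the function names are illustrative.

```python
from itertools import product

INF = float("inf")

def wd_matrices(delay, edges):
    """Return (W, D) for every connected ordered pair u != v.

    delay: {gate: gate delay}; edges: {(u, v): FF count w(e)}.
    W(u,v) is the minimum FF count over paths u -> v, and D(u,v) is the
    maximum delay among those minimum-FF paths.
    """
    nodes = sorted(delay)
    # Lexicographic path weight: (FF count, -(delay of gates before v)).
    best = {(u, v): (INF, 0) for u, v in product(nodes, repeat=2)}
    for (u, v), w in edges.items():
        best[(u, v)] = min(best[(u, v)], (w, -delay[u]))
    for k, u, v in product(nodes, repeat=3):  # Floyd-Warshall order: k outermost
        a, b = best[(u, k)], best[(k, v)]
        if a[0] < INF and b[0] < INF:
            best[(u, v)] = min(best[(u, v)], (a[0] + b[0], a[1] + b[1]))
    W, D = {}, {}
    for (u, v), (w, neg_d) in best.items():
        if u != v and w < INF:
            W[(u, v)] = w
            D[(u, v)] = delay[v] - neg_d
    return W, D

def retiming_constraints(delay, edges, period):
    """Return constraints (u, v, c), each meaning r(u) - r(v) <= c."""
    W, D = wd_matrices(delay, edges)
    edge_cons = [(u, v, w) for (u, v), w in edges.items()]
    critical = [(u, v, W[(u, v)] - 1) for (u, v) in sorted(W) if D[(u, v)] > period]
    return edge_cons, critical

# Assumed circuit: chain 1 -> 2 -> 3, unit delays, one FF on edge (2, 3).
delay = {1: 1, 2: 1, 3: 1}
edges = {(1, 2): 0, (2, 3): 1}
W, D = wd_matrices(delay, edges)
edge_cons, critical = retiming_constraints(delay, edges, period=2)
print(critical)  # [(1, 3, 0)], i.e. r(1) - r(3) <= 0
```

With L = 2 the only critical pair is (1,3), which gives the same constraint r(1) ≤ r(3) as the slide.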
Non-critical Gates for Power Minimization
Non-critical gates: what should we do?
We can apply voltage scaling to the non-critical gates after retiming to minimize total power consumption
Low-to-High Vdd Conversion
Level Converter (LC) requirement
[Figure: level converter (LC) on the connection from a low-Vdd gate to a high-Vdd gate]
Voltage Scaling Formulation
Objective: Minimize gate power + LC power
Constraints:
- Each gate has to be assigned to only one voltage state
- Arrival time + gate delay of each node ≤ target clock period
- A level converter is inserted if a low-Vdd node drives a high-Vdd node
Voltage Scaling Formulation
Each gate v is in exactly one of four voltage states:
- x_{v,1} = 1: Vdd low, Vth high
- x_{v,2} = 1: Vdd low, Vth low
- x_{v,3} = 1: Vdd high, Vth high
- x_{v,4} = 1: Vdd high, Vth low
Voltage Scaling Formulation
[Figure: gate u driving gate v]
s(v) is the arrival time at node v and d(v) is its gate delay
d(u) = 1, s(u) = 0, s(v) = 1
Voltage Scaling Formulation
[Figure: gate u driving gate w, which drives gate v]
d(u) = 1, d(w) = 1, s(u) = 0, s(w) = 1, s(v) = 2
Voltage Scaling Formulation
[Figure: gate u driving gate v]
Cycle time (L) = 2
d(u) = 1, d(v) = 1, s(u) = 0, s(v) = 1
s(u) + d(u) ≤ 2
s(v) + d(v) ≤ 2
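The timing constraints above can be checked with a small sketch. It assumes that the arrival time of a gate is the latest arrival time plus delay over its fanins (in the LP this would be the inequality s(v) ≥ s(u) + d(u)) and that primary inputs start at 0. The function names and the dictionary layout are illustrative.

```python
def arrival_times(delay, fanin, order):
    """Arrival time s(v) = max over fanins u of s(u) + d(u); inputs start at 0."""
    s = {}
    for v in order:  # 'order' must be a topological order of the gates
        s[v] = max((s[u] + delay[u] for u in fanin.get(v, ())), default=0)
    return s

def meets_clock_period(delay, fanin, order, period):
    """Check s(v) + d(v) <= period for every gate."""
    s = arrival_times(delay, fanin, order)
    return all(s[v] + delay[v] <= period for v in order)

# Two-gate example from the slide: u drives v, unit delays, L = 2.
delay = {"u": 1, "v": 1}
fanin = {"v": ["u"]}
s = arrival_times(delay, fanin, ["u", "v"])
print(s)                                                 # {'u': 0, 'v': 1}
print(meets_clock_period(delay, fanin, ["u", "v"], 2))   # True
```

Inserting a third unit-delay gate w between u and v raises s(v) to 2, as in the three-gate slide.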
Voltage Scaling Formulation
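[Figure: a low-Vdd gate (VL) driving a high-Vdd gate (VH) through a level converter (LC)]
m(e) = 1 on an edge e that carries a level converter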
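To make the whole formulation concrete, the sketch below enumerates every voltage assignment of a tiny gate chain instead of calling an ILP solver. All numbers (per-state delay and power, level-converter cost, clock period) are hypothetical and chosen only for illustration, and the function names are not from the slides.

```python
from itertools import product

# Hypothetical (delay, power) per voltage state; not taken from the slides.
STATES = {
    "VddH/VthL": (1.0, 10.0),
    "VddH/VthH": (1.25, 7.0),
    "VddL/VthL": (1.5, 5.0),
    "VddL/VthH": (2.0, 3.0),
}
LC_DELAY, LC_POWER = 0.5, 2.0  # hypothetical level-converter cost

def is_low_vdd(state):
    return state.startswith("VddL")

def evaluate(chain_states, period):
    """Total power of a gate chain, or None if the clock period is violated."""
    arrival, power = 0.0, 0.0
    for i, state in enumerate(chain_states):
        if i > 0 and is_low_vdd(chain_states[i - 1]) and not is_low_vdd(state):
            arrival += LC_DELAY   # m(e) = 1: a low-Vdd gate drives a high-Vdd gate
            power += LC_POWER
        gate_delay, gate_power = STATES[state]
        arrival += gate_delay     # arrival time + gate delay of this gate
        power += gate_power
    return power if arrival <= period else None

def best_assignment(num_gates, period):
    """Exhaustive search; each gate takes exactly one voltage state."""
    best = None
    for states in product(STATES, repeat=num_gates):
        power = evaluate(states, period)
        if power is not None and (best is None or power < best[0]):
            best = (power, states)
    return best

print(best_assignment(3, 4.25))
# (17.0, ('VddH/VthH', 'VddL/VthL', 'VddL/VthL'))
```

Exhaustive search only works for a handful of gates. It is shown here to make the objective and the three constraints explicit, not as a substitute for the ILP/LP.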
Convert from ILP to LP
Assume only two states for illustration purposes: x(u) = x_{u,2}, where 0 = low Vdd and 1 = high Vdd
- VL: x_{u,1} = 1, x_{u,2} = 0
- VH: x_{u,1} = 0, x_{u,2} = 1
[Figure: example circuit annotated with fractional LP values between 0.0 and 1 for x(u) and m(e)]
Gradient Search Algorithm for LC Relaxation
Flowchart boxes:
- Solve LP
- Relax LP solution
- Compute new mth
- Solve LP by setting m(e) = 0 if m(e) < mth, otherwise m(e) = 1
- Return
Loop condition: While |Gain| > Threshold
[Figure: example circuit with fractional m(e) values rounded at mth = 0.5]
Gradient Search Algorithm for LC Relaxation
[Flowchart as on the previous slide, extended with a Voltage Assignment Relaxation step; the example figure shows the m(e) values rounded at mth = 0.5]
Voltage Assignment
Four possible voltage assignments:
High Vdd, low Vth node
Fastest gate, high dynamic power, high leakage power
High Vdd, high Vth node
High dynamic power, low leakage power
Low Vdd, low Vth node
Low dynamic power, high leakage power
Low Vdd, high Vth node
Slowest gate, low dynamic power, low leakage power
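The qualitative ranking above follows from first-order textbook models, which the slides do not state: the alpha-power law for delay, dynamic power proportional to Vdd squared, and subthreshold leakage that grows exponentially as Vth drops. The sketch below uses hypothetical voltages and model constants, so only the ordering of the results is meaningful, not the absolute numbers.

```python
import math

VDD = {"high": 1.2, "low": 0.8}   # volts, hypothetical
VTH = {"high": 0.3, "low": 0.2}   # volts, hypothetical
ALPHA = 1.3                       # alpha-power-law exponent (assumed)
N_VT = 1.5 * 0.026                # slope factor times thermal voltage, in volts (assumed)

def gate_delay(vdd, vth):
    """Alpha-power law: delay grows as Vdd drops or Vth rises (arbitrary units)."""
    return vdd / (vdd - vth) ** ALPHA

def dynamic_power(vdd):
    """Proportional to C * Vdd^2 * f; constants dropped."""
    return vdd ** 2

def leakage_power(vdd, vth):
    """Subthreshold leakage trend: exponential in -Vth."""
    return vdd * math.exp(-vth / N_VT)

# Keys are (Vdd level, Vth level); values are (delay, dynamic, leakage).
table = {
    (d, t): (gate_delay(VDD[d], VTH[t]), dynamic_power(VDD[d]), leakage_power(VDD[d], VTH[t]))
    for d in ("high", "low")
    for t in ("low", "high")
}
fastest = min(table, key=lambda k: table[k][0])
slowest = max(table, key=lambda k: table[k][0])
print(fastest, slowest)  # ('high', 'low') ('low', 'high')
```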
Possible Supply Voltage Assignment
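[Figure: two copies of a gate network with the same VH/VL labels, one drawn with a single level converter (LC) and one with three; the panels are captioned "Feasible Solution" and "Infeasible Solution"]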
LP Relaxation for Voltage State Assignment
[Figure, shown in three steps: gate u (VL, low Vdd) drives gate v (VH, high Vdd) through a level converter (LC); in one step the fractional value 0.7 appears in place of the LC]
LP Relaxation for Voltage State Assignment
Gate v has been assigned high Vdd; the remaining choice is between high Vdd / low Vth and high Vdd / high Vth
[Figure, two cases: Slk = 2.2 and Slk = 1.5 for gate v, with candidate delays Dly = 1 and Dly = 2.1]
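One plausible reading of the two cases is a slack test. The sketch below assumes that Dly = 1 belongs to the low-Vth state (the faster one) and Dly = 2.1 to the high-Vth state, and that the slack is the time budget available for the gate. The slides do not spell out this rule, so treat it as an illustration with a hypothetical function name.

```python
def pick_vth(slack, delay_low_vth=1.0, delay_high_vth=2.1):
    """Prefer high Vth (lower leakage) whenever its delay fits in the slack."""
    if delay_high_vth <= slack:
        return "high Vdd, high Vth"
    if delay_low_vth <= slack:
        return "high Vdd, low Vth"
    return None  # neither state meets timing

print(pick_vth(2.2))  # high Vdd, high Vth
print(pick_vth(1.5))  # high Vdd, low Vth
```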
LP Relaxation for Voltage State Assignment
Assume only two states (VL and VH)
[Figure: example circuit annotated with fractional LP values such as 0.0, 0.3, 0.7 and 1]
Gradient Search Algorithm for LC Relaxation
[Flowchart as before; the example figure shows fractional m(e) values rounded at mth = 0.5]
Compute for next mth: mth = 0.6
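The rounding rule and the threshold update can be sketched as a one-dimensional search over mth. The `cost` callback stands in for re-solving the LP with m(e) fixed, which is not reproduced here. `toy_cost`, the step size and the stopping tolerance are hypothetical, and the m(e) list is only a set of sample fractional values.

```python
def round_by_threshold(m, mth):
    """m(e) = 0 if m(e) < mth, otherwise m(e) = 1."""
    return [0 if value < mth else 1 for value in m]

def search_threshold(m, cost, mth=0.5, step=0.1, min_gain=1e-3, max_iter=20):
    """Move mth to a neighbouring value while the gain stays above min_gain."""
    best_cost = cost(round_by_threshold(m, mth))
    for _ in range(max_iter):
        neighbours = [round(mth - step, 10), round(mth + step, 10)]
        neighbours = [t for t in neighbours if 0.0 <= t <= 1.0]
        new_cost, new_mth = min((cost(round_by_threshold(m, t)), t) for t in neighbours)
        gain = best_cost - new_cost
        if gain <= min_gain:
            break
        best_cost, mth = new_cost, new_mth
    return mth, best_cost

# Sample fractional LC indicators from a relaxed solution.
m = [0.0, 0.6, 0.3, 0.5, 1, 0.3, 0.3, 0.4, 0.0, 0.6, 0.0, 0.8, 0.8, 0.5]

def toy_cost(rounded):
    """Hypothetical stand-in for the LP objective after fixing m(e)."""
    lc_power = 3.0 * sum(rounded)
    # Assumed penalty: dropping a converter the LP partly wanted costs gate power.
    penalty = sum(5.0 * frac for frac, keep in zip(m, rounded) if not keep)
    return lc_power + penalty

mth, final_cost = search_threshold(m, toy_cost)
print(round_by_threshold(m, 0.5))
print(mth, final_cost)
```

With this contrived cost the search moves from mth = 0.5 to mth = 0.6 and then stops; a real run would depend entirely on the LP being re-solved.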
Post Refinement
Impact of Retiming on Power
Power in uW. GL = Gate Power + LC Power; GLF = Gate Power + LC Power + FF Power

         Retiming + Scaling [Chabini04]    min FF. retiming
         Vdd            Vdd + Vth          Vdd            Vdd + Vth
ckt      GL     GLF     GL     GLF         GL     GLF     GL     GLF
s641     295.9  372.1   170    246.1       316.6  361.8   187.4  232.6
s713     311.1  399.2   181.9  269.9       330.5  375.8   199.4  244.6
s820     381    404.8   299    322.8       400.8  415     296    310.3
s832     384    407.8   307.4  331.2       403.9  418.2   299.9  314.2
s838     383.5  543     247.6  407         436.9  586.9   283.6  433.5
s1196    567.9  758.3   389.3  579.7       538.7  591     381.8  434.1
s1238    567    764.6   404.8  602.3       546.7  599.1   395.5  447.9
s1488    761.8  821.3   568.1  627.6       765    781.7   535.7  552.4
s1494    761.4  835.2   569.8  643.5       765    781.7   536.2  552.8
Ratio    -      1       -      0.76        -      0.93    -      0.66
Power Comparison on Different Voltage Scaling Techniques (in uW)
ckt      INIT     CVS [Usami95]   LP       ILP
s641     434.3    374.5           232.6    230.6
s713     458      392.3           244.6    243.3
s820     425.7    412.4           310.3    309.7
s832     428.9    415.6           314.2    312.5
s838     627.3    579.6           433.5    428.6
s1196    646.8    616.2           434.1    434.1
s1238    648.4    619.1           447.9    446.6
s1488    796.1    773.8           552.4    551.4
s1494    795.5    773.2           552.8    550.5
ratio    1        0.94            0.66     0.66
time     28 sec   29 sec          44 sec   1 day

INIT = all nodes Vdd-H + Vth-L
CVS = Clustered Voltage Scaling
LP = Linear Programming
ILP = Integer Linear Programming
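The ratio row is consistent with the mean of the per-circuit power ratios relative to INIT. The short check below recomputes it from the table values; the variable and function names are illustrative.

```python
# circuit: (INIT, CVS, LP, ILP) power in uW, copied from the table.
power_uw = {
    "s641": (434.3, 374.5, 232.6, 230.6),
    "s713": (458.0, 392.3, 244.6, 243.3),
    "s820": (425.7, 412.4, 310.3, 309.7),
    "s832": (428.9, 415.6, 314.2, 312.5),
    "s838": (627.3, 579.6, 433.5, 428.6),
    "s1196": (646.8, 616.2, 434.1, 434.1),
    "s1238": (648.4, 619.1, 447.9, 446.6),
    "s1488": (796.1, 773.8, 552.4, 551.4),
    "s1494": (795.5, 773.2, 552.8, 550.5),
}

def mean_ratio(column):
    """Average over circuits of (power in this column) / (INIT power)."""
    ratios = [row[column] / row[0] for row in power_uw.values()]
    return sum(ratios) / len(ratios)

ratios = [round(mean_ratio(c), 2) for c in range(4)]
print(ratios)  # [1.0, 0.94, 0.66, 0.66]
```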
Conclusions
Power minimization is an important VLSI design issue, for both static and dynamic power
We propose a mathematical model that solves the power optimization problem while maintaining the target clock period
The experimental results show up to 30% power reduction
Delay and Power for Voltage Scaling
Retiming Algorithm
- An FF edge has weight = clock period * number of FFs
- If the Bellman-Ford algorithm finds a feasible solution, the target clock period is feasible
- Binary search is used to identify the smallest feasible clock period (cycle time)
[Figure: example graph with nodes 1, 2, 3; legend: gate and wire delay, flip-flop]
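The feasibility test plus binary search can be sketched as follows. Note the substitution: this sketch uses the classic difference-constraint form of retiming (r(u) - r(v) ≤ c, solved with Bellman-Ford), which may differ in detail from the edge-weight construction on the slide. The three-gate ring, its hand-computed W and D values, and all function names are assumptions for illustration.

```python
def solve_difference_constraints(nodes, constraints):
    """Solve r(u) - r(v) <= c for all (u, v, c) with Bellman-Ford.

    Returns a dict r, or None if a negative cycle makes the system infeasible.
    """
    r = {n: 0 for n in nodes}  # distances from a virtual source
    for _ in range(len(nodes)):
        changed = False
        for u, v, c in constraints:
            if r[v] + c < r[u]:
                r[u] = r[v] + c
                changed = True
        if not changed:
            return r
    return None  # still relaxing after |V| passes: negative cycle

# Hypothetical ring 1 -> 2 -> 3 -> 1 with unit gate delays.
EDGE_FF = {(1, 2): 0, (2, 3): 1, (3, 1): 1}
# (W, D) per ordered pair, worked out by hand for this ring.
WD = {
    (1, 1): (0, 1), (2, 2): (0, 1), (3, 3): (0, 1),
    (1, 2): (0, 2), (2, 3): (1, 2), (3, 1): (1, 2),
    (1, 3): (1, 3), (2, 1): (2, 3), (3, 2): (1, 3),
}

def retime(period):
    """Return a retiming r for the given clock period, or None if infeasible."""
    cons = [(u, v, w) for (u, v), w in EDGE_FF.items()]
    cons += [(u, v, W - 1) for (u, v), (W, D) in WD.items() if D > period]
    return solve_difference_constraints([1, 2, 3], cons)

def smallest_feasible_period(candidates):
    """Binary search over sorted candidate periods."""
    lo, hi, best = 0, len(candidates) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        r = retime(candidates[mid])
        if r is not None:
            best, hi = (candidates[mid], r), mid - 1
        else:
            lo = mid + 1
    return best

print(smallest_feasible_period([1, 2, 3]))  # (2, {1: 0, 2: 0, 3: 0})
```

The candidate periods are the distinct D values, since feasibility can only change at those points. For this ring a period of 1 is infeasible and a period of 2 is already met without moving any FF.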
Retiming LP Formulation
[Figure: five-node example graph (nodes 1 to 5) with numeric delay and FF annotations]
Retiming Formulation
[Figure: three-gate example (nodes 1, 2, 3) showing r(v), w(e) and w_r(e)]
Cycle time (L) = 2
W(1,2) = 0, W(1,3) = 1, W(2,3) = 1
D(1,2) = 2, D(1,3) = 3, D(2,3) = 2
Objective: Minimize the number of flip-flops (FFs)
Constraints:
- The number of FFs on every edge has to remain non-negative after retiming:
  r(u) ≤ w(e_{u,v}) + r(v)
- The number of FFs on every critical path has to be greater than zero
Notation:
- r(v) is the number of FFs moved from the fanout of node v to the fanin of node v
- w(e_{u,v}) is the FF count on edge u,v
- D(u,v) is the maximum delay on path u,v
- W(u,v) is the minimum number of FFs on path u,v
LC
LCFF
Power Comparison on Different Voltage Scaling Techniques (in uW)
ckt      INIT     CVS [Usami95]   LP       ILP      MVVS [Srivastava04]
s641     434.3    374.5           232.6    230.6    253.9
s713     458      392.3           244.6    243.3    268.7
s820     425.7    412.4           310.3    309.7    327.7
s832     428.9    415.6           314.2    312.5    331.6
s838     627.3    579.6           433.5    428.6    440.1
s1196    646.8    616.2           434.1    434.1    439.0
s1238    648.4    619.1           447.9    446.6    454.1
s1488    796.1    773.8           552.4    551.4    577.1
s1494    795.5    773.2           552.8    550.5    575.0
ratio    1        0.94            0.66     0.66     0.69
time     28 sec   29 sec          44 sec   1 day    35 sec

INIT = all nodes Vdd-H + Vth-L
CVS = Clustered Voltage Scaling
MVVS = modified Vdd/Vth and Sizing
LP = Linear Programming
ILP = Integer Linear Programming