A Faster Approximation Scheme for Timing Driven Minimum Cost Layer Assignment



SLIDE 1

A Faster Approximation Scheme for Timing Driven Minimum Cost Layer Assignment

Shiyan Hu*, Zhuo Li**, and Charles J. Alpert**
* Dept of ECE, Michigan Technological University
** IBM Austin Research Lab

SLIDE 2

Outline

SLIDE 3

Layer Assignment

[Figure: layers labeled 1X, 2X, 4X]

  • In 45nm technology, layer assignment is critical for timing and buffer area optimization

SLIDE 4

Wire RC and Delay

[Figure: wire delay versus wire length for layers M2, M4, M6]
[Figure: resistance and capacitance for layers M2, M4, M6]

Wire in a higher layer has much smaller delay

SLIDE 5

Impact on Buffering

  • A buffer can drive a longer distance in a higher layer
  • Timing is improved
  • Fewer buffers are needed

SLIDE 6

Impact on Routing/Buffering

[Figure: blocks labeled IP]

SLIDE 7

Problem Formulation

  • Find a minimal cost layer assignment such that the timing constraint is satisfied.

[Figure labels: "Same Layer", "Can be different layers"]

  • Given
    – A buffered Steiner tree with n wire segments
    – Timing constraint
    – m wire layers with RC parameters and cost
  • A layer refers to a pair of horizontal and vertical layers with similar RC characteristics
  • Between any buffers, one layer is used
  • In early design stage, when buffering effect is considered, wire shaping is not important [Alpert TCAD'01]
  • In post-routing stage, wire shaping could improve timing, reduce vias, reduce coupling, and so forth

SLIDE 8

Fully Polynomial Time Approximation Scheme (FPTAS)

  • A Fully Polynomial Time Approximation Scheme
  • Provably good
  • Within a factor (1+ɛ) of the optimal cost for any ɛ > 0
  • Runs in time polynomial in n (segments), m (layers) and 1/ɛ
  • Ultimate solution for an NP-hard problem in theory
  • Highly practical

[Figure: layers labeled 1X, 2X, 4X]
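The FPTAS bullets can be restated as a formula. This is only a restatement of the slide; $W_\epsilon$ is a hypothetical name for the cost of the assignment returned for a given ɛ, and $W^*$ is the optimal cost.

```latex
W_\epsilon \le (1+\epsilon)\, W^* \quad \text{for every } \epsilon > 0,
\qquad
\text{runtime} = \mathrm{poly}\!\left(n,\ m,\ 1/\epsilon\right)
```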

SLIDE 9

Previous Work in ICCAD'08

  • It depends on M and uses a DP of O(mn^3/ɛ^2) time
    – M: ratio between upper and lower bounds of the cost of optimal layer assignment
    – An iterative DP with incremental W
  • New FPTAS runs in O(mn^2/ɛ) time
    – Bound independent oracle query
    – Our DP needs one run for all W

SLIDE 10

The Rough Picture

W*: the cost of optimal solution

[Flowchart] Make guess on W* → Check it → Good (close to W*): return the solution; Not good: make another guess

Key 1: Efficient checking
Key 2: Smart guess

SLIDE 11

Key 1: Efficient Checking

Benefit of guess

  • Only maintain the solutions with cost no greater than the guessed cost
  • Accelerate DP

SLIDE 12

The Oracle

  • Oracle(x): the checker, able to decide whether x > W* or not
    – Without knowing W*
    – Answer efficiently

[Flowchart] Set up upper and lower bounds of cost W* → Guess x within the bounds → Oracle(x) → Update the bounds

SLIDE 13

Construction of Oracle(x)

Scale and round each wire cost

$w' = \left\lfloor \frac{w}{x\epsilon/n} \right\rfloor$

Only interested in whether there is a solution with cost up to x satisfying timing constraint

Dynamic Programming

Perform DP on the scaled problem with cost bound n/ɛ. Time polynomial in n/ɛ

SLIDE 14

Scaling and Rounding

[Figure: wire cost axis marked at xɛ/n, 2xɛ/n, 3xɛ/n, 4xɛ/n]

Wire cost is integer after scaling and rounding with upper bound n/ɛ. Total # solutions is bounded in DP

Rounding error at each wire ≤ xɛ/n, total rounding error ≤ xɛ.

  • Larger x: larger error, fewer distinct costs and faster
  • Smaller x: smaller error, more distinct costs and slower
  • Rounding is the reason for acceleration
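The scale-and-round step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `scale_and_round`, the sample costs, and the choice to treat each list entry as one wire segment are assumptions made for the example.

```python
import math


def scale_and_round(costs, x, eps):
    """Divide each wire cost by the rounding factor x*eps/n and round down."""
    n = len(costs)                 # n: number of wire segments (assumed one cost each)
    unit = x * eps / n             # rounding factor from the slide
    scaled = [math.floor(w / unit) for w in costs]
    return scaled, unit


# Hypothetical wire costs and guess x, chosen only for illustration.
costs = [3.0, 7.5, 12.2, 4.9]
x, eps = 20.0, 0.5
scaled, unit = scale_and_round(costs, x, eps)

# Rounding error per wire is below the rounding factor, so the total stays below x*eps.
errors = [w - s * unit for w, s in zip(costs, scaled)]
```

With these numbers the rounding factor is 2.5, each wire loses less than 2.5 in cost, and the total loss stays under xɛ = 10, which matches the error bound stated on the slide.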

SLIDE 15

Dynamic Programming Results

DP result with all w being integers ≤ n/ɛ

  • Yes, there is a solution satisfying timing constraint: with cost rounding back, the solution has cost at most n/ɛ • xɛ/n + xɛ = (1+ɛ)x > W*
  • No, no such solution: with cost rounding back, the solution has cost at least n/ɛ • xɛ/n = x ≤ W*
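The two bounds can be written out step by step. This is a sketch under the assumption that $w'_j = \lfloor w_j / (x\epsilon/n) \rfloor$ is the scaled cost of segment $j$ and that a solution uses at most $n$ segments; the index $j$ is introduced here for illustration.

```latex
\text{Yes case: } W^* \le \sum_j w_j
  < \sum_j (w'_j + 1)\,\frac{x\epsilon}{n}
  \le \frac{n}{\epsilon}\cdot\frac{x\epsilon}{n} + n\cdot\frac{x\epsilon}{n}
  = (1+\epsilon)\,x

\text{No case: for every timing-feasible assignment, } \sum_j w_j
  \ge \frac{x\epsilon}{n}\sum_j w'_j
  > \frac{x\epsilon}{n}\cdot\frac{n}{\epsilon}
  = x
  \;\Rightarrow\; W^* \ge x
```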

SLIDE 16

Solution Characterization

  • To model effect to upstream, a candidate solution is associated with
    – v: a node
    – Q: required arrival time
    – W: cumulative wire cost

SLIDE 17

Cost (W)-Bounded Dynamic Programming (DP)

Candidate solutions are propagated toward the source

  • Start from sinks
  • Candidate solutions are generated
  • Two operations
    – Subtree processing
    – Solution update at buffer
  • Solution Pruning

SLIDE 18

Subtree Processing

  • Three paths
    – p_a: a -> u
    – p_b: b -> u
    – p_c: c -> u
  • Q_u(l) = min{Q_a - d(p_a,l), Q_b - d(p_b,l), Q_c - d(p_c,l)}
  • W_u(l) = W_a + W_b + W_c + w(T,l)
  • Wires are in the same layer l

[Figure: node u labeled (Q_u, W_u) with downstream nodes labeled (Q_a, W_a), (Q_b, W_b), (Q_c, W_c)]
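The two formulas above can be sketched as follows. The function name `process_subtree`, the lookup tables `path_delay` and `tree_cost`, and all numbers are hypothetical. The slide does not say how d(p, l) and w(T, l) are computed, so they are supplied here as plain tables.

```python
def process_subtree(children, layer, path_delay, tree_cost):
    """Combine downstream solutions (name, Q, W) into (Q_u, W_u) for one layer.

    path_delay[(name, layer)] stands in for d(p_name, l) and
    tree_cost[layer] stands in for w(T, l); both are assumed inputs.
    """
    q_u = min(q - path_delay[(name, layer)] for name, q, _ in children)
    w_u = sum(w for _, _, w in children) + tree_cost[layer]
    return q_u, w_u


# Made-up downstream solutions and per-layer tables.
children = [("a", 10, 2), ("b", 12, 3), ("c", 9, 1)]
path_delay = {
    ("a", "M2"): 3, ("b", "M2"): 7, ("c", "M2"): 4,
    ("a", "M4"): 1, ("b", "M4"): 4, ("c", "M4"): 2,
}
tree_cost = {"M2": 2, "M4": 5}

result_m2 = process_subtree(children, "M2", path_delay, tree_cost)
result_m4 = process_subtree(children, "M4", path_delay, tree_cost)
```

In this toy instance the higher layer gives a better required arrival time at u at a higher wire cost, which is the trade-off the DP has to track per layer.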

SLIDE 19

Exponential # of Solutions

  • W (= n/ɛ) solutions at each downstream buffer
  • Naïve merging takes O(W^k) time with k branches

[Figure: node u labeled (Q_u, W_u) merging k branches; branch a carries solutions (Q_a,1, W_a,1), ..., (Q_a,4, W_a,4); branch b carries (Q_b,1, W_b,1), ..., (Q_b,4, W_b,4); branch c carries (Q_c,1, W_c,1), ..., (Q_c,4, W_c,4)]

  • For two solutions at a node with the same W, the one with smaller Q is dominated
  • Try to only generate non-dominated solutions, since most of the O(W^k) solutions are dominated
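A pruning step along these lines might look like the sketch below. The first pass applies the rule stated on the slide (same W, smaller Q is dominated). The second pass, which also drops a costlier solution that does not improve Q, is a common extension assumed here rather than taken from the slides. The name `prune` and the sample data are hypothetical.

```python
def prune(solutions):
    """Return non-dominated (Q, W) pairs, sorted by increasing W."""
    # Pass 1 (slide rule): for each cost W keep only the largest Q.
    best = {}
    for q, w in solutions:
        if w not in best or q > best[w]:
            best[w] = q
    # Pass 2 (assumed extension): a higher cost must buy a strictly larger Q.
    kept, best_q = [], float("-inf")
    for w in sorted(best):
        if best[w] > best_q:
            kept.append((best[w], w))
            best_q = best[w]
    return kept


candidates = [(5, 1), (7, 1), (6, 2), (9, 3), (8, 3)]
pruned = prune(candidates)
```

After pruning, Q is strictly increasing in W, which is the property the merging step on the following slides relies on.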

SLIDE 20

Multi-Way Merging

  • If the best Q for cost w is obtained by merging $Q(a_{1,i_1}), Q(a_{2,i_2}), \ldots, Q(a_{k,i_k})$, where $i_1 + i_2 + \cdots + i_k = w$, the best Q for cost w+1 is obtained by

$\max_{1 \le r \le k} \min\{Q(a_{1,i_1}), Q(a_{2,i_2}), \ldots, Q(a_{r,i_r+1}), \ldots, Q(a_{k,i_k})\}$
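The incremental rule can be illustrated with a small sketch. It assumes each branch is given as a list `branches[r][i]` holding the best Q of branch r at scaled cost i, non-decreasing in i (as after pruning). The function name, the tie-breaking toward the branch with the smallest current Q, and the data are assumptions. This version re-evaluates the min for every candidate r, so it is simpler than, and not as fast as, the linear-time procedure of the lemma.

```python
def multiway_merge(branches, budget):
    """best[w] = best merged Q when the branch costs sum to w."""
    idx = [0] * len(branches)                      # current cost index per branch
    best = [min(b[0] for b in branches)]
    for _ in range(budget):
        movable = [r for r in range(len(branches))
                   if idx[r] + 1 < len(branches[r])]
        if not movable:
            break

        def merged_q(r):
            # Merged Q if only branch r advances by one cost unit.
            return min(b[idx[j] + (j == r)] for j, b in enumerate(branches))

        # Slide rule: max over r of the min; ties go to the current bottleneck.
        r = max(movable, key=lambda r: (merged_q(r), -branches[r][idx[r]]))
        idx[r] += 1
        best.append(min(b[i] for b, i in zip(branches, idx)))
    return best


# Hypothetical three-branch instance.
branches = [[1, 4, 6, 9], [2, 3, 8, 8], [0, 5, 5, 7]]
merged = multiway_merge(branches, 6)
```

Each step keeps the previous split and advances exactly one branch, so only k candidates are examined per cost value instead of all splits of w.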

SLIDE 21

Four-Branch Example

Solution (w=8, Q=9) is shown. To compute Solution (w=9, Q)

SLIDE 22

Four-Branch Example – Case 1

Candidate Solution (w=9, Q=8)

SLIDE 23

Four-Branch Example – Case 2

Candidate Solution (w=9, Q=4)

SLIDE 24

Four-Branch Example – Case 3

Candidate Solution (w=9, Q=5)

SLIDE 25

Four-Branch Example – Case 4

Candidate Solution (w=9, Q=7)

SLIDE 26

Linear Time Multi-Way Merging

  • Lemma: given a subtree with m layers, k branches and W non-dominated solutions at each downstream buffer, one can merge them in O(mkW) time.

SLIDE 27

Solution Update at Buffer

  • After merging, one non-dominated solution per layer per cost, totally O(mW) solutions
  • For each cost, find largest Q for all layers after buffer and propagate it

[Figure: node u labeled (Q_u, W_u) with downstream nodes labeled (Q_a, W_a), (Q_b, W_b), (Q_c, W_c)]

SLIDE 28

Cost-Bounded DP

  • Lemma: given a tree with n wire segments and m layers, the optimal layer assignment subject to cost budget W = n/ɛ can be computed in O(mnW) = O(mn^2/ɛ) time.

SLIDE 29

Key 2: Bound Independent Guess

  • U (L): upper (lower) bound on W*
  • Naive binary search style approach
  • Runtime depends on the initial bounds U and L

[Flowchart] Set U and L on W* → Oracle(x) with x = (U+L)/2 and W = n/ɛ → if W* < (1+ɛ)x: U = (1+ɛ)x; if W* ≥ x: L = x
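The naive approach can be sketched with a stand-in oracle to show why its query count depends on the initial bounds. Everything named here is an assumption for illustration: `naive_bound_search`, the mock oracle (which answers "yes" exactly when W* ≤ x, one answer pattern consistent with the guarantees above), the stopping rule U/L < 2, and the query cap.

```python
def naive_bound_search(oracle, lower, upper, eps, max_queries=100):
    """Binary-search style narrowing of [lower, upper] around W*."""
    queries = 0
    while upper / lower >= 2 and queries < max_queries:
        x = (upper + lower) / 2
        if oracle(x):                 # yes: W* < (1 + eps) * x
            upper = (1 + eps) * x
        else:                         # no: W* >= x
            lower = x
        queries += 1
    return lower, upper, queries


w_star = 37.0                         # hidden optimum, known only to the mock oracle


def mock_oracle(x):
    return w_star <= x


narrow = naive_bound_search(mock_oracle, 1.0, 1000.0, eps=0.1)
wide = naive_bound_search(mock_oracle, 1.0, 1_000_000.0, eps=0.1)
```

Starting from a looser upper bound costs more oracle calls, and each call is a full DP run at the fixed ɛ, which is the dependence the bound independent scheme removes.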

SLIDE 30

Adapt ɛ

Rounding factor xɛ/n for cost

  • Larger ɛ: faster with rough estimation
  • Smaller ɛ: slower with accurate estimation
  • Adapt ɛ and relate it with U and L

SLIDE 31

U/L Related Scale & Round

[Figure: wire cost axis with rounding step xɛ/n, related to U/L]

SLIDE 32

Conceptually

  • Begin with large ɛ' and progressively reduce it according to U/L as x approaches W*
  • Set ɛ' as a geometric sequence of …, 8, 4, 2, 1, 1/2, …, ɛ
  • One run of DP takes about O(n/ɛ) time. Total runtime is O(… + n/8 + n/4 + n/2 + … + n/ɛ) = O(n/ɛ). Independent of # of iterations
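The reason the total does not grow with the number of iterations is the geometric series. As a sketch, assume ɛ' takes the values $2^{j}\epsilon$ for $j = 0, 1, \ldots, J$ (the index $j$ and the bound $J$ are introduced here for illustration):

```latex
\sum_{j=0}^{J} \frac{n}{2^{j}\epsilon}
  = \frac{n}{\epsilon} \sum_{j=0}^{J} 2^{-j}
  < \frac{2n}{\epsilon}
  = O\!\left(\frac{n}{\epsilon}\right)
```

The last and most accurate run dominates, and all earlier runs together cost less than one more run of the same size.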

SLIDE 33

Oracle Query Till U/L < 2

At iteration i, with upper and lower bounds $W^*_{u,i}$ and $W^*_{l,i}$ on W*:

$\epsilon'_i = \sqrt{\frac{W^*_{u,i}}{W^*_{l,i}}} - 1, \qquad x_i = \sqrt{\frac{W^*_{l,i} \cdot W^*_{u,i}}{1 + \epsilon'_i}}$

$\frac{W^*_{u,i+1}}{W^*_{l,i+1}} = \left(\frac{W^*_{u,i}}{W^*_{l,i}}\right)^{3/4} \Rightarrow \frac{W^*_{u,i}}{W^*_{l,i}} = \left(\frac{W^*_{u,i+1}}{W^*_{l,i+1}}\right)^{4/3} \Rightarrow \frac{W^*_{u,i}}{W^*_{l,i}} = \left(\frac{W^*_{u,t}}{W^*_{l,t}}\right)^{(4/3)^{t-i}}$

$\sum_{1 \le i \le t} O\!\left(\frac{mn^2}{\epsilon'_i}\right) = O\!\left(mn^2 \sum_{1 \le i \le t} \sqrt{\frac{W^*_{l,i}}{W^*_{u,i}}}\right) = O\!\left(mn^2 \sum_{1 \le i \le t} \left(\frac{W^*_{l,t}}{W^*_{u,t}}\right)^{(1/2)(4/3)^{t-i}}\right)$

$O\!\left(mn^2 \sum_{0 \le j < t} \left(\frac{W^*_{l,t}}{W^*_{u,t}}\right)^{(1/2)(4/3)^{j}}\right) = O\!\left(mn^2 \sum_{0 \le j < t} (.59)^{(1/2)(4/3)^{j}}\right) = O(mn^2)$

SLIDE 34

When U/L < 2

[Flowchart] Scale and round each cost by Lɛ/n → Run DP with W = 2n/ɛ → Pick min cost solution satisfying timing at driver

  • At least one feasible solution, otherwise no solution w/ cost 2n/ɛ • Lɛ/n = 2L ≥ U
  • Runs in O(mn^2/ɛ) time

SLIDE 35

FPTAS for Layer Assignment

  • Theorem: a (1+ɛ) approximation to the timing constrained minimum cost layer assignment problem can be computed in O(mn^2/ɛ) time for any ɛ > 0.

SLIDE 36

The Algorithmic Flow

[Flowchart] Set U and L of W* → Adapt ɛ = [U/L]^(1/2) - 1 → Set x = [UL/(1+ɛ)]^(1/2) → Oracle(x) → Update U or L → if U/L < 2: compute final solution; otherwise adapt ɛ again
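The loop in this flow can be sketched with a stand-in oracle. The sketch reads the adaptive rule as ɛ' = (U/L)^(1/2) - 1, the reading under which U/L shrinks to (U/L)^(3/4) per query as in the "Oracle Query Till U/L < 2" slide. The function name, the mock oracle (which says "yes" exactly when W* ≤ x), and the numbers are assumptions; a real oracle would run the cost-bounded DP with the current ɛ'.

```python
import math


def bound_independent_search(oracle, lower, upper):
    """Shrink [lower, upper] around W* until upper / lower < 2."""
    eps_used = []
    while upper / lower >= 2:
        eps = math.sqrt(upper / lower) - 1          # adapt eps' to the current gap
        x = math.sqrt(upper * lower / (1 + eps))    # guess between the bounds
        eps_used.append(eps)
        if oracle(x, eps):                          # yes: W* < (1 + eps') * x
            upper = (1 + eps) * x
        else:                                       # no: W* >= x
            lower = x
    return lower, upper, eps_used


w_star = 37.0                                       # hidden optimum for the mock oracle


def mock_oracle(x, eps):
    return w_star <= x


lo, hi, eps_used = bound_independent_search(mock_oracle, 1.0, 1000.0)
```

Early queries use a large ɛ' and are therefore cheap; ɛ' only becomes small once the bounds are already tight, so the expensive final DP run is done once.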

SLIDE 37

Experiments

  • Experimental Setup
    – 1000 industrial nets
  • Compared to Dynamic Programming and the previous FPTAS [ICCAD'08]

SLIDE 38

Cost Ratio Compared to DP

[Figure: wire cost ratio versus approximation ratio ɛ, for Old FPTAS and New FPTAS]

SLIDE 39

Speedup Compared to DP

[Figure: speedup versus approximation ratio ɛ (0.05 to 0.5), for Old FPTAS and New FPTAS]

SLIDE 40

Observations

  • FPTAS always achieves the theoretical guarantee
  • Larger ɛ leads to more speedup
  • 3.9x faster with 2.2% additional wire area compared to DP
  • Up to 6.5x faster than DP
  • On average about 2x faster than previous FPTAS

SLIDE 41

Conclusion

  • Propose a (1+ɛ) approximation for timing constrained layer assignment for any ɛ > 0 running in O(mn^2/ɛ) time
    – Linear time DP running in O(mnW) time
    – Bound independent oracle query
    – Up to 6.5x faster than DP and 2x faster than previous FPTAS
    – Few percent additional wire area compared to DP as guaranteed theoretically

SLIDE 42

Thanks