Discrete Buffer and Wire Sizing for Link-Based Non-Tree Clock Network - PowerPoint PPT Presentation



slide-1
SLIDE 1

Discrete Buffer and Wire Sizing for Link-Based Non-Tree Clock Network

Rupak Samanta, Jiang Hu and Peng Li
Department of Electrical and Computer Engineering
Texas A&M University

slide-2
SLIDE 2

Outline

  • Introduction
  • Preliminary of Support Vector Machine
  • Sizing Algorithm
  • Experimental Results
  • Conclusion

slide-3
SLIDE 3

Challenges

  • Power constraint
  • PVT variations

[Figure labels: 130nm, 90nm, < 65nm]

slide-4
SLIDE 4

Challenges in Clock Network

  • The clock network is a sub-circuit that involves both challenges: variability and power consumption
  • A well-known approach for skew tolerance to variation is the clock mesh, which has large wire/power overhead
  • A link-based non-tree clock network provides a trade-off between variation tolerance and power overhead

slide-5
SLIDE 5

Our Objective

“To investigate optimizing link-based clock network through discrete buffer and wire sizing”

slide-6
SLIDE 6

Review of Link Insertion

  • A. Rajaram, J. Hu and R. Mahapatra, DAC04

  • Add links such that skew due to variations is reduced AND nominal skew is unaffected
  • Link = link_capacitors + link_resistor

[Figure: a link between nodes u and w, modeled as a resistor R with a capacitor C/2 at each end]

slide-7
SLIDE 7

Link Insertion for Buffered Clock Tree

  • G. Venkataraman et al., ICCAD05

Links need to be inserted between different sub-networks

slide-8
SLIDE 8

Motivation

  • There is almost no work on optimizing clock networks with cross links
  • Most of the previous work on buffer and wire sizing uses
      - the Elmore delay model
      - continuous sizing
  • The Elmore delay model is inaccurate and differs by a large amount when compared with SPICE
  • The number of buffer and wire options is small, so rounding continuous sizing results leads to significant errors

slide-9
SLIDE 9

Our Contributions

  • Support Vector Machine (SVM) is explored to
      - handle the complex delay model issue
      - provide guidance for discrete optimization in a large design space
  • Proposed a two-stage hybrid optimization approach
      - Discrete sizing
      - Using an accurate delay model

slide-10
SLIDE 10

Support Vector Machine (SVM)

  • SVM is well suited for highly nonlinear and high-dimensional data
  • SVM can operate in different modes
      - Classification
      - Regression
      - Ranking
  • For a set of M training data points $(x_1, y_1), \ldots, (x_M, y_M)$, the regression model is

$$f(x) = \sum_i \alpha_i K(s_i, x)$$

  • The kernel function $K(s_i, x)$ corresponds to a dot product in a certain feature space
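To make the regression model on this slide concrete, here is a minimal sketch of evaluating $f(x) = \sum_i \alpha_i K(s_i, x)$. The slides do not say which kernel was used, so the Gaussian (RBF) kernel, the `gamma` value, and the toy support vectors and coefficients below are all assumptions for illustration only.

```python
import math

def rbf_kernel(s, x, gamma=0.5):
    """Gaussian (RBF) kernel; the kernel choice and gamma are assumptions."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(s, x))
    return math.exp(-gamma * sq_dist)

def svm_predict(support_vectors, alphas, x, kernel=rbf_kernel):
    """Evaluate f(x) = sum_i alpha_i * K(s_i, x)."""
    return sum(a * kernel(s, x) for s, a in zip(support_vectors, alphas))

# Toy values, not taken from the slides.
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
alphas = [2.0, -1.0]
f_at_origin = svm_predict(support_vectors, alphas, (0.0, 0.0))
```

In a sizing flow, `x` would plausibly be a vector of buffer and wire sizes and `f(x)` the predicted skew cost, so one trained model can be queried many times without rerunning SPICE.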

slide-11
SLIDE 11

Skew Quality Function

$$Q = \sum_{i,j} (t_i - t_j)^2$$

  • $t_i$ is the clock delay at leaf node i
  • Q is an overall function that penalizes large skew
  • Clock delay is obtained through SPICE simulations
  • The data is applied to train an SVM model of Q
  • Once an SVM model is built, it is utilized repeatedly, so the training cost is amortized
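A small sketch of the skew quality function may help. It assumes the sum runs over unordered pairs of leaf nodes (summing over ordered pairs would simply double the value), and the delay numbers are made up for illustration.

```python
from itertools import combinations

def skew_quality(leaf_delays):
    """Q = sum over unordered leaf pairs (i, j) of (t_i - t_j)^2."""
    return sum((ti - tj) ** 2 for ti, tj in combinations(leaf_delays, 2))

def global_skew(leaf_delays):
    """Largest difference between any two leaf delays."""
    return max(leaf_delays) - min(leaf_delays)

# Hypothetical leaf delays in picoseconds.
delays_ps = [100.0, 102.0, 99.0]
q = skew_quality(delays_ps)     # 4 + 1 + 9
skew = global_skew(delays_ps)   # 102 - 99
```

Unlike the global skew, which depends only on the two extreme leaves, Q changes whenever any leaf delay moves, which makes it a smoother target for a regression model.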

slide-12
SLIDE 12

Sizing Algorithm

  • The goal of the buffer and wire sizing is to minimize the global skew
  • Our approach is to iteratively optimize a portion of the given clock network
      - Compared to simultaneously optimizing the entire network, our approach is more practical
      - Compared to iteratively optimizing a single element, our approach is more efficient at finding a global solution

slide-13
SLIDE 13

Buffered Clock Tree with Cross Links

[Figure: buffered clock tree with cross links. Labels: Level 1, Level i, Level i+1, subtree, segment, nodes u and v]

slide-14
SLIDE 14

Optimization Flow

Flow:
  1. Input: Clock Network
  2. Remove Link Resistors
  3. Stage 1:
       Optimization Core1(p1, e1, k1, ε1)
       Optimization Core2(p1, e1,2, k2, ε2)
  4. Add back Link Resistors
  5. Stage 2:
       Optimization Core1(p2, e2, k1, ε2)
       Optimization Core2(p2, e2,2, k2, ε2)

Parameters:
  p: definition of component
  e: types of element to be sized
  k: number of components in optimization
  ε: optimization engine

Optimization core:
  1. Run SPICE simulation
  2. S ← k components associated with max delay
     S ← S ∪ k components associated with min delay
  3. Build SVM over S
  4. Size element e in S using ε

Legend:
  p1: subtree
  p2: subtree + links
  e1: buffers + wires
  e1,2: wires
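Step 2 of the optimization core can be sketched as follows. The mapping from component names to delays is a hypothetical data structure; the slides do not say how a delay is associated with a component.

```python
def select_components(delay_by_component, k):
    """Return the k components with the largest delay plus the k with the smallest."""
    ranked = sorted(delay_by_component, key=delay_by_component.get)
    slowest = ranked[-k:] if k > 0 else []
    fastest = ranked[:k]
    # Union that keeps order and drops duplicates when 2k exceeds the component count.
    selected = []
    for name in slowest + fastest:
        if name not in selected:
            selected.append(name)
    return selected

# Hypothetical sub-tree delays.
delays = {"a": 5.0, "b": 1.0, "c": 9.0, "d": 3.0}
chosen = select_components(delays, 1)
```

Picking components from both ends of the delay range targets the elements that contribute most to the skew, so each iteration works on a small set S rather than the whole network.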

slide-15
SLIDE 15

Optimization Stage I

Link resistors are removed; link capacitance is retained.

STEP 1
  • Step 1 is done on a coarse level
  • Each wire segment in a sub-tree is sized uniformly
  • Only one variable is needed for the wire
  • This is done to reduce the total number of variables in each sub-tree
  • More sub-trees can be chosen for optimization

STEP 2
  • Step 2 is on a fine-grained and local level
  • Individual wires in each sub-tree are sized differently
  • For each sub-tree the number of variables is usually large
  • The number of sub-trees chosen for optimization is smaller than in Step 1

slide-16
SLIDE 16

Optimization Engine for Stage I

Input: Set S of components to be sized; τ is the error tolerance
1. While (improve) {
2.   Partition S into a set G of m groups
3.   Obtain average leaf node delay t_avg of each group
4.   Sort groups in G in non-decreasing order of t_avg
5.   For i = 1 to m/2 {
6.     while (improve)
7.       Increase t_avg of g_i in G by δ
8.     while (improve)
9.       Decrease t_avg of g_{m-i+1} in G by δ
10.  }
11. }

ILP:

$$\delta - \tau \le \Delta t_l \le \delta + \tau, \quad \forall \text{ leaf node } l$$
$$\sum_b x_{i,b} = 1, \quad \forall \text{ buffer } i$$
$$\sum_w y_{j,w} = 1, \quad \forall \text{ wire } j$$
$$x_{i,b} \in \{0, 1\}, \quad \forall i, b$$
$$y_{j,w} \in \{0, 1\}, \quad \forall j, w$$
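Steps 3 to 5 of this engine pair the groups from opposite ends of the sorted order. The sketch below shows only that pairing; the group names and delays are hypothetical, and the ILP that actually shifts each group's delay by δ is not modeled here.

```python
def pair_groups_by_delay(groups):
    """groups maps a group name to its leaf delays; returns (fast, slow) pairs."""
    t_avg = {name: sum(d) / len(d) for name, d in groups.items()}
    order = sorted(t_avg, key=t_avg.get)  # non-decreasing t_avg
    m = len(order)
    # g_i is to be slowed down, g_{m-i+1} is to be sped up.
    return [(order[i], order[m - 1 - i]) for i in range(m // 2)]

# Hypothetical leaf delays per group.
groups = {"g1": [10.0, 12.0], "g2": [20.0, 22.0], "g3": [15.0], "g4": [30.0]}
pairs = pair_groups_by_delay(groups)
```

Each pair names the group whose average delay is increased and the group whose average delay is decreased, which pulls the two extremes toward each other.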

slide-17
SLIDE 17

Optimization Stage II

Link resistors are added back.

  • The clock network topology becomes non-tree
  • Similar to stage I, this stage consists of two steps of optimization
  • Since the network topology is non-tree, it is not friendly to ILP formulation
  • The optimization engine is designed using a group migration heuristic
  • Similar to stage I, the objective is to minimize the skew cost

slide-18
SLIDE 18

Optimization Engine for Stage II

Input: Set S of components to be sized
1. While (true) {
2.   S' ← S
3.   While S' is not empty {
4.     Find move i with max gain g_i
5.     e_i = the element sized in move i
6.     S' ← S' − {e_i} }
7.   Find l such that cumulated gain G_l is maximized
8.   If G_l > 0, make the l moves on S
9.   Else break }

where

$$g_i = Q_{i-1} - Q_i$$
$$G_l = \sum_{i=1}^{l} g_i$$
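The group migration loop above can be sketched in a few lines. This is an illustrative reading of the pseudocode, not the authors' C implementation: `cost` stands in for the skew cost Q (in the slides it would come from the SVM model or SPICE), and the element names, size options and toy cost are hypothetical.

```python
def group_migration_pass(sizes, options, cost):
    """Tentatively resize every element once, then keep the best prefix of moves."""
    trial = dict(sizes)
    unlocked = set(sizes)
    moves, gains = [], []
    q_prev = cost(trial)
    while unlocked:
        best = None  # (cost after move, element, new size)
        for e in sorted(unlocked):
            for s in options[e]:
                if s == trial[e]:
                    continue
                q_after = cost({**trial, e: s})
                if best is None or q_after < best[0]:
                    best = (q_after, e, s)
        if best is None:  # no remaining element has an alternative size
            break
        q_after, e, s = best
        trial[e] = s
        unlocked.remove(e)               # S' <- S' - {e_i}
        moves.append((e, s))
        gains.append(q_prev - q_after)   # g_i = Q_{i-1} - Q_i
        q_prev = q_after
    # Find l that maximizes the cumulated gain G_l.
    best_l, best_gain, running = 0, 0.0, 0.0
    for l, g in enumerate(gains, start=1):
        running += g
        if running > best_gain:
            best_l, best_gain = l, running
    result = dict(sizes)
    for e, s in moves[:best_l]:
        result[e] = s
    return result, best_gain

def group_migration(sizes, options, cost):
    """Repeat passes until no prefix of moves gives a positive gain."""
    while True:
        sizes, gain = group_migration_pass(sizes, options, cost)
        if gain <= 0:
            return sizes

# Toy problem: two elements with three size options each.
toy_cost = lambda s: (s["a"] - s["b"]) ** 2 + (s["a"] - 3) ** 2
final = group_migration({"a": 1, "b": 3}, {"a": [1, 2, 3], "b": [1, 2, 3]}, toy_cost)
```

Because a pass may accept moves with negative gain before taking the best prefix, the heuristic can climb out of some local minima that a purely greedy single-element update would get stuck in.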

slide-19
SLIDE 19

Experimental Setup

  • The experiments are performed on ISCAS89 sequential benchmark circuits.
  • The circuits are synthesized using SIS and placed in mPL.
  • The clock tree construction and link insertion are done according to G. Venkataraman et al. [ICCAD05].

Case      # of Sinks   # of Buffers   # of Links
S9234     135          20             21
S5378     164          25             30
S13207    503          77             69
S15850    566          81             86
S38584    1428         235            50
S35932    1728         286            143

slide-20
SLIDE 20

Experimental Setup (Contd.)

Model       Vdd     Buffer Library        Wire Library   K1      K2
90nm BPTM   1.0 V   16X, 24X, 32X, 48X    1X, 2X, 3X     10-15   4-5

X: size of minimum width buffer (wire)

  • The buffer and wire sizing algorithm is implemented in C.
  • The Integer Linear Program is solved using a public domain solver called GLPK [http://www.gnu.org/software/glpk/]
  • The binaries for the Support Vector Machine are downloaded from [http://svmlight.joachims.org/]
  • The experiments are performed using 2 Dual-Core Intel Xeon processors at 3.2 GHz with 8 GB of memory
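To give a feel for the size of the discrete design space, the sketch below counts buffer-size combinations from the setup data. It assumes every buffer independently takes one of the four library sizes and ignores wire sizing entirely, so it is only a rough lower bound on the full search space.

```python
BUFFER_SIZES = ["16X", "24X", "32X", "48X"]

# Buffer counts from the benchmark table.
BUFFER_COUNTS = {
    "S9234": 20, "S5378": 25, "S13207": 77,
    "S15850": 81, "S38584": 235, "S35932": 286,
}

def buffer_size_combinations(num_buffers, num_options=len(BUFFER_SIZES)):
    """Number of ways to assign one library size to each buffer."""
    return num_options ** num_buffers

smallest_case = buffer_size_combinations(BUFFER_COUNTS["S9234"])  # 4 ** 20
```

Even the smallest benchmark has about 10^12 buffer assignments, which is why the flow sizes only a few components per iteration.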

slide-21
SLIDE 21

Comparison between SPICE, SVM and Elmore delay

[Figure: comparison between SPICE, SVM and Elmore delay]

slide-22
SLIDE 22

Experimental Results

  • The experiments are done in SPICE to compare skew and power
  • We compare three approaches for the non-tree clock network
      - Tree + Link
      - Tree + Link + Sizing (wo SVM)
      - Tree + Link + Sizing (w SVM)
  • Our approach is also suitable for optimizing a clock tree network, so we simulated the clock tree for three approaches
      - Tree
      - Tree + Sizing (wo SVM)
      - Tree + Sizing (w SVM)

slide-23
SLIDE 23

Global Skew for Non-tree Clock Network

[Chart: Result for Normalized Global Skew. Y-axis: Normalized Global Skew. X-axis: Test Cases (s9234, s5378, s13207, s15850, s38584, s35932, Average). Series: Tree + Link; Tree + Link + Sizing (wo SVM); Tree + Link + Sizing (w SVM)]

slide-24
SLIDE 24

Average Power Consumption for Non-tree Clock Network

[Chart: Result for Normalized Average Power Consumption. Y-axis: Normalized Average Power Consumption. X-axis: Test Cases (s9234, s5378, s13207, s15850, s38584, s35932, Average). Series: Tree + Link; Tree + Link + Sizing (wo SVM); Tree + Link + Sizing (w SVM)]

slide-25
SLIDE 25

Total Capacitance for Non-tree Clock Network

[Chart: Result for Normalized Total Capacitance. Y-axis: Normalized Total Capacitance. X-axis: Test Cases (s9234, s5378, s13207, s15850, s38584, s35932, Average). Series: Tree + Link; Tree + Link + Sizing (wo SVM); Tree + Link + Sizing (w SVM)]

slide-26
SLIDE 26

Global Skew for Clock Tree Network

[Chart: Result for Normalized Global Skew. Y-axis: Normalized Global Skew. X-axis: Test Cases (s9234, s5378, s13207, s15850, s38584, s35932, Average). Series: Tree; Tree + Sizing (wo SVM); Tree + Sizing (w SVM)]

slide-27
SLIDE 27

Average Power Consumption for Clock Tree Network

[Chart: Result for Normalized Average Power Consumption. Y-axis: Normalized Average Power Consumption. X-axis: Test Cases (s9234, s5378, s13207, s15850, s38584, s35932, Average). Series: Tree; Tree + Sizing (wo SVM); Tree + Sizing (w SVM)]

slide-28
SLIDE 28

Total Capacitance for Clock Tree Network

[Chart: Result for Normalized Total Capacitance. Y-axis: Normalized Total Capacitance. X-axis: Test Cases (s9234, s5378, s13207, s15850, s38584, s35932, Average). Series: Tree; Tree + Sizing (wo SVM); Tree + Sizing (w SVM)]

slide-29
SLIDE 29

Conclusions

  • We investigate buffer and wire sizing for link-based non-tree clock networks
  • Support Vector Machine is explored to
      - handle complex delay model issues
      - provide guidance in the discrete optimization space
  • A two-stage hybrid optimization approach using an accurate delay model is proposed
  • SPICE-based experimental results indicate significant improvement in skew
  • Our sizing algorithm can also be applied to clock tree networks

slide-30
SLIDE 30

Questions