Pattern Based Method For P/G Grid Analysis Jin Shi, Yici Cai, - - PowerPoint PPT Presentation



SLIDE 1

Pattern Based Method For P/G Grid Analysis

Jin Shi, Yici Cai, Xianlong Hong Sheldon X.-D. Tan

EDA Lab. CS Department, Tsinghua University EE Department, University of California Riverside

SLIDE 2

2006-3-31 2

Outline

  • Practical Computation Problems
  • Overview of Existing Methods
  • Acceleration Tech
  • Some Observations
  • Pattern Based Method
  • Conclusion
SLIDE 3

Outline

  • Practical Computation Problems
  • Overview of Existing Methods
  • Acceleration Tech
  • Some Observations
  • Pattern Based Method
  • Conclusion
SLIDE 4

Practical Problems #1

  • Linear computation complexity does not mean linearly increasing time cost, because of Cache Miss

– The matrix multiply operation is $O(n^3)$, but the algorithm looks like $O(n^5)$

[Figure: log cycles/flop versus log Problem Size for matrix multiply; $T = N^{4.7}$. Size 2000 took 5 days; size 12000 would take 1095 years.]

SLIDE 5

Practical Problems #1

  • Summary of Cache Miss

– Linear computation complexity is not enough
– Optimize algorithms together with cache performance
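A minimal sketch of what "optimizing together with cache performance" can look like for the matrix multiply example, assuming a simple loop-tiling scheme (the function names and the `block` parameter are illustrative, not taken from the slides). Both versions do the same $O(n^3)$ arithmetic; the blocked one reuses small tiles before moving on. Python is used only for readability, so the cache benefit itself would show up in a compiled implementation.

```python
def matmul_naive(a, b):
    """Textbook triple loop; the inner loop walks down a column of b."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_blocked(a, b, block=32):
    """Same arithmetic, but processed tile by tile (block x block)."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, block):
        for k0 in range(0, m, block):
            for j0 in range(0, p, block):
                # Work on one tile of a, b and c at a time.
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(k0, min(k0 + block, m)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(j0, min(j0 + block, p)):
                            row_c[j] += aik * row_b[j]
    return c
```

The tile size is a tuning parameter: it would normally be chosen so that three tiles fit in the cache level being targeted.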

SLIDE 6

Practical Problems #2

  • Iterative efficiency

– Preconditioner’s performance decreases as the matrix size increases

[Figure: Performance of Different Preconditioners; x-axis: $\log_2(\text{size})$, y-axis: iteration times]

SLIDE 7

Practical Problems #3

  • Memory efficiency limitation
  • When the Design is too large …

– Hard to load it from the DB
– Impossible to build the matrix
– Vector malloc runs out of memory
– Too many years to get results

SLIDE 8

Our Contribution

  • Alleviate Cache Miss
  • Reduce the overall memory usage
  • Present a more efficient preconditioner under a partition framework

  • Constant preconditioner performance
SLIDE 9

Outline

  • Practical Computation Problems
  • Overview of Existing Methods
  • Acceleration Tech
  • Some Observations
  • Pattern Based Method
  • Conclusion
SLIDE 10

Summary of Existing Method

  • Computation Complexity < $O(n^2)$

– LU: $O(n^2)$
– PCG: $O(n^p)$, p slightly larger than 1
– M-G: $O(n)$ with small coefficient
– R-W: $O(n)$ with large coefficient
– ADI: $O(n)$ with small coefficient

  • Memory Efficiency

– LU: $O(n^2)$
– PCG: $O(n)$
– M-G: $O(n)$
– R-W: $O(n)$
– ADI: $O(n)$

  • Trade off between speed and accuracy

SLIDE 11

Summary

  • Direct Methods vs Iterative Methods

– Time cost of LU: $t(d(A)) + N \times (t(L^{-1}v) + t(U^{-1}v))$, with $t(L^{-1}v) \propto (1 + f) \cdot nnz(A)$, where $f$ is the fill in ratio
– Time cost of PCG: $t(d(A)) + N \times (n \cdot t(Av))$, with $t(Av) \propto nnz(A)$
– Which is faster depends on the fill in ratio and the performance of the preconditioner
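To make the comparison above concrete, here is a rough cost-model sketch. It assumes that `num_solves` is the number of right-hand sides solved, `iters` is the PCG iteration count per solve, both triangular solves cost the same, and every proportionality constant is 1; none of these are stated on the slide, and the per-iteration cost of applying the preconditioner is ignored.

```python
def lu_cost(nnz, fill_in, num_solves, decomp_cost=0.0):
    """Decomposition cost plus two triangular solves per right-hand side.

    Each triangular solve is assumed to touch about (1 + f) * nnz(A) entries.
    """
    triangular_solve = (1.0 + fill_in) * nnz
    return decomp_cost + num_solves * 2 * triangular_solve


def pcg_cost(nnz, iters, num_solves, precond_cost=0.0):
    """Preconditioner construction plus `iters` products with A per solve."""
    return precond_cost + num_solves * iters * nnz


# Hypothetical numbers: fill in ratio 30, 10 PCG iterations, one solve.
lu = lu_cost(nnz=1000, fill_in=30, num_solves=1)
pcg = pcg_cost(nnz=1000, iters=10, num_solves=1)
```

Under these assumptions the per-solve cost of LU grows with the fill in ratio while PCG grows with the iteration count, which is why a better preconditioner shifts the balance toward PCG.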

SLIDE 12

Outline

  • Practical Computation Problems
  • Overview of Existing Methods
  • Acceleration Tech
  • Some Observations
  • Pattern Based Method
  • Conclusion
SLIDE 13

Acceleration Tech

  • Model Order Reduction

– S domain based: before 2000
– Electrical equivalent circuit based: 2002
– Topological partition based: 2004

  • Among these methods, the topological partition method is the most powerful one for simulating large P/G grids

SLIDE 14

Partition Benefits

  • For direct method

– Decrease decomposition time
– Decrease fill in ratio
– Decrease Cache Miss

  • For iterative method

– Decrease preconditioner construction time
– Increase preconditioner performance
– Decrease iteration time
– Decrease Cache Miss

$x \cdot (n/x)^2 < n^2$
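A worked instance of the partition inequality, assuming a cost that grows quadratically with problem size and reading $x$ as the number of equal blocks (both are assumptions about the slide's intent; coupling between blocks is ignored):

```latex
x \cdot \left(\frac{n}{x}\right)^{2} = \frac{n^{2}}{x} < n^{2} \quad (x > 1),
\qquad
n = 10^{6},\ x = 100:\quad 100 \cdot (10^{4})^{2} = 10^{10} < 10^{12}.
```

In this idealized setting the total work drops by a factor of $x$, and each block is small enough to have a better chance of staying in cache.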

SLIDE 15

Outline

  • Practical Computation Problems
  • Overview of Existing Methods
  • Acceleration Tech
  • Some Observations
  • Pattern Based Method
  • Conclusion
SLIDE 16

Observation #1

  • Too many elements share the same value

– Extract R, L, C with a BEM solver
– Almost all elements are extracted from M1 and M2
– More than 80% of the elements in M1 and M2 have the same value

SLIDE 17

Observation #2

  • P/G grid topology is self-similar

– One layer contains many metal rails routed in the same direction
– Metal rails share the same width and pitch

  • Is it possible to transform topology similarity into matrix similarity?

– Yes

  • How about an irregular P/G grid?

– Apply local regularization so that elements in a local area share the same value

  • Continuous in the topology domain

– Local regularization will not introduce obvious computation errors

SLIDE 18

Observation #3

  • When the matrix size is medium, the PCG method is faster than any other method

– PCG can converge within 10 iterations
– LU usually has a fill in factor larger than 30
– R-W is inefficient for small cases
– M-G needs auxiliary computation structures and time to construct them

  • Traditional preconditioners can be improved

– Use the topology similarity property
– Use the element value similarity property

SLIDE 19

Outline

  • Practical Computation Problems
  • Overview of Existing Methods
  • Acceleration Tech
  • Some Observations
  • Pattern Based Method
  • Conclusion
SLIDE 20

Pattern Based Method

  • Regular P/G Grid in 3D
SLIDE 21

Pattern Based Method

  • Self Similarity: Global Similarity (Global Pattern)
SLIDE 22

Pattern Based Method

  • Self Similarity: Local Similarity (Local Pattern)
SLIDE 23

Pattern Based Method

  • Similarity Summary

– Patterns are elements similar to each other
– Patterns exist not only in the global area but also in local areas

  • Strategy

– Partition the global area into blocks and perform relaxation iteration between global blocks
– Reuse local patterns to perform local simulation

SLIDE 24

Pattern Based Method

  • Local Matrix Generation

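– Element fill in under the NA method
– Node order is important

$$
\begin{bmatrix}
\ddots & \vdots & & \vdots & \\
\cdots & g & \cdots & -g & \cdots \\
 & \vdots & \ddots & \vdots & \\
\cdots & -g & \cdots & g & \cdots \\
 & \vdots & & \vdots & \ddots
\end{bmatrix}
$$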
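The fill in rule and the effect of node order can be sketched as follows, assuming "NA" refers to nodal analysis and modelling a single rail as a chain of equal conductances `g`. The helper names are hypothetical, and pads and current loads are omitted so that only the fill in pattern is shown.

```python
def stamp_conductance(matrix, i, j, g):
    """Nodal-analysis stamp of a conductance g between nodes i and j."""
    matrix[i][i] += g
    matrix[j][j] += g
    matrix[i][j] -= g
    matrix[j][i] -= g


def build_rail_matrix(order, g):
    """Stamp one rail whose physically consecutive nodes carry the numbers in `order`."""
    n = len(order)
    matrix = [[0.0] * n for _ in range(n)]
    for k in range(n - 1):
        stamp_conductance(matrix, order[k], order[k + 1], g)
    return matrix


def bandwidth(matrix):
    """Largest |i - j| over the nonzero entries."""
    return max(abs(i - j)
               for i, row in enumerate(matrix)
               for j, value in enumerate(row) if value != 0.0)


along_rail = build_rail_matrix([0, 1, 2, 3], g=2.0)   # numbered along the rail
shuffled = build_rail_matrix([0, 2, 1, 3], g=2.0)     # same rail, other numbering
```

Numbering the nodes along the rail gives a tridiagonal block (bandwidth 1), while the shuffled numbering of the same rail spreads the entries away from the diagonal.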

SLIDE 25

Pattern Based Method

  • Local Matrix Generation

– Ordering node number according to the routing direction of each layer

SLIDE 26

Pattern Based Method

  • Diagonal Block

– Transform topology similarity to matrix similarity
– Only one tridiagonal matrix needs to be stored

SLIDE 27

Pattern Based Method

  • Via Block

– All the vias between two adjacent layers share the same value
– Fill in of via elements is regular: size, pitch
– No need to store the submatrices caused by vias

SLIDE 28

Pattern Based Method

  • Drawback of traditional preconditioner

– The performance of ILU or incomplete Cholesky decomposition is restricted by memory usage
– More fill in gives faster convergence

Drop threshold   Fill in ratio   Avg iteration times
1e-1             1.5             221
1e-2             2.79            93
1e-3             10.96           44
1e-4             48.41           17

(No ordering, residual = 1e-6)

SLIDE 29

Pattern Based Method

  • Preconditioner Generation: Two Metal Layer Case
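  • Schur-like decomposition to get an approximate inverse

$$
\begin{bmatrix} A & C \\ C^T & B \end{bmatrix}, \quad
A = [a_{ij}], \quad B = [b_{ij}], \quad C = [c_{ij}], \quad
c_{ij} = \begin{cases} g_{via} & \text{where } \mathrm{mod}(j, i) \text{ selects a via position} \\ 0 & \text{otherwise} \end{cases}
$$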

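$$
\begin{bmatrix} A & C \\ C^T & B \end{bmatrix}
= \begin{bmatrix} I & 0 \\ C^T A^{-1} & I \end{bmatrix}
\begin{bmatrix} A & 0 \\ 0 & B' \end{bmatrix}
\begin{bmatrix} I & A^{-1} C \\ 0 & I \end{bmatrix},
\qquad B' = B - C^T A^{-1} C
$$

$$
\begin{bmatrix} A & C \\ C^T & B \end{bmatrix}^{-1}
= \begin{bmatrix} I & -A^{-1} C \\ 0 & I \end{bmatrix}
\begin{bmatrix} A^{-1} & 0 \\ 0 & B'^{-1} \end{bmatrix}
\begin{bmatrix} I & 0 \\ -C^T A^{-1} & I \end{bmatrix}
$$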

SLIDE 30

Pattern Based Method

  • Preconditioner Generation

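– Apply the preconditioner in the CG algorithm, with $B' = B - C^T A^{-1} C$

$$
P \cdot r = \begin{bmatrix} A & C \\ C^T & B \end{bmatrix}^{-1} r
= \begin{bmatrix} I & -A^{-1} C \\ 0 & I \end{bmatrix}
\begin{bmatrix} A^{-1} & 0 \\ 0 & B'^{-1} \end{bmatrix}
\begin{bmatrix} I & 0 \\ -C^T A^{-1} & I \end{bmatrix}
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
$$

$$
\text{step 1:}\quad p = A^{-1} r_1, \qquad
r^{1} = \begin{bmatrix} I & 0 \\ -C^T A^{-1} & I \end{bmatrix} \begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
= \begin{bmatrix} r_1 \\ r_2 - C^T p \end{bmatrix}
$$

$$
\text{step 2:}\quad
r^{2} = \begin{bmatrix} A^{-1} & 0 \\ 0 & B'^{-1} \end{bmatrix} r^{1}
= \begin{bmatrix} p \\ B'^{-1} r^{1}_{2} \end{bmatrix}
$$

$$
\text{step 3:}\quad
r^{3} = \begin{bmatrix} I & -A^{-1} C \\ 0 & I \end{bmatrix} r^{2}
= \begin{bmatrix} r^{2}_{1} - A^{-1} C\, r^{2}_{2} \\ r^{2}_{2} \end{bmatrix}
$$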
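The three steps can be sketched in code as follows. This is an illustration under stated assumptions, not the authors' implementation: a small dense Gaussian elimination (`solve`) stands in for the pattern-based solves, and the exact Schur complement $B' = B - C^T A^{-1} C$ is used, whereas the slides aim for an approximate inverse. All function names are hypothetical.

```python
def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def solve(m, rhs):
    """Dense Gaussian elimination with partial pivoting (stand-in solver)."""
    n = len(m)
    aug = [list(row) + [b] for row, b in zip(m, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def schur_complement(a, b, c):
    """B' = B - C^T A^{-1} C, built one column of C at a time."""
    ct = transpose(c)                          # ct[j] is column j of C
    ainv_c = [solve(a, col) for col in ct]     # ainv_c[j] = A^{-1} C[:, j]
    return [[b[i][j] - sum(x * y for x, y in zip(ct[i], ainv_c[j]))
             for j in range(len(b))] for i in range(len(b))]


def apply_preconditioner(a, b_prime, c, r1, r2):
    """Return (z1, z2) = P * (r1, r2) using the three block steps."""
    ct = transpose(c)
    p = solve(a, r1)                                        # step 1: p = A^{-1} r1
    s2 = [x - y for x, y in zip(r2, matvec(ct, p))]         #         r2 - C^T p
    z2 = solve(b_prime, s2)                                 # step 2: B'^{-1} (...)
    q = solve(a, matvec(c, z2))                             # step 3: A^{-1} C z2
    z1 = [x - y for x, y in zip(p, q)]
    return z1, z2
```

With the exact $B'$ this reproduces the true inverse; replacing $B'$ or the solves with cheaper pattern-based approximations turns it into a preconditioner in the usual sense.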

SLIDE 31

Pattern Based Method

  • Preconditioner Generation

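– Inverse of diagonal block

$$
A^{-1} r = \begin{bmatrix} a^{-1} & & \\ & a^{-1} & \\ & & a^{-1} \end{bmatrix} r
$$

– Other matrix vector product operations

$$
\begin{bmatrix} A & C \\ C^T & B \end{bmatrix} \begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
= \begin{bmatrix} A r_1 + C r_2 \\ B r_2 + C^T r_1 \end{bmatrix},
\qquad
A r = \begin{bmatrix} a & & \\ & a & \\ & & a \end{bmatrix} r
$$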
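Because every diagonal sub-block is the same tridiagonal pattern $a$, applying $A^{-1}$ only needs that one stored pattern. A minimal sketch, assuming the Thomas algorithm as the tridiagonal solver (the slides do not name one) and a vector length that is a multiple of the pattern size:

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower/upper have length len(diag) - 1."""
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    if n > 1:
        c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c_prime[i - 1]
        if i < n - 1:
            c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i - 1] * d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def apply_block_diag_inverse(lower, diag, upper, r):
    """Apply A^{-1} with A = diag(a, ..., a), reusing the one stored pattern a."""
    n = len(diag)
    out = []
    for start in range(0, len(r), n):
        out.extend(thomas_solve(lower, diag, upper, r[start:start + n]))
    return out
```

Only the three short arrays describing $a$ are stored, however many rails share the pattern, which is where the memory and cache reuse claimed on the slides would come from.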

SLIDE 32

Pattern Based Method

  • Benefit of Using Pattern Structure

– Memory cost is reduced from $O(n)$ to less than $O(n^{0.5})$
– Reuse of submatrices and vectors can reduce Cache Miss dramatically
– A better preconditioner can be constructed to accelerate convergence

SLIDE 33

Pattern Based Method

  • Algorithm Flow
SLIDE 34

Pattern Based Method

  • Experimental Performance: Speed
SLIDE 35

Pattern Based Method

  • Experimental Performance: Memory
SLIDE 36

Conclusion

  • New pattern based method is presented
  • Combines the advantages of direct and iterative methods
  • Decrease the Cache Miss dramatically
  • Reduce the memory usage dramatically
  • New preconditioner is faster than traditional preconditioners
  • Fast linear PCG can be achieved even when the grid size is huge

SLIDE 37

Thank you !