Pattern Based Method For P/G Grid Analysis
Jin Shi, Yici Cai, Xianlong Hong (EDA Lab., CS Department, Tsinghua University)
Sheldon X.-D. Tan (EE Department, University of California Riverside)
2006-3-31 2
Outline
- Practical Computation Problems
- Overview of Existing Methods
- Acceleration Tech
- Some Observations
- Pattern Based Method
- Conclusion
Practical Problems #1
- Linear computational complexity does not guarantee linearly increasing run time, because of cache misses
– Example: the matrix multiply operation is $O(n^3)$, but the algorithm behaves like $O(n^5)$
– Measured: $T = N^{4.7}$
– Size 2000 took 5 days; size 12000 would take 1095 years
[Figure: log(cycles/flop) versus log(problem size)]
Practical Problems #1
- Summary of Cache Miss
– Linear computation complexity is not enough
– Optimize algorithms together with cache performance
Practical Problems #2
- Iterative efficiency
– Preconditioner’s performance decreases as the matrix size increases
[Figure: Performance of Different Preconditioners, iteration count versus $\log_2(\text{size})$]
Practical Problems #3
- Memory efficiency limitation
- When the Design is too large …
– Hard to load it from DB
– Impossible to build the matrix
– Vector malloc runs out of memory
– Too many years to get results
Our Contribution
- Alleviate Cache Miss
- Reduce the overall memory usage
- Present a more efficient preconditioner under a partition framework
- Constant preconditioner performance
Summary of Existing Methods
- Computation Complexity, at most $O(n^2)$
– LU: $O(n^2)$
– PCG: $O(n^p)$, $p$ slightly larger than 1
– M-G (multigrid): $O(n)$ with small coefficient
– R-W (random walk): $O(n)$ with large coefficient
– ADI: $O(n)$ with small coefficient
- Memory Efficiency
– LU: $O(n^2)$
– PCG: $O(n)$
– M-G: $O(n)$
– R-W: $O(n)$
– ADI: $O(n)$
- Trade off between speed and accuracy
Summary
- Direct Methods vs Iterative Methods
– Time cost of LU: $t(d(A)) + N \times \left( t(L^{-1} v_1) + t(U^{-1} v_2) \right)$, with $t(L^{-1} v_1) \propto (1 + f) \cdot nnz(A)$, where $f$ is the fill in ratio
– Time cost of PCG: $t(d(A)) + N \times \left( n \cdot t(A v) \right)$, with $t(A v) \propto nnz(A)$
– Which is faster depends on the fill in ratio and the performance of the preconditioner
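The two cost expressions can be compared with a rough numeric sketch. Everything concrete here is an assumption made for illustration: the function names, a unit cost of one operation per nonzero, treating $t(U^{-1} v)$ like $t(L^{-1} v)$, reading $n$ as the PCG iteration count, and ignoring the cost of applying the preconditioner. The setup time $t(d(A))$ is passed in rather than modelled.

```python
def lu_cost(nnz, fill_in_ratio, num_solves, setup=0.0):
    """t(d(A)) + N * (t(L^-1 v) + t(U^-1 v)), each solve taken as (1 + f) * nnz(A)."""
    per_triangular_solve = (1.0 + fill_in_ratio) * nnz
    return setup + num_solves * 2.0 * per_triangular_solve


def pcg_cost(nnz, iterations, num_solves, setup=0.0):
    """t(d(A)) + N * (n * t(A v)), with t(A v) taken as nnz(A)."""
    return setup + num_solves * iterations * nnz


# Hypothetical matrix with one million nonzeros, one right-hand side.
nnz = 1_000_000
lu = lu_cost(nnz, fill_in_ratio=30, num_solves=1)
pcg = pcg_cost(nnz, iterations=10, num_solves=1)
print(lu / pcg)
```

With a fill in ratio of 30 and 10 iterations the per-solve work favours PCG in this model; a large setup cost or a poorly performing preconditioner shifts the balance.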
Acceleration Tech
- Model Order Reduction
– S domain based: before 2000
– Electrical equivalent circuit based: 2002
– Topological partition based: 2004
- Among these methods, the topological partition method is the most powerful for simulating large P/G grids
Partition Benefits
- For direct method
– Decrease decomposition time
– Decrease fill in ratio
– Decrease Cache Miss
- For iterative method
– Decrease preconditioner construction time
– Increase preconditioner performance
– Decrease iteration time
– Decrease Cache Miss
- With $x$ partitions: $(n/x)^2 \cdot x < n^2$
Observation #1
- Many elements share the same value
– Extract R, L, C with a BEM solver
– Almost all elements are extracted from M1 and M2
– More than 80% of the elements in M1 and M2 have the same value
Observation #2
- P/G grid topology is self-similar
– One layer contains many metal rails routed in the same direction
– Metal rails share the same width and pitch
- Possible to transform topology similarity to matrix similarity?
– Yes
- How about an irregular P/G grid?
– Do local regularization so that elements in a local area share the same value
- Continuous in topology domain
– Local regularization will not introduce obvious computation errors
Observation #3
- When the matrix size is medium, the PCG method is faster than any other method
– PCG can converge within 10 iterations
– LU usually has a fill in factor larger than 30
– R-W is inefficient for small cases
– M-G needs auxiliary computation structures and time to construct them
- Traditional preconditioner can be improved
– Use topology similarity property
– Use element value similarity property
Pattern Based Method
- Regular P/G Grid in 3D
Pattern Based Method
- Self Similarity: Global Similarity (Global Pattern)
Pattern Based Method
- Self Similarity: Local Similarity (Local Pattern)
Pattern Based Method
- Similarity Summary
– Patterns are elements similar to each other
– Patterns exist not only in the global area but also in local areas
- Strategy
– Partition the global area into blocks and perform relaxation iteration between global blocks
– Reuse local patterns to perform local simulation
Pattern Based Method
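- Local Matrix Generation
– Element fill in under NA (nodal analysis) method
– Node order is important
$$
\begin{bmatrix}
\ddots & \vdots & & \vdots & \\
\cdots & g & \cdots & -g & \cdots \\
 & \vdots & \ddots & \vdots & \\
\cdots & -g & \cdots & g & \cdots \\
 & \vdots & & \vdots & \ddots
\end{bmatrix}
$$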
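A minimal sketch of element fill in under nodal analysis, and of why node order matters. The helper names (`stamp_conductance`, `rail_matrix`, `bandwidth`) and the five-node rail are hypothetical, chosen only for illustration; the rail has no connection to ground, so the matrix is singular and shows structure only.

```python
def stamp_conductance(G, i, j, g):
    """Stamp a conductance g between nodes i and j into the matrix G."""
    G[i][i] += g
    G[j][j] += g
    G[i][j] -= g
    G[j][i] -= g


def rail_matrix(order, g):
    """Matrix of one rail; order[k] is the matrix index given to the k-th node along the rail."""
    n = len(order)
    G = [[0.0] * n for _ in range(n)]
    for k in range(n - 1):
        stamp_conductance(G, order[k], order[k + 1], g)
    return G


def bandwidth(G):
    """Largest |i - j| over the nonzero entries of G."""
    return max(abs(i - j)
               for i, row in enumerate(G)
               for j, value in enumerate(row)
               if value != 0.0)


# Numbering along the routing direction gives a tridiagonal matrix.
natural = rail_matrix([0, 1, 2, 3, 4], g=2.0)
# An arbitrary numbering of the same rail spreads the entries out.
scrambled = rail_matrix([0, 3, 1, 4, 2], g=2.0)
print(bandwidth(natural), bandwidth(scrambled))  # 1 3
```

Both matrices describe the same rail; only the numbering differs, and only the first one exposes the tridiagonal pattern.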
Pattern Based Method
- Local Matrix Generation
– Ordering node number according to the routing direction of each layer
Pattern Based Method
- Diagonal Block
– Transform topology similarity to matrix similarity
– Only one tridiagonal matrix needs to be stored
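The storage saving can be illustrated with a small sketch, assuming the diagonal block of a layer has the form diag(a, a, ..., a) with one tridiagonal pattern `a` per rail. The function names and the numeric pattern below are hypothetical.

```python
def tridiag_matvec(sub, diag, sup, x):
    """y = a x for one tridiagonal pattern a stored as three short lists."""
    m = len(diag)
    y = [diag[k] * x[k] for k in range(m)]
    for k in range(m - 1):
        y[k] += sup[k] * x[k + 1]      # entry (k, k+1)
        y[k + 1] += sub[k] * x[k]      # entry (k+1, k)
    return y


def layer_matvec(pattern, num_rails, x):
    """y = A x with A = diag(a, ..., a); the same stored pattern is reused for every rail."""
    sub, diag, sup = pattern
    m = len(diag)
    y = []
    for r in range(num_rails):
        y.extend(tridiag_matvec(sub, diag, sup, x[r * m:(r + 1) * m]))
    return y


# One rail of four nodes; two identical rails in the layer.
pattern = ([-1.0, -1.0, -1.0], [1.0, 2.0, 2.0, 1.0], [-1.0, -1.0, -1.0])
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(layer_matvec(pattern, 2, x))
```

Only the 3m - 2 numbers of one pattern are kept, regardless of the number of rails, and the same short arrays are reused for every rail.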
Pattern Based Method
- Via Block
– All the vias between two adjacent layers share the same value
– Fill in of via elements is regular: size, pitch
– No need to store the submatrices caused by vias
Pattern Based Method
- Drawback of traditional preconditioner
– Performance of ILU or incomplete Cholesky decomposition is limited by memory usage
– More fill in gives faster convergence

Drop threshold   Fill in ratio   Avg iteration times
1e-1             1.5             221
1e-2             2.79            93
1e-3             10.96           44
1e-4             48.41           17
(No ordering, residual = 1e-6)
Pattern Based Method
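- Preconditioner Generation: Two Metal Layer Case
– Matrix: $\begin{bmatrix} A & C \\ C^T & B \end{bmatrix}$, where $A$ and $B$ are the diagonal blocks of the two layers and $C = [c_{ij}]$ is the via block, with $c_{ij} = g_{via}$ at via positions (selected by a $\mathrm{mod}(\cdot,\cdot)$ condition on $i$, $j$) and $0$ otherwise
- Schur-alike Decomposition to get approximate inverse
$$
\begin{bmatrix} A & C \\ C^T & B \end{bmatrix}
= \begin{bmatrix} I & 0 \\ C^T A^{-1} & I \end{bmatrix}
\begin{bmatrix} A & 0 \\ 0 & B' \end{bmatrix}
\begin{bmatrix} I & A^{-1} C \\ 0 & I \end{bmatrix},
\qquad B' = B - C^T A^{-1} C
$$
$$
\begin{bmatrix} A & C \\ C^T & B \end{bmatrix}^{-1}
= \begin{bmatrix} I & -A^{-1} C \\ 0 & I \end{bmatrix}
\begin{bmatrix} A^{-1} & 0 \\ 0 & B'^{-1} \end{bmatrix}
\begin{bmatrix} I & 0 \\ -C^T A^{-1} & I \end{bmatrix}
$$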
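As a check, multiplying the three factors of the decomposition back together recovers the original blocks. The intermediate product is a worked step added for illustration; it uses the two-layer blocks $A$, $B$, $C$ and $B' = B - C^T A^{-1} C$.

```latex
\begin{aligned}
\begin{bmatrix} I & 0 \\ C^T A^{-1} & I \end{bmatrix}
\begin{bmatrix} A & 0 \\ 0 & B' \end{bmatrix}
\begin{bmatrix} I & A^{-1} C \\ 0 & I \end{bmatrix}
&= \begin{bmatrix} A & 0 \\ C^T & B' \end{bmatrix}
   \begin{bmatrix} I & A^{-1} C \\ 0 & I \end{bmatrix} \\
&= \begin{bmatrix} A & C \\ C^T & C^T A^{-1} C + B' \end{bmatrix}
 = \begin{bmatrix} A & C \\ C^T & B \end{bmatrix}
\end{aligned}
```

Each triangular factor is inverted by flipping the sign of its off-diagonal block, which is why the inverse needs only $A^{-1}$ and $B'^{-1}$.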
Pattern Based Method
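- Preconditioner Generation
– Perform Precondition in CG Algorithm
$$
P \cdot r = \begin{bmatrix} A & C \\ C^T & B \end{bmatrix}^{-1} \begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
= \begin{bmatrix} I & -A^{-1} C \\ 0 & I \end{bmatrix}
\begin{bmatrix} A^{-1} & 0 \\ 0 & B'^{-1} \end{bmatrix}
\begin{bmatrix} I & 0 \\ -C^T A^{-1} & I \end{bmatrix}
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
$$
Step 1: $p = A^{-1} r_1$,
$$
\begin{bmatrix} r_1' \\ r_2' \end{bmatrix}
= \begin{bmatrix} I & 0 \\ -C^T A^{-1} & I \end{bmatrix} \begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
= \begin{bmatrix} r_1 \\ -C^T A^{-1} r_1 + r_2 \end{bmatrix}
= \begin{bmatrix} r_1 \\ -C^T p + r_2 \end{bmatrix}
$$
Step 2:
$$
\begin{bmatrix} r_1'' \\ r_2'' \end{bmatrix}
= \begin{bmatrix} A^{-1} & 0 \\ 0 & B'^{-1} \end{bmatrix} \begin{bmatrix} r_1' \\ r_2' \end{bmatrix}
= \begin{bmatrix} A^{-1} r_1' \\ B'^{-1} r_2' \end{bmatrix}
= \begin{bmatrix} p \\ B'^{-1} r_2' \end{bmatrix}
$$
Step 3:
$$
\begin{bmatrix} I & -A^{-1} C \\ 0 & I \end{bmatrix} \begin{bmatrix} r_1'' \\ r_2'' \end{bmatrix}
= \begin{bmatrix} r_1'' - A^{-1} C r_2'' \\ r_2'' \end{bmatrix}
$$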
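The three steps can be sketched on small dense matrices, assuming the two-layer matrix [[A, C], [C^T, B]] and B' = B - C^T A^{-1} C. All helper names are hypothetical, and the dense Gaussian elimination stands in for whatever block solves are actually used. B' is formed exactly here so the result can be checked; the slides describe an approximate inverse.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting (small dense M)."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def schur_complement(A, B, C):
    """B' = B - C^T A^{-1} C, built one column at a time."""
    Ct = transpose(C)
    # Row j of Ct is column c_j of C; C^T (A^{-1} c_j) is column j of C^T A^{-1} C.
    cols = [matvec(Ct, solve(A, c_j)) for c_j in Ct]
    n = len(B)
    return [[B[i][j] - cols[j][i] for j in range(n)] for i in range(n)]


def apply_preconditioner(A, Bp, C, r1, r2):
    """Return (z1, z2) = P r for P = [[A, C], [C^T, B]]^{-1}, given Bp = B'."""
    Ct = transpose(C)
    # Step 1: p = A^{-1} r1, r2' = r2 - C^T p
    p = solve(A, r1)
    r2p = [a - b for a, b in zip(r2, matvec(Ct, p))]
    # Step 2: block diagonal scaling; A^{-1} r1 is already available as p
    z2 = solve(Bp, r2p)
    # Step 3: z1 = p - A^{-1} C z2
    z1 = [a - b for a, b in zip(p, solve(A, matvec(C, z2)))]
    return z1, z2
```

With an exact B' the function returns the exact solution; replacing `solve(Bp, ...)` with a cheaper approximate solve turns it into a preconditioner in the sense of the slides.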
Pattern Based Method
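- Preconditioner Generation
– Inverse of diagonal block
$$
A^{-1} r = \begin{bmatrix} a^{-1} & & \\ & \ddots & \\ & & a^{-1} \end{bmatrix} r
$$
– Other matrix vector dot operation
$$
\begin{bmatrix} A & C \\ C^T & B \end{bmatrix} \begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
= \begin{bmatrix} A r_1 + C r_2 \\ B r_2 + C^T r_1 \end{bmatrix},
\qquad
A r = \begin{bmatrix} a & & \\ & \ddots & \\ & & a \end{bmatrix} r
$$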
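A minimal sketch of applying the inverse of the diagonal block, assuming A = diag(a, ..., a) with one tridiagonal pattern a. Instead of forming a^{-1} explicitly, this sketch solves with the stored pattern using the Thomas algorithm, which is a substitution made here for illustration. Function names and numbers are hypothetical.

```python
def thomas_solve(sub, diag, sup, rhs):
    """Solve a x = rhs for a tridiagonal a (Thomas algorithm, no pivoting)."""
    m = len(diag)
    c = [0.0] * m   # modified super-diagonal
    d = [0.0] * m   # modified right-hand side
    if m > 1:
        c[0] = sup[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for k in range(1, m):
        denom = diag[k] - sub[k - 1] * c[k - 1]
        if k < m - 1:
            c[k] = sup[k] / denom
        d[k] = (rhs[k] - sub[k - 1] * d[k - 1]) / denom
    x = [0.0] * m
    x[-1] = d[-1]
    for k in range(m - 2, -1, -1):
        x[k] = d[k] - c[k] * x[k + 1]
    return x


def block_diagonal_solve(pattern, num_blocks, r):
    """x = A^{-1} r with A = diag(a, ..., a); only the single pattern a is stored."""
    sub, diag, sup = pattern
    m = len(diag)
    x = []
    for b in range(num_blocks):
        x.extend(thomas_solve(sub, diag, sup, r[b * m:(b + 1) * m]))
    return x


# Diagonally dominant pattern, so elimination without pivoting is safe.
pattern = ([-1.0, -1.0], [3.0, 3.0, 3.0], [-1.0, -1.0])
r = [1.0, 2.0, 7.0, 7.0, 5.0, 13.0]
print(block_diagonal_solve(pattern, 2, r))
```

Each block solve costs time linear in the block size and touches the same three short arrays, which is the reuse the pattern structure is meant to provide.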
Pattern Based Method
- Benefit of Using Pattern Structure
– Memory cost is reduced from $O(n)$ to less than $O(n^{0.5})$
– Reuse of submatrices and vectors can reduce Cache Miss dramatically
– A better preconditioner can be constructed to accelerate convergence
Pattern Based Method
- Algorithm Flow
Pattern Based Method
- Experimental Performance: Speed
Pattern Based Method
- Experimental Performance: Memory
Conclusion
- New pattern based method is presented
- Combine advantages of direct method and iterative method
- Decrease the Cache Miss dramatically
- Reduce the memory usage dramatically
- New preconditioner is faster than traditional preconditioner
- Fast linear PCG can be achieved even when the grid size is huge