Pattern Based Method for P/G Grid Analysis
Jin Shi, Yici Cai, Xianlong Hong: EDA Lab, CS Department, Tsinghua University
Sheldon X.-D. Tan: EE Department, University of California, Riverside
2006-3-31
Outline • Practical Computation Problems • Overview of Existing Methods • Acceleration Techniques • Some Observations • Pattern Based Method • Conclusion
Practical Problems #1 • Linear computational complexity does not imply linearly growing run time, because of cache misses [Figure: log(cycles/flop) vs. log(problem size) for a matrix multiply, fitted by $T = N^{4.7}$: size 2000 took 5 days, size 12000 would take 1095 years, so an $O(n^3)$ algorithm behaves like $O(n^5)$]
Practical Problems #1 • Summary of cache misses – Linear computational complexity alone is not enough – Algorithms must be optimized together with their cache performance
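Loop tiling (blocking) is one standard way to co-design an algorithm with the cache. The slides do not say which cache optimizations the authors use, so the sketch below is a generic illustration; matmul_naive, matmul_tiled and the default tile size of 32 are assumptions. Pure Python will not show the speed-up because interpreter overhead dominates; the point is the loop structure, which in a compiled language keeps one tile of each matrix resident in cache while it is reused.

```python
def matmul_naive(A, B):
    """Textbook i-k-j triple loop: returns A @ B for lists of lists."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]
            for j in range(p):
                C[i][j] += a * B[k][j]
    return C


def matmul_tiled(A, B, tile=32):
    """Same product computed tile by tile (loop blocking), so that a tile of
    A, B and C can stay in cache while it is being reused."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                for i in range(ii, min(ii + tile, n)):
                    Ci, Ai = C[i], A[i]
                    for k in range(kk, min(kk + tile, m)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + tile, p)):
                            Ci[j] += a * Bk[j]
    return C


if __name__ == "__main__":
    A = [[float(i + j) for j in range(50)] for i in range(40)]
    B = [[float(i - j) for j in range(30)] for i in range(50)]
    # Tiling only reorders the loops; the result is unchanged.
    assert matmul_tiled(A, B, tile=8) == matmul_naive(A, B)
```

The tile size would normally be tuned so that three tiles fit in the target cache level.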
Practical Problems #2 • Iterative efficiency – A preconditioner's performance degrades as the matrix size increases [Figure: Performance of different preconditioners: iteration count vs. log(size)]
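The growth in iteration count is easy to reproduce on a toy problem. The sketch below is a stand-in, not the authors' experiment: it runs Jacobi-preconditioned CG on the 5-point Laplacian of a k-by-k resistor grid (unit conductances, boundary nodes tied to ground) with the 1e-6 relative residual also used later in the slides. grid_matvec, pcg, jacobi and iterations_for_grid are hypothetical names.

```python
def grid_matvec(k, x):
    """y = A x for the 5-point Laplacian on a k-by-k grid (unit conductances,
    boundary nodes tied to ground through the missing neighbours)."""
    y = [0.0] * (k * k)
    for r in range(k):
        for c in range(k):
            i = r * k + c
            s = 4.0 * x[i]
            if r > 0:
                s -= x[i - k]
            if r < k - 1:
                s -= x[i + k]
            if c > 0:
                s -= x[i - 1]
            if c < k - 1:
                s -= x[i + 1]
            y[i] = s
    return y


def pcg(matvec, b, precond, tol=1e-6, max_iter=10000):
    """Preconditioned conjugate gradient; returns (x, iterations)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)
    z = precond(r)
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    b_norm = sum(bi * bi for bi in b) ** 0.5
    for it in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 <= tol * b_norm:
            return x, it
        z = precond(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def jacobi(r):
    """Diagonal (Jacobi) preconditioner: every diagonal entry here is 4."""
    return [ri / 4.0 for ri in r]


def iterations_for_grid(k, tol=1e-6):
    """PCG iterations to reach ||r|| <= tol * ||b|| on a k-by-k grid, b = 1."""
    _, iters = pcg(lambda v: grid_matvec(k, v), [1.0] * (k * k), jacobi, tol)
    return iters


if __name__ == "__main__":
    for k in (8, 16, 32):
        print(f"n = {k * k:5d}: {iterations_for_grid(k)} iterations")
```

On this model problem the standard CG estimate predicts the count to grow roughly like k, i.e. with the square root of the matrix size, which is the trend the figure describes.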
Practical Problems #3 • Memory efficiency limitations • When the design is too large … – It is hard to load from the database – It is impossible to build the matrix – Vector malloc runs out of memory – Results would take years to obtain
Our Contributions • Alleviate cache misses • Reduce overall memory usage • Present a more efficient preconditioner under a partition framework • Keep preconditioner performance constant as the matrix size grows
Summary of Existing Methods • Computational complexity (all at most $O(n^2)$) – LU: $O(n^2)$ – PCG: $O(n^p)$, with $p$ slightly larger than 1 – M-G (multigrid): $O(n)$ with a small coefficient – R-W (random walk): $O(n)$ with a large coefficient – ADI (alternating direction implicit): $O(n)$ with a small coefficient • Memory efficiency – LU: $O(n^2)$ – PCG: $O(n)$ – M-G: $O(n)$ – R-W: $O(n)$ – ADI: $O(n)$ • Trade-off between speed and accuracy
Summary • Direct methods vs. iterative methods – Time cost of LU: $t(d(A)) + N \times \big(t(L^{-1}v) + t(U^{-1}v)\big)$, with $t(L^{-1}v) \propto \tfrac{1}{2}(1+f)\,\mathrm{nnz}(A)$, where $f$ is the fill-in ratio – Time cost of PCG: $t(d(A)) + N \times \big(n \cdot t(Av)\big)$, with $t(Av) \propto \mathrm{nnz}(A)$ – Which method is faster depends on the fill-in ratio and on the performance of the preconditioner
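The fill-in ratio drives the LU cost above, and a structural (symbolic) elimination is enough to measure it for a given node order. The sketch below reports nnz(L+U)/nnz(A) with the diagonal counted once; how that maps onto the slide's f is an assumption. grid_graph and lu_fill_ratio are hypothetical helpers, and the grid is a toy k-by-k resistor mesh numbered row by row.

```python
def grid_graph(k):
    """Adjacency sets of a k-by-k resistor grid, nodes numbered row by row."""
    adj = {i: set() for i in range(k * k)}
    for r in range(k):
        for c in range(k):
            i = r * k + c
            if c + 1 < k:
                adj[i].add(i + 1)
                adj[i + 1].add(i)
            if r + 1 < k:
                adj[i].add(i + k)
                adj[i + k].add(i)
    return adj


def lu_fill_ratio(adj):
    """Symbolic Gaussian elimination in node order 0, 1, ..., n-1.

    Eliminating node i connects all of its not-yet-eliminated neighbours
    pairwise; those new edges are the fill-in. Returns nnz(L+U) / nnz(A),
    counting the diagonal once.
    """
    adj = {i: set(nbrs) for i, nbrs in adj.items()}  # work on a copy
    n = len(adj)
    nnz_a = n + sum(len(nbrs) for nbrs in adj.values())
    nnz_lu = n
    for i in range(n):
        later = {j for j in adj[i] if j > i}
        nnz_lu += 2 * len(later)          # row i of U and column i of L
        for j in later:
            adj[j] |= later - {j}         # fill edges among later neighbours
    return nnz_lu / nnz_a


if __name__ == "__main__":
    for k in (4, 8, 16):
        print(k * k, round(lu_fill_ratio(grid_graph(k)), 2))
```

A chain has no fill at all (ratio 1.0), while the row-by-row grid accumulates fill inside its band, so the ratio grows with k.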
Acceleration Techniques • Model order reduction – s-domain based (before 2000) – Electrical equivalent circuit based (2002) – Topological partition based (2004) • Among these methods, topological partitioning is the most powerful for simulating large P/G grids
Partition Benefits • For direct methods – Decrease decomposition time: $n^2 \cdot x < (n \cdot x)^2$ – Decrease the fill-in ratio – Decrease cache misses • For iterative methods – Decrease preconditioner construction time – Increase preconditioner performance – Decrease iteration time – Decrease cache misses
Observation #1 • Many elements share the same value – R, L and C are extracted with a BEM solver – Almost all elements are extracted from M1 and M2 – More than 80% of the elements in M1 and M2 have the same value
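Because so many extracted values coincide, a natural storage trick is to keep each distinct value once and let every element refer to it by index. The sketch below is a generic illustration of that idea, not the authors' data structure; compress_values and most_common_share are hypothetical names and the conductances in the usage example are made up.

```python
from collections import Counter


def compress_values(values):
    """Keep each distinct element value once; elements store a small index."""
    table = sorted(set(values))
    position = {v: idx for idx, v in enumerate(table)}
    indices = [position[v] for v in values]
    return table, indices


def most_common_share(values):
    """Fraction of elements that carry the single most frequent value."""
    _, count = Counter(values).most_common(1)[0]
    return count / len(values)


if __name__ == "__main__":
    # Made-up conductances for one layer: most segments are identical.
    g = [0.5] * 85 + [0.25] * 10 + [1.0] * 5
    table, indices = compress_values(g)
    print(table, most_common_share(g))   # [0.25, 0.5, 1.0] 0.85
```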
Observation #2 • The P/G grid topology is self-similar – Each layer contains many metal wires routed in the same direction – Metal rails share the same width and pitch • Is it possible to transform topology similarity into matrix similarity? – Yes • What about irregular P/G grids? – Apply local regularization so that elements in a local area share the same value • The grid stays continuous in the topology domain – Local regularization does not introduce noticeable computation errors
Observation #3 • For medium-sized matrices, PCG is faster than any other method – PCG can converge within 10 iterations – LU usually has a fill-in factor larger than 30 – R-W is inefficient for small cases – M-G needs auxiliary computation structures and time to construct them • The traditional preconditioner can be improved – Use the topology similarity property – Use the element value similarity property
Pattern Based Method • Regular P/G Grid in 3D
Pattern Based Method • Self-similarity: global similarity (global pattern)
Pattern Based Method • Self-similarity: local similarity (local pattern)
Pattern Based Method • Similarity summary – Patterns are elements that are similar to each other – Patterns exist not only globally but also locally • Strategy – Partition the global area into blocks and perform relaxation iterations between the global blocks – Reuse local patterns to perform local simulation
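The strategy leaves the relaxation scheme open. Block Gauss-Seidel is one common choice and is used below purely as an assumption: each block is solved exactly with the latest values of the other blocks, and the sweep repeats until the residual is small. solve_dense stands in for whatever local (pattern-reusing) solver is applied per block; chain_matrix builds a made-up resistor chain with a supply pad at each end. All names are hypothetical.

```python
def solve_dense(M, b):
    """Gaussian elimination with partial pivoting (small blocks only)."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= f * aug[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def block_gauss_seidel(M, b, blocks, tol=1e-10, max_sweeps=500):
    """Relaxation between blocks: solve each block exactly, using the latest
    values of all other blocks for the coupling terms. Returns (x, sweeps)."""
    n = len(b)
    x = [0.0] * n
    for sweep in range(1, max_sweeps + 1):
        for blk in blocks:
            inside = set(blk)
            rhs = [b[i] - sum(M[i][j] * x[j] for j in range(n) if j not in inside)
                   for i in blk]
            sub = [[M[i][j] for j in blk] for i in blk]
            for i, xi in zip(blk, solve_dense(sub, rhs)):
                x[i] = xi
        residual = max(abs(sum(M[i][j] * x[j] for j in range(n)) - b[i])
                       for i in range(n))
        if residual < tol:
            return x, sweep
    return x, max_sweeps


def chain_matrix(n, g=1.0, g_pad=1.0):
    """Nodal matrix of a resistor chain with a supply pad at both ends."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        M[i][i] += g
        M[i + 1][i + 1] += g
        M[i][i + 1] -= g
        M[i + 1][i] -= g
    M[0][0] += g_pad
    M[n - 1][n - 1] += g_pad
    return M


if __name__ == "__main__":
    M, b = chain_matrix(8), [-0.01] * 8      # every node draws the same current
    x, sweeps = block_gauss_seidel(M, b, [list(range(4)), list(range(4, 8))])
    print(sweeps, x)
```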
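Pattern Based Method • Local matrix generation – Element fill-in under the NA (nodal analysis) method – Node order is important
$$\begin{bmatrix} \ddots & \cdots & \cdots & \cdots & \cdots \\ \cdots & g & \cdots & -g & \cdots \\ \cdots & \cdots & \ddots & \cdots & \cdots \\ \cdots & -g & \cdots & g & \cdots \\ \cdots & \cdots & \cdots & \cdots & \ddots \end{bmatrix}$$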
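The stamp in the matrix above takes only a few lines of code. The sketch (hypothetical helpers stamp_conductance, assemble and bandwidth) assembles a nodal matrix from branch conductances and shows why node order matters: the same six-node rail has bandwidth 1 when numbered along the rail and bandwidth 3 under an interleaved numbering, and a wider band leaves more room for fill-in.

```python
def stamp_conductance(G, i, j, g):
    """Nodal-analysis stamp of a conductance g between nodes i and j:
    +g on both diagonal entries, -g on the two off-diagonal entries."""
    G[i][i] += g
    G[j][j] += g
    G[i][j] -= g
    G[j][i] -= g


def assemble(n, branches):
    """Build the n-by-n nodal matrix from (i, j, g) branch triples."""
    G = [[0.0] * n for _ in range(n)]
    for i, j, g in branches:
        stamp_conductance(G, i, j, g)
    return G


def bandwidth(G):
    """Largest |i - j| over nonzero entries: how far fill-in can spread."""
    n = len(G)
    return max(abs(i - j) for i in range(n) for j in range(n) if G[i][j] != 0.0)


if __name__ == "__main__":
    rail = [(p, p + 1, 1.0) for p in range(5)]          # 6 nodes along a wire
    perm = [0, 2, 4, 1, 3, 5]                           # an interleaved order
    print(bandwidth(assemble(6, rail)))
    print(bandwidth(assemble(6, [(perm[i], perm[j], g) for i, j, g in rail])))
```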
Pattern Based Method • Local matrix generation – Number the nodes according to the routing direction of each layer
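One way to realize this ordering, assuming a rectangular rows-by-cols array of nodes per layer: number a layer with horizontal rails row by row and a layer with vertical rails column by column, so that every rail occupies consecutive indices and each wire segment couples indices i and i+1. node_id and rail_segments are hypothetical helpers.

```python
def node_id(layer_offset, r, c, rows, cols, horizontal):
    """Global index of node (r, c), numbered along the layer's routing direction."""
    if horizontal:
        return layer_offset + r * cols + c   # rails are rows
    return layer_offset + c * rows + r       # rails are columns


def rail_segments(layer_offset, rows, cols, horizontal):
    """(id_a, id_b) pairs for every wire segment of the layer's rails."""
    segs = []
    for r in range(rows):
        for c in range(cols):
            if horizontal and c + 1 < cols:
                segs.append((node_id(layer_offset, r, c, rows, cols, True),
                             node_id(layer_offset, r, c + 1, rows, cols, True)))
            if not horizontal and r + 1 < rows:
                segs.append((node_id(layer_offset, r, c, rows, cols, False),
                             node_id(layer_offset, r + 1, c, rows, cols, False)))
    return segs


if __name__ == "__main__":
    m1 = rail_segments(0, 3, 4, horizontal=True)     # M1: horizontal rails
    m2 = rail_segments(12, 3, 4, horizontal=False)   # M2: vertical rails
    print(m1)
    print(m2)
```

Each rail then contributes a tridiagonal diagonal block, which is the structure the following slides exploit.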
Pattern Based Method • Diagonal block – Topology similarity is transformed into matrix similarity – Only one tridiagonal matrix needs to be stored
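Because every rail in a layer has the same width, pitch and via pattern, its tridiagonal block is identical, so the product A r can be computed from one stored copy. The sketch assumes (hypothetically) a uniform segment conductance g and a via of conductance g_via at every node; rail_block and apply_block_diag are illustrative names.

```python
def rail_block(k, g, g_via):
    """Tridiagonal NA block of one rail with k >= 2 nodes: segment conductance
    g between neighbours, plus (assumed) a via of conductance g_via per node."""
    diag = [2.0 * g + g_via] * k
    diag[0] -= g                 # end nodes have only one neighbouring segment
    diag[-1] -= g
    off = [-g] * (k - 1)         # symmetric: sub- and super-diagonal are equal
    return diag, off


def apply_block_diag(diag, off, r):
    """y = A r with A = diag(a, ..., a); a is the single stored tridiagonal
    (diag, off) and each rail occupies k consecutive entries of r."""
    k = len(diag)
    y = [0.0] * len(r)
    for s in range(0, len(r), k):
        for i in range(k):
            v = diag[i] * r[s + i]
            if i > 0:
                v += off[i - 1] * r[s + i - 1]
            if i < k - 1:
                v += off[i] * r[s + i + 1]
            y[s + i] = v
    return y


if __name__ == "__main__":
    diag, off = rail_block(4, 2.0, 0.5)
    print(apply_block_diag(diag, off, [float(i) for i in range(12)]))  # 3 rails
```

Only 2k - 1 numbers are stored, however many rails the layer has.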
Pattern Based Method • Via block – All vias between two adjacent layers share the same value – The fill-in pattern of via elements is regular, set by the via size and pitch – The submatrices caused by vias need not be stored
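Since every via between two layers has the same conductance and sits at a regular position, the coupling block C can be applied without storing it. The sketch assumes two layers of rows-by-cols nodes, the lower one numbered row by row (horizontal rails) and the upper one column by column (vertical rails), with a via at every pitch-th crossing; the pitch parameter, the NA sign convention (-g_via off the diagonal) and the function names are assumptions.

```python
def via_pairs(rows, cols, pitch=1):
    """Yield (i, j): i indexes the lower-layer block (row-major numbering),
    j the upper-layer block (column-major numbering), one pair per via."""
    for r in range(0, rows, pitch):
        for c in range(0, cols, pitch):
            yield r * cols + c, c * rows + r


def apply_C(x2, rows, cols, g_via, pitch=1):
    """y1 = C x2 without storing C: every nonzero is -g_via at a regular spot."""
    y1 = [0.0] * (rows * cols)
    for i, j in via_pairs(rows, cols, pitch):
        y1[i] -= g_via * x2[j]
    return y1


def apply_CT(x1, rows, cols, g_via, pitch=1):
    """y2 = C^T x1, with the via positions again generated on the fly."""
    y2 = [0.0] * (rows * cols)
    for i, j in via_pairs(rows, cols, pitch):
        y2[j] -= g_via * x1[i]
    return y2


if __name__ == "__main__":
    x2 = [float(2 * i + 1) for i in range(12)]
    print(apply_C(x2, 3, 4, 2.0))
    print(len(list(via_pairs(3, 4, pitch=2))))
```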
Pattern Based Method • Drawback of traditional preconditioners – The performance of ILU or incomplete Cholesky decomposition is limited by memory usage – More fill-in gives faster convergence

No ordering, residual = 1e-6:

| Drop threshold | Fill-in ratio | Avg. iterations |
|---|---|---|
| 1e-1 | 1.5 | 221 |
| 1e-2 | 2.79 | 93 |
| 1e-3 | 10.96 | 44 |
| 1e-4 | 48.41 | 17 |
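Pattern Based Method • Preconditioner generation: two-metal-layer case
$$\begin{bmatrix} A & C \\ C^T & B \end{bmatrix}, \quad A = \begin{bmatrix} a & & \\ & \ddots & \\ & & a \end{bmatrix}, \quad B = \begin{bmatrix} b & & \\ & \ddots & \\ & & b \end{bmatrix}, \quad C = [c_{ij}], \quad c_{ij} = \begin{cases} g_{via}, & \text{at via positions (a regular pattern set by } \alpha, \beta) \\ 0, & \text{otherwise} \end{cases}$$
• Schur-like decomposition to obtain an approximate inverse
$$\begin{bmatrix} A & C \\ C^T & B \end{bmatrix} = \begin{bmatrix} I & 0 \\ C^T A^{-1} & I \end{bmatrix} \begin{bmatrix} A & 0 \\ 0 & B' \end{bmatrix} \begin{bmatrix} I & A^{-1} C \\ 0 & I \end{bmatrix}, \qquad B' = B - C^T A^{-1} C$$
$$\begin{bmatrix} A & C \\ C^T & B \end{bmatrix}^{-1} = \begin{bmatrix} I & -A^{-1} C \\ 0 & I \end{bmatrix} \begin{bmatrix} A^{-1} & 0 \\ 0 & B'^{-1} \end{bmatrix} \begin{bmatrix} I & 0 \\ -C^T A^{-1} & I \end{bmatrix}$$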
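A tiny worked instance checks both identities. The numbers are made up and every block is a scalar: $A = 4$, $C = 1$, $B = 3$.

$$B' = B - C^T A^{-1} C = 3 - \tfrac{1}{4} = \tfrac{11}{4}$$
$$\begin{bmatrix} 1 & 0 \\ \tfrac{1}{4} & 1 \end{bmatrix} \begin{bmatrix} 4 & 0 \\ 0 & \tfrac{11}{4} \end{bmatrix} \begin{bmatrix} 1 & \tfrac{1}{4} \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 4 & 1 \\ 1 & 3 \end{bmatrix}$$
$$\begin{bmatrix} 1 & -\tfrac{1}{4} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \tfrac{1}{4} & 0 \\ 0 & \tfrac{4}{11} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -\tfrac{1}{4} & 1 \end{bmatrix} = \frac{1}{11} \begin{bmatrix} 3 & -1 \\ -1 & 4 \end{bmatrix} = \begin{bmatrix} 4 & 1 \\ 1 & 3 \end{bmatrix}^{-1}$$

The last equality follows from the determinant $4 \cdot 3 - 1 \cdot 1 = 11$.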
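Pattern Based Method • Preconditioner generation – Apply the preconditioner inside the CG algorithm
$$P r = \begin{bmatrix} A & C \\ C^T & B \end{bmatrix}^{-1} r = \begin{bmatrix} I & -A^{-1} C \\ 0 & I \end{bmatrix} \begin{bmatrix} A^{-1} & 0 \\ 0 & B'^{-1} \end{bmatrix} \begin{bmatrix} I & 0 \\ -C^T A^{-1} & I \end{bmatrix} \begin{bmatrix} r_1 \\ r_2 \end{bmatrix}$$
Step 1: $r^1 = \begin{bmatrix} I & 0 \\ -C^T A^{-1} & I \end{bmatrix} \begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = \begin{bmatrix} r_1 \\ r_2 - C^T A^{-1} r_1 \end{bmatrix} = \begin{bmatrix} r_1 \\ r_2 - C^T p \end{bmatrix}, \quad p = A^{-1} r_1$
Step 2: $r^2 = \begin{bmatrix} A^{-1} & 0 \\ 0 & B'^{-1} \end{bmatrix} r^1 = \begin{bmatrix} p \\ B'^{-1} r^1_2 \end{bmatrix}$
Step 3: $r^3 = \begin{bmatrix} I & -A^{-1} C \\ 0 & I \end{bmatrix} r^2 = \begin{bmatrix} r^2_1 - A^{-1} C r^2_2 \\ r^2_2 \end{bmatrix}$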
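The three steps translate directly into code. In the sketch below B' is formed exactly and all blocks are small dense matrices solved by Gaussian elimination, so the result equals the true inverse applied to r; the slides call the decomposition an approximate inverse without spelling out the approximation here, and a practical version would replace the dense solves with the pattern-based block operations. All names and the example values are hypothetical.

```python
def solve(M, b):
    """Dense Gaussian elimination with partial pivoting (demo sizes only)."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= f * aug[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def matvec(M, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def precondition(A, B, C, r):
    """Apply the block inverse of [[A, C], [C^T, B]] to r in three steps."""
    n1, n2 = len(A), len(B)
    r1, r2 = r[:n1], r[n1:]
    CT = transpose(C)
    # Schur complement B' = B - C^T A^{-1} C (formed densely; demo only).
    AinvC = [solve(A, [C[i][j] for i in range(n1)]) for j in range(n2)]  # columns
    Bp = [[B[i][j] - sum(CT[i][k] * AinvC[j][k] for k in range(n1))
           for j in range(n2)] for i in range(n2)]
    # Step 1: p = A^{-1} r1; the second half becomes r2 - C^T p.
    p = solve(A, r1)
    s2 = [u - v for u, v in zip(r2, matvec(CT, p))]
    # Step 2: apply diag(A^{-1}, B'^{-1}); the first half is just p again.
    x2 = solve(Bp, s2)
    # Step 3: the first half becomes p - A^{-1} C x2.
    x1 = [u - v for u, v in zip(p, solve(A, matvec(C, x2)))]
    return x1 + x2


def example_system():
    """Hypothetical two-layer case: two 2-node rails per layer, one via per
    crossing; values are made up so that the matrix is diagonally dominant."""
    a = [[3.0, -1.0], [-1.0, 3.0]]
    b = [[4.0, -1.0], [-1.0, 4.0]]
    A = [[a[i % 2][j % 2] if i // 2 == j // 2 else 0.0 for j in range(4)] for i in range(4)]
    B = [[b[i % 2][j % 2] if i // 2 == j // 2 else 0.0 for j in range(4)] for i in range(4)]
    C = [[0.0] * 4 for _ in range(4)]
    for r in range(2):
        for c in range(2):
            C[r * 2 + c][c * 2 + r] = -1.0   # NA off-diagonal entry of a via
    CT = transpose(C)
    full = [A[i] + C[i] for i in range(4)] + [CT[i] + B[i] for i in range(4)]
    return A, B, C, full


if __name__ == "__main__":
    A, B, C, full = example_system()
    print(precondition(A, B, C, [1.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0, 2.0]))
```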
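Pattern Based Method • Preconditioner generation – Inverse of the diagonal block:
$$A^{-1} r = \begin{bmatrix} a^{-1} & & \\ & \ddots & \\ & & a^{-1} \end{bmatrix} r$$
– Other matrix-vector products:
$$\begin{bmatrix} A & C \\ C^T & B \end{bmatrix} \begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = \begin{bmatrix} A r_1 + C r_2 \\ C^T r_1 + B r_2 \end{bmatrix}, \qquad A r = \begin{bmatrix} a & & \\ & \ddots & \\ & & a \end{bmatrix} r$$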
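With A = diag(a, ..., a), the product A^{-1} r reduces to one tridiagonal solve per rail, all reusing the same stored coefficients. The sketch uses the Thomas algorithm for those solves; it is an assumption that the rail blocks are solved this way, and diag, off, thomas_solve, apply_block and apply_block_inverse are illustrative names.

```python
def thomas_solve(diag, off, d):
    """Solve the symmetric tridiagonal system T x = d (Thomas algorithm).
    diag: main diagonal (length k); off: sub/super-diagonal (length k-1)."""
    k = len(diag)
    c = [0.0] * k   # modified super-diagonal
    e = [0.0] * k   # modified right-hand side
    c[0] = off[0] / diag[0] if k > 1 else 0.0
    e[0] = d[0] / diag[0]
    for i in range(1, k):
        denom = diag[i] - off[i - 1] * c[i - 1]
        c[i] = off[i] / denom if i < k - 1 else 0.0
        e[i] = (d[i] - off[i - 1] * e[i - 1]) / denom
    x = [0.0] * k
    x[-1] = e[-1]
    for i in range(k - 2, -1, -1):
        x[i] = e[i] - c[i] * x[i + 1]
    return x


def apply_block(diag, off, r):
    """A r for A = diag(a, ..., a) with a = tridiag(off, diag, off)."""
    k = len(diag)
    y = []
    for s in range(0, len(r), k):
        x = r[s:s + k]
        for i in range(k):
            v = diag[i] * x[i]
            if i > 0:
                v += off[i - 1] * x[i - 1]
            if i < k - 1:
                v += off[i] * x[i + 1]
            y.append(v)
    return y


def apply_block_inverse(diag, off, r):
    """A^{-1} r: one Thomas solve per rail, all with the same (diag, off)."""
    k = len(diag)
    out = []
    for s in range(0, len(r), k):
        out.extend(thomas_solve(diag, off, r[s:s + k]))
    return out


if __name__ == "__main__":
    diag, off = [2.5, 4.5, 4.5, 2.5], [-2.0, -2.0, -2.0]
    r = [1.0] * 12                                   # three rails of 4 nodes
    print(apply_block(diag, off, apply_block_inverse(diag, off, r)))
```

Each solve costs O(k) operations, so A^{-1} r stays linear in the number of nodes.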