SLIDE 1

Design Automation Group

GPSCP: A General-Purpose Support-Circuit Preconditioning Approach to Large-Scale SPICE-Accurate Nonlinear Circuit Simulations

Authors: Xueqian Zhao, Zhuo Feng

Department of Electrical & Computer Engineering, Michigan Technological University
SLIDE 2

Large-Scale SPICE-Accurate Nonlinear Circuit Simulation

  • Motivations

– Modern ICs that integrate billions of transistors and interconnect components need to be accurately modeled and analyzed
– Fast SPICE simulators may introduce errors due to various approximations

  • Challenges in large-scale SPICE-accurate circuit simulations

– Direct methods may not be runtime- and memory-efficient
– Iterative solvers (GMRES) require reliable and efficient preconditioners
– The same accuracy as a SPICE simulator is required

[Figure: original circuit with analog circuit blocks (LDO regulators with error amplifier and current amplifier) and digital circuit blocks]
SLIDE 3

Circuit Simulation Background

  • Problem formulation

– Nonlinear differential equations

$$F(x) = f(x(t)) + \frac{d}{dt}\,q(x(t)) + u(t) = 0$$

– f(.) and q(.) denote the static and dynamic nonlinearities, respectively

  • Standard SPICE simulators rely on the Newton-Raphson (NR) method

– Linearize the nonlinear devices (transistors, etc.) around the current solution $x_k$:

$$G = \left.\frac{\partial f}{\partial x}\right|_{x = x_k}, \qquad C = \left.\frac{\partial q}{\partial x}\right|_{x = x_k}$$

– Obtain the final solution through NR iterations

$$x_{k+1} = x_k - \left[\left.\frac{\partial F}{\partial x}\right|_{x = x_k}\right]^{-1} F(x_k)$$

where $\partial F / \partial x$ is the Jacobian matrix
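A minimal sketch of the NR loop for one transient time step. It assumes backward-Euler integration, so the Jacobian becomes $G + C/h$, and it uses the source term $u = -I_{IN}$ under the sign convention of the equation above. The one-node diode/RC circuit and all parameter values are hypothetical. A real simulator solves the linear system $J\,\Delta x = -F$ with a matrix solver, not a scalar division.

```python
import math

# Hypothetical one-node circuit (illustration only): a current source I_IN
# feeds a node that is tied to ground through a resistor, a diode and a capacitor.
G_R = 1e-3       # resistor conductance (S)
I_S = 1e-14      # diode saturation current (A)
V_T = 0.025852   # thermal voltage (V)
C_N = 1e-12      # node capacitance (F)
I_IN = 1e-3      # injected source current (A), i.e. u(t) = -I_IN


def f(v):
    """Static nonlinearity: current leaving the node through R and the diode."""
    return G_R * v + I_S * (math.exp(v / V_T) - 1.0)


def q(v):
    """Dynamic nonlinearity: charge stored at the node (linear capacitor here)."""
    return C_N * v


def dfdv(v):
    return G_R + (I_S / V_T) * math.exp(v / V_T)   # G in the linearized model


def dqdv(v):
    return C_N                                     # C in the linearized model


def backward_euler_newton(v_prev, h, tol=1e-12, max_iter=50):
    """Solve F(v) = f(v) + (q(v) - q(v_prev)) / h - I_IN = 0 for one time step."""
    v = v_prev
    for k in range(1, max_iter + 1):
        F = f(v) + (q(v) - q(v_prev)) / h - I_IN
        J = dfdv(v) + dqdv(v) / h      # Jacobian G + C/h
        dv = -F / J                    # x_{k+1} = x_k - J^{-1} F(x_k)
        v += dv
        if abs(dv) < tol:
            return v, k
    raise RuntimeError("Newton-Raphson did not converge")
```

For example, `backward_euler_newton(0.0, 1e-9)` converges in a few iterations to a node voltage between 0.49 V and 0.5 V.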

SLIDE 4

Prior Works and Our Previous Approaches

  • Existing direct and iterative solvers

– Direct solver: LU decomposition (KLU [1])
– Expensive for large-scale and non-sparse problems due to rapidly increasing memory and runtime cost
– Krylov-subspace iterative methods: GMRES [2]
– Achieve better memory efficiency
– Convergence rate depends on the effectiveness and efficiency of the preconditioner

  • Our previous approaches: support-graph (circuit) preconditioned iterative methods

– Support-graph preconditioner for large-scale power grid network simulations
– Support-circuit preconditioner for large-scale interconnect-dominant nonlinear circuits

[1] T. Davis and E. Palamadai Natarajan. Algorithm 907: KLU, a direct sparse solver for circuit simulation problems. ACM Trans. Math. Softw., 2010.
[2] Y. Saad and M. Schultz. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 1986.

SLIDE 5

Support-Graph Preconditioner [1]

  • Support-graph preconditioner (SG) for linear networks

– Find a maximum-weighted (or low-stretch) spanning tree in the original graph
– Matrix factors for the spanning tree can be computed in linear time and space
– Highly efficient and effective preconditioner for large circuit simulations

[Figure: example weighted graph G and its spanning-tree support graph P, with the corresponding matrices]

  • The condition number of $P^{-1}G$ can be greatly reduced

Eigenvalues (1st-6th) and condition number (cond):
Matrix  1st     2nd     3rd     4th     5th    6th    cond
G       26.170  23.182  17.572  11.514  9.373  6.673  135.948
P       25.239  23.540  17.579  10.909  9.865  6.822  16.752
P^-1G   1.431   1.204   1.062   1.000   1.000  1.000  17.442

[1] X. Zhao, J. Wang, Z. Feng and S. Hu. Power grid analysis with hierarchical support graphs. In Proc. ICCAD, 2011
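The spanning-tree step can be sketched with Kruskal's algorithm. When edges are taken heaviest first, it yields a maximum-weight spanning tree. This minimal sketch assumes conductances are given as hypothetical `(weight, i, j)` tuples on nodes `0..n-1`. It does not construct the low-stretch trees also mentioned above.

```python
def max_spanning_tree(n, edges):
    """Kruskal's algorithm on (weight, i, j) edges, heaviest first.

    Returns the edges of a maximum-weight spanning tree (a spanning forest
    if the graph is disconnected)."""
    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for w, i, j in sorted(edges, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:            # edge joins two components: keep it
            parent[ri] = rj
            tree.append((w, i, j))
    return tree
```

On a 4-cycle with conductances 1, 2, 3, 4 plus a 0.5 chord, the tree keeps the 4, 3 and 2 edges and drops the two lightest.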

SLIDE 6

Support-Circuit Preconditioner [1]

  • Support-circuit preconditioners (SCP) for interconnect-dominant circuits

– Sparsify the linear networks of the original circuit
– Take advantage of existing sparse matrix solution techniques (e.g., KLU)
– Limitation: near-linear complexity is achieved only for interconnect-dominant circuits

[Figure: support graph of the original network (LDOs and digital circuit blocks) and the resulting support-circuit preconditioner]

[1] X. Zhao and Z. Feng. Towards efficient SPICE-accurate nonlinear circuit simulation with on-the-fly support-circuit preconditioners. In Proc. ACM DAC, 2012.

SLIDE 7

Our Proposed GPSCP Method

  • Our proposed method: general-purpose support-circuit preconditioned (GPSCP) iterative solver

– Effective for solving general large-scale nonlinear circuits
– Scalable linearized circuit sparsification
– Based on support-graph and graph-sparsification research [1-2]
– Energy-based preconditioner improvement
– Dynamic preconditioner updating

[1] D. A. Spielman and S. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proc. ACM STOC, 2004.
[2] M. Bern, J. R. Gilbert, B. Hendrickson, N. Nguyen, and S. Toledo. Support-graph preconditioners. SIAM J. Matrix Anal. Appl., 2006.

SLIDE 8

General-Purpose Support-Circuit Preconditioner

  • General-purpose support-circuit preconditioners

– Allow general large-scale nonlinear circuit simulations
– Parasitics-dominant analog circuits such as amplifiers, PLLs, …
– Good scalability of complete circuit sparsification
– Solve large transistor-dominant circuits with near-linear complexity
– Tree-like support-circuit preconditioner
– Near-linear computational and memory cost

[Figure: nonlinear circuit, its linearized circuit, and the resulting support circuit]
SLIDE 9

Support Circuit Construction (1)

  • Support graph can be obtained through the following steps:

– 1. Decompose the original graph into a Laplacian graph and a directed graph
– 2. Extract the support graph from the Laplacian graph

[Figure: nonlinear circuit and its linearized circuit (conductances g, capacitances C, transconductance g_m V_gs); original weighted graph; weighted graph Laplacian; support graph Laplacian]
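Step 1 can be sketched as a classification of linearized elements. Two-terminal passive elements become undirected weighted edges, whose matrix is a graph Laplacian, while controlled sources stay as directed couplings. The `Element` record, the use of node 0 as ground, and the backward-Euler companion conductance C/h are assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical element record used only for this illustration.
# kind: "G" = conductance (resistor, g_ds, ...), "C" = capacitor,
#       "GM" = voltage-controlled current source g_m * (v[ctrl[0]] - v[ctrl[1]]).
# Node 0 is taken as ground.
Element = namedtuple("Element", "kind a b value ctrl", defaults=(None,))


def decompose(elements, h):
    """Split a linearized circuit into a weighted undirected graph (passive
    part, a graph Laplacian) and a list of directed couplings (active part)."""
    laplacian_edges = []   # (weight, node_a, node_b)
    directed = []          # active elements, kept exactly
    for e in elements:
        if e.kind == "G":
            laplacian_edges.append((e.value, e.a, e.b))
        elif e.kind == "C":
            # Backward-Euler companion conductance C/h (assumed integration rule).
            laplacian_edges.append((e.value / h, e.a, e.b))
        elif e.kind == "GM":
            directed.append(e)
        else:
            raise ValueError(f"unknown element kind: {e.kind!r}")
    return laplacian_edges, directed
```

Only the Laplacian part is then sparsified. The directed part is carried over unchanged into the support circuit.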

SLIDE 10

Support Circuit Construction (2)

  • Support-circuit preconditioner is subsequently built by

– 1. Combine the support graph and the active components
– 2. Factorize the support-circuit matrix using sparse matrix solvers

[Figure: the support graph and the active components (g_m V_gs) combine into sub support circuits, which together form the support-circuit preconditioner]
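The two steps above can be sketched in a few lines. The sketch assumes the support graph is a list of kept `(g, i, j)` conductance edges and each active component is a voltage-controlled current source `(gm, out_p, out_n, ctrl_p, ctrl_n)`. Both data layouts are hypothetical. A dense Gaussian elimination stands in for the sparse LU factorization (e.g., KLU) used in practice.

```python
def support_circuit_matrix(n, kept_edges, vccs):
    """Nodal matrix of the support circuit: stamps of the kept (sparsified)
    passive edges plus stamps of all active components. Nodes are 0..n-1 with
    node 0 as ground, which is eliminated (unknowns are nodes 1..n-1)."""
    P = [[0.0] * (n - 1) for _ in range(n - 1)]

    def add(r, c, val):
        if r > 0 and c > 0:              # skip the ground row/column
            P[r - 1][c - 1] += val

    for g, i, j in kept_edges:           # two-terminal conductance stamp
        add(i, i, g)
        add(j, j, g)
        add(i, j, -g)
        add(j, i, -g)
    for gm, op, on, cp, cn in vccs:      # current gm*(v_cp - v_cn) from op to on
        add(op, cp, gm)
        add(op, cn, -gm)
        add(on, cp, -gm)
        add(on, cn, gm)
    return P


def solve(A, b):
    """Dense Gaussian elimination with partial pivoting; a stand-in for a
    sparse LU factorization such as KLU."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):                 # back substitution
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x
```

The active stamps make P non-symmetric. This is why the preconditioned system is solved with GMRES rather than a symmetric Krylov method.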

SLIDE 11

Towards A Better Support Graph

  • Convergence of support-graph preconditioners

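– The convergence is determined by the condition number of the matrix pencil (G,P):

$$\kappa(G,P) = \frac{\lambda_{\max}(G,P)}{\lambda_{\min}(G,P)}$$

– The support of the pencil (G,P) (i.e., of $P^{-1}G$) is defined as:

$$\sigma(G,P) = \min\{\tau \in \mathbb{R} \mid x^T(\tau P - G)\,x \ge 0 \ \text{for all } x \in \mathbb{R}^n\}$$

– The eigenvalues of the pencil (G,P) are bounded by the supports: $\lambda_{\max}(G,P) \le \sigma(G,P)$ and $\lambda_{\min}(G,P) \ge 1/\sigma(P,G)$, hence $\kappa(G,P) \le \sigma(G,P)\,\sigma(P,G)$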

  • Spanning-tree support graph as a preconditioner

– May not be efficient for ill-conditioned systems
– Reduced overall conductance of the resistive network
– Mismatched power dissipation between the original graph and the spanning-tree graph

Power dissipated by G: $x^T G x$    Power dissipated by P: $x^T P x$
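For a small case the pencil quantities can be computed directly. The sketch below solves $\det(G - \lambda P) = 0$ in closed form. It assumes 2x2 symmetric positive definite G and P, for which $\sigma(G,P) = \lambda_{\max}(G,P)$ and $\sigma(P,G) = 1/\lambda_{\min}(G,P)$. The example network is hypothetical. G couples two grounded nodes (2 S and 1 S to ground, 1 S between them). P keeps only the two ground connections, which form a maximum spanning tree of the graph that includes the ground node.

```python
import math


def pencil_eigenvalues_2x2(G, P):
    """Generalized eigenvalues of the 2x2 pencil (G, P): the roots of
    det(G - lam*P) = a*lam^2 + b*lam + c = 0. Assumes G and P are symmetric
    positive definite, so both roots are real and positive."""
    a = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    b = -(G[0][0] * P[1][1] + G[1][1] * P[0][0]) + (G[0][1] * P[1][0] + G[1][0] * P[0][1])
    c = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    disc = math.sqrt(b * b - 4.0 * a * c)
    return (-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)   # (lam_min, lam_max)


def condition_number(G, P):
    """kappa(G, P) = lam_max / lam_min = sigma(G, P) * sigma(P, G) for SPD pencils."""
    lam_min, lam_max = pencil_eigenvalues_2x2(G, P)
    return lam_max / lam_min
```

Here the pencil eigenvalues are 1.0 and 2.5, so $\kappa(G,P) = 2.5$ versus $\kappa(G) \approx 2.62$ with no preconditioner (P = I).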

SLIDE 12

Towards A Better Support Graph (cont.)

  • Graph approximation quality

– A weighted graph P σ-approximates a weighted graph A if

$$\frac{1}{\sigma}\,L(P) \preceq L(A) \preceq \sigma\,L(P),$$

where L(A) is the Laplacian matrix of A
– $L(P) \preceq L(A)$ means that, for all x,

$$x^T L(P)\,x \le x^T L(A)\,x = \sum_{\text{edges } (i,j)} w_{ij}\,(x_i - x_j)^2$$

  • Better support graph approximations

– Resistive network A: $x^T A x$ is its power dissipation
– The spanning tree P of A retains only n-1 edges, therefore $x^T P x \le x^T A x$
– If eigen(P) ≈ eigen(A) and power(P) ≈ power(A), the preconditioner P can be more effective
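The identity $x^T L(A)\,x = \sum w_{ij}(x_i - x_j)^2$ and the inequality $x^T P x \le x^T A x$ for a spanning tree can be checked numerically. This minimal sketch uses a hypothetical 4-node resistive network (a 4-cycle plus one chord) and one of its spanning trees. The vector x plays the role of node voltages.

```python
def power(edges, x):
    """x^T L x written as an edge sum: sum of w_ij * (x_i - x_j)^2."""
    return sum(w * (x[i] - x[j]) ** 2 for w, i, j in edges)


def laplacian(n, edges):
    """Dense graph Laplacian L(A) of (weight, i, j) edges."""
    L = [[0.0] * n for _ in range(n)]
    for w, i, j in edges:
        L[i][i] += w
        L[j][j] += w
        L[i][j] -= w
        L[j][i] -= w
    return L


def quadratic_form(M, x):
    """x^T M x for a dense matrix M."""
    n = len(x)
    return sum(x[r] * M[r][c] * x[c] for r in range(n) for c in range(n))
```

With x = [0, 1, 2, 3], the full network dissipates 44 units and the tree dissipates 41. Dropping edges can only lower the power, which is the mismatch the later energy-based scaling addresses.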

SLIDE 13

Ultra-Sparsifier Support Graph (1)

  • Graph sparsification (non-tree)

– An ultra-sparsifier [1] contains at most n-1+k edges (spanning tree + extra edges)
– It is k-ultra-sparse and $(n/k)\log^{O(1)} n$-approximates the original graph with high probability [1]
– A spanning tree is 0-ultra-sparse
– The ultra-sparsifier approximates the original graph better than a spanning tree

[Figure: spanning tree vs. ultra-sparsifier; legend: edges of spanning tree graph, extra edges]

[1] D. A. Spielman and S. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proc. ACM STOC, 2004.

SLIDE 14

Ultra-Sparsifier Support Graph (2)

  • Maximum weighted degree metric

– Provides trade-offs between the preconditioner quality and the runtime efficiency of matrix factorizations
– The weighted degree of vertex v in a graph A is defined as

$$wd(v) = \frac{\sum_i w_i}{\max_i w_i},$$

where $w_i$ are the weights of the edges incident to v; the average weighted degree of A is

$$awd(A) = \frac{1}{n}\sum_{v \in V} wd(v)$$

– In a mesh grid, 1 (one critical edge) ≤ wd(v) ≤ 4 (four equally critical edges)
– Focus on nodes with larger weighted-degree values, where each incident edge is about as critical as the others

[Figure: vertex v with incident edge weights w1, w2, w3, w4]
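A minimal sketch of the weighted-degree metric as defined above. Edges are given as hypothetical `(weight, i, j)` tuples, and only nodes that have at least one edge are counted in the average.

```python
from collections import defaultdict


def weighted_degrees(edges):
    """wd(v) = (sum of weights of edges incident to v) / (largest such weight)."""
    incident = defaultdict(list)
    for w, i, j in edges:
        incident[i].append(w)
        incident[j].append(w)
    return {v: sum(ws) / max(ws) for v, ws in incident.items()}


def average_weighted_degree(wd):
    """awd(A) = (1/n) * sum of wd(v) over the n nodes that have edges."""
    return sum(wd.values()) / len(wd)
```

An interior mesh node with four equal edges gets wd = 4. A node whose incident weights are 100, 1, 1 and 1 gets wd = 1.03 because one edge dominates.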

SLIDE 15

Ultra-Sparsifier Support Graph (3)

  • Iterative critical node selection

– Define γ as the percentage of the weighted-degree range
– Define θ as the percentage of graph approximation (power dissipation)

Step 1

  • Initialize γ and θ (e.g., set them to relatively large values such as 0.7)

Step 2

  • Select all the nodes of A_graph that satisfy wd(v) > γ × awd(A_graph)

Step 3

  • Using the prior solution, compute Pwr_select and compare it with Pwr_Agraph

Step 4

  • If Pwr_select > θ × Pwr_Agraph, the selected nodes are critical nodes
  • Otherwise, reduce γ and repeat steps 2-4

Step 5

  • For each selected critical node, pick its top few most critical edges

[Figure: spanning tree vs. ultra-sparsifier; legend: extra edges, critical nodes]
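Steps 1-5 can be sketched as follows. This is one illustrative reading of the slide, not the authors' implementation. It assumes Pwr_select is the power dissipated (for the prior solution x) in edges incident to the selected nodes. It also assumes γ is reduced by a fixed factor and that an edge's criticality is its dissipated power. `shrink` and `top` are hypothetical parameters.

```python
from collections import defaultdict


def edge_power(edge, x):
    """Power dissipated in one (g, i, j) edge for node voltages x."""
    g, i, j = edge
    return g * (x[i] - x[j]) ** 2


def select_critical_nodes(edges, x, gamma=0.7, theta=0.7, shrink=0.9):
    """Steps 1-4. Returns (critical nodes, node -> incident edges, final gamma).
    Assumption: Pwr_select = power of the edges incident to the selected nodes."""
    incident = defaultdict(list)
    for e in edges:
        incident[e[1]].append(e)
        incident[e[2]].append(e)
    wd = {v: sum(e[0] for e in es) / max(e[0] for e in es) for v, es in incident.items()}
    awd = sum(wd.values()) / len(wd)
    pwr_graph = sum(edge_power(e, x) for e in edges)
    while gamma > 1e-6:
        selected = {v for v in wd if wd[v] > gamma * awd}          # Step 2
        touched = {e for v in selected for e in incident[v]}
        pwr_select = sum(edge_power(e, x) for e in touched)        # Step 3
        if pwr_select > theta * pwr_graph:                         # Step 4
            return selected, incident, gamma
        gamma *= shrink                                            # reduce gamma, repeat
    return set(wd), incident, gamma


def add_critical_edges(tree, critical, incident, x, top=2):
    """Step 5: add each critical node's `top` most power-dissipating edges
    (those not already in the tree) to form the ultra-sparsifier."""
    kept = set(tree)
    for v in critical:
        ranked = sorted(incident[v], key=lambda e: edge_power(e, x), reverse=True)
        kept.update(ranked[:top])
    return kept
```

On a hypothetical hub graph (a weak 0.1 S ring around a hub tied by 1 S edges), the hub is the only node that passes the γ test at 0.7. Its top edges are then added on top of the tree.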

SLIDE 16

Energy-based Spanning-graph Scaling

  • It has been shown that graph approximation also bounds the difference in power dissipation between graphs: if P σ-approximates A, then

$$\frac{1}{\sigma}\,x^T A x \le x^T P x \le \sigma\,x^T A x, \qquad x^T A x = \sum_{(i,j)} g_{ij}\,(x_i - x_j)^2$$

  • Support-circuit improvement by energy-based spanning-graph scaling: since a spanning graph $P_{st}$ of $A_{graph}$ satisfies $x^T P_{st}\,x \le x^T A_{graph}\,x$, its edge conductances $g_{ij}$ are scaled so that its power dissipation approaches that of $A_{graph}$

[Figure: spanning tree, scaled spanning tree, and ultra-sparsifier; legend: original edge, scaled edge, extra critical edge w/o scaling]
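A minimal sketch of the scaling step. It assumes one global factor α chosen so that, for the prior solution x, the scaled tree plus the unscaled extra critical edges (as in the figure legend) dissipates the same power as the original graph. The single-factor rule is an assumption made for illustration, not a confirmed detail of GPSCP.

```python
def power(edges, x):
    """Power dissipated by a resistive graph: sum of g_ij * (x_i - x_j)^2."""
    return sum(g * (x[i] - x[j]) ** 2 for g, i, j in edges)


def scale_spanning_graph(tree_edges, graph_edges, x, extra_edges=()):
    """Multiply spanning-tree conductances by alpha so that
    alpha * P_tree + P_extra = P_graph for node voltages x.
    Extra critical edges are left unscaled. (Assumed scaling rule.)"""
    p_tree = power(tree_edges, x)
    if p_tree == 0.0:                    # nothing to match: keep the tree as is
        return list(tree_edges), 1.0
    alpha = (power(graph_edges, x) - power(extra_edges, x)) / p_tree
    return [(alpha * g, i, j) for g, i, j in tree_edges], alpha
```

Because the tree is a subgraph of the original network, α is at least 1. Scaling raises the tree's overall conductance toward that of the full network.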

SLIDE 17

Dynamic Preconditioner Updating Scheme

  • During Newton-Raphson steps or transient steps, the linearized models of transistors may change drastically
  • Recomputing the support circuit at every Newton-Raphson step may introduce substantial overhead

If the support graph changes significantly:

$$\frac{\sum_{\text{node } i} |g_i^{cur} - g_i^{pre}|}{\sum_{\text{node } i} |g_i^{pre}|} > tol$$

Do: regenerate the support graph, as well as the support circuit

  • tol: user-defined value
  • g_i^cur: all passive components incident to node i in the current step
  • g_i^pre: all passive components incident to node i in the previous step
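The update test follows directly from the criterion above. This sketch assumes a hypothetical data layout: each node maps to the conductances of its incident passive components, listed in the same order in the previous and current steps.

```python
def support_graph_changed(g_cur, g_pre, tol):
    """True if sum_i |g_i^cur - g_i^pre| / sum_i |g_i^pre| > tol.

    g_cur, g_pre: node -> list of conductances of the passive components
    incident to that node, in the same order in both steps (assumed layout)."""
    num = 0.0
    den = 0.0
    for node, pre in g_pre.items():
        cur = g_cur[node]
        num += sum(abs(c - p) for c, p in zip(cur, pre))
        den += sum(abs(p) for p in pre)
    return den > 0.0 and num / den > tol
```

The support graph and support circuit are regenerated only when this returns True. Otherwise the previously factorized support circuit is reused as the preconditioner.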
SLIDE 18

Complete Algorithm Flow

1. Netlist input
2. Evaluate devices to obtain the linearized circuit
3. Extract the passive networks
4. For each network: find the maximum spanning tree; compute wd(v) and awd; iteratively find critical nodes; then add un-picked critical edges to the tree
5. Combine the ultra-sparsifier with the active components
6. Create matrices G and P
7. Solve with the PGMRES iterative solver
8. Converged? Yes: return the solution. No: go back to step 2
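The PGMRES step of the flow can be sketched as a right-preconditioned GMRES without restarts. Here `apply_pinv(v)` returns $P^{-1}v$. In GPSCP that would be a solve with the factorized support-circuit matrix. The Jacobi (diagonal) preconditioner in the test and the 3x3 matrix are hypothetical stand-ins.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def pgmres(A, b, apply_pinv, tol=1e-10, max_iter=50):
    """Right-preconditioned GMRES from x0 = 0: solves A P^{-1} u = b, x = P^{-1} u.

    apply_pinv(v) must return P^{-1} v. Returns (x, number of Arnoldi steps)."""
    n = len(b)
    beta = math.sqrt(dot(b, b))
    if beta == 0.0:
        return [0.0] * n, 0
    V = [[bi / beta for bi in b]]   # orthonormal Krylov basis v_0, v_1, ...
    Z = []                          # preconditioned vectors z_j = P^{-1} v_j
    R = []                          # Hessenberg columns after Givens rotations
    cs, sn = [], []                 # Givens rotation coefficients
    g = [beta]                      # rotated right-hand side (beta * e_1)
    for j in range(max_iter):
        z = apply_pinv(V[j])
        Z.append(z)
        w = matvec(A, z)
        h = []
        for vi in V:                # modified Gram-Schmidt against the basis
            hij = dot(w, vi)
            w = [wk - hij * vk for wk, vk in zip(w, vi)]
            h.append(hij)
        h_next = math.sqrt(dot(w, w))
        for i in range(j):          # apply earlier rotations to the new column
            h[i], h[i + 1] = (cs[i] * h[i] + sn[i] * h[i + 1],
                              -sn[i] * h[i] + cs[i] * h[i + 1])
        r = math.hypot(h[j], h_next)  # new rotation annihilates h_next
        cs.append(h[j] / r)
        sn.append(h_next / r)
        h[j] = r
        g.append(-sn[j] * g[j])
        g[j] = cs[j] * g[j]
        R.append(h)
        if abs(g[j + 1]) <= tol * beta or h_next == 0.0:
            break                   # |g[j+1]| is the current residual norm
        V.append([wk / h_next for wk in w])
    m = len(R)
    y = [0.0] * m
    for i in range(m - 1, -1, -1):  # back substitution on the triangular system
        y[i] = (g[i] - sum(R[k][i] * y[k] for k in range(i + 1, m))) / R[i][i]
    return [sum(y[k] * Z[k][t] for k in range(m)) for t in range(n)], m
```

Right preconditioning is chosen here because the GMRES residual then equals the residual of the original system $b - Ax$, so the convergence test needs no extra work.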

SLIDE 19

Experimental Setup

CKT    #nunk      #Mos     #nnz        #Pnnz      Mem. Direct (MB)  Mem. GPSCP (MB)
test1  202,738    67,451   1,156,428   823,156    204.91            37.21 (5.5X)
test2  202,738    114,664  2,081,570   1,689,365  450.80            73.90 (6.1X)
test3  608,127    192,603  3,127,301   2,626,443  916.78            176.15 (5.2X)
test4  608,127    327,426  5,629,682   4,857,630  1,651.22          250.11 (6.6X)
test5  1,187,452  644,852  10,837,454  9,460,210  3,136.83          468.19 (6.7X)
test6  63,981     N/A      575,981     494,757    204.53            30.92 (6.6X)

  • Tests 1-5: large PDNs with on-chip voltage regulators
  • Test 6: industrial analog design (only the MNA matrix is available)
  • #nunk: number of unknowns in the circuit
  • #Mos: number of MOSFETs in the circuit
  • #nnz: number of non-zero elements in the MNA matrix
  • #Pnnz: number of non-zero elements in the preconditioner matrix
  • Memory: memory cost during LU factorization of the MNA matrix (memory reduction factor over the direct solver in parentheses)
SLIDE 20

Experimental Results (1)

  • The support-circuit MNA matrix preserves the dominant eigenvalues well

[Figure: top 20 largest eigenvalues of the system matrices (original, spanning tree, ultra-sparsifier), magnitudes between about 1.7005 x 10^5 and 1.7008 x 10^5; PDN with on-chip VRs (DC analysis)]
SLIDE 21

Experimental Results (2)

  • In TR analysis, the support-circuit MNA matrix also preserves the dominant eigenvalues well

[Figure: top 18 largest eigenvalues of the system matrices, original vs. ultra-sparsifier; industrial analog circuit design (TR analysis)]
[Figure: top 20 largest eigenvalues of the system matrices, original vs. ultra-sparsifier; PDN with on-chip VRs (TR analysis)]
SLIDE 22

Experimental Results (3)

  • Runtime & memory efficiency comparison between SCP [1] and GPSCP

[Figure: runtime speedups and memory improvements of matrix factorization over the direct solver, using the SCP [1] and GPSCP algorithms, versus circuit nonlinearity]

Nonlinearity = #NonDev / #TotDev

[1] X. Zhao and Z. Feng. Towards efficient SPICE-accurate nonlinear circuit simulation with on-the-fly support-circuit preconditioners. In Proc. ACM DAC, 2012.
SLIDE 23

Experimental Results (4)

  • Runtime comparison for a single Newton-Raphson step (DC)

CKT    Direct          GPSCP
       Fact   Solve    Setup  Fact   GMRES  #iter.  Speedup  Error(%)
test1  3.31   0.05     0.42   0.22   0.22   10      3.9X     0.05
test2  4.94   0.06     0.52   0.28   0.29   12      4.6X     0.05
test3  23.03  0.18     1.42   0.64   2.02   13      5.7X     0.05
test4  36.85  0.20     1.48   0.91   2.31   14      7.8X     0.05
test5  63.35  0.61     2.15   1.74   2.37   17      10.8X    0.05
test6  18.12  0.02     1.02   0.30   0.74   19      8.8X     0.1

  • Runtime comparison for transient analysis

[Figure: transient runtime (s) of the direct solver and of GPSCP with non-dynamic (Non-dnm) and dynamic preconditioner updating for test1-test5, with circuit sizes (K-nodes); the annotated speedups (non-dynamic/dynamic) range from 3.6X/4.4X up to 9.9X/14.0X, with intermediate values 4.0X/5.1X, 4.7X/5.8X and 6.1X/8.6X]
SLIDE 24

Conclusion

  • Proposed a general-purpose support-circuit preconditioner (GPSCP) for scalable large-scale nonlinear circuit simulation
  • Key ideas:

– 1. Extract ultra-sparsifier support graphs from the passive networks of the linearized circuit
– 2. Combine them with the active components (e.g., controlled sources)
– 3. Use energy-based preconditioner improvement and dynamic preconditioner updating schemes

  • Our experimental results show that GPSCP can:

– Obtain up to 14X speedups in DC and transient simulations
– Reduce memory consumption by up to 80%

SLIDE 25

THANK YOU!
