1
Design Automation Group
Graph Sparsification Approaches to Scalable Integrated Circuit Modeling and Simulations
Zhuo Feng
Acknowledgements: My PhD students Xueqian Zhao (MTU) and Lengfei Han (MTU)
ICSICT, Oct. 2014
2
Scalable SPICE-Accurate IC Simulations
[Figure: original circuit with analog circuit blocks (voltage regulators with error amplifier, current amplifier, pass transistor Mp, feedback resistors Rf1/Rf2) and digital circuit blocks]
- Motivation
– Integrated circuit (IC) systems that involve billions of transistors and interconnect components need to be accurately modeled and analyzed
- Challenges in large-scale SPICE-accurate IC simulations
– Computational cost grows rapidly with traditional direct solution methods
– Iterative solution methods need to be robust and efficient for general tasks
Power Delivery Network (PDN) w/ Embedded Voltage Regulators (VRs)
3
Background of SPICE Simulation Algorithms
- Standard SPICE simulators rely on the Newton-Raphson (NR) method
– Step 1: Linearize the nonlinear devices (transistors, diodes, etc.)
– Step 2: Update the solution through NR iteration
- Problem formulation
– Nonlinear differential equations: $F(x) = f(x(t)) + \frac{d}{dt} q(x(t)) + u(t) = 0$
– f(.) and q(.) denote the static and dynamic nonlinearities, respectively
- Jacobian of F(x)
– $G(x_k) = \left.\frac{\delta f}{\delta x}\right|_{x_k}, \quad C(x_k) = \left.\frac{\delta q}{\delta x}\right|_{x_k}$
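The two NR steps can be sketched on a one-node example. This is a minimal illustration, not the simulator described in the talk: the device values, the backward-Euler discretization with step h, and all function names are assumptions made for the example. With backward Euler, the Jacobian of the discretized F(x) is G(x_k) + C(x_k)/h.

```python
import math

# Hypothetical one-node circuit: resistor + diode to ground, capacitor to ground.
R, CAP, IS, VT = 1e3, 1e-9, 1e-14, 0.025875  # assumed device values

def f(x):  # static nonlinearity: resistor + diode current
    return x / R + IS * (math.exp(x / VT) - 1.0)

def q(x):  # dynamic nonlinearity: capacitor charge
    return CAP * x

def G(x):  # df/dx at the current iterate
    return 1.0 / R + IS / VT * math.exp(x / VT)

def C(x):  # dq/dx at the current iterate
    return CAP

def residual(x, x_prev, u, h):
    # Backward-Euler form of F(x) = f(x) + dq/dt + u
    return f(x) + (q(x) - q(x_prev)) / h + u

def nr_time_step(x_prev, u, h, tol=1e-12, max_iter=50):
    x = x_prev
    for _ in range(max_iter):
        # Step 1: linearize (G + C/h); Step 2: NR update
        dx = -residual(x, x_prev, u, h) / (G(x) + C(x) / h)
        x += dx
        if abs(dx) < tol:
            return x
    raise RuntimeError("NR did not converge")

# u = -1e-3 models 1 mA injected into the node under the sign convention above.
x1 = nr_time_step(x_prev=0.0, u=-1e-3, h=1e-6)
```

In a full simulator the scalar division becomes a sparse linear solve, which is where the direct or preconditioned iterative solvers discussed next come in.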
4
Prior Works
- Direct and iterative solvers have been used in SPICE simulations
– Direct solver: LU decomposition (KLU [1])
  – Expensive for large-scale post-layout IC problems because memory and runtime cost grow superlinearly with circuit size
– Krylov-subspace iterative methods: GMRES [2]
  – Pros: black-box solver, good memory efficiency, high parallelism
  – Cons: problem-dependent convergence properties, worse runtime
– ILU and domain-decomposition based preconditioners, etc.
- Our contribution: a circuit-oriented preconditioning approach
– Novel circuit-oriented preconditioners (compared to matrix-oriented ones)
– Rigorous mathematical foundation: graph sparsification research [3-4]
– Consistent performance when solving transistor-level nonlinear circuits
References:
[1] T. Davis, et al. Algorithm 907: KLU, a direct sparse solver for circuit simulation problems. ACM Trans. Math. Softw., 2010.
[2] Y. Saad, et al. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 1986.
[3] D. A. Spielman, et al. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. ACM STOC, 2004.
[4] M. Bern, et al. Support-graph preconditioners. SIAM J. Matrix Anal. Appl., 2006.
5
Graph Sparsification Techniques
- Graph sparsification basics
– Find a subgraph P approximating the original graph G in some measure (pairwise distance, cut values, graph Laplacian, etc.)
– Maintain the same set of vertices such that P can be used as a proxy for G in numerical computations w/o introducing much error
– A good graph sparsifier should keep very few edges to limit the computation and storage cost
[Figure: original graph G and its sparsifier P]
Figure source: I. Koutis, G. L. Miller and R. Peng. A fast solver for a class of linear systems. Commun. ACM, 2012
6
Support-Graph Preconditioner
- Support-graph preconditioner (SGP)
– Example: find a spanning tree from the original graph
– Compute matrix factors w/o introducing any fill-ins for the spanning tree
- The condition number of P^{-1}G can be greatly reduced
[Figure: 9-node weighted graph G with its matrix, and the spanning-tree support graph P with its matrix]

Matrix    1st     2nd     3rd     4th     5th     6th     cond
G         26.170  23.182  17.572  11.514  9.373   6.673   135.948
P         25.239  23.540  17.579  10.909  9.865   6.822   16.752
P^{-1}G   1.431   1.204   1.062   1.000   1.000   1.000   17.442
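The spanning-tree example can be sketched with Kruskal's algorithm. This is only an illustration under assumptions: the slide says "a spanning tree" without naming the selection rule, so the maximum-weight choice here follows the Gilbert mixer case study later in the deck, and the 3x3 grid with its weights is invented for the example. A tree Laplacian can be factored without fill-in by eliminating leaves first, which is why a tree is a cheap preconditioner.

```python
def maximum_spanning_tree(num_nodes, edges):
    """Kruskal's algorithm; edges is a list of (u, v, weight), nodes are 0-based."""
    parent = list(range(num_nodes))

    def find(a):
        # Union-find root lookup with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):  # heaviest edges first
        ru, rv = find(u), find(v)
        if ru != rv:            # edge joins two components: keep it
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

# Hypothetical 3x3 grid graph (9 nodes, 12 edges) with made-up conductances.
grid = [(3 * r + c, 3 * r + c + 1, 1.0 + (r + 2 * c) % 4)
        for r in range(3) for c in range(2)]
grid += [(3 * r + c, 3 * (r + 1) + c, 1.0 + (2 * r + c) % 3)
         for r in range(2) for c in range(3)]
tree = maximum_spanning_tree(9, grid)
```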
7
Support-Circuit Preconditioner
- A naïve support-circuit preconditioner (SCP)
– Sparsifies the linear networks of the original circuit network
– Takes advantage of existing sparse matrix techniques (Cholesky, LU, etc.)
– Nearly-linear complexity for analyzing nanoscale (parasitics-dominant) ICs
– E.g. clock networks, power delivery networks, etc.
[Figure: original network with VRs and digital circuit blocks, the support graph of the original network, and the resulting support-circuit preconditioner]
8
Support-Circuit Preconditioner (Cont.)
- General-purpose support-circuit preconditioner (GPSCP)
– Extracts a sparsified network from the linearized circuit of the original circuit
– Leverages existing sparse matrix solution techniques
– Nearly-linear complexity for analyzing more general nonlinear circuit systems
[Figure: nonlinear circuit (MOSFET with resistors R1 to R5), its linearized circuit (g1 to g5, gds, Cds, Cgs, Cgd, gmVgs), and the extracted support circuit]
9
Support-Circuit Preconditioner Extraction (1)
- Directed weighted graph corresponding to a linearized circuit
– Can be obtained around a solution point during NR iterations
– Will be sparsified through graph decomposition and sparsification
[Figure: extraction flow, steps 1 to 4: nonlinear circuit (MOSFET with R1 to R5), linearized circuit (g1 to g5, gds, Cds, Cgs, Cgd, gmVgs), directed weighted graph (capacitors weighted as C/h), undirected weighted graph, support graph]
10
Support-Circuit Preconditioner Extraction (2)
- Support-circuit preconditioner extraction
– Combine support graph and other components (e.g. controlling sources)
– Factor the Jacobian matrix of the support circuit to create the preconditioner
[Figure: extraction flow, steps 5 to 7: the support graph and the controlling sources (gmVgs) are combined into the support circuit, which gives the general-purpose support circuit]
11
Quality Quantification of Support Graph Preconditioners
- Convergence of support-graph preconditioners
– The convergence relies on the condition number of matrix pencil (G,P): $\kappa(G,P) = \lambda_{max}(G,P) / \lambda_{min}(G,P)$
– The support of pencil (G,P) is defined as: $\sigma(G,P) = \min\{\tau \in \Re \mid x^T(\tau P - G)x \ge 0, \text{ all } x \in \Re^n\}$
– Eigenvalues of pencil (G,P) are bounded by $\tau$
– A smaller $\tau$ means faster convergence
- Spanning-tree support graph as a preconditioner
– May require many iterations to converge if $\tau$ (mismatch) is too large
– $\tau$ can be estimated by comparing Joule heating of two resistive networks
Power dissipated by G: $x^T G x$
Power dissipated by P: $x^T P x$
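The Joule-heating comparison can be tried on a toy example. From the definition of the support, any voltage vector x with x^T P x > 0 gives a lower bound tau >= x^T G x / x^T P x, so sampling vectors estimates tau from below. The triangle graph, its path subgraph, and the function names are assumptions chosen for illustration; for this pair the exact support is 3.

```python
import random

def dissipated_power(edges, x):
    """x^T L x for a graph Laplacian L: sum of w * (x_u - x_v)^2 over edges."""
    return sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges)

def support_lower_bound(g_edges, p_edges, num_nodes, samples=1000, seed=0):
    """Largest sampled ratio of power dissipated by G to power dissipated by P."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(samples):
        x = [rng.uniform(-1.0, 1.0) for _ in range(num_nodes)]
        p_power = dissipated_power(p_edges, x)
        if p_power > 0.0:
            best = max(best, dissipated_power(g_edges, x) / p_power)
    return best

# Toy pair: G is a unit-weight triangle, P is the spanning path 0-1-2.
G_EDGES = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
P_EDGES = [(0, 1, 1.0), (1, 2, 1.0)]

worst_case = dissipated_power(G_EDGES, [0, 1, 2]) / dissipated_power(P_EDGES, [0, 1, 2])
estimate = support_lower_bound(G_EDGES, P_EDGES, 3)
```

Here the vector [0, 1, 2] attains the exact support: G dissipates 6 and P dissipates 2.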
12
Ultra-Sparsifier Support Graph (1)
- Ultra-sparsifier (non-tree) support graphs
– Ultra-sparsifier contains at most n-1+k edges (spanning tree + extra edges)
– It is a k-ultra-sparse graph that approximates the original graph with high probability [1]
– Adding extra edges to the spanning tree can better approximate the original graph (e.g. eigenvalues, power dissipations)
[Figure: spanning tree (tree edges only) next to the ultra-sparsifier (tree edges plus extra edges)]
[1] D. A. Spielman and S. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proc. ACM STOC, 2004.
13
Ultra-Sparsifier Support Graph (2)
- Sparsity control of an ultra-sparsifier support graph
– Provides tradeoffs between the quality and efficiency of preconditioners
– Weighted degree of a vertex v in a graph A is defined as: $wd(v) = \dfrac{vol(v)}{\max_{u \in neighbor(v)} w(u,v)}$
  – vol(v): total weight incident to node v
  – w(u,v): the weight of the edge connecting nodes v and u
– Example: for a 2D-mesh grid, 1 ≤ wd(v) ≤ 4
  – If wd(v) -> 1: one dominant edge
  – If wd(v) -> 4: four evenly critical edges
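As a quick check of the weighted-degree definition, the sketch below evaluates wd(v) on a uniform 3x3 mesh and on a node with one dominant edge. The graphs and the function name are illustrative assumptions.

```python
def weighted_degrees(num_nodes, edges):
    """wd(v) = vol(v) / max incident edge weight; edges is a list of (u, v, w)."""
    vol = [0.0] * num_nodes    # total weight incident to each node
    wmax = [0.0] * num_nodes   # heaviest incident edge of each node
    for u, v, w in edges:
        for node in (u, v):
            vol[node] += w
            wmax[node] = max(wmax[node], w)
    return [vol[i] / wmax[i] if wmax[i] > 0 else 0.0 for i in range(num_nodes)]

# Uniform 3x3 mesh: the center node (index 4) has four equally critical edges.
mesh = [(3 * r + c, 3 * r + c + 1, 1.0) for r in range(3) for c in range(2)]
mesh += [(3 * r + c, 3 * (r + 1) + c, 1.0) for r in range(2) for c in range(3)]
wd_mesh = weighted_degrees(9, mesh)

# Star around node 0 with one dominant edge: wd(0) is close to 1.
star = [(0, 1, 100.0), (0, 2, 1.0), (0, 3, 1.0)]
wd_star = weighted_degrees(4, star)
```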
14
Ultra-Sparsifier Support Graph (3)
- Iterative ultra-sparsifier support graph construction
– Define θ as the matching factor threshold (0 < θ < 1) of node weighted degree
– Step 1: Compute weighted degree wd of each node in the original graph A
– Step 2: Compute the support graph A' with weighted degree wd'
– Step 3: Recover edges to A' until wd'/wd > θ for each node in the support graph A'
– Step 4: Return the final ultra-sparsifier support graph A' for support-circuit preconditioning
[Figure: spanning tree with nodes where wd'/wd < θ, and the ultra-sparsifier after adding extra edges so that wd'/wd > θ]
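Steps 1 to 4 can be sketched as follows. The slides do not say which edge is recovered first, so this sketch assumes a simple greedy rule (recover the heaviest missing edge touching any node that still fails the threshold) and a maximum spanning tree as the initial support graph; both choices, and all names, are assumptions rather than the authors' implementation. Edges are assumed to be unique (u, v, w) tuples.

```python
def weighted_degrees(num_nodes, edges):
    vol, wmax = [0.0] * num_nodes, [0.0] * num_nodes
    for u, v, w in edges:
        for node in (u, v):
            vol[node] += w
            wmax[node] = max(wmax[node], w)
    return [vol[i] / wmax[i] if wmax[i] > 0 else 0.0 for i in range(num_nodes)]

def max_spanning_tree(num_nodes, edges_heaviest_first):
    parent = list(range(num_nodes))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    tree = []
    for u, v, w in edges_heaviest_first:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

def ultra_sparsifier(num_nodes, edges, theta):
    wd = weighted_degrees(num_nodes, edges)                 # Step 1
    ordered = sorted(edges, key=lambda e: -e[2])
    support = max_spanning_tree(num_nodes, ordered)         # Step 2
    kept = set(support)
    candidates = [e for e in ordered if e not in kept]      # heaviest first
    while candidates:                                       # Step 3
        wd_s = weighted_degrees(num_nodes, support)
        low = {i for i in range(num_nodes)
               if wd[i] > 0 and wd_s[i] / wd[i] <= theta}
        pick = next((e for e in candidates if e[0] in low or e[1] in low), None)
        if pick is None:        # every node already satisfies wd'/wd > theta
            break
        candidates.remove(pick)
        support.append(pick)
    return support                                          # Step 4

# Uniform 3x3 mesh (12 edges) as a toy input.
mesh = [(3 * r + c, 3 * r + c + 1, 1.0) for r in range(3) for c in range(2)]
mesh += [(3 * r + c, 3 * (r + 1) + c, 1.0) for r in range(2) for c in range(3)]
sparse = ultra_sparsifier(9, mesh, theta=0.1)    # tree alone is enough
dense = ultra_sparsifier(9, mesh, theta=0.99)    # nearly all edges recovered
```

A small θ keeps the support graph close to the spanning tree, while θ near 1 recovers almost the whole graph, which is the quality/efficiency tradeoff the next slide models.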
15
Performance Model Guided Sparsification
- Runtime performance model can help find the optimal θ
– Which is better: a denser or sparser support graph?
– Total runtime: $T_{tot} = N \cdot T_{GMRES} + T_{LU}$
- Denser preconditioner
– 1. Greater LU factorization time
– 2. Fewer GMRES iterations
- Sparser preconditioner
– 1. Less LU factorization time
– 2. More GMRES iterations
- Goal: minimize $T_{tot}$ by finding a proper matching factor threshold θ
[Figure: bar charts of $T_{LU}$ and $N \cdot T_{GMRES}$ for the denser and sparser preconditioners]
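The runtime model reduces to a one-line cost function once N, T_GMRES and T_LU are known or predicted for each candidate θ. The candidate values below are invented placeholders that only show the mechanics; they are not measurements from the talk.

```python
def total_runtime(n_gmres, t_gmres, t_lu):
    """T_tot = N * T_GMRES + T_LU."""
    return n_gmres * t_gmres + t_lu

def best_theta(candidates):
    """candidates maps theta -> (N, T_GMRES per iteration, T_LU)."""
    return min(candidates, key=lambda theta: total_runtime(*candidates[theta]))

# Hypothetical predictions: sparser (small theta) means cheap LU but many
# iterations; denser (large theta) means expensive LU but few iterations.
toy_candidates = {
    0.2: (400, 0.05, 2.0),
    0.5: (120, 0.08, 6.0),
    0.8: (60, 0.15, 30.0),
}
theta_star = best_theta(toy_candidates)
```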
16
Finding the Optimal Weighted Degree Threshold θ
- Optimal weighted degree threshold θ
– Exploit symbolic matrix factorization results to quickly identify optimal θ
– E.g. find θ that maximizes the flops change of Cholesky factorizations
17
Performance Modeling Results
- Experimental results for IBM power grid benchmarks
[Figure: runtime and flops vs. weighted degree threshold θ]
[Figure: runtime results of manual and automatic sparsification schemes]
18
Test Cases for Experiments
CKT    #nunk  #Mos  #R    #C    #L    #I
ldo1   3M     84K   6M    250K  7K    250K
ldo2   5M     71K   10M   422K  12K   422K
pg1    3M     144   6M    250K  7K    250K
pg2    6M     144   11M   490K  14K   490K
clk1   3M     65K   6M    3M    -
clk2   6M     65K   11M   6M    -

Circuit Design Parameters:
- #nunk: number of unknowns in the circuits
- #Mos: number of MOSFETs
- #R: number of resistors
- #C: number of capacitors
- #L: number of inductors
- #I: number of current sources
Three Circuit Design Types:
- ldo: large PDNs with on-chip VRs
- pg: large PDNs with power gating
- clk: clock distribution network
19
Results of Performance Model Guided Sparsification
- Experimental results for a large PDN with multiple VRs
– Performance-guided sparsification approach achieves nearly-optimal runtime
[Figure: runtime of a single NR step using different θ]
20
Experimental Results
- Runtime comparison for transient analysis (100 time steps)

CKT    #NR   Direct Time (s)   GPSCP #GMRES   GPSCP Time (s)   Speedup
ldo1   237   279,629           4,130          15,368           18X
ldo2   314   -                 3,979          23,793           -
pg1    222   108,784           3,381          10,204           11X
pg2    421   185,892           3,478          14,206           13X
clk1   132   50,688            1,452          3,493            14X
clk2   219   112,497           2,555          8,001            14X

- Memory comparison

CKT    Direct    GPSCP (memory / reduction)
ldo1   4.2GB     0.8GB / 5X
ldo2   -         1.1GB / -
pg1    3.2GB     0.8GB / 4X
pg2    7.8GB     1.6GB / 5X
clk1   4.3GB     0.8GB / 5X
clk2   10.0GB    1.4GB / 7X
21
Experimental Results (2)
- A large PDN with embedded multiple VRs
22
RF Simulation Methods
- For nonlinear RF circuits, output is usually quasi-periodic
– SPICE may require simulating many periods to reach steady state
– Time-domain shooting method cannot handle distributed devices
- Harmonic Balance (HB) analysis for steady-state RF simulation
– HB analysis can capture the steady-state spectral response directly
– Harmonic balance also refers to balancing the current between linear and nonlinear portions at every harmonic frequency
[Figure: nonlinear circuit driven by cos(ωt) with output v; the output may contain frequencies other than ω. Plots: frequency domain (MHz vs. dB) and time domain (ps vs. voltage in V)]
23
HB Analysis of RF Circuits
- Non-autonomous circuit analysis [1]
– x(t): state variables
– y(t): impulse response function of linear circuit components
– q(·): dynamic nonlinearities
– f(·): static nonlinearities
– b(t): time-dependent excitation sources
– x(t), q(·), f(·) are typically periodic functions
[1] K. S. Kundert and A. Sangiovanni-Vincentelli. Simulation of Nonlinear Circuits in the Frequency Domain. IEEE Trans. CAD, 1986
24
HB Analysis of RF Circuits (2)
- HB Jacobian matrix (frequency domain)
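– $J_{hb} = \Gamma G \Gamma^{-1} + j 2\pi f\, \Omega\, \Gamma C \Gamma^{-1} + Y$
– $\Gamma$ and $\Gamma^{-1}$ represent the Fast Fourier Transform (FFT) and Inverse Fast Fourier Transform (IFFT), respectively
– G and C denote the linearizations of f() and q() at the S time-domain sample points (S = 2k+1, where k is the number of positive frequencies):
  $G = \mathrm{diag}\left(\left.\frac{\partial f}{\partial x}\right|_{t_1}, \left.\frac{\partial f}{\partial x}\right|_{t_2}, \ldots, \left.\frac{\partial f}{\partial x}\right|_{t_S}\right), \quad C = \mathrm{diag}\left(\left.\frac{\partial q}{\partial x}\right|_{t_1}, \left.\frac{\partial q}{\partial x}\right|_{t_2}, \ldots, \left.\frac{\partial q}{\partial x}\right|_{t_S}\right)$
– $\Omega = \mathrm{diag}(-kI, \ldots, kI)$
– $J_{hb}$ includes lots of dense blocks introduced by $\Gamma C \Gamma^{-1}$ and $\Gamma G \Gamma^{-1}$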
25
Challenges in Harmonic Balance (HB) Analysis
- Direct methods for RF HB circuit simulation (A. Mehrotra et al, DAC'09)
– Challenged by solving large yet non-sparse Jacobian matrices
– Cons: comp./memory cost grows quickly with circuit size
- Traditional iterative methods for HB analysis (P. Feldmann et al, CICC'96; W. Dong et al, TCAD'09)
– Pros: black-box, matrix-oriented, memory-efficient
– E.g. ILU preconditioner, domain-decomposition preconditioner
– Cons: inefficient/unreliable for strongly nonlinear RF systems
- Dense circulant matrices due to FFT/IFFT operations
– $\Gamma \cdot G \cdot \Gamma^{-1} = \begin{bmatrix} G_1 & G_s & \cdots & G_2 \\ G_2 & G_1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & G_s \\ G_s & \cdots & G_2 & G_1 \end{bmatrix}$, with $G = \mathrm{diag}(g_1, g_2, \ldots, g_s)$
– $[G_1, G_2, \ldots, G_s]^T$ is the FFT of $[g_1, g_2, \ldots, g_s]^T$
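The circulant structure can be verified numerically for the scalar case, where each g_i is a single conductance sample. The sketch builds an unnormalized DFT matrix by hand with the standard library; the sample values and function names are assumptions, and with this normalization the first column equals the DFT of g divided by s.

```python
import cmath

def dft_matrix(s):
    """Unnormalized DFT matrix: Gamma[i][k] = exp(-2*pi*j*i*k/s)."""
    w = cmath.exp(-2j * cmath.pi / s)
    return [[w ** (i * k) for k in range(s)] for i in range(s)]

def transform_diagonal(g):
    """Dense matrix Gamma * diag(g) * Gamma^{-1}, with Gamma^{-1} = conj(Gamma)/s."""
    s = len(g)
    gamma = dft_matrix(s)
    return [[sum(gamma[i][k] * g[k] * gamma[k][j].conjugate() for k in range(s)) / s
             for j in range(s)]
            for i in range(s)]

# Hypothetical conductance samples at s = 5 time points.
g_samples = [1.0, 2.0, 4.0, 3.0, 1.5]
M = transform_diagonal(g_samples)

# Circulant check: entry (i, j) depends only on (i - j) mod s.
is_circulant = all(abs(M[i][j] - M[(i + 1) % 5][(j + 1) % 5]) < 1e-12
                   for i in range(5) for j in range(5))
```

A diagonal time-domain matrix with s entries thus becomes a full s-by-s block in the frequency domain, which is the source of the dense blocks in the HB Jacobian.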
26
Graph Sparsification Approach to HB Analysis
- From graph sparsification to Jacobian matrix sparsification
– Modified nodal analysis (MNA) matrix reduction: 20% ~ 38% fewer entries
– Reduction of fill-ins during LU: 60%
– LU factorization speedup: 50X
[Figure: MNA matrix and the corresponding HB Jacobian matrix, with fill-ins during LU and block fill-ins during LU, shown before and after graph sparsification]
27
Conclusion
- Graph sparsification approaches to circuit simulations
– MNA matrix decomposition into Laplacian and Complement matrices
– Performance-guided graph sparsification of Laplacian matrix
– Support-circuit preconditioner construction
- Our preliminary results
– Highly reliable convergence for time/frequency domain simulations
– Up to 18X (21X) speedup and 7X (6X) memory reduction for time (frequency) domain simulations
– Scalable to large post-layout integrated circuits
- Future work
– Will explore spectral graph sparsification methods
– Will exploit heterogeneous CPU-GPU computing platforms
28
Nonlinear Devices Evaluation in HB
- Evaluation of nonlinear devices
– Freq->Time: terminal voltage waveforms
– Time domain: evaluate current (derivative) waveforms
– Time->Freq: currents (derivatives) in freq. domain
– Flow: terminal voltage spectrum -> IFFT/IAPDFT -> terminal voltage samples -> device evaluation -> Ids samples -> FFT/APDFT (Almost-Periodic DFT) -> Ids spectrum
- Terminal voltage samples
– Need sampling at 2k+1 time points (k is the number of positive frequencies) according to the Nyquist–Shannon sampling theorem.
29
Support-Circuit Preconditioner for HB Analysis
- Step 1: MNA matrix decomposition of linearized RF circuit
– Laplacian Matrix (P): passive devices such as resistors, capacitors, etc
– Complement Matrix (A): active devices such as transconductances, etc
[Figure: RF circuit (M1, L1, L2, R1, R2, C1, C2) and its linearized circuits at t1 through ts (Cgd, Cgs, gds, gmVgs, nodes 1 to 5), each split into a Laplacian matrix P_t1 ... P_ts and a complement matrix A_t1 ... A_ts]
t1~ts are the s sampled time points
30
Support-Circuit Preconditioner for HB Analysis (2)
- Step 2: Representative Laplacian matrix construction
– Different sampled time points have different entry values
– Normalize the scaled Laplacian matrices of all sampled time points
[Figure: P_t1, P_t2, …, P_ts -> normalize -> average -> representative Laplacian matrix]
31
Support-Circuit Preconditioner for HB Analysis (3)
- Step 3: Sparsification Pattern Extraction
– Convert matrix to weighted graph
– Sparsify the weighted graph and convert back to matrix form
– Combine with the complement matrix
[Figure: representative Laplacian matrix -> original weighted graph (nodes 1 to 5; edge weights g1+C2/h, gds+Cds/h, C1/h, Cgd/h, Cgs/h, g2) -> ultra-sparsifier -> sparsified representative Laplacian matrix; combined with the complement matrix to give the sparsification pattern matrix]
32
Support-Circuit Preconditioner for HB Analysis (4)
- Step 4: MNA Matrix Sparsification
[Figure: the sparsification pattern matrix is applied to the system MNA matrices at t1, t2, …, ts to produce the sparsified system MNA matrices at t1, t2, …, ts]
33
Support-Circuit Preconditioner for HB Analysis (5)
- Circulant matrix in HB
– $\Gamma \cdot G \cdot \Gamma^{-1} = \begin{bmatrix} G_1 & G_s & \cdots & G_2 \\ G_2 & G_1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & G_s \\ G_s & \cdots & G_2 & G_1 \end{bmatrix}$, with $G = \mathrm{diag}(g_1, g_2, \ldots, g_s)$ and $[G_1, G_2, \ldots, G_s]^T$ the FFT of $[g_1, g_2, \ldots, g_s]^T$
- Step 5: Support circuit block preconditioner generation
– Original matrix: all variables of a single harmonic grouped together
– Permuted matrix: all the harmonics of a single variable grouped together
[Figure: sparsified MNA matrix -> FFT -> permutation -> permuted matrix (support circuit preconditioner)]
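The reordering in Step 5 is an index permutation between harmonic-major and variable-major layouts. The sketch below builds that permutation for 0-based indices; the function name and the tiny sizes are assumptions for illustration.

```python
def harmonic_to_variable_major(num_vars, num_harmonics):
    """Return perm with perm[new_index] = old_index.

    Old layout: all variables of harmonic h are contiguous (h * num_vars + v).
    New layout: all harmonics of variable v are contiguous (v * num_harmonics + h).
    """
    perm = [0] * (num_vars * num_harmonics)
    for h in range(num_harmonics):
        for v in range(num_vars):
            perm[v * num_harmonics + h] = h * num_vars + v
    return perm

def permute_vector(x, perm):
    """Reorder a vector from the old layout to the new layout."""
    return [x[old] for old in perm]

# Two circuit variables, three harmonics.
perm = harmonic_to_variable_major(2, 3)
labels = ["h0v0", "h0v1", "h1v0", "h1v1", "h2v0", "h2v1"]
reordered = permute_vector(labels, perm)
```

After the permutation, each nonzero of the sparsified MNA pattern corresponds to one dense harmonic block, so the preconditioner inherits the block sparsity of the MNA matrix.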
34
Case Study : Double-balanced Gilbert Mixer
- MOSFET linearization model
- Linearized passive network (Laplacian matrix) extraction
[Figure: double-balanced Gilbert mixer schematic (M1 to M6, R1 to R8, R10, L0 to L3, C0, C1, inputs Vlo+/Vlo-, Vrf+/Vrf-, supply VDD) with node indices; MOSFET small-signal model between terminals G, D, S, B (Rds, Cgd, Cgs and the controlled sources on Vgs and Vbs); extracted passive network]
[xx] denotes node index
35
Case Study : Double-balanced Gilbert Mixer (cont.)
- Ultra-sparsifier support graph construction
– Step 1: Extract maximum spanning tree
– Step 2: Restore critical edges until reaching a desired approximation
[Figure: Laplacian graph, maximum spanning tree, and ultra-sparsifier of the mixer's passive network, drawn on the same node set]
36
HB Simulation Engine on CPU-GPU Platform
[Figure: NR flow chart: Start -> device evaluation -> support-circuit preconditioner -> preconditioner factorization -> GMRES iterations -> convergence checking -> End]
- Decompose MNA matrix into passive and active matrices
– 1. Performance modeling based sparsification configuration
– 2. Construct representative passive matrix
– 3. Extract sparsification pattern
– 4. Sparsify MNA matrix
– 5. Generate support-circuit preconditioner
- GPU-based block LU decomposition
- Matrix-free iterative solver
37
Runtime Performance Modeling
- Lookup table (LUT) for runtime performance modeling
– 2D LUTs predict LU factorization runtime on GPU
– Two LUTs are created for GPU matrix multiplications and matrix divisions
[Figure: runtime performance lookup table for GPU-based matrix operations, indexed by matrix operation batch size and matrix size, with bilinear interpolation between entries]
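A 2D lookup table with bilinear interpolation can be sketched as below. The axis values and runtimes are invented placeholders, not profiled GPU data, and the clamping behaviour outside the table is a design assumption of this sketch.

```python
from bisect import bisect_right

def bilinear_lookup(sizes, batches, table, size, batch):
    """Interpolate table[i][j] (runtime at sizes[i], batches[j]).

    Both axes must be sorted ascending and have at least two entries.
    Queries outside the table are clamped to the nearest edge.
    """
    def bracket(axis, value):
        hi = min(max(bisect_right(axis, value), 1), len(axis) - 1)
        lo = hi - 1
        t = (value - axis[lo]) / (axis[hi] - axis[lo])
        return lo, hi, min(max(t, 0.0), 1.0)

    i0, i1, ts = bracket(sizes, size)
    j0, j1, tb = bracket(batches, batch)
    # Interpolate along the batch axis on both bracketing size rows,
    # then along the size axis.
    low_row = table[i0][j0] * (1 - tb) + table[i0][j1] * tb
    high_row = table[i1][j0] * (1 - tb) + table[i1][j1] * tb
    return low_row * (1 - ts) + high_row * ts

# Hypothetical LUT: matrix sizes 16 and 32, batch sizes 100 and 200.
lut_sizes = [16, 32]
lut_batches = [100, 200]
lut_runtime = [[1.0, 2.0],
               [3.0, 4.0]]
predicted = bilinear_lookup(lut_sizes, lut_batches, lut_runtime, 24, 150)
```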
38
Parallel Sparse Block LU Factorization
- Representative Sparsified MNA Matrix (test matrix)
– Approximates the properties of block sparse matrix
– Created by averaging all sparsified MNA matrices
– Factorized to get the fill-ins' locations
[Figure: sparsified system MNA matrices at t1, t2, …, ts are averaged into the test matrix, which is LU-factorized into L and U factors to locate the fill-ins]
39
Parallel Sparse Block LU Factorization (cont.)
- Data dependency graph
– Column k depends on column j, when U(j, k) != 0 [1]
– Can be derived from U matrix
[Figure: 9x9 U-factor pattern and the resulting column dependency graph, arranged in levels 0 to 4]
[1] J. Gilbert and T. Peierls. Sparse partial pivoting in time proportional to arithmetic operations. SIAM J. Sci. Stat. Comput., 9(5):862–873, 1988.
40
Parallel Sparse Block LU Factorization (cont.)
- Modified data dependency graph
– Identify "fake" dependency when L(j+1:n, j) == 0
– Eliminate "fake" dependencies
[Figure: 9x9 factor pattern; the original dependency graph with levels 0 to 4 and the modified dependency graph with fewer levels]
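Levelization with optional removal of fake dependencies can be sketched as follows. Patterns are given as sets of 0-based (row, column) positions, and the tiny 4-column pattern is an invented example; columns on the same level have no mutual dependencies and could be factorized in parallel.

```python
def levelize_columns(n, u_pattern, l_pattern=None):
    """Assign each column a level from the U-factor nonzero pattern.

    u_pattern: set of (j, k) with j < k where U(j, k) != 0, so column k
               depends on column j.
    l_pattern: optional set of (i, j) with i > j where L(i, j) != 0. When
               given, a dependency on column j is dropped as "fake" if
               L(j+1:n, j) is entirely zero.
    """
    deps = {k: [] for k in range(n)}
    for j, k in u_pattern:
        if j < k:
            deps[k].append(j)
    if l_pattern is not None:
        has_subdiagonal = {j for i, j in l_pattern if i > j}
        deps = {k: [j for j in js if j in has_subdiagonal]
                for k, js in deps.items()}
    level = [0] * n
    for k in range(n):  # every dependency j satisfies j < k, so it is already set
        level[k] = 1 + max((level[j] for j in deps[k]), default=-1)
    return level

# Hypothetical 4-column pattern: U(0,1), U(1,2), U(0,3) are nonzero.
u_nonzeros = {(0, 1), (1, 2), (0, 3)}
levels = levelize_columns(4, u_nonzeros)
# Only column 0 of L has entries below the diagonal, so the dependency of
# column 2 on column 1 is fake and column 2 moves up to level 0.
levels_pruned = levelize_columns(4, u_nonzeros, l_pattern={(3, 0)})
```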
41
Parallel Sparse Block LU Factorization (cont.)
- GPU-based block sparse matrix LU factorizations
– Levelize the factorization according to data dependency graph
– Each level only contains matrix multiplication and division operations
– Use batched matrix multiplication and inversion functions provided by cuBLAS
[Figure: dependency graph levels 0 to 2; for each level from 0 to n, the block divisions and block multiplications are gathered into batches and executed together to produce the result]
42
Experiment Setup
- Widely used RF circuits as the benchmark

#   CKT Name        Nodes   Tones   Freqs   Nunk
1   mixer 1         302     2       25      14798
2   mixer 2         1988    2       41      161028
3   mixer 3         5262    2       5       47358
4   mixer 4         7532    2       13      188300
5   LNA + mixer 1   343     3       63      42875
6   LNA + mixer 2   5303    3       14      143181
7   LNA + mixer 3   7573    3       14      204471

Note:
- Freqs: Number of harmonics
- Nunk: Number of unknowns
43
Runtime and Memory Efficiency on CPU
- Support-circuit preconditioned HB (SCPHB) method
– High robustness and efficiency
– Runtime speedup: 21X (compared with direct solver in DAC'09)
– Memory reduction: 6X (compared with direct solver in DAC'09)

       Direct solver         BD preconditioner    SCPHB preconditioner
CKT    Time(s)    Mem(GB)    Time(s)    K-Its     Time(s)   Mem(GB)   K-Its   Speedup
1      471.9      0.23       24.9       821       145.5     0.10      204     3.24X
2      19263.1    7.95       5637.6     6731      1408      1.72      383     13.7X
3      686.4      0.36       92.2       165       69.5      0.06      229     9.8X
4      14153.5    4.26       1072.3     273       1035.6    0.73      355     21.3X
5      2561.6     1.92       DNF        DNF       821.5     1         194     3.1X
6      4040.9     3.34       DNF        DNF       414.7     0.67      328     9.74X
7      6633.6     5.21       DNF        DNF       791       0.83      255     8.38X

K-Its: GMRES iteration number; DNF: did not finish within 1000 Newton iterations
44
Runtime Efficiency for Strong Nonlinearities
- Simulation runtime vs. input power of LNA+Mixer
– BD preconditioner: runtime increases exponentially
– SCPHB preconditioner: runtime remains nearly constant
45
Scalability
- Nearly-linear runtime and memory scalability