Linear and Nonlinear SP 2 Methods for Large Scale Eigenvalue - - PowerPoint PPT Presentation
Linear and Nonlinear SP 2 Methods for Large Scale Eigenvalue - - PowerPoint PPT Presentation
Linear and Nonlinear SP 2 Methods for Large Scale Eigenvalue Calculations Zhaojun Bai University of California, Davis Workshop on Recent Advances in Numerical Methods for Eigenvalue Problems NCTS, Taiwan, January 4-8, 2008 Linear and
Linear and Nonlinear “ SP2 ” Methods for Large Scale Eigenvalue Calculations
(SP2 = Structure-Preserving Subspace Projection)
Zhaojun Bai University of California, Davis Workshop on Recent Advances in Numerical Methods for Eigenvalue Problems NCTS, Taiwan, January 4-8, 2008
Two parts of this talk
- Part I: Linear problems
Algebraic Substructuring for eigenvalue and frequency response calculations
- Part II: Nonlinear problems
Nonlinear Rayleigh-Ritz ITerative (NRRIT) technique for solving nonlinear eigenvalue problems
Part I Algebraic Substructuring for Eigenvalue and Frequency Response Calculations joint work with
- X. S. Li, C. Yang, P. Husbands, E. Ng
LBNL
- W. Gao
Fudan University, China
- J. H. Ko
Konkuk University, South Korea
EIG and FRA
- EIG (eigenvalue problem)
Kq = λMq
σ
− → Kσq = λσMq where Kσ = K − σM, λσ = λ − σ
- FRA (frequency response calculation)
H(ω) = lT(K − ω2M + iωD)−1b = lT [γ1(ω)Kσ + γ2(ω, σ)M]−1 b ↑ σ, D = αK + βM where γ1(ω) = 1 + iωα γ2(ω, σ) = −ω2 + σ + iω(σα + β),
Substructuring methods
- Eigenvalue and frequency response calculations are ubiquitous.
- Substructuring techniques dates back to the 1960s (CMS)
- Substructuring holds great promoise for solving extremely large scale
problems, e.g., AMLS
- Issues as the techniques extended for broader applications:
– Extraction of arbitrary eigenmodes – High frequency response calculations – Accuracy enhancement – Performance tuning
Outline of Part I
- 1. Substructure partition and reduction (projection)
- 2. ASEIG: eigenvalue calculation
- 3. ASFRA: frequency response calculation
- 4. Remarks
Substructure partition I.1
- Single-level
Kσ =
⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣
Kσ
11
Kσ
13
Kσ
22 Kσ 23
Kσ
31 Kσ 32 Kσ 33
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦
M =
⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣
M11 M13 M22 M23 M31 M32 M33
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦
- Multi-level (nested dissection)
- 1
2 3 4 5 6 7 8 9 10 11 12 13 14 15
5 7 1 2 3 4 6 8 9 10 13 14 15 11 12
e.g., by METIS of Karypis and Kumar
Substructure reduction (projection) I.1
- Block elimination matrix L
- Kσ = L−TKσL−1 =
⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣
Kσ
11
Kσ
22
- Kσ
33
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ ,
- M = L−TML−1 =
⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣
M11
- M13
M22
- M23
- M31
- M32
- M33
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦
- Extraction of local modes
Sm =
⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣
S1 S2 S3
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦
← − ← − ← − (Kσ
11, M11)
(Kσ
22, M22)
(
- Kσ
33,
- M33)
- Subspace projection onto span{Sm}
Kσ
m = ST m
- KσSm =
⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣
kσ
11
kσ
22
- kσ
33
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ ,
Mm = ST
m
- MSm =
⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣
m11
- m13
m22
- m23
- m31
- m32
- m33
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦
Eigenvalue calculation I.2
- Projected (smaller) eigenvalue problem:
Kσ
mΦ = MmΦΘσ,
- “Global” eigenpairs:
Θσ = diag(θσ) = diag(Θσ
l , Θσ n, Θσ r)
Φ = (φ) =
Φl Φn Φr
- Retained modes: Φn (determined by “global cutoff values”)
Truncated modes: Φt =
Φl Φr
- Ritz pairs (Θσ
n + σ, L−1SmΦn) ≈ (λ, q)’s of (K, M)
- Why does it work, ... [Yang et al], [Voss et al], ...
AS algorithm for eigenvalue computation I.2
- 0. Substructure partition, if necessary
- 1. Block elimination and congruence transformation
- 2. Mode selection/computation for sub-structures and separators
- 3. Subspace assembling
- 4. Projection calculation
- 5. Projected eigenvalue problem
AS implementation I.2
- Major operations:
- 1. Congruence transformations and projection
- 2. projected eigenvalue problem
- Cost
- 1. flops: more than a single sparse Cholesky factorization
- 2. storage: Block Cholesky factors + projected matrices + ...
But no triangular solvers, no (re)-orthogonalization ... vs. SIL
- ASEIG package [ACM TOMS ’08]
- 1. interleaves steps 1-4 computations
- 2. recomputes some of intermediate matrix blocks instead of storing
50% of memory saving trade with 15% recompute time
Case study: accelerator cavity design (SLAC) I.2
- EMC simulation for accelerator design
- Challenges:
- 1. Small eigenvalues (tightly clustered) out of a large-eigenvalue dominated
eigenspetrum: many small nonzero eigenvalues desired
- 2. Large null space in the stiffness matrix K
- 3. Requires high accuracy for eigenpairs
Case study: accelerator cavity design (SLAC) I.2 Performance results:
- A 6-cell DDS realistic structure, N = 65, 740
- 4-level AS, nproj ≈ 3K
100 200 300 400 500 200 400 600 800 1000 1200 1400 1600 1800 2000 2200 nev Seconds SIL AMLS AMLS−Ritz
- SIL (shift-Invert Lanczos) reqiures multiple shifts (factorizations)
Frequency response calculation on [ωmin, ωmax] I.3
- H(ω)-projection onto span{L−1Sm}:
Hm(ω) = lT
m(γ1Kσ m + γ2Mm)−1bm = lTpm(ω)
- Frequency response equation
(γ1Kσ
m + γ2Mm)
- Gm(ω)
pm(ω) = bm
- Recall: global modes Φ = (φ) =
- Φl Φn Φr
- f (Kσ
m, Mm)
- Partition
pm(ω) = pn(ω) + pt(ω) and pn(ω) ∈ span{Φn}, pt(ω) ∈ span{
Φl Φr
- }
- Phase 1: compute pn(ω) directly
pn(ω) = Φn (γ1Θσ
n + γ2I)−1Φn Tbm
- .
- Phase 2: compute pt(ω) iteratively
– iterative refinement pℓ
t(ω) = pℓ−1 t
(ω) + ∆pℓ
t(ω),
– Initial p0
t(ω) given by an extrapolation from the previous frequency point ω′
– Gerlerkin projection approximation for solving Gm(ω)∆pℓ
t(ω) = rℓ−1 m (ω)
– Frequency sweeping iteration (FSI) pℓ
t(ω) = pℓ−1 t
(ω) + 1 γ1
- (Kσ
m)−1 − Φn(Θσ n)−1Φn T
- rℓ−1
m (ω)
- ≈∆pℓ
t(ω)
- ... [Bennighof and Kaplan] ... [Ko and B.]
What local eigenmodes to retain? I.3
- Convergence analysis of FSI =
⇒ cutoff values for the modes Φn to be retained
- Truncated modal residuals
ΦT
t rℓ m(ω)
- = −γ2
γ1 Θσ
t ΦT t rℓ−1 m (ω)
- .
- Contraction for the truncated mode Φt = (φk):
- φT
k rℓ m(ω)
φT
k rℓ−1 m (ω)
- ≤ dmax
|θσ
k| ≤ ξ < 1,
where dmax = max{d(ωk, σ), 1 ≤ k ≤ nf}, d(ω, σ) =
- −γ2
γ1
- Global cutoff values (to determine retained modes Φn) of (Kσ
m, Mm):
λσ
min = −dmax
ξ and λσ
max = dmax
ξ . i.e., retained eigenpairs of the original matrix pair (K, M) within [λmin, λmax] =
⎡ ⎢ ⎢ ⎣σ − dmax
ξ , σ + dmax ξ
⎤ ⎥ ⎥ ⎦
ASFRA = ASEIG + FRA I.3
- Initial steps: σ, ... cutoff values
- Compute Kσ
m, Mm, bm, lm
- Compute the retained eigenpairs (Θσ
n, Φn) of (Kσ m, Mm)
- FSI:
– calculate pn(ωk) by retained modes Φn – set the initial response p0
t(ωk)
– ℓ–loop for the frequency point ωk ∗ update residual rℓ−1
m (ωk)
∗ calculate the correction ∆pℓ
t(ωk)
∗ test for convergence ... ∗ compute pℓ
t(ωk)
– pm(ωk) = pn(ωk) + pℓ
t(ωk)
– Hm(ωk) = lT
mpm(ωk)
Case study: checkerboard filter I.3
- FE simulation of a prototype checkerboard MEMS resonator for a
high-frequency bandpass filter for, e.g., the surface acoustic wave devices
- D+
D− D+ D− S+ S+ S− S−
The SEM picture of a fabricated device and FE models
courtesy of David Bindel, Sunil Behave, Roger Howe.
- N = 15258, [fmin, fmax] = [230, 250]MHz,
- Performance
checkerboard filter N = 15258 Direct Solution SIL ASFRA0 ASFRA m (AS subspace) – – 1635 307 n (retained modes) – 242 231 37 Total FS iter – – 96 145 Elapsed time (sec.) 1612.6 86.23 208.15 26.72
Remarks I.4
- 1. ASEIG + ASFRA package
General purpose, memory efficient, application tunable, ...
- 2. Better understanding of AS accuracy and applicability
- 3. Performance evaluation
tuning parameters: number of AS levels, local and global cutoff values, stopping criteria, ...
- 4. To some extent, ASEIG + ASFRA generalizes the underlying algorithms,
funcationality and applicability of commecial viable AMLS of Bennighof et al.
Part II Nonlinear Rayleigh-Ritz ITerative (NRRIT) technique for solving nonlinear eigenvalue problems joint work with Ben-Shan Liao
Simens
Rich Lee and Kwak Ko
Stanford Linear Accelerator Center
SciDAC SciDAC ESS Team ESS Team
Computational Mathematics
- L. Lee, L. Ge, V. Akcelik
- C. Sheng, H. Jiang,
- E. Prudencio
Accelerator Modeling
- A. Kabel, K. Ko
- Z. Li, C. Ng,
- A. Candel
Computing Technologies
- N. Folwell, A. Guetz,
- G. Schussman,
- R. Uplenchwar
Stanford
- G. Golub
SNL
- P. Knupp, T. Tautges,
- K. Devine
LBNL
- E. Ng, W. Gao, P.
Husbands, X. Li,
- C. Yang
LLNL
- L. Diachin, D. Brown, K. Chand,
- B. Henshaw, D. Quinlan
UCD
- B. Liao, Z. Bai,
- K. Ma, H. Yu,
ISICs – TSTT, TOPS, PERC; SAPP- Stanford, LBNL, UCD
Advanced Computations Department
Columbia
- D. Keyes
RPI
- M. Shephard, A.
Bauer, E. Seol CMU
- O. Ghattas
International Linear International Linear Collider Collider (ILC) (ILC)
http://www.linearcollider.org/
The ILC is a proposed new electron-positron collider that would allow physicists to answer compelling questions on identity of dark matter to the existence of extra dimensions. In the ILC's design, two facing linear accelerators, each 20 kilometers long, accelerate electrons and positrons to TeV energy using superconducting accelerating cavities. The Global Design Effort will establish the design of the ILC, focusing the efforts
- f hundreds of accelerator
scientists and particle physicists in North America, Europe and Asia.
Various ILC Cavities Various ILC Cavities
BCD Low-Loss SST STF1 ICHIRO
Outline of Part II
- 1. NEP
- 2. Initial approximations and ordering
- 3. NEP methods
- 4. NRRIT (Nonlinear Arnoldi Method)
- 5. Numerical experiments
- 6. Ongoing work
NEP II.1
- NEP
T(λ)x = 0 where T(λ) = K − λM + E(λ),
- Origin: Nedelec-type FE discretization of the frequency domain Maxwell’s
equation with waveguide BCs [ISH95, KFG+06, Lee05a, Lee05b]
- KT = K ≥ 0, MT = M > 0, W T
j = Wj
- Multiple waveguide modes,
E(λ) = i
p
- j=1
- (λ − σ2
j) Wj,
σj are given nonnegative scalars (cutoff values), i = √−1.
- Future: coupling of multiple waveguide modes, where
E(λ) = i
p
- j=1
- (λ − σ2
j) W TE j
+ i
p
- j=1
λ
- (λ − σ2
j) W TM j
Eigenvalues of interest II.1
- Let κ =
√ λ, resonant frequency = f(κ) = c 2π · Re(κ), external Qe-value = Qe(κ) = 1 2 · Re(κ) Im(κ).
- The external Qe-values measure the electromagnetic coupling between the
cavity and waveguide and characterize the energy loss.
- Computational task: find eigenvalues λ satisfying
⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩
κ = √ λ is close to the shift value σ0 = 2π
c f0 and
λ ∈ D =
- λ | λ = κ2, Re(κ) > σ0, Im(κ) > 0 and Qe(κ) >
- Qe > 1
- ,
where f0 and
- Qe are prescribed.
Region for the desired eigenvalues II.1
150 200 250 300 350 400 5 10 15 20 25 30 Real part of square root of eigenvalues Imaginary part of square root of eigenvalues Qe=10 σ Region of interest 1 2 3 4 5 6 7 8 9 10 shift point
- : “exact” eigenvalues;
×: initial approximations
Initial approximations and ordering II.2
- Linearization via the first order truncation:
T(λ) ≈ T(λ0) + (λ − λ0)T ′(λ0) ≡
- K(λ0) − λ
- M(λ0),
- Initial approximations:
selected eigenpairs (θℓ, vℓ) of (
- K(σ2
0),
- M(σ2
0))
- Ordering: selected eigenpairs (θℓ, vℓ) of
|θ1/2
1
− σ0| ≤ |θ1/2
2
− σ0| ≤ · · · ≤ |θ1/2
n
− σ0|.
- Example:
T(λ)x =
- K − λM + i
- (λ − σ2
1) W1
- x = 0 ⇔ QEP
Waveguide Cavity
Ω Γ
E
Γ
M
Γ
V
Initial approximations and ordering II.2
150 200 250 300 350 400 5 10 15 20 25 30 Real part of square root of eigenvalues Imaginary part of square root of eigenvalues Qe=10 σ Region of interest 1 2 3 4 5 6 7 8 9 10 shift point
- : “exact” eigenvalues;
×: initial approximations
NEP methods for T(λ)x = 0 II.3
- Newton’s methods:
Inverse ITeration (IIT), Residual Inverse ITeration (RIIT) Nonlinear Rayleigh Quotient Iteration (NRQI) Safeguarded Newton method
- Linearization Methods:
Picard iteration, Self-Consistent Iteration (SCI) Successive Linear Approximation Method (SLAM)
- “Nonlinear” Subspace Projection Methods ←
− this talk
Nonlinear Arnoldi method Nonlinear Rayleigh-Ritz ITerative (NRRIT)
Must-read papers:
- A. Ruhe, Algorithms for the nonlinear eigenvalue problem, SIAM J. Numer. Anal, Vol.10,
No.4, pp.674–689, 1973
- V. Mehrmann and H. Voss, Nonlinear eigenvalue problems: a challenge for modern eigenvalue
methods, 2004 (http://www.tu-harburg.de/mat/hp/voss)
Nonlinear Rayleigh-Ritz ITerative (NRRIT) technique II.4 NRRIT framework:
- 1. Select a proper projection subspace V.
- 2. Compute a proper pair (θ, z) satisfying the Galerkin condition:
T(θ)z ⊥ V for z ∈ V.
- 3. Expand or restart the projection subspace V
Origin: Nonlinear Arnoldi method [Betcke and Voss,’04], [Voss,’04]
NRRIT – matrix version II.4
- 1. Compute an orthonormal basis Q of the properly selected subspace V (and
initial approximations (θℓ, vℓ))
- 2. Compute an eigenpair (θ, g) with initials (θℓ, QTvℓ) of the reduced NEP
TQ(θ)g = 0, where TQ(θ) = QHT(θ)Q
- 3. Compute Ritz pairs (θ, Qg) of the original NEP
- 4. Expand or restart Q as necessary
NRRIT in practice II.4 Key issues:
- 1. what’a proper initial projection subspace V?
- 2. what’s the order to compute the desired eigenvalues (proper ordering of the
initial approximations)?
- 3. how to expand or restart Q when necessary?
Main features of our implementation of NRRIT II.4
- 1. Proper initial approximations (θℓ, vℓ):
K − λM vs.
- K(σ2
0) − λ
- M(σ2
0)
- 2. Ordering (θℓ, vℓ) by
|θ1/2
1
− σ0| ≤ |θ1/2
2
− σ0| ≤ · · · ≤ |θ1/2
n
− σ0|.
- 3. Preserving real symmetry with the projection matrix
Q = orth(
- Re(V ) Im(V )
- ),
- 4. Structure-preserved reduced NEP
TQ(θ)y = 0, where TQ(θ) = QTT(θ)Q = KQ − θMQ + i
p
- j=1 (θ − σ2
j)
1 2WQ,j
with KQ = QHKQ, MQ = QTMQ and WQ,j = QTWjQ.
- 5. Subspace expansion: with the current Ritz pair (σ, z),
v := v − QQTv, v = T −1(σ2
0)T(θ)z
and Q :=
- Q
Re(v) Im(v)
- 6. Restarting and purging:
(a) restart with Q by replacing vℓ ← − z (b) purge all vectors v induced by the initial (θℓ, vℓ) leading to the failure of convergence
- 7. Reference:
Ben-Shan Liao, Subspace projection methods for model order reduction and nonlinear eigenvalue computation, PhD Thesis, UC Davis, 2007
Numerical experiements II.5 Example 1: the NEP with one cutoff values, N = 10102 T(λ)x =
- K − λM + i
- (λ − σ2
1)W1+
- x = 0,
Equivalent Quadratic Eigenvalue Problem: (ν2M + νW +
- K)x = 0
where ν = (λ − σ2
1)1/2, W = iW1 and
- K = σ2
1M − K.
λ1 SOAR (1.9474685976542845+E02, 4.6015939351662103E-01) NRRIT (1.9474685976743996+E02, 4.6015939593544449E-01) λ2 SOAR . . . . . . NRRIT . . . . . .
Numerical experiements II.5 Example 2: the NEP with two cutoff values T(λ)x =
- K − λM + i
- (λ − σ2
1)W1 + i
- (λ − σ2
2)W2
- x = 0,
N = 9956, nnz(K) = nnz(M) = 148, 318. nnz(W1) = 57, nnz(W2) = 293.
Waveguide Open Cavity
Ω Γ
E
Γ
M
Γ
V
Γ
V
Numerical experiements II.5 Example 2: CPU in second: IIT SLAM NRRIT ǫ Iters E-Time Iters E-Time Iters E-Time 10−8 15 71.98 12 79.48 46 22.95 10−10 18 85.92 16 96.19 69 32.91 10−12 22 104.47 20 115.33 87 42.08 10−14 24 113.97 21 120.22 114 57.59 IIT = Inverse ITeration SLAM = Successive Linear Approximation Method Convergence threshold: T(θ)Qgℓ T(θ) Qgℓ ≤ ǫ for ℓ = 1, 2, . . . , 10
Ongoing work II.6
- Understanding of NRRIT: subspace expansion, restarting, convergence analysis
- Accuracy of computed eigenvalues (external Qe-values)
- Verification: how to enumerate eigenvalues and detect missing ones?
- Recent work of Huang and Su on NEP arising from fiber optic design
[Kaufman ’06]:
- A + s(λ)uuT
- x = λx
- Large-scale NEP (cut-off values p = 8 ∼ 24, DOFs = 106 ∼ 107)
- SLAC’s Omega3P package for next-generation accelerator design (DOE