Linear and Nonlinear SP 2 Methods for Large Scale Eigenvalue - - PowerPoint PPT Presentation

linear and nonlinear sp 2 methods for large scale
SMART_READER_LITE
LIVE PREVIEW

Linear and Nonlinear SP 2 Methods for Large Scale Eigenvalue - - PowerPoint PPT Presentation

Linear and Nonlinear SP 2 Methods for Large Scale Eigenvalue Calculations Zhaojun Bai University of California, Davis Workshop on Recent Advances in Numerical Methods for Eigenvalue Problems NCTS, Taiwan, January 4-8, 2008 Linear and


slide-1
SLIDE 1

Linear and Nonlinear “ SP2 ” Methods for Large Scale Eigenvalue Calculations

Zhaojun Bai University of California, Davis Workshop on Recent Advances in Numerical Methods for Eigenvalue Problems NCTS, Taiwan, January 4-8, 2008

slide-2
SLIDE 2

Linear and Nonlinear “ SP2 ” Methods for Large Scale Eigenvalue Calculations

(SP2 = Structure-Preserving Subspace Projection)

Zhaojun Bai University of California, Davis Workshop on Recent Advances in Numerical Methods for Eigenvalue Problems NCTS, Taiwan, January 4-8, 2008

slide-3
SLIDE 3

Two parts of this talk

  • Part I: Linear problems

Algebraic Substructuring for eigenvalue and frequency response calculations

  • Part II: Nonlinear problems

Nonlinear Rayleigh-Ritz ITerative (NRRIT) technique for solving nonlinear eigenvalue problems

slide-4
SLIDE 4

Part I Algebraic Substructuring for Eigenvalue and Frequency Response Calculations joint work with

  • X. S. Li, C. Yang, P. Husbands, E. Ng

LBNL

  • W. Gao

Fudan University, China

  • J. H. Ko

Konkuk University, South Korea

slide-5
SLIDE 5

EIG and FRA

  • EIG (eigenvalue problem)

Kq = λMq

σ

− → Kσq = λσMq where Kσ = K − σM, λσ = λ − σ

  • FRA (frequency response calculation)

H(ω) = lT(K − ω2M + iωD)−1b = lT [γ1(ω)Kσ + γ2(ω, σ)M]−1 b ↑ σ, D = αK + βM where γ1(ω) = 1 + iωα γ2(ω, σ) = −ω2 + σ + iω(σα + β),

slide-6
SLIDE 6

Substructuring methods

  • Eigenvalue and frequency response calculations are ubiquitous.
  • Substructuring techniques dates back to the 1960s (CMS)
  • Substructuring holds great promoise for solving extremely large scale

problems, e.g., AMLS

  • Issues as the techniques extended for broader applications:

– Extraction of arbitrary eigenmodes – High frequency response calculations – Accuracy enhancement – Performance tuning

slide-7
SLIDE 7

Outline of Part I

  • 1. Substructure partition and reduction (projection)
  • 2. ASEIG: eigenvalue calculation
  • 3. ASFRA: frequency response calculation
  • 4. Remarks
slide-8
SLIDE 8

Substructure partition I.1

  • Single-level

Kσ =

⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣

11

13

22 Kσ 23

31 Kσ 32 Kσ 33

⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦

M =

⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣

M11 M13 M22 M23 M31 M32 M33

⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦

  • Multi-level (nested dissection)
  • 1

2 3 4 5 6 7 8 9 10 11 12 13 14 15

5 7 1 2 3 4 6 8 9 10 13 14 15 11 12

e.g., by METIS of Karypis and Kumar

slide-9
SLIDE 9

Substructure reduction (projection) I.1

  • Block elimination matrix L
  • Kσ = L−TKσL−1 =

⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣

11

22

33

⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ ,

  • M = L−TML−1 =

⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣

M11

  • M13

M22

  • M23
  • M31
  • M32
  • M33

⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦

  • Extraction of local modes

Sm =

⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣

S1 S2 S3

⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦

← − ← − ← − (Kσ

11, M11)

(Kσ

22, M22)

(

33,

  • M33)
  • Subspace projection onto span{Sm}

m = ST m

  • KσSm =

⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣

11

22

33

⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ ,

Mm = ST

m

  • MSm =

⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣

m11

  • m13

m22

  • m23
  • m31
  • m32
  • m33

⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦

slide-10
SLIDE 10

Eigenvalue calculation I.2

  • Projected (smaller) eigenvalue problem:

mΦ = MmΦΘσ,

  • “Global” eigenpairs:

Θσ = diag(θσ) = diag(Θσ

l , Θσ n, Θσ r)

Φ = (φ) =

Φl Φn Φr

  • Retained modes: Φn (determined by “global cutoff values”)

Truncated modes: Φt =

Φl Φr

  • Ritz pairs (Θσ

n + σ, L−1SmΦn) ≈ (λ, q)’s of (K, M)

  • Why does it work, ... [Yang et al], [Voss et al], ...
slide-11
SLIDE 11

AS algorithm for eigenvalue computation I.2

  • 0. Substructure partition, if necessary
  • 1. Block elimination and congruence transformation
  • 2. Mode selection/computation for sub-structures and separators
  • 3. Subspace assembling
  • 4. Projection calculation
  • 5. Projected eigenvalue problem
slide-12
SLIDE 12

AS implementation I.2

  • Major operations:
  • 1. Congruence transformations and projection
  • 2. projected eigenvalue problem
  • Cost
  • 1. flops: more than a single sparse Cholesky factorization
  • 2. storage: Block Cholesky factors + projected matrices + ...

But no triangular solvers, no (re)-orthogonalization ... vs. SIL

  • ASEIG package [ACM TOMS ’08]
  • 1. interleaves steps 1-4 computations
  • 2. recomputes some of intermediate matrix blocks instead of storing

50% of memory saving trade with 15% recompute time

slide-13
SLIDE 13

Case study: accelerator cavity design (SLAC) I.2

  • EMC simulation for accelerator design
  • Challenges:
  • 1. Small eigenvalues (tightly clustered) out of a large-eigenvalue dominated

eigenspetrum: many small nonzero eigenvalues desired

  • 2. Large null space in the stiffness matrix K
  • 3. Requires high accuracy for eigenpairs
slide-14
SLIDE 14

Case study: accelerator cavity design (SLAC) I.2 Performance results:

  • A 6-cell DDS realistic structure, N = 65, 740
  • 4-level AS, nproj ≈ 3K

100 200 300 400 500 200 400 600 800 1000 1200 1400 1600 1800 2000 2200 nev Seconds SIL AMLS AMLS−Ritz

  • SIL (shift-Invert Lanczos) reqiures multiple shifts (factorizations)
slide-15
SLIDE 15

Frequency response calculation on [ωmin, ωmax] I.3

  • H(ω)-projection onto span{L−1Sm}:

Hm(ω) = lT

m(γ1Kσ m + γ2Mm)−1bm = lTpm(ω)

  • Frequency response equation

(γ1Kσ

m + γ2Mm)

  • Gm(ω)

pm(ω) = bm

  • Recall: global modes Φ = (φ) =
  • Φl Φn Φr
  • f (Kσ

m, Mm)

  • Partition

pm(ω) = pn(ω) + pt(ω) and pn(ω) ∈ span{Φn}, pt(ω) ∈ span{

Φl Φr

  • }
slide-16
SLIDE 16
  • Phase 1: compute pn(ω) directly

pn(ω) = Φn (γ1Θσ

n + γ2I)−1Φn Tbm

  • .
  • Phase 2: compute pt(ω) iteratively

– iterative refinement pℓ

t(ω) = pℓ−1 t

(ω) + ∆pℓ

t(ω),

– Initial p0

t(ω) given by an extrapolation from the previous frequency point ω′

– Gerlerkin projection approximation for solving Gm(ω)∆pℓ

t(ω) = rℓ−1 m (ω)

– Frequency sweeping iteration (FSI) pℓ

t(ω) = pℓ−1 t

(ω) + 1 γ1

  • (Kσ

m)−1 − Φn(Θσ n)−1Φn T

  • rℓ−1

m (ω)

  • ≈∆pℓ

t(ω)

  • ... [Bennighof and Kaplan] ... [Ko and B.]
slide-17
SLIDE 17

What local eigenmodes to retain? I.3

  • Convergence analysis of FSI =

⇒ cutoff values for the modes Φn to be retained

  • Truncated modal residuals

ΦT

t rℓ m(ω)

  • = −γ2

γ1 Θσ

t ΦT t rℓ−1 m (ω)

  • .
  • Contraction for the truncated mode Φt = (φk):
  • φT

k rℓ m(ω)

φT

k rℓ−1 m (ω)

  • ≤ dmax

|θσ

k| ≤ ξ < 1,

where dmax = max{d(ωk, σ), 1 ≤ k ≤ nf}, d(ω, σ) =

  • −γ2

γ1

  • Global cutoff values (to determine retained modes Φn) of (Kσ

m, Mm):

λσ

min = −dmax

ξ and λσ

max = dmax

ξ . i.e., retained eigenpairs of the original matrix pair (K, M) within [λmin, λmax] =

⎡ ⎢ ⎢ ⎣σ − dmax

ξ , σ + dmax ξ

⎤ ⎥ ⎥ ⎦

slide-18
SLIDE 18

ASFRA = ASEIG + FRA I.3

  • Initial steps: σ, ... cutoff values
  • Compute Kσ

m, Mm, bm, lm

  • Compute the retained eigenpairs (Θσ

n, Φn) of (Kσ m, Mm)

  • FSI:

– calculate pn(ωk) by retained modes Φn – set the initial response p0

t(ωk)

– ℓ–loop for the frequency point ωk ∗ update residual rℓ−1

m (ωk)

∗ calculate the correction ∆pℓ

t(ωk)

∗ test for convergence ... ∗ compute pℓ

t(ωk)

– pm(ωk) = pn(ωk) + pℓ

t(ωk)

– Hm(ωk) = lT

mpm(ωk)

slide-19
SLIDE 19

Case study: checkerboard filter I.3

  • FE simulation of a prototype checkerboard MEMS resonator for a

high-frequency bandpass filter for, e.g., the surface acoustic wave devices

  • D+

D− D+ D− S+ S+ S− S−

The SEM picture of a fabricated device and FE models

courtesy of David Bindel, Sunil Behave, Roger Howe.

slide-20
SLIDE 20
  • N = 15258, [fmin, fmax] = [230, 250]MHz,
slide-21
SLIDE 21
  • Performance

checkerboard filter N = 15258 Direct Solution SIL ASFRA0 ASFRA m (AS subspace) – – 1635 307 n (retained modes) – 242 231 37 Total FS iter – – 96 145 Elapsed time (sec.) 1612.6 86.23 208.15 26.72

slide-22
SLIDE 22

Remarks I.4

  • 1. ASEIG + ASFRA package

General purpose, memory efficient, application tunable, ...

  • 2. Better understanding of AS accuracy and applicability
  • 3. Performance evaluation

tuning parameters: number of AS levels, local and global cutoff values, stopping criteria, ...

  • 4. To some extent, ASEIG + ASFRA generalizes the underlying algorithms,

funcationality and applicability of commecial viable AMLS of Bennighof et al.

slide-23
SLIDE 23

Part II Nonlinear Rayleigh-Ritz ITerative (NRRIT) technique for solving nonlinear eigenvalue problems joint work with Ben-Shan Liao

Simens

Rich Lee and Kwak Ko

Stanford Linear Accelerator Center

slide-24
SLIDE 24

SciDAC SciDAC ESS Team ESS Team

Computational Mathematics

  • L. Lee, L. Ge, V. Akcelik
  • C. Sheng, H. Jiang,
  • E. Prudencio

Accelerator Modeling

  • A. Kabel, K. Ko
  • Z. Li, C. Ng,
  • A. Candel

Computing Technologies

  • N. Folwell, A. Guetz,
  • G. Schussman,
  • R. Uplenchwar

Stanford

  • G. Golub

SNL

  • P. Knupp, T. Tautges,
  • K. Devine

LBNL

  • E. Ng, W. Gao, P.

Husbands, X. Li,

  • C. Yang

LLNL

  • L. Diachin, D. Brown, K. Chand,
  • B. Henshaw, D. Quinlan

UCD

  • B. Liao, Z. Bai,
  • K. Ma, H. Yu,

ISICs – TSTT, TOPS, PERC; SAPP- Stanford, LBNL, UCD

Advanced Computations Department

Columbia

  • D. Keyes

RPI

  • M. Shephard, A.

Bauer, E. Seol CMU

  • O. Ghattas
slide-25
SLIDE 25

International Linear International Linear Collider Collider (ILC) (ILC)

http://www.linearcollider.org/

The ILC is a proposed new electron-positron collider that would allow physicists to answer compelling questions on identity of dark matter to the existence of extra dimensions. In the ILC's design, two facing linear accelerators, each 20 kilometers long, accelerate electrons and positrons to TeV energy using superconducting accelerating cavities. The Global Design Effort will establish the design of the ILC, focusing the efforts

  • f hundreds of accelerator

scientists and particle physicists in North America, Europe and Asia.

slide-26
SLIDE 26

Various ILC Cavities Various ILC Cavities

BCD Low-Loss SST STF1 ICHIRO

slide-27
SLIDE 27

Outline of Part II

  • 1. NEP
  • 2. Initial approximations and ordering
  • 3. NEP methods
  • 4. NRRIT (Nonlinear Arnoldi Method)
  • 5. Numerical experiments
  • 6. Ongoing work
slide-28
SLIDE 28

NEP II.1

  • NEP

T(λ)x = 0 where T(λ) = K − λM + E(λ),

  • Origin: Nedelec-type FE discretization of the frequency domain Maxwell’s

equation with waveguide BCs [ISH95, KFG+06, Lee05a, Lee05b]

  • KT = K ≥ 0, MT = M > 0, W T

j = Wj

  • Multiple waveguide modes,

E(λ) = i

p

  • j=1
  • (λ − σ2

j) Wj,

σj are given nonnegative scalars (cutoff values), i = √−1.

  • Future: coupling of multiple waveguide modes, where

E(λ) = i

p

  • j=1
  • (λ − σ2

j) W TE j

+ i

p

  • j=1

λ

  • (λ − σ2

j) W TM j

slide-29
SLIDE 29

Eigenvalues of interest II.1

  • Let κ =

√ λ, resonant frequency = f(κ) = c 2π · Re(κ), external Qe-value = Qe(κ) = 1 2 · Re(κ) Im(κ).

  • The external Qe-values measure the electromagnetic coupling between the

cavity and waveguide and characterize the energy loss.

  • Computational task: find eigenvalues λ satisfying

⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩

κ = √ λ is close to the shift value σ0 = 2π

c f0 and

λ ∈ D =

  • λ | λ = κ2, Re(κ) > σ0, Im(κ) > 0 and Qe(κ) >
  • Qe > 1
  • ,

where f0 and

  • Qe are prescribed.
slide-30
SLIDE 30

Region for the desired eigenvalues II.1

150 200 250 300 350 400 5 10 15 20 25 30 Real part of square root of eigenvalues Imaginary part of square root of eigenvalues Qe=10 σ Region of interest 1 2 3 4 5 6 7 8 9 10 shift point

  • : “exact” eigenvalues;

×: initial approximations

slide-31
SLIDE 31

Initial approximations and ordering II.2

  • Linearization via the first order truncation:

T(λ) ≈ T(λ0) + (λ − λ0)T ′(λ0) ≡

  • K(λ0) − λ
  • M(λ0),
  • Initial approximations:

selected eigenpairs (θℓ, vℓ) of (

  • K(σ2

0),

  • M(σ2

0))

  • Ordering: selected eigenpairs (θℓ, vℓ) of

|θ1/2

1

− σ0| ≤ |θ1/2

2

− σ0| ≤ · · · ≤ |θ1/2

n

− σ0|.

  • Example:

T(λ)x =

  • K − λM + i
  • (λ − σ2

1) W1

  • x = 0 ⇔ QEP

Waveguide Cavity

Ω Γ

E

Γ

M

Γ

V

slide-32
SLIDE 32

Initial approximations and ordering II.2

150 200 250 300 350 400 5 10 15 20 25 30 Real part of square root of eigenvalues Imaginary part of square root of eigenvalues Qe=10 σ Region of interest 1 2 3 4 5 6 7 8 9 10 shift point

  • : “exact” eigenvalues;

×: initial approximations

slide-33
SLIDE 33

NEP methods for T(λ)x = 0 II.3

  • Newton’s methods:

Inverse ITeration (IIT), Residual Inverse ITeration (RIIT) Nonlinear Rayleigh Quotient Iteration (NRQI) Safeguarded Newton method

  • Linearization Methods:

Picard iteration, Self-Consistent Iteration (SCI) Successive Linear Approximation Method (SLAM)

  • “Nonlinear” Subspace Projection Methods ←

− this talk

Nonlinear Arnoldi method Nonlinear Rayleigh-Ritz ITerative (NRRIT)

Must-read papers:

  • A. Ruhe, Algorithms for the nonlinear eigenvalue problem, SIAM J. Numer. Anal, Vol.10,

No.4, pp.674–689, 1973

  • V. Mehrmann and H. Voss, Nonlinear eigenvalue problems: a challenge for modern eigenvalue

methods, 2004 (http://www.tu-harburg.de/mat/hp/voss)

slide-34
SLIDE 34

Nonlinear Rayleigh-Ritz ITerative (NRRIT) technique II.4 NRRIT framework:

  • 1. Select a proper projection subspace V.
  • 2. Compute a proper pair (θ, z) satisfying the Galerkin condition:

T(θ)z ⊥ V for z ∈ V.

  • 3. Expand or restart the projection subspace V

Origin: Nonlinear Arnoldi method [Betcke and Voss,’04], [Voss,’04]

slide-35
SLIDE 35

NRRIT – matrix version II.4

  • 1. Compute an orthonormal basis Q of the properly selected subspace V (and

initial approximations (θℓ, vℓ))

  • 2. Compute an eigenpair (θ, g) with initials (θℓ, QTvℓ) of the reduced NEP

TQ(θ)g = 0, where TQ(θ) = QHT(θ)Q

  • 3. Compute Ritz pairs (θ, Qg) of the original NEP
  • 4. Expand or restart Q as necessary
slide-36
SLIDE 36

NRRIT in practice II.4 Key issues:

  • 1. what’a proper initial projection subspace V?
  • 2. what’s the order to compute the desired eigenvalues (proper ordering of the

initial approximations)?

  • 3. how to expand or restart Q when necessary?
slide-37
SLIDE 37

Main features of our implementation of NRRIT II.4

  • 1. Proper initial approximations (θℓ, vℓ):

K − λM vs.

  • K(σ2

0) − λ

  • M(σ2

0)

  • 2. Ordering (θℓ, vℓ) by

|θ1/2

1

− σ0| ≤ |θ1/2

2

− σ0| ≤ · · · ≤ |θ1/2

n

− σ0|.

  • 3. Preserving real symmetry with the projection matrix

Q = orth(

  • Re(V ) Im(V )
  • ),
  • 4. Structure-preserved reduced NEP

TQ(θ)y = 0, where TQ(θ) = QTT(θ)Q = KQ − θMQ + i

p

  • j=1 (θ − σ2

j)

1 2WQ,j

with KQ = QHKQ, MQ = QTMQ and WQ,j = QTWjQ.

slide-38
SLIDE 38
  • 5. Subspace expansion: with the current Ritz pair (σ, z),

v := v − QQTv, v = T −1(σ2

0)T(θ)z

and Q :=

  • Q

Re(v) Im(v)

  • 6. Restarting and purging:

(a) restart with Q by replacing vℓ ← − z (b) purge all vectors v induced by the initial (θℓ, vℓ) leading to the failure of convergence

  • 7. Reference:

Ben-Shan Liao, Subspace projection methods for model order reduction and nonlinear eigenvalue computation, PhD Thesis, UC Davis, 2007

slide-39
SLIDE 39

Numerical experiements II.5 Example 1: the NEP with one cutoff values, N = 10102 T(λ)x =

  • K − λM + i
  • (λ − σ2

1)W1+

  • x = 0,

Equivalent Quadratic Eigenvalue Problem: (ν2M + νW +

  • K)x = 0

where ν = (λ − σ2

1)1/2, W = iW1 and

  • K = σ2

1M − K.

λ1 SOAR (1.9474685976542845+E02, 4.6015939351662103E-01) NRRIT (1.9474685976743996+E02, 4.6015939593544449E-01) λ2 SOAR . . . . . . NRRIT . . . . . .

slide-40
SLIDE 40

Numerical experiements II.5 Example 2: the NEP with two cutoff values T(λ)x =

  • K − λM + i
  • (λ − σ2

1)W1 + i

  • (λ − σ2

2)W2

  • x = 0,

N = 9956, nnz(K) = nnz(M) = 148, 318. nnz(W1) = 57, nnz(W2) = 293.

Waveguide Open Cavity

Ω Γ

E

Γ

M

Γ

V

Γ

V

slide-41
SLIDE 41

Numerical experiements II.5 Example 2: CPU in second: IIT SLAM NRRIT ǫ Iters E-Time Iters E-Time Iters E-Time 10−8 15 71.98 12 79.48 46 22.95 10−10 18 85.92 16 96.19 69 32.91 10−12 22 104.47 20 115.33 87 42.08 10−14 24 113.97 21 120.22 114 57.59 IIT = Inverse ITeration SLAM = Successive Linear Approximation Method Convergence threshold: T(θ)Qgℓ T(θ) Qgℓ ≤ ǫ for ℓ = 1, 2, . . . , 10

slide-42
SLIDE 42

Ongoing work II.6

  • Understanding of NRRIT: subspace expansion, restarting, convergence analysis
  • Accuracy of computed eigenvalues (external Qe-values)
  • Verification: how to enumerate eigenvalues and detect missing ones?
  • Recent work of Huang and Su on NEP arising from fiber optic design

[Kaufman ’06]:

  • A + s(λ)uuT
  • x = λx
  • Large-scale NEP (cut-off values p = 8 ∼ 24, DOFs = 106 ∼ 107)
  • SLAC’s Omega3P package for next-generation accelerator design (DOE

SciDAC)