Modern Krylov subspace methods
(and applications to parabolic control problems)
Daniel B. Szyld, Temple University, Philadelphia
Scientific and Statistical Computing Seminar, University of Chicago, 25 October 2012
1
Collaborators
- Xiuhong Du, Alfred University
- Marcus Sarkis, Worcester Poly. Inst., and IMPA, Rio de Janeiro
- Christian Schaerer, Universidad Nacional de Asunción
- Valeria Simoncini, Università di Bologna
2
Plan for the talk
- Introduction to Truncated Krylov subspace methods
- Inexact Krylov subspace methods: Introduction and Applications
- Special case of parabolic control problems
3
General Problem Statement
Solve a system Hx = b, H Hermitian or non-Hermitian, using Krylov subspace iterative methods.
4
Krylov subspace methods
  Km(H, r0) = span{r0, Hr0, H²r0, …, H^{m−1}r0}.
Subspaces are nested: Km ⊂ Km+1.
Given x0, r0 = b − Hx0, find an approximation xm ∈ x0 + Km(H, r0) satisfying some property.
5
Krylov subspace methods (cont.)
Conditions:
Galerkin, e.g., FOM, CG: b − Hxm ⊥ Km(H, r0)
Petrov-Galerkin, e.g., GMRES, MINRES: b − Hxm ⊥ HKm(H, r0), or equivalently
  xm = arg min ‖b − Hx‖,  x ∈ x0 + Km(H, r0)
6
Krylov subspace methods (cont.)
Some implementation issues
- Methods work by suitably choosing a basis of Km(H, r0)
- Let v1, v2, …, vm be such a basis, chosen to be orthonormal.
- One can of course run Gram-Schmidt on {r0, Hr0, H²r0, …, H^{m−1}r0} (not advised). Instead, Arnoldi [1951] (for general H) said:
  v1 = r0, normalized
  v2 = Hv1 − ⟨v1, Hv1⟩v1, normalized, so span{v1, v2} = span{r0, Hr0}
  v3 = Hv2 − ⟨v2, Hv2⟩v2 − ⟨v1, Hv2⟩v1, normalized, so span{v1, v2, v3} = span{r0, Hr0, H²r0}
  etc.
7
Arnoldi method
Let β = ‖r0‖ and v1 = r0/β. For k = 1, …:
Compute Hvk, then
  vk+1 hk+1,k = Hvk − Σ_{j=1}^{k} vj hjk,
where hjk = ⟨vj, Hvk⟩, j ≤ k, and hk+1,k is positive and such that ‖vk+1‖ = 1.
In practice: modified Gram-Schmidt or Householder orthogonalization.
With Vm = [v1, v2, …, vm], obtain the Arnoldi relation:
  HVm = Vm+1 Hm+1,m
Hm+1,m is (m + 1) × m upper Hessenberg.
8
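The recurrence above translates almost line by line into code. A minimal numpy sketch with modified Gram-Schmidt (the function name `arnoldi` is mine, not from the talk); it returns Vm+1 and Hm+1,m satisfying the Arnoldi relation:

```python
import numpy as np

def arnoldi(H, r0, m):
    """Arnoldi with modified Gram-Schmidt: returns V (n x (m+1)),
    Hbar ((m+1) x m), and beta so that H @ V[:, :m] == V @ Hbar."""
    n = len(r0)
    V = np.zeros((n, m + 1))
    Hbar = np.zeros((m + 1, m))
    beta = np.linalg.norm(r0)
    V[:, 0] = r0 / beta
    for k in range(m):
        w = H @ V[:, k]                 # one matrix-vector product per step
        for j in range(k + 1):          # modified Gram-Schmidt sweep
            Hbar[j, k] = V[:, j] @ w
            w -= Hbar[j, k] * V[:, j]
        Hbar[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / Hbar[k + 1, k]
    return V, Hbar, beta
```

The two asserted properties (the Arnoldi relation and orthonormality of the basis) are exactly what the later slides rely on.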
Arnoldi relation (cont.)
Hm+1,m =
  [ h11  h12  h13  ···  h1m   ]
  [ h21  h22  h23  ···  h2m   ]
  [       ⋱    ⋱        ⋮     ]
  [          hm,m−1     hmm   ]
  [                  hm+1,m   ]
  = [ Hm ; hm+1,m e_m^T ]   (Hm the square m × m upper part)

HVm = Vm+1 Hm+1,m = Vm Hm + hm+1,m vm+1 e_m^T
9
Krylov subspace methods (cont.)
More implementation issues
An element of Km(H, v1) is a linear combination of v1, v2, …, vm, i.e., of the form Vm y, y ∈ R^m.
Each method finds y = ym and we have xm = Vm ym.
For FOM, or CG, we have the Galerkin condition rm ⊥ Km, i.e.,
  0 = Vm^T(b − Hxm) = Vm^T b − Vm^T H Vm ym = βe1 − Hm ym
10
FOM: Full Orthogonalization Method [Saad, 1978]
Use the Arnoldi method to construct Vm, Hm.
Solve Hm ym = βe1. The approximation is xm = Vm ym.
Test for convergence: is ‖rm‖ < ε?
Cost: one matrix-vector product per step; at step m, orthogonalization (m inner products), one solve with the small m × m upper Hessenberg matrix (LU with no fill).
(‖rm‖ is cheap or free to compute; no details here.)
Storage: at step m, m vectors v1, v2, …, vm.
11
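Putting the pieces together, here is a compact sketch of FOM (Arnoldi loop plus the small Hessenberg solve; names are mine, and a practical code would monitor ‖rm‖ cheaply at each step instead of solving only once at the end):

```python
import numpy as np

def fom(A, b, m):
    """Full Orthogonalization Method: x_m = V_m y_m with H_m y_m = beta*e1."""
    n = len(b)
    V = np.zeros((n, m + 1))
    Hbar = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta                       # x0 = 0, so r0 = b
    for k in range(m):
        w = A @ V[:, k]
        for j in range(k + 1):               # modified Gram-Schmidt
            Hbar[j, k] = V[:, j] @ w
            w -= Hbar[j, k] * V[:, j]
        Hbar[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / Hbar[k + 1, k]
    e1 = np.zeros(m); e1[0] = beta
    y = np.linalg.solve(Hbar[:m, :], e1)     # small m x m Hessenberg solve
    return V[:, :m] @ y
```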
GMRES implementation
We use the Arnoldi method to obtain Vm, and as before, an element of Km(H, v1) is of the form Vm y, y ∈ R^m.
Recall: β = ‖r0‖, HVm = Vm+1 Hm+1,m.
  b − Hxm = r0 − HVm ym = βv1 − Vm+1 Hm+1,m ym = Vm+1(βe1 − Hm+1,m ym)
  ‖rm‖ = min_{x ∈ x0+Km} ‖b − Hx‖ = min_{y ∈ R^m} ‖βe1 − Hm+1,m y‖
QR factorization Hm+1,m = Qm+1 Rm+1,m, with Rm+1,m = [ Rm ; 0 ]:
  ‖rm‖ = min_{y ∈ R^m} ‖Qm+1^T βe1 − Rm+1,m y‖.
12
  ‖rm‖ = min_{y ∈ R^m} ‖Qm+1^T βe1 − Rm+1,m y‖.
Write Qm+1^T βe1 = [ tm ; ρm+1 ]. Then ym = Rm^{−1} tm, and xm = Vm ym.
Furthermore,
  ‖rm‖ = ‖Qm+1^T βe1 − Rm+1,m ym‖ = |ρm+1|.
Use this computed residual for stopping (it may deviate from the true residual!).
Use Givens rotations for the QR factorization, saving the rotations from previous steps. Only two entries per step are needed.
13
GMRES
Cost: one matrix-vector product per step; at step m, orthogonalization (m inner products), rotations for QR, solution of an m × m triangular system.
Storage: at step m, m vectors v1, v2, …, vm.
Residual norms are monotonically nonincreasing, but stagnation is possible. Superlinear convergence can be observed.
14
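A minimal sketch of the least-squares formulation above (names mine; the small problem min ‖βe1 − Hm+1,m y‖ is solved here with one dense `lstsq` at the end, instead of the incremental Givens-rotation QR update used in practice):

```python
import numpy as np

def gmres_basic(A, b, m):
    """GMRES via Arnoldi: minimize ||beta*e1 - Hbar y|| over y in R^m."""
    n = len(b)
    V = np.zeros((n, m + 1))
    Hbar = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta                       # x0 = 0, so r0 = b
    for k in range(m):
        w = A @ V[:, k]
        for j in range(k + 1):
            Hbar[j, k] = V[:, j] @ w
            w -= Hbar[j, k] * V[:, j]
        Hbar[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / Hbar[k + 1, k]
    rhs = np.zeros(m + 1); rhs[0] = beta
    y, *_ = np.linalg.lstsq(Hbar, rhs, rcond=None)   # small LS problem
    return V[:, :m] @ y
```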
- Main costs:
- 1. Matrix-vector product: Hvk
- 2. Orthogonalization
- 3. Storage (if there is no recursion)
- One popular alternative: restarted methods. Hit and miss. Little theory. [Saad, 1996], [Simoncini, 2000]
- It may happen that a larger value of the restart parameter is less efficient! [Embree, 2003]
15
This Talk
- Consider the case when one does not fully orthogonalize: truncated methods.
- Reduce the cost of the matrix-vector product when H is either
  – not known exactly
  – computationally expensive (e.g., Schur complement, reduced Hessian)
  – preconditioned with a variable matrix (i.e., iteration dependent)
- Apply all this to parabolic control problems
- Use a Parareal-in-time approximation
16
Truncated Krylov subspace methods
For the same amount of storage (at most ℓ vectors), instead of restarting: only orthogonalize with respect to the previous ℓ vectors. In Arnoldi we then have
  vk+1 hk+1,k = Hvk − Σ_{j=max{1,k−ℓ+1}}^{k} vj hjk,
where hjk = ⟨vj, Hvk⟩, j ≤ k, and hk+1,k is positive and such that ‖vk+1‖ = 1.
Only these previous ℓ vectors need to be stored.
17
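The only change from full Arnoldi is the lower limit of the orthogonalization loop. A minimal sketch (names mine); note that the relation HVm = Vm+1 Hm+1,m still holds exactly, even though Vm is no longer orthogonal:

```python
import numpy as np

def truncated_arnoldi(A, b, m, ell):
    """Incomplete orthogonalization: v_{k+1} is orthogonalized only
    against the previous ell basis vectors, so H_{m+1,m} is banded."""
    n = len(b)
    V = np.zeros((n, m + 1))
    Hbar = np.zeros((m + 1, m))
    V[:, 0] = b / np.linalg.norm(b)
    for k in range(m):
        w = A @ V[:, k]
        for j in range(max(0, k - ell + 1), k + 1):  # only ell previous vectors
            Hbar[j, k] = V[:, j] @ w
            w -= Hbar[j, k] * V[:, j]
        Hbar[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / Hbar[k + 1, k]
    return V, Hbar
```

(For storage-efficiency one would keep only the last ℓ columns of V; they are all retained here just to check the relation.)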
Upper Hessenberg matrix is now banded (upper bandwidth ℓ).
Hm+1,m =
  [ h11  h12  h13             ]
  [ h21  h22  h23   ⋱         ]
  [       ⋱    ⋱     ⋱        ]
  [          hm,m−1    hmm    ]
  [                  hm+1,m   ]
18
Truncated Krylov subspace methods (cont.)
- Truncated GMRES [Saad and Wu, 1996]; Truncated FOM [Saad, 1981], [Jia, 1996]
- The basis v1, v2, …, vm of Km(H, r0) (m > ℓ) is not orthogonal, but xm ∈ Km(H, r0), and the minimization (or Galerkin condition) is over the whole space.
- Hm+1,m banded with upper semiband ℓ − 2. The matrix Vm of basis vectors is not orthogonal. Can be implemented so that only O(ℓ) vectors are stored.
- Extreme case, ℓ = 3: Hm+1,m tridiagonal. If H is SPD, FOM reduces to CG (and Vm is automatically orthogonal).
- Theory for "non-optimal methods" [Simoncini and Szyld, 2005]
19
Example: L(u) = −uxx − uyy + 100(x + y)ux + 100(x + y)uy, on [0, 1]², Dirichlet b.c., centered 5-point discretization, n = 2500.
[Figure: residual norm (10⁻⁹ to 10¹) vs. number of iterations (left) and vs. flops (right).]
GMRES, Truncated ℓ = 3.
20
Inexact Krylov subspace methods
- At the kth iteration of the Krylov subspace method, use (H + Dk)vk−1 instead of Hvk−1, where Dk can be monitored.
- Two examples now:
  – Schur complement, where the inverse is approximated
  – Inexact preconditioning
- [Bouras, Frayssé, and Giraud, CERFACS reports 2000, SIMAX 2005] show experimentally that as k progresses, ‖Dk‖ can be allowed to be larger; see also [Golub and Ye, 1999], [Notay, 1999], [Sleijpen and van der Eshof, 2004], and [Simoncini and Eldén, 2002].
21
Inexact Krylov (cont.)
We repeat: ‖Dk‖ small at first, ‖Dk‖ can be big later. Convergence is maintained!
- Instead of HVm = Vm+1 Hm+1,m, we now have
  [(H + D1)v1, (H + D2)v2, …, (H + Dm)vm] = Vm+1 Hm+1,m
- The subspace spanned by v1, v2, …, vm is not a Krylov subspace, but Vm is orthogonal (in the full case).
22
Theorem for Inexact FOM
[Simoncini and Szyld, 2003]
True residual: rm = b − Hxm = r0 − HVm ym
Computed residual (e.g.): r̃m = r0 − Vm+1 Hm+1,m ym = r0 − Wm ym
Let ε > 0. If for every k ≤ m,
  ‖Dk‖ ≤ (σmin(Hm∗)/m∗) · (1/‖r̃k−1‖) · ε ≡ ℓ_m^F · (1/‖r̃k−1‖) · ε,
then ‖Vm^T rm‖ ≤ ε and ‖rm − r̃m‖ ≤ ε.
(m∗ is the maximum number of iterations allowed.)
Similar results hold for inexact GMRES; see also [Giraud, Gratton, Langou, 2007].
23
Theorem for Inexact Truncated FOM
  ‖Dk‖ ≤ (σmin(Hm∗) σmin(Vm)/m∗) · (1/‖r̃k−1‖) · ε ≡ ℓ_m^{TF} · (1/‖r̃k−1‖) · ε
implies ‖Vm^T rm‖ ≤ ε and δm = ‖rm − r̃m‖ ≤ ε.
Notes:
- This result applies in particular to Inexact CG.
- ℓm can be estimated from the problem, if information is available.
24
First Experiment
H = diag([10⁻⁴, 2, 3, …, 100])
Dk = symm[αk randn(100, 100)]
b = randn(100, 1)
We chose ε = 10⁻⁸.
- Our condition (e.g., for FOM)
  ‖Dk‖ ≤ (σmin(H)/m∗) · (1/‖r̃k−1‖) · ε
is very conservative. In most cases it is too strict. However, σmin(H) does play a role.
25
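An experiment in this spirit can be set up in a few lines. This is a sketch, not the talk's exact code: the matrix is better conditioned than the slide's H (here diag(linspace(1, 10)), so σmin(H) = 1), the function name `inexact_fom` is mine, and with m∗ = 100 and ε = 10⁻⁶ the theorem's bound allows a matrix-vector error of 10⁻⁸/‖r̃k−1‖, which grows as the residual shrinks:

```python
import numpy as np

def inexact_fom(matvec, b, m_max, inner_tol, outer_tol=1e-8):
    """FOM with an inexact product: matvec(v, tol) returns H @ v up to an
    error of norm <= tol.  The allowed error is *relaxed* like
    inner_tol / ||r_{k-1}|| as the computed residual norm shrinks."""
    n = len(b)
    V = np.zeros((n, m_max + 1))
    Hb = np.zeros((m_max + 1, m_max))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    r_norm = beta
    for k in range(m_max):
        w = matvec(V[:, k], inner_tol / r_norm)  # looser as r_norm shrinks
        for j in range(k + 1):                   # modified Gram-Schmidt
            Hb[j, k] = V[:, j] @ w
            w -= Hb[j, k] * V[:, j]
        Hb[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / Hb[k + 1, k]
        e1 = np.zeros(k + 1); e1[0] = beta
        y = np.linalg.solve(Hb[:k + 1, :k + 1], e1)
        r_norm = Hb[k + 1, k] * abs(y[-1])       # computed residual norm
        if r_norm < outer_tol:
            break
    return V[:, :k + 1] @ y, r_norm
```

Even though the late matvec errors become large, the gap between the true and computed residuals stays of order ε, as the theorem predicts.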
CG: condition ‖Dk‖ ≤ (σmin(H)/m∗) · (1/‖r̃k−1‖) · ε
[Figure: ‖Dk‖, ‖Vk^T rk‖, and ‖rk‖ vs. number of iterations.]
‖Vm^T rm‖ ≪ ε
26
Applications:
- I. Schur complement systems
  [ A    B ] [ w ]   [ f ]
  [ B^T  0 ] [ x ] = [ 0 ]
  B^T A⁻¹ B x = B^T A⁻¹ f;  A w = f − B x
  H x = b
A⁻¹ is not applied exactly (use a Krylov method).
27
Applications: I. Schur complement systems (cont.)
- A⁻¹ is not applied exactly (use a Krylov method).
- Replace Hv with B^T z_j^(k), where z_j^(k) is the approximation obtained at the jth (inner) iteration of the solution of Az = Bv.
- The question is then: how many inner iterations? i.e., at what value of j do we stop?
"Translate" the conditions on Dk into conditions on the norm of the inner residual. Let r_k^inner = A z_j^(k) − Bv be the inner residual. Take
  ‖r_k^inner‖ < (σmin(Hm⋆) / (‖B^T A⁻¹‖ m⋆)) · (1/‖r̃_{k−1}^fom‖) · ε ≡ ε^inner
28
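The inner-outer structure above can be sketched as follows: an outer inexact FOM on S = B^T A⁻¹ B whose matrix-vector product runs an inner CG on Az = Bvk, stopped by the relaxed criterion. This is a sketch under simplifying assumptions (the constant c0 lumps together the σmin and ‖B^T A⁻¹‖ factors, and all names are mine):

```python
import numpy as np

def cg(A, rhs, tol, maxit=1000):
    """Plain CG, stopped when the (inner) residual norm drops below tol."""
    x = np.zeros_like(rhs); r = rhs.copy(); p = r.copy(); rho = r @ r
    for _ in range(maxit):
        if np.sqrt(rho) <= tol:
            break
        Ap = A @ p
        alpha = rho / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rho, rho_old = r @ r, rho
        p = r + (rho / rho_old) * p
    return x

def schur_fom(A, B, b, eps, c0, m_max=200):
    """Inexact FOM on S = B^T A^{-1} B: each product S v_k is realized by
    an inner CG on A z = B v_k, stopped once the inner residual norm is
    below c0 * eps / ||r_{k-1}||  (looser as the outer residual shrinks)."""
    n = len(b)
    V = np.zeros((n, m_max + 1)); Hb = np.zeros((m_max + 1, m_max))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    r_norm = beta
    for k in range(m_max):
        z = cg(A, B @ V[:, k], c0 * eps / r_norm)   # relaxed inner solve
        w = B.T @ z                                 # inexact S v_k
        for j in range(k + 1):
            Hb[j, k] = V[:, j] @ w
            w -= Hb[j, k] * V[:, j]
        Hb[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / Hb[k + 1, k]
        e1 = np.zeros(k + 1); e1[0] = beta
        y = np.linalg.solve(Hb[:k + 1, :k + 1], e1)
        r_norm = Hb[k + 1, k] * abs(y[-1])          # computed outer residual
        if r_norm < eps:
            break
    return V[:, :k + 1] @ y
```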
[Figure: δm, ‖rm‖, ‖r̃m‖, and ε^inner vs. number of iterations.]
- Two-dimensional saddle point magnetostatic problem from [Perugia, Simoncini, Arioli, 1999]; A is 1272 × 1272
- Inexact FOM, m⋆ = 120, ε = 10⁻⁴
29
Applications:
- II. Inexact Preconditioning
Hx = b  →  HP⁻¹x̄ = b, x = P⁻¹x̄
P⁻¹ is not applied exactly (use a Krylov method).
HP⁻¹vk is replaced with H z̃k, z̃k ≈ P⁻¹vk. The Arnoldi relation HP⁻¹Vm = Vm+1 Hm+1,m is transformed into
  H[z̃1, …, z̃m] = Vm+1 Hm+1,m.
Use a Flexible Krylov subspace method.
r_k^inner = vk − P z̃k is the inner residual; take
  ‖r_k^inner‖ ≤ (σmin(Hm⋆) / (‖HP⁻¹‖ m⋆)) · (1/‖r̃_{k−1}^gm‖) · ε ≡ ε^inner
30
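A flexible method keeps the preconditioned vectors z̃k and forms the iterate from them, since no single P⁻¹ exists to undo. A minimal flexible-GMRES sketch (names mine; a dense `lstsq` replaces the incremental QR, and the iteration-dependent preconditioner is passed in as a callback):

```python
import numpy as np

def fgmres(A, b, m, psolve):
    """Flexible GMRES: the preconditioner may change from step to step,
    so z_k ~ P_k^{-1} v_k are stored and the iterate is x_m = Z_m y_m."""
    n = len(b)
    V = np.zeros((n, m + 1)); Z = np.zeros((n, m)); Hb = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    for k in range(m):
        Z[:, k] = psolve(V[:, k], k)     # iteration-dependent preconditioning
        w = A @ Z[:, k]
        for j in range(k + 1):
            Hb[j, k] = V[:, j] @ w
            w -= Hb[j, k] * V[:, j]
        Hb[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / Hb[k + 1, k]
    rhs = np.zeros(m + 1); rhs[0] = beta
    y, *_ = np.linalg.lstsq(Hb, rhs, rcond=None)
    return Z @ y
```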
[Figure: ‖rm‖, ‖r̃m‖, ε^inner, and δm vs. number of iterations.]
For the same 2D saddle point problem, use the preconditioner P = [ I  B ; B^T  0 ]. Solve B^T B pk = rhs iteratively; m⋆ = 80, ε = 10⁻⁹, tolerance ε^inner.
31
Some CPU Times: Same Magnetostatic 2D Problem
Outer tolerance: ε = 10⁻⁸
  ‖r_k^inner‖ ≤ c0 · (1/‖r_{m−1}^outer‖) · ε ≡ ε^inner
c0: constant estimated a priori; here we use 10⁻² and 10⁻⁴.
Elapsed CPU time in seconds on a Sun Enterprise 4500 (Fortran code; 4 CPUs, 400 MHz, 2 GB RAM). CG iteration counts in parentheses.

  Problem size | Fixed inner tol. ε^inner = 10⁻¹⁰ | Var. inner tol. 10⁻¹⁰/‖r‖ | Var. inner tol. 10⁻¹²/‖r‖
  3810         | 17.0 (54)                        | 11.4 (54)                 | 14.7 (54)
  9102         | 82.9 (58)                        | 62.8 (58)                 | 70.7 (58)
  14880        | 198.4 (54)                       | 156.5 (54)                | 170.1 (54)
32
Applications:
- III. Parabolic Control Problems
Inverse problem: recover the control v(x) based on the field (state) z(x), related by the forward problem
  zt + Az = v,   x ∈ Ω
  z = g,         x ∈ ∂Ω
  z = z0,        x ∈ Ω \ ∂Ω, for t = 0
A elliptic, e.g., A = −∆.
This is a distributed control problem. Similar techniques apply to boundary control problems (control g).
33
Associated variational problem
  J(z(v), v) := (α/2) ∫_{t0}^{tf} ‖z(v)(t, ·) − ỹ(t, ·)‖²_{L²(Ω)} dt
              + (β/2) ‖z(v)(tf, ·) − ỹ(tf, ·)‖²_{L²(Ω)}
              + (γ/2) ∫_{t0}^{tf} ‖v(t, ·)‖²_{L²(Ω)} dt,
discretized as
  J_h^τ(z, v) = (1/2)(z − ỹ)^T K (z − ỹ) + (1/2) v^T G v + (z − ỹ)^T g.
34
Discretized forward problem (FD)
  Ez + Nv = f
  E = [  F1             ]      N = [ B         ]
      [ −F0   F1        ]          [    B      ]
      [      ⋱    ⋱     ]          [      ⋱    ]
      [     −F0    F1   ]          [         B ]
35
Optimization problem
  min J_h^τ(z, v)  subject to  Ez + Nv = f
Lagrangian: J_h^τ(z, v) + q^T(Ez + Nv − f)
36
Linearize to obtain
  [ K  0  E^T ] [ z ]   [ g ]
  [ 0  G  N^T ] [ u ] = [ 0 ]
  [ E  N  0   ] [ p ]   [ f ]
After elimination one obtains
  Hu := (G + N^T E^{−T} K E^{−1} N)u = b,
H being the SPD reduced Hessian.
37
  Hu = (G + N^T E^{−T} K E^{−1} N)u = b
We use inexact FOM, approximating each of the systems with E and E^T by a Parareal method with varying (increasing) tolerance.
Matrix-vector product Hv:
- 1. Multiply Nv
- 2. Solve Ez = Nv with an inner tolerance ǫin1
- 3. Multiply Kz
- 4. Solve E^T w = Kz with an inner tolerance ǫin2
- 5. Compute N^T w
38
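With exact solves in steps 2 and 4, the five-step product can be sketched as follows for the block lower bidiagonal E above: Ez = c is a forward (time-stepping) sweep and E^T w = d a backward (adjoint) sweep. This is a minimal sketch: the talk replaces the two solves by Parareal with tolerances ǫin1, ǫin2, and the function name and exact-solve shortcut are mine:

```python
import numpy as np

def reduced_hessian_matvec(G, K, N, F0, F1, v, nt):
    """One product H v = (G + N^T E^{-T} K E^{-1} N) v, where E is block
    lower bidiagonal with F1 on the diagonal and -F0 below (nt blocks)."""
    nx = F1.shape[0]
    c = (N @ v).reshape(nt, nx)                  # step 1: multiply N v
    z = np.zeros_like(c)                         # step 2: solve E z = N v
    for i in range(nt):                          # forward sweep in time
        rhs = c[i] + (F0 @ z[i - 1] if i > 0 else 0.0)
        z[i] = np.linalg.solve(F1, rhs)
    d = (K @ z.ravel()).reshape(nt, nx)          # step 3: multiply K z
    w = np.zeros_like(d)                         # step 4: solve E^T w = K z
    for i in reversed(range(nt)):                # backward (adjoint) sweep
        rhs = d[i] + (F0.T @ w[i + 1] if i < nt - 1 else 0.0)
        w[i] = np.linalg.solve(F1.T, rhs)
    return G @ v + np.transpose(N) @ w.ravel()   # step 5: add N^T w
```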
Approximate solutions of systems with E or E^T
  Hu = (G + N^T E^{−T} K E^{−1} N)u = b
We prove that with ǫin1 = ǫin2, we have spectral equivalence of the form
  (v, Gv) ≤ (v, Hv) ≤ µ (v, Gv).
We choose FOM, since it reduces to CG when we have full symmetry.
39
Sample Experiment: 2D heat equation
15 × 15 grid, τ = 1/512, control u of order 115200.
Same relative tolerance ǫ for both Parareal systems; outer ε = 10⁻⁶.
Outer iterations (total inner iterations):

  ǫ      | IFOM    | TIFOM, mT = 2 | mT = 4  | mT = 8  | mT = 12
  10⁻¹²  | 15(576) | 16(610)       | 16(608) | 15(576) | 15(576)
  10⁻¹⁰  | 15(482) | 17(532)       | 17(528) | 16(504) | 15(482)
  10⁻⁸   | 15(388) | 17(426)       | 17(426) | 17(420) | 15(388)
  10⁻⁷   | 15(340) | 18(394)       | 17(374) | 17(368) | 15(340)
  10⁻⁶   | 15(288) | 19(340)       | 19(338) | 18(320) | 16(298)
  10⁻⁵   | 17(238) | 24(298)       | 21(266) | 19(258) | 19(242)
  10⁻⁴   | 17(180) | n.c.          | 22(210) | 28(254) | 20(192)
40
One surface of the true and recovered model, and their difference; increasing ε^inner = 10⁻⁵/‖r̃k−1‖, mT = 8.
[Figure: true surface, computed surface, and their difference; error O(10⁻⁷).]
41
TIFOM Convergence curves
[Figure: residual norms (10⁻⁷ to 10⁻¹) vs. iterations for ICG, TIFOM(4), TIFOM(8), TIFOM(12), TIFOM(20).]
- Outer ε = 10⁻⁶, inner factor = 10⁻⁵
42
Similar results for the more non-symmetric case. Outer iterations (inner iterations), with separate inner tolerances ǫin1 and ǫin2:

  ǫin1  | ǫin2  | IFOM    | TIFOM, mT = 2 | mT = 4  | mT = 8  | mT = 12
  10⁻⁶  | 10⁻⁷  | 15(288) | 18(330)       | 19(338) | 17(310) | 16(298)
  10⁻⁶  | 10⁻⁶  | 15(288) | 19(340)       | 19(338) | 18(320) | 16(298)
  10⁻⁶  | 10⁻⁵  | 16(300) | 23(390)       | 20(346) | 19(334) | 17(310)
  10⁻⁶  | 10⁻⁴  | 16(300) | 51(710)       | 37(520) | 20(352) | 17(310)
  10⁻⁵  | 10⁻⁷  | 15(234) | 19(270)       | 27(280) | 18(254) | 21(246)
  10⁻⁵  | 10⁻⁶  | 15(234) | 19(270)       | 29(284) | 18(254) | 19(242)
  10⁻⁵  | 10⁻⁵  | 17(238) | 24(298)       | 21(266) | 19(258) | 19(242)
  10⁻⁵  | 10⁻⁴  | 17(240) | 30(334)       | 31(356) | 20(270) | 19(244)
43
One surface of the true and recovered model, and their difference; outer tolerance ε = 10⁻⁶, ǫin1 = 10⁻⁶/‖r̃k−1‖, ǫin2 = 10⁻⁴/‖r̃k−1‖, mT = 8.
[Figure: true surface, computed surface, and their difference; error O(10⁻⁶).]
44
Conclusions
- Inexact matrix-vector products (or inexact preconditioning) might be worth trying for your problem
- Truncated methods might be worth trying for your problem
- New work on inexact and truncated methods for parabolic control problems