Modern Krylov subspace methods
(and applications to parabolic control problems)
Daniel B. Szyld, Temple University, Philadelphia
Scientific and Statistical Computing Seminar, University of Chicago, 25 October 2012
1
Collaborators
- Xiuhong Du, Alfred University
- Marcus Sarkis, Worcester Poly. Inst., and IMPA, Rio de Janeiro
- Christian Schaerer, Universidad Nacional de Asunción
- Valeria Simoncini, Università di Bologna
2
Plan for the talk
- Introduction to Truncated Krylov subspace methods
- Inexact Krylov subspace methods: Introduction and Applications
- Special case of parabolic control problems
3
General Problem Statement
Solve a system Hx = b, H Hermitian or non-Hermitian, using Krylov subspace iterative methods.
4
Krylov subspace methods
  Km(H, r0) = span{r0, Hr0, H²r0, …, H^{m−1}r0}.
Subspaces are nested: Km ⊂ Km+1.
Given x0, r0 = b − Hx0, find an approximation xm ∈ x0 + Km(H, r0) satisfying some property.
5
Krylov subspace methods (cont.)
Conditions:
Galerkin, e.g., FOM, CG: b − Hxm ⊥ Km(H, r0)
Petrov-Galerkin, e.g., GMRES, MINRES: b − Hxm ⊥ HKm(H, r0), or equivalently
  xm = arg min ‖b − Hx‖,  x ∈ x0 + Km(H, r0)
6
Krylov subspace methods (cont.)
Some implementation issues
- Methods work by suitably choosing a basis of Km(H, r0)
- Let v1, v2, …, vm be such a basis, chosen to be orthonormal.
- One can of course run Gram-Schmidt on {r0, Hr0, H²r0, …, H^{m−1}r0} (not advised). Instead, Arnoldi [1951] (for general H) said:
  v1 = r0, normalized
  v2 = Hv1 − ⟨v1, Hv1⟩v1, normalized, so span{v1, v2} = span{r0, Hr0}
  v3 = Hv2 − ⟨v2, Hv2⟩v2 − ⟨v1, Hv2⟩v1, normalized, so span{v1, v2, v3} = span{r0, Hr0, H²r0}
  etc.
7
Arnoldi method
Let β = ‖r0‖ and v1 = r0/β. For k = 1, …:
Compute Hvk, then
  vk+1 hk+1,k = Hvk − Σ_{j=1}^{k} vj hjk,
where hjk = ⟨vj, Hvk⟩, j ≤ k, and hk+1,k is positive and such that ‖vk+1‖ = 1.
In practice: modified Gram-Schmidt or Householder orthogonalization.
With Vm = [v1, v2, …, vm], obtain the Arnoldi relation:
  HVm = Vm+1 Hm+1,m
Hm+1,m is (m + 1) × m upper Hessenberg.
8
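The recurrence above translates almost line by line into code. A minimal numpy sketch with modified Gram-Schmidt (the function name `arnoldi` is mine, not from the talk); it returns Vm+1 and Hm+1,m satisfying the Arnoldi relation:

```python
import numpy as np

def arnoldi(H, r0, m):
    """Arnoldi with modified Gram-Schmidt: returns V (n x (m+1)),
    Hbar ((m+1) x m), and beta so that H @ V[:, :m] == V @ Hbar."""
    n = len(r0)
    V = np.zeros((n, m + 1))
    Hbar = np.zeros((m + 1, m))
    beta = np.linalg.norm(r0)
    V[:, 0] = r0 / beta
    for k in range(m):
        w = H @ V[:, k]                 # one matrix-vector product per step
        for j in range(k + 1):          # modified Gram-Schmidt sweep
            Hbar[j, k] = V[:, j] @ w
            w -= Hbar[j, k] * V[:, j]
        Hbar[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / Hbar[k + 1, k]
    return V, Hbar, beta
```

The two asserted properties (the Arnoldi relation and orthonormality of the basis) are exactly what the later slides rely on.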
Arnoldi relation (cont.)
Hm+1,m =
  [ h11  h12  h13  ···  h1m   ]
  [ h21  h22  h23  ···  h2m   ]
  [       ⋱    ⋱        ⋮     ]
  [          hm,m−1     hmm   ]
  [                  hm+1,m   ]
  = [ Hm ; hm+1,m e_m^T ]   (Hm the square m × m upper part)

HVm = Vm+1 Hm+1,m = Vm Hm + hm+1,m vm+1 e_m^T
9
Krylov subspace methods (cont.)
More implementation issues
An element of Km(H, v1) is a linear combination of v1, v2, …, vm, i.e., of the form Vm y, y ∈ R^m.
Each method finds y = ym and we have xm = Vm ym.
For FOM, or CG, we have the Galerkin condition rm ⊥ Km, i.e.,
  0 = Vm^T(b − Hxm) = Vm^T b − Vm^T H Vm ym = βe1 − Hm ym
10
FOM: Full Orthogonalization Method [Saad, 1978]
Use the Arnoldi method to construct Vm, Hm.
Solve Hm ym = βe1. The approximation is xm = Vm ym.
Test for convergence: is ‖rm‖ < ε?
Cost: one matrix-vector product per step; at step m, orthogonalization (m inner products), one solve with the small m × m upper Hessenberg matrix (LU with no fill).
(‖rm‖ is cheap or free to compute; no details here.)
Storage: at step m, m vectors v1, v2, …, vm.
11
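Putting the pieces together, here is a compact sketch of FOM (Arnoldi loop plus the small Hessenberg solve; names are mine, and a practical code would monitor ‖rm‖ cheaply at each step instead of solving only once at the end):

```python
import numpy as np

def fom(A, b, m):
    """Full Orthogonalization Method: x_m = V_m y_m with H_m y_m = beta*e1."""
    n = len(b)
    V = np.zeros((n, m + 1))
    Hbar = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta                       # x0 = 0, so r0 = b
    for k in range(m):
        w = A @ V[:, k]
        for j in range(k + 1):               # modified Gram-Schmidt
            Hbar[j, k] = V[:, j] @ w
            w -= Hbar[j, k] * V[:, j]
        Hbar[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / Hbar[k + 1, k]
    e1 = np.zeros(m); e1[0] = beta
    y = np.linalg.solve(Hbar[:m, :], e1)     # small m x m Hessenberg solve
    return V[:, :m] @ y
```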
GMRES implementation
We use the Arnoldi method to obtain Vm, and as before, an element of Km(H, v1) is of the form Vm y, y ∈ R^m.
Recall: β = ‖r0‖, HVm = Vm+1 Hm+1,m.
  b − Hxm = r0 − HVm ym = βv1 − Vm+1 Hm+1,m ym = Vm+1(βe1 − Hm+1,m ym)
  ‖rm‖ = min_{x ∈ x0+Km} ‖b − Hx‖ = min_{y ∈ R^m} ‖βe1 − Hm+1,m y‖
QR factorization Hm+1,m = Qm+1 Rm+1,m, with Rm+1,m = [ Rm ; 0 ]:
  ‖rm‖ = min_{y ∈ R^m} ‖Qm+1^T βe1 − Rm+1,m y‖.
12
  ‖rm‖ = min_{y ∈ R^m} ‖Qm+1^T βe1 − Rm+1,m y‖.
Write Qm+1^T βe1 = [ tm ; ρm+1 ]. Then ym = Rm^{−1} tm, and xm = Vm ym.
Furthermore,
  ‖rm‖ = ‖Qm+1^T βe1 − Rm+1,m ym‖ = |ρm+1|.
Use this computed residual for stopping (it may deviate from the true residual!).
Use Givens rotations for the QR factorization, saving the rotations from previous steps. Only two entries per step are needed.
13
GMRES
Cost: one matrix-vector product per step; at step m, orthogonalization (m inner products), rotations for QR, solution of an m × m triangular system.
Storage: at step m, m vectors v1, v2, …, vm.
Residual norms are monotonically nonincreasing, but stagnation is possible. Superlinear convergence can be observed.
14
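A minimal sketch of the least-squares formulation above (names mine; the small problem min ‖βe1 − Hm+1,m y‖ is solved here with one dense `lstsq` at the end, instead of the incremental Givens-rotation QR update used in practice):

```python
import numpy as np

def gmres_basic(A, b, m):
    """GMRES via Arnoldi: minimize ||beta*e1 - Hbar y|| over y in R^m."""
    n = len(b)
    V = np.zeros((n, m + 1))
    Hbar = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta                       # x0 = 0, so r0 = b
    for k in range(m):
        w = A @ V[:, k]
        for j in range(k + 1):
            Hbar[j, k] = V[:, j] @ w
            w -= Hbar[j, k] * V[:, j]
        Hbar[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / Hbar[k + 1, k]
    rhs = np.zeros(m + 1); rhs[0] = beta
    y, *_ = np.linalg.lstsq(Hbar, rhs, rcond=None)   # small LS problem
    return V[:, :m] @ y
```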
- Main costs:
- 1. Matrix-vector product: Hvk
- 2. Orthogonalization
- 3. Storage (if there is no recursion)
- One popular alternative: restarted methods. Hit and miss. Little theory. [Saad, 1996], [Simoncini, 2000]
- It may happen that a larger value of the restart parameter is less efficient! [Embree, 2003]
15
This Talk
- Consider the case when one does not fully orthogonalize: truncated methods.
- Reduce the cost of the matrix-vector product when H is either
  – not known exactly
  – computationally expensive (e.g., Schur complement, reduced Hessian)
  – preconditioned with a variable matrix (i.e., iteration dependent)
- Apply all this to parabolic control problems
- Use a Parareal-in-time approximation
16
Truncated Krylov subspace methods
For the same amount of storage (at most ℓ vectors), instead of restarting: only orthogonalize with respect to the previous ℓ vectors. In Arnoldi we then have
  vk+1 hk+1,k = Hvk − Σ_{j=max{1,k−ℓ+1}}^{k} vj hjk,
where hjk = ⟨vj, Hvk⟩, j ≤ k, and hk+1,k is positive and such that ‖vk+1‖ = 1.
Only these previous ℓ vectors need to be stored.
17
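The only change from full Arnoldi is the lower limit of the orthogonalization loop. A minimal sketch (names mine); note that the relation HVm = Vm+1 Hm+1,m still holds exactly, even though Vm is no longer orthogonal:

```python
import numpy as np

def truncated_arnoldi(A, b, m, ell):
    """Incomplete orthogonalization: v_{k+1} is orthogonalized only
    against the previous ell basis vectors, so H_{m+1,m} is banded."""
    n = len(b)
    V = np.zeros((n, m + 1))
    Hbar = np.zeros((m + 1, m))
    V[:, 0] = b / np.linalg.norm(b)
    for k in range(m):
        w = A @ V[:, k]
        for j in range(max(0, k - ell + 1), k + 1):  # only ell previous vectors
            Hbar[j, k] = V[:, j] @ w
            w -= Hbar[j, k] * V[:, j]
        Hbar[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / Hbar[k + 1, k]
    return V, Hbar
```

(For storage-efficiency one would keep only the last ℓ columns of V; they are all retained here just to check the relation.)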
Upper Hessenberg matrix is now banded (upper bandwidth ℓ).
Hm+1,m =
  [ h11  h12  h13             ]
  [ h21  h22  h23   ⋱         ]
  [       ⋱    ⋱     ⋱        ]
  [          hm,m−1    hmm    ]
  [                  hm+1,m   ]
18
Truncated Krylov subspace methods (cont.)
- Truncated GMRES [Saad and Wu, 1996]; Truncated FOM [Saad, 1981], [Jia, 1996]
- The basis v1, v2, …, vm of Km(H, r0) (m > ℓ) is not orthogonal, but xm ∈ Km(H, r0), and the minimization (or Galerkin condition) is over the whole space.
- Hm+1,m banded with upper semiband ℓ − 2. The matrix Vm of basis vectors is not orthogonal. Can be implemented so that only O(ℓ) vectors are stored.
- Extreme case, ℓ = 3: Hm+1,m tridiagonal. If H is SPD, FOM reduces to CG (and Vm is automatically orthogonal).
- Theory for "non-optimal methods" [Simoncini and Szyld, 2005]
19
Example: L(u) = −uxx − uyy + 100(x + y)ux + 100(x + y)uy, on [0, 1]², Dirichlet b.c., centered 5-point discretization, n = 2500.
[Figure: residual norm (10⁻⁹ to 10¹) vs. number of iterations (left) and vs. flops (right).]
GMRES, Truncated ℓ = 3.
20
Inexact Krylov subspace methods
- At the kth iteration of the Krylov subspace method, use (H + Dk)vk−1 instead of Hvk−1, where Dk can be monitored.
- Two examples now:
  – Schur complement, where the inverse is approximated
  – Inexact preconditioning
- [Bouras, Frayssé, and Giraud, CERFACS reports 2000, SIMAX 2005] show experimentally that as k progresses, ‖Dk‖ can be allowed to be larger; see also [Golub and Ye, 1999], [Notay, 1999], [Sleijpen and van der Eshof, 2004], and [Simoncini and Eldén, 2002].
21
Inexact Krylov (cont.)
We repeat: ‖Dk‖ small at first, ‖Dk‖ can be big later. Convergence is maintained!
- Instead of HVm = Vm+1 Hm+1,m, we now have
  [(H + D1)v1, (H + D2)v2, …, (H + Dm)vm] = Vm+1 Hm+1,m
- The subspace spanned by v1, v2, …, vm is not a Krylov subspace, but Vm is orthogonal (in the full case).
22
Theorem for Inexact FOM
[Simoncini and Szyld, 2003]
True residual: rm = b − Hxm = r0 − HVm ym
Computed residual (e.g.): r̃m = r0 − Vm+1 Hm+1,m ym = r0 − Wm ym
Let ε > 0. If for every k ≤ m,
  ‖Dk‖ ≤ (σmin(Hm∗)/m∗) · (1/‖r̃k−1‖) · ε ≡ ℓ_m^F · (1/‖r̃k−1‖) · ε,
then ‖Vm^T rm‖ ≤ ε and ‖rm − r̃m‖ ≤ ε.
(m∗ is the maximum number of iterations allowed.)
Similar results hold for inexact GMRES; see also [Giraud, Gratton, Langou, 2007].
23
Theorem for Inexact Truncated FOM
  ‖Dk‖ ≤ (σmin(Hm∗) σmin(Vm)/m∗) · (1/‖r̃k−1‖) · ε ≡ ℓ_m^{TF} · (1/‖r̃k−1‖) · ε
implies ‖Vm^T rm‖ ≤ ε and δm = ‖rm − r̃m‖ ≤ ε.
Notes:
- This result applies in particular to Inexact CG.
- ℓm can be estimated from the problem, if information is available.
24
First Experiment
H = diag([10⁻⁴, 2, 3, …, 100])
Dk = symm[αk randn(100, 100)]
b = randn(100, 1)
We chose ε = 10⁻⁸.
- Our condition (e.g., for FOM)
  ‖Dk‖ ≤ (σmin(H)/m∗) · (1/‖r̃k−1‖) · ε
is very conservative. In most cases it is too strict. However, σmin(H) does play a role.
25
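An experiment in this spirit can be set up in a few lines. This is a sketch, not the talk's exact code: the matrix is better conditioned than the slide's H (here diag(linspace(1, 10)), so σmin(H) = 1), the function name `inexact_fom` is mine, and with m∗ = 100 and ε = 10⁻⁶ the theorem's bound allows a matrix-vector error of 10⁻⁸/‖r̃k−1‖, which grows as the residual shrinks:

```python
import numpy as np

def inexact_fom(matvec, b, m_max, inner_tol, outer_tol=1e-8):
    """FOM with an inexact product: matvec(v, tol) returns H @ v up to an
    error of norm <= tol.  The allowed error is *relaxed* like
    inner_tol / ||r_{k-1}|| as the computed residual norm shrinks."""
    n = len(b)
    V = np.zeros((n, m_max + 1))
    Hb = np.zeros((m_max + 1, m_max))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    r_norm = beta
    for k in range(m_max):
        w = matvec(V[:, k], inner_tol / r_norm)  # looser as r_norm shrinks
        for j in range(k + 1):                   # modified Gram-Schmidt
            Hb[j, k] = V[:, j] @ w
            w -= Hb[j, k] * V[:, j]
        Hb[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / Hb[k + 1, k]
        e1 = np.zeros(k + 1); e1[0] = beta
        y = np.linalg.solve(Hb[:k + 1, :k + 1], e1)
        r_norm = Hb[k + 1, k] * abs(y[-1])       # computed residual norm
        if r_norm < outer_tol:
            break
    return V[:, :k + 1] @ y, r_norm
```

Even though the late matvec errors become large, the gap between the true and computed residuals stays of order ε, as the theorem predicts.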
CG: condition ‖Dk‖ ≤ (σmin(H)/m∗) · (1/‖r̃k−1‖) · ε
[Figure: ‖Dk‖, ‖Vk^T rk‖, and ‖rk‖ vs. number of iterations.]
‖Vm^T rm‖ ≪ ε
26
Applications:
- I. Schur complement systems
  [ A    B ] [ w ]   [ f ]
  [ B^T  0 ] [ x ] = [ 0 ]
  B^T A⁻¹ B x = B^T A⁻¹ f;  A w = f − B x
  H x = b
A⁻¹ is not applied exactly (use a Krylov method).
27
Applications: I. Schur complement systems (cont.)
- A⁻¹ is not applied exactly (use a Krylov method).
- Replace Hv with B^T z_j^(k), where z_j^(k) is the approximation obtained at the jth (inner) iteration of the solution of Az = Bv.
- The question is then: how many inner iterations? i.e., at what value of j do we stop?
"Translate" the conditions on Dk into conditions on the norm of the inner residual. Let r_k^inner = A z_j^(k) − Bv be the inner residual. Take
  ‖r_k^inner‖ < (σmin(Hm⋆) / (‖B^T A⁻¹‖ m⋆)) · (1/‖r̃_{k−1}^fom‖) · ε ≡ ε^inner
28
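The inner-outer structure above can be sketched as follows: an outer inexact FOM on S = B^T A⁻¹ B whose matrix-vector product runs an inner CG on Az = Bvk, stopped by the relaxed criterion. This is a sketch under simplifying assumptions (the constant c0 lumps together the σmin and ‖B^T A⁻¹‖ factors, and all names are mine):

```python
import numpy as np

def cg(A, rhs, tol, maxit=1000):
    """Plain CG, stopped when the (inner) residual norm drops below tol."""
    x = np.zeros_like(rhs); r = rhs.copy(); p = r.copy(); rho = r @ r
    for _ in range(maxit):
        if np.sqrt(rho) <= tol:
            break
        Ap = A @ p
        alpha = rho / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rho, rho_old = r @ r, rho
        p = r + (rho / rho_old) * p
    return x

def schur_fom(A, B, b, eps, c0, m_max=200):
    """Inexact FOM on S = B^T A^{-1} B: each product S v_k is realized by
    an inner CG on A z = B v_k, stopped once the inner residual norm is
    below c0 * eps / ||r_{k-1}||  (looser as the outer residual shrinks)."""
    n = len(b)
    V = np.zeros((n, m_max + 1)); Hb = np.zeros((m_max + 1, m_max))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    r_norm = beta
    for k in range(m_max):
        z = cg(A, B @ V[:, k], c0 * eps / r_norm)   # relaxed inner solve
        w = B.T @ z                                 # inexact S v_k
        for j in range(k + 1):
            Hb[j, k] = V[:, j] @ w
            w -= Hb[j, k] * V[:, j]
        Hb[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / Hb[k + 1, k]
        e1 = np.zeros(k + 1); e1[0] = beta
        y = np.linalg.solve(Hb[:k + 1, :k + 1], e1)
        r_norm = Hb[k + 1, k] * abs(y[-1])          # computed outer residual
        if r_norm < eps:
            break
    return V[:, :k + 1] @ y
```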
[Figure: δm, ‖rm‖, ‖r̃m‖, and ε^inner vs. number of iterations.]
- Two-dimensional saddle point magnetostatic problem from [Perugia, Simoncini, Arioli, 1999]; A is 1272 × 1272
- Inexact FOM, m⋆ = 120, ε = 10⁻⁴
29
Applications:
- II. Inexact Preconditioning
Hx = b  →  HP⁻¹x̄ = b, x = P⁻¹x̄
P⁻¹ is not applied exactly (use a Krylov method).
HP⁻¹vk is replaced with H z̃k, z̃k ≈ P⁻¹vk. The Arnoldi relation HP⁻¹Vm = Vm+1 Hm+1,m is transformed into
  H[z̃1, …, z̃m] = Vm+1 Hm+1,m.
Use a Flexible Krylov subspace method.
r_k^inner = vk − P z̃k is the inner residual; take
  ‖r_k^inner‖ ≤ (σmin(Hm⋆) / (‖HP⁻¹‖ m⋆)) · (1/‖r̃_{k−1}^gm‖) · ε ≡ ε^inner
30
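A flexible method keeps the preconditioned vectors z̃k and forms the iterate from them, since no single P⁻¹ exists to undo. A minimal flexible-GMRES sketch (names mine; a dense `lstsq` replaces the incremental QR, and the iteration-dependent preconditioner is passed in as a callback):

```python
import numpy as np

def fgmres(A, b, m, psolve):
    """Flexible GMRES: the preconditioner may change from step to step,
    so z_k ~ P_k^{-1} v_k are stored and the iterate is x_m = Z_m y_m."""
    n = len(b)
    V = np.zeros((n, m + 1)); Z = np.zeros((n, m)); Hb = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    for k in range(m):
        Z[:, k] = psolve(V[:, k], k)     # iteration-dependent preconditioning
        w = A @ Z[:, k]
        for j in range(k + 1):
            Hb[j, k] = V[:, j] @ w
            w -= Hb[j, k] * V[:, j]
        Hb[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / Hb[k + 1, k]
    rhs = np.zeros(m + 1); rhs[0] = beta
    y, *_ = np.linalg.lstsq(Hb, rhs, rcond=None)
    return Z @ y
```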
[Figure: ‖rm‖, ‖r̃m‖, ε^inner, and δm vs. number of iterations.]
For the same 2D saddle point problem, use the preconditioner P = [ I  B ; B^T  0 ]. Solve B^T B pk = rhs iteratively; m⋆ = 80, ε = 10⁻⁹, tolerance ε^inner.
31
Some CPU Times: Same Magnetostatic 2D Problem
Outer tolerance: ε = 10⁻⁸
  ‖r_k^inner‖ ≤ c0 · (1/‖r_{m−1}^outer‖) · ε ≡ ε^inner
c0: constant estimated a priori; here we use 10⁻² and 10⁻⁴.
Elapsed CPU time in seconds on a Sun Enterprise 4500 (Fortran code; 4 CPUs, 400 MHz, 2 GB RAM). CG iteration counts in parentheses.

  Problem size | Fixed inner tol. ε^inner = 10⁻¹⁰ | Var. inner tol. 10⁻¹⁰/‖r‖ | Var. inner tol. 10⁻¹²/‖r‖
  3810         | 17.0 (54)                        | 11.4 (54)                 | 14.7 (54)
  9102         | 82.9 (58)                        | 62.8 (58)                 | 70.7 (58)
  14880        | 198.4 (54)                       | 156.5 (54)                | 170.1 (54)
32
Applications:
- III. Parabolic Control Problems
Inverse problem: recover the control v(x) based on the field (state) z(x), related by the forward problem
  zt + Az = v,   x ∈ Ω
  z = g,         x ∈ ∂Ω
  z = z0,        x ∈ Ω \ ∂Ω, for t = 0
A elliptic, e.g., A = −∆.
This is a distributed control problem. Similar techniques apply to boundary control problems (control g).
33
Associated variational problem
  J(z(v), v) := (α/2) ∫_{t0}^{tf} ‖z(v)(t, ·) − ỹ(t, ·)‖²_{L²(Ω)} dt
              + (β/2) ‖z(v)(tf, ·) − ỹ(tf, ·)‖²_{L²(Ω)}
              + (γ/2) ∫_{t0}^{tf} ‖v(t, ·)‖²_{L²(Ω)} dt,
discretized as
  J_h^τ(z, v) = (1/2)(z − ỹ)^T K (z − ỹ) + (1/2) v^T G v + (z − ỹ)^T g.
34
Discretized forward problem (FD)
  Ez + Nv = f
  E = [  F1             ]      N = [ B         ]
      [ −F0   F1        ]          [    B      ]
      [      ⋱    ⋱     ]          [      ⋱    ]
      [     −F0    F1   ]          [         B ]
35
Optimization problem
  min J_h^τ(z, v)  subject to  Ez + Nv = f
Lagrangian: J_h^τ(z, v) + q^T(Ez + Nv − f)
36
Linearize to obtain
  [ K  0  E^T ] [ z ]   [ g ]
  [ 0  G  N^T ] [ u ] = [ 0 ]
  [ E  N  0   ] [ p ]   [ f ]
After elimination one obtains
  Hu := (G + N^T E^{−T} K E^{−1} N)u = b,
H being the SPD reduced Hessian.
37
  Hu = (G + N^T E^{−T} K E^{−1} N)u = b
We use inexact FOM, approximating each of the systems with E and E^T by a Parareal method with varying (increasing) tolerance.
Matrix-vector product Hv:
- 1. Multiply Nv
- 2. Solve Ez = Nv with an inner tolerance ǫin1
- 3. Multiply Kz
- 4. Solve E^T w = Kz with an inner tolerance ǫin2
- 5. Compute N^T w
38
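With exact solves in steps 2 and 4, the five-step product can be sketched as follows for the block lower bidiagonal E above: Ez = c is a forward (time-stepping) sweep and E^T w = d a backward (adjoint) sweep. This is a minimal sketch: the talk replaces the two solves by Parareal with tolerances ǫin1, ǫin2, and the function name and exact-solve shortcut are mine:

```python
import numpy as np

def reduced_hessian_matvec(G, K, N, F0, F1, v, nt):
    """One product H v = (G + N^T E^{-T} K E^{-1} N) v, where E is block
    lower bidiagonal with F1 on the diagonal and -F0 below (nt blocks)."""
    nx = F1.shape[0]
    c = (N @ v).reshape(nt, nx)                  # step 1: multiply N v
    z = np.zeros_like(c)                         # step 2: solve E z = N v
    for i in range(nt):                          # forward sweep in time
        rhs = c[i] + (F0 @ z[i - 1] if i > 0 else 0.0)
        z[i] = np.linalg.solve(F1, rhs)
    d = (K @ z.ravel()).reshape(nt, nx)          # step 3: multiply K z
    w = np.zeros_like(d)                         # step 4: solve E^T w = K z
    for i in reversed(range(nt)):                # backward (adjoint) sweep
        rhs = d[i] + (F0.T @ w[i + 1] if i < nt - 1 else 0.0)
        w[i] = np.linalg.solve(F1.T, rhs)
    return G @ v + np.transpose(N) @ w.ravel()   # step 5: add N^T w
```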
Approximate solutions of systems with E or E^T
  Hu = (G + N^T E^{−T} K E^{−1} N)u = b
We prove that with ǫin1 = ǫin2, we have spectral equivalence of the form
  (v, Gv) ≤ (v, Hv) ≤ µ (v, Gv).
We choose FOM, since it reduces to CG when we have full symmetry.
39
Sample Experiment: 2D heat equation
15 × 15 grid, τ = 1/512, control u of order 115200.
Same relative tolerance ǫ for both Parareal systems; outer ε = 10⁻⁶.
Outer iterations (total inner iterations):

  ǫ      | IFOM    | TIFOM, mT = 2 | mT = 4  | mT = 8  | mT = 12
  10⁻¹²  | 15(576) | 16(610)       | 16(608) | 15(576) | 15(576)
  10⁻¹⁰  | 15(482) | 17(532)       | 17(528) | 16(504) | 15(482)
  10⁻⁸   | 15(388) | 17(426)       | 17(426) | 17(420) | 15(388)
  10⁻⁷   | 15(340) | 18(394)       | 17(374) | 17(368) | 15(340)
  10⁻⁶   | 15(288) | 19(340)       | 19(338) | 18(320) | 16(298)
  10⁻⁵   | 17(238) | 24(298)       | 21(266) | 19(258) | 19(242)
  10⁻⁴   | 17(180) | n.c.          | 22(210) | 28(254) | 20(192)
40
One surface of the true and recovered model, and their difference; increasing ε^inner = 10⁻⁵/‖r̃k−1‖, mT = 8.
[Figure: true surface, computed surface, and their difference; error O(10⁻⁷).]
41
TIFOM Convergence curves
[Figure: residual norms (10⁻⁷ to 10⁻¹) vs. iterations for ICG, TIFOM(4), TIFOM(8), TIFOM(12), TIFOM(20).]
- Outer ε = 10⁻⁶, inner factor = 10⁻⁵
42
Similar results for the more non-symmetric case. Outer iterations (inner iterations), with separate inner tolerances ǫin1 and ǫin2:

  ǫin1  | ǫin2  | IFOM    | TIFOM, mT = 2 | mT = 4  | mT = 8  | mT = 12
  10⁻⁶  | 10⁻⁷  | 15(288) | 18(330)       | 19(338) | 17(310) | 16(298)
  10⁻⁶  | 10⁻⁶  | 15(288) | 19(340)       | 19(338) | 18(320) | 16(298)
  10⁻⁶  | 10⁻⁵  | 16(300) | 23(390)       | 20(346) | 19(334) | 17(310)
  10⁻⁶  | 10⁻⁴  | 16(300) | 51(710)       | 37(520) | 20(352) | 17(310)
  10⁻⁵  | 10⁻⁷  | 15(234) | 19(270)       | 27(280) | 18(254) | 21(246)
  10⁻⁵  | 10⁻⁶  | 15(234) | 19(270)       | 29(284) | 18(254) | 19(242)
  10⁻⁵  | 10⁻⁵  | 17(238) | 24(298)       | 21(266) | 19(258) | 19(242)
  10⁻⁵  | 10⁻⁴  | 17(240) | 30(334)       | 31(356) | 20(270) | 19(244)
43
One surface of the true and recovered model, and their difference; outer tolerance ε = 10⁻⁶, ǫin1 = 10⁻⁶/‖r̃k−1‖, ǫin2 = 10⁻⁴/‖r̃k−1‖, mT = 8.
[Figure: true surface, computed surface, and their difference; error O(10⁻⁶).]
44
Conclusions
- Inexact matrix-vector products (or inexact preconditioning) might be worth trying for your problem
- Truncated methods might be worth trying for your problem
- New work on inexact and truncated methods for parabolic control problems