

SLIDE 1

On optimal short recurrences for generating orthogonal Krylov subspace bases

Jörg Liesen

based on joint work with Zdeněk Strakoš and Petr Tichý (Czech Academy of Sciences), Vance Faber (BD Biosciences, Seattle), Beresford Parlett (Berkeley), and Paul Saylor (Illinois)

SLIDE 2

Overview

  • 1. Introduction: Krylov subspace methods
  • 2. Optimal short recurrences
  • 3. Characterization and examples
SLIDE 3

Introduction: Krylov subspace methods (1)

  • Methods that are based on

K_n(A, v) ≡ span{v, Av, . . . , A^{n−1}v},  n = 1, 2, . . . ,

where A is a given square matrix and v is the initial vector.

  • Must generate bases of K_n(A, v), n = 1, 2, . . . .
  • Trivial choice: v, Av, . . . , A^{n−1}v.

This is computationally infeasible (recall the Power Method).
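A minimal numerical illustration of this point (my own sketch, not from the talk; assuming Python with NumPy):

```python
# The powers A^j v align with a dominant eigenvector (the Power Method
# effect), so the trivial basis v, Av, ..., A^{n-1}v becomes numerically
# dependent even for a perfectly conditioned A.
import numpy as np

rng = np.random.default_rng(0)
A = np.diag(np.linspace(1.0, 10.0, 50))   # harmless diagonal test matrix
v = rng.standard_normal(50)

K = np.column_stack([np.linalg.matrix_power(A, j) @ v for j in range(10)])
print(f"condition number of the trivial basis (n = 10): {np.linalg.cond(K):.2e}")
```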

SLIDE 4

Introduction: Krylov subspace methods (1)

  • For numerical stability: Well conditioned basis.
  • For computational efficiency: Short recurrence.
  • Best of both worlds: a well conditioned basis generated by a short recurrence.
  • First such method for Ax = b:

Conjugate gradient (CG) method of Hestenes and Stiefel (1952).

SLIDE 5

Introduction: Krylov subspace methods (2)

The classical CG method of Hestenes and Stiefel

(US National Bureau of Standards Preprint No. 1659, March 10, 1952)

The residual vectors r_0, r_1, . . . , r_{n−1} are generated by a short recurrence and form an orthogonal basis of K_n(A, r_0).
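A runnable sketch of the classical CG recurrence (my own illustration, assuming NumPy; the matrix and step count are arbitrary), verifying the stated orthogonality of the residuals:

```python
# Hestenes-Stiefel CG for SPD A: only the current r and p are carried
# (a short recurrence), yet the residuals r_0, r_1, ... come out mutually
# orthogonal and span the Krylov subspaces K_n(A, r_0).
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((30, 30))
A = M @ M.T + 30 * np.eye(30)             # symmetric positive definite
b = rng.standard_normal(30)

x = np.zeros(30)
r = b - A @ x
p = r.copy()
residuals = [r.copy()]
for _ in range(8):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x = x + alpha * p
    r_new = r - alpha * Ap
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new
    residuals.append(r.copy())

R = np.column_stack(residuals)
G = R.T @ R
print(f"max off-diagonal of R^T R: {np.abs(G - np.diag(np.diag(G))).max():.2e}")
```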

SLIDE 6

Introduction: Krylov subspace methods (3)

  • CG is for symmetric positive definite A.
  • (Paige and Saunders, 1975):

Short recurrence & orthogonal basis methods (MINRES, SYMMLQ) for symmetric A.

SLIDE 7

Introduction: Krylov subspace methods (4)

  • By the end of the 1970s it was unknown if such methods existed

also for general unsymmetric A.

  • Golub posed the question at Gatlinburg VIII

(now Householder VIII), held in Oxford from July 5 to 11, 1981:

SLIDE 8

Introduction: Krylov subspace methods (5)

  • We want to solve Ax = b iteratively, starting from x_0.
  • Step n = 1, 2, . . . :  x_n = x_{n−1} + α_{n−1} p_{n−1},

direction vector p_{n−1}, scalar α_{n−1} (both to be determined).

  • Krylov subspace method:

span{p_0, . . . , p_{n−1}} = K_n(A, v)  (v = r_0 = b − A x_0).

  • CG-like descent method:

Error is minimized in some given inner product norm, ‖x − x_n‖_B, where ‖u‖_B ≡ ⟨u, u⟩_B^{1/2}.

SLIDE 9

Introduction: Krylov subspace methods (5)

  • ‖x − x_n‖_B is minimal iff x − x_n ⊥_B span{p_0, . . . , p_{n−1}}.
  • By construction, this is satisfied iff

α_{n−1} = ⟨x − x_{n−1}, p_{n−1}⟩_B / ⟨p_{n−1}, p_{n−1}⟩_B  and  ⟨p_{n−1}, p_j⟩_B = 0,  j = 0, . . . , n − 2,

i.e. p_0, . . . , p_{n−1} form a B-orthogonal basis of K_n(A, v).
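A small check of the minimizing α (my own sketch, assuming NumPy; B and the vectors are random stand-ins):

```python
# The one-dimensional minimization of ||x - (x_prev + alpha*p)||_B has the
# closed-form solution alpha = <x - x_prev, p>_B / <p, p>_B stated above.
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((10, 10))
B = M @ M.T + 10 * np.eye(10)             # HPD, defines <x, y>_B = y^T B x
ip = lambda u, w: w @ (B @ u)             # real case

x, x_prev, p = rng.standard_normal((3, 10))
alpha = ip(x - x_prev, p) / ip(p, p)

err = lambda a: np.sqrt(ip(x - (x_prev + a * p), x - (x_prev + a * p)))
grid = np.linspace(alpha - 1.0, alpha + 1.0, 2001)
best = grid[np.argmin([err(a) for a in grid])]
print(f"grid minimizer {best:.4f} vs formula {alpha:.4f}")   # they agree
```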

SLIDE 10

Introduction: Krylov subspace methods (6)

  • Faber and Manteuffel answered Golub's question in 1984:

For a general matrix A there exists no CG-like descent method with a short recurrence.

SLIDE 11

Optimal short recurrences (1)

Notation:

  • Matrix A ∈ ℂ^{N×N}, nonsingular.
  • Matrix B ∈ ℂ^{N×N}, Hermitian positive definite (HPD),

defining the B-inner product, ⟨x, y⟩_B ≡ y^* B x.

  • Initial vector v ∈ ℂ^N.
  • d = d(A, v), the grade of v with respect to A,

K_1(A, v) ⊂ . . . ⊂ K_d(A, v) = K_{d+1}(A, v) = . . . .

SLIDE 12

Optimal short recurrences (1)

Our goal: Generate a B-orthogonal basis v_1, . . . , v_d of K_d(A, v).

  • 1. span{v_1, . . . , v_n} = K_n(A, v),

for n = 1, . . . , d,

  • 2. ⟨v_j, v_k⟩_B = 0,

for j ≠ k, j, k = 1, . . . , d.

SLIDE 13

Optimal short recurrences (2)

  • Standard way for generating the B-orthogonal basis: Arnoldi's method,

v_{n+1} = A v_n − Σ_{m=1}^{n} h_{m,n} v_m,  n = 1, . . . , d − 1,

h_{m,n} = ⟨A v_n, v_m⟩_B / ⟨v_m, v_m⟩_B,  d = dim K_d(A, v).

(No normalization for convenience.)

SLIDE 14

Optimal short recurrences (2)

  • Rewritten in matrix notation: A V_{d−1} = V_d H_{d,d−1}, where

V_n ≡ [v_1, . . . , v_n],

H_{d,d−1} ≡ [h_{m,n}], a matrix of size d × (d − 1), upper Hessenberg with unit subdiagonal (h_{n+1,n} = 1),

V_d^* B V_d is diagonal,  d = dim K_d(A, v).
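A runnable sketch of this unnormalized B-Arnoldi process (my own illustration, assuming NumPy; sizes are arbitrary), checking both the matrix relation and the B-orthogonality:

```python
# Unnormalized Arnoldi in <.,.>_B: v_{n+1} = A v_n - sum_m h_{m,n} v_m.
# The h_{m,n} fill an upper Hessenberg H with unit subdiagonal, and
# A V_{d-1} = V_d H_{d,d-1} with V_d^T B V_d diagonal.
import numpy as np

def b_arnoldi(A, B, v, steps):
    ip = lambda u, w: w @ (B @ u)         # B-inner product (real case)
    V, H = [v], np.zeros((steps + 1, steps))
    for n in range(steps):
        w = A @ V[n]
        for m in range(n + 1):            # modified Gram-Schmidt in <.,.>_B
            H[m, n] = ip(w, V[m]) / ip(V[m], V[m])
            w = w - H[m, n] * V[m]
        H[n + 1, n] = 1.0                 # no normalization: unit subdiagonal
        V.append(w)
    return np.column_stack(V), H

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 8))
M = rng.standard_normal((8, 8))
B = M @ M.T + 8 * np.eye(8)
V, H = b_arnoldi(A, B, rng.standard_normal(8), 5)

G = V.T @ B @ V
print(np.abs(G - np.diag(np.diag(G))).max())   # ~ 0: V^T B V diagonal
print(np.abs(A @ V[:, :-1] - V @ H).max())     # ~ 0: A V_{d-1} = V_d H
```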

SLIDE 15

Optimal short recurrences (3)

  • The full sum in Arnoldi's method,

v_{n+1} = A v_n − Σ_{m=1}^{n} h_{m,n} v_m,  n = 1, . . . , d − 1,

becomes an (s + 2)-term recurrence when truncated to

v_{n+1} = A v_n − Σ_{m=n−s}^{n} h_{m,n} v_m,  n = 1, . . . , d − 1.

  • For s = 1: Optimal 3-term recurrence,

v_{n+1} = A v_n − h_{n,n} v_n − h_{n−1,n} v_{n−1}.

  • Why optimal?

SLIDE 16

Optimal short recurrences (4)

[Figure: band structure of H_{d,d−1}: only the s diagonals above the main diagonal and the unit subdiagonal can be nonzero; the largest upper triangle that is zero has order d − s − 2.]

Optimal (s + 2)-term recurrence: H_{d,d−1} is (s + 2)-band Hessenberg.

SLIDE 17

Optimal short recurrences (4)

Definition (L. and Strakoš, 2007): Given A, B as above and a nonnegative integer s with s + 2 ≤ d(A). (d(A) = degree of A's minimal polynomial.) Then A admits for the given B an optimal (s + 2)-term recurrence, if

  • for any v the matrix H_{d,d−1} is at most (s + 2)-band Hessenberg, and
  • for at least one v the matrix H_{d,d−1} is (s + 2)-band Hessenberg.

SLIDE 18

Optimal short recurrences (5)

  • H_{d,d−1} is at most (s + 2)-band Hessenberg if

h_{m,n} = ⟨A v_n, v_m⟩_B / ⟨v_m, v_m⟩_B = 0,  for n > m + s,  n = 1, . . . , d − 1.

  • Therefore: h_{m,n} = 0

⟺ 0 = ⟨A v_n, v_m⟩_B = ⟨v_n, A^+ v_m⟩_B,  where A^+ ≡ B^{−1} A^* B is the B-adjoint of A.

SLIDE 19

Optimal short recurrences (5)

  • If A^+ = p(A) for a polynomial p of degree s, then A^+ v_m ∈ K_{m+s}(A, v).
  • Then for n > m + s:

v_n ⊥_B A^+ v_m  ⟺  h_{m,n} = 0.

  • Hence: If A^+ = p(A), then H_{d,d−1} is at most (s + 2)-band Hessenberg.
  • The condition A^+ = p(A) is sufficient in this context.
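A numerical illustration of this sufficiency argument in its simplest instance (my own sketch, assuming NumPy): for symmetric A and B = I we have A^+ = A = p(A) with deg p = 1, so all coefficients above the 3-band must vanish:

```python
# For A = A^T and B = I, A is B-normal(1), so h_{m,n} = 0 whenever
# n > m + 1: the Hessenberg matrix H collapses to a 3-band (tridiagonal-
# plus-unit-subdiagonal) matrix, i.e., the Lanczos recurrence.
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((8, 8))
A = M + M.T                               # symmetric test matrix
v = rng.standard_normal(8)

V, H = [v], np.zeros((8, 7))
for n in range(7):
    w = A @ V[n]
    for m in range(n + 1):
        H[m, n] = (w @ V[m]) / (V[m] @ V[m])
        w = w - H[m, n] * V[m]
    V.append(w)

print(np.abs(np.triu(H, 2)).max())        # ~ 0: at most 3-band Hessenberg
```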

SLIDE 20

Optimal short recurrences (6)

  • If A^+ = p(A), where p is a polynomial of the smallest possible degree s,

then A is called B-normal(s).

Theorem (Faber and Manteuffel, 1984): For A, B as above, and a nonnegative integer s with s + 2 < d(A): A admits for the given B an optimal (s + 2)-term recurrence if and only if A is B-normal(s).

  • This is a (generalized) characterization of normality.
  • Sufficiency is rather straightforward, necessity is not.

Key words from the proof of necessity in (Faber and Manteuffel, 1984) include: "continuous function" (analysis), "closed set of smaller dimension" (topology), "wedge product" (multilinear algebra).

  • In (Faber, L., Tichý, 2007) we give two new proofs, both using only "elementary" tools.

SLIDE 21

Optimal short recurrences (7)

In (Faber, L., Tichý, 2007) we explain why necessity is so difficult to prove:

  • Assumption: For any v,

(1)  A [v_1, . . . , v_{d−1}] = [v_1, . . . , v_{d−1}, v_d] H_{d,d−1},  where H_{d,d−1} is (s + 2)-band Hessenberg.

To prove: A is B-normal(s).

SLIDE 22

Optimal short recurrences (7)

  • By construction, A v_d ∈ K_d(A, v), i.e.

A v_d = Σ_{m=1}^{d} h_{m,d} v_m.

Adding this as the last column in (1) gives

(2)  A [v_1, . . . , v_{d−1}, v_d] = [v_1, . . . , v_{d−1}, v_d] H_d,

where H_d is, in general, no longer (s + 2)-band Hessenberg: its last column can be full.

slide-23
SLIDE 23

Optimal short recurrences (7)

In (Faber, L., Tich´ y, 2007) we explain why necessity is so difficult to prove:

  • Assumption: For any v,

(1) A [v, . . . , v−] = [v, . . . , v−, v] H−, where H− is (s + 2)Cband Hessenberg. To prove: A is BCnormal(s).

  • By construction, Av ∈ K(A, v), i.e.

Av =

  • hv.

Adding this in (1) gives (2) A [v, . . . , v−, v] = [v, . . . , v−, v] H. " ! # $! ! H (s + 2)

  • H is the matrix representation of A : K(A, v) → K(A, v)

with respect to the orthonormal basis v, . . . , v.

  • Problem: (2) constrains A only on the single Krylov subspace K_d(A, v); the B-normality of A has to be deduced from the band structure of H_{d,d−1} for all initial vectors v at once.

SLIDE 24

Characterization and examples (1)

  • In practice A is given and we ask:

Does there exist an HPD B such that A is B-normal(s) with small s?

  • Standard example: A = A^* and B = I; then A^+ = p(A) for p(z) = z.
SLIDE 25

Characterization and examples (1)

  • More interesting example: Saddle point matrices,

A = [A_{11}, A_{12}; −A_{12}^T, A_{22}],   B = B(γ) = [A_{11} − γI, A_{12}; A_{12}^T, γI − A_{22}],

where A_{11} = A_{11}^T > 0, A_{22} = A_{22}^T ≥ 0, and A_{12} has full rank k.

  • A is a standard saddle point matrix with negated second block row.
  • Transformation: symmetric indefinite into nonsymmetric positive real.
  • (L. and Parlett, 2007):

B = B^T > 0 holds if ‖A_{12}‖² < (λ_min(A_{11}) − γ)(γ − λ_max(A_{22})), and in this case A is B-normal(1).
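A numerical sketch of this construction (my own, assuming NumPy; the block sizes, the small scaling of A_{12}, and the choice of γ are illustrative assumptions):

```python
# Saddle point matrix with negated second block row and the bilinear form
# B(gamma). For admissible gamma, B is SPD and A^T B = B A holds, i.e.,
# A is B-self-adjoint: A^+ = A = p(A) with p(z) = z, so A is B-normal(1).
import numpy as np

rng = np.random.default_rng(5)
m, k = 6, 3
M = rng.standard_normal((m, m))
A11 = M @ M.T + 6 * np.eye(m)             # SPD block
A22 = np.zeros((k, k))                    # PSD block (Stokes-like case)
A12 = 0.1 * rng.standard_normal((m, k))   # full rank k, scaled so B(gamma) > 0

A = np.block([[A11, A12], [-A12.T, A22]])
gamma = 0.5 * np.linalg.eigvalsh(A11).min()   # between lam_max(A22) and lam_min(A11)
B = np.block([[A11 - gamma * np.eye(m), A12],
              [A12.T, gamma * np.eye(k) - A22]])

print(np.linalg.eigvalsh(B).min() > 0)    # True: B = B^T > 0
print(np.abs(A.T @ B - B @ A).max())      # ~ 0: A is B-self-adjoint, hence B-normal(1)
```

The identity A^T B = B A holds exactly for these blocks; the stated norm inequality is what guarantees the positive definiteness of B(γ).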

SLIDE 26

Characterization and examples (2)

  • In (L. and Parlett, 2007) a corresponding 3-term CG method is

constructed and analyzed. This method appears to be competitive with MINRES:

[Figure: New CG method vs. MATLAB's MINRES for a Stokes test problem from IFISS; normalized residual and error norms from 10^{−10} to 10^{−1} over iterations 100 to 600.]

  • solid : normalized MINRES residual norms
  • solid : normalized CG residual norms
  • dashed : normalized CG error norms

SLIDE 27

Characterization and examples (3)

Theorem (L. and Strakoš, 2007): A is B-normal(s) if and only if

  • 1. A is diagonalizable with the eigendecomposition A = WΛW^{−1}, and
  • 2. B = (WDW^*)^{−1}, where D is HPD and block diagonal

with blocks corresponding to those of Λ, and

  • 3. Λ^* = p(Λ) for a polynomial p of (smallest possible) degree s.
SLIDE 28

Characterization and examples (3)

  • Every diagonalizable A = WΛW^{−1} is B-normal(s) for some HPD B and some s,

but there is no optimal short recurrence in the nondiagonalizable case.

  • s is the smallest degree of a polynomial p for which p(Λ) = Λ^*.
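A small illustration of condition 3 (my own sketch, assuming NumPy): for distinct eigenvalues, s is the degree of the polynomial interpolating λ_j ↦ conj(λ_j), and collinear eigenvalues give s = 1:

```python
# Eigenvalues on the vertical line 2 + i*t satisfy conj(lambda) = 4 - lambda,
# a polynomial of degree 1 in lambda: a diagonalizable A with this spectrum
# is B-normal(1) for a suitable HPD B (an optimal 3-term recurrence exists).
import numpy as np

lam = 2.0 + 1j * np.linspace(-1.0, 1.0, 6)    # collinear eigenvalues
Vand = np.vander(lam, 2)                      # columns [lam, 1]
c, *_ = np.linalg.lstsq(Vand, np.conj(lam), rcond=None)
print(np.round(c, 12))                        # ~ [-1, 4]: conj(lam) = -lam + 4
print(np.abs(Vand @ c - np.conj(lam)).max())  # ~ 0: exact degree-1 fit, so s = 1
```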
SLIDE 29

Characterization and examples (4)

(Faber and Manteuffel, 1984; Khavinson and Świątek, 2003)

  • 1. s = 1 if and only if the eigenvalues of A lie on a line in ℂ (are collinear).
  • 2. If the eigenvalues of A are not collinear,

the shortest optimal recurrence A may admit for any HPD B has length at least d(A)/3 + 4.

⟹ Except for a few unimportant cases, the length of the optimal recurrence is either 3 or d(A) − 1.

⟹ Overabundant supply of Krylov subspace methods for general A.

SLIDE 30

Characterization and examples (4)

Options for Ax = b in case the eigenvalues of A are not collinear:

  • Orthogonality but full recurrence (GMRES).
  • Short recurrence but no orthogonality (BiCG, QMR, etc.).
  • "Preconditioners" P so that PA is B-normal(1) for some B,

e.g. (Concus and Golub, 1978) and (Widlund, 1978).

  • Alternative recurrences of isometric Arnoldi type (next).
SLIDE 31

Characterization and examples (5)

  • (Gragg, 1982) discovered the isometric Arnoldi algorithm:

Orthogonal Krylov subspace bases for unitary A can be generated by a (non-optimal) 3-term recurrence of the form v_{n+1} = A v_n − β_n A v_{n−1} − β̃_n v_n (a stable implementation is in the form of two coupled 2-term recurrences).

  • This algorithm is used for solving unitary eigenvalue problems and linear

systems with shifted unitary matrices.
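A compact check of the property underlying Gragg's method (my own sketch, assuming NumPy):

```python
# For unitary A the adjoint is A^* = A^{-1}, i.e., A^+ = p(A) q(A)^{-1}
# with p(z) = 1 and q(z) = z; in general this is not a low-degree
# *polynomial* in A, so no optimal short recurrence exists, but the
# rational structure still yields Gragg's (non-optimal) 3-term recurrence.
import numpy as np

rng = np.random.default_rng(6)
Z = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
Q, _ = np.linalg.qr(Z)                        # a random unitary matrix
print(np.abs(Q.conj().T - np.linalg.inv(Q)).max())   # ~ 0: A^* = A^{-1}
```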

SLIDE 32

Characterization and examples (5)

  • Generalized in (Barth and Manteuffel, 2000):

If A^+ = r(A) for r = p/q, where p and q have degrees s and t, a B-orthogonal Krylov subspace basis can be generated by a (non-optimal) recurrence of length at most t + s + 2.

  • In case A is unitary: A^* = A^{−1}, hence p(z) = 1, q(z) = z.

SLIDE 33

Characterization and examples (5)

  • Are there any other matrices A whose B-adjoint A^+ (for some B) is a low

degree rational function in A?

  • (L., 2007):

There is an HPD B such that A^+ = r(A) with small deg r ≡ max{deg p, deg q} if and only if either d(A) is small,

  • or A is diagonalizable with collinear or concyclic eigenvalues.

More precisely: For diagonalizable A with n ≥ 4 distinct eigenvalues that are neither collinear nor concyclic, the minimal degrees satisfy deg p > n/5, deg q > n/3, and 1 ≤ deg q / deg p < 5.

SLIDE 34

Concluding remarks

  • Completely reworked the theory of short recurrences for generating orthogonal Krylov subspace bases; new, mathematically rigorous definitions of all important concepts have been given.
  • In particular, here we discussed:

new proofs of the fundamental theorem of Faber and Manteuffel, a new 3-term CG method for saddle point matrices, the existence of alternative (isometric Arnoldi style) recurrences.

  • Visit my homepage for the related papers:

J.L. and P. Saylor, Orthogonal Hessenberg reduction and orthogonal Krylov subspace bases, SINUM 42 (2005).
J.L. and Z. Strakoš, On optimal short recurrences for generating orthogonal Krylov subspace bases, to appear in SIREV.
J.L., When is the adjoint of a matrix a low degree rational function in the matrix?, to appear in SIMAX.
V. Faber, J.L. and P. Tichý, The Faber–Manteuffel theorem for linear operators, submitted.
J.L. and B. N. Parlett, On nonsymmetric saddle point matrices that allow conjugate gradient iterations, submitted.