A Chebyshev-based two-stage iterative method as an alternative to the direct solution of linear systems
Mario Arioli
m.arioli@rl.ac.uk
CCLRC-Rutherford Appleton Laboratory with Daniel Ruiz (E.N.S.E.E.I.H.T)
Cortona, 20-24 September, 2004 – p.1/30
Problems with Preconditioning
Techniques for improving the solvers
The two-phase algorithm
Error Analysis
Influence of the different parameters
Numerical experiments
Let M = R^T R be an SPD preconditioner. The two-sided preconditioned system is
R^{-T} A R^{-1} (R u) = R^{-T} b,
i.e. Â û = b̂, with Â = R^{-T} A R^{-1}, û = R u, and b̂ = R^{-T} b.
Good clusterization, but the problem is still ill-conditioned.
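The two-sided preconditioning above can be sketched in a few lines of numpy. As a stand-in for the incomplete Cholesky factors, this sketch uses the Jacobi choice M = diag(A), for which R is simply diag(sqrt(diag(A))); the algebra is identical for any SPD M = R^T R:

```python
import numpy as np

# toy SPD system; R comes from M = R^T R (here the Jacobi choice M = diag(A))
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
R = np.diag(np.sqrt(np.diag(A)))
Rinv = np.linalg.inv(R)

A_hat = Rinv.T @ A @ Rinv              # Â = R^{-T} A R^{-1}, still SPD
b_hat = Rinv.T @ b                     # b̂ = R^{-T} b
u_hat = np.linalg.solve(A_hat, b_hat)  # solve Â (R u) = b̂
u = Rinv @ u_hat                       # recover u from û = R u
```

Since Â is similar to M^{-1}A, its spectrum is exactly the preconditioned spectrum discussed in the following slides.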
Let Ω ⊂ R^2 be a simply connected, bounded polygonal domain. Let a(u, v), ∀u, v ∈ H^1_0(Ω), be a continuous and coercive bilinear form, and L(v) ∈ H^{-1}(Ω). The problem: find u ∈ H^1_0(Ω) such that
a(u, v) = L(v), ∀v ∈ H^1_0(Ω),
has a unique solution.
a(u, v) = ∫_Ω K(x) ∇u · ∇v dx, ∀u, v ∈ H^1_0(Ω), with L(v) as above, ∀v ∈ H^1_0(Ω).

[Figure: the unit square domain Ω, with corners (0,0) and (1,1), containing the subdomains Ω_1 and Ω_2.]

K(x) = 1 for x ∈ Ω \ {Ω_1 ∪ Ω_2}, K(x) = 10^6 for x ∈ Ω_1, K(x) = 10^4 for x ∈ Ω_2.
Using pdetool©, we generated a mesh satisfying the usual regularity conditions of Ciarlet, and we computed a finite-element approximation of the problem using continuous piecewise-linear elements. The approximated problem is equivalent to the following system of linear equations: Au = b. The largest triangle in our mesh has an area of 3.123×10^{-4}; the resulting mesh has 16256 triangles and 8289 nodes, and the linear system has 7969 degrees of freedom.
We used three kinds of preconditioners: the classical Jacobi diagonal matrix M = diag(A), the incomplete Cholesky decomposition of A with zero fill-in (IC(0)), and the incomplete Cholesky decomposition of A with drop tolerance 10^{-2} (IC(10^{-2})). For the incomplete Cholesky decompositions, we computed the upper triangular matrix R such that M = R^T R.

M             κ(M^{-1}A)   λmin          λmax
I             2.6×10^9     3.7×10^{-3}   9.6×10^6
Jacobi        6.8×10^8     3.1×10^{-9}   2.08
IC(0)         9.4×10^7     1.7×10^{-8}   1.6
IC(10^{-2})   6.2×10^6     1.8×10^{-7}   1.1
Number of eigenvalues of M^{-1}A in [λmin, µ]:

µ            Jacobi   IC(0)   IC(10^{-2})
λmax/1000    3
λmax/500     5
λmax/200     18
λmax/100     43       3
λmax/50               11
λmax/20               32
λmax/10      >200     68      3
λmax/5                        9
λmax/2                        40
Good clusterization but problem still ill-conditioned
Several right-hand sides in succession
Implicit matrix
Deflation: compute the invariant space corresponding to the smallest eigenvalues
Filtering + Lanczos
Fix µ such that λmin(Â) < µ < λmax(Â).
Start with an initial set of s randomly generated vectors (s is the block-size), and "damp", in these starting vectors, the eigenfrequencies associated with all the eigenvalues in [µ, λmax(Â)].
Compute all the eigenvectors associated with all the eigenvalues in the range [λmin(Â), µ].
If the eigenvalues are well clustered, the number of remaining eigenvalues in [λmin(Â), µ], with reasonable µ (λmax/100 or λmax/10), should be small compared to the size of the linear system.
[Figure: spectrum of the preconditioned matrix on a logarithmic scale, from 10^{-12} to 10^{-2}. 26 eigs. are below the threshold (µ = λmax/10). Smallest eig. = 1.0897×10^{-13}; largest eig. = 2.5923.]
We can use Chebyshev polynomials to damp:
T_0(ω) = 1, T_1(ω) = ω, T_{n+1}(ω) = 2ωT_n(ω) − T_{n−1}(ω), n ≥ 1.
If we consider d > 1 and H_n(ω) = T_n(ω)/T_n(d), then H_n has minimum ℓ∞ norm on the interval [−1, 1] over all polynomials Q_n of degree less than or equal to n satisfying the condition Q_n(d) = 1, and we have
max_{ω∈[−1,1]} |H_n(ω)| = 1/T_n(d).
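The recurrence and the minimax bound 1/T_n(d) are easy to check numerically; a small sketch (the function name cheb_T is ours, not from the slides):

```python
def cheb_T(n, w):
    # evaluate the Chebyshev polynomial T_n(w) by the three-term recurrence
    t_prev, t = 1.0, w          # T_0 and T_1
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t = t, 2.0 * w * t - t_prev
    return t

# H_n(w) = T_n(w)/T_n(d) equals 1 at w = d and stays below 1/T_n(d) on [-1, 1]
d, n = 1.5, 20
bound = 1.0 / cheb_T(n, d)
samples = [abs(cheb_T(n, -1.0 + k / 100.0) / cheb_T(n, d)) for k in range(201)]
assert max(samples) <= bound + 1e-12
```

Note how fast T_n(d) grows for d > 1: already for n = 20 and d = 1.5 the damping factor 1/T_n(d) is below 10^{-7}.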
Consider the mapping
λ ∈ R → ω_µ(λ) = (λmax(Â) + µ − 2λ) / (λmax(Â) − µ),
with λmin(Â) < µ < λmax(Â) given above. This transformation maps λmax(Â) to −1, µ to 1, and 0 to
ω_µ(0) = d_µ = (λmax(Â) + µ) / (λmax(Â) − µ) > 1.
Then,
P_n(λ) = T_n(ω_µ(λ)) / T_n(d_µ).
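A scalar sketch of P_n (function names are ours; T_n is evaluated through its trigonometric/hyperbolic closed forms, which are stable for |ω| > 1): P_n(0) = 1 by construction, while |P_n| ≤ 1/T_n(d_µ) on [µ, λmax].

```python
import math

def cheb_T(n, w):
    # T_n(w) = cos(n acos w) on [-1, 1], cosh(n acosh |w|) (with sign) outside
    if abs(w) <= 1.0:
        return math.cos(n * math.acos(w))
    sign = -1.0 if (w < -1.0 and n % 2 == 1) else 1.0
    return sign * math.cosh(n * math.acosh(abs(w)))

def filter_poly(lam, n, mu, lam_max):
    # P_n(lambda) = T_n(omega_mu(lambda)) / T_n(d_mu)
    omega = (lam_max + mu - 2.0 * lam) / (lam_max - mu)
    d = (lam_max + mu) / (lam_max - mu)
    return cheb_T(n, omega) / cheb_T(n, d)

mu, lam_max, n = 0.1, 1.0, 30
assert filter_poly(0.0, n, mu, lam_max) == 1.0       # since omega_mu(0) = d_mu
assert abs(filter_poly(0.5 * (mu + lam_max), n, mu, lam_max)) < 1e-8
```

So eigencomponents at λ near 0 pass through essentially unchanged, while everything in [µ, λmax] is reduced below 1/T_n(d_µ).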
Let Â = UΛU^T be the eigendecomposition of the SPD matrix Â, with Λ = diag(λ_i). Writing
z = Σ_{i=1}^{n} u_i ξ_i, with ξ_i = u_i^T z, i = 1, ..., n,
we obtain
v = P_n(Â)z = Σ_{i=1}^{n} u_i (P_n(λ_i) ξ_i).
The eigencomponents of the resulting vector v are close to those of the initial vector z for all i such that λ_i is close to 0, and, for large enough degree n, relatively much smaller for all i such that λ_i ∈ [µ, λmax(Â)].
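A numpy sketch of this damping on a toy diagonal stand-in for Â (so the eigencomponents are simply the vector entries). The function applies P_n(Â) to a block of vectors by running the three-term recurrence on the vectors themselves, never forming P_n(Â); all names and the toy spectrum are our choices:

```python
import numpy as np

def chebyshev_filter(A, Z, n, mu, lam_max):
    # apply v = P_n(A) z column-wise, P_n(l) = T_n(omega_mu(l)) / T_n(d_mu)
    d = (lam_max + mu) / (lam_max - mu)
    omega = lambda V: ((lam_max + mu) * V - 2.0 * (A @ V)) / (lam_max - mu)
    tv_prev, tv = Z, omega(Z)      # T_0(omega)Z and T_1(omega)Z
    ts_prev, ts = 1.0, d           # T_0(d) and T_1(d)
    for _ in range(n - 1):
        tv_prev, tv = tv, 2.0 * omega(tv) - tv_prev
        ts_prev, ts = ts, 2.0 * d * ts - ts_prev
    return tv / ts

rng = np.random.default_rng(0)
eigs = np.concatenate([[1e-6, 1e-5, 1e-4], np.linspace(0.1, 1.0, 97)])
A = np.diag(eigs)                  # toy SPD stand-in for the preconditioned matrix
z = rng.standard_normal((100, 1))
v = chebyshev_filter(A, z, 60, mu=0.1, lam_max=1.0)
# entries with eigenvalue >= mu are damped by about 1/T_60(d_mu);
# entries with eigenvalue near 0 are nearly unchanged
```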
[Figure, left: eigencomponents of the starting set of 6 vectors (randomly generated and orthonormalized). Right: eigencomponents of this set of vectors after 51 Chebyshev iterations (the resulting 6 vectors have also been reorthonormalized).]
We can interpret the use of Chebyshev polynomials as a filtering tool that increases the degree of colinearity with the invariant subspace of the smallest eigenvalues: the filtered vectors have eigencomponents below some selected ε for those eigenvalues in the range [µ, λmax(Â)], and relatively much bigger eigencomponents linked with the smallest eigenvalues in Â.

The block-size s vs. the number k of remaining eigenvalues outside the interval [µ, λmax(Â)]:
If k ≤ s, a Ritz decomposition enables us to get the k wanted eigenvectors.
If k > s, we use a Block-Lanczos type of approach to build a Krylov basis starting with these filtered vectors, and to stop when appropriate.
The Lanczos/Block-Lanczos algorithm does not maintain the nice property of the filtered vectors.

[Figure: eigendecomposition of the Krylov basis obtained after 5 block-Lanczos steps (Block-Lanczos with block size 6 and full classical Gram-Schmidt reorthogonalization).]
To maintain the level of the unwanted eigenfrequencies in the Block-Lanczos iteration, we perform a few extra Chebyshev iterations on the newly generated Block-Lanczos vectors V^{(k+1)}.

[Figure: eigendecomposition of the Krylov basis obtained after 5 block-Lanczos/Orthodir steps with additional Chebyshev filtering at each Lanczos iteration.]
Z = Chebyshev_Filter(Y, ε, [µ, λmax], Â) denotes the application of a Chebyshev polynomial in Â to Y, viz. Z = P_n(Â)Y, where P_n has a degree such that its L∞ norm over the interval [µ, λmax] is less than ε. Fix the block size s, the cut-off eigenvalue µ (λmin(Â) < µ < λmax(Â)), and the filtering level ε ≪ 1:
P^{(0)} = random(n, s); Q^{(0)} = orthonormalize(P^{(0)})
Z^{(0)} = Chebyshev_Filter(Q^{(0)}, ε, [µ, λmax], Â)
[Y^{(0)}, Σ_1^{(0)}, W] = SVD(Z^{(0)}, 0); δ_0 = σmin(Σ_1^{(0)})
V^{(0)} = orthonormalize(Y^{(0)}); V = V^{(0)}; set δ_2 = 1
for k = 0, 1, 2, ..., until convergence do:
    P^{(k+1)} = Â V^{(k)}
    [V^{(k+1)}, Σ_1^{(k+1)}, W] = SVD(P^{(k+1)}, 0)
    set δ_1 = σmin(Σ_1^{(k+1)})/λmax and δ = max(ε, δ_1 × δ_2)
    for i = 1, 2:
        Z^{(k+1)} = Chebyshev_Filter(V^{(k+1)}, δ, [µ, λmax], Â)
        Y^{(k+1)} = Z^{(k+1)} − V V^T Z^{(k+1)}
        [V^{(k+1)}, Σ_2^{(k+1)}, W] = SVD(Y^{(k+1)}, 0)
        δ_2 = σmin(Σ_2^{(k+1)}) and set δ = δ_2
    end
    V = [V, V^{(k+1)}]
end
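For the simpler k ≤ s case of the previous slide (one filtering pass followed by a Ritz step, with no Block-Lanczos loop), a condensed numpy sketch on a toy diagonal stand-in for Â; the degree choice, seed, and thresholds here are our assumptions, not the slides':

```python
import numpy as np

def chebyshev_filter(A, Y, eps, mu, lam_max):
    # degree n chosen so that the filter level 1/T_n(d_mu) is below eps
    d = (lam_max + mu) / (lam_max - mu)
    n = int(np.ceil(np.arccosh(1.0 / eps) / np.arccosh(d)))
    omega = lambda V: ((lam_max + mu) * V - 2.0 * (A @ V)) / (lam_max - mu)
    tv_prev, tv = Y, omega(Y)
    ts_prev, ts = 1.0, d
    for _ in range(n - 1):
        tv_prev, tv = tv, 2.0 * omega(tv) - tv_prev
        ts_prev, ts = ts, 2.0 * d * ts - ts_prev
    return tv / ts

def small_eig_subspace(A, s, eps, mu, lam_max):
    # one filtering pass + Ritz step (the k <= s case); the Ritz values
    # below mu approximate the smallest eigenvalues of A
    n = A.shape[0]
    Q, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((n, s)))
    Z = chebyshev_filter(A, Q, eps, mu, lam_max)
    V, _ = np.linalg.qr(Z)                  # orthonormalize the filtered block
    theta, S = np.linalg.eigh(V.T @ A @ V)  # Ritz values / vectors
    return theta, V @ S

eigs = np.concatenate([[1e-6, 1e-5, 1e-4], np.linspace(0.1, 1.0, 97)])
A = np.diag(eigs)
theta, W = small_eig_subspace(A, 6, 1e-10, 0.1, 1.0)
# three Ritz values fall below mu; their Ritz vectors span the wanted subspace
```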
Once the near-invariant subspace linked to the smallest eigenvalues is obtained, we can use it for the computation of further solutions. The idea is to perform an oblique projection of the initial residual (r̂_0 = b̂ − Âx_0) onto this near-invariant subspace in order to get the eigencomponents in the solution corresponding to the smallest eigenvalues, viz.
r̂_1 = r̂_0 − ÂV(V^T ÂV)^{-1} V^T r̂_0, with x_1 = x_0 + V(V^T ÂV)^{-1} V^T r̂_0.
To compute the remaining part of the solution vector (Âx_2 = r̂_1), we can use Chebyshev iterations based on the bounds given by µ and λmax(Â).
Solution phase
For any right-hand side vector b̂, set x_0 and r̂_0 = b̂ − Âx_0, and perform the following consecutive steps:
[r̂_1, x_1] = Chebyshev_Solve(r̂_0, x_0, ε, [µ, λmax], Â)
r̂ = r̂_1 − ÂV(V^T ÂV)^{-1} V^T r̂_1; and x = x_1 + V(V^T ÂV)^{-1} V^T r̂_1
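A runnable sketch of the two consecutive steps, assuming a standard Chebyshev semi-iteration on [µ, λmax] as the implementation of Chebyshev_Solve (the slides do not spell it out) and a toy diagonal system in which V holds the exact small-eigenvalue eigenvectors:

```python
import numpy as np

def chebyshev_solve(A, b, x0, n_iter, mu, lam_max):
    # Chebyshev semi-iteration on [mu, lam_max]: reduces the residual
    # components whose eigenvalues lie in that interval
    theta, delta = (lam_max + mu) / 2.0, (lam_max - mu) / 2.0
    sigma1 = theta / delta
    rho = 1.0 / sigma1
    x = x0.copy()
    r = b - A @ x
    d = r / theta
    for _ in range(n_iter):
        x = x + d
        r = r - A @ d
        rho_new = 1.0 / (2.0 * sigma1 - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * r
        rho = rho_new
    return r, x

def two_stage_solve(A, V, b, n_iter, mu, lam_max):
    # Chebyshev sweep, then oblique projection onto span(V) to recover the
    # eigencomponents associated with the smallest eigenvalues
    r1, x1 = chebyshev_solve(A, b, np.zeros_like(b), n_iter, mu, lam_max)
    G = V.T @ (A @ V)                  # small k-by-k projected matrix V^T A V
    c = np.linalg.solve(G, V.T @ r1)
    return x1 + V @ c

# toy check: diagonal SPD matrix, V = exact small-eigenvalue eigenvectors
eigs = np.concatenate([[1e-8, 1e-6], np.linspace(0.1, 1.0, 48)])
A = np.diag(eigs)
V = np.eye(50)[:, :2]
x_true = np.ones(50)
x = two_stage_solve(A, V, A @ x_true, 60, mu=0.1, lam_max=1.0)
```

Neither stage alone solves the system: the Chebyshev sweep leaves the two tiny-eigenvalue components of the error untouched, and the projection alone corrects only those two components; together they recover the full solution.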
Let e_0 = Â^{-1} r̂_0 = x^* − x_0 and v = P_n(Â) r̂_0 / ||P_n(Â) e_0||_2, and note that ||r̂||_{Â^{-1}} = ||x^* − x||_Â. Then
||r̂|| / ||r̂_0|| ≤ ||Â^{1/2}||_2 ||Â^{-1/2}||_2 ||(I − VV^T)v||_2.

Theorem. Let U_1 ∈ R^{n×m} be the matrix of the eigenvectors of Â corresponding to Λ_1, and U_2 ∈ R^{n×(n−m)} be the matrix of the remaining eigenvectors of Â. Let V ∈ R^{n×ℓ} be generated by the Algorithm using a filtering level ε, and v = P_n(Â) r̂_0 / ||P_n(Â) e_0||_2. If mε ≪ 1 and ℓ ≥ m, then
||(I − VV^T)v||_2 ≤ 2ε√m + 2ε²m.

Consequently,
||r̂|| / ||r̂_0|| ≤ 4√m ε ||Â^{1/2}||_2 ||Â^{-1/2}||_2.
[Figure: eigencomponents of the residual R_k and of the error (X_k − X_sol), on a logarithmic scale from 10^{-18} to 10^{-2}.]
Computation of the solution of a linear system with a right-hand side vector b corresponding to a given random exact solution vector x^*. The filtering level ε has been fixed at 10^{-8} in both phases of the algorithm.
The choice of the starting block size s
The choice of the cut-off eigenvalue µ
The choice of the filtering level ε: ε = 10^{-8}; or, to achieve ||r̂|| / ||r̂_0|| ≤ τ, take ε = τ / κ(Â), where κ(Â) = ||Â|| ||Â^{-1}|| (an "a posteriori" good approximation).
Number of Chebyshev iterations (s = 6):

Block Krylov   µ = λmax/5            µ = λmax/10           µ = λmax/100
iteration      ε=10^{-14}  ε=10^{-8}  ε=10^{-14}  ε=10^{-8}  ε=10^{-14}  ε=10^{-8}
Start          35+2        20+3       51+4        —          —           —
1              10          9          15          15         59          56
2              11          11         20          18         88          90
3              16          15         32          30         165         96
4              25          20         43          30         —           —
5              35          20         —           —          —           —

In the case µ = λmax/10, there are 26 eigenvectors to capture. In the case µ = λmax/100, there are 19 eigenvectors to capture.
Back to the starting example
µ = λmax/γ

        Spectral factorization       Solution phase
γ       Size V        Iterations     Chebyshev iter.   Error energy norm
Jacobi
1000    1030 (1004)   3              231               7×10^{-3} (2.6×10^{-5})
500     1101 (1114)   5              163               6×10^{-4} (7.0×10^{-6})
200     2234 (2615)   1              103               2.5×10^{-4} (4×10^{-6})
100     433 (248)     5 (3)          68                7.4×10^{-3} (1.8×10^{-5})
50      462 (503)     9              48                3×10^{-3} (1.3×10^{-5})
10      55 (70)       3              19                8.2×10^{-3} (3.3×10^{-6})

Summary of the results.