SLIDE 1

Krylov subspace methods and exascale computations: good match or lost case?

Zdeněk Strakoš

Charles University in Prague and Czech Academy of Sciences
http://www.karlin.mff.cuni.cz/~strakos

SPPEXA Symposium, Munich, January 2016

SLIDE 2

Personal prehistory

Strakoš, Z., Efficiency and Optimizing of Algorithms and Programs on the Host Computer / Array Processor System, Parallel Computing 4, 189-209 (1987).
  • Host Computer (0.2 MFlops) / Array Processor (up to 10 MFlops).
  • Large instruction overhead and slow data transfers.
  • Pipelining, several arithmetic units.
  • Possible overlap of data transfers and arithmetic.
  • Slow scalar operations.

Even after thirty years, the basic problems and principles have not changed that much.

SLIDE 3

Preconditioned algebraic CG

r_0 = b − A x_0,  solve M z_0 = r_0,  p_0 = z_0
For n = 1, ..., n_max
    α_{n−1} = (z_{n−1}^* r_{n−1}) / (p_{n−1}^* A p_{n−1})
    x_n = x_{n−1} + α_{n−1} p_{n−1},  stop when the stopping criterion is satisfied
    r_n = r_{n−1} − α_{n−1} A p_{n−1}
    M z_n = r_n,  solve for z_n
    β_n = (z_n^* r_n) / (z_{n−1}^* r_{n−1})
    p_n = z_n + β_n p_{n−1}
End
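The recursion maps directly onto code. Below is a minimal NumPy sketch of this preconditioned CG iteration; the callable preconditioner solve `M_solve` and the residual-based stopping tolerance are illustrative assumptions, not prescribed by the slide.

```python
import numpy as np

def preconditioned_cg(A, b, M_solve, x0=None, tol=1e-10, n_max=1000):
    """Preconditioned CG as on the slide: M z = r is solved in each step via M_solve."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x
    z = M_solve(r)                       # solve M z_0 = r_0
    p = z.copy()
    zr_old = z @ r
    for _ in range(n_max):
        Ap = A @ p
        alpha = zr_old / (p @ Ap)        # alpha_{n-1} = z* r / p* A p
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):   # illustrative stopping criterion
            break
        z = M_solve(r)                   # solve M z_n = r_n
        zr_new = z @ r
        beta = zr_new / zr_old           # beta_n = z_n* r_n / z_{n-1}* r_{n-1}
        p = z + beta * p
        zr_old = zr_new
    return x
```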

SLIDE 4

Obstacles for parallelization

  • Synchronized recursion.
  • Matrix-vector multiplication and vector updates are linear and (possibly) fast.
  • Preconditioning is expensive (substantial global communication).
  • Scalar coefficients bring in nonlinearity and require inner products.

However, for the approximation power of the methods, nonlinearity is essential!

  • Parallelization can lead to numerical instabilities.
SLIDE 5

Parallel (communication sensitive) algorithms?

  • Block recursion in order to increase arithmetic/communication ratio.
  • Numerical stability is crucial.
  • Stopping criteria can save the case. Size of the blocks?
  • Preconditioning means an approximate solution of a part of the problem.

State-of-the-art in the algorithmic developments:

  • E. Carson, Communication-Avoiding Krylov Subspace Methods in Theory and Practice, PhD Thesis, UC Berkeley, CA, 2015.

SLIDE 6

Outline

  • 1. Philosophy of using Krylov subspace methods
  • 2. Nonlinear model reduction
  • 3. Inexact Krylov?
  • 4. Operator and algebraic preconditioning
  • 5. Krylov subspaces and discretization
  • 6. Stopping criteria?
SLIDE 7

1 Plethora of Krylov subspace methods

  • Thorough analysis and fair comparison of several important methods should be given priority over the overproduction of algorithmic variations.
  • Krylov subspace methods are efficient provided that they "do justice to the inner nature of the problem." (C. Lanczos, 1947). Infinite dimensional considerations are very useful.
  • Oversimplification is dangerous: widespread worst-case analysis restricted to the operator only, universal contraction-based bounds, asymptotic considerations, unjustified or hidden restrictive assumptions.
  • Results pointing out difficulties should be taken as an inspiration. Instead, they are unwanted and often labeled as "negative."

SLIDE 8

1 Málek and S, SIAM Spotlight, 2015

SLIDE 9

Outline

  • 1. Philosophy of using Krylov subspace methods
  • 2. Nonlinear model reduction
  • 3. Inexact Krylov?
  • 4. Operator and algebraic preconditioning
  • 5. Krylov subspaces and discretization
  • 6. Stopping criteria?
SLIDE 10

2 Operator form of the BVP and preconditioning

Let V be a real (infinite dimensional) Hilbert space with the inner product (·, ·)_V : V × V → R, and let V^# be the dual space of bounded linear functionals on V. Consider a bounded and coercive operator A : V → V^# and the equation in V^#

    A x = b ,    A : V → V^#,  x ∈ V,  b ∈ V^# .

Using the Riesz map τ : V^# → V, defined by (τ g, v)_V = ⟨g, v⟩ for all v ∈ V,

    (τ A x − τ b, v)_V = 0  for all v ∈ V .

The Riesz map τ can thus be interpreted as a transformation of the original problem A x = b in V^# into the equation in V

    τ A x = τ b ,    τ A : V → V,  x ∈ V,  τ b ∈ V ,

which is (unfortunately) called preconditioning.
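As a concrete instance (an illustrative identification, not spelled out on this slide), the Poisson model problem fits this framework as follows:

```latex
\[
\begin{aligned}
& V = H_0^1(\Omega), \qquad (u,v)_V = (\nabla u, \nabla v)_{L^2(\Omega)}, \\
& \langle A u, v\rangle = a(u,v) = \int_\Omega \nabla u \cdot \nabla v \, dx, \qquad
  \langle b, v\rangle = \int_\Omega f\, v \, dx, \\
& (\tau g, v)_V = \langle g, v\rangle \ \text{ for all } v \in V
  \;\;\Longrightarrow\;\; \tau A x = \tau b \ \text{ holds in } V .
\end{aligned}
\]
```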

SLIDE 11

2 Model reduction using Krylov subspaces

Let B (= τA) be a bounded linear operator on the Hilbert space V. Choosing z_0 (= τb − τAx_0) ∈ V, consider the Krylov sequence

    z_0,  z_1 = B z_0,  z_2 = B z_1 = B^2 z_0,  ...,  z_n = B z_{n−1} = B^n z_0,  ...

Determine a sequence of operators B_n defined on the sequence of nested subspaces

    V_n = span {z_0, ..., z_{n−1}} ,

with the projector E_n onto V_n, such that (Vorobyev (1958, 1965))

    z_1 = B z_0 = B_n z_0,  z_2 = B^2 z_0 = (B_n)^2 z_0,  ...,  z_{n−1} = B^{n−1} z_0 = (B_n)^{n−1} z_0,  E_n z_n = E_n B^n z_0 = (B_n)^n z_0 .
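For a self-adjoint B this construction is realized by the Lanczos process: with Q_n an orthonormal basis of V_n, the reduced operator B_n = E_n B E_n is represented by the tridiagonal matrix T_n = Q_n^* B Q_n. The NumPy sketch below (the random SPD test matrix and the subspace dimension are illustrative assumptions) checks the Vorobyev matching conditions numerically.

```python
import numpy as np

def lanczos(B, z0, n):
    """n steps of Lanczos: orthonormal basis Q of K_n(B, z0) and tridiagonal T = Q* B Q."""
    N = len(z0)
    Q, T = np.zeros((N, n)), np.zeros((n, n))
    q, q_prev, beta = z0 / np.linalg.norm(z0), np.zeros(N), 0.0
    for j in range(n):
        Q[:, j] = q
        w = B @ q - beta * q_prev
        T[j, j] = q @ w
        w -= T[j, j] * q
        if j + 1 < n:
            beta = np.linalg.norm(w)
            T[j, j + 1] = T[j + 1, j] = beta
            q_prev, q = q, w / beta
    return Q, T

rng = np.random.default_rng(0)
C = rng.standard_normal((50, 50))
B = C @ C.T + 50 * np.eye(50)            # symmetric positive definite test operator
z0 = rng.standard_normal(50)

n = 6
Q, T = lanczos(B, z0, n)
E = Q @ Q.T                               # orthogonal projector E_n onto V_n
Bn = Q @ T @ Q.T                          # matrix representation of B_n = E_n B E_n

# Vorobyev matching: B^k z0 = (B_n)^k z0 for k = 1, ..., n-1, and E_n B^n z0 = (B_n)^n z0
for k in range(1, n + 1):
    lhs = np.linalg.matrix_power(B, k) @ z0
    if k == n:
        lhs = E @ lhs                     # only the projected n-th power is matched
    rhs = np.linalg.matrix_power(Bn, k) @ z0
    print(k, np.linalg.norm(lhs - rhs) / np.linalg.norm(rhs))   # near zero up to rounding
```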

SLIDE 12

2 Bounded self-adjoint operators in V

    B x = f                  ←→   ω(λ),        ∫ F(λ) dω(λ)
        ↑                                          ↑
    T_n y_n = ‖f‖_V e_1      ←→   ω^(n)(λ),    Σ_{i=1}^n ω_i^(n) F(θ_i^(n))

Using F(λ) = λ^{−1} gives (assuming coercivity)

    ∫_{λ_L}^{λ_U} λ^{−1} dω(λ) = Σ_{i=1}^n ω_i^(n) (θ_i^(n))^{−1} + ‖u − u_n‖_a^2 / ‖f‖_V^2 .

Stieltjes (1894) and Vorobyev (1958) moment problems for self-adjoint bounded operators reduce to the Gauss-Christoffel quadrature (1814). No one would consider describing it by contraction.
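In the matrix setting the nodes θ_i^(n) are the eigenvalues of the Jacobi matrix T_n generated by Lanczos/CG and the weights ω_i^(n) are the squared first components of the associated normalized eigenvectors. A small NumPy sketch of this Gauss-Christoffel quadrature view (the SPD test data and quadrature size are illustrative assumptions) follows.

```python
import numpy as np

def lanczos(A, v, n):
    """n steps of Lanczos with starting vector v: returns the Jacobi matrix T_n."""
    q, q_prev, beta = v / np.linalg.norm(v), np.zeros(len(v)), 0.0
    T = np.zeros((n, n))
    for j in range(n):
        w = A @ q - beta * q_prev
        T[j, j] = q @ w
        w -= T[j, j] * q
        if j + 1 < n:
            beta = np.linalg.norm(w)
            T[j, j + 1] = T[j + 1, j] = beta
            q_prev, q = q, w / beta
    return T

rng = np.random.default_rng(1)
C = rng.standard_normal((80, 80))
A = C @ C.T + 80 * np.eye(80)            # illustrative SPD matrix
b = rng.standard_normal(80)

n = 8
T = lanczos(A, b, n)
theta, S = np.linalg.eigh(T)             # nodes = Ritz values (eigenvalues of T_n)
omega = S[0, :] ** 2                     # weights = squared first eigenvector components

# n-point Gauss quadrature of F(lambda) = 1/lambda for the distribution defined by A and b/||b||;
# the "integral" equals b* A^{-1} b / ||b||^2
integral = b @ np.linalg.solve(A, b) / (b @ b)
quad = np.sum(omega / theta)
print(integral, quad, integral - quad)   # difference = ||x - x_n||_A^2 / ||b||^2 for CG with x_0 = 0 (up to rounding)
```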

SLIDE 13

2 CG in Hilbert spaces

r_0 = b − A x_0 ∈ V^#,  p_0 = τ r_0 ∈ V
For n = 1, 2, ..., n_max
    α_{n−1} = ⟨r_{n−1}, τ r_{n−1}⟩ / ⟨A p_{n−1}, p_{n−1}⟩ = (τ r_{n−1}, τ r_{n−1})_V / (τ A p_{n−1}, p_{n−1})_V
    x_n = x_{n−1} + α_{n−1} p_{n−1},  stop when the stopping criterion is satisfied
    r_n = r_{n−1} − α_{n−1} A p_{n−1}
    β_n = ⟨r_n, τ r_n⟩ / ⟨r_{n−1}, τ r_{n−1}⟩ = (τ r_n, τ r_n)_V / (τ r_{n−1}, τ r_{n−1})_V
    p_n = τ r_n + β_n p_{n−1}
End

Hayes (1954); Vorobyev (1958, 1965); Karush (1952); Stesin (1954)
Superlinear convergence for (identity + compact) operators.

SLIDE 14

Outline

  • 1. Philosophy of using Krylov subspace methods
  • 2. Matching moments model reduction
  • 3. Inexact Krylov?
  • 4. Operator and algebraic preconditioning
  • 5. Krylov subspaces and discretization
  • 6. Stopping criteria?
SLIDE 15

3 Delay of convergence due to inexactness

[Two convergence plots vs. iteration number (up to 100 and up to 800, log scale down to 10^{−15}): residual, smooth upper bound, backward error, loss of orthogonality, approximate solution error.]

Here the numerical inexactness is due to roundoff. How much may we relax the accuracy of the most costly operations without causing an unwanted delay and/or affecting the maximal attainable accuracy?

SLIDE 16

Outline

  • 1. Philosophy of using Krylov subspace methods
  • 2. Nonlinear model reduction
  • 3. Inexact Krylov?
  • 4. Operator and algebraic preconditioning
  • 5. Krylov subspaces and discretization
  • 6. Stopping criteria?
SLIDE 17

4 Restriction to finite dimensional subspace Vh

Let Φ_h = (φ_1^(h), ..., φ_N^(h)) be a basis of the subspace V_h ⊂ V, and let Φ_h^# = (φ_1^(h)#, ..., φ_N^(h)#) be the canonical basis of its dual V_h^#, (V_h^# = A V_h). Using the coordinates in Φ_h and in Φ_h^#,

    ⟨f, v⟩ → v* f ,
    (u, v)_V → v* M u ,      (M_ij) = ((φ_j, φ_i)_V)_{i,j=1,...,N} ,
    A u → A u ,              A u = A Φ_h u = Φ_h^# A u ;   (A_ij) = (a(φ_j, φ_i))_{i,j=1,...,N} ,
    τ f → M^{−1} f ,         τ f = τ Φ_h^# f = Φ_h M^{−1} f ;

we get, with b = Φ_h^# b, x_n = Φ_h x_n, p_n = Φ_h p_n, r_n = Φ_h^# r_n, the algebraic CG formulation.

SLIDE 18

4 Galerkin discretization gives matrix CG in V_h

r_0 = b − A x_0,  solve M z_0 = r_0,  p_0 = z_0
For n = 1, ..., n_max
    α_{n−1} = (z_{n−1}^* r_{n−1}) / (p_{n−1}^* A p_{n−1})
    x_n = x_{n−1} + α_{n−1} p_{n−1},  stop when the stopping criterion is satisfied
    r_n = r_{n−1} − α_{n−1} A p_{n−1}
    M z_n = r_n,  solve for z_n
    β_n = (z_n^* r_n) / (z_{n−1}^* r_{n−1})
    p_n = z_n + β_n p_{n−1}
End

Günnel, Herzog, Sachs (2014); Málek, S (2015)
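As a concrete instance of Slides 17-18 (an illustrative example, not from the slides): for the 1D Poisson problem −u'' = f on (0, 1) with linear finite elements, A is the Galerkin matrix of a(u, v) = ∫ u'v', M is the Gram matrix of the basis in an assumed V-inner product (u, v)_V = ∫ u'v' + ∫ uv, and the matrix CG above is run with the solves M z = r playing the role of the discretized Riesz map.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import splu

N = 99                                    # interior nodes, h = 1/(N+1)
h = 1.0 / (N + 1)

# 1D linear FEM matrices on (0,1) with zero Dirichlet boundary conditions
K = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(N, N)) / h          # stiffness: int u'v'
Mass = diags([1.0, 4.0, 1.0], [-1, 0, 1], shape=(N, N)) * (h / 6)   # L2 mass matrix

A = K.tocsc()                             # Galerkin matrix of the bilinear form a(., .)
M = (K + Mass).tocsc()                    # Gram matrix of the assumed V-inner product
M_lu = splu(M)                            # realizes the "solve M z = r" step

x_nodes = np.linspace(h, 1 - h, N)
b = h * np.sin(np.pi * x_nodes)           # load vector for f = sin(pi x) (lumped, illustrative)

# matrix CG from the slide, preconditioned by M
x = np.zeros(N)
r = b - A @ x
z = M_lu.solve(r)
p, zr = z.copy(), z @ r
for n in range(200):
    Ap = A @ p
    alpha = zr / (p @ Ap)
    x += alpha * p
    r -= alpha * Ap
    if np.linalg.norm(r) <= 1e-12 * np.linalg.norm(b):
        break
    z = M_lu.solve(r)
    zr_new = z @ r
    p = z + (zr_new / zr) * p
    zr = zr_new
print("iterations:", n + 1, "residual norm:", np.linalg.norm(b - A @ x))
```

Because the assumed V-inner product is spectrally equivalent to a(·, ·), the iteration count stays small and essentially independent of the mesh size.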

SLIDE 19

4 Observations

  • Unpreconditioned CG, i.e. M = I, corresponds to a discretization basis Φ_h that is orthonormal with respect to (·, ·)_V.
  • Orthogonalization of the discretization basis results in the unpreconditioned algebraic CG applied to the preconditioned algebraic system. The matrix of this preconditioned algebraic system is not sparse!
  • Any algebraic preconditioning applied to the algebraic system arising from the discretization can be interpreted within this operator preconditioning framework.

SLIDE 20

Outline

  • 1. Philosophy of using Krylov subspace methods
  • 2. Nonlinear model reduction
  • 3. Inexact Krylov?
  • 4. Operator and algebraic preconditioning
  • 5. Krylov subspaces and discretization
  • 6. Stopping criteria?
SLIDE 21

5 Conjugate gradient method - first n steps

T_n =
    [ α_1  β_2                      ]
    [ β_2  α_2  β_3                 ]
    [      β_3   .     .            ]
    [            .     .      β_n   ]
    [                  β_n    α_n   ]

is the Jacobi matrix of the orthogonalization coefficients, and the CG method is formulated by

    T_n y_n = ‖τ r_0‖_V e_1 ,    x_n = x_0 + Q_n y_n ,    x_n ∈ V_n .

Infinite dimensional Krylov subspace methods perform discretization via model reduction.
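In the algebraic case (M = I) this says the n-th CG approximation is obtained from the Lanczos basis Q_n and the Jacobi matrix T_n alone. A short NumPy check of this equivalence (the random SPD test data are an illustrative assumption) is sketched below.

```python
import numpy as np

rng = np.random.default_rng(2)
C = rng.standard_normal((60, 60))
A = C @ C.T + 60 * np.eye(60)            # SPD test matrix
b = rng.standard_normal(60)
x0 = np.zeros(60)
n = 7

# Lanczos on A with starting vector r0 = b - A x0
r0 = b - A @ x0
Q, T = np.zeros((60, n)), np.zeros((n, n))
q, q_prev, beta = r0 / np.linalg.norm(r0), np.zeros(60), 0.0
for j in range(n):
    Q[:, j] = q
    w = A @ q - beta * q_prev
    T[j, j] = q @ w
    w -= T[j, j] * q
    if j + 1 < n:
        beta = np.linalg.norm(w)
        T[j, j + 1] = T[j + 1, j] = beta
        q_prev, q = q, w / beta

# x_n = x_0 + Q_n y_n with T_n y_n = ||r0|| e_1
y = np.linalg.solve(T, np.linalg.norm(r0) * np.eye(n)[:, 0])
x_lanczos = x0 + Q @ y

# plain CG (M = I) run for the same n steps, for comparison
x, r = x0.copy(), r0.copy()
p, rr = r.copy(), r0 @ r0
for _ in range(n):
    Ap = A @ p
    alpha = rr / (p @ Ap)
    x += alpha * p
    r -= alpha * Ap
    rr_new = r @ r
    p = r + (rr_new / rr) * p
    rr = rr_new

print(np.linalg.norm(x - x_lanczos) / np.linalg.norm(x))   # small up to rounding
```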

SLIDE 22

Outline

  • 1. Philosophy of using Krylov subspace methods
  • 2. Nonlinear model reduction
  • 3. Inexact Krylov?
  • 4. Operator and algebraic preconditioning
  • 5. Krylov subspaces and discretization
  • 6. Stopping criteria?
SLIDE 23

6 L-shape domain, Papež, Liesen, S (2014)

[Two surface plots over the L-shaped domain.]

Exact solution x (left) and the discretisation error x − x_h (right) in the Poisson model problem: linear FEM, adaptive mesh refinement. Quasi-equilibrated discretisation error over the domain.

SLIDE 24

6 L-shape domain, Papež, Liesen, S (2014)

[Two surface plots over the L-shaped domain, values on the order of 10^{−4}.]

Algebraic error x_h − x_h^(n) (left) and the total error x − x_h^(n) (right) after a number of CG iterations guaranteeing

    ‖x − x_h‖_a = ‖∇(x − x_h)‖ ≫ ‖x − x_n‖_A .
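The comparison on these two slides rests on the energy-norm splitting of the total error, which follows from Galerkin orthogonality (and underlies the Papež, Liesen, S (2014) analysis):

```latex
\[
\|x - x_h^{(n)}\|_a^2 \;=\; \|x - x_h\|_a^2 \;+\; \|x_h - x_h^{(n)}\|_a^2 ,
\qquad
\|x_h - x_h^{(n)}\|_a \;=\; \|x - x_n\|_A ,
\]
% the discretization and algebraic errors add up in the (global) energy norm
```

Even when the algebraic part is much smaller in this global norm, its spatial distribution over the domain can differ substantially from that of the discretization error, which is what the right panel illustrates.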

SLIDE 25

Conclusions

  • Krylov subspace methods adapt to the problem. Exploiting this adaptation is the key to their efficient use.
  • They are expensive and by their nature recursive. Therefore they cannot be efficient without being fast, i.e., without powerful preconditioning.
  • The individual steps modeling-analysis-discretization-computation should not be considered separately within isolated disciplines. They form a single problem.
  • Fast HPC computations result from appropriate handling of all involved issues, including numerical stability and a posteriori error analysis leading to appropriate stopping criteria.
  • There are many difficult but exciting challenges ahead. In order to resolve them, we should fairly admit that they exist.

SLIDE 26

References

  • J. Málek and Z.S., Preconditioning and the Conjugate Gradient Method in the Context of Solving PDEs. SIAM Spotlight Series, SIAM (2015)
  • T. Gergelits and Z.S., Composite convergence bounds based on Chebyshev polynomials and finite precision conjugate gradient computations, Numer. Alg. 65, 759-782 (2014)
  • J. Papež, J. Liesen and Z.S., Distribution of the discretization and algebraic error in numerical solution of partial differential equations, Linear Alg. Appl. 449, 89-114 (2014)
  • J. Liesen and Z.S., Krylov Subspace Methods, Principles and Analysis. Oxford University Press (2013)
  • Z.S. and P. Tichý, On efficient numerical approximation of the bilinear form c*A^{−1}b, SIAM J. Sci. Comput. 33, 565-587 (2011)

SLIDE 27

Thank you for your patience!

SLIDE 28

Czech and German Elephant

Králický Sněžník - Glatzer Schneeberg
German Artists' Union Jetscher Artist Amei Hallenger
Made by Co. Förster, Zuckmantel (Zlaté Hory), 1932