The Mathematics of ExaStencils
Hannah Rittich
joint work with: Matthias Bolten and Karsten Kahl
Bergische Universit¨ at Wuppertal
E aStencils Overview Task: Solve a PDE (efficiently). Multigrid - - PowerPoint PPT Presentation
The Mathematics of ExaStencils Hannah Rittich joint work with: Matthias Bolten and Karsten Kahl Bergische Universit at Wuppertal April 15, 2015 E aStencils Overview Task: Solve a PDE (efficiently). Multigrid methods is a framework
Bergische Universit¨ at Wuppertal
◮ Task: Solve a PDE (efficiently). ◮ Multigrid methods is a framework for constucting algorithms. ◮ These methods are build from a set of different ingredients.
◮ How do these ingredients influence (the computational and)
◮ Estimate the (numerical) performance is needed.
∂x2 + ∂2u ∂y2
∂x2 + ∂2u ∂y2
Photo by Graham Richardson / CC BY
∂x2 + ∂2u ∂y2
0.5 1
0.5 1
∂x2 + ∂2u ∂y2
1 0.5 1 0.5
∂x2 + ∂2u ∂y2
∂x2 + ∂2u ∂y2
0.05 0.1 0.15 0.2 0.8 1 0.6 0.2 0.4 0.2 0.4 0.6 0.8 1
0.005 0.01 0.015 0.8 1 0.4 0.6 0.2 0.2 0.4 0.6 0.8 1
0.005 0.01 0.015 0.4 0.6 0.8 1 0.2 0.2 0.4 0.6 0.8 1
◮ For many simple iterative methods and
◮ Smooth error can be represented on a
◮ Use recursion on the coarse problem.
4(fx,y − ux,y−1 − ux−1,y − ux+1,y − ux,y+1) ◮ Bad Parallelization...
4(fx,y − ux,y−1 − ux−1,y − ux+1,y − ux,y+1) ◮ Bad Parallelization...
4(fx,y − ux,y−1 − ux−1,y − ux+1,y − ux,y+1) ◮ Bad Parallelization...
4(fx,y − ux,y−1 − ux−1,y − ux+1,y − ux,y+1) ◮ Bad Parallelization...
4(fx,y − ux,y−1 − ux−1,y − ux+1,y − ux,y+1) ◮ Bad Parallelization...
4(fx,y − ux,y−1 − ux−1,y − ux+1,y − ux,y+1) ◮ Bad Parallelization...
4(fx,y − ux,y−1 − ux−1,y − ux+1,y − ux,y+1) ◮ Parallelizable ◮ Per Iteration:
◮ ≈ 4
◮ 2 Msgs.
4(fx,y − ux,y−1 − ux−1,y − ux+1,y − ux,y+1) ◮ Parallelizable ◮ Per Iteration:
◮ ≈ 4
◮ 2 Msgs.
4(fx,y − ux,y−1 − ux−1,y − ux+1,y − ux,y+1) ◮ Parallelizable ◮ Per Iteration:
◮ ≈ 4
◮ 2 Msgs.
4(fx,y − ux,y−1 − ux−1,y − ux+1,y − ux,y+1) ◮ Parallelizable ◮ Per Iteration:
◮ ≈ 4
◮ 2 Msgs.
4(fx,y − ux,y−1 − ux−1,y − ux+1,y − ux,y+1) ◮ Parallelizable ◮ Per Iteration:
◮ ≈ 4
◮ 2 Msgs.
4(fx,y − ux,y−1 − ux−1,y − ux+1,y − ux,y+1) ◮ Parallelizable ◮ Per Iteration:
◮ ≈ 4
◮ 2 Msgs.
4(fx,y − ux,y−1 − ux−1,y − ux+1,y − ux,y+1) ◮ Parallelizable ◮ Per Iteration:
◮ ≈ 4
◮ 2 Msgs.
4(fx,y − ux,y−1 − ux−1,y − ux+1,y − ux,y+1) ◮ Parallelizable ◮ Per Iteration:
◮ ≈ 4
◮ 2 Msgs.
4(fx,y − ux,y−1 − ux−1,y − ux+1,y − ux,y+1) ◮ Parallelizable ◮ Per Iteration:
◮ ≈ 4
◮ 2 Msgs.
4(fx,y − ux,y−1 − ux−1,y − ux+1,y − ux,y+1) ◮ Parallelizable ◮ Per Iteration:
◮ ≈ 4
◮ 2 Msgs.
x,y = 1 4(fx,y − uold x,y−1 − uold x−1,y − uold x+1,y − uold x,y+1) ◮ Parallelizable ◮ Per Iteration:
◮ ≈ 4
◮ 1 Msgs.
x,y = 1 4(fx,y − uold x,y−1 − uold x−1,y − uold x+1,y − uold x,y+1) ◮ Parallelizable ◮ Per Iteration:
◮ ≈ 4
◮ 1 Msgs.
x,y = 1 4(fx,y − uold x,y−1 − uold x−1,y − uold x+1,y − uold x,y+1) ◮ Parallelizable ◮ Per Iteration:
◮ ≈ 4
◮ 1 Msgs.
x,y = 1 4(fx,y − uold x,y−1 − uold x−1,y − uold x+1,y − uold x,y+1) ◮ Parallelizable ◮ Per Iteration:
◮ ≈ 4
◮ 1 Msgs.
x,y = 1 4(fx,y − uold x,y−1 − uold x−1,y − uold x+1,y − uold x,y+1) ◮ Parallelizable ◮ Per Iteration:
◮ ≈ 4
◮ 1 Msgs.
x,y = 1 4(fx,y − uold x,y−1 − uold x−1,y − uold x+1,y − uold x,y+1) ◮ Parallelizable ◮ Per Iteration:
◮ ≈ 4
◮ 1 Msgs.
ℓ+1P T ℓ Lℓ
ℓ (I − Pℓ(I − Eγ ℓ+1)(Lℓ+1)−1RℓLℓ)Mν1 ℓ
ℓ+1P T ℓ Lℓ
ℓ (I − Pℓ(I − Eγ ℓ+1)(Lℓ+1)−1RℓLℓ)Mν1 ℓ
◮ Let Ωh = h1Z × · · · × hdZ. ◮ If L is a linear operator on the grid Ωh it
◮ For many operators:
◮ To analyze the local behavior it is
u(x) u(x + κ) sx(κ)
◮ Consider the operator given by the constant stencil
◮ Grid size independent analysis.
◮ “Move boundary conditions to infinity.”
◮ Exploits simple structure in the frequency Domain. ◮ Quantitative estimates.
h
5 10 0.2 0.4 0.6 0.8 1 (truncated) time domain
1 2 3 0.2 0.4 0.6 0.8 1 frequency domain vol1/2
h
(2π)d/2
vol1/2
h
(2π)d/2
h
5 10 0.2 0.4 0.6 0.8 1 (truncated) time domain
1 2 3 0.2 0.4 0.6 0.8 1 frequency domain vol1/2
h
(2π)d/2
vol1/2
h
(2π)d/2
h
5 10 0.2 0.4 0.6 0.8 1 (truncated) time domain
1 2 3 0.2 0.4 0.6 0.8 1 frequency domain vol1/2
h
(2π)d/2
vol1/2
h
(2π)d/2
h
5 10 0.2 0.4 0.6 0.8 1 (truncated) time domain
1 2 3 0.2 0.4 0.6 0.8 1 frequency domain vol1/2
h
(2π)d/2
vol1/2
h
(2π)d/2
h
5 10 0.2 0.4 0.6 0.8 1 (truncated) time domain
1 2 3 0.2 0.4 0.6 0.8 1 frequency domain vol1/2
h
(2π)d/2
vol1/2
h
(2π)d/2
ϑ
ϑ
1 2 3 0.2 0.4 0.6 0.8 1
∂2u ∂x2 = f
2−exp(ihϑ)
1 2 3 0.2 0.4 0.6 0.8 1
∂2u ∂x2 = f
◮ Red-Black Gauß-Seidel is not shift-invariant. ◮ Low and High Frequencies interact.
2h, π 2h)
1 2
−A(ϑ)+1 A(ϑ+π)+1
2
A(ϑ)−1 A(ϑ+π)+1
0.5 1 1.5 0.2 0.4 0.6 0.8 1 low → low
0.5 1 1.5 0.2 0.4 0.6 0.8 1 high → low
0.5 1 1.5 0.2 0.4 0.6 0.8 1 low → high
0.5 1 1.5 0.2 0.4 0.6 0.8 1 high → high
∂2u ∂x2 = f
hS z | 0 ≤ zj < S, z ∈ Zd}
0.5 1 1.5 2 1 2 3 4 5 6
0.5 1 1.5 2 0.75 1.25
0.5 1 1.5 2 0.75 1.25
0.5 1 1.5 2 0.75 1.25
0.5 1 1.5 2 0.75 1.25
hS z | 0 ≤ zj < S, z ∈ Zd}
ϑ
ϑ
0.5 1 1.5 2 1 2 3 4 5 6
0.5 1 1.5 2 0.75 1.25
0.5 1 1.5 2 0.75 1.25
0.5 1 1.5 2 0.75 1.25
0.5 1 1.5 2 0.75 1.25
1 √neiσj,τk
◮ Symbol of a block stencil
◮ How to analyze more general operators like
◮ Use linearity
◮ the composition of operators
◮ And in case of different block sizes:
◮ How about a whole Multigrid method?
1 √n(1, . . . , 1)
1 √n(1, . . . , 1)T
0.5 1 1.5 2 1 2 3 4 5 6
0.5 1 1.5 2 0.75 1.25
0.5 1 1.5 2 0.75 1.25
0.5 1 1.5 2 0.75 1.25
0.5 1 1.5 2 0.75 1.25
◮ a : Ω → R ◮ Finite Volume Discretization ◮ a discontinuous at interfaces
u(x) 5 10 15 20 25 30 35 X 5 10 15 20 25 30 35 Y 2e-05 4e-05 6e-05 8e-05 0.0001 0.00012 u(x) 5e-05 1e-05 1e-06 1e-07 1e-09 1e-11 5 10 15 20 25 30 35 X 5 10 15 20 25 30 35
◮ a discontinuous at interfaces
a = 106 a = 1
◮ Pointwise smoother (ω-Jacobi, Gauß-Seidel).
ℓ Lℓ) ◮ Galerkin coarse grid approximation.
ℓ LℓPℓ ◮ Operator dependent interpolation (see [ABDP81]). ◮ Full Coarsening (Two-Level).
◮ LFA analyzes local behavior.
◮ Repeated jumps yield the same behavior as the original
◮ a discontinuous at interfaces
a = 106 a = 1
Jac (0.8), Poisson 5 10 15 20 25 30 5 10 15 20 25 30 0.2 0.4 0.6 0.8 1 Jac (0.8), JCP (4x4) 5 10 15 20 25 30 5 10 15 20 25 30 0.2 0.4 0.6 0.8 1
◮ a discontinuous at interfaces
a = 106 a = 1
ℓ+1P T ℓ Lℓ
GCA (Bilinear), JCP (4x4) 5 10 15 20 25 30 5 10 15 20 25 30 0.5 1 1.5 2 2.5 GCA (Adaptive), JCP (4x4) 5 10 15 20 25 30 5 10 15 20 25 30 0.5 1 1.5 2 2.5
◮ a discontinuous at interfaces
a = 106 a = 1
ℓ+1P T ℓ Lℓ)Mℓ
Jac (0.8), Dep., GCA, JCP (4x4) 5 10 15 20 25 30 5 10 15 20 25 30 0.2 0.4 0.6 0.8 1
◮ To maximize the usage of your hardware choose from different
◮ LFA can be used to analyze complex problems.
◮ Write all your operators in block stencil form. ◮ Use the formula
◮ Combine the operators (linearity and composition). ◮ Easily compute the norm or spectral radius.
◮ A Software tool is in preparation for easy application of the