High-Order Explicit Local Time-Stepping Methods For Wave Propagation - - PowerPoint PPT Presentation

high order explicit local time stepping methods for wave
SMART_READER_LITE
LIVE PREVIEW

High-Order Explicit Local Time-Stepping Methods For Wave Propagation - - PowerPoint PPT Presentation

High-Order Explicit Local Time-Stepping Methods For Wave Propagation Marcus Grote Universit e de B ale joint work with: M. Mehlin, T. Mitkova, Univ. de B ale J. Diaz, INRIA, Pau S. Sauter, Univ. de Zurich D. Peter, M. Rietmann, O.


slide-1
SLIDE 1

High-Order Explicit Local Time-Stepping Methods For Wave Propagation

Marcus Grote Universit´ e de Bˆ ale

joint work with:

  • M. Mehlin, T. Mitkova, Univ. de Bˆ

ale

  • J. Diaz, INRIA, Pau
  • S. Sauter, Univ. de Zurich
  • D. Peter, M. Rietmann, O. Schenk, USI
  • B. U¸

car, CNRS & ENS-Lyon

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-2
SLIDE 2

Wave Phenomena

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-3
SLIDE 3

Adaptive Mesh Refinement

geometric features Tohoku fault: mesh generation

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-4
SLIDE 4

Overcoming Geometry Induced Stiffness

Problem

Locally refined meshes induce severe stability restrictions for explicit time-stepping schemes.

Solutions

  • Locally implicit (IMEX) schemes

e.g. Ascher 1995, Piperno 2006, Verwer 2009, Dolean et al. 2010, Chabassier, Imperiale 2015, Descombes, Lanteri, Moya 2015

  • Explicit local time-stepping (LTS)

schemes in this talk!

  • Local exponential integrators

Hochbruck, Ostermann 2011

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-5
SLIDE 5

High-order Local Time Stepping (LTS) Methods

Outline:

  • The (damped) wave equation
  • CG, IP-DG and nodal DG FE discretizations
  • LTS methods: previous work
  • Runge-Kutta based LTS methods
  • Multi-level leap-frog based LTS methods
  • Parallel performance
  • Concluding remarks

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-6
SLIDE 6

The (Damped) Wave Equation

Model problem (second-order form) utt + σut − ∇ · ( c ∇u) = f in Ω × (0, T) u = 0

  • n ∂Ω × (0, T)

u|t=0 = u0 , ut|t=0 = v0 in Ω

  • Ω ⊂ Rd bounded, σ(x) ≥ 0, c(x) > 0

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-7
SLIDE 7

The (Damped) Wave Equation

Model problem (second-order form) utt + σut − ∇ · ( c ∇u) = f in Ω × (0, T) u = 0

  • n ∂Ω × (0, T)

u|t=0 = u0 , ut|t=0 = v0 in Ω

  • Ω ⊂ Rd bounded, σ(x) ≥ 0, c(x) > 0

Weak formulation Find u ∈ C0(0, T; H1

0(Ω)) ∩ C1(0, T; L2(Ω)):

utt, v(H−1,H1

0) + (σut, v) + a(u, v) = (f, v),

∀v ∈ H1

0(Ω),

a(u, v) = (c ∇u, ∇v)

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-8
SLIDE 8

The (Damped) Wave Equation

Model problem (second-order form) utt + σut − ∇ · ( c ∇u) = f in Ω × (0, T) u = 0

  • n ∂Ω × (0, T)

u|t=0 = u0 , ut|t=0 = v0 in Ω

  • Ω ⊂ Rd bounded, σ(x) ≥ 0, c(x) > 0

Weak formulation Find u ∈ C0(0, T; H1

0(Ω)) ∩ C1(0, T; L2(Ω)):

utt, v(H−1,H1

0) + (σut, v) + a(u, v) = (f, v),

∀v ∈ H1

0(Ω),

a(u, v) = (c ∇u, ∇v) Energy conservation For σ = 0, f = 0 the energy E[u](t) := 1 2

  • ut2 + a(u, u)
  • ≡ const.

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-9
SLIDE 9

Second-order semi-discrete FE formulations

  • Conforming mass-lumped FEM:

(Cohen-Joly-Roberts-Tordjman, SINUM, 2001) a(u, ϕ) :=

  • K∈Th
  • K

c∇u · ∇ϕ dx

  • IP-DG FEM: (G.-Schneebeli-Sch¨
  • tzau, SINUM 2006)

aDG(u, ϕ) :=

  • K∈Th
  • K

c∇u · ∇ϕ dx −

  • e∈Eh
  • e

[ [ϕ] ] · { {c∇u} } dA −

  • e∈Eh
  • e

[ [u] ] · { {c∇ϕ} } dA +

  • e∈Eh

a[ [u] ] · [ [ϕ] ] dA

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-10
SLIDE 10

The (Damped) Wave Equation

Model problem (first-order form, v := ut and w := −∇u) vt + σv + ∇ · (c w) = f in Ω × (0, T) wt + ∇v = 0 in Ω × (0, T) v = 0

  • n ∂Ω × (0, T)

v|t=0 = v0 , w|t=0 = −∇u0 in Ω qt + Σ q + ∇ · F(q) = S with q = (v, w)t

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-11
SLIDE 11

Nodal DG FE Formulation

Find qh : [0, T] × Vh → R such that (qh

t , ψ)+(Σ qh, ψ)+aDG(qh, ψ) = (S, ψ)

∀ ψ ∈ Vh , t ∈ (0, T) .

  • Nodal DG FEM: (Hesthaven-Warburton, Springer, 2008)

aDG(q, ψ) :=

  • K∈Th
  • K

(∇ · F(q)) · ψ dx −

  • e∈Eh
  • e

(n · F(q) − (n · F(q))∗) · ψ dA Here, (n · F(q))∗ denotes a suitable numerical flux in the unit normal direction n.

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-12
SLIDE 12

Semi-Discrete Galerkin FE Formulations

The discretization in space leads to a system of ODE’s M d2U dt2 (t) + Mσ dU dt (t) + K U(t) = R(t) , t ∈ (0, T)

  • r

MdQ dt (t) + Mσ Q(t) + K Q(t) = R(t) , t ∈ (0, T) . The stiffness matrix K and the mass matrix M are sparse. Moreover, the mass matrix M is SPD and (block-)diagonal ⇒ computing M−1 is cheap ⇒ fully explicit time-stepping!

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-13
SLIDE 13

Semi-Discrete Galerkin FE Formulations

The discretization in space leads to a system of ODE’s M d2U dt2 (t) + Mσ dU dt (t) + K U(t) = R(t) , t ∈ (0, T)

  • r

MdQ dt (t) + Mσ Q(t) + K Q(t) = R(t) , t ∈ (0, T) . The stiffness matrix K and the mass matrix M are sparse. Moreover, the mass matrix M is SPD and (block-)diagonal ⇒ computing M−1 is cheap ⇒ fully explicit time-stepping! adaptivity, small geometric features ⇓ locally refined meshes ⇓ CFL condition for explicit time-stepping ∆t ≤ C h, h = min

T ∈Th hT Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-14
SLIDE 14

Multirate Time-Stepping for ODEs / Previous Work

  • Rice, J. Res. Nat. Bureau Stand.-B 1960
  • Split Runge-Kutta methods
  • Gear-Wells, BIT 1984
  • Multirate linear multistep methods: “fast-first”, “slow-first”

unther-Kværnø-Rentrop, BIT 2001

  • Multirate partitioned (IMEX) Runge-Kutta methods
  • Leimkuhler-Reich, JCP 2001
  • The reversible averaging (RA) method
  • Hairer-Lubich-Wanner, Geometric Numerical Integration 2002
  • Multiple time-stepping for ODEs
  • Savcenco-Hundsdorfer-Verwer, BIT, 2007
  • Multirate (IMEX) time-stepping strategy for stiff ODEs
  • A. Kl¨
  • ckner, PhD thesis, 2010
  • Multirate ABk time-stepping (Gear-Wells type)

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-15
SLIDE 15

Explicit LTS for PDEs / Previous Work

  • Berger and Oliger, JCP 1984
  • AMR method, based on rectangular FD patches (AMROC)
  • Collino et al., Numer. Math. 2003, JCP 2006; Piperno, M2AN 2006
  • Sympletic second-order St¨
  • rmer-Verlet
  • Dumbser et al., Geophys. J. Int. 2007; Int. J. Numer. Model. 2009
  • LTS ADER-DG schemes
  • Constantinescu-Sandu, J. Sc. Comp. 2007, 2009
  • Multirate time integration, limited to second order accuracy
  • Diaz-G., SISC 2009, CMAME 2015
  • σ = 0: LTS-LF of arbitrarily high accuracy, multi-level version
  • G.-Mitkova, JCAM 2010, 2013
  • σ ≥ 0: LTS-AB of arbitrarily high accuracy
  • Hochbruck-Ostermann, BIT 2011
  • Exponential multistep methods of Adams type

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-16
SLIDE 16

RK Based Explicit LTS

Advantages of RK methods:

  • One-step method, no starting procedure
  • Time adaptivity straightforward
  • Larger stability regions (but more work per step)
  • Low storage (LSRK) versions available
  • Knoth et al., BIT 2009, JCAM 2009
  • Multirate RK for advection equations, 3d order
  • Liu, Li, Hu, JCP 2010
  • Non-uniform LDDRK-DG for CFD, linear coupling

conditions

“...the availability of extrapolation from past values is an advantage for multistep methods

  • ver Runge-Kutta methods in the multirate context.”

(Gear-Wells, BIT, 1984)

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-17
SLIDE 17

RK Based Explicit LTS Methods

Goal: Derive Runge-Kutta (RK) based explicit LTS methods for dy dt (t) = By(t) + F(t) , t ∈ (0, T) . (1) B involves the factor M−1. The mass matrix M is (block-)diagonal ⇒ computing M−1 is cheap ⇒ fully explicit time-stepping!

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-18
SLIDE 18

RK Based Explicit LTS Methods

Goal: Derive Runge-Kutta (RK) based explicit LTS methods for dy dt (t) = By(t) + F(t) , t ∈ (0, T) . (1) B involves the factor M−1. The mass matrix M is (block-)diagonal ⇒ computing M−1 is cheap ⇒ fully explicit time-stepping! adaptivity, small geometric features ⇓ locally refined meshes ⇓ CFL condition for explicit time-stepping ∆t ≤ C h, h = min

T ∈Th hT Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-19
SLIDE 19

RK-methods and numerical integration

y′(t) = f(y(t), t)), y(0) = y0

k1 = f(yn, tn), k2 = f(yn + ∆ta21k1, tn + c2∆t), . . . ks = f(yn + ∆t

s−1

  • i=1

asiki, tn + cs∆t), y(tn+1) ≈ yn+1 = yn + ∆t

s

  • i=1

biki. c2 a21 c3 a31 a32 . . . . . . . . . ... cs as1 . . . as,s−1 b1 . . . bs−1 bs

Butcher-tableau of an explicit RKs scheme of order k.

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-20
SLIDE 20

RK-methods and numerical integration

y′(t) = f(y(t), t)), y(0) = y0

k1 = f(yn, tn), k2 = f(yn + ∆ta21k1, tn + c2∆t), . . . ks = f(yn + ∆t

s−1

  • i=1

asiki, tn + cs∆t), y(tn+1) ≈ yn+1 = yn + ∆t

s

  • i=1

biki. c2 a21 c3 a31 a32 . . . . . . . . . ... cs as1 . . . as,s−1 b1 . . . bs−1 bs

Butcher-tableau of an explicit RKs scheme of order k.

Underlying quadrature formula with weights b1, . . . , bs and nodes 0, c2, . . . , cs has at least order k.

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-21
SLIDE 21

RK Based Explicit LTS

Let us now split y and F in two parts

y =

  • ycoarse
  • +
  • yfine
  • = (I − P)y + Py, P2 = P,

F =

  • Fcoarse
  • +
  • Ffine
  • = (I − P)F + PF, P2 = P.

Then, we have

d dty = By + F = B(I − P)y + BPy + (I − P)F + PF

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-22
SLIDE 22

RK Based Explicit LTS

Let us now split y and F in two parts

y =

  • ycoarse
  • +
  • yfine
  • = (I − P)y + Py, P2 = P,

F =

  • Fcoarse
  • +
  • Ffine
  • = (I − P)F + PF, P2 = P.

Then, we have

d dty = By + F = B(I − P)y + BPy + (I − P)F + PF

  • r

y(tn + ξ∆t) = y(tn) + tn+ξ∆t

tn

B(I − P)y(t) + (I − P)F(t) dt + tn+ξ∆t

tn

BPy(t) + PF(t) dt .

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-23
SLIDE 23

RK Based Explicit LTS/ LTS-RK2(p)

Coarse part

tn+ξ∆t

tn

B(I − P)y(t) + (I − P)F(t) dt

1 1 1/2 1/2

  • coeff. of RK2

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-24
SLIDE 24

RK Based Explicit LTS/ LTS-RK2(p)

Coarse part

tn+ξ∆t

tn

B(I − P)y(t) + (I − P)F(t) dt

1 1 1/2 1/2

  • coeff. of RK2

tn+ξ∆t

tn

B(I − P)y(t) dt ≈ ξ∆t 2 B(I − P) (y(tn) + y(tn + ξ∆t)) (QF) ≈ ξ∆t 2

  • B(I − P)yn + B(I − P)
  • yn + ξ∆t (Byn + Fn)
  • (Taylor)

= ξ∆tB(I − P)

  • yn + ξ∆t

2

  • Byn + Fn
  • Marcus Grote

Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-25
SLIDE 25

RK Based Explicit LTS/ LTS-RK2(p)

Coarse part

tn+ξ∆t

tn

B(I − P)y(t) + (I − P)F(t) dt

1 1 1/2 1/2

  • coeff. of RK2

tn+ξ∆t

tn

B(I − P)y(t) dt ≈ ξ∆t 2 B(I − P) (y(tn) + y(tn + ξ∆t)) (QF) ≈ ξ∆t 2

  • B(I − P)yn + B(I − P)
  • yn + ξ∆t (Byn + Fn)
  • (Taylor)

= ξ∆tB(I − P)

  • yn + ξ∆t

2

  • Byn + Fn
  • We replace (I − P)F(t) by (I − P)q(t), where q(t) is the interpolation

polynomial through the points (tn, F(tn)), (tn + ∆t, F(tn + ∆t)).

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-26
SLIDE 26

RK Based Explicit LTS / LTS-RK2(p)

Fine part

tn+ξ∆t

tn

BPy(t) + PF(t) dt ≈ ξ∆t BP y(τ) + PF(tn + τ) dτ y(tn + ξ∆t) ≈ yn + ξ∆tB(I − P)

  • yn + ξ ∆t

2 (Byn + Fn)

  • +

ξ∆t (I − P)q(tn + τ) dτ + ξ∆t BP y(τ) + PF(tn + τ) dτ + O(∆t3)

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-27
SLIDE 27

RK Based Explicit LTS / LTS-RK2(p)

Fine part

tn+ξ∆t

tn

BPy(t) + PF(t) dt ≈ ξ∆t BP y(τ) + PF(tn + τ) dτ y(tn + ξ∆t) ≈ yn + ξ∆tB(I − P)

  • yn + ξ ∆t

2 (Byn + Fn)

  • +

ξ∆t (I − P)q(tn + τ) dτ + ξ∆t BP y(τ) + PF(tn + τ) dτ + O(∆t3) Where y is the solution of

  • y(0) = yn

d dτ y(τ) = B(I − P)

  • yn + τ (Byn + Fn)
  • +(I − P)q(tn + τ)

+BP y(τ) + PF(tn + τ) y(tn + ξ∆t) ≈ y(ξ∆t) = ⇒ yn+1 := y(∆t)

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-28
SLIDE 28

RK Based Explicit LTS / LTS-RK2(p) Algorithm

We compute y(τ) for 0 ≤ τ ≤ ∆t again by using the RK2 method with the smaller time-step ∆τ = ∆t/p.

  • wn,0 := B(I − P)yn + (I − P)Fn

wn,1 := B(I − P)(Byn + Fn) + (I − P)Fn+1 − Fn ∆t i = 0 , . . . , p − 1 k1, i+1

p

:= wn,0 + i∆τ wn,1 + BP y i

p + PFn,i

k2, i+1

p

:= wn,0 + (i + 1) ∆τwn,1 + BP

  • y i

p + ∆τk1, i+1 p

  • + PFn,i+1
  • y i+1

p

:= y i

p + ∆τ

2

  • k1, i+1

p

+ k2, i+1

p

  • yn+1 :=

y(∆t) = ⇒ yn+1 := y1

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-29
SLIDE 29

RK Based Explicit LTS / LTS-RK2(p) Algorithm

We compute y(τ) for 0 ≤ τ ≤ ∆t again by using the RK2 method with the smaller time-step ∆τ = ∆t/p.

  • wn,0 := B(I − P)yn + (I − P)Fn

wn,1 := B(I − P)(Byn + Fn) + (I − P)Fn+1 − Fn ∆t i = 0 , . . . , p − 1 k1, i+1

p

:= wn,0 + i∆τ wn,1 + BP y i

p + PFn,i

k2, i+1

p

:= wn,0 + (i + 1) ∆τwn,1 + BP

  • y i

p + ∆τk1, i+1 p

  • + PFn,i+1
  • y i+1

p

:= y i

p + ∆τ

2

  • k1, i+1

p

+ k2, i+1

p

  • yn+1 :=

y(∆t) = ⇒ yn+1 := y1 The LTS-RK2 scheme requires two multiplications by B(I − P) and 2p multiplications by BP per time-step ∆t. For P = 0 or p = 1, the LTS-RK2 scheme coincides with the RK2 scheme.

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-30
SLIDE 30

High-order explicit LTS-RK and LTS-LSRK

The previous derivation can be extended to any explicit RK scheme of order k, including Low-Storage schemes.

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-31
SLIDE 31

High-order explicit LTS-RK and LTS-LSRK

The previous derivation can be extended to any explicit RK scheme of order k, including Low-Storage schemes.

Proposition:

Consider an explicit RK method of order k (with at least k − 1 different coefficients c1, . . . , cs). Then the corresponding LTS-RKk(p) scheme has order k.

Theorem:

For s = k = 2, 3, 4 the LTS-RKs(p) scheme is convergent

  • f order k (in the ODE sense).

Remark:

Like standard RK methods, the LTS-RKs(p) inherently conserve linear invariants.

G.-Mehlin-Mitkova, SISC 2015

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-32
SLIDE 32

Numerical Experiments 1D: LTS-RK3(p)

  • Computational domain:

Ω = [0, 6] , Ωcoarse = [0, 2] ∪ [4, 6] , Ωfine = [2, 4] , p = 2, 5, 11

  • Exact solution: (c ≡ 1)

u(x, t) = cos(t) · sin(πx)

  • Source data

f(x, t) = sin(πx)

  • (π2 − 1) cos(t) − σ sin(t)
  • Homogeneous Dirichlet boundary condition
  • Damping parameter: σ ≡ 0.1
  • Space discretization: P2 mass-lumped and nodal DG FEM

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-33
SLIDE 33

Numerical Experiments 1D: LTS-RK3(p)

local refinement p = 2, ∆t = ∆tRK3 (optimal CFL condition)

  • ∆tRK3 = max. time-step for standard non-LTS RK3

solution at T = 10 L2-errors at T = 10

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-34
SLIDE 34

Numerical Experiments 1D: LTS-RK3(p)

local refinement p = 5, ∆t = ∆tRK3 (optimal CFL condition)

  • ∆tRK3 = max. time-step for standard non-LTS RK3

solution at T = 10 L2-errors at T = 10

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-35
SLIDE 35

Numerical Experiments 1D: LTS-RK3(p)

local refinement p = 11, ∆t = ∆tRK3 (optimal CFL condition)

  • ∆tRK3 = max. time-step for standard non-LTS RK3

solution at T = 10 L2-errors at T = 10

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-36
SLIDE 36

Numerical Experiments 2D: Narrow Gap

  • Computational domain: Ω is rectangular of size [0, 2] × [0, 1] with two

rectangular barriers inside forming a narrow gap

  • Mesh with ratio of local refinement p = 7 resolves the small geometric

features of the gap The initial triangular mesh : hfine ≈ hcoarse/7

  • Homogeneous source data and Neumann BC
  • Model parameters: c ≡ 1, σ = 0.1
  • A plane wave is excited through the initial conditions
  • LTS-LSRK3(7) combined with P2 mass-lumped FEM

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-37
SLIDE 37

Numerical Experiments 2D: Narrow Gap

t = 0.1 Numerical solution on the global refinement level 4

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-38
SLIDE 38

Numerical Experiments 2D: Narrow Gap

t = 0.3 Numerical solution on the global refinement level 4

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-39
SLIDE 39

Numerical Experiments 2D: Narrow Gap

t = 0.45 Numerical solution on the global refinement level 4

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-40
SLIDE 40

Numerical Experiments 2D: Narrow Gap

t = 0.55 Numerical solution on the global refinement level 4

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-41
SLIDE 41

Numerical Experiments 2D: Narrow Gap

t = 0.7 Numerical solution on the global refinement level 4

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-42
SLIDE 42

Numerical Experiments 2D: Narrow Gap

t = 0.9 Numerical solution on the global refinement level 4

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-43
SLIDE 43

Leap-Frog based LTS methods

Semi-discrete Galerkin FE of the wave equation with (σ = 0): d2 dt2 y(t) + Ay(t) = F where A = M−1/2KM−1/2.

  • conforming FE (with mass lumping) or IP-DG
  • M and K are symmetric positive definite
  • M is (block-) diagonal

⇒ computing M−1 is cheap ⇒ fully explicit time-stepping!

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-44
SLIDE 44

LTS-LF methods

Following similar ideas, one can derive an LTS-LF method:

  • second-order accurate
  • energy conserving (for ∆t sufficiently small)
  • can be extended to arbitrary (even) order
  • can be extended recursively to multiple levels: MLTS-LF

yn+1 = −yn−1 + 2 LTS2(yn, −A(I − P)yn), where the function znew = LTS2(z, w) is defined as:

1 znew := z + 1

2∆τ 2 (w − APz),

∆τ = ∆t/p

2 For m = 1, ..., p − 1

(i) zold := z; z := znew (ii) znew := 2z − zold + ∆τ 2 (w − APz) Diaz and G., SISC (2009), CMAME (2015)

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-45
SLIDE 45

Convergence of LTS-LF Methods

Convergence ”in the PDE sense” as both h, ∆t → 0 is not straightforward.

Two major difficulties:

  • Potential order reduction when combining high-order space

and time discretizations (Moya, Verwer 2011)

  • Standard proof techniques do not apply due to

non-commuting projection and differentiation

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-46
SLIDE 46

Convergence of LTS-LF Methods

Convergence ”in the PDE sense” as both h, ∆t → 0 is not straightforward.

Two major difficulties:

  • Potential order reduction when combining high-order space

and time discretizations (Moya, Verwer 2011)

  • Standard proof techniques do not apply due to

non-commuting projection and differentiation

Theorem: (G.-Mehlin-Sauter)

Under standard regularity assumptions on u and for ∆t sufficiently small (CFL condition), the error at time tn satisfies u − uhL∞([0,T];L2(Ω)) ≤ Chm+1 (1 + T) uW 1,∞([0,T];Hm+1(Ω))

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-47
SLIDE 47

Convergence: 1-D experiments, p = 13, h, ∆t → 0

IP-DG P 3 elements Energy conserved!

10

−2

10

−1

10 10

−4

10

−3

10

−2

10

−1

log(Err) log(h2) 1 2 3 4 5 −1 −0.8 −0.6 −0.4 −0.2 0.2 0.4 0.6 0.8 1 numerical analytical 10 20 30 40 50 60 3.2461 3.2461 3.2461 x 10

4

T E

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-48
SLIDE 48

Multi-level local time stepping (MLTS)

Consider now a sequence of nested grids, such as from iterative refinement (hp adaptivity). h2 = h1 p2 = hcoarse p1p2 Apply LTS-LF method recursively: ∆θ = ∆τ p2 = ∆t p1p2

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-49
SLIDE 49

Two-level local time stepping (p1, p2)

Convergence as hcoarse, ∆t → 0 MLTS-2, IP-DG P 1 FE MLTS-4, IP-DG P 3 FE MLTS-2: Energy conservation proved regardless of p1, p2, . . .

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-50
SLIDE 50

Numerical Experiments: SPECFEM3D-LTS

User Manual

COMPUTATIONAL INFRASTRUCTURE FOR GEODYNAMICS (CIG)

Version 2.1

Piero Basini Céline Blitz Ebru Bozdağ Emanuele Casarotti Joseph Charles Min Chen Dominik Göddeke Vala Hjörleifsdóttir Sue Kientz Dimitri Komatitsch Jesús Labarta Nicolas Le Goff Pieyre Le Loher Qinya Liu Yang Luo Alessia Maggi Federica Magnoni Roland Martin René Matzen Dennis McRitchie Matthias Meschede Peter Messer David Michéa Tarje Nissen-Meyer Daniel Peter Max Rietmann Brian Savage Bernhard Schuberth Anne Sieminski Leif Strand Carl Tape Jeroen Tromp Jean-Pierre Vilotte Zhinan Xie Hejun Zhu

PRINCETON UNIVERSITY (USA) CNRS, INRIA and UNIVERSITY OF PAU (FRANCE)

SPECFEM 3D

Cartesian

SPECFEM3D: SEM for the viscoelastic wave equation SCOTCH: parallel graph partitioning software

Rietmann, G., Peter, Schenk, U¸ car ”Load-balanced local time stepping for large-scale wave propagation”, IPDPS 2015.

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-51
SLIDE 51

Numerical Experiments: 3D seismology

Mesh: 350’000 elements (23 Mio. dof’s), T = 180 [s], single 8-core CPU-time: 16 [h] (instead of 40 [min]) due to local refinement)

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-52
SLIDE 52

Numerical Experiments: Expected Speed-up

∆t/32 ∆t/16 ∆t/8 ∆t/4 ∆t/2 ∆t 1 2 3 4 ×105 1,829 1,494 49 1,400 5,613 3.42 · 105 # elements

Expected speedup: (32 × 352482)/(32 × 1829 +16 × 1494 + 8 × 49 + 4 × 1400 + 2 × 5613 + 342097) = 25.5.

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-53
SLIDE 53

Numerical Experiments: Parallel performance

2 4 8 16 1 10 100

,CPU non-LTS ,GPU non-LTS ,CPU LTS , G P U L T S 6.5x 3.7x 4.4x 24x 106x

Relative Performance Number of Nodes (CPU×8,GPU×1)

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-54
SLIDE 54

Concluding Remarks

  • Wave equations with (or without) damping
  • Mass-lumped/IP-DG/nodal DG FE discretization ⇒

block-diagonal mass matrix ⇒ explicit time integration

  • Proposed explicit LTS schemes of arbitrarily high accuracy:
  • Without damping (σ = 0): LTS L-F type methods

⇒ discrete energy conserved

  • With damping (σ ≤ 0): LTS AB(k) methods

⇒ optimal CFL for k ≥ 3

  • With damping (σ ≤ 0): LTS RK and LSRK methods

⇒ optimal CFL for k ≥ 3

  • Proved convergence for LTS-LF as h, ∆t → 0
  • Achieves high parallel performance on HPC architectures

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-55
SLIDE 55

Concluding Remarks

  • Wave equations with (or without) damping
  • Mass-lumped/IP-DG/nodal DG FE discretization ⇒

block-diagonal mass matrix ⇒ explicit time integration

  • Proposed explicit LTS schemes of arbitrarily high accuracy:
  • Without damping (σ = 0): LTS L-F type methods

⇒ discrete energy conserved

  • With damping (σ ≤ 0): LTS AB(k) methods

⇒ optimal CFL for k ≥ 3

  • With damping (σ ≤ 0): LTS RK and LSRK methods

⇒ optimal CFL for k ≥ 3

  • Proved convergence for LTS-LF as h, ∆t → 0
  • Achieves high parallel performance on HPC architectures
  • Current work: prove convergence for LTS-RK as ∆t, h → 0

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016

slide-56
SLIDE 56

Concluding Remarks

  • Wave equations with (or without) damping
  • Mass-lumped/IP-DG/nodal DG FE discretization ⇒

block-diagonal mass matrix ⇒ explicit time integration

  • Proposed explicit LTS schemes of arbitrarily high accuracy:
  • Without damping (σ = 0): LTS L-F type methods

⇒ discrete energy conserved

  • With damping (σ ≤ 0): LTS AB(k) methods

⇒ optimal CFL for k ≥ 3

  • With damping (σ ≤ 0): LTS RK and LSRK methods

⇒ optimal CFL for k ≥ 3

  • Proved convergence for LTS-LF as h, ∆t → 0
  • Achieves high parallel performance on HPC architectures
  • Current work: prove convergence for LTS-RK as ∆t, h → 0

Thank you for your attention!

Marcus Grote Universit´ e de Bˆ ale CCN 2016, Nice, 27-29 sept., 2016