Parallelization of stencil-based methods: step-69 (and beyond) - - PowerPoint PPT Presentation

parallelization of stencil based methods step 69 and
SMART_READER_LITE
LIVE PREVIEW

Parallelization of stencil-based methods: step-69 (and beyond) - - PowerPoint PPT Presentation

Parallelization of stencil-based methods: step-69 (and beyond) Martin Kronbichler 1 , Matthias Maier, Ignacio Tomas 2 Department of Mathematics Texas A&M University 1 Institute for Computational Mechanics, TUM, Munich 2 CSRI, Sandia


slide-1
SLIDE 1

Parallelization of “stencil-based” methods: step-69 (and beyond)

Martin Kronbichler1, Matthias Maier, Ignacio Tomas2

Department of Mathematics Texas A&M University

1 Institute for Computational Mechanics, TUM, Munich 2 CSRI, Sandia National Laboratories 1 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-2
SLIDE 2

Motivation

2 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-3
SLIDE 3

Motivation

Simulation of hyperbolic systems of conservation laws Goals: › robust (?!) › accurate (?!) › scalable (!)

3 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-4
SLIDE 4

Outline

1 Motivation 2 The compressible Euler equations 3 step-69 4 . . . and beyond

4 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-5
SLIDE 5

The compressible Euler equations

5 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-6
SLIDE 6

Compressible Euler equations

Hyperbolic system of conservation laws

ut + div f (u) = 0; where u(x; t) : Rd ˆ R ! Rd+2, and f(u) : Rd+2 ! R(d+2)ˆd, with space dim. d – 1. State: u =

2 6 4

 m E

3 7 5

Flux: f (u) =

2 6 6 6 4

m> `1m ˙ m + Ip

m>  (E + p)

3 7 7 7 5 ;

Here  2 R+ denotes the density, m 2 Rd is the momentum, and E 2 R+ is the total energy of the system. The pressure p 2 R+ is determined by an equation of state,

  • e. g.,

Polytropic gas equation: p(u) := (‚ ` 1)

E ` jmj2

2 

«

; ‚ 2 (1; 5=3]:

6 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-7
SLIDE 7

Compressible Euler equations

Variational principle viable ???

Testing with u: 1 2 d dt

‚ ‚ ‚u ‚ ‚ ‚

2 L2Ω +

div f (u); u

L2(Ω)

| {z }

???

= 0: › no energy estimate available › no (quasi) best approximation › etc. ` ! none of the classical finite element toolbox available. . .

Solution theory

We call u a viscosity solution if u = lim

"!0 u";

with u"

t + div f (u") = "∆u":

` ! global existence and uniqueness

  • f viscosity solutions is an open

problem. So what can we do?

7 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-8
SLIDE 8

Compressible Euler equations

What can we meaningfully expect?

Invariant set

If u is a viscosity solution, then u(x; t) 2 B 8 x 2 Ω; 8 t – 0; where u 2 B implies › positivity of density:  > 0 › positivity of internal energy: E ` jmj2

2

> 0 › local minimum principle on specific entropy: s(u) – min

x2Ω s(u0(x))

Conservation

Conservation of mass, momentum and total energy: d dt ˆ

u dt + ˆ

@Ω

n ´ f (u) dox = 0: ` ! Robustness: We want a numerical scheme that is conservative and maintains the invariant set.

8 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-9
SLIDE 9

Compressible Euler equations

How does (a subset of) the community judge our code? Picture norm comparison of flow characteristics to (known) experimental results.

9 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-10
SLIDE 10

Compressible Euler equations

To summarize: › Robustness: conservative and invariant domain preserving Most importantly: This implies that our scheme will never crash (no matter what mesh was used, or (admissible) initial data was prescribed) › Accuracy (not discussed): second order convergent in case of known, smooth viscosity solutions › Scalability: Large-scale 3D computation

10 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-11
SLIDE 11

step-69

11 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-12
SLIDE 12

step-69

› Guermond & Popov, Invariant domains and first-order continuous finite element approximation for hyperbolic systems, SIAM J. Numer. Anal. 54(4):2466-2489.

(Grossly oversimplified) formal derivation

Let uh(x; t) =

P

j ’j(x)Uj(t) be a finite element approximation:

d dt

’i; uh

+

’i; div f(uh)

= 0: Forward Euler:

X

j2I(i)

mij Un+1

j

` Un

j

fin +

’i; div f

“ X

j2I(i)

Un

j

””

= 0: Lump mass matrix and approximate flux, f(un

h) ı

P

j f(Un j ) ’j:

mi Un+1

i

` Un

i

fin +

X

j2I(i)

f

Un

j

´ cij = 0; mi = ˆ

’i; cij = ˆ

’ir’j:

12 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-13
SLIDE 13

step-69

› Guermond & Popov, Invariant domains and first-order continuous finite element approximation for hyperbolic systems, SIAM J. Numer. Anal. 54(4):2466-2489.

Scheme

mi Un+1

i

` Un

i

fin +

X

j2I(i)

f

Un

j

´ cij `

X

j2I(i)

dn

ij Un j

= 0; where mi = ˆ

’i; cij = ˆ

’ir’j; and where we have introduced a “graph viscosity” dn

ij as stabilization term.

The correct construction of dn

ij is key for robustness

13 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-14
SLIDE 14

step-69

Compute two characteristic propagation speeds associated with Un

i or Un j :

–1

`(Un i ; p˜) := ˜

un

i

` ˜ cn

i

v u u t1 + ‚ + 1 2 ‚ " p˜ ` ˜ pn

i

˜ pn

i

#

pos

; –3

+(Un j ; p˜) := ˜

un

j

+ ˜ cn

j

v u u t1 + ‚ + 1 2 ‚ " p˜ ` ˜ pn

j

˜ pn

j

#

pos

;

and a two-rarefaction pressure:

˜ p˜(Un

i ; Un j ) = ˜

pj B @ ˜ ci + ˜ cj ` ‚`1

2

`˜ uj ` ˜ ui ´ ˜ ci ` ˜

pi ˜ pj

´` ‚`1

2 ‚ + ˜

cj 1 C A

2 ‚ ‚`1

;

and a monotone increasing and concave down function

(p) := f (Un

i ; p) + f (Un j ; p) + ˜

uj ` ˜ ui; f (U; p) := 8 > > > > < > > > > : p 2 (p ` ˜ p) q ˜  ˆ (‚ + 1) p + (‚ ` 1) ˜ p ˜ if p – ˜ p; » (p=˜ p)

‚`1 2 ‚ ` 1

– 2 ˜ c ‚ ` 1 ;

  • therwise:

Now:

˜ –max = max “ˆ –1

`(Un i ; p˜)

˜

neg ;

ˆ –3

+(Un j ; p˜)

˜

pos

” ; p˜ := ˜ p˜(Un

i ; Un j )

if (pmax) < 0; min(˜ pmax; ˜ p˜(Un

i ; Un j ))

  • therwise:

14 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-15
SLIDE 15

step-69

Scheme

Un+1

i

= Un

i

+ fin mi

„ X

j2I(i)

f

Un

j

´ cij `

X

j2I(i)

dn

ij Un j

«

where

mi = ˆ

’i; cij = ˆ

’ir’j; dn

ij = max

“ ˜ –max(nij; Un

i ; Un j ) jcijj ; ˜

–max(nji; Un

j ; Un i ) jcjij

I(i) — all column indices coupling to i. Stencil based: compare to matrix-vector multiplication yi =

X

j2I(i)

aij xj; . . . but highly nonlinear!

15 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-16
SLIDE 16

step-69

euler_step // Step 1: compute off-diagonal dn

ij :

for i = 1, . . . , N do for j 2 I(i), j 6= i do dn

ij

max “ ˜ –max(nij; Un

i ; Un j ) jcijj ; ˜

–max(nji; Un

j ; Un i ) jcjij

” // Step 2: compute dn

ii and fin:

fin +1 for i = 1, . . . , N do dn

ii

` P

j2I(i);j6=i dn ij ,

fin min “ fin ; `ccfl

mi 2dn

ii

” // Step 3: perform update for i = 1, . . . , N do for j 2 I(i) do Un+1

i

Un

i

+

fin mi

“ P

j2I(i) f

` Un

j

´ ´ cij ` P

j2I(i) dn ij Un j

” // Step 4: MPI synchronization U.update_ghost_values()

16 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-17
SLIDE 17

step-69

Parallelization approach (step-69)

› MPI parallelization via parallel::distributed::Triangulation LinearAlgebra::distributed::Vector › Precompute mi and cij. (This is the only place where finite elements enter) › Distribute work onto threads with parallel::apply_to_subranges() › Avoids global to (MPI) local index translations. › (Asynchronous IO.)

17 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-18
SLIDE 18

MPI local index translations: Asynchronous IO:

18 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-19
SLIDE 19

19 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-20
SLIDE 20

. . . and beyond

20 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-21
SLIDE 21

. . . and beyond

mi

UL;n+1

i

` Un

i

= fin

X

j2I(i)

` f(Un

j ) ´ cij + dL;n ij

Un

j ` Un i

”«

:

X

j2I(i)

mij

UH;n+1

j

` Un

j

= fin

X

j2I(i)

` f(Un

j ) ´ cij + dL;n ij

¸n

i + ¸n j

2

Un

j ` Un i

”«

; with a suitable indicator ¸i. › The low-order UL;n+1

i

is robust, the high-order UH;n+1

i

is not. › Key idea: Invariant-domain preserving convex limiting 3 UH;n+1

i

` UL;n+1

i

=

X

j2I(i)

Pn

ij;

where Pn

ij = : : :

Un+1

i

` UL;n+1

i

=

X

j2I(i)

ln

ij Pn ij;

0 » ln

ij » 1:

3Guermond, Nazarov, Popov, Tomas, Second-order invariant domain preserving approximation of the

Euler equations using convex limiting, SIAM J. Sci. Comput. 40 (2018)

21 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-22
SLIDE 22
slide-23
SLIDE 23

Coming soon. . .

https://github.com/conservation-laws/

23 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods

slide-24
SLIDE 24

Thank you for your attention!

2D 38M gridpoints: https://www.youtube.com/watch?v=xIwJZlsXpZ4 3D 1.8B gridpoints: https://www.youtube.com/watch?v=vBCRAF_c8m8 https://www.dealii.org/ https://github.com/conservation-laws/

24 Kronbichler, Maier, Tomas | Parallelization of “stencil-based” methods