Development of a Terrestrial Dynamical Core for E3SM (TDycore)


SLIDE 1

Development of a Terrestrial Dynamical Core for E3SM (TDycore)

Nathan Collier
Oak Ridge National Laboratory
https://github.com/TDycores-Project/TDycore
June 2019

SLIDE 2

Tale of Two Talks

Common Theme: Considerations in choosing a discretization method

  • I. The Effect of a Higher Continuous Basis on Solver Performance

Victor Calo (Curtin), David Pardo (Ikerbasque), Lisandro Dalcin (KAUST), Maciej Paszynski (AGH)

  • II. Selection of a Numerical Method for a Terrestrial Dynamical Core

Jed Brown (Colorado), Gautam Bisht (PNNL), Matthew Knepley (Buffalo), Jennifer Fredrick (SNL), Glenn Hammond (SNL), Satish Karra (LANL)

SLIDE 3

Higher Continuous Basis?

[Figure: 1D basis functions for the standard C^0 basis and the C^(p-1) continuous basis, shown for increasing p.]

SLIDE 4

Poisson problem on unit cube

[Figure: H1 norm of the true error vs number of degrees of freedom (10^3 to 10^6) for C^0 spaces, p = 1 to 6.]

SLIDE 5

Poisson problem on unit cube

[Figure: H1 norm of the true error vs number of degrees of freedom (10^3 to 10^6), comparing C^0 and C^(p-1) spaces for p = 1 to 6.]

SLIDE 6

Are higher continuous spaces an efficient way to p-refine?

What effect does continuity have on the solver performance?

SLIDE 7

Are higher continuous spaces an efficient way to p-refine?

What effect does continuity have on the solver performance?

Spoiler Alert!

                            C^0             C^(p-1)       C^(p-1)/C^0
Multifrontal direct solver  O(N^2 + N p^6)  O(N^2 p^3)    O(p^3)
Iterative solvers*          O(N p^4)        O(N p^6)      O(p^2)

*Estimates for matrix-vector products
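These are asymptotic estimates with constants dropped, but even a rough evaluation shows the trends. The sketch below is my addition; N = 30k matches the experiments that follow.

```python
# Evaluate the asymptotic cost models from the table above (constants are
# dropped, so only relative trends are meaningful). N = 30k as in the
# experiments later in the talk.
N = 30_000
print(f"{'p':>2} {'direct C^0':>12} {'direct C^(p-1)':>15} {'iter C^0':>10} {'iter C^(p-1)':>13}")
for p in range(1, 9):
    print(f"{p:>2} {N**2 + N * p**6:>12.2e} {N**2 * p**3:>15.2e} "
          f"{N * p**4:>10.2e} {N * p**6:>13.2e}")
```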

SLIDE 8

Multi-frontal direct solver

Based on the concepts of the Schur complement and nested dissection.


SLIDE 9

Key concept: size s of the separator

s = 1 for C^0; s = p for C^(p-1)

SLIDE 10

Estimates and Results (d = 3, N = 30k)

         C^0                 C^(p-1)
Time     O(N^2 + N p^6)      O(N^2 p^3)
Memory   O(N^(4/3) + N p^3)  O(N^(4/3) p^2)

[Figure: measured time and memory vs polynomial order 1 to 8; C^0 peaks near 24 s and 550 MB, while C^(p-1) reaches roughly 1400 s and 4000 MB.]

SLIDE 11

Solution time for C^0 vs C^(p-1) (d = 3, N = 30k)

[Figure: solution time (s) vs polynomial order 1 to 8 for C^0 and C^(p-1); the C^(p-1) time grows to roughly 1500 s.]

SLIDE 12

Iterative solvers

Much more complex to assess costs: P(Ax − b) = 0. Need a model for:
◮ Matrix-vector multiplication
◮ Preconditioner (P) setup and application
◮ Convergence

SLIDE 13

Sample Linear Systems

[Figure: sparsity patterns of sample linear systems for the C^0 space and the C^(p-1) space.]

SLIDE 14

Matrix-vector multiplication - C^0

The cost of a sparse matrix-vector multiply is proportional to the number of nonzero entries in the matrix.

[Figure: 1D element diagram. A vertex DOF has 2p + 1 interactions; an interior DOF has p + 1.]
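These counts are easy to verify combinatorially. A small sketch (my addition; it builds only the 1D element-to-DOF connectivity, no actual assembly):

```python
# For a 1D mesh of degree-p C^0 elements, element e owns DOFs
# [e*p, e*p + p], the endpoints being shared vertex DOFs.
# Two DOFs interact iff they share an element.
def interactions(p, n_elems=10):
    nbrs = {}
    for e in range(n_elems):
        dofs = range(e * p, e * p + p + 1)
        for i in dofs:
            nbrs.setdefault(i, set()).update(dofs)
    return nbrs

p = 4
nbrs = interactions(p)
# DOF p is a shared vertex; DOF 1 is interior to element 0.
print(len(nbrs[p]), "for a vertex DOF, expect 2p+1 =", 2 * p + 1)
print(len(nbrs[1]), "for an interior DOF, expect p+1 =", p + 1)
```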

SLIDE 15

Matrix-vector multiplication - C^0

Dimension  Entity    Number of entities  DOFs per entity  Number of interactions
1D         vertex    1                   1                (2p+1)
1D         interior  1                   (p-1)            (p+1)
2D         vertex    1                   1                (2p+1)^2
2D         edge      2                   (p-1)            (2p+1)(p+1)
2D         interior  1                   (p-1)^2          (p+1)^2
3D         vertex    1                   1                (2p+1)^3
3D         edge      3                   (p-1)            (2p+1)^2(p+1)
3D         face      3                   (p-1)^2          (2p+1)(p+1)^2
3D         interior  1                   (p-1)^3          (p+1)^3

SLIDE 16

Matrix-vector multiplication - C^0

nnz_{C^0} = (p-1)^3 (p+1)^3              [interior DOFs]
          + 3 (p-1)^2 (2p+1)(p+1)^2      [face DOFs]
          + 3 (p-1) (2p+1)^2 (p+1)       [edge DOFs]
          + (2p+1)^3                     [vertex DOFs]
          = p^6 + 6p^5 + 12p^4 + 8p^3
          = p^3 (p+2)^3 = O(p^6)
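The expansion is easy to check symbolically (a sympy sketch, my addition):

```python
import sympy as sp

p = sp.symbols('p')
nnz_c0 = ((p - 1)**3 * (p + 1)**3                  # interior DOFs
          + 3*(p - 1)**2 * (2*p + 1) * (p + 1)**2  # face DOFs
          + 3*(p - 1) * (2*p + 1)**2 * (p + 1)     # edge DOFs
          + (2*p + 1)**3)                          # vertex DOFs
print(sp.expand(nnz_c0))                     # p**6 + 6*p**5 + 12*p**4 + 8*p**3
print(sp.expand(nnz_c0 - p**3 * (p + 2)**3)) # 0, confirming p^3 (p+2)^3
```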

SLIDE 17

Matrix-vector multiplication - C^(p-1)

The B-spline C^(p-1) basis is very regular: each DOF interacts with 2p + 1 others in 1D.

nnz_{C^(p-1)} = p^3 (2p+1)^3 = 8p^6 + 12p^5 + 6p^4 + p^3 = O(8p^6)

SLIDE 18

Matrix-vector multiplication

[Figure: matrix-vector product time ratio, C^(p-1)/C^0, vs polynomial order p = 1 to 5, with theoretical and measured curves; annotated ratios 1.00, 1.94, 2.77, 3.39, 4.02.]
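Since nnz_{C^(p-1)}/nnz_{C^0} = ((2p+1)/(p+2))^3, the theoretical ratio can be compared directly against the annotated values (a quick check, my addition):

```python
# Theoretical nnz ratio ((2p+1)/(p+2))^3 vs the ratios annotated in the plot.
reported = {1: 1.00, 2: 1.94, 3: 2.77, 4: 3.39, 5: 4.02}
for p in range(1, 6):
    theory = ((2 * p + 1) / (p + 2)) ** 3
    print(f"p={p}: theoretical {theory:.2f}, reported {reported[p]:.2f}")
```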

SLIDE 19

Matrix-vector multiplication

However, for C^0 spaces we can use static condensation, as in the multifrontal direct solver.

Entity  Number of entities  DOFs per entity  Number of interactions  Statically condensed
vertex  1                   1                (2p+1)^3                -8(p-1)^3
edge    3                   (p-1)            (2p+1)^2(p+1)           -4(p-1)^3
face    3                   (p-1)^2          (2p+1)(p+1)^2           -2(p-1)^3

Total: 33p^4 - 12p^3 + 9p^2 - 6p + 3 = O(33p^4)
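The total in the last row follows from summing the three entity rows; a symbolic check (sympy sketch, my addition):

```python
import sympy as sp

p = sp.symbols('p')
condensed = (1 * 1 * ((2*p + 1)**3 - 8*(p - 1)**3)                        # vertex
             + 3 * (p - 1) * ((2*p + 1)**2 * (p + 1) - 4*(p - 1)**3)      # edges
             + 3 * (p - 1)**2 * ((2*p + 1) * (p + 1)**2 - 2*(p - 1)**3))  # faces
print(sp.expand(condensed))   # 33*p**4 - 12*p**3 + 9*p**2 - 6*p + 3
```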

SLIDE 20

Matrix-vector multiplication

[Figure: ratio of number of nonzeros vs polynomial order p = 2 to 14, for C^(p-1) to C^0 and for C^(p-1) to C^0 with static condensation; ratios up to roughly 70.]

SLIDE 21

3D Poisson + CG + ILU

[Figure: solve time ratio, C^(p-1) vs C^0, against number of degrees of freedom (10^2 to 10^6) for p = 2, 3, 4, 5, 6; ratios roughly 2 to 12.]

SLIDE 22

3D Poisson + CG + ILU + static condensation

[Figure: solve time ratio, C^(p-1) vs C^0 with static condensation, against number of degrees of freedom (10^3 to 10^6) for p = 2, 3, 4, 5, 6; ratios roughly 20 to 120.]

SLIDE 23

Related Publications

◮ N Collier, D Pardo, L Dalcin, M Paszynski, VM Calo, "The cost of continuity: a study of the performance of isogeometric finite elements using direct solvers," Computer Methods in Applied Mechanics and Engineering 213, 353-361, 2012. doi:10.1016/j.cma.2011.11.002
◮ N Collier, L Dalcin, D Pardo, VM Calo, "The cost of continuity: performance of iterative solvers on isogeometric finite elements," SIAM Journal on Scientific Computing 35 (2), A767-A784, 2013. doi:10.1137/120881038
◮ N Collier, L Dalcin, VM Calo, "On the computational efficiency of isogeometric methods for smooth elliptic problems using direct solvers," International Journal for Numerical Methods in Engineering 100 (8), 620-632. doi:10.1002/nme.4769

SLIDE 24

Tale of Two Talks

Common Theme: Considerations in choosing a discretization method

  • I. The Effect of a Higher Continuous Basis on Solver Performance

Victor Calo (Curtin), David Pardo (Ikerbasque), Lisandro Dalcin (KAUST), Maciej Paszynski (AGH)

  • II. Selection of a Numerical Method for a Terrestrial Dynamical Core

Jed Brown (Colorado), Gautam Bisht (PNNL), Matthew Knepley (Buffalo), Jennifer Fredrick (SNL), Glenn Hammond (SNL), Satish Karra (LANL)

SLIDE 25

Energy Exascale Earth System Model (E3SM)

◮ The terrestrial water cycle is a key component of the Earth system model
◮ Conceptually, key processes transport water laterally, yet current models represent them in 1D
◮ Requirements: accurate velocities on distorted grids, with uncertain and rough coefficients, at global scale
◮ This leads naturally to mixed finite elements

SLIDE 26

Simplified Problem Statement

Strong form: Find u and p such that
    u = −K∇p  in Ω
    ∇·u = f   in Ω
    p = g     on Γ_D
    u·n = 0   on Γ_N
SLIDE 27

Simplified Problem Statement

Strong form: Find u and p such that
    u = −K∇p  in Ω
    ∇·u = f   in Ω
    p = g     on Γ_D
    u·n = 0   on Γ_N

Candidate approaches:
◮ Mixed finite elements (BDM) + FieldSplit/BDDC/hybridization
◮ Wheeler-Yotov (WY) + AMG
◮ Arnold-Boffi-Falk (ABF) + FieldSplit/BDDC/hybridization
◮ Multipoint flux approximation (MPFA) + AMG

SLIDE 28

Simplified Problem Statement

Strong form: Find u and p such that
    u = −K∇p  in Ω
    ∇·u = f   in Ω
    p = g     on Γ_D
    u·n = 0   on Γ_N

Weak form: Find u ∈ V and p ∈ W such that
    (K⁻¹u, v) = (p, ∇·v) − ⟨g, v·n⟩_Γ_D   for all v ∈ V
    (∇·u, w) = (f, w)                     for all w ∈ W

where V = {v ∈ H(div; Ω) : v·n = 0 on Γ_N} and W = L²(Ω).

SLIDE 29

Problem statement

Strong form: Find u and p such that
    u = −K∇p  in Ω
    ∇·u = f   in Ω
    p = g     on Γ_D
    u·n = 0   on Γ_N

Weak form: Find u ∈ V and p ∈ W such that
    (K⁻¹u, v) = (p, ∇·v) − ⟨g, v·n⟩_Γ_D   for all v ∈ V
    (∇·u, w) = (f, w)                     for all w ∈ W

where V = {v ∈ H(div; Ω) : v·n = 0 on Γ_N} and W = L²(Ω).
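To make the weak form concrete, here is a minimal 1D sketch (my own illustration, not TDycore code). In 1D the H(div)-conforming velocity space reduces to continuous P1, paired with piecewise-constant pressure; the manufactured solution p = cos(πx) gives u = π sin(πx), which satisfies u·n = 0 at x = 1.

```python
# 1D mixed Darcy sketch: K = 1 on (0,1), p = g at x = 0, u.n = 0 at x = 1.
import numpy as np

n = 64
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)

A = np.zeros((n + 1, n + 1))           # (K^{-1} u, v): P1 mass matrix
B = np.zeros((n, n + 1))               # (div u, w), w piecewise constant
for k in range(n):
    A[k:k + 2, k:k + 2] += h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
    B[k, k], B[k, k + 1] = -1.0, 1.0   # integral of v' over cell k

g = 1.0                                # g = p(0) = cos(0)
G = np.zeros(n + 1)
G[0] = g                               # -<g, v.n> at x = 0, where n = -1
xm = 0.5 * (x[:-1] + x[1:])
F = h * np.pi**2 * np.cos(np.pi * xm)  # (f, w), f = -p'', midpoint rule

keep = np.arange(n)                    # drop last velocity DOF: u.n = 0 at x = 1
A, B, G = A[np.ix_(keep, keep)], B[:, keep], G[keep]

# Saddle-point system [A  -B^T; B  0][u; p] = [G; F]
M = np.block([[A, -B.T], [B, np.zeros((n, n))]])
sol = np.linalg.solve(M, np.concatenate([G, F]))
u, ph = sol[:n], sol[n:]
print("max |p_h - p(xm)| =", np.abs(ph - np.cos(np.pi * xm)).max())  # shrinks with h
```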

SLIDE 30

Wheeler & Yotov 2006

[Figure: BDM1 basis function N4(x) at a corner vertex, with normal components u40 and u41 along the normals n0 and n1.]

Ingredients:
◮ Brezzi-Douglas-Marini (BDM1) velocity space
◮ Basis interpolatory at corners: N4(x4)·n0 = u40, N4(x4)·n1 = u41
◮ Vertex-based quadrature (under-integrated)
◮ Constant pressure space

SLIDE 31

Wheeler & Yotov 2006

[Figure: BDM1 basis function N4(x) at a corner vertex, with normal components u40 and u41 along the normals n0 and n1.]

Ingredients:
◮ Brezzi-Douglas-Marini (BDM1) velocity space
◮ Basis interpolatory at corners: N4(x4)·n0 = u40, N4(x4)·n1 = u41
◮ Vertex-based quadrature (under-integrated)
◮ Constant pressure space

This means that velocity DOFs only couple to each other at vertices.

SLIDE 32

Wheeler & Yotov Assembly

1:  for vertex v in mesh do
2:      set up the vertex-local problem
            [ A   B^T ] [ U ]   [ G ]
            [ B   0   ] [ P ] = [ F ]
3:      for element e connected to v do
4:          A ← (K⁻¹ u_v, v_v)_{Ω_e}
5:          B^T ← −(p_e, ∇·v_v)_{Ω_e}
6:          G ← −⟨g, v_v·n⟩_{Γ_D,e}
7:          F ← (f_e, w_e)_{Ω_e}
8:      end for
9:      Assemble the Schur complement: (B A⁻¹ B^T) P = F − B A⁻¹ G
10: end for

SLIDE 33

Wheeler & Yotov Assembly

1:  for vertex v in mesh do
2:      set up the vertex-local problem
            [ A   B^T ] [ U ]   [ G ]
            [ B   0   ] [ P ] = [ F ]
3:      for element e connected to v do
4:          A ← (K⁻¹ u_v, v_v)_{Ω_e}
5:          B^T ← −(p_e, ∇·v_v)_{Ω_e}
6:          G ← −⟨g, v_v·n⟩_{Γ_D,e}
7:          F ← (f_e, w_e)_{Ω_e}
8:      end for
9:      Assemble the Schur complement: (B A⁻¹ B^T) P = F − B A⁻¹ G
10: end for

The result is a global cell-centered pressure system, which is SPD.
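Step 9 is a small, vertex-local dense elimination. A numpy sketch of just that step (my addition; the sizes are illustrative, and the sign convention follows the algorithm above, where the minus sign is carried inside the B^T block):

```python
# Vertex-local Schur complement elimination: given local blocks A (SPD
# velocity-velocity), B, and right-hand sides G, F, eliminate the
# velocities and solve the small cell-centered pressure system.
import numpy as np

rng = np.random.default_rng(0)
nu, npr = 12, 4                   # illustrative local DOF counts
L = rng.standard_normal((nu, nu))
A = L @ L.T + nu * np.eye(nu)     # stand-in SPD velocity block
B = rng.standard_normal((npr, nu))
G = rng.standard_normal(nu)
F = rng.standard_normal(npr)

Ainv = np.linalg.inv(A)           # cheap: A is vertex-local and small
S = B @ Ainv @ B.T                # Schur complement
P = np.linalg.solve(S, F - B @ Ainv @ G)
U = Ainv @ (G + B.T @ P)          # recover the local velocities
print(np.allclose(B @ U, F))      # True: the constraint B U = F holds
```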

SLIDE 34

Sample Wheeler-Yotov Stencils

K = diag(1, 1):

    +0.000  -1.000  +0.000        +0.079  -0.531  +0.061
    -1.000  +4.000  -1.000        -0.953  +3.130  -1.021
    +0.000  -1.000  +0.000        -0.066  -0.712  +0.013

The left stencil is the standard 5-point stencil.

SLIDE 35

Sample Wheeler-Yotov Stencils

K = diag(3, 1), and the rotated tensors R_10 K R_10^T and R_45 K R_45^T:

    θ = 0°:
    +0.000  -1.000  +0.000
    -3.000  +8.000  -3.000
    +0.000  -1.000  +0.000

    θ = 10°:
    -0.209  -0.985  +0.133
    -2.865  +7.850  -2.865
    +0.133  -0.985  -0.209

    θ = 45°:
    -0.750  -1.500  +0.250
    -1.500  +7.000  -1.500
    +0.250  -1.500  -0.750
SLIDE 36

Sample Wheeler-Yotov Stencils

K = diag(1, 1), scaled by 10^-3 where x > 2/3:

    +0.000  -1.000  +0.000        +0.000  -1.000  +0.000
    -1.000  +4.000  -1.000        -1.000  +3.002  -0.002
    +0.000  -1.000  +0.000        +0.000  -1.000  +0.000

The second stencil sits adjacent to the permeability jump.

SLIDE 37

SPE10 Test Problem

We use the permeabilities from the SPE10 problem:
◮ 60 × 220 × 85 = 1,122,000 cells
◮ Diagonal permeability, Kxx = Kyy = Kzz
◮ We induce flow by Dirichlet conditions
◮ Solve on the original permeabilities and also rotated around two axes

[Figure: sample slice of the permeability field.]

SLIDE 38

Solver Options

WY Options

  • -ksp_type cg
  • -pc_type hypre

BDM Options

  • -ksp_type gmres
  • -pc_type fieldsplit
  • -pc_fieldsplit_type schur
  • -pc_fieldsplit_schur_fact_type full
  • -pc_fieldsplit_schur_precondition selfp
  • -fieldsplit_0_ksp_type cg
  • -fieldsplit_0_pc_type jacobi
  • -fieldsplit_1_ksp_type cg
  • -fieldsplit_1_pc_type hypre
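These are standard keys in PETSc's options database. A petsc4py sketch (my addition) shows how the BDM configuration could be set programmatically; in the C code the same strings would come from the command line:

```python
# Register the BDM solver options in PETSc's options database and build a
# KSP that picks them up via setFromOptions().
from petsc4py import PETSc

bdm_options = {
    "ksp_type": "gmres",
    "pc_type": "fieldsplit",
    "pc_fieldsplit_type": "schur",
    "pc_fieldsplit_schur_fact_type": "full",
    "pc_fieldsplit_schur_precondition": "selfp",
    "fieldsplit_0_ksp_type": "cg",
    "fieldsplit_0_pc_type": "jacobi",
    "fieldsplit_1_ksp_type": "cg",
    "fieldsplit_1_pc_type": "hypre",
}
opts = PETSc.Options()
for key, val in bdm_options.items():
    opts[key] = val

ksp = PETSc.KSP().create()
ksp.setFromOptions()   # the KSP/PC now reflect the options set above
```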
SLIDE 39

Solver Performance

[Figure: SPE10, 138,600 cells. Solve time [s] and cost [CPU s / DOF] vs number of processes np = 1 to 150, for WY, WY rotated, and BDM.]

SLIDE 40

Solver Performance

[Figure: SPE10, 1,122,000 cells. Solve time [s] and cost [CPU s / DOF] vs number of processes np = 1 to 150, for WY and WY rotated.]

SLIDE 41

Assembly Performance (not optimized)

[Figure: SPE10, 138,600 cells. Assembly time [s] and cost [CPU s / DOF] vs number of processes np = 1 to 150, for WY and BDM.]

SLIDE 42

Concluding Remarks

TDycore:
◮ From limited results, the WY approach is at least competitive, although it appears to hit the strong-scaling limit before BDM/fieldsplit
◮ BDM appears to use more memory than WY (≈ 10×)
◮ WY assembly is competitive, although BDM lends itself more easily to vectorization
◮ Experimentation is key: -tdy_method {wy|bdm|...}

SLIDE 43

Concluding Remarks

TDycore:
◮ From limited results, the WY approach is at least competitive, although it appears to hit the strong-scaling limit before BDM/fieldsplit
◮ BDM appears to use more memory than WY (≈ 10×)
◮ WY assembly is competitive, although BDM lends itself more easily to vectorization
◮ Experimentation is key: -tdy_method {wy|bdm|...}

Talk/Meeting:
◮ All of the presented work uses PETSc (PetIGA + DMPlex/Section)
◮ Using DMPlex/Section opens doors for solver approaches
◮ Most of my exposure to solvers comes from using PETSc
◮ Originally exposed to PETSc ≈ 11 years ago at DOE ACTS workshops

SLIDE 44

Important Links

◮ PetIGA: https://bitbucket.org/dalcinl/petiga
◮ TDycore: https://github.com/TDycores-Project/TDycore