Development of a Terrestrial Dynamical Core for E3SM (TDycore) - PowerPoint PPT Presentation

Development of a Terrestrial Dynamical Core for E3SM (TDycore) Nathan Collier Oak Ridge National Laboratory https://github.com/TDycores-Project/TDycore June 2019

Tale of Two Talks Common Theme: Considerations in choosing a discretization method I. The Effect of a Higher Continuous Basis on Solver Performance Victor Calo (Curtin), David Pardo (Ikerbasque), Lisandro Dalcin (KAUST), Maciej Paszynski (AGH) II. Selection of a Numerical Method for a Terrestrial Dynamical Core Jed Brown (Colorado), Gautam Bisht (PNNL), Matthew Knepley (Buffalo), Jennifer Fredrick (SNL), Glenn Hammond (SNL), Satish Karra (LANL)

C p-1 continuous basis Increasing p Higher Continuous Basis? Standard C 0 basis

Poisson problem on unit cube 10 0 p1 C0 10 -1 10 -2 p2 C0 10 -3 | True Error |_H1 p3 C0 10 -4 10 -5 p4 C0 10 -6 10 -7 p5 C0 10 -8 p6 C0 10 -9 10 3 10 4 10 5 10 6 Number of Degrees of Freedom

Poisson problem on unit cube 10 0 p1 C0 10 -1 p2 C1 10 -2 p2 C0 p3 C2 10 -3 | True Error |_H1 p3 C0 10 -4 p4 C3 10 -5 p5 C4 p4 C0 10 -6 p6 C5 10 -7 p5 C0 10 -8 p6 C0 10 -9 10 3 10 4 10 5 10 6 Number of Degrees of Freedom

Are higher continuous spaces an efficient way to p − refine? What effect does continuity have on the solver performance?

Are higher continuous spaces an efficient way to p − refine? What effect does continuity have on the solver performance? Spoiler Alert! C 0 C p − 1 C p − 1 / C 0 O ( N 2 + Np 6 ) O ( N 2 p 3 ) O ( p 3 ) Multifrontal direct solver O ( Np 4 ) O ( Np 6 ) O ( p 2 ) Iterative solvers ∗ ∗ Estimates for Matrix-Vector products

Multi-frontal direct solver Based on the concepts of the Schur complement and nested dissection. ¡

Key concept: size s of the separator s = 1 for C 0 s = p for C p − 1

Estimates and Results ( d = 3 , N = 30 k ) C 0 C p − 1 O ( N 2 + Np 6 ) O ( N 2 p 3 ) Time O ( N 4 / 3 + Np 3 ) O ( N 4 / 3 p 2 ) Memory 24 550 4000 22 Time 1400 Time 500 20 Memory (Mb) 3500 Memory (Mb) 1200 Memory Memory 450 18 3000 time (s) time (s) 1000 400 16 2500 800 14 2000 350 600 12 1500 300 400 10 1000 250 200 8 500 6 200 0 0 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 polynomial order polynomial order

Solution time for C 0 vs C p − 1 ( d = 3 , N = 30 k ) 1500 C^0 C^(p-1) time (s) 1000 500 0 1 2 3 4 5 6 7 8 polynomial order

Iterative solvers Much more complex to assess costs: P ( Ax − b ) = 0 Need a model for: ◮ Matrix-vector multiplication ◮ Preconditioner ( P ) setup and application ◮ Convergence

Sample Linear Systems C 0 space C p − 1 space

Matrix-vector multiplication - C 0 The cost of a sparse matrix-vector multiply is proportional to the number of nonzero entries in the matrix. Vertex DOF: Interior DOF: 2 p + 1 interactions p + 1 interactions element considered

Matrix-vector multiplication - C 0 Number DOFs Number Dimension Entity of Entities per Entity of interactions 1D vertex 1 1 (2 p + 1) 1D interior 1 ( p − 1) ( p + 1) (2 p + 1) 2 2D vertex 1 1 2D edge 2 ( p − 1) (2 p + 1)( p + 1) ( p − 1) 2 ( p + 1) 2 2D interior 1 (2 p + 1) 3 3D vertex 1 1 (2 p + 1) 2 ( p + 1) 3D edge 3 ( p − 1) ( p − 1) 2 (2 p + 1)( p + 1) 2 3D face 3 ( p − 1) 3 ( p + 1) 3 3D interior 1

Matrix-vector multiplication - C 0 nnz C 0 ( p − 1) 3 · ( p + 1) 3 = � �� interior DOF 3( p − 1) 2 · (2 p + 1)( p + 1) 2 + � �� face DOF · (2 p + 1) 2 ( p + 1) + 3( p − 1) � �� edge DOF · (2 p + 1) 3 + 1 �� vertex DOF p 6 + 6 p 5 + 12 p 4 + 8 p 3 = p 3 ( p + 2) 3 = O ( p 6 ) =

Matrix-vector multiplication - C p − 1 The B-spline C p − 1 basis is very regular, each DOF interacts with 2 p + 1 others in 1D. nnz C p − 1 = p 3 (2 p + 1) 3 = 8 p 6 + 12 p 5 + 6 p 4 + p 3 = O (8 p 6 )

Matrix-vector multiplication Matrix-Vector product 5 theoretical 4 . 02 measured 4 time ratio C p − 1 /C 0 3 . 39 3 2 . 77 1 . 94 2 1 . 00 1 0 1 2 3 4 5 polynomial order p

Matrix-vector multiplication However, for C 0 spaces, we can use static condensation as in the multifrontal direct solver. Number DOFs Number Statically Entity of Entities per Entity of interactions condensed (2 p + 1) 3 − 8( p − 1) 3 vertex 1 1 (2 p + 1) 2 ( p + 1) − 4( p − 1) 3 edge 3 ( p − 1) ( p − 1) 2 (2 p + 1)( p + 1) 2 − 2( p − 1) 3 face 3 33 p 4 − 12 p 3 + 9 p 2 − 6 p + 3 = O (33 p 4 )

Matrix-vector multiplication 70 60 C p − 1 to C 0 with static condensation ratio of number of nonzeros 50 40 30 20 C p − 1 to C 0 10 0 2 4 6 8 10 12 14 polynomial order, p

3D Poisson + CG + ILU p =6 12 Solve time ratio, C^p-1 vs C0 10 8 p =5 6 p =4 p =3 4 p =2 2 0 2 3 4 5 6 10 10 10 10 10 Number of Degrees of Freedom

3D Poisson + CG + ILU + static condensation 120 p =6 100 Solve time ratio, C^p-1 vs C0 80 60 40 p =5 20 p =4 p =3 p =2 0 3 4 5 6 10 10 10 10 Number of Degrees of Freedom

Related Publications ◮ N Collier, D Pardo, L Dalcin, M Paszynski, VM Calo, The cost of continuity: A study of the performance of isogeometric finite elements using direct solvers, Computer Methods in Applied Mechanics and Engineering 213, 353-361, 2012. 10.1016/j.cma.2011.11.002 ◮ N Collier, L Dalcin, D Pardo, VM Calo, The cost of continuity: performance of iterative solvers on isogeometric finite elements, SIAM Journal on Scientific Computing 35 (2), A767-A784, 2013. 10.1137/120881038 ◮ N Collier, L Dalcin, VM Calo, On the computational efficiency of isogeometric methods for smooth elliptic problems using direct solvers, International Journal for Numerical Methods in Engineering 100 (8), 620-632. 10.1002/nme.4769

Tale of Two Talks Common Theme: Considerations in choosing a discretization method I. The Effect of a Higher Continuous Basis on Solver Performance Victor Calo (Curtin), David Pardo (Ikerbasque), Lisandro Dalcin (KAUST), Maciej Paszynski (AGH) II. Selection of a Numerical Method for a Terrestrial Dynamical Core Jed Brown (Colorado), Gautam Bisht (PNNL), Matthew Knepley (Buffalo), Jennifer Fredrick (SNL), Glenn Hammond (SNL), Satish Karra (LANL)

Energy Exascale Earth System Model (E3SM) ◮ The terrestrial water cycle is a key component of the Earth system model ◮ While conceptually key processes transport water laterally, the representation is 1D in current models ◮ Requirements: accurate velocities on distorted grids with uncertain and rough coefficients at global scale ◮ Naturally think of mixed finite elements

Simplified Problem Statement Strong form Find u and p such that, u = − K ∇ p in Ω ∇ · u = f in Ω p = g on Γ D u · n = 0 on Γ N

Simplified Problem Statement Strong form Find u and p such that, u = − K ∇ p in Ω ∇ · u = f in Ω p = g on Γ D u · n = 0 on Γ N Candidate approaches: ◮ Mixed finite elements (BDM) + FieldSplit/BDDC/hybridization ◮ Wheeler-Yotov (WY) + AMG ◮ Arnold-Boffi-Falk (ABF) + FieldSplit/BDDC/hybridization ◮ Multipoint flux approximation (MFPA) + AMG

Simplified Problem Statement Strong form Find u and p such that, u = − K ∇ p in Ω ∇ · u = f in Ω p = g on Γ D u · n = 0 on Γ N Weak form Find u ∈ V and p ∈ W such that, � � K − 1 u , v = ( p , ∇ · v ) − � g , v · n � Γ D , v ∈ V ( ∇ · u , w ) = ( f , w ) , w ∈ W where V = { v ∈ H div (Ω) : v · n = 0 on Γ N } , W = L 2 (Ω)

Problem statement Strong form Find u and p such that, u = − K ∇ p in Ω ∇ · u = f in Ω p = g on Γ D u · n = 0 on Γ N Weak form Find u ∈ V and p ∈ W such that, � � K − 1 u , v = ( p , ∇ · v ) − � g , v · n � Γ D , v ∈ V ( ∇ · u , w ) = ( f , w ) , w ∈ W where V = { v ∈ H div (Ω) : v · n = 0 on Γ N } , W = L 2 (Ω)

Wheeler & Yotov 2006 Ingredients: u 41 n 1 ◮ Brezzi–Douglas–Marini (BDM 1 ) N 4 ( x ) velocity space u 40 ◮ Basis interpolatory at corners N 4 ( x 4 ) · n 0 = u 40 N 4 ( x 4 ) · n 1 = u 41 n 0 ◮ Vertex-based quadrature (under-integrated) ◮ Constant pressure space

Wheeler & Yotov 2006 Ingredients: u 41 n 1 ◮ Brezzi–Douglas–Marini (BDM 1 ) N 4 ( x ) velocity space u 40 ◮ Basis interpolatory at corners N 4 ( x 4 ) · n 0 = u 40 N 4 ( x 4 ) · n 1 = u 41 n 0 ◮ Vertex-based quadrature (under-integrated) ◮ Constant pressure space This means that velocity DOFs only couple to each other at vertices.

Wheeler & Yotov Assembly 1: for vertex v in mesh do setup vertex local problem 2: � A � � U � � G � B T = B 0 P F for element e connected to v do 3: � � K − 1 u v , v v A ← 4: Ω e B T ← − ( p e , ∇ · v v ) Ω e 5: G ← − � g , v v · n � Γ D , e 6: F ← ( f e , w e ) Ω e 7: end for 8: Assemble Schur complement 9: ( BA − 1 B T ) P = F − BA − 1 G 10: end for

Wheeler & Yotov Assembly 1: for vertex v in mesh do setup vertex local problem 2: � A � � U � � G � B T = B 0 P F for element e connected to v do 3: � � K − 1 u v , v v A ← 4: Ω e B T ← − ( p e , ∇ · v v ) Ω e 5: G ← − � g , v v · n � Γ D , e 6: F ← ( f e , w e ) Ω e 7: end for 8: Assemble Schur complement 9: ( BA − 1 B T ) P = F − BA − 1 G 10: end for Global cell-centered pressure system which is SPD

Development of a Terrestrial Dynamical Core for E3SM (TDycore) - PowerPoint PPT Presentation

Development of a Terrestrial Dynamical Core for E3SM (TDycore) Nathan Collier Oak Ridge National Laboratory https://github.com/TDycores-Project/TDycore June 2019 Tale of Two Talks Common Theme: Considerations in choosing a discretization

Welcome Welcome Core: Core A Regional Destination Core: Core UL Core: Core Downtown

What is DTT? DTT stands for digital terrestrial television (or digital terrestrial transmission).

Jan 30 Terrestrial Forest Biomes of the World World Wildlife Fund Terrestrial Biomes and

Caching, Parallelism, Fault Tolerance Marco Serafini COMPSCI 532 Lectures 2-3 Memory Hierarchy

Homotopy theories of dynamical systems Rick Jardine University of Western Ontario July 15, 2013

Continuous orbit equivalence rigidity Xin Li Dynamical systems and operator algebras Dynamical

Optimising Global Connectivity by Submarine and Terrestrial Routes Terrestrial Routes Matching

Scandinavian/North European Scandinavian/North European Network of Terrestrial Terrestrial Field

The Role of the Terrestrial Biosphere in the Global Carbon Cycle Peter Rayner June 18, 2015

Dynamical analysis of euclidean algorithms Introduction Dynamical analysis of euclidean

ANALYSIS of EUCLIDEAN ALGORITHMS An Arithmetical Instance of Dynamical Analysis Dynamical

ANALYSIS of EUCLIDEAN ALGORITHMS An Arithmetical Instance of Dynamical Analysis Dynamical

Lecture 5: Basic Dynamical Systems CS 344R/393R: Robotics Benjamin Kuipers Dynamical Systems

The GFDL Finite-Volume Cubed-sphere Dynamical Core Lucas Harris, Xi Chen, Shian-Jiann Lin and the

BDCP Steering Committee July 15, 2010 PRELIMINARY DRAFT NOT FOR DISTRIBUTION Terrestrial

Motivation Memory is a shared resource Core Core Memory Core Core Threads requests

Secure Resource Sharing for Embedded Protected Module Architectures Jo Van Bulck, Job Noorman,

Model Predictive Control on an FPGA: Aerospace and Space Scenarios Edward Hartley (

Neural Networks: Optimization Part 1 Intro to Deep Learning, Fall 2017 Story so far Neural

Final Exam Review December 14th 10:30am JFB 101 (this room) Pick up midterms up front TA-led

The impact of Meltre and Specdown on microkernel systems Matthias Lange, Kernkonzept GmbH, FOSDEM

Picturing Quantum Processes Aleks Kissinger QTFT, V axj o 2015 June 10, 2015 Introduction

the state of hammer: ready for ics Sven M. Hallberg Adam Crain Meredith L. Patterson Sergey

Nuclear structure (and reactions) with Quantum Computers - II Alessandro Roggero figure credit: