SLIDE 1

UNITING PERFORMANCE AND EXTENSIBILITY IN ADAPTIVE FINITE ELEMENT COMPUTATIONS

Toby Isaac tisaac@ices.utexas.edu

The University of Chicago

September 14, 2015, CAAM Colloquium, Rice University

  • T. Isaac (U. Chicago)

Adaptivity: Performance & Extensibility September 14, 2015 1 / 43

SLIDE 2

1 ADAPTIVITY THROUGH THE LENS OF p4est

2 NONCONFORMING MESHES IN PETSC

3 THE INTERACTIVE PORTION. . .

SLIDE 3

WHY ADAPTIVE MESH REFINEMENT (AMR)?

WHY ADAPTIVE ANYTHING?

Non-adaptive (branch-free) calculations are fast. Why bother?

1 Your non-adaptive calculations have reached the end of your resources (or the end of weak scalability), and you want to push back.

2 (Ideally) you have a performance model that predicts it can help.

EXAMPLE: hp-FEM THEORY

Predicts exponential convergence in $N_{\mathrm{dof}}$: if we want zero error, it's worth it. If we have a nonzero tolerance, we must consider that hp systems require more resources per dof to solve than uniform, low-order systems. There is always a crossover.
SLIDE 4

EXAMPLE APPLICATIONS

MANTLE CONVECTION

Subduction zone resolution globally ⇒ trillions of dofs. AMR reduces to $O(10^8)$ to $O(10^9)$.

[Stadler et al., 2010]: Stabilized $[Q_1]^3 \times Q_1$ elements, black-box algebraic multigrid.

[Rudi et al., 2015]: Stable $[Q_2]^3 \times Q^{\mathrm{disc}}$ elements, custom hybrid algebraic/geometric multigrid solver, demonstrated implicit solver weak-scalability to 1.5 million BG/Q cores and $O(10^{11})$ dofs.

SLIDE 5

EXAMPLE APPLICATIONS

ICE SHEET DYNAMICS

[T.I. et al., 2015c]: Stable $[Q_k]^3 \times Q^{\mathrm{disc}}_{k-2}$ finite elements, complex domain with variable resolution demands, Robin-type boundary conditions, domain anisotropy, unusual solver demands.

SLIDE 6

EXAMPLE APPLICATIONS

UNCERTAINTY QUANTIFICATION

[T.I. et al., 2015b]: Inversion (deterministic and Bayesian) for unknown boundary coefficient fields ($O(10^5)$ parameters) from surface observations in the previous ice sheet model.

[Not in the above work] The tools that drive adaptivity are important for quantifying model error in Bayesian inversion. Bayesian inversion requires two components: a prior distribution on the parameters and a likelihood function of the parameters given data ∼ the probability of the data given parameters, π(d|p). This should incorporate not only the “noise” of the data, but the uncertainty due to error in the model-to-parameter map, i.e., the a posteriori error of the finite element solution.
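One common way to write such a likelihood down is sketched here. It assumes additive, independent Gaussian terms for the data noise and for the discretization error; these assumptions, and the symbols $F_h$ (discretized parameter-to-observable map), $\Gamma_{\mathrm{noise}}$ and $\Gamma_{\mathrm{model}}$ (covariances), are illustrative and are not taken from the talk.

$$
d = F_h(p) + e_{\mathrm{model}} + \eta, \qquad
\eta \sim \mathcal{N}(0, \Gamma_{\mathrm{noise}}), \quad
e_{\mathrm{model}} \sim \mathcal{N}(0, \Gamma_{\mathrm{model}}),
$$

$$
\pi(d \mid p) \propto \exp\!\Big(-\tfrac{1}{2}\,\big(d - F_h(p)\big)^{T}
\big(\Gamma_{\mathrm{noise}} + \Gamma_{\mathrm{model}}\big)^{-1}
\big(d - F_h(p)\big)\Big).
$$

Under these assumptions, an a posteriori error estimate would be one possible source of information about $\Gamma_{\mathrm{model}}$.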

SLIDE 7

APPROACHES TO MESHING AND ADAPTIVITY

WHICH SHOULD I CHOOSE?

STRUCTURED (GRID/LATTICE)

Fast. Adaptivity: uniform, (occasionally) tensor.

UNSTRUCTURED (ADJACENCY GRAPH/CW-COMPLEX)

Flexible. Adaptivity: arbitrary.

SEMI-STRUCTURED (EXPLICIT TREE/IMPLICIT TREE)

Dynamic. Adaptivity: local, (occasionally) anisotropic.

SLIDE 8

COMPOSITION OF MESHING APPROACHES

LIBRARIES/FRAMEWORKS FOR COMPOSITE MESHING

Several examples exist:

PATCH-BASED AMR (CHOMBO, SAMRAI, ETC.)

Fast stencil-based computations with local refinement & unstructured trees.

HIERARCHICAL HYBRID GRIDS [GMEINER ET AL., 2015]

Fast stencil-based computations on non-Cartesian geometries.

SLIDE 9

p4est: FORESTS OF QUADTREES/OCTREES

Main developers: C. Burstedde, T.I., many other contributors. [Burstedde et al., 2011, T.I. et al., 2012, 2015a], p4est.org

Backend for: deal.II, PETSc (in progress).

An unstructured hexahedral mesh ("the forest"), where each hexahedron contains an arbitrarily refined octree;

space-filling curve (SFC) orders elements;

philosophy: as-simple-as-possible coarse mesh describes geometry, refinement captures all detail.
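To make the space-filling-curve ordering concrete, here is a minimal sketch for a single quadtree. It assumes a Morton (z-order) curve with x in the low bit of each pair; the function name `morton_index` is hypothetical and this is not p4est code.

```python
def morton_index(x, y, level):
    """Interleave the low `level` bits of x and y (x in the even bit positions)."""
    z = 0
    for b in range(level):
        z |= ((x >> b) & 1) << (2 * b)
        z |= ((y >> b) & 1) << (2 * b + 1)
    return z


# A uniformly refined level-2 quadtree: a 4 x 4 grid of cells.
LEVEL = 2
cells = [(x, y) for y in range(4) for x in range(4)]

# Sorting by Morton index gives the order in which the curve visits the cells.
sfc_order = sorted(cells, key=lambda c: morton_index(c[0], c[1], LEVEL))
```

Because the ordering is one-dimensional, a partition can be described by cut points along the curve, which is what makes repartitioning cheap.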

SLIDE 11

p4est’S REFINEMENT CYCLE

CREATE

REFINE

2:1 BALANCE

REPARTITION (LOAD BALANCE)

Not pictured: construct FE basis and communication patterns.
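The four stages of the cycle can be sketched on a toy one-dimensional analogue: a binary tree of intervals on [0, 1), stored as a sorted list of leaves `(level, index)`. Everything here (the function names, the 1D setting, the refinement criterion) is an assumption made for illustration; it is not the p4est API or its algorithms.

```python
def refine(leaves, should_refine):
    """Replace each flagged leaf (level, index) by its two children."""
    out = []
    for level, idx in leaves:
        if should_refine(level, idx):
            out += [(level + 1, 2 * idx), (level + 1, 2 * idx + 1)]
        else:
            out.append((level, idx))
    return out


def balance(leaves):
    """Refine until neighboring leaves differ by at most one level (2:1)."""
    while True:
        flagged = set()
        for k in range(len(leaves) - 1):
            left, right = leaves[k][0], leaves[k + 1][0]
            if left - right > 1:
                flagged.add(k + 1)  # right neighbor is too coarse
            elif right - left > 1:
                flagged.add(k)      # left neighbor is too coarse
        if not flagged:
            return leaves
        position = {leaf: k for k, leaf in enumerate(leaves)}
        leaves = refine(leaves, lambda l, i: position[(l, i)] in flagged)


def partition(leaves, nproc):
    """Cut the ordered leaf list into nproc contiguous, near-equal chunks."""
    n = len(leaves)
    offsets = [(n * p) // nproc for p in range(nproc + 1)]
    return [leaves[offsets[p]:offsets[p + 1]] for p in range(nproc)]


leaves = [(0, 0)]                            # CREATE: one root interval
leaves = refine(leaves, lambda l, i: True)   # REFINE: split the root
for _ in range(3):                           # REFINE: zoom in just right of x = 1/2
    leaves = refine(leaves, lambda l, i: i == 2 ** (l - 1))
unbalanced = list(leaves)
leaves = balance(leaves)                     # 2:1 BALANCE
parts = partition(leaves, 3)                 # REPARTITION (LOAD BALANCE)
```

In this toy run the level-1 leaf next to the level-4 leaves is split twice by the balance step, and the seven resulting leaves are divided 2/2/3 among three processes.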

SLIDE 16

p4est’S SCALABILITY

WEAK SCALING OF MESH REFINEMENT CYCLE (2:1 BALANCE HIGHLIGHTED)

[Figure: percentage of runtime spent in Partition, Balance, Ghost, and Nodes versus number of CPU cores (12 to 220320).]

[Figure: seconds per (million elements / core) versus number of CPU cores (12 to 112128), comparing Old and New.]

SLIDE 18

EXAMPLE: AN ICE SHEET MODEL BUILT ON p4est

Ice sheet thickness: ∼2 km. Ice sheet extent: $O(10^3)$ km.

SLIDE 19

THE PROBLEM WITH OCTREES IN THIN DOMAINS

The space-filling curve does not respect column order:

Columns are split between processors when partitioning.

Dofs are not ordered in columns for efficient preconditioning (e.g., ILU).

SLIDE 20

AN ANISOTROPIC SOLUTION

ANOTHER LAYER OF MESH COMPOSITION

[Figure: columns of layers divided between partition 0 and partition 1.]

A p4est forest of quadtrees manages the columns, with each column stored as a flat, linear binary tree of layers, which guarantees column integrity. An extension to p4est: hybrid routines have the prefix "p6est_", reproduce most of the standard p4est API, and are documented on the website (1).

(1) p4est.github.io/api
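The idea of column integrity can be illustrated with a small partitioning sketch: cuts are only allowed between columns, and each cut is placed at the first column boundary that reaches the ideal equal-work target. The function name and the weighting by layer count are assumptions for illustration; this is not how the p6est routines are necessarily implemented.

```python
from bisect import bisect_left
from itertools import accumulate


def partition_columns(layers_per_column, nproc):
    """Split SFC-ordered columns into nproc contiguous ranges of whole columns.

    Returns a list of (first_column, end_column) half-open ranges.
    """
    prefix = [0] + list(accumulate(layers_per_column))  # layers before each column
    total = prefix[-1]
    cuts = [0]
    for p in range(1, nproc):
        target = total * p / nproc           # ideal cut measured in layers
        k = bisect_left(prefix, target)      # first column boundary at or past it
        cuts.append(max(k, cuts[-1]))        # keep the cuts monotone
    cuts.append(len(layers_per_column))
    return [(cuts[p], cuts[p + 1]) for p in range(nproc)]


# Six columns in SFC order with differing numbers of layers.
layers = [4, 4, 8, 2, 2, 4]
ranges = partition_columns(layers, 3)
loads = [sum(layers[a:b]) for a, b in ranges]
```

No column is ever split across two ranges, at the price of a load balance that is only as good as the column sizes allow.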

SLIDE 22

ANOTHER APPLICATION

NUMA: NONHYDROSTATIC UNIFIED MODEL OF THE ATMOSPHERE

Well-suited for other climate and earth systems models. NUMA: Non-hydrostatic Unified Model of the Atmosphere (2) [Giraldo et al., 2013] is using p6est for partitioning (adaptivity in progress). Scalability to 1M processes on Mira BG/Q [in preparation].

(2) faculty.nps.edu/fxgirald/projects/NUMA/Introduction_to_NUMA.html

SLIDE 24

TESTING THE LIMITS OF p4est’S PARTITIONING

Scalability to 458K BG/Q cores of JUQUEEN from [T.I. et al., 2015a].

[Figure: forest-to-mesh runtime in seconds versus number of processes P, with an O(P) reference line. Series: 16-way (P = 16 to 458752), 32-way (P = 32 to 917504), 64-way (P = 64 to 262144). Point labels range from 5.7k to 510B.]

SLIDE 26

TESTING THE LIMITS OF p4est’S PARTITIONING

MEMORY PER HARDWARE THREAD IS NOT INCREASING

O(P) storage has to be avoided. BG/Q: 250MB per hardware thread.

If you have 8 things (doubles/64-bit ints) to keep track of per process, on every process. . . and 1.8 million processes (Mira). . . then half your memory is gone.

SFC partitions work well with multithreaded setups (e.g., MPI+OpenMP), but this approach has its limits (buffer packing/unpacking).
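The "half your memory" estimate above can be checked with a few lines of arithmetic. The sketch assumes 1 MB = 10^6 bytes and uses only the numbers quoted on the slide; the variable names are illustrative.

```python
BYTES_PER_ITEM = 8                 # one double or 64-bit integer
ITEMS_PER_PROCESS = 8              # "8 things" tracked for every process
NUM_PROCESSES = 1_800_000          # process count quoted for Mira
MEMORY_PER_THREAD = 250 * 10**6    # 250 MB per hardware thread (1 MB = 10**6 bytes assumed)

# Each process stores ITEMS_PER_PROCESS values about every other process: O(P) storage.
table_bytes = BYTES_PER_ITEM * ITEMS_PER_PROCESS * NUM_PROCESSES
fraction_used = table_bytes / MEMORY_PER_THREAD

print(f"O(P) table: {table_bytes / 10**6:.1f} MB, "
      f"{100 * fraction_used:.0f}% of per-thread memory")
```

With these numbers the table occupies 115.2 MB, roughly 46% of the 250 MB available to each hardware thread.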

SLIDE 27

SHARED MEMORY SOLUTIONS

MPI-3 WINDOW ROUTINES

Redundant arrays (e.g., partition offsets) can be stored once per NUMA region, rather than once per process (reduce redundant arrays by factor of 64 on BG/Q). Library interface does not change: backward compatible. Depends on a good implementation of MPI_Win_allocate_shared. . .

Small (not very scientific) test on laptop (gcc-4.6 -O2 optimized MPICH 3.1.4 build) of MPI_Allgather vs. allgather with shared redundant arrays, P = 4: O(10) times slower. Runtime configuration would be optimal.

SLIDE 28

DMFOREST

[Diagram: DMPlex and DMForest, with p4est | . . . beneath.]

1 Support non-conforming cell interfaces in DMPlex. [PETSc 3.6]

2 p4est-to-DMPlex conversion. [devel. p4est]

3 DMForest interface, with p4est as first implementation. [started]

Immediate solver support via backend conversion to DMPlex.

High-performance, native solver support if needed.

4 Runtime conversion of DMPlex to other types [planned]:

user specifies coarse DMPlex that captures topology/geometry, has immediate access to all DMForest implementations.

SLIDE 32

DMPLEX

THE UNSTRUCTURED INTERFACE

Why would we want to convert other formats to DMPlex? Regain the flexibility lost by specialized formats:

arbitrary repartitioning (e.g., mesh is Eulerian, but we want partition to follow Lagrangian particles), arbitrary submesh extraction (e.g., restrict to an embedded interface).

Numerical methods (such as additive Schwarz) are likely to be developed on DMPlex:

good for testing.

SLIDE 33

THE CW-COMPLEX

A DASH OF ALGEBRAIC TOPOLOGY

[Figure: example complex with points A, B; a, b, c, d, e; α, β, γ, δ.]

Each n-D polytope relates only to the (n − 1)-D polytopes on its boundary and the (n + 1)-D polytopes whose boundaries contain it (chains and cochains).

SLIDE 34

THE CW-COMPLEX

MY FIRST ENCOUNTER AT RICE

A Socratic dialogue retracing the evolution of mathematicians' disposition to the Euler-Poincaré characteristic, $\chi = V - E + F$ [= 2 for polyhedra]. Shameful distillation: the CW-Complex and related algebraic concepts reduced the truth of $\chi = 2$ to simple consequences of properties of chains and cycles.

SLIDE 36

THE CW-COMPLEX

EXAMPLE OPERATIONS

[Figure: example complex with points A, B; a, b, c, d, e; α, β, γ, δ.]

cone(A) [finite volume method: flux calculations]

clos(A) [finite element method: restrict function to cell]

supp(δ) [finite volume method: lift calculations]

star(δ) [finite element method: matrix sparsity pattern]
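The four operations can be sketched on a tiny complex stored as a table of cones. The incidences below are an assumption, since the figure is not reproduced here: two triangles A and B sharing edge b, with edges a to e and vertices alpha to delta. PETSc's DMPlex offers operations of this kind (for example `DMPlexGetCone`, `DMPlexGetSupport`, and `DMPlexGetTransitiveClosure`); the sketch does not call PETSc.

```python
# cone(p): the points on the boundary of p, one dimension down (hypothetical mesh).
CONE = {
    "A": ["a", "b", "c"],
    "B": ["b", "d", "e"],
    "a": ["alpha", "beta"],
    "b": ["beta", "gamma"],
    "c": ["gamma", "alpha"],
    "d": ["beta", "delta"],
    "e": ["delta", "gamma"],
}


def cone(p):
    return CONE.get(p, [])  # vertices have an empty cone


def support(p):
    """The points one dimension up whose cone contains p."""
    return [q for q, faces in CONE.items() if p in faces]


def transitive(p, step):
    """Follow `step` repeatedly from p, collecting every point reached."""
    seen, stack = {p}, [p]
    while stack:
        for q in step(stack.pop()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def closure(p):
    return transitive(p, cone)      # p and everything on its boundary


def star(p):
    return transitive(p, support)   # p and everything that contains it
```

For this assumed mesh, `closure("A")` collects the cell, its three edges, and its three vertices, while `star("delta")` collects the vertex, edges d and e, and cell B.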

SLIDE 40

CW-COMPLEXES AND NONCONFORMAL MESHES

HOW TO RECONCILE?

[Figure: nonconformal mesh with points A, B, C; a, b, c, d, e, f, g, h; α, β, γ, δ, ε.]

SLIDE 41

CW-COMPLEXES AND NONCONFORMAL MESHES

A TRUE COMPLEX, BUT. . .

[Figure: the mesh with points A, B, C; a to h; α to ε.]

A becomes a degenerate quadrilateral: requires special handling at a low level in finite element routines. Not good.

SLIDE 42

CW-COMPLEXES AND NONCONFORMAL MESHES

A HIERARCHICAL SOLUTION

[Figure: the mesh with points A, B, C; a to h; α to ε.]

A hierarchy (orthogonal to the complex) indicating which points are contained in others:

Allows finite element dofs to be calculated, broadcast and gathered.

Symmetry-breaking support maps:

Allows finite volume adjacency to be determined.

SLIDE 43

HIDING COMPLEXITY

For conformal meshes, give PETSc:

a mesh $\mathcal{T}$ (DMPlex), a finite element $(\hat{K}, P(\hat{K}), \hat{\Sigma})$ (PetscFE), and a weak form on a space $V$ (experimental),

and it will calculate the approximation space $V_h$, residuals, and Jacobians of equations. For hierarchical nonconformal meshes, can we give the same information, plus:

a refinement rule,

and have it work just as well?

SLIDE 44

THE CIARLET FINITE ELEMENT

OUR STARTING POINT

A reference element $\hat{K}$ (the reference square). [Figure: $\hat{K}$ with axes x, y.]

A space $P(\hat{K})$ (bilinear functions): $P(\hat{K}) = \operatorname{span}\{f : f|_{\hat{K}}(x, y) = (ax + b)(cy + d),\ a, b, c, d \in \mathbb{R}\}$.

A unisolvent set of functionals $\hat{\Sigma} \subset P(\hat{K})^*$ (pick any you like).

SLIDE 45

FINITE ELEMENT APPROXIMATION SPACES

OUR (DMPLEX’S) TASK

Given a mesh $\mathcal{T}$ (mappings from the reference element onto the elements), $\mathcal{T} = \bigcup_{i=1}^{M} \{\varphi_i : \hat{K} \to K_i\}$, determine the size of the finite element space, $N = |V_h|$, $V_h = \{v \in C^0(\Omega) : \forall i,\ \varphi_i^* v\ (= v \circ \varphi_i) \in P(\hat{K})\}$. Ready?

SLIDE 47

TEST #1

IT’S NOT A TRICK

[Figure: a mesh with one cell, K1.]

. . . 4.

SLIDE 49

TEST #2

IT’S STILL NOT A TRICK

[Figure: a mesh with two cells, K1 and K2.]

. . . 6.

SLIDE 52

TEST #3

(a cubed sphere with 600 quadrilaterals)

Hint: E = 2F.

. . . 602.

SLIDE 53

ENUMERATING FINITE ELEMENT SPACES

CONFORMAL MESHES

This demonstrates the basic rule of calculating $|V_h|$ for conformal meshes:

Every dof in $V_h$ is associated with a point p in the mesh: the dof (hat function in this case) is supported in star(p).

Add up the number of points of each dimension × the number of dofs associated with those points by the functionals in $\hat{\Sigma}$.
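The counting rule can be written out in a few lines and checked against the first three tests. The sketch assumes bilinear elements with one dof per vertex and none on edges or cells, and assumes that the two cells of Test #2 share one edge; the function names are hypothetical.

```python
def count_dofs(points_per_dim, dofs_per_dim):
    """Sum over dimensions of (number of points) x (dofs attached to each point)."""
    return sum(n * d for n, d in zip(points_per_dim, dofs_per_dim))


def closed_quad_surface_counts(num_faces, euler_characteristic=2):
    """(V, E, F) for a closed surface meshed by quadrilaterals.

    Each quad has 4 edges and each edge is shared by 2 quads, so E = 2F.
    Then V - E + F = chi gives V = chi + E - F.
    """
    edges = 2 * num_faces
    vertices = euler_characteristic + edges - num_faces
    return vertices, edges, num_faces


Q1_DOFS = (1, 0, 0)  # dofs per vertex, per edge, per cell

test1 = count_dofs((4, 4, 1), Q1_DOFS)                         # one quadrilateral
test2 = count_dofs((6, 7, 2), Q1_DOFS)                         # two quads sharing an edge
test3 = count_dofs(closed_quad_surface_counts(600), Q1_DOFS)   # cubed sphere
```

The same function covers other elements by changing the per-dimension dof counts, which is the point of associating dofs with mesh points.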

SLIDE 56

TEST #4

IS IT REALLY CONFORMAL?

(nonaffine mappings)

. . . 5.

This is cheating a bit: DMPlex doesn't allow this. We require $(\varphi_i^* \varphi_j^{-*})$ to map traces of $P(\hat{K})$ into traces of $P(\hat{K})$.

SLIDE 57

CONSEQUENCES FOR NONCONFORMAL MESHES

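How is this condition satisfied for nonconformal meshes?

[Figure: cells $K_i$ and $K_j$, the reference element $\hat{K}$, the maps $\varphi_i$ and $\varphi_j$, and the composition $\varphi_j^{-1} \circ \varphi_i$.]

$\varphi_j^{-1} \circ \varphi_i$ is affine (simplicial elements) or componentwise affine (tensor elements).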

SLIDE 59

TEST #5

[Figure: a mesh with three cells, K1, K2, and K3.]

. . . 7.

SLIDE 60

ENUMERATING FINITE ELEMENT SPACES

HIERARCHICALLY NONCONFORMAL MESHES

Amending the conformal rules for nonconformal elements:

Children (subset points) do not have global dofs associated with them.

To maintain continuity, local dofs of children are linearly constrained to the dofs of their parents' closure.

These linear constraints are calculated by pushing forward (adjoint to pulling back) the children's functionals (in $\hat{\Sigma}$) into the parents' cells to be evaluated by their basis functions.

These pushforward calculations can be computed once on the reference element and reused.
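A minimal sketch of the constraint calculation on a single edge, assuming nodal (point-evaluation) functionals, Lagrange bases, and refinement by bisection: pushing a child's point evaluation forward into the parent means evaluating the parent's basis functions at the mapped point. The function names and the 1D setting are assumptions for illustration, not DMPlex code.

```python
def lagrange_basis(nodes, t):
    """Values at t of the Lagrange basis functions defined by `nodes`."""
    values = []
    for i, xi in enumerate(nodes):
        v = 1.0
        for j, xj in enumerate(nodes):
            if j != i:
                v *= (t - xj) / (xi - xj)
        values.append(v)
    return values


def child_constraints(parent_nodes, child_to_parent, child_nodes):
    """One row per child dof: its weights on the parent's dofs."""
    return [lagrange_basis(parent_nodes, child_to_parent(s)) for s in child_nodes]


def left_half(s):
    """Map the child's reference coordinate s in [0, 1] into the parent edge."""
    return 0.5 * s


# Linear edge: the hanging node at the parent's midpoint averages the two ends.
q1 = child_constraints([0.0, 1.0], left_half, [0.0, 1.0])

# Quadratic edge: the child's middle node sits at t = 1/4 of the parent edge.
q2 = child_constraints([0.0, 0.5, 1.0], left_half, [0.0, 0.5, 1.0])
```

The rows depend only on the reference element and the refinement rule, which is why they can be computed once and reused for every parent and child pair.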

SLIDE 63

TEST #6

LAST ONE! IT’S A TRICK.

[Figure: a mesh with five cells, K1 to K5.]

. . . 7.

A nodal basis is not possible. (Also not allowed by DMPlex.)

SLIDE 64

ENUMERATING FINITE ELEMENT SPACES

THE GENERAL ALGORITHM

Accumulate all continuity conditions by pushing forward functionals onto neighboring cells, as for hierarchical nonconformal meshes.

Determine the rank of these continuity conditions (e.g., RRQR).
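The two steps can be sketched as "dimension = number of local dofs minus the rank of the continuity conditions". For a self-contained example the rank is computed here by exact Gaussian elimination over fractions, which stands in for the rank-revealing QR named above; the mesh (two bilinear quadrilaterals sharing an edge) and the dof numbering are assumptions for illustration.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a small matrix by exact Gaussian elimination (not RRQR)."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            f = m[r][c] / m[rk][c]
            if f != 0:
                m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk


def space_dimension(num_local_dofs, conditions):
    """Local dofs minus the number of independent continuity conditions."""
    return num_local_dofs - rank(conditions)


# Local dofs 0-3 live on K1 and 4-7 on K2; the shared edge pairs 1 with 4 and 3 with 6.
conditions = [
    [0, 1, 0, 0, -1, 0, 0, 0],    # u1 - u4 = 0
    [0, 0, 0, 1, 0, 0, -1, 0],    # u3 - u6 = 0
    [0, 1, 0, 1, -1, 0, -1, 0],   # a redundant condition (sum of the two above)
]
dimension = space_dimension(8, conditions)
```

The redundant third row does not change the rank, so the count is 8 − 2 = 6, in agreement with the vertex count for two quadrilaterals sharing an edge.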

SLIDE 65

Thank you for your time!

SLIDE 66

REFERENCES I

Carsten Burstedde, Lucas C. Wilcox, and Omar Ghattas. p4est: Scalable algorithms for parallel adaptive mesh refinement on forests of octrees. SIAM Journal on Scientific Computing, 33(3):1103–1133, 2011. doi: 10.1137/100791634.

Francis X. Giraldo, J. F. Kelly, and E. M. Constantinescu. Implicit-explicit formulations for a 3D nonhydrostatic unified model of the atmosphere (NUMA). SIAM Journal on Scientific Computing, 2013.

Björn Gmeiner, Ulrich Rüde, Holger Stengel, Christian Waluga, and Barbara Wohlmuth. Performance and scalability of Hierarchical Hybrid Multigrid solvers for Stokes systems. SIAM Journal on Scientific Computing, 37(2):C143–C168, 2015. doi: 10.1137/130941353.

SLIDE 67

REFERENCES II

Johann Rudi, A. Cristiano I. Malossi, T.I., Georg Stadler, Michael Gurnis, Yves Ineichen, Costas Bekas, Alessandro Curioni, and Omar Ghattas. An extreme-scale implicit solver for complex PDEs: Highly heterogeneous flow in Earth's mantle. SC15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (submitted), 2015.

Georg Stadler, Michael Gurnis, Carsten Burstedde, Lucas C. Wilcox, Laura Alisic, and Omar Ghattas. The dynamics of plate tectonics and mantle flow: From local to global scales. Science, 329(5995):1033–1038, 2010. doi: 10.1126/science.1191223.

T.I., Carsten Burstedde, and Omar Ghattas. Low-cost parallel algorithms for 2:1 octree balance. In Proceedings of the 26th IEEE International Parallel & Distributed Processing Symposium. IEEE, 2012. doi: 10.1109/IPDPS.2012.57.

SLIDE 68

REFERENCES III

T.I., Carsten Burstedde, Lucas C. Wilcox, and Omar Ghattas. Recursive algorithms for distributed forests of octrees. SIAM Journal on Scientific Computing (accepted), 2015a. http://arxiv.org/abs/1406.0089.

T.I., Noemi Petra, Georg Stadler, and Omar Ghattas. Scalable and efficient algorithms for the propagation of uncertainty from data through inference to prediction for large-scale problems, with application to flow of the Antarctic ice sheet. Journal of Computational Physics, 2015b. doi: 10.1016/j.jcp.2015.04.047.

T.I., Georg Stadler, and Omar Ghattas. Solution of nonlinear Stokes equations discretized by high-order finite elements on nonconforming and anisotropic meshes, with application to ice sheet dynamics. SIAM Journal on Scientific Computing (accepted), 2015c. http://arxiv.org/abs/1406.6573.
