Extruded meshes for high aspect ratio simulations in Firedrake and - - PowerPoint PPT Presentation

extruded meshes for high aspect ratio simulations in
SMART_READER_LITE
LIVE PREVIEW

Extruded meshes for high aspect ratio simulations in Firedrake and - - PowerPoint PPT Presentation

Extruded meshes for high aspect ratio simulations in Firedrake and PyOP2 Gheoghe-Teodor (Doru) Bercea (a) , Andrew TT McRae (b) , Lawrence Mitchell (c) , David A Ham, Paul HJ Kelly (a) gheorghe-teodor.bercea08@imperial.ac.uk (b)


slide-1
SLIDE 1

Extruded meshes for high aspect ratio simulations in Firedrake and PyOP2

Gheoghe-Teodor (Doru) Bercea(a), Andrew TT McRae(b), Lawrence Mitchell(c), David A Ham, Paul HJ Kelly

(a) gheorghe-teodor.bercea08@imperial.ac.uk (b) a.mcrae12@imperial.ac.uk (c) lawrence.mitchell@imperial.ac.uk

1

slide-2
SLIDE 2
slide-3
SLIDE 3

3

slide-4
SLIDE 4

3

slide-5
SLIDE 5
  • (Very) high aspect ratio domains
  • Want vertically column-aligned meshes
  • e.g. important that vertical gradients don't leak

into horizontal (and vice-versa)

  • Possible to achieve with fully unstructured meshes
  • Often don't need unstructured benefits in short

"vertical" direction

4

slide-6
SLIDE 6

Exploit column structure

  • Extrude an unstructured base mesh
  • New cells gain a dimension (interval→quad,

triangle→prism)

  • Arrange for extruded direction to be innermost

iteration index: direct addressing (see, e.g., MacDonald

et al. Int. J. HPC App. 25(4):392 (2010))

  • Exploit this in execution

5

slide-7
SLIDE 7

Direct addressing in vertical

6

slide-8
SLIDE 8

Direct addressing in vertical

6

slide-9
SLIDE 9

Direct addressing in vertical

a c a + 1 c + 1 a + 2 c + 2 a + (n − 1) c + (n − 1) a + n c + n

6

slide-10
SLIDE 10

Direct addressing in vertical

  • Connectivity specified by

base cell indirection and fixed

  • ffset for each dof in extruded

cells

  • Walking up column is

prefetch-friendly stride-n access

  • Also decreases memory

footprint of indirections

a c a + 1 c + 1 a + 2 c + 2 a + (n − 1) c + (n − 1) a + n c + n

6

slide-11
SLIDE 11

Does it work?

  • What to measure?
  • Bandwidth, flops? Depends what regime we're in.
  • For bandwidth bound problems, we measure valuable bandwidth
  • assume there is a perfect ordering that allows us to stream data

from main memory, use it and then evict, always using full cache lines

  • Doesn't count memory footprint of indirections
  • Easy to measure: data volume / execution time
  • Lower bound on actual data movement, can compare to STREAM

7

slide-12
SLIDE 12
  • 1

4 12 24 916 5000 10000 15000 20000 25000 29688 MPI processes VBW [GB/s]

75 cell layers

  • 1

8 16 25 50 Number of cell layers

12 MPI processes

  • P0xP0

P0xP1 P0xP1dg P1xP0 P1xP1 P1xP1dg P1dgxP0 P1dgxP1

12 core Intel Xeon E5-2620 @ 2.0 GHz Intel 14.0.1 (-O3 -xAVX), Intel MPI 3.1.038 STREAM Triad 42 GB/s

(Very) simple (bandwidth bound) rhs assembly:

  • Base mesh: ~800000 triangles (Hilbert curve numbering)

8

slide-13
SLIDE 13

What about FLOP-limited cases?

  • We use (modified) versions of the FEniCS toolchain to generate

assembly kernels

  • These are further optimised by an FE-aware loop-nest compiler,

Luporini et al. arxiv:1407.0904 [cs.MS]

  • Performs loop-invariant code motion, vector-alignment and

padding, loop unrolling/permutation and manual vectorisation

  • For P2 Helmholtz operator, we sustain ~20GFlop/s on a single core
  • Guaranteed not to exceed 30.4 GFlop/s, with simultaneous

issue of 1 AVX mul and 1 AVX add per cycle (not Haswell, so no FMA)

9

slide-14
SLIDE 14

Building elements

  • Performance no good if we can't express

variational problems

  • Drive Firedrake using (mostly) DOLFIN-compatible

Python interface

  • Express variational problems on extruded meshes

using extensions to UFL

10

slide-15
SLIDE 15

m = UnitIntervalMesh(5)

  • mesh = ExtrudedMesh(m, layers=5)
  • U0 = FiniteElement("CG", interval, 1)

U1 = FiniteElement("DG", interval, 0) V0 = FiniteElement("CG", interval, 1) V1 = FiniteElement("DG", interval, 0)

  • W0_elt = OuterProductElement(U0, V0)

W1_a = HDiv(OuterProductElement(U1, V0)) W1_b = HDiv(OuterProductElement(U0, V1))

  • W1_elt = W1_a + W1_b
  • W0 = FunctionSpace(mesh, W0_elt)

W1 = FunctionSpace(mesh, W1_elt)

  • W = W0*W1

sigma, u = TrialFunctions(W) tau, v = TestFunctions(W)

  • L = assemble((sigma*tau - inner(rot(tau), u) + inner(rot(sigma), v) +

div(u)*div(v))*dx)

slide-16
SLIDE 16

m = UnitIntervalMesh(5)

  • mesh = ExtrudedMesh(m, layers=5)
  • U0 = FiniteElement("CG", interval, 1)

U1 = FiniteElement("DG", interval, 0) V0 = FiniteElement("CG", interval, 1) V1 = FiniteElement("DG", interval, 0)

  • W0_elt = OuterProductElement(U0, V0)

W1_a = HDiv(OuterProductElement(U1, V0)) W1_b = HDiv(OuterProductElement(U0, V1))

  • W1_elt = W1_a + W1_b
  • W0 = FunctionSpace(mesh, W0_elt)

W1 = FunctionSpace(mesh, W1_elt)

  • W = W0*W1

sigma, u = TrialFunctions(W) tau, v = TestFunctions(W)

  • L = assemble((sigma*tau - inner(rot(tau), u) + inner(rot(sigma), v) +

div(u)*div(v))*dx)

slide-17
SLIDE 17

Product complexes

  • We support elements that are a tensor product of base mesh basis

functions, and interval basis functions

  • Motivating application: mimetic FE for numerical weather prediction,

requires a discrete de Rham complex

  • Construct from product of base mesh and interval complexes

12

slide-18
SLIDE 18

Product complexes

  • We support elements that are a tensor product of base mesh basis

functions, and interval basis functions

  • Motivating application: mimetic FE for numerical weather prediction,

requires a discrete de Rham complex

  • Construct from product of base mesh and interval complexes

U0 U1 U2

d d

12

slide-19
SLIDE 19

Product complexes

  • We support elements that are a tensor product of base mesh basis

functions, and interval basis functions

  • Motivating application: mimetic FE for numerical weather prediction,

requires a discrete de Rham complex

  • Construct from product of base mesh and interval complexes

U0 U1 U2

d d

V0 V1

d

12

slide-20
SLIDE 20

Product complexes

  • We support elements that are a tensor product of base mesh basis

functions, and interval basis functions

  • Motivating application: mimetic FE for numerical weather prediction,

requires a discrete de Rham complex

  • Construct from product of base mesh and interval complexes

U0 U1 U2

d d

V0 V1

d

U0 ⊗ V0 U0 ⊗ V1 ⊕ U1 ⊗ V0 U1 ⊗ V1 ⊕ U2 ⊗ V0 U2 ⊗ V1

d d d

12

slide-21
SLIDE 21

Example: lowest order RT and Nedelec 1st kind

U0 = FiniteElement("CG", interval, 1) U1 = FiniteElement("DG", interval, 0) V0 = FiniteElement("CG", interval, 1) V1 = FiniteElement("DG", interval, 0)

  • W1_a = OuterProductElement(U0, V1)
  • W1_b = OuterProductElement(U1, V0)
  • W1 = W1_a + W1_b
  • RT1 = HDiv(W1)
  • N1 = HCurl(W1)

13

slide-22
SLIDE 22

Example: lowest order RT and Nedelec 1st kind

U0 = FiniteElement("CG", interval, 1) U1 = FiniteElement("DG", interval, 0) V0 = FiniteElement("CG", interval, 1) V1 = FiniteElement("DG", interval, 0)

  • W1_a = OuterProductElement(U0, V1)
  • W1_b = OuterProductElement(U1, V0)
  • W1 = W1_a + W1_b
  • RT1 = HDiv(W1)
  • N1 = HCurl(W1)

13

slide-23
SLIDE 23

Example: lowest order RT and Nedelec 1st kind

U0 = FiniteElement("CG", interval, 1) U1 = FiniteElement("DG", interval, 0) V0 = FiniteElement("CG", interval, 1) V1 = FiniteElement("DG", interval, 0)

  • W1_a = OuterProductElement(U0, V1)
  • W1_b = OuterProductElement(U1, V0)
  • W1 = W1_a + W1_b
  • RT1 = HDiv(W1)
  • N1 = HCurl(W1)

13

slide-24
SLIDE 24

Example: lowest order RT and Nedelec 1st kind

U0 = FiniteElement("CG", interval, 1) U1 = FiniteElement("DG", interval, 0) V0 = FiniteElement("CG", interval, 1) V1 = FiniteElement("DG", interval, 0)

  • W1_a = OuterProductElement(U0, V1)
  • W1_b = OuterProductElement(U1, V0)
  • W1 = W1_a + W1_b
  • RT1 = HDiv(W1)
  • N1 = HCurl(W1)

13

slide-25
SLIDE 25

Example: lowest order RT and Nedelec 1st kind

U0 = FiniteElement("CG", interval, 1) U1 = FiniteElement("DG", interval, 0) V0 = FiniteElement("CG", interval, 1) V1 = FiniteElement("DG", interval, 0)

  • W1_a = OuterProductElement(U0, V1)
  • W1_b = OuterProductElement(U1, V0)
  • W1 = W1_a + W1_b
  • RT1 = HDiv(W1)
  • N1 = HCurl(W1)

13

slide-26
SLIDE 26

Example: lowest order RT and Nedelec 1st kind

U0 = FiniteElement("CG", interval, 1) U1 = FiniteElement("DG", interval, 0) V0 = FiniteElement("CG", interval, 1) V1 = FiniteElement("DG", interval, 0)

  • W1_a = OuterProductElement(U0, V1)
  • W1_b = OuterProductElement(U1, V0)
  • W1 = W1_a + W1_b
  • RT1 = HDiv(W1)
  • N1 = HCurl(W1)

13

slide-27
SLIDE 27

N2_1 = FiniteElement("N2curl", triangle, 1) CG_2 = FiniteElement("CG", interval, 2)

  • Ned_horiz = HCurl(OuterProductElement(N2_1, CG_2))
  • P2tri = FiniteElement("CG", triangle, 2)

P1dg = FiniteElement("DG", interval, 1) Ned_vert = HCurl(OuterProductElement(P2tri, P1dg))

  • Ned_wedge = Ned_horiz + Ned_vert

Lowest order Nedelec 2nd kind on prism

slide-28
SLIDE 28

N2_1 = FiniteElement("N2curl", triangle, 1) CG_2 = FiniteElement("CG", interval, 2)

  • Ned_horiz = HCurl(OuterProductElement(N2_1, CG_2))
  • P2tri = FiniteElement("CG", triangle, 2)

P1dg = FiniteElement("DG", interval, 1) Ned_vert = HCurl(OuterProductElement(P2tri, P1dg))

  • Ned_wedge = Ned_horiz + Ned_vert

Lowest order Nedelec 2nd kind on prism

slide-29
SLIDE 29

Future work

  • MF Multigrid for mixed Helmholtz problems on

extruded meshes, with Eike Müller (Bath)

  • Exploit structure inside tensor product kernels

(currently we just splat everything out)

  • 3d numerics and performance characterisation for

atmosphere, with Colin Cotter (Imperial)

15

slide-30
SLIDE 30

Questions?

Funding: Grantham Institute for climate change NERC grants NE/K008951/1, NE/K006789/1, NE/G523512/1 EPSRC grants EP/L000407/1, EP/K008730/1, EP/I00677X/1

16

www.firedrakeproject.org firedrake@imperial.ac.uk