SLIDE 1

Low-Cost Parallel Algorithms for 2:1 Octree Balance

Toby Isaac¹, Carsten Burstedde², Omar Ghattas¹,³,⁴

¹ Institute for Computational Engineering and Sciences, UT-Austin
² Institut für Numerische Simulation, Universität Bonn
³ Jackson School of Geosciences, UT-Austin
⁴ Dept. of Mechanical Engineering, UT-Austin

IEEE International Parallel & Distributed Processing Symposium, Shanghai, China Tuesday, 22 May, 2012

CCGO: CENTER FOR COMPUTATIONAL GEOSCIENCES & OPTIMIZATION

Isaac, Burstedde, Ghattas (UT Austin, Uni Bonn) · Algorithms for 2:1 Octree Balance · IEEE IPDPS12, Shanghai · 1 / 46

SLIDE 2

Outline

1. Parallel AMR
2. Octree-based parallel AMR
3. 2:1 Balance
4. Parallel 2:1 Balance
5. Results

SLIDE 3

Parallel AMR

SLIDE 4

Parallel adaptive mesh refinement (AMR)

Parallel AMR libraries are frameworks for managing meshes which require locally varying degrees of refinement during their use.

SLIDE 5

General requirements for Parallel AMR

Typical AMR framework methods:

  • refine and coarsen to achieve desired resolution
  • enforce mesh quality (more on this below)
  • partition a mesh between processes:
      – balances the workload
      – minimizes communication
      – minimizes the difference from the current partition
  • provide local neighborhood information, e.g. local adjacency and out-of-process adjacency

SLIDE 6

Example: unstructured

  • mesh is adjacency graph
  • adjacency to out-of-processor triangles is called the ghost layer
  • refinement (coarsening) replaces triangles with more (fewer) that occupy the same region
  • mesh quality ∼ triangle shape, enforced during refinement / coarsening
  • partitioned using a parallel graph partitioning algorithm (cf. ParMETIS, Zoltan, etc.)

[Figure: Jonathan Richard Shewchuk’s Triangle (http://www.cs.cmu.edu/~quake/triangle.html)]

SLIDE 7

Example: block structured

  • mesh is a hierarchy of structured grids
  • adjacency implicit from index arithmetic
  • refinement (coarsening) by adding (subtracting) grids
  • mesh quality: size difference between neighboring grids cannot be too large (similar to 2:1 condition)
  • partitioning by assigning whole grids to processes, in addition to splitting / merging grids

[Figure: Lawrence Berkeley Lab’s Chombo (http://commons.lbl.gov/display/chombo)]

SLIDE 8

Octree-based parallel AMR

SLIDE 9

Octant Datatype

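Basic unit: octant, a coordinate and a compatible size:

$(x, \ell)$, with coordinate $x \in \mathbb{R}^d$, size $2^\ell$, $\ell \in \mathbb{Z}$.

To be valid, $x = 2^\ell r$ for some $r \in \mathbb{Z}^d$, e.g.:

$$\ell = 3, \qquad x = \begin{pmatrix} 101000_2 \\ 011000_2 \\ 010000_2 \end{pmatrix}$$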
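The validity condition can be checked mechanically. The following is a minimal sketch, assuming integer coordinates and a non-negative size exponent; the class name `Octant` and its fields are hypothetical and not taken from any library.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Octant:
    x: Tuple[int, ...]  # coordinate of the lower corner, one integer per dimension
    l: int              # size exponent: the octant has edge length 2**l

    def is_valid(self) -> bool:
        # Valid iff every coordinate is a multiple of 2**l, i.e. x = 2**l * r.
        size = 1 << self.l
        return all(c % size == 0 for c in self.x)


# The example from the slide: l = 3 and binary coordinates 101000, 011000, 010000.
example = Octant(x=(0b101000, 0b011000, 0b010000), l=3)
print(example.is_valid())                          # True: 40, 24 and 16 are multiples of 8
print(Octant(x=(0b101100, 0, 0), l=3).is_valid())  # False: 44 is not a multiple of 8
```

In binary, validity simply means that the lowest ℓ bits of every coordinate are zero.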

SLIDE 10

Space filling curve

A space filling curve organizes octants into an array in a predictable fashion.

When partitioned w.r.t. a space filling curve, octree partitions have

  • probably good shape
  • a bound on the number of neighboring partitions
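The slide does not name a particular curve. As an illustration only, the sketch below assumes a Morton (z-order) curve, whose index is obtained by interleaving the bits of the coordinates; the function name `morton_index` is hypothetical.

```python
def morton_index(x, bits):
    """Interleave the lowest `bits` bits of the integer coordinates in x.

    The first coordinate contributes the least significant bit of each group.
    """
    d = len(x)
    index = 0
    for b in range(bits):
        for axis, c in enumerate(x):
            index |= ((c >> b) & 1) << (b * d + axis)
    return index


# Four unit quadrants of a 2x2 domain, deliberately out of order.
quads = [(1, 1), (0, 0), (0, 1), (1, 0)]
quads.sort(key=lambda q: morton_index(q, bits=1))
print(quads)  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Sorting by this key places octants that are close on the curve next to each other in the array, which is what makes contiguous index ranges usable as partitions.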

SLIDE 11

Mesh representation: linear octrees

The mesh is a linear octree: a sorted (w.r.t. space filling curve) array of octants, without gaps or overlaps.

Implicitly a tree: no ancestors of the “leaves” are stored.

[Figure, built up in steps: octants with overlaps and gaps; with gaps only; a linear octree]
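The three conditions (sorted, no gaps, no overlaps) can be tested in a single pass over the array. This is a hedged 2D sketch that assumes a Morton curve, valid (aligned) quadrants stored as `(x, y, l)` with edge length `2**l`, and a domain `[0, 2**L)**2`; the helper names are hypothetical.

```python
def morton(x, y, bits):
    """Morton index of the point (x, y), with x in the lower bit of each pair."""
    index = 0
    for b in range(bits):
        index |= ((x >> b) & 1) << (2 * b) | ((y >> b) & 1) << (2 * b + 1)
    return index


def is_linear_quadtree(quads, L):
    """True if quads is sorted along the curve and tiles the domain exactly."""
    position = 0  # next curve position that must be covered
    for x, y, l in quads:
        start = morton(x, y, L)
        if start != position:
            # start > position means a gap; start < position means overlap or disorder.
            return False
        position = start + 4 ** l  # a quadrant of size 2**l covers 4**l positions
    return position == 4 ** L      # the whole domain must be covered


good = [(0, 0, 1), (2, 0, 0), (3, 0, 0), (2, 1, 0), (3, 1, 0), (0, 2, 1), (2, 2, 1)]
print(is_linear_quadtree(good, L=2))                                   # True
print(is_linear_quadtree([(0, 0, 0), (1, 0, 0), (1, 1, 0)], L=1))      # False: gap
print(is_linear_quadtree([(0, 0, 1), (0, 0, 0)], L=1))                 # False: overlap
```

The check relies on the fact that an aligned quadrant occupies one contiguous interval of Morton indices.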

SLIDE 14

Extending the framework to multiple trees: p4est

A linear octree is assigned to each hexahedron in a coarse hexahedral mesh: a “forest” of octrees (http://www.p4est.org).

[Figure: two trees k0, k1 with local axes (x0, y0), (x1, y1), partitioned among processes p0, p1, p2]

In parallel, the only data duplicated on each process is

  • coarse mesh topology
  • ranges (i.e. interval of the space filling curve) assigned to each process

The next few slides illustrate the essential parallel AMR routines for the forest-of-octree approach.
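To make the idea of per-process curve ranges concrete, here is a minimal sketch that splits `n` curve positions into contiguous, nearly equal ranges and looks up the owner of a position. It assumes equal weights per octant; the function names are hypothetical and this is not the p4est interface.

```python
def partition_ranges(n, P):
    """Split curve positions 0..n-1 into P contiguous ranges [begin, end)."""
    return [(p * n // P, (p + 1) * n // P) for p in range(P)]


def owner(ranges, i):
    """Process whose range contains curve position i (linear scan for clarity)."""
    for p, (begin, end) in enumerate(ranges):
        if begin <= i < end:
            return p
    raise ValueError("position outside the curve")


ranges = partition_ranges(10, 3)
print(ranges)            # [(0, 3), (3, 6), (6, 10)]
print(owner(ranges, 6))  # 2
```

Because every process holds the same small table of ranges, any process can decide locally which process owns a given position on the curve.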

SLIDE 15

Refine / Coarsen

SLIDE 17

2:1 Balance

(More on this in the next section and beyond)

SLIDE 19

Partition / Repartition

SLIDE 21

Brief description of algorithmic complexity

  • O(n), single pass, no communication: Refine, Coarsen
  • O(n), single pass, predictable communication: Ghost, Repartition
  • O(n log n), random access, asymmetric communication: 2:1 Balance, Nodes (Meshing)

SLIDE 22

Time in p4est (old 2:1 balance algorithms)

[Figure: percentage of runtime spent in Partition, Balance, Ghost and Nodes versus number of CPU cores (12, 60, 432, 3444, 27540, 220320) [1]]

SLIDE 23

2:1 Balance

SLIDE 24

2:1 Balance conditions

2D: 0-balance, 1-balance, 2-balance [three figure panels]

3D: 0-balance (none), 1-balance (faces), 2-balance (faces & edges) and 3-balance (faces, edges & corners).

The choice of balance condition is application specific.
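As a small 2D illustration of these conditions, the sketch below tests one pair of quadrants. It assumes that k-balance constrains pairs touching across a boundary of codimension at most k (faces for k = 1, faces and corners for k = 2) and that "2:1" means their size exponents differ by at most one. Quadrants are `(x, y, l)` tuples and the function names are hypothetical.

```python
def contact_codimension(a, b):
    """1 if the squares share part of an edge, 2 if only a corner, else None."""
    (ax, ay, al), (bx, by, bl) = a, b
    asz, bsz = 1 << al, 1 << bl
    # Overlap length of the closed intervals on each axis (negative: apart).
    ox = min(ax + asz, bx + bsz) - max(ax, bx)
    oy = min(ay + asz, by + bsz) - max(ay, by)
    if ox < 0 or oy < 0 or (ox > 0 and oy > 0):
        return None  # not touching, or interiors overlap
    return 1 if (ox > 0 or oy > 0) else 2


def is_balanced_pair(a, b, k):
    codim = contact_codimension(a, b)
    if codim is None or codim > k:
        return True  # this pair is not constrained under k-balance
    return abs(a[2] - b[2]) <= 1


big = (0, 0, 2)           # size 4
face_small = (4, 0, 0)    # size 1, touches big across a face
corner_small = (4, 4, 0)  # size 1, touches big only at a corner
print(is_balanced_pair(big, face_small, k=1))    # False
print(is_balanced_pair(big, corner_small, k=1))  # True: corners are ignored
print(is_balanced_pair(big, corner_small, k=2))  # False
```

A whole mesh satisfies the condition when every pair of leaves passes this test; the pairwise form is only meant to show what each value of k constrains.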

SLIDE 25

Concept: coarsest balanced neighborhood

Each octant $o$ has a neighborhood $N_k(o)$ that is as coarse as can be for a given balance condition. It has the same shape for each family of sibling octants.

[Figures, one per build: 2D 1-balance; 2D 2-balance; 3D 1-balance; 3D 2-balance; 3D 3-balance]

SLIDE 30

Concept: coarsest balanced octree

The coarsest balanced neighborhood can be extended to the coarsest balanced octree $T_k(o)$. It does not look the same for all octants.

[Figure panels: 1-balance, 2-balance]

SLIDE 32

Formal definition of a Balance algorithm

Definition ($T_k(S)$)

Given an arbitrary set of octants $S$, $T_k(S)$ is equal to the leaves (i.e. non-ancestor octants) in $\bigcup_{o \in S} T_k(o)$.

The purpose of a Balance algorithm is to convert a linear octree $T$ into $T_k(T)$.
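The "leaves" step of this definition, keeping only octants that are not ancestors of another octant in the set, can be sketched as follows. Octants are `(x, l)` pairs with `x` a coordinate tuple and edge length `2**l`; the function names are hypothetical, and the quadratic search is chosen for readability rather than speed. Constructing the individual trees T_k(o) is not shown.

```python
def is_strict_ancestor(a, b):
    """True if octant a is larger than b and contains it."""
    (ax, al), (bx, bl) = a, b
    return al > bl and all(p <= q < p + (1 << al) for p, q in zip(ax, bx))


def leaves(octants):
    """Drop every octant that is a strict ancestor of another octant in the set."""
    unique = set(octants)
    return {o for o in unique
            if not any(is_strict_ancestor(o, other) for other in unique)}


union = [((0, 0), 2), ((0, 0), 1), ((2, 0), 1), ((0, 0), 0)]
print(sorted(leaves(union)))  # [((0, 0), 0), ((2, 0), 1)]
```

Here `((0, 0), 2)` contains all other octants and `((0, 0), 1)` contains `((0, 0), 0)`, so only the two non-ancestors remain.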

SLIDE 33

Ripple effect

Example (2D, 1-balance)

This octree would be balanced if the blue octants were not present. The presence of the blue octants causes changes across the whole diameter of the octree.

This has big implications for parallel algorithms.

SLIDE 37

Serial 2:1 Balance algorithm

Start: unbalanced linear octree T
  for o ∈ T:
    T ← T ∪ N_k(o)   (T is no longer a linear octree: not ordered, overlaps)
  order T and remove overlaps

Recognizing and eliminating redundant octants from this process greatly improves its performance. For more information, see Section 3 of our paper.
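For readers who want something executable, the sketch below balances a small 2D quadtree. Note that it is a different, deliberately naive technique from the one on the slide: instead of adding coarse neighborhoods and then sorting, it repeatedly splits any leaf that is at least four times larger than a constrained neighbor. Leaves are `(x, y, l)` tuples on the domain `[0, 2**L)**2`, and all names are hypothetical.

```python
from itertools import product


def balance(leaves, L, k):
    """Naively k-balance a 2D quadtree (k = 0: none, 1: faces, 2: faces and corners)."""
    leaves = set(leaves)
    # Offsets to same-size neighbors: faces if k >= 1, corners as well if k == 2.
    directions = [d for d in product((-1, 0, 1), repeat=2)
                  if 1 <= abs(d[0]) + abs(d[1]) <= k]
    work = list(leaves)
    while work:
        x, y, l = work.pop()
        if (x, y, l) not in leaves:
            continue  # already split by an earlier step
        for dx, dy in directions:
            nx, ny = x + dx * (1 << l), y + dy * (1 << l)
            if not (0 <= nx < (1 << L) and 0 <= ny < (1 << L)):
                continue  # neighbor lies outside the domain
            # Look for a leaf covering the neighbor that is at least 4x larger.
            for m in range(l + 2, L + 1):
                q = ((nx >> m) << m, (ny >> m) << m, m)
                if q in leaves:
                    leaves.remove(q)
                    half = 1 << (m - 1)
                    children = [(q[0] + i * half, q[1] + j * half, m - 1)
                                for j in (0, 1) for i in (0, 1)]
                    leaves.update(children)
                    # The children may unbalance others; (x, y, l) is rechecked too.
                    work.extend(children)
                    work.append((x, y, l))
                    break
    return leaves


# An 8x8 domain: one quadrant refined twice toward the center.
coarse = [(4, 0, 2), (0, 4, 2), (4, 4, 2), (0, 0, 1), (2, 0, 1), (0, 2, 1)]
fine = [(2, 2, 0), (3, 2, 0), (2, 3, 0), (3, 3, 0)]
for k in (0, 1, 2):
    print(k, len(balance(coarse + fine, L=3, k=k)))  # 10, then 16, then 19 leaves
```

The four fine leaves force two of the large quadrants to split under 1-balance and all three under 2-balance, a small instance of the ripple effect.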

SLIDE 38

Parallel 2:1 Balance

SLIDE 39

Parallel Balance: the Ripple algorithm

[Figure: processes p ↔ q ↔ r ↔ s, each exchanging with its neighbors]

  • local balance using a serial algorithm
  • exchange neighboring information
  • local rebalance using neighboring information
  • repeat

O(P) rounds of communication may be necessary. This algorithm is appropriate in low latency settings.

SLIDE 43

Parallel Balance: the One-Pass algorithm

  • local balance using a serial algorithm
  • exchange neighboring and remote information
  • local rebalance using all pertinent information

Naïvely requires all-to-all communication.

SLIDE 44

The insulation layer I(r)

When enforcing 2:1 balance, an octant is only affected by octants within its insulation layer [3].

[Figure, built up in steps: octant r, its insulation layer I(r), and the domains of process p and process q; p affects q, q does not affect p]

  • greatly reduces number of processes that must communicate
  • relationship is not symmetric
  • p does not know a priori it affects q

An efficient scheme for determining communicating pairs is required. See Section 5 of our paper.
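A hedged sketch of the geometric test behind this idea follows. It assumes that the insulation layer of an octant is the block formed by the octant together with its same-size neighbors in every direction (three octant widths per axis); this is my reading of the construction in [3], not something the slide spells out. Octants are `(x, l)` pairs and the function names are hypothetical.

```python
def insulation_box(r):
    """Lower and upper corners of r together with its same-size neighbors."""
    x, l = r
    size = 1 << l
    return tuple(c - size for c in x), tuple(c + 2 * size for c in x)


def in_insulation_layer(o, r):
    """True if the interior of octant o intersects the insulation box of r."""
    lo, hi = insulation_box(r)
    ox, ol = o
    osize = 1 << ol
    return all(c < h and c + osize > low for c, low, h in zip(ox, lo, hi))


coarse = ((0, 0), 3)   # size 8
fine = ((12, 0), 0)    # size 1, four units away from the coarse octant
print(in_insulation_layer(fine, coarse))  # True: fine lies inside I(coarse)
print(in_insulation_layer(coarse, fine))  # False: coarse is outside I(fine)
```

The two calls give different answers, which mirrors the statement that the relationship is not symmetric.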

SLIDE 47

One-pass communication

[Figure, built up in steps, showing octant o on process p and octant r on process q]

  • process q ⇒ send r to p
  • process p ⇒ send o to q

Once remote octants are received, how do we determine their effect?

SLIDE 52

Rebalancing with remote octants, old algorithm

[Figure, built up in steps: octants o and r, process q]

This method is not O(1) and represents redundant work.

SLIDE 57

Determining remote balance

[Figure: octant o, process q, displacement components δx and δy, unknown size ℓ]

We want an O(1) method to determine the size $\ell$ from the displacement $\delta$.

SLIDE 58

Illustration: 2D, 2-balance

[Figure, built up in steps]

$2^\ell \sim \max\{\delta_x, \delta_y\}$

SLIDE 65

Illustration: 2D, 1-balance

[Figure, built up in steps]

$2^\ell \sim \delta_x + \delta_y$
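The two 2D relations can be evaluated in constant time with integer operations. The sketch below reads "∼" as "same power of two", that is, it returns the exponent of the largest power of two not exceeding the expression; that reading, the function name and the requirement of a positive displacement are assumptions.

```python
def size_exponent_2d(dx, dy, k):
    """Exponent l with 2**l <= m < 2**(l + 1), m being the slide's expression."""
    if k == 2:
        m = max(dx, dy)   # 2D, 2-balance
    elif k == 1:
        m = dx + dy       # 2D, 1-balance
    else:
        raise ValueError("k must be 1 or 2 in 2D")
    if m <= 0:
        raise ValueError("displacement must be positive")
    return m.bit_length() - 1  # position of the most significant bit


print(size_exponent_2d(5, 3, k=2))  # 2, since max(5, 3) = 5 lies in [4, 8)
print(size_exponent_2d(5, 3, k=1))  # 3, since 5 + 3 = 8 lies in [8, 16)
```

No loop over intermediate octants is involved: one comparison or addition and one bit scan suffice.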

SLIDE 72

Illustration: 3D, 3-balance

[Figure, built up in steps]

$2^\ell \sim \max\{\delta_x, \delta_y, \delta_z\}$

SLIDE 79

Illustration: 3D, 2-balance

[Figure, built up in steps]
SLIDE 85

Computing the Sierpinski profile

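The size $2^\ell$ can be computed using ternary addition:

  1. Reinterpret the binary displacement $\delta$ as a base-3 number.
  2. Set $\lambda = \delta_x + \delta_y + \delta_z$, base-3.
  3. Reinterpret $\lambda$ as a binary number.
  4. Return $\lambda$.

We only need the most significant bit of $\lambda$, so we can approximate it with the expression

$$\mathrm{Carry3}(\delta_x, \delta_y, \delta_z) := \max\{\delta_x,\ \delta_y,\ \delta_z,\ \delta_x + \delta_y + \delta_z - (\delta_x \mid \delta_y \mid \delta_z)\},$$

where $\mid$ denotes bitwise OR.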
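The claim that Carry3 reproduces the most significant position of the ternary sum can be checked by brute force. In this sketch `carry3` follows the expression above, while `ternary_sum_digits` is a hypothetical helper that performs the reinterpretation literally and counts the base-3 digits of the sum.

```python
def carry3(dx, dy, dz):
    """The expression from the slide; | is bitwise OR."""
    return max(dx, dy, dz, dx + dy + dz - (dx | dy | dz))


def ternary_sum_digits(dx, dy, dz):
    """Digits of the sum when each binary digit string is read in base 3."""
    total = sum(int(bin(d)[2:], 3) for d in (dx, dy, dz))
    digits = 0
    while total:
        total //= 3
        digits += 1
    return digits


print(carry3(1, 1, 1))              # 2
print(ternary_sum_digits(1, 1, 1))  # 2, because 1 + 1 + 1 = 10 in base 3

# The leading position agrees for every small displacement.
agree = all(ternary_sum_digits(a, b, c) == carry3(a, b, c).bit_length()
            for a in range(16) for b in range(16) for c in range(16))
print(agree)                        # True
```

The subtraction of the bitwise OR removes one contribution per occupied bit position, so what remains behaves like the carries of the base-3 addition.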

SLIDE 86

Illustration: 3D, 1-balance

[Figure, built up in steps]

$2^\ell \sim \mathrm{Carry3}(\delta_y + \delta_z,\ \delta_z + \delta_x,\ \delta_x + \delta_y)$
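Putting the 3D cases side by side gives a single constant-time dispatch. The 3-balance and 1-balance expressions are the ones stated on the slides; using Carry3(δx, δy, δz) for 2-balance is an assumption based on the Sierpinski-profile slide following the 3D 2-balance illustration. The function name `remote_size_3d` is hypothetical.

```python
def carry3(a, b, c):
    """max of the arguments and of their sum minus their bitwise OR."""
    return max(a, b, c, a + b + c - (a | b | c))


def remote_size_3d(dx, dy, dz, k):
    """Value whose most significant bit indicates the size 2**l under k-balance."""
    if k == 3:
        return max(dx, dy, dz)
    if k == 2:
        return carry3(dx, dy, dz)  # assumed pairing, see the note above
    if k == 1:
        return carry3(dy + dz, dz + dx, dx + dy)
    raise ValueError("k must be 1, 2 or 3 in 3D")


for k in (3, 2, 1):
    value = remote_size_3d(1, 1, 1, k)
    print(k, value, value.bit_length() - 1)  # exponents 0, 1 and 2 respectively
```

For the same displacement the weaker conditions yield larger values, with 1 for k = 3, 2 for k = 2 and 4 for k = 1 in this example.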

SLIDE 93

Results

SLIDE 94

Weak scaling

Compared old and new algorithms on the Jaguar XT5 supercomputer at Oak Ridge National Laboratory.

  • Fractal refinement pattern, increasing refinement proportional to the number of CPUs.
  • ∼ 1.3 million octants per CPU.

SLIDE 95

Weak scaling: full one-pass algorithm

[Figure: seconds per (million elements / core) versus number of CPU cores (12, 96, 768, 6144, 49152, 112128), old and new algorithms]
SLIDE 96

Weak scaling: components

[Figure, two panels: local balance (serial algorithm) and local rebalance (remote balancing); each shows seconds per (million elements / core) versus number of CPU cores (12, 96, 768, 6144, 49152, 112128), old and new algorithms]

SLIDE 97

Strong scaling

Compared old and new algorithms on the Jaguar XT5 supercomputer at Oak Ridge National Laboratory.

  • Mesh of the Antarctic ice sheet, with localized refinement to resolve the transition from grounded to floating ice, with ∼ 90 million octants.
  • Doubling processor counts from 12 to 6,144.

SLIDE 98

Strong scaling: full one-pass algorithm

[Figure: seconds (logarithmic axis) versus number of CPU cores (12 to 6144, doubling), showing perfect scaling, old and new algorithms]
SLIDE 99

Strong scaling: components

[Figure, two panels: local balance (serial algorithm) and local rebalance (remote balancing); each shows seconds (logarithmic axis) versus number of CPU cores (12 to 6144, doubling), with perfect scaling, old and new algorithms]

SLIDE 100

Thank you

SLIDE 101

References I

[1] C. Burstedde, L. C. Wilcox, and O. Ghattas, p4est: Scalable algorithms for parallel adaptive mesh refinement on forests of octrees, SIAM Journal on Scientific Computing, 33 (2011), pp. 1103–1133.

[2] G. Stadler, M. Gurnis, C. Burstedde, L. C. Wilcox, L. Alisic, and O. Ghattas, The dynamics of plate tectonics and mantle flow: From local to global scales, Science, 329 (2010), pp. 1033–1038.

[3] H. Sundar, R. Sampath, and G. Biros, Bottom-up construction and 2:1 balance refinement of linear octrees in parallel, SIAM Journal on Scientific Computing, 30 (2008), pp. 2675–2708.
