Towards achieving GPU-native adaptive mesh refinement
Ania Brown Prof Takayuki Aoki
Why AMR?
AMR is not GPU friendly: complicated, time-varying data structures
Can you use AMR and keep GPU performance? My conclusion: yes, but it's messy
Stencil calculations on a square structured mesh
[Figure: 5-point stencil, cell (i, j) and its neighbours (i+1, j), (i−1, j), (i, j+1), (i, j−1)]
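The stencil above updates each cell from its four axis-aligned neighbours. The following minimal Python sketch performs one explicit step of a 5-point Laplacian (diffusion) stencil on a square mesh. The choice of operator, the coefficient `alpha` (standing for D·dt/dx²) and the fixed boundary cells are all illustrative assumptions.

```python
def stencil_step(u, alpha):
    """One explicit 5-point stencil update on an n x n mesh (list of rows).

    Boundary cells are held fixed. alpha = D * dt / dx**2 is a
    hypothetical parameter; explicit 2D diffusion needs alpha <= 0.25.
    """
    n = len(u)
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            # neighbours (i+1, j), (i-1, j), (i, j+1), (i, j-1)
            lap = u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1] - 4 * u[i][j]
            new[i][j] = u[i][j] + alpha * lap
    return new

# Usage: a single spike spreads equally to its four neighbours.
u = [[0.0] * 5 for _ in range(5)]
u[2][2] = 1.0
v = stencil_step(u, 0.25)
```

Every cell reads only its immediate neighbours. This is what makes the update map well onto a GPU when the mesh is regular.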
[Figure: simulation mesh and its refinement representation, shown over successive refinement steps]
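The refinement representation is a tree. In 2D, each refined cell is split into four children, which gives a quadtree. The minimal recursive Python sketch below builds such a tree and returns its leaves. The integer coordinate convention, `max_level` and the point-based refinement rule are illustrative assumptions, not the criterion used in this work.

```python
def build_quadtree(level, ix, iy, max_level, needs_refinement):
    """Return the leaves of a quadtree as (level, ix, iy) tuples.

    A cell at `level` with integer coords (ix, iy) covers
    [ix, ix+1) x [iy, iy+1) in units of 2**-level of the unit domain.
    `needs_refinement(level, ix, iy)` is a user-supplied predicate.
    """
    if level < max_level and needs_refinement(level, ix, iy):
        leaves = []
        for dy in (0, 1):
            for dx in (0, 1):
                leaves += build_quadtree(level + 1, 2 * ix + dx, 2 * iy + dy,
                                         max_level, needs_refinement)
        return leaves
    return [(level, ix, iy)]

def contains_point(level, ix, iy, px=0.3, py=0.7):
    """Hypothetical criterion: refine every cell containing (px, py)."""
    size = 2.0 ** -level
    return ix * size <= px < (ix + 1) * size and iy * size <= py < (iy + 1) * size

leaves = build_quadtree(0, 0, 0, 3, contains_point)
```

Only the cells around the point of interest reach the finest level. With three levels this produces 10 leaves instead of the 64 cells of a uniform 8 × 8 mesh.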
CPU:
  Block-structured: Enzo, CHOMBO
  Octree: RAMSES
  Octree + patches: FLASH (using PARAMESH), NIRVANA
GPU:
  Octree + patches: GAMER (2011), Daino (2016)
Initialize data structures
Main loop:
  Create halo regions
  Update patch values
  Refine/coarsen patches
  Update neighbour relations
  Output values for visualisation
[Diagram: main-loop steps divided between CPU and GPU]
Update patch values: 1 CUDA block per patch, coalesced memory access
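The index arithmetic behind coalesced patch updates can be sketched as follows. This assumes one patch per CUDA block, with patches stored contiguously (patch-major) and cells row-major inside each patch. The 8 × 8 patch size is a hypothetical choice. The Python below only models the CUDA mapping blockIdx.x → patch, (threadIdx.x, threadIdx.y) → cell.

```python
PATCH = 8  # hypothetical patch edge length (cells per side)

def global_index(block, tx, ty, n=PATCH):
    """Flat index of cell (tx, ty) of patch `block` in a patch-major array."""
    return block * n * n + ty * n + tx

def is_coalesced(block, ty, n=PATCH):
    """True if threads tx = 0..n-1 in one row touch consecutive addresses."""
    idx = [global_index(block, tx, ty, n) for tx in range(n)]
    return all(b - a == 1 for a, b in zip(idx, idx[1:]))
```

With 8 × 8 threads per block, a 32-thread warp covers four consecutive rows of one patch. That is 32 consecutive cells, so its loads can be served as a contiguous segment.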
At each step:
the original curve
Leaf nodes: neighbour indices in each direction, parent index
Parent nodes: child indices
Find correct neighbour node, then copy halo values
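Finding the correct neighbour node can be sketched using only parent and child indices. The method shown is the classic quadtree search: go up until the step crosses into a sibling, then come back down on the mirrored side. The array layout and the child ordering (SW, SE, NW, NE) are assumptions, and this is not necessarily the lookup used here. If the tree is coarser across the edge, the coarser leaf is returned, and its values would need interpolating into the halo.

```python
# children[p] = [SW, SE, NW, NE]: bit 0 of the slot = x (0 west, 1 east),
# bit 1 = y (0 south, 1 north). This ordering is an assumption.
EAST, WEST, NORTH, SOUTH = (0, 1), (0, 0), (1, 1), (1, 0)  # (axis bit, side)

def neighbour(node, direction, parent, children):
    """Node adjacent to `node` in `direction`: same level if it exists,
    otherwise a coarser leaf; -1 at the domain boundary."""
    axis, side = direction
    bit = 1 << axis
    p = parent[node]
    if p < 0:
        return -1  # root has no neighbours
    q = children[p].index(node)
    if bool(q & bit) != bool(side):
        return children[p][q ^ bit]  # neighbour is a sibling
    up = neighbour(p, direction, parent, children)
    if up < 0 or children[up] is None:
        return up  # boundary, or a coarser leaf
    return children[up][q ^ bit]  # mirrored child across the edge

# Usage: root 0 -> [1, 2, 3, 4]; nodes 1 and 2 are refined once more.
parent = [-1, 0, 0, 0, 0, 2, 2, 2, 2, 1, 1, 1, 1]
children = [[1, 2, 3, 4], [9, 10, 11, 12], [5, 6, 7, 8]] + [None] * 10
```

For example, the east neighbour of node 10 (SE child of 1) is node 5 (SW child of 2), while the north neighbour of node 7 is the coarser leaf 4.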
Main loop (CPU / GPU):
  Find patches to coarsen/refine
  Refine/coarsen patch values
  Update neighbour relations
  Defragment value array
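Defragmenting the value array can be expressed as stream compaction. First mark which patch slots are still live, then exclusive-scan the marks to give each live patch its new slot, then scatter. The sketch below is a minimal sequential Python version; on a GPU, the scan and the scatter would each be a parallel step (for example `thrust::exclusive_scan`). The 0/1 `live` array is an assumption about how freed slots are marked.

```python
def exclusive_scan(xs):
    """Exclusive prefix sum: out[i] = xs[0] + ... + xs[i-1]; also returns the total."""
    out, total = [], 0
    for x in xs:
        out.append(total)
        total += x
    return out, total

def defragment(values, live):
    """Move the patches with live == 1 to the front of a new, gap-free array.

    Returns (compacted values, old-to-new index map with -1 for freed slots).
    """
    dest, count = exclusive_scan(live)
    compacted = [None] * count
    remap = [-1] * len(values)
    for old, flag in enumerate(live):
        if flag:  # scatter: each live patch writes to its own destination
            compacted[dest[old]] = values[old]
            remap[old] = dest[old]
    return compacted, remap
```

The `remap` array is what the neighbour-relation update would need to rewrite stored patch indices after compaction.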
refined, coarsened or unchanged
Coarsened: -3 Unchanged: 0
array
steps
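One way to turn the per-patch flags into new array positions is sketched below. It assumes a 2D quadtree, in which refining a patch replaces it with 4 children (net +3), and coarsening merges 4 sibling patches into 1 (net −3 per group, matching the value above). Each old patch gets the number of patches it produces, and an exclusive prefix sum then gives each patch its write offset and the new total. The flag encoding, and the rule that coarsening siblings are stored consecutively, are hypothetical.

```python
REFINE, KEEP, COARSEN = 1, 0, -1  # hypothetical flag encoding

def output_slots(flags):
    """Number of patches each old patch produces after refine/coarsen.

    Assumes the four siblings of a coarsened group are consecutive and
    all flagged COARSEN; the first sibling holds the merged parent patch.
    """
    counts, i = [], 0
    while i < len(flags):
        if flags[i] == REFINE:
            counts.append(4)  # 1 patch -> 4 children: +3
            i += 1
        elif flags[i] == COARSEN:
            if flags[i:i + 4] != [COARSEN] * 4:
                raise ValueError("incomplete sibling group at %d" % i)
            counts.extend([1, 0, 0, 0])  # 4 siblings -> 1 parent: -3
            i += 4
        else:
            counts.append(1)  # unchanged: 0
            i += 1
    return counts

def exclusive_scan(xs):
    """Exclusive prefix sum plus total."""
    out, total = [], 0
    for x in xs:
        out.append(total)
        total += x
    return out, total

flags = [KEEP, REFINE, COARSEN, COARSEN, COARSEN, COARSEN, KEEP]
counts = output_slots(flags)
offsets, new_total = exclusive_scan(counts)
```

Because each patch's offset depends only on the scan, every patch can write its new data independently. This is why a prefix sum is a natural fit for doing this step on the GPU.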
Load balancing
[Figure: patches distributed across Node 0, Node 1 and Node 2]
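The slides do not state the partitioning method. A common approach for tree-based AMR, shown here purely as an assumption, orders the leaf patches along a space-filling curve and cuts that ordering into contiguous chunks, one per node. The curve used here is the Morton (Z-order) curve. Keys are computed at the finest level, so patches on different levels can be sorted together.

```python
def morton_key(level, ix, iy, max_level):
    """Z-order key of a patch's lower-left corner at the finest level."""
    shift = max_level - level
    x, y = ix << shift, iy << shift
    key = 0
    for b in range(max_level):
        key |= ((x >> b) & 1) << (2 * b)      # x bits in even positions
        key |= ((y >> b) & 1) << (2 * b + 1)  # y bits in odd positions
    return key

def partition(patches, n_nodes, max_level):
    """Split (level, ix, iy) patches into n_nodes contiguous Morton-ordered chunks.

    Balances patch counts only; weighting patches by cost is a possible refinement.
    """
    ordered = sorted(patches, key=lambda p: morton_key(*p, max_level))
    chunks = []
    for k in range(n_nodes):
        start = k * len(ordered) // n_nodes
        end = (k + 1) * len(ordered) // n_nodes
        chunks.append(ordered[start:end])
    return chunks

patches = [(1, 1, 1), (2, 0, 0), (2, 1, 0), (2, 0, 1), (2, 1, 1), (1, 1, 0), (1, 0, 1)]
chunks = partition(patches, 3, 2)  # three nodes, as in the figure
```

Contiguous curve segments keep spatially close patches on the same node. This reduces how many halo exchanges cross node boundaries.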
for initialisation, resolution criterion, stencil calculation
structures
interpolation level, stencil type
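As an illustration of a user-supplied resolution criterion, the sketch below flags a patch for refinement when the phase field changes sharply across it, meaning the patch contains part of the solid-liquid interface. It flags a patch for coarsening when the phase field is nearly uniform. The function name, the thresholds and the use of the largest neighbour-to-neighbour jump are hypothetical; the slides do not state the criterion used.

```python
REFINE, KEEP, COARSEN = 1, 0, -1  # hypothetical flag encoding

def resolution_criterion(phi, refine_tol=0.1, coarsen_tol=0.01):
    """Flag one patch from its phase-field values phi (list of rows).

    The largest jump between neighbouring cells is used as a cheap
    stand-in for |grad phi| * dx.
    """
    jump = 0.0
    n, m = len(phi), len(phi[0])
    for i in range(n):
        for j in range(m):
            if i + 1 < n:
                jump = max(jump, abs(phi[i + 1][j] - phi[i][j]))
            if j + 1 < m:
                jump = max(jump, abs(phi[i][j + 1] - phi[i][j]))
    if jump > refine_tol:
        return REFINE
    if jump < coarsen_tol:
        return COARSEN
    return KEEP
```

Keeping the criterion a small per-patch function means it could run independently for every patch, for example one CUDA block per patch.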
Code by: T. Shimokawabe, T. Takaki (2011)
7 refinement levels in the quadtree
[Figure: regular mesh vs. adaptive mesh]
256 × 256: L = 1.5 × 10⁻³ m, R = 4.5 × 10⁻⁴ m, Δx_min = 6 × 10⁻⁶ m, Δx_max = 1.2 × 10⁻⁵ m
8192 × 8192: L = 1.5 × 10⁻³ m, R = 4.5 × 10⁻⁴ m, Δx_min = 1.9 × 10⁻⁷ m, Δx_max = 1.2 × 10⁻⁵ m
[Figure: execution time per timestep (ms, 0 to 250) vs. resolution (256 to 8192) for AMR, regular mesh and worst-case AMR]
neighbour relations on CPU
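$\phi(x, y, t)$: phase
$c(x, y, t)$: concentration, $c = (1 - \phi)\,c_L + \phi\,c_S$
$c_L$, $c_S$: liquid, solid concentration

$$\frac{\partial c}{\partial t} = \nabla \cdot \left[ D_S\,\phi\,\nabla c_S + D_L\,(1 - \phi)\,\nabla c_L \right]$$

Diffusion, interface anisotropy, chemical driving force, phase change:

$$\frac{\partial \phi}{\partial t} = M_\phi \left[ \nabla \cdot (a^2 \nabla \phi) + \frac{\partial}{\partial x}\left( a \frac{\partial a}{\partial \phi_x} |\nabla \phi|^2 \right) + \frac{\partial}{\partial y}\left( a \frac{\partial a}{\partial \phi_y} |\nabla \phi|^2 \right) + \frac{\partial}{\partial z}\left( a \frac{\partial a}{\partial \phi_z} |\nabla \phi|^2 \right) - S\,\Delta T \frac{dp(\phi)}{d\phi} - W \frac{dq(\phi)}{d\phi} \right]$$

$M_\phi$: mobility
$a$: interface anisotropy
$p(\phi)$: interpolating function
$q(\phi)$: double-well function
$D_S$, $D_L$: diffusion in solid, liquid
$S$: entropy of fusion
$T$: temperature
$W$: height of double-well potential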
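The conservative form of the concentration equation can be made concrete with a 1D finite-volume sketch of ∂c/∂t = ∂/∂x[D_S φ ∂c_S/∂x + D_L (1 − φ) ∂c_L/∂x]. Fluxes are evaluated on cell faces using a face-averaged φ, with zero-flux boundaries, so the total solute Σc is conserved. The 1D reduction, the face averaging and the parameter names are assumptions. How c_S and c_L are obtained from c is not covered.

```python
def dc_dt(phi, c_s, c_l, d_s, d_l, dx):
    """Rate of change of c in each cell (1D, zero-flux boundaries)."""
    n = len(phi)
    flux = [0.0] * (n + 1)  # flux[k] lives on the face between cells k-1 and k
    for k in range(1, n):
        phi_f = 0.5 * (phi[k - 1] + phi[k])
        grad_s = (c_s[k] - c_s[k - 1]) / dx
        grad_l = (c_l[k] - c_l[k - 1]) / dx
        flux[k] = d_s * phi_f * grad_s + d_l * (1.0 - phi_f) * grad_l
    # divergence: right face minus left face of each cell
    return [(flux[k + 1] - flux[k]) / dx for k in range(n)]
```

Each face flux is added to one cell and subtracted from its neighbour, so the rates sum to zero up to rounding. An explicit update c += dt · dc_dt(...) therefore neither creates nor destroys solute.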