Flexible, Scalable Mesh and Data Management using PETSc DMPlex

SLIDE 1

Flexible, Scalable Mesh and Data Management using PETSc DMPlex

  • M. Lange¹
  • M. Knepley²
  • G. Gorman¹

¹ AMCG, Imperial College London
² Computation Institute, University of Chicago

April 23, 2015


SLIDE 2

Unstructured Mesh Management

Mesh management

  • Many tasks are common across applications: mesh input, partitioning, checkpointing, . . .
  • File I/O can become a severe bottleneck!

Mesh file formats

  • Range of mesh generators and formats: Gmsh, Cubit, Triangle, ExodusII, CGNS, SILO, . . .
  • No universally accepted format
  • Applications often “roll their own”
  • No interoperability between codes


SLIDE 3

Unstructured Mesh Management

Finding the right level of abstraction

  • Abstract mesh topology interface:
    • Provided by a widely used library
    • Extensible support for multiple formats
    • Single point for extension and optimisation
    • Many applications inherit capabilities
  • Mesh management optimisations:
    • Scalable read/write routines
    • Parallel partitioning and load-balancing
    • Mesh renumbering techniques
    • Parallel mesh adaptivity


SLIDE 4

DMPlex: Mesh topology abstraction

DMPlex - PETSc’s unstructured mesh API¹

  • Abstract mesh connectivity:
    • Directed Acyclic Graph (DAG)²
    • Dimensionless access (see the sketch below)
  • Topology separate from discretisation:
    • Pre-allocate data structures
  • Enables new preconditioners:
    • FieldSplit
    • Geometric Multigrid

[Figure: DAG representation of an example mesh, points numbered 1-14]

¹ M. Knepley and D. Karpeev. Mesh Algorithms for PDE with Sieve I: Mesh Distribution. Sci. Program., 17(3):215–230, August 2009.

² Anders Logg. Efficient representation of computational meshes. International Journal of Computational Science and Engineering, 4:283–295, 2009.
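
The dimensionless access can be illustrated with a minimal C sketch (not part of the original slides; "mesh.msh" is a placeholder file name). Cells are the points at height 0 of the DAG, and a point's cone holds the points one level below it, whatever the mesh dimension:

    #include <petscdmplex.h>

    int main(int argc, char **argv)
    {
      DM             dm;
      PetscInt       cStart, cEnd, c;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      /* Placeholder mesh file; note that PETSc >= 3.17 inserts an extra
         mesh-name argument: DMPlexCreateFromFile(comm, file, "plex", ...) */
      ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "mesh.msh", PETSC_TRUE, &dm);CHKERRQ(ierr);
      /* Dimensionless access: cells are the height-0 points of the DAG */
      ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);CHKERRQ(ierr);
      for (c = cStart; c < cEnd; ++c) {
        const PetscInt *cone;
        PetscInt        coneSize, i;
        /* The cone of a point holds the points one DAG level below it,
           e.g. the faces bounding a cell; supports point the other way */
        ierr = DMPlexGetConeSize(dm, c, &coneSize);CHKERRQ(ierr);
        ierr = DMPlexGetCone(dm, c, &cone);CHKERRQ(ierr);
        for (i = 0; i < coneSize; ++i) {
          ierr = PetscPrintf(PETSC_COMM_SELF, "cell %D -> point %D\n", c, cone[i]);CHKERRQ(ierr);
        }
      }
      ierr = DMDestroy(&dm);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }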


SLIDE 5

DMPlex: Mesh topology abstraction

DMPlex - PETSc’s unstructured mesh API¹

  • Input: ExodusII, Gmsh, CGNS, Fluent-Case
  • Output: VTK, HDF5 + Xdmf
    • Visualizable checkpoints
  • Parallel distribution (sketched below):
    • Partitioners: Chaco, Metis/ParMetis
    • Automated halo exchange via PetscSF
  • Mesh renumbering:
    • Reverse Cuthill-McKee (RCM)

[Figure: DAG representation of an example mesh, points numbered 1-14]

¹ M. Knepley and D. Karpeev. Mesh Algorithms for PDE with Sieve I: Mesh Distribution. Sci. Program., 17(3):215–230, August 2009.
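
A minimal sketch of the read/distribute/checkpoint path (assuming a PETSc build configured with HDF5; "mesh.msh" and "checkpoint.h5" are placeholder names):

    #include <petscdmplex.h>
    #include <petscviewerhdf5.h>

    int main(int argc, char **argv)
    {
      DM             dm, dmDist;
      PetscViewer    viewer;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      /* Read any supported format; the Gmsh file name is a placeholder */
      ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "mesh.msh", PETSC_TRUE, &dm);CHKERRQ(ierr);
      /* Partition and migrate; the optional PetscSF output (NULL here)
         describes the point migration used for halo exchange */
      ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);
      if (dmDist) { ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist; }
      /* Checkpoint the mesh in HDF5; pairing it with an Xdmf description
         makes the checkpoint visualizable */
      ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "checkpoint.h5", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
      ierr = DMView(dm, viewer);CHKERRQ(ierr);
      ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
      ierr = DMDestroy(&dm);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }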


SLIDE 6

Fluidity-DMPlex Integration

Fluidity

  • Unstructured finite element code
  • Anisotropic mesh adaptivity
  • Uses PETSc as linear solver engine
  • Applications: CFD, geophysical flows, ocean modelling, reservoir modelling, mining, nuclear safety, renewable energies, etc.

Bottleneck: Parallel pre-processing¹


¹ X. Guo, M. Lange, G. Gorman, L. Mitchell, and M. Weiland. Developing a scalable hybrid MPI/OpenMP unstructured finite element model. Computers & Fluids, 110:227–234, 2015. ParCFD 2013.


SLIDE 7

Fluidity - DMPlex Integration

[Diagram: evolution of the Fluidity startup pipeline]

  • Original: Mesh → Preprocessor (with Zoltan) → per-process Mesh + Fields → Fluidity
  • Current: Mesh → DMPlex → DMPlexDistribute → DMPlex + Fields → Fluidity
  • Goal: Mesh → DMPlex → DMPlexDistribute → DMPlex + Fields → Fluidity, with runtime Load Balance


SLIDE 8

Fluidity - DMPlex Integration

DMPlexDistribute

  • Before:
    • One-to-many
    • Single-level overlap
    • Overlap is expensive
  • After (sketched below):
    • Generic mesh migration
    • Parallel N-level overlap
    • All-to-all via ParMetis
    • Available to other codes: Firedrake, Moose, . . .

[Figure: DMPlexDistribute on unit cube (20483 cells); time vs. number of processors (2-96) for Distribute, DistributeOverlap, Distribute::Mesh Partition, and Distribute::Mesh Migration]

[Figure: Load balancing a unit cube (20483 cells); time vs. number of processors (2-96) for Redistribute::Mesh Partition and Redistribute::Mesh Migration]
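
A minimal sketch of the two-stage distribution ("mesh.msh" is a placeholder name): distribute without overlap first, then grow an N-level overlap in parallel with DMPlexDistributeOverlap.

    #include <petscdmplex.h>

    int main(int argc, char **argv)
    {
      DM             dm, dmDist, dmOverlap;
      PetscSF        sf;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "mesh.msh", PETSC_TRUE, &dm);CHKERRQ(ierr);
      /* Distribute the serial mesh with no overlap */
      ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);
      if (dmDist) { ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist; }
      /* Grow a 2-level overlap in parallel; the returned PetscSF maps
         the newly shared points for halo exchange */
      ierr = DMPlexDistributeOverlap(dm, 2, &sf, &dmOverlap);CHKERRQ(ierr);
      if (dmOverlap) {
        ierr = PetscSFDestroy(&sf);CHKERRQ(ierr);
        ierr = DMDestroy(&dm);CHKERRQ(ierr);
        dm   = dmOverlap;
      }
      ierr = DMDestroy(&dm);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

In recent PETSc releases the partitioner can typically be selected at runtime, e.g. with -petscpartitioner_type parmetis.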


SLIDE 9

Fluidity-DMPlex Integration

Mesh reordering

  • Fluidity halos:
    • Separate L1/L2 regions
    • “Trailing receives”
    • Requires permutation
  • DMPlex provides RCM (sketched below):
    • Generated locally
    • Fields inherit reordering
    • Better cache locality

[Figure: native ordering vs. RCM reordering, serial and parallel]
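
A minimal sketch of local RCM renumbering with the standard DMPlexGetOrdering/DMPlexPermute calls (placeholder file name):

    #include <petscdmplex.h>

    int main(int argc, char **argv)
    {
      DM             dm, dmPerm;
      IS             perm;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "mesh.msh", PETSC_TRUE, &dm);CHKERRQ(ierr);
      /* Compute a Reverse Cuthill-McKee permutation of the local points */
      ierr = DMPlexGetOrdering(dm, MATORDERINGRCM, NULL, &perm);CHKERRQ(ierr);
      /* Rebuild the DM with renumbered points; sections and fields
         created from the permuted DM inherit the ordering */
      ierr = DMPlexPermute(dm, perm, &dmPerm);CHKERRQ(ierr);
      ierr = ISDestroy(&perm);CHKERRQ(ierr);
      if (dmPerm) { ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmPerm; }
      ierr = DMDestroy(&dm);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }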


SLIDE 10

Benchmark

Archer

  • Cray XC30
  • 4920 nodes (118,080 cores)
  • Two 12-core Intel Xeon E5-2697 v2 (Ivy Bridge) processors per node

Simulation

  • Flow past a square cylinder
  • 3D mesh, generated with Gmsh


SLIDE 11

Results - Simulation Startup

Startup on 4 nodes

  • Runtime distribution wins
  • Fast topology distribution
  • No clear I/O gains
  • Gmsh file reading does not scale

[Figure: Fluidity Startup - File I/O; time vs. mesh size (8615 to 2944992 elements) for Preprocessor-Read, Preprocessor-Write, Fluidity-Read, Fluidity-Total, and DMPlex-DAG]

[Figure: Fluidity Startup; time vs. mesh size for Preprocessor, Fluidity (preprocessed), Fluidity Total, and Fluidity-DMPlex]

[Figure: Fluidity Startup - Distribute; time vs. mesh size for Zoltan + Callbacks and DMPlexDistribute]


SLIDE 12

Results - Simulation Performance

Performance

  • Mesh with ~2 million elements
  • Preprocessor + 10 timesteps
  • RCM brings improvements in:
    • Pressure solve
    • Velocity assembly

[Figures: Pressure Solve, Full Simulation, and Velocity Assembly; time vs. number of processes (2-96) for Fluidity-DMPlex (RCM), Fluidity-DMPlex (native), and Fluidity-Preprocessor]


SLIDE 13

Discussion and Future Work

DMPlex mesh management for Fluidity

  • No need to preprocess
  • Increased interoperability: ExodusII, CGNS, Fluent-Case
  • Performance benefits:
    • Fast runtime mesh distribution
    • Optional RCM renumbering

Future work

  • DMPlex-based checkpointing in Fluidity
  • Scalable parallel mesh reads with DMPlex
  • Anisotropic mesh adaptivity via DMPlex
