Flexible, Scalable Mesh and Data Management using PETSc DMPlex
- M. Lange1
- M. Knepley2
- G. Gorman1
1AMCG, Imperial College London 2Computation Institute, University of Chicago
April 23, 2015
- M. Lange, M. Knepley, G. Gorman
DMPlex Mesh Management
◮ Many tasks are common across applications:
◮ File I/O can become a severe bottleneck!
◮ Wide range of mesh generators and formats
  ◮ No universally accepted format
  ◮ Applications often “roll their own”
  ◮ No interoperability between codes
◮ Abstract mesh topology interface
  ◮ Provided by a widely used library
  ◮ Extensible support for multiple formats
  ◮ Single point for extension and optimisation
  ◮ Many applications inherit capabilities
◮ Mesh management optimisations
  ◮ Scalable read/write routines
  ◮ Parallel partitioning and load-balancing
  ◮ Mesh renumbering techniques
  ◮ Parallel mesh adaptivity
◮ Abstract mesh connectivity
  ◮ Directed Acyclic Graph (DAG)2
  ◮ Dimensionless access
◮ Topology separate from discretisation
  ◮ Pre-allocate data structures
◮ Enables new preconditioners
  ◮ FieldSplit
  ◮ Geometric Multigrid
[Figure: DAG representation of a two-triangle mesh, points 1-14 spanning cells, edges, and vertices]
1M. G. Knepley and D. A. Karpeev. Mesh algorithms for PDE with Sieve I: Mesh distribution. Scientific Programming, 17(3):215–230, August 2009
2Anders Logg. Efficient representation of computational meshes. International Journal of Computational Science and Engineering, 4:283–295, 2009
◮ Input: ExodusII, Gmsh, CGNS, Fluent-Case
◮ Output: VTK, HDF5 + Xdmf
  ◮ Visualizable checkpoints
◮ Parallel distribution
  ◮ Partitioners: Chaco, Metis/ParMetis
  ◮ Automated halo exchange via PetscSF
◮ Mesh renumbering
  ◮ Reverse Cuthill-McKee (RCM)
[Figure: DAG representation of a two-triangle mesh, points 1-14 spanning cells, edges, and vertices]
◮ Unstructured finite element code
◮ Anisotropic mesh adaptivity
◮ Uses PETSc as linear solver engine
◮ Applications:
  ◮ CFD, geophysical flows, ocean
unstructured finite element model. Computers & Fluids, 110(0):227 – 234, 2015. ParCFD 2013
[Diagram: runtime load-balancing workflow via DMPlexDistribute and Zoltan]
◮ Before:
  ◮ One-to-many
  ◮ Single-level overlap
  ◮ Overlap is expensive
◮ After:
  ◮ Generic mesh migration
  ◮ Parallel N-level overlap
  ◮ All-to-all via ParMetis
  ◮ Available to other codes
    ◮ Firedrake, Moose, . . .
[Plot: "DMPlexDistribute on unit cube (20483 cells)"; time [sec] vs. number of processors (2-96); series: Distribute, DistributeOverlap, Distribute::Mesh Partition, Distribute::Mesh Migration]
[Plot: "Load balancing a unit cube (20483 cells)"; time [sec] vs. number of processors (2-96); series: Redistribute::Mesh Partition, Redistribute::Mesh Migration]
◮ Fluidity Halos
  ◮ Separate L1/L2 regions
  ◮ “Trailing receives”
  ◮ Requires permutation
◮ DMPlex provides RCM
  ◮ Generated locally
  ◮ Fields inherit reordering
  ◮ Better cache coherency
◮ Cray XC30
  ◮ 4920 nodes (118,080 cores)
  ◮ 12-core E5-2697 (Ivy Bridge)
◮ Flow past a square cylinder
  ◮ 3D mesh, generated with Gmsh
◮ Runtime distribution wins
  ◮ Fast topology distribution
◮ No clear I/O gains
  ◮ Gmsh does not scale
[Plot: "Fluidity Startup - File I/O"; time [sec] vs. mesh size (8615 to 2944992 elements); series: Preprocessor-Read, Preprocessor-Write, Fluidity-Read, Fluidity-Total, DMPlex-DAG]
[Plot: "Fluidity Startup"; time [sec] vs. mesh size; series: Preprocessor, Fluidity (preprocessed), Fluidity Total, Fluidity-DMPlex]
[Plot: "Fluidity Startup - Distribute"; time [sec] vs. mesh size; series: Zoltan + Callbacks, DMPlexDistribute]
◮ Mesh with ∼2 million elements
◮ Preprocessor + 10 timesteps
◮ RCM brings improvements:
  ◮ Pressure solve
  ◮ Velocity assembly
[Plot: "Pressure Solve"; time [sec] vs. number of processes (2-96); series: Fluidity-DMPlex: RCM, Fluidity-DMPlex: native, Fluidity-Preprocessor]
[Plot: "Full Simulation"; same axes and series]
[Plot: "Velocity Assembly"; same axes and series]
◮ No need to preprocess
◮ Increased interoperability:
  ◮ ExodusII, CGNS, Fluent-Case
◮ Performance benefits:
  ◮ Fast runtime mesh distribution
  ◮ Optional RCM renumbering
◮ DMPlex-based checkpointing in Fluidity
◮ Scalable parallel mesh reads with DMPlex
◮ Anisotropic mesh adaptivity via DMPlex