Flexible, Scalable Mesh and Data Management using PETSc DMPlex

SLIDE 1

Flexible, Scalable Mesh and Data Management using PETSc DMPlex

  • M. Lange¹
  • M. Knepley²
  • G. Gorman¹

¹ AMCG, Imperial College London
² Computation Institute, University of Chicago

April 23, 2015


SLIDE 2

Unstructured Mesh Management

Mesh management

  • Many tasks are common across applications: mesh input, partitioning, checkpointing, . . .
  • File I/O can become a severe bottleneck!

Mesh file formats

  • Range of mesh generators and formats: Gmsh, Cubit, Triangle, ExodusII, CGNS, SILO, . . .
  • No universally accepted format
  • Applications often “roll their own”
  • No interoperability between codes


SLIDE 3

Unstructured Mesh Management

Finding the right level of abstraction

  • Abstract mesh topology interface:
    • Provided by a widely used library
    • Extensible support for multiple formats
    • Single point for extension and optimisation
    • Many applications inherit capabilities
  • Mesh management optimisations:
    • Scalable read/write routines
    • Parallel partitioning and load-balancing
    • Mesh renumbering techniques
    • Parallel mesh adaptivity


SLIDE 4

DMPlex: Mesh topology abstraction

DMPlex - PETSc’s unstructured mesh API¹

  • Abstract mesh connectivity:
    • Directed Acyclic Graph (DAG)²
    • Dimensionless access (see the sketch below)
  • Topology separate from discretisation:
    • Pre-allocate data structures
  • Enables new preconditioners:
    • FieldSplit
    • Geometric Multigrid

[Figure: DAG representation of an example mesh, points numbered 1-14]

¹ M. Knepley and D. Karpeev. Mesh Algorithms for PDE with Sieve I: Mesh Distribution. Sci. Program., 17(3):215–230, August 2009.

² Anders Logg. Efficient representation of computational meshes. International Journal of Computational Science and Engineering, 4:283–295, 2009.
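
The dimensionless access can be illustrated with a minimal C sketch (not part of the original slides; "mesh.msh" is a placeholder file name). Cells are the points at height 0 of the DAG, and a point's cone holds the points one level below it, whatever the mesh dimension:

    #include <petscdmplex.h>

    int main(int argc, char **argv)
    {
      DM             dm;
      PetscInt       cStart, cEnd, c;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      /* Placeholder mesh file; note that PETSc >= 3.17 inserts an extra
         mesh-name argument: DMPlexCreateFromFile(comm, file, "plex", ...) */
      ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "mesh.msh", PETSC_TRUE, &dm);CHKERRQ(ierr);
      /* Dimensionless access: cells are the height-0 points of the DAG */
      ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);CHKERRQ(ierr);
      for (c = cStart; c < cEnd; ++c) {
        const PetscInt *cone;
        PetscInt        coneSize, i;
        /* The cone of a point holds the points one DAG level below it,
           e.g. the faces bounding a cell; supports point the other way */
        ierr = DMPlexGetConeSize(dm, c, &coneSize);CHKERRQ(ierr);
        ierr = DMPlexGetCone(dm, c, &cone);CHKERRQ(ierr);
        for (i = 0; i < coneSize; ++i) {
          ierr = PetscPrintf(PETSC_COMM_SELF, "cell %D -> point %D\n", c, cone[i]);CHKERRQ(ierr);
        }
      }
      ierr = DMDestroy(&dm);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }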


SLIDE 5

DMPlex: Mesh topology abstraction

DMPlex - PETSc’s unstructured mesh API¹

  • Input: ExodusII, Gmsh, CGNS, Fluent-Case
  • Output: VTK, HDF5 + Xdmf
    • Visualizable checkpoints
  • Parallel distribution (sketched below):
    • Partitioners: Chaco, Metis/ParMetis
    • Automated halo exchange via PetscSF
  • Mesh renumbering:
    • Reverse Cuthill-McKee (RCM)

[Figure: DAG representation of an example mesh, points numbered 1-14]

¹ M. Knepley and D. Karpeev. Mesh Algorithms for PDE with Sieve I: Mesh Distribution. Sci. Program., 17(3):215–230, August 2009.
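
A minimal sketch of the read/distribute/checkpoint path (assuming a PETSc build configured with HDF5; "mesh.msh" and "checkpoint.h5" are placeholder names):

    #include <petscdmplex.h>
    #include <petscviewerhdf5.h>

    int main(int argc, char **argv)
    {
      DM             dm, dmDist;
      PetscViewer    viewer;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      /* Read any supported format; the Gmsh file name is a placeholder */
      ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "mesh.msh", PETSC_TRUE, &dm);CHKERRQ(ierr);
      /* Partition and migrate; the optional PetscSF output (NULL here)
         describes the point migration used for halo exchange */
      ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);
      if (dmDist) { ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist; }
      /* Checkpoint the mesh in HDF5; pairing it with an Xdmf description
         makes the checkpoint visualizable */
      ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "checkpoint.h5", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
      ierr = DMView(dm, viewer);CHKERRQ(ierr);
      ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
      ierr = DMDestroy(&dm);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }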


SLIDE 6

Fluidity-DMPlex Integration

Fluidity

  • Unstructured finite element code
  • Anisotropic mesh adaptivity
  • Uses PETSc as linear solver engine
  • Applications: CFD, geophysical flows, ocean modelling, reservoir modelling, mining, nuclear safety, renewable energies, etc.

Bottleneck: Parallel pre-processing¹


¹ X. Guo, M. Lange, G. Gorman, L. Mitchell, and M. Weiland. Developing a scalable hybrid MPI/OpenMP unstructured finite element model. Computers & Fluids, 110:227–234, 2015. ParCFD 2013.


SLIDE 7

Fluidity - DMPlex Integration

[Diagram: evolution of the Fluidity startup pipeline]

  • Original: Mesh → Preprocessor (with Zoltan) → per-process Mesh + Fields → Fluidity
  • Current: Mesh → DMPlex → DMPlexDistribute → DMPlex + Fields → Fluidity
  • Goal: Mesh → DMPlex → DMPlexDistribute → DMPlex + Fields → Fluidity, with runtime Load Balance


SLIDE 8

Fluidity - DMPlex Integration

DMPlexDistribute

  • Before:
    • One-to-many
    • Single-level overlap
    • Overlap is expensive
  • After (sketched below):
    • Generic mesh migration
    • Parallel N-level overlap
    • All-to-all via ParMetis
    • Available to other codes: Firedrake, Moose, . . .

[Figure: DMPlexDistribute on unit cube (20483 cells); time vs. number of processors (2-96) for Distribute, DistributeOverlap, Distribute::Mesh Partition, and Distribute::Mesh Migration]

[Figure: Load balancing a unit cube (20483 cells); time vs. number of processors (2-96) for Redistribute::Mesh Partition and Redistribute::Mesh Migration]
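
A minimal sketch of the two-stage distribution ("mesh.msh" is a placeholder name): distribute without overlap first, then grow an N-level overlap in parallel with DMPlexDistributeOverlap.

    #include <petscdmplex.h>

    int main(int argc, char **argv)
    {
      DM             dm, dmDist, dmOverlap;
      PetscSF        sf;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "mesh.msh", PETSC_TRUE, &dm);CHKERRQ(ierr);
      /* Distribute the serial mesh with no overlap */
      ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);
      if (dmDist) { ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist; }
      /* Grow a 2-level overlap in parallel; the returned PetscSF maps
         the newly shared points for halo exchange */
      ierr = DMPlexDistributeOverlap(dm, 2, &sf, &dmOverlap);CHKERRQ(ierr);
      if (dmOverlap) {
        ierr = PetscSFDestroy(&sf);CHKERRQ(ierr);
        ierr = DMDestroy(&dm);CHKERRQ(ierr);
        dm   = dmOverlap;
      }
      ierr = DMDestroy(&dm);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

In recent PETSc releases the partitioner can typically be selected at runtime, e.g. with -petscpartitioner_type parmetis.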


SLIDE 9

Fluidity-DMPlex Integration

Mesh reordering

  • Fluidity halos:
    • Separate L1/L2 regions
    • “Trailing receives”
    • Requires permutation
  • DMPlex provides RCM (sketched below):
    • Generated locally
    • Fields inherit reordering
    • Better cache locality

[Figure: native ordering vs. RCM reordering, serial and parallel]
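
A minimal sketch of local RCM renumbering with the standard DMPlexGetOrdering/DMPlexPermute calls (placeholder file name):

    #include <petscdmplex.h>

    int main(int argc, char **argv)
    {
      DM             dm, dmPerm;
      IS             perm;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "mesh.msh", PETSC_TRUE, &dm);CHKERRQ(ierr);
      /* Compute a Reverse Cuthill-McKee permutation of the local points */
      ierr = DMPlexGetOrdering(dm, MATORDERINGRCM, NULL, &perm);CHKERRQ(ierr);
      /* Rebuild the DM with renumbered points; sections and fields
         created from the permuted DM inherit the ordering */
      ierr = DMPlexPermute(dm, perm, &dmPerm);CHKERRQ(ierr);
      ierr = ISDestroy(&perm);CHKERRQ(ierr);
      if (dmPerm) { ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmPerm; }
      ierr = DMDestroy(&dm);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }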


SLIDE 10

Benchmark

Archer

  • Cray XC30
  • 4920 nodes (118,080 cores)
  • Two 12-core Intel Xeon E5-2697 v2 (Ivy Bridge) processors per node

Simulation

  • Flow past a square cylinder
  • 3D mesh, generated with Gmsh


SLIDE 11

Results - Simulation Startup

Startup on 4 nodes

  • Runtime distribution wins
  • Fast topology distribution
  • No clear I/O gains
  • Gmsh file reading does not scale

[Figure: Fluidity Startup - File I/O; time vs. mesh size (8615 to 2944992 elements) for Preprocessor-Read, Preprocessor-Write, Fluidity-Read, Fluidity-Total, and DMPlex-DAG]

[Figure: Fluidity Startup; time vs. mesh size for Preprocessor, Fluidity (preprocessed), Fluidity Total, and Fluidity-DMPlex]

[Figure: Fluidity Startup - Distribute; time vs. mesh size for Zoltan + Callbacks and DMPlexDistribute]


SLIDE 12

Results - Simulation Performance

Performance

  • Mesh with ~2 million elements
  • Preprocessor + 10 timesteps
  • RCM brings improvements in:
    • Pressure solve
    • Velocity assembly

[Figures: Pressure Solve, Full Simulation, and Velocity Assembly; time vs. number of processes (2-96) for Fluidity-DMPlex (RCM), Fluidity-DMPlex (native), and Fluidity-Preprocessor]


SLIDE 13

Discussion and Future Work

DMPlex mesh management for Fluidity

  • No need to preprocess
  • Increased interoperability: ExodusII, CGNS, Fluent-Case
  • Performance benefits:
    • Fast runtime mesh distribution
    • Optional RCM renumbering

Future work

  • DMPlex-based checkpointing in Fluidity
  • Scalable parallel mesh reads with DMPlex
  • Anisotropic mesh adaptivity via DMPlex
