
Asynchronous Distributed-Memory Task-Parallel Algorithm for Compressible Flows on 3D Unstructured Grids



  1. Asynchronous Distributed-Memory Task-Parallel Algorithm for Compressible Flows on 3D Unstructured Grids
     J. Bakosi, M. Charest, A. Pandare, J. Waltz
     Los Alamos National Laboratory, Los Alamos, NM, USA
     October 20, 2020
     LA-UR-20-28309

  2. Project goals
     ◮ Large-scale Computational Fluid Dynamics (CFD) capability
     ◮ Simulation use cases
        ◮ shocked flow over surrogate reentry bodies
        ◮ blast loading on vehicles or other complex structures
        ◮ weapons effects calculations in urban environments
     ◮ Distinguishing characteristics
        ◮ external flows over complex 3D geometries
        ◮ high-speed compressible flow
     ◮ Capability requirements compared to internal flow calculations
        ◮ complex domain must be explicitly meshed (rather than modeled)
        ◮ multiple orders of magnitude larger computational meshes
        ◮ larger demand for HPC: O(10^9) cells, O(10^4) CPUs must be routine calculations

  3. Quinoa::Inciter: Built on Charm++
     ◮ Compressible hydro (single or multiple materials)
     ◮ Unstructured 3D (tetrahedra only) grids
     ◮ Continuous and discontinuous Galerkin finite elements
     ◮ Adaptive: mesh refinement (WIP), polynomial-degree refinement
     ◮ Native Charm++ code interoperating with MPI libs
     ◮ Overdecomposition
     ◮ Parallel I/O
     ◮ SMP, non-SMP
     ◮ Automatic load balancing
     ◮ Open source: quinoacomputing.org

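  4. Quinoa::Inciter: ALECG hydro scheme, numerical method
     ◮ Edge-based finite element (or node-centered finite volume) method
     ◮ Compressible single-material (Euler, ideal gas) flow

       \[
       \frac{\partial U}{\partial t} + \frac{\partial F_j}{\partial x_j} = 0, \qquad
       U = \begin{pmatrix} \rho \\ \rho u_i \\ \rho E \end{pmatrix}, \qquad
       F_j = \begin{pmatrix} \rho u_j \\ \rho u_i u_j + p\,\delta_{ij} \\ u_j (\rho E + p) \end{pmatrix}
       \]

     ◮ Galerkin lumped-mass, locally conservative formulation

       \[
       \frac{\mathrm{d}U^v}{\mathrm{d}t} = -\frac{1}{V^v} \sum_j \left[ \sum_{vw \in v} D^{vw}_j F^{vw}_j + \sum_{vw \in v} B^{vw}_j \left( F^v_j + F^w_j \right) + B^v_j F^v_j \right]
       \]

       \[
       U(\vec{x}) = \sum_{v \in \Omega_h} N^v(\vec{x})\, U^v, \qquad
       D^{vw}_j = \frac{1}{2} \sum_{\Omega_h \in vw} \int_{\Omega_h} \left( N^v \frac{\partial N^w}{\partial x_j} - N^w \frac{\partial N^v}{\partial x_j} \right) \mathrm{d}\Omega
       \]

       \[
       B^{vw}_j = \frac{1}{2} \sum_{\Gamma_h \in vw} \int_{\Gamma_h} N^v N^w n_j \,\mathrm{d}\Gamma, \qquad
       B^v_j = \sum_{\Gamma_h \in v} \int_{\Gamma_h} N^v N^v n_j \,\mathrm{d}\Gamma
       \]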
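
As a concrete reading of the Euler flux F_j on this slide, the following Python sketch evaluates it for one conserved state. It assumes the standard ideal-gas closure p = (γ − 1)(ρE − ½ρ u_k u_k) with γ = 1.4; neither the closure's exact form nor the value of γ is given on the slide, and the function names are illustrative rather than taken from Quinoa.

```python
GAMMA = 1.4  # assumed ratio of specific heats (not stated on the slide)

def pressure(U, gamma=GAMMA):
    """Ideal-gas pressure from U = [rho, rho*u1, rho*u2, rho*u3, rho*E]."""
    rho, m1, m2, m3, rhoE = U
    kinetic = 0.5 * (m1 * m1 + m2 * m2 + m3 * m3) / rho
    return (gamma - 1.0) * (rhoE - kinetic)

def euler_flux(U, j, gamma=GAMMA):
    """Flux vector F_j in coordinate direction j (0, 1, or 2)."""
    rho, rhoE = U[0], U[4]
    mom = U[1:4]
    p = pressure(U, gamma)
    uj = mom[j] / rho
    # Mass, momentum (three components), and energy entries of F_j
    F = [mom[j]] + [m * uj for m in mom] + [uj * (rhoE + p)]
    F[1 + j] += p  # pressure enters only where i == j (the delta_ij term)
    return F

# Fluid at rest: only the pressure term in the j-th momentum entry survives.
print(euler_flux([1.0, 0.0, 0.0, 0.0, 2.5], 0))
```

For a fluid at rest, every entry vanishes except the pressure contribution in the momentum component aligned with j, which is a quick sanity check on the δ_ij term.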

  5. Quinoa::Inciter: ALECG hydro scheme, References I
     ◮ [1, 2, 3]
     [1] J. Waltz, N. Morgan, T.R. Canfield, M.R.J. Charest, L.D. Risinger, and J.G. Wohlbier. A three-dimensional finite element arbitrary Lagrangian-Eulerian method for shock hydrodynamics on unstructured grids. Computers & Fluids, 92:172–187, 2014.
     [2] J. Waltz, T.R. Canfield, N.R. Morgan, L.D. Risinger, and J.G. Wohlbier. Verification of a three-dimensional unstructured finite element method using analytic and manufactured solutions. Computers & Fluids, 81:57–67, 2013.
     [3] J. Waltz, T.R. Canfield, N.R. Morgan, L.D. Risinger, and J.G. Wohlbier. Manufactured solutions for the three-dimensional Euler equations with relevance to Inertial Confinement Fusion. J. Comp. Phys., 267:196–209, 2014.

  6. Solution verification: Vortical flow
     [Plot: log(L2) error vs. log(h) for ρ, ρu1, ρu2, ρu3, and ρE, with a 2nd-order reference line.]
     Figure: Left: initial (first column) and final (second column) velocity, pressure (third column), and total energy distributions (fourth column). Right: L2 errors as a function of mesh resolution.
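
Comparing an error curve with a 2nd-order reference line amounts to estimating the slope of log(error) against log(h). The Python sketch below shows two common ways to do that: a two-mesh estimate and a least-squares fit over several meshes. The mesh sizes and errors in the usage lines are synthetic values chosen to follow error = 0.5 h^2 exactly; they are not data from the slides.

```python
import math

def observed_order(h_coarse, err_coarse, h_fine, err_fine):
    """Convergence order estimated from two mesh resolutions."""
    return math.log(err_coarse / err_fine) / math.log(h_coarse / h_fine)

def fitted_order(hs, errs):
    """Least-squares slope of log(err) versus log(h) over several meshes."""
    xs = [math.log(h) for h in hs]
    ys = [math.log(e) for e in errs]
    xm = sum(xs) / len(xs)
    ym = sum(ys) / len(ys)
    num = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    den = sum((x - xm) ** 2 for x in xs)
    return num / den

# Synthetic example (not slide data): errors follow 0.5 * h**2.
hs = [0.1, 0.05, 0.025]
errs = [0.5 * h ** 2 for h in hs]
print(observed_order(hs[0], errs[0], hs[1], errs[1]))
print(fitted_order(hs, errs))
```

Both estimates should come out close to 2 for the synthetic input; on real data the fitted slope is usually the more robust of the two because it uses every mesh.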

  7. Solution verification: Sedov
     [Plots: density vs. x for Mesh 1 through Mesh 4 compared with the semi-analytic solution; log(L1) error in rho vs. log(h), fitted slope = 0.9592, with a 1st-order reference line.]

  8. Solution validation: square cavity, domain and initial conditions
     [Diagram: domain with regions labeled State 1 and State 2 and annotated dimensions.]
     Figure: Domain and initial conditions for square cavity problem. Dimensions are in cm.

  9. Solution validation: square cavity, solution with experimental data
     Figure: Solutions with increasingly finer meshes for the square cavity problem. Lines S1, Sr1, and Sr2 denote experimental shock positions.

  10. Solution validation: ONERA M6 wing, mesh and numerical solution
      Figure: Top: upper and lower surface mesh used for the ONERA M6 wing configuration. Bottom: computed pressure contours on the upper and lower surface.

  11. Solution validation: ONERA M6 wing, simulation & experiments
      [Six panels: surface pressure coefficient, -C_p vs. x/c, at 20%, 44%, 65%, 80%, 90%, and 95% semispan. Each panel compares experiment, computation (coarse mesh), and computation (finer mesh).]
      Figure: Comparison between the computed and experimental surface pressure coefficient for the ONERA wing section at 20%, 44%, 65%, 80%, 90%, and 95% semispans.

  12. Quinoa::Inciter: ALECG, on-node performance

      Time step profile:
        phase     time (µs)    %
        rhs       8482724      91
        bgrad     34333        0.4
        diag      48549        0.5
        solve     40355        0.4
        total     27830000     100

      RHS profile:
        phase     time (µs)    %
        grad      1109746      51
        domain    677741       30
        bnd       2565
        src       413999       19
        total     2183459      100

  13. Quinoa::Inciter: ALECG, on-node performance improvements
      1. Remove unnecessary code for generating unused derived data structures: 1.6x.
      2. Replace a tree-based data structure with a flat one, enabling streaming-style (contiguous) access to normals associated with edges: 1.3x.
      3. Re-write the domain integral from a nested loop (over mesh points and over edges connected to a point) as a single loop over unique edges: 1.3x.
      4. Optimize data access in the source term: 1.4x.
      5. Re-write the loop computing primitive-variable gradients from a gather-scatter loop over elements to a nested loop over mesh points with an inner loop over edges connected to a point: 1.5x.
      Altogether: 6.2x speedup
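
The loop restructuring described for the domain integral can be illustrated with a small Python sketch. It is a toy under stated assumptions, not Quinoa's implementation: the edge coefficient is a scalar that is antisymmetric in its two endpoints (as the definition of D^{vw}_j implies), the "flux" on an edge is the stand-in f[v] + f[w], and all names and values are hypothetical.

```python
def coeff(d, v, w):
    """Antisymmetric edge coefficient; d stores each edge once, keyed (v, w) with v < w."""
    return d[(v, w)] if v < w else -d[(w, v)]

def rhs_point_loop(neighbors, d, f):
    """Nested loop: over points v, then over edges vw connected to v."""
    r = [0.0] * len(f)
    for v, nbrs in enumerate(neighbors):
        for w in nbrs:
            r[v] += coeff(d, v, w) * (f[v] + f[w])
    return r

def rhs_edge_loop(d, f):
    """Single loop over unique edges, scattering to both end points."""
    r = [0.0] * len(f)
    for (v, w), dvw in d.items():
        c = dvw * (f[v] + f[w])  # computed once per edge
        r[v] += c
        r[w] -= c                # antisymmetry: equal and opposite at w
    return r

# Hypothetical three-point mesh (one triangle) with made-up coefficients.
d = {(0, 1): 0.5, (0, 2): -0.25, (1, 2): 1.0}
neighbors = [[1, 2], [0, 2], [0, 1]]
f = [1.0, 2.0, 4.0]
print(rhs_point_loop(neighbors, d, f))
print(rhs_edge_loop(d, f))
```

Both functions produce the same result, but the edge loop evaluates each edge contribution once instead of twice and needs no per-point neighbor list. The contributions also sum to zero over all points, which mirrors the local conservation property of the edge-based form.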

  14. Quinoa::Inciter: 3 hydro schemes, strong scaling
      [Plot: Single-material hydro, 794M cells (100 time steps, no I/O). Axes: wall clock time, sec vs. number of CPUs (36/node), both on log scales. Series: CG, non-SMP; CG, SMP; DG(P1), non-SMP; DG(P1), SMP; ALECG, non-SMP; ALECG, SMP; ideal. Data points are labeled with CPU counts ranging from 360 to 50400.]
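
A strong-scaling plot like this one is usually summarized by speedup and parallel efficiency relative to the smallest run. The Python sketch below computes both; the CPU counts and timings in the usage lines are synthetic placeholders, not values read from the plot.

```python
def strong_scaling(cpus, times):
    """Return (cpus, speedup, efficiency) tuples relative to the first entry.

    speedup    = T_ref / T
    efficiency = (T_ref * P_ref) / (T * P), which is 1.0 on the ideal line
    """
    p_ref, t_ref = cpus[0], times[0]
    return [(p, t_ref / t, (t_ref * p_ref) / (t * p)) for p, t in zip(cpus, times)]

# Synthetic timings (not slide data): perfect scaling to 720 CPUs, 80% at 1440.
for p, speedup, eff in strong_scaling([360, 720, 1440], [100.0, 50.0, 31.25]):
    print(f"{p:6d} CPUs  speedup {speedup:5.2f}  efficiency {eff:4.2f}")
```

On a log-log plot, constant efficiency corresponds to a curve parallel to the ideal line, so a drop in efficiency shows up as the measured curve flattening away from it.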

  15. Quinoa::Inciter: Parallel load imbalance triggered by physics Figure: Spatial distributions of extra load in each cell whose fluid density exceeds the value of 1.5, during time evolution of the Sedov problem: (left) shortly after the onset of load imbalance, (right) at a later time of the simulation.

  16. Quinoa::Inciter: Automatic load balancing yields 10x speedup
      [Plot: grind-time, ms/timestep vs. time step (0 to 500). Series: no extra load, virt=0, noLB; no extra load, virt=100x, noLB; extra load, virt=0, noLB; extra load, virt=10x, GreedyCommLB; extra load, virt=100x, GreedyCommLB; extra load, virt=100x, DistributedLB; extra load, virt=100x, NeighborLB.]
      Figure: Grind-time during time stepping computing a Sedov problem with load imbalance, using various built-in load balancers in Charm++. Run on 10 compute nodes with 36 CPUs/node.
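
To illustrate why having more work units than processors gives a load balancer room to act, here is a minimal Python sketch of a generic greedy heuristic: assign the heaviest unit first, always to the currently least-loaded processor. It is not a reproduction of any Charm++ balancer named in the legend, it ignores communication costs and migration overhead, and the loads are made-up numbers.

```python
import heapq

def greedy_assign(loads, nproc):
    """Heaviest work unit first, each to the currently least-loaded processor."""
    heap = [(0.0, p) for p in range(nproc)]  # (accumulated load, processor id)
    heapq.heapify(heap)
    assignment = [None] * len(loads)
    for unit in sorted(range(len(loads)), key=lambda i: -loads[i]):
        total, p = heapq.heappop(heap)
        assignment[unit] = p
        heapq.heappush(heap, (total + loads[unit], p))
    return assignment

def max_load(loads, assignment, nproc):
    """Load of the busiest processor, which sets the time per step."""
    totals = [0.0] * nproc
    for load, p in zip(loads, assignment):
        totals[p] += load
    return max(totals)

# Hypothetical loads: 8 work units on 2 processors, some units carry extra work.
loads = [4.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0]
block = [0, 0, 0, 0, 1, 1, 1, 1]  # static split without balancing
print(max_load(loads, block, 2))
print(max_load(loads, greedy_assign(loads, 2), 2))
```

With only one work unit per processor there would be nothing to reassign, so the imbalance would be locked in. Splitting the work into more units lets the heuristic even out the totals.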

  17. Current and future work
      1. Multi-material FV/DG at large scales
      2. P-adaptation
      3. Productization (SBIR, PI: Charmworks)
      4. 3D mesh-to-mesh solution transfer toward large-scale fluid-structure interaction (see next talk by Eric Mikida)
