
Load Balancing and Data Migration in a Hybrid Computational Fluid Dynamics Application



  1. Load Balancing and Data Migration in a Hybrid Computational Fluid Dynamics Application
     Esteban Meneses, Patrick Pisciuneri
     Center for Simulation and Modeling (SaM), University of Pittsburgh

  2. University of Pittsburgh
     • High Performance Computing
     • Computer Science
     • Scientific Computing
     Load Balancing in a CFD Application

  3. Center for Simulation and Modeling (SaM)
     • Frank; HPC researchers/consultants
     • Research, Technical, Educational
     • 521 users; 8,040 cores
     • Sciences, Health, Engineering
     • 91% utilization in 2014

  4. IPLMCFD
     • A massively parallel solver for turbulent reactive flows.
     • Large eddy simulation (LES) via filtered density function (FDF).

  5. Load Imbalance
     • IPLMCFD uses a graph partitioning library (METIS) to redistribute work.
     • Execution must be split between calls to repartition cells.

  6. Reasons for Load Imbalance in CFD
     • Traditional: Adaptive Mesh Refinement (Langer et al., SBAC-PAD, 2012).
     • IPLMCFD: Chemical Reaction.
     • Approaches:
       ❖ Charm++
       ❖ Zoltan
       ❖ Task-parallel

  7. Agenda
     • IPLMCFD: A Hybrid Computational Fluid Dynamics Application
     • Zoltan Library
     • PaSR Benchmark
     • Zoltan vs Charm++ Comparison

  8. Hybrid CFD Application
     • IPLMCFD: Irregularly Portioned Lagrangian Monte Carlo Finite Difference.
     • Domain divided into cells, the atomic distribution unit.
     • Ensemble of cells:
       • Same number of FD points.
       • Same number of MC particles.

  9. Computational Fluid Dynamics
     #Grids   | #Particles | #Species | Required memory (GB) | GFLOP per iteration | #Iterations | Serial run-time (1 GFLOP/s)
     10^6     | 6 × 10^6   | 9        | 1.69                 | 29.5                | 60,000      | 20.5 days
     10^6     | 6 × 10^6   | 19       | 2.48                 | 90.7                | 60,000      | 63 days
     5 × 10^6 | 50 × 10^6  | 19       | 24.0                 | 544.7               | 220,000     | 3.8 years
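
The serial run-times on this slide can be checked with a few lines of arithmetic. The sketch below assumes that 29.5, 90.7 and 544.7 are the GFLOP-per-iteration values (an assumption about the flattened column order, although it is the reading that reproduces the listed run-times) and that the machine sustains exactly 1 GFLOP/s. The function name and the 365-day year are illustrative choices, not taken from the slides.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365  # assumed convention for converting days to years


def serial_runtime_days(gflop_per_iteration, iterations, gflops_per_second=1.0):
    """Estimated serial run time in days at a sustained floating-point rate."""
    seconds = gflop_per_iteration * iterations / gflops_per_second
    return seconds / SECONDS_PER_DAY


# (GFLOP per iteration, number of iterations), as read from the slide
cases = [
    (29.5, 60_000),
    (90.7, 60_000),
    (544.7, 220_000),
]

runtimes = [serial_runtime_days(g, n) for g, n in cases]
for (g, n), days in zip(cases, runtimes):
    print(f"{g} GFLOP/iter x {n} iters -> {days:.1f} days "
          f"({days / DAYS_PER_YEAR:.1f} years)")
```

Rounded, the three cases come out at about 20.5 days, 63 days and 3.8 years, which matches the slide.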

  10. Code Structure (diagram)
     • Iplmcfd (Ipfd, Iplmc): 10,101 LOC
     • C++ Interface: 3,091 LOC
     • Languages: C++, MPI, Fortran/C
     • Libraries: Metis, TVMet, Chemkin, ODE Pack

  11. IPLMCFD
     • A scalable algorithm for hybrid Eulerian/Lagrangian solvers.
     • Goals:
       • Balance the computational load among processors through weighted graph partitioning.
       • Minimize the number of adjacent elements assigned to different processors (minimize the edge-cut).
     • Irregularly shaped decompositions:
       • Disadvantages:
         • Nontrivial communication patterns.
         • Increased communication cost.
       • Advantage (major):
         • Evenly distributed load among partitions.
     P. H. Pisciuneri et al., SIAM J. Sci. Comput., vol. 35, no. 4, pp. C438-C452 (2013).
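
The two goals on this slide (balanced weighted load, small edge-cut) can be made concrete by scoring a partition with two metrics. The following is a minimal sketch on a hypothetical 2×4 grid of cells. The cell weights, the edge list and both partitions are invented for illustration and are not IPLMCFD data, and the sketch only evaluates given partitions; it does not perform the partitioning that METIS or Zoltan would.

```python
from collections import defaultdict


def edge_cut(edges, part):
    """Total weight of edges whose two endpoints lie in different parts."""
    return sum(w for u, v, w in edges if part[u] != part[v])


def load_imbalance(weights, part):
    """Heaviest part load divided by the mean load (1.0 = perfectly balanced)."""
    load = defaultdict(float)
    for cell, w in weights.items():
        load[part[cell]] += w
    mean = sum(load.values()) / len(load)  # mean over non-empty parts
    return max(load.values()) / mean


# Hypothetical 2x4 grid, cells numbered row by row:
#   0 1 2 3
#   4 5 6 7
# Cells 2 and 3 are heavier, standing in for expensive chemistry.
weights = {0: 1, 1: 1, 2: 2, 3: 4, 4: 1, 5: 1, 6: 1, 7: 1}
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (4, 5, 1), (5, 6, 1), (6, 7, 1),
         (0, 4, 1), (1, 5, 1), (2, 6, 1), (3, 7, 1)]

# Regular split: left half versus right half of the grid.
regular = {0: 0, 1: 0, 4: 0, 5: 0, 2: 1, 3: 1, 6: 1, 7: 1}
# Irregular split: the two heavy cells form their own part.
irregular = {0: 0, 1: 0, 4: 0, 5: 0, 6: 0, 7: 0, 2: 1, 3: 1}

for name, part in [("regular", regular), ("irregular", irregular)]:
    print(name, edge_cut(edges, part), round(load_imbalance(weights, part), 3))
```

In this toy case the irregular split balances the load exactly while cutting one more edge than the regular split, which is the trade-off the slide describes: even load at the price of more communication.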

  12. Strong Scaling
     • Geometry:
       • 2.5 million FD points
       • 20 million MC particles
     • Chemistry: 9 species, 5-step
     • Top:
       • Unbalanced: 22% efficiency (9K cores)
       • IPLMCFD: 76% efficiency (9K cores)
     • Bottom:
       • Performance of IPLMCFD improves as the number of MC particles increases
       • IPLMCFD: 84% efficiency at 9K processors for 40M particles
     • Timing:
       • The average of 10 iterations immediately after load balancing

  13. Simulation of a Premixed Flame

  14. Temporal Performance of IPLMCFD
     • Unbalanced: approx. static performance
     • IPLMCFD: variable performance
       • Load balancing is performed approx. every 2000 iterations
       • Optimal performance immediately after load balancing
       • Performance degrades in time
     • Potential walltime savings afforded by IPLMCFD for this example: T_Unbalanced - T_IPLMCFD = 30 hours

  15. Cost of Repartitioning
     • Naïve approach:
       • Immediately before load balancing, checkpoint the entire simulation
       • Restart the simulation with a new decomposition
       • Costly, involves:
         • Writing to shared filesystem
         • Simulation cleanup
         • Simulation startup
         • Reading from shared filesystem
       • Does not scale
       • O(10^2 – 10^3) iterations in cost
     • Optimal approach:
       • Repartitioning should be handled in memory
       • The new partition is aware of the previous partition, thus minimal data movement and interruption
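
The in-memory approach hinges on moving only the cells whose owner changes between the old and the new partition. A minimal sketch of that bookkeeping is given below, assuming each partition is available as a plain mapping from cell id to owning rank. The function name and the sample data are hypothetical, and a real code would follow this step with the actual message exchange.

```python
def migration_plan(old_owner, new_owner):
    """Map (source rank, destination rank) to the cell ids that must move."""
    plan = {}
    for cell, src in old_owner.items():
        dst = new_owner[cell]
        if src != dst:  # cells that keep their owner are never touched
            plan.setdefault((src, dst), []).append(cell)
    return plan


# Hypothetical ownership of six cells on three ranks, before and after.
old_owner = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
new_owner = {0: 0, 1: 1, 2: 1, 3: 1, 4: 1, 5: 2}

plan = migration_plan(old_owner, new_owner)
moved = sum(len(cells) for cells in plan.values())
print(plan)
print(f"{moved} of {len(old_owner)} cells migrate")
```

Here only two of the six cells change rank, whereas the checkpoint-and-restart path writes and re-reads all of them through the shared filesystem.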

  16. Zoltan
     • “A toolkit of parallel combinatorial algorithms for unstructured and/or adaptive computations”.
     • Sandia-OSU collaboration since 2000.
     • Part of Trilinos package.
     • Zoltan2 project in C++.
     • Features:
       • Dynamic load balancing
       • Parallel repartitioning
       • Data migration tools
       • Distributed data directories
       • Unstructured communication
       • Dynamic memory management

  17. Zoltan IPLMCFD
     • Zoltan’s callback function interface.
     • Methodology:
       ❖ Atomic unit ⟶ cell (irregular subdomains).
       ❖ Data registration ⟶ number of objects, object weights.
       ❖ Graph management ⟶ number of edges, edge weights.
       ❖ Migration ⟶ pack/unpack functions.
       ❖ Load balancing ⟶ partition, repartition, refinement.
       ❖ Global information ⟶ distributed data directory.
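
The callback style listed above means the application keeps its own data structures and the library pulls object counts and weights through registered functions. The sketch below imitates that pattern in plain Python. `ToyBalancer`, its method names and the callback keys are hypothetical and are not Zoltan's API, and the partitioner is a simple greedy heuristic that ignores graph edges rather than any of Zoltan's algorithms.

```python
class ToyBalancer:
    """Hypothetical stand-in for a callback-driven load balancer."""

    def __init__(self):
        self.callbacks = {}

    def register(self, name, fn):
        """Store an application-supplied query function under a name."""
        self.callbacks[name] = fn

    def partition(self, num_parts):
        # The balancer pulls data through the callbacks instead of owning it.
        ids, weights = self.callbacks["object_list"]()
        assert len(ids) == self.callbacks["num_objects"]()
        loads = [0.0] * num_parts
        assignment = {}
        # Greedy heuristic: heaviest object first, onto the least-loaded part.
        for obj, w in sorted(zip(ids, weights), key=lambda pair: -pair[1]):
            target = loads.index(min(loads))
            assignment[obj] = target
            loads[target] += w
        return assignment, loads


# Application side: cells and their weights stay in the application's dict.
cells = {10: 3.0, 11: 1.0, 12: 2.0, 13: 2.0}

lb = ToyBalancer()
lb.register("num_objects", lambda: len(cells))
lb.register("object_list", lambda: (list(cells), list(cells.values())))

assignment, loads = lb.partition(num_parts=2)
print(assignment, loads)
```

A graph-aware version would register further callbacks for edge counts and edge weights, mirroring the graph management entry in the list.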

  18. Charm++ IPLMCFD
     • Goal: fully exploit Charm++ features.
     • Methodology:
       ❖ Atomic unit ⟶ subdomain (regular subdomains).
       ❖ Containing class ⟶ 3D chare array.
       ❖ Process-based data ⟶ chare group.
       ❖ Communication ⟶ outermost level.
       ❖ Structured control flow ⟶ Structured Dagger.
       ❖ Migration ⟶ PUP methods.
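
The idea behind PUP (pack/unpack) methods is that one routine lists an object's fields once and that same routine serves both serialization and deserialization. The sketch below is a Python analogy of that idea only. Charm++ itself is C++, and the `Packer`, `Unpacker` and `Cell` classes here are hypothetical illustrations rather than Charm++ types.

```python
import struct


class Packer:
    """Appends length-prefixed arrays of doubles to a byte buffer."""

    def __init__(self):
        self.buf = bytearray()

    def doubles(self, values):
        self.buf += struct.pack("<I", len(values))
        self.buf += struct.pack(f"<{len(values)}d", *values)
        return values  # packing leaves the field unchanged


class Unpacker:
    """Reads arrays back in the same order in which they were packed."""

    def __init__(self, buf):
        self.buf, self.pos = bytes(buf), 0

    def doubles(self, _current):
        (n,) = struct.unpack_from("<I", self.buf, self.pos)
        self.pos += 4
        values = list(struct.unpack_from(f"<{n}d", self.buf, self.pos))
        self.pos += 8 * n
        return values


class Cell:
    def __init__(self, fd_points=(), particles=()):
        self.fd_points = list(fd_points)
        self.particles = list(particles)

    def pup(self, p):
        # One traversal of the fields serves both directions.
        self.fd_points = p.doubles(self.fd_points)
        self.particles = p.doubles(self.particles)


src = Cell([1.0, 2.0], [0.5, 0.25, 0.125])
packer = Packer()
src.pup(packer)                # serialize on the sending side

dst = Cell()
dst.pup(Unpacker(packer.buf))  # rebuild on the receiving side
print(dst.fd_points, dst.particles)
```

Because the field list is written once, adding a field to `pup` updates packing and unpacking together, which is one plausible reason migration code can stay short.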

  19. Partially Stirred Reactor (PaSR)
     • Parameters:
       • IC: Stoichiometric mixture of methane and air reacted until equilibrium (T ≈ 2230 K)
       • Simulation duration: t_end = 10 τ_res
     • Realizability:
       • Lower bound, no mixing
       • Upper bound, perfectly stirred
     • Diagram labels: 100% air, 300 K; 60% CH4 / 40% air, 300 K; products.

  20. Dynamic Load-Balancing (figure panels: Static Partition, Dynamic Partitioning)

  21. Strong Scaling
     • Parameters:
       ❖ 10,000 particles
       ❖ Chemistry: 9 species, 5-step
     • Timings over the entire simulation (Stampede)
       ❖ The Zoltan and Charm++ timings include all overhead associated with repartitioning and data migration
     (plot series: Zoltan, Charm++)

  22. Programming Effort
                             | Zoltan IPLMCFD | Charm++ IPLMCFD
     Startup                 | 39             | 0
     Object Graph Management | 80             | 0
     Data Migration          | 427            | 61
     Load Balancing          | 40             | 3
     Measured in lines of code (LOC)
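
The per-category counts can be summed to compare overall infrastructure effort. The short sketch below does only that arithmetic on the numbers from the slide. The totals and the ratio are derived here and do not appear in the presentation.

```python
# Lines of code per category: (Zoltan IPLMCFD, Charm++ IPLMCFD)
effort = {
    "Startup": (39, 0),
    "Object Graph Management": (80, 0),
    "Data Migration": (427, 61),
    "Load Balancing": (40, 3),
}

zoltan_total = sum(z for z, _ in effort.values())
charm_total = sum(c for _, c in effort.values())

print(f"Zoltan: {zoltan_total} LOC, Charm++: {charm_total} LOC, "
      f"ratio {zoltan_total / charm_total:.1f}x")
```

Data migration dominates both columns, so most of the difference comes from the pack/unpack code.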

  23. Charm++ Wishlist
     • MPI ⟶ Charm++ migration guide:
       ❖ Instructions on using Charm++ with build systems.
       ❖ Translating common MPI programming patterns.
       ❖ Dealing with communication operations.
       ❖ Highlighting opportunities for improvement.
     • Parallel I/O documentation.
     • Accelerator programming documentation.

  24. Conclusions
     • Competitive performance between Zoltan and Charm++ for adaptive simulations of turbulent reactive flows.
     • Charm++ alleviates programming effort of infrastructure for adaptive computation.
     Thank You! Q&A
