Vector Load Balancing in Charm++

Ronak Buch (rabuch2@illinois.edu), Parallel Programming Laboratory, University of Illinois at Urbana-Champaign. October 21, 2020, 18th Annual Workshop on Charm++ and Its Applications.


  1. Vector Load Balancing in Charm++
     Ronak Buch (rabuch2@illinois.edu)
     Parallel Programming Laboratory, University of Illinois at Urbana-Champaign
     October 21, 2020, 18th Annual Workshop on Charm++ and Its Applications

  2. Load Balancing
     • Load balancing is a hallmark of Charm++
     • Performance is often limited by the maximum load on a PE
     • The RTS measures load and migrates objects in response
     • Dynamic, irregular applications have been able to achieve high performance and scalability because of it

  3. What is Load?
     • Load is really just a proxy value we use to reason about performance
       ◦ In truth, we want to minimize execution time
       ◦ Unbalanced, fast program > balanced, slow program
     • CPU time per object by itself is often a sufficient metric for this value
     • However, in the same way that measuring cache misses or pipeline stalls improves upon merely profiling, sometimes more detail is helpful

  4. Vector Load Balancing
     • Rather than being a single value, load is now a vector of multiple values
       ◦ Store vector loads in LBDatabase
       ◦ Pass vector loads to strategies
       ◦ Use vector loads in strategies
     • Can be used generically: for various hardware measurements (CPU/GPU/network/memory), discrete parts of an iteration, application-specific parameters, etc.

  5. Vector Strategies
     • Extra dimensionality makes vector load balancing computationally difficult
     • Objects can no longer be totally ordered
     • Want to minimize the maximum in each dimension
     • NP-complete problem, so only interested in approximations
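
The two central claims of this slide (no total order, and a per-dimension maximum as the objective) can be illustrated with a small self-contained C++ sketch. The names `Load`, `dominatedBy`, and `maxPerDimension` are hypothetical and are not part of Charm++; the sketch only assumes that a load is a fixed-length array of non-negative values.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size load vector; D is the number of load dimensions.
template <std::size_t D>
using Load = std::array<double, D>;

// True if a is no larger than b in every dimension (componentwise order).
// Two loads can fail this test in both directions, which is why vector
// loads cannot be totally ordered the way scalar loads can.
template <std::size_t D>
bool dominatedBy(const Load<D>& a, const Load<D>& b) {
  for (std::size_t i = 0; i < D; ++i)
    if (a[i] > b[i]) return false;
  return true;
}

// Per-dimension maximum over all PE loads: the quantity a vector strategy
// tries to keep small in every dimension at once.
template <std::size_t D>
Load<D> maxPerDimension(const std::vector<Load<D>>& peLoads) {
  assert(!peLoads.empty());
  Load<D> result = peLoads.front();
  for (const auto& pe : peLoads)
    for (std::size_t i = 0; i < D; ++i)
      result[i] = std::max(result[i], pe[i]);
  return result;
}
```

For example, the loads (2, 0) and (0, 3) are incomparable under `dominatedBy`, so neither can be called "the heavier object" without first choosing some way to collapse the vector into a single number.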

  6. Vector Strategies
     • A simple strategy finds the object with the global maximum load dimension and places it on the PE with minimum load in that dimension
       ◦ Only works well when each object has load in only one dimension
     • For more realistic cases, have to consider the vector holistically
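
A minimal sketch of the simple strategy described on this slide, written as standalone C++ rather than against the Charm++ strategy interface. The function names and the `LoadVec` alias are hypothetical, and tie-breaking (lowest PE index wins) is an assumption the slide does not specify. Because an object's largest component never changes, sorting once by that component is equivalent to repeatedly picking the global maximum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

using LoadVec = std::vector<double>;  // one entry per load dimension

// Index of the largest component of a load vector (first one on ties).
std::size_t heaviestDimension(const LoadVec& v) {
  assert(!v.empty());
  return static_cast<std::size_t>(
      std::max_element(v.begin(), v.end()) - v.begin());
}

// Returns assignment[i] = PE chosen for object i. All PEs start empty.
std::vector<std::size_t> simpleVectorGreedy(const std::vector<LoadVec>& objs,
                                            std::size_t numPes,
                                            std::size_t dims) {
  assert(numPes > 0 && dims > 0);
  // Visit objects in decreasing order of their single largest component.
  std::vector<std::size_t> order(objs.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return objs[a][heaviestDimension(objs[a])] >
                            objs[b][heaviestDimension(objs[b])];
                   });

  std::vector<LoadVec> peLoad(numPes, LoadVec(dims, 0.0));
  std::vector<std::size_t> assignment(objs.size());
  for (std::size_t obj : order) {
    assert(objs[obj].size() == dims);
    const std::size_t d = heaviestDimension(objs[obj]);
    // Pick the PE that is least loaded in dimension d only.
    std::size_t best = 0;
    for (std::size_t pe = 1; pe < numPes; ++pe)
      if (peLoad[pe][d] < peLoad[best][d]) best = pe;
    for (std::size_t k = 0; k < dims; ++k) peLoad[best][k] += objs[obj][k];
    assignment[obj] = best;
  }
  return assignment;
}
```

Note that the PE choice looks at one dimension only, so load the object carries in its other dimensions is placed blindly; this is the limitation the slide points out.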

  7. Vector Strategies
     • Find the object with maximum p-norm and place it on the PE with minimum p-norm after placement
       ◦ Works well, but computationally expensive
       ◦ PE “weight” varies with object, i.e. ‖(2, 0)‖₂ < ‖(0, 3)‖₂, but when adding (3, 0), ‖(5, 0)‖₂ > ‖(3, 3)‖₂
     • Calculate the average load vector in d-space and create a normal hyperplane, then repeatedly allow the furthest PE below the hyperplane to choose an object
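
The first bullet (p-norm greedy) can be sketched as follows. This is an illustrative standalone version, not the TreeLB implementation: `pNorm`, `normGreedy`, and `LoadVec` are hypothetical names, and the brute-force scan over every PE for every object is what makes the approach expensive. The hyperplane strategy from the second bullet is not covered here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

using LoadVec = std::vector<double>;  // one entry per load dimension

// p-norm of a load vector, for p >= 1.
double pNorm(const LoadVec& v, double p) {
  double sum = 0.0;
  for (double x : v) sum += std::pow(std::fabs(x), p);
  return std::pow(sum, 1.0 / p);
}

// Places objects on PEs, largest norm first. peLoads holds the starting load
// of each PE and is updated in place. Returns assignment[i] = PE of object i.
std::vector<std::size_t> normGreedy(const std::vector<LoadVec>& objs,
                                    std::vector<LoadVec>& peLoads, double p) {
  assert(!peLoads.empty() && p >= 1.0);
  std::vector<std::size_t> order(objs.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return pNorm(objs[a], p) > pNorm(objs[b], p);
                   });

  std::vector<std::size_t> assignment(objs.size());
  for (std::size_t obj : order) {
    std::size_t best = 0;
    double bestNorm = 0.0;
    for (std::size_t pe = 0; pe < peLoads.size(); ++pe) {
      // Norm the PE would have if it received this object.
      LoadVec candidate = peLoads[pe];
      assert(candidate.size() == objs[obj].size());
      for (std::size_t k = 0; k < candidate.size(); ++k)
        candidate[k] += objs[obj][k];
      const double n = pNorm(candidate, p);
      if (pe == 0 || n < bestNorm) {
        best = pe;
        bestNorm = n;
      }
    }
    for (std::size_t k = 0; k < peLoads[best].size(); ++k)
      peLoads[best][k] += objs[obj][k];
    assignment[obj] = best;
  }
  return assignment;
}
```

With starting PE loads (2, 0) and (0, 3) and a single object (3, 0), the sketch picks the second PE even though it starts with the larger norm, because (3, 3) has a smaller 2-norm than (5, 0). This is the slide's point that a PE's "weight" depends on the object being placed, so PEs cannot simply be kept in a priority queue.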

  8. New Load Balancing APIs - Phase
     • Many applications have orthogonal phases within an iteration separated by barriers (or weaker synchronization)
     • New functions have been added to track phases for load balancing:
       ◦ void CkMigratable::CkLBSetPhase(int phase) - Until called again, all automatic LB measurements for the calling chare are attributed to the specified phase
       ◦ int CkMigratable::CkLBGetPhase() - Returns the current phase

  9. New Load Balancing APIs - Manual
     • Added new API for recording vector load data
       ◦ void CkMigratable::CkLBSetObjTime(LBRealType load, int dimension) - Sets the specified dimension of the vector load for the calling chare
       ◦ std::vector<LBRealType> CkMigratable::CkLBGetObjVectorLoad() - Returns the current vector load for the calling chare
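
To make the semantics of the phase and manual APIs concrete, here is a toy stand-in that can be compiled without Charm++. It is not the runtime's implementation. The class `ToyMigratable` and the `recordMeasured` hook are hypothetical, `LBRealType` is aliased to `double` as an assumption, and the mapping "phase index = vector dimension" is also an assumption made for illustration. Only the four member function names and signatures are taken from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using LBRealType = double;  // assumption: stand-in for the Charm++ typedef

// Toy stand-in for a migratable chare; NOT the Charm++ implementation.
class ToyMigratable {
 public:
  // Until called again, measured work is attributed to this phase.
  void CkLBSetPhase(int phase) {
    assert(phase >= 0);
    phase_ = phase;
  }
  int CkLBGetPhase() const { return phase_; }

  // Manually overwrite one dimension of the vector load.
  void CkLBSetObjTime(LBRealType load, int dimension) {
    assert(dimension >= 0);
    grow(dimension);
    load_[static_cast<std::size_t>(dimension)] = load;
  }
  std::vector<LBRealType> CkLBGetObjVectorLoad() const { return load_; }

  // Hypothetical hook standing in for the runtime's automatic timing:
  // adds measured time to the dimension of the current phase.
  void recordMeasured(LBRealType seconds) {
    grow(phase_);
    load_[static_cast<std::size_t>(phase_)] += seconds;
  }

 private:
  // Make sure the load vector has an entry for the given dimension.
  void grow(int dimension) {
    const std::size_t needed = static_cast<std::size_t>(dimension) + 1;
    if (load_.size() < needed) load_.resize(needed, 0.0);
  }

  int phase_ = 0;
  std::vector<LBRealType> load_;
};
```

In a real chare, the calls to `CkLBSetPhase` would presumably sit at the phase boundaries of each iteration, with the runtime doing the timing itself; `CkLBSetObjTime` would be used when the application knows a load value the runtime cannot measure.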

  10. Using Vector Strategies
     • Currently only strategies built on top of TreeLB support vector load balancing
       ◦ TreeLB is a new, flexible, optimized replacement for CentralLB and HybridLB
       ◦ Eventually all non-distributed strategies should use TreeLB
     • If vector loads are detected in the LB database, a vector version of the chosen strategy is automatically used if available

  11. Writing Vector Strategies
     • Objects and PEs are templated on dimension, replicated in a static constexpr field for external access
     • A specific dimension of Object or PE load is accessible with LBRealType getLoad(int dimension)
     • Template specialization allows the LB author to handle vector and non-vector cases

  12. Writing Vector Strategies

     template <typename O, typename P, typename S>
     class Example : public Strategy<O, P, S> {
     public:
       void solve(std::vector<O>& objs, std::vector<P>& procs,
                  S& solution, bool objsSorted) {
         // vector implementation
       }
     };

     template <typename P, typename S>
     class Example<Obj<1>, P, S> : public Strategy<Obj<1>, P, S> {
     public:
       void solve(std::vector<Obj<1>>& objs, std::vector<P>& procs,
                  S& solution, bool objsSorted) {
         // scalar implementation
       }
     };
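
The slide's snippet depends on TreeLB types that are not shown. The following self-contained sketch supplies hypothetical stand-ins (`Obj<N>`, `Proc`, `Solution`, and a minimal `Strategy` base) so that the dispatch between the primary template and the `Obj<1>` partial specialization can be compiled and observed. The real TreeLB definitions will differ; only the shape of `Example` and the `solve` signature follow the slide.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

using LBRealType = double;  // assumption: stand-in for the Charm++ typedef

// Hypothetical object type templated on the number of load dimensions.
template <int N>
struct Obj {
  static constexpr int dimension = N;
  std::array<LBRealType, N> load{};
  LBRealType getLoad(int d = 0) const { return load[d]; }
};

struct Proc {};      // hypothetical PE type
struct Solution {    // hypothetical result type; records which path ran
  std::string path;
  int dims = 0;
};

// Minimal stand-in for the strategy base class.
template <typename O, typename P, typename S>
class Strategy {
 public:
  virtual ~Strategy() = default;
  virtual void solve(std::vector<O>& objs, std::vector<P>& procs,
                     S& solution, bool objsSorted) = 0;
};

// Primary template: chosen for any object type other than Obj<1>.
template <typename O, typename P, typename S>
class Example : public Strategy<O, P, S> {
 public:
  void solve(std::vector<O>&, std::vector<P>&, S& solution, bool) override {
    solution.path = "vector";
    solution.dims = O::dimension;
  }
};

// Partial specialization: chosen when the object type is exactly Obj<1>.
template <typename P, typename S>
class Example<Obj<1>, P, S> : public Strategy<Obj<1>, P, S> {
 public:
  void solve(std::vector<Obj<1>>&, std::vector<P>&, S& solution,
             bool) override {
    solution.path = "scalar";
    solution.dims = 1;
  }
};
```

Instantiating `Example<Obj<1>, Proc, Solution>` selects the scalar body, while `Example<Obj<3>, Proc, Solution>` selects the vector body, so a single strategy name can serve both cases without a runtime branch.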

  13. Vector LB Performance - AMPI
     [Figure: AMPI - No Load Balancing]

  14. Vector LB Performance - AMPI
     [Figure: AMPI - Regular Load Balancing]

  15. Vector LB Performance - AMPI
     [Figure: AMPI - Vector Load Balancing]

  16. Vector LB Performance - AMPI
     [Figure panels: LB Off; Phase Unaware (1.44x speedup); Phase Aware (1.67x speedup)]

  17. Vector LB Performance
     [Figure: Timeline of phase-based application]

  18. Vector LB Performance
     [Figure: No LB]

  19. Vector LB Performance
     [Figure: (non-vector) GreedyLB]

  20. Vector LB Performance
     [Figure: Vector Greedy]

  21. Applications
     • ChaNGa
       ◦ Working, but no performance results at scale yet
       ◦ Time spent in each rung of multi-stepping corresponds to a dimension in the vector
     • NAMD
       ◦ In the process of making a vector of CPU and GPU load
     • Please contact me if you think your application would benefit!

  22. Future Vector LB Work
     • Performance is still an issue, so optimizations are needed
       ◦ Discretization, clustering, space-partitioning, etc. should go a long way
     • Exploit distribution of load per dimension
     • Integrate HAPI into load measurement to automatically record accelerator load
     • Add support for constraint-based objective functions for cache/memory balancing

  23. Conclusions
     • Applications often have scope for improved load balance
     • As programming techniques and hardware become more complex, this scope will likely increase
     • Providing more detailed load data via Vector LB has been shown to improve decision quality over traditional LB in testing
