SLIDE 1

Vector Load Balancing in Charm++

Ronak Buch

Parallel Programming Laboratory, University of Illinois at Urbana-Champaign October 21, 2020 18th Annual Workshop on Charm++ and Its Applications

Ronak Buch rabuch2@illinois.edu Vector Load Balancing in Charm++ 1/23

SLIDE 2

Load Balancing

  • Load balancing is a hallmark of Charm++
  • Performance often limited by maximum load on a PE
  • RTS measures load and migrates objects in response
  • Dynamic, irregular applications have been able to achieve high performance and scalability because of it

SLIDE 3

What is Load?

  • Load is really just a proxy value we use to reason about performance
  • In truth, we want to minimize execution time
  • Unbalanced, fast program > balanced, slow program
  • CPU time per object by itself is often a sufficient metric for this value
  • However, in the same way measuring cache misses or pipeline stalls improves upon merely profiling, sometimes more detail is helpful

SLIDE 4

Vector Load Balancing

  • Rather than being a single value, load is now a vector of multiple values
  • Store vector loads in LBDatabase
  • Pass vector loads to strategies
  • Use vector loads in strategies
  • Can be used generically: for various hardware measurements (CPU/GPU/network/memory), discrete parts of an iteration, application-specific parameters, etc.

SLIDE 5

Vector Strategies

  • Extra dimensionality makes vector load balancing computationally difficult
  • Objects can no longer be totally ordered
  • Want to minimize the maximum in each dimension
  • NP-complete problem, so only interested in approximations
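To make the ordering problem concrete, here is a minimal C++ sketch. It is not Charm++ code: `Load`, `dominates`, and `perDimensionMax` are hypothetical names chosen for illustration, and two dimensions are assumed. Two load vectors such as (2, 0) and (0, 3) are incomparable because neither is at least as large as the other in every dimension, and the objective is evaluated as one maximum per dimension rather than a single number.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-dimensional load vector, e.g. (CPU time, GPU time).
using Load = std::array<double, 2>;

// True if a is at least as large as b in every dimension
// and strictly larger in at least one of them.
bool dominates(const Load& a, const Load& b) {
  bool strictlyLarger = false;
  for (std::size_t d = 0; d < a.size(); ++d) {
    if (a[d] < b[d]) return false;
    if (a[d] > b[d]) strictlyLarger = true;
  }
  return strictlyLarger;
}

// The quantity to minimize: the largest PE load in each dimension.
// Loads are assumed to be non-negative.
Load perDimensionMax(const std::vector<Load>& peLoads) {
  Load result{0.0, 0.0};
  for (const Load& pe : peLoads) {
    for (std::size_t d = 0; d < result.size(); ++d) {
      result[d] = std::max(result[d], pe[d]);
    }
  }
  return result;
}
```

With these definitions, neither `dominates({2, 0}, {0, 3})` nor the reverse holds, so sorting objects "from heaviest to lightest" needs an extra rule such as a norm.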

SLIDE 6

Vector Strategies

  • A simple strategy finds the object with the globally largest load in any single dimension and places it on the PE with the minimum load in that dimension
  • Only works well when each object has load in only one dimension
  • For more realistic cases, have to consider the vector holistically
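The simple strategy can be sketched as follows. This is an illustrative reading of the slide, not the TreeLB implementation: `LoadVec` and `maxDimensionGreedy` are hypothetical names, loads are assumed non-negative, ties go to the lowest index, and the quadratic scan stands in for the heaps a real strategy would likely use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using LoadVec = std::vector<double>;  // hypothetical: one entry per dimension

// Returns, for each object index, the PE it was assigned to.
std::vector<int> maxDimensionGreedy(const std::vector<LoadVec>& objs,
                                    int numPes, std::size_t dims) {
  assert(numPes > 0 && dims > 0);
  std::vector<LoadVec> peLoad(numPes, LoadVec(dims, 0.0));
  std::vector<int> assignment(objs.size(), -1);

  for (std::size_t placed = 0; placed < objs.size(); ++placed) {
    // Find the unassigned object holding the largest single component.
    std::size_t bestObj = 0, bestDim = 0;
    double bestVal = -1.0;
    for (std::size_t o = 0; o < objs.size(); ++o) {
      if (assignment[o] != -1) continue;
      for (std::size_t d = 0; d < dims; ++d) {
        if (objs[o][d] > bestVal) {
          bestVal = objs[o][d];
          bestObj = o;
          bestDim = d;
        }
      }
    }
    // Choose the PE that is least loaded in that one dimension only.
    int bestPe = 0;
    for (int p = 1; p < numPes; ++p) {
      if (peLoad[p][bestDim] < peLoad[bestPe][bestDim]) bestPe = p;
    }
    for (std::size_t d = 0; d < dims; ++d) peLoad[bestPe][d] += objs[bestObj][d];
    assignment[bestObj] = bestPe;
  }
  return assignment;
}
```

Because the PE is chosen by looking at a single dimension, whatever load the object carries in the other dimensions is ignored at placement time, which is the weakness the slide points out.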

SLIDE 7

Vector Strategies

  • Find object with maximum p-norm and place on PE with minimum p-norm after placement
  • Works well, but computationally expensive
  • PE “weight” varies with object, e.g. ∥(2, 0)∥₂ < ∥(0, 3)∥₂, but when adding (3, 0), ∥(5, 0)∥₂ > ∥(3, 3)∥₂
  • Calculate average load vector in d-space and create a normal hyperplane, then repeatedly allow the furthest PE below the hyperplane to choose an object
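The norm-based strategy in the first bullet can be sketched as below. This is a sketch under stated assumptions rather than the author's implementation: `pNorm`, `bestPeFor`, and `normGreedy` are hypothetical names, all vectors are assumed to have the same number of dimensions, and ties go to the lowest PE index. The expense the slide mentions is visible in `bestPeFor`, which recomputes a norm for every PE on every placement.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

using LoadVec = std::vector<double>;  // hypothetical: one entry per dimension

double pNorm(const LoadVec& v, double p) {
  double sum = 0.0;
  for (double x : v) sum += std::pow(std::fabs(x), p);
  return std::pow(sum, 1.0 / p);
}

// Index of the PE whose load has the smallest p-norm once obj is added to it.
std::size_t bestPeFor(const std::vector<LoadVec>& peLoads, const LoadVec& obj,
                      double p) {
  assert(!peLoads.empty());
  std::size_t best = 0;
  double bestNorm = std::numeric_limits<double>::infinity();
  for (std::size_t pe = 0; pe < peLoads.size(); ++pe) {
    LoadVec after = peLoads[pe];
    for (std::size_t d = 0; d < obj.size(); ++d) after[d] += obj[d];
    const double n = pNorm(after, p);
    if (n < bestNorm) {
      bestNorm = n;
      best = pe;
    }
  }
  return best;
}

// Places objects in order of decreasing p-norm; updates peLoads in place
// and returns the PE chosen for each object index.
std::vector<std::size_t> normGreedy(const std::vector<LoadVec>& objs,
                                    std::vector<LoadVec>& peLoads, double p) {
  std::vector<std::size_t> order(objs.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return pNorm(objs[a], p) > pNorm(objs[b], p);
                   });
  std::vector<std::size_t> assignment(objs.size(), 0);
  for (std::size_t o : order) {
    const std::size_t pe = bestPeFor(peLoads, objs[o], p);
    for (std::size_t d = 0; d < objs[o].size(); ++d) peLoads[pe][d] += objs[o][d];
    assignment[o] = pe;
  }
  return assignment;
}
```

For the slide's example, `bestPeFor` with PE loads (2, 0) and (0, 3), object (3, 0), and p = 2 selects the second PE: its current norm is larger, but its norm after placement (about 4.24) is smaller than the first PE's (5).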

SLIDE 8

New Load Balancing APIs - Phase

  • Many applications have orthogonal phases within an iteration separated by barriers (or weaker synchronization)
  • New functions have been added to track phases for load balancing:
  • void CkMigratable::CkLBSetPhase(int phase) - Until called again, all automatic LB measurements for the calling chare are attributed to the specified phase
  • int CkMigratable::CkLBGetPhase() - Returns the current phase
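The attribution rule described for `CkLBSetPhase` can be illustrated without Charm++. The class below is a hypothetical stand-in, not a Charm++ type: `PhaseLoadRecorder`, `setPhase`, `recordMeasurement`, and `vectorLoad` are invented names, and it assumes that each phase maps to one dimension of the load vector and that the initial phase is 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for per-chare bookkeeping; not a Charm++ class.
class PhaseLoadRecorder {
 public:
  // Models the described behavior of CkLBSetPhase: measurements taken
  // after this call are attributed to `phase` until it is called again.
  void setPhase(int phase) {
    assert(phase >= 0);
    currentPhase_ = phase;
    const std::size_t needed = static_cast<std::size_t>(phase) + 1;
    if (load_.size() < needed) load_.resize(needed, 0.0);
  }

  // Models CkLBGetPhase.
  int getPhase() const { return currentPhase_; }

  // Models one automatic measurement made by the runtime (assumed seconds).
  void recordMeasurement(double seconds) { load_[currentPhase_] += seconds; }

  // One accumulated load value per phase seen so far.
  const std::vector<double>& vectorLoad() const { return load_; }

 private:
  int currentPhase_ = 0;  // assumption: chares start in phase 0
  std::vector<double> load_ = std::vector<double>(1, 0.0);
};
```

In an application, the corresponding pattern would presumably be a call to `CkLBSetPhase(k)` at the point where a chare enters phase `k` of its iteration, leaving the measurement itself to the runtime.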

SLIDE 9

New Load Balancing APIs - Manual

  • Added new API for recording vector load data
  • void CkMigratable::CkLBSetObjTime(LBRealType load, int dimension) - Sets the specified dimension of the vector load for the calling chare
  • std::vector<LBRealType> CkMigratable::CkLBGetObjVectorLoad() - Returns the current vector load for the calling chare
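The set/get semantics of the manual API can be mimicked in a few lines. Everything here is a hypothetical model rather than Charm++ code: `ManualLoadRecorder` and the `Dimension` labels are invented, `LBRealType` is assumed to be `double`, and dimensions that were never set are assumed to read as zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using LBRealType = double;  // assumption: a floating-point type in Charm++

// Hypothetical labels for what each dimension means in one application.
enum Dimension : int { kCpu = 0, kGpu = 1 };

// Hypothetical stand-in for per-chare load storage; not a Charm++ class.
class ManualLoadRecorder {
 public:
  // Models CkLBSetObjTime(load, dimension): overwrites a single dimension.
  void setObjTime(LBRealType load, int dimension) {
    assert(dimension >= 0);
    const std::size_t d = static_cast<std::size_t>(dimension);
    if (d >= load_.size()) load_.resize(d + 1, 0.0);
    load_[d] = load;
  }

  // Models CkLBGetObjVectorLoad(): returns a copy of the whole vector.
  std::vector<LBRealType> getObjVectorLoad() const { return load_; }

 private:
  std::vector<LBRealType> load_;
};
```

Unlike the phase functions, which attribute automatically measured time, this style leaves it to the application to decide what each dimension holds, for example GPU kernel time it has measured itself.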

SLIDE 10

Using Vector Strategies

  • Currently only strategies built on top of TreeLB support vector load balancing
  • TreeLB is a new, flexible, optimized replacement for CentralLB and HybridLB
  • Eventually all non-distributed strategies should use TreeLB
  • If vector loads are detected in the LB database, a vector version of the chosen strategy is automatically used if available

SLIDE 11

Writing Vector Strategies

  • Objects and PEs are templated on dimension, which is also exposed as a static constexpr field for external access
  • A specific dimension of Object or PE load is accessible with LBRealType getLoad(int dimension)
  • Template specialization allows the LB author to handle vector and non-vector cases

SLIDE 12

Writing Vector Strategies

template <typename O, typename P, typename S>
class Example : public Strategy<O, P, S> {
public:
  void solve(std::vector<O>& objs, std::vector<P>& procs, S& solution, bool objsSorted) {
    // vector implementation
  }
};

template <typename P, typename S>
class Example<Obj<1>, P, S> : public Strategy<Obj<1>, P, S> {
public:
  void solve(std::vector<Obj<1>>& objs, std::vector<P>& procs, S& solution, bool objsSorted) {
    // scalar implementation
  }
};
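The same dispatch can be tried outside Charm++ with a self-contained sketch. The `Obj` below is a simplified, hypothetical model of the object type (only the `dimension` field and `getLoad` come from the slides), `HeaviestPicker` is an invented example, and `LBRealType` is assumed to be `double`. The primary template handles any dimension, while the specialization for `Obj<1>` is selected automatically for scalar loads.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using LBRealType = double;  // assumption: a floating-point type in Charm++

// Simplified, hypothetical model of an object templated on dimension.
template <int N>
struct Obj {
  static constexpr int dimension = N;  // exposed for external access
  std::array<LBRealType, static_cast<std::size_t>(N)> load{};
  LBRealType getLoad(int d) const { return load[static_cast<std::size_t>(d)]; }
};

// Primary template: vector case, ranks objects by the sum of all dimensions.
template <typename O>
struct HeaviestPicker {
  static std::string kind() { return "vector"; }
  static std::size_t pick(const std::vector<O>& objs) {
    assert(!objs.empty());
    std::size_t best = 0;
    LBRealType bestSum = -1.0;  // loads are assumed non-negative
    for (std::size_t i = 0; i < objs.size(); ++i) {
      LBRealType sum = 0.0;
      for (int d = 0; d < O::dimension; ++d) sum += objs[i].getLoad(d);
      if (sum > bestSum) {
        bestSum = sum;
        best = i;
      }
    }
    return best;
  }
};

// Specialization: scalar case, a single comparison per object suffices.
template <>
struct HeaviestPicker<Obj<1>> {
  static std::string kind() { return "scalar"; }
  static std::size_t pick(const std::vector<Obj<1>>& objs) {
    assert(!objs.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < objs.size(); ++i) {
      if (objs[i].getLoad(0) > objs[best].getLoad(0)) best = i;
    }
    return best;
  }
};
```

The caller writes `HeaviestPicker<O>::pick(objs)` in both cases. Which body runs is decided at compile time from `O`, so the scalar path pays nothing for the vector machinery.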

SLIDE 13

Vector LB Performance - AMPI

AMPI - No Load Balancing

SLIDE 14

Vector LB Performance - AMPI

AMPI - Regular Load Balancing

SLIDE 15

Vector LB Performance - AMPI

AMPI - Vector Load Balancing

SLIDE 16

Vector LB Performance - AMPI

Panels: LB Off; Phase Unaware (1.44x speedup); Phase Aware (1.67x speedup)

SLIDE 17

Vector LB Performance

Timeline of phase-based application:

SLIDE 18

Vector LB Performance

No LB

SLIDE 19

Vector LB Performance

(non-vector) GreedyLB

SLIDE 20

Vector LB Performance

Vector Greedy

SLIDE 21

Applications

  • ChaNGa
  • Working, but no performance results at scale yet
  • Time spent in each rung of multi-stepping corresponds to a dimension in the vector
  • NAMD
  • In the process of building a vector of CPU and GPU load
  • Please contact me if you think your application would benefit!

SLIDE 22

Future Vector LB Work

  • Performance is still an issue, so optimizations are needed
  • Discretization, clustering, space-partitioning, etc. should go a long way
  • Exploit distribution of load per dimension
  • Integrate HAPI into load measurement to automatically record accelerator load
  • Add support for constraint-based objective functions for cache/memory balancing

SLIDE 23

Conclusions

  • Applications often have scope for improved load balance
  • As programming techniques and hardware become more complex, this scope will likely increase
  • Providing more detailed load data via Vector LB has been shown to improve decision quality over traditional LB in testing
