Dynamic Load Balancing in Charm++



slide-1
SLIDE 1

Dynamic Load Balancing in Charm++

Abhinav S Bhatele Parallel Programming Lab, UIUC

slide-2
SLIDE 2

Outline

  • Dynamic Load Balancing framework in Charm++
  • Measurement Based Load Balancing
  • Examples:
    – Hybrid Load Balancers
    – Topology-aware Load Balancers
  • User Control and Flexibility
  • Future Work
slide-3
SLIDE 3

Dynamic Load Balancing

  • Task of load balancing (LB)
    – Given a collection of migratable objects and a set of processors
    – Find a mapping of objects to processors
      • Almost the same amount of computation on each processor
    – Additional constraints
      • Keep communication between processors to a minimum
      • Take the topology of the machine into consideration
  • Dynamic mapping of chares to processors
    – Load on processors keeps changing during the actual execution
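As a rough illustration of the goal stated on this slide, the sketch below scores a mapping by the ratio of the maximum processor load to the average load (1.0 means perfectly balanced). The function names, object names and load values are hypothetical and chosen only for this example; they are not part of Charm++.

```python
def processor_loads(object_loads, mapping, num_procs):
    """Sum the load of every object onto the processor it is mapped to."""
    loads = [0.0] * num_procs
    for obj, load in object_loads.items():
        loads[mapping[obj]] += load
    return loads


def imbalance(loads):
    """Ratio of the maximum processor load to the average load."""
    average = sum(loads) / len(loads)
    return max(loads) / average


# Hypothetical per-object computation times (seconds).
object_loads = {"a": 4.0, "b": 2.0, "c": 2.0, "d": 4.0}

skewed = {"a": 0, "b": 1, "c": 1, "d": 0}    # both heavy objects on processor 0
balanced = {"a": 0, "b": 0, "c": 1, "d": 1}  # one heavy, one light per processor

skewed_loads = processor_loads(object_loads, skewed, 2)
balanced_loads = processor_loads(object_loads, balanced, 2)
```

Here the skewed mapping gives processor loads of 8 and 4 seconds, while the balanced mapping gives 6 and 6; a load balancer searches for mappings of the second kind.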

slide-4
SLIDE 4

Load Balancing Approaches

  • A rich set of strategies in Charm++
  • Two main ideas
    – No correlation between successive iterations
      • Fully dynamic
      • Seed load balancers
    – Load varies slightly over iterations
      • CSE, Molecular Dynamics simulations
      • Measurement-based load balancers
slide-5
SLIDE 5

Principle of Persistence

  • Object communication patterns and computational loads tend to persist over time
    – In spite of dynamic behavior
      • Abrupt and large, but infrequent changes (e.g. AMR)
      • Slow and small changes (e.g. particle migration)
  • Parallel analog of the principle of locality
    – A heuristic that holds for most CSE applications

slide-6
SLIDE 6

Measurement Based Load Balancing

  • Based on principle of persistence
  • Runtime instrumentation (LB Database)
    – Communication volume and computation time
  • Measurement based load balancers
    – Use the database periodically to make new decisions
    – Many alternative strategies can use the database
      • Centralized vs. distributed
      • Greedy improvements vs. complete reassignment
      • Topology-aware
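One way to picture a strategy that does a complete reassignment from measured loads is the classic greedy heuristic: take objects from heaviest to lightest and give each one to the currently least loaded processor. The sketch below is a minimal illustration under that assumption; it ignores communication and is not the Charm++ implementation. The function name and the sample measurements are hypothetical.

```python
import heapq


def greedy_assign(object_loads, num_procs):
    """Map each object to a processor, heaviest object first,
    always choosing the processor with the smallest load so far."""
    # Min-heap of (accumulated load, processor rank).
    heap = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    mapping = {}
    # Sort by decreasing load; break ties by name for determinism.
    ordered = sorted(object_loads.items(), key=lambda kv: (-kv[1], kv[0]))
    for obj, load in ordered:
        proc_load, proc = heapq.heappop(heap)
        mapping[obj] = proc
        heapq.heappush(heap, (proc_load + load, proc))
    return mapping


# Hypothetical loads, standing in for what the instrumentation measured.
measured = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 3.0, "e": 1.0}
mapping = greedy_assign(measured, 2)
```

With these sample loads both processors end up with 8 seconds of work. Because the heuristic ignores where objects currently live, it may migrate most of them, which is the trade-off against incremental improvement strategies.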
slide-7
SLIDE 7

Load Balancer Strategies

  • Centralized
    – Object load data are sent to processor 0
    – Integrated into a complete object graph
    – Migration decision is broadcast from processor 0
    – Global barrier
  • Distributed
    – Load balancing among neighboring processors
    – Build partial object graph
    – Migration decision is sent to its neighbors
    – No global barrier

slide-8
SLIDE 8

Load Balancing on Large Machines

  • Existing load balancing strategies don’t scale on extremely large machines
  • Limitations of centralized strategies:
    – Central node: memory/communication bottleneck
    – Decision-making algorithms tend to be very slow
  • Limitations of distributed strategies:
    – Difficult to achieve well-informed load balancing decisions

slide-9
SLIDE 9

Simulation Study - Memory Overhead

[Figure: memory usage (MB) vs. number of objects (128K, 256K, 512K, 1M), for 32K and 64K processors]

The lb_test benchmark is a parameterized program that creates a specified number of communicating objects in a 2D mesh.

Simulation performed with the performance simulator BigSim.

slide-10
SLIDE 10

Load Balancing Execution Time

[Figure: execution time (in seconds) vs. number of objects (128K, 256K, 512K, 1M), for GreedyLB, GreedyCommLB and RefineLB]

Execution time of load balancing algorithms on a 64K processor simulation

slide-11
SLIDE 11

Hierarchical Load Balancers

  • Hierarchical distributed load balancers
    – Divide into processor groups
    – Apply different strategies at each level
    – Scalable to a large number of processors

slide-12
SLIDE 12

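Hierarchical Tree (an example)

[Figure: 64K processor hierarchical tree with three levels (Level 0, Level 1, Level 2). The processors form 64 groups of 1024 with rank ranges 0-1023, 1024-2047, …, 63488-64511, 64512-65535; the group nodes are labeled 0, 1024, …, 63488, 64512]

64K processor hierarchical tree

Apply different strategies at each level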
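The grouping in this example can be sketched with simple integer arithmetic. The snippet assumes that groups are contiguous rank ranges of 1024 processors and that the lowest rank in each range acts as the group's leader; the leader choice and the helper names are assumptions made for illustration, not something the slide states.

```python
NUM_PROCS = 64 * 1024   # 65536 processors, as in the example tree
GROUP_SIZE = 1024       # processors per group


def group_leader(pe, group_size=GROUP_SIZE):
    """Rank of the leader of the group containing processor `pe`
    (assumed here to be the lowest rank in the group)."""
    return (pe // group_size) * group_size


def group_members(leader, group_size=GROUP_SIZE):
    """Contiguous range of ranks that belong to the leader's group."""
    return range(leader, leader + group_size)


# One leader per group: 0, 1024, ..., 63488, 64512.
leaders = list(range(0, NUM_PROCS, GROUP_SIZE))
```

Under these assumptions there are 64 leaders, so a strategy running among the leaders handles 64 participants instead of 65536, and each leader handles only the 1024 processors of its own group.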

slide-13
SLIDE 13

An Example: Hybrid LB

  • Divide processors into independent sets of groups, with the groups organized in hierarchies (decentralized)
  • Each group has a leader (the central node) which performs centralized load balancing
  • A particular hybrid strategy that works well

Gengbin Zheng, PhD Thesis, 2005

slide-14
SLIDE 14

Our HybridLB Scheme

[Figure: the 64K processor tree (groups 0-1023, 1024-2047, …, 63488-64511, 64512-65535) annotated with the labels: Load Data (OCG), Load Data, token object, Refinement-based load balancing, Greedy-based load balancing]
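The scheme pairs a greedy strategy with a refinement strategy. To show how a refinement step differs from a greedy reassignment, here is a minimal sketch that moves objects only off overloaded processors and leaves the rest of the mapping alone. The `tolerance` parameter, the function name and the sample data are assumptions for illustration; this is not the actual RefineLB code, and the sketch does not say which tree level uses which strategy.

```python
def refine(object_loads, mapping, num_procs, tolerance=1.05):
    """Move objects from the most loaded to the least loaded processor
    until no processor exceeds tolerance * average load, or no move fits.
    Returns the new mapping and the list of (object, source, destination)."""
    mapping = dict(mapping)  # work on a copy; the input is left untouched
    loads = [0.0] * num_procs
    for obj, proc in mapping.items():
        loads[proc] += object_loads[obj]
    target = tolerance * sum(loads) / num_procs
    migrations = []
    while True:
        src = max(range(num_procs), key=loads.__getitem__)
        dst = min(range(num_procs), key=loads.__getitem__)
        if loads[src] <= target:
            break  # nobody is overloaded any more
        # Objects on src that fit on dst without overloading it.
        candidates = [o for o, p in mapping.items()
                      if p == src and loads[dst] + object_loads[o] <= target]
        if not candidates:
            break  # no acceptable move exists
        obj = max(candidates, key=lambda o: (object_loads[o], o))
        mapping[obj] = dst
        loads[src] -= object_loads[obj]
        loads[dst] += object_loads[obj]
        migrations.append((obj, src, dst))
    return mapping, migrations


# Hypothetical starting point: processor 0 holds 9 s of work, processor 1 holds 3 s.
object_loads = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0, "e": 1.0, "f": 1.0}
current = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
refined, migrations = refine(object_loads, current, 2)
```

In this example a single migration (object "b" from processor 0 to processor 1) is enough to bring both processors to 6 seconds, whereas a from-scratch greedy pass could relocate several objects.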
slide-15
SLIDE 15

Memory Overhead

[Figure: memory usage (MB) vs. number of objects (256K, 512K, 1M), for CentralLB and HybridLB]

Simulation of lb_test (for 64K processors)

slide-16
SLIDE 16

Total Load Balancing Time

[Figure: time (s) vs. number of objects (256K, 512K, 1M), for GreedyCommLB and HybridLB(GreedyCommLB)]

Simulation of lb_test for 64K processors

N procs   4096    8192     16384
Memory    6.8MB   22.57MB  22.63MB

lb_test benchmark’s actual run on BG/L at IBM (512K objects)

slide-17
SLIDE 17

Load Balancing Quality

[Figure: maximum predicted load (seconds) vs. number of objects (256K, 512K, 1M), for GreedyCommLB and HybridLB]

Simulation of lb_test for 64K processors

slide-18
SLIDE 18

Topology-aware mapping of tasks

  • Problem
    – Map tasks to processors connected in a topology, such that:
      • Compute load on processors is balanced
      • Communicating chares (objects) are placed on nearby processors

slide-19
SLIDE 19

Mapping Model

  • Task graph:
    – $G_t = (V_t, E_t)$
    – Weighted graph, undirected edges
    – Nodes → chares, $w(v_a)$ → computation
    – Edges → communication, $c_{ab}$ → bytes between $v_a$ and $v_b$
  • Topology graph:
    – $G_p = (V_p, E_p)$
    – Nodes → processors
    – Edges → direct network links
    – Ex: 3D torus, 2D mesh, hypercube

slide-20
SLIDE 20

Model (Contd.)

  • Task mapping
    – Assigns tasks to processors
    – $P : V_t \to V_p$
  • Hop-bytes
    – Hop-bytes → communication cost
    – The cost imposed on the network is higher if more links are used
    – Weigh inter-processor communication by distance on the network
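The hop-bytes idea can be made concrete by summing, over every edge of the task graph, the bytes exchanged multiplied by the network distance between the processors the two tasks are placed on. The sketch below assumes a 2D mesh without wraparound, row-major processor ranks and Manhattan distance as the hop count; the helper names, task names and byte counts are hypothetical.

```python
def mesh_hops(p, q, width):
    """Hops between processors p and q on a 2D mesh of the given width
    (row-major ranks, Manhattan distance, no wraparound links)."""
    px, py = p % width, p // width
    qx, qy = q % width, q // width
    return abs(px - qx) + abs(py - qy)


def hop_bytes(edges, placement, width):
    """Sum of bytes * hops over all communicating task pairs."""
    return sum(nbytes * mesh_hops(placement[a], placement[b], width)
               for (a, b), nbytes in edges.items())


# Hypothetical task graph: bytes exchanged between pairs of tasks.
edges = {("a", "b"): 100, ("b", "c"): 50, ("a", "c"): 10}

# Two placements of the three tasks on a 2 x 2 mesh (ranks 0..3).
near = {"a": 0, "b": 1, "c": 3}  # the heaviest pair (a, b) are neighbors
far = {"a": 0, "b": 3, "c": 1}   # the heaviest pair sits on opposite corners

near_cost = hop_bytes(edges, near, 2)
far_cost = hop_bytes(edges, far, 2)
```

For these numbers the first placement costs 170 hop-bytes and the second 260, so a topology-aware strategy would prefer the first even though both place one task per processor.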

slide-21
SLIDE 21

Load Balancing Framework in Charm++

  • Issues of mapping and decomposition are separated
  • User has full control over mapping
  • Many choices
    – Initial static mapping
    – Mapping at run-time as newer objects are created
    – Write a new load balancing strategy: inherit from BaseLB

slide-22
SLIDE 22

Future Work

  • Hybrid Model-based Load Balancers
    – User gives a model to the LB
    – Combine it with measurement based load balancer
  • Multicast aware Load Balancers
    – Try to place targets of a multicast on the same processor

slide-23
SLIDE 23

Conclusions

  • Measurement based LBs are good for most cases
  • Need scalable LBs in the future due to large machines like BG/L
    – Hybrid Load Balancers
    – Communication sensitive LBs
    – Topology aware LBs