
Dynamic Load Balancing in Charm++ - PowerPoint PPT Presentation

Dynamic Load Balancing in Charm++. Abhinav S Bhatele, Parallel Programming Lab, UIUC. Outline: Dynamic Load Balancing framework in Charm++; Measurement Based Load Balancing; Examples: hybrid and topology-aware load balancers.


  1. Dynamic Load Balancing in Charm++
     Abhinav S Bhatele, Parallel Programming Lab, UIUC

  2. Outline
     • Dynamic Load Balancing framework in Charm++
     • Measurement Based Load Balancing
     • Examples:
       – Hybrid Load Balancers
       – Topology-aware Load Balancers
     • User Control and Flexibility
     • Future Work

  3. Dynamic Load Balancing
     • Task of load balancing (LB)
       – Given a collection of migratable objects and a set of processors
       – Find a mapping of objects to processors (formalized below)
         • Almost the same amount of computation on each processor
       – Additional constraints
         • Keep communication between processors to a minimum
         • Take the topology of the machine into consideration
     • Dynamic mapping of chares to processors
       – Load on processors keeps changing during the actual execution
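
A slightly more formal statement of the balance goal, using notation that slides 19 and 20 introduce (w(v_a) is the measured load of object v_a, P is the object-to-processor mapping, V_p the set of processors):

    \[
      \min_{P} \; \max_{p \in V_p} \; \sum_{v_a \,:\, P(v_a) = p} w(v_a)
    \]

subject to the additional constraints of low inter-processor communication and topology awareness, which slide 20 quantifies via hop-bytes.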

  4. Load-Balancing Approaches
     • A rich set of strategies in Charm++
     • Two main ideas
       – No correlation between successive iterations
         • Fully dynamic
         • Seed load balancers
       – Load varies slightly over iterations
         • CSE, molecular dynamics simulations
         • Measurement-based load balancers

  5. Principle of Persistence
     • Object communication patterns and computational loads tend to persist over time
       – In spite of dynamic behavior
         • Abrupt and large, but infrequent changes (e.g. AMR)
         • Slow and small changes (e.g. particle migration)
     • Parallel analog of the principle of locality
       – Heuristics that hold for most CSE applications

  6. Measurement-Based Load Balancing
     • Based on the principle of persistence
     • Runtime instrumentation (LB database)
       – Communication volume and computation time
     • Measurement-based load balancers
       – Use the database periodically to make new decisions
       – Many alternative strategies can use the database (a greedy sketch follows this slide)
         • Centralized vs. distributed
         • Greedy improvements vs. complete reassignment
         • Topology-aware
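
To make the "complete reassignment" flavor concrete, here is a minimal sketch in plain C++ (not the Charm++ API) of a greedy strategy over measured per-object loads such as those kept in the LB database; the names ObjectLoad and greedyAssign are illustrative assumptions, not Charm++ identifiers.

    // Illustrative sketch only: greedy "complete reassignment" over measured loads.
    // ObjectLoad and greedyAssign are hypothetical names, not part of Charm++.
    #include <algorithm>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    struct ObjectLoad {
      int id;          // object (chare) identifier, assumed to be 0..n-1
      double wallTime; // measured computation time, e.g. from the LB database
    };

    // Assign the heaviest objects first, each to the currently least-loaded processor.
    std::vector<int> greedyAssign(std::vector<ObjectLoad> objs, int numProcs) {
      std::sort(objs.begin(), objs.end(),
                [](const ObjectLoad &a, const ObjectLoad &b) { return a.wallTime > b.wallTime; });

      // Min-heap of (current load, processor id).
      using Entry = std::pair<double, int>;
      std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> procs;
      for (int p = 0; p < numProcs; ++p) procs.push({0.0, p});

      std::vector<int> mapping(objs.size());
      for (const ObjectLoad &o : objs) {
        auto [load, p] = procs.top();   // least-loaded processor so far
        procs.pop();
        mapping[o.id] = p;              // place the object there
        procs.push({load + o.wallTime, p});
      }
      return mapping;
    }

A refinement-based strategy would instead start from the current mapping and move only a few objects off the most overloaded processors.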

  7. Load Balancer Strategies
     • Centralized
       – Object load data are sent to processor 0
       – Integrated into a complete object graph
       – Migration decisions are broadcast from processor 0
       – Global barrier
     • Distributed
       – Load balancing among neighboring processors
       – Builds a partial object graph
       – Migration decisions are sent to neighbors
       – No global barrier

  8. Load Balancing on Large Machines
     • Existing load balancing strategies don’t scale to extremely large machines
     • Limitations of centralized strategies:
       – Central node: memory/communication bottleneck
       – Decision-making algorithms tend to be very slow
     • Limitations of distributed strategies:
       – Difficult to achieve well-informed load balancing decisions

  9. Simulation Study - Memory Overhead
     • Simulation performed with the performance simulator BigSim
     • [Chart: memory usage (MB) vs. number of objects (128K, 256K, 512K, 1M) for 32K and 64K processors]
     • The lb_test benchmark is a parameterized program that creates a specified number of communicating objects in a 2D mesh.

  10. Load Balancing Execution Time
     • [Chart: execution time (seconds) of GreedyLB, GreedyCommLB and RefineLB vs. number of objects (128K, 256K, 512K, 1M)]
     • Execution time of load balancing algorithms on a 64K processor simulation

  11. Hierarchical Load Balancers
     • Hierarchical distributed load balancers
       – Divide into processor groups
       – Apply different strategies at each level
       – Scalable to a large number of processors

  12. Hierarchical Tree (an example)
     • 64K processor hierarchical tree (a rank-to-leader sketch follows this slide):
       – Level 2: 1 root node
       – Level 1: 64 group leaders (processors 0, 1024, ..., 63488, 64512)
       – Level 0: 64 groups of 1024 processors each (0-1023, 1024-2047, ..., 63488-64511, 64512-65535)
     • Apply different strategies at each level
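
A tiny illustrative helper in plain C++ (not part of Charm++) showing the rank arithmetic behind such a tree, assuming the 1024-processor group size from the example; level1Leader is a hypothetical name.

    // Illustrative only: rank-to-leader arithmetic for the two-level tree above.
    #include <cstdio>

    constexpr int kGroupSize = 1024;      // level-0 group size taken from the slide
    constexpr int kNumProcs  = 64 * 1024; // 64K processors

    // The level-1 leader of a rank's group is the first rank in that group.
    int level1Leader(int rank) { return (rank / kGroupSize) * kGroupSize; }

    int main() {
      // Prints the 64 group leaders: 0, 1024, 2048, ..., 63488, 64512.
      for (int rank = 0; rank < kNumProcs; rank += kGroupSize)
        std::printf("group starting at %5d -> leader %5d\n", rank, level1Leader(rank));
      return 0;
    }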

  13. An Example: HybridLB
     • Divide processors into independent sets of groups; groups are organized in hierarchies (decentralized)
     • Each group has a leader (the central node) which performs centralized load balancing
     • A particular hybrid strategy that works well
     • Gengbin Zheng, PhD Thesis, 2005

  14. Our HybridLB Scheme
     • [Diagram: the two-level 64K-processor tree from slide 12; load data (OCG) flows up to the group leaders, refinement-based load balancing is applied at the upper level and greedy-based load balancing within each group, and tokens/objects migrate accordingly]

  15. Memory Overhead
     • [Chart: memory usage (MB) of CentralLB vs. HybridLB for 256K, 512K and 1M objects]
     • Simulation of lb_test (for 64K processors)

  16. Total Load Balancing Time
     • Simulation of lb_test for 64K processors
     • [Chart: load balancing time (s) of GreedyCommLB vs. HybridLB(GreedyCommLB) for 256K, 512K and 1M objects]
     • lb_test benchmark's actual run on BG/L at IBM (512K objects):
       – 4096 procs: 6.8 MB; 8192 procs: 22.57 MB; 16384 procs: 22.63 MB (memory)

  17. Load Balancing Quality
     • Simulation of lb_test for 64K processors
     • [Chart: maximum predicted load (seconds) of GreedyCommLB vs. HybridLB for 256K, 512K and 1M objects]

  18. Topology-aware Mapping of Tasks
     • Problem
       – Map tasks to processors connected in a topology, such that:
         • Compute load on processors is balanced
         • Communicating chares (objects) are placed on nearby processors

  19. Mapping Model
     • Task graph:
       – G_t = (V_t, E_t)
       – Weighted graph, undirected edges
       – Nodes are chares; w(v_a) is the computation load of v_a
       – Edges are communication; c_ab is the number of bytes exchanged between v_a and v_b
     • Topology graph:
       – G_p = (V_p, E_p)
       – Nodes are processors
       – Edges are direct network links
       – Examples: 3D torus, 2D mesh, hypercube

  20. Model (contd.)
     • Task mapping
       – Assigns tasks to processors
       – P : V_t → V_p
     • Hop-bytes
       – Hop-bytes measure communication cost (see the formula after this slide)
       – The cost imposed on the network is higher when more links are used
       – Weigh inter-processor communication by distance on the network
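
Putting slides 19 and 20 together, hop-bytes can be written as a sum over task-graph edges; here d(p, q) is taken to be the number of network links on the path between processors p and q in G_p (an interpretation of "distance on the network", not a definition given verbatim on the slide):

    \[
      \mathrm{HopBytes}(P) \;=\; \sum_{(v_a,\, v_b) \in E_t} c_{ab} \cdot d\bigl(P(v_a),\, P(v_b)\bigr)
    \]

A topology-aware mapping therefore tries to keep processor loads balanced while keeping this sum small.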

  21. Load Balancing Framework in Charm++
     • Issues of mapping and decomposition are separated
     • User has full control over mapping
     • Many choices
       – Initial static mapping
       – Mapping at run time as newer objects are created
       – Write a new load balancing strategy: inherit from BaseLB (a skeleton follows this slide)
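
A minimal skeleton of such a strategy, written against the centralized-strategy interface (CentralLB, which derives from BaseLB). The class name MyGreedyLB is hypothetical, the .ci registration boilerplate is omitted, and the LDStats field names used below (count, n_objs, objData[i].wallTime, objData[i].migratable, to_proc) follow the conventional layout but should be checked against the Charm++ manual for your version; this is a sketch, not the framework's reference implementation.

    // Sketch of a centralized strategy; field/method names are assumptions to
    // verify against the Charm++ manual. "MyGreedyLB" is a hypothetical name.
    #include <vector>
    #include "CentralLB.h"

    class MyGreedyLB : public CentralLB {
     public:
      MyGreedyLB(const CkLBOptions &opt) : CentralLB(opt) {}

      // Called on the central processor with the measured LB database.
      void work(LDStats *stats) {
        std::vector<double> procLoad(stats->count, 0.0);
        for (int i = 0; i < stats->n_objs; ++i) {
          if (!stats->objData[i].migratable) continue;
          // Pick the currently least-loaded processor.
          int best = 0;
          for (int p = 1; p < stats->count; ++p)
            if (procLoad[p] < procLoad[best]) best = p;
          stats->to_proc[i] = best;                      // record the new assignment
          procLoad[best] += stats->objData[i].wallTime;  // measured compute time
        }
      }
    };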

  22. Future Work
     • Hybrid model-based load balancers
       – User gives a model to the LB
       – Combine it with a measurement-based load balancer
     • Multicast-aware load balancers
       – Try to place the targets of a multicast on the same processor

  23. Conclusions
     • Measurement-based LBs are good for most cases
     • Scalable LBs will be needed for large machines like BG/L
       – Hybrid load balancers
       – Communication-sensitive LBs
       – Topology-aware LBs
