Epidemic Algorithm for Load Balancing: Harshitha Menon, Laxmikant Kalé



SLIDE 1

Load Balancing

Epidemic Algorithm for Load Balancing

Harshitha Menon, Laxmikant Kalé
15th April

1 / 25

SLIDE 2

Outline

1 Introduction
    Motivation
    Background
    Load Balancing Strategies
2 Distributed Load Balancing
    Information Propagation
    Load Transfer
3 Evaluation
4 Conclusion

SLIDE 4

Motivation

Load imbalance in parallel applications
  Performance is limited by the most overloaded processor
  Leads to a drop in system utilization
  Hampers scalability of the application
  The chance that one processor is severely overloaded grows as the number of processors increases
  For some applications, computation load varies over time

SLIDE 9

Dynamic Load Balancing Framework in Charm++

Application is composed of a large number of migratable units
Load balancing strategy is invoked periodically
Based on the principle of persistence (recent computation loads and communication patterns tend to persist, so past measurements predict the near future)
Instruments the application tasks at a fine-grained level
When load balancing is invoked, the framework
  Gathers the statistics based on the strategy (centralized or hierarchical)
  Executes the load balancing strategy
  Migrates objects based on the new mapping
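
The three steps listed above (gather statistics, run a strategy, migrate objects) can be sketched in plain Python. This is only an illustration under stated assumptions, not the Charm++ API and not its GreedyLB implementation: the greedy heuristic, the function names `greedy_mapping` and `migrations`, and the example loads are all hypothetical.

```python
import heapq

def greedy_mapping(object_loads, num_procs):
    """Assign the heaviest objects first, each to the currently least loaded processor."""
    heap = [(0.0, p) for p in range(num_procs)]  # (accumulated load, processor id)
    heapq.heapify(heap)
    mapping = {}
    for obj, load in sorted(object_loads.items(), key=lambda kv: kv[1], reverse=True):
        proc_load, proc = heapq.heappop(heap)
        mapping[obj] = proc
        heapq.heappush(heap, (proc_load + load, proc))
    return mapping

def migrations(old_mapping, new_mapping):
    """Objects whose processor changed, as {object: (from, to)}."""
    return {obj: (old_mapping[obj], new)
            for obj, new in new_mapping.items() if old_mapping[obj] != new}

# Step 1: measured loads per migratable object (hypothetical numbers).
measured = {"a": 4, "b": 3, "c": 2, "d": 1}
current = {"a": 0, "b": 0, "c": 0, "d": 1}
# Step 2: run the strategy. Step 3: derive the migrations to perform.
new = greedy_mapping(measured, num_procs=2)
print(new)
print(migrations(current, new))
```

A centralized strategy of this kind needs every object's load in one place, which is the scaling concern raised for centralized strategies later in the talk.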

SLIDE 10

Load Balancing Strategies

Centralized Strategies
  Have a global view of the system (good quality load balancing)
  Become a clear bottleneck beyond a few thousand processors
Distributed Strategies
  Processors make autonomous decisions based on a local view (neighborhood)
  Scalable
  Yield poor load balance due to limited information
Hierarchical Load Balancer
  Subgroups of processors collect information at their root; higher levels receive aggregated information
  Scalable and good quality
  May suffer from excessive data collection at the lowest levels

SLIDE 15

Grapevine - Proposed Distributed Load Balancer

Key Features
  Fully distributed scheme
  Uses partial information about the global state of the system
  Probabilistic transfer of load
  Scalable and good quality

SLIDE 18

Grapevine - Proposed Distributed Load Balancer

Two Phases
  Information propagation
  Load transfer

SLIDE 22

Information Propagation

Based on a gossip protocol
Each underloaded processor starts the gossip
  Randomly samples peers and sends them its load information
On receiving load information, a processor
  Combines the information with what it already knows
  Forwards it to random peers
No explicit synchronization
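
The propagation scheme described above can be imitated with a small simulation. This is a sketch under assumptions, not the authors' implementation: it advances in lock-step rounds for simplicity (the real scheme has no explicit synchronization), and the names `gossip`, `fanout`, and `rounds` as well as the example loads are hypothetical.

```python
import random

def gossip(loads, avg_load, fanout=2, rounds=10, seed=0):
    """Simulate gossip of underloaded-processor loads; returns each processor's view."""
    rng = random.Random(seed)
    n = len(loads)
    # known[p] maps processor id -> load for every underloaded processor p has heard of.
    known = [{} for _ in range(n)]
    for p, load in enumerate(loads):
        if load < avg_load:
            known[p][p] = load  # each underloaded processor starts with its own load
    for _ in range(rounds):
        messages = []
        for p in range(n):
            if not known[p]:
                continue  # nothing to forward yet
            for _send in range(fanout):
                target = rng.randrange(n)  # random peer (may be p itself)
                messages.append((target, dict(known[p])))
        for target, info in messages:
            known[target].update(info)  # combine with what is already known
    return known

loads = [2, 9, 1, 8, 10, 3, 9, 6]
avg = sum(loads) / len(loads)
views = gossip(loads, avg, fanout=2, rounds=6)
# Average number of underloaded processors each processor knows about.
print(sum(len(v) for v in views) / len(views))
```

Each processor ends up with a partial view of the underloaded processors, which is the information the load transfer phase works from.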

SLIDE 23

Information Propagation

Number of rounds taken to propagate a single update: $r = O(\log_f n)$

[Figure: expected number of rounds taken to spread information. x-axis: System Size (n), 4096 to 16384; y-axis: Rounds; curves for f=2, f=3, f=4]
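
To get a feel for the bound, the logarithm can be evaluated at the system sizes shown on the plot's axis. This sketch assumes that f is the gossip fanout (the number of peers contacted per round) and ignores the constant factors hidden by the O-notation, so the numbers indicate the growth trend only, not the plotted values. The function name `rounds_estimate` is hypothetical.

```python
import math

def rounds_estimate(n, f):
    """log base f of n: the growth term in r = O(log_f n), constants ignored."""
    return math.log(n) / math.log(f)

for n in (4096, 8192, 12288, 16384):
    row = ", ".join(f"f={f}: {rounds_estimate(n, f):.1f}" for f in (2, 3, 4))
    print(f"n={n}: {row}")
```

Quadrupling the system size from 4096 to 16384 adds only two to the f=2 term (from 12 to 14), which is why the scheme is expected to scale.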

SLIDE 24

Information Propagation

[Figure: expected number of rounds taken to spread information. x-axis: System Size (n), 4096 to 16384; y-axis: Rounds; curves for Naive and Informed]

Two Flavors
  Naive
    Random selection
  Informed
    Biased selection
    Incorporates current knowledge

SLIDE 25

Load Transfer

Probabilistic transfer of load
  Naive transfer: select processors uniformly at random
  Informed transfer: select processors based on their load

$$p_i = \frac{1}{Z} \times \left(1 - \frac{L_i}{L_{avg}}\right)$$

  $p_i$: probability assigned to the $i$th processor
  $L_i$: load of the $i$th processor
  $L_{avg}$: average load of the system
  $Z$: normalization constant
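
The informed-transfer formula can be evaluated directly. In the sketch below, Z is taken to be the sum of the unnormalized weights over the known underloaded processors, so that the probabilities add up to one; that reading, the function names, and the example loads are assumptions, not details from the slides.

```python
import random

def transfer_probabilities(known_loads, avg_load):
    """Return {processor: p_i} with p_i = (1/Z) * (1 - L_i / L_avg)."""
    weights = {p: 1.0 - load / avg_load
               for p, load in known_loads.items() if load < avg_load}
    z = sum(weights.values())  # normalization constant Z
    if z == 0:
        return {}  # no underloaded processor is known
    return {p: w / z for p, w in weights.items()}

def pick_target(probs, rng=random):
    """Sample one receiving processor according to the probabilities."""
    procs = list(probs)
    return rng.choices(procs, weights=[probs[p] for p in procs], k=1)[0]

known = {3: 2.0, 7: 6.0}  # hypothetical underloaded processors and their loads
probs = transfer_probabilities(known, avg_load=8.0)
print(probs)  # {3: 0.75, 7: 0.25}
print(pick_target(probs, random.Random(0)))
```

The less loaded processor 3 is three times as likely to be chosen as processor 7, whereas naive transfer would pick either with probability 0.5.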

SLIDE 26

Load Transfer

[Figure: Naive Transfer and Informed Transfer, each with four panels plotted against Underloaded Processors: (a) Initial load (Load), (b) Probabilities assigned (Probability), (c) Work units transferred (Requests), (d) Final load (Load)]

SLIDE 27

Quality of Load Balancing

[Figure: Evaluation of partial information. Max Load and Imbalance plotted against Underloaded Processor Info (1 to 4096)]

Quality is evaluated using the imbalance $I = \frac{L_{max}}{L_{avg}} - 1$
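
The imbalance metric is a one-liner to compute. The function name and the example loads below are made up for illustration.

```python
def imbalance(loads):
    """I = L_max / L_avg - 1; 0 means perfectly balanced."""
    avg = sum(loads) / len(loads)
    return max(loads) / avg - 1

print(imbalance([10, 10, 10, 10]))  # 0.0
print(imbalance([20, 10, 10, 0]))   # 1.0
```

A value of 1.0 means the most loaded processor carries twice the average load, so a step takes roughly twice as long as it would under perfect balance.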

SLIDE 29

Evaluation

Applications
  LeanMD
  AMR
Applications were run on IBM BG/Q Vesta
Comparison with
  GreedyLB
  RefineLB
  AmrLB
  DiffusionLB
  HybridLB
Metrics to evaluate
  Execution time per step, excluding LB time
  Load balancing overhead
  Total application time

SLIDE 30

Evaluation with LeanMD

Time per step
  Quality of our strategy is equivalent to that of centralized strategies

[Figure: Time per Step (ms) vs. Number of Processes (2048 to 32768); curves for No LB, Diff LB, Greedy LB, Refine LB, Hybrid LB, Gv LB]

SLIDE 31

Evaluation with LeanMD

Load balancing overhead
  Centralized strategies have high overhead
  Distributed schemes have low overhead

Strategies    Number of Processes
              2048     4096     8192     16384    32768
HybridLB      -        1.35     0.7      0.368    0.2375
GreedyLB      8.62     8.9      10.33    11.2     23.4
RefineLB      55       50       27       34       121
DiffLB        0.039    0.043    0.040    0.043    0.040
GvLB          0.013    0.016    0.023    0.030    0.045

Load balancing cost (in seconds) of various strategies for LeanMD

SLIDE 32

Evaluation with LeanMD

Total application time
  With centralized strategies, overhead exceeds benefit
  Grapevine gives the best performance

Strategies    Number of Processes
              2048     4096     8192     16384    32768
NoLB          201      102      51       25       13
HybridLB      -        72       37       20       12
GreedyLB      201      148      133      127      243
RefineLB      675      567      306      362      1227
DiffLB        140      72       37       22       13
GvLB          119      64       32       17       10

Total application time (in seconds) for LeanMD on BG/Q
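
The benefit of Grapevine over running without load balancing can be read off the table above with a little arithmetic. The sketch below uses only the NoLB and GvLB rows; the helper name `reduction_percent` is hypothetical.

```python
procs = [2048, 4096, 8192, 16384, 32768]
no_lb = [201, 102, 51, 25, 13]  # total application time in seconds, NoLB row
gv_lb = [119, 64, 32, 17, 10]   # total application time in seconds, GvLB row

def reduction_percent(baseline, value):
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline

for p, base, gv in zip(procs, no_lb, gv_lb):
    print(f"{p} processes: {reduction_percent(base, gv):.1f}% less time than NoLB")
```

The reduction is about 41% at 2048 processes and about 23% at 32768, so the relative gain shrinks as the per-processor work gets smaller.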

SLIDE 33

Evaluation with AMR

Time per step
  Quality of our strategy is equivalent to that of centralized strategies

[Figure: Time per Step (ms) vs. Number of Processes (1024 to 16384); curves for No LB, Diff LB, Amr LB, Refine LB, Hybrid LB, Gv LB]

SLIDE 34

Evaluation with AMR

Load balancing overhead
  Centralized strategies have high overhead
  Distributed schemes have low overhead

Strategies    Number of Processes
              1024     2048     4096     8192     16384
AmrLB         1.09     1.37     2.00     3.30     4.40
RefineLB      12       21       23       33       76
DiffLB        0.015    0.014    0.014    0.014    0.015
GvLB          0.011    0.011    0.015    0.021    0.030

HybridLB: 8.29, 7.2, 2.6 (only three values survive in the source; their columns are not recoverable)

Load balancing cost (in seconds) of various strategies for AMR.

SLIDE 35

Evaluation with AMR

Total application time
  Load balancing overhead exceeds benefit for most strategies
  Diffusion-based load balancer gives marginal benefit
  Grapevine gives the best performance

Strategies    Number of Processes
              1024     2048     4096     8192     16384
NoLB          137      75       43       27       20
AmrLB         136      69       45       49       47
RefineLB      199      217      209      255      546
DiffLB        135      68       38       25       18
GvLB          123      59       30       21       14

HybridLB: 93, 69, 39 (only three values survive in the source; their columns are not recoverable)

Total application time (in seconds) for AMR on BG/Q.

SLIDE 37

Conclusion

Simple strategy
Scales well
Can be tuned to optimize for either cost or quality