MetaBalancer: An automatic load balancer based on application characteristics


Harshitha Menon, UIUC, 7th May, 2012


SLIDE 1

Metabalancer

MetaBalancer: An automatic load balancer based on application characteristics

Harshitha Menon

UIUC

7th May, 2012

SLIDE 2

Outline

1 Motivation
2 Meta-Balancer: Overview
3 Load Balancer: Existing Framework
4 Meta-Balancer
  • Statistics Collection
  • Ideal LB Period
  • Strategy Selection
5 Conclusion
6 Future Work

SLIDE 5

Motivation

• Load balancing decisions depend on the application
  • Multiple runs are required to observe and decide
  • It is tough to judge the correct load balancing parameters
• Dynamic applications require dynamic load balancing decisions
  • Some phases may need frequent load balancing, others may be static
  • The computation to communication ratio may change

SLIDE 9

Meta-Balancer

• Charm++ RTS monitors applications
  • Computation and communication per chare is maintained
  • RTS maintains and controls the placement of chares
• Charm++ RTS is aware of the system characteristics
• Offload the load balancing related decision making to Charm++ RTS
• Meta-Balancer makes load balancing decisions without any user involvement

SLIDE 12

Decisions in Meta-Balancer

• Frequency of load balancing
• Adaptive triggering of load balancing
• Strategy selection
  • Communication vs computation strategy
  • Comprehensive vs refinement strategy

SLIDE 16

Existing Framework

• User decides LB frequency and strategy
• Control flow
  1 AtSync is called whenever load balancing is to be performed in the application
  2 RTS enforces a chare-level local barrier within every processor
  3 Global barrier to collect statistics
  4 Execute the load balancing strategy and perform migration
  5 Application resumes
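The five control-flow steps can be illustrated with a small, sequential Python sketch. This is not Charm++ code: the `Chare` class, `load_balance_step`, and the greedy placement used as the "strategy" are hypothetical stand-ins chosen only to make the order of the steps concrete.

```python
from dataclasses import dataclass


@dataclass
class Chare:
    cid: int
    pe: int            # processor the chare currently lives on
    load: float        # measured computation load of the chare
    at_sync: bool = False


def greedy_strategy(chares, num_pes):
    """Hypothetical strategy: place the heaviest chare on the least loaded PE."""
    pe_load = [0.0] * num_pes
    placement = {}
    for ch in sorted(chares, key=lambda c: c.load, reverse=True):
        target = min(range(num_pes), key=lambda p: pe_load[p])
        placement[ch.cid] = target
        pe_load[target] += ch.load
    return placement


def pe_loads(chares, num_pes):
    """Total load per processor for the current placement."""
    loads = [0.0] * num_pes
    for ch in chares:
        loads[ch.pe] += ch.load
    return loads


def load_balance_step(chares, num_pes):
    """Walk through steps 1-5 once and return the number of migrations."""
    # 1. Every chare signals that it has reached the load balancing point.
    for ch in chares:
        ch.at_sync = True
    # 2. Local barrier: each PE proceeds only when all of its chares arrived.
    for pe in range(num_pes):
        assert all(ch.at_sync for ch in chares if ch.pe == pe)
    # 3. Global barrier: all PEs are ready, so the per-chare statistics
    #    are complete and can be handed to the strategy.
    # 4. Run the strategy and migrate chares whose placement changed.
    placement = greedy_strategy(chares, num_pes)
    migrations = 0
    for ch in chares:
        if placement[ch.cid] != ch.pe:
            ch.pe = placement[ch.cid]
            migrations += 1
    # 5. The application resumes.
    for ch in chares:
        ch.at_sync = False
    return migrations


chares = [Chare(cid=i, pe=0, load=l) for i, l in enumerate([4.0, 3.0, 2.0, 1.0])]
moved = load_balance_step(chares, num_pes=2)
```

In this toy run all four chares start on PE 0, and the step spreads them over two PEs. The cost of steps 2 and 3 (the barriers) is what motivates the asynchronous statistics collection discussed later in the talk.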

SLIDE 21

Lifecycle

Periodically during an application run:

  1 Every processor contributes its statistics
  2 Based on the statistics collected, the central processor (root)
    • Finds the ideal LB period and informs the other processors
    • If immediate LB is required, informs the other processors
  3 During load balancing, the root decides the LB strategy

SLIDE 23

Asynchronous Collection of Stats via Reduction

• Statistics are collected via reduction periodically and frequently
• Collection has to be asynchronous: frequent local and global barriers would result in substantial overhead
• Only minimal statistics are collected, via a custom reduction in Charm++
  • Maximum load: max reducer over all processors' loads
  • Average load: sum reducer over all processors' loads
  • Minimum utilization: min reducer over all processors' utilization (ratio of busy time to total time)
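A minimal Python sketch of how the three statistics could be combined in a single reduction. It does not use the Charm++ reduction API. `LBStats`, `local_stats`, `combine`, and `reduce_stats` are hypothetical names, and taking a processor's load to be its busy time is an assumption made for the example.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class LBStats:
    max_load: float   # combined with max
    sum_load: float   # combined with sum; divided by num_pes at the root
    min_util: float   # combined with min
    num_pes: int = 1


def local_stats(busy_time, total_time):
    """Contribution of one processor (assumption: load == busy time)."""
    return LBStats(busy_time, busy_time, busy_time / total_time)


def combine(a, b):
    """Associative and commutative, so it can run up a reduction tree."""
    return LBStats(
        max(a.max_load, b.max_load),
        a.sum_load + b.sum_load,
        min(a.min_util, b.min_util),
        a.num_pes + b.num_pes,
    )


def reduce_stats(contributions):
    """What the root sees: (maximum load, average load, minimum utilization)."""
    total = reduce(combine, contributions)
    return total.max_load, total.sum_load / total.num_pes, total.min_util


contribs = [local_stats(2.0, 4.0), local_stats(3.0, 4.0), local_stats(1.0, 4.0)]
max_load, avg_load, min_util = reduce_stats(contribs)
```

Packing all three values into one contribution keeps the message small and needs only one reduction per collection round, which fits the "minimal statistics" goal on the slide.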

SLIDE 24

Asynchronous Collection of Stats via Reduction

[Figure: timeline diagram. Labels: ROOT, PE0, PE1, PE2. Reductions: Stats Red 1, Stats Red 2.]

SLIDE 26

Ideal LB Period

• Load balancing removes load imbalance, but causes the following overheads:
  • Data collection and strategy cost
  • Migration cost
• Optimal performance is obtained if load balancing is performed at an ideal period
• The gain obtained from load balancing is maximized despite the incurred overheads

SLIDE 29

Ideal LB Period

Assuming

• τ: ideal LB period, γ: total iterations
• Γ: execution time, θ: cost of LB
• y = ax + c_a: average load line equation
• y = mx + c_m: maximum load w.r.t. average load

we obtain the total execution time as

$$\Gamma = \frac{\gamma}{\tau} \times \left( \int_0^{\tau} (mx + c_m)\,dx + \theta \right) + \int_0^{\gamma} (ax + c_a)\,dx$$

Differentiating the above with respect to τ, the following LB period is obtained for minimum execution time:

$$\tau = \sqrt{\frac{2\theta}{m}}$$
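The intermediate steps are not on the slide. A short worked derivation, using only the symbols defined above and assuming m > 0 and θ > 0, could read:

```latex
\begin{align*}
\Gamma(\tau)
  &= \frac{\gamma}{\tau}\left(\frac{m\tau^{2}}{2} + c_m\tau + \theta\right)
     + \frac{a\gamma^{2}}{2} + c_a\gamma \\
  &= \gamma\left(\frac{m\tau}{2} + c_m + \frac{\theta}{\tau}\right)
     + \frac{a\gamma^{2}}{2} + c_a\gamma \\[4pt]
\frac{d\Gamma}{d\tau}
  &= \gamma\left(\frac{m}{2} - \frac{\theta}{\tau^{2}}\right) = 0
  \quad\Longrightarrow\quad
  \tau^{2} = \frac{2\theta}{m} \\[4pt]
\frac{d^{2}\Gamma}{d\tau^{2}}
  &= \frac{2\gamma\theta}{\tau^{3}} > 0
  \quad\text{so the stationary point is a minimum.}
\end{align*}
```

Only the slope m of the maximum load relative to the average load and the LB cost θ enter the result. The intercepts c_a and c_m and the average-load slope a drop out when differentiating.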

SLIDE 30

Results: Jacobi2D

[Figure: Elapsed time vs LB Period. x-axis: LB Period, y-axis: Elapsed time (s).]

SLIDE 31

Results: Jacobi2D

[Figure: jacobi2D average load and maximum load. x-axis: Iterations, y-axis: Benchmark time (s).]
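Per-iteration average and maximum load samples of this kind are the inputs to the linear model behind the ideal LB period. The sketch below shows one way to turn such samples into a period estimate. The least-squares fit and the function names `fit_line` and `ideal_lb_period` are assumptions for illustration, since the slides do not say how the lines are fitted, and the LB cost θ is taken as a known input.

```python
import math


def fit_line(ys):
    """Least-squares slope and intercept for the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ideal_lb_period(avg_loads, max_loads, lb_cost):
    """Estimate tau = sqrt(2 * theta / m) from per-iteration load samples.

    m is the slope of the maximum load measured relative to the average
    load. Returns None when the imbalance is not growing (m <= 0), in which
    case the formula gives no finite period.
    """
    gap = [mx - av for mx, av in zip(max_loads, avg_loads)]
    m, _ = fit_line(gap)
    if m <= 0:
        return None
    return math.sqrt(2 * lb_cost / m)


# Illustrative numbers only: imbalance grows by 0.02 s per iteration
# and one load balancing step costs 1 s.
avg = [1.0] * 5
mx = [1.0 + 0.02 * i for i in range(5)]
tau = ideal_lb_period(avg, mx, lb_cost=1.0)   # about 10 iterations
```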

SLIDE 34

LB Period Augmentations

• By the time the root announces the LB period, some chares may already have gone beyond it
• A consensus mechanism detects such cases and decides the new LB period
• As application characteristics change, the LB period may change
  • Capability to refine (expand and contract) the LB period if possible
• If the prediction and the collected statistics do not match, LB is triggered immediately if required
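The slides do not spell out the consensus rule or the mismatch test, so the following Python sketch only illustrates one plausible reading. Both rules, the function names, and the `tolerance` parameter are assumptions, not the Meta-Balancer implementation.

```python
def agree_on_lb_iteration(proposed_iter, chare_iters):
    """Pick an LB iteration that no chare has already passed.

    Assumed rule: keep the root's proposal if every chare is still before
    it; otherwise move the LB point just past the furthest chare.
    """
    furthest = max(chare_iters)
    if furthest < proposed_iter:
        return proposed_iter
    return furthest + 1


def needs_immediate_lb(predicted_gap, observed_gap, tolerance=0.25):
    """Flag a mismatch between predicted and observed imbalance.

    predicted_gap and observed_gap are (maximum load - average load).
    Assumed rule: trigger when the observed imbalance exceeds the
    prediction by more than `tolerance` (a hypothetical knob).
    """
    if predicted_gap <= 0:
        return observed_gap > 0
    return observed_gap > predicted_gap * (1 + tolerance)


# Root proposes iteration 20, but one chare is already at iteration 22.
lb_iter = agree_on_lb_iteration(20, [19, 22, 21])      # moved past the furthest chare
trigger = needs_immediate_lb(predicted_gap=0.1, observed_gap=0.2)
```

Expanding or contracting the period then amounts to recomputing the ideal period from fresh statistics and feeding the new proposal through the same agreement step.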

SLIDE 37

Communication vs Computation

• Applications can be communication bound, computationally intensive, or a mixture of the two
• Meta-Balancer uses the αβ cost of an application to identify whether it is communication intensive. The cost consists of two components:
  1 α cost: startup cost of the messages sent
  2 β cost: bandwidth cost of the bytes sent
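A small Python sketch of the αβ cost as described: a per-message startup term plus a per-byte bandwidth term. Comparing the result against the compute time with a `ratio_threshold` is an assumption added for illustration. The slide only says the αβ cost is used to identify communication-intensive applications, not how.

```python
def alpha_beta_cost(num_msgs, num_bytes, alpha, beta):
    """Estimated communication time.

    alpha: startup cost per message, beta: cost per byte
    (both in the same time unit as the result).
    """
    return alpha * num_msgs + beta * num_bytes


def is_communication_intensive(num_msgs, num_bytes, compute_time,
                               alpha, beta, ratio_threshold=1.0):
    """Assumed test: communication cost exceeds a fraction of compute time."""
    comm_time = alpha_beta_cost(num_msgs, num_bytes, alpha, beta)
    return comm_time > ratio_threshold * compute_time


# Illustrative values only (time unit: microseconds).
cost = alpha_beta_cost(num_msgs=100, num_bytes=200, alpha=2.0, beta=0.5)
comm_bound = is_communication_intensive(100, 200, compute_time=250.0,
                                        alpha=2.0, beta=0.5)
```

With many small messages the α term dominates, and with few large messages the β term dominates, which is why both message counts and byte counts need to be tracked.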

SLIDE 39

Refine vs Comprehensive

• First-time load balancing uses comprehensive load balancers
• Thereafter, refinement strategies are invoked unless history shows poor quality of refinement-based strategies
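The rule above can be sketched as a small Python decision function. How "poor quality" is judged from history is not given on the slide. The measure used here (recent refinement steps each reducing the max/avg imbalance ratio by less than `min_gain`) and the names `choose_strategy`, `min_gain`, and `window` are hypothetical.

```python
def choose_strategy(lb_count, refine_history, min_gain=0.05, window=3):
    """Return "comprehensive" or "refine".

    lb_count: number of load balancing steps performed so far.
    refine_history: list of (imbalance_before, imbalance_after) pairs for
        past refinement steps, where imbalance is the max/avg load ratio.
    """
    # The first load balancing step always uses a comprehensive strategy.
    if lb_count == 0:
        return "comprehensive"
    # Assumed quality test: every recent refinement gained too little.
    recent = refine_history[-window:]
    if recent and all(before - after < min_gain * before
                      for before, after in recent):
        return "comprehensive"
    return "refine"


first = choose_strategy(0, [])                                   # "comprehensive"
later = choose_strategy(2, [(1.5, 1.1)])                         # "refine"
stuck = choose_strategy(4, [(1.5, 1.49), (1.5, 1.5), (1.6, 1.58)])  # "comprehensive"
```

The intent is that refinement strategies, which move few chares, stay the default, while a comprehensive strategy is reserved for the first step and for cases where refinement has stopped helping.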

SLIDE 40

Figure: leanmd mini-application (curves: average load, maximum load; x-axis: Iterations, y-axis: Load (s))
SLIDE 41

Figure: kNeighbor with high communication (curves: imbalance ratio (max/avg), idle/load; x-axis: Iterations, y-axis: Ratio)
SLIDE 42

Figure: Dynamic triggering of LB for kNeighbor (curves: average load, maximum load; x-axis: Iterations, y-axis: Load (s))
SLIDE 43

Overall Scheme

[Figure: LB Strategy Selection flowchart. Decision nodes: load imbalance, lot of idle time, αβ cost, first time, high imbalance, good comprehensive LB, good Refine LB. Outcomes: No LB, Comm LB, comprehensive strategy, RefineLB.]

SLIDE 46

Conclusion

• Load imbalance affects the performance and scalability of an application
• Leaving it to the application programmer to manually handle this imbalance in a dynamic application is unreasonable and inefficient
• Meta-Balancer relieves the user of load balancing decisions by
  • Frequently collecting minimal statistics about the application
  • Controlling the load balancing decisions based on the application characteristics

SLIDE 49

Future Work

• Expand strategy selection
  • Hierarchical vs centralized
  • Topology-aware vs topology-oblivious
• More accurate prediction of load using higher-order curves

SLIDE 50

Thank You!