

SLIDE 1

Robson E. De Grande, Azzedine Boukerche

PARADISE Laboratory, SITE – University of Ottawa
September 2010

SLIDE 2

• Introduction
  • High Level Architecture
  • Dynamic Load Balancing
• Related Work
• Challenging Issues
• Proposed Balancing Scheme
  • Architecture
  • Functioning
  • Prediction Model
• Experiments and Results
• Conclusion and Future Work

DS-RT 2011

SLIDE 3

• High Level Architecture
  • Coordination of Distributed Simulations
    • Interoperability and Reusability
  • No management of resources: load imbalances
  • DDM → only Communication Filtering
    • It partially works for communication balancing

SLIDE 4

• Grid services
  • Resource Sharing Management System
  • Grids + Stateful Web Services
  • Access/Monitoring/Authentication – VO/Data Replication
  • Globus Toolkit

SLIDE 5

• Dynamic Load Balancing
  • Static partitioning
    • Deterministic processing
  • On-demand adaptation
    • Unpredictable changes
  • Large-scale environments
    • Heterogeneity
    • Shared resources
    • Large communication latencies

SLIDE 6

Authors | Sim | Monitoring | Re-distribution | Migration | Heterog. | Ext. load
Glazer & Tropper | Opt | t advance | comp | • | partially | partially
Jiang et al. | Opt | t advance | comp | • | weights | partially
Burdorf & Marti | Opt | LVT/vector | comp/speed/StD | simple/slow | partially | partially
Schlagenhaft et al. | Opt | VTP | comp/pVTP + mig | vague | partially | partially
Avril & Tropper | Opt | comm/throughput | load (comm) | vague | partially | partially
Carothers & Fujimoto | Opt | PAT | load (policies) | clustered/slow | partially | partially
Jiang et al. | Opt | IPC | comp+comm | clustered/slow | partially | partially

SLIDE 7

Authors | Sim | Monitoring | Re-distribution | Migration | Heterog. | Ext. load
Deelman & Szymanski | Opt | unproc event | comp (chains) | neighbor | •
Choe & Tropper | Opt | space-time product | comp | vague | partially | partially
Low | Opt | *CPU load | comm/comp/lookahead | •
Peschlow et al. | Opt | t advance | comm/comp | • | partially | partially
Wilson & Shen | Disc | CPU load | policies (comm/comp) | •
Boukerche & Das | Con | CPU load | comm/comp | •
Xiao et al. | Con | comm dep | sched lvl | •

SLIDE 8

Authors | Sim | Monitoring | Re-distribution | Migration | Heterog. | Ext. load
Gan et al. | Con | Sim time | Central (priority) | •
Boukerche | Con | Entropy (!) | Comp+comm | •
Ajaltouni et al. | Con | CPU load | Comm/comp | Global sync | •
Luthi & Grossmann | HLA | • | Global sync | •
Zajac et al. | HLA Grids | • | Global sync | • | Monitor
Cai et al. | HLA Grids | • | Global sync | • | Monitor
Tan & Lim | HLA | • | queues | •
Bononi et al. | HLA | • | Comm. Dep | Comm | Fed objects | Partially | •
Grande & Boukerche | HLA | • | Comm. Dep/CPU load | Comm/comp | Freeze free | yes | yes

SLIDE 9

• A balancing approach fully covers
  • Heterogeneity
  • External background load
  • Scalability
  • HLA simulation characteristics
• However
  • Responsiveness
  • Lack of efficiency
    • Totally reactive scheme
    • Cyclic load oscillations
      • Precipitated load transfers

SLIDE 10

• Architecture

SLIDE 11

• Reactive
  • Balancing cycles
• Load Balancing in 3 phases
  • Monitoring
    • Data gathering
    • Detection of imbalances
  • Re-distribution
  • Migration
• Prediction
  • Detection
  • Re-distribution

SLIDE 12

• Collection
  • Cluster
    • WebMDS
      • CPU load
      • Normalization
  • Local
    • Management Java Library
      • CPU load
  • Hierarchical gathering
    • LLBs and CLBs
• Filtering
  • Irrelevant data
  • Non-managed resources
    • Not balanced
    • Overloaded nodes without federates
  • Cut-off position


[Figure: federates and MDS]
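The filtering step on this slide can be illustrated with a minimal Python sketch. The slide does not give the data layout or the cut-off rule, so everything named here is an assumption for illustration: the `NodeSample` fields, normalizing CPU load by a relative `capacity`, the `overload_threshold`, and placing the cut-off at the mean normalized load.

```python
from dataclasses import dataclass


@dataclass
class NodeSample:
    """One monitoring sample for a node (hypothetical layout)."""
    name: str
    cpu_load: float   # raw CPU load as reported by monitoring
    capacity: float   # relative processing capacity, used for normalization
    managed: bool     # True if the balancer is allowed to use this node
    federates: int    # number of federates currently hosted


def normalized(sample):
    """Normalize raw load so heterogeneous nodes can be compared."""
    return sample.cpu_load / sample.capacity


def filter_samples(samples, overload_threshold=1.0):
    """Drop irrelevant data and return the rest, most loaded first."""
    kept = []
    for s in samples:
        if not s.managed:
            continue  # non-managed resource: not balanced
        if s.federates == 0 and normalized(s) > overload_threshold:
            continue  # overloaded by external load, nothing to migrate away
        kept.append(s)
    return sorted(kept, key=normalized, reverse=True)


def split_at_cutoff(ordered):
    """Split an ordered list at an assumed cut-off: the mean load."""
    if not ordered:
        return [], []
    mean = sum(normalized(s) for s in ordered) / len(ordered)
    cutoff = sum(1 for s in ordered if normalized(s) > mean)
    return ordered[:cutoff], ordered[cutoff:]
```

A caller would pass the filtered list to `split_at_cutoff` to obtain candidate sources (above the cut-off) and candidate destinations (below it).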

SLIDE 13

• Hierarchical/Region structure
  • Redistribution among neighbour CLBs
  • Inter-relations between CLBs
• Two scopes
  • Local
    • Pair-match evaluations
  • Cluster
    • Comparisons between neighbours
    • Pair-match evaluations

SLIDE 14

• Detection/Redistribution
  • Predictions → current load status + [past, forecast]
  • Different levels
    • Short term
      • Responsiveness to current imbalances
    • Medium and long terms
      • Preventive measures for future load trends
  • Local scope
    • Redistribution on each detection
  • Inter-domain scope
    • 1 - Cluster load evaluation
    • 2 - Redistribution on each detection

SLIDE 15

• Load comparisons
  • Ordered by prediction
    • Short term
    • Medium term
    • Long term
    • Emphasis on predictions closer to current time
  • Inter-domain
    • Ordered by prediction
• Selection of resource candidates
  • In prediction scopes
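One way to read "ordered by prediction, with emphasis on predictions closer to current time" is a weighted score over the three prediction terms, followed by pairing the most loaded candidates with the least loaded ones. The Python sketch below illustrates that reading only. The weights `(0.5, 0.3, 0.2)`, the function names, and the outside-in pairing rule are assumptions, not taken from the slides.

```python
def weighted_load(short, medium, long, weights=(0.5, 0.3, 0.2)):
    """Combine the three predictions; nearer terms get larger weights."""
    w_short, w_medium, w_long = weights
    return w_short * short + w_medium * medium + w_long * long


def order_by_prediction(predictions, weights=(0.5, 0.3, 0.2)):
    """Return resource names sorted from most to least loaded.

    `predictions` maps a resource name to a (short, medium, long) tuple.
    """
    return sorted(
        predictions,
        key=lambda name: weighted_load(*predictions[name], weights=weights),
        reverse=True,
    )


def pair_candidates(ordered):
    """Pair the most loaded resource with the least loaded, and so on."""
    pairs = []
    i, j = 0, len(ordered) - 1
    while i < j:
        pairs.append((ordered[i], ordered[j]))
        i += 1
        j -= 1
    return pairs
```

Each resulting pair would then go through a pair-match evaluation to decide whether a load transfer is worthwhile.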

SLIDE 16

• Balancing cycles
  • Uniformly spaced time intervals
• Time series
• Smoothing and forecasting
• Past is considered to define a future load status
  • Double EWMA
    • Load tendency
  • Extrapolation of smoothing
    • Future balancing cycles: SP, MP, and LP


[Figure: SP, MP, and LP prediction points]
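The slide names a double EWMA with extrapolation but does not give its equations. As a hedged illustration, the Python sketch below uses Brown's double exponential smoothing, one common formulation in which the load is smoothed twice and the level and trend are extrapolated over a number of future balancing cycles. The choice of Brown's method, the smoothing factor `alpha`, and the mapping of horizons 1, 3, and 5 to SP, MP, and LP are assumptions.

```python
def double_ewma_forecast(loads, alpha=0.5, horizons=(1, 3, 5)):
    """Forecast load for future balancing cycles from a uniform time series.

    loads    -- past load samples, one per balancing cycle (oldest first)
    alpha    -- smoothing factor in (0, 1); an assumed value
    horizons -- cycles ahead to forecast (assumed to stand for SP, MP, LP)
    """
    if not loads:
        raise ValueError("at least one load sample is required")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")

    # First and second smoothing both start from the first sample.
    s1 = s2 = loads[0]
    for x in loads[1:]:
        s1 = alpha * x + (1.0 - alpha) * s1    # smoothed load
        s2 = alpha * s1 + (1.0 - alpha) * s2   # smoothed again

    level = 2.0 * s1 - s2                        # current load estimate
    trend = alpha / (1.0 - alpha) * (s1 - s2)    # load tendency per cycle

    # Extrapolate the smoothed series m cycles into the future.
    return {m: level + m * trend for m in horizons}
```

With a steadily growing load the forecasts grow with the horizon, while a constant load yields the same value at every horizon.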

SLIDE 17

• Predictive adjustment
  • Adjustment of balancing parameters
    • Before pair-match analysis
  • Direction analysis
    • Source
    • Destination
  • 3 conditions → enforcement
    • 1 – Load difference is increasing
      • Less imbalance tolerance
    • 2 – One resource is stabilizing
      • Intermediate tolerance
    • 3 – Both resources are stabilizing
      • More imbalance tolerance
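The three conditions can be sketched as a small Python function that picks an imbalance tolerance for a source/destination pair before the pair-match analysis. This is an illustration under stated assumptions: the base tolerance, the scaling factors 0.5 and 1.5, the stability margin `eps`, and the reading of "stabilizing" as "predicted load stays within `eps` of current load" are all hypothetical.

```python
def imbalance_tolerance(src_now, src_pred, dst_now, dst_pred,
                        base=0.2, eps=0.05):
    """Return the tolerated load difference for one source/destination pair.

    src_now, dst_now   -- current loads of source and destination
    src_pred, dst_pred -- predicted loads for the same resources
    base, eps          -- assumed base tolerance and stability margin
    """
    diff_now = src_now - dst_now
    diff_pred = src_pred - dst_pred

    # Condition 1: the load difference is increasing -> less tolerance.
    if diff_pred > diff_now + eps:
        return base * 0.5

    src_stable = abs(src_pred - src_now) <= eps
    dst_stable = abs(dst_pred - dst_now) <= eps

    # Condition 3: both resources are stabilizing -> more tolerance.
    if src_stable and dst_stable:
        return base * 1.5

    # Condition 2 (one resource stabilizing) and any other case:
    # intermediate tolerance.
    return base
```

A pair would then be treated as imbalanced only when its current load difference exceeds the returned tolerance.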

SLIDE 18

• 2-step migration
  • No global synchronization
  • Grid RFT: initialization files
  • Peer-to-peer: execution state + messages
• Less migration delay
  • Wait -> state + messages
• Minimum latency
  • Greater system reactivity
• Migration Proxy
  • Facilitates transient data transfer

[Figure: federate migration between MM and MM'; transfers labeled "Init Files" and "Status + Messages"]

SLIDE 19

• Experimental Scenario
  • Federates deployed on a 56-machine distributed system
    • Two clusters: 32 and 24 nodes
  • Each federate → communication + computation
    • Emphasis on computation
  • Synthetic load
• Scenario
  • Tank fight simulation
  • From 1 to 1000 federates
  • 1 object per federate
• Predictive scheme
  • Prediction ranges: 1, 3, 5

SLIDE 20

• Static simulation load
  • Increasing number of federates
    • 1 to 1000

SLIDE 21

• Static external load
  • Increasing number of federates
    • 1 to 1000

SLIDE 22

• Dynamic simulation load
  • Random, periodic load changes
    • 1 to 1000 federates

SLIDE 23

• Predictive, distributed balancing system
  • Forecasting of computational load changes
  • Three levels of prediction:
    • Short term → smoothing mostly
    • Medium term
    • Long term
• Efficiency gain
  • Fewer unnecessary migrations
  • Prevention of load imbalances
    • Cyclic oscillations
• Future Work
  • Further prediction analysis
    • Migration time
    • Cyclic load changes → size of cycle period
    • Heterogeneous simulations
  • Other prediction models

SLIDE 24

Thanks
