Toward a Fully Decentralized Algorithm for Multiple Bag-of-tasks Application Scheduling on Grids

Rémi Bertin, Arnaud Legrand, Corinne Touati
Laboratoire LIG, CNRS-INRIA Grenoble, France

Aussois Workshop

Outline

1. Framework
2. Lagrangian Optimization
3. Simulations: Early Results

Motivation

Large-scale distributed computing platforms result from the collaboration of many users:

◮ Sharing resources amongst users should somehow be fair.
◮ The size of these systems prevents the use of centralized approaches ⇒ need for distributed scheduling.
◮ Task regularity (SETI@home, BOINC, . . . ) ⇒ steady-state scheduling.

⇒ Designing a Fair and Distributed scheduling algorithm for this framework.


Platform Model

[Figure: platform graph with node speeds Wi, Wj and link bandwidth Bi→j.]

◮ General platform graph G = (N, E, W, B).
◮ Speed of Pn ∈ N: Wn (in MFlops/s).
◮ Bandwidth of (Pi → Pj): Bi,j (in MB/s).
◮ Linear-cost communication and computation model: X/Bi,j time units to send a message of size X from Pi to Pj.
◮ Communications and computations can be overlapped.
◮ Multi-port communication model.

Application Model

Multiple applications:

◮ A set A of K applications A1, . . . , AK.
◮ Each consists of a large number of same-size independent tasks; each application is thus defined by a computation cost wk (in MFlops) and a communication cost bk (in MB).
◮ Different communication and computation demands for different applications.

Hierarchical Deployment

◮ Each application Ak originates from a master node Pm(k) that initially holds all the input data necessary for Ak.
◮ Communications are only required outwards from the master nodes: the amount of data returned by the workers is negligible.
◮ Each application Ak is deployed on the platform as a tree. Therefore, if an application k wants to use a node Pn, all its data will use a single path from Pm(k) to Pn, denoted by (Pm(k) ⇝ Pn).

Steady-State Scheduling and Utility

◮ All tasks of a given application are identical and independent ⇒ we do not really need to care about where and when (as opposed to classical scheduling problems).
◮ We only need to focus on average values in steady state.
◮ Steady-state values:
  ◮ Variables: average number of tasks of type k processed by processor n per time unit: ̺n,k.
  ◮ Throughput of application k: $\varrho_k = \sum_{n \in N} \varrho_{n,k}$.

Theorem 1. From "feasible" ̺n,k, it is possible to build an optimal periodic infinite schedule (i.e., one whose steady-state rates are exactly the ̺n,k). Such a schedule is asymptotically optimal for the makespan.

Decentralized Scheduling

The rates ̺n,k are sufficient to drive simple demand-driven scheduling algorithms (a sketch follows below):

◮ Dispatch incoming tasks of type k to the queues (n, k) with "proportion" ̺n,k.
◮ Request tasks from your father when incoming queue sizes get below a fixed threshold.

[Figure: histogram of the frequency of the deviation $(\varrho_k^{(th)} - \varrho_k^{(exp)})/\varrho_k^{(th)}$ from the theoretical throughput.]

⇒ We can focus on finding the ̺n,k.
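As a purely illustrative sketch (the talk gives no code), the dispatch rule can be as simple as a weighted random choice among the local queue and the child subtrees; all names below are hypothetical:

```python
# Minimal sketch of demand-driven dispatch: tasks of application k are routed
# to the local queue or to a child subtree with probability proportional to
# the (aggregated) steady-state rates rho_{n,k}.
import random

def dispatch(rates):
    """rates: {queue: aggregate rho_{n,k} of the nodes served through it}."""
    return random.choices(list(rates), weights=list(rates.values()), k=1)[0]

# Example: 2 tasks/s are processed locally, 5 and 3 tasks/s in two subtrees;
# on average 20% / 50% / 30% of incoming tasks go to each queue.
print(dispatch({"local": 2.0, "child_A": 5.0, "child_B": 3.0}))
```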

Utility and Optimization Problem

◮ Let Uk(̺k) be the utility associated to application k. We aim at maximizing $\sum_{k \in K} U_k(\varrho_k)$.
◮ It has been shown that different choices of Uk lead to different kinds of fairness. Typically, $U_k(\varrho_k) = \log(\varrho_k)$ (proportional fairness) or $U_k(\varrho_k) = \varrho_k^{1-\alpha}/(1-\alpha)$ (α-fairness).
◮ Maximize $\sum_k \log(\varrho_k)$ under the constraints:

$$\varrho_k = \sum_n \varrho_{n,k}, \qquad \forall n:\ \sum_k \varrho_{n,k}\, w_k \le W_n, \qquad \forall (P_i \to P_j):\ \sum_k \; \sum_{\substack{n \text{ s.t. } (P_i \to P_j) \in (P_{m(k)} \rightsquigarrow P_n)}} \varrho_{n,k}\, b_k \le B_{i,j}$$

◮ This can be solved in polynomial time with semi-definite programming [Touati.et.al.06]. It is very centralized, though. Can we solve it in a distributed way? (A toy convex-programming sketch follows below.)
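For reference, the centralized problem is an ordinary concave program, so it can be checked with an off-the-shelf solver. Here is a minimal CVXPY sketch under assumptions that are not in the talk: a star-shaped platform where each worker has a dedicated link from its master, so each path constraint collapses to one constraint per node; all numbers are illustrative:

```python
# Toy centralized solve: maximize sum_k log(rho_k) under linear capacity
# constraints, on a hypothetical 4-worker star platform.
import cvxpy as cp
import numpy as np

K, N = 3, 4
w = np.array([5000.0, 800.0, 1500.0])   # MFlops per task of each application
b = np.array([1000.0, 2000.0, 1500.0])  # MB per task of each application
W = np.full(N, 5e8)                     # node speeds
B = np.full(N, 5e8)                     # bandwidth of each worker's own link

rho = cp.Variable((N, K), nonneg=True)  # rho[n, k]: tasks of A_k run on P_n per s
rho_k = cp.sum(rho, axis=0)             # throughput of each application

constraints = [rho @ w <= W,            # computation: sum_k rho[n,k] w_k <= W_n
               rho @ b <= B]            # communication on each dedicated link
problem = cp.Problem(cp.Maximize(cp.sum(cp.log(rho_k))), constraints)
problem.solve()
print(rho_k.value, problem.value)
```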


Lagrangian Optimization: Basics

◮ Designed to solve non-linear optimization problems:
  ◮ Let α ↦ f(α) be a function to maximize.
  ◮ Let $(C_i(\alpha) \le 0)_{i \in [1..n]}$ be a set of n constraints.
  ◮ We wish to solve:

$$(P)\qquad \text{maximize } f(\alpha) \quad \text{s.t. } \forall i \in [1..n],\ C_i(\alpha) \le 0, \text{ and } \alpha \ge 0$$

◮ The Lagrangian function: $L(\alpha, \lambda) = f(\alpha) - \sum_{i \in [1..n]} \lambda_i C_i(\alpha)$.
◮ The dual function: $d(\lambda) = \max_{\alpha \ge 0} L(\alpha, \lambda)$.
◮ Under some weak hypotheses, solving (P) is equivalent to solving the dual problem:

$$(D)\qquad \text{minimize } d(\lambda) \quad \text{s.t. } \lambda \ge 0$$

So what?

◮ Two coupled problems with simple constraints.
◮ The structure of the constraints is transposed to (D), and a gradient descent algorithm is a natural way to solve these two coupled problems (see the toy sketch below).
◮ This technique has been used successfully for network resource sharing [Kelly.98], TCP analysis [Low.03], flow control in multi-path networks [Wang.et.al.03], . . .
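As a toy illustration (not from the talk), here is the primal-dual gradient scheme on a one-dimensional instance, f(x) = log x with the single constraint x − c ≤ 0, whose optimum is x = c with shadow price λ = 1/c:

```python
# Primal-dual (Arrow-Hurwicz style) gradient on L(x, lam) = log x - lam*(x - c).
import numpy as np

c, gamma = 4.0, 0.05
x, lam = 1.0, 1.0                                # primal variable, shadow price
for _ in range(2000):
    # dL/dx = 1/x - lam ; dL/dlam = -(x - c) = c - x
    x = max(x + gamma * (1.0 / x - lam), 1e-9)   # gradient ascent on the primal
    lam = max(lam - gamma * (c - x), 0.0)        # gradient descent on the dual
print(x, lam)    # converges near x = c = 4 and lam = 1/c = 0.25
```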

Trying to use Lagrangian optimization

◮ What does the Lagrangian function look like?

$$L(\varrho, \lambda, \mu) = \sum_{k \in K} \log\Big(\sum_i \varrho_{i,k}\Big) + \sum_i \lambda_i \Big(W_i - \sum_k \varrho_{i,k}\, w_k\Big) + \sum_{(P_i \to P_j)} \mu_{i,j} \Big(B_{i,j} - \sum_k \; \sum_{\substack{n \text{ s.t. } (P_i \to P_j) \in (P_{m(k)} \rightsquigarrow P_n)}} \varrho_{n,k}\, b_k\Big)$$

◮ Remember, we want to compute $\min_{\lambda,\mu \ge 0} \max_{\varrho \ge 0} L(\varrho, \lambda, \mu)$. We can solve this problem by simply doing an "alternating" gradient descent (skipping a few details here to keep it simple and just present the general idea):

$$\varrho_{i,k} \leftarrow \varrho_{i,k} + \gamma \frac{\partial L}{\partial \varrho_{i,k}}, \qquad \lambda_i \leftarrow \lambda_i - \gamma \frac{\partial L}{\partial \lambda_i}, \qquad \mu_{i,j} \leftarrow \mu_{i,j} - \gamma \frac{\partial L}{\partial \mu_{i,j}}$$

Toward a Distributed Algorithm...

◮ ̺i,k is "private" to the agent of application k running on node i.
◮ λi is attached to node i and µi,j is attached to (Pi → Pj). λi and µi,j are called shadow variables or shadow prices. They can naturally be thought of as the price to pay to use the corresponding resource.
◮ A gradient descent algorithm on the primal-dual problem can thus be seen as a bargain between applications and resources.
◮ We need to find an efficient way to implement this bargain, i.e., to compute the updates. To this end, the following quantities are useful and easy to compute via recursive propagation:

$$\sigma_k^n = \sum_{p \text{ s.t. } n \in (P_{m(k)} \rightsquigarrow P_p)} \varrho_{p,k} \qquad \text{(aggregate throughput of a subtree)}$$

$$\eta_k^n = \sum_{(P_i \to P_j) \in (P_{m(k)} \rightsquigarrow P_n)} \mu_{i,j} \qquad \text{(aggregate communication price)}$$

[Figure: hierarchical deployment, annotated with ̺k, ̺i,k, µi,j, σ^i_k and η^i_k.]

Toward a Distributed Algorithm...

Prices and rates can thus be propagated and aggregated to perform the following updates (a sketch in code follows below):

$$p_k^i(t+1) \leftarrow b_k\, \eta_k^i(t) + w_k\, \lambda_i(t)$$
$$\varrho_k(t+1) \leftarrow \sigma_k^{m(k)}(t+1)$$
$$\varrho_{i,k}(t+1) \leftarrow \Big[\varrho_{i,k}(t) + \gamma_\varrho \big(U_k'(\varrho_k(t)) - p_k^i(t)\big)\Big]^+$$
$$\lambda_i(t+1) \leftarrow \Big[\lambda_i(t) + \gamma_\lambda \Big(\sum_k w_k\, \varrho_{i,k} - W_i\Big)\Big]^+$$
$$\mu_{i,j}(t+1) \leftarrow \Big[\mu_{i,j}(t) + \gamma_\mu \Big(\sum_k b_k\, \sigma_k^j - B_{i,j}\Big)\Big]^+$$

◮ This algorithm is fully distributed and converges to the optimal solution provided a good choice of γ̺, γλ and γµ is made.
◮ This algorithm seamlessly adapts to application/node arrivals and to load variations.
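To make the updates concrete, here is a minimal synchronous sketch in Python. Everything in it is assumed for illustration only: a 3-node chain 0 → 1 → 2 whose node 0 masters both applications, toy costs and capacities (not the talk's experimental setting), Uk = log, and hand-picked step sizes; the "Scaling!" slides below explain why the step sizes are delicate.

```python
# One synchronous iteration = downward price propagation, upward rate
# aggregation, then one local gradient step per variable owner.
import numpy as np

w = np.array([5.0, 0.8])               # computation cost per task (toy units)
b = np.array([1.0, 2.0])               # communication cost per task (toy units)
W = np.array([10.0, 10.0, 10.0])       # node speeds
links = [(0, 1), (1, 2)]               # tree edges, parents listed first
B = {(0, 1): 10.0, (1, 2): 10.0}       # link bandwidths

rho = np.full((3, 2), 0.1)             # rho[i, k]: tasks of A_k run on P_i per s
lam = np.zeros(3)                      # lambda_i: computation prices
mu = {e: 0.0 for e in links}           # mu_{i,j}: communication prices
g_rho, g_lam, g_mu = 0.01, 0.01, 0.01  # hand-picked step sizes

for t in range(20000):
    # Downward pass: eta[i, k] = sum of the mu's on the path master -> P_i.
    eta = np.zeros((3, 2))
    for (i, j) in links:
        eta[j] = eta[i] + mu[(i, j)]
    # Upward pass: sigma[i, k] = aggregate rate of the subtree rooted at P_i.
    sigma = rho.copy()
    for (i, j) in reversed(links):
        sigma[i] += sigma[j]
    rho_k = sigma[0]                   # application throughputs, seen at the master
    # Local updates.
    p = b * eta + w * lam[:, None]     # p^i_k = b_k * eta^i_k + w_k * lambda_i
    rho = np.maximum(rho + g_rho * (1.0 / rho_k - p), 1e-9)   # U'_k = 1/rho_k
    lam = np.maximum(lam + g_lam * (rho @ w - W), 0.0)
    for (i, j) in links:
        mu[(i, j)] = max(mu[(i, j)] + g_mu * (sigma[j] @ b - B[(i, j)]), 0.0)

print(rho_k, np.log(rho_k).sum())      # final throughputs and objective
```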


Experimental Setting

◮ The simulator is SimGrid.
◮ Fully synchronous gradient.
◮ Checking the correctness of the results using semi-definite programming.
◮ Very simple platform and applications:

[Figure: 5-node platform A, B, C, D, E hosting the masters m(1), m(2), m(3), with B = 5·10⁸ and W = 5·10⁸.]

We used three kinds of applications of respective (b, w): (1000, 5000), (2000, 800), and (1500, 1500).

Basic Version of the Algorithm

[Figure: objective function $\sum_k \log \varrho_k$ over the iterations.] Numerical instabilities and global inefficiencies; using a smaller step γ̺ removes the instability but leads to slow convergence.

[Figure: throughputs ̺1, ̺2, ̺3 of the three applications.] Between two iterations, a decrease or increase by a factor of five or more can happen!

[Figure: rates ̺A,1, ̺B,1, ̺C,1, ̺D,1, ̺E,1.] Detailing the rates for application 1.

[Figure: µE,D and ̺E,k over the iterations.] Correlation between the rate of an application on a given node and the price it experiences.

Scaling!

The original update equation for ̺ is:

$$\varrho_{i,k}(t+1) \leftarrow \Big[\varrho_{i,k}(t) + \gamma_\varrho \Big(\frac{1}{\varrho_k(t)} - p_k^i(t)\Big)\Big]^+$$

A small value of ̺ leads to huge updates and thus to severe oscillations. This is a known issue and, as mentioned in [Wang.et.al.03], one can normalize as follows (shown as a snippet below):

$$\varrho_{i,k}(t+1) \leftarrow \Big[\varrho_{i,k}(t) + \gamma_\varrho \big(1 - \varrho_k(t)\, p_k^i(t)\big)\Big]^+$$

Unfortunately, it does not help (the previous experiments actually use this normalized update). It merely avoids division by 0 but is insufficient to damp oscillations.

Updating ̺ has an impact on the prices λ and µ, which in turn impact the ̺ update. The second update of ̺ should have the same order of magnitude as the first one (or be smaller) to avoid numerical instabilities that prevent convergence of the algorithm.
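The normalization is a one-line change versus the basic update. A sketch reusing the shapes of the earlier loop, with dummy values only to make the line executable:

```python
import numpy as np

rho = np.full((3, 2), 0.1); rho_k = rho.sum(axis=0)   # dummy state
p = np.full((3, 2), 0.05); g_rho = 0.01               # dummy prices and step
# Basic:      rho + g_rho * (1/rho_k - p)   -- explodes when rho_k is small.
# Normalized: multiply the gradient by rho_k, i.e.:
rho = np.maximum(rho + g_rho * (1.0 - rho_k * p), 1e-9)
```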

Scaling Again!

Assume that we have reached the equilibrium and increase λi by ∆λi. Then:

$$\Delta\varrho_{i,k} = -\gamma_\varrho^{(2)}\, w_k\, \Delta\lambda_i\, \varrho_k.$$

In turn, such a variation incurs a variation of λi:

$$\sum_k \gamma_\lambda\, w_k\, \Delta\varrho_{i,k} = \Delta\lambda_i \Big(\sum_k \gamma_\lambda\, \gamma_\varrho^{(2)}\, w_k^2\, \varrho_k\Big).$$

Thus, the solution of our gradient is stable only if $\sum_k \gamma_\lambda \gamma_\varrho^{(2)} w_k^2 \varrho_k < 1$. Therefore, λ's update should be replaced by (see the snippet below):

$$\lambda_i(t+1) \leftarrow \Bigg[\lambda_i(t) + \gamma_\lambda\, \frac{\sum_k w_k\, \varrho_{i,k} - W_i}{\sum_k w_k^2\, \varrho_k}\Bigg]^+$$

It does not hurt, and a similar scaling can be done for the µ's.
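In the earlier sketch's notation, the rescaled dual update is again a one-line change (a sketch with dummy values; the denominator is the $\sum_k w_k^2 \varrho_k$ term above, which damps the λ → ̺ → λ feedback loop regardless of the magnitudes of the wk and ̺k):

```python
import numpy as np

w = np.array([5.0, 0.8]); W = np.array([10.0, 10.0, 10.0])   # dummy costs/speeds
rho = np.full((3, 2), 1.0); rho_k = rho.sum(axis=0)          # dummy state
lam = np.zeros(3); g_lam = 0.01
# Rescaled update: divide the gradient by sum_k w_k^2 * rho_k.
lam = np.maximum(lam + g_lam * (rho @ w - W) / (w**2 @ rho_k), 0.0)
print(lam)
```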

Scaled Version of the Algorithm

[Figure: objective function over the first 50 iterations (left) and 400 iterations (right).] The oscillations, due to a badly chosen initialization value, quickly vanish (left graph). The algorithm almost instantly reaches a decent value (within 5% of the optimal value after 17 iterations) and relatively quickly reaches a good value (within 1% of the optimal value after 83 iterations) (right graph).

[Figure: objective function over 4000 iterations.] High number of iterations: after 498 iterations, the performance remains higher than 99.5% of the optimal and still increases further with the number of iterations.

[Figure: throughputs ̺1, ̺2, ̺3 over 4000 iterations.] Convergence of the ̺i, i = 1..3: no more oscillations occur. The throughput of each application slowly converges to its "optimal" value.

[Figure: prices λA, . . . , λE over 400 iterations.] Prices evolve smoothly. As the number of iterations increases, they converge to their optimal values while remaining positive, meaning that the resources they refer to are neither under-utilized nor overloaded.

Conclusion

◮ Not enough time to present related work, but this approach is heavily inspired by Low's work [Wang.et.al.03] on flow control in multi-path networks.
◮ The setting (BoT applications, grids) is different though, and new problems arise.
◮ The resulting algorithms are different (few sources and many sinks here).
◮ The convergence issue is mainly due to the fact that resource usage is not homogeneous (each application has its own wk and bk). The previous scaling is effective and easy to implement.

Future Work

◮ There may be situations where the previous scaling is not sufficient. When the optimal throughputs of the applications do not have the same order of magnitude, it may be necessary for each application to have its own step size $\gamma_\varrho^{(2)}$. We may need to find auto-scaling for the ̺ update as well.
◮ The present convergence study is rather limited in terms of scalability. . .
◮ We target grid or desktop-grid-like platforms. What if the number of applications has the same order of magnitude as the number of participants in the system (as in a peer-to-peer system)? Would the steady-state approach still make sense (completion-based metrics like stretch. . . )?
◮ We rely on steady state. How does such a system react to high churn?

Bibliography

[Kelly.98] Frank Kelly, Aman Maulloo, and David Tan. Rate control in communication networks: shadow prices, proportional fairness and stability. Journal of the Operational Research Society, 49:237–252, 1998.

[Low.03] Steven Low. A duality model of TCP and queue management algorithms. IEEE/ACM Transactions on Networking, 11(4):525–536, 2003.

[Touati.et.al.06] Corinne Touati, Eitan Altman, and Jérôme Galtier. Generalized Nash bargaining solution for bandwidth allocation. Computer Networks, 50(17):3242–3263, December 2006.

[Wang.et.al.03] Wei-Hua Wang, Marimuthu Palaniswami, and Steven Low. Optimal flow control and routing in multi-path networks. Performance Evaluation, 52:119–132, 2003.