Fairness issues in new large scale parallel platforms. Denis - - PowerPoint PPT Presentation



SLIDE 1

Fairness issues in new large scale parallel platforms.

Denis TRYSTRAM

LIG – Université de Grenoble Alpes – Inria Institut Universitaire de France

July 15, 2015

SLIDE 2

Fairness issues in new large scale parallel platforms. Introduction New computing systems

New challenges from e-Science

Today the scientific community has an unprecedented ability to combine various computational resources into a powerful distributed system capable of analyzing massive data sets. The main challenge is to allocate jobs efficiently to the available resources.

SLIDE 3

Example: An e-Science platform in Grenoble

Several labs from various communities share their computing resources...

SLIDE 6

CiGRI: Each site has its own particular objective

Molecular Chemistry: chemists are interested in obtaining the results of their simulations as fast as possible.
Objective: to minimize the maximum completion time.

Medical analysis by bio-imaging: doctors are interested in delivering the results of medical imaging analyses.
Objective: to minimize the average completion time, or to maximize the throughput.

Ph.D. students: tuning an academic program for delivery by a given deadline.
Objective: to minimize the completion time of a part (say 10%) of their jobs.
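The three objectives above can all be computed from the completion times of one community's jobs. The following is a minimal sketch for illustration only; the function names and the `fraction` parameter are assumptions and do not come from CiGRI.

```python
import math

def makespan(completions):
    """Maximum completion time (the chemists' objective)."""
    return max(completions)

def mean_completion(completions):
    """Average completion time (the doctors' objective)."""
    return sum(completions) / len(completions)

def partial_completion(completions, fraction=0.1):
    """Time at which the given fraction of the jobs is finished
    (the students' objective, with 10% as the default)."""
    k = max(1, math.ceil(fraction * len(completions)))
    return sorted(completions)[k - 1]

# Hypothetical completion times of ten jobs of one community.
times = [3, 5, 8, 10, 12, 1, 7, 9, 4, 6]
print(makespan(times), mean_completion(times), partial_completion(times))
```

The same schedule can therefore look good for one community and poor for another, which is why a single global objective does not capture every site's goal.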

SLIDE 10

Another context: large scale HPC platforms

Sometimes various communities (users) share the same parallel computing platform.

Multi-user scheduling: jobs are submitted in campaigns by multiple users who compete against each other for the available computing resources.

SLIDE 12

Motivation

Most available HPC platforms are hierarchical clusters.

(Figure: the hierarchy of cores, nodes and racks.)

To present several important problems involving cooperation. To look at some algorithmic issues.

SLIDE 13

Objective of this talk

To investigate several facets of the rules that govern how different participants engage in cooperation. We will show how to use scheduling algorithms to ensure efficient use of resources when cooperation takes place, in several situations including:
  • Classical systems without any local cooperation (pure centralized control)
  • Forced cooperation between organizations that cannot be completely trusted
  • Fairness among users

SLIDE 14

Fairness issues in new large scale parallel platforms. Introduction Classical results

Main milestones

Key parameters:
  • Jobs: sequential workflows, parallel (rigid, moldable, malleable), divisible loads
  • Resources: identical, uniform hierarchical, heterogeneous
  • Objective: minimize the maximum of Ci (called the makespan), the mean flow time (ΣCi), weighted versions, flow, stretch, ...
  • Off-line or on-line

Ci denotes the completion time of job i.

SLIDE 15

The simplest case

  • Jobs: sequential workflows, parallel (rigid, moldable, malleable), divisible loads
  • Resources: identical, uniform hierarchical, heterogeneous
  • Objective: minimize the maximum of Ci (makespan), mean flow time (ΣCi)

Schedule n independent jobs on m identical processors, aiming at minimizing the maximum completion time Cmax.

SLIDE 16

A magical recipe: list scheduling

Principle: list algorithms are based on a list of ready jobs [Graham, 1969]. As soon as resources (processors) are available, we allocate ready jobs to them. This algorithm has a constant approximation guarantee of 2 in the worst case.

Remarks:
  • List scheduling is a low-cost algorithm (linear in the number of jobs).
  • It is asymptotically optimal for a large number of jobs.
  • It works in both off-line and on-line settings.
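The principle can be sketched for the simplest case, independent sequential jobs on m identical processors. This is a minimal illustration, not the speaker's code: the names `list_schedule` and `lower_bound` are assumptions, and the heap used to find the next free processor costs O(n log m) rather than the linear cost quoted above.

```python
import heapq

def list_schedule(durations, m):
    """Greedy list scheduling: each job of the list goes to the
    processor that becomes free first."""
    # Heap entries are (time at which the processor is free, processor id).
    free_at = [(0, p) for p in range(m)]
    heapq.heapify(free_at)
    assignment = []  # (job index, processor id, start time)
    for job, d in enumerate(durations):
        t, p = heapq.heappop(free_at)
        assignment.append((job, p, t))
        heapq.heappush(free_at, (t + d, p))
    makespan = max(t for t, _ in free_at)
    return makespan, assignment

def lower_bound(durations, m):
    """Classical lower bound on the optimal makespan."""
    return max(max(durations), sum(durations) / m)

cmax, plan = list_schedule([3, 3, 2, 2, 2], m=2)
print(cmax, lower_bound([3, 3, 2, 2, 2], 2))
```

On this instance the list schedule has makespan 7 while the lower bound is 6, which is consistent with the factor-2 guarantee.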

SLIDE 17

What about parallel jobs?

  • Jobs: sequential workflow, parallel rigid or malleable, divisible loads
  • Resources: identical, uniform hierarchical, heterogeneous
  • Objective: again, minimize the makespan, mean flow time (ΣCi)

(Multiple) strip packing problems.

SLIDE 18

Rigid jobs

Rigid jobs correspond to parallel applications where the number of processors is fixed, such as MPI programs.

SLIDE 19

Algorithms for one strip

Existing results (upper bounds):
  • FCFS: arbitrarily bad
  • List Scheduling is still a (2 − 1/m)-approximation, for the non-continuous case only! Introduced by Graham-Garey in 1975.
  • Steinberg or Schiermeyer: fast 2-approximation.
  • Jansen: very costly (3/2 + ε)-approximation.
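List scheduling for rigid jobs on one cluster can be sketched as follows: at every event, scan the list and start each job that fits in the free processors. This is an illustrative sketch under stated assumptions (jobs are given as hypothetical `(processors, duration)` pairs, all available at time 0, and processors need not be contiguous); it is not the code behind the results listed above.

```python
def rigid_list_schedule(jobs, m):
    """List scheduling of rigid jobs on m processors.
    jobs: list of (processors, duration). Returns (start times, makespan)."""
    if any(q > m for q, _ in jobs):
        raise ValueError("a job needs more processors than the cluster has")
    pending = list(range(len(jobs)))
    running = []          # (end time, processors) of started jobs
    start = {}
    t, free = 0, m
    while pending:
        # Start every pending job that fits, in list order.
        for j in list(pending):
            q, p = jobs[j]
            if q <= free:
                start[j] = t
                free -= q
                running.append((t + p, q))
                pending.remove(j)
        if pending:
            # Jump to the next completion and release its processors.
            t = min(end for end, _ in running)
            still_running = []
            for end, q in running:
                if end <= t:
                    free += q
                else:
                    still_running.append((end, q))
            running = still_running
    makespan = max(start[j] + jobs[j][1] for j in start)
    return start, makespan

start, cmax = rigid_list_schedule([(2, 3), (3, 2), (1, 4), (2, 1)], m=4)
print(start, cmax)
```

Unlike FCFS, the scan lets a small job overtake a large one that does not fit yet, which is what keeps the processors busy.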

SLIDE 20

Extension for multiple strips

The problem is completely solved now. The analysis is more sophisticated, but the main point is that the bound is 2 instead of 3/2.

SLIDE 22

Flavor of a centralized efficient algorithm.

Use a decomposition of the input (high jobs LH, long and extra-long jobs (L and XL), and the rest) and design an algorithm which respects the structure of an optimal schedule.

Topological properties:
  • P(LH) ≤ Nω: only one “high” job at any time instant on a cluster
  • Q(LXL ∪ LL) ≤ Nm: only one “long” job on any processor
  • S(I′) ≤ Nmω: all the jobs fit in the optimal

SLIDE 23

We target a 5/2-approximation using a dual approximation scheme.
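A dual approximation scheme takes a guess λ on the optimal makespan and either builds a schedule of length at most ρλ or correctly answers that no schedule of length λ exists; a binary search on λ then gives a ρ-approximation. The sketch below shows the scheme on a toy case only: sequential jobs with positive integer durations on m identical processors, with ρ = 2. It is not the 5/2 algorithm of the talk, and the names `try_guess` and `dual_approximation` are assumptions.

```python
import heapq

def try_guess(durations, m, lam):
    """Return the makespan of a schedule of length <= 2*lam, or None
    when no schedule of length lam can exist."""
    # Rejection is safe: a job longer than lam, or more total work than
    # m*lam, rules out any schedule of length lam.
    if max(durations) > lam or sum(durations) > m * lam:
        return None
    loads = [0] * m
    heapq.heapify(loads)
    for d in durations:
        # Put the job on the least loaded processor (list scheduling).
        heapq.heapreplace(loads, loads[0] + d)
    return max(loads)

def dual_approximation(durations, m):
    """Binary search for the smallest accepted guess."""
    lo, hi = 1, sum(durations)
    while lo < hi:
        mid = (lo + hi) // 2
        if try_guess(durations, m, mid) is None:
            lo = mid + 1
        else:
            hi = mid
    return lo, try_guess(durations, m, lo)

guess, cmax = dual_approximation([5, 3, 3, 2, 2], m=2)
print(guess, cmax)
```

Since every guess below the returned one was rejected, the returned guess is a lower bound on the optimum, and the schedule is at most twice as long.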

SLIDE 24

Running the algorithm (first steps...)

(Figure: successive steps of the algorithm, with labels O1 to O6 and time marks w and 2w; the steps introduce LB, LH, a bin created by "Create_Padding", and LXL.)

SLIDE 30

Multiple organizations: multiple strip packing

Motivation: share computing power to dampen peaks (centralized control). There are N clusters of m identical processors each; the number of processors may also differ between clusters. The inapproximability bound is 2 (proof by a gap reduction).

SLIDE 32

Fairness issues in new large scale parallel platforms. Multi-organization

Outline

1 Multi-organization
2 Fairness issues and solution
3 Concluding remarks

SLIDE 33

Model of multi-organization scheduling

  • Organizations O^(u) have resources (clusters) and some local jobs {J_i^(u)}
  • System goal: global makespan Cmax
  • Each organization minimizes the makespan of its local jobs, C_max^(u) = max_i C_i^(u)
  • Idea: move jobs across clusters to optimize Cmax

SLIDE 34

Multi-objective optimization based on constraints on organizations’ objectives

  • An organization cannot increase its local makespan C_max^(u) by cooperating with others
  • Schedule jobs locally (with makespan C_max^(u)(loc))
  • Optimization: min max C_max^(u) subject to ∀u : C_max^(u) ≤ C_max^(u)(loc)
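The local constraint can be checked directly from the completion times of each organization's jobs. A minimal sketch follows; the function names and the `(owner, completion time)` representation are assumptions made for illustration.

```python
def org_makespans(completions):
    """completions: iterable of (owner, completion time), one entry per job.
    Returns the makespan of each organization's own jobs."""
    result = {}
    for owner, c in completions:
        result[owner] = max(result.get(owner, 0), c)
    return result

def respects_local_constraints(local, cooperative):
    """True when no organization finishes its own jobs later in the
    cooperative schedule than in its purely local schedule."""
    loc = org_makespans(local)
    coop = org_makespans(cooperative)
    return all(coop[u] <= loc[u] for u in loc)

# Hypothetical example: organization "A" is overloaded, "B" is idle early.
local = [("A", 4), ("A", 8), ("A", 12), ("B", 3)]
moved = [("A", 4), ("A", 8), ("A", 7), ("B", 3)]   # one job of A runs on B's cluster
bad = [("A", 4), ("A", 8), ("A", 3), ("B", 7)]     # B's own job is delayed

print(respects_local_constraints(local, moved))
print(respects_local_constraints(local, bad))
```

The second candidate lowers the global makespan just as much as the first one, but it is rejected because organization B would finish its own job later than it would alone.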

SLIDE 35

Local constraints lead to a 3/2 lower bound on the global makespan

(Figure panels: (a) local scheduling; (b) globally optimal with constraints; (c) globally optimal, no constraints.)

MOSP, the Multi-Organization Scheduling Problem, is NP-hard in the strong sense.

SLIDE 36

Outline of the scheduling algorithm (MOCCA)

3-approximation of the global makespan; local constraints are not violated [P.-F. Dutot, F. Pascual, K. Rzadca, D. Trystram, IEEE TPDS 2011]

1 schedule jobs locally using highest-first (HF) ordering
2 unschedule jobs that complete after 3LB (LB is a lower bound on the global makespan), sort them by HF
3 schedule large (> m/2) jobs backwards from 3LB
4 schedule remaining jobs in the gaps of the schedule
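The first two steps can be sketched as follows. This is a partial illustration and not the published MOCCA implementation. It assumes that HF sorts jobs by non-increasing number of required processors, and that LB is the larger of the average work per processor and the longest job; the `Job` class and all function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Job:
    name: str
    procs: int     # number of processors required (the job's "height")
    length: float  # processing time

def lower_bound(jobs, n_clusters, m):
    """Assumed LB: max of average work per processor and longest job."""
    work = sum(j.procs * j.length for j in jobs)
    return max(work / (n_clusters * m), max(j.length for j in jobs))

def highest_first(jobs):
    """Assumed HF ordering: non-increasing number of processors."""
    return sorted(jobs, key=lambda j: j.procs, reverse=True)

def split_late_jobs(starts, lb):
    """Step 2: keep the jobs that complete by 3*LB in the local schedule,
    unschedule the others and sort them by HF.
    starts: dict mapping each Job to its local start time."""
    limit = 3 * lb
    kept = {j: s for j, s in starts.items() if s + j.length <= limit}
    late = highest_first([j for j in starts if j not in kept])
    return kept, late

a, b, c = Job("a", 2, 2), Job("b", 3, 2), Job("c", 4, 1)
kept, late = split_late_jobs({a: 0, b: 2, c: 2.5}, lb=1)
print([j.name for j in kept], [j.name for j in late])
```

Steps 3 and 4 (backward placement of large jobs and gap filling) are left out of this sketch.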

SLIDE 37

Example run: first, we ensure the worst-case performance . . .

(Figure panels: (a) local scheduling; (b) MOCCA with gaps.)

SLIDE 38

Example run: . . . then, we collapse the schedule.

(Figure panels: (b) MOCCA with gaps; (c) MOCCA, collapsed.)

SLIDE 39

Outline of the scheduling algorithm (MOCCA)

3-approximation of the global makespan; local constraints are not violated

1 schedule jobs locally using highest-first (HF) ordering (the natural extension of LPT...)
2 unschedule jobs that complete after 3LB (LB is a lower bound on the global makespan), sort them by HF
3 schedule large jobs (> m/2) backwards from 3LB
4 schedule remaining jobs in the gaps of the schedule

SLIDE 40

Final load balancing improves organizations’ makespans

To improve (almost) everyone, we balance loads in order of increasing organizations’ makespans.

(Figure panels: collapsed schedule; final load balancing.)

SLIDE 42

Summary: optimize the system goal, respect local goals

Multi-Organization Scheduling Problem (MOSP):
  • Organizations have supercomputers and local jobs
  • MOCCA does not worsen the goals of organizations (local C_max^(u))
  • MOCCA 3-approximates the global makespan
  • The organizations cannot modify the proposed schedule

SLIDE 44

Fairness issues in new large scale parallel platforms. Fairness issues and solution

Links between local and global

Strict constraints: MOSP’s local constraints (and also constraints like selfishness) are too strict in practice. They strongly limit the freedom of the scheduler to find a good global Cmax.

A clear trade-off: there is a correlation between the guarantees that we can provide individually for each organization and the global performance of the platform.

Question: how much can we improve the global Cmax of the entire platform if we allow some controlled degradation of the local performance?

SLIDE 47

What is Fairness?

Cmax is probably not the right objective (it carries no meaning for fairness).

Starting with a small (easy) example: two users submit their jobs, each aiming at minimizing the makespan of their own jobs. Suppose user 1 submits 2 jobs (4,4), and user 2 likewise submits 2 jobs (3,7).

Question: how many possible situations are there?

SLIDE 48

Pareto Optimality

Starting with a small (easy) example: two users submit their jobs, each aiming at minimizing the makespan of their own jobs. Suppose user 1 submits 2 jobs (4,4), and user 2 likewise submits 2 jobs (3,7).

Question: how many possible situations are there?

SLIDE 49

What is the best solution for each user?

(Figure: outcomes for User 1 and User 2; the values shown are 8, 11, 18, 10, 18, 14.)
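The two-user example can be explored by brute force. The sketch below assumes a single processor and non-preemptive jobs, which the slides do not state explicitly; it enumerates every job order, records the pair (makespan of user 1, makespan of user 2), and keeps the Pareto-optimal pairs. The function names are assumptions.

```python
from itertools import permutations

def outcomes(jobs_by_user):
    """All distinct vectors of per-user makespans over every job order
    on one processor."""
    jobs = [(u, p) for u, durations in jobs_by_user.items() for p in durations]
    users = sorted(jobs_by_user)
    seen = set()
    for order in permutations(jobs):
        t = 0
        cmax = {}
        for u, p in order:
            t += p
            cmax[u] = t        # the last job of a user fixes its makespan
        seen.add(tuple(cmax[u] for u in users))
    return seen

def pareto_front(points):
    """Points that no other point matches or beats on every coordinate."""
    return sorted(
        p for p in points
        if not any(q != p and all(a <= b for a, b in zip(q, p)) for q in points)
    )

points = outcomes({1: [4, 4], 2: [3, 7]})
print(sorted(points))
print(pareto_front(points))
```

Under these assumptions there are five distinct outcomes and only two of them are Pareto optimal: one user finishes as early as possible and the other one finishes last, at time 18.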

SLIDE 50

Toward looking at Fairness in Combinatorial Optimization

The stretch (or slowdown factor) of job i is defined as si = (Ci − ri) / pi, where ri is the release date and pi the processing time of job i.

Bounded stretch: si = (Ci − ri) / max(α, pi).

Question: What are the (expected) results for max stretch and (weighted) sum stretch?
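Both definitions translate directly into code. The sketch below is illustrative; the function names, the tuple layout and the sample values are assumptions.

```python
def stretch(c, r, p):
    """Stretch of a job: time spent in the system divided by its length."""
    return (c - r) / p

def bounded_stretch(c, r, p, alpha):
    """Bounded stretch: very short jobs are treated as if they lasted alpha."""
    return (c - r) / max(alpha, p)

# Hypothetical jobs as (release date r, processing time p, completion time C).
jobs = [(0, 1, 10), (0, 10, 20)]

plain = [stretch(c, r, p) for r, p, c in jobs]
bounded = [bounded_stretch(c, r, p, alpha=5) for r, p, c in jobs]

print(max(plain), sum(plain))      # max stretch and sum stretch
print(max(bounded), sum(bounded))
```

The short job dominates the plain max stretch even though it waited less than the long one, which is the effect the threshold α is meant to dampen.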

Adaptation to the campaign scheduling problem.

SLIDE 52

Fairness issues in new large scale parallel platforms. Fairness issues and solution: OStrich algorithm

Classical fairsharing

(Figure: schedule drawn over time on machines M1 to M6.)

SLIDE 53

Efficiency of FCFS

(Figure: FCFS schedule; the axes show the m processors and time.)

SLIDE 55

Fairness issues in new large scale parallel platforms. Concluding remarks

Centralized vs distributed

  • Most efficient algorithms are centralized: they require global knowledge and a single executing entity.
  • Scheduling (allocation) might become a bottleneck when systems are scaled to millions of cores.
  • The answer: distributed multi-objective scheduling algorithms!
  • Add fairness issues

SLIDE 56

Take home message

Cooperation matters! Depending on the system, cooperation can be modelled using various techniques: optimization, multi-objective optimization, game theory (selfishness, fairness). We demonstrated how scheduling algorithms can be tuned to collaborative systems.
