
Malleable task-graph scheduling with a practical speed-up model

Loris Marchal1 Bertrand Simon1 Oliver Sinnen2 Frédéric Vivien1

1: CNRS, INRIA, ENS Lyon and Univ. Lyon, FR. 2: Univ. Auckland, NZ.

New Challenges in Scheduling Theory — Aussois March 2016

  • L. Marchal, B. Simon, O. Sinnen, F. Vivien

Malleable task-graph scheduling with a practical speed-up model 1 / 22


Motivation

Context:

  • Optimize the time performance of multifrontal sparse solvers (e.g., MUMPS or QR-MUMPS)
  • Computations are well described by a tree of tasks
  • Generalization to Series-Parallel (SP) graphs, built by series composition (G1;G2) and parallel composition (G1 ∥ G2)
  • Purpose: find a schedule achieving the lowest makespan

Objectives:

  • Provide theoretical guarantees on widely used scheduling algorithms
  • Design new ones with a smaller makespan


Application modeling

Coarse-grain picture: tree of tasks (or SP task graph)

Each task: a partial factorization, itself a graph of smaller sub-tasks (e.g., POTRF, TRSM, SYRK and GEMM kernels)

  • Expand all tasks and schedule the resulting graph?
  • Scheduling trees is simpler than scheduling general graphs (forget the sub-tasks)

Behavior of coarse-grain tasks: parallel and malleable (the number of processors allotted to a task may change during its execution)

Speed-up model: a trade-off between
  • Accuracy: fits the data well
  • Tractability: amenable to performance analysis and guaranteed algorithms


General speed-up models

Literature: studies with few assumptions

  • speed-up(p) = time(1 proc.) / time(p proc.)
  • work(p) = p · time(p proc.)
  • Non-increasing speed-up and work

Independent tasks: theoretical FPTAS and practical 2-approximations [Jansen 2004, Fan et al. 2012]

SP-graphs: ≈ 2.6-approximation [Lepère et al. 2001]

With concave speed-up: (2+ε)-approximation of unspecified complexity [Makarychev et al. 2014]
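The two definitions above can be evaluated directly on a table of execution times. The sketch below is only an illustration: the function name `speedup_and_work` and the timings are made up, not measurements from the talk.

```python
def speedup_and_work(times):
    """Return {p: (speed-up(p), work(p))} from execution times indexed by p."""
    t1 = times[1]  # time on a single processor
    return {p: (t1 / t, p * t) for p, t in sorted(times.items())}

# Made-up timings, for illustration only (not measurements from the talk).
measured = {1: 12.0, 2: 6.0, 4: 4.0}
result = speedup_and_work(measured)
for p, (s, w) in result.items():
    print(f"p={p}: speed-up={s}, work={w}")
```

With these numbers the speed-up on 4 processors is only 3, so the work grows from 12 to 16: adding processors costs efficiency.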


Previous work (Europar 2015, with A. Guermouche)

Prasanna & Musicus model [PM 1996]:

speed-up(p) = p^α

[Figure: speed-up as a function of the number of processors]
  • α = 1: perfect parallelism
  • 0 < α < 1: sub-linear (concave) speed-up
  • α = 0: no parallelism

Conclusions:
  • Average accuracy
  • Rational numbers of processors
  • Optimal algorithm for SP-graphs
  • No guarantees for distributed platforms
  • Task finish times complex to compute
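Under the p^α model, a task of work w runs in w / p^α on p processors, so its work p · time(p) equals w · p^(1−α). A minimal sketch with hypothetical function names; p may be rational, as the model allows.

```python
def pm_time(w, p, alpha):
    """Execution time of a task of work w on p processors: w / p**alpha."""
    return w / p ** alpha

def pm_work(w, p, alpha):
    """Work p * time(p) = w * p**(1 - alpha): constant only when alpha = 1."""
    return p * pm_time(w, p, alpha)

for alpha in (1.0, 0.5, 0.0):
    print(alpha, pm_time(10, 4, alpha), pm_work(10, 4, alpha))
```

For α = 0.5 and p = 4 the task runs only twice as fast as on one processor, while its work doubles.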


Today: simpler model

Simple and reasonable model of a parallel malleable task Ti:
  • Perfect parallelism up to a threshold δi: time = wi / min(p, δi)
  • Rational allocations for free (McNaughton's wrap-around rule)

[Figure: speed-up as a function of the number of processors p: slope = 1 up to δi, constant afterwards]

Related studies
  • 2-approximation [Balmin et al. 13], discussed later
  • [Kell et al. 2015]: time = wi/p + (p − 1)c; 2-approximation for p = 3, open for p ≥ 4
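A minimal sketch of the threshold model and of one way to turn constant rational shares into whole-processor pieces with a McNaughton-style wrap-around. The function names and the piece format `(task, processor, start, end)` are hypothetical. Each task's processor-time (share × T) is laid end to end on processor 0, then 1, and so on, wrapping at time T.

```python
from fractions import Fraction

def task_time(w, p, delta):
    """Execution time of task Ti with work w on p processors (threshold delta)."""
    return Fraction(w) / min(Fraction(p), Fraction(delta))

def wrap_around(shares, T):
    """Realize constant rational shares on [0, T) with whole processors.

    Each task's processor-time (share * T) is laid end to end on processor 0,
    then 1, ..., cutting at T and wrapping onto the next processor.
    Needs sum(shares) <= number of processors. Returns (task, proc, start, end).
    """
    T = Fraction(T)
    pieces, proc, t = [], 0, Fraction(0)
    for task, share in enumerate(shares):
        remaining = Fraction(share) * T
        while remaining > 0:
            chunk = min(remaining, T - t)
            pieces.append((task, proc, t, t + chunk))
            remaining -= chunk
            t += chunk
            if t == T:  # processor full: wrap to the next one
                proc, t = proc + 1, Fraction(0)
    return pieces

print(task_time(10, 8, 4))  # → 5/2
for piece in wrap_around([Fraction(3, 2), Fraction(3, 2)], 2):
    print(piece)
```

With perfect parallelism only the total processor-time matters, so the pieces do the same work as the rational shares. At any instant a task with share a occupies at most ⌈a⌉ processors, so an integer threshold δi ≥ a is still respected.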


Outline

1. Problem complexity
2. Analysis of PROPORTIONALMAPPING [Pothen et al. 1993]
3. Design of a greedy strategy
4. Experimental comparison
5. Conclusion


Overview of the problem

Given an SP-graph and p processors: compute the optimal makespan

  • Problem known as P | sp-graph, any, spdp-lin, δi | Cmax
  • Malleability + perfect parallelism ⇒ polynomial (P)
  • Malleability + perfect parallelism + thresholds ⇒ NP-complete
  • Existing proof in [Drozdowski and Kubiak 1999]: arguably complex

Contribution

New NP-completeness proof


Widget for the proof

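Two 3-task chains

[Figure: the two chains in a time × processors diagram; the area of each task is wi]

Each task Ti:
  • δi = wi, i.e., a minimum computing time of 1
  • Some tasks: δi ≈ p (almost the whole platform)

  • Simultaneous start of the chains: Cmax ≈ 5
  • Time-shift between the chains: Cmax ≈ 4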
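The two makespans can be recovered by simple arithmetic under an assumption that is only our reading of the figure: each chain is a large task, a small middle task, and a large task. Large tasks have wi = δi ≈ p. The middle task has negligible area but still needs one time unit, since δi = wi. The names A1, A2, A3 and B1, B2, B3 for the tasks of the two chains are hypothetical.

```latex
\begin{align*}
\text{simultaneous start:}\quad C_{\max} &\approx
  \underbrace{\tfrac{2p}{p}}_{A_1,\,B_1\ \text{share } p}
  + \underbrace{1}_{A_2,\,B_2}
  + \underbrace{\tfrac{2p}{p}}_{A_3,\,B_3\ \text{share } p} = 5\\
\text{shift by one time unit:}\quad C_{\max} &\approx
  \underbrace{1}_{A_1}
  + \underbrace{1}_{A_2 \,\parallel\, B_1}
  + \underbrace{1}_{A_3 \,\parallel\, B_2}
  + \underbrace{1}_{B_3} = 4
\end{align*}
```

When shifted, each large task runs alongside a small one and gets almost all p processors, so no time is lost.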


Proof sketch

Reduction from 3-SAT (e.g., x1 OR x2 OR x2)

Idea:
  • Each variable ⇒ a modified widget (one chain for xi, one for x̄i)
  • Length of the extremities ⇒ variable; middle ⇒ clause
  • The chain starting later is TRUE
  • Gray chain: a processor-usage profile allowing only correct behaviors

[Figure: processor usage over time, with chains L0, Lx1, Lx̄1, Lx2, Lx̄2 and times t1 = 0, t2, M − t2, M − t1]


PROPORTIONALMAPPING [Pothen et al. 1993]

Description
  • Simple allocation for trees or SP-graphs
  • On G1 ∥ G2: constant share to Gi, proportional to its weight Wi

Algorithm 1: PROPORTIONALMAPPING(graph G, q processors)
  1. Define the share allocated to the sub-graphs of G:
       if G = G1;G2;…;Gk then ∀i, pi ← q
       if G = G1 ∥ G2 ∥ … ∥ Gk then ∀i, pi ← q · Wi / Σj Wj
  2. Call PROPORTIONALMAPPING(Gi, pi) for each sub-graph Gi

Then schedule the tasks on their pi processors as soon as possible (ASAP).

Notes
  • Produces a moldable schedule (fixed allocation over time)
  • Unaware of task thresholds
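The recursion in Algorithm 1 can be sketched in Python to compute the makespan of the resulting schedule. This is our reading, not the authors' code. The tuple encoding of SP-graphs is hypothetical, Wi is taken as the total work of Gi, and each task runs in wi / min(pi, δi) as in the threshold model. Because every task of G2 depends on G1 in a series composition, series parts run one after the other; parallel parts run side by side.

```python
from fractions import Fraction

# Hypothetical SP-graph encoding (not from the slides):
#   ("T", w, delta)       a task with work w and threshold delta
#   ("S", [G1, ..., Gk])  series composition G1;...;Gk
#   ("P", [G1, ..., Gk])  parallel composition G1 ∥ ... ∥ Gk
def weight(g):
    """W(G): total work of the sub-graph."""
    if g[0] == "T":
        return Fraction(g[1])
    return sum((weight(c) for c in g[1]), Fraction(0))

def proportional_mapping(g, q):
    """Makespan of the PROPORTIONALMAPPING schedule of g on q processors."""
    q = Fraction(q)
    if g[0] == "T":
        # moldable allocation q, but the task cannot use more than delta
        return Fraction(g[1]) / min(q, Fraction(g[2]))
    if g[0] == "S":
        # every part gets all q processors, one after the other
        return sum((proportional_mapping(c, q) for c in g[1]), Fraction(0))
    total = weight(g)
    # parallel parts run side by side, each on a share proportional to its weight
    return max(proportional_mapping(c, q * weight(c) / total) for c in g[1])

g = ("P", [("T", 6, 1), ("S", [("T", 4, 4), ("T", 4, 4)])])
print(proportional_mapping(g, 7))  # → 6
```

Here the first task receives 3 of the 7 processors but can use only 1 because δ = 1, so two processors stay idle for the whole schedule. This is what "unaware of task thresholds" means in practice.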


Analysis of PROPORTIONALMAPPING schedules

Theorem

PROPORTIONALMAPPING is a 2-approximation of the optimal makespan.

Proof.

  • Consider the makespan without thresholds (it equals W/p): M∞ ≤ Mopt
  • There is an idle-free path Φ from the entry task to the end of the schedule
  • Split the tasks of Φ into two sets:
      A = tasks limited by their thresholds: len(A) ≤ critical path ≤ Mopt
      B = tasks limited by the allocation (same duration as without thresholds): len(B) ≤ M∞ ≤ Mopt
  • Finally, M = len(Φ) = len(A) + len(B) ≤ 2 Mopt

Note

The approximation ratio is asymptotically tight.
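The proof actually gives M ≤ CP + W/p, where CP is the critical path with each task on δi processors. This is at most twice the lower bound max(CP, W/p). The sketch below checks that inequality on random SP-graphs. The tuple encoding, the random generator and its parameters are hypothetical, and the check is a sanity test, not a proof.

```python
import random
from fractions import Fraction

# Hypothetical SP encoding: ("T", w, delta), ("S", [children]), ("P", [children])
def work(g):
    return Fraction(g[1]) if g[0] == "T" else sum((work(c) for c in g[1]), Fraction(0))

def critical_path(g):
    """Longest path when each task runs on delta processors (time w/delta)."""
    if g[0] == "T":
        return Fraction(g[1]) / Fraction(g[2])
    lengths = [critical_path(c) for c in g[1]]
    return sum(lengths, Fraction(0)) if g[0] == "S" else max(lengths)

def pm(g, q):
    """Makespan of PROPORTIONALMAPPING on q processors."""
    q = Fraction(q)
    if g[0] == "T":
        return Fraction(g[1]) / min(q, Fraction(g[2]))
    if g[0] == "S":
        return sum((pm(c, q) for c in g[1]), Fraction(0))
    total = work(g)
    return max(pm(c, q * work(c) / total) for c in g[1])

def random_sp(rng, depth):
    if depth == 0 or rng.random() < 0.3:
        return ("T", rng.randint(1, 20), rng.randint(1, 8))
    return (rng.choice("SP"), [random_sp(rng, depth - 1) for _ in range(rng.randint(2, 3))])

rng = random.Random(0)
worst = Fraction(0)
for _ in range(500):
    g, p = random_sp(rng, 4), rng.randint(1, 16)
    m = pm(g, p)
    # len(A) <= critical path and len(B) <= M_inf = W/p, as in the proof
    assert m <= critical_path(g) + work(g) / p
    worst = max(worst, m / max(critical_path(g), work(g) / p))
print(float(worst))  # at most 2 by the argument above
```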


Design of a greedy strategy: GREEDY-FILLING

Algorithm
  • Assign priorities to tasks (usually by bottom-level)
  • Consider free tasks by decreasing priority
  • Greedily insert each task into the current schedule:
      • Compute its earliest starting time
      • Pour the task into the available processor space, respecting its threshold

Illustration

[Figure: profile of busy processors over time (at most p): initial profile, insertion of a task of work ws and threshold δs, final profile]
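A minimal Python sketch of the steps above. It is our reading of GREEDY-FILLING, not the authors' implementation. The class `Profile` and the function `greedy_filling` are hypothetical names. The bottom-level is computed with each task on min(δi, p) processors, which is an assumption. The busy-processor profile is a step function; pouring a task uses min(δ, free processors) in each step until its work is done.

```python
import bisect
from fractions import Fraction

class Profile:
    """Busy processors over time as a step function:
    busy[k] processors are used on [times[k], times[k+1]); the last step lasts forever."""

    def __init__(self, p):
        self.p = Fraction(p)
        self.times = [Fraction(0)]
        self.busy = [Fraction(0)]

    def _split(self, t):
        """Make t a breakpoint of the step function and return its index."""
        i = bisect.bisect_right(self.times, t) - 1
        if self.times[i] != t:
            i += 1
            self.times.insert(i, t)
            self.busy.insert(i, self.busy[i - 1])
        return i

    def pour(self, t0, w, delta):
        """Pour work w from time t0 into the free space, never exceeding delta
        processors; return the finish time."""
        k = self._split(Fraction(t0))
        remaining, delta = Fraction(w), Fraction(delta)
        while True:
            rate = min(delta, self.p - self.busy[k])  # processors usable in this step
            last = k == len(self.times) - 1
            if rate > 0:
                length = None if last else self.times[k + 1] - self.times[k]
                if last or rate * length >= remaining:
                    finish = self.times[k] + remaining / rate
                    end = self._split(finish)
                    for j in range(k, end):
                        self.busy[j] += rate
                    return finish
                self.busy[k] += rate
                remaining -= rate * length
            k += 1

def greedy_filling(tasks, preds, p):
    """tasks: {name: (w, delta)}; preds: {name: [predecessors]}.
    Returns (makespan, {name: finish time})."""
    succs = {v: [] for v in tasks}
    for v, ps in preds.items():
        for u in ps:
            succs[u].append(v)
    bl = {}

    def bottom_level(v):  # longest path to an exit, tasks on min(delta, p) processors
        if v not in bl:
            w, d = tasks[v]
            tail = max((bottom_level(s) for s in succs[v]), default=Fraction(0))
            bl[v] = Fraction(w) / min(Fraction(d), Fraction(p)) + tail
        return bl[v]

    profile, finish = Profile(p), {}
    while len(finish) < len(tasks):
        free = [v for v in tasks
                if v not in finish and all(u in finish for u in preds.get(v, []))]
        v = max(free, key=bottom_level)  # highest priority first
        t0 = max((finish[u] for u in preds.get(v, [])), default=Fraction(0))
        finish[v] = profile.pour(t0, *tasks[v])
    return max(finish.values()), finish

tasks = {"a": (4, 4), "b": (6, 1), "c": (4, 2)}
makespan, finish = greedy_filling(tasks, {"c": ["a"]}, 4)
print(makespan, finish["a"], finish["c"])  # → 6 4/3 10/3
```

In the example, task b (δ = 1) is inserted first because of its larger bottom-level. Task a is then poured into the 3 remaining processors, and c follows on 2 processors. The processors that b cannot use are filled immediately rather than left idle.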


Analysis of GREEDY-FILLING schedules

Theorem

GREEDY-FILLING is a (2 − δmin/p)-approximation of the optimal makespan.

Proof.
  • Transposition of Graham's classical (2 − 1/p)-approximation result
  • Construct a path Φ in G such that all idle times happen during tasks of Φ
  • Bound the Used and Idle areas (Used + Idle = pM)
  • At least δmin processors are busy during Φ

Note

The theorem applies to every strategy without deliberate idle time.
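The area argument can be written out as below. This is a sketch under our own reading of the proof outline. W is the total work, CP the critical path with each task on δi processors, and δmin = min δi ≤ p. T_idle is the total time during which at least one processor is idle. Whenever a processor is idle, the running task of Φ must use its full threshold δi ≥ δmin, since otherwise it would have been poured into the idle space.

```latex
\begin{align*}
pM &= \mathrm{Used} + \mathrm{Idle}, \qquad \mathrm{Used} = W \le p\,M_{\mathrm{opt}},\\
\mathrm{Idle} &\le (p - \delta_{\min})\,T_{\mathrm{idle}},
  \qquad T_{\mathrm{idle}} \le \sum_{T_i \in \Phi} \frac{w_i}{\delta_i} \le CP \le M_{\mathrm{opt}},\\
M &\le \frac{p\,M_{\mathrm{opt}} + (p - \delta_{\min})\,M_{\mathrm{opt}}}{p}
   = \Big(2 - \frac{\delta_{\min}}{p}\Big) M_{\mathrm{opt}}.
\end{align*}
```

With δmin = 1 this is Graham's bound 2 − 1/p.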


Simulations

Third algorithm to compare with: FLOWFLEX
  • 2-approximation designed in [Balmin et al. 13] to schedule “Malleable Flows of MapReduce Jobs”
  • Solves the problem on an infinite number of processors
  • Then downscales the allocation on the intervals where needed

Three datasets
  • SYNTH-PROP: synthetic SP-graphs with δi = α × wi
  • SYNTH-RAND: same, but with a factor drawn log-uniformly in [0.1α, 10α]
  • TREES: assembly trees of sparse matrices, δi = α × wi
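A minimal sketch of how the thresholds of the datasets could be generated from the task works. The function names are hypothetical. For SYNTH-RAND we assume the slide means δi = fi × wi with fi drawn log-uniformly in [0.1α, 10α], i.e., log fi is uniform between log(0.1α) and log(10α).

```python
import math
import random

def synth_prop_thresholds(works, alpha):
    """SYNTH-PROP / TREES: delta_i = alpha * w_i."""
    return [alpha * w for w in works]

def synth_rand_thresholds(works, alpha, rng):
    """SYNTH-RAND (our reading): delta_i = f_i * w_i, f_i log-uniform in [0.1*alpha, 10*alpha]."""
    lo, hi = math.log(0.1 * alpha), math.log(10 * alpha)
    return [math.exp(rng.uniform(lo, hi)) * w for w in works]

rng = random.Random(42)
works = [1.0, 5.0, 20.0]  # made-up task works
print(synth_prop_thresholds(works, 0.5))
print(synth_rand_thresholds(works, 0.5, rng))
```

Drawing the factor log-uniformly makes thresholds ten times smaller and ten times larger than α × wi equally likely.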


Results on SYNTH-PROP

[Figure: normalized makespan (from 1.0 to 1.4) vs. normalized number of processors (from 0 to 10) for GREEDY-FILLING, PROPMAPPING and FLOWFLEX]

Y: makespan normalized by the lower bound LB = max(CP, W/p)

X: number of processors normalized by the parallelism, where
parallelism = (makespan with all δi = 1 and p = 1) / (makespan with all δi = 1 and p = ∞)
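Both normalizations can be computed on an SP-graph. With all δi = 1, the makespan on one processor is the total work W, and on infinitely many processors it is the longest path measured in work. The sketch uses a hypothetical tuple encoding. It assumes CP is the critical path with each task on min(δi, p) processors, which is still a valid lower bound.

```python
from fractions import Fraction

# Hypothetical SP encoding: ("T", w, delta), ("S", [children]), ("P", [children])
def total_work(g):
    if g[0] == "T":
        return Fraction(g[1])
    return sum((total_work(c) for c in g[1]), Fraction(0))

def longest_path(g, task_time):
    """Series parts add up, parallel parts take the maximum."""
    if g[0] == "T":
        return task_time(g)
    lengths = [longest_path(c, task_time) for c in g[1]]
    return sum(lengths, Fraction(0)) if g[0] == "S" else max(lengths)

def lower_bound(g, p):
    """LB = max(CP, W/p), CP with each task on min(delta, p) processors."""
    cp = longest_path(g, lambda t: Fraction(t[1]) / min(Fraction(t[2]), Fraction(p)))
    return max(cp, total_work(g) / p)

def parallelism(g):
    """Makespan(all delta_i = 1, p = 1) / makespan(all delta_i = 1, p = inf)."""
    return total_work(g) / longest_path(g, lambda t: Fraction(t[1]))

g = ("P", [("S", [("T", 4, 2), ("T", 2, 1)]), ("T", 6, 3)])
p = 2
print(lower_bound(g, p), parallelism(g), p / parallelism(g))
```

For this graph W = 12 and the longest work path is 6, so the parallelism is 2 and p = 2 corresponds to a normalized number of processors of 1.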


  • Plot: mean, plus a ribbon containing 90% of the results
  • Small or large numbers of processors: similar results (simpler problem)
  • GREEDY-FILLING: up to ≈ 25% gain, and less than 20% above the lower bound


Results on SYNTH-RAND

[Figure: normalized makespan (from 1.0 to 1.4) vs. normalized number of processors (from 0 to 10) for GREEDY-FILLING, PROPMAPPING and FLOWFLEX]

  • Similar results with random thresholds
  • Larger gaps between GREEDY-FILLING and the others
  • The maximum gap occurs on smaller platforms


Results on TREES

[Figure: normalized makespan (from 1.0 to 1.6) vs. normalized number of processors (from 4 to 2048, log scale) for GREEDY-FILLING, PROPMAPPING and FLOWFLEX]

  • The shape of the results depends a lot on the matrix
  • Here: one matrix with different ordering and amalgamation parameters
  • GREEDY-FILLING is (almost always) better than both others
  • Smaller maximum gain (around 15%)


Conclusion

On the algorithms

  • PROPMAPPING: does not take advantage of malleability
  • FLOWFLEX: produces gaps that cannot be filled afterwards
  • GREEDY-FILLING: simple, greedy, close to the lower bound

On the model
  • Simplest model to account for limited parallelism
  • Still NP-complete
  • Possible to derive theoretical guarantees (2-approximation algorithms)

Perspectives
  • Conduct experiments to assess the model and study thresholds
  • Focus on moldable tasks: study the gain of malleability
