
Malleable task-graph scheduling with a practical speed-up model

Malleable task-graph scheduling with a practical speed-up model. Loris Marchal (1), Bertrand Simon (1), Oliver Sinnen (2), Frédéric Vivien (1). 1: CNRS, INRIA, ENS Lyon and Univ. Lyon, FR. 2: Univ. Auckland, NZ. New Challenges in Scheduling Theory


  1. Malleable task-graph scheduling with a practical speed-up model
     Loris Marchal (1), Bertrand Simon (1), Oliver Sinnen (2), Frédéric Vivien (1)
     1: CNRS, INRIA, ENS Lyon and Univ. Lyon, FR. 2: Univ. Auckland, NZ.
     New Challenges in Scheduling Theory, Aussois, March 2016
     L. Marchal, B. Simon, O. Sinnen, F. Vivien. Malleable task-graph scheduling with a practical speed-up model. 1 / 22

  2. Motivation
     Context:
     - Optimize the time performance of multifrontal sparse solvers (e.g., MUMPS or QR-MUMPS)
     - Computations well described by a tree of tasks
     - Generalization to Series-Parallel graphs, built by series composition (G1 ; G2) and parallel composition (G1 ∥ G2)
     - Purpose: find a schedule achieving the lowest makespan
     [Figure builds: a task tree; the series composition G1 ; G2; the parallel composition G1 ∥ G2; an example with tasks numbered 1 to 6]
     Objectives:
     - Provide theoretical guarantees on widely used scheduling algorithms
     - Design new algorithms achieving a smaller makespan

  7. Application modeling
     Coarse-grain picture: tree of tasks (or SP task graph)
     - Each task: partial factorization, itself a graph of smaller sub-tasks
     [Figure: sub-task graph of one partial factorization, made of POTRF, TRSM, SYRK and GEMM kernels]
     Expand all tasks and schedule the resulting graph?
     - Scheduling trees is simpler than scheduling general graphs (forget sub-tasks)
     - Behavior of coarse-grain tasks: parallel and malleable
     - Speed-up model: trade-off between
       Accuracy: fits the data well
       Tractability: amenable to performance analysis and guaranteed algorithms

  10. General speed-up models
      Literature: studies with few assumptions
      $\text{speed-up}(p) = \dfrac{\text{time}(1\ \text{proc.})}{\text{time}(p\ \text{proc.})}$
      $\text{work}(p) = p \cdot \text{time}(p\ \text{proc.})$
      - Non-decreasing speed-up and work (the execution time does not increase with more processors)
      - Independent tasks: theoretical FPTAS and practical 2-approximations [Jansen 2004, Fan et al. 2012]
      - SP-graphs: ≈ 2.6-approximation [Lepère et al. 2001]; with concave speed-up: $(2 + \varepsilon)$-approximation of unspecified complexity [Makarychev et al. 2014]
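The two quantities defined on this slide can be computed directly from any execution-time profile. The sketch below is only an illustration: `example_time` is a hypothetical profile (2 units of sequential time plus 10 units that divide perfectly), not data from the talk.

```python
from typing import Callable


def speedup(time: Callable[[int], float], p: int) -> float:
    """Speed-up on p processors: time on 1 processor divided by time on p."""
    return time(1) / time(p)


def work(time: Callable[[int], float], p: int) -> float:
    """Work on p processors: number of processors times the execution time."""
    return p * time(p)


def example_time(p: int) -> float:
    # Hypothetical profile (an assumption for illustration only):
    # 2 time units that cannot be parallelized + 10 units that divide perfectly.
    return 2.0 + 10.0 / p


if __name__ == "__main__":
    for p in (1, 2, 4, 8):
        print(p, speedup(example_time, p), work(example_time, p))
```

Any other profile can be plugged in by passing a different function as `time`.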

  11. Previous work (Europar 2015, with A. Guermouche)
      Prasanna & Musicus model [PM 1996]: $\text{speed-up}(p) = p^{\alpha}$
      [Figure: speed-up versus number of processors, with curves for $\alpha = 1$ (perfect parallelism), $0 < \alpha < 1$, and $\alpha = 0$ (no parallelism)]
      Conclusions:
      - Average accuracy
      - No guarantees for distributed platforms
      - Rational numbers of processors
      - Task finish times complex to compute
      - Optimal algorithm for SP-graphs
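To make the model concrete, here is a minimal sketch of the speed-up function p^α and of why it leads to rational numbers of processors. The function names are hypothetical, and `equal_finish_shares` only illustrates one natural sharing (all independent tasks finish at the same time); it is not presented as the algorithm of the Europar 2015 paper.

```python
from typing import List, Sequence


def pm_speedup(p: float, alpha: float) -> float:
    """Speed-up p**alpha: alpha = 1 is perfect parallelism, alpha = 0 is none."""
    return p ** alpha


def pm_time(w: float, p: float, alpha: float) -> float:
    """Time of a task of sequential length w on p (possibly fractional) processors."""
    return w / pm_speedup(p, alpha)


def equal_finish_shares(works: Sequence[float], p: float, alpha: float) -> List[float]:
    """Processor shares that make independent tasks finish simultaneously.

    Setting w_i / p_i**alpha equal for all i, with the shares summing to p,
    gives p_i proportional to w_i**(1/alpha). Requires 0 < alpha <= 1.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    weights = [w ** (1.0 / alpha) for w in works]
    total = sum(weights)
    return [p * x / total for x in weights]


if __name__ == "__main__":
    shares = equal_finish_shares([1.0, 2.0], 4.0, 0.5)
    print(shares, [pm_time(w, s, 0.5) for w, s in zip([1.0, 2.0], shares)])
```

With two tasks of lengths 1 and 2 on 4 processors and α = 0.5, the shares are 0.8 and 3.2, which shows how non-integer allocations arise naturally in this model.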

  12. Today: simpler model
      Simple and reasonable model of a parallel malleable task $T_i$
      - Perfect parallelism up to a threshold $\delta_i$: $\text{time} = w_i / \min(p, \delta_i)$
      - Rational allocation for free (McNaughton's wrap-around rule)
      [Figure: speed-up versus number of processors; slope = 1 up to the threshold $\delta_i$]
      Related studies
      - 2-approximation [Balmin et al. 13] that we will discuss
      - [Kell et al. 2015]: $\text{time} = w_i / p + (p - 1)\,c$; 2-approximation for $p = 3$, open for $p \ge 4$
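The threshold model and the "rational allocation for free" remark can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `wrap_around` is a hypothetical helper that applies a wrap-around filling in the spirit of McNaughton's rule to turn fractional shares over an interval of length `T` into pieces on whole processors. Exact `Fraction` arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction


def task_time(w, p, delta):
    """Time of a task of work w on p processors, with perfect parallelism up to delta."""
    return w / min(p, delta)


def wrap_around(shares, T, num_procs):
    """Turn fractional processor shares over [0, T) into integral pieces.

    Processors are filled one after the other; when a processor is full,
    the current task continues at time 0 on the next one. A task with
    share s is then running on at most ceil(s) processors at any instant.
    Returns a list of (task, processor, start, end) tuples.
    """
    if sum(shares) > num_procs:
        raise ValueError("shares exceed the number of processors")
    pieces = []
    proc, t = 0, Fraction(0)
    for task, share in enumerate(shares):
        remaining = Fraction(share) * T  # processor-time this task needs
        while remaining > 0:
            run = min(remaining, T - t)
            pieces.append((task, proc, t, t + run))
            t += run
            remaining -= run
            if t == T:  # processor full: wrap to the next one
                proc, t = proc + 1, Fraction(0)
    return pieces


if __name__ == "__main__":
    # Two tasks sharing 2 processors with shares 3/2 and 1/2 during 2 time units.
    for piece in wrap_around([Fraction(3, 2), Fraction(1, 2)], Fraction(2), 2):
        print(piece)
```

In this small example, task 0 fills processor 0 and the first half of processor 1, and task 1 takes the second half of processor 1, so no fractional processor is ever needed.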

  14. Outline
      1. Problem complexity
      2. Analysis of Proportional Mapping [Pothen et al. 1993]
      3. Design of a greedy strategy
      4. Experimental comparison
      5. Conclusion

  15. Problem complexity: overview of the problem
      Given an SP-graph and $p$ processors: compute the optimal makespan
      - Problem known as $P \mid \text{sp-graph}, \text{any}, \text{spdp-lin}, \delta_i \mid C_{\max}$
      - Malleability + perfect parallelism ⇒ P
        ... + thresholds ⇒ NP-complete
      - Existing proof in [Drozdowski and Kubiak 1999]: arguably complex
      Contribution
      - New NP-completeness proof
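As an illustration of the problem input, the sketch below represents an SP-graph as nested series and parallel compositions of tasks and computes two elementary lower bounds on the makespan: total work divided by $p$, and the critical path when every task runs on $\min(p, \delta_i)$ processors. The class and function names are assumptions made for this example; these bounds are standard observations and are not the algorithms studied in the talk.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Task:
    w: float      # work of the task
    delta: float  # threshold: maximum useful number of processors


@dataclass(frozen=True)
class Series:
    first: "SPGraph"
    second: "SPGraph"


@dataclass(frozen=True)
class Parallel:
    left: "SPGraph"
    right: "SPGraph"


SPGraph = Union[Task, Series, Parallel]


def total_work(g: SPGraph) -> float:
    """Sum of the work of all tasks in the graph."""
    if isinstance(g, Task):
        return g.w
    if isinstance(g, Series):
        return total_work(g.first) + total_work(g.second)
    return total_work(g.left) + total_work(g.right)


def critical_path(g: SPGraph, p: float) -> float:
    """Longest path when each task runs at its best speed, min(p, delta)."""
    if isinstance(g, Task):
        return g.w / min(p, g.delta)
    if isinstance(g, Series):
        return critical_path(g.first, p) + critical_path(g.second, p)
    return max(critical_path(g.left, p), critical_path(g.right, p))


def makespan_lower_bound(g: SPGraph, p: float) -> float:
    """No schedule on p processors can beat either of these two bounds."""
    return max(total_work(g) / p, critical_path(g, p))


if __name__ == "__main__":
    g = Series(Parallel(Task(4, 2), Task(6, 1)), Task(8, 4))
    print(makespan_lower_bound(g, 4))
```

Without thresholds (every $\delta_i \ge p$), running the tasks one after the other on all $p$ processors in a precedence-respecting order reaches the first bound exactly, which is why only the thresholded variant is hard.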
