  1. Scheduling tree-shaped task graphs to minimize memory and makespan. Lionel Eyraud-Dubois (INRIA, Bordeaux, France), Loris Marchal (CNRS, Lyon, France), Oliver Sinnen (Univ. Auckland, New Zealand), Frédéric Vivien (INRIA, Lyon, France). New Challenges in Scheduling Theory Workshop, Aussois, March/April 2014.

  2. Introduction
Task graph scheduling:
◮ Application modeled as a graph
◮ Map tasks onto processors and schedule them
◮ Usual performance metric: makespan (time)
Today: focus on memory
◮ Workflows with large temporary data
◮ Unfavorable evolution of computation vs. communication performance: 1/Flops ≪ 1/bandwidth ≪ latency
◮ The gap between processing power and communication cost grows exponentially; annual improvements: Flops rate 59%, memory bandwidth 26%, memory latency 5%
◮ Avoid communications
◮ Restrict to in-core memory (out-of-core is expensive)

  5. Focus on Task Trees
Motivation:
◮ Arise in multifrontal sparse matrix factorization
◮ Assembly/elimination tree: the application task graph is a tree
◮ Large temporary data
◮ Memory usage becomes a bottleneck

  6. Outline
Introduction and related work
Complexity of parallel tree processing
Heuristics for weighted task trees
Simulations
Summary and perspectives

  13. Related Work: Register Allocation & Pebble Game
How to efficiently compute the arithmetic expression 7 + (1 + x)(5 − z) − ((u − t)/(2 + z)) + v with the minimum number of registers?
[Figure: the expression tree, with the operators +, −, ×, / as internal nodes and 7, v, 1, x, 5, z, u, t, 2, z as leaves]
Pebble-game rules:
◮ Inputs can be pebbled anytime
◮ A node can be pebbled once all of its predecessors (its children in the expression tree) are pebbled
◮ A pebble may be removed anytime
Objective: pebble the root node using the minimum number of pebbles
Complexity results:
◮ On trees: polynomial algorithm [Sethi & Ullman, 1970]
◮ General problem on DAGs (common subexpressions): PSPACE-complete [Gilbert, Lengauer & Tarjan, 1980]
◮ Without re-computation: NP-complete [Sethi, 1973]
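The tree case admits the classical Sethi-Ullman labeling. The sketch below computes the minimum number of pebbles for an expression tree under the register-reuse model (a node's result may overwrite one of its operands' pebbles); the `Node` class and the generalization to arbitrary arity (evaluate the most demanding subtree first) are illustrative assumptions, not from the slides.

```python
# Minimum pebbles (registers) to evaluate an expression tree,
# Sethi-Ullman style: evaluate one whole subtree at a time, most
# demanding subtree first, reusing an operand's pebble for the result.

class Node:
    """Hypothetical expression-tree node; leaves have no children."""
    def __init__(self, *children):
        self.children = list(children)

def min_pebbles(node):
    if not node.children:
        return 1  # an input leaf needs a single pebble
    # While the i-th subtree (sorted by decreasing demand, 0-indexed)
    # is evaluated, i results of earlier subtrees are still pebbled.
    demands = sorted((min_pebbles(c) for c in node.children), reverse=True)
    return max(d + i for i, d in enumerate(demands))
```

For a binary node this reduces to the familiar rule: max(n1, n2) when the two subtree demands differ, n1 + 1 when they are equal. Under this model, a product of two sums such as (1 + x)(5 − z) already requires 3 pebbles.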

  14. Notations: Tree-Shaped Task Graphs
◮ In-tree of n nodes
◮ Output data of size f_i
◮ Execution data of size n_i
◮ Input data of leaf nodes have null size
◮ Memory needed for node i: MemReq(i) = ( Σ_{j ∈ Children(i)} f_j ) + n_i + f_i
[Figure: example tree; root 1 with children 2 and 3, node 2 with leaf children 4 and 5; leaf input sizes are 0]

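As a small sanity check of the MemReq definition, here is a direct transcription into Python; the dict-based tree encoding and the numeric sizes used in testing are hypothetical, not taken from the slide's figure.

```python
# MemReq(i) = (sum of the children's output sizes) + n_i + f_i:
# while node i executes, its inputs, its execution data, and its
# output must all reside in memory simultaneously.

def mem_req(i, children, n, f):
    """children: node -> list of child nodes; n, f: node -> sizes."""
    return sum(f[j] for j in children[i]) + n[i] + f[i]
```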
  16. Impact of Schedule on Memory Peak
[Animated figure: a small task tree processed node by node, with the peak memory reached so far displayed at each step]
◮ One processing order of the example tree reaches a memory peak of 12
◮ A different processing order of the same tree peaks at only 9
Two existing optimal sequential schedules:
◮ Best traversal [J. Liu, 1987]
◮ Best post-order traversal [J. Liu, 1986]
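The animation's bookkeeping can be reproduced in a few lines: walk a given traversal, track the total size of outputs currently resident, and record the peak. A sketch under the slides' memory model; the traversal encoding and the example sizes are hypothetical, not the figure's values.

```python
# Peak memory of a traversal of an in-tree: executing node i needs its
# children's outputs, its execution data n[i], and its output f[i] in
# memory at once; the children's outputs are freed when i completes.

def peak_memory(order, children, n, f):
    resident = 0  # total size of outputs currently held in memory
    peak = 0
    for i in order:
        # resident already includes the outputs of i's children
        peak = max(peak, resident + n[i] + f[i])
        inputs = sum(f[j] for j in children[i])
        resident += f[i] - inputs  # free the inputs, keep i's output
    return peak
```

Different valid orders of the same tree can give different peaks, which is exactly the point of the animated example; choosing the best order is what Liu's algorithms solve.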
