  1. Part 3: Memory-Aware DAG Scheduling CR05: Data Aware Algorithms October 12 & 15, 2020

  2. Summary of the course
     ◮ Part 1: Pebble Games – models of computation with limited memory
     ◮ Part 2: External Memory and Cache-Oblivious Algorithms – 2-level memory system, some parallelism (work stealing)
     ◮ Part 3: Streaming Algorithms – dealing with big data, distributed computing
     ◮ Part 4: DAG Scheduling (today) – structured computations with limited memory
     ◮ Part 5: Communication-Avoiding Algorithms – regular computations (linear algebra) in a distributed setting

  3. Introduction
     ◮ Directed Acyclic Graphs: express task dependencies
       ◮ nodes: computational tasks
       ◮ edges: dependencies (data = output of a task = input of another task)
     ◮ Formalism proposed long ago in scheduling
     ◮ Back in fashion thanks to task-based runtimes
       ◮ Decompose an application (scientific computation) into tasks
       ◮ Data produced/used by tasks create dependencies
       ◮ Task mapping and scheduling done at runtime
     ◮ Numerous projects:
       ◮ StarPU (Inria Bordeaux) – several codes for each task, to execute on any computing resource (CPU, GPU, *PU)
       ◮ DAGuE, PaRSEC (ICL, Tennessee) – task graph expressed in symbolic compact form, dedicated to linear algebra
       ◮ StarSs (Barcelona), Xkaapi (Grenoble), and others. . .
     ◮ Now included in the OpenMP API

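As a concrete sketch of the DAG formalism described above, the example below builds a small task graph and lists its tasks in a dependency-respecting order. The task names and edges are purely illustrative; this is not the API of StarPU, PaRSEC, or any other runtime mentioned on the slide.

```python
from collections import deque

# Hypothetical task graph: each task maps to the tasks that consume its output.
edges = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": ["F"], "E": ["F"], "F": []}

def topological_order(edges):
    """Return the tasks in an order that respects all dependencies (Kahn's algorithm)."""
    indegree = {t: 0 for t in edges}
    for succs in edges.values():
        for s in succs:
            indegree[s] += 1
    ready = deque(t for t, d in indegree.items() if d == 0)  # tasks with no pending inputs
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for s in edges[t]:          # completing t may release its successors
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order

print(topological_order(edges))     # ['A', 'B', 'C', 'D', 'E', 'F']
```

A runtime system does essentially this online: a task becomes "ready" as soon as all the data it consumes have been produced.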

  5. Task graph scheduling and memory
     ◮ Consider a simple task graph: each task has a duration and a memory demand
       [Figure: DAG on tasks A–F; two-processor Gantt charts (Processor 1: A, B, D; Processor 2: C, E, F), one of which runs out of memory]
     ◮ Peak memory: maximum memory usage over the execution
     ◮ Trade-off between peak memory and performance (time to solution)
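The "out of memory" situation on this slide can be made concrete with a small sketch. The model below is a deliberate simplification (a single sequential trace; a task's output stays in memory until all of its consumers have executed), and the task names, sizes, and budget are illustrative assumptions, not values from the slide's figure.

```python
# Hypothetical graph: each task maps to the tasks consuming its output.
consumers = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": ["F"], "E": ["F"], "F": []}
out = {"A": 1, "B": 4, "C": 4, "D": 1, "E": 1, "F": 0}   # illustrative output sizes

def check(schedule, budget):
    """Replay a schedule and report whether it ever exceeds the memory budget."""
    pending = {}            # producer -> consumers that have not yet executed
    current = 0
    for t in schedule:
        current += out[t]                     # allocate t's output
        if current > budget:
            return f"out of memory at {t} (needs {current} > {budget})"
        pending[t] = set(consumers[t])
        for p in list(pending):               # t may be the last consumer of some inputs
            pending[p].discard(t)
            if not pending[p]:                # all consumers done: free p's output
                current -= out[p]
                del pending[p]
    return "fits"

print(check(["A", "B", "C", "D", "E", "F"], 7))   # out of memory at C (needs 9 > 7)
print(check(["A", "B", "D", "C", "E", "F"], 7))   # fits
```

The two calls process the same graph with the same budget; only the order differs, which is exactly the trade-off the slide illustrates.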

  10. Going back to sequential processing
     ◮ Temporary data require memory
     ◮ The schedule influences the peak memory
       [Figure: DAG on tasks A–F; two sequential orders, A B C D E F and A C B D E F, with different peak memory]
     When the minimum memory demand exceeds the available memory:
     ◮ Store some temporary data on a larger, slower storage (disk)
     ◮ Out-of-core computing, with Input/Output operations (I/O)
     ◮ Decide both the schedule and the eviction scheme
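Since the schedule influences the peak memory, on a tiny graph one can simply brute-force the memory-minimizing sequential order. The freeing model (an input is released once all of its consumers have run) and all sizes below are illustrative assumptions rather than data from the slides.

```python
from itertools import permutations

# Hypothetical graph given by predecessor lists, plus output sizes.
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"], "E": ["C"], "F": ["D", "E"]}
size = {"A": 1, "B": 4, "C": 4, "D": 1, "E": 1, "F": 0}

def peak(order):
    """Peak memory of a sequential order, freeing data once fully consumed."""
    alive, current, worst = {}, 0, 0
    for t in order:
        current += size[t]
        worst = max(worst, current)
        alive[t] = sum(t in preds[s] for s in preds)   # consumers of t still to run
        for p in preds[t]:                             # t just consumed its inputs
            alive[p] -= 1
            if alive[p] == 0:
                current -= size[p]
    return worst

# Enumerate all orders that respect the dependencies, keep the best one.
valid = (o for o in permutations(preds)
         if all(o.index(p) < o.index(t) for t in preds for p in preds[t]))
best = min(valid, key=peak)
print(best, peak(best))
```

Brute force is exponential, of course; the point of the coming sections is that for trees (and series-parallel graphs) the optimum can be computed efficiently.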

  13. Research problems
     Several interesting questions:
     ◮ For sequential processing:
       ◮ Minimum memory needed to process a graph
       ◮ In case of memory shortage, minimum I/O volume required
     ◮ For parallel processing:
       ◮ Trade-offs between memory and time (makespan)
       ◮ Makespan minimization under bounded memory
     Most (all?) of these problems are NP-hard on general graphs.
     We sometimes restrict to simpler graphs:
     1. Trees (single output, multiple inputs for each task)
        Arise in sparse linear algebra (sparse direct solvers), with large data to handle: memory is a problem
     2. Series-parallel graphs
        Natural generalization of trees, close to the actual structure of regular codes


  17. Outline
     Minimize Memory for Trees
     Minimize Memory for Series-Parallel Graphs
     Minimize I/Os for Trees under Bounded Memory


  19. Notations: Tree-Shaped Task Graphs
     ◮ In-tree of n nodes
     ◮ Output data of size f_i
     ◮ Execution data of size n_i
     ◮ Input data of leaf nodes have null size
       [Figure: example in-tree on nodes 1–5, annotated with output sizes f_i and execution sizes n_i; leaf inputs have size 0]
     ◮ Memory needed to process node i:
       MemReq(i) = ( Σ_{j ∈ Children(i)} f_j ) + n_i + f_i

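The MemReq formula can be evaluated directly. The sketch below does so on an illustrative 5-node in-tree; the shape and all sizes are made-up assumptions, not the values of the slide's figure.

```python
# Illustrative in-tree: node 1 is the root, with children 2 and 3;
# node 2 has children 4 and 5 (leaves).
children = {1: [2, 3], 2: [4, 5], 3: [], 4: [], 5: []}
f = {1: 0, 2: 3, 3: 2, 4: 1, 5: 2}   # output data sizes (root output set to 0 here)
n = {1: 4, 2: 2, 3: 1, 4: 1, 5: 1}   # execution data sizes

def mem_req(i):
    # MemReq(i) = (sum of children's output sizes) + n_i + f_i:
    # processing i needs all its inputs, its execution data, and its output at once.
    return sum(f[j] for j in children[i]) + n[i] + f[i]

for i in sorted(children):
    print(i, mem_req(i))
# 1 9, 2 8, 3 3, 4 2, 5 3
```

Note that MemReq(i) is only a lower bound on the memory needed to process the whole subtree rooted at i: the outputs of subtrees processed earlier may still be resident, which is what makes the traversal order matter.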
