Periodic I/O Scheduling for Supercomputers

  1. Periodic I/O Scheduling for Supercomputers
Guillaume Aupy (1), Ana Gainaru (2), Valentin Le Fèvre (3)
(1) Inria & U. of Bordeaux; (2) Vanderbilt University; (3) ENS Lyon & Inria
PMBS Workshop, November 2017
Slides available at https://project.inria.fr/dash/

  2. I/O congestion in HPC systems
Some numbers for motivation:
◮ Computational power keeps increasing (Intrepid: 0.56 PFlop/s, Mira: 10 PFlop/s, Aurora: 450 PFlop/s (?)).
◮ I/O bandwidth increases at a slower rate (Intrepid: 88 GB/s, Mira: 240 GB/s, Aurora: 1 TB/s (?)).
In other terms:
◮ Intrepid can process 160 GB for every PFlop
◮ Mira can process 24 GB for every PFlop
◮ Aurora will (?) process 2.2 GB for every PFlop
Congestion is coming.
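
These GB-per-PFlop figures are simply the ratio of I/O bandwidth to compute throughput; a quick sanity check in Python (the Aurora numbers are the projected ones quoted on the slide):

```python
# Ratio of I/O bandwidth to compute throughput for each machine.
machines = {
    # name: (compute in PFlop/s, I/O bandwidth in GB/s)
    "Intrepid": (0.56, 88.0),
    "Mira": (10.0, 240.0),
    "Aurora (projected)": (450.0, 1000.0),  # 1 TB/s = 1000 GB/s
}
for name, (pflops, bw) in machines.items():
    print(f"{name}: {bw / pflops:.1f} GB per PFlop")
# Intrepid: 157.1 GB per PFlop (the slide rounds to 160)
# Mira: 24.0 GB per PFlop
# Aurora (projected): 2.2 GB per PFlop
```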

  3. Burst buffers: the solution?
Simplistically:
◮ If I/O bandwidth is available: use it
◮ Else, fill the burst buffers
◮ When I/O bandwidth becomes available: empty the burst buffers.
If the burst buffers are big enough, it should work, right?
Average I/O occupation: the sum, over all applications, of the volume of I/O transferred divided by the time they execute, normalized by the peak I/O bandwidth.
Given a set of data-intensive applications running concurrently:
◮ on Intrepid they have a max average I/O occupation of 25%
◮ on Mira they have an average I/O occupation of 120 to 300%!
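
To make the definition concrete, here is a minimal Python sketch of the average I/O occupation; the function name and the application figures below are illustrative, not taken from the paper:

```python
# Average I/O occupation: sum over applications of (I/O volume / execution
# time), normalized by the peak I/O bandwidth of the platform.

def average_io_occupation(apps, peak_bw):
    """apps: list of (io_volume_GB, exec_time_s); peak_bw: GB/s."""
    demand = sum(vol / time for vol, time in apps)  # aggregate GB/s needed
    return demand / peak_bw

# Illustrative numbers only: three applications on a 240 GB/s (Mira-like) machine.
apps = [(9000.0, 100.0), (12000.0, 100.0), (8000.0, 100.0)]  # (GB, seconds)
print(f"{average_io_occupation(apps, 240.0):.0%}")  # 121% -> congestion
```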

  4. Previously in I/O congestion
"Online" scheduling (Gainaru et al., IPDPS'15):
◮ When an application is ready to do I/O, it sends a message to an I/O scheduler;
◮ Based on the other applications running and a priority function, the I/O scheduler gives a GO or NOGO to the application;
◮ If the application receives a NOGO, it pauses until a GO instruction;
◮ Else, it performs I/O.
(A minimal sketch of this protocol follows the next slide.)

  5. Previously in I/O congestion
[Figure: Gantt view of App(1), App(2), App(3) alternating compute phases w(1), w(2), w(3) with I/O transfers; bandwidth plotted from 0 to B against Time.]
Approx. 10% improvement in application performance with 5% gain in system performance on Intrepid.
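
The GO/NOGO protocol can be sketched as follows. This is a toy illustration in the spirit of the IPDPS'15 approach; the class, the first-come-first-served priority rule, and the `receive` callback are my own simplifications, not the authors' implementation:

```python
from collections import deque

class App:
    def __init__(self, name):
        self.name = name
    def receive(self, msg):            # hypothetical application-side callback
        print(f"{self.name}: {msg}")

class IOScheduler:
    def __init__(self, total_bw):
        self.free_bw = total_bw        # I/O bandwidth still available
        self.waiting = deque()         # NOGO'd apps, FCFS as the priority rule

    def request_io(self, app, bw):
        """An application asks to start I/O; answer GO or NOGO."""
        if bw <= self.free_bw:
            self.free_bw -= bw
            return "GO"
        self.waiting.append((app, bw))
        return "NOGO"

    def io_done(self, bw):
        """Called when an application finishes an I/O phase."""
        self.free_bw += bw
        while self.waiting and self.waiting[0][1] <= self.free_bw:
            app, need = self.waiting.popleft()
            self.free_bw -= need
            app.receive("GO")          # wake a paused application

sched = IOScheduler(total_bw=100.0)
a, b = App("App(1)"), App("App(2)")
print(sched.request_io(a, 80.0))       # GO
print(sched.request_io(b, 40.0))       # NOGO: would exceed total bandwidth
sched.io_done(80.0)                    # App(1) finishes -> App(2) gets GO
```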

  6. This work
Assumption: applications follow I/O patterns (1) that we can obtain (based on historical data, for instance).
◮ We use this information to compute an I/O time schedule;
◮ Each application then knows its GO/NOGO information and uses it to perform I/O.
Spoiler: it works very well (at least it seems to).
(1) periodic pattern, to be defined

  7. I/O characterization of HPC applications (Hu et al. 2016)
1. Periodicity: computation and I/O phases (write operations such as checkpoints).
2. Synchronization: parallel identical jobs lead to synchronized I/O operations.
3. Repeatability: jobs run several times with different inputs.
4. Burstiness: short bursts of write operations.
Idea: use the periodic behavior to compute periodic schedules.

  8. Platform model
◮ N unit-speed processors, each equipped with an I/O card of bandwidth b
◮ Centralized I/O system with total bandwidth B
[Figure: model instantiation for the Intrepid platform, with b = 0.1 Gb/s per node and total bandwidth B.]

  9. Application model
K periodic applications are already scheduled on the system: App(k) = (β(k), w(k), vol_io(k)).
◮ β(k) is the number of processors onto which App(k) is assigned
◮ w(k) is the computation time of a period
◮ vol_io(k) is the volume of I/O to transfer after the w(k) units of time
The I/O time of a period is
$$\mathrm{time}_{io}^{(k)} = \frac{\mathrm{vol}_{io}^{(k)}}{\min(\beta^{(k)} \cdot b,\; B)}$$
[Figure: App(1), App(2), App(3) repeating compute phases w(k) followed by I/O phases, with bandwidth plotted from 0 to B against Time.]
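
The I/O time of one period follows directly from this formula; a minimal Python sketch using the slide's notation (the values in the example are illustrative):

```python
# time_io(k) = vol_io(k) / min(beta(k) * b, B): App(k) drains its I/O through
# its beta(k) I/O cards of bandwidth b, capped by the total bandwidth B.

def time_io(beta_k, vol_io_k, b, B):
    """I/O time of one period of App(k)."""
    effective_bw = min(beta_k * b, B)   # bandwidth App(k) can actually use
    return vol_io_k / effective_bw

# Illustrative values only (b = 0.1 Gb/s per node, as on the Intrepid slide):
print(time_io(beta_k=2048, vol_io_k=1000.0, b=0.1, B=64.0))
# min(204.8, 64.0) = 64.0, so 1000.0 / 64.0 = 15.625 time units
```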

  10. Objectives
If App(k) runs during a total time T_k and performs n(k) instances, we define:
$$\rho^{(k)} = \frac{w^{(k)}}{w^{(k)} + \mathrm{time}_{io}^{(k)}}, \qquad \tilde{\rho}^{(k)} = \frac{n^{(k)} w^{(k)}}{T_k}$$
SysEfficiency: maximize peak performance (average number of Flops):
$$\text{maximize } \frac{1}{N} \sum_{k=1}^{K} \beta^{(k)} \tilde{\rho}^{(k)}$$
Dilation: minimize the largest slowdown (fairness between users):
$$\text{minimize } \max_{k=1..K} \frac{\rho^{(k)}}{\tilde{\rho}^{(k)}}$$
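
Both objectives are easy to state in code; a small Python sketch under the slide's notation (the application records and numbers are illustrative only):

```python
# For each App(k) we know beta(k), w(k), time_io(k), plus its observed
# n(k) completed instances over a total elapsed time T_k.

def rho(w_k, time_io_k):
    # Best achievable efficiency: compute time over one undisturbed period.
    return w_k / (w_k + time_io_k)

def rho_tilde(n_k, w_k, T_k):
    # Achieved efficiency: useful compute time over total elapsed time.
    return n_k * w_k / T_k

def sys_efficiency(apps, N):
    # SysEfficiency = (1/N) * sum_k beta(k) * rho_tilde(k)  (to maximize)
    return sum(a["beta"] * rho_tilde(a["n"], a["w"], a["T"]) for a in apps) / N

def dilation(apps):
    # Dilation = max_k rho(k) / rho_tilde(k)  (to minimize; 1.0 = no slowdown)
    return max(rho(a["w"], a["time_io"]) / rho_tilde(a["n"], a["w"], a["T"])
               for a in apps)

apps = [  # illustrative values only
    {"beta": 1024, "w": 90.0, "time_io": 10.0, "n": 9, "T": 1000.0},
    {"beta": 512, "w": 50.0, "time_io": 25.0, "n": 12, "T": 1000.0},
]
print(sys_efficiency(apps, N=2048))  # 0.555
print(dilation(apps))                # 1.111...
```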

  11. High-level constraints
◮ Applications are already scheduled on the machine: it is not (yet) our job to do that;
◮ We want the schedule information distributed over the applications: the goal is not to add a new congestion point;
◮ Computing a full I/O schedule over all iterations of all applications would be too expensive (i) in time, (ii) in space;
◮ We want minimal overhead for application users: otherwise, our guess is, users might not like it that much.
We introduce Periodic Scheduling.

  12. Periodic schedules
(a) Periodic schedule (phases). [Figure: an Init phase, then a pattern of length T repeated over times c, T + c, 2T + c, ..., nT + c, then a Clean-up phase.]
(b) Detail of I/O in a period/pattern. [Figure: within one period of length T, the I/O volumes vol_io(1) ... vol_io(4) are packed under the bandwidth bound B; each instance k is delimited by markers initW(k), initIO(k), endW(k).]
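
One way to make such a pattern concrete: store, for each application instance in the period, when its I/O starts and at what bandwidth, then replay the pattern every T time units. The encoding and feasibility check below are my own sketch, loosely following the slide's initIO/endW markers, not the paper's data structures:

```python
# A periodic I/O pattern of length T, replayed indefinitely. Each entry
# schedules one I/O transfer of an application instance inside the period.

from dataclasses import dataclass

@dataclass
class IOSlot:
    app: int         # application index k
    init_io: float   # I/O start time, relative to the period start
    duration: float  # I/O duration (vol_io / allotted bandwidth)
    bw: float        # bandwidth allotted to this transfer

def feasible(pattern, T, B, eps=1e-9):
    """Check that every transfer fits in the period and the aggregate
    bandwidth never exceeds B at any instant."""
    events = []
    for s in pattern:
        if s.init_io + s.duration > T + eps:
            return False  # transfer spills out of the period
        events += [(s.init_io, s.bw), (s.init_io + s.duration, -s.bw)]
    used = 0.0
    for _, delta in sorted(events):  # sweep over start/end events
        used += delta
        if used > B + eps:
            return False  # aggregate bandwidth exceeded
    return True

# Illustrative pattern: two transfers sharing B = 10 within a period T = 100.
pattern = [IOSlot(app=1, init_io=30.0, duration=20.0, bw=6.0),
           IOSlot(app=2, init_io=40.0, duration=15.0, bw=4.0)]
print(feasible(pattern, T=100.0, B=10.0))  # True
```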
