Bounded Stream Scheduling in Polyhedral OpenStream

Bounded Stream Scheduling in Polyhedral OpenStream. Nuno Miguel Nobre | nunomiguel.nobre@manchester.ac.uk, Andi Drebes | andi.drebes@inria.fr, Graham Riley | graham.riley@manchester.ac.uk, Antoniu Pop |


  1. Bounded Stream Scheduling in Polyhedral OpenStream
     Nuno Miguel Nobre | nunomiguel.nobre@manchester.ac.uk
     Andi Drebes | andi.drebes@inria.fr
     Graham Riley | graham.riley@manchester.ac.uk
     Antoniu Pop | antoniu.pop@manchester.ac.uk
     IMPACT 2020: January 22, 2020 | Bologna, Italy

  9. The case for streaming dataflow languages
     Instead of barrier synchronization, point-to-point synchronization:
     • Hide latency
     • More opportunities for parallelism: task, data, pipeline
     Scheduling is the runtime’s job
     Provide functional determinism
     • No in-place writes: fewer dependencies
     • Memory footprint
     [Figure: diagram labelled Task, Data, Pipeline, GPU, FPGA]

  10. Outline
      1) OpenStream
         • Overview & polyhedral subset
         • Computing dependencies and schedules
      2) Stream bounding
         • Basic strategy & limitations
         • Usage guidelines

  16. OpenStream: a short overview
      Data-flow extension to OpenMP
      • Tasks: units of work spawned as concurrent coroutines created dynamically at runtime
      • Streams: unbounded channels for communication between tasks
      Tasks access stream elements through windows:
        stream s;
        task p1 { write three times to s; }
        task p2 { write two times to s; }
        task r { peek three times from s; }
        task c { read five times from s; }
      Task dependencies: control program; accesses on stream s; overlapping windows
      [Figure: stream s with write index W_s and read index R_s, the windows of p1, p2, r, and c, and the task graph over p1, p2, r, c]
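The window mechanism on this slide can be sketched with a toy model. This is not OpenStream's API: the `Stream` class and its `write`, `read`, and `peek` methods are hypothetical names introduced only for illustration, and the model assumes that a peek gets a read window without advancing the read index, while a read both gets a window and advances it. Dependencies are then the producer/consumer pairs whose windows overlap.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """Toy stream: tracks only the write index, the read index, and windows."""
    write_index: int = 0
    read_index: int = 0
    windows: list = field(default_factory=list)  # (task, kind, first, last)

    def write(self, task, n):
        # A write window covers the next n elements and advances the write index.
        self.windows.append((task, "write", self.write_index, self.write_index + n - 1))
        self.write_index += n

    def read(self, task, n):
        # A read window covers the next n unread elements and consumes them.
        self.windows.append((task, "read", self.read_index, self.read_index + n - 1))
        self.read_index += n

    def peek(self, task, n):
        # Assumption: a peek looks at n elements without consuming them.
        self.windows.append((task, "peek", self.read_index, self.read_index + n - 1))


def dependencies(stream):
    """Return sorted (producer, consumer) pairs whose windows overlap."""
    writes = [w for w in stream.windows if w[1] == "write"]
    reads = [w for w in stream.windows if w[1] != "write"]
    return sorted(
        (producer, consumer)
        for (producer, _, p_lo, p_hi) in writes
        for (consumer, _, c_lo, c_hi) in reads
        if p_lo <= c_hi and c_lo <= p_hi
    )


s = Stream()
s.write("p1", 3)   # window [0, 2]
s.write("p2", 2)   # window [3, 4]
s.peek("r", 3)     # window [0, 2], read index unchanged
s.read("c", 5)     # window [0, 4]
deps = dependencies(s)
```

Under these assumptions, `r` depends only on `p1`, while `c` depends on both `p1` and `p2`.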

  19. Polyhedral OpenStream: computing dependencies
        stream s;
        parameter N;
        for (i = 0; i < N; ++i)
          task tw { write two times to s; }
        for (j = 0; j < N/2; ++j)
          task tc { read four times from s; }
      Polyhedral control program:
      • No nested task creation
      • Affine control statements
      Can statically count W_s and R_s and obtain access windows:
      • Ehrhart polynomials
      • Brion generating functions
      Access windows:
      • W_s(t_w, i) = 2i, window: [2i, 2i + 1]
      • R_s(t_c, j) = 4j, window: [4j, 4j + 3]
      Compute dependencies by intersecting windows:
        2i ≤ 4j + 3 ∧ 4j ≤ 2i + 1, that is, 2j ≤ i ≤ 2j + 1
      [Figure: task graph in which t_w,0 and t_w,1 feed t_c,0]
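As a sanity check on the window intersection, the following sketch enumerates the dependencies by brute force for a concrete N and compares them with the closed form 2j ≤ i ≤ 2j + 1. The function names are illustrative only. The windows are the ones stated on the slide: writer i covers [2i, 2i + 1] and reader j covers [4j, 4j + 3].

```python
def write_window(i):
    """Window of task t_w,i: two elements starting at index 2i."""
    return (2 * i, 2 * i + 1)


def read_window(j):
    """Window of task t_c,j: four elements starting at index 4j."""
    return (4 * j, 4 * j + 3)


def overlaps(a, b):
    """True when two closed integer intervals intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def dependencies(n):
    """(i, j) pairs such that t_c,j reads something t_w,i wrote."""
    return [
        (i, j)
        for j in range(n // 2)
        for i in range(n)
        if overlaps(write_window(i), read_window(j))
    ]


def closed_form(n):
    """The same pairs, from the simplified condition 2j <= i <= 2j + 1."""
    return [
        (i, j)
        for j in range(n // 2)
        for i in range(n)
        if 2 * j <= i <= 2 * j + 1
    ]


deps_for_4 = dependencies(4)
```

For N = 4 this gives t_w,0 and t_w,1 feeding t_c,0, and t_w,2 and t_w,3 feeding t_c,1. The enumeration only illustrates the result for a fixed N; the polyhedral approach obtains the same relation symbolically for a parametric N.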

  22. Polyhedral OpenStream: scheduling
      Dependencies: polynomial (in)equalities $p_i(x)$, semi-algebraic sets:
        $S = \{ x \in \mathbb{R}^d \mid p_1(x) \ge 0,\ p_2(x) \ge 0,\ \dots,\ p_n(x) \ge 0 \}$
      A polynomial $P(x)$ is strictly positive in $S$ iff:
        $P(x) = \sum_{k \in \mathbb{N}^n} \lambda_k \, p_1^{k_1}(x) \, p_2^{k_2}(x) \cdots p_n^{k_n}(x), \quad \lambda_k \ge 0, \quad \sum_k \lambda_k > 0$
      Cannot possibly exhaust all $k$ in finite time:
      • Semi-decidable (undecidable) problem
      • In practice, ∼ conservative ‘Farkas lemma’
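To make the positivity condition concrete, here is a small sketch that checks one given certificate: a target polynomial written as a non-negatively weighted sum of products of the constraint polynomials. The example is an assumption chosen for illustration, not taken from the slides: the set is the interval defined by x ≥ 0 and 1 − x ≥ 0, and the target is x² − x + 1. Polynomials are univariate coefficient lists, lowest degree first, and all function names are hypothetical. The sketch only verifies a certificate; it does not search for one, which is the part that cannot be exhausted over all exponent vectors.

```python
def poly_mul(a, b):
    """Multiply two coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_add(a, b):
    """Add two coefficient lists, padding the shorter one with zeros."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def expand(constraints, terms):
    """Expand sum over k of lam_k * prod_i constraints[i] ** k[i]."""
    total = [0]
    for exponents, lam in terms.items():
        product = [lam]
        for p, k in zip(constraints, exponents):
            for _ in range(k):
                product = poly_mul(product, p)
        total = poly_add(total, product)
    return total


def is_certificate(target, constraints, terms):
    """Check weights are non-negative, not all zero, and the sum equals target."""
    weights = list(terms.values())
    return (
        all(w >= 0 for w in weights)
        and sum(weights) > 0
        and expand(constraints, terms) == target
    )


p1 = [0, 1]       # x >= 0
p2 = [1, -1]      # 1 - x >= 0
target = [1, -1, 1]  # x^2 - x + 1

# Candidate certificate: x^2 + x(1 - x) + (1 - x)^2, all weights equal to 1.
terms = {(2, 0): 1, (1, 1): 1, (0, 2): 1}
ok = is_certificate(target, [p1, p2], terms)
```

Restricting the exponent vectors to total degree at most one reduces the products to the constraints themselves, which is the affine, Farkas-style form.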

  24. Stream bounding: back-pressure WaRs
        stream s;
        parameter N;
        for (i = 0; i < N; ++i)
          task tw { write two times to s; }
        for (j = 0; j < N/2; ++j)
          task tc { read four times from s; }
      Stream bound: 4 elements
      [Figure: stream s with writers t_w,0 to t_w,3 and readers t_c,0 and t_c,1; task graph over t_w,0, t_w,1, t_w,2, t_w,3, … and t_c,0, t_c,1]
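One way to picture the back-pressure dependencies is a circular-buffer model, sketched below. This is an assumption made for illustration, not a description of the authors' implementation: with a bound of B elements, element e is taken to reuse the slot of element e − B, so the task writing e must wait for the task that read e − B. The function name is hypothetical. Windows are the ones from the example: writer i covers [2i, 2i + 1] and reader j covers [4j, 4j + 3].

```python
def war_dependencies(n, bound):
    """(j, i) pairs: writer t_w,i must wait for reader t_c,j under the bound.

    Assumes element e reuses the storage slot of element e - bound.
    """
    deps = []
    for i in range(n):
        # Elements whose slots writer i is about to overwrite.
        reused = {e - bound for e in (2 * i, 2 * i + 1) if e - bound >= 0}
        for j in range(n // 2):
            if any(4 * j <= e <= 4 * j + 3 for e in reused):
                deps.append((j, i))
    return deps


bound_4 = war_dependencies(6, 4)
bound_2 = war_dependencies(4, 2)
```

With a bound of 4 elements, this model makes t_w,2 and t_w,3 wait for t_c,0, and t_w,4 and t_w,5 wait for t_c,1. With a bound of 2, the same model would make t_w,1 wait for t_c,0, which itself needs the elements written by t_w,1, so the bound would be too small for a reader that consumes four elements at once.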
