
Lecture 11: HW3, Rest of Parallel Patterns, Load Balancing

G63.2011.002/G22.2945.001 · November 16, 2010


  1. Lecture 11: HW3, Rest of Parallel Patterns, Load Balancing
     G63.2011.002/G22.2945.001 · November 16, 2010

  2. Outline
     • Divide-and-Conquer
     • General Data Dependencies

  4. Divide and Conquer
     $y_i = f_i(x_1, \dots, x_N)$ for $i \in \{1, \dots, M\}$.
     Main purpose: A way of partitioning up fully dependent tasks.
     [Figure: rows of eight values each, labeled $x_0, \dots, x_7$, $y_0, \dots, y_7$, $u_0, \dots, u_7$, $v_0, \dots, v_7$, $w_0, \dots, w_7$]
     Processor allocation?
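
As a minimal sketch of the pattern, assume a single output that depends on every input: the sum of an array. The function name `dc_sum` and the choice of a sum are illustrative assumptions, not taken from the slides. The two recursive calls work on disjoint halves, which is what would let them be handed to different processors; here they run sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum x[0..n) by splitting the range in half and
 * combining the two partial results. The recursive calls touch disjoint
 * data, so they are independent of each other; this sketch runs them one
 * after the other. */
long dc_sum(const long *x, size_t n)
{
    assert(x != NULL || n == 0);
    if (n == 0) return 0;
    if (n == 1) return x[0];                   /* base case: nothing to divide */

    size_t half = n / 2;                       /* divide */
    long left   = dc_sum(x, half);             /* conquer left half */
    long right  = dc_sum(x + half, n - half);  /* conquer right half */
    return left + right;                       /* combine */
}
```

With unit-cost additions, the recursion does n - 1 additions in total, while the longest chain of dependent additions grows only with the depth of the recursion tree.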

  6. Divide and Conquer: Examples
     • GEMM, TRMM, TRSM, GETRF (LU)
     • FFT
     • Sorting: Bucket sort, Merge sort
     • $N$-Body problems (Barnes-Hut, FMM)
     • Adaptive Integration
     More fun with work and span: D&C analysis lecture

  7. Divide and Conquer: Issues
     • “No idea how to parallelize that” → Try D&C
     • Non-optimal during partition, merge
       • But: Does not matter if deep levels do heavy enough processing
     • Subtle to map to fixed-width machines (e.g. GPUs)
       • Varying data size along tree
     • Bookkeeping nontrivial for non-$2^n$ sizes
     • Side benefit: D&C is generally cache-friendly
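
One way to keep the bookkeeping manageable for sizes that are not powers of two is to pass half-open index ranges down the recursion. The sketch below does this for merge sort; the names `merge_runs` and `dc_merge_sort` are hypothetical, and the code is sequential. It also shows the non-optimal part named above: the merge at each level is a single sequential pass.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Merge the sorted runs a[lo..mid) and a[mid..hi) using scratch space tmp. */
static void merge_runs(int *a, int *tmp, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];  /* ties take the left run */
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);
}

/* Sort a[lo..hi). The split point lo + (hi - lo) / 2 is valid for any
 * length, and the two halves differ in length by at most one element. */
void dc_merge_sort(int *a, int *tmp, size_t lo, size_t hi)
{
    assert(lo <= hi);
    if (hi - lo < 2) return;                 /* 0 or 1 element: already sorted */
    size_t mid = lo + (hi - lo) / 2;
    dc_merge_sort(a, tmp, lo, mid);          /* independent subproblem */
    dc_merge_sort(a, tmp, mid, hi);          /* independent subproblem */
    merge_runs(a, tmp, lo, mid, hi);         /* sequential merge step */
}
```

A caller supplies a scratch buffer `tmp` of the same length as `a` and sorts the whole array with `dc_merge_sort(a, tmp, 0, n)`.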


  9. General Dependency Graphs
     B = f(A)
     C = g(B)
     E = f(C)
     F = h(C)
     G = g(E,F)
     P = p(B)
     Q = q(B)
     R = r(G,P,Q)
     [Figure: dependency graph with nodes A, B, C, E, F, G, P, Q, R and edges labeled by the functions above]
     Great: All patterns discussed so far can be reduced to this one.
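
The graph above can be written down as a table of inputs per node, which makes its work and span easy to compute. The sketch below assumes every task costs one unit and treats A as a free input; both are assumptions made for illustration, as are the names `deps`, `depth`, and `total_work`.

```c
#include <assert.h>

/* Nodes of the graph: A is the given input, every other node is the
 * result of one task. */
enum { A, B, C, E, F, G, P, Q, R, NUM_NODES };
enum { MAX_DEPS = 3, NONE = -1 };

/* deps[v] lists the inputs of node v, padded with NONE. */
static const int deps[NUM_NODES][MAX_DEPS] = {
    [A] = { NONE, NONE, NONE },
    [B] = { A,    NONE, NONE },   /* B = f(A)     */
    [C] = { B,    NONE, NONE },   /* C = g(B)     */
    [E] = { C,    NONE, NONE },   /* E = f(C)     */
    [F] = { C,    NONE, NONE },   /* F = h(C)     */
    [G] = { E,    F,    NONE },   /* G = g(E,F)   */
    [P] = { B,    NONE, NONE },   /* P = p(B)     */
    [Q] = { B,    NONE, NONE },   /* Q = q(B)     */
    [R] = { G,    P,    Q    },   /* R = r(G,P,Q) */
};

/* Length of the longest chain of unit-cost tasks ending at v. Plain
 * recursion without memoization: fine for nine nodes, not for large graphs. */
int depth(int v)
{
    assert(0 <= v && v < NUM_NODES);
    int best = 0, has_dep = 0;
    for (int k = 0; k < MAX_DEPS; k++) {
        int d = deps[v][k];
        if (d == NONE) continue;
        has_dep = 1;
        int len = depth(d);
        if (len > best) best = len;
    }
    return has_dep ? best + 1 : 0;   /* inputs such as A cost nothing */
}

/* Number of tasks, i.e. nodes that have at least one input. */
int total_work(void)
{
    int tasks = 0;
    for (int v = 0; v < NUM_NODES; v++)
        if (deps[v][0] != NONE) tasks++;
    return tasks;
}
```

Under these assumptions `total_work()` is 8 and `depth(R)` is 5: the chain B, C, E (or F), G, R cannot be shortened, while P and Q can run alongside C.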

  11. Cilk
      cilk int fib (int n)
      {
        if (n < 2) return n;
        else
        {
          int x, y;
          x = spawn fib (n - 1);
          y = spawn fib (n - 2);
          sync;
          return (x+y);
        }
      }
      Features:
      • Adds keywords spawn, sync, (inlet, abort)
      • Remove keywords → valid (seq.) C
      Timeline:
      • Developed at MIT, starting in ’94
      • Commercialized in ’06
      • Bought by Intel in ’09
      • Available in the Intel Compilers
      Efficient implementation?
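
The "remove keywords → valid (seq.) C" property can be tried directly with a plain C compiler. The sketch below defines the three keywords used in `fib` as empty macros, so the function from the slide compiles unchanged as sequential C. The macro trick is an illustration, not how a Cilk compiler works, and it gives no parallelism.

```c
#include <assert.h>   /* only needed for the check in the usage note */

/* Define the Cilk keywords away: what remains is ordinary sequential C. */
#define cilk
#define spawn
#define sync

cilk int fib(int n)
{
    if (n < 2) return n;
    else
    {
        int x, y;
        x = spawn fib(n - 1);   /* becomes a plain call */
        y = spawn fib(n - 2);   /* becomes a plain call */
        sync;                   /* becomes an empty statement */
        return (x + y);
    }
}

/* Drop the macros again so they cannot interfere with later code. */
#undef cilk
#undef spawn
#undef sync
```

A quick check such as `assert(fib(10) == 55);` confirms that the elided version computes the same Fibonacci numbers as the recursive definition.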

  13. Work-Stealing: Cilk’s Work-Stealing Scheduler
      Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack.
      When a processor runs out of work, it steals a thread from the top of a random victim’s deque.
      [Animation over a row of processors P, in order: Spawn!, Spawn!, Return!, Return!, Steal!, Steal!, Spawn!, Spawn!]
      Why is Work-Stealing better than a Task Queue?
      With material by Charles E. Leiserson (MIT)
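
The deque discipline described above can be sketched with a fixed-size array. Everything here is a simplifying assumption: tasks are plain `int` ids, the capacity is fixed, there is no wrap-around, and the code is single-threaded, so it shows only which end each operation uses. It is not the synchronization protocol of the actual Cilk runtime, and the names `deque_t`, `push_bottom`, `pop_bottom`, and `steal_top` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEQUE_CAP 64

/* One processor's deque of ready tasks. Live entries occupy the slots
 * [top, bottom). */
typedef struct {
    int tasks[DEQUE_CAP];
    int top;      /* oldest entry: thieves take from here */
    int bottom;   /* one past the newest entry: the owner works here */
} deque_t;

void deque_init(deque_t *d)
{
    assert(d != NULL);
    d->top = 0;
    d->bottom = 0;
}

/* Owner, on spawn: push a ready task onto the bottom. */
bool push_bottom(deque_t *d, int task)
{
    if (d->bottom == DEQUE_CAP) return false;   /* sketch: no wrap-around */
    d->tasks[d->bottom++] = task;
    return true;
}

/* Owner, on return: take the most recently pushed task (stack order). */
bool pop_bottom(deque_t *d, int *task)
{
    if (d->bottom == d->top) return false;      /* deque is empty */
    *task = d->tasks[--d->bottom];
    return true;
}

/* Thief: take the oldest task from the top of a victim's deque. */
bool steal_top(deque_t *victim, int *task)
{
    if (victim->bottom == victim->top) return false;
    *task = victim->tasks[victim->top++];
    return true;
}
```

After pushing tasks 1, 2, 3, the owner's `pop_bottom` yields 3 while a thief's `steal_top` yields 1. Because the two sides work at opposite ends, they would only touch the same entry when the deque is nearly empty; a concurrent implementation needs atomic operations that this sketch leaves out.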

  22. General Graphs: Issues
      • Model can accommodate ‘speculative execution’
        • Launch many different ‘approaches’
        • Abort the others as soon as one satisfactory one emerges.
      • Discover dependencies, make up schedule at run-time
        • Usually less efficient than the case of known dependencies
      • Map-Reduce absorbs many cases that would otherwise be general
      • On-line scheduling: complicated
      • Not a good fit if a more specific pattern applies
      • Good if inputs/outputs/functions are (somewhat) heavy-weight
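
A minimal sketch of making up a schedule at run time: keep a count of unmet inputs per task and, in each wave, run every task whose count has reached zero. The edge list below transcribes the dependency graph from the General Dependency Graphs slide (B through R, with the input A omitted). The wave-by-wave model, the unit task cost, and the names `edges` and `schedule_waves` are assumptions for illustration; a real on-line scheduler would release tasks as soon as their inputs finish rather than in lockstep.

```c
#include <assert.h>

enum { TASK_B, TASK_C, TASK_E, TASK_F, TASK_G, TASK_P, TASK_Q, TASK_R,
       NUM_TASKS };

/* Each edge is {producer, consumer}. */
static const int edges[][2] = {
    {TASK_B, TASK_C}, {TASK_B, TASK_P}, {TASK_B, TASK_Q},
    {TASK_C, TASK_E}, {TASK_C, TASK_F},
    {TASK_E, TASK_G}, {TASK_F, TASK_G},
    {TASK_G, TASK_R}, {TASK_P, TASK_R}, {TASK_Q, TASK_R},
};
enum { NUM_EDGES = sizeof edges / sizeof edges[0] };

/* Fill wave_of[t] with the 1-based wave in which task t runs and return
 * the number of waves, or -1 if the graph contains a cycle. */
int schedule_waves(int wave_of[NUM_TASKS])
{
    int unmet[NUM_TASKS] = {0};
    for (int e = 0; e < NUM_EDGES; e++) unmet[edges[e][1]]++;
    for (int t = 0; t < NUM_TASKS; t++) wave_of[t] = 0;   /* 0 = not run yet */

    int remaining = NUM_TASKS, wave = 0;
    while (remaining > 0) {
        int ready[NUM_TASKS], nready = 0;
        /* Collect first, so tasks released in this wave wait for the next. */
        for (int t = 0; t < NUM_TASKS; t++)
            if (wave_of[t] == 0 && unmet[t] == 0) ready[nready++] = t;
        if (nready == 0) return -1;           /* nothing runnable: cycle */

        wave++;
        for (int i = 0; i < nready; i++) {
            int t = ready[i];
            wave_of[t] = wave;                /* "run" the task */
            remaining--;
            for (int e = 0; e < NUM_EDGES; e++)
                if (edges[e][0] == t) unmet[edges[e][1]]--;   /* release consumers */
        }
    }
    return wave;
}
```

For this graph the sketch yields five waves: B; then C, P, Q; then E, F; then G; then R. The repeated scans over all tasks and edges hint at the bookkeeping cost that makes run-time scheduling less efficient than a schedule fixed in advance.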

  23. Questions?
