Lecture 14: Cilk. Shankar Balachandran (bshankar@ee.iitb.ac.in)

  1. Lecture 14: Cilk. Shankar Balachandran (bshankar@ee.iitb.ac.in). The lecture is partly based on Charles Leiserson’s slides on Cilk. EE717

  2. Cilk: a C language for programming dynamic multithreaded applications on shared-memory multiprocessors. Example applications:
     ● virus shell assembly
     ● dense and sparse matrix computations
     ● graphics rendering
     ● friction-stir welding
     ● n-body simulation
     ● heuristic search
     ● artificial evolution

  3. Shared-Memory Multiprocessor
     [Figure: several processors (P), each with its own cache, connected by a network to shared memory and I/O.]
     In particular, over the next decade, chip multiprocessors (CMPs) will be an increasingly important platform!

  4. Cilk Is Simple
     • Cilk extends the C language with just a handful of keywords.
     • Every Cilk program has a serial semantics.
     • Not only is Cilk fast, it provides performance guarantees based on performance abstractions.
     • Cilk is processor-oblivious.
     • Cilk’s provably good runtime system automatically manages low-level aspects of parallel execution, including protocols, load balancing, and scheduling.
     • Cilk supports speculative parallelism.

  5. Fibonacci

     C elision:

         int fib (int n) {
           if (n<2) return (n);
           else {
             int x,y;
             x = fib(n-1);
             y = fib(n-2);
             return (x+y);
           }
         }

     Cilk code:

         cilk int fib (int n) {
           if (n<2) return (n);
           else {
             int x,y;
             x = spawn fib(n-1);
             y = spawn fib(n-2);
             sync;
             return (x+y);
           }
         }

     Cilk is a faithful extension of C. A Cilk program’s serial elision is always a legal implementation of Cilk semantics. Cilk provides no new data types.
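The serial elision can be made concrete with the C preprocessor. The sketch below assumes a plain C compiler with no Cilk support and erases the three keywords with empty macros, so the Cilk source from the slide compiles as its C elision. The macro definitions and the `assert` guard are illustrations added here, not part of the slides.

```c
#include <assert.h>

/* Assumption: no Cilk compiler is available, so the three Cilk keywords
   are erased by the preprocessor. What remains is the serial elision. */
#define cilk
#define spawn
#define sync

cilk int fib(int n) {
    assert(n >= 0);            /* this sketch only handles n >= 0 */
    if (n < 2) return n;
    else {
        int x, y;
        x = spawn fib(n - 1);  /* becomes: x = fib(n - 1); */
        y = spawn fib(n - 2);  /* becomes: y = fib(n - 2); */
        sync;                  /* becomes an empty statement */
        return x + y;
    }
}

/* Undo the macros so they do not leak into later code. */
#undef cilk
#undef spawn
#undef sync
```

Calling `fib(10)` on this elided version returns 55, the same value the parallel Cilk procedure would compute.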

  6. Basic Cilk Keywords

         cilk int fib (int n) {
           if (n<2) return (n);
           else {
             int x,y;
             x = spawn fib(n-1);
             y = spawn fib(n-2);
             sync;
             return (x+y);
           }
         }

     • cilk: identifies a function as a Cilk procedure, capable of being spawned in parallel.
     • spawn: the named child Cilk procedure can execute in parallel with the parent caller.
     • sync: control cannot pass this point until all spawned children have returned.

  7. Dynamic Multithreading

         cilk int fib (int n) {
           if (n<2) return (n);
           else {
             int x,y;
             x = spawn fib(n-1);
             y = spawn fib(n-2);
             sync;
             return (x+y);
           }
         }

     Example: fib(4)
     [Figure: the tree of spawned calls for fib(4): 4 spawns 3 and 2; 3 spawns 2 and 1; each 2 spawns 1 and 0.]
     The computation dag unfolds dynamically. “Processor oblivious.”

  8. Multithreaded Computation
     [Figure: a computation dag with the initial thread, the final thread, and the continue, return, and spawn edges labeled.]
     • The dag $G = (V, E)$ represents a parallel instruction stream.
     • Each vertex $v \in V$ represents a (Cilk) thread: a maximal sequence of instructions not containing parallel control (spawn, sync, return).
     • Every edge $e \in E$ is either a spawn edge, a return edge, or a continue edge.

  9. Lecture 14: Performance Measures

  10. Algorithmic Complexity Measures
      $T_P$ = execution time on $P$ processors

  11. Algorithmic Complexity Measures
      $T_P$ = execution time on $P$ processors
      $T_1$ = work

  12. Algorithmic Complexity Measures
      $T_P$ = execution time on $P$ processors
      $T_1$ = work
      $T_\infty$ = span*
      * Also called critical-path length or computational depth.

  13. Algorithmic Complexity Measures
      $T_P$ = execution time on $P$ processors
      $T_1$ = work
      $T_\infty$ = span*
      Lower bounds:
      • $T_P \ge T_1 / P$
      • $T_P \ge T_\infty$
      * Also called critical-path length or computational depth.

  14. Speedup
      Definition: $T_1 / T_P$ = speedup on $P$ processors.
      If $T_1 / T_P$
      • $= \Theta(P) \le P$, we have linear speedup;
      • $= P$, we have perfect linear speedup;
      • $> P$, we have superlinear speedup, which is not possible in our model, because of the lower bound $T_P \ge T_1 / P$.

  15. Parallelism
      Because we have the lower bound $T_P \ge T_\infty$, the maximum possible speedup given $T_1$ and $T_\infty$ is $T_1 / T_\infty$ = parallelism = the average amount of work per step along the span.
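The two lower bounds and the definition of parallelism combine into a simple calculator. The following is a minimal sketch; the function names `tp_lower_bound`, `parallelism`, and `max_speedup` are hypothetical helpers introduced for illustration, and the numbers in the usage note are made up.

```c
#include <assert.h>

/* Lower bound on T_P from the two laws on the slides:
   T_P >= T_1 / P and T_P >= T_inf. */
double tp_lower_bound(double t1, double tinf, int p) {
    assert(p > 0 && t1 > 0.0 && tinf > 0.0);
    double work_bound = t1 / p;
    return work_bound > tinf ? work_bound : tinf;
}

/* Parallelism T_1 / T_inf: the maximum possible speedup. */
double parallelism(double t1, double tinf) {
    assert(t1 > 0.0 && tinf > 0.0);
    return t1 / tinf;
}

/* Upper bound on the speedup T_1 / T_P on p processors,
   which works out to min(P, T_1 / T_inf). */
double max_speedup(double t1, double tinf, int p) {
    return t1 / tp_lower_bound(t1, tinf, p);
}
```

For a hypothetical computation with work 100 and span 10, `max_speedup(100, 10, 4)` is 4, while `max_speedup(100, 10, 20)` is capped at the parallelism, 10: beyond that point extra processors cannot help.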

  16. Example: fib(4)
      [Figure: the computation dag for fib(4), with the threads on the critical path numbered 1 to 8.]
      Assume for simplicity that each Cilk thread in fib() takes unit time to execute.
      Work: $T_1 = 17$
      Span: $T_\infty = 8$

  17. Example: fib(4)
      Assume for simplicity that each Cilk thread in fib() takes unit time to execute.
      Work: $T_1 = 17$
      Span: $T_\infty = 8$
      Parallelism: $T_1 / T_\infty = 2.125$
      Using many more than 2 processors makes little sense.
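The figures 17 and 8 can be reproduced with two short recurrences. The sketch below assumes the unit-time model from the slide and, as an added assumption, that each call with n ≥ 2 splits into three threads (up to the first spawn, up to the second spawn, and after the sync) while each base case is a single thread. The names `fib_work` and `fib_span` are hypothetical.

```c
#include <assert.h>

/* Assumed cost model: every Cilk thread takes unit time.
   A call with n < 2 is one thread; a call with n >= 2 is three threads:
   up to the first spawn, up to the second spawn, and after the sync. */

/* Work: total number of threads in the dag of fib(n). */
long fib_work(int n) {
    assert(n >= 0);
    if (n < 2) return 1;
    return 3 + fib_work(n - 1) + fib_work(n - 2);
}

/* Span: number of threads on the longest path through the dag of fib(n). */
long fib_span(int n) {
    assert(n >= 0);
    if (n < 2) return 1;
    long via_first  = 2 + fib_span(n - 1); /* first thread, child, final thread */
    long via_second = 3 + fib_span(n - 2); /* first two threads, child, final thread */
    return via_first > via_second ? via_first : via_second;
}
```

Under these assumptions `fib_work(4)` is 17 and `fib_span(4)` is 8, which gives the parallelism 17/8 = 2.125 quoted on the slide.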

  18. Lesson: Work and span can predict performance on large machines better than running times on small machines can.

  19. Suggested Reading: “Amdahl's Law in the Multicore Era,” Mark D. Hill and Michael R. Marty, IEEE Computer, July 2008.

  20. Parallelizing Vector Addition

      C:

          void vadd (real *A, real *B, int n){
            int i; for (i=0; i<n; i++) A[i]+=B[i];
          }

  21. Parallelizing Vector Addition

      C (loop):

          void vadd (real *A, real *B, int n){
            int i; for (i=0; i<n; i++) A[i]+=B[i];
          }

      C (recursive):

          void vadd (real *A, real *B, int n){
            if (n<=BASE) {
              int i; for (i=0; i<n; i++) A[i]+=B[i];
            } else {
              vadd (A, B, n/2);
              vadd (A+n/2, B+n/2, n-n/2);
            }
          }

      Parallelization strategy:
      1. Convert loops to recursion.

  22. Parallelizing Vector Addition

      C:

          void vadd (real *A, real *B, int n){
            int i; for (i=0; i<n; i++) A[i]+=B[i];
          }

      Cilk:

          cilk void vadd (real *A, real *B, int n){
            if (n<=BASE) {
              int i; for (i=0; i<n; i++) A[i]+=B[i];
            } else {
              spawn vadd (A, B, n/2);
              spawn vadd (A+n/2, B+n/2, n-n/2);
              sync;
            }
          }

      Parallelization strategy:
      1. Convert loops to recursion.
      2. Insert Cilk keywords.
      Side benefit: D&C is generally good for caches!

  23. Vector Addition

          cilk void vadd (real *A, real *B, int n){
            if (n<=BASE) {
              int i; for (i=0; i<n; i++) A[i]+=B[i];
            } else {
              spawn vadd (A, B, n/2);
              spawn vadd (A+n/2, B+n/2, n-n/2);
              sync;
            }
          }
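To try the divide-and-conquer version without a Cilk compiler, its serial elision can be compiled as ordinary C. This is a sketch under two assumptions that the slides leave open: `real` is taken to be `double`, and `BASE` is set to a small illustrative cutoff of 4.

```c
#include <assert.h>

typedef double real;   /* assumption: the slides do not define `real` */
#define BASE 4         /* assumption: small cutoff chosen for illustration */

/* Serial elision of the recursive vadd: A[i] += B[i] for 0 <= i < n. */
void vadd(real *A, real *B, int n) {
    assert(n >= 0);
    if (n <= BASE) {
        int i;
        for (i = 0; i < n; i++) A[i] += B[i];
    } else {
        /* In Cilk, both calls would be spawned and followed by a sync.
           They touch disjoint halves, so they are independent. */
        vadd(A, B, n / 2);
        vadd(A + n / 2, B + n / 2, n - n / 2);
    }
}
```

Because the second call receives `n - n/2` elements, odd lengths are handled without dropping the last element.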

  24. Vector Addition Analysis
      [Figure: the recursion tree, with leaves of size BASE.]
      To add two vectors of length $n$, where BASE = $\Theta(1)$:
      Work: $T_1 = \Theta(n)$
      Span: $T_\infty = \Theta(\lg n)$
      Parallelism: $T_1 / T_\infty = \Theta(n / \lg n)$
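These bounds follow from two recurrences. As a sketch, assuming BASE = Θ(1) and constant cost for each spawn and sync: the work adds up both halves, while the span counts only the larger half, because the two spawned calls can run in parallel.

```latex
\begin{align*}
T_1(n) &= T_1(\lfloor n/2 \rfloor) + T_1(\lceil n/2 \rceil) + \Theta(1) = \Theta(n), \\
T_\infty(n) &= \max\{\,T_\infty(\lfloor n/2 \rfloor),\; T_\infty(\lceil n/2 \rceil)\,\} + \Theta(1) = \Theta(\lg n), \\
\frac{T_1(n)}{T_\infty(n)} &= \Theta\!\left(\frac{n}{\lg n}\right).
\end{align*}
```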

  25. Another Parallelization

      C:

          void vadd1 (real *A, real *B, int n){
            int i; for (i=0; i<n; i++) A[i]+=B[i];
          }
          void vadd (real *A, real *B, int n){
            int j; for (j=0; j<n; j+=BASE) {
              vadd1(A+j, B+j, min(BASE, n-j));
            }
          }

      Cilk:

          cilk void vadd1 (real *A, real *B, int n){
            int i; for (i=0; i<n; i++) A[i]+=B[i];
          }
          cilk void vadd (real *A, real *B, int n){
            int j; for (j=0; j<n; j+=BASE) {
              spawn vadd1(A+j, B+j, min(BASE, n-j));
            }
            sync;
          }

  26. Analysis
      [Figure: the vector divided into chunks of size BASE.]
      To add two vectors of length $n$, where BASE = $\Theta(1)$:
      Work: $T_1 = \Theta(n)$
      Span: $T_\infty = \Theta(n)$
      Parallelism: $T_1 / T_\infty = \Theta(1)$. PUNY!
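The gap between the two strategies shows up in a small span calculator. This is a sketch under an assumed cost model (one time unit per element added, per recursive split, and per loop iteration that spawns a chunk); the function names `span_recursive` and `span_loop` are hypothetical.

```c
#include <assert.h>

/* Assumed cost model: one time unit per element added, per recursive
   split, and per loop iteration that spawns a chunk. */

/* Span of the divide-and-conquer vadd: the two halves run in parallel,
   so only the longer half counts. */
long span_recursive(long n, long base) {
    assert(n >= 0 && base >= 1);
    if (n <= base) return n;
    long left = span_recursive(n / 2, base);
    long right = span_recursive(n - n / 2, base);
    return 1 + (left > right ? left : right);
}

/* Span of the loop-based vadd: iteration k spawns chunk k at time k + 1,
   and the sync waits for the chunk that finishes last. */
long span_loop(long n, long base) {
    assert(n >= 0 && base >= 1);
    long span = 0, k = 0, j;
    for (j = 0; j < n; j += base, k++) {
        long len = (n - j < base) ? n - j : base;
        long finish = (k + 1) + len;
        if (finish > span) span = finish;
    }
    return span;
}
```

With n = 1024 and a base of 4, the recursive version has a span of 12 under this model, while the loop version has a span of 260: the loop issues its spawns one after another, so its span grows linearly with n.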
