CS 240A : Divide-and-Conquer with Cilk++


  1. CS 240A : Divide-and-Conquer with Cilk++ • Divide & Conquer Paradigm • Solving recurrences • Sorting: Quicksort and Mergesort. Thanks to Charles E. Leiserson for some of these slides.

  2. Work and Span (Recap) • T_P = execution time on P processors • T_1 = work • T_∞ = span • Speedup on P processors = T_1/T_P • Potential parallelism = T_1/T_∞

  3. Sorting ∙ Sorting is possibly the most frequently executed operation in computing! ∙ Quicksort is the fastest sorting algorithm in practice, with an average running time of O(N log N) (but O(N²) worst-case performance) ∙ Mergesort has worst-case performance of O(N log N) for sorting N elements ∙ Both are based on the recursive divide-and-conquer paradigm

  4. QUICKSORT ∙ Basic Quicksort sorting an array S works as follows: § If the number of elements in S is 0 or 1, then return. § Pick any element v in S. Call this the pivot. § Partition the set S−{v} into two disjoint groups: ♦ S_1 = {x ∈ S−{v} | x ≤ v} ♦ S_2 = {x ∈ S−{v} | x ≥ v} § Return quicksort(S_1) followed by v followed by quicksort(S_2)
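The recipe above can be sketched as a short serial routine. This is a minimal sketch, not the slides' code: the use of std::partition, the three-way split around equal keys, and the helper names quicksort/quicksortOK are my illustrative choices.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Minimal serial quicksort following the slide's recipe:
// pick a pivot, partition around it, recurse on both sides.
void quicksort(std::vector<int>& s, int lo, int hi) {
    if (hi - lo <= 1) return;          // 0 or 1 elements: done
    int pivot = s[lo];                 // pick any element v as the pivot
    // Partition into: x < pivot, then x == pivot, then x > pivot.
    auto mid1 = std::partition(s.begin() + lo, s.begin() + hi,
                               [&](int x) { return x < pivot; });
    auto mid2 = std::partition(mid1, s.begin() + hi,
                               [&](int x) { return x == pivot; });
    quicksort(s, lo, (int)(mid1 - s.begin()));   // sort S_1
    quicksort(s, (int)(mid2 - s.begin()), hi);   // sort S_2
}

// Self-check on the slide's example array.
bool quicksortOK() {
    std::vector<int> v{14, 13, 45, 56, 34, 31, 32, 21, 78};
    quicksort(v, 0, (int)v.size());
    return std::is_sorted(v.begin(), v.end()) && v.front() == 13 && v.back() == 78;
}
```

The equal-keys middle group keeps the recursion strictly shrinking even when the pivot is the minimum element.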

  5. QUICKSORT (figure: select a pivot in the array 14 13 45 56 34 31 32 21 78)

  6. QUICKSORT (figure: partition the array around the pivot)

  7. QUICKSORT (figure: quicksort the two sides recursively, yielding 13 14 21 31 32 34 45 56 78)

  8. Parallelizing Quicksort ∙ Serial Quicksort sorts an array S as follows: § If the number of elements in S is 0 or 1, then return. § Pick any element v in S. Call this the pivot. § Partition the set S−{v} into two disjoint groups: ♦ S_1 = {x ∈ S−{v} | x ≤ v} ♦ S_2 = {x ∈ S−{v} | x ≥ v} § Return quicksort(S_1) followed by v followed by quicksort(S_2)

  9. Parallel Quicksort (Basic) • The second recursive call to qsort does not depend on the results of the first recursive call. • We have an opportunity to speed up the call by making both calls in parallel.

  template <typename T>
  void qsort(T begin, T end) {
    if (begin != end) {
      T middle = partition(
        begin, end,
        bind2nd(less<typename iterator_traits<T>::value_type>(), *begin));
      cilk_spawn qsort(begin, middle);
      qsort(max(begin + 1, middle), end);
      cilk_sync;
    }
  }

  10. Performance
  ∙ ./qsort 500000 -cilk_set_worker_count 1 >> 0.083 seconds
  ∙ ./qsort 500000 -cilk_set_worker_count 16 >> 0.014 seconds
  ∙ Speedup = T_1/T_16 = 0.083/0.014 = 5.93
  ∙ ./qsort 50000000 -cilk_set_worker_count 1 >> 10.57 seconds
  ∙ ./qsort 50000000 -cilk_set_worker_count 16 >> 1.58 seconds
  ∙ Speedup = T_1/T_16 = 10.57/1.58 = 6.67

  11. Measure Work/Span Empirically

  workspan ws;
  ws.start();
  sample_qsort(a, a + n);
  ws.stop();
  ws.report(std::cout);

  ∙ cilkscreen -w ./qsort 50000000
  Work = 21593799861
  Span = 1261403043
  Burdened span = 1261600249
  Parallelism = 17.1189
  Burdened parallelism = 17.1162
  #Spawn = 50000000
  #Atomic instructions = 14

  ∙ cilkscreen -w ./qsort 500000
  Work = 178835973
  Span = 14378443
  Burdened span = 14525767
  Parallelism = 12.4378
  Burdened parallelism = 12.3116
  #Spawn = 500000
  #Atomic instructions = 8

  12. Analyzing Quicksort (figure: quicksort 56 13 31 21 45 34 32 14 78 recursively, yielding 13 14 21 31 32 34 45 56 78) Assume we have a “great” partitioner that always generates two balanced sets.

  13. Analyzing Quicksort ∙ Work: T_1(n) = 2 T_1(n/2) + Θ(n); unrolling, 2 T_1(n/2) = 4 T_1(n/4) + 2 Θ(n/2), …, (n/2) T_1(2) = n T_1(1) + (n/2) Θ(2); summing all levels gives T_1(n) = Θ(n lg n) ∙ Span recurrence: T_∞(n) = T_∞(n/2) + Θ(n), which solves to T_∞(n) = Θ(n)
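Spelled out, the unrolling is the usual recursion-tree sum (standard algebra, consistent with the slide's Θ(n lg n) work and Θ(n) span):

```latex
T_1(n) = 2\,T_1(n/2) + \Theta(n)
       = 4\,T_1(n/4) + 2\,\Theta(n/2) + \Theta(n)
       = \cdots
       = n\,T_1(1) + \underbrace{\Theta(n) + \cdots + \Theta(n)}_{\lg n\ \text{levels}}
       = \Theta(n \lg n),
\qquad
T_\infty(n) = T_\infty(n/2) + \Theta(n)
            = \Theta\!\left(n + \tfrac{n}{2} + \tfrac{n}{4} + \cdots\right)
            = \Theta(n).
```

Each level of the work recursion contributes Θ(n) and there are lg n levels; the span sum is geometric, so the top level dominates.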

  14. Analyzing Quicksort ∙ Parallelism: T_1(n)/T_∞(n) = Θ(lg n). Not much! ∙ Indeed, partitioning (i.e., constructing the array S_1 = {x ∈ S−{v} | x ≤ v}) can be accomplished in parallel in time Θ(lg n) ∙ Which gives a span T_∞(n) = Θ(lg² n) ∙ And parallelism Θ(n/lg n). Way better! ∙ Basic parallel qsort can be found in CLRS
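With the parallel partitioner, the span recurrence changes accordingly; a quick check of the slide's numbers:

```latex
T_\infty(n) = T_\infty(n/2) + \Theta(\lg n)
            = \Theta\!\left(\lg n + \lg\tfrac{n}{2} + \cdots + \lg 2\right)
            = \Theta(\lg^2 n),
\qquad
\text{parallelism} = \frac{T_1(n)}{T_\infty(n)}
                   = \frac{\Theta(n \lg n)}{\Theta(\lg^2 n)}
                   = \Theta\!\left(\frac{n}{\lg n}\right).
```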

  15. The Master Method. The Master Method for solving recurrences applies to recurrences of the form* T(n) = a T(n/b) + f(n), where a ≥ 1, b > 1, and f is asymptotically positive. IDEA: Compare n^(log_b a) with f(n). *The unstated base case is T(n) = Θ(1) for sufficiently small n.

  16. Master Method — CASE 1: T(n) = a T(n/b) + f(n) with n^(log_b a) ≫ f(n). Specifically, f(n) = O(n^(log_b a − ε)) for some constant ε > 0. Solution: T(n) = Θ(n^(log_b a)). E.g. matrix mult: a = 8, b = 2, f(n) = n² ⇒ T_1(n) = Θ(n³)

  17. Master Method — CASE 2: T(n) = a T(n/b) + f(n) with n^(log_b a) ≈ f(n). Specifically, f(n) = Θ(n^(log_b a) lg^k n) for some constant k ≥ 0. Solution: T(n) = Θ(n^(log_b a) lg^(k+1) n). E.g. qsort: a = 2, b = 2, k = 0 ⇒ T_1(n) = Θ(n lg n)

  18. Master Method — CASE 3: T(n) = a T(n/b) + f(n) with n^(log_b a) ≪ f(n). Specifically, f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0, and f(n) satisfies the regularity condition that a f(n/b) ≤ c f(n) for some constant c < 1. Solution: T(n) = Θ(f(n)). E.g.: span of qsort

  19. Master Method Summary: T(n) = a T(n/b) + f(n)
  CASE 1: f(n) = O(n^(log_b a − ε)), constant ε > 0 ⇒ T(n) = Θ(n^(log_b a)).
  CASE 2: f(n) = Θ(n^(log_b a) lg^k n), constant k ≥ 0 ⇒ T(n) = Θ(n^(log_b a) lg^(k+1) n).
  CASE 3: f(n) = Ω(n^(log_b a + ε)), constant ε > 0, and regularity condition ⇒ T(n) = Θ(f(n)).
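As a quick sanity check of the three cases, here is a small illustrative classifier. The function name masterCase and the restriction to driving functions of the exact polylog form f(n) = Θ(n^c lg^k n) are my assumptions; the real theorem also requires Case 3's regularity condition, which this sketch does not verify.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Classify T(n) = a T(n/b) + Theta(n^c lg^k n) by comparing c
// against the critical exponent log_b a (illustrative only).
std::string masterCase(double a, double b, double c, int k) {
    double crit = std::log(a) / std::log(b);   // log_b a
    const double eps = 1e-9;
    if (c < crit - eps) return "Case 1: Theta(n^log_b a)";
    if (std::fabs(c - crit) <= eps)
        return "Case 2: Theta(n^log_b a lg^" + std::to_string(k + 1) + " n)";
    return "Case 3: Theta(f(n)), if regularity holds";
}
```

Running it on the slides' examples: matrix multiply (a=8, b=2, f(n)=n²) falls in Case 1, quicksort's work recurrence (a=2, b=2, f(n)=n) in Case 2, and mergesort's span recurrence (a=1, b=2, f(n)=n) in Case 3.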

  20. MERGESORT ∙ Mergesort is an example of a recursive sorting algorithm. ∙ It is based on the divide-and-conquer paradigm. ∙ It uses the merge operation as its fundamental component (which takes in two sorted sequences and produces a single sorted sequence). ∙ Simulation of Mergesort ∙ Drawback of mergesort: not in-place (uses an extra temporary array)

  21. Merging Two Sorted Arrays

  template <typename T>
  void Merge(T *C, T *A, T *B, int na, int nb) {
    while (na > 0 && nb > 0) {
      if (*A <= *B) { *C++ = *A++; na--; }
      else          { *C++ = *B++; nb--; }
    }
    while (na > 0) { *C++ = *A++; na--; }
    while (nb > 0) { *C++ = *B++; nb--; }
  }

  Time to merge n elements = Θ(n). (figure: merging the sorted arrays 3 12 19 46 and 4 14 21 23)
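To check the merge on the slide's example arrays (3 12 19 46 and 4 14 21 23), here is a self-contained restatement of Merge plus a small verification helper (mergedOK is my name for the check, not part of the slides):

```cpp
#include <cassert>

// The slide's Merge: repeatedly copy the smaller head element,
// then copy whichever tail is left over.
template <typename T>
void Merge(T *C, T *A, T *B, int na, int nb) {
    while (na > 0 && nb > 0) {
        if (*A <= *B) { *C++ = *A++; na--; }
        else          { *C++ = *B++; nb--; }
    }
    while (na > 0) { *C++ = *A++; na--; }
    while (nb > 0) { *C++ = *B++; nb--; }
}

// Merge the slide's two example arrays and compare with the expected result.
bool mergedOK() {
    int A[] = {3, 12, 19, 46};
    int B[] = {4, 14, 21, 23};
    int C[8];
    Merge(C, A, B, 4, 4);
    int want[8] = {3, 4, 12, 14, 19, 21, 23, 46};
    for (int i = 0; i < 8; ++i)
        if (C[i] != want[i]) return false;
    return true;
}
```

Every element is copied exactly once, which is why merging n elements takes Θ(n) time.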

  22. Parallel Merge Sort. A: input (unsorted); B: output (sorted); C: temporary.

  template <typename T>
  void MergeSort(T *B, T *A, int n) {
    if (n == 1) {
      B[0] = A[0];
    } else {
      T* C = new T[n];
      cilk_spawn MergeSort(C, A, n/2);
      MergeSort(C + n/2, A + n/2, n - n/2);
      cilk_sync;
      Merge(B, C, C + n/2, n/2, n - n/2);
      delete[] C;
    }
  }

  (figure: merge tree combining 19 3 12 46 33 4 21 14 pairwise up to 3 4 12 14 19 21 33 46)

  23. Work of Merge Sort

  template <typename T>
  void MergeSort(T *B, T *A, int n) {
    if (n == 1) {
      B[0] = A[0];
    } else {
      T* C = new T[n];
      cilk_spawn MergeSort(C, A, n/2);
      MergeSort(C + n/2, A + n/2, n - n/2);
      cilk_sync;
      Merge(B, C, C + n/2, n/2, n - n/2);
      delete[] C;
    }
  }

  CASE 2: n^(log_b a) = n^(log_2 2) = n, and f(n) = Θ(n^(log_b a) lg^0 n).
  Work: T_1(n) = 2 T_1(n/2) + Θ(n) = Θ(n lg n)

  24. Span of Merge Sort

  template <typename T>
  void MergeSort(T *B, T *A, int n) {
    if (n == 1) {
      B[0] = A[0];
    } else {
      T* C = new T[n];
      cilk_spawn MergeSort(C, A, n/2);
      MergeSort(C + n/2, A + n/2, n - n/2);
      cilk_sync;
      Merge(B, C, C + n/2, n/2, n - n/2);
      delete[] C;
    }
  }

  CASE 3: n^(log_b a) = n^(log_2 1) = 1, and f(n) = Θ(n).
  Span: T_∞(n) = T_∞(n/2) + Θ(n) = Θ(n)

  25. Parallelism of Merge Sort
  Work: T_1(n) = Θ(n lg n)
  Span: T_∞(n) = Θ(n)
  Parallelism: T_1(n)/T_∞(n) = Θ(lg n)
  We need to parallelize the merge!

  26. Parallel Merge (figure: with na ≥ nb, set ma = na/2; binary-search B for A[ma] to find mb; recursively P_Merge the elements ≤ A[ma] and, separately, the elements ≥ A[ma]; each recursive merge throws away at least na/2 ≥ n/4 elements)
  KEY IDEA: If the total number of elements to be merged in the two arrays is n = na + nb, the total number of elements in the larger of the two recursive merges is at most (3/4) n.
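The (3/4)n bound follows from na ≥ nb (so na ≥ n/2) and the split at ma = ⌊na/2⌋: each recursive merge excludes at least na/2 ≥ n/4 elements of the total.

```latex
n_a \ge n_b \;\Rightarrow\; n_a \ge \tfrac{n}{2}, \qquad m_a = \lfloor n_a/2 \rfloor,
```
```latex
\underbrace{m_a + m_b}_{\text{merge of the} \le A[m_a] \text{ parts}}
  \le n - (n_a - m_a) \le n - \tfrac{n_a}{2} \le \tfrac{3}{4}n,
\qquad
\underbrace{(n_a - m_a - 1) + (n_b - m_b)}_{\text{merge of the} \ge A[m_a] \text{ parts}}
  \le n - (m_a + 1) \le \tfrac{3}{4}n.
```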

  27. Parallel Merge

  template <typename T>
  void P_Merge(T *C, T *A, T *B, int na, int nb) {
    if (na < nb) {
      P_Merge(C, B, A, nb, na);
    } else if (na == 0) {
      return;
    } else {
      int ma = na/2;
      int mb = BinarySearch(A[ma], B, nb);
      C[ma+mb] = A[ma];
      cilk_spawn P_Merge(C, A, B, ma, mb);
      P_Merge(C+ma+mb+1, A+ma+1, B+mb, na-ma-1, nb-mb);
      cilk_sync;
    }
  }

  Coarsen base cases for efficiency.
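P_Merge relies on a BinarySearch helper that the slides never show. A plausible version, and an assumption on my part: it returns how many elements of the sorted array B are less than the key (the lower-bound index), so that B[0..mb-1] precede A[ma] and B[mb..] follow it in the output.

```cpp
#include <cassert>

// Assumed semantics: return the first index i with B[i] >= key
// (or nb if no such index), i.e. the count of elements < key.
template <typename T>
int BinarySearch(const T& key, const T* B, int nb) {
    int lo = 0, hi = nb;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (B[mid] < key) lo = mid + 1;   // B[mid] belongs left of key
        else              hi = mid;
    }
    return lo;
}

// Sorted sample array for a quick check (from the slides' merge example).
const int Bdemo[] = {4, 14, 21, 23};
```

With these semantics, writing A[ma] at position C[ma+mb] places the pivot element exactly where it belongs in the merged output.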

  28. Span of Parallel Merge

  template <typename T>
  void P_Merge(T *C, T *A, T *B, int na, int nb) {
    if (na < nb) { ⋮
      int mb = BinarySearch(A[ma], B, nb);
      C[ma+mb] = A[ma];
      cilk_spawn P_Merge(C, A, B, ma, mb);
      P_Merge(C+ma+mb+1, A+ma+1, B+mb, na-ma-1, nb-mb);
      cilk_sync;
    }
  }

  CASE 2: n^(log_b a) = n^(log_{4/3} 1) = 1, and f(n) = Θ(n^(log_b a) lg^1 n).
  Span: T_∞(n) = T_∞(3n/4) + Θ(lg n) = Θ(lg² n)
