CSL 860: Modern Parallel Computation
PARALLEL ALGORITHM TECHNIQUES: BALANCED BINARY TREE
Reduction
- n operands => log n steps
- Total work = O(n)
- n/2^i processors needed at step i
- How do you map?
Balanced binary tree technique
[Figure: binary reduction tree over a sample array; each internal node holds the combination of its two children]
Reduction
- n operands => log n steps
- Only p processors available
- Agglomerate and map (see the sketch below)
[Figure: the reduction tree agglomerated into p subtrees, one subtree per processor]
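To make the mapping concrete, here is a minimal OpenMP sketch (all names illustrative): each of p threads reduces its own agglomerated block sequentially, and the p partial results are then combined in log p balanced-tree steps.

#include <omp.h>
#include <vector>
#include <cstdio>

// Reduce n operands with p processors: each processor first reduces its
// own agglomerated block sequentially, then the p partial results are
// combined in a log p-step balanced binary tree.
long long tree_reduce(const std::vector<long long>& a, int p) {
    int n = (int)a.size();
    std::vector<long long> partial(p, 0);
    #pragma omp parallel for num_threads(p)
    for (int t = 0; t < p; ++t) {                 // one block per processor
        int lo = (int)((long long)n * t / p);
        int hi = (int)((long long)n * (t + 1) / p);
        for (int i = lo; i < hi; ++i) partial[t] += a[i];
    }
    for (int d = 1; d < p; d *= 2) {              // log p tree steps
        #pragma omp parallel for num_threads(p)
        for (int t = 0; t < p; t += 2 * d)
            if (t + d < p) partial[t] += partial[t + d];
    }
    return partial[0];
}

int main() {
    std::vector<long long> a(1000);
    for (int i = 0; i < 1000; ++i) a[i] = i + 1;
    std::printf("%lld\n", tree_reduce(a, 8));     // 500500
}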
Processor dependence: Binomial tree
Binomial Tree
- B0: a single node (the root)
- Bk: a root with k binomial subtrees B0, B1, ..., Bk-1
[Figure: binomial trees B0, B1, B2, B3]
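The dependence pattern of the pairwise reduction, viewed from the processors, is exactly a binomial tree: in round k, each surviving processor absorbs the partial result of the processor 2^k positions away, so processor 0's dependencies form Bk after k rounds. A small serial sketch (names illustrative) that prints this combine schedule:

#include <cstdio>

// Combine schedule of a binomial-tree reduction over p processors:
// in round k, processor i (low k+1 index bits zero) absorbs the result
// of processor i + 2^k. Processor 0 ends up as the root.
void combine_schedule(int p) {
    for (int k = 0; (1 << k) < p; ++k) {
        std::printf("round %d:", k);
        for (int i = 0; i + (1 << k) < p; i += 1 << (k + 1))
            std::printf("  %d <- %d", i, i + (1 << k));
        std::printf("\n");
    }
}

int main() {
    combine_schedule(8);
    // round 0:  0 <- 1  2 <- 3  4 <- 5  6 <- 7
    // round 1:  0 <- 2  4 <- 6
    // round 2:  0 <- 4
}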
Prefix Sums
- P[0] = A[0]
- For i = 1 to n-1
– P[i] = P[i-1] + A[i]
Recursive Prefix Sums
prefixSums(s, x, 0:n) {
  if (n == 1) { s[0] = x[0]; return }
  parallel for i in 0:n/2
    y[i] = op(x[2*i], x[2*i+1])
  prefixSums(z, y, 0:n/2)
  s[0] = x[0]
  parallel for i in 1:n
    if (i & 1) s[i] = z[i/2]
    else s[i] = op(z[i/2-1], x[i])
}
Or, if op is invertible, even positions can instead use s[i] = op^-1(z[i/2], x[i+1]), dividing the extra element back out.
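A runnable C++ rendering of the recursion, as a minimal sketch: op is fixed to integer addition, n is assumed a power of two, and the parallel fors are shown as plain loops.

#include <vector>
#include <cstdio>

// Recursive prefix sums, op = '+': combine pairs, recurse on the
// half-size array, then expand back. Assumes n is a power of two.
std::vector<long long> prefix_sums_rec(const std::vector<long long>& x) {
    size_t n = x.size();
    if (n == 1) return {x[0]};
    std::vector<long long> y(n / 2);
    for (size_t i = 0; i < n / 2; ++i)             // "parallel for": pairwise combine
        y[i] = x[2 * i] + x[2 * i + 1];
    std::vector<long long> z = prefix_sums_rec(y); // half-size recursion
    std::vector<long long> s(n);
    s[0] = x[0];
    for (size_t i = 1; i < n; ++i)                 // "parallel for": expand back
        s[i] = (i % 2) ? z[i / 2] : z[i / 2 - 1] + x[i];
    return s;
}

int main() {
    auto s = prefix_sums_rec({1, 2, 3, 4, 5, 6, 7, 8});
    for (long long v : s) std::printf("%lld ", v); // 1 3 6 10 15 21 28 36
}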
[Figure: two slides illustrating the recursive decomposition of prefix sums into ranges such as S(0:n/2], S(n/2:n], S(n/2:3n/4]]
Non-recursive Prefix Sums
- parallel for i in 0:n
– B[0][i] = A[i]
- for h in 1:log n // up sweep
– parallel for i in 0:n/2^h
- B[h][i] = B[h-1][2i] op B[h-1][2i+1]
- for h in log n:0 // down sweep
– C[h][0] = B[h][0]
– parallel for i in 1:n/2^h
- odd i: C[h][i] = C[h+1][i/2]
- even i: C[h][i] = C[h+1][i/2-1] op B[h][i]
Prefix Sums: Data flow up
[Figure: up-sweep tree from the leaves B[0][0..7] to the root B[3][0]; each B[h][i] combines B[h-1][2i] and B[h-1][2i+1]]
Prefix Sums: Data flow down
[Figure: down-sweep tree from the root C[3][0] = B[3][0] to the leaves C[0][0..7]]
Processor Mapping
[Figure: tree nodes mapped to processors P0 and P1; each processor owns a subtree]
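A runnable OpenMP sketch of the two sweeps (op fixed to +, n assumed a power of two; array names follow the slides):

#include <omp.h>
#include <vector>
#include <cstdio>

// Non-recursive prefix sums: the up sweep builds the reduction levels
// B[h]; the down sweep fills the per-level prefixes C[h]. C[0] is the
// answer. Assumes n is a power of two.
std::vector<long long> prefix_sums_sweeps(const std::vector<long long>& A) {
    int n = (int)A.size(), logn = 0;
    while ((1 << logn) < n) ++logn;
    std::vector<std::vector<long long>> B(logn + 1), C(logn + 1);
    B[0] = A;
    for (int h = 1; h <= logn; ++h) {             // up sweep
        B[h].resize(n >> h);
        #pragma omp parallel for
        for (int i = 0; i < (n >> h); ++i)
            B[h][i] = B[h - 1][2 * i] + B[h - 1][2 * i + 1];
    }
    for (int h = logn; h >= 0; --h) {             // down sweep
        C[h].resize(n >> h);
        C[h][0] = B[h][0];
        #pragma omp parallel for
        for (int i = 1; i < (n >> h); ++i)
            C[h][i] = (i % 2) ? C[h + 1][i / 2]
                              : C[h + 1][i / 2 - 1] + B[h][i];
    }
    return C[0];
}

int main() {
    auto s = prefix_sums_sweeps({1, 2, 3, 4, 5, 6, 7, 8});
    for (long long v : s) std::printf("%lld ", v); // 1 3 6 10 15 21 28 36
}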
Balanced Tree Approach
- Build a binary tree on the input
– Hierarchically divide into groups, and groups of groups, ...
- Traverse the tree upwards/downwards
- Useful to think of a “tree” network topology
– Only for algorithm design
– Later, map sub-trees to processors
PARALLEL ALGORITHM TECHNIQUES: PARTITIONING
Merge Sorted Sequences (A, B)
- Determine the rank of each element in A ∪ B
- Rank(x, A ∪ B) = Rank(x, A) + Rank(x, B)
– If A and B are each sorted, only one of the two is unknown: for x in A, Rank(x, A) is just its index
- Find Rank(A, B), and similarly Rank(B, A)
- Find each rank by binary search (see the sketch below)
- O(log n) time
- O(n log n) work
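A minimal sketch of merging by ranks (function names illustrative): each element's output position is its own index plus its rank in the other array, found by binary search. Using lower_bound for A and upper_bound for B breaks ties consistently, so the two scatter loops never collide.

#include <omp.h>
#include <vector>
#include <algorithm>
#include <cstdio>

// Merge sorted A and B: element A[i] goes to position i + Rank(A[i], B),
// and B[j] to position j + Rank(B[j], A).
std::vector<int> merge_by_ranks(const std::vector<int>& A,
                                const std::vector<int>& B) {
    std::vector<int> out(A.size() + B.size());
    #pragma omp parallel for
    for (int i = 0; i < (int)A.size(); ++i) {
        int r = (int)(std::lower_bound(B.begin(), B.end(), A[i]) - B.begin());
        out[i + r] = A[i];
    }
    #pragma omp parallel for
    for (int j = 0; j < (int)B.size(); ++j) {
        int r = (int)(std::upper_bound(A.begin(), A.end(), B[j]) - A.begin());
        out[j + r] = B[j];
    }
    return out;
}

int main() {
    auto m = merge_by_ranks({1, 3, 5, 7}, {2, 3, 6});
    for (int v : m) std::printf("%d ", v);   // 1 2 3 3 5 6 7
}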
Optimal Merge (A, B)
- Partition A and B into log n-sized blocks
- Choose from B the elements i · log n, i = 0:n/log n
- Rank each chosen element of B in A
– Binary search
- Merge pairs of sub-sequences (sketched below)
– If |Ai| = log n, merge the pair sequentially in O(log n) time
– Otherwise, partition Ai into log n-sized blocks and recursively subdivide Bi into sub-sub-sequences
- Total time is O(log n)
- Total work is O(n)
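A simplified runnable sketch of the blocking step: it ranks only B's block leaders in A and merges each resulting pair sequentially, omitting the recursive subdivision that caps |Ai|, so it illustrates the partitioning idea rather than the full O(log n)-time algorithm.

#include <omp.h>
#include <vector>
#include <algorithm>
#include <cstdio>

// Split B into blocks of size k; rank each block's first element in A.
// Pair i is (A[r[i]:r[i+1]], B[i*k:(i+1)*k]); the pairs are independent
// and their merged outputs concatenate into the final sorted array.
std::vector<int> block_merge(const std::vector<int>& A,
                             const std::vector<int>& B, int k) {
    int nb = ((int)B.size() + k - 1) / k;            // number of B blocks
    std::vector<int> r(nb + 1);
    r[0] = 0; r[nb] = (int)A.size();
    #pragma omp parallel for
    for (int i = 1; i < nb; ++i)                     // rank block leaders in A
        r[i] = (int)(std::lower_bound(A.begin(), A.end(), B[i * k]) - A.begin());
    std::vector<int> out(A.size() + B.size());
    #pragma omp parallel for
    for (int i = 0; i < nb; ++i) {                   // merge each pair independently
        int b0 = i * k, b1 = std::min((i + 1) * k, (int)B.size());
        std::merge(A.begin() + r[i], A.begin() + r[i + 1],
                   B.begin() + b0, B.begin() + b1,
                   out.begin() + r[i] + b0);         // pair i starts after r[i]+b0 items
    }
    return out;
}

int main() {
    auto m = block_merge({1, 4, 6, 9, 12}, {2, 3, 5, 8, 10, 11}, 2);
    for (int v : m) std::printf("%d ", v);   // 1 2 3 4 5 6 8 9 10 11 12
}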
Optimal Merge (A, B)
- Partition A and B into √n blocks
- Choose from B the elements i·√n, i = (0:√n]
- Rank each chosen element of B in A
– Parallel search, using √n processors per search: O(1) time each
- Recursively merge pairs of sub-sequences
– Total time: T(n) = T(√n) + O(1) = O(log log n)
– Total work: W(n) = √n·W(√n) + O(n) = O(n log log n)
- “Fast”, but the work still needs to be reduced
Optimal Merge (A, B)
- Use the fast but non-optimal algorithm on small enough subsets
- Subdivide A and B into blocks of size log log n
– A1, A2, ...
– B1, B2, ...
- Select the first element of each block
– A’ = p1, p2, ...
– B’ = q1, q2, ...
- Now merge n/log log n block pairs, each of size log log n
Optimal Merge (A, B)
- Merge A’ and B’: find Rank(A’, B’) and Rank(B’, A’)
– Using the fast non-optimal algorithm
– Time = O(log log n), Work = O(n)
- Compute Rank(A’, B) and Rank(B’, A)
– If Rank(pi, B’) is ri, then pi lies in block Bri
– Search sequentially within that block
– Time = O(log log n), Work = O(n)
- Compute the ranks of the remaining elements
– Sequentially, within their blocks
– Time = O(log log n), Work = O(n)
Quick Sort
- Choose the pivot
– Select the median?
- Subdivide into two groups
– Group sizes are linearly related with high probability
- Sort each group independently
QuickSort Algorithm
QuickSort(int A[], int first, int last) {
  select random m in [first:last] // A[m] is the pivot
  parallel for i in [first:last]
    flag[i] = A[i] < A[m]
  Split(A) // separate flag values 0 and 1 using a prefix sum; A[m] moves to index k
  QuickSort(A, first, k-1) // the two recursive calls
  QuickSort(A, k+1, last)  // run in parallel
}
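A runnable sketch of this scheme (names illustrative): the flags are scanned to compute each element's destination (shown as a sequential loop standing in for the parallel prefix sum above), and the two sides are then sorted as OpenMP tasks.

#include <omp.h>
#include <vector>
#include <random>
#include <cstdio>

// Parallel quicksort following the slide: flag elements smaller than the
// pivot, compute destinations with a prefix sum over the flags, scatter,
// then sort the two sides in parallel.
void quicksort(std::vector<int>& A, int first, int last) {
    if (first >= last) return;
    static thread_local std::mt19937 rng(860);
    int m = std::uniform_int_distribution<int>(first, last)(rng);
    int pivot = A[m];
    std::swap(A[m], A[last]);                  // park the pivot at the end
    int len = last - first;
    std::vector<int> flag(len), pos(len);
    #pragma omp parallel for
    for (int i = 0; i < len; ++i)
        flag[i] = A[first + i] < pivot;
    int smaller = 0;                           // exclusive prefix sum over flags
    for (int i = 0; i < len; ++i) { pos[i] = smaller; smaller += flag[i]; }
    std::vector<int> tmp(A.begin() + first, A.begin() + last);
    #pragma omp parallel for
    for (int i = 0; i < len; ++i)              // scatter around the pivot slot
        if (flag[i]) A[first + pos[i]] = tmp[i];
        else         A[first + smaller + 1 + (i - pos[i])] = tmp[i];
    int k = first + smaller;                   // pivot's final position
    A[k] = pivot;
    #pragma omp task shared(A)
    quicksort(A, first, k - 1);
    quicksort(A, k + 1, last);
    #pragma omp taskwait
}

int main() {
    std::vector<int> A = {5, 3, 8, 1, 9, 2, 7, 4, 6, 0};
    #pragma omp parallel
    #pragma omp single
    quicksort(A, 0, (int)A.size() - 1);
    for (int v : A) std::printf("%d ", v);     // 0 1 2 3 4 5 6 7 8 9
}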
Quick Sort
- Expected O(log n) rounds
- Time per round = O(log n)
- Total work = O(n log n) with high probability
Partitioning Approach
- Break into p roughly equal-sized problems
- Solve each sub-problem
– Preferably, independently of the others
- Focus on subdividing into independent parts
PARALLEL ALGORITHM TECHNIQUES: DIVIDE AND CONQUER
Merge Sort
- Partition the data into two halves
– Assign half the processors to each half
– If only one processor remains, sort sequentially
- Sort each half
- Merge the results
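A compact sketch of this divide and conquer using OpenMP tasks; std::inplace_merge stands in for the parallel merge developed earlier.

#include <omp.h>
#include <vector>
#include <algorithm>
#include <cstdio>

// Parallel merge sort: the two halves are sorted by parallel tasks; a
// real implementation would switch to sequential sort once only one
// processor remains, and use the parallel merge for the combine step.
void merge_sort(std::vector<int>& A, int lo, int hi) {  // sorts A[lo:hi)
    if (hi - lo < 2) return;
    int mid = lo + (hi - lo) / 2;
    #pragma omp task shared(A)
    merge_sort(A, lo, mid);
    merge_sort(A, mid, hi);
    #pragma omp taskwait
    std::inplace_merge(A.begin() + lo, A.begin() + mid, A.begin() + hi);
}

int main() {
    std::vector<int> A = {9, 4, 7, 1, 8, 2, 6, 3, 5, 0};
    #pragma omp parallel
    #pragma omp single
    merge_sort(A, 0, (int)A.size());
    for (int v : A) std::printf("%d ", v);   // 0 1 2 3 4 5 6 7 8 9
}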
Convex Hull
PARALLEL ALGORITHM TECHNIQUES: ACCELERATED CASCADING
Min-find
Input: array C with n numbers
Algorithm A1, using O(n^2) processors (common-CRCW PRAM: concurrent writes store the same value):
  parallel for i in 0:n
    M[i] = 0
  parallel for i, j in 0:n
    if i ≠ j && C[i] < C[j]
      M[j] = 1 // C[j] loses a comparison
  parallel for i in 0:n
    if M[i] == 0
      min = C[i] // only the minimum is unmarked
Not optimal: O(1) time but O(n^2) work
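A direct simulation of A1, with OpenMP loops standing in for the n^2 processors (the benign concurrent writes mirror the CRCW behaviour):

#include <omp.h>
#include <vector>
#include <cstdio>

// Constant-time min-find A1: mark every element that loses a comparison;
// the unmarked element is the minimum. O(n^2) comparisons in all.
int min_find_a1(const std::vector<int>& C) {
    int n = (int)C.size();
    std::vector<char> M(n, 0);
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (i != j && C[i] < C[j])
                M[j] = 1;                  // concurrent writes all store 1
    int min = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        if (!M[i]) min = C[i];             // only minima write (same value)
    return min;
}

int main() {
    std::printf("%d\n", min_find_a1({7, 3, 9, 1, 4}));   // 1
}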
Optimal Min-find
- Balanced binary tree
– O(log n) time
– O(n) work => optimal
- Use accelerated cascading
- Make the tree branch much faster
– Number of children of node u = √n_u
- where n_u is the number of leaves in u's subtree
– Works if the operation at each node can be performed in O(1) time
From n^2 processors to n√n
- Step 1: Partition into disjoint blocks of size √n
- Step 2: Apply A1 to each block (n processors per block)
- Step 3: Apply A1 to the √n results from step 2
[Figure: √n blocks, A1 applied to each in parallel, then A1 applied once more to the block minima]
From n√n processors to n^(1+1/4)
- Step 1: Partition into disjoint blocks of size n^(1/2)
- Step 2: Apply A2 to each block (n^(3/4) processors per block)
- Step 3: Apply A2 to the √n results from step 2
[Figure: blocks of size n^(1/2); A2 uses n^(3/4) processors per block]
n^2 -> n^(1+1/2) -> n^(1+1/4) -> n^(1+1/8) -> n^(1+1/16) -> ... -> n^(1+1/2^(k-1)) -> n
- Algorithm Ak takes “O(1)” time (O(k), constant for fixed k) with n^(1+ε) processors, where ε = 1/2^(k-1)
Algorithm Ak+1
- 1. Partition the input array C (size n) into disjoint blocks of size n^(1/2) each
- 2. Solve for each block in parallel using algorithm Ak
- 3. Re-apply Ak to the n/n^(1/2) = √n minima from step 2
Doubly-logarithmic-depth tree: O(n log log n) work, O(log log n) time
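A recursive sketch of the doubly-logarithmic scheme (function names illustrative; the block loop is conceptually parallel, and the final combine would be one A1 round on a PRAM):

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Doubly-logarithmic min-find: split into blocks of size √n, solve each
// block recursively, then combine the block minima in one round (on a
// PRAM, that round is A1 with (√n)^2 = n processors, O(1) time).
int min_find_fast(const std::vector<int>& C) {
    int n = (int)C.size();
    if (n <= 2) return n == 1 ? C[0] : std::min(C[0], C[1]);
    int b = (int)std::ceil(std::sqrt((double)n));        // block size √n
    std::vector<int> minima;
    for (int lo = 0; lo < n; lo += b)                    // conceptually parallel
        minima.push_back(min_find_fast(std::vector<int>(
            C.begin() + lo, C.begin() + std::min(lo + b, n))));
    int m = minima[0];                                   // one combine round
    for (int v : minima) if (v < m) m = v;
    return m;
}

int main() {
    std::printf("%d\n", min_find_fast({42, 17, 99, 3, 58, 21, 8, 76, 5})); // 3
}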
Min-Find Review
- Constant-time algorithm
– O(n^2) work
- O(log n) balanced-tree approach
– O(n) work => optimal
- O(log log n) doubly-log-depth tree approach
– O(n log log n) work
– Degree is high at the root and decreases going down
- #children of node u = √(#leaves in the subtree rooted at u)
- Depth = O(log log n)
Accelerated Cascading
- Solve recursively
- Start bottom-up with the optimal algorithm
– until the problem size becomes small
- Switch to the fast (non-optimal) algorithm
– A few small problems are solved fast but non-work-optimally
- Min Find: use the balanced tree to shrink the input from n to n/log log n values (O(log log log n) time, O(n) work), then finish with the doubly-log-depth tree (O(log log n) time, O(n) work), giving O(log log n) time and O(n) work overall
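Putting the two phases together, a sketch that reuses min_find_fast from the previous sketch; phase 1 is shown as a simple blockwise reduction over blocks of size log log n.

#include <omp.h>
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int min_find_fast(const std::vector<int>& C);  // from the previous sketch

// Accelerated cascading for min-find. Phase 1 (optimal): shrink the
// input to ~n/log log n block minima with O(n) work. Phase 2 (fast):
// finish with the doubly-logarithmic algorithm in O(log log n) time.
int min_find_cascaded(const std::vector<int>& C) {
    int n = (int)C.size();
    int bs = std::max(2, (int)std::log2(std::log2((double)std::max(n, 4))));
    std::vector<int> minima((n + bs - 1) / bs);
    #pragma omp parallel for
    for (int b = 0; b < (int)minima.size(); ++b) {       // phase 1: block minima
        int lo = b * bs, hi = std::min(lo + bs, n);
        minima[b] = *std::min_element(C.begin() + lo, C.begin() + hi);
    }
    return min_find_fast(minima);                        // phase 2: fast finish
}

int main() {
    std::printf("%d\n", min_find_cascaded({42, 17, 99, 3, 58, 21, 8, 76, 5})); // 3
}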