Parallel Models


  1. Advanced Algorithms
     Piyush Kumar
     (Lecture 10: Parallel Algorithms)
     Courtesy Baker 05.

     Parallel Models
     • An abstract description of a real world parallel machine.
     • Attempts to capture essential features (and suppress details?)
     • What other models have we seen so far? RAM? External Memory Model?

     RAM
     • Random Access Machine Model
       – Memory is a sequence of bits/words.
       – Each memory access takes O(1) time.
       – Basic operations take O(1) time: Add/Mul/Xor/Sub/AND/not…
       – Instructions cannot be modified.
       – No consideration of memory hierarchies.
       – Has been very successful in modelling real world machines.

     Parallel RAM aka PRAM
     • Generalization of RAM
     • P processors with their own programs (and unique id)
     • MIMD processors: At each point in time the processors might be executing different instructions on different data.
     • Shared Memory
     • Instructions are synchronized among the processors

     PRAM
     • Shared Memory
     • EREW/ERCW/CREW/CRCW
     • EREW: A program isn't allowed to have two processors access the same memory location at the same time.

     Variants of CRCW
     • Common CRCW: CW iff processors write same value.
     • Arbitrary CRCW
     • Priority CRCW
     • Combining CRCW
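The four CRCW variants listed above differ only in how one memory cell resolves several writes that arrive in the same step. The following is a minimal sketch of that resolution rule, not part of the original slides. The function name `resolve_crcw_write` and the string labels are hypothetical, and two details are assumptions: Priority CRCW is taken to favor the smallest processor id, and Combining CRCW is taken to fold the written values with an associative operator (addition by default).

```python
from functools import reduce
import operator


def resolve_crcw_write(writes, variant, combine=operator.add):
    """Resolve concurrent writes to ONE shared cell in one PRAM step.

    writes:  non-empty list of (processor_id, value) pairs for the same cell.
    variant: "common", "arbitrary", "priority" or "combining".
    Returns the value that ends up stored in the cell.
    """
    if not writes:
        raise ValueError("no processor wrote to the cell")
    values = [value for _, value in writes]
    if variant == "common":
        # The write is legal only if every writer stores the same value.
        if any(v != values[0] for v in values):
            raise ValueError("Common CRCW: writers disagree")
        return values[0]
    if variant == "arbitrary":
        # Any single writer may win; this sketch picks the first one listed.
        return values[0]
    if variant == "priority":
        # Assumption: the smallest processor id has the highest priority.
        return min(writes, key=lambda w: w[0])[1]
    if variant == "combining":
        # Assumption: the values are folded with an associative operator.
        return reduce(combine, values)
    raise ValueError("unknown variant: " + variant)


if __name__ == "__main__":
    writes = [(3, 7), (1, 5), (2, 9)]
    print(resolve_crcw_write(writes, "priority"))
    print(resolve_crcw_write(writes, "combining"))
```

An algorithm written for Common CRCW must not rely on which writer wins, which is why the sketch raises an error when the values disagree.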

  2. Why PRAM?
     • Lot of literature available on algorithms for PRAM.
     • One of the most “clean” models.
     • Focuses on what communication is needed (and ignores the cost/means to do it)

     PRAM Algorithm design.
     • Problem 1: Produce the sum of an array of n numbers.
     • RAM = ?
     • PRAM = ?

     Problem 2: Prefix Computation
     Let X = {s_0, s_1, …, s_{n-1}} be in a set S.
     Let ⊗ be a binary, associative, closed operator with respect to S
     (usually Θ(1) time – MIN, MAX, AND, +, ...)
     The result of s_0 ⊗ s_1 ⊗ … ⊗ s_k is called the k-th prefix.
     Computing all such n prefixes is the parallel prefix computation.
       1st prefix:      s_0
       2nd prefix:      s_0 ⊗ s_1
       3rd prefix:      s_0 ⊗ s_1 ⊗ s_2
       ...
       (n-1)th prefix:  s_0 ⊗ s_1 ⊗ ... ⊗ s_{n-1}

     Prefix computation
     • Suffix computation is a similar problem.
     • Assumes Binary op takes O(1)
     • In RAM = ?

     [Figure: Prefix Computation (Akl)]
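The two "RAM = ?" / "PRAM = ?" questions above can be explored with a small simulation. This is a sketch under stated assumptions, not the lecture's own code: `ram_prefix` and `pram_sum` are hypothetical names, the input is assumed non-empty, and each pass of the `while` loop in `pram_sum` stands for one synchronous PRAM step in which many processors work at once. On a RAM a single scan computes all prefixes with n - 1 applications of the operator; the tree reduction needs about lg n parallel rounds for the sum.

```python
import operator


def ram_prefix(s, op=operator.add):
    """All prefixes on a RAM: one left-to-right scan, n - 1 uses of op."""
    out = []
    for x in s:
        out.append(x if not out else op(out[-1], x))
    return out


def pram_sum(values, op=operator.add):
    """Simulate a tree reduction; returns (result, parallel_rounds).

    In each round, processor i combines cells i and i + stride. Cells
    written in a round (multiples of 2 * stride) are never read by a
    different processor in that round, so no two processors touch the
    same cell at the same time.
    """
    cells = list(values)
    n = len(cells)
    rounds = 0
    stride = 1
    while stride < n:
        for i in range(0, n - stride, 2 * stride):
            cells[i] = op(cells[i], cells[i + stride])
        stride *= 2
        rounds += 1
    return cells[0], rounds


if __name__ == "__main__":
    print(ram_prefix([3, 1, 4, 1, 5]))
    print(pram_sum([1, 2, 3, 4, 5, 6, 7, 8]))
```

The operator is a parameter because the slide only requires it to be binary, associative and closed; passing `min` or `max` works the same way as `+`.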

  3. EREW PRAM Prefix computation
     • Assume PRAM has n processors and n is a power of 2.
     • Input: s_i for i = 0, 1, ..., n-1.
     • Algorithm Steps:
         for j = 0 to (lg n) - 1, do
           for i = 2^j to n-1 do
             h = i - 2^j
             s_i = s_h ⊗ s_i
           endfor
         endfor
     • Total time in EREW PRAM?

     Problem 3: Array packing
     • Assume that we have
       – an array of n elements, X = {x_1, x_2, ..., x_n}
       – Some array elements are marked (or distinguished).
     • The requirements of this problem are to
       – pack the marked elements in the front part of the array.
       – place the remaining elements in the back of the array.
     • While not a requirement, it is also desirable to
       – maintain the original order between the marked elements
       – maintain the original order between the unmarked elements

     In RAM?
     • How would you do this?
     • Inplace?
     • Running time?
     • Any ideas on how to do this in PRAM?

     EREW PRAM Algorithm
     1. Set s_i in P_i to 1 if x_i is marked and set s_i = 0 otherwise.
     2. Perform a prefix sum on S = (s_1, s_2, ..., s_n) to obtain destination d_i = s_i for each marked x_i.
     3. All PEs set m = s_n, the total number of marked elements.
     4. P_i sets s_i to 0 if x_i is marked and otherwise sets s_i = 1.
     5. Perform a prefix sum on S and set d_i = s_i + m for each unmarked x_i.
     6. Each P_i copies array element x_i into address d_i in X.

     PRAM Array Packing
     • Assume n processors are used above.
     • Optimal prefix sums requires O(lg n) time.
     • The EREW broadcast of s_n needed in Step 3 takes O(lg n) time using a binary tree in memory
     • All other steps require constant time.
     • Runs in O(lg n) time and is cost optimal.
     • Maintains original order in unmarked group as well
     Notes:
     • Algorithm illustrates usefulness of Prefix Sums
     • There are many applications for the Array Packing algorithm

     Problem 4: MergeSort
     • RAM Merge Sort Recursion?
     • PRAM Merge Sort recursion?
     • Can we speed up the merging?
       – Merging n elements with n processors can be done in O(log n) time.
       – Assume all elements are distinct
       – Rank(a, A) = number of elements in A smaller than a. For example rank(8, {1,3,5,7,9}) = 4
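The prefix loop and the six array packing steps above can be simulated sequentially as follows. This is a sketch under stated assumptions rather than the slides' own code: `erew_prefix` and `pack` are hypothetical names, the inner `for i` loop is assumed to run in parallel on a PRAM (the snapshot `old` models "all processors read before any processor writes"), the input is assumed non-empty, and `pack` writes to a fresh output list instead of copying back into X.

```python
import operator


def erew_prefix(s, op=operator.add):
    """Simulate the doubling prefix loop; returns (prefixes, rounds).

    Each pass of the while loop is one parallel step with step == 2**j.
    The slide assumes len(s) is a power of 2.
    """
    s = list(s)
    n = len(s)
    rounds = 0
    step = 1
    while step < n:
        old = s[:]  # every processor reads the values of the previous step
        for i in range(step, n):
            s[i] = op(old[i - step], old[i])
        step *= 2
        rounds += 1
    return s, rounds


def pack(x, marked):
    """Array packing via two prefix sums, following steps 1 to 6."""
    n = len(x)
    # Steps 1-2: flag marked elements; the prefix sum gives 1-based slots.
    d_marked, _ = erew_prefix([1 if m else 0 for m in marked])
    m_total = d_marked[-1]  # Step 3: total number of marked elements.
    # Steps 4-5: flag unmarked elements; their slots start after m_total.
    d_unmarked, _ = erew_prefix([0 if m else 1 for m in marked])
    out = [None] * n
    for i in range(n):  # Step 6: every P_i copies x_i to address d_i.
        d = d_marked[i] if marked[i] else d_unmarked[i] + m_total
        out[d - 1] = x[i]
    return out


if __name__ == "__main__":
    print(erew_prefix([1, 2, 3, 4, 5, 6, 7, 8]))
    flags = [False, True, False, False, True, True, False, True]
    print(pack(list("abcdefgh"), flags))
```

Because a prefix sum never decreases along the array, both the marked and the unmarked elements keep their original relative order, which matches the "desirable" property stated on the slide.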

  4. PRAM Merging
     A = 2, 3, 10, 15, 16          B = 1, 8, 12, 14, 19
     Rank(2)  = 1, +1              Rank(1)  = 0, +1
     Rank(3)  = 1, +2              Rank(8)  = 2, +2
     Rank(10) = 2, +3              Rank(12) = 3, +3
     Rank(15) = 4, +4              Rank(14) = 3, +4
     Rank(16) = 4, +5              Rank(19) = 5, +5
     (rank in the other list + own position = position in the merged output)
     Merged: 1 2 3 8 10 12 14 15 16 19

     PRAM Merge Sort
     • T(n) = T(n/2) + O(log n)
     • Using the idea of pipelined d&c, PRAM Mergesort can be done in O(log n).
     • D&C is one of the most powerful techniques to solve problems in parallel.

     Problem 5: Closest Pair
     • RAM Version ?

     Closest Pair: RAM Version
     Closest-Pair(p_1, …, p_n) {
        Compute separation line L such that half the points
        are on one side and half on the other side.              [O(n log n)]

        δ_1 = Closest-Pair(left half)
        δ_2 = Closest-Pair(right half)                           [2T(n / 2)]
        δ = min(δ_1, δ_2)

        Delete all points further than δ from separation line L  [O(n)]

        Sort remaining points by y-coordinate.                   [O(n log n)]

        Scan points in y-order and compare distance between
        each point and next 11 neighbors. If any of these
        distances is less than δ, update δ.                      [O(n)]

        return δ.
     }
     [Figure: point set split by separation line L, with δ = min(12, 21)]

     Closest Pair: PRAM Version?
     Closest-Pair(p_1, …, p_n) {
        Compute separation line L such that half the points
        are on one side and half on the other side.              [O(1); use sorted lists]

        δ_1 = Closest-Pair(left half)
        δ_2 = Closest-Pair(right half)                           [T(n / 2); in parallel]
        δ = min(δ_1, δ_2)

        Delete all points further than δ from separation line L
        Sort remaining points by y-coordinate.                   [O(log n); use presorting and prefix computation]

        Scan points in y-order and compare distance between
        each point and next 11 neighbors.                        [O(1)]
        Find min of all these distances, update δ.               [O(log n); again use prefix computation]

        return δ.
     }
     Recurrence: T(n) = T(n/2) + O(log n)

     Other Interesting Algorithms
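The PRAM merging example above (A = 2, 3, 10, 15, 16 and B = 1, 8, 12, 14, 19) can be reproduced with a short sketch. The names `rank` and `merge_by_rank` are hypothetical, positions are 0-based here while the slide counts from 1, and finding each rank by binary search is an assumption about how the O(log n) bound is reached; the slides only state the bound. All elements are assumed distinct, as on the slide.

```python
from bisect import bisect_left


def rank(a, A):
    """Number of elements of the sorted list A that are smaller than a."""
    return bisect_left(A, a)


def merge_by_rank(A, B):
    """Merge two sorted lists of distinct elements by ranking.

    A[i] has i smaller elements in A and rank(A[i], B) smaller elements
    in B, so its 0-based output position is the sum of the two. Every
    loop iteration is independent of the others: on a PRAM each one
    would be handled by its own processor.
    """
    out = [None] * (len(A) + len(B))
    for i, a in enumerate(A):
        out[i + rank(a, B)] = a
    for j, b in enumerate(B):
        out[j + rank(b, A)] = b
    return out


if __name__ == "__main__":
    print(rank(8, [1, 3, 5, 7, 9]))
    print(merge_by_rank([2, 3, 10, 15, 16], [1, 8, 12, 14, 19]))
```

Distinctness matters: if the same value appeared in both lists, both copies would compute the same output position and one would overwrite the other.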

  5. Interesting Classes at FSU
     In case you liked this class:
       – Parallel Algorithms
       – Computational Geometry
       – Advanced Algorithms

     A List
     • Approximation Algorithms
     • Online Algorithms
     • Learning Algorithms
     • Network Algorithms
     • Advanced Data Structures.
     • Flow Algorithms.
     • Algorithmic Game Theory
     • Quantum Algorithms.
     • Geometric Algorithms

     Next Class
     • Practice Problem Solving for Finals.
     • Extra Office Hours:
       – Wednesday, I will be in office and accessible anytime for questions.
