  1. Sorting (Chapter 9) Alexandre David B2-206

  2. Sorting Problem
Arrange an unordered collection of elements into monotonically increasing (or decreasing) order.
Let S = ⟨a1, a2, …, an⟩. Sort S into S' = ⟨a1', a2', …, an'⟩ such that ai' ≤ aj' for 1 ≤ i ≤ j ≤ n, and S' is a permutation of S.
21-04-2006 Alexandre David, MVP'06

  3. Recall of Comparison-Based Sorting Algorithms
- Bubble sort, selection sort: Θ(n²)
- Insertion sort: Ω(n), O(n²)
- Quick sort: Ω(n log n)
- Merge sort, heap sort: Θ(n log n)

  4. Characteristics of Sorting Algorithms
- In-place sorting: no need for additional memory (or only constant size).
- Stable sorting: equal elements keep their original relative position.
- Internal sorting: elements fit in process memory.
- External sorting: elements are on auxiliary storage.

  5. Fundamental Distinction
- Comparison-based sorting:
  - Compare-exchange of pairs of elements.
  - Lower bound is Ω(n log n) (proof based on decision trees).
  - Merge sort and heap sort are optimal.
- Non-comparison-based sorting:
  - Uses information about the elements to sort.
  - Lower bound is Ω(n).
  - Counting sort and radix sort are optimal.

  6. Issues in Parallel Sorting
- Where to store input and output? On one process or distributed?
- The enumeration of processes is used to distribute the output.
- How to compare?
- How many elements per process? As many processes as elements ⇒ poor performance because of inter-process communication.

  7. Parallel Compare-Exchange
Communication cost: ts + tw. The comparison itself is much cheaper, so communication time dominates.
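The step can be sketched in plain Python (a sequential simulation; in the real setting the two elements live on different processes and are exchanged over the network at cost ts + tw):

```python
def compare_exchange(a, b):
    """Simulate a compare-exchange between two processes holding one element each.

    Both sides exchange their elements; the lower-ranked process keeps
    the minimum and the higher-ranked process keeps the maximum.
    """
    return min(a, b), max(a, b)

# Example: process 0 holds 7, process 1 holds 3.
lo, hi = compare_exchange(7, 3)  # lo stays on process 0, hi on process 1
```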

  8. Blocks of Elements Per Process
n elements are distributed as blocks of n/p elements over processes P0, P1, …, Pp-1. After sorting, the blocks satisfy A0 ≤ A1 ≤ … ≤ Ap-1.

  9. Compare-Split
For large blocks, each step is linear in the block size:
- Exchange: Θ(ts + tw·n/p)
- Merge: Θ(n/p)
- Split: O(n/p)
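A compare-split step on blocks can be sketched as follows (a sequential simulation; `sorted` on the concatenation stands in for the Θ(n/p) linear merge of two already-sorted blocks):

```python
def compare_split(block_lo, block_hi):
    """Compare-split of two sorted blocks of n/p elements each.

    In the parallel version both processes exchange their whole block
    (Theta(ts + tw*n/p)), merge the two sorted blocks (Theta(n/p)),
    and keep the lower or upper half respectively (the split).
    """
    merged = sorted(block_lo + block_hi)  # stands in for a linear merge
    k = len(block_lo)
    return merged[:k], merged[k:]
```

After the step, every element kept by the lower-ranked process is ≤ every element kept by the higher-ranked one.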

  10. Sorting Networks
- Mostly of theoretical interest.
- Key idea: perform many comparisons in parallel.
- Key elements:
  - Comparators: 2 inputs, 2 outputs.
  - Network architecture: comparators arranged in columns, each performing a permutation.
- Speed is proportional to the depth of the network.

  11. Comparators

  12. Sorting Networks

  13. Bitonic Sequence
Definition: a bitonic sequence is a sequence of elements ⟨a0, a1, …, an-1⟩ such that
1. ∃ i, 0 ≤ i ≤ n-1, such that ⟨a0, …, ai⟩ is monotonically increasing and ⟨ai+1, …, an-1⟩ is monotonically decreasing,
2. or there is a cyclic shift of indices so that (1) is satisfied.
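The definition can be checked directly by trying every cyclic shift; a sketch (the function names are mine, not from the slides):

```python
def up_then_down(s):
    """True if s is monotonically increasing, then monotonically decreasing."""
    i, n = 0, len(s)
    while i + 1 < n and s[i] <= s[i + 1]:  # increasing prefix
        i += 1
    while i + 1 < n and s[i] >= s[i + 1]:  # decreasing suffix
        i += 1
    return i == n - 1

def is_bitonic(s):
    """Condition (1) holds for some cyclic shift of the indices (condition (2))."""
    return any(up_then_down(s[k:] + s[:k]) for k in range(len(s)))
```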

  14. Bitonic Sort
- Rearranges a bitonic sequence into sorted order.
- Divide-and-conquer algorithm (similar to quicksort) using bitonic splits.
- Sorting a bitonic sequence by repeated bitonic splits = bitonic merge.
- But we first need a bitonic sequence…

  15. Bitonic Split
Split the bitonic sequence ⟨a0, a1, …, an/2-1, an/2, an/2+1, …, an-1⟩ into
s1 = ⟨min{a0, an/2}, min{a1, an/2+1}, …, min{an/2-1, an-1}⟩
s2 = ⟨max{a0, an/2}, max{a1, an/2+1}, …, max{an/2-1, an-1}⟩
Then s1 ≤ s2 (every element of s1 is ≤ every element of s2), and s1 and s2 are both bitonic!
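A direct sketch of one bitonic split (element-wise min/max of the two halves; an even-length input is assumed):

```python
def bitonic_split(s):
    """One bitonic split of a bitonic sequence s (even length assumed).

    s1 takes the element-wise minima of the two halves, s2 the maxima;
    afterwards max(s1) <= min(s2) and both s1 and s2 are again bitonic.
    """
    h = len(s) // 2
    s1 = [min(s[i], s[i + h]) for i in range(h)]
    s2 = [max(s[i], s[i + h]) for i in range(h)]
    return s1, s2
```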

  16. Bitonic Merging Network ⊕BM[n]
log n stages, n/2 comparators per stage.
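Applying bitonic splits recursively gives the bitonic merge. A Python sketch of the ⊕BM[n] behaviour (power-of-two length assumed; each recursion level corresponds to one of the log n stages):

```python
def bitonic_merge(s, ascending=True):
    """Sort a bitonic sequence by recursive bitonic splits.

    One split per level = one network stage of n/2 comparators;
    log n levels in total.
    """
    if len(s) <= 1:
        return list(s)
    h = len(s) // 2
    if ascending:
        lo = [min(s[i], s[i + h]) for i in range(h)]
        hi = [max(s[i], s[i + h]) for i in range(h)]
    else:
        lo = [max(s[i], s[i + h]) for i in range(h)]
        hi = [min(s[i], s[i + h]) for i in range(h)]
    return bitonic_merge(lo, ascending) + bitonic_merge(hi, ascending)
```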

  17. Bitonic Sort
- Use the bitonic merging network to merge bitonic sequences of increasing length, starting from length 2.
- The bitonic merging network is used as a component.

  18. Bitonic Sort
log n merging stages. Depth: O(log² n). Simulated on a serial computer: O(n log² n).
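The full algorithm can be sketched as follows (power-of-two input length assumed; the `bitonic_merge` helper is repeated so the sketch is self-contained): each half is sorted in opposite directions, which makes the whole sequence bitonic, and a final bitonic merge finishes the job.

```python
def bitonic_merge(s, ascending=True):
    """Sort a bitonic sequence by recursive bitonic splits."""
    if len(s) <= 1:
        return list(s)
    h = len(s) // 2
    pick_lo, pick_hi = (min, max) if ascending else (max, min)
    lo = [pick_lo(s[i], s[i + h]) for i in range(h)]
    hi = [pick_hi(s[i], s[i + h]) for i in range(h)]
    return bitonic_merge(lo, ascending) + bitonic_merge(hi, ascending)

def bitonic_sort(s, ascending=True):
    """Sort the halves in opposite directions (giving a bitonic sequence),
    then bitonic-merge: log n merging stages, O(log^2 n) network depth."""
    if len(s) <= 1:
        return list(s)
    h = len(s) // 2
    first = bitonic_sort(s[:h], True)
    second = bitonic_sort(s[h:], False)
    return bitonic_merge(first + second, ascending)
```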

  19. Mapping to Hypercubes & Mesh – Idea
- Communication intensive, so special care is needed for the mapping.
- How are the input wires paired? Pairs have labels differing in only one bit ⇒ the mapping to a hypercube is straightforward. But it is not efficient and not scalable, because the sequential algorithm is suboptimal.
- For a mesh (lower connectivity) there are several solutions, but all worse than the hypercube: TP = Θ(log² n) + Θ(√n) for 1 element/process.
- Blocks of elements: sort locally (Θ((n/p) log(n/p))) and use bitonic merge ⇒ cost optimal.

  20. Bubble Sort
procedure BUBBLE_SORT(n)
begin
  for i := n-1 downto 1 do
    for j := 1 to i do
      compare_exchange(a[j], a[j+1]);
end
Θ(n²). Difficult to parallelize as it is, because it is inherently sequential.

  21. Odd-Even Transposition Sort
Θ(n²). Alternate two kinds of phases:
- odd phases: compare-exchange (a1,a2), (a3,a4), …
- even phases: compare-exchange (a2,a3), (a4,a5), …
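A sequential sketch of the n phases (pairs are 1-based as on the slide; the code uses 0-based indices):

```python
def odd_even_transposition_sort(a):
    """n alternating phases: (a1,a2),(a3,a4),... then (a2,a3),(a4,a5),...

    Each phase does about n/2 independent compare-exchanges, giving
    Theta(n^2) comparisons in total; the independence of the exchanges
    inside a phase is what makes the algorithm easy to parallelize.
    """
    a = list(a)
    n = len(a)
    for phase in range(n):
        start = 0 if phase % 2 == 0 else 1
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```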

  22. (figure: odd-even transposition sort example)

  23. Odd-Even Transposition Sort
- Easy to parallelize!
- Θ(n) with 1 process/element, but not cost optimal.
- Instead use fewer processes, an optimal local sort, and compare-splits:
TP = Θ((n/p) log(n/p)) + Θ(n) + Θ(n)
(local sort (optimal) + comparisons + communication)
- Cost optimal for p = O(log n), but not scalable (few processes).

  24. Improvement: Shellsort
- Two phases:
  - Move elements over long distances.
  - Odd-even transposition, but stop when nothing changes.
- Idea: quickly put elements near their final position to reduce the number of iterations of odd-even transposition.
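The long-distance idea is the same one behind sequential shellsort, sketched here with a simple n/2, n/4, …, 1 gap sequence (the gap sequence is my choice for illustration; the parallel variant instead compare-splits distant blocks before finishing with odd-even transposition):

```python
def shellsort(a):
    """Gapped insertion sort: large gaps move elements near their final
    position quickly, so the later small-gap passes have little left to do."""
    a = list(a)
    gap = len(a) // 2
    while gap > 0:
        for i in range(gap, len(a)):
            x, j = a[i], i
            while j >= gap and a[j - gap] > x:
                a[j] = a[j - gap]   # shift the larger element gap positions right
                j -= gap
            a[j] = x
        gap //= 2
    return a
```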

  25. (figure: parallel shellsort example)

  26. Quicksort
- Average complexity: O(n log n).
- Very efficient in practice: the average is “robust”, the overhead is low, and the algorithm is very simple.
- Divide-and-conquer algorithm:
  - Partition A[q..r] into A[q..s] ≤ A[s+1..r].
  - Recursively sort the sub-arrays.
- Subtlety: how to partition?
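A sketch of the partitioning subtlety, using a Hoare-style scheme with the first element as pivot (the pivot choice here is an illustrative assumption): after `partition`, every element of A[q..s] is ≤ every element of A[s+1..r].

```python
def partition(a, q, r):
    """Hoare-style partition of a[q..r] (inclusive bounds).

    Returns s with q <= s < r such that all of a[q..s] <= all of a[s+1..r].
    """
    x = a[q]                      # pivot: first element (illustrative choice)
    i, j = q - 1, r + 1
    while True:
        i += 1
        while a[i] < x:
            i += 1
        j -= 1
        while a[j] > x:
            j -= 1
        if i >= j:
            return j
        a[i], a[j] = a[j], a[i]   # put the misplaced pair on the right sides

def quicksort(a, q=0, r=None):
    """Partition, then recursively sort the two sub-arrays in place."""
    if r is None:
        r = len(a) - 1
    if q < r:
        s = partition(a, q, r)
        quicksort(a, q, s)
        quicksort(a, s + 1, r)
    return a
```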

  27. (figure: example of quicksort partitioning steps on A[q..r])

  28. BUG

  29. Parallel Quicksort
- Simple version: recursive decomposition with one process per recursive call.
- Not cost optimal: lower bound = n (the initial partitioning).
- The best we can do this way is to use O(log n) processes.
- We need to parallelize the partitioning step.

  30. Parallel Quicksort for CRCW PRAM
See the execution of quicksort as constructing a binary tree: pivots are internal nodes, smaller elements go to the left subtree and larger elements to the right. (figure: recursion tree)

  31. BUG: the text and algorithm 9.5 use A[p..s] ≤ x < A[s+1..q], while the figures and algorithm 9.6 use A[p..s] < x ≤ A[s+1..q].

  32. On concurrent writes, only one succeeds. A[i] ≤ A[parent_i].

  33. (figure: tree construction steps, root = 1)

  34. (figure: tree construction, continued)
Each step: Θ(1). Average height: Θ(log n). This is cost optimal – but it is only a model.

  35. Parallel Quicksort – Shared Address Space (Realistic)
- Same idea, but remove the contention:
  - Choose the pivot and broadcast it.
  - Each process rearranges its block of elements locally.
  - Global rearrangement of the blocks.
- When the blocks reach a certain size, a local sort is used.
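One global step of this scheme can be simulated sequentially (the function and variable names are mine): each "process" splits its block around the broadcast pivot locally, and prefix sums over the per-block counts give the write offsets used in the global rearrangement. Here the running total `off` plays the role of the prefix sums.

```python
def global_partition_step(blocks, pivot):
    """Simulate one pivot step of the shared-address parallel quicksort.

    Each 'process' partitions its block locally; the write offsets for
    the global rearrangement come from prefix sums of the local counts
    (computed here by the running total `off`).
    """
    smaller = [[x for x in b if x <= pivot] for b in blocks]  # local rearrangement
    larger = [[x for x in b if x > pivot] for b in blocks]
    result = [None] * sum(len(b) for b in blocks)
    off = 0
    for part in smaller + larger:          # global rearrangement
        result[off:off + len(part)] = part
        off += len(part)
    split = sum(len(s) for s in smaller)   # boundary between the two halves
    return result, split
```

The two halves `result[:split]` and `result[split:]` are then recursed on with fresh pivots, switching to a local sort once the pieces are small enough.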

  36. (figure: parallel quicksort example)

  37. (figure: parallel quicksort example, continued)

  38. Cost
- Scalability is determined by the time to broadcast the pivot and to compute the prefix sums.
- Cost optimal.

  39. MPI Formulation of Quicksort
- Arrays must be explicitly distributed.
- Two phases:
  - Local partition into smaller/larger than the pivot.
  - Determine who will sort the sub-arrays, and send the sub-arrays to the right processes.

  40. Final Word
- Pivot selection is very important: it affects performance.
- A bad pivot means idle processes.
