CS137: Electronic Design Automation - Day 12: Sorting


  1. CS137: Electronic Design Automation
     Day 12: February 6, 2006
     Sorting
     CALTECH CS137 Winter2006 -- DeHon

     Today
     • Sequential Sorting
     • Building on Parallel Prefix
     • Systolic
       – Sort
       – Priority Queue
     • Streaming Sort
     • Mesh Sort (Shear Sort)
     • Sorting Networks
     • Parallel Merge Sort

     Sequential Sort
     • What's your favorite sequential sort?
     • Runtime?

     Sequential Merge Sort
     • Observe: can merge two sorted lists of length N in O(N) time
     • Start with N lists of length 1
     • Merge to form N/2 lists of length 2
     • Merge to form N/4 lists of length 4
     • …how many times?
     • Each merge?

     Sequential Merge Sort
     • Observe: can merge two sorted lists of length N in O(N) time
     • Merge successively longer lists
     • log(N) merges
     • Each takes time O(N)
     • Sort in: O(N log(N))

     Parallel Sorting
     prefix
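The merge-based construction above (N lists of length 1, merged pairwise for log(N) rounds) can be sketched as a bottom-up merge sort. This is a minimal illustration, not code from the lecture; the names `merge` and `merge_sort` are assumptions made here.

```python
def merge(a, b):
    """Merge two sorted lists in O(len(a) + len(b)) time."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of these two tails is non-empty
    out.extend(b[j:])
    return out


def merge_sort(values):
    """Bottom-up merge sort: start with N runs of length 1, merge pairs."""
    runs = [[v] for v in values]
    while len(runs) > 1:
        merged = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                merged.append(merge(runs[k], runs[k + 1]))
            else:
                merged.append(runs[k])  # odd run out, carried to next round
        runs = merged  # the number of runs halves each round: log(N) rounds
    return runs[0] if runs else []


print(merge_sort([5, 2, 9, 1, 5, 6]))
```

Each round touches every element once, and the number of runs halves per round, which is where the O(N log(N)) total comes from.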

  2. Rank Finding (Day 9)
     • Looking for I'th ordered element
     • Do a prefix-sum on high-bit only
       – Know m=number of things > 01111111…
     • High-low search on result
       – I.e. if number > I, recurse on half with leading zero
       – If number < I, search for (I-m)'th element in half with high-bit true
     • Find I'th element in log^2(N) time

     Rank-based Sort
     • In O(log^2(N)) time on N processors can find the I'th element
     • Use separate groups of N processors to find the 1st, 2nd, 3rd, … element in parallel
     • Also count the number of such elements in O(log(N)) time using parallel prefix
       – Give each unique offset
     • Send each element to its correct position
     • → O(log^2(N)) sorting algorithm with O(N^2) processors

     Rank Sort Analysis
     • Area N^2
     • Time log^2(N)
     • Work: (N log(N))^2 → square of sequential work

     Systolic
     One Dimensional Array

     Sort as Data Arrives
     • Often receive data as a sequential stream
     • Can I sort the data as it arrives?
     • Build a systolic solution?
       – Use only local interconnect

     Linear Systolic Sort
     • Cell traps largest value
     [Basic approach from Leighton]
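The idea behind rank-based sorting (compute every element's final position independently, then send it there) can be sketched sequentially. Note the swap: the slides find the I'th element with a bitwise prefix-sum search, whereas this sketch computes each rank by direct comparison counting, which is simpler to show. The N^2 comparisons here loosely correspond to the N^2 processors. The function name `rank_sort` is an assumption for illustration.

```python
def rank_sort(values):
    """Place each element at its rank; each rank is computed independently."""
    n = len(values)
    out = [None] * n
    for i, v in enumerate(values):
        # Rank = number of elements that must come before values[i].
        # Ties are broken by original index so equal keys get unique offsets.
        rank = sum(
            1
            for j, w in enumerate(values)
            if w < v or (w == v and j < i)
        )
        out[rank] = v  # "send each element to its correct position"
    return out


print(rank_sort([3, 1, 2, 3, 0]))
```

Because no iteration of the outer loop depends on another, all N ranks could be computed at the same time, which is the property the parallel version exploits.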

  3. Linear Systolic Sort Analysis
     • Area N
     • Time N
     • Work: N^2

     Priority Queue
     • Insert top
     • Extract Largest
     • With O(N) cells
     • O(1) Extract
     • Allows interleave insert/delete

     Priority Queue Idea
     • Trap Largest
       – Like Linear Sort
     • Largest always at front
       – Always immediately available
     • On extract
       – Shift up
     [Figure labels: Largest, New value, Next Largest]

     Priority Queue Cell
     • If (Cin=insert)
       Alocal ← largest
       Bout ← smallest
     • If (Cin=extract)
       Alocal ← Ain
       Bout ← Bin
     • Cout ← Cin

     Streaming Sort
     • Can we sort streaming data with O(log(N)) hardware?
     • How do you sort efficiently in SCORE?
       – Pipe-and-filter System Architecture?

     Streaming Merge Sort
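The priority queue cell behavior above (on insert, keep the larger value and pass the smaller along; on extract, shift everything up) can be modeled in software. This is a sequential sketch, not a cycle-accurate model of the systolic array: a real array would ripple the insert through one cell per clock while the front stays available. The class and method names are assumptions for illustration.

```python
class SystolicPriorityQueue:
    """Sequential model of a linear array of cells; cell 0 is the front."""

    def __init__(self, capacity):
        self.cells = [None] * capacity  # None marks an empty cell

    def insert(self, value):
        # Cells fill contiguously from the front, so a used last cell means full.
        if self.cells and self.cells[-1] is not None:
            raise OverflowError("queue is full")
        carry = value
        for i, local in enumerate(self.cells):
            if local is None:
                self.cells[i] = carry
                return
            # Insert rule: the cell keeps the larger, forwards the smaller.
            if carry > local:
                self.cells[i], carry = carry, local
        raise OverflowError("queue has no cells")

    def extract_largest(self):
        if not self.cells or self.cells[0] is None:
            raise IndexError("queue is empty")
        largest = self.cells[0]
        # Extract rule: every cell takes its neighbor's value (shift up).
        self.cells = self.cells[1:] + [None]
        return largest


q = SystolicPriorityQueue(8)
for v in [5, 1, 9, 3]:
    q.insert(v)
print(q.extract_largest())
```

Inserts and extracts can be interleaved freely, since the cells stay in descending order after every operation.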

  4. Build Merge Tree
     • Observe: early merges run at lower frequency than later…

     Streaming Sort
     • Merge Sort stream
     • After log(N) merges, output stream is sorted.

     Streaming Sort
     • Buffer lengths grow by 2× each stage.
     • Total memory: 2×(N/2) + 2×(N/4) + 2×(N/8) + … ≤ 2N

     Streaming Sort Analysis
     • Area log(N) compare/switch
       – O(N) memory
       – [also true of sequential case]
     • Time O(N)
     • Work: O(N log(N))
       – Work efficient

     Mesh Sort

     Mesh Sort
     • Start with N items in √N×√N mesh
     • Sort into specified order
     • Nearest-neighbor communication only
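The streaming merge sort pipeline (a chain of about log(N) merge stages, each doubling the sorted run length) can be sketched with Python generators, one generator per stage. This is an illustrative software model under stated assumptions: each stage here buffers both input runs for simplicity, whereas a hardware stage would size its buffers as on the slide. The names `merge_stage` and `streaming_sort` are hypothetical.

```python
from itertools import islice


def merge_stage(stream, run_len):
    """Turn a stream of sorted runs of length run_len into runs of 2*run_len."""
    it = iter(stream)
    while True:
        a = list(islice(it, run_len))  # first run of the pair
        b = list(islice(it, run_len))  # second run (may be short or empty)
        if not a:
            return
        i = j = 0
        while i < len(a) and j < len(b):
            if a[i] <= b[j]:
                yield a[i]
                i += 1
            else:
                yield b[j]
                j += 1
        yield from a[i:]
        yield from b[j:]


def streaming_sort(data):
    """Chain merge stages with run lengths 1, 2, 4, ... until one run remains."""
    n = len(data)
    stream = iter(data)
    run_len = 1
    while run_len < n:
        stream = merge_stage(stream, run_len)  # one compare/switch stage
        run_len *= 2
    return list(stream)


print(streaming_sort([5, 3, 8, 1, 9, 2, 7]))
```

The stages are lazy, so data flows through the chain element by element, mirroring the pipe-and-filter structure rather than sorting in batch.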

  5. Observation 1
     • Can sort m things on linear array in O(m) time
       – Perform Parallel Bubble sort in m steps
       – i.e. alternate odd/even swap pairings

     Shearsort
     • Algorithm: alternate sorting rows and columns for log(N)+1 steps
       – i.e. sort rows on odd steps; columns on even steps
       – Sort odd rows ascending, even rows descending
       – Can use even/odd swapping for row/column sorts
     • O(√N log(N))

     Simplifying Lemma
     • 0-1 Sorting Lemma: If an oblivious comparison-exchange algorithm sorts all input sets consisting solely of 0's and 1's, then it sorts all input sets with arbitrary values
       – proof in Leighton
     • Odd/even swapping is an oblivious comparison-exchange

     Shearsort Works?
     • General form after column sort:
       – 0 rows
       – Mixed (dirty) rows
       – 1 rows
     • Consider all row pairs:
       – 3 cases
         • More zeros, more ones, equal number
       – Row sort puts all zeros on one side, ones on other
       – Column sort → one of the pair ends up all ones/zeros
       – Therefore, each row/column sort cuts the number of "dirty" rows in half

     Shearsort Works?
       Dirty rows:         10001000   10101001
       After row sort:     00000011   11110000
       After column sort:  00000000   11110011

     Rounding up Steps
     • Each sort m=√N steps
     • log(√N) row/column sorts to remove dirty rows
     • 2 log(√N) = log(N)
     • Total steps: √N log(N)
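The shearsort procedure above can be simulated on a square grid, using odd/even transposition (the parallel bubble sort of Observation 1) for every row and column sort. This is a sequential sketch under stated assumptions: rows are 0-indexed, so row 0 plays the role of the first (ascending) row, and the phase count 2*ceil(log2(√N))+1 equals log(N)+1 when √N is a power of two. The function names are hypothetical.

```python
import math


def odd_even_transposition_sort(values, descending=False):
    """Sort m values in m steps, alternating even and odd neighbor pairings."""
    a = list(values)
    m = len(a)
    for step in range(m):
        for i in range(step % 2, m - 1, 2):
            out_of_order = a[i] < a[i + 1] if descending else a[i] > a[i + 1]
            if out_of_order:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


def shearsort(grid):
    """Sort a square grid into snake order by alternating row/column sorts."""
    n = len(grid)
    g = [list(row) for row in grid]
    phases = 2 * math.ceil(math.log2(n)) + 1 if n > 1 else 1
    for phase in range(1, phases + 1):
        if phase % 2 == 1:
            # Row phase: alternate directions so the result snakes.
            for r in range(n):
                g[r] = odd_even_transposition_sort(g[r], descending=(r % 2 == 1))
        else:
            # Column phase: every column sorted top to bottom.
            for c in range(n):
                col = odd_even_transposition_sort([g[r][c] for r in range(n)])
                for r in range(n):
                    g[r][c] = col[r]
    return g


result = shearsort([[15, 3, 9, 1], [7, 12, 0, 6], [11, 2, 14, 5], [8, 13, 4, 10]])
for row in result:
    print(row)
```

The output is sorted in snake order: read the first row left to right, the next right to left, and so on.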

  6. Shear Sort Analysis
     • Area N
     • Time √N log(N)
       – Best could hope to do is √N w/ nearest-neighbor connections in 2D world
       – Asymptotically in any 2D world
     • Work: N^1.5 log(N)

     Mesh Sort
     • Can do Mesh sort in O(√N) steps
       – Best could hope to do
     • More complicated…see Leighton

     Extend to 3D Array
     • Can sort N numbers on N^(1/3) × N^(1/3) × N^(1/3) array in O(N^(1/3)) steps
       – Sort zx into zx order
       – Sort yz into zy-order
       – Sort xy into yx-order (reversing order on every-other plane)
       – Two-steps of odd/even merging within each z-line
       – Sort xy into yx-order

     Sorting ∝ Movement
     • If you believe
       – We only have 3 dimensions
       – Signal transport is bounded by speed of light
     • This is asymptotically tight
       – Cannot do any better.
       – Will take O(N^(1/3)) time just to transport an item from start location to destination

     Sorting Networks

     Sorting Network
     • Build a spatial sorting network: (from Knuth)
     • Too big, too fast? → bit serial datapath elements?
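The work figures in these analyses follow the convention used throughout the deck, work = area × time. As a worked check, the shear sort line reproduces the slide's figure; the last two lines are derived here under the same convention, assuming N cells of area in each case, and are not stated on the slides.

```latex
\begin{align*}
\text{Work} &= \text{Area} \times \text{Time} \\
\text{Shear sort:}\quad & N \cdot \sqrt{N}\,\log N = N^{1.5}\log N \\
\text{O}(\sqrt{N})\text{ mesh sort:}\quad & N \cdot O(\sqrt{N}) = O(N^{1.5}) \\
\text{3D array sort:}\quad & N \cdot O(N^{1/3}) = O(N^{4/3})
\end{align*}
```

All three exceed the O(N log N) sequential work, so these designs buy speed with extra hardware rather than being work efficient.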

  7. Systematic Construction: Step 1: Merge Network
     • Recursively swap large/small elements from halves of network
       – Merge in log(N) steps
     [Figure labels: A0, A1, A2, A3, B3, B2, B1, B0]

     Systematic Construction: Sorting Network
     • Perform recursive merging
       – log(N) merge networks
         • Of depth log(N), log(N)-1…
       – Depth: O(log^2(N))
       – Area: O(N log^2(N))
     • Can be used in pipelined fashion
       – Only using O(N) hardware exclusively per step

     Parallel Merge Sort

     Parallel Merge Sort
     • With O(N) processors
     • Sort in O(log^2(N)) steps
     • Sequentially executing the O(log^2(N)) pairwise swaps of the sorting network
     • Randomized algorithm
       – Works in O(log(N)) steps
         • With high probability
       – …see Leighton

     Admin
     • Wednesday, Friday: NC
     • Project: two things due in two weeks
       – Sequential baseline
       – Proposed plan of attack
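The merge-network construction above (swap large/small elements between halves, then recurse on each half) can be sketched in software. The slides do not name the network; this sketch assumes Batcher's bitonic construction, which fits the description of a log(N)-step merge and an O(log^2(N))-depth sorter. It requires the input length to be a power of two, and the function names are hypothetical.

```python
def bitonic_merge(a, lo, n, ascending):
    """Merge the bitonic sequence a[lo:lo+n] in place, in log2(n) levels."""
    if n > 1:
        half = n // 2
        # One level of the merge network: compare-exchange across the halves.
        # Within a level these comparators are independent (parallel in hardware).
        for i in range(lo, lo + half):
            if (a[i] > a[i + half]) == ascending:
                a[i], a[i + half] = a[i + half], a[i]
        bitonic_merge(a, lo, half, ascending)
        bitonic_merge(a, lo + half, half, ascending)


def bitonic_sort(a, lo=0, n=None, ascending=True):
    """Sort a[lo:lo+n] in place; n must be a power of two."""
    if n is None:
        n = len(a)
    if n > 1:
        half = n // 2
        # Sort the halves in opposite directions to form a bitonic sequence.
        bitonic_sort(a, lo, half, True)
        bitonic_sort(a, lo + half, half, False)
        bitonic_merge(a, lo, n, ascending)


data = [7, 3, 6, 0, 5, 1, 4, 2]
bitonic_sort(data)
print(data)
```

The comparison pattern never depends on the data values, which is what makes it an oblivious network that could be built spatially or executed step by step on O(N) processors.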
