Principle of the radix sort: sorts a list of fixed-size integer keys

Principle of the radix sort
• Sorts a list of fixed-size integer keys
  - Separates the key into individual digits of some radix
  - Sorts digit-by-digit
• In this case we use a least significant digit sort
  - Sorts first the least significant digit of the key, then the next and so on
  - Provides a stable sort
• Radix sort is O(kn) where n is the number of keys and k the number of digits in each key
  - Grows linearly with the data set size, assuming constant memory performance
  - It is not necessarily the fastest sort available
Acknowledgement: These slides were provided by Ben Gaster and Lee Howes of AMD
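Separating a key into digits of some radix can be sketched in a few lines of Python. This is only an illustration for power-of-two radices; the function name `digits_of` and its parameters are assumptions made for this example, not part of the slides.

```python
def digits_of(key, radix_bits, num_digits):
    """Split a non-negative integer key into digits, least significant first.

    radix_bits is the width of one digit in bits (1 for radix-2, 4 for radix-16).
    """
    mask = (1 << radix_bits) - 1
    return [(key >> (i * radix_bits)) & mask for i in range(num_digits)]


# The key 0x2A7 in radix-16 has the digits 7, 10 (0xA) and 2, least significant first.
print(digits_of(0x2A7, radix_bits=4, num_digits=3))  # [7, 10, 2]
```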

Steps in computation
• Take the least significant digit of the key
• Group the keys based on the value of that digit
  - Use a counting sort to achieve this
  - Maintain the ordering of keys within a value of the digit
• Repeat the process moving on to the next digit
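The steps above can be sketched as a sequential least-significant-digit radix sort built on a stable counting sort. This is a minimal CPU sketch in Python, not the GPU implementation; the function names and the `key_bits` and `radix_bits` parameters are assumptions chosen for the example.

```python
def counting_sort_by_digit(keys, shift, radix_bits):
    """Stable counting sort of keys on the digit starting at bit `shift`."""
    mask = (1 << radix_bits) - 1
    counts = [0] * (1 << radix_bits)
    for k in keys:
        counts[(k >> shift) & mask] += 1

    # Exclusive prefix sum of the counts gives the first output slot of each digit value.
    offsets = []
    total = 0
    for c in counts:
        offsets.append(total)
        total += c

    # Scanning the input in order keeps keys with equal digits in their original order.
    out = [0] * len(keys)
    for k in keys:
        d = (k >> shift) & mask
        out[offsets[d]] = k
        offsets[d] += 1
    return out


def lsd_radix_sort(keys, key_bits, radix_bits):
    """Sort non-negative integer keys of at most key_bits bits, one digit per pass."""
    for shift in range(0, key_bits, radix_bits):
        keys = counting_sort_by_digit(keys, shift, radix_bits)
    return keys


print(lsd_radix_sort([170, 45, 75, 90, 802, 24, 2, 66], key_bits=12, radix_bits=4))
```

Each pass touches every key once, and there is one pass per digit, which is where the O(kn) cost comes from.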

Sorting a simple radix-2 key: first digit
• We wish to sort the following set of keys in radix-2
  - Each digit is 1 bit wide
  - Sort the least significant bit first
First iteration, least significant digit (or bit):
  Input keys (the original data):  3 1 1 0 3 2 2 1 1 0 0 2
  Digit to sort:                   1 1 1 0 1 0 0 1 1 0 0 0
  Number of 1s:                    0 1 2 3 3 4 4 4 5 6 6 6
    (count the number of 1s and 0s in the set; 1s shown)
  Order keys by digit:             0 2 2 0 0 2 3 1 1 3 1 1
    (sort the keys based on the first digit)

Sorting a simple radix-2 key: second digit
• We wish to sort the following set of keys in radix-2
  - Each digit is 1 bit wide
  - Sort the least significant bit first
Second iteration, second-least significant digit (or bit):
  Input keys (the output of the previous iteration):  0 2 2 0 0 2 3 1 1 3 1 1
  Digit to sort:                                      0 1 1 0 0 1 1 0 0 1 0 0
  Number of 1s:                                       0 0 1 2 2 2 3 4 4 4 5 5
    (count the number of 1s and 0s in the set; 1s shown)
  Order keys by digit:                                0 0 0 1 1 1 1 2 2 2 3 3
    (sort the keys based on the second digit)
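The two iterations can be reproduced with a short Python sketch. It assumes that the "number of 1s" row is an exclusive running count (the number of 1s strictly before each position), which matches the values shown; the helper names `exclusive_scan` and `split_by_bit` are made up for this example.

```python
def exclusive_scan(values):
    """Return (running totals before each element, grand total)."""
    out, total = [], 0
    for v in values:
        out.append(total)
        total += v
    return out, total


def split_by_bit(keys, bit):
    """Stable 1-bit split: keys with a 0 in `bit` first, then keys with a 1."""
    flags = [(k >> bit) & 1 for k in keys]
    ones_before, total_ones = exclusive_scan(flags)
    total_zeros = len(keys) - total_ones
    out = [0] * len(keys)
    for i, k in enumerate(keys):
        if flags[i]:
            dest = total_zeros + ones_before[i]  # 1s go after all the 0s
        else:
            dest = i - ones_before[i]            # 0s before i = i minus 1s before i
        out[dest] = k
    return out, ones_before


keys = [3, 1, 1, 0, 3, 2, 2, 1, 1, 0, 0, 2]
first, ones1 = split_by_bit(keys, 0)
second, ones2 = split_by_bit(first, 1)
print(ones1)   # [0, 1, 2, 3, 3, 4, 4, 4, 5, 6, 6, 6]
print(first)   # [0, 2, 2, 0, 0, 2, 3, 1, 1, 3, 1, 1]
print(ones2)   # [0, 0, 1, 2, 2, 2, 3, 4, 4, 4, 5, 5]
print(second)  # [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3]
```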

Implementing on the GPU
• Sort the keys in radix-16
  - 4 bit chunks on each sort pass
  - Only 16 counting buckets, easily within the scope of an efficient counting sort
• Divide the set of keys into chunks
  - Sort into 16 buckets in local memory
  - More efficient global memory traffic scattering into only 16 address ranges
• Global prefix sum to obtain write locations

High level view
Dataset
  - Divide into blocks
  - Sort each block into 16 bins based on current digit
  - Compute global offsets for bins
  - Write partially sorted data into output

Sorting individual blocks
• Each block in a radix-16 sort is sorted using four iterations of the binary sort
  - Take a single block of 512 elements
  - Perform 1-bit prefix sum of 1s (we know the number of 0s as location minus number of 1s)
  - Re-order data into 0s and 1s
  - Repeat 4 times until we have our data sorted by the 4-bit digit
  - Compute counts in each bin, giving us a histogram for the block:
    33 34 24 35 12 49 52 28 35 39 29 33 22 35 20 42
  - Store the sorted block and the histogram data
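A sequential Python sketch of the per-block step follows. The stable 1-bit split is written here with list comprehensions as a stand-in for the prefix-sum reorder described above (it produces the same ordering); the names `split_by_bit` and `sort_block_by_digit` and the tiny 6-element block are assumptions for illustration only.

```python
def split_by_bit(keys, bit):
    """Stable 1-bit split: keys with a 0 in `bit` first, then keys with a 1."""
    zeros = [k for k in keys if not (k >> bit) & 1]
    ones = [k for k in keys if (k >> bit) & 1]
    return zeros + ones


def sort_block_by_digit(block, shift, radix_bits=4):
    """Sort one block on a radix_bits-wide digit using one binary pass per bit."""
    for b in range(radix_bits):
        block = split_by_bit(block, shift + b)

    # Count how many keys fall in each bin: the histogram for this block.
    mask = (1 << radix_bits) - 1
    histogram = [0] * (1 << radix_bits)
    for k in block:
        histogram[(k >> shift) & mask] += 1
    return block, histogram


block = [0x1F, 0x03, 0x2A, 0x13, 0x0A, 0x21]
sorted_block, histogram = sort_block_by_digit(block, shift=0)
print([hex(k) for k in sorted_block])  # ['0x21', '0x3', '0x13', '0x2a', '0xa', '0x1f']
print(histogram)
```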

The global prefix sum
• After we have prefix sums for each block we need the global versions
  - Each work group needs to know, for each radix, where its output range starts
• For each group we had the histogram representing the number of each radix in the block:
  Group 1:  33 34 24 35 12 49 52 28 35 39 29 33 22 35 20 42
  Group 2:  20 18 68 45 11 40 50 31 54 50 12 30 27 31 17  8
  Group 3:  10 18 58 65 11 25 45 26 54 55 17 40 32 20 30  6
• We need to perform a global prefix sum across these work groups to obtain global addresses (only the first entries are shown):
  Group 1:   0  73 143 293 438 472
  Group 2:  33 107 167 328 450
  Group 3:  53 125 235 373 461
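One way to sketch the global prefix sum in Python is to walk the histograms radix by radix and, within each radix, group by group, so that all keys of radix 0 come first (group 1, then group 2, and so on), then all keys of radix 1. The function name `global_offsets` and the small 4-bin histograms below are made up for the example; they are not the slide's data.

```python
def global_offsets(histograms):
    """histograms[g][r] = number of keys with radix r in group g.

    Returns offsets[g][r] = global position where group g starts writing radix r.
    """
    num_groups = len(histograms)
    num_bins = len(histograms[0])
    offsets = [[0] * num_bins for _ in range(num_groups)]
    running = 0
    for radix in range(num_bins):
        for group in range(num_groups):
            offsets[group][radix] = running
            running += histograms[group][radix]
    return offsets


# Hypothetical histograms: 3 groups, 4 bins each.
histograms = [
    [3, 1, 2, 0],
    [1, 2, 0, 3],
    [2, 2, 1, 1],
]
for row in global_offsets(histograms):
    print(row)
# [0, 6, 11, 14]
# [3, 7, 13, 14]
# [4, 9, 13, 17]
```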

The global prefix sum (continued)
• In the global addresses above, radix 3 starts at location 328 for the 2nd (green) group

Compute global sort
• We have 16 local bins and 16 global bins now for the global sorting phase
• We need to perform a local prefix sum on the block's histogram to obtain local offsets
  Block histogram:   33 34 24 35  12  49  52  28  35  39  29  33  22  35  20  42
  Local prefix sum:   0 33 67 91 126 138 187 239 267 302 341 370 403 425 460 480
• Destination of data computed as: (global sum for radix) + (index of value in block - local sum)
  - Local sums can put us within this range
  - Index of value in block minus local sum tells us the exact location

Compute global sort (continued)
• Of course we compute this for each radix in parallel in the block, making the output highly efficient
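The destination computation can be put together into one sequential Python sketch of a whole pass. Python's stable `sorted` stands in for the local block sort, the loops run one after another rather than in parallel, and the name `scatter_pass` and its parameters are assumptions for this example.

```python
def scatter_pass(keys, shift, block_size, radix_bits=4):
    """One radix pass: sort blocks locally, then scatter to global positions."""
    mask = (1 << radix_bits) - 1
    num_bins = 1 << radix_bits

    def digit(k):
        return (k >> shift) & mask

    # Local phase: each block sorted by the current digit, plus its histogram.
    blocks = [sorted(keys[i:i + block_size], key=digit)
              for i in range(0, len(keys), block_size)]
    histograms = []
    for block in blocks:
        h = [0] * num_bins
        for k in block:
            h[digit(k)] += 1
        histograms.append(h)

    # Global prefix sum: radix by radix, group by group within a radix.
    global_start = [[0] * num_bins for _ in blocks]
    running = 0
    for r in range(num_bins):
        for g in range(len(blocks)):
            global_start[g][r] = running
            running += histograms[g][r]

    out = [None] * len(keys)
    for g, block in enumerate(blocks):
        # Local exclusive prefix sum: where each radix starts inside the sorted block.
        local_start, total = [], 0
        for c in histograms[g]:
            local_start.append(total)
            total += c
        for index, k in enumerate(block):
            r = digit(k)
            # destination = global sum for radix + (index in block - local sum)
            out[global_start[g][r] + (index - local_start[r])] = k
    return out


keys = [0x3A, 0x12, 0xF0, 0x07, 0x3B, 0x12, 0x80, 0x7F, 0x01, 0xC4]
low = scatter_pass(keys, shift=0, block_size=4)
print(scatter_pass(low, shift=4, block_size=4) == sorted(keys))  # True
```

Running the pass once per 4-bit digit, least significant first, sorts 8-bit keys in two passes.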

Producing an efficient local prefix sum
• Each sorting pass requires a 1 bit prefix sum to be performed in local memory
• We use an efficient barrier-free local prefix sum for blocks 2x the wavefront size:
  - Take the block of data and load into local memory:
    1 1 1 0 1 0 0 1
  - Write 0s into earlier local memory locations:
    0 0 0 0 0 0 0 0 1 1 1 0 1 0 0 1
  - Add [index] to [index - power of 2] with increasing powers of 2
  - The added 0s allow us to do this without conditionals
  - The stride of 2 means that we can cover more elements with the wavefront and fix up at the end
• This can be completely barrier free in a single wavefront
• When we want to cross a wavefront boundary we must be more careful
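The zero-padded scan can be simulated sequentially in Python. This sketch assumes that all lanes of a wavefront execute in lockstep, which is modelled by reading from a snapshot of the buffer before any lane writes; it produces an inclusive prefix sum and does not model the stride-of-2 trick or the fix-up step. The name `padded_inclusive_scan` is an assumption for the example.

```python
def padded_inclusive_scan(values):
    """Inclusive prefix sum using zero padding instead of bounds checks."""
    n = len(values)
    # n zeros in front of the data: reads at [index - offset] never fall off the start.
    buf = [0] * n + list(values)
    offset = 1
    while offset < n:
        prev = list(buf)  # lockstep model: every lane reads before any lane writes
        for lane in range(n):
            i = n + lane
            # No conditional needed: lanes near the start simply add a padded 0.
            buf[i] = prev[i] + prev[i - offset]
        offset *= 2
    return buf[n:]


print(padded_inclusive_scan([1, 1, 1, 0, 1, 0, 0, 1]))  # [1, 2, 3, 3, 4, 4, 4, 5]
```

Subtracting each element's own flag from the result gives the exclusive count, that is, the number of 1s strictly before each position.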

