
Sorting Library: Xiaoming Li, María Jesús Garzarán, and David Padua (PowerPoint presentation)



  1. Software Engineering Seminar, Stephan Semmler: A Dynamically Tuned Sorting Library, Xiaoming Li, María Jesús Garzarán, and David Padua (2004)

  2. The Sorting Library (diagram: Installation: Hardware, Empirical Search, Optimized Sorting Algorithms, Machine Learning; Runtime: Input data → Fastest Algorithm)

  3. Overview • Sorting Algorithms • Optimizing the Algorithms • Factors of Performance • The Library • Results • Final Words

  4. Merge Sort (example: 4 1 3 2 → 1 4 | 2 3 → 1 2 3 4) • Divide and Conquer • Runtime O(n·log(n)) • Needs additional memory to merge
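To make the divide-and-conquer steps and the extra merge buffer concrete, here is a minimal top-down merge sort. It is an illustrative sketch, not the library's implementation; the name `merge_sort` is hypothetical.

```python
def merge_sort(keys):
    """Top-down merge sort; returns a new sorted list (O(n) extra memory)."""
    if len(keys) <= 1:
        return list(keys)
    mid = len(keys) // 2
    left = merge_sort(keys[:mid])    # divide: sort each half recursively
    right = merge_sort(keys[mid:])
    merged = []                      # the additional buffer the merge needs
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # <= keeps the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])          # one side is exhausted; copy the rest
    merged.extend(right[j:])
    return merged


print(merge_sort([4, 1, 3, 2]))  # [1, 2, 3, 4]
```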

  5. Multiway Merge Sort • Partition data into p subsets • Sort each subset • Merge the p sorted subsets using a heap (figure: p sorted subsets feeding a heap with entries 1 … p)
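The three steps can be sketched with Python's standard `heapq` module as the heap. `multiway_merge_sort` and its default `p=4` are hypothetical, and each subset is sorted with the built-in `sorted` for brevity; the point is the heap-based p-way merge.

```python
import heapq


def multiway_merge_sort(keys, p=4):
    """Split into p subsets, sort each, then merge them with a min-heap."""
    n = len(keys)
    if n == 0:
        return []
    size = -(-n // p)  # ceil(n / p) keys per subset
    subsets = [sorted(keys[i:i + size]) for i in range(0, n, size)]
    # Heap entries: (current key, subset index, position within that subset)
    heap = [(s[0], idx, 0) for idx, s in enumerate(subsets)]
    heapq.heapify(heap)
    out = []
    while heap:
        key, idx, pos = heapq.heappop(heap)   # smallest head among all subsets
        out.append(key)
        if pos + 1 < len(subsets[idx]):       # refill from the same subset
            heapq.heappush(heap, (subsets[idx][pos + 1], idx, pos + 1))
    return out


print(multiway_merge_sort([5, 3, 9, 1, 7, 2, 8], p=3))  # [1, 2, 3, 5, 7, 8, 9]
```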

  6. Quicksort (figure: step-by-step partitioning of the keys 0–9) • Average runtime O(n·log(n)) • In-place • Worst-case runtime O(n²)
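A minimal in-place quicksort, assuming the Lomuto partition scheme with the last element as pivot (one of several common variants; the slides do not say which one the library uses). With this pivot choice, already-sorted input triggers the O(n²) worst case.

```python
def quicksort(a, lo=0, hi=None):
    """Sort list a[lo..hi] in place (Lomuto partition, last element as pivot)."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return a
    pivot = a[hi]
    i = lo                              # a[lo:i] holds keys smaller than pivot
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]           # move pivot to its final position
    quicksort(a, lo, i - 1)             # keys < pivot
    quicksort(a, i + 1, hi)             # keys >= pivot
    return a


print(quicksort([3, 9, 0, 7, 1, 8, 2, 6, 4, 5]))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```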

  7. Radix Sort: input keys 322 13 44 142 431 34 1 23; buckets 0–4

  8. Radix Sort, pass 1 (last digit): bucket 1: 431, 1 • bucket 2: 322, 142 • bucket 3: 13, 23 • bucket 4: 44, 34 → 431 1 322 142 13 23 44 34

  9. Radix Sort: 431 1 322 142 13 23 44 34; buckets 0–4

  10. Radix Sort, pass 2 (middle digit): bucket 0: 1 • bucket 1: 13 • bucket 2: 322, 23 • bucket 3: 431, 34 • bucket 4: 142, 44 → 1 13 322 23 431 34 142 44

  11. Radix Sort: 1 13 322 23 431 34 142 44; buckets 0–4

  12. Radix Sort, pass 3 (first digit): bucket 0: 1, 13, 23, 34, 44 • bucket 1: 142 • bucket 3: 322 • bucket 4: 431 → 1 13 23 34 44 142 322 431

  13. Radix Sort • Non-comparative sorting algorithm • Needs integer keys • Linear time complexity O(n) for a fixed key length • Performance highly dependent on the key distribution
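A least-significant-digit radix sort for non-negative integers, as a sketch of the bucket passes walked through in slides 7–12. `radix_sort` is a hypothetical name; with `base=10` the three passes reproduce that trace (buckets 5–9 stay empty because the example keys only use the digits 0–4).

```python
def radix_sort(keys, base=10):
    """LSD radix sort for non-negative integer keys; returns a new list."""
    if not keys:
        return []
    out = list(keys)
    largest = max(out)
    exp = 1                                   # weight of the current digit
    while largest // exp > 0:                 # one pass per digit of the largest key
        buckets = [[] for _ in range(base)]
        for k in out:
            buckets[(k // exp) % base].append(k)   # stable distribution
        out = [k for b in buckets for k in b]      # collect buckets 0..base-1
        exp *= base
    return out


print(radix_sort([322, 13, 44, 142, 431, 34, 1, 23]))
# [1, 13, 23, 34, 44, 142, 322, 431]
```

Each pass costs O(n + base), so the total is O(d·(n + base)) for d digits, which is linear in n only when the key length is fixed.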

  14. Insertion Sort (figure: step-by-step insertion sort of the keys 1–6) • Average case O(n²) • Best case O(n) for sorted data • Good for small partitions
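A minimal insertion sort that works on a slice `a[lo:hi]`, which is how it would be applied to a small partition inside another algorithm. The name and the `lo`/`hi` parameters are illustrative assumptions.

```python
def insertion_sort(a, lo=0, hi=None):
    """Sort a[lo:hi] in place; only O(n) work when the slice is already sorted."""
    if hi is None:
        hi = len(a)
    for i in range(lo + 1, hi):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:   # shift larger keys one slot right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key                  # drop the key into the gap
    return a


print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```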

  15. Sorting Networks (figure: 8-input network, wires 0–7, turning unsorted input into sorted output) • A fixed, hardwired sequence of compare-exchange operations • Only appropriate for very small amounts of data
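To show what "hardwired" means, here is a smaller 4-input network than the 8-input one in the figure. It uses the standard 5-comparator network for four keys; the sequence of comparisons is fixed and does not depend on the data. `NETWORK_4` and `sort4` are hypothetical names.

```python
# Comparator pairs of a standard 5-comparator sorting network for 4 inputs.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def sort4(keys):
    """Sort exactly 4 keys using a fixed sequence of compare-exchange steps."""
    a = list(keys)
    for i, j in NETWORK_4:
        if a[i] > a[j]:             # compare-exchange: smaller key goes to wire i
            a[i], a[j] = a[j], a[i]
    return a


print(sort4([3, 1, 4, 2]))  # [1, 2, 3, 4]
```

Because there are no data-dependent loops, a C version can keep all four keys in registers and compile to straight-line code, which is why networks only pay off for very small partitions.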

  16. Optimizing Algorithms • For a given Architecture – Cache Size – Registers – Cache Line Size • Which Parameters to tune? (diagram: Hardware, Sorting Algorithms → Empirical Search → Optimized Algorithms, Parameters)
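Empirical search boils down to "run each candidate on this machine and keep the fastest". The sketch below assumes a hypothetical `run(value)` callable that executes the routine being tuned once with a given parameter value; it takes the best of a few timings per candidate using `time.perf_counter`.

```python
import time


def _time_once(run, value):
    """Wall-clock time of one call to run(value)."""
    start = time.perf_counter()
    run(value)
    return time.perf_counter() - start


def empirical_search(candidates, run, trials=3):
    """Return the candidate value whose best-of-`trials` time is lowest."""
    best_value, best_time = None, float("inf")
    for value in candidates:
        t = min(_time_once(run, value) for _ in range(trials))
        if t < best_time:
            best_value, best_time = value, t
    return best_value
```

For example, tuning a small-partition threshold could look like `empirical_search([4, 8, 16, 32], lambda t: tuned_sort(data[:], threshold=t))`, where `tuned_sort` and `data` are placeholders. Taking the minimum of several runs filters out noise from other processes.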

  17. Tuning Quicksort • Small partitions – Insertion Sort – Sorting Networks • Threshold for small partitions • Sort each small partition immediately or all at once at the end
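A sketch of the "at the end" option: quicksort stops at partitions of at most `threshold` keys, and a single insertion-sort pass finishes the array. The default `threshold=16` is an arbitrary placeholder (the library would pick it by empirical search), and the middle-element pivot is an assumption; sorting networks are not shown.

```python
def _partition_sort(a, lo, hi, threshold):
    """Partition until every remaining sub-range has <= threshold keys."""
    while hi - lo + 1 > threshold:
        mid = (lo + hi) // 2
        a[mid], a[hi] = a[hi], a[mid]      # use the middle key as pivot
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]          # pivot is now in its final slot
        # Recurse into the smaller side and loop on the larger one (bounded stack)
        if i - lo < hi - i:
            _partition_sort(a, lo, i - 1, threshold)
            lo = i + 1
        else:
            _partition_sort(a, i + 1, hi, threshold)
            hi = i - 1


def tuned_quicksort(a, threshold=16):
    """Quicksort that leaves small partitions unsorted, then runs one
    insertion-sort pass over the whole array."""
    _partition_sort(a, 0, len(a) - 1, threshold)
    # Every key is already inside its small partition, so it moves only a short way.
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a


print(tuned_quicksort([9, 8, 7, 6, 5, 4, 3, 2, 1, 0], threshold=4))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```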

  18. Tuning Radix sort (CC-radix sort) • Create sub-buckets if the data is too large for the cache • Apply radix sort to each sub-bucket • Insertion sort / Sorting networks for small partitions
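One way to read the CC-radix idea: while a bucket is larger than what fits in cache, split it on its most significant remaining digit; once it fits, finish it with LSD radix sort; tiny buckets go to a simple sort. This is a sketch under assumptions: `cache_keys`, `digit_bits`, `key_bits`, and `small` are hypothetical parameters, and the built-in `sorted` stands in for insertion sort or a sorting network.

```python
def _lsd(keys, bits, digit_bits):
    """LSD radix sort on the low `bits` bits (higher bits are equal here)."""
    shift = 0
    while shift < bits:
        d = min(digit_bits, bits - shift)
        buckets = [[] for _ in range(1 << d)]
        for k in keys:
            buckets[(k >> shift) & ((1 << d) - 1)].append(k)
        keys = [k for b in buckets for k in b]
        shift += d
    return keys


def _cc(keys, bits_left, digit_bits, cache_keys, small):
    if len(keys) <= small or bits_left <= 0:
        return sorted(keys)                   # stand-in for insertion sort / network
    if len(keys) <= cache_keys:
        return _lsd(keys, bits_left, digit_bits)   # bucket "fits in cache"
    # Too large: create sub-buckets on the most significant remaining digit
    d = min(digit_bits, bits_left)
    shift = bits_left - d
    buckets = [[] for _ in range(1 << d)]
    for k in keys:
        buckets[(k >> shift) & ((1 << d) - 1)].append(k)
    out = []
    for b in buckets:
        out.extend(_cc(b, shift, digit_bits, cache_keys, small))
    return out


def cc_radix_sort(keys, key_bits=32, digit_bits=8, cache_keys=1 << 16, small=16):
    """Sketch for non-negative integers below 2**key_bits."""
    return _cc(list(keys), key_bits, digit_bits, cache_keys, small)
```

In a real implementation, `cache_keys` would be derived from the measured cache size, and the buckets would be arrays rather than Python lists.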

  19. Tuning Multiway Merge sort • Number of subsets p • Heap operation: find the smallest child – Adapt the fanout so that all children of a node fit into one cache line (figure: p subsets feeding a heap whose children share a cache line)
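The fanout idea corresponds to a d-ary heap, in which the children of node `i` are stored next to each other at indices `d*i + 1` to `d*i + d`. For example, assuming 64-byte cache lines and 4-byte keys, a fanout of 16 (with suitable alignment) would place all children of a node in one line, so "find smallest child" touches a single line. The class below is a hypothetical sketch that only illustrates the indexing; Python lists hold references, so it does not reproduce the cache behaviour.

```python
class DaryHeap:
    """Min-heap where every node has `fanout` children stored contiguously."""

    def __init__(self, fanout=4):
        self.d = fanout
        self.a = []

    def __len__(self):
        return len(self.a)

    def push(self, item):
        a, d = self.a, self.d
        a.append(item)
        i = len(a) - 1
        while i > 0:                      # sift up towards the root
            parent = (i - 1) // d
            if a[parent] <= a[i]:
                break
            a[parent], a[i] = a[i], a[parent]
            i = parent

    def pop(self):
        a, d = self.a, self.d
        top = a[0]
        last = a.pop()
        if a:
            a[0] = last
            i = 0
            while True:                   # sift down
                first = d * i + 1         # children are a[first : first + d]
                if first >= len(a):
                    break
                # "Find smallest child": scan the d adjacent children
                child = min(range(first, min(first + d, len(a))), key=a.__getitem__)
                if a[i] <= a[child]:
                    break
                a[i], a[child] = a[child], a[i]
                i = child
        return top
```

For the multiway merge, the heap items would be `(key, subset_index, position)` tuples, as in the plain binary-heap version.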

  20. (diagram: Hardware, Sorting Algorithms → Empirical Search → Optimized Algorithms, Parameters)

  21. Comparison of Sorting Algorithms (figure: execution time on an Intel PIII Xeon in G cycles, G = 2^30, vs. number of keys in M, M = 2^20, for Quicksort, Radix sort, and Merge sort)

  22. Varied Standard Deviations (figure: two panels, Intel PIII Xeon with 2M and 12M keys; execution time in cycles vs. standard deviation of the keys, 10^2 to 10^8, for Quicksort, Radix sort, and Merge sort)

  23. Impact of Input Data • Number of keys does not affect the relative performance • Standard deviation matters! – Distribution among buckets in radix sort – Fewer operations on the heap in multiway merge sort • Problems with standard deviation – Only indirectly related to the distribution of digits in the keys – Expensive to compute – Use entropy instead

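  24. Entropy • Expected value of the information: $H = -\sum_j Q_j \log_2 Q_j$, where $Q_j$ is the fraction of keys whose digit at a given position has value $j$ • Example keys 124, 135, 136 → entropy vector (0, 0.9, 1.58), one entry per digit position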
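The entropy vector from the example can be computed with one pass per digit position, using `collections.Counter` for the digit frequencies $Q_j$. This sketch uses decimal digits to match the slide's example; `entropy_vector` and its parameters are hypothetical names.

```python
from collections import Counter
from math import log2


def entropy_vector(keys, num_digits, base=10):
    """Entropy of each digit position, most significant digit first."""
    n = len(keys)
    vector = []
    for pos in range(num_digits - 1, -1, -1):
        counts = Counter((k // base ** pos) % base for k in keys)
        # H = -sum_j Q_j * log2(Q_j), written as Q_j * log2(1 / Q_j) with Q_j = c / n
        vector.append(sum((c / n) * log2(n / c) for c in counts.values()))
    return vector


print([round(h, 2) for h in entropy_vector([124, 135, 136], num_digits=3)])
# [0.0, 0.92, 1.58]
```

A position where every key has the same digit contributes 0, while three distinct digits give $\log_2 3 \approx 1.58$, matching the slide's vector up to rounding.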

  25. Building the Library (diagram: Sorting Algorithms (Quicksort, Radix sort, Merge sort) and Empirical Search → Optimized Algorithms and Parameters; Input sizes and Input data sets (Entropy) → Machine Learning → Prediction function)

  26. Runtime Procedure (diagram: Input data → Input size, Entropy → Prediction function → Select Best Algorithm → Sorting Algorithm → Sorted data)
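The runtime procedure can be sketched as "measure the input, ask the prediction function, dispatch". The slides only say "Machine Learning", so the 1-nearest-neighbour rule below is a swapped-in stand-in, not the library's learner. `make_predictor`, `library_sort`, the training-example format, and the scalar `entropy_of` summary are all hypothetical.

```python
import math


def make_predictor(examples):
    """examples: list of ((num_keys, entropy), algorithm_name) pairs, e.g. the
    fastest algorithm measured per training input at installation time.
    Returns a 1-nearest-neighbour prediction function."""
    def predict(num_keys, entropy):
        def dist(example):
            (n, h), _ = example
            # Compare sizes on a log scale so 1e3 vs 1e6 keys counts as far apart
            return (math.log2(n) - math.log2(num_keys)) ** 2 + (h - entropy) ** 2
        return min(examples, key=dist)[1]
    return predict


def library_sort(keys, predict, algorithms, entropy_of):
    """Measure input size and entropy, predict the best algorithm, run it."""
    choice = predict(max(len(keys), 1), entropy_of(keys))
    return algorithms[choice](keys)
```

The overhead reported on the results slide would come from this extra measurement and prediction step, which is why computing entropy cheaply matters.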

  27. Results (figure: execution time in cycles per key on an AMD Athlon vs. standard deviation, 10^2 to 10^7, for Quicksort, Radix sort, Merge sort, and the library's result) • Library chooses the best algorithm • Overhead of 5% • On average 44% better than the worst algorithm

  28. Final Words • Optimizes sorting algorithms • Works well for unknown input • Overhead for known data • Unclear degree of optimization • Hard-coded decisions • Further work: see next presentation
