A Dynamically Tuned Sorting Library, by Xiaoming Li, María Jesús Garzarán, and David Padua



slide-1
SLIDE 1

A Dynamically Tuned Sorting Library

Xiaoming Li, María Jesús Garzarán, and David Padua (2004)

Stephan Semmler

Software Engineering Seminar

slide-2
SLIDE 2

The Sorting Library

[Diagram: Installation (empirical search; hardware; sorting algorithms → optimized algorithms) and Runtime (machine learning; input data; optimized algorithms → fastest algorithm)]

slide-3
SLIDE 3

Overview

  • Sorting Algorithms
  • Optimizing the Algorithms
  • Factors of Performance
  • The Library
  • Results
  • Final Words
slide-4
SLIDE 4

Merge Sort

  • Divide and Conquer
  • Runtime O(n·log(n))
  • Needs additional memory to merge

[Figure: merge sort example using the keys 1 to 4]

slide-5
SLIDE 5

Multiway Merge Sort

  • Partition data into p subsets
  • Sort each subset
  • Merge p subsets using a heap

[Figure: p sorted subsets (1, 2, …, p) feeding a heap that merges them]
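The three steps above can be sketched in a few lines of Python. This is only an illustration, not the library's code: the function name `multiway_merge_sort` is made up here, Python's built-in `sorted` stands in for whatever algorithm sorts each subset, and `heapq` plays the role of the heap.

```python
import heapq


def multiway_merge_sort(keys, p):
    """Split keys into at most p subsets, sort each, then merge with a heap."""
    if p < 1:
        raise ValueError("p must be at least 1")
    n = len(keys)
    size = max(1, -(-n // p))  # ceil(n / p), never smaller than 1
    # Step 1 and 2: partition into subsets and sort each one
    # (sorted() is a stand-in for the per-subset sorting algorithm).
    subsets = [sorted(keys[i:i + size]) for i in range(0, n, size)]
    # Step 3: the heap holds one entry per subset: (key, subset index, position).
    heap = [(s[0], idx, 0) for idx, s in enumerate(subsets)]
    heapq.heapify(heap)
    merged = []
    while heap:
        key, idx, pos = heapq.heappop(heap)
        merged.append(key)
        if pos + 1 < len(subsets[idx]):
            # Replace the popped key with the next key from the same subset.
            heapq.heappush(heap, (subsets[idx][pos + 1], idx, pos + 1))
    return merged
```

The heap never holds more than p entries, so each output key costs O(log p) heap work.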

slide-6
SLIDE 6

Quicksort

  • Average runtime O(n·log(n))
  • In-place
  • Worst-case runtime O(n²)

[Figure: quicksort partitioning steps on the keys 7 5 6 4 2 3 1 9 8]

slide-7
SLIDE 7

Radix Sort

[Figure: input keys 13 322 44 142 34 431 1 23 above buckets 1 2 3 4]

slide-8
SLIDE 8

Radix Sort

[Figure: the keys 13 322 44 142 34 431 1 23 being distributed into buckets 1 2 3 4 by their last digit]

slide-9
SLIDE 9

Radix Sort

[Figure: keys 13 322 44 142 34 431 1 23 and buckets 1 2 3 4]

slide-10
SLIDE 10

Radix Sort

[Figure: animation frame, keys 13 322 44 142 34 431 1 23 regrouped as 431 322 142 23 44 1 13 34; buckets 1 2 3 4]

slide-11
SLIDE 11

Radix Sort

[Figure: keys 431 322 142 23 44 1 13 34 above buckets 1 2 3 4]

slide-12
SLIDE 12

Radix Sort

[Figure: animation frame for the next pass over the keys 431 322 142 23 44 1 13 34; buckets 1 2 3 4]

slide-13
SLIDE 13

Radix Sort

  • Non-comparative sorting algorithm
  • Needs Integer Keys
  • Linear Time Complexity O(n)
  • Highly dependent on key distribution
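
The properties above can be illustrated with a small least-significant-digit radix sort. This is a minimal sketch, assuming non-negative integer keys and decimal digits (base 10) as in the slide example; the function name `radix_sort` is chosen here for illustration.

```python
def radix_sort(keys, base=10):
    """LSD radix sort for non-negative integers; returns a new sorted list."""
    if any(k < 0 for k in keys):
        raise ValueError("this sketch only handles non-negative integer keys")
    out = list(keys)
    if not out:
        return out
    largest = max(out)
    place = 1  # 1 for the last digit, then base, base**2, ...
    while largest // place > 0:
        # Distribute the keys into one bucket per digit value, no comparisons.
        buckets = [[] for _ in range(base)]
        for k in out:
            buckets[(k // place) % base].append(k)
        # Collect the buckets in order; appending keeps the pass stable.
        out = [k for bucket in buckets for k in bucket]
        place *= base
    return out
```

Each pass touches every key once, so the total work is proportional to n times the number of digits. How evenly the keys spread over the buckets depends on the key distribution.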
slide-14
SLIDE 14

Insertion Sort

  • Average case O(n²)
  • Best case O(n) for sorted data
  • Good for small partitions

[Figure: step-by-step insertion sort on the keys 1 2 4 5 6]

slide-15
SLIDE 15

Sorting Networks

  • Fixed sequence of compare-and-swap steps, as if hardwired
  • Only appropriate for very small amounts of data

[Figure: sorting network with lines 1 to 7, from unsorted input to sorted output]
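
To make the "hardwired" idea concrete, here is a minimal sketch of a sorting network for four inputs (smaller than the seven-line network in the figure). The comparator list is the well-known five-comparator network for four elements; the names `NETWORK_4` and `sort4` are chosen here for illustration.

```python
# Each pair (i, j) is one comparator: afterwards position i holds the smaller value.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def sort4(values):
    """Sort exactly four values with a fixed sequence of compare-and-swap steps."""
    v = list(values)
    if len(v) != 4:
        raise ValueError("this network is wired for exactly four inputs")
    for i, j in NETWORK_4:
        # The same comparisons run in the same order for every input.
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v
```

Because the comparison sequence never depends on the data, there are no loops over a partition boundary and no data-dependent index arithmetic, which is why networks only pay off for very small inputs.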

slide-16
SLIDE 16

Optimizing Algorithms

  • For a given Architecture
      – Cache Size
      – Registers
      – Cache Line Size
  • Which Parameters to tune?

[Diagram: empirical search takes the hardware, the sorting algorithms, and their parameters, and produces optimized algorithms]
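
Empirical search can be pictured as "try each parameter value on the target machine and keep the fastest". The sketch below assumes a single tunable parameter and a hypothetical `make_sorter(value)` factory that returns a sorting function for that value. Neither name comes from the slides, and the real search space is larger.

```python
import time


def time_once(sorter, data):
    """Wall-clock time of one run of sorter on a fresh copy of data."""
    start = time.perf_counter()
    sorter(list(data))
    return time.perf_counter() - start


def empirical_search(candidates, make_sorter, data, measure=time_once, repeats=3):
    """Return the candidate parameter value with the lowest measured cost."""
    best_value, best_cost = None, float("inf")
    for value in candidates:
        sorter = make_sorter(value)
        # Take the best of a few runs to reduce measurement noise.
        cost = min(measure(sorter, data) for _ in range(repeats))
        if cost < best_cost:
            best_value, best_cost = value, cost
    return best_value
```

The `measure` argument is replaceable so that the search loop can be exercised with a deterministic cost function instead of real timings.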

slide-17
SLIDE 17

Tuning Quicksort

  • Small partitions
      – Insertion Sort
      – Sorting Networks
  • Threshold for small partitions
  • Apply immediately or at the end

[Figure: quicksort recursion reaching a small partition]
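
The tuning knobs above can be sketched as a quicksort that stops recursing below a threshold and hands small partitions to insertion sort, either immediately or in one pass at the end. This is an illustration under assumptions: the names `tuned_quicksort`, `threshold`, and `at_end` are made up here, the pivot is simply the middle element, and the default threshold of 16 is arbitrary rather than a tuned value.

```python
def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] (inclusive) in place."""
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key


def _partition(a, lo, hi):
    """Hoare partition around the middle element; returns the split index."""
    pivot = a[(lo + hi) // 2]
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while a[i] < pivot:
            i += 1
        j -= 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            return j
        a[i], a[j] = a[j], a[i]


def tuned_quicksort(a, threshold=16, at_end=False):
    """In-place quicksort that leaves partitions of <= threshold keys to insertion sort."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")

    def rec(lo, hi):
        while hi - lo + 1 > threshold:
            split = _partition(a, lo, hi)
            # Recurse into the smaller side, loop on the larger one.
            if split - lo < hi - split:
                rec(lo, split)
                lo = split + 1
            else:
                rec(split + 1, hi)
                hi = split
        if not at_end:
            insertion_sort(a, lo, hi)  # "apply immediately"

    rec(0, len(a) - 1)
    if at_end:
        insertion_sort(a, 0, len(a) - 1)  # one pass over the nearly sorted array
    return a
```

With `at_end=True` the array is left partitioned into small unsorted blocks that are already in the right relative order, so the final insertion sort only moves keys short distances.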

slide-18
SLIDE 18

Tuning Radix sort (CC-radix sort)

  • Create sub-buckets if data is too large for cache
  • Apply radix sort for sub-buckets
  • Insertion sort / Sorting networks for small partitions
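
One way to read these three bullets is sketched below. It is an assumption-laden illustration, not the CC-radix implementation: the parameter `cache_keys` (how many keys are taken to fit in cache), the `small` cutoff, and the use of Python's `sorted` as a stand-in for insertion sort or a sorting network are all choices made here.

```python
def _lsd_radix(keys, remaining_bits, digit_bits):
    """LSD radix sort on the lowest remaining_bits bits of each key."""
    mask = (1 << digit_bits) - 1
    for shift in range(0, remaining_bits, digit_bits):
        buckets = [[] for _ in range(1 << digit_bits)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)
        keys = [k for bucket in buckets for k in bucket]
    return keys


def cc_radix_sort(keys, key_bits=32, digit_bits=8, cache_keys=4096, small=16):
    """Cache-conscious radix sort sketch for integers in [0, 2**key_bits)."""
    if any(k < 0 or k >= (1 << key_bits) for k in keys):
        raise ValueError("keys must fit in key_bits unsigned bits")

    def rec(bucket, remaining_bits):
        if len(bucket) <= small or remaining_bits <= 0:
            # Stand-in for insertion sort / a sorting network on small partitions.
            return sorted(bucket)
        if len(bucket) <= cache_keys:
            # The bucket is assumed to fit in cache: plain radix sort.
            return _lsd_radix(bucket, remaining_bits, digit_bits)
        # Too large for cache: split on the most significant unprocessed digit.
        shift = max(0, remaining_bits - digit_bits)
        mask = (1 << (remaining_bits - shift)) - 1
        sub_buckets = [[] for _ in range(mask + 1)]
        for k in bucket:
            sub_buckets[(k >> shift) & mask].append(k)
        out = []
        for sub in sub_buckets:
            out.extend(rec(sub, shift))
        return out

    return rec(list(keys), key_bits)
```

Keys inside one sub-bucket share all higher bits, so sorting each sub-bucket on its remaining low bits and concatenating the results in bucket order yields a fully sorted sequence.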
slide-19
SLIDE 19

Tuning Multiway Merge sort

  • Number of subsets p
  • Operation on heap: find smallest child
      – Adapt fanout such that the children fit into one cache line

[Figure: heap over p subsets, with the children of a node sharing a cache line]
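
The fanout idea can be sketched with a d-ary min-heap whose d is derived from the cache line size. The 64-byte line and 4-byte key defaults are illustrative assumptions, and the helper names are made up here. Python lists give no control over memory layout, so the code only shows the indexing scheme.

```python
def fanout_for(cache_line_bytes=64, key_bytes=4):
    """Number of children per node so that one node's children span one cache line."""
    return max(2, cache_line_bytes // key_bytes)


def sift_down(heap, i, d):
    """Restore the min-heap property below index i in a d-ary heap."""
    n = len(heap)
    while True:
        first = d * i + 1  # children of i occupy first .. first + d - 1
        if first >= n:
            return
        # "Find smallest child": one scan over up to d adjacent entries.
        smallest = min(range(first, min(first + d, n)), key=heap.__getitem__)
        if heap[smallest] >= heap[i]:
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest


def heapify(items, d):
    """Build a d-ary min-heap from items and return it as a new list."""
    heap = list(items)
    for i in range((len(heap) - 2) // d, -1, -1):
        sift_down(heap, i, d)
    return heap


def pop_min(heap, d):
    """Remove and return the smallest key of a non-empty d-ary heap."""
    top = heap[0]
    last = heap.pop()
    if heap:
        heap[0] = last
        sift_down(heap, 0, d)
    return top
```

A larger fanout makes the heap shallower, and because a node's children sit next to each other, scanning them stays within one cache line when d is chosen this way.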

slide-20
SLIDE 20

[Diagram: empirical search over hardware, sorting algorithms, and parameters, producing optimized algorithms]

slide-21
SLIDE 21

Comparison of Sorting Algorithms

[Plot, Intel PIII Xeon: execution time (G = 2^30 cycles) versus number of keys (M = 2^20) for Quicksort, Radix sort, and Merge sort]

slide-22
SLIDE 22

Varied Standard Deviations

[Plots, Intel PIII Xeon (2M) and Intel PIII Xeon (12M): execution time (cycles) versus standard deviation (1.E+02 to 1.E+08) for Quicksort, Radix sort, and Merge sort]

slide-23
SLIDE 23

Impact of Input Data

  • Number of keys does not affect the relative performance
  • Standard deviation matters!
      – Distribution among buckets in radix sort
      – Fewer operations on the heap in multiway merge sort
  • Problems with standard deviation
      – Only related to distribution of digits in keys
      – Expensive to compute
      – Use entropy instead

slide-24
SLIDE 24

Entropy

  • Expected value of the information

$-\sum_i P_i \cdot \log_2 P_i$

[Figure: example digits 2 1 4 3 1 5 3 1 6 with entropy vector entries 0.9 and 1.58]
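
A minimal sketch of how such an entropy vector could be computed, one entropy value per digit position. It assumes that P_i is the fraction of keys whose digit at that position has value i, and that the slide's digits form the three decimal keys 214, 315, and 316. Both are readings of the slide, not something it states, and the function names are chosen here.

```python
import math
from collections import Counter


def digit_entropy(digits):
    """Shannon entropy (in bits) of a sequence of digit values."""
    n = len(digits)
    counts = Counter(digits)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_vector(keys, num_digits, base=10):
    """Entropy of each digit position, most significant position first."""
    vector = []
    for pos in range(num_digits - 1, -1, -1):
        digits = [(k // base ** pos) % base for k in keys]
        vector.append(digit_entropy(digits))
    return vector
```

Under that reading, `entropy_vector([214, 315, 316], 3)` gives roughly 0.92 for the first digit, 0 for the second, and 1.58 for the third, which lines up with the 0.9 and 1.58 shown on the slide.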

slide-25
SLIDE 25

Building the Library

[Diagram: empirical search turns the sorting algorithms and parameters into optimized algorithms (Quicksort, Radix sort, Merge sort); machine learning over input data sets with varying input sizes and entropy produces the prediction function]

slide-26
SLIDE 26

Runtime Procedure

[Diagram: input data → input size and entropy → prediction function → select algorithm → best sorting algorithm → sorted data]
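
The control flow of the runtime procedure can be sketched as follows. Everything specific in it is a placeholder: `toy_predict` and its thresholds are invented for illustration and are not the learned prediction function, the entropy is taken over the low byte of each key only, and all three entries in `ALGORITHMS` simply point at Python's built-in `sorted`.

```python
import math
from collections import Counter


def low_digit_entropy(keys, base=256):
    """Entropy (in bits) of the lowest base-256 digit of the keys."""
    n = len(keys)
    if n == 0:
        return 0.0
    counts = Counter(k % base for k in keys)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def toy_predict(size, entropy):
    """Hypothetical prediction function with made-up thresholds."""
    if size < 32:
        return "quicksort"
    return "radix" if entropy > 4.0 else "merge"


# Placeholder dispatch table: a real library would map to its tuned algorithms.
ALGORITHMS = {name: sorted for name in ("quicksort", "radix", "merge")}


def library_sort(keys, predict=toy_predict):
    """Measure the input, ask the prediction function, run the chosen algorithm."""
    name = predict(len(keys), low_digit_entropy(keys))
    return name, ALGORITHMS[name](keys)
```

Measuring the input before every call is extra work that a caller who already knows the best algorithm would not need.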

slide-27
SLIDE 27

Results

  • Library chooses best algorithm
  • Overhead of 5%
  • On average 44% better than worst algorithm

[Plot, AMD Athlon: execution time (cycles per key) versus standard deviation (1.E+02 to 1.E+07) for Quicksort, Radix sort, Merge sort, and the library's result]

slide-28
SLIDE 28

Final words

  • Optimizes Sorting Algorithms
  • Works well for unknown input
  • Overhead for known data
  • Unclear degree of Optimization
  • Hard-coded decisions
  • Further work: See next presentation
slide-29
SLIDE 29