slide-1
SLIDE 1

Why is Dual-Pivot Quicksort Fast?

Sebastian Wild

wild@cs.uni-kl.de

29 September 2015

Theorietage 2015 Speyer

Sebastian Wild Dual-Pivot Quicksort 2015-03-24 1 / 11

slide-2
SLIDE 2

Sorting History

[Timeline figure: three eras (Ancient World, Age of classic Quicksort, Dual-Pivot Era) on an axis marked 1961, ’62, 1969, 1975, ’77, ’78, 1993, 1997, 2009, ’11 and today. Labelled events: invention of Quicksort; Dual-Pivot Quicksort in Java.]

slide-3
SLIDE 3

Sorting History

1961, ’62   Hoare: publication, first analysis
1969        Singleton: median-of-three & Insertionsort on small subarrays
1975–78     Sedgewick: analysis of many optimizations
1993        Bentley, McIlroy: duplicate elements & “ninther”
1997        Musser: O(n log n) worst case by truncating recursion

The basic algorithm has been settled since 1961; the latest tweaks date from the 1990s. Since then: almost identical in all programming libraries!

slide-5
SLIDE 5

Sorting History

2008–2009     Vladimir Yaroslavskiy (developer at Sun) experiments with Quicksort with two pivots
11 Sep 2009   announcement on the Java core library mailing list
29 Oct 2009   inclusion in the development version of OpenJDK
2009–2011     optimizations by Joshua Bloch, Jon Bentley and others
28 July 2011  public release of Java 7 with Yaroslavskiy’s Quicksort

slide-6
SLIDE 6

Dual-Pivot Quicksort

Algorithm (Conceptual View)

1. Choose two pivots P ≤ Q.
2. For each element x, determine its class by comparing x to the pivots P and Q:
     small   for x < P
     medium  for P < x < Q
     large   for Q < x
3. Arrange elements according to classes:
     [ small ] P [ medium ] Q [ large ]
4. Sort subarrays recursively.

How to implement step 3 efficiently on arrays?
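The four conceptual steps can be sketched directly if we ignore the "efficiently on arrays" question and simply build new lists. This is a minimal illustration, not the in-place algorithm discussed next. Two choices here are assumptions that the slide does not make: the pivots are taken from the first and last position, and elements equal to a pivot are put into the medium class.

```python
def dual_pivot_sort(items):
    """Conceptual dual-pivot Quicksort; returns a new sorted list."""
    if len(items) < 2:
        return list(items)
    # Step 1: choose two pivots P <= Q (assumption: first and last element).
    p, q = sorted((items[0], items[-1]))
    rest = items[1:-1]
    # Steps 2 and 3: classify each remaining element and group by class.
    # Assumption: elements equal to a pivot are treated as medium.
    small = [x for x in rest if x < p]
    medium = [x for x in rest if p <= x <= q]
    large = [x for x in rest if x > q]
    # Step 4: sort the three groups recursively and concatenate.
    return (dual_pivot_sort(small) + [p]
            + dual_pivot_sort(medium) + [q]
            + dual_pivot_sort(large))
```

The list-based version allocates three new lists per call, which is exactly the overhead an in-place partitioning method avoids.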

slide-10
SLIDE 10

Yaroslavskiy’s Algorithm

Partitioning uses three pointers ℓ, k and g. The flowchart on the slide has two decision trees.

For the element at k:
  < P ?   yes: swap with ℓ
          no:  < Q ?   yes: skip
                       no:  swap with g

For the element at g:
  > Q ?   yes: skip
          no:  < P ?   yes: swap with ℓ
                       no:  swap with k

Invariant: elements < P are collected on the left (up to ℓ), elements with P ≤ ◦ ≤ Q lie between ℓ and k, and elements ≥ Q are collected on the right (from g on).

Example (the animation steps of the slide, with P = 3 and Q = 6):

  3 5 1 7 4 2 8 6    input; P and Q are the first and last element
  3 1 5 7 4 2 8 6    1 < P has been swapped to ℓ
  3 1 2 5 4 7 8 6    7 has moved to g, 2 < P has moved to ℓ
  2 1 3 5 4 6 8 7    pivots swapped into their final positions
  1 2 3 4 5 6 7 8    after sorting the three subarrays recursively
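The partitioning scheme can be sketched in Python as follows. This is a minimal version written after the decision trees on the slide, not the tuned Java 7 library code. The function names and the variable name `ell` (for ℓ) are illustrative choices.

```python
def yaroslavskiy_partition(a, left, right):
    """Partition a[left..right] in place around P = a[left], Q = a[right].

    Returns the final positions of the two pivots.
    """
    if a[left] > a[right]:
        a[left], a[right] = a[right], a[left]   # ensure P <= Q
    p, q = a[left], a[right]
    ell = left + 1    # next free slot for a small element
    g = right - 1     # next free slot for a large element
    k = ell           # element currently being classified
    while k <= g:
        if a[k] < p:                      # small: swap with ell
            a[k], a[ell] = a[ell], a[k]
            ell += 1
        elif a[k] >= q:                   # large: find a partner at g
            while a[g] > q and k < g:     # skip elements already large
                g -= 1
            a[k], a[g] = a[g], a[k]
            g -= 1
            if a[k] < p:                  # partner was small: swap with ell
                a[k], a[ell] = a[ell], a[k]
                ell += 1
        k += 1                            # otherwise medium: skip
    ell -= 1
    g += 1
    a[left], a[ell] = a[ell], a[left]     # move P to its final position
    a[right], a[g] = a[g], a[right]       # move Q to its final position
    return ell, g


def yaroslavskiy_sort(a, left=0, right=None):
    """Sort list a in place with dual-pivot Quicksort."""
    if right is None:
        right = len(a) - 1
    if left < right:
        i_p, i_q = yaroslavskiy_partition(a, left, right)
        yaroslavskiy_sort(a, left, i_p - 1)
        yaroslavskiy_sort(a, i_p + 1, i_q - 1)
        yaroslavskiy_sort(a, i_q + 1, right)
```

Calling `yaroslavskiy_partition` on the example array `[3, 5, 1, 7, 4, 2, 8, 6]` leaves it as `[2, 1, 3, 5, 4, 6, 8, 7]`, the state shown on the slide after the pivots have been swapped into place.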

slide-37
SLIDE 37

Running Time Experiments

Why switch to a new, unknown algorithm? Because it is faster!

[Plot: normalized Java running times (in ms), time / (10^−6 · n ln n), against input size n from 0.5 · 10^6 to 2 · 10^6. Curves: Java 6 Library, Java 7 Library, Classic Quicksort, Yaroslavskiy. Each point is the average and standard deviation over 1000 random permutations per size.]

This remains true for the basic variants of the algorithms: Classic Quicksort vs. Yaroslavskiy!
No theoretical explanation for the running time was known in 2009!
Only lucky experiments?
Why did no one come up with this earlier?

slide-43
SLIDE 43

Track Record of Dual-Pivot Quicksort

Observation in practice: Yaroslavskiy’s Quicksort (YQS) is faster than classic Quicksort (CQS) ... why?
We did a mathematical analysis of YQS. Traditional cost measures do not explain the observation!

Cost measure                               CQS     YQS     Relative
Running time (from various experiments)                    −10±2%
Comparisons                                2       1.9     −5%
Swaps                                      0.3     0.6     +80%
Bytecode instructions                      18      21.7    +20.6%
MMIX oops υ                                11      13.1    +19.1%
MMIX mems µ                                2.6     2.8     +5%
Scanned elements                           2       1.6     −20%
Branch mispredictions                      0.57    0.58    +2%

All entries are coefficients of n ln n + O(n), average case results.

Only plausible explanation for the running time: 20% fewer memory transfers in YQS.

What happened?!

slide-54
SLIDE 54

The “Memory Wall”

[Plot: STREAM benchmark data with linear regressions, 1991 to 2015, logarithmic vertical axis. Curves: CPU speed (MFLOPS), RAM bandwidth (MWords/s), and the “imbalance” = speed / bandwidth.]

Averaged annual growth rates:
  46%    CPU speed
  37%    memory bandwidth
  8.2%   imbalance

Data: www.cs.virginia.edu/stream/by_date/Balance.html

In the 20 years since Bentley and McIlroy developed the classic Quicksort implementation, the relative cost of RAM accesses has become about 5 times as big.
... this most likely changes the game for sorting!
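The "5 times as big" figure is consistent with compounding the 8.2% annual imbalance growth over 20 years. The short check below uses only those two numbers from the slide; the names are illustrative.

```python
ANNUAL_IMBALANCE_GROWTH = 0.082   # 8.2% per year, as quoted on the slide
YEARS = 20                        # time span quoted on the slide


def compounded_factor(rate, years):
    """Total growth factor after compounding `rate` for `years` years."""
    return (1.0 + rate) ** years


factor = compounded_factor(ANNUAL_IMBALANCE_GROWTH, YEARS)
print(f"imbalance grew by a factor of about {factor:.1f}")
```

The result is a factor of roughly 4.8, which the slide rounds to 5.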

slide-55
SLIDE 55

Analyzing Memory Transfers

We need an abstract and simple cost model to approximate memory transfers.

  abstract: machine-independent results
  simple: easy to analyze

It should only count memory accesses that are probably not cached.
Neither swaps nor comparisons are suitable measures!

My proposal: the number of “scanned elements”.

Machine model: access to the array only through iterators. Iterators can
  head left or right (one-directional!),
  advance to the next position,
  read and write the current position.

[Figure: an example array being traversed by iterators.]

Cost: the number of advances.

... let’s compare scanned elements for our Quicksorts!
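The iterator model can be made concrete with a small sketch. The class and function names below are hypothetical, chosen for illustration. The partitioning loop is a simple crossing-pointer scheme for classic Quicksort and is not taken from the slides. The pivot is assumed to be the last element and is read directly rather than through an iterator.

```python
class ScanCounter:
    """Shared tally of iterator advances (the 'scanned elements')."""

    def __init__(self):
        self.advances = 0


class ArrayIterator:
    """One-directional iterator: it can read, write and advance."""

    def __init__(self, array, position, direction, counter):
        assert direction in (+1, -1)
        self.array = array
        self.position = position
        self.direction = direction
        self.counter = counter

    def read(self):
        return self.array[self.position]

    def write(self, value):
        self.array[self.position] = value

    def advance(self):
        self.position += self.direction
        self.counter.advances += 1     # cost model: count every advance


def classic_partition_scans(values):
    """Partition a copy of `values` around its last element.

    Returns the number of iterator advances that were needed.
    """
    a = list(values)
    counter = ScanCounter()
    pivot = a[-1]                      # assumption: last element is the pivot
    k = ArrayIterator(a, 0, +1, counter)
    g = ArrayIterator(a, len(a) - 2, -1, counter)
    while k.position <= g.position:
        if k.read() < pivot:
            k.advance()
        elif g.read() > pivot:
            g.advance()
        else:                          # both out of place: exchange them
            tmp = k.read()
            k.write(g.read())
            g.write(tmp)
            k.advance()
            g.advance()
    return counter.advances
```

For n distinct elements this loop performs n − 1 advances, since every non-pivot element is passed by exactly one of the two iterators. Up to the pivot itself, that is the "array scanned exactly once" behaviour of classic partitioning.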

slide-65
SLIDE 65

Scanned Elements in CQS and YQS

How many scanned elements (SE) do we need for partitioning?

Classic Quicksort (pointers k, g):
  array scanned exactly once: n scanned elements

Yaroslavskiy’s Quicksort (pointers ℓ, k, g):
  1.3n SE on average: worse than CQS!

How does this translate to sorting costs?

  Classic Quicksort:          2 n ln n SE overall
  Yaroslavskiy’s Quicksort:   1.6 n ln n SE overall
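A partitioning step that costs more can still win overall because each step produces three subproblems instead of two, so the recursion bottoms out sooner. The sketch below applies a standard result for Quicksort recurrences with pivots chosen at random from the input: a partitioning cost of a·n with s subproblems gives a / (H_s − 1) · n ln n overall, where H_s is the s-th harmonic number. That formula does not appear on the slides, and 4/3 is an assumption for the exact value behind the rounded 1.3n.

```python
from fractions import Fraction


def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n, as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, n + 1))


def overall_coefficient(per_step_cost, subproblems):
    """Leading coefficient of n ln n for a per-step cost of per_step_cost * n."""
    return per_step_cost / (harmonic(subproblems) - 1)


# Classic Quicksort: n scanned elements per step, two subproblems.
classic = overall_coefficient(Fraction(1), 2)
# Yaroslavskiy's Quicksort: assumed 4/3 n per step, three subproblems.
yaroslavskiy = overall_coefficient(Fraction(4, 3), 3)
relative_change = yaroslavskiy / classic - 1

print(classic, yaroslavskiy, relative_change)
```

Under these assumptions the coefficients come out as 2 and 8/5 = 1.6, a change of −1/5, which matches the −20% quoted for scanned elements.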

slide-76
SLIDE 76

Track Record of Dual-Pivot Quicksort

Observation in practice: Yaroslavskiy’s Quicksort (YQS) is faster than classic Quicksort (CQS). Why?
We did a mathematical analysis of YQS.
Traditional cost measures do not explain this observation!

Cost measure                               CQS     YQS     Relative
Running Time (from various experiments)                    −10±2%
Comparisons                                2       1.9     −5%
Swaps                                      0.3     0.6     +80%
Bytecode Instructions                      18      21.7    +20.6%
MMIX oops υ                                11      13.1    +19.1%
MMIX mems µ                                2.6     2.8     +5%
Scanned Elements                           2       1.6     −20%
Branch Mispredictions                      0.57    0.58    +2%

(CQS and YQS columns: ·n ln n + O(n), average case results)

Only plausible explanation for running time: 20% fewer memory transfers in YQS.
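It may seem odd that YQS scans more elements per partitioning step yet fewer overall. A sketch of the reasoning follows. It assumes uniformly random pivots, a partitioning cost of a·n for a subarray of size n, and that the slide's 1.3n is the rounded value 4/3·n. The symbols a and C_n are introduced here for illustration. Splitting into three parts makes the recursion shallower, which changes the factor in front of a:

```latex
\begin{align*}
\text{CQS:}\quad
  \mathbb{E}[C_n] &= a\,n + \frac{2}{n}\sum_{k=0}^{n-1}\mathbb{E}[C_k]
  &&\Longrightarrow\quad \mathbb{E}[C_n] = 2a\,n\ln n + O(n),\\
\text{YQS:}\quad
  \mathbb{E}[C_n] &= a\,n + \frac{3}{\binom{n}{2}}\sum_{k=0}^{n-2}(n-1-k)\,\mathbb{E}[C_k]
  &&\Longrightarrow\quad \mathbb{E}[C_n] = \tfrac{6}{5}\,a\,n\ln n + O(n).
\end{align*}
% Scanned elements:
%   CQS: a = 1    gives  2 * 1         = 2    (n ln n)
%   YQS: a = 4/3  gives  (6/5) * (4/3) = 1.6  (n ln n)
```

So a 33% higher cost per partitioning step is more than offset by the smaller factor 6/5 instead of 2, which matches the −20% in the Scanned Elements row.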

slide-79
SLIDE 79

Conclusion

1. Dual-Pivot Quicksort is most likely faster because of fewer memory references.

2. The “memory wall” calls for a new view on classic algorithms. How about others?

3. Don’t stop looking for theoretical explanations, but do question models and assumptions!
