SLIDE 1

On the Worst-Case Complexity of Timsort

Nicolas Auger, Vincent Jugé, Cyril Nicaud & Carine Pivoteau

LIGM – Université Paris-Est Marne-la-Vallée & CNRS

20/08/2018

  • N. Auger, V. Jugé, C. Nicaud & C. Pivoteau
On the Worst-Case Complexity of Timsort
SLIDE 2

Contents

1. Efficient Merge Sorts
2. Timsort
3. Java Timsort, Bugs and Fixes

SLIDE 3

Sorting data

Input: 0 1 4 3 1 5 4 3 2 2 0 2
Output: 0 0 1 1 2 2 2 3 3 4 4 5

SLIDE 6

Sorting data – in a stable manner

Input: 0₁ 1₁ 4₁ 3₁ 1₂ 5₁ 4₂ 3₂ 2₁ 2₂ 0₂ 2₃
Output: 0₁ 0₂ 1₁ 1₂ 2₁ 2₂ 2₃ 3₁ 3₂ 4₁ 4₂ 5₁

Mergesort has a worst-case time complexity of O(n log(n))

Can we do better? No!

Proof:
There are n! possible reorderings
Each element comparison gives 1 bit of information
Thus log2(n!) ∼ n log2(n) comparisons are required
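The counting argument above can be checked numerically. The following is a minimal sketch, not part of the talk: the function name `min_comparisons` is hypothetical, and it simply evaluates ceil(log2(n!)) exactly with integer arithmetic and compares it with n log2(n).

```python
import math


def min_comparisons(n: int) -> int:
    """Information-theoretic lower bound: ceil(log2(n!)) comparisons.

    For a positive integer m, ceil(log2(m)) equals (m - 1).bit_length().
    """
    return (math.factorial(n) - 1).bit_length()


if __name__ == "__main__":
    for n in (3, 4, 12, 1000):
        bound = min_comparisons(n)
        print(n, bound, round(n * math.log2(n), 1))
```

The bound stays below n log2(n) but grows at the same rate, which is what the notation log2(n!) ∼ n log2(n) expresses.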

SLIDE 7

Can we never do better?

In some cases, we should... For instance, when the input is already sorted:

1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11

SLIDE 11

Let us do better!

Input: 0 1 4 | 3 1 | 5 4 3 2 2 0 | 2 (4 runs of lengths 3, 2, 6 and 1)

1. Chunk your data in monotonic runs
2. New parameters:
◮ Number of runs (ρ) and their lengths (r_1, ..., r_ρ)
◮ Run-length entropy: $H = \sum_{i=1}^{\rho} (r_i/n) \log_2(n/r_i)$, with $H \le \log_2(\rho) \le \log_2(n)$

Theorem (Auger, Jugé, Nicaud & Pivoteau, 2018)

Timsort has a worst-case time complexity of O(n + n log(ρ))
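The two parameters can be computed directly. This is a rough sketch under stated assumptions: `run_lengths` and `run_length_entropy` are hypothetical names, and the greedy split accepts non-strict decreasing runs, as the example above appears to do (real implementations may require strictly decreasing runs to preserve stability).

```python
import math


def run_lengths(a):
    """Greedily split `a` into maximal monotonic runs and return their lengths."""
    lengths = []
    i, n = 0, len(a)
    while i < n:
        j = i + 1
        if j < n:
            if a[j] >= a[i]:
                # Extend a non-decreasing run.
                while j < n and a[j] >= a[j - 1]:
                    j += 1
            else:
                # Extend a non-increasing run.
                while j < n and a[j] <= a[j - 1]:
                    j += 1
        lengths.append(j - i)
        i = j
    return lengths


def run_length_entropy(lengths):
    """H = sum over runs of (r_i / n) * log2(n / r_i)."""
    n = sum(lengths)
    return sum((r / n) * math.log2(n / r) for r in lengths)


if __name__ == "__main__":
    runs = run_lengths([0, 1, 4, 3, 1, 5, 4, 3, 2, 2, 0, 2])
    print(runs, run_length_entropy(runs), math.log2(len(runs)))
```

On the example array this yields the run lengths 3, 2, 6 and 1, and an entropy below log2(ρ) = 2, in line with the inequality H ≤ log2(ρ).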

SLIDE 13

Let us do better!

Input: 0 1 4 | 3 1 | 5 4 3 2 2 0 | 2 (4 runs of lengths 3, 2, 6 and 1)

1. Chunk your data in monotonic runs
2. New parameters:
◮ Number of runs (ρ) and their lengths (r_1, ..., r_ρ)
◮ Run-length entropy: $H = \sum_{i=1}^{\rho} (r_i/n) \log_2(n/r_i)$, with $H \le \log_2(\rho) \le \log_2(n)$

Theorem (Auger, Jugé, Nicaud & Pivoteau, 2018)

Timsort has a worst-case time complexity of O(n + n H)

We cannot do better than Ω(n + n H)! [2]
◮ Reading the whole input requires a time Ω(n)
◮ There are X possible reorderings, with $X \ge 2^{1-\rho} \binom{n}{r_1 \, \ldots \, r_\rho} \gtrsim 2^{nH/2}$
SLIDE 21

A brief history of Timsort

[Timeline figure: years 2001 to 2019, with markers 1 to 6 for the events below; P = Python, A = Android, J = Java, O = Octave]

1. Invented by Tim Peters [1]
2. Standard algorithm in Python
3. Standard algorithm for non-primitive arrays in Android, Java, Octave
4. Stack size bug uncovered; a provably correct fix is suggested: [3]
◮ suggested fix implemented in Python (true Timsort)
◮ custom fix implemented in Java (Java Timsort)
5. 1st worst-case complexity analysis [4]: Timsort works in time O(n log n)
6. Another stack size bug uncovered (Java version)
   Refined worst-case analysis: both versions work in time O(n + n H)

SLIDE 25

The principles of Timsort (1/3)

Algorithm based on merging adjacent runs

Before the merge: 0 1 4 | 3 1 ≡ 3 2 (run lengths k = 3 and ℓ = 2)
After the merge: 0 1 1 3 4 ≡ 5

1. Run merging algorithm: standard + many optimizations
◮ time O(k + ℓ)
◮ memory O(min(k, ℓ))
2. Policy for choosing runs to merge:
◮ depends on run lengths only

Let us forget array values and only remember run lengths!
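The time and memory bounds of the merging step can be illustrated as follows. This is a plain sketch, not Timsort's optimized routine: the name `merge_adjacent` is hypothetical, both runs are assumed to be already sorted in non-decreasing order (a decreasing run would be reversed first), and only the shorter run is copied into a temporary buffer.

```python
def merge_adjacent(a, lo, mid, hi):
    """Stably merge the sorted runs a[lo:mid] and a[mid:hi] in place.

    Time is O(k + l); only the shorter run is copied, so the extra
    memory is O(min(k, l)), with k = mid - lo and l = hi - mid.
    """
    if mid - lo <= hi - mid:
        left = a[lo:mid]                  # buffer for the shorter (left) run
        i, j, out = 0, mid, lo
        while i < len(left) and j < hi:
            if a[j] < left[i]:            # strict test: ties go to the left run
                a[out] = a[j]
                j += 1
            else:
                a[out] = left[i]
                i += 1
            out += 1
        # Remaining right elements are already in place; flush the buffer.
        a[out:out + len(left) - i] = left[i:]
    else:
        right = a[mid:hi]                 # buffer for the shorter (right) run
        i, j, out = mid - 1, len(right) - 1, hi - 1
        while i >= lo and j >= 0:
            if a[i] > right[j]:           # strict test: ties keep the right run last
                a[out] = a[i]
                i -= 1
            else:
                a[out] = right[j]
                j -= 1
            out -= 1
        # Remaining left elements are already in place; flush the buffer.
        a[out - j:out + 1] = right[:j + 1]


if __name__ == "__main__":
    data = [0, 1, 4, 1, 3]                # runs 0 1 4 and 1 3 (3 1 reversed)
    merge_adjacent(data, 0, 3, 5)
    print(data)
```

Merging from the right when the right run is shorter is what keeps the buffer size at min(k, ℓ) rather than k.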

SLIDE 26

The principles of Timsort (2/3)

Input: 0 1 4 | 3 1 | 5 4 3 2 2 0 | 2 ≡ 3 2 6 1

Run merge policy: maintain a stack of runs. Until the array is sorted, either:

① discover & push a new run length onto the stack
② merge the top 1st and 2nd runs
③ merge the top 2nd and 3rd runs

Execution on the example (discovered runs are pushed onto the STACK, listed here from bottom to top):

◮ ① push 3: STACK = 3
◮ ① push 2: STACK = 3 2
◮ ① push 6: STACK = 3 2 6
◮ ③ merge 3 and 2: 0 1 1 3 4 5 4 3 2 2 0 2 ≡ 5 6 1, STACK = 5 6
◮ ② merge 5 and 6: 0 0 1 1 2 2 3 3 4 4 5 2 ≡ 11 1, STACK = 11
◮ ① push 1: STACK = 11 1
◮ ② merge 11 and 1: 0 0 1 1 2 2 2 3 3 4 4 5 ≡ 12, STACK = 12

SLIDE 44

Intermezzo: Intelligent design & amortized analysis

Key ideas:

Each run r pays O(r) to
◮ enter the stack (before its 1st merge) ✓
◮ go down 1 floor (after its 1st merge)

Stack height h = O(log(n/r)) when the run entry phase ends ✓

Ensure that
◮ $(r_i)_{i \ge 1}$ has exponential decay when r is pushed ✓
◮ $r = r_h \le r_{h-2}$ when the run entry phase ends

Implementation in Timsort:
◮ Fibonacci constraints $r_i > r_{i+1} + r_{i+2}$ on run push [1]
◮ Merge $r_{h-2}$ and $r_{h-1}$ whenever $r_{h-2} \le r_h$ ✓

[Figure: STACK of runs r_1, r_2, r_3, ..., r_{h−2}, r_{h−1}, r_h, with labels "Pushed run", "Merged run" and "New stack height"]
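To see why the Fibonacci constraints force a logarithmic stack height, one can compute the smallest total length that a stack of height h can hold. This is an illustrative sketch only: `min_total_length` and `max_height` are hypothetical helpers, and the sketch assumes that every run has length at least 1 and that the constraint holds for every index of the stack.

```python
def min_total_length(h):
    """Smallest sum of h run lengths r_1..r_h (r_h on top) such that
    r_i > r_{i+1} + r_{i+2} for all i <= h - 2 and every length is >= 1."""
    lengths = []                          # built from the top of the stack down
    for _ in range(h):
        if len(lengths) < 2:
            lengths.append(1)             # the two topmost runs are unconstrained
        else:
            lengths.append(lengths[-1] + lengths[-2] + 1)
    return sum(lengths)


def max_height(n):
    """Largest stack height compatible with the constraints for n elements."""
    h = 0
    while min_total_length(h + 1) <= n:
        h += 1
    return h


if __name__ == "__main__":
    for n in (12, 100, 10**6):
        print(n, max_height(n))
```

The minimal lengths follow a Fibonacci-like recurrence, so they grow exponentially with depth and the height is O(log n).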

SLIDE 46

The principles of Timsort (3/3)

Choice rules for options

① discover & push a new run length onto the stack
② merge the top 1st and 2nd runs
③ merge the top 2nd and 3rd runs

Choice algorithm

if $r_{h-2} \le r_h$: choose ③
else if $r_{h-1} \le r_h$, $r_{h-2} \le r_{h-1} + r_h$ or $r_{h-3} \le r_{h-2} + r_{h-1}$: choose ②
else: choose ① (or ② if ① is unavailable)

Fibonacci constraints:
◮ $r_i > r_{i+1} + r_{i+2}$ for all $i \le h - 4$ (induction)
◮ $r_i > r_{i+1} + r_{i+2}$ for $i \le h - 3$ on run push

SLIDE 50

The principles of Timsort (3/3)

Choice rules for options

① discover & push a new run length onto the stack
② merge the top 1st and 2nd runs
③ merge the top 2nd and 3rd runs

Choice algorithm

if $r_{h-2} \le r_h$: choose ③
else if $r_{h-1} \le r_h$, $r_{h-2} \le r_{h-1} + r_h$ or $r_{h-3} \le r_{h-2} + r_{h-1}$: choose ②
else: choose ① (or ② if ① is unavailable)

Making runs pay (with 1-step delay) for going down:

[Figure: stack diagrams for three cases]
◮ runs r_{h−2}, r_{h−1}, r_h with $r_{h-2} \le r_h$: € €
◮ runs r_{h−1}, r_h with $r_{h-1} \le r_h$: €€
◮ runs r_{h−2}, r_{h−1}, r_h with $r_{h-2} \le r_{h-1} + r_h$ or $r_{h-3} \le r_{h-2} + r_{h-1}$
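Since the policy depends on run lengths only, it can be simulated without touching array values. The sketch below follows the choice rules as stated on this slide; the function name `merge_trace` is hypothetical, and production implementations may differ in details such as strict versus non-strict comparisons.

```python
def merge_trace(run_lengths):
    """Simulate the merge policy on run lengths only.

    Returns (costs, stack): the size of each merged run, in order,
    and the final stack (bottom to top).
    """
    pending = list(run_lengths)           # runs not yet discovered, in array order
    stack = []                            # stack[-1] is r_h, the top run
    costs = []

    def merge_top(offset):
        # offset 1 merges r_{h-1} with r_h; offset 2 merges r_{h-2} with r_{h-1}.
        i = len(stack) - 1 - offset
        stack[i:i + 2] = [stack[i] + stack[i + 1]]
        costs.append(stack[i])

    while pending or len(stack) > 1:
        h = len(stack)
        if h >= 3 and stack[-3] <= stack[-1]:
            merge_top(2)                  # option 3
        elif ((h >= 2 and stack[-2] <= stack[-1])
              or (h >= 3 and stack[-3] <= stack[-2] + stack[-1])
              or (h >= 4 and stack[-4] <= stack[-3] + stack[-2])):
            merge_top(1)                  # option 2
        elif pending:
            stack.append(pending.pop(0))  # option 1
        else:
            merge_top(1)                  # option 2, because option 1 is unavailable
    return costs, stack


if __name__ == "__main__":
    print(merge_trace([3, 2, 6, 1]))
```

On the run lengths 3, 2, 6, 1 the simulation merges 3 with 2, then 5 with 6, then 11 with 1, which matches the stack states of the worked example in this talk.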

SLIDE 57

Stack size bugs in Java Timsort

Java choice algorithm

if $r_{h-2} \le r_h$: choose ③
else if $r_{h-1} \le r_h$ or $r_{h-2} \le r_{h-1} + r_h$: choose ②
else: choose ① (or ② if ① is unavailable)

Fibonacci constraints fail!

◮ Stack height may be higher than forecast
◮ Suggested fix: add the $r_{h-3} \le r_{h-2} + r_{h-1}$ test [3]
◮ Custom Java fix: increase maximal stack size [3]

The increase was not sufficient!

Bug raised by igm.univ-mlv.fr/~pivoteau/Timsort/TimSort.java
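The failure of the Fibonacci constraints can be reproduced on run lengths alone. This is a hedged sketch, not the Java source: `fibonacci_violations` is a hypothetical helper, the flag `with_fourth_test` switches the extra test $r_{h-3} \le r_{h-2} + r_{h-1}$ on or off, and the run lengths 120, 80, 25, 20, 30 are a small hand-picked input that does not appear in the slides.

```python
def fibonacci_violations(run_lengths, with_fourth_test):
    """Run the merge policy on run lengths and return every stack (bottom to
    top) that breaks r_i > r_{i+1} + r_{i+2} once no merge test fires."""
    pending = list(run_lengths)
    stack = []
    violations = []
    while pending or len(stack) > 1:
        h = len(stack)
        if h >= 3 and stack[-3] <= stack[-1]:
            stack[-3:-1] = [stack[-3] + stack[-2]]      # option 3
        elif ((h >= 2 and stack[-2] <= stack[-1])
              or (h >= 3 and stack[-3] <= stack[-2] + stack[-1])
              or (with_fourth_test and h >= 4
                  and stack[-4] <= stack[-3] + stack[-2])):
            stack[-2:] = [stack[-2] + stack[-1]]        # option 2
        else:
            # Merging has stopped: check the constraint on the whole stack.
            if any(stack[i] <= stack[i + 1] + stack[i + 2]
                   for i in range(h - 2)):
                violations.append(list(stack))
            if pending:
                stack.append(pending.pop(0))            # option 1
            else:
                stack[-2:] = [stack[-2] + stack[-1]]    # option 2 as a fallback
    return violations


if __name__ == "__main__":
    runs = [120, 80, 25, 20, 30]
    print(fibonacci_violations(runs, with_fourth_test=False))
    print(fibonacci_violations(runs, with_fourth_test=True))
```

Without the extra test, merging stops on the stack 120 80 45 30, where 120 is not larger than 80 + 45; with the test enabled, the policy keeps merging and no violation is recorded.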

SLIDE 59

Java Timsort complexity analysis

Key steps:
◮ Study of the creation of consecutive Fibonacci constraint failures
◮ At most 6 consecutive constraint failures

[Figure: stack of runs r_i, r_{i−1}, r_{i−2}, r_{i−3}, r_{i−4}]
◮ $r_{i-2} + r_{i-1} \ge r_i$: Failure
◮ $r_{i-3} + r_{i-2} \ge r_{i-1}$: Failure
◮ $r_{i-4} + r_{i-3} < r_{i-2}$: Success

SLIDE 62

Java Timsort complexity analysis

Key steps:
◮ Study of the creation of consecutive Fibonacci constraint failures
◮ At most 6 consecutive constraint failures
◮ $(r_i)_{i \ge 1}$ still has exponential decay
◮ Tight upper bound on stack size!

7. Suggested fix [3] now implemented in Java (JDK 11)!

[Timeline figure: years 2001 to 2019, with markers 1 to 7; P = Python, A = Android, J = Java, O = Octave]
SLIDE 67

Conclusion

Timsort is good in practice
Timsort is good in theory: O(n + n H) worst-case time complexity
Every algorithm deserves a proof of correctness and complexity

Some references:

[1] Tim Peters' description of Timsort, svn.python.org/projects/python/trunk/Objects/listsort.txt (2001)
[2] On compressing permutations and adaptive sorting, Barbay & Navarro (2013)
[3] OpenJDK's java.utils.Collection.sort() is broken, de Gouw et al. (2015)
[4] Merge Strategies: from Merge Sort to Timsort, Auger et al. (2015)
[5] Strategies for stable merge sorting, Buss & Knop (2018)
[6] Nearly-optimal mergesorts, Munro & Wild (2018), to be presented now
