On the Worst-Case Complexity of Timsort
Nicolas Auger, Vincent Jugé, Cyril Nicaud & Carine Pivoteau
LIGM – Université Paris-Est Marne-la-Vallée & CNRS
20/08/2018

Contents
1. Efficient Merge Sorts
2. Timsort
3. Java Timsort, Bugs and Fixes

Sorting data
Input:  0 1 4 3 1 5 4 3 2 2 0 2
Output: 0 0 1 1 2 2 2 3 3 4 4 5

Sorting data – in a stable manner
Input:  0₁ 1₁ 4₁ 3₁ 1₂ 5₁ 4₂ 3₂ 2₁ 2₂ 0₂ 2₃
Output: 0₁ 0₂ 1₁ 1₂ 2₁ 2₂ 2₃ 3₁ 3₂ 4₁ 4₂ 5₁
Subscripts number the occurrences of each value; a stable sort keeps equal values in their original relative order.
Mergesort has a worst-case time complexity of $O(n \log n)$. Can we do better? No!
Proof:
- There are $n!$ possible reorderings.
- Each element comparison gives 1 bit of information.
- Thus $\log_2(n!) \sim n \log_2(n)$ comparisons are required.
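
The asymptotic estimate used in the proof can be checked with Stirling's formula. The derivation below is the standard one and is not taken from the slides:

```latex
\log_2(n!) = \sum_{k=1}^{n} \log_2 k
           = n \log_2 n - n \log_2 e + O(\log n)
           \sim n \log_2 n
```

Since a sequence of $c$ comparisons can distinguish at most $2^c$ reorderings, a comparison sort needs $c \geq \log_2(n!)$ comparisons in the worst case.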

Can we never do better? In some cases, we should...
Input:  0 1 2 3 4 5 6 7 8 9 10 11
Output: 0 1 2 3 4 5 6 7 8 9 10 11

Let us do better!
Example: 0 1 4 | 3 1 | 5 4 3 2 2 0 | 2 has 4 runs, of lengths 3, 2, 6 and 1.
1. Chunk your data in monotonic runs.
2. New parameters:
   - Number of runs ($\rho$) and their lengths ($r_1, \ldots, r_\rho$)
   - Run-length entropy: $H = \sum_{i=1}^{\rho} (r_i / n) \log_2(n / r_i)$, which satisfies $H \leq \log_2(\rho) \leq \log_2(n)$
Theorem (Auger – Jugé – Nicaud – Pivoteau 2018): Timsort has a worst-case time complexity of $O(n + n \log(\rho))$ and, more precisely, of $O(n + n H)$.
We cannot do better than $\Omega(n + n H)$! [2]
   - Reading the whole input requires a time $\Omega(n)$.
   - There are $X$ possible reorderings, with $X \geq 2^{1 - \rho} \binom{n}{r_1 \ldots r_\rho} \geq 2^{n H / 2}$.
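
The run decomposition and the run-length entropy of the example can be reproduced with a minimal Python sketch. The function names `monotonic_run_lengths` and `run_length_entropy` are hypothetical, and the greedy rule (take the longest non-decreasing or non-increasing run starting at each position) is an assumption chosen to match the slide's example; actual Timsort implementations detect runs slightly differently.

```python
from math import log2


def monotonic_run_lengths(a):
    """Greedily split `a` into monotonic runs and return their lengths.

    Assumption: a run is a maximal non-decreasing or non-increasing
    block, whichever is longer from the current position.
    """
    lengths = []
    i, n = 0, len(a)
    while i < n:
        up = i + 1
        while up < n and a[up - 1] <= a[up]:
            up += 1
        down = i + 1
        while down < n and a[down - 1] >= a[down]:
            down += 1
        end = max(up, down)
        lengths.append(end - i)
        i = end
    return lengths


def run_length_entropy(lengths):
    """H = sum over runs of (r_i / n) * log2(n / r_i), with n = sum of r_i."""
    n = sum(lengths)
    return sum((r / n) * log2(n / r) for r in lengths)


data = [0, 1, 4, 3, 1, 5, 4, 3, 2, 2, 0, 2]
lengths = monotonic_run_lengths(data)   # [3, 2, 6, 1]
entropy = run_length_entropy(lengths)   # about 1.73
```

On this input, $H \approx 1.73$, which is indeed at most $\log_2(\rho) = 2$ and $\log_2(n) \approx 3.58$. An already sorted array forms a single run, so $H = 0$ and the $O(n + n H)$ bound becomes linear.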

A brief history of Timsort
[Timeline from 2001 to 2019, with markers 1 to 6 for the events below and labels P, A, J, O for Python, Android, Java, Octave]
1. Invented by Tim Peters [1]
2. Standard algorithm in Python
3. Standard algorithm for non-primitive arrays in Android, Java, Octave
4. Stack size bug uncovered – a provably correct fix is suggested: [3]
   - suggested fix implemented in Python (true Timsort)
   - custom fix implemented in Java (Java Timsort)
5. 1st worst-case complexity analysis [4] – Timsort works in time $O(n \log n)$
6. Another stack size bug uncovered (Java version)
   Refined worst-case analysis: both versions work in time $O(n + n H)$

The principles of Timsort (1/3)
Algorithm based on merging adjacent runs
Example: the adjacent runs 0 1 4 and 3 1, of lengths $k$ and $\ell$, are merged into 0 1 1 3 4; in terms of run lengths, 3 2 becomes 5.
1. Run merging algorithm: standard + many optimizations
   - time $O(k + \ell)$
   - memory $O(\min(k, \ell))$
2. Policy for choosing runs to merge:
   - depends on run lengths only
Let us forget array values – only remember run lengths!
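
The $O(k + \ell)$ time and $O(\min(k, \ell))$ memory bounds for merging can be illustrated by a minimal Python sketch that copies only the shorter run into a temporary buffer. The function `merge_adjacent_runs` is hypothetical, it assumes both runs are already non-decreasing (a decreasing run such as 3 1 would be reversed first), and it leaves out the many optimizations mentioned in the slides.

```python
def merge_adjacent_runs(a, lo, mid, hi):
    """Stably merge the sorted runs a[lo:mid] and a[mid:hi] in place.

    Only the shorter run is copied, so the extra memory is
    O(min(k, l)) for run lengths k = mid - lo and l = hi - mid.
    """
    if mid - lo <= hi - mid:
        # Left run is shorter: copy it and merge from left to right.
        tmp = a[lo:mid]
        i, j, k = 0, mid, lo
        while i < len(tmp) and j < hi:
            if a[j] < tmp[i]:      # strict test: ties go to the left run
                a[k] = a[j]
                j += 1
            else:
                a[k] = tmp[i]
                i += 1
            k += 1
        # Leftover buffered elements fill the remaining gap; leftover
        # elements of the right run are already in place.
        a[k:k + len(tmp) - i] = tmp[i:]
    else:
        # Right run is shorter: copy it and merge from right to left.
        tmp = a[mid:hi]
        i, j, k = mid - 1, len(tmp) - 1, hi - 1
        while i >= lo and j >= 0:
            if a[i] > tmp[j]:      # strict test: ties keep the right run last
                a[k] = a[i]
                i -= 1
            else:
                a[k] = tmp[j]
                j -= 1
            k -= 1
        a[lo:lo + j + 1] = tmp[:j + 1]


values = [0, 1, 4, 1, 3]           # runs 0 1 4 and 1 3 (3 1 reversed)
merge_adjacent_runs(values, 0, 3, 5)
```

After the call, `values` holds 0 1 1 3 4, matching the slide's example. Each element is moved a constant number of times, which gives the linear merge cost.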
