On the Worst-Case Complexity of Timsort

  1. On the Worst-Case Complexity of Timsort Nicolas Auger, Vincent Jugé, Cyril Nicaud & Carine Pivoteau LIGM – Université Paris-Est Marne-la-Vallée & CNRS 20/08/2018 N. Auger, V. Jugé, C. Nicaud & C. Pivoteau On the Worst-Case Complexity of Timsort

  2. Contents
     1. Efficient Merge Sorts
     2. Timsort
     3. Java Timsort, Bugs and Fixes

  3. Sorting data
     Input:  0 1 4 3 1 5 4 3 2 2 0 2
     Output: 0 0 1 1 2 2 2 3 3 4 4 5

  6. Sorting data – in a stable manner
     Input (subscripts number the occurrences of each value):
       0₁ 1₁ 4₁ 3₁ 1₂ 5₁ 4₂ 3₂ 2₁ 2₂ 0₂ 2₃
     Output:
       0₁ 0₂ 1₁ 1₂ 2₁ 2₂ 2₃ 3₁ 3₂ 4₁ 4₂ 5₁
     Mergesort has a worst-case time complexity of $O(n \log n)$. Can we do better? No!
     Proof:
       - There are $n!$ possible reorderings.
       - Each element comparison gives 1 bit of information.
       - Thus $\log_2(n!) \sim n \log_2(n)$ tests are required.
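The asymptotic equivalence used in the last step of the proof can be checked with a standard integral comparison. The following is a short worked derivation added for illustration, not taken from the slides:

```latex
\begin{align*}
\log_2(n!) &= \sum_{k=1}^{n} \log_2 k
            \;\ge\; \int_{1}^{n} \log_2 x \,\mathrm{d}x
            \;=\; n \log_2 n - (n-1)\log_2 e, \\
\log_2(n!) &\le \sum_{k=1}^{n} \log_2 n \;=\; n \log_2 n .
\end{align*}
```

Both bounds differ by only $O(n)$, so $\log_2(n!) = n \log_2 n - O(n)$ and the ratio $\log_2(n!) / (n \log_2 n)$ tends to 1.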

  7. Can we never do better? In some cases, we should...
     Input (already sorted): 0 1 2 3 4 5 6 7 8 9 10 11
     Output:                 0 1 2 3 4 5 6 7 8 9 10 11
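To make the point about already-sorted inputs concrete, here is a minimal sketch that counts comparisons made by CPython's built-in sort, which is based on Timsort. The `Counted` wrapper and `count_comparisons` helper are hypothetical names introduced for this illustration; exact counts depend on the interpreter version, so none are quoted.

```python
import random


class Counted:
    """Wrapper that counts every `<` comparison made on its instances."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def count_comparisons(values):
    """Sort a wrapped copy of `values` and return the number of comparisons used."""
    Counted.comparisons = 0
    sorted(Counted(v) for v in values)
    return Counted.comparisons


n = 1000
already_sorted = list(range(n))
shuffled = already_sorted[:]
random.Random(0).shuffle(shuffled)  # fixed seed, for reproducibility

sorted_cost = count_comparisons(already_sorted)
shuffled_cost = count_comparisons(shuffled)
print("sorted input:  ", sorted_cost, "comparisons")
print("shuffled input:", shuffled_cost, "comparisons")
```

On a sorted input the count should grow linearly in n, while on a shuffled input it should be much closer to n log2(n).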

  11. Let us do better!
      Example: 0 1 4 | 3 1 | 5 4 3 2 2 0 | 2 (4 runs of lengths 3, 2, 6 and 1)
      1. Chunk your data in monotonic runs.
      2. New parameters:
         - Number of runs ($\rho$) and their lengths ($r_1, \ldots, r_\rho$)
         - Run-length entropy: $H = \sum_{i=1}^{\rho} (r_i / n) \log_2(n / r_i)$, with $H \le \log_2(\rho) \le \log_2(n)$
      Theorem (Auger – Jugé – Nicaud – Pivoteau 2018): Timsort has a worst-case time complexity of $O(n + n \log \rho)$.
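The run decomposition and the run-length entropy can be sketched as follows. This is an illustrative greedy splitter, not the code of any Timsort implementation: the function names are hypothetical, and it accepts non-strictly decreasing runs as the slide's example does, whereas real implementations only accept strictly decreasing runs so that reversing them preserves stability.

```python
import math


def monotonic_run_lengths(a):
    """Greedily split `a` into maximal monotonic runs and return their lengths."""
    lengths = []
    i, n = 0, len(a)
    while i < n:
        j = i + 1
        if j < n:
            if a[j] >= a[i]:
                # Extend a non-decreasing run.
                while j < n and a[j] >= a[j - 1]:
                    j += 1
            else:
                # Extend a non-increasing run.
                while j < n and a[j] <= a[j - 1]:
                    j += 1
        lengths.append(j - i)
        i = j
    return lengths


def run_length_entropy(lengths):
    """H = sum over runs of (r / n) * log2(n / r), where n is the total length."""
    n = sum(lengths)
    return sum((r / n) * math.log2(n / r) for r in lengths)


data = [0, 1, 4, 3, 1, 5, 4, 3, 2, 2, 0, 2]
runs = monotonic_run_lengths(data)
H = run_length_entropy(runs)
print(runs)         # [3, 2, 6, 1]
print(round(H, 2))  # 1.73
```

For this example, H is about 1.73, which is below log2(4) = 2 and log2(12), in line with the inequality H ≤ log2(ρ) ≤ log2(n).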

  13. Let us do better! (with $\rho$, $r_1, \ldots, r_\rho$ and $H$ as defined on the previous slides)
      Theorem (Auger – Jugé – Nicaud – Pivoteau 2018): Timsort has a worst-case time complexity of $O(n + nH)$.
      We cannot do better than $\Omega(n + nH)$! [2]
      - Reading the whole input requires a time $\Omega(n)$.
      - There are $X$ possible reorderings, with $X \ge 2^{1-\rho} \binom{n}{r_1 \ldots r_\rho} \ge 2^{nH/2}$.

  14. Contents
      1. Efficient Merge Sorts
      2. Timsort
      3. Java Timsort, Bugs and Fixes

  21. A brief history of Timsort
      [Timeline from 2001 to 2019, with markers for events 1 to 6 below and for Python (P), Android (A), Java (J) and Octave (O)]
      1. Invented by Tim Peters [1]
      2. Standard algorithm in Python
      3. Standard algorithm for non-primitive arrays in Android, Java, Octave
      4. Stack size bug uncovered; a provably correct fix is suggested [3]:
         - suggested fix implemented in Python (true Timsort)
         - custom fix implemented in Java (Java Timsort)
      5. First worst-case complexity analysis [4]: Timsort works in time $O(n \log n)$
      6. Another stack size bug uncovered (Java version); refined worst-case analysis: both versions work in time $O(n + nH)$

  25. The principles of Timsort (1/3)
      Algorithm based on merging adjacent runs.
      Example: the adjacent runs 0 1 4 and 3 1 (of lengths $k$ and $\ell$) are merged into 0 1 1 3 4; in terms of run lengths, 3 2 becomes 5.
      1. Run merging algorithm: standard + many optimizations
         - time $O(k + \ell)$
         - memory $O(\min(k, \ell))$
      2. Policy for choosing runs to merge:
         - depends on run lengths only
      Let us forget array values – only remember run lengths!
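The time and memory bounds for merging two adjacent runs can be illustrated with a plain merge that buffers only the shorter run. This is a minimal sketch of the "standard" part only, with none of the optimizations the slide alludes to; `merge_adjacent` is a hypothetical name, and both runs are assumed to be already sorted in non-decreasing order (a decreasing run such as 3 1 would be reversed first).

```python
def merge_adjacent(a, lo, mid, hi):
    """Stably merge the sorted runs a[lo:mid] and a[mid:hi] in place.

    Only the shorter run is copied, so extra memory is O(min(k, l)).
    """
    if mid - lo <= hi - mid:
        # Left run is shorter: buffer it and merge from the front.
        buf = a[lo:mid]
        i, j, k = 0, mid, lo
        while i < len(buf) and j < hi:
            if a[j] < buf[i]:  # strict test: ties are taken from the left run
                a[k] = a[j]
                j += 1
            else:
                a[k] = buf[i]
                i += 1
            k += 1
        # Leftover buffered elements fill the remaining slots.
        a[k:k + len(buf) - i] = buf[i:]
    else:
        # Right run is shorter: buffer it and merge from the back.
        buf = a[mid:hi]
        i, j, k = len(buf) - 1, mid - 1, hi - 1
        while i >= 0 and j >= lo:
            if buf[i] < a[j]:  # strict test: ties keep the right element last
                a[k] = a[j]
                j -= 1
            else:
                a[k] = buf[i]
                i -= 1
            k -= 1
        # Leftover buffered elements fill the remaining slots.
        a[lo:lo + i + 1] = buf[:i + 1]


values = [0, 1, 4, 1, 3]  # runs 0 1 4 and 1 3 (the run 3 1, reversed)
merge_adjacent(values, 0, 3, 5)
print(values)  # [0, 1, 1, 3, 4]
```

Each element is moved a constant number of times, which gives the $O(k + \ell)$ time bound; the merge policy itself never needs to look at the values, only at the lengths 3 and 2.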
