
Ewens-like distributions and Analysis of Algorithms



  1. Ewens-like distributions and Analysis of Algorithms
     Nicolas Auger, Mathilde Bouvel, Cyril Nicaud, Carine Pivoteau
     March 11, 2016

  2. Notion of presortedness

     In practice, data are often presorted.
     No reason for them to be uniformly distributed.
     Few alterations in databases.
     First intuition in [Knuth73], formalized in [Mannila86].

     In practice:
     Used in standard libraries.
     Oracle's benchmarks, using spies.
     TimSort.

     Excerpt from [Mannila86] shown on the slide:

     MEASURES OF PRESORTEDNESS AND OPTIMAL SORTING ALGORITHMS (Extended abstract)
     Heikki Mannila, Department of Computer Science, University of Helsinki, Tukholmankatu 2, SF-00250 Helsinki 25, Finland

     Abstract. The concept of presortedness and its use in sorting are studied. Natural ways to measure presortedness are given and some general properties necessary for a measure are proposed. A concept of a sorting algorithm optimal with respect to a measure of presortedness is defined, and examples of such algorithms are given. An insertion sort is shown to be optimal with respect to three natural measures. The problem of finding an optimal algorithm for an arbitrary measure is studied and partial results are proven.

     1. Introduction. The question of identifying in some sense "easy" cases of a computational problem and utilizing this easiness has considerable interest. In sorting, easiness can be identified with existing order. Indeed, when discussing sorting, it is customary to note that the input can be almost in order or at least have some existing order (see e.g. /Knu73, p. 339/, /Sed75, p. 126/, /Dij82, p. 223/ and /Her83, p. 165/). In this paper we study the use of presortedness in sorting. We do this by trying to answer three questions: How can the existing order (presortedness) of a sequence be measured? What does it mean that an algorithm utilizes the presortedness of input (measured in some way)?

  3. Measures of presortedness

     Definition. Let $X = (x_1, \dots, x_n)$ and $Y = (y_1, \dots, y_\ell)$ be two sequences of elements from a set $E$; $m : E^+ \to \mathbb{N}$ is a measure of presortedness iff:
     (1) $m(X) = 0$ if $X$ is sorted.
     (2) If $n = \ell$ and $x_i < x_j \iff y_i < y_j$, then $m(X) = m(Y)$.
     (3) If $Y$ is a subsequence of $X$, then $m(Y) \le m(X)$.
     (4) If $X < Y$ (every element of $X$ is smaller than every element of $Y$), then $m(XY) \le m(X) + m(Y)$.
     (5) For any element $a$, $m(aX) \le |X| + m(X)$.

     Two classical measures:
     number of runs minus one, $\mathrm{Runs} - 1$, with $\mathrm{Runs}(4\ 15\ 368\ 27) = 4$;
     number of inversions, with $\mathrm{Inv}(41536827) = 9$.
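The two classical measures can be checked on the slide's example with a few lines of Python. This is only an illustrative sketch: the function names `count_runs` and `count_inversions` are not from the slides, and the inversion count is deliberately the naive quadratic version.

```python
def count_runs(seq):
    """Number of maximal increasing runs (the Runs measure is this minus 1)."""
    if not seq:
        return 0
    # A new run starts at every descent seq[i] > seq[i + 1].
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a > b)


def count_inversions(seq):
    """Number of pairs i < j with seq[i] > seq[j] (naive O(n^2) count)."""
    n = len(seq)
    return sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])


x = [4, 1, 5, 3, 6, 8, 2, 7]
print(count_runs(x))        # 4, the runs being 4 | 15 | 368 | 27
print(count_inversions(x))  # 9
```

On a sorted input both `count_runs(seq) - 1` and `count_inversions(seq)` are 0, as property (1) requires.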


  12. Adaptiveness of sorting algorithms

      Theorem. Let $X$ be a sequence such that $m(X) = k$. Any algorithm uses at least $C(n, k)$ comparisons to sort $X$, with $C(n, k) \in \Theta(n + \log \lvert \mathrm{below}_m(n, k) \rvert)$ and $\mathrm{below}_m(n, k) = \{\sigma \in S_n : m(\sigma) \le k\}$.

      Definition. A sorting algorithm is $m$-optimal if it reaches this bound.

      Example: Natural Merge Sort [Knuth73] runs in $O(n \log r)$, where $r$ is the number of runs, and is Runs-optimal. On 41536827 the successive merge passes give 41536827 → 14523678 → 12345678.
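The merge passes on the slide's example can be reproduced with a small sketch of natural merge sort. It is a simplified illustration, not the presentation's or Knuth's exact formulation: the helper names are hypothetical, and adjacent runs are merged pairwise, so each pass costs $O(n)$ and there are about $\log_2 r$ passes.

```python
def split_into_runs(seq):
    """Cut seq into its maximal non-decreasing runs."""
    runs = []
    for x in seq:
        if runs and runs[-1][-1] <= x:
            runs[-1].append(x)
        else:
            runs.append([x])
    return runs


def merge(a, b):
    """Standard linear-time merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def natural_merge_sort(seq):
    """Merge adjacent runs pairwise until a single run remains.

    Returns the sorted list and the sequence obtained after each pass.
    """
    runs = split_into_runs(seq)
    passes = []
    while len(runs) > 1:
        merged = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                merged.append(merge(runs[k], runs[k + 1]))
            else:
                merged.append(runs[k])  # odd run out, carried to the next pass
        runs = merged
        passes.append([x for run in runs for x in run])
    return (runs[0] if runs else []), passes


result, passes = natural_merge_sort([4, 1, 5, 3, 6, 8, 2, 7])
print(passes[0])  # [1, 4, 5, 2, 3, 6, 7, 8]
print(result)     # [1, 2, 3, 4, 5, 6, 7, 8]
```

An already sorted input consists of a single run, so the loop body never executes and only the linear scan in `split_into_runs` is paid.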

  13. Records as a measure of presortedness

      Let $X = (x_1, \dots, x_n)$ be a sequence; $x_i$ is a record iff $x_j < x_i$ whenever $j < i$.

      Lemma. For any sequence $X$ of size $n$, $m_{\mathrm{rec}}(X) = n - \mathrm{record}(X)$ is a measure of presortedness.

      Example: for $X = 32418567$, $\mathrm{record}(X) = 3$ and $m_{\mathrm{rec}}(X) = 5$.

      Proof. We show that if $Y$ is a subsequence of $X$, then $m_{\mathrm{rec}}(Y) \le m_{\mathrm{rec}}(X)$. Two cases:
      Remove a non-record (if we remove 2, $Y = 3418567$, $\mathrm{record}(Y) = 3$ and $m_{\mathrm{rec}}(Y) = 4$).
      Remove a record (if we remove 8, $Y = 3241567$, $\mathrm{record}(Y) = 5$ and $m_{\mathrm{rec}}(Y) = 2$).
      The other properties are trivial.
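The example and the two cases of the proof are easy to check numerically. The sketch below uses hypothetical helper names (`records`, `m_rec`) and treats a sequence as a Python list.

```python
def records(seq):
    """Left-to-right maxima: elements larger than everything before them."""
    out = []
    for x in seq:
        if not out or x > out[-1]:
            out.append(x)
    return out


def m_rec(seq):
    """Number of non-records, i.e. n - record(X)."""
    return len(seq) - len(records(seq))


x = [3, 2, 4, 1, 8, 5, 6, 7]
print(records(x), m_rec(x))          # [3, 4, 8] 5
print(m_rec([3, 4, 1, 8, 5, 6, 7]))  # 4, after removing the non-record 2
print(m_rec([3, 2, 4, 1, 5, 6, 7]))  # 2, after removing the record 8
```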

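  14. An $m_{\mathrm{rec}}$-optimal sorting algorithm

      Example on 32418567, with $k = m_{\mathrm{rec}}(X)$ non-records:
      extraction, $\Theta(n)$: 32418567 → 21567 (non-records) and 348 (records);
      sorting, $O(k \log k)$: 21567 → 12567;
      merging, $O(n)$: 12567 and 348 → 12345678.

      $\lvert \mathrm{below}_{m_{\mathrm{rec}}}(n, k) \rvert \ge k!$

      Overall complexity: $O(n + k \log k)$.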
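The three phases (extraction, sorting, merging) can be sketched as follows. This is an illustrative reading of the slide, not the authors' code: the name `rec_sort` is hypothetical, Python's built-in `list.sort` stands in for any $O(k \log k)$ sort, and `heapq.merge` performs the final merge of the two sorted lists.

```python
import heapq


def rec_sort(seq):
    """Sort by splitting off the records, sorting the rest, and merging."""
    recs, others = [], []
    # Extraction, Theta(n): the records form an increasing subsequence.
    for x in seq:
        if not recs or x > recs[-1]:
            recs.append(x)
        else:
            others.append(x)
    # Sorting, O(k log k) with k = len(others) = m_rec(seq).
    others.sort()
    # Merging two sorted lists, linear in n.
    return list(heapq.merge(recs, others))


print(rec_sort([3, 2, 4, 1, 8, 5, 6, 7]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

When the input is already sorted, every element is a record, `others` is empty, and the whole call is linear.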

  15. Analysis of algorithms on average

      Under the uniform distribution, for most measures $m$: $\lvert \mathrm{below}_m(n, \mathbb{E}[m]) \rvert = \Theta(n!)$.
      $O(n \log n)$ on average.

      Questions: How to define a probabilistic framework well suited to presortedness measures? Analysis of algorithms?

  16. The classical Ewens distribution

      Any permutation can be seen as a composition of cycles. Example: 145263 is composed of 3 cycles: (1), (563) and (42). We denote by $\mathrm{cycle}(\sigma)$ the number of cycles of $\sigma$.

      Definition (Ewens distribution) [Ewens72]. To any $\sigma \in S_n$, we associate a weight $w(\sigma) = \theta^{\mathrm{cycle}(\sigma)}$, where $\theta$ is an arbitrary positive real number.

      Total weight: $\sum_{\sigma \in S_n} w(\sigma) = \theta^{(n)}$.

      $P(\sigma) = \dfrac{\theta^{\mathrm{cycle}(\sigma)}}{\theta^{(n)}}$.

      Notation: $\theta^{(n)} = \theta(\theta + 1) \cdots (\theta + n - 1)$.
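For small $n$ the total-weight identity can be verified by brute force. The sketch below is only a numerical check under stated assumptions: permutations are tuples in one-line notation on $1, \dots, n$, all function names are hypothetical, and enumerating $S_n$ is feasible only for very small $n$.

```python
from itertools import permutations
from math import isclose


def count_cycles(perm):
    """Number of cycles of a permutation given in one-line notation (values 1..n)."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            k = start
            while not seen[k]:  # walk around the cycle containing start
                seen[k] = True
                k = perm[k] - 1
    return cycles


def rising_factorial(theta, n):
    """theta^(n) = theta (theta + 1) ... (theta + n - 1)."""
    result = 1.0
    for k in range(n):
        result *= theta + k
    return result


def ewens_probability(perm, theta):
    """P(sigma) = theta^cycle(sigma) / theta^(n)."""
    return theta ** count_cycles(perm) / rising_factorial(theta, len(perm))


theta, n = 0.5, 5
total_weight = sum(theta ** count_cycles(p) for p in permutations(range(1, n + 1)))
print(count_cycles((1, 4, 5, 2, 6, 3)))                    # 3
print(isclose(total_weight, rising_factorial(theta, n)))  # True
```

With $\theta = 1$ every weight is 1 and the distribution is uniform on $S_n$.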

  17. Generalizing the distribution

      Definition (Ewens-like distribution). Let $\chi$ be any statistic on $\sigma \in S_n$. To any $\sigma \in S_n$, we associate a weight $w(\sigma) = \theta^{\chi(\sigma)}$.

      Let $W_n = \sum_{\sigma \in S_n} w(\sigma)$ and $P(\sigma) = \dfrac{w(\sigma)}{W_n}$.
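The definition translates directly into a brute-force construction for small $n$. In the sketch below the statistic is passed as a function; the number of records is used purely as an example choice of $\chi$, and the names `ewens_like_distribution` and `count_records` are hypothetical. The full enumeration of $S_n$ limits this to very small $n$.

```python
from itertools import permutations


def count_records(perm):
    """Number of left-to-right maxima of perm."""
    best, count = None, 0
    for x in perm:
        if best is None or x > best:
            best, count = x, count + 1
    return count


def ewens_like_distribution(n, theta, statistic):
    """Map each permutation of 1..n to P(sigma) = theta^statistic(sigma) / W_n."""
    weights = {p: theta ** statistic(p) for p in permutations(range(1, n + 1))}
    total = sum(weights.values())  # W_n
    return {p: w / total for p, w in weights.items()}


dist = ewens_like_distribution(4, 3.0, count_records)
most_likely = max(dist, key=dist.get)
print(most_likely)  # (1, 2, 3, 4)
```

With $\theta > 1$ and this choice of statistic, permutations with many records receive more weight, so the sorted permutation is the most likely one; with $\theta = 1$ the distribution is uniform.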
