
Ewens-like distributions and Analysis of Algorithms
Nicolas Auger, Mathilde Bouvel, Cyril Nicaud, Carine Pivoteau
March 11, 2016

Notion of presortedness

In practice, data are often presorted.
No reason for them to be uniformly distributed.
Few alterations in databases.

First intuition in [Knuth73], formalized in [Mannila86].

In practice:
Used in standard libraries.
Oracle's benchmarks, using spies.
TimSort.

Excerpt from [Mannila86], shown on the slide:

MEASURES OF PRESORTEDNESS AND OPTIMAL SORTING ALGORITHMS
Extended abstract
Heikki Mannila
Department of Computer Science, University of Helsinki
Tukholmankatu 2, SF-00250 Helsinki 25, Finland

Abstract. The concept of presortedness and its use in sorting are studied. Natural ways to measure presortedness are given and some general properties necessary for a measure are proposed. A concept of a sorting algorithm optimal with respect to a measure of presortedness is defined, and examples of such algorithms are given. An insertion sort is shown to be optimal with respect to three natural measures. The problem of finding an optimal algorithm for an arbitrary measure is studied and partial results are proven.

1. Introduction. The question of identifying in some sense "easy" cases of a computational problem and utilizing this easiness has considerable interest. In sorting, easiness can be identified with existing order. Indeed, when discussing sorting, it is customary to note that the input can be almost in order or at least have some existing order (see e.g. /Knu73, p. 339/, /Sed75, p. 126/, /Dij82, p. 223/ and /Her83, p. 165/). In this paper we study the use of presortedness in sorting. We do this by trying to answer three questions: How can the existing order (presortedness) of a sequence be measured? What does it mean that an algorithm utilizes the presortedness of input (measured in some way)?

Measures of presortedness

Definition. Let $X = (x_1, \dots, x_n)$ and $Y = (y_1, \dots, y_\ell)$ be two sequences of elements from a set $E$; $m : E^+ \to \mathbb{N}$ is a measure of presortedness iff
1. $m(X) = 0$ if $X$ is sorted.
2. If $n = \ell$ and $x_i < x_j \iff y_i < y_j$, then $m(X) = m(Y)$.
3. If $Y$ is a subsequence of $X$, then $m(Y) \le m(X)$.
4. If $X < Y$, then $m(XY) \le m(X) + m(Y)$.
5. For any element $a$, $m(aX) \le |X| + m(X)$.

Two classical measures:
number of Runs $-\,1$, with $\mathrm{Runs}(4\;15\;368\;27) = 4$;
number of Inversions, with $\mathrm{Inv}(41536827) = 9$.
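The two classical measures can be checked on the slide's example with a short Python sketch. The function names `count_runs` and `count_inversions` are illustrative choices, not taken from the talk, and the inversion count is deliberately the naive quadratic version.

```python
def count_runs(seq):
    """Number of maximal ascending runs in seq (0 for an empty sequence)."""
    if not seq:
        return 0
    # A new run starts after every descent a > b between neighbours.
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a > b)


def count_inversions(seq):
    """Number of pairs i < j with seq[i] > seq[j] (naive O(n^2) count)."""
    n = len(seq)
    return sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])


x = [4, 1, 5, 3, 6, 8, 2, 7]
print(count_runs(x))        # 4, so the measure Runs - 1 equals 3
print(count_inversions(x))  # 9
```

Both functions return 0 on a sorted sequence once Runs is shifted by 1, which is property 1 of the definition.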

Adaptiveness of sorting algorithms

Theorem. Let $X$ be a sequence such that $m(X) = k$. Any algorithm uses at least $C(n,k)$ comparisons to sort $X$, with $C(n,k) \in \Theta\big(n + \log \|\mathrm{below}_m(n,k)\|\big)$ and $\mathrm{below}_m(n,k) = \{\sigma \in S_n : m(\sigma) \le k\}$.

Definition. A sorting algorithm is $m$-optimal if it reaches this bound.

Example: Natural Merge Sort [Knuth73].
41536827 → 14523678 → 12345678
It runs in $O(n \log r)$, where $r$ is the number of runs, and is Runs-optimal.
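A minimal Python sketch of natural merge sort, assuming the usual formulation (split into maximal ascending runs, then merge neighbouring runs pairwise until one remains). The helper names `split_runs` and `merge` are illustrative, and this is not claimed to be the exact variant analysed in [Knuth73].

```python
def split_runs(seq):
    """Split seq into its maximal ascending runs."""
    runs, current = [], []
    for x in seq:
        if current and x < current[-1]:
            runs.append(current)
            current = []
        current.append(x)
    if current:
        runs.append(current)
    return runs


def merge(a, b):
    """Merge two sorted lists in linear time."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def natural_merge_sort(seq):
    """Sort by repeatedly merging neighbouring runs: about log2(r) linear passes."""
    runs = split_runs(list(seq))
    while len(runs) > 1:
        runs = [
            merge(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
            for k in range(0, len(runs), 2)
        ]
    return runs[0] if runs else []


print(natural_merge_sort([4, 1, 5, 3, 6, 8, 2, 7]))
```

On 41536827 the runs are 4, 15, 368, 27; the first pass yields 145 and 23678, matching the intermediate line 14523678 of the example.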

Records as a measure of presortedness

Let $X = (x_1, \dots, x_n)$ be a sequence; $x_i$ is a record iff $x_j < x_i$ whenever $j < i$.

Lemma. For any sequence $X$ of size $n$, $m_{\mathrm{rec}}(X) = n - \mathrm{record}(X)$ is a measure of presortedness.

Example: for $X = 32418567$, $\mathrm{record}(X) = 3$ and $m_{\mathrm{rec}}(X) = 5$.

Proof. If $Y$ is a subsequence of $X$, then $m_{\mathrm{rec}}(Y) \le m_{\mathrm{rec}}(X)$. Two cases:
Remove a non-record (if we remove 2, $Y = 3418567$, $\mathrm{record}(Y) = 3$ and $m_{\mathrm{rec}}(Y) = 4$).
Remove a record (if we remove 8, $Y = 3241567$, $\mathrm{record}(Y) = 5$ and $m_{\mathrm{rec}}(Y) = 2$).
The other properties are trivial.
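The example and the two cases of the proof can be replayed with a small Python sketch; `count_records` and `m_rec` are illustrative names for record(X) and m_rec(X).

```python
def count_records(seq):
    """Number of records: entries strictly larger than everything before them."""
    count, best = 0, None
    for x in seq:
        if best is None or x > best:
            count += 1
            best = x
    return count


def m_rec(seq):
    """Presortedness measure n - record(X)."""
    return len(seq) - count_records(seq)


x = [3, 2, 4, 1, 8, 5, 6, 7]
print(count_records(x), m_rec(x))   # 3 5
print(m_rec([3, 4, 1, 8, 5, 6, 7])) # 4 (non-record 2 removed)
print(m_rec([3, 2, 4, 1, 5, 6, 7])) # 2 (record 8 removed)
```

In both cases the measure of the subsequence does not exceed 5, as property 3 requires.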

An $m_{\mathrm{rec}}$-optimal sorting algorithm

Input: 32418567
Extraction, $\Theta(n)$: records 348, non-records 21567.
Sorting the non-records, $O(k \log k)$: 12567.
Merging, $O(n)$: 12345678.

$\|\mathrm{below}_{m_{\mathrm{rec}}}(n,k)\| \ge k!$
Overall complexity: $O(n + k \log k)$.
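The three phases above can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: `records_sort` is a hypothetical name, the built-in `list.sort` stands in for any $O(k \log k)$ sort, and `heapq.merge` from the standard library performs the linear merge of the two sorted lists.

```python
import heapq


def records_sort(seq):
    """Sort seq in O(n + k log k) time, where k = number of non-records."""
    records, others = [], []
    # Extraction: the records form an increasing subsequence by definition.
    for x in seq:
        if not records or x > records[-1]:
            records.append(x)
        else:
            others.append(x)
    # Sorting: only the k non-records need a comparison sort.
    others.sort()
    # Merging: two sorted lists are merged in linear time.
    return list(heapq.merge(records, others))


print(records_sort([3, 2, 4, 1, 8, 5, 6, 7]))
```

On the slide's input the extraction step produces 348 and 21567, the sort turns the latter into 12567, and the merge returns 12345678.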

Analysis of algorithms on average

Under the uniform distribution, for most measures $m$: $\|\mathrm{below}_m(n, \mathbb{E}[m])\| = \Theta(n!)$, hence $O(n \log n)$ on average.

Questions:
How to define a probabilistic framework well suited to presortedness measures?
How to analyse algorithms in this framework?

The classical Ewens distribution

Any permutation can be seen as a composition of cycles.
Example: 145263 is composed of 3 cycles: (1), (563) and (42).
We denote by $\mathrm{cycle}(\sigma)$ the number of cycles of $\sigma$.

Definition (Ewens distribution) [Ewens72]. To any $\sigma \in S_n$, we associate a weight $w(\sigma) = \theta^{\mathrm{cycle}(\sigma)}$, where $\theta$ is an arbitrary positive real number.

Total weight: $\sum_{\sigma \in S_n} w(\sigma) = \theta^{(n)}$.

$P(\sigma) = \dfrac{\theta^{\mathrm{cycle}(\sigma)}}{\theta^{(n)}}$.

Notation: $\theta^{(n)} = \theta(\theta + 1) \cdots (\theta + n - 1)$.
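A brute-force Python sketch of these definitions, usable only for small n. The names `count_cycles`, `rising_factorial` and `ewens_probability` are illustrative; permutations are assumed to be given in one-line notation with values 1..n.

```python
from itertools import permutations
from math import isclose


def count_cycles(perm):
    """Number of cycles of a permutation in one-line notation (values 1..n)."""
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if not seen[start]:
            cycles += 1
            i = start
            while not seen[i]:
                seen[i] = True
                i = perm[i] - 1  # follow i -> perm(i), shifted to 0-based
    return cycles


def rising_factorial(theta, n):
    """theta^(n) = theta (theta + 1) ... (theta + n - 1)."""
    result = 1.0
    for k in range(n):
        result *= theta + k
    return result


def ewens_probability(perm, theta):
    """P(sigma) = theta^cycle(sigma) / theta^(n)."""
    return theta ** count_cycles(perm) / rising_factorial(theta, len(perm))


print(count_cycles((1, 4, 5, 2, 6, 3)))  # 3 cycles: (1), (563), (42)
total = sum(ewens_probability(p, 2.5) for p in permutations(range(1, 5)))
print(isclose(total, 1.0))  # the weights over S_4 sum to theta^(4)
```

Setting theta = 1 gives every permutation the same weight, so the uniform distribution is recovered as a special case.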

Generalizing the distribution

Definition (Ewens-like distribution). Let $\chi$ be any statistic on $\sigma \in S_n$. To any $\sigma \in S_n$, we associate a weight $w(\sigma) = \theta^{\chi(\sigma)}$.

Let $W_n = \sum_{\sigma \in S_n} w(\sigma)$ and $P(\sigma) = \dfrac{w(\sigma)}{W_n}$.
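The general definition can be illustrated by brute force for small n, here with the number of records as the statistic $\chi$. This is a sketch under assumptions: `ewens_like_distribution` and `count_records` are hypothetical helper names, and enumerating $S_n$ is only feasible for very small n.

```python
from itertools import permutations


def count_records(perm):
    """Number of entries strictly larger than everything before them."""
    count, best = 0, 0
    for x in perm:  # values are 1..n, so 0 is a safe initial maximum
        if x > best:
            count += 1
            best = x
    return count


def ewens_like_distribution(n, theta, statistic):
    """Return (P, W_n) with P[sigma] = theta^statistic(sigma) / W_n."""
    weights = {p: theta ** statistic(p) for p in permutations(range(1, n + 1))}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}, total


probs, w4 = ewens_like_distribution(4, 2, count_records)
print(w4)                   # 120
print(probs[(1, 2, 3, 4)])  # 16 / 120: the sorted permutation has 4 records
```

For this statistic the total weight coincides with the rising factorial $2 \cdot 3 \cdot 4 \cdot 5 = 120$, because records and cycles are equidistributed on $S_n$. With $\theta > 1$ the distribution favours permutations with many records, that is, with small $n - \mathrm{record}(\sigma)$.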
