analysis of approximate median selection




  1. Analysis of Approximate Median Selection. M. Hofri, Department of Computer Science, WPI. Collaborators: Domenico Cantone & students, Università di Catania, Dipartimento di Matematica; Svante Janson, Department of Mathematics, Uppsala University.

  2. Finding the median efficiently is a difficult problem. A deterministic algorithm for the exact median was improved in May 1999 by Dor & Zwick, requiring (in the worst case) ≈ 2.942n comparisons. Extremely involved... For the expected number of comparisons: Floyd & Rivest showed (1975) that it can be done in (1.5 + o(1))n. Cunto & Munro (1989): this bound is tight. Our algorithm was developed in 1998 by Cantone, and only much later did we discover that several authors had formulated various analogues earlier, as early as 1978! It is deterministic, uses at most 1.5n comparisons, and the expected number is (4/3)n. Major virtue: extremely easy to implement (and understand), but it only approximates the median.

  3. Sicilian Median Selection. Example with n = 27: the median of each consecutive triplet is kept, and the step is repeated on the surviving values until a single one remains.

     12 22 26 13 21 7 10 2 16 5 11 27 9 17 25 23 1 14 20 3 8 24 15 18 19 4 6

     → 22 13 10 11 17 14 8 18 6

     → 13 14 8

     → 13

     This is performed in situ. Essentially the same algorithm can be run "on-line", processing a stream of values and using a work area of 4 log_3 n positions.
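The reduction in the example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `median_of_three` and `sicilian_median` are hypothetical, the input length is assumed to be a power of 3, and the sketch overwrites a working copy instead of swapping elements in the caller's array.

```python
def median_of_three(a, b, c):
    """Median of three values using 2 or 3 comparisons."""
    if a > b:
        a, b = b, a          # now a <= b
    if c < a:
        return a             # c is the smallest, so a is the median
    if c > b:
        return b             # c is the largest, so b is the median
    return c


def sicilian_median(data):
    """Approximate median by repeated median-of-three reduction.

    Assumes len(data) is a power of 3 (an assumption of this sketch).
    """
    work = list(data)        # working copy; the reduction happens in place on it
    n = len(work)
    m = n
    while m > 1 and m % 3 == 0:
        m //= 3
    if n < 1 or m != 1:
        raise ValueError("length must be a power of 3")
    step = 1
    while step < n:
        # Survivors of earlier rounds sit at indices that are multiples of step.
        for i in range(0, n, 3 * step):
            work[i] = median_of_three(work[i], work[i + step], work[i + 2 * step])
        step *= 3
    return work[0]


example = [12, 22, 26, 13, 21, 7, 10, 2, 16, 5, 11, 27, 9, 17, 25,
           23, 1, 14, 20, 3, 8, 24, 15, 18, 19, 4, 6]
print(sicilian_median(example))  # 13, as in the slide; the true median is 14
```

Storing each round's medians at stride `step` keeps the extra memory constant apart from the copy, which mirrors the in-situ character of the method.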

  4. Analysis: cost of search. Finding the median of three requires 2 comparisons in 2 permutations and 3 comparisons in 4 permutations, out of the 6 possible permutations. Hence $E[C_3] = 8/3$. The expected total number of comparisons when searching a list of size $n$ satisfies

     $$C_3(n) = \frac{n}{3}\cdot\frac{8}{3} + C_3\!\left(\frac{n}{3}\right), \qquad C_3(1) = 0.$$

     Result: $C_3(n) = \frac{4}{3}(n-1)$. The number of elements that are moved is similarly $E_3(n) = \frac{1}{3}(n-1)$. The number of three-medians computed: $\frac{1}{2}(n-1)$.
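The cost figures can be checked with exact arithmetic. The sketch below is an illustration only: it assumes one particular comparison order for the median of three (compare the first two, then the third against the smaller, then against the larger), and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def comparisons_used(a, b, c):
    """Number of comparisons a 2-or-3 comparison median-of-three makes."""
    count = 1                # compare a with b
    if a > b:
        a, b = b, a
    count += 1               # compare c with the smaller of the two
    if c < a:
        return count         # decided after 2 comparisons
    return count + 1         # one more comparison, c against the larger


def expected_comparisons_median_of_three():
    """Average over the 6 equally likely orderings of three distinct keys."""
    perms = list(permutations((1, 2, 3)))
    return Fraction(sum(comparisons_used(*p) for p in perms), len(perms))


def expected_cost(n):
    """Unroll C_3(n) = (n/3) * E[C_3] + C_3(n/3) for n a power of 3.

    Returns (expected comparisons, number of three-medians computed).
    """
    per_triplet = expected_comparisons_median_of_three()
    total, medians = Fraction(0), 0
    while n > 1:
        n //= 3              # n triplets are processed in this round
        total += n * per_triplet
        medians += n
    return total, medians


print(expected_comparisons_median_of_three())  # 8/3
print(expected_cost(27))                       # (Fraction(104, 3), 13)
```

For n = 27 this gives 104/3 = (4/3)(n − 1) comparisons and 13 = (n − 1)/2 three-medians, in agreement with the closed forms.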

  5. Analysis: probabilities of selection. To show that the selected median, $X_n$, is likely to be close to the true median, we need to compute the distribution of the rank of the selected entry, $X_n$. Let $n = 3^r$. The key quantity is $q^{(r)}_{a,b}$, defined as the number of permutations, out of the $n!$ possible ones, in which the entry which is the $a$th smallest in the array is: (i) selected, and (ii) has rank $b$ (= is the $b$th smallest) in the next set, which has $n/3 = 3^{r-1}$ entries. The counting is performed in two steps: 1. Count permutations in which $a$ is chosen in the $b$th triplet, and all the entries chosen in the first $b-1$ triplets are smaller than $a$, and all the items chosen in the rightmost $n/3 - b$ triplets are larger than $a$. 2. Compensate for this restriction: multiply the result of step one by the number of rearrangements of

  6. such permutations: $\dfrac{(n/3)!}{(b-1)!\,(n/3-b)!} = \dfrac{n}{3}\dbinom{n/3-1}{b-1}$. The first step is not that simple, and it produces the following expression:

     $$2\,(a-1)!\,(n-a)!\;3^{a-b} \sum_i \binom{b-1}{i}\binom{n/3-b}{a-2b-i}\frac{1}{9^i}.$$

     We find:

     $$q^{(r)}_{a,b} = 2n\,(a-1)!\,(n-a)!\;3^{a-b-1}\binom{n/3-1}{b-1} \sum_i \binom{b-1}{i}\binom{n/3-b}{a-2b-i}\frac{1}{9^i}.$$

     The related probability, $p^{(r)}_{a,b} = q^{(r)}_{a,b}/n!$:

     $$p^{(r)}_{a,b} = \frac{2}{3}\cdot\frac{3^{-b}\binom{n/3-1}{b-1}}{3^{-a}\binom{n-1}{a-1}} \sum_i \binom{b-1}{i}\binom{n/3-b}{a-2b-i}\frac{1}{9^i} = \frac{2}{3}\cdot\frac{3^{-b}\binom{n/3-1}{b-1}}{3^{-a}\binom{n-1}{a-1}}\;[z^{a-2b}]\left(1+\frac{z}{9}\right)^{b-1}(1+z)^{n/3-b}.$$
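A small worked instance may help in reading the formula for $p^{(r)}_{a,b}$. It is added here as a sanity check and is not part of the original slides. Take $n = 9$ (so $r = 2$) and $b = 1$: only $i = 0$ contributes to the sum, which reduces to $\binom{2}{a-2}$, so

$$p^{(2)}_{a,1} = \frac{2}{3}\cdot\frac{3^{a-1}\binom{2}{a-2}}{\binom{8}{a-1}}, \qquad p^{(2)}_{2,1} = \frac{1}{4}, \quad p^{(2)}_{3,1} = \frac{3}{7}, \quad p^{(2)}_{4,1} = \frac{9}{28}, \qquad \sum_{a} p^{(2)}_{a,1} = 1.$$

The first value can be confirmed directly: the entry of rank 2 is the smallest of the three selected values exactly when the entry of rank 1 shares its triplet, which happens with probability $2/8 = 1/4$. The sum equals 1 because exactly one entry ends up with rank $b = 1$ in the next set.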

  7. Finally, $P^{(r)}_a$: the probability that the algorithm chooses $a$ from an array holding $1, \ldots, n = 3^r$.

     $$P^{(r)}_a = \sum_{b_r} p^{(r)}_{a,b_r}\, P^{(r-1)}_{b_r} = \sum_{b_r, b_{r-1}, \ldots, b_3} p^{(r)}_{a,b_r}\; p^{(r-1)}_{b_r,b_{r-1}} \cdots p^{(2)}_{b_3,2},$$

     for $2^{j-1} \le b_j \le 3^{j-1} - 2^{j-1} + 1$. Explicitly,

     $$P^{(r)}_a = \left(\frac{2}{3}\right)^{r} \frac{3^{a-1}}{\binom{n-1}{a-1}} \sum_{b_r, b_{r-1}, \ldots, b_3}\; \prod_{j=2}^{r}\; \sum_{i_j \ge 0} \binom{b_j-1}{i_j}\binom{3^{j-1}-b_j}{b_{j+1}-2b_j-i_j}\frac{1}{9^{i_j}},$$

     with $b_j \in [2^{j-1} \,..\, 3^{j-1}-2^{j-1}+1]$, $b_2 = 2$ and $b_{r+1} \equiv a$. No known reduction... Numerical calculations produced:

  8. Variance ratios for the median selection as function of array size:

     | n | r = log_3 n | Avg. | σ_d | σ_d / n^(2/3) |
     |---|---|---|---|---|
     | 9 | 2 | 0.428571 | 0.494872 | 0.114375 |
     | 27 | 3 | 1.475971 | 1.184262 | 0.131585 |
     | 81 | 4 | 3.617240 | 2.782263 | 0.148619 |
     | 243 | 5 | 8.096189 | 6.194667 | 0.159079 |
     | 729 | 6 | 17.377167 | 13.282273 | 0.163979 |
     | 2187 | 7 | 36.427027 | 27.826992 | 0.165158 |

     Here $d$ is the error of the approximation, $d \equiv \left| X_n - \frac{n+1}{2} \right|$. What can we expect when $n$ grows?
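Numbers of this kind can be computed exactly from the recursion $P^{(r)}_a = \sum_b p^{(r)}_{a,b} P^{(r-1)}_b$ and the closed form for $p^{(r)}_{a,b}$. The following is a rough sketch under those two formulas; the function names are hypothetical and this is not claimed to be the procedure the authors used.

```python
from fractions import Fraction
from math import comb, sqrt


def p(n, a, b):
    """Closed form for p^{(r)}_{a,b}, with n = 3**r, as an exact fraction."""
    m = n // 3
    total = Fraction(0)
    for i in range(b):                      # binom(b-1, i) vanishes for i >= b
        k = a - 2 * b - i
        if 0 <= k <= m - b:
            total += Fraction(comb(b - 1, i) * comb(m - b, k), 9 ** i)
    weight = Fraction(2, 3) * Fraction(3) ** (a - b) * comb(m - 1, b - 1)
    return weight * total / comb(n - 1, a - 1)


def selection_distribution(r):
    """Return {a: P^{(r)}_a} for n = 3**r, built level by level."""
    dist = {1: Fraction(1)}                 # a single element is always selected
    for level in range(1, r + 1):
        n = 3 ** level
        new = {}
        for a in range(1, n + 1):
            prob = sum(p(n, a, b) * pb for b, pb in dist.items())
            if prob:
                new[a] = prob
        dist = new
    return dist


def error_stats(r):
    """Mean of d, sigma_d and sigma_d / n**(2/3), with d = |X_n - (n+1)/2|."""
    n = 3 ** r
    dist = selection_distribution(r)
    mid = Fraction(n + 1, 2)
    mean = sum(abs(a - mid) * pa for a, pa in dist.items())
    second = sum((a - mid) ** 2 * pa for a, pa in dist.items())
    sigma = sqrt(second - mean ** 2)
    return float(mean), sigma, sigma / n ** (2 / 3)


for r in range(2, 5):
    print(3 ** r, r, error_stats(r))
```

For n = 9 the distribution is {4: 3/14, 5: 4/7, 6: 3/14}, so the mean error is 3/7 ≈ 0.428571, matching the first row of the table.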

  9. [Figure: plot of the median probability distribution for n = 27.]

  10. [Figure: plot of the median probability distribution for n = 243.]

  11. To answer the last question we look at a "similar" situation, with $n$ independent random variables: $\Xi = (\xi_1, \xi_2, \ldots, \xi_n)$, $\xi_j \sim U(0,1)$. $\Xi$ is a permutation of its sorted order, $S(\Xi) = (\xi_{(1)} \le \xi_{(2)} \le \cdots \le \xi_{(n)})$. Observation: if the Sicilian algorithm operates on this permutation of $\mathbb{N}_n$ and returns $X_n = k$, then running it on $\Xi$ would return $Y_n = \xi_{(k)}$. The idea: $Y_n$ tracks $X_n/n$, but, due to the independence of the $n$ variables $\xi_i$, it has a simpler distribution.
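The observation holds because the algorithm only ever compares values, so it sees a sample and its rank permutation identically. A short experiment makes this concrete; it is a sketch with hypothetical names (`approx_median`, `demo`), a fixed seed chosen only for reproducibility, and a compact list-based version of the reduction.

```python
import random


def approx_median(values):
    """Repeated median-of-three reduction; the length must be a power of 3."""
    level = list(values)
    while len(level) > 1:
        level = [sorted(level[i:i + 3])[1] for i in range(0, len(level), 3)]
    return level[0]


def demo(n=81, seed=1):
    """Run the reduction on uniform samples and on their rank permutation."""
    rng = random.Random(seed)
    xi = [rng.random() for _ in range(n)]
    ordered = sorted(xi)
    # Rank permutation of 1..n (ties among floats are assumed not to occur).
    ranks = [ordered.index(v) + 1 for v in xi]
    k = approx_median(ranks)     # X_n = k on the permutation
    y = approx_median(xi)        # Y_n on the samples themselves
    return k, y, ordered[k - 1]  # ordered[k - 1] is the k-th order statistic


k, y, kth_order_statistic = demo()
print(k, y == kth_order_statistic)
```

The second printed value is `True` for any seed, since every comparison between two samples has the same outcome as the comparison between their ranks.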

  12. How good is the tracking? Condition on the sampled value:

      $$E\left[\left(Y_n - \frac{1}{2} - \frac{X_n - \frac{n+1}{2}}{n}\right)^{2}\right] = E\left[\left(Y_n - \frac{X_n - 1/2}{n}\right)^{2}\right] = E_k\left[\left(\xi_{(k)} - \frac{k - 1/2}{n}\right)^{2}\right] < \frac{1}{4n}.$$

      And the variance of $|D_n|/n$ is larger, and decreases more slowly! We said $Y_n$ is simpler... How simple is it? Let $n = 3^r$ and

      $$F_r(x) \equiv \Pr\left(Y_n - \tfrac{1}{2} \le x\right), \quad -\tfrac{1}{2} \le x \le \tfrac{1}{2}, \qquad F_0(x) = x + \tfrac{1}{2}.$$

      Now we need a recurrence: $Y_{3n}$ is the median of 3 independent values $\sim Y_n$, hence

      $$F_{r+1}(x) = \Pr\left(Y_{3n} \le x + \tfrac{1}{2}\right) = 3F_r^2(x)\bigl(1 - F_r(x)\bigr) + F_r^3(x) = 3F_r^2(x) - 2F_r^3(x).$$

  13. A simpler form is obtained by shifting $F_r(\cdot)$ by 1/2: $G_r(x) \equiv F_r(x) - 1/2 \implies G_0(x) = x$. We get our first key equation:

      $$G_{r+1}(x) = \frac{3}{2}\,G_r(x) - 2\,G_r^3(x).$$

      But it is not interesting! It is satisfied by the step function

      $$G(x) = \begin{cases} -\frac{1}{2}, & x < 0, \\ 0, & x = 0, \\ \frac{1}{2}, & x > 0. \end{cases}$$

      This says: $D_n/n \to 0$, where $D_n \overset{\text{def}}{=} X_n - \frac{n+1}{2}$. Need change of scale. We showed

      $$\mu^{2r}\, E\left[\left(Y_n - \frac{1}{2} - \frac{D_n}{n}\right)^{2}\right] \to 0 \qquad \forall \mu \in [0, \sqrt{3}).$$

      Hence we can track $\mu^r (D_n/n)$ with $\mu^r (Y_n - 1/2)$. We pick a convenient value, $\mu = 3/2$, and show:
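The collapse of the unscaled iteration onto the step function is easy to observe numerically. The snippet below is only an illustration in floating point; the name `G_r` and the iteration count of 100 are choices made for this sketch.

```python
def G_r(x, r):
    """Apply G_{r+1} = (3/2) G_r - 2 G_r**3 to G_0(x) = x, r times."""
    g = x
    for _ in range(r):
        g = 1.5 * g - 2.0 * g ** 3
    return g


for x in (-0.3, -0.001, 0.0, 0.001, 0.3):
    print(x, G_r(x, 100))
```

Every nonzero starting point in [−1/2, 1/2] is driven to ±1/2, however close to 0 it starts, while 0 stays fixed. This is why the argument has to be rescaled before a nondegenerate limit can appear.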

  14. Theorem [Svante Janson]. Let $n = 3^r$, $r \in \mathbb{N}$, and let $X_n$ be the approximate median of a random permutation of $\mathbb{N}_n$. Then a random variable $X$ exists, such that

      $$\left(\frac{3}{2}\right)^{r} \frac{X_n - \frac{n+1}{2}}{n} \longrightarrow X,$$

      where $X$ has the distribution $F(\cdot)$; with the same shift $F(x) \equiv G(x) + 1/2$, we get the equation

      $$G\!\left(\tfrac{3}{2}x\right) = \tfrac{3}{2}\,G(x) - 2\,G^3(x), \qquad -\infty < x < \infty.$$

      Moreover: the distribution function $F(\cdot)$ is strictly increasing throughout. The value 3/2 is inherent in the problem!

  15. The proof of the Theorem uses the following technical lemma.

      Lemma. Let $a \in (0, \infty)$ and let $\varphi$ map $[0, a]$ into $[0, a]$. For $x > a$ we define $\varphi(x) = x$. Assume (i) $\varphi(0) = 0$; (ii) $\varphi(a) = a$; (iii) $\varphi(x) > x$ for all $x \in (0, a)$; (iv) $\varphi'(0) = \mu > 1$, and continuous there; $\varphi(\cdot)$ is continuous and strictly increasing on $[0, a)$; (v) $\varphi(x) < \mu x$, $x \in (0, a)$. Let $\varphi_r(t) = \varphi(\varphi_{r-1}(t))$, the $r$th iterate of $\varphi(\cdot)$. Then, as $r \to \infty$, $\varphi_r(x/\mu^r) \to \psi(x)$, $x \ge 0$. $\psi(x)$ is well defined, strictly monotonic increasing for all $x$, increases from 0 to $a$, and satisfies the equation $\psi(\mu x) = \varphi(\psi(x))$.

      Proof: From Property (v): $\varphi(x/\mu^{r+1}) < x/\mu^r$. Since iteration preserves monotonicity, $\varphi_{r+1}(x/\mu^{r+1}) = \varphi_r(\varphi(x/\mu^{r+1})) < \varphi_r(x/\mu^r)$. Hence a limit $\psi(\cdot)$ exists.
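The lemma can be watched at work on the map from the key equation, $\varphi(t) = \frac{3}{2}t - 2t^3$ with $a = 1/2$ and $\mu = 3/2$, which satisfies conditions (i) to (v). The sketch below is a floating-point illustration with hypothetical names; the default depth `r=60` is an arbitrary choice at which the iterates have settled to machine precision for moderate `x`.

```python
A = 0.5      # right end point a of the interval [0, a]
MU = 1.5     # phi'(0)


def phi(t):
    """phi(t) = (3/2) t - 2 t**3 on [0, a], extended by phi(t) = t for t > a."""
    if t > A:
        return t
    return 1.5 * t - 2.0 * t ** 3


def scaled_iterate(x, r):
    """phi_r(x / mu**r): the r-th iterate applied to the rescaled argument."""
    t = x / MU ** r
    for _ in range(r):
        t = phi(t)
    return t


def psi(x, r=60):
    """Numerical stand-in for the limit psi(x)."""
    return scaled_iterate(x, r)


sequence = [scaled_iterate(1.0, r) for r in range(1, 41)]
print(sequence[:4], sequence[-1])            # decreasing, then stabilising
print(psi(1.5 * 0.7), phi(psi(0.7)))         # psi(mu x) against phi(psi(x))
```

The printed sequence decreases in `r`, as in the proof, and the last line shows the two sides of $\psi(\mu x) = \varphi(\psi(x))$ agreeing to rounding error.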

  16. The properties of $\psi(x)$ depend on the behavior of $\varphi(\cdot)$ near $x = 0$. Since $\varphi'(x)$ is continuous at $x = 0$, $\psi(\cdot)$ is continuous throughout. Since it is bounded, the convergence is uniform on $[0, \infty)$. Hence, since $\varphi(\cdot)$ and all its iterates are strictly monotonic, so is $\psi(\cdot)$ itself. We have then the equation

      $$G\!\left(\tfrac{3}{2}x\right) = \tfrac{3}{2}\,G(x) - 2\,G^3(x), \qquad -\infty < x < \infty,$$

      but we have no explicit solution for it. What can we do? Several things. We can calculate a power expansion for it. From $G_0(\cdot)$ and the iteration, all $G_r(\cdot)$ are odd, hence we can write

      $$G(x) = \sum_{k \ge 1} b_k\, x^{2k-1}.$$

      $b_1$ is available from the iteration: the derivatives of $G_r(x/\mu^r)$ at $x = 0$ are all 1, hence this is also the derivative there of $G(x)$. Successive calculations are easy:
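One way to carry out these successive calculations is to match powers of $x$ in the functional equation. Writing $c_m$ for the coefficient of $x^m$ and $\mu = 3/2$, the coefficient of $x^m$ gives $c_m(\mu^m - \mu) = -2\,[x^m]\,G^3(x)$, and the right-hand side involves only coefficients of degree at most $m - 2$. This recurrence is derived here for illustration and is not taken from the slides; the function name is hypothetical.

```python
from fractions import Fraction


def series_coefficients(num_terms):
    """Return [b_1, ..., b_num_terms] for G(x) = sum_k b_k x**(2k-1).

    Uses c_m (mu**m - mu) = -2 [x^m] G(x)**3 with mu = 3/2 and b_1 = 1.
    """
    mu = Fraction(3, 2)
    max_deg = 2 * num_terms - 1
    c = [Fraction(0)] * (max_deg + 1)       # c[m] is the coefficient of x**m
    c[1] = Fraction(1)
    for m in range(3, max_deg + 1, 2):
        cube = Fraction(0)
        # Coefficient of x**m in G**3: sum of c[i] c[j] c[k] over i + j + k = m.
        for i in range(1, m - 1):
            for j in range(1, m - i):
                cube += c[i] * c[j] * c[m - i - j]
        c[m] = -2 * cube / (mu ** m - mu)
    return [c[2 * k - 1] for k in range(1, num_terms + 1)]


print(series_coefficients(4))
```

The first step reproduces by hand: for $m = 3$ the cube contributes $b_1^3 = 1$, so $b_2 = -2 / (27/8 - 3/2) = -16/15$.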
