
Order Statistics: Carola Wenk, slides courtesy of Charles Leiserson



  1. CMPS 6610/4610 Algorithms, Fall 2016: Order Statistics. Carola Wenk. Slides courtesy of Charles Leiserson, with additions by Carola Wenk.

  2. Order statistics
     Select the $i$th smallest of $n$ elements (the element with rank $i$).
     • $i = 1$: minimum;
     • $i = n$: maximum;
     • $i = \lfloor (n+1)/2 \rfloor$ or $\lceil (n+1)/2 \rceil$: median.
     Naive algorithm: Sort and index the $i$th element.
     Worst-case running time $= \Theta(n \log n + 1) = \Theta(n \log n)$, using merge sort (not quicksort).
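
The naive algorithm can be sketched in a few lines of Python. This is only an illustration: the function name `select_by_sorting` and the sample data are assumptions, and Python's built-in `sorted` (Timsort, a merge-sort variant with an O(n log n) worst case) stands in for the merge sort named on the slide.

```python
def select_by_sorting(values, i):
    """Return the i-th smallest element (1-based rank) by sorting first."""
    if not 1 <= i <= len(values):
        raise ValueError("rank i must satisfy 1 <= i <= n")
    # Sorting costs O(n log n) in the worst case; the index lookup is O(1).
    return sorted(values)[i - 1]


data = [6, 10, 13, 5, 8, 3, 2, 11]   # hypothetical sample input
n = len(data)
minimum = select_by_sorting(data, 1)                       # i = 1
maximum = select_by_sorting(data, n)                       # i = n
lower_median = select_by_sorting(data, (n + 1) // 2)       # floor((n + 1) / 2)
upper_median = select_by_sorting(data, -(-(n + 1) // 2))   # ceil((n + 1) / 2)
```

For even n the two median ranks differ, which is why the slide lists both the floor and the ceiling.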

  3. Randomized divide-and-conquer algorithm
     RAND-SELECT(A, p, q, i)    ▷ i-th smallest of A[p . . q]
         if p = q then return A[p]
         r ← RAND-PARTITION(A, p, q)
         k ← r - p + 1    ▷ k = rank(A[r])
         if i = k then return A[r]
         if i < k
             then return RAND-SELECT(A, p, r - 1, i)
             else return RAND-SELECT(A, r + 1, q, i - k)
     [Figure: A[p . . q] after partitioning. The k elements A[p . . r] are ≤ A[r]; the elements to the right of r are ≥ A[r].]
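
A minimal Python sketch of RAND-SELECT follows. It makes some assumptions the slide does not state: array positions are 0-based (ranks stay 1-based), and `rand_partition` is one possible implementation that swaps a uniformly random pivot to the front and then partitions in a single left-to-right pass.

```python
import random


def rand_partition(a, p, q):
    """Partition a[p..q] (inclusive) around a random pivot; return its final index."""
    r = random.randint(p, q)       # pivot position, uniform over p..q
    a[p], a[r] = a[r], a[p]        # move the pivot to the front
    pivot = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]        # place the pivot between the two parts
    return i


def rand_select(a, p, q, i):
    """Return the i-th smallest element (1-based) of a[p..q]; reorders a in place."""
    if p == q:
        return a[p]
    r = rand_partition(a, p, q)
    k = r - p + 1                  # rank of the pivot within a[p..q]
    if i == k:
        return a[r]
    if i < k:
        return rand_select(a, p, r - 1, i)
    return rand_select(a, r + 1, q, i - k)


data = [6, 10, 13, 5, 8, 3, 2, 11]   # hypothetical sample input
seventh = rand_select(list(data), 0, len(data) - 1, 7)
```

The result does not depend on the random choices; only the running time does.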

  4. Example
     Select the i = 7th smallest:
         6 10 13 5 8 3 2 11      (pivot: 6)
     Partition:
         2 5 3 6 8 13 10 11      (k = 4)
     Select the 7 - 4 = 3rd smallest recursively.
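
The example can be replayed in Python. The slide does not say which partition routine is used; the sketch below assumes the first element is the pivot and uses a single-pass partition (the name `partition_first` is hypothetical), which happens to reproduce the array shown above.

```python
def partition_first(a, p, q):
    """Partition a[p..q] (inclusive) around the pivot a[p]; return the pivot's final index."""
    pivot = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]
    return i


a = [6, 10, 13, 5, 8, 3, 2, 11]
r = partition_first(a, 0, len(a) - 1)
k = r + 1                          # rank of the pivot (positions are 0-based here)
# a is now [2, 5, 3, 6, 8, 13, 10, 11] and k is 4.
upper = a[r + 1:]                  # the part the recursion continues on
answer = sorted(upper)[(7 - k) - 1]   # 3rd smallest of the upper part, found by sorting for brevity
```

Since i = 7 > k = 4, the search continues in the upper part with rank 7 - 4 = 3, which yields 11.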

  5. Intuition for analysis
     (All our analyses today assume that all elements are distinct.)
     Lucky: $T(n) = T(3n/4) + dn$, where $dn$ is the cost of RAND-PARTITION.
         $n^{\log_{4/3} 1} = n^0 = 1$, so CASE 3 of the master theorem gives $T(n) = \Theta(n)$.
     Unlucky: $T(n) = T(n-1) + dn = \Theta(n^2)$ (arithmetic series).
         Worse than sorting!

  6. Analysis of expected time
     • Call a pivot good if its rank lies in $[n/4, 3n/4]$.
     • How many good pivots are there? $n/2$ ⇒ A random pivot has a 50% chance of being good.
     • Let $T(n, s)$ be the runtime random variable:
           $T(n, s) \le T(3n/4, s) + X(s) \cdot dn$
       Here $X(s)$ is the number of times it takes to find a good pivot, $dn$ is the runtime of partition, and $X(s) \cdot dn$ is the time to reduce the array size to $\le \frac{3}{4} n$.

  7. Analysis of expected time
     Lemma: A fair coin needs to be tossed an expected number of 2 times until the first “heads” is seen.
     Proof: Let $E(X)$ be the expected number of tosses until the first “heads” is seen.
     • Need at least one toss; if it’s “heads” we are done.
     • If it’s “tails” we need to repeat (probability ½).
     ⇒ $E(X) = 1 + \frac{1}{2} E(X)$ ⇒ $E(X) = 2$
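
The lemma can also be checked from the definition of expectation, $E(X) = \sum_{k \ge 1} k \cdot (1/2)^k$. The sketch below is an assumption-labeled illustration (the function name `expected_tosses_partial` is hypothetical): it sums the first terms exactly with Python's `fractions.Fraction` and shows the partial sums approaching 2 from below.

```python
from fractions import Fraction


def expected_tosses_partial(num_terms, p_heads=Fraction(1, 2)):
    """Sum k * P(first heads on toss k) for k = 1..num_terms, using exact arithmetic."""
    total = Fraction(0)
    for k in range(1, num_terms + 1):
        # First heads on toss k: k - 1 tails followed by one heads.
        prob_first_heads_at_k = (1 - p_heads) ** (k - 1) * p_heads
        total += k * prob_first_heads_at_k
    return total


partial_10 = expected_tosses_partial(10)   # equals 2 - 12/2**10
partial_50 = expected_tosses_partial(50)   # equals 2 - 52/2**50
```

For a fair coin the partial sum over N terms is $2 - (N+2)/2^N$, so the gap to 2 shrinks exponentially.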

  8. Analysis of expected time
     $T(n, s) \le T(3n/4, s) + X(s) \cdot dn$
     ($X(s)$: number of times it takes to find a good pivot; $dn$: runtime of partition; $X(s) \cdot dn$: time to reduce the array size to $\le \frac{3}{4} n$)
     ⇒ $E(T(n, s)) \le E(T(3n/4, s)) + E(X(s) \cdot dn)$
     ⇒ $E(T(n, s)) \le E(T(3n/4, s)) + E(X(s)) \cdot dn$    (linearity of expectation)
     ⇒ $E(T(n, s)) \le E(T(3n/4, s)) + 2 \cdot dn$    (Lemma)
     ⇒ $T_{exp}(n) \le T_{exp}(3n/4) + \Theta(n)$
     ⇒ $T_{exp}(n) \in \Theta(n)$
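
The last step can be made explicit by unrolling the recurrence. The following is a sketch of one way to do it, not necessarily the argument used in the lecture; it ignores rounding of $3n/4$ and the constant base case.

```latex
\begin{align*}
T_{exp}(n) &\le T_{exp}\!\left(\tfrac{3}{4} n\right) + 2dn \\
           &\le 2dn + 2d\left(\tfrac{3}{4}\right) n + 2d\left(\tfrac{3}{4}\right)^{2} n + \cdots \\
           &\le 2dn \sum_{j=0}^{\infty} \left(\tfrac{3}{4}\right)^{j}
            = 2dn \cdot \frac{1}{1 - \tfrac{3}{4}} = 8dn .
\end{align*}
```

So $T_{exp}(n) \in O(n)$, and since the first partition alone already costs $dn$, $T_{exp}(n) \in \Theta(n)$.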

  9. Summary of randomized order-statistic selection
     • Works fast: linear expected time.
     • Excellent algorithm in practice.
     • But the worst case is very bad: $\Theta(n^2)$.
     Q. Is there an algorithm that runs in linear time in the worst case?
     A. Yes, due to Blum, Floyd, Pratt, Rivest, and Tarjan [1973].
     IDEA: Generate a good pivot recursively.
     This algorithm has large constants, though, and therefore is not efficient in practice.

  10. Worst-case linear-time order statistics
      SELECT(i, n)
      1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
      2. Recursively SELECT the median x of the $\lfloor n/5 \rfloor$ group medians to be the pivot.
      3. Partition around the pivot x. Let k = rank(x).
      4. if i = k then return x
         elseif i < k
             then recursively SELECT the ith smallest element in the lower part
             else recursively SELECT the (i-k)th smallest element in the upper part
      (Step 4 is the same as in RAND-SELECT.)
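
The four steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the lecture's implementation: it builds new lists instead of partitioning in place, it solves inputs of at most 5 elements by sorting, and it also takes the median of a trailing group with fewer than 5 elements (the slide counts only the $\lfloor n/5 \rfloor$ full groups). All elements are assumed distinct, as in the analysis.

```python
def select(values, i):
    """Return the i-th smallest element (1-based) of values; assumes distinct elements."""
    if len(values) <= 5:
        return sorted(values)[i - 1]          # small input: solve directly
    # Step 1: split into groups of 5 and take each group's median.
    groups = [values[j:j + 5] for j in range(0, len(values), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Step 2: recursively select the median of the group medians as the pivot x.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 3: partition around x and compute k = rank(x).
    lower = [v for v in values if v < x]
    upper = [v for v in values if v > x]
    k = len(lower) + 1
    # Step 4: return x or recurse into exactly one side.
    if i == k:
        return x
    if i < k:
        return select(lower, i)
    return select(upper, i - k)


# Hypothetical test data: a fixed permutation of 1..100 (37 is coprime to 101).
data = [(37 * j) % 101 for j in range(1, 101)]
fiftieth = select(data, 50)
```

Because the data is a permutation of 1..100, the element of rank i is i itself, which makes the result easy to check.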

  11. Choosing the pivot

  12. Choosing the pivot
      1. Divide the n elements into groups of 5.

  13. Choosing the pivot
      1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
      [Figure labels: lesser, greater]

  14. Choosing the pivot
      1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
      2. Recursively SELECT the median x of the $\lfloor n/5 \rfloor$ group medians to be the pivot.
      [Figure labels: x, lesser, greater]

  15. Developing the recurrence
      SELECT(i, n)    [total cost: $T(n)$]
      1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.    [cost: $\Theta(n)$]
      2. Recursively SELECT the median x of the $\lfloor n/5 \rfloor$ group medians to be the pivot.    [cost: $T(n/5)$]
      3. Partition around the pivot x. Let k = rank(x).    [cost: $\Theta(n)$]
      4. if i = k then return x
         elseif i < k
             then recursively SELECT the ith smallest element in the lower part
             else recursively SELECT the (i-k)th smallest element in the upper part    [cost: $T(?)$]

  16. Analysis (Assume all elements are distinct.)
      • At least half the group medians are $\le x$, which is at least $\lfloor \lfloor n/5 \rfloor / 2 \rfloor = \lfloor n/10 \rfloor$ group medians.
      [Figure labels: x, lesser, greater]

  17. Analysis (Assume all elements are distinct.)
      • At least half the group medians are $\le x$, which is at least $\lfloor \lfloor n/5 \rfloor / 2 \rfloor = \lfloor n/10 \rfloor$ group medians.
      • Therefore, at least $3 \lfloor n/10 \rfloor$ elements are $\le x$.
      [Figure labels: x, lesser, greater]

  18. Analysis (Assume all elements are distinct.)
      • At least half the group medians are $\le x$, which is at least $\lfloor \lfloor n/5 \rfloor / 2 \rfloor = \lfloor n/10 \rfloor$ group medians.
      • Therefore, at least $3 \lfloor n/10 \rfloor$ elements are $\le x$.
      • Similarly, at least $3 \lfloor n/10 \rfloor$ elements are $\ge x$.
      [Figure labels: x, lesser, greater]

  19. Analysis (Assume all elements are distinct.)
      Need “at most” for worst-case runtime:
      • At least $3 \lfloor n/10 \rfloor$ elements are $\le x$ ⇒ at most $n - 3 \lfloor n/10 \rfloor$ elements are $> x$.
      • At least $3 \lfloor n/10 \rfloor$ elements are $\ge x$ ⇒ at most $n - 3 \lfloor n/10 \rfloor$ elements are $< x$.
      • The recursive call to SELECT in Step 4 is executed on at most $n - 3 \lfloor n/10 \rfloor$ elements.

  20. Analysis (Assume all elements are distinct.)
      • Use the fact that $\lfloor a/b \rfloor \ge (a - (b-1))/b$ (page 51).
      • $n - 3 \lfloor n/10 \rfloor \le n - 3 \cdot (n-9)/10 = (10n - 3n + 27)/10 \le 7n/10 + 3$
      • The recursive call to SELECT in Step 4 is executed on at most $7n/10 + 3$ elements.
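
Both inequalities are easy to spot-check numerically. The short Python sketch below (the function name and the tested ranges are arbitrary choices) uses integer arithmetic only, multiplying through by the denominators to avoid floating-point rounding.

```python
def max_recursive_size(n):
    """Bound from the slides on the Step 4 subproblem size: n - 3 * floor(n / 10)."""
    return n - 3 * (n // 10)


# n - 3*floor(n/10) <= 7n/10 + 3, with both sides multiplied by 10.
bound_holds = all(10 * max_recursive_size(n) <= 7 * n + 30
                  for n in range(1, 100001))

# floor(a/b) >= (a - (b - 1)) / b, with both sides multiplied by b.
floor_fact_holds = all(b * (a // b) >= a - (b - 1)
                       for a in range(0, 200) for b in range(1, 50))
```

A finite check is not a proof, but it matches the algebra: the gap between the two sides of the first inequality is $3 - 3(n \bmod 10)/10$, which is always positive.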

  21. Developing the recurrence
      SELECT(i, n)    [total cost: $T(n)$]
      1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.    [cost: $\Theta(n)$]
      2. Recursively SELECT the median x of the $\lfloor n/5 \rfloor$ group medians to be the pivot.    [cost: $T(n/5)$]
      3. Partition around the pivot x. Let k = rank(x).    [cost: $\Theta(n)$]
      4. if i = k then return x
         elseif i < k
             then recursively SELECT the ith smallest element in the lower part
             else recursively SELECT the (i-k)th smallest element in the upper part    [cost: $T(7n/10 + 3)$]

  22. Solving the recurrence
      $T(n) \le T\left(\frac{1}{5} n\right) + T\left(\frac{7}{10} n + 3\right) + dn$    ($dn$ stands for the $\Theta(n)$ terms)
      Big-Oh induction: $T(n) \le c(n - 3)$. (Technical trick. This shows that $T(n) \in O(n)$.)
      $T(n) \le c\left(\frac{1}{5} n - 3\right) + c\left(\frac{7}{10} n + 3 - 3\right) + dn$
      $= \frac{9}{10} cn - 3c + dn$
      $= c(n - 3) - \frac{1}{10} cn + dn$
      $\le c(n - 3)$,
      if c is chosen large enough, e.g., c = 10d.

  23. Conclusions
      • Since the work at each level of recursion is basically a constant fraction (9/10) smaller, the work per level is a geometric series dominated by the linear work at the root.
      • In practice, this algorithm runs slowly, because the constant in front of n is large.
      • The randomized algorithm is far more practical.
      Exercise: Try to divide into groups of 3 or 7.
