Order Statistics Carola Wenk Slides courtesy of Charles Leiserson - - PowerPoint PPT Presentation

SLIDE 1

CMPS 6610/4610 Algorithms 1

CMPS 6610/4610 – Fall 2016

Order Statistics

Carola Wenk Slides courtesy of Charles Leiserson with additions by Carola Wenk

SLIDE 2

Order statistics

Select the ith smallest of n elements (the element with rank i).

  • i = 1: minimum;
  • i = n: maximum;
  • i = (n+1)/2 or (n+1)/2: median.

Naive algorithm: Sort and index ith element.

Worst-case running time = Θ(n log n + 1) = Θ(n log n), using merge sort (not quicksort).
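The naive approach can be sketched in a few lines of Python. This is only an illustration: the function name `select_by_sorting` and the sample data are assumptions, and it relies on Python's built-in `sorted` (a merge-sort hybrid with O(n log n) worst case) rather than a hand-written merge sort.

```python
def select_by_sorting(values, i):
    """Return the i-th smallest element (1-based rank) by sorting first."""
    if not 1 <= i <= len(values):
        raise ValueError("rank i must satisfy 1 <= i <= n")
    # sorted() returns a sorted copy in O(n log n); indexing is O(1).
    return sorted(values)[i - 1]


data = [6, 10, 13, 5, 8, 3, 2, 11]          # illustrative input
minimum = select_by_sorting(data, 1)          # i = 1
maximum = select_by_sorting(data, len(data))  # i = n
lower_median = select_by_sorting(data, (len(data) + 1) // 2)  # i = floor((n+1)/2)
```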

SLIDE 3

Randomized divide-and-conquer algorithm

RAND-SELECT(A, p, q, i)        ⊳ i-th smallest of A[p . . q]
    if p = q then return A[p]
    r ← RAND-PARTITION(A, p, q)
    k ← r – p + 1                ⊳ k = rank(A[r])
    if i = k then return A[r]
    if i < k
        then return RAND-SELECT(A, p, r – 1, i)
        else return RAND-SELECT(A, r + 1, q, i – k)

After partitioning, A[p . . r – 1] ≤ A[r] ≤ A[r + 1 . . q], and A[r] is the kth smallest element of A[p . . q].
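A possible Python rendering of RAND-SELECT is sketched below. Several details are assumptions made for the sketch, not taken from the slides: array indices are 0-based while the rank i stays 1-based, `rand_partition` uses a simple single-pass partition after swapping a random element to the front, and the tail recursion is written as a loop to stay clear of Python's recursion limit.

```python
import random


def rand_partition(a, p, q):
    """Partition a[p..q] around a uniformly random pivot; return its final index."""
    r = random.randint(p, q)          # randint is inclusive on both ends
    a[p], a[r] = a[r], a[p]           # move the pivot to the front
    pivot = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]           # place the pivot between the two parts
    return i


def rand_select(a, p, q, i):
    """Return the i-th smallest (1-based) element of a[p..q]; reorders a in place."""
    while True:                       # loop form of the tail-recursive pseudocode
        if p == q:
            return a[p]
        r = rand_partition(a, p, q)
        k = r - p + 1                 # rank of a[r] within a[p..q]
        if i == k:
            return a[r]
        if i < k:
            q = r - 1                 # continue in the lower part
        else:
            p, i = r + 1, i - k       # continue in the upper part


seventh = rand_select([6, 10, 13, 5, 8, 3, 2, 11], 0, 7, 7)
```

The result does not depend on which pivots are drawn; only the running time does.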

SLIDE 4

Example

Select the i = 7th smallest:

    6  10  13  5  8  3  2  11        (pivot: 6)

Partition:

    2  5  3  6  8  13  10  11        (k = 4)

Select the 7 – 4 = 3rd smallest recursively.
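The example can be replayed in Python. The sketch below assumes a single-pass partition that uses the first element as the pivot (the helper name `partition_first` is hypothetical); with that choice it reproduces the arrangement shown on the slide. The final lookup sorts the upper part only to keep the illustration short.

```python
def partition_first(a, p, q):
    """Partition a[p..q] in place around the pivot a[p]; return the pivot's final index."""
    pivot = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]
    return i


a = [6, 10, 13, 5, 8, 3, 2, 11]
r = partition_first(a, 0, len(a) - 1)
k = r + 1                           # rank of the pivot, since p = 0 here
upper = a[r + 1:]                   # i = 7 > k, so the answer lies in the upper part
answer = sorted(upper)[7 - k - 1]   # the (7 - k)-th smallest of the upper part
```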

SLIDE 5

Intuition for analysis

(All our analyses today assume that all elements are distinct.)

Lucky: T(n) = T(3n/4) + dn = Θ(n)
    (dn for RAND-PARTITION; n^(log_{4/3} 1) = n^0 = 1, so CASE 3 applies)

Unlucky: T(n) = T(n – 1) + dn = Θ(n²)
    (arithmetic series)

Worse than sorting!
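Both cases can also be checked by unrolling the recurrences directly. The following is a sketch that assumes a constant base case T(1) = Θ(1) and ignores rounding of subproblem sizes:

```latex
\begin{align*}
\text{Lucky:}\quad
  T(n) &= dn + d\,\tfrac{3}{4}n + d\left(\tfrac{3}{4}\right)^{2} n + \dots
        \le dn \sum_{j \ge 0} \left(\tfrac{3}{4}\right)^{j} + \Theta(1)
        = 4dn + \Theta(1) = \Theta(n), \\
\text{Unlucky:}\quad
  T(n) &= dn + d(n-1) + d(n-2) + \dots
        = d \sum_{m=1}^{n} m + \Theta(1)
        = d\,\frac{n(n+1)}{2} + \Theta(1) = \Theta(n^{2}).
\end{align*}
```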

SLIDE 6

Analysis of expected time

  • Call a pivot good if its rank lies in [n/4, 3n/4].
  • How many good pivots are there? n/2
    ⇒ A random pivot has 50% chance of being good.
  • Let T(n,s) be the runtime random variable:

T(n,s) ≤ T(3n/4,s) + X(s)·dn

where X(s)·dn is the time to reduce the array size to ≤ 3n/4:
    X(s) = #times it takes to find a good pivot
    dn = runtime of partition
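The count of good pivots is easy to confirm by brute force. A small sketch (the function name `count_good_pivots` is an assumption) that tests n/4 ≤ r ≤ 3n/4 in integer arithmetic:

```python
def count_good_pivots(n):
    """Count ranks r in 1..n with n/4 <= r <= 3n/4 (both sides scaled by 4)."""
    return sum(1 for r in range(1, n + 1) if n <= 4 * r <= 3 * n)


n = 1000
good = count_good_pivots(n)     # ranks 250..750
fraction_good = good / n        # close to 1/2
```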

SLIDE 7

Analysis of expected time

Lemma: A fair coin needs to be tossed an expected number of 2 times until the first “heads” is seen.

Proof: Let E(X) be the expected number of tosses until the first “heads” is seen.

  • Need at least one toss, if it’s “heads” we are done.
  • If it’s “tails” we need to repeat (probability ½).

⇒ E(X) = 1 + ½ E(X) ⇒ E(X) = 2
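The lemma can be cross-checked against the definition of expectation: the first “heads” appears on toss t with probability (1/2)^t, so E(X) is the sum of t·(1/2)^t over t ≥ 1. The sketch below (function name assumed) evaluates partial sums exactly with `fractions.Fraction`:

```python
from fractions import Fraction


def expected_tosses(max_tosses):
    """Partial sum of t * (1/2)**t for t = 1..max_tosses, computed exactly."""
    half = Fraction(1, 2)
    return sum(t * half**t for t in range(1, max_tosses + 1))


partial = expected_tosses(30)
gap = 2 - partial    # exact distance of the truncated sum from the limit 2
```

The gap shrinks geometrically, consistent with E(X) = 2.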

SLIDE 8

Analysis of expected time

T(n,s) ≤ T(3n/4,s) + X(s)·dn

where X(s)·dn is the time to reduce the array size to ≤ 3n/4:
    X(s) = #times it takes to find a good pivot
    dn = runtime of partition

⇒ E(T(n,s)) ≤ E(T(3n/4,s)) + E(X(s)·dn)
⇒ E(T(n,s)) ≤ E(T(3n/4,s)) + E(X(s))·dn        (linearity of expectation)
⇒ E(T(n,s)) ≤ E(T(3n/4,s)) + 2dn               (Lemma)
⇒ Texp(n) ≤ Texp(3n/4) + Θ(n)
⇒ Texp(n) ∈ Θ(n)
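The last step can be made explicit by unrolling the recurrence. This sketch assumes a constant base case and ignores rounding; the matching lower bound holds because the first partition alone costs dn.

```latex
\begin{align*}
T_{\mathrm{exp}}(n)
  &\le T_{\mathrm{exp}}\!\left(\tfrac{3}{4}n\right) + 2dn \\
  &\le 2dn \sum_{j \ge 0} \left(\tfrac{3}{4}\right)^{j} + \Theta(1)
   = 8dn + \Theta(1), \\
T_{\mathrm{exp}}(n) &\ge dn
  \quad\Longrightarrow\quad T_{\mathrm{exp}}(n) \in \Theta(n).
\end{align*}
```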

SLIDE 9

Summary of randomized order-statistic selection

  • Works fast: linear expected time.
  • Excellent algorithm in practice.
  • But, the worst case is very bad: Θ(n²).
  • Q. Is there an algorithm that runs in linear time in the worst case?
  • A. Yes, due to Blum, Floyd, Pratt, Rivest, and Tarjan [1973].

IDEA: Generate a good pivot recursively.

This algorithm has large constants though and therefore is not efficient in practice.

SLIDE 10

Worst-case linear-time order statistics

SELECT(i, n)

  • 1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
  • 2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.
  • 3. Partition around the pivot x. Let k = rank(x).
  • 4. if i = k then return x
       elseif i < k
           then recursively SELECT the ith smallest element in the lower part
           else recursively SELECT the (i–k)th smallest element in the upper part

(Step 4 is the same as in RAND-SELECT.)
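The four steps might be sketched in Python as follows. This is an illustration under stated assumptions, not a tuned implementation: all elements are distinct (as the slides assume), inputs of at most 5 elements are sorted directly as a base case, only the ⌊n/5⌋ full groups contribute medians, and the partition builds new lists instead of rearranging in place. The function name `select` and the test data are assumptions.

```python
def select(values, i):
    """Return the i-th smallest (1-based) of a list of distinct values."""
    a = list(values)
    while True:                          # loop form of the Step 4 recursion
        n = len(a)
        if n <= 5:                       # assumed base case: sort directly
            return sorted(a)[i - 1]
        # Step 1: median of each full group of 5 ("by rote": sort 5 items).
        medians = [sorted(a[j:j + 5])[2] for j in range(0, n - n % 5, 5)]
        # Step 2: recursively select the median x of the floor(n/5) medians.
        x = select(medians, (len(medians) + 1) // 2)
        # Step 3: partition around x; k is the rank of x.
        lower = [v for v in a if v < x]
        upper = [v for v in a if v > x]
        k = len(lower) + 1
        # Step 4: same case analysis as RAND-SELECT.
        if i == k:
            return x
        if i < k:
            a = lower
        else:
            a, i = upper, i - k


# A fixed permutation of 1..100 (101 is prime, so 37*v mod 101 hits each value once).
data = [(37 * v) % 101 for v in range(1, 101)]
median = select(data, 50)
```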

SLIDE 11

Choosing the pivot

SLIDE 12

Choosing the pivot

  • 1. Divide the n elements into groups of 5.

SLIDE 13

Choosing the pivot

  • 1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.

[Figure: the 5-element groups, each ordered from lesser to greater.]

SLIDE 14

Choosing the pivot

  • 1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
  • 2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.

[Figure: the 5-element groups, each ordered from lesser to greater, with the pivot x marked.]

SLIDE 15

Developing the recurrence

SELECT(i, n), with the contribution of each step to the running time T(n):

  • 1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
       Cost: Θ(n)
  • 2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.
       Cost: T(n/5)
  • 3. Partition around the pivot x. Let k = rank(x).
       Cost: Θ(n)
  • 4. if i = k then return x
       elseif i < k
           then recursively SELECT the ith smallest element in the lower part
           else recursively SELECT the (i–k)th smallest element in the upper part
       Cost: T( ? )

SLIDE 16

Analysis

[Figure: the 5-element groups, each ordered from lesser to greater, with the pivot x marked.]

At least half the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.

(Assume all elements are distinct.)

SLIDE 17

Analysis

[Figure: the 5-element groups, each ordered from lesser to greater, with the pivot x marked.]

At least half the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.

  • Therefore, at least 3⌊n/10⌋ elements are ≤ x.

(Assume all elements are distinct.)

SLIDE 18

Analysis

[Figure: the 5-element groups, each ordered from lesser to greater, with the pivot x marked.]

At least half the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.

  • Therefore, at least 3⌊n/10⌋ elements are ≤ x.
  • Similarly, at least 3⌊n/10⌋ elements are ≥ x.

(Assume all elements are distinct.)

SLIDE 19

Analysis (Assume all elements are distinct.)

  • At least 3⌊n/10⌋ elements are ≤ x
    ⇒ at most n – 3⌊n/10⌋ elements are > x
  • At least 3⌊n/10⌋ elements are ≥ x
    ⇒ at most n – 3⌊n/10⌋ elements are < x
  • The recursive call to SELECT in Step 4 is executed recursively on ≤ n – 3⌊n/10⌋ elements.

Need “at most” for worst-case runtime

SLIDE 20

Analysis (Assume all elements are distinct.)

  • Use fact that ⌊a/b⌋ ≥ (a – (b – 1))/b (page 51)
  • n – 3⌊n/10⌋ ≤ n – 3·(n – 9)/10 = (10n – 3n + 27)/10 ≤ 7n/10 + 3
  • The recursive call to SELECT in Step 4 is executed recursively on at most 7n/10 + 3 elements.
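The bound n – 3⌊n/10⌋ ≤ 7n/10 + 3 can be spot-checked numerically. The sketch below (function name assumed) compares both sides in integer arithmetic by scaling with 10; it is a sanity check over a finite range, not a proof.

```python
def worst_case_subproblem(n):
    """Upper bound n - 3*floor(n/10) on the size of the Step 4 recursive call."""
    return n - 3 * (n // 10)


# Check 10*(n - 3*floor(n/10)) <= 7n + 30, i.e. the bound scaled by 10.
bound_holds = all(10 * worst_case_subproblem(n) <= 7 * n + 30
                  for n in range(1, 10_001))
# The bound is tightest when n mod 10 = 9, e.g. n = 19: 16 versus 16.3.
example = worst_case_subproblem(19)
```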

SLIDE 21

Developing the recurrence

SELECT(i, n), with the contribution of each step to the running time T(n):

  • 1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
       Cost: Θ(n)
  • 2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.
       Cost: T(n/5)
  • 3. Partition around the pivot x. Let k = rank(x).
       Cost: Θ(n)
  • 4. if i = k then return x
       elseif i < k
           then recursively SELECT the ith smallest element in the lower part
           else recursively SELECT the (i–k)th smallest element in the upper part
       Cost: T(7n/10 + 3)

SLIDE 22

Solving the recurrence

T(n) ≤ T(n/5) + T(7n/10 + 3) + dn        (dn for the Θ(n) terms)

Big-Oh Induction: T(n) ≤ c(n – 3)

Technical trick: subtracting 3 in the hypothesis absorbs the + 3 in T(7n/10 + 3).

T(n) ≤ c(n/5 – 3) + c(7n/10 + 3 – 3) + dn
     = (9/10)cn – 3c + dn
     = c(n – 3) – (1/10)cn + dn
     ≤ c(n – 3),

if c is chosen large enough, e.g., c = 10d.

This shows that T(n) ∈ O(n).
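The induction can be sanity-checked by evaluating the recurrence numerically. The sketch below makes several assumptions that are not in the slides: d = 1, floors on the subproblem sizes, and a base case T(n) = dn for n < 50 (the cutoff is arbitrary but keeps both subproblems smaller than n). With c = 10d the bound T(n) ≤ c(n – 3) is then checked for 4 ≤ n ≤ 20000.

```python
from functools import lru_cache

D = 1          # assumed constant in the dn term
C = 10 * D     # the choice c = 10d from the induction
BASE = 50      # assumed cutoff: below it, T(n) = D * n


@lru_cache(maxsize=None)
def T(n):
    """Evaluate T(n) = T(n // 5) + T(7n // 10 + 3) + D*n with the assumed base case."""
    if n < BASE:
        return D * n
    return T(n // 5) + T(7 * n // 10 + 3) + D * n


bound_holds = all(T(n) <= C * (n - 3) for n in range(4, 20_001))
```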

SLIDE 23

Conclusions

  • Since the work at each level of recursion is basically a constant fraction (9/10) smaller, the work per level is a geometric series dominated by the linear work at the root.
  • In practice, this algorithm runs slowly, because the constant in front of n is large.
  • The randomized algorithm is far more practical.

Exercise: Try to divide into groups of 3 or 7.
