2/12/08   CS 5633 Analysis of Algorithms -- Spring 2008

Order Statistics

Carola Wenk
Slides courtesy of Charles Leiserson, with small changes by Carola Wenk


Order statistics

Select the ith smallest of n elements (the element with rank i).

  • i = 1: minimum;
  • i = n: maximum;
  • i = ⌊(n+1)/2⌋ or ⌈(n+1)/2⌉: median.

Naive algorithm: sort, then index the ith element. Worst-case running time = Θ(n log n) + Θ(1) = Θ(n log n), using merge sort or heapsort (not quicksort, whose worst case is Θ(n²)).
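As a quick illustration, the naive approach takes only a line of Python (`naive_select` is our name for it, not a name from the slides):

```python
def naive_select(a, i):
    """Return the ith smallest element of a (rank i, 1-indexed)
    by sorting first: the Theta(n log n) sort dominates the Theta(1) index."""
    return sorted(a)[i - 1]

# The array used in the lecture's example, with i = 7:
print(naive_select([6, 10, 13, 5, 8, 3, 2, 11], 7))  # → 11
```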


Randomized divide-and-conquer algorithm

RAND-SELECT(A, p, q, i)          ⊳ ith smallest of A[p . . q]
  if p = q then return A[p]
  r ← RAND-PARTITION(A, p, q)
  k ← r − p + 1                  ⊳ k = rank(A[r])
  if i = k then return A[r]
  if i < k
    then return RAND-SELECT(A, p, r − 1, i)
    else return RAND-SELECT(A, r + 1, q, i − k)

[figure: after partitioning, A[p . . r−1] ≤ A[r] ≤ A[r+1 . . q]; the pivot A[r] sits at index r and has rank k]
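A runnable Python sketch of RAND-SELECT; the slides do not show the body of RAND-PARTITION, so a Lomuto-style partition with a uniformly random pivot is assumed here:

```python
import random

def rand_partition(a, p, q):
    # Swap a uniformly random element into position q, then run a
    # standard Lomuto partition around it; return the pivot's final index.
    r = random.randint(p, q)
    a[r], a[q] = a[q], a[r]
    pivot = a[q]
    i = p - 1
    for j in range(p, q):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[q] = a[q], a[i + 1]
    return i + 1

def rand_select(a, p, q, i):
    # ith smallest of a[p..q] (i is 1-based within the subarray).
    if p == q:
        return a[p]
    r = rand_partition(a, p, q)
    k = r - p + 1          # k = rank of a[r] within a[p..q]
    if i == k:
        return a[r]
    if i < k:
        return rand_select(a, p, r - 1, i)
    return rand_select(a, r + 1, q, i - k)
```

On the lecture's example array, `rand_select([6, 10, 13, 5, 8, 3, 2, 11], 0, 7, 7)` returns 11, matching the sort-and-index answer regardless of the random pivot choices.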


Example

Select the i = 7th smallest:

  6 10 13 5 8 3 2 11        (pivot = 6)

Partition:

  2 5 3 6 8 13 10 11        k = 4

Select the 7 − 4 = 3rd smallest recursively in the upper part.


Intuition for analysis

Lucky:
  T(n) = T(9n/10) + Θ(n)
       = Θ(n)               (CASE 3: n^(log_{10/9} 1) = n^0 = 1)

Unlucky:
  T(n) = T(n − 1) + Θ(n)
       = Θ(n²)              (arithmetic series)

Worse than sorting!

(All our analyses today assume that all elements are distinct.)


Analysis of expected time

Let T(n) = the random variable for the running time of RAND-SELECT on an input of size n, assuming random numbers are independent.

For k = 0, 1, …, n−1, define the indicator random variable
  X_k = 1 if PARTITION generates a k : n−k−1 split, 0 otherwise.

The analysis follows that of randomized quicksort, but it's a little different.


Analysis (continued)

T(n) = T(max{0, n−1}) + Θ(n)   if 0 : n−1 split,
       T(max{1, n−2}) + Θ(n)   if 1 : n−2 split,
         ⋮
       T(max{n−1, 0}) + Θ(n)   if n−1 : 0 split.

Hence

  T(n) = Σ_{k=0}^{n−1} X_k · ( T(max{k, n−k−1}) + Θ(n) ).

To obtain an upper bound, assume that the ith element always falls in the larger side of the partition:

  T(n) ≤ 2 Σ_{k=⌊n/2⌋}^{n−1} X_k T(k) + Θ(n).


Calculating expectation

Take expectations of both sides.

  E[T(n)] = E[ 2 Σ_{k=⌊n/2⌋}^{n−1} X_k T(k) + Θ(n) ]


Calculating expectation

Linearity of expectation.

  E[T(n)] = E[ 2 Σ_{k=⌊n/2⌋}^{n−1} X_k T(k) + Θ(n) ]
          = 2 Σ_{k=⌊n/2⌋}^{n−1} E[X_k T(k)] + Θ(n)


Calculating expectation

Independence of Xk from other random choices.

  E[T(n)] = 2 Σ_{k=⌊n/2⌋}^{n−1} E[X_k T(k)] + Θ(n)
          = 2 Σ_{k=⌊n/2⌋}^{n−1} E[X_k] · E[T(k)] + Θ(n)


Calculating expectation

Linearity of expectation; E[Xk] = 1/n.

  E[T(n)] = 2 Σ_{k=⌊n/2⌋}^{n−1} E[X_k] · E[T(k)] + Θ(n)
          = (2/n) Σ_{k=⌊n/2⌋}^{n−1} E[T(k)] + Θ(n)


Calculating expectation

Putting the steps together:

  E[T(n)] = (2/n) Σ_{k=⌊n/2⌋}^{n−1} E[T(k)] + Θ(n)


Hairy recurrence

  E[T(n)] = (2/n) Σ_{k=⌊n/2⌋}^{n−1} E[T(k)] + Θ(n)

Prove: E[T(n)] ≤ cn for some constant c > 0.

  • Use fact: Σ_{k=⌊n/2⌋}^{n−1} k ≤ (3/8)n²  (exercise).
  • The constant c can be chosen large enough so that E[T(n)] ≤ cn for the base cases.

(But not quite as hairy as the quicksort one.)
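The fact left as an exercise is easy to sanity-check numerically; this is a spot check over a range of sizes, not a proof:

```python
def fact_holds(n):
    # sum_{k = floor(n/2)}^{n-1} k  <=  (3/8) * n^2
    return sum(range(n // 2, n)) <= 3 * n * n / 8

# Check the exercise's inequality for every n up to 2000.
print(all(fact_holds(n) for n in range(1, 2000)))  # → True
```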


Substitution method

  E[T(n)] ≤ (2/n) Σ_{k=⌊n/2⌋}^{n−1} ck + Θ(n)

Substitute the inductive hypothesis E[T(k)] ≤ ck.


Substitution method

  E[T(n)] ≤ (2/n) Σ_{k=⌊n/2⌋}^{n−1} ck + Θ(n)
          ≤ (2c/n)(3n²/8) + Θ(n)

Use the fact.


Substitution method

  E[T(n)] ≤ (2c/n)(3n²/8) + Θ(n)
          = cn − (cn/4 − Θ(n))

Express as desired minus residual.

Substitution method

  E[T(n)] ≤ cn − (cn/4 − Θ(n))
          ≤ cn,

if c is chosen large enough so that cn/4 dominates the Θ(n).


Summary of randomized order-statistic selection

  • Works fast: linear expected time.
  • Excellent algorithm in practice.
  • But the worst case is very bad: Θ(n²).

Q. Is there an algorithm that runs in linear time in the worst case?
A. Yes, due to Blum, Floyd, Pratt, Rivest, and Tarjan [1973].
IDEA: Generate a good pivot recursively.


Worst-case linear-time order statistics

SELECT(i, n)
  1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
  2. Recursively SELECT the median x of the ⌈n/5⌉ group medians to be the pivot.
  3. Partition around the pivot x. Let k = rank(x).
  4. Same as RAND-SELECT:
       if i = k then return x
       elseif i < k then recursively SELECT the ith smallest element in the lower part
       else recursively SELECT the (i−k)th smallest element in the upper part
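A compact Python sketch of SELECT, assuming distinct elements as the lecture does; it uses list-based partitioning for clarity rather than the in-place partition the slides assume:

```python
def select(a, i):
    """ith smallest (1-indexed) of a list of distinct elements,
    in worst-case linear time (Blum-Floyd-Pratt-Rivest-Tarjan)."""
    if len(a) <= 5:
        return sorted(a)[i - 1]
    # 1. Groups of 5; median of each group "by rote" (here: by sorting it).
    groups = [a[j:j + 5] for j in range(0, len(a), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # 2. Recursively SELECT the median x of the group medians as the pivot.
    x = select(medians, (len(medians) + 1) // 2)
    # 3. Partition around x; k = rank(x).
    lower = [v for v in a if v < x]
    upper = [v for v in a if v > x]
    k = len(lower) + 1
    # 4. Same as RAND-SELECT.
    if i == k:
        return x
    if i < k:
        return select(lower, i)
    return select(upper, i - k)
```

For example, `select(list(range(100, 0, -1)), 37)` returns 37: the 37th smallest of the reversed list 100, 99, …, 1.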



Choosing the pivot

[figure: the n elements arranged in ⌈n/5⌉ sorted columns of 5, lesser to greater; the group medians form the middle row, and x is the median of those medians]

  1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
  2. Recursively SELECT the median x of the ⌈n/5⌉ group medians to be the pivot.


Developing the recurrence

SELECT(i, n)
  1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.   [Θ(n)]
  2. Recursively SELECT the median x of the ⌈n/5⌉ group medians to be the pivot.   [T(n/5)]
  3. Partition around the pivot x. Let k = rank(x).   [Θ(n)]
  4. if i = k then return x
     elseif i < k then recursively SELECT the ith smallest element in the lower part
     else recursively SELECT the (i−k)th smallest element in the upper part   [T(?)]



Analysis

(Assume all elements are distinct.)

[figure: the sorted columns of 5, with the group medians ≤ x highlighted]

At least half the group medians are ≤ x, which is at least ⌈⌈n/5⌉/2⌉ = ⌈n/10⌉ group medians.

  • Therefore, at least 3⌈n/10⌉ elements are ≤ x.
  • Similarly, at least 3⌈n/10⌉ elements are ≥ x.


  • At least 3n/10 elements are ≤ x

⇒ at most n-3n/10 elements are ≥ x

  • At least 3n/10 elements are ≥ x

⇒ at most n-3n/10 elements are ≤ x

  • The recursive call to SELECT in Step 4 is

executed recursively on n-3n/10 elements.

Analysis (Assume all elements are distinct.)

Need “at most” for worst-case runtime


  • Use fact that a/b ≥ ((a-(b-1))/b (page 51)
  • n-3n/10 ≤ n-3·(n-9)/10 = (10n -3n +27)/10

≤ 7n/10 + 3

  • The recursive call to SELECT in Step 4 is

executed recursively on at most 7n/10+3 elements.

Analysis (Assume all elements are distinct.)
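The bound can be spot-checked numerically; this is a sanity check of the arithmetic, not a proof:

```python
import math

def step4_bound_holds(n):
    # At most n - 3*ceil(n/10) elements reach the recursive call;
    # the claim is that this is at most 7n/10 + 3.
    return n - 3 * math.ceil(n / 10) <= 7 * n / 10 + 3

print(all(step4_bound_holds(n) for n in range(1, 10000)))  # → True
```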


Developing the recurrence

SELECT(i, n)
  1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.   [Θ(n)]
  2. Recursively SELECT the median x of the ⌈n/5⌉ group medians to be the pivot.   [T(n/5)]
  3. Partition around the pivot x. Let k = rank(x).   [Θ(n)]
  4. if i = k then return x
     elseif i < k then recursively SELECT the ith smallest element in the lower part
     else recursively SELECT the (i−k)th smallest element in the upper part   [T(7n/10 + 3)]


Solving the recurrence

  T(n) ≤ T(⌈n/5⌉) + T(7n/10 + 3) + dn,

where dn stands for the Θ(n) work.

Substitution: T(n) ≤ c(n − 4).

  T(n) ≤ c(n/5 + 1 − 4) + c(7n/10 + 3 − 4) + dn
       = 9cn/10 − 4c + dn
       = c(n − 4) − (cn/10 − dn)
       ≤ c(n − 4),

if c is chosen large enough, e.g., c = 10d. Subtracting 4 is a technical trick; this shows that T(n) ∈ O(n).
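The substitution can also be watched numerically by iterating the recurrence itself. Here d = 1 and a base case of T(n) = n for n ≤ 50 are assumed values for illustration, which makes c = 10d = 10:

```python
import math
from functools import lru_cache

D = 1.0        # assumed constant for the Theta(n) work per call
C = 10 * D     # c = 10d, as chosen on the slide

@lru_cache(maxsize=None)
def t(n):
    # T(n) = T(ceil(n/5)) + T(7n/10 + 3) + d*n, with an assumed base case.
    if n <= 50:
        return float(n)
    return t(math.ceil(n / 5)) + t(7 * n // 10 + 3) + D * n

# The substitution predicts T(n) <= c*(n - 4) beyond the base cases.
print(all(t(n) <= C * (n - 4) for n in range(55, 4000)))  # → True
```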


Conclusions

  • Since the work at each level of recursion is a constant fraction (9/10) smaller, the work per level forms a geometric series dominated by the linear work at the root.
  • In practice, this algorithm runs slowly, because the constant in front of n is large.
  • The randomized algorithm is far more practical.

Exercise: Try to divide into groups of 3 or 7.