Order Statistics (Carola Wenk; slides courtesy of Charles Leiserson)



SLIDE 1

CS 3343 – Fall 2011

Order Statistics

Carola Wenk

Slides courtesy of Charles Leiserson with small changes by Carola Wenk

10/6/11 CS 3343 Analysis of Algorithms 1

SLIDE 2

Order statistics

Select the ith smallest of n elements (the element with rank i).

  • i = 1: minimum;
  • i = n: maximum;
  • i = ⌊(n+1)/2⌋ or ⌈(n+1)/2⌉: median.

Naive algorithm: Sort and index the ith element.

Worst-case running time = Θ(n log n + 1) = Θ(n log n), using merge sort or heapsort (not quicksort).
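A minimal Python sketch of the naive algorithm, assuming the input is a Python list and i is 1-indexed as on the slides. Python's built-in sorted() uses Timsort rather than merge sort or heapsort, but Timsort also runs in O(n log n) time in the worst case.

```python
def naive_select(a, i):
    """Return the ith smallest element of a (i is 1-indexed, as on the slides)."""
    if not 1 <= i <= len(a):
        raise ValueError("i must satisfy 1 <= i <= len(a)")
    # Sort a copy (O(n log n) worst case), then index the ith element.
    return sorted(a)[i - 1]


if __name__ == "__main__":
    data = [6, 10, 13, 5, 8, 3, 2, 11]
    print(naive_select(data, 1))  # minimum: 2
    print(naive_select(data, 7))  # 11
```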

SLIDE 3

Randomized divide-and-conquer algorithm

RAND-SELECT(A, p, q, i)          ⊳ ith smallest of A[p . . q]
    if p = q then return A[p]
    r ← RAND-PARTITION(A, p, q)
    k ← r – p + 1                ⊳ k = rank(A[r])
    if i = k then return A[r]
    if i < k
        then return RAND-SELECT(A, p, r – 1, i)
        else return RAND-SELECT(A, r + 1, q, i – k)

[Figure: A[p . . q] after partitioning around A[r]; the elements of A[p . . r – 1] are ≤ A[r], the elements of A[r + 1 . . q] are ≥ A[r], and k counts the elements of A[p . . r].]
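A minimal Python sketch of RAND-SELECT, using 0-indexed inclusive bounds p..q and a 1-indexed rank i. The slides do not give the code for RAND-PARTITION, so this sketch assumes it swaps a uniformly random element into A[p] and then partitions around A[p] with a single left-to-right scan. Both recursive calls are tail calls, so they are written as a loop to stay clear of Python's recursion limit.

```python
import random


def partition(a, p, q):
    """Partition a[p..q] (inclusive) around the pivot x = a[p].

    Afterwards a[p..r-1] <= a[r] < a[r+1..q]; returns the pivot index r.
    """
    x = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]
    return i


def rand_partition(a, p, q):
    """Swap a uniformly random element of a[p..q] into a[p], then partition."""
    s = random.randint(p, q)  # randint includes both endpoints
    a[p], a[s] = a[s], a[p]
    return partition(a, p, q)


def rand_select(a, p, q, i):
    """Return the ith smallest element of a[p..q]; rearranges a in place."""
    while True:
        if p == q:
            return a[p]
        r = rand_partition(a, p, q)
        k = r - p + 1  # k = rank of a[r] within a[p..q]
        if i == k:
            return a[r]
        if i < k:
            q = r - 1  # continue in the lower part with the same i
        else:
            p = r + 1  # continue in the upper part with rank i - k
            i -= k


if __name__ == "__main__":
    data = [6, 10, 13, 5, 8, 3, 2, 11]
    print(rand_select(data[:], 0, len(data) - 1, 7))  # 11
```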

SLIDE 4

Example

Select the i = 7th smallest:

    6 10 13 5 8 3 2 11        (pivot = 6)

Partition:

    2 5 3 6 8 13 10 11        k = 4

Select the 7 – 4 = 3rd smallest recursively, in the upper part 8 13 10 11.
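The partition step in this example can be reproduced with a short Python sketch, assuming the pivot is the first element (6) and a partition that scans left to right, growing a prefix of elements ≤ the pivot. Other partition schemes (for example Hoare's) would leave a different, equally valid arrangement with the same k.

```python
def partition(a, p, q):
    """Partition a[p..q] (inclusive) around the pivot x = a[p].

    Scans left to right, growing a prefix of elements <= x, then swaps the
    pivot into place. Returns the pivot's final index r.
    """
    x = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]
    return i


if __name__ == "__main__":
    data = [6, 10, 13, 5, 8, 3, 2, 11]
    r = partition(data, 0, len(data) - 1)
    print(data)  # [2, 5, 3, 6, 8, 13, 10, 11]
    k = r + 1
    print(k)     # 4
    # The 7th smallest overall is the (7 - k)th smallest of the upper part.
    print(sorted(data[r + 1:])[7 - k - 1])  # 11
```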

SLIDE 5

Intuition for analysis

(All our analyses today assume that all elements are distinct.)

Lucky:

$$T(n) = T(9n/10) + dn \qquad (dn \text{ for RAND-PARTITION})$$

Since $n^{\log_{10/9} 1} = n^0 = 1$, CASE 3 of the master theorem gives $T(n) = \Theta(n)$.

Unlucky:

$$T(n) = T(n-1) + dn = \Theta(n^2) \qquad \text{(arithmetic series)}$$

Worse than sorting!
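A quick numeric sketch in Python that unrolls both recurrences, assuming d = 1, T(0) = 0, and floors to keep subproblem sizes integral (none of these choices come from the slides). The lucky recurrence stays below 10n because the level costs form a geometric series with ratio 9/10, while the unlucky one sums the arithmetic series n + (n − 1) + ⋯ + 1.

```python
def lucky(n, d=1):
    """Unroll T(n) = T(floor(9n/10)) + d*n with T(0) = 0."""
    total = 0
    while n > 0:
        total += d * n
        n = 9 * n // 10  # each level keeps at most 9/10 of the elements
    return total


def unlucky(n, d=1):
    """Unroll T(n) = T(n - 1) + d*n with T(0) = 0."""
    total = 0
    while n > 0:
        total += d * n
        n -= 1  # each level removes only one element
    return total


if __name__ == "__main__":
    for n in (10, 100, 1000, 10_000):
        print(n, lucky(n), unlucky(n))
```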

SLIDE 6

Analysis of expected time

The analysis follows that of randomized quicksort, but it's a little different.

Let T(n) = the random variable for the running time of RAND-SELECT on an input of size n, assuming random numbers are independent.

For k = 0, 1, …, n–1, define the indicator random variable

$$X_k = \begin{cases} 1 & \text{if PARTITION generates a } k : n-k-1 \text{ split,} \\ 0 & \text{otherwise.} \end{cases}$$
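Because the pivot is chosen uniformly at random and the elements are distinct, each split k : n–k–1 comes from exactly one of the n pivot choices, so Pr{X_k = 1} = 1/n, the value of E[X_k] used later in the derivation. The Python sketch below checks this by trying every pivot position, assuming RAND-PARTITION moves the chosen element to the front and then partitions around it with a left-to-right scan.

```python
def partition(a, p, q):
    """Partition a[p..q] around the pivot a[p]; return the pivot's final index."""
    x = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]
    return i


def split_sizes(values):
    """For every possible pivot choice, return k, the size of the lower side.

    Choosing pivot position s yields a k : n-k-1 split, where k is the number
    of elements smaller than the pivot (elements assumed distinct).
    """
    n = len(values)
    ks = []
    for s in range(n):
        a = list(values)
        a[0], a[s] = a[s], a[0]            # move the chosen pivot to the front
        ks.append(partition(a, 0, n - 1))  # pivot's final index equals k
    return ks


if __name__ == "__main__":
    ks = split_sizes([6, 10, 13, 5, 8, 3, 2, 11])
    # Each k in 0..n-1 appears exactly once, so Pr{X_k = 1} = 1/n.
    print(sorted(ks))  # [0, 1, 2, 3, 4, 5, 6, 7]
```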

SLIDE 7

Analysis (continued)

To obtain an upper bound, assume that the ith element always falls in the larger side of the partition:

$$
\begin{aligned}
T(n) &= \begin{cases}
T(\max\{0,\, n-1\}) + dn & \text{if } 0 : n-1 \text{ split,} \\
T(\max\{1,\, n-2\}) + dn & \text{if } 1 : n-2 \text{ split,} \\
\quad\vdots & \\
T(\max\{n-1,\, 0\}) + dn & \text{if } n-1 : 0 \text{ split}
\end{cases} \\
&= \sum_{k=0}^{n-1} X_k \bigl( T(\max\{k,\, n-k-1\}) + dn \bigr) \\
&\le \sum_{k=\lfloor n/2 \rfloor}^{n-1} 2 X_k\, T(k) + dn.
\end{aligned}
$$

SLIDE 8

Calculating expectation

$$E[T(n)] = E\left[ \sum_{k=\lfloor n/2 \rfloor}^{n-1} 2 X_k\, T(k) + dn \right]$$

Take expectations of both sides.

SLIDE 9

Calculating expectation

$$
\begin{aligned}
E[T(n)] &= E\left[ \sum_{k=\lfloor n/2 \rfloor}^{n-1} 2 X_k\, T(k) + dn \right] \\
&= \sum_{k=\lfloor n/2 \rfloor}^{n-1} 2\, E[X_k\, T(k)] + dn
\end{aligned}
$$

Linearity of expectation.

SLIDE 10

Calculating expectation

$$
\begin{aligned}
E[T(n)] &= E\left[ \sum_{k=\lfloor n/2 \rfloor}^{n-1} 2 X_k\, T(k) + dn \right] \\
&= \sum_{k=\lfloor n/2 \rfloor}^{n-1} 2\, E[X_k\, T(k)] + dn \\
&= \sum_{k=\lfloor n/2 \rfloor}^{n-1} 2\, E[X_k] \cdot E[T(k)] + dn
\end{aligned}
$$

Independence of X_k from other random choices.

SLIDE 11

Calculating expectation

$$
\begin{aligned}
E[T(n)] &= E\left[ \sum_{k=\lfloor n/2 \rfloor}^{n-1} 2 X_k\, T(k) + dn \right] \\
&= \sum_{k=\lfloor n/2 \rfloor}^{n-1} 2\, E[X_k\, T(k)] + dn \\
&= \sum_{k=\lfloor n/2 \rfloor}^{n-1} 2\, E[X_k] \cdot E[T(k)] + dn \\
&= \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + dn
\end{aligned}
$$

Linearity of expectation; E[X_k] = 1/n.

SLIDE 12

Calculating expectation

$$
\begin{aligned}
E[T(n)] &= E\left[ \sum_{k=\lfloor n/2 \rfloor}^{n-1} 2 X_k\, T(k) + dn \right] \\
&= \sum_{k=\lfloor n/2 \rfloor}^{n-1} 2\, E[X_k\, T(k)] + dn \\
&= \sum_{k=\lfloor n/2 \rfloor}^{n-1} 2\, E[X_k] \cdot E[T(k)] + dn \\
&= \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + dn
\end{aligned}
$$

SLIDE 13

Hairy recurrence

(But not quite as hairy as the quicksort one.)

$$E[T(n)] = \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + dn$$

Prove: E[T(n)] ≤ cn for constant c > 0.

  • The constant c can be chosen large enough so that E[T(n)] ≤ cn for the base cases.

Use fact:

$$\sum_{k=\lfloor n/2 \rfloor}^{n-1} k \le \frac{3}{8} n^2 \qquad \text{(exercise).}$$
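A brute-force spot check in Python of the fact for n = 1, …, 2000. This is not a proof, which remains the exercise. Multiplying both sides by 8 keeps the comparison in exact integer arithmetic.

```python
def upper_half_sum(n):
    """Return the sum of k for k = floor(n/2), ..., n - 1."""
    return sum(range(n // 2, n))


if __name__ == "__main__":
    # Multiply both sides of  sum <= (3/8) n^2  by 8 to compare exact integers.
    for n in range(1, 2001):
        assert 8 * upper_half_sum(n) <= 3 * n * n, n
    print("fact holds for n = 1..2000")
```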

SLIDE 14

Substitution method

$$E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + dn$$

Substitute inductive hypothesis.

SLIDE 15

Substitution method

$$
\begin{aligned}
E[T(n)] &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + dn \\
&\le \frac{2c}{n} \left( \frac{3}{8} n^2 \right) + dn
\end{aligned}
$$

Use fact.

SLIDE 16

Substitution method

$$
\begin{aligned}
E[T(n)] &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + dn \\
&\le \frac{2c}{n} \left( \frac{3}{8} n^2 \right) + dn \\
&= cn - \left( \frac{cn}{4} - dn \right)
\end{aligned}
$$

Express as desired – residual.

SLIDE 17

Substitution method

$$
\begin{aligned}
E[T(n)] &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + dn \\
&\le \frac{2c}{n} \left( \frac{3}{8} n^2 \right) + dn \\
&= cn - \left( \frac{cn}{4} - dn \right) \\
&\le cn,
\end{aligned}
$$

if c ≥ 4d.
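A numeric sanity check in Python, assuming d = 1 and the base case E[T(1)] = d (both arbitrary choices, not from the slides). Evaluating the hairy recurrence bottom-up, with prefix sums so each step costs O(1), the values stay below cn for c = 4d, matching the condition c ≥ 4d.

```python
def expected_time(n_max, d=1.0, base=1.0):
    """Tabulate E[T(n)] = (2/n) * sum_{k=floor(n/2)}^{n-1} E[T(k)] + d*n.

    Assumed base case: E[T(1)] = base (E[T(0)] is never used for n >= 2).
    prefix[m] holds E[T(0)] + ... + E[T(m-1)], so each sum is a difference
    of two prefix sums.
    """
    e = [0.0] * (n_max + 1)
    prefix = [0.0] * (n_max + 2)
    e[1] = base
    prefix[2] = e[0] + e[1]
    for n in range(2, n_max + 1):
        s = prefix[n] - prefix[n // 2]  # sum of e[floor(n/2) .. n-1]
        e[n] = 2.0 / n * s + d * n
        prefix[n + 1] = prefix[n] + e[n]
    return e


if __name__ == "__main__":
    d = 1.0
    c = 4 * d  # the slide's condition c >= 4d
    e = expected_time(100_000, d=d, base=d)
    assert all(e[n] <= c * n for n in range(1, len(e)))
    print(max(e[n] / n for n in range(1, len(e))))  # stays below c = 4
```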

SLIDE 18

Summary of randomized order-statistic selection

  • Works fast: linear expected time.
  • Excellent algorithm in practice.
  • But, the worst case is very bad: Θ(n²).
  • Q. Is there an algorithm that runs in linear time in the worst case?
  • A. Yes, due to Blum, Floyd, Pratt, Rivest, and Tarjan [1973].

IDEA: Generate a good pivot recursively.

SLIDE 19

Worst-case linear-time order statistics

SELECT(i, n)

  • 1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
  • 2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.
  • 3. Partition around the pivot x. Let k = rank(x).
  • 4. if i = k then return x
         elseif i < k
           then recursively SELECT the ith smallest element in the lower part
           else recursively SELECT the (i–k)th smallest element in the upper part

Step 4 is the same as in RAND-SELECT.
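A minimal Python sketch of SELECT that follows the four steps on plain lists. Two assumptions go beyond the slides: the last group may have fewer than 5 elements (rather than using only ⌊n/5⌋ full groups), and copies of the pivot are counted so the sketch stays correct when elements repeat. It favors clarity over the constant factors that matter in practice.

```python
def select(a, i):
    """Return the ith smallest element (1-indexed) of the list a; a is not modified.

    Median-of-medians selection following the four SELECT steps.
    """
    if not 1 <= i <= len(a):
        raise ValueError("i must satisfy 1 <= i <= len(a)")
    if len(a) <= 5:
        return sorted(a)[i - 1]  # small input: sort directly
    # Step 1: groups of 5 (the last group may be shorter); median of each by sorting.
    groups = [a[j:j + 5] for j in range(0, len(a), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Step 2: recursively SELECT the (lower) median of the group medians as pivot x.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 3: partition around x; k = rank(x).
    lower = [v for v in a if v < x]
    upper = [v for v in a if v > x]
    k = len(lower) + 1
    copies = len(a) - len(lower) - len(upper)  # copies of x (1 if distinct)
    # Step 4: same case analysis as RAND-SELECT.
    if i < k:
        return select(lower, i)
    if i < k + copies:  # i == k when elements are distinct
        return x
    return select(upper, i - (k - 1) - copies)


if __name__ == "__main__":
    data = [6, 10, 13, 5, 8, 3, 2, 11]
    print(select(data, 7))  # 11
```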

SLIDE 20

Choosing the pivot

SLIDE 21

Choosing the pivot

  • 1. Divide the n elements into groups of 5.

SLIDE 22

Choosing the pivot

[Figure: the elements arranged in groups of 5, each group ordered from lesser to greater.]

  • 1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.

SLIDE 23

Choosing the pivot

[Figure: the groups of 5 ordered from lesser to greater; x marks the median of the group medians.]

  • 1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
  • 2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.

SLIDE 24

Developing the recurrence

SELECT(i, n)   [T(n)]

  • 1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.   [Θ(n)]
  • 2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.   [T(n/5)]
  • 3. Partition around the pivot x. Let k = rank(x).   [Θ(n)]
  • 4. if i = k then return x
         elseif i < k
           then recursively SELECT the ith smallest element in the lower part
           else recursively SELECT the (i–k)th smallest element in the upper part   [T(?)]

SLIDE 25

Analysis (Assume all elements are distinct.)

[Figure: the groups of 5 ordered from lesser to greater, with x the median of the group medians.]

At least half the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.

SLIDE 26

Analysis (Assume all elements are distinct.)

[Figure: the groups of 5 ordered from lesser to greater, with x the median of the group medians.]

At least half the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.

  • Therefore, at least 3⌊n/10⌋ elements are ≤ x (each such group median plus the two smaller elements of its group).

SLIDE 27

Analysis (Assume all elements are distinct.)

[Figure: the groups of 5 ordered from lesser to greater, with x the median of the group medians.]

At least half the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.

  • Therefore, at least 3⌊n/10⌋ elements are ≤ x (each such group median plus the two smaller elements of its group).
  • Similarly, at least 3⌊n/10⌋ elements are ≥ x.
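A randomized spot check in Python of the two counting claims, assuming distinct elements and, as on the slides, only the ⌊n/5⌋ full groups of 5. The lower median of the group medians is used as x (an assumption; the upper median works the same way).

```python
import random


def pivot_counts(a):
    """Return (x, #elements <= x, #elements >= x) for the median-of-medians pivot x.

    Uses the floor(n/5) full groups of 5; assumes len(a) >= 5 and distinct elements.
    """
    m = len(a) // 5
    medians = sorted(sorted(a[5 * g:5 * g + 5])[2] for g in range(m))
    x = medians[(m - 1) // 2]  # lower median of the group medians
    at_most_x = sum(1 for v in a if v <= x)
    at_least_x = sum(1 for v in a if v >= x)
    return x, at_most_x, at_least_x


if __name__ == "__main__":
    random.seed(0)
    for n in range(5, 400):
        a = random.sample(range(10 * n), n)  # n distinct values
        _, le, ge = pivot_counts(a)
        assert le >= 3 * (n // 10) and ge >= 3 * (n // 10), n
    print("bound holds for n = 5..399")
```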
SLIDE 28

Analysis (Assume all elements are distinct.)

Need “at most” for worst-case runtime:

  • At least 3⌊n/10⌋ elements are ≤ x ⇒ at most n − 3⌊n/10⌋ elements are > x.
  • At least 3⌊n/10⌋ elements are ≥ x ⇒ at most n − 3⌊n/10⌋ elements are < x.
  • The recursive call to SELECT in Step 4 is executed recursively on at most n − 3⌊n/10⌋ elements.

SLIDE 29

Analysis (Assume all elements are distinct.)

  • Use fact that ⌊a/b⌋ ≥ (a − (b − 1))/b (page 51).
  • n − 3⌊n/10⌋ ≤ n − 3·(n − 9)/10 = (10n − 3n + 27)/10 ≤ 7n/10 + 3
  • The recursive call to SELECT in Step 4 is executed recursively on at most 7n/10 + 3 elements.
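An exhaustive check in Python of the floor fact and the resulting bound over a finite range, written in exact integer arithmetic by multiplying through by b and by 10. It is a spot check, not a replacement for the one-line argument above.

```python
def floor_fact_holds(a, b):
    """Check floor(a/b) >= (a - (b - 1)) / b, multiplied through by b > 0."""
    return b * (a // b) >= a - (b - 1)


def step4_bound_holds(n):
    """Check n - 3*floor(n/10) <= 7n/10 + 3, multiplied through by 10."""
    return 10 * (n - 3 * (n // 10)) <= 7 * n + 30


if __name__ == "__main__":
    assert all(floor_fact_holds(a, b) for a in range(0, 500) for b in range(1, 50))
    assert all(step4_bound_holds(n) for n in range(0, 100_000))
    print("both inequalities hold on the tested ranges")
```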

SLIDE 30

Developing the recurrence

SELECT(i, n)   [T(n)]

  • 1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.   [Θ(n)]
  • 2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.   [T(n/5)]
  • 3. Partition around the pivot x. Let k = rank(x).   [Θ(n)]
  • 4. if i = k then return x
         elseif i < k
           then recursively SELECT the ith smallest element in the lower part
           else recursively SELECT the (i–k)th smallest element in the upper part   [T(7n/10 + 3)]

SLIDE 31

Solving the recurrence

$$T(n) = T\left(\frac{1}{5}n\right) + T\left(\frac{7}{10}n + 3\right) + dn$$

where dn stands for the Θ(n) terms.

Substitution: T(n) ≤ c(n − 3) (technical trick)

$$
\begin{aligned}
T(n) &\le c\left(\frac{1}{5}n - 3\right) + c\left(\frac{7}{10}n + 3 - 3\right) + dn \\
&= \frac{9}{10}cn - 3c + dn \\
&= c(n-3) - \frac{1}{10}cn + dn \\
&\le c(n-3)
\end{aligned}
$$

This shows that T(n) ∈ O(n) if c is chosen large enough, e.g., c = 10d.
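A numeric sketch in Python that tabulates the recurrence, assuming d = 1, floors to keep sizes integral, and the base case T(n) = dn for n ≤ 10. None of these come from the slides; the base case is needed because ⌊7n/10⌋ + 3 < n only for n > 10. The assertion uses the simpler hypothesis T(n) ≤ cn instead of the slide's c(n − 3) trick: substituting gives T(n) ≤ (9/10)cn + 3c + dn, which is at most cn for n ≥ 60 when c = 20d, and the tabulated values for n < 60 lie well below 20n.

```python
def select_recurrence(n_max, d=1):
    """Tabulate T(n) = T(floor(n/5)) + T(floor(7n/10) + 3) + d*n bottom-up.

    Assumed base case (not from the slides): T(n) = d*n for n <= 10.
    """
    t = [0] * (n_max + 1)
    for n in range(n_max + 1):
        if n <= 10:
            t[n] = d * n
        else:
            # Both subproblem sizes are < n here, so they are already filled in.
            t[n] = t[n // 5] + t[7 * n // 10 + 3] + d * n
    return t


if __name__ == "__main__":
    t = select_recurrence(100_000)
    assert all(t[n] <= 20 * n for n in range(1, len(t)))
    print(max(t[n] / n for n in range(1, len(t))))
```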

SLIDE 32

Conclusions

  • Since the total work at each level of recursion is a constant fraction (9/10) of the work at the level above (the two recursive calls cover about n/5 + 7n/10 = 9n/10 elements), the work summed over the levels is a geometric series dominated by the linear work at the root.
  • In practice, this algorithm runs slowly, because the constant in front of n is large.
  • The randomized algorithm is far more practical.

Exercise: Try to divide into groups of 3 or 7.