SLIDE 1

Randomized Algorithms Lecture 6: “Coupon Collector’s problem”

Sotiris Nikoletseas, Professor

CEID - ETY Course 2017 - 2018

SLIDE 2

Variance: key features

Definition: $\mathrm{Var}(X) = E[(X - \mu)^2] = \sum_x (x - \mu)^2 \Pr\{X = x\}$, where $\mu = E[X] = \sum_x x \Pr\{X = x\}$.

We call standard deviation of $X$ the quantity $\sigma = \sqrt{\mathrm{Var}(X)}$.

Basic Properties:

(i) $\mathrm{Var}(X) = E[X^2] - E^2[X]$
(ii) $\mathrm{Var}(cX) = c^2 \,\mathrm{Var}(X)$, where $c$ is a constant.
(iii) $\mathrm{Var}(X + c) = \mathrm{Var}(X)$, where $c$ is a constant.

Proof of (i): $\mathrm{Var}(X) = E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - 2\mu^2 + \mu^2 = E[X^2] - \mu^2$
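A quick numerical sanity check of property (i), not part of the original slides; the small distribution below is an arbitrary illustrative choice:

```python
# Verify Var(X) = E[X^2] - E^2[X] on a toy discrete distribution.
values = [0, 1, 2, 5]
probs = [0.1, 0.4, 0.3, 0.2]  # arbitrary pmf summing to 1

mu = sum(x * p for x, p in zip(values, probs))                   # E[X]
ex2 = sum(x * x * p for x, p in zip(values, probs))              # E[X^2]
var_def = sum((x - mu) ** 2 * p for x, p in zip(values, probs))  # definition
var_prop = ex2 - mu ** 2                                         # property (i)

print(var_def, var_prop)  # both print 2.6
```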

SLIDE 3

On the Additivity of Variance

In general, the variance of a sum of random variables is not equal to the sum of their variances. However, variances do add for independent variables (i.e. mutually independent variables); in fact, pairwise independence suffices, since $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$ and the covariance $\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y]$ vanishes for every independent pair.
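A minimal sketch (not from the slides) checking both cases exactly by enumerating a small sample space; two fair dice are an arbitrary illustrative choice:

```python
from itertools import product

# Two independent fair dice: enumerate the uniform joint distribution exactly.
outcomes = list(product(range(1, 7), repeat=2))

def var(f):
    """Variance of f(outcome) under the uniform joint distribution."""
    vals = [f(o) for o in outcomes]
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

vx, vy = var(lambda o: o[0]), var(lambda o: o[1])
v_sum = var(lambda o: o[0] + o[1])   # independent: variances add
v_double = var(lambda o: 2 * o[0])   # X + X = 2X: variances do NOT add

print(v_sum, vx + vy)    # 5.833... == 5.833...
print(v_double, 2 * vx)  # 11.666... != 5.833...
```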

SLIDE 4

Conditional distributions

Let $X, Y$ be discrete random variables. Their joint probability density function is $f(x, y) = \Pr\{(X = x) \cap (Y = y)\}$.

Clearly $f_1(x) = \Pr\{X = x\} = \sum_y f(x, y)$ and $f_2(y) = \Pr\{Y = y\} = \sum_x f(x, y)$.

Also, the conditional probability density function is:
$$f(x \mid y) = \Pr\{X = x \mid Y = y\} = \frac{\Pr\{(X = x) \cap (Y = y)\}}{\Pr\{Y = y\}} = \frac{f(x, y)}{f_2(y)} = \frac{f(x, y)}{\sum_x f(x, y)}$$
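These formulas can be checked mechanically; a small sketch (not from the slides), where the toy joint pmf is an arbitrary illustrative choice:

```python
# Toy joint pmf f(x, y) over {0,1} x {0,1}; values chosen arbitrarily.
f = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}
xs = sorted({x for x, _ in f})
ys = sorted({y for _, y in f})

f1 = {x: sum(f[(x, y)] for y in ys) for x in xs}  # marginal of X
f2 = {y: sum(f[(x, y)] for x in xs) for y in ys}  # marginal of Y

# Conditional pmf f(x | y) = f(x, y) / f2(y)
f_cond = {(x, y): f[(x, y)] / f2[y] for x in xs for y in ys}
print(f_cond[(0, 1)])  # 0.3 / 0.7 = 0.42857...
```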

SLIDE 5

Pairwise independence

Let $X_1, X_2, \ldots, X_n$ be random variables. These are called pairwise independent iff for all $i \neq j$ it is
$$\Pr\{(X_i = x) \mid (X_j = y)\} = \Pr\{X_i = x\}, \quad \forall x, y$$
Equivalently,
$$\Pr\{(X_i = x) \cap (X_j = y)\} = \Pr\{X_i = x\} \cdot \Pr\{X_j = y\}, \quad \forall x, y$$
Generalizing, the collection is $k$-wise independent iff, for every subset $I \subseteq \{1, 2, \ldots, n\}$ with $|I| < k$, for every set of values $\{a_i\}$, $b$ and $j \notin I$, it is
$$\Pr\left\{X_j = b \,\middle|\, \bigcap_{i \in I} X_i = a_i\right\} = \Pr\{X_j = b\}$$

SLIDE 6

Mutual (or "full") independence

The random variables $X_1, X_2, \ldots, X_n$ are mutually independent iff for any subset $X_{i_1}, X_{i_2}, \ldots, X_{i_k}$ ($2 \le k \le n$) of them, it is
$$\Pr\{(X_{i_1} = x_1) \cap (X_{i_2} = x_2) \cap \cdots \cap (X_{i_k} = x_k)\} = \Pr\{X_{i_1} = x_1\} \cdot \Pr\{X_{i_2} = x_2\} \cdots \Pr\{X_{i_k} = x_k\}$$

Example (for $n = 3$). Let $A_1, A_2, A_3$ be 3 events. They are mutually independent iff all four equalities hold:
$\Pr\{A_1 A_2\} = \Pr\{A_1\} \Pr\{A_2\}$ (1)
$\Pr\{A_2 A_3\} = \Pr\{A_2\} \Pr\{A_3\}$ (2)
$\Pr\{A_1 A_3\} = \Pr\{A_1\} \Pr\{A_3\}$ (3)
$\Pr\{A_1 A_2 A_3\} = \Pr\{A_1\} \Pr\{A_2\} \Pr\{A_3\}$ (4)
They are called pairwise independent if (1), (2), (3) hold.
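The standard example of variables that are pairwise but not mutually independent is two fair bits together with their XOR; a small enumeration sketch (not from the slides):

```python
from itertools import product

# X1, X2: independent fair bits; X3 = X1 XOR X2.
# Classic example: pairwise independent, but not mutually independent.
space = [(x1, x2, x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]

def pr(event):
    """Probability under the uniform measure on the 4 outcomes."""
    return sum(1 for w in space if event(w)) / len(space)

# Each pair satisfies Pr{Xi=1, Xj=1} = 1/4 = Pr{Xi=1} * Pr{Xj=1} ...
print(pr(lambda w: w[0] == 1 and w[2] == 1))  # 0.25
# ... but equality (4) fails: Pr{X1=X2=X3=1} = 0, not 1/8.
print(pr(lambda w: w == (1, 1, 1)))           # 0.0
```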

SLIDE 7

The Coupon Collector’s problem

There are n distinct coupons and at each trial a coupon is chosen uniformly at random, independently of previous trials. Let m be the number of trials. Goal: establish relationships between the number m of trials and the probability of having chosen each one of the n coupons at least once. Note: the problem is similar to occupancy (how many balls must be thrown so that no bin is empty). A minimal simulation of the process is sketched below.
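A simulation sketch (not from the slides) of the process just described; later slides' checks reuse this function:

```python
import random

def coupon_collector_trials(n: int) -> int:
    """Draw coupons uniformly at random, independently, until all n
    distinct types have been seen; return the number of trials used."""
    seen, trials = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))  # uniform choice among n types
        trials += 1
    return trials

print(coupon_collector_trials(100))  # typically near 100 * ln(100) = 460
```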

SLIDE 8

The expected number of trials needed (I)

Let X be the number of trials (a random variable) needed to collect all coupons at least once each. Let C1, C2, . . . , CX be the sequence of trials, where Ci ∈ {1, . . . , n} denotes the coupon type chosen at trial i. We call the ith trial a success if the coupon type chosen at it was not drawn in any of the first i − 1 trials (obviously C1 and CX are always successes). We divide the sequence of trials into epochs, where epoch i begins with the trial following the ith success and ends with the trial at which the (i + 1)st success takes place. Let r.v. Xi (0 ≤ i ≤ n − 1) be the number of trials in the ith epoch.

SLIDE 9

The expected number of trials needed (II)

Clearly, $X = \sum_{i=0}^{n-1} X_i$.

Let $p_i$ be the probability of success at any trial of the $i$th epoch. This is the probability of choosing one of the $n - i$ remaining coupon types, so:
$$p_i = \frac{n - i}{n}$$

Clearly, $X_i$ follows a geometric distribution with parameter $p_i$, so $E[X_i] = \frac{1}{p_i}$ and $\mathrm{Var}(X_i) = \frac{1 - p_i}{p_i^2}$.

By linearity of expectation:
$$E[X] = E\left[\sum_{i=0}^{n-1} X_i\right] = \sum_{i=0}^{n-1} E[X_i] = \sum_{i=0}^{n-1} \frac{n}{n - i} = n \sum_{i=1}^{n} \frac{1}{i} = n H_n$$
But $H_n = \ln n + \Theta(1)$, so $E[X] = n \ln n + \Theta(n)$.
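A small check (not from the slides) of $E[X] = n H_n$ against simulation; it assumes the coupon_collector_trials sketch from Slide 7 is in scope:

```python
import math

n, runs = 100, 2000
h_n = sum(1 / i for i in range(1, n + 1))  # harmonic number H_n
exact = n * h_n                            # exact expectation n * H_n

# Empirical mean over repeated runs (coupon_collector_trials as on Slide 7)
mean = sum(coupon_collector_trials(n) for _ in range(runs)) / runs

print(exact, n * math.log(n), mean)  # ~518.7, ~460.5, and a mean near 518.7
```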

SLIDE 10

The variance of the number of needed trials

Since the $X_i$'s are independent, we have:
$$\mathrm{Var}(X) = \sum_{i=0}^{n-1} \mathrm{Var}(X_i) = \sum_{i=0}^{n-1} \frac{ni}{(n-i)^2} = \sum_{i=1}^{n} \frac{n(n-i)}{i^2} = n^2 \sum_{i=1}^{n} \frac{1}{i^2} - n \sum_{i=1}^{n} \frac{1}{i}$$

Since $\lim_{n \to \infty} \sum_{i=1}^{n} \frac{1}{i^2} = \frac{\pi^2}{6}$, we get $\mathrm{Var}(X) \sim \frac{\pi^2}{6} n^2$.

Concentration around the expectation: The Chebyshev inequality does not provide a strong result. For $\beta > 1$:
$$\Pr\{X > \beta n \ln n\} = \Pr\{X - n \ln n > (\beta - 1) n \ln n\} \le \Pr\{|X - n \ln n| > (\beta - 1) n \ln n\} \le \frac{\mathrm{Var}(X)}{(\beta - 1)^2 n^2 \ln^2 n} \sim \frac{\pi^2 / 6}{(\beta - 1)^2 \ln^2 n} = \Theta\left(\frac{1}{\ln^2 n}\right)$$

SLIDE 11

Stronger concentration around the expectation

Let $E_i^r$ be the event: "coupon type $i$ is not collected during the first $r$ trials". Then
$$\Pr\{E_i^r\} = \left(1 - \frac{1}{n}\right)^r \le e^{-\frac{r}{n}}$$
For $r = \beta n \ln n$ we get $\Pr\{E_i^r\} \le e^{-\frac{\beta n \ln n}{n}} = n^{-\beta}$.

By the union bound we have $\Pr\{X > r\} = \Pr\left\{\bigcup_{i=1}^{n} E_i^r\right\}$ (i.e. at least one coupon is not selected), so
$$\Pr\{X > r\} \le \sum_{i=1}^{n} \Pr\{E_i^r\} \le n \cdot n^{-\beta} = n^{-(\beta - 1)} = n^{-\epsilon}, \quad \text{where } \epsilon = \beta - 1 > 0$$
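A sketch (not from the slides) comparing the two tail bounds with an empirical estimate; it assumes coupon_collector_trials from Slide 7 is in scope, and n, beta, runs are arbitrary illustrative choices:

```python
import math

n, beta, runs = 50, 2.0, 4000
r = beta * n * math.log(n)  # threshold beta * n * ln(n)

# Empirical Pr{X > r} (coupon_collector_trials as on Slide 7)
tail = sum(coupon_collector_trials(n) > r for _ in range(runs)) / runs

chebyshev = (math.pi ** 2 / 6) / ((beta - 1) ** 2 * math.log(n) ** 2)  # Slide 10
union = n ** (-(beta - 1))                                             # Slide 11
print(tail, chebyshev, union)
# Typical output: ~0.02, 0.107, 0.02 - both bounds hold; the union bound
# is far tighter here and nearly matches the empirical tail.
```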

SLIDE 12

Sharper concentration around the mean - a heuristic argument

Binomial distribution (#successes in $n$ independent trials, each one with success probability $p$):
$$X \sim B(n, p) \Rightarrow \Pr\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k} \quad (k = 0, 1, 2, \ldots, n)$$
$E(X) = np$, $\mathrm{Var}(X) = np(1 - p)$

Poisson distribution:
$$X \sim P(\lambda) \Rightarrow \Pr\{X = x\} = e^{-\lambda} \frac{\lambda^x}{x!} \quad (x = 0, 1, \ldots)$$
$E(X) = \mathrm{Var}(X) = \lambda$

Approximation: It is $B(n, p) \xrightarrow{n \to \infty} P(\lambda)$, where $\lambda = np$. For large $n$ (and correspondingly small $p$), the approximation of the binomial by the Poisson is good.
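A sketch of this approximation (not from the slides) using only the standard library; the parameters $n = 500$, $p = 0.01$ are an arbitrary illustrative choice giving $\lambda = 5$:

```python
import math

def binom_pmf(n: int, p: float, k: int) -> float:
    """Pr{X = k} for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam: float, k: int) -> float:
    """Pr{X = k} for X ~ P(lambda)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

n, p = 500, 0.01  # lambda = n * p = 5
for k in range(4):
    print(k, binom_pmf(n, p, k), poisson_pmf(n * p, k))
# k=0: 0.00657 vs 0.00674; k=1: 0.0332 vs 0.0337; ... close agreement
```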

SLIDE 13

Towards the sharp concentration result

Let $N_i^r$ = number of times coupon $i$ is chosen during the first $r$ trials. Then $E_i^r$ is equivalent to the event $\{N_i^r = 0\}$.

Clearly $N_i^r \sim B\left(r, \frac{1}{n}\right)$, thus
$$\Pr\{N_i^r = x\} = \binom{r}{x} \left(\frac{1}{n}\right)^x \left(1 - \frac{1}{n}\right)^{r-x}$$

Let $\lambda$ be a positive real number. A r.v. $Y$ is $P(\lambda)$ $\Leftrightarrow$ $\Pr\{Y = y\} = e^{-\lambda} \cdot \frac{\lambda^y}{y!}$.

As said, for suitably small success probability $\frac{1}{n}$ and as $r$ approaches $\infty$, $P\left(\frac{r}{n}\right)$ is a good approximation of $B\left(r, \frac{1}{n}\right)$. Thus, with $\lambda = \frac{r}{n}$:
$$\Pr\{E_i^r\} = \Pr\{N_i^r = 0\} \simeq e^{-\lambda} \frac{\lambda^0}{0!} = e^{-\lambda} = e^{-\frac{r}{n}} \quad \text{(fact 1)}$$
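Fact 1 is easy to check numerically (not from the slides); the exact value $(1 - 1/n)^r$ and the estimate $e^{-r/n}$ agree closely already for moderate $n$:

```python
import math

n = 1000  # arbitrary illustrative choice
for r in (n, int(n * math.log(n)), 10 * n):
    exact = (1 - 1 / n) ** r   # exact Pr{N_i^r = 0}
    approx = math.exp(-r / n)  # Poisson estimate e^{-r/n} (fact 1)
    print(r, exact, approx)
# r=1000: 0.3677 vs 0.3679; the values agree to about 3 significant figures
```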

SLIDE 14

An informal argument on independence

We will now claim that the $E_i^r$ ($1 \le i \le n$) events are "almost independent" (although it is obvious that there is some dependence between them; but we are anyway heading towards a heuristic).

Claim 1. For $1 \le i \le n$, and any set of indices $\{j_1, \ldots, j_k\}$ not containing $i$,
$$\Pr\left\{E_i^r \,\middle|\, \bigcap_{l=1}^{k} E_{j_l}^r\right\} \simeq \Pr\{E_i^r\}$$

Proof:
$$\Pr\left\{E_i^r \,\middle|\, \bigcap_{l=1}^{k} E_{j_l}^r\right\} = \frac{\Pr\left\{E_i^r \cap \bigcap_{l=1}^{k} E_{j_l}^r\right\}}{\Pr\left\{\bigcap_{l=1}^{k} E_{j_l}^r\right\}} = \frac{\left(1 - \frac{k+1}{n}\right)^r}{\left(1 - \frac{k}{n}\right)^r} \simeq \frac{e^{-\frac{r(k+1)}{n}}}{e^{-\frac{rk}{n}}} = e^{-\frac{r}{n}} \simeq \Pr\{E_i^r\}$$

SLIDE 15

An approximation of the probability

Because of fact 1 and Claim 1, we have:
$$\Pr\left\{\bigcap_{i=1}^{n} \overline{E_i^m}\right\} \simeq \prod_{i=1}^{n} \Pr\left\{\overline{E_i^m}\right\} \simeq \left(1 - e^{-\frac{m}{n}}\right)^n \simeq e^{-n e^{-\frac{m}{n}}}$$

For $m = n(\ln n + c) = n \ln n + cn$, for any constant $c \in \mathbb{R}$, it is $n e^{-\frac{m}{n}} = e^{-c}$, and we then get
$$\Pr\{X > m\} = \Pr\left\{\bigcup_{i=1}^{n} E_i^m\right\} = 1 - \Pr\left\{\bigcap_{i=1}^{n} \overline{E_i^m}\right\} \simeq 1 - e^{-e^{-c}}$$

The above probability:

  • is close to 0, for large positive c
  • is close to 1, for large negative c

Thus the probability of having collected all coupons rapidly changes from nearly 0 to almost 1 in a small interval centered around $n \ln n$ (!)
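This sharp threshold is easy to observe in simulation (not from the slides); the sketch assumes coupon_collector_trials from Slide 7 is in scope:

```python
import math

n, runs = 200, 3000  # arbitrary illustrative choices
for c in (-2.0, 0.0, 2.0):
    m = n * (math.log(n) + c)
    # Empirical Pr{X > m} (coupon_collector_trials as on Slide 7)
    tail = sum(coupon_collector_trials(n) > m for _ in range(runs)) / runs
    print(c, tail, 1 - math.exp(-math.exp(-c)))
# c=-2: ~0.999; c=0: ~0.632; c=+2: ~0.127 - tracking 1 - e^{-e^{-c}}
```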

SLIDE 16

The rigorous result

Theorem: Let $X$ be the r.v. counting the number of trials for having collected each one of the $n$ coupons at least once. Then, for any constant $c \in \mathbb{R}$ and $m = n(\ln n + c)$, it is
$$\lim_{n \to \infty} \Pr\{X > m\} = 1 - e^{-e^{-c}}$$

Note 1. The proof uses the Boole-Bonferroni inequalities for inclusion-exclusion in the probability of a union of events.
Note 2. The power of the Poisson heuristic is that it gives a quick, approximate estimation of probabilities and offers some intuitive insight towards the accurate behaviour of the involved quantities.
