SLIDE 1

Randomized Algorithms Lecture 4: “Two-point Sampling, Coupon Collector’s problem”

Sotiris Nikoletseas, Associate Professor

CEID - ETY Course 2013 - 2014

SLIDE 2

Overview

  • A. Pairwise independence of random variables
  • B. The pairwise independent sampling theorem
  • C. Probability amplification via reduced randomness
  • D. The Coupon Collector’s problem

SLIDE 3
A. On the Additivity of Variance

  • In general, the variance of a sum of random variables is not equal to the sum of their variances.
  • However, variances do add for independent (i.e. mutually independent) variables.
  • In fact, mutual independence is not necessary: pairwise independence suffices.
  • This is very useful, since in many situations the random variables involved are pairwise independent but not mutually independent.

SLIDE 4

Conditional distributions

Let X, Y be discrete random variables. Their joint probability density function is

f(x, y) = Pr{(X = x) ∩ (Y = y)}

Clearly f1(x) = Pr{X = x} = ∑_y f(x, y) and f2(y) = Pr{Y = y} = ∑_x f(x, y)

Also, the conditional probability density function is:

f(x|y) = Pr{X = x | Y = y} = Pr{(X = x) ∩ (Y = y)} / Pr{Y = y} = f(x, y) / f2(y) = f(x, y) / ∑_x f(x, y)
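
A quick numerical sketch of these definitions (the joint table below is my own toy example, not from the slides):

```python
# Minimal sketch of joint, marginal, and conditional pmfs (illustrative toy values).
from collections import defaultdict

# A small joint pmf f(x, y) = Pr{(X = x) and (Y = y)}, chosen arbitrarily.
f = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

f1 = defaultdict(float)  # marginal of X: f1(x) = sum over y of f(x, y)
f2 = defaultdict(float)  # marginal of Y: f2(y) = sum over x of f(x, y)
for (x, y), p in f.items():
    f1[x] += p
    f2[y] += p

def cond(x, y):
    """Conditional pmf f(x | y) = f(x, y) / f2(y)."""
    return f[(x, y)] / f2[y]

print(dict(f1), dict(f2), cond(0, 1))  # e.g. f(0 | 1) = 0.2 / 0.6
```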

SLIDE 5

Pairwise independence

Let X1, X2, . . . , Xn be random variables. These are called pairwise independent iff for all i ≠ j it is

Pr{(Xi = x) | (Xj = y)} = Pr{Xi = x}, ∀x, y

Equivalently, Pr{(Xi = x) ∩ (Xj = y)} = Pr{Xi = x} · Pr{Xj = y}, ∀x, y

Generalizing, the collection is k-wise independent iff, for every subset I ⊆ {1, 2, . . . , n} with |I| < k, for every set of values {ai}, every b, and every j ∉ I, it is

Pr{Xj = b | ∧_{i∈I} (Xi = ai)} = Pr{Xj = b}

SLIDE 6

Mutual (or “full”) independence

The random variables X1, X2, . . . , Xn are mutually independent iff for any subset Xi1, Xi2, . . . , Xik (2 ≤ k ≤ n) of them, it is

Pr{(Xi1 = x1) ∩ (Xi2 = x2) ∩ · · · ∩ (Xik = xk)} = Pr{Xi1 = x1} · Pr{Xi2 = x2} · · · Pr{Xik = xk}

Example (for n = 3). Let A1, A2, A3 be 3 events. They are mutually independent iff all four equalities hold:

Pr{A1A2} = Pr{A1} Pr{A2} (1)
Pr{A2A3} = Pr{A2} Pr{A3} (2)
Pr{A1A3} = Pr{A1} Pr{A3} (3)
Pr{A1A2A3} = Pr{A1} Pr{A2} Pr{A3} (4)

They are called pairwise independent if (1), (2), (3) hold.

SLIDE 7

Mutual vs pairwise independence

Important notice: Pairwise independence does not imply mutual independence in general.

  • Example. Consider a probability space consisting of all permutations of a, b, c, together with the points aaa, bbb, ccc (all 9 points equiprobable). Let Ak = “at place k there is an a” (for k = 1, 2, 3). It is

Pr{A1} = Pr{A2} = Pr{A3} = (2 + 1)/9 = 1/3

Also Pr{A1A2} = Pr{A2A3} = Pr{A1A3} = 1/9 = (1/3) · (1/3),

thus A1, A2, A3 are pairwise independent. But

Pr{A1A2A3} = 1/9 ≠ Pr{A1} Pr{A2} Pr{A3} = 1/27,

thus the events are not mutually independent.
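
This 9-point space is small enough to check by exhaustive enumeration; the sketch below (mine, not from the slides) recomputes the probabilities above:

```python
# Enumerate the 9-point space: all permutations of 'abc' plus aaa, bbb, ccc.
from itertools import permutations

space = [''.join(p) for p in permutations('abc')] + ['aaa', 'bbb', 'ccc']

def pr(event):
    """Probability of an event, given as a predicate on sample points."""
    return sum(1 for w in space if event(w)) / len(space)

A = [lambda w, k=k: w[k] == 'a' for k in range(3)]  # A_k: place k holds an 'a'

print([pr(A[k]) for k in range(3)])                   # each 1/3
print(pr(lambda w: A[0](w) and A[1](w)))              # 1/9 = (1/3)*(1/3): pairwise independent
print(pr(lambda w: A[0](w) and A[1](w) and A[2](w)))  # 1/9 != 1/27: not mutually independent
```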

SLIDE 8

Variance: key features

Definition: Var(X) = E[(X − µ)²] = ∑_x (x − µ)² · Pr{X = x}, where µ = E[X] = ∑_x x · Pr{X = x}

We call standard deviation of X the quantity σ = √Var(X)

Basic Properties:

(i) Var(X) = E[X²] − E²[X]
(ii) Var(cX) = c² · Var(X), where c is a constant.
(iii) Var(X + c) = Var(X), where c is a constant.

Proof of (i): Var(X) = E[(X − µ)²] = E[X² − 2µX + µ²] = E[X²] + E[−2µX] + E[µ²] = E[X²] − 2µE[X] + µ² = E[X²] − µ²

SLIDE 9

The additivity of variance

Theorem: if X1, X2, . . . , Xn are pairwise independent random variables, then:

Var(∑_{i=1}^{n} Xi) = ∑_{i=1}^{n} Var(Xi)

Proof:

Var(X1 + · · · + Xn) = E[(X1 + · · · + Xn)²] − E²[X1 + · · · + Xn] =
= E[∑_{i=1}^{n} Xi² + ∑_{1≤i≠j≤n} XiXj] − (∑_{i=1}^{n} µi² + ∑_{1≤i≠j≤n} µiµj) =
= ∑_{i=1}^{n} (E[Xi²] − µi²) + ∑_{1≤i≠j≤n} (E[XiXj] − µiµj) = ∑_{i=1}^{n} Var(Xi)

(since the Xi are pairwise independent, ∀ 1 ≤ i ≠ j ≤ n: E[XiXj] = E[Xi]E[Xj] = µiµj) □

Note: As we see in the proof, pairwise independence suffices, and mutual (full) independence is not needed.
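
A hedged numerical sketch (my own illustration, not from the slides): take X, Y independent fair bits and Z = X ⊕ Y. The three bits are pairwise independent but not mutually independent, yet their variances still add:

```python
# Variance additivity under pairwise (not mutual) independence, checked exhaustively.
from itertools import product

points = list(product([0, 1], repeat=2))        # (X, Y) uniform on {0,1}^2
triples = [(x, y, x ^ y) for x, y in points]    # Z = X xor Y: pairwise, not mutually, independent

def var(values):
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)

sums = [x + y + z for x, y, z in triples]
print(var(sums))                                            # Var(X + Y + Z) = 0.75
print(sum(var([t[i] for t in triples]) for i in range(3)))  # sum of variances: also 0.75
```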

SLIDE 10
B. The pairwise independent sampling theorem

Another Example. Birthday matching: let us estimate the number of pairs of people in a room having their birthday on the same day.

Note 1: Matching birthdays for different pairs of students are pairwise independent, since knowing that (George, Takis) have a match tells us nothing about whether (George, Petros) match.

Note 2: However, the events are not mutually independent. Indeed they are not even 3-wise independent, since if (George, Takis) match and (Takis, Petros) match, then (George, Petros) match!

SLIDE 11

Birthday matching

Let us calculate the probability of having a certain number of birthday matches.

Let B1, B2, . . . , Bn be the birthdays of n independently chosen people and let Ei,j be the indicator variable for the event of an (i, j) match (i.e. Bi = Bj). As said, the events Ei,j are pairwise independent but not mutually independent. Clearly, Pr{Ei,j} = Pr{Bi = Bj} = 365 · (1/365) · (1/365) = 1/365 (for i ≠ j).

Let D be the number of matching pairs. Then D = ∑_{1≤i<j≤n} Ei,j. By linearity of expectation we have:

E[D] = E[∑_{1≤i<j≤n} Ei,j] = ∑_{1≤i<j≤n} E[Ei,j] = (n choose 2) · (1/365)

SLIDE 12

Birthday matching

Since the variances of the pairwise independent variables Ei,j add up, it is:

Var[D] = Var[∑_{1≤i<j≤n} Ei,j] = ∑_{1≤i<j≤n} Var[Ei,j] = (n choose 2) · (1/365) · (1 − 1/365)

As an example, for a class of n = 100 students, it is E[D] ≃ 14 and Var[D] < 14 · (1 − 1/365) < 14. So by Chebyshev’s inequality we have:

Pr{|D − 14| ≥ x} ≤ 14/x²

Letting x = 6, we conclude that with more than 50% chance the number of matching birthdays will be between 8 and 20.
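
The calculation above can be sanity-checked by simulation; the sketch below (parameters mine) estimates E[D] and the deviation probability for n = 100:

```python
# Simulate birthday matches for n = 100; compare with E[D] = C(100,2)/365 ~ 13.6
# and the Chebyshev bound Pr{|D - 14| >= 6} <= 14/36 ~ 0.39.
import random

def matches(n=100, days=365):
    bdays = [random.randrange(days) for _ in range(n)]
    return sum(bdays[i] == bdays[j] for i in range(n) for j in range(i + 1, n))

trials = [matches() for _ in range(2000)]
mean = sum(trials) / len(trials)
outside = sum(abs(d - 14) >= 6 for d in trials) / len(trials)
print(mean)     # close to 13.6
print(outside)  # empirically well below the Chebyshev bound 14/36
```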

SLIDE 13

The Pairwise Independent Sampling Theorem (I)

We can actually generalize, restricting neither to sums of zero-one (indicator) valued variables nor to variables with the same distribution. We state the theorem below for possibly different distributions with the same mean and variance (this is done for simplicity; the result holds for distributions with different means and/or variances as well).

  • Theorem. Let X1, . . . , Xn be pairwise independent variables with the same mean µ and variance σ². Let Sn = ∑_{i=1}^{n} Xi. Then:

Pr{|Sn/n − µ| ≥ x} ≤ (1/n) · (σ/x)²

  • Proof. Note that E[Sn/n] = nµ/n = µ and Var[Sn/n] = (1/n)² · nσ² = σ²/n, and apply Chebyshev’s inequality. □
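
A minimal simulation sketch of the theorem (setup mine; it uses i.i.d. uniform samples, which are in particular pairwise independent):

```python
# Sample-mean concentration: Pr{|Sn/n - mu| >= x} <= (1/n) * (sigma/x)^2.
import random

n, x = 1000, 0.05
mu, sigma2 = 0.5, 1 / 12            # mean and variance of Uniform(0, 1)

runs, deviations = 2000, 0
for _ in range(runs):
    sn = sum(random.random() for _ in range(n))
    if abs(sn / n - mu) >= x:
        deviations += 1

print(deviations / runs)            # empirical deviation probability (tiny)
print(sigma2 / (n * x ** 2))        # the theorem's Chebyshev bound: ~0.033
```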

SLIDE 14

The Pairwise Independent Sampling Theorem (II)

Note: This theorem actually provides a precise, general evaluation of how the average of pairwise independent random samples approaches their mean. If the number n of samples becomes large enough (n > σ²/x²), we can approach the mean arbitrarily closely, with confidence arbitrarily close to 100%; i.e. a large number of samples is needed for distributions of large variance, and when we want to ensure high concentration around the mean.

SLIDE 15
C. Reduced randomness at probability amplification

Motivation: Randomized algorithms, for a given input x, actually choose t random numbers (“witnesses”) and run a deterministic algorithm on the input, using each of these random numbers.

  • intuitively, if the deterministic algorithm has a probability of error ϵ (e.g. 1/2), t independent runs reduce the error probability to ϵ^t (e.g. 1/2^t) and amplify the correctness probability from 1/2 to 1 − 1/2^t.
  • however, true randomness is quite expensive! What happens if we are constrained to use no more than a constant number c of random numbers? The simplest case is c = 2, i.e. we choose just 2 random numbers (thus the name two-point sampling).

SLIDE 16
C. Reduced randomness at probability amplification

Problem definition: If our randomized algorithm reduces the error probability to ϵ^t with t random numbers, what can we expect about the error probability with t = 2 random numbers only?

  • an obvious bound is ϵ² (e.g. when ϵ = 1/2, reducing the error probability from 1/2 to 1/4)
  • can we do any better?
  • it turns out that we can indeed do much better and reduce the error probability to 1/t (which is much smaller than the constant ϵ²)

SLIDE 17
C. Reduced randomness at probability amplification

High level idea: generate t (“pseudo-random”) numbers based on the 2 chosen truly random numbers and use them in lieu of t truly independent random numbers.

  • these generated numbers depend on the 2 chosen numbers. Hence they are not (mutually) independent, but they are pairwise independent.
  • this loss of full independence reduces the accuracy of the algorithmic process, but is still quite usable in reducing the error probability.

SLIDE 18
C. Reduced randomness at probability amplification

The process (high level description):

1. Choose a large prime number p (e.g. the Mersenne prime 2^31 − 1).
2. Define Zp as the ring of integers modulo p (e.g. 0, 1, 2, . . . , 2^31 − 2).
3. Choose 2 truly random numbers a and b from Zp (e.g. a = 2^20 + 781 and b = 2^27 − 44).
4. Generate t “pseudo-random” numbers yi = (a · i + b) mod p, e.g.
   y0 = 2^27 − 44
   y1 = ((2^20 + 781) · 1 + 2^27 − 44) mod p
   y2 = ((2^20 + 781) · 2 + 2^27 − 44) mod p
   and so on.
5. Use each of the t yi’s as witnesses (in lieu of t truly independent random witnesses).
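
A compact sketch of this generator (the prime and the random draws follow the slide’s description; the code itself is illustrative):

```python
# Two-point sampling: t pseudo-random witnesses from only 2 truly random numbers.
import random

p = 2**31 - 1                     # a Mersenne prime, as in the slide's example
a = random.randrange(p)           # the two truly random numbers from Z_p
b = random.randrange(p)

def witnesses(t):
    """y_i = (a*i + b) mod p for i = 0..t-1; uniform on Z_p and pairwise independent."""
    return [(a * i + b) % p for i in range(t)]

print(witnesses(5))
```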

SLIDE 19
C. Reduced randomness at probability amplification

Performance (high level discussion). As we will see:

  • for any given error bound ϵ, the error probability is reduced from ϵ² to 1/t.
  • however, instead of requiring O(log(1/ϵ)) runs of the deterministic algorithm (in the case of t independent random witnesses), we will require O(1/ϵ) runs in the case of 2 independent random witnesses.
  • thus, we gain in the probability amplification but lose some efficiency. we need significantly less true randomness.

SLIDE 20

The class RP (I)

  • Definition. The class RP (Random Polynomial time) consists of all languages L admitting a randomized algorithm A running in worst case polynomial time such that for any input x:

x ∈ L ⇒ Pr{A(x) accepts} ≥ 1/2
x ∉ L ⇒ Pr{A(x) accepts} = 0

Notes:

  • language recognition ←→ computational decision problems
  • the 1/2 value is arbitrary. The success probability needs just to be lower-bounded by an inverse polynomial function of the input size (a polynomial number of algorithm repetitions would boost it to a constant, in polynomial time).
  • RP actually includes Monte Carlo algorithms that can err only when x ∈ L (one-sided error)

SLIDE 21

The RP algorithm

An RP algorithm, for given input x, actually picks a random number r from Zp and computes A(x, r) with the following properties:

  • x ∈ L ⇒ A(x, r) = 1, for half of the possible values of r
  • x ∉ L ⇒ A(x, r) = 0, for all possible choices of r

As said,

  • if we run algorithm A t times on the same input x, the error probability is ≤ 1/2^t, but this requires t truly random numbers.
  • if we restrict ourselves to t = 2 truly random numbers a, b from Zp and run A(x, a) and A(x, b), the error probability can be as high as 1/4.
  • but we can do better with t pseudo-random numbers: let ri = a · i + b (mod p), where a, b are truly random, as above, for i = 1, 2, . . . , t.

SLIDE 22

Modulo rings pairwise independence (I)

Let p be a prime number and Zp = {0, 1, 2, . . . , p − 1} denote the ring of the integers modulo p.

Lemma 1. Given y, i ∈ Zp and choosing a, b uniformly at random from Zp, the probability that y ≡ a · i + b (mod p) is 1/p.

  • Proof. Imagine that we first choose a. Then, when choosing b, it must be y − a · i ≡ b (mod p). Since we choose b uniformly and, modulo p, it can take p values, the probability is clearly 1/p indeed.

SLIDE 23

Modulo rings pairwise independence (II)

Lemma 2. Given y, z, x, w ∈ Zp such that x ≠ w, and choosing a, b uniformly at random from Zp, the probability that y ≡ a · x + b (mod p) and z ≡ a · w + b (mod p) is 1/p².

  • Proof. It is y − z ≡ a · (x − w) (mod p). Since x − w ≠ 0, this equation holds for a unique value of a. This in turn implies a specific value for b. The probability that a, b get those two specific values is clearly (1/p) · (1/p) = 1/p².

SLIDE 24

Modulo rings pairwise independence (III)

Lemma 3. Let i, j be two distinct elements of Zp, and choose a, b uniformly at random from Zp. Then the two variables Yi = a · i + b (mod p) and Yj = a · j + b (mod p) are uniformly distributed on Zp and are pairwise independent.

  • Proof. From Lemma 1, it is clearly Pr{Yi = α} = 1/p, for any α ∈ Zp, so Yi, Yj indeed have uniform distribution. As for pairwise independence, note that:

Pr{Yi = α | Yj = β} = Pr{(Yi = α) ∩ (Yj = β)} / Pr{Yj = β} = (1/p²) / (1/p) = 1/p = Pr{Yi = α}

Thus, Yi, Yj are pairwise independent. □
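
For a tiny prime, Lemma 3 can be verified by exhausting all pairs (a, b); a sketch (parameters mine):

```python
# Exhaustively verify Lemma 3 for a small prime p by iterating over all (a, b).
from itertools import product

p, i, j = 7, 2, 5                        # small prime, two distinct indices in Z_p
counts_i = [0] * p                       # distribution of Y_i
counts_ij = [[0] * p for _ in range(p)]  # joint distribution of (Y_i, Y_j)

for a, b in product(range(p), repeat=2):
    yi, yj = (a * i + b) % p, (a * j + b) % p
    counts_i[yi] += 1
    counts_ij[yi][yj] += 1

print(counts_i)      # each value hit p times out of p^2: uniform, Pr = 1/p
print(counts_ij[0])  # each (alpha, beta) pair hit exactly once: Pr = 1/p^2, pairwise independent
```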

SLIDE 25

Modulo rings pairwise independence (IV)

Note: This is only pairwise independence. Indeed, consider the variables Y1, Y2, Y3, Y4 as defined above. Every pair of them is pairwise independent. But if we are given the values of Y1, Y2, then we know the values of Y3, Y4 immediately, since the values of Y1 and Y2 uniquely determine a and b, and thus we can compute all the Yi variables.

SLIDE 26

Modulo rings pairwise independence (V)

Thus Y = ∑_{i=1}^{t} A(x, ri) is a sum of random variables which are pairwise independent, since the ri values are pairwise independent. Assume that x ∈ L; then E[Y] = t/2 and

Var[Y] = ∑_{i=1}^{t} Var[A(x, ri)] = t · (1/2) · (1/2) = t/4

The probability that all those t executions fail corresponds to the event Y = 0, and

Pr{Y = 0} ≤ Pr{|Y − E[Y]| ≥ E[Y]} = Pr{|Y − t/2| ≥ t/2} ≤ Var[Y] / (t/2)² = (t/4) / (t/2)² = 1/t

by Chebyshev’s inequality. Thus the use of pseudo-randomness (t pseudo-random numbers) allows us to “exploit” our 2 truly random numbers to reduce the error probability from 1/4 to 1/t.
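
An end-to-end sketch of the amplification (the predicate A below is a hypothetical stand-in for an RP algorithm that accepts on about half of the witnesses when x ∈ L):

```python
# Two-point sampling amplification: estimate Pr{Y = 0} and compare with 1/t.
import random

p = 101                                # small prime for a fast demo
good = set(range(p // 2 + 1))          # witnesses on which the stand-in A accepts (~half of Z_p)

def run_two_point(t):
    a, b = random.randrange(p), random.randrange(p)  # the only true randomness
    r = [(a * i + b) % p for i in range(1, t + 1)]   # pairwise independent witnesses
    return any(ri in good for ri in r)               # Y > 0 <=> at least one success

t, runs = 20, 20000
failures = sum(not run_two_point(t) for _ in range(runs)) / runs
print(failures, "<=", 1 / t)           # empirical failure rate vs the 1/t bound
```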

SLIDE 27
D. The Coupon Collector’s problem

There are n distinct coupons and at each trial a coupon is chosen uniformly at random, independently of previous trials. Let m be the number of trials. Goal: establish relationships between the number m of trials and the probability of having chosen each one of the n coupons at least once. Note: the problem is similar to occupancy (the number of balls needed so that no bin is empty).

SLIDE 28

The expected number of trials needed (I)

Let X be the number of trials (a random variable) needed to collect all coupons, at least once each. Let C1, C2, . . . , CX be the sequence of trials, where Ci ∈ {1, . . . , n} denotes the coupon type chosen at trial i. We call the i-th trial a success if the coupon type chosen at Ci was not drawn in any of the first i − 1 trials (obviously C1 and CX are always successes). We divide the sequence of trials into epochs, where epoch i begins with the trial following the i-th success and ends with the trial at which the (i + 1)-st success takes place. Let r.v. Xi (0 ≤ i ≤ n − 1) be the number of trials in the i-th epoch.

SLIDE 29

The expected number of trials needed (II)

Clearly, X = ∑_{i=0}^{n−1} Xi.

Let pi be the probability of success at any trial of the i-th epoch. This is the probability of choosing one of the n − i remaining coupon types, so: pi = (n − i)/n.

Clearly, Xi follows a geometric distribution with parameter pi, so E[Xi] = 1/pi and Var(Xi) = (1 − pi)/pi².

By linearity of expectation:

E[X] = E[∑_{i=0}^{n−1} Xi] = ∑_{i=0}^{n−1} E[Xi] = ∑_{i=0}^{n−1} n/(n − i) = n ∑_{i=1}^{n} 1/i = nHn

But Hn = ln n + Θ(1) ⇒ E[X] = n ln n + Θ(n)
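
A short simulation sketch (parameters mine) comparing the empirical mean with nHn:

```python
# Simulate the coupon collector and compare the empirical mean with n * H_n.
import random

def collect(n):
    seen, trials = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        trials += 1
    return trials

n, runs = 200, 1000
empirical = sum(collect(n) for _ in range(runs)) / runs
harmonic = sum(1 / i for i in range(1, n + 1))
print(empirical, n * harmonic)   # the two should be close (~1176 for n = 200)
```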

SLIDE 30

The variance of the number of needed trials

Since the Xi’s are independent, we have:

Var(X) = ∑_{i=0}^{n−1} Var(Xi) = ∑_{i=0}^{n−1} ni/(n − i)² = ∑_{i=1}^{n} n(n − i)/i² = n² ∑_{i=1}^{n} 1/i² − n ∑_{i=1}^{n} 1/i

Since lim_{n→∞} ∑_{i=1}^{n} 1/i² = π²/6, we get Var(X) ∼ (π²/6) · n²

Concentration around the expectation

The Chebyshev inequality does not provide a strong result: for β > 1,

Pr{X > βn ln n} = Pr{X − n ln n > (β − 1)n ln n} ≤ Pr{|X − n ln n| > (β − 1)n ln n} ≤ Var(X) / ((β − 1)² n² ln² n) = Θ(n² / (n² ln² n)) = Θ(1/ln² n)

SLIDE 31

Stronger concentration around the expectation

Let E_i^r be the event: “coupon type i is not collected during the first r trials”. Then

Pr{E_i^r} = (1 − 1/n)^r ≤ e^(−r/n)

For r = βn ln n we get Pr{E_i^r} ≤ e^(−βn ln n / n) = n^(−β)

By the union bound we have Pr{X > r} = Pr{∪_{i=1}^{n} E_i^r} (i.e. at least one coupon is not selected), so

Pr{X > r} ≤ ∑_{i=1}^{n} Pr{E_i^r} ≤ n · n^(−β) = n^(−(β−1)) = n^(−ϵ), where ϵ = β − 1 > 0

SLIDE 32

Sharper concentration around the mean - a heuristic argument

Binomial distribution (#successes in n independent trials, each with success probability p):

X ∼ B(n, p) ⇒ Pr{X = k} = (n choose k) · p^k · (1 − p)^(n−k) (k = 0, 1, 2, . . . , n)
E(X) = np, Var(X) = np(1 − p)

Poisson distribution:

X ∼ P(λ) ⇒ Pr{X = x} = e^(−λ) · λ^x / x! (x = 0, 1, . . . )
E(X) = Var(X) = λ

Approximation: B(n, p) → P(λ) as n → ∞, where λ = np. For large n, the approximation of the binomial by the Poisson is good.

SLIDE 33

Towards the sharp concentration result

Let N_i^r = the number of times coupon i is chosen during the first r trials. Then E_i^r is equivalent to the event {N_i^r = 0}.

Clearly N_i^r ∼ B(r, 1/n), thus

Pr{N_i^r = x} = (r choose x) · (1/n)^x · (1 − 1/n)^(r−x)

Let λ be a positive real number. A r.v. Y is P(λ) ⇔ Pr{Y = y} = e^(−λ) · λ^y / y!

As said, for suitably small λ and as r approaches ∞, P(r/n) is a good approximation of B(r, 1/n). Thus

Pr{E_i^r} = Pr{N_i^r = 0} ≃ e^(−λ) · λ⁰/0! = e^(−λ) = e^(−r/n) (fact 1)

SLIDE 34

An informal argument on independence

We will now claim that the events E_i^r (1 ≤ i ≤ n) are “almost independent” (although it is obvious that there is some dependence between them; but we are anyway heading towards a heuristic).

Claim 1. For 1 ≤ i ≤ n, and any set of indices {j1, . . . , jk} not containing i:

Pr{E_i^r | ∩_{l=1}^{k} E_{jl}^r} ≃ Pr{E_i^r}

Proof:

Pr{E_i^r | ∩_{l=1}^{k} E_{jl}^r} = Pr{E_i^r ∩ (∩_{l=1}^{k} E_{jl}^r)} / Pr{∩_{l=1}^{k} E_{jl}^r} = (1 − (k+1)/n)^r / (1 − k/n)^r ≃ e^(−r(k+1)/n) / e^(−rk/n) = e^(−r/n) ≃ Pr{E_i^r}

SLIDE 35

An approximation of the probability

Because of fact 1 and Claim 1 (writing Ē_i^m for the complement of E_i^m), we have:

Pr{∪_{i=1}^{n} E_i^m} = 1 − Pr{∩_{i=1}^{n} Ē_i^m} ≃ 1 − (1 − e^(−m/n))^n ≃ 1 − e^(−n·e^(−m/n))

For m = n(ln n + c) = n ln n + cn, for any constant c ∈ R, we then get:

Pr{X > m = n ln n + cn} = Pr{∪_{i=1}^{n} E_i^m} ≃ 1 − e^(−e^(−c))

The above probability:

  • is close to 0, for large positive c
  • is close to 1, for large negative c

Thus the probability of having collected all coupons rapidly changes from nearly 0 to almost 1 in a small interval centered around n ln n (!)
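
A quick simulation sketch (parameters mine) of this sharp threshold, comparing the empirical tail Pr{X > n ln n + cn} with 1 − e^(−e^(−c)):

```python
# Sharp threshold for the coupon collector around n ln n.
import math
import random

def collect(n):
    seen, trials = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        trials += 1
    return trials

n, runs = 500, 2000
samples = [collect(n) for _ in range(runs)]
for c in (-2, -1, 0, 1, 2):
    m = n * (math.log(n) + c)
    tail = sum(x > m for x in samples) / runs
    print(c, round(tail, 3), round(1 - math.exp(-math.exp(-c)), 3))  # empirical vs limit
```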

SLIDE 36

The rigorous result

Theorem: Let X be the r.v. counting the number of trials needed to collect each one of the n coupons at least once. Then, for any constant c ∈ R and m = n(ln n + c), it is:

lim_{n→∞} Pr{X > m} = 1 − e^(−e^(−c))

Note 1. The proof uses the Boole-Bonferroni inequalities for inclusion-exclusion in the probability of a union of events.
Note 2. The power of the Poisson heuristic is that it gives a quick, approximate estimation of probabilities and offers some intuitive insight into the accurate behaviour of the involved quantities.
