SLIDE 1

Randomized Algorithms Lecture 6: “Coupon Collector’s problem”

Sotiris Nikoletseas, Professor

CEID - ETY Course 2017 - 2018

SLIDE 2

Variance: key features

Definition: $\mathrm{Var}(X) = E[(X - \mu)^2] = \sum_x (x - \mu)^2 \Pr\{X = x\}$, where $\mu = E[X] = \sum_x x \Pr\{X = x\}$.

We call standard deviation of $X$ the quantity $\sigma = \sqrt{\mathrm{Var}(X)}$.

Basic Properties:

(i) $\mathrm{Var}(X) = E[X^2] - E^2[X]$
(ii) $\mathrm{Var}(cX) = c^2 \,\mathrm{Var}(X)$, where $c$ is a constant.
(iii) $\mathrm{Var}(X + c) = \mathrm{Var}(X)$, where $c$ is a constant.

Proof of (i): $\mathrm{Var}(X) = E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - 2\mu^2 + \mu^2 = E[X^2] - \mu^2$
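A quick numerical sanity check of property (i), not part of the original slides; the small distribution below is an arbitrary illustrative choice:

```python
# Verify Var(X) = E[X^2] - E^2[X] on a toy discrete distribution.
values = [0, 1, 2, 5]
probs = [0.1, 0.4, 0.3, 0.2]  # arbitrary pmf summing to 1

mu = sum(x * p for x, p in zip(values, probs))                   # E[X]
ex2 = sum(x * x * p for x, p in zip(values, probs))              # E[X^2]
var_def = sum((x - mu) ** 2 * p for x, p in zip(values, probs))  # definition
var_prop = ex2 - mu ** 2                                         # property (i)

print(var_def, var_prop)  # both print 2.6
```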

SLIDE 3

On the Additivity of Variance

In general, the variance of a sum of random variables is not equal to the sum of their variances. However, variances do add for independent variables (i.e. mutually independent variables); in fact, pairwise independence suffices, since $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$ and the covariance $\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y]$ vanishes for every independent pair.
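A minimal sketch (not from the slides) checking both cases exactly by enumerating a small sample space; two fair dice are an arbitrary illustrative choice:

```python
from itertools import product

# Two independent fair dice: enumerate the uniform joint distribution exactly.
outcomes = list(product(range(1, 7), repeat=2))

def var(f):
    """Variance of f(outcome) under the uniform joint distribution."""
    vals = [f(o) for o in outcomes]
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

vx, vy = var(lambda o: o[0]), var(lambda o: o[1])
v_sum = var(lambda o: o[0] + o[1])   # independent: variances add
v_double = var(lambda o: 2 * o[0])   # X + X = 2X: variances do NOT add

print(v_sum, vx + vy)    # 5.833... == 5.833...
print(v_double, 2 * vx)  # 11.666... != 5.833...
```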

SLIDE 4

Conditional distributions

Let $X, Y$ be discrete random variables. Their joint probability density function is $f(x, y) = \Pr\{(X = x) \cap (Y = y)\}$.

Clearly $f_1(x) = \Pr\{X = x\} = \sum_y f(x, y)$ and $f_2(y) = \Pr\{Y = y\} = \sum_x f(x, y)$.

Also, the conditional probability density function is:
$$f(x \mid y) = \Pr\{X = x \mid Y = y\} = \frac{\Pr\{(X = x) \cap (Y = y)\}}{\Pr\{Y = y\}} = \frac{f(x, y)}{f_2(y)} = \frac{f(x, y)}{\sum_x f(x, y)}$$
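These formulas can be checked mechanically; a small sketch (not from the slides), where the toy joint pmf is an arbitrary illustrative choice:

```python
# Toy joint pmf f(x, y) over {0,1} x {0,1}; values chosen arbitrarily.
f = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}
xs = sorted({x for x, _ in f})
ys = sorted({y for _, y in f})

f1 = {x: sum(f[(x, y)] for y in ys) for x in xs}  # marginal of X
f2 = {y: sum(f[(x, y)] for x in xs) for y in ys}  # marginal of Y

# Conditional pmf f(x | y) = f(x, y) / f2(y)
f_cond = {(x, y): f[(x, y)] / f2[y] for x in xs for y in ys}
print(f_cond[(0, 1)])  # 0.3 / 0.7 = 0.42857...
```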

SLIDE 5

Pairwise independence

Let $X_1, X_2, \ldots, X_n$ be random variables. These are called pairwise independent iff for all $i \neq j$ it is
$$\Pr\{(X_i = x) \mid (X_j = y)\} = \Pr\{X_i = x\}, \quad \forall x, y$$
Equivalently,
$$\Pr\{(X_i = x) \cap (X_j = y)\} = \Pr\{X_i = x\} \cdot \Pr\{X_j = y\}, \quad \forall x, y$$
Generalizing, the collection is $k$-wise independent iff, for every subset $I \subseteq \{1, 2, \ldots, n\}$ with $|I| < k$, for every set of values $\{a_i\}$, $b$ and $j \notin I$, it is
$$\Pr\left\{X_j = b \,\middle|\, \bigcap_{i \in I} X_i = a_i\right\} = \Pr\{X_j = b\}$$

SLIDE 6

Mutual (or "full") independence

The random variables $X_1, X_2, \ldots, X_n$ are mutually independent iff for any subset $X_{i_1}, X_{i_2}, \ldots, X_{i_k}$ ($2 \le k \le n$) of them, it is
$$\Pr\{(X_{i_1} = x_1) \cap (X_{i_2} = x_2) \cap \cdots \cap (X_{i_k} = x_k)\} = \Pr\{X_{i_1} = x_1\} \cdot \Pr\{X_{i_2} = x_2\} \cdots \Pr\{X_{i_k} = x_k\}$$

Example (for $n = 3$). Let $A_1, A_2, A_3$ be 3 events. They are mutually independent iff all four equalities hold:
$\Pr\{A_1 A_2\} = \Pr\{A_1\} \Pr\{A_2\}$ (1)
$\Pr\{A_2 A_3\} = \Pr\{A_2\} \Pr\{A_3\}$ (2)
$\Pr\{A_1 A_3\} = \Pr\{A_1\} \Pr\{A_3\}$ (3)
$\Pr\{A_1 A_2 A_3\} = \Pr\{A_1\} \Pr\{A_2\} \Pr\{A_3\}$ (4)
They are called pairwise independent if (1), (2), (3) hold.
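The standard example of variables that are pairwise but not mutually independent is two fair bits together with their XOR; a small enumeration sketch (not from the slides):

```python
from itertools import product

# X1, X2: independent fair bits; X3 = X1 XOR X2.
# Classic example: pairwise independent, but not mutually independent.
space = [(x1, x2, x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]

def pr(event):
    """Probability under the uniform measure on the 4 outcomes."""
    return sum(1 for w in space if event(w)) / len(space)

# Each pair satisfies Pr{Xi=1, Xj=1} = 1/4 = Pr{Xi=1} * Pr{Xj=1} ...
print(pr(lambda w: w[0] == 1 and w[2] == 1))  # 0.25
# ... but equality (4) fails: Pr{X1=X2=X3=1} = 0, not 1/8.
print(pr(lambda w: w == (1, 1, 1)))           # 0.0
```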

SLIDE 7

The Coupon Collector’s problem

There are n distinct coupons and at each trial a coupon is chosen uniformly at random, independently of previous trials. Let m be the number of trials. Goal: establish relationships between the number m of trials and the probability of having chosen each one of the n coupons at least once. Note: the problem is similar to occupancy (how many balls must be thrown so that no bin is empty). A minimal simulation of the process is sketched below.
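A simulation sketch (not from the slides) of the process just described; later slides' checks reuse this function:

```python
import random

def coupon_collector_trials(n: int) -> int:
    """Draw coupons uniformly at random, independently, until all n
    distinct types have been seen; return the number of trials used."""
    seen, trials = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))  # uniform choice among n types
        trials += 1
    return trials

print(coupon_collector_trials(100))  # typically near 100 * ln(100) = 460
```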

SLIDE 8

The expected number of trials needed (I)

Let X be the number of trials (a random variable) needed to collect all coupons at least once each. Let C1, C2, . . . , CX be the sequence of trials, where Ci ∈ {1, . . . , n} denotes the coupon type chosen at trial i. We call the ith trial a success if the coupon type chosen at it was not drawn in any of the first i − 1 trials (obviously C1 and CX are always successes). We divide the sequence of trials into epochs, where epoch i begins with the trial following the ith success and ends with the trial at which the (i + 1)st success takes place. Let r.v. Xi (0 ≤ i ≤ n − 1) be the number of trials in the ith epoch.

SLIDE 9

The expected number of trials needed (II)

Clearly, $X = \sum_{i=0}^{n-1} X_i$.

Let $p_i$ be the probability of success at any trial of the $i$th epoch. This is the probability of choosing one of the $n - i$ remaining coupon types, so:
$$p_i = \frac{n - i}{n}$$

Clearly, $X_i$ follows a geometric distribution with parameter $p_i$, so $E[X_i] = \frac{1}{p_i}$ and $\mathrm{Var}(X_i) = \frac{1 - p_i}{p_i^2}$.

By linearity of expectation:
$$E[X] = E\left[\sum_{i=0}^{n-1} X_i\right] = \sum_{i=0}^{n-1} E[X_i] = \sum_{i=0}^{n-1} \frac{n}{n - i} = n \sum_{i=1}^{n} \frac{1}{i} = n H_n$$
But $H_n = \ln n + \Theta(1)$, so $E[X] = n \ln n + \Theta(n)$.
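A small check (not from the slides) of $E[X] = n H_n$ against simulation; it assumes the coupon_collector_trials sketch from Slide 7 is in scope:

```python
import math

n, runs = 100, 2000
h_n = sum(1 / i for i in range(1, n + 1))  # harmonic number H_n
exact = n * h_n                            # exact expectation n * H_n

# Empirical mean over repeated runs (coupon_collector_trials as on Slide 7)
mean = sum(coupon_collector_trials(n) for _ in range(runs)) / runs

print(exact, n * math.log(n), mean)  # ~518.7, ~460.5, and a mean near 518.7
```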

SLIDE 10

The variance of the number of needed trials

Since the $X_i$'s are independent, we have:
$$\mathrm{Var}(X) = \sum_{i=0}^{n-1} \mathrm{Var}(X_i) = \sum_{i=0}^{n-1} \frac{ni}{(n-i)^2} = \sum_{i=1}^{n} \frac{n(n-i)}{i^2} = n^2 \sum_{i=1}^{n} \frac{1}{i^2} - n \sum_{i=1}^{n} \frac{1}{i}$$

Since $\lim_{n \to \infty} \sum_{i=1}^{n} \frac{1}{i^2} = \frac{\pi^2}{6}$, we get $\mathrm{Var}(X) \sim \frac{\pi^2}{6} n^2$.

Concentration around the expectation: The Chebyshev inequality does not provide a strong result. For $\beta > 1$:
$$\Pr\{X > \beta n \ln n\} = \Pr\{X - n \ln n > (\beta - 1) n \ln n\} \le \Pr\{|X - n \ln n| > (\beta - 1) n \ln n\} \le \frac{\mathrm{Var}(X)}{(\beta - 1)^2 n^2 \ln^2 n} \sim \frac{\pi^2 / 6}{(\beta - 1)^2 \ln^2 n} = \Theta\left(\frac{1}{\ln^2 n}\right)$$

SLIDE 11

Stronger concentration around the expectation

Let $E_i^r$ be the event: "coupon type $i$ is not collected during the first $r$ trials". Then
$$\Pr\{E_i^r\} = \left(1 - \frac{1}{n}\right)^r \le e^{-\frac{r}{n}}$$
For $r = \beta n \ln n$ we get $\Pr\{E_i^r\} \le e^{-\frac{\beta n \ln n}{n}} = n^{-\beta}$.

By the union bound we have $\Pr\{X > r\} = \Pr\left\{\bigcup_{i=1}^{n} E_i^r\right\}$ (i.e. at least one coupon is not selected), so
$$\Pr\{X > r\} \le \sum_{i=1}^{n} \Pr\{E_i^r\} \le n \cdot n^{-\beta} = n^{-(\beta - 1)} = n^{-\epsilon}, \quad \text{where } \epsilon = \beta - 1 > 0$$
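A sketch (not from the slides) comparing the two tail bounds with an empirical estimate; it assumes coupon_collector_trials from Slide 7 is in scope, and n, beta, runs are arbitrary illustrative choices:

```python
import math

n, beta, runs = 50, 2.0, 4000
r = beta * n * math.log(n)  # threshold beta * n * ln(n)

# Empirical Pr{X > r} (coupon_collector_trials as on Slide 7)
tail = sum(coupon_collector_trials(n) > r for _ in range(runs)) / runs

chebyshev = (math.pi ** 2 / 6) / ((beta - 1) ** 2 * math.log(n) ** 2)  # Slide 10
union = n ** (-(beta - 1))                                             # Slide 11
print(tail, chebyshev, union)
# Typical output: ~0.02, 0.107, 0.02 - both bounds hold; the union bound
# is far tighter here and nearly matches the empirical tail.
```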

SLIDE 12

Sharper concentration around the mean - a heuristic argument

Binomial distribution (#successes in $n$ independent trials, each one with success probability $p$):
$$X \sim B(n, p) \Rightarrow \Pr\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k} \quad (k = 0, 1, 2, \ldots, n)$$
$E(X) = np$, $\mathrm{Var}(X) = np(1 - p)$

Poisson distribution:
$$X \sim P(\lambda) \Rightarrow \Pr\{X = x\} = e^{-\lambda} \frac{\lambda^x}{x!} \quad (x = 0, 1, \ldots)$$
$E(X) = \mathrm{Var}(X) = \lambda$

Approximation: It is $B(n, p) \xrightarrow{n \to \infty} P(\lambda)$, where $\lambda = np$. For large $n$ (and correspondingly small $p$), the approximation of the binomial by the Poisson is good.
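A sketch of this approximation (not from the slides) using only the standard library; the parameters $n = 500$, $p = 0.01$ are an arbitrary illustrative choice giving $\lambda = 5$:

```python
import math

def binom_pmf(n: int, p: float, k: int) -> float:
    """Pr{X = k} for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam: float, k: int) -> float:
    """Pr{X = k} for X ~ P(lambda)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

n, p = 500, 0.01  # lambda = n * p = 5
for k in range(4):
    print(k, binom_pmf(n, p, k), poisson_pmf(n * p, k))
# k=0: 0.00657 vs 0.00674; k=1: 0.0332 vs 0.0337; ... close agreement
```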

SLIDE 13

Towards the sharp concentration result

Let $N_i^r$ = number of times coupon $i$ is chosen during the first $r$ trials. Then $E_i^r$ is equivalent to the event $\{N_i^r = 0\}$.

Clearly $N_i^r \sim B\left(r, \frac{1}{n}\right)$, thus
$$\Pr\{N_i^r = x\} = \binom{r}{x} \left(\frac{1}{n}\right)^x \left(1 - \frac{1}{n}\right)^{r-x}$$

Let $\lambda$ be a positive real number. A r.v. $Y$ is $P(\lambda)$ $\Leftrightarrow$ $\Pr\{Y = y\} = e^{-\lambda} \cdot \frac{\lambda^y}{y!}$.

As said, for suitably small success probability $\frac{1}{n}$ and as $r$ approaches $\infty$, $P\left(\frac{r}{n}\right)$ is a good approximation of $B\left(r, \frac{1}{n}\right)$. Thus, with $\lambda = \frac{r}{n}$:
$$\Pr\{E_i^r\} = \Pr\{N_i^r = 0\} \simeq e^{-\lambda} \frac{\lambda^0}{0!} = e^{-\lambda} = e^{-\frac{r}{n}} \quad \text{(fact 1)}$$
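Fact 1 is easy to check numerically (not from the slides); the exact value $(1 - 1/n)^r$ and the estimate $e^{-r/n}$ agree closely already for moderate $n$:

```python
import math

n = 1000  # arbitrary illustrative choice
for r in (n, int(n * math.log(n)), 10 * n):
    exact = (1 - 1 / n) ** r   # exact Pr{N_i^r = 0}
    approx = math.exp(-r / n)  # Poisson estimate e^{-r/n} (fact 1)
    print(r, exact, approx)
# r=1000: 0.3677 vs 0.3679; the values agree to about 3 significant figures
```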

SLIDE 14

An informal argument on independence

We will now claim that the $E_i^r$ ($1 \le i \le n$) events are "almost independent" (although it is obvious that there is some dependence between them; but we are anyway heading towards a heuristic).

Claim 1. For $1 \le i \le n$, and any set of indices $\{j_1, \ldots, j_k\}$ not containing $i$,
$$\Pr\left\{E_i^r \,\middle|\, \bigcap_{l=1}^{k} E_{j_l}^r\right\} \simeq \Pr\{E_i^r\}$$

Proof:
$$\Pr\left\{E_i^r \,\middle|\, \bigcap_{l=1}^{k} E_{j_l}^r\right\} = \frac{\Pr\left\{E_i^r \cap \bigcap_{l=1}^{k} E_{j_l}^r\right\}}{\Pr\left\{\bigcap_{l=1}^{k} E_{j_l}^r\right\}} = \frac{\left(1 - \frac{k+1}{n}\right)^r}{\left(1 - \frac{k}{n}\right)^r} \simeq \frac{e^{-\frac{r(k+1)}{n}}}{e^{-\frac{rk}{n}}} = e^{-\frac{r}{n}} \simeq \Pr\{E_i^r\}$$

SLIDE 15

An approximation of the probability

Because of fact 1 and Claim 1, we have:
$$\Pr\left\{\bigcap_{i=1}^{n} \overline{E_i^m}\right\} \simeq \prod_{i=1}^{n} \Pr\left\{\overline{E_i^m}\right\} \simeq \left(1 - e^{-\frac{m}{n}}\right)^n \simeq e^{-n e^{-\frac{m}{n}}}$$

For $m = n(\ln n + c) = n \ln n + cn$, for any constant $c \in \mathbb{R}$, it is $n e^{-\frac{m}{n}} = e^{-c}$, and we then get
$$\Pr\{X > m\} = \Pr\left\{\bigcup_{i=1}^{n} E_i^m\right\} = 1 - \Pr\left\{\bigcap_{i=1}^{n} \overline{E_i^m}\right\} \simeq 1 - e^{-e^{-c}}$$

The above probability:

  • is close to 0, for large positive c
  • is close to 1, for large negative c

Thus the probability of having collected all coupons rapidly changes from nearly 0 to almost 1 in a small interval centered around $n \ln n$ (!)
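This sharp threshold is easy to observe in simulation (not from the slides); the sketch assumes coupon_collector_trials from Slide 7 is in scope:

```python
import math

n, runs = 200, 3000  # arbitrary illustrative choices
for c in (-2.0, 0.0, 2.0):
    m = n * (math.log(n) + c)
    # Empirical Pr{X > m} (coupon_collector_trials as on Slide 7)
    tail = sum(coupon_collector_trials(n) > m for _ in range(runs)) / runs
    print(c, tail, 1 - math.exp(-math.exp(-c)))
# c=-2: ~0.999; c=0: ~0.632; c=+2: ~0.127 - tracking 1 - e^{-e^{-c}}
```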

SLIDE 16

The rigorous result

Theorem: Let $X$ be the r.v. counting the number of trials for having collected each one of the $n$ coupons at least once. Then, for any constant $c \in \mathbb{R}$ and $m = n(\ln n + c)$, it is
$$\lim_{n \to \infty} \Pr\{X > m\} = 1 - e^{-e^{-c}}$$

Note 1. The proof uses the Boole-Bonferroni inequalities for inclusion-exclusion in the probability of a union of events.
Note 2. The power of the Poisson heuristic is that it gives a quick, approximate estimation of probabilities and offers some intuitive insight towards the accurate behaviour of the involved quantities.
