
Exercise 1 (A streaming algorithm for counting the number of distinct values). [⋆]

We are given a stream of numbers $x_1, \dots, x_n \in [m]$ and we want to compute the number of distinct values in the stream: $F_0(x) = \#\{x_i : i \in [n]\}$. (Note that if $f_a(x) = \#\{i : x_i = a\}$, we can express $F_0(x) = \sum_{a=0}^{m-1} (f_a(x))^0$, with the convention $0^0 = 0$, as the zero-th moment of the frequencies of each element of $[m]$ in the stream.) Let us denote by $S_x = \{x_i : i \in [n]\}$ the set of the values in the stream $x$. Note that $F_0(x) = \#S_x$. (We may drop the $x$ when the context is clear.)

The streaming constraint is that the algorithm will see every $x_i$ only once as it reads the stream from left to right, and we want to minimize the memory needed by the algorithm to accomplish this task. One can show that any deterministic algorithm that approximates the value of $F_0$ within 10% requires at least $\Omega(n)$ bits of memory. Here, we will design a randomized algorithm that accomplishes this task using only $O(\log n + \log m)$ bits of memory.

We start with a hypothetical algorithm using uniform real random numbers and a hypothetical family of hash functions, and then see how to turn it into an effective algorithm.

Assume that we are given a random function $h : [m] \to (0,1]$, i.e. such that for every $x \in [m]$, $h(x)$ is a (fixed) independent uniform random real in $(0,1]$. The algorithm proceeds as follows: when reading the stream, record in memory the minimum value $\mu$ so far of the $h(x_i)$s, and output $1/\mu - 1$ at the end.

◮ Question 1.1) Show that $\Pr\{\mu \geq t\} = (1-t)^{F_0}$ for all $t \in [0,1]$.

Answer. ◃ By independence of the values of $h$,
$$\Pr\{\mu \geq t\} \overset{\text{definition of } \mu}{=} \Pr\{\forall i \in [n],\ h(x_i) \geq t\} = \Pr\{\forall a \in S_x,\ h(a) \geq t\} \overset{\text{independence of the } h(a)\text{s}}{=} \prod_{a \in S_x} \Pr\{h(a) \geq t\} = (1-t)^{F_0}.$$ ▹

◮ Question 1.2) Show that $E[\mu] = \frac{1}{F_0+1}$.

Answer. ◃ As $\mu \geq 0$, $E[\mu] = \int_0^\infty \Pr\{\mu \geq t\}\,dt = \int_0^1 (1-t)^{F_0}\,dt = \frac{1}{F_0+1}$. ▹

However, the following fact seems to imply that the algorithm is wrong.

◮ Question 1.3) Show that $E[1/\mu] = \infty$.

Answer. ◃ Indeed,
$$E[1/\mu] = \int_0^1 \frac{-d\Pr\{\mu \geq t\}}{t} = \int_0^1 \frac{F_0 \cdot (1-t)^{F_0-1}}{t}\,dt = \infty,$$
since $\frac{(1-t)^{F_0-1}}{t} \sim \frac{1}{t}$ for $t \to 0$ and $\int_0^\varepsilon \frac{dt}{t} = \infty$ for all $\varepsilon > 0$. ▹

But, fortunately:

◮ Question 1.4) Compute $Var(\mu)$ and show that $Var(\mu) \leq E[\mu]^2$.

Answer. ◃ $E[\mu^2] = \int_0^1 t^2 \cdot F_0 \cdot (1-t)^{F_0-1}\,dt = \frac{2}{(F_0+2)(F_0+1)} < 2E[\mu]^2$. Thus, $Var(\mu) = E[\mu^2] - E[\mu]^2 = \frac{F_0}{(F_0+1)^2(F_0+2)} < E[\mu]^2$. ▹

◮ Question 1.5) Design and analyze an $(\varepsilon, \delta)$-estimator for $F_0$. Still, what is the expected value of its output? Is there a paradox here?

◃ Hint. First, design an $(\varepsilon, \delta)$-estimator for $E[\mu]$.

Answer. ◃ We use the standard technique: output the median $\nu$ of $A = \lceil \alpha \ln(1/\delta) \rceil$ averages of $B = \lceil \beta/\varepsilon^2 \rceil$ simultaneous independent evaluations of $\mu$: $\mu_{ij}$ for $i \in [A]$ and $j \in [B]$. Let $\mu_i = \frac{\mu_{i1} + \cdots + \mu_{iB}}{B}$. We have $E[\mu_i] = E[\mu] = \frac{1}{F_0+1}$ and $Var(\mu_i) = \frac{Var(\mu)}{B}$. Thus, by Chebyshev's inequality, for all $i \in [A]$,
$$\Pr\left\{\left|\mu_i - \frac{1}{F_0+1}\right| \geq \frac{\varepsilon}{F_0+1}\right\} \leq \frac{Var(\mu)/B}{\varepsilon^2/(F_0+1)^2} \leq \frac{1}{B \cdot \varepsilon^2} \leq \frac{1}{4} \quad \text{if we set } \beta = 4.$$

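Now, let $Y_i$ be the indicator variable for the event $\mu_i \notin \frac{1 \pm \varepsilon}{F_0+1}$, i.e. $\left|\mu_i - \frac{1}{F_0+1}\right| > \frac{\varepsilon}{F_0+1}$. From the above, $E[Y_i] \leq \frac{1}{4}$. But, since the median $\nu$ falls outside this interval only if at least half of the $\mu_i$ do, we have
$$\Pr\left\{\nu \notin \frac{1 \pm \varepsilon}{F_0+1}\right\} \leq \Pr\left\{\sum_{i \in [A]} Y_i \geq \frac{A}{2}\right\} \leq \Pr\left\{\sum_{i \in [A]} Y_i - \sum_{i \in [A]} E[Y_i] \geq \frac{A}{4}\right\} \overset{\text{Hoeffding}}{\leq} \exp\left(\frac{-2(A/4)^2}{A}\right) \leq \delta \quad \text{if we set } \alpha = 8.$$

The $(\varepsilon, \delta)$-estimator thus computes $\nu$ according to the above and outputs $1/\nu - 1$. This ensures that with probability at least $1 - \delta$, the output value belongs to $\left[\frac{F_0+1}{1+\varepsilon} - 1, \frac{F_0+1}{1-\varepsilon} - 1\right] = \left[\frac{F_0 - \varepsilon}{1+\varepsilon}, \frac{F_0 + \varepsilon}{1-\varepsilon}\right]$, i.e. its relative error is at most $\varepsilon(1 + 1/F_0)/(1-\varepsilon)$, yielding a $(2\varepsilon + o(\varepsilon), \delta)$-estimator for $F_0$ (and $(\varepsilon + o(\varepsilon), \delta)$ when $F_0$ is large). Note that the expected value of each $1/\mu_{ij}$ is still $\infty$: the guarantee is a statement about probabilities, not about expectations, so there is no paradox. Averaging even removes the divergence: for $B \geq 2$, $\Pr\{\mu_i \leq s\} \leq (F_0 B s)^B$, so $E[1/\mu_i]$ and hence the expected value of the output $1/\nu - 1$ are finite. ▹

Unfortunately, such a random function $h$ requires storing $m$ reals in memory. The key to reduce the memory needed is to relax the independence of the hash values to pairwise independence only. In the following, we will approximate the minimum of the hash keys by recording only the position of their first non-zero bit in their binary writing. We proceed as follows.

Let $\ell = \lceil \log_2 m \rceil$, so that $2^{\ell-1} < m \leq 2^\ell$, and consider the field with $2^\ell$ elements $\mathbb{F}_{2^\ell}$. We identify $\mathbb{F}_{2^\ell}$ through canonical bijections to the set of bit-vectors $\{0,1\}^\ell$ and to the set of integers $\{0, \dots, 2^\ell - 1\}$ written in binary. For every pair $(a, b) \in \mathbb{F}_{2^\ell}^2$, consider the hash function $h_{ab} : \mathbb{F}_{2^\ell} \to \mathbb{F}_{2^\ell}$ defined as $h_{ab}(y) = a + b \cdot y$. For every $y \in \mathbb{F}_{2^\ell} \equiv \{0,1\}^\ell$, we denote by $\rho(y) = \max\{j \in \{0, \dots, \ell\} : y_1 = \cdots = y_j = 0\}$ the largest index $j$ such that the first $j$ bits of $y$, seen as a bit-vector, are all zero.

Let us now consider the following streaming algorithm:

Algorithm 2. Streaming algorithm for $F_0$
Let $\ell = \lceil \log_2 m \rceil$; we identify each element $x_i \in [m]$ of the stream with its corresponding element in $\mathbb{F}_{2^\ell}$.
Pick uniformly and independently two random elements $a, b \in \mathbb{F}_{2^\ell}$.
Compute $R = \max_{i=1..n} \rho(h_{ab}(x_i))$.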
Return $2^R$.

◮ Question 1.6) Show that for all $c \in \mathbb{F}_{2^\ell}$ and $r \in \{0, \dots, \ell\}$, $\Pr_{a,b}\{\rho(h_{ab}(c)) \geq r\} = \frac{1}{2^r}$.

◃ Hint. Show that $h_{ab}(c)$ is uniform in $\mathbb{F}_{2^\ell}$.

Answer. ◃ Since $a$ is chosen uniformly at random in $\mathbb{F}_{2^\ell}$ and independently from $bc$, $a + bc$ is uniform in $\mathbb{F}_{2^\ell}$, so $h_{ab}(c)$ is a uniform random variable for all $c \in \mathbb{F}_{2^\ell}$. It follows that for all $c \in \mathbb{F}_{2^\ell}$ and $r \in \{0, \dots, \ell\}$, the probability that the binary writing of $h_{ab}(c)$ starts with $r$ zeros is exactly $1/2^r$. ▹

Let $W_c^r$ be the indicator random variable for the event $\rho(h_{ab}(c)) \geq r$. Let $Z_r = \sum_{c \in S_x} W_c^r$ be the number of values in the stream whose hash key starts with $r$ zero bits.

◮ Question 1.7) Show that $E[Z_r] = F_0/2^r$.

Answer. ◃ $E[Z_r] \overset{\text{linearity}}{=} \sum_{c \in S_x} E[W_c^r] \overset{\text{indicator variables}}{=} \sum_{c \in S_x} \Pr\{\rho(h_{ab}(c)) \geq r\} = \frac{\#S_x}{2^r} = \frac{F_0}{2^r}$. ▹

◮ Question 1.8) Show that the random values $h_{ab}(0), \dots, h_{ab}(2^\ell - 1)$ are uniform and pairwise independent.

◃ Hint. Show that if $c \neq d$, then for all $\gamma, \delta \in \mathbb{F}_{2^\ell}$, $\Pr_{a,b}\{(h_{ab}(c), h_{ab}(d)) = (\gamma, \delta)\} = \frac{1}{\#\mathbb{F}_{2^\ell}^2}$.
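Before proving it, the identity in the hint can be checked exhaustively on a small field. The sketch below is only an illustration under stated assumptions: it takes $\ell = 3$ and represents $\mathbb{F}_8$ as polynomials over $\mathbb{F}_2$ modulo $x^3 + x + 1$ (one possible irreducible polynomial); the helper names `gf_mul`, `h`, `pair_counts` and `is_pairwise_uniform` are hypothetical and do not come from the exercise.

```python
from collections import Counter
from itertools import product

ELL = 3          # assumption: a tiny field, so that exhaustive search is cheap
IRRED = 0b1011   # assumption: x^3 + x + 1, irreducible over GF(2)


def gf_mul(u, v, ell=ELL, irred=IRRED):
    """Multiply u and v in GF(2^ell): carry-less product reduced modulo irred."""
    acc = 0
    while v:
        if v & 1:
            acc ^= u          # addition in characteristic 2 is XOR
        v >>= 1
        u <<= 1
        if u >> ell:          # degree reached ell: subtract (XOR) the modulus
            u ^= irred
    return acc


def h(a, b, y):
    """Hash function h_ab(y) = a + b*y over GF(2^ELL)."""
    return a ^ gf_mul(b, y)


def pair_counts(c, d):
    """Count, over all (a, b), how often each pair (h_ab(c), h_ab(d)) occurs."""
    q = 1 << ELL
    return Counter((h(a, b, c), h(a, b, d)) for a, b in product(range(q), repeat=2))


def is_pairwise_uniform():
    """True if, for all c != d, every pair (gamma, delta) is hit by exactly one (a, b)."""
    q = 1 << ELL
    for c, d in product(range(q), repeat=2):
        if c == d:
            continue
        counts = pair_counts(c, d)
        if len(counts) != q * q or set(counts.values()) != {1}:
            return False
    return True
```

Each pair $(\gamma, \delta)$ being reached by exactly one of the $(2^\ell)^2$ pairs $(a, b)$ is the statement of the hint, so `is_pairwise_uniform()` returning `True` is a sanity check for this one field, not a proof.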

Answer. ◃ Consider $c \neq d \in \mathbb{F}_{2^\ell}$ and $(\gamma, \delta) \in \mathbb{F}_{2^\ell}^2$.
$$\Pr_{a,b}\{(h_{ab}(c), h_{ab}(d)) = (\gamma, \delta)\} = \frac{\#\{(a,b) \in \mathbb{F}_{2^\ell}^2 : (h_{ab}(c), h_{ab}(d)) = (\gamma, \delta)\}}{\#\mathbb{F}_{2^\ell}^2} = \frac{\#\left\{(a,b) \in \mathbb{F}_{2^\ell}^2 : \begin{pmatrix} 1 & c \\ 1 & d \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \gamma \\ \delta \end{pmatrix}\right\}}{\#\mathbb{F}_{2^\ell}^2} = \frac{1}{\#\mathbb{F}_{2^\ell}^2},$$
since the matrix is invertible as $c \neq d$ (its determinant is $d - c$). ▹

◮ Question 1.9) Show that $Var(Z_r) = \frac{F_0}{2^r}\left(1 - \frac{1}{2^r}\right) < E[Z_r]$.

Answer. ◃ As the random variables $h_{ab}(0), \dots, h_{ab}(2^\ell - 1)$ are pairwise independent, the random variables $(W_c^r)_{c \in S_x}$ are also pairwise independent. As the variance is additive for pairwise independent variables, we have
$$Var(Z_r) = \sum_{c \in S_x} Var(W_c^r) = \sum_{c \in S_x} \frac{1}{2^r}\left(1 - \frac{1}{2^r}\right) = \frac{F_0}{2^r}\left(1 - \frac{1}{2^r}\right) < \frac{F_0}{2^r} = E[Z_r],$$
since $Var(\mathrm{Bernoulli}(\alpha)) = \alpha(1 - \alpha)$. ▹

Fix some $\eta > 1$.

◮ Question 1.10) Show that $\Pr\{Z_r > 0\} < \frac{1}{\eta}$ for all $r \in \{0, \dots, \ell\}$ such that $2^r > \eta F_0$.

◃ Hint. $Z_r$ is an integer; use Markov's inequality.

Answer. ◃ Consider $r$ such that $2^r > \eta F_0$, i.e. such that $1/\eta > F_0/2^r = E[Z_r]$. Then, $\Pr\{Z_r > 0\} = \Pr\{Z_r \geq 1\} \leq E[Z_r] < 1/\eta$ by Markov's inequality. ▹

◮ Question 1.11) Show that $\Pr\{Z_r = 0\} < \frac{1}{\eta}$ for all $r \in \{0, \dots, \ell\}$ such that $2^r < F_0/\eta$.

◃ Hint. $Z_r$ is an integer; apply Chebyshev's inequality.

Answer. ◃ Consider $r$ such that $2^r < F_0/\eta$, i.e. such that $\eta < F_0/2^r = E[Z_r]$. Then, $\Pr\{Z_r = 0\} \leq \Pr\{|Z_r - E[Z_r]| \geq E[Z_r]\} \leq \frac{Var(Z_r)}{E[Z_r]^2} < 1/E[Z_r] < 1/\eta$ by Chebyshev's inequality. ▹

◮ Question 1.12) Conclude that for all $\eta > 2$, $\Pr\{2^R \in [F_0/\eta, \eta F_0]\} > 1 - \frac{2}{\eta}$. The algorithm thus outputs an $\eta$-approximation of $F_0$ with probability at least $1 - 2/\eta$ for all $\eta > 2$. How many bits of memory does it require?

Answer. ◃ Note that $R = \max\{r : Z_r > 0\}$. Thus, for all $r \in \{0, \dots, \ell\}$, $\Pr\{R \geq r\} = \Pr\{Z_r > 0\}$ and $\Pr\{R < r\} = \Pr\{Z_r = 0\}$.
It follows that, with $r = \lfloor \log_2(F_0/\eta) \rfloor$, we get $\Pr\{2^R < F_0/\eta\} = \Pr\{Z_r = 0\} < 1/\eta$ by Question 1.11. And with $r = \lceil \log_2(\eta F_0) \rceil$, we get $\Pr\{2^R \geq \eta F_0\} = \Pr\{Z_r > 0\} < 1/\eta$ by Question 1.10. It follows that the value $2^R$ output by the algorithm belongs to $[F_0/\eta, \eta F_0]$ with probability at least $1 - 2/\eta > 0$, for all $\eta > 2$. The algorithm requires $2\ell + \lceil \log_2 \ell \rceil < 2\log_2 m + \log_2 \log_2 m + 3 = O(\log m)$ bits of memory to remember $a$, $b$ and $R$. ▹

We have thus obtained an $(\varepsilon, 2/(1+\varepsilon))$-estimator for $F_0$ using $O(\log m)$ bits of memory for $\varepsilon > 1$. Getting an $(\varepsilon, \delta)$-estimator for $F_0$ in $O_{\varepsilon,\delta}(\log m + \log n)$ bits of memory for arbitrarily small $\varepsilon, \delta > 0$ requires a lot more work...
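Algorithm 2 can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not in the exercise: it fixes $\ell = 8$ (so $m \leq 256$), represents $\mathbb{F}_{2^8}$ modulo the polynomial $x^8 + x^4 + x^3 + x + 1$ (one possible irreducible polynomial), and reads "the first $j$ bits" of $y$ as its most significant bits. The names `gf_mul`, `rho` and `estimate_f0` are hypothetical.

```python
import random

ELL = 8          # assumption: stream values lie in [2^8]
IRRED = 0x11B    # assumption: x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)


def gf_mul(u, v):
    """Multiply u and v in GF(2^ELL): carry-less product reduced modulo IRRED."""
    acc = 0
    while v:
        if v & 1:
            acc ^= u          # addition in characteristic 2 is XOR
        v >>= 1
        u <<= 1
        if u >> ELL:          # degree reached ELL: subtract (XOR) the modulus
            u ^= IRRED
    return acc


def rho(y):
    """Number of leading zero bits of y written on ELL bits (rho(0) = ELL)."""
    return ELL - y.bit_length()


def estimate_f0(stream, a=None, b=None, rng=random):
    """One pass over the stream; only a, b and R are kept in memory."""
    if a is None:
        a = rng.randrange(1 << ELL)
    if b is None:
        b = rng.randrange(1 << ELL)
    R = 0
    for x in stream:
        R = max(R, rho(a ^ gf_mul(b, x)))   # rho(h_ab(x)) with h_ab(x) = a + b*x
    return 1 << R                            # the estimate 2^R
```

For instance, `estimate_f0(stream)` on a stream of values below 256 returns a power of two between 1 and 256. As in the analysis above, a single run only gives a constant-factor approximation with constant probability; passing `a` and `b` explicitly makes a run reproducible.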
