
Data Streams & Communication Complexity

Lecture 1: Simple Stream Statistics in Small Space Andrew McGregor, UMass Amherst

1/25


Data Stream Model

◮ Stream: $m$ elements from a universe of size $n$, e.g.,
  $$x_1, x_2, \ldots, x_m = 3, 5, 3, 7, 5, 4, \ldots$$
◮ Goal: Compute some function of the stream, e.g., number of distinct elements, frequent items, longest increasing subsequence, a clustering, graph connectivity properties, . . .
◮ Catch:
  1. Limited working memory, sublinear in $n$ and $m$
  2. Access the data sequentially
  3. Process each element quickly
◮ Origins in the seventies, but the model has become popular in the last ten years. . .

2/25


Why’s it become popular?

◮ Practical Appeal:
  ◮ Faster networks, cheaper data storage, and ubiquitous data-logging result in a massive amount of data to be processed.
  ◮ Applications to network monitoring, query planning, I/O efficiency for massive data, sensor-network aggregation, . . .
◮ Theoretical Appeal:
  ◮ Easy-to-state problems that are hard to solve.
  ◮ Links to communication complexity, compressed sensing, metric embeddings, pseudo-random generators, approximation, . . .

3/25


This Lecture: Basic Numerical Statistics

◮ Given a stream of $m$ elements from universe $[n] = \{1, 2, \ldots, n\}$, e.g.,
  $$x_1, x_2, \ldots, x_m = 3, 5, 3, 7, 5, 4, \ldots$$
  let $f \in \mathbb{N}^n$ be the frequency vector where $f_i$ is the frequency of $i$.
◮ Problems: What can we approximate in sublinear space?
  ◮ Frequency moments: $F_k = \sum_i f_i^k$
  ◮ Max frequency: $F_\infty = \max_i f_i$
  ◮ Number of distinct elements: $F_0 = \sum_i f_i^0$
  ◮ Median: $j$ such that $f_1 + f_2 + \ldots + f_j \approx m/2$
  Algorithms are often randomized and the guarantees will be probabilistic.
◮ Keeping things simple: One could consider the $f_i$'s being increased or decreased, but for this talk we'll focus on unit increments. We'll also assume algorithms have an unlimited store of random bits.

4/25


Outline

Sampling
Sketching: The Basics
Count-Min and Applications
Count-Sketch: Count-Min with a Twist
ℓp Sampling and Frequency Moments

5/25


Sampling and Statistics

◮ Sampling is a general technique for tackling massive amounts of data.
◮ Example: To find an $\epsilon$-approximate median, i.e., $j$ such that
  $$f_1 + f_2 + \ldots + f_j = m/2 \pm \epsilon m,$$
  sampling $O(\epsilon^{-2})$ stream elements and returning the sample median works with good probability.
◮ Beyond basic sampling: There are more powerful forms of sampling, and other techniques that make better use of the limited space.

6/25
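The median-by-sampling idea above can be sketched in a few lines. This is an illustrative one-pass version: the function name `approx_median`, the use of reservoir sampling to collect the uniform sample, and the constant in the sample size are assumptions of this sketch, not from the slides.

```python
import random

def approx_median(stream, eps, seed=0):
    """Return an eps-approximate median using O(eps^-2) samples.

    One pass of reservoir sampling keeps a uniform sample of the stream;
    the sample median is an eps-approximate median with good probability.
    The constant 4 in the sample size is illustrative.
    """
    rng = random.Random(seed)
    k = int(4 / eps ** 2)               # sample size, O(eps^-2)
    reservoir = []
    for t, x in enumerate(stream):
        if t < k:
            reservoir.append(x)
        else:
            j = rng.randrange(t + 1)    # keep x with probability k/(t+1)
            if j < k:
                reservoir[j] = x
    reservoir.sort()
    return reservoir[len(reservoir) // 2]
```

Note the space used is $O(\epsilon^{-2})$ regardless of the stream length $m$.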

slide-17
SLIDE 17

AMS Sampling

◮ Problem: Estimate i g(fi) for some function g with g(0) = 0

7/25

slide-18
SLIDE 18

AMS Sampling

◮ Problem: Estimate i g(fi) for some function g with g(0) = 0 ◮ Basic Estimator: Sample xJ where J ∈R [m] and compute

r = |{j ≥ J : xj = xJ}|

7/25

slide-19
SLIDE 19

AMS Sampling

◮ Problem: Estimate i g(fi) for some function g with g(0) = 0 ◮ Basic Estimator: Sample xJ where J ∈R [m] and compute

r = |{j ≥ J : xj = xJ}| Output X = m(g(r) − g(r − 1))

7/25

slide-20
SLIDE 20

AMS Sampling

◮ Problem: Estimate i g(fi) for some function g with g(0) = 0 ◮ Basic Estimator: Sample xJ where J ∈R [m] and compute

r = |{j ≥ J : xj = xJ}| Output X = m(g(r) − g(r − 1))

◮ Expectation:

E [X]

7/25

slide-21
SLIDE 21

AMS Sampling

◮ Problem: Estimate i g(fi) for some function g with g(0) = 0 ◮ Basic Estimator: Sample xJ where J ∈R [m] and compute

r = |{j ≥ J : xj = xJ}| Output X = m(g(r) − g(r − 1))

◮ Expectation:

E [X] =

  • i

P [xJ = i] E [X|xJ = i]

7/25

slide-22
SLIDE 22

AMS Sampling

◮ Problem: Estimate i g(fi) for some function g with g(0) = 0 ◮ Basic Estimator: Sample xJ where J ∈R [m] and compute

r = |{j ≥ J : xj = xJ}| Output X = m(g(r) − g(r − 1))

◮ Expectation:

E [X] =

  • i

P [xJ = i] E [X|xJ = i] =

  • i

fi m fi

  • r=1

m(g(r) − g(r − 1)) fi

  • 7/25
slide-23
SLIDE 23

AMS Sampling

◮ Problem: Estimate i g(fi) for some function g with g(0) = 0 ◮ Basic Estimator: Sample xJ where J ∈R [m] and compute

r = |{j ≥ J : xj = xJ}| Output X = m(g(r) − g(r − 1))

◮ Expectation:

E [X] =

  • i

P [xJ = i] E [X|xJ = i] =

  • i

fi m fi

  • r=1

m(g(r) − g(r − 1)) fi

  • =
  • i

g(fi)

7/25

slide-24
SLIDE 24

AMS Sampling

◮ Problem: Estimate $\sum_i g(f_i)$ for some function $g$ with $g(0) = 0$.
◮ Basic Estimator: Sample $x_J$ where $J \in_R [m]$ and compute
  $$r = |\{j \geq J : x_j = x_J\}|$$
  Output $X = m(g(r) - g(r-1))$.
◮ Expectation: Given $x_J = i$, the count $r$ is uniform on $\{1, \ldots, f_i\}$, so the sum telescopes:
  $$E[X] = \sum_i P[x_J = i]\, E[X \mid x_J = i] = \sum_i \frac{f_i}{m} \sum_{r=1}^{f_i} \frac{m(g(r) - g(r-1))}{f_i} = \sum_i g(f_i)$$
◮ For high confidence: Compute $t$ estimators in parallel and average.

7/25


Example: Frequency Moments

◮ Frequency Moments: Define $F_k = \sum_i f_i^k$ for $k \in \{1, 2, 3, \ldots\}$.
◮ Use the AMS estimator with $X = m(r^k - (r-1)^k)$.
◮ Expectation: $E[X] = F_k$
◮ Range: $0 \leq X \leq kmF_\infty^{k-1} \leq kn^{1-1/k}F_k$
◮ Repeat $t$ times and let $\tilde{F}_k$ be the average value. By Chernoff,
  $$P\left[|\tilde{F}_k - F_k| \geq \epsilon F_k\right] \leq 2\exp\left(-\frac{tF_k\epsilon^2}{3kn^{1-1/k}F_k}\right) = 2\exp\left(-\frac{t\epsilon^2}{3kn^{1-1/k}}\right)$$
◮ If $t = 3\epsilon^{-2}kn^{1-1/k}\log(2\delta^{-1})$ then $P\left[|\tilde{F}_k - F_k| \geq \epsilon F_k\right] \leq \delta$.
◮ Thm: In $\tilde{O}(\epsilon^{-2}n^{1-1/k})$ space we can find a $(1 \pm \epsilon)$ approximation for $F_k$ with probability at least $1 - \delta$.

8/25
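The AMS estimator above, specialized to $F_k$ and averaged over $t$ parallel copies, can be sketched as follows. The function name `ams_fk` and the reservoir-style maintenance of each sampled position $J$ are illustrative choices; the slides only specify the estimator itself.

```python
import random

def ams_fk(stream, k, t, seed=0):
    """Estimate F_k = sum_i f_i^k by averaging t parallel AMS estimators.

    Each estimator picks a uniformly random position J (via reservoir
    sampling) and maintains r = |{j >= J : x_j = x_J}|; it outputs
    m * (r^k - (r-1)^k), which has expectation F_k.
    """
    rng = random.Random(seed)
    samples = [None] * t                 # per-estimator [sampled value, count r]
    m = 0
    for x in stream:
        m += 1
        for i in range(t):
            if rng.randrange(m) == 0:    # replace the sample with prob 1/m
                samples[i] = [x, 0]
            if samples[i] is not None and samples[i][0] == x:
                samples[i][1] += 1       # x occurs at or after position J
    return sum(m * (r ** k - (r - 1) ** k) for _, r in samples) / t
```

The space is $O(t)$ words, independent of the stream length, matching the parallel-repetition scheme in the theorem.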



Random Projections

◮ Many stream algorithms use a random projection $Z \in \mathbb{R}^{w \times n}$, $w \ll n$:
  $$Z(f) = \begin{pmatrix} z_{1,1} & \ldots & z_{1,n} \\ \vdots & & \vdots \\ z_{w,1} & \ldots & z_{w,n} \end{pmatrix} \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_n \end{pmatrix} = \begin{pmatrix} s_1 \\ \vdots \\ s_w \end{pmatrix} = s$$
◮ Updatable: We can maintain the sketch $s$ in $\tilde{O}(w)$ space since incrementing $f_i$ corresponds to
  $$s \leftarrow s + (z_{1,i}, \ldots, z_{w,i})^T$$
◮ Useful: Choose a distribution for the $z_{i,j}$ such that the relevant function of $f$ can be estimated from $s$ with high probability for sufficiently large $w$.

10/25


Examples

◮ If $z_{i,j} \in_R \{-1, 1\}$, can estimate $F_2$ with $w = O(\epsilon^{-2}\log\delta^{-1})$.
◮ If $z_{i,j} \sim D$ where $D$ is $p$-stable, $p \in (0, 2]$, can estimate $F_p$ with $w = O(\epsilon^{-2}\log\delta^{-1})$. For example, the 1-stable and 2-stable distributions are:
  $$\mathrm{Cauchy}(x) = \frac{1}{\pi} \cdot \frac{1}{1 + x^2} \qquad \mathrm{Gaussian}(x) = \frac{1}{\sqrt{2\pi}} \cdot e^{-x^2/2}$$
◮ Note that $F_0 = (1 \pm \epsilon)F_p$ if $p = \log(1 + \epsilon)/\log m$.
◮ For the rest of the lecture we'll focus on “hash-based” sketches: given a random hash function $h : [n] \to [w]$, the non-zero entries of $Z$ are the $z_{h_i,i}$, e.g.,
  $$Z = \begin{pmatrix} 1 & & 1 & & 1 & \\ & 1 & & 1 & & 1 \end{pmatrix}$$

11/25
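The first bullet above, the $\pm 1$ ("tug-of-war") projection for $F_2$, can be sketched as follows. Fully random signs, cached lazily per item, stand in for the 4-wise independent hash functions a real implementation would use, so this toy version does not achieve the stated space bound.

```python
import random

def f2_sketch(stream, w, seed=0):
    """Estimate F2 with a random +-1 projection (tug-of-war sketch).

    Maintains s_j = sum_i z_j(i) * f_i for w independent rows, where
    z_j(i) in {-1, +1}.  Since E[s_j^2] = F2, averaging the squared
    counters estimates F2.
    """
    rng = random.Random(seed)
    signs = [{} for _ in range(w)]       # lazily drawn z_j(i)
    s = [0] * w
    for x in stream:
        for j in range(w):
            if x not in signs[j]:
                signs[j][x] = rng.choice((-1, 1))
            s[j] += signs[j][x]          # increment of f_x adds z_j(x)
    return sum(v * v for v in s) / w
```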



Count-Min Sketch

◮ Maintain a vector $s \in \mathbb{N}^w$ via a random hash function $h : [n] \to [w]$.
  [Figure: each of the frequencies $f_1, \ldots, f_n$ is hashed to one of the counters $s_1, \ldots, s_w$.]
◮ Update: For each increment of $f_i$, increment $s_{h_i}$. Hence,
  $$s_k = \sum_{j : h_j = k} f_j \qquad \text{e.g., } s_3 = f_6 + f_7 + f_{13}$$
◮ Query: Use $\tilde{f}_i = s_{h_i}$ to estimate $f_i$.
◮ Lemma: $f_i \leq \tilde{f}_i$ and $P\left[\tilde{f}_i \geq f_i + 2m/w\right] \leq 1/2$
◮ Thm: Let $w = 2/\epsilon$. Repeat the hashing $\lg(\delta^{-1})$ times in parallel and take the minimum estimate for $f_i$:
  $$P\left[f_i \leq \tilde{f}_i \leq f_i + \epsilon m\right] \geq 1 - \delta$$

13/25
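The update/query rules and the parallel repetition above can be sketched as a small class. Python's deterministic tuple hash stands in for the pairwise-independent hash functions; a production implementation would use an explicit hash family.

```python
class CountMin:
    """Count-Min sketch: d rows of w counters, estimate by taking the min.

    Guarantees f_i <= estimate(i) always, and estimate(i) <= f_i + eps*m
    with probability >= 1 - delta when w = 2/eps and d = lg(1/delta).
    """
    def __init__(self, w, d, seed=0):
        self.w, self.d, self.seed = w, d, seed
        self.rows = [[0] * w for _ in range(d)]

    def _h(self, r, i):
        # stand-in for the r-th pairwise-independent hash h : [n] -> [w]
        return hash((self.seed, r, i)) % self.w

    def update(self, i, count=1):
        for r in range(self.d):
            self.rows[r][self._h(r, i)] += count

    def estimate(self, i):
        return min(self.rows[r][self._h(r, i)] for r in range(self.d))
```

Since every increment lands in exactly one counter per row, each row's counters always sum to the stream length $m$.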


Proof of Lemma

◮ Define $E$ by $\tilde{f}_i = f_i + E$, so $E = \sum_{j \neq i : h_i = h_j} f_j$.
◮ Since all $f_j \geq 0$, we have $E \geq 0$.
◮ Since $P[h_i = h_j] = 1/w$,
  $$E[E] = \sum_{j \neq i} f_j \cdot P[h_i = h_j] \leq m/w$$
◮ By an application of the Markov bound,
  $$P[E \geq 2m/w] \leq 1/2$$

14/25


Range Queries

◮ Range Query: For $i, j \in [n]$, estimate $f_{[i,j]} = f_i + f_{i+1} + \ldots + f_j$.
◮ Dyadic Intervals: Restrict attention to intervals of the form
  $$[1 + (i-1)2^j,\ i2^j] \quad \text{where } j \in \{0, 1, \ldots, \lg n\},\ i \in \{1, 2, \ldots, n/2^j\}$$
  since any range can be partitioned into $O(\log n)$ such intervals. E.g.,
  $$[48, 106] = [48, 48] \cup [49, 64] \cup [65, 96] \cup [97, 104] \cup [105, 106]$$
◮ To support dyadic intervals, construct Count-Min sketches corresponding to intervals of width $1, 2, 4, 8, \ldots$
◮ E.g., for intervals of width 2 we have:
  [Figure: pairs $(f_{2i-1}, f_{2i})$ are merged into $g_i$, which is hashed into the counters $s_1, \ldots, s_w$.]
  where the update rule is now: for an increment of $f_{2i-1}$ or $f_{2i}$, increment $s_{h_i}$.

15/25
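The partition into dyadic intervals can be computed greedily: repeatedly take the largest dyadic interval that starts at the current left endpoint and still fits. A minimal sketch (the function name is an assumption of this illustration):

```python
def dyadic_decompose(lo, hi):
    """Partition [lo, hi] (1-based, inclusive) into O(log n) dyadic
    intervals of the form [1 + (i-1)*2^j, i*2^j]."""
    out = []
    while lo <= hi:
        j = 0
        # grow j while lo still starts a dyadic interval of width 2^(j+1)
        # and that wider interval fits inside [lo, hi]
        while (lo - 1) % (2 ** (j + 1)) == 0 and lo + 2 ** (j + 1) - 1 <= hi:
            j += 1
        out.append((lo, lo + 2 ** j - 1))
        lo += 2 ** j
    return out
```

On the slide's example it reproduces the stated partition: `dyadic_decompose(48, 106)` returns `[(48, 48), (49, 64), (65, 96), (97, 104), (105, 106)]`.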


Quantiles and Heavy Hitters

◮ Quantiles: Find $j$ such that
  $$f_1 + \ldots + f_j \approx m/2$$
  Can approximate the median via binary search using range queries.
◮ Heavy Hitter Problem: Find a set $S \subset [n]$ where
  $$\{i : f_i \geq \phi m\} \subseteq S \subseteq \{i : f_i \geq (\phi - \epsilon)m\}$$
  Rather than checking each $\tilde{f}_i$ individually, we can save time by exploiting the fact that if $\tilde{f}_{[i,k]} < \phi m$ then $f_j < \phi m$ for all $j \in [i, k]$.

16/25
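The pruning idea above amounts to descending a binary tree of intervals and cutting any branch whose range total is below the threshold. A minimal sketch: `freq(lo, hi)` is a hypothetical range-sum oracle, which in the streaming setting would be answered by the Count-Min range-query structure; here any function works, e.g. one backed by an explicit frequency table.

```python
def heavy_hitters(freq, n, phi, m):
    """Find items with f_i >= phi*m by binary splitting of [1, n],
    pruning any interval whose total frequency is below phi*m."""
    out = []
    stack = [(1, n)]
    while stack:
        lo, hi = stack.pop()
        if freq(lo, hi) < phi * m:
            continue                      # no heavy item can hide inside
        if lo == hi:
            out.append(lo)
        else:
            mid = (lo + hi) // 2
            stack.extend([(lo, mid), (mid + 1, hi)])
    return sorted(out)
```

With an exact oracle this visits $O(\phi^{-1}\log n)$ intervals, since at most $1/\phi$ intervals per level can pass the threshold.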



Count-Sketch: Count-Min with a Twist

◮ Maintain $s \in \mathbb{Z}^w$ via hash functions $h : [n] \to [w]$ and $r : [n] \to \{-1, 1\}$.
  [Figure: as in Count-Min, each $f_i$ is hashed to one counter, but now contributes with sign $r_i$.]
◮ Update: For each increment of $f_i$, $s_{h_i} \leftarrow s_{h_i} + r_i$. Hence,
  $$s_k = \sum_{j : h_j = k} f_j r_j \qquad \text{e.g., } s_3 = f_6 - f_7 - f_{13}$$
◮ Query: Use $\tilde{f}_i = s_{h_i} r_i$ to estimate $f_i$.
◮ Lemma: $E[\tilde{f}_i] = f_i$ and $V[\tilde{f}_i] \leq F_2/w$
◮ Thm: Let $w = O(1/\epsilon^2)$. Repeating $O(\lg\delta^{-1})$ times in parallel and taking the median estimate ensures
  $$P\left[f_i - \epsilon\sqrt{F_2} \leq \tilde{f}_i \leq f_i + \epsilon\sqrt{F_2}\right] \geq 1 - \delta$$

18/25
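The signed update and the median-of-rows query above can be sketched with a small variation on the Count-Min class. As before, Python's deterministic tuple hash stands in for the pairwise-independent hash functions $h$ and $r$.

```python
class CountSketch:
    """Count-Sketch: like Count-Min, but each item gets a random sign.

    Each row's estimate s[h(i)] * r(i) is unbiased with variance at most
    F2/w; taking the median over d rows boosts the confidence.
    """
    def __init__(self, w, d, seed=0):
        self.w, self.d, self.seed = w, d, seed
        self.rows = [[0] * w for _ in range(d)]

    def _h(self, r, i):
        return hash((self.seed, r, 0, i)) % self.w

    def _sign(self, r, i):
        return 1 - 2 * (hash((self.seed, r, 1, i)) % 2)   # +1 or -1

    def update(self, i, count=1):
        for r in range(self.d):
            self.rows[r][self._h(r, i)] += count * self._sign(r, i)

    def estimate(self, i):
        ests = sorted(self.rows[r][self._h(r, i)] * self._sign(r, i)
                      for r in range(self.d))
        return ests[self.d // 2]                          # median row
```

Unlike Count-Min, collisions cancel in expectation, so the estimate can err in either direction but is unbiased.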


Proof of Lemma

◮ Define $E$ by $\tilde{f}_i = f_i + Er_i$, so $E = \sum_{j \neq i : h_i = h_j} f_j r_j$.
◮ Expectation: Since $E[r_j] = 0$,
  $$E[E] = \sum_{j \neq i : h_i = h_j} f_j E[r_j] = 0$$
◮ Variance: Similarly,
  $$V[E] \leq E\Big[\Big(\sum_{j \neq i : h_i = h_j} f_j r_j\Big)^2\Big] = \sum_{\substack{j,k \neq i \\ h_i = h_j = h_k}} f_j f_k E[r_j r_k]\, P[h_i = h_j = h_k] = \sum_{j \neq i} f_j^2\, P[h_i = h_j] \leq F_2/w$$

19/25



ℓp Sampling

◮ ℓp Sampling: Return random values $I \in [n]$ and $R \in \mathbb{R}$ where
  $$P[I = i] = (1 \pm \epsilon)\frac{|f_i|^p}{F_p} \quad \text{and} \quad R = (1 \pm \epsilon)f_I$$
◮ Applications:
  ◮ Will use ℓ2 sampling to get an optimal algorithm for $F_k$, $k > 2$.
  ◮ Will use ℓ0 sampling for processing graph streams.
  ◮ Many other stream problems can be solved via ℓp sampling, e.g., duplicate finding, triangle counting, entropy estimation.
◮ Let's see the algorithm for $p = 2$. . .

21/25


ℓ2 Sampling Algorithm

◮ Weight $f_i$ by $\gamma_i = \sqrt{1/u_i}$ where $u_i \in_R [0, 1]$ to form the vector $g$:
  $$f = (f_1, f_2, \ldots, f_n) \qquad g = (g_1, g_2, \ldots, g_n) \text{ where } g_i = \gamma_i f_i$$
◮ Return $(i, f_i)$ if $g_i^2 \geq t := F_2(f)/\epsilon$.
◮ Probability $(i, f_i)$ is returned:
  $$P\left[g_i^2 \geq t\right] = P\left[u_i \leq f_i^2/t\right] = f_i^2/t$$
◮ The probability some value is returned is $\sum_i f_i^2/t = \epsilon$, so repeating $O(\epsilon^{-1}\log\delta^{-1})$ times ensures a value is returned with probability $1 - \delta$.
◮ Lemma: Using a Count-Sketch of size $O(\epsilon^{-1}\log^2 n)$ ensures a $(1 \pm \epsilon)$ approximation of any $g_i$ that passes the threshold.

22/25
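One attempt of the thresholding scheme above can be sketched offline, with the frequency vector stored explicitly. This is only an illustration of the weighting and threshold test: the streaming version never stores $f$ and instead recovers the crossing $g_i$ from a Count-Sketch, as the lemma states.

```python
import random

def l2_sample(f, eps, seed=0):
    """One attempt at l2 sampling: weight f_i by gamma_i = sqrt(1/u_i)
    and return (i, f_i) if g_i^2 = f_i^2 / u_i crosses t = F2/eps.

    Each coordinate crosses the threshold independently with probability
    eps * f_i^2 / F2; returns None if nothing crosses.
    """
    rng = random.Random(seed)
    F2 = sum(v * v for v in f)
    t = F2 / eps
    for i, v in enumerate(f):
        u = rng.random()
        if v * v / u >= t:           # g_i^2 >= t
            return (i, v)
    return None
```

Repeating over fresh randomness $O(\epsilon^{-1}\log\delta^{-1})$ times yields a sample with probability $1 - \delta$, as on the slide.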


Proof of Lemma

◮ Exercise: $P[F_2(g)/F_2(f) \leq c\log n] \geq 99/100$ for some large $c > 0$, so we'll condition on this event.
◮ Set $w = 9c\epsilon^{-1}\log n$. A Count-Sketch in $O(w\log^2 n)$ space ensures
  $$\tilde{g}_i = g_i \pm \sqrt{F_2(g)/w}$$
◮ Then $\tilde{g}_i^2 \geq F_2(f)/\epsilon$ implies
  $$\sqrt{F_2(g)/w} \leq \sqrt{F_2(f)/(9\epsilon^{-1})} \leq \sqrt{\epsilon\tilde{g}_i^2/(9\epsilon^{-1})} = \epsilon\tilde{g}_i/3$$
  and hence $\tilde{g}_i^2 = (1 \pm \epsilon/3)^2 g_i^2 = (1 \pm \epsilon)g_i^2$ as required.
◮ Under-the-rug: Need to ensure that the conditioning doesn't affect the sampling probability too much.

23/25


Fk Revisited

◮ Earlier we used $\tilde{O}(n^{1-1/k})$ space to approximate $F_k = \sum_i |f_i|^k$.
◮ Algorithm: Let $(I, R)$ be a $(1 + \gamma)$-approximate ℓ2 sample. Return $T = \tilde{F}_2 R^{k-2}$ where $\tilde{F}_2$ is a $(1 \pm \gamma)$ approximation for $F_2$.
◮ Expectation: Setting $\gamma = \epsilon/(4k)$,
  $$E[T] = \tilde{F}_2 \sum_i P[I = i]\,((1 \pm \gamma)f_i)^{k-2} = (1 \pm \gamma)^k F_2 \sum_i \frac{f_i^2}{F_2} f_i^{k-2} = (1 \pm \tfrac{\epsilon}{2})F_k$$
◮ Range: $0 \leq T \leq (1 + \gamma)F_2 F_\infty^{k-2} \leq (1 + \gamma)n^{1-2/k}F_k$.
◮ Averaging over $t = O(\epsilon^{-2}n^{1-2/k}\log\delta^{-1})$ parallel repetitions gives
  $$P\left[|\tilde{F}_k - F_k| \geq \epsilon F_k\right] \leq \delta$$
◮ Thm: In $\tilde{O}(\epsilon^{-2}n^{1-2/k})$ space we can find a $(1 \pm \epsilon)$ approximation for $F_k$ with probability at least $1 - \delta$.

24/25


Summary

◮ Basic Sampling: Can sample $i$ with probability $\propto f_i$, but we can be much smarter via sketches.
◮ Count-Min: $f_i \leq \tilde{f}_i \leq f_i + \epsilon F_1$ in $O(\epsilon^{-1})$ space.
  $$Z = \begin{pmatrix} 1 & & 1 & & 1 & \\ & 1 & & 1 & & 1 \end{pmatrix}$$
◮ Count-Sketch: $f_i - \epsilon\sqrt{F_2} \leq \tilde{f}_i \leq f_i + \epsilon\sqrt{F_2}$ in $O(\epsilon^{-2})$ space.
  $$Z = \begin{pmatrix} 1 & & -1 & & 1 & \\ & -1 & & -1 & & 1 \end{pmatrix}$$
  The above sketches solve range queries, quantiles, heavy hitters, . . .
◮ ℓp-Sampling: Select $i$ with probability $\propto f_i^p$ in $O(\epsilon^{-2})$ space.
  $$Z = \begin{pmatrix} \gamma_2 & & -\gamma_4 & & \gamma_3 & \\ & -\gamma_6 & & -\gamma_1 & & \gamma_5 \end{pmatrix}$$

25/25