
Data Streams & Communication Complexity

Lecture 1: Simple Stream Statistics in Small Space Andrew McGregor, UMass Amherst

1/25


Data Stream Model

◮ Stream: $m$ elements from a universe of size $n$, e.g.,
  $$x_1, x_2, \ldots, x_m = 3, 5, 3, 7, 5, 4, \ldots$$
◮ Goal: Compute some function of the stream, e.g., number of distinct elements, frequent items, longest increasing subsequence, a clustering, graph connectivity properties, . . .
◮ Catch:
  1. Limited working memory, sublinear in $n$ and $m$
  2. Access the data sequentially
  3. Process each element quickly
◮ Origins in the seventies, but the model has become popular in the last ten years. . .

2/25


Why’s it become popular?

◮ Practical Appeal:
  ◮ Faster networks, cheaper data storage, and ubiquitous data-logging result in a massive amount of data to be processed.
  ◮ Applications to network monitoring, query planning, I/O efficiency for massive data, sensor-network aggregation, . . .
◮ Theoretical Appeal:
  ◮ Easy-to-state problems that are hard to solve.
  ◮ Links to communication complexity, compressed sensing, metric embeddings, pseudo-random generators, approximation, . . .

3/25


This Lecture: Basic Numerical Statistics

◮ Given a stream of $m$ elements from universe $[n] = \{1, 2, \ldots, n\}$, e.g.,
  $$x_1, x_2, \ldots, x_m = 3, 5, 3, 7, 5, 4, \ldots$$
  let $f \in \mathbb{N}^n$ be the frequency vector where $f_i$ is the frequency of $i$.
◮ Problems: What can we approximate in sublinear space?
  ◮ Frequency moments: $F_k = \sum_i f_i^k$
  ◮ Max frequency: $F_\infty = \max_i f_i$
  ◮ Number of distinct elements: $F_0 = \sum_i f_i^0$
  ◮ Median: $j$ such that $f_1 + f_2 + \ldots + f_j \approx m/2$
  Algorithms are often randomized and the guarantees will be probabilistic.
◮ Keeping things simple: One could consider the $f_i$'s being increased or decreased, but for this talk we'll focus on unit increments. We'll also assume algorithms have an unlimited store of random bits.

4/25


Outline

Sampling
Sketching: The Basics
Count-Min and Applications
Count-Sketch: Count-Min with a Twist
ℓp Sampling and Frequency Moments

5/25


Sampling and Statistics

◮ Sampling is a general technique for tackling massive amounts of data.
◮ Example: To find an $\epsilon$-approximate median, i.e., $j$ such that
  $$f_1 + f_2 + \ldots + f_j = m/2 \pm \epsilon m,$$
  sampling $O(\epsilon^{-2})$ stream elements and returning the sample median works with good probability.
◮ Beyond basic sampling: There are more powerful forms of sampling, and other techniques that make better use of the limited space.

6/25
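The median-by-sampling idea above can be sketched in a few lines. This is an illustrative one-pass version: the function name `approx_median`, the use of reservoir sampling to collect the uniform sample, and the constant in the sample size are assumptions of this sketch, not from the slides.

```python
import random

def approx_median(stream, eps, seed=0):
    """Return an eps-approximate median using O(eps^-2) samples.

    One pass of reservoir sampling keeps a uniform sample of the stream;
    the sample median is an eps-approximate median with good probability.
    The constant 4 in the sample size is illustrative.
    """
    rng = random.Random(seed)
    k = int(4 / eps ** 2)               # sample size, O(eps^-2)
    reservoir = []
    for t, x in enumerate(stream):
        if t < k:
            reservoir.append(x)
        else:
            j = rng.randrange(t + 1)    # keep x with probability k/(t+1)
            if j < k:
                reservoir[j] = x
    reservoir.sort()
    return reservoir[len(reservoir) // 2]
```

Note the space used is $O(\epsilon^{-2})$ regardless of the stream length $m$.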

slide-17
SLIDE 17

AMS Sampling

◮ Problem: Estimate i g(fi) for some function g with g(0) = 0

7/25

slide-18
SLIDE 18

AMS Sampling

◮ Problem: Estimate i g(fi) for some function g with g(0) = 0 ◮ Basic Estimator: Sample xJ where J ∈R [m] and compute

r = |{j ≥ J : xj = xJ}|

7/25

slide-19
SLIDE 19

AMS Sampling

◮ Problem: Estimate i g(fi) for some function g with g(0) = 0 ◮ Basic Estimator: Sample xJ where J ∈R [m] and compute

r = |{j ≥ J : xj = xJ}| Output X = m(g(r) − g(r − 1))

7/25

slide-20
SLIDE 20

AMS Sampling

◮ Problem: Estimate i g(fi) for some function g with g(0) = 0 ◮ Basic Estimator: Sample xJ where J ∈R [m] and compute

r = |{j ≥ J : xj = xJ}| Output X = m(g(r) − g(r − 1))

◮ Expectation:

E [X]

7/25

slide-21
SLIDE 21

AMS Sampling

◮ Problem: Estimate i g(fi) for some function g with g(0) = 0 ◮ Basic Estimator: Sample xJ where J ∈R [m] and compute

r = |{j ≥ J : xj = xJ}| Output X = m(g(r) − g(r − 1))

◮ Expectation:

E [X] =

  • i

P [xJ = i] E [X|xJ = i]

7/25

slide-22
SLIDE 22

AMS Sampling

◮ Problem: Estimate i g(fi) for some function g with g(0) = 0 ◮ Basic Estimator: Sample xJ where J ∈R [m] and compute

r = |{j ≥ J : xj = xJ}| Output X = m(g(r) − g(r − 1))

◮ Expectation:

E [X] =

  • i

P [xJ = i] E [X|xJ = i] =

  • i

fi m fi

  • r=1

m(g(r) − g(r − 1)) fi

  • 7/25
slide-23
SLIDE 23

AMS Sampling

◮ Problem: Estimate i g(fi) for some function g with g(0) = 0 ◮ Basic Estimator: Sample xJ where J ∈R [m] and compute

r = |{j ≥ J : xj = xJ}| Output X = m(g(r) − g(r − 1))

◮ Expectation:

E [X] =

  • i

P [xJ = i] E [X|xJ = i] =

  • i

fi m fi

  • r=1

m(g(r) − g(r − 1)) fi

  • =
  • i

g(fi)

7/25

slide-24
SLIDE 24

AMS Sampling

◮ Problem: Estimate $\sum_i g(f_i)$ for some function $g$ with $g(0) = 0$.
◮ Basic Estimator: Sample $x_J$ where $J \in_R [m]$ and compute
  $$r = |\{j \geq J : x_j = x_J\}|$$
  Output $X = m(g(r) - g(r-1))$.
◮ Expectation: Given $x_J = i$, the count $r$ is uniform on $\{1, \ldots, f_i\}$, so the sum telescopes:
  $$E[X] = \sum_i P[x_J = i]\, E[X \mid x_J = i] = \sum_i \frac{f_i}{m} \sum_{r=1}^{f_i} \frac{m(g(r) - g(r-1))}{f_i} = \sum_i g(f_i)$$
◮ For high confidence: Compute $t$ estimators in parallel and average.

7/25


Example: Frequency Moments

◮ Frequency Moments: Define $F_k = \sum_i f_i^k$ for $k \in \{1, 2, 3, \ldots\}$.
◮ Use the AMS estimator with $X = m(r^k - (r-1)^k)$.
◮ Expectation: $E[X] = F_k$
◮ Range: $0 \leq X \leq kmF_\infty^{k-1} \leq kn^{1-1/k}F_k$
◮ Repeat $t$ times and let $\tilde{F}_k$ be the average value. By Chernoff,
  $$P\left[|\tilde{F}_k - F_k| \geq \epsilon F_k\right] \leq 2\exp\left(-\frac{tF_k\epsilon^2}{3kn^{1-1/k}F_k}\right) = 2\exp\left(-\frac{t\epsilon^2}{3kn^{1-1/k}}\right)$$
◮ If $t = 3\epsilon^{-2}kn^{1-1/k}\log(2\delta^{-1})$ then $P\left[|\tilde{F}_k - F_k| \geq \epsilon F_k\right] \leq \delta$.
◮ Thm: In $\tilde{O}(\epsilon^{-2}n^{1-1/k})$ space we can find a $(1 \pm \epsilon)$ approximation for $F_k$ with probability at least $1 - \delta$.

8/25
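The AMS estimator above, specialized to $F_k$ and averaged over $t$ parallel copies, can be sketched as follows. The function name `ams_fk` and the reservoir-style maintenance of each sampled position $J$ are illustrative choices; the slides only specify the estimator itself.

```python
import random

def ams_fk(stream, k, t, seed=0):
    """Estimate F_k = sum_i f_i^k by averaging t parallel AMS estimators.

    Each estimator picks a uniformly random position J (via reservoir
    sampling) and maintains r = |{j >= J : x_j = x_J}|; it outputs
    m * (r^k - (r-1)^k), which has expectation F_k.
    """
    rng = random.Random(seed)
    samples = [None] * t                 # per-estimator [sampled value, count r]
    m = 0
    for x in stream:
        m += 1
        for i in range(t):
            if rng.randrange(m) == 0:    # replace the sample with prob 1/m
                samples[i] = [x, 0]
            if samples[i] is not None and samples[i][0] == x:
                samples[i][1] += 1       # x occurs at or after position J
    return sum(m * (r ** k - (r - 1) ** k) for _, r in samples) / t
```

The space is $O(t)$ words, independent of the stream length, matching the parallel-repetition scheme in the theorem.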



Random Projections

◮ Many stream algorithms use a random projection $Z \in \mathbb{R}^{w \times n}$, $w \ll n$:
  $$Z(f) = \begin{pmatrix} z_{1,1} & \ldots & z_{1,n} \\ \vdots & & \vdots \\ z_{w,1} & \ldots & z_{w,n} \end{pmatrix} \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_n \end{pmatrix} = \begin{pmatrix} s_1 \\ \vdots \\ s_w \end{pmatrix} = s$$
◮ Updatable: We can maintain the sketch $s$ in $\tilde{O}(w)$ space since incrementing $f_i$ corresponds to
  $$s \leftarrow s + (z_{1,i}, \ldots, z_{w,i})^T$$
◮ Useful: Choose a distribution for the $z_{i,j}$ such that the relevant function of $f$ can be estimated from $s$ with high probability for sufficiently large $w$.

10/25


Examples

◮ If $z_{i,j} \in_R \{-1, 1\}$, can estimate $F_2$ with $w = O(\epsilon^{-2}\log\delta^{-1})$.
◮ If $z_{i,j} \sim D$ where $D$ is $p$-stable, $p \in (0, 2]$, can estimate $F_p$ with $w = O(\epsilon^{-2}\log\delta^{-1})$. For example, the 1-stable and 2-stable distributions are:
  $$\mathrm{Cauchy}(x) = \frac{1}{\pi} \cdot \frac{1}{1 + x^2} \qquad \mathrm{Gaussian}(x) = \frac{1}{\sqrt{2\pi}} \cdot e^{-x^2/2}$$
◮ Note that $F_0 = (1 \pm \epsilon)F_p$ if $p = \log(1 + \epsilon)/\log m$.
◮ For the rest of the lecture we'll focus on “hash-based” sketches: given a random hash function $h : [n] \to [w]$, the non-zero entries of $Z$ are the $z_{h_i,i}$, e.g.,
  $$Z = \begin{pmatrix} 1 & & 1 & & 1 & \\ & 1 & & 1 & & 1 \end{pmatrix}$$

11/25
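The first bullet above, the $\pm 1$ ("tug-of-war") projection for $F_2$, can be sketched as follows. Fully random signs, cached lazily per item, stand in for the 4-wise independent hash functions a real implementation would use, so this toy version does not achieve the stated space bound.

```python
import random

def f2_sketch(stream, w, seed=0):
    """Estimate F2 with a random +-1 projection (tug-of-war sketch).

    Maintains s_j = sum_i z_j(i) * f_i for w independent rows, where
    z_j(i) in {-1, +1}.  Since E[s_j^2] = F2, averaging the squared
    counters estimates F2.
    """
    rng = random.Random(seed)
    signs = [{} for _ in range(w)]       # lazily drawn z_j(i)
    s = [0] * w
    for x in stream:
        for j in range(w):
            if x not in signs[j]:
                signs[j][x] = rng.choice((-1, 1))
            s[j] += signs[j][x]          # increment of f_x adds z_j(x)
    return sum(v * v for v in s) / w
```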



Count-Min Sketch

◮ Maintain a vector $s \in \mathbb{N}^w$ via a random hash function $h : [n] \to [w]$.
  [Figure: each of the frequencies $f_1, \ldots, f_n$ is hashed to one of the counters $s_1, \ldots, s_w$.]
◮ Update: For each increment of $f_i$, increment $s_{h_i}$. Hence,
  $$s_k = \sum_{j : h_j = k} f_j \qquad \text{e.g., } s_3 = f_6 + f_7 + f_{13}$$
◮ Query: Use $\tilde{f}_i = s_{h_i}$ to estimate $f_i$.
◮ Lemma: $f_i \leq \tilde{f}_i$ and $P\left[\tilde{f}_i \geq f_i + 2m/w\right] \leq 1/2$
◮ Thm: Let $w = 2/\epsilon$. Repeat the hashing $\lg(\delta^{-1})$ times in parallel and take the minimum estimate for $f_i$:
  $$P\left[f_i \leq \tilde{f}_i \leq f_i + \epsilon m\right] \geq 1 - \delta$$

13/25
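The update/query rules and the parallel repetition above can be sketched as a small class. Python's deterministic tuple hash stands in for the pairwise-independent hash functions; a production implementation would use an explicit hash family.

```python
class CountMin:
    """Count-Min sketch: d rows of w counters, estimate by taking the min.

    Guarantees f_i <= estimate(i) always, and estimate(i) <= f_i + eps*m
    with probability >= 1 - delta when w = 2/eps and d = lg(1/delta).
    """
    def __init__(self, w, d, seed=0):
        self.w, self.d, self.seed = w, d, seed
        self.rows = [[0] * w for _ in range(d)]

    def _h(self, r, i):
        # stand-in for the r-th pairwise-independent hash h : [n] -> [w]
        return hash((self.seed, r, i)) % self.w

    def update(self, i, count=1):
        for r in range(self.d):
            self.rows[r][self._h(r, i)] += count

    def estimate(self, i):
        return min(self.rows[r][self._h(r, i)] for r in range(self.d))
```

Since every increment lands in exactly one counter per row, each row's counters always sum to the stream length $m$.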


Proof of Lemma

◮ Define $E$ by $\tilde{f}_i = f_i + E$, so $E = \sum_{j \neq i : h_i = h_j} f_j$.
◮ Since all $f_j \geq 0$, we have $E \geq 0$.
◮ Since $P[h_i = h_j] = 1/w$,
  $$E[E] = \sum_{j \neq i} f_j \cdot P[h_i = h_j] \leq m/w$$
◮ By an application of the Markov bound,
  $$P[E \geq 2m/w] \leq 1/2$$

14/25


Range Queries

◮ Range Query: For $i, j \in [n]$, estimate $f_{[i,j]} = f_i + f_{i+1} + \ldots + f_j$.
◮ Dyadic Intervals: Restrict attention to intervals of the form
  $$[1 + (i-1)2^j,\ i2^j] \quad \text{where } j \in \{0, 1, \ldots, \lg n\},\ i \in \{1, 2, \ldots, n/2^j\}$$
  since any range can be partitioned into $O(\log n)$ such intervals. E.g.,
  $$[48, 106] = [48, 48] \cup [49, 64] \cup [65, 96] \cup [97, 104] \cup [105, 106]$$
◮ To support dyadic intervals, construct Count-Min sketches corresponding to intervals of width $1, 2, 4, 8, \ldots$
◮ E.g., for intervals of width 2 we have:
  [Figure: pairs $(f_{2i-1}, f_{2i})$ are merged into $g_i$, which is hashed into the counters $s_1, \ldots, s_w$.]
  where the update rule is now: for an increment of $f_{2i-1}$ or $f_{2i}$, increment $s_{h_i}$.

15/25
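The partition into dyadic intervals can be computed greedily: repeatedly take the largest dyadic interval that starts at the current left endpoint and still fits. A minimal sketch (the function name is an assumption of this illustration):

```python
def dyadic_decompose(lo, hi):
    """Partition [lo, hi] (1-based, inclusive) into O(log n) dyadic
    intervals of the form [1 + (i-1)*2^j, i*2^j]."""
    out = []
    while lo <= hi:
        j = 0
        # grow j while lo still starts a dyadic interval of width 2^(j+1)
        # and that wider interval fits inside [lo, hi]
        while (lo - 1) % (2 ** (j + 1)) == 0 and lo + 2 ** (j + 1) - 1 <= hi:
            j += 1
        out.append((lo, lo + 2 ** j - 1))
        lo += 2 ** j
    return out
```

On the slide's example it reproduces the stated partition: `dyadic_decompose(48, 106)` returns `[(48, 48), (49, 64), (65, 96), (97, 104), (105, 106)]`.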


Quantiles and Heavy Hitters

◮ Quantiles: Find $j$ such that
  $$f_1 + \ldots + f_j \approx m/2$$
  Can approximate the median via binary search using range queries.
◮ Heavy Hitter Problem: Find a set $S \subset [n]$ where
  $$\{i : f_i \geq \phi m\} \subseteq S \subseteq \{i : f_i \geq (\phi - \epsilon)m\}$$
  Rather than checking each $\tilde{f}_i$ individually, we can save time by exploiting the fact that if $\tilde{f}_{[i,k]} < \phi m$ then $f_j < \phi m$ for all $j \in [i, k]$.

16/25
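The pruning idea above amounts to descending a binary tree of intervals and cutting any branch whose range total is below the threshold. A minimal sketch: `freq(lo, hi)` is a hypothetical range-sum oracle, which in the streaming setting would be answered by the Count-Min range-query structure; here any function works, e.g. one backed by an explicit frequency table.

```python
def heavy_hitters(freq, n, phi, m):
    """Find items with f_i >= phi*m by binary splitting of [1, n],
    pruning any interval whose total frequency is below phi*m."""
    out = []
    stack = [(1, n)]
    while stack:
        lo, hi = stack.pop()
        if freq(lo, hi) < phi * m:
            continue                      # no heavy item can hide inside
        if lo == hi:
            out.append(lo)
        else:
            mid = (lo + hi) // 2
            stack.extend([(lo, mid), (mid + 1, hi)])
    return sorted(out)
```

With an exact oracle this visits $O(\phi^{-1}\log n)$ intervals, since at most $1/\phi$ intervals per level can pass the threshold.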



Count-Sketch: Count-Min with a Twist

◮ Maintain $s \in \mathbb{Z}^w$ via hash functions $h : [n] \to [w]$ and $r : [n] \to \{-1, 1\}$.
  [Figure: as in Count-Min, each $f_i$ is hashed to one counter, but now contributes with sign $r_i$.]
◮ Update: For each increment of $f_i$, $s_{h_i} \leftarrow s_{h_i} + r_i$. Hence,
  $$s_k = \sum_{j : h_j = k} f_j r_j \qquad \text{e.g., } s_3 = f_6 - f_7 - f_{13}$$
◮ Query: Use $\tilde{f}_i = s_{h_i} r_i$ to estimate $f_i$.
◮ Lemma: $E[\tilde{f}_i] = f_i$ and $V[\tilde{f}_i] \leq F_2/w$
◮ Thm: Let $w = O(1/\epsilon^2)$. Repeating $O(\lg\delta^{-1})$ times in parallel and taking the median estimate ensures
  $$P\left[f_i - \epsilon\sqrt{F_2} \leq \tilde{f}_i \leq f_i + \epsilon\sqrt{F_2}\right] \geq 1 - \delta$$

18/25
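The signed update and the median-of-rows query above can be sketched with a small variation on the Count-Min class. As before, Python's deterministic tuple hash stands in for the pairwise-independent hash functions $h$ and $r$.

```python
class CountSketch:
    """Count-Sketch: like Count-Min, but each item gets a random sign.

    Each row's estimate s[h(i)] * r(i) is unbiased with variance at most
    F2/w; taking the median over d rows boosts the confidence.
    """
    def __init__(self, w, d, seed=0):
        self.w, self.d, self.seed = w, d, seed
        self.rows = [[0] * w for _ in range(d)]

    def _h(self, r, i):
        return hash((self.seed, r, 0, i)) % self.w

    def _sign(self, r, i):
        return 1 - 2 * (hash((self.seed, r, 1, i)) % 2)   # +1 or -1

    def update(self, i, count=1):
        for r in range(self.d):
            self.rows[r][self._h(r, i)] += count * self._sign(r, i)

    def estimate(self, i):
        ests = sorted(self.rows[r][self._h(r, i)] * self._sign(r, i)
                      for r in range(self.d))
        return ests[self.d // 2]                          # median row
```

Unlike Count-Min, collisions cancel in expectation, so the estimate can err in either direction but is unbiased.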


Proof of Lemma

◮ Define $E$ by $\tilde{f}_i = f_i + Er_i$, so $E = \sum_{j \neq i : h_i = h_j} f_j r_j$.
◮ Expectation: Since $E[r_j] = 0$,
  $$E[E] = \sum_{j \neq i : h_i = h_j} f_j E[r_j] = 0$$
◮ Variance: Similarly,
  $$V[E] \leq E\Big[\Big(\sum_{j \neq i : h_i = h_j} f_j r_j\Big)^2\Big] = \sum_{\substack{j,k \neq i \\ h_i = h_j = h_k}} f_j f_k E[r_j r_k]\, P[h_i = h_j = h_k] = \sum_{j \neq i} f_j^2\, P[h_i = h_j] \leq F_2/w$$

19/25



ℓp Sampling

◮ ℓp Sampling: Return random values $I \in [n]$ and $R \in \mathbb{R}$ where
  $$P[I = i] = (1 \pm \epsilon)\frac{|f_i|^p}{F_p} \quad \text{and} \quad R = (1 \pm \epsilon)f_I$$
◮ Applications:
  ◮ Will use ℓ2 sampling to get an optimal algorithm for $F_k$, $k > 2$.
  ◮ Will use ℓ0 sampling for processing graph streams.
  ◮ Many other stream problems can be solved via ℓp sampling, e.g., duplicate finding, triangle counting, entropy estimation.
◮ Let's see the algorithm for $p = 2$. . .

21/25


ℓ2 Sampling Algorithm

◮ Weight $f_i$ by $\gamma_i = \sqrt{1/u_i}$ where $u_i \in_R [0, 1]$ to form the vector $g$:
  $$f = (f_1, f_2, \ldots, f_n) \qquad g = (g_1, g_2, \ldots, g_n) \text{ where } g_i = \gamma_i f_i$$
◮ Return $(i, f_i)$ if $g_i^2 \geq t := F_2(f)/\epsilon$.
◮ Probability $(i, f_i)$ is returned:
  $$P\left[g_i^2 \geq t\right] = P\left[u_i \leq f_i^2/t\right] = f_i^2/t$$
◮ The probability some value is returned is $\sum_i f_i^2/t = \epsilon$, so repeating $O(\epsilon^{-1}\log\delta^{-1})$ times ensures a value is returned with probability $1 - \delta$.
◮ Lemma: Using a Count-Sketch of size $O(\epsilon^{-1}\log^2 n)$ ensures a $(1 \pm \epsilon)$ approximation of any $g_i$ that passes the threshold.

22/25
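One attempt of the thresholding scheme above can be sketched offline, with the frequency vector stored explicitly. This is only an illustration of the weighting and threshold test: the streaming version never stores $f$ and instead recovers the crossing $g_i$ from a Count-Sketch, as the lemma states.

```python
import random

def l2_sample(f, eps, seed=0):
    """One attempt at l2 sampling: weight f_i by gamma_i = sqrt(1/u_i)
    and return (i, f_i) if g_i^2 = f_i^2 / u_i crosses t = F2/eps.

    Each coordinate crosses the threshold independently with probability
    eps * f_i^2 / F2; returns None if nothing crosses.
    """
    rng = random.Random(seed)
    F2 = sum(v * v for v in f)
    t = F2 / eps
    for i, v in enumerate(f):
        u = rng.random()
        if v * v / u >= t:           # g_i^2 >= t
            return (i, v)
    return None
```

Repeating over fresh randomness $O(\epsilon^{-1}\log\delta^{-1})$ times yields a sample with probability $1 - \delta$, as on the slide.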


Proof of Lemma

◮ Exercise: $P[F_2(g)/F_2(f) \leq c\log n] \geq 99/100$ for some large $c > 0$, so we'll condition on this event.
◮ Set $w = 9c\epsilon^{-1}\log n$. A Count-Sketch in $O(w\log^2 n)$ space ensures
  $$\tilde{g}_i = g_i \pm \sqrt{F_2(g)/w}$$
◮ Then $\tilde{g}_i^2 \geq F_2(f)/\epsilon$ implies
  $$\sqrt{F_2(g)/w} \leq \sqrt{F_2(f)/(9\epsilon^{-1})} \leq \sqrt{\epsilon\tilde{g}_i^2/(9\epsilon^{-1})} = \epsilon\tilde{g}_i/3$$
  and hence $\tilde{g}_i^2 = (1 \pm \epsilon/3)^2 g_i^2 = (1 \pm \epsilon)g_i^2$ as required.
◮ Under-the-rug: Need to ensure that the conditioning doesn't affect the sampling probability too much.

23/25


Fk Revisited

◮ Earlier we used $\tilde{O}(n^{1-1/k})$ space to approximate $F_k = \sum_i |f_i|^k$.
◮ Algorithm: Let $(I, R)$ be a $(1 + \gamma)$-approximate ℓ2 sample. Return $T = \tilde{F}_2 R^{k-2}$ where $\tilde{F}_2$ is a $(1 \pm \gamma)$ approximation for $F_2$.
◮ Expectation: Setting $\gamma = \epsilon/(4k)$,
  $$E[T] = \tilde{F}_2 \sum_i P[I = i]\,((1 \pm \gamma)f_i)^{k-2} = (1 \pm \gamma)^k F_2 \sum_i \frac{f_i^2}{F_2} f_i^{k-2} = (1 \pm \tfrac{\epsilon}{2})F_k$$
◮ Range: $0 \leq T \leq (1 + \gamma)F_2 F_\infty^{k-2} \leq (1 + \gamma)n^{1-2/k}F_k$.
◮ Averaging over $t = O(\epsilon^{-2}n^{1-2/k}\log\delta^{-1})$ parallel repetitions gives
  $$P\left[|\tilde{F}_k - F_k| \geq \epsilon F_k\right] \leq \delta$$
◮ Thm: In $\tilde{O}(\epsilon^{-2}n^{1-2/k})$ space we can find a $(1 \pm \epsilon)$ approximation for $F_k$ with probability at least $1 - \delta$.

24/25


Summary

◮ Basic Sampling: Can sample $i$ with probability $\propto f_i$, but we can be much smarter via sketches.
◮ Count-Min: $f_i \leq \tilde{f}_i \leq f_i + \epsilon F_1$ in $O(\epsilon^{-1})$ space.
  $$Z = \begin{pmatrix} 1 & & 1 & & 1 & \\ & 1 & & 1 & & 1 \end{pmatrix}$$
◮ Count-Sketch: $f_i - \epsilon\sqrt{F_2} \leq \tilde{f}_i \leq f_i + \epsilon\sqrt{F_2}$ in $O(\epsilon^{-2})$ space.
  $$Z = \begin{pmatrix} 1 & & -1 & & 1 & \\ & -1 & & -1 & & 1 \end{pmatrix}$$
  The above sketches solve range queries, quantiles, heavy hitters, . . .
◮ ℓp-Sampling: Select $i$ with probability $\propto f_i^p$ in $O(\epsilon^{-2})$ space.
  $$Z = \begin{pmatrix} \gamma_2 & & -\gamma_4 & & \gamma_3 & \\ & -\gamma_6 & & -\gamma_1 & & \gamma_5 \end{pmatrix}$$

25/25