Algorithms for Big Data (IV)

Chihao Zhang

Shanghai Jiao Tong University

Oct. 11, 2019

Algorithms for Big Data (IV) 1/19


Review of the Last Lecture

Last time, we introduced the AMS algorithm for counting distinct elements in the streaming model. We are given a sequence of numbers ⟨a1, . . . , am⟩ where each ai ∈ [n]. It defines a frequency vector f = (f1, . . . , fn) where fi = |{ k ∈ [m] : ak = i }|. We want to compute the number d = |{ i ∈ [n] : fi > 0 }|.
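These definitions translate directly into code. The following offline sketch (function names are illustrative) computes the frequency vector f and the distinct count d for a small stream:

```python
from collections import Counter

def frequency_vector(stream, n):
    # f_i = |{ k in [m] : a_k = i }| for i = 1, ..., n
    counts = Counter(stream)
    return [counts.get(i, 0) for i in range(1, n + 1)]

def distinct_count(stream):
    # d = |{ i in [n] : f_i > 0 }|
    return len(set(stream))

stream = [1, 3, 3, 5, 1, 3]          # m = 6 elements over [n] with n = 5
print(frequency_vector(stream, 5))   # [2, 0, 3, 0, 1]
print(distinct_count(stream))        # 3
```

Of course, storing the whole vector takes Ω(n) space; the point of the streaming algorithms below is to avoid exactly this.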

Algorithm: AMS Algorithm for Counting Distinct Elements
Init: a random hash function h : [n] → [n] from a 2-universal family; Z ← 0.
On input y:
    if zeros(h(y)) > Z then
        Z ← zeros(h(y))
    end if
Output: d̂ = 2^(Z + 1/2).
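A runnable sketch of the algorithm above, under stated assumptions: the 2-universal family is the usual ((a·x + b) mod p) mod n construction with a Mersenne prime p, and trailing_zeros plays the role of zeros(·); both helpers are implementation choices of this sketch, not prescribed by the slide.

```python
import random

def trailing_zeros(x, cap=64):
    # zeros(x): number of trailing zero bits of x (cap is used for x = 0)
    if x == 0:
        return cap
    return (x & -x).bit_length() - 1

def make_hash(n, rng, p=(1 << 61) - 1):
    # h(x) = ((a*x + b) mod p) mod n, a standard 2-universal family
    a, b = rng.randrange(1, p), rng.randrange(0, p)
    return lambda x: ((a * x + b) % p) % n

def ams_distinct(stream, n, seed=None):
    rng = random.Random(seed)
    h = make_hash(n, rng)
    z = 0
    for y in stream:
        z = max(z, trailing_zeros(h(y)))
    return 2 ** (z + 0.5)   # output d-hat = 2^(Z + 1/2)
```

A single run only achieves the constant-probability factor-3 guarantee of the next slide; boosting the success probability requires the median trick.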


Using O(log(1/δ) · log n) bits of memory, we can obtain

Pr[d/3 ≤ d̂ ≤ 3d] ≥ 1 − δ.

We also introduced the BJKST algorithm, a refinement of the AMS algorithm. We will show today that the BJKST algorithm can produce d̂ which is a (1 ± ε)-approximation of d for any ε > 0.

The BJKST Algorithm

The following refinement is due to Bar-Yossef, Jayram, Kumar, Sivakumar and Trevisan.

Algorithm: BJKST Algorithm for Counting Distinct Elements
Init: random hash functions h : [n] → [n] and g : [n] → [b·ε⁻⁴ log² n], both from 2-universal families; Z ← 0, B ← ∅.
On input y:
    if zeros(h(y)) ≥ Z then
        B ← B ∪ { (g(y), zeros(h(y))) }
        while |B| ≥ c/ε² do
            Z ← Z + 1
            Remove all (α, β) with β < Z from B
        end while
    end if
Output: d̂ = |B| · 2^Z.
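The BJKST refinement can be sketched as follows. For clarity this version takes g to be the identity (as the analysis on the next slides assumes); eps and the constant c are illustrative choices of this sketch.

```python
import random

def trailing_zeros(x, cap=64):
    # zeros(x): number of trailing zero bits of x (cap is used for x = 0)
    if x == 0:
        return cap
    return (x & -x).bit_length() - 1

def bjkst_distinct(stream, n, eps=0.25, c=576, seed=None):
    rng = random.Random(seed)
    p = (1 << 61) - 1
    a, b = rng.randrange(1, p), rng.randrange(0, p)
    h = lambda x: ((a * x + b) % p) % n   # 2-universal hash [n] -> [n]
    z = 0
    bucket = {}                           # g = identity: maps y -> zeros(h(y))
    cap = int(c / eps ** 2)
    for y in stream:
        if trailing_zeros(h(y)) >= z:
            bucket[y] = trailing_zeros(h(y))
            while len(bucket) >= cap:     # keep |B| below the cap c/eps^2
                z += 1
                bucket = {k: v for k, v in bucket.items() if v >= z}
    return len(bucket) * 2 ** z           # d-hat = |B| * 2^Z

# While d stays below the cap, Z stays 0 and the estimate is exact:
print(bjkst_distinct([i % 50 + 1 for i in range(1000)], 50))   # 50
```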


The algorithm maintains a bucket B, which stores those y whose zeros(h(y)) is at least the current Z. We set a cap L = c/ε² for the size of B:

▶ if L = ∞, B stores all entries, and the algorithm is exact;
▶ if L = 2, the algorithm is equivalent to AMS.

Therefore, the size of B is a trade-off between the memory consumption and the accuracy of the algorithm.


Analysis

To analyze the algorithm, we first assume that g is simply the identity function from [n] to [n], namely g(y) = y for all y ∈ [n]. We then need to store the whole B, whose size is O(ε⁻²). Similar to AMS, for every k ∈ [n], Xk,r is the indicator that h(k) has at least r trailing zeros. Define

Yr = ∑_{k∈[n]: fk>0} Xk,r

as the number of h(ai) with at least r trailing zeros. We already know from the last lecture that E[Yr] = d/2^r and Var[Yr] ≤ d/2^r.


If Z = t at the end of the algorithm, then Yt = |B| and d̂ = Yt · 2^t. We use A to denote the bad event that |Yt · 2^t − d| ≥ εd, or equivalently |Yt − d/2^t| ≥ εd/2^t. We will bound the probability of A using the following argument:

▶ if t is small, then E[Yt] = d/2^t is large, so we can apply concentration inequalities;
▶ the value t is unlikely to be very large.

We let s be the threshold for small/large values mentioned above.

Pr[A] = ∑_{r=1}^{log n} Pr[ |Yr − d/2^r| ≥ εd/2^r ∧ t = r ]
      ≤ ∑_{r=1}^{s−1} Pr[ |Yr − d/2^r| ≥ εd/2^r ] + ∑_{r=s}^{log n} Pr[t = r]
      = ∑_{r=1}^{s−1} Pr[ |Yr − E[Yr]| ≥ εd/2^r ] + Pr[ Y_{s−1} ≥ c/ε² ]
      ≤ ∑_{r=1}^{s−1} 2^r/(ε²d) + ε²d/(c · 2^{s−1})
      ≤ 2^s/(ε²d) + ε²d/(c · 2^{s−1}).

So if we choose s such that d/2^s = Θ(ε⁻²), Pr[A] can be bounded by any constant (depending on c).
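As a quick numerical sanity check (with illustrative values of ε, c, and d, not taken from the slide), choosing s with d/2^s = Θ(ε⁻²) does make the final bound a small constant:

```python
import math

eps, c, d = 0.1, 576, 10 ** 6
# choose s so that d / 2^s is about 12 / eps^2, i.e. 2^s is about eps^2 * d / 12
s = round(math.log2(eps ** 2 * d / 12))
bound = 2 ** s / (eps ** 2 * d) + eps ** 2 * d / (c * 2 ** (s - 1))
print(s, bound)   # the bound is a constant well below 1
```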


Space Complexity

We need to store

▶ the function h: O(log n);
▶ the function g: O(log n);
▶ the bucket B: O((c/ε²) · log ran(g)) = O((c/ε²) · log n).

The bottleneck is to store B. Instead of using the identity function g, we can tolerate collisions (with at most constant probability). This helps to reduce the memory needed (Exercise).



Frequency Estimation

Consider a stream of numbers ⟨a1, . . . , am⟩ and its frequency vector f = (f1, . . . , fn). Another fundamental problem is to estimate fa for each query a ∈ [n]. It is closely related to the Frequent problem, which asks for the set { j : fj > m/k }. We now describe a deterministic algorithm for Frequency-Estimation.



Misra-Gries

Algorithm: Misra-Gries Algorithm for Frequency-Estimation
Init: an empty table A.
On input y:
    if y ∈ keys(A) then
        A[y] ← A[y] + 1
    else if |keys(A)| ≤ k − 1 then
        A[y] ← 1
    else
        for all ℓ ∈ keys(A) do
            A[ℓ] ← A[ℓ] − 1
            if A[ℓ] = 0 then
                Remove ℓ from A
            end if
        end for
    end if

Algorithm: Misra-Gries (cont'd)
Output: on query j,
    if j ∈ keys(A) then
        f̂j = A[j]
    else
        f̂j = 0
    end if
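The two parts of the algorithm above translate directly into code; a minimal sketch (misra_gries and estimate are illustrative names):

```python
def misra_gries(stream, k):
    # Maintain a table A with at most k keys
    A = {}
    for y in stream:
        if y in A:
            A[y] += 1
        elif len(A) <= k - 1:
            A[y] = 1
        else:
            for key in list(A):        # decrement every counter
                A[key] -= 1
                if A[key] == 0:
                    del A[key]
    return A

def estimate(A, j):
    # f-hat_j = A[j] if j is a key of A, and 0 otherwise
    return A.get(j, 0)

A = misra_gries([1, 1, 1, 1, 1, 2, 2, 2, 3], k=2)
print(estimate(A, 1), estimate(A, 2), estimate(A, 3))   # 4 2 0
```

Note how the rare item 3 triggers one round of decrements and then disappears, while the frequent items survive with slightly reduced counters.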


Analysis

The algorithm uses O(k(log m + log n)) bits of memory. It is not hard to see that for each j ∈ [n], the output f̂j satisfies fj − m/k ≤ f̂j ≤ fj. If fj > m/k, then j is in the table A. The converse is not correct!
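The guarantee fj − m/k ≤ f̂j ≤ fj is deterministic, so it can be checked exhaustively on a random stream (the routine from the previous snippet is repeated here so this one is self-contained; the parameters are illustrative):

```python
import random

def misra_gries(stream, k):
    A = {}
    for y in stream:
        if y in A:
            A[y] += 1
        elif len(A) <= k - 1:
            A[y] = 1
        else:
            for key in list(A):
                A[key] -= 1
                if A[key] == 0:
                    del A[key]
    return A

random.seed(0)
n, m, k = 20, 1000, 10
stream = [random.randrange(1, n + 1) for _ in range(m)]
A = misra_gries(stream, k)
for j in range(1, n + 1):
    fj = stream.count(j)
    assert fj - m / k <= A.get(j, 0) <= fj   # holds for every j
print("guarantee holds for all", n, "items")
```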

In Misra-Gries, we compute a table A. The table A stores information about the stream, so we can extract frequencies from it. However, Misra-Gries suffers from the following main drawbacks:

▶ given two tables A1 and A2 with respect to σ1 and σ2 respectively, we don't know how to obtain the table for σ1 ◦ σ2 (algorithms with this property are called sketches);
▶ it does not extend to the turnstile model.

In the turnstile model, each entry of the stream is a pair (aj, ∆j). Upon receiving (aj, ∆j), we update faj to faj + ∆j.


Count Sketch

Algorithm: Count Sketch
Init: an array C[j] for j ∈ [k], where k = 3/ε²; a random hash function h : [n] → [k] from a 2-universal family; a random hash function g : [n] → {−1, 1} from a 2-universal family.
On input (y, ∆):
    C[h(y)] ← C[h(y)] + ∆ · g(y)
Output: on query a, output f̂a = g(a) · C[h(a)].
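A single-row sketch of Count Sketch in the turnstile model. The ±1 hash g is derived here from a second 2-universal function by taking a parity, which is an implementation choice of this sketch rather than part of the slide:

```python
import random

def count_sketch(n, eps=0.1, seed=None):
    rng = random.Random(seed)
    k = int(3 / eps ** 2) + 1          # k = 3/eps^2 counters (rounded up)
    p = (1 << 61) - 1
    a1, b1 = rng.randrange(1, p), rng.randrange(0, p)
    a2, b2 = rng.randrange(1, p), rng.randrange(0, p)
    h = lambda x: ((a1 * x + b1) % p) % k
    g = lambda x: 1 if ((a2 * x + b2) % p) % 2 == 0 else -1
    C = [0] * k

    def update(y, delta=1):            # turnstile update (y, Δ)
        C[h(y)] += delta * g(y)

    def query(a):                      # f-hat_a = g(a) * C[h(a)]
        return g(a) * C[h(a)]

    return update, query

update, query = count_sketch(n=100, seed=0)
update(7, 5)
update(7, -2)                          # turnstile deletions are fine
print(query(7))                        # 3 (exact here: no other item was inserted)
```

Since g(a)² = 1, a query on the only inserted item recovers its frequency exactly; collisions with other items are what the variance analysis below controls.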


Analysis

Let X = f̂a be the output on the query a. For every j ∈ [n], let Yj be the indicator of h(j) = h(a). Then

X = g(a) · ∑_{j=1}^{n} fj · g(j) · Yj.

We have

E[X] = E[ g(a) · g(a) · fa · Ya + ∑_{j∈[n]\{a}} g(a) · fj · g(j) · Yj ] = fa.

Let Z ≜ ∑_{j∈[n]\{a}} fj · g(a) · g(j) · Yj; then X = fa + Z and Var[X] = Var[Z].

E[Z²] = E[ ( ∑_{j∈[n]\{a}} fj · g(a) · g(j) · Yj )² ]
      = E[ ∑_{j∈[n]\{a}} fj² · Yj² + ∑_{j≠j′∈[n]\{a}} fj · fj′ · g(j) · g(j′) · Yj · Yj′ ]
      = E[ ∑_{j∈[n]\{a}} fj² · Yj² ]
      = ∑_{j∈[n]\{a}} fj² · E[Yj²],

where g(a)² = 1 and the cross terms vanish since E[g(j) · g(j′)] = 0 for j ≠ j′. Note that for every j ≠ a,

E[Yj²] = E[Yj] = Pr[h(j) = h(a)] = 1/k.

Therefore

E[Z²] = ∑_{j∈[n]\{a}} fj²/k ≤ ∥f∥₂²/k.

Var[X] = Var[Z] = E[Z²] − (E[Z])² ≤ ∥f∥₂²/k.

By Chebyshev,

Pr[ |f̂a − fa| ≥ ε∥f∥₂ ] ≤ 1/(kε²) = 1/3.

We can then use the median trick to boost the algorithm so that

▶ Pr[ |f̂a − fa| ≥ ε∥f∥₂ ] ≤ δ;
▶ it costs O((1/ε²) · log(1/δ) · (log m + log n)) bits of memory.

Compare the performance (in terms of accuracy and space consumption) of Misra-Gries and Count Sketch (Exercise).

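The median trick above can be sketched by running t independent rows of the Count Sketch and answering each query with the median of the t estimates; t = 11 and the parity-based ±1 hash are illustrative choices of this sketch.

```python
import random
import statistics

def one_row(n, k, rng):
    # One independent row: h : [n] -> [k], g : [n] -> {-1, +1}, counters C
    p = (1 << 61) - 1
    a1, b1 = rng.randrange(1, p), rng.randrange(0, p)
    a2, b2 = rng.randrange(1, p), rng.randrange(0, p)
    h = lambda x: ((a1 * x + b1) % p) % k
    g = lambda x: 1 if ((a2 * x + b2) % p) % 2 == 0 else -1
    return h, g, [0] * k

def boosted_count_sketch(n, eps=0.2, t=11, seed=0):
    # t = O(log(1/delta)) rows; each query returns the median estimate
    rng = random.Random(seed)
    k = int(3 / eps ** 2) + 1
    rows = [one_row(n, k, rng) for _ in range(t)]

    def update(y, delta=1):
        for h, g, C in rows:
            C[h(y)] += delta * g(y)

    def query(a):
        return statistics.median(g(a) * C[h(a)] for h, g, C in rows)

    return update, query

update, query = boosted_count_sketch(n=100)
update(7, 5)
print(query(7))   # 5: every row agrees when no other item was inserted
```

Each row fails with probability at most 1/3, so the median of t rows fails with probability exponentially small in t, by a Chernoff bound.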