


Algorithms for Big Data (III)

Chihao Zhang

Shanghai Jiao Tong University

  • Sept. 29, 2019

Algorithms for Big Data (III) 1/16


Review of the Last Lecture

Last time, we proved a few useful concentration inequalities, introduced the notion of universal families of Hash functions, and constructed a 2-universal family of Hash functions.

Algorithms for Big Data (III) 2/16


Review: the construction

Let m, n be two integers and p ≥ m be a prime. The family H = {ha,b : 1 ≤ a ≤ p − 1, 0 ≤ b ≤ p − 1}, where each ha,b : [m] → [n] is defined as ha,b(x) = ((ax + b) mod p) mod n. We proved that for every x ≠ y, Prh∈H [h(x) = h(y)] ≤ 1/n. So H is a 2-universal Hash function family.

Algorithms for Big Data (III) 3/16
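This bound is easy to check empirically. The sketch below is an illustration, not part of the lecture: the prime p = 101, range n = 10, and the pair x = 3, y = 7 are arbitrary choices. It enumerates every (a, b) in the family and verifies that the fraction of colliding functions is at most 1/n.

```python
def collision_prob(x, y, n, p):
    """Fraction of functions h_{a,b} in the family that collide on x, y.

    h_{a,b}(z) = ((a*z + b) mod p) mod n, with 1 <= a <= p-1, 0 <= b <= p-1.
    """
    hits = 0
    for a in range(1, p):
        for b in range(p):
            if ((a * x + b) % p) % n == ((a * y + b) % p) % n:
                hits += 1
    return hits / ((p - 1) * p)

# exhaustive check of the 2-universal bound Pr[h(x) = h(y)] <= 1/n
print(collision_prob(3, 7, 10, 101))
```

For p = 101 and n = 10 the exhaustive count stays at or below 1/10, matching the bound proved above.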


Strongly 2-Universal Hash Family

Recall that if we further require that for any u, v, Prh∈H [h(x) = u ∧ h(y) = v] = 1/n², then H is called a strongly 2-universal family of Hash functions. When m = n = p is a prime, we can modify the previously constructed H to get a strongly 2-universal family. In this case, we have H = {ha,b(x) = (ax + b) mod p : 0 ≤ a, b ≤ p − 1}.

Algorithms for Big Data (III) 4/16
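Strong 2-universality can be verified exhaustively for a small prime. The snippet below is illustrative only; p = 7 and the points x, y, u, v are arbitrary choices. It counts, over all p² functions ha,b(x) = (ax + b) mod p, how many map x to u and y to v at the same time; exactly one does, so the probability is 1/p².

```python
p = 7
x, y, u, v = 2, 5, 3, 6   # any x != y and any target values u, v

# count functions h_{a,b}(z) = (a*z + b) mod p hitting both targets
count = sum(
    1
    for a in range(p)
    for b in range(p)
    if (a * x + b) % p == u and (a * y + b) % p == v
)
print(count)   # exactly one of the p**2 = 49 pairs works
```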


Proof

Lemma

The equation ax + b = 0 (mod p) has a unique solution (in Fp) if a ≠ 0 and p is a prime. The equations ha,b(x1) = y1 and ha,b(x2) = y2 are equivalent to ax1 + b = y1 (mod p), ax2 + b = y2 (mod p). For x1 ≠ x2 they have a unique solution a = (y2 − y1)(x2 − x1)⁻¹ mod p, b = y1 − ax1 mod p. Therefore, Prha,b∈H [ha,b(x1) = y1 ∧ ha,b(x2) = y2] = 1/p².

Algorithms for Big Data (III) 5/16
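The unique solution can be computed directly with a modular inverse (Python's three-argument pow). A small sketch; the prime and the two constraint points are arbitrary choices of mine:

```python
p = 101
x1, y1 = 3, 17   # first constraint:  h(x1) = y1
x2, y2 = 8, 42   # second constraint: h(x2) = y2, with x2 != x1

# a = (y2 - y1) * (x2 - x1)^{-1} mod p,  b = y1 - a*x1 mod p
a = (y2 - y1) * pow(x2 - x1, -1, p) % p
b = (y1 - a * x1) % p

# the recovered (a, b) satisfies both constraints
assert (a * x1 + b) % p == y1
assert (a * x2 + b) % p == y2
```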


The General Case

The Hash family we just constructed has the restriction that m = n. We can naturally generalize m = p to m = p^k. Write every number x in base p: x = x0 + x1 · p + x2 · p² + . . . + xk−1 · p^(k−1). For every ā = (a0, a1, . . . , ak−1), with 0 ≤ ai ≤ p − 1 and 0 ≤ b ≤ p − 1, define

hā,b(x) = ( ∑_{i=0}^{k−1} ai xi + b ) mod p.

Then H = { hā,b : 0 ≤ ai ≤ p − 1, 0 ≤ b ≤ p − 1 }.

Algorithms for Big Data (III) 6/16
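A sketch of the generalized family in Python, following the slide's definition; the digit-extraction helper and the parameter choices (p = 5, k = 3) are my own illustration:

```python
import random

def base_p_digits(x, p, k):
    """Base-p digits of x, least significant first: x = sum(x_i * p**i)."""
    digits = []
    for _ in range(k):
        digits.append(x % p)
        x //= p
    return digits

def make_vector_hash(p, k, rng=random):
    """Draw h_{a,b}(x) = (sum_i a_i * x_i + b) mod p from the family."""
    a = [rng.randrange(p) for _ in range(k)]
    b = rng.randrange(p)
    def h(x):
        xs = base_p_digits(x, p, k)
        return (sum(ai * xi for ai, xi in zip(a, xs)) + b) % p
    return h

h = make_vector_hash(5, 3)   # domain [5**3] = [125], range [5]
assert all(0 <= h(x) < 5 for x in range(125))
```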


Proof

Assume x ≠ y and that they differ at position i (xi ≠ yi). For every u, v ∈ {0, 1, . . . , p − 1}, we have equations

ai xi + b = ( u − ∑_{j≠i} aj xj ) mod p,
ai yi + b = ( v − ∑_{j≠i} aj yj ) mod p.

For fixed x, y, u, v and { aj }_{j≠i}, a unique pair (ai, b) (out of p² pairs) is determined. Therefore, Pr_{hā,b∈H} [ hā,b(x) = u ∧ hā,b(y) = v ] = 1/p².

Algorithms for Big Data (III) 7/16


Counting Distinct Elements

Back to the streaming model, we are given a sequence of numbers ⟨a1, . . . , am⟩ where each ai ∈ [n]. It defines a frequency vector f = (f1, . . . , fn) where fi = |{k ∈ [m] : ak = i}|. We want to compute the number d = |{i ∈ [n] : fi > 0}|. The value d is the number of distinct elements in the stream.

Algorithms for Big Data (III) 8/16


The AMS Algorithm

The algorithm is named after Alon, Matias and Szegedy. For every integer p > 0, we use zero(p) to denote the number of trailing zeros of p in binary: zero(p) ≜ max { i : 2^i divides p }.

Algorithms for Big Data (III) 9/16
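On binary integers, zero(p) is the position of the lowest set bit; a common bit trick (my addition, not from the slides) computes it in constant time:

```python
def zero(p):
    """Number of trailing zeros of p > 0 in binary: max{i : 2**i divides p}."""
    # p & -p isolates the lowest set bit, e.g. 12 = 0b1100 -> 0b100
    return (p & -p).bit_length() - 1

assert zero(1) == 0 and zero(8) == 3 and zero(12) == 2
```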


Algorithm AMS Algorithm for Counting Distinct Elements Init: A random Hash function h : [n] → [n] from a 2-universal family; Z ← 0 On Input y: if zero(h(y)) > Z then Z ← zero(h(y)) end if Output: d̂ = 2^(Z+1/2).

Algorithms for Big Data (III) 10/16
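A minimal Python sketch of the pseudocode above. The 2-universal hash is the ((ax + b) mod p) mod n construction from the earlier slides; the particular prime p and the convention zero(0) = 0 are my choices, not fixed by the lecture.

```python
import random

class AMS:
    """Single-pass estimate of the number of distinct elements in a stream."""

    def __init__(self, n, seed=None):
        rng = random.Random(seed)
        self.p = 2_147_483_647            # a prime >= n (assumes n <= 2**31 - 1)
        self.a = rng.randint(1, self.p - 1)
        self.b = rng.randint(0, self.p - 1)
        self.n = n
        self.Z = 0

    def _zero(self, v):
        # trailing zeros of v in binary; zero(0) treated as 0 by convention
        return (v & -v).bit_length() - 1 if v > 0 else 0

    def update(self, y):
        z = self._zero(((self.a * y + self.b) % self.p) % self.n)
        if z > self.Z:
            self.Z = z

    def estimate(self):
        return 2 ** (self.Z + 0.5)

sketch = AMS(n=1024, seed=1)
for y in [1, 5, 9, 5, 1, 7, 9, 3]:       # stream with 5 distinct elements
    sketch.update(y)
print(sketch.estimate())
```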


Intuition

After applying the Hash function, h(y) is uniform in [n]. The probability that it has at least t trailing zeros is at most 2^(−t). Therefore, at least in expectation, if we have d distinct numbers, one of them may have log2 d trailing zeros. We now turn this intuition into a rigorous proof.

Algorithms for Big Data (III) 11/16


For every 0 ≤ r ≤ n, we use a random variable Yr to denote the number of h(ai) with at least r trailing zeros. The sequence of variables {Yr}0≤r≤n determines the variable Z since Z = max {r : Yr > 0}. This motivates us to understand the behavior of Yr. For every k ∈ [n], we denote by Xk,r the indicator that h(k) has at least r trailing zeros; then Yr = ∑_{k∈[n]: fk>0} Xk,r. Using this decomposition, it is not hard to see that E [Yr] = d/2^r and Var [Yr] ≤ d/2^r.

Algorithms for Big Data (III) 12/16
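The two moment bounds follow from linearity of expectation and, for the variance, from pairwise independence of the indicators; the worked step below assumes n is a power of two (so that Pr[2^r | h(k)] is exactly 2^(−r)) and pairwise-independent hash values, as a strongly 2-universal family provides.

```latex
\Pr[X_{k,r} = 1] = \Pr\!\left[2^r \mid h(k)\right] = 2^{-r}
\;\Longrightarrow\;
\mathbb{E}[Y_r] = \sum_{k:\,f_k>0} \mathbb{E}[X_{k,r}] = \frac{d}{2^r},
\qquad
\operatorname{Var}[Y_r] = \sum_{k:\,f_k>0} \operatorname{Var}[X_{k,r}]
\le \sum_{k:\,f_k>0} \mathbb{E}\!\left[X_{k,r}^2\right]
= \sum_{k:\,f_k>0} \mathbb{E}[X_{k,r}] = \frac{d}{2^r}.
```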


Applying Markov’s inequality, we obtain Pr [Yr > 0] = Pr [Yr ≥ 1] ≤ E [Yr] = d/2^r. Applying Chebyshev’s inequality, we obtain Pr [Yr = 0] ≤ Pr [ |Yr − E [Yr]| ≥ d/2^r ] ≤ Var [Yr] / (d/2^r)² ≤ 2^r/d. We know that Yr > 0 for all r ≤ Z and Yr = 0 for all r > Z. Therefore, Z cannot be too far from log2 d:

▶ if Z ≪ log2 d, we can find a small r with Yr = 0, which happens with small probability;

▶ if Z ≫ log2 d, we can find a big r with Yr > 0, which happens with small probability.

Algorithms for Big Data (III) 13/16


If d̂ ≤ d/3, let r be the largest integer with 2^(r+1/2) ≤ d/3. Then

Pr [ d̂ ≤ d/3 ] = Pr [Z ≤ r] = Pr [Yr+1 = 0] ≤ 2^(r+1)/d ≤ √2/3.

If d̂ ≥ 3d, let r be the smallest integer with 2^(r+1/2) ≥ 3d. Then

Pr [ d̂ ≥ 3d ] = Pr [Z ≥ r] = Pr [Yr > 0] ≤ d/2^r ≤ √2/3.

The algorithm costs O(log n) bits of memory.

Algorithms for Big Data (III) 14/16


Median

We can apply the standard Median trick to the AMS algorithm. (Exercise) Using O(log(1/δ) · log n) bits of memory, we can obtain

Pr [ d/3 ≤ d̂ ≤ 3d ] ≥ 1 − δ.

Algorithms for Big Data (III) 15/16
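A sketch of the median trick from the exercise above: run t = O(log(1/δ)) independent copies of AMS and return the median of their estimates. The single-run helper re-implements AMS inline so the block is self-contained; all parameter choices (the prime, the stream, t = 21) are illustrative.

```python
import random
from statistics import median

def ams_once(stream, n, rng):
    """One AMS run: returns 2**(Z + 1/2) for a freshly drawn hash function."""
    p = 2_147_483_647                     # a prime >= n
    a, b = rng.randint(1, p - 1), rng.randint(0, p - 1)
    Z = 0
    for y in stream:
        v = ((a * y + b) % p) % n
        z = (v & -v).bit_length() - 1 if v > 0 else 0   # trailing zeros
        Z = max(Z, z)
    return 2 ** (Z + 0.5)

def ams_median(stream, n, t, seed=0):
    """Median of t independent AMS estimates (the median trick)."""
    rng = random.Random(seed)
    return median(ams_once(stream, n, rng) for _ in range(t))

stream = list(range(100)) * 3             # 100 distinct elements, repeated
print(ams_median(stream, n=1024, t=21))
```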


The BJKST Algorithm

The following improvement of AMS is due to Bar-Yossef, Jayram, Kumar, Sivakumar and Trevisan. Algorithm BJKST Algorithm for Counting Distinct Elements Init: Random Hash functions h : [n] → [n], g : [n] → [b · ε^(−4) · log² n], both from 2-universal families; Z ← 0, B ← ∅ On Input y: if zero(h(y)) ≥ Z then B ← B ∪ {(g(y), zero(h(y)))} while |B| ≥ c/ε² do Z ← Z + 1; Remove all (α, β) with β < Z from B end while end if Output: d̂ = |B| · 2^Z

Algorithms for Big Data (III) 16/16
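A compact Python sketch of BJKST as stated above. The slide leaves the constants unspecified, so the prime, c = 64, and taking the factor b in g's range to be 1 are placeholders of mine, not tuned recommendations.

```python
import math
import random

def zero(v):
    """Trailing zeros of v in binary; zero(0) treated as 0 by convention."""
    return (v & -v).bit_length() - 1 if v > 0 else 0

def bjkst(stream, n, eps, c=64, seed=0):
    rng = random.Random(seed)
    p = 2_147_483_647                                     # a prime >= n
    ah, bh = rng.randint(1, p - 1), rng.randint(0, p - 1)
    ag, bg = rng.randint(1, p - 1), rng.randint(0, p - 1)
    g_range = max(1, int(eps ** -4 * math.log2(n) ** 2))  # slide's factor b taken as 1

    h = lambda y: ((ah * y + bh) % p) % n
    g = lambda y: ((ag * y + bg) % p) % g_range

    Z, B = 0, set()
    for y in stream:
        if zero(h(y)) >= Z:
            B.add((g(y), zero(h(y))))
            while len(B) >= c / eps ** 2:                 # bucket too large: raise Z
                Z += 1
                B = {(alpha, beta) for (alpha, beta) in B if beta >= Z}
    return len(B) * 2 ** Z

print(bjkst(list(range(1, 101)) * 3, n=1024, eps=0.5))
```

Storing the compressed pairs (g(y), zero(h(y))) instead of the elements themselves is what brings the space below AMS-with-median for the same accuracy.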