
B669 Sublinear Algorithms for Big Data

Qin Zhang


Part 1: Sublinear in Space

The model and challenge

The data stream model (Alon, Matias and Szegedy 1996): items a1, a2, . . . , an arrive one by one at a CPU whose RAM is far too small to store them all. [Figure: a stream a1, a2, . . . , an flowing past a CPU with a small RAM.]

Why hard? Cannot store everything. Applications: Internet routers, stock data, ad auctions, flight logs on tape, etc.


§1.1 Point Queries (part I)

[Figure: a stream of items (9, 7, 6, 3, 3, 9) passing a CPU with a small RAM.]

Which items are most frequent? Approximation allowed.


More about the streaming model

Denote the stream by A = a1, . . . , am, where m = n^O(1) is the length of the stream, which is unknown at the beginning. Let [n] be the item universe, and let xj be the frequency of item j in the stream. Each ai = (j, ∆) denotes the update xj ← xj + ∆. We call an algorithm insertion-only if it works only for ∆ = 1. We can represent the stream as a vector x = (x1, . . . , xn); when ai = (j, ∆) arrives, set xj ← xj + ∆. For insertion-only streams, m = ‖x‖1.
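For concreteness, here is a minimal Python model of such a stream of updates (the function and driver are ours, purely illustrative); the frequency vector x is only conceptual, and a sublinear-space algorithm must never build it like this:

```python
from collections import defaultdict

def materialize(updates, n):
    """Apply a stream of updates a_i = (j, delta) to the conceptual vector x.
    For intuition only: a streaming algorithm must NOT store all of x."""
    x = defaultdict(int)
    for j, delta in updates:
        assert 1 <= j <= n
        x[j] += delta
    return x

# Insertion-only stream (every delta = 1): m = ||x||_1 = 5
print(materialize([(3, 1), (3, 1), (9, 1), (7, 1), (3, 1)], n=10))
```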


The MAJORITY problem

MAJORITY: if ∃j : xj > m/2, then output j; otherwise, output ⊥.
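The slides leave the solution for the board; the textbook answer is the Boyer-Moore majority vote, a one-counter pass (essentially Misra-Gries, presented below, with a single counter). A minimal sketch, ours rather than the slides' code; note that a second pass is needed to verify the candidate:

```python
def majority(stream):
    """Boyer-Moore majority vote: pass 1 finds the only possible majority
    candidate; pass 2 verifies it. `stream` is a list here, since the
    verification pass needs the data again."""
    candidate, count = None, 0
    for e in stream:                  # pass 1: candidate selection
        if count == 0:
            candidate, count = e, 1
        else:
            count += 1 if e == candidate else -1
    if candidate is not None and stream.count(candidate) > len(stream) // 2:
        return candidate              # pass 2: verification
    return None                       # ⊥: no majority element
```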

Heavy hitters and point queries

Lp heavy hitter set: HH^p_φ(x) = {i : |xi| ≥ φ · ‖x‖p}

Lp Heavy Hitter Problem: given φ and φ′ (often φ′ = φ − ε), return a set S such that HH^p_φ(x) ⊆ S ⊆ HH^p_φ′(x).

Lp Point Query Problem: given ε, after reading the whole stream, on query i report x̃i = xi ± ε · ‖x‖p.

The Misra-Gries algorithm

The algorithm (Misra-Gries '82)

  1. Maintain a set A of counter pairs (i, x̃i); initially A ← ∅.
  2. For each arriving item e:
     (a) if e ∈ A, set (e, x̃e) ← (e, x̃e + 1);
     (b) else if |A| < 1/ε, add (e, 1) to A;
     (c) else, for each e′ ∈ A, set (e′, x̃e′) ← (e′, x̃e′ − 1), and if x̃e′ reaches 0, remove (e′, 0) from A.
  3. On query i: if i ∈ A, return x̃i; otherwise return 0.

Analysis (on board)

Theorem. Misra-Gries uses O(1/ε · log n) bits and, for any j, produces an estimate x̃j satisfying xj − εm ≤ x̃j ≤ xj.
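A direct Python transcription of the steps above (a sketch; k plays the role of 1/ε):

```python
def misra_gries(stream, k):
    """Misra-Gries with k = 1/eps counters. For every item j, the returned
    estimate satisfies x_j - m/k <= est <= x_j (est = 0 if j was evicted)."""
    A = {}                             # item -> counter
    for e in stream:
        if e in A:
            A[e] += 1
        elif len(A) < k:
            A[e] = 1
        else:                          # decrement all counters, drop zeros
            for key in list(A):
                A[key] -= 1
                if A[key] == 0:
                    del A[key]
    return A                           # on query i: A.get(i, 0)

print(misra_gries([1, 1, 2, 3, 1, 1], k=2))   # item 1 (count 4) survives
```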

Space-saving: an algorithm for insertion only

Algorithm Space-saving [Metwally et al. '05]

Maintain an array of at most 1/ε tuples (e, x̃e), kept sorted by x̃e. When a new item e arrives, there are two cases:

  1. If e is already in the array, increment x̃e by 1 and reinsert (e, x̃e) into the array.
  2. If e is not in the array, create a new tuple (e, MIN + 1), where MIN = min{x̃e′ : e′ is in the array}; since the array is kept sorted by x̃e, MIN is just the estimated frequency of the last item. If the length of the array exceeds 1/ε, delete the last tuple.

On query e: report x̃e if e is in the array; otherwise report MIN.

Theorem. Space-saving uses O(1/ε · log n) bits and, for any j, produces an estimate x̃j satisfying xj ≤ x̃j ≤ xj + εm.

(Analysis on board)
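A minimal Python sketch of Space-saving, using a dict and an explicit minimum scan instead of a sorted array (same behavior, just simpler to read):

```python
def space_saving(stream, k):
    """Space-saving with capacity k = 1/eps. Estimates never undercount:
    x_j <= est(j) <= x_j + m/k."""
    est = {}                                # item -> estimated count
    for e in stream:
        if e in est:
            est[e] += 1
        elif len(est) < k:
            est[e] = 1
        else:                               # replace the current minimum
            e_min = min(est, key=est.get)   # the "last item" of the sorted array
            mn = est.pop(e_min)
            est[e] = mn + 1                 # new item enters with MIN + 1
    return est                              # query i: est.get(i, min(est.values()))
```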


§1.2 Distinct Elements

[Figure: a stream of items (9, 7, 6, 3, 3, 9) passing a CPU with a small RAM.]

How many distinct elements? Approximation needed.


Universal hash function

A family H ⊆ {h : X → Y} is said to be 2-universal if the following property holds with h ∈R H picked uniformly at random: for all x, x′ ∈ X and all y, y′ ∈ Y,

  x ≠ x′ ⇒ Prh[h(x) = y ∧ h(x′) = y′] = 1/|Y|²
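One concrete family (our example; the slides do not commit to one) is the Carter-Wegman construction h(x) = ((ax + b) mod p) mod |Y|, which satisfies the property above up to small rounding effects when p is a prime well above |X|:

```python
import random

def sample_hash(p, y_size):
    """Sample h from the Carter-Wegman family h(x) = ((a*x + b) % p) % y_size.
    p: a prime larger than the universe size (an assumption of this sketch);
    the family is 2-universal up to O(y_size/p) rounding error."""
    a = random.randrange(1, p)
    b = random.randrange(0, p)
    return lambda x: ((a * x + b) % p) % y_size
```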

The Flajolet-Martin algorithm

The algorithm (Flajolet and Martin '83)

  1. Choose a random hash function h : [n] → [n] from a 2-universal family. Set z = 0. Let zeros(h(e)) be the number of trailing zeros in the binary representation of h(e).
  2. For each arriving item e: if zeros(h(e)) > z, set z = zeros(h(e)).
  3. Output 2^(z+0.5).

Analysis (on board)

Theorem. The number of distinct elements can be O(1)-approximated with probability 2/3 using O(log n) bits.
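A runnable sketch of this estimator, reusing sample_hash from the 2-universal construction above (the prime p is an assumption of this toy code):

```python
def trailing_zeros(v, width):
    """Number of trailing zero bits of v (width if v == 0)."""
    return width if v == 0 else (v & -v).bit_length() - 1

def flajolet_martin(stream, n, p):
    """One Flajolet-Martin instance: a single counter z, output 2^(z + 0.5)."""
    h = sample_hash(p, n)
    width = max(1, (n - 1).bit_length())
    z = 0
    for e in stream:
        z = max(z, trailing_zeros(h(e), width))
    return 2 ** (z + 0.5)
```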


Probability amplification

Can we boost the success probability to 1 − δ? The idea is to run k = Θ(log(1/δ)) copies of this algorithm in parallel, using mutually independent random hash functions, and output the median of the k answers.
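The median trick in code (our illustration):

```python
import statistics

def amplify(run_once, k):
    """Median trick: run k = Theta(log(1/delta)) independent copies of a
    randomized estimator and output the median of the answers."""
    return statistics.median(run_once() for _ in range(k))

# e.g. amplify(lambda: flajolet_martin(stream, n, p), k=30)
```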

An improved algorithm

The algorithm (Bar-Yossef et al. '02)

  1. Choose a random hash function h : [n] → [n] from a 2-universal family. Set z = 0, B = ∅. Choose a secondary 2-universal hash function g : [n] → [(log n/ε)^O(1)].
  2. For each arriving item e: if zeros(h(e)) ≥ z, then
     (a) set B ← B ∪ {(g(e), zeros(h(e)))};
     (b) if |B| > c/ε², set z ← z + 1 and remove all (α, β) in B with β < z.
  3. Output |B| · 2^z.

Idea: two-level hashing. Analysis (on board)

Theorem. The number of distinct elements can be (1 + ε)-approximated with probability 2/3 using O(log n + 1/ε² · (log(1/ε) + log log n)) bits.
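A sketch of the estimator above, reusing sample_hash and trailing_zeros from the earlier snippets; for readability the buffer stores item ids directly instead of fingerprints g(e), and the constant c = 32 is our guess, not the slides' parameter:

```python
def bjkst(stream, n, p, eps, c=32):
    """Bar-Yossef et al. style estimator: keep items whose hash has >= z
    trailing zeros; raise the threshold whenever the buffer overflows."""
    h = sample_hash(p, n)
    width = max(1, (n - 1).bit_length())
    z, B = 0, {}                        # B: item -> zeros(h(item))
    cap = int(c / eps ** 2)
    for e in stream:
        ze = trailing_zeros(h(e), width)
        if ze >= z:
            B[e] = ze
            while len(B) > cap:         # raise z and prune the buffer
                z += 1
                B = {t: v for t, v in B.items() if v >= z}
    return len(B) * 2 ** z
```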


§1.3 Linear Sketches

We have seen Misra-Gries, Space-saving, Flajolet-Martin and its improvement. Nice algorithms, but they only work for insertion-only sequences ... Can we handle deletions? A popular way is to use linear sketches.

Linear sketch

A random linear projection M : Rn → Rk (k ≪ n) that preserves properties of any v ∈ Rn with high probability. [Figure: the sketch Mv = M · v; the answer is extracted from Mv.]

Simple and useful. Perfect for streaming and distributed computation. Works for insertion+deletion sequences (that is, ∆ can be either positive or negative).
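Why linearity is perfect for streams: the update (j, ∆) touches only column j of M. A two-line illustration in numpy (ours):

```python
import numpy as np

def sketch_update(Mv, M, j, delta):
    """Apply the stream update x_j += delta directly to the sketch:
    M(x + delta*e_j) = Mx + delta * M[:, j]. No access to x is needed."""
    return Mv + delta * M[:, j]
```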


Distinct elements using linear sketches

Search version ⇒ decision version. Let D be the number of distinct elements:

  • If D ≥ T(1 + ε), then answer YES.
  • If D ≤ T/(1 + ε), then answer NO.

Try T = 1, (1 + ε), (1 + ε)², . . .

Now, the decision problem

The algorithm

  1. Select a random set S ⊆ {1, 2, . . . , n} such that, for each i independently, Pr[i ∈ S] = 1/T.
  2. Make a pass over the stream, maintaining SumS(x) = Σ_{i∈S} xi. (Note: this is a linear sketch.)
  3. If SumS(x) > 0, return YES; otherwise return NO.

Lemma. Let P = Pr[SumS(x) = 0]. If T is large enough and ε is small enough, then:

  • If D ≥ T(1 + ε), then P < 1/e − ε/3.
  • If D ≤ T/(1 + ε), then P > 1/e + ε/3.

Proof (on board)
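A toy version of one such test (S is stored explicitly here for clarity; a real streaming algorithm would define S implicitly through a hash function):

```python
import random

def sample_set(n, T):
    """S with Pr[i in S] = 1/T, independently for each i in {1, ..., n}."""
    return {i for i in range(1, n + 1) if random.random() < 1.0 / T}

def sum_over_S(updates, S):
    """Maintain Sum_S(x) over a stream of (j, delta) updates -- a linear sketch."""
    s = 0
    for j, delta in updates:
        if j in S:
            s += delta
    return s        # answer YES iff s > 0
```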

Amplify the success probability

Repeat to amplify the success probability:

  1. Select k sets S1, . . . , Sk as in the previous algorithm, for k = C log(1/δ)/ε² with a constant C > 0.
  2. Let Z be the number of sums SumSj(x), j = 1, . . . , k, that are equal to 0.
  3. If Z < k/e, report YES; otherwise report NO.

Lemma. If the constant C is large enough, then this algorithm reports a correct answer with probability 1 − δ.

Proof (on board)

Theorem. The number of distinct elements can be (1 ± ε)-approximated with probability 1 − δ using O(log² n · log(1/δ)/ε³) bits.
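Putting the two pieces together for a single threshold T, reusing sample_set from above (C = 64 is our guess at the constant):

```python
import math

def decide_distinct(updates, n, T, eps, delta, C=64):
    """YES iff fewer than k/e of the k independent sums are zero, as above."""
    k = max(1, int(C * math.log(1 / delta) / eps ** 2))
    sets = [sample_set(n, T) for _ in range(k)]
    sums = [0] * k
    for j, d in updates:
        for t, S in enumerate(sets):
            if j in S:
                sums[t] += d
    Z = sum(1 for s in sums if s == 0)
    return Z < k / math.e
```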


Question: can we make FM sketch linear?


§1.4 Point Queries (part II)

L1-point-query

Algorithm Count-Min [Cormode and Muthu '05]

  • Pick d = log(1/δ) independent hash functions h1, . . . , hd, where ht : {1, . . . , n} → {1, . . . , w} (w = 2/ε), from a 2-universal family.
  • Maintain d vectors Z^1, . . . , Z^d, where Z^t = (Z^t_1, . . . , Z^t_w) with Z^t_j = Σ_{i : ht(i)=j} xi.
  • Estimator: x*_i = min_t Z^t_{ht(i)}.

Analysis (on board)

Theorem. We can solve L1-point-query with approximation ε and failure probability δ by storing O(1/ε · log(1/δ) · log n) bits.
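A compact Python sketch of Count-Min, again built on the Carter-Wegman hashes (the prime p and base-2 logarithm for d are assumptions of this toy code; the guarantee needs nonnegative updates):

```python
import math, random

class CountMin:
    """Count-Min sketch: d = log(1/delta) rows of width w = 2/eps."""
    def __init__(self, n, p, eps, delta):
        self.w = max(1, math.ceil(2 / eps))
        self.d = max(1, math.ceil(math.log2(1 / delta)))
        self.p = p
        self.hashes = [(random.randrange(1, p), random.randrange(0, p))
                       for _ in range(self.d)]
        self.Z = [[0] * self.w for _ in range(self.d)]

    def _h(self, t, i):
        a, b = self.hashes[t]
        return ((a * i + b) % self.p) % self.w

    def update(self, i, delta=1):      # x_i += delta (delta >= 0 for Count-Min)
        for t in range(self.d):
            self.Z[t][self._h(t, i)] += delta

    def query(self, i):                # x_i <= estimate <= x_i + eps * ||x||_1
        return min(self.Z[t][self._h(t, i)] for t in range(self.d))
```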

L2-point-query

Algorithm Count-Sketch [Charikar et al. '02]

  • Pick d = log(1/δ) independent hash functions h1, . . . , hd, where ht : {1, . . . , n} → {1, . . . , w} (w = 3/ε²), from a 2-universal family.
  • Pick d = log(1/δ) independent hash functions g1, . . . , gd, where gt : {1, . . . , n} → {−1, 1}, from a 2-universal family.
  • Maintain d vectors Z^1, . . . , Z^d, where Z^t = (Z^t_1, . . . , Z^t_w) with Z^t_j = Σ_{i : ht(i)=j} gt(i) · xi.
  • Estimator: x̃i = median{g1(i) · Z^1_{h1(i)}, . . . , gd(i) · Z^d_{hd(i)}}.

Analysis (on board)

Theorem. We can solve L2 point query with approximation ε and failure probability δ by storing O(1/ε² · log(1/δ) · log n) bits.
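A sketch of Count-Sketch in the same style; deriving the sign hashes g_t from the parity of an extra Carter-Wegman hash is our simplification, not the slides' construction:

```python
import math, random, statistics

class CountSketch:
    """Count-Sketch: d rows of width w = 3/eps^2, with random signs."""
    def __init__(self, n, p, eps, delta):
        self.w = max(1, math.ceil(3 / eps ** 2))
        self.d = max(1, math.ceil(math.log2(1 / delta)))
        self.p = p
        self.hs = [(random.randrange(1, p), random.randrange(0, p)) for _ in range(self.d)]
        self.gs = [(random.randrange(1, p), random.randrange(0, p)) for _ in range(self.d)]
        self.Z = [[0] * self.w for _ in range(self.d)]

    def _h(self, t, i):
        a, b = self.hs[t]
        return ((a * i + b) % self.p) % self.w

    def _g(self, t, i):
        a, b = self.gs[t]
        return 1 if ((a * i + b) % self.p) % 2 == 0 else -1

    def update(self, i, delta=1):      # handles deletions: delta may be < 0
        for t in range(self.d):
            self.Z[t][self._h(t, i)] += self._g(t, i) * delta

    def query(self, i):                # estimate = x_i +/- eps * ||x||_2
        return statistics.median(self._g(t, i) * self.Z[t][self._h(t, i)]
                                 for t in range(self.d))
```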

L2-point-query (an alternative approach)

The algorithm [Gilbert, Kotidis, Muthukrishnan and Strauss '01]

  • Maintain a sketch Rx such that s = ‖Rx‖2 = (1 ± ε)‖x‖2. (R is an O(1/ε² · log(1/δ)) × n matrix, which can be constructed, e.g., by taking each cell to be N(0, 1).)
  • Estimator: x̃i = (1 − ‖Rx/s − Rei‖₂²/2) · s.

Theorem (Johnson-Lindenstrauss Lemma). For every x, we have (1 − ε)‖x‖2 ≤ ‖Rx‖2 ≤ (1 + ε)‖x‖2 with probability 1 − δ.

Theorem. We can solve L2 point query with approximation ε and failure probability δ by storing O(1/ε² · log(1/δ) · log n) bits.
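A numpy sketch of this estimator; the 1/√k normalization (so that ‖Rei‖ ≈ 1) is our assumption, since the slides just say N(0, 1) cells:

```python
import numpy as np

def make_jl_sketch(n, k, rng=np.random.default_rng(0)):
    """Gaussian JL matrix, scaled by 1/sqrt(k) so column norms are ~1."""
    return rng.normal(0.0, 1.0, size=(k, n)) / np.sqrt(k)

def jl_point_query(R, Rx, i):
    """GKMS-style estimator: x~_i = (1 - ||Rx/s - R e_i||_2^2 / 2) * s."""
    s = np.linalg.norm(Rx)
    u = Rx / s - R[:, i]
    return (1.0 - np.dot(u, u) / 2.0) * s

# usage: R = make_jl_sketch(n, k); Rx = R @ x; jl_point_query(R, Rx, i)
```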


Which one is better?


§1.5 ℓ0-Sampling

Algorithm for L0 sampling

Goal: sample an element from the support of x ∈ Rn.

Algorithm (the output can be thought of as Mx for a fixed matrix M):

  • Maintain F̃0, a (1 ± 0.1)-approximation to F0.
  • Hash items using hj : [n] → [0, 2^j − 1] for j ∈ {0, 1, . . . , log n + 2}.
  • For each j, maintain:
    – Dj = (1 ± 0.1) · |{t : hj(t) = 0}|
    – Sj = Σ_{t : hj(t)=0} xt · t
    – Cj = Σ_{t : hj(t)=0} xt

Lemma. At level j = 2 + ⌈log F̃0⌉, with constant probability there is a unique element of the stream that maps to 0.

Uniqueness is verified by checking Dj = 1 ± 0.1. If the element is unique, then Sj/Cj gives its identity and Cj is its count.

Analysis (on board)
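A toy, one-shot version of this sampler (ours): the hashes hj are simulated with Python's hash(), and uniqueness is checked via the Sj/Cj consistency test instead of the approximate counter Dj; both are simplifying assumptions, not the slides' construction:

```python
def l0_sample(updates, n, seed=12345):
    """Level j keeps S_j = sum x_t * t and C_j = sum x_t over items t with
    h_j(t) = 0, where Pr[h_j(t) = 0] = 2^-j. A unique survivor t with count c
    gives S_j = c*t, C_j = c, so S_j / C_j recovers its identity."""
    levels = max(1, n).bit_length() + 2            # j = 0, 1, ..., log n + 2
    S = [0] * levels
    C = [0] * levels
    zero = lambda j, t: (hash((seed, j, t)) % (1 << j)) == 0
    for t, delta in updates:
        for j in range(levels):
            if zero(j, t):
                S[j] += delta * t
                C[j] += delta
    for j in range(levels):                        # look for a unique survivor
        if C[j] != 0 and S[j] % C[j] == 0 and 1 <= S[j] // C[j] <= n:
            return S[j] // C[j], C[j]              # (identity t, count x_t)
    return None                                    # sampler failed this time
```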


End of Part 1. Thank you!


§1.6 Frequency Moments

Frequency moments and norms

Frequency moments: Fp = Σi |xi|^p, where xi is the frequency of item i.

  • F0: number of distinct items.
  • F1: total number of items.
  • F2: size of self-join.

A very good measurement of the skewness of the dataset.

Norms: Lp = Fp^(1/p).
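For intuition, here is the exact (linear-space) computation that the streaming algorithms below approximate:

```python
def frequency_moment(x, p):
    """Exact F_p of a frequency vector x (F_0 counts the support)."""
    if p == 0:
        return sum(1 for v in x if v != 0)
    return sum(abs(v) ** p for v in x)

x = [4, 0, 2, 1]
print(frequency_moment(x, 0), frequency_moment(x, 1), frequency_moment(x, 2))
# 3 distinct items, 7 items in total, self-join size 21
```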

L2 estimation

The sketch for L2: a linear sketch Rx = (Z1, . . . , Zk), where each entry of the k × n (k = O(1/ε²)) matrix R has distribution N(0, 1).

  • Each Zi is drawn from N(0, ‖x‖₂²); equivalently, Zi = ‖x‖2 · Gi, where Gi is drawn from N(0, 1).

The estimator: Y = median{|Z1|, . . . , |Zk|} / median(|G|), where G ∼ N(0, 1). (Here M is the median of a random variable R if Pr[|R| ≤ M] = 1/2.)

Sounds like magic? The intuition: for "nice"-looking distributions (e.g., the Gaussian), the median of the samples converges, for a large enough number of samples, to the median of the distribution.
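The median estimator in numpy (a toy, non-streaming version; the constant 6 in k is our assumption):

```python
import numpy as np

def l2_estimate(x, eps, rng=np.random.default_rng(0)):
    """k = O(1/eps^2) Gaussian dot products, normalized by the median of
    |N(0,1)|, which is Phi^{-1}(0.75) ~ 0.6745."""
    k = int(6 / eps ** 2)
    Z = rng.normal(size=(k, len(x))) @ np.asarray(x, dtype=float)
    median_abs_gaussian = 0.6744897501960817   # median of |G|, G ~ N(0, 1)
    return np.median(np.abs(Z)) / median_abs_gaussian
```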

The proof

Closeness in probability. Let U1, . . . , Uk be i.i.d. real random variables drawn from any distribution with continuous c.d.f. F and median M. Setting U = median{U1, . . . , Uk}, there is an absolute constant C > 0 such that Pr[F(U) ∈ (1/2 − ε, 1/2 + ε)] ≥ 1 − e^(−Ckε²).

Closeness in value. Let F be the c.d.f. of the random variable |G|, with G drawn from N(0, 1). There exists an absolute constant C′ > 0 such that for any z ≥ 0 with F(z) ∈ (1/2 − ε, 1/2 + ε), we have z = M ± C′ε.

Theorem. Y = ‖x‖2 · (M ± C′ε)/M = ‖x‖2 · (1 ± C′′ε) w.h.p.

Generalization

Key property of the Gaussian distribution: if U1, . . . , Un and U are i.i.d. Gaussian, then x1U1 + · · · + xnUn ∼ ‖x‖p · U for p = 2. Distributions with this property are called "p-stable" [Indyk '06]. Good news: p-stable distributions exist for every p ∈ (0, 2]. For p = 1 we get the Cauchy distribution, with density function f(x) = 1/[π(1 + x²)].
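The same median recipe with Cauchy entries gives an L1 estimator; note that the median of |C| for a standard Cauchy C is exactly 1, since Pr[|C| ≤ 1] = 2·arctan(1)/π = 1/2. A toy, non-streaming sketch (the constant 6 in k is again our assumption):

```python
import numpy as np

def l1_estimate(x, eps, rng=np.random.default_rng(0)):
    """p-stable sketch for p = 1: each Z_i is distributed as ||x||_1 times a
    standard Cauchy, so the median of |Z_i| estimates ||x||_1 directly."""
    k = int(6 / eps ** 2)
    R = rng.standard_cauchy(size=(k, len(x)))
    Z = R @ np.asarray(x, dtype=float)
    return np.median(np.abs(Z))        # median of |Cauchy| is exactly 1
```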


Attribution

Some of the contents are borrowed from Amit Chakrabarti's course (http://www.cs.dartmouth.edu/~ac/Teach/CS49-Fall11/), Piotr Indyk's course (http://stellar.mit.edu/S/course/6/fa07/6.895/), and Andrew McGregor's course (http://people.cs.umass.edu/~mcgregor/courses/CS711S12/index.html).