B669 Sublinear Algorithms for Big Data
Qin Zhang

Part 1: Sublinear in Space
The model and challenge

The data stream model (Alon, Matias and Szegedy 1996)

[Figure: items a_1, a_2, ..., a_n stream past a CPU with a small RAM]

Why hard? Cannot store everything.
Applications: Internet router, stock data, ad auction, flight logs on tape, etc.
[Figure: stream 9, 7, 6, 3, 3, 9, ... flowing past a CPU with a small RAM]

Which items are most frequent? Approximation allowed.
More about the streaming model

Denote the stream by A = a_1, ..., a_m, where m = n^{O(1)} is the length of the stream, which is unknown at the beginning. Let [n] be the item universe. Let x_j be the frequency of item j in the stream. Each a_i = (j, ∆) denotes the update x_j ← x_j + ∆.

We call an algorithm insertion-only if it only works for ∆ = 1.

We can represent the stream as a vector x = (x_1, ..., x_n); when a_i = (j, ∆) arrives, x_j ← x_j + ∆.
– For insertion-only streams, m = ‖x‖_1.
The MAJORITY problem

MAJORITY: if ∃j : x_j > m/2, then output j; otherwise the output can be arbitrary.
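The slide leaves the algorithm for discussion; a classic one-pass solution (the Boyer-Moore majority vote, added here for illustration — Misra-Gries on the next slide generalizes it) can be sketched as:

```python
def majority_candidate(stream):
    """Boyer-Moore majority vote: one pass, O(1) words of memory.
    If some j has frequency > m/2, it is returned; otherwise the
    output is arbitrary (a second pass would verify the candidate)."""
    candidate, count = None, 0
    for e in stream:
        if count == 0:
            candidate, count = e, 1
        elif e == candidate:
            count += 1
        else:
            count -= 1
    return candidate
```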
Heavy hitters and point queries

L_p heavy hitter set: HH^p_φ(x) = {i : |x_i| ≥ φ‖x‖_p}

L_p Heavy Hitter Problem: Given φ, φ′ (often φ′ = φ − ε), return a set S such that HH^p_φ(x) ⊆ S ⊆ HH^p_{φ′}(x).

L_p Point Query Problem: Given ε, after reading the whole stream and given an index i, report x̃_i = x_i ± ε‖x‖_p.
The Misra-Gries algorithm

The algorithm (Misra-Gries ’82)

Maintain a set A of (item, counter) pairs. When an item e arrives:
(a) if e ∈ A then set (e, x̃_e) ← (e, x̃_e + 1);
(b) else if |A| < 1/ε, add (e, 1) to A;
(c) else, for each e′ ∈ A, set (e′, x̃_{e′}) ← (e′, x̃_{e′} − 1), and if x̃_{e′} − 1 = 0, then remove (e′, 0) from A.

Analysis (on board)

Theorem. Misra-Gries uses O(1/ε · log n) bits, and for any j, produces an estimate x̃_j satisfying x_j − εm ≤ x̃_j ≤ x_j.
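A runnable Python sketch of the steps above (dictionary-based, illustrative):

```python
def misra_gries(stream, eps):
    """Misra-Gries '82: keep at most 1/eps counters. Every estimate
    satisfies x_j - eps*m <= est_j <= x_j (absent items estimate 0)."""
    k = int(1 / eps)
    counters = {}
    for e in stream:
        if e in counters:
            counters[e] += 1                 # case (a): known item
        elif len(counters) < k:
            counters[e] = 1                  # case (b): room for a new item
        else:
            for key in list(counters):       # case (c): decrement everyone
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```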
Space-saving: an algorithm for insertion only

Algorithm Space-saving [Metwally et al. ’05]

Maintain an array of (e, x̃_e) tuples, kept sorted by x̃_e, and let MIN = min{x̃_e : e is in the array} — since the array is sorted, MIN is just the estimated frequency of the last item. When a new item e arrives, we have two cases:
– if e is in the array, increment x̃_e by 1 and reinsert (e, x̃_e) into the array;
– else, insert (e, MIN + 1); if the length of the array is then larger than 1/ε, we delete the last tuple.
At the query of e, report x̃_e if e is in the array, otherwise report MIN.

(Analysis on board)

Theorem. Space-saving uses O(1/ε · log n) bits, and for any j, produces an estimate x̃_j satisfying x_j ≤ x̃_j ≤ x_j + εm.
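A minimal Python sketch of this update rule (dictionary-based rather than the sorted array on the slide; evicting the minimum and inheriting its count is equivalent to inserting (e, MIN + 1) and deleting the last tuple):

```python
def space_saving(stream, eps):
    """Space-saving [Metwally et al. '05]: at most 1/eps counters.
    Every estimate satisfies x_j <= est_j <= x_j + eps*m."""
    k = int(1 / eps)
    counters = {}
    for e in stream:
        if e in counters:
            counters[e] += 1
        elif len(counters) < k:
            counters[e] = 1
        else:
            # evict the minimum item; the newcomer inherits MIN and adds 1
            e_min = min(counters, key=counters.get)
            counters[e] = counters.pop(e_min) + 1
    return counters
```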
[Figure: stream 9, 7, 6, 3, 3, 9, ... flowing past a CPU with a small RAM]

How many distinct elements? Approximation needed.
Universal hash function

A family H ⊆ {h : X → Y} is said to be 2-universal if the following property holds, with h ∈_R H picked uniformly at random: ∀x ≠ x′ ∈ X, ∀y, y′ ∈ Y,

    Pr_h[h(x) = y ∧ h(x′) = y′] = 1/|Y|²
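A standard construction (Carter-Wegman style; the final mod-w reduction makes it approximately 2-universal, which suffices here) can be sketched as:

```python
import random

def make_hash(w, seed=0, p=2_147_483_647):
    """h(x) = ((a*x + b) mod p) mod w, with a prime p > |X| and
    a, b chosen at random: a standard (approximately) 2-universal family."""
    rng = random.Random(seed)
    a = rng.randrange(1, p)
    b = rng.randrange(p)
    return lambda x: ((a * x + b) % p) % w
```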
The Flajolet-Martin algorithm

The algorithm (Flajolet and Martin ’83)

Choose a 2-universal hash function h : [n] → [n]. Let zeros(h(e)) be the number of trailing zeros in the binary representation of h(e).

Maintain z ← 0. For each item e in the stream, set z ← max{z, zeros(h(e))}. At the end, output 2^{z + 1/2}.

Analysis (on board)

Theorem. The number of distinct elements can be O(1)-approximated with probability 2/3 using O(log n) bits.
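An illustrative Python version of this sketch (the hash family and the convention that h(e) = 0 counts as log n trailing zeros are implementation choices, not from the slide):

```python
import random

def flajolet_martin(stream, n, seed=0):
    """Flajolet-Martin style sketch: track the largest number of
    trailing zeros z of h(e) over the stream, output 2^(z + 1/2)."""
    rng = random.Random(seed)
    p = 2_147_483_647  # prime modulus for the 2-universal hash
    a, b = rng.randrange(1, p), rng.randrange(p)
    cap = n.bit_length()

    def zeros(v):
        # trailing zeros in v's binary representation (0 counts as cap)
        return cap if v == 0 else (v & -v).bit_length() - 1

    z = 0
    for e in stream:
        z = max(z, zeros(((a * e + b) % p) % n))
    return 2 ** (z + 0.5)
```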
Probability amplification
Can we boost the success probability to 1 − δ? The idea is to run k = Θ(log(1/δ)) copies of this algorithm in parallel, using mutually independent random hash functions, and output the median of the k answers.
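The median trick in code (run_copy is a hypothetical function mapping a seed to the output of one independent run of the base estimator):

```python
import statistics

def median_boost(run_copy, k):
    """Median-of-k amplification: run k = Theta(log(1/delta)) independent
    copies of a 2/3-success estimator and return the median. A Chernoff
    bound shows the median fails with probability exp(-Omega(k))."""
    return statistics.median(run_copy(seed) for seed in range(k))
```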
An improved algorithm

The algorithm (Bar-Yossef et al. ’02)

Idea: two-level hashing. Choose a 2-universal hash function h : [n] → [n] and a secondary 2-universal hash function g : [n] → [(log n/ε)^{O(1)}]. Maintain a threshold z ← 0 and a buffer B ← ∅. When an item e with zeros(h(e)) ≥ z arrives:
(a) set B ← B ∪ {(g(e), zeros(h(e)))};
(b) if |B| > c/ε² then set z ← z + 1 and remove all (α, β) in B with β < z.
At the end, output |B| · 2^z.

Analysis (on board)

Theorem. The number of distinct elements can be (1 + ε)-approximated with probability 2/3 using O(log n + 1/ε² · (log(1/ε) + log log n)) bits.
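A simplified Python sketch of this algorithm (it stores item ids directly instead of the space-saving secondary hash g, and the constant c = 24 is an arbitrary illustrative choice):

```python
import random

def bjkst(stream, n, eps, seed=0):
    """BJKST-style distinct-element estimator: keep only items whose
    hash has at least z trailing zeros; output |B| * 2^z at the end."""
    rng = random.Random(seed)
    p = 2_147_483_647
    a, b = rng.randrange(1, p), rng.randrange(p)
    cap = n.bit_length()

    def zeros(v):
        return cap if v == 0 else (v & -v).bit_length() - 1

    z, B = 0, {}
    limit = int(24 / eps ** 2)  # c/eps^2 for an illustrative constant c
    for e in stream:
        ze = zeros(((a * e + b) % p) % n)
        if ze >= z:
            B[e] = ze
            while len(B) > limit:   # raise the bar and prune the buffer
                z += 1
                B = {k: v for k, v in B.items() if v >= z}
    return len(B) * 2 ** z
```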
We have seen Misra-Gries, Space-saving, Flajolet-Martin and its improvement. Nice algorithms, but they only work for insertion-only sequences... Can we handle deletions?

A popular way is to use linear sketches.
Linear sketch

Random linear projection M : R^n → R^k, with k ≪ n, that preserves properties of any v ∈ R^n with high probability: store the sketch Mv and compute the answer from it.

Simple and useful. Perfect for streaming and distributed computations. Works for insertion+deletion sequences (that is, ∆ can be either positive or negative).
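The defining property takes only a few lines to see (a hypothetical ±1 projection is used; any fixed random M behaves the same way): a stream update (j, ∆) just adds ∆ times column j of M to the sketch, so deletions and merging come for free.

```python
import random

def make_projection(k, n, seed=0):
    """A fixed random +-1 matrix M (illustrative choice of sketch matrix)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]

def apply_update(M, sketch, j, delta):
    """Update (j, delta): Mx changes by delta * (column j of M), so
    negative delta (a deletion) is handled identically."""
    for t in range(len(M)):
        sketch[t] += delta * M[t][j]
```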
Distinct elements using linear sketches

Search version ⇒ Decision version. Let D be the # distinct elements: try T = 1, (1 + ε), (1 + ε)², ...
Now, the decision problem

The algorithm: for a guess T, pick a random set S ⊆ [n] such that for each i we have Pr[i ∈ S] = 1/T, and maintain Sum_S(x) = Σ_{i∈S} x_i.

Note: this is a linear sketch.

Lemma. Let P = Pr[Sum_S(x) = 0]. If T is large enough, and ε is small enough, then P ≈ (1 − 1/T)^D ≈ e^{−D/T}, so the value of P separates the case D ≥ (1 + ε)T from the case D ≤ (1 − ε)T.

Proof (on board)
Amplify the success probability

Repeat to amplify the success probability: keep k = C log(1/δ)/ε² independent copies Sum_{S_j}(x), j = 1, ..., k, for a constant C > 0, and estimate P by the fraction of copies with Sum_{S_j}(x) = 0.

Lemma. If the constant C is large enough, then this algorithm reports a correct answer with probability 1 − δ.

Proof (on board)

Theorem. The number of distinct elements can be (1 ± ε)-approximated with probability 1 − δ using O(log² n · log(1/δ)/ε³) bits.
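An illustrative Python version of this subsampled-sum sketch (names and parameters are assumptions; a real implementation would define S via hash functions rather than explicitly stored sets, but the linearity — and hence deletion support — is the same):

```python
import random

class DistinctDecision:
    """Decision version for one guess T: keep k independent subsampled
    sums Sum_S(x). Each sum is linear in x, so negative Delta works."""
    def __init__(self, n, T, k, seed=0):
        rng = random.Random(seed)
        # S_j: each i in [n] joins independently with probability 1/T
        self.sets = [{i for i in range(n) if rng.random() < 1.0 / T}
                     for _ in range(k)]
        self.sums = [0] * k

    def update(self, j, delta):
        for t, S in enumerate(self.sets):
            if j in S:
                self.sums[t] += delta

    def zero_fraction(self):
        # empirical estimate of P = Pr[Sum_S(x) = 0], compared with 1/e
        return sum(s == 0 for s in self.sums) / len(self.sums)
```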
L1-point-query

Algorithm Count-Min [Cormode and Muthu ’05]

Pick d = O(log(1/δ)) hash functions h_t : {1, ..., n} → {1, ..., w} (w = 2/ε) from a 2-universal family. Maintain counters {Z^t_1, ..., Z^t_w} such that

    Z^t_j = Σ_{i : h_t(i) = j} x_i

At the query of i, report x̃_i = min_t Z^t_{h_t(i)}.

Analysis (on board)

Theorem. We can solve L1-point-query with approximation ε and failure probability δ by storing O(1/ε · log(1/δ) · log n) bits.
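A compact Python sketch of Count-Min (hash construction is an illustrative choice):

```python
import math, random

class CountMin:
    """Count-Min sketch: d rows of w counters. For nonnegative streams,
    min_t Z^t[h_t(i)] satisfies x_i <= est <= x_i + eps*||x||_1 w.p. 1-delta."""
    def __init__(self, eps, delta, seed=0):
        self.w = max(2, int(2 / eps))
        self.d = max(1, math.ceil(math.log(1 / delta)))
        rng = random.Random(seed)
        self.p = 2_147_483_647  # prime for the 2-universal hashes
        self.ab = [(rng.randrange(1, self.p), rng.randrange(self.p))
                   for _ in range(self.d)]
        self.Z = [[0] * self.w for _ in range(self.d)]

    def _h(self, t, i):
        a, b = self.ab[t]
        return ((a * i + b) % self.p) % self.w

    def update(self, i, delta=1):
        for t in range(self.d):
            self.Z[t][self._h(t, i)] += delta

    def query(self, i):
        # each row overestimates (for nonnegative x); take the minimum
        return min(self.Z[t][self._h(t, i)] for t in range(self.d))
```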
L2-point-query

Algorithm Count-Sketch [Charikar et al. ’02]

Pick d = O(log(1/δ)) hash functions h_t : {1, ..., n} → {1, ..., w} (w = 3/ε²) from a 2-universal family, and d sign functions g_t : {1, ..., n} → {−1, 1} from a 2-universal family. Maintain counters {Z^t_1, ..., Z^t_w} such that

    Z^t_j = Σ_{i : h_t(i) = j} g_t(i) · x_i

At the query of i, report x̃_i = median{g_1(i) · Z^1_{h_1(i)}, ..., g_d(i) · Z^d_{h_d(i)}}.

Analysis (on board)

Theorem. We can solve L2 point query, with approximation ε, and failure probability δ by storing O(1/ε² · log(1/δ) · log n) bits.
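A Python sketch of Count-Sketch (the sign function is derived from a second hash family; details are illustrative):

```python
import math, random, statistics

class CountSketch:
    """Count-Sketch: like Count-Min but with random signs g_t and a
    median, giving error eps*||x||_2 instead of eps*||x||_1."""
    def __init__(self, eps, delta, seed=0):
        self.w = max(2, int(3 / eps ** 2))
        self.d = max(1, math.ceil(math.log(1 / delta)))
        rng = random.Random(seed)
        self.p = 2_147_483_647
        self.hp = [(rng.randrange(1, self.p), rng.randrange(self.p))
                   for _ in range(self.d)]
        self.gp = [(rng.randrange(1, self.p), rng.randrange(self.p))
                   for _ in range(self.d)]
        self.Z = [[0] * self.w for _ in range(self.d)]

    def _h(self, t, i):
        a, b = self.hp[t]
        return ((a * i + b) % self.p) % self.w

    def _g(self, t, i):
        a, b = self.gp[t]
        return 1 if ((a * i + b) % self.p) % 2 == 0 else -1

    def update(self, i, delta=1):
        for t in range(self.d):
            self.Z[t][self._h(t, i)] += self._g(t, i) * delta

    def query(self, i):
        return statistics.median(self._g(t, i) * self.Z[t][self._h(t, i)]
                                 for t in range(self.d))
```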
L2-point-query (an alternative approach)

The algorithm [Gilbert, Kotidis, Muthukrishnan and Strauss ’01]

Maintain Rx, where R is an O(1/ε² · log(1/δ)) × n matrix, which can be constructed, e.g., by taking each cell to be N(0, 1). At the query of i, report

    x̃_i = (1 − ‖Rx/s − Re_i‖²₂ / 2) · s,

where s is an estimate of ‖x‖₂ (e.g., obtained from ‖Rx‖₂).

Theorem (Johnson-Lindenstrauss Lemma). ∀x, we have (1 − ε)‖x‖₂ ≤ ‖Rx‖₂ ≤ (1 + ε)‖x‖₂ w.p. 1 − δ.

Theorem. We can solve L2 point query, with approximation ε, and failure probability δ by storing O(1/ε² · log(1/δ) · log n) bits.
Algorithm for L0 sampling

Goal: sample an element from the support of x ∈ R^n.

Algorithm (the output can be thought of as Mx for a fixed matrix M):
Maintain F̃_0, a (1 ± 0.1)-approximation to F_0. For each level j, pick a hash function h_j that keeps each t with probability 2^{−j} (i.e., t survives if h_j(t) = 0), and maintain:
– D_j = (1 ± 0.1) · |{t | h_j(t) = 0}|
– S_j = Σ_{t : h_j(t) = 0} (x_t · t)
– C_j = Σ_{t : h_j(t) = 0} x_t

Lemma. At level j = 2 + ⌈log F̃_0⌉, there is a unique element in the stream that maps to 0 with constant probability.

Uniqueness is verified if D_j = 1 ± 0.1. If unique, then S_j/C_j gives the identity of the element and C_j is the count.

Theorem. We can solve L1 point query, with approximation ε, and failure probability δ by storing O(1/ε · log(1/δ)) numbers.

Analysis: (on the board)
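One level of this sketch, simplified for illustration (D_j is tracked exactly via a small dictionary here, instead of the approximate counter on the slide; the hash family is an implementation choice):

```python
import random

class L0SamplerLevel:
    """One level j of the L0-sampling sketch: items t with h_j(t) == 0
    survive, i.e., sampling rate 2^(-j). If exactly one support element
    survives, S/C recovers its identity and C its count."""
    def __init__(self, n, j, seed=0):
        rng = random.Random(seed)
        p = 2_147_483_647  # prime modulus for the hash
        self.a, self.b, self.p = rng.randrange(1, p), rng.randrange(p), p
        self.m = 2 ** j
        self.D = 0   # support size of the sampled substream (exact here)
        self.S = 0   # sum of t * x_t over sampled t
        self.C = 0   # sum of x_t over sampled t
        self.freq = {}  # demo-only bookkeeping to keep D exact

    def _sampled(self, t):
        return ((self.a * t + self.b) % self.p) % self.m == 0

    def update(self, t, delta):
        if not self._sampled(t):
            return
        old = self.freq.get(t, 0)
        new = old + delta
        self.freq[t] = new
        self.D += (old == 0) - (new == 0)
        self.S += t * delta
        self.C += delta

    def recover(self):
        if self.D == 1 and self.C != 0:
            return self.S // self.C, self.C  # (identity, count)
        return None
```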
Frequency moments and norms

Frequency moments: F_p = Σ_i |x_i|^p, where x_i is the frequency of item i.

A very good measurement of the skewness of the dataset.

Norms: L_p = F_p^{1/p}
L2 estimation

The sketch for L2: a linear sketch Rx = [Z_1, ..., Z_k], where each entry of R is drawn from N(0, 1).

Alternatively, Z_i ∼ ‖x‖₂ · G_i, where G_i is drawn from N(0, 1).

The estimator: Y = median{|Z_1|, ..., |Z_k|} / median{|G|}, where G ∼ N(0, 1). (Here M is the median of a random variable R if Pr[|R| ≤ M] = 1/2.)

Sounds like magic? The intuition behind it: for “nice”-looking distributions (e.g., the Gaussian), the median of the samples, for a large enough number of samples, should converge to the median of the distribution.
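A seeded Python sketch of this estimator (the constant 0.6745 is the median of |G| for G ∼ N(0, 1), i.e., the z with Φ(z) = 3/4):

```python
import random
import statistics

MEDIAN_ABS_GAUSS = 0.6745  # median of |G|, G ~ N(0, 1)

def l2_sketch(x, k, seed=0):
    """Z = Rx with i.i.d. N(0,1) entries of R; by 2-stability each
    Z_i is distributed as ||x||_2 * G_i."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) * xi for xi in x) for _ in range(k)]

def l2_estimate(z):
    # median{|Z_i|} converges to ||x||_2 * median{|G|}; rescale
    return statistics.median(abs(v) for v in z) / MEDIAN_ABS_GAUSS
```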
The proof

Closeness in probability: Let U_1, ..., U_k be i.i.d. real random variables chosen from any distribution having continuous c.d.f. F and median M. Defining U = median{U_1, ..., U_k}, there is an absolute constant C > 0 such that Pr[F(U) ∈ (1/2 − ε, 1/2 + ε)] ≥ 1 − e^{−Ckε²}.

Closeness in value: Let F be the c.d.f. of a random variable |G|, G drawn from N(0, 1). There exists an absolute constant C′ > 0 such that if for some z ≥ 0 we have F(z) ∈ (1/2 − ε, 1/2 + ε), then z = M ± C′ε.

Theorem. Y = ‖x‖₂ (M ± C′ε)/M = ‖x‖₂ (1 ± C″ε), w.h.p.
Generalization

Key property of the Gaussian distribution: if U_1, ..., U_n and U are i.i.d. drawn from the Gaussian distribution, then x_1U_1 + ... + x_nU_n ∼ ‖x‖_p · U for p = 2.

Such distributions are called “p-stable” [Indyk ’06]. Good news: p-stable distributions exist for any p ∈ (0, 2].

For p = 1, we get the Cauchy distribution, with density function f(x) = 1/[π(1 + x²)].
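The p = 1 case in code (an illustrative Cauchy sketch; samples are drawn via the inverse c.d.f., and no rescaling constant is needed because median{|Cauchy|} = tan(π/4) = 1):

```python
import math
import random
import statistics

def l1_sketch(x, k, seed=0):
    """Indyk-style p-stable sketch for p = 1: R has i.i.d. standard
    Cauchy entries, so each Z_i is distributed as ||x||_1 * Cauchy."""
    rng = random.Random(seed)
    cauchy = lambda: math.tan(math.pi * (rng.random() - 0.5))
    return [sum(cauchy() * xi for xi in x) for _ in range(k)]

def l1_estimate(z):
    # median{|Z_i|} converges to ||x||_1 * median{|Cauchy|} = ||x||_1
    return statistics.median(abs(v) for v in z)
```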
Attribution

Some of the contents are borrowed from Amit Chakrabarti’s course http://www.cs.dartmouth.edu/~ac/Teach/CS49-Fall11/, Piotr Indyk’s course http://stellar.mit.edu/S/course/6/fa07/6.895/, and Andrew McGregor’s course http://people.cs.umass.edu/~mcgregor/courses/CS711S12/index.html