Communication-Efficient Computation on Distributed Noisy Datasets
Qin Zhang, Indiana University Bloomington
SPAA'15, June 15, 2015
Model of computation
The coordinator model: k sites and 1 coordinator.
– Each site has a two-way communication channel with the coordinator.
– Each site Si holds a piece of data xi; the coordinator starts with nothing (∅).
– Task: compute f(x1, ..., xk) together via communication. The coordinator reports the answer.
– Computation is divided into rounds.
– Goal: minimize both communication and the number of rounds.
– There is no constraint on the number of bits each site can send per round (usually balanced).
– Local computation is not counted (usually linear).

[Figure: sites S1, S2, ..., Sk holding x1, x2, ..., xk, each connected to the coordinator C, which holds ∅.]
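As a toy illustration of the model (not from the paper: `run_protocol`, `site_msg`, and `combine` are names of my own invention), a one-round protocol can be simulated as follows; each site sends one message derived from its local input, and the coordinator, which holds no input, combines the messages and reports the answer:

```python
def run_protocol(site_inputs, site_msg, combine):
    """Simulate one round of the coordinator model: each site Si sends a
    message computed from its local input xi; the coordinator combines the
    messages and reports the answer. Communication is measured crudely as
    the total size of all messages."""
    msgs = [site_msg(x) for x in site_inputs]        # one message per site
    comm_bits = sum(len(repr(m)) * 8 for m in msgs)  # crude size estimate
    return combine(msgs), comm_bits

# Example: compute the sum over all sites with one number sent per site.
ans, bits = run_protocol([[1, 2], [3], [4, 5]], site_msg=sum, combine=sum)
print(ans)  # 15
```

A multi-round protocol would simply alternate such exchanges, with each site's message allowed to depend on what the coordinator broadcast in earlier rounds.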
Abstraction

The coordinator model abstracts:
– the BSP model;
– the MapReduce model (Input → Map → Shuffle → Reduce → Output);
– also network monitoring, sensor networks, etc.

Communication → time, energy, bandwidth, ...
Function f can be: how many distinct elements (F0) are in the union of the k bags?

We almost always allow a (1 + ε)-approximation.

Important in: traffic monitoring, query optimization, ...
How many distinct elements (F0) are in the union of the k bags? The standard approach: each site builds a local linear sketch, and the global sketch is obtained by combining the local sketches.
Linear sketches: a random linear mapping M : R^n → R^k where k ≪ n. The data x (e.g., a frequency vector) is compressed to the sketching vector Mx, from which f(x) can be recovered (approximately).

Simple and useful: statistical/graph/algebraic problems in data streams, compressive sensing, ...

Perfect for distributed computation: the data is distributed as x = x1 + ... + xk, with xi on site i. Merge using linearity: Mx1 + ... + Mxk = M(x1 + ... + xk).
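The merging-by-linearity identity is easy to check with a tiny AMS-style random ±1 sketch matrix (the matrix, dimensions, and vectors below are illustrative, not from the paper):

```python
import random

def make_sketch_matrix(n, d, seed=42):
    """Random ±1 linear map M : R^n -> R^d (an AMS-style sketch)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(d)]

def sketch(M, x):
    """Apply the linear map: return M x."""
    return [sum(Mij * xj for Mij, xj in zip(row, x)) for row in M]

n, d = 8, 4
M = make_sketch_matrix(n, d)
x1 = [1, 0, 2, 0, 0, 0, 1, 0]   # site 1's frequency vector
x2 = [0, 3, 1, 0, 0, 0, 0, 1]   # site 2's frequency vector
x  = [a + b for a, b in zip(x1, x2)]

# Coordinator merges the two local sketches coordinate-wise:
merged = [a + b for a, b in zip(sketch(M, x1), sketch(M, x2))]
assert merged == sketch(M, x)    # M*x1 + M*x2 == M*(x1 + x2)
```

Each site ships only d numbers instead of its length-n vector, which is the whole point when k ≪ n.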
Noisy data example:
John Smith, 800 Mountain Av springfield
Joe Smith, 800 Mount Av Springfield
Joseph Smith, 800 Mt. Road Springfield
Joe Smith, 800 Mt. Road Springfield

We (have to) consider similar items as the same element.

Cannot use linear sketches: similar items may be mapped into different coordinates of the sketching vector.
More noisy-data examples:
– Music, images, ... after compression, resizing, reformatting, etc.
– Queries of the same meaning sent to Google: "SPAA 2015", "27th ACM Symposium on Parallelism in Algorithms and Architectures", "ACM FCRC SPAA'15".
Related to Entity Resolution: identify and link/group different manifestations of the same real-world object. Very important in data cleaning/integration; it has been studied for 40 years in DB, and also in AI, NT.

Previous work is centralized: detect items representing the same entity, then merge/output all distinct entities. This work is distributed and targets statistical estimations: we want more communication-efficient algorithms (o(input size)), without a comprehensive de-duplication.

See, e.g., [Gill & Goldacre'03, Koudas et al.'06, Elmagarmid et al.'07, Herzog et al.'07, Dong & Naumann'09, Willinger et al.'09, Christen'12] for introductions, and [Getoor and Machanavajjhala'12] for a tutorial.
Problem: how can we perform noise-resilient statistical estimation in the coordinator model communication-efficiently?

Assume all parties are provided with a pairwise distance metric and a threshold determining whether two items u, v represent the same entity (denoted by u ∼ v) or not.

Goal: minimize communication and the number of rounds.

The design of the distance metric is a separate issue. We will design a framework so that users can plug in any "distance metric" at run time.
Remark 1. We do not specify the distance function in our algorithms, for two reasons:
(1) It allows our algorithms to work with any distance function.
(2) It is sometimes very hard to assume that similarities between items can be expressed by a well-known distance function: "AT&T Corporation" is closer to "IBM Corporation" than to "AT&T Corp" under the edit distance!

Remark 2. We assume transitivity: if u ∼ v and v ∼ w, then u ∼ w. In other words, the noise is "well-shaped". One may come up with the following problematic situation: we have a ∼ b, b ∼ c, ..., y ∼ z, and yet a and z are not similar. Our algorithms still work if the number of such "outliers" is small.
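The edit-distance claim in Remark 1 is easy to verify with a standard Levenshtein implementation (the snippet is only an illustration; the paper deliberately does not prescribe any particular distance function):

```python
def levenshtein(s, t):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (cs != ct)))    # substitution
        prev = cur
    return prev[-1]

print(levenshtein("AT&T Corporation", "IBM Corporation"))  # 4
print(levenshtein("AT&T Corporation", "AT&T Corp"))        # 7
```

So under plain edit distance the wrong company really is the nearer neighbor, which is exactly why the framework treats the similarity predicate as a user-supplied black box.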
Remark 3. There do exist approaches that do not assume transitivity, e.g., assuming the so-called ICAR properties [BGM+09], or using clustering-based approaches [ACN08]. They are unlikely to yield communication-efficient algorithms in our setting.

Remark 4. Does there exist a magic hash function that maps (only) items in the same group into the same bucket and can be described succinctly? Answer: NO.
13-1
i∈[k] Si, let m = |S|.
set of groups G = {G1, . . . , Gn}. Each group Gi represents a distinct universe element.
O(·) hides poly log(m/ǫ) factors.
S1 S2 S3 Sk C
Our results (communication for noisy data vs. lower bounds for noise-free data):

Problem      | Noisy data: bits         | Rounds | Noise-free data: bits
F0           | Õ(min{k/ε³, k²/ε²})      | Õ(1)   | Ω(k/ε²) [WZ12, WZ14]
L0-sampling  | Õ(k)                     | Õ(1)   | Ω(k)
Fp (p ≥ 1)   | Õ((k^(p−1) + k³)/ε³)     | O(1)   | Ω(k^(p−1)/ε²) [WZ12]
(φ, ε)-HH    | Õ(min{k/ε, 1/ε²})        | O(1)   | Ω(min{√k/ε, 1/ε²}) [HYZ12, WZ12]
Entropy      | Õ(k/ε²)                  | O(1)   | Ω(k/ε²) [WZ12]

Definitions: Fp = Σi∈[n] |Gi|^p; we consider F0 and Fp (p ≥ 1), and allow a (1 + ε)-approximation. L0-sampling: sample a group uniformly at random from G. Entropy = Σi∈[n] (|Gi|/m) log(m/|Gi|); we allow a (1 + ε)-approximation.
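For concreteness, the quantities being estimated can be computed directly from the group sizes (a small illustrative helper, not one of the paper's protocols; the slides do not fix the base of the log, which changes entropy only by a constant factor):

```python
from math import log

def stats(group_sizes):
    """Compute F0, F2, and entropy from the group partition.
    group_sizes[i] = |Gi|; m = total number of items."""
    m = sum(group_sizes)
    f0 = len(group_sizes)                                  # number of distinct groups
    f2 = sum(g ** 2 for g in group_sizes)                  # Fp with p = 2
    entropy = sum((g / m) * log(m / g) for g in group_sizes)
    return f0, f2, entropy

f0, f2, ent = stats([2, 1, 1])   # m = 4, three groups
print(f0, f2)  # 3 6
```

The hard part, of course, is computing these distributedly when no single party sees the groups.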
Two algorithms for F0:
– Simple: Õ(k²/ε²) communication, 2 rounds.
– Complicated: Õ(k/ε³) communication, Õ(1) rounds.

The latter is better than Õ(k²/ε²) bits because (1) we want to scale in k, and (2) it is later used in ℓ0-sampling with ε = Θ(1).
Algorithm Simple-Sampling

1. Jointly compute m = Σi∈[k] |Si|.
2. For j = 1, ..., η:
   (a) jointly sample a random item uj ∈ S; let Guj be the group containing uj;
   (b) jointly compute |Guj|, which yields an estimator Xj.

Theorem. Simple-Sampling gives a (1 + ε)-approximation of F0 with probability 2/3 using Õ(k²/ε²) bits and 2 rounds.
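To see why sampling group sizes estimates F0, here is a centralized rendering of the estimator idea (my own reconstruction; the paper's actual protocol is distributed and its exact estimator may differ). It uses the fact that for a uniform item u ∈ S, E[m/|G_u|] = Σ_G (|G|/m)(m/|G|) = F0:

```python
import random

def simple_sampling_f0(sites, group_of, eta):
    """Centralized sketch of the Simple-Sampling idea. `group_of` maps an
    item to its group's id; in the real protocol this is where the
    user-supplied distance metric is used to find copies of the sampled
    item's group across sites."""
    items = [u for site in sites for u in site]   # S = union of the k bags
    m = len(items)                                 # m = sum_i |Si|
    total = 0.0
    for _ in range(eta):
        u = random.choice(items)                   # (a) sample uj uniformly from S
        g = group_of(u)
        g_size = sum(1 for v in items if group_of(v) == g)  # (b) |G_uj|
        total += m / g_size          # each group contributes (|G|/m)*(m/|G|) = 1
                                     # in expectation, so E[total/eta] = F0
    return total / eta
```

In the distributed setting, step (a) and the group-size count in step (b) each cost Õ(k) bits per sample, which with η = Õ(k/ε²) samples gives the Õ(k²/ε²) bound.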
The improved algorithm for F0: Õ(k/ε³) bits and Õ(1) rounds.
Main idea: reduce the variance of Xj in Simple-Sampling.
– If we could partition all groups in G into classes G0, ..., G_log k such that Gℓ = {G ∈ G | |G| ∈ (2^(ℓ−1), 2^ℓ]}, we could apply Simple-Sampling on each class individually. Doing so shaves a factor of k off the number of samples Xj needed (η: k/ε² → 1/ε²).
– However, we cannot afford to partition the groups into classes in the distributed setting.
– Our techniques: a local hierarchical partition (which introduces inconsistencies) + distributed rejection sampling (which resolves the inconsistencies).

The full algorithm is fairly complicated (it uses Simple-Sampling as a subroutine); see the paper for details.
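The distributed rejection sampling used here is involved; as background only, plain centralized rejection sampling looks like this (an illustrative sketch, not the paper's procedure): draw from an easy proposal, then accept with probability proportional to the desired weight, so that accepted draws follow the target distribution even when the proposal's bookkeeping is imperfect.

```python
import random

def rejection_sample(items, weight, rng=random):
    """Draw one item with probability proportional to weight(item),
    using only uniform proposals plus accept/reject steps."""
    wmax = max(weight(x) for x in items)
    while True:
        x = rng.choice(items)                   # propose uniformly
        if rng.random() < weight(x) / wmax:     # accept w.p. weight(x)/wmax
            return x
```

The expected number of proposals is bounded by wmax divided by the average weight, which is why the technique is efficient when the weights (here, the inconsistencies between local partitions) are not too skewed.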
Other results:
– L0-sampling: Õ(k) communication and Õ(1) rounds. Uses the algorithm for F0 as a subroutine.
– Fp (p ≥ 1): Õ((k^(p−1) + k³)/ε³) communication and Õ(1) rounds. Adapts a recent algorithm by Kannan, Vempala and Woodruff (COLT 2014).
– (φ, ε)-HH: Õ(min{k/ε, 1/ε²}) communication and O(1) rounds. Easy.
– Entropy: Õ(k/ε²) communication and O(1) rounds. Adapts a data-stream algorithm by Chakrabarti, Cormode and McGregor (SODA 2007).
Open problems:
– Can we get a (better) upper bound of Õ(k/ε²) for F0?
– Can we improve the round complexities of F0 and L0-sampling from Õ(1) to O(1)?
– Can we remove the k³ factor in the communication cost for Fp?
– Lp-sampling?