Distributed Statistical Estimation of Matrix Products with Applications

David Woodruff (CMU), Qin Zhang (IUB)

PODS 2018, June 2018
The Distributed Computation Model

- Alice holds A ∈ {0, 1}^{m×n}, Bob holds B ∈ {0, 1}^{n×m}
- They want to compute some function of C = A × B (e.g., p-norms, heavy-hitters, ...) with the minimum amount of communication and number of rounds
- The protocol can fail with prob. 0.01 (over its randomness)
- Communication: sum of message lengths (maximized over all choices of A, B, and the randomness)
Statistics of Matrix Products: p-Norms

- Alice holds A ∈ {0, 1}^{m×n}, Bob holds B ∈ {0, 1}^{n×m}
- Let C = A · B. Alice and Bob want to approximate

    ‖C‖_p = (Σ_{i,j∈[m]} |C_{i,j}|^p)^{1/p}

  – p = 0: number of non-zero entries of C ⇒ size of the set-intersection join
    (view the i-th row of A as a set A_i and the j-th column of B as a set B_j; compute #(i, j) s.t. A_i ∩ B_j ≠ ∅)
  – p = 1: sum of entries of C ⇒ size of the corresponding natural join
    (compute #(i, k, j) s.t. k ∈ A_i ∩ B_j)
  – p = ∞: maximum entry of C ⇒ most “similar” (A_i, B_j) pair
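As a concrete illustration of these three quantities, the sketch below computes C = A · B by brute force and evaluates ‖C‖_p for p = 0, 1, ∞ on a tiny example; the function names `matmul` and `lp_norm` are ours, not from the paper.

```python
def matmul(A, B):
    """Brute-force integer matrix product C = A * B."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

def lp_norm(C, p):
    """||C||_p: p = 0 counts non-zero entries, p = inf takes the max
    entry, and p > 0 gives (sum |C_ij|^p)^(1/p)."""
    entries = [abs(x) for row in C for x in row]
    if p == 0:
        return sum(1 for x in entries if x != 0)
    if p == float("inf"):
        return max(entries)
    return sum(x ** p for x in entries) ** (1 / p)

# Rows of A and columns of B as incidence vectors of sets over {1, 2, 3}
A = [[1, 1, 0],   # A_1 = {1, 2}
     [0, 1, 1]]   # A_2 = {2, 3}
B = [[1, 0],
     [1, 1],
     [0, 1]]      # B_1 = {1, 2}, B_2 = {2, 3}
C = matmul(A, B)  # C[i][j] = |A_i ∩ B_j|
```

Here ‖C‖_0 = 4 counts the non-empty intersections (set-intersection join size), ‖C‖_1 = 6 counts the triples (i, k, j) with k ∈ A_i ∩ B_j (natural join size), and ‖C‖_∞ = 2 is the largest intersection.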
Application of the set-intersection join

  Applicant (Skills)          Opening (Skills)
  A1: S1, S4, S9, S13         B1: S2, S3, S4
  A2: S2, S9, S10             B2: S3, S4, S9, S11
  ...                         ...
  Am: S6, S7, S8, S15         Bm: S4, S8

Find all candidate (Applicant, Opening) pairs, i.e., pairs whose skill sets intersect
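A minimal sketch of this application (the skill sets are made up, loosely following the slide):

```python
# Hypothetical skill data; each applicant/opening maps to its skill set.
applicants = {"A1": {"S1", "S4", "S9", "S13"},
              "A2": {"S2", "S9", "S10"},
              "A3": {"S6", "S7"}}
openings = {"B1": {"S2", "S3", "S4"},
            "B2": {"S3", "S4", "S9", "S11"}}

# Set-intersection join: keep pairs whose skill sets overlap.
candidates = sorted((a, b)
                    for a, skills_a in applicants.items()
                    for b, skills_b in openings.items()
                    if skills_a & skills_b)
```

The number of candidate pairs is exactly ‖AB‖_0 when A and B are the corresponding incidence matrices.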
‖C‖_1 corresponds to the natural join

[Figure: a relation with attributes (U, V) encoded as binary matrix A (rows indexed by U, columns by V) and a relation with attributes (V, W) encoded as binary matrix B (rows by V, columns by W); C = A × B. Each natural-join tuple, e.g. (2, 1, 2), contributes +1 to the entry C_{u,w}, so ‖C‖_1 equals the size of the natural join.]
Statistics of Matrix Products: Heavy Hitters

- Alice holds A ∈ {0, 1}^{m×n}, Bob holds B ∈ {0, 1}^{n×m}
- Let C = A · B, and let

    HH^p_φ(C) = {(i, j) | C_{i,j} ≥ φ ‖C‖_p}

  – ℓ_p-(φ, ǫ)-heavy-hitters (0 < ǫ ≤ φ ≤ 1): output a set S ⊆ {(i, j) | i, j ∈ [m]} such that HH^p_φ(C) ⊆ S ⊆ HH^p_{φ−ǫ}(C)
    (these are the pairs (A_i, B_j) that are similar; useful for similarity join)
- Both p-norm estimation and heavy-hitters are well studied in the data stream literature, but much less so in the distributed model
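To pin down the definition, here is an exact (non-distributed) reference implementation of HH^p_φ(C); the protocol itself only needs to output some set sandwiched between HH^p_φ(C) and HH^p_{φ−ǫ}(C). Function names are ours.

```python
def lp_norm(C, p):
    """||C||_p for p > 0."""
    return sum(abs(x) ** p for row in C for x in row) ** (1 / p)

def heavy_hitters(C, p, phi):
    """HH^p_phi(C) = {(i, j) : C[i][j] >= phi * ||C||_p}."""
    threshold = phi * lp_norm(C, p)
    return {(i, j) for i, row in enumerate(C)
            for j, x in enumerate(row) if x >= threshold}

C = [[4, 1],
     [0, 2]]
```

Any set S with `heavy_hitters(C, p, phi)` ⊆ S ⊆ `heavy_hitters(C, p, phi - eps)` is a valid (φ, ǫ)-heavy-hitter answer.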
Our Main Results – ℓ_p (p ∈ [0, 2])

(For simplicity, assume m = n.)

- For any p ∈ [0, 2], a 2-round Õ(n/ǫ)-bit algorithm that approximates ‖AB‖_p within a (1 + ǫ) factor
  – For p = 0, this improves the previous Õ(n/ǫ²) result (Van Gucht et al., PODS’15); the same paper shows a lower bound of Ω(n/ǫ^{2/3})
  – If the communication is restricted to be one-way, we show a lower bound of Ω(n/ǫ²)
Our Main Results – ℓ_∞

- O(1)-round algorithms that approximate ‖AB‖_∞
  – within a factor of (2 + ǫ) using Õ(n^{1.5}/ǫ) bits
  – within a factor of κ using Õ(n^{1.5}/κ) bits
- Any algorithm (regardless of the number of rounds used) that approximates ‖AB‖_∞
  – within a factor of 2 needs Ω(n²) bits
  – within a factor of κ ≥ 4 needs Ω(n^{1.5}/κ) bits
- The above results hold for binary matrices A and B. For general matrices A, B ∈ Σ^{n×n}, the bound is Θ̃(n²/κ²) bits (O(1)-round for the upper bound; any number of rounds for the lower bound)
Our Main Results – Heavy Hitters

- For binary matrices A and B, for any p ∈ (0, 2], an O(1)-round Õ(n + φ/ǫ²)-bit algorithm that computes ℓ_p-(φ, ǫ)-heavy-hitters
- For general matrices A and B, for any p ∈ (0, 2], we obtain O(1)-round Õ((√φ/ǫ) · n)-bit algorithms
- All of the results above extend easily to rectangular matrices A ∈ Σ^{m×n} and B ∈ Σ^{n×m}
Previous Results

- Most relevant: Van Gucht et al. (PODS’15), which studies set-intersection / disjointness / equality / at-least-T joins in the 2-party communication model. The only overlap with this paper is the estimation of ‖AB‖_0 mentioned before
- A number of recent works study distributed linear algebra problems (Balcan et al., KDD’16; Boutsidis et al., STOC’16; Woodruff & Zhong, ICDE’16; etc.). These works concern statistical estimation on C = A + B, whereas this paper studies C = A · B
- Similar problems have been studied in the RAM model (Cohen & Lewis, J. Algorithms ’99; Pagh, TOCT’13; etc.)
(1 + ǫ)-approximate ℓ0
- Alice holds A ∈ {0, 1}^{n×n}, Bob holds B ∈ {0, 1}^{n×n}
- Let C = A · B. Goal: (1 + ǫ)-approximate ‖C‖_0

High-level idea:
- 1. First perform a rough estimation of the number of non-zero entries in the rows of C
- 2. Use the rough estimation to partition the rows of C into groups s.t. rows in the same group have similar numbers of non-zero entries
- 3. Sample rows in each group of C with probability proportional to the (estimated) average number of non-zero entries of the group
- 4. Use the sampled rows to estimate the number of non-zero entries of C

[Figure: A (n×n) × B (n×n) = C (n×n), with a group G_ℓ of rows highlighted]
(1 + ǫ)-approximate ℓ0 (cont.)

- Alice holds A ∈ {0, 1}^{n×n}, Bob holds B ∈ {0, 1}^{n×n}
- Let C = A · B. Goal: (1 + ǫ)-approximate ‖C‖_0
- Algorithm. Set β = √ǫ, ρ = Θ(1/ǫ)
  1. Use the algorithm in VWWZ’15 to get a (1 + β)-approximation of the row statistics of C (denoted ˜C)
  2. Partition the n rows of ˜C into L = O(log n / β) groups G_1, ..., G_L, s.t. G_ℓ contains all rows i ∈ [n] with (1 + β)^ℓ ≤ ‖˜C_{i,*}‖_0 ≤ (1 + β)^{ℓ+1}
  3. For each group ℓ, Alice samples each row i ∈ G_ℓ w.pr.

       p_ℓ = (ρ / ‖˜C‖_0) · (Σ_{i∈G_ℓ} ‖˜C_{i,*}‖_0) / |G_ℓ|

     Let A′ be the matrix containing the sampled rows of A; Alice sends A′ to Bob
  4. Bob computes C′ ← A′B and outputs Σ_{ℓ∈[L]} Σ_{sampled i∈G_ℓ} (1/p_ℓ) · ‖C′_{i,*}‖_0
- Correctness: expectation + variance estimation
- Communication: step 1 costs Õ(n/β²) = Õ(n/ǫ) bits; step 3 costs Õ(n · ρ) = Õ(n/ǫ) bits
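The steps above can be simulated on a single machine. The sketch below assumes, for simplicity, that the (1 + β)-approximate row counts of step 1 are exact and samples rows of C directly (in the protocol Alice sends sampled rows of A instead); function names are ours.

```python
import math
import random

def row_nnz(row):
    return sum(1 for x in row if x != 0)

def estimate_l0(C, beta=0.5, rho=100, seed=1):
    """Importance-sample rows grouped by (1+beta)-power buckets of their
    non-zero counts; return an unbiased estimate of ||C||_0."""
    rng = random.Random(seed)
    total = sum(row_nnz(row) for row in C)  # stands in for ||~C||_0
    if total == 0:
        return 0.0
    # Step 2: bucket rows with similar non-zero counts.
    groups = {}
    for i, row in enumerate(C):
        c = row_nnz(row)
        if c > 0:
            groups.setdefault(int(math.log(c, 1 + beta)), []).append(i)
    # Steps 3-4: sample each row of group l w.pr. p_l, rescale by 1/p_l.
    estimate = 0.0
    for members in groups.values():
        avg = sum(row_nnz(C[i]) for i in members) / len(members)
        p = min(1.0, rho * avg / total)
        for i in members:
            if rng.random() < p:
                estimate += row_nnz(C[i]) / p
    return estimate
```

Unbiasedness is immediate: each row is kept with probability p and then rescaled by 1/p, so it contributes its exact non-zero count in expectation.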
(2 + ǫ)-approximate ℓ∞
(2 + ǫ)-approximate ℓ∞

- Alice holds A ∈ {0, 1}^{n×n}, Bob holds B ∈ {0, 1}^{n×n}
- Let C = A · B. Goal: (2 + ǫ)-approximate ‖C‖_∞
- Subsampling levels:
  A^0 = A; A^1 = subsample each entry of A w.pr. 1/(1+ǫ); A^2 = subsample each entry of A w.pr. 1/(1+ǫ)²; ...
  C^0 = A^0 × B; C^1 = A^1 × B; C^2 = A^2 × B; ...
- The idea: subsample C (via subsampling A) down to a level ℓ s.t.
  (1) ℓ is as large as possible, i.e., ℓ_1(C^ℓ) is as small as possible;
  (2) ℓ_∞(C^ℓ) · (1 + ǫ)^ℓ still approximates ℓ_∞(C) well.
(2 + ǫ)-approximate ℓ∞ (cont.)

- Alice holds A ∈ {0, 1}^{n×n}, Bob holds B ∈ {0, 1}^{n×n}
- Let C = A · B. Goal: (2 + ǫ)-approximate ‖C‖_∞
- Algorithm. Set L = O(log n / ǫ), γ = Θ(log n / ǫ²)
  1. For ℓ = 0, 1, ..., L, let C^ℓ ← A^ℓ B, where A^ℓ is obtained by sampling each ‘1’ in A w.pr. p_ℓ = 1/(1+ǫ)^ℓ
  2. Let ℓ* be the smallest ℓ for which ‖C^ℓ‖_1 ≤ γn²
  3. For each j ∈ [n]:
     (a) let u_j = #‘1’s in the j-th column of A^{ℓ*} and v_j = #‘1’s in the j-th row of B
     (b) if u_j ≤ v_j, then Alice sends the j-th column of A^{ℓ*} to Bob; otherwise Bob sends the j-th row of B to Alice
  4. Alice and Bob use the received information to compute matrices C_A and C_B respectively, s.t. C_A + C_B = C^{ℓ*}
  5. Output max{‖C_A‖_∞ / p_{ℓ*}, ‖C_B‖_∞ / p_{ℓ*}}
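A single-machine sketch of steps 1, 2, and 5, under stated simplifications: we compute each C^ℓ directly instead of splitting it into C_A + C_B, and the constant inside γ = Θ(log n / ǫ²) is set to 1 for illustration. Function names are ours.

```python
import math
import random

def matmul(A, B):
    """Brute-force integer matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def linf_estimate(A, B, eps=0.5, seed=0):
    """Subsample A's 1-entries level by level until ||C^l||_1 <= gamma*n^2,
    then rescale the max entry of C^l by 1/p_l."""
    rng = random.Random(seed)
    n = len(A)
    gamma = math.log(n + 1) / eps ** 2  # Theta(log n / eps^2)
    Al, level = [row[:] for row in A], 0
    while True:
        C = matmul(Al, B)
        if sum(abs(x) for row in C for x in row) <= gamma * n * n:
            p = (1 + eps) ** (-level)
            return max(x for row in C for x in row) / p
        # Keep each surviving '1' w.pr. 1/(1+eps) to move one level down.
        Al = [[x if x and rng.random() < 1 / (1 + eps) else 0 for x in row]
              for row in Al]
        level += 1
```

On a small input the loop stops at level 0 and the output is exact; the subsampling only kicks in when ‖C‖_1 is large.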
(2 + ǫ)-approximate ℓ∞ (cont.)

- Correctness
  Lemma: With probability 1 − 1/n², ‖C^{ℓ*}‖_∞ / p_{ℓ*} approximates ‖C‖_∞ within a factor of 1 + ǫ.
  Simple Fact: If C_A + C_B = C^{ℓ*}, then ‖C^{ℓ*}‖_∞ / 2 ≤ max{‖C_A‖_∞, ‖C_B‖_∞} ≤ ‖C^{ℓ*}‖_∞.
  Put together, max{‖C_A‖_∞ / p_{ℓ*}, ‖C_B‖_∞ / p_{ℓ*}} approximates ‖C‖_∞ within a factor of 2 + ǫ.
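The Simple Fact is just the triangle inequality applied to a nonnegative split of C^{ℓ*}; the small check below (names ours) verifies it on concrete splits.

```python
def linf(M):
    """Max absolute entry of a matrix."""
    return max(abs(x) for row in M for x in row)

def split_fact_holds(CA, CB):
    """For a nonnegative split C = CA + CB, check that
    ||C||_inf / 2 <= max(||CA||_inf, ||CB||_inf) <= ||C||_inf."""
    C = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(CA, CB)]
    m = max(linf(CA), linf(CB))
    return linf(C) / 2 <= m <= linf(C)
```

The upper bound uses that the entries are nonnegative (each of C_A, C_B is entrywise at most C^{ℓ*}); the lower bound holds for any split.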
(2 + ǫ)-approximate ℓ∞ (cont.)

- Communication cost
  The bottleneck is step 3: for each j ∈ [n], with u_j = #‘1’s in the j-th column of A^{ℓ*} and v_j = #‘1’s in the j-th row of B, if u_j ≤ v_j then Alice sends the j-th column of A^{ℓ*} to Bob, otherwise Bob sends the j-th row of B to Alice.
  For each j ∈ [n], we analyze two cases:
  – If u_j, v_j > √n/ǫ, the number of such j is bounded by ‖C^{ℓ*}‖_1 / (√n/ǫ)², so the comm. cost can be bounded by Õ(n^{1.5}/ǫ).
  – If min{u_j, v_j} ≤ √n/ǫ, the communication is bounded by Σ_{j : min{u_j, v_j} ≤ √n/ǫ} min{u_j, v_j} ≤ n · √n/ǫ = n^{1.5}/ǫ.
Concluding Remarks

Main results:
- The communication cost for (1 + ǫ)-approximating ℓ_p (p ∈ [0, 2]) with Σ = Z is Õ(n/ǫ).
  (Open question: close the gap between this UB and the Ω(n/ǫ^{2/3}) LB.)
- The communication cost for (2 + ǫ)-approximating ℓ_∞ with Σ = {0, 1} is Θ̃(n^{1.5}/ǫ); that for κ-approximating ℓ_∞ with Σ = {0, 1} is Θ̃(n^{1.5}/κ).
- The communication cost for ℓ_p-(φ, ǫ)-heavy-hitters with Σ = Z is Õ((√φ/ǫ) n); that for ℓ_p-(φ, ǫ)-heavy-hitters with Σ = {0, 1} is Õ(n + φ/ǫ²).
  (Open question: tight LBs?)
Thank you! Questions?
A lower bound for ℓ∞
A Lower Bound for ℓ∞

- Set-disjointness: Alice holds x ∈ {0, 1}^{n²/4}, Bob holds y ∈ {0, 1}^{n²/4}. Does there exist i with x[i] = y[i] = 1?
- Theorem (Bar-Yossef et al., JCSS’04): Any randomized algorithm that solves the above set-disjointness problem w.pr. 0.51 needs Ω(n²) communication.
A reduction from set-disjointness to ℓ∞(AB)

- 1. Alice partitions x ∈ {0, 1}^{n²/4} into n/2 chunks of size n/2 each, and uses them as rows to construct A′ ∈ {0, 1}^{n/2 × n/2}. Further, let A = [A′  I].
- 2. Similarly, Bob uses y ∈ {0, 1}^{n²/4} to construct B′ ∈ {0, 1}^{n/2 × n/2}, and further lets B = [I ; B′] (the identity stacked on top of B′).
- 3. We have ‖A · B‖_∞ = ‖A′ + B′‖_∞, which is 2 if and only if x and y intersect, and at most 1 otherwise.
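The construction above can be made executable in a few lines (sizes and names are ours); note that A · B = A′ · I + I · B′ = A′ + B′, so the max entry is 2 exactly when x and y share a 1.

```python
def matmul(A, B):
    """Brute-force integer matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def reduction_linf(x, y):
    """Pack x into A' (k x k), y into B', set A = [A' | I], B = [I ; B'].
    Then A*B = A' + B', whose max entry is 2 iff x and y intersect."""
    k = int(len(x) ** 0.5)               # chunk count; len(x) must be k*k
    Ap = [x[i * k:(i + 1) * k] for i in range(k)]
    Bp = [y[i * k:(i + 1) * k] for i in range(k)]
    I = [[int(i == j) for j in range(k)] for i in range(k)]
    A = [Ap[i] + I[i] for i in range(k)]  # shape k x 2k
    B = I + Bp                            # shape 2k x k
    return max(e for row in matmul(A, B) for e in row)
```

Distinguishing the two outcomes is exactly distinguishing intersecting from disjoint inputs, so any better-than-2 approximation of ‖AB‖_∞ inherits the Ω(n²) set-disjointness lower bound.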