Distributed Statistical Estimation of Matrix Products with Applications

David Woodruff (CMU)    Qin Zhang (IUB)

PODS 2018, June 2018

The Distributed Computation Model

  • Alice holds A ∈ {0, 1}^{m×n}, Bob holds B ∈ {0, 1}^{n×m}
  • Alice and Bob want to compute some function of C = A × B (p-norms, heavy hitters, . . . ) with the minimum amount of communication and number of rounds
  • The protocol can fail with prob. 0.01 (over its randomness)
  • Communication: sum of message lengths (maximized over all choices of A, B, and the randomness)
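To make the cost measure concrete, here is a minimal sketch of a trivial baseline (not a protocol from the paper): Alice sends all of A, and Bob computes C exactly. Its mn-bit cost is the bound the paper's algorithms aim to beat.

```python
import numpy as np

# Trivial baseline protocol (hypothetical, for illustration only):
# Alice sends her entire 0/1 matrix A; Bob computes C = A x B locally.
def naive_protocol(A, B):
    bits_sent = A.size           # one bit per entry of A
    C = A @ B                    # Bob's local computation
    return C, bits_sent

rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(4, 5))   # A in {0,1}^{m x n}, m = 4, n = 5
B = rng.integers(0, 2, size=(5, 4))   # B in {0,1}^{n x m}
C, bits = naive_protocol(A, B)        # bits == m * n == 20
```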





Statistics of Matrix Products: p-Norms

  • Alice holds A ∈ {0, 1}^{m×n}, Bob holds B ∈ {0, 1}^{n×m}
  • Let C = A · B. Alice and Bob want to approximate
    ‖C‖_p = (Σ_{i,j∈[m]} |C_{i,j}|^p)^{1/p}
    – p = 0: number of non-zero entries of C ⇒ size of set-intersection join
      (view the i-th row of A as a set A_i and the j-th column of B as a set B_j; compute #(i, j) s.t. A_i ∩ B_j ≠ ∅)
    – p = 1: sum of entries of C ⇒ size of the corresponding natural join
      (compute #(i, k, j) s.t. k ∈ A_i ∩ B_j)
    – p = ∞: maximum entry of C ⇒ most "similar" (A_i, B_j) pair
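A minimal sketch of the three statistics, computed exactly and centrally on a tiny example (the paper's goal is to approximate these with little communication):

```python
import numpy as np

A = np.array([[1, 0, 1],
              [0, 1, 1]])       # row A_i = set of indices k with A[i, k] = 1
B = np.array([[1, 0],
              [1, 1],
              [0, 1]])          # column B_j = set of indices k with B[k, j] = 1
C = A @ B                       # C[i, j] = |A_i ∩ B_j|

l0 = np.count_nonzero(C)        # p = 0: size of the set-intersection join
l1 = int(C.sum())               # p = 1: size of the natural join
linf = int(C.max())             # p = infinity: most "similar" (A_i, B_j) pair
```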


Application of set-intersection join

Applicant (Skills) ⋈ (Skills) Opening:

  A1: S1, S4, S9, S13      B1: S2, S3, S4
  A2: S2, S9, S10          B2: S3, S4, S9, S11
  ...                      ...
  Am: S6, S7, S8, S15      Bm: S4, S8

Find all candidate (Applicant, Opening) pairs
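The same join, sketched with the slide's skill sets as plain Python sets (only the first two applicants and openings are used):

```python
# Set-intersection join on the slide's example (first two rows of each side).
applicants = {"A1": {"S1", "S4", "S9", "S13"},
              "A2": {"S2", "S9", "S10"}}
openings = {"B1": {"S2", "S3", "S4"},
            "B2": {"S3", "S4", "S9", "S11"}}

# A pair is a candidate iff the skill sets intersect; the number of such
# pairs is exactly ||A x B||_0 for the corresponding 0/1 matrices.
candidates = sorted((a, b) for a, sa in applicants.items()
                    for b, sb in openings.items() if sa & sb)
```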



‖C‖₁ corresponds to the natural join

[Figure: relations (U, V) ⋈ (V, W) and the corresponding 0/1 matrices A × B = C; each matching triple, e.g. (2, 1, 2), contributes +1 to an entry of C]
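A sketch of this correspondence with small made-up relations (not the slide's exact table): each matching triple (u, v, w) contributes +1 to an entry of C, so ‖C‖₁ equals the join size.

```python
import numpy as np

# Hypothetical relations R(U, V) and S(V, W) over the domain {0, ..., 4}.
R = [(1, 2), (2, 1), (2, 3)]
S = [(1, 4), (2, 2), (1, 1)]
n = 5

A = np.zeros((n, n), dtype=int)   # A[u, v] = 1 iff (u, v) in R
B = np.zeros((n, n), dtype=int)   # B[v, w] = 1 iff (v, w) in S
for u, v in R:
    A[u, v] = 1
for v, w in S:
    B[v, w] = 1

# Natural join R |><| S: triples (u, v, w) with (u, v) in R and (v, w) in S.
join = [(u, v, w) for (u, v) in R for (v2, w) in S if v == v2]
l1 = int((A @ B).sum())           # equals len(join)
```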




Statistics of Matrix Products: Heavy Hitters

  • Alice holds A ∈ {0, 1}^{m×n}, Bob holds B ∈ {0, 1}^{n×m}
  • Let C = A · B, and let HH^p_φ(C) = {(i, j) | C_{i,j} ≥ φ‖C‖_p}
    – ℓp-(φ, ε)-heavy-hitter (0 < ε ≤ φ ≤ 1): output a set S ⊆ {(i, j) | i, j ∈ [m]} such that HH^p_φ(C) ⊆ S ⊆ HH^p_{φ−ε}(C)
      (the pairs (A_i, B_j) that are similar; used for similarity join)

Both p-norm estimation and heavy-hitters are well studied in the data-stream literature, but much less so in the distributed model described earlier.
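A minimal sketch of the heavy-hitter set itself, computed exactly and centrally (any valid ℓp-(φ, ε)-heavy-hitter output S must contain this set):

```python
import numpy as np

def heavy_hitters(C, p, phi):
    """HH^p_phi(C) = {(i, j) : C[i, j] >= phi * ||C||_p}."""
    norm_p = float((np.abs(C) ** p).sum()) ** (1.0 / p)
    idx = np.argwhere(C >= phi * norm_p)
    return {(int(i), int(j)) for i, j in idx}

C = np.array([[4, 0],
              [1, 3]])
hh = heavy_hitters(C, p=2, phi=0.6)   # ||C||_2 = sqrt(26), threshold ~ 3.06
```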




Our Main Results – ℓp (p ∈ [0, 2])

(For simplicity, assume m = n.)

  • For any p ∈ [0, 2], a 2-round Õ(n/ε)-bit algorithm that approximates ‖AB‖_p within a (1 + ε) factor
    – For p = 0, this improves the previous Õ(n/ε²) bound (Van Gucht et al., PODS'15)
    – The same paper shows a lower bound of Ω(n/ε^{2/3})
    – If the communication is restricted to be one-way, then there is a lower bound of Ω(n/ε²)




Our Main Results – ℓ∞

  • O(1)-round algorithms that approximate ‖AB‖∞
    – within a factor of (2 + ε), using Õ(n^{1.5}/ε) bits
    – within a factor of κ, using Õ(n^{1.5}/κ) bits
  • Any algorithm (regardless of the number of rounds used) that approximates ‖AB‖∞
    – within a factor of 2 needs Ω(n²) bits
    – within a factor of κ ≥ 4 needs Ω(n^{1.5}/κ) bits
  • The above results hold for binary matrices A and B. For general matrices A ∈ Σ^{m×n}, B ∈ Σ^{n×m}, the bound is Θ̃(n²/κ²) bits (O(1) rounds for the upper bound; any number of rounds for the lower bound)




Our Main Results – Heavy Hitters

  • For binary matrices A and B, for any p ∈ (0, 2], an O(1)-round Õ(n + φ/ε²)-bit algorithm that computes ℓp-(φ, ε)-heavy-hitters
  • For general matrices A and B, for any p ∈ (0, 2], an O(1)-round Õ((√φ/ε) · n)-bit algorithm

All of the results above extend easily to rectangular matrices A ∈ Σ^{m×n} and B ∈ Σ^{n×m}.




Previous Results

  • Most relevant: Van Gucht et al. (PODS'15), which studies set-intersection/disjointness/equality/at-least-T joins in the 2-party communication model. The only overlap between Van Gucht et al. and this paper is the estimation of ‖AB‖₀ mentioned before.
  • A number of recent works look at distributed linear algebra problems (Balcan et al., KDD'16; Boutsidis et al., STOC'16; Woodruff & Zhong, ICDE'16; etc.). These works concern statistics estimation on C = A + B, compared with C = A · B studied in this paper.
  • Similar problems have been studied in the RAM model (Cohen & Lewis, J. Algorithms '99; Pagh, TOCT'13; etc.)


(1 + ε)-approximate ℓ0



(1 + ε)-approximate ℓ0

  • Alice holds A ∈ {0, 1}^{n×n}, Bob holds B ∈ {0, 1}^{n×n}
  • Let C = A · B. Goal: (1 + ε)-approximate ‖C‖₀

High-level idea:
  1. First compute a rough estimate of the number of non-zero entries in each row of C
  2. Use the rough estimates to partition the rows of C into groups s.t. rows in the same group have a similar number of non-zero entries
  3. Sample the rows in each group of C with a probability proportional to the (estimated) average number of non-zero entries of the group
  4. Use the sampled rows to estimate the number of non-zero entries of C

[Figure: A × B = C, each matrix n × n; the rows of C are partitioned into groups Gℓ]




(1 + ε)-approximate ℓ0 (cont.)

  • Alice holds A ∈ {0, 1}^{n×n}, Bob holds B ∈ {0, 1}^{n×n}
  • Let C = A · B. Goal: (1 + ε)-approximate ‖C‖₀
  • Algorithm. Set β = √ε, ρ = Θ(1/ε)
    1. Use the algorithm in VWWZ'15 to get a (1 + β)-approximation of the row norms ‖C_{i,∗}‖₀ (the approximations are denoted C̃)   [Õ(n/β²) = Õ(n/ε) bits]
    2. Partition the n rows of C̃ into L = O(log n/β) groups G₁, . . . , G_L, s.t. Gℓ contains all rows i ∈ [n] with (1 + β)^ℓ ≤ ‖C̃_{i,∗}‖₀ ≤ (1 + β)^{ℓ+1}
    3. For each group ℓ, Alice samples each row i ∈ Gℓ w.pr. pℓ = (ρ/‖C̃‖₀) · (Σ_{i∈Gℓ} ‖C̃_{i,∗}‖₀)/|Gℓ|. Let A′ be the matrix containing the sampled rows of A; Alice sends A′ to Bob   [Õ(n · ρ) = Õ(n/ε) bits]
    4. Bob computes C′ ← A′B and outputs Σ_{ℓ∈[L]} Σ_{sampled i∈Gℓ} (1/pℓ) ‖C′_{i,∗}‖₀
  • Correctness: expectation + variance estimation
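A single-machine sketch of steps 2-4, assuming (as a stand-in for step 1 and the VWWZ'15 sketch) that exact row counts ‖C_{i,∗}‖₀ are available:

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 60, 0.5
beta, rho = np.sqrt(eps), 10.0 / eps      # rho = Theta(1/eps)

A = (rng.random((n, n)) < 0.2).astype(int)
B = (rng.random((n, n)) < 0.2).astype(int)
C = A @ B

# Stand-in for step 1: exact row counts instead of the (1+beta)-estimates.
row_nnz = np.count_nonzero(C, axis=1).astype(float)
total = row_nnz.sum()

# Step 2: group rows on a (1 + beta)-geometric scale of their #non-zeros.
levels = np.floor(np.log(np.maximum(row_nnz, 1.0)) / np.log(1.0 + beta)).astype(int)

# Steps 3-4: sample rows per group, recompute their rows of C, rescale.
est = 0.0
for lev in np.unique(levels):
    G = np.flatnonzero(levels == lev)
    p = min(1.0, rho / total * row_nnz[G].mean())   # p_l
    sampled = G[rng.random(G.size) < p]
    est += np.count_nonzero(A[sampled] @ B, axis=1).sum() / p

true = np.count_nonzero(C)    # est is an unbiased estimator of this
```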


(2 + ε)-approximate ℓ∞



(2 + ε)-approximate ℓ∞

  • Alice holds A ∈ {0, 1}^{n×n}, Bob holds B ∈ {0, 1}^{n×n}
  • Let C = A · B. Goal: (2 + ε)-approximate ‖C‖∞

Subsampling scheme:
  A⁰ = A                                             C⁰ = A⁰ × B
  A¹ = subsample each entry of A w.pr. 1/(1 + ε)     C¹ = A¹ × B
  A² = subsample each entry of A w.pr. 1/(1 + ε)²    C² = A² × B
  ...

The idea: subsample C (via subsampling A) down to a level ℓ s.t. (1) ℓ is as large as possible, or equivalently, ℓ₁(C^ℓ) is as small as possible, and (2) ℓ∞(C^ℓ) · (1 + ε)^ℓ still approximates ℓ∞(C) well.


(2 + ε)-approximate ℓ∞ (cont.)

  • Alice holds A ∈ {0, 1}^{n×n}, Bob holds B ∈ {0, 1}^{n×n}
  • Let C = A · B. Goal: (2 + ε)-approximate ‖C‖∞

Algorithm. Set L = O(log n/ε), γ = Θ(log n/ε²)
  1. For ℓ = 0, 1, . . . , L, let C^ℓ ← A^ℓ B, where A^ℓ is obtained by sampling each '1' in A w.pr. pℓ = 1/(1 + ε)^ℓ
  2. Let ℓ* be the smallest ℓ for which ‖C^ℓ‖₁ ≤ γn²
  3. For each j ∈ [n]:
     (a) let uj = #'1's in the j-th column of A^{ℓ*}, and vj = #'1's in the j-th row of B
     (b) if uj ≤ vj, Alice sends the j-th column of A^{ℓ*} to Bob; otherwise Bob sends the j-th row of B to Alice
  4. Alice and Bob use the received information to compute matrices C_A and C_B respectively, s.t. C_A + C_B = C^{ℓ*}
  5. Output max{‖C_A‖∞/p_{ℓ*}, ‖C_B‖∞/p_{ℓ*}}
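A single-machine simulation sketch of the steps above (communication is only modeled by recording who would send which column/row; the constants behind γ and L are fixed arbitrarily for the demo):

```python
import numpy as np

rng = np.random.default_rng(2)
n, eps = 40, 0.5
gamma = 4 * np.log(n) / eps**2               # gamma = Theta(log n / eps^2)
L = int(np.ceil(2 * np.log(n) / np.log(1 + eps)))

A = (rng.random((n, n)) < 0.3).astype(int)
B = (rng.random((n, n)) < 0.3).astype(int)

# Steps 1-2: subsample the '1's of A at geometric rates; stop at the first
# level ell* whose product has l1 mass at most gamma * n^2.
for ell in range(L + 1):
    p = (1.0 + eps) ** (-ell)
    Al = A * (rng.random((n, n)) < p)        # A^ell
    Cl = Al @ B
    if Cl.sum() <= gamma * n**2:
        break

# Steps 3-4: split C^{ell*} = C_A + C_B according to the cheaper direction
# for each summation index j (column j of A^{ell*} vs row j of B).
u = Al.sum(axis=0)
v = B.sum(axis=1)
to_bob = u <= v                              # Alice sends these columns
CB = Al[:, to_bob] @ B[to_bob, :]            # computable by Bob
CA = Al[:, ~to_bob] @ B[~to_bob, :]          # computable by Alice

# Step 5: the output; the "simple fact" below bounds the extra factor by 2.
out = max(CA.max(), CB.max()) / p
```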

(2 + ε)-approximate ℓ∞ (cont.)

Correctness:
  • Lemma: With probability 1 − 1/n², ‖C^{ℓ*}‖∞/p_{ℓ*} approximates ‖C‖∞ within a factor of 1 + ε.
  • Simple fact: If C_A + C_B = C^{ℓ*} (entrywise, with non-negative entries), then ‖C^{ℓ*}‖∞/2 ≤ max{‖C_A‖∞, ‖C_B‖∞} ≤ ‖C^{ℓ*}‖∞, since each entry of C^{ℓ*} splits into two parts, one of which is at least half.
  • Put together: max{‖C_A‖∞/p_{ℓ*}, ‖C_B‖∞/p_{ℓ*}} approximates ‖C‖∞ within a factor of 2 + ε.



(2 + ε)-approximate ℓ∞ (cont.)

Communication cost. The bottleneck is step 3: for each j ∈ [n], (a) uj = #'1's in the j-th column of A^{ℓ*} and vj = #'1's in the j-th row of B; (b) if uj ≤ vj, Alice sends the j-th column of A^{ℓ*} to Bob, otherwise Bob sends the j-th row of B to Alice.

For each j ∈ [n], we analyze two cases:
  • If uj, vj > √n/ε, the number of such j is bounded by ‖C^{ℓ*}‖₁/(√n/ε)², and the communication cost can be bounded by Õ(n^{1.5}/ε).
  • If min{uj, vj} ≤ √n/ε, the communication is bounded by Σ_{j: min{uj,vj} ≤ √n/ε} min{uj, vj} ≤ n · (√n/ε) = n^{1.5}/ε.



Concluding Remarks

Main results:
  • The communication cost of (1 + ε)-approximating ℓp (p ∈ [0, 2]) with Σ = Z is Õ(n/ε).
    Open question: close the gap between this upper bound and the Ω(n/ε^{2/3}) lower bound.
  • The communication cost of (2 + ε)-approximating ℓ∞ with Σ = {0, 1} is Θ̃(n^{1.5}/ε); that of κ-approximating ℓ∞ with Σ = {0, 1} is Θ̃(n^{1.5}/κ).
  • The communication cost of ℓp-(φ, ε)-heavy-hitters with Σ = Z is Õ((√φ/ε) · n); that of ℓp-(φ, ε)-heavy-hitters with Σ = {0, 1} is Õ(n + φ/ε²).
    Open question: tight lower bounds?


Thank you! Questions?



A Lower Bound for ℓ∞

Set-disjointness: Alice holds x ∈ {0, 1}^{n²/4}, Bob holds y ∈ {0, 1}^{n²/4}. Question: ∃i, x[i] = y[i] = 1?

Theorem (Bar-Yossef et al., JCSS'04): Any randomized algorithm that solves the above set-disjointness problem w.pr. 0.51 needs Ω(n²) communication.


A Lower Bound for ℓ∞ (cont.)

A reduction from set-disjointness to ℓ∞(AB):
  1. Alice partitions x ∈ {0, 1}^{n²/4} into n/2 chunks of size n/2 each, and uses them as rows to construct A′ ∈ {0, 1}^{(n/2)×(n/2)}. Further, let A = [A′ | I].
  2. Similarly, Bob uses y ∈ {0, 1}^{n²/4} to construct B′ ∈ {0, 1}^{(n/2)×(n/2)}, and further lets B = [I ; B′] (the identity stacked on top of B′).
  3. We have ‖A · B‖∞ = ‖A′ + B′‖∞, which is 2 if x ∩ y ≠ ∅, and at most 1 otherwise.
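The reduction can be checked on a tiny instance (n = 4, so x and y have n²/4 = 4 bits each), since A · B = A′ · I + I · B′ = A′ + B′:

```python
import numpy as np

def reduce_to_linf(x, y, n):
    """Build A = [A' | I] and B = [I ; B'] from disjointness inputs x, y."""
    h = n // 2
    Ap = np.array(x, dtype=int).reshape(h, h)     # Alice's block A'
    Bp = np.array(y, dtype=int).reshape(h, h)     # Bob's block B'
    A = np.hstack([Ap, np.eye(h, dtype=int)])
    B = np.vstack([np.eye(h, dtype=int), Bp])
    return A @ B                                   # equals A' + B'

# Intersecting inputs (x[1] = y[1] = 1): the max entry of AB is 2.
C_int = reduce_to_linf([0, 1, 0, 0], [1, 1, 0, 0], n=4)
# Disjoint inputs: every entry of AB is at most 1.
C_dis = reduce_to_linf([0, 1, 0, 0], [1, 0, 0, 1], n=4)
```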