Distributed Statistical Estimation of Matrix Products with Applications

David Woodruff (CMU)    Qin Zhang (IUB)

PODS 2018, June 2018

The Distributed Computation Model

  • Alice holds A ∈ {0, 1}^{m×n}, Bob holds B ∈ {0, 1}^{n×m}
  • Alice and Bob want to compute some function of C = A × B (p-norms, heavy hitters, . . . ) with the minimum amount of communication and number of rounds
  • The protocol can fail with prob. 0.01 (over its randomness)
  • Communication: sum of message lengths (maximized over all choices of A, B, and the randomness)
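To make the cost measure concrete, here is a minimal sketch of a trivial baseline (not a protocol from the paper): Alice sends all of A, and Bob computes C exactly. Its mn-bit cost is the bound the paper's algorithms aim to beat.

```python
import numpy as np

# Trivial baseline protocol (hypothetical, for illustration only):
# Alice sends her entire 0/1 matrix A; Bob computes C = A x B locally.
def naive_protocol(A, B):
    bits_sent = A.size           # one bit per entry of A
    C = A @ B                    # Bob's local computation
    return C, bits_sent

rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(4, 5))   # A in {0,1}^{m x n}, m = 4, n = 5
B = rng.integers(0, 2, size=(5, 4))   # B in {0,1}^{n x m}
C, bits = naive_protocol(A, B)        # bits == m * n == 20
```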





Statistics of Matrix Products: p-Norms

  • Alice holds A ∈ {0, 1}^{m×n}, Bob holds B ∈ {0, 1}^{n×m}
  • Let C = A · B. Alice and Bob want to approximate
    ‖C‖_p = (Σ_{i,j∈[m]} |C_{i,j}|^p)^{1/p}
    – p = 0: number of non-zero entries of C ⇒ size of set-intersection join
      (view the i-th row of A as a set A_i and the j-th column of B as a set B_j; compute #(i, j) s.t. A_i ∩ B_j ≠ ∅)
    – p = 1: sum of entries of C ⇒ size of the corresponding natural join
      (compute #(i, k, j) s.t. k ∈ A_i ∩ B_j)
    – p = ∞: maximum entry of C ⇒ most "similar" (A_i, B_j) pair
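A minimal sketch of the three statistics, computed exactly and centrally on a tiny example (the paper's goal is to approximate these with little communication):

```python
import numpy as np

A = np.array([[1, 0, 1],
              [0, 1, 1]])       # row A_i = set of indices k with A[i, k] = 1
B = np.array([[1, 0],
              [1, 1],
              [0, 1]])          # column B_j = set of indices k with B[k, j] = 1
C = A @ B                       # C[i, j] = |A_i ∩ B_j|

l0 = np.count_nonzero(C)        # p = 0: size of the set-intersection join
l1 = int(C.sum())               # p = 1: size of the natural join
linf = int(C.max())             # p = infinity: most "similar" (A_i, B_j) pair
```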


Application of set-intersection join

Applicant (Skills) ⋈ (Skills) Opening:

  A1: S1, S4, S9, S13      B1: S2, S3, S4
  A2: S2, S9, S10          B2: S3, S4, S9, S11
  ...                      ...
  Am: S6, S7, S8, S15      Bm: S4, S8

Find all candidate (Applicant, Opening) pairs
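The same join, sketched with the slide's skill sets as plain Python sets (only the first two applicants and openings are used):

```python
# Set-intersection join on the slide's example (first two rows of each side).
applicants = {"A1": {"S1", "S4", "S9", "S13"},
              "A2": {"S2", "S9", "S10"}}
openings = {"B1": {"S2", "S3", "S4"},
            "B2": {"S3", "S4", "S9", "S11"}}

# A pair is a candidate iff the skill sets intersect; the number of such
# pairs is exactly ||A x B||_0 for the corresponding 0/1 matrices.
candidates = sorted((a, b) for a, sa in applicants.items()
                    for b, sb in openings.items() if sa & sb)
```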



‖C‖₁ corresponds to the natural join

[Figure: relations (U, V) ⋈ (V, W) and the corresponding 0/1 matrices A × B = C; each matching triple, e.g. (2, 1, 2), contributes +1 to an entry of C]
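A sketch of this correspondence with small made-up relations (not the slide's exact table): each matching triple (u, v, w) contributes +1 to an entry of C, so ‖C‖₁ equals the join size.

```python
import numpy as np

# Hypothetical relations R(U, V) and S(V, W) over the domain {0, ..., 4}.
R = [(1, 2), (2, 1), (2, 3)]
S = [(1, 4), (2, 2), (1, 1)]
n = 5

A = np.zeros((n, n), dtype=int)   # A[u, v] = 1 iff (u, v) in R
B = np.zeros((n, n), dtype=int)   # B[v, w] = 1 iff (v, w) in S
for u, v in R:
    A[u, v] = 1
for v, w in S:
    B[v, w] = 1

# Natural join R |><| S: triples (u, v, w) with (u, v) in R and (v, w) in S.
join = [(u, v, w) for (u, v) in R for (v2, w) in S if v == v2]
l1 = int((A @ B).sum())           # equals len(join)
```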




Statistics of Matrix Products: Heavy Hitters

  • Alice holds A ∈ {0, 1}^{m×n}, Bob holds B ∈ {0, 1}^{n×m}
  • Let C = A · B, and let HH^p_φ(C) = {(i, j) | C_{i,j} ≥ φ‖C‖_p}
    – ℓp-(φ, ε)-heavy-hitter (0 < ε ≤ φ ≤ 1): output a set S ⊆ {(i, j) | i, j ∈ [m]} such that HH^p_φ(C) ⊆ S ⊆ HH^p_{φ−ε}(C)
      (the pairs (A_i, B_j) that are similar; used for similarity join)

Both p-norm estimation and heavy-hitters are well studied in the data-stream literature, but much less so in the distributed model described earlier.
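A minimal sketch of the heavy-hitter set itself, computed exactly and centrally (any valid ℓp-(φ, ε)-heavy-hitter output S must contain this set):

```python
import numpy as np

def heavy_hitters(C, p, phi):
    """HH^p_phi(C) = {(i, j) : C[i, j] >= phi * ||C||_p}."""
    norm_p = float((np.abs(C) ** p).sum()) ** (1.0 / p)
    idx = np.argwhere(C >= phi * norm_p)
    return {(int(i), int(j)) for i, j in idx}

C = np.array([[4, 0],
              [1, 3]])
hh = heavy_hitters(C, p=2, phi=0.6)   # ||C||_2 = sqrt(26), threshold ~ 3.06
```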




Our Main Results – ℓp (p ∈ [0, 2])

(For simplicity, assume m = n.)

  • For any p ∈ [0, 2], a 2-round Õ(n/ε)-bit algorithm that approximates ‖AB‖_p within a (1 + ε) factor
    – For p = 0, this improves the previous Õ(n/ε²) bound (Van Gucht et al., PODS'15)
    – The same paper shows a lower bound of Ω(n/ε^{2/3})
    – If the communication is restricted to be one-way, then there is a lower bound of Ω(n/ε²)




Our Main Results – ℓ∞

  • O(1)-round algorithms that approximate ‖AB‖∞
    – within a factor of (2 + ε), using Õ(n^{1.5}/ε) bits
    – within a factor of κ, using Õ(n^{1.5}/κ) bits
  • Any algorithm (regardless of the number of rounds used) that approximates ‖AB‖∞
    – within a factor of 2 needs Ω(n²) bits
    – within a factor of κ ≥ 4 needs Ω(n^{1.5}/κ) bits
  • The above results hold for binary matrices A and B. For general matrices A ∈ Σ^{m×n}, B ∈ Σ^{n×m}, the bound is Θ̃(n²/κ²) bits (O(1) rounds for the upper bound; any number of rounds for the lower bound)




Our Main Results – Heavy Hitters

  • For binary matrices A and B, for any p ∈ (0, 2], an O(1)-round Õ(n + φ/ε²)-bit algorithm that computes ℓp-(φ, ε)-heavy-hitters
  • For general matrices A and B, for any p ∈ (0, 2], an O(1)-round Õ((√φ/ε) · n)-bit algorithm

All of the results above extend easily to rectangular matrices A ∈ Σ^{m×n} and B ∈ Σ^{n×m}.




Previous Results

  • Most relevant: Van Gucht et al. (PODS'15), which studies set-intersection/disjointness/equality/at-least-T joins in the 2-party communication model. The only overlap between Van Gucht et al. and this paper is the estimation of ‖AB‖₀ mentioned before.
  • A number of recent works look at distributed linear algebra problems (Balcan et al., KDD'16; Boutsidis et al., STOC'16; Woodruff & Zhong, ICDE'16; etc.). These works concern statistics estimation on C = A + B, compared with C = A · B studied in this paper.
  • Similar problems have been studied in the RAM model (Cohen & Lewis, J. Algorithms '99; Pagh, TOCT'13; etc.)


(1 + ε)-approximate ℓ0



(1 + ε)-approximate ℓ0

  • Alice holds A ∈ {0, 1}^{n×n}, Bob holds B ∈ {0, 1}^{n×n}
  • Let C = A · B. Goal: (1 + ε)-approximate ‖C‖₀

High-level idea:
  1. First compute a rough estimate of the number of non-zero entries in each row of C
  2. Use the rough estimates to partition the rows of C into groups s.t. rows in the same group have a similar number of non-zero entries
  3. Sample the rows in each group of C with a probability proportional to the (estimated) average number of non-zero entries of the group
  4. Use the sampled rows to estimate the number of non-zero entries of C

[Figure: A × B = C, each matrix n × n; the rows of C are partitioned into groups Gℓ]




(1 + ε)-approximate ℓ0 (cont.)

  • Alice holds A ∈ {0, 1}^{n×n}, Bob holds B ∈ {0, 1}^{n×n}
  • Let C = A · B. Goal: (1 + ε)-approximate ‖C‖₀
  • Algorithm. Set β = √ε, ρ = Θ(1/ε)
    1. Use the algorithm in VWWZ'15 to get a (1 + β)-approximation of the row norms ‖C_{i,∗}‖₀ (the approximations are denoted C̃)   [Õ(n/β²) = Õ(n/ε) bits]
    2. Partition the n rows of C̃ into L = O(log n/β) groups G₁, . . . , G_L, s.t. Gℓ contains all rows i ∈ [n] with (1 + β)^ℓ ≤ ‖C̃_{i,∗}‖₀ ≤ (1 + β)^{ℓ+1}
    3. For each group ℓ, Alice samples each row i ∈ Gℓ w.pr. pℓ = (ρ/‖C̃‖₀) · (Σ_{i∈Gℓ} ‖C̃_{i,∗}‖₀)/|Gℓ|. Let A′ be the matrix containing the sampled rows of A; Alice sends A′ to Bob   [Õ(n · ρ) = Õ(n/ε) bits]
    4. Bob computes C′ ← A′B and outputs Σ_{ℓ∈[L]} Σ_{sampled i∈Gℓ} (1/pℓ) ‖C′_{i,∗}‖₀
  • Correctness: expectation + variance estimation
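A single-machine sketch of steps 2-4, assuming (as a stand-in for step 1 and the VWWZ'15 sketch) that exact row counts ‖C_{i,∗}‖₀ are available:

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 60, 0.5
beta, rho = np.sqrt(eps), 10.0 / eps      # rho = Theta(1/eps)

A = (rng.random((n, n)) < 0.2).astype(int)
B = (rng.random((n, n)) < 0.2).astype(int)
C = A @ B

# Stand-in for step 1: exact row counts instead of the (1+beta)-estimates.
row_nnz = np.count_nonzero(C, axis=1).astype(float)
total = row_nnz.sum()

# Step 2: group rows on a (1 + beta)-geometric scale of their #non-zeros.
levels = np.floor(np.log(np.maximum(row_nnz, 1.0)) / np.log(1.0 + beta)).astype(int)

# Steps 3-4: sample rows per group, recompute their rows of C, rescale.
est = 0.0
for lev in np.unique(levels):
    G = np.flatnonzero(levels == lev)
    p = min(1.0, rho / total * row_nnz[G].mean())   # p_l
    sampled = G[rng.random(G.size) < p]
    est += np.count_nonzero(A[sampled] @ B, axis=1).sum() / p

true = np.count_nonzero(C)    # est is an unbiased estimator of this
```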


(2 + ε)-approximate ℓ∞



(2 + ε)-approximate ℓ∞

  • Alice holds A ∈ {0, 1}^{n×n}, Bob holds B ∈ {0, 1}^{n×n}
  • Let C = A · B. Goal: (2 + ε)-approximate ‖C‖∞

Subsampling scheme:
  A⁰ = A                                             C⁰ = A⁰ × B
  A¹ = subsample each entry of A w.pr. 1/(1 + ε)     C¹ = A¹ × B
  A² = subsample each entry of A w.pr. 1/(1 + ε)²    C² = A² × B
  ...

The idea: subsample C (via subsampling A) down to a level ℓ s.t. (1) ℓ is as large as possible, or equivalently, ℓ₁(C^ℓ) is as small as possible, and (2) ℓ∞(C^ℓ) · (1 + ε)^ℓ still approximates ℓ∞(C) well.


(2 + ε)-approximate ℓ∞ (cont.)

  • Alice holds A ∈ {0, 1}^{n×n}, Bob holds B ∈ {0, 1}^{n×n}
  • Let C = A · B. Goal: (2 + ε)-approximate ‖C‖∞

Algorithm. Set L = O(log n/ε), γ = Θ(log n/ε²)
  1. For ℓ = 0, 1, . . . , L, let C^ℓ ← A^ℓ B, where A^ℓ is obtained by sampling each '1' in A w.pr. pℓ = 1/(1 + ε)^ℓ
  2. Let ℓ* be the smallest ℓ for which ‖C^ℓ‖₁ ≤ γn²
  3. For each j ∈ [n]:
     (a) let uj = #'1's in the j-th column of A^{ℓ*}, and vj = #'1's in the j-th row of B
     (b) if uj ≤ vj, Alice sends the j-th column of A^{ℓ*} to Bob; otherwise Bob sends the j-th row of B to Alice
  4. Alice and Bob use the received information to compute matrices C_A and C_B respectively, s.t. C_A + C_B = C^{ℓ*}
  5. Output max{‖C_A‖∞/p_{ℓ*}, ‖C_B‖∞/p_{ℓ*}}
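A single-machine simulation sketch of the steps above (communication is only modeled by recording who would send which column/row; the constants behind γ and L are fixed arbitrarily for the demo):

```python
import numpy as np

rng = np.random.default_rng(2)
n, eps = 40, 0.5
gamma = 4 * np.log(n) / eps**2               # gamma = Theta(log n / eps^2)
L = int(np.ceil(2 * np.log(n) / np.log(1 + eps)))

A = (rng.random((n, n)) < 0.3).astype(int)
B = (rng.random((n, n)) < 0.3).astype(int)

# Steps 1-2: subsample the '1's of A at geometric rates; stop at the first
# level ell* whose product has l1 mass at most gamma * n^2.
for ell in range(L + 1):
    p = (1.0 + eps) ** (-ell)
    Al = A * (rng.random((n, n)) < p)        # A^ell
    Cl = Al @ B
    if Cl.sum() <= gamma * n**2:
        break

# Steps 3-4: split C^{ell*} = C_A + C_B according to the cheaper direction
# for each summation index j (column j of A^{ell*} vs row j of B).
u = Al.sum(axis=0)
v = B.sum(axis=1)
to_bob = u <= v                              # Alice sends these columns
CB = Al[:, to_bob] @ B[to_bob, :]            # computable by Bob
CA = Al[:, ~to_bob] @ B[~to_bob, :]          # computable by Alice

# Step 5: the output; the "simple fact" below bounds the extra factor by 2.
out = max(CA.max(), CB.max()) / p
```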

(2 + ε)-approximate ℓ∞ (cont.)

Correctness:
  • Lemma: With probability 1 − 1/n², ‖C^{ℓ*}‖∞/p_{ℓ*} approximates ‖C‖∞ within a factor of 1 + ε.
  • Simple fact: If C_A + C_B = C^{ℓ*} (entrywise, with non-negative entries), then ‖C^{ℓ*}‖∞/2 ≤ max{‖C_A‖∞, ‖C_B‖∞} ≤ ‖C^{ℓ*}‖∞, since each entry of C^{ℓ*} splits into two parts, one of which is at least half.
  • Put together: max{‖C_A‖∞/p_{ℓ*}, ‖C_B‖∞/p_{ℓ*}} approximates ‖C‖∞ within a factor of 2 + ε.



(2 + ε)-approximate ℓ∞ (cont.)

Communication cost. The bottleneck is step 3: for each j ∈ [n], (a) uj = #'1's in the j-th column of A^{ℓ*} and vj = #'1's in the j-th row of B; (b) if uj ≤ vj, Alice sends the j-th column of A^{ℓ*} to Bob, otherwise Bob sends the j-th row of B to Alice.

For each j ∈ [n], we analyze two cases:
  • If uj, vj > √n/ε, the number of such j is bounded by ‖C^{ℓ*}‖₁/(√n/ε)², and the communication cost can be bounded by Õ(n^{1.5}/ε).
  • If min{uj, vj} ≤ √n/ε, the communication is bounded by Σ_{j: min{uj,vj} ≤ √n/ε} min{uj, vj} ≤ n · (√n/ε) = n^{1.5}/ε.



Concluding Remarks

Main results:
  • The communication cost of (1 + ε)-approximating ℓp (p ∈ [0, 2]) with Σ = Z is Õ(n/ε).
    Open question: close the gap between this upper bound and the Ω(n/ε^{2/3}) lower bound.
  • The communication cost of (2 + ε)-approximating ℓ∞ with Σ = {0, 1} is Θ̃(n^{1.5}/ε); that of κ-approximating ℓ∞ with Σ = {0, 1} is Θ̃(n^{1.5}/κ).
  • The communication cost of ℓp-(φ, ε)-heavy-hitters with Σ = Z is Õ((√φ/ε) · n); that of ℓp-(φ, ε)-heavy-hitters with Σ = {0, 1} is Õ(n + φ/ε²).
    Open question: tight lower bounds?


Thank you! Questions?



A Lower Bound for ℓ∞

Set-disjointness: Alice holds x ∈ {0, 1}^{n²/4}, Bob holds y ∈ {0, 1}^{n²/4}. Question: ∃i, x[i] = y[i] = 1?

Theorem (Bar-Yossef et al., JCSS'04): Any randomized algorithm that solves the above set-disjointness problem w.pr. 0.51 needs Ω(n²) communication.


A Lower Bound for ℓ∞ (cont.)

A reduction from set-disjointness to ℓ∞(AB):
  1. Alice partitions x ∈ {0, 1}^{n²/4} into n/2 chunks of size n/2 each, and uses them as rows to construct A′ ∈ {0, 1}^{(n/2)×(n/2)}. Further, let A = [A′ | I].
  2. Similarly, Bob uses y ∈ {0, 1}^{n²/4} to construct B′ ∈ {0, 1}^{(n/2)×(n/2)}, and further lets B = [I ; B′] (the identity stacked on top of B′).
  3. We have ‖A · B‖∞ = ‖A′ + B′‖∞, which is 2 if x ∩ y ≠ ∅, and at most 1 otherwise.
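The reduction can be checked on a tiny instance (n = 4, so x and y have n²/4 = 4 bits each), since A · B = A′ · I + I · B′ = A′ + B′:

```python
import numpy as np

def reduce_to_linf(x, y, n):
    """Build A = [A' | I] and B = [I ; B'] from disjointness inputs x, y."""
    h = n // 2
    Ap = np.array(x, dtype=int).reshape(h, h)     # Alice's block A'
    Bp = np.array(y, dtype=int).reshape(h, h)     # Bob's block B'
    A = np.hstack([Ap, np.eye(h, dtype=int)])
    B = np.vstack([np.eye(h, dtype=int), Bp])
    return A @ B                                   # equals A' + B'

# Intersecting inputs (x[1] = y[1] = 1): the max entry of AB is 2.
C_int = reduce_to_linf([0, 1, 0, 0], [1, 1, 0, 0], n=4)
# Disjoint inputs: every entry of AB is at most 1.
C_dis = reduce_to_linf([0, 1, 0, 0], [1, 0, 0, 1], n=4)
```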