

SLIDE 1

The Input/Output Complexity of Sparse Matrix Multiplication

Rasmus Pagh, Morten Stöckel

IT University of Copenhagen September 9 2014

Pagh, Stöckel September 9 2014 1 / 29

SLIDE 2

Outline

Sparse matrix multiplication: Problem description
Upper bound: Size estimation · Partitioning · Outputting from partitions


SLIDE 3

Sparse matrix multiplication · Problem description

Overview

- Let A and C be matrices over a semiring R with N nonzero entries in total.
- The problem: Compute the matrix product [AC]_{i,j} = Σ_k A_{i,k} C_{k,j}, which has Z nonzero entries.
- Central result: Can be done in (for most of the parameter space) optimal Õ(N√Z / (B√M)) I/Os.


SLIDE 4

Sparse matrix multiplication · Problem description

Matrix multiplication, basics

[Figure: A (n rows × p columns, entries a_{ij}) times C (p rows × q columns, entries c_{ij}) equals AC = A×C (n rows × q columns, entries ac_{ij}).]
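The figure's definition translates directly into code. A minimal sketch in plain Python (ordinary + and × standing in for the semiring operations; this is the textbook algorithm, not the paper's I/O-efficient one):

```python
def matmul(A, C):
    """Textbook product: AC[i][j] = sum over k of A[i][k] * C[k][j]."""
    n, p = len(A), len(A[0])
    q = len(C[0])
    assert len(C) == p, "A's column count must equal C's row count"
    AC = [[0] * q for _ in range(n)]
    for i in range(n):          # n rows of A
        for j in range(q):      # q columns of C
            for k in range(p):  # inner dimension p
                AC[i][j] += A[i][k] * C[k][j]
    return AC
```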


SLIDE 5

Sparse matrix multiplication · Problem description

Matrix multiplication, basics

[Figure: as before, highlighting one output entry: ac_{22} = a_{21}×c_{12} + a_{22}×c_{22} + ... + a_{2p}×c_{p2}.]


SLIDE 6

Sparse matrix multiplication · Problem description

Cancellation of elementary products

We say that we have cancellation when two or more summands of [AC]_{i,j} = Σ_k A_{i,k} C_{k,j} are nonzero but the sum is zero, e.g. 2·3 + (−1)·6 + 0·4 = 0. Our algorithm handles such cases.

[Figure: as before, highlighting ac_{22} = a_{21}×c_{12} + a_{22}×c_{22} + ... + a_{2p}×c_{p2}.]
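Cancellation is easy to see in code. A small sketch over the integers (the helper name is illustrative):

```python
def output_entry(A_row, C_col):
    """One output entry [AC]_{i,j}: the sum of elementary products."""
    return sum(a * c for a, c in zip(A_row, C_col))

# Two nonzero elementary products (6 and -6) cancel: the output entry
# is zero even though not every summand is.
products = [2 * 3, (-1) * 6, 0 * 4]
entry = output_entry([2, -1, 0], [3, 6, 4])
```

An algorithm that merely counts nonzero elementary products would overestimate the output size here; the paper's algorithm must emit only entries whose sum is actually nonzero.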


SLIDE 7

Sparse matrix multiplication · Problem description

Motivation

Some applications:
- Computing determinants and inverses of matrices.
- Bioinformatics.
- Graphs: counting cycles, computing matchings.


SLIDE 8

Sparse matrix multiplication · Problem description

The semiring I/O model, 1

- A word is big enough to hold a matrix element plus its coordinates.
- Internal memory holds M words; the disk is of infinite size.
- One I/O: transfer B words from disk to internal memory.
- Cost of an algorithm: number of I/Os used.
- Operations allowed: semiring operations, copy and equality check.


SLIDE 9

Sparse matrix multiplication · Problem description

The semiring I/O model, 2

- We make no assumptions about cancellation.
- To produce output: must invoke emit(.) on every nonzero output entry exactly once.
- Matrices are of size U × U.
- Õ suppresses polylog factors in U and N.


SLIDE 10

Sparse matrix multiplication · Problem description

Our results, 1

- Let A and C be U × U matrices over a semiring R with N nonzero input and Z nonzero output entries. There exist algorithms 1 and 2 such that:
  1. emits the set of nonzero entries of AC with probability at least 1 − 1/U, using Õ(N√Z / (B√M)) I/Os.
  2. emits the set of nonzero entries of AC, and uses O(N²/(MB)) I/Os.
- Previous best [Amossen & Pagh, 09]: Õ(N√Z / (B·M^{1/8})) I/Os (boolean matrices ⇒ no cancellation).


SLIDE 11

Sparse matrix multiplication · Problem description

Our results, 2

- Let A and C be U × U matrices over a semiring R with N nonzero input and Z nonzero output entries. There exist algorithms 1 and 2 such that:
  1. emits the set of nonzero entries of AC with probability at least 1 − 1/U, using Õ(N√Z / (B√M)) I/Os.
  2. emits the set of nonzero entries of AC, and uses O(N²/(MB)) I/Os.
- There exist matrices that require Ω(min(N²/(MB), N√Z / (B√M))) I/Os to compute all nonzero entries of AC.


SLIDE 12

Upper bound · Size estimation

Output size estimation

Size estimation tool: Given matrices A and C with N nonzero entries, compute an ε-estimate of the number of nonzeroes of each column of AC using Õ(ε⁻³N/B) I/Os. Black box used [BBFJV,07]:

Fact: For a dense 1 × U vector y and a sparse U × U matrix S we can compute yS in O((nnz(S)/B) log_{M/B}(U/M)) = Õ(nnz(S)/B) I/Os.
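The black-box operation is, at heart, one pass over the nonzeros of S. A minimal in-memory sketch (the (row, col, value) triple representation and the function name are illustrative; the blocking/B-word transfers of the actual black box are not modeled):

```python
def dense_vec_times_sparse(y, S_entries, U):
    """Compute y @ S for dense y (length U) and sparse S given as
    (row, col, value) triples, touching each nonzero of S once."""
    out = [0] * U
    for i, j, v in S_entries:
        out[j] += y[i] * v
    return out
```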


SLIDE 13

Upper bound · Size estimation

Distinct elements and matrix size

- Distinct elements: Given a frequency vector x of size n where x_i denotes the number of times element i occurs, then F₀ = Σ_i |x_i|⁰.
- Fundamental problem in streaming: Estimate F₀ without materializing x.
- Observation: The number of distinct elements of AC is nnz(AC).


SLIDE 14

Upper bound · Size estimation

Linear distinct elements sketch, 1

Simple linear distinct elements sketch [Indyk slides, McGregor book]. Answers the question: for a picked T, is F₀ > (1 + ε)T?

1. Select sets S₁, ..., S_k of coordinates s.t. Pr[i ∈ S_j] = 1/T.
2. For each S_j: s_j(x) = Σ_{i∈S_j} x_i.
3. Answer yes if at most k/e of the s_j are zero.

Analysis: For one set S_j we have Pr[s_j = 0] = (1 − 1/T)^{F₀} ≈ e^{−F₀/T}. If F₀ > (1 + ε)T then Pr[s_j = 0] < 1/e − ε/3. Repeat for k = O(ε⁻² log δ⁻¹) independent sets to get probability 1 − δ.
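The three steps above can be sketched as follows. The function names are hypothetical, and the sets S_j are represented as 0/1 rows of a k × n matrix so that the bucket sums are visibly a linear map of x:

```python
import math
import random

def make_sets(n, T, k, seed=1):
    # Step 1: k random coordinate sets with Pr[i in S_j] = 1/T,
    # stored as 0/1 indicator rows.
    rng = random.Random(seed)
    return [[1 if rng.random() < 1.0 / T else 0 for _ in range(n)]
            for _ in range(k)]

def bucket_sums(sets, x):
    # Step 2: s_j(x) = sum of x_i over i in S_j -- a linear map of x.
    return [sum(m * xi for m, xi in zip(row, x)) for row in sets]

def guess_F0_large(sums, k):
    # Step 3: answer "F0 > (1+eps)T" iff at most k/e buckets are zero.
    return sum(1 for s in sums if s == 0) <= k / math.e
```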


SLIDE 15

Upper bound · Size estimation

Linear distinct elements sketch, 2

- Can answer whether F₀ > (1 + ε)T for one T.
- Repeat for T = 1, (1 + ε), (1 + ε)², ..., n, i.e. O(ε⁻¹ log n) values.
- Total space: O(ε⁻³ log n log δ⁻¹).
- Note: The random sets S_j form a k × n projection matrix F, and we maintain Fx.
- Linearity: F(x + e_i) = Fx + Fe_i.


SLIDE 16

Upper bound · Size estimation

Output estimation

F is (ε⁻² log δ⁻¹) × U. A and C are U × U. To get the size estimate we must compute:

F × A × C


SLIDE 17

Upper bound · Size estimation

Output estimation

F is (ε⁻² log δ⁻¹) × U. A and C are U × U. To get the size estimate we must compute:

(F × A) × C

Due to associativity: pick the cheap order. Analysis: ε⁻² log δ⁻¹ invocations of the dense-vector sparse-matrix black box: Õ(ε⁻³N/B) I/Os. Note: works with cancellation, contrary to previous size estimation.
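Why the parenthesization matters can be seen by counting scalar multiplications in the dense case; the helpers below are a hypothetical illustration (k is F's small row count, and the sparse black-box costs are not modeled):

```python
def cost_left(k, U):
    # (F x A) x C: F has only k rows, so both products are k x U x U.
    return k * U * U + k * U * U

def cost_right(k, U):
    # F x (A x C): the inner product is a full U x U x U multiplication.
    return U * U * U + k * U * U
```

For k much smaller than U, the left association is cheaper by roughly a factor U/k; the same phenomenon, with sparse-matrix costs, is what makes the estimation cheap.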


SLIDE 18

Upper bound · Partitioning

Matrix mult partitioning, 1

[Figure: the product A × C.]



SLIDE 20

Upper bound · Partitioning

Matrix mult partitioning, 2

[Figure: A × C written as a sum of four block products.]


SLIDE 21

Upper bound · Partitioning

Partitioning the matrices

- What we want: Split the matrices into disjoint colored groups s.t. every color combination has at most M nonzero output entries.
- Problem: Can't be done.
- Instead: Color the rows of A using c colors. For each of the c groups of rows, do an independent coloring of the columns of C with c colors.

[Figure: two block products illustrating the coloring.]


SLIDE 22

Upper bound · Partitioning

Partitioning the matrices, 2

Overview of how to partition matrices A and C:

1. Pick the number of colors c = √(nnz(AC) log U / M) + O(1).
2. Recurse: Split A into A₁ and A₂ such that nnz(A₁C) ≈ nnz(AC)/2 and nnz(A₂C) ≈ nnz(AC)/2.
3. After log c + O(1) recursive levels we have O(c) disjoint colored groups of rows of A.
4. For each of those groups: Repeat the procedure for the columns of C.
5. The key point: O(c²) problems of size nnz(AC)/c² = O(M/log U).
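The parameter choice in step 1 and the bookkeeping in step 5 can be sketched numerically; `plan_partition` is a hypothetical helper, with the additive O(1) term taken as 1 and log meaning log₂:

```python
import math

def plan_partition(nnz_AC, U, M):
    # Step 1: c = sqrt(nnz(AC) * log U / M) + O(1).
    c = int(math.sqrt(nnz_AC * math.log2(U) / M)) + 1
    # Step 5: O(c^2) subproblems of size about nnz(AC)/c^2 = O(M / log U).
    subproblems = c * c
    per_subproblem = nnz_AC / subproblems
    return c, subproblems, per_subproblem
```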


SLIDE 23

Upper bound · Partitioning

Getting the correct subproblem size

Say we can do splits of A into A₁, A₂ s.t.

1. nnz(A₁C) ∈ [(1 − log⁻¹ U) nnz(AC)/2, (1 + log⁻¹ U) nnz(AC)/2].
2. nnz(A₂C) ∈ [(1 − log⁻¹ U) nnz(AC)/2, (1 + log⁻¹ U) nnz(AC)/2].

Assume the biggest possible positive error: after q recursions we have a problem of output size nnz(AC)·(1/2 + 1/(2 log U))^q. Then after log c² + O(1) recursions:

nnz(AC)·(1/2 + 1/(2 log U))^{log c²} ≤ nnz(AC)·2^{−log c²}·e^{log c² / log U} ≤ nnz(AC)·O(1)/c² = O(M/log U)
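The bound above can be checked numerically. A sketch (the function name is illustrative, the O(1) in c is taken as 1, and log means log₂):

```python
import math

def shrink_after_recursion(nnz_AC, U, M):
    # Worst-case output size left after log(c^2) splits, each losing
    # only a factor (1/2 + 1/(2 log U)) instead of an exact 1/2.
    logU = math.log2(U)
    c = math.sqrt(nnz_AC * logU / M) + 1
    q = math.log2(c * c)
    return nnz_AC * (0.5 + 1 / (2 * logU)) ** q
```

With nnz(AC) = 2²⁰, U = 2¹⁶ and M = 2¹⁰ the result is within a small constant factor of M/log U = 64, as the slide's calculation predicts.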


SLIDE 24

Upper bound · Partitioning

How to compute the split

How to do relative-error 1/log U splits: Use the size estimation tool: for any set r of rows we have access to ẑ_i's s.t.

(1 − log⁻¹ U)·nnz(Σ_{i∈r} [AC]_{i*}) ≤ Σ_{i∈r} ẑ_i ≤ (1 + log⁻¹ U)·nnz(Σ_{i∈r} [AC]_{i*}).

Splitting A into A₁ and A₂:

1. Let Ẑ = Σ_i ẑ_i.
2. Add rows from A to A₁ until Σ_{i∈A₁} ẑ_i ≥ Ẑ/2.
3. The row y that overflows A₁: Compute y × C directly.
4. Add the remaining rows to A₂.
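The four steps translate into one greedy pass over the rows. A sketch assuming the estimates ẑ_i arrive as a list `z_hat` indexed by row id (a hypothetical interface):

```python
def split_rows(rows, z_hat):
    """Split rows of A by estimated output mass.
    Returns (A1, overflow_row, A2); the overflow row's product with C
    is computed directly rather than assigned to either half."""
    Z = sum(z_hat[i] for i in rows)        # step 1: Z-hat
    A1, acc = [], 0
    overflow = None
    it = iter(rows)
    for i in it:
        if acc + z_hat[i] <= Z / 2:        # step 2: fill A1 up to ~Z-hat/2
            A1.append(i)
            acc += z_hat[i]
        else:
            overflow = i                   # step 3: handled directly
            break
    A2 = list(it)                          # step 4: the rest goes to A2
    return A1, overflow, A2
```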


SLIDE 25

Upper bound · Partitioning

I/O cost of splitting

I/O cost:
- Initial size estimation: Õ(N/B).
- Partition A: c dense-vector-sparse-matrix products: Õ(cN/B).
- For each of the c A-partitions: size estimation totalling Õ(N/B) and in total c DVSM products of Õ(cN/B).
- Total: Õ(cN/B) = Õ(N√(nnz(AC)) / (B√M)) since c = √(nnz(AC) log U / M).


SLIDE 26

Upper bound · Outputting from partitions

Are we done?

[Figure: two block products from the partitioning, as before.]


SLIDE 27

Upper bound · Outputting from partitions

Status

- Where we are: We have c² = nnz(AC) log U / M subproblems, each with output ≤ M/log U.
- Central cancellation difficulty: Intermediate results can be much larger than M.
- Our I/O aim: Õ(cN/B), hence we can't pay for those cancelling inner products.
- Solution: Compute a particular polynomial and allow polynomially small error probability.


SLIDE 28

Upper bound · Outputting from partitions

Compressed matrix mult sketches, 1

Idea from [Pagh,12]. Computes each outer product as a polynomial of degree ≈ M/log U.

- Let r = 4αM/log U for a small constant α.
- Let h_t, h'_t : [U] → [r] be random hash functions.
- The polynomial:

p_t(x) = Σ_{k=1}^{U} (Σ_{i=1}^{U} A_{i,k} x^{h_t(i)}) (Σ_{j=1}^{U} C_{k,j} x^{h'_t(j)})

- Can be computed in space 4r.


SLIDE 29

Upper bound · Outputting from partitions

Compressed matrix mult sketches, 2

p_t(x) = Σ_{k=1}^{U} (Σ_{i=1}^{U} A_{i,k} x^{h_t(i)}) (Σ_{j=1}^{U} C_{k,j} x^{h'_t(j)})

- Σ_{i=1}^{U} A_{i,k} x^{h_t(i)} can be computed in one pass over column k of A using space r.
- Multiply and add to the sum of products: additional 2r space.
- Central observation: The coefficient of x^{h_t(i)+h'_t(j)} is [AC]_{i,j} with constant probability.
- Left to do: Extract the coordinates of the coefficients and get high probability.
- Can be done with O(log U) repetitions [Pagh, 2012].
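For a single k, the two inner sums and their product can be sketched directly. A minimal version (sparse (index, value) inputs, hash functions passed as callables, all names illustrative; the naive coefficient-by-coefficient multiplication here stands in for the FFT-based multiplication used in [Pagh, 2012]):

```python
def outer_term(A_col_k, C_row_k, h, hp, r):
    """One term of p_t(x): (sum_i A[i][k] x^{h(i)}) * (sum_j C[k][j] x^{hp(j)}).
    Returns the coefficient list of the product polynomial."""
    pa = [0] * r                 # degree-below-r polynomial, space r
    for i, a in A_col_k:         # one pass over column k of A
        pa[h(i)] += a
    pc = [0] * r
    for j, c in C_row_k:         # one pass over row k of C
        pc[hp(j)] += c
    prod = [0] * (2 * r - 1)
    for d1, a in enumerate(pa):  # coefficient of x^{h(i)+hp(j)} accumulates
        if a:                    # the elementary product A[i][k]*C[k][j]
            for d2, c in enumerate(pc):
                if c:
                    prod[d1 + d2] += a * c
    return prod
```

Summing `outer_term` over all k gives the coefficients of p_t(x); an output entry [AC]_{i,j} then sits (with constant probability, absent collisions) at degree h(i) + hp(j).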


SLIDE 30

Upper bound · Outputting from partitions

Algorithm summary

I/O cost of the steps taken:
- Initial size estimation: Õ(N/B).
- Partition into c² problems with output ≤ M/log U: Õ(cN/B).
- Compute and emit(.) all subproblems: Õ(cN/B).
- Total: Õ(cN/B) = Õ(N√(nnz(AC)) / (B√M)) since c = √(nnz(AC) log U / M).


SLIDE 31

Upper bound · Outputting from partitions

Concluding remarks

In the article:
- Size estimation: Supports cancellation and uses Õ(ε⁻³N/B) I/Os.
- Algorithm 1: Õ(N√Z / (B√M)) I/Os.
- Algorithm 2: O(N²/(MB)) I/Os.
- Lower bound: Ω(min(N²/(MB), N√Z / (B√M))) I/Os.

Open: Remove Monte Carlo (and log factors).


SLIDE 32

The Input/Output Complexity of Sparse Matrix Multiplication

Rasmus Pagh, Morten Stöckel

IT University of Copenhagen September 9 2014
