The Input/Output Complexity of Sparse Matrix Multiplication
Rasmus Pagh, Morten Stöckel
IT University of Copenhagen
September 9 2014
Outline

- Sparse matrix multiplication: Problem description
- Upper bound: Size estimation; Partitioning; Outputting from partitions
Sparse matrix multiplication: Problem description

- Let A and C be matrices over a semiring R with N nonzero entries in total.
- The problem: compute the matrix product [AC]_{i,j} = Σ_k A_{i,k} C_{k,j}, which has Z nonzero entries.
- Central result: this can be done, for most of the parameter space, in an optimal Õ(N√Z/(B√M)) I/Os.
Sparse matrix multiplication: Problem description

Figure: the product A × C = AC, where A has n rows and p columns, C has p rows and q columns, and AC has n rows and q columns.
Sparse matrix multiplication: Problem description

Figure: the same product, highlighting how one output entry is formed: [AC]_{2,2} = a_{2,1} × c_{1,2} + a_{2,2} × c_{2,2} + ... + a_{2,p} × c_{p,2}.
Sparse matrix multiplication: Problem description

We say that we have cancellation when two or more summands of [AC]_{i,j} = Σ_k A_{i,k} C_{k,j} are nonzero but the sum is zero, e.g. 2 · 3 + (−1) · 6 + 0 · 4 = 0. Our algorithm handles such cases.

Figure: the product A × C = AC as before, with the summands of [AC]_{2,2} highlighted.
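As a concrete illustration (not on the slides), a tiny integer product where a structurally nonzero output entry cancels to zero:

```python
import numpy as np

# A and C are chosen so that the single entry of AC is a sum of
# nonzero summands that cancels: 2*3 + (-1)*6 + 0*4 = 0.
A = np.array([[2, -1, 0]])
C = np.array([[3], [6], [4]])

AC = A @ C
print(AC)  # [[0]] -- the summand pattern is nonzero, yet the entry is zero
```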
Sparse matrix multiplication: Problem description

Some applications:

- Computing determinants and inverses of matrices.
- Bioinformatics.
- Graphs: counting cycles, computing matchings.
Sparse matrix multiplication: Problem description

- A word is big enough to hold a matrix element plus its coordinates.
- Internal memory holds M words; the disk has infinite size.
- One I/O: transfer B words between disk and internal memory.
- Cost of an algorithm: the number of I/Os used.
- Operations allowed: semiring operations, copy, and equality check.
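This cost model can be made concrete with a tiny bookkeeping sketch (purely illustrative, not from the talk):

```python
import math

class IOModel:
    """Count block transfers of B words between disk and internal memory."""
    def __init__(self, M, B):
        self.M, self.B, self.ios = M, B, 0

    def read_words(self, n):
        # Streaming n consecutive words from disk costs ceil(n / B) I/Os.
        self.ios += math.ceil(n / self.B)

model = IOModel(M=1024, B=64)
model.read_words(10_000)          # e.g. scan N = 10,000 nonzero entries
print(model.ios)                  # 157 I/Os: ceil(10000 / 64)
```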
Sparse matrix multiplication: Problem description

- We make no assumptions about cancellation.
- To produce the output, emit(·) must be invoked exactly once on every nonzero output entry.
- Matrices are of size U × U.
- Õ suppresses polylog factors in U and N.
Sparse matrix multiplication: Problem description

- Let A and C be U × U matrices over a semiring R with N nonzero input and Z nonzero output entries. There exist algorithms 1 and 2 such that:
  1. Algorithm 1 succeeds with probability at least 1 − 1/U, using Õ(N√Z/(B√M)) I/Os.
- Previous best [Amossen & Pagh, 09]: Õ(N√Z/(B M^{1/8})) I/Os (Boolean matrices ⇒ no cancellation).
Sparse matrix multiplication: Problem description

- As above: for U × U matrices over a semiring R with N nonzero input and Z nonzero output entries, algorithm 1 uses Õ(N√Z/(B√M)) I/Os.
- There exist matrices that require Ω(min(N²/(MB), N√Z/(B√M))) I/Os to compute all nonzero entries of AC.
Upper bound: Size estimation

Size estimation tool: given matrices A and C with N nonzero entries, compute an ε-estimate of the number of nonzeroes of each column of AC using Õ(ε^{-3} N/B) I/Os. Black box used [BBFJV,07]:

Fact: for a dense 1 × U vector y and a sparse U × U matrix S we can compute yS in O((nnz(S)/B) log_{M/B}(U/M)) = Õ(nnz(S)/B) I/Os.
Upper bound: Size estimation

- Distinct elements: given a frequency vector x of size n, where x_i denotes the number of times element i occurs, F0 = Σ_i |x_i|^0.
- Fundamental problem in streaming: estimate F0 without materializing x.
- Observation: the number of distinct elements of AC is nnz(AC).
Upper bound: Size estimation

Simple linear distinct-elements sketch [Indyk slides, McGregor book]. It answers the question: for a picked threshold T, is F0 > (1 + ε)T?

Pick random sets S_j ⊆ [n], each element included independently with probability 1/T, and maintain the counters s_j = Σ_{i∈S_j} x_i.

Analysis: for one set S_j we have Pr[s_j = 0] = (1 − 1/T)^{F0} ≈ e^{−F0/T}. If F0 > (1 + ε)T, then Pr[s_j = 0] < 1/e − ε/3. Repeat for k = O(ε^{-2} log(1/δ)) independent sets to succeed with probability 1 − δ.
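A minimal sketch of this threshold test (parameter choices and the decision cutoff are illustrative; membership in each S_j is drawn with probability 1/T):

```python
import math
import random

def threshold_sketch(x, T, k, eps, seed=0):
    """Test whether F0(x) > (1 + eps) * T using k linear counters.

    Each counter s_j sums x over a random set S_j; every index is put
    into S_j independently with probability 1/T.
    """
    rng = random.Random(seed)
    zeros = 0
    for _ in range(k):
        s_j = sum(xi for xi in x if rng.random() < 1.0 / T)
        if s_j == 0:
            zeros += 1
    # Pr[s_j = 0] = (1 - 1/T)^F0 ~ exp(-F0/T), which drops below
    # 1/e - eps/3 once F0 > (1 + eps) * T; cut between the two cases.
    return zeros / k < 1 / math.e - eps / 6

x = [1] * 50 + [0] * 50                              # F0 = 50
print(threshold_sketch(x, T=10, k=500, eps=0.5))     # True:  50 > 15
print(threshold_sketch(x, T=200, k=500, eps=0.5))    # False: 50 <= 300
```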
Upper bound: Size estimation

- We can now answer whether F0 > (1 + ε)T for a fixed T.
- Repeat for T = 1, (1 + ε), (1 + ε)², ..., n, i.e. O(ε^{-1} log n) values.
- Total space: O(ε^{-3} log n log(1/δ)).
- Note: the random sets S_j form a k × n projection matrix F, and we maintain Fx.
- Linearity: F(x + e_i) = Fx + F e_i.
Upper bound: Size estimation

F is an (ε^{-2} log(1/δ)) × U matrix. A and C are U × U. To get the size estimate we must compute FAC.
Upper bound: Size estimation

F is an (ε^{-2} log(1/δ)) × U matrix, and A and C are U × U, so to get the size estimate we must compute FAC. Due to associativity we can pick the cheap order: (FA)C. Analysis: ε^{-2} log(1/δ) invocations of the dense-vector sparse-matrix black box: Õ(ε^{-3} N/B) I/Os. Note: this works with cancellation, contrary to previous size estimation methods.
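A small numerical check of the associativity trick, with dense numpy stand-ins for the sketch matrix and the sparse inputs (all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
k, U = 4, 8                        # k = sketch rows, U = matrix dimension
F = rng.integers(0, 2, (k, U))     # projection matrix built from random sets
A = rng.integers(-2, 3, (U, U)) * (rng.random((U, U)) < 0.2)  # sparse-ish
C = rng.integers(-2, 3, (U, U)) * (rng.random((U, U)) < 0.2)

# The naive order F(AC) materializes AC; the cheap order (FA)C only ever
# multiplies a k-row dense matrix against a (sparse) U x U matrix.
left = (F @ A) @ C
right = F @ (A @ C)
print(np.array_equal(left, right))  # True: both orders give the same sketch
```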
Upper bound: Partitioning

Figure: the input matrices of the product A × C, before partitioning.
Upper bound: Partitioning

Figure: A × C written as a sum of four smaller products, one per group of the partition.
Upper bound: Partitioning

- What we want: split the matrices into disjoint colored groups such that every color combination has at most M nonzero output entries.
- Problem: this can't always be done.
- Instead: color the rows of A using c colors. For each of the c groups of rows, do an independent coloring of the columns of C with c colors.

Figure: two of the resulting colored products, summed.
Upper bound: Partitioning

Overview of how to partition matrices A and C:

- Split A into c = √(nnz(AC) log U / M) + O(1) groups of rows of A, such that nnz(A1 C) ≈ nnz(A2 C) ≈ ... ≈ nnz(AC)/c.
- Within each group of rows, split the columns of C analogously.
Upper bound: Partitioning

Say we can split A into A1, A2 such that

nnz(A1 C), nnz(A2 C) ∈ [(1 − 1/log U) nnz(AC)/2, (1 + 1/log U) nnz(AC)/2].

Assume the biggest possible positive error: after q = log c² recursions each subproblem has output size at most

nnz(AC) (1/2 + 1/(2 log U))^{log c²} ≤ nnz(AC) 2^{−log c²} e^{log c² / log U} = nnz(AC) · O(1)/c² = O(M/ log U).
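A quick numeric sanity check of this shrinkage bound, with made-up parameter values for U, M, and nnz(AC):

```python
import math

U, M = 2**20, 2**16
nnz_AC = 2**24
c2 = nnz_AC * math.log2(U) / M        # c^2 = nnz(AC) log U / M
q = math.ceil(math.log2(c2))          # number of halving recursions
shrink = (0.5 + 0.5 / math.log2(U)) ** q
final = nnz_AC * shrink               # worst-case subproblem output size

# The bound says final = nnz(AC) * O(1)/c^2 = O(M / log U).
print(final <= math.e * M / math.log2(U))  # True for these parameters
```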
Upper bound: Partitioning

How to do relative-error 1/log U splits: use the size estimation tool. For any set r of rows we have access to estimates ẑ_i such that

(1 − 1/log U) Σ_{i∈r} nnz([AC]_{i,*}) ≤ Σ_{i∈r} ẑ_i ≤ (1 + 1/log U) Σ_{i∈r} nnz([AC]_{i,*}).

Splitting A into A1 and A2:

- Let Ẑ = Σ_i ẑ_i.
- Put rows into A1 until Σ_{i∈A1} ẑ_i ≈ Ẑ/2; the remaining rows form A2.
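The split can be sketched as a greedy prefix scan over the per-row estimates (the ẑ values below are made-up numbers):

```python
def split_rows(z_hat):
    """Greedily split row indices into A1, A2 so that the estimated
    output mass of A1 is roughly half of the total Z-hat."""
    Z_hat = sum(z_hat)
    A1, running = [], 0
    for i, z in enumerate(z_hat):
        if running >= Z_hat / 2:   # stop once A1 has reached half the mass
            break
        A1.append(i)
        running += z
    A2 = list(range(len(A1), len(z_hat)))
    return A1, A2

z_hat = [5, 3, 8, 2, 6]            # estimated nnz of each output row
A1, A2 = split_rows(z_hat)
print(A1, A2)                      # [0, 1, 2] [3, 4] -- masses 16 vs 8
```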
Upper bound: Partitioning

I/O cost:

- Initial size estimation: Õ(N/B).
- Partitioning A: c dense-vector sparse-matrix products: Õ(cN/B).
- For each of the c partitions of A: size estimation, Õ(N/B) in total, and c dense-vector sparse-matrix products, Õ(cN/B) in total.
- Total: Õ(cN/B) = Õ(N √nnz(AC) / (B√M)), since c = √(nnz(AC) log U / M).
Upper bound: Outputting from partitions

Figure: the colored partition of A and C, as a sum of two products.
Upper bound: Outputting from partitions

- Where we are: c² = nnz(AC) log U / M subproblems, each with output size M/ log U.
- Central cancellation difficulty: intermediate results can be much larger than M.
- Our I/O aim is Õ(cN/B), hence we can't pay for those cancelling inner products.
- Solution: compute a particular polynomial, allowing a polynomially small error probability.
Upper bound: Outputting from partitions

Idea from [Pagh,12]: compute each outer product as a polynomial of degree ≈ M/ log U.

- Let r = α · 4M/ log U for a small constant α.
- Let h_t, h'_t : [U] → [r] be random hash functions.
- The polynomial:

  p_t(x) = Σ_{k=1}^{U} ( Σ_{i=1}^{U} A_{i,k} x^{h_t(i)} ) · ( Σ_{j=1}^{U} C_{k,j} x^{h'_t(j)} )

- It can be computed in space 4r.
Upper bound: Outputting from partitions

Recall p_t(x) = Σ_{k=1}^{U} ( Σ_{i=1}^{U} A_{i,k} x^{h_t(i)} ) · ( Σ_{j=1}^{U} C_{k,j} x^{h'_t(j)} ).

- Σ_{i=1}^{U} A_{i,k} x^{h_t(i)} can be computed in one pass over column k of A using space r.
- Multiplying the two factors and adding to the running sum of products takes an additional 2r space.
- Central observation: the coefficient of x^{h_t(i)+h'_t(j)} is [AC]_{i,j} with constant probability.
- Left to do: extract the coordinates of the coefficients and boost to high probability. This can be done with O(log U) repetitions [Pagh, 2012].
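A toy version of this polynomial encoding, with numpy convolution standing in for polynomial multiplication; sizes and hash functions are illustrative, and the collision-handling/repetition machinery of [Pagh, 2012] is omitted:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
U, r = 6, 40                       # toy sizes; in the talk r ~ M/log U
A = rng.integers(-2, 3, (U, U))
C = rng.integers(-2, 3, (U, U))
h  = rng.integers(0, r, U)         # stand-in for h_t  : [U] -> [r]
hp = rng.integers(0, r, U)         # stand-in for h'_t : [U] -> [r]

# Coefficient vector of p_t(x); degrees go up to 2(r - 1).
p = np.zeros(2 * r - 1, dtype=int)
for k in range(U):
    a = np.zeros(r, dtype=int)     # sum_i A[i,k] x^{h(i)}: one pass over col k
    c = np.zeros(r, dtype=int)     # sum_j C[k,j] x^{h'(j)}: one pass over row k
    for i in range(U):
        a[h[i]] += A[i, k]
    for j in range(U):
        c[hp[j]] += C[k, j]
    p += np.convolve(a, c)         # polynomial product, added to running sum

# p_t(1) always equals the sum of all entries of AC.
print(p.sum() == (A @ C).sum())    # True

# At a degree hit by exactly one (i, j) pair, the coefficient is [AC]_{i,j}.
degs = Counter(int(h[i] + hp[j]) for i in range(U) for j in range(U))
ok = all(p[h[i] + hp[j]] == (A @ C)[i, j]
         for i in range(U) for j in range(U) if degs[h[i] + hp[j]] == 1)
print(ok)                          # True
```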
Upper bound: Outputting from partitions

I/O cost of the steps taken:

- Initial size estimation: Õ(N/B).
- Partitioning into c² subproblems with output M/ log U: Õ(cN/B).
- Computing and emit(·)-ing all subproblems: Õ(cN/B).
- Total: Õ(cN/B) = Õ(N √nnz(AC) / (B√M)), since c = √(nnz(AC) log U / M).
Upper bound: Outputting from partitions

In the article:

- Size estimation: supports cancellation and uses Õ(ε^{-3} N/B) I/Os.
- Algorithm 1: Õ(N√Z/(B√M)) I/Os.
- Algorithm 2: O(…) I/Os.
- Lower bound: Ω(min(N²/(MB), N√Z/(B√M))) I/Os.

Open: remove Monte Carlo (and the log factors).
Rasmus Pagh, Morten Stöckel
IT University of Copenhagen
September 9 2014