The Input/Output Complexity of Sparse Matrix Multiplication
Rasmus Pagh¹, Morten Stöckel²
1IT University of Copenhagen, 2 University of Copenhagen
SIAM LA, October 26 2015
Pagh, Stöckel (ITU, DIKU), October 26 2015
Outline
- Sparse matrix multiplication: Problem description
- Upper bound: Size estimation; Partitioning; Outputting from partitions; Summary
- Lower bound: Technique used; Bounding #phases
Sparse matrix multiplication Problem description
- Let A and C be matrices over a semiring R with N nonzero entries in total.
- The problem: compute the matrix product [AC]_{i,j} = Σ_k A_{i,k} C_{k,j}, which has Z nonzero entries.
- Central result: this can be done (for most of the parameter space) in optimal Õ(N√Z / (B√M)) I/Os.
Sparse matrix multiplication Problem description
We say that we have cancellation when two or more summands of [AC]_{i,j} = Σ_k A_{i,k} C_{k,j} are nonzero but the sum is zero. Our algorithm handles such cases.

[Figure: A (n × p) times C (p × q) gives AC (n × q); entry ac_{22} is formed as a_{21} × c_{12} + a_{22} × c_{22} + … + a_{2p} × c_{p2}.]
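As a toy illustration (our own sketch, not the paper's external-memory algorithm), here is a sparse semiring product over the integers in which a cancelling output entry disappears:

```python
# Sparse matrix product with cancellation handled.
# Matrices are stored as {(row, col): value} dicts over the integers.
from collections import defaultdict

def sparse_matmul(A, C):
    """Compute [AC]_{i,j} = sum_k A[i,k] * C[k,j] for sparse dict matrices."""
    C_by_row = defaultdict(list)
    for (k, j), v in C.items():
        C_by_row[k].append((j, v))
    out = defaultdict(int)
    for (i, k), a in A.items():
        for j, c in C_by_row[k]:
            out[(i, j)] += a * c  # one elementary product
    # Cancellation: summands may be nonzero while their sum is zero.
    return {pos: v for pos, v in out.items() if v != 0}

# A[0,0]*C[0,0] + A[0,1]*C[1,0] = 1 - 1 = 0, so entry (0,0) cancels.
A = {(0, 0): 1, (0, 1): 1}
C = {(0, 0): 1, (0, 1): 2, (1, 0): -1}
print(sparse_matmul(A, C))  # {(0, 1): 2}
```

Note that the cancelled entry (0, 0) is never emitted, even though two of its summands are nonzero.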
Sparse matrix multiplication Problem description
Lots of applications. Some of them:
- Computing determinants and inverses of matrices.
- Bioinformatics.
- Graphs: counting cycles, computing matchings.
Sparse matrix multiplication Problem description
- A word is big enough to hold a matrix element plus its coordinates.
- Internal memory holds M words; the disk is of infinite size.
- One I/O: transfer B words between disk and internal memory.
- Cost of an algorithm: number of I/Os used.
- Operations allowed: semiring operations, copy and equality check.
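The cost accounting in this model can be sketched in a few lines (our own illustration; the class and its names are assumptions, not from the talk):

```python
# Toy accounting of the I/O model: memory of M words, transfers of B words.
import math

class IOModel:
    def __init__(self, M, B):
        self.M = M      # words of internal memory
        self.B = B      # words per block transfer
        self.ios = 0    # number of I/Os charged

    def scan(self, n_words):
        """Reading or writing n contiguous words costs ceil(n/B) I/Os."""
        self.ios += math.ceil(n_words / self.B)
        return self.ios

model = IOModel(M=1024, B=8)
model.scan(100)   # one pass over 100 words
print(model.ios)  # 13, i.e. ceil(100/8)
```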
Sparse matrix multiplication Problem description
- We make no assumptions about cancellation.
- To produce output: must invoke emit(.) on every nonzero output entry once.
- Matrices are of size U × U.
- Õ suppresses polylog factors in U and N.
Sparse matrix multiplication Problem description
- Let A and C be U × U matrices over semiring R with N nonzero input and Z nonzero output entries. There exist algorithms 1 and 2 that compute AC with probability at least 1 − 1/U, using Õ(N√Z / (B√M)) I/Os.
- Previous best [Amossen-Pagh, '09]: Õ(N√Z / (B M^{1/8})) I/Os (Boolean matrices ⇒ no cancellation).
Sparse matrix multiplication Problem description
- Let A and C be U × U matrices over semiring R with N nonzero input and Z nonzero output entries. There exist algorithms that compute AC with probability at least 1 − 1/U, using Õ(N√Z / (B√M)) I/Os.
- There exist matrices that require Ω(min(N²/(MB), N√Z / (B√M))) I/Os to compute all nonzero entries of AC.
Upper bound Size estimation
Size estimation tool: given matrices A and C with N nonzero entries, compute an ε-estimate of the number of nonzeroes of each column of AC using Õ(ε⁻³ N/B) I/Os.

Fact (Bender et al., '07): for a dense 1 × U vector y and a sparse U × U matrix S, we can compute yS in Õ(nnz(S)/B) I/Os.
Upper bound Size estimation
- Distinct elements: given a frequency vector x of size n, where x_i denotes the number of times element i occurs, F0 = Σ_i |x_i|⁰.
- Fundamental problem in streaming: estimate F0 without materializing x.
- Observation: the number of distinct elements of AC is nnz(AC).
- Good news: use existing machinery. A matrix F of size O(ε⁻³ log n log δ⁻¹) × n exists s.t. Fx gives F0 whp [Flajolet-Martin, '85].
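To make the F0 primitive concrete, here is a small stand-in estimator (a k-minimum-values sketch of our own choosing, not the linear sketch used in the talk):

```python
# F0 = sum_i |x_i|^0, plus a stand-in streaming estimator (k minimum values).
import random

def f0_exact(x):
    """Number of nonzero frequencies in the frequency vector x."""
    return sum(1 for v in x if v != 0)

def f0_estimate(stream, k=64, seed=0):
    """Estimate the number of distinct items from the k smallest hash values."""
    salt = random.Random(seed).getrandbits(64)
    vals = sorted({hash((salt, item)) % 2**32 / 2**32 for item in stream})
    if len(vals) < k:
        return len(vals)               # fewer than k distinct items: exact
    return int((k - 1) / vals[k - 1])  # classic KMV estimate

print(f0_exact([0, 2, 0, -1, 3]))               # 3
est = f0_estimate(i % 300 for i in range(10_000))
print(est)  # roughly 300 (300 distinct items in the stream)
```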
Upper bound Size estimation
F is (ε⁻³ log δ⁻¹ log U) × U. A and C are U × U. To get the size estimate we must compute F(AC).

Due to associativity: pick the cheap order (FA)C. Analysis: ε⁻³ log δ⁻¹ log U invocations of the dense-vector sparse-matrix black box: Õ(ε⁻³ N/B) I/Os. Note: works with cancellation, contrary to previous size estimation.
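A toy demonstration of the cheap-order trick, with a stand-in dense matrix F (our own illustration; the real F is an F0 sketch matrix):

```python
# F(AC) = (FA)C: the sketch can be applied before the product is formed.
def matmul(X, Y):
    """Dense product of list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

F = [[1, 0, 1], [0, 1, 1]]                 # stand-in 2 x 3 "sketch" matrix
A = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
C = [[0, 1], [1, 0], [1, 1]]

cheap = matmul(matmul(F, A), C)            # never materializes AC
costly = matmul(F, matmul(A, C))           # materializes AC first
print(cheap == costly)  # True
```

The left-to-right order only ever multiplies a few dense rows against a sparse matrix, which is exactly the black box above.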
Upper bound Partitioning
[Figure: the product A × C written as a sum of four smaller block products.]
Upper bound Partitioning
- What we want: split the matrices into disjoint colored groups s.t. every color combination has at most M nonzero output entries.
- Problem: this can't be done directly.
- Instead: color the rows of A using c colors. For each of the c groups of rows, do an independent coloring of the columns of C with c colors.

[Figure: two of the resulting row-group by column-group products.]
Upper bound Partitioning
Overview of how to partition matrices A and C:
- Recursively split A into halves A₁, A₂ by estimated output size, so that nnz(A₁C) ≈ nnz(A₂C) ≈ nnz(AC)/2.
- Stop when there are c = √(nnz(AC) log U / M) + O(1) groups of rows of A.
Upper bound Partitioning
Say we can do splits of A into A₁, A₂ s.t. nnz(A₁C), nnz(A₂C) ∈ [(1 − 1/log U) nnz(AC)/2, (1 + 1/log U) nnz(AC)/2]. Assume the biggest possible positive error: after log c² recursions each subproblem has size

nnz(AC) (1/2 + 1/(2 log U))^{log c²} ≤ nnz(AC) 2^{−log c²} e^{log c² / log U} ≤ nnz(AC) · O(1)/c² = O(M / log U).
Upper bound Partitioning
How to do relative error 1/log U splits: use the size estimation tool. For any set r of rows we have access to estimates ẑ_i s.t.

(1 − 1/log U) nnz(Σ_{i∈r} [AC]_{i*}) ≤ Σ_{i∈r} ẑ_i ≤ (1 + 1/log U) nnz(Σ_{i∈r} [AC]_{i*}).

Splitting A into A₁ and A₂:
- Let Ẑ = Σ_i ẑ_i.
- Add rows to A₁ until Σ_{i∈A₁} ẑ_i ≥ Ẑ/2; the remaining rows form A₂.
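The split rule can be sketched as a greedy scan over the row estimates (our own reading of the slide; the names are hypothetical):

```python
# Greedy split: move rows into A1 until their estimated output mass
# first reaches half of the total estimate Z_hat.
def split_rows(z_hat):
    """z_hat maps row index -> estimated nonzeros of [AC]_{i*}."""
    total = sum(z_hat.values())
    A1, mass = [], 0
    for row, z in z_hat.items():
        A1.append(row)
        mass += z
        if mass >= total / 2:   # Z_hat / 2 reached: stop
            break
    A2 = [r for r in z_hat if r not in set(A1)]
    return A1, A2

z_hat = {0: 10, 1: 30, 2: 25, 3: 35}   # hypothetical row estimates
A1, A2 = split_rows(z_hat)
print(A1, A2)  # [0, 1, 2] [3]
```

Because each ẑ_i is within a (1 ± 1/log U) factor of the truth, both halves carry close to half of nnz(AC).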
Upper bound Partitioning
I/O cost:
- Initial size estimation: Õ(N/B).
- Partitioning A: c dense-vector sparse-matrix products: Õ(cN/B).
- For the c A-partitions: one size estimation of total Õ(N/B) and c DVSM products of total Õ(cN/B).
- Total: Õ(cN/B) = Õ(N √nnz(AC) / (B√M)), since c = √(nnz(AC) log U / M).
Upper bound Outputting from partitions
- Where we are: we have c² = nnz(AC) log U / M subproblems, each with output ≤ M/log U.
- Central cancellation difficulty: intermediate results can be much larger than M.
- Our I/O aim is Õ(cN/B), hence we can't pay for those cancelling inner products.
- Solution: compute a particular polynomial and allow polynomially small error probability.
Upper bound Outputting from partitions
- Let r = M/log U be the number of output entries in a subproblem.
- We can perform compressed matrix multiplication in 4r space by computing an O(r)-degree polynomial [Pagh, '12].
- We need O(log U) repetitions to get high probability.

[Figure: one subproblem A_i C_j.]
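The count-sketch idea underlying compressed matrix multiplication can be illustrated as follows (our own simplified version: it hashes each elementary product directly, whereas [Pagh, '12] obtains the same sketch faster via FFT polynomial multiplication):

```python
# Count-sketch of the output AC in small space: each elementary product
# A[i,k]*C[k,j] is added, with a random sign, into bucket h(i, j).
import random

def sketch_product(A, C, m, seed=0):
    """Size-m signed sketch of AC; A, C given as {(i, j): value} dicts."""
    salt = random.Random(seed).getrandbits(64)
    h = lambda i, j: hash((salt, i, j)) % m                  # bucket hash
    s = lambda i, j: 1 - 2 * (hash((salt, 's', i, j)) % 2)   # sign in {-1, +1}
    sk = [0] * m
    for (i, k), a in A.items():
        for (k2, j), c in C.items():
            if k == k2:                      # matching inner index
                sk[h(i, j)] += s(i, j) * a * c
    return sk, h, s

A = {(0, 0): 1, (0, 1): 1}
C = {(0, 0): 1, (1, 0): -1, (1, 1): 5}
sk, h, s = sketch_product(A, C, m=16)
# Entry (0,0) cancels (1 - 1 = 0) and contributes nothing to its bucket;
# entry (0,1) = 5 can be read back (up to collisions with other entries).
print(s(0, 1) * sk[h(0, 1)])  # 5
```

Cancelling inner products add zero to their bucket, so the sketch size depends only on the number of surviving output entries, not on the intermediate results.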
Upper bound Summary
I/O cost of the steps taken:
- Initial size estimation: Õ(N/B).
- Partition into c² problems with output M/log U: Õ(cN/B).
- Compute and emit(.) all subproblems: Õ(cN/B).
- Total: Õ(cN/B) = Õ(N √nnz(AC) / (B√M)), since c = √(nnz(AC) log U / M).
Lower bound Technique used
- We will show: Ω((N/B) · min(√(Z/M), N/M)) I/Os are needed.
- The argument follows the "phase argument" of [Hong and Kung, '81]: divide the execution into phases of M/B I/Os each.
- Double the memory to 2M: there now exists an equivalent execution in which reads and writes are ordered.
- This allows us to argue, for a specific computation, how good the best possible execution can be.
Lower bound Technique used
- Our hard instance: dense matrices, A of size √Z × N/√Z and C of size N/√Z × √Z.
- Notice: nnz(A) + nnz(C) = Θ(N) and nnz(AC) = Θ(Z).
- Crucial due to the semiring operations: every stored element is always either a copy of an input entry or a partial sum of elementary products belonging to a single output entry.
- We are now ready to argue about the number of phases needed to create two types of output.
Lower bound Bounding #phases
- Direct outputs: all needed entries are stored in memory; this requires two vectors of size N/√Z (a row of A and a column of C) to be stored.
- At most 2M√Z/N such vectors fit in memory, thus at most M²Z/N² direct outputs per phase.
- To output Z/2 entries of this type: (Z/2)/(M²Z/N²) ≈ (N/M)² phases are needed; at M/B I/Os per phase this gives Ω(N²/(BM)) I/Os.
Lower bound Bounding #phases
- Indirect outputs: output entries for which an elementary product is written to disk in some phase.
- In space 2M, the number of elementary products stored and computed is at most (2M)^{3/2} [Irony et al., '04].
- To output Z/2 entries of this type: Z/2 · N/√Z = N√Z/2 elementary products must be computed.
- Number of phases needed: (N√Z/2)/(2M)^{3/2}; at M/B I/Os per phase this gives Ω(N√Z/(B√M)) I/Os.
Lower bound Bounding #phases
- To do Z/2 direct outputs: Ω(N²/(BM)) I/Os.
- To do Z/2 indirect outputs: Ω(N√Z/(B√M)) I/Os.
- Since at least Z/2 outputs of one of the two types are needed, the lower bound becomes the minimum: Ω(min(N²/(BM), N√Z/(B√M))) I/Os.
Lower bound Bounding #phases
- Size estimation: supports cancellation and uses Õ(ε⁻³ N/B) I/Os.
- Algorithm 1: Õ(N√Z / (B√M)) I/Os.
- Algorithm 2: O(…) I/Os.
- Lower bound: Ω(min(N²/(MB), N√Z/(√M B))) I/Os.

Open: remove Monte Carlo (and the log factors).