The Input/Output Complexity of Sparse Matrix Multiplication


  1. The Input/Output Complexity of Sparse Matrix Multiplication. Rasmus Pagh and Morten Stöckel, IT University of Copenhagen, September 9, 2014.

  2. Outline
     - Sparse matrix multiplication
       - Problem description
     - Upper bound
       - Size estimation
       - Partitioning
       - Outputting from partitions

  3. Sparse matrix multiplication: Problem description. Overview
     - Let $A$ and $C$ be matrices over a semiring $R$ with $N$ nonzero entries in total.
     - The problem: compute the matrix product $[AC]_{i,j} = \sum_k A_{i,k} C_{k,j}$, which has $Z$ nonzero entries.
     - Central result: this can be done in (for most of the parameter space) optimal $\tilde{O}\left(\frac{N\sqrt{Z}}{B\sqrt{M}}\right)$ I/Os.
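As a minimal in-memory sketch of the product defined in the overview (not the I/O-efficient algorithm of the talk), the following multiplies two sparse matrices stored as coordinate dictionaries. The function name `sparse_matmul`, the dictionary layout, and the example values are assumptions made for illustration; integers stand in for a general semiring.

```python
from collections import defaultdict

def sparse_matmul(A, C):
    """Multiply sparse matrices stored as {(row, col): value} dicts."""
    # Index C by row so that a nonzero A[i, k] only meets nonzeros C[k, j].
    c_rows = defaultdict(list)
    for (k, j), value in C.items():
        c_rows[k].append((j, value))

    out = defaultdict(int)
    for (i, k), a in A.items():
        for j, c in c_rows[k]:
            out[(i, j)] += a * c  # one elementary product A[i,k] * C[k,j]

    # Keep only nonzero output entries (sums may cancel to zero).
    return {ij: v for ij, v in out.items() if v != 0}

# Hypothetical input with N = 6 nonzero entries in total.
A = {(0, 0): 2, (0, 1): 1, (1, 1): 3}
C = {(0, 0): 4, (1, 0): 5, (1, 2): 1}
product = sparse_matmul(A, C)
Z = len(product)  # number of nonzero output entries
```

Here `product` has `Z = 4` nonzero entries; entry `(0, 0)` is `2*4 + 1*5 = 13`.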

  4. Matrix multiplication, basics
     $$
     \begin{pmatrix}
     a_{11} & a_{12} & \cdots & a_{1p} \\
     a_{21} & a_{22} & \cdots & a_{2p} \\
     \vdots & \vdots & \ddots & \vdots \\
     a_{n1} & a_{n2} & \cdots & a_{np}
     \end{pmatrix}
     \times
     \begin{pmatrix}
     c_{11} & c_{12} & \cdots & c_{1q} \\
     c_{21} & c_{22} & \cdots & c_{2q} \\
     \vdots & \vdots & \ddots & \vdots \\
     c_{p1} & c_{p2} & \cdots & c_{pq}
     \end{pmatrix}
     =
     \begin{pmatrix}
     ac_{11} & ac_{12} & \cdots & ac_{1q} \\
     ac_{21} & ac_{22} & \cdots & ac_{2q} \\
     \vdots & \vdots & \ddots & \vdots \\
     ac_{n1} & ac_{n2} & \cdots & ac_{nq}
     \end{pmatrix}
     $$
     $A$: $n$ rows, $p$ columns. $C$: $p$ rows, $q$ columns. $AC = A \times C$: $n$ rows, $q$ columns.

  5. Matrix multiplication, basics
     [Figure: row 2 of $A$ ($n$ rows, $p$ columns) is combined with column 2 of $C$ ($p$ rows, $q$ columns) to form one entry of $AC = A \times C$ ($n$ rows, $q$ columns): $ac_{22} = a_{21} \times c_{12} + a_{22} \times c_{22} + \cdots + a_{2p} \times c_{p2}$.]

  6. Cancellation of elementary products
     [Figure: the same row-times-column picture as on the previous slide, with the elementary products $a_{21} \times c_{12} + a_{22} \times c_{22} + \cdots + a_{2p} \times c_{p2}$ highlighted.]
     We say that we have cancellation when two or more summands of $[AC]_{i,j} = \sum_k A_{i,k} C_{k,j}$ are nonzero but the sum is zero, e.g. $-2 \cdot 3 + 1 \cdot 6 + 0 \cdot 4$. Our algorithm handles such cases.
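The cancellation example from the slide can be checked directly. The variable names are assumptions for illustration; the numbers are the ones given above.

```python
# One row of A and one column of C from the slide's example.
row_of_A = [-2, 1, 0]
col_of_C = [3, 6, 4]

# Elementary products A[i,k] * C[k,j] for each k.
products = [a * c for a, c in zip(row_of_A, col_of_C)]

# Two summands are nonzero, yet the output entry is zero: cancellation.
nonzero_summands = sum(1 for p in products if p != 0)
entry = sum(products)
```

An algorithm that predicts nonzero outputs purely from the positions of nonzero inputs would wrongly report this entry as nonzero.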

  7. Motivation
     Some applications:
     - Computing determinants and inverses of matrices.
     - Bioinformatics.
     - Graphs: counting cycles, computing matchings.

  8. The semiring I/O model, 1
     - A word is big enough to hold a matrix element plus its coordinates.
     - Internal memory holds $M$ words; the disk has infinite size.
     - One I/O: transfer $B$ words from disk to internal memory.
     - Cost of an algorithm: number of I/Os used.
     - Operations allowed: semiring operations, copy, and equality check.

  9. The semiring I/O model, 2
     - We make no assumptions about cancellation.
     - To produce output: must invoke emit($\cdot$) on every nonzero output entry once.
     - Matrices are of size $U \times U$.
     - $\tilde{O}$ suppresses polylog factors in $U$ and $N$.

  10. Our results, 1
      - Let $A$ and $C$ be $U \times U$ matrices over a semiring $R$ with $N$ nonzero input and $Z$ nonzero output entries. There exist algorithms 1 and 2 such that:
        1. Algorithm 1 emits the set of nonzero entries of $AC$ with probability at least $1 - 1/U$, using $\tilde{O}\left(N\sqrt{Z}/(B\sqrt{M})\right)$ I/Os.
        2. Algorithm 2 emits the set of nonzero entries of $AC$, and uses $O\left(N^2/(MB)\right)$ I/Os.
      - Previous best [Amossen & Pagh, 09]: $\tilde{O}\left(N\sqrt{Z}/(BM^{1/8})\right)$ I/Os (Boolean matrices $\Rightarrow$ no cancellation).

  11. Our results, 2
      - Let $A$ and $C$ be $U \times U$ matrices over a semiring $R$ with $N$ nonzero input and $Z$ nonzero output entries. There exist algorithms 1 and 2 such that:
        1. Algorithm 1 emits the set of nonzero entries of $AC$ with probability at least $1 - 1/U$, using $\tilde{O}\left(N\sqrt{Z}/(B\sqrt{M})\right)$ I/Os.
        2. Algorithm 2 emits the set of nonzero entries of $AC$, and uses $O\left(N^2/(MB)\right)$ I/Os.
      - There exist matrices that require $\Omega\left(\min\left(\frac{N^2}{MB}, \frac{N\sqrt{Z}}{B\sqrt{M}}\right)\right)$ I/Os to compute all nonzero entries of $AC$.
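To get a feel for how the two upper bounds compare, the sketch below evaluates both expressions for one hypothetical parameter setting. Constants and the polylog factors hidden by $\tilde{O}$ are ignored, and the function names and parameter values are assumptions chosen only for round arithmetic.

```python
import math

def randomized_bound(N, Z, M, B):
    # N * sqrt(Z) / (B * sqrt(M)), without constants or polylog factors.
    return N * math.sqrt(Z) / (B * math.sqrt(M))

def deterministic_bound(N, M, B):
    # N^2 / (M * B), without constants.
    return N ** 2 / (M * B)

# Hypothetical parameters: N input nonzeros, Z output nonzeros,
# memory of M words, blocks of B words.
N, Z, M, B = 10**6, 10**6, 10**4, 10**2

rand_ios = randomized_bound(N, Z, M, B)
det_ios = deterministic_bound(N, M, B)
lower_bound_shape = min(det_ios, rand_ios)  # shape of the Omega(min(...)) bound
```

With these values the first expression is $10^5$ and the second is $10^6$, so the minimum in the lower bound is attained by the $N\sqrt{Z}/(B\sqrt{M})$ term.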

  12. Upper bound: Size estimation. Output size estimation
      - Size estimation tool: given matrices $A$ and $C$ with $N$ nonzero entries, compute an $\varepsilon$-estimate of the number of nonzeros of each column of $AC$ using $\tilde{O}(\varepsilon^{-3} N/B)$ I/Os.
      - Black box used [BBFJV, 07]:
        Fact. For a dense $1 \times U$ vector $y$ and a sparse $U \times U$ matrix $S$ we can compute $yS$ in $O\left((\mathrm{nnz}(S)/B) \log_{M/B}(U/M)\right) = \tilde{O}(\mathrm{nnz}(S)/B)$ I/Os.

  13. Distinct elements and matrix size
      - Distinct elements: given a frequency vector $x$ of size $n$, where $x_i$ denotes the number of times element $i$ occurs, $F_0 = \sum_i |x_i|^0$.
      - Fundamental problem in streaming: estimate $F_0$ without materializing $x$.
      - Observation: the number of distinct elements of $AC$ is $\mathrm{nnz}(AC)$.
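A small sketch of the definition, under the usual convention that $|x_i|^0$ counts 1 for a nonzero $x_i$ and 0 for $x_i = 0$. The helper name `f0` and the example data are assumptions for illustration.

```python
def f0(values):
    # |x_i|^0 is read as 1 if x_i != 0 and 0 if x_i == 0.
    # Python evaluates 0 ** 0 as 1, so the exponent is not used literally.
    return sum(1 for v in values if v != 0)

# Frequency vector: element 0 occurs 3 times, element 2 once, element 4 seven times.
x = [3, 0, 1, 0, 7]
distinct = f0(x)

# The same count applied to a product stored as {(row, col): value}
# gives its number of nonzero entries.
AC = {(0, 0): 13, (0, 2): 1, (1, 0): 15}
nnz_AC = f0(AC.values())
```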

  14. Linear distinct elements sketch, 1
      Simple linear distinct elements sketch [Indyk slides, McGregor book]. It answers the question: for a picked $T$, is $F_0 > (1 + \varepsilon) T$?
      1. Select sets $S_1, \ldots, S_k$ of coordinates such that $\Pr[i \in S_j] = 1/T$.
      2. For each $S_j$: $s_j(x) = \sum_{i \in S_j} x_i$.
      3. Answer yes if at most $k/e$ of the $s_j$ are zero.
      Analysis: for one set $S_j$ we have $\Pr[s_j = 0] = (1 - 1/T)^{F_0} \approx e^{-F_0/T}$. If $F_0 > (1 + \varepsilon) T$ then $\Pr[s_j = 0] < 1/e - \varepsilon/3$. Repeat for $k = O(\varepsilon^{-2} \log \delta^{-1})$ independent sets to get probability $1 - \delta$.
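The three steps above can be sketched as follows. This is a simplified illustration, not the construction used in the talk: it assumes nonnegative frequencies (so a sum $s_j$ is zero only if no nonzero coordinate was sampled), samples each set with fully independent coin flips, and uses the hypothetical function name `exceeds_threshold`.

```python
import math
import random

def exceeds_threshold(x, T, k, rng):
    """Answer 'yes' if at most k/e of the k sampled sums s_j are zero."""
    zero_count = 0
    for _ in range(k):
        # Step 1: coordinate i joins S_j independently with probability 1/T.
        # Step 2: s_j(x) is the sum of x_i over the sampled coordinates.
        s_j = sum(v for v in x if rng.random() < 1.0 / T)
        if s_j == 0:
            zero_count += 1
    # Step 3: few zero sums suggests F_0 is large compared to T.
    return zero_count <= k / math.e

rng = random.Random(0)
many_distinct = [1] * 1000               # F_0 = 1000
few_distinct = [1] * 10 + [0] * 990      # F_0 = 10

says_many = exceeds_threshold(many_distinct, T=10, k=200, rng=rng)
says_few = exceeds_threshold(few_distinct, T=1000, k=200, rng=rng)
```

The two inputs are deliberately far from their thresholds, so the answers are `True` and `False` respectively except with negligible probability; near $F_0 \approx T$ the answer depends on $k$ as described in the analysis.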

  15. Linear distinct elements sketch, 2
      - Can answer whether $F_0 > (1 + \varepsilon) T$ for some $T$.
      - Repeat for $T = 1, (1 + \varepsilon), (1 + \varepsilon)^2, \ldots, n$, i.e. $O(\varepsilon^{-1} \log n)$ values.
      - Total space: $O(\varepsilon^{-3} \log n \log \delta^{-1})$.
      - Note: the random sets $S_j$ form a $k \times n$ projection matrix $F$, and we maintain $Fx$.
      - Linearity: $F(x + e_i) = Fx + Fe_i$.
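The linearity property can be illustrated with a tiny 0/1 matrix whose rows are the indicator vectors of the sets $S_j$. The matrix `F`, the vector `x`, and the helper `matvec` below are made-up examples, not values from the talk.

```python
def matvec(F, x):
    # Row j of F is the indicator vector of S_j, so entry j is s_j(x).
    return [sum(f * v for f, v in zip(row, x)) for row in F]

F = [
    [1, 0, 1, 0],  # S_1 = {0, 2}
    [0, 1, 1, 0],  # S_2 = {1, 2}
    [0, 0, 0, 1],  # S_3 = {3}
]
x = [2, 0, 1, 0]
e_3 = [0, 0, 0, 1]  # one more occurrence of element 3

# Sketch of the updated vector ...
lhs = matvec(F, [a + b for a, b in zip(x, e_3)])
# ... equals the old sketch plus the sketch of the update.
rhs = [a + b for a, b in zip(matvec(F, x), matvec(F, e_3))]
```

Because of this property the sketch $Fx$ can be maintained under updates without storing $x$ itself.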

  16. Output estimation
      $F$ is $\varepsilon^{-2} \log \delta^{-1} \times U$. $A$ and $C$ are $U \times U$. To get the size estimate we must compute $F \times A \times C$.

  17. Output estimation
      $F$ is $\varepsilon^{-2} \log \delta^{-1} \times U$. $A$ and $C$ are $U \times U$. To get the size estimate we must compute $(F \times A) \times C$.
      - Due to associativity: pick the cheap order.
      - Analysis: $\varepsilon^{-2} \log \delta^{-1}$ invocations of the dense-vector sparse-matrix black box: $\tilde{O}(\varepsilon^{-3} N/B)$ I/Os.
      - Note: works with cancellation, contrary to previous size estimation.
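A minimal in-memory sketch of the evaluation order $(F \times A) \times C$: each row of $F$ is pushed through $A$ and then through $C$, so $AC$ is never materialized. The helper `row_times_sparse` is a plain stand-in for the dense-vector sparse-matrix black box, not an I/O-efficient implementation, and all names and values are assumptions for illustration.

```python
def row_times_sparse(y, S, width):
    """Dense row vector y times sparse matrix S given as {(row, col): value}."""
    out = [0] * width
    for (r, c), value in S.items():
        out[c] += y[r] * value  # one pass over the nonzeros of S
    return out

def sketch_product(F, A, C, U):
    # (F x A) x C: two vector-matrix products per row of F.
    return [row_times_sparse(row_times_sparse(f, A, U), C, U) for f in F]

U = 3
A = {(0, 0): 1, (0, 1): 2, (2, 2): 1}
C = {(0, 1): 3, (1, 1): 1, (2, 0): 4}
F = [[1, 0, 1],
     [0, 1, 0]]

FAC = sketch_product(F, A, C, U)

# Reference: the same result via the explicit product AC, for comparison only.
AC = {(0, 1): 5, (2, 0): 4}
reference = [row_times_sparse(f, AC, U) for f in F]
```

Both orders give the same matrix, but the first touches only the nonzeros of $A$ and $C$, once per row of $F$.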

  18. Upper bound: Partitioning. Matrix mult partitioning, 1
      [Figure: the product $A \times C$.]

  20. Matrix mult partitioning, 2
      [Figure: $A \times C$ written as a sum of four products of parts of $A$ and $C$.]
