Fast Output-Sensitive Matrix Multiplication (ESA 2015)

Riko Jacob (IT University of Copenhagen, ITU) and Morten Stöckel (University of Copenhagen, DIKU). March 8, 2016.

(Fast) Sparse matrix multiplication
    Problem description
Fast and output-sensitive matrix mult
    High level
    The row-balanced case
    The general case

Overview
◮ Let $A$ and $C$ be $U \times U$ matrices over a field $\mathbb{F}$ with $N$ nonzero entries in total.
◮ The problem: compute the matrix product $[AC]_{i,j} = \sum_k A_{i,k} C_{k,j}$, which has $Z$ nonzero entries.
◮ Well-known solution: $O(U^\omega)$ word-RAM operations.
◮ Our main result: a Monte Carlo algorithm using $\tilde{O}(U^2 (Z/U)^{\omega-2} + N)$ word-RAM operations.
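To make the quantities $N$ and $Z$ concrete, here is a minimal sketch of the classical sparse product on dictionary-stored matrices. It is a baseline for illustration only, not the Monte Carlo algorithm of the talk; the function name `sparse_multiply` and the use of Python integers in place of a general field are assumptions of this example.

```python
from collections import defaultdict

def sparse_multiply(A, C):
    """Multiply sparse matrices stored as {(row, col): value} dictionaries."""
    # Index C by row so each nonzero A[i, k] only meets the nonzeros in row k of C.
    c_rows = defaultdict(list)
    for (k, j), value in C.items():
        c_rows[k].append((j, value))
    product = defaultdict(int)
    for (i, k), a_value in A.items():
        for j, c_value in c_rows[k]:
            product[(i, j)] += a_value * c_value
    # Drop entries that cancelled to zero, so len(result) equals Z.
    return {pos: v for pos, v in product.items() if v != 0}

A = {(0, 0): 1, (0, 1): 2, (2, 1): 3}
C = {(0, 2): 4, (1, 2): -2, (1, 0): 5}
AC = sparse_multiply(A, C)
N = len(A) + len(C)   # nonzero input entries
Z = len(AC)           # nonzero output entries
print(N, Z, AC)
```

In this input the contributions to entry (0, 2) cancel, so $Z$ counts three output entries even though four positions receive a term.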

Matrix multiplication, basics
$$
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
a_{21} & a_{22} & \cdots & a_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}
\times
\begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1q} \\
c_{21} & c_{22} & \cdots & c_{2q} \\
\vdots & \vdots & \ddots & \vdots \\
c_{p1} & c_{p2} & \cdots & c_{pq}
\end{pmatrix}
=
\begin{pmatrix}
ac_{11} & ac_{12} & \cdots & ac_{1q} \\
ac_{21} & ac_{22} & \cdots & ac_{2q} \\
\vdots & \vdots & \ddots & \vdots \\
ac_{n1} & ac_{n2} & \cdots & ac_{nq}
\end{pmatrix}
$$
$A$: $n$ rows, $p$ columns. $C$: $p$ rows, $q$ columns. $AC = A \times C$: $n$ rows, $q$ columns.

Matrix multiplication, basics
The entry in row 2, column 2 of $AC$ combines row 2 of $A$ with column 2 of $C$:
$$ac_{22} = a_{21} \times c_{12} + a_{22} \times c_{22} + \dots + a_{2p} \times c_{p2}$$
Here $A$ has $n$ rows and $p$ columns, $C$ has $p$ rows and $q$ columns, and $AC = A \times C$ has $n$ rows and $q$ columns.
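The definition above translates directly into code. The following is a small sketch of the textbook method on lists of lists; the helper names `product_entry` and `multiply` are chosen for this example only.

```python
def product_entry(A, C, i, j):
    """Inner product of row i of A with column j of C."""
    return sum(A[i][k] * C[k][j] for k in range(len(C)))

def multiply(A, C):
    """Multiply an n x p matrix A by a p x q matrix C, entry by entry."""
    n, p, q = len(A), len(C), len(C[0])
    # The inner dimensions must agree: every row of A needs p entries.
    assert all(len(row) == p for row in A)
    return [[product_entry(A, C, i, j) for j in range(q)] for i in range(n)]

A = [[1, 2, 3], [4, 5, 6]]         # n = 2 rows, p = 3 columns
C = [[7, 8], [9, 10], [11, 12]]    # p = 3 rows, q = 2 columns
AC = multiply(A, C)
print(AC)  # [[58, 64], [139, 154]]
```

Each of the $n \cdot q$ entries costs $p$ multiplications, which gives the cubic cost for square matrices that fast matrix multiplication improves on.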

Motivation
Some applications:
◮ Computing determinants and inverses of matrices.
◮ Bioinformatics.
◮ Graphs: counting cycles, computing matchings.

Some intuition, fast matrix multiplication
[Figure: the $U \times U$ matrices $A$ and $C$.]

Some intuition, fast matrix mult
◮ Can be done in $O(U^\omega)$ operations, going back to Strassen, who showed $\omega \le \log_2 7$. Most recently $\omega < 2.3728639$, due to Le Gall.
◮ But what if the input and/or output is sparse?
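The bound $\omega \le \log_2 7$ comes from multiplying $2 \times 2$ block matrices with 7 recursive products instead of 8, which gives the recurrence $T(n) = 7\,T(n/2) + O(n^2)$. Below is a minimal sketch of that recursion, assuming square matrices whose size is a power of two; the helper names are illustrative and no attempt is made at efficiency.

```python
def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def split(X):
    """Return the four quadrants X11, X12, X21, X22."""
    h = len(X) // 2
    return ([r[:h] for r in X[:h]], [r[h:] for r in X[:h]],
            [r[:h] for r in X[h:]], [r[h:] for r in X[h:]])

def strassen(A, C):
    """Multiply two n x n matrices (n a power of two) with 7 recursive products."""
    if len(A) == 1:
        return [[A[0][0] * C[0][0]]]
    a11, a12, a21, a22 = split(A)
    c11, c12, c21, c22 = split(C)
    m1 = strassen(add(a11, a22), add(c11, c22))
    m2 = strassen(add(a21, a22), c11)
    m3 = strassen(a11, sub(c12, c22))
    m4 = strassen(a22, sub(c21, c11))
    m5 = strassen(add(a11, a12), c22)
    m6 = strassen(sub(a21, a11), add(c11, c12))
    m7 = strassen(sub(a12, a22), add(c21, c22))
    p11 = add(sub(add(m1, m4), m5), m7)
    p12 = add(m3, m5)
    p21 = add(m2, m4)
    p22 = add(add(sub(m1, m2), m3), m6)
    # Glue the quadrants back together row by row.
    top = [left + right for left, right in zip(p11, p12)]
    bottom = [left + right for left, right in zip(p21, p22)]
    return top + bottom

print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```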


Some intuition, The Dream
[Figure: the $U \times U$ matrices $A$ and $C$.]

◮ We could apply the fast matrix mult black box on the colored area to get $N^{\omega/2}$ operations. Unfortunately this is difficult/impossible, since:
1. Even sparse input can mean dense output (maybe $(N + Z)^{\omega/2}$ is possible?).
2. Compressing like this breaks the matrix structure.
◮ Main idea: compress the input according to the sparsity and structure of the output instead.
[Figure: the $U \times U$ matrices $A$ and $C$.]
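The first obstacle is easy to reproduce. In the sketch below (a constructed example, not taken from the talk), $A$ has a single nonzero column and $C$ a single nonzero row, so $N = 2U$ while every one of the $U^2$ output entries is nonzero.

```python
def count_nonzeros(M):
    return sum(1 for row in M for v in row if v != 0)

def dense_product(A, C):
    size = len(A)
    return [[sum(A[i][k] * C[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

U = 6
# A: only the first column is nonzero. C: only the first row is nonzero.
A = [[1 if k == 0 else 0 for k in range(U)] for _ in range(U)]
C = [[1 if k == 0 else 0 for _ in range(U)] for k in range(U)]

N = count_nonzeros(A) + count_nonzeros(C)
Z = count_nonzeros(dense_product(A, C))
print(N, Z)  # 12 36
```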

Our results
◮ Let $A$ and $C$ be $U \times U$ matrices over a field $\mathbb{F}$ with $N$ nonzero input and $Z$ nonzero output entries. There exist Monte Carlo algorithms 1 and 2 such that:
1. When there are $Z/U$ nonzero entries per row and per column, algorithm 1 uses $\tilde{O}(U Z^{(\omega-1)/2} + N)$ operations.
2. When the input matrices have arbitrary balance, algorithm 2 uses $\tilde{O}(U^2 (Z/U)^{\omega-2} + N)$ operations.
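As a sanity check on the two bounds, one can substitute the extreme output sizes. The following worked arithmetic drops the $+N$ term and the factors hidden by $\tilde{O}$; it is a check added here, not a statement from the talk.

```latex
\begin{align*}
U^2 (Z/U)^{\omega-2}\big|_{Z=U^2} &= U^2 \cdot U^{\omega-2} = U^{\omega}, \\
U Z^{(\omega-1)/2}\big|_{Z=U^2} &= U \cdot U^{\omega-1} = U^{\omega}, \\
U^2 (Z/U)^{\omega-2}\big|_{Z=U} &= U^2 \cdot 1 = U^2, \\
U Z^{(\omega-1)/2}\big|_{Z=U} &= U \cdot U^{(\omega-1)/2} = U^{(\omega+1)/2}.
\end{align*}
```

So for a fully dense output ($Z = U^2$) both bounds match the dense $U^\omega$ bound, and they decrease as the output becomes sparser.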

Our results, overview

Method | word-RAM complexity | Notes
General dense | $O(U^\omega)$ |
Lingas | $\tilde{O}(U^2 Z^{\omega/2-1})$ | Requires boolean matrices.
Iwen-Spencer, Le Gall | $O(U^{2+\varepsilon})$ | Requires $O(n^{0.3})$ nonzeros per column.
Williams-Yu, Pagh | $\tilde{O}(U^2 + UZ)$ |
Van Gucht et al. | $\tilde{O}(N\sqrt{Z} + Z + N)$ |
This paper | $\tilde{O}(U Z^{(\omega-1)/2} + N)$ | Requires balanced rows and columns.
This paper | $\tilde{O}(U^2 (Z/U)^{\omega-2} + N)$ |

Method | I/O complexity | Notes
General dense | $\tilde{O}(U^\omega / (M^{\omega/2-1} B))$ |
Pagh-Stöckel | $\tilde{O}(N\sqrt{Z} / (B\sqrt{M}))$ | Elements from semirings.
This paper | $\tilde{O}(U Z^{(\omega-1)/2} / (M^{\omega/2-1} B) + Z/B + N/B)$ | Requires balanced rows and columns.
This paper | $\tilde{O}(U^2 (Z/U)^{\omega-2} / (M^{\omega/2-1} B) + U^2/B)$ |

When $N = U^2$ we use fewer word-RAM operations for any $Z \gg U$ and $U > 1$.
When $N = U^2$ we use fewer external memory operations, unless $M$ is larger than $Z$.
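To get a feel for the word-RAM comparison when $N = U^2$, the leading terms can be evaluated at sample values. This is a rough sketch only: constants and logarithmic factors are dropped, $\omega$ is set to the upper bound quoted earlier in the talk, and the chosen $U$ and $Z$ are arbitrary assumptions of this example.

```python
OMEGA = 2.3728639  # assumed: the upper bound on omega quoted in the talk

def bounds(U, Z, N, omega=OMEGA):
    """Leading terms of some word-RAM bounds, constants and log factors dropped."""
    return {
        "general dense": U ** omega,
        "Lingas": U ** 2 * Z ** (omega / 2 - 1),
        "Williams-Yu, Pagh": U ** 2 + U * Z,
        "this paper (general)": U ** 2 * (Z / U) ** (omega - 2) + N,
    }

U = 1024
Z = 32 * U    # on average 32 nonzeros per output row
N = U ** 2    # dense input
table = bounds(U, Z, N)
# Print each bound as a multiple of U^2, smallest first.
for name, value in sorted(table.items(), key=lambda item: item[1]):
    print(f"{name:22s} {value / U ** 2:8.2f} * U^2")
```

Because the hidden factors differ between the methods, the printed ratios indicate only the asymptotic trend, not actual running times.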

Fast and output-sensitive matrix mult, high level: Overview
Our approach at a high level:
1. Assume a bounded number of nonzero entries in each output row, and solve this case efficiently.
2. Show that any matrix can be divided into a small number of such subproblems.

The row-balanced case: Row-balance intuition
Promise: an upper bound on the number of nonzero entries in a row of $AC$.
Goal: use this to compress the input.
Idea: collapse columns ("make rows shorter").
[Figure: $A \times C = AC$.]
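One reason collapsing columns is compatible with multiplication is linearity: adding columns of $C$ together before multiplying gives the same result as adding the corresponding columns of $AC$ afterwards. The sketch below checks only this fact. The random bucket assignment `bucket_of` and the bucket count are assumptions of this example, and the step of recovering the individual entries of $AC$ from the compressed product is not shown, so this should not be read as the algorithm from the talk.

```python
import random

def multiply(A, C):
    return [[sum(a * C[k][j] for k, a in enumerate(row)) for j in range(len(C[0]))]
            for row in A]

def collapse_columns(M, bucket_of, num_buckets):
    """Add every column j of M into column bucket_of[j] of a narrower matrix."""
    out = [[0] * num_buckets for _ in M]
    for i, row in enumerate(M):
        for j, value in enumerate(row):
            out[i][bucket_of[j]] += value
    return out

random.seed(0)
U, num_buckets = 8, 3
A = [[random.choice([0, 0, 0, 1, 2]) for _ in range(U)] for _ in range(U)]
C = [[random.choice([0, 0, 0, 1, 3]) for _ in range(U)] for _ in range(U)]
bucket_of = [random.randrange(num_buckets) for _ in range(U)]

# Multiply A by the narrow U x num_buckets matrix ...
compressed_product = multiply(A, collapse_columns(C, bucket_of, num_buckets))
# ... and compare with collapsing the columns of the full product.
collapsed_full_product = collapse_columns(multiply(A, C), bucket_of, num_buckets)
print(compressed_product == collapsed_full_product)  # True
```

The compressed product has only `num_buckets` columns, so if each row of $AC$ has few nonzero entries, most of them land in a bucket of their own.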
