SLIDE 1

A Hybrid 2D Method for Sparse Matrix Partitioning

Rob Bisseling, Tristan van Leeuwen (Utrecht University), Ümit Çatalyürek (Ohio State University). Support from BSIK-BRICKS/MSV and NCF.

SIAM Conf. Parallel Processing for Scientific Computing, Feb. 23, 2006

SLIDE 2

Outline

  • 1. Introduction
      Mondriaan 2D matrix partitioning; fine-grain 2D partitioning
  • 2. New: hybrid method for 2D partitioning
      The difficulty of hybrids; combining the Mondriaan and fine-grain methods
  • 3. Experimental results
      PageRank matrices: Stanford, Stanford-Berkeley; other sparse matrices: term-by-document, linear programming, polymers
  • 4. Conclusions and future work

SLIDE 3

Parallel sparse matrix–vector multiplication u := Av

A sparse m × n matrix, u a dense m-vector, v a dense n-vector:

u_i := sum_{j=0}^{n-1} a_ij v_j,  for 0 ≤ i < m.

[Figure: example distribution of A, u, and v over p = 2 processors.]

For p = 2, there are 4 phases: communicate, compute, communicate, compute.
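The local compute phases are ordinary sparse matrix–vector products. A minimal serial sketch in CSR (compressed sparse row) storage; the function name and storage choice are illustrative, not taken from any package discussed in the talk:

```python
def csr_matvec(row_ptr, col_ind, val, v):
    """Compute u := A v for a sparse matrix A stored in CSR form.
    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_ind[k] and val[k] give the column and value of the k-th nonzero."""
    m = len(row_ptr) - 1
    u = [0.0] * m
    for i in range(m):
        # u_i = sum over nonzeros a_ij in row i of a_ij * v_j
        for k in range(row_ptr[i], row_ptr[i + 1]):
            u[i] += val[k] * v[col_ind[k]]
    return u
```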

SLIDE 4

Hypergraph

[Figure]

Hypergraph with 9 vertices and 6 hyperedges (nets), partitioned over 2 processors

SLIDE 5

1D matrix partitioning using hypergraphs

[Figure: 6 × 7 matrix, columns as vertices and rows as nets.]

Column bipartitioning of an m × n matrix. The hypergraph H = (V, N) models the exact communication volume of sparse matrix–vector multiplication. Columns ≡ vertices: 0, 1, 2, 3, 4, 5, 6. Rows ≡ hyperedges (nets, subsets of V): n0 = {1, 4, 6}, n1 = {0, 3, 6}, n2 = {4, 5, 6}, n3 = {0, 2, 3}, n4 = {2, 3, 5}, n5 = {1, 4, 6}.
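The exact volume follows from the broken nets: a net whose columns are spread over λ parts costs λ − 1 communications. A minimal sketch, using the nets above with a hypothetical 2-way column assignment (the partition in the test is an illustration, not the one in the figure):

```python
def comm_volume(nets, part):
    """Exact communication volume of a column (1D) partitioning.
    nets: list of sets of column indices (one net per matrix row);
    part: dict mapping each column to its part number.
    A net spread over lam parts costs lam - 1 words."""
    vol = 0
    for net in nets:
        lam = len({part[j] for j in net})
        vol += lam - 1
    return vol
```

With the nets above and the hypothetical assignment {0, 1, 2, 3} → part 0, {4, 5, 6} → part 1, nets n0, n1, n4, n5 are broken, giving volume 4.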

SLIDE 6

Minimising communication volume

[Figure: the same hypergraph, bipartitioned; nets n1 and n2 are broken.]

Broken nets n1, n2 each cause one horizontal communication. Use Kernighan–Lin / Fiduccia–Mattheyses for hypergraph bipartitioning. Multilevel scheme: merge similar columns first, refine the bipartitioning afterwards. Used in PaToH (Çatalyürek and Aykanat 1999) for 1D matrix partitioning.

SLIDE 7

Mondriaan 2D matrix partitioning

Block distribution (without row/column permutations) of the 59 × 59 matrix impcol_b with 312 nonzeros, for p = 4. Mondriaan package v1.0 (May 2002), originally developed by Vastenhouw and Bisseling for partitioning term-by-document matrices for a parallel web search machine.

SLIDE 8

Mondriaan 2D partitioning

Recursively split the matrix into 2 parts. Try splits in the row and column directions, allowing permutations. Each time, choose the best direction.

SLIDE 9

Fine-grain 2D partitioning

Assign each nonzero of A individually to a part. Each nonzero becomes a vertex; each matrix row and column a hyperedge. Hence nz(A) vertices and m + n hyperedges. Proposed by Çatalyürek and Aykanat, 2001.
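Constructing the fine-grain hypergraph is straightforward: the k-th nonzero (i, j) becomes vertex k, which belongs to row-net i and column-net m + j. A minimal sketch (function name illustrative):

```python
def fine_grain_hypergraph(m, n, nonzeros):
    """Build the fine-grain hypergraph of an m x n sparse matrix.
    nonzeros: list of (i, j) positions; the k-th nonzero becomes vertex k.
    Returns m + n nets: net i (for i < m) holds the vertices of row i,
    and net m + j holds the vertices of column j."""
    nets = [set() for _ in range(m + n)]
    for k, (i, j) in enumerate(nonzeros):
        nets[i].add(k)        # row net
        nets[m + j].add(k)    # column net
    return nets
```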

SLIDE 10

Matrix view of fine-grain 2D partitioning

[Figure: matrix A (left) and its fine-grain matrix F = F_A (right), with vertices and nets labelled.]

An m × n matrix A with nz(A) nonzeros gives an (m + n) × nz(A) matrix F = F_A with 2 · nz(A) nonzeros: a_ij is the k-th nonzero of A ⇔ f_ik and f_{m+j,k} are nonzero in F.

SLIDE 11

Communication for fine-grain 2D partitioning

[Figure: A and F = F_A with a bipartitioning of the vertices.]

A broken net among the first m nets of the hypergraph of F means the nonzeros of row a_i∗ lie in different parts, hence horizontal communication in A. A broken net among the last n nets means vertical communication in A.

SLIDE 12

Fine-grain 2D partitioning

Recursively split the matrix into 2 parts, assigning individual nonzeros to parts.

SLIDE 13

The difficulty of hybrids — a story

The beautiful American dancer Isadora Duncan (1878–1927) suggested to the Irish writer George Bernard Shaw (1856–1950) that they should have a child together: "Think of it! With your brains and my body, what a wonder it would be." Shaw's reply: "Yes, but what if it had my body and your brains?" Source: http://www.chiasmus.com/mastersofchiasmus/shaw.shtml Many different versions exist. The story may be apocryphal.

SLIDE 14

Hybrid 2D partitioning

Recursively split the matrix into 2 parts. Try splits in the row and column directions, and the fine-grain split. Each time, choose the best of the 3.

SLIDE 15

Recursive, adaptive bipartitioning algorithm

MatrixPartition(A, p, ε)
input:  ε = allowed load imbalance, ε > 0
output: p-way partitioning of A with imbalance ≤ ε

if p > 1 then
    q := log2(p);
    (A0^r, A1^r) := h(A, row, ε/q);    { hypergraph splitting }
    (A0^c, A1^c) := h(A, col, ε/q);
    (A0^f, A1^f) := h(A, fine, ε/q);
    (A0, A1) := best of (A0^r, A1^r), (A0^c, A1^c), (A0^f, A1^f);
    maxnz := nz(A)/p · (1 + ε);
    ε0 := maxnz/nz(A0) · p/2 − 1;  MatrixPartition(A0, p/2, ε0);
    ε1 := maxnz/nz(A1) · p/2 − 1;  MatrixPartition(A1, p/2, ε1);
else
    output A;
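Treating the bipartitioner h as a black box, the recursion with its adaptive imbalance budget can be sketched as follows. The `split` callback and all names here are hypothetical, not the Mondriaan API:

```python
import math

def matrix_partition(A, p, eps, split, out=None):
    """Sketch of the recursive, adaptive bipartitioning scheme.
    A: list of nonzeros; p: number of parts (a power of 2);
    eps: allowed load imbalance (eps > 0);
    split(A, direction, eps): hypothetical hypergraph bipartitioner
    returning (A0, A1, volume) for direction in ('row', 'col', 'fine').
    Appends the p final parts to out and returns it."""
    if out is None:
        out = []
    if p == 1:
        out.append(A)
        return out
    q = math.log2(p)
    # try all three split directions; keep the lowest-volume one
    candidates = [split(A, d, eps / q) for d in ('row', 'col', 'fine')]
    A0, A1, _ = min(candidates, key=lambda c: c[2])
    # adapt the remaining imbalance budget to the actual split sizes
    maxnz = len(A) / p * (1 + eps)
    eps0 = maxnz / len(A0) * (p / 2) - 1
    eps1 = maxnz / len(A1) * (p / 2) - 1
    matrix_partition(A0, p // 2, eps0, split, out)
    matrix_partition(A1, p // 2, eps1, split, out)
    return out
```

A perfectly balanced split leaves the budget unchanged (ε0 = ε), while an uneven split tightens the budget for the larger half.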

SLIDE 16

Similarity metric for column merging (coarsening)

Column-scaled inner product:

M(u, v) = (1/ω_uv) · sum_{i=0}^{m−1} u_i v_i

ω_uv = 1 measures overlap
ω_uv = sqrt(d_u d_v) measures the cosine of the angle
ω_uv = min{d_u, d_v} measures relative overlap
ω_uv = max{d_u, d_v}

Here, d_u is the number of nonzeros of column u.
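All four scalings can be computed from the nonzero patterns alone. A minimal sketch; the function name and the string labels for the four scalings are illustrative:

```python
import math

def scaled_inner_product(u, v, scaling='cosine'):
    """Scaled inner product M(u, v) = (1/omega_uv) * sum_i u_i v_i.
    u, v: equal-length column vectors (0/1 nonzero patterns suffice);
    d_u, d_v are the nonzero counts of the columns."""
    ip = sum(ui * vi for ui, vi in zip(u, v))
    du = sum(1 for x in u if x != 0)
    dv = sum(1 for x in v if x != 0)
    omega = {
        'overlap': 1,                  # omega = 1
        'cosine': math.sqrt(du * dv),  # omega = sqrt(d_u d_v)
        'relative': min(du, dv),       # omega = min{d_u, d_v}
        'max': max(du, dv),            # omega = max{d_u, d_v}
    }[scaling]
    return ip / omega
```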

SLIDE 17

Speeding up the fine-grain method

[Figure: normalized average time for the matching variants ip, rnd, ip1, ip2 (values 1, 0.98597, 0.84233, 0.89712).]

ip = standard inner product matching. ip1 = inner product matching using an upper bound on the overlap (e.g. d_u) to stop searching early; for the fine-grain method the bound is sharper: 1 at the first level. ip2 = alternate between matching with overlap in the top and bottom rows. rnd = choose a random match with overlap ≥ 1.

SLIDE 18

Web searching: which page ranks first?

SLIDE 19

The link matrix A

Given n web pages with links between them, define the sparse n × n link matrix A by a_ij = 1 if there is a link from page j to page i, and a_ij = 0 otherwise.

Let e = (1, 1, . . . , 1)^T, representing an initial uniform importance (rank) of all web pages. Then

(Ae)_i = sum_j a_ij e_j = sum_j a_ij

is the total number of links pointing to page i. The vector Ae represents the importance of the pages; A²e takes the importance of the pointing pages into account as well; and so on.

SLIDE 20

The Google matrix

A web surfer chooses each of the N_j outgoing links from page j with equal probability. Define the n × n diagonal matrix D with d_jj = 1/N_j. Let α be the probability that a surfer follows an outlink of the current page; typically α = 0.85. The surfer jumps to a random page with probability 1 − α. The Google matrix is defined by (Brin and Page 1998)

G = αAD + (1 − α) e e^T / n.

The PageRank of a set of web pages is obtained by repeated multiplication by G, involving sparse matrix–vector multiplication by A and some vector operations.
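The resulting power iteration can be sketched matrix-free, for a toy link structure stored as an adjacency dict. Names are illustrative, and dangling pages are not treated specially here (as in the basic formulation above):

```python
def pagerank(links, alpha=0.85, iters=50):
    """Power-method PageRank sketch: repeatedly apply
    G = alpha*A*D + (1 - alpha)*e*e^T/n, without forming G.
    links: dict mapping each page j to the list of pages j links to;
    every page must appear as a key (dangling pages get an empty list)."""
    pages = sorted(links)
    n = len(pages)
    x = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        s = sum(x.values())
        # the (1 - alpha) e e^T / n term: uniform random jump
        y = {p: (1 - alpha) * s / n for p in pages}
        # the alpha A D term: follow one of the N_j outlinks of page j
        for j, out in links.items():
            if out:
                w = alpha * x[j] / len(out)
                for i in out:
                    y[i] += w
        x = y
    return x
```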

SLIDE 21

Comparing 1D, 2D fine-grain, and 2D Mondriaan

The following 1D and 2D fine-grain communication volumes for PageRank matrices are published results from the parallel program Parkway v2.1 (Bradley, de Jager, Knottenbelt, Trifunović 2005). The 2D Mondriaan volumes are results with all our improvements (to be incorporated in v2.0), but using only row/column partitioning, not the fine-grain option.

SLIDE 22

Communication volume: PageRank matrix Stanford

[Figure: communication volume (×10^4) for Parkway 1D, Parkway fine-grain, and Mondriaan 2D, for p = 4, 8, 16.]

n = 281,903 (pages), nz(A) = 2,594,228 nonzeros (links). Represents the Stanford WWW subdomain, obtained by a web crawl in September 2002 by Sep Kamvar.

SLIDE 23

Communication volume: Stanford_Berkeley

[Figure: communication volume (×10^4) for Parkway 1D, Parkway fine-grain, and Mondriaan 2D, for p = 4, 8, 16.]

n = 683,446, nz(A) = 8,262,087 nonzeros. Represents the Stanford and Berkeley subdomains, obtained by a web crawl in Dec. 2002 by Sep Kamvar.

SLIDE 24

Meaning of results

Both 2D methods save an order of magnitude in communication volume compared to 1D. Parkway fine-grain is slightly better than Mondriaan in terms of partitioning quality. This may be due to a better implementation, or due to the fine-grain method itself; further investigation is needed. 2D Mondriaan is much faster than fine-grain, since the hypergraphs involved are much smaller: 7 × 10^5 vs. 8 × 10^6 vertices for Stanford_Berkeley.

SLIDE 25

Transition matrix cage6 of Markov model

Reduced transition matrix cage6 with n = 93, nz(A) = 785 for polymer length L = 6. Larger matrix cage10 is included in our test set of 18 matrices representing various applications: 3 linear programming matrices, 2 information retrieval, 2 chemical engineering, 2 circuit simulation, 1 polymer simulation, . . .

SLIDE 26

Average communication volume for 3 methods

[Figure: communication volume relative to the original Mondriaan v1.02, for 2D Mondriaan, fine-grain, and hybrid.]

Test set of 18 matrices (smaller than the PageRank matrices). Volume relative to the original Mondriaan program, v1.02. Implementation: Mondriaan's own hypergraph partitioner. The fine-grain method has more freedom to find a good partitioning, but shows no gains on average.

SLIDE 27

Average communication volume for 3 methods

[Figure: communication volume relative to the original Mondriaan v1.02, for 2D Mondriaan, fine-grain, and hybrid.]

Test set of 18 matrices. Volume relative to the original Mondriaan program, v1.02. Implementation: the PaToH hypergraph partitioner, which is highly optimised, and it shows. The hybrid method shows a little gain over 2D Mondriaan.

SLIDE 28

Conclusions and . . .

We have presented a new hybrid method which combines two different 2D matrix partitioning methods: Mondriaan and fine-grain. The hybrid improves upon both. With a highly optimised hypergraph partitioner such as PaToH as the partitioning engine, the Mondriaan 2D method achieves almost the same quality as the hybrid method, but much faster.

PageRank is a wonderful non-PDE application:

  • it affects our lives daily
  • it has embedded mathematical high technology
  • it uses the power method; only mathematicians and computer scientists know what this really means!
  • it exposes the power of 2D matrix partitioning methods

SLIDE 29

. . . future work

We keep improving the Mondriaan and PaToH hypergraph partitioners. The new release of Mondriaan, v2.0, will incorporate all improvements. Mondriaan and PaToH are sequential. Soon, the parallel hypergraph partitioner Zoltan will be released by Sandia National Laboratories (Devine, Boman, Heaphy, Bisseling, Çatalyürek 2006), with many features from Mondriaan and PaToH, and a lot more. The first parallel partitioner, Parkway 2.1 (Knottenbelt, Trifunović 2005), is also publicly available. Partition PageRank in parallel!
