


1. A Hybrid 2D Method for Sparse Matrix Partitioning. Rob Bisseling, Tristan van Leeuwen (Utrecht University); Ümit Çatalyürek (Ohio State University). Support from BSIK-BRICKS/MSV and NCF. PMAA 2008, Neuchâtel, June 20, 2008

2. Outline
1. Introduction: Mondriaan 2D matrix partitioning; fine-grain 2D partitioning
2. New: a hybrid method for 2D partitioning, combining the Mondriaan and fine-grain methods
3. Experimental results: PageRank matrices (Stanford-Berkeley subdomain); other sparse matrices (term-by-document, linear programming, polymers)
4. Conclusions and future work

3. Parallel sparse matrix–vector multiplication u := Av, with A a sparse m × n matrix, u a dense m-vector, and v a dense n-vector:
u_i := Σ_{j=0}^{n−1} a_ij v_j
[Figure: an example sparse matrix A and vectors u, v distributed over p = 2 processors]
4 phases: communicate, compute, communicate, compute
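The computation phase above is an ordinary sparse matrix–vector product over the locally owned nonzeros. A minimal sketch of that core, using an illustrative triplet (COO) representation not taken from the talk:

```python
# Sketch of the compute phase of u := A v for a sparse A stored as
# (i, j, a_ij) triplets. In the parallel algorithm, the two communicate
# phases would exchange vector components between processors before and
# after this loop; only the sequential core is shown here.

def spmv(m, triplets, v):
    """Return u with u_i = sum_j a_ij * v_j."""
    u = [0.0] * m
    for i, j, a in triplets:
        u[i] += a * v[j]
    return u

# 2x3 example: A = [[1, 0, 2], [0, 3, 0]]
A = [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0)]
print(spmv(2, A, [1.0, 1.0, 1.0]))  # [3.0, 3.0]
```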

4. Hypergraph
[Figure: a hypergraph with 9 vertices (0–8) and 6 hyperedges (nets), partitioned over 2 processors]

5. 1D matrix partitioning using hypergraphs
[Figure: column bipartitioning of an m × n matrix; columns 0–6 as vertices, rows 0–5 as nets]
Column bipartitioning of an m × n matrix. Hypergraph H = (V, N) ⇒ exact communication volume in sparse matrix–vector multiplication.
Columns ≡ vertices: 0, 1, 2, 3, 4, 5, 6.
Rows ≡ hyperedges (nets, subsets of V): n0 = {1, 4, 6}, n1 = {0, 3, 6}, n2 = {4, 5, 6}, n3 = {0, 2, 3}, n4 = {2, 3, 5}, n5 = {1, 4, 6}.
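Building this hypergraph from the sparsity pattern is direct: each row becomes a net containing the columns in which it has a nonzero. A small sketch, with an illustrative input format of (row, column) positions:

```python
# Sketch: build the 1D hypergraph of slide 5, where each matrix column
# is a vertex and each row is a net holding the columns in which that
# row has a nonzero.

def row_nets(positions, m):
    """Return nets[i] = set of column indices with a nonzero in row i."""
    nets = [set() for _ in range(m)]
    for i, j in positions:
        nets[i].add(j)
    return nets

# The nonzero pattern implied by slide 5's nets n0..n5:
pos = [(0, 1), (0, 4), (0, 6), (1, 0), (1, 3), (1, 6), (2, 4), (2, 5),
       (2, 6), (3, 0), (3, 2), (3, 3), (4, 2), (4, 3), (4, 5), (5, 1),
       (5, 4), (5, 6)]
nets = row_nets(pos, 6)
print(nets[1])  # {0, 3, 6}
```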

6. Minimising communication volume
[Figure: the bipartitioned hypergraph of slide 5; vertices 0–6, nets 0–5]
Broken nets n1, n2 each cause one horizontal communication.
Use Kernighan–Lin/Fiduccia–Mattheyses for hypergraph bipartitioning.
Multilevel scheme: merge similar columns first, refine the bipartitioning afterwards.
Used in PaToH (Çatalyürek and Aykanat 1999) for 1D matrix partitioning.
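For a bipartition, the communication volume the partitioners minimise is simply the number of broken nets, i.e. nets with vertices in both parts. A sketch of this count, with an illustrative bipartition that is not the one from the slide's figure:

```python
# Sketch: communication volume of a bipartitioning, counted as the
# number of "broken" nets (nets whose vertices lie in both parts).
# For two parts this coincides with the standard lambda-1 metric.

def comm_volume(nets, part):
    """nets: list of sets of vertices; part: dict vertex -> 0 or 1."""
    return sum(1 for net in nets if len({part[v] for v in net}) > 1)

nets = [{1, 4, 6}, {0, 3, 6}, {4, 5, 6}, {0, 2, 3}, {2, 3, 5}, {1, 4, 6}]
# An illustrative bipartition of columns 0..6:
part = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
print(comm_volume(nets, part))  # 4
```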

7. Mondriaan 2D matrix partitioning
[Figure: block distribution (without row/column permutations) of the 59 × 59 matrix impcol_b with 312 nonzeros, for p = 4]
Mondriaan package v1.0 (May 2002). Originally developed by Vastenhouw and Bisseling for partitioning term-by-document matrices for a parallel web search machine.

8. Mondriaan 2D partitioning
[Figure: successive splits of the matrix]
Recursively split the matrix into 2 parts. Try splits in the row and column directions, allowing permutations. Each time, choose the best direction.

9. Fine-grain 2D partitioning
Assign each nonzero of A individually to a part. Each nonzero becomes a vertex in the hypergraph. Each matrix row and column becomes a hyperedge. Hence nz(A) vertices and m + n hyperedges. Proposed by Çatalyürek and Aykanat, 2001.
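The fine-grain hypergraph can be sketched directly from the nonzero positions; the input format below is illustrative:

```python
# Sketch: the fine-grain hypergraph (Catalyurek and Aykanat, 2001).
# Each nonzero a_ij becomes a vertex; each row and each column becomes
# a net containing the nonzeros it touches.

def fine_grain_hypergraph(positions, m, n):
    """positions: list of (i, j) nonzero coordinates.
    Returns (num_vertices, nets): one vertex per nonzero,
    m row nets followed by n column nets."""
    nets = [set() for _ in range(m + n)]
    for k, (i, j) in enumerate(positions):
        nets[i].add(k)          # row net i
        nets[m + j].add(k)      # column net j
    return len(positions), nets

nv, nets = fine_grain_hypergraph([(0, 0), (0, 1), (1, 1)], 2, 2)
print(nv, len(nets))  # 3 4
```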

10. Matrix view of fine-grain 2D partitioning
[Figure: the matrix A (columns 0–6) and the incidence matrix F = F_A (columns 0–15)]
View the fine-grain hypergraph as an incidence matrix: an m × n matrix A with nz(A) nonzeros gives an (m + n) × nz(A) matrix F = F_A with 2 · nz(A) nonzeros, where a_ij is the k-th nonzero of A ⇔ f_{i,k} and f_{m+j,k} are nonzero in F.
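The construction of F from the nonzero positions of A can be sketched as follows; the triplet representation is an illustrative choice:

```python
# Sketch: build the incidence matrix F = F_A of slide 10. If a_ij is
# the k-th nonzero of A, then entries (i, k) and (m + j, k) of F are
# nonzero, so F has exactly 2 * nz(A) nonzeros.

def incidence_matrix(positions, m):
    """Return the nonzero positions of F as (row, col) pairs."""
    F = []
    for k, (i, j) in enumerate(positions):
        F.append((i, k))        # row net of A
        F.append((m + j, k))    # column net of A
    return F

F = incidence_matrix([(0, 0), (0, 1), (1, 1)], 2)
print(len(F))  # 6, i.e. 2 * nz(A)
```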

11. Communication for fine-grain 2D partitioning
[Figure: the matrix A and its incidence matrix F = F_A, bipartitioned]
A broken net among the first m nets of the hypergraph of F means that nonzeros from row a_i∗ are in different parts, hence horizontal communication in A. A broken net among the last n nets means vertical communication in A.

12. Fine-grain 2D partitioning
[Figure: successive splits of the matrix]
Recursively split the matrix into 2 parts, assigning individual nonzeros to parts. For visualisation: move mixed rows to the middle, red up, blue down; same for columns.

13. Hybrid 2D partitioning
[Figure: successive splits of the matrix]
Recursively split the matrix into 2 parts. Try splits in the row and column directions, and fine-grain. Each time, choose the best of the 3.

14. Recursive, adaptive bipartitioning algorithm

MatrixPartition(A, p, ε)
input: ε = allowed load imbalance, ε > 0.
output: p-way partitioning of A with imbalance ≤ ε.
if p > 1 then
    q := log2 p;
    (A_0^r, A_1^r) := h(A, row, ε/q);    { hypergraph splitting }
    (A_0^c, A_1^c) := h(A, col, ε/q);
    (A_0^f, A_1^f) := h(A, fine, ε/q);
    (A_0, A_1) := best of (A_0^r, A_1^r), (A_0^c, A_1^c), (A_0^f, A_1^f);
    maxnz := nz(A)(1 + ε)/p;
    ε_0 := maxnz · (p/2)/nz(A_0) − 1;  MatrixPartition(A_0, p/2, ε_0);
    ε_1 := maxnz · (p/2)/nz(A_1) − 1;  MatrixPartition(A_1, p/2, ε_1);
else output A;
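The recursion and the adaptive adjustment of ε can be sketched in a few lines. The splitter below is a toy stand-in for the hypergraph-based h(A, direction, ε) of the algorithm, and the names are illustrative:

```python
# Sketch of the recursive driver of slide 14 for p a power of 2.
# toy_split stands in for the hypergraph splitter h; a real
# implementation would try row, column, and fine-grain splits and keep
# the best. Nonzeros are modelled as an opaque list.
import math

def toy_split(nonzeros, direction, eps):
    """Stand-in for h(A, direction, eps): halve the nonzero list."""
    half = len(nonzeros) // 2
    return nonzeros[:half], nonzeros[half:]

def matrix_partition(nonzeros, p, eps, parts=None):
    """Return a list of p nonzero subsets, one per part."""
    if parts is None:
        parts = []
    if p > 1:
        q = math.log2(p)
        a0, a1 = toy_split(nonzeros, 'row', eps / q)
        maxnz = len(nonzeros) * (1 + eps) / p
        eps0 = maxnz * (p / 2) / len(a0) - 1   # re-derive allowed imbalance
        eps1 = maxnz * (p / 2) / len(a1) - 1
        matrix_partition(a0, p // 2, eps0, parts)
        matrix_partition(a1, p // 2, eps1, parts)
    else:
        parts.append(nonzeros)
    return parts

parts = matrix_partition(list(range(16)), 4, 0.1)
print(len(parts))  # 4
```

Recomputing ε_0 and ε_1 from maxnz lets a lopsided early split be compensated later: the part that received fewer nonzeros gets more slack in its subsequent splits.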

15. Non-power-of-2 algorithm

MatrixPartition(A, p, ε)
input: ε = allowed load imbalance, ε > 0.
output: p-way partitioning of A with imbalance ≤ ε.
if p > 1 then
    q := ⌈log2 p⌉;
    (A_0^r, A_1^r) := h(A, row, ε/q);
    (A_0^c, A_1^c) := h(A, col, ε/q);
    (A_0^f, A_1^f) := h(A, fine, ε/q);
    (A_0, A_1) := best of (A_0^r, A_1^r), (A_0^c, A_1^c), (A_0^f, A_1^f);
    choose p_0, p_1 ≥ 1 with p = p_0 + p_1;
    maxnz := nz(A)(1 + ε)/p;
    ε_0 := maxnz · p_0/nz(A_0) − 1;  MatrixPartition(A_0, p_0, ε_0);
    ε_1 := maxnz · p_1/nz(A_1) − 1;  MatrixPartition(A_1, p_1, ε_1);
else output A;

16. Similarity metric for column merging (coarsening)
Column-scaled inner product:
M(u, v) = (1/ω_uv) Σ_{i=0}^{m−1} u_i v_i
ω_uv = 1 measures overlap;
ω_uv = √(d_u d_v) measures the cosine of the angle;
ω_uv = min{d_u, d_v} measures relative overlap;
ω_uv = max{d_u, d_v};
ω_uv = d_{u∪v}, the Jaccard metric from information retrieval.
Here, d_u is the number of nonzeros of column u.
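For 0/1 column patterns the inner product is just the size of the overlap, so all five scalings reduce to set arithmetic. A sketch, representing each column as the set of its nonzero row indices (an illustrative choice):

```python
# Sketch of the column-scaled inner product of slide 16 for 0/1 column
# patterns, with the five scaling choices omega_uv.
import math

def similarity(u, v, scaling):
    """u, v: sets of row indices where the two columns are nonzero."""
    inner = len(u & v)              # sum_i u_i * v_i for 0/1 patterns
    du, dv = len(u), len(v)
    omega = {
        'overlap': 1,
        'cosine':  math.sqrt(du * dv),
        'min':     min(du, dv),
        'max':     max(du, dv),
        'jaccard': len(u | v),      # d_{u union v}
    }[scaling]
    return inner / omega

u, v = {0, 1, 2}, {1, 2, 3}
print(similarity(u, v, 'jaccard'))  # 2/4 = 0.5
```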

17. Speeding up the fine-grain method
[Bar chart: normalized average partitioning time: ip = 1, rnd = 0.98597, ip1 = 0.84233, ip2 = 0.89712]
ip = standard inner product matching.
ip1 = inner product matching using an upper bound on the overlap, e.g. d_u, to stop searching early. For the fine-grain method, the bound is sharper: 1 at the first level.
ip2 = alternate between matching with overlap in the top and bottom rows.
rnd = choose a random match with overlap ≥ 1.

18. Web searching: which page ranks first?

19. The link matrix A
Given n web pages with links between them, we define the sparse n × n link matrix A by
a_ij = 1 if there is a link from page j to page i, and a_ij = 0 otherwise.
Let e = (1, 1, ..., 1)^T, representing an initial uniform importance (rank) of all web pages. Then
(Ae)_i = Σ_j a_ij e_j = Σ_j a_ij
is the total number of links pointing to page i. The vector Ae represents the importance of the pages; A²e takes the importance of the pointing pages into account as well; and so on.
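So the first iterate Ae is just the vector of in-link counts. A minimal sketch, representing the links as (source, target) pairs (an illustrative format):

```python
# Sketch: the in-link counts (A e)_i of slide 19, computed directly
# from the link list without forming A.

def in_link_counts(links, n):
    """(A e)_i = number of links pointing to page i."""
    counts = [0] * n
    for j, i in links:          # link from page j to page i => a_ij = 1
        counts[i] += 1
    return counts

# 3 pages with links 0 -> 1, 0 -> 2, 1 -> 2:
print(in_link_counts([(0, 1), (0, 2), (1, 2)], 3))  # [0, 1, 2]
```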

20. The Google matrix
A web surfer chooses each of the N_j outgoing links from page j with equal probability. Define the n × n diagonal matrix D with d_jj = 1/N_j. Let α be the probability that a surfer follows an outlink of the current page; typically α = 0.85. The surfer jumps to a random page with probability 1 − α. The Google matrix is defined by (Brin and Page 1998)
G = αAD + (1 − α)ee^T/n.
The PageRank of a set of web pages is obtained by repeated multiplication by G, involving sparse matrix–vector multiplication by A, and some vector operations.
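One step of the resulting power iteration can be applied without ever forming G: scale by D, multiply by the sparse A, and add the rank-one teleportation term. A sketch under the simplifying assumption that every page has at least one outlink (a full implementation would also redistribute the weight of dangling pages):

```python
# Sketch of one PageRank power-iteration step x := G x, with
# G = alpha*A*D + (1 - alpha)*e*e^T/n applied implicitly: the sparse
# matvec with A*D plus a teleportation correction. Dangling pages
# (N_j = 0) are ignored here for simplicity.

def google_step(links, n, x, alpha=0.85):
    out_deg = [0] * n
    for j, i in links:
        out_deg[j] += 1
    y = [0.0] * n
    for j, i in links:                    # sparse matvec with A*D
        y[i] += alpha * x[j] / out_deg[j]
    t = (1 - alpha) * sum(x) / n          # teleportation term e*e^T/n
    return [yi + t for yi in y]

x = [1 / 3] * 3
x = google_step([(0, 1), (0, 2), (1, 2)], 3, x)
print(round(sum(x), 6))  # 0.716667 (< 1 because page 2 is dangling)
```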

21. Comparing 1D, 2D fine-grain, and 2D Mondriaan
The following 1D and 2D fine-grain communication volumes for PageRank matrices are published results from the parallel program Parkway v2.1 (Bradley, de Jager, Knottenbelt, Trifunović 2005). The 2D Mondriaan volumes are results with all our improvements (incorporated in v2.0), but using only row/column partitioning, not the fine-grain option.

22. Communication volume: Stanford_Berkeley
[Bar chart: communication volume (×10^4) for p = 4, 8, 16, comparing Parkway 1D, Parkway fine-grained, and Mondriaan 2D]
n = 683,446, nz(A) = 8,262,087 nonzeros. Represents the Stanford and Berkeley subdomains, obtained by a web crawl in Dec. 2002 by Sep Kamvar.

23. Meaning of results
Both 2D methods save an order of magnitude in communication volume compared to 1D. Parkway fine-grain is slightly better than Mondriaan in terms of partitioning quality. This may be due to a better implementation, or due to the fine-grain method itself; further investigation is needed. 2D Mondriaan is much faster than fine-grain, since the hypergraphs involved are much smaller: 7 × 10^5 vs. 8 × 10^6 vertices for Stanford_Berkeley.

24. Transition matrix cage6 of Markov model
[Figure: reduced transition matrix cage6 with n = 93, nz(A) = 785, for polymer length L = 6]
The larger matrix cage10 is included in our test set of 18 matrices representing various applications: 3 linear programming matrices, 2 information retrieval, 2 chemical engineering, 2 circuit simulation, 1 polymer simulation, ...

