
A Hybrid 2D Method for Sparse Matrix Partitioning



  1. A Hybrid 2D Method for Sparse Matrix Partitioning. Rob Bisseling, Tristan van Leeuwen (Utrecht University); Ümit Çatalyürek (Ohio State University). Support from BSIK-BRICKS/MSV and NCF. SIAM Conf. Parallel Processing for Scientific Computing, Feb. 23, 2006.

  2. Outline. (1) Introduction: Mondriaan 2D matrix partitioning; fine-grain 2D partitioning. (2) New: hybrid method for 2D partitioning: the difficulty of hybrids; combining the Mondriaan and fine-grain methods. (3) Experimental results: PageRank matrices (Stanford, Stanford-Berkeley); other sparse matrices (term-by-document, linear programming, polymers). (4) Conclusions and future work.

  3. Parallel sparse matrix–vector multiplication $u := Av$. $A$ is a sparse $m \times n$ matrix, $u$ a dense $m$-vector, $v$ a dense $n$-vector: $u_i := \sum_{j=0}^{n-1} a_{ij} v_j$. [Figure: example matrix $A$ with input vector $v$ and output vector $u$, distributed for $p = 2$.] 4 phases: communicate, compute, communicate, compute.
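The four phases can be illustrated with a small sequential simulation. This is only a sketch under assumptions not stated on the slide: the function name `parallel_spmv`, the owner arrays, and the 3 × 3 example are all hypothetical, and the first communication phase is read as fetching needed components of v while the second sends partial sums to the owner of each u_i.

```python
from collections import defaultdict

def parallel_spmv(nonzeros, v, nz_owner, v_owner, u_owner, m):
    """Simulate u := A v on several processors; return (u, words communicated)."""
    # Phase 1 (communicate): each processor obtains the v_j it needs but does not own.
    needed = defaultdict(set)                 # processor -> column indices it uses
    for (i, j, _), s in zip(nonzeros, nz_owner):
        needed[s].add(j)
    fanout = sum(1 for s, cols in needed.items() for j in cols if v_owner[j] != s)

    # Phase 2 (compute): local partial sums over the nonzeros each processor owns.
    partial = defaultdict(float)              # (row, processor) -> partial sum
    for (i, j, a), s in zip(nonzeros, nz_owner):
        partial[(i, s)] += a * v[j]

    # Phase 3 (communicate): partial sums travel to the owner of u_i.
    fanin = sum(1 for (i, s) in partial if u_owner[i] != s)

    # Phase 4 (compute): the owner of u_i adds the partial sums it holds.
    u = [0.0] * m
    for (i, _), value in partial.items():
        u[i] += value
    return u, fanout + fanin

# Hypothetical example: A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]], p = 2.
nonzeros = [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0), (2, 0, 4.0), (2, 2, 5.0)]
nz_owner = [0, 1, 1, 0, 0]                    # processor owning each nonzero
u, words = parallel_spmv(nonzeros, [1.0, 2.0, 3.0], nz_owner,
                         v_owner=[0, 1, 0], u_owner=[0, 1, 0], m=3)
print(u, words)
```

In this example one component of v and one partial sum cross processor boundaries, so two data words are communicated in total.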

  4. Hypergraph. [Figure: hypergraph with 9 vertices (0–8) and 6 hyperedges (nets), partitioned over 2 processors.]

  5. 1D matrix partitioning using hypergraphs. [Figure: example matrix with columns labelled as vertices 0–6 and rows as nets 0–5.] Column bipartitioning of an $m \times n$ matrix. Hypergraph $H = (V, N)$ ⇒ exact communication volume in sparse matrix–vector multiplication. Columns ≡ vertices: 0, 1, 2, 3, 4, 5, 6. Rows ≡ hyperedges (nets, subsets of $V$): $n_0 = \{1, 4, 6\}$, $n_1 = \{0, 3, 6\}$, $n_2 = \{4, 5, 6\}$, $n_3 = \{0, 2, 3\}$, $n_4 = \{2, 3, 5\}$, $n_5 = \{1, 4, 6\}$.

  6. Minimising communication volume. [Figure: the same example matrix, columns as vertices 0–6 and rows as nets 0–5, after bipartitioning.] Broken nets $n_1$ and $n_2$ each cause one horizontal communication. Use Kernighan–Lin/Fiduccia–Mattheyses for hypergraph bipartitioning. Multilevel scheme: merge similar columns first, refine the bipartitioning afterwards. Used in PaToH (Çatalyürek and Aykanat 1999) for 1D matrix partitioning.
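A short sketch of how broken nets translate into communication volume, using the nets listed on slide 5. The column bipartition `part` is an assumption: the slides do not state it, and it was chosen here because it breaks exactly nets n_1 and n_2. The function name `communication_volume` is likewise hypothetical.

```python
# Nets (rows) of the slide-5 example; each net is the set of columns with a nonzero.
nets = {0: {1, 4, 6}, 1: {0, 3, 6}, 2: {4, 5, 6},
        3: {0, 2, 3}, 4: {2, 3, 5}, 5: {1, 4, 6}}

# Assumed bipartition of the columns (vertices): not given on the slides.
part = {0: 0, 2: 0, 3: 0, 5: 0, 1: 1, 4: 1, 6: 1}

def communication_volume(nets, part):
    """Return (broken nets, volume), where a net spanning k parts costs k - 1."""
    broken, volume = [], 0
    for name, vertices in sorted(nets.items()):
        spanned = len({part[v] for v in vertices})
        if spanned > 1:
            broken.append(name)
        volume += spanned - 1
    return broken, volume

broken, volume = communication_volume(nets, part)
print(broken, volume)
```

For a bipartition every broken net spans exactly two parts, so the volume equals the number of broken nets.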

  7. Mondriaan 2D matrix partitioning. Block distribution (without row/column permutations) of the 59 × 59 matrix impcol_b with 312 nonzeros, for p = 4. Mondriaan package v1.0 (May 2002). Originally developed by Vastenhouw and Bisseling for partitioning term-by-document matrices for a parallel web search machine.

  8. Mondriaan 2D partitioning. [Figure: successive splits of the matrix.] Recursively split the matrix into 2 parts. Try splits in row and column directions, allowing permutations. Each time, choose the best direction.

  9. Fine-grain 2D partitioning. Assign each nonzero of $A$ individually to a part. Each nonzero becomes a vertex; each matrix row and column becomes a hyperedge. Hence $nz(A)$ vertices and $m + n$ hyperedges. Proposed by Çatalyürek and Aykanat, 2001.

  10. Matrix view of fine-grain 2D partitioning. [Figure: example matrix $A$ and the corresponding matrix $F = F_A$, with the columns of $F$ as vertices and its rows as nets.] $A$ is an $m \times n$ matrix with $nz(A)$ nonzeros; $F = F_A$ is an $(m + n) \times nz(A)$ matrix with $2 \cdot nz(A)$ nonzeros. $a_{ij}$ is the $k$th nonzero of $A$ ⇔ $f_{ik}$ and $f_{m+j,k}$ are nonzero in $F$.

  11. Communication for fine-grain 2D partitioning. [Figure: example matrix $A$ and the corresponding matrix $F = F_A$.] Broken net in the first $m$ nets of the hypergraph of $F$: nonzeros from row $a_{i*}$ are in different parts, hence horizontal communication in $A$. Broken net in the last $n$ nets of the hypergraph of $F$: vertical communication in $A$.
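The construction of F and the reading of its broken nets can be sketched as follows. The helper names `fine_grain_matrix` and `broken_nets`, the 2 × 3 example pattern, and the assignment `part` of nonzeros to parts are all assumptions made for illustration.

```python
def fine_grain_matrix(nonzeros, m):
    """Positions of the nonzeros of F: the k-th nonzero a_ij gives f_ik and f_(m+j),k."""
    f = []
    for k, (i, j) in enumerate(nonzeros):
        f.append((i, k))          # row net of A
        f.append((m + j, k))      # column net of A
    return f

def broken_nets(f_nonzeros, part, m):
    """Split the broken nets of F into rows of A (horizontal) and columns of A (vertical)."""
    parts_in_net = {}
    for r, k in f_nonzeros:
        parts_in_net.setdefault(r, set()).add(part[k])
    horizontal = sorted(r for r, ps in parts_in_net.items() if r < m and len(ps) > 1)
    vertical = sorted(r - m for r, ps in parts_in_net.items() if r >= m and len(ps) > 1)
    return horizontal, vertical

# Hypothetical 2 x 3 pattern with 4 nonzeros, listed in the order k = 0, 1, 2, 3.
m, n = 2, 3
nonzeros = [(0, 0), (0, 2), (1, 1), (1, 2)]
f = fine_grain_matrix(nonzeros, m)
part = [0, 0, 1, 1]               # assumed part of each nonzero (vertex)
horizontal, vertical = broken_nets(f, part, m)
print(len(f), horizontal, vertical)
```

Here each row of A stays within one part, while column 2 of A has nonzeros in both parts, so only vertical communication occurs.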

  12. Fine-grain 2D partitioning. [Figure: successive splits of the matrix.] Recursively split the matrix into 2 parts. Assign individual nonzeros to parts.

  13. The difficulty of hybrids: a story. The beautiful American dancer Isadora Duncan (1878–1927) suggested to the Irish writer George Bernard Shaw (1856–1950) that they should have a child together: "Think of it! With your brains and my body, what a wonder it would be." Shaw's reply: "Yes, but what if it had my body and your brains?" Source: http://www.chiasmus.com/mastersofchiasmus/shaw.shtml. Many different versions exist. The story may be apocryphal.

  14. Hybrid 2D partitioning. [Figure: successive splits of the matrix.] Recursively split the matrix into 2 parts. Try splits in the row and column directions, and fine-grain. Each time, choose the best of the 3.

  15. Recursive, adaptive bipartitioning algorithm.
      MatrixPartition($A, p, \epsilon$)
      input: $\epsilon$ = allowed load imbalance, $\epsilon > 0$.
      output: $p$-way partitioning of $A$ with imbalance $\le \epsilon$.
      if $p > 1$ then
          $q := \log_2 p$;
          $(A_0^r, A_1^r) := h(A, \mathrm{row}, \epsilon/q)$; (hypergraph splitting)
          $(A_0^c, A_1^c) := h(A, \mathrm{col}, \epsilon/q)$;
          $(A_0^f, A_1^f) := h(A, \mathrm{fine}, \epsilon/q)$;
          $(A_0, A_1) :=$ best of $(A_0^r, A_1^r)$, $(A_0^c, A_1^c)$, $(A_0^f, A_1^f)$;
          $\mathrm{maxnz} := \frac{nz(A)}{p}(1 + \epsilon)$;
          $\epsilon_0 := \frac{\mathrm{maxnz}}{nz(A_0)} \cdot \frac{p}{2} - 1$; MatrixPartition($A_0, p/2, \epsilon_0$);
          $\epsilon_1 := \frac{\mathrm{maxnz}}{nz(A_1)} \cdot \frac{p}{2} - 1$; MatrixPartition($A_1, p/2, \epsilon_1$);
      else output $A$;
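The recursion and the imbalance bookkeeping can be sketched in a few lines. This is not the Mondriaan implementation: `naive_split` is a hypothetical stand-in for the hypergraph splitter h, cutting the sorted nonzeros near the middle without any multilevel refinement, and `volume` simply counts rows and columns with nonzeros in both parts. The sketch assumes p is a power of 2 and that the matrix has at least p nonzeros.

```python
import math

def volume(a0, a1):
    """Communication volume of a bipartition: rows plus columns present in both parts."""
    rows = {i for i, _ in a0} & {i for i, _ in a1}
    cols = {j for _, j in a0} & {j for _, j in a1}
    return len(rows) + len(cols)

def naive_split(a, direction):
    """Hypothetical stand-in for h(A, direction, eps); NOT a hypergraph partitioner."""
    if direction == "fine":                   # individual nonzeros may go anywhere
        order = sorted(a)
        return order[:len(a) // 2], order[len(a) // 2:]
    axis = 0 if direction == "row" else 1     # keep whole rows or columns together
    groups = {}
    for nz in a:
        groups.setdefault(nz[axis], []).append(nz)
    a0, a1 = [], []
    for key in sorted(groups):
        target = a0 if len(a0) + len(groups[key]) <= len(a) / 2 else a1
        target.extend(groups[key])
    return a0, a1

def matrix_partition(a, p, eps):
    """Recursive bipartitioning of a list of (i, j) nonzeros into p parts."""
    if p == 1:
        return [a]
    q = math.log2(p)
    max_part = (1 + eps / q) * len(a) / 2     # size limit for this split
    candidates = [naive_split(a, d) for d in ("row", "col", "fine")]
    candidates = [s for s in candidates if s[0] and s[1]]
    # Prefer splits within the size limit, then the lowest communication volume.
    a0, a1 = min(candidates,
                 key=lambda s: (max(len(s[0]), len(s[1])) > max_part, volume(*s)))
    maxnz = len(a) / p * (1 + eps)
    eps0 = maxnz / len(a0) * (p / 2) - 1      # imbalance still allowed within A0
    eps1 = maxnz / len(a1) * (p / 2) - 1      # imbalance still allowed within A1
    return matrix_partition(a0, p // 2, eps0) + matrix_partition(a1, p // 2, eps1)

# Hypothetical 4 x 4 pattern made of two dense 2 x 2 diagonal blocks.
nonzeros = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (2, 3), (3, 2), (3, 3)]
parts = matrix_partition(nonzeros, 4, 0.1)
print([len(part) for part in parts])
```

Because a perfectly balanced first split leaves the full imbalance budget unused, the recomputed values of eps0 and eps1 pass that budget on to the next level of the recursion.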

  16. Similarity metric for column merging (coarsening). Column-scaled inner product: $M(u, v) = \frac{1}{\omega_{uv}} \sum_{i=0}^{m-1} u_i v_i$. $\omega_{uv} = 1$ measures overlap; $\omega_{uv} = \sqrt{d_u d_v}$ measures the cosine of the angle; $\omega_{uv} = \min\{d_u, d_v\}$ measures relative overlap; $\omega_{uv} = \max\{d_u, d_v\}$. Here, $d_u$ is the number of nonzeros of column $u$.
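A minimal sketch of the four scalings, assuming columns are stored as dictionaries mapping row index to value. The function name `similarity` and the scaling labels `"none"`, `"cosine"`, `"min"`, `"max"` are hypothetical names introduced only for this example.

```python
import math

def similarity(u, v, scaling="none"):
    """Column-scaled inner product M(u, v) for columns stored as {row: value}."""
    inner = sum(value * v[i] for i, value in u.items() if i in v)
    d_u, d_v = len(u), len(v)                 # numbers of nonzeros per column
    omega = {
        "none": 1,                            # plain overlap
        "cosine": math.sqrt(d_u * d_v),       # cosine of the angle
        "min": min(d_u, d_v),                 # relative overlap
        "max": max(d_u, d_v),
    }[scaling]
    return inner / omega

# Hypothetical pattern columns (all values 1): u has rows 0..3, v has rows 2 and 3.
u = {0: 1, 1: 1, 2: 1, 3: 1}
v = {2: 1, 3: 1}
scores = {s: similarity(u, v, s) for s in ("none", "cosine", "min", "max")}
print(scores)
```

In this example v is contained in u, so the min scaling gives the maximal score 1.0 while the max scaling gives 0.5, penalising the difference in column length.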

  17. Speeding up the fine-grain method. [Figure: bar chart of normalized average time for the matching variants ip, rnd, ip1, ip2; the bar labels read 1, 0.98597, 0.84233, 0.89712.] ip = standard inner product matching. ip1 = inner product matching using an upper bound on the overlap, e.g. $d_u$, to stop searching early. For the fine-grain method, the bound is sharper: 1 at the first level. ip2 = alternate between matching with overlap in the top and bottom rows. rnd = choose a random match with overlap ≥ 1.

  18. Web searching: which page ranks first?

  19. The link matrix $A$. Given $n$ web pages with links between them, we can define the sparse $n \times n$ link matrix $A$ by $a_{ij} = 1$ if there is a link from page $j$ to page $i$, and $a_{ij} = 0$ otherwise. Let $e = (1, 1, \ldots, 1)^T$, representing an initial uniform importance (rank) of all web pages. Then $(Ae)_i = \sum_j a_{ij} e_j = \sum_j a_{ij}$ is the total number of links pointing to page $i$. The vector $Ae$ represents the importance of the pages; $A^2 e$ takes the importance of the pointing pages into account as well; and so on.
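As a small worked instance, the products Ae and A^2 e can be computed directly from a link list. The four-link web below and the helper name `apply_link_matrix` are assumptions made for illustration.

```python
def apply_link_matrix(links, x):
    """Compute y = A x, where a_ij = 1 if there is a link from page j to page i."""
    y = [0] * len(x)
    for j, i in links:            # link from page j to page i
        y[i] += x[j]
    return y

# Hypothetical web of 3 pages: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
links = [(0, 1), (0, 2), (1, 2), (2, 0)]
e = [1, 1, 1]
in_links = apply_link_matrix(links, e)          # A e: links pointing to each page
second = apply_link_matrix(links, in_links)     # A^2 e: weighs the pointing pages too
print(in_links, second)
```

Page 0 has a single incoming link, but it comes from the most linked-to page, which raises its score in A^2 e.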

  20. The Google matrix. A web surfer chooses each of the $N_j$ outgoing links from page $j$ with equal probability. Define the $n \times n$ diagonal matrix $D$ with $d_{jj} = 1/N_j$. Let $\alpha$ be the probability that a surfer follows an outlink of the current page. Typically $\alpha = 0.85$. The surfer jumps to a random page with probability $1 - \alpha$. The Google matrix is defined by (Brin and Page 1998) $G = \alpha A D + (1 - \alpha) e e^T / n$. The PageRank of a set of web pages is obtained by repeated multiplication by $G$, involving sparse matrix–vector multiplication by $A$, and some vector operations.
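Repeated multiplication by G never needs the dense matrix: the term ee^T/n reduces to a scalar times e. The sketch below assumes every page has at least one outgoing link, starts from a uniform vector, and uses a fixed iteration count; the function name `pagerank` and the three-page web are hypothetical.

```python
def pagerank(links, n, alpha=0.85, iterations=100):
    """Repeatedly apply G = alpha*A*D + (1 - alpha)*e*e^T/n to a uniform start vector."""
    out_degree = [0] * n                      # N_j, so that d_jj = 1 / N_j
    for j, _ in links:
        out_degree[j] += 1
    x = [1.0 / n] * n
    for _ in range(iterations):
        jump = (1 - alpha) * sum(x) / n       # (1 - alpha) * e * (e^T x) / n
        y = [jump] * n
        for j, i in links:                    # sparse product alpha * A * (D x)
            y[i] += alpha * x[j] / out_degree[j]
        x = y
    return x

# Hypothetical web of 3 pages: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
links = [(0, 1), (0, 2), (1, 2), (2, 0)]
ranks = pagerank(links, n=3)
print(ranks)
```

Each iteration costs one pass over the links plus O(n) vector work, which is why the partitioning of A dominates the parallel cost.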

  21. Comparing 1D, 2D fine-grain, and 2D Mondriaan. The following 1D and 2D fine-grain communication volumes for PageRank matrices are published results from the parallel program Parkway v2.1 (Bradley, de Jager, Knottenbelt, Trifunović 2005). The 2D Mondriaan volumes are results with all our improvements (to be incorporated in v2.0), but using only row/column partitioning, not the fine-grain option.

  22. Communication volume: PageRank matrix Stanford. [Figure: bar chart of communication volume (axis scale ×10^4) for p = 4, 8, 16, comparing Parkway 1D, Parkway fine-grained, and Mondriaan 2D.] n = 281,903 (pages), nz(A) = 2,594,228 nonzeros (links). Represents the Stanford WWW subdomain, obtained by a web crawl in September 2002 by Sep Kamvar.

  23. Communication volume: Stanford_Berkeley. [Figure: bar chart of communication volume (axis scale ×10^4) for p = 4, 8, 16, comparing Parkway 1D, Parkway fine-grained, and Mondriaan 2D.] n = 683,446, nz(A) = 8,262,087 nonzeros. Represents the Stanford and Berkeley subdomains, obtained by a web crawl in Dec. 2002 by Sep Kamvar.
