
Efficient Parallel Implementation of Transitive Closure of Digraphs (Euro PVM/MPI 2003, Venezia, Italia): PowerPoint presentation



  1. Euro PVM/MPI 2003, Venezia, Italia
      Efficient Parallel Implementation of Transitive Closure of Digraphs
      C. E. R. Alves, Universidade São Judas Tadeu
      E. N. Cáceres, Universidade Federal de Mato Grosso do Sul
      A. A. Castro Jr., Universidade Católica Dom Bosco
      S. W. Song, Universidade de São Paulo
      J. L. Szwarcfiter, Universidade Federal do Rio de Janeiro

  2. The Transitive Closure Problem
      • Used in many areas, such as
        – Network Planning
        – Distributed Systems Design
      • Used in problems such as
        – All Shortest Paths in a Directed Graph
        – Breadth-First Spanning Trees
      • Directed graph $D(V, E)$ with $|V| = n$, $|E| = m$.
      • We present a parallel algorithm that computes its transitive closure using
        – $p$ processors,
        – each with $O(n^2/p)$ local memory.

  3. Example
      [Figure: a directed graph on vertices 1 to 6.]

  4. Example
      [Figure: its transitive closure, where green edges join i to j if j can be reached from i.]
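The closure can be computed straight from this definition by searching from every vertex. The C sketch below is only a sequential reference and not the algorithm presented in this talk. The names `closure_by_search` and `dfs_reach` and the size limit `MAXV` are hypothetical, and vertices are numbered from 0 rather than 1.

```c
#include <assert.h>
#include <string.h>

#define MAXV 32   /* hypothetical capacity limit for this sketch */

/* Iterative depth-first search from src. Sets reach[src][v] = 1 for every
 * vertex v that can be reached from src by a path of at least one edge. */
static void dfs_reach(int n, unsigned char adj[MAXV][MAXV], int src,
                      unsigned char reach[MAXV][MAXV])
{
    int stack[MAXV + 1];   /* each vertex is pushed at most once, plus src */
    int top = 0;
    stack[top++] = src;
    while (top > 0) {
        int u = stack[--top];
        for (int v = 0; v < n; v++) {
            if (adj[u][v] && !reach[src][v]) {
                reach[src][v] = 1;   /* v is reachable from src */
                stack[top++] = v;
            }
        }
    }
}

/* Sequential reference: reach[i][j] = 1 iff j can be reached from i. */
static void closure_by_search(int n, unsigned char adj[MAXV][MAXV],
                              unsigned char reach[MAXV][MAXV])
{
    assert(n >= 0 && n <= MAXV);
    memset(reach, 0, (size_t)MAXV * MAXV);
    for (int s = 0; s < n; s++)
        dfs_reach(n, adj, s, reach);
}
```

A routine this simple is easy to trust. That makes it handy for checking the output of the parallel versions on small graphs.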

  5. BSP/CGM Model
      CGM (Coarse Grained Multicomputer) model: $p$ processors, each with its own local memory, communicating through a network. The algorithm alternates between
      • Computation round: each processor computes independently.
      • Communication round: each processor sends data to, and receives data from, other processors.
      Goals:
      • Obtain a linear speed-up in $p$.
      • Minimize the number of rounds.
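A toy computation shows how the rounds alternate. The sketch below is a hypothetical sequential simulation in C; `cgm_global_sum` and `MAXP` are made-up names.

- In a first computation round, each simulated processor sums its own slice of an array.
- In one communication round, each processor sends its partial sum to processor 0.
- In a final computation round, processor 0 combines the partial sums.

```c
#include <assert.h>

#define MAXP 64   /* hypothetical limit on simulated processors */

/* Sequential simulation of a CGM computation: p "processors" each own a
 * contiguous slice of data[0..n-1]; the result is the global sum. */
static long cgm_global_sum(const int *data, int n, int p, int *comm_rounds)
{
    long local[MAXP];   /* each processor's private partial result */
    long inbox[MAXP];   /* messages received by processor 0 */
    assert(p >= 1 && p <= MAXP);

    /* Computation round 1: every processor works only on its own slice. */
    for (int q = 0; q < p; q++) {
        int lo = (int)((long)q * n / p);
        int hi = (int)((long)(q + 1) * n / p);
        local[q] = 0;
        for (int i = lo; i < hi; i++)
            local[q] += data[i];
    }

    /* Communication round: every processor sends its partial sum to
     * processor 0; the end of this loop stands in for the barrier. */
    for (int q = 0; q < p; q++)
        inbox[q] = local[q];
    *comm_rounds = 1;

    /* Computation round 2: processor 0 combines what it received. */
    long total = 0;
    for (int q = 0; q < p; q++)
        total += inbox[q];
    return total;
}
```

Each processor does $O(n/p)$ work in the first round, processor 0 does $O(p)$ work in the last, and a single communication round suffices.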

  6. The CGM Model
      [Figure: processors $P_0, P_1, P_2, \dots, P_{p-1}$ alternating a computation round (local computation) and a communication round (global communication, then a synchronization barrier).]

  7. Previous Parallel Algorithms
      1. PRAM:
         • Karp et al.: CREW, $O(\log^2 n)$ time with $O(M(n))$ processors.¹
         • JáJá: CRCW, $O(\log n)$ time with $O(n^3)$ processors.
      2. Cáceres et al.: acyclic digraph with a linear extension labeling, $O(\log p)$ rounds with $O(n^3/p)$ local time.
      3. Dependency Graph Approach:
         • Pagourtzis et al.: $O(p)$ rounds with $O(n^3/p)$ local time.
      ¹ $M(n)$ is the best known sequential bound for multiplying two $n \times n$ matrices over a ring.

  8. Warshall's Algorithm
      Algorithm 1: Warshall's Algorithm
      Input: adjacency matrix $M_{n \times n}$ of graph $G$
      Output: transitive closure of graph $G$
        for k ← 1 until n do
          for i ← 1 until n do
            for j ← 1 until n do
              M[i, j] ← M[i, j] or (M[i, k] and M[k, j])
            end for
          end for
        end for
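Algorithm 1 translates directly into C. This sketch differs from the listing in three ways:

- The function name `warshall` and the limit `MAXN` are hypothetical.
- Indices run from 0 to n − 1 instead of 1 to n.
- The test on M[i][k] is hoisted out of the inner loop. This does not change the result, because M[i][k] cannot change while row i is being updated for pivot k.

```c
#include <assert.h>

#define MAXN 64   /* hypothetical capacity limit for this sketch */

/* Warshall's algorithm on an n x n 0/1 adjacency matrix, in place.
 * After the pass for pivot k, M[i][j] = 1 iff there is a path from i to j
 * whose intermediate vertices all lie in {0, ..., k}. */
static void warshall(int n, unsigned char M[MAXN][MAXN])
{
    assert(n >= 0 && n <= MAXN);
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            if (M[i][k])                      /* i reaches the pivot k ... */
                for (int j = 0; j < n; j++)
                    M[i][j] |= M[k][j];       /* ... so i reaches all k reaches */
}
```

The three nested loops give $O(n^3)$ sequential time. This is the work the parallel version divides among the $p$ processors.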

  9. Partitioning the Adjacency Matrix
      [Figure: the adjacency matrix divided into blocks labeled 1 to 4, with indices i, j and k marked.]

  10. The Parallel Algorithm
      Algorithm 2: Parallel Warshall
      Input: adjacency matrix $M$ stored in the $p$ processors: each processor $q$ ($1 \le q \le p$) stores the submatrices $M[(q-1)\frac{n}{p}+1 .. q\frac{n}{p}][1..n]$ and $M[1..n][(q-1)\frac{n}{p}+1 .. q\frac{n}{p}]$.
      Output: transitive closure of graph $G$, represented by the transformed matrix $M$.
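The data distribution of Algorithm 2 amounts to a small ownership test. The C helpers below (`block_owner`, `stores_entry`, `entries_per_processor`) are hypothetical and assume that p divides n. They keep the slide's 1-based numbering of processors and indices.

The last helper counts the entries one processor holds: two strips of size (n/p) × n that overlap in an (n/p) × (n/p) block. That gives $2n^2/p - n^2/p^2 = O(n^2/p)$, in line with the local-memory bound stated for the algorithm.

```c
#include <assert.h>
#include <stdbool.h>

/* 1-based processor whose block contains index v (1..n), when each
 * processor is given n/p consecutive indices; assumes p divides n. */
static int block_owner(int v, int n, int p)
{
    assert(p > 0 && n % p == 0 && v >= 1 && v <= n);
    return (v - 1) / (n / p) + 1;
}

/* Processor q stores M[i][j] if row i lies in its row strip
 * M[(q-1)n/p+1 .. qn/p][1..n] or column j lies in its column strip
 * M[1..n][(q-1)n/p+1 .. qn/p]. */
static bool stores_entry(int q, int i, int j, int n, int p)
{
    return block_owner(i, n, p) == q || block_owner(j, n, p) == q;
}

/* Distinct entries held by one processor: two (n/p) x n strips minus
 * their (n/p) x (n/p) overlap. */
static long entries_per_processor(int n, int p)
{
    long b = n / p;
    return 2 * b * n - b * b;
}
```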

  11. Algorithm 3: Parallel Warshall
      Each processor q (1 ≤ q ≤ p) does the following.
        repeat
          for k = (q − 1)n/p + 1 until qn/p do
            for i = 0 until n − 1 do
              for j = 0 until n − 1 do
                if M[i][k] = 1 and M[k][j] = 1 then
                  M[i][j] = 1 (if M[i][j] belongs to a processor different from q, store it for subsequent transmission to the corresponding processor)
                end if
                Send stored data to the corresponding processors.
                Receive data that belong to processor q from other processors.
              end for
            end for
          end for
        until no new matrix entry updates are done
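The loop above can be checked with a sequential simulation. This C sketch is an illustration under stated assumptions, not the authors' LAM-MPI code:

- The p processors are simulated one after another in a single address space.
- Indices and processors are 0-based, and p must divide n.
- As a simplification, all entries buffered during one pass over a processor's pivots k are delivered in a single communication round at the end of the pass. The slide places send and receive inside the loops.
- The names `owner`, `parallel_warshall_sim`, `local`, `inbox`, `MAXN` and `MAXP` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAXN 32   /* hypothetical limits for this sketch */
#define MAXP 8

/* 0-based owner of row/column index v; assumes p divides n. */
static int owner(int v, int n, int p) { return v / (n / p); }

/* local[q]: processor q's row and column strips (other cells unused).
 * inbox[q]: entries sent to q during the current round. */
static unsigned char local[MAXP][MAXN][MAXN];
static unsigned char inbox[MAXP][MAXN][MAXN];

/* Writes the closure of adj to out; returns the number of repeat passes. */
static int parallel_warshall_sim(int n, int p, unsigned char adj[MAXN][MAXN],
                                 unsigned char out[MAXN][MAXN])
{
    assert(p >= 1 && p <= MAXP && n >= p && n <= MAXN && n % p == 0);
    int b = n / p, passes = 0, updated;

    memset(local, 0, sizeof local);
    memset(inbox, 0, sizeof inbox);
    for (int q = 0; q < p; q++)          /* initial distribution (Algorithm 2) */
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (owner(i, n, p) == q || owner(j, n, p) == q)
                    local[q][i][j] = adj[i][j];

    do {
        updated = 0;
        passes++;
        /* Computation round: q uses only pivots k from its own block, so
         * column k and row k are both inside its local strips. */
        for (int q = 0; q < p; q++)
            for (int k = q * b; k < (q + 1) * b; k++)
                for (int i = 0; i < n; i++) {
                    if (!local[q][i][k]) continue;
                    for (int j = 0; j < n; j++) {
                        if (!local[q][k][j]) continue;
                        int oi = owner(i, n, p), oj = owner(j, n, p);
                        if (oi == q || oj == q) {         /* q stores M[i][j] */
                            if (local[q][i][j]) continue; /* already propagated */
                            local[q][i][j] = 1;
                            updated = 1;
                        }
                        /* store for transmission to the other owner(s) */
                        if (oi != q) inbox[oi][i][j] = 1;
                        if (oj != q) inbox[oj][i][j] = 1;
                    }
                }
        /* Communication round: deliver buffered entries to their owners. */
        for (int q = 0; q < p; q++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (inbox[q][i][j]) {
                        inbox[q][i][j] = 0;
                        if (!local[q][i][j]) { local[q][i][j] = 1; updated = 1; }
                    }
    } while (updated);   /* until no new matrix entry updates are done */

    for (int i = 0; i < n; i++)          /* gather the row strips */
        for (int j = 0; j < n; j++)
            out[i][j] = local[owner(i, n, p)][i][j];
    return passes;
}
```

Every entry set to 1 is a genuine path, so the result never exceeds the closure. When a pass makes no update, $M[i][k] \wedge M[k][j] \Rightarrow M[i][j]$ holds for every pivot k, which means the matrix is transitively closed. A real implementation would replace the `inbox` arrays with message exchanges between processes.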

  12. The Main Idea
      • Partition $V(D)$.
      • For each part, construct the digraph formed by the edges of $D$ that have at least one endpoint in that part.
      • Compute the transitive closure within each part.
      • Send the computed transitive edges to the parts they belong to.
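The first two steps can be sketched in C. The sketch assumes the partition is a set of consecutive blocks of n/p vertices and that D is stored as an edge list. The type `Edge` and the functions `part_of` and `edges_for_part` are hypothetical.

```c
#include <assert.h>

typedef struct { int u, v; } Edge;   /* directed edge u -> v */

/* Part of vertex v when vertices 0..n-1 are split into p consecutive
 * blocks of n/p vertices (an assumed partition; p must divide n). */
static int part_of(int v, int n, int p) { return v / (n / p); }

/* Copies into sub[] every edge of D with at least one endpoint in the
 * given part; returns the number of edges copied. */
static int edges_for_part(const Edge *edges, int m, int n, int p, int part,
                          Edge *sub)
{
    assert(p > 0 && n % p == 0);
    int cnt = 0;
    for (int e = 0; e < m; e++)
        if (part_of(edges[e].u, n, p) == part ||
            part_of(edges[e].v, n, p) == part)
            sub[cnt++] = edges[e];
    return cnt;
}
```

An edge whose endpoints lie in different parts is copied into both per-part digraphs.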

  13. Example
      [Figure: a directed graph on vertices 1 to 8.]

  14. Example
      [Figure: the digraph split between Processor 0 and Processor 1.]

  15. Example
      [Figure: the digraphs held by Processor 0 and Processor 1 (continued).]

  16. Example
      [Figure: the digraphs held by Processor 0 and Processor 1, each now showing vertices 1 to 8.]

  17. Implementation
      • 64-node Beowulf cluster of low-cost microcomputers, each with 256 MB RAM, 256 MB swap memory, an Intel Pentium III CPU at 448.956 MHz and 512 KB cache.
      • 100 Mb/s Fast Ethernet switch.
      • Code in standard ANSI C with LAM-MPI version 6.5.6.
      • Tests on randomly generated digraphs with a 20% probability of an edge between two vertices.
      • In all tests, the number of communication rounds required was less than $\log p$.
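The talk does not say how its random digraphs were generated. The C sketch below shows one hypothetical way to build such inputs. It places an edge on each ordered pair of distinct vertices independently with a given probability, 0.2 for the tests described. Omitting self-loops, the use of the C library `rand()`, and the names `random_digraph` and `MAXN` are all assumptions.

```c
#include <assert.h>
#include <stdlib.h>

#define MAXN 2048   /* hypothetical limit, large enough for 1920x1920 */

/* Fills adj with a random digraph on n vertices: each ordered pair (i, j)
 * with i != j gets an edge independently with probability prob.
 * Self-loops are omitted (an assumption). Returns the number of edges. */
static long random_digraph(int n, double prob, unsigned seed,
                           unsigned char adj[MAXN][MAXN])
{
    long m = 0;
    assert(n >= 0 && n <= MAXN);
    srand(seed);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double r = rand() / ((double)RAND_MAX + 1.0);   /* in [0, 1) */
            adj[i][j] = (unsigned char)(i != j && r < prob);
            m += adj[i][j];
        }
    return m;
}
```

A fixed seed reproduces the same digraph on every run, which keeps timing comparisons across processor counts repeatable.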

  18. Implementation Results
      [Plot: running time in seconds versus number of processors, for 480x480 and 512x512 matrices.]

  19. Implementation Results
      [Plot: running time in seconds versus number of processors, for 960x960, 1024x1024 and 1920x1920 matrices.]

  20. Implementation Results
      [Plot: speedup versus number of processors, for 480x480 and 512x512 matrices.]

  21. Implementation Results
      [Plot: speedup versus number of processors, for 960x960, 1024x1024 and 1920x1920 matrices.]

  22. Conclusion
      A BSP/CGM algorithm for the Transitive Closure problem.
      • Digraph with $n$ vertices and $m$ edges.
      • Number of communication rounds measured: $O(\log p)$.
      • Local computation time: $O(mn/p)$.
