Approximate Graph Operations on Parallel Platforms
Overview
Computing similarity of nodes in two graphs
Essentially ranking pairs of nodes
Network similarity decomposition (NSD) algorithm
Figures from M. Bayati, M. Gerritsen,
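Normalization: $\tilde{A}_{ij} = a_{ji} / \sum_{i=1}^{n_A} a_{ji}$, where $n_A$ is the number of nodes of $A$; similarly for $\tilde{B}$.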
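$$
A \otimes B =
\begin{bmatrix}
a_{1,1}B & a_{1,2}B & \cdots \\
a_{2,1}B & a_{2,2}B & \\
\vdots & & \ddots
\end{bmatrix}
=
\begin{bmatrix}
a_{1,1}b_{1,1} & a_{1,1}b_{1,2} & \cdots & a_{1,2}b_{1,1} & a_{1,2}b_{1,2} & \cdots \\
a_{1,1}b_{2,1} & a_{1,1}b_{2,2} & & a_{1,2}b_{2,1} & a_{1,2}b_{2,2} & \\
\vdots & & \ddots & & & \\
a_{2,1}b_{1,1} & a_{2,1}b_{1,2} & & & & \\
a_{2,1}b_{2,1} & a_{2,1}b_{2,2} & & & & \\
\vdots & & & & & \ddots
\end{bmatrix}
$$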
Because $AXB = \mathrm{unvec}((B^T \otimes A)\,x)$, where $x = \mathrm{vec}(X)$ (a property of Kronecker products), the IsoRank idea can be implemented as a triple matrix product: MAT3.
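The Kronecker identity behind MAT3 can be checked numerically. The sketch below is only an illustration, assuming NumPy, small random matrices, and the usual column-stacking convention for vec; the helper names `vec` and `unvec` are hypothetical and do not come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 4))
X = rng.random((4, 5))
B = rng.random((5, 2))

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.flatten(order="F")

def unvec(v, shape):
    """Inverse of vec for a matrix of the given shape."""
    return v.reshape(shape, order="F")

lhs = A @ X @ B                     # triple matrix product, shape (3, 2)
K = np.kron(B.T, A)                 # Kronecker product, shape (6, 20)
rhs = unvec(K @ vec(X), lhs.shape)  # same result via one large mat-vec

identity_holds = bool(np.allclose(lhs, rhs))
print(identity_holds)
```

The triple product never forms the large matrix `K`, which is the practical reason to prefer it over the explicit Kronecker form.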
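NSD decomposition, written in the notation of the Parallel NSD pseudocode (component vectors $w_i$, $z_i$, $i = 1, \dots, s$):
1. $w_i^{(k)} = \tilde{B}^k w_i, \quad z_i^{(k)} = \tilde{A}^k z_i$
2. $X_i^{(n)} = (1 - \alpha) \sum_{k=0}^{n-1} \alpha^k \, w_i^{(k)} z_i^{(k)T} + \alpha^n \, w_i^{(n)} z_i^{(n)T}$
3. $X^{(n)} = \sum_{i=1}^{s} X_i^{(n)}$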
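A minimal sequential sketch of the decomposition used by the Parallel NSD pseudocode: each component accumulates outer products of the iterates $w_i^{(k)}$ and $z_i^{(k)}$. For comparison it also runs a triple-product iteration of the form $X \leftarrow \alpha \tilde{B} X \tilde{A}^T + (1-\alpha) H$ with $H = \sum_i w_i z_i^T$ and $X^{(0)} = H$; that form, the function names, and the random column-normalized test matrices are assumptions made for this illustration, not taken from the slides.

```python
import numpy as np

def nsd(A_t, B_t, ws, zs, alpha, n):
    """Sum over components of (1-a)*sum_{k<n} a^k w^(k) z^(k)T + a^n w^(n) z^(n)T."""
    X = np.zeros((B_t.shape[0], A_t.shape[0]))
    for w, z in zip(ws, zs):
        Xi = np.zeros_like(X)
        wk, zk = w.copy(), z.copy()          # w^(0), z^(0)
        for k in range(n):
            Xi += alpha**k * np.outer(wk, zk)
            wk, zk = B_t @ wk, A_t @ zk      # advance to w^(k+1), z^(k+1)
        X += (1 - alpha) * Xi + alpha**n * np.outer(wk, zk)
    return X

def triple_product(A_t, B_t, H, alpha, n):
    """Assumed MAT3-style iteration, started from X = H."""
    X = H.copy()
    for _ in range(n):
        X = alpha * B_t @ X @ A_t.T + (1 - alpha) * H
    return X

rng = np.random.default_rng(0)
n_a, n_b, s, n, alpha = 6, 8, 3, 10, 0.8
A_t = rng.random((n_a, n_a)); A_t /= A_t.sum(axis=0)   # stand-in for normalized A
B_t = rng.random((n_b, n_b)); B_t /= B_t.sum(axis=0)   # stand-in for normalized B
ws = [rng.random(n_b) for _ in range(s)]
zs = [rng.random(n_a) for _ in range(s)]
H = sum(np.outer(w, z) for w, z in zip(ws, zs))

X_nsd = nsd(A_t, B_t, ws, zs, alpha, n)
X_mat3 = triple_product(A_t, B_t, H, alpha, n)
print(np.allclose(X_nsd, X_mat3))
```

The decomposed form needs only matrix-vector products and outer products, which is what makes the per-component work easy to distribute.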
In the sequel, IsoRank refers to the implementation of the IsoRank idea followed by the application of the Hungarian and PDM algorithms to the resulting X (as available in Singh's binary code).
Species              Nodes   Edges
celeg (worm)          2805    4572
dmela (fly)           7518   25830
ecoli (bacterium)     1821    6849
hpylo (bacterium)      706    1414
hsapi (human)         9633   36386
mmusc (mouse)          290     254
scere (yeast)         5499   31898

Species pair   NSD (secs)   MAT3 (secs)   PDM (secs)   GM (secs)   IsoRank (secs)
celeg-dmela          3.15         64.20       152.12        7.29           783.48
celeg-hsapi          3.28         69.74       163.05        9.54          1209.28
celeg-scere          1.97         44.61       127.70        4.16           949.58
dmela-ecoli          1.86         37.79        86.80        4.78           807.93
dmela-hsapi          8.61        211.19       590.16       28.10          7840.00
dmela-scere          4.79        131.22       182.91       12.97          4905.00
ecoli-hsapi          2.41         47.48        79.23        4.76          2029.56
ecoli-scere          1.49         35.86        69.88        2.60          1264.24
hsapi-scere          6.09        152.02       181.17       15.56          6714.00
Reference: Network Similarity Decomposition (NSD): A Fast and Scalable Approach to Network Alignment. IEEE Transactions on Knowledge and Data Engineering, 2011.
Parallel NSD: Root process
    compute $\tilde{A}$, $\tilde{B}$
    for i = 1 to s do
        $w_i^{(0)} \leftarrow w_i$, $z_i^{(0)} \leftarrow z_i$
        for k = 1 to n do
            $w_i^{(k)} \leftarrow \tilde{B} w_i^{(k-1)}$
            $z_i^{(k)} \leftarrow \tilde{A} z_i^{(k-1)}$
        end for
    end for
    for i = 1, ..., s, k = 0, ..., n do
        Partition $w_i^{(k)}$ in p fragments, $w_{i,1}^{(k)}, \dots, w_{i,p}^{(k)}$
        Partition $z_i^{(k)}$ in q fragments, $z_{i,1}^{(k)}, \dots, z_{i,q}^{(k)}$
    end for
    Send to every process (r, u) in the p × q process grid its corresponding fragments $w_{i,r}^{(k)}$, $z_{i,u}^{(k)}$,
        for all i = 1, ..., s, k = 0, ..., n (r = 1, ..., p, u = 1, ..., q)

Parallel NSD: Worker process (r, u)
    Receive the corresponding fragments $w_{i,r}^{(k)}$, $z_{i,u}^{(k)}$,
        for all i = 1, ..., s, k = 0, ..., n, from the root process
    for i = 1 to s do
        zero $X_{i,ru}^{(n)}$
        for k = 0 to n − 1 do
            $X_{i,ru}^{(n)} \leftarrow X_{i,ru}^{(n)} + \alpha^k \, w_{i,r}^{(k)} z_{i,u}^{(k)T}$
        end for
        $X_{i,ru}^{(n)} \leftarrow (1 - \alpha) X_{i,ru}^{(n)} + \alpha^n \, w_{i,r}^{(n)} z_{i,u}^{(n)T}$
    end for
    $X_{ru}^{(n)} \leftarrow \sum_{i=1}^{s} X_{i,ru}^{(n)}$
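The root/worker split can be simulated in a single process to see that the blocks computed from fragments tile the full similarity matrix. This is a sketch under stated assumptions: the p × q grid is emulated with plain loops rather than real message passing, the function names are hypothetical, and the test matrices are random column-normalized stand-ins for $\tilde{A}$ and $\tilde{B}$.

```python
import numpy as np

def iterates(M, v, n):
    """Root step: return [v, Mv, ..., M^n v]."""
    out = [v]
    for _ in range(n):
        out.append(M @ out[-1])
    return out

def worker_block(w_frags, z_frags, alpha, n):
    """Worker (r, u): w_frags[i][k] is fragment r of w_i^(k), z_frags[i][k] fragment u of z_i^(k)."""
    shape = (len(w_frags[0][0]), len(z_frags[0][0]))
    X_ru = np.zeros(shape)
    for wi, zi in zip(w_frags, z_frags):
        Xi = np.zeros(shape)
        for k in range(n):
            Xi += alpha**k * np.outer(wi[k], zi[k])
        X_ru += (1 - alpha) * Xi + alpha**n * np.outer(wi[n], zi[n])
    return X_ru

def parallel_nsd(A_t, B_t, ws, zs, alpha, n, p, q):
    """Emulate the p x q process grid sequentially and assemble the blocks."""
    W = [iterates(B_t, w, n) for w in ws]
    Z = [iterates(A_t, z, n) for z in zs]
    row_parts = np.array_split(np.arange(B_t.shape[0]), p)
    col_parts = np.array_split(np.arange(A_t.shape[0]), q)
    grid = []
    for r in row_parts:
        w_frags = [[wk[r] for wk in Wi] for Wi in W]
        grid.append([worker_block(w_frags, [[zk[u] for zk in Zi] for Zi in Z], alpha, n)
                     for u in col_parts])
    return np.block(grid)

rng = np.random.default_rng(1)
n_a, n_b, s, n, alpha = 7, 9, 2, 5, 0.8
A_t = rng.random((n_a, n_a)); A_t /= A_t.sum(axis=0)
B_t = rng.random((n_b, n_b)); B_t /= B_t.sum(axis=0)
ws = [rng.random(n_b) for _ in range(s)]
zs = [rng.random(n_a) for _ in range(s)]

X_grid = parallel_nsd(A_t, B_t, ws, zs, alpha, n, p=3, q=2)
X_single = parallel_nsd(A_t, B_t, ws, zs, alpha, n, p=1, q=1)
print(np.allclose(X_grid, X_single))
```

Each worker needs only its own vector fragments, so no block of the similarity matrix depends on data held by another worker.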
              dmela-hsapi          hsapi-scere
num cores   tIters   tSimMat     tIters   tSimMat
 4          0.211    28.103      0.194    21.062
 9          0.210    15.914      0.213    11.865
16          0.219     9.851      0.215     7.478
25          0.202     7.072      0.195     5.283
36          0.311     6.080      0.209     4.493
49          0.193     5.809      0.240     4.233
64          0.207     4.915      0.253     3.576
Each conserved edge implies a match between the corresponding edges that connect, in the two input networks, the elements of the matched pairs at its endpoints. The connected subgraphs formed by conserved edges are essentially matchings of substructures in the input networks (CCS, Common Connected Subgraphs).
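Counting conserved edges for a given matching can be sketched as follows. The sketch assumes simple undirected graphs given as edge lists and a matching given as a dictionary from nodes of the first network to nodes of the second; the function name and the toy data are hypothetical.

```python
def conserved_edges(edges_a, edges_b, matching):
    """Count edges (u, v) of A whose matched images form an edge of B."""
    b_set = {frozenset(e) for e in edges_b}   # undirected edges of B
    count = 0
    for u, v in edges_a:
        # Both endpoints must be matched for the edge to be conserved.
        if u in matching and v in matching:
            if frozenset((matching[u], matching[v])) in b_set:
                count += 1
    return count

edges_a = [(0, 1), (1, 2), (2, 3)]
edges_b = [("a", "b"), ("b", "c"), ("a", "d")]
matching = {0: "a", 1: "b", 2: "c", 3: "d"}

n_conserved = conserved_edges(edges_a, edges_b, matching)
print(n_conserved)  # 2: (0,1)->(a,b) and (1,2)->(b,c); (2,3)->(c,d) is not an edge of B
```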
[Figure: Speed improvement (T_1 core / T_parallel), x-axis ticks 2 to 64. Protein-Protein Interaction, size 10k. Series: t-total, t-similarityMatrix, t-parallelAuction.]
[Figure: Speed improvement (T_48 cores / T_parallel), x-axis ticks 96 to 1536. Size 100k; panels: snapA, net/pfinan, snapB. Series: t-total, t-similarityMatrix, t-parallelAuction.]
[Figure: Speed improvement (T_384 cores / T_parallel), x-axis ticks 768 to 3072. Size 200k; panels: dnvs, usroads, b3. Series: t-total, t-similarityMatrix, t-parallelAuction.]
[Figure: Speed improvement (T_768 cores / T_parallel), x-axis ticks 1536 to 3072. Size 300k; panels: notreDame, coAuthors, stanford. Series: t-total, t-similarityMatrix, t-parallelAuction.]
Cores                          1       2       4       8      16      32      64
t similarityMatrix         11.73    6.10    2.94    1.46    0.73    0.36    0.18
t parallelAuction          62.68   34.02   17.61    9.49    5.07    3.47    2.80
t totalSimilarityProcess   74.52   40.23   20.65   11.05    5.90    3.94    3.10
Conserved edges              625     691     688     737     745     668     658
[Figure: Strong Scaling: Protein-Protein Interaction. Speed improvement (T_seq / T_par) vs. number of compute cores (1 to 64). Series: t_total, t_similarity, t_auction.]
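The speed improvement plotted for the Protein-Protein Interaction run can be recomputed from the timings in the Cores table above as T at 1 core divided by T at p cores. The sketch below does that arithmetic; parallel efficiency (speedup divided by core count) is an added convenience metric, not something the slides report.

```python
cores = [1, 2, 4, 8, 16, 32, 64]
t_similarity = [11.73, 6.10, 2.94, 1.46, 0.73, 0.36, 0.18]
t_auction = [62.68, 34.02, 17.61, 9.49, 5.07, 3.47, 2.80]
t_total = [74.52, 40.23, 20.65, 11.05, 5.90, 3.94, 3.10]

def speedup(times):
    """Speed improvement relative to the first (1-core) timing."""
    return [times[0] / t for t in times]

def efficiency(times, core_counts):
    """Speedup divided by the number of cores used."""
    return [s / c for s, c in zip(speedup(times), core_counts)]

for name, times in [("similarity", t_similarity),
                    ("auction", t_auction),
                    ("total", t_total)]:
    row = "  ".join(f"{s:6.2f}" for s in speedup(times))
    print(f"{name:>10}: {row}")
```

On 64 cores this gives roughly 65x for the similarity matrix, 22x for the auction step and 24x overall, so the auction step limits the total speed improvement in this run.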
Cores                             128         256         512        1024
t generateIterates               5.07        5.04        5.33        6.13
t generateRow               16,450.88    8,152.49    4,030.19    1,224.77
t sort                       1,577.80      788.39      394.80      197.03
t similarityMatrix          18,045.54    8,949.46    4,429.16    1,423.65
t parallelAuction               55.16       28.82       16.32       11.90
t totalSimilarityProcess    18,121.54    8,998.95    4,466.56    1,457.53
Conserved edges                80,884      80,884      80,884      80,884

Cores                             128           256           512          1024
t generateIterates              11.00         11.19         11.53         12.36
t generateRow               15,703.82      7,475.45      3,254.46      1,228.59
t sort                       1,606.47        802.44        400.97        200.69
t similarityMatrix          17,327.27      8,286.56      3,659.58      1,431.17
t parallelAuction               31.97         19.78         14.37         14.93
t totalSimilarityProcess    17,382.15      8,329.41      3,697.45      1,470.62
Conserved edges           1,010/1,100   1,018/1,097   1,014/1,088   1,014/1,088
[Figure: Strong + Weak Scaling: Large Wikipedia Graphs. Speed improvement (T_256 proc / T_par) vs. number of compute cores (256 to 2048). Series: t_total, t_auction.]
[Figure: Strong + Weak Scaling: Self-Similarity Web Graph. Speed improvement (T_256 proc / T_par) vs. number of compute cores (256 to 2048). Series: t_total, t_auction.]