GPU-Accelerated Network Centrality - Erik Saule - PowerPoint PPT Presentation
  1. GPU-Accelerated Network Centrality
     Erik Saule, collaborative work with: Ahmet Erdem Sarıyüce (OSU), Kamer Kaya (Sabancı), and Ümit V. Çatalyürek (OSU)
     University of North Carolina at Charlotte (CS)
     Erik Saule (UNCC) GPU Centrality GTC 2015 1 / 24

  2. Outline
     1 Introduction
     2 Decomposition for GPU
     3 An SpMM-based approach
     4 Conclusion

  3. Centralities - Concept
     Answer questions such as:
       Who controls the flow in a network?
       Who is more important?
       Who has more influence?
       Whose contribution is significant for connections?
     Applications:
       Covert networks (e.g., terrorist identification)
       Contingency analysis (e.g., weakness/robustness of networks)
       Viral marketing (e.g., who will spread the word best)
       Traffic analysis
       Store locations
     Different kinds of graphs: road networks, social networks, power grids, mechanical meshes.

  4. Centrality Formally
     Closeness Centrality: Let G = (V, E) be an unweighted graph with vertex set V and edge set E.
       cc[v] = 1 / (Σ_{u ∈ V} d(v, u)), where d(u, v) is the shortest path length between u and v.
     Betweenness Centrality: Let σ_st be the number of shortest paths connecting s and t, and σ_st(v) the number of such s-t paths passing through v.
       bc[v] = Σ_{s ≠ v ≠ t ∈ V} δ_st(v), where δ_st(v) = σ_st(v) / σ_st.
     Algorithm: In each case, the best algorithm computes the shortest-path graph rooted at each vertex of the graph and extracts the relevant information. The complexity is O(E) per source, O(VE) in total, which makes it computationally expensive.
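The closeness definition above can be sketched directly: one BFS per source, accumulate the distances, invert the sum. This Python version (not the authors' GPU code, only an illustration of the O(VE) structure) takes the graph as an adjacency list:

```python
from collections import deque

def closeness_centrality(adj):
    """Closeness centrality of every vertex of an unweighted graph.
    One BFS per source: O(E) per source, O(VE) in total."""
    n = len(adj)
    cc = [0.0] * n
    for s in range(n):
        # level-synchronous BFS from s
        dist = [-1] * n
        dist[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    q.append(v)
        # cc[s] = 1 / sum of distances to reached vertices
        total = sum(d for d in dist if d > 0)
        cc[s] = 1.0 / total if total > 0 else 0.0
    return cc
```

On the path graph 0-1-2, for example, cc[1] = 1/(1+1) = 0.5 and cc[0] = 1/(1+2) = 1/3.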

  5. Computing Breadth-First Traversal (Centrality)
     Top-down (scatter writes): level-synchronous BFS. For each element of the frontier, touch the neighbors. Complexity O(E). Writes are scattered in memory.
     Bottom-up (gather reads): for each vertex, ask whether its neighbors are in the frontier. Complexity O(ED), where D is the diameter of the graph. Writes are performed once, linearly.
     Direction optimizing: switch between the two strategies.
     [Figure: adjacency-matrix ("To"/"From") view of the traversal]
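The two frontier-expansion strategies above can be contrasted in a few lines of Python (a CPU sketch, not the GPU kernels; `topdown_step` and `bottomup_step` are illustrative names). Top-down iterates over the frontier and scatters writes to neighbors; bottom-up iterates over all vertices and gathers from the frontier:

```python
def topdown_step(adj, frontier, visited):
    """Top-down: for each frontier vertex, touch its neighbors
    (writes scattered across memory)."""
    nxt = set()
    for u in frontier:
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                nxt.add(v)
    return nxt

def bottomup_step(adj, frontier, visited):
    """Bottom-up: for each unvisited vertex, ask whether one of its
    neighbors is in the frontier (one linear write per vertex)."""
    nxt = set()
    for v in range(len(adj)):
        if v not in visited and any(u in frontier for u in adj[v]):
            visited.add(v)
            nxt.add(v)
    return nxt
```

Both produce the same next frontier; the difference is the memory access pattern, which is what matters on a GPU. Running either step D times gives the O(ED) bound quoted for the bottom-up variant.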

  6. Outline
     1 Introduction
     2 Decomposition for GPU
     3 An SpMM-based approach
     4 Conclusion

  7. Traditionally... Vertex Centric
     1 thread : 1 vertex
     No graph coalescing
     Vector read is not coalesced
     No atomics
     High divergence (high-degree vertices)

  8. Traditionally... Edge Centric
     1 thread : 1 edge
     Graph read is coalesced
     Vector read is not coalesced
     Many atomics
     Little divergence (adjacent threads are likely to work on the same vertex)

  9. Virtual vertex decomposition
     1 thread : 1 virtual vertex
     High-degree vertices are split into multiple "virtual vertices"
     No graph coalescing
     Vector read is not coalesced
     Some atomics
     Bounded divergence

  10. Strided virtual vertex decomposition
     1 thread : 1 virtual vertex
     Some graph coalescing
     Vector read is not coalesced
     Some atomics
     Bounded divergence
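The virtual-vertex splitting of the last two slides can be sketched as a preprocessing pass over a CSR graph (a hypothetical helper, not the authors' code): each vertex of degree d becomes ceil(d / max_deg) virtual vertices, each owning at most max_deg of its edges, which is what bounds the per-thread divergence.

```python
def virtualize(xadj, max_deg):
    """Split each vertex of a CSR graph (xadj = row-pointer array) into
    virtual vertices owning at most max_deg consecutive edges.
    Returns a list of (owner_vertex, edge_begin, edge_end) triples,
    one per virtual vertex / thread."""
    virt = []
    for v in range(len(xadj) - 1):
        b, e = xadj[v], xadj[v + 1]
        for s in range(b, e, max_deg):
            virt.append((v, s, min(s + max_deg, e)))
    return virt
```

In the strided variant, virtual vertex k of a vertex would instead own edges b+k, b+k+stride, ..., so that consecutive threads read consecutive edge-array entries, giving the "some graph coalescing" noted above.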

  11. Experimental Setting
     Instances:
       Graph        |V|      |E|       Avg |Γ(v)|   Max |Γ(v)|   Diam.
       Amazon       403K     4,886K    12.1         2,752        19
       Gowalla      196K     1,900K    9.6          14,730       12
       Google       855K     8,582K    10.0         6,332        18
       NotreDame    325K     2,180K    6.6          10,721       27
       WikiTalk     2,388K   9,313K    3.8          100,029      10
       Orkut        3,072K   234,370K  76.2         33,313       9
       LiveJournal  4,843K   85,691K   17.6        20,333        15
     Machines: 2 Intel Sandy Bridge EP CPUs, NVIDIA K20 GPU
     Metric: Traversed Edges Per Second, TEPS = VE / time.

  12. First results
     [Chart: speedup of the GPU vertex, GPU edge, GPU virtual, and GPU stride variants w.r.t. 1 CPU thread, per graph]

  13. Outline
     1 Introduction
     2 Decomposition for GPU
     3 An SpMM-based approach
     4 Conclusion

  14. No vector coalescing
     [Figure: memory access patterns of the Edge, Vertex, Virtual, and Coalesced Virtual decompositions]
     All the representations give vector coalescing only "if you are lucky".

  15. Simultaneous sources traversal
     The problem with the previous methods is that BFS leaves the coalescing of vector accesses up to the structure of the graph.
     Multiple sources: all threads of a warp should have similar access patterns. Since a centrality computation performs multiple traversals, process B traversals at once.
     If a vertex is at the same BFS level in multiple traversals, those traversals process it at the same time. Social networks have most vertices in a few levels.
     [Figure: frontiers of several simultaneous traversals]

  16. An SpMV-based approach of BFS for Closeness Centrality
     A simpler definition of level-synchronous BFS: vertex v is at level ℓ if and only if one of the neighbors of v is at level ℓ−1 and v is not at any level ℓ′ < ℓ.
     Let x^ℓ_i = true if vertex i is part of the frontier at level ℓ.
     y^{ℓ+1} is the set of neighbors of level ℓ: y^{ℓ+1}_k = OR_{j ∈ Γ(k)} x^ℓ_j (an (OR, AND)-SpMV).
     Compute the next-level frontier: x^{ℓ+1}_i = y^{ℓ+1}_i AND NOT (OR_{ℓ′ ≤ ℓ} x^{ℓ′}_i).
     The contribution of the source to cc[i] is the level ℓ at which x^ℓ_i becomes true.
     This allows computing Closeness Centrality by encoding the state of 32 traversals in an int.
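The bit-parallel formulation above can be sketched in Python, with a plain int per vertex standing in for the GPU's 32-bit word (the function is an illustration of the technique, not the authors' implementation):

```python
def bitparallel_bfs(adj, sources):
    """Run up to 32 BFS traversals simultaneously. Bit b of x[i] is set
    iff vertex i is in the current frontier of the traversal rooted at
    sources[b]. Returns farness[i] = sum over sources s of d(s, i),
    from which closeness contributions follow."""
    n = len(adj)
    assert len(sources) <= 32
    x = [0] * n       # current frontier x^l, one bit per traversal
    seen = [0] * n    # OR of all frontiers x^{l'} for l' <= l
    for b, s in enumerate(sources):
        x[s] |= 1 << b
        seen[s] |= 1 << b
    farness = [0] * n
    level = 0
    while any(x):
        level += 1
        # y^{l+1}_k = OR_{j in Γ(k)} x^l_j  -- the (OR, AND)-SpMV
        y = [0] * n
        for k in range(n):
            for j in adj[k]:
                y[k] |= x[j]
        # x^{l+1}_i = y^{l+1}_i AND NOT seen_i; each newly set bit
        # means one traversal reaches i at distance `level`
        for i in range(n):
            x[i] = y[i] & ~seen[i]
            seen[i] |= x[i]
            farness[i] += level * bin(x[i]).count("1")
    return farness
```

On the GPU, each thread holds one such 32-bit word, so a single memory access advances 32 traversals at once, which is what restores coalescing regardless of graph structure.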

  17. Impact on working warps
     [Chart: number of active warps necessary for 32 traversals, normalized, as a function of B, for each graph]
     Small increase in the number of warps.

  18. Impact on non-simultaneous traversals
     [Chart: fraction of non-simultaneous traversals as a function of B, for each graph]
     With B = 4, the 32 traversals of one vertex are distributed in about 40% of 32 warps. Good coalescing.

  19. Impact on Runtime
     [Chart: normalized runtime as a function of B (1 to 128), for each graph]

  20. Outline
     1 Introduction
     2 Decomposition for GPU
     3 An SpMM-based approach
     4 Conclusion

  21. On other architectures? Betweenness Centrality
     [Chart: MTEPS of CPU-SNAP, CPU-Ligra, CPU-BC, GPU-VirBC, and GPU-VirBC-Multi on each graph]
     The O(DE) algorithms (GPU-based) are unsuitable for NotreDame because of its high diameter.
     CPU: 2 Intel Sandy Bridge EP (2x8 cores)

  22. On other architectures? Closeness Centrality
     [Chart: GTEPS of CPU-DO, CPU-SpMM, PHI-DO, PHI-SpMM, GPU-VirCC, and GPU-SpMM on each graph]
     CPU: 2 Intel Sandy Bridge EP (2x8 cores)
     PHI: Intel Xeon Phi 5120

  23. Conclusion
     Centrality: Betweenness and Closeness Centrality are computed using multiple breadth-first search traversals.
     Graph representations for GPU: Vertex Centric, Edge Centric, Virtual Vertex, Coalesced Virtual Vertex. They determine parallelism, but also memory access patterns and thread divergence.
     Multiple traversals: Centrality requires graph traversals from many different sources. Threads of a warp can be set to process different traversals for the same decomposition. Provided a vertex is used at the same level in multiple traversals, all the memory accesses can be coalesced. Improves performance by a factor of 70x. Adapts to CPU architectures with similar effects.

  24. Thank you
     Other centrality works (with Sarıyüce, Kaya, and Çatalyürek):
       Compression using graph properties (SDM 2013)
       GPU optimization (GPGPU 2013)
       Incremental algorithm (BigData 2013)
       Distributed-memory incremental framework (Cluster 2013, ParCo 2015)
       Regularized memory accesses for CPU, GPU, Xeon Phi (MTAAP 2014, JPDC 2015)
     More information:
       Contact: esaule@uncc.edu
       Visit: http://webpages.uncc.edu/~esaule
