Efficient Batched Distance and Centrality Computation in Unweighted and Weighted Graphs

SLIDE 1

Manuel Then, Stephan Günnemann, Alfons Kemper, Thomas Neumann Technische Universität München Chair for Database Systems

Efficient Batched Distance and Centrality Computation in Unweighted and Weighted Graphs

SLIDE 2

Goal: Find the most central vertices

  • Influencers in social networks
  • Critical routers in computer networks

Centrality measures

  • Degree: degree centrality, PageRank
  • Distances: closeness centrality
  • Paths: betweenness centrality

Challenges

  • Algorithmic complexity
  • Random data access
  • Redundant computation, hard to vectorize

2 Manuel Then | Efficient Batched Distance and Centrality Computation

Graph Centrality

SLIDE 3

Unweighted closeness centrality builds on BFSs

Goal: Run multiple BFSs concurrently and share common traversals

Challenges Visualized

SLIDE 4

BFS traversals using bit operations

∀v∈V: ∀n∈neighbors(v): next[n] |= visit[v] & ~seen[n]

Used to win the SIGMOD 2014 programming contest
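The update rule above can be sketched in a few lines. This is a minimal illustration rather than the authors' C++ implementation: it uses Python for brevity, and the function name `ms_bfs`, the adjacency-list input, and the use of arbitrary-precision integers as bitsets are all assumptions made for this example. Bit i of every mask belongs to the BFS started at `sources[i]`.

```python
def ms_bfs(adj, sources):
    """Run one BFS per source concurrently (illustrative sketch).

    adj: list of neighbor lists for vertices 0..n-1 (hypothetical input format).
    Returns dist, where dist[i][v] is the hop distance from sources[i] to v,
    or None if v is unreachable from that source.
    """
    n = len(adj)
    seen = [0] * n   # seen[v]: BFSs that have already reached v
    visit = [0] * n  # visit[v]: BFSs whose current frontier contains v
    dist = [[None] * n for _ in sources]
    for i, s in enumerate(sources):
        seen[s] |= 1 << i
        visit[s] |= 1 << i
        dist[i][s] = 0

    level = 0
    while any(visit):
        level += 1
        nxt = [0] * n
        for v in range(n):
            if not visit[v]:
                continue
            for nb in adj[v]:
                # BFSs that arrive at nb through v and have not seen nb yet
                nxt[nb] |= visit[v] & ~seen[nb]
        for v in range(n):
            new = nxt[v]
            seen[v] |= new
            while new:  # record the level for every BFS that just arrived
                low = new & -new
                dist[low.bit_length() - 1][v] = level
                new ^= low
        visit = nxt
    return dist
```

Each adjacency list is read once per level for all BFSs in the batch, which is the sharing of common traversals that the slide refers to.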

Background: Multi-Source BFS

[1] Then et al., The More the Merrier: Efficient Multi-source Graph Traversal, VLDB 2015
[2] Kaufmann et al., Parallel Array-Based Single- and Multi-Source Breadth First Searches on Large Dense Graphs, EDBT 2017

SLIDE 5

  • Motivation: Graph Centrality
  • Background: MS-BFS
  • Centrality in unweighted graphs
  • Centrality in weighted graphs
  • Evaluation
  • Summary and Future Work

Overview

SLIDE 6

Distance-based centrality metric

  • Central vertices have a low average geodesic distance to all other vertices

MS-BFS from all vertices

  • No need to store distances

Efficient batch incrementer

  • Significantly improves the performance of counting discovered vertices
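The idea of computing closeness without storing distances can be sketched as follows. This is an illustration under stated assumptions, not the algorithm from the paper: the function name `closeness_ms_bfs` and the `batch_size` parameter are hypothetical, closeness is taken here as "vertices reached divided by the sum of their distances" (which equals (n-1)/Σd on a connected graph), and the simple loop over set bits is only a stand-in for the efficient batch incrementer mentioned above.

```python
def closeness_ms_bfs(adj, batch_size=64):
    """Closeness centrality from batched BFSs, keeping only running sums.

    adj: list of neighbor lists for vertices 0..n-1 (hypothetical input format).
    No per-pair distance is stored; each discovery only updates two counters.
    """
    n = len(adj)
    dist_sum = [0] * n  # sum of distances from each source to the vertices it reaches
    reached = [0] * n   # number of other vertices each source reaches
    for start in range(0, n, batch_size):
        width = min(batch_size, n - start)
        seen = [0] * n
        visit = [0] * n
        for i in range(width):  # bit i belongs to source vertex start + i
            seen[start + i] = visit[start + i] = 1 << i
        level = 0
        while any(visit):
            level += 1
            nxt = [0] * n
            for v in range(n):
                if visit[v]:
                    for nb in adj[v]:
                        nxt[nb] |= visit[v] & ~seen[nb]
            for v in range(n):
                new = nxt[v]
                seen[v] |= new
                while new:  # naive per-bit counting (stand-in for a batch incrementer)
                    low = new & -new
                    src = start + low.bit_length() - 1
                    dist_sum[src] += level
                    reached[src] += 1
                    new ^= low
            visit = nxt
    return [reached[s] / dist_sum[s] if dist_sum[s] else 0.0 for s in range(n)]
```

On the path graph 0-1-2, the middle vertex gets closeness 1.0 and both end vertices get 2/3.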

Unweighted Closeness Centrality

SLIDE 7

Path-based centrality metric

  • Central vertices are part of many shortest paths

Naïve computation is very costly, so we use Brandes's algorithm [3]

Forward step can leverage MS-BFS

  • Batching improves locality
  • Allows vectorization of numeric computations

Challenges: Backward step requires

  • Reverse MS-BFS
  • Vertex predecessor calculation
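For reference, the forward and backward steps of the non-batched Brandes algorithm on an unweighted graph can be sketched as below. This is the textbook single-source formulation, not the batched variant the slides describe; the function name `brandes_betweenness` and the adjacency-list input are assumptions for this example. With undirected graphs given as symmetric adjacency lists, every pair is counted once per direction.

```python
from collections import deque


def brandes_betweenness(adj):
    """Betweenness centrality via Brandes's algorithm (non-batched sketch).

    adj: list of neighbor lists for vertices 0..n-1 (hypothetical input format).
    """
    n = len(adj)
    bc = [0.0] * n
    for s in range(n):
        # Forward step: BFS that counts shortest paths and records predecessors
        sigma = [0] * n  # sigma[v]: number of shortest s-v paths
        sigma[s] = 1
        dist = [-1] * n
        dist[s] = 0
        preds = [[] for _ in range(n)]
        order = []  # vertices in BFS (non-decreasing distance) order
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward step: accumulate dependencies in reverse BFS order
        delta = [0.0] * n
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

The `reversed(order)` loop and the `preds` lists are the two pieces that the backward-step challenges above refer to: a batched version needs a shared reverse traversal order and a way to obtain predecessors for many sources at once.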

Unweighted Betweenness Centrality

[3] Brandes, A Faster Algorithm for Betweenness Centrality, Journal of Mathematical Sociology, 2001

SLIDE 8

Reverse BFS: traverse graph in inverse BFS order

  • Stacks unsuited for MS-BFS

Reconstruct the traversal order from the forward iteration frontiers

Batched vertex predecessor computation

Correctness proof and the full batched betweenness centrality algorithm are in the paper
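One way to picture the frontier-based reverse traversal is the sketch below. It is an illustration only, not the algorithm from the paper: the function names `forward_frontiers` and `reverse_predecessors` are hypothetical, the graph is assumed to be undirected, and bit i of each mask belongs to the BFS started at `sources[i]`. The forward pass keeps one bitmask array per level, and the reverse pass walks those levels deepest-first, deriving predecessors from adjacent levels instead of from per-BFS stacks.

```python
def forward_frontiers(adj, sources):
    """MS-BFS that records, per level, which BFSs discovered each vertex."""
    n = len(adj)
    seen = [0] * n
    first = [0] * n
    for i, s in enumerate(sources):
        seen[s] |= 1 << i
        first[s] |= 1 << i
    frontiers = [first]
    while any(frontiers[-1]):
        visit = frontiers[-1]
        nxt = [0] * n
        for v in range(n):
            if visit[v]:
                for nb in adj[v]:
                    nxt[nb] |= visit[v] & ~seen[nb]
        for v in range(n):
            seen[v] |= nxt[v]
        frontiers.append(nxt)
    frontiers.pop()  # drop the trailing empty frontier
    return frontiers


def reverse_predecessors(adj, frontiers):
    """Visit levels deepest-first; emit (level, vertex, predecessor, bfs_mask)."""
    steps = []
    for level in range(len(frontiers) - 1, 0, -1):
        cur, prev = frontiers[level], frontiers[level - 1]
        for v, mask in enumerate(cur):
            if not mask:
                continue
            for u in adj[v]:  # assumes undirected edges: neighbors are also in-neighbors
                # u precedes v in exactly those BFSs that found u one level earlier
                shared = mask & prev[u]
                if shared:
                    steps.append((level, v, u, shared))
    return steps
```

A single AND between the masks of two adjacent levels answers the predecessor question for every BFS in the batch at once.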

Reverse MS-BFS and Vertex Predecessors

SLIDE 9

  • Motivation: Graph Centrality
  • Background: MS-BFS
  • Centrality in unweighted graphs
  • Centrality in weighted graphs
  • Evaluation
  • Summary and Future Work

Overview

SLIDE 10

Problem: MS-BFS cannot be used for distance computation in weighted graphs

Batched Algorithm Execution

  • Run algorithm from multiple vertices at the same time
  • Synchronize algorithm executions
  • Share common computations and data accesses
  • Adapt memory layout

Batched Algorithm Execution

SLIDE 11

Batched Bellman-Ford algorithm

Weighted all-pairs shortest path

Batched algorithm execution

  • … improves temporal and spatial locality
  • … facilitates vectorized computation
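A batched Bellman-Ford pass can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `batched_bellman_ford`, the edge-list input, and the vertex-major distance layout `dist[v][j]` are assumptions for this example, and the graph is assumed to have no negative cycles. Every edge is loaded once per round and then relaxed for all sources in the batch, which is where the locality and vectorization benefits come from.

```python
import math


def batched_bellman_ford(n, edges, sources):
    """Bellman-Ford from several sources at once (illustrative sketch).

    edges: list of directed (u, v, weight) triples (hypothetical input format).
    Returns dist, where dist[v][j] is the distance from sources[j] to v.
    """
    k = len(sources)
    # Vertex-major layout: the k distances of one vertex are stored contiguously
    dist = [[math.inf] * k for _ in range(n)]
    for j, s in enumerate(sources):
        dist[s][j] = 0
    for _ in range(n - 1):
        changed = False
        for u, v, w in edges:
            du, dv = dist[u], dist[v]
            for j in range(k):  # one edge access serves all k sources
                cand = du[j] + w
                if cand < dv[j]:
                    dv[j] = cand
                    changed = True
        if not changed:  # all executions in the batch have converged
            break
    return dist
```

The inner loop over `j` has no branches that depend on other lanes, so in a compiled language it maps naturally onto SIMD add and min instructions.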

Batched Algorithm Execution: Example

[Figure: non-batched execution versus batched execution]

SLIDE 12

Comparison of common weighted distance algorithms:

[Figure: runtime (in milliseconds) versus graph size (number of vertices). Panels: Kronecker and LDBC graphs, each with 5, 10, and 100 weights. Algorithms: Bellman-Ford and Dijkstra. Execution: batched and non-batched.]

Batched Weighted Distances

SLIDE 13

Closeness Centrality

  • Batched execution allows vectorizing the closeness centrality (CC) computation from the distances
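The step from batched distances to closeness values can be sketched as below. This is an illustration under assumptions, not the paper's code: the function name `closeness_from_batched_distances` is hypothetical, `dist[v][j]` is assumed to hold the distance from batch source j to vertex v (with `math.inf` for unreachable vertices), edge weights are assumed positive, and closeness is taken as "vertices reached divided by the sum of their distances".

```python
import math


def closeness_from_batched_distances(dist):
    """Closeness for each source of a batch from a vertex-major distance table."""
    k = len(dist[0]) if dist else 0
    total = [0.0] * k  # per-source sum of finite distances
    reached = [0] * k  # per-source count of reached vertices (source excluded)
    for row in dist:  # one pass over the vertices
        for j, d in enumerate(row):  # lane-wise update, one lane per source
            if d != math.inf and d > 0:
                total[j] += d
                reached[j] += 1
    return [reached[j] / total[j] if total[j] else 0.0 for j in range(k)]
```

Because each row holds the values for all sources side by side, the inner loop is a lane-wise accumulation that a compiled implementation could express with SIMD instructions.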

Betweenness Centrality

  • Requires global distance ordering
  • Implicit predecessor computation
  • Vectorized numeric computations

Weighted Centralities

SLIDE 14

  • Motivation: Graph Centrality
  • Background: MS-BFS
  • Centrality in unweighted graphs
  • Centrality in weighted graphs
  • Evaluation
  • Summary and Future Work

Overview

SLIDE 15

Algorithms implemented as stand-alone programs

  • C++14, GCC 5.2.1
  • No framework dependencies

Synthetic datasets

  • LDBC Social Network friendships graph
  • Kronecker graph, edge factor 32

Real-world datasets

  • Citeseer (384k vertices), DBLP (1.3M vertices), Wikipedia (1.9M vertices), and Hudong (3M vertices)
  • KONECT repository

Evaluated on a dual-socket Intel Xeon E5-2660 v2 (20 cores at 2.2 GHz) with 256 GB of main memory

Evaluation: Setup

SLIDE 16

[Figure: batched algorithm execution speedup versus number of concurrent executions (1 to 256). Panels: Closeness Centrality, Unweighted; Closeness Centrality, Weighted; Betweenness Centrality, Unweighted; Betweenness Centrality, Weighted. Datasets: LDBC 100, Kronecker S21, Citeseer, DBLP, Hudong, Wikipedia.]

Evaluation: Number of Concurrent Executions

SLIDE 17

[Figure: batched algorithm execution speedup versus graph size (number of vertices, 10 k to 1 M). Panels: LDBC, Unweighted; LDBC, Weighted. Legend (Algorithm): Closeness Centrality, Betweenness Centrality, vs. Brandes's BC. Legend (WeightCount): 1, 10.]

Evaluation: Graph Size Scalability

SLIDE 18

[Figure: batched algorithm execution speedup versus graph size (number of vertices, 10 k to 1 M). Panel: LDBC, Weighted. Legend (Algorithm): Closeness Centrality, Betweenness Centrality. Legend (WeightCount): 5, 10, 100.]

Evaluation: Number of Edge Weights

SLIDE 19

Batched algorithm execution

  • Shares common data accesses,
  • Avoids/vectorizes computations, and
  • Significantly reduces graph algorithm execution times

Improved centrality computation performance

  • Unweighted by up to 20x (closeness) and 6x (betweenness)
  • Weighted by up to 7x (closeness) and 3x (betweenness)

Details and all algorithms are listed in the paper

Future work: Apply batched execution to further classes of algorithms

Summary