A Round-Efficient Distributed Betweenness Centrality Algorithm
Loc Hoang, Matteo Pontecorvi, Roshan Dathathri, Gurbinder Gill, Bozhi You, Keshav Pingali, and Vijaya Ramachandran

Betweenness Centrality
- Betweenness Centrality (BC) used to determine relative importance of node in graph
- Applications
  - Key actor detection in terrorist nets
  - Disease studies
  - Power grid analysis
  - River flow confluence
- Distributed implementations necessary
  - Large graphs with billions of nodes/edges
  - BC takes hours to complete even if approximating
Figure Credit: Claudio Rocchini, Creative Commons Attribution 2.5 Generic

Betweenness Centrality Definition
[Figure: example graph on nodes A, B, C, D, E]
- BC: fraction of shortest paths in which a node appears
- Example: consider the 2 shortest paths from A to E: B appears in 1 (1/2); C appears in 1 (1/2); D appears in 2 (2/2 = 1)
- $\sigma_{st}$: number of shortest paths from $s$ to $t$
- $\sigma_{st}(v)$: number of shortest paths from $s$ to $t$ passing through $v$, $v \neq s \neq t$
- Betweenness Centrality (BC):
  $$BC(v) = \sum_{s \neq t \neq v} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
- From definition: about $n^3$ operations ($n$ is number of vertices)
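The definition can be checked directly on a small graph. The sketch below is a deliberately naive illustration, not the method from the talk: it counts shortest paths with BFS and uses the fact that $\sigma_{st}(v) = \sigma_{sv} \cdot \sigma_{vt}$ when $v$ lies on a shortest path from $s$ to $t$. The edge set of `example_adj` (A-B, A-C, B-D, C-D, D-E, undirected) is an assumption chosen to match the A-to-E example, and all function names are hypothetical.

```python
from collections import deque
from itertools import permutations

def bfs_counts(adj, s):
    """Return (dist, sigma): BFS distances and shortest-path counts from s."""
    dist = {s: 0}
    sigma = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma

def pair_fraction(adj, s, t, v):
    """sigma_st(v) / sigma_st for one ordered pair (s, t)."""
    dist_s, sigma_s = bfs_counts(adj, s)
    dist_v, sigma_v = bfs_counts(adj, v)
    if t not in dist_s or v not in dist_s or t not in dist_v:
        return 0.0                       # something is unreachable
    if dist_s[v] + dist_v[t] != dist_s[t]:
        return 0.0                       # v is on no shortest s-t path
    return sigma_s[v] * sigma_v[t] / sigma_s[t]

def bc_by_definition(adj):
    """Sum the fraction over all ordered triples with s, t, v distinct."""
    bc = {v: 0.0 for v in adj}
    for s, t, v in permutations(adj, 3):
        bc[v] += pair_fraction(adj, s, t, v)
    return bc

# Assumed edge set for the 5-node example (undirected, so both directions).
example_adj = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

for node in "BCD":
    print(node, pair_fraction(example_adj, "A", "E", node))
print(bc_by_definition(example_adj))
```

Because the sum runs over ordered pairs, each unordered pair of an undirected graph contributes twice. The sketch recomputes BFS for every triple, so it is slower than even the $n^3$ estimate and is meant only to make the definition concrete.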

Brandes Betweenness Centrality
[Figure: example graph on nodes A, B, C, D, E]
- Shortest-path DAG with shortest path counts rooted at node $s$: propagate dependencies ($\delta_{s\bullet}$) along DAG predecessors
- BC from Dependencies of a Node:
  $$BC(v) = \sum_{s \neq v} \delta_{s\bullet}(v)$$
  $$\delta_{s\bullet}(v) = \sum_{w \,:\, v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot \left(1 + \delta_{s\bullet}(w)\right)$$
  where $P_s(w)$ are predecessors of $w$ in DAG
- Brandes BC [1]: sum dependencies from all DAGs: $O(nm)$ operations ($m$ is number of edges)
- All-pairs shortest paths (APSP) or k-source shortest paths (k-SSP, shortest paths for subset of k nodes) to find DAGs
[1] U. Brandes. A Faster Algorithm for Betweenness Centrality. Journal of Mathematical Sociology 2001.
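The dependency recurrence can be sketched as a sequential program for unweighted graphs: one BFS per source builds the shortest-path DAG, then dependencies flow back along predecessors in reverse BFS order. This is a single-machine illustration of the Brandes scheme, not the distributed algorithm of the talk. The adjacency-dict input format and the edge set of `example_adj` are assumptions.

```python
from collections import deque

def brandes_bc(adj):
    """Betweenness centrality for an unweighted graph (adj: node -> out-neighbors)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS from s, recording counts and DAG predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []                       # nodes in non-decreasing distance
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: delta_s(v) += sigma_sv / sigma_sw * (1 + delta_s(w)).
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# Assumed edge set for the 5-node example (undirected, so both directions).
example_adj = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

print(brandes_bc(example_adj))
```

Each source costs one BFS plus one pass over the DAG edges, which is where the $O(nm)$ bound comes from.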

Related APSP and BC Work
- APSP
  - O(n) round undirected, unweighted APSP algorithms [2,3,4]
  - Lenzen-Peleg: prior best unweighted APSP
- BC
  - Asynchronous Brandes BC (ABBC): asynchronous, shared-memory [5]
  - Maximal Frontier BC (MFBC): distributed, sparse-matrix Brandes BC [6]
  - Hua et al.: distributed BC for undirected, unweighted graphs [7]
[2] S. Holzer and R. Wattenhofer. Optimal Distributed All Pairs Shortest Paths and Applications. PODC 2012.
[3] D. Peleg, L. Roditty, and E. Tal. Distributed Algorithms for Network Diameter and Girth. ICALP 2012.
[4] C. Lenzen and D. Peleg. Efficient Distributed Source Detection with Limited Bandwidth. PODC 2013.
[5] D. Prountzos and K. Pingali. Betweenness Centrality: Algorithms and Implementations. PPoPP 2013.
[6] E. Solomonik, M. Besta, F. Vella, and T. Hoefler. Scaling Betweenness Centrality Using Communication-efficient Sparse Matrix Multiplication.
[7] Q. S. Hua, H. Fan, M. Ai, L. Qian, Y. Li, X. Shi, and X. Jin. Nearly Optimal Distributed Algorithm for Computing Betweenness Centrality. ICDCS 2016.

Motivation for Our Work
- Practical implementations of theoretical, distributed O(n)-round APSP/BC algorithms do not exist
- Existing distributed BC implementations mainly use SSSP/k-SSP with Brandes BC
  - High number of bulk-synchronous parallel (BSP) rounds with expensive communication barriers

Tradeoff exploration: decreasing number of rounds at cost of increasing computation per round

Our Contributions: Theory
- Min-Rounds APSP and Min-Rounds Betweenness Centrality (MRBC) for directed and undirected unweighted graphs
- CONGEST: (known) $n$ nodes, $m$ edges, diameter $D$: APSP in $\min(n + O(D), 2n)$ rounds and $mn + O(m)$ messages
- In systems that detect termination: k-SSP in at most $k + H$ rounds and $m \cdot k$ messages, where $H$ is the largest finite shortest path distance for the $k$ sources
- BC: at most twice the rounds/messages as APSP/k-SSP

Our Contributions: Practice
- MRBC implementation in D-Galois [8] with communication optimization exploiting MRBC properties
- MRBC evaluation
  - 3× faster than prior state-of-the-art MFBC
  - 2.8× speedup over Brandes BC on high diameter graphs
[8] R. Dathathri, G. Gill, L. Hoang, H.V. Dang, A. Brooks, N. Dryden, M. Snir, K. Pingali. Gluon: A Communication-Optimizing Substrate for Distributed Heterogeneous Graph Analytics. PLDI 2018.

Outline
1 Introduction
2 MRBC
  - Min-Rounds APSP
  - Min-Rounds BC
  - D-Galois Model and Delayed Synchronization
3 Evaluation
4 Conclusion

CONGEST Model for Distributed Algorithms
- Machines are nodes, edges are communication channels
- Send message (constant number of words) per round to do updates
[Figure: two copies of a 6-node example network, nodes numbered 1 to 6]

k-SSP Example: Initial State
- Left: Initial State of k-SSP where k = 2 sources A and B
- Vertices store current distance from a source to self in lexicographically sorted vector
- Every round, vertex chooses 1 (distance, source) pair to send along outgoing edges
[Figure: graph on nodes A through G; A holds (0, A) and B holds (0, B); labels are (distance, sourceID)]

APSP: When To Send A Pair?
- Problem: sent distance may not be final distance associated with source
- Min-Rounds APSP New Insight: Message Send Rule
  - Send unsent distance d with position p on sorted vector with corresponding source in round r if p + d = r
- Like Dijkstra: sends only final distance
- Resulting algorithm pipelines messages: orchestrates updates across edges and reduces amount of messages sent
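The send rule can be exercised in a small synchronous simulation. The sketch below is an illustration under stated assumptions, not the D-Galois implementation: every vertex keeps one entry per known source, sorts the entries by (distance, source), and in round r sends the unsent entry whose 1-based position p satisfies p + d = r. It also carries shortest-path counts and records the send round. The function name and data layout are hypothetical, and the directed edge set of `example_edges` is an assumption, inferred from the figures of the 7-node example.

```python
def min_rounds_kssp(out_edges, sources):
    """Simulate pipelined k-SSP on an unweighted directed graph.

    Returns (entries, rounds) where
    entries[v][s] = [distance, path_count, send_round or None].
    """
    entries = {v: {} for v in out_edges}
    for s in sources:
        entries[s][s] = [0, 1, None]
    rounds = 0
    # k + H <= k + n bounds the round count; used here only as a safety cap.
    cap = len(sources) + len(out_edges)
    for r in range(1, cap + 1):
        if all(e[2] is not None for tbl in entries.values() for e in tbl.values()):
            break                        # every known entry has been sent
        outbox = []
        for v, tbl in entries.items():
            ordered = sorted((e[0], s) for s, e in tbl.items())
            for p, (d, s) in enumerate(ordered, start=1):
                # d + p is strictly increasing along the vector, so at most
                # one entry per vertex matches in any round.
                if tbl[s][2] is None and d + p == r:
                    tbl[s][2] = r        # timestamp the send round
                    for w in out_edges[v]:
                        outbox.append((w, s, d + 1, tbl[s][1]))
        # Deliver at the end of the round (synchronous model).
        for w, s, d, count in outbox:
            tbl = entries[w]
            if s not in tbl or d < tbl[s][0]:
                tbl[s] = [d, count, None]
            elif d == tbl[s][0]:
                tbl[s][1] += count       # another shortest path of equal length
        rounds = r
    return entries, rounds

# Assumed directed edges for the 7-node example with sources A and B.
example_edges = {
    "A": ["C", "D"], "B": ["D", "E"], "C": [],
    "D": ["F", "G"], "E": ["G"], "F": [], "G": [],
}

entries, rounds = min_rounds_kssp(example_edges, ["A", "B"])
print("rounds:", rounds)
for v in sorted(entries):
    print(v, entries[v])
```

On this assumed graph the run finishes in k + H = 2 + 2 rounds. The simulation does not prove that sent distances are final; it only lets that property be checked against a plain BFS on small inputs.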

k-SSP Example: Round 1
- Message Send Rule: send unsent distance d with position p on sorted vector with corresponding source in round r if p + d = r
- Example: (0, A) chosen because 0 + 1 (1 is position on vector) equals round 1
[Figure: graph state after round 1, with (1, A) and (1, B) appearing at the neighbors of A and B; labels are (distance, sourceID)]

k-SSP Example: Round 2
[Figure: graph state after round 2; labels are (distance, sourceID)]

k-SSP Example: Round 3
[Figure: graph state after round 3; labels are (distance, sourceID)]

k-SSP Example: Round 4 (Final)
[Figure: final graph state; labels are (distance, sourceID). Final vectors: A: (0, A); B: (0, B); C: (1, A); D: (1, A), (1, B); E: (1, B); F: (2, A), (2, B); G: (2, A), (2, B)]

APSP for Brandes BC
- Min-Rounds APSP as subroutine for Brandes BC backward accumulation
- Three Additions to APSP
  - Send shortest path count with distance/source ID in APSP
  - Timestamp round number in which message is sent
  - Track predecessors of shortest path DAG for each source

Min-Rounds BC: Reversing Global Delays
- Insight: leverage saved timestamps, send final values
- Timestamp Pipelining By Reversing Global Delay
  - Send source's dependency value to predecessors in source's DAG in reverse round order: total rounds + 1 - timestamp
  - Example with 4 total rounds: an entry (1, B, 1, 0) sent in round 3 is scheduled for backward round 4 + 1 - 3 = 2
[Figure: the example graph shown twice, left with forward send rounds and right with reversed send rounds; labels are (distance, sourceID, #shortpaths, dependency), sendround]

Backward Accumulation: Round 1
- Brandes formulation to propagate finalized dependencies
[Figure: backward round 1. Messages (sourceID, dependency contribution) sent to DAG predecessors: (B, 1), (B, 0.5), (B, 0.5). Updated entries: (1, B, 1, 1.5) and (1, B, 1, 0.5). Labels are (distance, sourceID, #shortpaths, dependency), sendround]

Backward Accumulation: Round 2
[Figure: backward round 2. Messages (sourceID, dependency contribution) sent to DAG predecessors: (A, 1), (A, 1), (B, 2.5). Updated entry: (1, A, 1, 2). Labels are (distance, sourceID, #shortpaths, dependency), sendround]
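The backward phase can be replayed from the saved forward state. In the sketch below, each entry is scheduled in backward round (total rounds + 1 - timestamp) and sends $\sigma_{su} / \sigma_{sv} \cdot (1 + \delta_{s\bullet}(v))$ to each predecessor $u$ in the DAG of source $s$. The `forward` table is hard-coded as an assumption: path counts and timestamps follow the example labels, while the predecessor lists are inferred from the figures. The function name and data layout are hypothetical.

```python
def backward_accumulate(forward, total_rounds):
    """Replay dependency propagation in reversed timestamp order.

    forward[v][s] = (sigma_sv, send_round, predecessors of v in s's DAG).
    Returns delta[(v, s)], the dependency of source s on vertex v.
    """
    delta = {(v, s): 0.0 for v, tbl in forward.items() for s in tbl}
    for rev_round in range(1, total_rounds + 1):
        inbox = []
        for v, tbl in forward.items():
            for s, (sigma_v, sent, preds) in tbl.items():
                # Sources have no predecessors to notify, so skip v == s.
                if v != s and total_rounds + 1 - sent == rev_round:
                    for u in preds:
                        sigma_u = forward[u][s][0]
                        share = sigma_u / sigma_v * (1.0 + delta[(v, s)])
                        inbox.append((u, s, share))
        # Deliver at the end of the round (synchronous model).
        for u, s, share in inbox:
            delta[(u, s)] += share
    return delta

# Assumed forward state for the 7-node example (sources A and B, 4 rounds).
forward = {
    "A": {"A": (1, 1, [])},
    "B": {"B": (1, 1, [])},
    "C": {"A": (1, 2, ["A"])},
    "D": {"A": (1, 2, ["A"]), "B": (1, 3, ["B"])},
    "E": {"B": (1, 2, ["B"])},
    "F": {"A": (1, 3, ["D"]), "B": (1, 4, ["D"])},
    "G": {"A": (1, 3, ["D"]), "B": (2, 4, ["D", "E"])},
}

delta = backward_accumulate(forward, total_rounds=4)
# BC(v) sums dependencies over sources other than v itself.
bc = {v: sum(delta[(v, s)] for s in forward[v] if s != v) for v in forward}
print(bc)
```

An entry is sent only after every successor in the same DAG has sent, because successors carry larger forward timestamps and therefore smaller backward rounds. Values accumulated at a source for its own DAG are ignored when summing BC, matching the blank dependency field on source entries.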
