SLIDE 1

An Adaptive Parallel Algorithm for Computing Connectivity

Chirag Jain, Patrick Flick, Tony Pan, Oded Green, Srinivas Aluru


SIAM Workshop on Combinatorial Scientific Computing (CSC16), October 10, 2016

SLIDE 2

Connected Components

• Finding connected components is at the heart of many graph applications.
• Sequentially, we have linear-time O(|V| + |E|) solutions:
  • Union-find (a minimal sketch follows below)
  • BFS / DFS

[Figure: example graph G(V, E)]
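A minimal sequential sketch of the union-find option, assuming an edge-list input; the struct and function names here are illustrative, not from the talk.

```cpp
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Union-find with path halving and union by size; near-linear total
// time O(|E| alpha(|V|)) over all edges.
struct UnionFind {
    std::vector<int64_t> parent, size;
    explicit UnionFind(int64_t n) : parent(n), size(n, 1) {
        std::iota(parent.begin(), parent.end(), 0);  // every vertex is its own root
    }
    int64_t find(int64_t x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    }
    void unite(int64_t a, int64_t b) {
        a = find(a); b = find(b);
        if (a == b) return;
        if (size[a] < size[b]) std::swap(a, b);
        parent[b] = a;  // attach the smaller tree under the larger one
        size[a] += size[b];
    }
};

// Count connected components of G(V, E): one unite() per edge, then count roots.
int64_t countComponents(int64_t n,
                        const std::vector<std::pair<int64_t, int64_t>>& edges) {
    UnionFind uf(n);
    for (const auto& e : edges) uf.unite(e.first, e.second);
    int64_t components = 0;
    for (int64_t v = 0; v < n; ++v)
        if (uf.find(v) == v) ++components;
    return components;
}
```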

SLIDE 3

Scaling to Large Graphs

• Sizes of graph datasets continue to grow in multiple scientific domains.
  • Bioinformatics: metagenomic de Bruijn graphs, e.g., Iowa Prairie (3.3B reads), JGI
  • Social networks, the WWW
• We need a method that scales to graphs with billions or trillions of edges, irrespective of graph topology.

Sequencing machines generate ~10^9 DNA reads in one day; >10^9 content uploads occur in one day.

SLIDE 4

Background

• A. Parallel connectivity algorithms
  • 1. Parallel BFS
  • 2. Shiloach-Vishkin PRAM algorithm (SV)
• B. Recent prior work

Buluç and Madduri, "Parallel breadth-first search …" SC 2011
Beamer et al., "Distributed memory breadth-first search revisited …" IPDPSW 2013

SLIDE 5

Background

• A. Parallel connectivity algorithms
  • 1. Parallel BFS
  • 2. Shiloach-Vishkin PRAM algorithm (SV)
• B. Recent prior work

Shiloach and Vishkin, "An O(log n) parallel connectivity algorithm" 1982

SLIDE 6

Background

• A. Parallel connectivity algorithms
  • 1. Parallel BFS
  • 2. Shiloach-Vishkin PRAM algorithm (SV)
• B. Recent prior work

Label Propagation: O(|V|) iterations → O(|E| · |V|) work
Shiloach-Vishkin: pointer jumping for faster convergence; O(log |V|) iterations → O(|E| log |V|) work (sketch below)

Shiloach and Vishkin, "An O(log n) parallel connectivity algorithm" 1982
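For reference, a minimal sequential sketch of the hooking + pointer-jumping scheme behind SV; in the PRAM formulation the two inner loops run in parallel, which is where the O(log |V|) iteration bound comes from. Names are illustrative.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Sketch of Shiloach-Vishkin-style connectivity: alternate "hooking"
// (pointing the root of the larger label at the smaller label) with
// pointer jumping (shortcutting label chains) until nothing changes.
std::vector<int64_t> shiloachVishkin(
    int64_t n, const std::vector<std::pair<int64_t, int64_t>>& edges) {
    std::vector<int64_t> label(n);
    for (int64_t v = 0; v < n; ++v) label[v] = v;  // one partition per vertex
    bool changed = true;
    while (changed) {
        changed = false;
        // Hooking: for every edge, attach the larger root label to the smaller.
        for (const auto& e : edges) {
            int64_t lu = label[e.first], lv = label[e.second];
            if (lu < lv && lv == label[lv]) { label[lv] = lu; changed = true; }
            else if (lv < lu && lu == label[lu]) { label[lu] = lv; changed = true; }
        }
        // Pointer jumping: shortcut every label chain toward its root.
        for (int64_t v = 0; v < n; ++v)
            while (label[v] != label[label[v]])
                label[v] = label[label[v]];
    }
    return label;  // label[v] = smallest vertex id in v's component
}
```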

SLIDE 7

Background

• A. Parallel connectivity algorithms
  • 1. Parallel BFS
  • 2. Shiloach-Vishkin PRAM algorithm (SV)
• B. Recent prior work: the Multistep algorithm, which runs one parallel BFS iteration followed by parallel label propagation; part of popular graph analysis frameworks (GraphX, PowerLyra, PowerGraph). A sequential sketch follows below.

[Figure: example graph G(V, E)]

Slota et al., "A Case Study of Complex Graph Analysis …" IPDPS 2016
Slota et al., "BFS and coloring-based parallel …" IPDPS 2014
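A rough single-threaded sketch of the Multistep idea, assuming an adjacency-list input; in the actual algorithm both phases run in parallel.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Rough sketch of Multistep: one BFS sweep labels the (presumed) giant
// component, then min-label propagation resolves whatever is left.
std::vector<int64_t> multistep(const std::vector<std::vector<int64_t>>& adj) {
    const int64_t n = static_cast<int64_t>(adj.size());
    std::vector<int64_t> label(n);
    for (int64_t v = 0; v < n; ++v) label[v] = v;
    if (n == 0) return label;
    // Phase 1: BFS from vertex 0 (ideally a highest-degree vertex).
    std::vector<bool> visited(n, false);
    std::queue<int64_t> q;
    visited[0] = true;
    q.push(0);
    while (!q.empty()) {
        int64_t u = q.front(); q.pop();
        for (int64_t w : adj[u])
            if (!visited[w]) { visited[w] = true; label[w] = label[0]; q.push(w); }
    }
    // Phase 2: label propagation on the untouched vertices until stable.
    bool changed = true;
    while (changed) {
        changed = false;
        for (int64_t v = 0; v < n; ++v) {
            if (visited[v]) continue;
            for (int64_t w : adj[v])
                if (label[w] < label[v]) { label[v] = label[w]; changed = true; }
        }
    }
    return label;
}
```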

SLIDE 8

Contributions

• 1. A novel edge-based adaptation of the Shiloach-Vishkin algorithm for distributed-memory parallel systems.
• 2. A fast heuristic to guide algorithm selection at runtime.

[Figure: given G(V, E), the method chooses at runtime between (1) Parallel SV and (2) Parallel BFS]

Flick et al., "A parallel connectivity algorithm …" SC 2015

SLIDE 9

Parallel SV Algorithm

• Initialization:
  • We work with an array of tuples (call it A) that keeps the partition id of each vertex (a sketch of one possible layout follows below).
  • O(|V|) partitions at the beginning.
  • Size of A: O(|V| + |E|).

[Figure: tuple array A with a 'current partition id' layer over a 'vertex ids' layer, illustrated for a vertex u and its neighbors v1, v2]
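A rough sketch of how such a tuple array might be initialized; the paper's exact tuple contents may differ, and all names here are illustrative.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical tuple layout for the array A described on this slide:
// one entry per vertex plus one per edge endpoint, so |A| = O(|V| + |E|).
struct Tuple {
    int64_t partition;  // current partition id (initially the vertex's own id)
    int64_t vertex;     // vertex id this tuple belongs to
};

// Initialization sketch: every vertex starts in its own partition, and
// each edge (u, v) contributes tuples linking the two endpoints.
std::vector<Tuple> initTuples(
    int64_t n, const std::vector<std::pair<int64_t, int64_t>>& edges) {
    std::vector<Tuple> A;
    A.reserve(n + 2 * edges.size());
    for (int64_t v = 0; v < n; ++v) A.push_back({v, v});
    for (const auto& e : edges) {
        A.push_back({e.first, e.second});  // u's partition id attached to vertex v
        A.push_back({e.second, e.first});  // v's partition id attached to vertex u
    }
    return A;
}
```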

SLIDE 11

Parallel SV Algorithm

• Which partition ids is vertex u a member of?
  • Sort A by the 'vertex id' layer.

[Figure: after sorting by vertex id, all tuples of vertex u become contiguous]

SLIDE 12

Parallel SV Algorithm

• Which partition ids is vertex u a member of?
  • Sort A by the 'vertex id' layer.
• Which vertices are members of a partition?
  • Sort A by the 'partition id' layer (comparator sketches below).

[Figure: after sorting by partition id, all member vertices u, v, w of a partition become contiguous]
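Both queries reduce to sorting the same array on a different layer. A sequential sketch, with std::sort standing in for the distributed sample sort used in the actual implementation (see SLIDE 14); the Tuple layout is the hypothetical one sketched after SLIDE 9.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Tuple { int64_t partition, vertex; };  // hypothetical layout from SLIDE 9

// Sorting A by the 'vertex id' layer groups every tuple of a vertex u
// together, exposing all partition ids u currently belongs to.
void sortByVertex(std::vector<Tuple>& A) {
    std::sort(A.begin(), A.end(),
              [](const Tuple& a, const Tuple& b) { return a.vertex < b.vertex; });
}

// Sorting A by the 'partition id' layer groups every member vertex of a
// partition together, so the whole partition can be relabeled at once.
void sortByPartition(std::vector<Tuple>& A) {
    std::sort(A.begin(), A.end(),
              [](const Tuple& a, const Tuple& b) { return a.partition < b.partition; });
}
```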

SLIDE 14

Parallel SV Algorithm

• In our implementation, we use parallel sample sort.
• Custom reduction operations efficiently compute minimums (a sketch of one relabeling pass follows below).
• Additional details:
  • pointer jumping
  • early convergence detection for small components
  • load balancing
• Runtime: see our preprint.
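Putting the pieces together, a sequential sketch of one relabeling pass over A; in the real implementation the sort is a parallel sample sort and the per-vertex minimum is computed across processor boundaries with a custom MPI reduction. This is an illustration, not the paper's code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Tuple { int64_t partition, vertex; };  // hypothetical layout from SLIDE 9

// One relabeling pass: after sorting by vertex, each vertex adopts the
// minimum partition id among its tuples, written back to all of them.
// Returns true if any label changed (so the caller keeps iterating,
// with pointer jumping, until convergence).
bool relabelPass(std::vector<Tuple>& A) {
    std::sort(A.begin(), A.end(),
              [](const Tuple& a, const Tuple& b) { return a.vertex < b.vertex; });
    bool changed = false;
    for (size_t i = 0; i < A.size();) {
        size_t j = i;
        int64_t mn = A[i].partition;
        while (j < A.size() && A[j].vertex == A[i].vertex)
            mn = std::min(mn, A[j++].partition);  // min over the vertex's group
        for (size_t k = i; k < j; ++k)
            if (A[k].partition != mn) { A[k].partition = mn; changed = true; }
        i = j;
    }
    return changed;
}
```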

SLIDE 15

Contributions (recap)

• 1. A novel edge-based adaptation of the Shiloach-Vishkin algorithm for distributed-memory parallel systems.
• 2. A fast heuristic to guide algorithm selection at runtime.

[Figure: given G(V, E), the method chooses at runtime between (1) Parallel SV and (2) Parallel BFS]

Flick et al., "A parallel connectivity algorithm …" SC 2015

SLIDE 16

Dynamic Hybrid Method

• Parallel BFS is close to work-efficient for a giant small-world graph component.
• Efficiency is lost when the graph has:
  • a large number of small components
  • a component with a large diameter
• How do we decide which algorithm to choose at runtime?

SLIDE 17

Dynamic Hybrid Method

1. Compute the degree distribution of the input graph.
2. Does the curve fit a power-law distribution?
   • Yes → run 1 parallel BFS iteration (to peel off the giant component).
   • No → skip the BFS step.
3. Run Parallel SV on the remaining graph.

(A control-flow sketch follows below.)
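The flowchart rendered as control flow. All helpers below (degreeDistribution, fitsPowerLaw, runOneBfsIteration, runParallelSV) are hypothetical stand-ins for the corresponding components, not the paper's API.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical building blocks standing in for the paper's components.
struct Graph { int64_t numVertices; std::vector<std::pair<int64_t, int64_t>> edges; };
std::vector<int64_t> degreeDistribution(const Graph& g);    // histogram of degrees
bool fitsPowerLaw(const std::vector<int64_t>& histogram);   // goodness-of-fit check
void runOneBfsIteration(Graph& g, std::vector<int64_t>& labels);  // peels giant component
void runParallelSV(const Graph& g, std::vector<int64_t>& labels); // SV on the rest

// Control-flow sketch of the flowchart on this slide.
std::vector<int64_t> adaptiveConnectivity(Graph g) {
    std::vector<int64_t> labels(g.numVertices);
    if (fitsPowerLaw(degreeDistribution(g))) {
        // Small-world topology suspected: one BFS iteration removes the
        // giant component cheaply before SV handles the remainder.
        runOneBfsIteration(g, labels);
    }
    runParallelSV(g, labels);
    return labels;
}
```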

SLIDE 18

Experimental Setup

• Software: C++14, MPI, CombBLAS library for parallel BFS
• Hardware: Cray XC30 (Edison) at Lawrence Berkeley National Laboratory
  • 5,576 nodes, each with 2 x 12-core Intel Ivy Bridge processors and 64 GB RAM
  • 1 MPI process per physical core
• Timing:
  • Exclude graph construction and I/O time.
  • Profiling starts after the block-distributed edge list is in memory.

Buluç and Gilbert, "The Combinatorial BLAS: Design …" IJHPCA 2011

SLIDE 19

Datasets

[Table: datasets used in the evaluation, grouped into small-world graphs, a large-diameter graph, and graphs with a large number of components]

SLIDE 23

Dynamic Approach

[Figure: time of the dynamic approach vs. the static (opposite) choice at the "Run BFS?" decision, using 2K cores; speedup over the opposite choice per dataset: M1 1.2x, M2 0.9x, M3 1.2x, G1 4.1x, G2 3.7x, G3 4.7x, K1 3.6x, K2 4.0x]

SLIDE 24

Dynamic Approach

[Figure: proportion of time spent in the "Run BFS?" prediction step, using 2K cores]

SLIDE 25

Strong Scalability

• Maximum speedup of ~8x using 4,096 cores relative to 256 cores (ideal: 16x).
• A sorting benchmark with 2B integers achieves an 8.06x speedup as well.

[Figure: time and speedup vs. number of cores (256 to 4,096, log scale) for datasets G1, G2, G3, K1, M1, M2; timings shown for the largest graph, M4]

SLIDE 26

v/s Multistep Method

[Figure: time of our method vs. the Multistep method per dataset, with our speedup and the graph diameter: M1 2.1x (diameter 4K), M2 1.1x (4K), M3 2.7x (2K), G1 24x (25K), G2 0.9x (9), G3 1.1x (9), K1 1.1x (16), K2 1.9x (17)]

SLIDE 27

v/s Best Sequential Method

• Performance comparison against Rem's algorithm (based on union-find; sketch below).
• Using small graphs that fit in a single node (64 GB RAM).

E. W. Dijkstra, A Discipline of Programming, 1976
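For reference, a minimal sketch of Rem's union-find with splicing, the sequential baseline above; this is a textbook rendering (after Dijkstra, 1976), not the exact code used in the comparison.

```cpp
#include <cstdint>
#include <vector>

// Rem's union-find with the splicing technique: parent indices are kept
// non-decreasing along every chain, and each step redirects the smaller
// parent toward the larger one while climbing.
struct RemUnionFind {
    std::vector<int64_t> p;
    explicit RemUnionFind(int64_t n) : p(n) {
        for (int64_t v = 0; v < n; ++v) p[v] = v;  // every vertex is its own root
    }
    // Unites the sets containing u and v (no-op if already connected).
    void unite(int64_t u, int64_t v) {
        while (p[u] != p[v]) {
            if (p[u] < p[v]) {
                if (u == p[u]) { p[u] = p[v]; return; }  // u is a root: hook it
                int64_t z = p[u]; p[u] = p[v]; u = z;    // splice, then climb
            } else {
                if (v == p[v]) { p[v] = p[u]; return; }
                int64_t z = p[v]; p[v] = p[u]; v = z;
            }
        }
    }
};
```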

SLIDE 28

Conclusions

• 1. An efficient distributed-memory parallel connectivity algorithm based on the Shiloach-Vishkin approach.
• 2. A heuristic to guide algorithm selection at runtime.
• 3. Efficient as well as generic; scales on a variety of large graphs.
• 4. Significant performance gains over the previous state-of-the-art, particularly for large-diameter graphs.

SLIDE 29

Thank you!

arxiv.org/abs/1607.06156
cjain@gatech.edu

Reproducibility Initiative Award

github.com/ParBLiSS/parconnect