An Adaptive Parallel Algorithm for Computing Connectivity
Chirag Jain, Patrick Flick, Tony Pan, Oded Green, Srinivas Aluru
SIAM Workshop on Combinatorial Scientific Computing (CSC16) October 10, 2016
Finding connected components is at the heart of many graph applications.
time O(|E|) solutions.
G(V,E)
Introduction Methods Experiments
to grow in multiple scientific domains
de-Bruijn graphs
graphs with billions to trillions of edges
Sequencing machines generate ~10^9 DNA reads in 1 day
> 10^9 content uploads in 1 day
algorithms
algorithm (SV)
Buluç and Madduri “Parallel breadth-first search …” SC 11
Beamer et al. “Distributed memory breadth-first search revisited …” IPDPSW 13
source
algorithms
PRAM algorithm (SV)
Shiloach and Vishkin “An O(log n) parallel connectivity algorithm” 1982
Label Propagation: O(|V|) iterations → O(|E|·|V|) work
Shiloach-Vishkin: pointer jumping for faster convergence; O(log |V|) iterations → O(|E| log |V|) work
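The iteration counts above can be made concrete with a small sequential simulation. The sketch below is an illustration under stated assumptions, not code from the talk: `label_propagation` is plain synchronous minimum-label propagation, and `hook_and_jump` is a simplified hooking plus pointer-jumping scheme in the spirit of Shiloach-Vishkin (it omits the conditional/unconditional hooking details that the full algorithm relies on for its O(log |V|) bound). Both function names are hypothetical.

```python
def label_propagation(n, edges):
    """Synchronous min-label propagation. Returns (labels, rounds)."""
    labels = list(range(n))
    rounds = 0
    changed = True
    while changed:
        changed = False
        rounds += 1
        new = labels[:]                      # read old labels, write new ones
        for u, v in edges:
            m = min(labels[u], labels[v])
            if m < new[u]:
                new[u] = m
                changed = True
            if m < new[v]:
                new[v] = m
                changed = True
        labels = new
    return labels, rounds


def hook_and_jump(n, edges):
    """Simplified SV-style rounds: hook roots, then one pointer jump."""
    parent = list(range(n))
    rounds = 0
    changed = True
    while changed:
        changed = False
        rounds += 1
        old = parent[:]                      # snapshot emulates a parallel step
        # Hooking: a root with the larger id is attached to the smaller id.
        for u, v in edges:
            ru, rv = old[u], old[v]
            if ru != rv:
                hi, lo = max(ru, rv), min(ru, rv)
                if old[hi] == hi:            # only roots may be hooked
                    parent[hi] = min(parent[hi], lo)
                    changed = True
        # Pointer jumping: every vertex points to its grandparent.
        jumped = [parent[parent[v]] for v in range(n)]
        if jumped != parent:
            changed = True
        parent = jumped
    return parent, rounds


if __name__ == "__main__":
    n = 16
    path = [(i, i + 1) for i in range(n - 1)]   # worst case for label propagation
    print(label_propagation(n, path)[1], hook_and_jump(n, path)[1])
```

On a long path, the minimum label travels one hop per round under label propagation, while pointer jumping doubles the distance covered per round, so the second function finishes in far fewer rounds.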
algorithms
algorithm (SV)
G(V,E)
Multistep algorithm: 1 parallel BFS iteration, then parallel label propagation
Part of popular graph analysis frameworks: GraphX, PowerLyra, PowerGraph
Slota et al. “A Case Study of Complex Graph Analysis …” IPDPS 2016
Slota et al. “BFS and coloring-based parallel …” IPDPS 2014
Flick et al. “A parallel connectivity algorithm …” SC 15
algorithm for distributed memory parallel systems.
G(V,E): 1. Parallel SV  2. Parallel BFS
Current partition id Vertex ids
array of tuples (call it A) to keep the partition id of each vertex.
beginning
O(|V| + |E|)
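Only fragments of this slide survive (an array of tuples A holding partition ids, with O(|V| + |E|) entries). As a rough illustration of how such an array can drive a partition update, the sketch below builds one tuple per vertex and two per edge, sorts by vertex id, and keeps the minimum partition id in each bucket. This is an assumption-laden example of a plain sort-based minimum-label round, not the tuple layout or update rule used in the talk; `sort_based_round` and `components_by_sorting` are hypothetical names.

```python
def sort_based_round(labels, edges):
    """One round: build A, sort by vertex id, keep each bucket's minimum.

    Returns (new_labels, len(A)); len(A) is |V| + 2|E|, i.e. O(|V| + |E|).
    """
    # One (partition id, vertex id) tuple per vertex ...
    A = [(labels[v], v) for v in range(len(labels))]
    # ... and two per edge: each endpoint is offered the other's partition id.
    for u, v in edges:
        A.append((labels[u], v))
        A.append((labels[v], u))
    # Group tuples by vertex id, smallest partition id first in each group.
    A.sort(key=lambda t: (t[1], t[0]))
    new_labels = labels[:]
    previous_vertex = None
    for pid, v in A:
        if v != previous_vertex:         # first tuple of a bucket is its minimum
            new_labels[v] = pid
            previous_vertex = v
    return new_labels, len(A)


def components_by_sorting(n, edges):
    """Repeat sort-based rounds until no partition id changes."""
    labels = list(range(n))              # initially every vertex is its own partition
    while True:
        new_labels, _ = sort_based_round(labels, edges)
        if new_labels == labels:
            return labels
        labels = new_labels


if __name__ == "__main__":
    print(components_by_sorting(6, [(0, 1), (1, 2), (3, 4)]))
```

Expressing the update as a sort followed by a scan is convenient on distributed memory, since parallel sorting takes care of moving each tuple to the process that owns its vertex.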
[Figure: array A of tuples <current partition id, vertex id> for a vertex u with neighbors v1 and v2]
partition ids?
[Figure: tuples of A for vertices u, v, w, with columns "Current partition id" and "Vertex ids"]
minimums.
balance
Check our preprint
efficient for a giant small world graph component.
components
component
choose at runtime?
Compute degree distribution of input graph
Curve fits power-law distribution?
Yes: 1 BFS iteration, then run Parallel-SV on remaining graph
No: skip BFS, run Parallel-SV on the graph
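The decision flow above can be sketched sequentially as follows. Several pieces are assumptions made for brevity and are not taken from the slides: the power-law check is a least-squares line through the log-log degree histogram with a hypothetical `r2_threshold`, the BFS seed is the highest-degree vertex, and a sequential union-find stands in for Parallel-SV. All function names are hypothetical.

```python
import math
from collections import Counter, deque


def looks_power_law(degrees, r2_threshold=0.8):
    """Stand-in fit test: is the log-log degree histogram close to a falling line?"""
    hist = Counter(d for d in degrees if d > 0)
    if len(hist) < 3:
        return False
    xs = [math.log(d) for d in hist]
    ys = [math.log(c) for c in hist.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return False
    return sxy / sxx < 0 and (sxy * sxy) / (sxx * syy) >= r2_threshold


def bfs_component(adj, source):
    """Vertices reachable from source."""
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def union_find_labels(vertices, edges):
    """Sequential stand-in for Parallel-SV; labels are component minimums."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]    # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[max(ru, rv)] = min(ru, rv)
    return {v: find(v) for v in vertices}


def adaptive_components(n, edges, fits=looks_power_law):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    labels, remaining = {}, set(range(n))
    if fits([len(a) for a in adj]):
        # Assumed seed choice: the highest-degree vertex.
        source = max(range(n), key=lambda v: len(adj[v]))
        component = bfs_component(adj, source)
        root = min(component)
        labels.update((v, root) for v in component)
        remaining -= component
    rest = [(u, v) for u, v in edges if u in remaining and v in remaining]
    labels.update(union_find_labels(remaining, rest))
    return [labels[v] for v in range(n)]
```

Either branch yields the same labeling; the choice only affects how much work is left for the second phase, which is the point of deciding at runtime.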
Laboratory
RAM
Buluç and Gilbert “The Combinatorial BLAS: Design …” IJHPCA 2011
Small world graphs
Large diameter graph
Large number of components
Timings against opposite choice, using 2K cores
[Figure: time (sec) per dataset for the Dynamic method vs. Static (Opp. Choice), annotated "Run BFS?" per dataset]
Graphs:  M1    M2    M3    G1    G2    G3    K1    K2
Ratio:   1.2x  0.9x  1.2x  4.1x  3.7x  4.7x  3.6x  4.0x
[Figure: proportion of time spent in prediction (using 2K cores)]
using 4096 cores (ideal: 16x)
integers achieves 8.06x speedup as well.
[Figure: time (sec) and speedup vs. number of cores (256, 512, 1024, 2048, 4096; log scale) for datasets G2, G3, K1, M1, M2]
[Figure: timings for the largest graph M4: time (sec) and speedup vs. number of cores (log scale)]
[Figure: time (sec) per dataset, our method vs. Multistep]
Graphs:                   M1    M2    M3    G1    G2    G3    K1    K2
Diameter:                 4K    4K    2K    25K   9     9     16    17
Our method vs. Multistep: 2.1x  1.1x  2.7x  24x   0.9x  1.1x  1.1x  1.9x
union-find)
algorithm based on Shiloach-Vishkin approach.
runtime.
graphs.
arxiv.org/abs/1607.06156
cjain@gatech.edu
Reproducibility Initiative Award
github.com/ParBLiSS/parconnect