Parallel Depth First on GPU
M. Naumov, A. Vrielink and M. Garland, GTC 2017
2
AGENDA
- Introduction
- Directed Trees
- Directed Acyclic Graphs (DAGs)
- Path- and SSSP-based variants
- Optimizations
- Performance Experiments
3
What is DFS?
[Figure: example graph on nodes a, b, c, d, e, f, g, i, j]
Node: a,b,c,d,e,f,g,i,j
The Parent, Discovery and Finish lists fill in as the traversal proceeds. Parent is listed in Node order, "/" marks the root and "-" marks a node that has not been reached yet.
- Parent: /,a  Discovery: a,b  Finish: (empty)
- Parent: /,a,-,-,b  Discovery: a,b,e  Finish: e
- Parent: /,a,-,-,b,b  Discovery: a,b,e,f  Finish: e
- Parent: /,a,-,-,b,b,-,f  Discovery: a,b,e,f,i  Finish: e,i
- Parent: /,a,-,-,b,b,-,f,f  Discovery: a,b,e,f,i,j  Finish: e,i,j
Final result:
Parent: /,a,a,a,b,b,d,f,f
Discovery: a,b,e,f,i,j,c,d,g
Finish: e,i,j,f,b,c,g,d,a
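As a reference point for the parallel variants, the sequential traversal can be sketched as follows. This is a minimal sketch, not the authors' code: the function name `dfs` and the adjacency dictionary `TREE` are assumptions, with the edges chosen to match the final Parent list above and children visited in alphabetical order.

```python
def dfs(children, root):
    """Iterative DFS; returns (parent map, discovery order, finish order)."""
    parent = {root: None}
    discovery, finish = [root], []
    # Each stack entry holds a node and an iterator over its remaining children.
    stack = [(root, iter(children.get(root, [])))]
    while stack:
        node, remaining = stack[-1]
        # Next child that has not been discovered yet, if any.
        nxt = next((c for c in remaining if c not in parent), None)
        if nxt is None:
            finish.append(node)  # all children done: the node finishes
            stack.pop()
        else:
            parent[nxt] = node
            discovery.append(nxt)
            stack.append((nxt, iter(children.get(nxt, []))))
    return parent, discovery, finish


# Assumed edge list, consistent with Parent: /,a,a,a,b,b,d,f,f
TREE = {"a": ["b", "c", "d"], "b": ["e", "f"], "d": ["g"], "f": ["i", "j"]}
parent, discovery, finish = dfs(TREE, "a")
print("Discovery:", ",".join(discovery))
print("Finish:", ",".join(finish))
```

The dependence of each step on the previous one is what makes this traversal hard to parallelize directly.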
10
Previous Work on DFS
- Planar Graphs: Time O(log^2 n), Processors O(n)
- Directed Acyclic Graphs (DAGs): Time O(log^2 n), Processors O(n^ω / log n)
- Directed Graphs with Cycles: Time O(√n log^11 n), Processors O(n^3)
where ω < 2.373 is the matrix multiplication exponent
- Lexicographic DFS
- topological sort, bi-connectivity and planarity testing
12
DIRECTED TREES
13
Directed Tree
[Figure: directed tree on nodes a, b, c, d, e, f, g, i, j; each node is annotated with an array]
Phase 2: Bottom-Up Traversal
- The five leaves start with [0]
- Next level, before the prefix sum: [0,1] and [0,1,1]
- After the prefix sum: [0,1] and [0,1,2]
- Next level, before the prefix sum: [0,1,3]
- After the prefix sum: [0,1,4]
- Root, before the prefix sum: [0,5,1,2]
- After the prefix sum: [0,5,6,8]
This phase is done, next phase is about to start …
21
Directed Tree
[Figure: the same tree with the arrays from Phase 2: [0] at each of the five leaves, [0,1], [0,1,2], [0,1,4] and [0,5,6,8]]
Phase 3: Top-down Traversal
- Offsets are propagated from the root downwards: offset 0, offset 1, offset 6
- discovery = offset + depth, for example discovery 0+2, discovery 1+3, discovery 6+1
- finish = offset + sub-tree size, for example finish 0+0, finish 1+0, finish 6+1
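The two tree phases can be sketched sequentially as below. This is a hedged illustration under stated assumptions: the function name `tree_dfs_times`, the dictionary `TREE` and its edges are not from the slides (the edges are chosen to reproduce the arrays shown), and "sub-tree size" in the finish formula is taken to be the last prefix-sum entry, that is the number of descendants, which is what the examples 0+0, 1+0 and 6+1 suggest. On a GPU each level would be processed in parallel; here the levels are simply looped over.

```python
from itertools import accumulate


def tree_dfs_times(children, root):
    """Discovery and finish times of a directed tree without a sequential DFS."""
    # Level order and depths (children always appear after their parent).
    depth, order = {root: 0}, [root]
    for v in order:
        for c in children.get(v, []):
            depth[c] = depth[v] + 1
            order.append(c)

    # Phase 2 (bottom-up): prefix sums over the node counts of the children.
    size, prefix = {}, {}
    for v in reversed(order):
        kids = children.get(v, [])
        prefix[v] = list(accumulate((size[c] for c in kids), initial=0))
        size[v] = prefix[v][-1] + 1  # descendants plus the node itself

    # Phase 3 (top-down): a child's offset is its parent's offset plus the
    # number of nodes in the sub-trees of its earlier siblings.
    offset = {root: 0}
    for v in order:
        for k, c in enumerate(children.get(v, [])):
            offset[c] = offset[v] + prefix[v][k]

    discovery = {v: offset[v] + depth[v] for v in order}
    finish = {v: offset[v] + prefix[v][-1] for v in order}
    return prefix, discovery, finish


# Assumed tree, consistent with the arrays [0,1], [0,1,2], [0,1,4], [0,5,6,8]
TREE = {"a": ["b", "c", "d"], "b": ["e", "f"], "d": ["g"], "f": ["i", "j"]}
prefix, discovery, finish = tree_dfs_times(TREE, "a")
print(prefix["a"], discovery["d"], finish["d"])
```

Sorting the nodes by the computed times gives the same discovery and finish orders as the sequential traversal.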
26
DIRECTED ACYCLIC GRAPHS
PATH-BASED VARIANT
27
Path-Based (for DAGs)
[Figure: DAG on nodes a, b, c, d, e, f, g, i, j; the left path [a,b,f] and the right path [a,d,f] collide at f]
Phase 1
Collision resolution:
- wait until all paths to a node are traversed
- align path sequences: left [a,b,f], right [a,d,f]
- compare left-to-right and choose the smallest (lexicographically smallest)
This phase is done
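The collision rule can be sketched as follows. This is a minimal sequential sketch under assumptions: the function name `lexicographic_paths` and the dictionary `DAG` are hypothetical, only the edge d -> f is implied by the slide (through the right path [a,d,f]), and the remaining edges are chosen to match the earlier example. Python compares lists left to right, which is the lexicographic comparison described above.

```python
def lexicographic_paths(children, root):
    """For every node, keep the lexicographically smallest path from the root."""
    # Number of incoming edges: a node waits until all of them are traversed.
    pending = {}
    for kids in children.values():
        for c in kids:
            pending[c] = pending.get(c, 0) + 1

    best = {root: [root]}
    parent = {root: None}
    ready = [root]
    while ready:
        v = ready.pop()
        for c in children.get(v, []):
            candidate = best[v] + [c]
            # Collision: compare the two paths left to right, keep the smaller.
            if c not in best or candidate < best[c]:
                best[c] = candidate
                parent[c] = v
            pending[c] -= 1
            if pending[c] == 0:  # all paths to c have arrived
                ready.append(c)
    return best, parent


# Assumed DAG: the earlier tree plus the edge d -> f
DAG = {"a": ["b", "c", "d"], "b": ["e", "f"], "d": ["f", "g"], "f": ["i", "j"]}
best, parent = lexicographic_paths(DAG, "a")
print(best["f"], parent["f"])
```

With these assumed edges the winning path to f is [a,b,f], so b becomes the DFS parent of f.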
30
OPTIMIZATIONS
31
Path Pruning
[Figure: graph on nodes a, b, c, d, e, f; the paths [a,b,e,f] and [a,c,d,f] meet at f]
- When two paths reach the same node, there exists a parent “a” where the paths split: [a,b,…] and [a,c,…]
- It is the comparison between “b” and “c” that allows us to distinguish between the paths
- A parent node with a single edge will never be a decision point
- No need to store nodes with such parents: the paths shrink to [a,b,f] and [a,c,f]
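A small sketch of the pruning rule, under assumptions: the function name `prune` and the `OUT_DEGREE` table are hypothetical, and keeping the endpoint of the path (so that [a,c,d,f] becomes [a,c,f] rather than [a,c]) is an assumption made to match the pruned paths shown on the slide.

```python
def prune(path, out_degree):
    """Drop nodes whose parent on the path has a single outgoing edge.

    The root and the endpoint are always kept (the latter is an assumption).
    """
    last = len(path) - 1
    kept = [path[0]]
    for k in range(1, len(path)):
        parent_is_decision_point = out_degree[path[k - 1]] > 1
        if parent_is_decision_point or k == last:
            kept.append(path[k])
    return kept


# Out-degrees for the assumed edges a->b, a->c, b->e, c->d, e->f, d->f
OUT_DEGREE = {"a": 2, "b": 1, "c": 1, "d": 1, "e": 1, "f": 0}
left = prune(["a", "b", "e", "f"], OUT_DEGREE)
right = prune(["a", "c", "d", "f"], OUT_DEGREE)
print(left, right)
```

The comparison between the pruned paths gives the same answer as the comparison between the full paths, because the first differing position is still the pair b versus c.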
36
Path Pruning
37
Phase Composition
38
SSSP-BASED VARIANT
39
SSSP-based (for DAGs)
[Figure: DAG on nodes a, b, c, d, e, f, g, i, j; each node is annotated with an array]
Phase 1: Bottom-Up Traversal
Run the algorithm for Directed Trees, but
- Propagate # of nodes to all the parents
- Start prefix sum with 1 (instead of 0)
Steps:
- The five leaves start with [1]
- Next level, before the prefix sum: [1,1] and [1,1,1]
- After the prefix sum: [1,2] and [1,2,3]
- Next level, before the prefix sum: [1,1,3]
- After the prefix sum: [1,2,5]
- Root, before the prefix sum: [1,5,1,2,1]
- After the prefix sum: [1,6,7,9,10]
This phase is done, next phase is about to start …
Assign # of nodes as the edge weight
[Figure: the DAG with edge weights 1, 2, 1, 6, 7, 9, 1, 1, 2]
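The bottom-up phase for DAGs can be sketched as below. This is a hedged sketch: the function name `edge_weights` and the dictionary `DAG` are assumptions. In particular the edge a -> j is inferred, not stated on the slides; it is the choice that reproduces the nine edge weights 1, 2, 1, 6, 7, 9, 1, 1, 2 and the root array [1,6,7,9,10]. Recursion is used only to keep the example short.

```python
from itertools import accumulate


def edge_weights(children, root):
    """Node counts propagated to all parents, and prefix-sum edge weights."""
    count = {}   # count[v] = 1 + sum of the counts of v's children
    weight = {}  # weight[(v, c)] = prefix-sum entry in front of child c

    def visit(v):
        if v in count:  # already computed through another parent
            return count[v]
        kids = children.get(v, [])
        # Prefix sum that starts with 1 instead of 0.
        prefix = list(accumulate((visit(c) for c in kids), initial=1))
        for k, c in enumerate(kids):
            weight[(v, c)] = prefix[k]
        count[v] = prefix[-1]
        return count[v]

    visit(root)
    return count, weight


# Assumed DAG: the earlier tree plus the inferred edge a -> j
DAG = {"a": ["b", "c", "d", "j"], "b": ["e", "f"], "d": ["g"], "f": ["i", "j"]}
count, weight = edge_weights(DAG, "a")
print(sorted(weight.values()))
```

Because j is counted under both f and a, the counts are upper bounds on sub-graph sizes rather than exact sizes, which is enough to keep the weights of later siblings larger than any path through earlier ones.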
47
SSSP-based (for DAGs)
[Figure: the DAG with edge weights 1, 2, 1, 6, 7, 9, 1, 1, 2]
Phase 2: Top-down traversal
- Shortest Path is the DFS path
- Example: 1+2+2=5 < 9
Phase 2: This phase is done
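The top-down phase amounts to a single-source shortest path computation on the weighted DAG. The sketch below is illustrative only: the function name `dag_shortest_paths` is hypothetical, and the assignment of the slide's nine weights to specific edges in `WEIGHTS` (including the edge a -> j with weight 9) is an assumption chosen so that the example 1+2+2=5 < 9 works out.

```python
def dag_shortest_paths(weights, root):
    """Shortest paths in a DAG; weights maps (u, v) to the edge weight."""
    children, pending = {}, {}
    for (u, v) in weights:
        children.setdefault(u, []).append(v)
        pending[v] = pending.get(v, 0) + 1

    dist, parent = {root: 0}, {root: None}
    ready = [root]
    while ready:
        u = ready.pop()
        for v in children.get(u, []):
            d = dist[u] + weights[(u, v)]
            if v not in dist or d < dist[v]:
                dist[v], parent[v] = d, u  # shorter path found
            pending[v] -= 1
            if pending[v] == 0:  # every incoming edge has been relaxed
                ready.append(v)
    return dist, parent


# Assumed assignment of the slide's weights to edges
WEIGHTS = {
    ("a", "b"): 1, ("a", "c"): 6, ("a", "d"): 7, ("a", "j"): 9,
    ("b", "e"): 1, ("b", "f"): 2, ("d", "g"): 1,
    ("f", "i"): 1, ("f", "j"): 2,
}
dist, parent = dag_shortest_paths(WEIGHTS, "a")
print(dist["j"], parent["j"])
```

Under these assumptions node j is reached through a, b, f with length 5 instead of the direct edge of weight 9, so its DFS parent is f.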
50
OPTIMIZATIONS
51
Discovery time
[Figure: the DAG with the shortest-path lengths 8, 1, 6, 7, 2, 4, 5, 3 written at the nodes]
Phase 3a: Sorting
- The length of the shortest path defines an ordering of nodes
- We can sort them to obtain discovery time
Discovery: a,b,e,f,i,j,c,d,g
Phase 3a: This phase is done (Phase 3b will find the finish time)
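The sorting step is short enough to show directly. In this sketch the mapping of the eight path lengths to node names in `DIST`, and the length 0 for the root, are assumptions chosen to be consistent with the discovery order listed above; the function name is hypothetical.

```python
def discovery_from_distances(dist):
    """Rank nodes by shortest-path length to obtain discovery times."""
    order = sorted(dist, key=dist.get)
    return order, {node: time for time, node in enumerate(order)}


# Assumed mapping of the slide's path lengths to nodes (root a has length 0)
DIST = {"a": 0, "b": 1, "c": 6, "d": 7, "e": 2, "f": 3, "g": 8, "i": 4, "j": 5}
order, discovery_time = discovery_from_distances(DIST)
print(",".join(order))
```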
54
Phase composition
55
EXPERIMENTS
56
Data
 #  Graph              n (nodes)  m (edges)  Application
 1  coPapersDBLP          540487   15251812  Citations
 2  auto                  448696    3350678  Numeric Sim.
 3  hugebubbles-000…    18318144   30144175  Numeric Sim.
 4  delaunay_n24        16777217   52556391  Random Tri.
 5  il2010                451555    1166978  Census Data
 6  fl2010                484482    1270757  Census Data
 7  ca2010                710146    1880571  Census Data
 8  tx2010                914232    2403504  Census Data
 9  great-britain_osm    7733823    8523976  Road Network
10  germany_osm         11548846   12793527  Road Network
11  road_central        14081817   21414269  Road Network
12  road_usa            23947348   35246600  Road Network
When necessary, DAGs are created from general graphs by dropping back edges
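One way to drop back edges is a DFS that skips every edge pointing to a node still on the traversal stack. The slides do not say how the authors did this, so the following is only a sketch under that assumption; the function name `drop_back_edges` and the small test graph are hypothetical.

```python
def drop_back_edges(adjacency, roots):
    """Return the edges of a directed graph that remain after removing back edges."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the stack, finished
    color = {}
    kept = []
    for root in roots:
        if color.get(root, WHITE) != WHITE:
            continue
        color[root] = GRAY
        stack = [(root, iter(adjacency.get(root, [])))]
        while stack:
            u, remaining = stack[-1]
            descended = False
            for v in remaining:
                state = color.get(v, WHITE)
                if state == GRAY:
                    continue  # back edge: v is an ancestor of u, drop it
                kept.append((u, v))
                if state == WHITE:
                    color[v] = GRAY
                    stack.append((v, iter(adjacency.get(v, []))))
                    descended = True
                    break
            if not descended:
                color[u] = BLACK
                stack.pop()
    return kept


# Hypothetical graph with the cycle a -> b -> c -> a
GRAPH = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
dag_edges = drop_back_edges(GRAPH, ["a"])
print(dag_edges)
```

The remaining edges are tree, forward and cross edges, none of which can close a cycle, so the result is acyclic.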
57
Performance
Results obtained with Nvidia Pascal TitanX GPU, Intel Core i7-3930K @3.2GHz CPU, Ubuntu 14.04 LTS OS, CUDA Toolkit 8.0
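[Figure: bar chart of Speedup (Parallel vs. Seq. DFS) per graph, vertical axis from 0.5 to 4; series: Path-based, SSSP-based, 3 BFS; two bars exceed the axis range and are labeled 6x and 5x]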
58
CONCLUSIONS
59
Conclusions
- Parallel DFS for DAGs
  - Work-efficient O(m+n)
  - The algorithm takes O(z log n) steps, where z is the maximum depth of a node
- Performance
  - Depends highly on the connectivity/sparsity pattern
  - Can achieve up to 6x speedup (but slowdown possible)
- Details
  - M. Naumov, A. Vrielink and M. Garland, “Parallel Depth-First Search for Directed Acyclic Graphs”, Technical Report, NVR-2017-001, 2017
    https://research.nvidia.com/publication/parallel-depth-first-search-directed-acyclic-graphs
60
Thank you
https://research.nvidia.com/publication/parallel-depth-first-search-directed-acyclic-graphs