Outline Matching
Introduction Greedy Parallelisable BSP algorithm GPU algorithm Results
Clustering
Introduction Sequential GPU algorithm Results
2D partitioning 2D matching Conclusion References
Big graphs for big data: parallel matching and clustering on billion-vertex graphs
Rob H. Bisseling
Mathematical Institute,
◮ Hodel: Well, somebody has to arrange the matches.
◮ Hodel: For Papa, make him a scholar.
◮ Chava: For Mama, make him rich as a king.
◮ Graph matching is a pairing of neighbouring vertices.
◮ It has applications in
◮ Optimal solution is possible in polynomial time.
◮ Time for weighted matching in graph G = (V, E) is
◮ The aim is a billion vertices, n = 10^9, with 100 edges per
◮ Thus, a time of O(10^20) = 100,000 Petaflop units is far
◮ We need linear-time greedy or approximation algorithms.
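The estimate above can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions: "100 edges per" is read as 100 edges per vertex, the running time is taken to be dominated by an m·n term (which is consistent with the quoted O(10^20)), and the machine is assumed to sustain 1 Petaflop/s. All variable names are illustrative.

```python
# Back-of-envelope cost of exact weighted matching at the target scale.
n = 10**9                 # vertices (from the text)
edges_per_vertex = 100    # assumption: "100 edges per" read as per vertex
m = n * edges_per_vertex  # 10**11 edges

operations = m * n        # assumption: cost dominated by an m*n term
petaflop = 10**15         # floating-point operations in one Petaflop unit

petaflop_units = operations // petaflop
seconds_at_1_pflops = operations / petaflop   # on an assumed 1 Petaflop/s machine
days = seconds_at_1_pflops / 86_400

print(f"operations     = 10^{len(str(operations)) - 1}")
print(f"Petaflop units = {petaflop_units:,}")
print(f"time at 1 PF/s = {days:.2f} days")
```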
◮ A graph is a pair G = (V, E) with vertices V and edges E.
◮ All edges e ∈ E are of the form e = (v, w) for vertices
◮ A matching is a collection M ⊆ E of disjoint edges.
◮ Here, the graph is undirected, so (v, w) = (w, v).
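The definition can be turned into a small validity check. The sketch below assumes a graph given as a plain list of vertex pairs; the function name `is_matching` and the example path graph are illustrative and not taken from the slides.

```python
def is_matching(edges, candidate):
    """Return True if `candidate` is a set of pairwise disjoint edges of the graph."""
    edge_set = {frozenset(e) for e in edges}   # undirected: (v, w) == (w, v)
    seen = set()
    for v, w in candidate:
        if frozenset((v, w)) not in edge_set:
            return False                        # not an edge of G
        if v in seen or w in seen:
            return False                        # shares a vertex with an earlier edge
        seen.update((v, w))
    return True

E = [(0, 1), (1, 2), (2, 3)]                    # path 0 - 1 - 2 - 3
print(is_matching(E, [(0, 1), (2, 3)]))         # disjoint edges
print(is_matching(E, [(0, 1), (1, 2)]))         # both edges use vertex 1
```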
◮ A matching is maximal if we cannot enlarge it further by adding another edge.
◮ A matching is maximum if it possesses the largest possible number of edges.
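The difference between a maximal and a maximum matching shows up already on a path with four vertices. The sketch below uses the standard definitions (maximal: no edge can be added; maximum: largest number of edges) and a brute-force search that is only meant for tiny examples; the helper names are illustrative.

```python
from itertools import combinations

def is_matching(candidate):
    used = [v for e in candidate for v in e]
    return len(used) == len(set(used))          # no vertex appears twice

def is_maximal(edges, matching):
    matched = {v for e in matching for v in e}
    # maximal: every edge of the graph touches an already matched vertex
    return all(v in matched or w in matched for v, w in edges)

def maximum_cardinality(edges):
    # brute force over edge subsets, largest first (exponential time)
    for k in range(len(edges), -1, -1):
        for subset in combinations(edges, k):
            if is_matching(subset):
                return k

E = [(0, 1), (1, 2), (2, 3)]                    # path 0 - 1 - 2 - 3
middle = [(1, 2)]
print(is_maximal(E, middle), len(middle), maximum_cardinality(E))
```

On this path the single middle edge cannot be extended, yet the two outer edges form a larger matching.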
◮ If the edges are provided with weights ω : E → R>0,
◮ Greedy matching provides us with maximal matchings,
◮ In random order, vertices v ∈ V select and match
◮ Here, we can pick
◮ Or: we sort all the edges by weight, and successively match
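Both greedy variants can be sketched in a few lines. The code below is an illustration under assumptions, not the presenter's implementation: the vertex-based variant picks either a random free neighbour or the heaviest free edge (the exact choices on the slide are truncated), and the data layouts (`adj`, `weight`, `weighted_edges`) are hypothetical.

```python
import random

def greedy_vertex_matching(adj, weight=None, seed=0):
    """adj: dict vertex -> list of neighbours.
    weight: optional dict frozenset({v, w}) -> edge weight."""
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)                          # visit the vertices in random order
    mate = {}
    for v in order:
        if v in mate:
            continue
        free = [w for w in adj[v] if w not in mate and w != v]
        if not free:
            continue
        if weight is None:
            w = rng.choice(free)                # assumption: any free neighbour
        else:
            w = max(free, key=lambda u: weight[frozenset((v, u))])
        mate[v], mate[w] = w, v
    return mate

def greedy_edge_matching(weighted_edges):
    """weighted_edges: list of (v, w, weight); edges are tried heaviest first."""
    mate = {}
    for v, w, _ in sorted(weighted_edges, key=lambda e: e[2], reverse=True):
        if v not in mate and w not in mate:
            mate[v], mate[w] = w, v
    return mate

path = [(0, 1, 2.0), (1, 2, 3.0), (2, 3, 2.0)]
print(greedy_edge_matching(path))
```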
◮ Weight ω(M) ≥ ω_optimal/2
◮ Cardinality |M| ≥ |M_card-max|/2, because M is maximal.
◮ Time complexity is O(m log m), because all edges must be sorted.
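The two bounds can be observed on a tiny example. The sketch below compares greedy matching by sorted edge weight with a brute-force optimum on a path with weights 2, 3, 2; the graph and the helper names are illustrative, and the exhaustive search is only feasible for very small graphs.

```python
from itertools import combinations

def greedy_by_weight(edges):
    matched, chosen = set(), []
    for v, w, wt in sorted(edges, key=lambda e: e[2], reverse=True):
        if v not in matched and w not in matched:
            chosen.append((v, w, wt))
            matched.update((v, w))
    return chosen

def all_matchings(edges):
    # enumerate every subset of edges whose endpoints are pairwise distinct
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            ends = [x for v, w, _ in subset for x in (v, w)]
            if len(ends) == len(set(ends)):
                yield subset

path = [(0, 1, 2.0), (1, 2, 3.0), (2, 3, 2.0)]
greedy = greedy_by_weight(path)
greedy_weight = sum(wt for _, _, wt in greedy)
best_weight = max(sum(wt for _, _, wt in m) for m in all_matchings(path))
best_size = max(len(m) for m in all_matchings(path))
print(greedy_weight, best_weight, len(greedy), best_size)
```

Greedy takes the middle edge and blocks both outer edges, so it ends below the optimum in weight and in size while staying within the factor 1/2.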
◮ An edge (v, w) ∈ E is dominant if
[Figure: example graph with edge weights around the vertices v and w]
◮ Dominant-edge algorithm is a 1/2-approximation:
◮ Dominant edge means mutual preference:
◮ Dominance is a local property: easy to parallelise.
◮ Algorithm keeps going until set of dominant vertices D is
◮ Assumption without loss of generality: weights are unique.
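The mutual-preference view suggests a simple round-based formulation. The sketch below is one possible reading, not the algorithm as given on the slides: in every round each free vertex prefers its heaviest edge to a free neighbour, and edges with mutual preference are matched. It assumes unique weights, integer vertex labels, and a symmetric nested-dictionary adjacency structure `adj_w`, all of which are illustrative choices.

```python
def dominant_edge_matching(adj_w):
    """adj_w: dict v -> dict w -> weight (symmetric, unique weights assumed)."""
    mate = {}
    while True:
        # each free vertex prefers its heaviest edge to a free neighbour
        pref = {}
        for v, nbrs in adj_w.items():
            if v in mate:
                continue
            free = {w: wt for w, wt in nbrs.items() if w not in mate}
            if free:
                pref[v] = max(free, key=free.get)
        # an edge is treated as dominant when the preference is mutual
        dominant = [(v, w) for v, w in pref.items()
                    if pref.get(w) == v and v < w]
        if not dominant:
            return mate                          # no dominant edges left
        for v, w in dominant:
            mate[v], mate[w] = w, v

cycle = {0: {1: 5, 3: 1}, 1: {0: 5, 2: 4}, 2: {1: 4, 3: 6}, 3: {2: 6, 0: 1}}
print(dominant_edge_matching(cycle))
```

With unique weights the heaviest remaining edge between free vertices is always a mutual preference, so every round with remaining free edges makes progress. All dominant edges of a round are independent of each other, which is what makes the rounds easy to parallelise.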
◮ Linear time complexity O(|E|) if edges of each vertex are
◮ Sorting costs are
◮ This algorithm is based on a dominant-edge algorithm by
◮ Processor P(s) has vertex set V_s, with
◮ This is a p-way partitioning of the vertex set.
◮ The adjacency set Adj_v of a vertex v may contain vertices
◮ We define the set of halo vertices
◮ The weights ω(v, w) are stored with the edges, for all
◮ E_s = {(v, w) ∈ E : v ∈ V_s}
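The local data of one processor can be sketched as follows. The halo definition on the slide is truncated, so the code assumes the usual meaning: neighbours of local vertices that are owned by another processor. The names `local_data`, `owner`, and `adj_w` are hypothetical.

```python
def local_data(adj_w, owner, s):
    """Data held by processor s: own vertices V_s, halo H_s, and local edges E_s."""
    V_s = {v for v in adj_w if owner[v] == s}
    # assumption: halo = neighbours of local vertices that live on another processor
    H_s = {w for v in V_s for w in adj_w[v] if owner[w] != s}
    # E_s = {(v, w) in E : v in V_s}, each edge stored together with its weight
    E_s = {(v, w): wt for v in V_s for w, wt in adj_w[v].items()}
    return V_s, H_s, E_s

# path 0 - 1 - 2 - 3, distributed in two blocks over p = 2 processors
adj_w = {0: {1: 2.0}, 1: {0: 2.0, 2: 3.0}, 2: {1: 3.0, 3: 2.5}, 3: {2: 2.5}}
owner = {0: 0, 1: 0, 2: 1, 3: 1}
print(local_data(adj_w, owner, 0))
```

An edge that crosses the partition, here (1, 2), appears in the edge sets of both processors, once from each endpoint.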
◮ The algorithm alternates supersteps of computation
◮ The whole algorithm terminates when no messages have
◮ This can be checked at every synchronisation point.
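The superstep structure and the termination test can be illustrated with a sequential simulation. This is a toy sketch, not BSPlib code: processors are simulated one after another, `compute` is a hypothetical user function returning (destination, message) pairs, and termination is assumed to mean that no processor sent a message in the superstep.

```python
def run_bsp(p, compute, initial_inboxes):
    """Simulate BSP supersteps until a superstep sends no messages."""
    inboxes = initial_inboxes
    supersteps = 0
    while True:
        outboxes = [[] for _ in range(p)]
        sent = 0
        for s in range(p):                       # computation phase (conceptually parallel)
            for dest, msg in compute(s, inboxes[s]):
                outboxes[dest].append(msg)       # delivered at the synchronisation
                sent += 1
        supersteps += 1
        if sent == 0:                            # global check at the synchronisation point
            return supersteps
        inboxes = outboxes                       # messages become readable next superstep

def pass_token(s, inbox, p=4):
    # toy computation: forward each positive counter to the next processor
    return [((s + 1) % p, k - 1) for k in inbox if k > 0]

print(run_bsp(4, pass_token, [[3], [], [], []]))
```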
◮ Processors can have different amounts of work, even if
◮ Use can be made of a global clock based on ticks, the unit
◮ Here, ‘handling’ could mean setting a new preference.
◮ After every k ticks, everybody synchronises.
◮ Guidance for the choice of k is provided by the BSP
◮ Choosing k ≥ l guarantees that at most 50% of the total
◮ Choosing k sufficiently small will cause all processors to be
◮ Good choice: k = 2l?
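The 50% figure follows from a simple cost model. The sketch below assumes that a superstep consists of k ticks of work followed by one synchronisation costing l, with both measured in the same time unit; this model is an assumption made for illustration.

```python
def sync_fraction(k, l):
    """Fraction of a superstep spent synchronising, for k ticks of work and sync cost l."""
    return l / (k + l)

l = 1000                                         # hypothetical synchronisation cost in ticks
for k in (l // 2, l, 2 * l, 10 * l):
    print(f"k = {k:6d}: {100 * sync_fraction(k, l):5.1f}% of the time in synchronisation")
```

In this model k = l gives exactly 50%, and k = 2l lowers the share to one third.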
◮ The BSP system takes care that messages are sent
◮ In the next superstep, all received messages are read (using
◮ Google’s Pregel system (Malewicz 2010) follows this BSP
◮ BSP program can remain the same, giving portability.
◮ To exploit the ease of reading data in shared memory, the
◮ This performs the communication immediately and blocks
◮ Possible use: replace the set M_s of matched edges by a
◮ This array can be read by all processors using
[Figure: example graph labelled with the numbers 1 to 9]
◮ A different approach, tightly coupled to the GPU
◮ To prevent matching conflicts, we create two groups of
◮ Proposals that were responded to, are matched.
◮ The graph (neighbour ranges, indices, and weights) is
◮ We create one thread for each vertex in V.
◮ Each vertex v ∈ V only updates
◮ π(v) = π(w) means (v, w) ∈ M.
◮ Both π and σ are stored in 1D arrays in global memory.
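The propose-and-respond scheme can be simulated sequentially to see how conflicts are avoided. The sketch below is a CPU illustration in Python, not the CUDA kernels: it assumes that the two groups are blue (proposing) and red (responding) vertices, that unmatched vertices are recoloured at random in every round, and that π stores a shared label for matched pairs while σ stores proposals and responses. These roles are inferred from the truncated slides and should be treated as assumptions.

```python
import random

def colour_propose_respond(adj, p_blue=0.5, rounds=10, seed=0):
    rng = random.Random(seed)
    pi = {v: None for v in adj}       # pi[v] == pi[w] (not None) marks (v, w) as matched
    next_label = 0
    for _ in range(rounds):
        free = [v for v in adj if pi[v] is None]
        colour = {v: ('blue' if rng.random() < p_blue else 'red') for v in free}
        sigma = {}
        # proposals: each blue vertex picks one free red neighbour
        for v in free:
            if colour[v] == 'blue':
                reds = [w for w in adj[v] if pi[w] is None and colour[w] == 'red']
                if reds:
                    sigma[v] = rng.choice(reds)
        # responses: each red vertex answers one of the proposals it received
        for v in free:
            if colour[v] == 'red':
                suitors = [u for u in adj[v] if sigma.get(u) == v]
                if suitors:
                    sigma[v] = rng.choice(suitors)
        # a proposal that was responded to becomes a matched edge
        for v in free:
            w = sigma.get(v)
            if colour[v] == 'blue' and w is not None and sigma.get(w) == v:
                pi[v] = pi[w] = next_label
                next_label += 1
    return pi

ring = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}   # cycle on 6 vertices
print(colour_propose_respond(ring, rounds=20))
```

Every vertex writes only its own entry of σ during the proposal and response steps, which mirrors the one-thread-per-vertex design described above.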
[Figure: Saturation of matching size. Matched vertices / total number of vertices (%) versus number of iterations, for ecology2 (1,997,996), ecology1 (1,998,000), G3_circuit (3,037,674), thermal2 (3,676,134), kkt_power (6,482,320), af_shell9 (8,542,010), ldoor (22,785,136), af_shell10 (25,582,130), audikw1 (38,354,076), nlpkkt120 (46,651,696), and cage15 (47,022,346).]
◮ At each iteration, we colour the vertices v ∈ V differently.
◮ For a fixed p ∈ [0, 1]
◮ How to choose p? Maximise the number of matched
◮ For large random graphs, the expected fraction of matched
p/(1 − p)
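A heuristic estimate of this fraction can be worked out under a simplifying model that is not taken from the slides: a fraction p of the vertices is blue, and every blue vertex proposes to a uniformly random red vertex, so a red vertex receives a roughly Poisson-distributed number of proposals with mean p/(1 − p). A red vertex is then matched with probability 1 − e^(−p/(1−p)), and each matched red vertex accounts for two matched vertices, giving f(p) = 2(1 − p)(1 − e^(−p/(1−p))). The sketch below evaluates this f, which may differ in detail from the formula the author uses, and scans for the maximising p.

```python
import math

def matched_fraction(p):
    """Heuristic expected fraction of matched vertices after one round (assumed model)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0                       # no blue or no red vertices: nothing matches
    mean_proposals = p / (1.0 - p)       # blue vertices per red vertex
    red_matched = 1.0 - math.exp(-mean_proposals)
    return 2.0 * (1.0 - p) * red_matched # each matched red vertex pairs with one blue

grid = [i / 1000 for i in range(1, 1000)]
best_p = max(grid, key=matched_fraction)
print(f"best p ~ {best_p:.3f}, matched fraction ~ {matched_fraction(best_p):.3f}")
```

In this model the optimum lies slightly above one half: a small surplus of proposers compensates for proposals that collide on the same red vertex.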
[Figure, two panels: Influence of relative blue/red group size. Panel 1: fraction of maximum value (%) versus fraction of vertices that are blue (%), for matching weight, matching size, and matching time. Panel 2: fraction of matched vertices (%) versus fraction of vertices that are blue (%), observed and according to Equation (2).]
◮ Implementation on the GPU using CUDA, on the CPU
◮ We consider both greedy random and greedy weighted
◮ Test set: 10th DIMACS challenge on graph partitioning
◮ Test hardware: dual quad-core Xeon E5620 and an
[Figure: Matching time scaling. Relative matching time (%) versus number of CPU threads (1, 2, 4, 8, 16), for ecology2 (1,997,996), ecology1 (1,998,000), G3_circuit (3,037,674), thermal2 (3,676,134), kkt_power (6,482,320), af_shell9 (8,542,010), ldoor (22,785,136), af_shell10 (25,582,130), audikw1 (38,354,076), nlpkkt120 (46,651,696), and cage15 (47,022,346), compared with ideal scaling.]
[Figure, two panels: Random parallel matching compared with Alg. 1, for the CUDA and TBB implementations, as a function of the number of graph edges (10^1 to 10^8). Panel 1: matching size relative to Alg. 1 (%). Panel 2: speedup relative to Alg. 1.]
[Figure, two panels: Weighted parallel matching compared with Alg. 1, for the CUDA and TBB implementations, as a function of the number of graph edges (10^1 to 10^8). Panel 1: matching weight relative to Alg. 1 (%). Panel 2: speedup relative to Alg. 1.]
[Figure, two panels: Weighted parallel matching compared with Alg. 2, for the CUDA and TBB implementations, as a function of the number of graph edges (10^1 to 10^8). Panel 1: matching weight relative to Alg. 2 (%). Panel 2: speedup relative to Alg. 2.]
◮ A clustering of an undirected graph G = (V, E) is a
◮ Elements C ∈ C are called clusters.
◮ The number of clusters is not fixed beforehand.
◮ Extreme cases: a single large cluster, |V| single-vertex
◮ The quality measure modularity was introduced by
◮ Let G = (V, E, ω) be a weighted undirected graph without
◮ Then, the modularity of a clustering C of G is defined by
◮ −1
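Since the formula itself did not survive on the slide, the sketch below uses the standard weighted modularity: for each cluster, the fraction of the total edge weight inside the cluster minus the squared fraction of the total weighted degree it holds. This may differ in notation from the slide's definition. The function name and the example graph (two triangles joined by one edge) are illustrative.

```python
def modularity(edges, cluster_of):
    """edges: list of (v, w, weight) without self-loops; cluster_of: dict vertex -> cluster id."""
    total = sum(wt for _, _, wt in edges)
    internal, degree = {}, {}
    for v, w, wt in edges:
        cv, cw = cluster_of[v], cluster_of[w]
        degree[cv] = degree.get(cv, 0.0) + wt    # weighted degree collected per cluster
        degree[cw] = degree.get(cw, 0.0) + wt
        if cv == cw:
            internal[cv] = internal.get(cv, 0.0) + wt
    return sum(internal.get(c, 0.0) / total - (degree[c] / (2 * total)) ** 2
               for c in degree)

triangles = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
             (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
             (2, 3, 1.0)]                        # bridge between the two triangles
two_clusters = {0: 'a', 1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'b'}
one_cluster = {v: 'all' for v in range(6)}
singletons = {v: v for v in range(6)}
for name, c in [('two', two_clusters), ('one', one_cluster), ('single', singletons)]:
    print(name, round(modularity(triangles, c), 4))
```

The natural split into the two triangles scores highest, a single large cluster scores zero, and single-vertex clusters score below zero.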
◮ The weight of a cluster is
◮ The set of all cut edges between clusters C and C ′ is
◮ If we merge clusters C and C ′ from C into one cluster
◮ Based on slight adaptations of functions from Thrust, an
◮ Also, for … parallel do constructs indicating a for-loop
[Figure: Clustering time. Clustering time (s) versus number of graph edges |E| (10^1 to 10^9), for the CUDA and TBB implementations, with reference line 3·10^−7 |E|.]
◮ DIMACS categories: clustering/, coauthor/,
◮ CUDA implementation with the Thrust template library
◮ Web link graph uk-2002 with 0.26 billion vertices
[Figure: Clustering time scaling. Relative clustering time (%) versus number of CPU threads (1, 2, 4, 8, 16), for graphs with 2^15 to 2^24 vertices, compared with linear scaling.]
◮ The clustering time as a function of the number of threads.
◮ Graphs from the category random/ with 2^15–2^24 vertices.
◮ Intel TBB implementation on 2 quad-core 2.4 GHz Intel
◮ A sparse matrix is the adjacency matrix of a sparse graph:
◮ Partitioning the nonzeros of a matrix is the same as
◮ 2D partitioning splits both rows and columns.
◮ Partitioning for parallel sparse matrix-vector multiplication
◮ Partitioning for SpMV also gives a good partitioning for
◮ We can use both dimensions of the matrix to reduce SpMV
◮ For a √p × √p block distribution, each matrix row or
◮ Relatively dense rows and columns can be split and do not
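The benefit of using both dimensions can be quantified with the usual communication-volume measure for SpMV: every row and every column contributes the number of distinct processors owning its nonzeros, minus one. The sketch below assumes this standard measure and compares a 2 × 2 block distribution with a 1D row distribution on a small dense matrix; the function name and data layout are hypothetical.

```python
def communication_volume(nonzeros, part):
    """nonzeros: list of (i, j); part: dict (i, j) -> processor id.
    Volume = sum over rows and columns of (number of distinct owners - 1)."""
    row_owners, col_owners = {}, {}
    for i, j in nonzeros:
        row_owners.setdefault(i, set()).add(part[(i, j)])
        col_owners.setdefault(j, set()).add(part[(i, j)])
    return (sum(len(s) - 1 for s in row_owners.values())
            + sum(len(s) - 1 for s in col_owners.values()))

n, p = 4, 4
dense = [(i, j) for i in range(n) for j in range(n)]
blocks_2d = {(i, j): (i // 2) * 2 + (j // 2) for i, j in dense}   # 2 x 2 blocks
rows_1d = {(i, j): i for i, j in dense}                            # one row per processor
print(communication_volume(dense, blocks_2d), communication_volume(dense, rows_1d))
```

In the block distribution every row and column is shared by only √p = 2 processors, whereas in the 1D row distribution every column is shared by all p = 4.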
◮ Existing 2D methods: coarse-grain, fine-grain, Mondriaan.
◮ New medium-grain method (Pelt & Bisseling 2014) based
◮ Then partition the (m + n) × (m + n) matrix B by a 1D
◮ 46 nodes, 132 edges
◮ Source: University of Florida Sparse Matrix Collection
◮ 47 × 47 matrix gd97_b with 264 nonzeros
◮ Partitioning for 2 processors
◮ Communication volume = 11, which is optimal
◮ Test set: 2264 Florida matrices, 500 ≤ nz ≤ 5,000,000
◮ LB = localbest (original Mondriaan) = best of 1D row and
◮ FG = fine-grain
◮ MG = medium-grain
◮ IR = iterative refinement, to improve the partitioning
◮ BSP is extremely suitable for parallel graph computations:
◮ Matching can be the basis for clustering, as demonstrated
◮ We clustered Europe’s road network with 51M vertices and
◮ Partitioning for sparse matrix-vector multiplication reduces
◮ Parallel graph algorithms will benefit from