Outline Meshes
Laplacian BSP cost Diamonds 3D
Matrices
Matrix-vector Movies Hypergraphs SBD
Mesh-Matrix Conclusions
Partitioning for applications
Rob H. Bisseling, Albert-Jan Yzelman, Bas Fagginger Auer
Mathematical Institute, Utrecht University
Rob Bisseling: also joint Laboratory CERFACS/INRIA, Toulouse,
◮ Source: N. Gourdain et al. ‘High performance Parallel
◮ In many applications, a physical domain can be partitioned
◮ Communication is only needed for exchanging information
◮ Grid points interact only with a set of immediate
[Figure: 2D grid with points labelled by their coordinates (i, j), such as (0,0), (1,0), (2,0), (0,1), (0,2)]
◮ $x_{i+1,j} - x_{i,j}$ approximates the derivative of the temperature
◮ $(x_{i+1,j} - x_{i,j}) - (x_{i,j} - x_{i-1,j}) = x_{i-1,j} + x_{i+1,j} - 2x_{i,j}$
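Adding the same second difference in the j-direction gives the usual five-point Laplacian. The sketch below illustrates that stencil in Python; the function name `laplacian` and the choice to treat neighbours outside the grid as absent are assumptions made for this example, not something the slides specify.

```python
def laplacian(x):
    """Five-point Laplacian of a 2D grid stored as a list of rows.

    Assumption of this sketch: a neighbour that falls outside the grid
    is simply left out of the sum.
    """
    rows, cols = len(x), len(x[0])
    delta = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = -4.0 * x[i][j]
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    total += x[ni][nj]
            delta[i][j] = total
    return delta


# A single hot point in the middle of a cold 3 x 3 grid.
grid = [[0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0]]
result = laplacian(grid)
print(result[1][1], result[0][1], result[0][0])
```

The centre value is strongly negative (heat flows away from it), its four neighbours are positive, and the corners are unaffected.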
◮ We must assign each grid point to a processor.
◮ We assign the values $x_{i,j}$ and $\Delta_{i,j}$ to the owner of grid
◮ Each point of the grid has an amount of computation
◮ Here, an interior point has 5 flops; a border point 4 flops; a
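The per-point work can be tallied with a few lines of Python. This is a minimal sketch under two assumptions: the cost of a point is one flop plus one flop per neighbour that lies inside the grid (which reproduces the 5 and 4 flops quoted above), and the grid is 12 × 12, a size inferred from the total of 672 flops reported later in these slides. The names `flops_at` and `total_flops` are hypothetical.

```python
NEIGHBOUR_OFFSETS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def flops_at(i, j, k):
    """Flops for grid point (i, j) of a k x k grid (assumed cost rule)."""
    in_grid = sum(
        1
        for di, dj in NEIGHBOUR_OFFSETS
        if 0 <= i + di < k and 0 <= j + dj < k
    )
    return in_grid + 1


def total_flops(k):
    """Total work of one Laplacian evaluation on a k x k grid."""
    return sum(flops_at(i, j, k) for i in range(k) for j in range(k))


print(flops_at(5, 5, 12))   # interior point
print(flops_at(0, 5, 12))   # border point
print(total_flops(12))      # whole 12 x 12 grid
```

A balanced distribution over 8 processors would give each processor 672 / 8 = 84 flops.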
◮ Bulk synchronous parallel (BSP) model by Valiant (1990):
◮ An h-relation is a communication phase (superstep) in
◮ T(h) = hg + l, where g is the time per data word
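As an illustration of the cost formula, the sketch below computes h for a communication phase and the resulting time T(h) = hg + l. It uses the standard BSP convention that h is the largest number of data words any single processor sends or receives; the function names and the sample send/receive counts are assumptions made for this example.

```python
def h_relation(sent, received):
    """h for one communication phase.

    sent[s] and received[s] are the numbers of data words that
    processor s sends and receives (illustrative inputs).
    """
    return max(max(sent), max(received))


def communication_time(h, g, l):
    """BSP cost of an h-relation: T(h) = h*g + l."""
    return h * g + l


sent = [14, 12, 13]
received = [13, 14, 12]
h = h_relation(sent, received)
print(h, communication_time(h, g=10, l=0))
```

Only the busiest processor matters, so balancing communication is as important as reducing its total volume.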
[Figure: three partitionings of the grid, panels (a), (b), (c)]
◮ (a) Partition into strips: long Norwegian borders,
◮ (b) Boundary corrections improve load balance.
◮ (c) Partition into square blocks: shorter borders,
◮ The communication-to-computation ratio for square blocks
◮ This ratio is often called the surface-to-volume ratio,
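The difference between strips and square blocks can be checked by brute force. The sketch below counts, for every processor, the distinct grid points owned by others that neighbour its own points (the values it must receive). The grid size k = 12, p = 4 processors, and the helper name `ghost_cells` are illustrative assumptions.

```python
def ghost_cells(owner, k):
    """Map each processor to the number of foreign neighbour points it needs.

    owner(i, j) returns the processor that owns grid point (i, j).
    """
    needed = {}
    for i in range(k):
        for j in range(k):
            me = owner(i, j)
            needed.setdefault(me, set())
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < k and 0 <= nj < k and owner(ni, nj) != me:
                    needed[me].add((ni, nj))
    return {proc: len(points) for proc, points in needed.items()}


k, p = 12, 4
# Four horizontal strips of 3 rows each.
strips = ghost_cells(lambda i, j: i // (k // p), k)
# Four square blocks of 6 x 6 points each.
blocks = ghost_cells(lambda i, j: (i // 6, j // 6), k)

print(max(strips.values()), max(blocks.values()))
```

Each processor owns 36 points in both cases, so the computation is the same, but the busiest strip receives twice as many values as the busiest square block.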
◮ Digital diamond, or closed $\ell_1$-sphere, defined by
◮ $B_r(c)$ is the set of points with Manhattan distance
◮ The number of points of $B_r(c)$ is
◮ The number of neighbouring points is 4r + 4.
◮ This is also the number of ghost cells needed in a parallel
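These counts are easy to verify by enumeration. The sketch below assumes that the digital diamond consists of all grid points at Manhattan distance at most r from the centre c; the function names are hypothetical. For comparison it also evaluates the closed form 2r² + 2r + 1 for the number of points, which agrees with the 25 points quoted elsewhere in these slides for r = 3.

```python
def diamond(r, c=(0, 0)):
    """Grid points at Manhattan distance at most r from the centre c."""
    ci, cj = c
    return {
        (ci + di, cj + dj)
        for di in range(-r, r + 1)
        for dj in range(-r, r + 1)
        if abs(di) + abs(dj) <= r
    }


def neighbouring_points(points):
    """Points outside the set that are adjacent to a point inside it."""
    outside = set()
    for i, j in points:
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            candidate = (i + di, j + dj)
            if candidate not in points:
                outside.add(candidate)
    return outside


for r in range(5):
    inside = diamond(r)
    print(r, len(inside), 2 * r * r + 2 * r + 1,
          len(neighbouring_points(inside)), 4 * r + 4)
```

The number of points grows quadratically in r while the number of neighbouring points grows only linearly, which is what makes large diamonds attractive.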
◮ For a k × k grid and p processors, we have
◮ Just on the basis of 4r + 4 receives from neighbour points,
◮ Compare with value 4√p
◮ This gain was caused by reuse of data: the value at a grid
◮ Also
◮ Discard one layer of points from the north-eastern and
◮ For r = 3, the number of points decreases from 25 to 18.
◮ Total computation: 672 flops. Avg 84. Max 90.
◮ Communication: 104 values. Avg 13. Max 14.
◮ Total time: 90 + 14g = 90 + 14 · 10 = 230 (ignoring 2l).
◮ 8 rectangular blocks of size 6 × 3:
◮ Partitioning obtained by translating into a sparse matrix.
◮ Total computation: 672 flops. Avg 84. Max 91. (allowed
◮ Communication: 85 values. Avg 10.625. Max 16.
◮ Total time: 91 + 16g = 91 + 16 · 10 = 251.
◮ Find a better solution than can be obtained manually,
◮ If a processor has a cubic block of $N = k^3/p$ points,
◮ If a processor has a 10 × 10 × 10 block, 488 points are on
◮ Thus, communication is important in 3D.
◮ Based on the surface-to-volume ratio of a 3D digital
◮ The prime application of diamond-shaped distributions will
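The figure of 488 boundary points for a 10 × 10 × 10 block follows from subtracting the 8 × 8 × 8 interior from the full cube. The sketch below checks this both with the closed form and by enumeration; the function names are hypothetical.

```python
def boundary_points(n):
    """Points on the surface of an n x n x n block of grid points."""
    if n < 2:
        return n ** 3
    return n ** 3 - (n - 2) ** 3


def boundary_points_brute_force(n):
    """Same count, by testing every point of the block."""
    last = n - 1
    return sum(
        1
        for i in range(n)
        for j in range(n)
        for k in range(n)
        if 0 in (i, j, k) or last in (i, j, k)
    )


n = 10
print(boundary_points(n), boundary_points_brute_force(n))
print(boundary_points(n) / n ** 3)  # fraction of points on the boundary
```

Almost half of the points of such a block lie on its surface, far more than for a 2D block with a comparable number of points.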
◮ Basic cell: grid points in a truncated octahedron.
◮ For load balancing, take care with the boundaries.
◮ What You See, Is What You Get (WYSIWYG):
◮ Gain factor of 1.68 achieved for $p = 2q^3$.
[Figure: sparse matrix-vector multiplication example with small integer entries]
◮ Mondriaan block partitioning of 60 × 60 matrix prime60
◮ aij = 0 ⇐
[Figure: hypergraph with vertices and nets]
◮ Hypergraph H = (V, N) ⇒ exact communication volume
◮ Columns ≡ Vertices: 0, 1, 2, 3, 4, 5, 6.
◮ 138 × 138 symmetric matrix bcsstk22, nz = 696, p = 8
◮ Reordered to Bordered Block Diagonal (BBD) form
◮ Split of row i over $\lambda_i$ processors causes
◮ Row split has unit cost, irrespective of $\lambda_i$
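The two ways of charging for a split row can be compared on a small example. The sketch below assumes the model in which columns are vertices and rows are nets, and that each nonzero follows the owner of its column; under that assumption a row spread over λ_i processors contributes λ_i − 1 to the λ − 1 metric and 1 to the cut-net metric. The sparsity pattern, the partitioning of the 7 columns, and the function name are made up for illustration.

```python
def net_metrics(rows, col_part):
    """Return (lambda-minus-one cost, cut-net cost) of a column partitioning.

    rows[i] lists the column indices of the nonzeros in row i;
    col_part[j] is the processor that owns column j.
    """
    lambdas = [len({col_part[j] for j in cols}) for cols in rows if cols]
    lambda_minus_one = sum(lam - 1 for lam in lambdas)
    cut_net = sum(1 for lam in lambdas if lam > 1)
    return lambda_minus_one, cut_net


# Hypothetical 4 x 7 sparsity pattern with columns 0..6 as vertices.
rows = [
    [0, 1, 2],   # all columns on processor 0: not cut
    [2, 3],      # spread over processors 0 and 1
    [1, 4, 5],   # spread over processors 0, 1 and 2
    [5, 6],      # all columns on processor 2: not cut
]
col_part = [0, 0, 0, 1, 1, 2, 2]

print(net_metrics(rows, col_part))
```

The two metrics differ only for rows split over more than two processors, such as the third row here.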
◮ p = 4, ε = 0.2, global non-permuted view
◮ Each individual nonzero is a vertex in the hypergraph,
◮ New algorithms for vector partitioning.
◮ Much faster, by a factor of 10 compared to version 1.0.
◮ 10% better quality of the matrix partitioning.
◮ Inclusion of fine-grain partitioning method
◮ Inclusion of hybrid between original Mondriaan and
◮ Can also handle $p \neq 2^q$.
◮ Ordering of matrices to SBD and BBD structure: cut rows
◮ Visualisation through Matlab interface, MondriaanPlot,
◮ Library-callable, so you can link it to your own program
◮ Hypergraph metrics: λ − 1 for parallelism, and cut-net for
◮ Interface to PaToH hypergraph partitioner
◮ SBD structure is obtained by recursively partitioning the
◮ The cut rows are sparse and serve as a gentle cache
◮ Mondriaan is used in one-dimensional mode, splitting only
◮ The recursive, fractal-like nature makes the ordering
◮ The ordering is cache-oblivious.
◮ Experiments on 1 core of the dual-core 4.7 GHz Power6+
◮ 64 kB L1 cache, 4 MB L2, 32 MB L3.
◮ Test matrices: 1. stanford; 2. stanford berkeley;
◮ Matrix rhpentium, split over 30 processors
◮ Unstructured grid and its sparse matrix
◮ Source: N. Gourdain et al. ‘High performance Parallel
◮ Use Mondriaan in 1D mode, not in full 2D mode.
◮ Advantage: no need to change data structure, while still
◮ Advantage: hypergraph partitioning leads to less ghost
◮ Advantage: Mondriaan is open-source, can be changed by
◮ Disadvantage: hypergraph partitioner Mondriaan itself takes
◮ To achieve a good partitioning with a low
◮ In 2D, an even better method is to use digital diamonds.
◮ In 3D, the best method is to use truncated octahedra with
◮ For unstructured grids, the same gains can be obtained by
◮ Using graph partitioning and the edge-cut metric will lead
◮ Mondriaan 3.0, to be released soon, contains improved
◮ We are working on a converter for reading meshes directly,
◮ We hope to be able to build a Mondriaan hypergraph