Automating Topology Aware Mapping for Supercomputers
Abhinav Bhatele, Gagan Gupta, Laxmikant V. Kale
Application Topologies
[Figure labels: Patch, Compute, Proxy]
, Cray XT4/5
[Figure: time per step (s) vs. number of cores, 512 to 8192. Legend: Default, Topology]
Optimizations on 3D Mesh Interconnects. In Euro-Par, LNCS 5704, pages 1015–1028, 2009. Distinguished Paper Award.
Balancing Algorithms for Molecular Dynamics Applications, In 23rd ACM International Conference on Supercomputing (ICS), 2009.
[Figure labels: Inner Brick, Outer Brick, Patch 1, Patch 2]
[Figure: time per step (ms) vs. number of cores, 512 to 16384. Legend: Topology Oblivious, TopoAware Patches, TopoAware LDBs]
http://charm.cs.uiuc.edu/~bhatele/phd/TopoMgrAPI.tar.gz
Processors
31
Object Graph: 7 x 4 Processor Graph: 4 x 7
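For these two shapes, one simple embedding is a transposition: object (i, j) goes to processor (j, i). The sketch below is only an illustration of that idea and is not necessarily the heuristic shown on the slide; the function names `transpose_map` and `max_dilation` are hypothetical, and the processor graph is assumed to be a 2D mesh without wraparound links.

```python
from itertools import product


def transpose_map(rows, cols):
    """Map object (i, j) of a rows x cols grid to processor (j, i)
    of a cols x rows grid (hypothetical helper for illustration)."""
    return {(i, j): (j, i) for i, j in product(range(rows), range(cols))}


def max_dilation(mapping, rows, cols):
    """Largest mesh distance (in hops) between the processors that host
    any two objects adjacent in the object grid."""
    worst = 0
    for i, j in product(range(rows), range(cols)):
        # Check the neighbor below and the neighbor to the right.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < rows and nj < cols:
                (a, b), (c, d) = mapping[(i, j)], mapping[(ni, nj)]
                worst = max(worst, abs(a - c) + abs(b - d))
    return worst


mapping = transpose_map(7, 4)          # 7 x 4 objects onto 4 x 7 processors
print(max_dilation(mapping, 7, 4))     # 1: every neighbor pair stays one hop apart
```

Transposition only works when the two grids have the same dimensions in swapped order; shapes that differ more fundamentally need a real embedding heuristic.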
Aleliunas, R. and Rosenberg, A. L. On Embedding Rectangular Grids in Square Grids. IEEE Trans. Comput., 31(9):907–913, 1982.
d_i = distance, b_i = bytes, n = no. of messages
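These symbols are presumably the ingredients of the hop-bytes metric: the sum over all n messages of d_i times b_i. A minimal sketch, assuming a 2D mesh without wraparound links and Manhattan distance in hops; the names `hop_bytes`, `hops_per_byte`, and `placement` are illustrative and not part of any Charm++ or TopoMgr API.

```python
def manhattan(p, q):
    """Hop count between two mesh coordinates (no wraparound assumed)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def hop_bytes(messages, placement):
    """Sum of d_i * b_i over all messages.

    messages:  iterable of (source task, destination task, bytes)
    placement: dict mapping each task to its processor coordinate
    """
    return sum(manhattan(placement[src], placement[dst]) * nbytes
               for src, dst, nbytes in messages)


def hops_per_byte(messages, placement):
    """Hop-bytes divided by the total bytes sent (0.0 if nothing is sent)."""
    total = sum(nbytes for _, _, nbytes in messages)
    return hop_bytes(messages, placement) / total if total else 0.0


messages = [("a", "b", 100), ("a", "c", 50)]
placement = {"a": (0, 0), "b": (0, 1), "c": (2, 2)}
print(hop_bytes(messages, placement))      # 300 = 1*100 + 4*50
print(hops_per_byte(messages, placement))  # 2.0
```

A mapping that places heavily communicating tasks on nearby processors lowers both values.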
[Figure: hops per processor for different mapping configurations (14x6 to 7x12, 16x16 to 8x32, 27x35 to 45x21). Legend: MXOVLP, MXOV+AL, EXCO, COCE, AFFN, STEP, Lower Bound]
reduces by 45%
by 17%
[Figure: average hops per byte per core vs. number of nodes, 256 to 2048. Legend: Default, Topology, Lower Bound]
Processor Mesh: 10 x 9
configuration, try to guess the structure of the graph
mapped neighbors
mapped neighbors
map it on to the processor grid
processor
[Figure: hop-bytes on 90, 256, and 1024 nodes. Legend: Default, BFT, MHT, AFFN, COCE, Lower bound]