QMPI: A Library for Multithreaded MPI Applications
Alex Brooks Hoang-Vu Dang Marc Snir
Outline: Motivation, Communication Model, Qthreads, QMPI, Summary
MOTIVATION
Issue: Large numbers of threads performing
[Figure: Bandwidth (MB/s) versus message size (bytes, 1 to 4096K); panels: MPICH and MVAPICH]
[Diagram: Communication Engine; Worker Thread, Worker Thread, Worker Thread, …]
: data is written
[Diagram: Node with Workers and Shepherds, FEB Queue, FEB Thread, Comm Queue, Comm Thread, Synch Container; Network]
[Figure: Execution time (usec) versus grid size (1 side: 120, 1200, 12000), MPI+Pthread compared with QMPI; panels: Send Phase, Receive Phase, Calculation Phase]