Multithreaded distributed BFS with regularized communication pattern on the cluster with Angara interconnect
T-PLATFORMS
www.t-platforms.com GraphHPC-2016
March 3, 2016
Artem Osipov, Alexander Daryin
BFS Revisited: Breadth-First Search
Figure: distributed CSR storage. The rowstarts array indexes into the columns array; each column entry encodes the target vertex's owner rank and local_id (derived from its global_id).
Data: CSR (rowstarts R, columns C), current queue Q
function ProcessQueue(R, C, Q)
    for v in Q do
        for e in {R[v]..R[v+1]} do
            send (v, local(C[e])) to owner(C[e])
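The baseline step can be written down concretely. The following C++ sketch is only an illustration (the Csr struct, the rank/local_id packing of a column entry, and the send callback are assumptions, not the authors' code):

#include <cstdint>
#include <functional>
#include <vector>

struct Csr {
    std::vector<uint64_t> rowstarts;     // R: |V_local| + 1 offsets
    std::vector<uint64_t> columns;       // C: one packed entry per edge
};

// Assumed packing of a column entry: high bits = owner rank, low bits = local id.
inline int owner(uint64_t c)         { return static_cast<int>(c >> 48); }
inline uint64_t local_id(uint64_t c) { return c & 0xFFFFFFFFFFFFull; }

// send(rank, parent, child_local) is supplied by the coalescing layer.
void process_queue(const Csr& g, const std::vector<uint64_t>& frontier,
                   const std::function<void(int, uint64_t, uint64_t)>& send) {
    for (uint64_t v : frontier)                                  // current queue Q
        for (uint64_t e = g.rowstarts[v]; e < g.rowstarts[v + 1]; ++e)
            send(owner(g.columns[e]), v, local_id(g.columns[e]));
}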
Message coalescing
Small message size → low bandwidth
Separate send buffer for each peer process → large memory overhead (and with multiple threads?)
Irregular communication pattern → connection overhead
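A minimal coalescing layer along these lines might look as follows. This is an illustrative sketch, not the authors' implementation; the buffer size and the blocking MPI_Send are simplifications:

#include <mpi.h>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Coalescer {
    static constexpr std::size_t kEntries = 4096;   // entries per buffer (assumed size)
    std::vector<std::vector<uint64_t>> buf;         // one send buffer per peer rank

    explicit Coalescer(int nprocs) : buf(nprocs) {}

    // Append one (parent, child_local) pair for rank dst; flush when the buffer fills.
    void push(int dst, uint64_t parent, uint64_t child_local) {
        buf[dst].push_back(parent);
        buf[dst].push_back(child_local);
        if (buf[dst].size() >= 2 * kEntries) flush(dst);
    }

    // Blocking send for simplicity; a real implementation would double-buffer
    // and use MPI_Isend so computation can overlap communication.
    void flush(int dst) {
        if (buf[dst].empty()) return;
        MPI_Send(buf[dst].data(), static_cast<int>(buf[dst].size()), MPI_UINT64_T,
                 dst, 0, MPI_COMM_WORLD);
        buf[dst].clear();
    }
};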
Figure: packed CSR layout. Edges are grouped by destination process (dst 0, dst 1, dst 2, dst 3); for each destination the structure stores the source vertex local_ids, packed rowstarts, and destination-local column ids.
Data: packed CSR (vertex indices V, row offsets D, columns C), current queue Q (bit mask), number of processes NP
function ProcessQueue(V, D, C, Q)
    for p in {0..NP-1} do
        for i in {0..|V[p]|-1} do
            if V[p][i] in Q then
                for e in {D[p][i]..D[p][i+1]} do
                    send (V[p][i], C[e]) to p
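An illustrative C++ rendering of this scan is shown below; the PackedCsr layout and the bit-mask helper are assumptions about how the packed arrays could be stored:

#include <cstddef>
#include <cstdint>
#include <vector>

struct PackedCsr {
    // For each destination process p: the local source vertices with edges to p,
    // packed row offsets into C[p], and destination-local column ids.
    std::vector<std::vector<uint32_t>> V;   // V[p][i]: local source vertex id
    std::vector<std::vector<uint64_t>> D;   // D[p]: |V[p]| + 1 offsets
    std::vector<std::vector<uint32_t>> C;   // C[p][e]: local id on process p
};

// The frontier Q is stored as a bit mask over local vertices.
inline bool test_bit(const std::vector<uint64_t>& mask, uint32_t v) {
    return (mask[v >> 6] >> (v & 63)) & 1u;
}

template <class Send>
void process_queue(const PackedCsr& g, const std::vector<uint64_t>& Q, Send send) {
    for (std::size_t p = 0; p < g.V.size(); ++p)            // one destination at a time,
        for (std::size_t i = 0; i < g.V[p].size(); ++i)     // giving a regular send pattern
            if (test_bit(Q, g.V[p][i]))
                for (uint64_t e = g.D[p][i]; e < g.D[p][i + 1]; ++e)
                    send(static_cast<int>(p), g.V[p][i], g.C[p][e]);
}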
Data: packed CSR (vertex indices V, row offsets D, columns C), bit mask of visited vertices M, number of processes NP
function ProcessUnvisited(V, D, C, M)
    // probe first edges
    for p in {0..NP-1} do
        for i in {0..|V[p]|-1} do
            if V[p][i] in M then
                send (V[p][i], C[D[p][i]]) to p
    flush send buffers
    wait for all acks
    // probe other edges
    for p in {0..NP-1} do
        for i in {0..|V[p]|-1} do
            if V[p][i] in M then
                for e in {D[p][i]+1..D[p][i+1]} do
                    send (V[p][i], C[e]) to p
    flush send buffers
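A sketch of the same two-pass probing in C++, reusing the PackedCsr layout assumed above; the ack protocol is reduced to a caller-supplied flush_and_wait_acks callback and is not the authors' exact scheme:

#include <cstddef>
#include <cstdint>
#include <vector>

// PackedCsr and test_bit() as in the previous sketch; M is the visited bit mask.
template <class Send, class Flush>
void process_unvisited(const PackedCsr& g, const std::vector<uint64_t>& M,
                       Send send, Flush flush_and_wait_acks) {
    // Pass 1: probe only the first edge of each candidate vertex.
    for (std::size_t p = 0; p < g.V.size(); ++p)
        for (std::size_t i = 0; i < g.V[p].size(); ++i)
            if (test_bit(M, g.V[p][i]))
                send(static_cast<int>(p), g.V[p][i], g.C[p][g.D[p][i]]);
    flush_and_wait_acks();
    // Pass 2: probe the remaining edges.
    for (std::size_t p = 0; p < g.V.size(); ++p)
        for (std::size_t i = 0; i < g.V[p].size(); ++i)
            if (test_bit(M, g.V[p][i]))
                for (uint64_t e = g.D[p][i] + 1; e < g.D[p][i + 1]; ++e)
                    send(static_cast<int>(p), g.V[p][i], g.C[p][e]);
}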
Thread organization (k threads per process):
thread 0: performs MPI communication and coordinates worker threads
thread 1: processes received messages
threads 2..k-1: each processes the packed CSR for one dst at a time
Shared data: send buffers, recv buffers, dst list
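A minimal sketch of this thread layout, assuming std::thread workers and an atomic cursor over the dst list; all function names are placeholders, not the authors' code:

#include <atomic>
#include <thread>
#include <vector>

// Placeholder bodies; the real work is described on the following slides.
void handle_received_messages() {}            // thread 1
void process_packed_csr_for(int /*dst*/) {}   // threads 2..k-1
void mpi_progress_and_coordinate() {}         // thread 0

void run_bfs_threads(int k, int nprocs) {
    std::atomic<int> next_dst{0};                       // shared "dst list" cursor
    std::vector<std::thread> workers;
    workers.emplace_back(handle_received_messages);     // thread 1
    for (int t = 2; t < k; ++t)                         // threads 2..k-1
        workers.emplace_back([&next_dst, nprocs] {
            for (int d = next_dst++; d < nprocs; d = next_dst++)
                process_packed_csr_for(d);              // one dst at a time
        });
    mpi_progress_and_coordinate();                      // thread 0 = calling thread
    for (auto& w : workers) w.join();
}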
Figure: thread interaction within a BFS round.
MainThread(0): InitNewRound, Send/Recv, Allgather updated count
MsgHandlerThread(1): WaitMainThread, ProcessLocal, ProcessRecv
QueueThread(2..k-1): WaitMainThread, ProcessGlobal
Shared queues: SendRequests (BuffersQueue, ActionsQueue), RecvRequests (BuffersQueue, ActionsQueue), RanksToProcess
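The round structure implied by the diagram, reduced to a sketch: each round runs the send/receive and processing work, then an MPI_Allgather of the per-process updated-vertex count decides termination. run_round is a placeholder for the whole step, not the authors' API:

#include <mpi.h>
#include <cstdint>
#include <functional>
#include <vector>

// run_round performs InitNewRound, Send/Recv, and the worker-thread processing,
// and returns the local count of newly visited vertices.
void bfs_rounds(MPI_Comm comm, int nprocs,
                const std::function<uint64_t()>& run_round) {
    std::vector<uint64_t> counts(nprocs);
    for (;;) {
        uint64_t updated = run_round();
        MPI_Allgather(&updated, 1, MPI_UINT64_T,        // "Allgather updated count"
                      counts.data(), 1, MPI_UINT64_T, comm);
        uint64_t total = 0;
        for (uint64_t c : counts) total += c;
        if (total == 0) break;                          // no process updated anything: done
    }
}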
Figure: thread interaction, message-handler path.
MainThread(0): InitNewRound, Send/Recv, Allgather updated count
MsgHandlerThread(1): WaitMainThread, ProcessLocal, ProcessBckRecv
Shared queue: FwdRecvRequests (BuffersQueue, ActionsQueue)
Figure: thread interaction, queue-thread path.
MainThread(0): InitNewRound, Send/Recv, Allgather updated count, ProcessFwdRecv
QueueThread(2..k-1): WaitMainThread, ProcessGlobal
Shared queues: BckSendRequests (BuffersQueue, ActionsQueue), FwdSendRequests (BuffersQueue, ActionsQueue), BckRecvRequests (BuffersQueue, ActionsQueue), RanksToProcess
Figure: performance results on 16 nodes; legend compares MPI x OpenMP configurations (16x2 with 4 OMP threads, 6 OMP, 10 OMP, and 16x8).
Figure: performance results on 32 nodes; legend compares MPI x OpenMP configurations (32x1 with 6 OMP threads, 32x4, and 32x8).