 
spcl.inf.ethz.ch @spcl_eth
Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided
Robert Gerstenberger, Maciej Besta, Torsten Hoefler
MPI-3.0 Remote Memory Access
• MPI-3.0 supports RMA ("MPI One Sided")
• Designed to react to hardware trends
• Majority of HPC networks support RDMA
• Communication is "one sided" (no involvement of destination)
• RMA decouples communication & synchronization
• Different from message passing
Figure: two sided vs. one sided, each between Proc A and Proc B. Two sided: send + recv carry both communication and synchronization. One sided: put is the communication, sync is a separate synchronization step.
[1] http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf
Presentation Overview
1. Overview of three MPI-3 RMA concepts
2. MPI window creation
3. Communication
4. Synchronization
5. Application evaluation
MPI-3 RMA Communication Overview
Figure: Process A (passive) and Process B (active), each with memory that exposes an MPI window; Process C (active) and Process D (active) also take part. Labeled operations: Put, Get, Atomic.
• Non-atomic communication calls (put, get)
• Atomic communication calls (Acc, Get & Acc, CAS, FAO)
MPI-3.0 RMA Synchronization Overview
• Active Target Mode: Fence, Post/Start/Complete/Wait
• Passive Target Mode: Lock, Lock All
Figure legend: active process, passive process, synchronization, communication.
Scalable Protocols & Reference Implementation
• Scalable & generic protocols
  • Can be used on any RDMA network (e.g., OFED/IB)
  • Window creation, communication and synchronization
• foMPI, a fully functional MPI-3 RMA implementation
  • DMAPP: lowest-level networking API for Cray Gemini/Aries systems
  • XPMEM: a portable Linux kernel module
http://spcl.inf.ethz.ch/Research/Parallel_Programming/foMPI
Part 1: Scalable Window Creation
Traditional windows (backwards compatible with MPI-2)
Figure: Processes A, B and C each expose memory at a different base address (0x123, 0x120, 0x111).
$p$ = total number of processes
Time bound: $\mathcal{O}(p)$
Memory bound: $\mathcal{O}(p)$
Part 1: Scalable Window Creation
Allocated windows (allows MPI to allocate memory)
Figure: Processes A, B and C all expose memory at the same base address (0x123, 0x123, 0x123).
$p$ = total number of processes
Time bound: $\mathcal{O}(\log p)$ (with high probability)
Memory bound: $\mathcal{O}(1)$
Part 1: Scalable Window Creation
Dynamic windows (local attach/detach, most flexible)
Figure: Processes A, B and C with attached memory regions at addresses 0x129, 0x129, 0x123, 0x120, 0x111.
$p$ = total number of processes
Time bound: $\mathcal{O}(p)$
Memory bound: $\mathcal{O}(p)$
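The memory bounds above can be illustrated with a small model of address translation. This is a sketch under stated assumptions, not foMPI code: the types `traditional_win` and `allocated_win` and all function names are hypothetical, and the copy loop stands in for whatever collective exchange a real implementation would use to gather the peers' base addresses. The point is only that a traditional window needs a table of p base addresses per process, while an allocated window with one symmetric base address needs a single entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model types; names are illustrative, not from foMPI. */

/* Traditional window: each process stores the base address of every peer. */
typedef struct {
    int nprocs;
    uintptr_t *bases;   /* nprocs entries: O(p) memory per process */
} traditional_win;

/* Allocated window: all processes use the same (symmetric) base address. */
typedef struct {
    uintptr_t base;     /* one entry: O(1) memory per process */
} allocated_win;

/* Builds the per-process table; the copy models an exchange of all bases. */
traditional_win *traditional_create(const uintptr_t *peer_bases, int nprocs) {
    assert(peer_bases != NULL && nprocs > 0);
    traditional_win *w = malloc(sizeof *w);
    if (w == NULL) return NULL;
    w->bases = malloc((size_t)nprocs * sizeof *w->bases);
    if (w->bases == NULL) { free(w); return NULL; }
    w->nprocs = nprocs;
    for (int i = 0; i < nprocs; i++) w->bases[i] = peer_bases[i];
    return w;
}

void traditional_free(traditional_win *w) {
    if (w != NULL) { free(w->bases); free(w); }
}

/* Remote address = table lookup for the target rank, plus displacement. */
uintptr_t traditional_remote_addr(const traditional_win *w, int target, size_t disp) {
    assert(target >= 0 && target < w->nprocs);
    return w->bases[target] + disp;
}

/* Remote address = shared base plus displacement; the rank is not needed. */
uintptr_t allocated_remote_addr(const allocated_win *w, int target, size_t disp) {
    (void)target;
    return w->base + disp;
}
```

With the addresses from the figures, a traditional window built from the table {0x123, 0x120, 0x111} resolves rank 2 at displacement 0 to 0x111, whereas an allocated window with base 0x123 resolves any rank at displacement 8 to 0x12b without a lookup.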
Part 2: Communication
• Put and Get:
  • Direct DMAPP put and get operations or local (blocking) memcpy (XPMEM)
• Accumulate:
  • DMAPP atomic operations for 64 bit types
  • ...or fall back to remote locking protocol
• MPI datatype handling with MPITypes library [1]
  • Fast path for contiguous data transfers of common intrinsic datatypes (e.g., MPI_DOUBLE)
Figure: MPI_Put maps to dmapp_put_nbi towards the remote process; MPI_Compare_and_swap maps to dmapp_acswap_qw_nbi on contiguous memory of the remote process.
[1] Ross, Latham, Gropp, Lusk, Thakur. Processing MPI datatypes outside MPI. EuroMPI/PVM'09
Performance Inter-Node: Latency
Plot panels: Put Inter-Node, Get Inter-Node. Annotations: 80% faster, 20% faster.
Benchmark diagram: half ping-pong between Proc 0 and Proc 1 (put, sync, memory).
Performance Intra-Node: Latency
Plot panel: Put/Get Intra-Node. Annotation: 3x faster.
Benchmark diagram: half ping-pong between Proc 0 and Proc 1 (put, sync, memory).
Performance: Overlap
Plot panel: Inter-Node, overlap in %.
Benchmark diagram: Proc 0 and Proc 1 (put, comp., sync, memory).
Useful for, e.g., scientific codes: AWM-Olsen seismic, 3D FFT, MILC
Performance: Message Rate
Plot panels: Intra-Node, Inter-Node.
Benchmark diagram: Proc 0 and Proc 1 (puts ..., sync, memory).
Performance: Atomics
Plot annotations:
• hardware-accelerated protocol: lower latency
• fall back protocol: higher bandwidth
• proprietary
• 64 bit integers
Part 3: Synchronization
• Active Target Mode: Fence, Post/Start/Complete/Wait
• Passive Target Mode: Lock, Lock All
Figure legend: active process, passive process, synchronization, communication.
Scalable Fence Implementation
• Collective call
• Completes all outstanding memory operations

    int MPI_Win_fence(…) {
        asm( mfence );          /* order local memory operations */
        dmapp_gsync_wait();     /* wait for outstanding DMAPP operations */
        MPI_Barrier(...);       /* synchronize all processes */
        return MPI_SUCCESS;
    }

Figure: Node 0 and Node 1 hosting Proc 0 through Proc 3, which exchange puts.