Notified Access: Extending Remote Memory Access Programming Models for Producer-Consumer Synchronization

ROBERTO BELLI, TORSTEN HOEFLER

spcl.inf.ethz.ch @spcl_eth


SLIDE 1

Notified Access: Extending Remote Memory Access Programming Models for Producer-Consumer Synchronization
Roberto Belli, Torsten Hoefler

SLIDE 2

COMMUNICATION IN TODAY’S HPC SYSTEMS

§ The de-facto programming model: MPI-1
§ Using send/recv messages and collectives
§ The de-facto network standard: RDMA
§ Zero-copy, user-level, OS-bypass, fuzz-bang

SLIDE 3

PRODUCER-CONSUMER RELATIONS

§ The most important communication idiom
§ Some examples:
§ Perfectly supported by MPI-1 message passing
§ But how does this actually work over RDMA?
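To make the idiom concrete, here is a minimal thread-based sketch of producer/consumer with message passing, the pattern that MPI-1 send/recv expresses across processes. The queue stands in for the network; all names are illustrative, not an MPI API.

```python
import queue
import threading

channel = queue.Queue()  # stands in for the network between two ranks

def producer(n_items):
    for i in range(n_items):
        channel.put(("data", i))   # plays the role of MPI_Send
    channel.put(("done", None))    # termination marker

def consumer(results):
    while True:
        tag, payload = channel.get()  # plays the role of MPI_Recv
        if tag == "done":
            break
        results.append(payload * 2)   # "consume" the item

results = []
t = threading.Thread(target=producer, args=(4,))
t.start()
consumer(results)
t.join()
```

The point of the sketch: the receive both delivers the data and synchronizes the consumer, which is exactly what RDMA by itself does not give us.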

SLIDE 4–7

MPI-1 MESSAGE PASSING – SIMPLE EAGER

Critical path: 1 latency + 1 copy

[1]: T. S. Woodall, G. M. Shipman, G. Bosilca, R. L. Graham, and A. B. Maccabe, “High performance RDMA protocols in HPC”, EuroMPI’06
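A toy cost model of the eager protocol above: the sender pushes header plus payload in one network traversal into a preallocated receive ("bounce") buffer, and the receiver later copies the payload into the user buffer on matching. The structure and counters are illustrative only, not an MPI implementation.

```python
def eager_transfer(payload, bounce, user_buffer):
    cost = {"latencies": 0, "copies": 0}
    # one network traversal carries the whole message into the bounce buffer
    bounce.append(payload)
    cost["latencies"] += 1
    # receiver copies out of the bounce buffer when the receive matches
    user_buffer.append(bounce.pop())
    cost["copies"] += 1
    return cost

bounce, user = [], []
cost = eager_transfer(b"hello", bounce, user)
```

This matches the slide's critical path: 1 latency + 1 copy, paid on the receiver side.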

SLIDE 8–13

MPI-1 MESSAGE PASSING – SIMPLE RENDEZVOUS

Critical path: 3 latencies

[1]: T. S. Woodall, G. M. Shipman, G. Bosilca, R. L. Graham, and A. B. Maccabe, “High performance RDMA protocols in HPC”, EuroMPI’06
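The rendezvous counterpart of the eager cost model: request-to-send (RTS), clear-to-send (CTS), then a zero-copy RDMA transfer of the payload, i.e. three network traversals on the critical path and no intermediate copy. Again a sketch, not an implementation.

```python
def rendezvous_transfer(payload, user_buffer):
    cost = {"latencies": 0, "copies": 0}
    cost["latencies"] += 1          # sender -> receiver: RTS (envelope only)
    cost["latencies"] += 1          # receiver -> sender: CTS (target address)
    user_buffer.append(payload)     # sender -> receiver: RDMA write, no extra copy
    cost["latencies"] += 1
    return cost

user = []
cost = rendezvous_transfer(b"bigdata", user)
```

The trade-off between the two protocols is visible in the counters: eager saves two latencies but pays a copy, so it is used for small messages; rendezvous is zero-copy but needs the handshake.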

SLIDE 14

COMMUNICATION IN TODAY’S HPC SYSTEMS

§ The de-facto programming model: MPI-1
§ Using send/recv messages and collectives
§ The de-facto hardware standard: RDMA
§ Zero-copy, user-level, OS-bypass, fuzz-bang

http://www.hpcwire.com/2006/08/18/a_critique_of_rdma-1/

SLIDE 15

REMOTE MEMORY ACCESS PROGRAMMING

§ Why not use these RDMA features more directly?
§ A global address space may simplify programming
§ … and accelerate communication
§ … and there could be a widely accepted standard
§ MPI-3 RMA (“MPI One Sided”) was born
§ Just one among many others (UPC, CAF, …)
§ Designed to react to hardware trends, learn from others
§ Direct (hardware-supported) remote access
§ A new way of thinking for programmers

[1] http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf

SLIDE 16

MPI-3 RMA SUMMARY

§ MPI-3 updates RMA (“MPI One Sided”)
§ Significant change from MPI-2
§ Communication is “one sided” (no involvement of the destination)
§ Utilizes direct memory access
§ RMA decouples communication & synchronization
§ Fundamentally different from message passing

[Diagram: two sided – Proc A sends, Proc B receives (communication + synchronization in one step); one sided – Proc A puts into Proc B’s memory (communication), with synchronization as a separate step]

[1] http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf
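The decoupling the slide describes can be sketched with shared memory and an explicit sync step: the put deposits data directly into the target's memory without involving it, and the target only learns of completion through a separate synchronization. The bytearray and event stand in for the MPI window and the sync call; this is not the MPI API itself.

```python
import threading

window = bytearray(8)          # stands in for an MPI window at the target
synced = threading.Event()     # stands in for the separate synchronization step

def origin():
    window[0:5] = b"hello"     # "put": communication only, target not involved
    synced.set()               # explicit synchronization afterwards

t = threading.Thread(target=origin)
t.start()
synced.wait()                  # target observes completion only via the sync
t.join()
```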

SLIDE 17–21

MPI-3 RMA COMMUNICATION OVERVIEW

[Diagram: Process A (passive) exposes memory through an MPI window; Processes B, C, D (active) access it with non-atomic communication calls (Put, Get) and atomic communication calls (Acc, Get & Acc, CAS, FAO)]

slide-21
SLIDE 21

spcl.inf.ethz.ch @spcl_eth

21

MPI-3 RMA COMMUNICATION OVERVIEW

Process A (passive) Memory

MPI window

Process B (active) Process C (active)

Put Get Atomic Non-atomic communication calls (put, get) Atomic communication calls (Acc, Get & Acc, CAS, FAO)

Memory

MPI window …

Process D (active)

slide-22
SLIDE 22

spcl.inf.ethz.ch @spcl_eth

22

MPI-3 RMA SYNCHRONIZATION OVERVIEW

Active process Passive process Synchroni- zation

Passive Target Mode

Lock Lock All

Active Target Mode

Fence Post/Start/ Complete/Wait Communi- cation

slide-23
SLIDE 23

spcl.inf.ethz.ch @spcl_eth

23

MPI-3 RMA SYNCHRONIZATION OVERVIEW

Active process Passive process Synchroni- zation

Passive Target Mode

Lock Lock All

Active Target Mode

Fence Post/Start/ Complete/Wait Communi- cation

slide-24
SLIDE 24

spcl.inf.ethz.ch @spcl_eth

24

MPI-3 RMA SYNCHRONIZATION OVERVIEW

Active process Passive process Synchroni- zation

Passive Target Mode

Lock Lock All

Active Target Mode

Fence Post/Start/ Complete/Wait Communi- cation

slide-25
SLIDE 25

spcl.inf.ethz.ch @spcl_eth

25

MPI-3 RMA SYNCHRONIZATION OVERVIEW

Active process Passive process Synchroni- zation

Passive Target Mode

Lock Lock All

Active Target Mode

Fence Post/Start/ Complete/Wait Communi- cation

slide-26
SLIDE 26

spcl.inf.ethz.ch @spcl_eth

26

MPI-3 RMA SYNCHRONIZATION OVERVIEW

Active process Passive process Synchroni- zation

Passive Target Mode

Lock Lock All

Active Target Mode

Fence Post/Start/ Complete/Wait Communi- cation

slide-27
SLIDE 27

spcl.inf.ethz.ch @spcl_eth

27

MPI-3 RMA SYNCHRONIZATION OVERVIEW

Active process Passive process Synchroni- zation

Passive Target Mode

Lock Lock All

Active Target Mode

Fence Post/Start/ Complete/Wait Communi- cation

IN CASE YOU WANT TO LEARN MORE

How to implement producer/consumer in passive mode?

SLIDE 28–31

ONE SIDED – PUT + SYNCHRONIZATION

Critical path: 3 latencies
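A toy critical-path count for producer/consumer over plain MPI-3 RMA: the producer puts the data, ensures remote completion (flush), and then puts a separate flag that the consumer polls. Charging one network traversal per step gives the three latencies on the slide; the step names and counters are illustrative, not MPI calls.

```python
def put_with_flag_sync(payload, window, flag):
    latencies = 0
    window.append(payload)   # put of the data
    latencies += 1
    latencies += 1           # flush: wait for remote completion
    flag.append(1)           # put of the notification flag
    latencies += 1
    return latencies

window, flag = [], []
lat = put_with_flag_sync(b"data", window, flag)
```

Compare with message passing above: one-sided loses its latency advantage as soon as the consumer must be told that the data has arrived.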

SLIDE 32

COMPARING APPROACHES

Message Passing: 1 latency + copy / 3 latencies
One Sided: 3 latencies

SLIDE 33

IDEA: RMA NOTIFICATIONS

§ First seen in Split-C (1992)
§ Combine communication and synchronization using RDMA
§ RDMA networks can provide various notifications:
§ Flags
§ Counters
§ Event queues

SLIDE 34–35

COMPARING APPROACHES

Message Passing: 1 latency + copy / 3 latencies
One Sided: 3 latencies
Notified Access: 1 latency

But how to notify?

SLIDE 36

PREVIOUS WORK: OVERWRITING INTERFACE

§ Flags (polling at the remote side)
§ Used in GASPI, DMAPP, NEON
§ Disadvantages:
§ Location of the flag is chosen at the sender side
§ Consumer needs at least one flag for every process
§ Polling a high number of flags is inefficient
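A thread-based sketch of the overwriting interface and its scalability problem: each producer overwrites its own flag at the consumer, so the consumer must scan one flag per potential producer. This is a stand-in for the semantics, not GASPI/DMAPP code.

```python
import threading

N = 8
flags = [0] * N                # one flag per potential producer
data = [None] * N

def producer(rank, payload):
    data[rank] = payload       # RDMA-style write of the data
    flags[rank] = 1            # overwrite "my" flag at the consumer

threads = [threading.Thread(target=producer, args=(r, r * 10)) for r in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# consumer: must scan every flag to discover completed accesses
arrived = [r for r in range(N) if flags[r] == 1]
```

The linear scan is the point: with many producers, polling cost and flag memory both grow with the number of processes.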

SLIDE 37

PREVIOUS WORK: COUNTING INTERFACE

§ Atomic counters (accumulate notifications → scalable)
§ Used in Split-C, LAPI, SHMEM counting puts, …
§ Disadvantages:
§ Dataflow applications may require many counters
§ High polling overhead to identify accesses
§ Does not preserve order (may not be linearizable)
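The counting interface in the same thread-based sketch: producers atomically increment a single counter at the consumer, which only needs to poll that one location. Scalable, but, as the slide notes, the counter alone cannot tell *which* producer wrote what, nor in which order. The lock stands in for a hardware atomic such as a counting put.

```python
import threading

N = 8
counter = 0
lock = threading.Lock()        # stands in for a hardware atomic
data = [None] * N

def producer(rank):
    global counter
    data[rank] = rank * 10     # RDMA-style write of the data
    with lock:
        counter += 1           # fetch-and-add style notification

threads = [threading.Thread(target=producer, args=(r,)) for r in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the counter reaches N the consumer knows that N accesses completed, but identifying them still requires inspecting the data, which is the polling overhead the slide criticizes.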

SLIDE 38

WHAT IS A GOOD NOTIFICATION INTERFACE?

§ Scalable to yotta-scale
§ Does memory or polling overhead grow with the number of processes?
§ Computation/communication overlap
§ Do we support maximum asynchrony? (better than MPI-1)
§ Complex data-flow graphs
§ Can we distinguish between different accesses locally?
§ Can we avoid starvation?
§ What about load balancing?
§ Ease of use
§ Does it use standard mechanisms?

SLIDE 39

OUR APPROACH: NOTIFIED ACCESS

§ Notifications with MPI-1-style (queue-based) matching
§ Retains the benefits of previous notification schemes
§ Poll only the head of the queue
§ Provides linearizable semantics

SLIDE 40–41

NOTIFIED ACCESS – AN MPI INTERFACE

§ Minor interface evolution
§ Leverages MPI two-sided <source, tag> matching
§ Wildcard matching with FIFO semantics

Example Communication Primitives / Example Synchronization Primitives
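Queue-based matching as Notified Access proposes can be sketched as follows: each completed access enqueues a (source, tag) notification, and the consumer waits on the queue with MPI-style <source, tag> matching, including wildcards, in FIFO order. The class and names below are illustrative, not the proposed MPI binding.

```python
from collections import deque

ANY = object()   # stands in for MPI_ANY_SOURCE / MPI_ANY_TAG

class NotificationQueue:
    def __init__(self):
        self.q = deque()

    def notify(self, source, tag):
        # posted when a notified put/get completes at the target
        self.q.append((source, tag))

    def wait(self, source=ANY, tag=ANY):
        # match oldest-first, honoring wildcards (FIFO semantics)
        for i, (s, t) in enumerate(self.q):
            if (source is ANY or s == source) and (tag is ANY or t == tag):
                del self.q[i]
                return (s, t)
        raise LookupError("no matching notification")

nq = NotificationQueue()
nq.notify(source=3, tag=7)
nq.notify(source=5, tag=7)
first = nq.wait(tag=7)        # wildcard source: FIFO yields the oldest match
specific = nq.wait(source=5)  # match on source, any tag
```

Because the consumer polls only the head of one queue and each notification identifies its access, this combines the scalability of counters with the ability to distinguish accesses that flags provide.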

SLIDE 42

NOTIFIED ACCESS – IMPLEMENTATION

§ foMPI – a fully functional MPI-3 RMA implementation
§ Runs on newer Cray machines (Aries, Gemini)
§ DMAPP: low-level networking API for Cray systems
§ XPMEM: a portable Linux kernel module
§ Implementation of Notified Access via uGNI [1]
§ Leverages uGNI queue semantics
§ Adds an unexpected queue
§ Uses a 32-bit immediate value to encode source and tag

[Diagram: processes on two computing nodes communicate via XPMEM (intra-node communication), DMAPP (inter-node non-notified communication), and uGNI (inter-node notified communication)]

[1] http://spcl.inf.ethz.ch/Research/Parallel_Programming/foMPI_NA/
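Encoding <source, tag> into the single 32-bit immediate value that arrives with a completion event can be sketched as a bit-packing scheme. The 16/16 split below is an illustrative choice; the actual field widths are an implementation detail of foMPI-NA.

```python
def pack(source, tag):
    # pack source and tag into one 32-bit immediate (16 bits each here)
    assert 0 <= source < 2**16 and 0 <= tag < 2**16
    return (source << 16) | tag

def unpack(imm):
    # recover the pair on the target side from the completion event
    return imm >> 16, imm & 0xFFFF

imm = pack(source=1234, tag=42)
```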

SLIDE 43

EXPERIMENTAL SETTING

§ Piz Daint [1]
§ Cray XC30, Aries interconnect
§ 5,272 computing nodes (Intel Xeon E5-2670 + NVIDIA Tesla K20X)
§ Theoretical peak performance: 7.787 petaflops
§ Peak network bisection bandwidth: 33 TB/s

[1] http://www.cscs.ch

SLIDE 44

PING PONG PERFORMANCE (INTER-NODE)

§ 1000 repetitions, each timed separately, RDTSC timer
§ 95% confidence interval always within 1% of the median

(lower is better)
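The reporting methodology above (time each repetition separately, report the median, check that a 95% confidence interval lies within 1% of it) can be sketched with a nonparametric order-statistic CI for the median. This is one standard choice; the paper's exact procedure may differ, and the data below is synthetic.

```python
import random
import statistics

def median_with_ci(samples, z=1.96):
    xs = sorted(samples)
    n = len(xs)
    med = statistics.median(xs)
    # order-statistic indices for an approximate 95% CI of the median
    half = z * (n ** 0.5) / 2
    lo = xs[max(0, int(n / 2 - half))]
    hi = xs[min(n - 1, int(n / 2 + half))]
    return med, lo, hi

random.seed(0)                                       # synthetic "timings"
samples = [1000 + random.gauss(0, 5) for _ in range(1000)]
med, lo, hi = median_with_ci(samples)
within_1pct = lo >= 0.99 * med and hi <= 1.01 * med  # the slide's criterion
```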

SLIDE 45

PING PONG PERFORMANCE (INTRA-NODE)

§ 1000 repetitions, each timed separately, RDTSC timer
§ 95% confidence interval always within 1% of the median

(lower is better)

SLIDE 46

COMPUTATION/COMMUNICATION OVERLAP

§ 1000 repetitions, each timed separately, RDTSC timer
§ 95% confidence interval always within 1% of the median
§ Uses a communication progression thread

(lower is better)

SLIDE 47

PIPELINE – ONE-TO-ONE SYNCHRONIZATION

§ 1000 repetitions, each timed separately, RDTSC timer
§ 95% confidence interval always within 1% of the median

[1] https://github.com/intelesg/PRK2

(lower is better)

SLIDE 48

REDUCE – ONE-TO-MANY SYNCHRONIZATION

§ Reduce as an example (same for FMM, BH, etc.)
§ Small data (8 bytes), 16-ary tree
§ 1000 repetitions, each timed separately with RDTSC

(lower is better)

SLIDE 49

CHOLESKY – MANY-TO-MANY SYNCHRONIZATION

§ 1000 repetitions, each timed separately, RDTSC timer
§ 95% confidence interval always within 10% of the median

[1]: J. Kurzak, H. Ltaief, J. Dongarra, R. Badia: “Scheduling dense linear algebra operations on multicore processors”, CCPE 2010

(higher is better)

SLIDE 50

DISCUSSION AND CONCLUSIONS

§ A simple and fast solution
§ The interface lies between RMA and message passing
§ Similarity to MPI-1 eases adoption of NA
§ Richer semantics than current notification systems
§ Maintains the benefits of RDMA for producer/consumer
§ The effect on other RMA operations needs to be defined
§ Either synchronizing [1] or no effect
§ Currently discussed in the MPI Forum
§ Fully parameterized LogGP-like performance model

[1]: Kourosh Gharachorloo et al.: “Memory consistency and event ordering in scalable shared-memory multiprocessors”, ISCA’90

SLIDE 51–52

ACKNOWLEDGMENTS

Thank you for your attention

spcl.inf.ethz.ch

SLIDE 53

BACKUP SLIDES

SLIDE 54

NOTIFIED ACCESS – EXAMPLE

SLIDE 55

PERFORMANCE: APPLICATIONS

NAS 3D FFT [1] performance; MILC [2] application execution time
(scale to 512k procs; scale to 65k procs)

Annotations represent the performance gain of foMPI over Cray MPI-1.

[1] Nishtala et al.: “Scaling communication-intensive applications on BlueGene/P using one-sided communication and overlap”, IPDPS’09
[2] Shan et al.: “Accelerating applications at scale using one-sided communication”, PGAS’12

SLIDE 56

PERFORMANCE: MOTIF APPLICATIONS

Key/value store: random inserts per second; dynamic sparse data exchange (DSDE) with 6 neighbors

SLIDE 57

COMPARING APPROACHES – EXAMPLE

Overwriting Interface / Counting Interface / Notified Access

SLIDE 58–60

ONE SIDED – GET + SYNCHRONIZATION

Critical path: 3 messages

SLIDE 61

COMPARING APPROACHES