Shawn Hall – Hybrid RDMA: RDMA/SR mix for data, SR otherwise (PowerPoint PPT presentation)



SLIDE 1

Shawn Hall

SLIDE 2

SLIDE 3

SLIDE 4

Hybrid RDMA

SLIDE 5

RDMA/SR mix for data, SR otherwise

Client-side events

- Completion of sending request messages – polling
- Completion of incoming reply and control messages – interrupts

Server-side events

- Since the server is dedicated – polling

SLIDE 6

For small messages, memory (de)registration cost > zero-copy benefit

Data-piggybacked SR w/ pre-registered buffers

- Client caches the locations of preallocated/preregistered fast RDMA buffers on the I/O server
- RDMA Write with immediate data
- Large transfers are split into smaller ones
- Client/server communication and disk I/O are pipelined
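The split-and-pipeline step can be sketched as a minimal Python model; the buffer size and function name are illustrative assumptions, not from the slides:

```python
# Hypothetical sketch: splitting a large transfer into pieces that each
# fit one of the server's preregistered fast RDMA buffers.
# FAST_BUF_SIZE is an assumed value, not from the presentation.

FAST_BUF_SIZE = 64 * 1024  # assumed size of one preregistered server buffer

def split_transfer(length, buf_size=FAST_BUF_SIZE):
    """Return (offset, size) pieces covering a transfer of `length` bytes."""
    pieces = []
    off = 0
    while off < length:
        size = min(buf_size, length - off)
        pieces.append((off, size))
        off += size
    return pieces

# A 200 KiB write becomes four pieces; while piece i is being written to
# disk, piece i+1 can be in flight on the network (the pipelining above).
assert split_transfer(200 * 1024) == [
    (0, 65536), (65536, 65536), (131072, 65536), (196608, 8192)
]
```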

SLIDE 7

Internal Buffer Credit-Based Flow Control

Preallocated/prepinned buffers per connection
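The credit idea can be sketched as follows; the class and counters are hypothetical, since the slides only state that preallocated buffers gate sends per connection:

```python
# Minimal model of credit-based flow control over a fixed pool of
# preallocated/prepinned receive buffers. One credit == one free buffer
# on the peer; a send is allowed only if a credit is available.

class CreditedConnection:
    def __init__(self, num_buffers):
        self.credits = num_buffers        # one credit per receive buffer

    def try_send(self, msg):
        if self.credits == 0:
            return False                  # peer has no free buffer: back off
        self.credits -= 1                 # consume one receive buffer
        return True

    def on_credit_return(self, n=1):
        self.credits += n                 # peer freed buffers (credits often
                                          # piggybacked on reply messages)

conn = CreditedConnection(num_buffers=2)
assert conn.try_send("a") and conn.try_send("b")
assert not conn.try_send("c")             # would overrun the peer's buffers
conn.on_credit_return()
assert conn.try_send("c")
```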

Server RDMA Buffer Management

- Most I/O server memory is allocated as RDMA buffers
- Buffers are grouped by size into "zones"

Try to fit into contiguous buffer, otherwise split transfer
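One way the zone scheme could work, sketched under assumed zone sizes (the slides do not give actual sizes):

```python
# Sketch of "zoned" server buffer management: pick the smallest zone whose
# buffers hold the whole transfer contiguously; otherwise split the transfer
# across buffers from the largest zone. Zone sizes are assumptions.

ZONE_SIZES = [4096, 65536, 1048576]   # illustrative zones: 4 KiB, 64 KiB, 1 MiB

def place_transfer(length, zones=ZONE_SIZES):
    """Return the list of buffer sizes used for a transfer of `length` bytes."""
    for size in sorted(zones):
        if length <= size:
            return [size]              # fits in one contiguous buffer
    big = max(zones)
    n = length // big                  # otherwise split across largest buffers
    rest = length - n * big
    bufs = [big] * n
    if rest:
        bufs += place_transfer(rest, zones)
    return bufs

assert place_transfer(3000) == [4096]                     # single buffer
assert place_transfer(3 * 1048576 + 10) == [1048576, 1048576, 1048576, 4096]
```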

Client RDMA Buffer Management

- Dynamic (de)registration is required for clients
- A pin-down cache delays deregistration and caches registration info

The pin-down cache is not useful for I/O-intensive applications

SLIDE 8

Fast Memory Registration and Deregistration

Uses pin-down cache and batched deregistration
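A toy model of the pin-down cache with batched deregistration; the class, eviction policy, and interface are assumptions for illustration:

```python
# Sketch of a pin-down cache: deregistration is deferred so that
# re-registering the same region is a cache hit, and evicted regions are
# deregistered in one batch instead of one HCA call each.

class PinDownCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pinned = {}              # addr -> length of registered regions
        self.pending_dereg = []       # regions waiting for batched dereg

    def register(self, addr, length):
        if self.pinned.get(addr) == length:
            return "hit"              # already pinned: skip costly registration
        if len(self.pinned) >= self.capacity:
            victim = next(iter(self.pinned))       # evict (FIFO for brevity)
            self.pending_dereg.append((victim, self.pinned.pop(victim)))
        self.pinned[addr] = length    # real code would register with the HCA
        return "miss"

    def flush(self):
        n = len(self.pending_dereg)   # one batched deregistration call
        self.pending_dereg.clear()
        return n

c = PinDownCache(capacity=1)
assert c.register(0x1000, 4096) == "miss"
assert c.register(0x1000, 4096) == "hit"   # deregistration was delayed
assert c.register(0x2000, 4096) == "miss"  # evicts 0x1000 into the batch
assert c.flush() == 1
```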

SLIDE 9

SLIDE 10

SLIDE 11

(Figure: % of pin-down cache hits)

SLIDE 12

SLIDE 13

Chunk List – multidimensional lists that store the locations of multiple buffers

RPC Long Call – long RPCs are broken into chunks

- The first message contains a chunk list describing the other messages
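The long-call chunking described above can be sketched as follows; the chunk size and message layout are illustrative assumptions, not from the slides:

```python
# Sketch of an RPC "long call": the payload is broken into chunks and the
# first message carries a chunk list describing where the rest live.

CHUNK = 8 * 1024  # assumed chunk size

def build_long_call(payload, chunk=CHUNK):
    """Split `payload`; return (first message with chunk list, remaining chunks)."""
    chunks = [payload[i:i + chunk] for i in range(0, len(payload), chunk)]
    chunk_list = [(i, len(c)) for i, c in enumerate(chunks)]  # (index, length)
    first = {"chunk_list": chunk_list, "data": chunks[0]}
    return first, chunks[1:]

first, rest = build_long_call(b"x" * 20000)
assert first["chunk_list"] == [(0, 8192), (1, 8192), (2, 3616)]
assert len(rest) == 2
```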

NFS Write

(Diagram: client/server message exchange)

SLIDE 14

NFS Readdir and Readlink – similar to NFS Read

NFS Read

(Diagrams: client/server exchange in the Read-Read design and in the Read-Write design)

SLIDE 15

- Server buffers are exposed to client RDMA
- Server resources are not freed until the client sends RDMA_DONE
- Synchronous RDMA Read causes latency
- The number of concurrent RDMA Reads is limited

SLIDE 16

- RPC long replies and NFS READ data can come directly from the server
- The client cannot initiate RDMA and try to access other buffers, so the design is more secure
- The Mellanox HCA can issue many RDMA Write operations in parallel
- No waiting for RDMA_DONE
- Fewer server interrupts

SLIDE 17

Fast Memory Registration – registration steps that involve communication with the HCA are done at initialization rather than dynamically

Buffer Registration Cache

- No information about server buffers is exposed

Physical Memory Registration – avoids virtual-to-physical address translation

- The translation also does not need to be sent to the HCA

SLIDE 18

- RDMA_DONE elimination
- RDMA Write parallelism

SLIDE 19

SLIDE 20

With no local scatter/gather, more RDMA Reads are required; and because simultaneous Reads are capped, parallelism decreases.

SLIDE 21

Server memory saturates.

SLIDE 22

SLIDE 23

- MPI-level checkpointing: applies only to a specific MPI implementation; not portable
- File-system-level approach: portable and transparent to MPI stacks and applications

SLIDE 24

- FUSE – software that allows creating a user-level virtual file system
- Berkeley Lab Checkpoint/Restart (BLCR) – writes a process image to a file for later restart
- MPI Checkpointing Mechanisms – offered by MVAPICH2, MPICH2, and OpenMPI

Checkpoint flow:

- The MPI library flushes the communication channel
- The BLCR library is used to dump a memory snapshot
- The BLCR library is used to restart the job if needed

SLIDE 25

- VFS cache
- Efficient sequential writes
- Needs work

SLIDE 26

SLIDE 27

- File Open – caught by FUSE; CRFS inserts/increments a value in a hash table, then passes the call to the underlying file system
- File Close – the buffer pool is flushed into the work queue; the call blocks until the operations complete
- File Sync – all writes on the file are completed, then fsync() is passed to the underlying file system
- Other File Operations – passed through to the underlying file system
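A minimal model of this dispatch logic, with hypothetical names (the slides describe behavior, not an API):

```python
# Sketch of how CRFS, as described on this slide, might intercept FUSE
# calls: open tracks the file in a hash table, close flushes buffered
# chunks and waits, everything else falls through. All names are assumed.

class CRFS:
    def __init__(self):
        self.open_count = {}          # hash table: path -> open count
        self.buffered = {}            # path -> buffered write chunks

    def open(self, path):
        self.open_count[path] = self.open_count.get(path, 0) + 1
        # ... then pass the open() to the underlying file system

    def close(self, path):
        flushed = self.buffered.pop(path, [])
        # real code enqueues these chunks and blocks until I/O threads finish
        self.open_count[path] -= 1
        return len(flushed)

fs = CRFS()
fs.open("/ckpt/rank0")
fs.open("/ckpt/rank0")
assert fs.open_count["/ckpt/rank0"] == 2
fs.buffered["/ckpt/rank0"] = [b"chunk"]
assert fs.close("/ckpt/rank0") == 1
```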

SLIDE 28

File Write

- Data is copied from the file into a chunk in the buffer pool until the chunk is full
- The full chunk is enqueued into the work queue
- This triggers an I/O thread to wake up and write the chunk

Number of I/O threads limited to prevent contention
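The write path with a bounded I/O thread pool can be modeled like this; the chunk size and thread count are toy values chosen for the demo:

```python
# Sketch of the write path: data accumulates in a chunk until it fills,
# full chunks go onto a work queue, and a small fixed pool of I/O threads
# drains the queue (bounded to limit contention, as the slide notes).

import queue
import threading

CHUNK_SIZE = 4   # tiny for the demo; real chunks would be much larger
work_q = queue.Queue()
written = []

def io_thread():
    while True:
        chunk = work_q.get()
        if chunk is None:
            break                      # shutdown sentinel
        written.append(bytes(chunk))   # real code writes to the file system
        work_q.task_done()

workers = [threading.Thread(target=io_thread) for _ in range(2)]
for w in workers:
    w.start()

buf = bytearray()

def write(data):
    global buf
    buf += data
    while len(buf) >= CHUNK_SIZE:      # chunk full: enqueue, wake a worker
        work_q.put(buf[:CHUNK_SIZE])
        buf = buf[CHUNK_SIZE:]

write(b"abcdef")
write(b"gh")
work_q.join()                          # wait until all chunks are written
for _ in workers:
    work_q.put(None)
for w in workers:
    w.join()

assert b"".join(sorted(written)) == b"abcdefgh"
```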

SLIDES 29–38

(Figures/charts only; no recoverable text)