
PASTE: A Network Programming Interface for Non-Volatile Main Memory - PowerPoint PPT Presentation



  1. PASTE: A Network Programming Interface for Non-Volatile Main Memory Michio Honda (NEC Laboratories Europe) Giuseppe Lettieri (Università di Pisa) Lars Eggert and Douglas Santry (NetApp) USENIX NSDI 2018

  2. Review: Memory Hierarchy
     Slow, block-oriented persistence
     [Hierarchy: CPU → Caches (5-50 ns) → Main Memory (byte access w/ load/store, 70 ns) → HDD/SSD (block access w/ system calls, 100-1000s us)]

  3. Review: Memory Hierarchy
     Fast, byte-addressable persistence
     [Hierarchy: CPU → Caches (5-50 ns) → Main Memory / NVMM (byte access w/ load/store, 70 ns-1000s ns) → HDD/SSD (block access w/ system calls, 100-1000s us)]

  4. Networking is faster than disks/SSDs
     1.2KB durable write over TCP/HTTP:
     Client → Server (cables, NICs, TCP/IP, socket API): 23us
     Server → SSD (syscall, PCIe bus, physical media): 1300us

  5. Networking is slower than NVMM
     1.2KB durable write over TCP/HTTP:
     Client → Server (cables, NICs, TCP/IP, socket API): 23us
     Server → NVMM (memcpy, memory bus, physical media): 2us

  6. Networking is slower than NVMM
     1.2KB durable write over TCP/HTTP (clients → server → NVMM via cables, NICs, TCP/IP, socket API, memcpy, memory bus, physical media). The server-side loop:
       nevts = epoll_wait(fds);
       for (i = 0; i < nevts; i++) {
           read(fds[i], buf);
           ...
           memcpy(nvmm, buf);
           ...
           write(fds[i], reply);
       }
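     What the slide's snippet leaves implicit is that the memcpy() alone does not make the data durable; the copied cache lines still have to be flushed to the NVDIMM. A minimal sketch of this conventional path, assuming nvmm points into an mmap()ed NVMM-backed file and the epoll set is already populated; names and sizes are illustrative, not from the talk:

       #include <sys/epoll.h>
       #include <unistd.h>
       #include <string.h>
       #include <emmintrin.h>   /* _mm_clflush, _mm_sfence */

       #define MAX_EVENTS 64
       #define BUF_SIZE   2048
       #define CACHELINE  64

       /* One pass of the conventional loop: copy each request into NVMM,
        * flush it out of the CPU cache, then acknowledge the client. */
       void serve_once(int epfd, char *nvmm, size_t *off)
       {
           struct epoll_event evts[MAX_EVENTS];
           char buf[BUF_SIZE];
           int nevts = epoll_wait(epfd, evts, MAX_EVENTS, -1);

           for (int i = 0; i < nevts; i++) {
               int fd = evts[i].data.fd;
               ssize_t len = read(fd, buf, sizeof(buf));    /* copy #1: socket -> buf */
               if (len <= 0)
                   continue;
               memcpy(nvmm + *off, buf, len);               /* copy #2: buf -> NVMM */
               for (ssize_t c = 0; c < len; c += CACHELINE) /* make the write durable */
                   _mm_clflush(nvmm + *off + c);
               _mm_sfence();
               *off += len;
               write(fd, "OK", 2);                          /* reply over the socket */
           }
       }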

  7. Innovations at both stacks
     Network stack: MegaPipe [OSDI’12], Seastar, mTCP [NSDI’14], IX [OSDI’14], StackMap [ATC’16]
     Storage stack: NVTree [FAST’15], NVWAL [ASPLOS’16], NOVA [FAST’16], Decibel [NSDI’17], LSNVMM [ATC’17]

  8. Stacks are isolated
     Costs of moving data between the two stacks
     Network stack: MegaPipe [OSDI’12], Seastar, mTCP [NSDI’14], IX [OSDI’14], StackMap [ATC’16]
     Storage stack: NVTree [FAST’15], NVWAL [ASPLOS’16], NOVA [FAST’16], Decibel [NSDI’17], LSNVMM [ATC’17]

  9. Bridging the gap
     PASTE sits between the two stacks
     Network stack: MegaPipe [OSDI’12], Seastar, mTCP [NSDI’14], IX [OSDI’14], StackMap [ATC’16]
     Storage stack: NVTree [FAST’15], NVWAL [ASPLOS’16], NOVA [FAST’16], Decibel [NSDI’17], LSNVMM [ATC’17]

  10. PASTE Design Goals
      ● Durable zero copy
        ○ DMA to NVMM
      ● Selective persistence
        ○ Exploit modern NICs’ DMA to the L3 cache
      ● Persistent data structures
        ○ Indexed, named packet buffers backed by a file
      ● Generality and safety
        ○ TCP/IP in the kernel and the netmap API
      ● Best practices from modern network stacks
        ○ Run-to-completion, blocking, busy-polling, batching, etc.

  11. PASTE in Action
      [Diagram: an app thread in user space holds a Plog (/mnt/pm/plog, columns pbuf/len/off); the Ppool is shared memory backed by /mnt/pm/pp on a file system mounted at /mnt/pm and holds a Pring (slots, cur pointer) plus numbered Pbufs; the kernel runs TCP/IP over the NIC; data crosses the user/kernel boundary with zero copy]

  12. PASTE in Action
      [Diagram as in slide 11]

  13. PASTE in Action
      ● poll() system call
      1. Run NIC I/O and TCP/IP
      [Diagram as in slide 11]

  14. PASTE in Action
      ● poll() system call
      1. Run NIC I/O and TCP/IP
         ○ Got 6 in-order TCP segments
      [Diagram as in slide 11]

  15. PASTE in Action
      ● poll() system call
      1. Run NIC I/O and TCP/IP
         ○ They are set to Pring slots
      [Diagram: the received segments now sit in Pbufs linked into the Pring slots; tail advanced past them]

  16. PASTE in Action
      ● Return from poll()
      1. Run NIC I/O and TCP/IP
      [Diagram as in slide 15]

  17. PASTE in Action
      1. Run NIC I/O and TCP/IP
      2. Read data on Pring
      [Diagram as in slide 15]

  18. PASTE in Action
      ● Flush Pbuf data from the CPU cache to the DIMM
      1. Run NIC I/O and TCP/IP
      2. Read data on Pring
      3. Flush Pbuf(s)
         ○ clflush(opt) instruction
      [Diagram as in slide 15]
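      The flush in step 3 is an ordinary x86 cache-line flush over the Pbuf payload. A minimal sketch, assuming the payload is directly addressable at buf with length len and the CPU supports clflushopt; the helper name is ours, not part of the API:

        #include <stddef.h>
        #include <stdint.h>
        #include <immintrin.h>   /* _mm_clflushopt, _mm_sfence; compile with -mclflushopt */

        #define CACHELINE 64

        /* Flush a Pbuf's payload from the CPU cache to the NVDIMM.
         * clflushopt is weakly ordered, so fence once after the loop. */
        static void pbuf_persist(const char *buf, size_t len)
        {
            uintptr_t p = (uintptr_t)buf & ~(uintptr_t)(CACHELINE - 1);
            for (; p < (uintptr_t)buf + len; p += CACHELINE)
                _mm_clflushopt((void *)p);
            _mm_sfence();
        }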

  19. PASTE in Action
      ● Pbuf is a persistent data representation
         ○ Base address is static, i.e., a file (/mnt/pm/pp)
         ○ Buffers can be recovered after reboot
      1. Run NIC I/O and TCP/IP
      2. Read data on Pring
      3. Flush Pbuf(s)
      4. Flush Plog entry(ies)
      [Diagram: the Plog (/mnt/pm/plog) now holds the entry (pbuf=1, len=96, off=120)]
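      The slide's (pbuf, len, off) columns suggest a very small on-media record. A sketch of what such an entry and its recovery could look like; the struct, field names, and fixed buffer size are assumptions for illustration, not the actual PASTE definitions:

        #include <stddef.h>
        #include <stdint.h>

        /* Hypothetical layout of one Plog record, following the slide's
         * (pbuf, len, off) columns.  Because the Ppool is a named file
         * (/mnt/pm/pp) that is remapped at recovery time, a buffer index
         * plus an offset is enough to find the data again after a reboot. */
        struct plog_entry {
            uint32_t pbuf;   /* index of the Pbuf in the Ppool file      */
            uint16_t len;    /* bytes of payload, e.g. 96                */
            uint16_t off;    /* payload offset inside the Pbuf, e.g. 120 */
        };

        /* Recover a pointer to logged data: pool_base is the remapped
         * /mnt/pm/pp, pbuf_size is the fixed per-buffer size (assumed). */
        static inline char *plog_data(char *pool_base, size_t pbuf_size,
                                      const struct plog_entry *e)
        {
            return pool_base + (size_t)e->pbuf * pbuf_size + e->off;
        }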

  20. PASTE in Action
      ● Prevent the kernel from recycling the buffer
      1. Run NIC I/O and TCP/IP
      2. Read data on Pring
      3. Flush Pbuf(s)
      4. Flush Plog entry(ies)
      5. Swap out Pbuf(s)
      [Diagram: the Pring slot that held Pbuf 1 now points to a free Pbuf (8), so Pbuf 1 stays with the application]

  21. PASTE in Action
      ● Same for Pbufs 2 and 6
      1. Run NIC I/O and TCP/IP
      2. Read data on Pring
      3. Flush Pbuf(s)
      4. Flush Plog entry(ies)
      5. Swap out Pbuf(s)
      [Diagram: Plog entries (1, 96, 120), (2, 96, 768), (6, 96, 987); their Pring slots now point to free Pbufs]

  22. PASTE in Action
      ● Advance cur
         ○ Return buffers in slots 0-6 to the kernel at the next poll()
      1. Run NIC I/O and TCP/IP
      2. Read data on Pring
      3. Flush Pbuf(s)
      4. Flush Plog entry(ies)
      5. Swap out Pbuf(s)
      6. Update Pring
      [Diagram: cur advanced past the consumed slots]

  23. PASTE in Action: Write-Ahead Logs
      1. Run NIC I/O and TCP/IP
      2. Read data on Pring
      3. Flush Pbuf(s)
      4. Flush Plog entry(ies)
      5. Swap out Pbuf(s)
      6. Update Pring
      [Diagram: the Plog acts as a write-ahead log of (pbuf, len, off) entries]
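      Putting steps 1-6 together, a hypothetical application loop for this write-ahead log use case; the real interface is an extension of the netmap API, and every identifier below (paste_ring, pbuf_payload, append_plog, alloc_free_pbuf, and the ring/slot fields) is illustrative only:

        /* Hypothetical WAL loop over a PASTE-style ring (names are ours). */
        for (;;) {
            poll(&pfd, 1, -1);                    /* 1. kernel runs NIC I/O + TCP/IP  */
            struct ring *r = paste_ring(pfd.fd);
            for (uint32_t i = r->cur; i != r->tail; i = next(r, i)) {
                struct slot *s = &r->slot[i];
                char *data = pbuf_payload(pool, s->pbuf) + s->off;
                handle_request(data, s->len);     /* 2. read data in place on Pring   */
                pbuf_persist(data, s->len);       /* 3. flush the Pbuf (clflushopt)   */
                append_plog(plog, s->pbuf, s->len, s->off);  /* 4. flush Plog entry   */
                s->pbuf = alloc_free_pbuf(pool);  /* 5. swap out the Pbuf so the
                                                        kernel cannot recycle it      */
            }
            r->cur = r->tail;                     /* 6. update Pring; buffers behind
                                                        cur return to the kernel at
                                                        the next poll()               */
            /* send replies, then loop */
        }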

  24. PASTE in Action
      ● We can organize various data structures in the Plog
      1. Run NIC I/O and TCP/IP
      2. Read data on Pring
      3. Flush Pbuf(s)
      4. Flush Plog entry(ies)
      5. Swap out Pbuf(s)
      6. Update Pring
      [Diagram: a B+tree kept in the Plog whose leaves reference data in place as (pbuf, len, off) entries, e.g., (1, 96, 120), (2, 96, 987), (6, 96, 512)]

  25. Evaluation
      1. How does PASTE outperform existing systems?
      2. Is PASTE applicable to existing applications?
      3. Is PASTE useful for systems other than file/DB storage?

  26. How does PASTE outperform existing systems?
      [Graphs: WAL throughput for 64B and 1280B durable writes (all writes)]
      What if we use more complex data structures?

  27. How does PASTE outperform existing systems?
      [Graphs: B+tree throughput for 64B and 1280B durable writes (all writes)]

  28. Is PASTE applicable to existing applications?
      ● Redis
      [Graphs: YCSB (read mostly) and YCSB (update heavy)]

  29. Is PASTE useful for systems other than DB/file storage?
      ● Packet logging prior to forwarding
         ○ Fault-tolerant middlebox [SIGCOMM’15]
         ○ Traffic recording
      ● Extend mSwitch [SOSR’15]
         ○ Scalable NFV backend switch

  30. Conclusion
      ● PASTE is a network programming interface that:
         ○ Enables durable zero copy to NVMM
         ○ Helps apps organize persistent data structures on NVMM
         ○ Lets apps use TCP/IP and be protected
         ○ Offers a high-performance network stack even w/o NVMM
      https://github.com/luigirizzo/netmap/tree/paste
      micchie@sfc.wide.ad.jp or @michioh

  31. Multicore Scalability
      ● WAL throughput

  32. Further Opportunity with Co-designed Stacks
      ● What if we use higher access latency NVMM?
         ○ e.g., 3D XPoint
      ● Overlap flushes and processing with clflushopt, and mfence before the system call (which triggers packet I/O)
         ○ See the paper for results
      [Timeline: examine request → clflushopt → examine request → clflushopt → mfence → system call; flushes complete while later requests are processed, and the mfence before the system call waits for all flushes to be done before responses are sent]
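      A sketch of that overlap, reusing the hypothetical helpers from the loop after slide 23: each request's flush is started with weakly ordered clflushopt while processing continues, and a single mfence right before the system call waits for all of them at once:

        /* Hypothetical batch loop: flushes overlap with request processing. */
        for (uint32_t i = r->cur; i != r->tail; i = next(r, i)) {
            struct slot *s = &r->slot[i];
            char *data = pbuf_payload(pool, s->pbuf) + s->off;
            examine_request(data, s->len);        /* application work           */
            for (size_t c = 0; c < s->len; c += 64)
                _mm_clflushopt(data + c);         /* start flush, don't wait    */
            prepare_response(s);
        }
        _mm_mfence();                             /* wait once for all flushes  */
        poll(&pfd, 1, 0);                         /* system call triggers packet
                                                     I/O and sends the queued
                                                     responses                  */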

  33. Experiment Setup
      ● Intel Xeon E5-2640v4 (2.4 GHz)
      ● HPE 8GB NVDIMM (NVDIMM-N)
      ● Intel X540 10 GbE NIC
      ● Comparison
         ○ Linux and StackMap [ATC’16] (the current state of the art)
         ○ Fair comparison: all use the same kernel TCP/IP implementation
