SMB3 Extensions for Low Latency
Tom Talpey Microsoft May 12, 2016
Problem Statement
Storage Class Memory
A new, disruptive class of storage
Nonvolatile medium with RAM-like performance
Low latency, high throughput, high …
May 12, 2016 Tom Talpey - Microsoft - SambaXP 2016 Berlin 2
5000x change over 15 years!
[Figure: log-scale chart (axis ticks 1 to 10,000,000) comparing HDD, SSD, SCM and DRAM. Annotations: ">500x reduction in latency, >500x more IOPs"; "(for replication)"; "50x reduction in latency, 1000x more IOPs". A second scale gives latency in µs and in 2 GHz cycles (40, 1000, 200K, 1M), with the labels "Never use async" and "Always use async".]
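The cycle counts on the chart (40, 1000, 200K and 1M cycles at 2 GHz) can be converted back to time with simple arithmetic. The following is a minimal sketch; the function name `cycles_to_usec` is hypothetical, and no mapping of a particular count to a particular device is assumed.

```python
CLOCK_HZ = 2_000_000_000  # 2 GHz, the clock rate quoted on the slide


def cycles_to_usec(cycles: int, clock_hz: int = CLOCK_HZ) -> float:
    """Convert a CPU cycle count to microseconds at the given clock rate."""
    return cycles * 1_000_000 / clock_hz


if __name__ == "__main__":
    # Cycle counts taken from the slide's scale.
    for cycles in (40, 1_000, 200_000, 1_000_000):
        print(f"{cycles:>9,} cycles = {cycles_to_usec(cycles):g} µs")
```

At 2 GHz one cycle is 0.5 ns, so the scale spans from 0.02 µs at 40 cycles to 500 µs at 1M cycles.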
and optimized)
scatter/gather list
server’s memory
preallocate to client
workloads
[Figure: READ and WRITE exchanges between Client and Server over RDMA. Labels: Register, (Register), Send, Send (with invalidate), RDMA Write (DATA), RDMA Read (with local invalidate) (DATA).]
durability is guaranteed
[Figure: Remote Direct Access, "Push" versus "Pull". Labels: Register, Unregister, Send, RDMA Read (DATA), RDMA Write (DATA), RDMA Commit (new).]
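The push sequence named in the figure (an RDMA Write of the data followed by the new RDMA Commit) can be loosely modelled on a single machine. The sketch below is an analogy only and involves no RDMA library: a memory-mapped temporary file stands in for a registered PMEM region, a slice assignment stands in for RDMA Write, and `mmap.flush` stands in for RDMA Commit. The names `push_write`, `push_commit` and `demo` are hypothetical.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


def push_write(region: mmap.mmap, offset: int, data: bytes) -> None:
    """Stand-in for RDMA Write: place data in the region (not yet durable)."""
    region[offset:offset + len(data)] = data


def push_commit(region: mmap.mmap, offset: int, length: int) -> None:
    """Stand-in for RDMA Commit: flush the written range to backing storage."""
    start = (offset // PAGE) * PAGE      # flush offset must be page aligned
    end = offset + length
    region.flush(start, end - start)


def demo() -> bytes:
    payload = b"hello, pmem"
    offset = PAGE + 100
    with tempfile.TemporaryFile() as f:
        f.truncate(4 * PAGE)             # size the backing file
        with mmap.mmap(f.fileno(), 4 * PAGE) as region:
            push_write(region, offset, payload)          # data placed
            push_commit(region, offset, len(payload))    # data made durable
        f.seek(offset)
        return f.read(len(payload))      # read back through the file


if __name__ == "__main__":
    print(demo())
```

The point of the split is that the write alone says nothing about durability; only the separate commit step covers the range that must survive a failure.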
make it to NVM before power dies. The integrated memory controller (iMC) is currently inside of the ADR Domain.
during an ADR event
when the target write buffer has WB attribute
by cache
Core Caches
[Figure: Allocating Write Transactions. Components: CPU (four COREs, LLC), IIO, iMC, PCI Root Port, PCI Func, RNIC, DRAM/NVDIMM inside the ADR Domain. Flows shown: CPU Read/Write, PCI DMA Read/Write, RNIC RDMA Read/Write, Allocating Read/Write, Non-Allocating Read/Write. Credit: Intel]
[Figure: Non-Allocating Write Transactions. Components: CPU (four COREs, LLC), IIO, PCI Root Port, RNIC, Internal BUFFERS, iMC, NVM, ADR Domain. Flows shown: RNIC RDMA Read Flow; RNIC RDMA Write Flow; Write Data forced to persistence by ADR Flow; RDMA Write Data forced to ADR Domain by RDMA Read Flow.]
[Figure: Allocating Write Transactions. Components: CPU (four COREs, LLC), IIO, PCI Root Port, RNIC, Internal BUFFERS, iMC, NVM. Flows shown: RNIC RDMA Send/Receive Flow; RNIC RDMA Write Flow; Send/Receive Callback CLFLUSHOPT/SFENCE Flow; RDMA Write Data forced to iMC by Send/Receive Flow; Send/Receive Callback PCOMMIT/SFENCE Flow. Credit: Intel]
completion). Requires creating a “Write Ack”.
queue)
existing RDMA Write semantic (minimizing RNIC implementation change)
adopt this to take advantage of PMEM-capable systems as they appear
ristiansen_SCM_in_Windows_NVM_Summit.pdf
Persistent-Memory-in-Linux.pdf
SMB2_READ)
to PMEM file
signaling, work requests
“buffered” mode
way (both Linux and Windows)
hint to do this
DAX is detected
mapping and direct RDMA reading/writing of PMEM-resident file
read/write, and RDMA Commit
[Figure: server data paths. Labels: SMB3, Server, RDMA NIC, RDMA Push, "Buffer Cache", RDMA R/W, Load/Store, DAX Filesystem, PMEM, I/O requests, Direct file mapping.]
Server "upcalls" may originate within and without the filesystem
Storage (iSCSI)
discussion
TBD
https://datatracker.ietf.org/doc/draft-talpey-rdma-commit/