Low-Latency Communication for Fast DBMS Using RDMA and Shared Memory
Philipp Fent, Alexander van Renen, Andreas Kipf, Viktor Leis, Thomas Neumann, Alfons Kemper
Technische Universität München, April 23, 2020
Communication Performance
DBMS and Client Application, connected via ODBC. Both sides stack the same layers:
- Data serialization format
- Network & transport protocol
- Physical interconnect
Philipp Fent et al. Low-Latency Communication for Fast DBMS Using RDMA and Shared Memory 2 / 11
Communication Performance
Figure: TPC-C throughput using Silo [transactions / second]
(a) 1 Thread: In-process 58 K; Linux Sockets: Domain 2.7 K, TCP 1.5 K
(b) 8 Threads: In-process 344 K; Linux Sockets: Domain 16 K, TCP 8.9 K
Understanding the Bottleneck
- Misconception: Network is slow
- Twofold actual bottleneck:
- TCP
- Through kernel communication
Figure: Kernel based communication. DBMS → write() → Kernel → read() → Client; each syscall costs > 10 k cycles.
Figure: Direct memory access. DBMS and Client each map a shared message buffer with 1× mmap(); afterwards each memcpy() costs ≈ 100 cycles.
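The contrast between the two figures can be sketched in a few lines. This is only an illustration of the structure, not the talk's implementation: the function names and the 4096-byte buffer are hypothetical, and in Python the interpreter overhead dominates, so the cycle counts from the slides do not carry over.

```python
import mmap
import os


def send_via_kernel(message):
    """Kernel path: one write() and one read() syscall per message."""
    read_end, write_end = os.pipe()
    try:
        os.write(write_end, message)            # syscall on the sending side
        return os.read(read_end, len(message))  # syscall on the receiving side
    finally:
        os.close(read_end)
        os.close(write_end)


def send_via_mapping(buffer, message):
    """Mapped path: plain memory copies into and out of an existing mapping."""
    buffer[:len(message)] = message             # copy in, no syscall per message
    return bytes(buffer[:len(message)])         # copy out


# The mapping is created once, up front (hypothetical size, anonymous memory).
shared = mmap.mmap(-1, 4096)
```

Both paths deliver the same bytes; the difference is that the mapped path pays the setup cost once instead of crossing into the kernel for every message.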
Low-Latency Communication Using Shared Memory
- Co-hosted on the same machine
- Latency similar to embedded DBs, e.g. SQLite
- Ideal interconnect for container / Docker environment
- Bootstrapped via Domain Sockets
- Pass message buffer via cmsg ancillary data
- Ring buffer with polling to transfer serialized data
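The bootstrap step from the bullets above can be sketched as follows: a file descriptor for the message buffer travels over a Unix domain socket as SCM_RIGHTS ancillary data, and the receiver maps the same memory. This is a minimal single-process sketch, assuming a Unix-like system; the helper names, the temporary file as backing store, and BUF_SIZE are illustrative choices, not L5's actual code.

```python
import array
import mmap
import os
import socket
import tempfile

BUF_SIZE = 4096  # hypothetical message buffer size


def send_fd(sock, fd):
    # The descriptor travels as SCM_RIGHTS ancillary data (a cmsg).
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    sock.sendmsg([b"F"], ancillary)


def recv_fd(sock):
    fds = array.array("i")
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def demo():
    # Both socket ends live in one process here; normally they are DBMS and client.
    server_sock, client_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryFile() as backing:
        backing.truncate(BUF_SIZE)                    # give the buffer its size
        server_map = mmap.mmap(backing.fileno(), BUF_SIZE)
        send_fd(server_sock, backing.fileno())
        client_fd = recv_fd(client_sock)              # a new descriptor, same file
        client_map = mmap.mmap(client_fd, BUF_SIZE)
        client_map[:5] = b"hello"                     # client writes ...
        seen_by_server = bytes(server_map[:5])        # ... server sees it
        client_map.close()
        server_map.close()
        os.close(client_fd)
    server_sock.close()
    client_sock.close()
    return seen_by_server
```

After this exchange the socket is no longer needed for data transfer; all further messages can go through the shared mapping.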
Low-Latency Communication Using Shared Memory
Available bandwidth depends on transmission parameters
Figure: Heatmap of bandwidth (color scale 2 GB/s to 5 GB/s) over size of transmitted chunks [log] and size of transmission buffer [log], both from 4 kB to 1 GB, with the L2 cache and L3 cache sizes marked. The plotted values range from 1.5 to 5.3.
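A ring buffer with polling, parameterized by buffer size and chunk size as in the measurement above, might look like the following sketch. The layout (two 8-byte counters followed by the data area, 4-byte length prefixes) is an assumption for illustration and not L5's format; a real shared-memory version would also need atomic operations and memory ordering, which Python does not expose.

```python
import struct

HEADER = 16  # assumed layout: bytes 0-7 = total written, bytes 8-15 = total read


class Ring:
    """Single-producer / single-consumer byte ring over any writable buffer."""

    def __init__(self, buf):
        self.buf = buf
        self.cap = len(buf) - HEADER

    def _counter(self, offset):
        return struct.unpack_from("<Q", self.buf, offset)[0]

    def _copy_in(self, pos, data):
        start = pos % self.cap
        first = min(len(data), self.cap - start)
        self.buf[HEADER + start:HEADER + start + first] = data[:first]
        if first < len(data):  # wrap around to the start of the data area
            self.buf[HEADER:HEADER + len(data) - first] = data[first:]

    def _copy_out(self, pos, n):
        start = pos % self.cap
        first = min(n, self.cap - start)
        head = bytes(self.buf[HEADER + start:HEADER + start + first])
        return head + bytes(self.buf[HEADER:HEADER + n - first])

    def try_send(self, payload):
        msg = struct.pack("<I", len(payload)) + payload
        if len(msg) > self.cap:
            raise ValueError("message larger than ring capacity")
        written, read = self._counter(0), self._counter(8)
        if len(msg) > self.cap - (written - read):
            return False  # ring is full, caller retries later
        self._copy_in(written, msg)
        struct.pack_into("<Q", self.buf, 0, written + len(msg))  # publish last
        return True

    def try_recv(self):
        written, read = self._counter(0), self._counter(8)
        if written == read:
            return None  # nothing new yet
        (n,) = struct.unpack("<I", self._copy_out(read, 4))
        payload = self._copy_out(read + 4, n)
        struct.pack_into("<Q", self.buf, 8, read + 4 + n)  # free the space
        return payload

    def recv(self):
        while True:  # busy polling instead of a blocking syscall
            msg = self.try_recv()
            if msg is not None:
                return msg


def transfer(ring, payload, chunk_size):
    """Single-process demo: send payload in chunks, draining when the ring is full."""
    received = bytearray()
    for i in range(0, len(payload), chunk_size):
        while not ring.try_send(payload[i:i + chunk_size]):
            received += ring.recv()
    while True:
        msg = ring.try_recv()
        if msg is None:
            return bytes(received)
        received += msg
```

The buffer passed to `Ring` can be a `bytearray` for experiments or a shared mapping in a two-process setup; `chunk_size` and the buffer length correspond to the two axes of the heatmap.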
Low-Latency Communication Using RDMA
- Co-located in the same datacenter
- Bootstrapped via regular TCP/IP
- Similar ring buffer as with Shared Memory
- RDMA intricacies:
Figure: Sync. throughput [msgs/s] (axis 200 K to 600 K) over size of message [Byte] (1 to 256, cache line marked), in two panels. The first panel compares Write + Polling, Two Writes, Write + Immediate, and Send + Receive, with a +70% annotation. The second panel compares Two Writes, Chained Writes, and Immediate Data.
Low-Latency Communication Using RDMA
- Asymmetric connections
- Many message buffers → random accesses for polling
- Cache efficient mailbox polling
- Two writes to separate memory regions
- Scales up to the limit of RDMA’s reliable connections
Figure: A mailbox (efficient polling, written after message) with entries marked X next to the message buffer, which holds [28] SELECT a FROM r WHERE x = 28 and [30] SELECT e FROM r WHERE x = 81; one message is labeled in-flight.
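The mailbox idea from the bullets above can be simulated in local memory. In this sketch, which involves no RDMA at all, each client owns one slot in a large message buffer plus one flag byte in a compact mailbox; the client writes the message first and the flag second, and the server scans only the contiguous mailbox. The class name, the slot size, and the one-byte length prefix are assumptions made for illustration.

```python
class MailboxServer:
    """Local-memory simulation of mailbox polling (no actual RDMA involved)."""

    def __init__(self, num_clients, slot_size=64):
        if slot_size > 256:
            raise ValueError("length prefix is one byte in this sketch")
        self.slot_size = slot_size
        self.mailbox = bytearray(num_clients)              # one flag per client
        self.buffers = bytearray(num_clients * slot_size)  # one slot per client

    def client_write(self, client_id, message):
        """Stands in for a client's two remote writes."""
        if len(message) >= self.slot_size:
            raise ValueError("message does not fit into the slot")
        base = client_id * self.slot_size
        # First write: length prefix and message into the client's slot.
        self.buffers[base] = len(message)
        self.buffers[base + 1:base + 1 + len(message)] = message
        # Second write, to a separate region: the flag, set after the message.
        self.mailbox[client_id] = 1

    def poll(self):
        """Scan the compact mailbox and touch only slots that are flagged."""
        ready = []
        for client_id, flag in enumerate(self.mailbox):
            if flag:
                base = client_id * self.slot_size
                length = self.buffers[base]
                message = bytes(self.buffers[base + 1:base + 1 + length])
                ready.append((client_id, message))
                self.mailbox[client_id] = 0  # mark as consumed
        return ready
```

With 32 clients the mailbox is 32 bytes, so a poll reads a single cache line's worth of flags instead of jumping across 32 separate message slots.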
Results
Local YCSB-C [sync. tx/s]
System      TCP      SHM      NP       DS       RDMA
Silo + L5   50.5 K   685 K    -        72.1 K   364 K
DBMS X      7.56 K   11.5 K   11.5 K   -        -
MySQL       10.0 K   45.9 K   27.6 K   -        -
SQLite      -        378 K    -        -        -
Remote YCSB-C [sync. tx/s]
System       1 G Eth   56 G IB   RDMA
Silo + L5    15 K      27 K      302 K
DBMS X       3.1 K     3.7 K     -
MySQL        7.1 K     8.0 K     -
PostgreSQL   6.3 K     7.5 K     -
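To put the Silo + L5 rows into relative terms, the ratios can be computed directly from the numbers on the slide. The dictionary keys and the helper name are illustrative; the values are copied from the results above.

```python
# Silo + L5 throughput in transactions per second, as listed on the slide.
local = {"TCP": 50.5e3, "SHM": 685e3, "DS": 72.1e3, "RDMA": 364e3}
remote = {"1 G Eth": 15e3, "56 G IB": 27e3, "RDMA": 302e3}


def speedup(results, fast, baseline):
    """Throughput of `fast` relative to `baseline` within one result row."""
    return results[fast] / results[baseline]


local_shm_vs_tcp = speedup(local, "SHM", "TCP")
remote_rdma_vs_ib = speedup(remote, "RDMA", "56 G IB")
```

For these inputs, shared memory reaches roughly 13.6 times the local TCP throughput, and RDMA roughly 11.2 times the throughput measured over 56 G IB.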
Results – Scale Out
Figure: Sync. throughput [tx/s] (axis 1 M to 2 M) over number of clients (1 to 20) for RDMA with 4 servers, RDMA with 2 servers, RDMA with 1 server, and TCP.
Conclusion
- L5 – Low-Level, Low-Latency Library: https://github.com/pfent/L5RDMA
- Shared Memory and RDMA bring OLTP performance to clients
- fent@in.tum.de
Figure: Transactions / second
(a) 1 Thread: In-process 58 K; Linux Sockets: Domain 2.7 K, TCP 1.5 K; L5: SHM 32 K, RDMA 14 K
(b) 8 Threads: In-process 344 K; Linux Sockets: Domain 16 K, TCP 8.9 K; L5: SHM 166 K, RDMA 75 K