Circuit Switched VM Networks for Zero-Copy IO
Johannes Krude, Mirko Stoffers, Klaus Wehrle
https://comsys.rwth-aachen.de/ KBNets18, 2018-08-20
VM Networks
VMs are used for Isolation
◮ Multiple Tenants on the same Host
◮ Compartmentalization
◮ Fault Isolation
Isolation complicates Communication
Until now: Zero-Copy IO and Isolation are mutually exclusive
Circuit Switched VM Networks enable Zero-Copy IO with Isolation
[Diagram: NIC, VM1, VM2, HTTP Proxy, Application Server, Database, NIC]
VM Packet Processing
◮ Multiplexing
◮ Packetization
◮ Congestion Control
◮ Retransmissions
◮ Reordering
◮ (Copying)
Goals
[Diagram: NIC, VM1, VM2; HTTP Proxy, Application Server, and Database, each with a Socket; two TCP/UDP Stacks; two virtual NICs; three RX/TX Bufs; Packet Forwarding]
Removing Overhead
◮ Move to Host if Still Needed
◮ Remove if Possible

◮ Provides Access to Streams & Datagrams
◮ Required to Support Legacy Applications
◮ Provides Isolation between Applications

◮ As Optional Extension to Socket API
[Diagram: NIC, VM1, VM2; HTTP Proxy, Application Server, and Database, each with a Socket; TCP/UDP Proxy Stack]
Circuit Switched VM Networks
A Circuit for each Connection
◮ from VM to Proxy Stack
◮ or Direct from VM to VM
Switch Operator
◮ Mediates Connection Establishment
◮ Enforces Connection Policies
[Diagram: NIC, VM1, VM2; HTTP Proxy, Application Server, and Database, each with a Socket; TCP/UDP Proxy Stack; two Circuits; Switch Operator]
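The two roles of the switch operator named above can be sketched as follows. This is only an illustration under stated assumptions: the class names, the policy format (a set of allowed source, destination, port triples), and the VM and host names are all hypothetical and not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Circuit:
    src_vm: str
    dst: str
    port: int
    kind: str  # "vm-to-vm" or "vm-to-proxy-stack"


@dataclass
class SwitchOperator:
    """Hypothetical model: mediates connection setup and enforces a policy."""
    local_vms: set   # VMs running on this host
    allowed: set     # assumed policy format: permitted (src_vm, dst, port) triples
    circuits: list = field(default_factory=list)

    def connect(self, src_vm, dst, port):
        """Return a new Circuit, or None if the policy rejects the connection."""
        if (src_vm, dst, port) not in self.allowed:
            return None  # no circuit is created for a rejected connection
        # Destinations on the same host get a direct circuit; everything
        # else is reached through the proxy stack.
        kind = "vm-to-vm" if dst in self.local_vms else "vm-to-proxy-stack"
        circuit = Circuit(src_vm, dst, port, kind)
        self.circuits.append(circuit)
        return circuit


operator = SwitchOperator(
    local_vms={"vm1", "vm2"},
    allowed={("vm1", "vm2", 5432), ("vm1", "example.org", 443)},
)
db = operator.connect("vm1", "vm2", 5432)          # direct VM-to-VM circuit
web = operator.connect("vm1", "example.org", 443)  # circuit to the proxy stack
denied = operator.connect("vm2", "vm1", 22)        # rejected by the policy
```

Because every connection passes through this one decision point, the policy is checked once per connection rather than once per packet.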
Circuits
[Diagram: a Circuit between VM1 and VM2 connecting the Application Server Socket and the Database Socket; the Circuit consists of Ring Buffer A, Ring Buffer B, and a Control Area: Read & Write Pointers, Flags, …]
◮ TCP Flow Control: Ring Buffers
◮ UDP Datagrams: Prepend some kind of Header
◮ Map Circuit Memory into Application
◮ Optional ⇒ Compatible with Legacy Applications
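A minimal sketch of how a ring buffer with read and write pointers can give stream flow control, and how a prepended header can delimit datagrams on the same structure. The slides do not specify the layout, so everything here is an assumption: the free-running pointer scheme, the 4-byte little-endian length header, and all names are hypothetical, and a real circuit would live in shared memory rather than in a Python object.

```python
import struct


class RingBuffer:
    """Single-producer, single-consumer byte ring with free-running pointers."""

    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.read_ptr = 0   # total bytes consumed so far
        self.write_ptr = 0  # total bytes produced so far

    def used(self) -> int:
        return self.write_ptr - self.read_ptr

    def free(self) -> int:
        return self.capacity - self.used()

    def write(self, data: bytes) -> int:
        """Copy in as much as fits; a short count is the back-pressure signal."""
        n = min(len(data), self.free())
        for i in range(n):
            self.buf[(self.write_ptr + i) % self.capacity] = data[i]
        self.write_ptr += n
        return n

    def read(self, max_bytes: int) -> bytes:
        n = min(max_bytes, self.used())
        out = bytes(self.buf[(self.read_ptr + i) % self.capacity] for i in range(n))
        self.read_ptr += n
        return out


HEADER = struct.Struct("<I")  # assumed header: payload length only


def send_datagram(ring: RingBuffer, payload: bytes) -> bool:
    """All-or-nothing: a datagram is never written partially."""
    if HEADER.size + len(payload) > ring.free():
        return False
    ring.write(HEADER.pack(len(payload)) + payload)
    return True


def recv_datagram(ring: RingBuffer):
    """Return the next datagram, or None if the ring is empty."""
    if ring.used() < HEADER.size:
        return None
    (length,) = HEADER.unpack(ring.read(HEADER.size))
    return ring.read(length)
```

For example, writing 10 bytes into an 8-byte ring accepts only 8, so the sender has to wait until the reader advances its pointer. That is the flow control a stream needs, without a congestion control algorithm.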
Network Isolation
◮ Keeps Socket Isolation
◮ Even when doing Zero-Copy IO

◮ No Inspection of Individual Packets needed
◮ No Redundant State for Stateful Firewalls

◮ Same Level of Access as Containers
◮ No Crafting of Malicious Packet Headers
◮ No Unfair Congestion Control Algorithms
Implementation & Evaluation
◮ Allows for Shared-Memory between any consenting VMs
◮ No VM User-Space Modifications Required
◮ Use Regular Linux Sockets for Proxy Stack
◮ NGINX, BIND, Tor, Firefox, Transmission, Quake 3, Mutt, openssh, git, aptitude, wget, …
◮ Minimum Linux VM: 17 % Memory Reduction, 48 MiB to 40 MiB
◮ Especially Relevant for Unikernels in high density Deployments
◮ Hardware: Xeon E5-4610 v4 (10 Cores), Intel X710-T4 (10 Gbit)
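The memory figure above can be checked with a short calculation. The variable names are only illustrative; the two sizes are the ones stated on the slide.

```python
before_mib = 48  # minimum Linux VM, as stated on the slide
after_mib = 40   # the same VM after the reduction, as stated on the slide

saved_mib = before_mib - after_mib
reduction = saved_mib / before_mib  # fraction of the original footprint saved

# 8 MiB out of 48 MiB is one sixth, which rounds to the 17 % on the slide.
print(f"{saved_mib} MiB saved, {reduction:.1%} reduction")
```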
Stream Goodput
[Figure: Goodput (Gbit/s) versus #VMs in three panels; series: packet switched, circuit + legacy app, circuit + zero-copy; 95% Confidence]
◮ VMs to External Host: + Circuit + Proxy Stack + NIC
◮ VMs to Host OS: + Circuit + Proxy Stack − NIC
◮ VMs to VM: + Circuit − Proxy Stack − NIC
Response Times
[Figure: Time (s) versus Size in two panels; series: packet switched, circuit + legacy app, circuit + zero-copy; 95% Confidence]
◮ Stream Response, annotation: connect +50 µs
◮ Datagram Response, annotation: bind +27 µs
Conclusion
[Diagram: NIC, VM1, VM2; HTTP Proxy, App Server, and DB, each with a Socket; TCP/UDP Proxy Stack; two Circuits; Switch Operator]
Socket API
[State diagram, stream sockets: socket(PF_INET,SOCK_STREAM); listen(); accept(); connect() with Success / Failure; recv(), send(); shutdown(), connect(AF_UNSPEC)]
[State diagram, datagram sockets: socket(PF_INET,SOCK_DGRAM); bind(), connect(), send(); recv(), send(), connect(); connect(AF_UNSPEC)]
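For reference, the calls named in the state diagrams are the ones an unmodified application already issues. The sketch below exercises them through Python's standard `socket` module on the ordinary loopback interface. It does not use circuits, and the helper names are hypothetical. `connect(AF_UNSPEC)` is left out because Python's `socket` module does not expose it directly.

```python
import socket


def stream_roundtrip(payload: bytes) -> bytes:
    """socket(SOCK_STREAM), listen(), connect(), accept(), send(), shutdown(), recv()."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server = None
    try:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        client.connect(listener.getsockname())
        server, _ = listener.accept()
        client.sendall(payload)
        client.shutdown(socket.SHUT_WR)  # signal end of stream to the peer
        received = b""
        while chunk := server.recv(4096):  # recv() returns b"" at end of stream
            received += chunk
        return received
    finally:
        for sock in (client, server, listener):
            if sock is not None:
                sock.close()


def datagram_roundtrip(payload: bytes) -> bytes:
    """socket(SOCK_DGRAM), bind(), connect(), send(), recv()."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)  # do not block forever if the datagram is lost
        sender.connect(receiver.getsockname())  # fixes the default destination
        sender.send(payload)
        return receiver.recv(65535)
    finally:
        sender.close()
        receiver.close()
```

Calling `stream_roundtrip(b"hello")` or `datagram_roundtrip(b"hello")` returns the payload unchanged.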