1 / 29
Vhost: Sharing is better
Eyal Moscovici, Bandan Das
Partly sponsored by:
2 / 29
What's it about?
Paravirtualization: Shared Responsibilities
Vhost: How much can we stretch?
Design Ideas: Parallelization
Design Ideas: …
3 / 29
– Standardized backend/frontend drivers
– Host still has ultimate control (compared to hardware device assignment)
– Security, fault tolerance, SDN, file-based images, replication, snapshots, VM migration
– Scalability limitations
4 / 29
– Let's move things into the kernel
– Better performance
– Avoids system calls, …
– And comes with all the …
[Diagram: the guest vCPU kicks the vhost worker thread through ioeventfd; the worker reads/writes the virtio buffers shared with the guest and forwards packets to the host network stack; KVM injects completion interrupts back into the guest via irqfd]
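As a rough illustration of this wiring, here is a minimal userspace sketch, assuming virtqueue index 0 and eliding all error handling. The ioctls and structures come from <linux/vhost.h>; the function itself and its name are illustrative, not QEMU's actual code.

```c
#include <fcntl.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Illustrative sketch: hand the virtio-net data path to the in-kernel
 * vhost worker.  Error handling elided for brevity. */
int setup_vhost_ring(int *kick_fd, int *call_fd)
{
    int vhost_fd = open("/dev/vhost-net", O_RDWR);
    struct vhost_vring_file kick = { .index = 0 };
    struct vhost_vring_file call = { .index = 0 };

    ioctl(vhost_fd, VHOST_SET_OWNER);               /* spawns the vhost worker thread */

    *kick_fd = kick.fd = eventfd(0, EFD_NONBLOCK);  /* guest -> host notification */
    *call_fd = call.fd = eventfd(0, EFD_NONBLOCK);  /* host -> guest notification */

    ioctl(vhost_fd, VHOST_SET_VRING_KICK, &kick);
    ioctl(vhost_fd, VHOST_SET_VRING_CALL, &call);

    /* kick_fd is then registered with KVM as an ioeventfd and call_fd
     * as an irqfd, so notifications in both directions bypass the VMM. */
    return vhost_fd;
}
```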
5 / 29
– But is it necessary?
– Can a worker share responsibilities?
– Main objective: Scalable performance
6 / 29
[Diagram: per-device model; each guest on CPU0–CPU3 has its own vhost worker (Vhost-1…Vhost-4) handling its Tx/Rx, placed with NUMA-aware scheduling]
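One way to read "NUMA-aware scheduling" here is sketched below: create the per-device worker on the node that backs the guest's memory. The numa_node parameter and vhost_worker_fn are assumptions for illustration; mainline vhost has simply used kthread_create().

```c
#include <linux/err.h>
#include <linux/kthread.h>

struct vhost_dev;                              /* real struct; usage here is illustrative */
extern int vhost_worker_fn(void *data);        /* assumed worker loop */

static int vhost_start_worker(struct vhost_dev *dev, int numa_node, pid_t owner)
{
    struct task_struct *worker;

    /* Pin thread creation to the NUMA node holding the guest's rings,
     * so Tx/Rx processing stays local to the virtio buffers. */
    worker = kthread_create_on_node(vhost_worker_fn, dev, numa_node,
                                    "vhost-%d", owner);
    if (IS_ERR(worker))
        return PTR_ERR(worker);

    wake_up_process(worker);
    return 0;
}
```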
7 / 29
– Scheduling becomes more complicated as the number of vhost worker threads grows
8 / 29
ELVIS: presented by Abel Gordon at KVM Forum 2013
– Cores are split into two groups: VM cores and I/O cores
– Each I/O core runs a single vhost thread serving devices from different guests
– … determines how many I/O cores there are
[Diagram: ELVIS fine-grained I/O scheduling (dedicated I/O cores multiplex I/O for VM1, VM2, … VMi, VMj across Core 1…Core N) vs. thread-based scheduling (per-VM I/O threads interleaved with VCPU threads on the same cores)]
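A sketch of the shared-worker idea, under the assumption of a simple round-robin pass over all attached queues; every name here is illustrative, and the real ELVIS adds an actual fine-grained scheduling policy on top of such a loop.

```c
#include <linux/kthread.h>
#include <linux/list.h>
#include <linux/sched.h>

struct elvis_vq {                       /* hypothetical per-queue state */
    struct list_head node;
    /* ring pointers, owner guest, stats, ... */
};

static bool vq_has_pending(struct elvis_vq *vq);   /* assumed helper */
static void vq_process_one(struct elvis_vq *vq);   /* assumed helper */

/* One such thread runs per dedicated I/O core and multiplexes
 * virtqueues belonging to different guests. */
static int shared_worker_fn(void *data)
{
    struct list_head *queues = data;
    struct elvis_vq *vq;

    while (!kthread_should_stop()) {
        bool did_work = false;

        list_for_each_entry(vq, queues, node) {
            if (vq_has_pending(vq)) {
                vq_process_one(vq);     /* one request, then move on */
                did_work = true;
            }
        }
        if (!did_work)
            cond_resched();             /* or keep polling; see the next slides */
    }
    return 0;
}
```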
9 / 29
10 / 29
[Timeline: exit-based I/O; the VCPU thread on Core X exits guest-to-hypervisor to send an I/O notification to the I/O thread on Core Y, which processes and completes the I/O request; a host-to-guest notification then brings the VCPU back into the guest]
[Timeline: the same flow with polling and exitless virtual interrupt injection (via ELI); the I/O thread on Core Y detects the request by polling and completes it with no guest/hypervisor transitions on the VCPU thread]
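The polling half of the timeline can be sketched as below. The struct is a simplified stand-in: it assumes the avail index is directly mapped into the worker's address space, whereas real vhost reads it from guest memory through its own accessors.

```c
#include <linux/compiler.h>    /* READ_ONCE */
#include <linux/processor.h>   /* cpu_relax */
#include <linux/types.h>

struct polled_vq {             /* simplified stand-in for a virtqueue */
    u16 *avail_idx;            /* guest-written producer index (shared memory) */
    u16 last_avail_idx;        /* last index this worker consumed */
};

static void poll_for_work(struct polled_vq *vq)
{
    /* The guest publishes buffers by bumping avail_idx in shared
     * memory; no VM exit and no ioeventfd signal are needed. */
    while (READ_ONCE(*vq->avail_idx) == vq->last_avail_idx)
        cpu_relax();

    /* New descriptors are ready.  With ELI (or posted interrupts,
     * next slide) the completion can also be injected without an exit. */
}
```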
11 / 29
– Intel VT-d posted-interrupts (PI), which may be used to deliver interrupts to the guest without VM exits
12 / 29
– Stabilize a next-generation vhost design
– Introduce a shared vhost design and run benchmarks
13 / 29
– Add a function to search all cgroups in all hierarchies for the new process
– Even a single mismatch => create a new vhost worker
– What happens when a VM process is migrated to a different cgroup?
– Can we optimize the cgroup search? (see the sketch after the diagram)
– What happens if we use polling?
– Rethink cgroups integration?
[Diagram: per-device vhost workers (each guest G1–G3 gets its own worker, inheriting its cgroups CG1–CG3) vs. a shared vhost worker interleaving work items WG1–WG3 from guests across different cgroups]
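One possible answer to "can we optimize the cgroup search?" is sketched below: tasks that are in the same cgroup in every hierarchy share the same css_set, so a single pointer comparison can replace walking all hierarchies. The helper name is hypothetical.

```c
#include <linux/cgroup.h>
#include <linux/sched.h>

/* Hypothetical helper: may the existing worker be shared with this
 * owner?  A single cgroup mismatch in any hierarchy means "no". */
static bool vhost_cgroups_match(struct task_struct *owner,
                                struct task_struct *worker)
{
    bool match;

    rcu_read_lock();
    /* Tasks with identical cgroup membership across all hierarchies
     * point to the same css_set, so one compare answers the search. */
    match = task_css_set(owner) == task_css_set(worker);
    rcu_read_unlock();

    return match;    /* false => create a new vhost worker */
}
```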
14 / 29
– Yes, but it will require the polling thread to take …
15 / 29
– … sharing!)
– No cgroups support (at least not yet; WIP)
– Minimal control once work enters the workqueue
– Again, no cgroups support :(
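For contrast, here is a sketch of what routing vhost work through a shared workqueue would look like, assuming an illustrative work-item struct; the workqueue API is real, but once an item is queued there is no per-guest control over which kworker (or cgroup) runs it.

```c
#include <linux/init.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

static struct workqueue_struct *vhost_wq;

struct vhost_work_item {            /* illustrative */
    struct work_struct work;
    void *vq;                       /* the virtqueue to service */
};

static void vhost_wq_fn(struct work_struct *work)
{
    struct vhost_work_item *item =
        container_of(work, struct vhost_work_item, work);

    /* Tx/Rx processing would happen here, executed by an anonymous
     * kworker thread: no cgroup inheritance, minimal control. */
    kfree(item);
}

static int vhost_queue_kick(void *vq)
{
    struct vhost_work_item *item = kzalloc(sizeof(*item), GFP_KERNEL);

    if (!item)
        return -ENOMEM;
    item->vq = vq;
    INIT_WORK(&item->work, vhost_wq_fn);
    queue_work(vhost_wq, &item->work);   /* sharing for free... */
    return 0;
}

static int __init vhost_wq_init(void)
{
    vhost_wq = alloc_workqueue("vhost", WQ_UNBOUND, 0);
    return vhost_wq ? 0 : -ENOMEM;
}
```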
16 / 29
– A little old but significant
– Includes testing for Exit-Less Interrupts (ELI) and polling
– Linux kernel 3.1
– IBM System x3550 M4, two 8-core sockets of Intel Xeon E5-2660 @ 2.2 GHz, 56 GB RAM, Intel x520 dual-port 10 Gbps NIC
– QEMU 0.14
– Throughput: netperf TCP stream with 64-byte messages
– Latency: netperf UDP RR
17 / 29
[Chart: netperf TCP stream; x-axis: # VMs (1–7); y-axis: throughput (Gbps, 2–10); series: elvis-poll-pi, elvis-poll, elvis, baseline, baseline-affinity]
[Chart: netperf UDP RR; x-axis: # VMs (1–7); y-axis: latency (msec, 0–80); series: baseline, elvis, elvis-poll, elvis-poll-pi]
18 / 29
[Chart: netperf UDP RR; x-axis: # VMs (1–7); y-axis: relative latency (0.75–1.05); series: elvis, elvis-poll, elvis-poll-pi]
[Chart: netperf TCP stream; x-axis: # VMs (1–7); y-axis: relative throughput (0.8–1.4); series: elvis-poll-pi, elvis-poll, elvis]
19 / 29
– Point-to-point network connection
– Netperf TCP throughput (STREAM & MAERTS)
– Netperf TCP Request Response
20 / 29
21 / 29
22 / 29
23 / 29
24 / 29
25 / 29
[Charts repeated from slide 17: netperf TCP stream throughput (Gbps) and netperf UDP RR latency (msec) vs. # VMs (1–7); series: baseline, baseline-affinity, elvis, elvis-poll, elvis-poll-pi]
26 / 29
27 / 29
(TCP RR)
28 / 29
(memcached)
29 / 29
(apachebench)