 
Xen and the Art of Virtualization. Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield. University of Cambridge Computer Laboratory, SOSP 2003. Presenter: Dhirendra Singh Kholia
Outline • What is Xen? • Xen: Goals, Challenges and Approach • Detailed Design • Benchmarks (skip?) • Xen Today • Conclusion • Discussion
What is Xen? • Xen is a virtual machine monitor (VMM) for the x86, x86-64, Itanium and PowerPC architectures. Xen can securely execute multiple virtual machines, each running its own OS, on a single physical system with close-to-native performance. • It is a Type-1 (native, bare-metal) hypervisor: it runs directly on the host's hardware, controlling the hardware and monitoring the guest operating systems.
Xen Goals • Performance isolation between guests (resource control for some guarantee of QoS) • Minimal performance overhead • Support for different operating systems • Maintain the guest OS ABI (thus allowing existing applications to run unmodified) • Support full multi-application operating systems
x86 CPU virtualization • x86: most successful architecture ever! • Easy: has built-in privilege levels/protection rings (Ring 0, Ring 1, Ring 2, Ring 3). Rings 1 and 2 are unused by mainstream x86 OSes. • Hard: – The VMM needs to run at the highest privilege level (Ring 0) to provide isolation, resource scheduling and performance, BUT guest kernels are also designed to run in Ring 0. – Executing certain sensitive instructions (aka non-virtualizable instructions) without sufficient privilege causes silent failures instead of generating a “convenient” trap (GPF) to the VMM. Thus, the VMM never gets an opportunity to simulate the effect of the instruction. Source: Ring Diagrams: http://duartes.org/gustavo/blog/post/cpu-rings-privilege-and-protection
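POPF is a classic example of such a non-trapping sensitive instruction. The toy model below is a simplification: it tracks only the IF (interrupt enable) and IOPL bits of EFLAGS in 32-bit protected mode and ignores virtual-8086 mode. It shows why trap-and-emulate breaks once a guest kernel is pushed out of Ring 0: the attempt to disable interrupts is silently dropped and no trap reaches the VMM.

```python
# Toy model of POPF in 32-bit protected mode (virtual-8086 mode ignored).
# Only the IF and IOPL fields are treated specially; all other bits of
# new_flags are passed through unchanged.

IF_MASK = 1 << 9          # EFLAGS.IF: maskable interrupts enabled
IOPL_MASK = 0b11 << 12    # EFLAGS.IOPL: I/O privilege level (bits 12-13)


def popf(cpl, old_flags, new_flags):
    """Return EFLAGS after POPF executed at current privilege level `cpl`.

    No exception is ever raised: when the caller lacks privilege, the
    protected bits simply keep their old values.
    """
    iopl = (old_flags & IOPL_MASK) >> 12
    protected = 0
    if cpl > 0:
        protected |= IOPL_MASK    # only Ring 0 may change IOPL
    if cpl > iopl:
        protected |= IF_MASK      # changing IF requires CPL <= IOPL
    return (new_flags & ~protected) | (old_flags & protected)
```

For example, `popf(0, IF_MASK, 0)` clears IF as the kernel intended, while the same call at `cpl=1` returns `IF_MASK`: interrupts stay enabled and the VMM is never told.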
x86 CPU virtualization approaches 1 • Full Virtualization (VMware Workstation, presents virtual resources) – Doesn’t require guest OS modifications – Uses “binary translation”: a technique that dynamically rewrites guest OS kernel code in order to catch non-trapping sensitive instructions. – Relatively lower performance (translation overhead, page table synchronization and update overhead) – Time synchronization can be problematic (lost ticks, backlog truncation), frequently requiring a guest tool to maintain synchronization.
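The core idea of binary translation can be sketched on a toy instruction stream. This is only an illustration under strong assumptions: instructions are plain strings rather than machine code, the `SENSITIVE` set is a small sample of known non-trapping sensitive x86 instructions, and the `vmm_emulate_*` call targets are hypothetical names.

```python
# Toy binary translator: instructions are strings, not machine code.
# SENSITIVE is a sample of non-trapping sensitive x86 instructions;
# the "vmm_emulate_<op>" call targets are hypothetical.

SENSITIVE = {"popf", "pushf", "sgdt", "sidt", "smsw"}


def translate_block(block):
    """Rewrite one basic block of guest-kernel code before it runs.

    Innocuous instructions are copied unchanged, so they still execute
    at native speed; each sensitive one becomes a call into the VMM,
    which emulates it against the virtual CPU state.
    """
    out = []
    for insn in block:
        opcode = insn.split()[0]
        if opcode in SENSITIVE:
            out.append(f"call vmm_emulate_{opcode}")
        else:
            out.append(insn)
    return out
```

Only sensitive instructions pay an emulation cost; the translation pass itself is part of the overhead the slide mentions.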
x86 CPU virtualization approaches 2 • Paravirtualization (Xen, presents virtual + real resources) – Requires modifications to the guest OS kernel. – Improved performance (real hardware is partially exposed to the guest, and the guest is modified once rather than translated at run time) – Exposing real time allows correct handling of time-critical tasks such as TCP timeouts and RTT estimates. • Hardware-Assisted Virtualization – Conceptually, it can be understood as adding a “Ring -1”, more privileged than Ring 0, in which the hypervisor executes and can trap and emulate privileged instructions – Allows a much cleaner implementation of full virtualization.
Full Virtualization vs. Paravirtualization [Figure: ring diagrams comparing full virtualization (VMM with binary translation in Ring 0, deprivileged guest OS above it, user applications in Ring 3) with paravirtualization (Xen in Ring 0, guest OS and Dom0 in Ring 1, control plane and user apps in Ring 3)] Source: http://www.cs.uiuc.edu/homes/kingst/spring2007/cs598stk/slides/20070201-kelm-thompson-xen.ppt
Cost of Porting/Paravirtualizing an OS • x86-dependent code (privileged instructions + page table access) • Virtual network driver, virtual block device driver • Xen code (schedulers, hypercall implementation etc.) • For Linux 2.4, < 1.5% (around 3000 lines) of the x86 code base was modified/added. • How much modification of the guest OS is too much? Is several thousand lines of code per operating system actually minimal effort? - Considering that the Linux kernel is around 11.5 million lines of code (Source: Linux Foundation, August 2009), I think a few thousand lines of code is minimal.
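The figures on this slide can be checked with a little arithmetic. Here $N_{x86}$ is a symbol introduced for the size of the x86-specific part of the Linux 2.4 tree. The second line is only an order-of-magnitude comparison, since the 3000 lines refer to Linux 2.4 while the 11.5 million lines describe a 2009 kernel.

```latex
\begin{aligned}
\frac{3000}{N_{x86}} < 0.015 \;&\Longrightarrow\; N_{x86} > \frac{3000}{0.015} = 200{,}000 \text{ lines} \\
\frac{3000}{11.5 \times 10^{6}} &\approx 2.6 \times 10^{-4} \approx 0.026\%
\end{aligned}
```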
Paravirtualization: Xen’s approach 1 • Xen runs in Ring 0, the modified guest kernel runs in Ring 1, and guest applications run unmodified in Ring 3 (hence the guest OS remains protected from its applications). • The guest OS kernel must be modified to use a special hypercall ABI instead of executing privileged and sensitive instructions directly. A hypercall (int 0x82) is a software trap from a domain to the hypervisor, just as a system call (int 0x80) is a software trap from user space to the kernel. E.g. when the system is idle, Linux issues the HLT instruction, which requires Ring 0 privilege to execute. In XenoLinux this is replaced by a hypercall that transfers control from Ring 1 to Xen in Ring 0.
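The hypercall path can be pictured as a dispatch table inside the hypervisor. The sketch below is a toy model: the hypercall numbers, the handler names (`hc_block`, `hc_console_write`), the `Domain` class and the error value are all hypothetical. Only the shape is meant to match the slide: the guest traps, Xen looks up a handler by number, runs it in Ring 0, and returns a result to Ring 1.

```python
# Toy hypercall dispatch. Numbers, handler names, the Domain class and
# the -38 error code (ENOSYS-style) are hypothetical illustrations.

from dataclasses import dataclass, field


@dataclass
class Domain:
    name: str
    runnable: bool = True
    log: list = field(default_factory=list)


def hc_block(dom):
    # Paravirtual replacement for HLT: the guest tells Xen it is idle,
    # so Xen can deschedule it until the next event arrives.
    dom.runnable = False
    dom.log.append("blocked until next event")
    return 0


def hc_console_write(dom, text):
    dom.log.append(f"console: {text}")
    return len(text)


HYPERCALL_TABLE = {0: hc_block, 1: hc_console_write}


def hypercall_trap(dom, number, *args):
    """Entry point for the guest's software trap (int 0x82 on the slide)."""
    handler = HYPERCALL_TABLE.get(number)
    if handler is None:
        return -38  # unknown hypercall: reject it instead of crashing
    return handler(dom, *args)
```

An idle XenoLinux-style guest would call `hypercall_trap(dom, 0)` where native Linux would execute HLT.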
Paravirtualization: Xen’s approach 2 • Xen is mapped into the top 64MB (on x86) of every guest OS’s address space. This avoids a TLB flush when switching from Ring 1 to Ring 0 (entering the VMM) and back. Xen itself is protected by segmentation. • Trap/exception handlers (system call, page fault) are registered with Xen, which validates them. • A guest OS may install a “fast” exception handler for system calls, allowing direct calls from an application into its guest OS and avoiding indirection through Xen on every call.
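The layout arithmetic for 32-bit x86 is simple: reserving the top 64MB of a 4GB address space puts Xen at 0xFC000000. The check below is a hypothetical sketch of the kind of validation Xen applies to guest-installed segment descriptors so that no guest segment can reach that region. The function name is made up and the granularity bit is ignored.

```python
# 32-bit x86 address-space layout with Xen in the top 64 MB.
# validate_segment() is a hypothetical sketch of the segment check.

ADDRESS_SPACE = 2**32            # 4 GB virtual address space
XEN_RESERVED = 64 * 2**20        # 64 MB reserved for the hypervisor
XEN_START = ADDRESS_SPACE - XEN_RESERVED


def validate_segment(base, limit):
    """Accept a guest segment only if it ends below Xen's region.

    `limit` is the offset of the last addressable byte (granularity
    bit ignored), so base + limit is the last byte the segment covers.
    """
    end = base + limit
    return 0 <= base <= end < XEN_START
```

A conventional flat 4GB segment (`validate_segment(0, 0xFFFFFFFF)`) is rejected, while one that stops just below `XEN_START` is accepted.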
Paravirtualization: Xen’s approach Source: http://www.linuxjournal.com/article/8540
Control Transfer: Hypercalls and Events • Events: asynchronous notifications from Xen to a guest OS – E.g. data arrival on the network; virtual disk transfer complete • Events replace device interrupts! • Hypercalls: synchronous calls from a guest OS to Xen (similar to system calls) – E.g. a batch of page table updates
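In the Xen paper, pending events are kept in a per-domain bitmask that Xen updates before invoking a guest-registered callback, and the guest can defer delivery by setting a flag that Xen reads (the paravirtual analogue of disabling interrupts). The toy model below follows that description. The class and method names are hypothetical, and for brevity the model clears the pending set itself, whereas in Xen the guest's handler resets it.

```python
# Toy model of Xen event delivery: pending bitmask + guest callback +
# guest-set mask flag. Class and method names are hypothetical.

class EventChannelState:
    def __init__(self, callback):
        self.pending = 0          # bit i set => event i is pending
        self.masked = False       # written by the guest, read by Xen
        self.callback = callback  # guest's event-callback handler

    def send(self, event):
        """Called by Xen, e.g. when a packet arrives for this domain."""
        self.pending |= 1 << event
        self.deliver()

    def deliver(self):
        if self.masked or not self.pending:
            return  # deferred: events stay pending
        # Clear the set on the handler's behalf (in Xen the handler does it).
        events, self.pending = self.pending, 0
        self.callback(events)

    def unmask(self):
        """Guest re-enables delivery; queued events are delivered now."""
        self.masked = False
        self.deliver()
```

Events sent while the guest is masked are coalesced into one callback when it unmasks, much as a real interrupt line stays asserted until serviced.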
I/O Rings: Data Transfer • A message-passing abstraction built on top of memory shared between Xen and a domain • Networking example: a domain (request producer) supplies buffers using “requests”, and Xen (response producer) provides “responses” to signal the arrival of packets into those buffers. To do this efficiently (avoiding a copy of packet data from Xen to domain pages), Xen exchanges its packet buffer with an unused page frame that the domain must supply!
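The Xen paper describes each ring as a circular queue of descriptors with two producer/consumer pointer pairs. The guest advances the request producer and response consumer, and Xen advances the request consumer and response producer. A minimal single-threaded sketch, assuming free-running indices reduced modulo the ring size and requests and responses sharing the same slots (class and method names are hypothetical):

```python
# Toy I/O ring with the four pointers described in the Xen paper.
# Guest advances req_prod and resp_cons; Xen advances req_cons and
# resp_prod. Indices run freely and are reduced modulo the ring size.

class IORing:
    def __init__(self, size):
        self.slots = [None] * size
        self.size = size
        self.req_prod = self.req_cons = 0
        self.resp_prod = self.resp_cons = 0

    # guest side
    def put_request(self, req):
        # a slot is reusable only after its response has been consumed
        if self.req_prod - self.resp_cons == self.size:
            return False  # ring full
        self.slots[self.req_prod % self.size] = req
        self.req_prod += 1
        return True

    def get_response(self):
        if self.resp_cons == self.resp_prod:
            return None
        resp = self.slots[self.resp_cons % self.size]
        self.resp_cons += 1
        return resp

    # Xen side
    def get_request(self):
        if self.req_cons == self.req_prod:
            return None
        req = self.slots[self.req_cons % self.size]
        self.req_cons += 1
        return req

    def put_response(self, resp):
        # responses overwrite slots whose requests Xen has already consumed
        assert self.resp_prod < self.req_cons
        self.slots[self.resp_prod % self.size] = resp
        self.resp_prod += 1
```

In the networking example the guest posts empty receive buffers with `put_request`, and Xen later fills slots with `put_response` once packets have landed in them.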
MMU virtualization • VMware solution (shadow page tables, slow) - Two sets of page tables are maintained. - The guest’s virtual page tables aren’t visible to the MMU. - The hypervisor traps updates to the virtual page tables, validates them, and propagates the changes to the MMU’s ‘shadow’ page tables. • Xen solution (direct page table access) - The guest OS has read-only access to the real page tables. - Page table updates must still go through the hypervisor, which validates them. - Guest OSes allocate and manage their own page tables using hypercalls. - The OS must not give itself unrestricted page table access, access to hypervisor space, or access to other VMs’ memory.
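The validation rules on this slide can be sketched as a batched update hypercall. Following the Xen paper's idea that every machine frame has an owner and a type, and that a frame may never be a page table and writable at the same time, the toy below checks a whole batch before applying it. The `Frame` layout, the `mmu_update` name and the all-or-nothing policy are hypothetical simplifications; pinning and reference-count-driven type changes are not modelled.

```python
# Toy validator for batched guest page-table updates. Frame ownership
# and typing follow the Xen paper's idea; names and layout are
# hypothetical.

from dataclasses import dataclass, field


@dataclass
class Frame:
    owner: str
    ftype: str = "none"   # "page_table", "writable" or "none"
    refcount: int = 0
    entries: dict = field(default_factory=dict)  # slot -> (frame, writable)


def mmu_update(frames, dom, updates):
    """Apply a batch of (pt_frame, slot, target_frame, writable) updates.

    All updates are validated before any is applied; returns how many
    were applied. Raises PermissionError on the first invalid update.
    """
    for pt, _slot, target, writable in updates:
        if frames[pt].owner != dom or frames[pt].ftype != "page_table":
            raise PermissionError(f"frame {pt} is not a page table of {dom}")
        if frames[target].owner != dom:
            raise PermissionError(f"frame {target} belongs to another domain")
        if writable and frames[target].ftype == "page_table":
            raise PermissionError(f"frame {target} is a page table")
    for pt, slot, target, writable in updates:
        frames[pt].entries[slot] = (target, writable)
        if writable:
            frames[target].ftype = "writable"
        frames[target].refcount += 1
    return len(updates)
```

Batching matters for performance: one hypercall validates many entries, which amortizes the cost of entering Xen.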
Networking • Xen provides a Virtual Firewall-Router (VFR). • Each domain has one or more virtual network interfaces (VIFs) attached to the VFR. • Each VIF has two I/O buffer descriptor rings (one for transmit, one for receive). • Transmit: the domain updates the transmit descriptor ring; Xen copies the descriptor and the packet header, and the header is inspected by the VFR. Payload copying is avoided by using gather DMA in the NIC driver. • Receive: copying is avoided using page flipping.
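Receive-side page flipping can be sketched as an ownership swap. Instead of copying the packet, Xen gives the domain the page the NIC wrote into and takes back a free page the domain had posted on its receive ring. Following the Xen paper, if no free page is available the packet is dropped. Page numbers, the ownership dictionary and the function name are hypothetical.

```python
# Toy receive-side page flip. Page numbers, the owner map and the
# function name are hypothetical.

from collections import deque


def receive_packet(page_owner, rx_free_pages, dom, packet_page):
    """Flip `packet_page` (currently Xen's) into `dom` without copying.

    Returns the page now holding the packet for the guest, or None if
    the domain posted no free page, in which case the packet is dropped.
    """
    if not rx_free_pages:
        return None                    # no buffer supplied: drop
    spare = rx_free_pages.popleft()    # unused frame the guest supplied
    page_owner[spare] = "xen"          # Xen keeps it for future DMA
    page_owner[packet_page] = dom      # guest receives the data, zero copy
    return packet_page
```

The exchange keeps each domain's memory allocation constant: it gains one page with data and gives up one empty page.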
Disk • Only Domain0 has direct access to disks. • Other domains use virtual block devices (VBDs) – via the I/O ring – The guest I/O scheduler reorders requests before enqueuing them on the ring – Xen can also reorder requests to improve performance • Zero-copy data transfer uses DMA between the disk and pinned memory pages in the requesting domain.
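The reordering step can be illustrated with a plain ascending-sector sort standing in for a real elevator scheduler. The Xen paper also lets domains pass down reorder barriers (for example, to keep journal writes ordered), modelled here as a `BARRIER` marker that requests may not be moved across. The request format and names are hypothetical.

```python
# Toy request reordering before enqueuing on the I/O ring. A sector sort
# stands in for a real elevator; BARRIER models a reorder barrier.
# Request format (op, sector) is hypothetical.

BARRIER = "barrier"


def reorder(requests):
    """Sort (op, sector) requests by sector within barrier-delimited runs."""
    out, run = [], []
    for req in requests:
        if req == BARRIER:
            out.extend(sorted(run, key=lambda r: r[1]))
            out.append(BARRIER)
            run = []
        else:
            run.append(req)
    out.extend(sorted(run, key=lambda r: r[1]))
    return out
```

Requests on either side of a barrier keep their relative order, so a guest can still enforce write ordering where correctness depends on it.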
Xen Architecture Source: http://www.arunviswanathan.com/content/ppts/xen_virt.pdf
Domain 0: Control and Management • Separation of mechanism and policy • Domain0 hosts the application-level management software, which uses control interfaces provided by Xen. • It can create/terminate other domains, control scheduling and CPU and memory allocation, and create VIFs and VBDs. Managed parameters include access control (for I/O devices), the amount of physical memory per domain, VFR rules, etc.
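In the later Xen 3.x toolstack, these Domain0 control operations were typically driven by the `xm` tool, which reads a per-domain configuration file written in Python syntax (started with `xm create <file>`). The fragment below is an illustrative example, not taken from the paper. The domain name, kernel path, volume path and sizes are hypothetical; it shows where per-domain memory, a VIF and a VBD are declared.

```python
# Example xm domain configuration (Xen 3.x toolstack, Python syntax).
# All names, paths and sizes are hypothetical.
name = "guest1"
kernel = "/boot/vmlinuz-2.6-xen"           # paravirtualized guest kernel
memory = 256                               # MB of physical memory for the domain
vcpus = 1
vif = ["bridge=xenbr0"]                    # one VIF attached to bridge xenbr0
disk = ["phy:/dev/vg0/guest1,xvda1,w"]     # host volume exported as a writable VBD
root = "/dev/xvda1 ro"
```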
I/O Handling • Dom0 runs the backend of each device, which is exported to each domain via a frontend • netback, netfront for network devices (NICs) • blkback, blkfront for block devices • PCI passthrough exists for other kinds of devices (e.g. sound)
Driver Architecture Source: http://www.linuxjournal.com/article/8909
Benchmarks (all taken from Ian Pratt’s presentation in 2006) In short, Xen provides close to native performance!
MMU Micro-Benchmarks
TCP Benchmarks
Xen Today (Xen 3.x) • Xen 3.x can run unmodified guest OSes using hardware-assisted virtualization (Intel VT, AMD-V) • Supports NetBSD, OpenSolaris and Linux 2.4/2.6 both as guest and as host (Dom0). Runs FreeBSD and Windows (using HVM) as guests. • Live migration of VMs between Xen hosts. • x86/x86-64/Itanium/PowerPC support, SMP guest support (64-way!), enhanced power management, XenCenter for management. • Awesome hardware support! (the ESX HCL is very limited). • DomU (paravirtualization) patches were merged in Linux 2.6.23. • Dom0 patches are still struggling to get merged upstream (KVM is gaining support!)
Xen 3.0 Architecture
Recommend
More recommend