Xen and the Art of Virtualization. Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield. University of Cambridge Computer Laboratory, SOSP 2003. Presenter: Dhirendra Singh Kholia


slide-1
SLIDE 1

Xen and the Art of Virtualization

Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield. University of Cambridge Computer Laboratory, SOSP 2003

Presenter: Dhirendra Singh Kholia

slide-2
SLIDE 2

Outline

  • What is Xen?
  • Xen: Goals, Challenges and Approach
  • Detailed Design
  • Benchmarks (skip?)
  • Xen Today
  • Conclusion
  • Discussion
slide-3
SLIDE 3

What is Xen?

  • Xen is a virtual machine monitor (VMM) for x86, x86-64, Itanium and PowerPC architectures. Xen can securely execute multiple virtual machines, each running its own OS, on a single physical system with close-to-native performance.
  • It is a Type-1 (native, bare-metal) hypervisor. It runs directly on the host's hardware as a hardware control and guest operating system monitor.

slide-4
SLIDE 4

Xen Goals

  • Performance isolation between guests (resource control for some guarantee of QoS)
  • Minimal performance overhead
  • Support for different operating systems
  • Maintain the Guest OS ABI (thus allowing existing applications to run unmodified)
  • Need to support full multi-application operating systems
slide-5
SLIDE 5

x86 CPU virtualization

  • x86: most successful architecture ever!
  • Easy: has built-in privilege levels/protection rings (Ring 0, Ring 1, Ring 2, Ring 3). Ring 1 and Ring 2 are unused by most operating systems.
  • Hard:

– The VMM needs to run at the highest privilege level (Ring 0) to provide isolation, resource scheduling and performance, BUT guest kernels are also designed to run in Ring 0.

  • Running certain sensitive instructions (aka non-virtualizable instructions) without sufficient privilege causes silent failures instead of generating a “convenient” trap (GPF) to the VMM. Thus, the VMM never gets an opportunity to simulate the effect of the instruction.

Source: Ring Diagrams: http://duartes.org/gustavo/blog/post/cpu-rings-privilege-and-protection
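The difference between a trapping privileged instruction and a silently failing sensitive one can be illustrated with a toy model. This is only a sketch: `ToyCPU`, `run_deprivileged` and the choice of POPF as the silent example are assumptions of this illustration, not something the slide names, and real x86 semantics (IOPL, flag handling) are heavily simplified.

```python
from dataclasses import dataclass


class GeneralProtectionFault(Exception):
    """Raised when a privileged instruction runs outside Ring 0."""


@dataclass
class ToyCPU:
    ring: int                       # current privilege level, 0 = most privileged
    interrupts_enabled: bool = True

    def hlt(self) -> None:
        # Privileged instruction: traps when executed outside Ring 0.
        if self.ring != 0:
            raise GeneralProtectionFault("HLT outside Ring 0")

    def popf(self, want_interrupts_enabled: bool) -> None:
        # Sensitive but unprivileged in this toy model: outside Ring 0 the
        # interrupt-flag change is silently dropped and no trap is raised.
        if self.ring == 0:
            self.interrupts_enabled = want_interrupts_enabled


def run_deprivileged(instruction, *args) -> str:
    """Run one instruction and report whether a VMM would have seen a trap."""
    try:
        instruction(*args)
        return "no trap"
    except GeneralProtectionFault:
        return "trapped to VMM"
```

A guest kernel pushed out to Ring 1 that calls `cpu.hlt` is caught by the VMM, while `cpu.popf(False)` returns normally with interrupts still enabled, so the VMM never learns that the guest tried to disable them.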

slide-6
SLIDE 6

x86 CPU virtualization approaches 1

  • Full Virtualization (VMware Workstation, presents virtual resources)

– Doesn’t require Guest OS modifications.
– Uses “binary translation”: a technique that dynamically rewrites Guest OS kernel code in order to catch non-trapping privileged instructions.
– Relatively lower performance (translation overhead, page table synchronization and update overhead).
– Time synchronization can be problematic (lost ticks, backlog truncation), frequently requiring a guest tool to maintain synchronization.

slide-7
SLIDE 7

x86 CPU virtualization approaches 2

  • Paravirtualization (Xen, presents Virtual + Real resources)

– Requires modifications to the Guest OS’s kernel.
– Improved performance (due to exposure of real hardware, one-time guest modification).
– Exposing real time allows correct handling of time-critical tasks such as TCP timeouts and RTT estimates.

  • Hardware Assisted Virtualization

– Conceptually, it can be understood as adding a Ring -1, more privileged than Ring 0, in which the hypervisor executes and can trap and emulate privileged instructions.
– Allows for a much cleaner implementation of full virtualization.

slide-8
SLIDE 8

Full Virtualization vs. Paravirtualization

[Figure: ring diagrams (Ring 0 to Ring 3) comparing Full Virtualization (VMM with binary translation, Guest OS, user applications) with Paravirtualization (Xen, Guest OS, Dom0 control plane, user apps)]

Source: http://www.cs.uiuc.edu/homes/kingst/spring2007/cs598stk/slides/20070201-kelm-thompson-xen.ppt

slide-9
SLIDE 9

Cost of Porting/Paravirtualizing an OS

  • x86 dependent (privileged instructions + page table access)
  • Virtual network driver, virtual block device driver
  • Xen code (schedulers, hypercall implementation, etc.)
  • For Linux 2.4, < 1.5% (around 3000 lines) of the x86 code base was modified/added.
  • How much modification of the Guest OS is too much? Is several thousand lines of code per operating system actually minimal effort?

– Considering that the Linux kernel is around 11.5 million lines of code (Source: Linux Foundation, August 2009), I think a few thousand lines of code is minimal.

slide-10
SLIDE 10

Paravirtualization: Xen’s approach 1

  • Xen runs in Ring 0, the modified Guest Kernel runs in Ring 1 and Guest Applications run unmodified in Ring 3 (hence the Guest OS remains protected).
  • The Guest OS Kernel must be modified to use a special hypercall ABI instead of executing privileged and sensitive instructions directly. A hypercall (int 0x82) is a software trap from a domain to the hypervisor, just as a syscall (int 0x80) is a software trap from user space to the kernel. E.g. when the system is idle, Linux issues the HLT instruction, which requires Ring 0 privilege to execute. In XenoLinux this is replaced by a hypercall which transfers control from Ring 1 to Xen in Ring 0.
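The idle-loop example can be sketched as a dispatch table, in the same way a kernel dispatches system calls. This is a minimal illustration, not Xen's real interface: `ToyHypervisor`, the `"sched_block"` name and `guest_idle` are hypothetical names chosen for the sketch.

```python
class ToyHypervisor:
    """Toy Ring 0 component that services hypercalls from guest kernels."""

    def __init__(self) -> None:
        self.blocked = set()  # domains that asked to be descheduled
        # Hypercall table: name -> handler, analogous to a syscall table.
        self._table = {"sched_block": self._sched_block}

    def hypercall(self, domain_id: int, name: str, *args) -> int:
        # Stand-in for the software trap from the guest kernel into Ring 0.
        handler = self._table.get(name)
        if handler is None:
            return -1  # unknown hypercalls are rejected, never executed
        return handler(domain_id, *args)

    def _sched_block(self, domain_id: int) -> int:
        # The hypervisor, not the guest, decides what "idle" means for the CPU.
        self.blocked.add(domain_id)
        return 0


def guest_idle(hv: ToyHypervisor, domain_id: int) -> int:
    """Paravirtualized idle path: ask the hypervisor instead of executing HLT."""
    return hv.hypercall(domain_id, "sched_block")
```

The point of the design is that the guest never executes the privileged instruction itself; it states its intent and the hypervisor validates and carries it out.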

slide-11
SLIDE 11

Paravirtualization: Xen’s approach 2

  • Xen is mapped into the top 64MB (for x86) of every OS’s address space. This is done to save a TLB flush when going from Ring 1 to Ring 0 (VMM). Xen itself is protected by segmentation.
  • Trap/exception (system call, page-fault) handlers are registered with Xen for validation.
  • A Guest OS may install a “fast” exception handler for system calls, allowing direct calls from an application into its guest OS and avoiding indirection through Xen on every call.

slide-12
SLIDE 12

Paravirtualization: Xen’s approach

Source: http://www.linuxjournal.com/article/8540

slide-13
SLIDE 13

Control Transfer: Hypercalls and Events

  • Events: notifications from Xen to the guest OS

– E.g. data arrival on the network; virtual disk transfer complete

  • Events replace device interrupts!
  • Hypercalls: synchronous calls from the guest OS to Xen (similar to system calls)

– E.g. a set of page table updates
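Event delivery can be pictured as a per-domain bitmask of pending events that the guest drains through registered callbacks, much like interrupt handlers. The sketch below is a toy model under that assumption; `ToyEventChannel`, its port numbering and the `deferred` flag (analogous to disabling interrupts) are hypothetical names for this illustration.

```python
class ToyEventChannel:
    """Per-domain pending-event bitmask with guest-side callbacks."""

    def __init__(self, n_ports: int = 32) -> None:
        self.n_ports = n_ports
        self.pending = 0        # bit i set: event i is waiting for the guest
        self.deferred = False   # guest-set flag, like disabling interrupts

    def notify(self, port: int) -> None:
        # Hypervisor side: mark the event pending; no data travels with it.
        if not 0 <= port < self.n_ports:
            raise ValueError("no such event port")
        self.pending |= 1 << port

    def deliver(self, handlers: dict) -> list:
        # Guest side: run the registered callback for every pending event.
        if self.deferred:
            return []
        handled = []
        for port in range(self.n_ports):
            if self.pending & (1 << port):
                self.pending &= ~(1 << port)  # guest clears the pending bit
                handlers[port]()
                handled.append(port)
        return handled
```

Events only signal that something happened; the data itself (a received packet, a completed disk request) travels separately through shared memory.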

slide-14
SLIDE 14

I/O Rings : Data Transfer

Networking example: a Domain (request producer) supplies buffers using “requests”, and Xen (response producer) provides “responses” to signal the arrival of packets into those buffers. To do this efficiently (avoiding a copy of packet data from Xen to Domain pages), Xen exchanges its packet buffer with an unused page frame, which has to be supplied by the Domain! This is a sort of message-passing abstraction built on top of Xen's shared-memory (SHM) IPC.
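The request/response ring can be sketched as a fixed-size circular buffer with two pairs of producer/consumer indices. This is a simplified model, assuming responses are produced in the same order as requests (a real ring may tag requests with identifiers so they can complete out of order); `ToyIORing` and its method names are hypothetical.

```python
class ToyIORing:
    """Shared ring: the domain produces requests, Xen produces responses."""

    def __init__(self, size: int = 8) -> None:
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0  # advanced by the domain when it posts a request
        self.req_cons = 0  # advanced by Xen when it takes a request
        self.rsp_prod = 0  # advanced by Xen when it posts a response
        self.rsp_cons = 0  # advanced by the domain when it takes a response

    def put_request(self, request) -> bool:
        # Full when requests are a whole ring ahead of consumed responses.
        if self.req_prod - self.rsp_cons >= self.size:
            return False
        self.slots[self.req_prod % self.size] = request
        self.req_prod += 1
        return True

    def take_request(self):
        if self.req_cons == self.req_prod:
            return None
        request = self.slots[self.req_cons % self.size]
        self.req_cons += 1
        return request

    def put_response(self, response) -> bool:
        # A response reuses the slot of a request Xen has already consumed.
        if self.rsp_prod >= self.req_cons:
            return False
        self.slots[self.rsp_prod % self.size] = response
        self.rsp_prod += 1
        return True

    def take_response(self):
        if self.rsp_cons == self.rsp_prod:
            return None
        response = self.slots[self.rsp_cons % self.size]
        self.rsp_cons += 1
        return response
```

Because each index is written by exactly one side, the two parties can share the ring without locking; in this sketch the indices grow without bound and are reduced modulo the ring size only when a slot is accessed.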

slide-15
SLIDE 15

MMU virtualization

  • VMware Solution (Shadow Page Tables, slow)

– Two sets of page tables are maintained.
– The guest virtual page tables aren’t visible to the MMU.
– The hypervisor traps virtual page table updates and is responsible for validating them and propagating changes to the MMU ‘shadow’ page table.

  • Xen Solution (Direct Page Table access)

– The Guest OS is allowed read-only access to the real page tables.
– Page table updates must still go through the hypervisor, which validates them.
– Guest OSes allocate and manage their own PTs using hypercalls.
– The OS must not give itself unrestricted PT access, access to hypervisor space, or access to other VMs.
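The validation rules for direct page tables can be illustrated with a small checker. It is a sketch under simplifying assumptions: frames are plain integers, ownership is a dictionary, and `ToyMMU` with its `mmu_update` signature is hypothetical rather than Xen's actual hypercall interface.

```python
class ToyMMU:
    """Hypervisor-side validation of guest page table updates."""

    def __init__(self, frame_owner: dict, page_table_frames: set) -> None:
        self.frame_owner = dict(frame_owner)              # frame -> domain id
        self.page_table_frames = set(page_table_frames)   # frames holding PTs
        self.mappings = {}  # (domain, virtual page) -> (frame, writable)

    def mmu_update(self, domain: int, vpage: int, frame: int,
                   writable: bool) -> bool:
        # Rule 1: a domain may only map frames it owns, which rules out
        # hypervisor memory and the memory of other VMs.
        if self.frame_owner.get(frame) != domain:
            return False
        # Rule 2: page table frames may only be mapped read-only, so every
        # change to them has to come back through this validation path.
        if writable and frame in self.page_table_frames:
            return False
        self.mappings[(domain, vpage)] = (frame, writable)
        return True
```

For example, domain 1 can map its own data frame writable, but a request to map one of its page table frames writable, or any frame owned by domain 2, is refused.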

slide-16
SLIDE 16

Networking

  • Xen provides a Virtual Firewall-Router (VFR).
  • Each domain has one or more VIFs attached to the VFR.
  • Each VIF has two I/O buffer descriptor rings (one each for transmit and receive).
  • Transmit: the Domain updates the transmit descriptor ring. Xen copies the descriptor and the packet header. The header is inspected by the VFR. Payload copying is avoided by using scatter-gather DMA in the NIC driver.
  • Receive: copying is avoided by using the page flipping technique.
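Page flipping on the receive path amounts to swapping the ownership of two page frames instead of copying the packet bytes. A minimal sketch, assuming ownership is tracked in a dictionary keyed by frame number; `page_flip` and the owner labels are hypothetical.

```python
def page_flip(frame_owner: dict, packet_frame: int, spare_frame: int) -> None:
    """Swap ownership of two page frames so no packet data is copied."""
    if frame_owner[packet_frame] == frame_owner[spare_frame]:
        raise ValueError("frames already belong to the same owner")
    # The guest receives the frame holding the packet; Xen receives the
    # guest's unused frame in exchange and can reuse it for the next packet.
    frame_owner[packet_frame], frame_owner[spare_frame] = (
        frame_owner[spare_frame],
        frame_owner[packet_frame],
    )
```

The cost is a bookkeeping update that does not depend on the packet size, which is why the guest must keep supplying spare frames on its receive ring.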

slide-17
SLIDE 17

Disk

  • Only Domain0 has direct access to disks
  • Other domains need to use virtual block devices (VBD)

– VBDs use the I/O ring.
– The Guest I/O scheduler reorders requests prior to enqueuing them on the ring.
– Xen can also reorder requests to improve performance.

  • Zero-copy data transfer is done using DMA between the disk and pinned memory pages.

slide-18
SLIDE 18

Xen Architecture

Source: http://www.arunviswanathan.com/content/ppts/xen_virt.pdf

slide-19
SLIDE 19

Domain 0: Control and Management

  • Separation of mechanism and policy
  • Domain0 hosts the application-level management software, which uses control interfaces provided by Xen.
  • Create/terminate other domains; control scheduling, CPU and memory allocation; create VIFs and VBDs. Parameters to manage include access control (for I/O devices), amount of physical memory per domain, VFR rules, etc.

slide-20
SLIDE 20

I/O Handling

  • dom0 runs the backend of the device, which is exported to each domain via a frontend
  • netback, netfront for network devices (NICs)
  • blockback, blockfront for block devices
  • PCI pass-through exists for other kinds of devices (e.g. sound)

slide-21
SLIDE 21

Driver Architecture

Source: http://www.linuxjournal.com/article/8909

slide-22
SLIDE 22

Benchmarks (all taken from Ian’s presentation in 2006)

In short, Xen provides close to native performance!

slide-23
SLIDE 23

MMU Micro-Benchmarks

slide-24
SLIDE 24

TCP Benchmarks

slide-25
SLIDE 25
slide-26
SLIDE 26

Xen Today (Xen 3.x)

  • Xen 3.x supports running unmodified guest OSes by using hardware assisted virtualization (Intel VT, AMD-V)
  • Supports NetBSD, OpenSolaris, Linux 2.4/2.6 as both guest and host. Runs FreeBSD, Windows (using HVM) as guest.
  • Live Migration of VMs between Xen hosts.
  • x86/x86-64/Itanium/PowerPC, SMP (64-way!) guest support, enhanced Power Management, XenCenter for management.
  • Awesome hardware support! (ESX HCL is very limited).
  • DomU (paravirtualization) patches merged in Linux 2.6.23
  • Dom0 patches are still struggling to get merged upstream. (KVM is gaining support!)

slide-27
SLIDE 27

Xen 3.0 Architecture

slide-28
SLIDE 28

Questions - Security

  • What is the chance of the Hypervisor and other Guest OSes getting affected by a compromised Guest OS running on top of Dom0?

– Game over, protection of Domain 0 is critical!

  • Can’t we get rid of the Domain Zero Guest OS? I think if we can do that we can reduce the vulnerable surface of Xen (in one of their security presentations they admit they should minimize the TCB). What other implications might removing the Dom 0 Guest OS have for the system?

– Where will the management code go? Xen relies on Dom0 drivers.

slide-29
SLIDE 29

Questions – Security 2

  • The hypervisor takes up the upper 64MB of the address space. Will this cause problems if we no longer want to modify the operating system, by using Intel-VT?

– With Intel-VT, Xen isn’t mapped into the Guest OS address space.

  • If a hacker manages to place a VM co-resident with the target, as a next step he can extract confidential information via a cross-VM attack. There are a number of avenues for such an attack, e.g. side channels: cross-VM information leakage due to the sharing of physical resources (e.g. the CPU’s data caches). In the multi-process environment, such attacks have been shown to enable extraction of RSA and AES secret keys. How can this problem be avoided in Xen?

– ???

slide-30
SLIDE 30

Questions – Security 3

  • The Dom0 domain accesses the hardware directly, while all other domains see virtual abstractions of devices. Does that mean that all drivers, regardless of domain, run in the same address space, i.e. that of Dom0? If so, how does it prevent a driver from doing a DMA write to the memory of an arbitrary domain?

– Drivers can be pushed out from Domain 0 (Ring 1) to “Driver Domains” (Ring 1). This makes the system more robust. However, the fundamental problem of unsafe DMA access is solved by IOMMU hardware.

slide-31
SLIDE 31

Questions – Resource Management

  • In Xen each guest OS has its own memory reservation and disk allocation. Is this a way of statically allocating hardware resources, which is often considered a waste of resources?

– Yes, resource management is complicated. Xen can do memory overcommitment and then use ballooning for dynamic memory management. Parallax handles the space management problem (using COW?). Memory and disk are cheap these days though; I would focus more on isolation, QoS and security problems.

  • In the section about physical memory, they talk about either using a balloon driver or modifying the kernel memory management routines to adjust the memory usage of a domain. Both these approaches seem to require modification of the OS. With hardware supported virtualization now allowing OSes to run unmodified, how is this problem solved?

– The “balloon” driver works with HVM guests.

slide-32
SLIDE 32

Questions – Resource Management

  • In Xen, what strategy does the hypervisor use to schedule the other domains fairly (to balance the load across domains)? What if some domains always have a heavier average load than others?

– The new CREDIT scheduler assigns a “weight” and a “cap” to each domain. A domain with weight 2X gets twice as much CPU as a domain with weight X. The cap limits the maximum amount of CPU the domain can consume, expressed as a percentage of one physical CPU (e.g. 100 = one CPU, 50 = half a CPU). You can always assign (even at runtime) a higher weight to a domain which requires more CPU time.

  • I don’t see why the paper says delegating the task of building a new domain to Domain0 is better than building a domain entirely within Xen. Isn’t Domain0 a part of Xen? How can the complexity be reduced?

– By Xen the authors mean the VMM part running in Ring 0. Domain 0 runs in Ring 1. Management code has to be present and Domain 0 is the logical place to put it!
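The weight-and-cap answer about the CREDIT scheduler above can be made concrete with a small calculation. This is a deliberately simplified sketch, not the scheduler's algorithm: it assumes every domain is always runnable, treats the cap as a percentage of one physical CPU with 0 meaning "no cap", and does not redistribute CPU time that a capped domain leaves unused. `cpu_shares` is a hypothetical helper.

```python
def cpu_shares(domains: dict, n_cpus: int) -> dict:
    """Map each domain to its share, in percent of one physical CPU.

    domains: name -> (weight, cap_percent); a cap of 0 means uncapped.
    """
    total_weight = sum(weight for weight, _cap in domains.values())
    shares = {}
    for name, (weight, cap) in domains.items():
        # Proportional share: a domain's slice scales with its weight.
        share = 100.0 * n_cpus * weight / total_weight
        # The cap is an upper bound that applies even if CPUs sit idle.
        if cap > 0:
            share = min(share, cap)
        shares[name] = share
    return shares
```

With weights 256 and 512 on one CPU, the second domain is entitled to twice the share of the first; adding a cap of 50 to the second domain limits it to half a CPU regardless of its weight.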

slide-33
SLIDE 33

Questions - Isolation

  • How can this paper prove that it allows multiple commodity operating systems to share hardware in a safe and resource-managed fashion, when the Xen prototype could only support the XenoLinux guest OS at the time the paper was written?

– Xen today handles many different Guest OSes. Even in 2003 they had a working XP prototype (it could run notepad and minesweeper).

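  • Is it impossible to run a guest OS on Xen on hardware that only supports 2 privilege levels?

– It is harder: with 2 privilege levels the Guest OS cannot use a ring of its own to protect itself from applications. The paper notes that it would instead share the lower privilege level with applications and protect itself by running in a separate address space, passing control through the hypervisor.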

  • If the Xen VMM is used on a processor other than x86, without four privilege levels, will the whole architecture be impaired? I mean, how would the guest OS kernel and guest applications then be separated safely?

– 3 Rings are good, 2 are NOT!

slide-34
SLIDE 34

Questions – Performance

  • If we can modify the memory management subsystem, why cannot we modify the I/O system to transfer directly from/to the disk? It seems I/O performance could be improved in this way. Is it hard?

– Xen already does zero-copy transfer (by using DMA) for disk I/O. Did I understand the question correctly?

  • DomU gets its resources from Dom0, except for CPU and memory which come from the Xen VMM, and this causes a lot of communication overhead. How can it be reduced in the next version of Xen?

– Zero-copy transfers, the fast underlying IPC (SHM), batching of updates and events, PCI pass-through.

  • The 64MB of address space reserved by Xen to avoid a TLB flush seems to be a great cost if 100 OSes run on the VMM. Does this paper mean that Xen needs to use 64MB for each process run on each OS running on it? If that is the case, it seems to be a disaster.

– NO! Xen is mapped into the top 64MB of every guest address space. It doesn’t physically consume 64MB of RAM for every Guest OS.
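The size of the reservation relative to one address space is easy to check. The sketch assumes 32-bit x86 with a 4 GiB virtual address space per guest; `reserved_fraction` and the constants are hypothetical helpers for this illustration.

```python
MIB = 1 << 20
GIB = 1 << 30


def reserved_fraction(reserved_bytes: int = 64 * MIB,
                      address_space_bytes: int = 4 * GIB) -> float:
    """Fraction of one virtual address space taken by the Xen mapping."""
    return reserved_bytes / address_space_bytes
```

`reserved_fraction()` evaluates to 1/64, i.e. about 1.6% of each guest's virtual address space. This is a cost in virtual addresses only: the same hypervisor memory is mapped into every address space, so physical RAM use does not grow by 64MB per guest or per process.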

slide-35
SLIDE 35

Questions – Utility

  • In what kinds of practical scenarios do we need multiple different operating systems running on the same machine, especially now that applications are becoming more and more portable across platforms?

– Virtual machines are an excellent way to test those very same portable applications! You can run Windows, Linux and OSX on the same box and test your applications.

slide-36
SLIDE 36

Questions – Future Work

  • In the future work they talk about a shared universal buffer cache. Is this similar to the shared memory mentioned in Disco? Was this ever implemented?

– Yes, I think so. Yes, the XenFS project seems to be active.

slide-37
SLIDE 37

Questions

  • Although the paper claims that minimal modification is required to port a guest OS, the porting of Windows XP was still incomplete in their experiments. So do you think it is really easy to achieve?

– It ran into licensing problems (M$!). With HVM, such a port is not required. I leave the answering of the last part to the audience.

  • The authors refer a number of times to a paravirtual port of Windows XP. A quick Web search reveals that licensing issues prevent this port from ever being published; thus, today, Windows XP can only be run under Xen using hardware-assisted virtualization (added in Xen 3). Why do the authors bother describing the paravirtualization of Windows XP, when no researcher can replicate their results and no user can take advantage of this port (due to unavailability of the code)?

– Simply to illustrate that different OSes could potentially be ported to run on top of Xen with minimal changes; that would be my guess!

slide-38
SLIDE 38

More Questions

  • From this paper, it seems VMware loses a lot to Xen in performance, so I'm wondering whether there is any scenario in which we may prefer binary translation (as in VMware) over paravirtualization (as in Xen)?

– BT is required in order to run an unmodified Guest OS on top of plain x86. BT is not required if the processor supports hardware virtualization. However, BT is still used because it gives better performance than VT in some scenarios.

slide-39
SLIDE 39

Even More Questions!

  • Would it be a heavy performance loss for the guest OSes that every privileged instruction has to be validated by Xen? How does VMware handle such a problem?

– ???

  • The authors chose not to implement paging in the VMM, but to allow each OS to perform paging itself. They state that this decision was made to help achieve performance isolation, by preventing one domain from performing thrashing-inciting memory access patterns and thus reducing the performance of other domains. Is there any paging policy that would allow the VMM to perform paging, with all the attendant benefits (better resource sharing in asymmetric-load situations, etc.), while not suffering substantially from a breakdown in performance isolation?

– ???

slide-40
SLIDE 40

Even More Questions!

  • A minor question: what is the "QoS crosstalk" problem referred to in Section 1?
  • Xen can provide three types of time: real, virtual and wall-clock time. The virtual time is used by the guest OS to make proper scheduling decisions, but nowadays Intel-VT enables us to use unmodified guests. However, if the guest OS does not know the virtual time, how can it make good scheduling decisions? Using Intel-VT, how could we provide the guest OS with the virtual time and, at the same time, the real time?

slide-41
SLIDE 41

References

  • Ring Diagrams: http://duartes.org/gustavo/blog/post/cpu-rings-privilege-and-protection
  • J. S. Robin and C. E. Irvine. Analysis of the Intel Pentium's ability to support a secure virtual machine monitor.
  • Introduction to the Xen Virtual Machine: http://www.linuxjournal.com/article/8540

slide-42
SLIDE 42

Conclusions

  • High performance, strong isolation and effective scaling
  • Commercially successful (Citrix) and widely used in industry (it is the VMM driving Cloud Computing, at least Amazon EC2 uses it!)
  • Xen is awesome