KVM/ARM, Linux Symposium 2010, Christoffer Dall and Jason Nieh

SLIDE 1

KVM/ARM

Linux Symposium 2010

Christoffer Dall and Jason Nieh {cdall,nieh}@cs.columbia.edu

Slides: http://www.cs.columbia.edu/~cdall/ols2010-presentation.pdf

Friday, July 16, 2010

SLIDE 2

We like KVM

  • It’s Fast, Free, Open, and Simple!
  • Integrates well with Linux
  • Always maintained
  • Supports x86, ia64, PowerPC, and s390

SLIDE 3

ARM devices are everywhere

SLIDE 4

Google Nexus One Specifications

  • Processor: Qualcomm Snapdragon QSD8250
  • CPU core: Qualcomm Scorpion
  • Architecture: ARM v7
  • Clock speed: 1000 MHz
  • Technology: 65 nm
  • Memory: 512 MB

...and they are getting really powerful

SLIDE 7

KVM relies on hardware support

  • x86 and ia64 (Itanium): virtualization extensions
  • PowerPC and s390: virtualizable

SLIDE 8

Hardware Support for Virtualization

  • Guest kernel runs in user mode
  • Sensitive instructions are instructions that depend on CPU mode
  • Virtualizable if all sensitive instructions trap
  • Trap-and-emulate
  • Hardware virtualization features provide an extra mode where all sensitive instructions trap
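
The virtualizability condition above can be sketched as a set check: an architecture supports trap-and-emulate only if no sensitive instruction can run in user mode without trapping. This is a minimal illustration; the instruction names and both helper functions are hypothetical and do not describe any real instruction set.

```python
def non_trapping_sensitive(sensitive, privileged):
    """Return the sensitive instructions that would NOT trap in user mode."""
    return sorted(set(sensitive) - set(privileged))


def is_virtualizable(sensitive, privileged):
    """Trap-and-emulate works only if every sensitive instruction traps."""
    return not non_trapping_sensitive(sensitive, privileged)


# Hypothetical toy instruction sets, for illustration only.
toy_privileged = {"WRITE_CTRL", "READ_MODE", "HALT"}
toy_sensitive_ok = {"WRITE_CTRL", "READ_MODE"}
toy_sensitive_bad = {"WRITE_CTRL", "READ_MODE", "READ_STATUS"}

print(is_virtualizable(toy_sensitive_ok, toy_privileged))
print(non_trapping_sensitive(toy_sensitive_bad, toy_privileged))
```

Any instruction reported by `non_trapping_sensitive` would execute silently in a guest kernel running in user mode, which is the situation a hypervisor has to work around.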

SLIDE 9

Problem

  • ARM is not virtualizable
  • ARM has no hardware virtualization extensions

SLIDE 11

31 Sensitive instructions

CPS       LDRT    STC     RSBS
MRS       STRBT   ADCS    RSCS
MSR       STRT    ADDS    SBCS
RFE       CDP     ANDS    SUBS
SRS       LDC     BICS
LDM (2)   MCR     EORS
LDM (3)   MCRR    MOVS
STM (2)   MRC     MVNS
LDRBT     MRRC    ORRS

and 25 of them are non-privileged

SLIDE 12

Solution

  • We use lightweight paravirtualization
  • Retains simplicity of KVM architecture
  • Minimally intrusive to KVM and the Kernel
  • Relies on QEMU for device emulation

SLIDE 13
  • KVM
  • CPU virtualization on ARM
  • Memory virtualization on ARM
  • World Switch details
  • Implementation status

SLIDE 14

KVM Architecture

[Diagram labels: Hardware, Linux Kernel, KVM, Processes, VM, Guest kernel, QEMU]

SLIDE 15

KVM execution flow

SLIDE 34

  • User space: Start QEMU, Alloc memory, Create VM, Register memory, Create VCPU, KVM RUN
  • Kernel: World switch
  • Guest: Native guest execution, Interrupt
  • Kernel: World switch, Handle exit, Handle I/O?
  • User space: Emulation
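
The execution flow can be sketched as a user-space loop around KVM RUN. This is a toy model rather than real KVM code: `ToyVCPU`, `emulate_io`, and `run_guest` are hypothetical names, and the guest is replaced by a scripted list of exit reasons. In the real system these steps are ioctl calls on /dev/kvm such as KVM_CREATE_VM, KVM_CREATE_VCPU, and KVM_RUN.

```python
from collections import deque


class ToyVCPU:
    """Hypothetical stand-in for a VCPU; the guest is a scripted list of exits."""

    def __init__(self, exits):
        self.pending = deque(exits)
        self.world_switches = 0

    def run(self):
        """One KVM RUN call: stay in kernel/guest until user space is needed."""
        while self.pending:
            self.world_switches += 2          # into the guest and back out
            reason, payload = self.pending.popleft()
            if reason == "io":                # Handle I/O? yes: back to user space
                return reason, payload
            # Any other exit (e.g. an interrupt) is handled in the kernel.
        return "shutdown", None


def emulate_io(request, device_log):
    """User-space device emulation (QEMU's role in the real system)."""
    device_log.append(request)


def run_guest(vcpu):
    device_log = []
    while True:
        reason, payload = vcpu.run()          # KVM RUN
        if reason == "io":
            emulate_io(payload, device_log)   # Emulation, then re-enter the guest
        else:
            return device_log


vcpu = ToyVCPU([("irq", None), ("io", "uart write 0x41"),
                ("irq", None), ("io", "uart write 0x42")])
print(run_guest(vcpu), vcpu.world_switches)
```

Only exits that need device emulation return to user space; everything else is resolved in the kernel and the guest is re-entered directly.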

SLIDE 35

New KVM architecture

  • Logical separation of architecture-dependent and architecture-independent code

  • kvm_arch_XXX
  • kvm_XXX

SLIDE 37

ARM virtualization

  • ARM is not virtualizable, nor does it have hardware virtualization support

  • Possible solutions:
  • binary translation
  • or paravirtualization

SLIDE 38

Binary Translation

  • Traditionally done out-of-place with a translation cache

  • Difficult to make it fast
  • Contradicts idea of KVM

SLIDE 39

Paravirtualization

  • Changes the guest kernel, replacing code that contains sensitive instructions with hypercalls
  • Guest kernel is modified by hand
  • Hard to merge changes with upstream kernel versions

SLIDE 40

Original code:

    mrs r2, cpsr    @ get current mode
    tst r2, #3      @ not user?
    bne not_angel

Lightweight-paravirtualization (LPV)

SLIDE 42

Code after LPV replaces the sensitive instruction:

    swi 0x022000    @ get current mode
    tst r2, #3      @ not user?
    bne not_angel

Lightweight-paravirtualization (LPV)

SLIDE 43

Lightweight-paravirtualization (LPV)

  • Replace sensitive instructions with traps
  • Traps encode original instruction and operands
  • Emulate replaced instructions in KVM
  • Script-based solution applicable to any vanilla kernel tree

SLIDE 44

LPV encoding example

mrs r2, cpsr  ->  swi 0x022000

 23    20 19    16  15  14    12 11          0
+--------+--------+----+--------+-------------+
|   0    |   Rd   | R  |   2    |     OIF     |
+--------+--------+----+--------+-------------+

  • Bits 23-20 (value 0): status register access function
  • Bits 14-12 (value 2): MRS encoding
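
The encoding example can be checked with a few shifts. The sketch below packs the fields from the bit layout into a SWI immediate and unpacks them again. The function names are hypothetical, the low 12 bits are left at zero because the slide does not explain them, and treating R = 1 as "SPSR" (as in the ARM MRS instruction itself) is an assumption.

```python
FUNC_STATUS_REG = 0   # bits 23-20: status register access function
OP_MRS = 2            # bits 14-12: MRS encoding


def encode_mrs(rd, spsr=False):
    """Pack `mrs rd, cpsr|spsr` into a 24-bit SWI immediate."""
    if not 0 <= rd <= 15:
        raise ValueError("rd must be a register number 0-15")
    return (FUNC_STATUS_REG << 20) | (rd << 16) | (int(spsr) << 15) | (OP_MRS << 12)


def decode(imm):
    """Unpack the fields again, as a SWI handler in the hypervisor might."""
    return {
        "function": (imm >> 20) & 0xF,
        "rd": (imm >> 16) & 0xF,
        "r": (imm >> 15) & 0x1,
        "op": (imm >> 12) & 0x7,
    }


# mrs r2, cpsr
print(f"swi 0x{encode_mrs(2):06x}")   # prints "swi 0x022000"
print(decode(0x022000))
```

With Rd = 2 and R = 0 the result is 0x022000, matching the immediate shown on the slide.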

SLIDE 45

LPV implementation

  • Uses regular expressions to search for sensitive assembly instructions
  • ~150 lines (written in Python)
  • Supports inline assembler, preprocessor macros and assembler files
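
A regular-expression rewrite of this kind might look as follows. This is not the authors' script: it is a minimal sketch that handles only the unconditional `mrs rN, cpsr|spsr` form, and the names `MRS_RE`, `mrs_to_swi`, and `patch_line` are hypothetical. The immediate follows the MRS bit layout from the encoding example.

```python
import re

# Matches e.g. "mrs r2, cpsr"; only this one instruction form is handled here.
MRS_RE = re.compile(r"\bmrs\s+r(\d+)\s*,\s*(cpsr|spsr)\b", re.IGNORECASE)


def mrs_to_swi(match):
    """Build the replacement trap for one matched MRS instruction."""
    rd = int(match.group(1))
    r = 1 if match.group(2).lower() == "spsr" else 0   # assumed meaning of R
    imm = (rd << 16) | (r << 15) | (2 << 12)
    return f"swi 0x{imm:06x}"


def patch_line(line):
    """Rewrite sensitive instructions on one line of assembly source."""
    return MRS_RE.sub(mrs_to_swi, line)


source = [
    "mrs r2, cpsr        @ get current mode",
    "tst r2, #3          @ not user?",
    "bne not_angel",
]
for line in source:
    print(patch_line(line))
```

Because each match is replaced by exactly one instruction, the patched source keeps the same number of instructions as the original.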

SLIDE 46

LPV requirements

  • Assumes guest kernel does not make system calls to itself
  • Module source code must also be handled
  • GCC does not generate sensitive instructions from C code

SLIDE 47

LPV key points

  • Encodes each sensitive instruction as a single trap

  • As efficient as trap-and-emulate
  • Fully automated
  • Doesn’t affect kernel code size

SLIDE 49

Virtual memory

[Diagram: a 4 GB virtual address space (kernel, user space application) is mapped by the MMU, using page tables, onto a 4 GB physical address space (devices, RAM)]

SLIDE 51

New address space

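[Diagram: three 4 GB address spaces. Guest virtual addresses (guest kernel, guest user space application), guest physical addresses (devices, RAM), and host physical (machine) addresses (devices, RAM). The MMU uses shadow page tables to translate guest virtual addresses to machine addresses]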

SLIDE 52

Shadow page tables

  • Map guest virtual addresses to host physical addresses
  • One per guest page table (process)
  • Start out empty and add entries on page faults (on demand)
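
On-demand filling can be modelled with dictionaries. In this toy sketch, `ShadowPageTable` and its fields are hypothetical names, pages are 4 KB, and the guest page table and the guest-frame-to-host-frame map are plain dicts rather than real hardware structures.

```python
PAGE = 4096


class ShadowPageTable:
    """Maps guest virtual pages to host physical pages, filled lazily."""

    def __init__(self, guest_table, gfn_to_pfn):
        self.guest_table = guest_table    # guest virtual page -> guest frame
        self.gfn_to_pfn = gfn_to_pfn      # guest frame -> host frame
        self.entries = {}                 # the shadow table starts out empty
        self.faults = 0

    def translate(self, gva):
        vpn, offset = divmod(gva, PAGE)
        if vpn not in self.entries:       # models a hardware page fault
            self._handle_fault(vpn)
        return self.entries[vpn] * PAGE + offset

    def _handle_fault(self, vpn):
        self.faults += 1
        if vpn not in self.guest_table:
            # The guest itself has no mapping: a genuine guest page fault.
            raise LookupError("no guest mapping; forward the fault to the guest")
        gfn = self.guest_table[vpn]
        self.entries[vpn] = self.gfn_to_pfn[gfn]


spt = ShadowPageTable(guest_table={0x10: 0x2}, gfn_to_pfn={0x2: 0x80})
print(hex(spt.translate(0x10123)), spt.faults)
print(hex(spt.translate(0x10456)), spt.faults)
```

The first access to a page takes a fault and installs the entry; later accesses to the same page hit the shadow table directly.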

SLIDE 56

Address translation

[Diagram labels: Guest virtual, Guest physical, KVM process Virtual Memory, Guest memory, Host kernel, Machine memory]

  • Walk guest page tables in software: gva_to_gfn(...);
  • Built-in KVM functionality: gfn_to_hva(...);
  • Kernel functionality: page = virt_to_page(...); pfn = page_to_pfn(page);
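
The three translation steps compose into one guest-virtual-to-machine lookup. The sketch below borrows the names `gva_to_gfn` and `gfn_to_hva` from the slide, but the bodies are toy stand-ins, not the kernel functions. It assumes 4 KB pages, guest RAM starting at guest frame 0, and a single contiguous region at `memslot_base_hva` in the KVM process; `hva_to_pfn` and `gva_to_machine` are hypothetical helpers.

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def gva_to_gfn(guest_page_table, gva):
    """Step 1: walk the guest page table in software (here: a dict lookup)."""
    return guest_page_table[gva >> PAGE_SHIFT]


def gfn_to_hva(memslot_base_hva, gfn):
    """Step 2: guest frame -> address inside the KVM process's memory."""
    return memslot_base_hva + (gfn << PAGE_SHIFT)


def hva_to_pfn(host_page_table, hva):
    """Step 3: host virtual address -> machine frame (toy host page table)."""
    return host_page_table[hva >> PAGE_SHIFT]


def gva_to_machine(gva, guest_page_table, memslot_base_hva, host_page_table):
    gfn = gva_to_gfn(guest_page_table, gva)
    hva = gfn_to_hva(memslot_base_hva, gfn)
    pfn = hva_to_pfn(host_page_table, hva)
    return (pfn << PAGE_SHIFT) | (gva & PAGE_MASK)


guest_pt = {0xC0000: 0x5}                 # guest virtual page -> guest frame
base_hva = 0x40000000                     # where guest RAM sits in the process
host_pt = {0x40005: 0x1234}               # host virtual page -> machine frame
print(hex(gva_to_machine(0xC0000ABC, guest_pt, base_hva, host_pt)))
```

The page offset is carried through unchanged; only the page number is translated at each step.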

SLIDE 57

Shadow page table consistency

  • Caching shadow page tables is an optimization
  • Keep cached page tables in sync by protecting guest page tables and tracking updates

SLIDE 58

Memory Protection

  • Goal:
  • Protect host from guest
  • Honor intended guest protection
  • ARM provides flexible protection methods
  • Access is specified per CPU privilege level

SLIDE 59

Access Protection Bits

AP    Privileged    User
00    None          None
01    R/W           None
10    R/W           R/O
11    R/W           R/W

SLIDE 60

Access mapping example

  • Guest page table specifies: Privileged: R/W, User: No Access
  • Shadow page table bits in guest user mode: User: No Access
  • Shadow page table bits in guest privileged mode: User: R/W
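
Because the guest kernel runs in user mode, the shadow entry's user permission has to stand in for whichever guest privilege level is currently active. The mapping can be sketched with the AP table from the Access Protection Bits slide. The function name `shadow_ap` is hypothetical, and encoding "no user access" as AP 01 (so that the host keeps privileged R/W access) is an assumption of this sketch.

```python
# AP value -> (privileged access, user access), from the AP bits table.
AP_TABLE = {
    0b00: ("None", "None"),
    0b01: ("R/W", "None"),
    0b10: ("R/W", "R/O"),
    0b11: ("R/W", "R/W"),
}

# Shadow AP value that grants user mode exactly the given access.
# Using 01 rather than 00 for "None" is an assumption of this sketch.
USER_ACCESS_TO_AP = {"None": 0b01, "R/O": 0b10, "R/W": 0b11}


def shadow_ap(guest_ap, guest_in_priv_mode):
    """Pick shadow AP bits for a page, given the guest's virtual CPU mode."""
    privileged, user = AP_TABLE[guest_ap]
    wanted = privileged if guest_in_priv_mode else user
    return USER_ACCESS_TO_AP[wanted]


guest_ap = 0b01   # guest says: Privileged R/W, User no access
print(AP_TABLE[shadow_ap(guest_ap, guest_in_priv_mode=False)][1])
print(AP_TABLE[shadow_ap(guest_ap, guest_in_priv_mode=True)][1])
```

For the example on the slide this yields no user access while the guest is in user mode and R/W while it is in privileged mode.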

SLIDE 64

World Switches

  • User space: KVM RUN
  • Kernel: World switch
  • Guest: Native guest execution, Interrupt
  • Kernel: World switch, Handle exit, Handle I/O?
  • User space: Emulation

SLIDE 65

World switch

To guest:

  • Disable interrupts
  • Store host state
  • Switch page tables
  • Load guest state
  • Enable interrupts
  • Jump to guest code

From guest:

  • Store exit state
  • Switch page tables
  • Restore host state
  • (Host kernel IRQ handler)
  • Enable interrupts
  • Return to ioctl call
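
The ordering of the two halves can be followed in a toy model where the CPU is a dictionary. This is only an illustration of the sequence: `enter_guest`, `exit_guest`, and the `cpu`/`guest` fields are hypothetical, and the real world switch is low-level code, not Python.

```python
def enter_guest(cpu, host_save, guest):
    """'To guest' half of the world switch."""
    cpu["irqs_enabled"] = False                  # Disable interrupts
    host_save["regs"] = dict(cpu["regs"])        # Store host state
    host_save["ttbr"] = cpu["ttbr"]
    cpu["ttbr"] = guest["shadow_ttbr"]           # Switch page tables
    cpu["regs"] = dict(guest["regs"])            # Load guest state
    cpu["irqs_enabled"] = True                   # Enable interrupts
    cpu["world"] = "guest"                       # Jump to guest code


def exit_guest(cpu, host_save, guest):
    """'From guest' half of the world switch."""
    guest["regs"] = dict(cpu["regs"])            # Store exit state
    cpu["ttbr"] = host_save["ttbr"]              # Switch page tables
    cpu["regs"] = dict(host_save["regs"])        # Restore host state
    # (The host kernel IRQ handler would run here if an IRQ caused the exit.)
    cpu["irqs_enabled"] = True                   # Enable interrupts
    cpu["world"] = "host"                        # Return to ioctl call


cpu = {"regs": {"r0": 1}, "ttbr": "host_pt", "irqs_enabled": True, "world": "host"}
guest = {"regs": {"r0": 42}, "shadow_ttbr": "shadow_pt"}
host_save = {}

enter_guest(cpu, host_save, guest)
cpu["regs"]["r0"] += 1                           # the guest runs and changes r0
exit_guest(cpu, host_save, guest)
print(cpu["regs"], guest["regs"], cpu["ttbr"])
```

After the round trip the host registers and page table pointer are back to their saved values, and the guest's updated state is kept for the next entry.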

SLIDE 69

Switch page tables

PC

SLIDE 70

Shared Page

[Diagram: a shared page in machine memory is mapped at the same address, 0xFFFF1000, in both the 4 GB guest virtual address space (guest kernel, user space application) and the 4 GB host virtual address space (host kernel, QEMU virtual memory)]

SLIDE 72

Shared Page Internals

[Diagram: the shared page spans 0xffff1000 to 0xffff1fff and holds temporary data, code, and a temporary stack]

SLIDE 74

Status

  • Successfully boots Linux VMs
  • Host built on Android Kernel 2.6.27
  • Tested guest kernels from 2.6.17 to 2.6.33

SLIDE 75

Future work

  • Improve performance
  • Cache shadow page tables
  • Avoid unnecessary world-switches
  • Binary patching
  • Test device support
  • Upstream!

SLIDE 76

ARMv6

  • Physically tagged caches
  • TLB “Application Space Identifiers” (ASIDs)
  • New instructions

SLIDE 77

Related Work

  • Commercial solutions:
  • VMware MVP, OK Labs, VirtualLogix, ...

  • Open-source:
  • QEMU
  • XenARM

SLIDE 78

Conclusions

  • ARM virtualization is important
  • With LPV we now have KVM/ARM
  • LPV is simple, fully automated, and efficient
  • Minimally intrusive
  • It works!

SLIDE 79

Tasks

  • Caching of shadow page tables
  • Moving things to shared page
  • Coalesced MMIO
  • GDB support
  • Testing devices (on BeagleBoards, IGEPv2 boards etc.)

  • ...

SLIDE 80

Want to contribute?

  • Mailing list: android-virt@lists.columbia.edu
  • Wiki: http://wiki.ncl.cs.columbia.edu
  • Source code: http://git.ncl.cs.columbia.edu/git

SLIDE 81

Extra Material

SLIDE 82

Use cases

  • Same as on x86:
  • Test and Development
  • OS freedom
  • Multiple Personas
  • Virtualization features

SLIDE 83

Exceptions

  • Traps & Interrupts
  • CPU changes mode and execution starts from “vectors” at either:

  • 0x00000000 + offset
  • or 0xFFFF0000 + offset
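
The "base + offset" rule can be written out directly. The offsets below are the standard ARM exception vector offsets; the function name `vector_address` and the dictionary keys are hypothetical labels chosen for this sketch.

```python
# Standard ARM exception vector offsets.
VECTOR_OFFSETS = {
    "reset": 0x00,
    "undefined_instruction": 0x04,
    "software_interrupt": 0x08,
    "prefetch_abort": 0x0C,
    "data_abort": 0x10,
    "irq": 0x18,
    "fiq": 0x1C,
}

LOW_VECTORS = 0x00000000
HIGH_VECTORS = 0xFFFF0000


def vector_address(exception, high_vectors):
    """Address the CPU starts executing from when `exception` is taken."""
    base = HIGH_VECTORS if high_vectors else LOW_VECTORS
    return base + VECTOR_OFFSETS[exception]


# A SWI with high vectors selected lands at 0xFFFF0008.
print(hex(vector_address("software_interrupt", high_vectors=True)))
print(hex(vector_address("irq", high_vectors=False)))
```

The software interrupt entry is the one the LPV traps go through, since each replaced instruction becomes a `swi`.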

SLIDE 84

Exceptions and KVM/ARM

  • KVM/ARM uses custom handlers to handle exceptions while executing the guest
  • Exceptions are the only way to “exit from the guest”
  • IRQs are forwarded to the host kernel handlers
  • Traps are handled by KVM/ARM

SLIDE 85

Guest exceptions

[Diagram: 4 GB guest virtual address space (guest kernel, user space application) with the hardware exception vector page at 0xFFFF0000 and the guest exception vector page at 0x0]

Guest uses “low” vectors

SLIDE 86

What happens at a conflict?

  • KVM/ARM’s vectors are mapped with no-access for user mode code at 0xffff0000
  • The guest tries to access the 0xffff0000 page
  • KVM/ARM handles the permission fault

SLIDE 87

Exception page conflict

[Diagram: 4 GB guest virtual address space (guest kernel, user space application) with the hardware exception vector page at 0xFFFF0000 and the guest exception vector page at 0x0]

SLIDE 88

Exception page conflict

[Diagram: 4 GB guest virtual address space (guest kernel, user space application) with the hardware exception vector page at 0x0 and the guest exception vector page at 0xffff0000]

Guest uses “high” vectors

SLIDE 89

Exception page conflict

[Diagram: 4 GB guest virtual address space (guest kernel, user space application) with the hardware exception vector page at 0xFFFF0000]

Guest uses “high” vectors, but needs access to page 0
