Microkernel virtualization under one roof - dare the impossible -

SLIDE 1

Microkernel virtualization under one roof

- dare the impossible -

Alexander Böttcher <alexander.boettcher@genode-labs.com>

SLIDE 2

Outline

  • 1. Introduction
  • 2. Kernel interfaces
  • 3. VM interface harmonization
  • 4. VMMs harmonized
  • 5. Conclusion

Microkernel virtualization under one roof - dare the impossible - 2

SLIDE 4

Motivation

  • Off-the-shelf virtualization solutions are ridden with complexity.
  • Applications of virtualization call for trustworthy solutions.
  • Complexity defeats trust.
  • Alternative approach → microkernels with hardware-assisted virtualization extensions

SLIDE 5

Component based virtualization architecture

[Figure: NOVA microhypervisor (9,000 SLOC) as the kernel; above it resource management, apps, drivers, and three VMMs; three guest OSes. Labels: root mode, non-root mode, kernel]

SLIDE 6

Genode OS framework

SLIDE 7

General supported kernels on Genode

SLIDE 8

Kernels with hardware assisted virtualization

SLIDE 9

VMM inventory of Genode

Hardware assisted virtualization/separation support

Microkernel    Host                        VMM      Guest vCPU
hw             ARM, 32bit                  custom   1, 32bit
hw/trustzone   ARM, 32bit                  custom   1, 32bit
hw with Muen   Intel, 64bit                VBox 4   1, 32bit
NOVA           Intel & AMD, 32bit, 64bit   Seoul    N, 32bit
NOVA           Intel & AMD, 32bit, 64bit   VBox 4   N, 32bit, 64bit
NOVA           Intel & AMD, 32bit, 64bit   VBox 5   N, 32bit, 64bit

SLIDE 12

Research challenge

  • Vision: VMMs runnable on all kernels without recompilation
  • Focus on x86 microkernels for now → NOVA, seL4, Fiasco.OC, and -hw-
  • Approach: generalize the VM interface as used by -hw-

SLIDE 14

Flow of a virtualization event

[Figure: user-level VMM, guest OS, and NOVA. Labels: UTCB (twice), VMCS, world switch, copy]

SLIDE 15

vCPU state on NOVA

[Figure: user space: VMM with UTCB; kernel space: NOVA microhypervisor with UTCB and VMCS/VMCB]

Transfer: UTCB, VMCS/VMCB agnostic, partial state support
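
A minimal sketch of what "partial state support" can mean in practice: the caller names the register groups it needs, and only those are copied between the kernel-side and VMM-side state. The group names, the `Vcpu_state` layout, and `transfer()` are hypothetical illustrations, not NOVA's actual transfer descriptor.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register groups; a real kernel defines its own set.
enum Group : unsigned {
    GROUP_GPR  = 1u << 0,
    GROUP_IP   = 1u << 1,
    GROUP_CR   = 1u << 2,
    GROUP_EFER = 1u << 3,
    GROUP_ALL  = GROUP_GPR | GROUP_IP | GROUP_CR | GROUP_EFER
};

// Deliberately tiny stand-in for a vCPU state record.
struct Vcpu_state {
    std::uint64_t gpr[8] = {};
    std::uint64_t ip = 0;
    std::uint64_t cr0 = 0, cr3 = 0, cr4 = 0;
    std::uint64_t efer = 0;
};

// Copy only the requested groups and report how many 64-bit words moved.
inline unsigned transfer(const Vcpu_state &src, Vcpu_state &dst, unsigned groups)
{
    unsigned words = 0;
    if (groups & GROUP_GPR) {
        for (int i = 0; i < 8; ++i)
            dst.gpr[i] = src.gpr[i];
        words += 8;
    }
    if (groups & GROUP_IP)   { dst.ip = src.ip; words += 1; }
    if (groups & GROUP_CR)   { dst.cr0 = src.cr0; dst.cr3 = src.cr3; dst.cr4 = src.cr4; words += 3; }
    if (groups & GROUP_EFER) { dst.efer = src.efer; words += 1; }
    return words;
}
```

A full-state interface corresponds to always passing `GROUP_ALL`; a partial one lets an exit handler that only needs the instruction pointer pay for a single word.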

SLIDE 16

vCPU state on Fiasco.OC

[Figure: user space: VMM with UTCB and vCPU state; kernel space: Fiasco.OC microkernel with UTCB, vCPU state, and VMCS/VMCB]

Transfer: vCPU state, not VMCS/VMCB agnostic, full state

SLIDE 17

vCPU state on seL4

[Figure: user space: VMM with IPCBuffer; kernel space: seL4 microkernel with IPCBuffer, vCPU state, and VMCS]

Transfer: hybrid - IPCBuffer & syscall per VMCS register
IPCBuffer: VM exit - 17 registers, VM enter - 3 registers

SLIDE 18

Control flow on NOVA

[Figure: user space: VMM with UTCB; kernel space: NOVA microhypervisor with UTCB and VMCS/VMCB. Labels between thread and vCPU: IPC call, IPC reply]

SLIDE 19

Control flow on Fiasco.OC

[Figure: user space: VMM with UTCB and vCPU state; kernel space: Fiasco.OC microkernel with UTCB, vCPU state, and VMCS/VMCB. Labels between thread and vCPU: syscall, done, vmresume (blocking)]

SLIDE 20

Control flow on seL4

[Figure: user space: VMM with IPCBuffer; kernel space: seL4 microkernel with IPCBuffer, vCPU state, and VMCS. Labels between thread and vCPU: syscall, done, vmenter (blocking)]

SLIDE 21

Control flow on Genode’s -hw-

[Figure: user space: VMM with UTCB and vCPU state; kernel space: Genode’s -hw- microkernel (ARM) with UTCB and vCPU state. Labels between thread and vCPU: signal, run]

SLIDE 24

Design goals

  • VMM → just a component
  • Genode components are designed event-driven
  • Non-blocking thread (entrypoint) registers for event sources
  • Events cause transitions in a state machine
  • State transition by Genode signal or RPC
  • VM event → just another event source
  • I/O event → just another event source
  • Kernel-agnostic ABI
  • Unified vCPU state per platform
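
The event-driven design can be illustrated with a small, self-contained sketch: a single non-blocking dispatcher with which event sources register, and a VMM state machine that only changes state inside handlers. `Entrypoint`, `manage()`, `submit()`, `dispatch_pending()`, and `Vmm` are hypothetical names chosen for this sketch; they are not Genode's API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for an entrypoint: one thread, many event sources.
class Entrypoint
{
    std::vector<std::function<void()>> _handlers;
    std::queue<std::size_t>            _pending;

public:
    // Register a handler for one event source; the returned id names it.
    std::size_t manage(std::function<void()> handler)
    {
        _handlers.push_back(std::move(handler));
        return _handlers.size() - 1;
    }

    // Called by an event source; only enqueues, never blocks.
    void submit(std::size_t id) { _pending.push(id); }

    // Run the handlers of all pending events and report how many ran.
    std::size_t dispatch_pending()
    {
        std::size_t handled = 0;
        while (!_pending.empty()) {
            std::size_t const id = _pending.front();
            _pending.pop();
            _handlers.at(id)();
            ++handled;
        }
        return handled;
    }
};

// Tiny VMM state machine that is driven purely by events.
struct Vmm
{
    enum class Vcpu { RUNNING, HANDLING_EXIT, IRQ_PENDING };

    Vcpu     vcpu  = Vcpu::RUNNING;
    unsigned exits = 0;

    void handle_vm_exit() { vcpu = Vcpu::HANDLING_EXIT; ++exits; }
    void handle_timer()   { vcpu = Vcpu::IRQ_PENDING; }
};
```

In this model a VM exit and a timer timeout look identical to the dispatcher, which is the point of treating a VM event as "just another event source".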

SLIDE 25

Envisioned vCPU handling

[Figure: VMM in user space with one entrypoint and vCPU0 ... vCPUn; event sources: kernel (VM event), timer (signal), network (signal)]

SLIDE 26

Envisioned vCPU handling - multi core

[Figure: VMM in user space on top of the kernel; entrypoint A with vCPU A0 ... vCPU An, entrypoint B with vCPU B0 ... vCPU Bn]

SLIDE 27

VM interface - kernel agnostic

[Figure: VMM in user space containing ld.lib.so with the VM interface, an entrypoint, and vCPU0 ... vCPUn; kernel below]

Genode -base- library with unified ABI in ld.lib.so

SLIDE 28

VM interface - kernel agnostic

  • VM connection/session → VM address space established
  • create_vcpu() - set up new vCPUs
  • cpu_state() - access to guest state
  • attach/detach() - memory management of the VM
  • VM_handler class - registration for VM event handling
  • run/pause() - control execution of vCPUs - non-blocking
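
The operations listed above can be pictured as one abstract interface with a per-kernel backend behind it. The following sketch borrows only the operation names from the slide; every signature, the `Cpu_state` layout, the use of `std::function` in place of the VM_handler class, and the `Fake_backend` with its `simulate_exit()` hook are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <map>
#include <utility>

// Simplified guest state; a real unified vCPU state carries far more.
struct Cpu_state { std::uint64_t ip = 0; unsigned exit_reason = 0; };

using Vcpu_id    = std::size_t;
using Vm_handler = std::function<void(Vcpu_id)>;  // stands in for VM_handler

// Kernel-agnostic view of a VM as the VMM would see it (signatures assumed).
struct Vm_interface
{
    virtual ~Vm_interface() = default;
    virtual Vcpu_id    create_vcpu(Vm_handler handler) = 0;
    virtual Cpu_state &cpu_state(Vcpu_id id) = 0;
    virtual void       attach(std::uint64_t guest_addr, std::uint64_t size) = 0;
    virtual void       detach(std::uint64_t guest_addr) = 0;
    virtual void       run(Vcpu_id id) = 0;    // returns immediately
    virtual void       pause(Vcpu_id id) = 0;  // returns immediately
};

// Stand-in backend; each kernel would supply its own implementation.
class Fake_backend : public Vm_interface
{
    struct Vcpu { Cpu_state state; Vm_handler handler; bool running; };

    std::deque<Vcpu>                       _vcpus;    // deque keeps references stable
    std::map<std::uint64_t, std::uint64_t> _regions;  // guest address -> size

public:
    Vcpu_id create_vcpu(Vm_handler handler) override
    {
        _vcpus.push_back(Vcpu{Cpu_state{}, std::move(handler), false});
        return _vcpus.size() - 1;
    }

    Cpu_state &cpu_state(Vcpu_id id) override { return _vcpus.at(id).state; }

    void attach(std::uint64_t guest_addr, std::uint64_t size) override { _regions[guest_addr] = size; }
    void detach(std::uint64_t guest_addr) override { _regions.erase(guest_addr); }

    void run(Vcpu_id id) override   { _vcpus.at(id).running = true; }
    void pause(Vcpu_id id) override { _vcpus.at(id).running = false; }

    bool        running(Vcpu_id id) const { return _vcpus.at(id).running; }
    std::size_t regions() const { return _regions.size(); }

    // Test hook: pretend the guest caused a VM exit with the given reason.
    void simulate_exit(Vcpu_id id, unsigned reason)
    {
        Vcpu &vcpu = _vcpus.at(id);
        vcpu.running = false;
        vcpu.state.exit_reason = reason;
        vcpu.handler(id);
    }
};
```

A VMM written against `Vm_interface` alone never sees which backend sits behind it, which mirrors the goal of running one VMM binary on several kernels.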

SLIDE 31

VM interface - kernel agnostic

[Figure: VMM with ld.lib.so VM interface (client); init; core with VM interface (server); VM session between client and server; kernel below; entrypoint with vCPU0 ... vCPUn]

Server: 200-400 LOC
Client: NOVA, seL4: ~500 LOC - Fiasco.OC: ~1000 LOC - hw: ~30 LOC

SLIDE 32

Control flow on Genode’s -hw- and NOVA

[Figure: two diagrams side by side. NOVA: VMM with UTCB; microhypervisor with UTCB and VMCS/VMCB; thread/vCPU labels: IPC call, IPC reply. Genode’s -hw- (ARM): VMM with UTCB and vCPU state; microkernel with UTCB and vCPU state; thread/vCPU labels: signal, run]

SLIDE 33

Control flow on Genode’s -hw- and NOVA

[Sequence diagram: event source (timer), VMM entrypoint, hw/NOVA kernel, VM with vCPU0 and vCPU1. Labels in order: VM exit → signal/IPC call → run/IPC reply (non-blocking) → VM resume → event (timeout) → pause/recall (non-blocking) → VM exit → signal/IPC call → run/IPC reply, inject vIRQ]

SLIDE 34

Control flow on seL4 and Fiasco.OC

[Figure: two diagrams side by side. seL4: VMM with IPCBuffer; microkernel with IPCBuffer, vCPU state, and VMCS; thread/vCPU labels: syscall, done, vmenter (blocking). Fiasco.OC: VMM with UTCB and vCPU state; microkernel with UTCB, vCPU state, and VMCS/VMCB; thread/vCPU labels: syscall, done, vmresume (blocking)]

SLIDE 35

Control flow on seL4 and Fiasco.OC

[Sequence diagram: event source (timer), VMM entrypoint, seL4/Fiasco.OC kernel, VM with vCPU0 and vCPU1. Labels: vmenter/vmresume, blocking syscall, VM resume]

  • Blocking syscall is unfortunate → complicates life
  • Kernels provide a mechanism to cancel
  • Avoid special-case handling in Genode for the first take
  • → Workaround: spawn an extra thread per vCPU
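
The workaround can be sketched with standard C++ threads: each vCPU gets a helper thread that performs the blocking call, so `run()` on the entrypoint side returns immediately and the exit is reported back as a notification. `Vcpu_handler`, the `blocking_enter` callback (standing in for vmenter/vmresume), and the `notify` callback (standing in for a signal to the entrypoint) are hypothetical; cancelling a call that is already blocked is not modelled here.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// One helper thread per vCPU absorbs the blocking kernel call.
class Vcpu_handler
{
    std::mutex              _mutex;
    std::condition_variable _cond;
    bool                    _run_requested = false;
    bool                    _stop          = false;
    unsigned                _exits         = 0;
    std::function<void()>   _blocking_enter;  // stands in for vmenter/vmresume
    std::function<void()>   _notify;          // stands in for a signal to the entrypoint
    std::thread             _thread;          // declared last: starts after all other members

    void _loop()
    {
        for (;;) {
            {
                std::unique_lock<std::mutex> lock(_mutex);
                _cond.wait(lock, [this] { return _run_requested || _stop; });
                if (_stop)
                    return;
                _run_requested = false;
            }
            _blocking_enter();  // blocks this helper thread only
            {
                std::lock_guard<std::mutex> lock(_mutex);
                ++_exits;
            }
            _notify();          // tell the entrypoint about the VM exit
        }
    }

public:
    Vcpu_handler(std::function<void()> blocking_enter, std::function<void()> notify)
    : _blocking_enter(std::move(blocking_enter)),
      _notify(std::move(notify)),
      _thread([this] { _loop(); })
    { }

    ~Vcpu_handler()
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _stop = true;
        }
        _cond.notify_one();
        _thread.join();
    }

    // Non-blocking from the caller's point of view.
    void run()
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _run_requested = true;
        }
        _cond.notify_one();
    }

    unsigned exits()
    {
        std::lock_guard<std::mutex> lock(_mutex);
        return _exits;
    }
};
```

The cost of this design is one extra thread and one extra hand-over per vCPU; in exchange the entrypoint keeps a single, uniform event loop on every kernel.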

SLIDE 36

Control flow on seL4 and Fiasco.OC

[Sequence diagram: event source (timer), VMM with entrypoint, Handler0, and Handler1, kernel, VM with vCPU0 and vCPU1. Labels per handler: run, run (non-blocking), vmenter/vmresume, VM resume]

SLIDE 37

Control flow on seL4 and Fiasco.OC

[Sequence diagram: event source (timer), entrypoint, Handler0, seL4/Fiasco.OC kernel, VM with vCPU0 and vCPU1. Labels in order: run (non-blocking) → vmenter/vmresume (blocking syscall) → VM resume → event (timeout) → pause → cancel vmenter/vmresume → vmenter/vmresume syscall returns → signal → run, inject vIRQ → vmenter/vmresume, inject vIRQ → VM resume]

SLIDE 44

VMM unit test

  • Control flow and exit handling on a few instructions
  • Multiple vCPUs, multiple EPs, multiple physical CPUs
  • seL4 v9.0:
    ◮ Kernel fault on VMEnter by non-vCPU thread → patch
    ◮ No unrestricted guest support → patch
    ◮ Scheduling bug if vCPU spins → starvation → patch
    ◮ Kernel refuses to boot on non-VT-x platforms → patch
  • → Working toy VMM on all 3 kernels
  • → No AMD support by seL4

SLIDE 47

Seoul VMM

  • Replaced all NOVA-specific parts
  • Simple Genode-based guests for testing
  • Running again after a few days on Genode/NOVA
  • Various debugging sessions on Fiasco.OC and seL4
    ◮ → war stories (backup slides)
    ◮ → 1 kernel patch for seL4 and 1 for Fiasco.OC
  • State: kernel-agnostic Seoul VMM on all 3 kernels
  • Guests: Genode VMs, Linux VM + network + SMP
  • seL4: kernel fault on Linux SMP VM → not investigated

SLIDE 48

VBox 5 VMM - current state

Work in progress - current state:

  • Kernel-agnostic VBox5 binary ready and runnable
  • NOVA: simple Genode VMs running again
  • seL4/Fiasco.OC: VM gets up, fails/hangs early

Known remaining challenges:

  • Guest FPU state access required
    ◮ Missing in VM interface
    ◮ Support by seL4 and Fiasco.OC unclear
  • seL4: no support for 64bit guests

SLIDE 50

Conclusion

Dare the impossible → possible*
* Restrictions depending on the kernel

Roadmap:

  • Finish VBox5 adaptation
  • Extend -hw- kernel with VT-x extensions
  • Optional: support other platforms, e.g. ARM

Benefits:

  • Portable VMMs across kernels
  • Genode users have the ultimate kernel choice

SLIDE 51

Thank you

Genode OS Framework: https://genode.org
Source code at GitHub: https://github.com/genodelabs/genode
Stories around Genode: https://www.genodians.org
Genode Labs GmbH: https://www.genode-labs.com

SLIDE 52

Backup

SLIDE 53

Seoul VMM - war stories I

  • Fiasco.OC: in-guest faults during the protected → paged mode transition
  • Reason: EFER state of the host taken instead of the guest's
    ◮ Fiasco.OC: can be runtime configured → good
    ◮ seL4: EFER register not saved on VM exit → kernel patch

SLIDE 54

Seoul VMM - war stories II

  • CR* shadow/mask handling required on seL4 & Fiasco.OC
  • Took some time, caused friction
  • Open issue:
    ◮ Kernels overwrite some bits in CR* to adhere to hardware requirements
    ◮ Overridden bits are not known/announced to the VMM
    ◮ CR* values read back contain hypervisor and VM modifications mixed together
    ◮ Leads to various invalid guest states
    ◮ Heuristics required - unexpected but manageable
    ◮ Job of Fiasco.OC/seL4 vs. VMM?
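
The shadow/mask mechanics behind this issue can be written down in a few lines. On VT-x, a guest read of CR0/CR4 returns the read-shadow bit wherever the guest/host mask is set and the real register bit elsewhere, and the hardware requires some bits to be 1 (fixed0) and allows only some bits to be 1 (fixed1). The helper names below are hypothetical, and the constants in the usage note are illustrative values rather than readings from a particular CPU.

```cpp
#include <cassert>
#include <cstdint>

// Value the guest observes when it reads the control register:
// host-owned bits (set in the mask) come from the read shadow,
// all other bits come from the real guest register.
constexpr std::uint64_t guest_view(std::uint64_t real, std::uint64_t mask, std::uint64_t shadow)
{
    return (shadow & mask) | (real & ~mask);
}

// Force a requested value into the range the hardware accepts.
// fixed0: bits that must be 1; fixed1: bits that are allowed to be 1.
constexpr std::uint64_t apply_fixed(std::uint64_t wanted, std::uint64_t fixed0, std::uint64_t fixed1)
{
    return (wanted | fixed0) & fixed1;
}

// Bits that differ between what the VMM asked for and what got programmed.
constexpr std::uint64_t overridden_bits(std::uint64_t wanted, std::uint64_t fixed0, std::uint64_t fixed1)
{
    return wanted ^ apply_fixed(wanted, fixed0, fixed1);
}
```

For example, with an illustrative `fixed0` of `0x80000021` (PG, NE, PE) and a guest that wants `0x10`, the programmed value becomes `0x80000031`. If the kernel reported `overridden_bits()` to the VMM, the VMM could use it as the mask and keep the guest's view at `0x10` without heuristics.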

SLIDE 55

Seoul VMM - war stories III

  • Another test VM: seL4 and NOVA worked fine; Fiasco.OC: invalid guest state
  • Long, long sessions of VM state diffs between kernels
  • Happens on the switch from protected → real mode
  • Root cause: vIRQ injection cannot be reset by the VMM on Fiasco.OC
  • Patching the kernel helps
