Virtualization BOF Isaku Yamahata <yamahata@valinux.co.jp> - - PowerPoint PPT Presentation



SLIDE 1

Virtualization BOF

Japan Linux Symposium October 23, 2009 Isaku Yamahata <yamahata@valinux.co.jp> <yamahata@private.email.ne.jp>

SLIDE 2

Agenda

  • Introduction
  • New chipset emulator in qemu
  • Other desired features
  • QEMU
  • (Any other virtualization topics?)
SLIDE 3

Introduction

  • Active development in virtualization has shifted from the CPU itself to the surrounding areas.

– E.g., I/O, guest firmware, FT/HT

  • Virtualization targets have diversified from servers to desktop, client, and embedded systems.

– So have the required features.

SLIDE 4

Introduction(cont.)

  • QEMU is the key component of I/O emulation

– QEMU is heavily used by virtualization projects as a device emulator.

– Topics here are mainly I/O device emulation and guest firmware.

  • Other virtualization-related topics are also welcome.

SLIDE 5

New chipset emulator for new hardware features

SLIDE 6

Background

  • Qemu currently emulates

– For Pentium Pro/II/III
– North bridge: I440FX
– South bridge: PIIX3 (and PIIX4 for ACPI power management and PCI hot plug)
– Hardware release date: May 1996

  • Too old compared with the features of new real hardware

SLIDE 7

Motivation

  • PCI

– Qemu supports only part of the PCI specs; e.g., 64bit BAR is not supported

– More buses/slots

  • Qemu supports only host buses (for PC emulation)
  • 3+ PCI buses (96+ slots) / 96+ PCIe slots
  • Bridge emulation: filtering
  • PCI express isn't supported

– PCI express has more advanced features
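
The bridge filtering item above can be illustrated with a small sketch. It assumes the standard PCI-to-PCI bridge rule that a configuration access is forwarded downstream only when the target bus number lies between the bridge's secondary and subordinate bus numbers. The `Bridge` class and `max_slots` helper are hypothetical names for illustration, not qemu code.

```python
from dataclasses import dataclass

SLOTS_PER_BUS = 32  # a PCI bus addresses up to 32 devices (slots)


@dataclass
class Bridge:
    """Hypothetical model of a PCI-to-PCI bridge's bus number registers."""
    primary: int      # bus on the upstream side
    secondary: int    # bus directly behind the bridge
    subordinate: int  # highest bus number reachable behind the bridge

    def routes_to(self, bus: int) -> bool:
        """Return True if a config access for `bus` is forwarded downstream."""
        return self.secondary <= bus <= self.subordinate


def max_slots(num_buses: int) -> int:
    """Upper bound on slots when every bus is fully populated."""
    return num_buses * SLOTS_PER_BUS


bridge = Bridge(primary=0, secondary=1, subordinate=2)
print(bridge.routes_to(2))  # bus 2 is behind the bridge
print(bridge.routes_to(3))  # bus 3 is outside the range, so it is filtered
print(max_slots(3))         # three buses correspond to the 96 slots above
```

A real bridge also filters memory and I/O transactions by address window; only the bus number range is modelled here.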

SLIDE 8

Motivation(cont.)

  • Native direct attachment of PCI express devices to the guest OS

– Currently a device can be attached only as a PCI device

  • Xen calls it PCI passthrough
  • KVM calls it device assignment

– PCI express has more features

  • Need to fill the gap between newer real hardware features and qemu emulation, mainly in PCI-related areas.

SLIDE 9

Challenges

  • Chicken and egg

– qemu emulated devices don't use those new features

– Until the new features are provided, qemu emulated devices won't use them

  • Testing

– Testing those features without real users
– Variety of targets: PCI is used by many targets
– PCI express direct attach is one way to test, but Xen's and KVM's qemu trees aren't based on the very unstable version

SLIDE 10

Why PCI express?

[Photo: cited from Wikipedia]

SLIDE 11

Features from software point of view

  • MMCONFIG (>0xff configuration space)
  • PCI express native hot plug (not ACPI based)
  • AER (Advanced Error Reporting)
  • ARI (Alternative Routing-ID Interpretation)
  • PCI express native power management
SLIDE 12

PCI express extended configuration space

[Figure: the PCI configuration space covers offsets 0x00 to 0xff. The PCI express extended configuration space extends it up to 0xfff and is accessed through the PCI express enhanced configuration access mechanism (ECAM).]
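
Features such as AER and ARI live in the extended region as a linked list of extended capabilities. The sketch below walks that list on a synthetic 4KB buffer. It assumes the header layout from the PCI Express Base Specification (capability ID in bits 15:0, version in bits 19:16, next offset in bits 31:20, first header at 0x100). The function names and the sample data are hypothetical.

```python
import struct

EXT_CAP_START = 0x100       # extended capabilities begin after the 256-byte PCI space
CONFIG_SPACE_SIZE = 0x1000  # 4KB of configuration space per function

# A few extended capability IDs (assumed from the PCI Express Base Specification).
EXT_CAP_NAMES = {0x0001: "AER", 0x0003: "Device Serial Number",
                 0x000B: "VSEC", 0x000E: "ARI"}


def make_header(cap_id, version, next_offset):
    """Pack an extended capability header (helper for the synthetic example)."""
    return cap_id | (version << 16) | (next_offset << 20)


def walk_ext_caps(cfg):
    """Yield (offset, cap_id, version) for each extended capability in cfg."""
    offset = EXT_CAP_START
    seen = set()  # guards against a malformed, looping list
    while EXT_CAP_START <= offset < CONFIG_SPACE_SIZE and offset not in seen:
        seen.add(offset)
        (header,) = struct.unpack_from("<I", cfg, offset)
        if header in (0, 0xFFFFFFFF):  # no capability present
            break
        yield offset, header & 0xFFFF, (header >> 16) & 0xF
        offset = (header >> 20) & 0xFFC  # next pointer, dword aligned


# Synthetic config space: AER at 0x100 linking to ARI at 0x140.
cfg = bytearray(CONFIG_SPACE_SIZE)
struct.pack_into("<I", cfg, 0x100, make_header(0x0001, 1, 0x140))
struct.pack_into("<I", cfg, 0x140, make_header(0x000E, 1, 0))
for offset, cap_id, version in walk_ext_caps(cfg):
    print(hex(offset), EXT_CAP_NAMES.get(cap_id, hex(cap_id)), version)
```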

SLIDE 13

PCIe MMCONFIG

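[Figure: within the physical address space (0x0 to 0xFFFF FFFF), the MMCFG window (max 256MB) starts at the MCFG base address. Each function's PCI express extended configuration space (offsets 0x0 to 0xfff) is mapped inside that window.]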
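
A minimal sketch of how an MMCONFIG address is formed, assuming the standard ECAM layout (bus in bits 27:20, device in bits 19:15, function in bits 14:12, register offset in bits 11:0). The base address 0xE0000000 is an arbitrary example value, not one taken from the slides.

```python
CONFIG_SPACE_SIZE = 0x1000  # 4KB per function

# 256 buses * 32 devices * 8 functions * 4KB = 256MB, the maximum window size
MMCFG_MAX_SIZE = 256 * 32 * 8 * CONFIG_SPACE_SIZE


def ecam_address(base, bus, device, function, offset):
    """Return the physical address of a config register inside the MMCONFIG window."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    if not 0 <= offset < CONFIG_SPACE_SIZE:
        raise ValueError("offset outside the 4KB configuration space")
    return base + ((bus << 20) | (device << 15) | (function << 12) | offset)


base = 0xE0000000  # example MCFG base address (assumption)
print(hex(ecam_address(base, 1, 2, 3, 0x100)))
print(MMCFG_MAX_SIZE // (1024 * 1024), "MB")
```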

SLIDE 14

PCI express hot plug

[Figure: a PCI express switch with one upstream port and three downstream ports, each downstream port leading to a PCI express slot. Slot controls shown: power indicator, attention indicator, attention button.]
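
The power and attention indicators are driven through the Slot Control register of the downstream port. The sketch below decodes that register, assuming the field layout of the PCI Express Base Specification (attention indicator in bits 7:6, power indicator in bits 9:8, power controller in bit 10). `decode_slot_control` is a hypothetical helper, and the bit positions should be checked against the specification before being relied on.

```python
# Indicator field encodings (assumed): 01b = on, 10b = blink, 11b = off
INDICATOR_STATES = {1: "on", 2: "blink", 3: "off"}

ATTN_IND_SHIFT = 6       # attention indicator control, bits 7:6
PWR_IND_SHIFT = 8        # power indicator control, bits 9:8
POWER_OFF_BIT = 1 << 10  # power controller control: set means slot power off


def decode_slot_control(value):
    """Decode the hot plug related fields of a 16-bit Slot Control value."""
    attn = (value >> ATTN_IND_SHIFT) & 0x3
    power = (value >> PWR_IND_SHIFT) & 0x3
    return {
        "attention_indicator": INDICATOR_STATES.get(attn, "reserved"),
        "power_indicator": INDICATOR_STATES.get(power, "reserved"),
        "slot_power": "off" if value & POWER_OFF_BIT else "on",
    }


# Power indicator blinking, attention indicator off, slot power on:
# a typical state while a newly inserted device is being powered up.
print(decode_slot_control(0x02C0))
```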

SLIDE 15

New chipset emulator

  • Based on the Q35 chipset

– For Core2 Duo
– North bridge: MCH
– South bridge: ICH9
– Release date: Sep 2007
– In fact, I chose Q35 because I have one at hand.

  • Newer chipsets (GMCH/IOH, ICH10) have mostly the same features from the emulation point of view, except graphics.

– It currently lacks IOMMU/graphics emulation, so should it be called P45?

SLIDE 16

New chipset emulator(cont.)

  • The following are now working

– 64bit BAR
– PCI express MMCONFIG
– BIOS updates (MCFG, e820)
– Linux boots happily using MMCONFIG

  • Other OSes haven't been tested.
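
To make the 64bit BAR item concrete, here is a minimal sketch of BAR decoding. It assumes the standard PCI encoding: bit 0 selects I/O space, bits 2:1 give the memory type (10b means 64bit), bit 3 marks prefetchable memory, and a 64bit BAR takes its upper 32 bits from the next BAR register. `decode_bar` is a hypothetical helper, not the qemu implementation.

```python
MEM_TYPE_64BIT = 0x2  # value of bits 2:1 for a 64bit memory BAR


def decode_bar(bars, index):
    """Decode the BAR at `index` from a list of raw 32-bit BAR register values."""
    low = bars[index]
    if low & 0x1:
        # I/O space BAR: bits 1:0 are flags, the rest is the port address
        return {"kind": "io", "address": low & 0xFFFFFFFC, "registers": 1}
    mem_type = (low >> 1) & 0x3
    info = {
        "kind": "memory",
        "is_64bit": mem_type == MEM_TYPE_64BIT,
        "prefetchable": bool(low & 0x8),
        "address": low & 0xFFFFFFF0,  # low 4 bits are flags
        "registers": 1,
    }
    if info["is_64bit"]:
        # The following BAR register supplies address bits 63:32.
        info["address"] |= bars[index + 1] << 32
        info["registers"] = 2
    return info


bars = [0xFE00000C, 0x00000001, 0x0000C001, 0, 0, 0]
print(decode_bar(bars, 0))  # 64bit prefetchable memory BAR above 4GB
print(decode_bar(bars, 2))  # I/O BAR
```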
SLIDE 17

BIOS

  • ACPI MCFG to specify MMCONFIG area
  • E820 update

– Make the e820 code 64bit aware.

  • So far it has filled the upper bits with zero.

– Linux requires the MCFG area to be covered by an e820 reserved area.

– Otherwise Linux treats it as a BIOS bug and avoids using MMCONFIG.
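
The coverage requirement can be sketched as an interval check. This is a simplified illustration, not the actual Linux code: it treats the e820 map as a list of `(address, length, type)` tuples and assumes type 2 means reserved. The map contents and the MMCONFIG base are example values.

```python
E820_RAM = 1
E820_RESERVED = 2


def covered_by_reserved(e820, start, size):
    """Return True if [start, start + size) lies entirely in reserved e820 entries."""
    end = start + size
    regions = sorted((addr, addr + length)
                     for addr, length, kind in e820 if kind == E820_RESERVED)
    cursor = start  # everything below cursor is known to be covered
    for region_start, region_end in regions:
        if region_end <= cursor:
            continue          # entry lies entirely below the uncovered part
        if region_start > cursor:
            break             # gap: part of the range is not reserved
        cursor = region_end
        if cursor >= end:
            return True
    return cursor >= end


e820 = [
    (0x00000000, 0x0009FC00, E820_RAM),
    (0xE0000000, 0x10000000, E820_RESERVED),  # 256MB MMCONFIG window
    (0x100000000, 0x40000000, E820_RAM),      # RAM above 4GB needs 64bit entries
]
print(covered_by_reserved(e820, 0xE0000000, 0x10000000))
print(covered_by_reserved(e820, 0xE8000000, 0x10000000))
```

The first call reports the window as covered; the second range runs past the reserved entry, which is the case where Linux would stop trusting MMCONFIG.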

SLIDE 18

BIOS(cont.)

  • PCI initialization

– Teach the BIOS about the new chipset
– PCI I/O and memory area assignment for multiple PCI buses

SLIDE 19

ACPI

  • ACPI tables update

– FADT
– MCFG
– DSDT

  • PCI express (PNP0A08)
  • PCI routing table
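
As an illustration of the MCFG table, the sketch below builds one with a single allocation entry. It assumes the standard layout: a 36-byte ACPI table header, 8 reserved bytes, then 16-byte entries holding the base address, PCI segment, and start/end bus numbers. The OEM and creator strings and the base address are placeholder values, and `build_mcfg` is a hypothetical helper rather than the BIOS code.

```python
import struct

ACPI_HEADER_LEN = 36
CHECKSUM_OFFSET = 9  # after signature (4), length (4), revision (1)


def build_mcfg(base, segment=0, start_bus=0, end_bus=255):
    """Return the bytes of an MCFG table with one MMCONFIG allocation entry."""
    # Entry: base address (8), segment (2), start bus (1), end bus (1), reserved (4)
    entry = struct.pack("<QHBB4x", base, segment, start_bus, end_bus)
    length = ACPI_HEADER_LEN + 8 + len(entry)
    header = struct.pack(
        "<4sIBB6s8sI4sI",
        b"MCFG", length,
        1,            # table revision
        0,            # checksum placeholder, filled in below
        b"EXAMPL",    # OEM ID (placeholder)
        b"EXAMPLE ",  # OEM table ID (placeholder)
        1,            # OEM revision
        b"EXMP",      # creator ID (placeholder)
        1,            # creator revision
    )
    table = bytearray(header + bytes(8) + entry)
    # ACPI requires all bytes of the table to sum to zero modulo 256.
    table[CHECKSUM_OFFSET] = (-sum(table)) & 0xFF
    return bytes(table)


table = build_mcfg(0xE0000000)
print(len(table), table[:4], sum(table) % 256)
```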
SLIDE 20

Future work:PCI express

  • PCI express hot plug will be provided as a PCIe switch emulator (not integrated into the chipset)

– Many (96+) ports wanted

  • ARI (Alternative Routing-ID Interpretation)
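
A minimal sketch of what ARI changes, assuming the standard 16-bit routing ID with the bus number in bits 15:8. Without ARI the low byte splits into a 5-bit device and a 3-bit function number; with ARI the device number is implied to be 0 and the whole low byte is the function number, allowing up to 256 functions. `decode_routing_id` is a hypothetical helper.

```python
def decode_routing_id(rid, ari=False):
    """Split a 16-bit PCI routing ID into bus, device, and function numbers."""
    bus = (rid >> 8) & 0xFF
    if ari:
        # ARI: device is implicitly 0, function number uses all 8 low bits
        return {"bus": bus, "device": 0, "function": rid & 0xFF}
    return {"bus": bus, "device": (rid >> 3) & 0x1F, "function": rid & 0x7}


rid = 0x0129
print(decode_routing_id(rid))            # traditional interpretation
print(decode_routing_id(rid, ari=True))  # ARI interpretation
```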
SLIDE 21

Future work:PCI express(cont.)

  • PCI express native direct attach

– PCI express specific configuration registers should be virtualized

  • Device serial number cap, VSEC...

– AER (Advanced Error Reporting): passing errors to the guest OS

– Power management

  • Multi PCI domain?

– More slots

SLIDE 22

Future work:BIOS

  • pcbios (bochs bios) vs seabios

– Pcbios is from bochs.
– Seabios is cleaner and has more features.

  • Qemu is switching from pcbios to seabios

– Qemu currently uses pcbios, so the patches were written for pcbios.

– The Qemu 0.12.0 release will use seabios instead of pcbios.

– So patches for seabios are necessary for merging.

SLIDE 23

Future work:ACPI

  • The code change is small; however, the ACPI table change would be large.

– Have two tables (more in future?) and switch between them dynamically?

– Pass tables from outside qemu, say, by a command line option.

  • Requires an interface between qemu and bios
  • fw_cfg

– Dynamically generate ACPI code?

  • coreboot does.
SLIDE 24

Future work: Direct attach in qemu?

  • Does qemu want the feature?

– Hopefully consolidate xen and kvm code.

SLIDE 25

Summary: current status

  • PCI express

– PCIe MMCONFIG: working, under heavy review for merge
– Q35 chipset: working, waiting for PCIe MMCONFIG
– PCIe port emulator: WIP
– PCIe native hotplug: WIP
– PCIe passthrough: WIP
– 3+ PCI buses: working

  • pcbios

– MCFG: working
– e820: working
– Host bridge initialization: working
– PCI I/O and memory space initialization: working
– Switching ACPI tables, or passing ACPI tables from outside qemu: WIP

  • seabios: not started yet

SLIDE 26

Other desired features?

SLIDE 27

Other hot plug

  • SATA/eSATA hotplug?

– AHCI

SLIDE 28

Other feature: IOMMU

  • IOMMU: Intel VT-d, AMD IOMMU
  • Usage model?

– Guest OS wants IOMMU?

  • IOMMU emulator in qemu

– Implementation will be interesting.

  • Shadow paging of IOMMU for guest OS

– At the moment, DMA fault and restart isn't possible due to the PCI specification.

SLIDE 29

Other feature: graphics

  • Integrated Graphics of gmch

– Anyway GPU support is highly wanted.

  • GPU passthrough
SLIDE 30

Other feature: APICs

  • IOAPIC

– IOAPIC is performance critical, so IOAPIC emulation is done in the kernel/hypervisor.

– Does it make sense to address IOAPIC in qemu?
– More than 24 pins
– Multi IOAPIC
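
One way to picture the multi IOAPIC item: each IOAPIC owns a contiguous range of global system interrupts (GSIs), starting at a per-IOAPIC base as described in the ACPI MADT. The sketch below maps a GSI to an IOAPIC index and pin. The two 24-pin IOAPICs are an example configuration and `gsi_to_pin` is a hypothetical helper, not existing qemu or kernel code.

```python
def gsi_to_pin(ioapics, gsi):
    """Map a GSI to (ioapic_index, pin) given a list of (gsi_base, num_pins)."""
    for index, (gsi_base, num_pins) in enumerate(ioapics):
        if gsi_base <= gsi < gsi_base + num_pins:
            return index, gsi - gsi_base
    raise ValueError(f"GSI {gsi} is not routed to any IOAPIC")


# Example: two IOAPICs with 24 pins each, giving 48 interrupt inputs in total
ioapics = [(0, 24), (24, 24)]
print(gsi_to_pin(ioapics, 9))   # pin on the first IOAPIC
print(gsi_to_pin(ioapics, 30))  # pin on the second IOAPIC
```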

SLIDE 31

Other feature: firmware

  • gPXE

– More device support

  • igb, igbvf
  • Guest UEFI

– Tristan Gingold created a guest UEFI using edk2.tianocore.org

SLIDE 32

HT/FT

  • Kemari
SLIDE 33

PCI DMA fault/restart?

  • DMA fault/restart?
SLIDE 34

QEMU

SLIDE 35

Long term(?) qemu feature

  • Nested VT-x

– AMD SVM is there

  • Threading QEMU

– For guest SMP
– For scalability

  • Machine config file

– Allow more flexible machine definitions

  • For more complex pci bus topology
  • Device tree
SLIDE 36

Thank you

SLIDE 37

backup

SLIDE 38

Qemu next release

  • Qemu 0.11.0 released
  • Now planning for 0.12.0

– Anthony Liguori is thinking of aiming for early to mid December

– Three month cycle

SLIDE 39

Planned features

  • Qdev
  • Vmstate
  • seabios switch
  • gPXE switch
  • KVM

– In-kernel APIC
– Guest SMP

  • Multiport virtio-console
SLIDE 40

Planned features

  • Machine monitor protocol

– Robust UI for human and machine
– QObject

  • NEC PC-9821