
Vhost and VIOMMU

Jason Wang <jasowang@redhat.com> (Wei Xu <wexu@redhat.com>) Peter Xu <peterx@redhat.com>


Agenda

  • IOMMU & Qemu vIOMMU background
  • Motivation of secure virtio
  • DMAR (DMA Remapping)

– Design overview
– Implementation illustration
– Performance optimization
– Vhost device IOTLB

  • IR (Interrupt Remapping)
  • Performance results & status

IOMMU & QEMU vIOMMU Revisited

  • What is an IOMMU?

A hardware component that provides two main functions: I/O translation and device isolation.

  • How I/O translation and device isolation are supported by the IOMMU

DMA Remapping (DMAR): I/O addresses presented by devices are translated to physical addresses on the fly, coupled with access permissions, so devices are limited to accessing specific regions of memory.

Interrupt Remapping (IR): some architectures also support interrupt remapping, in a manner similar to memory remapping.

  • What is a QEMU vIOMMU?

An emulated IOMMU that behaves like a real one.

Its functionality is always a subset of a physical unit's, depending on the implementation.

Only the Intel, PPC, and sun4m IOMMUs are currently supported in QEMU.



IOMMU and vIOMMU

(Diagram: the VM contains vCPUs, a vMMU, a vIOMMU, emulated devices, and guest memory; the host mirrors this with CPUs, an MMU, an IOMMU, hardware devices, and host memory.)


Motivation

  • Security, security, and security.
  • DPDK: userspace poll-mode drivers for virtio-net devices are widely used in NFV.
  • Vhost is the popular backend for most use cases.
  • Vhost is still outside the IOMMU's scope.

DMA Remapping (DMAR)


Virtio-Net Device Address Space Overview

(Diagram: the guest virtio-net driver places gpa values in the vring; on tx/rx the backends (vhost-net, vhost-user, and other virtio-net backends) translate gpa to hva through QEMU's memory API to reach the guest pages.)
Design of Secure Virtio-Net Device Driver

(Diagram: with a vIOMMU, the guest virtio-net driver programs iova values into the vring via the DMA API; on tx/rx the backends (vhost-net, vhost-user, and others) translate iova to hva through the vIOMMU's IOTLB API (iotlb entry lookup) before touching the guest pages.)


Implementation: Guest

  • Guest

Boot the guest with a vIOMMU assigned.

VIRTIO_F_IOMMU_PLATFORM: if the device offers this feature bit, the guest virtio driver is forced to use the DMA API to manage all of the device's DMA memory accesses; otherwise the system forcibly disables the device.
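
To make this concrete, here is a minimal sketch of what "using the DMA API" means for a single buffer. It is illustrative only: real virtio drivers map buffers through the virtqueue helpers, and map_tx_buffer plus the bare dev pointer are assumptions, not actual kernel code.

#include <linux/dma-mapping.h>

/* Hypothetical helper: map one tx buffer for device DMA.
 * 'dev' is assumed to be the virtio device's struct device. */
static dma_addr_t map_tx_buffer(struct device *dev, void *buf, size_t len)
{
        /* With VIRTIO_F_IOMMU_PLATFORM, the returned dma_addr_t is an
         * iova that the vIOMMU will translate, not a guest physical
         * address, so the backend can no longer assume addr == gpa. */
        dma_addr_t iova = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

        if (dma_mapping_error(dev, iova))
                return 0; /* mapping failed */
        return iova;
}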


Implementation: Qemu and Backends

  • Qemu

DMA address translation for the vIOMMU is already fully supported; unfortunately, virtio-pci devices still go through the plain memory address space and never use iova at all, so they are switched to the DMA address space (iova).

  • Backends

Every address used to access the vring must be translated from guest iova to hva; this is done via an IOTLB lookup mediated by the vIOMMU.
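
A hedged sketch of the QEMU-side switch, reading one vring field through the device's DMA address space instead of the global memory space. virtio_get_dma_as() is the accessor later QEMU releases provide for this; vring_fetch_avail_idx and the exact call shapes here are illustrative, not the actual patch.

#include "sysemu/dma.h"

/* Illustrative: fetch the avail index via the device's DMA address
 * space, which is routed through the vIOMMU when one is present. */
static uint16_t vring_fetch_avail_idx(VirtIODevice *vdev, hwaddr pa)
{
    AddressSpace *dma_as = virtio_get_dma_as(vdev);
    uint16_t idx;

    /* The old code read through &address_space_memory (gpa only). */
    dma_memory_read(dma_as, pa, &idx, sizeof(idx));
    return le16_to_cpu(idx);
}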


More optimization: Vhost Device IOTLB Cache

  • Why does this matter for vhost?

Vhost-net is the most capable and reliable in-kernel network backend, and it is widely preferred.

  • What problem does vhost encounter?

The vIOMMU's IOTLB API is implemented in QEMU, while vhost works in the kernel; frequent IOTLB translations crossing between kernel and userspace would hurt performance dramatically.

  • How does vhost survive?

A kernel-side device IOTLB cache (ATS), sketched below.
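
A minimal sketch of the cached fast path. iotlb_cache_find() and send_iotlb_miss() are illustrative stand-ins for the interval-tree lookup and the vhost-fd miss message described on the following slides, not the actual symbols in drivers/vhost.

struct iotlb_entry {
        u64 iova_start, iova_last;  /* covered iova range           */
        u64 uaddr;                  /* userspace (hva) base address */
        u8  perm;                   /* RO / WO / RW bit flags       */
};

static void __user *iova_to_uaddr(struct vhost_dev *dev, u64 iova, u8 perm)
{
        struct iotlb_entry *e = iotlb_cache_find(dev, iova);

        if (e && (e->perm & perm) == perm)
                /* Hit: translate entirely inside the kernel,
                 * no round trip to QEMU. */
                return (void __user *)(e->uaddr + (iova - e->iova_start));

        /* Miss: ask QEMU to translate, then retry once the
         * corresponding update message arrives. */
        send_iotlb_miss(dev, iova, perm);
        return NULL;
}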


Address Translation Services (ATS) Overview

(Diagram: a PCIe device sends an ATS request to the Translation Agent (TA) in the root complex; the ATS completion returns the translation, which the device keeps in its local device IOTLB cache for later accesses to memory.)


Why Address Translation Services (ATS)?

  • Alternative

A standalone VT-d implementation inside vhost; drawbacks:

  • Code duplication.
  • Vendor and architecture specific.
  • A new API for error reporting.

  • Benefits of ATS

Part of the PCIe spec.

Platform independent.

Easily achieved on top of the current IOMMU infrastructure.


Vhost Device IOTLB Cache Workflow

(Diagram: on tx/rx, vhost translates iova 'd' by looking it up in its device IOTLB cache, an interval tree holding entries such as (a, size, ro), (b, size, wo), (c, size, rw), (d, size, wo). A lookup miss sends an iotlb-miss 'd' message over the vhost IOTLB API to QEMU, which answers with an iotlb-update 'd' for a legal address range or an error report for an illegal one; when the guest unmaps 'c', QEMU sends an iotlb-invalidate 'c'.)


Vhost Device IOTLB Implementation Summary

  • Implementation
  • Save device IOTLB cache entries in the kernel.
  • Look up entries from the cache when accessing virtio buffers.
  • Request QEMU to translate on demand for any TLB miss.
  • Process update/invalidate messages from QEMU and maintain the kernel cache correctly.

  • Data Structure and Userspace/Kernel Interface
  • An interval tree is chosen to store the dynamic device IOTLB cache entries.
  • A message mechanism over vhost fd read/write is used to pass the vATS request and reply (message layout sketched below).
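
The message format that eventually landed upstream looks roughly like the sketch below, modeled on struct vhost_iotlb_msg from the Linux uapi headers; consult linux/vhost.h for the authoritative layout, since this talk predates the final merge.

struct vhost_iotlb_msg {
        __u64 iova;   /* guest I/O virtual address             */
        __u64 size;   /* length of the mapping                 */
        __u64 uaddr;  /* QEMU virtual address (hva) it maps to */
#define VHOST_ACCESS_RO 0x1
#define VHOST_ACCESS_WO 0x2
#define VHOST_ACCESS_RW 0x3
        __u8 perm;
#define VHOST_IOTLB_MISS        1  /* kernel -> QEMU: please translate  */
#define VHOST_IOTLB_UPDATE      2  /* QEMU -> kernel: add this mapping  */
#define VHOST_IOTLB_INVALIDATE  3  /* QEMU -> kernel: drop this mapping */
#define VHOST_IOTLB_ACCESS_FAIL 4  /* kernel -> QEMU: access error      */
        __u8 type;
};

QEMU reads miss requests from the vhost fd and writes updates or invalidations back through the same fd, so no new ioctl is needed on the hot path.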


Interrupt Remapping (IR)


X86 system interrupts

(Diagram: line-based interrupts from the PCI bus go to the IOAPIC; signal-based interrupts (MSI/MSI-X) travel over the system bus; both end up at the processors' local APICs.)

Kinds of interrupts:
  • Line-based (edge/level)
  • Signal-based (MSI/MSI-X)

IRQ chips:
  • IOAPIC
  • Local APICs (LAPICs)


IR challenges for vhost

  • Interrupt remapping (IR) still not supported for x86 vIOMMU

– MSI and IOAPIC interrupts

  • Kernel irqchip support:

– How to define the interface between user and kernel space?
– How to enable the vhost fast IRQ path (irqfd)?

  • Performance impact?
  • Interrupt caching

IOAPIC interrupt delivery

  • Workflow before IR:

Fill in IOAPIC entry with interrupt information (trigger mode, destination ID, destination mode, etc.).

When the line is triggered, the interrupt is sent to the CPU with the information stored in the IOAPIC entry.

  • Workflow after IR (IRTE: Interrupt Remapping Table Entry):

Fill in IRTE with interrupt information (in system memory).

Fill in IOAPIC entry with IRTE index.

When the line is triggered, fetch the IRTE index from the IOAPIC entry and send the interrupt with the information stored in that IRTE.
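
The two delivery paths can be condensed into a self-contained C sketch; the types, table sizes, and helpers are illustrative, not kernel or QEMU symbols.

#include <stdint.h>
#include <stdbool.h>

struct rte  { uint8_t vector; uint8_t dest; uint16_t irte_index; };
struct irte { uint8_t vector; uint8_t dest; };
struct msi_info { uint8_t vector; uint8_t dest; };

static struct rte  redir_table[24];   /* IOAPIC redirection table      */
static struct irte remap_table[256];  /* IR table in system memory     */
static bool ir_enabled;

static struct msi_info deliver_ioapic_irq(int pin)
{
        struct rte e = redir_table[pin];

        if (!ir_enabled)
                /* Pre-IR: vector and destination live in the RTE. */
                return (struct msi_info){ e.vector, e.dest };

        /* With IR: the RTE only carries an index; the interrupt
         * information sits in the IRTE in system memory. */
        struct irte r = remap_table[e.irte_index];
        return (struct msi_info){ r.vector, r.dest };
}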


MSI/MSI-X delivery

(Diagram: without IR, an MSI request is parsed and delivered directly; with IR, the MSI request carries an index into the Interrupt Remapping Table, the matching IRTE is looked up, and the interrupt is delivered using the information stored there.)


IR with kernel-irqchip

  • We want interrupts “as fast as before”.
  • Current implementation:

– Leverage the existing GSI routing table in KVM
– Instead of translating “on the fly”, translate during setup (sketched below)
– Easy to implement (no KVM change required)
– Little performance impact (slow setup, fast delivery)
– Only supports “split|off” kernel irqchip, not “on”
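
A sketch of "translate during setup", with illustrative names: viommu_ir_translate() and kvm_add_gsi_route() stand in for the vIOMMU IR lookup and the routing-table update QEMU performs via kvm_irqchip_add_msi_route().

/* Illustrative: translate an MSI route once, when it is installed. */
int setup_irqfd_route(VIOMMU *viommu, KVMState *kvm, MSIMessage msg)
{
    /* Run the guest-programmed MSI message through the vIOMMU's
     * interrupt remapping table at setup time... */
    MSIMessage translated = viommu_ir_translate(viommu, msg);

    /* ...and install the translated message in KVM's GSI routing
     * table, so every later irqfd injection skips translation. */
    return kvm_add_gsi_route(kvm, translated);
}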


Remap irqfd interrupts

  • Fast IRQ path for vhost devices: without remapping

(Diagram: QEMU installs the raw MSI messages 1-4 in KVM's GSI routing table at setup time; when vhost signals an event through the guest notifier (irqfd), KVM injects the IRQ into the guest directly, without bouncing through QEMU.)


Remap irqfd interrupts (cont.)

  • Fast IRQ path for vhost devices: with remapping

(Diagram: the same fast path, except QEMU now installs the translated MSI messages 1-4 in KVM's GSI routing table at setup time; vhost's irqfd injections are then delivered to the guest with no per-interrupt remapping work.)


All in all...

  • To boot guest with DMAR and IR enabled:

(Possibly one extra flag to enable DMAR for guest virtio driver)

qemu-system-x86_64 -M q35,accel=kvm,kernel-irqchip=split \
  -device intel-iommu,intremap=on \
  -netdev tap,id=tap1,script=no,downscript=no,vhost=on \
  -device virtio-net-pci,netdev=tap1,disable-modern=off,ats=on

Vhost + VIOMMU Performance

  • For dynamic DMA mapping (e.g., using generic Linux kernel drivers):

– Performance dropped drastically
– TCP_STREAM: 24500 Mbps → 600 Mbps
– TCP_RR: 25000 trans/s → 11600 trans/s

  • For static DMA mapping (e.g., DPDK based application like l2fwd)

– Around 5% throughput drop (pktgen)
– Still more work TBD...


Current status & TBDs

  • DMAR/IR upstream status:

QEMU: IR merged (Peter Xu), DMAR still RFC (Jason Wang will post formal patch soon)

Vhost & Virtio driver: merged (Michael S. Tsirkin/Jason Wang)

DPDK: vhost-user IOTLB is being developed (Victor Kaplansky)

  • TBDs

Performance tuning for DMAR

Quite a few enhancements for IR: explicit cache invalidations, better error handling, etc.


Thanks!


Appendix


Kernel-irqchip: a review

  • Command line interface:

qemu-system-x86_64 -M q35,kernel-irqchip={on|off|split}

  • Supported modes

Mode      IOAPIC        APIC
“on”      In kernel     In kernel
“split”   In userspace  In kernel
“off”     In userspace  In userspace