SLIDE 1

Virtually Persistent Data

Andrew Warfield, XenSource

SLIDE 2

Quick Overview

  • Blktap driver overview/update
    – Performance vs. Safety
    – Architecture
    – Tapdisks
  • Block consistency and live migration
    – Problem, current solution, future solution

SLIDE 3

Why BlkTap?

  • It turns out that performance isn’t the only requirement for VM storage.
  • Other requirements:
    – Correctness
    – Manageability (e.g. file-based disks)
    – Crazier things (CoW/encryption/network)
  • Doing these things at the Linux block layer is tricky.

SLIDE 4

Example: blkback + loopback

  • Go through safety concerns, mention the fix in loopback2, also mention QoS/block scheduler issues.

[Diagram: Example 1, normal block device operation: blkfront (DomU) -> blkback (Dom0) -> disk]

SLIDE 5

Example: blkback + loopback

  • Blkback preserves the semantics of the physical disk.
  • DomU is notified of completion only once the data is really written (see the sketch below).

[Diagram: Example 1, normal block device operation: blkfront (DomU) -> blkback (Dom0) -> disk]
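
To make the safety point concrete, here is a minimal sketch, assuming a plain file-descriptor backend; the name vbd_write and the completion step are illustrative, not blkback's actual code:

/* Sketch: acknowledge a guest write only after the data is durable,
 * mirroring the disk semantics blkback preserves. vbd_write is a
 * hypothetical name, not a blkback function. */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

int vbd_write(int fd, const void *buf, size_t len, off_t off)
{
    ssize_t n = pwrite(fd, buf, len, off);
    if (n != (ssize_t)len)
        return -EIO;

    /* Force the data to stable storage before completing; without
     * this, a Dom0 crash could lose a write the guest already
     * believes is on disk. */
    if (fdatasync(fd) != 0)
        return -errno;

    return 0;   /* only now is it safe to notify blkfront of completion */
}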

SLIDE 6

Example: blkback + loopback

  • New loopback driver should fix this.
  • Associating requests with a process has QoS/scheduling benefits.

[Diagram: Example 2, loopback and NFS: blkfront (DomU) -> blkback (Dom0) -> loop -> NFS, with the OOM killer looming in Dom0]

SLIDE 7

Architecture

[Diagram: Dom0 runs blktap, blktapctrl, and one tapdisk per guest; each DomU's blkfront connects through blktap to its own tapdisk]

  • Blktap handles IDC (inter-domain communication)
  • Tapdisk implements a virtual block device
    – Synchronous, AIO, QCoW, VMDK, shared memory
  • Zero-copy throughout
  • Tapdisks are isolated processes (see the sketch below)

Each tapdisk process handles requests for a single VBD.
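
A minimal sketch of the per-VBD process model; the tapdisk command-line arguments here are hypothetical, and blktapctrl's real control path (character devices, shared rings) is more involved:

/* Sketch of the isolation model: one tapdisk process per VBD, so a
 * crash or leak in one image driver cannot affect other guests'
 * storage. The tapdisk arguments below are hypothetical. */
#include <sys/types.h>
#include <unistd.h>

static pid_t spawn_tapdisk(const char *driver, const char *image_path)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: become the tapdisk for this one VBD. */
        execlp("tapdisk", "tapdisk", driver, image_path, (char *)NULL);
        _exit(1);               /* exec failed */
    }
    return pid;                 /* parent (blktapctrl) tracks one pid per VBD */
}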

SLIDE 8

Current tapdisks

  • Individual tapdisk drivers may be implemented as plugins.
  • Generally a few hundred lines of code.
  • Very similar to qemu’s block plugins, but with an asynchronous interface.
  • Currently have asynchronous raw (device or image file), QCoW, VMDK, and shared-memory disks.

struct tap_disk tapdisk_aio = {
    "tapdisk_aio",
    sizeof(struct tdaio_state),
    tdaio_open,
    tdaio_queue_read,
    tdaio_queue_write,
    tdaio_submit,
    tdaio_get_fd,
    tdaio_close,
    tdaio_do_callbacks,
};
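
For illustration, a skeleton of what one such plugin can look like; the function signatures below are simplified assumptions, since the real struct tap_disk prototypes are not shown on this slide:

/* Hypothetical minimal synchronous tapdisk plugin, in the shape of
 * tapdisk_aio above. Signatures are simplified, not the real blktap
 * prototypes. */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

struct tdsync_state {
    int fd;                          /* backing image file or device */
};

static int tdsync_open(struct tdsync_state *s, const char *path)
{
    s->fd = open(path, O_RDWR);
    return (s->fd < 0) ? -errno : 0;
}

/* A synchronous driver completes requests inline; an AIO driver
 * would queue an iocb here and report completion later from its
 * do_callbacks hook. */
static int tdsync_queue_read(struct tdsync_state *s, uint64_t sector,
                             int nb_sectors, char *buf)
{
    ssize_t n = pread(s->fd, buf, (size_t)nb_sectors * 512,
                      (off_t)(sector * 512));
    return (n == (ssize_t)nb_sectors * 512) ? 0 : -EIO;
}

static int tdsync_close(struct tdsync_state *s)
{
    return close(s->fd);
}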

SLIDE 9

Blktap Performance

SLIDE 10

CoW/Image formats

  • Many image formats now exist for block devices
    – QCoW, VMDK, VHD
  • Work a lot like page tables: a mapping tree for block address resolution (see the sketch below)
    – Typically use a bitmap at the bottom level
  • Ensuring consistency adds overhead
    – Metadata is dependent on data
    – Must be written before acknowledging the request to DomU
  • We are currently working with a modified version of the QCoW format
    – 4K blocks, all-at-once extent allocation, bitmaps.
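
A sketch of the page-table analogy, assuming a two-level layout with illustrative names (cow_lookup, l2_table); this is not the QCoW on-disk format:

/* Two-level block address resolution with a bottom-level bitmap, in
 * the spirit of the mapping trees described above. */
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SHIFT 12                     /* 4K blocks */
#define L2_BITS     9
#define L2_ENTRIES  (1u << L2_BITS)

struct l2_table {
    uint64_t block[L2_ENTRIES];            /* allocated block numbers */
    uint8_t  present[L2_ENTRIES / 8];      /* bottom-level bitmap */
};

struct cow_image {
    struct l2_table **l1;                  /* top-level mapping table */
    uint64_t          l1_entries;
};

/* Resolve a virtual block number to a byte offset in this image.
 * Returns false if the block was never written here, i.e. the read
 * should fall through to the base image. */
static bool cow_lookup(const struct cow_image *img, uint64_t vblock,
                       uint64_t *off)
{
    uint64_t l1_idx = vblock >> L2_BITS;
    uint32_t l2_idx = (uint32_t)(vblock & (L2_ENTRIES - 1));
    const struct l2_table *l2;

    if (l1_idx >= img->l1_entries || (l2 = img->l1[l1_idx]) == NULL)
        return false;
    if (!(l2->present[l2_idx / 8] & (1u << (l2_idx % 8))))
        return false;

    *off = l2->block[l2_idx] << BLOCK_SHIFT;
    return true;
}

On a write to an unallocated block, both the data block and the metadata (bitmap and table entries) are dirtied, and the metadata must reach the disk before the request is acknowledged to DomU; that ordering is exactly the consistency overhead noted above.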

SLIDE 11

Migration Consistency

  • Drivers use request shadow rings to handle migration (see the sketch below)
  • Unacked requests are reissued on arrival
  • Slight risk of a write-after-write hazard
  • Now fixed; a more optimal approach is planned for later

[Diagram: live migration from Host A to Host B; blkfront in the migrating DomU has a ring of in-flight writes (W) against Dom0]
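
A sketch of the shadow-ring mechanism, with hypothetical names (shadow_entry, reissue_unacked); the real frontend keeps this state alongside the shared ring:

/* Sketch of migration recovery via a shadow ring: the frontend keeps
 * a private copy of every in-flight request and reissues anything
 * unacknowledged when it resumes on the destination host. Names are
 * illustrative, not blkfront's. */
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 32

struct shadow_entry {
    uint64_t id;                /* request id given to the backend */
    bool     in_flight;         /* set on issue, cleared on completion */
    /* ... plus a copy of the request itself (sectors, grants, ...) */
};

static struct shadow_entry shadow[RING_SIZE];

static void on_completion(uint64_t id)
{
    shadow[id % RING_SIZE].in_flight = false;
}

/* Called after migration: requeue everything that was never acked.
 * Replaying a completed-but-unacked write is what opens the
 * write-after-write hazard noted above; avoiding it requires
 * ordering replays against new writes to the same sectors. */
static void reissue_unacked(void (*requeue)(const struct shadow_entry *))
{
    for (int i = 0; i < RING_SIZE; i++)
        if (shadow[i].in_flight)
            requeue(&shadow[i]);
}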

SLIDE 12

What’s next?

  • Copy on Write
  • Live snapshots
  • Maybe some network-based/distributed storage

SLIDE 13

end

SLIDE 14

Performance (2)