

  1. Virtually Persistent Data
     Andrew Warfield, XenSource

  2. Quick Overview
     • Blktap driver overview/update
       – Performance vs. Safety
       – Architecture
       – Tapdisks
     • Block consistency and live migration
       – Problem, current solution, future solution

  3. Why BlkTap?
     • It turns out that performance isn’t the only requirement for VM storage.
     • Other requirements:
       – Correctness
       – Manageability (e.g. file-based disks)
       – Crazier things (CoW/encryption/network)
     • Doing these things at the Linux block layer is tricky.

  4. Example: blkback + loopback
     [Diagram: blkfront in DomU connected to blkback in Dom0, backed by a physical disk]
     Example 1: Normal block device operation.
     • (Speaker note) Go through the safety concerns, mention the fix in loopback2, and the QoS/block scheduling issues.

  5. Example: blkback + loopback
     [Diagram: blkfront in DomU connected to blkback in Dom0, backed by a physical disk]
     Example 1: Normal block device operation.
     • Blkback preserves the semantics of the physical disk.
     • DomU is notified of completion only once the data is really written (see the sketch below).
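  To make "notify only after the data is really written" concrete, here is a minimal
  userspace-flavoured sketch. It is not the real blkback (which submits bios inside
  the kernel); handle_write and push_response are hypothetical names for this example.

     #include <stdint.h>
     #include <stdio.h>
     #include <sys/types.h>
     #include <unistd.h>

     /* Stub standing in for pushing a completion onto the shared ring. */
     static void push_response(uint64_t id, int error)
     {
         printf("completion for req %llu -> DomU (error=%d)\n",
                (unsigned long long)id, error);
     }

     /* Write the payload, force it to stable storage, and only then
      * let DomU see the completion. */
     void handle_write(int disk_fd, uint64_t id,
                       const void *buf, size_t len, off_t offset)
     {
         int err = 0;

         if (pwrite(disk_fd, buf, len, offset) != (ssize_t)len)
             err = -1;
         else if (fdatasync(disk_fd) != 0)   /* data really on disk now */
             err = -1;

         push_response(id, err);   /* DomU is notified only at this point */
     }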

  6. Example: blkback + loopback
     [Diagram: blkfront in DomU connected to blkback in Dom0 via the loopback driver and NFS, with the OOM killer threatening Dom0]
     Example 2: Loopback and NFS
     • The new loopback driver should fix this.
     • Associating requests with a process has QoS/scheduling benefits.

  7. Architecture
     [Diagram: blktapctrl and one tapdisk process per VBD in Dom0 userspace, above the blktap driver; a blkfront in each DomU. Each tapdisk process handles requests for a VBD.]
     • Blktap handles IDC (inter-domain communication)
     • Tapdisk implements a virtual block device
       – Synchronous, AIO, QCoW, VMDK, shared memory
     • Zero-copy throughout
     • Tapdisks are isolated processes
     A sketch of the per-VBD tapdisk loop follows below.
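  Roughly, each tapdisk process waits on the device blktap exports, drains requests
  from the shared ring, and drives them through the loaded disk plugin. The sketch
  below is illustrative only, not the actual tapdisk source; tap_fd, ring_get_request,
  and the struct layouts are placeholder names.

     #include <poll.h>
     #include <stdint.h>

     struct blk_request {
         uint64_t sector;       /* start sector on the virtual disk      */
         int      nr_sectors;
         int      write;        /* nonzero for a write                   */
         void    *page;         /* grant-mapped guest memory (zero-copy) */
     };

     struct disk_driver {       /* cut-down view of the plugin interface */
         int (*queue_read)(struct blk_request *req);
         int (*queue_write)(struct blk_request *req);
         int (*submit)(void);         /* kick the queued async I/O       */
         int (*do_callbacks)(void);   /* reap completions, ack the ring  */
     };

     static int tap_fd;         /* char device exported by blktap (stub) */

     static int ring_get_request(struct blk_request *req)
     {
         (void)req;
         return 0;              /* stub: no more requests on the ring    */
     }

     void tapdisk_run(struct disk_driver *drv)
     {
         struct pollfd pfd = { .fd = tap_fd, .events = POLLIN };

         for (;;) {
             poll(&pfd, 1, -1);                 /* wait for ring kicks   */

             struct blk_request req;
             while (ring_get_request(&req))     /* drain the shared ring */
                 (req.write ? drv->queue_write
                            : drv->queue_read)(&req);

             drv->submit();                     /* issue queued AIO      */
             drv->do_callbacks();               /* complete finished I/O */
         }
     }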

  8. Current tapdisks
     • Individual tapdisk drivers may be implemented as plugins.
     • Generally a few hundred lines of code.
     • Very similar to qemu’s block plugins, but with an asynchronous interface.
     • Currently have asynchronous raw (device or image file), QCoW, VMDK, and shared-memory disks.

     struct tap_disk tapdisk_aio = {
         "tapdisk_aio",
         sizeof(struct tdaio_state),
         tdaio_open,
         tdaio_queue_read,
         tdaio_queue_write,
         tdaio_submit,
         tdaio_get_fd,
         tdaio_close,
         tdaio_do_callbacks,
     };
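  For flavour, a complete but purely illustrative synchronous plugin in the shape of
  that interface might look like the following. The field names in struct tap_disk
  here are inferred from the initializer on the slide, not copied from the real
  tapdisk headers.

     #include <fcntl.h>
     #include <stdint.h>
     #include <unistd.h>

     struct td_state { int fd; };        /* per-disk private data */

     struct tap_disk {
         const char *name;
         int         private_data_size;
         int (*open)(struct td_state *s, const char *path);
         int (*queue_read)(struct td_state *s, uint64_t sector,
                           int nr_sectors, char *buf);
         int (*queue_write)(struct td_state *s, uint64_t sector,
                            int nr_sectors, const char *buf);
         int (*submit)(struct td_state *s);
         int (*get_fd)(struct td_state *s);
         int (*close)(struct td_state *s);
         int (*do_callbacks)(struct td_state *s);
     };

     static int tdsync_open(struct td_state *s, const char *path)
     {
         s->fd = open(path, O_RDWR);
         return s->fd < 0 ? -1 : 0;
     }

     /* Synchronous: the I/O completes before the call returns, so
      * submit() and do_callbacks() have nothing left to do. */
     static int tdsync_queue_read(struct td_state *s, uint64_t sector,
                                  int nr_sectors, char *buf)
     {
         return pread(s->fd, buf, (size_t)nr_sectors * 512,
                      (off_t)(sector * 512)) < 0 ? -1 : 0;
     }

     static int tdsync_queue_write(struct td_state *s, uint64_t sector,
                                   int nr_sectors, const char *buf)
     {
         return pwrite(s->fd, buf, (size_t)nr_sectors * 512,
                       (off_t)(sector * 512)) < 0 ? -1 : 0;
     }

     static int tdsync_submit(struct td_state *s)       { (void)s; return 0; }
     static int tdsync_get_fd(struct td_state *s)       { return s->fd; }
     static int tdsync_close(struct td_state *s)        { return close(s->fd); }
     static int tdsync_do_callbacks(struct td_state *s) { (void)s; return 0; }

     struct tap_disk tapdisk_sync = {
         "tapdisk_sync",
         sizeof(struct td_state),
         tdsync_open,
         tdsync_queue_read,
         tdsync_queue_write,
         tdsync_submit,
         tdsync_get_fd,
         tdsync_close,
         tdsync_do_callbacks,
     };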

  9. Blktap Performance

  10. CoW/Image formats
     • Many image formats now exist for block devices
       – QCoW, VMDK, VHD
     • Work a lot like page tables: a mapping tree for block address resolution
       – Typically use a bitmap at the bottom level (see the lookup sketch below)
     • Ensuring consistency adds overhead
       – Metadata is dependent on data
       – Must be written before acknowledging the request to DomU
     • We are currently working with a modified version of the QCoW format
       – 4K blocks, all-at-once extent allocation, bitmaps
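  To make the page-table analogy concrete, here is a sketch of the lookup such a
  format performs. The table sizes and layout below are made up for illustration;
  they are not the exact on-disk QCoW/VHD format.

     #include <stddef.h>
     #include <stdint.h>

     #define BLOCK_SIZE 4096u            /* 4K blocks, as on the slide */
     #define L2_ENTRIES 512u             /* entries per bottom table   */

     struct l2_table {
         uint64_t offset[L2_ENTRIES];    /* file offset of each block  */
         uint8_t  bitmap[L2_ENTRIES / 8];/* bottom-level presence bits */
     };

     struct image {
         struct l2_table **l1;           /* top-level mapping table    */
         size_t l1_entries;
     };

     /* Resolve a virtual block number to an offset in the image file.
      * Returns 0 when the block is unallocated, i.e. a read should
      * fall through to the backing image in a CoW chain. */
     uint64_t resolve(const struct image *img, uint64_t vblock)
     {
         uint64_t l1_idx = vblock / L2_ENTRIES;
         uint64_t l2_idx = vblock % L2_ENTRIES;

         if (l1_idx >= img->l1_entries || !img->l1[l1_idx])
             return 0;                   /* no L2 table here yet       */

         const struct l2_table *l2 = img->l1[l1_idx];
         if (!(l2->bitmap[l2_idx / 8] & (1u << (l2_idx % 8))))
             return 0;                   /* bitmap: never written      */

         return l2->offset[l2_idx];
     }

  The consistency overhead on the slide falls out of this structure: a write to a
  fresh block touches the data, the bitmap, and possibly an L2 entry, and that
  metadata must be on disk before the request is acknowledged to DomU.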

  11. Migration Consistency
     [Diagram: a DomU with blkfront migrating from Host A to Host B, with several writes (W) still in flight on each side]
     • Drivers use request shadow rings to handle migration
     • Unacked requests are reissued on arrival (sketched below)
     • Slight risk of a write-after-write hazard
     • Now fixed; a more optimal scheme is planned for later
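  A minimal sketch of the shadow-ring idea (illustrative only; the real logic lives
  in blkfront's ring handling): keep a private copy of every request put on the
  shared ring, drop the copy on completion, and after migration replay whatever
  copies remain unacknowledged.

     #include <stdbool.h>
     #include <stdint.h>
     #include <stdio.h>

     #define RING_SIZE 32

     struct blk_req {
         uint64_t id;          /* slot identifier, also the shadow index */
         uint64_t sector;
         int      nr_sectors;
         bool     write;
     };

     struct shadow {
         struct blk_req req;   /* private copy of the issued request     */
         bool in_flight;       /* true until the response comes back     */
     };

     static struct shadow shadow_ring[RING_SIZE];

     /* Stub: in real life this puts the request on the shared ring. */
     static void ring_put(const struct blk_req *req)
     {
         printf("issuing req %llu\n", (unsigned long long)req->id);
     }

     void issue(const struct blk_req *req)
     {
         shadow_ring[req->id % RING_SIZE] =
             (struct shadow){ .req = *req, .in_flight = true };
         ring_put(req);
     }

     void complete(uint64_t id)
     {
         shadow_ring[id % RING_SIZE].in_flight = false;
     }

     /* After migrating: reconnect to the new backend, then replay
      * every request that was never acknowledged. A replayed write
      * may already have hit the disk on the old host, so replaying
      * it after a newer overlapping write is the write-after-write
      * hazard the slide mentions. */
     void recover_after_migration(void)
     {
         for (int i = 0; i < RING_SIZE; i++)
             if (shadow_ring[i].in_flight)
                 ring_put(&shadow_ring[i].req);
     }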

  12. What’s next?
     • Copy on Write
     • Live snapshots
     • Maybe some network-based/distributed storage.

  13. end

  14. Performance (2)
