SLIDE 1

qcow2 – why (not)?
Max Reitz <mreitz@redhat.com>
Kevin Wolf <kwolf@redhat.com>
KVM Forum 2015

SLIDE 2

Choosing between raw and qcow2
Traditional answer:

Performance? raw! Features? qcow2!

But what if you need both?

SLIDE 3

A car analogy
Throwing out the seats gives you better acceleration. Is it worth it?


SLIDE 5

Our goal
Keep the seats in! Never try to get away without qcow2’s features.

SLIDE 6

Part I
What are those features?

SLIDE 7

qcow2 features
• Backing files
• Internal snapshots
• Zero clusters and partial allocation (on all filesystems)
• Compression
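For illustration (image and snapshot names are placeholders), three of these map directly to standard qemu-img invocations:

  qemu-img create -f qcow2 -b base.qcow2 overlay.qcow2   # backing file
  qemu-img snapshot -c clean vm.qcow2                    # internal snapshot
  qemu-img convert -c -O qcow2 vm.raw vm.qcow2           # compressed copy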

SLIDE 8

qcow2 metadata
• Image is split into clusters (default: 64 kB)
• L2 tables map guest offsets to host offsets
• Refcount blocks store allocation information
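To make the sizes concrete (simple arithmetic, not on the slide): with 64 kB clusters, an L2 table is itself one cluster, i.e. 64 KiB ÷ 8 B = 8192 entries, and each entry maps one 64 kB cluster, so a single L2 table covers 8192 × 64 kB = 512 MB of guest data.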

SLIDE 9

qcow2 metadata
For non-allocating I/O, only the L2 tables are needed.

SLIDE 10

Part II
Preallocated images

SLIDE 11

What is tested?
• Linux guest with fio (120 s runtime per test/pattern; O_DIRECT, AIO)
• 6 GB images on SSD and HDD
• Random/sequential access with 4k/1M blocks
• qcow2: preallocation=metadata
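The slides don’t spell out the exact commands; a minimal sketch of one test point, assuming the image is attached to the guest as /dev/vdb:

  # host: 6 GB qcow2 image with preallocated metadata
  qemu-img create -f qcow2 -o preallocation=metadata test.qcow2 6G

  # guest: one of the eight test patterns (4k random writes shown)
  fio --name=4k-randwrite --filename=/dev/vdb --rw=randwrite --bs=4k \
      --direct=1 --ioengine=libaio --runtime=120 --time_based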

SLIDE 12

SSD write performance
[bar chart: 4k random, 1M random, 4k seq, 1M seq; y-axis: fraction of raw IOPS; series: raw, qcow2]

SLIDE 13

SSD read performance
[bar chart: 4k random, 1M random, 4k seq, 1M seq; y-axis: fraction of raw IOPS; series: raw, qcow2]

SLIDE 14

HDD write performance
[bar chart: 4k random, 1M random, 4k seq, 1M seq; y-axis: fraction of raw IOPS; series: raw, qcow2]

SLIDE 15

HDD read performance
[bar chart: 4k random, 1M random, 4k seq, 1M seq; y-axis: fraction of raw IOPS; series: raw, qcow2]

SLIDE 16

So?

Looks good, right?

SLIDE 17

So?

Let’s increase the image size!

SLIDE 18

SSD 16 GB image write performance
[bar chart: 4k random, 1M random, 4k seq, 1M seq; y-axis: fraction of raw IOPS; series: raw, qcow2]

SLIDE 19

SSD 16 GB image read performance
[bar chart: 4k random, 1M random, 4k seq, 1M seq; y-axis: fraction of raw IOPS; series: raw, qcow2]

SLIDE 20

HDD 32 GB image write performance
[bar chart: 4k random, 1M random, 4k seq, 1M seq; y-axis: fraction of raw IOPS; series: raw, qcow2]

SLIDE 21

HDD 32 GB image read performance
[bar chart: 4k random, 1M random, 4k seq, 1M seq; y-axis: fraction of raw IOPS; series: raw, qcow2]

SLIDE 22

What happened?
Cache thrashing happened! qcow2 caches L2 tables, and the default cache size of 1 MB covers only 8 GB of an image.
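To check that figure: 1 MiB of cache ÷ 8 B per L2 entry = 131,072 entries, each mapping one 64 kB cluster, so the cache covers 131,072 × 64 kB = 8 GiB of the virtual disk.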

SLIDE 23

How to fix it?

1. DON’T PANIC – don’t fix it.
Random accesses contained within an 8 GB area are fine, no matter the image size.

2. Increase the cache size.
l2-cache-size runtime option, e.g. -drive format=qcow2,l2-cache-size=4M,...

l2-cache-size = area size ÷ (cluster size ÷ 8) = area size ÷ 8192 (for 64 kB clusters)
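Applying that formula to the images above: the 16 GB SSD image needs 16 GiB ÷ 8192 = 2 MiB of L2 cache and the 32 GB HDD image needs 4 MiB, which is exactly what the following slides use.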

SLIDE 24

SSD 16 GB image, 2 MB L2 cache, writing
[bar chart: 4k random, 1M random, 4k seq, 1M seq; y-axis: fraction of raw IOPS; series: raw, qcow2]

SLIDE 25

SSD 16 GB image, 2 MB L2 cache, reading
[bar chart: 4k random, 1M random, 4k seq, 1M seq; y-axis: fraction of raw IOPS; series: raw, qcow2]

SLIDE 26

HDD 32 GB image, 4 MB L2 cache, writing
[bar chart: 4k random, 1M random, 4k seq, 1M seq; y-axis: fraction of raw IOPS; series: raw, qcow2]

SLIDE 27

HDD 32 GB image, 4 MB L2 cache, reading
[bar chart: 4k random, 1M random, 4k seq, 1M seq; y-axis: fraction of raw IOPS; series: raw, qcow2]

SLIDE 28

Results
No significant difference between raw and qcow2 for preallocated images... as long as the L2 cache is large enough!
Without COW, everything is good. But it is named qcow2 for a reason...

SLIDE 29

Part III
Cluster allocations

SLIDE 30

Cluster allocation
When is a new cluster allocated? (see the example after this list)
• When writing to unallocated clusters
  – Previous content in the backing file
  – Without a backing file: all zero
• For COW, if the existing cluster was shared
  – Internal snapshots
  – Compressed images
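One way to watch these allocations from the host (image names are placeholders): qemu-img map reports which guest ranges are allocated in which file of the backing chain:

  qemu-img create -f qcow2 -b base.qcow2 overlay.qcow2
  qemu-img map overlay.qcow2   # every range still served from base.qcow2
  # ... after the guest writes, the touched clusters show up in overlay.qcow2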

SLIDE 31

Copy on Write

[diagram: a 64 kB cluster grid (64k/128k/192k); a write request covers part of one cluster, the rest of that cluster is the copy-on-write area]

• Cluster content must be completely valid (64k)
• Guest may write with sector granularity (512 b)
• Partial write to a newly allocated cluster → the rest must be filled with old data
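A concrete instance (offsets made up for illustration): the guest writes 4 kB at offset 68k, which lands in the unallocated cluster covering 64k–128k. qcow2 must allocate the whole cluster and fill 64k–68k (4 kB) and 72k–128k (56 kB) with the old content from the backing file (or zeroes) in addition to writing the guest’s 4 kB.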

SLIDE 32

Copy on Write

[diagram: same as the previous slide]

COW cost is the most expensive part of allocations:

1. More I/O requests
2. More bytes transferred
3. More disk flushes (in some cases)

SLIDE 33

Copy on Write is slow (Problem 1)

[diagram: same COW layout as above]

Naive implementation: 2 reads and 3 writes (read the head and tail of the cluster, then write head, guest data, and tail separately). About a 30% performance hit vs. rewriting an already-allocated cluster.

SLIDE 34

Copy on Write is slow (Problem 1)

[diagram: same COW layout as above]

Can combine the three writes into a single request:
• Fixes allocation performance without a backing file
• Doesn’t fix the other cases: the read is expensive

SLIDE 35

Copy on Write is slow (Problem 2)

[diagram: four consecutive write requests filling clusters sequentially; the COW areas between them are unnecessary COW overhead]

Most COW is unnecessary for sequential writes. If the COW area is going to be overwritten anyway, avoid the copy in the first place.

SLIDE 36

qcow2 data cache
Metadata already uses a cache for batching. We can do the same for data!
• Mark the COW area invalid at first
• Only read from the backing file when the area is accessed
• Overwriting makes it valid → read avoided

SLIDE 37

Data cache performance

  • Seq. allocating writes (qcow2 with backing file)

[bar chart: 8k rewrite, 256k rewrite; y-axis: MB/s; series: master, data cache, raw]

SLIDE 38

Copy on Write is slow (Problem 3)
Internal COW (internal snapshots, compression):

1. Allocate the new cluster: the refcount must be increased before the mapping update.

2. Drop the reference to the old cluster: the mapping must be updated before the refcount decrease.

→ Two (slow) disk flushes needed per allocation

SLIDE 39

Copy on Write is slow (Problem 3)
Possible solutions:
• lazy_refcounts=on allows temporarily inconsistent refcounts
• Implement journalling, which allows updating both at the same time
→ No flushes needed → performance fixed
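Turning on lazy refcounts (placeholder image name; it requires a compat=1.1 image, and after a crash the image must be repaired with qemu-img check -r all):

  qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on vm.qcow2 16G
  qemu-img amend -f qcow2 -o lazy_refcounts=on vm.qcow2   # or enable it on an existing image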

SLIDE 40

Another solution: Avoid COW

[diagram: a write request into a cluster; the rest of the cluster stays unmodified (would be a COW area with large clusters)]

Don’t optimize COW, avoid it → Use a small cluster size (= sector size)

SLIDE 41

Another solution: Avoid COW

[diagram: same as the previous slide]

But a small cluster size isn’t practicable (see the numbers below):
• Large metadata (but no larger caches)
• Potentially more fragmentation
→ No COW any more, but everything is slow
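A back-of-the-envelope check (numbers not from the slides): L2 tables cost 8 B per cluster, so a 1 TB image needs about 1 TiB × 8 B ÷ 64 KiB = 128 MiB of L2 tables with 64 kB clusters, but 1 TiB × 8 B ÷ 512 B = 16 GiB with 512 B clusters.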

SLIDE 42

Subclusters

[diagram: clusters (64k/128k/192k) subdivided into subclusters; a write request allocates only the touched subclusters, the rest stay unallocated]

Split the cluster size into two different sizes (a possible sizing is sketched below):
• Granularity of the mapping (clusters, large)
• Granularity of COW (subclusters, small)
• Add a subcluster bitmap to the L2 table for COW status
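As a purely hypothetical sizing (not from the slides): splitting a 64 kB cluster into 32 subclusters of 2 kB would need a 32-bit bitmap per L2 entry; growing each entry from 8 B to 16 B doubles the L2 metadata while keeping the 64 kB mapping granularity.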

SLIDE 43

Subclusters

[diagram: same as the previous slide]

• Requires an incompatible image format change
• Can solve problems 1 and 2, but not 3

SLIDE 44

Status
• Data cache: prototype patches exist (ready for 2.5 or 2.6?)
• Subclusters: only theory, no code; still useful even once the data cache is merged
• Journalling: not anytime soon; use lazy refcounts for internal COW instead

SLIDE 45

Questions?