Decentralized Deduplication in SAN Cluster File Systems
Austin T. Clements∗, Irfan Ahmad, Murali Vilayannur, Jinyuan Li
VMware, Inc.
∗MIT CSAIL
Abstract
File systems hosting virtual machines typically contain many duplicated blocks of data, resulting in wasted storage space and an increased storage array cache footprint. Deduplication addresses these problems by storing a single instance of each unique data block and sharing it among all original sources of that data. While deduplication is well understood for file systems with a centralized component, we investigate it in a decentralized cluster file system, specifically in the context of VM storage. We propose DEDE, a block-level deduplication system for live cluster file systems that does not require any central coordination, tolerates host failures, and takes advantage of the block layout policies of an existing cluster file system. In DEDE, hosts keep summaries of their own writes to the cluster file system in shared on-disk logs. Each host periodically and independently processes the summaries of its locked files, merges them with a shared index of blocks, and reclaims any duplicate blocks. DEDE manipulates metadata using general file system interfaces without knowledge of the file system implementation. We present the design, implementation, and evaluation of our techniques in the context of VMware ESX Server. Our results show an 80% reduction in space with minor performance overhead for realistic workloads.
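The write-log, shared-index, and reclaim cycle described in the abstract can be illustrated with a minimal, single-process Python sketch. It is not DEDE's implementation: the `Volume` class, its `write` and `process_log` methods, the choice of SHA-256, the `locked_files` set standing in for cluster file locks, and the in-memory dictionaries standing in for on-disk logs, the shared index, and block reference counts are all hypothetical simplifications. The sketch shows only the shape of the idea: writes proceed normally and merely append a (file, block, hash) summary to the writing host's log; later, each host independently merges its own summaries into the shared index and remaps duplicate blocks onto a single canonical copy.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical toy model of a shared volume, not DEDE's on-disk format."""
    blocks: dict = field(default_factory=dict)    # physical id -> block bytes
    refs: dict = field(default_factory=dict)      # physical id -> reference count
    file_map: dict = field(default_factory=dict)  # (file, block_no) -> physical id
    index: dict = field(default_factory=dict)     # digest -> canonical physical id
    next_id: int = 0

    def _release(self, pid):
        """Drop one reference; free the physical block when none remain."""
        self.refs[pid] -= 1
        if self.refs[pid] == 0:
            del self.blocks[pid]
            del self.refs[pid]

    def write(self, file, block_no, data, log):
        """Normal write path: allocate a fresh block and log a summary record.

        The index is never consulted here, so deduplication stays out of band.
        """
        pid, self.next_id = self.next_id, self.next_id + 1
        self.blocks[pid] = data
        self.refs[pid] = 1
        old = self.file_map.get((file, block_no))
        self.file_map[(file, block_no)] = pid
        if old is not None:
            self._release(old)
        log.append((file, block_no, hashlib.sha256(data).digest()))

    def process_log(self, log, locked_files):
        """Periodic pass a host runs over its own log, with no coordinator."""
        remaining = []
        for file, block_no, digest in log:
            if file not in locked_files:
                # This host does not hold the file's lock: keep the record.
                remaining.append((file, block_no, digest))
                continue
            pid = self.file_map.get((file, block_no))
            if pid is None or hashlib.sha256(self.blocks[pid]).digest() != digest:
                continue  # stale record: the block was overwritten since logging
            canon = self.index.get(digest)
            if canon is None or canon not in self.blocks:
                self.index[digest] = pid  # first live copy becomes canonical
            elif canon != pid and self.blocks[canon] == self.blocks[pid]:
                # Byte comparison guards against hash collisions before sharing.
                self.refs[canon] += 1
                self.file_map[(file, block_no)] = canon
                self._release(pid)  # reclaim the duplicate block
        log[:] = remaining


# Usage: two hosts write identical blocks into two different VM disk files.
vol = Volume()
log_a, log_b = [], []  # one summary log per host
vol.write("vm1.vmdk", 0, b"x" * 4096, log_a)
vol.write("vm2.vmdk", 0, b"x" * 4096, log_b)
vol.process_log(log_a, {"vm1.vmdk"})
vol.process_log(log_b, {"vm2.vmdk"})
print(len(vol.blocks))  # 1: a single physical block now backs both files
```

Re-hashing each block before merging is one simple way, in this sketch, to discard summary records made stale by later overwrites; the trade-off of keeping the write path untouched is that duplicates occupy space until the next pass over the log.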
1 Introduction
Deployments of consolidated storage using Storage Area Networks (SANs) are increasing, motivated by universal access to data from anywhere, ease of backup, flexibility in provisioning, and centralized administration. SAN arrays already form the backbone of modern data centers by providing consolidated data access for multiple hosts simultaneously. This trend is further fueled by the proliferation of virtualization technologies, which rely on shared storage to support features such as live migration of virtual machines (VMs) across hosts.
SANs provide multiple hosts with direct SCSI access to shared storage volumes. Regular file systems assume exclusive access to the disk and would quickly corrupt a shared disk. To tackle this, numerous shared-disk cluster file systems have been developed, including VMware VMFS [21], RedHat GFS [15], and IBM GPFS [18], which use distributed locking to coordinate concurrent access among multiple hosts.
Cluster file systems play an important role in virtualized data centers, where multiple physical hosts each run potentially hundreds of virtual machines whose virtual disks are stored as regular files in the shared file system. SANs give hosts access to shared storage for VM disks with near-native SCSI performance while also enabling advanced features such as live migration, load balancing, and failover of VMs across hosts.
These shared file systems represent an excellent opportunity for detecting and coalescing duplicate data. Because they store data from multiple hosts, not only do they contain more data, but data redundancy is also more likely. Shared storage for VMs is a ripe application for deduplication because common system and application files are repeated across VM disk images, and hosts can automatically and transparently share data between and within VMs. This is especially true of virtual desktop infrastructures (VDI) [24], where desktop machines are virtualized, consolidated into data centers, and accessed via thin clients. Our experiments show that a real enterprise VDI deployment can expend as much as 80% of its overall storage footprint on duplicate data from VM disk images. Given the desire to lower costs, such waste motivates reducing the storage needs of virtual machines, both in general and for VDI in particular.
Existing deduplication techniques [1, 3–5, 8, 14, 16, 17, 26] rely on centralized file systems, require cross-host communication for critical file system operations, perform deduplication in-band, or use content-addressable storage. All of these approaches have limitations in our domain. Centralized techniques would be difficult to ex-