Decentralized Deduplication in SAN Cluster File Systems



slide-1
SLIDE 1

Decentralized Deduplication in SAN Cluster File Systems

Austin T. Clements∗ Irfan Ahmad Murali Vilayannur Jinyuan Li

VMware, Inc.

∗MIT CSAIL

slide-8
SLIDE 8

Storage Area Networks

1.3 TB → 237 GB: 5X reduction

slide-16
SLIDE 16

Deduplication

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9..

slide-19
SLIDE 19

The Problem

Slow: Lots of IO in the write path. Can’t cache the index.
Very Slow: Writes require allocation and thus coordination. No hope of disk locality.
Hopelessly Slow: Multi-host lock contention on shared index.

slide-20
SLIDE 20

Three-Stage Deduplication

Decentralized Deduplication

slide-23
SLIDE 23

Three-Stage Deduplication

DeDe

Out-of-band deduplication of live, primary storage
Process duplicates efficiently, in large batches
Minimize contention on the index
Resilient to stale index information
Unique blocks remain mutable and sequential
⇒ No overhead for blocks that don’t benefit from deduplication

slide-28
SLIDE 28

Three-Stage Deduplication

DeDe

f4bea9.. f2a4d2.. 0e7a26.. 15ba2b.. d5e341.. bc6887.. ab7373.. 4ee207.. 9b1f28.. 9b575b.. 94c9c9.. 288bc7..

slide-30
SLIDE 30

Three-Stage Deduplication

DeDe

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. f4bea9.. f2a4d2.. 0e7a26.. 15ba2b.. d5e341.. bc6887.. ab7373.. 4ee207.. 9b1f28.. 9b575b.. 94c9c9.. 288bc7..

slide-32
SLIDE 32

Three-Stage Deduplication

DeDe

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. ab7373.. 4ee207.. 9b1f28.. 9b575b.. 94c9c9.. 288bc7..

slide-35
SLIDE 35

Three-Stage Deduplication

DeDe

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9..

slide-37
SLIDE 37

Three-Stage Deduplication

DeDe

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9..

? ? ? ?

slide-43
SLIDE 43

Implementation

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9..

slide-48
SLIDE 48

VMFS

Compare-and-share
DeDe finds duplicates. VMFS eliminates them.
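
One way to picture compare-and-share, as a minimal sketch rather than the VMFS implementation: the operation compares two blocks byte for byte and, only if they match, repoints one reference at the other block so that both share a single copy-on-write block. The `BlockStore` class, its dictionary layout, and all method names below are hypothetical and exist only for illustration.

```python
class BlockStore:
    """Toy in-memory block store (hypothetical; not the VMFS on-disk format)."""

    def __init__(self):
        self.data = {}      # physical block id -> bytes
        self.refs = {}      # physical block id -> reference count
        self.pointers = {}  # (file name, block offset) -> physical block id
        self._next_id = 0

    def write(self, file, offset, payload):
        """Store payload in a fresh physical block, copy-on-write style."""
        old = self.pointers.get((file, offset))
        if old is not None:
            self._release(old)
        pid = self._next_id
        self._next_id += 1
        self.data[pid] = bytes(payload)
        self.refs[pid] = 1
        self.pointers[(file, offset)] = pid

    def _release(self, pid):
        """Drop one reference and free the block when nobody uses it."""
        self.refs[pid] -= 1
        if self.refs[pid] == 0:
            del self.refs[pid], self.data[pid]

    def compare_and_share(self, a, b):
        """If the blocks at locators a and b hold identical bytes, make b
        point at a's physical block and free b's copy. Returns True when the
        two locators share a block afterwards, False when contents differ."""
        pa, pb = self.pointers[a], self.pointers[b]
        if pa == pb:
            return True            # already shared
        if self.data[pa] != self.data[pb]:
            return False           # the duplicate hint was stale
        self.pointers[b] = pa
        self.refs[pa] += 1
        self._release(pb)
        return True

    def physical_blocks(self):
        return len(self.data)
```

In this sketch a failed comparison changes nothing, which is why a caller can act on possibly outdated duplicate hints without risking data: the worst case is a wasted comparison.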

slide-54
SLIDE 54

Write Monitor

A lightweight kernel module monitors writes, computes hashes
It buffers the write log in userspace before writing it to disk
Safe to buffer the log because index is resilient
150 MB of regular writes → 1 MB sequential log write
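
The buffering idea can be sketched as follows. This is an illustration under stated assumptions, not the DeDe kernel module: the 4 KB block size, the SHA-1 hash, the 28-byte entry layout (hash plus an 8-byte block locator), and the names `WriteMonitor`, `log_sink`, and `log_bytes_for` are all hypothetical.

```python
import hashlib
import struct

BLOCK_SIZE = 4096               # assumed block size; not stated on the slide
ENTRY = struct.Struct(">20sQ")  # assumed entry: 20-byte SHA-1 + 8-byte locator


class WriteMonitor:
    """Hashes each written block and buffers compact log entries in memory."""

    def __init__(self, log_sink, flush_threshold=1 << 20):
        self.log_sink = log_sink              # callable taking one bytes batch
        self.flush_threshold = flush_threshold
        self.buffer = bytearray()

    def on_write(self, locator, block):
        """Record (hash, locator) for one block write; flush when full."""
        digest = hashlib.sha1(block).digest()
        self.buffer += ENTRY.pack(digest, locator)
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Hand the buffered entries to the sink as one sequential write."""
        if self.buffer:
            self.log_sink(bytes(self.buffer))
            self.buffer.clear()


def log_bytes_for(data_bytes):
    """Log size produced by data_bytes of block writes under this layout."""
    return (data_bytes // BLOCK_SIZE) * ENTRY.size
```

Under these assumed sizes, 150 MB of block writes produces 38,400 entries of 28 bytes each, or roughly 1 MB of log, which is in line with the ratio quoted on the slide.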

slide-62
SLIDE 62

The Index

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9..
Unique Shared
Map from hashes to block locators, list sorted by hash
Unique blocks are located in files and remain mutable
A virtual arena stores COW references to all shared blocks
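
A minimal sketch of an index with this shape: a hash-sorted list that maps each hash to a block locator and records whether the block is unique (living in a file) or shared (living in the arena). The `Entry` and `Index` names and the tuple form of the locators are assumptions made for illustration; the slides do not give the actual record format.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    digest: str      # hex hash prefix, as shown on the slide
    shared: bool     # False: unique block in a file; True: COW block in the arena
    locator: tuple   # (file, offset) if unique, ("arena", offset) if shared


class Index:
    """Hash-sorted list of entries, searched by binary search."""

    def __init__(self, entries=()):
        self.entries = sorted(entries, key=lambda e: e.digest)
        self.keys = [e.digest for e in self.entries]

    def lookup(self, digest):
        """Return the entry for digest, or None if the hash is not indexed."""
        i = bisect.bisect_left(self.keys, digest)
        if i < len(self.keys) and self.keys[i] == digest:
            return self.entries[i]
        return None

    def insert(self, entry):
        """Insert an entry while keeping the list sorted by hash."""
        i = bisect.bisect_left(self.keys, entry.digest)
        self.keys.insert(i, entry.digest)
        self.entries.insert(i, entry)
```

Because a unique block stays mutable, the locator stored for it is only a hint: the block may have been overwritten since it was indexed, so a consumer of this structure would have to verify the content before sharing it.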


slide-63
SLIDE 63

Indexing and Duplicate Elimination

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared

slide-66
SLIDE 66

Indexing and Duplicate Elimination

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared 12067c.. 6cd412.. c277d6..

slide-70
SLIDE 70

Indexing and Duplicate Elimination

12067c.. 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared 6cd412.. c277d6..

slide-73
SLIDE 73

Indexing and Duplicate Elimination

12067c.. 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared c277d6..

slide-76
SLIDE 76

Indexing and Duplicate Elimination

12067c.. 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared

?

c277d6..

slide-79
SLIDE 79

Indexing and Duplicate Elimination

12067c.. 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared
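
The slide sequence above appears to show write-log entries (12067c.., 6cd412.., c277d6..) being merged into the hash-sorted index. A minimal sketch of such a merge follows, under assumptions the slides do not confirm: the list-based entry layout, the `"unique"`/`"shared"` state strings, and the `blocks_match` callback (a stand-in for a verified compare-and-share) are all hypothetical.

```python
def merge_log(index, log, blocks_match):
    """Merge a write log into a hash-sorted index in one sequential pass.

    index: list of [digest, state, locator] sorted by digest, one entry per
           digest; state is "unique" or "shared".
    log:   iterable of (digest, locator) pairs in write order.
    blocks_match(a, b): returns True only if the blocks at both locators
           currently hold identical content.
    Returns (new_index, shared_pairs); the input index is left unmodified.
    """
    merged, shared_pairs = [], []
    i = 0
    for digest, locator in sorted(log):      # batch step: sort the log by hash
        # Copy index entries up to and including this digest.
        while i < len(index) and index[i][0] <= digest:
            merged.append(list(index[i]))
            i += 1
        if merged and merged[-1][0] == digest:
            entry = merged[-1]               # hash already known
            if entry[2] != locator and blocks_match(entry[2], locator):
                entry[1] = "shared"          # duplicate confirmed
                shared_pairs.append((entry[2], locator))
            # Otherwise the hint was stale; this sketch leaves the entry alone.
        else:
            merged.append([digest, "unique", locator])   # first sighting
    merged.extend(list(e) for e in index[i:])
    return merged, shared_pairs
```

Sorting the log first means both inputs are consumed in hash order, so the index is read and rewritten sequentially instead of being probed at random, and blocks whose hash appears only once cost nothing beyond a new index entry.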

slide-83
SLIDE 83

Phew.

slide-87
SLIDE 87

Evaluation

How much space does DeDe save?
How much overhead does DeDe introduce?
How fast can DeDe deduplicate?

slide-90
SLIDE 90

Space Savings: VDI Cluster

Corporate Virtual Desktop Infrastructure cluster
Desktop XP VMs
6–12 months of active use
Originally cloned from small number of base images
...

113 VMs

1.9 TB 1.3 TB

slide-95
SLIDE 95

Space Savings: VDI Cluster

1.3 TB → 237 GB
173 GB unique
61 GB shared
2.7 GB: 1.3 GB index file, 194 MB virtual arena, 1.1 GB FS metadata
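
As a quick consistency check on these figures, the arithmetic can be reproduced directly. The sketch assumes decimal units (1 TB = 1000 GB) and assumes that the index file, virtual arena, and FS metadata together make up the 2.7 GB item; the slide states neither explicitly, and the variable names are illustrative.

```python
GB = 1.0
TB = 1000 * GB   # decimal units assumed; the slide does not say

before = 1.3 * TB
unique = 173 * GB
shared = 61 * GB

# Assumed breakdown of the 2.7 GB item, using the rounded slide values.
index_file = 1.3 * GB
virtual_arena = 0.194 * GB
fs_metadata = 1.1 * GB

overhead = index_file + virtual_arena + fs_metadata
after = unique + shared + overhead
reduction = before / after

print(f"overhead:  {overhead:.2f} GB")
print(f"after:     {after:.1f} GB")
print(f"reduction: {reduction:.2f}x")
```

With the rounded slide values the parts add up to about 237 GB, and the ratio comes out a little above 5, consistent with the "5X reduction" quoted earlier in the talk. The three overhead components sum to slightly less than 2.7 GB, which is presumably a rounding effect.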

slide-98
SLIDE 98

Runtime Effects

Write monitoring
Disk array caching
EMC CLARiiON CX3-40

slide-102
SLIDE 102

Runtime Overhead: Write Monitoring

Worst-case benchmark: 100% sequential write IO, no computation

                   Baseline   Write Monitor
CPU                33%        220%
Bandwidth (MB/s)   233        233
Latency (ms)       8.6        8.6

(See paper for database application benchmark)

slide-106
SLIDE 106

Runtime Gains: Disk Array Caching

Reduced storage footprint
Better caching
Less IO

[Figure: VDI "boot storm". Average boot time (secs) vs. # VMs booting concurrently, for full copies and deduplicated]

slide-111
SLIDE 111

Out-of-band Deduplication Rate

Index scan: 6.6 GB/sec
Virtually no cost for unique blocks
COW sharing: 2.6 MB/sec (It’s a prototype!)
9 GB of new shared blocks per hour (And provisioning can be special-cased)
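
The per-hour figure follows from the COW sharing rate by a unit conversion. The short sketch below shows the calculation; the helper name `gb_per_hour` is illustrative, and since the slide does not say whether GB is binary or decimal, both are computed.

```python
def gb_per_hour(mb_per_sec, mb_per_gb=1024):
    """Convert a sustained rate in MB/sec into GB per hour."""
    return mb_per_sec * 3600 / mb_per_gb


cow_sharing_mb_per_sec = 2.6   # rate quoted on the slide

binary_gb = gb_per_hour(cow_sharing_mb_per_sec)          # 1 GB = 1024 MB
decimal_gb = gb_per_hour(cow_sharing_mb_per_sec, 1000)   # 1 GB = 1000 MB

print(f"{binary_gb:.2f} GB/hour (binary), {decimal_gb:.2f} GB/hour (decimal)")
```

Either convention gives a little over 9 GB per hour, matching the slide.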


slide-112
SLIDE 112

Related Work

Centralized archival: Venti, Data Domain, Foundation

Centralized primary storage: NetApp ASIS, Microsoft Single Instance Store

Distributed: Farsite

SAN with Coordinator: DDE

slide-117
SLIDE 117

Conclusion

Decentralized, out-of-band, live file system deduplication.
Deduplication is effective.
Deduplication is hard.
Three-stage deduplication has only modest performance overhead.
Thank you.