SLIDE 1

DupHunter: Flexible High-Performance Deduplication for Docker Registries

Nannan Zhao, Hadeel Albahar, Subil Abraham, Keren Chen, Vasily Tarasov, Dimitrios Skourtis, Lukas Rupprecht, Ali Anwar, and Ali R. Butt

SLIDE 2

Containers are ubiquitous

(Container use cases: OS, database, web server, cache, serverless, deep learning, big data, languages.)

2 Nannan Zhao znannan1@vt.edu

SLIDE 3

Application containerization is becoming a significant market.

SLIDE 7

How can we efficiently manage the ever-growing image dataset in Docker registries?

3,657,773

The Docker image dataset is growing fast!

SLIDE 8

Our contribution: DupHunter, a framework to deduplicate images in Docker registries

❑ We make two key observations:

▪ 1. Container images exhibit a lot of redundancy.
▪ 2. User access patterns are predictable.

❑ We design DupHunter to work with compressed images, deduplicate layers, and reduce layer restore overhead.
❑ We evaluate DupHunter with representative real-world workloads. Compared to the state of the art, DupHunter:

▪ reduces storage space by up to 6.9x.
▪ reduces GET layer latency by up to 2.8x.

SLIDE 9

Overview of Docker

❑ A Docker container is a self-contained executable package that:

▪ Is lightweight
▪ Is portable
▪ Provides isolation

❑ A Docker registry:

▪ Stores Docker images
▪ Supports fast distribution
▪ Facilitates easy deployment

(Diagram: a Docker client issues docker build, docker run, and docker push to the Docker daemon on a Docker host. Containers run on the host OS and hardware; each container's image consists of read-only image layers (a base image such as Ubuntu, plus layers like PHP and MySQL) topped by a writable R/W container layer. docker push uploads an image to a registry such as Docker Hub, and docker pull downloads it.)


SLIDE 17

Key observation I: Image dataset has a large amount of redundant files

❑ Container images have a lot of redundancy.

▪ 97% of files across layers are duplicates!

❑ Existing technologies such as jdupes, VDO, Btrfs, ZFS, and Ceph are unable to harness this redundancy.

(Pipeline: compressed layer dataset → decompress → uncompressed layer dataset → unpack → deduplicate. Deduplicating the compressed layers does not help; deduplicating the unpacked files reduces space by up to 4x, but the resulting layer restore overhead increases layer pulling latency by up to 98x!)
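The trade-off on this slide can be sketched in a few lines of Python: file-level deduplication keeps one copy per content fingerprint, but restoring a layer then requires re-assembling it from the shared store. This is an illustrative sketch, not DupHunter's implementation; `layers` is a hypothetical mapping of layer id to {path: content}.

```python
import hashlib

def deduplicate_files(layers):
    """File-level deduplication across unpacked layers: keep one copy
    per unique content fingerprint, reference it everywhere else."""
    store = {}      # fingerprint -> file content (stored once)
    recipes = {}    # layer id -> list of (path, fingerprint)
    for layer_id, files in layers.items():
        recipe = []
        for path, content in files.items():
            fp = hashlib.sha256(content).hexdigest()
            store.setdefault(fp, content)   # store only the first copy
            recipe.append((path, fp))
        recipes[layer_id] = recipe
    return store, recipes

def restore_layer(layer_id, store, recipes):
    """Layer restore: re-assemble a layer's files from the shared store.
    This extra step is the source of the pulling-latency overhead."""
    return {path: store[fp] for path, fp in recipes[layer_id]}
```

With two layers sharing a file, the shared content is stored once, yet every GET of a deduplicated layer pays the reconstruction cost.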

SLIDE 20

Key observation II: Predictable user access pattern

❑ We observe a consistent user pulling pattern: pull the manifest first, then the layers, but not all of the layers are pulled.
❑ We performed a quantitative study using a 75-day IBM Cloud Registry workload spanning 7 availability zones.

(Figure: ratio of layers vs. GET layer count, for the Dal, Dev, Fra, Lon, Pre, Sta, and Syd availability zones.)

The majority of layers are fetched only once by the same client.

SLIDE 22

Key observation II-b: User repulling pattern can also be predicted

(Figure: ratio of clients vs. repulling probability, per availability zone.)

Half of the clients have a repull probability of less than 0.2 → many clients pull a layer only once.

SLIDE 25

Key observation II-b: User repulling pattern can also be predicted

(Figure: ratio of clients vs. repulling probability, per availability zone.)

User repulling patterns are mostly either pull-once or always-pull → we can predict which layers a client will pull.
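The pull-once vs. always-pull split above suggests a simple classifier over a client's per-layer pull counts. The sketch below is illustrative; the `threshold` cutoff is an assumption, not a value from the paper.

```python
def classify_client(pull_counts, threshold=0.5):
    """Classify a client's repulling pattern from its per-layer pull
    counts: 'pull-once' clients rarely fetch the same layer twice,
    'always-pull' clients re-fetch layers repeatedly.
    threshold is an illustrative cutoff, not a value from the paper."""
    if not pull_counts:
        return "pull-once"
    repulled = sum(1 for count in pull_counts.values() if count > 1)
    repull_probability = repulled / len(pull_counts)
    return "always-pull" if repull_probability >= threshold else "pull-once"
```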

SLIDE 27

Key observation II-c: Layer preconstruction is possible

Layer preconstruction can significantly reduce layer restore overhead.

SLIDE 28

DupHunter architecture

(Architecture: clients talk to a storage cluster of registry servers A-D through the registry REST API; the servers share a distributed metadata database, and each server uses its local storage system.)

SLIDE 29

Reducing overhead in DupHunter

1. Support multiple replica deduplication modes.
2. Facilitate parallel layer reconstruction.
3. Enable proactive layer prefetching/preconstruction.

SLIDE 31

DupHunter supports multiple replica deduplication modes

❑ B-mode n: basic deduplication mode n

▪ Keep n layer replicas intact.
▪ Deduplicate the remaining R - n layer replicas (R = layer replication level).

❑ S-mode: selective deduplication mode

▪ The number of intact layer replicas is proportional to the layer's popularity.
▪ Hot layers get more intact replicas.

(Architecture: Tier 1, the primary cluster of P-servers A and B, keeps a layer store and a layer stage area; Tier 2, the deduplication cluster of D-servers C and D, keeps a deduplicated file store. All servers share the distributed metadata database and expose the registry REST API to clients.)
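The two modes reduce to a rule for how many of a layer's R replicas stay intact. A minimal sketch, assuming popularity is normalized to [0, 1]; the proportionality rule for S-mode is an illustrative choice, not DupHunter's exact formula.

```python
def intact_replicas(mode, R, n=1, popularity=0.0):
    """How many of a layer's R replicas stay intact (undeduplicated).

    B-mode n keeps a fixed n intact and deduplicates the remaining R - n.
    S-mode keeps a popularity-proportional number (this rounding rule is
    an assumption made for illustration)."""
    if mode == "B":
        return min(n, R)
    if mode == "S":
        # popularity in [0, 1]: hot layers get more intact replicas
        return max(1, round(popularity * R))
    raise ValueError(f"unknown mode: {mode}")
```

B-mode trades space for restore cost with one knob; S-mode spends the intact replicas where GETs are most likely.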

SLIDE 32

DupHunter facilitates parallel layer reconstruction

❑ Slice: the set of all files of a layer that reside on one server.

▪ Slices are distributed evenly across the cluster.
▪ Layer reconstruction is sped up by processing slices in parallel.

(Architecture as before; the distributed metadata database additionally stores, per layer, a layer recipe (Id: L …) and, per slice, a slice recipe (Id: L1::A::P …).)
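Parallel reconstruction over slices can be sketched with a thread pool: each slice is built independently (in DupHunter, on its own server) and the results are combined in order. `fetch_slice` is a hypothetical callable standing in for the per-server slice construction.

```python
from concurrent.futures import ThreadPoolExecutor

def reconstruct_layer(slice_ids, fetch_slice):
    """Reconstruct a layer by processing its slices in parallel.

    slice_ids:   e.g. ["L1::A::P", "L1::B::P", "L1::C::P"]
    fetch_slice: callable building one slice's byte stream
                 (hypothetical signature, for illustration).
    """
    with ThreadPoolExecutor(max_workers=len(slice_ids)) as pool:
        streams = list(pool.map(fetch_slice, slice_ids))  # order preserved
    return b"".join(streams)  # the master concatenates the slice streams
```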

SLIDE 33

DupHunter enables prefetching/preconstruction of layers

❑ A prefetch cache prefetches layers to hide disk I/Os.
❑ A preconstruct cache stores preconstructed layers to hide layer restore overhead.

(Architecture as before, with a prefetch cache and a preconstruct cache added; the metadata database additionally stores ILmap and ULmap (Id: U …).)

SLIDE 41

Deduplicating layers

(Example: layer tar archive L1 holds file entries f1-f6, each with a header h1-h6 and a content fingerprint. The file index maps fingerprints to stored replicas (e.g., f1 → r1 at A:/../.. and B:/../.., f2 → r2 at B:/../.. and C:/../..). Files f1-f3 are duplicates whose replicas already exist on D-servers A-C; the unique files f4-f6 are stored as new replicas. A slice recipe (Id: L1::A::P) records each header with a content pointer (h2 → f2, h5 → f5); the layer recipe (Id: L1, Master: A, Workers: [A, B, C]) ties the slices together.)
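The ingest step above, simplified to a single server and slice, can be sketched as: look each file entry up in the shared file index, store only the unique files, and emit a recipe mapping tar headers to content fingerprints. The storage-location strings are hypothetical.

```python
import hashlib

def ingest_layer(entries, file_index):
    """Split a layer's file entries into duplicates and unique files.

    entries:    list of (header, content) pairs from the layer tar.
    file_index: fingerprint -> storage location, shared across layers.
    Returns a slice-recipe-style mapping of header -> fingerprint and
    the files that actually had to be stored."""
    recipe, newly_stored = {}, {}
    for header, content in entries:
        fp = hashlib.sha256(content).hexdigest()
        if fp not in file_index:               # unique: store a new replica
            file_index[fp] = f"A:/store/{fp[:8]}"   # hypothetical location
            newly_stored[fp] = content
        recipe[header] = fp                    # duplicate or not, point at it
    return recipe, newly_stored
```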

SLIDE 42

Restoring layers

(Restore pipeline: on each of servers A, B, and C, a slice constructor turns the slice's file I/O stream into a tar stream, then archives and compresses it; the layer constructor concatenates the compressed slice streams into the final layer.)
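The archive-compress-concatenate pipeline can be sketched in Python. It works because gzip members can be concatenated into one valid gzip stream; the reader then only has to skip the end-of-archive padding between the embedded tars (here via tarfile's `ignore_zeros`). A sketch of the trick, not DupHunter's actual implementation:

```python
import gzip
import io
import tarfile

def make_slice_stream(files):
    """Slice constructor: archive a slice's files into a tar stream,
    then gzip-compress it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return gzip.compress(buf.getvalue())

def concatenate_slices(slice_streams):
    """Layer constructor: concatenating gzip members yields one valid
    gzip stream, so no recompression is needed."""
    return b"".join(slice_streams)

def read_layer(layer_stream):
    """Client side: decompress the multi-member gzip and read the
    concatenated tars, skipping inter-slice end-of-archive padding."""
    raw = gzip.decompress(layer_stream)
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:",
                      ignore_zeros=True) as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers()}
```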

SLIDE 53

Caching and preconstructing layers

❑ ILmap: maps an image to the set of layers it contains.
❑ ULmap: maps a user to the layers that user has accessed and the corresponding pull counts.

(Example: when a user requests an image, ILmap identifies the layers the user will pull, while ULmap's pull history suggests which layers the user may pull.)
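One way to combine the two maps into a prefetch/preconstruct decision can be sketched as follows; the "will pull"/"may pull" policy here is a simplification for illustration, not DupHunter's exact algorithm.

```python
def predict_layers(user, image, il_map, ul_map):
    """Predict layers to prefetch/preconstruct when a user GETs an
    image's manifest. 'Will pull': layers of the image the user has
    never pulled (and so cannot have cached); 'may pull': layers the
    user has repulled before (an always-pull signal).
    This policy is a simplified illustration."""
    layers = il_map.get(image, set())
    pull_counts = ul_map.get(user, {})
    will_pull = {l for l in layers if pull_counts.get(l, 0) == 0}
    may_pull = {l for l in layers if pull_counts.get(l, 0) > 1}
    return will_pull, may_pull
```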

SLIDE 61

D-server C D-server D Tier 2 Deduplication cluster Tier 1 Primary cluster P-server A P-server B

Cache handling in tiered storage

L1 Prefetch cache Cache Cache L3 Layer stage area Stage area Stage area L4 Preconstruct cache Cache Cache Layer store Layer store L2 Layer store File store File store L5 File store

20 Nannan Zhao znannan1@vt.edu
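The L1-L5 hierarchy amounts to probing the stores in order and serving from the first hit; only an L5 hit forces reconstruction from deduplicated files. A minimal sketch, with each tier modeled as a dict:

```python
def get_layer(digest, tiers):
    """Serve a GET layer request by probing the storage hierarchy in
    order: L1 prefetch cache, L2 layer store, L3 layer stage area,
    L4 preconstruct cache, and finally L5 file store (where the layer
    must be reconstructed from deduplicated files)."""
    order = ["prefetch_cache", "layer_store", "stage_area",
             "preconstruct_cache", "file_store"]
    for name in order:
        hit = tiers[name].get(digest)
        if hit is not None:
            return name, hit   # earlier tiers shadow later ones
    raise KeyError(digest)
```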

SLIDE 62

Evaluation

❑ Workloads used:

▪ Traces from IBM registries: Dal, Fra, Lon, and Syd availability zones
▪ Dataset from Docker Hub

❑ Schemes studied:

▪ Baseline: no deduplication
▪ B-mode n: n (1-3) replicas are preserved; 3 - n deduplicated
▪ S-mode: intact layer replicas proportional to the layer's popularity
▪ B-mode 0: deduplicate all layer replicas, under a given replication policy

• GF-R: global file-level deduplication
• GF+LB-R: global file-level deduplication plus local block-level deduplication
• GB-EC: global block-level deduplication under erasure coding

SLIDE 63

Deduplication ratio vs. performance

(Figures: deduplication ratio vs. performance for the studied schemes.)

SLIDE 72

Prefetch cache hit ratio

(Figure: hit ratio for cache sizes of 5%, 10%, and 15% across Dal, Fra, Lon, and Syd, comparing the state of the art (LRU, ARC) with DupHunter's ARC+P-PUT and ARC+P-UB policies; DupHunter improves the hit ratio by up to 4.2x.)

DupHunter can provide a high hit ratio while reducing tail latency.
SLIDE 74

Preconstruct cache hit ratio

(Figure: percentage of GET layer requests that hit, wait, or miss for GF-R, GF+LB-R, and GB-EC across Dal, Fra, Lon, and Syd.)

Global file-level deduplication also has the lowest wait and miss ratios.

SLIDE 75

Summary

❑ DupHunter exploits the redundancy in container images along with predictable user access patterns to achieve high space savings with low layer restore overhead.

▪ It supports multiple replica deduplication modes.
▪ It facilitates parallel layer reconstruction.
▪ It offers proactive layer prefetching/preconstruction.

❑ DupHunter reduces storage space needs by up to 6.9x and reduces GET layer latency by up to 2.8x compared to the state of the art.
❑ DupHunter is available at https://github.com/nnzhaocs/DupHunter.


SLIDE 76

THANK YOU

Questions: Nannan Zhao znannan1@vt.edu DSSL@VT: http://dssl.cs.vt.edu