SLIDE 1

HPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud

Huijun Wu¹,⁴, Chen Wang², Yinjin Fu³, Sherif Sakr¹, Liming Zhu¹,² and Kai Lu⁴
¹The University of New South Wales  ²Data61, CSIRO  ³PLA University of Science and Technology  ⁴National University of Defence Technology

SLIDE 2

Outline

 Background
 Motivations
 Hybrid Prioritized Deduplication
 Experiment Results
 Conclusion

2 18/05/2017

SLIDE 3

Background

 Primary Storage Deduplication

  • Save the storage capacity
  • Improve the I/O efficiency

 The state-of-the-art

  • Post-processing deduplication

– Perform during off-peak time

  • Inline deduplication

– Perform on the write path


[Diagram: incoming data blocks → fingerprint lookup → only unique blocks are written]
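The write path in the diagram can be sketched in a few lines. This is an illustrative toy, not HPDedup's implementation: each block is fingerprinted, looked up, and written only if unique.

```python
import hashlib

def write_block(block: bytes, fingerprint_index: dict, store: list) -> int:
    """Inline deduplication on the write path: fingerprint each incoming
    block and write it to the store only if it has not been seen before."""
    fp = hashlib.sha256(block).hexdigest()   # content-based fingerprint
    if fp in fingerprint_index:              # duplicate: reference existing block
        return fingerprint_index[fp]
    fingerprint_index[fp] = len(store)       # unique: write the block
    store.append(block)
    return fingerprint_index[fp]
```

In a real system the fingerprint index is the scarce resource; the next slides explain why keeping it fast (in memory) is the hard part.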

SLIDE 4

Post-processing Deduplication

 Commodity products use post-processing deduplication [TOS’16]

  • Windows Server 2012 [ATC’12]

 Challenges remain for real-world systems

  • Off-peak periods may not be enough
  • More storage capacity is required
  • Duplicate writes shorten the lifespan of storage devices (e.g., SSD)
  • Does not improve I/O performance, and wastes I/O bandwidth

 Inline deduplication can help

SLIDE 5

Inline Deduplication

 Fingerprint look-up is the bottleneck

  • On-disk fingerprint table introduces high latency
  • Fingerprint table is large and hard to fit in memory
  • Cache efficiency is critical

 The state-of-the-art solutions and challenges

  • Exploit the temporal locality of workloads [FAST’12][IPDPS’14]

– But temporal locality may not exist [TPDS’17]

  • For cloud scenarios,

– the locality of workloads from different VMs may be quite different

  • Workloads may interfere with each other and reduce the cache efficiency

SLIDE 6

Outline

 Background
 Motivations
 Hybrid Prioritized Deduplication
 Experiment Results
 Conclusion

SLIDE 7

Motivation

 Workloads with different temporal locality interfere with each other

  • A toy example.

[Animation: three write streams A, B and C interleaved over 20 time steps share a three-entry fingerprint cache; # of deduplicated blocks so far: 0]


slide-12
SLIDE 12

Motivation

 Workloads with different temporal locality interfere with each other

  • A toy example.

– 18 duplicate blocks in total, only 6 are identified.

[Final animation frame: fingerprint cache state after all 20 time steps; # of deduplicated blocks: 6]
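The interference effect in the toy example can be reproduced with a small simulation. The streams below are made-up values chosen to show the effect, not the slide's exact sequences:

```python
from collections import OrderedDict

class LRUFingerprintCache:
    """Fixed-capacity fingerprint cache with least-recently-used eviction."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, fp):
        """Return True on a hit (duplicate identified); insert on a miss."""
        if fp in self.entries:
            self.entries.move_to_end(fp)
            return True
        self.entries[fp] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the LRU fingerprint
        return False

def deduplicated(streams, capacity):
    """Interleave the streams round-robin through one shared cache and
    count how many duplicate blocks the cache manages to identify."""
    cache = LRUFingerprintCache(capacity)
    hits = 0
    for step in zip(*streams):
        for stream_id, fp in enumerate(step):
            # namespace fingerprints per stream: equal labels in different
            # streams stand for different block contents in this toy
            if cache.lookup((stream_id, fp)):
                hits += 1
    return hits
```

Running a stream that repeats every block at distance 2 alone catches both of its duplicates, but interleaving it with a stream of unique blocks in the same two-entry cache drops the count to zero, which is exactly the interference the slides illustrate.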

SLIDE 13

Motivation

 Temporal locality may be weak for workloads

  • Histogram for the distribution of distance between duplicate blocks

[Histograms: distance between duplicate blocks for FIU-mail and Cloud-FTP]

SLIDE 14

Motivation

 Workloads with different temporal locality interfere with each other

  • Using real-world I/O traces with an LRU fingerprint cache


# of duplicate blocks: FIU-mail > 4× Cloud-FTP
Occupied cache size: FIU-mail < 0.8× Cloud-FTP
Cache resource allocation is unreasonable!

SLIDE 15

Outline

 Background
 Motivations
 Hybrid Prioritized Deduplication
 Experiment Results
 Conclusion

SLIDE 16

Hybrid Prioritized Deduplication

 Hybrid inline & post-processing deduplication

  • Neither post-processing nor inline deduplication alone works well
  • Solution: combine inline and post-processing deduplication
  • Identify more duplicates through inline fingerprint caching
  • Use post-processing to achieve exact deduplication

 Challenge: interference compromises the temporal locality of workloads, reducing the efficiency of fingerprint caching

 We differentiate workloads (data streams) to address this

SLIDE 17

Hybrid Prioritized Deduplication

 Prioritize the cache allocation for inline deduplication

  • Data streams that contribute a higher deduplication ratio should get more cache resources
  • In the inline phase, a higher deduplication ratio comes from better temporal locality

 How to evaluate temporal locality?

  • It changes dynamically over time
  • Accurate estimation is critical for good cache allocation
  • Use the # of duplicate blocks among N consecutive data blocks (the estimation interval) as an indicator of temporal locality
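The indicator itself is simple to state; a minimal sketch (the function name is mine):

```python
def duplicate_count(fingerprints):
    """# of duplicate blocks among N consecutive blocks: N minus the
    number of distinct fingerprints in the estimation interval."""
    return len(fingerprints) - len(set(fingerprints))
```

Computing this exactly requires tracking every distinct fingerprint in the interval, which is what the later slides argue is too expensive and replace with sampling-based estimation.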

SLIDE 18

System architecture


[Architecture: estimate the temporal locality of each stream and allocate cache accordingly; an on-disk fingerprint table supports post-processing deduplication]
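One plausible reading of "allocate cache according to this" is a proportional split by each stream's estimated locality. The sketch below is an assumption for illustration, not HPDedup's exact allocation policy:

```python
def allocate_cache(ldss_by_stream, total_entries):
    """Split the shared fingerprint cache across streams in proportion to
    their estimated LDSS (duplicates expected in the next interval).
    Illustrative sketch; the paper's policy may differ in detail."""
    total = sum(ldss_by_stream.values())
    if total == 0:  # no stream shows locality: fall back to an even split
        even = total_entries // len(ldss_by_stream)
        return {s: even for s in ldss_by_stream}
    return {s: total_entries * v // total for s, v in ldss_by_stream.items()}
```

Under this scheme a stream like FIU-mail, with four times the duplicates of Cloud-FTP, would receive the larger cache share that the earlier motivation slide shows LRU fails to give it.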

SLIDE 19

Evaluate the temporal locality

 Simple idea: count distinct data block fingerprints per stream

  • Introduces high memory overhead
  • The counting structure may be comparable in size to the cache capacity

 Estimate rather than count

  • Obtain the number of distinct fingerprints from a small portion of samples
  • Essentially the same as the classical problem ‘How many distinct elements exist in a set?’

– Origin: estimating the number of animal species in a population from samples [Fisher, JSTOR’1940]

  • Sublinear estimator: the Unseen estimation algorithm [NIPS’13]
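The input the Unseen estimator consumes is a fingerprint histogram: how many sampled fingerprints occur once, twice, and so on (the overhead slide later mentions generating exactly this histogram). A minimal sketch; the estimator's linear program itself is omitted:

```python
from collections import Counter

def fingerprint_histogram(sampled_fps):
    """h[j] = number of distinct fingerprints appearing exactly j times
    in the sample; this histogram is the input to the Unseen estimator."""
    occurrences = Counter(sampled_fps)       # fingerprint -> multiplicity
    return dict(Counter(occurrences.values()))  # multiplicity -> count
```

Intuitively, many once-seen fingerprints suggest many more unseen distinct blocks, while high multiplicities suggest strong duplication; the estimator turns this shape into a distinct-element count.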

SLIDE 20

Estimate the temporal locality

 Using the Unseen algorithm to estimate the LDSS (local duplicate set size).

[Diagram: fingerprints f1 … f18 arriving within estimation interval I → reservoir sampling → fingerprint sample buffer → Unseen estimation algorithm → LDSS for interval I]

SLIDE 21

Key points to deploy the estimation

 Unseen algorithm requires uniform sampling

  • Each fingerprint should be sampled with the same probability
  • We use Reservoir Sampling [TOMS’04]

 Choose a proper estimation interval

  • More unique data blocks -> a larger interval
  • A good approximation

– The historical inline deduplication ratio

  • An adaptive method adjusts the interval over time
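Reservoir sampling (Algorithm R, cited as [TOMS'04] above) keeps a uniform k-element sample of a stream of unknown length, which is exactly the uniformity the Unseen algorithm requires:

```python
import random

def reservoir_sample(stream, k):
    """Algorithm R: every stream element ends up in the k-element sample
    with equal probability k/n, without knowing n in advance."""
    sample = []
    for i, fp in enumerate(stream):
        if i < k:
            sample.append(fp)         # fill the reservoir first
        else:
            j = random.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = fp        # replace a current member
    return sample
```

This is why the sample buffer in the previous diagram stays small and constant-sized regardless of how long the estimation interval grows.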

SLIDE 22

Differentiate the spatial locality

 Existing deduplication solutions exploit spatial locality to reduce disk fragmentation

  • They perform deduplication only on duplicate block sequences longer than a fixed threshold

 Workloads differ in their sensitivity to increases of the threshold

  • Differentiating the workloads achieves a better deduplication ratio with less fragmentation


[Plots: one workload is insensitive to threshold changes, the other is sensitive]
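The thresholding rule can be sketched as follows; `dup_flags` marks which incoming blocks are duplicates, and only runs of at least `threshold` consecutive duplicates are deduplicated (names are illustrative, not from the paper):

```python
def dedup_decisions(dup_flags, threshold):
    """Deduplicate only runs of >= threshold consecutive duplicate blocks,
    trading some deduplication ratio for less on-disk fragmentation."""
    decisions = [False] * len(dup_flags)
    i = 0
    while i < len(dup_flags):
        if not dup_flags[i]:
            i += 1
            continue
        j = i
        while j < len(dup_flags) and dup_flags[j]:
            j += 1                   # extend the duplicate run [i, j)
        if j - i >= threshold:
            for k in range(i, j):
                decisions[k] = True  # long run: safe to deduplicate
        i = j
    return decisions
```

A threshold-insensitive workload loses few duplicates as the threshold rises (its duplicate runs are long); a sensitive one loses many, which is why a single global threshold is a poor fit.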

SLIDE 23

Outline

 Background
 Motivations
 Hybrid Prioritized Deduplication
 Experiment Results
 Conclusion

SLIDE 24

Evaluation Setup

 Evaluated Systems

  • Compared with inline (iDedup), post-processing, and hybrid (DIODE) deduplication schemes

 Workloads

  • FIU trace (FIU-home, FIU-web and FIU-mail)
  • Cloud-FTP (a trace we collected from an FTP server using NBD)

 Mixing workloads as multiple VMs

  • Different ratios between good-locality (FIU, L) and bad-locality (Cloud-FTP, NL) workloads
  • Workload A (L:NL = 3:1), workload B (L:NL = 1:1), workload C (L:NL = 1:3)

SLIDE 25

Evaluation

 Inline deduplication ratio

  • Cache sizes from 20MB to 320MB
  • HPDedup improves the inline deduplication ratio by 8.04%–37.75%

SLIDE 26

Evaluation

 Data written to disks (compared with post-processing deduplication)

  • HPDedup reduces the data written to disks by 12.78%–45.08%

SLIDE 27

Evaluation

 Average # of hits for each cached fingerprint

  • DIODE [MASCOTS’16] skips files in inline deduplication based on file extensions
  • HPDedup classifies data at a finer granularity (stream temporal-locality level), further improving the efficiency of inline deduplication

SLIDE 28

Evaluation – LDSS Estimation Accuracy


[Plots: real normalized LDSS vs. cache occupation without and with locality estimation]

Locality estimation allocates cache resources according to the temporal locality of streams and improves the inline deduplication ratio by 12.53%.

SLIDE 29

Evaluation


 Deduplication threshold

  • DIODE [MASCOTS’16] does not differentiate workloads
  • HPDedup introduces less fragmentation while achieving a higher deduplication ratio
SLIDE 30

Overhead Analysis

 Computational Overhead

  • Mainly (1) generating the fingerprint histogram and (2) the linear programming step of the estimation algorithm
  • (1) takes 7ms for 1M fingerprints
  • (2) takes 27ms regardless of the estimation interval size
  • Intuitively: millisecond-level overhead per GBs of data written
  • Can be computed asynchronously

 Memory Overhead

  • Up to 2.81% of cache capacity for the three workloads

SLIDE 31

Outline

 Background
 Motivations
 Hybrid Prioritized Deduplication
 Experiment Results
 Conclusion

SLIDE 32

Conclusion

 A new hybrid, prioritized data deduplication mechanism

  • Fuses inline and post-processing deduplication
  • Differentiates the temporal and spatial locality of data streams coming from VMs and applications

 More efficient than the state of the art

  • Improves the inline deduplication ratio by up to 37.75%
  • Reduces the disk capacity requirement by up to 45.08%
  • Low computational and memory overhead

SLIDE 33

HPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud

Huijun Wu
The University of New South Wales, Australia
Email: huijunw@cse.unsw.edu.au
