SLIDE 1

HPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud

Huijun Wu¹,⁴, Chen Wang², Yinjin Fu³, Sherif Sakr¹, Liming Zhu¹,² and Kai Lu⁴
¹The University of New South Wales  ²Data61, CSIRO  ³PLA University of Science and Technology  ⁴National University of Defence Technology

SLIDE 2

Outline

 Background
 Motivations
 Hybrid Prioritized Deduplication
 Experiment Results
 Conclusion

2 18/05/2017

SLIDE 3

Background

 Primary Storage Deduplication

  • Save the storage capacity
  • Improve the I/O efficiency

 The state-of-the-art

  • Post-processing deduplication

– Perform during off-peak time

  • Inline deduplication

– Perform on the write path


[Diagram: incoming data blocks → fingerprint lookup → only unique blocks are written]
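The write path in the diagram can be sketched in a few lines. This is an illustrative toy, not HPDedup's implementation: each block is fingerprinted, looked up, and written only if unique.

```python
import hashlib

def write_block(block: bytes, fingerprint_index: dict, store: list) -> int:
    """Inline deduplication on the write path: fingerprint each incoming
    block and write it to the store only if it has not been seen before."""
    fp = hashlib.sha256(block).hexdigest()   # content-based fingerprint
    if fp in fingerprint_index:              # duplicate: reference existing block
        return fingerprint_index[fp]
    fingerprint_index[fp] = len(store)       # unique: write the block
    store.append(block)
    return fingerprint_index[fp]
```

In a real system the fingerprint index is the scarce resource; the next slides explain why keeping it fast (in memory) is the hard part.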

SLIDE 4

Post-processing Deduplication

 Commodity products use post-processing deduplication [TOS’16]

  • Windows Server 2012 [ATC’12]

 Challenges remain for real-world systems

  • Off-peak periods may not be enough
  • More storage capacity is required
  • Duplicate writes shorten the lifespan of storage devices (e.g., SSD)
  • Does not improve I/O performance, and wastes I/O bandwidth

 Inline deduplication can help

SLIDE 5

Inline Deduplication

 Fingerprint look-up is the bottleneck

  • On-disk fingerprint table introduces high latency
  • Fingerprint table is large and hard to fit in memory
  • Cache efficiency is critical

 The state-of-the-art solutions and challenges

  • Exploit the temporal locality of workloads [FAST’12][IPDPS’14]

– But temporal locality may not exist [TPDS’17]

  • For cloud scenarios,

– the locality of workloads from different VMs may be quite different

  • Workloads may interfere with each other and reduce the cache efficiency

SLIDE 6

Outline

 Background
 Motivations
 Hybrid Prioritized Deduplication
 Experiment Results
 Conclusion

SLIDE 7

Motivation

 Workloads with different temporal locality interfere with each other

  • A toy example.

[Animation: three write streams A, B and C interleaved over 20 time steps share a three-entry fingerprint cache; # of deduplicated blocks so far: 0]


slide-12
SLIDE 12

Motivation

 Workloads with different temporal locality interfere with each other

  • A toy example.

– 18 duplicate blocks in total, only 6 are identified.

[Final animation frame: fingerprint cache state after all 20 time steps; # of deduplicated blocks: 6]
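The interference effect in the toy example can be reproduced with a small simulation. The streams below are made-up values chosen to show the effect, not the slide's exact sequences:

```python
from collections import OrderedDict

class LRUFingerprintCache:
    """Fixed-capacity fingerprint cache with least-recently-used eviction."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, fp):
        """Return True on a hit (duplicate identified); insert on a miss."""
        if fp in self.entries:
            self.entries.move_to_end(fp)
            return True
        self.entries[fp] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the LRU fingerprint
        return False

def deduplicated(streams, capacity):
    """Interleave the streams round-robin through one shared cache and
    count how many duplicate blocks the cache manages to identify."""
    cache = LRUFingerprintCache(capacity)
    hits = 0
    for step in zip(*streams):
        for stream_id, fp in enumerate(step):
            # namespace fingerprints per stream: equal labels in different
            # streams stand for different block contents in this toy
            if cache.lookup((stream_id, fp)):
                hits += 1
    return hits
```

Running a stream that repeats every block at distance 2 alone catches both of its duplicates, but interleaving it with a stream of unique blocks in the same two-entry cache drops the count to zero, which is exactly the interference the slides illustrate.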

SLIDE 13

Motivation

 Temporal locality may be weak for workloads

  • Histogram for the distribution of distance between duplicate blocks

[Histograms: distance between duplicate blocks for FIU-mail and Cloud-FTP]

SLIDE 14

Motivation

 Workloads with different temporal locality interfere with each other

  • Using real-world I/O traces with an LRU fingerprint cache


# of duplicate blocks: FIU-mail > 4× Cloud-FTP
Occupied cache size: FIU-mail < 0.8× Cloud-FTP
Cache resource allocation is unreasonable!

SLIDE 15

Outline

 Background
 Motivations
 Hybrid Prioritized Deduplication
 Experiment Results
 Conclusion

SLIDE 16

Hybrid Prioritized Deduplication

 Hybrid inline & post-processing deduplication

  • Neither post-processing nor inline deduplication alone works well
  • Solution: combine inline and post-processing deduplication
  • Identify more duplicates through inline fingerprint caching
  • Use post-processing to achieve exact deduplication

 Challenge: interference compromises the temporal locality of workloads, reducing the efficiency of fingerprint caching

 We differentiate workloads (data streams) to address this

SLIDE 17

Hybrid Prioritized Deduplication

 Prioritize the cache allocation for inline deduplication

  • Data streams that contribute a higher deduplication ratio should get more cache resources
  • In the inline phase, a higher deduplication ratio comes from better temporal locality

 How to evaluate temporal locality?

  • It changes dynamically over time
  • Accurate estimation is critical for good cache allocation
  • Use the # of duplicate blocks among N consecutive data blocks (the estimation interval) as an indicator of temporal locality
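The indicator itself is simple to state; a minimal sketch (the function name is mine):

```python
def duplicate_count(fingerprints):
    """# of duplicate blocks among N consecutive blocks: N minus the
    number of distinct fingerprints in the estimation interval."""
    return len(fingerprints) - len(set(fingerprints))
```

Computing this exactly requires tracking every distinct fingerprint in the interval, which is what the later slides argue is too expensive and replace with sampling-based estimation.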

SLIDE 18

System architecture


[Architecture: estimate the temporal locality of each stream and allocate cache accordingly; an on-disk fingerprint table supports post-processing deduplication]
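One plausible reading of "allocate cache according to this" is a proportional split by each stream's estimated locality. The sketch below is an assumption for illustration, not HPDedup's exact allocation policy:

```python
def allocate_cache(ldss_by_stream, total_entries):
    """Split the shared fingerprint cache across streams in proportion to
    their estimated LDSS (duplicates expected in the next interval).
    Illustrative sketch; the paper's policy may differ in detail."""
    total = sum(ldss_by_stream.values())
    if total == 0:  # no stream shows locality: fall back to an even split
        even = total_entries // len(ldss_by_stream)
        return {s: even for s in ldss_by_stream}
    return {s: total_entries * v // total for s, v in ldss_by_stream.items()}
```

Under this scheme a stream like FIU-mail, with four times the duplicates of Cloud-FTP, would receive the larger cache share that the earlier motivation slide shows LRU fails to give it.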

SLIDE 19

Evaluate the temporal locality

 Simple idea: count distinct data block fingerprints per stream

  • Introduces high memory overhead
  • The counting structure may be comparable in size to the cache capacity

 Estimate rather than count

  • Obtain the number of distinct fingerprints from a small portion of samples
  • Essentially the same as the classical problem ‘How many distinct elements exist in a set?’

– Origin: estimating the number of animal species in a population from samples [Fisher, JSTOR’1940]

  • Sublinear estimator: the Unseen estimation algorithm [NIPS’13]
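The input the Unseen estimator consumes is a fingerprint histogram: how many sampled fingerprints occur once, twice, and so on (the overhead slide later mentions generating exactly this histogram). A minimal sketch; the estimator's linear program itself is omitted:

```python
from collections import Counter

def fingerprint_histogram(sampled_fps):
    """h[j] = number of distinct fingerprints appearing exactly j times
    in the sample; this histogram is the input to the Unseen estimator."""
    occurrences = Counter(sampled_fps)       # fingerprint -> multiplicity
    return dict(Counter(occurrences.values()))  # multiplicity -> count
```

Intuitively, many once-seen fingerprints suggest many more unseen distinct blocks, while high multiplicities suggest strong duplication; the estimator turns this shape into a distinct-element count.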

SLIDE 20

Estimate the temporal locality

 Using the Unseen algorithm to estimate the LDSS (local duplicate set size).

[Diagram: fingerprints f1 … f18 arriving within estimation interval I → reservoir sampling → fingerprint sample buffer → Unseen estimation algorithm → LDSS for interval I]

SLIDE 21

Key points to deploy the estimation

 Unseen algorithm requires uniform sampling

  • Each fingerprint should be sampled with the same probability
  • We use Reservoir Sampling [TOMS’04]

 Choose a proper estimation interval

  • More unique data blocks -> a larger interval
  • A good approximation

– The historical inline deduplication ratio

  • An adaptive method adjusts the interval over time
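Reservoir sampling (Algorithm R, cited as [TOMS'04] above) keeps a uniform k-element sample of a stream of unknown length, which is exactly the uniformity the Unseen algorithm requires:

```python
import random

def reservoir_sample(stream, k):
    """Algorithm R: every stream element ends up in the k-element sample
    with equal probability k/n, without knowing n in advance."""
    sample = []
    for i, fp in enumerate(stream):
        if i < k:
            sample.append(fp)         # fill the reservoir first
        else:
            j = random.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = fp        # replace a current member
    return sample
```

This is why the sample buffer in the previous diagram stays small and constant-sized regardless of how long the estimation interval grows.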

SLIDE 22

Differentiate the spatial locality

 Existing deduplication solutions exploit spatial locality to reduce disk fragmentation

  • They perform deduplication only on duplicate block sequences longer than a fixed threshold

 Workloads differ in their sensitivity to increases of the threshold

  • Differentiating the workloads achieves a better deduplication ratio with less fragmentation


[Plots: one workload is insensitive to threshold changes, the other is sensitive]
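The thresholding rule can be sketched as follows; `dup_flags` marks which incoming blocks are duplicates, and only runs of at least `threshold` consecutive duplicates are deduplicated (names are illustrative, not from the paper):

```python
def dedup_decisions(dup_flags, threshold):
    """Deduplicate only runs of >= threshold consecutive duplicate blocks,
    trading some deduplication ratio for less on-disk fragmentation."""
    decisions = [False] * len(dup_flags)
    i = 0
    while i < len(dup_flags):
        if not dup_flags[i]:
            i += 1
            continue
        j = i
        while j < len(dup_flags) and dup_flags[j]:
            j += 1                   # extend the duplicate run [i, j)
        if j - i >= threshold:
            for k in range(i, j):
                decisions[k] = True  # long run: safe to deduplicate
        i = j
    return decisions
```

A threshold-insensitive workload loses few duplicates as the threshold rises (its duplicate runs are long); a sensitive one loses many, which is why a single global threshold is a poor fit.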

SLIDE 23

Outline

 Background
 Motivations
 Hybrid Prioritized Deduplication
 Experiment Results
 Conclusion

SLIDE 24

Evaluation Setup

 Evaluated Systems

  • Compared with inline (iDedup), post-processing, and hybrid (DIODE) deduplication schemes

 Workloads

  • FIU trace (FIU-home, FIU-web and FIU-mail)
  • Cloud-FTP (a trace we collected from an FTP server using NBD)

 Mixing workloads as multiple VMs

  • Different ratios between good-locality (FIU, L) and bad-locality (Cloud-FTP, NL) workloads
  • Workload A (L:NL = 3:1), workload B (L:NL = 1:1), workload C (L:NL = 1:3)

SLIDE 25

Evaluation

 Inline deduplication ratio

  • Cache sizes from 20MB to 320MB
  • HPDedup improves the inline deduplication ratio by 8.04%–37.75%

SLIDE 26

Evaluation

 Data written to disks (compared with post-processing deduplication)

  • HPDedup reduces the data written to disks by 12.78%–45.08%

SLIDE 27

Evaluation

 Average # of hits for each cached fingerprint

  • DIODE [MASCOTS’16] skips files in inline deduplication based on file extensions
  • HPDedup classifies data at a finer granularity (stream temporal-locality level), further improving the efficiency of inline deduplication

SLIDE 28

Evaluation – LDSS Estimation Accuracy


[Plots: real normalized LDSS vs. cache occupation without and with locality estimation]

Locality estimation allocates cache resources according to the temporal locality of streams and improves the inline deduplication ratio by 12.53%.

SLIDE 29

Evaluation


 Deduplication threshold

  • DIODE [MASCOTS’16] does not differentiate workloads
  • HPDedup introduces less fragmentation while achieving a higher deduplication ratio
SLIDE 30

Overhead Analysis

 Computational Overhead

  • Mainly (1) generating the fingerprint histogram and (2) the linear programming step of the estimation algorithm
  • (1) takes 7ms for 1M fingerprints
  • (2) takes 27ms regardless of the estimation interval size
  • Intuitively: millisecond-level overhead per GBs of data written
  • Can be computed asynchronously

 Memory Overhead

  • Up to 2.81% of cache capacity for the three workloads

SLIDE 31

Outline

 Background
 Motivations
 Hybrid Prioritized Deduplication
 Experiment Results
 Conclusion

SLIDE 32

Conclusion

 A new hybrid, prioritized data deduplication mechanism

  • Fuses inline and post-processing deduplication
  • Differentiates the temporal and spatial locality of data streams coming from VMs and applications

 More efficient than the state of the art

  • Improves the inline deduplication ratio by up to 37.75%
  • Reduces the disk capacity requirement by up to 45.08%
  • Low computational and memory overhead

SLIDE 33

HPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud

Huijun Wu
The University of New South Wales, Australia
Email: huijunw@cse.unsw.edu.au
