
SADedupe: Skew Area Inline Deduplication for Distributed Storage



  1. SADedupe: Skew Area Inline Deduplication for Distributed Storage
     Binqi Zhang, Bing Bing Zhou, Chen Wang*, Dong Yuan, Albert Y. Zomaya
     The University of Sydney, Sydney, Australia; *CSIRO, Sydney, Australia

  2. Introduction
     Deduplication routing
     • Files -> chunks
     • Chunks -> blocks, plus hash calculation
     • Extract the feature ID
     • Use the feature ID to route the chunk to a node
     Deduplication
     • Check the hash values of all blocks
     • If a hash already exists, add a reference
     • If not, store the block
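
The routing and deduplication steps above can be sketched in Python as follows. This is a minimal sketch resting on assumptions the slide does not state: fixed 4 KiB blocks, SHA-1 block fingerprints, the minimum block hash as the chunk's feature ID, and "feature ID modulo node count" as the routing rule. The names `split_blocks`, `feature_id`, `route` and `Node` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size in bytes


def split_blocks(chunk: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split a chunk into fixed-size blocks (the last one may be shorter)."""
    return [chunk[i:i + block_size] for i in range(0, len(chunk), block_size)]


def block_hashes(chunk: bytes) -> list[str]:
    """Hash calculation: one SHA-1 fingerprint per block."""
    return [hashlib.sha1(b).hexdigest() for b in split_blocks(chunk)]


def feature_id(hashes: list[str]) -> str:
    """Assumption: the smallest block hash serves as the chunk's feature ID."""
    return min(hashes)


class Node:
    """A storage node holding unique blocks and their reference counts."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # block hash -> stored block
        self.refcount: dict[str, int] = {}  # block hash -> reference count

    def dedupe(self, chunk: bytes) -> int:
        """Add references for known blocks, store new ones; return bytes written."""
        written = 0
        for block in split_blocks(chunk):
            h = hashlib.sha1(block).hexdigest()
            if h in self.blocks:
                self.refcount[h] += 1  # hash exists: add a reference
            else:
                self.blocks[h] = block  # hash is new: store the block
                self.refcount[h] = 1
                written += len(block)
        return written


def route(chunk: bytes, nodes: list[Node]) -> Node:
    """Route a (non-empty) chunk to a node chosen by its feature ID."""
    fid = feature_id(block_hashes(chunk))
    return nodes[int(fid, 16) % len(nodes)]


# Usage: a second pass over the same chunk stores nothing new.
nodes = [Node() for _ in range(4)]
data = b"A" * 8192 + b"B" * 4096
target = route(data, nodes)
first = target.dedupe(data)   # 8192: one "A" block and one "B" block stored
second = target.dedupe(data)  # 0: every block only gains a reference
```

Because the feature ID depends only on the chunk's content, identical or similar chunks land on the same node, which is what lets that node find their duplicate blocks.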

  3. System architecture
     [Figure: system architecture diagram]

  4. Problem
     [Figure: diagram labeled File, Chunk, Ref Count, Data, Node and Replication Queue, annotated "Longer processing queues"]

  5. Algorithm & results
     • We check the reference count of the feature ID used for routing
     • Currently we use a “capping” approach
     • The standard deviation of post-dedupe storage usage (PDSU) is examined
     RT = reference count threshold
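
The slide does not spell out the capping rule, so the Python sketch below is only one plausible reading, offered as an assumption. It counts references per routing feature ID and routes to the feature ID's home node while the count is at most RT. Beyond that, further chunks carrying the feature ID go to the currently least-used node. The PDSU standard deviation then measures how evenly storage ends up spread. `capped_route`, `simulate` and `pdsu_stddev` are hypothetical names. Each chunk's post-dedupe size is supplied as an input rather than computed.

```python
import statistics
from collections import Counter


def capped_route(fid: int, refcount: Counter, usage: list[int], rt: int) -> int:
    """Pick a node index for a chunk whose routing feature ID is `fid`.

    Hypothetical capping rule: while the feature ID's reference count is at
    most `rt`, use its home node (fid mod N); once the count exceeds `rt`,
    send the chunk to the node with the lowest storage usage instead.
    """
    refcount[fid] += 1
    if refcount[fid] <= rt:
        return fid % len(usage)
    return min(range(len(usage)), key=lambda n: usage[n])


def simulate(stream: list[tuple[int, int]], num_nodes: int, rt: int) -> list[int]:
    """Replay (feature_id, stored_bytes) pairs and return per-node usage.

    stored_bytes is an assumed, precomputed post-dedupe size, treated here
    as independent of which node receives the chunk (a simplification).
    """
    usage = [0] * num_nodes
    refcount: Counter = Counter()
    for fid, stored in stream:
        node = capped_route(fid, refcount, usage, rt)
        usage[node] += stored
    return usage


def pdsu_stddev(usage: list[int]) -> float:
    """Population standard deviation of per-node post-dedupe storage usage."""
    return statistics.pstdev(usage)


# Usage: eight chunks share feature ID 0, creating a hot spot on node 0.
stream = [(0, 10)] * 8 + [(1, 10), (2, 10), (3, 10)]
uncapped = simulate(stream, num_nodes=4, rt=10**9)  # [80, 10, 10, 10]
capped = simulate(stream, num_nodes=4, rt=2)        # [20, 30, 30, 30]
print(pdsu_stddev(uncapped), pdsu_stddev(capped))
```

Lowering RT spreads storage more evenly in this toy stream. In a real system, though, a chunk redirected away from its home node can no longer be deduplicated against the blocks stored there, so the threshold trades deduplication ratio for balance.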

  6. Future work
     • Find a bigger, better data set to illustrate the severity of the skew issue and its impact on read performance
     • Find more routing algorithms that optimize load balancing
     • Consider replication

  7. Thank you
