

1. Austere Flash Caching with Deduplication and Compression
   Qiuping Wang*, Jinhong Li*, Wen Xia#, Erik Kruus^, Biplob Debnath^, Patrick P. C. Lee*
   * The Chinese University of Hong Kong (CUHK)  # Harbin Institute of Technology, Shenzhen  ^ NEC Labs

2. Flash Caching
- Flash-based solid-state drives (SSDs):
  (+) Faster than hard disk drives (HDDs)
  (+) Better reliability
  (-) Limited capacity and endurance
- Flash caching: accelerate HDD storage by caching frequently accessed blocks in flash

3. Deduplication and Compression
- Both reduce storage and I/O overheads
- Deduplication (coarse-grained):
  - Operates in units of chunks (fixed- or variable-size)
  - Computes a fingerprint (e.g., SHA-1) from each chunk's content
  - Redirects identical logical chunks (i.e., those with the same fingerprint) to a single physical copy
- Compression (fine-grained):
  - Operates in units of bytes
  - Transforms chunks into fewer bytes
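The deduplication step above can be sketched in a few lines. This is an illustrative toy, not AustereCache's code: a dictionary stands in for physical storage, keyed by each chunk's SHA-1 fingerprint, so identical logical chunks share one physical copy.

```python
import hashlib

# Toy deduplication store (illustrative sketch, not AustereCache's code):
# identical chunks map to the same fingerprint and share one physical copy.
store = {}          # fingerprint -> chunk bytes (one physical copy)
logical_map = {}    # logical block address (LBA) -> fingerprint

def write_chunk(lba: int, chunk: bytes) -> None:
    fp = hashlib.sha1(chunk).hexdigest()  # fingerprint from chunk content
    logical_map[lba] = fp
    if fp not in store:                   # store only the first copy
        store[fp] = chunk

write_chunk(0, b"A" * 4096)
write_chunk(1, b"A" * 4096)   # duplicate content -> deduplicated
write_chunk(2, b"B" * 4096)
print(len(store))             # 2 physical copies for 3 logical chunks
```

Compression (not shown) would then shrink each stored physical copy byte-wise, e.g. with a generic compressor.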

4. Deduplicated and Compressed Flash Cache
- LBA: chunk address in the HDD; FP: chunk fingerprint
- CA: chunk address in the flash cache (after deduplication and compression)
- Two in-RAM indices sit on the read/write path:
  - LBA-index: LBA → FP
  - FP-index: FP → CA, length (plus a dirty list)
- The SSD holds variable-size compressed chunks (after deduplication); the HDD holds fixed-size chunks

5. Memory Amplification for Indexing
- Example: 512-GiB flash cache with a 4-TiB HDD working set
- Conventional flash cache:
  - LBA (8 B) → CA (8 B): 256 MiB of memory
- Deduplicated and compressed flash cache:
  - LBA-index: LBA (8 B) → FP (20 B): 3.5 GiB
  - FP-index: FP (20 B) → CA (8 B) + length (4 B): 512 MiB
  - Memory amplification: 16x, and it can be even higher

6. Related Work
- Nitro [Li et al., ATC'14]
  - First work to study deduplication and compression in flash caching
  - Manages compressed data in Write-Evict Units (WEUs)
- CacheDedup [Li et al., FAST'16]
  - Proposes dedup-aware cache replacement algorithms for flash caching to improve hit ratios
- Both suffer from memory amplification!

7. Our Contribution
- AustereCache: a deduplicated and compressed flash cache with austere, memory-efficient management
  - Bucketization: no memory overhead for address mappings; chunks are hashed to storage locations
  - Fixed-size compressed data management: no in-memory tracking of chunks' compressed lengths
  - Bucket-based cache replacement: replacement is decided per bucket, with a Count-Min Sketch [Cormode 2005] for low-memory reference counting
- Extensive trace-driven evaluation and prototype experiments

8. Bucketization
- Main idea: use hashing to partition the index and cache space
  - (RAM) LBA-index and FP-index; (SSD) metadata region and data region
  - Store only partial keys (hash prefixes) in memory, for memory savings
- Layout: hash entries into equal-sized buckets, each with a fixed number of slots holding mappings or data

9. (RAM) LBA-index and FP-index
- Each index is an array of buckets; each bucket holds a fixed number of slots
- A slot stores a hash prefix (LBA-hash or FP-hash) and a flag
- Lookup: locate the bucket by the hash suffix, then match slots by the hash prefix
- Each slot in the FP-index corresponds to a fixed storage location in flash
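The suffix/prefix lookup described above can be sketched as follows. The bucket count, slots per bucket, and prefix width are illustrative assumptions, not AustereCache's actual parameters; the point is that the (bucket, slot) pair itself names a storage location, so no explicit address mapping is kept in memory.

```python
import hashlib

# Bucketized index sketch: the hash suffix selects the bucket, and only a
# short hash prefix is kept per slot in memory. All sizes are illustrative.
NUM_BUCKETS = 1024
SLOTS_PER_BUCKET = 32
PREFIX_BITS = 16

buckets = [[None] * SLOTS_PER_BUCKET for _ in range(NUM_BUCKETS)]

def key_hash(key: bytes) -> int:
    return int.from_bytes(hashlib.sha1(key).digest()[:8], "big")

def insert(key: bytes):
    h = key_hash(key)
    b = h % NUM_BUCKETS                            # bucket from hash suffix
    prefix = (h >> 48) & ((1 << PREFIX_BITS) - 1)  # short prefix kept in RAM
    for s in range(SLOTS_PER_BUCKET):
        if buckets[b][s] is None:
            buckets[b][s] = prefix
            return b, s        # (bucket, slot) doubles as the flash location
    raise RuntimeError("bucket full: evict a slot within this bucket")

def lookup(key: bytes):
    h = key_hash(key)
    b = h % NUM_BUCKETS
    prefix = (h >> 48) & ((1 << PREFIX_BITS) - 1)
    for s in range(SLOTS_PER_BUCKET):
        if buckets[b][s] == prefix:
            return b, s  # candidate only; full FP on SSD resolves collisions
    return None

b, s = insert(b"chunk-fingerprint")
print(lookup(b"chunk-fingerprint") == (b, s))  # True
```

Because prefixes can collide, a matching slot is only a candidate; the full fingerprint and LBA list in the on-SSD metadata region (next slide) validate the match.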

10. (SSD) Metadata and Data Regions
- The on-SSD metadata region and data region mirror the bucket/slot layout
- Each slot in the metadata region stores the full FP and a list of full LBAs
  - Used for validation against hash-prefix collisions
- The data region stores the cached chunks

11. Fixed-size Compressed Data Management
- Main idea: slice and pad a compressed chunk into fixed-size subchunks
  - e.g., a 32-KiB chunk compressed to 20 KiB is sliced into three 8-KiB subchunks, with the last one padded
- Advantages:
  - Compatible with bucketization: each subchunk is stored in one slot
  - Allows per-chunk management for cache replacement
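The slice-and-pad step can be sketched directly from the slide's example, using its 8-KiB subchunk size:

```python
# Slice-and-pad sketch: a compressed chunk is cut into fixed-size
# subchunks and the tail is zero-padded, so each subchunk fills one slot.
SUBCHUNK = 8 * 1024  # 8 KiB per slot, from the slide's example

def slice_and_pad(compressed: bytes) -> list:
    subchunks = []
    for off in range(0, len(compressed), SUBCHUNK):
        piece = compressed[off:off + SUBCHUNK]
        piece += b"\x00" * (SUBCHUNK - len(piece))  # pad the last piece
        subchunks.append(piece)
    return subchunks

# A 32-KiB chunk compressed to 20 KiB occupies three 8-KiB slots.
parts = slice_and_pad(b"x" * (20 * 1024))
print(len(parts))                              # 3
print(all(len(p) == SUBCHUNK for p in parts))  # True
```

Padding trades some flash capacity (here 4 KiB of the 24 KiB written) for uniform slots, which is what makes the in-memory index free of per-chunk length tracking.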

12. Fixed-size Compressed Data Management: Layout
- One chunk occupies multiple consecutive slots in the data region
- The compressed length is stored alongside the full FP and LBA list in the on-SSD metadata region, so no additional memory is spent on it

13. Bucket-based Cache Replacement
- Main idea: perform cache replacement within each bucket independently
  - Eliminates priority-based data structures for caching decisions
- Combine recency and deduplication:
  - LBA-index: least-recently-used policy
  - FP-index: least-referenced policy, with reference counts weighted by the recency of the referencing LBAs (recent references count more than old ones)
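A minimal sketch of the per-bucket eviction decision, assuming a plain (unweighted) reference count; the recency weighting on the slide is omitted for brevity, and the function name is illustrative:

```python
# Per-bucket replacement sketch: when a bucket is full, evict the slot
# with the lowest reference count, looking only inside this bucket, so
# no global priority queue is needed. (Recency weighting omitted.)
def insert_with_eviction(bucket, new_entry, ref_count):
    """bucket: list of slot keys or None; ref_count: key -> count."""
    for s, key in enumerate(bucket):
        if key is None:            # free slot: no eviction needed
            bucket[s] = new_entry
            return None
    victim_slot = min(range(len(bucket)),
                      key=lambda s: ref_count[bucket[s]])
    victim = bucket[victim_slot]   # least-referenced entry in this bucket
    bucket[victim_slot] = new_entry
    return victim

bucket = ["a", "b", "c", "d"]
refs = {"a": 3, "b": 1, "c": 5, "d": 2}
evicted = insert_with_eviction(bucket, "e", refs)
print(evicted)   # "b" (fewest references)
```

Restricting the search to one bucket keeps each decision O(slots per bucket) with no heap or LRU list held in memory.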

14. Sketch-based Reference Counting
- Complete reference counting (one counter per FP-hash) has high memory overhead
- Count-Min Sketch [Cormode 2005]
  - Fixed memory usage with provable error bounds
  - Update: for each of the h rows i, increment the counter at (i, H_i(FP-hash)) in a w-column table
  - Query: count = minimum of those h counters
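A minimal Count-Min Sketch matching the update and query rules above; the table width and number of rows are illustrative, and the row hashes are derived by salting SHA-1 rather than using any particular family from the paper:

```python
import hashlib

# Minimal Count-Min Sketch: H rows of W counters; an item's estimate is
# the minimum of its H counters. W and H are illustrative choices.
W, H = 1024, 4

class CountMinSketch:
    def __init__(self):
        self.table = [[0] * W for _ in range(H)]

    def _cells(self, item: bytes):
        # Derive H row-specific hash functions by salting one hash.
        for i in range(H):
            d = hashlib.sha1(bytes([i]) + item).digest()
            yield i, int.from_bytes(d[:4], "big") % W

    def add(self, item: bytes) -> None:
        for i, c in self._cells(item):
            self.table[i][c] += 1

    def query(self, item: bytes) -> int:
        # Min over rows: collisions can only inflate a count, never deflate
        # it, hence the one-sided provable error bound.
        return min(self.table[i][c] for i, c in self._cells(item))

cms = CountMinSketch()
for _ in range(3):
    cms.add(b"fp-hash-1")
cms.add(b"fp-hash-2")
print(cms.query(b"fp-hash-1"))  # 3, barring (unlikely) full-row collisions
```

Memory is fixed at W x H counters regardless of how many distinct FP-hashes are counted, which is exactly the property the slide exploits.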

15. Evaluation
- AustereCache is implemented as a user-space block device (~4.5K lines of C++ code in Linux)
- Traces:
  - FIU traces: WebVM, Homes, Mail
  - Synthetic traces with varying I/O dedup ratio and write-to-read ratio
    - I/O dedup ratio: fraction of duplicate written chunks among all written chunks
- Schemes:
  - AustereCache: AC-D, AC-DC
  - CacheDedup: CD-LRU-D, CD-ARC-D, CD-ARC-DC

16. Memory Overhead
[Figure: memory usage (MiB, log scale) vs. cache capacity (12.5-100%) for (a) WebVM, (b) Homes, (c) Mail]
- Across all traces, AC-D uses 69.9-94.9% less memory than CD-LRU-D and 70.4-94.7% less than CD-ARC-D
- AC-DC uses 87.0-97.0% less memory than CD-ARC-DC

17. Read Hit Ratios
[Figure: read hit ratio (%) vs. cache capacity (12.5-100%) for (a) WebVM, (b) Homes, (c) Mail]
- AC-D achieves up to a 39.2% higher read hit ratio than CD-LRU-D, and a read hit ratio similar to CD-ARC-D
- AC-DC achieves up to a 30.7% higher read hit ratio than CD-ARC-DC

18. Write Reduction Ratios
[Figure: write reduction ratio (%) vs. cache capacity (12.5-100%) for (a) WebVM, (b) Homes, (c) Mail]
- AC-D is comparable to CD-LRU-D and CD-ARC-D
- AC-DC is slightly lower (by 7.7-14.5%) than CD-ARC-DC, due to padding in the compressed data management

19. Throughput
[Figure: (a) throughput (MiB/s) vs. I/O dedup ratio (write-to-read ratio 7:3); (b) throughput vs. write-to-read ratio (I/O dedup ratio 50%)]
- AC-DC has the highest throughput, due to its high write reduction ratio and high read hit ratio
- AC-D has slightly lower throughput than CD-ARC-D, since AC-D must access the metadata region during indexing

20. CPU Overhead and Multi-threading
- Latency of a 32-KiB chunk write (breakdown: fingerprinting, compression, lookup, update):
  - HDD: 5,997 µs; SSD: 85 µs; AustereCache: 31.2 µs (fingerprinting: 15.5 µs)
  - The CPU latency is hidden via multi-threading
- Multi-threading speedup (write-to-read ratio 7:3, 1-8 threads):
  - 50% I/O dedup ratio: 2.08x; 80% I/O dedup ratio: 2.51x
  - A higher I/O dedup ratio implies less I/O to flash, so multi-threading yields more computation savings

21. Conclusion
- AustereCache achieves memory efficiency in deduplicated and compressed flash caching via:
  - Bucketization
  - Fixed-size compressed data management
  - Bucket-based cache replacement
- Source code: http://adslab.cse.cuhk.edu.hk/software/austerecache

22. Thank You! Q & A
