with tunable encrypted deduplication
play

with Tunable Encrypted Deduplication Jingwei Li * , Zuoru Yang # , - PowerPoint PPT Presentation

Balancing Storage Efficiency and Data Confidentiality with Tunable Encrypted Deduplication Jingwei Li * , Zuoru Yang # , Yanjing Ren * , Patrick P. C. Lee # , Xiaosong Zhang * * University of Electronic Science and Technology of China (UESTC) # The


  1. Balancing Storage Efficiency and Data Confidentiality with Tunable Encrypted Deduplication Jingwei Li * , Zuoru Yang # , Yanjing Ren * , Patrick P. C. Lee # , Xiaosong Zhang * * University of Electronic Science and Technology of China (UESTC) # The Chinese University of Hong Kong (CUHK) EuroSys 2020 1

  2. Deduplication  Deduplication  coarse-grained compression • Units: chunks (fixed- or variable-size)  Stores only one copy of duplicate chunks Storage space saved by 5/12 = 42%! 2

  3. Encrypted Deduplication  Augments deduplication with encryption for data confidentiality  Application: outsourced storage Storage Provider 🔓 Deduplication Encryption Which crypto primitive should be used? 🔓 Encryption 3

  4. Encryption Primitives  Symmetric-key encryption (SKE) • Derives a random key for chunk encryption/decryption • Ensures confidentiality, but prohibits deduplication of duplicate chunks  Message-locked encryption (MLE) [Bellare et al., Eurocrypt’13] • Derives a deterministic key from chunk content • Supports deduplication, but leaks frequency distribution of plaintext chunks [Li et al., DSN’17] Pose a dilemma of choosing the right cryptographic primitive 4

  5. Our Contributions  TED : a tunable encrypted deduplication primitive for balancing trade-off between storage efficiency and data confidentiality • Includes three new designs • Minimizes frequency leakage via a configurable storage blowup factor  TEDStore : encrypted deduplication prototype based on TED • TED incurs only limited performance overhead  Extensive trace-driven analysis and prototype experiments 5

  6. Main Idea  Key derivation with three inputs: chunk M , current frequency f , and balance parameter t f t 🔒 K Hash M K = H (M || ⌊ f/t ⌋ ) Function • f: cumulative and increases with number of duplicates of M • t: controls maximum allowed number of duplicate copies for a ciphertext chunk  Special cases: • t = 1  SKE • t → ∞  MLE 6

  7. Design Overview Key Manager Clients Provider Chunk … Chunk Deduplication  TED builds on server-aided MLE architecture in DupLESS [Bellare et al., Security’13] • Key generation by key manager to prevent offline brute-force attacks 7

  8. Questions  Q1: How does the key manager learn chunk frequencies? • Low overhead required even for many chunks  Q2: How does the key manager generate keys for chunks? • Distinct sequences of ciphertext chunks required for identical files  Q3: How should the balance parameter t be configured in practice? • Adaptive for different workloads 8

  9. Sketch-based Frequency Counting Count-Min Sketch +1 H 1 (M) +1 f = minimum counter M r rows indexed by (i, H i (M)) +1 +1 H r (M) w counters per row  Key manager estimates f via Count-Min Sketch [Cormode 2005] • Fixed memory usage with provable error bounds  Client sends short hashes { H i (M)} to key manager • Key manager cannot readily infer M from short hashes 9

  10. Probabilistic Key Generation  Selects K uniformly from candidate keys derived from 0, 1,…, ⌊ f/t ⌋ • Enables probabilistic encryption on identical files • Maintains deduplication effectiveness • Reason : f is cumulative; keys derived from 0, 1,…, ⌊ f/t ⌋ -1 have been used to encrypt some old copies of M Already encrypted chunks 🔒 K ← {K 0 , K 1 , K 2 , K 3 } 🔒 K 3 🔒 K 2 🔒 K 2 🔒 K 1 🔒 K 1 🔒 K 0 🔒 K 0 ……… … … … … M … … … 🔓 🔓 🔓 🔓 🔓 🔓 🔓 M M M M M M M Processing sequence 10

  11. Automated Parameter Configuration  Configure t by solving optimization problem , given: • Frequency distribution for a batch of plaintext chunks • Affordable storage blowup b over exact deduplication  Goal: minimize frequency leakage • Quantify frequency leakage by Kullback-Leibler distance (KLD) • KLD: relative entropy to uniform distribution • A lower KLD implies higher robustness against frequency analysis • Configure t from the returned optimal frequency distribution of ciphertext chunks 11

  12. Evaluation  TEDStore realizes TED in encrypted deduplication storage • ~4.5K line of C++ code in Linux  Trace analysis • FSL: file system snapshots (42 backups; 3.08TB raw data) • MS: windows file system snapshots (30 backups; 3.91TB raw data)  Prototype experiments • Local 10 GbE cluster 12

  13. Trade-off Analysis (FSL Dataset)  Schemes • MLE • SKE • MinHash [Li et al., DSN’17] • Basic TED (varying t) • Full TED (varying b)  Basic TED and Full TED effectively balance trade-off  Full TED readily configures actual storage blowup 13

  14. Prototype Experiments Fast Secure Steps (MD5, AES-128) (SHA-256, AES-256) Chunking 0.8ms Computational time Fingerprinting 1.7ms 2.6ms per 1MB of uploads Hashing 0.4ms Key Seeding 0.01ms 0.04ms TED operations Key Derivation 0.07ms 0.1ms Encryption 3.7ms 4.9ms  TED incurs limited overhead (7.2% for Fast; 6.1% for Secure)  More results in paper: • TED achieves ~ 30X key generation speedup over existing approaches • Multi-client upload/download performance 14

  15. Conclusion  TED: encrypted deduplication primitive that enables controllable trade-off between storage efficiency and data confidentiality • Sketch-based frequency counting • Probabilistic key generation • Automated parameter configuration  Source code: http://adslab.cse.cuhk.edu.hk/software/ted 15

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend