
Learning to Reconstruct: Statistical Learning Theory and Encrypted Database Attacks



  1. Learning to Reconstruct: Statistical Learning Theory and Encrypted Database Attacks. Paul Grubbs, Marie-Sarah Lacharité, Brice Minaud, Kenny Paterson. pag225@cornell.edu, @pag_crypto

  2. Outsourced Databases

  3. Encrypting Outsourced Databases? Encryption prevents querying!

  4. Encrypted Databases. Efficient ones leak access patterns: the set of matching records for each query. What can an attacker learn from access pattern leakage?

  5. Database Reconstruction (DR). With enough queries, an attacker can learn the data from access patterns! [KKNO], [LMP], [KPT]. Prior work needs huge numbers of queries ([KKNO]: $10^{26}$ for salaries), strong assumptions ([LMP]: dense database), or specific query types ([KPT]: kNN queries only).

  6. Our Contributions
     • Enabling insight: access pattern leakage is a binary classification problem. Use statistical learning theory (SLT) to build and analyze attacks.
     • New DR attacks on range queries. Generalize and improve [KKNO], [LMP] with SLT + PQ trees. On real data: with only 50 queries, predict salaries to 2% error.
     • Generic reduction from DR with known queries to PAC learning.
     • Give “minimal” attack for all query types via ε-nets. Instantiate with first DR attack for prefix queries.
     • First general lower bound on #queries needed for DR.
     Full version: https://eprint.iacr.org/2019/011


  8. Notation and Terminology
     N: number of possible values, wlog [1, …, N]. E.g., N = 125 for age data.
     Range query: a pair [a, b] where 1 ≤ a ≤ b ≤ N.
     Database: composed of records, each with a value in [1, …, N].
     Access pattern: which records match a query. [Figure: example range query [7, 10] over a small database]
     Full database reconstruction (DR): recovering exact record values.
     Approximate DR: recovering all record values within εN. ε = 0.05 is recovery within 5%; ε = 1/N is full DR.
     Scale-free: query complexity independent of #records or N.
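The notation above can be made concrete with a minimal sketch. The toy database, the record ids, and the helper names `access_pattern` and `is_approx_reconstruction` are all hypothetical and chosen only for illustration; they are not taken from the slides or the paper.

```python
from typing import Dict, Set, Tuple


def access_pattern(db: Dict[int, int], query: Tuple[int, int]) -> Set[int]:
    """Return the ids of records whose value lies in the range query [a, b]."""
    a, b = query
    return {rid for rid, value in db.items() if a <= value <= b}


def is_approx_reconstruction(db: Dict[int, int], guess: Dict[int, int],
                             n_values: int, eps: float) -> bool:
    """True if every guessed value is within eps * N of the true value."""
    return all(abs(db[rid] - guess[rid]) <= eps * n_values for rid in db)


# Hypothetical toy database: record id -> value in [1, N], with N = 14.
N = 14
db = {1: 2, 2: 12, 3: 8, 4: 9}

# The server sees only which record ids match, not the values themselves.
print(sorted(access_pattern(db, (7, 10))))
```

The attacker's view is the sequence of such id sets; the reconstruction question is how many of them are needed before a guess passes a check like `is_approx_reconstruction`.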

  9. DR For Range Queries: Our Work
     Prior work: [KKNO16]: full DR in $O(N^4 \log N)$ queries. [LMP18]: full DR for dense DB in $O(N \log N)$.
     Three attacks (query complexity; lower bound; full DR):
     ‣ GeneralizedKKNO: $O(\epsilon^{-4} \log \epsilon^{-1})$ for approx. DR; $\Omega(\epsilon^{-4})$; $O(N^4 \log N)$. Generalizes [KKNO16].
     ‣ ApproxValue: $O(\epsilon^{-2} \log \epsilon^{-1})$ for approx. DR *; $\Omega(\epsilon^{-2})$; $O(N^2 \log N)$.
     ‣ ApproxOrder: $O(\epsilon^{-1} \log \epsilon^{-1})$ for approx. order rec. *; $\Omega(\epsilon^{-1} \log \epsilon^{-1})$; $O(N \log N)$. Implies: with DB distribution info, get approx. DR.
     * Requires a mild hypothesis about the data.
     Bypass [LMP] lower bound via relaxing to “sacrificial” recon.
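The full-DR figures follow from the approximate bounds by setting ε = 1/N, which the notation slide identifies with full DR. A short worked substitution, offered as a reading aid rather than as part of the original slides:

```latex
\begin{align*}
\text{GeneralizedKKNO:}\quad & O\!\left(\epsilon^{-4}\log\epsilon^{-1}\right)\Big|_{\epsilon = 1/N} = O\!\left(N^{4}\log N\right) \\
\text{ApproxValue:}\quad & O\!\left(\epsilon^{-2}\log\epsilon^{-1}\right)\Big|_{\epsilon = 1/N} = O\!\left(N^{2}\log N\right) \\
\text{ApproxOrder:}\quad & O\!\left(\epsilon^{-1}\log\epsilon^{-1}\right)\Big|_{\epsilon = 1/N} = O\!\left(N\log N\right)
\end{align*}
```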

  10. GeneralizedKKNO. Assume a uniform distribution on range queries and a static database. This induces a function f giving the probability that each value is accessed. [Figure: f plotted over values 1 to N; axis from less probable to more probable]

  11. GeneralizedKKNO. Idea: for each record, (1) count #accesses to estimate f(value); (2) find the value by “inverting” f on the estimate. Inverting f gives two values! More work is needed to break the symmetry; see paper for details. How many queries are needed to get an estimate sufficient for ε-approx. DR? [Figure: f over values 1 to N, with an estimate mapping back to two values]
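The two steps can be sketched in a few lines. This is a simplified illustration under the slide's assumptions (uniform range queries, static database), not the paper's full attack: it stops at the symmetric pair of candidates and does not break the symmetry. The closed form used for f follows from counting: a value v lies in v(N + 1 − v) of the N(N + 1)/2 ranges. All function names are hypothetical.

```python
import random
from typing import List


def access_probability(v: int, n: int) -> float:
    """Pr[a <= v <= b] for a range [a, b] drawn uniformly from all N(N+1)/2 ranges."""
    return 2 * v * (n + 1 - v) / (n * (n + 1))


def invert(estimate: float, n: int) -> List[int]:
    """Values whose access probability is closest to the estimate.

    f is symmetric around (N + 1) / 2, so this is usually a pair {v, N + 1 - v}.
    """
    best = min(range(1, n + 1),
               key=lambda v: abs(access_probability(v, n) - estimate))
    return sorted({best, n + 1 - best})


def simulate(n: int, value: int, num_queries: int, seed: int = 0) -> float:
    """Fraction of uniformly random range queries that access a record with this value."""
    rng = random.Random(seed)
    ranges = [(a, b) for a in range(1, n + 1) for b in range(a, n + 1)]
    hits = 0
    for _ in range(num_queries):
        a, b = rng.choice(ranges)
        hits += a <= value <= b
    return hits / num_queries


# Step 1: estimate f(value) from access counts. Step 2: invert the estimate.
estimate = simulate(n=100, value=30, num_queries=2000)
print(invert(estimate, 100))
```

With exact probabilities, `invert` returns the true value and its mirror image; with sampled estimates, the candidates drift by an amount that shrinks as the number of queries grows.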

  12. Estimating a Probability. Set X with probability distribution D. Let C ⊆ X be a set. Sample complexity: to measure Pr(C) within ε, you need $O(1/\epsilon^2)$ samples.
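One standard way to see the $1/\epsilon^2$ rate is Hoeffding's inequality. The symbols $m$ (number of i.i.d. samples), $\hat{p}_m$ (fraction of samples that land in C), and $\delta$ (allowed failure probability) are introduced here for illustration and do not appear on the slide:

```latex
\[
\Pr\left[\,\left|\hat{p}_m - \Pr(C)\right| > \epsilon\,\right]
\le 2\exp\left(-2 m \epsilon^{2}\right)
\quad\Longrightarrow\quad
m \ge \frac{\ln(2/\delta)}{2\epsilon^{2}}
\ \text{ samples suffice for failure probability at most } \delta .
\]
```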

  13. Estimating a Set of Probabilities. Now: a set of sets 𝓓. Goal: estimate all sets’ probabilities simultaneously. The set S of samples drawn from X is an ε-sample iff for all C in 𝓓: $\left| \Pr_D(C) - |S \cap C| / |S| \right| \le \epsilon$.

  14. The ε-sample Theorem. How many points do we need to draw to get an ε-sample w.h.p.? V & C 1971: if 𝓓 has VC dimension d, then the number of points needed to get an ε-sample w.h.p. is $O\!\left(\frac{d}{\epsilon^2} \log \frac{d}{\epsilon}\right)$. Does not depend on |𝓓|!
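For range queries, the ε-sample condition can be checked directly on small instances. The sketch below assumes queries are uniform over all ranges and takes the sets to be "all range queries containing x" for each value x; it measures the largest gap between the true and empirical probability over all x. Function names are hypothetical.

```python
from typing import List, Tuple

Query = Tuple[int, int]


def all_range_queries(n: int) -> List[Query]:
    """Every range [a, b] with 1 <= a <= b <= n."""
    return [(a, b) for a in range(1, n + 1) for b in range(a, n + 1)]


def max_deviation(sample: List[Query], n: int) -> float:
    """Largest gap, over all values x, between the true and empirical
    probability that a uniformly random range query contains x."""
    total = n * (n + 1) / 2
    worst = 0.0
    for x in range(1, n + 1):
        true_p = x * (n + 1 - x) / total
        emp_p = sum(a <= x <= b for a, b in sample) / len(sample)
        worst = max(worst, abs(true_p - emp_p))
    return worst


def is_eps_sample(sample: List[Query], n: int, eps: float) -> bool:
    """True if the sample estimates every set's probability within eps."""
    return max_deviation(sample, n) <= eps


# Sampling every range exactly once reproduces the true probabilities.
print(max_deviation(all_range_queries(10), 10))
```

The point of the theorem is that the sample size needed to pass this check w.h.p. is governed by the VC dimension, not by the number of values x being tested.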

  15. GeneralizedKKNO. Idea: for each record, (1) count #accesses to estimate f(value); (2) find the value by “inverting” f on the estimate. This is an ε-sample! X = range queries, 𝓓 = {{range queries ∋ x} : x ∈ [1, N]}, VC dim. = 2. We need $O(\epsilon^{-4} \log \epsilon^{-1})$ queries (inverting f adds a square). Can we get rid of the squaring?

  16. GeneralizedKKNO. Idea: for each record, (1) count #accesses to estimate f(value); (2) find the value by “inverting” f on the estimate. Assume there exists at least one record in [N/8, 3N/8]. This is an ε-sample! X = range queries, 𝓓 = {{range queries ∋ x} : x ∈ [1, N]}, VC dim. = 2. We need $O(\epsilon^{-4} \log \epsilon^{-1})$ queries (inverting f adds a square).

  17. ApproxValue. Idea: for each record, (1) count #accesses to estimate f(value); (2) find the value by “inverting” f on the estimate. Assume there exists at least one record in [N/8, 3N/8]. This is an ε-sample! X = range queries, 𝓓 = {{range queries ∋ x} : x ∈ [1, N]}, VC dim. = 2. We need $O(\epsilon^{-2} \log \epsilon^{-1})$ queries! More complex attack; see paper.
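The claim that this class has VC dimension 2 can be verified by brute force on small N. In this sketch the ground set is the set of range queries and each value x labels a query by whether the query contains x; a set of queries is shattered if every labeling is realized by some x. Three queries can never be shattered, because their at most six endpoints cut the line into at most seven regions while eight labelings would be needed. Function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple

Query = Tuple[int, int]


def all_range_queries(n: int) -> List[Query]:
    """Every range [a, b] with 1 <= a <= b <= n."""
    return [(a, b) for a in range(1, n + 1) for b in range(a, n + 1)]


def labelings(queries: Sequence[Query], n: int) -> Set[Tuple[bool, ...]]:
    """Label vectors that the values x in [1, n] induce on the given queries."""
    return {tuple(a <= x <= b for a, b in queries) for x in range(1, n + 1)}


def is_shattered(queries: Sequence[Query], n: int) -> bool:
    """True if every one of the 2^k labelings of the k queries is realized."""
    return len(labelings(queries, n)) == 2 ** len(queries)


def vc_dimension(n: int, max_size: int = 3) -> int:
    """Largest size (up to max_size) of a shattered set of range queries."""
    queries = all_range_queries(n)
    best = 0
    for size in range(1, max_size + 1):
        if any(is_shattered(subset, n) for subset in combinations(queries, size)):
            best = size
        else:
            break  # if no set of this size is shattered, no larger set is
    return best


print(vc_dimension(8))
```

For example, with N = 4 the queries [1, 2] and [2, 3] are shattered by the values 1, 2, 3, 4, which realize "first only", "both", "second only", and "neither".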

  18. DR For Range Queries: Our Work
     Three attacks (query complexity; lower bound; full DR):
     ‣ GeneralizedKKNO: $O(\epsilon^{-4} \log \epsilon^{-1})$ for approx. DR; $\Omega(\epsilon^{-4})$; $O(N^4 \log N)$.
     ‣ ApproxValue: $O(\epsilon^{-2} \log \epsilon^{-1})$ for approx. DR *; $\Omega(\epsilon^{-2})$; $O(N^2 \log N)$.
     ‣ ApproxOrder: $O(\epsilon^{-1} \log \epsilon^{-1})$ for approx. order rec. *; $\Omega(\epsilon^{-1} \log \epsilon^{-1})$; $O(N \log N)$. With DB distribution info, get approx. DR.
     These require i.i.d. uniform queries, and the adversary must know the query distribution. What can we do without making these assumptions?

  19. DR For Range Queries: Our Work
     Three attacks (query complexity; lower bound; full DR):
     ‣ GeneralizedKKNO: $O(\epsilon^{-4} \log \epsilon^{-1})$ for approx. DR; $\Omega(\epsilon^{-4})$; $O(N^4 \log N)$.
     ‣ ApproxValue: $O(\epsilon^{-2} \log \epsilon^{-1})$ for approx. DR *; $\Omega(\epsilon^{-2})$; $O(N^2 \log N)$.
     ‣ ApproxOrder: $O(\epsilon^{-1} \log \epsilon^{-1})$ for approx. order rec. *; $\Omega(\epsilon^{-1} \log \epsilon^{-1})$; $O(N \log N)$. With DB distribution info, get approx. DR.
     Reveal order with no assumptions on the query distribution. See paper for details.

  20. Conclusion
     • Enabling insight: access pattern leakage is a binary classification problem. Use statistical learning theory (SLT) to build and analyze attacks.
     • New DR attacks on range queries. Generalize and improve [KKNO], [LMP] with SLT + PQ trees. On real data: with only 50 queries, predict salaries to 2% error.
     • Generic reduction from DR with known queries to PAC learning.
     • Give “minimal” attack for all query types via ε-nets. Instantiate with first DR attack for prefix queries.
     • First general lower bound on #queries needed for DR.
     Full version: https://eprint.iacr.org/2019/011
     Thanks for listening! Any questions?

  21. Attack Simulation. ApproxValue experimental results, R = 1000, compared to the theoretical ε-sample bound. [Figure: symmetric value/error (as a fraction of N) against number of queries (0 to 500); series: max. sacrificed symmetric value and max. symmetric error, each for N = 100, 1000, 10000, 100000; reference curve $\epsilon^{-2} \log \epsilon^{-1}$] Effective constants are ~1!

  22. DR As Learning a Binary Classifier. This formulation is not specific to range queries! Record values are binary classifiers: X = range queries, 𝓓 = {{range queries ∋ x} : x ∈ [1, N]}. Approximately learning the classifier => approximate DR.
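A minimal sketch of this view for the known-query setting: each record's value is a classifier that labels a query "matched" or "not matched", and the attacker keeps any value consistent with every label seen so far. This is the generic consistent-hypothesis approach from PAC learning, shown here for range queries; it is an illustration, not the paper's algorithm, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Observation = Tuple[Tuple[int, int], bool]  # ((a, b), did the record match?)


def consistent_values(n: int, observations: List[Observation]) -> List[int]:
    """Values x in [1, n] that agree with every observed (query, matched) pair."""
    return [
        x for x in range(1, n + 1)
        if all((a <= x <= b) == matched for (a, b), matched in observations)
    ]


def guess_value(n: int, observations: List[Observation]) -> Optional[int]:
    """Pick one consistent value (here: the middle candidate), or None if none exists."""
    candidates = consistent_values(n, observations)
    return candidates[len(candidates) // 2] if candidates else None


# Hypothetical record with true value 13 in [1, 20], observed under three known queries.
observations = [((10, 15), True), ((1, 12), False), ((14, 20), False)]
print(consistent_values(20, observations))
```

Swapping the membership test `a <= x <= b` for another query type's matching rule gives the same learner for that query type, which is the sense in which the formulation is not specific to range queries.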
