  1. No Free Lunch in Data Privacy. CompSci 590.03 (Fall 2012), Lecture 15. Instructor: Ashwin Machanavajjhala.

  2. Outline
     • Background: Domain-independent privacy definitions
     • No Free Lunch in Data Privacy [Kifer-M SIGMOD '11]
     • Correlations: A case for domain-specific privacy definitions [Kifer-M SIGMOD '11]
     • Pufferfish Privacy Framework [Kifer-M PODS '12]
     • Defining Privacy for Correlated Data [Kifer-M PODS '12 & Ding-M '13] - Next class

  3. Data Privacy Problem. [Figure: individuals 1..N each contribute a record r_1, ..., r_N to a server holding the database D_B. Utility: answer queries over the data. Privacy: no breach about any individual.]

  4. Data Privacy in the real world
     Application | Data Collector | Third Party (adversary) | Private Information | Function (utility)
     Medical | Hospital | Epidemiologist | Disease | Correlation between disease and geography
     Genome analysis | Hospital | Statistician / Researcher | Genome | Correlation between genome and disease
     Advertising | Google/FB/Y! | Advertiser | Clicks/Browsing | Number of clicks on an ad by age/region/gender
     Social Recommendations | Facebook | Another user | Friend links / profile | Recommend other users or ads to users based on the social network

  5. Semantic Privacy: "... nothing about an individual should be learnable from the database that cannot be learned without access to the database." (T. Dalenius, 1977)

  6. Can we achieve semantic privacy?
     • ... or is there one ("precious...") privacy definition to rule them all?

  7. Defining Privacy
     • In order to allow utility, a non-negligible amount of information about an individual must be disclosed to the adversary.
     • Measuring the information disclosed to an adversary involves carefully modeling the background knowledge already available to the adversary.
     • ... but we do not know what information is available to the adversary.

  8. Many definitions & several attacks
     • Attacks: linkage attack, background knowledge attack, minimality/reconstruction attack, de Finetti attack, composition attack
     • Definitions: K-Anonymity [Sweeney et al. IJUFKS '02], L-diversity [Machanavajjhala et al. TKDD '07], T-closeness [Li et al. ICDE '07], E-Privacy [Machanavajjhala et al. VLDB '09], Differential Privacy [Dwork et al. ICALP '06]

  9. Composability [Dwork et al. TCC '06]
     Theorem (Composability): If algorithms A_1, A_2, ..., A_k use independent randomness and each A_i satisfies ε_i-differential privacy, then outputting all the answers together satisfies differential privacy with ε = ε_1 + ε_2 + ... + ε_k.
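     A minimal sketch of how this is used in practice (Python; the query answers, sensitivity, and helper name are illustrative assumptions, not from the slides): answer each query with an independent Laplace mechanism and charge the sum of the ε_i as the total privacy cost.

         import numpy as np

         def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
             # One epsilon-differentially private answer to a query with the given L1 sensitivity.
             return true_answer + rng.laplace(scale=sensitivity / epsilon)

         rng = np.random.default_rng(0)
         epsilons = [0.1, 0.2, 0.2]          # per-query budgets
         answers = [42, 17, 63]              # hypothetical exact query answers
         noisy = [laplace_mechanism(a, 1.0, e, rng) for a, e in zip(answers, epsilons)]
         total_epsilon = sum(epsilons)       # by composability, releasing all of `noisy` costs this much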

  10. Differential Privacy
     • Domain-independent privacy definition that is independent of the attacker.
     • Tolerates many attacks that other definitions are susceptible to.
       - Avoids composition attacks
       - Claimed to be tolerant against adversaries with arbitrary background knowledge
     • Allows simple, efficient and useful privacy mechanisms
       - Used in a live US Census product [M et al. ICDE '08]
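     The slides rely on the guarantee without restating it; for reference, the standard ε-differential privacy condition [Dwork et al. ICALP '06] requires, for every pair of databases D_1, D_2 differing in one tuple and every set S of outputs:

         \[ \Pr[A(D_1) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[A(D_2) \in S] \]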

  11. Outline
     • Background: Domain-independent privacy definitions
     • No Free Lunch in Data Privacy [Kifer-M SIGMOD '11]
     • Correlations: A case for domain-specific privacy definitions [Kifer-M SIGMOD '11]
     • Pufferfish Privacy Framework [Kifer-M PODS '12]
     • Defining Privacy for Correlated Data [Kifer-M PODS '12 & Ding-M '13] - Current research

  12. No Free Lunch Theorem [Kifer-Machanavajjhala SIGMOD '11]
     It is not possible to guarantee any utility in addition to privacy, without making assumptions about
     • the data generating distribution, and
     • the background knowledge available to an adversary [Dwork-Naor JPC '10].

  13. Discriminant: Sliver of Utility
     • Does an algorithm A provide any utility?
       w(k, A) > c if there are k inputs {D_1, ..., D_k} such that the outputs A(D_1), ..., A(D_k) can be told apart with probability > c.
     • Example: If A can distinguish between tables of size < 100 and size > 1,000,000,000, then w(2, A) = 1.
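     One way to write the slide's informal definition compactly (the notation is mine, following the intent of [Kifer-M SIGMOD '11]):

         \[ w(k, A) > c \iff \exists\, D_1, \dots, D_k \text{ and pairwise disjoint output sets } S_1, \dots, S_k \text{ such that } \Pr[A(D_i) \in S_i] > c \text{ for all } i. \]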

  14. Discriminant: Sliver of Utility
     Theorem: The discriminant of the Laplace mechanism is 1.
     Proof:
     • Let D_i = a database with n records, of which n·i/k are cancer patients.
     • Let S_i = the range [n·i/k - n/(3k), n·i/k + n/(3k)]. All S_i are disjoint.
     • Let M be the Laplace mechanism on the query "how many cancer patients are there?".
     • Pr(M(D_i) ∈ S_i) = Pr(|Noise| < n/(3k)) > 1 - e^(-nε/(3k)) = 1 - δ.
     • Hence, the discriminant w(k, M) > 1 - δ.
     • As n tends to infinity, the discriminant tends to 1.
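     A quick empirical check of this argument (Python; n, k, ε, and the trial count are illustrative assumptions): the noisy answer for each D_i lands in its disjoint interval S_i essentially always, so w(k, M) is close to 1.

         import numpy as np

         n, k, epsilon, trials = 3000, 5, 0.1, 10_000
         rng = np.random.default_rng(0)

         hits = 0
         for _ in range(trials):
             ok = True
             for i in range(1, k + 1):
                 true_count = n * i / k                          # cancer patients in D_i
                 noisy = true_count + rng.laplace(scale=1.0 / epsilon)
                 ok &= abs(noisy - true_count) < n / (3 * k)     # noisy answer stays inside S_i
             hits += ok
         print(hits / trials)   # close to 1, i.e. the discriminant is essentially 1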

  15. Discriminant: Sliver of Utility
     • Does an algorithm A provide any utility?
       w(k, A) > c if there are k inputs {D_1, ..., D_k} such that the outputs A(D_1), ..., A(D_k) can be told apart with probability > c.
     • If w(k, A) is close to 1, we may get some utility after using A.
     • If w(k, A) is close to 0, we cannot distinguish any k inputs, so there is no utility.

  16. Non-privacy
     • D is randomly drawn from P_data.
     • q is a sensitive query with k answers, s.t. the adversary knows P_data but cannot guess the value of q(D).
     • A is not private if the adversary can guess q(D) correctly based on P_data and the output of A.

  17. No Free Lunch Theorem
     • Let A be a privacy mechanism with w(k, A) > 1 - ε.
     • Let q be a sensitive query with k possible outcomes.
     • There exists a data generating distribution P_data such that
       - q(D) is uniformly distributed, but
       - the adversary wins with probability greater than 1 - ε.
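     A sketch of why such a P_data exists (my paraphrase of the argument in [Kifer-M SIGMOD '11]): since w(k, A) > 1 - ε, there are databases D_1, ..., D_k and disjoint output sets S_1, ..., S_k with Pr[A(D_i) ∈ S_i] > 1 - ε. Let P_data pick one of D_1, ..., D_k uniformly at random and define q(D_i) = i, so q(D) is uniform over k outcomes. The adversary who observes an output in S_j guesses q(D) = j, and is correct with probability greater than 1 - ε.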

  18. Outline
     • Background: Domain-independent privacy definitions
     • No Free Lunch in Data Privacy [Kifer-M SIGMOD '11]
     • Correlations: A case for domain-specific privacy definitions [Kifer-M SIGMOD '11]
     • Pufferfish Privacy Framework [Kifer-M PODS '12]
     • Defining Privacy for Correlated Data [Kifer-M PODS '12 & Ding-M '13] - Current research

  19. Correlations & Differential Privacy
     • When an adversary knows that individuals in a table are correlated, then (s)he can learn sensitive information about individuals even from the output of a differentially private mechanism.
     • Example 1: Contingency tables with pre-released exact counts
     • Example 2: Social networks

  20. Contingency tables. Each tuple takes k = 4 different values. [Figure: the database D summarized as a 2x2 table of counts Count(·, ·), with entries 2, 2, 2 and 8.]

  21. Contingency tables. Want to release the counts privately. [Figure: the same 2x2 table with every count Count(·, ·) hidden, shown as "?".]

  22. Laplace Mechanism. [Figure: each cell is released as its count plus Laplace noise: 2 + Lap(1/ε), 2 + Lap(1/ε), 2 + Lap(1/ε), 8 + Lap(1/ε).] The released count for the large cell has mean 8 and variance 2/ε². Guarantees differential privacy.
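     A minimal sketch of this release (Python; the cell counts and ε follow the running example, the variable names are mine): add independent Lap(1/ε) noise to each cell.

         import numpy as np

         epsilon = 1.0
         rng = np.random.default_rng(0)
         true_counts = np.array([2.0, 2.0, 2.0, 8.0])                       # the 2x2 table, flattened
         noisy_counts = true_counts + rng.laplace(scale=1.0 / epsilon, size=4)
         print(noisy_counts)   # each entry has mean = true count and variance 2/epsilon^2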

  23. Marginal counts. [Figure: the noisy cells 2 + Lap(1/ε), 2 + Lap(1/ε), 2 + Lap(1/ε), 8 + Lap(1/ε) are released alongside the exact row marginals 4, 10 and column marginals 4, 10.]
     Auxiliary marginals are published for the following reasons:
     1. Legal: 2002 Supreme Court case Utah v. Evans
     2. Contractual: advertisers must know exact demographics at coarse granularities
     Does the Laplace mechanism still guarantee privacy?

  24. Marginal counts. [Figure: combining each noisy cell with the exact marginals gives four independent estimates of the same target count:]
     Count(·, ·) = 8 + Lap(1/ε)
     Count(·, ·) = 8 - Lap(1/ε)
     Count(·, ·) = 8 - Lap(1/ε)
     Count(·, ·) = 8 + Lap(1/ε)

  25. Marginal counts. [Figure: averaging the four estimates of the target count.] The average has mean 8 and variance 2/(kε²); the adversary can reconstruct the table with high precision for large k.
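     A sketch of this reconstruction (Python; the table, marginals, and ε follow the running example, the variable names are mine): rewrite each noisy cell as an estimate of the target cell using the exact marginals, then average.

         import numpy as np

         epsilon = 1.0
         rng = np.random.default_rng(0)
         cells = np.array([2.0, 2.0, 2.0, 8.0])        # true 2x2 table; the target is cells[3] = 8
         row_sums = np.array([4.0, 10.0])
         col_sums = np.array([4.0, 10.0])
         total = cells.sum()

         noisy = cells + rng.laplace(scale=1.0 / epsilon, size=4)

         estimates = np.array([
             total - row_sums[0] - col_sums[0] + noisy[0],   # = 8 + noise
             col_sums[1] - noisy[1],                         # = 8 - noise
             row_sums[1] - noisy[2],                         # = 8 - noise
             noisy[3],                                       # = 8 + noise
         ])
         reconstructed = estimates.mean()   # variance 2/(4*epsilon^2) instead of 2/epsilon^2
         print(reconstructed)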

  26. Reason for Privacy Breach. [Figure: the space of all possible tables. Differential privacy guarantees that the adversary cannot distinguish pairs of tables that differ in one tuple, but many of those tables do not satisfy the adversary's background knowledge.]

  27. Reason for Privacy Breach. [Figure: among the tables that are consistent with the background knowledge, the adversary can distinguish between every pair of tables based on the output.]

  28. Correlations & Differential Privacy
     • When an adversary knows that individuals in a table are correlated, then (s)he can learn sensitive information about individuals even from the output of a differentially private mechanism.
     • Example 1: Contingency tables with pre-released exact counts
     • Example 2: Social networks

  29. A count query in a social network. [Figure: a social network with a blue and a green community, including Bob and Alice.]
     • Want to release the number of edges between the blue and green communities.
     • Should not disclose the presence/absence of the Bob-Alice edge.

  30. Adversary knows how social networks evolve
     • Let d_1 and d_2 be the number of edges between the blue and green communities when the Bob-Alice edge is absent and present, respectively.
     • Depending on the social network evolution model, (d_2 - d_1) is linear or even super-linear in the size of the network.

  31. Differential privacy fails to avoid the breach
     • In one world the mechanism outputs (d_1 + δ); in the other it outputs (d_2 + δ), where δ ~ Laplace(1/ε).
     • The adversary can distinguish between the two worlds if d_2 - d_1 is large.
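     A small simulation of this test (Python; d_1, d_2, ε, and the threshold rule are illustrative assumptions): guess the world by comparing the released count to the midpoint of d_1 and d_2; when d_2 - d_1 is much larger than 1/ε the guess is almost always correct.

         import numpy as np

         epsilon, d1, d2, trials = 0.1, 100, 400, 10_000   # hypothetical cross-community edge counts
         rng = np.random.default_rng(0)

         world1 = d1 + rng.laplace(scale=1.0 / epsilon, size=trials)   # released counts in the first world
         world2 = d2 + rng.laplace(scale=1.0 / epsilon, size=trials)   # released counts in the second world
         threshold = (d1 + d2) / 2

         accuracy = 0.5 * ((world1 < threshold).mean() + (world2 >= threshold).mean())
         print(accuracy)   # close to 1: the two worlds are easy to tell apart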

  32. Outline
     • Background: Domain-independent privacy definitions
     • No Free Lunch in Data Privacy [Kifer-M SIGMOD '11]
     • Correlations: A case for domain-specific privacy definitions [Kifer-M SIGMOD '11]
     • Pufferfish Privacy Framework [Kifer-M PODS '12]
     • Defining Privacy for Correlated Data [Kifer-M PODS '12 & Ding-M '13] - Current research
