  1. Simulatability: “The enemy knows the system” (Claude Shannon). CompSci 590.03, Lecture 6. Instructor: Ashwin Machanavajjhala.

  2. Announcements
  • Please meet with me at least 2 times before you finalize your project (deadline Sep 28).

  3. Recap – L-Diversity
  • The link between identity and attribute value is the sensitive information: “Does Bob have Cancer? Heart disease? Flu?” “Does Umeko have Cancer? Heart disease? Flu?”
  • The adversary knows ≤ L−2 negation statements, e.g. “Umeko does not have Heart Disease.”
    – The data publisher may not know the exact adversarial knowledge.
  • Privacy is breached when identity can be linked to an attribute value with high probability:
    Pr[“Bob has Cancer” | published table, adv. knowledge] > t

  4. Recap – 3-Diverse Table
  • L-Diversity Principle: Every group of tuples with the same Q-ID values has ≥ L distinct sensitive values of roughly equal proportions.

    Zip     Age    Nat.   Disease
    1306*   <=40   *      Heart
    1306*   <=40   *      Flu
    1306*   <=40   *      Cancer
    1306*   <=40   *      Cancer
    1485*   >40    *      Cancer
    1485*   >40    *      Heart
    1485*   >40    *      Flu
    1485*   >40    *      Flu
    1305*   <=40   *      Heart
    1305*   <=40   *      Flu
    1305*   <=40   *      Cancer
    1305*   <=40   *      Cancer
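To make the principle concrete, here is a minimal sketch (not part of the original slides) that checks a simplified l-diversity condition on the table above; the max_fraction cutoff is an assumed stand-in for “roughly equal proportions”, which the formal definitions capture via entropy or recursive (c, l)-diversity.

```python
from collections import Counter, defaultdict

def is_l_diverse(rows, l, max_fraction=0.5):
    """Simplified l-diversity check on an anonymized table.

    rows: list of (qid_tuple, sensitive_value).
    Every Q-ID group must contain at least l distinct sensitive values,
    and no single value may dominate a group (max_fraction is a stand-in
    for the formal "roughly equal proportions" requirement).
    """
    groups = defaultdict(list)
    for qid, sensitive in rows:
        groups[qid].append(sensitive)

    for values in groups.values():
        counts = Counter(values)
        if len(counts) < l:
            return False                              # fewer than l distinct values
        if max(counts.values()) / len(values) > max_fraction:
            return False                              # one value dominates the group
    return True

# The 3-diverse table from the slide: three Q-ID groups, each containing
# Heart, Flu and Cancer.
table = [
    (("1306*", "<=40"), "Heart"),  (("1306*", "<=40"), "Flu"),
    (("1306*", "<=40"), "Cancer"), (("1306*", "<=40"), "Cancer"),
    (("1485*", ">40"),  "Cancer"), (("1485*", ">40"),  "Heart"),
    (("1485*", ">40"),  "Flu"),    (("1485*", ">40"),  "Flu"),
    (("1305*", "<=40"), "Heart"),  (("1305*", "<=40"), "Flu"),
    (("1305*", "<=40"), "Cancer"), (("1305*", "<=40"), "Cancer"),
]
print(is_l_diverse(table, l=3))   # True
```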

  5. Outline
  • Simulatable Auditing
  • Minimality Attack in anonymization
  • Simulatable algorithms for anonymization

  6. Query Auditing
  [Figure: a researcher sends a query to the database; the database asks “Safe to publish?” and either answers (Yes) or denies (No).]
  • The database holds numeric values (say, salaries of employees).
  • The database either truthfully answers a query or denies answering.
  • MIN, MAX, SUM queries over subsets of the database.
  • Question: when to allow/deny queries?

  7. Why should we deny queries?
  • Q1: Ben’s sensitive value?  – DENY
  • Q2: Max sensitive value of males?  – ANSWER: 2
  • Q3: Max sensitive value of 1st year PhD students?  – ANSWER: 3
  • But Q3 + Q2 => Xi = 3

    Name   1st year PhD   Gender   Sensitive value
    Ben    Y              M        1
    Bha    N              M        1
    Ios    Y              M        1
    Jan    N              M        2
    Jian   Y              M        2
    Jie    N              M        1
    Joe    N              M        2
    Moh    N              M        1
    Son    N              F        1
    Xi     Y              F        3
    Yao    N              M        2
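This combination attack can be replayed mechanically. The sketch below is illustrative only (the table is from the slide; the code itself is not from the lecture): the database answers Q2 and Q3, and the attacker, who knows only the non-sensitive attributes, combines the two answers to isolate Xi.

```python
# Database contents from the slide: name -> (1st-year PhD, gender, sensitive value).
table = {
    "Ben": (True, "M", 1),  "Bha": (False, "M", 1), "Ios": (True, "M", 1),
    "Jan": (False, "M", 2), "Jian": (True, "M", 2), "Jie": (False, "M", 1),
    "Joe": (False, "M", 2), "Moh": (False, "M", 1), "Son": (False, "F", 1),
    "Xi":  (True, "F", 3),  "Yao": (False, "M", 2),
}

# The database answers the two "safe-looking" queries.
q2 = max(v for first, g, v in table.values() if g == "M")   # ANSWER: 2
q3 = max(v for first, g, v in table.values() if first)      # ANSWER: 3

# Attacker's reasoning, using only the non-sensitive attributes: since the
# maximum over 1st-year students (3) exceeds the maximum over males (2),
# the value 3 must belong to a non-male 1st-year student. Only Xi qualifies.
suspects = [name for name, (first, g, _) in table.items() if first and g != "M"]
print(q2, q3, suspects)   # 2 3 ['Xi']  => Xi's sensitive value is 3
```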

  8. Value-Based Auditing
  • Let a_1, a_2, …, a_k be the answers to previous queries Q_1, Q_2, …, Q_k, and let a_{k+1} be the answer to Q_{k+1}.
  • a_i = f(c_i1·x_1, c_i2·x_2, …, c_in·x_n), for i = 1 … k+1, where c_im = 1 if Q_i depends on x_m (and 0 otherwise).
  • Check whether, given these answers, any x_j has a unique solution; if so, deny Q_{k+1}.
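Below is a minimal sketch of a value-based auditor for MAX queries, assuming a simplified one-pass bound check (the general auditing problem needs a fuller consistency analysis): every answer upper-bounds the variables it covers, and a variable is exposed when it is the only one able to attain some query's maximum.

```python
import math

def value_based_max_audit(queries, answers, n):
    """Return the set of variables uniquely determined by MAX queries/answers.

    queries: list of sets of 0-based variable indices
    answers: the true MAX answer of each query
    (A simplified one-pass check, an assumption of this sketch.)
    """
    # Each answer a_i is an upper bound for every variable in Q_i.
    ub = [math.inf] * n
    for q, a in zip(queries, answers):
        for m in q:
            ub[m] = min(ub[m], a)

    pinned = set()
    for q, a in zip(queries, answers):
        # Only variables whose upper bound equals a_i can attain the maximum.
        candidates = [m for m in q if ub[m] == a]
        if len(candidates) == 1:          # exactly one candidate => its value is a_i
            pinned.add(candidates[0])
    return pinned

# Running example: answering max(x1..x5) = 10 and then max(x1..x4) = 8
# pins down x5 = 10 (index 4), so the second query must be denied.
print(value_based_max_audit([{0, 1, 2, 3, 4}, {0, 1, 2, 3}], [10, 8], 5))  # {4}
```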

  9. Value-based Auditing
  • Data values: {x_1, x_2, x_3, x_4, x_5}; queries: MAX.
  • Allow a query only if the value of no x_i can be inferred.
  [Figure: the five hidden values x_1 … x_5.]

  10. Value-based Auditing
  • First query: max(x_1, x_2, x_3, x_4, x_5). Answer: 10.
  • Inference so far: −∞ ≤ x_1 … x_5 ≤ 10.

  11. Value-based Auditing
  • Second query: max(x_1, x_2, x_3, x_4). Its true answer is 8, which would imply −∞ ≤ x_1 … x_4 ≤ 8 and hence x_5 = 10.
  • The auditor therefore DENIES the second query.

  12. Value-based Auditing
  • But a denial means some value can be compromised!

  13. Value-based Auditing
  • The attacker asks: what could max(x_1, x_2, x_3, x_4) be?

  14. Value-based Auditing
  • From the first answer, max(x_1, x_2, x_3, x_4) ≤ 10.

  15. Value-based Auditing
  • If max(x_1, x_2, x_3, x_4) = 10, then there is no privacy breach (so the query would have been answered, not denied).

  16. Value-based Auditing
  • Hence the denial reveals that max(x_1, x_2, x_3, x_4) < 10, which implies x_5 = 10!

  17. Value-based Auditing
  • Hence max(x_1, x_2, x_3, x_4) < 10, so x_5 = 10: denials leak information.
  • The attack occurred because the privacy analysis did not assume that the attacker knows the auditing algorithm.
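The leak through the denial itself can be checked by brute force. The sketch below is illustrative: it assumes a small integer domain for the hidden values and encodes the (non-simulatable) denial rule from the slides above, then enumerates every world consistent with observing “answer 10, then DENY”.

```python
from itertools import product

DOMAIN = range(0, 11)   # toy value domain, an assumption of this sketch

def denied(world):
    # The value-based auditor denies the 2nd query exactly when answering it
    # would pin x5, i.e. when max(x1..x4) < max(x1..x5) (see the slides above).
    return max(world[:4]) < max(world)

# Worlds consistent with the attacker's observations:
# max(x1..x5) was answered as 10, and max(x1..x4) was then DENIED.
consistent = [w for w in product(DOMAIN, repeat=5)
              if max(w) == 10 and denied(w)]

print({w[4] for w in consistent})   # {10}: in every consistent world, x5 = 10
```

In every consistent world x5 equals 10, so the denial is exactly as informative as answering would have been.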

  18. Simulatable Auditing [Kenthapadi et al., PODS ’05]
  • An auditor is simulatable if the decision to deny a query Q_k is made based only on information already available to the attacker:
    – it can use the queries Q_1, Q_2, …, Q_k and the answers a_1, a_2, …, a_{k−1};
    – it cannot use a_k or the actual data to make the decision.
  • Denials provably do not leak information, because the attacker could equivalently determine whether the query would be denied: the attacker can mimic (simulate) the auditor.

  19. Simulatable Auditing Algorithm
  • Data values: {x_1, x_2, x_3, x_4, x_5}; queries: MAX. The first query, max(x_1, …, x_5), was answered with 10.
  • Before computing the true answer to max(x_1, x_2, x_3, x_4), the auditor considers every answer consistent with past responses:
    – Ans > 10: not possible.
    – Ans = 10: only reveals −∞ ≤ x_1 … x_4 ≤ 10 (SAFE).
    – Ans < 10: reveals x_5 = 10 (UNSAFE).
  • Since a consistent answer would be unsafe, the query is DENIED before its answer is computed.
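Here is a sketch of a simulatable MAX auditor in the same spirit, assuming the same simplified pinned-value check as earlier and a coarse discretization of the hypothetical answers (Kenthapadi et al.'s algorithm is more general): the decision uses only past queries and answers, never the new answer or the data, so the attacker can reproduce it exactly.

```python
import math

def pinned_vars(queries, answers, n):
    """Variables uniquely determined by MAX queries/answers
    (same simplified one-pass check as in the earlier sketch)."""
    ub = [math.inf] * n
    for q, a in zip(queries, answers):
        for m in q:
            ub[m] = min(ub[m], a)
    pinned = set()
    for q, a in zip(queries, answers):
        candidates = [m for m in q if ub[m] == a]
        if len(candidates) == 1:
            pinned.add(candidates[0])
    return pinned

def simulatable_max_auditor(past_queries, past_answers, new_query, n):
    """Decide ALLOW/DENY for a new MAX query without looking at its true answer.

    Hypothetical answers are drawn from the qualitative cases that matter here:
    each past answer, and a value strictly below all of them (a discretization
    that is an assumption of this sketch).
    """
    candidates = set(past_answers)
    if past_answers:
        candidates.add(min(past_answers) - 1)      # the "strictly smaller" case
    for hypothetical in candidates:
        if pinned_vars(past_queries + [new_query],
                       past_answers + [hypothetical], n):
            return "DENY"                           # some consistent answer is unsafe
    return "ALLOW"

# Running example: after answering max(x1..x5) = 10, the query max(x1..x4)
# is denied because the hypothetical answer "< 10" would pin x5 = 10.
print(simulatable_max_auditor([{0, 1, 2, 3, 4}], [10], {0, 1, 2, 3}, 5))  # DENY
```

Because the denial never depends on the true answer, observing DENY tells the attacker nothing beyond what past answers already revealed.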

  20. Summary of Simulatable Auditing
  • In some (many!) cases, the decision to deny must be based only on previously answered queries.
  • Denials can leak information if the adversary does not know all the information used to decide whether to deny the query.

  21. Outline
  • Simulatable Auditing
  • Minimality Attack in anonymization
  • Simulatable algorithms for anonymization

  22. Minimality attack on generalization algorithms
  • Algorithms for K-anonymity, L-diversity, T-closeness, etc. try to maximize utility:
    – they find a minimally generalized table in the lattice that satisfies privacy and maximizes utility.
  • But … the attacker also knows this algorithm!

  23. Example: Minimality attack [Wong et al., VLDB ’07]
  • Dataset with one quasi-identifier taking 2 values q1, q2; both q1 and q2 generalize to Q.
  • Sensitive attribute: Cancer (yes/no).
  • We want to ensure P[Cancer = yes] < ½. It is OK to learn that an individual does not have Cancer.
  • Published table:

    QID   Cancer
    Q     Yes
    Q     Yes
    Q     No
    Q     No
    q2    No
    q2    No

  24. Which input datasets could have led to the published table?
  • Case: 3 occurrences of q1.

    Possible input 1      Possible input 2      Output ({q1, q2} → Q), “2-diverse”
    QID  Cancer           QID  Cancer           QID  Cancer
    q1   Yes              q1   Yes              Q    Yes
    q1   Yes              q1   No               Q    Yes
    q1   No               q1   No               Q    No
    q2   No               q2   Yes              Q    No
    q2   No               q2   No               q2   No
    q2   No               q2   No               q2   No

  25. Which input datasets could have led to the published table?
  • Case: 3 occurrences of q1 (continued). The table on the left is a better generalization of one of the candidate inputs than the published one, so a minimality-seeking algorithm would not have produced the published table from that input.

    Better generalization      Published output ({q1, q2} → Q)
    QID  Cancer                QID  Cancer
    q1   Yes                   Q    Yes
    Q    No                    Q    Yes
    Q    No                    Q    No
    q2   Yes                   Q    No
    q2   No                    q2   No
    q2   No                    q2   No

  26. Which input datasets could have led to the published table?
  • Case: 1 occurrence of q1.

    Possible input 1      Possible input 2      Output ({q1, q2} → Q), “2-diverse”
    QID  Cancer           QID  Cancer           QID  Cancer
    q2   Yes              q2   Yes              Q    Yes
    q1   Yes              q2   Yes              Q    Yes
    q2   No               q1   No               Q    No
    q2   No               q2   No               Q    No
    q2   No               q2   No               q2   No
    q2   No               q2   No               q2   No

  27. Which input datasets could have led to the published table?
  • Case: 1 occurrence of q1 (continued). Again, the table on the left is a better generalization than the published one, so the published table would not have been produced from this input.

    Better generalization      Published output ({q1, q2} → Q)
    QID  Cancer                QID  Cancer
    Q    No                    Q    Yes
    Q    No                    Q    Yes
    q2   Yes                   Q    No
    q2   Yes                   Q    No
    q2   No                    q2   No
    q2   No                    q2   No

  28. Which input datasets could have led to the published table?
  • Conclusion: there must be exactly two tuples with q1 in the original table; the other cases (shown on the previous slides) are ruled out because a better generalization would have been published instead.
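The adversary's minimality reasoning can be sketched as a small search. The assumptions below are mine, not the paper's: a generalization is any choice of tuples to coarsen to Q, “safe” means no group lets Cancer be inferred with probability above 1/2, and the publishing algorithm is assumed to generalize as few tuples as possible. A candidate raw table is then ruled out if its minimal safe generalization touches fewer tuples than the published one does.

```python
from itertools import combinations
from fractions import Fraction

HALF = Fraction(1, 2)

def groups(table, generalized):
    """Group tuples: generalized indices form one Q group; the rest group by QID."""
    gs = {}
    for i, (qid, cancer) in enumerate(table):
        gs.setdefault("Q" if i in generalized else qid, []).append(cancer)
    return gs

def is_safe(table, generalized):
    """Safe iff no group reveals Cancer with probability above 1/2
    (treating exactly 1/2 as acceptable is an assumption of this sketch)."""
    return all(Fraction(vals.count("yes"), len(vals)) <= HALF
               for vals in groups(table, generalized).values())

def minimal_generalizations(table):
    """All safe generalizations that coarsen the fewest possible tuples."""
    n = len(table)
    for size in range(n + 1):                       # fewest generalized tuples first
        safe = [set(c) for c in combinations(range(n), size)
                if is_safe(table, set(c))]
        if safe:
            return safe
    return []

def could_produce(table, published_q_size):
    """Would a minimality-seeking publisher generalize exactly
    `published_q_size` tuples of this raw table?"""
    return any(len(g) == published_q_size for g in minimal_generalizations(table))

# Candidate raw tables with 3 occurrences of q1 (slide 24):
cand1 = [("q1", "yes"), ("q1", "yes"), ("q1", "no"),
         ("q2", "no"),  ("q2", "no"),  ("q2", "no")]
cand2 = [("q1", "yes"), ("q1", "no"),  ("q1", "no"),
         ("q2", "yes"), ("q2", "no"),  ("q2", "no")]

# The published table generalizes 4 tuples; both candidates admit a smaller
# safe generalization, so neither could have produced it.
print(could_produce(cand1, 4), could_produce(cand2, 4))   # False False
```

In this toy model both 3-occurrence candidates are ruled out, and repeating the check over the remaining candidates leaves only raw tables in which both q1 tuples have Cancer, which is exactly the breach the minimality attack exposes.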
