differential privacy and applications

Differential Privacy and Applications, Marco Gaboardi (Boston University)



  1. Differential Privacy and Applications Marco Gaboardi Boston University 


  2. Data

  3. Private Queries?

  4. Private Queries? medical correlation? (diagram labels: query, answer)

  5. Private Queries? Does Joe have cancer? (diagram labels: query, answer)

  6. Private Queries? Does Joe have cancer?

  7. Anonymization?

  8. Anonymization? medical correlation (diagram labels: query, answer)

  9. Anonymization? ?!? (diagram labels: query, answer)

  10. Attacks on Anonymization (Narayanan, Shmatikov: Robust De-anonymization of Large Sparse Datasets. IEEE Symposium on Security and Privacy 2008) (diagram labels: anonymous data, additional data, correlations)

  11. A Possible Solution: randomization

  12. Adding noise Noise

  13. Adding noise: medical correlation? (diagram labels: query, noise, answer + noise)

  14. Adding noise: ?!? (diagram labels: query, noise, answer + noise)

  15. Adding noise Noise ?!?

  16. Data analyst

  17. Privacy vs Utility Privacy Utility

  18. Differential privacy: understanding the mathematical and computational meaning of this trade-off. [Dwork, McSherry, Nissim, Smith, TCC06]

  19. Some Official Users
     • US Census Bureau: OnTheMap, new releases in 2020
     • Google: RAPPOR tool for Chrome
     • Apple: typing statistics reports in devices
     • Facebook: social science data release
     • Uber / Amazon / Mozilla / Snapchat
     • Many startups

  20. The rest of the class
     • Today: Fundamentals of reconstruction attacks and the definition of differential privacy.
     • Tuesday: Basic mechanisms and basic properties of differential privacy, and how to support them in programming languages.
     • Thursday: More advanced mechanisms and their verification.
     • Friday: Other models and applications.

  21. Today: Fundamentals of reconstruction attacks and the definition of differential privacy

  22. Data Statistics over Data

  23. Is this data private?

          D1 D2 D3 D4 D5 D6 D6 D7 D8 D9 D10 D11 D12 D13 D14 D15
     I1    0  0  0  1  0  0  0  0  1  0   0   1   1   0   0   1
     I2    1  0  1  1  1  0  1  0  1  0   1   0   0   1   0   0
     I3    0  1  0  1  1  1  0  1  0  0   0   1   0   0   1   0
     I4    1  0  1  0  0  1  1  0  1  1   0   0   0   0   1   1
     I5    0  0  0  1  1  0  1  1  0  1   0   1   0   1   0   0
     I6    0  0  1  1  0  1  1  0  1  1   0   0   1   0   1   0
     I7    1  1  0  0  1  0  1  1  1  0   1   0   1   0   0   1
     I8    0  0  0  0  1  1  0  0  0  0   0   0   1   0   0   0
     I9    0  1  0  0  1  0  1  1  0  1   1   1   0   1   1   0
     I10   1  0  1  0  0  1  1  0  0  0   0   0   0   1   0   1
     I11   0  1  0  1  1  0  0  1  0  1   0   1   0   1   1   0
     I12   1  0  0  0  0  0  0  0  0  0   1   0   0   0   0   0
     I13   1  1  1  0  1  1  1  1  0  0   1   0   1   0   1   0
     I14   0  1  1  0  0  0  0  1  0  0   0   1   0   0   1   0
     I15   0  1  0  1  0  1  1  0  1  0   1   0   1   0   0   1

  24. How about if we also have this data?

     ID  Disease                  ID  Name
     1   D1   AMAN                1   I1   Alice
     2   D2   Behcet              2   I2   Bob
     3   D3   Celiac              3   I3   Cynthia
     4   D4   Dermatitis          4   I4   Dan
     5   D5   Evans synd.         5   I5   Eve
     6   D6   Fibrosis            6   I6   Frank
     7   D7   Graves’ dis.        7   I7   Guy
     8   D8   Henoch-Schonlein    8   I8   Hannah
     9   D9   IGA Neph.           9   I9   Ivan
     10  D10  Juv. Diabetes       10  I10  Jon
     11  D11  Kawasaki dis.       11  I11  Ken
     12  D12  Lichen planus       12  I12  Lou
     13  D13  Myositis            13  I13  Mike
     14  D14  Narcolepsy          14  I14  Noa
     15  D15  Optic Neuritis      15  I15  Omer

  25. How about this?

     D1 D2 D3 D4 D5 D6 D6 D7 D8 D9 D10 D11 D12 D13 D14 D15
      6  7  6  7  8  7  9  7  5  5   1   6   6   5   7   5

     ID  Disease                  ID  Disease
     1   D1   AMAN                9   D9   IGA Neph.
     2   D2   Behcet              10  D10  Juv. Diabetes
     3   D3   Celiac              11  D11  Kawasaki dis.
     4   D4   Dermatitis          12  D12  Lichen planus
     5   D5   Evans synd.         13  D13  Myositis
     6   D6   Fibrosis            14  D14  Narcolepsy
     7   D7   Graves’ dis.        15  D15  Optic Neuritis
     8   D8   Henoch-Schonlein

  26. The answers to questions of this kind depend on the additional information we have available

  27. Database
     • We can think of a database as a list of records from some universe set: $D \in X^n = DB$
     • Sometimes we will think of a database as a function, with $D[i] \in X$
     • and sometimes we will write its elements explicitly: $(x_1, \dots, x_n) \in DB$

  28. (Normalized) Counting Queries
     • A counting query $q : X^n \to [0,1]$ is a function counting the proportion of elements in a dataset that satisfy a predicate $q : X \to \{0,1\}$
     • In symbols: $q(D) = \frac{1}{n} \sum_{i=1}^{n} q(D[i])$
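The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `counting_query` and the toy dataset are assumptions made here, not part of the slides.

```python
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")


def counting_query(predicate: Callable[[X], bool], dataset: Sequence[X]) -> float:
    """Return the fraction of records in `dataset` that satisfy `predicate`."""
    if not dataset:
        raise ValueError("dataset must contain at least one record")
    return sum(1 for record in dataset if predicate(record)) / len(dataset)


# Hypothetical toy dataset: each record is a tuple of bits.
toy = [(0, 0, 0), (1, 0, 1), (0, 1, 0), (1, 0, 1)]

# Proportion of records whose first bit is set (2 of the 4 records).
first_bit_set = counting_query(lambda x: x[0] == 1, toy)
```

The result always lies in [0,1] because it is a count divided by the dataset size n.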

  29. Example 1 Let’s consider an arbitrary universe domain $X$ and, for $y \in X$, the following predicate: $q_y(x) = \begin{cases} 1 & \text{if } y = x \\ 0 & \text{otherwise} \end{cases}$ We call the associated counting query $q_y : X^n \to [0,1]$ a point function. Question: Suppose that we answer all the point function queries for $y \in X$. Which well-known statistic do we obtain?

  30. Example 1: $D \in X^{10}$ with $X = \{0,1\}^3$

          D1 D2 D3
     I1    0  0  0
     I2    1  0  1
     I3    0  1  0
     I4    1  0  1
     I5    0  0  0
     I6    0  0  1
     I7    1  1  0
     I8    0  0  0
     I9    0  1  0
     I10   1  0  1

     $q_{000}(D) = .3$    $q_{100}(D) = 0$
     $q_{001}(D) = .1$    $q_{101}(D) = .3$
     $q_{010}(D) = .2$    $q_{110}(D) = .1$
     $q_{011}(D) = 0$     $q_{111}(D) = 0$

     (Figure: bar chart of these values over 000, 001, …, 111, vertical axis from 0 to 0.3)

  31. Example 1 Question: Suppose that we answer all the point function queries for $y \in X$. Which well-known statistic do we obtain? Answer: The histogram of the database.
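As a small check of Example 1, the following sketch answers every point function query on the ten-record dataset from the slides and collects the answers into a histogram. The names `point_query` and `histogram` are chosen here for illustration only.

```python
from itertools import product

# The ten records I1..I10 from the example, one tuple (D1, D2, D3) per row.
D = [
    (0, 0, 0), (1, 0, 1), (0, 1, 0), (1, 0, 1), (0, 0, 0),
    (0, 0, 1), (1, 1, 0), (0, 0, 0), (0, 1, 0), (1, 0, 1),
]


def point_query(y, dataset):
    """Normalized counting query for the predicate q_y(x) = 1 iff x == y."""
    return sum(1 for x in dataset if x == y) / len(dataset)


# Answer the point function for every y in the universe X = {0,1}^3.
universe = list(product((0, 1), repeat=3))
histogram = {"".join(map(str, y)): point_query(y, D) for y in universe}

for key, value in histogram.items():
    print(key, value)
```

Because every record matches exactly one point function, the answers sum to 1.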

  32. Example II Let’s consider an arbitrary ordered universe domain $X$ and, for $y \in X$, the following predicate: $q_y(x) = \begin{cases} 1 & \text{if } x \le y \\ 0 & \text{otherwise} \end{cases}$ We call the associated counting query $q_y : X^n \to [0,1]$ a threshold function. Question: Suppose that we answer all the threshold function queries for $y \in X$. Which well-known statistic do we obtain?

  33. Example II: $D \in X^{10}$ with $X = \{0,1\}^3$, ordered by the corresponding binary encoding.

          D1 D2 D3
     I1    0  0  0
     I2    1  0  1
     I3    0  1  0
     I4    1  0  1
     I5    0  0  0
     I6    0  0  1
     I7    1  1  0
     I8    0  0  0
     I9    0  1  0
     I10   1  0  1

     $q_{000}(D) = .3$    $q_{100}(D) = .6$
     $q_{001}(D) = .4$    $q_{101}(D) = .9$
     $q_{010}(D) = .6$    $q_{110}(D) = 1$
     $q_{011}(D) = .6$    $q_{111}(D) = 1$

     (Figure: step plot of these values over 000, 001, …, 111, vertical axis from 0 to 1)

  34. Example II Question: Suppose that we answer all the threshold function queries for $y \in X$. Which well-known statistic do we obtain? Answer: The CDF of the database.
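The threshold queries of Example II can be checked the same way. In this sketch, records are Python tuples, and it relies on the fact that tuple comparison is lexicographic, which for equal-length bit tuples agrees with the order of the binary encoding. The names `threshold_query` and `cdf` are illustrative.

```python
from itertools import product

# The ten records I1..I10 from the example, one tuple (D1, D2, D3) per row.
D = [
    (0, 0, 0), (1, 0, 1), (0, 1, 0), (1, 0, 1), (0, 0, 0),
    (0, 0, 1), (1, 1, 0), (0, 0, 0), (0, 1, 0), (1, 0, 1),
]


def threshold_query(y, dataset):
    """Normalized counting query for the predicate q_y(x) = 1 iff x <= y."""
    # Lexicographic tuple order matches the binary-encoding order here.
    return sum(1 for x in dataset if x <= y) / len(dataset)


# product() enumerates {0,1}^3 in increasing binary order: 000, 001, ..., 111.
universe = list(product((0, 1), repeat=3))
cdf = {"".join(map(str, y)): threshold_query(y, D) for y in universe}

for key, value in cdf.items():
    print(key, value)
```

The answers are non-decreasing in y and reach 1 at the largest element, as expected of a cumulative distribution function.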

  35. Example III Let’s consider the universe domain $X = \{0,1\}^d$ and, for an index $1 \le j \le d$, the following predicate: $q_j(x) = x[j]$ We call the associated counting query $q_j : \{0,1\}^{n*d} \to [0,1]$ an attribute counting function. Question: Which statistic corresponds to releasing all the attribute counting functions?

  36. Example III: $D \in X^{10}$ with $X = \{0,1\}^3$

             D1 D2 D3
     I1       0  0  0
     I2       1  0  1
     I3       0  1  0
     I4       1  0  1
     I5       0  0  0
     I6       0  0  1
     I7       1  1  0
     I8       0  0  0
     I9       0  1  0
     I10      1  0  1
     margin   4  3  4

     $q_1(D) = .4$    $q_2(D) = .3$    $q_3(D) = .4$

  37. Example III Question: Which statistic corresponds to releasing all the attribute counting functions? Answer: The (1-way) marginals of the database.
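A short sketch of Example III: each attribute counting function averages one column of the dataset, so releasing all of them gives the 1-way marginals. The name `attribute_query` is chosen here, and the index j is 1-based to match the slides.

```python
# The ten records I1..I10 from the example, one tuple (D1, D2, D3) per row.
D = [
    (0, 0, 0), (1, 0, 1), (0, 1, 0), (1, 0, 1), (0, 0, 0),
    (0, 0, 1), (1, 1, 0), (0, 0, 0), (0, 1, 0), (1, 0, 1),
]


def attribute_query(j, dataset):
    """Normalized counting query for q_j(x) = x[j], with j counted from 1."""
    return sum(x[j - 1] for x in dataset) / len(dataset)


d = len(D[0])
marginals = [attribute_query(j, D) for j in range(1, d + 1)]
print(marginals)
```

Multiplying each answer by n = 10 recovers the column counts in the "margin" row of the example.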

  38. Example IV Let’s consider the universe domain $X = \{0,1\}^d$ and let’s consider $\vec{v} \in \mathrm{List}[k]\{1, \bar{1}, 2, \bar{2}, \dots, d, \bar{d}\}$ and $q_{\vec{v}}(x) = q_{v_1}(x) \wedge q_{v_2}(x) \wedge \cdots \wedge q_{v_k}(x)$ where $q_j(x) = x_j$ and $q_{\bar{j}}(x) = \neg x_j$. We call the associated counting query $q_{\vec{v}} : \{0,1\}^{n*d} \to [0,1]$ a conjunction or k-way marginal. Question: Which statistic corresponds to releasing conjunctions?

  39. Example IV: $D \in X^{10}$ with $X = \{0,1\}^3$, $k = 2$

          D1 D2 D3
     I1    0  0  0
     I2    1  0  1
     I3    0  1  0
     I4    1  0  1
     I5    0  0  0
     I6    0  0  1
     I7    1  1  0
     I8    0  0  0
     I9    0  1  0
     I10   1  0  1

     $q_{12}(D) = .1$          $q_{\bar{1}2}(D) = .2$
     $q_{1\bar{2}}(D) = .3$    $q_{\bar{1}3}(D) = .1$
     $q_{13}(D) = .3$          $q_{\bar{1}\bar{2}}(D) = .4$
     $q_{1\bar{3}}(D) = .1$    $q_{\bar{1}\bar{3}}(D) = .5$

            D1    ¬D1
     D2     0.1   0.2
     ¬D2    0.3   0.4

  40. Example IV Question: Which statistic corresponds to releasing conjunctions? Answer: Contingency tables.
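The conjunction queries of Example IV can be sketched as follows. The encoding of literals as signed integers (+j for attribute j, -j for its negation) is an assumption made here for convenience and does not come from the slides. The names `literal_holds` and `conjunction_query` are likewise illustrative.

```python
# The ten records I1..I10 from the example, one tuple (D1, D2, D3) per row.
D = [
    (0, 0, 0), (1, 0, 1), (0, 1, 0), (1, 0, 1), (0, 0, 0),
    (0, 0, 1), (1, 1, 0), (0, 0, 0), (0, 1, 0), (1, 0, 1),
]


def literal_holds(v, x):
    """Evaluate one literal: +j means x_j is 1, -j means x_j is 0 (j from 1)."""
    bit_is_set = x[abs(v) - 1] == 1
    return bit_is_set if v > 0 else not bit_is_set


def conjunction_query(literals, dataset):
    """Fraction of records that satisfy every literal in `literals`."""
    matches = sum(1 for x in dataset if all(literal_holds(v, x) for v in literals))
    return matches / len(dataset)


# 2-way contingency table for attributes D1 and D2 (rows: D2, columns: D1).
contingency = {
    ("D2", "D1"): conjunction_query((1, 2), D),
    ("D2", "not D1"): conjunction_query((-1, 2), D),
    ("not D2", "D1"): conjunction_query((1, -2), D),
    ("not D2", "not D1"): conjunction_query((-1, -2), D),
}
print(contingency)
```

The four cells partition the records, so they sum to 1.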

  41. Linear Queries
     • A linear query $q : X^n \to [0,1]$ is a function averaging the value of a function $q : X \to [0,1]$ over the elements of the dataset.
     • In symbols: $q(D) = \frac{1}{n} \sum_{i=1}^{n} q(D[i])$
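A linear query generalizes a counting query by letting the per-record function take any value in [0,1] rather than only 0 or 1. The sketch below illustrates this with a hypothetical per-record function, `fraction_of_bits_set`, which is not from the slides.

```python
# Ten records of three bits each, one tuple (D1, D2, D3) per row.
D = [
    (0, 0, 0), (1, 0, 1), (0, 1, 0), (1, 0, 1), (0, 0, 0),
    (0, 0, 1), (1, 1, 0), (0, 0, 0), (0, 1, 0), (1, 0, 1),
]


def linear_query(f, dataset):
    """Average of f over the dataset, where f maps a record into [0, 1]."""
    values = [f(x) for x in dataset]
    if any(v < 0 or v > 1 for v in values):
        raise ValueError("f must map every record into [0, 1]")
    return sum(values) / len(values)


def fraction_of_bits_set(x):
    """Hypothetical per-record function with values in [0, 1]."""
    return sum(x) / len(x)


average_fraction = linear_query(fraction_of_bits_set, D)

# A counting query is the special case where f only returns 0 or 1.
first_attribute = linear_query(lambda x: x[0], D)
```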

  42. Sum queries
     • Let’s denote by $I \subseteq \mathbb{N}$ a subset of the rows of the dataset.
     • A sum query $q_I : \mathrm{List}(X) \to \mathbb{N}$ is defined as $q_I(D) = \sum_{i \in I} D[i]$

  43. Example: $X = \mathrm{List}[3]\{0,1\}$

          D1 D2 D3
     I1    0  0  0
     I2    1  0  1
     I3    0  1  0
     I4    1  0  1
     I5    0  0  0
     I6    0  0  1
     I7    1  1  0
     I8    0  0  0
     I9    0  1  0
     I10   1  0  1

     $q_{\{1,2,3\}}(D) = (1,1,1)$
     $q_{\{1,2,4\}}(D) = (2,0,2)$
     $q_{\{5,8\}}(D) = (0,0,0)$
     $q_{\{2,4,7,10\}}(D) = (4,1,3)$
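The sum queries in this example can be reproduced with a short sketch. It assumes, as the example's results suggest, that records are added component-wise. The function name `sum_query` is chosen here, and row indices are 1-based to match the labels I1 to I10.

```python
# The ten records I1..I10 from the example, one tuple (D1, D2, D3) per row.
D = [
    (0, 0, 0), (1, 0, 1), (0, 1, 0), (1, 0, 1), (0, 0, 0),
    (0, 0, 1), (1, 1, 0), (0, 0, 0), (0, 1, 0), (1, 0, 1),
]


def sum_query(indices, dataset):
    """Component-wise sum of the rows whose 1-based index is in `indices`."""
    rows = [dataset[i - 1] for i in indices]
    return tuple(sum(column) for column in zip(*rows))


print(sum_query({1, 2, 3}, D))
print(sum_query({1, 2, 4}, D))
print(sum_query({5, 8}, D))
print(sum_query({2, 4, 7, 10}, D))
```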

  44. Question: Is releasing the result of sum (counting or linear) queries private?

  45. Question: How can we make statistical queries private?

  46. Private statistical database (diagram labels: statistical query, noise, answer + noise) Question: What kind of noise?

  47. Additive Noise Perturbation
     • We say that $M$ is a privacy mechanism obtained by adding noise if, for every query $q$, $M$ creates a new randomized query: $q^*(D) = q(D) + Y$
     • We say that a mechanism $M$ adds noise within perturbation $E$ iff for every $q$ and every $D$: $|q^*(D) - q(D)| \le E$
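A minimal sketch of such a mechanism follows. The slides do not say how the noise Y is distributed, so drawing it uniformly from [-E, E] is purely an assumption made here to guarantee the perturbation bound. The names `bounded_noise_mechanism` and `first_attribute` are also illustrative.

```python
import random

# Ten records of three bits each, one tuple (D1, D2, D3) per row.
D = [
    (0, 0, 0), (1, 0, 1), (0, 1, 0), (1, 0, 1), (0, 0, 0),
    (0, 0, 1), (1, 1, 0), (0, 0, 0), (0, 1, 0), (1, 0, 1),
]


def first_attribute(dataset):
    """Example query q: normalized count of records whose first bit is 1."""
    return sum(x[0] for x in dataset) / len(dataset)


def bounded_noise_mechanism(query, dataset, bound, rng):
    """Return q*(D) = q(D) + Y with |Y| <= bound."""
    # Assumption: Y is uniform on [-bound, bound]; the slides leave this open.
    noise = rng.uniform(-bound, bound)
    return query(dataset) + noise


rng = random.Random(42)  # fixed seed so the sketch is reproducible
E = 0.05
answers = [bounded_noise_mechanism(first_attribute, D, E, rng) for _ in range(1000)]
max_error = max(abs(a - first_attribute(D)) for a in answers)
print(max_error <= E + 1e-12)
```

Any other noise distribution supported on [-E, E] would satisfy the same definition.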

  48. Question: Does this approach protect privacy?

  49. Reconstruction attack (diagram: the attacker sends queries q1, q2, …, qk to the noise-protected database D and builds a reconstruction D’)
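The slide shows only the diagram, so the following is a hedged sketch of one well-known exhaustive-search attack in the style of Dinur and Nissim, not necessarily the attack presented in the lecture. It assumes a database of n secret bits, an attacker who may ask every subset sum query, and noise bounded by E as in the additive perturbation definition. Every name in it (`subset_sum`, `reconstruct`, `secret`) is illustrative.

```python
import random
from itertools import product


def subset_sum(bits, subset):
    """Sum query: number of 1-bits among the positions in `subset`."""
    return sum(bits[i] for i in subset)


def reconstruct(n, noisy_answers, bound):
    """Return the first candidate database consistent with every noisy answer."""
    for candidate in product((0, 1), repeat=n):
        if all(abs(subset_sum(candidate, s) - a) <= bound
               for s, a in noisy_answers.items()):
            return candidate
    return None  # unreachable if the noise really stays within `bound`


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 8       # small n: the search is exponential in n
bound = 1   # perturbation E for (unnormalized) sum queries

secret = tuple(rng.randint(0, 1) for _ in range(n))

# The attacker asks all 2^n subset sum queries and receives noisy answers.
subsets = [tuple(i for i in range(n) if (mask >> i) & 1) for mask in range(2 ** n)]
noisy_answers = {s: subset_sum(secret, s) + rng.uniform(-bound, bound)
                 for s in subsets}

guess = reconstruct(n, noisy_answers, bound)
errors = sum(g != s for g, s in zip(guess, secret))
print(errors, "wrong bits out of", n)
```

Under these assumptions, any consistent candidate differs from the secret in at most 4E positions. Querying the set of positions where the secret is 0 shows the candidate has at most 2E ones there, and the set of positions where the secret is 1 gives the symmetric bound.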
