CSE 312 – Foundations of Computing II, Lecture 26: Applications – Differential Privacy


  1. CSE 312 – Foundations of Computing II. Lecture 26: Applications – Differential Privacy. Stefano Tessaro, tessaro@cs.washington.edu

  2. Setting: data mining and statistical queries over medical data, query logs, social network data, …

  3. Setting – Data Release. Publish aggregated data on the Internet, e.g., the outcome of a medical study, a research paper, … Main concern: do not violate user privacy!

  4. Example – Linkage Attack [Sweeney ’00]
     • The Commonwealth of Massachusetts Group Insurance Commission (GIC) released 135,000 records of patient encounters, each with 100 attributes.
       – Directly identifying attributes were removed, but ZIP code, birth date, and gender remained available.
       – This was considered “safe” practice.
     • Public voter registration records enabled a “linkage”: they contain, among others, name, address, ZIP code, birth date, and gender.
     • This allowed identification of the medical records of William Weld, governor of MA at the time: he was the only man in his ZIP code with his birth date.
     • More attacks followed (cf. the Netflix grand prize challenge)!

  5. One Way Out? Differential Privacy
     • A formal definition of privacy.
       – Satisfied by systems deployed by Google, Uber, Apple, …
       – Will be used by the 2020 US Census.
     • Idea: any information-related risk to a person should not change significantly as a result of that person’s information being included, or not, in the analysis.

  6. Ideal Privacy
     [Diagram: DB x with Stefano’s data → Analysis → Output; DB x′ without Stefano’s data → Analysis → Output′. Ideally, Output and Output′ should be identical!]
     Fact. This notion of privacy is unattainable!

  7. More Realistic Privacy Goal
     [Diagram: as before, but now Output (from the DB with Stefano’s data) and Output′ (from the DB without it) should be “similar”.]

  8. Setting – Formal
     • A mechanism M takes a database x as input and produces an output M(x) ∈ ℝ.
     • M is randomized, i.e., it makes random choices.
     • x = DB with Stefano’s data, x′ = DB without Stefano’s data: we say that x, x′ differ at exactly one entry.

  9. Setting – Mechanism
     Definition. A mechanism M is ε-differentially private if for all subsets* T ⊆ ℝ and for all databases x, x′ which differ at exactly one entry,
         ℙ[M(x) ∈ T] ≤ e^ε · ℙ[M(x′) ∈ T].
     [Dwork, McSherry, Nissim, Smith ’06]
     Think: ε = 1/100 or ε = 1/10.
     * Can be generalized beyond outputs in ℝ.
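
To make the guarantee concrete (a worked plug-in of my own, not on the slide): with ε = 1/10,

```latex
\Pr[M(x)\in T] \;\le\; e^{1/10}\cdot\Pr[M(x')\in T] \;\approx\; 1.105\cdot\Pr[M(x')\in T],
```

and since x and x′ can swap roles in the definition, the probability of any event changes by at most about 10.5% (multiplicatively) when one person’s data is added or removed.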

  10. Example – Counting Queries
     • The DB is a vector x = (x_1, …, x_n), where x_1, …, x_n ∈ {0,1}.
       – E.g., x_i = 1 if individual i has the disease.
       – x_i = 0 means the patient does not have the disease or the patient’s data wasn’t recorded.
     • Query: q(x) = ∑_{i=1}^n x_i.
     • Here, DB proximity means the vectors differ at one single coordinate.

  11. A Solution – Laplacian Noise
     “Laplacian mechanism with parameter ε”: mechanism M taking input x = (x_1, …, x_n):
     • Return M(x) = ∑_{i=1}^n x_i + Y.
     Here, Y follows a Laplace distribution with parameter ε:
         f(y) = (ε/2) · e^{−ε|y|},   𝔼[Y] = 0,   Var(Y) = 2/ε².
     [Plot: density f(y) of the Laplace distribution with parameter ε.]

  12. Better Solution – Laplacian Noise
     (Same mechanism: return M(x) = ∑_{i=1}^n x_i + Y, where Y is Laplace with parameter ε.)
     Key property: for all y and all Δ,
         f(y) / f(y + Δ) ≤ e^{ε|Δ|}.
     [Plot: density f(y), illustrating the bounded ratio between shifted densities.]
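
A minimal executable sketch of this mechanism (my own illustration, not code from the lecture; the function name and toy data are made up), applied to the counting query of slide 10. The slide’s density f(y) = (ε/2)·e^{−ε|y|} corresponds to NumPy’s scale parameter b = 1/ε:

```python
import numpy as np

def laplace_mechanism(x, eps, rng):
    """Release the count sum(x) plus Laplace noise with parameter eps."""
    true_count = np.sum(x)
    # Laplace density (eps/2)*exp(-eps*|y|) has scale b = 1/eps in NumPy.
    noise = rng.laplace(loc=0.0, scale=1.0 / eps)
    return true_count + noise

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=10_000)   # toy DB: x_i = 1 iff individual i has the disease
print(np.sum(x), laplace_mechanism(x, eps=0.1, rng=rng))
```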

  13. Laplacian Mechanism – Privacy
     Theorem. The Laplacian mechanism with parameter ε satisfies ε-differential privacy.
     Proof sketch: Let x, x′ differ at one entry, and write s = ∑_{i=1}^n x_i, s′ = ∑_{i=1}^n x′_i, and Δ = s′ − s, so |Δ| ≤ 1. Then
         ℙ[M(x) ∈ [a,b]] = ℙ[s + Y ∈ [a,b]] = ∫_a^b f(t − s) dt
                         = ∫_a^b f(t − s′ + Δ) dt ≤ e^{ε|Δ|} ∫_a^b f(t − s′) dt
                         ≤ e^ε · ℙ[M(x′) ∈ [a,b]].
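
As a sanity check on the theorem (again my own sketch, with made-up numbers), one can estimate both probabilities by simulation for two neighboring counts and verify that the ratio stays below e^ε:

```python
import numpy as np

rng = np.random.default_rng(1)
eps, trials = 0.1, 1_000_000
s, s_prime = 5000, 5001            # counts of neighboring DBs x, x'; |Delta| = 1
a, b = 4990, 5010                  # an arbitrary output interval [a, b]

out = s + rng.laplace(scale=1.0 / eps, size=trials)
out_prime = s_prime + rng.laplace(scale=1.0 / eps, size=trials)
p = np.mean((a <= out) & (out <= b))
p_prime = np.mean((a <= out_prime) & (out_prime <= b))
print(p / p_prime, "<=", np.exp(eps))   # ratio should be at most e^eps (about 1.105)
```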

  14. How Accurate is the Laplacian Mechanism?
     Let’s look at ∑_{i=1}^n x_i + Y:
     • 𝔼[∑_{i=1}^n x_i + Y] = ∑_{i=1}^n x_i + 𝔼[Y] = ∑_{i=1}^n x_i.
     • Var(∑_{i=1}^n x_i + Y) = Var(Y) = 2/ε².
     This is accurate enough for large enough n!
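
As a concrete plug-in (my numbers, not the slides’): with ε = 1/10, the standard deviation of the released count is

```latex
\sqrt{\operatorname{Var}(Y)} \;=\; \frac{\sqrt{2}}{\varepsilon} \;=\; 10\sqrt{2} \;\approx\; 14.1,
```

so a count over, say, n = 10⁶ individuals is typically off by only about ±14, and crucially this error does not grow with n.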

  15. Differential Privacy – What Else Can We Compute?
     • Statistics: counts, means, medians, histograms, boxplots, etc.
     • Machine learning: classification, regression, clustering, distribution learning, etc.
     • …

  16. Differential Privacy – Nice Properties
     • Group privacy: if M is ε-differentially private, then for all T ⊆ ℝ and for all databases x, x′ which differ at (at most) k entries,
           ℙ[M(x) ∈ T] ≤ e^{kε} · ℙ[M(x′) ∈ T].
       (A proof sketch follows below.)
     • Composition: if we apply two ε-DP mechanisms to the data, the combined output is 2ε-DP.
       – How much can we allow ε to grow? (This is the so-called “privacy budget.”)
     • Post-processing: post-processing does not decrease privacy.
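
Group privacy follows by chaining the single-entry guarantee along intermediate databases (a standard argument, sketched here rather than spelled out on the slide): pick x = x⁽⁰⁾, x⁽¹⁾, …, x⁽ᵏ⁾ = x′, where consecutive databases differ at one entry; then

```latex
\Pr[M(x)\in T] \;\le\; e^{\varepsilon}\,\Pr[M(x^{(1)})\in T]
\;\le\; e^{2\varepsilon}\,\Pr[M(x^{(2)})\in T]
\;\le\; \dots \;\le\; e^{k\varepsilon}\,\Pr[M(x')\in T].
```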

  17. Local Differential Privacy
     What if we don’t trust the aggregator?
     [Diagram: Laplacian mechanism: users send x_1, x_2, …, x_n in the clear; the aggregator computes ∑_i x_i and adds Y. Local version: user i sends x_i + Y_i; the aggregator computes ∑_i (x_i + Y_i).]
     Solution: add noise locally!
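
A minimal sketch of the local variant (my own illustration; the function name is made up): each user perturbs their own bit before sending it, so the untrusted aggregator never sees raw data:

```python
import numpy as np

def local_laplace_count(x, eps, rng):
    """Each user i submits x_i + Y_i; the aggregator just sums the noisy reports."""
    noisy_reports = x + rng.laplace(scale=1.0 / eps, size=len(x))
    return np.sum(noisy_reports)
```

The price of not trusting the aggregator: the n independent noise terms add up, so the variance of the released count is n · 2/ε² instead of the central mechanism’s 2/ε².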

  18. Example – Randomized Response
     For a given parameter p, mechanism M taking input x = (x_1, …, x_n):
     • For all i = 1, …, n:
       – y_i = x_i with probability 1/2 + p, and y_i = 1 − x_i with probability 1/2 − p.
       – x̂_i = (y_i − (1/2 − p)) / (2p).
     • Return M(x) = ∑_{i=1}^n x̂_i.
     S. L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.

  19. Example – Randomized Response (cont.)
     (Mechanism as on the previous slide, with parameter p.)
     Theorem. Randomized Response with parameter p satisfies ε-differential privacy if p = (e^ε − 1) / (2(e^ε + 1)).
     Fact 1. 𝔼[M(x)] = ∑_{i=1}^n x_i.
     Fact 2. Var(M(x)) ≈ n/ε².
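
A runnable sketch of randomized response (my own illustration; names are made up), using the parameter p from the theorem and the unbiased per-user estimate x̂_i:

```python
import numpy as np

def randomized_response(x, eps, rng):
    """Warner's randomized response on a 0/1 vector x."""
    p = (np.exp(eps) - 1) / (2 * (np.exp(eps) + 1))  # parameter from the theorem
    keep = rng.random(len(x)) < 0.5 + p              # report truthfully w.p. 1/2 + p
    y = np.where(keep, x, 1 - x)                     # otherwise flip the bit
    x_hat = (y - (0.5 - p)) / (2 * p)                # unbiased estimate of x_i
    return np.sum(x_hat)

rng = np.random.default_rng(2)
x = rng.integers(0, 2, size=100_000)
print(np.sum(x), randomized_response(x, eps=0.5, rng=rng))
```

Since 𝔼[y_i] = (1/2 − p) + 2p·x_i, the rescaling in x_hat makes each per-user estimate unbiased, which is exactly Fact 1; the noise from the coin flips is what drives the ≈ n/ε² variance in Fact 2.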

  20. Differential Privacy – Challenges
     • Accuracy vs. privacy: how do we choose ε?
       – Practical applications tend to err in favor of accuracy.
       – See, e.g., https://arxiv.org/abs/1709.02753
     • Fairness: differential privacy hides the contribution of small groups, by design.
       – How do we avoid excluding minorities?
       – A very hard problem!

  21. Literature
     • Cynthia Dwork and Aaron Roth. “The Algorithmic Foundations of Differential Privacy.”
       – https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
     • https://privacytools.seas.harvard.edu/
