  1. CS573 Data Privacy and Security: Differential Privacy (Li Xiong)

  2. Outline • Differential Privacy Definition • Basic techniques • Composition theorems

  3. Statistical Data Privacy • Non-interactive vs interactive • Privacy goal: individual is protected • Utility goal: statistical information useful for analysis [Figure: the data curator applies a privacy mechanism to the original data and releases statistics or synthetic data to the data analyst in response to queries]

  4. Recap • Anonymization or de-identification (input perturbation) – Linkage attacks, homogeneity attacks • Query auditing/restriction – Query denial is itself disclosive; auditing can be computationally infeasible • Summary statistics – Differencing attacks

  5. Differential Privacy • Promise: "an individual will not be affected, adversely or otherwise, by allowing his/her data to be used in any study or analysis, no matter what other studies, datasets, or information sources are available" • Paradox: learning nothing about an individual while learning useful statistical information about a population

  6. Differential Privacy • Statistical outcome is indistinguishable regardless of whether a particular user (record) is included in the data

  7. Differential Privacy • Statistical outcome is indistinguishable regardless of whether a particular user (record) is included in the data • Formally, a randomized mechanism A satisfies ε-differential privacy if, for all neighboring databases D and D' (differing in a single record) and every set of outputs S: Pr[A(D) ∈ S] ≤ e^ε × Pr[A(D') ∈ S]

  8. Differential privacy: an example [Figure: original records, the original histogram, and a perturbed histogram released with differential privacy]

  9. Differential Privacy: Some Qualitative Properties • Protection against presence/participation of a single record • Quantification of privacy loss • Composition • Post-processing

  10. Differential Privacy: Additional Remarks • Correlations between records • Granularity of a single record (what counts as the difference between neighboring databases) – Group privacy – Graph databases (e.g., social networks): node vs edge – Movie rating databases: user vs event (movie)

  11. Outline • Differential Privacy Definition • Basic techniques – Laplace mechanism – Exponential mechanism – Randomized Response • Composition theorems

  12. Can deterministic algorithms satisfy differential privacy?

  13. Non-trivial deterministic algorithms do not satisfy differential privacy [Figure: the space of all inputs is mapped to a space of all outputs with at least 2 distinct outputs]

  14. Non-trivial deterministic algorithms do not satisfy differential privacy [Figure: each input is mapped to a distinct output]

  15. There exist two inputs that differ in one entry but are mapped to different outputs: some output has Pr > 0 under one input and Pr = 0 under the other, so the bounded ratio required by differential privacy cannot hold.

  16. Output Randomization [Figure: the researcher sends a query to the database; noise is added to the true answer before it is returned] • Add noise to answers such that: – Each answer does not leak too much information about the database. – Noisy answers are close to the original answers.

  17. [DMNS 06] Laplace Mechanism • The query q is answered with q(D) + η, where q(D) is the true answer and the noise η is drawn from the Laplace distribution Lap(S/ε) [Figure: researcher sends query q to the database and receives q(D) + η; plot of the Laplace density centered at 0]

  18. Laplace Distribution • PDF: f(x | u, b) = (1/(2b)) exp(−|x − u| / b) • Denoted as Lap(b) when u = 0 • Mean u • Variance 2b^2

  19. How much noise for privacy? [Dwork et al., TCC 2006] • Sensitivity: consider a query q: I → R. S(q) is the smallest number s.t. for any neighboring tables D, D': |q(D) − q(D')| ≤ S(q) • Theorem: if the sensitivity of the query is S(q), then the algorithm A(D) = q(D) + Lap(S(q)/ε) guarantees ε-differential privacy

  20. Example: COUNT query • D: a table with a Disease (Y/N) column in which 3 people have the disease • Query: number of people having the disease • Sensitivity = 1 • Solution: 3 + η, where η is drawn from Lap(1/ε) – Mean = 0 – Variance = 2/ε^2
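
A minimal sketch of this mechanism in Python (numpy assumed); laplace_mechanism and the sample column are illustrative names and data, not from the slides:

    import numpy as np

    def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
        """Return true_answer + Lap(sensitivity/epsilon) noise, which is
        epsilon-DP for a query with the given sensitivity."""
        rng = np.random.default_rng() if rng is None else rng
        scale = sensitivity / epsilon          # b = S(q) / epsilon
        return true_answer + rng.laplace(loc=0.0, scale=scale)

    # Illustrative Disease column with 3 "Y" records.
    disease = ["Y", "Y", "N", "Y", "N", "N", "N"]
    true_count = sum(1 for d in disease if d == "Y")   # = 3; sensitivity of COUNT is 1

    noisy_count = laplace_mechanism(true_count, sensitivity=1, epsilon=0.1)
    print(noisy_count)   # 3 + Lap(1/0.1) noise; variance 2/epsilon^2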

  21. Example: SUM query • Suppose all values x are in [a,b] • Sensitivity = b
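
A hedged sketch of the corresponding noisy SUM, assuming the values are non-negative so that the per-record sensitivity is b as stated on the slide; the variable names and data are illustrative:

    import numpy as np

    rng = np.random.default_rng()
    epsilon = 0.5
    a, b = 0.0, 100.0                    # all values assumed to lie in [a, b]
    values = [12.0, 75.5, 40.0, 99.0]    # made-up data for illustration

    true_sum = sum(values)
    noisy_sum = true_sum + rng.laplace(scale=b / epsilon)   # sensitivity = b per the slide
    print(noisy_sum)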

  22. Privacy of Laplace Mechanism • Consider neighboring databases D and D' • Consider some output O
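
The proof on this slide appears to have been an equation image that did not survive extraction; a sketch of the standard ratio argument in the slide's notation:

    Pr[A(D) = O] / Pr[A(D') = O]
      = exp(−ε|O − q(D)|/S(q)) / exp(−ε|O − q(D')|/S(q))
      = exp(ε(|O − q(D')| − |O − q(D)|)/S(q))
      ≤ exp(ε|q(D) − q(D')|/S(q))        (triangle inequality)
      ≤ exp(ε)                           (since |q(D) − q(D')| ≤ S(q))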

  23. Utility of Laplace Mechanism • The Laplace mechanism works for any function that returns a real number • Error: E(true answer − noisy answer)^2 = Var(Lap(S(q)/ε)) = 2·S(q)^2/ε^2 • Error bound: with probability at least 1 − δ, the error is no larger than (S(q)/ε)·ln(1/δ) (Roth book, Theorem 3.8)
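
A short justification of that bound, not spelled out on the slide: for Laplace noise Y ~ Lap(b), Pr[|Y| ≥ t·b] = exp(−t), so with b = S(q)/ε and t = ln(1/δ) we get Pr[|noise| ≥ (S(q)/ε)·ln(1/δ)] = δ.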

  24. Outline • Differential Privacy Definition • Basic techniques – Laplace mechanism – Exponential mechanism – Randomized Response • Composition theorems

  25. Exponential Mechanism • For functions that do not return a real number … – "what is the most common nationality in this room": Chinese/Indian/American… • When perturbation leads to invalid outputs … – To ensure integrality/non-negativity of output

  26. [MT 07] Exponential Mechanism • Consider some function f mapping inputs to outputs (f can be deterministic or probabilistic) • How do we construct a differentially private version of f?

  27. Exponential Mechanism Theorem • For a database D, output space R, and a utility score function u: D × R → R, the algorithm A with Pr[A(D) = r] ∝ exp(ε × u(D, r) / (2Δu)) satisfies ε-differential privacy, where Δu is the sensitivity of the utility score function: Δu = max over r and neighboring D, D' of |u(D, r) − u(D', r)|

  28. Example: Exponential Mechanism • Scoring/utility function u: Inputs × Outputs → R • D: nationalities of a set of people • f(D): most frequent nationality in D • u(D, O) = #(D, O), the number of people in D with nationality O
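
A minimal sketch of this example in Python (numpy assumed); exponential_mechanism and the sample data are illustrative, and Δu = 1 because adding or removing one person changes each nationality count by at most 1:

    import numpy as np

    def exponential_mechanism(data, outputs, utility, sensitivity, epsilon, rng=None):
        """Sample r with Pr[r] proportional to exp(epsilon * u(data, r) / (2 * sensitivity))."""
        rng = np.random.default_rng() if rng is None else rng
        scores = np.array([utility(data, r) for r in outputs], dtype=float)
        # Subtract the max score before exponentiating for numerical stability;
        # this does not change the sampling distribution.
        weights = np.exp(epsilon * (scores - scores.max()) / (2 * sensitivity))
        probs = weights / weights.sum()
        return rng.choice(outputs, p=probs)

    # Nationality example from the slide: u(D, O) = number of people with nationality O.
    D = ["Chinese", "Indian", "American", "Chinese", "Chinese", "Indian"]
    outputs = ["Chinese", "Indian", "American"]
    u = lambda data, r: sum(1 for x in data if x == r)

    most_common = exponential_mechanism(D, outputs, u, sensitivity=1, epsilon=1.0)
    print(most_common)   # "Chinese" is returned with the highest probability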

  29. Privacy of Exponential Mechanism • The exponential mechanism outputs an element r with probability Pr[A(D) = r] ∝ exp(ε × u(D, r) / (2Δu)), where Δu = max over r and neighboring D, D' of |u(D, r) − u(D', r)| • Roughly, Pr[A(D) = r] / Pr[A(D') = r] ≤ e^ε (the exact proof, which accounts for the normalization factor, is on page 39 of the Roth book)

  30. Privacy of Exponential Mechanism
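
The proof on this slide was also an image; a sketch of the argument, including the normalization factor mentioned above:

    Pr[A(D) = r] / Pr[A(D') = r]
      = [exp(ε·u(D, r)/(2Δu)) / exp(ε·u(D', r)/(2Δu))]
        × [Σ_r' exp(ε·u(D', r')/(2Δu)) / Σ_r' exp(ε·u(D, r')/(2Δu))]
      ≤ exp(ε/2) × exp(ε/2)              (each factor is bounded using |u(D, r) − u(D', r)| ≤ Δu)
      = e^ε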

  31. Utility of Exponential Mechanism • Can give strong utility guarantees, since it discounts outcomes exponentially based on their utility score • Highly unlikely that the returned element r has a utility score below max_r u(D, r) by more than an additive factor on the order of (2Δu/ε)·ln|R| (Theorem 3.11, Roth book)

  32. Outline • Differential Privacy Definition • Basic techniques – Laplace mechanism – Exponential mechanism – Randomized Response • Composition theorems

  33. [W 65] Randomized Response (a.k.a. local randomization) • With probability p, report the true value • With probability 1 − p, report the flipped value [Table: true Disease (Y/N) values in D and the corresponding randomized reports in O]
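
A minimal sketch of the local randomization step in Python; randomized_response and the sample data are illustrative:

    import random

    def randomized_response(true_value, p):
        """Report the true Y/N value with probability p, and the flipped value otherwise."""
        if random.random() < p:
            return true_value
        return "N" if true_value == "Y" else "Y"

    # Each respondent perturbs their own record locally before sending it to the collector.
    D = ["Y", "Y", "N", "Y", "N", "N"]
    p = 0.75
    O = [randomized_response(v, p) for v in D]
    print(O)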

  34. Differential Privacy Analysis • Consider 2 databases D, D' (of size M) that differ in the j-th value – D[j] ≠ D'[j], but D[i] = D'[i] for all i ≠ j • Consider some output O
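
The calculation on this slide was an equation image; a sketch in the slide's notation (all records other than the j-th are randomized identically and independently):

    Pr[A(D) = O] / Pr[A(D') = O]
      = Pr[report O[j] | true value D[j]] / Pr[report O[j] | true value D'[j]]
      ≤ max(p, 1 − p) / min(p, 1 − p)
      = p / (1 − p)                      (for p ≥ 1/2)

so randomized response satisfies ε-differential privacy with ε = ln(p / (1 − p)).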

  35. Utility Analysis • Suppose n1 out of n people replied "yes" and the rest said "no" • What is the best estimate for π = fraction of people with Disease = Y? π̂ = (n1/n − (1 − p)) / (2p − 1) • E(π̂) = π • Var(π̂) = π(1 − π)/n + p(1 − p)/(n(2p − 1)^2), i.e., the sampling variance plus the variance due to the coin flips
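
A hedged sketch of this estimator in Python; estimate_pi and the sample responses are illustrative:

    def estimate_pi(responses, p):
        """Unbiased estimate of the true fraction of 'Y' from randomized responses."""
        n = len(responses)
        n1 = sum(1 for r in responses if r == "Y")
        return (n1 / n - (1 - p)) / (2 * p - 1)

    # Illustrative randomized reports collected with retention probability p = 0.75.
    O = ["Y", "N", "Y", "Y", "N", "N", "Y", "N"]
    print(estimate_pi(O, p=0.75))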

  36. Laplace Mechanism vs Randomized Response: Privacy • Both provide the same ε-differential privacy guarantee • The Laplace mechanism assumes the data collector is trusted • Randomized response does not require the data collector to be trusted – also called a local algorithm, since each record is perturbed individually

  37. Laplace Mechanism vs Randomized Response: Utility • Suppose a database with N records where μN records have Disease = Y • Query: # rows with Disease = Y • Std dev of the Laplace mechanism answer: O(1/ε) • Std dev of the randomized response answer: O(√N)
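
A small numerical illustration of this comparison in Python; the parameter values are made up, and the randomized-response variance is an equivalent form of the slide-35 formula:

    import math

    # Illustrative parameters (not from the slides).
    N, epsilon, mu = 10_000, 0.5, 0.3                 # mu*N records have Disease = Y
    p = math.exp(epsilon) / (1 + math.exp(epsilon))   # retention probability giving epsilon = ln(p/(1-p))

    # Laplace mechanism on the count (sensitivity 1): noise ~ Lap(1/epsilon)
    std_laplace = math.sqrt(2) / epsilon

    # Randomized response estimate of the count is N * pi_hat,
    # with Var(pi_hat) = lam*(1-lam) / (N*(2p-1)^2)
    lam = (2 * p - 1) * mu + (1 - p)                  # probability a randomized report is "Y"
    std_rr = math.sqrt(N * lam * (1 - lam)) / (2 * p - 1)

    print(std_laplace, std_rr)                        # O(1/epsilon) vs O(sqrt(N))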

  38. Outline • Differential Privacy • Basic Algorithms – Laplace – Exponential Mechanism – Randomized Response • Composition Theorems

  39. Why Composition? • Reasoning about the privacy of a complex algorithm is hard. • Helps software design – if the building blocks are proven to be private, it is easy to reason about the privacy of a complex algorithm built entirely from these building blocks.

  40. A bound on the number of queries • In order to ensure utility, a statistical database must leak some information about each individual • We can only hope to bound the amount of disclosure • Hence, there is a limit on the number of queries that can be answered
