

SLIDE 1

CS573 Data Privacy and Security Differential Privacy

Li Xiong

SLIDE 2

Outline

  • Differential Privacy Definition
  • Basic techniques
  • Composition theorems
SLIDE 3

SLIDE 4

Statistical Data Privacy

  • Non-interactive vs interactive
  • Privacy goal: individual is protected
  • Utility goal: statistical information useful for analysis

[Diagram: the data curator applies a privacy mechanism to the original data, releasing statistics/synthetic data and answering queries from the data analyst]

SLIDE 5

Recap

  • Anonymization or de-identification (input perturbation)

– Linkage attacks, homogeneity attacks

  • Query auditing/restriction

– Query denial is itself disclosive; auditing is computationally infeasible

  • Summary statistics

– Differencing attacks

SLIDE 6

Differential Privacy

  • Promise: "an individual will not be affected, adversely or otherwise, by allowing his/her data to be used in any study or analysis, no matter what other studies, data sets, or information sources are available"

  • Paradox: learning nothing about an individual while learning useful statistical information about a population

SLIDE 7

Differential Privacy

  • The statistical outcome is indistinguishable regardless of whether a particular user (record) is included in the data

SLIDE 8

Differential Privacy

  • The statistical outcome is indistinguishable regardless of whether a particular user (record) is included in the data
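The standard formalization of this indistinguishability (ε-differential privacy, as in [DMNS 06]; the formula did not survive extraction, so it is restated here):

```latex
\text{A randomized algorithm } A \text{ is } \varepsilon\text{-differentially private if, for all
neighboring databases } D, D' \text{ and every set of outputs } S,
\qquad \Pr[A(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[A(D') \in S].
```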

SLIDE 9

Differential privacy: an example

[Figure: original records, the original histogram, and the perturbed histogram with differential privacy]
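A minimal sketch of this example: perturb each histogram bucket with independent Laplace noise (the Laplace mechanism introduced later in this deck; the records and ε here are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical original records: each value is a bucket index 0..3.
records = [1, 1, 2, 2, 2, 3, 0, 1]
original_hist = np.bincount(records, minlength=4)

# Each person falls in exactly one bucket, so the histogram has
# sensitivity 1: add Lap(1/eps) noise independently to every count.
eps = 1.0
noisy_hist = original_hist + rng.laplace(scale=1.0 / eps, size=4)

print(original_hist)  # [1 3 3 1]
print(noisy_hist)     # noisy counts; may be non-integer or negative
```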

SLIDE 10

SLIDE 11

SLIDE 12

SLIDE 13

SLIDE 14

SLIDE 15

Differential Privacy: Some Qualitative Properties

  • Protection against the presence/participation of a single record
  • Quantification of privacy loss
  • Composition
  • Post-processing
SLIDE 16

Differential Privacy: Additional Remarks

  • Correlations between records
  • Granularity of a single record (the difference for neighboring databases)

– Group privacy
– Graph database (e.g. social networks): node vs edge
– Movie rating database: user vs event (movie)

SLIDE 17

Outline

  • Differential Privacy Definition
  • Basic techniques

– Laplace mechanism
– Exponential mechanism
– Randomized response

  • Composition theorems
SLIDE 18

SLIDE 19

Can deterministic algorithms satisfy differential privacy?

Tutorial: Differential Privacy in the Wild, Module 2

SLIDE 20

Non-trivial deterministic algorithms do not satisfy differential privacy

[Figure: the space of all inputs mapped to the space of all outputs (at least 2 distinct outputs)]


SLIDE 21

Each input is mapped to a distinct output.

Non-trivial deterministic algorithms do not satisfy differential privacy


SLIDE 22

There exist two inputs that differ in one entry but are mapped to different outputs, so some output has Pr > 0 under one input and Pr = 0 under its neighbor: the probability ratio is unbounded.


SLIDE 23

Output Randomization

  • Add noise to answers such that:

– Each answer does not leak too much information about the database.
– Noisy answers are close to the original answers.

[Diagram: the researcher sends a query to the database; noise is added to the true answer before it is returned]


SLIDE 24

Laplace Mechanism

[Figure: the Laplace distribution Lap(S/ε)]

[Diagram: the researcher sends query q to the database; noise η is added to the true answer q(D), and q(D) + η is returned]


[DMNS 06]

SLIDE 25

Laplace Distribution

  • PDF: f(x | μ, b) = (1/(2b)) exp(−|x − μ|/b)
  • Denoted Lap(b) when μ = 0
  • Mean μ
  • Variance 2b²
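These properties can be checked numerically (a quick sketch; NumPy parameterizes the distribution by loc = μ and scale = b):

```python
import numpy as np

rng = np.random.default_rng(42)
b = 2.0
samples = rng.laplace(loc=0.0, scale=b, size=1_000_000)

# Empirical moments should match the slide: mean ≈ μ = 0, variance ≈ 2b² = 8.
print(samples.mean())  # close to 0
print(samples.var())   # close to 8
```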
SLIDE 26

How much noise for privacy?

Sensitivity: Consider a query q: I → R. S(q) is the smallest number such that for any neighboring tables D, D′: |q(D) − q(D′)| ≤ S(q).

Theorem: If the sensitivity of the query is S(q), then the algorithm A(D) = q(D) + Lap(S(q)/ε) guarantees ε-differential privacy.
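The theorem translates directly into code (a minimal sketch; the query and table are illustrative, matching the COUNT example on the next slide):

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(query, D, sensitivity, eps):
    """A(D) = q(D) + Lap(S(q)/eps): eps-differentially private by the theorem above."""
    return query(D) + rng.laplace(scale=sensitivity / eps)

# COUNT query: number of people with the disease. Adding or removing one
# person changes the count by at most 1, so S(q) = 1.
D = ["Y", "Y", "N", "Y", "N", "N"]
count_yes = lambda table: sum(1 for v in table if v == "Y")

noisy_count = laplace_mechanism(count_yes, D, sensitivity=1, eps=0.5)
print(noisy_count)  # the true count 3, plus Lap(2) noise
```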

[Dwork et al., TCC 2006]

SLIDE 27

Example: COUNT query

  • Number of people having the disease
  • Sensitivity = 1
  • Solution: 3 + η, where η is drawn from Lap(1/ε)

– Mean = 0
– Variance = 2/ε²

Input D: Disease (Y/N) = Y, Y, N, Y, N, N

SLIDE 28

Example: SUM query

  • Suppose all values x are in [a, b]
  • Sensitivity = b (assuming 0 ≤ a; otherwise the sensitivity is max(|a|, |b|))
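A sketch for the SUM query, clipping values to [a, b] so the stated sensitivity actually holds (this assumes 0 ≤ a; the data and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_sum(values, a, b, eps):
    """Clip to [a, b], then add Lap(b/eps) noise (sensitivity b when 0 <= a)."""
    clipped = np.clip(values, a, b)
    return clipped.sum() + rng.laplace(scale=b / eps)

ages = np.array([23, 35, 47, 102, 29])  # hypothetical data
result = noisy_sum(ages, a=0, b=100, eps=1.0)
print(result)  # the clipped true sum 234, plus Lap(100) noise
```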

SLIDE 29

Privacy of Laplace Mechanism

  • Consider neighboring databases D and D’
  • Consider some output O
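The density-ratio argument these bullets set up, for A(D) = q(D) + Lap(S(q)/ε) (standard proof sketch, reconstructed since the math did not survive extraction):

```latex
\frac{\Pr[A(D) = O]}{\Pr[A(D') = O]}
= \frac{\exp\left(-\varepsilon\,|O - q(D)|/S(q)\right)}
       {\exp\left(-\varepsilon\,|O - q(D')|/S(q)\right)}
\le \exp\left(\frac{\varepsilon\,|q(D) - q(D')|}{S(q)}\right)
\le e^{\varepsilon},
```

using the triangle inequality and |q(D) − q(D′)| ≤ S(q); the symmetric direction gives the matching lower bound e^{−ε}.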


SLIDE 30

Utility of Laplace Mechanism

  • The Laplace mechanism works for any function that returns a real number
  • Error: E(true answer − noisy answer)² = Var(Lap(S(q)/ε)) = 2·S(q)²/ε²
  • Error bound: it is very unlikely that the result has an error greater than a factor (Roth book, Theorem 3.8)
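For a single query, the referenced bound (Theorem 3.8 in the Dwork–Roth book, stated here from memory as a reference) specializes to:

```latex
\Pr\left[\, \big|q(D) - A(D)\big| \;\ge\; \ln\!\left(\frac{1}{\delta}\right)\cdot\frac{S(q)}{\varepsilon} \,\right] \;\le\; \delta,
```

which follows from the Laplace tail bound Pr[|η| ≥ t·b] = e^{−t}.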


SLIDE 31

Outline

  • Differential Privacy Definition
  • Basic techniques

– Laplace mechanism
– Exponential mechanism
– Randomized response

  • Composition theorems
SLIDE 32

Exponential Mechanism

  • For functions that do not return a real number

– "What is the most common nationality in this room?": Chinese/Indian/American…

  • When perturbation leads to invalid outputs

– To ensure integrality/non-negativity of the output


SLIDE 33

Exponential Mechanism

Consider some function f (can be deterministic or probabilistic): How to construct a differentially private version of f?

[Figure: a mapping from inputs to outputs]

[MT 07]

SLIDE 34

Exponential Mechanism

Theorem: For a database D, an output space R, and a utility score function u: D × R → ℝ, the algorithm A with

Pr[A(D) = r] ∝ exp(ε × u(D, r) / (2Δu))

satisfies ε-differential privacy, where Δu is the sensitivity of the utility score function:

Δu = max over r and neighboring D, D′ of |u(D, r) − u(D′, r)|

SLIDE 35

Example: Exponential Mechanism

  • Scoring/utility function u: Inputs × Outputs → ℝ
  • D: nationalities of a set of people
  • f(D): most frequent nationality in D
  • u(D, O) = #(D, O), the number of people with nationality O
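A sketch of the mechanism on this example (Δu = 1 since one person changes any nationality count by at most 1; the room's data is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(7)

def exponential_mechanism(D, outputs, u, delta_u, eps):
    """Sample r with probability proportional to exp(eps * u(D, r) / (2 * delta_u))."""
    scores = np.array([u(D, r) for r in outputs], dtype=float)
    # Shift by the max score before exponentiating for numerical stability;
    # this does not change the sampling probabilities.
    weights = np.exp(eps * (scores - scores.max()) / (2 * delta_u))
    return rng.choice(outputs, p=weights / weights.sum())

D = ["Chinese"] * 5 + ["Indian"] * 3 + ["American"] * 2
outputs = ["Chinese", "Indian", "American"]
u = lambda d, r: d.count(r)  # u(D, O) = #(D, O)

winner = exponential_mechanism(D, outputs, u, delta_u=1, eps=2.0)
print(winner)  # usually "Chinese", but every nationality has nonzero probability
```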


SLIDE 36

Privacy of Exponential Mechanism

The exponential mechanism outputs an element r with probability

Pr[A(D) = r] ∝ exp(ε × u(D, r) / (2Δu)), where Δu = max over r and neighboring D, D′ of |u(D, r) − u(D′, r)|

Ignoring the normalization factor, Pr[A(D) = r] / Pr[A(D′) = r] ≤ e^ε

(Exact proof with normalization factor: Roth Book page 39)

SLIDE 37

Privacy of Exponential Mechanism

SLIDE 38

Utility of Exponential Mechanism

  • Can give strong utility guarantees, since it discounts outcomes exponentially based on the utility score
  • It is highly unlikely that the returned element r has a utility score inferior to maxr u(D,r) by an additive factor of (Theorem 3.11, Roth book)
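The additive factor referenced (Theorem 3.11 in the Dwork–Roth book, reproduced here from memory, so it should be checked against the book):

```latex
\Pr\left[\, u\big(D, A(D)\big) \;\le\; \max_{r \in R} u(D, r) \;-\; \frac{2\Delta u}{\varepsilon}\big(\ln|R| + t\big) \,\right] \;\le\; e^{-t}.
```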

SLIDE 39

Outline

  • Differential Privacy Definition
  • Basic techniques

– Laplace mechanism
– Exponential mechanism
– Randomized response

  • Composition theorems
SLIDE 40

Randomized Response (a.k.a. local randomization)

Input D: Disease (Y/N) = Y, Y, N, Y, N, N


With probability p, report the true value; with probability 1 − p, report the flipped value.

Output O: Disease (Y/N) = Y, N, N, N, Y, N
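The flipping step as a sketch (p = 0.75 and the table are illustrative):

```python
import random

random.seed(3)

def randomized_response(value, p):
    """Report the true Y/N value with probability p, the flipped value otherwise."""
    if random.random() < p:
        return value
    return "N" if value == "Y" else "Y"

D = ["Y", "Y", "N", "Y", "N", "N"]
O = [randomized_response(v, p=0.75) for v in D]
print(O)  # each entry independently kept (prob 0.75) or flipped (prob 0.25)
```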


[W 65]

SLIDE 41

Differential Privacy Analysis

  • Consider two databases D, D′ (of size M) that differ in the j-th value

– D[j] ≠ D′[j], but D[i] = D′[i] for all i ≠ j

  • Consider some output O
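The ratio calculation these bullets set up (standard argument; all positions i ≠ j contribute identical factors that cancel):

```latex
\frac{\Pr[O \mid D]}{\Pr[O \mid D']}
= \frac{\Pr\big[O[j] \mid D[j]\big]}{\Pr\big[O[j] \mid D'[j]\big]}
\le \frac{p}{1-p},
\qquad\text{so randomized response satisfies } \varepsilon = \ln\frac{p}{1-p} \ \ (\text{for } p > 1/2).
```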


SLIDE 42

Utility Analysis

  • Suppose n1 out of n people replied "yes" and the rest said "no"
  • The best estimate for π = fraction of people with Disease = Y:

π̂ = (n1/n − (1 − p)) / (2p − 1)

  • E(π̂) = π
  • Var(π̂) = π(1 − π)/n + p(1 − p)/(n(2p − 1)²), where the first term is the sampling variance and the second is the variance due to the coin flips


SLIDE 43

Laplace Mechanism vs Randomized Response

Privacy

  • Both provide the same ε-differential privacy guarantee
  • The Laplace mechanism assumes the data collector is trusted
  • Randomized response does not require the data collector to be trusted

– Also called a local algorithm, since each record is perturbed individually


SLIDE 44

Laplace Mechanism vs Randomized Response

Utility

  • Suppose a database with N records, where μN records have Disease = Y
  • Query: # rows with Disease = Y
  • Std dev of the Laplace mechanism answer: O(1/ε)
  • Std dev of the randomized response answer: O(√N)


SLIDE 45

Outline

  • Differential Privacy
  • Basic Algorithms

– Laplace
– Exponential Mechanism
– Randomized Response

  • Composition Theorems


SLIDE 46

Why Composition?

  • Reasoning about the privacy of a complex algorithm is hard
  • Composition helps software design

– If the building blocks are proven to be private, it is easy to reason about the privacy of a complex algorithm built entirely from these building blocks


SLIDE 47

A bound on the number of queries

  • In order to ensure utility, a statistical database must leak some information about each individual
  • We can only hope to bound the amount of disclosure
  • Hence, there is a limit on the number of queries that can be answered


SLIDE 48

Composition theorems

Sequential composition: ∑i εi differential privacy
Parallel composition: max(εi) differential privacy

SLIDE 49

Sequential Composition

  • If M1, M2, ..., Mk are algorithms that access a private database D such that each Mi satisfies εi-differential privacy, then the combination of their outputs satisfies ε-differential privacy with ε = ε1 + ... + εk


SLIDE 50

Parallel Composition

  • If M1, M2, ..., Mk are algorithms that access disjoint databases D1, D2, …, Dk such that each Mi satisfies εi-differential privacy, then the combination of their outputs satisfies ε-differential privacy with ε = max{ε1, ..., εk}
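The two composition rules suggest a simple privacy-budget accountant (a sketch; the function names are hypothetical):

```python
def sequential_budget(epsilons):
    """Total privacy loss when mechanisms access the same database."""
    return sum(epsilons)

def parallel_budget(epsilons):
    """Total privacy loss when mechanisms access disjoint partitions."""
    return max(epsilons)

# Three queries against the same table: the budgets add up.
print(sequential_budget([1.0, 0.5, 0.5]))  # 2.0
# One eps = 0.5 query per disjoint age group: the budget is the max.
print(parallel_budget([0.5, 0.5, 0.5]))    # 0.5
```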


SLIDE 51

Postprocessing

  • If M1 is an ε-differentially private algorithm that accesses a private database D, then outputting M2(M1(D)) also satisfies ε-differential privacy, for any M2 that does not access D directly


SLIDE 52

Summary

  • Differential privacy ensures an attacker can't infer the presence or absence of a single record in the input from any output
  • Building blocks

– Laplace mechanism, exponential mechanism, (local) randomized response

  • Composition rules help build complex algorithms from these building blocks


SLIDE 53

Case Study: K-means Clustering


SLIDE 54

Kmeans

  • Partition a set of points x1, x2, …, xn into k clusters S1, S2, …, Sk such that the following is minimized:

∑i ∑x∈Si ‖x − μi‖², where μi is the mean of cluster Si


SLIDE 55

Kmeans

Algorithm:

  • Initialize a set of k centers
  • Repeat until convergence:

– Assign each point to its nearest center
– Recompute the set of centers

  • Output the final set of k centers


SLIDE 56

Differentially Private Kmeans

  • Suppose we fix the number of iterations to T
  • In each iteration (given a set of centers):
  • 1. Assign each point to its nearest center to form clusters
  • 2. Noisily compute the size of each cluster
  • 3. Compute noisy sums of points in each cluster


[BDMN 05]

SLIDE 57

Differentially Private Kmeans

  • Suppose we fix the number of iterations to T
  • In each iteration (given a set of centers):
  • 1. Assign each point to its nearest center to form clusters
  • 2. Noisily compute the size of each cluster
  • 3. Compute noisy sums of points in each cluster


Each iteration uses an ε/T privacy budget, so the total privacy loss is ε

SLIDE 58

Differentially Private Kmeans

  • Suppose we fix the number of iterations to T
  • In each iteration (given a set of centers):
  • 1. Assign each point to its nearest center to form clusters
  • 2. Noisily compute the size of each cluster
  • 3. Compute noisy sums of points in each cluster


Exercise: Which of these steps expends privacy budget?

SLIDE 59

Differentially Private Kmeans

  • Suppose we fix the number of iterations to T
  • In each iteration (given a set of centers):
  • 1. Assign each point to its nearest center to form clusters
  • 2. Noisily compute the size of each cluster
  • 3. Compute noisy sums of points in each cluster


Exercise: Which of these steps expends privacy budget? Step 1: NO. Step 2: YES. Step 3: YES.

SLIDE 60

Differentially Private Kmeans

  • Suppose we fix the number of iterations to T
  • In each iteration (given a set of centers):
  • 1. Assign each point to its nearest center to form clusters
  • 2. Noisily compute the size of each cluster
  • 3. Compute noisy sums of points in each cluster


What is the sensitivity? Step 2 (cluster sizes): 1. Step 3 (sums): the domain size |dom|.

SLIDE 61

Differentially Private Kmeans

  • Suppose we fix the number of iterations to T
  • In each iteration (given a set of centers):
  • 1. Assign the points to the new center to form clusters
  • 2. Noisily compute the size of each cluster
  • 3. Compute noisy sums of points in each cluster


Each iteration uses an ε/T privacy budget, so the total privacy loss is ε. Splitting each iteration's budget between the two noisy queries gives Laplace(2T/ε) noise for the cluster sizes and Laplace(2T·|dom|/ε) noise for the sums.
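One noisy iteration can be sketched as follows (a 1-D illustration; the data, domain size, and budget split are assumptions following the Laplace(2T/ε) and Laplace(2T·|dom|/ε) annotations above):

```python
import numpy as np

rng = np.random.default_rng(5)

def dp_kmeans_iteration(points, centers, eps, T, dom):
    """One DP k-means iteration: exact assignment, noisy sizes and noisy sums."""
    centers = np.asarray(centers, dtype=float)
    # Step 1: assign each point to its nearest center. No budget is spent:
    # the assignments are internal and never released.
    labels = np.argmin(np.abs(points[:, None] - centers[None, :]), axis=1)
    new_centers = []
    for j in range(len(centers)):
        cluster = points[labels == j]
        # Step 2: noisy cluster size (sensitivity 1) -> Lap(2T/eps).
        size = len(cluster) + rng.laplace(scale=2 * T / eps)
        # Step 3: noisy sum (sensitivity |dom|) -> Lap(2T * |dom| / eps).
        total = cluster.sum() + rng.laplace(scale=2 * T * dom / eps)
        new_centers.append(total / max(size, 1e-6))  # guard tiny/negative sizes
    return np.array(new_centers)

points = np.array([1.0, 2.0, 1.5, 8.0, 9.0, 8.5])
new_centers = dp_kmeans_iteration(points, centers=[0.0, 10.0], eps=1.0, T=10, dom=10)
print(new_centers)  # noisy estimates of the two cluster means
```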

SLIDE 62

Results (T = 10 iterations, random initialization)


[Figure: clustering results of the original k-means algorithm vs the Laplace k-means algorithm]

  • Even though we noisily compute centers, Laplace k-means can distinguish clusters that are far apart.
  • Since we add noise to the sums with sensitivity proportional to |dom|, Laplace k-means can't distinguish small clusters that are close by.


SLIDE 63

Privacy as Constrained Optimization

  • Three axes:

– Privacy
– Error
– Queries that can be answered

  • E.g.: given a fixed set of queries and a privacy budget ε, what is the minimum error that can be achieved?
  • E.g.: given a task and a privacy budget ε, how to design a set of queries (functions) and allocate the budget so that the error is minimized?


SLIDE 64

References

[W65] Warner, "Randomized Response", JASA 1965
[DN03] Dinur, Nissim, "Revealing information while preserving privacy", PODS 2003
[BDMN05] Blum, Dwork, McSherry, Nissim, "Practical privacy: the SuLQ framework", PODS 2005
[D06] Dwork, "Differential Privacy", ICALP 2006
[DMNS06] Dwork, McSherry, Nissim, Smith, "Calibrating noise to sensitivity in private data analysis", TCC 2006
[MT07] McSherry, Talwar, "Mechanism Design via Differential Privacy", FOCS 2007
