Foundations of Computing II, Lecture 26: Applications – Differential Privacy – PowerPoint PPT Presentation



SLIDE 1

CSE 312

Foundations of Computing II

Lecture 26: Applications – Differential Privacy

Stefano Tessaro

tessaro@cs.washington.edu


SLIDE 2

Setting


Data mining, statistical queries

Medical data, query logs, social network data, …

SLIDE 3

Setting – Data Release


Publish (over the Internet): aggregated data, e.g., the outcome of a medical study, a research paper, …

Main concern: do not violate user privacy!

SLIDE 4

Example – Linkage Attack

  • The Commonwealth of Massachusetts Group Insurance Commission (GIC) released 135,000 records of patient encounters, each with 100 attributes
    – Identifying attributes removed, but ZIP, birth date, and gender available
    – Considered “safe” practice
  • Public voter registration records
    – Contain, among others, name, address, ZIP, birth date, gender
  • Allowed identification of the medical records of William Weld, governor of MA at that time
    – He was the only man in his ZIP code with his birth date …

[Sweeney ’00] “Linkage” + more attacks! (cf. the Netflix grand prize challenge!)

SLIDE 5

One way out? Differential Privacy

  • A formal definition of privacy
    – Satisfied in systems deployed by Google, Uber, Apple, …
    – Will be used by the 2020 US Census
  • Idea: Any information-related risk to a person should not change significantly as a result of that person’s information being included, or not, in the analysis.

SLIDE 6

Ideal Privacy

[figure: x (DB w/o Stefano’s data) → Analysis → Output;  x′ (DB w/ Stefano’s data) → Analysis → Output′]

Ideally: the two outputs should be identical!

  • Fact. This notion of privacy is unattainable!

SLIDE 7

More Realistic Privacy Goal

[figure: x (DB w/o Stefano’s data) → Analysis → Output;  x′ (DB w/ Stefano’s data) → Analysis → Output′]

The two outputs should be “similar”.

SLIDE 8

Setting – Formal

[figure: x → M → M(x) ∈ ℝ (w/o Stefano’s data);  x′ → M → M(x′) ∈ ℝ (w/ Stefano’s data)]

We say that x, x′ differ at exactly one entry.

M = mechanism. Here, M is randomized, i.e., it makes random choices.

SLIDE 9

Setting – Mechanism

  • Definition. A mechanism M is ε-differentially private if for all subsets* S ⊆ ℝ, and for all databases x, x′ which differ at exactly one entry,

        ℙ[M(x) ∈ S] ≤ e^ε · ℙ[M(x′) ∈ S]

[Dwork, McSherry, Nissim, Smith ’06]

Think: ε = 1/100 or ε = 1/10

* Can be generalized beyond output in ℝ

SLIDE 10

Example – Counting Queries

  • DB is a vector x = (x₁, …, x_n), where x₁, …, x_n ∈ {0, 1}
    – E.g., x_i = 1 if individual i has the disease
    – x_i = 0 means the patient does not have the disease, or the patient’s data wasn’t recorded.
  • Query: q(x) = ∑_{i=1}^n x_i

Here: DB proximity means the vectors differ at one single coordinate.

SLIDE 11

A solution – Laplacian Noise

Mechanism M taking input x = (x₁, …, x_n):

  • Return M(x) = ∑_{i=1}^n x_i + Y, where Y follows a Laplace distribution with parameter ε:

        f_Y(y) = (ε/2) · e^{−ε|y|}

[figure: plot of the Laplace density f_Y]

E[Y] = 0,  Var(Y) = 2/ε²

“Laplacian mechanism with parameter ε”

SLIDE 12

Better Solution – Laplacian Noise

Mechanism M taking input x = (x₁, …, x_n):

  • Return M(x) = ∑_{i=1}^n x_i + Y, where Y follows a Laplace distribution with parameter ε:

        f_Y(y) = (ε/2) · e^{−ε|y|}

[figure: plot of the Laplace density f_Y]

“Laplacian mechanism with parameter ε”

Key property: For all y and Δ,

        f_Y(y) / f_Y(y + Δ) ≤ e^{ε|Δ|}
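The Laplacian mechanism can be sketched in a few lines of Python (a minimal illustration, not code from the lecture; the function names are mine). The noise sampler uses the fact that the difference of two i.i.d. Exponential(ε) variables has exactly the Laplace density f_Y(y) = (ε/2)·e^{−ε|y|}:

```python
import random

def laplace_noise(eps):
    # The difference of two i.i.d. Exponential(eps) variables is Laplace-
    # distributed with density (eps/2)*exp(-eps*|y|): mean 0, variance 2/eps**2.
    return random.expovariate(eps) - random.expovariate(eps)

def laplace_mechanism(x, eps):
    # eps-DP release of the counting query q(x) = x_1 + ... + x_n.
    return sum(x) + laplace_noise(eps)

# Example: a database of 150 patients, 100 of whom have the disease.
x = [1] * 100 + [0] * 50
print(laplace_mechanism(x, eps=0.5))   # close to 100; noise std ≈ 2.8
```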

SLIDE 13

Laplacian Mechanism – Privacy

  • Theorem. The Laplacian mechanism with parameter ε satisfies ε-differential privacy.

Proof idea: Let x, x′ differ at one entry, write c = ∑_{i=1}^n x_i and c′ = ∑_{i=1}^n x′_i, and set

        Δ = ∑_{i=1}^n x′_i − ∑_{i=1}^n x_i = c′ − c,   so |Δ| ≤ 1.

Then, for any interval [a, b]:

        ℙ[M(x) ∈ [a, b]] = ℙ[c + Y ∈ [a, b]] = ∫_a^b f_Y(y − c) dy = ∫_a^b f_Y(y − c′ + Δ) dy
                         ≤ e^{ε|Δ|} · ∫_a^b f_Y(y − c′) dy ≤ e^ε · ∫_a^b f_Y(y − c′) dy = e^ε · ℙ[M(x′) ∈ [a, b]]

SLIDE 14

How Accurate is the Laplacian Mechanism?

Let’s look at ∑_{i=1}^n x_i + Y:

  • E[∑_{i=1}^n x_i + Y] = ∑_{i=1}^n x_i + E[Y] = ∑_{i=1}^n x_i
  • Var(∑_{i=1}^n x_i + Y) = Var(Y) = 2/ε²

This is accurate enough for large enough n!
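This can be checked empirically: the added noise does not depend on n, so the relative error of the noisy count vanishes as the database grows. A small self-contained simulation (function and variable names are illustrative):

```python
import random

def noisy_count(x, eps):
    # Laplace noise via the difference of two i.i.d. Exponential(eps) samples;
    # density (eps/2)*exp(-eps*|y|), mean 0, variance 2/eps**2 -- independent of n.
    return sum(x) + random.expovariate(eps) - random.expovariate(eps)

random.seed(1)
eps = 0.1                                         # strong privacy: noise std ~ 14.1
for n in (100, 10_000, 1_000_000):
    x = [1] * (n // 2) + [0] * (n - n // 2)       # half the individuals have the disease
    est = noisy_count(x, eps)
    rel = abs(est - n // 2) / (n // 2)
    print(f"n = {n:>9}: true count {n // 2}, noisy {est:11.1f}, relative error {rel:.3%}")
```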

SLIDE 15

Differential Privacy – What else can we compute?

  • Statistics: counts, mean, median, histograms, boxplots, etc.
  • Machine learning: classification, regression, clustering, distribution learning, etc.

SLIDE 16

Differential Privacy – Nice Properties

  • Group privacy: If M is ε-differentially private, then for all S ⊆ ℝ, and for all databases x, x′ which differ at (at most) k entries,

        ℙ[M(x) ∈ S] ≤ e^{kε} · ℙ[M(x′) ∈ S]

  • Composition: If we apply two ε-DP mechanisms to the data, the combined output is 2ε-DP.
    – How much can we allow ε to grow? (So-called “privacy budget.”)
  • Post-processing: Post-processing does not decrease privacy.
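The composition property is what deployed systems track in practice: every query is charged against a total privacy budget. A minimal sketch of such an accountant (the class and method names are hypothetical, not from the lecture):

```python
class PrivacyBudget:
    """Track cumulative epsilon under basic sequential composition:
    releasing an eps1-DP result and then an eps2-DP result is (eps1+eps2)-DP."""

    def __init__(self, total_eps):
        self.total_eps = total_eps
        self.spent = 0.0

    def charge(self, eps):
        # Refuse the query if answering it would exceed the budget.
        if self.spent + eps > self.total_eps:
            raise RuntimeError("privacy budget exceeded")
        self.spent += eps

budget = PrivacyBudget(total_eps=1.0)
budget.charge(0.5)     # first eps-DP query
budget.charge(0.5)     # second query: combined release is now 1.0-DP
# budget.charge(0.1)   # would raise RuntimeError: budget exhausted
```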

SLIDE 17

Local Differential Privacy

[figure: centralized Laplacian mechanism: the aggregator collects x₁, x₂, …, x_n and publishes ∑_{i=1}^n x_i + Y]

What if we don’t trust the aggregator?

Solution: Add noise locally! Each individual reports a noisy value: x₁ + Y₁, x₂ + Y₂, …, x_n + Y_n.

SLIDE 18

Example – Randomized Response

Mechanism M taking input x = (x₁, …, x_n), for a given parameter γ:

  • For all i = 1, …, n:
    – Z_i = x_i with probability 1/2 + γ, and Z_i = 1 − x_i with probability 1/2 − γ.
    – g(x_i) = (Z_i − 1/2 + γ) / (2γ)
  • Return M(x) = ∑_{i=1}^n g(x_i)

  • S. L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965

SLIDE 19

Example – Randomized Response

Mechanism M taking input x = (x₁, …, x_n), for a given parameter γ:

  • For all i = 1, …, n:
    – Z_i = x_i with probability 1/2 + γ, and Z_i = 1 − x_i with probability 1/2 − γ.
    – g(x_i) = (Z_i − 1/2 + γ) / (2γ)
  • Return M(x) = ∑_{i=1}^n g(x_i)

  • Theorem. Randomized response with parameter γ satisfies ε-differential privacy if γ = (e^ε − 1) / (2(e^ε + 1)).

Fact 1. E[M(x)] = ∑_{i=1}^n x_i
Fact 2. Var(M(x)) ≈ n/ε²
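Randomized response is easy to sketch in Python (illustrative code, not from the lecture; it assumes the γ from the theorem above). Each individual randomizes their own bit before sending it, so the aggregator never sees raw data; the debiasing step makes each report an unbiased estimate of x_i:

```python
import math
import random

def randomized_response(x, eps):
    # gamma is chosen so that (1/2 + gamma) / (1/2 - gamma) = e^eps,
    # i.e. each individual's report is eps-DP on its own.
    gamma = (math.exp(eps) - 1) / (2 * (math.exp(eps) + 1))
    total = 0.0
    for xi in x:
        # Each individual keeps their bit w.p. 1/2 + gamma, flips it otherwise ...
        zi = xi if random.random() < 0.5 + gamma else 1 - xi
        # ... and the aggregator debiases: E[(Z_i - 1/2 + gamma) / (2*gamma)] = x_i.
        total += (zi - 0.5 + gamma) / (2 * gamma)
    return total

random.seed(2)
x = [1] * 5000 + [0] * 5000              # true count: 5000
print(randomized_response(x, eps=1.0))   # noisy but unbiased; std roughly sqrt(n)/eps
```

Note the accuracy gap versus the central model: the error here grows like √n, whereas the central Laplacian mechanism's error is independent of n; that is the price of not trusting the aggregator.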
SLIDE 20

Differential Privacy – Challenges

  • Accuracy vs. privacy: How do we choose ε?
    – Practical applications tend to err in favor of accuracy.
    – See e.g. https://arxiv.org/abs/1709.02753
  • Fairness: Differential privacy hides the contribution of small groups, by design.
    – How do we avoid excluding minorities?
    – Very hard problem!

SLIDE 21

Literature

  • Cynthia Dwork and Aaron Roth. “The Algorithmic Foundations of Differential Privacy”.
    – https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
  • https://privacytools.seas.harvard.edu/