New Directions in Privacy-preserving Machine Learning


SLIDE 1

New Directions in Privacy-preserving Machine Learning

Kamalika Chaudhuri, University of California, San Diego

SLIDE 2

Sensitive Data

Medical Records Genetic Data Search Logs

SLIDE 3

AOL Violates Privacy

SLIDE 4

AOL Violates Privacy

SLIDE 5

Netflix Violates Privacy [NS08]

[Figure: users × movies ratings matrix]

Knowing 2-8 of Alice's movie ratings and their dates reveals: whether Alice is in the dataset, and Alice's other movie ratings.

SLIDE 6

High-dimensional Data is Unique

Example: UCSD Employee Salary Table

Description: Faculty position, Female, CSE department, SE Asian ethnicity → Salary

One employee (Kamalika) fits the description!

SLIDE 7

Simply anonymizing data is unsafe!

SLIDE 8

Disease Association Studies [WLWTZ09]

Comparing correlations (R² values) between the Cancer group and the Healthy group, Alice's DNA reveals whether Alice is in the Cancer set or the Healthy set.

SLIDE 9

Simply anonymizing data is unsafe! Statistics on small data sets are unsafe!

[Figure: three-way trade-off between Privacy, Accuracy, and Data Size]

SLIDE 10

Correlated Data

User information in social networks; physical activity monitoring.

SLIDE 11

Why is Privacy Hard for Correlated Data?

A neighbor's information leaks information about the user.

SLIDE 12

How do we learn from sensitive data while still preserving privacy?

Talk Agenda:

New Directions:

  • 1. Privacy-preserving Bayesian Learning
  • 2. Privacy-preserving statistics on correlated data
SLIDE 13

Talk Agenda:

  • 1. Privacy for Uncorrelated Data
  • How to define privacy
SLIDE 14

Differential Privacy [DMNS06]

[Figure: two datasets that differ in one person's record pass through the same randomized algorithm; the two output distributions are "similar".]

Participation of a single person does not change the output.

SLIDE 15

Differential Privacy: Attacker’s View

Prior Knowledge + Algorithm Output on (Data + Alice's record) ⇒ Conclusion 1

Prior Knowledge + Algorithm Output on (Data + someone else's record) ⇒ Conclusion 2

Differential privacy guarantees that the attacker reaches essentially the same conclusion either way.
SLIDE 16

Differential Privacy [DMNS06]

If A is an ε-differentially-private randomized algorithm, then for all D1, D2 that differ in one person's value, and for any set S of outputs:

Pr[A(D1) ∈ S] ≤ e^ε · Pr[A(D2) ∈ S]

SLIDE 17

Differential Privacy

  • 1. Provably strong notion of privacy
  • 2. Good approximations for many functions

e.g., means, histograms, etc. (see the sketch below)
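To make this concrete, here is a minimal sketch (my illustration, not code from the talk) of the Laplace mechanism, the standard way to release ε-DP means and histograms; the clipping bounds and all function names are assumptions.

```python
import numpy as np

def private_mean(data, lo, hi, eps, rng=None):
    """eps-DP mean of values assumed to lie in [lo, hi].

    Changing one record moves the clipped sum by at most (hi - lo), so
    the mean has sensitivity (hi - lo)/n; Laplace noise of that scale
    divided by eps yields eps-DP.
    """
    rng = rng or np.random.default_rng()
    values = np.clip(np.asarray(data, dtype=float), lo, hi)
    sensitivity = (hi - lo) / len(values)
    return values.mean() + rng.laplace(scale=sensitivity / eps)

def private_histogram(data, bins, eps, rng=None):
    """eps-DP histogram: adding or removing one record changes one bin
    count by 1, so per-bin Laplace noise of scale 1/eps suffices."""
    rng = rng or np.random.default_rng()
    counts, edges = np.histogram(data, bins=bins)
    return counts + rng.laplace(scale=1.0 / eps, size=counts.shape), edges

# Example: ages of 1000 people, total eps = 0.5 per release
ages = np.random.default_rng(0).integers(18, 90, size=1000)
print(private_mean(ages, lo=18, hi=90, eps=0.5))
print(private_histogram(ages, bins=8, eps=0.5)[0])
```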

SLIDE 18

Interpretation: Attacker’s Hypothesis Test [WZ10, OV13]

Failure events: False Alarm (FA) and Missed Detection (MD), for the test between:
H0: Input to the algorithm = Data + person A's record
H1: Input to the algorithm = Data + person B's record

SLIDE 19

Interpretation: Attacker’s Hypothesis Test [WZ10, OV13]

If the algorithm is ε-DP, then:

Pr(FA) + e^ε · Pr(MD) ≥ 1
e^ε · Pr(FA) + Pr(MD) ≥ 1

(FA = False Alarm, MD = Missed Detection.) So the feasible pairs (Pr(FA), Pr(MD)) lie above the segments joining (1, 0) and (0, 1) to the corner point (1/(1 + e^ε), 1/(1 + e^ε)): no attacker's test can make both errors small.
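A quick numeric sanity check of these inequalities (again my own illustration, not from the talk): for the Laplace mechanism on a sensitivity-1 query and the natural threshold test, both bounds hold.

```python
import numpy as np

eps = 1.0
scale = 1.0 / eps                 # Laplace scale for a sensitivity-1 query
f_D1, f_D2 = 0.0, 1.0             # query values on two neighboring datasets

def laplace_tail(x, loc, scale):
    # Pr(Laplace(loc, scale) > x)
    z = (x - loc) / scale
    return 0.5 * np.exp(-z) if z >= 0 else 1.0 - 0.5 * np.exp(z)

threshold = 0.5                   # report H1 when the noisy output exceeds this
FA = laplace_tail(threshold, f_D1, scale)        # H0 true, but we say H1
MD = 1.0 - laplace_tail(threshold, f_D2, scale)  # H1 true, but we say H0
print(FA, MD)                                    # both ~0.303 for eps = 1
print(FA + np.exp(eps) * MD >= 1.0)              # True
print(np.exp(eps) * FA + MD >= 1.0)              # True
```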

SLIDE 20

Talk Agenda:

  • 1. Privacy for Uncorrelated Data
  • How to define privacy
  • Privacy-preserving Learning
SLIDE 21

Example 1: Flu Test

Predicts flu or not based on patient symptoms; trained on sensitive patient data.

SLIDE 22

Example 2: Clustering Abortion Data

Given data on abortion locations, cluster by location while preserving privacy of individuals

SLIDE 23

Bayesian Learning

SLIDE 24

Bayesian Learning

Data X = { x1, x2, … } and model class Θ, related through the likelihood p(x|θ).

SLIDE 25

Bayesian Learning

Data X = { x1, x2, … } and model class Θ, related through the likelihood p(x|θ); plus a prior π(θ) on Θ.

SLIDE 26

Bayesian Learning

Prior π(θ) + Data X = { x1, x2, … }, related through the likelihood p(x|θ).

SLIDE 27

Bayesian Learning

Prior π(θ) + Data X = Posterior p(θ|X), for data X = { x1, x2, … } and model class Θ related through the likelihood p(x|θ).

SLIDE 28

Bayesian Learning

Prior π(θ) + Data X = Posterior p(θ|X), for data X = { x1, x2, … } and model class Θ related through the likelihood p(x|θ).

Goal: Output posterior (approx. or samples)

SLIDE 29

Example: Coin tosses

X = { H, T, H, H, … }, Θ = [0, 1], likelihood: p(x|θ) = θ^x (1 − θ)^(1−x)

SLIDE 30

Example: Coin tosses

X = { H, T, H, H, … }, Θ = [0, 1], likelihood: p(x|θ) = θ^x (1 − θ)^(1−x)

+ Prior: π(θ) = 1 (uniform on [0, 1])

SLIDE 31

Example: Coin tosses

X = { H, T, H, H, … }, Θ = [0, 1], likelihood: p(x|θ) = θ^x (1 − θ)^(1−x)

+ Prior π(θ) = 1 + Data X (h heads, t tails)

SLIDE 32

Example: Coin tosses

X = { H, T, H, H, … }, Θ = [0, 1], likelihood: p(x|θ) = θ^x (1 − θ)^(1−x)

Prior π(θ) = 1 + Data X (h heads, t tails) = Posterior p(θ|X) ∝ θ^h (1 − θ)^t

SLIDE 33

Example: Coin tosses

X = { H, T, H, H, … }, Θ = [0, 1], likelihood: p(x|θ) = θ^x (1 − θ)^(1−x)

Prior π(θ) = 1 + Data X (h heads, t tails) = Posterior p(θ|X) ∝ θ^h (1 − θ)^t

In general, θ is more complex (classifiers, etc.).
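The coin-toss posterior in code (a small sketch under the slide's uniform prior): with h heads and t tails, p(θ|X) ∝ θ^h (1 − θ)^t is exactly a Beta(h + 1, t + 1) distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random(100) < 0.7              # 100 tosses of a coin with true theta = 0.7
h, t = int(X.sum()), int((~X).sum())   # heads and tails counts
theta_samples = rng.beta(h + 1, t + 1, size=10_000)  # posterior Beta(h+1, t+1)
print(theta_samples.mean())            # ~ (h + 1) / (h + t + 2), near 0.7
```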

SLIDE 34

Private Bayesian Learning

Prior π(θ) + Data X = Posterior p(θ|X), for data X = { x1, x2, … } and model class Θ related through the likelihood p(x|θ).

SLIDE 35

Private Bayesian Learning

Prior π(θ) + Data X = Posterior p(θ|X), for data X = { x1, x2, … } and model class Θ related through the likelihood p(x|θ).

Goal: Output private approx. to posterior

SLIDE 36

How to make posterior private?

Option 1: Direct posterior sampling [Detal14]. Not private except under restrictive conditions: the posteriors p(θ|D) and p(θ|D′) on neighboring datasets can differ too much.

SLIDE 37

How to make posterior private?

Option 2: Sample from a truncated posterior at high temperature [WFS15]. Disadvantages: intractable (technically, privacy holds only at convergence) and needs more data/subjects.

SLIDE 38

Our Work: Exponential Families

Exponential family distributions: p(x|θ) = h(x) exp(θᵀ T(x) − A(θ)), where T is a sufficient statistic. Includes many common distributions, like Gaussians, Binomials, Dirichlets, Betas, etc.

SLIDE 39

Properties of Exponential Families

Exponential families have conjugate priors:

Prior π(θ) + Data X = Posterior p(θ|X), where the posterior p(θ|X) is in the same distribution class as the prior π(θ). E.g., Gaussian-Gaussian, Beta-Binomial, etc.

SLIDE 40

Sampling from Exponential Families

(Non-private) posterior comes from an exponential family: given data x1, x2, …,

p(θ|x) ∝ exp( η(θ)ᵀ (Σᵢ T(xᵢ)) − B(θ) )

Private Sampling:

  • 1. If T is bounded, add noise to Σᵢ T(xᵢ) to get a private version T′
  • 2. Sample from the perturbed posterior: p(θ|x) ∝ exp( η(θ)ᵀ T′ − B(θ) )

(A minimal sketch follows below.)
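A minimal sketch of the two-step recipe for the Bernoulli/Beta coin-toss case: T(x) = x is bounded in [0, 1], so the sum Σᵢ T(xᵢ) has sensitivity 1 and Laplace noise of scale 1/ε privatizes it. The clipping to [0, n] and the uniform prior are my assumptions, not necessarily the talk's exact choices.

```python
import numpy as np

def private_beta_posterior_sample(X, eps, rng=None):
    rng = rng or np.random.default_rng()
    n = len(X)
    S = float(np.sum(X))                        # sum_i T(x_i), with T(x) = x in [0, 1]
    S_priv = S + rng.laplace(scale=1.0 / eps)   # step 1: sensitivity-1 sum + Laplace noise
    S_priv = float(np.clip(S_priv, 0.0, n))     # keep a valid Beta parameterization (my choice)
    # step 2: sample from the perturbed posterior; uniform prior -> Beta
    return rng.beta(S_priv + 1.0, n - S_priv + 1.0)

rng = np.random.default_rng(1)
X = (rng.random(500) < 0.7).astype(float)       # 500 coin tosses, true theta = 0.7
print(private_beta_posterior_sample(X, eps=0.5, rng=rng))
```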

SLIDE 41

Performance

  • Theoretical Guarantees
  • Experiments
SLIDE 42

Theoretical Guarantees

Performance Measure: Asymptotic Relative Efficiency (lower = more sample-efficient for large n)

Non-private: 2
Our Method: 2
[WFS15]: max(2, 1 + 1/ε)

SLIDE 43

Experiments - Task

Task: Time-series clustering of events in WikiLeaks war logs while preserving event-level privacy
Data: War-log entries - Afghanistan (75K), Iraq (390K)
Goal: Cluster entries in each region based on features (casualty counts, enemy/friendly fire, explosive hazards, etc.)

SLIDE 44

Experiments - Model

Hidden Markov Model for each region, with discrete hidden states ht and observed features xt.
Transition parameters T: Tij = P(ht+1 = i | ht = j)
Emission parameters O: Oij = P(xt = i | ht = j)
Goal: Sample from the posterior P(O | data), which is in the exponential family. (A simplified sketch follows below.)
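Roughly the flavor of this setup, heavily simplified: each emission row of O has a Dirichlet conjugate prior, and the state-observation counts are bounded sufficient statistics, so the private-sampling recipe above applies. The actual method runs inside MCMC over hidden states; this sketch assumes the state assignments are given and privatizes a single row's update.

```python
import numpy as np

def private_emission_row_sample(counts, eps, alpha=1.0, rng=None):
    """One privatized Gibbs-style update for a single emission row:
    noise the per-symbol counts (one event changes one count by 1),
    clip to keep valid Dirichlet parameters, then sample."""
    rng = rng or np.random.default_rng()
    noisy = counts + rng.laplace(scale=1.0 / eps, size=counts.shape)
    noisy = np.clip(noisy, 0.0, None)
    return rng.dirichlet(noisy + alpha)   # perturbed posterior sample

counts = np.array([40.0, 7.0, 3.0])       # observations emitted from one hidden state
print(private_emission_row_sample(counts, eps=1.0))
```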

SLIDE 45

Experiments - Results

[Figure: test-set log-likelihood vs. total epsilon for the Iraq and Afghanistan logs. Curves: Non-private HMM, Non-private naive Bayes, Laplace mechanism HMM, OPS HMM (truncation multiplier = 100).]

SLIDE 46

Experiments - States

[Figure: emission probabilities for Iraq State 1 and Iraq State 2 across event categories (criminal event, enemy action, explosive hazard, friendly action, friendly fire, suspicious incident, threat report, …), event types (IED found/cleared, IED explosion, direct fire, indirect fire, detain, raid, …), and casualty types (friendly and host, civilian, enemy).]

SLIDE 47

Experiments - Clustering

[Figure: inferred state (State 1 vs. State 2) over time, Jan 2004 - Jan 2008, for each region code (MND-BAGHDAD, MND-C, MND-N, MND-SE, MNF-W), with the peak-troops and surge-announcement dates marked.]

SLIDE 48

Conclusion

New method for private posterior sampling from exponential families. Open Problems:

  • 1. Private sampling from more complex posteriors
  • 2. Private versions of other Bayesian posterior approximation schemes (variational Bayes, etc.)
  • 3. Combining Bayesian inference with more relaxed forms of DP (e.g., concentrated DP, distributional DP, etc.)

SLIDE 49

Talk Agenda:

  • 1. Privacy for Uncorrelated Data
  • 2. Privacy for Correlated Data
  • How to define privacy
  • Privacy-preserving Bayesian Learning
SLIDE 50

Example 1: Activity Monitoring

Share aggregate data on physical activity with doctor or provider, while hiding activity at each specific time

SLIDE 51

Example 2: Spread of Flu in Network

Publish aggregate statistics while preserving individual privacy. [Figure: interaction network]

SLIDE 52

Why is Differential Privacy not Enough for Correlated Data?

SLIDE 53

Example: Activity Monitoring

D = (x1, …, xT), xt = activity at time t. [Figure: correlation network]

Goal: (1) Publish the activity histogram; (2) Prevent the adversary from learning the activity at any time t.

SLIDE 54

Example: Activity Monitoring

D = (x1, …, xT), xt = activity at time t

1-DP: Output histogram of activities + noise with stdev 1.

SLIDE 55

Example: Activity Monitoring

D = (x1, …, xT), xt = activity at time t

1-DP: Output histogram of activities + noise with stdev 1. Not enough - activities across time are highly correlated!

SLIDE 56

Example: Activity Monitoring

D = (x1, …, xT), xt = activity at time t

1-Group-DP: Output histogram of activities + noise with stdev T. Too much noise - no utility!

SLIDE 57

Talk Agenda:

  • 1. Privacy for Uncorrelated Data
  • 2. Privacy for Correlated Data
  • How to define privacy
  • Privacy-preserving Classification
  • How to define privacy
SLIDE 58

Pufferfish Privacy [KM12]

Secret Set S: the information to be protected. E.g., "Alice's age is 25", "Bob has a disease".

SLIDE 59

Pufferfish Privacy [KM12]

Secret Pairs Set Q: pairs of secrets we want to be indistinguishable. E.g., ("Alice's age is 25", "Alice's age is 40"); ("Bob is in the dataset", "Bob is not in the dataset").

SLIDE 60

Pufferfish Privacy [KM12]

Distribution Class Θ: a set of distributions that plausibly generate the data; may be used to model correlation in the data. E.g., (connection graph G, disease transmits w.p. in [0.1, 0.5]); (Markov chain with transition matrix in a set P).

SLIDE 61

Pufferfish Privacy [KM12]

An algorithm A is ε-Pufferfish private with parameters (S, Q, Θ) if for all (si, sj) in Q, all θ ∈ Θ with P(si|θ), P(sj|θ) > 0, and all outputs t, when X ∼ θ:

P(A(X) = t | si, θ) ≤ e^ε · P(A(X) = t | sj, θ)

SLIDE 62

Pufferfish Generalizes DP [KM12]

Theorem: Pufferfish = Differential Privacy when:
S = { si,a := "Person i has value a", for all i and all a in domain X }
Q = { (si,a, si,b), for all i and all pairs (a, b) in X × X }
Θ = { distributions in which each person i is independent }

SLIDE 63

Pufferfish Generalizes DP [KM12]

Theorem: Pufferfish = Differential Privacy when:
S = { si,a := "Person i has value a", for all i and all a in domain X }
Q = { (si,a, si,b), for all i and all pairs (a, b) in X × X }
Θ = { distributions in which each person i is independent }

Theorem: No utility is possible when Θ = { all possible distributions }.

SLIDE 64

Talk Agenda:

  • 1. Privacy for Uncorrelated Data
  • 2. Privacy for Correlated Data
  • How to define privacy
  • Privacy-preserving Classification
  • How to define privacy
  • Privacy-preserving Statistics
SLIDE 65

How to get Pufferfish privacy?

Special-case mechanisms are known [KM12, HMD12]. Is there a more general Pufferfish mechanism for a large class of correlated data?

Our work: Yes - the Markov Quilt Mechanism.

SLIDE 66

Correlation Measure: Bayesian Networks

Directed acyclic graph; each node is a variable. Joint distribution of variables:

Pr(X1, X2, …, Xn) = Πᵢ Pr(Xi | parents(Xi))

SLIDE 67

A Simple Example

Chain X1 → X2 → X3 → … → Xn with Xi in {0, 1}. Model: stay in the same state w.p. p, switch w.p. 1 − p.

SLIDE 68

A Simple Example

Chain X1 → X2 → … → Xn with Xi in {0, 1}; stay w.p. p, switch w.p. 1 − p:

Pr(X2 = 0 | X1 = 0) = p, Pr(X2 = 0 | X1 = 1) = 1 − p, ….

SLIDE 69

A Simple Example

Chain X1 → X2 → … → Xn with Xi in {0, 1}; stay w.p. p, switch w.p. 1 − p. The influence of X1 diminishes with distance:

Pr(Xi = 0 | X1 = 0) = 1/2 + (1/2)(2p − 1)^(i−1)
Pr(Xi = 0 | X1 = 1) = 1/2 − (1/2)(2p − 1)^(i−1)
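A quick check (illustrative) of this decay, comparing the closed form against powers of the transition matrix:

```python
import numpy as np

p = 0.8
P = np.array([[p, 1 - p],
              [1 - p, p]])          # P[a, b] = Pr(X_{t+1} = b | X_t = a)

for i in [2, 5, 10, 20]:
    via_matrix = np.linalg.matrix_power(P, i - 1)[0, 0]   # Pr(X_i = 0 | X_1 = 0)
    closed_form = 0.5 + 0.5 * (2 * p - 1) ** (i - 1)
    print(i, via_matrix, closed_form)  # the two agree and tend to 1/2
```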

SLIDE 70

Algorithm: Main Idea

Goal: Protect X1

X1 → X2 → X3 → … → Xn

SLIDE 71

Algorithm: Main Idea

Goal: Protect X1

X1 → X2 → X3 → … → Xn

Local nodes (high correlation) vs. the rest (almost independent).

SLIDE 72

Algorithm: Main Idea

Goal: Protect X1

X1 → X2 → X3 → … → Xn

Local nodes (high correlation): add noise to hide them
+ Rest (almost independent): small correction

SLIDE 73

Measuring “Independence”

Max-influence of Xi on a set of nodes XR:

e(XR|Xi) = max_{a,b} sup_{θ∈Θ} max_{xR} log [ Pr(XR = xR | Xi = a, θ) / Pr(XR = xR | Xi = b, θ) ]

To protect Xi, the correction term needed for XR is exp(e(XR|Xi)). Low e(XR|Xi) means XR is almost independent of Xi.
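For intuition, a brute-force computation of e(XR|Xi) on the binary chain above, with a single fixed θ (one value of p) instead of a sup over Θ. It enumerates every value of XR — exponential in |XR|, which is exactly why the next slides use structural shortcuts.

```python
import itertools
import numpy as np

p = 0.8
P = np.array([[p, 1 - p], [1 - p, p]])   # symmetric binary chain

def cond_prob(xR, a, k):
    """Pr(X_k, ..., X_{k+m-1} = xR | X_1 = a) along the chain."""
    Pk = np.linalg.matrix_power(P, k - 1)  # k-1 steps from X_1 to X_k
    prob = Pk[a, xR[0]]
    for u, v in zip(xR, xR[1:]):
        prob *= P[u, v]
    return prob

def max_influence(k, m):
    """e(X_R | X_1) for X_R = (X_k, ..., X_{k+m-1}), by enumeration."""
    worst = 0.0
    for xR in itertools.product([0, 1], repeat=m):
        for a, b in [(0, 1), (1, 0)]:
            worst = max(worst, np.log(cond_prob(xR, a, k) / cond_prob(xR, b, k)))
    return worst

print(max_influence(k=2, m=3))    # large: X_2 is highly correlated with X_1
print(max_influence(k=10, m=3))   # small: distant nodes are nearly independent
```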

SLIDE 74

How to find large “almost independent” sets

Brute-force search is expensive; use structural properties of the Bayesian network instead.

SLIDE 75

Markov Blanket

Markov Blanket(Xi) = set of nodes XS such that Xi is independent of X \ (Xi ∪ XS) given XS (usually: parents, children, and other parents of children).

[Figure: Xi surrounded by its Markov Blanket XS]

SLIDE 76

Define: Markov Quilt

XQ is a Markov Quilt of Xi if:

  • 1. Deleting XQ breaks the graph into XN and XR
  • 2. Xi lies in XN
  • 3. XR is independent of Xi given XQ

(For a Markov Blanket, XN = {Xi}.)

SLIDE 77

Recall: Algorithm

Goal: Protect X1

X1 → X2 → X3 → … → Xn

Local nodes (high correlation): add noise to hide them
+ Rest (almost independent): small correction

SLIDE 78

Why do we need Markov Quilts?

Given a Markov Quilt XQ for Xi: XN = the local nodes for Xi, and XQ ∪ XR = the rest.

SLIDE 79

Why do we need Markov Quilts?

Given a Markov Quilt XQ for Xi: XN = the local nodes for Xi, and XQ ∪ XR = the rest.

Need to search over Markov Quilts XQ to find the one that needs the optimal amount of noise.
SLIDE 80

From Markov Quilts to Amount of Noise

Let XQ = a Markov Quilt for Xi. The stdev of noise needed to protect Xi is:

Score(XQ) = card(XN) / (ε − e(XQ|Xi))

(Noise due to the local nodes XN in the numerator; correction for XQ ∪ XR in the denominator.)

SLIDE 81

The Markov Quilt Mechanism

For each Xi: find the Markov Quilt XQ for Xi with minimum score si. Output F(D) + (max_i s_i) · Z, where Z ∼ Lap(1).

SLIDE 82

The Markov Quilt Mechanism

For each Xi: find the Markov Quilt XQ for Xi with minimum score si. Output F(D) + (max_i s_i) · Z, where Z ∼ Lap(1).

Advantage: poly-time in special cases.

Theorem: This preserves ε-Pufferfish privacy.

SLIDE 83

Example: Activity Monitoring

D = (x1, .., xT), xt = activity at time t

SLIDE 84

Example: Activity Monitoring

D = (x1, …, xT), xt = activity at time t. (Minimal) Markov Quilts for Xi have the form XQ = {Xi−a, Xi+b}, so they are efficiently searchable.

SLIDE 85

Example: Activity Monitoring

X: the set of states
Pθ: the transition matrix describing each θ ∈ Θ

SLIDE 86

Example: Activity Monitoring

X: set of states; Pθ: transition matrix describing each θ ∈ Θ. Under some assumptions, the relevant parameters are:

πΘ = min_{x∈X, θ∈Θ} πθ(x)   (min probability of any x under the stationary distribution)

gΘ = min_{θ∈Θ} min{ 1 − |λ| : Pθx = λx, λ < 1 }   (min eigengap of any Pθ)

SLIDE 87

Example: Activity Monitoring

X: set of states; Pθ: transition matrix describing each θ ∈ Θ. Under some assumptions, the relevant parameters are:

πΘ = min_{x∈X, θ∈Θ} πθ(x)   (min probability of any x under the stationary distribution)

gΘ = min_{θ∈Θ} min{ 1 − |λ| : Pθx = λx, λ < 1 }   (min eigengap of any Pθ)

Max-influence of XQ = {Xi−a, Xi+b} for Xi:

e(XQ|Xi) ≤ log[ (πΘ + exp(−gΘ·b)) / (πΘ − exp(−gΘ·b)) ] + 2·log[ (πΘ + exp(−gΘ·a)) / (πΘ − exp(−gΘ·a)) ]

Score(XQ) = (a + b − 1) / (ε − e(XQ|Xi))

SLIDE 88

Markov Quilt Mechanism for Activity Monitoring

For each Xi: find the Markov Quilt XQ = {Xi−a, Xi+b} with minimum score si. Output F(D) + (max_i s_i) · Z, where Z ∼ Lap(1).

Running time: O(T³) (can be made O(T²)). Advantage: consistency. (A simplified sketch follows below.)
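A simplified sketch of the chain-case mechanism, using the πΘ/gΘ bound from the previous slide for the max-influence. It ignores boundary effects at the ends of the chain (so, by symmetry, one score serves every Xi), and all parameter names are illustrative.

```python
import numpy as np

def quilt_influence(a, b, pi_min, gap):
    """Upper bound on e(X_Q | X_i) for the quilt {X_{i-a}, X_{i+b}}."""
    def term(d):
        e = np.exp(-gap * d)
        return np.inf if pi_min <= e else np.log((pi_min + e) / (pi_min - e))
    return term(b) + 2 * term(a)

def min_score(eps, T, pi_min, gap):
    """Smallest score over quilts {X_{i-a}, X_{i+b}}: a + b - 1 local
    nodes, so score = (a + b - 1) / (eps - e(X_Q | X_i))."""
    best = T / eps                     # fallback: group-DP-style noise over everything
    for a in range(1, T + 1):
        for b in range(1, T + 1):
            e = quilt_influence(a, b, pi_min, gap)
            if e < eps:
                best = min(best, (a + b - 1) / (eps - e))
    return best

def markov_quilt_histogram(activities, k, eps, pi_min, gap, rng=None):
    rng = rng or np.random.default_rng()
    counts = np.bincount(activities, minlength=k).astype(float)
    s = min_score(eps, len(activities), pi_min, gap)  # same for every i here
    return counts + s * rng.laplace(size=k)           # F(D) + (max_i s_i) * Lap(1)

acts = np.random.default_rng(0).integers(0, 3, size=200)
print(markov_quilt_histogram(acts, k=3, eps=1.0, pi_min=0.3, gap=0.4))
```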

SLIDE 89

Conclusion

New mechanism for computing statistics on correlated data. Open Problems:

  • 1. Composing multiple releases on correlated data
  • 2. Other correlation models (beyond Bayesian nets)
  • 3. More mechanisms (e.g., for optimization)
  • 4. Applications - activity recognition, location privacy
SLIDE 90

Conclusion

Learning with Privacy:

  • Learning from iid data (convex optimization, Bayesian inference): relatively well-understood
  • New Directions: learning from correlated data

SLIDE 91

Acknowledgements

Shuang Song, Mani Srivastava, Yizhen Wang, Joseph Geumlek, James Foulds, Max Welling

SLIDE 92

Questions?