New Directions in Privacy-preserving Machine Learning
Kamalika Chaudhuri, University of California, San Diego
Sensitive Data
Medical Records, Genetic Data, Search Logs
AOL Violates Privacy
Netflix Violates Privacy [NS08]
(Figure: users × movies ratings matrix.)
Knowing 2-8 of Alice's movie ratings and their dates reveals: whether Alice is in the dataset, and Alice's other movie ratings
High-dimensional Data is Unique
Example: UCSD Employee Salary Table
Position | Gender | Department | Ethnicity | Salary
---------|--------|------------|-----------|-------
Faculty  | Female | CSE        | SE Asian  | …

One employee (Kamalika) fits this description!
Simply anonymizing data is unsafe!
Disease Association Studies [WLWTZ09]
(Figure: correlations (R² values) for the Cancer and Healthy groups.)
Given the published correlations, Alice's DNA reveals whether Alice is in the Cancer set or the Healthy set
Simply anonymizing data is unsafe! Statistics on small data sets are unsafe!
(Figure: the three-way trade-off between privacy, accuracy, and data size.)
Correlated Data
Examples: user information in social networks; physical activity monitoring
Why is Privacy Hard for Correlated Data?
A neighbor's information leaks information about the user
How do we learn from sensitive data while still preserving privacy?
Talk Agenda:
New Directions:
1. Privacy-preserving Bayesian Learning
2. Privacy-preserving statistics on correlated data
Talk Agenda:
1. Privacy for Uncorrelated Data
   - How to define privacy
Differential Privacy [DMNS06]
(Figure: two datasets that differ in one person, run through the same randomized algorithm, produce "similar" output distributions.)
Participation of a single person does not change the output
Differential Privacy: Attacker's View
Prior knowledge + algorithm output on data that includes a person ⇒ conclusion
Prior knowledge + algorithm output on data that excludes that person ⇒ (nearly) the same conclusion
Differential Privacy [DMNS06]
For all D1, D2 that differ in one person's value, and for any set S of outputs:
if A is an ε-differentially private randomized algorithm, then
Pr(A(D1) ∈ S) ≤ e^ε · Pr(A(D2) ∈ S)
Differential Privacy
- 1. Provably strong notion of privacy
- 2. Good approximations for many functions, e.g., means, histograms, etc.
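For instance, a private histogram needs only Laplace noise on the counts. Below is a minimal sketch (my own, not from the talk); the data and ε value are made up for illustration:

```python
import numpy as np

def private_histogram(data, bins, epsilon):
    """epsilon-DP histogram via the Laplace mechanism.

    Under add/remove-one-person neighbors, each count changes by at most 1,
    so the L1 sensitivity is 1 and Laplace noise of scale 1/epsilon suffices
    (use scale 2/epsilon for replace-one-value neighbors).
    """
    counts, _ = np.histogram(data, bins=bins)
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    return counts + noise

# Example: ages of 1000 synthetic people, epsilon = 0.5
ages = np.random.randint(18, 90, size=1000)
print(private_histogram(ages, bins=np.arange(10, 100, 10), epsilon=0.5))
```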
Interpretation: Attacker's Hypothesis Test [WZ10, OV13]
H0: input to the algorithm = Data + one person; H1: input to the algorithm = Data + another person
Failure events: False Alarm (FA), Missed Detection (MD)
If the algorithm is ε-DP, then for any attacker's test:
Pr(FA) + e^ε · Pr(MD) ≥ 1
e^ε · Pr(FA) + Pr(MD) ≥ 1
(Figure: feasible (Pr(FA), Pr(MD)) region between (1, 0) and (0, 1), with corner point at (1/(1 + e^ε), 1/(1 + e^ε)).)
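These bounds can be checked by simulation. Here is a small sketch (mine, not from the talk) using randomized response on a single bit, an ε-DP algorithm that sits exactly at the corner point of the feasible region:

```python
import numpy as np

def randomized_response(bit, eps, n, rng):
    """Report the true bit w.p. e^eps/(1+e^eps), flip otherwise (eps-DP)."""
    keep = rng.random(n) < np.exp(eps) / (1 + np.exp(eps))
    return np.where(keep, bit, 1 - bit)

eps, n, rng = 1.0, 100_000, np.random.default_rng(0)
# Attacker tests H0: bit = 0 vs H1: bit = 1, guessing H1 when the output is 1.
fa = np.mean(randomized_response(0, eps, n, rng) == 1)   # false alarm rate
md = np.mean(randomized_response(1, eps, n, rng) == 0)   # missed detection rate
print(fa, md)                    # both near 1/(1 + e^eps) ~ 0.269
print(fa + np.exp(eps) * md)     # >= 1, with equality at the corner point
print(np.exp(eps) * fa + md)     # >= 1
```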
Talk Agenda:
1. Privacy for Uncorrelated Data
   - How to define privacy
   - Privacy-preserving Learning
Example 1: Flu Test
Predicts flu or not based on patient symptoms; trained on sensitive patient data
Example 2: Clustering Abortion Data
Given data on abortion locations, cluster by location while preserving privacy of individuals
Bayesian Learning
Data X = {x1, x2, …} and model class Θ, related through the likelihood p(x|θ)
Prior π(θ) + Data X ⇒ Posterior p(θ|X)
Goal: Output posterior (approx. or samples)
Example: Coin tosses
X = {H, T, H, H, …}, model class Θ = [0, 1]
Likelihood: p(x|θ) = θ^x (1 − θ)^(1−x)
Prior: π(θ) = 1
Data X (h heads, t tails) ⇒ Posterior: p(θ|X) ∝ θ^h (1 − θ)^t
In general, θ is more complex (classifiers, etc.)
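In code, this conjugate update is one line. A non-private sketch (my own; the 0.7 coin bias and sample sizes are made up):

```python
import numpy as np

# Coin-toss posterior under the uniform prior pi(theta) = 1 = Beta(1, 1):
# observing h heads and t tails gives p(theta | X) = Beta(1 + h, 1 + t),
# i.e. proportional to theta^h (1 - theta)^t as on the slide.
rng = np.random.default_rng(0)
X = rng.random(100) < 0.7                        # 100 synthetic tosses of a 0.7-coin
h, t = X.sum(), (~X).sum()
samples = rng.beta(1 + h, 1 + t, size=10_000)    # posterior samples
print(samples.mean())                            # close to the true bias 0.7
```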
Private Bayesian Learning
Same setup: data X = {x1, x2, …} and model class Θ with prior π(θ), related through the likelihood p(x|θ), giving the posterior p(θ|X)
Goal: Output private approx. to posterior
How to make the posterior private?
Option 1: Direct posterior sampling [Detal14]. Not private except under restrictive conditions: the posteriors p(θ|D) and p(θ|D′) on neighboring datasets can differ too much
Option 2: Sample from a truncated posterior at high temperature [WFS15]. Disadvantages: intractable (technically, privacy holds only at convergence); needs more data/subjects
Our Work: Exponential Families
Exponential family distributions: p(x|θ) = h(x) e^(θ^T T(x) − A(θ)), where T is a sufficient statistic
Includes many common distributions: Gaussian, Binomial, Dirichlet, Beta, etc.
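For instance, the Bernoulli likelihood from the coin-toss example fits this form with T(x) = x, natural parameter θ = log(q/(1−q)), and A(θ) = log(1 + e^θ). A quick sanity check (illustrative, not from the talk):

```python
import numpy as np

# Bernoulli(q) in exponential-family form: h(x) = 1, T(x) = x,
# theta = log(q / (1 - q)), A(theta) = log(1 + e^theta).
def expfam_bernoulli(x, theta):
    return np.exp(theta * x - np.log1p(np.exp(theta)))

q = 0.3
theta = np.log(q / (1 - q))
print(expfam_bernoulli(1, theta), q)        # both 0.3
print(expfam_bernoulli(0, theta), 1 - q)    # both 0.7
```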
Properties of Exponential Families
Exponential families have conjugate priors: the posterior p(θ|X) is in the same distribution class as the prior π(θ)
E.g., Gaussian-Gaussian, Beta-Binomial, etc.
Sampling from Exponential Families
Given data x1, x2, …, the (non-private) posterior comes from an exponential family:
p(θ|X) ∝ e^(η(θ)^T (Σᵢ T(xᵢ)) − B(θ))
Private Sampling:
1. If T is bounded, add noise to Σᵢ T(xᵢ) to get a private version T′
2. Sample from the perturbed posterior: p(θ|X) ∝ e^(η(θ)^T T′ − B(θ))
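A minimal sketch of this scheme for the Beta-Binomial coin-toss case (my own code, not the authors'; clamping the noised statistic to [0, n] is an illustrative choice to keep the perturbed posterior well defined):

```python
import numpy as np

def private_beta_posterior_sample(X, epsilon, rng):
    """Sample from a perturbed Beta posterior for coin tosses.

    T(x) = x lies in [0, 1], so changing one record moves the sufficient
    statistic sum_i T(x_i) by at most 1; Laplace(1/epsilon) noise makes it
    epsilon-DP, and sampling from the posterior built on the noised
    statistic is then private by post-processing.
    """
    n = len(X)
    stat = float(np.sum(X))                            # sum_i T(x_i) = #heads
    noised = stat + rng.laplace(scale=1.0 / epsilon)   # private version T'
    noised = min(max(noised, 0.0), float(n))           # clamp to a valid count
    return rng.beta(1 + noised, 1 + n - noised)        # perturbed posterior

rng = np.random.default_rng(0)
X = (rng.random(1000) < 0.7).astype(int)
print(private_beta_posterior_sample(X, epsilon=0.5, rng=rng))
```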
Performance
- Theoretical Guarantees
- Experiments
Theoretical Guarantees
Performance measure: Asymptotic Relative Efficiency (lower = more sample-efficient for large n)
Non-private: 2
Our Method: 2
[WFS15]: max(2, 1 + 1/ε)
Experiments - Task
Task: time-series clustering of events in the Wikileaks war logs while preserving event-level privacy
Data: war-log entries; Afghanistan (75K), Iraq (390K)
Goal: cluster entries in each region based on features (casualty counts, enemy/friendly fire, explosive hazards, etc.)
Experiments - Model
Hidden Markov Model for each region, with discrete hidden states h_t and observed features x_t
Transition parameters T: T_ij = P(h_{t+1} = i | h_t = j)
Emission parameters O: O_ij = P(x_t = i | h_t = j)
Goal: sample from the posterior P(O | data) (in the exponential family)
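The sketch below (my own simplification, not the experiment code) conveys the flavor for the emission parameters only: it assumes the per-state observation counts are already given (in the real experiments the hidden states are inferred too), noises the counts, and samples each emission row from the perturbed Dirichlet posterior:

```python
import numpy as np

def private_emission_posterior(counts, epsilon, rng):
    """Sketch: sample emission parameters O from a perturbed Dirichlet posterior.

    counts[j, i] = number of times observation i was emitted from state j
    (assumed given here). Adding or removing one log entry changes a single
    cell by 1, so Laplace(1/epsilon) noise per cell privatizes the counts.
    """
    noised = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    noised = np.maximum(noised, 0.0)             # Dirichlet needs nonneg. counts
    return np.vstack([rng.dirichlet(1.0 + row) for row in noised])

rng = np.random.default_rng(0)
toy_counts = rng.integers(0, 50, size=(2, 4)).astype(float)  # 2 states, 4 obs. types
print(private_emission_posterior(toy_counts, epsilon=1.0, rng=rng))
```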
Experiments - Results
(Figures: test-set log-likelihood vs. total epsilon for Iraq and Afghanistan; methods compared: non-private HMM, non-private naive Bayes, Laplace mechanism HMM, and OPS HMM (truncation multiplier = 100).)
Experiments - States
(Figure: emission distributions of two inferred Iraq states over event categories (criminal event, enemy action, explosive hazard, friendly action, friendly fire, …), incident types (IED explosion, direct fire, raid, …), and casualty types (friendly/host, civilian, enemy).)
Experiments - Clustering
(Figure: inferred state (1 or 2) for each region (MND-BAGHDAD, MND-C, MND-N, MND-SE, MNF-W) by month, Jan 2004 to Jan 2008, with peak-troops and surge-announcement markers.)
Conclusion
New method for private posterior sampling from exponential families
Open Problems:
- 1. Private sampling from more complex posteriors
- 2. Private versions of other Bayesian posterior approximation schemes (variational Bayes, etc.)
- 3. Combining Bayesian inference with more relaxed forms of DP (e.g., concentrated DP, distributional DP, etc.)
Talk Agenda:
1. Privacy for Uncorrelated Data
   - How to define privacy
   - Privacy-preserving Bayesian Learning
2. Privacy for Correlated Data
Example 1: Activity Monitoring
Share aggregate data on physical activity with a doctor or provider, while hiding the activity at each specific time
Example 2: Spread of Flu in Network
Publish aggregate statistics while preserving individual privacy. (Figure: interaction network.)
Why is Differential Privacy not Enough for Correlated Data?
Example: Activity Monitoring
D = (x1, …, xT), xt = activity at time t; the xt are linked through a correlation network
Goal: (1) publish the activity histogram; (2) prevent an adversary from learning the activity at any time t
1-DP: output the histogram of activities + noise with stdev 1
Not enough: activities across time are highly correlated!
1-Group DP: output the histogram of activities + noise with stdev T
Too much noise: no utility!
Talk Agenda:
1. Privacy for Uncorrelated Data
   - How to define privacy
   - Privacy-preserving Classification
2. Privacy for Correlated Data
   - How to define privacy
Pufferfish Privacy [KM12]
Secret Set S: information to be protected. E.g.: Alice's age is 25; Bob has a disease
Secret Pairs Set Q: pairs of secrets we want to be indistinguishable. E.g.: (Alice's age is 25, Alice's age is 40); (Bob is in the dataset, Bob is not in the dataset)
Distribution Class Θ: a set of distributions that plausibly generate the data; may be used to model correlation in the data. E.g.: (connection graph G, disease transmits w.p. in [0.1, 0.5]); (Markov chain with transition matrix in a set P)
An algorithm A is ε-Pufferfish private with parameters (S, Q, Θ) if for all (si, sj) in Q, all θ ∈ Θ with P(si|θ), P(sj|θ) > 0, all outputs t, and X ∼ θ:
Pr(A(X) = t | si, θ) ≤ e^ε · Pr(A(X) = t | sj, θ)
Pufferfish Generalizes DP [KM12]
Theorem: Pufferfish = Differential Privacy when:
S = { s_{i,a} := person i has value a, for all i, all a in domain X }
Q = { (s_{i,a}, s_{i,b}), for all i and all pairs (a, b) in X × X }
Θ = { distributions where each person i is independent }
Theorem: No utility is possible when Θ = { all possible distributions }
Talk Agenda:
1. Privacy for Uncorrelated Data
   - How to define privacy
   - Privacy-preserving Classification
2. Privacy for Correlated Data
   - How to define privacy
   - Privacy-preserving Statistics
How to get Pufferfish privacy?
Special-case mechanisms exist [KM12, HMD12]. Is there a more general Pufferfish mechanism for a large class of correlated data?
Our work: yes, the Markov Quilt Mechanism
Correlation Measure: Bayesian Networks
A directed acyclic graph; each node is a variable
Joint distribution of the variables: Pr(X1, X2, …, Xn) = ∏ᵢ Pr(Xᵢ | parents(Xᵢ))
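As a tiny illustration (numbers made up; the "stay" probability p = 0.8 matches the chain example that follows), here is the factorization for a three-node chain X1 → X2 → X3:

```python
import numpy as np

# Joint probability of a chain X1 -> X2 -> X3 via the factorization
# Pr(x1, x2, x3) = Pr(x1) * Pr(x2 | x1) * Pr(x3 | x2).
p = 0.8
prior = np.array([0.5, 0.5])            # Pr(X1)
trans = np.array([[p, 1 - p],           # trans[i, j] = Pr(X_{t+1}=j | X_t=i)
                  [1 - p, p]])

def joint(x1, x2, x3):
    return prior[x1] * trans[x1, x2] * trans[x2, x3]

print(joint(0, 0, 0))   # 0.5 * 0.8 * 0.8 = 0.32
# Sanity check: the joint sums to 1 over all 8 configurations.
print(sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)))
```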
A Simple Example
Markov chain X1 → X2 → … → Xn with Xi ∈ {0, 1}
Model: stay in the same state with probability p, switch with probability 1 − p
Pr(X2 = 0 | X1 = 0) = p, Pr(X2 = 0 | X1 = 1) = 1 − p, ….
Influence of X1 diminishes with distance:
Pr(Xᵢ = 0 | X1 = 0) = 1/2 + (1/2)(2p − 1)^(i−1)
Pr(Xᵢ = 0 | X1 = 1) = 1/2 − (1/2)(2p − 1)^(i−1)
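A quick numerical check (mine, not from the talk) that this closed form matches direct matrix powering:

```python
import numpy as np

# Verify the influence-decay formula on the two-state chain:
# Pr(X_i = 0 | X_1 = 0) = 1/2 + (1/2)(2p - 1)^(i-1).
p = 0.8
P = np.array([[p, 1 - p],
              [1 - p, p]])               # P[a, b] = Pr(X_{t+1}=b | X_t=a)

for i in [2, 5, 10, 50]:
    by_matrix = np.linalg.matrix_power(P, i - 1)[0, 0]   # Pr(X_i=0 | X_1=0)
    closed_form = 0.5 + 0.5 * (2 * p - 1) ** (i - 1)
    print(i, by_matrix, closed_form)     # agree; both tend to 1/2 as i grows
```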
Algorithm: Main Idea
Goal: protect X1 in the chain X1, X2, X3, …, Xn
Split the other nodes into local nodes (high correlation with X1) and the rest (almost independent of X1)
Add noise to hide the local nodes, plus a small correction for the rest
Measuring "Independence"
Max-influence of Xi on a set of nodes XR:
e(XR|Xi) = max_{a,b} sup_{θ∈Θ} max_{xR} log [ Pr(XR = xR | Xi = a, θ) / Pr(XR = xR | Xi = b, θ) ]
To protect Xi, the correction term needed for XR is exp(e(XR|Xi))
Low e(XR|Xi) means XR is almost independent of Xi
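A brute-force check of this definition on the two-state chain from the earlier example (my own sketch; Θ is collapsed to a single known chain, so the sup over θ is trivial, and this only scales to tiny examples):

```python
import numpy as np

def max_influence(P, k):
    """Brute-force e(X_k | X_1) for the two-state chain with transition P:
    max over a, b, x of log [ Pr(X_k = x | X_1 = a) / Pr(X_k = x | X_1 = b) ].
    """
    Pk = np.linalg.matrix_power(P, k - 1)   # Pk[a, x] = Pr(X_k = x | X_1 = a)
    return max(np.log(Pk[a, x] / Pk[b, x])
               for a in (0, 1) for b in (0, 1) for x in (0, 1))

p = 0.8
P = np.array([[p, 1 - p], [1 - p, p]])
for k in [2, 4, 8, 16]:
    print(k, max_influence(P, k))   # decays to 0: distant nodes nearly independent
```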
How to find large “almost independent” sets
Brute-force search is expensive; instead, use structural properties of the Bayesian network
Markov Blanket
Markov Blanket(Xi) = set of nodes XS s.t. Xi is independent of X \ (Xi ∪ XS) given XS
(usually: parents, children, and other parents of Xi's children)
(Figure: Xi surrounded by its Markov Blanket XS.)
Define: Markov Quilt
XQ is a Markov Quilt of Xi if:
- 1. Deleting XQ breaks the graph into XN and XR
- 2. Xi lies in XN
- 3. XR is independent of Xi given XQ
(For the Markov Blanket, XN = {Xi}.)
Recall: Algorithm
Goal: protect X1
Split the other nodes into local nodes (high correlation) and the rest (almost independent)
Add noise to hide the local nodes, plus a small correction for the rest
Why do we need Markov Quilts?
Given a Markov Quilt XQ for Xi: XN = the local nodes for Xi, and XQ ∪ XR = the rest
We need to search over Markov Quilts XQ to find the one that needs the optimal (smallest) amount of noise
From Markov Quilts to the Amount of Noise
Let XQ be a Markov Quilt for Xi. The stdev of the noise needed to protect Xi is
Score(XQ) = card(XN) / (ε − e(XQ|Xi))
(numerator: noise due to XN; denominator: correction for XQ ∪ XR)
The Markov Quilt Mechanism
For each Xi: find the Markov Quilt XQ for Xi with minimum score si
Output F(D) + (maxᵢ sᵢ) · Z, where Z ∼ Lap(1)
Theorem: this preserves ε-Pufferfish privacy
Advantage: poly-time in special cases
Example: Activity Monitoring
D = (x1, …, xT), xt = activity at time t
(Minimal) Markov Quilts for Xi have the form XQ = {X_{i−a}, X_{i+b}}, with XN = the nodes between them and XR = the rest
Efficiently searchable
Example: Activity Monitoring
X: set of states; Pθ: transition matrix describing each θ ∈ Θ
Under some assumptions, the relevant parameters are:
π_Θ = min_{x∈X, θ∈Θ} π_θ(x)  (min probability of any x under the stationary distribution)
g_Θ = min_{θ∈Θ} min{ 1 − |λ| : P_θ x = λx, λ < 1 }  (min eigengap of any P_θ)
Max-influence of XQ = {X_{i−a}, X_{i+b}} for Xi:
e(XQ|Xi) ≤ log( (π_Θ + exp(−g_Θ b)) / (π_Θ − exp(−g_Θ b)) ) + 2 log( (π_Θ + exp(−g_Θ a)) / (π_Θ − exp(−g_Θ a)) )
Score(XQ) = (a + b − 1) / (ε − e(XQ|Xi))
Markov Quilt Mechanism for Activity Monitoring
For each Xi: find the Markov Quilt XQ = {X_{i−a}, X_{i+b}} with minimum score si
Output F(D) + (maxᵢ sᵢ) · Z, where Z ∼ Lap(1)
Running time: O(T³) (can be made O(T²))
Advantage: consistency
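To make the whole pipeline concrete, here is a rough sketch (my own, not the authors' code) of the mechanism for a Markov-chain activity trace, using the max-influence bound above; boundary effects near the ends of the chain are ignored, and the toy trace, π_Θ, and g_Θ values are made up:

```python
import numpy as np

def influence_bound(a, b, pi_min, gap):
    """Upper bound on the max-influence e(XQ | Xi) of the quilt
    XQ = {X_{i-a}, X_{i+b}}, using the formula from the slide."""
    def f(d):
        return np.log((pi_min + np.exp(-gap * d)) / (pi_min - np.exp(-gap * d)))
    return f(b) + 2 * f(a)

def noise_scale(T, eps, pi_min, gap):
    """Minimum Score(XQ) = (a + b - 1) / (eps - e(XQ | Xi)) over quilts;
    boundary effects are ignored, so the scale is the same for every node."""
    best = T / eps   # fallback: group DP over the whole chain
    for a in range(1, T):
        for b in range(1, T):
            if np.exp(-gap * min(a, b)) >= pi_min:
                continue   # bound is vacuous: quilt nodes too close to Xi
            e = influence_bound(a, b, pi_min, gap)
            if e < eps:
                best = min(best, (a + b - 1) / (eps - e))
    return best

def private_activity_histogram(acts, n_bins, eps, pi_min, gap, rng):
    """Markov Quilt mechanism sketch: activity histogram + scaled Laplace noise."""
    scale = noise_scale(len(acts), eps, pi_min, gap)
    hist = np.bincount(acts, minlength=n_bins).astype(float)
    return hist + scale * rng.laplace(size=n_bins)

rng = np.random.default_rng(0)
acts = rng.integers(0, 4, size=500)   # toy activity trace with 4 activity types
print(private_activity_histogram(acts, 4, eps=2.0, pi_min=0.2, gap=0.5, rng=rng))
```

The naive double loop over (a, b) is quadratic in T; the talk notes that for chains the search can be organized to run in O(T²) overall.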
Conclusion
New mechanism for computing statistics on correlated data
Open Problems:
- 1. Composing multiple releases on correlated data
- 2. Other correlation models (beyond Bayesian nets)
- 3. More mechanisms (for optimization)
- 4. Applications: activity recognition, location privacy
Conclusion
Learning with Privacy: learning from i.i.d. data, via convex optimization or Bayesian inference, is relatively well understood
New Directions: learning from correlated data
Acknowledgements
Shuang Song Mani Srivastava Yizhen Wang Joseph Geumlek James Foulds Max Welling