DP-Finder: Finding Differential Privacy Violations by Sampling and Optimization
Benjamin Bichsel, Timon Gehr, Dana Drachsler-Cohen, Petar Tsankov, Martin Vechev
Differential Privacy – Basic Setting
[Figure: query result, # disease = 7]
[Figure: noisy query result, # disease + noise = 7.3]
What about my privacy?
Differential Privacy – Intuition
Change my data
[Figure: # disease + noise = 7.3 before the change and 7.6 after it, with a "?" between the two outputs]
Differential Privacy – More Abstractly
[Figure: neighboring inputs y and y′ are each passed through G, yielding G(y) and G(y′)]
Attacker check: G(y) ∈ Φ?
Attacker check: G(y′) ∈ Φ?
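Differential Privacy – Definition
[Figure: neighboring inputs y and y′ are each passed through G, yielding G(y) and G(y′)]
ε-DP:
$$\frac{\Pr[G(y) \in \Phi]}{\Pr[G(y') \in \Phi]} \le \exp(\varepsilon) \approx 1 + \varepsilon$$
Challenges induced by DP:
(buggy algorithms)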
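As a numeric reading of the ε-DP bound, take ε = 0.1 (the value that appears later in these slides) and suppose, purely for illustration, that Pr[G(y′) ∈ Φ] = 0.30. The probability 0.30 is an assumed number, not one from the slides.

$$\exp(0.1) \approx 1.105, \qquad \Pr[G(y) \in \Phi] \le 1.105 \cdot 0.30 \approx 0.332$$

The first-order approximation 1 + ε = 1.1 gives almost the same bound, 0.33.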
ε-DP Counterexamples
Inputs y, y′ and a set Φ that violate ε-DP:
$$\frac{\Pr[G(y) \in \Phi]}{\Pr[G(y') \in \Phi]} > \exp(\varepsilon) \iff \log \frac{\Pr[G(y) \in \Phi]}{\Pr[G(y') \in \Phi]} > \varepsilon$$
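To illustrate what such a counterexample can look like, here is a minimal sketch with a deliberately buggy noisy counter. Everything in it is an assumption made for illustration: the Laplace noise, the wrong scale, the inputs 7 and 8, and the set Φ = (−∞, t] are not taken from the slides. Because the Laplace CDF is known in closed form, the log-ratio can be computed exactly here.

```python
import math

def laplace_cdf(t, mu, b):
    """CDF of a Laplace distribution with location mu and scale b."""
    if t < mu:
        return 0.5 * math.exp((t - mu) / b)
    return 1.0 - 0.5 * math.exp(-(t - mu) / b)

def exact_eps(y, y_prime, t, b):
    """log(Pr[G(y) in Phi] / Pr[G(y') in Phi]) for Phi = (-inf, t]."""
    p = laplace_cdf(t, y, b)
    q = laplace_cdf(t, y_prime, b)
    return math.log(p / q)

claimed_eps = 0.1
buggy_scale = 1.0 / (2 * claimed_eps)   # hypothetical bug: should be 1 / claimed_eps
y, y_prime, t = 7, 8, 6.0               # neighboring counts and a candidate set

violation = exact_eps(y, y_prime, t, buggy_scale)
is_counterexample = violation > claimed_eps
```

With these numbers the log-ratio is 0.2, so the triple (y, y′, Φ) witnesses that the buggy counter is not 0.1-DP.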
Maximize ε(y, y′, Φ)
Bounds on "true" ε
Evaluation: We get precise and large ε, close to known upper bounds
Proven: 10%-DP (ε = 10% = 0.1)
Counterexample: 9.9%-DP
Counterexample: 15%-DP
Counterexample: 5%-DP
ε-DP Counterexamples
Goal: Maximize ε(y, y′, Φ)
Challenge 1: Expensive to compute ε precisely
Challenge 2: Search space is sparse: few y, y′, Φ lead to large ε(y, y′, Φ)
Estimate ε by sampling: ε → ε̂
Make ε̂ differentiable: ε̂ → ε̂^d
Step 1: Estimate ε
Estimate ε by sampling: ε → ε̂
Estimating ε
$$\varepsilon(y, y', \Phi) := \log \frac{\Pr[G(y) \in \Phi]}{\Pr[G(y') \in \Phi]}$$
$$\Pr[G(y) \in \Phi] \approx \frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{G,\Phi}(y)$$
[Figure: three runs of G on y return 7.3, 7.6 and 6.8; check_{G,Φ}(y) answers yes, no, yes, i.e. 67% yes and 33% no]
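The sampling estimate can be sketched in a few lines. The outputs 7.3, 7.6, 6.8 and 7.3, 7.6, 8.2 are the values shown on the slides; the interval Φ = [6.5, 7.5] and the function names are assumptions chosen so that the checks come out as yes/no/yes and yes/no/no.

```python
import math

def check(output, phi):
    """Return 1 if the output lies in the interval phi = (lo, hi), else 0."""
    lo, hi = phi
    return 1 if lo <= output <= hi else 0

def estimate_prob(outputs, phi):
    """Fraction of sampled outputs that fall into phi."""
    return sum(check(o, phi) for o in outputs) / len(outputs)

def estimate_eps(outputs_y, outputs_y_prime, phi):
    """Log-ratio of the two estimated probabilities."""
    return math.log(estimate_prob(outputs_y, phi) / estimate_prob(outputs_y_prime, phi))

phi = (6.5, 7.5)                     # hypothetical attacker set
outputs_y = [7.3, 7.6, 6.8]          # checks: yes, no, yes
outputs_y_prime = [7.3, 7.6, 8.2]    # checks: yes, no, no

p_hat = estimate_prob(outputs_y, phi)
eps_hat = estimate_eps(outputs_y, outputs_y_prime, phi)
```

Here p_hat is 2/3 and eps_hat is log 2 ≈ 0.69. With only three samples per input this number says very little, which is the precision question the next slide raises.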
How precise is our estimate?
The sampling effort n determines the precision of Pr[G(y) ∈ Φ] and Pr[G(y′) ∈ Φ], and thereby the precision of ε
Exponential search
Counterexample: 9.9% ± 10%-DP vs. Counterexample: 9.9% ± 2 · 10⁻³-DP
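The link between sampling effort and precision can be sketched as a doubling search over n. This reads "exponential search" as "double n until the interval is tight enough", which is an assumption. The sketch also swaps in a simpler interval than the one the slides build later: a delta-method normal approximation for independent samples. The Laplace mechanism and all parameter values are illustrative.

```python
import math
import random

def laplace_sample(rng, scale):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def estimate_eps(n, y, y_prime, t, scale, rng):
    """Return (eps_hat, half_width) for Phi = (-inf, t] from n runs per input."""
    hits_y = sum(y + laplace_sample(rng, scale) <= t for _ in range(n))
    hits_yp = sum(y_prime + laplace_sample(rng, scale) <= t for _ in range(n))
    if hits_y == 0 or hits_yp == 0:
        return None, math.inf             # log-ratio undefined: need more samples
    p, q = hits_y / n, hits_yp / n
    # Delta-method standard error of log(p/q), assuming independent samples.
    std_err = math.sqrt((1 - p) / (n * p) + (1 - q) / (n * q))
    return math.log(p / q), 1.645 * std_err   # 1.645: two-sided 90% normal quantile

def exponential_search(target, y, y_prime, t, scale, seed=0, n=100):
    """Double n until the 90% half-width is at most target."""
    rng = random.Random(seed)
    while True:
        eps_hat, half_width = estimate_eps(n, y, y_prime, t, scale, rng)
        if half_width <= target:
            return n, eps_hat, half_width
        n *= 2

n_used, eps_hat, half_width = exponential_search(
    0.05, y=7, y_prime=8, t=6.0, scale=10.0)
```

Because the standard error shrinks like 1/√n, halving the target half-width roughly quadruples n. Going from ± 10% to ± 2 · 10⁻³ therefore costs on the order of 2500 times more samples under this approximation.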
Estimating precisely is expensive
Estimating ε up to an error of 2 · 10⁻³ with confidence of 90%
[Figure: labels "Probabilistic guarantees", "Heuristic", "Efficient", "Heuristic"; value 10⁴]
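Applying the M-CLT (Correlation)
G(y): 7.3, 7.6, 6.8 → check: yes, no, yes
G(y′): 7.3, 7.6, 8.2 → check: yes, no, no
$$\left( \frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{G,\Phi}(y),\ \frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{G,\Phi}(y') \right)$$
Follows 2D Gaussian distribution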
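A minimal sketch of how the parameters of that 2D Gaussian could be estimated from paired runs. It assumes the correlation comes from reusing the same noise draw for y and y′, and it uses a Laplace counter with Φ = (−∞, t] as a stand-in mechanism; both are illustrative choices, not details given on the slides.

```python
import random

def laplace_sample(rng, scale):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def paired_checks(n, y, y_prime, t, scale, seed=0):
    """Run G on y and y' with shared noise; record checks for Phi = (-inf, t]."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        noise = laplace_sample(rng, scale)     # same noise for both inputs
        pairs.append((int(y + noise <= t), int(y_prime + noise <= t)))
    return pairs

def gaussian_parameters(pairs):
    """Mean vector and covariance matrix of the pair of sample averages."""
    n = len(pairs)
    mean_p = sum(a for a, _ in pairs) / n
    mean_q = sum(b for _, b in pairs) / n
    var_p = sum((a - mean_p) ** 2 for a, _ in pairs) / (n - 1)
    var_q = sum((b - mean_q) ** 2 for _, b in pairs) / (n - 1)
    cov_pq = sum((a - mean_p) * (b - mean_q) for a, b in pairs) / (n - 1)
    # Dividing by n turns per-sample (co)variances into those of the averages.
    return (mean_p, mean_q), ((var_p / n, cov_pq / n), (cov_pq / n, var_q / n))

pairs = paired_checks(2000, y=7, y_prime=8, t=6.0, scale=10.0)
mean, cov = gaussian_parameters(pairs)
```

With shared noise and y < y′, a "yes" for y′ implies a "yes" for y, so the two averages are positively correlated. That correlation is what later tightens the interval for the ratio.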
Obtaining a Confidence Interval for ε
Joint likelihood of Pr[G(y) ∈ Φ] and Pr[G(y′) ∈ Φ] → likelihood of ε(y, y′, Φ) → confidence interval for ε(y, y′, Φ)
Distribution of Gauss / Gauss (correlated): Biometrika 56, 3 (1969), 635–639. http://www.jstor.org/stable/2334671
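The step from a joint Gaussian to an interval for ε can be illustrated numerically. The sketch below does not use the closed-form ratio distribution cited on the slide. It swaps in a simple Monte Carlo substitute: draw pairs from the 2D Gaussian, take the log-ratio, and read off quantiles. The mean and covariance values are made-up inputs for illustration.

```python
import math
import random

def ci_log_ratio(mean, cov, confidence=0.9, draws=20000, seed=0):
    """Quantile interval for log(P/Q) with (P, Q) drawn from a 2D Gaussian."""
    mean_p, mean_q = mean
    var_p, cov_pq, var_q = cov[0][0], cov[0][1], cov[1][1]
    # Cholesky factor of the 2x2 covariance matrix.
    l11 = math.sqrt(var_p)
    l21 = cov_pq / l11
    l22 = math.sqrt(max(var_q - l21 * l21, 0.0))
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        p = mean_p + l11 * z1
        q = mean_q + l21 * z1 + l22 * z2
        if p > 0 and q > 0:                  # the log-ratio needs positive values
            values.append(math.log(p / q))
    values.sort()
    alpha = (1.0 - confidence) / 2.0
    return values[int(alpha * len(values))], values[int((1.0 - alpha) * len(values)) - 1]

n = 10_000                                   # hypothetical number of samples
mean = (0.45, 0.41)                          # hypothetical estimated probabilities
cov_corr = ((0.45 * 0.55 / n, 0.41 * 0.55 / n), (0.41 * 0.55 / n, 0.41 * 0.59 / n))
cov_indep = ((0.45 * 0.55 / n, 0.0), (0.0, 0.41 * 0.59 / n))

lo_corr, hi_corr = ci_log_ratio(mean, cov_corr)
lo_indep, hi_indep = ci_log_ratio(mean, cov_indep)
```

For these inputs the interval from the correlated covariance is several times narrower than the one from independent samples, which is the reason to exploit correlation in the first place.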
Step 2: Finding Counterexamples
Make ε̂ differentiable: ε̂ → ε̂^d
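How can we optimize our estimate?
maximize
$$\hat{\varepsilon}(y, y', \Phi) = \log \frac{\frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{G,\Phi}(y)}{\frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{G,\Phi}(y')}$$
Not differentiable
Goals
Translation to a differentiable program:
¬C ↝ 1 − C
C1 ∧ C2 ↝ C1 · C2
if C: y = F1 else: y = F2 ↝ y = C · F1 + (1 − C) · F2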
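The three translation rules (negation, conjunction, if-else) can be written down directly once Booleans are replaced by numbers in [0, 1]. The slides do not say how a comparison itself is relaxed, so the sigmoid in `soft_less_equal` and its `sharpness` parameter are assumptions, as are all function names.

```python
import math

def soft_less_equal(a, b, sharpness=10.0):
    """Smooth stand-in for the Boolean a <= b (sigmoid; an assumption)."""
    return 1.0 / (1.0 + math.exp(-sharpness * (b - a)))

def soft_not(c):
    """not C  ->  1 - C"""
    return 1.0 - c

def soft_and(c1, c2):
    """C1 and C2  ->  C1 * C2"""
    return c1 * c2

def soft_if(c, f1, f2):
    """if C: y = F1 else: y = F2  ->  y = C * F1 + (1 - C) * F2"""
    return c * f1 + (1.0 - c) * f2

def soft_check(output, lo, hi, sharpness=10.0):
    """Relaxed membership test for the interval [lo, hi]."""
    return soft_and(soft_less_equal(lo, output, sharpness),
                    soft_less_equal(output, hi, sharpness))

inside = soft_check(7.0, 6.5, 7.5)    # close to 1
outside = soft_check(8.2, 6.5, 7.5)   # close to 0
```

On exact Boolean inputs (0 or 1) the rules reproduce ordinary logic. In between they vary smoothly, which is what makes a gradient available. Very large `sharpness` values can overflow `math.exp`, so the sketch keeps them moderate.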
neighborhood)
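Once the estimate is differentiable, it can be improved by following its gradient. The sketch below is a heavily simplified illustration: it optimizes only the threshold t of a set Φ = (−∞, t], keeps y and y′ fixed, uses a sigmoid-relaxed check on fixed noise samples, and adds a backtracking step-size rule. All of these choices, and the Laplace mechanism, are assumptions; the slides describe a search over y, y′ and Φ.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def laplace_sample(rng, scale):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def soft_eps(t, outs_y, outs_yp, k):
    """Differentiable estimate for Phi = (-inf, t]: log-ratio of soft hit counts."""
    num = sum(sigmoid(k * (t - o)) for o in outs_y)
    den = sum(sigmoid(k * (t - o)) for o in outs_yp)
    return math.log(num / den)

def soft_eps_grad(t, outs_y, outs_yp, k):
    """d/dt of soft_eps, using sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))."""
    def total_and_slope(outs):
        s = [sigmoid(k * (t - o)) for o in outs]
        return sum(s), k * sum(v * (1.0 - v) for v in s)
    num, dnum = total_and_slope(outs_y)
    den, dden = total_and_slope(outs_yp)
    return dnum / num - dden / den

def gradient_ascent(t, outs_y, outs_yp, k=1.0, step=20.0, iters=30):
    """Move t along the gradient; halve the step whenever a move does not help."""
    value = soft_eps(t, outs_y, outs_yp, k)
    for _ in range(iters):
        candidate = t + step * soft_eps_grad(t, outs_y, outs_yp, k)
        candidate_value = soft_eps(candidate, outs_y, outs_yp, k)
        if candidate_value > value:
            t, value = candidate, candidate_value
        else:
            step *= 0.5
    return t, value

rng = random.Random(0)
noise = [laplace_sample(rng, 5.0) for _ in range(2000)]   # fixed, shared noise
outs_y = [7 + z for z in noise]
outs_yp = [8 + z for z in noise]

t_start = 12.0
eps_start = soft_eps(t_start, outs_y, outs_yp, 1.0)
t_end, eps_end = gradient_ascent(t_start, outs_y, outs_yp)
```

Maximizing the estimate by gradient ascent is the same as running gradient descent on its negative. Far outside the sampled range the sigmoid tails dominate and the relaxed estimate stops tracking the true ε, so a result found this way should be re-checked with the non-relaxed estimate and its confidence interval.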
Main differences to Ding et al.
Dimension | Ding et al. | This work
Problem statement | ε(y, y′, Φ) > ε₀? | Maximize ε(y, y′, Φ)
Approach | Statistical tests | Estimate + confidence interval
Search | By patterns | Gradient descent (incremental)
Evaluation
random search?
Exact solver (PSI) for ground truth
Precision of Differentiable Estimate
[Figure: ε per algorithm, comparing the differentiable estimate ε̂^d with ε]
Random vs Optimized
[Figure: legend "Random start" vs. "Optimized"]
Conclusion
Differential Privacy
ε-DP Counterexamples
Estimate ε: ε → ε̂
Finding Counterexamples: ε̂ → ε̂^d