Mar 21, 2023

SLIDE 1

DP-Finder: Finding Differential Privacy Violations by Sampling and Optimization

Benjamin Bichsel, Timon Gehr, Dana Drachsler-Cohen, Petar Tsankov, Martin Vechev

SLIDE 2

Differential Privacy – Basic Setting

# disease

SLIDE 3

Differential Privacy – Basic Setting

# disease + noise: 7.3

What about my privacy?

SLIDE 4

Differential Privacy - Intuition

Change my data

# disease + noise: 7.3 → 7.6
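A minimal sketch of the "# disease + noise" query, assuming the noise is drawn from a Laplace distribution with scale 1/ε (the slides do not name the noise distribution). The names `laplace_noise`, `noisy_count`, `database`, and `neighbor` are hypothetical and chosen only for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, epsilon, rng):
    # ASSUMPTION: Laplace noise with scale 1/epsilon, the textbook choice
    # for a counting query (changing one record moves the count by at most 1).
    true_count = sum(records)
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
database = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]   # 7 patients have the disease
neighbor = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]   # same database with one record changed

print(noisy_count(database, epsilon=0.5, rng=rng))
print(noisy_count(neighbor, epsilon=0.5, rng=rng))
```

Because the two true counts differ by only 1 while the noise has scale 2, a single noisy answer says little about which of the two databases produced it.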

SLIDE 5

Differential Privacy – More Abstractly

Neighboring inputs: x, x′

Outputs: F(x), F(x′)

Attacker check: F(x) ∈ Φ? versus F(x′) ∈ Φ?

SLIDE 6

Differential Privacy – Definition

Neighboring inputs: x, x′

Outputs: F(x), F(x′)

Attacker check: F(x) ∈ Φ? versus F(x′) ∈ Φ?

ε-DP: for all neighboring x, x′ and all sets Φ,

$$\frac{\Pr[F(x) \in \Phi]}{\Pr[F(x') \in \Phi]} \le \exp(\varepsilon) \approx 1 + \varepsilon \quad \text{(for small } \varepsilon\text{)}$$

Challenges induced by DP:

Proving/checking ε-DP is hard (buggy algorithms)
Proof strategies not complete
Proofs only provide upper bounds
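
The approximation exp(ε) ≈ 1 + ε used in the definition is the first-order Taylor expansion of the exponential. As a worked instance, taking ε = 0.1 purely as an example value:

$$e^{\varepsilon} = 1 + \varepsilon + \frac{\varepsilon^{2}}{2} + \dots, \qquad e^{0.1} \approx 1.105$$

So for ε = 0.1 the two probabilities may differ by a factor of at most about 1.105, close to the 1 + ε = 1.1 suggested by the approximation.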

SLIDE 7

ε-DP Counterexamples

Triples (x, x′, Φ) that violate ε-DP:

$$\frac{\Pr[F(x) \in \Phi]}{\Pr[F(x') \in \Phi]} > \exp(\varepsilon) \iff \log \frac{\Pr[F(x) \in \Phi]}{\Pr[F(x') \in \Phi]} > \varepsilon$$

SLIDE 8

ε-DP Counterexamples

Triples (x, x′, Φ) that violate ε-DP:

$$\frac{\Pr[F(x) \in \Phi]}{\Pr[F(x') \in \Phi]} > \exp(\varepsilon) \iff \log \frac{\Pr[F(x) \in \Phi]}{\Pr[F(x') \in \Phi]} > \varepsilon$$

Maximize ε(x, x′, Φ)
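
To make the counterexample condition concrete, here is a small sketch on randomized response, a mechanism chosen only for illustration (it does not appear in the slides). It reports the true bit with probability 0.75 and the flipped bit otherwise, so both probabilities in the log-ratio are known exactly. All function names are hypothetical.

```python
import math


def randomized_response_prob(true_bit, reported_bit, p_truth=0.75):
    # Exact probability that the mechanism reports `reported_bit`.
    return p_truth if reported_bit == true_bit else 1.0 - p_truth


def epsilon_of(prob_x, prob_x_prime):
    # Log-ratio of the two probabilities of landing in Phi.
    return math.log(prob_x / prob_x_prime)


def is_counterexample(prob_x, prob_x_prime, claimed_epsilon):
    # The triple violates claimed_epsilon-DP if the log-ratio exceeds it.
    return epsilon_of(prob_x, prob_x_prime) > claimed_epsilon


# Inputs 1 and 0 are neighbors; Phi = {1}.
p = randomized_response_prob(true_bit=1, reported_bit=1)
q = randomized_response_prob(true_bit=0, reported_bit=1)
print(epsilon_of(p, q))              # log(3), about 1.0986
print(is_counterexample(p, q, 1.0))  # True
print(is_counterexample(p, q, 1.1))  # False
```

The same triple is a counterexample to 1.0-DP but not to 1.1-DP, which is why a larger log-ratio gives a stronger statement.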

SLIDE 9

Bounds on "true" ε

Evaluation: We get precise and large ε, close to known upper bounds

Proven: 10%-DP (ε = 10% = 0.1)
Counterexample: 9.9%-DP
Counterexample: 15%-DP
Counterexample: 5%-DP

SLIDE 10

ε-DP Counterexamples

Goal: Maximize ε(x, x′, Φ)

Challenge 1: Expensive to compute ε precisely → Estimate ε by sampling
Challenge 2: Search space is sparse: few x, x′, Φ lead to large ε(x, x′, Φ) → Make $\hat{\varepsilon}$ differentiable

$\varepsilon \to \hat{\varepsilon} \to \hat{\varepsilon}^{d}$

SLIDE 11

Step 1: Estimate ε

Estimate ε by sampling

$\varepsilon \to \hat{\varepsilon}$

SLIDE 12

Estimating ε

$$\varepsilon(x, x', \Phi) := \log \frac{\Pr[F(x) \in \Phi]}{\Pr[F(x') \in \Phi]}$$

SLIDE 13

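Estimating ε

$$\varepsilon(x, x', \Phi) := \log \frac{\Pr[F(x) \in \Phi]}{\Pr[F(x') \in \Phi]}$$

$$\widehat{\Pr}[F(x) \in \Phi] = \frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{F,\Phi}(x)$$

Runs of F on x, each followed by the check:

F(x) = 7.3 → yes
F(x) = 7.6 → no
F(x) = 6.8 → yes

67% yes, 33% no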
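
The sampling estimator can be sketched as follows: run the algorithm n times, count how often the output lands in Φ, and take the log of the ratio of the two fractions. The algorithm under test here (a count plus Laplace noise), the set Φ, and every function name are assumptions made for illustration.

```python
import math
import random


def noisy_count(count, scale, rng):
    # Hypothetical algorithm under test: a count plus Laplace noise.
    return count + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def estimate_probability(run, in_phi, n):
    # Fraction of n independent runs whose output lands in Phi.
    hits = sum(1 for _ in range(n) if in_phi(run()))
    return hits / n


def estimate_epsilon(run_x, run_x_prime, in_phi, n):
    p_hat = estimate_probability(run_x, in_phi, n)
    q_hat = estimate_probability(run_x_prime, in_phi, n)
    return math.log(p_hat / q_hat)  # raises if either estimate is 0


rng = random.Random(0)
in_phi = lambda output: output >= 7.5          # Phi = [7.5, infinity)
eps_hat = estimate_epsilon(
    lambda: noisy_count(8, 2.0, rng),          # runs on the first input
    lambda: noisy_count(7, 2.0, rng),          # runs on the neighboring input
    in_phi,
    n=20000,
)
print(eps_hat)
```

For this toy setup the exact value is about 0.45, so the printed estimate should land near it, with the gap shrinking as n grows.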

SLIDE 14

How precise is our estimate?

Precision of Pr[F(x) ∈ Φ] and Pr[F(x′) ∈ Φ]
Sampling effort n
Precision of ε

Exponential search

Counterexample: 9.9% ± 10%-DP
Counterexample: 9.9% ± 2 ∙ 10⁻³-DP
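
One reading of "Exponential search" is that the sampling effort n is doubled until the estimate is precise enough. The sketch below illustrates that idea for a single probability, using a plain normal-approximation interval as a stand-in for the confidence interval constructed in the talk. The function names, the starting n, the cap on n, and the z-value are all assumptions.

```python
import math
import random


def half_width(p_hat, n, z=1.645):
    # Normal-approximation (Wald) half-width; z = 1.645 gives roughly 90% confidence.
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


def exponential_search(check, target, n_start=100, n_max=10**7):
    # Draw a fresh batch of n samples, and double n until the interval
    # half-width drops to `target` or n reaches the cap.
    n = n_start
    while True:
        p_hat = sum(check() for _ in range(n)) / n
        width = half_width(p_hat, n)
        if width <= target or n >= n_max:
            return n, p_hat, width
        n *= 2


rng = random.Random(0)
check = lambda: rng.random() < 0.4   # stand-in for one check, true probability 0.4
n, p_hat, width = exponential_search(check, target=0.01)
print(n, p_hat, width)
```

Since the half-width shrinks only with the square root of n, halving the error costs roughly four times as many samples, which is why tight intervals are expensive.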

SLIDE 15

Estimating precisely is expensive

Estimating ε up to an error of 2 ∙ 10⁻³ with confidence of 90%

[Figure: chart comparing Probabilistic guarantees, Heuristic, and Efficient Heuristic; axis tick 10⁴]

SLIDE 16

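Runs of F on x: 7.3, 7.6, 6.8 (check: yes, no, yes)

Runs of F on x′: 7.3, 7.6, 8.2 (check: yes, no, no)

$$\left( \frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{F,\Phi}(x),\; \frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{F,\Phi}(x') \right)$$

Follows 2D Gaussian distribution

Applying the M-CLT (Correlation)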
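
The multivariate central limit theorem says that the pair of sample means is approximately Gaussian, with a covariance equal to the per-sample covariance divided by n. The sketch below estimates that per-sample covariance from paired checks. It assumes the correlation comes from reusing the same noise draw for both inputs, which the slide does not state; the toy algorithm and all names are likewise hypothetical.

```python
import random


def paired_checks(count_x, count_x_prime, threshold, scale, n, rng):
    # ASSUMPTION: run i uses the same Laplace noise draw for both inputs,
    # which makes the two check outcomes of a pair correlated.
    pairs = []
    for _ in range(n):
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        pairs.append((float(count_x + noise >= threshold),
                      float(count_x_prime + noise >= threshold)))
    return pairs


def mean_and_covariance(pairs):
    # Sample mean vector and 2x2 sample covariance of the paired checks.
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs) / (n - 1)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs) / (n - 1)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs) / (n - 1)
    return (mean_a, mean_b), ((var_a, cov), (cov, var_b))


rng = random.Random(0)
pairs = paired_checks(8, 7, threshold=7.5, scale=2.0, n=5000, rng=rng)
mean, covariance = mean_and_covariance(pairs)
print(mean)        # the two probability estimates
print(covariance)  # divide by n for the covariance of the mean vector
```

The off-diagonal entry is what distinguishes this from treating the two estimates as independent, and it is the reason a ratio distribution for correlated Gaussians is needed.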

SLIDE 17

Obtaining a Confidence Interval for ε

Distribution of Gauss / Gauss (correlated):

D. V. Hinkley. 1969. On the Ratio of Two Correlated Normal Random Variables. Biometrika 56, 3 (1969), 635–639. http://www.jstor.org/stable/2334671

Joint likelihood of Pr[F(x) ∈ Φ], Pr[F(x′) ∈ Φ] → Likelihood of ε(x, x′, Φ) → Confidence interval for ε(x, x′, Φ)

SLIDE 18

How precise is our estimate?

Counterexample: 9.9% ± 10%-DP
Counterexample: 9.9% ± 2 ∙ 10⁻³-DP

SLIDE 19

Step 2: Finding Counterexamples

Make $\hat{\varepsilon}$ differentiable

$\hat{\varepsilon} \to \hat{\varepsilon}^{d}$

SLIDE 20

How can we optimize our estimate?

maximize

$$\hat{\varepsilon}(x, x', \Phi) = \log \frac{\frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{F,\Phi}(x)}{\frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{F,\Phi}(x')}$$

Not differentiable

¬B ↝ 1 − B
B₁ ∧ B₂ ↝ B₁ ∙ B₂
if B: x = E₁ else: x = E₂ ↝ x = B ∙ E₁ + (1 − B) ∙ E₂

Goals

Make differentiable
Preserve semantics
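
The rewriting rules for negation, conjunction, and if-else can be sketched directly as arithmetic on values in [0, 1]. The slide does not say how a comparison itself is made smooth, so `soft_geq` below uses a logistic curve purely as an assumption; all function names and the `sharpness` parameter are hypothetical.

```python
import math


def soft_not(b):
    return 1.0 - b                       # negation becomes 1 - B


def soft_and(b1, b2):
    return b1 * b2                       # conjunction becomes a product


def soft_if(b, e1, e2):
    return b * e1 + (1.0 - b) * e2       # if-else becomes a weighted blend


def soft_geq(value, threshold, sharpness=10.0):
    # ASSUMPTION: a logistic curve stands in for the hard comparison
    # value >= threshold; written with tanh so it cannot overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * sharpness * (value - threshold)))


# On exact booleans (0.0 / 1.0) the rules agree with ordinary logic.
for b1 in (0.0, 1.0):
    for b2 in (0.0, 1.0):
        assert soft_and(b1, b2) == float(bool(b1) and bool(b2))
    assert soft_not(b1) == float(not bool(b1))

print(soft_if(soft_geq(7.6, 7.5), 1.0, 0.0))   # a soft "yes", between 0.5 and 1
```

The assertions illustrate the "preserve semantics" goal: whenever every condition is exactly 0 or 1, the arithmetic versions return what the original Boolean program would.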

SLIDE 21

How can we optimize our estimate?

maximize

$$\hat{\varepsilon}(x, x', \Phi) = \log \frac{\frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{F,\Phi}(x)}{\frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{F,\Phi}(x')}$$

Not differentiable

Maximize using SLSQP (supports hard constraints for neighborhood)

Random starting point (+ restart)
What about division by zero?
What about very small denominators?

SLIDE 22

Main differences to Ding et al.

Dimension | Ding et al. | This work
Problem statement | ε(x, x′, Φ) > ε₀? | Maximize ε(x, x′, Φ)
Approach | Statistical tests | Estimate + confidence interval
Search | By patterns | Gradient descent (incremental)

SLIDE 23

Evaluation

How precise is the differentiable estimate?
How efficient is DP-Finder in finding violations compared to random search?

Exact solver (PSI) for ground truth
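
As a rough picture of the random-search baseline, the sketch below samples neighboring inputs and sets Φ at random for a toy Laplace-noised count and scores each triple by its exact ε. The exact value here comes from the closed-form Laplace tail, not from PSI, and the toy algorithm, the search ranges, and all names are assumptions for illustration.

```python
import math
import random


def tail_probability(count, threshold, scale):
    # Exact Pr[count + Laplace(scale) >= threshold].
    z = threshold - count
    if z >= 0:
        return 0.5 * math.exp(-z / scale)
    return 1.0 - 0.5 * math.exp(z / scale)


def exact_epsilon(count_x, count_x_prime, threshold, scale):
    return math.log(tail_probability(count_x, threshold, scale)
                    / tail_probability(count_x_prime, threshold, scale))


def random_search(trials, scale, rng):
    best_eps, best_triple = -math.inf, None
    for _ in range(trials):
        count_x = rng.randint(0, 10)
        count_x_prime = count_x + rng.choice((-1, 1))   # neighboring input
        threshold = rng.uniform(-5.0, 15.0)             # Phi = [threshold, infinity)
        eps = exact_epsilon(count_x, count_x_prime, threshold, scale)
        if eps > best_eps:
            best_eps, best_triple = eps, (count_x, count_x_prime, threshold)
    return best_eps, best_triple


best_eps, best_triple = random_search(trials=200, scale=10.0, rng=random.Random(0))
print(best_eps, best_triple)   # best_eps cannot exceed 1 / scale = 0.1
```

In this toy case large values are easy to hit, because any threshold above both counts attains the bound; the sparse search spaces described in the talk are the situation where random sampling is expected to struggle.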

SLIDE 24

Precision of Differentiable Estimate

[Figure: ε and $\hat{\varepsilon}^{d}$ for each algorithm]

SLIDE 25

Random vs Optimized

[Figure: Random start vs. Optimized]

SLIDE 26

Conclusion

Differential Privacy

ε-DP Counterexamples: (x, x′, Φ)

Estimate ε: $\varepsilon \to \hat{\varepsilon}$

Finding Counterexamples: $\hat{\varepsilon} \to \hat{\varepsilon}^{d}$