slide-1
SLIDE 1

DP-Finder: Finding Differential Privacy Violations by Sampling and Optimization

Benjamin Bichsel, Timon Gehr, Dana Drachsler-Cohen, Petar Tsankov, Martin Vechev

slide-2
SLIDE 2

Differential Privacy – Basic Setting

# disease: 7

slide-3
SLIDE 3

Differential Privacy – Basic Setting

# disease + noise: 7.3

What about my privacy?

slide-4
SLIDE 4

Differential Privacy – Intuition

Change my data

# disease + noise: 7.3 or 7.6?

slide-5
SLIDE 5

Differential Privacy – More Abstractly

Neighboring inputs: x, x′
Outputs: F(x), F(x′)

Attacker check: F(x) ∈ Φ?
Attacker check: F(x′) ∈ Φ?

slide-6
SLIDE 6

Differential Privacy – Definition

Neighboring inputs: x, x′
Outputs: F(x), F(x′)

Attacker check: F(x) ∈ Φ?
Attacker check: F(x′) ∈ Φ?

ε-DP: $\frac{\Pr[F(x) \in \Phi]}{\Pr[F(x') \in \Phi]} \le \exp(\varepsilon) \approx 1 + \varepsilon$

Challenges induced by DP:

  • Proving/checking ε-DP is hard (buggy algorithms)
  • Proof strategies not complete
  • Proofs only provide upper bounds
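
As a worked instance of the definition, the sketch below evaluates the ratio exactly for one concrete mechanism. It assumes (this is not stated on the slides) that F adds Laplace noise of scale b = 1/ε to the disease count, that the neighboring counts are 7 and 8, and that Φ is a half-line (−∞, t]. The function names are hypothetical.

```python
import math

def laplace_cdf(u: float, b: float) -> float:
    """CDF of a zero-mean Laplace distribution with scale b, evaluated at u."""
    if u < 0:
        return 0.5 * math.exp(u / b)
    return 1.0 - 0.5 * math.exp(-u / b)

def exact_eps(x: float, x_prime: float, t: float, b: float) -> float:
    """log(Pr[F(x) <= t] / Pr[F(x') <= t]) for F(x) = x + Laplace(b) noise."""
    p = laplace_cdf(t - x, b)
    p_prime = laplace_cdf(t - x_prime, b)
    return math.log(p / p_prime)

b = 10.0  # assumed noise scale 1/eps for eps = 0.1
tight = exact_eps(7, 8, t=7.0, b=b)   # Phi = (-inf, 7]
loose = exact_eps(7, 8, t=20.0, b=b)  # Phi = (-inf, 20]
print(tight, loose)
```

Under these assumptions the log-ratio for t = 7 equals ε = 0.1 up to rounding, so the bound is attained, while t = 20 gives a much smaller value: only some choices of Φ witness the largest ratio.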
slide-7
SLIDE 7

ε-DP Counterexamples

Triples (x, x′, Φ) that violate ε-DP:

$\frac{\Pr[F(x) \in \Phi]}{\Pr[F(x') \in \Phi]} > \exp(\varepsilon) \iff \log \frac{\Pr[F(x) \in \Phi]}{\Pr[F(x') \in \Phi]} > \varepsilon$

slide-8
SLIDE 8

ε-DP Counterexamples

Triples (x, x′, Φ) that violate ε-DP:

$\frac{\Pr[F(x) \in \Phi]}{\Pr[F(x') \in \Phi]} > \exp(\varepsilon) \iff \log \frac{\Pr[F(x) \in \Phi]}{\Pr[F(x') \in \Phi]} > \varepsilon$

Maximize ε(x, x′, Φ)
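
To make "maximize over triples" concrete, here is a minimal sketch that searches a small grid exhaustively. It is not the sampling and optimization approach of the talk. It assumes a hypothetical buggy mechanism that claims 0.1-DP but adds Laplace noise of scale 5 instead of 10, integer counts as inputs, and sets Φ = (−∞, t]. All names are illustrative.

```python
import math

def laplace_cdf(u, b):
    """CDF of a zero-mean Laplace distribution with scale b."""
    return 0.5 * math.exp(u / b) if u < 0 else 1.0 - 0.5 * math.exp(-u / b)

def eps(x, x_prime, t, b):
    """Exact log-ratio for F(x) = x + Laplace(b) noise and Phi = (-inf, t]."""
    return math.log(laplace_cdf(t - x, b) / laplace_cdf(t - x_prime, b))

claimed_eps = 0.1
actual_scale = 5.0  # hypothetical bug: the scale should be 1 / claimed_eps = 10

# Candidate triples (x, x', t): neighboring counts differ by exactly one.
candidates = [
    (x, x_prime, k / 2)
    for x in range(0, 11)
    for x_prime in (x - 1, x + 1)
    for k in range(-10, 31)
]
best = max(candidates, key=lambda c: eps(*c, b=actual_scale))
best_eps = eps(*best, b=actual_scale)
is_counterexample = best_eps > claimed_eps
print(best, best_eps, is_counterexample)
```

The best triple found reaches a log-ratio of about 0.2, which exceeds the claimed 0.1 and is therefore a counterexample for this assumed mechanism.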

slide-9
SLIDE 9

Bounds on "true" ε

Evaluation: We get precise and large ε, close to known upper bounds

Proven: 10%-DP (ε = 10% = 0.1)

  • Counterexample: 9.9%-DP
  • Counterexample: 15%-DP
  • Counterexample: 5%-DP
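
One way to read these statements, using only the definition of ε-DP: a proof bounds the true privacy level from above, and every counterexample bounds it from below. Here ε* denotes the smallest ε for which the algorithm is ε-DP; the symbol is introduced for this note and does not appear on the slides.

```latex
\varepsilon(x, x', \Phi) \;\le\; \varepsilon^{*} \;\le\; \varepsilon_{\text{proven}}
\qquad\Longrightarrow\qquad
\begin{cases}
0.099 \le \varepsilon^{*} \le 0.1 & \text{(9.9\% counterexample: bounds nearly tight)} \\
0.05 \le \varepsilon^{*} \le 0.1 & \text{(5\% counterexample: bounds loose)} \\
0.15 \le \varepsilon^{*} \le 0.1 & \text{(15\% counterexample: contradiction, so the proof or the implementation is wrong)}
\end{cases}
```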

slide-10
SLIDE 10

ε-DP Counterexamples

Goal: Maximize ε(x, x′, Φ)

Challenge 1: Expensive to compute ε precisely
Challenge 2: Search space is sparse: few x, x′, Φ lead to large ε(x, x′, Φ)

Estimate ε by sampling: ε → ε̂
Make ε̂ differentiable: ε̂ → ε̂^d

slide-11
SLIDE 11

Step 1: Estimate ε

Estimate ε by sampling: ε → ε̂

slide-12
SLIDE 12

Estimating ε

$\varepsilon(x, x', \Phi) := \log \frac{\Pr[F(x) \in \Phi]}{\Pr[F(x') \in \Phi]}$

slide-13
SLIDE 13

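Estimating ε

$\varepsilon(x, x', \Phi) := \log \frac{\Pr[F(x) \in \Phi]}{\Pr[F(x') \in \Phi]}$

$\widehat{\Pr}[F(x) \in \Phi] = \frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{F,\Phi}(x)$

Example with three samples of F(x):

  F(x)    check^i_{F,Φ}(x)
  7.3     yes
  7.6     no
  6.8     yes

Estimate: 67% yes, 33% no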
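
The estimator above can be sketched in a few lines. The first call reproduces the three-sample example from the slide. The simulation that follows assumes, purely for illustration, that F adds Laplace noise of scale 10 to a count and that Φ = (−∞, 7]; the function names are hypothetical.

```python
import math
import random

def estimate_prob(checks):
    """Fraction of samples whose check succeeded: (1/n) * sum_i check_i."""
    return sum(checks) / len(checks)

# The three samples from the slide: yes, no, yes.
slide_estimate = estimate_prob([True, False, True])

def sample_checks(x, t, b, n, rng):
    """n checks of 'x + Laplace(b) noise <= t' (assumed mechanism and Phi)."""
    checks = []
    for _ in range(n):
        # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
        noise = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
        checks.append(x + noise <= t)
    return checks

rng = random.Random(0)
n = 200_000
p_hat = estimate_prob(sample_checks(7, 7.0, 10.0, n, rng))
p_hat_prime = estimate_prob(sample_checks(8, 7.0, 10.0, n, rng))
eps_hat = math.log(p_hat / p_hat_prime)
print(slide_estimate, eps_hat)
```

For this assumed setup the exact value is ε = 0.1, and the estimate fluctuates around it with an error that shrinks as n grows.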

slide-14
SLIDE 14

How precise is our estimate?

Sampling effort n
Precision of Pr[F(x) ∈ Φ] and Pr[F(x′) ∈ Φ]
Precision of ε

Exponential search

Counterexample: (9.9% ± 10%)-DP vs. Counterexample: (9.9% ± 2·10⁻³)-DP

slide-15
SLIDE 15

Estimating precisely is expensive

Estimating ε up to an error of 2·10⁻³ with confidence of 90%

[Figure: labels "Probabilistic guarantees", "Heuristic", "Efficient"; value 10⁴]

slide-16
SLIDE 16

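Applying the M-CLT (Correlation)

Samples of F(x): 7.3, 7.6, 6.8; checks: yes, no, yes
Samples of F(x′): 7.3, 7.6, 8.2; checks: yes, no, no

$\left( \frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{F,\Phi}(x),\ \frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{F,\Phi}(x') \right)$ follows a 2D Gaussian distribution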
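
By the multivariate central limit theorem, the pair of sample means is approximately Gaussian, with the covariance matrix of a single pair of checks divided by n. The sketch below computes the empirical mean vector and covariance entries from the three paired checks on the slide. Treating sample i of F(x) and sample i of F(x′) as a pair is an assumption, and three samples are far too few for the approximation to be meaningful; the numbers only show the bookkeeping.

```python
from fractions import Fraction

# Paired checks from the slide: sample i of F(x) and sample i of F(x').
checks_x = [1, 0, 1]        # yes, no, yes
checks_x_prime = [1, 0, 0]  # yes, no, no
n = len(checks_x)

mean_x = Fraction(sum(checks_x), n)
mean_x_prime = Fraction(sum(checks_x_prime), n)

def cov(a, mean_a, b, mean_b):
    """Empirical covariance (dividing by n) of two paired 0/1 sequences."""
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n

var_x = cov(checks_x, mean_x, checks_x, mean_x)
var_x_prime = cov(checks_x_prime, mean_x_prime, checks_x_prime, mean_x_prime)
cov_xx = cov(checks_x, mean_x, checks_x_prime, mean_x_prime)
print(mean_x, mean_x_prime, var_x, var_x_prime, cov_xx)
```

A positive off-diagonal entry means the two estimates tend to err in the same direction, which reduces the variance of their ratio.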

slide-17
SLIDE 17

Obtaining a Confidence Interval for ε

Distribution of Gauss / Gauss (correlated):

  • D. V. Hinkley. 1969. On the Ratio of Two Correlated Normal Random Variables. Biometrika 56, 3 (1969), 635–639. http://www.jstor.org/stable/2334671

Joint likelihood of Pr[F(x) ∈ Φ] and Pr[F(x′) ∈ Φ]
Likelihood of ε(x, x′, Φ)
Confidence interval for ε(x, x′, Φ)
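
The chain from joint likelihood to confidence interval can be illustrated with a simpler stand-in. The sketch below uses the delta method (a first-order normal approximation of the log-ratio), not the ratio distribution from the Hinkley reference cited on the slide. It assumes a Laplace-noised count with Φ = (−∞, 7] and assumes both inputs reuse the same noise draws; `log_ratio_ci` is a hypothetical helper.

```python
import math
import random
from statistics import NormalDist

def log_ratio_ci(checks_x, checks_x_prime, confidence=0.9):
    """Delta-method interval for log(mean(checks_x) / mean(checks_x_prime))."""
    n = len(checks_x)
    p = sum(checks_x) / n
    q = sum(checks_x_prime) / n
    # Empirical covariance of the paired 0/1 indicators.
    cov = sum(a * b for a, b in zip(checks_x, checks_x_prime)) / n - p * q
    # Var(log p_hat - log q_hat) ~ (Var_p / p^2 + Var_q / q^2 - 2 Cov / (p q)) / n
    var = (p * (1 - p) / p**2 + q * (1 - q) / q**2 - 2 * cov / (p * q)) / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    eps_hat = math.log(p / q)
    half_width = z * math.sqrt(max(var, 0.0))
    return eps_hat - half_width, eps_hat + half_width

rng = random.Random(1)
n, b, t = 100_000, 10.0, 7.0
# Assumption: both runs reuse the same noise draw, which correlates the checks.
noise = [rng.expovariate(1 / b) - rng.expovariate(1 / b) for _ in range(n)]
checks_x = [7 + e <= t for e in noise]
checks_x_prime = [8 + e <= t for e in noise]
low, high = log_ratio_ci(checks_x, checks_x_prime)
print(low, high)
```

The covariance term is what makes correlated sampling pay off: the more strongly the two check sequences agree, the narrower the interval for the same n. The helper does not handle empty estimates (p or q equal to zero).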

slide-18
SLIDE 18

How precise is our estimate?

Counterexample: (9.9% ± 10%)-DP vs. Counterexample: (9.9% ± 2·10⁻³)-DP

slide-19
SLIDE 19

Step 2: Finding Counterexamples

Make ε̂ differentiable: ε̂ → ε̂^d

slide-20
SLIDE 20

How can we optimize our estimate?

maximize $\hat{\varepsilon}(x, x', \Phi) = \log \frac{\frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{F,\Phi}(x)}{\frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{F,\Phi}(x')}$ (not differentiable)

  • ¬B ↝ 1 − B
  • B1 ∧ B2 ↝ B1 · B2
  • if B: x = E1 else: x = E2 ↝ x = B · E1 + (1 − B) · E2

Goals

  • Make differentiable
  • Preserve semantics
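
The three rewrite rules can be written as ordinary arithmetic on values in [0, 1]. The sketch below checks that they agree with Boolean semantics on crisp inputs 0 and 1. The `soft_less` function is a hypothetical logistic relaxation of a comparison, added here only to produce a fractional B; the slides do not say how comparisons are relaxed.

```python
import math

def soft_not(b: float) -> float:
    """not B  ~>  1 - B"""
    return 1.0 - b

def soft_and(b1: float, b2: float) -> float:
    """B1 and B2  ~>  B1 * B2"""
    return b1 * b2

def soft_ite(b: float, e1: float, e2: float) -> float:
    """if B: E1 else: E2  ~>  B * E1 + (1 - B) * E2"""
    return b * e1 + (1.0 - b) * e2

def soft_less(a: float, c: float, sharpness: float = 10.0) -> float:
    """Hypothetical smooth stand-in for 'a < c' (logistic), not from the slides."""
    return 1.0 / (1.0 + math.exp(-sharpness * (c - a)))

# Preserve semantics: on crisp inputs the rules match the Boolean operators.
for b1 in (0.0, 1.0):
    for b2 in (0.0, 1.0):
        assert soft_not(b1) == float(not b1)
        assert soft_and(b1, b2) == float(bool(b1) and bool(b2))
        assert soft_ite(b1, 3.0, 5.0) == (3.0 if b1 else 5.0)

# A fractional condition blends both branches instead of picking one.
blended = soft_ite(soft_less(6.8, 7.0), 1.0, 0.0)
print(blended)
```

Because every operation is a sum or product, the rewritten program is differentiable in its inputs wherever the relaxed conditions are.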
slide-21
SLIDE 21

How can we optimize our estimate?

maximize $\hat{\varepsilon}(x, x', \Phi) = \log \frac{\frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{F,\Phi}(x)}{\frac{1}{n} \sum_{i=1}^{n} \mathrm{check}^{i}_{F,\Phi}(x')}$ (not differentiable)

  • Maximize using SLSQP (supports hard constraints for neighborhood)
  • Random starting point (+ restart)
  • What about division by zero?
  • What about very small denominators?
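
The search loop (random start, restarts, hard neighborhood constraint) can be sketched with a simpler optimizer. The code below uses plain projected gradient ascent as a stand-in for SLSQP, and it maximizes the exact log-ratio of an assumed Laplace-noised count rather than the sampled differentiable estimate. The neighborhood is modeled as |x − x′| ≤ 1 with x′ = x + d, and all constants and names are hypothetical. It does not address the division-by-zero questions raised above.

```python
import math
import random

B, T = 5.0, 7.0  # assumed Laplace scale and threshold: Phi = (-inf, T]

def log_cdf_and_hazard(u):
    """log CDF and pdf/CDF of a zero-mean Laplace(B) distribution at u."""
    if u < 0:
        return math.log(0.5) + u / B, 1.0 / B
    s = math.exp(-u / B)
    return math.log(1.0 - 0.5 * s), s / (B * (2.0 - s))

def eps_and_grad(x, d):
    """eps for the pair (x, x + d) and its gradient with respect to (x, d)."""
    log_p, h = log_cdf_and_hazard(T - x)
    log_q, h_prime = log_cdf_and_hazard(T - x - d)
    return log_p - log_q, (h_prime - h, h_prime)

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def ascend(x, d, steps=500, lr=5.0):
    """Projected gradient ascent from one starting point."""
    for _ in range(steps):
        _, (gx, gd) = eps_and_grad(x, d)
        x = clip(x + lr * gx, 0.0, 20.0)  # keep the count in a fixed range
        d = clip(d + lr * gd, -1.0, 1.0)  # hard constraint: |x - x'| <= 1
    return eps_and_grad(x, d)[0], x, d

rng = random.Random(0)
starts = [(rng.uniform(0, 20), rng.uniform(-1, 1)) for _ in range(5)]
best_eps, best_x, best_d = max(ascend(x0, d0) for x0, d0 in starts)
print(best_eps, best_x, best_d)
```

Projection keeps every iterate feasible, which plays the role that hard constraints play in SLSQP. Restarts matter more on objectives with flat or misleading regions than on this toy problem.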
slide-22
SLIDE 22

Main differences to Ding et al.

Dimension            Ding et al.            This work
Problem statement    ε(x, x′, Φ) > ε₀?      Maximize ε(x, x′, Φ)
Approach             Statistical tests      Estimate + confidence interval
Search               By patterns            Gradient descent (incremental)

slide-23
SLIDE 23

Evaluation

  • How precise is the differentiable estimate?
  • How efficient is DP-Finder in finding violations compared to random search?

Exact solver (PSI) for ground truth

slide-24
SLIDE 24

Precision of Differentiable Estimate

[Figure: ε̂^d compared with ε; axis labels: ε, Algorithms]

slide-25
SLIDE 25

Random vs Optimized

[Figure: panels "Random start" and "Optimized"]

slide-26
SLIDE 26

Conclusion

  • Differential Privacy
  • ε-DP Counterexamples: triples (x, x′, Φ)
  • Estimate ε: ε → ε̂
  • Finding Counterexamples: ε̂ → ε̂^d