Verifying Differentially Private Bayesian Inference Marco Gaboardi - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Marco Gaboardi

University of Dundee

Verifying Differentially Private Bayesian Inference

Joint work with G. Barthe, G.P. Farina, E.J. Gallego Arias, A. Gordon, …

slide-2
SLIDE 2

Differentially Private vs Probabilistic Inference

Differential Privacy Probabilistic Inference

slide-3
SLIDE 3

Differentially Private vs Probabilistic Inference

Differential Privacy Probabilistic Inference

The goal in machine learning is very often similar to the goal in private data analysis. The learner typically wishes to learn some simple rule that explains a data set. However, she wishes this rule to generalize […] Generally, this means that she wants to learn a rule that captures distributional information about the data set on hand, in a way that does not depend too specifically on any single data point.

  • C. Dwork & A. Roth, The Algorithmic Foundations of Differential Privacy
slide-4
SLIDE 4

Outline

  • Bayesian learning
  • A language for Bayesian learning
  • Differential Privacy
  • Combining differential privacy and Bayesian learning

slide-5
SLIDE 5

Bayesian Inference

slide-6
SLIDE 6

Probabilistic Inference

Probabilistic Program (Fun/Tabular): Prior Distribution + Data → Posterior Distribution

slide-7
SLIDE 7

Bayes’ theorem

P(x|y) = P(y|x) · P(x) / P(y)

  • Using Bayes’ theorem to solve a problem.
slide-8
SLIDE 8

Bayes’ theorem

P(x|y) = P(y|x) · P(x) / P(y)

  • Using Bayes’ theorem to solve a problem.

Posterior: P(x|y) · Model: P(y|x) · Prior: P(x) · Normalization factor: P(y)
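The update P(x|y) ∝ P(y|x) · P(x) can be made concrete with a small sketch (my own example, not slide code): Bayes’ theorem applied over a finite grid of candidate coin biases.

```python
# Posterior over a discrete grid of coin biases via Bayes' theorem:
# posterior ∝ likelihood × prior, normalized by P(y).
thetas = [0.1 * i for i in range(1, 10)]      # candidate biases 0.1 .. 0.9
prior = [1.0 / len(thetas)] * len(thetas)     # uniform prior
y = [1, 0, 1, 1]                              # observed coin flips

def likelihood(theta, data):
    p = 1.0
    for s in data:
        p *= theta if s == 1 else 1.0 - theta  # Bernoulli pmf per flip
    return p

unnorm = [likelihood(t, y) * p for t, p in zip(thetas, prior)]
z = sum(unnorm)                               # normalization factor P(y)
posterior = [u / z for u in unnorm]
```

The grid approximation is only for illustration; the slides instead use the conjugate Beta prior, which makes the posterior available in closed form.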

slide-9
SLIDE 9

A simple example: bias of a coin

ID | disease
1 | 0
2 | 1
3 | 0
4 | 1
5 | 1
6 | 1
7 | 0
8 | 1
9 | …

slide-10
SLIDE 10

Bayes’ theorem

P(x|y) = P(y|x) · P(x) / P(y)

  • Using Bayes’ theorem to solve a problem.

Posterior: P(x|y) · Model: P(y|x) · Prior: P(x)

slide-11
SLIDE 11

Model/Likelihood: Bernoulli

y = (0,1,0,1,1,1,…,0) Observed values

slide-12
SLIDE 12

Model/Likelihood: Bernoulli

The probability mass function f(s,Θ), where Θ is the “bias of the coin”, is:

f(s,Θ) = Θ if s = 1
         1-Θ if s = 0

This can also be expressed as:

f(s,Θ) = Θ^s · (1-Θ)^(1-s)

y = (0,1,0,1,1,1,…,0) Observed values
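The compact form of the pmf is easy to translate directly (a minimal sketch; function names are mine):

```python
# The Bernoulli pmf from the slide: f(s, theta) = theta^s * (1-theta)^(1-s).
def bernoulli_pmf(s, theta):
    return theta ** s * (1.0 - theta) ** (1 - s)

# Likelihood of a whole sequence of independent flips.
def likelihood(y, theta):
    out = 1.0
    for s in y:
        out *= bernoulli_pmf(s, theta)
    return out
```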

slide-13
SLIDE 13

Prior: Beta distribution

Probability density function: f(Θ; a,b) = Θ^(a-1) · (1-Θ)^(b-1) / B(a,b)

Conjugate to the Bernoulli distribution
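Conjugacy is what makes inference here trivial: a Beta prior combined with Bernoulli observations yields a Beta posterior whose parameters just count successes and failures. A sketch of this standard result (the function name is mine):

```python
def beta_bernoulli_update(a, b, y):
    # Prior Beta(a, b) + Bernoulli observations y -> posterior Beta(a', b'):
    # a' = a + number of ones, b' = b + number of zeros.
    heads = sum(y)
    tails = len(y) - heads
    return a + heads, b + tails

a_post, b_post = beta_bernoulli_update(2, 2, [0, 1, 0, 1, 1, 1, 0])
```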

slide-14
SLIDE 14

Prior: Beta distribution

slide-15
SLIDE 15

Bias of a Coin

Beta(2500,500) Beta(1500,1500)

slide-16
SLIDE 16

Another example

ID | Cholesterol
1 | 4.66
2 | 5.01
3 | 3.45
4 | 4.12
5 | 6.35
6 | 4.45
7 | 6.06
8 | 4.98
9 | …

Known variance

slide-17
SLIDE 17

Another example

ID | Y | X
1 | 4.2 | 1.5
2 | 2.5 | 0.2
3 | 2.2 | 2
4 | 1.1 | 6.5
5 | 1.5 | 8.2
6 | 0.5 | 5.6
7 | 2.0 | 2.3
8 | 0.8 | 4.3
9 | … | …

slide-18
SLIDE 18

Graphical Models

ID | disease
1 | 0
2 | 1
3 | 0
4 | 1
5 | 1
6 | 1
7 | 0
8 | 1
9 | …

slide-19
SLIDE 19

Probabilistic PCF

slide-20
SLIDE 20

Probabilistic Semantics

slide-21
SLIDE 21

Observe for Conditional Distributions

observe z ⇒ X(z) in Y

slide-22
SLIDE 22

Observe for Conditional Distributions

observe z ⇒ X(z) in Y

Bayes’ theorem

P(x|y) = P(y|x) · P(x) / P(y)

  • Using Bayes’ theorem to solve a problem.
slide-23
SLIDE 23

Semantics of Observe

slide-24
SLIDE 24

Semantics of Observe

Filter and rescaling
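The filter-and-rescale semantics of observe can be sketched on finite distributions (my own encoding as a dict of outcome probabilities, not the slides’ language):

```python
# observe = keep only outcomes satisfying the predicate (filter),
# then renormalize the surviving mass to sum to 1 (rescale).
def observe(dist, pred):
    kept = {x: p for x, p in dist.items() if pred(x)}   # filter
    z = sum(kept.values())
    if z == 0:
        raise ValueError("observation has probability zero")
    return {x: p / z for x, p in kept.items()}          # rescale

# Two fair coin flips, conditioned on "at least one heads":
two_flips = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
cond = observe(two_flips, lambda xy: xy[0] + xy[1] >= 1)
```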

slide-25
SLIDE 25

Semantics of Observe

Filter and rescaling

slide-26
SLIDE 26

Differential Privacy

slide-27
SLIDE 27

Private Queries?

slide-28
SLIDE 28

Private Queries?

medical correlation?

query answer

slide-29
SLIDE 29

Private Queries?

query answer

Does Critias have cancer?

slide-30
SLIDE 30

Private Queries?

query answer

Does Critias have cancer? I know he visited Atlantis

From Plato’s Timaeus dialogue

slide-31
SLIDE 31

Private Queries?

Does Critias have cancer? I know he visited Atlantis

From Plato’s Timaeus dialogue

slide-32
SLIDE 32

Differential Privacy

Noise

slide-33
SLIDE 33

Differential Privacy

Noise

query answer + noise

medical correlation?

slide-34
SLIDE 34

Differential Privacy

Noise

query answer + noise

?!?

slide-35
SLIDE 35

Noise

?!?

Differential Privacy

slide-36
SLIDE 36

(ε,δ)-Differential Privacy

Definition Given ε,δ ≥ 0, a probabilistic query Q: db → R is (ε,δ)-differentially private iff ∀b1, b2:db differing in one row and for every S⊆R: Pr[Q(b1)∈ S] ≤ exp(ε)· Pr[Q(b2)∈ S] + δ

slide-37
SLIDE 37

(ε,δ)-Differential Privacy

Definition Given ε,δ ≥ 0, a probabilistic query Q: db → R is (ε,δ)-differentially private iff ∀b1, b2:db differing in one row and for every S⊆R: Pr[Q(b1)∈ S] ≤ exp(ε)· Pr[Q(b2)∈ S] + δ

A query returning a probability distribution

slide-38
SLIDE 38

(ε,δ)-Differential Privacy

Definition Given ε,δ ≥ 0, a probabilistic query Q: db → R is (ε,δ)-differentially private iff ∀b1, b2:db differing in one row and for every S⊆R: Pr[Q(b1)∈ S] ≤ exp(ε)· Pr[Q(b2)∈ S] + δ

Privacy parameters

slide-39
SLIDE 39

(ε,δ)-Differential Privacy

Definition Given ε,δ ≥ 0, a probabilistic query Q: db → R is (ε,δ)-differentially private iff ∀b1, b2:db differing in one row and for every S⊆R: Pr[Q(b1)∈ S] ≤ exp(ε)· Pr[Q(b2)∈ S] + δ

a quantification over all the databases

slide-40
SLIDE 40

(ε,δ)-Differential Privacy

Definition Given ε,δ ≥ 0, a probabilistic query Q: db → R is (ε,δ)-differentially private iff ∀b1, b2:db differing in one row and for every S⊆R: Pr[Q(b1)∈ S] ≤ exp(ε)· Pr[Q(b2)∈ S] + δ

a notion of adjacency or distance

slide-41
SLIDE 41

(ε,δ)-Differential Privacy

Definition Given ε,δ ≥ 0, a probabilistic query Q: db → R is (ε,δ)-differentially private iff ∀b1, b2:db differing in one row and for every S⊆R: Pr[Q(b1)∈ S] ≤ exp(ε)· Pr[Q(b2)∈ S] + δ

and over all the possible outcomes
slide-42
SLIDE 42

ε-Differential Privacy

Definition Given ε ≥ 0, a probabilistic query Q: db → R is ε-differentially private iff ∀b1, b2:db differing in one row and for every S⊆R: Pr[Q(b1)∈ S] ≤ exp(ε)· Pr[Q(b2)∈ S]
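As a sanity check on this definition (my example, not from the slides): randomized response on a single bit, reporting the true bit with probability 3/4, satisfies the inequality with ε = ln 3, and since inputs and outcomes are finite the check can be done exhaustively:

```python
import math

eps = math.log(3.0)

# Pr[output = o | true bit = b] for randomized response with keep-prob 3/4.
def rr_prob(o, b):
    return 0.75 if o == b else 0.25

# Check Pr[Q(b1) = o] <= e^eps * Pr[Q(b2) = o] for all adjacent inputs
# and all outcomes (singleton events suffice on a finite outcome space).
ok = all(
    rr_prob(o, b1) <= math.exp(eps) * rr_prob(o, b2)
    for b1 in (0, 1) for b2 in (0, 1) for o in (0, 1)
)
```

The worst-case ratio is 0.75 / 0.25 = 3 = e^ε, so the bound is tight.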

slide-43
SLIDE 43

ε-Differential Privacy

Definition Given ε ≥ 0, a probabilistic query Q: db → R is ε-differentially private iff ∀b1, b2:db differing in one row and for every S⊆R: Pr[Q(b1)∈ S] ≤ exp(ε)· Pr[Q(b2)∈ S]

Let’s substitute a concrete instance:

Pr[Q(b∪{x})∈ S] ≤ exp(ε)· Pr[Q(b)∈ S]

Let’s use the two quantifiers:

exp(-ε)· Pr[Q(b)∈ S] ≤ Pr[Q(b∪{x})∈ S] ≤ exp(ε)· Pr[Q(b)∈ S]

and so:

log ( Pr[Q(b∪{x})∈S] / Pr[Q(b)∈S] ) ≤ ε

slide-44
SLIDE 44

Q : db => R probabilistic

Q(b) Q(b∪{x})

ε-Differential Privacy

slide-45
SLIDE 45

ε-Differential Privacy

log ( Pr[Q(b∪{x})∈S] / Pr[Q(b)∈S] ) ≤ ε

slide-46
SLIDE 46

Probability of a bad event

log ( ∫B Q(b∪{x}) / ∫B Q(b) ) ≤ ε

slide-47
SLIDE 47

Probability of a bad event

Dataset | bad event
without my data | 16%
with my data | 16.3%

slide-48
SLIDE 48

Pr[Q(b’)∈S] ≦ eε Pr[Q(b) ∈ S] + δ

(ε,δ)-Differential privacy

for b ~1 b’

slide-49
SLIDE 49

Pr[Q(b’)∈S] ≦ eε Pr[Q(b) ∈ S] + δ

(ε,δ)-Differential privacy

for b ~1 b’

Program + Noise = Differentially Private Program

slide-50
SLIDE 50

PP

Result

Sensitivity

PP

Result’

+

slide-51
SLIDE 51

PP

Result

Sensitivity

PP

Result’

+

Sensitivity ≤ K

slide-52
SLIDE 52

Calibrating the Noise on the Sensitivity

k-sensitive Program + Noise(k/ε) = ε-Differentially Private Program
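The standard way to calibrate noise to sensitivity is the Laplace mechanism: add Laplace noise of scale k/ε to a k-sensitive query. A minimal sketch (the function name and the inverse-CDF sampler are my choices):

```python
import math
import random

# Laplace mechanism (standard construction): for a query with sensitivity
# `sensitivity`, adding Laplace(0, sensitivity/eps) noise gives eps-DP.
def laplace_mechanism(true_answer, sensitivity, eps, rng=random):
    b = sensitivity / eps                  # noise scale calibrated to sensitivity
    u = rng.random() - 0.5                 # uniform on [-0.5, 0.5)
    noise = -b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_answer + noise
```

For example, a counting query has sensitivity 1 (one row changes the count by at most 1), so `laplace_mechanism(count, 1.0, eps)` releases it with ε-DP.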

slide-53
SLIDE 53
  • Relational Refinement Types
  • Higher Order Refinements
  • Partiality Monad
  • Semantic Subtyping
  • Approximate Equivalence for Distributions

Components

HOARe2: Higher Order Approximate Relational Refinement Types for DP

Barthe et al, POPL’15

slide-54
SLIDE 54

HOARe2: Higher Order Approximate Relational Refinement Types for DP

slide-55
SLIDE 55
  • Relational Refinement Types
  • Higher Order Refinements
  • Partiality Monad
  • Semantic Subtyping
  • Approximate Equivalence for Distributions

Components

Barthe et al., POPL’15: reasoning about two runs of a program

HOARe2: Higher Order Approximate Relational Refinement Types for DP

slide-56
SLIDE 56

Program Logic

Program P X Y

slide-57
SLIDE 57

Program P Precondition Postcondition X Y

Program Logic

slide-58
SLIDE 58

Program P Precondition Postcondition X Y

x ≥ 0 y ≥ 1

Program Logic

exp

slide-59
SLIDE 59

Program P Precondition Postcondition

Refinement Type

P : {x | Pre(x)} → {y | Post(y)}

slide-60
SLIDE 60

Logical Predicates Program P Precondition Postcondition

Refinement Type

P : {x | Pre(x)} → {y | Post(y)}

slide-61
SLIDE 61

Program P Precondition Postcondition

→ exp : {x | x ≥ 0} → {y | y ≥ 1}

Example

Refinement Type

P : {x | Pre(x)} → {y | Post(y)}

slide-62
SLIDE 62

Program P Precondition Postcondition X1 Y1 Program P X2 Y2

Differential privacy as a Relational Property

slide-63
SLIDE 63

Program P Precondition Postcondition X1 Y1 Program P X2 Y2

X1 and X2 differ in the presence or absence of an individual

Pr[Y1∈ S] ≤ exp(ε)· Pr[Y2∈ S]

Differential privacy as a Relational Property

slide-64
SLIDE 64

P : {x | Pre(x1,x2)} → {y | Post(y1,y2)}

Program P Precondition Postcondition

HOARe2: Relational Refinement Types

slide-65
SLIDE 65

P : {x | Pre(x1,x2)} → {y | Post(y1,y2)}

Program P Precondition Postcondition

HOARe2: Relational Refinement Types

Logical Relations

slide-66
SLIDE 66

P : {x | Pre(x1,x2)} → {y | Post(y1,y2)}

Program P Precondition Postcondition

HOARe2: Relational Refinement Types

exp : {x | x1 ≤ x2 } → {y | y1 ≤ y2}

Example: Monotonicity of exponential

slide-67
SLIDE 67
  • Relational Refinement Types
  • Higher Order Refinements
  • Partiality Monad
  • Semantic Subtyping
  • Approximate Equivalence for Distributions

Components

Barthe et al., POPL’15: reasoning about DP

HOARe2: Higher Order Approximate Relational Refinement Types for DP

slide-68
SLIDE 68

Lifting of P

P(x1,x2) P*(x1,x2)

Relation over distributions Relation

slide-69
SLIDE 69

Lifting of P

Given dist. μ1 over A and μ2 over B: μ1 P* μ2 iff there exists a dist. μ over A×B s.t.

  • μ(x1,x2) > 0 implies P(x1,x2)
  • π1μ ≤ μ1 and π2μ ≤ μ2

slide-70
SLIDE 70

(ε,δ)-Lifting of P

Given dist. μ1 over A and μ2 over B: μ1 P*ε,δ μ2 iff there exists a dist. μ over A×B s.t.

  • μ(x1,x2) > 0 implies P(x1,x2)
  • π1μ ≤ μ1 and π2μ ≤ μ2
  • maxE ( πiμ(E) - e^ε·μi(E), μi(E) - e^ε·πiμ(E) ) ≤ δ

slide-71
SLIDE 71

Verifying Differential Privacy

C : {x:db | d(x1,x2) ≤ 1} → {y:O | y1 =*ε,δ y2}

If we can conclude this, then C is (ε,δ)-differentially private.

slide-72
SLIDE 72

Programming Languages for Differentially Private Probabilistic Inference

Differential Privacy Probabilistic Inference

slide-73
SLIDE 73

Programming Languages for Differentially Private Probabilistic Inference

Differential Privacy Probabilistic Inference Programming Language Tools

slide-74
SLIDE 74

Adding noise

slide-75
SLIDE 75

Adding noise

Probabilistic Program + Noise = Differentially Private Program

slide-76
SLIDE 76

Adding noise

slide-77
SLIDE 77

Adding noise

slide-78
SLIDE 78

Program Probabilistic Prior Distribution Posterior Distribution

Adding noise

slide-79
SLIDE 79

Program Probabilistic Prior Distribution Posterior Distribution

Adding noise

slide-80
SLIDE 80

Program Probabilistic Prior Distribution Posterior Distribution

Adding noise

slide-81
SLIDE 81

Program Probabilistic Prior Distribution Posterior Distribution

Adding noise

slide-82
SLIDE 82

Program Probabilistic Prior Distribution Posterior Distribution

Adding noise

slide-83
SLIDE 83

Program Probabilistic Prior Distribution Posterior Distribution

Adding noise

slide-84
SLIDE 84

Program Probabilistic Prior Distribution Posterior Distribution

Adding noise

Probabilistic Inference

slide-85
SLIDE 85

Program Probabilistic Prior Distribution Posterior Distribution

Adding noise

Probabilistic Inference

slide-86
SLIDE 86

Program Probabilistic Prior Distribution Posterior Distribution

Adding noise

Probabilistic Inference

slide-87
SLIDE 87

Program Probabilistic Prior Distribution Posterior Distribution

Adding noise

Probabilistic Inference

slide-88
SLIDE 88

Program Probabilistic Prior Distribution Posterior Distribution

Adding noise on the data

Probabilistic Inference

slide-89
SLIDE 89

An example

function privBerInput (l: B list) (p1: R) (p2: R) : M[(0,1)] {
  let function vExp (l: B list) : M[B list] {
    match l with
    | nil -> mreturn nil
    | x::xs -> coercion (exp eps ((0,0)->1,(0,1)->0,(1,1)->1,(1,0)->0) x) :: (vExp xs)
  } in
  mlet nl = (vExp l) in
  let prior = mreturn (beta(p1,p2)) in
  let function Ber (l: B list) (p: M[(0,1)]) : M[(0,1)] {
    match l with
    | nil -> ran(p)
    | x::xs -> observe y => y = x in (Ber xs p)
  } in
  mreturn (infer (Ber nl prior))
}

slide-90
SLIDE 90

An example

function privBerInput (l: B list) (p1: R) (p2: R) : M[(0,1)] {
  let function vExp (l: B list) : M[B list] {
    match l with
    | nil -> mreturn nil
    | x::xs -> coercion (exp eps ((0,0)->1,(0,1)->0,(1,1)->1,(1,0)->0) x) :: (vExp xs)
  } in
  mlet nl = (vExp l) in
  let prior = mreturn (beta(p1,p2)) in
  let function Ber (l: B list) (p: M[(0,1)]) : M[(0,1)] {
    match l with
    | nil -> ran(p)
    | x::xs -> observe y => y = x in (Ber xs p)
  } in
  mreturn (infer (Ber nl prior))
}

Noise
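The vExp step above perturbs each input bit with the exponential mechanism using match/mismatch scores, which on booleans behaves like randomized response. A runnable sketch of that per-bit input perturbation in Python (my own encoding, not the slides’ language; the keep-probability e^ε/(e^ε+1) is the binary special case):

```python
import math
import random

def randomized_response(bit, eps, rng=random):
    # Keep the true bit with probability e^eps / (e^eps + 1), flip it otherwise.
    p_keep = math.exp(eps) / (math.exp(eps) + 1.0)
    return bit if rng.random() < p_keep else 1 - bit

def noisy_input(bits, eps, rng=random):
    # Perturb every record independently, as vExp maps noise over the list.
    return [randomized_response(b, eps, rng) for b in bits]
```

Inference then runs on the noisy list, so the posterior only ever sees perturbed data.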

slide-91
SLIDE 91

Program Probabilistic Prior Distribution Posterior Distribution

Adding noise on the output

Probabilistic Inference

slide-92
SLIDE 92

DP for Probabilistic Programs

Posterior Distribution

slide-93
SLIDE 93

DP for Probabilistic Programs

Posterior Distribution Releasing the Parameters

slide-94
SLIDE 94

DP for Probabilistic Programs

Posterior Distribution Releasing the Parameters Sampling from the Distribution

slide-95
SLIDE 95

DP for Probabilistic Programs

Posterior Distribution Releasing the Parameters Sampling from the Distribution

slide-96
SLIDE 96

An example

A distance over distributions

function privBerInput (l: B list) (p1: R) (p2: R) : M[(0,1)] {
  let function hellingerDistance (a0: R) (b0: R) (a1: R) (b1: R) : R {
    let gamma (r: R) = (r-1)! in
    let betaf (a: R) (b: R) = (gamma(a)*gamma(b))/gamma(a+b) in
    let num = betaf ((a0+a1)/2.0) ((b0+b1)/2.0) in
    let denum = Math.Sqrt((betaf a0 b0)*(betaf a1 b1)) in
    Math.Sqrt(1.0 - (num/denum))
  } in
  let function score (input: M[(0,1)]) (output: M[(0,1)]) : R {
    let beta(a0,b0) = input in
    let beta(a1,b1) = output in
    (-1.0) * (hellingerDistance a0 b0 a1 b1)
  } in
  let prior = mreturn (beta(p1,p2)) in
  let function Ber (l: B list) (p: M[(0,1)]) : M[(0,1)] {
    match l with
    | nil -> ran(p)
    | x::xs -> observe y => y = x in (Ber xs p)
  } in
  exp eps score (infer (Ber l prior))
}

slide-97
SLIDE 97

Noise

An example

A distance over distributions

function privBerInput (l: B list) (p1: R) (p2: R) : M[(0,1)] {
  let function hellingerDistance (a0: R) (b0: R) (a1: R) (b1: R) : R {
    let gamma (r: R) = (r-1)! in
    let betaf (a: R) (b: R) = (gamma(a)*gamma(b))/gamma(a+b) in
    let num = betaf ((a0+a1)/2.0) ((b0+b1)/2.0) in
    let denum = Math.Sqrt((betaf a0 b0)*(betaf a1 b1)) in
    Math.Sqrt(1.0 - (num/denum))
  } in
  let function score (input: M[(0,1)]) (output: M[(0,1)]) : R {
    let beta(a0,b0) = input in
    let beta(a1,b1) = output in
    (-1.0) * (hellingerDistance a0 b0 a1 b1)
  } in
  let prior = mreturn (beta(p1,p2)) in
  let function Ber (l: B list) (p: M[(0,1)]) : M[(0,1)] {
    match l with
    | nil -> ran(p)
    | x::xs -> observe y => y = x in (Ber xs p)
  } in
  exp eps score (infer (Ber l prior))
}

slide-98
SLIDE 98

Noise

An example

A distance over distributions

function privBerInput (l: B list) (p1: R) (p2: R) : M[(0,1)] {
  let function hellingerDistance (a0: R) (b0: R) (a1: R) (b1: R) : R {
    let gamma (r: R) = (r-1)! in
    let betaf (a: R) (b: R) = (gamma(a)*gamma(b))/gamma(a+b) in
    let num = betaf ((a0+a1)/2.0) ((b0+b1)/2.0) in
    let denum = Math.Sqrt((betaf a0 b0)*(betaf a1 b1)) in
    Math.Sqrt(1.0 - (num/denum))
  } in
  let function score (input: M[(0,1)]) (output: M[(0,1)]) : R {
    let beta(a0,b0) = input in
    let beta(a1,b1) = output in
    (-1.0) * (hellingerDistance a0 b0 a1 b1)
  } in
  let prior = mreturn (beta(p1,p2)) in
  let function Ber (l: B list) (p: M[(0,1)]) : M[(0,1)] {
    match l with
    | nil -> ran(p)
    | x::xs -> observe y => y = x in (Ber xs p)
  } in
  exp eps score (infer (Ber l prior))
}
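The Hellinger score used above can be reproduced directly. A Python sketch (my own code): the slide’s gamma via (r-1)! only works for integer parameters, so this version uses lgamma, which agrees with the factorial on integers and extends to reals:

```python
import math

def beta_fn(a, b):
    # B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b), via log-gamma for stability.
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def hellinger_beta(a0, b0, a1, b1):
    # Closed-form Hellinger distance between Beta(a0, b0) and Beta(a1, b1).
    num = beta_fn((a0 + a1) / 2.0, (b0 + b1) / 2.0)
    den = math.sqrt(beta_fn(a0, b0) * beta_fn(a1, b1))
    # max(0, .) guards against tiny negative values from rounding.
    return math.sqrt(max(0.0, 1.0 - num / den))

def score(post_in, post_out):
    # Higher score = closer posteriors, as in the slide (negated distance).
    (a0, b0), (a1, b1) = post_in, post_out
    return -hellinger_beta(a0, b0, a1, b1)
```

The exponential mechanism then samples an output posterior with probability weighted by exp(ε · score / 2Δ), favoring candidates Hellinger-close to the true posterior.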

slide-99
SLIDE 99

More general Lifting of P

P(x1,x2) P*(x1,x2)

Relation over distributions Relation

slide-100
SLIDE 100

Accuracy

Different ways of adding noise can have different accuracy.

  • We have theoretical accuracy (how do we integrate it into a framework for reasoning about DP?)
  • We have experimental accuracy (we need a framework to test our programs)

slide-101
SLIDE 101

Program Probabilistic Prior Distribution Posterior Distribution

A Nice Playground!

Probabilistic Inference

slide-102
SLIDE 102

Tänan teid väga! (Thank you very much!)