Proving Expected Sensitivity of Probabilistic Programs (PowerPoint PPT Presentation)



SLIDE 1

Gilles Barthe Thomas Espitau Benjamin Grégoire Justin Hsu Pierre-Yves Strub

Proving Expected Sensitivity of Probabilistic Programs

SLIDE 2

Program Sensitivity

Similar inputs → similar outputs

◮ Given: distances din on inputs, dout on outputs
◮ Want: for all inputs in1, in2,

dout(P(in1), P(in2)) ≤ din(in1, in2)

SLIDE 3

Program Sensitivity

Similar inputs → similar outputs

◮ Given: distances din on inputs, dout on outputs
◮ Want: for all inputs in1, in2,

dout(P(in1), P(in2)) ≤ din(in1, in2)

If P is sensitive and Q is sensitive, then Q ◦ P is sensitive

SLIDE 4

Probabilistic Program Sensitivity?

Similar inputs → similar output distributions

◮ Given: distances din on inputs, dout on output distributions
◮ Want: for all inputs in1, in2,

dout(P(in1), P(in2)) ≤ din(in1, in2)

SLIDE 5

Probabilistic Program Sensitivity?

Similar inputs → similar output distributions

◮ Given: distances din on inputs, dout on output distributions
◮ Want: for all inputs in1, in2,

dout(P(in1), P(in2)) ≤ din(in1, in2)

What distance dout should we take?

SLIDE 6

Our contributions

  • Coupling-based definition of probabilistic sensitivity
  • Relational program logic EpRHL
  • Formalized examples: stability and convergence

SLIDE 7

What is a good definition of probabilistic sensitivity?

SLIDE 8

One possible definition: output distributions close

For two distributions µ1, µ2 over a set A:

dout(µ1, µ2) ≜ k · max_{E⊆A} |µ1(E) − µ2(E)|

SLIDE 9

One possible definition: output distributions close

For two distributions µ1, µ2 over a set A:

dout(µ1, µ2) ≜ k · max_{E⊆A} |µ1(E) − µ2(E)|

k-Uniform sensitivity

◮ Larger k → closer output distributions
◮ Strong guarantee: probabilities close for all sets of outputs
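The max over events in this definition is the total variation distance (scaled by k). A minimal sketch, not from the talk, computing it by brute force over all events for small finite distributions given as `{outcome: probability}` dictionaries:

```python
from itertools import chain, combinations

def tv_distance(mu1, mu2):
    """max over events E of |mu1(E) - mu2(E)|, by brute force
    over every subset of the (finite) joint support."""
    support = sorted(set(mu1) | set(mu2))
    subsets = chain.from_iterable(
        combinations(support, r) for r in range(len(support) + 1))
    return max(abs(sum(mu1.get(x, 0.0) for x in E) -
                   sum(mu2.get(x, 0.0) for x in E)) for E in subsets)

# Two distributions over {0, 1, 2}
mu1 = {0: 0.5, 1: 0.3, 2: 0.2}
mu2 = {0: 0.4, 1: 0.4, 2: 0.2}
d = tv_distance(mu1, mu2)  # ≈ 0.1, the mass shifted from outcome 0 to 1
```

The worst event here is E = {0}, and the value also equals half the L1 distance between the two probability vectors, a standard identity for total variation.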

SLIDE 10

Application: probabilistic convergence/mixing

Probabilistic program forgets initial state

◮ Given: probabilistic loop, two different input states
◮ Want: state distributions converge to the same distribution

SLIDE 11

Application: probabilistic convergence/mixing

Probabilistic program forgets initial state

◮ Given: probabilistic loop, two different input states
◮ Want: state distributions converge to the same distribution

Consequence of k-uniform sensitivity

◮ As the number of iterations T increases, prove k-uniform sensitivity for larger and larger k(T)
◮ The relation between k and T describes the speed of convergence
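A hedged numeric illustration of this convergence, using a toy two-state Markov chain rather than any example from the paper. Iterating the exact state distributions from two different starting states, the total variation gap between them contracts geometrically, which is exactly the "larger k(T) for larger T" phenomenon:

```python
def step(dist, P):
    """One step of a Markov chain: push a distribution through
    the row-stochastic transition matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def tv(d1, d2):
    """Total variation distance between two finite distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(d1, d2))

# A toy two-state chain (rows are the transition probabilities).
P = [[0.8, 0.2],
     [0.4, 0.6]]
d1, d2 = [1.0, 0.0], [0.0, 1.0]   # point masses on the two starting states
gaps = []
for _ in range(5):
    d1, d2 = step(d1, P), step(d2, P)
    gaps.append(tv(d1, d2))
# For this chain the gap shrinks by a factor 0.4 per iteration:
# gaps ≈ [0.4, 0.16, 0.064, 0.0256, 0.01024]
```

The per-step contraction factor (here 0.4) plays the role of the relation between k and T: after T iterations the distributions are within 0.4^T in total variation, regardless of the starting states.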

SLIDE 12

Another possible definition: average outputs close

For two distributions µ1, µ2 over real numbers:

dout(µ1, µ2) ≜ k · |E[µ1] − E[µ2]|

SLIDE 13

Another possible definition: average outputs close

For two distributions µ1, µ2 over real numbers:

dout(µ1, µ2) ≜ k · |E[µ1] − E[µ2]|

k-Mean sensitivity

◮ Larger k → closer averages
◮ Weaker guarantee than uniform sensitivity
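A minimal sketch (my own toy numbers, not from the talk) of this mean distance on small real-valued distributions. Two distributions can be far apart event-by-event while their means stay close, which is why this is the weaker guarantee:

```python
def mean(mu):
    """Expected value of a finite distribution {outcome: probability}."""
    return sum(x * p for x, p in mu.items())

def mean_distance(mu1, mu2, k=1.0):
    """dout(mu1, mu2) = k * |E[mu1] - E[mu2]|"""
    return k * abs(mean(mu1) - mean(mu2))

mu1 = {0: 0.5, 1: 0.3, 2: 0.2}   # mean 0.7
mu2 = {0: 0.4, 1: 0.4, 2: 0.2}   # mean 0.8
d = mean_distance(mu1, mu2, k=2.0)   # 2 * |0.7 - 0.8| ≈ 0.2
```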

SLIDE 14

Application: algorithmic stability

Machine learning algorithm A

◮ Input: set S of training examples
◮ Output: list of numeric parameters (randomized)

Danger: overfitting

◮ Output parameters depend too much on training set S
◮ Low error on the training set, high error on new examples

SLIDE 15

Application: algorithmic stability

One way to prevent overfitting

◮ L maps S to the average error of the randomized learning algorithm A
◮ If |L(S) − L(S′)| is small for all training sets S, S′ differing in a single example, then A does not overfit too much

SLIDE 16

Application: algorithmic stability

One way to prevent overfitting

◮ L maps S to the average error of the randomized learning algorithm A
◮ If |L(S) − L(S′)| is small for all training sets S, S′ differing in a single example, then A does not overfit too much

L should be mean sensitive
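A deliberately simplified sketch of why such a bound can hold, with a deterministic stand-in for the expectation over A's randomness: if L(S) is the average of per-example losses in [0, 1], swapping a single example moves L by at most 1/n. The data and the averaging loss are my own toy assumptions, not the paper's SGM example:

```python
def average_loss(losses):
    """Toy L(S): average per-example loss; losses assumed to lie in [0, 1]."""
    return sum(losses) / len(losses)

S1 = [0.2, 0.9, 0.1, 0.5]
S2 = [0.2, 0.9, 0.1, 1.0]   # differs from S1 in a single example

gap = abs(average_loss(S1) - average_loss(S2))
# One loss in [0, 1] contributes at most 1/n to the average, so
# |L(S1) - L(S2)| <= 1/n = 0.25 here; the actual gap is 0.125.
assert gap <= 1 / len(S1)
```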

SLIDE 17

Wanted: a general definition that is ...

  • Expressive
  • Easy to reason about

SLIDE 18

Ingredient #1: Probabilistic coupling

A coupling models two distributions with one distribution

Given two distributions µ1, µ2 ∈ Distr(A), a joint distribution µ ∈ Distr(A × A) is a coupling if

π1(µ) = µ1

and

π2(µ) = µ2

SLIDE 19

Ingredient #1: Probabilistic coupling

A coupling models two distributions with one distribution

Given two distributions µ1, µ2 ∈ Distr(A), a joint distribution µ ∈ Distr(A × A) is a coupling if

π1(µ) = µ1

and

π2(µ) = µ2

Typical pattern

Prove property about two (output) distributions by constructing a coupling with certain properties
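A minimal sketch of this definition on finite distributions, with couplings represented as `{(a1, a2): probability}` dictionaries. It also shows that one pair of distributions admits many couplings, which is what makes "construct a coupling with certain properties" a real proof step:

```python
from itertools import product

def marginals(mu):
    """Project a joint distribution over A x A onto its two marginals."""
    m1, m2 = {}, {}
    for (a1, a2), p in mu.items():
        m1[a1] = m1.get(a1, 0.0) + p
        m2[a2] = m2.get(a2, 0.0) + p
    return m1, m2

# Two different couplings of the SAME pair of fair-coin distributions:
independent = {(x, y): 0.25 for x, y in product([0, 1], repeat=2)}
identical   = {(0, 0): 0.5, (1, 1): 0.5}   # forces both coins to agree

fair = {0: 0.5, 1: 0.5}
for mu in (independent, identical):
    m1, m2 = marginals(mu)
    assert m1 == fair and m2 == fair   # both satisfy the coupling condition
```

The `identical` coupling concentrates all mass on the diagonal, a common choice when proving that two runs of a program stay close.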

SLIDE 20

Ingredient #2: Lift distance on outputs

Given:

◮ Two distributions µ1, µ2 ∈ Distr(A)
◮ Ground distance d : A × A → R+

SLIDE 21

Ingredient #2: Lift distance on outputs

Given:

◮ Two distributions µ1, µ2 ∈ Distr(A)
◮ Ground distance d : A × A → R+

Define distance on distributions:

d#(µ1, µ2) ≜ min_{µ ∈ C(µ1, µ2)} Eµ[d]

where C(µ1, µ2) is the set of all couplings of µ1 and µ2

SLIDE 23

Ingredient #2: Lift distance on outputs

Given:

◮ Two distributions µ1, µ2 ∈ Distr(A)
◮ Ground distance d : A × A → R+

Define distance on distributions:

d#(µ1, µ2) ≜ min_{µ ∈ C(µ1, µ2)} Eµ[d]

where C(µ1, µ2) is the set of all couplings of µ1 and µ2

Typical pattern

Bound distance d# between two (output) distributions by constructing a coupling with small average distance d
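A hedged worked example of the minimum over couplings, using the one case where the search space is one-dimensional: for two Bernoulli distributions, a coupling is pinned down by the single number t = µ(1, 1), so a grid search over t recovers d# exactly (the example and helper names are mine, not the talk's):

```python
def coupling_cost(p1, p2, t):
    """E_mu[d] for the coupling of Bern(p1), Bern(p2) with mu(1,1) = t,
    under ground distance d(x, y) = |x - y|: the cost is the mass on
    the mismatched pairs (1,0) and (0,1)."""
    return (p1 - t) + (p2 - t)

def lifted_distance(p1, p2, steps=10_000):
    """Approximate d#(Bern(p1), Bern(p2)) by searching over the one
    free parameter t that determines a coupling of two Bernoullis."""
    lo, hi = max(0.0, p1 + p2 - 1.0), min(p1, p2)   # feasible range for t
    return min(coupling_cost(p1, p2, lo + (hi - lo) * i / steps)
               for i in range(steps + 1))

# The cost p1 + p2 - 2t is minimized at t = min(p1, p2), giving |p1 - p2|:
d = lifted_distance(0.7, 0.4)   # ≈ 0.3
```

The optimal coupling puts as much mass as possible on the diagonal, matching the "typical pattern" above: any coupling gives an upper bound on d#, and a well-chosen one makes the bound tight.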

SLIDE 24

Putting it together: Expected sensitivity

Given:

◮ A function f : A → Distr(B) (think: probabilistic program)
◮ Distances din and dout on A and B

SLIDE 25

Putting it together: Expected sensitivity

Given:

◮ A function f : A → Distr(B) (think: probabilistic program)
◮ Distances din and dout on A and B

We say f is (din, dout)-expected sensitive if:

d#out(f(a1), f(a2)) ≤ din(a1, a2)

for all inputs a1, a2 ∈ A.

SLIDE 26

Benefits: Expressive

If dout(b1, b2) > k for all distinct b1, b2:

(din, dout)-expected sensitive =⇒ k-uniform sensitive

SLIDE 27

Benefits: Expressive

If dout(b1, b2) > k for all distinct b1, b2:

(din, dout)-expected sensitive =⇒ k-uniform sensitive

If outputs are real-valued and dout(b1, b2) = k · |b1 − b2|:

(din, dout)-expected sensitive =⇒ k-mean sensitive

SLIDE 28

Benefits: Easy to reason about

SLIDE 29

Benefits: Easy to reason about

f : A → Distr(B) is (dA, dB)-expected sensitive

SLIDE 30

Benefits: Easy to reason about

f : A → Distr(B) is (dA, dB)-expected sensitive
g : B → Distr(C) is (dB, dC)-expected sensitive

SLIDE 31

Benefits: Easy to reason about

f : A → Distr(B) is (dA, dB)-expected sensitive
g : B → Distr(C) is (dB, dC)-expected sensitive
g ∘̃ f : A → Distr(C) is (dA, dC)-expected sensitive

SLIDE 32

Benefits: Easy to reason about

f : A → Distr(B) is (dA, dB)-expected sensitive
g : B → Distr(C) is (dB, dC)-expected sensitive
g ∘̃ f : A → Distr(C) is (dA, dC)-expected sensitive

Abstract away distributions

◮ Work in terms of distances on ground sets
◮ No need to work with complex distances over distributions
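A toy numeric check of the composition property (my own contracting maps, not an example from the paper): f and g each move their input halfway toward a random target in {0, 1}, so each shrinks expected distances by 1/2 under a shared-randomness coupling, and their composition shrinks them by 1/4:

```python
def f_pair(a1, a2, u):
    """Run f on both inputs with ONE shared random choice u in {0, 1}:
    f moves its input halfway toward the random target u."""
    return (a1 + u) / 2, (a2 + u) / 2

def g_pair(b1, b2, v):
    """Same shape for g: another contracting probabilistic map."""
    return (b1 + v) / 2, (b2 + v) / 2

def expected_distance_after_composition(a1, a2):
    """E[|c1 - c2|] for the composed map under the coupling that
    shares both coins u and v across the two runs."""
    total = 0.0
    for u in (0, 1):           # f's fair coin
        b1, b2 = f_pair(a1, a2, u)
        for v in (0, 1):       # g's fair coin
            c1, c2 = g_pair(b1, b2, v)
            total += 0.25 * abs(c1 - c2)
    return total

# f is (d, d/2)-expected sensitive and g is (d, d/2)-expected sensitive,
# so the composition is (d, d/4)-expected sensitive: 1.0 -> 0.25.
assert expected_distance_after_composition(0.0, 1.0) == 0.25
```

The key point mirrors the slide: the composed bound is obtained purely by chaining the two ground-level sensitivity statements, never by computing a distance between the composed output distributions.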

SLIDE 33

How to verify this property? The program logic EpRHL

SLIDE 34

A relational program logic EpRHL

The pWhile imperative language

c ::= x ← e | x $← d | if e then c else c | while e do c | skip | c; c

SLIDE 36

A relational program logic EpRHL

The pWhile imperative language

c ::= x ← e | x $← d | if e then c else c | while e do c | skip | c; c

Judgments

⊢ {P; din} c1 ∼ c2 {Q; dout}

◮ Tagged program variables: x1, x2
◮ P and Q: boolean predicates over tagged variables
◮ din and dout: real-valued expressions over tagged variables

SLIDE 37

EpRHL judgments model expected sensitivity

A judgment

⊢ {P; din} c1 ∼ c2 {Q; dout}

is valid if:

for all input memories (m1, m2) satisfying pre-condition P, there exists a coupling of the output distributions (⟦c1⟧ m1, ⟦c2⟧ m2) with

◮ support satisfying post-condition Q
◮ E[dout] ≤ din(m1, m2)

SLIDE 38

One proof rule: Sequential composition

⊢ {P; dA} c1 ∼ c2 {Q; dB}        ⊢ {Q; dB} c′1 ∼ c′2 {R; dC}
─────────────────────────────────────────────────────────────
⊢ {P; dA} c1; c′1 ∼ c2; c′2 {R; dC}

SLIDE 42

One proof rule: Sequential composition

⊢ {P; dA} c1 ∼ c2 {Q; dB}        ⊢ {Q; dB} c′1 ∼ c′2 {R; dC}
─────────────────────────────────────────────────────────────
⊢ {P; dA} c1; c′1 ∼ c2; c′2 {R; dC}

Expected sensitivity composes

SLIDE 43

Wrapping up

SLIDE 44

More in the paper

Theoretical results

◮ Full proof system (sampling, conditionals, loops, etc.)
◮ Transitivity principle (internalizes path coupling)

Implementation in EasyCrypt, formalizations of:

◮ Stability for the Stochastic Gradient Method
◮ Convergence for the RSM population dynamics
◮ Mixing for the Glauber dynamics

SLIDE 45

Looking forward

Possible directions

◮ Other useful consequences of expected sensitivity?
◮ Formal verification systems beyond program logics?
◮ How to automate this proof technique?

SLIDE 46

Looking forward

Possible directions

◮ Other useful consequences of expected sensitivity?
◮ Formal verification systems beyond program logics?
◮ How to automate this proof technique?

Shameless plug: Looking for students at UWisconsin!

SLIDE 47

Gilles Barthe Thomas Espitau Benjamin Grégoire Justin Hsu Pierre-Yves Strub

Proving Expected Sensitivity of Probabilistic Programs

SLIDE 48

Our contributions

  • Coupling-based definition of probabilistic sensitivity
  • Relational program logic EpRHL
  • Formalized examples: stability and convergence
