Proving Expected Sensitivity of Probabilistic Programs

Gilles Barthe  Thomas Espitau  Benjamin Grégoire  Justin Hsu  Pierre-Yves Strub
Program Sensitivity

Similar inputs → similar outputs
◮ Given: distances din on inputs, dout on outputs
◮ Want: for all inputs in1, in2,
      dout(f(in1), f(in2)) ≤ din(in1, in2)
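This deterministic notion can be checked empirically on sample inputs. A minimal sketch in Python, where `f` is a hypothetical 1-Lipschitz program and both din and dout are the absolute distance (all names here are illustrative, not from the talk):

```python
def f(x):
    # hypothetical program: 1-Lipschitz, so |f(x1) - f(x2)| <= |x1 - x2|
    return 0.5 * x + 3.0

def is_sensitive(f, pairs,
                 d_in=lambda a, b: abs(a - b),
                 d_out=lambda a, b: abs(a - b)):
    # check d_out(f(in1), f(in2)) <= d_in(in1, in2) on sample input pairs
    return all(d_out(f(a), f(b)) <= d_in(a, b) for a, b in pairs)

pairs = [(0.0, 1.0), (-2.0, 5.0), (3.0, 3.5)]
print(is_sensitive(f, pairs))  # True: f contracts distances by a factor of 2
```

Sampling pairs only refutes sensitivity; proving it for all inputs is what the program logic below is for.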
Similar inputs → similar output distributions
◮ Given: distances din on inputs, dout on output distributions
◮ Want: for all inputs in1, in2,
      dout(f(in1), f(in2)) ≤ din(in1, in2)
where f(in) is the output distribution of the program on input in.
For two distributions µ1, µ2 over a set A:
      max over E ⊆ A of |µ1(E) − µ2(E)|   (the total variation distance)

k-Uniform sensitivity: for all inputs in1, in2, the output distributions satisfy
      max over E ⊆ A of |µ1(E) − µ2(E)| ≤ din(in1, in2) / k
◮ Larger k → closer output distributions
◮ Strong guarantee: probabilities close for all sets of outputs
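For finite distributions, the quantity max over E of |µ1(E) − µ2(E)| equals half the L1 distance between the probability mass functions, so it can be computed directly. A small sketch (dicts mapping outcomes to probabilities are an encoding chosen for this illustration):

```python
def tv_distance(mu1, mu2):
    # max over events E of |mu1(E) - mu2(E)| equals half the L1 distance
    support = set(mu1) | set(mu2)
    return 0.5 * sum(abs(mu1.get(a, 0.0) - mu2.get(a, 0.0)) for a in support)

fair = {"H": 0.5, "T": 0.5}
biased = {"H": 0.6, "T": 0.4}
print(tv_distance(fair, biased))  # 0.1, up to float rounding
```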
Probabilistic program forgets initial state
◮ Given: probabilistic loop, two different input states
◮ Want: state distributions converge to same distribution

Consequence of k-uniform sensitivity
◮ As the number of iterations T increases, prove k-uniform sensitivity for larger and larger k(T)
◮ Relation between k and T describes speed of convergence
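This convergence can be watched numerically. A sketch for a hypothetical lazy chain (an assumption of this illustration, not an example from the talk): each iteration keeps the state with probability 1/2 and otherwise resamples it uniformly, so the distributions from any two starting states draw closer by a factor of 2 per iteration:

```python
def step(dist, n=10):
    # one loop iteration: with probability 1/2 keep the state,
    # with probability 1/2 resample it uniformly from {0,...,n-1}
    return [0.5 * p + 0.5 / n for p in dist]

def tv(d1, d2):
    # total variation distance between two distributions on {0,...,n-1}
    return 0.5 * sum(abs(p - q) for p, q in zip(d1, d2))

n = 10
d1 = [1.0] + [0.0] * (n - 1)       # chain started in state 0
d2 = [0.0, 1.0] + [0.0] * (n - 2)  # chain started in state 1
for t in range(1, 4):
    d1, d2 = step(d1), step(d2)
    print(t, tv(d1, d2))  # distance halves each iteration: 0.5, 0.25, 0.125
```

Here k(T) grows like 2^T, which is exactly the "relation between k and T" describing the speed of convergence.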
For two distributions µ1, µ2 over real numbers:
      |E_{µ1}[x] − E_{µ2}[x]|

k-Mean sensitivity: for all inputs in1, in2, the output distributions satisfy
      |E_{µ1}[x] − E_{µ2}[x]| ≤ din(in1, in2) / k
◮ Larger k → closer averages
◮ Weaker guarantee than uniform sensitivity
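For finite real-valued distributions the quantity |E_{µ1}[x] − E_{µ2}[x]| is a one-liner; a sketch, using the same dict encoding as before:

```python
def mean(mu):
    # expectation of a finite distribution over the reals (dict: value -> prob)
    return sum(x * p for x, p in mu.items())

def mean_distance(mu1, mu2):
    # |E_{mu1}[x] - E_{mu2}[x]|
    return abs(mean(mu1) - mean(mu2))

mu1 = {0.0: 0.5, 1.0: 0.5}
mu2 = {0.0: 0.25, 1.0: 0.75}
print(mean_distance(mu1, mu2))  # 0.25
```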
Machine learning algorithm A
◮ Input: set S of training examples
◮ Output: list of numeric parameters (randomized)

Danger: overfitting
◮ Output parameters depend too much on training set S
◮ Low error on training set, high error on new examples
One way to prevent overfitting
◮ L maps S to the average error of the randomized learning algorithm A
◮ If |L(S) − L(S′)| is small for all training sets S, S′ differing in a single example, then A does not overfit too much
A coupling models two distributions with one distribution

Given two distributions µ1, µ2 ∈ Distr(A), a joint distribution µ ∈ Distr(A × A) is a coupling if
      π1(µ) = µ1   and   π2(µ) = µ2
(the first and second marginals of µ are µ1 and µ2).

Typical pattern
Prove a property about two (output) distributions by constructing a coupling with certain properties.
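Couplings can be built explicitly. A sketch for two Bernoulli distributions, putting as much mass as possible on the diagonal (choosing this "maximal" coupling is an assumption of the illustration, not the only option):

```python
def bernoulli_coupling(p, q):
    # joint distribution over {0,1} x {0,1} whose marginals are
    # Bernoulli(p) and Bernoulli(q), with maximal mass on the diagonal
    both1 = min(p, q)
    both0 = min(1 - p, 1 - q)
    return {
        (1, 1): both1,
        (0, 0): both0,
        (1, 0): p - both1,  # first coordinate 1, second 0
        (0, 1): q - both1,  # first coordinate 0, second 1
    }

def marginals(mu):
    # probability of 1 under each marginal of a joint distribution
    m1 = sum(pr for (a, _), pr in mu.items() if a == 1)
    m2 = sum(pr for (_, b), pr in mu.items() if b == 1)
    return m1, m2

mu = bernoulli_coupling(0.3, 0.5)
print(marginals(mu))  # recovers (0.3, 0.5) up to float rounding
```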
Given:
◮ Two distributions µ1, µ2 ∈ Distr(A)
◮ Ground distance d : A × A → R+

Define distance on distributions:
      d#(µ1, µ2) = min over µ ∈ Ω(µ1, µ2) of E_{(a1,a2)∼µ}[d(a1, a2)]
where Ω(µ1, µ2) is the set of all couplings of µ1 and µ2.

Typical pattern
Bound the distance d# between two (output) distributions by constructing a coupling with small average distance d.
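When A is the real line and d(a1, a2) = |a1 − a2|, the lifted distance d# is the Wasserstein-1 distance, and for two equal-size empirical distributions the minimizing coupling simply matches sorted samples (a standard fact about optimal transport in one dimension). A sketch:

```python
def w1_empirical(xs, ys):
    # d# for two equal-size empirical distributions over the reals with
    # d(a1, a2) = |a1 - a2|: the optimal coupling pairs sorted samples
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

print(w1_empirical([0.0, 1.0], [1.0, 2.0]))  # 1.0: every point moves by 1
```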
Given:
◮ A function f : A → Distr(B) (think: probabilistic program)
◮ Distances din and dout on A and B

We say f is (din, dout)-expected sensitive if there exists a coupling µ of f(a1) and f(a2) with
      E_{(b1,b2)∼µ}[dout(b1, b2)] ≤ din(a1, a2)
for all inputs a1, a2 ∈ A.
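One convenient coupling runs both programs with shared randomness. A sketch for the hypothetical program f(x) = x + uniform noise with din = dout = absolute distance (the program and names are assumptions of this illustration): under this coupling the output distance equals the input distance on every sample, so the expected-sensitivity bound holds with equality.

```python
import random

def f(x, rng):
    # hypothetical probabilistic program: shift the input by uniform noise
    return x + rng.uniform(0.0, 1.0)

def coupled_sample(x1, x2, seed):
    # coupling of f(x1) and f(x2): run both with identical randomness
    b1 = f(x1, random.Random(seed))
    b2 = f(x2, random.Random(seed))
    return b1, b2

x1, x2 = 0.0, 0.7
diffs = [abs(b1 - b2)
         for b1, b2 in (coupled_sample(x1, x2, s) for s in range(100))]
print(max(diffs))  # every coupled pair is |x1 - x2| = 0.7 apart, up to rounding
```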
If dout(b1, b2) > k for all distinct b1, b2:
      (din, dout)-expected sensitive ⇒ k-uniform sensitive

If outputs are real-valued and dout(b1, b2) = k · |b1 − b2|:
      (din, dout)-expected sensitive ⇒ k-mean sensitive
Abstract away distributions
◮ Work in terms of distances on ground sets
◮ No need to work with complex distances over distributions
The pWhile imperative language

c ::= x ← e | x ←$ d | if e then c else c | while e do c | skip | c; c

Judgments
◮ Tagged program variables: x1, x2
◮ P and Q: boolean predicates over tagged variables
◮ din and dout: real-valued expressions over tagged variables
A judgment is valid if: for all input memories (m1, m2) satisfying the pre-condition P, there exists a coupling of the output distributions ([[c1]] m1, [[c2]] m2) with
◮ support satisfying the post-condition Q
◮ E[dout] ≤ din(m1, m2)
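For finite couplings the two validity conditions can be checked mechanically. A minimal sketch (encoding a coupling as a dict from output pairs to probabilities is an assumption of this illustration):

```python
def valid_judgment(coupling, Q, dout, din):
    # coupling: dict (out1, out2) -> probability, a coupling of the two
    # output distributions for one fixed pair of input memories
    support_ok = all(Q(b1, b2) for (b1, b2), p in coupling.items() if p > 0)
    expected = sum(p * dout(b1, b2) for (b1, b2), p in coupling.items())
    return support_ok and expected <= din

# identity coupling of two fair coin flips: the outputs always agree
coupling = {(0, 0): 0.5, (1, 1): 0.5}
print(valid_judgment(coupling,
                     Q=lambda b1, b2: b1 == b2,       # post-condition: equality
                     dout=lambda b1, b2: abs(b1 - b2),
                     din=0.0))  # True
```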
Theoretical results
◮ Full proof system (sampling, conditionals, loops, etc.)
◮ Transitivity principle (internalizes path coupling)

Implementation in EasyCrypt, formalizations of:
◮ Stability for the Stochastic Gradient Method
◮ Convergence for the RSM population dynamics
◮ Mixing for the Glauber dynamics
Possible directions
◮ Other useful consequences of expected sensitivity?
◮ Formal verification systems beyond program logics?
◮ How to automate this proof technique?