  1. Proving Expected Sensitivity of Probabilistic Programs. Gilles Barthe, Thomas Espitau, Benjamin Grégoire, Justin Hsu, Pierre-Yves Strub

  2. Program Sensitivity
     Similar inputs → similar outputs
     ◮ Given: distances d_in on inputs, d_out on outputs
     ◮ Want: for all inputs in1, in2, d_out(P(in1), P(in2)) ≤ d_in(in1, in2)
     ◮ If P is sensitive and Q is sensitive, then Q ∘ P is sensitive
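
For intuition, here is a minimal Python sketch of this definition and of the composition property, assuming real-valued inputs and outputs with the distance d(x, y) = |x - y|; the functions P and Q below are hypothetical examples, not from the talk.

    # Minimal sketch: 1-sensitivity for real-valued functions, d(x, y) = |x - y|.

    def P(x: float) -> float:
        # 1-sensitive: |P(x1) - P(x2)| = |x1 - x2| / 2 <= |x1 - x2|
        return x / 2 + 3.0

    def Q(x: float) -> float:
        # 1-sensitive: clipping never increases distances
        return min(x, 10.0)

    def composed(x: float) -> float:
        # The composition of 1-sensitive functions is again 1-sensitive
        return Q(P(x))

    x1, x2 = 1.0, 4.0
    assert abs(composed(x1) - composed(x2)) <= abs(x1 - x2)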

  3. Probabilistic Program Sensitivity?
     Similar inputs → similar output distributions
     ◮ Given: distances d_in on inputs, d_out on output distributions
     ◮ Want: for all inputs in1, in2, d_out(P(in1), P(in2)) ≤ d_in(in1, in2)
     What distance d_out should we take?

  4. Our contributions
     • Coupling-based definition of probabilistic sensitivity
     • Relational program logic EpRHL
     • Formalized examples: stability and convergence

  5. What is a good definition of probabilistic sensitivity?

  6. One possible definition: output distributions close
     For two distributions µ1, µ2 over a set A:
     d_out(µ1, µ2) ≜ k · max_{E ⊆ A} |µ1(E) − µ2(E)|
     k-uniform sensitivity
     ◮ Larger k → closer output distributions
     ◮ Strong guarantee: probabilities close for all sets of outputs
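
The maximum over events E is the total variation distance, scaled by k. On finite distributions it has a closed form, half the L1 distance between the probability vectors, so it is easy to compute. A sketch, assuming distributions are represented as Python dicts from outcomes to probabilities:

    # Sketch: max over events E of |mu1(E) - mu2(E)| equals
    # (1/2) * sum over outcomes a of |mu1(a) - mu2(a)|.

    def tv_distance(mu1: dict, mu2: dict) -> float:
        support = set(mu1) | set(mu2)
        return 0.5 * sum(abs(mu1.get(a, 0.0) - mu2.get(a, 0.0)) for a in support)

    mu1 = {"heads": 0.5, "tails": 0.5}   # fair coin
    mu2 = {"heads": 0.7, "tails": 0.3}   # biased coin
    print(tv_distance(mu1, mu2))         # 0.2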

  7. Application: probabilistic convergence/mixing
     Probabilistic program forgets initial state
     ◮ Given: probabilistic loop, two different input states
     ◮ Want: state distributions converge to the same distribution
     Consequence of k-uniform sensitivity
     ◮ As the number of iterations T increases, prove k-uniform sensitivity for larger and larger k(T)
     ◮ The relation between k and T describes the speed of convergence
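
A quick illustration of this kind of convergence, using an illustrative lazy random walk on a 5-cycle (not an example from the talk): the total variation distance between the state distributions of two differently-initialized copies shrinks as the iteration count grows.

    # Sketch: two copies of the same Markov chain, started in different states,
    # mix toward the same distribution; TV distance decreases with t.
    import numpy as np

    n = 5
    P = np.zeros((n, n))
    for i in range(n):
        P[i, i] = 0.5              # stay put with probability 1/2
        P[i, (i - 1) % n] = 0.25   # step left
        P[i, (i + 1) % n] = 0.25   # step right

    mu1 = np.eye(n)[0]   # start at state 0
    mu2 = np.eye(n)[2]   # start at state 2
    for t in range(1, 21):
        mu1, mu2 = mu1 @ P, mu2 @ P
        if t % 5 == 0:
            print(t, 0.5 * np.abs(mu1 - mu2).sum())  # TV distance shrinks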

  8. Another possible definition: average outputs close
     For two distributions µ1, µ2 over the real numbers:
     d_out(µ1, µ2) ≜ k · |E[µ1] − E[µ2]|
     k-mean sensitivity
     ◮ Larger k → closer averages
     ◮ Weaker guarantee than uniform sensitivity
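
The mean distance is even simpler to compute than the uniform one. A sketch in the same dict representation, with illustrative distributions:

    # Sketch: k * |E[mu1] - E[mu2]| for finite real-valued distributions.

    def mean(mu: dict) -> float:
        return sum(x * p for x, p in mu.items())

    def mean_distance(mu1: dict, mu2: dict, k: float = 1.0) -> float:
        return k * abs(mean(mu1) - mean(mu2))

    mu1 = {0.0: 0.5, 1.0: 0.5}      # mean 0.5
    mu2 = {0.0: 0.3, 1.0: 0.7}      # mean 0.7
    print(mean_distance(mu1, mu2))  # 0.2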

  9. Application: algorithmic stability
     Machine learning algorithm A
     ◮ Input: set S of training examples
     ◮ Output: list of numeric parameters (randomized)
     Danger: overfitting
     ◮ Output parameters depend too much on the training set S
     ◮ Low error on the training set, high error on new examples

  10. Application: algorithmic stability
     One way to prevent overfitting
     ◮ L maps S to the average error of the randomized learning algorithm A
     ◮ If |L(S) − L(S′)| is small for all training sets S, S′ differing in a single example, then A does not overfit too much
     L should be mean sensitive

  11. Wanted: a general definition that is ...
     • Expressive
     • Easy to reason about

  12. Ingredient #1: Probabilistic coupling
     A coupling models two distributions with one distribution
     Given two distributions µ1, µ2 ∈ Distr(A), a joint distribution µ ∈ Distr(A × A) is a coupling if π1(µ) = µ1 and π2(µ) = µ2
     Typical pattern: prove a property about two (output) distributions by constructing a coupling with certain properties
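
A concrete coupling, as a sketch: two biased coins and an explicit joint distribution whose marginals are checked against both conditions. The particular joint chosen here (matching outcomes as much as possible) is just one of many valid couplings.

    # Sketch: an explicit coupling of two Bernoulli distributions.
    mu1 = {0: 0.5, 1: 0.5}
    mu2 = {0: 0.3, 1: 0.7}

    # One coupling: put as much mass as possible on equal outcomes
    mu = {(0, 0): 0.3, (1, 1): 0.5, (0, 1): 0.2}

    # Check the marginal conditions: pi1(mu) = mu1 and pi2(mu) = mu2
    for a in mu1:
        assert abs(sum(p for (x, y), p in mu.items() if x == a) - mu1[a]) < 1e-9
    for b in mu2:
        assert abs(sum(p for (x, y), p in mu.items() if y == b) - mu2[b]) < 1e-9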

  13. Ingredient #2: Lift distance on outputs
     Given:
     ◮ Two distributions µ1, µ2 ∈ Distr(A)
     ◮ A ground distance d : A × A → R+
     Define a distance on distributions:
     d#(µ1, µ2) ≜ min_{µ ∈ C(µ1, µ2)} E_µ[d], where C(µ1, µ2) is the set of all couplings of µ1 and µ2
     Typical pattern: bound the distance d# between two (output) distributions by constructing a coupling with small average distance d
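
This lifting is an optimal-transport (Wasserstein-style) problem over couplings, and on finite supports it is a small linear program. A sketch using scipy, with an illustrative pair of three-point distributions and ground distance d(x, y) = |x - y|:

    # Sketch: d#(mu1, mu2) = min over couplings mu of E_mu[d], solved as a
    # linear program over the joint probabilities mu[i, j].
    import numpy as np
    from scipy.optimize import linprog

    xs = [0.0, 1.0, 2.0]                   # common support, illustrative
    mu1 = np.array([0.5, 0.5, 0.0])
    mu2 = np.array([0.0, 0.5, 0.5])
    d = np.abs(np.subtract.outer(xs, xs))  # ground distance d(x, y) = |x - y|

    n = len(xs)
    A_eq, b_eq = [], []
    for i in range(n):                     # row marginals: pi1(mu) = mu1
        row = np.zeros((n, n)); row[i, :] = 1
        A_eq.append(row.ravel()); b_eq.append(mu1[i])
    for j in range(n):                     # column marginals: pi2(mu) = mu2
        col = np.zeros((n, n)); col[:, j] = 1
        A_eq.append(col.ravel()); b_eq.append(mu2[j])

    res = linprog(d.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    print(res.fun)  # 1.0: optimal plan shifts each unit of mass right by one

For real-valued outcomes with this ground distance, scipy.stats.wasserstein_distance computes the same quantity directly.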

  14. Putting it together: Expected sensitivity
     Given:
     ◮ A function f : A → Distr(B) (think: probabilistic program)
     ◮ Distances d_in and d_out on A and B
     We say f is (d_in, d_out)-expected sensitive if:
     d#_out(f(a1), f(a2)) ≤ d_in(a1, a2) for all inputs a1, a2 ∈ A
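
A sketch of how one establishes expected sensitivity, by exhibiting a coupling: for a hypothetical f(a) = a + noise, running both copies on the same noise sample couples f(a1) and f(a2) so that the expected output distance is exactly |a1 - a2|. The function and parameters are illustrative, not from the paper.

    # Sketch: the shared-noise coupling witnesses expected sensitivity of f.
    import random

    def f(a: float, noise: float) -> float:
        return a + noise  # one sampled output, with the randomness explicit

    a1, a2 = 0.0, 3.0
    samples = [random.uniform(-1, 1) for _ in range(10_000)]
    # Coupling of f(a1) and f(a2): run both copies on the same sample
    avg = sum(abs(f(a1, s) - f(a2, s)) for s in samples) / len(samples)
    assert abs(avg - abs(a1 - a2)) < 1e-9  # E[d_out] = |a1 - a2| = d_in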

  15. Benefits: Expressive
     If d_out(b1, b2) > k for all distinct b1, b2:
     (d_in, d_out)-expected sensitive ⟹ k-uniform sensitive
     (since any coupling µ satisfies k · max_E |µ1(E) − µ2(E)| ≤ k · Pr_µ[b1 ≠ b2] ≤ E_µ[d_out])
     If outputs are real-valued and d_out(b1, b2) = k · |b1 − b2|:
     (d_in, d_out)-expected sensitive ⟹ k-mean sensitive

  16. Benefits: Easy to reason about
     If f : A → Distr(B) is (d_A, d_B)-expected sensitive
     and g : B → Distr(C) is (d_B, d_C)-expected sensitive,
     then the composition g ∘ f : A → Distr(C) (in the Kleisli sense: sample from f, then run g) is (d_A, d_C)-expected sensitive
     Abstract away distributions
     ◮ Work in terms of distances on ground sets
     ◮ No need to work with complex distances over distributions
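
A minimal Python sketch of this Kleisli composition, with two illustrative noisy maps (not from the paper): sample an intermediate value from f, then run g on it.

    # Sketch: composing probabilistic maps by sampling through the middle.
    import random

    def f(a: float) -> float:
        return a + random.gauss(0.0, 1.0)    # expected sensitive via shared noise

    def g(b: float) -> float:
        return b / 2 + random.gauss(0.0, 1.0)

    def g_after_f(a: float) -> float:
        b = f(a)     # sample an intermediate value from f(a)
        return g(b)  # feed it to g; the composite stays expected sensitive

Chaining the two sensitivity bounds through the intermediate distribution is exactly what the sequential-composition rule at the end of the deck captures at the level of program logic.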

  17. How to verify this property? The program logic EpRHL

  18. A relational program logic EpRHL
     The pWhile imperative language:
     c ::= x ← e | x ←$ d | if e then c else c | while e do c | skip | c; c
     Judgments: ⊢ {P; d_in} c1 ∼ c2 {Q; d_out}
     ◮ Tagged program variables: x⟨1⟩, x⟨2⟩
     ◮ P and Q: boolean predicates over tagged variables
     ◮ d_in and d_out: real-valued expressions over tagged variables

  19. EpRHL judgments model expected sensitivity
     A judgment ⊢ {P; d_in} c1 ∼ c2 {Q; d_out} is valid if:
     for all input memories (m1, m2) satisfying the pre-condition P, there exists a coupling µ of the output distributions ⟦c1⟧ m1 and ⟦c2⟧ m2 with
     ◮ support satisfying the post-condition Q
     ◮ E_µ[d_out] ≤ d_in(m1, m2)
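
To see what a valid judgment asserts, here is a sketch that couples two runs of the same loop by sharing every random sample and checks E[d_out] ≤ d_in empirically. The loop body (a contractive update) is hypothetical, not a program from the paper.

    # Sketch: coupled execution of two copies of one loop; the coupling is
    # "share every sample", and d is |x1 - x2| on the two memories.
    import random

    def coupled_runs(x1: float, x2: float, iters: int) -> float:
        for _ in range(iters):
            noise = random.gauss(0.0, 1.0)  # both runs see the same sample
            x1 = x1 / 2 + noise
            x2 = x2 / 2 + noise
        return abs(x1 - x2)  # d_out on the coupled pair of outputs

    d_in = abs(5.0 - 1.0)
    avg = sum(coupled_runs(5.0, 1.0, 10) for _ in range(1000)) / 1000
    assert avg <= d_in  # here E[d_out] = d_in / 2**10, far below d_in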

  20. One proof rule: Sequential composition
     ⊢ {P; d_A} c1 ∼ c2 {Q; d_B}    ⊢ {Q; d_B} c1′ ∼ c2′ {R; d_C}
     ──────────────────────────────────────────────────────────────
     ⊢ {P; d_A} c1; c1′ ∼ c2; c2′ {R; d_C}
