Proving Expected Sensitivity of Probabilistic Programs (PowerPoint PPT Presentation)



SLIDE 1

Gilles Barthe Thomas Espitau Benjamin Grégoire Justin Hsu Pierre-Yves Strub

Proving Expected Sensitivity of Probabilistic Programs

SLIDE 2

Program Sensitivity

Similar inputs → similar outputs

◮ Given: distances din on inputs, dout on outputs
◮ Want: for all inputs in1, in2,

dout(P(in1), P(in2)) ≤ din(in1, in2)

SLIDE 3

Program Sensitivity

Similar inputs → similar outputs

◮ Given: distances din on inputs, dout on outputs
◮ Want: for all inputs in1, in2,

dout(P(in1), P(in2)) ≤ din(in1, in2)

If P is sensitive and Q is sensitive, then Q ◦ P is sensitive

SLIDE 4

Probabilistic Program Sensitivity?

Similar inputs → similar output distributions

◮ Given: distances din on inputs, dout on output distributions
◮ Want: for all inputs in1, in2,

dout(P(in1), P(in2)) ≤ din(in1, in2)

SLIDE 5

Probabilistic Program Sensitivity?

Similar inputs → similar output distributions

◮ Given: distances din on inputs, dout on output distributions
◮ Want: for all inputs in1, in2,

dout(P(in1), P(in2)) ≤ din(in1, in2)

What distance dout should we take?

SLIDE 6

Our contributions

  • Coupling-based definition of probabilistic sensitivity
  • Relational program logic EpRHL
  • Formalized examples: stability and convergence

SLIDE 7

What is a good definition of probabilistic sensitivity?

SLIDE 8

One possible definition: output distributions close

For two distributions µ1, µ2 over a set A:

dout(µ1, µ2) ≜ k · max_{E⊆A} |µ1(E) − µ2(E)|

SLIDE 9

One possible definition: output distributions close

For two distributions µ1, µ2 over a set A:

dout(µ1, µ2) ≜ k · max_{E⊆A} |µ1(E) − µ2(E)|

k-Uniform sensitivity

◮ Larger k → closer output distributions
◮ Strong guarantee: probabilities close for all sets of outputs
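The max over events in this definition is the total variation distance (scaled by k). A minimal sketch, not from the talk, computing it by brute force over all events for small finite distributions given as `{outcome: probability}` dictionaries:

```python
from itertools import chain, combinations

def tv_distance(mu1, mu2):
    """max over events E of |mu1(E) - mu2(E)|, by brute force
    over every subset of the (finite) joint support."""
    support = sorted(set(mu1) | set(mu2))
    subsets = chain.from_iterable(
        combinations(support, r) for r in range(len(support) + 1))
    return max(abs(sum(mu1.get(x, 0.0) for x in E) -
                   sum(mu2.get(x, 0.0) for x in E)) for E in subsets)

# Two distributions over {0, 1, 2}
mu1 = {0: 0.5, 1: 0.3, 2: 0.2}
mu2 = {0: 0.4, 1: 0.4, 2: 0.2}
d = tv_distance(mu1, mu2)  # ≈ 0.1, the mass shifted from outcome 0 to 1
```

The worst event here is E = {0}, and the value also equals half the L1 distance between the two probability vectors, a standard identity for total variation.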

SLIDE 10

Application: probabilistic convergence/mixing

Probabilistic program forgets initial state

◮ Given: probabilistic loop, two different input states
◮ Want: state distributions converge to the same distribution

SLIDE 11

Application: probabilistic convergence/mixing

Probabilistic program forgets initial state

◮ Given: probabilistic loop, two different input states
◮ Want: state distributions converge to the same distribution

Consequence of k-uniform sensitivity

◮ As the number of iterations T increases, prove k-uniform sensitivity for larger and larger k(T)
◮ The relation between k and T describes the speed of convergence
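A hedged numeric illustration of this convergence, using a toy two-state Markov chain rather than any example from the paper. Iterating the exact state distributions from two different starting states, the total variation gap between them contracts geometrically, which is exactly the "larger k(T) for larger T" phenomenon:

```python
def step(dist, P):
    """One step of a Markov chain: push a distribution through
    the row-stochastic transition matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def tv(d1, d2):
    """Total variation distance between two finite distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(d1, d2))

# A toy two-state chain (rows are the transition probabilities).
P = [[0.8, 0.2],
     [0.4, 0.6]]
d1, d2 = [1.0, 0.0], [0.0, 1.0]   # point masses on the two starting states
gaps = []
for _ in range(5):
    d1, d2 = step(d1, P), step(d2, P)
    gaps.append(tv(d1, d2))
# For this chain the gap shrinks by a factor 0.4 per iteration:
# gaps ≈ [0.4, 0.16, 0.064, 0.0256, 0.01024]
```

The per-step contraction factor (here 0.4) plays the role of the relation between k and T: after T iterations the distributions are within 0.4^T in total variation, regardless of the starting states.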

SLIDE 12

Another possible definition: average outputs close

For two distributions µ1, µ2 over real numbers:

dout(µ1, µ2) ≜ k · |E[µ1] − E[µ2]|

SLIDE 13

Another possible definition: average outputs close

For two distributions µ1, µ2 over real numbers:

dout(µ1, µ2) ≜ k · |E[µ1] − E[µ2]|

k-Mean sensitivity

◮ Larger k → closer averages
◮ Weaker guarantee than uniform sensitivity
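A minimal sketch (my own toy numbers, not from the talk) of this mean distance on small real-valued distributions. Two distributions can be far apart event-by-event while their means stay close, which is why this is the weaker guarantee:

```python
def mean(mu):
    """Expected value of a finite distribution {outcome: probability}."""
    return sum(x * p for x, p in mu.items())

def mean_distance(mu1, mu2, k=1.0):
    """dout(mu1, mu2) = k * |E[mu1] - E[mu2]|"""
    return k * abs(mean(mu1) - mean(mu2))

mu1 = {0: 0.5, 1: 0.3, 2: 0.2}   # mean 0.7
mu2 = {0: 0.4, 1: 0.4, 2: 0.2}   # mean 0.8
d = mean_distance(mu1, mu2, k=2.0)   # 2 * |0.7 - 0.8| ≈ 0.2
```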

SLIDE 14

Application: algorithmic stability

Machine learning algorithm A

◮ Input: set S of training examples
◮ Output: list of numeric parameters (randomized)

Danger: overfitting

◮ Output parameters depend too much on training set S
◮ Low error on the training set, high error on new examples

SLIDE 15

Application: algorithmic stability

One way to prevent overfitting

◮ L maps S to the average error of the randomized learning algorithm A
◮ If |L(S) − L(S′)| is small for all training sets S, S′ differing in a single example, then A does not overfit too much

SLIDE 16

Application: algorithmic stability

One way to prevent overfitting

◮ L maps S to the average error of the randomized learning algorithm A
◮ If |L(S) − L(S′)| is small for all training sets S, S′ differing in a single example, then A does not overfit too much

L should be mean sensitive
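A deliberately simplified sketch of why such a bound can hold, with a deterministic stand-in for the expectation over A's randomness: if L(S) is the average of per-example losses in [0, 1], swapping a single example moves L by at most 1/n. The data and the averaging loss are my own toy assumptions, not the paper's SGM example:

```python
def average_loss(losses):
    """Toy L(S): average per-example loss; losses assumed to lie in [0, 1]."""
    return sum(losses) / len(losses)

S1 = [0.2, 0.9, 0.1, 0.5]
S2 = [0.2, 0.9, 0.1, 1.0]   # differs from S1 in a single example

gap = abs(average_loss(S1) - average_loss(S2))
# One loss in [0, 1] contributes at most 1/n to the average, so
# |L(S1) - L(S2)| <= 1/n = 0.25 here; the actual gap is 0.125.
assert gap <= 1 / len(S1)
```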

SLIDE 17

Wanted: a general definition that is ...

  • Expressive
  • Easy to reason about

SLIDE 18

Ingredient #1: Probabilistic coupling

A coupling models two distributions with one distribution

Given two distributions µ1, µ2 ∈ Distr(A), a joint distribution µ ∈ Distr(A × A) is a coupling if

π1(µ) = µ1

and

π2(µ) = µ2

SLIDE 19

Ingredient #1: Probabilistic coupling

A coupling models two distributions with one distribution

Given two distributions µ1, µ2 ∈ Distr(A), a joint distribution µ ∈ Distr(A × A) is a coupling if

π1(µ) = µ1

and

π2(µ) = µ2

Typical pattern

Prove property about two (output) distributions by constructing a coupling with certain properties
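A minimal sketch of this definition on finite distributions, with couplings represented as `{(a1, a2): probability}` dictionaries. It also shows that one pair of distributions admits many couplings, which is what makes "construct a coupling with certain properties" a real proof step:

```python
from itertools import product

def marginals(mu):
    """Project a joint distribution over A x A onto its two marginals."""
    m1, m2 = {}, {}
    for (a1, a2), p in mu.items():
        m1[a1] = m1.get(a1, 0.0) + p
        m2[a2] = m2.get(a2, 0.0) + p
    return m1, m2

# Two different couplings of the SAME pair of fair-coin distributions:
independent = {(x, y): 0.25 for x, y in product([0, 1], repeat=2)}
identical   = {(0, 0): 0.5, (1, 1): 0.5}   # forces both coins to agree

fair = {0: 0.5, 1: 0.5}
for mu in (independent, identical):
    m1, m2 = marginals(mu)
    assert m1 == fair and m2 == fair   # both satisfy the coupling condition
```

The `identical` coupling concentrates all mass on the diagonal, a common choice when proving that two runs of a program stay close.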

SLIDE 20

Ingredient #2: Lift distance on outputs

Given:

◮ Two distributions µ1, µ2 ∈ Distr(A)
◮ Ground distance d : A × A → R+

SLIDE 21

Ingredient #2: Lift distance on outputs

Given:

◮ Two distributions µ1, µ2 ∈ Distr(A)
◮ Ground distance d : A × A → R+

Define distance on distributions:

d#(µ1, µ2) ≜ min_{µ ∈ C(µ1, µ2)} Eµ[d]

where C(µ1, µ2) is the set of all couplings of µ1 and µ2

SLIDE 23

Ingredient #2: Lift distance on outputs

Given:

◮ Two distributions µ1, µ2 ∈ Distr(A)
◮ Ground distance d : A × A → R+

Define distance on distributions:

d#(µ1, µ2) ≜ min_{µ ∈ C(µ1, µ2)} Eµ[d]

where C(µ1, µ2) is the set of all couplings of µ1 and µ2

Typical pattern

Bound distance d# between two (output) distributions by constructing a coupling with small average distance d
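A hedged worked example of the minimum over couplings, using the one case where the search space is one-dimensional: for two Bernoulli distributions, a coupling is pinned down by the single number t = µ(1, 1), so a grid search over t recovers d# exactly (the example and helper names are mine, not the talk's):

```python
def coupling_cost(p1, p2, t):
    """E_mu[d] for the coupling of Bern(p1), Bern(p2) with mu(1,1) = t,
    under ground distance d(x, y) = |x - y|: the cost is the mass on
    the mismatched pairs (1,0) and (0,1)."""
    return (p1 - t) + (p2 - t)

def lifted_distance(p1, p2, steps=10_000):
    """Approximate d#(Bern(p1), Bern(p2)) by searching over the one
    free parameter t that determines a coupling of two Bernoullis."""
    lo, hi = max(0.0, p1 + p2 - 1.0), min(p1, p2)   # feasible range for t
    return min(coupling_cost(p1, p2, lo + (hi - lo) * i / steps)
               for i in range(steps + 1))

# The cost p1 + p2 - 2t is minimized at t = min(p1, p2), giving |p1 - p2|:
d = lifted_distance(0.7, 0.4)   # ≈ 0.3
```

The optimal coupling puts as much mass as possible on the diagonal, matching the "typical pattern" above: any coupling gives an upper bound on d#, and a well-chosen one makes the bound tight.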

SLIDE 24

Putting it together: Expected sensitivity

Given:

◮ A function f : A → Distr(B) (think: probabilistic program)
◮ Distances din and dout on A and B

SLIDE 25

Putting it together: Expected sensitivity

Given:

◮ A function f : A → Distr(B) (think: probabilistic program)
◮ Distances din and dout on A and B

We say f is (din, dout)-expected sensitive if:

d#out(f(a1), f(a2)) ≤ din(a1, a2)

for all inputs a1, a2 ∈ A.

SLIDE 26

Benefits: Expressive

If dout(b1, b2) > k for all distinct b1, b2:

(din, dout)-expected sensitive =⇒ k-uniform sensitive

SLIDE 27

Benefits: Expressive

If dout(b1, b2) > k for all distinct b1, b2:

(din, dout)-expected sensitive =⇒ k-uniform sensitive

If outputs are real-valued and dout(b1, b2) = k · |b1 − b2|:

(din, dout)-expected sensitive =⇒ k-mean sensitive

SLIDE 28

Benefits: Easy to reason about

SLIDE 29

Benefits: Easy to reason about

f : A → Distr(B) is (dA, dB)-expected sensitive

SLIDE 30

Benefits: Easy to reason about

f : A → Distr(B) is (dA, dB)-expected sensitive
g : B → Distr(C) is (dB, dC)-expected sensitive

SLIDE 31

Benefits: Easy to reason about

f : A → Distr(B) is (dA, dB)-expected sensitive
g : B → Distr(C) is (dB, dC)-expected sensitive
g ∘̃ f : A → Distr(C) is (dA, dC)-expected sensitive

SLIDE 32

Benefits: Easy to reason about

f : A → Distr(B) is (dA, dB)-expected sensitive
g : B → Distr(C) is (dB, dC)-expected sensitive
g ∘̃ f : A → Distr(C) is (dA, dC)-expected sensitive

Abstract away distributions

◮ Work in terms of distances on ground sets
◮ No need to work with complex distances over distributions
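A toy numeric check of the composition property (my own contracting maps, not an example from the paper): f and g each move their input halfway toward a random target in {0, 1}, so each shrinks expected distances by 1/2 under a shared-randomness coupling, and their composition shrinks them by 1/4:

```python
def f_pair(a1, a2, u):
    """Run f on both inputs with ONE shared random choice u in {0, 1}:
    f moves its input halfway toward the random target u."""
    return (a1 + u) / 2, (a2 + u) / 2

def g_pair(b1, b2, v):
    """Same shape for g: another contracting probabilistic map."""
    return (b1 + v) / 2, (b2 + v) / 2

def expected_distance_after_composition(a1, a2):
    """E[|c1 - c2|] for the composed map under the coupling that
    shares both coins u and v across the two runs."""
    total = 0.0
    for u in (0, 1):           # f's fair coin
        b1, b2 = f_pair(a1, a2, u)
        for v in (0, 1):       # g's fair coin
            c1, c2 = g_pair(b1, b2, v)
            total += 0.25 * abs(c1 - c2)
    return total

# f is (d, d/2)-expected sensitive and g is (d, d/2)-expected sensitive,
# so the composition is (d, d/4)-expected sensitive: 1.0 -> 0.25.
assert expected_distance_after_composition(0.0, 1.0) == 0.25
```

The key point mirrors the slide: the composed bound is obtained purely by chaining the two ground-level sensitivity statements, never by computing a distance between the composed output distributions.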

SLIDE 33

How to verify this property? The program logic EpRHL

SLIDE 34

A relational program logic EpRHL

The pWhile imperative language

c ::= x ← e | x $← d | if e then c else c | while e do c | skip | c; c

SLIDE 36

A relational program logic EpRHL

The pWhile imperative language

c ::= x ← e | x $← d | if e then c else c | while e do c | skip | c; c

Judgments

⊢ {P; din} c1 ∼ c2 {Q; dout}

◮ Tagged program variables: x1, x2
◮ P and Q: boolean predicates over tagged variables
◮ din and dout: real-valued expressions over tagged variables

SLIDE 37

EpRHL judgments model expected sensitivity

A judgment

⊢ {P; din} c1 ∼ c2 {Q; dout}

is valid if:

for all input memories (m1, m2) satisfying pre-condition P, there exists a coupling of the output distributions (⟦c1⟧ m1, ⟦c2⟧ m2) with

◮ support satisfying post-condition Q
◮ E[dout] ≤ din(m1, m2)

SLIDE 38

One proof rule: Sequential composition

⊢ {P; dA} c1 ∼ c2 {Q; dB}        ⊢ {Q; dB} c′1 ∼ c′2 {R; dC}
─────────────────────────────────────────────────────────────
⊢ {P; dA} c1; c′1 ∼ c2; c′2 {R; dC}

SLIDE 42

One proof rule: Sequential composition

⊢ {P; dA} c1 ∼ c2 {Q; dB}        ⊢ {Q; dB} c′1 ∼ c′2 {R; dC}
─────────────────────────────────────────────────────────────
⊢ {P; dA} c1; c′1 ∼ c2; c′2 {R; dC}

Expected sensitivity composes

SLIDE 43

Wrapping up

SLIDE 44

More in the paper

Theoretical results

◮ Full proof system (sampling, conditionals, loops, etc.)
◮ Transitivity principle (internalizes path coupling)

Implementation in EasyCrypt, formalizations of:

◮ Stability for the Stochastic Gradient Method
◮ Convergence for the RSM population dynamics
◮ Mixing for the Glauber dynamics

SLIDE 45

Looking forward

Possible directions

◮ Other useful consequences of expected sensitivity?
◮ Formal verification systems beyond program logics?
◮ How to automate this proof technique?

SLIDE 46

Looking forward

Possible directions

◮ Other useful consequences of expected sensitivity?
◮ Formal verification systems beyond program logics?
◮ How to automate this proof technique?

Shameless plug: Looking for students at UWisconsin!

SLIDE 47

Gilles Barthe Thomas Espitau Benjamin Grégoire Justin Hsu Pierre-Yves Strub

Proving Expected Sensitivity of Probabilistic Programs

SLIDE 48

Our contributions

  • Coupling-based definition of probabilistic sensitivity
  • Relational program logic EpRHL
  • Formalized examples: stability and convergence
