

SLIDE 1

Comparing distributions via their canonical Stein operators: a new view of Stein’s method

Gesine Reinert, Department of Statistics, University of Oxford
International Colloquium on Stein’s Method, Concentration Inequalities, and Malliavin Calculus, June 30, 2014
Joint work with Christophe Ley (Brussels) and Yvik Swan (Liège)

1 / 45

SLIDE 2

Stein’s method

Outline

1 Stein’s method
2 A canonical Stein operator
3 Examples
4 Distances between expectations
5 Distance between posteriors
6 Last words

2 / 45

SLIDE 3

Stein’s method

Stein’s method in a nutshell

For µ a target distribution, with support I:

1 Find a suitable operator A (called Stein operator) and a wide class of functions F(A) (called Stein class) such that X ∼ µ if and only if, for all functions f ∈ F(A), EAf(X) = 0.

2 Let H(I) be a measure-determining class on I. For each h ∈ H find a solution f = fh ∈ F(A) of the Stein equation

h(x) − Eh(X) = Af(x),

where X ∼ µ. If the solution exists and is unique in F(A) then we can write f(x) = A⁻¹(h(x) − Eh(X)). We call A⁻¹ the inverse Stein operator (for µ).

3 / 45

SLIDE 4

Stein’s method

Comparison of distributions

Let X and Y have distributions µX and µY with Stein operators AX and AY, such that F(AX) ∩ F(AY) ≠ ∅, and choose H(I) such that all solutions f of the Stein equation belong to this intersection. Then

Eh(X) − Eh(Y) = EAY f(X) = EAY f(X) − EAX f(X)

and

sup_{h∈H(I)} |Eh(X) − Eh(Y)| ≤ sup_{f∈F(AX)∩F(AY)} |EAX f(X) − EAY f(X)|.

If H(I) is the set of all Lipschitz-1 functions then the resulting distance is dW, the Wasserstein distance. For examples see Holmes (2004), Eichelsbacher and R. (2008), Döbler (2012).

4 / 45

SLIDE 5

A canonical Stein operator

Outline

1 Stein’s method
2 A canonical Stein operator
3 Examples
4 Distances between expectations
5 Distance between posteriors
6 Last words

5 / 45

SLIDE 6

A canonical Stein operator

Our set-up

Let (X, B, µ) be a measure space, with X ⊂ R. Let X⋆ be the set of real-valued functions on X. Let D : dom(D) ⊂ X⋆ → im(D) be a linear operator with dom(D) \ {0} ≠ ∅. Let D⁻¹ : im(D) → dom(D) be the linear operator which sends any h = Df to f. Then

D(D⁻¹h) = h

for all h ∈ im(D), whereas, for f ∈ dom(D), D⁻¹(Df) is only defined up to addition of an element of ker(D).

6 / 45

SLIDE 7

A canonical Stein operator

Assumption

There exists a linear operator D⋆ : dom(D⋆) ⊂ X⋆ → im(D⋆) and a constant l := l_{X,D} such that

D(f(x)g(x + l)) = g(x)Df(x) + f(x)D⋆g(x)

for all (f, g) ∈ dom(D) × dom(D⋆). Under this assumption, D and D⋆ are skew-adjoint in the sense that

∫_X g Df dµ = −∫_X f D⋆g dµ

for all (f, g) ∈ dom(D) × dom(D⋆) such that gDf ∈ L¹(µ) or fD⋆g ∈ L¹(µ) and ∫_X D(f(·)g(· + l)) dµ = 0.

7 / 45

SLIDE 8

A canonical Stein operator

Example 1

Let µ be the Lebesgue measure on X = R and take D the usual strong derivative. Then

D⁻¹f(x) = ∫^x f(u) du,

the usual antiderivative. Our assumption D(f(x)g(x + l)) = g(x)Df(x) + f(x)D⋆g(x) is satisfied with D⋆ = D and l = 0.

8 / 45

SLIDE 9

A canonical Stein operator

Example 2

Let µ be the counting measure on X = Z and take D = ∆⁺, the forward difference operator. Then

D⁻¹f(x) = ∑_{k=−∞}^{x−1} f(k).

Also we have the discrete product rule

∆⁺(f(x)g(x − 1)) = g(x)∆⁺f(x) + f(x)∆⁻g(x)

for all f, g ∈ Z⋆ and all x ∈ Z. Hence our assumption D(f(x)g(x + l)) = g(x)Df(x) + f(x)D⋆g(x) is satisfied with D⋆ = ∆⁻, the backward difference operator, and l = −1.
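The discrete product rule is easy to check numerically; a minimal sketch (the test functions f and g are arbitrary choices for illustration, not from the talk):

```python
# Check Δ⁺(f(x)g(x−1)) = g(x)Δ⁺f(x) + f(x)Δ⁻g(x) on a range of integers,
# where Δ⁺h(x) = h(x+1) − h(x) and Δ⁻h(x) = h(x) − h(x−1).

def fwd(h, x):  # forward difference Δ⁺
    return h(x + 1) - h(x)

def bwd(h, x):  # backward difference Δ⁻
    return h(x) - h(x - 1)

f = lambda x: x**2 + 1       # arbitrary test functions
g = lambda x: 3*x - x**3

for x in range(-5, 6):
    lhs = fwd(lambda y: f(y) * g(y - 1), x)  # Δ⁺ applied to x ↦ f(x)g(x−1)
    rhs = g(x) * fwd(f, x) + f(x) * bwd(g, x)
    assert lhs == rhs
print("discrete product rule holds on -5..5")
```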

9 / 45

SLIDE 10

A canonical Stein operator

Example 3

Let µ(x) be the N(0, 1) measure on R, with density ϕ, and take

Dϕf(x) = f′(x) − xf(x) = (f(x)ϕ(x))′/ϕ(x),

see e.g. Ledoux, Nourdin, Peccati (2014). Then

Dϕ⁻¹f(x) = (1/ϕ(x)) ∫_{−∞}^x f(y)ϕ(y) dy.

Also we have the product rule

Dϕ(gf)(x) = (gf)′(x) − xg(x)f(x) = g(x)Dϕf(x) + f(x)g′(x).

Hence our assumption D(f(x)g(x + l)) = g(x)Df(x) + f(x)D⋆g(x) is satisfied with D⋆g = g′ and l = 0.
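The Gaussian product rule can likewise be verified numerically; a sketch using finite differences (f and g are arbitrary test functions of our choosing):

```python
import math

phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)  # N(0,1) density

def d(h, x, eps=1e-5):              # central finite-difference derivative
    return (h(x + eps) - h(x - eps)) / (2 * eps)

def D_phi(h, x):                    # D_phi h(x) = h'(x) − x h(x) = (h·phi)'(x)/phi(x)
    return d(h, x) - x * h(x)

f = lambda x: math.sin(x)           # arbitrary test functions
g = lambda x: x * x + 1.0

# product rule: D_phi(gf)(x) = g(x) D_phi f(x) + f(x) g'(x)
for x in [-1.3, 0.0, 0.7, 2.1]:
    lhs = D_phi(lambda y: g(y) * f(y), x)
    rhs = g(x) * D_phi(f, x) + f(x) * d(g, x)
    assert abs(lhs - rhs) < 1e-5
print("Gaussian product rule holds at the test points")
```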

10 / 45

SLIDE 11

A canonical Stein operator

Example 4

Let µ(x) be the Poisson(λ) measure on Z₊ with pmf γλ and

∆λ⁺f(x) = λf(x + 1) − xf(x) = ∆⁺(f(x)xγλ(x))/γλ(x).

Then

(∆λ⁺)⁻¹f(x) = (1/(xγλ(x))) ∑_{k=0}^{x−1} f(k)γλ(k)

(which is ill-defined at x = 0) and

∆λ⁺(g(x − 1)f(x)) = g(x)∆λ⁺f(x) + f(x)x∆⁻g(x).

Hence our assumption D(f(x)g(x + l)) = g(x)Df(x) + f(x)D⋆g(x) is satisfied with D⋆g(x) = x∆⁻g(x) and l = −1.
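Both displayed identities can be checked numerically; a sketch with an arbitrary λ and arbitrary test functions (all choices are ours, for illustration):

```python
import math

lam = 2.5                                  # λ, arbitrary for the check

def pmf(x):                                # Poisson(λ) pmf γ_λ(x)
    return math.exp(-lam) * lam**x / math.factorial(x)

def D_lam(h, x):                           # Δ⁺_λ h(x) = λ h(x+1) − x h(x)
    return lam * h(x + 1) - x * h(x)

f = lambda x: x + math.sin(x)              # arbitrary test functions
g = lambda x: x * x - 3.0

for x in range(0, 8):
    # pmf form: Δ⁺_λ f(x) = Δ⁺(f(x)·x·γ_λ(x)) / γ_λ(x)
    pmf_form = (f(x + 1) * (x + 1) * pmf(x + 1) - f(x) * x * pmf(x)) / pmf(x)
    assert abs(D_lam(f, x) - pmf_form) < 1e-9
    # product rule: Δ⁺_λ(g(x−1)f(x)) = g(x)Δ⁺_λ f(x) + f(x)·x·Δ⁻g(x)
    lhs = lam * g(x) * f(x + 1) - x * g(x - 1) * f(x)
    rhs = g(x) * D_lam(f, x) + f(x) * x * (g(x) - g(x - 1))
    assert abs(lhs - rhs) < 1e-9
print("Poisson operator identities hold for x = 0..7")
```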

11 / 45

SLIDE 12

A canonical Stein operator

Remark

In all examples the choice of D is, in a sense, arbitrary, and other options are available. Less conventional choices of D can be envisaged (even forward differences in the continuous setting, etc.). The restriction to dimension 1 is not necessary. From now on, for the sake of presentation, we concentrate on the Lebesgue measure and D the usual derivative.

12 / 45

SLIDE 13

A canonical Stein operator

A canonical Stein operator

Let X be a continuous random variable having pdf p with interval support I = [a, b] ⊂ R. We define the Stein class of X as the class F(X) of functions f : R → R such that
(i) x ↦ f(x)p(x) is differentiable on R,
(ii) (fp)′ is integrable and ∫ (fp)′ = 0.

To X we associate the Stein operator TX of X given by

TX f = (fp)′/p,

with the convention that TX f = 0 outside of I.
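A characterising consequence of the definition is that E[TX f(X)] = 0 for f in the Stein class; a numerical sketch for a standard normal target (the choice of f and the quadrature grid are ours):

```python
import math

phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)   # target pdf p
f = lambda x: math.cos(x) / (1 + x * x)    # an f for which f·p vanishes in the tails

def d(h, x, eps=1e-5):                      # finite-difference derivative
    return (h(x + eps) - h(x - eps)) / (2 * eps)

def T(x):                                   # T_X f(x) = (f p)'(x)/p(x) = f'(x) − x f(x)
    return d(f, x) - x * f(x)

# E[T_X f(X)] by the midpoint rule on [−10, 10]
n, a, b = 100000, -10.0, 10.0
w = (b - a) / n
e = sum(T(a + (i + 0.5) * w) * phi(a + (i + 0.5) * w) for i in range(n)) * w
assert abs(e) < 1e-6
print("E[T_X f(X)] ≈", e)
```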

13 / 45

SLIDE 14

A canonical Stein operator

A useful relationship

We have a distributional characterisation: Y =d X if and only if (TY, F(Y)) = (TX, F(X)), for all random variables Y which have the same support as X. See Ley and Swan (2011) for more details. By the product rule,

E[g′(X)f(X)] = −E[g(X)TX f(X)]

for all f ∈ F(X) and for all differentiable functions g such that ∫ (gfp)′ dx = 0 and ∫ |g′fp| dx < ∞; we say that g ∈ dom((·)′, X, f).

14 / 45

SLIDE 15

A canonical Stein operator

Stein characterisations

Let Y be continuous with density q and the same support as X.

1 Y =d X if and only if E[f(Y)g′(Y)] = −E[g(Y)TX f(Y)] for all f ∈ F(X) and for all g ∈ dom((·)′, X, f).

2 Suppose that q/p is differentiable. Take g ∈ ∩_{f∈F(X)} dom((·)′, X, f) such that g is X-a.s. never 0 and gq/p is differentiable. Then Y =d X if and only if E[f(Y)g′(Y)] = −E[g(Y)TX f(Y)] for all f ∈ F(X).

3 Let f ∈ F(X) be X-a.s. never zero and assume that dom((·)′, X, f) is dense in L¹(X). Then Y =d X if and only if E[f(Y)g′(Y)] = −E[g(Y)TX f(Y)] for all g ∈ dom((·)′, X, f).

15 / 45

SLIDE 16

A canonical Stein operator

Some special cases

Take g ≡ 1 (this is always permitted) to obtain the Stein characterization: Y =d X if and only if E[TX f(Y)] = 0 for all f ∈ F(X). If f ≡ 1 is in F(X) then we obtain the Stein characterization: Y =d X ⟺ E[g′(Y)] = −E[(p′(Y)/p(Y))g(Y)] for all g ∈ dom((·)′, X, 1).

16 / 45

SLIDE 17

A canonical Stein operator

A connection to couplings: an equation

Let X be a mean zero random variable with finite, nonzero variance σ². We say that X∗ has the X-zero biased distribution if for all differentiable f for which EXf(X) exists,

σ²Ef′(X∗) − EXf(X) = 0;

N(0, σ²) is the unique fixed point of the zero-bias transformation. More generally, if X is a random variable with differentiable density pX, then for all differentiable f,

pX(x)TX(f)(x) = (f(x)pX(x))′ = pX(x)f′(x) + f(x)pX′(x)

and so

E[f′(X)] + E[f(X)pX′(X)/pX(X)] = 0.
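As a concrete instance of the zero-bias identity σ²Ef′(X∗) = EXf(X): for X ∼ Uniform[−1, 1] (so σ² = 1/3), the zero-biased density works out to p∗(x) = (3/4)(1 − x²) on [−1, 1]. A numerical sketch (the uniform example and the test function are our choices, not from the talk):

```python
import math

f = lambda x: math.exp(x)               # test function, f' = f
fp = lambda x: math.exp(x)

def midpoint(h, a, b, n=100000):        # simple midpoint-rule quadrature
    w = (b - a) / n
    return sum(h(a + (i + 0.5) * w) for i in range(n)) * w

sigma2 = 1.0 / 3.0                                                    # Var of U[-1,1]
lhs = sigma2 * midpoint(lambda x: fp(x) * 0.75 * (1 - x * x), -1, 1)  # σ² E f'(X*)
rhs = midpoint(lambda x: x * f(x) * 0.5, -1, 1)                       # E[X f(X)]
assert abs(lhs - rhs) < 1e-7
print("zero-bias identity: both sides ≈", lhs)
```

Both sides evaluate to 1/e, which one can also confirm by integrating by hand.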

17 / 45

SLIDE 18

A canonical Stein operator

A connection to couplings: a transformation

The equation

E[f′(X)] + E[f(X)pX′(X)/pX(X)] = 0

could lead to a transformation which maps a random variable Y to Y^(X) such that, for all differentiable f for which the expressions exist,

Ef′(Y^(X)) = −E[f(Y)pX′(Y)/pX(Y)].

18 / 45

SLIDE 19

A canonical Stein operator

A connection to couplings: unique fixed points

Now assume that F(X) ∩ dom(D) is dense in L¹(X) and that Y^(X) is well-defined. To see that Y =d X if and only if Y^(X) =d Y: as for all f ∈ F(X),

E[f′(X)] + E[f(X)pX′(X)/pX(X)] = 0

and

Ef′(Y^(X)) = −E[f(Y)pX′(Y)/pX(Y)],

if Y =d X then Y^(X) =d Y. Conversely, if Y^(X) =d Y, then ETX(f)(Y) = 0 for all differentiable f ∈ F(X), and the assertion follows from the density assumption, using g ≡ 1 in the characterisation: Y =d X if and only if E[f(Y)g′(Y)] = −E[g(Y)TX f(Y)] for all f ∈ F(X).

19 / 45

SLIDE 20

A canonical Stein operator

The inverse Stein operator

With X as above we define the class

F⁽⁰⁾(X) = { h : R → R such that E[h(X)] = 0 },

and the inverse Stein operator TX⁻¹ : F⁽⁰⁾(X) → F(X) by

TX⁻¹h(x) = (1/p(x)) ∫_a^x p(y)h(y) dy = −(1/p(x)) ∫_x^b p(y)h(y) dy

for all h ∈ F⁽⁰⁾(X).

20 / 45

SLIDE 21

A canonical Stein operator

Stein equations

Let h ∈ L¹(X). The equation

h(x) − Eh(X) = f(x)g′(x) + g(x)TX f(x), x ∈ I,

is a Stein equation for the target X. Solutions of this equation are pairs of functions (f, g) such that fg = TX⁻¹(h − E_p h).

Although fg is unique, the individual f and g are not (just consider multiplication by constants).

21 / 45

SLIDE 22

A canonical Stein operator

Stein equations

Let h ∈ L¹(X). The equation

h(x) − Eh(X) = TX(fg)(x) = f(x)g′(x) + g(x)TX f(x), x ∈ I,

is a Stein equation for the target X. Solutions of this equation are pairs of functions (f, g) such that fg = TX⁻¹(h − E_p h).

Although fg is unique, the individual f and g are not (just consider multiplication by constants).

22 / 45

SLIDE 23

A canonical Stein operator

Special Stein operators

Our general Stein operator is an operator on pairs of functions (f, g):

A(f, g)(x) = TX(fg)(x) = f(x)g′(x) + g(x)TX f(x).

One particular Stein operator fixes a differentiable g and uses AX f = TX(fg) = fg′ + gTX f with f ∈ F_A(X) ⊂ F(X). A second particular Stein operator is given by fixing f = c ∈ F(X) and using

AX g(x) = c(x)g′(x) + g(x)TX c(x).

Sometimes we call this the c-operator (see Goldstein and R. (2013)).

23 / 45

SLIDE 24

A canonical Stein operator

The score function

Suppose that X is such that the constant function 1 ∈ F(X) (this is no small assumption). Then taking c = 1 in AX g(x) = c(x)g′(x) + g(x)TX c(x) we get

AX g(x) = g′(x) + g(x)ρ(x)

with ρ(x) = TX 1(x) = p′(x)/p(x), the so-called “score function” of X; see for example Stein (2004).

24 / 45

SLIDE 25

A canonical Stein operator

The Stein kernel

If X has finite mean ν we can take c = TX⁻¹(ν − Id), with Id the identity function (this is always allowed), in AX g(x) = c(x)g′(x) + g(x)TX c(x). This yields

AX g(x) = τ(x)g′(x) + (ν − x)g(x)

with τ = TX⁻¹(ν − Id), which we call the “Stein kernel” of X. This approach was used for example in Stein (1986, Lesson 6) and Cacoullos et al. (1992).
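Reading the inverse operator as h ↦ (1/p(x)) ∫_a^x p(y)h(y) dy, the kernel can be checked numerically against the closed forms quoted in the examples that follow (σ² for N(0, σ²), x(1 − x)/(α + β) for Beta(α, β)); a sketch with parameters of our choosing:

```python
import math

def kernel(p, nu, a, x, n=200000):
    # τ(x) = (1/p(x)) ∫_a^x (ν − y) p(y) dy, by the midpoint rule
    w = (x - a) / n
    s = sum((nu - (a + (i + 0.5) * w)) * p(a + (i + 0.5) * w) for i in range(n))
    return s * w / p(x)

# Normal N(0, σ²) with σ² = 2: τ(x) should equal σ² for every x
s2 = 2.0
p_norm = lambda y: math.exp(-y * y / (2 * s2)) / math.sqrt(2 * math.pi * s2)
assert abs(kernel(p_norm, 0.0, -12.0, 0.7) - s2) < 1e-3

# Beta(2, 3), mean ν = 2/5: τ(x) should equal x(1 − x)/(α + β)
al, be = 2.0, 3.0
Bab = math.gamma(al) * math.gamma(be) / math.gamma(al + be)
p_beta = lambda y: y**(al - 1) * (1 - y)**(be - 1) / Bab
x = 0.3
assert abs(kernel(p_beta, al / (al + be), 0.0, x) - x * (1 - x) / (al + be)) < 1e-3
print("Stein kernel matches the closed forms")
```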

25 / 45

SLIDE 26

Examples

Outline

1 Stein’s method
2 A canonical Stein operator
3 Examples
4 Distances between expectations
5 Distance between posteriors
6 Last words

26 / 45

SLIDE 27

Examples

Example: Normal

In the example of a N(0, σ²) random variable, our operator translates to

TN f(x) = f′(x) − (1/σ²)xf(x),

which contrasts with σ²f′(x) − xf(x), the standard Stein operator for this case. The score function is −x/σ². We compute the Stein kernel τ(x) = σ². The c-Stein operator is the standard operator.

27 / 45

SLIDE 28

Examples

Example: Beta

Consider beta distributions with density

p(x; α, β) = x^{α−1}(1 − x)^{β−1}/B(α, β) · 1{x ∈ [0, 1]}.

Here

TB f(x) = f′(x) + f(x)((α − 1)(1 − x) − (β − 1)x)/(x(1 − x)).

The standard Stein operator for this case is Af(x) = x(1 − x)f′(x) + (α(1 − x) − βx)f(x), see Döbler (2012). The score function, defined when α > 1 and β > 1, is ρ(x) = (α − 1)/x − (β − 1)/(1 − x). The beta Stein kernel is τ(x) = x(1 − x)/(α + β).
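Reading TB as (fp)′/p, the characterising property E[TB f(X)] = 0 can be checked numerically; a sketch for Beta(2, 3) with f(x) = sin(πx), chosen (by us) so that f·p vanishes at both endpoints:

```python
import math

al, be = 2.0, 3.0
Bab = math.gamma(al) * math.gamma(be) / math.gamma(al + be)  # B(α, β)
p = lambda x: x**(al - 1) * (1 - x)**(be - 1) / Bab           # Beta pdf

f = lambda x: math.sin(math.pi * x)          # f with f(0) = f(1) = 0
fp = lambda x: math.pi * math.cos(math.pi * x)

def T_B(x):
    score = (al - 1) / x - (be - 1) / (1 - x)   # p'(x)/p(x)
    return fp(x) + f(x) * score                  # (f p)'(x)/p(x)

# E[T_B f(X)] by the midpoint rule on (0, 1)
n = 100000
w = 1.0 / n
ETf = sum(T_B((i + 0.5) * w) * p((i + 0.5) * w) for i in range(n)) * w
assert abs(ETf) < 1e-6
print("E[T_B f(X)] ≈", ETf)
```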

28 / 45

SLIDE 29

Distances between expectations

Outline

1 Stein’s method
2 A canonical Stein operator
3 Examples
4 Distances between expectations
5 Distance between posteriors
6 Last words

29 / 45

SLIDE 30

Distances between expectations

Comparison of expectations

Let X1 and X2 be such that their densities p1 and p2 have interval support. Denote by T1 and T2 the Stein operators associated with X1 and X2, acting on Stein classes F1 = F(X1) and F2 = F(X2). Let Ei h = Eh(Xi), i = 1, 2. Let h be such that Ei|h| < ∞ for i = 1, 2. Fix f1 ∈ F1 and define

gh := (1/f1) T1⁻¹(h − E1h).

Then, for all f2 ∈ F2 such that E2[f2 gh] exists,

E2h − E1h = E2[f1 gh′ + gh T1 f1] = E2[(f1 − f2)gh′ + gh {T1 f1 − T2 f2}].

An obvious choice is f1 = f2 if permitted, leading to ...

30 / 45

SLIDE 31

Distances between expectations

A Corollary

Assume that X1 ≠ X2. Let H ⊂ L¹(X1) ∩ L¹(X2). Take f ∈ F1 ∩ F2 and suppose that for all h ∈ H we have that gh = (1/f)T1⁻¹(h − E1h) is such that Ei[f gh] exists, i = 1, 2. Then

sup_{h∈H} |E1h − E2h| ≤ κ_{H,1}(f) E2|T1f − T2f|

with κ_{H,1}(f) = sup_{h∈H} ‖(1/f)T1⁻¹(h − E1h)‖∞.

31 / 45

SLIDE 32

Distances between expectations

Example: Distance between Gaussians via Stein factor

For Xi ∼ N(0, σi²), i = 1, 2, with σ1² ≤ σ2², the usual Stein operator gives

Eh(X1) − Eh(X2) = (σ1² − σ2²)Ef′_{h,σ2²}(X1),

with f_{h,σ2²} the solution of the Gaussian Stein equation, yielding

dTV(X1, X2) ≤ 2|σ1² − σ2²|/σ2²,

see for example Nourdin and Peccati (2011). This is also what we get when we use the Stein kernel approach, with τi(x) = σi².

Using the score functions ρi(x) = −x/σi², i = 1, 2, we get

dTV(X1, X2) ≤ σ1 √(π/2) E|X2| (1/σ1² − 1/σ2²) = |σ1² − σ2²|/(σ1σ2);

if σ2 < 2σ1 then this bound beats the first bound.
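A quick numerical comparison of the two bounds (the grid of standard deviations is an arbitrary choice):

```python
# Compare the two total-variation bounds between N(0, σ1²) and N(0, σ2²):
# kernel-approach bound 2|σ1² − σ2²|/σ2²  vs  score-approach bound |σ1² − σ2²|/(σ1σ2).

def kernel_bound(s1, s2):
    return 2 * abs(s1**2 - s2**2) / s2**2

def score_bound(s1, s2):
    return abs(s1**2 - s2**2) / (s1 * s2)

s2 = 2.0
for s1 in [0.5, 0.9, 1.1, 1.5, 1.9]:
    kb, sb = kernel_bound(s1, s2), score_bound(s1, s2)
    # the score bound wins exactly when σ2 < 2σ1
    assert (sb < kb) == (s2 < 2 * s1)
    print(f"s1={s1}: kernel {kb:.3f}, score {sb:.3f}")
```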

32 / 45

SLIDE 33

Distances between expectations

From Student to Gauss

Set X1 = Z standard Gaussian and X2 = Wν a Student t random variable with ν > 2 degrees of freedom. The Stein kernels for both distributions are τ1 = 1 and τ2(x) = (x² + ν)/(ν − 1); we obtain using the Stein kernel approach that

dTV(Z, Wν) ≤ 2E[(Wν² + ν)/(ν − 1) − 1] = 4/(ν − 2).

Using the score function approach we obtain

dTV(Z, Wν) ≤ √(π/2) (−2 + 8(ν/(1 + ν))^{(1+ν)/2}) / ((ν − 1)√ν B(ν/2, 1/2)),

which is of the same order, with a better constant.
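Taking the kernel-approach bound as 4/(ν − 2) and reading the score-approach bound as √(π/2)(−2 + 8(ν/(1 + ν))^{(1+ν)/2})/((ν − 1)√ν B(ν/2, 1/2)) (our reconstruction of the displayed formula), a sketch evaluating both:

```python
import math

def kernel_bound(nu):                  # 4/(ν − 2)
    return 4.0 / (nu - 2.0)

def score_bound(nu):                   # as read from the slide
    B = math.gamma(nu / 2) * math.gamma(0.5) / math.gamma((nu + 1) / 2)  # B(ν/2, 1/2)
    num = -2.0 + 8.0 * (nu / (1.0 + nu)) ** ((1.0 + nu) / 2.0)
    return math.sqrt(math.pi / 2.0) * num / ((nu - 1.0) * math.sqrt(nu) * B)

for nu in [5, 10, 50, 100]:
    kb, sb = kernel_bound(nu), score_bound(nu)
    assert 0 < sb < kb                 # same order in ν, smaller constant
    print(f"nu={nu}: kernel {kb:.4f}, score {sb:.4f}")
```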

33 / 45

SLIDE 34

Distance between posteriors

Outline

1 Stein’s method
2 A canonical Stein operator
3 Examples
4 Distances between expectations
5 Distance between posteriors
6 Last words

34 / 45

SLIDE 35

Distance between posteriors

Distances between likelihoods

We can apply the inequality

sup_{h∈H} |E1h − E2h| ≤ κ_{H,1}(f) E2|T1f − T2f|

to bound distances between likelihoods. Let πi(θ), i = 0, 1, be two absolutely continuous positive functions with common support I with closure Ī = [a, b]. Assume that π1 and π0π1 are integrable. Define

p1(θ) = κ1π1(θ) and p2(θ) = κ2π0(θ)π1(θ),

where κi, i = 1, 2, are normalising constants. For i = 1, 2 let Θi ∼ pi. Assume that µ1 = E[Θ1] < ∞. Further we assume that

lim_{θ→b} π0(θ) ∫_θ^b (µ1 − u)π1(u) du = lim_{θ→a} π0(θ) ∫_a^θ (µ1 − u)π1(u) du = 0.

35 / 45

SLIDE 36

Distance between posteriors

A Wasserstein bound

Using the Stein kernel τ1 = T1⁻¹(µ1 − Id) we get

T2τ1(θ) = (π0′(θ)/π0(θ))τ1(θ) + T1τ1(θ)

and so

T2τ1(θ) − T1τ1(θ) = (π0′(θ)/π0(θ))τ1(θ).

We obtain

dW(Θ1, Θ2) ≤ (κ2/κ1) E|π0′(Θ1)τ1(Θ1)|.

36 / 45

SLIDE 37

Distance between posteriors

The Bayesian approach in a nutshell

Given observations x = (x1, x2, . . . , xn), which are seen as realisations of random variables X1, . . . , Xn with joint distribution (density) π1(x1, x2, . . . , xn|θ), which depends on an unknown parameter θ, we would like to draw inference on θ. The parameter Θ is viewed as a random element. Before any observation has been made (a priori) we think that Θ has the (prior) distribution p0. We update our belief about Θ in the light of the observations by applying Bayes’ formula, so that the posterior density of Θ, given the observations x, is

p2(θ|x) ∝ π1(x|θ)p0(θ) ∝ p1(θ, x)p0(θ),

where p1(θ, x) = κ1(x)π1(x|θ) is a probability density for θ.

37 / 45

SLIDE 38

Distance between posteriors

The choice of prior

Some typical choices of prior are:
- priors which are elicited from experts or from previous experiments;
- conjugate priors, where the posterior belongs to the same distributional family as the prior, making updating easy;
- the uniform distribution (when it exists), to reflect no information;
- Jeffreys’ prior p0(θ) ∝ |I(θ)|^{1/2}, with |I(θ)| the determinant of the Fisher information matrix;
- priors which are adapted to particular problems.

The choice of prior affects the inference, but the hope is that the effect of the prior wanes with an increasing number of observations. We can quantify this effect using Stein’s method.

38 / 45

SLIDE 39

Distance between posteriors

Bayesian interpretation

We observe data points x := (x1, x2, . . . , xn) with sampling distribution π1(x|θ). We take θ, the one-dimensional parameter, to be distributed according to some (possibly improper) prior p0(θ), and let the posterior be given by p2(θ; x) ∝ p0(θ)p1(θ; x). Set p1(θ; x) = κ1(x)π1(x; θ) and p2(θ; x) = κ2π0(θ)π1(x; θ). Then our theorem applies and we can assess the influence of the prior on the posterior.

39 / 45

SLIDE 40

Distance between posteriors

Example: Normal model, normal prior

Assume that X1, . . . , Xn ∼ N(θ, σ²), conditionally independent given θ, where σ² is known, and assume that the prior is normal, π(θ) ∼ N(µ, δ²), where µ and δ² are known. Then

π1(x1, . . . , xn, θ) = (2πσ²)^{−n/2} exp( −(1/2) ∑_{i=1}^n (xi − θ)²/σ² );

p1(θ) ∼ N( (1/n) ∑_{i=1}^n xi, σ²/n ).

It is a standard calculation that the posterior is normal,

p2(θ, x) ∼ N( b(x)/a, 1/a ),

with a = n/σ² + 1/δ² and b(x) = (1/σ²) ∑ xi + µ/δ².
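The conjugate update can be carried out directly; a sketch with invented data (all numerical values are illustrative):

```python
# Conjugate normal-normal update: posterior is N(b(x)/a, 1/a) with
# a = n/σ² + 1/δ² and b(x) = (Σ xᵢ)/σ² + µ/δ².
x = [1.2, 0.8, 1.5, 0.9, 1.1]          # invented observations
sigma2, mu, delta2 = 1.0, 0.0, 4.0     # known variance, prior mean and variance

n = len(x)
a = n / sigma2 + 1 / delta2
b = sum(x) / sigma2 + mu / delta2
post_mean, post_var = b / a, 1 / a
print(f"posterior: N({post_mean:.4f}, {post_var:.4f})")

# Sanity checks: the posterior mean is a convex combination of the sample
# mean and the prior mean, and the posterior variance beats both 1/δ² and σ²/n.
xbar = sum(x) / n
w = (n / sigma2) / a                   # weight on the sample mean
assert abs(post_mean - (w * xbar + (1 - w) * mu)) < 1e-12
assert 0 < post_var < min(delta2, sigma2 / n) + 1e-12
```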

40 / 45

SLIDE 41

Distance between posteriors

The resulting bound

With τ1 = σ²/n we find

dW(Θ1, Θ2) ≤ E2 |(π0′(θ)/π0(θ)) τ1(θ)| = (σ²/(nδ²)) E|Θ2 − µ|
≤ (σ²/(nδ²)) { E|Θ2 − b(x)/a| + |b(x)/a − µ| }
= √(2/π) σ³/(nδ√(δ²n + σ²)) + (σ²/(nδ² + σ²)) |x̄ − µ|.

The first term is of order O(n⁻¹) whereas the second term reflects the influence of the data. The bound decreases when δ increases. The better the guess of µ, the smaller the bound.

41 / 45

SLIDE 42

Distance between posteriors

Example: Binomial model, Beta prior

Here we have one observation x ∼ Binomial(n, θ), with known n. The prior is π0 = κ0θ^{α−1}(1 − θ)^{β−1}, θ ∈ [0, 1], with α > 0 and β > 0. Then τ1(θ) = θ(1 − θ)/(n + 2). A direct computation gives

dW(Θ1, Θ2) ≤ (1/(n + 2)) ( |2 − β − α| (α + x)/(α + β + n) + |α − 1| ).

Unless α = 1 the bound will be of order 1/n no matter how favourable x is. If α = 1 but β ≠ 1 then the bound is smallest when x = 0, and is then of order 1/n². If α = 1 = β then the bound is zero, as it should be, as then p1 = p2; the prior is uniform.
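The bound is elementary to evaluate; a sketch illustrating the three regimes discussed above (parameter values are our choices):

```python
# Wasserstein bound for the Binomial/Beta example:
# d_W ≤ ( |2 − α − β| (α + x)/(α + β + n) + |α − 1| ) / (n + 2).
def bound(n, x, alpha, beta):
    return (abs(2 - alpha - beta) * (alpha + x) / (alpha + beta + n)
            + abs(alpha - 1)) / (n + 2)

n = 100
assert bound(n, 50, 1, 1) == 0.0                 # uniform prior: zero bound
assert bound(n, 0, 1, 3) < bound(n, 50, 1, 3)    # α = 1, β ≠ 1: smallest at x = 0
# α ≠ 1: the bound is of order 1/n regardless of x
b1, b2 = bound(100, 0, 2, 2), bound(200, 0, 2, 2)
assert 0.4 < (b2 / b1) * 2 < 2.5                 # roughly halves when n doubles
print("Binomial/Beta bound behaves as described")
```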

42 / 45

SLIDE 43

Distance between posteriors

Example: Binomial model, non-informative prior

Using the Haldane prior p0(θ) = κ0(θ(1 − θ))⁻¹, direct computation gives

dW(Θ1, Θ2) ≤ (2/(n + 2)) ( |x/n − 1/2| + √( x(n − x)/(n²(n + 1)) ) ).

If x = n/2 then the bound is of order n^{−3/2}.

Using Jeffreys’ prior p0(θ) = κ0(θ(1 − θ))^{−1/2}, direct computation gives

dW(Θ1, Θ2) ≤ (1/(n + 2)) ( |(x + 1/2)/(n + 1) − 1/2| + √( (x + 1/2)(n − x + 1/2)/((n + 1)²(n + 2)) ) ).

Again if x = n/2 then the bound is of order n^{−3/2}.

43 / 45

SLIDE 44

Last words

Outline

1 Stein’s method
2 A canonical Stein operator
3 Examples
4 Distances between expectations
5 Distance between posteriors
6 Last words

44 / 45

SLIDE 45

Last words

Last remarks

Stein (1964) gives bounds between posteriors in the Kakutani distance, using an algebraic approach. When D = D⋆ the Stein operator becomes A(f, g)(x) = f(x)D⋆g(x) + g(x)TX f(x). The flexibility in having a pair of functions rather than just one function can be useful for getting bounds; for example we could choose g(x) = x^α and then minimise our bounds with respect to α. We are thinking about the multivariate case, too.

45 / 45