  1. Representation formulae for score functions. Ivan Nourdin, Giovanni Peccati and Yvik Swan⋆. Département de Mathématique, Université de Liège. July 2, 2014.

  2. 1 Score 2 Stein and Fisher 3 Controlling the relative entropy 4 Key identity 5 Cattywampus Stein’s method 6 Extension 7 Coda

  3. Scoooores

  4. 1 Score 2 Stein and Fisher 3 Controlling the relative entropy 4 Key identity 5 Cattywampus Stein’s method 6 Extension 7 Coda

  5. Let $X$ be a centered $\mathbb{R}^d$-valued random vector with covariance $B > 0$.

Definition. The Stein kernel of $X$ is a $d \times d$ matrix $\tau_X(X)$ such that
$$E[\tau_X(X)\,\nabla\varphi(X)] = E[X\,\varphi(X)] \quad \text{for all } \varphi \in C_c^\infty(\mathbb{R}^d).$$

Definition. The score of $X$ is the $d \times 1$ vector $\rho_X(X)$ such that
$$E[\rho_X(X)\,\varphi(X)] = -E[\nabla\varphi(X)] \quad \text{for all } \varphi \in C_c^\infty(\mathbb{R}^d).$$
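For concreteness, here is a worked one-dimensional example (added here, not on the original slide): take $X$ Laplace-distributed with density $f(x) = \frac{1}{2b}e^{-|x|/b}$ and $b = 1/\sqrt{2}$, so that $\mathrm{Var}(X) = 2b^2 = 1$. Then
$$\rho_X(x) = (\log f)'(x) = -\sqrt{2}\,\mathrm{sgn}(x), \qquad \tau_X(x) = \frac{1}{f(x)}\int_x^\infty y\,f(y)\,dy = \frac{1}{2} + \frac{|x|}{\sqrt{2}}.$$
As a sanity check, taking $\varphi(x) = x$ in the definitions forces $E[\tau_X(X)] = E[X^2] = 1$, and indeed $\frac{1}{2} + \frac{E|X|}{\sqrt{2}} = \frac{1}{2} + \frac{b}{\sqrt{2}} = 1$.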

  6. In the Gaussian case $Z \sim \mathcal{N}_d(0, C)$, the Stein identity
$$E[Z\,\varphi(Z)] = E[C\,\nabla\varphi(Z)]$$
gives $\rho_Z(Z) = -C^{-1}Z$ and $\tau_Z(Z) = C$. Intuitively, a measure of the proximity $\rho_X(X) \approx -B^{-1}X$ and $\tau_X(X) \approx B$ should provide an assessment of “Gaussianity”.
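For completeness, the Gaussian score can be read off directly from the density (a one-line computation added here):
$$\rho_Z(z) = \nabla \log \phi_d(z; C) = \nabla\Big({-\tfrac{1}{2}}\,z^T C^{-1} z - \tfrac{1}{2}\log\big((2\pi)^d \det C\big)\Big) = -C^{-1}z.$$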

  7. Definition. The standardised Fisher information of $X$ is
$$J_{st}(X) = B\,E\Big[\big(\rho_X(X) + B^{-1}X\big)\big(\rho_X(X) + B^{-1}X\big)^T\Big].$$
A simple computation gives $J_{st}(X) = B\,J(X) - I_d$, with $J(X) = E\big[\rho_X(X)\,\rho_X(X)^T\big]$ the Fisher information matrix.

Definition. The Stein discrepancy is
$$S(X) = E\Big[\big\|\tau_X(X) - B\big\|_{H.S.}^2\Big].$$
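Spelled out, the “simple computation” (added here for completeness) uses $E[\rho_X(X)X^T] = -I_d$, obtained by taking $\varphi(x) = x_j$ coordinatewise in the definition of the score, together with $E[XX^T] = B$:
$$B\,E\big[(\rho_X(X) + B^{-1}X)(\rho_X(X) + B^{-1}X)^T\big] = B\,J(X) + B\,E[\rho_X(X)X^T]B^{-1} + E[X\rho_X(X)^T] + E[XX^T]B^{-1} = B\,J(X) - I_d - I_d + I_d = B\,J(X) - I_d.$$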

  8. Control on $J_{st}(X)$ and $S(X)$ provides control on several distances (Kullback-Leibler, Kolmogorov, Wasserstein, Hellinger, Total Variation, ...) between the law of $X$ and the Gaussian.

Controlling $J_{st}(X)$:
• Johnson and Barron, through careful analysis of the score function (PTRF, 2004)
• Artstein, Ball, Barthe and Naor, through a “variational tour de force” (PTRF, 2004)

Controlling $S(X)$:
• Cacoullos, Papathanassiou and Utev (AoP, 1994), in a number of settings
• Nourdin and Peccati, through their infamous Malliavin/Stein fourth moment theorem (PTRF, 2009)
• Extensions to abstract settings (Ledoux, AoP 2012)

  9. 1 Score 2 Stein and Fisher 3 Controlling the relative entropy 4 Key identity 5 Cattywampus Stein’s method 6 Extension 7 Coda

  10. Let $Z$ be centered Gaussian with density $\phi = \phi_d(\cdot\,; C)$.

Definition. The relative entropy between $X$ (with density $f$) and $Z$ is
$$D(X\|Z) = E\big[\log\big(f(X)/\phi(X)\big)\big] = \int_{\mathbb{R}^d} f(x)\,\log\frac{f(x)}{\phi(x)}\,dx.$$
The Pinsker-Csiszár-Kullback inequality yields
$$TV(X, Z) \le \sqrt{2\,D(X\|Z)}.$$
In other words, $D(X\|Z) \Rightarrow TV(X,Z)^2$.
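A quick numerical sanity check of this inequality (added here, not from the talk): for $X \sim \mathcal{N}(\mu, 1)$ and $Z \sim \mathcal{N}(0,1)$, both quantities are available in closed form, namely $TV = 2\Phi(\mu/2) - 1$ (with $TV$ normalized as a supremum over events) and $D = \mu^2/2$.

    # Check TV(N(mu,1), N(0,1)) <= sqrt(2 * D(N(mu,1) || N(0,1))) for a few mu.
    # Closed forms: TV = 2*Phi(mu/2) - 1 (sup-over-events normalization), D = mu^2/2.
    from math import erf, sqrt

    def Phi(x):  # standard normal CDF
        return 0.5 * (1.0 + erf(x / sqrt(2.0)))

    for mu in (0.1, 0.5, 1.0, 2.0):
        tv = 2.0 * Phi(mu / 2.0) - 1.0
        d = mu**2 / 2.0
        assert tv <= sqrt(2.0 * d)
        print(f"mu = {mu}: TV = {tv:.4f} <= sqrt(2D) = {sqrt(2.0 * d):.4f}")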

  11. The usefulness of $J_{st}(X)$ can be seen via the de Bruijn identity. Let $X_t = \sqrt{t}\,X + \sqrt{1-t}\,Z$ and $\Gamma_t = tB + (1-t)C$. Then
$$D(X\|Z) = \int_0^1 \frac{1}{2t}\,\mathrm{tr}\big(C\,\Gamma_t^{-1}\,J_{st}(X_t)\big)\,dt + \frac{1}{2}\Big[\mathrm{tr}\big(C^{-1}B\big) - d\Big] + \int_0^1 \frac{1}{2t}\,\mathrm{tr}\big(C\,\Gamma_t^{-1} - I_d\big)\,dt.$$
In other words, $J_{st}(X_t) \Rightarrow D(X\|Z) \Rightarrow TV(X,Z)^2$.
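A consistency check on this identity (added here): when $B = C$ we have $\Gamma_t = C$ for all $t$, both correction terms vanish, and the identity reduces to
$$D(X\|Z) = \int_0^1 \frac{1}{2t}\,\mathrm{tr}\big(J_{st}(X_t)\big)\,dt,$$
which is the form used on slide 16 below in dimension one; if moreover $X$ is Gaussian, then $J_{st}(X_t) = 0$ and $D(X\|Z) = 0$, as it must be.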

  12. The usefulness of $S(X)$ can be seen via Stein's method. Fix $d = 1$ (with unit variance, so $B = 1$). Then, given $h : \mathbb{R} \to \mathbb{R}$ such that $\|h\|_\infty \le 1$, seek the solution $g_h$ of the Stein equation $g'(x) - x\,g(x) = h(x) - E[h(Z)]$ to get
$$E[h(X)] - E[h(Z)] = E\big[g_h'(X) - X\,g_h(X)\big] = E\big[(1 - \tau_X(X))\,g_h'(X)\big],$$
so that, by the Cauchy-Schwarz inequality,
$$TV(X, Z) = \frac{1}{2}\sup_{\|h\|_\infty \le 1}\big|E[h(X)] - E[h(Z)]\big| \le \frac{1}{2}\Big(\sup_{\|h\|_\infty \le 1}\|g_h'\|_\infty\Big)\sqrt{S(X)}.$$
In other words, $S(X) \Rightarrow TV(X,Z)^2$.

  13. If $h$ is not smooth, there is no known way of obtaining sufficiently precise estimates on the quantity “$\nabla g_h$” in dimension greater than 1. For the moment, Stein's method only works in dimension 1 for the total variation distance. The IT approach via de Bruijn's identity does not suffer from this “dimensionality issue”. We aim to mix the Stein's method approach and the IT approach. To this end we need one final ingredient: a representation formula for the score in terms of the Stein kernel.

  14. 1 Score 2 Stein and Fisher 3 Controlling the relative entropy 4 Key identity 5 Cattywampus Stein’s method 6 Extension 7 Coda

  15. Theorem. Let $X_t = \sqrt{t}\,X + \sqrt{1-t}\,Z$ with $X$ and $Z$ independent. Then
$$\rho_t(X_t) + C^{-1}X_t = -\frac{t}{\sqrt{1-t}}\,E\big[\big(I_d - C^{-1}\tau_X(X)\big)Z \,\big|\, X_t\big] \qquad (1)$$
for all $0 < t < 1$.

Proof when $d = 1$ and $C = 1$. For every test function $\varphi$,
$$E\big[E[(1 - \tau_X(X))Z \,|\, X_t]\,\varphi(X_t)\big] = E\big[(1 - \tau_X(X))\,Z\,\varphi(X_t)\big]$$
$$= \sqrt{1-t}\,E[\varphi'(X_t)] - \sqrt{1-t}\,E\big[\tau_X(X)\,\varphi'(X_t)\big]$$
$$= \sqrt{1-t}\,E[\varphi'(X_t)] - \frac{\sqrt{1-t}}{\sqrt{t}}\,E[X\,\varphi(X_t)]$$
$$= \sqrt{1-t}\,E[\varphi'(X_t)] - \frac{\sqrt{1-t}}{t}\,E[X_t\,\varphi(X_t)] + \frac{1-t}{t}\,E[Z\,\varphi(X_t)]$$
$$= \sqrt{1-t}\,E[\varphi'(X_t)] + \frac{(1-t)^{3/2}}{t}\,E[\varphi'(X_t)] - \frac{\sqrt{1-t}}{t}\,E[X_t\,\varphi(X_t)]$$
$$= -\frac{\sqrt{1-t}}{t}\,\big(E[X_t\,\varphi(X_t)] - E[\varphi'(X_t)]\big).$$
Here the first equality is Gaussian integration by parts in $Z$; the second uses the Stein kernel of $X$ (note $\partial_x\,\varphi(\sqrt{t}\,x + \sqrt{1-t}\,z) = \sqrt{t}\,\varphi'(X_t)$); the third substitutes $\sqrt{t}\,X = X_t - \sqrt{1-t}\,Z$; the fourth applies Gaussian integration by parts to $E[Z\varphi(X_t)]$ once more. Since $E[\rho_t(X_t)\varphi(X_t)] = -E[\varphi'(X_t)]$, the last line equals $-\frac{\sqrt{1-t}}{t}\,E[(\rho_t(X_t) + X_t)\varphi(X_t)]$, and (1) follows.
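The identity is easy to test by simulation. Below is a Monte Carlo sanity check (added here, not from the talk); it assumes $d = 1$, $C = 1$, takes $X$ uniform on $[-\sqrt{3}, \sqrt{3}]$ (centered, unit variance, with Stein kernel $\tau_X(x) = (3 - x^2)/2$) and the test function $\varphi(x) = \sin x$. By the definition of the score, $E[(\rho_t(X_t) + X_t)\varphi(X_t)] = E[X_t\varphi(X_t)] - E[\varphi'(X_t)]$, so (1) is equivalent to the two printed quantities agreeing.

    # Monte Carlo check of identity (1): for phi = sin,
    #   E[X_t sin(X_t)] - E[cos(X_t)]  ==  -(t/sqrt(1-t)) E[(1 - tau(X)) Z sin(X_t)]
    # with X uniform on [-sqrt(3), sqrt(3)], tau(x) = (3 - x^2)/2, Z standard normal.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 2_000_000
    X = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n)
    Z = rng.standard_normal(n)

    for t in (0.25, 0.5, 0.75):
        Xt = np.sqrt(t) * X + np.sqrt(1.0 - t) * Z
        lhs = np.mean(Xt * np.sin(Xt)) - np.mean(np.cos(Xt))
        rhs = -(t / np.sqrt(1.0 - t)) * np.mean((1.0 - (3.0 - X**2) / 2.0) * Z * np.sin(Xt))
        print(f"t = {t:.2f}: lhs = {lhs:+.4f}, rhs = {rhs:+.4f}")

The two quantities should agree up to Monte Carlo error of order $n^{-1/2}$.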

  16. This formula provides a nearly one-line argument. Define
$$\Delta(X, t) = E\big[\big(I_d - C^{-1}\tau_X(X)\big)Z \,\big|\, X_t\big].$$
Take $d = 1$ and all variances equal to 1. Then
$$J_{st}(X_t) = E\big[(\rho_t(X_t) + X_t)^2\big] = \frac{t^2}{1-t}\,E\big[\Delta(X,t)^2\big],$$
so that
$$D(X\|Z) = \frac{1}{2}\int_0^1 \frac{t}{1-t}\,E\big[\Delta(X,t)^2\big]\,dt.$$
Also, by the conditional Jensen inequality and the independence of $X$ and $Z$,
$$E\big[\Delta(X,t)^2\big] \le E\big[(1 - \tau_X(X))^2\big] = S(X).$$

  17. This yields
$$D(X\|Z) \le \frac{1}{2}\,S(X)\int_0^1 \frac{t}{1-t}\,dt,$$
which is useless, since the integral diverges. There is hope, nevertheless:
$$\int_0^1 \frac{t}{1-t}\,dt$$
is barely infinity.
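To quantify “barely” (a computation added here for completeness): cutting the integral at $1 - \epsilon$ gives
$$\int_0^{1-\epsilon} \frac{t}{1-t}\,dt = -\log\epsilon - (1-\epsilon) \le |\log\epsilon|,$$
so the divergence at $t = 1$ is only logarithmic; this is exactly the cut exploited on slide 21 below.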

  18. Recall $X_t = \sqrt{t}\,X + \sqrt{1-t}\,Z$. Then $\Delta(X, t) = E[(1 - \tau_X(X))Z \,|\, X_t]$ is such that $\Delta(X, 0) = \Delta(X, 1) = 0$ a.s. Hence we need to identify conditions under which
$$\frac{t}{1-t}\,E\big[\Delta(X,t)^2\big]$$
is integrable at $t = 1$.
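Why the endpoints vanish (a remark added for completeness, under the unit-variance normalization): at $t = 1$ we have $X_1 = X$, which is independent of $Z$, so $\Delta(X, 1) = (1 - \tau_X(X))\,E[Z] = 0$; at $t = 0$ we have $X_0 = Z$, so $\Delta(X, 0) = Z\,E[1 - \tau_X(X)] = Z\,(1 - \mathrm{Var}(X)) = 0$, since taking $\varphi(x) = x$ in the definition of the Stein kernel gives $E[\tau_X(X)] = E[X^2] = 1$.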

  19. The behaviour of $\Delta(X, t)$ around $t \approx 1$ is central to the understanding of the law of $X$. The behaviour of $E[\Delta(X,t)^2]$ at $t \approx 1$ is closely connected to the so-called MMSE dimension studied by the IT community. This quantity revolves around the remarkable “MMSE formula”
$$\frac{d}{dr}\,I\big(X; \sqrt{r}\,X + Z\big) = \frac{1}{2}\,E\Big[\big(X - E[X \,|\, \sqrt{r}\,X + Z]\big)^2\Big]$$
due to Guo, Shamai and Verdú (IEEE Trans. Inf. Theory, 2005). The connection is explicitly stated in NPSb (IEEE, 2014).

  20. 1 Score 2 Stein and Fisher 3 Controlling the relative entropy 4 Key identity 5 Cattywampus Stein’s method 6 Extension 7 Coda

  21. In NPSa (JFA, 2014) we suggest the following IT alternative to Stein's method. First cut the integral:
$$2\,D(X\|Z) \le E\big[(1 - \tau_X(X))^2\big]\int_0^{1-\epsilon} \frac{t}{1-t}\,dt + \int_{1-\epsilon}^1 \frac{t}{1-t}\,E\big[\Delta(X,t)^2\big]\,dt$$
$$\le E\big[(1 - \tau_X(X))^2\big]\,|\log\epsilon| + \int_{1-\epsilon}^1 \frac{t}{1-t}\,E\big[\Delta(X,t)^2\big]\,dt.$$
Next suppose that, when $t$ is close to 1, we have
$$E\big[\Delta(X,t)^2\big] \le C_\kappa\, t^{-1}(1-t)^\kappa \qquad (2)$$
for some $\kappa > 0$.

  22. We deduce
$$2\,D(X\|Z) \le S(X)\,|\log\epsilon| + C_\kappa \int_{1-\epsilon}^1 (1-t)^{-1+\kappa}\,dt = S(X)\,|\log\epsilon| + \frac{C_\kappa}{\kappa}\,\epsilon^\kappa.$$
The optimal choice is $\epsilon = \big(E\big[(1 - \tau_X(X))^2\big]\big)^{1/\kappa} = S(X)^{1/\kappa}$, which leads to
$$D(X\|Z) \le \frac{1}{2\kappa}\,S(X)\,\big|\log S(X)\big| + \frac{C_\kappa}{2\kappa}\,S(X),$$
which provides a bound on the total variation distance in terms of $S(X)$ that is of the correct order, up to a logarithmic factor.
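Plugging in $\epsilon = S(X)^{1/\kappa}$ (a bookkeeping step added here):
$$S(X)\,|\log\epsilon| = \frac{1}{\kappa}\,S(X)\,\big|\log S(X)\big|, \qquad \frac{C_\kappa}{\kappa}\,\epsilon^\kappa = \frac{C_\kappa}{\kappa}\,S(X);$$
dividing by 2 gives the stated bound, and combining it with the Pinsker-Csiszár-Kullback inequality of slide 10 turns it into a bound on $TV(X,Z)^2$.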

  23. Under what conditions do we have (2)? It is relatively easy to show (via Hölder's inequality) that
$$E\big[|\tau_X(X)|^{2+\eta}\big] < \infty \quad\text{and}\quad E\big[|\Delta(X,t)|\big] \le c\,t^{-1}(1-t)^\delta \qquad (3)$$
together imply (2). It now remains to identify under which conditions we have (3).

Lemma (Poly's first lemma). Let $X$ be an integrable random variable and let $Y$ be an $\mathbb{R}^d$-valued random vector having an absolutely continuous distribution. Then
$$E\big|E[X \,|\, Y]\big| = \sup E\big[X\,g(Y)\big],$$
where the supremum is taken over all $g \in C_c^1$ such that $\|g\|_\infty \le 1$.
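A sketch of the Hölder step (added here; the exponents are our bookkeeping, not the slide's): interpolating $L^2$ between $L^1$ and $L^{2+\eta}$ with exponent $\theta = \eta/(1+\eta)$ gives
$$E\big[\Delta(X,t)^2\big] \le \big(E|\Delta(X,t)|\big)^{\frac{\eta}{1+\eta}}\,\big(E|\Delta(X,t)|^{2+\eta}\big)^{\frac{1}{1+\eta}} \le \big(c\,t^{-1}(1-t)^{\delta}\big)^{\frac{\eta}{1+\eta}}\,M^{\frac{1}{1+\eta}},$$
where $M$ is finite because, by conditional Jensen and independence, $E|\Delta(X,t)|^{2+\eta} \le E|1 - \tau_X(X)|^{2+\eta}\,E|Z|^{2+\eta}$. This is (2) with $\kappa = \delta\eta/(1+\eta)$.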

  24. Thus
$$E\big|E[Z(1 - \tau_X(X)) \,|\, X_t]\big| = \sup E\big[Z(1 - \tau_X(X))\,g(X_t)\big].$$
Now choose $g \in C_c^1$ such that $\|g\|_\infty \le 1$. Then
$$E\big[Z(1 - \tau_X(X))\,g(X_t)\big] = E[Z\,g(X_t)] - E\big[Z\,g(X_t)\,\tau_X(X)\big]$$
$$= E[Z\,g(X_t)] - \sqrt{1-t}\,E\big[\tau_X(X)\,g'(X_t)\big]$$
$$= E\big[Z\big(g(X_t) - g(X)\big)\big] - \frac{\sqrt{1-t}}{\sqrt{t}}\,E\big[X\,g(X_t)\big]$$
and thus
$$\big|E\big[Z(1 - \tau_X(X))\,g(X_t)\big]\big| \le \big|E\big[Z\big(g(X_t) - g(X)\big)\big]\big| + t^{-1}\sqrt{1-t},$$
using $E[Z\,g(X)] = 0$ ($Z$ is centered and independent of $X$) and $|E[X\,g(X_t)]| \le E|X| \le 1$.

  25. Also,
$$\sup\big|E\big[Z\big(g(X_t) - g(X)\big)\big]\big| = \sup\Big|\int_{\mathbb{R}} x\,E\big[g(\sqrt{t}\,X + \sqrt{1-t}\,x) - g(X)\big]\,\phi_1(x)\,dx\Big|$$
$$\le 2\int_{\mathbb{R}} |x|\,TV\big(\sqrt{t}\,X + \sqrt{1-t}\,x,\, X\big)\,\phi_1(x)\,dx.$$
Wrapping up, we get
$$E\big|E[Z(1 - \tau_X(X)) \,|\, X_t]\big| \le 2\,E\big[|Z|\,TV\big(\sqrt{t}\,X + \sqrt{1-t}\,Z,\, X\big)\big] + t^{-1}\sqrt{1-t}.$$
It therefore all boils down to a condition on $TV\big(\sqrt{t}\,X + \sqrt{1-t}\,x,\, X\big)$.

  26. Recall that we want
$$E\big|E[Z(1 - \tau_X(X)) \,|\, X_t]\big| \le c\,t^{-1}(1-t)^\delta. \qquad (3)$$
As it turns out, in view of previous results, a sufficient condition for (3) is
$$TV\big(\sqrt{t}\,X + \sqrt{1-t}\,x,\, X\big) \le \kappa\,(1 + |x|)\,t^{-1}(1-t)^\alpha.$$
This condition – and its multivariate extension – is satisfied by a wide family of random vectors, including those to which the Nourdin-Peccati fourth moment bound
$$S(X) \le c\,\big(E[X^4] - 3\big)$$
applies.
