A Monte Carlo approach to a divergence minimization problem (work in progress)

1. A Monte Carlo approach to a divergence minimization problem (work in progress). IGAIA IV, June 12–17, 2016, Liblice. Michel Broniatowski, Université Pierre et Marie Curie, Paris, France. June 13, 2016.

2. Contents
- From large deviations to Monte Carlo based minimization
- Divergences
- Large deviations for the bootstrapped empirical measure
- A minimization problem
- Minimum of the Kullback divergence
- Minimum of the likelihood divergence
- Building weights
- Exponential families and their variance functions; minimizing Cressie–Read divergences
- Rare events and Gibbs conditional principle
- Looking for the minimizers

3. An inferential principle for minimization. A sequence of random elements $X_n$ with values in a measurable space $(T, \mathcal{T})$ satisfies a Large Deviation Principle (LDP) with rate $\Phi$ whenever, for every measurable set $\Omega \subset T$,
$$-\Phi(\operatorname{int}(\Omega)) \le \liminf_{n\to\infty} \varepsilon_n \log \Pr(X_n \in \Omega) \le \limsup_{n\to\infty} \varepsilon_n \log \Pr(X_n \in \Omega) \le -\Phi(\operatorname{cl}(\Omega))$$
for some positive sequence $\varepsilon_n$, where $\operatorname{int}(\Omega)$ (resp. $\operatorname{cl}(\Omega)$) denotes the interior (resp. the closure) of $\Omega$ in $T$, and $\Phi(\Omega) := \inf\{\Phi(t);\ t \in \Omega\}$. The $\sigma$-field $\mathcal{T}$ is the Borel $\sigma$-field defined by a given basis on $T$. For subsets $\Omega$ of $T$ such that
$$\Phi(\operatorname{int}(\Omega)) = \Phi(\operatorname{cl}(\Omega)) \tag{1}$$
it follows by inclusion that
$$-\lim_{n\to\infty} \varepsilon_n \log \Pr(X_n \in \Omega) = \Phi(\operatorname{int}(\Omega)) = \Phi(\operatorname{cl}(\Omega)) = \inf_{t\in\Omega} \Phi(t) = \Phi(\Omega). \tag{2}$$

4. Assume that we are given such a family of random elements $X_1, X_2, \ldots$ together with a set $\Omega \subset T$ which satisfies (1), and suppose that we are interested in estimating $\Phi(\Omega)$. Whenever we are able to simulate a family of replicates $X_{n,1}, \ldots, X_{n,K}$ such that $\Pr(X_n \in \Omega)$ can be approximated by the frequency of those $X_{n,i}$'s falling in $\Omega$, say
$$f_{n,K}(\Omega) := \frac{1}{K}\operatorname{card}\{i : X_{n,i} \in \Omega\}, \tag{3}$$
a natural estimator of $\Phi(\Omega)$ is
$$\Phi_{n,K}(\Omega) := -\varepsilon_n \log f_{n,K}(\Omega).$$
We have thus substituted the variational problem $\Phi(\Omega) := \inf\{\Phi(\omega);\ \omega \in \Omega\}$ with a much simpler Monte Carlo approximation, defined through (3). There is no need to identify the points $\omega$ in $\Omega$ which minimize $\Phi$.
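To make the principle concrete, here is a minimal sketch (not from the slides; the function name `ldp_rate_estimate` and all parameter values are my own). The LDP used is Cramér's theorem for the mean of i.i.d. standard Gaussians, with $\varepsilon_n = 1/n$ and exact rate $a^2/2$ for $\Omega = [a, \infty)$:

```python
import numpy as np

rng = np.random.default_rng(0)

def ldp_rate_estimate(sample_xn, in_omega, eps_n, K):
    """Estimate Phi(Omega) as -eps_n * log of the hit frequency f_{n,K}(Omega)."""
    hits = sum(in_omega(sample_xn()) for _ in range(K))
    return -eps_n * np.log(hits / K) if hits else np.inf

# Cramer toy: X_n = mean of n i.i.d. N(0,1), eps_n = 1/n,
# and Omega = [a, infinity) has exact rate Phi(Omega) = a**2 / 2.
n, K, a = 100, 400_000, 0.3
est = ldp_rate_estimate(lambda: rng.standard_normal(n).mean(),
                        lambda m: m >= a, 1.0 / n, K)
print(est, a**2 / 2)  # rough agreement only: prefactors vanish as n grows
```

At finite $n$ the estimate carries a bias from the subexponential prefactors that the LDP ignores; it shrinks as $n$ grows, at the cost of requiring a larger $K$ for $\Omega$ to be hit at all.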

5. This program can be carried out whenever we can identify the sequence of random elements $X_i$ for which, given the criterion $\Phi$ and the set $\Omega$, the limit statement (2) holds. Here the $X_i$'s are empirical measures of some kind, and $\Phi(\Omega)$ takes the form $\phi(\Omega, P)$, the infimum of a divergence between some reference probability measure $P$ and a class of probability measures $\Omega$. Standpoint: $\phi(\Omega, P)$ is an LDP rate for specific $X_i$'s to be built. Applications: choice of models, estimation of the minimizers (dichotomy, etc.).

6. Divergences. Let $(X, \mathcal{B})$ be a measurable Polish space and $P$ a given reference probability measure (p.m.) on $(X, \mathcal{B})$. Denote by $M_1$ the set of all p.m.'s on $(X, \mathcal{B})$. Let $\varphi$ be a proper closed convex function from $]-\infty, +\infty[$ to $[0, +\infty]$ with $\varphi(1) = 0$ and such that its domain $\operatorname{dom}\varphi := \{x \in \mathbb{R} : \varphi(x) < \infty\}$ is a finite or infinite interval. For any measure $Q$ in $M_1$, the $\phi$-divergence between $Q$ and $P$ is defined by
$$\phi(Q, P) := \int_X \varphi\!\left(\frac{dQ}{dP}(x)\right) dP(x)$$
if $Q \ll P$. When $Q$ is not absolutely continuous w.r.t. $P$, set $\phi(Q, P) = +\infty$. The $\phi$-divergences between p.m.'s were introduced in Csiszár (1963) as "$f$-divergences", with a somewhat different definition. For every p.m. $P$, the mapping $Q \in M_1 \mapsto \phi(Q, P)$ is convex and takes nonnegative values, and $\phi(P, P) = 0$. Furthermore, if $x \mapsto \varphi(x)$ is strictly convex on a neighborhood of $x = 1$, then $\phi(Q, P) = 0$ if and only if $Q = P$.

7. Cressie–Read divergences. When defined on $M_1$, classical divergences are associated with
- $\varphi_1(x) = x \log x - x + 1$ (Kullback–Leibler),
- $\varphi_0(x) = -\log x + x - 1$ ($KL_m$, likelihood),
- $\varphi_2(x) = \frac{1}{2}(x-1)^2$ (Pearson's $\chi^2$),
- $\varphi_{-1}(x) = \frac{1}{2}(x-1)^2/x$ (modified $\chi^2$, Neyman),
- $\varphi_{1/2}(x) = 2(\sqrt{x}-1)^2$ (Hellinger).
All belong to the class of Cressie and Read power divergences
$$x \in ]0, +\infty[\ \mapsto\ \varphi_\gamma(x) := \frac{x^\gamma - \gamma x + \gamma - 1}{\gamma(\gamma - 1)}. \tag{4}$$
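As a quick illustration (mine, not the slides'; `phi_gamma` and `divergence` are hypothetical helper names), the family (4) can be evaluated numerically, with the $\gamma = 1$ and $\gamma = 0$ cases filled in by their continuity limits:

```python
import numpy as np

def phi_gamma(x, gamma):
    """Cressie-Read generator (4); gamma = 1 and gamma = 0 are the continuity
    limits x*log(x) - x + 1 (KL) and -log(x) + x - 1 (likelihood)."""
    x = np.asarray(x, dtype=float)
    if gamma == 1:
        return x * np.log(x) - x + 1
    if gamma == 0:
        return -np.log(x) + x - 1
    return (x**gamma - gamma * x + gamma - 1) / (gamma * (gamma - 1))

def divergence(q, p, gamma):
    """phi_gamma-divergence between discrete p.m.'s q and p on the same atoms."""
    return float(np.sum(p * phi_gamma(q / p, gamma)))

p = np.array([0.25, 0.25, 0.25, 0.25])
q = np.array([0.10, 0.20, 0.30, 0.40])
print(divergence(q, p, 1))    # KL(Q,P)
print(divergence(q, p, 0.5))  # Hellinger-type divergence
```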

8. Extensions. The power divergences $Q \in M_1 \mapsto \phi_\gamma(Q, P)$ can be defined on the whole vector space $M$ of finite signed measures via an extension of the convex functions $\varphi_\gamma$: for all $\gamma \in \mathbb{R}$ such that $x \mapsto \varphi_\gamma(x)$ is either not defined on $]-\infty, 0[$, or defined but not convex on the whole of $\mathbb{R}$, we extend its definition by
$$x \in ]-\infty, +\infty[\ \mapsto\ \begin{cases} \varphi_\gamma(x) & \text{if } x \in [0, +\infty[, \\ +\infty & \text{if } x \in ]-\infty, 0[. \end{cases} \tag{5}$$
Note that for the $\chi^2$-divergence, for instance, $\varphi_2(x) := \frac{1}{2}(x-1)^2$ is defined and convex on the whole of $\mathbb{R}$.

9. The conjugate (or Legendre transform) of $\varphi$ will be denoted $\varphi^*$:
$$t \in \mathbb{R} \mapsto \varphi^*(t) := \sup_{x \in \mathbb{R}}\{tx - \varphi(x)\}.$$
Property: $\varphi$ is essentially smooth iff $\varphi^*$ is strictly convex; then
$$\varphi^*(t) = t\,\varphi'^{-1}(t) - \varphi\!\left(\varphi'^{-1}(t)\right) \quad \text{and} \quad \varphi^{*\prime}(t) = \varphi'^{-1}(t).$$
In the present setting this holds.
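A worked instance of this property (added here for concreteness; it is not on the slide): for $\varphi_1(x) = x\log x - x + 1$ one has $\varphi_1'(x) = \log x$, hence $\varphi_1'^{-1}(t) = e^t$ and
$$\varphi_1^*(t) = t e^t - \left(t e^t - e^t + 1\right) = e^t - 1, \qquad \varphi_1^{*\prime}(t) = e^t = \varphi_1'^{-1}(t).$$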

10. The bootstrapped empirical measure. Let $Y, Y_1, Y_2, \ldots$ denote a sequence of positive i.i.d. random variables. We assume that $Y$ satisfies the so-called Cramér condition: the set
$$N := \left\{ t \in \mathbb{R} : \Lambda_Y(t) := \log E e^{tY} < \infty \right\}$$
has non-void interior and contains a neighborhood of $0$. Consider the weights $W_i^n$, $1 \le i \le n$,
$$W_i^n := \frac{Y_i}{\sum_{j=1}^n Y_j},$$
which define a vector $(W_1^n, \ldots, W_n^n)$ of exchangeable variables for all $n \ge 1$.
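For the two weight distributions used in the sequel, the log-moment-generating functions are standard facts (added here for reference):
$$Y \sim \mathcal{E}(1):\ \Lambda_Y(t) = -\log(1-t),\ t < 1; \qquad Y \sim \operatorname{Poisson}(1):\ \Lambda_Y(t) = e^t - 1,\ t \in \mathbb{R},$$
so the Cramér condition holds in both cases.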

11. The data $x_1^n, \ldots, x_n^n$: we assume that
$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^n \delta_{x_i^n} = P \quad \text{a.s.}$$
and we define the bootstrapped empirical measure of $(x_1^n, \ldots, x_n^n)$ by
$$P_n^W := \sum_{i=1}^n W_i^n\, \delta_{x_i^n}.$$
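A minimal sketch of this construction (my names throughout; exponential weights shown as one admissible choice of $Y$):

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_weights(n, sample_y):
    """Exchangeable weights W_i^n = Y_i / sum_j Y_j from positive i.i.d. Y's."""
    y = sample_y(n)
    return y / y.sum()

# P_n^W is a random reweighting of the data points x_1..x_n:
xs = rng.normal(size=500)                      # data whose empirical law -> P
w = bootstrap_weights(len(xs), lambda n: rng.exponential(1.0, n))
mass_on_A = w[xs >= 1.0].sum()                 # P_n^W(A) for A = [1, infinity)
print(mass_on_A, (xs >= 1.0).mean())           # compare with P_n(A)
```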

12. A Sanov-type result for the weighted bootstrapped empirical measure. Define the Legendre transform of $\Lambda_Y$, say $\Lambda^*$, on $\operatorname{Im}\Lambda'_Y$ by $\Lambda^*(x) := \sup_t \{tx - \Lambda_Y(t)\}$.
Theorem. Under the above hypotheses and notation, the sequence $P_n^W$ obeys an LDP on the space of all finite signed measures on $X$, equipped with the weak convergence topology, with rate function
$$\phi(Q, P) := \begin{cases} \inf_{m>0} \int \Lambda^*\!\left(m\,\frac{dQ}{dP}(x)\right) dP(x) & \text{if } Q \ll P, \\ +\infty & \text{otherwise.} \end{cases} \tag{6}$$
This theorem is a variation on Corollary 3.3 in Trashorras and Wintenberger (2014).

13. Estimation of the minimum of the Kullback divergence. Set $Y_1, \ldots, Y_n$ i.i.d. standard exponential. Then $\Lambda^*(x) = \varphi_1(x) := x \log x - x + 1$ and
$$\inf_{m>0} \int \Lambda^*\!\left(m\,\frac{dQ}{dP}(x)\right) dP(x) = \int \Lambda^*\!\left(\frac{dQ}{dP}(x)\right) dP(x) = KL(Q, P).$$
Repeat the sampling of $(Y_1, \ldots, Y_n)$ i.i.d. $\mathcal{E}(1)$ $K$ times. Hence, for sets $\Omega$ such that $KL(\operatorname{int}\Omega, P) = KL(\operatorname{cl}\Omega, P)$ and for large $K$,
$$-\frac{1}{n}\log\left[\frac{1}{K}\operatorname{card}\left\{ j : P_n^{W(j)} \in \Omega,\ 1 \le j \le K \right\}\right]$$
(where $P_n^{W(j)}$ denotes the $j$-th bootstrapped measure) is a proxy of $-\frac{1}{n}\log \Pr(P_n^W \in \Omega)$ and therefore an estimator of $KL(\Omega, P)$.
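Following the recipe on this slide, a minimal end-to-end sketch (the helper name `estimate_divergence_min`, the toy target set $\Omega$ and all parameter values are my own; the weights are exponential as stated above, and the output is labeled per the slide's identification of the rate):

```python
import numpy as np

rng = np.random.default_rng(2)

def estimate_divergence_min(xs, in_omega, sample_y, K):
    """-(1/n) log of the frequency with which P_n^W falls in Omega."""
    n = len(xs)
    hits = 0
    for _ in range(K):
        y = sample_y(n)
        w = y / y.sum()                      # W_i^n = Y_i / sum_j Y_j
        if in_omega(w, xs):
            hits += 1
    return -np.log(hits / K) / n if hits else np.inf

# Toy problem: P = uniform on {0,...,4}; Omega = {Q : E_Q[X] >= 2.5}.
# The mean under P is 2.0, so Omega excludes P and the infimum is positive.
xs = rng.choice(np.arange(5.0), size=60)     # data drawn from P
in_omega = lambda w, xs: w @ xs >= 2.5
kl_hat = estimate_divergence_min(xs, in_omega,
                                 lambda n: rng.exponential(1.0, n), K=100_000)
print("estimated KL(Omega, P):", kl_hat)
```

The sample size $n$ trades bias against cost: larger $n$ sharpens the exponential asymptotics but makes the event $\{P_n^W \in \Omega\}$ rarer, so $K$ must grow accordingly.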

14. When $Y$ is $\mathcal{E}(1)$, then by Pyke's theorem $(W_1^n, \ldots, W_n^n)$ coincides in distribution with the vector of spacings of the order statistics of $n-1$ i.i.d. random variables uniformly distributed on $(0,1)$: this is the simplest bootstrap version of $P_n$ based on exchangeable weights. It also holds with these weights that
$$\lim_{n\to\infty}\left[\frac{1}{n}\log \Pr\left(P_n^W \in \Omega \,\middle|\, x_1^n, \ldots, x_n^n\right) - \frac{1}{n}\log \Pr(P_n \in \Omega)\right] = 0.$$
This weighted bootstrap is the only LDP-efficient one.
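A quick numerical check of this distributional identity (a sketch of mine; it compares the law of the largest weight under the two constructions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 10, 50_000

# Normalized exponentials: W_i = Y_i / sum_j Y_j with Y ~ E(1).
y = rng.exponential(1.0, (reps, n))
w_exp = y / y.sum(axis=1, keepdims=True)

# Spacings of the order statistics of n-1 i.i.d. uniforms on (0,1).
u = np.sort(rng.uniform(size=(reps, n - 1)), axis=1)
edges = np.concatenate([np.zeros((reps, 1)), u, np.ones((reps, 1))], axis=1)
w_spacings = np.diff(edges, axis=1)          # n spacings per replication

# The two laws agree, e.g. in the quantiles of the maximal weight.
print(np.quantile(w_exp.max(axis=1), [0.25, 0.5, 0.75]))
print(np.quantile(w_spacings.max(axis=1), [0.25, 0.5, 0.75]))
```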

15. Estimation of the minimum of the likelihood divergence. Recall
$$KL_m(Q, P) := \int \varphi_0\!\left(\frac{dQ}{dP}\right) dP = -\int \log\left(\frac{dQ}{dP}\right) dP, \qquad \varphi_0(x) := -\log x + x - 1.$$
Set $Y_1, \ldots, Y_n$ i.i.d. Poisson(1); then $\Lambda^*(x) = \varphi_0(x) := -\log x + x - 1$ and
$$\inf_{m>0} \int \Lambda^*\!\left(m\,\frac{dQ}{dP}(x)\right) dP(x) = \int \Lambda^*\!\left(\frac{dQ}{dP}(x)\right) dP(x) = KL_m(Q, P).$$
Repeat the sampling of $(Y_1, \ldots, Y_n)$ i.i.d. Poisson(1) $K$ times. For large $K$,
$$-\frac{1}{n}\log\left[\frac{1}{K}\operatorname{card}\left\{ j : P_n^{W(j)} \in \Omega,\ 1 \le j \le K \right\}\right]$$
is an estimator of $KL_m(\Omega, P)$, being a proxy of $-\frac{1}{n}\log \Pr(P_n^W \in \Omega)$.
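With the helper from the sketch after slide 13, only the weight law changes (again my own illustration, reusing `estimate_divergence_min`, `xs`, `in_omega` and `rng` from that block; the output label follows the slide's identification of the rate):

```python
# Poisson(1) weights in place of exponential ones; same data and same Omega.
# (A zero weight sum is possible in principle but astronomically unlikely here.)
klm_hat = estimate_divergence_min(xs, in_omega,
                                  lambda n: rng.poisson(1.0, n).astype(float),
                                  K=100_000)
print("estimated KL_m(Omega, P):", klm_hat)
```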
