

  1. A Kullback-Leibler Divergence for Bayesian Model Comparison with Applications to Diabetes Studies. Chen-Pin Wang, UTHSCSA; Malay Ghosh, U. Florida. Lehmann Symposium, May 9, 2011.

  2. Background
  • KLD: the expected (with respect to the reference model) logarithm of the ratio of the probability density functions (p.d.f.'s) of two models:
    $\int r(t_n \mid \theta) \log\{ r(t_n \mid \theta) / f(t_n \mid \theta) \} \, dt_n.$
  • KLD: a measure of the discrepancy of information about $\theta$ contained in the data revealed by two competing models (K-L; Lindley; Bernardo; Akaike; Schwarz; Goutis and Robert).
  • Challenges in the Bayesian framework:
    – identify priors that are compatible under the competing models;
    – ensure that the resulting integrated likelihoods are proper.
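As a concrete illustration of the divergence above (my own sketch, not from the slides: the normal example and all function names are illustrative), the KLD between two normal models has a closed form, which can be checked against direct numerical integration of $\int r \log(r/f)$:

```python
import math

def kl_normal(mu_r, var_r, mu_f, var_f):
    """Closed-form KL( N(mu_r, var_r) || N(mu_f, var_f) )."""
    return 0.5 * (math.log(var_f / var_r)
                  + (var_r + (mu_r - mu_f) ** 2) / var_f - 1.0)

def kl_numeric(mu_r, var_r, mu_f, var_f, lo=-20.0, hi=20.0, steps=200000):
    """Midpoint-rule approximation of the integral of r*log(r/f)."""
    def pdf(x, mu, var):
        return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        r, f = pdf(x, mu_r, var_r), pdf(x, mu_f, var_f)
        total += r * math.log(r / f) * h
    return total

print(kl_normal(0.0, 1.0, 0.5, 2.0))
print(kl_numeric(0.0, 1.0, 0.5, 2.0))
```

The two printed values agree to several decimal places; the closed form is what the later examples evaluate analytically.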

  3. G-R KLD
  • Remedy: the Kullback-Leibler projection of Goutis and Robert (1998), or G-R KLD: the infimum of the KLD between the likelihood under the reference model and all possible likelihoods arising from the competing model.
  • G-R KLD is the KLD between the reference model and the competing model evaluated at its MLE when the reference model is correctly specified (cf. Akaike 1974).
  • G-R KLD overcomes the challenges associated with prior elicitation in calculating KLD under the Bayesian framework.

  4. G-R KLD
  • The Bayesian estimate of G-R KLD: integrate the G-R KLD with respect to the posterior distribution of the model parameters under the reference model.
    – The Bayesian estimate of G-R KLD is not subject to impropriety of the prior as long as the posterior under the reference model is proper.
    – G-R KLD is suitable for comparing the predictivity of the competing models.
    – G-R KLD was originally developed for comparing nested GLMs with a known true model; its extension to general model comparison remains limited.

  5. Proposed KLD
  $\mathrm{KLD}_t(r, f \mid \theta) = \int r(t_n \mid \theta) \log\{ r(t_n \mid \theta) / f(t_n \mid \hat\theta_f) \} \, dt_n.$ (1)
  Bayes estimate of (1):
  $\int \left[ \int r(t_n \mid \theta) \log\{ r(t_n \mid \theta) / f(t_n \mid \hat\theta_f) \} \, dt_n \right] \pi(\theta \mid U_n) \, d\theta.$ (2)
  Objective: to study the properties of the KLD estimate given in (2).

  6. Notation
  • $X_i$'s are i.i.d., originating from model $g$ governed by $\theta \in \Theta$.
  • $T_n = T(X_1, \cdots, X_n)$: the statistic used for model diagnostics.
  • Two competing models: $r$ for the reference model and $f$ for the fitted model.
  • Assume that the prior $\pi_r(\theta)$ leads to a proper posterior under $r$.

  7. Our proposed KLD
  • $\mathrm{KLD}_t(r, f \mid \theta)$ quantifies the relative fit of models $r$ and $f$ to the statistic $T_n$.
  • $\mathrm{KLD}_t(r, f \mid \theta)$ is identical to the G-R KLD when the reference model $r$ is the correct model.
  • $\mathrm{KLD}_t(r, f \mid \theta)$ is not necessarily the same as the G-R KLD.
  • $\mathrm{KLD}_t(r, f \mid \theta)$ needs no additional adjustment in non-nested situations.
  • $\mathrm{KLD}_t(r, f \mid \theta)$ is more practical than the G-R KLD.

  8. Regularity Conditions I
  (A1) For each $x$, both $\log r(x \mid \theta)$ and $\log f(x \mid \theta)$ are three times continuously differentiable in $\theta$. Further, there exist neighborhoods $N_r(\delta_r) = (\theta - \delta_r, \theta + \delta_r)$ and $N_f(\delta_f) = (\theta - \delta_f, \theta + \delta_f)$ of $\theta$ and integrable functions $H_{\theta,\delta_r}(x)$ and $H_{\theta,\delta_f}(x)$ such that
  $\sup_{\theta' \in N_r(\delta_r)} \left| \frac{\partial^k}{\partial\theta^k} \log r(x \mid \theta) \right|_{\theta = \theta'} \le H_{\theta,\delta_r}(x)$ and
  $\sup_{\theta' \in N_f(\delta_f)} \left| \frac{\partial^k}{\partial\theta^k} \log f(x \mid \theta) \right|_{\theta = \theta'} \le H_{\theta,\delta_f}(x)$
  for $k = 1, 2, 3$.
  (A2) For all sufficiently large $\lambda > 0$,
  $E_r\!\left[ \sup_{|\theta' - \theta| > \lambda} \log \frac{r(x \mid \theta')}{r(x \mid \theta)} \right] < 0$ and $E_f\!\left[ \sup_{|\theta' - \theta| > \lambda} \log \frac{f(x \mid \theta')}{f(x \mid \theta)} \right] < 0.$

  9. Regularity Conditions II
  (A3) $E_r\!\left[ \sup_{\theta' \in (\theta-\delta,\, \theta+\delta)} \log r(x \mid \theta') \right] \to E_r[\log r(x \mid \theta)]$ as $\delta \to 0$; $E_f\!\left[ \sup_{\theta' \in (\theta-\delta,\, \theta+\delta)} \log f(x \mid \theta') \right] \to E_f[\log f(x \mid \theta)]$ as $\delta \to 0$.
  (A4) The prior density $\pi(\theta)$ is continuously differentiable in a neighborhood of $\theta$, and $\pi(\theta) > 0$.
  (A5) $T_n$ is asymptotically normally distributed under both models, such that
  $r(T_n \mid \theta) = \sigma_r^{-1}(\theta)\, \phi(\sqrt{n}\{T_n - \mu_r(\theta)\}/\sigma_r(\theta)) + O(n^{-1/2});$
  $f(T_n \mid \theta) = \sigma_f^{-1}(\theta)\, \phi(\sqrt{n}\{T_n - \mu_f(\theta)\}/\sigma_f(\theta)) + O(n^{-1/2}).$

  10. Theorem 1. Assume regularity conditions (A1)-(A5). Then
  $2\,\mathrm{KLD}_t(r, f \mid U_n) - n\{\hat\mu_f(U_n) - \hat\mu_r(U_n)\}^2 / \hat\sigma_f^2(U_n) = o_p(1)$ (3)
  when $\mu_f(\theta) \ne \mu_r(\theta)$, and
  $2\,\mathrm{KLD}_t(r, f \mid U_n) - Q\!\left( \hat\sigma_r^2(U_n) / \hat\sigma_f^2(U_n) \right) = o_p(1)$ (4)
  when $\mu_r(\theta) = \mu_f(\theta)$ but $\sigma_r^2(\theta) \ne \sigma_f^2(\theta)$, where $Q(y) = y - \log(y) - 1$.
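As a sanity check on the equal-mean case (4) (a sketch I am adding, not from the slides): when both models give $T_n$ a normal law with the same mean, twice the KL divergence between the two laws reduces exactly to $Q(\sigma_r^2/\sigma_f^2)$, since the mean terms cancel in the normal KL formula:

```python
import math

def Q(y):
    """Q(y) = y - log(y) - 1, the limit appearing in (4) and (7)."""
    return y - math.log(y) - 1.0

def two_kl_equal_mean_normals(var_r, var_f):
    """2 * KL( N(mu, var_r) || N(mu, var_f) ): the mean mu cancels."""
    return var_r / var_f - math.log(var_r / var_f) - 1.0

# The identity 2*KL = Q(var_r/var_f) holds for any variance ratio:
for vr, vf in [(2.0, 1.0), (0.5, 3.0), (1.0, 1.0)]:
    print(vr / vf, two_kl_equal_mean_normals(vr, vf), Q(vr / vf))
```

Note that $Q(y) \ge 0$ with equality only at $y = 1$, i.e. when the two variances agree.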

  11. Remarks on Theorem 1
  • $\mathrm{KLD}_t(r, f \mid \theta)$ is also a divergence of model parameter estimates.
  • Model comparison in real applications may rely on the fit to a multi-dimensional statistic. The results in Theorem 1 are applicable to the multivariate case with a fixed dimension.
  • $\mathrm{KLD}_t(r, f \mid \theta)$ can be viewed as the discrepancy between $r$ and $f$ in terms of their posterior predictivity of $T_n$.
  • We study how $\mathrm{KLD}_t(r, f \mid \theta)$ is connected to a weighted posterior predictive p-value, a typical Bayesian technique for assessing model discrepancy (see Rubin 1984; Gelman et al. 1996).

  12. Weighted Posterior Predictive P-value
  $\mathrm{WPPP}_r(U_n) \equiv \int \left[ \int \left( \int_{-\infty}^{t_n} f^*(y_n \mid \hat\theta_f) \, dy_n \right) r^*(t_n \mid \theta) \, dt_n \right] \pi_r(\theta \mid U_n) \, d\theta,$ (5)
  where $r^*$ and $f^*$ are the predictive density functions of $T_n$ under $r$ and $f$, respectively.
  • WPPP is equivalent to the weighted posterior predictive p-value of $T_n$ under $f$ with respect to the posterior predictive distribution of $T_n$ under $r$.
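The inner double integral in (5) is the probability that a draw $T^*$ from the fitted predictive falls below an independent draw $T$ from the reference predictive. With normal predictives this is $\Phi\big((\mu_r - \mu_f)/\sqrt{\sigma_f^2 + \sigma_r^2}\big)$, which a Monte Carlo sketch (my own illustration; all names and parameter values are arbitrary) confirms:

```python
import math
import random

def phi_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def wppp_normal(mu_r, var_r, mu_f, var_f):
    """Closed form of P(T* < T), T* ~ N(mu_f, var_f) (fitted), T ~ N(mu_r, var_r) (reference)."""
    return phi_cdf((mu_r - mu_f) / math.sqrt(var_f + var_r))

def wppp_mc(mu_r, var_r, mu_f, var_f, n_sims=200000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        t_star = rng.gauss(mu_f, math.sqrt(var_f))
        t = rng.gauss(mu_r, math.sqrt(var_r))
        hits += t_star < t
    return hits / n_sims

print(wppp_normal(0.3, 1.0, 0.0, 1.0), wppp_mc(0.3, 1.0, 0.0, 1.0))
```

When the two predictives coincide, the probability is exactly 0.5, matching the "uninformative" behavior in (8).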

  13. Theorem 2. Let $Q(y) = y - \log(y) - 1$. Then
  $2\,\mathrm{KLD}_t(r, f \mid U_n) - \{\Phi^{-1}(\mathrm{WPPP}_r(U_n))\}^2 = \frac{n\,\{\hat\mu_r(U_n) - \hat\mu_f(U_n)\}^2\, \hat\sigma_r^2(U_n)}{\hat\sigma_f^2(U_n)\{\hat\sigma_f^2(U_n) + \hat\sigma_r^2(U_n)\}} + o_p(1)$ (6)
  when $\mu_f(\theta) \ne \mu_r(\theta)$. Furthermore,
  $2\,\mathrm{KLD}_t(r, f \mid U_n) - Q\!\left( \hat\sigma_r^2(U_n) / \hat\sigma_f^2(U_n) \right) = o_p(1)$ (7)
  and
  $\mathrm{WPPP}_r(U_n) - 0.5 = o_p(1)$ (8)
  when $\mu_r(\theta) = \mu_f(\theta)$ but $\sigma_r^2(\theta) \ne \sigma_f^2(\theta)$.
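In the normal-location case with equal variances, the remainder in (6) vanishes and the relation can be checked exactly: with $T_n \sim N(\mu, \sigma^2/n)$ under each model, $2\,\mathrm{KLD}_t = n(\mu_r - \mu_f)^2/\sigma^2$ and $\Phi^{-1}(\mathrm{WPPP}) = \sqrt{n}(\mu_r - \mu_f)/\sqrt{2\sigma^2}$. A sketch of this check (my own construction; the bisection inverse of $\Phi$ and all names are illustrative):

```python
import math

def phi(z):
    """Standard normal CDF via erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi_inv(p):
    """Invert phi by bisection on [-12, 12]."""
    lo, hi = -12.0, 12.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def both_sides_of_6(n, mu_r, mu_f, var):
    """LHS and RHS of (6) for T_n ~ N(mu, var/n) under each model (var_r = var_f = var)."""
    d2 = (mu_r - mu_f) ** 2
    two_kld = n * d2 / var                          # 2*KL of the two normal laws of T_n
    wppp = phi(math.sqrt(n) * (mu_r - mu_f) / math.sqrt(2.0 * var))
    lhs = two_kld - phi_inv(wppp) ** 2
    rhs = n * d2 * var / (var * (var + var))        # RHS of (6) with var_r = var_f = var
    return lhs, rhs

lhs, rhs = both_sides_of_6(n=100, mu_r=0.3, mu_f=0.0, var=1.0)
print(lhs, rhs)
```

With unequal variances the two sides differ by the $O(1)$ term absorbed in the remainder, so only the leading $O(n)$ behavior matches.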

  14. Remarks on Theorem 2
  • It gives the asymptotic relationship between $\mathrm{KLD}_t(r, f \mid U_n)$ and WPPP.
  • Suppose that $\mu_f(\theta) \ne \mu_r(\theta)$:
    – Both $\mathrm{KLD}_t(r, f \mid U_n)$ and $\{\Phi^{-1}(\mathrm{WPPP}_r(U_n))\}^2$ are of order $O_p(n)$.
    – $2\,\mathrm{KLD}_t(r, f \mid U_n)$ is greater than $\{\Phi^{-1}(\mathrm{WPPP}_r(U_n))\}^2$ by an $O_p(n)$ term that assumes positive values with probability 1.
  • When $\mu_r(\theta) = \mu_f(\theta)$ (i.e., both $r$ and $f$ assume the same mean of $T_n$) but $\sigma_f^2(\theta) \ne \sigma_r^2(\theta)$:
    – $\Phi^{-1}(\mathrm{WPPP}_r(U_n))$ converges to 0; $\mathrm{WPPP}_r(U_n)$ converges to 0.5.
    – $\mathrm{KLD}_t(r, f \mid U_n)$ converges to a positive quantity of order $O_p(1)$.

  15. Example 1. $X_i \stackrel{\text{i.i.d.}}{\sim} g_\theta(x_i) = \phi((x_i - \theta_1)/\sqrt{\theta_2})/\sqrt{\theta_2}$, where $\theta_2 > 0$. Let $T_n = \sqrt{n}\,[(\sum_i X_i)/n - \theta_1]/\sqrt{\kappa}$. Let $r = g$ and $f_\theta(x_i) = \phi((x_i - \theta_1)/\sqrt{\kappa})/\sqrt{\kappa}$. Then
  • $\mu_r(\theta) = E_r(T_n) = \mu_f(\theta) = E_f(T_n) = \theta_1$, $\sigma_r^2(\theta) = \theta_2$, $\sigma_f^2(\theta) = \kappa$, and
  $2 \lim_{n\to\infty} \mathrm{KLD}_t(r, f \mid u_n) = -\log\!\left( \frac{\hat\theta_2(u_n)}{\kappa} \right) + \frac{\hat\theta_2(u_n)}{\kappa} - 1$, which is $> 0$ if $\kappa \ne \theta_2$ and $= 0$ if $\kappa = \theta_2$.
  • $T_n$ is the MLE for $\theta_1$ under both $r$ and $f$.
  • $\lim_{n\to\infty} \mathrm{WPPP}(U_n) = 0.5$.
  • $\mathrm{WPPP}(U_n)$ is asymptotically equivalent to the KLD approaches.
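A quick simulation of this limit (my own sketch; the sample size and parameter values are illustrative): estimate $\hat\theta_2$ from normal data and evaluate $Q(\hat\theta_2/\kappa)$, which is near 0 when the fitted variance $\kappa$ is correct and positive otherwise:

```python
import math
import random

def Q(y):
    """Q(y) = y - log(y) - 1."""
    return y - math.log(y) - 1.0

def example1_limit(xs, kappa):
    """2*lim KLD_t = Q(theta2_hat / kappa), with theta2_hat the MLE of the variance."""
    n = len(xs)
    m = sum(xs) / n
    theta2_hat = sum((x - m) ** 2 for x in xs) / n
    return Q(theta2_hat / kappa)

rng = random.Random(0)
xs = [rng.gauss(0.0, math.sqrt(2.0)) for _ in range(50000)]  # true theta2 = 2
print(example1_limit(xs, kappa=2.0))  # near 0: fitted variance is correct
print(example1_limit(xs, kappa=1.0))  # positive: variance misspecified
```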

  16. Example 2. Assume $X_i \stackrel{\text{i.i.d.}}{\sim} g_\theta(x_i) = \exp\{-\theta/(1-\theta)\}\{\theta/(1-\theta)\}^{x_i}/x_i!$, where $0 < \theta < 1$. Let $T_n = \bar X_n/(1 + \bar X_n)$, $r = g$, and $f_\theta(x_i) = \theta^{x_i}(1-\theta)$. Then
  • $\mu_r(\theta) = \mu_f(\theta) = \theta$, $\sigma_r^2(\theta) = \theta(1-\theta)^3$, and $\sigma_f^2(\theta) = \theta(1-\theta)^2$.
  • $\theta = E(X_i)/(1 + E(X_i))$.
  • $T_n$ is the MLE for $\theta$ under both $r$ and $f$.
  • $2 \lim_{n\to\infty} \mathrm{KLD}_t(r, f \mid u_n) = -\log(1 - \hat\theta(u_n)) + (1 - \hat\theta(u_n)) - 1 > 0$ for $0 < \theta < 1$.
  • $\lim_{n\to\infty} \mathrm{WPPP}(U_n) = 0.5$.
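Here $r$ is Poisson with mean $\theta/(1-\theta)$ and $f$ is geometric, and the limit is $Q(1-\hat\theta)$ since $\sigma_r^2/\sigma_f^2 = 1-\theta$. A simulation sketch (my own; the inversion Poisson sampler and all names are illustrative):

```python
import math
import random

def example2_limit(xs):
    """theta_hat = Xbar/(1+Xbar); returns -log(1-theta_hat) + (1-theta_hat) - 1."""
    xbar = sum(xs) / len(xs)
    theta_hat = xbar / (1.0 + xbar)
    return -math.log(1.0 - theta_hat) + (1.0 - theta_hat) - 1.0

def poisson(lam, rng):
    """Poisson(lam) draw by CDF inversion (fine for small lam)."""
    u, p, k = rng.random(), math.exp(-lam), 0
    c = p
    while u > c:
        k += 1
        p *= lam / k
        c += p
    return k

rng = random.Random(0)
theta = 0.5                                   # so lambda = theta/(1-theta) = 1
xs = [poisson(theta / (1 - theta), rng) for _ in range(20000)]
print(example2_limit(xs))                     # near -log(0.5) + 0.5 - 1
```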

  17. Example 3. Assume $X_i \stackrel{\text{i.i.d.}}{\sim} g_\theta(x_i) = \frac{\Gamma((\theta_2+1)/2)}{\sqrt{\pi\theta_2}\,\Gamma(\theta_2/2)} \left(1 + (x_i - \theta_1)^2/\theta_2\right)^{-(1+\theta_2)/2}$, where $\theta_2 > 2$. Let $T_n = \bar X_n$. Let $r = g$ and $f_\theta(x_i) = \phi(x_i - \theta_1)$. Then
  • $\mu_f(\theta) = \mu_r(\theta) = \theta_1$, $\sigma_r^2(\theta) = \theta_2/(\theta_2 - 2)$, and $\sigma_f^2(\theta) = 1$.
  • $2 \lim_{n\to\infty} \mathrm{KLD}_t(r, f \mid u_n) = -\log\!\left( \frac{\hat\theta_2(u_n)}{\hat\theta_2(u_n) - 2} \right) + \frac{\hat\theta_2(u_n)}{\hat\theta_2(u_n) - 2} - 1 \ge 0$ for all $\theta_2$, with equality if and only if $\theta_2 = \infty$.
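Here $r$ is a $t$-distribution with $\theta_2$ degrees of freedom and $f$ is standard normal, so the limit is $Q(\theta_2/(\theta_2-2))$, which shrinks to 0 as the degrees of freedom grow. A tiny numeric sketch (my own; the grid of degrees of freedom is arbitrary):

```python
import math

def example3_limit(theta2):
    """2*lim KLD_t = Q(theta2/(theta2 - 2)) for a t(theta2) reference vs N(theta1, 1) fit."""
    y = theta2 / (theta2 - 2.0)
    return y - math.log(y) - 1.0

# The divergence decreases toward 0 as the t reference approaches normality:
for df in (3, 10, 100, 10000):
    print(df, example3_limit(df))
```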

  18. Example 4. Assume $X_i \stackrel{\text{i.i.d.}}{\sim} g_\theta(x_i) = \exp(-x_i/\theta)/\theta$. Let $r = g$, $f_\theta(x_i) = \exp(-x_i)$, and $T_n = \min\{X_1, \cdots, X_n\}$. Then
  • $r_\theta(t_n) = n\exp(-nt_n/\theta)/\theta$ and $f_\theta(t_n) = n\exp(-nt_n)$.
  • $\mathrm{WPPP}_f(\bar x_n) = E_f(\Pr(T^*_n < T_n) \mid \bar x_n) \to \bar x_n/(\bar x_n + 1)$.
  • $\mathrm{KLD}_t(r, f \mid \bar x_n) \to -\log(\bar x_n) + \bar x_n - 1$.
  • The asymptotic equivalence between $\mathrm{KLD}_t(r, f \mid u_n)$ and $\mathrm{WPPP}_f(u_n)$ does not hold in the sense of Theorem 2, due to the violation of the asymptotic normality assumption.
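The limit $\bar x_n/(\bar x_n + 1)$ follows because, with $\theta$ concentrated at $\bar x_n$, $T^*_n$ and $T_n$ are independent exponentials with rates $n$ and $n/\bar x_n$, so $\Pr(T^*_n < T_n) = n/(n + n/\bar x_n)$. A Monte Carlo sketch (my own illustration; names and values are arbitrary):

```python
import random

def wppp_min_exponential(xbar, n, n_sims=200000, seed=0):
    """Estimate P(T* < T): T* ~ density n*exp(-n t) under f,
    T ~ density (n/xbar)*exp(-n t/xbar) under r with theta = xbar."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        t_star = rng.expovariate(n)           # rate n
        t = rng.expovariate(n / xbar)         # rate n/xbar
        hits += t_star < t
    return hits / n_sims

xbar, n = 2.0, 100
print(wppp_min_exponential(xbar, n))          # Monte Carlo estimate
print(xbar / (xbar + 1.0))                    # the limit xbar/(xbar + 1)
```

Note the answer does not depend on $n$: the competing-rates probability $\bar x_n/(\bar x_n + 1)$ is exact for every $n$, which is why no normal-approximation argument is available here.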
