Bayesian estimation of the discrepancy with misspecified parametric models


  1. Bayesian estimation of the discrepancy with misspecified parametric models
     Pierpaolo De Blasi, University of Torino & Collegio Carlo Alberto
     Bayesian Nonparametrics workshop, ICERM, 17-21 September 2012
     Joint work with S. Walker

  2. Outline
     • Semiparametric density estimation
     • Asymptotics and illustration
     • References

  3. BNP density estimation
     • Let X_1, …, X_n be exchangeable (i.e. conditionally iid) observations from an unknown density f on the real line.
     • If F is the density space and Π(df) the prior, Bayes' theorem gives
         Π(A | X_1, …, X_n) = ∫_A ∏_{i=1}^n f(X_i) Π(df) / ∫_F ∏_{i=1}^n f(X_i) Π(df).
     • Wealth of Bayesian nonparametric (BNP) models:
         – Dirichlet process mixtures of continuous densities;
         – log-spline models;
         – Bernstein polynomials;
         – log Gaussian processes.
     • All with well-studied asymptotic properties, e.g. posterior concentration rates
         Π(f : d(f, f_0) > M ε_n | X_1, …, X_n) → 0 as n → ∞,
       when X_1, X_2, … are iid from some "true" f_0.

  4. Discrepancy from a parametric model
     • Suppose now we have a favorite parametric family f_θ(x), θ ∈ Θ ⊂ R^p, likely to be misspecified: there is no θ such that f_0 = f_θ.
     • We want to learn about the best parameter value θ_0, which minimizes the Kullback-Leibler divergence from the true f_0:
         θ_0 = arg min_{θ ∈ Θ} ∫ f_0 log(f_0 / f_θ).
     • A nonparametric component W is introduced to model the discrepancy between f_0 and the closest density f_{θ_0}:
         f_{θ,W}(x) ∝ f_θ(x) W(x),
       so that C(x) := W(x) / ∫ W(s) f_θ(s) ds is designed to estimate C_0(x) = f_0(x) / f_{θ_0}(x).
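When f_0 is known, the KL-minimizing θ_0 can be computed by direct numerical optimization. A minimal sketch: the Gamma(3, 1) "truth" and the N(θ, 1) family below are illustrative assumptions, not examples from the talk.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

# Toy setup: the "true" f0 is a Gamma(3, 1) density, while the
# (misspecified) parametric family is N(theta, 1).
grid = np.linspace(1e-3, 20.0, 4000)
dx = grid[1] - grid[0]
f0 = stats.gamma.pdf(grid, a=3.0)

def kl_to_f0(theta):
    """KL(f0 || f_theta) by quadrature; theta-free terms do not move the argmin."""
    f_theta = stats.norm.pdf(grid, loc=theta, scale=1.0)
    return np.sum(f0 * np.log(f0 / f_theta)) * dx

# For a normal family with fixed scale, theta_0 is the mean of f0, here 3.
theta0 = minimize_scalar(kl_to_f0, bounds=(0.0, 10.0), method="bounded").x
```

Even though no θ makes f_θ equal to f_0, the minimizer θ_0 is a well-defined target, which is exactly what the parametric posterior update later concentrates on.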

  5. Related works - Frequentist: Hjort and Glad (1995)
     • Start with a parametric density estimate f_θ̂(x), θ̂ being, e.g., the MLE of θ maximizing the log-likelihood ∑_{i=1}^n log f_θ(x_i).
     • Then multiply it by a nonparametric kernel-type estimate of the correction function r(x) = f_0(x) / f_θ̂(x):
         f̂(x) = f_θ̂(x) r̂(x) = f_θ̂(x) (1/n) ∑_{i=1}^n K_h(x_i − x) / f_θ̂(x_i)
       in a two-stage sequential analysis.
     • f̂ is shown to be more precise than the traditional kernel density estimator in a broad neighborhood around the parametric family, while losing little when f_0 is far from the parametric family.
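The two-stage estimator fits in a few lines of code; the Gamma/normal pairing, sample size and bandwidth h = 0.5 below are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.gamma(shape=3.0, scale=1.0, size=500)   # sample from a skewed "truth"

# Stage 1: a (misspecified) parametric start -- here a normal fit by MLE.
mu_hat, sd_hat = x.mean(), x.std()

def f_par(t):
    return stats.norm.pdf(t, loc=mu_hat, scale=sd_hat)

# Stage 2: multiply by a kernel estimate of the correction r(x) = f0(x)/f_par(x).
def hjort_glad(t, h=0.5):
    """f_hat(t) = f_par(t) * (1/n) sum_i K_h(x_i - t) / f_par(x_i)."""
    t = np.atleast_1d(t)
    kernels = stats.norm.pdf((x[None, :] - t[:, None]) / h) / h
    return f_par(t) * np.mean(kernels / f_par(x)[None, :], axis=1)

f_hat = hjort_glad(np.array([1.0, 3.0, 6.0]))   # evaluate at a few points
```

If the parametric start were exactly right, the kernel factor would estimate the constant function 1, so the procedure "loses little" relative to a plain kernel estimator.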

  6. Related works - Bayes
     Nonparametric prior built around a parametric model via
         f(x) = f_θ(x) g(F_θ(x)),
     where F_θ is the cdf of f_θ and g is a density on [0, 1] with prior Π.
     • Verdinelli and Wasserman (1999): Π an infinite exponential family; application to goodness-of-fit testing.
     • Rousseau (2008): Π a mixture of betas; application to goodness-of-fit testing.
     • Tokdar (2007): Π a log Gaussian process prior; application to posterior inference for densities with unbounded support.
       For g(x) = e^{Z(x)} / ∫_0^1 e^{Z(s)} ds, with Z a Gaussian process with covariance σ(·, ·), f(x) can be written
         f(x) ∝ f_θ(x) e^{Z̃(x)},
       where e^{Z̃(x)} plays the role of W(x) and Z̃ is a Gaussian process with covariance σ(F_θ(·), F_θ(·)).
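The warped-covariance construction can be illustrated by drawing one path of Z̃ on a grid. The squared-exponential kernel, its length-scale, and the choice f_θ = N(0, 1) are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Covariance sigma on [0,1] (squared-exponential kernel; an assumption here),
# composed with F_theta to give Z~ with covariance sigma(F_theta(.), F_theta(.)).
def sigma(s, t, ell=0.2):
    return np.exp(-0.5 * ((s - t) / ell) ** 2)

x = np.linspace(-3.0, 3.0, 200)
dx = x[1] - x[0]
u = stats.norm.cdf(x)                     # u = F_theta(x) for f_theta = N(0, 1)
K = sigma(u[:, None], u[None, :]) + 1e-8 * np.eye(x.size)   # jitter for stability

rng = np.random.default_rng(1)
z_tilde = rng.multivariate_normal(np.zeros(x.size), K)      # one draw of Z~

f = stats.norm.pdf(x) * np.exp(z_tilde)   # f(x) proportional to f_theta(x) e^{Z~(x)}
f /= f.sum() * dx                         # normalize on the grid
```

Each prior draw is a density that perturbs f_θ multiplicatively, which is exactly the f_θ(x)W(x) form the talk builds on.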

  7. Posterior updating
         f_{θ,W}(x) ∝ f_θ(x) W(x),  C(x) := W(x) / ∫ W(s) f_θ(s) ds.
     • Truly semi-parametric: the aim is first at learning about the best parameter θ_0, then at seeing how close f_{θ_0} is to f_0 via C(x) = W(x) / ∫ W(s) f_θ(s) ds.
     • This is a situation in which the updating process from prior to posterior may be seen as problematic: the model f_{θ,W} is intrinsically non-identified in (θ, C).
     • The full Bayesian update π̃(θ, W | x_1, …, x_n) ∝ π(θ) π(W) ∏_{i=1}^n f_{θ,W}(x_i) is appropriate for learning about f_0; it is not so for learning about (θ_0, C_0).
     • The marginal posterior π̃(θ | x_1, …, x_n) = ∫ π̃(θ, W | x_1, …, x_n) dW has no interpretation: it is not identified what parameter value this π̃ is targeting.

  8. Posterior updating
     • What removes us from the formal Bayes set-up is the desire to specifically learn about θ_0.
     • θ_0 is defined without any reference to W or C. Whether we are interested in learning about C_0 or not, our beliefs about θ_0 should not change.
     • Hence, the appropriate update for θ is the parametric one: π(θ | x_1, …, x_n) ∝ π(θ) ∏_{i=1}^n f_θ(x_i).
     • We keep updating W according to the semi-parametric model, π̃(W | θ, x_1, …, x_n) ∝ π(W) ∏_{i=1}^n f_{θ,W}(x_i), so our updating scheme is the non-full Bayesian update
         π(θ, W | x_1, …, x_n) = π̃(W | θ, x_1, …, x_n) π(θ | x_1, …, x_n).

  9. Posterior updating
         π(θ, W | x_1, …, x_n) = π̃(W | θ, x_1, …, x_n) π(θ | x_1, …, x_n).
     • (θ, W) are estimated sequentially, with W reflecting additional uncertainty on θ.
     • The marginal posterior of W is well defined,
         π(W | x_1, …, x_n) = ∫_Θ π̃(W | θ, x_1, …, x_n) π(dθ | x_1, …, x_n),
       since π(θ | x_1, …, x_n) describes the beliefs about the real parameter θ_0.
     • Coherence is about properly defining the quantities of interest and showing that the Bayesian updates provide learning about these quantities; this is checked by what is yielded asymptotically.
     • Hence we seek frequentist validation: we show that the posterior of (θ, C) converges to a point mass at (θ_0, C_0).

  10. Lenk (2003)
     • Let I be a compact interval on the real line and Z a Gaussian process. Lenk (2003) considers the semi-parametric density model
         f(x) = f_θ(x) e^{Z(x)} / ∫_I f_θ(s) e^{Z(s)} ds
       for f_θ(x) a member of the exponential family.
     • In the Karhunen-Loève expansion of Z(x), the orthogonal basis is chosen so that the sample paths integrate to zero.
     • Further assumption for identification: the orthogonal basis does not contain any of the canonical statistics of f_θ(x).
     • Estimation is based on truncation of the series expansion or on imputation of the Gaussian process at a fixed grid of points; see Tokdar (2007).

  11. Bounded W(x)
     • Building upon Lenk (2003), we keep working with Gaussian processes and consider
         f_{θ,W}(x) = f_θ(x) W(x) / ∫_I f_θ(s) W(s) ds,  W(x) = Ψ(Z(x)),
       where Ψ(u) is a cdf having a smooth, unimodal, symmetric density ψ(u) on the real line.
     • With an additional condition on Ψ(u), we can show that W(x) preserves the asymptotic properties of the log Gaussian process prior.
     • On the other hand, with W(x) ≤ 1, Walker (2011) describes a latent model which can deal with the intractable normalizing constant. It is based on the expansion
         [∫ W(s) f_θ(s) ds]^{−n} = ∑_{k=0}^∞ (n+k−1 choose k) [∫ f_θ(s) (1 − W(s)) ds]^k.
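The series behind Walker's latent model is the negative binomial expansion of the inverse normalizing constant: with q = ∫ f_θ(s)(1 − W(s)) ds ∈ (0, 1), it reads (1 − q)^{−n} = ∑_k (n+k−1 choose k) q^k. A quick numerical check, with toy values for n and q:

```python
import math

# Toy values (arbitrary choices for illustration): since W <= 1,
# q = integral of f_theta * (1 - W) lies in (0, 1).
n, q = 5, 0.3

# Negative binomial series vs. the closed form (1 - q)^(-n).
series = sum(math.comb(n + k - 1, k) * q**k for k in range(200))
closed_form = (1.0 - q) ** (-n)
```

Because each series term is a power of an integral of a bounded, positive function, the expansion can be handled term-by-term with latent variables, which is what makes the intractable normalizing constant manageable.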

  12. Link function Ψ(u)
     • Lipschitz condition on log Ψ(u): ψ(u)/Ψ(u) ≤ m uniformly on R. It is satisfied by the standard Laplace, logistic and Cauchy cdfs, but not by the standard normal cdf.
     • For fixed θ, write p_z = f_{θ,Ψ(z)}. It can be shown that, when ‖z_1 − z_2‖_∞ < ε,
         h(p_{z_1}, p_{z_2}) ≤ m ε e^{mε/2},
         K(p_{z_1}, p_{z_2}) ≤ m² ε² e^{mε} (1 + mε).
     • The posterior asymptotic results of van der Vaart and van Zanten (2008) carry over to this setting: if Ψ^{−1}(f_0/f_θ) is contained in the support of Z, then
         Π{p_z : h(p_z, f_0) > ε | X_1, …, X_n} → 0,  F_0^∞-a.s.
       Results on the posterior contraction rate can also be derived.

  13. Conditional posterior of W
     Assumptions:
     (A) Lipschitz condition on log Ψ(u);
     (B) f_θ(x) is continuous and bounded away from zero;
     (C) the support of Z contains the space C(I) of continuous densities on I.
     Theorem 1. Under assumptions (A), (B) and (C), the conditional posterior of W given θ is exponentially consistent at all f_0 ∈ C(I), i.e. for any ε > 0,
         π̃{W : h(f_{θ,W}, f_0) > ε | θ, X_1, …, X_n} ≤ e^{−dn},  F_0^∞-a.s.
     for some d > 0 as n → ∞.
     • As a corollary, for fixed θ, the posterior of C(x) = W(x) / ∫_I f_θ(s) W(s) ds consistently estimates the discrepancy f_0(x)/f_θ(x).
     • The exponential convergence to 0 is a by-product of standard techniques for proving posterior consistency.
