Bayesian estimation of the discrepancy with misspecified parametric models (PowerPoint presentation)


SLIDE 1


Bayesian estimation of the discrepancy with misspecified parametric models

Pierpaolo De Blasi

University of Torino & Collegio Carlo Alberto

Bayesian Nonparametrics workshop ICERM, 17-21 September 2012

Joint work with S. Walker

SLIDE 2


Outline

  • Semiparametric density estimation
  • Asymptotics and illustration
  • References

SLIDE 3


BNP density estimation

  • Let X1, . . . , Xn be exchangeable (i.e. conditionally iid) observations from an unknown density f on the real line.

  • If F is the density space and Π(df) the prior, Bayes' theorem gives

Π(A | X1, . . . , Xn) = ∫_A ∏_{i=1}^n f(Xi) Π(df) / ∫_F ∏_{i=1}^n f(Xi) Π(df)

  • Wealth of Bayesian nonparametric (BNP) models
  • Dirichlet process mixtures of continuous densities;
  • log spline models;
  • Bernstein polynomials;
  • log Gaussian processes.
  • All with well-studied asymptotic properties, e.g. posterior concentration rates: Π(f : d(f, f0) > M εn | X1, . . . , Xn) → 0 as n → ∞, when X1, X2, . . . are iid from some "true" f0.
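The Bayes update above can be made concrete with a toy discrete prior over a handful of candidate densities, so the integrals become sums. The three candidates and the data-generating f0 below are made up for this sketch (f0(x) = 2(1 − x) is borrowed from the talk's first illustration):

```python
import numpy as np

# Toy Bayes update over densities: discrete prior over three candidate
# densities on [0, 1]; posterior mass is prior mass times the likelihood
# prod_i f(x_i), normalized.
rng = np.random.default_rng(0)

candidates = {
    "uniform":    lambda x: np.ones_like(x),
    "triangular": lambda x: 2 * (1 - x),   # the f0 used later in the talk
    "increasing": lambda x: 2 * x,
}
prior = {name: 1 / 3 for name in candidates}

# Data actually drawn from f0(x) = 2(1 - x), via inverse-cdf sampling:
# F(x) = 1 - (1 - x)^2, so F^{-1}(u) = 1 - sqrt(1 - u).
x = 1 - np.sqrt(1 - rng.uniform(size=200))

# Log-likelihoods, then the normalized posterior Π(f | x_1, ..., x_n).
loglik = {name: np.sum(np.log(f(x))) for name, f in candidates.items()}
m = max(loglik.values())
w = {name: np.exp(l - m) * prior[name] for name, l in loglik.items()}
z = sum(w.values())
posterior = {name: v / z for name, v in w.items()}
print(posterior)  # mass concentrates on "triangular"
```

With n = 200 the posterior puts essentially all its mass on the true density, a finite-dimensional caricature of the posterior concentration statement above.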

SLIDE 4


Discrepancy from a parametric model

  • Suppose now we have a favorite parametric family fθ(x), θ ∈ Θ ⊂ ℝp, likely to be misspecified: there is no θ such that f0 = fθ.

  • We want to learn about the best parameter value θ0, which minimizes the Kullback-Leibler divergence from the true f0:

θ0 = arg min_{θ∈Θ} ∫ f0 log(f0/fθ)

  • A nonparametric component W is introduced to model the discrepancy between f0 and the closest density fθ0: fθ,W(x) ∝ fθ(x) W(x), so that C(x) := W(x) / ∫ W(s) fθ(s) ds is designed to estimate C0(x) = f0(x)/fθ0(x).

SLIDE 5


Related works - Frequentist

Hjort and Glad (1995)

  • Start with a parametric density estimate fθ̂(x), θ̂ being, e.g., the MLE maximizing the log-likelihood ∑_{i=1}^n log fθ(xi).

  • Then multiply it by a nonparametric kernel-type estimate of the correction function r(x) = f0(x)/fθ̂(x):

f̂(x) = fθ̂(x) r̂(x) = (1/n) ∑_{i=1}^n Kh(xi − x) fθ̂(x)/fθ̂(xi)

in a two-stage sequential analysis.

  • f̂ is shown to be more precise than the traditional kernel density estimator in a broad neighborhood around the parametric family, while losing little when f0 is far from the parametric family.
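A minimal sketch of the two-stage estimator above. The ingredients (Gamma data, an exponential parametric start, a Gaussian kernel, the bandwidth) are assumptions of this sketch, not taken from the slides:

```python
import numpy as np

# Hjort-Glad-style parametric-start kernel estimate:
# f_hat(x) = f_par(x) * (1/n) * sum_i K_h(x_i - x) / f_par(x_i),
# i.e. a kernel estimate of the correction r(x) = f0(x)/f_par(x)
# multiplied back onto the parametric fit.
rng = np.random.default_rng(1)

# Hypothetical setup: data from Gamma(2, 1); parametric start = Exponential.
data = rng.gamma(shape=2.0, scale=1.0, size=400)

rate = 1.0 / data.mean()                     # exponential MLE: 1 / xbar
f_par = lambda x: rate * np.exp(-rate * x)

def hjort_glad(x, data, h):
    """Kernel-corrected parametric density estimate at points x."""
    K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # Gaussian kernel
    w = K((data[None, :] - x[:, None]) / h) / h             # K_h(x_i - x)
    return f_par(x) * np.mean(w / f_par(data)[None, :], axis=1)

xs = np.linspace(0.1, 8, 200)
est = hjort_glad(xs, data, h=0.4)
print(est.sum() * (xs[1] - xs[0]))  # total mass, close to 1
```

When f0 really is exponential the correction factor is flat and the estimate inherits the parametric rate; here, with Gamma data, the kernel term repairs the misspecified start.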

SLIDE 6


Related works - Bayes

Nonparametric prior built around a parametric model via f(x) = fθ(x)g(Fθ(x)), where Fθ is the cdf of fθ and g is a density on [0, 1] with prior Π.

  • Verdinelli and Wasserman (1998): Π as an infinite exponential family. Application to goodness-of-fit testing.

  • Rousseau (2008): Π as a mixture of betas. Application to goodness-of-fit testing.

  • Tokdar (2007): Π as a log Gaussian process prior. Application to posterior inference for densities with unbounded support. For g(x) = e^{Z(x)} / ∫_0^1 e^{Z(s)} ds and Z a Gaussian process with covariance σ(·, ·), f(x) can be written f(x) ∝ fθ(x) e^{Z̃(x)}, with W(x) = e^{Z̃(x)}, for Z̃ a Gaussian process with covariance σ(Fθ(·), Fθ(·)).

SLIDE 7


Posterior updating

fθ,W(x) ∝ fθ(x) W(x),   C(x) := W(x) / ∫ W(s) fθ(s) ds.

  • Truly semi-parametric: the aim is first to learn about the best parameter θ0, and then to see how close fθ0 is to f0 via C(x) = W(x) / ∫ W(s) fθ(s) ds.

  • A situation in which the updating process from prior to posterior may be seen as problematic: the model fθ,W is intrinsically non-identified in (θ, C).

  • The full Bayesian update

π̃(θ, W | x1, . . . , xn) ∝ π(θ) π(W) ∏_{i=1}^n fθ,W(xi)

is appropriate for learning about f0; it is not so for learning about (θ0, C0).

  • The marginal posterior π̃(θ | x1, . . . , xn) = ∫ π̃(θ, W | x1, . . . , xn) dW has no interpretation: it is not clear which parameter value this π̃ is targeting.

SLIDE 8


Posterior updating

  • What removes us from the formal Bayes set-up is the desire to specifically learn about θ0.

  • θ0 is defined without any reference to W or C. Whether we are interested in learning about C0 or not, our beliefs about θ0 should not change.

  • Hence, the appropriate update for θ is the parametric one:

π(θ | x1, . . . , xn) ∝ π(θ) ∏_{i=1}^n fθ(xi).

  • We keep updating W according to the semi-parametric model,

π̃(W | θ, x1, . . . , xn) ∝ π(W) ∏_{i=1}^n fθ,W(xi),

so our updating scheme is the non-full Bayesian update

π(θ, W | x1, . . . , xn) = π̃(W | θ, x1, . . . , xn) π(θ | x1, . . . , xn).
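The parametric half of this update, π(θ | x1, . . . , xn) ∝ π(θ) ∏ fθ(xi), is easy to sketch on a grid. The setup below borrows the truncated-exponential illustration appearing later in the talk; the grid limits, sample size and seed are assumptions of the sketch:

```python
import numpy as np

# Grid evaluation of the parametric posterior for the talk's Illustration 1:
# f_theta(x) = theta * exp(-theta x) / (1 - exp(-theta)) on [0, 1],
# improper prior pi(theta) ∝ 1/theta, data from f0(x) = 2(1 - x).
rng = np.random.default_rng(2)
x = 1 - np.sqrt(1 - rng.uniform(size=500))   # inverse-cdf draws from 2(1 - x)

thetas = np.linspace(0.5, 4.0, 2000)

def loglik(th):
    # sum_i log f_theta(x_i) = n*log(theta) - n*log(1 - e^-theta) - theta*sum(x)
    return len(x) * (np.log(th) - np.log1p(-np.exp(-th))) - th * x.sum()

logpost = np.array([loglik(t) for t in thetas]) - np.log(thetas)  # prior 1/theta
logpost -= logpost.max()
post = np.exp(logpost)
post /= post.sum() * (thetas[1] - thetas[0])  # normalize to a density

mean = np.sum(thetas * post) * (thetas[1] - thetas[0])
print(round(mean, 2))  # typically lands near theta0 = 2.15
```

The conditional update of W given each θ draw would then run on top of this, but it needs a Gaussian-process sampler and is beyond a few lines.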

SLIDE 9


Posterior updating

π(θ, W|x1, . . . , xn) = ˜ π(W|θ, x1, . . . , xn) π(θ|x1, . . . , xn).

  • (θ, W) are estimated sequentially, with W reflecting additional uncertainty about θ.

  • Marginalization of the posterior over W is well defined,

π(W | x1, . . . , xn) = ∫_Θ π̃(W | θ, x1, . . . , xn) π(dθ | x1, . . . , xn),

since π(θ | x1, . . . , xn) describes the beliefs about the real parameter θ0.

  • Coherence is about properly defining the quantities of interest and showing that the Bayesian updates provide learning about these quantities; this is checked by what is yielded asymptotically.

  • Hence we seek frequentist validation: we show that the posterior of (θ, C) converges to a point mass at (θ0, C0).

SLIDE 10


Lenk (2003)

  • Let I be a compact interval on the real line and Z a Gaussian process. Lenk (2003) considers the semi-parametric density model

f(x) = fθ(x) e^{Z(x)} / ∫_I fθ(s) e^{Z(s)} ds

for fθ(x) a member of the exponential family.

  • In the Karhunen-Loève expansion of Z(x), the orthogonal basis is chosen so that the sample paths integrate to zero.

  • A further assumption for identification: the orthogonal basis does not contain any of the canonical statistics of fθ(x).

  • Estimation is based on truncation of the series expansion or on imputation of the Gaussian process at a fixed grid of points; see Tokdar (2007).

SLIDE 11


Bounded W(x)

  • Building upon Lenk (2003), we keep working with Gaussian processes and consider

fθ,W(x) = fθ(x) W(x) / ∫_I fθ(s) W(s) ds,   W(x) = Ψ(Z(x)),

where Ψ(u) is a cdf having a smooth unimodal symmetric density ψ(u) on the real line.

  • With an additional condition on Ψ(u), we can show that W(x) preserves the asymptotic properties of the log Gaussian process prior.

  • On the other hand, with W(x) ≤ 1, Walker (2011) describes a latent model which can deal with the intractable normalizing constant. It is based on

∑_{k=0}^∞ (n+k−1 choose k) [∫ fθ(s) (1 − W(s)) ds]^k = (1 / ∫ W(s) fθ(s) ds)^n.
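The displayed identity is the negative binomial series ∑_k (n+k−1 choose k) q^k = (1 − q)^{−n} with q = ∫ fθ(s)(1 − W(s)) ds, so that 1 − q = ∫ W fθ. A quick numeric check with stand-in numbers (the values of n and q here are arbitrary):

```python
import math

# Verify: sum_k C(n+k-1, k) * q^k == (1 - q)^(-n), the identity behind
# Walker (2011)'s latent-variable treatment of the normalizing constant.
n = 5
q = 0.3                      # stands in for integral of f_theta * (1 - W)
partial = sum(math.comb(n + k - 1, k) * q**k for k in range(200))
closed = (1 - q) ** (-n)     # equals (1 / integral of W * f_theta)^n
print(partial, closed)       # the two agree to machine precision
```

Truncating the series at k = 200 is already far past convergence for q = 0.3; the latent model works by augmenting with the integer k rather than truncating.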

SLIDE 12


Link function Ψ(u)

  • Lipschitz condition on log Ψ(u): ψ(u)/Ψ(u) ≤ m uniformly on ℝ, satisfied by the standard Laplace cdf, the standard logistic cdf, and the standard Cauchy cdf, but not by the standard normal cdf.

  • For fixed θ, write pz = fθ,Ψ(z). It can be shown that, when ‖z1 − z2‖∞ < ε,

h(pz1, pz2) ≤ mε e^{mε/2},   K(pz1, pz2) ≤ m²ε² e^{mε} (1 + mε).

  • The posterior asymptotic results of van der Vaart and van Zanten (2008) carry over to this setting: if Ψ^{−1}(f0/fθ) is contained in the support of Z, then

Π{pz : h(pz, f0) > ε | X1, . . . , Xn} → 0, F0∞-a.s.

Results on posterior contraction rates can also be derived.
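The Lipschitz condition is easy to check numerically: for the standard logistic cdf, ψ = Ψ(1 − Ψ), so ψ/Ψ = 1 − Ψ ≤ 1 (m = 1), while for the standard normal the ratio φ(u)/Φ(u) is the hazard function of −u and grows without bound as u → −∞:

```python
import math

# psi/Psi for the logistic link (bounded by m = 1) vs the normal link
# (unbounded as u -> -infinity), illustrating why the Gaussian cdf
# fails the Lipschitz condition on log Psi.
def logistic_ratio(u):
    p = 1 / (1 + math.exp(-u))
    return p * (1 - p) / p            # psi/Psi = 1 - Psi <= 1

def normal_ratio(u):
    phi = math.exp(-u * u / 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * math.erfc(-u / math.sqrt(2))
    return phi / Phi                  # normal hazard of -u; ~ |u| for u << 0

us = [-30, -10, -2, 0, 2]
print([round(logistic_ratio(u), 3) for u in us])  # all <= 1
print([round(normal_ratio(u), 3) for u in us])    # grows like |u| on the left
```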

SLIDE 13


Conditional posterior of W

(A) Lipschitz condition on log Ψ(u); (B) fθ(x) is continuous and bounded away from zero; (C) the support of Z contains the space C(I) of continuous densities on I.

Theorem 1. Under assumptions (A), (B) and (C), the conditional posterior of W given θ is exponentially consistent at every f0 ∈ C(I), i.e. for any ε > 0,

π̃{W : h(fθ,W, f0) > ε | θ, X1, . . . , Xn} ≤ e^{−dn}, F0∞-a.s. for some d > 0 as n → ∞.

  • As a corollary, for fixed θ, the posterior of C(x) = W(x) / ∫_I fθ(s) W(s) ds consistently estimates the discrepancy f0(x)/fθ(x).

  • The exponential convergence to 0 is a by-product of standard techniques for proving posterior consistency.

SLIDE 14


Marginal posterior of θ

  • For given f0, let θ0 be the parameter value that minimizes ∫_I f0 log(f0/fθ):

θ0 = arg min_{θ∈Θ} ∫ f0 log(f0/fθ)

  • Under some regularity conditions on the family fθ and on the prior at θ0, the posterior accumulates at θ0 at rate √n:

π{|θ − θ0| > Mn n^{−1/2} | X1, . . . , Xn} → 0, F0∞-a.s.;

see Kleijn and van der Vaart (2012).

  • One of the key regularity conditions on fθ is the existence of an open neighborhood U of θ0 and a square-integrable function mθ0(x) such that, for all θ1, θ2 ∈ U, |log(fθ1/fθ2)| ≤ mθ0 |θ1 − θ2|, P0-a.s.

  • For our purposes, we focus on a different local property: there exist α > 0 and an open neighborhood U of θ0 such that, for all θ1, θ2 ∈ U,

‖log(fθ1/fθ2)‖∞ ≲ |θ1 − θ2|^α.   (D)

SLIDE 15


Marginal posterior of W

Assume the regularity conditions on fθ and π(θ) are satisfied for f0 ∈ C(I). Recall the definition of the marginal posterior of W,

π(W | x1, . . . , xn) = ∫_Θ π̃(W | θ, x1, . . . , xn) π(dθ | x1, . . . , xn).

Theorem 2. Under assumptions (A), (B), (C) and (D), the marginal posterior of W satisfies

π{W : h(fθ0,W, f0) > ε | X1, . . . , Xn} → 0, F0∞-a.s. as n → ∞.

  • The marginal posterior of W is evaluated outside a neighborhood defined in terms of θ0. Clearly, if π(θ) is degenerate at θ0, the result follows directly from Theorem 1.

  • Hint of the proof: it suffices to consider the posterior when the prior is restricted to |θ − θ0| ≤ Mn n^{−1/2}. We then manipulate numerator and denominator by using (D) together with the inequalities

exp{−‖log(fθ0/fθ)‖∞} ≤ ∫_I fθ0(x) W(x) dx / ∫_I fθ(x) W(x) dx ≤ exp{‖log(fθ0/fθ)‖∞}.

SLIDE 16


Marginal posterior of C

Recall the definition C(x) = Cθ,W(x) = W(x) / ∫ W(s) fθ(s) ds, which is designed to estimate C0(x) = f0(x)/fθ0(x).

Corollary. Under the hypotheses of Theorem 2, as n → ∞,

π{∫_I |C − C0| > ε | X1, . . . , Xn} → 0, F0∞-a.s.

  • Together with π{|θ − θ0| > Mn n^{−1/2} | X1, . . . , Xn} → 0, we conclude that the posterior of (θ, C) converges to (θ0, C0).

  • Hint of the proof: Theorem 2 implies that ∫_I |Cθ0,W − C0| goes to 0. By the triangle inequality, it is sufficient to show that, uniformly over |θ − θ0| ≤ Mn n^{−1/2}, ∫_I |Cθ,W − Cθ0,W| → 0. This we show by using (D).

SLIDE 17


Illustration 1

n = 500 observations from f0(x) = 2(1 − x); fθ(x) = θ e^{−θx}/(1 − e^{−θ}) with improper prior π(θ) ∝ 1/θ.
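The minimum-KL value θ0 = 2.15 quoted on the next slide can be recovered directly. Using ∫ x f0(x) dx = 1/3, the θ-dependent part of ∫ f0 log(f0/fθ) reduces to g(θ) = −log θ + log(1 − e^{−θ}) + θ/3, so θ0 solves g′(θ) = 0. The bisection below is a check on that reduction, not the talk's code:

```python
import math

# theta0 = argmin KL(f0 || f_theta) for f0(x) = 2(1 - x) and the
# truncated exponential f_theta(x) = theta e^(-theta x) / (1 - e^(-theta)).
# g'(theta) = -1/theta + e^(-theta)/(1 - e^(-theta)) + 1/3, solved by bisection.
def gprime(t):
    return -1 / t + math.exp(-t) / (1 - math.exp(-t)) + 1 / 3

lo, hi = 1.0, 4.0                      # gprime(lo) < 0 < gprime(hi)
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if gprime(mid) < 0 else (lo, mid)
theta0 = (lo + hi) / 2
print(round(theta0, 2))  # -> 2.15
```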

Figure: Estimated (bold) and true (dashed) discrepancy C0(x); x-axis x ∈ [0, 1], y-axis C(x).

SLIDE 18


Parametric Bayes update

Simulation from the posterior of θ. The minimum K-L parameter value is θ0 = 2.15.


Figure: Posterior distribution of θ with parametric Bayes update. Posterior mean 2.13.

SLIDE 19


Full Bayes update

Using the proper conditional posterior π̃(θ | W, x1, . . . , xn) ∝ π(θ) ∏_i fθ,W(xi).


Figure: Posterior distribution of θ with formal Bayes update. Posterior mean 1.97.

SLIDE 20


Illustration 2

n = 500 observations from f0(x) = 2x; fθ(x) = θ x^{θ−1} with 0 < θ < 1 and a uniform prior for θ. θ0 = 1.
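Here θ0 sits on the boundary of Θ: using ∫_0^1 2x log x dx = −1/2, the θ-dependent part of the KL divergence is g(θ) = −log θ + (θ − 1)/2, whose unconstrained minimizer is θ = 2, outside (0, 1). Since g′(θ) = −1/θ + 1/2 < 0 on (0, 1], g is decreasing there and the constrained minimizer is θ0 = 1. A tiny check of that reduction:

```python
import math

# theta-dependent part of KL(f0 || f_theta) for f0(x) = 2x and
# f_theta(x) = theta * x^(theta - 1), restricted to theta in (0, 1]:
# g(theta) = -log(theta) + (theta - 1)/2, decreasing on (0, 1].
g = lambda t: -math.log(t) + (t - 1) / 2
grid = [k / 1000 for k in range(1, 1001)]   # theta in (0, 1]
theta0 = min(grid, key=g)
print(theta0)  # -> 1.0
```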


Figure: Posterior distributions of θ with parametric Bayes update (top) and formal Bayes update (bottom).

SLIDE 21


Discussion

  • Both the proposed update and the formal Bayes update seem to provide a suitable estimate of C, which is not surprising given the model's flexibility in estimating C0 with alternative values of θ. Yet the posterior for θ is more accurate under the parametric Bayesian update.

  • This shows that semiparametric models need to be thought about carefully: the parametric part needs to define which θ is being targeted. Future work will deal with:

  • fθ with unbounded support.
  • Extension to posterior contraction rates.
  • Connections with asymptotic properties of empirical Bayes.
  • Use of the C(x) function for model selection.

SLIDE 22


References

  • De Blasi & Walker (2012). Bayesian estimation of the discrepancy with misspecified parametric models. Tech. Rep., submitted.

  • Hjort & Glad (1995). Nonparametric density estimation with a parametric start. Ann. Statist. 23, 882-904.

  • Kleijn & van der Vaart (2012). The Bernstein-von Mises theorem under misspecification. Electron. J. Stat. 6, 354-381.

  • Lenk (1988). The logistic normal distribution for Bayesian, nonparametric, predictive densities. J. Amer. Statist. Assoc. 83, 509-516.

  • Rousseau (2008). Approximating interval hypothesis: p-values and Bayes factors. In Bayesian Statistics 8, 417-452.

  • Tokdar (2007). Towards a faster implementation of density estimation with logistic Gaussian process priors. J. Comp. Graph. Statist. 16, 633-655.

  • van der Vaart & van Zanten (2008). Rates of contraction of posterior distributions based on Gaussian process priors. Ann. Statist. 36, 1435-1463.

  • Verdinelli & Wasserman (1998). Bayesian goodness-of-fit testing using infinite-dimensional exponential families. Ann. Statist. 26, 1215-1241.

  • Walker (2011). Posterior sampling when the normalizing constant is unknown. Comm. Statist. Simulation Comput. 40, 784-792.
