SLIDE 1
Posterior consistency in Bayesian inference with exponential priors
Masoumeh Dashti, University of Sussex
Workshop on Optimization and Inversion under Uncertainty, Linz, 12 November 2019
Based on joint work with S Agapiou (Cyprus), T Helin (LUT,
SLIDE 2
SLIDE 4
The setting
Suppose (indirect) noisy measurements y of a quantity of interest u are available:
y = G(u) + η
Examples.
i) y_j = u(x_j) + η_j, j = 1, . . . , n, x_j ∈ D ⊂ R^d, u ∈ C_b(D)
ii) y_j = p(x_j) + η_j, j = 1, . . . , n, x_j ∈ D ⊂ R^d, where ∇ · (u∇p) = f in D, u ∈ C_b(D) with u > 0.
SLIDE 5
Bayesian approach
Consider y = G(u) + η with u ∈ X, y ∈ R^n (X a separable Banach space),
- prior: u ∼ µ_0
- the statistics of the noise are known: η ∼ ρ_η
The posterior µ^y (when well defined) satisfies
µ^y(du) ∝ ρ_η(y − G(u)) µ_0(du)
⟺ µ^y(A) = ∫_A c ρ_η(y − G(u)) µ_0(du) for all A ∈ B(X),
i.e. (dµ^y/dµ_0)(u) = c ρ_η(y − G(u)), with c a normalising constant.
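In finite dimensions the relation µ^y(du) ∝ ρ_η(y − G(u)) µ_0(du) can be evaluated directly on a grid. A minimal sketch, assuming a scalar unknown, a cubic forward map G(u) = u³, Gaussian noise and a Laplace prior — all illustrative choices, not from the talk:

```python
import numpy as np

# Illustrative setup: scalar unknown, G(u) = u^3, Gaussian noise, Laplace prior
def G(u):
    return u ** 3

y = 0.5                                   # a single observation
sigma = 0.2                               # noise standard deviation
u = np.linspace(-2.0, 2.0, 2001)          # grid over the unknown
du = u[1] - u[0]

rho_eta = np.exp(-0.5 * (y - G(u)) ** 2 / sigma ** 2)   # likelihood rho_eta(y - G(u))
prior = 0.5 * np.exp(-np.abs(u))                        # Laplace prior density

post = rho_eta * prior
post /= post.sum() * du                   # normalise: dmu^y/du on the grid
```

The normalising step plays the role of the constant c above; the posterior mass concentrates near solutions of G(u) = y, pulled slightly towards 0 by the prior.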
SLIDE 8
Posterior consistency
Suppose:
◮ y = (y_1, . . . , y_n) with n arbitrarily large
◮ there exists an underlying truth w_0: y = G(w_0) + η
Does µ^y concentrate on arbitrarily small neighbourhoods of w_0 as n → ∞, and how fast?
Simpler: do modes of µ^y converge to w_0?
SLIDE 9
Outline
1. MAP estimators and weak posterior consistency
2. Posterior consistency with contraction rates
SLIDE 11
MAP estimates
µ(X) = 1, X a function space. There is no Lebesgue density, so modes can be defined topologically: any point ũ ∈ X satisfying
lim_{ε→0} ( sup_{u∈X} µ(B_ε(u)) ) / µ(B_ε(ũ)) = 1
is a MAP estimator. (MD, LAW, STUART, VOSS ’13)
SLIDE 13
∃ Z ⊂ X s.t. for u ∈ Z,
lim_{ε→0} µ(B_ε(u)) / µ(B_ε(0)) = e^{−I(u)}
◮ If X = R^n, then Z = R^n and I(u) = −log ρ_µ(u)
◮ For X a function space, Z is a proper dense subset of X with µ(Z) = 0
Are modes of µ characterised by minimisers of I?
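In the finite-dimensional case the limit defining I can be checked directly; a sketch, assuming µ has a continuous, strictly positive Lebesgue density ρ_µ:

```latex
\lim_{\epsilon\to 0}\frac{\mu(B_\epsilon(u))}{\mu(B_\epsilon(0))}
  = \lim_{\epsilon\to 0}
    \frac{\int_{B_\epsilon(u)}\rho_\mu(v)\,\mathrm{d}v}
         {\int_{B_\epsilon(0)}\rho_\mu(v)\,\mathrm{d}v}
  = \frac{\rho_\mu(u)}{\rho_\mu(0)}
  = e^{-I(u)},
\qquad I(u) = -\log\rho_\mu(u) + \log\rho_\mu(0),
```

so that, up to the additive constant log ρ_µ(0) (which does not affect minimisers), this recovers I(u) = −log ρ_µ(u). On a function space no such density exists, and the ratio of small-ball probabilities itself becomes the object of study.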
SLIDE 15
The Prior
X ⊂ L²
{ψ_j} an orthonormal basis in L²(T^d); ξ_j ∼ c_p exp(−|x|^p / p), p ≥ 1, i.i.d.;
{γ_j} → 0 a positive decreasing sequence
µ_0 the law of (γ_j ξ_j)_j, and u ∼ µ_0 satisfies
u(x) = Σ_{j∈N} γ_j ξ_j ψ_j(x)

Gaussian: p = 2, {ψ_j} an orthonormal basis
Besov (LASSAS, SAKSMAN, SILTANEN ’09): p ≥ 1, γ_j negative powers of j, {ψ_j} an orthonormal wavelet basis
p = 1: sparsity promoting; a continuous but not differentiable measure
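A draw from this prior can be simulated by truncating the series. A minimal sketch for p ∈ {1, 2}, assuming γ_j = j^{−α} and a cosine basis ψ_j(x) = √2 cos(πjx) on [0, 1] — both illustrative choices (for general p one would sample the density c_p exp(−|x|^p/p), e.g. by rejection):

```python
import numpy as np

def sample_prior(p, alpha, J, x, rng):
    """Truncated draw u(x) = sum_{j<=J} gamma_j xi_j psi_j(x) from the series prior."""
    j = np.arange(1, J + 1)
    gamma = j ** (-float(alpha))                 # gamma_j = j^{-alpha}, decreasing to 0
    if p == 1:
        xi = rng.laplace(scale=1.0, size=J)      # density (1/2) e^{-|x|}: the p = 1 case
    elif p == 2:
        xi = rng.standard_normal(J)              # density prop. to e^{-x^2/2}: the p = 2 case
    else:
        raise NotImplementedError("general p needs e.g. rejection sampling")
    psi = np.sqrt(2.0) * np.cos(np.pi * np.outer(j, x))   # cosine basis psi_j(x)
    return (gamma * xi) @ psi

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 201)
u_laplace = sample_prior(1, 1.5, 256, x, rng)    # Besov-type, sparsity-promoting draw
u_gauss = sample_prior(2, 1.5, 256, x, rng)      # Gaussian draw
```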
SLIDE 16
For dµ/dµ_0(u) = c e^{−Φ(u)} with Φ given,
I(u) = Φ(u) + (1/p) ‖u‖_Z^p,  Z := { u ∈ X : Σ_j |⟨u, ψ_j⟩/γ_j|^p < ∞ }

For h ∈ Q := { u ∈ X : Σ_j |⟨u, ψ_j⟩/γ_j|² < ∞ } and γ_j ξ_j ∼ ρ_j,
dµ_{0,h}/dµ_0(u) = lim_{N→∞} Π_{j=1}^N ρ_j(u_j − h_j)/ρ_j(u_j)
  = lim_{N→∞} exp( Σ_{j=1}^N ( −|(h_j − u_j)/γ_j|^p + |u_j/γ_j|^p ) )  in L¹_µ

For locally Lipschitz Φ, modes of µ are minimisers of I:
- p = 2: MD, LAW, STUART, VOSS ’13 (Z = Q)
- p > 1: HELIN & BURGER ’15; LIE & SULLIVAN ’18 (differentiable case)
- p = 1: AGAPIOU, BURGER, MD, HELIN ’18
SLIDE 17
Weak posterior consistency
dµ^y/dµ_0(u) ∝ ρ_η(y − G(u)) =: e^{−Φ(u,y)}
Suppose:
◮ y = (y_1, . . . , y_n) with n arbitrarily large
◮ there exists an underlying truth: y = G(w_0) + η (y_j = G(w_0) + η_j)
For µ_0 exponential, the MAP estimates are
u_n := argmin_{u∈Z} Φ(u, y) + ‖u‖_Z.
SLIDE 19
u_n := argmin_{u∈Z} Φ(u, y) + ‖u‖_Z
    = argmin_{u∈Z} |G(w_0) − G(u)|² + (2/n) Σ_{j=1}^n ⟨G(w_0) − G(u), η_j⟩ + (1/n) ‖u‖_Z

Theorem (AGAPIOU, BURGER, MD, HELIN ’18). Assume that G : X → R^+ is locally Lipschitz and w_0 ∈ Z. Then
- G(u_n) → G(w_0) in probability.
- If G is injective, ‖u_n − w_0‖_X → 0 in probability.
Otherwise, there exist u* ∈ Z and a subsequence of {u_n}_{n∈N} along which ‖u_n − u*‖_X → 0 in probability. For any such u*, G(u*) = G(w_0).

Similar results hold in the small-noise limit y = G(w_0) + δ_n η.
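A toy finite-dimensional illustration of this weak consistency, assuming G is the identity on coefficients and p = 1, so the rescaled objective |ȳ − u|² + (1/n) Σ_k |u_k|/γ_k minimises coordinate-wise by soft-thresholding; the particular γ_j, truth w_0 and Gaussian noise are illustrative assumptions:

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding: the minimiser over u of (z - u)^2 + 2 t |u|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def map_estimate(ybar, n, gamma):
    # coordinate-wise minimiser of |ybar - u|^2 + (1/n) sum_k |u_k| / gamma_k
    return soft(ybar, 1.0 / (2.0 * n * gamma))

rng = np.random.default_rng(1)
gamma = np.arange(1, 6, dtype=float) ** (-1.5)   # gamma_j = j^{-3/2}
w0 = np.array([1.0, -0.5, 0.2, 0.0, 0.0])        # sparse truth w0 in Z

errors = []
for n in (10, 10000):
    y = w0 + rng.normal(scale=0.5, size=(n, 5))  # n noisy observations of w0
    u_n = map_estimate(y.mean(axis=0), n, gamma)
    errors.append(np.linalg.norm(u_n - w0))
# the error shrinks as n grows: u_n -> w0 in probability
```

Both the data-misfit term and the shrinking penalty weight 1/n are visible here: as n grows the threshold vanishes and u_n approaches w_0.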
SLIDE 20
Outline
1. MAP estimators and weak posterior consistency
2. Posterior consistency with contraction rates
SLIDE 22
Consistency with contraction rates
µ^y is said to contract with rate ε_n at w_0 if
µ^y( { u ∈ X : ‖u − w_0‖ ≥ C ε_n } ) → 0  in P(y|w_0)-probability.
GHOSAL, GHOSH & VAN DER VAART ’00 give sufficient conditions on the model and the prior to ensure this. Conditions on the prior:
- µ_0 puts sufficient mass around w_0,
- the distribution of mass under µ_0 is ‘not too complex’.
Model: i.i.d. sampling or the white noise model.
SLIDE 24
Conditions on prior – Exponential case
AGAPIOU, MD & HELIN ’18: for appropriate ε_n (∗) there exists X_n ⊂ X s.t.
◮ µ_0(‖u − w_0‖_X < 2ε_n) ≥ e^{−nε_n²}
◮ log N(ε̃_n, X_n, ‖·‖_X) ≤ C n ε̃_n²  (N: minimal number of balls needed to cover X_n)
◮ µ_0(X \ X_n) ≤ e^{−C nε_n²}
————————————————–
(∗) ε_n satisfies φ_{w_0}(ε_n) ≤ nε_n², with
φ_w(ε) := inf_{h∈Z : ‖h−w‖_X ≤ ε} (1/p) ‖h‖_Z^p − log µ_0(ε B_X)
(based on VAN DER VAART & VAN ZANTEN ’08 for Gaussian priors)
SLIDE 25
- For h ∈ Z:  µ_0(ε B_X + h) ≥ e^{−(1/p)‖h‖_Z^p} µ_0(ε B_X)
- By a two-level version of Talagrand’s inequality (1994; a generalised Borell inequality), for all M > 0:
µ(A + M^{p/2} B_Q + M B_Z) ≥ 1 − (1/µ(A)) exp(−c M^p)
→ choose X_n = ε B_X + M_n^{p/2} B_Q + M_n B_Z with M_n ∝ (nε_n²)^{1/p}
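The first inequality can be sanity-checked by Monte Carlo in a low-dimensional truncation. A sketch for p = 1 in R² with γ_1 = γ_2 = 1, so µ_0 is a product of standard Laplace distributions and ‖h‖_Z = Σ_k |h_k|; the choices of ε, h and sample size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
eps = 0.5
h = np.array([0.3, 0.0])

u = rng.laplace(scale=1.0, size=(N, 2))      # draws from mu_0 (p = 1, gamma_j = 1)

ball = (np.abs(u) < eps).all(axis=1)         # u in eps*B_X (sup-norm ball)
shifted = (np.abs(u - h) < eps).all(axis=1)  # u in eps*B_X + h

lhs = shifted.mean()                         # estimate of mu_0(eps*B_X + h)
rhs = np.exp(-np.abs(h).sum()) * ball.mean() # e^{-||h||_Z} * mu_0(eps*B_X)
# the shift inequality predicts lhs >= rhs
```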
SLIDE 27
Contraction rates
Find the largest ε_n s.t. φ_{w_0}(ε_n) ≤ nε_n².
- For the white noise model y_n = ∫_0^t u(s) ds + (1/√n) B_t, t ∈ [0, 1],
with truth w_0 ∈ B^β_{qq} and prior a B^{α+1/p}_{pp} Besov measure:
ε_n = n^{−β/(1+2β+p(α−β))} if β ≤ α,  ε_n = n^{−α/(1+2α)} if β > α,
and µ^y( { u ∈ X : ‖u − w_0‖ ≥ C ε_n } ) → 0 in P(y|w_0)-probability.
- Upper bounds on µ_0(ε B_X + h) enable the study of lower bounds on concentration rates → work in progress, recently established for p = 1.
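The two regimes of the rate can be packaged into a small helper; a sketch, with `contraction_exponent` an illustrative name (ε_n = n^{−exponent}, constants suppressed):

```python
def contraction_exponent(alpha, beta, p):
    """Exponent r with eps_n = n^{-r}, per the rate on the slide.

    beta: smoothness of the truth w0 (in B^beta_qq);
    alpha: prior smoothness parameter; p: prior exponent.
    """
    if beta <= alpha:
        return beta / (1.0 + 2.0 * beta + p * (alpha - beta))
    # for beta > alpha the rate saturates at the prior's own regularity
    return alpha / (1.0 + 2.0 * alpha)
```

At β = α both branches give β/(1 + 2β), so the rate is continuous across the two regimes, and for β > α the exponent no longer depends on β.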
SLIDE 28
Final remarks
- Convergence rates for MAPs: for Gaussian priors, NICKL, VAN DE GEER, WANG ’19
- Posterior contraction for nonlinear forward operators:
VOLLMER ’13 – pushforward of µ_0 under G: elliptic inverse problem
NICKL ’17 – Bernstein–von Mises theorem: elliptic inverse problem
- Generalised MAPs for discontinuous priors