

SLIDE 1

MLMC BIP OBIP Coupling PMCMC PMC(ML)MC SMC2 S(ML)MC2 Numerics MIMC Summary

Bayesian parameter estimation using Multilevel and multi-index Monte Carlo

Kody Law, joint with A. Jasra (NUS), K. Kamatani (Osaka), Y. Xu (NUS*), & Y. Zhou (Cubist)

Monash Workshop on Numerical Differential Equations and Applications, Monash University, AU

February 12, 2020

SLIDE 2

Outline

1. Multilevel Monte Carlo sampling
2. Bayesian inference problem
3. Our Bayesian inference problem
4. Approximate coupling
5. Particle Markov chain Monte Carlo
6. Particle Markov chain Multilevel Monte Carlo
7. Sequential Monte Carlo²
8. Sequential Multilevel Monte Carlo²
9. Numerical simulations
10. Multi-index Monte Carlo sampling
11. Summary

SLIDE 3

Orientation

Aim: approximate posterior expectations of the state path and static parameters associated to an S(P)DE which must be finitely approximated.

Solution: apply an approximate coupling strategy so that multilevel and multi-index Monte Carlo (MIMC) methods can be used within particle MCMC [B02, AR08, ADH10] and SMC² [CJP13].

MLMC (d = 1) [H00, G08] and MIMC (d > 1) [HNT15] methods reduce the cost to achieve mean-squared error O(ε²). Recently this methodology has been applied to inference, mostly in cases where the target can be evaluated up to a normalizing constant [HSS13, DKST15, HTL16, BJLTZ17]. Here we can only simulate a non-negative unbiased estimator of the target, up to a normalizing constant; using PMCMC we are able to sample consistently from an approximate coupling of successive targets [JKLZ18.i, JKLZ18.ii], and this is extended to the sequential context via SMC² [JLX19].


SLIDE 5

Example: expectation for SDE [G08]

Estimation of an expectation of the solution of an intractable stochastic differential equation (SDE):

dX = f(X)dt + σ(X)dW ,  X_0 = x_0 .

Aim: estimate E[g(X_T)]. We need to:

(1) Approximate, e.g. by the Euler-Maruyama method with resolution h:

X_{n+1} = X_n + h f(X_n) + √h σ(X_n) ξ_n ,  ξ_n ∼ N(0, 1).

(2) Sample {X^{(i)}_{N_T}}_{i=1}^N, with N_T = T/h.
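The two steps above can be sketched in code. A minimal illustration, not from the slides: Euler-Maruyama paths and a plain Monte Carlo estimate of E[g(X_T)], applied to a geometric Brownian motion for which the exact value exp(0.05) is known. The function names and the test model are illustrative assumptions.

```python
import numpy as np

def euler_maruyama(f, sigma, x0, T, h, rng):
    """One Euler-Maruyama path of dX = f(X)dt + sigma(X)dW on [0, T]:
    X_{n+1} = X_n + h f(X_n) + sqrt(h) sigma(X_n) xi_n, xi_n ~ N(0, 1)."""
    x = x0
    for _ in range(int(round(T / h))):
        x = x + h * f(x) + np.sqrt(h) * sigma(x) * rng.standard_normal()
    return x

def mc_estimate(g, f, sigma, x0, T, h, N, seed=0):
    """Plain (single-level) Monte Carlo estimate of E[g(X_T)] at resolution h."""
    rng = np.random.default_rng(seed)
    return np.mean([g(euler_maruyama(f, sigma, x0, T, h, rng)) for _ in range(N)])

# Geometric Brownian motion dX = 0.05 X dt + 0.2 X dW, X_0 = 1,
# for which the exact value is E[X_1] = exp(0.05).
est = mc_estimate(g=lambda x: x, f=lambda x: 0.05 * x,
                  sigma=lambda x: 0.2 * x, x0=1.0, T=1.0, h=0.01, N=20000)
```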

SLIDE 6

Multilevel Monte Carlo (MLMC)

Aim: approximate η_∞(g) := E_{η_∞}(g) for g : E → R.

Single level estimator:

(1/N) Σ_{i=1}^N g(U_L^{(i)}) ,  U_L^{(i)} ∼ η_L i.i.d.

Cost to achieve MSE = O(ε²) is C = Cost(U_L^{(i)}) × ε^{−2}.

Multilevel estimator*:

Σ_{l=0}^L (1/N_l) Σ_{i=1}^{N_l} { g(U_l^{(i)}) − g(U_{l−1}^{(i)}) } ,

with (U_l, U_{l−1})^{(i)} ∼ η̄_l i.i.d. such that ∫ η̄_l du_{l−1,l} = η_{l,l−1} for l = 0, …, L. (*g(U_{−1}^{(i)}) := 0.)

Cost is C_ML = Σ_{l=0}^L C_l N_l, where C_l is the cost to obtain a sample from η̄_l. Fix the bias by choosing L. Minimizing the cost C_ML({N_l}_{l=0}^L) for fixed variance Σ_{l=0}^L V_l/N_l gives N_l ∝ √(V_l/C_l).

Example: Milstein solution of an SDE for MSE = O(ε²): C = O(ε^{−3}) vs. C_ML = O(ε^{−2}).
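A sketch of the generic multilevel combination and of the allocation rule N_l ∝ √(V_l/C_l). The `sample_pair` callback, standing in for one draw of the increment g(U_l) − g(U_{l−1}), is a hypothetical interface, not from the slides.

```python
import numpy as np

def mlmc(sample_pair, L, N, seed=0):
    """Multilevel estimator: sum over levels l = 0..L of the sample mean of
    N[l] draws of the increment g(U_l) - g(U_{l-1}) (with g(U_{-1}) := 0),
    supplied by the user through sample_pair(l, rng)."""
    rng = np.random.default_rng(seed)
    return sum(np.mean([sample_pair(l, rng) for _ in range(N[l])])
               for l in range(L + 1))

def allocate(V, C, budget):
    """Optimal per-level sample sizes N_l ∝ sqrt(V_l / C_l), scaled so the
    total cost sum_l N_l C_l matches the budget."""
    V, C = np.asarray(V, float), np.asarray(C, float)
    raw = np.sqrt(V / C)
    return np.maximum(1, (budget / np.sum(raw * C)) * raw).astype(int)

# Deterministic toy: increments 2^{-l} telescope to 1 + 1/2 + 1/4 + 1/8 = 1.875.
est = mlmc(lambda l, rng: 2.0 ** (-l), L=3, N=[10, 10, 10, 10])
# With V = [1, 1/4], C = [1, 4] and budget 200, the rule gives N_l = [100, 25].
n_opt = allocate([1.0, 0.25], [1.0, 4.0], 200)
```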

SLIDE 7

Illustration of pairwise coupling

Pairwise coupling of trajectories of an SDE:

X¹_{n+1} = X¹_n + h f(X¹_n) + √h σ(X¹_n) ξ_n ,  ξ_n ∼ N(0, 1),  n = 0, …, N₁ ,
X⁰_{n+1} = X⁰_n + (2h) f(X⁰_n) + √h σ(X⁰_n)(ξ_{2n} + ξ_{2n+1}) ,  n = 0, …, N₁/2 .

[Figure, two panels over t ∈ [0, 1]:
(a) Wiener process, W¹_n = √h Σ_{i=0}^n ξ_i, W⁰_n = W¹_{2n};
(b) Stochastic process driven by the Wiener process.]
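The pairwise coupling above can be sketched as follows: the coarse step of size 2h consumes the two fine-step increments ξ_{2n} + ξ_{2n+1}, so both paths are driven by the same Brownian motion. The geometric Brownian motion test model is an assumption, not from the slides; it confirms that the fine/coarse difference has much smaller variance than the fine path itself, which is what MLMC exploits.

```python
import numpy as np

def coupled_pair(f, sigma, x0, T, h, rng):
    """One coupled (fine, coarse) Euler-Maruyama pair: the fine path takes
    steps of size h with increments xi_n; the coarse path takes steps of
    size 2h driven by the summed increments xi_{2n} + xi_{2n+1} of the same
    Brownian motion."""
    xf = xc = x0
    for _ in range(int(round(T / h)) // 2):
        xi0, xi1 = rng.standard_normal(2)
        xf = xf + h * f(xf) + np.sqrt(h) * sigma(xf) * xi0   # fine step 2n
        xf = xf + h * f(xf) + np.sqrt(h) * sigma(xf) * xi1   # fine step 2n+1
        xc = xc + 2 * h * f(xc) + np.sqrt(h) * sigma(xc) * (xi0 + xi1)  # coarse
    return xf, xc

# Coupled pairs for a geometric Brownian motion (assumed toy model).
rng = np.random.default_rng(1)
pairs = [coupled_pair(lambda x: 0.05 * x, lambda x: 0.2 * x, 1.0, 1.0, 0.01, rng)
         for _ in range(2000)]
fine = np.array([p[0] for p in pairs])
coarse = np.array([p[1] for p in pairs])
```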


SLIDE 9

Bayesian inference is about approximating integrals

Suppose we know how to evaluate γ(x) for x ∈ X. Let

η(dx) = γ(x)dx / ∫_X γ(x)dx ,

let ϕ : X → R, and suppose we want to estimate

η(ϕ) := ∫_X ϕ(x) η(dx) .

X may be quite high-dimensional, e.g. R^d with d = 100 easily, or even 1000, 10000, etc.

SLIDE 10

Monte Carlo

If we could obtain i.i.d. samples x_i ∼ η, then we could use

η(ϕ) ≈ (1/N) Σ_{i=1}^N ϕ(x_i) .

The convergence rate (of the MSE) is O(1/N), independently of d. Unfortunately we cannot get i.i.d. samples.

SLIDE 11

Importance sampling and ratio estimators

Suppose we can get i.i.d. samples x_i ∼ ν, where 0 < G(x) := γ(x)/ν(x) < C. Then we can use the self-normalized importance sampling estimator

η(ϕ) ≈ Σ_{i=1}^N G(x_i)ϕ(x_i) / Σ_{i=1}^N G(x_i) .

The rate will still be O(1/N), but typically with a constant O(e^d), depending on E(G(x) − EG(x))². We may as well use quadrature.
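A minimal sketch of the self-normalized estimator above; the Gaussian target/proposal pair is an illustrative assumption, chosen so the weights G are bounded as required.

```python
import numpy as np

def snis(phi, gamma, nu_sample, nu_pdf, N, seed=0):
    """Self-normalized importance sampling: eta(phi) estimated by
    sum_i G(x_i) phi(x_i) / sum_i G(x_i), with weights G = gamma / nu."""
    rng = np.random.default_rng(seed)
    x = nu_sample(rng, N)
    G = gamma(x) / nu_pdf(x)          # unnormalized weights, bounded here
    return np.sum(G * phi(x)) / np.sum(G)

# Target eta = N(1, 1), known only through gamma(x) = exp(-(x - 1)^2 / 2);
# proposal nu = N(0, 4). The estimator should recover E[X] = 1.
est = snis(phi=lambda x: x,
           gamma=lambda x: np.exp(-0.5 * (x - 1.0) ** 2),
           nu_sample=lambda rng, N: 2.0 * rng.standard_normal(N),
           nu_pdf=lambda x: np.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi)),
           N=50000)
```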

SLIDE 12

Markov chain Monte Carlo

Suppose we can construct a Markov kernel K, that is, an operator with the property K : B(X) → B(X) and K* : P(X) → P(X), where B(X) are the bounded measurable functions and P(X) are the probability measures, such that

(ηK)(dx) = ∫_X η(dx′) K(x′, dx) = η(dx) ,

and for all A ⊂ X and x, x′ ∈ X,

∫_A K(x, dz) ≤ ∫_A K(x′, dz) .

SLIDE 13

Markov chain Monte Carlo

Then we can run the Markov chain to collect samples, x_0 ∈ X and x_i ∼ K(x_{i−1}, ·) = K^i(x_0, ·), and use these for Monte Carlo:

η(ϕ) ≈ (1/N) Σ_{i=N_b+1}^{N_b+N} ϕ(x_i) .

Again Monte Carlo provides rate O(1/N), but now under quite general conditions one may achieve a polynomial constant O(d).

SLIDE 14

Example: Metropolis-Hastings

Let Q denote a Markov kernel on X.

1. Let x_0 ∈ X.
2. Sample x* ∼ Q(x_i, ·).
3. Set x_{i+1} = x* with probability

   min{ 1, [γ(x*) Q(x*, x_i)] / [γ(x_i) Q(x_i, x*)] } ,

   otherwise x_{i+1} = x_i.
4. Set i = i + 1 and return to the start of step 2.
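A sketch of the algorithm above for a symmetric random-walk proposal Q, in which case the Q-ratio cancels from the acceptance probability; the one-dimensional Gaussian target is an assumed toy example, and log densities are used for numerical stability.

```python
import numpy as np

def metropolis_hastings(log_gamma, x0, n_iter, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings targeting eta ∝ gamma. The proposal
    Q(x, .) = N(x, step^2) is symmetric, so the acceptance probability
    min{1, gamma(x*) Q(x*, x) / [gamma(x) Q(x, x*)]} reduces to the gamma
    ratio, compared here on the log scale."""
    rng = np.random.default_rng(seed)
    x, lg = x0, log_gamma(x0)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        x_star = x + step * rng.standard_normal()
        lg_star = log_gamma(x_star)
        if np.log(rng.uniform()) < lg_star - lg:   # accept
            x, lg = x_star, lg_star
        chain[i] = x
    return chain

# Target: standard normal via gamma(x) = exp(-x^2/2); the normalizer is never used.
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_iter=20000)
```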


SLIDE 16

Parameter inference

Estimate the posterior expectation of a function ϕ of the joint path X_{1:T} and parameters θ of an intractable S(P)DE

dX = f_θ(X)dt + σ_θ(X)dW ,  X_0 ∼ µ_θ ,

given noisy partial observations Y_n ∼ g_θ(X_n, ·), n = 1, …, T.

Aim: estimate E[ϕ(θ, X_{0:T}) | y_{1:T}], where y_{1:T} := {y_1, …, y_T}.

The hidden process {X_n} is a Markov chain. Discretize with resolution h and denote the transition kernel F_{θ,h}(x_{p−1}, dx_p); this can be simulated from, but its density cannot be evaluated.

SLIDE 17

Return to ML (SDE, for simplicity)

The joint measure (suppressing the fixed y_p in the notation) is

Π_h(dθ, dx_{0:n}) ∝ Π(dθ) µ_θ(dx_0) Π_{p=1}^n g_θ(x_p, y_p) F_{θ,h}(x_{p−1}, dx_p) .

For +∞ > h_0 > ⋯ > h_L > 0, we would like to compute

E_{Π_{h_L}}[ϕ(θ, X_{0:n})] = Σ_{l=0}^L { E_{Π_{h_l}}[ϕ(θ, X_{0:n})] − E_{Π_{h_{l−1}}}[ϕ(θ, X_{0:n})] } ,

where E_{Π_{h_{−1}}}[·] := 0.

SLIDE 19

Approximate coupling

Consider a single pair E_{Π_h}[ϕ(θ, X_{0:n})] − E_{Π_{h′}}[ϕ(θ, X_{0:n})], h < h′. Let z = (x, x′) and let Q_{θ,h,h′}(z, dz̄) be a coupling of (F_{θ,h}(x, dx̄), F_{θ,h′}(x′, dx̄′)). Let G_{p,θ}(z) = max{g_θ(x, y_p), g_θ(x′, y_p)}. We will sample from the joint coarse/fine filter

Π_{h,h′}(dθ, dz_{0:n}) ∝ Π(dθ) ν_θ(dz_0) Π_{p=1}^n G_{p,θ}(z_p) Q_{θ,h,h′}(z_{p−1}, dz_p) ,

where ν_θ is the initial coupling ν_θ(d(x, x′)) = µ_θ(dx) δ_x(dx′).

SLIDE 20

Change of measure

We have

E_{Π_h}[ϕ(θ, X_{0:n})] − E_{Π_{h′}}[ϕ(θ, X′_{0:n})]
= E_{Π_{h,h′}}[ϕ(θ, X_{0:n}) H_{1,θ}(θ, Z_{0:n})] / E_{Π_{h,h′}}[H_{1,θ}(θ, Z_{0:n})]
− E_{Π_{h,h′}}[ϕ(θ, X′_{0:n}) H_{2,θ}(θ, Z_{0:n})] / E_{Π_{h,h′}}[H_{2,θ}(θ, Z_{0:n})] ,

where

H_{1,θ}(θ, z_{0:n}) = Π_{p=1}^n g_θ(x_p, y_p) / G_{p,θ}(z_p) ,
H_{2,θ}(θ, z_{0:n}) = Π_{p=1}^n g_θ(x′_p, y_p) / G_{p,θ}(z_p) .


SLIDE 22

Sequential Importance Resampling [DDG01]

[Figure 1.1 (Doucet, de Freitas & Gordon), N = 10 particles: the bootstrap filter starts at time t−1 with an unweighted measure approximating p(x_{t−1}|y_{1:t−2}). For each particle the importance weights are computed using the information at time t−1, yielding a weighted measure approximating p(x_{t−1}|y_{1:t−1}). The resampling step then selects only the fittest particles, giving an unweighted measure still approximating p(x_{t−1}|y_{1:t−1}). Finally, the sampling (prediction) step introduces variety, resulting in a measure approximating p(x_t|y_{1:t−1}).]

Resampling uses only the importance weights and indices (both one-dimensional quantities). This enables one to easily carry out sequential inference for very complex models.

An illustrative example: for demonstration purposes, the bootstrap algorithm is applied to the following nonlinear, non-Gaussian model (Gordon et al. 1993, Kitagawa 1996):

X_t = X_{t−1}/2 + 25 X_{t−1}/(1 + X²_{t−1}) + 8 cos(1.2t) + V_t ,
Y_t = X²_t/20 + W_t .

SLIDE 23

Particle filter, for fixed θ

Let M ≥ 1 and θ be fixed, and introduce ancestor indices a^{1:M}_{0:n−1} ∈ {1, …, M}^{Mn}. The bootstrap particle filter [Del04] approximates

Π_{h,h′}(dz_{0:n} | θ) ∝ ν_θ(dz_0) Π_{p=1}^n G_{p,θ}(z_p) Q_{θ,h,h′}(z_{p−1}, dz_p)

by sampling from

P(a^{1:M}_{0:n−1}, dz^{1:M}_{0:n} | θ) = [ Π_{i=1}^M ν_θ(dz^i_0) ] Π_{p=1}^n Π_{i=1}^M [ G_{p−1,θ}(z^{a^i_{p−1}}_{p−1}) / Σ_{j=1}^M G_{p−1,θ}(z^j_{p−1}) ] Q_{θ,h,h′}(z^{a^i_{p−1}}_{p−1}, dz^i_p) ,

where G_{0,θ} := 1, i.e. z^{a^i_{p−1}}_{p−1} is resampled with probability G_{p−1,θ}(z^{a^i_{p−1}}_{p−1}) / Σ_{j=1}^M G_{p−1,θ}(z^j_{p−1}).
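A single-level sketch of the bootstrap filter above. The slides run it on coupled pairs z = (x, x′); here a plain state x is used for brevity, and the linear-Gaussian toy model is an assumption, not from the slides.

```python
import numpy as np

def bootstrap_pf(y, transition, g, x0_sample, M, seed=0):
    """Bootstrap particle filter: propagate M particles through the transition
    kernel, weight by G_p = g(x_p, y_p), resample proportionally to the
    weights, and accumulate (in log form) the unbiased likelihood estimate
    p^M(y_{1:n}) = prod_p (1/M) sum_j G_p(x_p^j)."""
    rng = np.random.default_rng(seed)
    x = x0_sample(rng, M)
    log_like = 0.0
    for yp in y:
        x = transition(rng, x)                        # propagate all particles
        G = g(x, yp)                                  # observation weights
        log_like += np.log(np.mean(G))
        x = x[rng.choice(M, size=M, p=G / G.sum())]   # multinomial resampling
    return log_like

# Toy linear-Gaussian model: X_p = 0.9 X_{p-1} + N(0,1), Y_p = X_p + N(0, 0.25).
rng = np.random.default_rng(42)
T = 20
x_true = np.zeros(T + 1)
for p in range(1, T + 1):
    x_true[p] = 0.9 * x_true[p - 1] + rng.standard_normal()
y_obs = x_true[1:] + 0.5 * rng.standard_normal(T)
ll = bootstrap_pf(
    y_obs,
    transition=lambda rng, x: 0.9 * x + rng.standard_normal(x.shape),
    g=lambda x, yp: np.exp(-0.5 * ((yp - x) / 0.5) ** 2) / (0.5 * np.sqrt(2 * np.pi)),
    x0_sample=lambda rng, M: rng.standard_normal(M),
    M=200)
```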

SLIDE 24

Particle marginal MH (PMMH) [ADH10]

Draw J with probability proportional to G_{n,θ}(z^j_n) for j = 1, …, M. Let ẑ_n = z^J_n, and trace its ancestral lineage: ẑ_{n−1} = z^{a^J_{n−1}}_{n−1}, ẑ_{n−2} = z^{a^{a^J_{n−1}}_{n−2}}_{n−2}, and so on.

Define

p^M_{h,h′}(y_{0:n} | θ) = Π_{p=1}^n [ (1/M) Σ_{j=1}^M G_{p,θ}(z^j_p) ] .

Denote U = (a^{1:M}_{0:n−1}, z^{1:M}_{0:n}, θ).

Run an MH chain on the extended state space {U_i} targeting ∝ P(a^{1:M}_{0:n−1}, z^{1:M}_{0:n} | θ) p^M_{h,h′}(y_{0:n} | θ) Π(dθ). Draw J ∼ P(J | U_i) and construct ẑ^i_{0:n} as above. The target has the property [ADH10]

P(U, J) = P((U, J)\(ẑ_{0:n}, θ) | ẑ_{0:n}, θ) Π_{h,h′}(ẑ_{0:n}, θ) .

SLIDE 25

Particle marginal MH (PMMH) [ADH10]

1

Sample θ0 ∼ π(dθ) and (a1:M

0:n−1, z1:M 0:n ) from particle filter

P(a1:M

0:n−1, dz1:M 0:n |θ0), and store pM h,h′(y0:n|θ0). 2

Select a path z0

0:n: draw zj n with probability proportional to Gn,θ0(zj n), let

  • z0

n = zj n, and trace back its ancestral lineage

  • z0

n−1 = z aj

n−1

n−1 ,

  • z0

n−2 = z a

aj n−1 n−2

n−2 , and so on ; Set i = 1 . 3

Sample θ∗|θi−1 according to R(dθ∗|θi−1) = r(θ∗|θi−1)dθ∗, then sample from particle filter P(a1:M

0:n−1, dz1:M 0:n |θ∗). Select one path

z∗

0:n as above. 4

Set θi = θ∗, zi

0:n =

z∗

0:n with probability:

min

  • 1, pM

h,h′(y0:n|θ∗)

pM

h,h′(y0:n|θi−1)

π(θ∗)r(θi−1|θ∗) π(θi−1)r(θ∗|θi−1)

  • therwise θi = θi−1,

zi

0:n =

zi−1

0:n . 5

Set i = i + 1 and return to the start of 3.
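Steps 1-5 above can be sketched for the marginal θ-chain; the path selection of steps 2-3 is omitted for brevity, and the AR(1) toy model, the flat prior, and all tuning constants are illustrative assumptions, not from the slides.

```python
import numpy as np

def pf_loglike(theta, y, M, rng):
    """Bootstrap-filter estimate of log p(y_{1:T} | theta) for the toy model
    X_p = theta X_{p-1} + N(0, 1), Y_p = X_p + N(0, 0.25)."""
    x = rng.standard_normal(M)
    ll = 0.0
    for yp in y:
        x = theta * x + rng.standard_normal(M)        # propagate
        G = np.exp(-0.5 * ((yp - x) / 0.5) ** 2)      # observation weights
        ll += np.log(np.mean(G))
        x = x[rng.choice(M, size=M, p=G / G.sum())]   # resample
    return ll

def pmmh(y, n_iter, M=100, step=0.1, seed=0):
    """Particle marginal MH on theta: random-walk proposal, accepted with the
    usual MH ratio but with the intractable likelihood replaced by its
    unbiased particle estimate."""
    rng = np.random.default_rng(seed)
    theta = 0.5
    ll = pf_loglike(theta, y, M, rng)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        theta_star = theta + step * rng.standard_normal()
        if abs(theta_star) < 1.0:                     # flat prior on (-1, 1)
            ll_star = pf_loglike(theta_star, y, M, rng)
            if np.log(rng.uniform()) < ll_star - ll:
                theta, ll = theta_star, ll_star
        chain[i] = theta
    return chain

# Data simulated with theta = 0.8; the PMMH chain should concentrate near it.
rng = np.random.default_rng(3)
T = 50
x_true = np.zeros(T + 1)
for p in range(1, T + 1):
    x_true[p] = 0.8 * x_true[p - 1] + rng.standard_normal()
y_obs = x_true[1:] + 0.5 * rng.standard_normal(T)
chain = pmmh(y_obs, n_iter=1000)
```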


SLIDE 27

PMMH increment estimator

The PMMH ratio estimator

[ (1/N) Σ_{i=1}^N ϕ(θ^i, x̂^i_{0:n}) H_{1,θ}(θ^i, ẑ^i_{0:n}) ] / [ (1/N) Σ_{i=1}^N H_{1,θ}(θ^i, ẑ^i_{0:n}) ]
− [ (1/N) Σ_{i=1}^N ϕ(θ^i, x̂′^i_{0:n}) H_{2,θ}(θ^i, ẑ^i_{0:n}) ] / [ (1/N) Σ_{i=1}^N H_{2,θ}(θ^i, ẑ^i_{0:n}) ]

converges as N → ∞ to

E_{Π_{h,h′}}[ϕ(θ, X_{0:n}) H_{1,θ}(θ, Z_{0:n})] / E_{Π_{h,h′}}[H_{1,θ}(θ, Z_{0:n})] − E_{Π_{h,h′}}[ϕ(θ, X′_{0:n}) H_{2,θ}(θ, Z_{0:n})] / E_{Π_{h,h′}}[H_{2,θ}(θ, Z_{0:n})]
= E_{Π_h}[ϕ(θ, X_{0:n})] − E_{Π_{h′}}[ϕ(θ, X′_{0:n})] .

SLIDE 28

Multilevel estimator

E^{N_l}_l(ϕ) = [ (1/N_l) Σ_{i=1}^{N_l} ϕ(θ^i, x̂^i_{0:n}) H_{1,θ}(θ^i, ẑ^i_{0:n}) ] / [ (1/N_l) Σ_{i=1}^{N_l} H_{1,θ}(θ^i, ẑ^i_{0:n}) ]
− [ (1/N_l) Σ_{i=1}^{N_l} ϕ(θ^i, x̂′^i_{0:n}) H_{2,θ}(θ^i, ẑ^i_{0:n}) ] / [ (1/N_l) Σ_{i=1}^{N_l} H_{2,θ}(θ^i, ẑ^i_{0:n}) ]

is a consistent estimator of (with E_0 := E_{Π_{h_0}}[ϕ(θ, X_{0:n})])

E_l(ϕ) := E_{Π_{h_l}}[ϕ(θ, X_{0:n})] − E_{Π_{h_{l−1}}}[ϕ(θ, X_{0:n})] .

⇒ Σ_{l=0}^L E^{N_l}_l(ϕ) is a consistent estimator of E_{Π_{h_L}}[ϕ(θ, X_{0:n})].

SLIDE 29

Multilevel estimator error analysis

Consider

Σ_{l=0}^L Ē^{N_l}_l(ϕ) ,  Ē^{N_l}_l(ϕ) = E^{N_l}_l(ϕ) − E_l(ϕ) .

One must bound

E[ ( Σ_{l=0}^L Ē^{N_l}_l(ϕ) )² ] = Σ_{l=0}^L { E[Ē^{N_l}_l(ϕ)²] + E[Ē^{N_l}_l(ϕ)] Σ_{q≠l} E[Ē^{N_q}_q(ϕ)] } .

SLIDE 30

Assumptions

(A1) ∀ y ∈ T, ∃ C > 0 such that ∀ x ∈ S, θ ∈ Θ: C ≤ g_θ(x, y) ≤ C^{−1}. And ∀ y ∈ T, g_θ(x, y) is globally Lipschitz on S × Θ.

(A2) ∀ 0 ≤ k ≤ n, ∃ β > 0 such that ∀ ϕ ∈ B_b(Θ × S^{k+1}) ∩ Lip(Θ × S^{k+1}) ∃ C > 0:

∫_{Θ×S^{2k+2}} |ϕ(θ, x_{0:k}) − ϕ(θ, x′_{0:k})|² Π(dθ) ν_θ(dz_0) Π_{p=1}^k Q_{θ,h,h′}(z_{p−1}, dz_p) ≤ C(h′)^β .

(A3) ∀ n > 0, ∃ ξ ∈ (0, 1) and ν ∈ P(W) s.t. ∀ w ∈ W, ϕ ∈ B_b(W) ∩ Lip(W), h, h′, the PMMH kernel K satisfies

∫_W ϕ(w′) K(w, dw′) ≥ ξ ∫_W ϕ(w′) ν(dw′) ,

and K is η-reversible, where η is the joint on the extended space.

SLIDE 31

Main result

Theorem (JKLZ18). Assume (A1-3). Then ∀ n > 0, ∃ β > 0 such that ∀ ϕ ∈ B_b(Θ × S^{n+1}) ∩ Lip(Θ × S^{n+1}) ∃ C > 0 such that

E[Ē^{N_l}_l(ϕ)²] ≤ C h_l^β / N_l ,  E[Ē^{N_l}_l(ϕ)] ≤ C h_l^{β/2} / N_l ,

where β is from (A2).

SLIDE 32

(A4) ∃ γ, α, C > 0 such that the cost to simulate E^{N_l}_l is controlled by C(E^{N_l}_l) ≤ C N_l h_l^{−γ}, and the bias is controlled by |E_{Π_{h_L}}(ϕ(θ, X_{0:n})) − E_{Π_0}(ϕ(θ, X_{0:n}))| ≤ C h_L^α.

Corollary. Assume (A1-4). ∀ n > 0 and ϕ ∈ B_b(Θ × S^{n+1}) ∩ Lip(Θ × S^{n+1}) ∃ C > 0 such that ∀ ε > 0 one can choose (L, {N_l}_{l=0}^L) so that

E[ | Σ_{l=0}^L E^{N_l}_l(ϕ) − E_{Π_0}(ϕ(θ, X_{0:n})) |² ] ≤ Cε² ,

with a total cost (per time step)

COST ≤ C ε^{−2}             if β > γ ,
COST ≤ C ε^{−2} |log(ε)|²    if β = γ ,
COST ≤ C ε^{−(2+(γ−β)/α)}    if β < γ .


SLIDE 34

SMC samplers

Let U_n = [u_0, …, u_n] ∈ U_n for n = 0, 1, 2, …. Consider target distributions η_n ∝ κ_n on U_n. Interlace sequential importance resampling (selection) along the sequence with mutation by MCMC kernels. This enables sampling sequentially with complexity O(n²) per sample.¹

Initialize i.i.d. U^i_0 ∼ η_0, i = 1, …, N, and u^i_1 ∼ q(U^i_0, ·). For n = 1, …:

• Resample {U^i_n}_{i=1}^N according to the weights {w^i_n}_{i=1}^N, where w^i_n = G^i_n / Σ_{j=1}^N G^j_n, G_0 = 1, and

  G^i_n = κ_n(U^i_n) / [ κ_{n−1}(U^i_{n−1}) q_n(U^i_{n−1}, u^i_n) ] .

• Draw U^i_{n+1} ∼ M_{n+1}(Û^i_n, ·), where

  M_{n+1}(U_n, dU′_{n+1}) = K_n(U_n, dU′_n) ⊗ q_{n+1}(U′_n, du′_{n+1})

  is an MCMC kernel such that η_n K_n = η_n.

¹ In principle, under suitable assumptions [BCJ14].
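A sketch of the selection/mutation recursion above. The slide's sampler acts on growing spaces U_n with proposal q; as a simplification this sketch keeps a fixed one-dimensional state and uses a geometric bridge of Gaussian targets, so the schedule and the random-walk mutation kernel are illustrative assumptions.

```python
import numpy as np

def smc_sampler(log_kappa, n_targets, N, mh_step=1.0, seed=0):
    """SMC sampler across targets eta_n ∝ kappa_n: weight by kappa_n/kappa_{n-1}
    (selection), resample, then mutate with an eta_n-invariant random-walk
    Metropolis kernel (5 sweeps per target)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(N)              # i.i.d. draws from eta_0 = N(0, 1)
    for n in range(1, n_targets):
        logw = log_kappa(n, x) - log_kappa(n - 1, x)
        w = np.exp(logw - logw.max())
        x = x[rng.choice(N, size=N, p=w / w.sum())]          # selection
        for _ in range(5):                                   # mutation
            x_star = x + mh_step * rng.standard_normal(N)
            accept = np.log(rng.uniform(size=N)) < log_kappa(n, x_star) - log_kappa(n, x)
            x = np.where(accept, x_star, x)
    return x

# Geometric bridge from eta_0 = N(0, 1) to N(2, 1): kappa_n ∝ exp(-(x - 2 t_n)^2 / 2).
ts = np.linspace(0.0, 1.0, 11)
samples = smc_sampler(lambda n, x: -0.5 * (x - 2.0 * ts[n]) ** 2, n_targets=11, N=2000)
```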

SLIDE 35

Sequential Monte Carlo2 (SMC2)

Define U_n = (θ, a^{1:M}_{0:n−1}, z^{1:M}_{0:n}), and recall the PMMH target P_n(U_n, J), which has the joint posterior as its marginal on (θ, ẑ_{0:n}), where ẑ_{0:n} is the ancestral path of the particle with index J ∼ {G_{n,θ}(z^j_n) / Σ_{i=1}^M G_{n,θ}(z^i_n)} at time n.

Consider the PMMH marginal P_n(U_n), which has as its marginal the posterior on θ.

SMC²: run an SMC sampler with N samples {U^k_n}_{k=1}^N targeting P_n(U_n), using PMMH marginal kernels (without sampling J) as the mutations. When we wish to estimate expectations E_{Π_{h,h′}}(f(θ, Z_{0:n})), we extend to P_n(U_n, J) = P_n(J | U_n) P_n(U_n). For each k = 1, …, N:

• Sample J^k ∼ {G_{n,θ^k}(z^{j,k}_n) / Σ_{i=1}^M G_{n,θ^k}(z^{i,k}_n)}.
• Construct the ancestral lineage ẑ^k_{0:n} (as before).
• Estimate (1/N) Σ_{k=1}^N f(θ^k, ẑ^k_{0:n}).


SLIDE 37

SMLMC2

For each l = 0, …, L: run SMC² on P_{l,n}(U_{l,n}); then at a given time n (the additional l index suppressed), for each k = 1, …, N_l:

• Sample J^k ∼ {G_{n,θ^k}(z^{j,k}_n) / Σ_{i=1}^M G_{n,θ^k}(z^{i,k}_n)}.
• Construct the ancestral lineage ẑ^k_{0:n} (as before).
• Estimate

E^{N_l}_l(ϕ) = [ (1/N_l) Σ_{i=1}^{N_l} ϕ(θ^i, x̂^i_{0:n}) H_{1,θ}(θ^i, ẑ^i_{0:n}) ] / [ (1/N_l) Σ_{i=1}^{N_l} H_{1,θ}(θ^i, ẑ^i_{0:n}) ]
− [ (1/N_l) Σ_{i=1}^{N_l} ϕ(θ^i, x̂′^i_{0:n}) H_{2,θ}(θ^i, ẑ^i_{0:n}) ] / [ (1/N_l) Σ_{i=1}^{N_l} H_{2,θ}(θ^i, ẑ^i_{0:n}) ] .

Construct the resulting MLMC estimator Σ_{l=0}^L E^{N_l}_l(ϕ).

We obtained a theorem (for MIMC) under very strong assumptions. Numerical results indicate the method works under weaker assumptions.


SLIDE 39

Ornstein-Uhlenbeck process

dX_t = θ(µ − X_t)dt + σ dW_t ,  X_0 = x_0 ,
Y_k | X_{δk} ∼ N(X_{δk}, τ²) ,  θ ∼ G(1, 1) ,  σ ∼ G(1, 0.5) .

N(m, τ²) denotes the Normal with mean m and variance τ². G(a, b) denotes the Gamma with shape a and scale b. Here x_0 = 0, µ = 0, δ = 0.5, and τ² = 0.2. 100 observations are simulated with θ = 1 and σ = 0.5.
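The data-generating setup above can be reproduced as follows. As an assumption for the sketch, the exact Gaussian OU transition over each inter-observation interval is used rather than a discretization; the function name is illustrative.

```python
import numpy as np

def simulate_ou_data(theta=1.0, sigma=0.5, mu=0.0, x0=0.0, delta=0.5,
                     tau2=0.2, n_obs=100, seed=0):
    """Simulate Y_k | X_{delta k} ~ N(X_{delta k}, tau^2), where X is the OU
    process dX_t = theta (mu - X_t) dt + sigma dW_t, using the exact
    Gaussian transition over each interval of length delta."""
    rng = np.random.default_rng(seed)
    a = np.exp(-theta * delta)                           # mean-reversion factor
    s = sigma * np.sqrt((1.0 - a * a) / (2.0 * theta))   # conditional std
    x, y = x0, np.empty(n_obs)
    for k in range(n_obs):
        x = mu + (x - mu) * a + s * rng.standard_normal()
        y[k] = x + np.sqrt(tau2) * rng.standard_normal()
    return y

# 100 observations with the slide's values theta = 1, sigma = 0.5.
y = simulate_ou_data()
```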

SLIDE 40

Langevin SDE

dX_t = ½ ∇log π(X_t)dt + σ dW_t ,  X_0 = x_0 ,
Y_k | X_k ∼ N(0, τ² exp(X_k)) ,  θ ∼ G(1, 1) ,  σ ∼ G(1, 0.5) .

π(x) denotes the probability density function of a Student's t-distribution with θ degrees of freedom. Here x_0 = 0. 1,000 observations are simulated with θ = 10, σ = 1, and τ² = 1.

SLIDE 41

Figure: Autocorrelation of a typical PMCMC chain, shown for each parameter of the Langevin SDE and the Ornstein-Uhlenbeck process.

SLIDE 42

Figure: Cost vs. MSE for the two parameters of each of the two SDEs (Langevin and Ornstein-Uhlenbeck), comparing the ML-PMCMC and PMCMC algorithms.

SLIDE 43

Model      Parameter   ML-PMCMC   PMCMC
OU         θ           −1.022     −1.463
OU         σ           −1.065     −1.522
Langevin   θ           −1.060     −1.508
Langevin   σ           −1.023     −1.481

Table: Estimated rates of convergence of MSE with respect to cost for the various parameters, fitted to the curves.


SLIDE 45

Multi-index Monte Carlo (MIMC) idea

If the spatio-temporal approximation dimension d > 1, then MIMC is preferable to MLMC [HNT15]. For α ∈ N^d,

∆_i E_α(ϕ_α(u)) = E_α(ϕ_α(u)) − E_{α−e_i}(ϕ_{α−e_i}(u)) ,  ∆ = ∆_d ⋯ ∆_1 ,

E(ϕ(u)) = Σ_{α∈N^d} ∆E_α(ϕ_α(u)) ≈ Σ_{α∈I} ∆E_α(ϕ_α(u)) .
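The mixed-difference operator ∆ = ∆_d ⋯ ∆_1 and its telescoping property can be checked numerically; the table of stand-in "estimates" is an illustrative assumption.

```python
import numpy as np

def mixed_difference(E, alpha):
    """Apply the tensorized difference Delta = Delta_d ... Delta_1 to a table
    of level estimates E[alpha]: Delta E_alpha = sum over subsets s of
    {1, ..., d} of (-1)^{|s|} E_{alpha - e_s}, with E := 0 outside the range."""
    d = len(alpha)
    total = 0.0
    for mask in range(1 << d):                 # enumerate subsets of dimensions
        shifted = [alpha[i] - ((mask >> i) & 1) for i in range(d)]
        if min(shifted) >= 0:
            total += (-1) ** bin(mask).count("1") * E[tuple(shifted)]
    return total

# Telescoping check: summing Delta E_alpha over the full index box recovers
# the highest-resolution entry E[2, 2].
E = np.arange(9.0).reshape(3, 3)               # stand-in "estimates" E_alpha
box_sum = sum(mixed_difference(E, (i, j)) for i in range(3) for j in range(3))
```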

SLIDE 46

MIMC estimator

Construct an empirical estimator of Σ_{α∈I} ∆E_α(ϕ_α(u)).

Key: for each α ∈ I, obtain N_α i.i.d. samples from a suitably coupled target, (U_{α(k_α)}, U_{α(k_α−1)}, …, U_{α(1)})^{(i)} ∼ η̄_α, such that ∫ η̄_α du_{α(1)} ⋯ du_{α(l−1)} du_{α(l+1)} ⋯ du_{α(k_α)} = η_{α(l)} for l = 0, …, L, where k_α ≤ 2^d. Similar to MLMC, one finds the optimal N_α ∝ √(V_α/C_α).

The optimal index set² I consists of superlevel sets of P_α = B_α/√(V_α C_α), where B_α is the bias associated to increment α. The canonical complexity O(ε^{−2}) can be obtained independent of the dimension d for optimal index sets. For tensor-product index sets an additional (dimension-dependent) constraint on the rates is required, or else the cost is dominated by the single highest-resolution sample.

² In the sense that any set with smaller work has larger bias.


SLIDE 48

Summary

A new approximate coupling strategy can be used to apply MLMC to PMCMC for static parameter estimation [JKLZ18.i]. The same strategy can be employed for multi-index MCMC [JKLZ18.ii], and was recently extended to MISMC² [JLX19.s]. Exciting prospects include:

• MIMC versions with appropriate index sets;
• other versions of MIMC ∩ SMC;
• links/comparisons with other methods, in cases where they are applicable;
• other practical enhancements.

PhD students and postdocs wanted, please inquire! AIMS Foundations of Data Science is now accepting papers!

SLIDE 49

References

[G08]: Giles. "Multilevel Monte Carlo path simulation." Op. Res. 56, 607-617 (2008).
[H00]: Heinrich. "Multilevel Monte Carlo methods." LSSC proceedings (2001).
[CGST11]: Cliffe, Giles, Scheichl & Teckentrup. "MLMC and applications to elliptic PDEs." Computing and Visualization in Science 14(1), 3 (2011).
[Del04]: Del Moral. "Feynman-Kac Formulae." Springer: New York (2004).
[B02]: Beaumont. "Estimation of population growth..." Genetics 164(3), 1139-1160 (2003).
[AR08]: Andrieu & Roberts. "The pseudo-marginal approach." Annals of Stat. 37(2), 697-725 (2009).
[ADH10]: Andrieu, Doucet & Holenstein. "Particle MCMC methods." JRSSB 72(3), 269-342 (2010).

SLIDE 50

References

[JKLZ18.i]: Jasra, Kamatani, Law & Zhou. "MLMC for static Bayesian parameter estimation." SISC 40, A887-A902 (2018).
[JKLZ18.ii]: Jasra, Kamatani, Law & Zhou. "A Multi-Index Markov Chain Monte Carlo Method." IJUQ 8(1), 61-73 (2018).
[JLX19.s]: Jasra, Law & Xu. "Multi-index SMC²." Submitted.
[BJLTZ17]: Beskos, Jasra, Law, Tempone & Zhou. "Multilevel Sequential Monte Carlo samplers." SPA 127(5), 1417-1440 (2017).
[HSS13]: Hoang, Schwab & Stuart. Inverse Prob. 29, 085010 (2013).
[DKST15]: Dodwell, Ketelsen, Scheichl & Teckentrup. "A hierarchical MLMCMC algorithm." SIAM/ASA JUQ 3(1), 1075-1108 (2015).

SLIDE 51

References

[CHLNT17]: Chernov, Hoel, Law, Nobile & Tempone. "Multilevel ensemble Kalman filtering for spatio-temporal processes." arXiv:1608.08558 (2017).
[JLS17]: Jasra, Law & Suciu. "Advanced Multilevel Monte Carlo Methods." arXiv:1704.07272 (2017). To appear in ISR (2019).
[DJLZ16]: Del Moral, Jasra, Law & Zhou. "Multilevel Sequential Monte Carlo samplers for normalizing constants." ToMACS 27(3), 20 (2017).
[JKLZ15]: Jasra, Kamatani, Law & Zhou. "Multilevel particle filter." SINUM 55(6), 3068-3096 (2017).
[DDP18]: Deligiannidis, Doucet & Pitt. "The Correlated Pseudo-Marginal Method." JRSSB 80(5), 839-870 (2018).
[HTL16]: Hoel, Law & Tempone. "Multilevel ensemble Kalman filter." SINUM 54(3), 1813-1839 (2016).
[GCR]: Gregory, Cotter & Reich. "Multilevel ensemble transform particle filter."

SLIDE 52

Thank you

SLIDE 53

Warmly welcoming submissions!

FOUNDATIONS of

DATA SCIENCE

Editorial Board:

Niall Adams Feng Bao Adrian Bishop Michael W. Berry Marc Bocquet Fengqing Chao Yunjin Choi Paul Constantine Colin Cotter Tiangang Cui Masoumeh Dashti Patrick-Rubin Delanchy Pierre Del Moral Jack Dongarra Evangelos Evangelou Brittany Fasy Colin Fox Roger Ghanem Omar Ghattas Dimitrios Giannakis Mark Girolami Heather Harrington Nick Heard Jeremy Heng Michael Hintermüller Jeremie Houssineau Maria De Iorio Chris Jones Kengo Kamatani Nikolas Kantas Jessica Cisewski Kehe Michael Kirby Anthony Lee Benedict Leimkuhler Bill Lionheart Po-Ling Loh Jan Mandel Youssef Marzouk Scott McKinley Sayan Mukherjee James Murphy Habib Najm George Ostrouchov Michela Ottobre Houman Owhadi Daniel Paulin Marcelo Pereyra Victor Perez-Abreu Petr Plechac Arvind Ramanathan Sebastian Reich Juan Restrepo Tim Sauer Claudia Schillings Carola Schöenlieb Sumeetpal Singh Konstantinos Spiliopoulos Hans De Sterck Jie Xiong Nicholas Zabaras AMERICAN INSTITUTE OF MATHEMATICAL SCIENCES Ajay Jasra (Editor in Chief) Kody J. H. Law (Editor in Chief) Vasileios Maroulas (Editor in Chief)