Particle Learning and Smoothing
Hedibert Freitas Lopes, The University of Chicago Booth School of Business
September 18th, 2009
Instituto Nacional de Pesquisas Espaciais, São José dos Campos, Brazil
Joint with Nick Polson, Carlos Carvalho and Mike Johannes
Outline of the talk

Let the general dynamic model be

Observation equation: p(y_{t+1} | x_{t+1}, θ)
System equation: p(x_{t+1} | x_t, θ)

We will talk about...
◮ MCMC in normal dynamic linear models.
◮ Particle filters: learning states x_{t+1}.
◮ Particle filters: learning parameters θ.
◮ Particle Learning (PL) framework.
◮ More general dynamic models.
◮ Final remarks.
Normal dynamic linear model (NDLM)

West and Harrison (1997):

y_{t+1} | x_{t+1} ∼ N(F_{t+1} x_{t+1}, σ²_{t+1})
x_{t+1} | x_t ∼ N(G_{t+1} x_t, τ²_{t+1})

Hidden Markovian structure. Sequential learning:

p(x_t | y^t) ⇒ p(x_{t+1} | y^t) ⇒ p(y_{t+1} | y^t) ⇒ p(x_{t+1} | y^{t+1})

where y^t = (y_1, ..., y_t).
Example i. Local level model

Let θ = (σ², τ²), x_0 ∼ N(m_0, C_0) and

y_{t+1} | x_{t+1}, θ ∼ N(x_{t+1}, σ²)
x_{t+1} | x_t, θ ∼ N(x_t, τ²)

Kalman filter recursions:
◮ Posterior at t: (x_t | y^t) ∼ N(m_t, C_t)
◮ Prior at t+1: (x_{t+1} | y^t) ∼ N(m_t, R_{t+1})
◮ Predictive at t+1: (y_{t+1} | y^t) ∼ N(m_t, Q_{t+1})
◮ Posterior at t+1: (x_{t+1} | y^{t+1}) ∼ N(m_{t+1}, C_{t+1})

where R_{t+1} = C_t + τ², Q_{t+1} = R_{t+1} + σ², A_{t+1} = R_{t+1}/Q_{t+1}, C_{t+1} = A_{t+1} σ², and m_{t+1} = (1 − A_{t+1}) m_t + A_{t+1} y_{t+1}.
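The Kalman recursions above can be sketched in a few lines of Python (an illustrative implementation, not from the talk; the function name and the default prior moments are my own choices):

```python
import numpy as np

def kalman_filter(y, sigma2, tau2, m0=0.0, C0=10.0):
    """Kalman filter for the local level model: returns the filtered
    moments m_t, C_t of p(x_t | y^t) via the recursions on the slide."""
    n = len(y)
    m, C = np.empty(n), np.empty(n)
    m_prev, C_prev = m0, C0
    for t in range(n):
        R = C_prev + tau2                   # prior variance R_{t+1}
        Q = R + sigma2                      # predictive variance Q_{t+1}
        A = R / Q                           # adaptive coefficient A_{t+1}
        m[t] = (1 - A) * m_prev + A * y[t]  # posterior mean m_{t+1}
        C[t] = A * sigma2                   # posterior variance C_{t+1}
        m_prev, C_prev = m[t], C[t]
    return m, C
```

For σ² = 1 and τ² = 0.5 the filtered variance C_t settles quickly at its steady-state value 0.5, the positive root of C² + 0.5C − 0.5 = 0.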
Example i. Backward smoothing

For t = n, x_n | y^n ∼ N(m^n_n, C^n_n), where m^n_n = m_n and C^n_n = C_n.

For t < n, x_t | y^n ∼ N(m^n_t, C^n_t), where

m^n_t = (1 − B_t) m_t + B_t m^n_{t+1}
C^n_t = (1 − B_t) C_t + B_t² C^n_{t+1}

and B_t = C_t / (C_t + τ²).
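A matching sketch of the backward-smoothing pass (again illustrative; it consumes filtered moments m_t, C_t from a forward Kalman pass):

```python
import numpy as np

def backward_smoother(m, C, tau2):
    """Backward recursions for the smoothed moments m^n_t, C^n_t of
    p(x_t | y^n), starting from m^n_n = m_n and C^n_n = C_n."""
    ms, Cs = m.copy(), C.copy()
    for t in range(len(m) - 2, -1, -1):
        B = C[t] / (C[t] + tau2)            # B_t = C_t / (C_t + tau^2)
        ms[t] = (1 - B) * m[t] + B * ms[t + 1]
        Cs[t] = (1 - B) * C[t] + B**2 * Cs[t + 1]
    return ms, Cs
```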
Example i. Backward sampling

For t = n, x_n | y^n ∼ N(a^n_n, R^n_n), where a^n_n = m_n and R^n_n = C_n.

For t < n, x_t | x_{t+1}, y^n ∼ N(a^n_t, R^n_t), where

a^n_t = (1 − B_t) m_t + B_t x_{t+1}
R^n_t = B_t τ²

and B_t = C_t / (C_t + τ²).

This is basically the forward filtering, backward sampling (FFBS) algorithm commonly used to sample from p(x^n | y^n) (Carter and Kohn, 1994; Frühwirth-Schnatter, 1994).
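One joint draw from p(x^n | y^n) via backward sampling can be sketched as follows (illustrative Python; the filtered moments m_t, C_t are assumed to come from a forward Kalman pass):

```python
import numpy as np

def ffbs_draw(m, C, tau2, rng):
    """One draw of (x_1, ..., x_n) from p(x^n | y^n): sample x_n from
    N(m_n, C_n), then move backwards through N(a^n_t, R^n_t)."""
    n = len(m)
    x = np.empty(n)
    x[-1] = rng.normal(m[-1], np.sqrt(C[-1]))
    for t in range(n - 2, -1, -1):
        B = C[t] / (C[t] + tau2)
        a = (1 - B) * m[t] + B * x[t + 1]   # a^n_t
        R = B * tau2                        # R^n_t
        x[t] = rng.normal(a, np.sqrt(R))
    return x
```

When the filtered variances are zero the draw degenerates, as it should, to the filtered means.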
Example i. Simulated data with n = 100, σ² = 1.0, τ² = 0.5 and x_0 = 0.

[Figure: simulated series y(t) and x(t) plotted over time.]
Example i. p(x_t | y^t, θ) and p(x_t | y^n, θ) for t ≤ n, with m_0 = 0.0 and C_0 = 10.0.

[Figure: forward-filtering and backward-smoothing means plotted over time.]
Non-Gaussian, nonlinear dynamic models

The dynamic model is p(y_{t+1} | x_{t+1}) and p(x_{t+1} | x_t), for t = 1, ..., n, with prior p(x_0).

Prior and posterior at time t+1, i.e.

p(x_{t+1} | y^t) = ∫ p(x_{t+1} | x_t) p(x_t | y^t) dx_t
p(x_{t+1} | y^{t+1}) ∝ p(y_{t+1} | x_{t+1}) p(x_{t+1} | y^t)

are usually unavailable in closed form.

Over the last 20 years:
◮ Carlin, Polson and Stoffer (1992) for more general DMs;
◮ Carter and Kohn (1994) and Frühwirth-Schnatter (1994) for conditionally Gaussian DLMs;
◮ Gamerman (1998) for generalized DLMs.
The Bayesian bootstrap filter (BBF)

Gordon, Salmond and Smith (1993) use a propagate-resample scheme to go from p(x_t | y^t) to p(x_{t+1} | y^t) and then to p(x_{t+1} | y^{t+1}).
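For the local level model of Example i, the bootstrap filter can be sketched as follows (an illustrative implementation; the N(0, 10) prior on x_0 and multinomial resampling are my choices):

```python
import numpy as np

def bootstrap_filter(y, sigma2, tau2, N=2000, rng=None):
    """Propagate-resample (bootstrap) filter for the local level model;
    returns the sequence of filtered posterior mean estimates."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = rng.normal(0.0, np.sqrt(10.0), N)          # particles from p(x_0)
    means = np.empty(len(y))
    for t, yt in enumerate(y):
        x = x + rng.normal(0.0, np.sqrt(tau2), N)  # propagate: p(x_{t+1} | x_t)
        logw = -0.5 * (yt - x) ** 2 / sigma2       # weight: p(y_{t+1} | x_{t+1})
        w = np.exp(logw - logw.max())
        x = rng.choice(x, size=N, p=w / w.sum())   # multinomial resampling
        means[t] = x.mean()
    return means
```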
To resample or not to resample?

[Figure: particle paths over time.]
Weights

[Figure: particle weights over time.]
Fully adapted BBF (FABBF)

◮ Posterior at t: {x_t^(i)}_{i=1}^N ∼ p(x_t | y^t).
◮ Resample: draw {x̃_t^(i)}_{i=1}^N from {x_t^(i)}_{i=1}^N with weights w_{t+1}^(i) ∝ p(y_{t+1} | x_t^(i)).
◮ Propagate: draw x_{t+1}^(i) from p(x_{t+1} | x̃_t^(i), y_{t+1}).
◮ Posterior at t+1: {x_{t+1}^(i)}_{i=1}^N ∼ p(x_{t+1} | y^{t+1}).
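In the local level model both fully adapted densities are Gaussian in closed form, so the FABBF can be sketched directly (illustrative code; the N(0, 10) prior and function name are my own):

```python
import numpy as np

def fully_adapted_filter(y, sigma2, tau2, N=2000, rng=None):
    """FABBF for the local level model: resample with the predictive
    p(y_{t+1} | x_t) = N(x_t, sigma2 + tau2), then propagate from the
    exact conditional p(x_{t+1} | x_t, y_{t+1})."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = rng.normal(0.0, np.sqrt(10.0), N)          # assumed N(0, 10) prior on x_0
    s2 = sigma2 + tau2
    v = sigma2 * tau2 / s2                         # Var(x_{t+1} | x_t, y_{t+1})
    means = np.empty(len(y))
    for t, yt in enumerate(y):
        logw = -0.5 * (yt - x) ** 2 / s2           # w ∝ p(y_{t+1} | x_t)
        w = np.exp(logw - logw.max())
        x = rng.choice(x, size=N, p=w / w.sum())   # resample first...
        mu = (tau2 * yt + sigma2 * x) / s2
        x = rng.normal(mu, np.sqrt(v))             # ...then propagate exactly
        means[t] = x.mean()
    return means
```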
The auxiliary particle filter (APF)

◮ Posterior at t: {x_t^(i)}_{i=1}^N ∼ p(x_t | y^t).
◮ Resample: draw {x̃_t^(j)}_{j=1}^N from {x_t^(i)}_{i=1}^N with weights w_{t+1}^(i) ∝ p(y_{t+1} | g(x_t^(i))), where g(x_t) = E(x_{t+1} | x_t).
◮ Propagate: draw x_{t+1}^(j) from p(x_{t+1} | x̃_t^(j)) and resample (SIR) with weights

ω_{t+1}^(j) ∝ p(y_{t+1} | x_{t+1}^(j)) / p(y_{t+1} | g(x_t^(k_j))).

◮ Posterior at t+1: {x_{t+1}^(i)}_{i=1}^N ∼ p(x_{t+1} | y^{t+1}).
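For the local level model, g(x_t) = E(x_{t+1} | x_t) = x_t, and the APF can be sketched as follows (illustrative implementation; names my own):

```python
import numpy as np

def auxiliary_particle_filter(y, sigma2, tau2, N=2000, rng=None):
    """APF: first-stage resampling with p(y_{t+1} | g(x_t)), blind
    propagation, then second-stage SIR with the weight ratio above."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = rng.normal(0.0, np.sqrt(10.0), N)            # assumed N(0, 10) prior on x_0
    means = np.empty(len(y))
    for t, yt in enumerate(y):
        logw1 = -0.5 * (yt - x) ** 2 / sigma2        # p(y_{t+1} | g(x_t)) with g(x_t) = x_t
        w1 = np.exp(logw1 - logw1.max())
        k = rng.choice(N, size=N, p=w1 / w1.sum())   # auxiliary indices k_j
        xt = x[k]
        xnew = xt + rng.normal(0.0, np.sqrt(tau2), N)  # propagate p(x_{t+1} | x~_t)
        # second-stage weights: p(y | x_{t+1}) / p(y | g(x_t^{k_j}))
        logw2 = -0.5 * ((yt - xnew) ** 2 - (yt - xt) ** 2) / sigma2
        w2 = np.exp(logw2 - logw2.max())
        x = rng.choice(xnew, size=N, p=w2 / w2.sum())
        means[t] = x.mean()
    return means
```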
How about p(θ | y^n)?

Two-step strategy. In the first step, approximate p(θ | y^n) by

p^N(θ | y^n) = p^N(y^n | θ) p(θ) / p(y^n) ∝ p^N(y^n | θ) p(θ)

where p^N(y^n | θ) is an SMC approximation to p(y^n | θ). Then, in the second step, sample θ via an MCMC scheme or a SIR scheme¹.

Problem 1: SMC loses its appealing sequential nature.
Problem 2: The overall sampling scheme is sensitive to p^N(y^n | θ).

¹See Fernández-Villaverde and Rubio-Ramírez (2007), “Estimating Macroeconomic Models: A Likelihood Approach”, and DeJong, Dharmarajan, Liesenfeld, Moura and Richard (2009), “Efficient Likelihood Evaluation of State-Space Representations”, for applications of this two-step strategy to DSGE and related models.
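The approximation p^N(y^n | θ) can be sketched via the bootstrap filter, accumulating at each step the log of the average unnormalized particle weight as an estimate of p(y_{t+1} | y^t, θ) (illustrative code for the local level model; the N(0, 10) prior on x_0 is my assumption):

```python
import numpy as np

def smc_loglik(y, sigma2, tau2, N=4000, rng=None):
    """Estimate log p^N(y^n | theta) for the local level model with a
    bootstrap filter; theta = (sigma2, tau2)."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = rng.normal(0.0, np.sqrt(10.0), N)
    ll = 0.0
    for yt in y:
        x = x + rng.normal(0.0, np.sqrt(tau2), N)      # propagate
        logw = -0.5 * np.log(2 * np.pi * sigma2) - 0.5 * (yt - x) ** 2 / sigma2
        c = logw.max()
        w = np.exp(logw - c)
        ll += c + np.log(w.mean())                     # log-estimate of p(y_t | y^{t-1}, theta)
        x = rng.choice(x, size=N, p=w / w.sum())       # resample
    return ll
```

Plugged into an MCMC or SIR scheme over θ, this gives the first step of the two-step strategy, at the cost of the sensitivity noted in Problem 2.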