Advanced Simulation - Lecture 13. Patrick Rebeschini, February 26th, 2018.


  1. Advanced Simulation - Lecture 13. Patrick Rebeschini, February 26th, 2018.

  2. Outline: Sequential Importance Sampling; the resampling step; Sequential Monte Carlo / particle filters.

  3. Hidden Markov Models. Figure: graph representation of a general HMM, with hidden chain X_0, X_1, ..., X_T, observations y_1, ..., y_T, and parameter θ. (X_t): initial distribution μ_θ, transition f_θ. (Y_t) given (X_t): measurement density g_θ. Prior on the parameter θ ∈ Θ. Reference: Inference in Hidden Markov Models, Cappé, Moulines, Rydén, 2005.

  4. General inference in HMMs. Proposition: the joint posterior p(x_{1:t} | y_{1:t}, \theta) satisfies

      p(x_{1:t} \mid y_{1:t}, \theta) = p(x_{1:t-1} \mid y_{1:t-1}, \theta) \, \frac{f_\theta(x_t \mid x_{t-1}) \, g_\theta(y_t \mid x_t)}{p(y_t \mid y_{1:t-1}, \theta)}

  where

      p(y_t \mid y_{1:t-1}, \theta) = \int p(x_{1:t-1} \mid y_{1:t-1}, \theta) \, f_\theta(x_t \mid x_{t-1}) \, g_\theta(y_t \mid x_t) \, dx_{1:t}.

  Proposition: the marginal posterior p(x_t | y_{1:t}) satisfies the recursion

      p(x_t \mid y_{1:t-1}) = \int f(x_t \mid x_{t-1}) \, p(x_{t-1} \mid y_{1:t-1}) \, dx_{t-1},

      p(x_t \mid y_{1:t}) = \frac{g(y_t \mid x_t) \, p(x_t \mid y_{1:t-1})}{p(y_t \mid y_{1:t-1})},

  where

      p(y_t \mid y_{1:t-1}) = \int g(y_t \mid x_t) \, p(x_t \mid y_{1:t-1}) \, dx_t.
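For a finite state space the integrals in the recursion above become sums, so the prediction and update steps can be computed exactly. A minimal sketch for a hypothetical 2-state HMM (the numbers for mu, f and g are made up for illustration, not from the slides):

```python
import numpy as np

# Hypothetical 2-state HMM: mu is the initial distribution, f the transition
# matrix f[prev, next] = f(x_t | x_{t-1}), g the emission matrix
# g[state, obs] = g(y_t | x_t). All numbers are illustrative.
mu = np.array([0.5, 0.5])
f = np.array([[0.9, 0.1],
              [0.2, 0.8]])
g = np.array([[0.7, 0.3],
              [0.1, 0.9]])

def filter_hmm(y):
    """Return the filtering distributions p(x_t | y_{1:t}) and log p(y_{1:t})."""
    post = mu * g[:, y[0]]               # unnormalised p(x_1, y_1)
    loglik = np.log(post.sum())          # log p(y_1)
    post /= post.sum()
    filters = [post]
    for obs in y[1:]:
        pred = f.T @ post                # prediction: sum over x_{t-1}
        unnorm = g[:, obs] * pred        # update: multiply by g(y_t | x_t)
        loglik += np.log(unnorm.sum())   # accumulates log p(y_t | y_{1:t-1})
        post = unnorm / unnorm.sum()
        filters.append(post)
    return np.array(filters), loglik
```

Each loop iteration is exactly one application of the prediction/update recursion; the per-step normalising constants multiply into the likelihood.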

  5. General inference in HMMs. In general, the filtering problem is thus intractable:

      \int \varphi(x_t) \, p(x_t \mid y_{1:t}, \theta) \, dx_t = \int \varphi(x_t) \, p(x_{1:t} \mid y_{1:t}, \theta) \, dx_{1:t}
      = \frac{1}{p(y_{1:t} \mid \theta)} \int \varphi(x_t) \, \mu_\theta(x_1) \prod_{s=2}^{t} f_\theta(x_s \mid x_{s-1}) \prod_{s=1}^{t} g_\theta(y_s \mid x_s) \, dx_{1:t}.

  It is a t × dim(X)-dimensional integral. The likelihood is also intractable:

      p(y_{1:t} \mid \theta) = \int p(x_{1:t}, y_{1:t} \mid \theta) \, dx_{1:t}
      = \int \mu_\theta(x_1) \prod_{s=2}^{t} f_\theta(x_s \mid x_{s-1}) \prod_{s=1}^{t} g_\theta(y_s \mid x_s) \, dx_{1:t},

  so we cannot evaluate it pointwise, e.g. to run a Metropolis–Hastings algorithm on the parameters.

  6. Sequential Importance Sampling. We now consider the parameter θ to be fixed, and we want to infer X_{1:t} given y_{1:t}. Two ingredients: importance sampling, and "sampling via composition" (i.e. via conditioning). IS: if we have a weighted sample (w_1^i, X^i) approximating π_1, then (w_2^i, X^i) approximates π_2 if we define

      w_2^i = w_1^i \times \frac{\pi_2(X^i)}{\pi_1(X^i)}.

  In standard IS, π_1 and π_2 are defined on the same space.
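The reweighting identity can be checked numerically. A minimal sketch, with illustrative target choices not taken from the slides: π_1 = N(0, 1) and π_2 = N(1, 1), so a sample weighted for π_1 is reweighted to estimate expectations under π_2.

```python
import numpy as np

rng = np.random.default_rng(0)

def npdf(x, mean):
    """Standard-deviation-1 normal density (enough for the weight ratio)."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2 * np.pi)

# (w_1^i, X^i) approximates pi_1 = N(0,1): i.i.d. draws with uniform weights.
x = rng.standard_normal(100_000)
w1 = np.full(x.size, 1.0 / x.size)

# Multiply by pi_2(X^i) / pi_1(X^i) so that (w_2^i, X^i) approximates
# pi_2 = N(1,1), then self-normalise.
w2 = w1 * npdf(x, 1.0) / npdf(x, 0.0)
w2 /= w2.sum()

est_mean = np.sum(w2 * x)   # estimates E_{pi_2}[X] = 1
```

The estimate converges to the π_2-mean even though every X^i was drawn under π_1; only the weights changed.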

  7. Sequential Importance Sampling. Sampling via composition: if (w^i, X^i) approximates p_X(x), and if Y^i ~ q_{Y|X}(y | X^i), then (w^i, (X^i, Y^i)) approximates p_X(x) q_{Y|X}(y | x). The space has been extended. Marginally, (w^i, Y^i) approximates

      q_Y(y) = \int p_X(x) \, q_{Y \mid X}(y \mid x) \, dx.

  Sequential Importance Sampling combines both ingredients to iteratively approximate p(x_{1:t} | y_{1:t}).

  8. Sequential Importance Sampling: algorithm.

  At time t = 1: sample X_1^i ~ q_1(·) and compute the weights

      w_1^i = \frac{\mu(X_1^i) \, g(y_1 \mid X_1^i)}{q_1(X_1^i)}.

  At time t ≥ 2: sample X_t^i ~ q_{t|t-1}(· | X_{t-1}^i) and compute the weights

      w_t^i = w_{t-1}^i \times \omega_t^i = w_{t-1}^i \times \frac{f(X_t^i \mid X_{t-1}^i) \, g(y_t \mid X_t^i)}{q_{t \mid t-1}(X_t^i \mid X_{t-1}^i)}.
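The algorithm above can be sketched in a few lines when the prior is used as the proposal (q_1 = μ, q_{t|t-1} = f), in which case the incremental weight reduces to g(y_t | X_t^i). The AR(1) latent state with Gaussian observations, and all parameter values, are illustrative assumptions, not the model from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed illustrative model: X_1 ~ N(0, sigma_x^2),
# X_t | X_{t-1} ~ N(rho * X_{t-1}, sigma_x^2), Y_t | X_t ~ N(X_t, sigma_y^2).
rho, sigma_x, sigma_y = 0.9, 1.0, 1.0

def sis(y, N=1000):
    """Sequential Importance Sampling with the prior as proposal."""
    T = len(y)
    x = np.empty((T, N))
    logw = np.zeros(N)
    x[0] = rng.normal(0.0, sigma_x, N)                 # X_1^i ~ q_1 = mu
    logw += -0.5 * ((y[0] - x[0]) / sigma_y) ** 2      # log g(y_1 | X_1^i), up to
    for t in range(1, T):                              # a constant that cancels
        x[t] = rng.normal(rho * x[t - 1], sigma_x)     # X_t^i ~ f(. | X_{t-1}^i)
        logw += -0.5 * ((y[t] - x[t]) / sigma_y) ** 2  # accumulate log g
    w = np.exp(logw - logw.max())                      # stabilised exponentiation
    w /= w.sum()
    return x, w

# Stand-in observation sequence, and the filtering-mean estimate at time T:
y = rng.normal(0.0, 1.0, 50)
x, w = sis(y, N=500)
est = np.sum(w * x[-1])
```

Running this for moderately long series exhibits the weight degeneracy shown in the ESS plots: after a while almost all normalised weight sits on a handful of paths, which is what motivates the resampling step.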

  9. Sequential Importance Sampling: example. Figure: estimation of the filtering means E(x_t | y_{1:t}) over time, for the prior and optimal proposals, against the exact Kalman filter.

  10. Sequential Importance Sampling: example. Figure: estimation of the filtering variances V(x_t | y_{1:t}) over time, for the prior and optimal proposals, against the exact Kalman filter.

  11. Sequential Importance Sampling: example. Figure: estimation of the marginal log-likelihoods log p(y_{1:t}) over time, for the prior and optimal proposals, against the exact Kalman filter.

  12. Sequential Importance Sampling: example. Figure: effective sample size over time, for the prior and optimal proposals.

  13. Sequential Importance Sampling: example. Figure: spread of 100 paths drawn from the prior proposal, with Kalman filter means in blue; darker lines indicate higher weights.

  14. Resampling. Idea: at time t, select particles with high weights and remove particles with low weights, spending the fixed computational budget of N particles on the most promising paths. Resampling turns a weighted sample (w^i, X^i) into an equally weighted sample (N^{-1}, \bar{X}^i). In terms of empirical probability measures: input

      \pi^N(x) = \sum_{i=1}^{N} \frac{w^i}{\sum_{j=1}^{N} w^j} \, \delta_{X^i}(x),

  output

      \bar{\pi}^N(x) = N^{-1} \sum_{i=1}^{N} \delta_{\bar{X}^i}(x).

  15. Multinomial resampling. How do we draw from an empirical probability distribution

      \pi^N(x) = \sum_{i=1}^{N} \frac{w^i}{\sum_{j=1}^{N} w^j} \, \delta_{X^i}(x)?

  Remember how to draw from a mixture model \sum_{i=1}^{K} \omega_i \, p_i(x): draw k with probabilities \omega_1, \ldots, \omega_K, then draw from p_k.

  16. Multinomial resampling. Draw an "ancestry vector" A^{1:N} = (A^1, \ldots, A^N) \in \{1, \ldots, N\}^N independently from a categorical distribution,

      A^{1:N} \overset{\text{i.i.d.}}{\sim} \mathrm{Cat}(w^1, \ldots, w^N),

  in other words, for all i, k \in \{1, \ldots, N\},

      \mathbb{P}(A^i = k) = \frac{w^k}{\sum_{j=1}^{N} w^j}.

  Define \bar{X}^i := X^{A^i} for all i \in \{1, \ldots, N\}; X^{A^i} is said to be the "parent" or "ancestor" of \bar{X}^i. Return \bar{X} = (\bar{X}^1, \ldots, \bar{X}^N).

  17. Multinomial resampling. Equivalently, draw an "offspring vector" O^{1:N} = (O^1, \ldots, O^N) \in \{0, \ldots, N\}^N from a multinomial distribution,

      O^{1:N} \sim \mathrm{Multinomial}(N; w^1, \ldots, w^N),

  so that for all i \in \{1, \ldots, N\},

      \mathbb{E}(O^i) = \frac{N w^i}{\sum_{j=1}^{N} w^j} \quad \text{and} \quad \sum_{i=1}^{N} O^i = N.

  Each particle X^i is replicated O^i times (possibly zero times) to create the sample \bar{X}:

      \bar{X} \leftarrow \{\}; for i = 1, \ldots, N, for k = 1, \ldots, O^i, \bar{X} \leftarrow (\bar{X}, X^i).

  Return \bar{X} = (\bar{X}^1, \ldots, \bar{X}^N).
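The ancestry-vector and offspring-vector descriptions are two views of the same random draw: given the ancestors, the offspring counts are just the number of times each index was chosen. A minimal sketch with arbitrary illustrative weights:

```python
import numpy as np

rng = np.random.default_rng(2)

def multinomial_resample(x, w, rng):
    """Multinomial resampling: draw N ancestors A^i ~ Cat(w^1, ..., w^N)."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                                    # normalise the weights
    ancestors = rng.choice(len(x), size=len(x), p=w)   # ancestry vector A^{1:N}
    return np.asarray(x)[ancestors], ancestors

# Illustrative weighted sample of N = 4 particles.
x = np.array([-1.0, 0.0, 2.0, 5.0])
w = np.array([0.1, 0.2, 0.6, 0.1])
xbar, anc = multinomial_resample(x, w, rng)

# Offspring counts O^i = number of times particle i was chosen; they sum to N.
offspring = np.bincount(anc, minlength=len(x))
```

After resampling, every surviving particle carries weight 1/N; high-weight particles tend to appear several times, low-weight ones often disappear.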

  18. Multinomial resampling. Other strategies are possible to perform resampling. Some properties are desirable:

      \mathbb{E}(O^i) = \frac{N w^i}{\sum_{j=1}^{N} w^j}, \quad \text{or equivalently} \quad \mathbb{P}(A^i = k) = \frac{w^k}{\sum_{j=1}^{N} w^j}.

  This is sometimes called "unbiasedness", because then

      \frac{1}{N} \sum_{k=1}^{N} \varphi(\bar{X}^k) = \frac{1}{N} \sum_{k=1}^{N} O^k \varphi(X^k)

  has expectation

      \sum_{k=1}^{N} \frac{w^k}{\sum_{j=1}^{N} w^j} \, \varphi(X^k).
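The property E(O^i) = N w^i can be checked by Monte Carlo: repeat the multinomial resampling draw many times and average the offspring counts. The weights below are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(3)

# Check E(O^i) = N w^i / sum_j w^j for multinomial resampling, by averaging
# offspring counts over many independent resampling draws.
N = 5
w = np.array([0.05, 0.1, 0.15, 0.3, 0.4])
w = w / w.sum()

reps = 200_000
anc = rng.choice(N, size=(reps, N), p=w)               # reps ancestry vectors
mean_offspring = np.bincount(anc.ravel(), minlength=N) / reps

# mean_offspring should be close to N * w, componentwise.
```

The averaged counts match N w^i up to Monte Carlo noise, while each individual draw always satisfies the hard constraint that the O^i sum to N.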

  19. Sequential Monte Carlo / Sequential Importance Resampling.

  At time t = 1: sample X_1^i ~ q_1(·) and compute the weights

      w_1^i = \frac{\mu(X_1^i) \, g(y_1 \mid X_1^i)}{q_1(X_1^i)}.

  At time t ≥ 2:
  Resample (w_{t-1}^i, X_{1:t-1}^i) \to (N^{-1}, \bar{X}_{1:t-1}^i).
  Sample X_t^i ~ q_{t|t-1}(· | \bar{X}_{t-1}^i) and set X_{1:t}^i := (\bar{X}_{1:t-1}^i, X_t^i).
  Compute the weights

      w_t^i = \omega_t^i = \frac{f(X_t^i \mid \bar{X}_{t-1}^i) \, g(y_t \mid X_t^i)}{q_{t \mid t-1}(X_t^i \mid \bar{X}_{t-1}^i)}.
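With the prior as proposal this becomes the bootstrap particle filter: resample, propagate through f, reweight by g. A sketch on the same illustrative linear-Gaussian model as before (AR(1) state, Gaussian observations; all parameters are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed illustrative model: X_1 ~ N(0, sigma_x^2),
# X_t | X_{t-1} ~ N(rho * X_{t-1}, sigma_x^2), Y_t | X_t ~ N(X_t, sigma_y^2).
rho, sigma_x, sigma_y = 0.9, 1.0, 1.0

def bootstrap_filter(y, N=1000):
    """SIR with prior proposal; returns filtering-mean estimates and log p(y_{1:T})."""
    loglik = 0.0
    means = []
    x = rng.normal(0.0, sigma_x, N)                    # X_1^i ~ mu
    for t, obs in enumerate(y):
        if t > 0:
            anc = rng.choice(N, size=N, p=w)           # multinomial resampling
            x = rng.normal(rho * x[anc], sigma_x)      # propagate through f
        # Incremental log-weights log g(y_t | X_t^i), with the Gaussian constant
        # kept so the likelihood estimate is correctly normalised.
        logw = (-0.5 * ((obs - x) / sigma_y) ** 2
                - 0.5 * np.log(2 * np.pi * sigma_y ** 2))
        m = logw.max()
        loglik += m + np.log(np.mean(np.exp(logw - m)))  # log p(y_t | y_{1:t-1}) est.
        w = np.exp(logw - m)
        w /= w.sum()
        means.append(np.sum(w * x))                    # estimate of E(x_t | y_{1:t})
    return np.array(means), loglik

# Stand-in observations for illustration.
y = rng.normal(0.0, 1.0, 30)
means, loglik = bootstrap_filter(y, N=500)
```

Because the particles are reset to equal weights at every step, the ESS stays high, in contrast to plain SIS; the price is the path degeneracy visible in the trajectory plots at the end of the lecture.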

  20. Sequential Importance Sampling. Figure: panels (a)-(c) showing particles propagated towards an observation Y_n, with the level sets of the likelihood function g(x, Y_n).

  21. Sequential Importance Resampling. Figure: panels (a)-(d) showing the same situation with a resampling step against the level sets of the likelihood function g(x, Y_n); high-weight particles receive several offspring (e.g. 3 particles, 2 particles), low-weight particles none.

  22. Sequential Monte Carlo: example. Figure: estimation of the filtering means E(x_t | y_{1:t}) over time, for the prior and optimal proposals, against the exact Kalman filter.

  23. Sequential Monte Carlo: example. Figure: estimation of the filtering variances V(x_t | y_{1:t}) over time, for the prior and optimal proposals, against the exact Kalman filter.

  24. Sequential Monte Carlo: example. Figure: estimation of the marginal log-likelihoods log p(y_{1:t}) over time, for the prior and optimal proposals, against the exact Kalman filter.

  25. Sequential Monte Carlo: example. Figure: effective sample size over time, for the prior and optimal proposals.

  26. Sequential Monte Carlo: example. Figure: support of the approximation of p(x_t | y_{1:t}), over time.

  27. Sequential Monte Carlo: example. Figure: trajectories \bar{X}_{1:T}^i, for i \in \{1, \ldots, N\} and N = 100.
