Advanced Simulation - Lecture 16
Patrick Rebeschini, March 7th, 2018


  1. Advanced Simulation - Lecture 16. Patrick Rebeschini, March 7th, 2018.

  2. Outline. Particle methods for static problems. Evidence estimation. The End!

  3. Sequence of posterior distributions. Consider the problem of estimating ∫ ϕ(x) π(x) dx for a test function ϕ : X → ℝ and target distribution π on X. Introduce intermediate distributions π_0, π_1, …, π_T such that: π_0 is easy to sample from; π_t and π_{t+1} are not too different; π_T = π.

  4. Sequence of posterior distributions. Case where π(θ) ∝ p(θ) p(y_{1:T} | θ). Introduce the partial posterior π_t(θ) ∝ p(θ) p(y_{1:t} | θ). We have π_0(θ) = p(θ), the prior distribution, and π_T(θ) = π(θ) = p(θ | y_{1:T}), the full posterior distribution. When the data are i.i.d., we can expect π_t(θ) to be similar to π_{t+1}(θ), at least when t is large.

  5. Sequence of posterior distributions. General target distribution π(x). Introduce a simple parametric distribution q. We can introduce π_t(x) ∝ π(x)^{γ_t} q(x)^{1−γ_t}, where 0 = γ_0 < γ_1 < … < γ_T = 1. Then π_0 = q and π_T = π. If γ_t is close to γ_{t+1}, then π_t is close to π_{t+1}.
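The tempered sequence above can be sketched in a few lines of Python. This is an illustrative example, not from the slides: the names log_pi and log_q, and the choice of Gaussian densities, are assumptions.

```python
import numpy as np

def log_tempered(x, log_pi, log_q, gamma):
    """Log-density, up to an additive constant, of
    pi_t(x) proportional to pi(x)^gamma * q(x)^(1 - gamma)."""
    return gamma * log_pi(x) + (1.0 - gamma) * log_q(x)

# Illustrative choices: q = N(0, 1) and pi = N(3, 0.5^2), both unnormalised.
log_q = lambda x: -0.5 * x ** 2
log_pi = lambda x: -0.5 * ((x - 3.0) / 0.5) ** 2

# gamma = 0 recovers q and gamma = 1 recovers pi, matching pi_0 = q, pi_T = pi.
gammas = np.linspace(0.0, 1.0, 11)   # 0 = gamma_0 < ... < gamma_T = 1
```

Working on the log scale avoids underflow when the densities are evaluated far in the tails.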

  6. Sequential Importance Sampling: algorithm. At time t = 0: sample X^i ∼ π_0(·) for i ∈ {1, …, N}; compute the weights w_0^i = 1/N for all i ∈ {1, …, N}. At time t ≥ 1: compute the weights w_t^i = w_{t−1}^i × ω_t^i, where ω_t^i = π_t(X^i) / π_{t−1}(X^i). At all times, (w_t^i, X^i)_{i=1}^N approximates π_t.
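A minimal sketch of the algorithm in Python, working on the log scale for stability. The sequence of targets and the π_0 sampler used at the end are hypothetical stand-ins, not from the slides.

```python
import numpy as np

def sis(log_pis, sample_pi0, N, rng):
    """Sequential importance sampling: particles are drawn once from pi_0
    and never moved; only the weights are updated, by the incremental
    ratio pi_t / pi_{t-1} evaluated at each particle."""
    X = sample_pi0(N, rng)                 # X^i ~ pi_0, fixed for all t
    logw = np.zeros(N)                     # w_0^i = 1/N, up to normalisation
    for t in range(1, len(log_pis)):
        logw += log_pis[t](X) - log_pis[t - 1](X)   # incremental weight
    W = np.exp(logw - logw.max())          # stabilised exponentiation
    return X, W / W.sum()

# Illustrative sequence: pi_t = N(0, 1/(t+1)), starting from pi_0 = N(0, 1).
rng = np.random.default_rng(0)
log_pis = [lambda x, t=t: -0.5 * (t + 1) * x ** 2 for t in range(5)]
X, W = sis(log_pis, lambda N, rng: rng.normal(size=N), 1000, rng)
post_mean = np.sum(W * X)                  # weighted estimate of E_{pi_4}[X]
```

Note that the particle locations never change: only the weights do, which is exactly why the method degenerates, as the next slides show.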

  7. Sequential Importance Sampling: diagnostics. As already seen for IS, we can compute the effective sample size ESS_t = (Σ_{i=1}^N w_t^i)² / Σ_{i=1}^N (w_t^i)² = 1 / Σ_{i=1}^N (W_t^i)², where the W_t^i are the normalised weights. ESS_t = N if W_t^i = 1/N for all i. If there exists i such that W_t^i ≈ 1 (and hence W_t^j ≈ 0 for j ≠ i), then ESS_t ≈ 1. As a rule of thumb, the higher the ESS, the better our approximation.
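The effective sample size is essentially one line of code; a sketch:

```python
import numpy as np

def ess(w):
    """Effective sample size (sum_i w_i)^2 / sum_i w_i^2, which equals
    1 / sum_i W_i^2 for the normalised weights W_i = w_i / sum_j w_j."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

# Equal weights give the maximum ESS = N; one dominant weight gives ESS близко 1.
```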

  8. Toy model. Data y_t ∼ N(µ, 1); 1000 points generated with µ⋆ = 2. Prior µ ∼ N(0, 1). Sequence of partial posteriors π_t(µ) ∝ p(µ) ∏_{s=1}^t p(y_s | µ). Incremental weights: π_t(µ)/π_{t−1}(µ) ∝ p(y_t | µ). We can look at the evolution of the effective sample size with t.
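For this toy model the incremental weight is just the likelihood of the newest observation; a sketch (the data are regenerated here with a fresh seed, so any numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model from the slides: y_t ~ N(mu, 1), 1000 points with mu* = 2,
# prior mu ~ N(0, 1).
y = rng.normal(2.0, 1.0, size=1000)

def log_incremental_weight(mu, y_t):
    """log p(y_t | mu) for the N(mu, 1) likelihood: the log incremental
    weight log(pi_t(mu) / pi_{t-1}(mu)), up to a constant in mu."""
    return -0.5 * (y_t - mu) ** 2 - 0.5 * np.log(2.0 * np.pi)
```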

  9. Toy model. Figure: effective sample size (log scale) against “time”, using sequential importance sampling.

  10. Sequential Importance Sampling with Resampling. At time t = 0: sample X_0^i ∼ π_0(·) for i ∈ {1, …, N}; compute the weights w_0^i = 1/N for all i ∈ {1, …, N}. At time t ≥ 1: resample (w_{t−1}^i, X_{t−1}^i) → (1/N, X̄_{t−1}^i); define X_t^i = X̄_{t−1}^i; compute the weights w_t^i = ω_t^i = π_t(X_t^i) / π_{t−1}(X_t^i). Problem: there are fewer and fewer unique values in (X_t^i)_{i=1}^N.
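The resampling step can be implemented by drawing ancestor indices; a minimal multinomial-resampling sketch (the degenerate weight vector is chosen purely to make the behaviour visible):

```python
import numpy as np

def multinomial_resample(X, W, rng):
    """Resample (W^i, X^i) -> (1/N, Xbar^i): draw N ancestor indices with
    probabilities W and copy the corresponding particles."""
    N = len(X)
    idx = rng.choice(N, size=N, p=W)
    return X[idx]

rng = np.random.default_rng(2)
X = np.array([0.0, 1.0, 2.0, 3.0])
W = np.array([0.0, 0.0, 1.0, 0.0])       # degenerate weights, to illustrate
Xbar = multinomial_resample(X, W, rng)   # every copy is the particle at 2.0
```

The output illustrates the problem noted on the slide: resampling duplicates particles, so the number of unique values can only decrease.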

  11. Toy model. Figure: effective sample size against “time”, using sequential importance sampling with resampling.

  12. Toy model. Figure: number of unique values against “time”, using sequential importance sampling with resampling.

  13. Move steps. Consider particles (1/N, X̄_t^i)_{i=1}^N approximating π_t. The ESS is maximal (= N), but multiple values within (X̄_t^i)_{i=1}^N are identical. How can we diversify the particles while still approximating π_t? Apply a Markov kernel K_t to each X̄_t^i: X_t^i ∼ K_t(X̄_t^i, dx) for all i ∈ {1, …, N}. Can we find K_t such that (1/N, X_t^i) still approximates π_t? e.g. (1/N) Σ_{i=1}^N ϕ(X_t^i) → ∫ ϕ(x) π_t(x) dx almost surely as N → ∞.

  14. Move steps. Assume that K_t leaves π_t invariant: ∫ π_t(x) K_t(x, y) dx = π_t(y). If X̄_t^i ∼ π_t, then X_t^i ∼ K_t(X̄_t^i, dy) also follows π_t. Also, if X̄ ∼ q and w(x) = π(x)/q(x), so that E_q[w(X̄) ϕ(X̄)] = E_π[ϕ(X)], then, for X ∼ K(X̄, dx), E_{qK}[w(X̄) ϕ(X)] = E_π[ϕ(X)].

  15. Move steps. Result: if (w^i, X̄^i) approximates π, then we can apply a π-invariant kernel to each X̄^i and not change the weights. Draw X^i ∼ K(X̄^i, dx) for all i ∈ {1, …, N} and keep the weights unchanged: (w^i, X^i) still approximates π.
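A common choice of π-invariant kernel is a random-walk Metropolis step. The sketch below uses an illustrative Gaussian target and step size (both assumptions, not from the slides): particles drawn from π remain distributed according to π after the moves.

```python
import numpy as np

def mh_move(X, log_pi, step, rng):
    """One random-walk Metropolis step applied to every particle;
    the Metropolis acceptance rule makes the kernel pi-invariant."""
    prop = X + step * rng.normal(size=X.shape)
    log_alpha = log_pi(prop) - log_pi(X)
    accept = np.log(rng.uniform(size=X.shape)) < log_alpha
    return np.where(accept, prop, X)

rng = np.random.default_rng(3)
log_pi = lambda x: -0.5 * x ** 2          # standing in for pi; here N(0, 1)
X = rng.normal(size=2000)                 # particles already distributed as pi
for _ in range(10):
    X = mh_move(X, log_pi, 0.5, rng)      # moved particles still follow pi
```

Each move proposes a fresh value for every particle, so duplicated particles become distinct again while the target distribution is preserved.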

  16. Sequential Monte Carlo Sampler: algorithm. At time t = 0: sample X_0^i ∼ π_0(·) for i ∈ {1, …, N}; compute the weights w_0^i = 1/N for all i ∈ {1, …, N}. At time t ≥ 1: resample (w_{t−1}^i, X_{t−1}^i) → (1/N, X̄_{t−1}^i); draw X_t^i ∼ K_{t−1}(X̄_{t−1}^i, dx); compute the weights w_t^i = ω_t^i = π_t(X_t^i) / π_{t−1}(X_t^i). At all times, (w_t^i, X_t^i)_{i=1}^N approximates π_t.
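Putting the three steps together (resample, move with a π_{t−1}-invariant kernel, reweight), a self-contained sketch follows. The tempered Gaussian sequence and the use of a single random-walk Metropolis step as K_{t−1} are illustrative assumptions.

```python
import numpy as np

def smc_sampler(log_pis, sample_pi0, N, rng, mh_step=0.5):
    """SMC sampler sketch: at each step, resample, move with one
    random-walk Metropolis step invariant for pi_{t-1} (playing the role
    of K_{t-1}), then reweight by pi_t / pi_{t-1}."""
    X = sample_pi0(N, rng)
    W = np.full(N, 1.0 / N)
    for t in range(1, len(log_pis)):
        X = X[rng.choice(N, size=N, p=W)]                 # resample
        prop = X + mh_step * rng.normal(size=N)           # move (MH w.r.t. pi_{t-1})
        accept = np.log(rng.uniform(size=N)) < log_pis[t - 1](prop) - log_pis[t - 1](X)
        X = np.where(accept, prop, X)
        logw = log_pis[t](X) - log_pis[t - 1](X)          # reweight
        W = np.exp(logw - logw.max())
        W /= W.sum()
    return X, W

# Illustrative run: temper from q = N(0, 1) to pi = N(3, 0.5^2).
rng = np.random.default_rng(4)
log_q = lambda x: -0.5 * x ** 2
log_p = lambda x: -0.5 * ((x - 3.0) / 0.5) ** 2
log_pis = [lambda x, g=g: g * log_p(x) + (1 - g) * log_q(x)
           for g in np.linspace(0.0, 1.0, 21)]
X, W = smc_sampler(log_pis, lambda N, rng: rng.normal(size=N), 5000, rng)
post_mean = np.sum(W * X)                                 # should be close to 3
```

A single MH step per iteration is the simplest choice; in practice several moves, or a kernel adapted to the current particle cloud, are often used.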

  17. Adaptive resampling. Problem: if K_t is a Metropolis–Hastings kernel with invariant distribution π_t, its computational cost is typically linear in t. Thus applying K_t at each step t ∈ {1, …, T} yields an overall cost of Σ_{t=1}^T O(t) = O(T²). We can save time by performing the resample-move step only when necessary, using the effective sample size to decide whether to trigger it.

  18. Sequential Monte Carlo Sampler: algorithm. At time t = 0: sample X_0^i ∼ π_0(·) for i ∈ {1, …, N}; compute the weights w_0^i = 1/N for all i ∈ {1, …, N}. At time t ≥ 1: if ESS < Ñ, resample (w_{t−1}^i, X_{t−1}^i) → (1/N, X̄_{t−1}^i) and draw X_t^i ∼ K_{t−1}(X̄_{t−1}^i, dx); otherwise set X_t^i = X_{t−1}^i. Compute the weights w_t^i = w_{t−1}^i × π_t(X_t^i) / π_{t−1}(X_t^i). At all times, (w_t^i, X_t^i)_{i=1}^N approximates π_t.
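A sketch of the adaptive variant, with the resample-move step triggered only when the ESS falls below a threshold Ñ. Here Ñ = N/2, the Metropolis move kernel, and the tempered Gaussian example are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def adaptive_smc(log_pis, sample_pi0, N, rng, ess_frac=0.5, mh_step=0.5):
    """Adaptive SMC sampler sketch: weights accumulate across steps, and the
    resample-move step (multinomial resampling plus one random-walk
    Metropolis step invariant for pi_{t-1}) fires only when ESS < ess_frac * N."""
    X = sample_pi0(N, rng)
    logw = np.zeros(N)
    n_resamples = 0
    for t in range(1, len(log_pis)):
        W = np.exp(logw - logw.max())
        W /= W.sum()
        if 1.0 / np.sum(W ** 2) < ess_frac * N:          # ESS below threshold?
            X = X[rng.choice(N, size=N, p=W)]            # resample
            logw[:] = 0.0                                # weights reset to 1/N
            prop = X + mh_step * rng.normal(size=N)      # move (MH w.r.t. pi_{t-1})
            accept = np.log(rng.uniform(size=N)) < log_pis[t - 1](prop) - log_pis[t - 1](X)
            X = np.where(accept, prop, X)
            n_resamples += 1
        logw += log_pis[t](X) - log_pis[t - 1](X)        # reweight
    W = np.exp(logw - logw.max())
    return X, W / W.sum(), n_resamples

# Same illustrative tempering as before, from q = N(0, 1) to pi = N(3, 0.5^2).
rng = np.random.default_rng(5)
log_q = lambda x: -0.5 * x ** 2
log_p = lambda x: -0.5 * ((x - 3.0) / 0.5) ** 2
log_pis = [lambda x, g=g: g * log_p(x) + (1 - g) * log_q(x)
           for g in np.linspace(0.0, 1.0, 41)]
X, W, n_resamples = adaptive_smc(log_pis, lambda N, rng: rng.normal(size=N), 5000, rng)
post_mean = np.sum(W * X)
```

Counting n_resamples shows the saving: the expensive resample-move step fires only at a subset of the T steps.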

  19. Toy model. Figure: effective sample size against “time”, using sequential Monte Carlo sampling; dashed lines indicate resampling times.

  20. Toy model. Figure: number of unique values against “time”, using sequential Monte Carlo sampling; dashed lines indicate resampling times.

  21. Toy model. Figure: posterior approximation of π_T(µ) (in black) and true posterior (in red), after 1000 observations.

  22. Evidence estimation. Assume (w_t^i, X_t^i)_{i=1}^N approximates the posterior distribution π_t(θ) ∝ p(θ) p(y_{1:t} | θ) at time t. Then the estimator Σ_{i=1}^N w_t^i p(y_{t+1} | X_t^i) / Σ_{i=1}^N w_t^i converges to ∫ p(y_{t+1} | θ) π_t(θ) dθ = ∫ p(y_{t+1} | θ) p(θ) p(y_{1:t} | θ) / p(y_{1:t}) dθ = (1/p(y_{1:t})) ∫ p(y_{1:t+1} | θ) p(θ) dθ = p(y_{t+1} | y_{1:t}). Similar to the likelihood estimator in particle filters.
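In code, the one-step-ahead predictive estimate is a weighted average of likelihood terms; a sketch (the two-particle example and the made-up likelihood values at the end are purely illustrative):

```python
import numpy as np

def predictive_estimate(w, X, lik_next):
    """Estimate p(y_{t+1} | y_{1:t}) by
    sum_i w_t^i * p(y_{t+1} | X_t^i) / sum_i w_t^i."""
    w = np.asarray(w, dtype=float)
    return np.sum(w * lik_next(X)) / w.sum()

# Illustrative check with two equally weighted particles.
w = np.array([1.0, 1.0])
X = np.array([0.0, 1.0])
lik = lambda x: 0.2 + 0.2 * x               # p(y_{t+1} | X^i) = 0.2 and 0.4
estimate = predictive_estimate(w, X, lik)   # (0.2 + 0.4) / 2 = 0.3
```

Multiplying these predictive estimates over t yields an estimate of the full evidence p(y_{1:T}), by the chain rule p(y_{1:T}) = ∏_t p(y_{t+1} | y_{1:t}).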

  23. Toy model. We compare the SMC estimator with the estimator obtained by importance sampling from the prior distribution: X^i ∼ p(θ) for all i ∈ {1, …, N}; L_t^i = p(y_{1:t} | X^i) for all i ∈ {1, …, N}; p̂_N(y_{1:t}) = (1/N) Σ_{i=1}^N L_t^i. We plot the relative error r_t^N = |p̂_N(y_{1:t}) − p(y_{1:t})| / p(y_{1:t}).
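For this conjugate Gaussian toy model the evidence p(y_{1:t}) is available in closed form (marginally, y_{1:t} is jointly N(0, I + 11ᵀ)), so the relative error of the prior-IS estimator can be computed directly. A sketch with regenerated data and t = 50 (shorter than the slides' 1000, an assumption made so that prior IS is not hopeless):

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy model: y_s ~ N(mu, 1) with mu* = 2, prior mu ~ N(0, 1); here t = 50.
y = rng.normal(2.0, 1.0, size=50)
t = len(y)

# Importance sampling from the prior: L_t^i = p(y_{1:t} | X^i), X^i ~ p(theta).
N = 100_000
mu = rng.normal(0.0, 1.0, size=N)
loglik = (-0.5 * (y[None, :] - mu[:, None]) ** 2).sum(axis=1) \
         - 0.5 * t * np.log(2.0 * np.pi)
log_p_hat = np.logaddexp.reduce(loglik) - np.log(N)      # log p_N(y_{1:t})

# Exact evidence: marginally y_{1:t} ~ N(0, I + 11^T), using
# det(I + 11^T) = 1 + t and (I + 11^T)^{-1} = I - 11^T / (1 + t).
S = y.sum()
log_p = -0.5 * t * np.log(2.0 * np.pi) - 0.5 * np.log(1.0 + t) \
        - 0.5 * (np.sum(y ** 2) - S ** 2 / (1.0 + t))
rel_error = abs(np.exp(log_p_hat - log_p) - 1.0)         # r_t^N
```

As t grows the prior becomes a worse proposal for the concentrating likelihood and the relative error of prior IS blows up, which is what the comparison on the next slide shows.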

  24. Toy model. Figure: relative error using SMC samplers and importance sampling from the prior; 10 independent experiments.

  25. The End. Inversion, Transformation, Composition, Accept-reject, Importance Sampling, Metropolis–Hastings, Gibbs sampling, Adaptive Multiple Importance Sampling, Reversible Jump, Slice Sampling, Sequential Importance Sampling, Particle filter, Pseudo-marginal Metropolis–Hastings, Sequential Monte Carlo Sampler... What about Nested Sampling, Path Sampling, Hamiltonian MCMC, Pinball Sampler, Sequential Quasi-Monte Carlo, Multiple-Try Metropolis–Hastings, Coupling From the Past, Multilevel Splitting, Wang–Landau algorithm, Free Energy MCMC, Stochastic Gradient Langevin Dynamics, Firefly MCMC, Configurational Bias Monte Carlo...? Good luck for the exams!
