metropolis hastings algorithm
play

Metropolis-Hastings algorithm Dr. Jarad Niemi STAT 544 - Iowa State - PowerPoint PPT Presentation

Metropolis-Hastings algorithm Dr. Jarad Niemi STAT 544 - Iowa State University April 2, 2019 Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 1 / 32 Outline Metropolis-Hastings algorithm Independence proposal Random-walk


  1. Metropolis-Hastings algorithm Dr. Jarad Niemi STAT 544 - Iowa State University April 2, 2019 Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 1 / 32

  2. Outline Metropolis-Hastings algorithm Independence proposal Random-walk proposal Optimal tuning parameter Binomial example Normal example Binomial hierarchical example Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 2 / 32

  3. Metropolis-Hastings algorithm Metropolis-Hastings algorithm Let p ( θ | y ) be the target distribution and θ ( t ) be the current draw from p ( θ | y ) . The Metropolis-Hastings algorithm performs the following 1. propose θ ∗ ∼ g ( θ | θ ( t ) ) 2. accept θ ( t +1) = θ ∗ with probability min { 1 , r } where r = r ( θ ( t ) , θ ∗ ) = p ( θ ∗ | y ) /g ( θ ∗ | θ ( t ) ) p ( θ ( t ) | y ) /g ( θ ( t ) | θ ∗ ) = p ( θ ∗ | y ) g ( θ ( t ) | θ ∗ ) p ( θ ( t ) | y ) g ( θ ∗ | θ ( t ) ) otherwise, set θ ( t +1) = θ ( t ) . Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 3 / 32

  4. Metropolis-Hastings algorithm Metropolis-Hastings algorithm Suppose we only know the target up to a normalizing constant, i.e. p ( θ | y ) = q ( θ | y ) /q ( y ) where we only know q ( θ | y ) . The Metropolis-Hastings algorithm performs the following 1. propose θ ∗ ∼ g ( θ | θ ( t ) ) 2. accept θ ( t +1) = θ ∗ with probability min { 1 , r } where r = r ( θ ( t ) , θ ∗ ) = p ( θ ∗ | y ) g ( θ ∗ | θ ( t ) ) = q ( θ ∗ | y ) /q ( y ) g ( θ ( t ) | θ ∗ ) g ( θ ( t ) | θ ∗ ) g ( θ ∗ | θ ( t ) ) = q ( θ ∗ | y ) g ( θ ( t ) | θ ∗ ) p ( θ ( t ) | y ) q ( θ ( t ) | y ) /q ( y ) q ( θ ( t ) | y ) g ( θ ∗ | θ ( t ) ) otherwise, set θ ( t +1) = θ ( t ) . Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 4 / 32

  5. Metropolis-Hastings algorithm Two standard Metropolis-Hastings algorithms Independent Metropolis-Hastings Independent proposal, i.e. g ( θ | θ ( t ) ) = g ( θ ) Random-walk Metropolis Symmetric proposal, i.e. g ( θ | θ ( t ) ) = g ( θ ( t ) | θ ) for all θ, θ ( t ) . Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 5 / 32

  6. Independence Metropolis-Hastings Independence Metropolis-Hastings Let p ( θ | y ) ∝ q ( θ | y ) be the target distribution, θ ( t ) be the current draw from p ( θ | y ) , and g ( θ | θ ( t ) ) = g ( θ ) , i.e. the proposal is independent of the current value. The independence Metropolis-Hastings algorithm performs the following 1. propose θ ∗ ∼ g ( θ ) 2. accept θ ( t +1) = θ ∗ with probability min { 1 , r } where q ( θ ( t ) | y ) /g ( θ ( t ) ) = q ( θ ∗ | y ) q ( θ ∗ | y ) /g ( θ ∗ ) g ( θ ( t ) ) r = g ( θ ∗ ) q ( θ ( t ) | y ) otherwise, set θ ( t +1) = θ ( t ) . Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 6 / 32

  7. Independence Metropolis-Hastings Intuition through examples proposed= −1 proposed= 0 proposed= 1 0.4 0.3 current= −1 0.2 distribution 0.1 proposal 0.0 target 0.4 0.3 accept current= 0 0.2 FALSE y TRUE 0.1 0.0 value 0.4 current 0.3 current= 1 proposed 0.2 0.1 0.0 −2 −1 0 1 2 3 −2 −1 0 1 2 3 −2 −1 0 1 2 3 theta Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 7 / 32

  8. Independence Metropolis-Hastings Example: Normal-Cauchy model Let Y ∼ N ( θ, 1) with θ ∼ Ca (0 , 1) such that the posterior is p ( θ | y ) ∝ p ( y | θ ) p ( θ ) ∝ exp( − ( y − θ ) 2 / 2) 1 + θ 2 Use N ( y, 1) as the proposal, then the Metropolis-Hastings acceptance probability is the min { 1 , r } with q ( θ ∗ | y ) g ( θ ( t ) ) r = g ( θ ∗ ) q ( θ ( t ) | y ) exp( − ( y − θ ∗ ) 2 / 2) / 1+( θ ∗ ) 2 exp( − ( θ ( t ) − y ) 2 / 2) = exp( − ( θ ∗ − y ) 2 / 2) exp( − ( y − θ ( t ) ) 2 / 2) / 1+( θ ( t ) ) 2 = 1+( θ ( t ) ) 2 1+( θ ∗ ) 2 Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 8 / 32

  9. Independence Metropolis-Hastings Example: Normal-Cauchy model 0.4 distribution density proposal target 0.2 0.0 −2 0 2 theta Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 9 / 32

  10. Independence Metropolis-Hastings Example: Normal-Cauchy model Independence Metropolis−Hastings 1 θ 0 −1 0 25 50 75 100 Iteration (t) Independence Metropolis−Hastings (poor starting value) 10.0 7.5 5.0 θ 2.5 0.0 0 25 50 75 100 Iteration (t) Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 10 / 32

  11. Independence Metropolis-Hastings Need heavy tails Recall that rejection sampling requires the proposal to have heavy tails and importance sampling is efficient only when the proposal has heavy tails. Independence Metropolis-Hastings also requires heavy tailed proposals for efficiency since if θ ( t ) is in a region where p ( θ ( t ) | y ) >> g ( θ ( t ) ) , i.e. target has heavier tails than the proposal, then any proposal θ ∗ such that p ( θ ∗ | y ) ≈ g ( θ ∗ ) , i.e. in the center of the target and proposal, will result in r = g ( θ ( t ) ) p ( θ ∗ | y ) g ( θ ∗ ) ≈ 0 p ( θ ( t ) | y ) and few samples will be accepted. Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 11 / 32

  12. Independence Metropolis-Hastings Need heavy tails - example Suppose θ | y ∼ Ca (0 , 1) and we use a standard normal as a proposal. Then 0.4 0.3 density value 0.2 target proposal 0.1 0.0 −4 −2 0 2 4 x 100 target / proposal 10 1 −4 −2 0 2 4 theta Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 12 / 32

  13. Independence Metropolis-Hastings Need heavy tails 2 θ 0 −2 0 250 500 750 1000 Iteration (t) Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 13 / 32

  14. Random-walk Metropolis Random-walk Metropolis Let p ( θ | y ) ∝ q ( θ | y ) be the target distribution, θ ( t ) be the current draw from p ( θ | y ) , and g ( θ ∗ | θ ( t ) ) = g ( θ ( t ) | θ ∗ ) , i.e. the proposal is symmetric. The Metropolis algorithm performs the following 1. propose θ ∗ ∼ g ( θ | θ ( t ) ) 2. accept θ ( t +1) = θ ∗ with probability min { 1 , r } where r = q ( θ ∗ | y ) g ( θ ∗ | θ ( t ) ) = q ( θ ∗ | y ) g ( θ ( t ) | θ ∗ ) q ( θ ( t ) | y ) q ( θ ( t ) | y ) otherwise, set θ ( t +1) = θ ( t ) . This is also referred to as random-walk Metropolis. Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 14 / 32

  15. Random-walk Metropolis Stochastic hill climbing Notice that r = q ( θ ∗ | y ) /q ( θ ( t ) | y ) and thus will accept whenever the target density is larger when evaluated at the proposed value than it is when evaluated at the current value. Suppose θ | y ∼ N (0 , 1) , θ ( t ) = 1 , and θ ∗ ∼ N ( θ ( t ) , 1) . 0.4 Target Proposal 0.3 dnorm(x) 0.2 0.1 0.0 −3 −2 −1 0 1 2 3 x Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 15 / 32

  16. Random-walk Metropolis Example: Normal-Cauchy model Let Y ∼ N ( θ, 1) with θ ∼ Ca (0 , 1) such that the posterior is p ( θ | y ) ∝ p ( y | θ ) p ( θ ) ∝ exp( − ( y − θ ) 2 / 2) 1 + θ 2 Use N ( θ ( t ) , v 2 ) as the proposal, then the acceptance probability is the min { 1 , r } with r = q ( θ ∗ | y ) p ( y | θ ∗ ) p ( θ ∗ ) q ( θ ( t ) | y ) = p ( y | θ ( t ) ) p ( θ ( t ) ) . For this example, let v 2 = 1 . Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 16 / 32

  17. Random-walk Metropolis Example: Normal-Cauchy model Random−walk Metropolis 2 1 0 θ −1 −2 0 25 50 75 100 t Random−walk Metropolis (poor starting value) 10.0 7.5 5.0 θ 2.5 0.0 0 25 50 75 100 t Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 17 / 32

  18. Random-walk Metropolis Optimal tuning parameter Random-walk tuning parameter Let p ( θ | y ) be the target distribution, the proposal is symmetric with scale v 2 , and θ ( t ) is (approximately) distributed according to p ( θ | y ) . If v 2 ≈ 0 , then θ ∗ ≈ θ ( t ) and r = q ( θ ∗ | y ) q ( θ ( t ) | y ) ≈ 1 and all proposals are accepted, but θ ∗ ≈ θ ( t ) . As v 2 → ∞ , then q ( θ ∗ | y ) ≈ 0 since θ ∗ will be far from the mass of the target distribution and r = q ( θ ∗ | y ) q ( θ ( t ) | y ) ≈ 0 so all proposed values are rejected. So there is an optimal v 2 somewhere. For normal targets, the optimal random-walk proposal variance is 2 . 4 2 V ar ( θ | y ) /d where d is the dimension of θ which results in an acceptance rate of 40% for d = 1 down to 20% as d → ∞ . Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 18 / 32

  19. Random-walk Metropolis Optimal tuning parameter Random-walk with tuning parameter that is too big and too small Let y | θ ∼ N ( θ, 1) , θ ∼ Ca (0 , 1) , and y = 1 . 0.8 0.4 as.factor(v) theta 0.1 0.0 10 −0.4 0 25 50 75 100 iteration Jarad Niemi (STAT544@ISU) Metropolis-Hastings April 2, 2019 19 / 32

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend