CSci 8980: Advanced Topics in Graphical Models
MCMC, Gibbs Sampling

Instructor: Arindam Banerjee
September 27, 2007

Outline: Basics, MCMC, Gibbs Sampling, Auxiliary Variable Samplers, Problems


Importance Sampling (Contd.)

- Choose $q(x)$ to minimize the variance of $\hat{I}_n(f)$:
  $\mathrm{var}_q(f(x)\,w(x)) = E_q[f^2(x)\,w^2(x)] - I^2(f)$
- Applying Jensen's inequality and optimizing gives
  $q^*(x) = \dfrac{|f(x)|\,p(x)}{\int |f(x)|\,p(x)\,dx}$
- Efficient sampling focuses on regions where $|f(x)|\,p(x)$ is large
- Super-efficient sampling: the variance can be lower than even with $q(x) = p(x)$
- This is exploited to evaluate probabilities of rare events, with $q(x) \propto I_E(x)\,p(x)$ (sketched below)
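To make the rare-event point concrete, here is a minimal Python sketch (NumPy and SciPy assumed; the threshold and event are illustrative, not from the slides). It estimates $P(X > 4)$ for $X \sim N(0,1)$. Because $f(x) = I_E(x)$ is an indicator, the optimal proposal $q^*(x) \propto I_E(x)\,p(x)$ is simply the normal truncated to the event region, and the importance weights come out constant, so the estimator is exact here; this is the super-efficiency noted above, while naive Monte Carlo with the same sample size would almost always return 0.

```python
import numpy as np
from scipy import stats

# Rare event: I(f) = P(X > 4) for X ~ N(0, 1).
# Naive Monte Carlo almost never sees a sample past 4; importance
# sampling with q(x) proportional to I_E(x) p(x) (the normal truncated
# to [4, inf)) puts every sample inside the event region.
rng = np.random.default_rng(0)
n = 10_000
a = 4.0

# Proposal: standard normal truncated to [a, inf).
q = stats.truncnorm(a, np.inf)
x = q.rvs(size=n, random_state=rng)

# Importance weights w(x) = p(x) / q(x); here f(x) = 1 on all samples.
w = stats.norm.pdf(x) / q.pdf(x)
est = np.mean(w)

print(f"IS estimate : {est:.3e}")
print(f"exact value : {stats.norm.sf(a):.3e}")
```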

Markov Chains

- Use a Markov chain to explore the state space
- A Markov chain in a discrete space is a process with
  $p(x_i \mid x_{i-1}, \ldots, x_1) = T(x_i \mid x_{i-1})$
- A chain is homogeneous if $T$ is invariant for all $i$
- The chain will stabilize into an invariant distribution if it is:
  1. Irreducible: the transition graph is connected
  2. Aperiodic: it does not get trapped in cycles
- A sufficient condition ensuring $p(x)$ is the invariant distribution is detailed balance:
  $p(x_i)\,T(x_{i-1} \mid x_i) = p(x_{i-1})\,T(x_i \mid x_{i-1})$
- In MCMC samplers, the invariant distribution is the target distribution
- Samplers are designed for fast convergence (see the check below)
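As a sanity check on these conditions, here is a minimal sketch (NumPy assumed; the three-state target is an arbitrary illustration): build a Metropolis-style transition matrix that satisfies detailed balance for a chosen $p$, then confirm that iterating the chain from any starting distribution converges to $p$.

```python
import numpy as np

# Target distribution on a 3-state space.
p = np.array([0.2, 0.3, 0.5])
k = len(p)

# Metropolis transition matrix with a uniform proposal:
# T(j|i) = (1/k) * min(1, p_j / p_i) for j != i, self-loop takes the rest.
# By construction p_i T(j|i) = p_j T(i|j)  (detailed balance).
T = np.zeros((k, k))
for i in range(k):
    for j in range(k):
        if i != j:
            T[i, j] = (1.0 / k) * min(1.0, p[j] / p[i])
    T[i, i] = 1.0 - T[i].sum()

# Verify detailed balance and invariance.
assert np.allclose(p[:, None] * T, (p[:, None] * T).T)
assert np.allclose(p @ T, p)

# Any starting distribution converges to p (irreducible + aperiodic).
mu = np.array([1.0, 0.0, 0.0])
for _ in range(100):
    mu = mu @ T
print(mu)   # close to [0.2, 0.3, 0.5]
```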

Markov Chains (Contd.)

- Example: a random walker on the web
  - Irreducibility: the walker should be able to reach all pages
  - Aperiodicity: the walker should not get stuck in a loop
- PageRank uses $T = L + E$ (a toy version is sketched below)
  - $L$ = link matrix for the web graph
  - $E$ = uniform random matrix, to ensure irreducibility and aperiodicity
- The invariant distribution $p(x)$ represents the rank of webpage $x$
- In continuous spaces, $T$ becomes an integral kernel $K$:
  $\int_{x_i} p(x_i)\,K(x_{i+1} \mid x_i)\,dx_i = p(x_{i+1})$
- $p(x)$ is the corresponding eigenfunction
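A toy PageRank chain, assuming NumPy. The slide writes $T = L + E$ with the weighting absorbed into the matrices; the sketch below uses the common damped form $T = (1-d)L + dE$ with $d = 0.15$, which is an assumed convention, and the four-page web graph is made up for illustration.

```python
import numpy as np

d = 0.15                            # damping weight (assumed convention)
# Adjacency for a toy 4-page web graph: page i links to pages adj[i].
adj = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n = 4

# L = row-stochastic link matrix of the web graph.
L = np.zeros((n, n))
for i, outs in adj.items():
    L[i, outs] = 1.0 / len(outs)

# E = uniform random-jump matrix: guarantees irreducibility, aperiodicity.
E = np.full((n, n), 1.0 / n)
T = (1 - d) * L + d * E

# Invariant distribution = PageRank vector, via power iteration.
r = np.full(n, 1.0 / n)
for _ in range(100):
    r = r @ T
print(r, r.sum())                   # ranks, summing to 1
```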

The Metropolis-Hastings Algorithm

- The most popular MCMC method
- Based on a proposal distribution $q(x^* \mid x)$
- Algorithm: for $i = 0, \ldots, n-1$ (a runnable sketch follows below):
  - Sample $u \sim U(0, 1)$
  - Sample $x^* \sim q(x^* \mid x_i)$
  - Then
    $x_{i+1} = \begin{cases} x^* & \text{if } u < A(x_i, x^*) = \min\left\{1,\ \dfrac{p(x^*)\,q(x_i \mid x^*)}{p(x_i)\,q(x^* \mid x_i)}\right\} \\ x_i & \text{otherwise} \end{cases}$
- The transition kernel is
  $K_{MH}(x_{i+1} \mid x_i) = q(x_{i+1} \mid x_i)\,A(x_i, x_{i+1}) + \delta_{x_i}(x_{i+1})\,r(x_i)$
  where $r(x_i)$ is the term associated with rejection:
  $r(x_i) = \int_x q(x \mid x_i)\,(1 - A(x_i, x))\,dx$
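The algorithm above transcribes almost line-for-line into code. Below is a minimal sketch (NumPy assumed) that works with log densities for numerical stability; the two-mode Gaussian-mixture target and the random-walk proposal are illustrative choices, not from the slides. Accepting when $\log u < \log A$ reproduces the $\min\{1, \cdot\}$ automatically, since $\log u < 0$ always.

```python
import numpy as np

def metropolis_hastings(log_p, q_sample, log_q, x0, n):
    """Generic MH sampler following the slide's pseudocode.

    log_p    : log target density (up to a constant)
    q_sample : draws x* ~ q(. | x)
    log_q    : evaluates log q(x_new | x_old), up to a constant
    """
    rng = np.random.default_rng(0)
    x = x0
    samples = np.empty(n)
    for i in range(n):
        u = rng.uniform()                         # u ~ U(0, 1)
        x_star = q_sample(x, rng)                 # x* ~ q(x* | x_i)
        # log of p(x*) q(x_i | x*) / (p(x_i) q(x* | x_i))
        log_a = (log_p(x_star) + log_q(x, x_star)
                 - log_p(x) - log_q(x_star, x))
        x = x_star if np.log(u) < log_a else x    # accept or reject
        samples[i] = x
    return samples

# Illustrative target: equal mixture of N(-2, 1) and N(2, 1),
# known only up to its normalizing constant.
def log_p(x):
    return np.logaddexp(-0.5 * (x + 2) ** 2, -0.5 * (x - 2) ** 2)

# Gaussian random-walk proposal (symmetric, so the q terms cancel anyway).
sigma = 1.0
q_sample = lambda x, rng: x + sigma * rng.normal()
log_q = lambda x_new, x_old: -0.5 * ((x_new - x_old) / sigma) ** 2

samples = metropolis_hastings(log_p, q_sample, log_q, x0=0.0, n=20_000)
print(samples.mean(), samples.std())   # mean ~ 0, std ~ sqrt(5) ~ 2.24
```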

The Metropolis-Hastings Algorithm (Contd.)

- By construction,
  $p(x_i)\,K_{MH}(x_{i+1} \mid x_i) = p(x_{i+1})\,K_{MH}(x_i \mid x_{i+1})$
- This implies $p(x)$ is the invariant distribution
- Basic properties:
  - Irreducibility: ensure the support of $q$ contains the support of $p$
  - Aperiodicity: ensured, since rejection is always a possibility
- Independent sampler: $q(x^* \mid x_i) = q(x^*)$, so that
  $A(x_i, x^*) = \min\left\{1,\ \dfrac{p(x^*)\,q(x_i)}{p(x_i)\,q(x^*)}\right\}$
  (see the sketch below)
- Metropolis sampler: symmetric $q(x^* \mid x_i) = q(x_i \mid x^*)$, so that
  $A(x_i, x^*) = \min\left\{1,\ \dfrac{p(x^*)}{p(x_i)}\right\}$
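For contrast, a standalone sketch of the independent sampler on the same illustrative target; the wide $N(0, 3^2)$ proposal is an assumed choice that keeps the support of $q$ covering the support of $p$, as required above.

```python
import numpy as np

# Independent sampler: the proposal ignores the current state,
# q(x* | x_i) = q(x*).
rng = np.random.default_rng(1)
log_p = lambda x: np.logaddexp(-0.5 * (x + 2) ** 2, -0.5 * (x - 2) ** 2)
log_q = lambda x: -0.5 * (x / 3.0) ** 2   # log of N(0, 3^2), up to a constant

x, samples = 0.0, []
for _ in range(20_000):
    x_star = 3.0 * rng.normal()
    # A = min{1, p(x*) q(x) / (p(x) q(x*))}, handled in log space
    log_a = log_p(x_star) + log_q(x) - log_p(x) - log_q(x_star)
    if np.log(rng.uniform()) < log_a:
        x = x_star
    samples.append(x)
print(np.mean(samples), np.std(samples))
```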


Simulated Annealing

- Problem: find the global maximum of $p(x)$
- Initial idea: run MCMC, estimate $\hat{p}(x)$, compute the max
- Issue: the chain may never come close to the mode(s)
- Instead, simulate a non-homogeneous Markov chain whose invariant distribution at iteration $i$ is $p_i(x) \propto p^{1/T_i}(x)$
- The sample update follows
  $x_{i+1} = \begin{cases} x^* & \text{if } u < A(x_i, x^*) = \min\left\{1,\ \dfrac{p^{1/T_i}(x^*)\,q(x_i \mid x^*)}{p^{1/T_i}(x_i)\,q(x^* \mid x_i)}\right\} \\ x_i & \text{otherwise} \end{cases}$
- $T_i$ decreases following a cooling schedule, with $\lim_{i \to \infty} T_i = 0$
- The cooling schedule needs a proper choice, e.g., $T_i = \dfrac{C}{\log(i + T_0)}$ (used in the sketch below)
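A minimal sketch of simulated annealing (NumPy assumed; the two-mode objective and proposal scale are illustrative). The chain starts in the wrong basin, and the tempered acceptance rule lets early moves escape it before the schedule freezes the chain near a maximum.

```python
import numpy as np

# Maximize an unnormalized p(x) whose smaller local mode (near -1)
# could trap a greedy search; the global maximum is near x = 2.
rng = np.random.default_rng(0)

def log_p(x):
    return np.logaddexp(np.log(0.3) - 2.0 * (x + 1) ** 2,
                        np.log(0.7) - 2.0 * (x - 2) ** 2)

x = -1.0                        # start in the wrong basin
best = x
C, T0 = 1.0, 2.0
for i in range(1, 5_000):
    T = C / np.log(i + T0)      # cooling schedule T_i = C / log(i + T_0)
    x_star = x + rng.normal()   # symmetric random-walk proposal
    # acceptance targets the tempered distribution p^(1/T_i)
    if np.log(rng.uniform()) < (log_p(x_star) - log_p(x)) / T:
        x = x_star
    if log_p(x) > log_p(best):
        best = x
print(best)                     # close to 2, the global maximizer
```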


Monte Carlo EM

- The E-step involves computing an expectation over the latent variables:
  $Q(\theta, \theta_n) = \int_z \log p(x, z \mid \theta)\,p(z \mid x, \theta_n)\,dz$
- Estimate the expectation using MCMC
- Draw samples using MH with acceptance probability
  $A(z, z^*) = \min\left\{1,\ \dfrac{p(x \mid z^*, \theta_n)\,p(z^* \mid \theta_n)\,q(z \mid z^*)}{p(x \mid z, \theta_n)\,p(z \mid \theta_n)\,q(z^* \mid z)}\right\}$
- Several variants (see the sketch below):
  - Stochastic EM: draw one sample
  - Monte Carlo EM: draw multiple samples
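A sketch of Monte Carlo EM on a toy two-component Gaussian mixture with unknown means (NumPy assumed; the model and all settings are illustrative). In this toy model $p(z \mid x, \theta_n)$ is tractable, so the E-step draws $z$ exactly rather than via the MH step above, which is needed when exact sampling is not possible. Setting $m = 1$ gives the stochastic EM variant.

```python
import numpy as np

# Toy model: equal-weight mixture of N(mu0, 1) and N(mu1, 1),
# unknown means, latent assignments z.
rng = np.random.default_rng(0)

# Synthetic data from true means (-2, 2).
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])

mu = np.array([-0.5, 0.5])   # initial estimate theta_0
m = 10                       # z-samples per E-step (m = 1: stochastic EM)

for _ in range(50):
    # Monte Carlo E-step: draw z ~ p(z | x, theta_n), m times.
    log_r0 = -0.5 * (x - mu[0]) ** 2
    log_r1 = -0.5 * (x - mu[1]) ** 2
    p1 = np.exp(log_r1 - np.logaddexp(log_r0, log_r1))  # P(z=1 | x, theta_n)
    z = rng.uniform(size=(m, x.size)) < p1              # sampled assignments

    # M-step: maximize the Monte Carlo estimate of Q(theta, theta_n);
    # for this model it reduces to weighted means over the sampled z's.
    w1 = z.mean(axis=0)
    mu = np.array([np.sum((1 - w1) * x) / np.sum(1 - w1),
                   np.sum(w1 * x) / np.sum(w1)])

print(mu)   # close to the true means (-2, 2)
```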

Mixtures of MCMC Kernels

- A powerful property of MCMC: samplers can be combined
- Let $K_1$, $K_2$ be kernels with invariant distribution $p$
  - The mixture kernel $\alpha K_1 + (1 - \alpha) K_2$, $\alpha \in [0, 1]$, converges to $p$
  - The cycle kernel $K_1 K_2$ converges to $p$
- Mixtures can use global and local proposals:
  - Global proposals explore the entire space (with probability $\alpha$)
  - Local proposals discover finer details (with probability $1 - \alpha$)
- Example: the target has many narrow peaks (sketched below)
  - The global proposal finds the peaks
  - Local proposals explore the neighborhood of each peak (random walk)
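A sketch of a mixture kernel on exactly this kind of target (NumPy assumed; the three-peak target and proposal scales are illustrative): with probability $\alpha$ a wide independent proposal jumps between peaks, otherwise a small random walk refines locally. A pure random walk at this step size would rarely cross between the peaks.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(x):
    # three narrow, equally weighted peaks
    centers = np.array([-5.0, 0.0, 5.0])
    return np.logaddexp.reduce(-50.0 * (x - centers) ** 2)

alpha, sigma_loc, sigma_glob = 0.2, 0.1, 5.0
x, samples = 0.0, []
for _ in range(50_000):
    if rng.uniform() < alpha:
        # global kernel: independent proposal x* ~ N(0, sigma_glob^2)
        x_star = sigma_glob * rng.normal()
        log_q_ratio = (-0.5 * (x / sigma_glob) ** 2
                       + 0.5 * (x_star / sigma_glob) ** 2)
    else:
        # local kernel: symmetric random walk, q terms cancel
        x_star = x + sigma_loc * rng.normal()
        log_q_ratio = 0.0
    if np.log(rng.uniform()) < log_p(x_star) - log_p(x) + log_q_ratio:
        x = x_star
    samples.append(x)

samples = np.asarray(samples)
# mass near each peak should be roughly 1/3
print([(np.abs(samples - c) < 1).mean() for c in (-5, 0, 5)])
```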

Cycles of MCMC Kernels

- Split a multivariate state into blocks
- Each block can be updated separately (a sketch follows below)
- Convergence is faster if correlated variables are blocked together
- The transition kernel is given by
  $K_{MHCycle}(x^{(i+1)} \mid x^{(i)}) = \prod_{j=1}^{n_b} K_{MH(j)}\!\left(x_{b_j}^{(i+1)} \,\middle|\, x_{b_j}^{(i)}, x_{-[b_j]}^{(i+1)}\right)$
- There is a trade-off on block size: if blocks are small, the chain takes a long time to explore the space
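A minimal sketch of a cycle kernel, often called Metropolis-within-Gibbs (NumPy assumed; the correlated Gaussian target and single-coordinate blocks are illustrative). Each sweep applies one MH kernel per block, conditioning on the already-updated blocks; the slow mixing of small blocks on strongly correlated variables is the trade-off noted above.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(x):
    # correlated 2-D Gaussian target (rho = 0.9), unnormalized
    rho = 0.9
    return -(x[0] ** 2 - 2 * rho * x[0] * x[1] + x[1] ** 2) / (2 * (1 - rho ** 2))

x = np.zeros(2)
samples = np.empty((20_000, 2))
for i in range(20_000):
    for j in range(2):                     # one MH kernel per block b_j
        x_star = x.copy()
        x_star[j] += 0.5 * rng.normal()    # symmetric proposal on block j only
        if np.log(rng.uniform()) < log_p(x_star) - log_p(x):
            x = x_star
    samples[i] = x

# sample correlation ~ 0.9; small blocks force small moves along the
# correlated ridge, so the chain explores it slowly
print(np.corrcoef(samples.T)[0, 1])
```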
