CSci 8980: Advanced Topics in Graphical Models. MCMC, Gibbs Sampling


Instructor: Arindam Banerjee
September 27, 2007

Outline: Basics, MCMC, Gibbs Sampling, Auxiliary Variable Samplers, Problems


  • Importance Sampling (Contd.)
    - Choose the q(x) that minimizes the variance of the estimator Î_n(f):
        var_q(f(x) w(x)) = E_q[f^2(x) w^2(x)] - I^2(f)
    - Applying Jensen's inequality and optimizing gives
        q*(x) = |f(x)| p(x) / ∫ |f(x)| p(x) dx
    - Efficient sampling focuses on regions where |f(x)| p(x) is high
    - Super-efficient sampling: the variance can be lower than even with q(x) = p(x)
    - Exploited to evaluate probabilities of rare events: q(x) ∝ I_E(x) p(x)
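The rare-event use of importance sampling can be sketched numerically. The example below is not from the slides; the target, event, and proposal are chosen for illustration. It estimates P(X > 4) under a standard normal by taking q supported only on the event region, mimicking q(x) ∝ I_E(x) p(x):

```python
import numpy as np

rng = np.random.default_rng(0)

n, c = 100_000, 4.0

# Proposal concentrated on the rare event {x > c}:
# a shifted exponential q(x) = exp(-(x - c)) for x > c.
x = c + rng.exponential(1.0, size=n)

# Importance weights w(x) = p(x) / q(x), computed in log space,
# with p the standard normal density.
log_p = -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)
log_q = -(x - c)
w = np.exp(log_p - log_q)

# Since f = I_E and q covers E, mean(w) estimates P(X > 4).
est = w.mean()
print(est)
```

A naive Monte Carlo estimate with the same n would see the event only a handful of times; here every sample lands in the event region and only the weights vary.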


  • Markov Chains
    - Use a Markov chain to explore the state space
    - A Markov chain on a discrete space is a process with
        p(x_i | x_{i-1}, ..., x_1) = T(x_i | x_{i-1})
    - The chain is homogeneous if T is invariant for all i
    - The chain stabilizes into an invariant distribution if it is
        1. Irreducible: the transition graph is connected
        2. Aperiodic: it does not get trapped in cycles
    - A sufficient condition ensuring p(x) is the invariant distribution is detailed balance:
        p(x_i) T(x_{i-1} | x_i) = p(x_{i-1}) T(x_i | x_{i-1})
    - MCMC samplers: invariant distribution = target distribution
    - Samplers are designed for fast convergence
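A minimal numeric sketch of these conditions (the 3-state transition matrix below is made up for illustration): every entry of T is positive, so the chain is irreducible and aperiodic, and repeated application of T drives any initial distribution to the invariant one.

```python
import numpy as np

# Rows are T(x_i | x_{i-1}); all entries positive, rows sum to 1.
T = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Start from a point mass and iterate the chain's one-step update.
mu = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    mu = mu @ T

# mu now satisfies the invariance condition mu = mu T.
print(mu)
```

Note that detailed balance is sufficient, not necessary: a chain like this converges to its invariant distribution whether or not T is reversible.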

  • Markov Chains (Contd.)
    - Random walker on the web:
        - Irreducibility: should be able to reach all pages
        - Aperiodicity: does not get stuck in a loop
    - PageRank uses T = L + E
        - L = link matrix for the web graph
        - E = uniform random matrix, to ensure irreducibility and aperiodicity
    - The invariant distribution p(x) represents the rank of webpage x
    - In continuous spaces, T becomes an integral kernel K:
        ∫ p(x_i) K(x_{i+1} | x_i) dx_i = p(x_{i+1})
    - p(x) is the corresponding eigenfunction
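The random-walker picture can be sketched with power iteration on a toy 4-page graph (made up here). The slide's T = L + E is written below in the common damped form T = αL + (1 - α)E with α = 0.85, which keeps every transition probability positive:

```python
import numpy as np

# Toy web graph: page j links to the pages in links[j].
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n = 4

# L = row-stochastic link matrix (no dangling pages in this example;
# pages without out-links would need special handling).
L = np.zeros((n, n))
for j, outs in links.items():
    for k in outs:
        L[j, k] = 1.0 / len(outs)

alpha = 0.85                      # damping factor (illustrative choice)
E = np.full((n, n), 1.0 / n)      # uniform jumps: irreducible, aperiodic
T = alpha * L + (1.0 - alpha) * E

# Power iteration: the invariant distribution is the PageRank vector.
p = np.full(n, 1.0 / n)
for _ in range(100):
    p = p @ T
print(p)   # page 2, with three in-links, collects the most rank
```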

  • The Metropolis-Hastings Algorithm
    - Most popular MCMC method
    - Based on a proposal distribution q(x* | x)
    - Algorithm: for i = 0, ..., n-1:
        1. Sample u ~ U(0, 1)
        2. Sample x* ~ q(x* | x_i)
        3. Set
            x_{i+1} = x*    if u < A(x_i, x*) = min{ 1, [p(x*) q(x_i | x*)] / [p(x_i) q(x* | x_i)] }
            x_{i+1} = x_i   otherwise
    - The transition kernel is
        K_MH(x_{i+1} | x_i) = q(x_{i+1} | x_i) A(x_i, x_{i+1}) + δ_{x_i}(x_{i+1}) r(x_i)
      where r(x_i) is the term associated with rejection:
        r(x_i) = ∫ q(x | x_i) (1 - A(x_i, x)) dx
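A sketch of this loop on a toy problem (the heavy-tailed target and the deliberately asymmetric proposal are illustrative choices, not from the slides). Because the proposal shrinks toward zero, the correction q(x_i | x*) / q(x* | x_i) in A does not cancel:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_p(x):
    # Unnormalized log target: standard Cauchy.
    return -np.log1p(x * x)

def log_q(x_new, x_old):
    # Asymmetric proposal q(x* | x) = N(x*; 0.5 x, 1), up to a constant.
    return -0.5 * (x_new - 0.5 * x_old) ** 2

x = 0.0
samples = []
for i in range(50_000):
    x_star = 0.5 * x + rng.normal()          # x* ~ q(x* | x_i)
    log_A = min(0.0, log_p(x_star) + log_q(x, x_star)
                     - log_p(x) - log_q(x_star, x))
    if np.log(rng.uniform()) < log_A:        # u < A(x_i, x*)
        x = x_star
    samples.append(x)

samples = np.array(samples[5_000:])          # discard burn-in
print(np.median(samples))   # the Cauchy target is symmetric about 0
```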

  • The Metropolis-Hastings Algorithm (Contd.)
    - By construction,
        p(x_i) K_MH(x_{i+1} | x_i) = p(x_{i+1}) K_MH(x_i | x_{i+1})
    - This implies p(x) is the invariant distribution
    - Basic properties:
        - Irreducibility: ensure the support of q contains the support of p
        - Aperiodicity: ensured, since rejection is always a possibility
    - Independent sampler: q(x* | x_i) = q(x*), so that
        A(x_i, x*) = min{ 1, [p(x*) q(x_i)] / [p(x_i) q(x*)] }
    - Metropolis sampler: symmetric q(x* | x_i) = q(x_i | x*), so that
        A(x_i, x*) = min{ 1, p(x*) / p(x_i) }
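In the Metropolis special case the acceptance really is just the ratio of target values. A random-walk sketch on a toy Gaussian target (all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def p(x):
    # Unnormalized target: N(1, 0.5^2).
    return np.exp(-0.5 * ((x - 1.0) / 0.5) ** 2)

# Symmetric proposal q(x* | x) = N(x*; x, 0.3^2): the q terms cancel,
# leaving A(x_i, x*) = min(1, p(x*) / p(x_i)).
x = 0.0
samples = np.empty(40_000)
for i in range(len(samples)):
    x_star = x + 0.3 * rng.normal()
    if rng.uniform() < min(1.0, p(x_star) / p(x)):
        x = x_star
    samples[i] = x

post = samples[5_000:]                  # discard burn-in
print(post.mean(), post.std())          # roughly 1.0 and 0.5
```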


  • Simulated Annealing
    - Problem: find the global maximum of p(x)
    - Initial idea: run MCMC, estimate p̂(x), compute the max
    - Issue: the chain may not come close to the mode(s)
    - Instead, simulate a non-homogeneous Markov chain whose invariant distribution at iteration i is p_i(x) ∝ p^{1/T_i}(x)
    - The sample update follows
        x_{i+1} = x*    if u < A(x_i, x*) = min{ 1, [p^{1/T_i}(x*) q(x_i | x*)] / [p^{1/T_i}(x_i) q(x* | x_i)] }
        x_{i+1} = x_i   otherwise
    - T_i decreases following a cooling schedule, with lim_{i→∞} T_i = 0
    - The cooling schedule needs a proper choice, e.g., T_i = 1 / (C log(i + T_0))
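A sketch of the annealed update on a toy bimodal target (the target, proposal width, and constants C, T_0 are illustrative). The chain starts at the wrong mode; as T_i falls, the annealed acceptance increasingly favors the global maximum:

```python
import numpy as np

rng = np.random.default_rng(3)

def p(x):
    # Toy target: global maximum near x = 3, lower local maximum near x = -2.
    return np.exp(-0.5 * (x - 3.0) ** 2) + 0.6 * np.exp(-0.5 * (x + 2.0) ** 2)

x = np.float64(-2.0)                    # start at the wrong mode on purpose
best_x, best_p = x, p(x)
C, T0 = 1.0, 2.0
for i in range(20_000):
    T = 1.0 / (C * np.log(i + T0))      # logarithmic cooling schedule
    x_star = x + 2.0 * rng.normal()     # symmetric proposal, q terms cancel
    # Annealed acceptance min(1, (p(x*)/p(x_i))^(1/T)), in log space.
    log_A = (np.log(p(x_star)) - np.log(p(x))) / T
    if np.log(rng.uniform()) < log_A:
        x = x_star
    if p(x) > best_p:                   # track the best point visited
        best_x, best_p = x, p(x)

print(best_x)   # near the global maximum at 3
```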


  • Monte Carlo EM
    - The E-step involves computing an expectation over the latent variables z:
        Q(θ, θ_n) = ∫ log p(x, z | θ) p(z | x, θ_n) dz
    - Estimate the expectation using MCMC
    - Draw samples using MH with acceptance probability
        A(z, z*) = min{ 1, [p(x | z*, θ_n) p(z* | θ_n) q(z | z*)] / [p(x | z, θ_n) p(z | θ_n) q(z* | z)] }
    - Several variants:
        - Stochastic EM: draw one sample
        - Monte Carlo EM: draw multiple samples
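A sketch of the stochastic-EM variant on a toy two-component Gaussian mixture with unit variances and unknown means (the model, data, and initialization are made up here). For this model the posterior p(z | x, θ_n) is available in closed form, so the completions are drawn from it directly; the MH acceptance above is for the general case where it is not:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy data: equal-weight mixture of N(-3, 1) and N(3, 1).
x = np.concatenate([rng.normal(-3.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])

mu = np.array([-1.0, 1.0])      # initial guess for the two means
for _ in range(50):
    # Stochastic E-step: draw z_i ~ p(z_i | x_i, theta_n).
    log_r = -0.5 * (x[:, None] - mu[None, :]) ** 2
    r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    z = (rng.uniform(size=len(x)) < r[:, 1]).astype(int)

    # M-step: maximize Q over the means using the sampled completions.
    for k in (0, 1):
        if np.any(z == k):
            mu[k] = x[z == k].mean()

print(mu)   # close to the true means [-3, 3]
```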

  • Mixtures of MCMC Kernels
    - Powerful property of MCMC: samplers can be combined
    - Let K_1, K_2 be kernels with invariant distribution p
        - The mixture kernel α K_1 + (1 - α) K_2, α ∈ [0, 1], converges to p
        - The cycle kernel K_1 K_2 converges to p
    - Mixtures can use global and local proposals:
        - Global proposals explore the entire space (with probability α)
        - Local proposals discover finer details (with probability 1 - α)
    - Example: the target has many narrow peaks
        - The global proposal finds the peaks
        - Local proposals explore the neighborhood of each peak (random walk)
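A sketch of a two-kernel mixture in exactly this situation (the two-peak target, α = 0.1, and the proposal scales are illustrative): a wide independent sampler serves as the global kernel and a small random walk as the local kernel.

```python
import numpy as np

rng = np.random.default_rng(5)

def log_p(x):
    # Two narrow, well-separated peaks at -5 and +5.
    return np.logaddexp(-0.5 * ((x + 5.0) / 0.2) ** 2,
                        -0.5 * ((x - 5.0) / 0.2) ** 2)

def log_q_global(x):
    # Global independent proposal q(x*) = N(0, 10^2), up to a constant.
    return -0.5 * (x / 10.0) ** 2

alpha = 0.1
x = 5.0
samples = np.empty(100_000)
for i in range(len(samples)):
    if rng.uniform() < alpha:
        # Global kernel K1 (independent sampler): can jump between peaks.
        x_star = 10.0 * rng.normal()
        log_A = (log_p(x_star) + log_q_global(x)
                 - log_p(x) - log_q_global(x_star))
    else:
        # Local kernel K2 (symmetric random walk): explores the current peak.
        x_star = x + 0.2 * rng.normal()
        log_A = log_p(x_star) - log_p(x)
    if np.log(rng.uniform()) < log_A:
        x = x_star
    samples[i] = x

frac_left = float(np.mean(samples < 0))
print(frac_left)   # near 0.5: both peaks get visited
```

The local kernel alone would essentially never cross the gap between the peaks; the occasional global proposal is what makes the combined sampler explore both modes.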

  • Cycles of MCMC Kernels
    - Split a multivariate state into blocks
    - Each block can be updated separately
    - Convergence is faster if correlated variables are blocked together
    - The transition kernel is
        K_MHCycle(x^{(i+1)} | x^{(i)}) = ∏_{j=1}^{n_b} K_MH^{(j)}( x^{(i+1)}_{b_j} | x^{(i)}_{b_j}, x^{(i+1)}_{-[b_j]} )
    - Trade-off on block size: if blocks are small, the chain takes a long time to explore the space
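A sketch of a cycle kernel with two single-coordinate blocks on a strongly correlated bivariate Gaussian (the target and step size are illustrative). Each sweep applies one Metropolis update per block, conditioning on the most recent value of the other block:

```python
import numpy as np

rng = np.random.default_rng(6)

# Target: zero-mean bivariate Gaussian with correlation 0.9.
rho = 0.9
Prec = np.linalg.inv(np.array([[1.0, rho], [rho, 1.0]]))

def log_p(v):
    return -0.5 * v @ Prec @ v

v = np.zeros(2)
samples = np.empty((50_000, 2))
for i in range(len(samples)):
    for j in (0, 1):                    # blocks b_1 = {x_1}, b_2 = {x_2}
        w = v.copy()
        w[j] += 0.5 * rng.normal()      # symmetric per-block proposal
        if np.log(rng.uniform()) < log_p(w) - log_p(v):
            v = w
    samples[i] = v

keep = samples[5_000:]                  # discard burn-in
corr = float(np.corrcoef(keep.T)[0, 1])
print(corr)   # near the target correlation 0.9
```

With such strong correlation, these tiny single-coordinate blocks crawl along the ridge; grouping the two coordinates into one block, as the slide suggests, would mix faster.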