The Gibbs Sampling Algorithm: with Applications to Change-point Detection and Restricted Boltzmann Machine
Yubo "Paul" Yang
- Jan. 24, 2017
Introduction: History
The Gibbs sampling algorithm was introduced by the brothers Stuart and Donald Geman in 1984.
- Stuart Geman @ Brown (UMich; neurophysiology from Dartmouth; mathematics from MIT)
- Donald Geman @ Johns Hopkins (literature from UIUC; then Northwestern)
Gibbs sampling is a special case of Metropolis-Hastings in which each proposal is drawn from a full conditional, i.e. acceptance = 1.
Variants of the Gibbs sampler:
- Basic version: sample each variable in turn from its full conditional (see the sketch below).
- Block version: sample groups of variables jointly from their joint conditional.
- Collapsed version: integrate out (marginalize) some variables before sampling the rest.
- Samplers within Gibbs: when a full conditional cannot be sampled directly, use another sampler (e.g. Metropolis) for that step.
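A minimal sketch of the basic version for two variables; the conditional samplers passed in are hypothetical placeholders to be supplied by the model:

```python
import numpy as np

def basic_gibbs(sample_x_given_y, sample_y_given_x, x0, y0, nsample):
    """Basic two-variable Gibbs sampler: update each variable in turn
    from its full conditional. The two sampler arguments are assumed
    to draw exactly from P(x|y) and P(y|x)."""
    x, y = x0, y0
    samples = np.zeros((nsample, 2))
    for i in range(nsample):
        x = sample_x_given_y(y)  # fix y, sample x from P(x|y)
        y = sample_y_given_x(x)  # fix x, sample y from P(y|x)
        samples[i] = (x, y)
    return samples
```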
Basic Gibbs sampling from a bivariate normal distribution
Example inspired by: "MCMC: The Gibbs Sampler", The Clever Machine, https://theclevermachine.wordpress.com/2012/11/05/mcmc-the-gibbs-sampler/
Q0/ How to sample $y$ from the standard normal distribution $\mathcal{N}(\mu = 0, \sigma = 1)$?
A0/ np.random.randn() samples from $P(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]$.

The bivariate normal distribution is the generalization of the normal distribution to two variables:

$$P(y_1, y_2) = \mathcal{N}(\mu_1, \mu_2, \Sigma) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\frac{z}{2(1-\rho^2)}\right], \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$$

where

$$z = \frac{(y_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(y_1-\mu_1)(y_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(y_2-\mu_2)^2}{\sigma_2^2}$$

For simplicity, let $\mu_1 = \mu_2 = 0$ and $\sigma_1 = \sigma_2 = 1$. Then the joint probability distribution of $y_1, y_2$ has log

$$\ln P(y_1, y_2) = -\frac{y_1^2 - 2\rho y_1 y_2 + y_2^2}{2(1-\rho^2)} + \mathrm{const.}$$

Q/ How to sample $y_1, y_2$ from $P(y_1, y_2)$?
A/ Gibbs sampling. Fix $y_2$, sample $y_1$ from $P(y_1|y_2)$. Fix $y_1$, sample $y_2$ from $P(y_2|y_1)$. Rinse and repeat.

The full conditional probability distribution of $y_1$ has log

$$\ln P(y_1|y_2) = -\frac{y_1^2 - 2\rho y_1 y_2}{2(1-\rho^2)} + \mathrm{const.} = -\frac{(y_1 - \rho y_2)^2}{2(1-\rho^2)} + \mathrm{const.} \;\Rightarrow\; P(y_1|y_2) = \mathcal{N}\left(\mu = \rho y_2,\ \sigma = \sqrt{1-\rho^2}\right)$$

new_x1 = np.sqrt(1-rho*rho) * np.random.randn() + rho*x2

Fixing $x_2$ shifts the mean of $x_1$ and changes its variance (figure shown for $\rho = 0.8$).
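Repeating this update for both variables gives a complete sampler. A minimal sketch in numpy (the function name and the diagnostic at the end are my own, not from the slides):

```python
import numpy as np

def gibbs_bivariate_normal(rho, nsample=10000, x1=0.0, x2=0.0):
    """Gibbs-sample the standard bivariate normal with correlation rho:
    each full conditional is N(mean = rho * other, sigma = sqrt(1 - rho^2))."""
    sig = np.sqrt(1.0 - rho * rho)
    samples = np.zeros((nsample, 2))
    for i in range(nsample):
        x1 = sig * np.random.randn() + rho * x2  # fix x2, sample x1 from P(x1|x2)
        x2 = sig * np.random.randn() + rho * x1  # fix x1, sample x2 from P(x2|x1)
        samples[i] = (x1, x2)
    return samples

samples = gibbs_bivariate_normal(rho=0.8)
print(np.corrcoef(samples.T))  # off-diagonal entries should approach 0.8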
The Gibbs sampler has worse autocorrelation than numpy's built-in multivariate_normal sampler, but is much better than naive Metropolis (reversible moves, $A = \min\left(1, \frac{P(R')}{P(R)}\right)$).
Both Gibbs and Metropolis still fail when correlation is too high.
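For reference, a naive Metropolis sampler for the same target can be sketched as follows, implementing the acceptance rule $A = \min(1, P(R')/P(R))$ with symmetric Gaussian moves; the step size of 0.5 is an arbitrary choice of mine:

```python
import numpy as np

def metropolis_bivariate_normal(rho, nsample=10000, step=0.5):
    """Naive Metropolis for the standard bivariate normal: propose a
    symmetric (reversible) Gaussian move, accept with min(1, P(R')/P(R))."""
    def logp(r):  # log of the target density, up to a constant
        x1, x2 = r
        return -(x1**2 - 2*rho*x1*x2 + x2**2) / (2*(1 - rho**2))
    r = np.zeros(2)
    samples = np.zeros((nsample, 2))
    for i in range(nsample):
        r_new = r + step * np.random.randn(2)
        if np.log(np.random.rand()) < logp(r_new) - logp(r):
            r = r_new  # accept; otherwise keep the old configuration
        samples[i] = r
    return samples
```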
Bayesian Inference: improve a "guess" model with data.
Example inspired by: Ilker Yildirim's notes, http://www.mit.edu/~ilkery/papers/GibbsSampling.pdf
The question the change-point model answers: when did a change occur in the distribution of a random variable? How do we estimate the change point $n$ from the observations?
The model (Poisson likelihood, Gamma prior):

$$P(y_0, y_1, \ldots, y_{N-1}, \lambda_1, \lambda_2, n) = \prod_{i=0}^{n-1} \mathrm{Poisson}(y_i; \lambda_1) \prod_{i=n}^{N-1} \mathrm{Poisson}(y_i; \lambda_2)\; \mathrm{Gamma}(\lambda_1; a=2, b=1)\; \mathrm{Gamma}(\lambda_2; a=2, b=1)\; \mathrm{Uniform}(n; 0, N)$$

where

$$\mathrm{Poisson}(y; \lambda) = \frac{e^{-\lambda} \lambda^y}{y!}, \qquad \mathrm{Gamma}(\lambda; a, b) = \frac{b^a}{\Gamma(a)} \lambda^{a-1} e^{-b\lambda}, \qquad \mathrm{Uniform}(n; 0, N) = \frac{1}{N}$$

(graphical model: the latent variables $\lambda_1$, $\lambda_2$, $n$ generate the observations $y_i$)

Q/ What is the full conditional probability of $\lambda_1$?

The prior on the parameters is
$$P(\lambda_1, \lambda_2, n) = \mathrm{Gamma}(\lambda_1; a=2, b=1)\; \mathrm{Gamma}(\lambda_2; a=2, b=1)\; \mathrm{Uniform}(n; 0, N),$$
and the posterior is $P(\lambda_1, \lambda_2, n \mid y_0, y_1, \ldots, y_{N-1})$.

Q/ How to sample from the joint posterior distribution of $\lambda_1, \lambda_2, n$?
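To make the setup concrete, one can first draw synthetic observations from this generative model; the values of $N$, the true change point, and the true rates below are illustrative choices of my own, not from the slides:

```python
import numpy as np

# Hypothetical synthetic data: N Poisson counts whose rate switches from
# lam1_true to lam2_true at the (to-be-inferred) change point n_true.
N, n_true, lam1_true, lam2_true = 50, 20, 2.0, 5.0
y = np.concatenate([np.random.poisson(lam1_true, n_true),
                    np.random.poisson(lam2_true, N - n_true)])
```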
Gibbs sampling requires the full conditionals:

$$\ln P(\lambda_1 \mid \lambda_2, n, y) = \ln \mathrm{Gamma}\left(\lambda_1;\; a + \sum_{i=0}^{n-1} y_i,\; b + n\right)$$

$$\ln P(\lambda_2 \mid \lambda_1, n, y) = \ln \mathrm{Gamma}\left(\lambda_2;\; a + \sum_{i=n}^{N-1} y_i,\; b + N - n\right)$$

$$\ln P(n \mid \lambda_1, \lambda_2, y) = \ln \mathrm{Mult}(n; \lambda_1, \lambda_2, y) \quad \text{(a discrete distribution over } n = 0, \ldots, N-1\text{)}$$

Q/ How to sample this mess?!
A/ In general: Metropolis within Gibbs. In this case: brute-force $P(n \mid \lambda_1, \lambda_2, y)$ for all $n = 0, \ldots, N-1$, because $N$ is rather small.
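A sketch of the resulting Gibbs sweep, assuming the conjugate Gamma full conditionals above and brute-force enumeration of $n$ (function and variable names are my own):

```python
import numpy as np

def gibbs_changepoint(y, a=2.0, b=1.0, nsample=5000):
    """Gibbs-sample P(lam1, lam2, n | y) for the Poisson change-point model."""
    N = len(y)
    lam1, lam2, n = 1.0, 1.0, N // 2
    samples = np.zeros((nsample, 3))
    for s in range(nsample):
        # conjugate Gamma full conditionals: shape a + sum(y), rate b + count
        lam1 = np.random.gamma(a + y[:n].sum(), 1.0 / (b + n))
        lam2 = np.random.gamma(a + y[n:].sum(), 1.0 / (b + N - n))
        # brute-force the discrete full conditional of n over 0..N-1
        logp = np.array([y[:k].sum() * np.log(lam1) - k * lam1 +
                         y[k:].sum() * np.log(lam2) - (N - k) * lam2
                         for k in range(N)])
        prob = np.exp(logp - logp.max())
        n = np.random.choice(N, p=prob / prob.sum())
        samples[s] = (lam1, lam2, n)
    return samples
```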
(figures: $\lambda_1$ samples from Gibbs and from naive Metropolis; model sampled from the Metropolis sampler)
Binary Restricted Boltzmann Machine (BRBM):

$$P(v, h; a, b, W) = \frac{\exp(a \cdot v + b \cdot h + h \cdot W \cdot v)}{Z}, \qquad Z = \sum_{v,h} \exp\left[\sum_{j=0}^{n_{\mathrm{vis}}-1} a_j v_j + \sum_{i=0}^{n_{\mathrm{hid}}-1} b_i h_i + \sum_{i,j} h_i W_{ij} v_j\right]$$
See Dima's presentation for a more detailed description of the RBM: http://algorithm-interest-group.me/algorithm/Boltzmann-Machines-Dima-Kochkov/
(figure: visualization of a small BRBM with $n_{\mathrm{hid}} = 3$, $n_{\mathrm{vis}} = 4$)
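As a sanity check, the exponent $a \cdot v + b \cdot h + h \cdot W \cdot v$ of the joint distribution can be evaluated directly; the random parameters below are placeholders, and the sizes just mirror the $n_{\mathrm{hid}} = 3$, $n_{\mathrm{vis}} = 4$ figure:

```python
import numpy as np

nvis, nhid = 4, 3
a = np.random.randn(nvis)        # shift vector for visible units
b = np.random.randn(nhid)        # shift vector for hidden units
W = np.random.randn(nhid, nvis)  # weight matrix (hidden x visible)

def log_p_unnormalized(v, h):
    """Unnormalized log-probability: ln P*(v, h) = a.v + b.h + h.W.v."""
    return a @ v + b @ h + h @ W @ v
```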
Thus the full conditionals are simple:
$$\frac{P(v_j = 1 \mid h)}{P(v_j = 0 \mid h)} = \frac{\exp(a \cdot v + b \cdot h + h \cdot W \cdot v)\big|_{v_j = 1}}{\exp(a \cdot v + b \cdot h + h \cdot W \cdot v)\big|_{v_j = 0}} = \exp\left(a_j + \sum_i h_i W_{ij}\right)$$

$$P(v_j = 1 \mid h) = \frac{P(v_j = 1, h)}{P(v_j = 1, h) + P(v_j = 0, h)} = \frac{1}{1 + \exp\left(-a_j - \sum_i h_i W_{ij}\right)} = \mathrm{sigmoid}\left(a_j + \sum_i h_i W_{ij}\right)$$
That is: we can sample a binary RBM efficiently with block Gibbs sampling! Notice there is no matrix element among the $v_j$ (restricted), thus: $P(v = 1 \mid h) = \mathrm{sigmoid}(a + W^T h)$.
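A minimal sketch of one such block-Gibbs sweep, sampling all hidden units given $v$ and then all visible units given $h$ (function and variable names are mine):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def block_gibbs_step(v, a, b, W):
    """One block-Gibbs sweep of a binary RBM. Because no weights connect
    units within a layer, each whole layer is sampled in a single block."""
    h = (np.random.rand(len(b)) < sigmoid(b + W @ v)).astype(float)    # P(h=1|v)
    v = (np.random.rand(len(a)) < sigmoid(a + W.T @ h)).astype(float)  # P(v=1|h)
    return v, h
```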
Q/ How to "train" a BRBM?
Q1/ What is the outcome/goal of "training"?
A1/ A joint probability distribution of 784 Bernoulli random variables that favors configurations which look like digits, i.e. we want a $P(v \mid a, b, W)$ that represents the data.
Q2/ What are the inputs to a "training"?
A2/ $v^s$, $s = 1, 2, \ldots, n_{\mathrm{data}}$. Each $v^s$ is a vector of 784 0s and 1s.
Q3/ What does it mean to "train"?
A3/ Increase the probability $P(v^s \mid a, b, W)$.
Q4/ What changes in the "training"?
A4/ The "machine". Specifically: $\{a, b, W\}$.
A/ Increase $P(v^s \mid a, b, W)$ for all $s$ by changing $\{a, b, W\}$.

MNIST database: 70,000 handwritten digits from 0 to 9. Each picture has 28x28 gray-scale pixels in {0, 1, ..., 255}. For input into the BRBM, scale to [0, 1.0) and cut off at 0.5. $n_{\mathrm{vis}} = 28 \times 28 = 784$.
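The preprocessing described above is only a couple of lines; here `images` is assumed to be a uint8 array of shape (ndata, 28, 28) obtained from MNIST by whatever loader is at hand:

```python
import numpy as np

def binarize_mnist(images):
    """Scale {0,...,255} pixels to [0, 1.0), cut off at 0.5, and flatten
    each 28x28 picture to an nvis = 784 vector of 0s and 1s."""
    scaled = images.astype(float) / 256.0          # values now in [0, 1.0)
    return (scaled > 0.5).astype(float).reshape(len(images), 28 * 28)
```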
$$P(v = 1 \mid h) = \mathrm{sigmoid}(a + W^T h), \qquad P(h = 1 \mid v) = \mathrm{sigmoid}(b + W v)$$
$$\frac{\partial \ln P(v)}{\partial W_{ij}} = \langle h_i v_j \rangle_{\mathrm{data}} - \langle h_i v_j \rangle_{\mathrm{model}}$$
G.E. Hinton, "A Practical Guide to Training Restricted Boltzmann Machines," Neural Networks: Tricks of the Trade, vol. 7700, pp. 599-619, 2010.
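A hedged sketch of a single contrastive-divergence (CD-1) update of $W$ built from this gradient; the learning rate and the use of hidden probabilities rather than binary samples in the outer products are simplifications of my own, see Hinton's guide for the practical details:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v_data, a, b, W, lr=0.1):
    """One CD-1 step: <h_i v_j>_data from the data vector,
    <h_i v_j>_model approximated by a single block-Gibbs reconstruction."""
    ph_data = sigmoid(b + W @ v_data)                          # positive phase
    h_samp = (np.random.rand(len(b)) < ph_data).astype(float)  # sample hidden units
    v_model = sigmoid(a + W.T @ h_samp)                        # reconstruction
    ph_model = sigmoid(b + W @ v_model)                        # negative phase
    W += lr * (np.outer(ph_data, v_data) - np.outer(ph_model, v_model))
    return W
```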
(figures: shift vector $a$ for the visible units; rows of the weight matrix $W$, ordered by the hidden-unit shift vector $b$; BRBM samples after training)
Pros:
- Sample a complicated joint probability distribution by sampling the full conditional of each variable in turn.
- Block Gibbs sampling is highly efficient for certain distributions, e.g. the BRBM: $P(v = 1 \mid h) = \mathrm{sigmoid}(a + W^T h)$, $P(h = 1 \mid v) = \mathrm{sigmoid}(b + W v)$.

Cons:
- Like Metropolis, Gibbs sampling still fails (mixes slowly) when the correlation between variables is too high.

Examples covered:
- Bivariate normal distribution
- Change-point model
- Restricted Boltzmann machine