Approximate Inference: Sampling Methods
CMSC 678, UMBC
Outline
- Recap
- Monte Carlo methods
- Sampling Techniques: Uniform sampling, Importance Sampling, Rejection Sampling, Metropolis-Hastings, Gibbs sampling
- Example: Collapsed Gibbs Sampler for Topic Models
Recap from last time…
Exponential Family Forms: Capture Common Distributions
- Discrete (finite distributions)
- Dirichlet (distributions over (finite) distributions)
- Gaussian
- Gamma, Exponential, Poisson, Negative-Binomial, Laplace, log-Normal, …
Exponential Family Forms: “Easy” Posterior Inference
A conjugate prior: the posterior has the same form as the prior.

Posterior          | Likelihood           | Prior
Dirichlet (Beta)   | Discrete (Bernoulli) | Dirichlet (Beta)
Normal             | Normal (fixed var.)  | Normal
Gamma              | Exponential          | Gamma
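Conjugate updating can be exercised in a couple of lines. A minimal sketch for the Beta-Bernoulli row of the table; the Beta(1, 1) prior and the run of observations are illustrative assumptions:

```python
# Beta prior on a Bernoulli parameter: the posterior is Beta again, with
# the observed successes/failures added to the prior pseudo-counts.
data = [1, 0, 1, 1, 0, 1]            # assumed Bernoulli observations
a, b = 1.0, 1.0                      # assumed Beta(1, 1) prior pseudo-counts

a_post = a + sum(data)               # prior successes + observed successes
b_post = b + len(data) - sum(data)   # prior failures + observed failures
# posterior is Beta(a_post, b_post) = Beta(5, 3)
```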
Variational Inference: A Gradient-Based Optimization Technique
Set t = 0
Pick a starting value λ_t
Until converged:
- 1. Get value y_t = F(q(·; λ_t))
- 2. Get gradient g_t = F′(q(·; λ_t))
- 3. Get scaling factor ρ_t
- 4. Set λ_{t+1} = λ_t + ρ_t · g_t
- 5. Set t += 1
Minimize the “difference” by changing λ:
p(θ|x) (difficult to compute) vs. q_λ(θ) (easy(ier) to compute)
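The loop above is ordinary gradient ascent. A minimal sketch on a toy objective; the quadratic F and the fixed step size ρ are illustrative assumptions, not the actual variational objective:

```python
# Gradient ascent following steps 1-5 on the slide, maximizing the toy
# objective F(lam) = -(lam - 3)^2, whose maximum is at lam = 3.
def F(lam):
    return -(lam - 3.0) ** 2

def F_grad(lam):
    return -2.0 * (lam - 3.0)

def gradient_ascent(lam, rho=0.1, tol=1e-8, max_iters=1000):
    for t in range(max_iters):
        g = F_grad(lam)                   # step 2: gradient at current lambda
        new_lam = lam + rho * g           # step 4: move uphill
        if abs(new_lam - lam) < tol:      # convergence check
            return new_lam
        lam = new_lam
    return lam

lam_star = gradient_ascent(0.0)           # converges near 3.0
```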
Variational Inference: The Function to Optimize
Find the best variational distribution (variational parameters for θ) relative to the desired model's parameters, by minimizing the KL-divergence (an expectation):

D_KL( q(θ) ‖ p(θ|x) ) = 𝔼_{q(θ)}[ log q(θ) / p(θ|x) ]
Goal: Posterior Inference
Hyperparameters α; unknown parameters Θ; observed data
Likelihood model: p(data | Θ); posterior: p_α(Θ | data)
(Some) Learning Techniques
- MAP/MLE: point estimation, basic EM
- Variational Inference: functional optimization
- Sampling/Monte Carlo ← today
Two Problems for Sampling Methods to Solve

1. Generate samples from p:
   p(x) = P*(x) / Z,  x ∈ ℝᴺ  →  samples x⁽¹⁾, x⁽²⁾, …, x⁽ˢ⁾

2. Estimate the expectation of a function φ under p:
   Φ = 𝔼_{x∼p}[φ(x)] = ∫ p(x) φ(x) dx

Q: Why is sampling from p(x) hard?
A1: Can we evaluate Z?
A2: Can we sample without enumerating? (Correct samples should be where p is big)

Running example (ITILA, Fig 29.1): P*(x) = exp(0.4(x − 0.4)² − 0.08x⁴)

If we could sample from p, the estimator
Φ̂ = (1/S) Σ_s φ(x⁽ˢ⁾)
would be consistent: 𝔼[Φ̂] = Φ.
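When we can sample from p directly, the estimator Φ̂ = (1/S) Σ_s φ(x⁽ˢ⁾) really is this simple. A sketch assuming p is a standard normal and φ(x) = x², so the true expectation is 1:

```python
import random

# Monte Carlo estimate of E_{x~p}[phi(x)] with direct samples from p.
# p = N(0, 1) and phi(x) = x^2 are illustrative assumptions; the true
# value is Var(x) = 1.
random.seed(0)

def phi(x):
    return x * x

S = 100_000
samples = [random.gauss(0.0, 1.0) for _ in range(S)]  # draws x ~ p
phi_hat = sum(phi(x) for x in samples) / S            # consistent estimator
```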
Uniform Sampling

Goal: Φ = 𝔼_{x∼p}[φ(x)]

Sample uniformly: x⁽¹⁾, x⁽²⁾, …, x⁽ˢ⁾
Estimate the normalizer from the samples: Z_S = Σ_s P*(x⁽ˢ⁾), so p̂(x⁽ˢ⁾) = P*(x⁽ˢ⁾) / Z_S
Estimator: Φ̂ = Σ_s φ(x⁽ˢ⁾) p̂(x⁽ˢ⁾)

This might work if S (the number of samples) sufficiently hits high-probability regions.

Ising model example:
- 2ᴴ states of high probability
- 2ᴺ states total
- chance of a sample being in a high-probability region: 2ᴴ / 2ᴺ
- min. samples needed: ∼ 2ᴺ⁻ᴴ
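A sketch of uniform sampling on the running 1-D example P*(x) = exp(0.4(x − 0.4)² − 0.08x⁴); the sampling interval [−6, 6] and φ(x) = x are illustrative assumptions:

```python
import math
import random

# Uniform sampling: draw x uniformly, estimate the normalizer Z_S from the
# samples, then weight phi by P*(x_s) / Z_S.
random.seed(0)

def p_star(x):
    return math.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

def phi(x):
    return x

S = 200_000
xs = [random.uniform(-6.0, 6.0) for _ in range(S)]   # uniform proposals
weights = [p_star(x) for x in xs]
Z_S = sum(weights)                                   # empirical normalizer
phi_hat = sum(phi(x) * w for x, w in zip(xs, weights)) / Z_S
# the density's heavier left mode pulls the estimated mean negative
```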
Importance Sampling

Goal: Φ = 𝔼_{x∼p}[φ(x)]

Approximating distribution: Q(x) ∝ Q*(x)
Sample from Q: x⁽¹⁾, x⁽²⁾, …, x⁽ˢ⁾
Importance weights: w_s = P*(x⁽ˢ⁾) / Q*(x⁽ˢ⁾)
Estimator: Φ̂ = Σ_s φ(x⁽ˢ⁾) w_s / Σ_s w_s

(ITILA, Fig 29.5)
- x where Q(x) > p(x): over-represented
- x where Q(x) < p(x): under-represented
The weights w_s correct for both.

Q: How reliable will this estimator be?
A: In practice, difficult to say; the empirical weights w_s may not be a good indicator.
Q: How do you choose a good approximating distribution?
A: Task/domain specific.
Importance Sampling: The Estimator's Variance May Vary

(ITILA, Fig 29.6: estimates vs. iterations, plotted against the true value, for a Gaussian q(x) and for a Cauchy q(x).)
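A sketch of self-normalized importance sampling on the same 1-D target; the Gaussian sampler N(0, 3²) and φ(x) = x are illustrative assumptions:

```python
import math
import random

# Importance sampling: draw from Q, reweight by w_s = P*(x_s) / Q*(x_s),
# and use the self-normalized estimator (no Z needed for P or Q).
random.seed(0)

def p_star(x):
    return math.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

def q_star(x, sigma=3.0):
    return math.exp(-0.5 * (x / sigma) ** 2)   # unnormalized N(0, sigma^2)

def phi(x):
    return x

S = 200_000
xs = [random.gauss(0.0, 3.0) for _ in range(S)]   # draws x ~ Q
ws = [p_star(x) / q_star(x) for x in xs]          # importance weights
phi_hat = sum(phi(x) * w for x, w in zip(xs, ws)) / sum(ws)
```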
Rejection Sampling

Goal: Φ = 𝔼_{x∼p}[φ(x)]

Approximating distribution: Q(x) ∝ Q*(x), with a constant c such that c·Q*(x) > P*(x) for all x
Sample from Q: x⁽¹⁾, x⁽²⁾, …
Sample uniformly: u_s ∼ Unif(0, c·Q*(x⁽ˢ⁾))
Select tuples (x⁽ˢ⁾, u_s):
- if u_s ≤ P*(x⁽ˢ⁾): add x⁽ˢ⁾ to the sampled points
- otherwise: reject it

(ITILA, Fig 29.8)
This produces samples from the p-distribution, so the plain Monte Carlo average applies:
Φ̂ = (1/S) Σ_s φ(x⁽ˢ⁾)

Q: How reliable will this estimator be?
A: It depends on how well Q approximates P.
Q: How do you choose a good approximating distribution?
A: Task/domain specific.

Rejection sampling can be difficult to use in high-dimensional spaces.
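A sketch of rejection sampling on the same target; the proposal N(0, 3²) and the envelope constant c = 5 are assumptions (c was checked numerically so that c·Q*(x) > P*(x) everywhere):

```python
import math
import random

# Rejection sampling: draw x ~ Q and u ~ Unif(0, c Q*(x)); keep x iff the
# point (x, u) falls under the curve P*. Accepted points are exact samples
# from p, so the plain Monte Carlo average applies.
random.seed(0)

def p_star(x):
    return math.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

def q_star(x):
    return math.exp(-0.5 * (x / 3.0) ** 2)     # unnormalized N(0, 3^2)

c = 5.0                                        # assumed envelope: c Q* > P*
accepted = []
for _ in range(100_000):
    x = random.gauss(0.0, 3.0)                 # x ~ Q
    u = random.uniform(0.0, c * q_star(x))     # u ~ Unif(0, c Q*(x))
    if u <= p_star(x):                         # under P*: accept
        accepted.append(x)

phi_hat = sum(accepted) / len(accepted)        # estimate of E[x] under p
```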
Markov Chain Monte Carlo
transition kernel
Metropolis-Hastings

Goal: Φ = 𝔼_{x∼p}[φ(x)]

Importance and rejection sampling use a single proposal distribution Q(x) ∝ Q*(x).
Metropolis-Hastings (and Gibbs) create a proposal distribution based on the current state:
transition kernel/distribution: Q(x′ | x⁽ᵗ⁾) ∝ Q*(x′ | x⁽ᵗ⁾)
Q does not need to look similar to P. (ITILA, Fig 29.10)

Sample x′ from Q(· | x⁽ᵗ⁾) and compute the acceptance ratio:
a = [ P*(x′) / P*(x⁽ᵗ⁾) ] · [ Q*(x⁽ᵗ⁾ | x′) / Q*(x′ | x⁽ᵗ⁾) ]
- if a ≥ 1: accept x′
- otherwise: accept with probability a
If accepted: x⁽ᵗ⁺¹⁾ = x′; otherwise: x⁽ᵗ⁺¹⁾ = x⁽ᵗ⁾

Estimator: Φ̂ = (1/S) Σ_s φ(x⁽ˢ⁾)
Samples are not independent, but Metropolis-Hastings can be used effectively in high-dimensional spaces ☺
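A sketch of Metropolis-Hastings on the same 1-D target, with a symmetric random-walk proposal so the Q* terms of the acceptance ratio cancel; the step size 1.0 and burn-in length are illustrative assumptions:

```python
import math
import random

# Random-walk Metropolis: propose x' = x + N(0, 1), accept with
# probability min(1, P*(x') / P*(x)); otherwise stay at x.
random.seed(0)

def p_star(x):
    return math.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

x = 0.0
chain = []
for t in range(200_000):
    x_prop = x + random.gauss(0.0, 1.0)    # propose from Q(.|x), symmetric
    a = p_star(x_prop) / p_star(x)         # acceptance ratio (Q* cancels)
    if a >= 1.0 or random.random() < a:    # accept with prob. min(1, a)
        x = x_prop                         # move; else keep current state
    chain.append(x)

burn_in = 10_000                           # assumed burn-in
phi_hat = sum(chain[burn_in:]) / len(chain[burn_in:])
```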
Gibbs Sampling

Goal: Φ = 𝔼_{x∼p}[φ(x)]

Transition kernel/distribution: Q(x′ | x⁽ᵗ⁾) = p(x_i | all other variables)

The next sampled value of the current variable is drawn given the values of all other variables, both new and old:
x_i⁽ᵗ⁺¹⁾ ∼ p(· | x_1⁽ᵗ⁺¹⁾, …, x_{i−1}⁽ᵗ⁺¹⁾, x_{i+1}⁽ᵗ⁾, …, x_N⁽ᵗ⁾)
Remember: Markov Blanket

The Markov blanket of a node x_i is its parents, children, and children's parents: the set of nodes needed to form the complete conditional for a variable x_i.

p(x_i | x_{j≠i}) = p(x_1, …, x_N) / ∫ p(x_1, …, x_N) dx_i
  = Π_j p(x_j | pa(x_j)) / ∫ Π_j p(x_j | pa(x_j)) dx_i    (factorization of the graph)
  = Π_{j : j=i or i∈pa(x_j)} p(x_j | pa(x_j)) / ∫ Π_{j : j=i or i∈pa(x_j)} p(x_j | pa(x_j)) dx_i    (factor out terms not dependent on x_i)
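The complete conditional keeps only Markov-blanket factors. A sketch on a tiny chain A → B → C of binary variables, where p(B | A, C) ∝ p(B | A) p(C | B); the CPT numbers are made up:

```python
# Complete conditional of B in the chain A -> B -> C: only the factors
# that mention B survive, then normalize.
p_B_given_A = {0: [0.7, 0.3], 1: [0.2, 0.8]}   # p(B=b | A=a), assumed CPT
p_C_given_B = {0: [0.9, 0.1], 1: [0.4, 0.6]}   # p(C=c | B=b), assumed CPT

def complete_conditional_B(a, c):
    unnorm = [p_B_given_A[a][b] * p_C_given_B[b][c] for b in (0, 1)]
    Z = sum(unnorm)
    return [u / Z for u in unnorm]

dist = complete_conditional_B(a=1, c=1)   # -> [0.04, 0.96]
```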
Gibbs Sampling

Goal: Φ = 𝔼_{x∼p}[φ(x)]

Transition kernel/distribution: Q(x′ | x⁽ᵗ⁾) = p(x_i | MB(x_i)), the Markov blanket
Sample (always accept) from Q(x′ | x⁽ᵗ⁾): x⁽ᵗ⁺¹⁾ = x′

Estimator: Φ̂ = (1/S) Σ_s φ(x⁽ˢ⁾)
Samples are not independent, but Gibbs sampling can be used effectively in high-dimensional spaces ☺
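A sketch of Gibbs sampling on a standard toy model not from the slides: a 2-D Gaussian with correlation ρ = 0.8, whose complete conditionals are themselves Gaussian and are always accepted:

```python
import random

# Gibbs sampling for a bivariate standard normal with correlation rho:
# x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2 | x1.
random.seed(0)

rho = 0.8
sd = (1.0 - rho * rho) ** 0.5
x1, x2 = 0.0, 0.0
samples = []
for t in range(100_000):
    x1 = random.gauss(rho * x2, sd)   # resample x1 | x2 (always accept)
    x2 = random.gauss(rho * x1, sd)   # resample x2 | x1, using the NEW x1
    samples.append((x1, x2))

# the empirical correlation should approach rho
mean1 = sum(s[0] for s in samples) / len(samples)
mean2 = sum(s[1] for s in samples) / len(samples)
cov = sum((s[0] - mean1) * (s[1] - mean2) for s in samples) / len(samples)
```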
Collapsed Gibbs Sampling

Goal: Φ = 𝔼_{x∼p}[φ(x)]

Transition kernel/distribution: integrate out some of the Markov blanket (the variables z):
Q(x′ | x⁽ᵗ⁾) = ∫ p(x | MB(x⁽ᵗ⁾)) dz = p(x | MB₋z(x⁽ᵗ⁾))
Sample (always accept): x⁽ᵗ⁺¹⁾ = x′

Estimator: Φ̂ = (1/S) Σ_s φ(x⁽ˢ⁾)
Samples are not independent, but collapsed Gibbs can be used effectively in high-dimensional spaces ☺
Latent Dirichlet Allocation (Blei et al., 2003)

- Per-document (latent) topic usage θ_d
- Per-document (unigram) word counts
- Per-topic word usage ψ_k
Gibbs Sampler for LDirA

for each document d:
  resample θ_d | z_{d,1}, …, z_{d,N_d}
  for each token i in d:
    resample z_{d,i} | w_{d,i}, {ψ_k}, θ_d
for each topic k:
  resample ψ_k
Latent Dirichlet Allocation (Blei et al., 2003): integrate out the per-document topic usage θ_d and the per-topic word usage ψ_k.
Collapsed Gibbs Sampler for LDirA

With θ_d and ψ_k integrated out, only the topic assignments are resampled:
for each document d:
  for each token i in d:
    resample z_{d,i} | w_{d,i}, z_{∗,−i}

Collapsed Gibbs sampling goal: p(z_{d,i} | z_{∗,−i}) = p(z_{∗,∗}) / p(z_{∗,−i})
Sampling: Discrete Observations
Griffiths and Steyvers (PNAS, 2004)

Collapsed Gibbs sampling goal: p(z_{d,i} | z_{∗,−i}) = p(z_{∗,∗}) / p(z_{∗,−i})

Maintain count tables. Both marginals are Dirichlet-multinomial, i.e. ratios of Gamma functions:

p(z_{d,i} = k | z_{∗,−i}) =
  [ Γ(Σ_k α_k) / Γ(Σ_k c(d,k) + α_k) · Π_k Γ(c(d,k) + α_k) / Γ(α_k) ]
  ÷ [ Γ(Σ_k α_k) / Γ(Σ_k c(d,k) − 1 + α_k) · Π_k Γ(c(d,k) − 1 + α_k) / Γ(α_k) ]

(the −1 removes the current token's assignment)

Using the Gamma function fact Γ(x + 1) = x Γ(x), the ratio telescopes, leaving:

p(z_{d,i} = k | z_{∗,−i}) ∝ c(d,k) − 1 + α_k
Collapsed Gibbs Sampler for LDirA

for each document d:
  for each token i in d:
    resample z_{d,i} | w_{d,i}, z_{∗,−i}:
    p(z_{d,i} = k | z_{∗,−i}) ∝ (c(d,k) − 1 + α_k) · (topic-word counts term)
Collapsed Gibbs Sampler for LDirA

randomly assign z_{∗,∗}
maintain count tables:
  c(d,k): document-topic counts
  c(k,v): topic-word counts
for each document d:
  for each token i in d:
    unassign topic z_{d,i} (decrease counts)
    resample z_{d,i} ∝ (c(d,k) − 1 + α_k) · (topic-word counts term)
    reassign topic z_{d,i} (increase counts)
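The loop above, sketched for a toy corpus; the 2-topic setting, the hyperparameters α, β, and the tiny documents are all illustrative assumptions:

```python
import random

# Collapsed Gibbs for LDA over word ids 0..V-1: maintain count tables
# c(d,k) and c(k,v); for each token, decrement its counts, resample its
# topic from (c(d,k)+alpha)*(c(k,w)+beta)/(c(k,.)+V*beta), re-increment.
random.seed(0)

docs = [[0, 0, 1, 1], [2, 2, 3, 3], [0, 1, 2, 3]]  # assumed toy documents
K, V = 2, 4
alpha, beta = 0.5, 0.1                             # assumed hyperparameters

# randomly assign z and build count tables
z = [[random.randrange(K) for _ in doc] for doc in docs]
c_dk = [[0] * K for _ in docs]                     # document-topic counts
c_kv = [[0] * V for _ in range(K)]                 # topic-word counts
c_k = [0] * K                                      # topic totals
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        c_dk[d][k] += 1; c_kv[k][w] += 1; c_k[k] += 1

for sweep in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]                            # unassign: decrease counts
            c_dk[d][k] -= 1; c_kv[k][w] -= 1; c_k[k] -= 1
            probs = [(c_dk[d][j] + alpha) * (c_kv[j][w] + beta)
                     / (c_k[j] + V * beta) for j in range(K)]
            r = random.uniform(0.0, sum(probs))    # draw from unnormalized probs
            k, acc = 0, probs[0]
            while r > acc:
                k += 1; acc += probs[k]
            z[d][i] = k                            # reassign: increase counts
            c_dk[d][k] += 1; c_kv[k][w] += 1; c_k[k] += 1
```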