

slide-1
SLIDE 1

Approximate Inference: Sampling Methods

CMSC 678 UMBC

slide-2
SLIDE 2

Outline

• Recap
• Monte Carlo methods
• Sampling Techniques
  • Uniform sampling
  • Importance Sampling
  • Rejection Sampling
  • Metropolis-Hastings
  • Gibbs sampling

Example: Collapsed Gibbs Sampler for Topic Models

slide-3
SLIDE 3

Recap from last time…

slide-4
SLIDE 4

Exponential Family Forms: Capture Common Distributions

• Discrete (finite distributions)
• Dirichlet (distributions over (finite) distributions)
• Gaussian
• Gamma, Exponential, Poisson, Negative-Binomial, Laplace, log-Normal, …

slide-5
SLIDE 5

Exponential Family Forms: “Easy” Posterior Inference

A prior p(θ) is conjugate for a likelihood when the posterior has the same form as the prior:

Posterior         | Likelihood           | Prior
------------------|----------------------|------------------
Dirichlet (Beta)  | Discrete (Bernoulli) | Dirichlet (Beta)
Normal            | Normal (fixed var.)  | Normal
Gamma             | Exponential          | Gamma
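The first row of the table can be made concrete: with a Beta prior on a Bernoulli parameter, conjugacy reduces posterior inference to adding counts to the prior. A minimal sketch (the Beta(1, 1) prior and the particular observations below are arbitrary illustrative choices):

```python
def beta_bernoulli_posterior(a, b, observations):
    """Beta(a, b) prior + Bernoulli observations -> Beta(a + heads, b + tails)."""
    heads = sum(observations)
    tails = len(observations) - heads
    return a + heads, b + tails

# starting from a uniform Beta(1, 1) prior, observe three 1s and one 0
a_post, b_post = beta_bernoulli_posterior(1.0, 1.0, [1, 1, 0, 1])
```

The posterior is Beta(4, 2): same family as the prior, which is exactly what makes the update "easy".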

slide-6
SLIDE 6

Variational Inference: A Gradient-Based Optimization Technique

Set t = 0

Pick a starting value λt

Until converged:

  1. Get value y_t = F(q(·; λ_t))
  2. Get gradient g_t = F′(q(·; λ_t))
  3. Get scaling factor ρ_t
  4. Set λ_{t+1} = λ_t + ρ_t · g_t
  5. Set t += 1

Minimize the "difference" between p(θ|x) (difficult to compute) and q_λ(θ) (easy(ier) to compute) by changing λ
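The loop above can be sketched on a toy objective. The quadratic F below is a hypothetical stand-in; the real variational objective and its gradient are model-specific:

```python
# Toy sketch of the gradient loop on this slide, with a stand-in objective
# F(lam) = -(lam - 3)^2 whose gradient is -2(lam - 3); maximum at lam = 3.

def gradient_ascent(F_grad, lam_0, rho=0.1, steps=100):
    """Repeat: g_t = F'(lam_t); lam_{t+1} = lam_t + rho_t * g_t."""
    lam = lam_0
    for t in range(steps):
        g = F_grad(lam)        # step 2: gradient at the current lambda
        lam = lam + rho * g    # step 4: scaled gradient update
    return lam

lam_star = gradient_ascent(lambda lam: -2.0 * (lam - 3.0), lam_0=0.0)
```

With a fixed step size ρ the iterates converge geometrically to λ = 3 here; in practice ρ_t is scheduled.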

slide-7
SLIDE 7

Variational Inference: The Function to Optimize

Find the best distribution q: variational parameters λ for θ vs. the parameters of the desired model. The "difference" is a KL-Divergence (an expectation):

$D_{KL}\big(q(\theta) \,\|\, p(\theta \mid x)\big) = \mathbb{E}_{q(\theta)}\left[\log \frac{q(\theta)}{p(\theta \mid x)}\right]$
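As a concrete check of the definition, a minimal discrete version (the two example distributions below are arbitrary):

```python
import math

def kl_divergence(q, p):
    """D_KL(q || p) = sum_i q_i * log(q_i / p_i), for discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

q = [0.5, 0.5]
p = [0.9, 0.1]
# D_KL(q || q) = 0, and in general D_KL(q || p) != D_KL(p || q)
```

The asymmetry is why it matters that variational inference minimizes KL(q ‖ p) rather than KL(p ‖ q).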

slide-8
SLIDE 8

Goal: Posterior Inference

Hyperparameters α; unknown parameters Θ; observed data

Likelihood model: p(data | Θ). Posterior: p_α(Θ | data).

slide-9
SLIDE 9

(Some) Learning Techniques

• MAP/MLE: point estimation, basic EM
• Variational Inference: functional optimization
• Sampling/Monte Carlo ← today

slide-10
SLIDE 10

Outline

• Recap
• Monte Carlo methods
• Sampling Techniques
  • Uniform sampling
  • Importance Sampling
  • Rejection Sampling
  • Metropolis-Hastings
  • Gibbs sampling

Example: Collapsed Gibbs Sampler for Topic Models

slide-11
SLIDE 11-13

Two Problems for Sampling Methods to Solve

1. Generate samples from p

$p(x) = \frac{\tilde{p}(x)}{Z}, \quad x \in \mathbb{R}^N$; samples: $x^{(1)}, x^{(2)}, \ldots, x^{(S)}$

Q: Why is sampling from p(x) hard?
A1: Can we evaluate Z?
A2: Can we sample without enumerating? (Correct samples should be where p is big.)

Running example: $\tilde{p}(x) = \exp[0.4(x - 0.4)^2 - 0.08 x^4]$

ITILA, Fig 29.1
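The split between evaluating p̃(x) (easy, pointwise) and evaluating Z (hard, global) can be made concrete in 1-D, where brute-force quadrature is still feasible; the grid bounds and resolution below are arbitrary choices:

```python
import math

def p_tilde(x):
    """Unnormalized density from ITILA Fig 29.1: exp[0.4(x-0.4)^2 - 0.08 x^4]."""
    return math.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

def grid_Z(lo=-10.0, hi=10.0, n=100_000):
    """Midpoint-rule estimate of Z. Feasible only in 1-D: this sweep over
    the whole space is exactly what high dimensions rule out."""
    h = (hi - lo) / n
    return h * sum(p_tilde(lo + (i + 0.5) * h) for i in range(n))

Z = grid_Z()  # a single p_tilde(x) call is cheap; Z needs the whole space
```

In N dimensions the analogous grid has exponentially many cells, which is why Z is generally unavailable.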

slide-14
SLIDE 14-16

Two Problems for Sampling Methods to Solve

1. Generate samples from p
2. Estimate the expectation of a function φ

$\Phi = \langle \varphi(x) \rangle_p = \mathbb{E}_{x \sim p}[\varphi(x)] = \int p(x)\, \varphi(x)\, dx$

If we could sample from p:

$\hat{\Phi} = \frac{1}{S} \sum_s \varphi(x^{(s)})$, with $\mathbb{E}[\hat{\Phi}] = \Phi$: a consistent estimator
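When samples from p are available, the plain Monte Carlo estimator is a one-liner. A sketch with an illustrative choice of p and φ whose answer is known (standard normal, φ(x) = x², so E[φ] = 1):

```python
import random

def mc_expectation(phi, sampler, S=200_000, seed=0):
    """Phi_hat = (1/S) * sum_s phi(x_s), valid when x_s can be drawn from p."""
    rng = random.Random(seed)
    return sum(phi(sampler(rng)) for _ in range(S)) / S

# E[x^2] = 1 under a standard normal, so the estimate should be close to 1
est = mc_expectation(lambda x: x * x, lambda rng: rng.gauss(0.0, 1.0))
```

The rest of the lecture is about what to do when the `sampler` argument is the thing we do not have.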

slide-17
SLIDE 17

Outline

• Recap
• Monte Carlo methods
• Sampling Techniques
  • Uniform sampling
  • Importance Sampling
  • Rejection Sampling
  • Metropolis-Hastings
  • Gibbs sampling

Example: Collapsed Gibbs Sampler for Topic Models

slide-18
SLIDE 18-22

Uniform Sampling

Goal: $\Phi = \langle \varphi(x) \rangle_p = \mathbb{E}_{x \sim p}[\varphi(x)]$

sample uniformly: $x^{(1)}, x^{(2)}, \ldots, x^{(S)}$

$Z^* = \sum_s \tilde{p}(x^{(s)}), \quad P^*(x^{(s)}) = \frac{\tilde{p}(x^{(s)})}{Z^*}, \quad \hat{\Phi} = \sum_s \varphi(x^{(s)})\, P^*(x^{(s)})$

this might work if S (the number of samples) sufficiently hits high probability regions

Ising model example:
  • $2^H$ states of high probability
  • $2^N$ states total
  • chance of a sample landing in the high probability region: $2^H / 2^N$
  • min. samples needed: $\sim 2^{N-H}$
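A sketch of the uniform-sampling estimator on the running 1-D density; the sampling interval [−4, 4] is an assumption chosen to cover essentially all of the density's mass:

```python
import math
import random

def p_tilde(x):
    # the running unnormalized density (ITILA, Fig 29.1)
    return math.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

def uniform_sampling_estimate(phi, lo=-4.0, hi=4.0, S=100_000, seed=0):
    """Draw x_s uniformly; weight phi(x_s) by P*(x_s) = p_tilde(x_s) / Z*."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(S)]
    ws = [p_tilde(x) for x in xs]
    Z_star = sum(ws)                     # Z* = sum_s p_tilde(x_s)
    return sum(phi(x) * w for x, w in zip(xs, ws)) / Z_star

mean_est = uniform_sampling_estimate(lambda x: x)  # estimate of E[x]
```

This works in 1-D because uniform draws land in the high-probability region often; the Ising calculation above shows why the same trick fails in high dimensions.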
slide-23
SLIDE 23

Outline

• Recap
• Monte Carlo methods
• Sampling Techniques
  • Uniform sampling
  • Importance Sampling
  • Rejection Sampling
  • Metropolis-Hastings
  • Gibbs sampling

Example: Collapsed Gibbs Sampler for Topic Models

slide-24
SLIDE 24-30

Importance Sampling

Goal: $\Phi = \langle \varphi(x) \rangle_p = \mathbb{E}_{x \sim p}[\varphi(x)]$

sample from an approximating distribution $Q(x) \propto \tilde{q}(x)$: $x^{(1)}, x^{(2)}, \ldots, x^{(S)}$

ITILA, Fig 29.5

x where Q(x) > p(x): over-represented
x where Q(x) < p(x): under-represented

re-weight each sample: $w_s = \frac{\tilde{p}(x^{(s)})}{\tilde{q}(x^{(s)})}, \quad \hat{\Phi} = \frac{\sum_s \varphi(x^{(s)})\, w_s}{\sum_s w_s}$

Q: How reliable will this estimator be?
A: In practice, difficult to say; the empirical weights $w_s$ may not be a good indicator.
Q: How do you choose a good approximating distribution?
A: Task/domain specific.
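A sketch of importance sampling on the same running density; the broad Gaussian proposal is an assumption, chosen so that the weights p̃/q̃ stay bounded:

```python
import math
import random

def p_tilde(x):
    # the running unnormalized target density
    return math.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

def q_tilde(x):
    # unnormalized density of the proposal N(0, 2^2), which we CAN sample from
    return math.exp(-x * x / 8.0)

def importance_sampling_estimate(phi, S=100_000, seed=0):
    """Phi_hat = sum_s phi(x_s) w_s / sum_s w_s, w_s = p_tilde(x_s) / q_tilde(x_s)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 2.0) for _ in range(S)]   # samples from Q
    ws = [p_tilde(x) / q_tilde(x) for x in xs]     # importance weights
    return sum(phi(x) * w for x, w in zip(xs, ws)) / sum(ws)

mean_est = importance_sampling_estimate(lambda x: x)  # estimate of E[x]
```

A lighter-tailed proposal would make the weights blow up in the tails, which is the variance failure mode illustrated on the next slide.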

slide-31
SLIDE 31

Importance Sampling: Variance

The estimator may vary: ITILA, Fig 29.6 plots the importance-sampling estimate against the true value over iterations, for a Gaussian Q(x) and for a Cauchy Q(x).

slide-32
SLIDE 32

Outline

• Recap
• Monte Carlo methods
• Sampling Techniques
  • Uniform sampling
  • Importance Sampling
  • Rejection Sampling
  • Metropolis-Hastings
  • Gibbs sampling

Example: Collapsed Gibbs Sampler for Topic Models

slide-33
SLIDE 33-36

Rejection Sampling

Goal: $\Phi = \langle \varphi(x) \rangle_p = \mathbb{E}_{x \sim p}[\varphi(x)]$

approximating distribution: $Q(x) \propto \tilde{q}(x)$, with a constant c such that $c\,\tilde{q}(x) > \tilde{p}(x)$ for all x

ITILA, Fig 29.8

sample from Q: $x^{(1)}, x^{(2)}, \ldots, x^{(S^*)}$
sample uniformly: $u_s \sim \mathrm{Unif}(0,\; c\,\tilde{q}(x^{(s)}))$
select tuples:
  • if $u_s \leq \tilde{p}(x^{(s)})$: add $x^{(s)}$ to the sampled points
  • otherwise: reject it

this produces samples from the p-distribution
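A sketch of the procedure on the running density; the envelope constant c = 10 and the Gaussian proposal are assumptions (c was checked by hand to dominate p̃/q̃, not derived on the slides):

```python
import math
import random

def p_tilde(x):
    # the running unnormalized target density
    return math.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

def rejection_sample(S, c=10.0, sigma=2.0, seed=0):
    """Proposal Q = N(0, sigma^2); accept x when u <= p_tilde(x) for
    u ~ Unif(0, c * q_tilde(x)). c must satisfy c * q_tilde > p_tilde."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < S:
        x = rng.gauss(0.0, sigma)
        q = math.exp(-x * x / (2.0 * sigma ** 2))   # unnormalized proposal
        u = rng.uniform(0.0, c * q)
        if u <= p_tilde(x):                         # point landed under p_tilde
            accepted.append(x)
    return accepted

samples = rejection_sample(5_000)
mean_est = sum(samples) / len(samples)              # estimate of E[x]
```

The acceptance rate is Z_p / (c · Z_q), so a loose envelope like this one wastes most proposals, foreshadowing the high-dimensional difficulty noted below.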

slide-37
SLIDE 37-41

Rejection Sampling

over the S accepted samples: $\hat{\Phi} = \frac{1}{S} \sum_s \varphi(x^{(s)})$

Q: How reliable will this estimator be?
A: It depends on how well Q approximates p.
Q: How do you choose a good approximating distribution?
A: Task/domain specific.

rejection sampling can be difficult to use in high-dimensional spaces 

slide-42
SLIDE 42

Outline

• Recap
• Monte Carlo methods
• Sampling Techniques
  • Uniform sampling
  • Importance Sampling
  • Rejection Sampling
  • Metropolis-Hastings
  • Gibbs sampling

Example: Collapsed Gibbs Sampler for Topic Models

slide-43
SLIDE 43

Markov Chain Monte Carlo

transition kernel

slide-44
SLIDE 44-48

Metropolis-Hastings

Goal: $\Phi = \langle \varphi(x) \rangle_p = \mathbb{E}_{x \sim p}[\varphi(x)], \quad \hat{\Phi} = \frac{1}{S} \sum_s \varphi(x^{(s)})$

importance and rejection sampling use a single proposal distribution $Q(x) \propto \tilde{q}(x)$; Metropolis-Hastings (and Gibbs) create a proposal distribution based on the current state, the transition kernel/distribution: $Q(x' \mid x^{(t)}) \propto \tilde{q}(x' \mid x^{(t)})$

Q does not need to look similar to p

ITILA, Fig 29.10

sample a proposal $x'$ from $Q(\cdot \mid x^{(t)})$ and compute the acceptance ratio:

$a = \frac{\tilde{p}(x')}{\tilde{p}(x^{(t)})} \cdot \frac{\tilde{q}(x^{(t)} \mid x')}{\tilde{q}(x' \mid x^{(t)})}$

if $a \geq 1$: accept; otherwise: accept with probability a
if accepted: $x^{(t+1)} = x'$; otherwise: $x^{(t+1)} = x^{(t)}$

samples are not independent
Metropolis-Hastings can be used effectively in high-dimensional spaces ☺
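A sketch of random-walk Metropolis on the running density; with a symmetric proposal the q̃-ratio cancels, leaving a = p̃(x′)/p̃(x^(t)) (step size and burn-in length below are arbitrary choices):

```python
import math
import random

def p_tilde(x):
    # the running unnormalized target density
    return math.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

def metropolis(S, step=1.0, burn_in=1_000, seed=0):
    """Random-walk Metropolis: symmetric Gaussian proposal around x_t."""
    rng = random.Random(seed)
    x = 0.0
    chain = []
    for t in range(S + burn_in):
        x_prop = x + rng.gauss(0.0, step)    # propose from Q(. | x_t)
        a = p_tilde(x_prop) / p_tilde(x)     # acceptance ratio (q-ratio cancels)
        if a >= 1.0 or rng.random() < a:
            x = x_prop                       # accept: x_{t+1} = x'
        # on rejection the old state repeats: x_{t+1} = x_t, and it still
        # counts as a sample (this is what makes the chain correct)
        if t >= burn_in:
            chain.append(x)
    return chain

chain = metropolis(50_000)
mean_est = sum(chain) / len(chain)           # estimate of E[x]
```

Note that only p̃ is ever evaluated: Z never appears, which is the whole appeal of MCMC.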

slide-49
SLIDE 49

Outline

• Recap
• Monte Carlo methods
• Sampling Techniques
  • Uniform sampling
  • Importance Sampling
  • Rejection Sampling
  • Metropolis-Hastings
  • Gibbs sampling

Example: Collapsed Gibbs Sampler for Topic Models

slide-50
SLIDE 50

Gibbs Sampling

Goal: $\Phi = \langle \varphi(x) \rangle_p = \mathbb{E}_{x \sim p}[\varphi(x)]$

transition kernel/distribution: $Q(x' \mid x^{(t)}) = p(x_j \mid \text{all other variables})$

next sampled value of the current variable $x_j$, given the values of all other variables, both new and old:

$x_j^{(t+1)} \sim p\big(\cdot \mid x_1^{(t+1)}, \ldots, x_{j-1}^{(t+1)}, x_{j+1}^{(t)}, \ldots, x_N^{(t)}\big)$

slide-51
SLIDE 51

Remember: Markov Blanket

The Markov blanket of a node x is its parents, children, and children's parents: the set of nodes needed to form the complete conditional for a variable $x_j$.

Using the factorization of the graph:

$p(x_j \mid x_{k \neq j}) = \frac{p(x_1, \ldots, x_N)}{\int p(x_1, \ldots, x_N)\, dx_j} = \frac{\prod_i p(x_i \mid \mathrm{pa}(x_i))}{\int \prod_i p(x_i \mid \mathrm{pa}(x_i))\, dx_j}$

factor out the terms not dependent on $x_j$:

$= \frac{\prod_{i:\, i = j \text{ or } j \in \mathrm{pa}(x_i)} p(x_i \mid \mathrm{pa}(x_i))}{\int \prod_{i:\, i = j \text{ or } j \in \mathrm{pa}(x_i)} p(x_i \mid \mathrm{pa}(x_i))\, dx_j}$

slide-52
SLIDE 52

Gibbs Sampling

Goal: $\Phi = \langle \varphi(x) \rangle_p = \mathbb{E}_{x \sim p}[\varphi(x)], \quad \hat{\Phi} = \frac{1}{S} \sum_s \varphi(x^{(s)})$

transition kernel/distribution (conditioning on the Markov blanket): $Q(x' \mid x^{(t)}) = p(x \mid \mathrm{MB}(x^{(t)}))$

sample (always accept) from $Q(x' \mid x^{(t)})$: $x^{(t+1)} = x'$

samples are not independent
Gibbs Sampling can be used effectively in high-dimensional spaces ☺
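A minimal Gibbs sampler for a case where every complete conditional is known in closed form: a bivariate standard Gaussian (the correlation 0.8 is an arbitrary illustrative choice):

```python
import random

def gibbs_bivariate_gaussian(S, rho=0.8, burn_in=500, seed=0):
    """Gibbs for a 2-D standard Gaussian with correlation rho: each complete
    conditional is N(rho * other, 1 - rho^2), so every draw is accepted."""
    rng = random.Random(seed)
    x1, x2 = 0.0, 0.0
    sd = (1.0 - rho * rho) ** 0.5
    chain = []
    for t in range(S + burn_in):
        x1 = rng.gauss(rho * x2, sd)   # resample x1 | x2
        x2 = rng.gauss(rho * x1, sd)   # resample x2 | x1, using the NEW x1
        if t >= burn_in:
            chain.append((x1, x2))
    return chain

chain = gibbs_bivariate_gaussian(50_000)
corr_est = sum(a * b for a, b in chain) / len(chain)  # should approach rho
```

Each coordinate update conditions on the freshest value of the other coordinate, exactly the "both new and old" pattern in the kernel above.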

slide-53
SLIDE 53

Collapsed Gibbs Sampling

Goal: $\Phi = \langle \varphi(x) \rangle_p = \mathbb{E}_{x \sim p}[\varphi(x)], \quad \hat{\Phi} = \frac{1}{S} \sum_s \varphi(x^{(s)})$

transition kernel/distribution, integrating out some variables z of the Markov blanket: $Q(x' \mid x^{(t)}) = \int p(x \mid \mathrm{MB}(x^{(t)}))\, dz = p(x \mid \mathrm{MB}_{-z}(x^{(t)}))$

sample (always accept) from $Q(x' \mid x^{(t)})$: $x^{(t+1)} = x'$

samples are not independent
Collapsed Gibbs can be used effectively in high-dimensional spaces ☺

slide-54
SLIDE 54

Outline

• Recap
• Monte Carlo methods
• Sampling Techniques
  • Uniform sampling
  • Importance Sampling
  • Rejection Sampling
  • Metropolis-Hastings
  • Gibbs sampling

Example: Collapsed Gibbs Sampler for Topic Models

slide-55
SLIDE 55

Latent Dirichlet Allocation (Blei et al., 2003)

• Per-document (latent) topic usage
• Per-document (unigram) word counts
• Per-topic word usage

(plate diagram over documents d)

slide-56
SLIDE 56

Gibbs Sampler for LDirA

for each document d:
  resample θd | zd,1, …, zd,Nd
  for each token i in d:
    resample zd,i | wd,i, {ψk}, θd
for each topic k:
  resample ψk

slide-57
SLIDE 57

Latent Dirichlet Allocation (Blei et al., 2003)

• Per-document (latent) topic usage
• Per-document (unigram) word counts
• Per-topic word usage

(plate diagram over documents d)

integrate these out

slide-58
SLIDE 58

Collapsed Gibbs Sampler for LDirA

for each document d:
  resample θd | zd,1, …, zd,Nd
  for each token i in d:
    resample zd,i | wd,i, {ψk}, {z*,-i}
for each topic k:
  resample ψk

slide-59
SLIDE 59

Collapsed Gibbs Sampler for LDirA

for each document d:
  resample θd | zd,1, …, zd,Nd
  for each token i in d:
    resample zd,i | wd,i, {ψk}, {z*,-i}
for each topic k:
  resample ψk

$p(z_{dj} \mid z_{*,-j}) = \frac{p(z_{*,*})}{p(z_{*,-j})}$

slide-60
SLIDE 60

Sampling: Discrete Observations

Griffiths and Stevers (PNAS, 2004)

slide-61
SLIDE 61

Sampling: Discrete Observations

Griffiths and Stevers (PNAS, 2004)

slide-62
SLIDE 62

Sampling: Discrete Observations

Griffiths and Stevers (PNAS, 2004)

slide-63
SLIDE 63

Sampling: Discrete Observations

Griffiths and Steyvers (PNAS, 2004)

Gamma function fact: $\Gamma(y + 1) = y\,\Gamma(y)$
maintain count tables
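The identity can be checked numerically with the standard library's gamma function; it is what lets the ratios of Gamma functions in the collapsed conditional reduce to plain count ratios:

```python
import math

# Numerical check of Gamma(y + 1) = y * Gamma(y) at a few points
for y in (0.5, 1.0, 2.5, 7.0):
    assert math.isclose(math.gamma(y + 1.0), y * math.gamma(y))
```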

slide-64
SLIDE 64

Sampling: Discrete Observations

Griffiths and Steyvers (PNAS, 2004)

Collapsed Gibbs Sampling goal:

$p(z_{dj} \mid z_{*,-j}) = \frac{p(z_{*,*})}{p(z_{*,-j})}$

slide-65
SLIDE 65

Sampling: Discrete Observations

Griffiths and Steyvers (PNAS, 2004)

Gamma function fact: $\Gamma(y + 1) = y\,\Gamma(y)$

Collapsed Gibbs Sampling goal:

$p(z_{dj} \mid z_{*,-j}) = \frac{\dfrac{\Gamma(\sum_k \alpha_k)}{\Gamma(\sum_k c(d,k) + \alpha_k)} \prod_k \dfrac{\Gamma(c(d,k) + \alpha_k)}{\Gamma(\alpha_k)}}{\dfrac{\Gamma(\sum_k \alpha_k)}{\Gamma(\sum_k c(d,k) - 1 + \alpha_k)} \prod_k \dfrac{\Gamma(c(d,k) - 1 + \alpha_k)}{\Gamma(\alpha_k)}}$

slide-66
SLIDE 66-67

Sampling: Discrete Observations

Griffiths and Steyvers (PNAS, 2004)

Gamma function fact: $\Gamma(y + 1) = y\,\Gamma(y)$

Collapsed Gibbs Sampling goal:

$p(z_{dj} \mid z_{*,-j}) = \frac{\prod_k (c(d,k) - 1 + \alpha_k)\, \Gamma(c(d,k) - 1 + \alpha_k)}{\big(\sum_k c(d,k) - 1 + \alpha_k\big)\, \Gamma\big(\sum_k c(d,k) - 1 + \alpha_k\big)} \cdot \frac{\Gamma\big(\sum_k c(d,k) - 1 + \alpha_k\big)}{\prod_k \Gamma(c(d,k) - 1 + \alpha_k)}$

slide-68
SLIDE 68

Sampling: Discrete Observations

Griffiths and Steyvers (PNAS, 2004)

Gamma function fact: $\Gamma(y + 1) = y\,\Gamma(y)$

Collapsed Gibbs Sampling goal:

$p(z_{dj} = k \mid z_{*,-j}) \propto c(d,k) - 1 + \alpha_k$

maintain count tables

slide-69
SLIDE 69

Collapsed Gibbs Sampler for LDirA

for each document d:
  for each token i in d:
    resample zd,i | wd,i, {ψk}, {z*,-i}

$p(z_{dj} = k \mid z_{*,-j}) \propto (c(d,k) - 1 + \alpha_k) \cdot \text{topic-word counts}$

slide-70
SLIDE 70

Collapsed Gibbs Sampler for LDirA

randomly assign z*,*
maintain count tables:
  • c(d,k): document-topic counts
  • c(k,v): topic-word counts
for each document d:
  for each token i in d:
    unassign topic zd,i (decrease counts)
    resample zd,i | wd,i, {ψk}, {z*,-i}
    reassign topic zd,i (increase counts)

$p(z_{dj} = k \mid z_{*,-j}) \propto (c(d,k) - 1 + \alpha_k) \cdot \text{topic-word counts}$
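The full algorithm above can be sketched from scratch. The hyperparameters, toy corpus, and iteration count below are arbitrary choices, and this is an illustrative re-implementation, not Griffiths & Steyvers' own code:

```python
import random

def collapsed_gibbs_lda(docs, K, V, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs for LDA: theta and psi are integrated out, so we only
    maintain count tables c(d,k), c(k,v) and resample each z_{d,i} from
    p(z=k | z_-i, w) proportional to (c(d,k)+alpha)*(c(k,w)+beta)/(c(k)+V*beta)."""
    rng = random.Random(seed)
    D = len(docs)
    c_dk = [[0] * K for _ in range(D)]   # document-topic counts
    c_kv = [[0] * V for _ in range(K)]   # topic-word counts
    c_k = [0] * K                        # per-topic totals
    z = [[rng.randrange(K) for _ in doc] for doc in docs]  # random init
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            c_dk[d][k] += 1; c_kv[k][w] += 1; c_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]              # unassign topic: decrease counts
                c_dk[d][k] -= 1; c_kv[k][w] -= 1; c_k[k] -= 1
                probs = [(c_dk[d][j] + alpha) * (c_kv[j][w] + beta)
                         / (c_k[j] + V * beta) for j in range(K)]
                r = rng.random() * sum(probs)
                k = K - 1
                for j in range(K):       # draw from the discrete conditional
                    r -= probs[j]
                    if r <= 0.0:
                        k = j
                        break
                z[d][i] = k              # reassign topic: increase counts
                c_dk[d][k] += 1; c_kv[k][w] += 1; c_k[k] += 1
    return z, c_dk, c_kv

# tiny corpus: vocabulary clusters {0,1} vs {2,3}; each document uses one cluster
docs = [[0, 1, 0, 1, 0]] * 4 + [[2, 3, 2, 3, 2]] * 4
z, c_dk, c_kv = collapsed_gibbs_lda(docs, K=2, V=4)
```

Note that the "−1" from the derivation is handled implicitly: the current token's counts are decremented before its conditional is computed.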

slide-71
SLIDE 71

Outline

• Recap
• Monte Carlo methods
• Sampling Techniques
  • Uniform sampling
  • Importance Sampling
  • Rejection Sampling
  • Metropolis-Hastings
  • Gibbs sampling

Example: Collapsed Gibbs Sampler for Topic Models