SLIDE 1

Introduction to Bayesian Computation

  • Dr. Jarad Niemi

STAT 544 - Iowa State University

March 26, 2019

Jarad Niemi (STAT544@ISU) Introduction to Bayesian Computation March 26, 2019 1 / 30

SLIDE 2

Bayesian computation

Goals:

  • E_{θ|y}[h(θ)|y] = ∫ h(θ) p(θ|y) dθ
  • p(y) = ∫ p(y|θ) p(θ) dθ = E_θ[p(y|θ)]

Approaches:

  • Deterministic approximation
  • Monte Carlo approximation: theoretical justification, gridding, inverse CDF, accept-reject


SLIDE 3

Numerical integration

Deterministic methods, where

E[h(θ)|y] = ∫ h(θ) p(θ|y) dθ ≈ Σ_{s=1}^S w_s h(θ^(s)) p(θ^(s)|y),

the θ^(s) are selected points, w_s is the weight given to the point θ^(s), and the error can be bounded.

Monte Carlo (simulation) methods, where

E[h(θ)|y] = ∫ h(θ) p(θ|y) dθ ≈ (1/S) Σ_{s=1}^S w_s h(θ^(s)),

θ^(s) ~ind g(θ) (for some proposal distribution g), w_s = p(θ^(s)|y)/g(θ^(s)), and we have an SLLN and a CLT.


SLIDE 4

Deterministic methods

Example: Normal-Cauchy model

Let Y ∼ N(θ, 1) with θ ∼ Ca(0, 1). The posterior is p(θ|y) ∝ p(y|θ)p(θ) ∝ exp(−(y − θ)²/2) / (1 + θ²) = q(θ|y), which is not a known distribution. We might be interested in

  • 1. normalizing this posterior, i.e. calculating c(y) = ∫ q(θ|y) dθ,
  • 2. or calculating the posterior mean, i.e. E[θ|y] = ∫ θ p(θ|y) dθ = ∫ θ q(θ|y)/c(y) dθ.


SLIDE 5

Deterministic methods

Normal-Cauchy: marginal likelihood

y = 1 # Data
q = function(theta, y, log = FALSE) {
  out = -(y-theta)^2/2 - log(1+theta^2)
  if (log) return(out)
  return(exp(out))
}

# Find normalizing constant for q(theta|y)
w = 0.1
theta = seq(-5, 5, by = w) + y
(cy = sum(q(theta, y)*w)) # gridding based approach
[1] 1.305608
integrate(function(x) q(x, y), -Inf, Inf) # numerical integration
1.305609 with absolute error < 0.00013
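The same gridding idea can be cross-checked outside R. Below is a minimal Python sketch (the function names `q` and `c_of_y` are illustrative, not from the slides) that approximates c(y) with a midpoint Riemann sum; the Gaussian factor in q makes the tails negligible.

```python
import math

def q(theta, y):
    """Unnormalized Normal-Cauchy posterior q(theta|y)."""
    return math.exp(-(y - theta)**2 / 2) / (1 + theta**2)

def c_of_y(y, width=20.0, n=200_000):
    """Midpoint Riemann sum for c(y) over (y - width/2, y + width/2)."""
    h = width / n
    return sum(q(y - width/2 + (i + 0.5) * h, y) for i in range(n)) * h

print(round(c_of_y(1), 6))  # matches the slide's 1.305608 to several digits
```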

SLIDE 6

Deterministic methods

Normal-Cauchy: distribution

curve(q(x, y)/cy, -5, 5, n = 1001)
points(theta, rep(0, length(theta)), cex = 0.5, pch = 19)
segments(theta, 0, theta, q(theta, y)/cy)

[Figure: the normalized posterior q(x, y)/cy plotted over (−5, 5), with the grid points marked along the axis.]

SLIDE 7

Deterministic methods

Posterior expectation

E[h(θ)|y] ≈ Σ_{s=1}^S w_s h(θ^(s)) p(θ^(s)|y) = Σ_{s=1}^S w_s h(θ^(s)) q(θ^(s)|y) / c(y)

h = function(theta) theta
sum(w*h(theta)*q(theta, y)/cy)
[1] 0.5542021

SLIDE 8

Deterministic methods

Posterior expectation as a function of observed data

[Figure: posterior expectation E[θ|y] as a function of the observed data y ∈ (−5, 5), under Cauchy, normal, and improper uniform priors.]

SLIDE 9

Monte Carlo methods

Convergence review

Three main notions of convergence of a sequence of random variables X1, X2, … to a random variable X:

  • Convergence in distribution (X_n →^d X): lim_{n→∞} F_n(x) = F(x).
  • Convergence in probability (WLLN, X_n →^p X): lim_{n→∞} P(|X_n − X| ≥ ε) = 0.
  • Almost sure convergence (SLLN, X_n →^{a.s.} X): P(lim_{n→∞} X_n = X) = 1.

Implications: almost sure convergence implies convergence in probability, and convergence in probability implies convergence in distribution. Here, X_n will be our approximation to an integral and X the true (constant) value of that integral, or X_n will be a standardized approximation and X will be N(0, 1).

SLIDE 10

Monte Carlo methods

Monte Carlo integration

Consider evaluating the integral E[h(θ)] = ∫_Θ h(θ) p(θ) dθ using the Monte Carlo estimate

ĥ_S = (1/S) Σ_{s=1}^S h(θ^(s)), where θ^(s) ~ind p(θ).

We know

  • SLLN: ĥ_S →^{a.s.} E[h(θ)].
  • CLT: if h² has finite expectation, then (ĥ_S − E[h(θ)]) / √(v_S/S) →^d N(0, 1), where v_S = Var[h(θ)] ≈ (1/S) Σ_{s=1}^S (h(θ^(s)) − ĥ_S)², or any other consistent estimator.

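The SLLN and CLT statements above can be made concrete with a short Python sketch. The choice h(θ) = θ² with θ ∼ N(0, 1), so E[h(θ)] = 1, is a hypothetical example for illustration, not from the slides.

```python
import math, random, statistics

random.seed(42)
S = 10_000

# theta^(s) ~ N(0, 1); h(theta) = theta^2, so E[h(theta)] = 1
h_draws = [random.gauss(0, 1)**2 for _ in range(S)]

h_hat = sum(h_draws) / S                # Monte Carlo estimate (SLLN: -> 1)
v_S = statistics.variance(h_draws)      # consistent estimate of Var[h(theta)]
half_width = 1.96 * math.sqrt(v_S / S)  # CLT-based 95% interval half-width

print(f"{h_hat:.3f} +/- {half_width:.3f}")
```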

SLIDE 11

Monte Carlo methods Definite integral

Definite integral

Suppose you are interested in evaluating I = ∫₀¹ e^{−θ²/2} dθ. Then set h(θ) = e^{−θ²/2} and p(θ) = 1, i.e. θ ∼ Unif(0, 1), and approximate by a Monte Carlo estimate via

  • 1. For s = 1, …, S,
    • a. sample θ^(s) ~ind Unif(0, 1) and
    • b. calculate h(θ^(s)).
  • 2. Calculate I ≈ (1/S) Σ_{s=1}^S h(θ^(s)).
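These two steps can be sketched in Python (a hedged translation; the slides themselves use R), with the truth computed from the standard normal CDF for comparison, since I = √(2π)(Φ(1) − Φ(0)).

```python
import math, random
from statistics import NormalDist

random.seed(1)
S = 100_000

# Step 1: sample theta^(s) ~ Unif(0,1) and evaluate h(theta) = exp(-theta^2/2)
# Step 2: average the evaluations
I_hat = sum(math.exp(-random.random()**2 / 2) for _ in range(S)) / S

# Truth via the standard normal CDF: I = sqrt(2*pi) * (Phi(1) - Phi(0))
I_true = math.sqrt(2 * math.pi) * (NormalDist().cdf(1.0) - 0.5)

print(I_hat, I_true)  # both near 0.856
```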


SLIDE 12

Monte Carlo methods Definite integral

Monte Carlo sampling randomly infills

[Figure: f(x) = e^{−x²/2} evaluated at 20 samples (left) and 200 samples (right) drawn uniformly on (0, 1); the random samples progressively fill in the interval.]


SLIDE 13

Monte Carlo methods Definite integral

Strong law of large numbers

[Figure: Monte Carlo estimate of I versus the number of samples (up to 1000); the estimate settles near the truth, illustrating the SLLN.]


SLIDE 14

Monte Carlo methods Definite integral

Central limit theorem

[Figure: Monte Carlo estimate with a 95% confidence interval versus the number of samples; the interval narrows around the truth, illustrating the CLT.]


SLIDE 15

Monte Carlo methods Infinite bounds

Infinite bounds

Suppose θ ∼ N(0, 1) and you are interested in evaluating E[θ] = ∫_{−∞}^{∞} θ (1/√(2π)) e^{−θ²/2} dθ. Then set h(θ) = θ and g(θ) = φ(θ), i.e. θ ∼ N(0, 1), and approximate by a Monte Carlo estimate via

  • 1. For s = 1, …, S,
    • a. sample θ^(s) ~ind N(0, 1) and
    • b. calculate h(θ^(s)).
  • 2. Calculate E[θ] ≈ (1/S) Σ_{s=1}^S h(θ^(s)).
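A direct Python sketch of this recipe (the truth here is E[θ] = 0):

```python
import random

random.seed(7)
S = 100_000

# Step 1: theta^(s) ~ N(0, 1); h(theta) = theta
# Step 2: average the draws
estimate = sum(random.gauss(0, 1) for _ in range(S)) / S

print(estimate)  # near 0 (standard error is 1/sqrt(S), about 0.003)
```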


SLIDE 16

Monte Carlo methods Infinite bounds

Non-uniform sampling

[Figure: h(θ) = θ plotted at n = 20 (left) and n = 100 (right) sample locations drawn from N(0, 1); the samples concentrate where the density is high.]


SLIDE 17

Monte Carlo methods Infinite bounds

Monte Carlo estimate

[Figure: Monte Carlo estimate of E[θ] with a 95% confidence interval versus S; the estimate converges to the truth, 0.]


SLIDE 18

Monte Carlo methods Gridding

Monte Carlo approximation via gridding

Rather than determining c(y) and then E[θ|y] via deterministic gridding (all w_i equal), we can use the grid as a discrete approximation to the posterior, i.e.

p(θ|y) ≈ Σ_{i=1}^N p_i δ_{θ_i}(θ), with p_i = q(θ_i|y) / Σ_{j=1}^N q(θ_j|y),

where δ_{θ_i}(θ) is the Dirac delta function, i.e. δ_{θ_i}(θ) = 0 ∀ θ ≠ θ_i and ∫ δ_{θ_i}(θ) dθ = 1.

This discrete approximation to p(θ|y) can be used to approximate the expectation E[h(θ)|y] deterministically or via simulation, i.e.

E[h(θ)|y] ≈ Σ_{i=1}^N p_i h(θ_i) or E[h(θ)|y] ≈ (1/S) Σ_{s=1}^S h(θ^(s)),

where θ^(s) ~ind Σ_{i=1}^N p_i δ_{θ_i}(θ) (sampled with replacement).
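Both uses of the discrete approximation can be sketched in Python for the Normal-Cauchy example (a hedged translation; the grid mirrors the R code that follows, and `q` is the unnormalized posterior):

```python
import math, random

random.seed(3)

def q(theta, y):
    """Unnormalized Normal-Cauchy posterior."""
    return math.exp(-(y - theta)**2 / 2) / (1 + theta**2)

y = 1
N = 1001
grid = [-5 + 10 * i / (N - 1) + y for i in range(N)]   # grid points theta_i
wts = [q(t, y) for t in grid]
p = [w / sum(wts) for w in wts]                        # normalized weights p_i

# Deterministic: E[h(theta)|y] ~ sum_i p_i h(theta_i), with h(theta) = theta
post_mean = sum(pi * ti for pi, ti in zip(p, grid))

# Simulation: resample the grid with replacement
draws = random.choices(grid, weights=p, k=10_000)
mc_mean = sum(draws) / len(draws)

print(post_mean, mc_mean)  # both near 0.554
```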


SLIDE 19

Monte Carlo methods Gridding

Example: Normal-Cauchy model

y = 1 # Data

# Small number of grid locations
theta = seq(-5, 5, length = 1e2+1) + y; p = q(theta, y)/sum(q(theta, y)); sum(p*theta)
[1] 0.5542021
mean(sample(theta, prob = p, replace = TRUE))
[1] 0.6118812

# Large number of grid locations
theta = seq(-5, 5, length = 1e6+1) + y; p = q(theta, y)/sum(q(theta, y)); sum(p*theta)
[1] 0.5542021
mean(sample(theta, 1e2, prob = p, replace = TRUE)) # But small MC sample
[1] 0.598394

# Truth
post_expectation(1)
[1] 0.5542021

SLIDE 20

Monte Carlo methods Inverse CDF method

Inverse cumulative distribution function

Definition. The cumulative distribution function (cdf) of a random variable X is defined by F_X(x) = P_X(X ≤ x) for all x.

Lemma. Let X be a random variable whose cdf is F(x), and suppose you have access to the inverse cdf of X, i.e. u = F(x) ⟹ x = F⁻¹(u). If U ∼ Unif(0, 1), then X = F⁻¹(U) is a simulation from the distribution of X.


SLIDE 21

Monte Carlo methods Inverse CDF method

Inverse CDF

[Figure: the standard normal CDF, pnorm(x), plotted for x ∈ (−3, 3).]


SLIDE 22

Monte Carlo methods Inverse CDF method

Exponential example

For example, to sample X ∼ Exp(1),

  • 1. Sample U ∼ Unif(0, 1).
  • 2. Set X = − log(1 − U), or X = − log(U).

[Figure: histogram of samples X = −log(1 − U) overlaid with the Exp(1) density on (0, 7.5).]
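The exponential recipe translates directly to Python (a hedged sketch; the sample mean should be near E[X] = 1):

```python
import math, random

random.seed(11)
S = 100_000

# Inverse CDF: if U ~ Unif(0,1), then X = -log(1 - U) ~ Exp(1)
x = [-math.log(1 - random.random()) for _ in range(S)]

print(sum(x) / S)  # sample mean, close to 1
```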


SLIDE 23

Monte Carlo methods Inverse CDF method

Sampling from a univariate truncated distribution

Suppose you wish to sample from X ∼ N(µ, σ2)I(a < X < b), i.e. a normal random variable with untruncated mean µ and variance σ2, but truncated to the interval (a, b). Suppose the untruncated cdf is F and inverse cdf is F −1.

  • 1. Calculate endpoints pa = F(a) and pb = F(b).
  • 2. Sample U ∼ Unif(pa, pb).
  • 3. Set X = F −1(U).

This just avoids having to recalculate the normalizing constant for the pdf, i.e. 1/(F(b) − F(a)).
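The three steps above can be sketched with the Python standard library's `statistics.NormalDist` providing F and F⁻¹ (the function name `rtruncnorm` is illustrative, not from the slides):

```python
import random
from statistics import NormalDist

random.seed(5)

def rtruncnorm(mu, sigma, a, b):
    """One draw from N(mu, sigma^2) truncated to (a, b), via the inverse CDF."""
    F = NormalDist(mu, sigma)
    pa, pb = F.cdf(a), F.cdf(b)   # 1. endpoints p_a = F(a), p_b = F(b)
    u = random.uniform(pa, pb)    # 2. U ~ Unif(p_a, p_b)
    return F.inv_cdf(u)           # 3. X = F^{-1}(U)

draws = [rtruncnorm(5, 9, 1, 6) for _ in range(10_000)]
print(min(draws), max(draws))  # every draw lies in the interval (1, 6)
```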


SLIDE 24

Monte Carlo methods Inverse CDF method

Truncated normal

X ∼ N(5, 9²) I(1 ≤ X ≤ 6)

[Figure: the N(5, 9²) CDF, pnorm(x, mean, sd), plotted for x ∈ (−10, 20).]


SLIDE 25

Monte Carlo methods Inverse CDF method

Truncated normal

X ∼ N(5, 9²) I(1 ≤ X ≤ 6)

[Figure: density of samples from the truncated normal; all mass lies in (1, 6).]


SLIDE 26

Monte Carlo methods Rejection sampling

Rejection sampling

Suppose you wish to obtain samples θ ∼ p(θ|y). Rejection sampling performs the following:

  • 1. Sample a proposal θ* ∼ g(θ) and U ∼ Unif(0, 1).
  • 2. Accept θ = θ* as a draw from p(θ|y) if U ≤ p(θ*|y)/[M g(θ*)]; otherwise return to step 1.

Here M satisfies M g(θ) ≥ p(θ|y) for all θ. For a given proposal distribution g(θ), the optimal M is M = sup_θ p(θ|y)/g(θ). The probability of acceptance is 1/M. The accept-reject idea is to create an envelope, M g(θ), above p(θ|y).
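A minimal Python sketch of the algorithm, using a hypothetical example not from the slides: target Beta(2, 2) with proposal Unif(0, 1), so M = sup p/g = 1.5, attained at x = 0.5.

```python
import random

random.seed(9)

def target(x):
    """Beta(2, 2) density on (0, 1): p(x) = 6x(1 - x)."""
    return 6 * x * (1 - x)

M = 1.5  # sup_x target(x)/g(x), with g the Unif(0,1) density (= 1)

def rejection_sample():
    while True:
        theta = random.random()      # 1. theta* ~ g
        u = random.random()          #    and U ~ Unif(0, 1)
        if u <= target(theta) / M:   # 2. accept if U <= p(theta*)/(M g(theta*))
            return theta

draws = [rejection_sample() for _ in range(20_000)]
print(sum(draws) / len(draws))  # near E[X] = 0.5
```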


SLIDE 27

Monte Carlo methods Rejection sampling

Rejection sampling with unnormalized density

Suppose you wish to obtain samples θ ∼ p(θ|y) ∝ q(θ|y), where only the unnormalized density q is available. Rejection sampling performs the following:

  • 1. Sample a proposal θ* ∼ g(θ) and U ∼ Unif(0, 1).
  • 2. Accept θ = θ* as a draw from p(θ|y) if U ≤ q(θ*|y)/[M† g(θ*)]; otherwise return to step 1.

Here M† satisfies M† g(θ) ≥ q(θ|y) for all θ. For a given proposal distribution g(θ), the optimal M† is M† = sup_θ q(θ|y)/g(θ). The acceptance probability is 1/M = c(y)/M†. The accept-reject idea is to create an envelope, M† g(θ), above q(θ|y).


SLIDE 28

Monte Carlo methods Rejection sampling

Example: Normal-Cauchy model

If Y ∼ N(θ, 1) and θ ∼ Ca(0, 1), then p(θ|y) ∝ e^{−(y−θ)²/2} / (1 + θ²) for θ ∈ ℝ. Choose a N(y, 1) proposal distribution, i.e. g(θ) = (1/√(2π)) e^{−(θ−y)²/2}, with

M† = sup_θ q(θ|y)/g(θ) = sup_θ [e^{−(y−θ)²/2} / (1 + θ²)] / [(1/√(2π)) e^{−(θ−y)²/2}] = sup_θ √(2π)/(1 + θ²) = √(2π).

The acceptance rate is 1/M = c(y)/M† = 1.3056085/√(2π) = 0.5208624.
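This example can be sketched in Python (a hedged translation of the slide's setup); note that the acceptance ratio q(θ|y)/[M† g(θ)] simplifies to 1/(1 + θ²).

```python
import math, random

random.seed(13)
y = 1

def q(theta):
    """Unnormalized posterior q(theta|y) for the observed y."""
    return math.exp(-(y - theta)**2 / 2) / (1 + theta**2)

def g(theta):
    """N(y, 1) proposal density."""
    return math.exp(-(theta - y)**2 / 2) / math.sqrt(2 * math.pi)

M = math.sqrt(2 * math.pi)  # M_dagger = sup_theta q(theta|y)/g(theta)

accepted, n_proposals = [], 0
while len(accepted) < 10_000:
    theta = random.gauss(y, 1)  # proposal from N(y, 1)
    n_proposals += 1
    if random.random() <= q(theta) / (M * g(theta)):  # = 1/(1 + theta^2)
        accepted.append(theta)

print(len(accepted) / n_proposals)  # near the theoretical rate 0.5209
```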


SLIDE 29

Monte Carlo methods Rejection sampling

Example: Normal-Cauchy model

[Figure: proposed draws θ* plotted against u, with the envelope M† g(θ) and points colored by accept (TRUE) or reject (FALSE).]

Observed acceptance rate was 0.52.

SLIDE 30

Monte Carlo methods Rejection sampling

Heavy-tailed proposals

Suppose our target is a standard Cauchy and our (proposed) proposal is a standard normal. Then

p(θ|y)/g(θ) = [1/(π(1 + θ²))] / [(1/√(2π)) e^{−θ²/2}] → ∞ as θ → ∞,

since e^{−a} converges to zero faster than 1/(1 + a). Thus, there is no value M such that M g(θ) ≥ p(θ|y) for all θ. TL;DR: the condition M g(θ) ≥ p(θ|y) requires the proposal to have tails at least as thick (heavy) as the target.
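A quick numerical check of the unbounded ratio (Python sketch):

```python
import math

def ratio(theta):
    """Cauchy target density over standard normal proposal density."""
    cauchy = 1 / (math.pi * (1 + theta**2))
    normal = math.exp(-theta**2 / 2) / math.sqrt(2 * math.pi)
    return cauchy / normal

for t in [1, 5, 10, 20]:
    print(t, ratio(t))  # the ratio blows up as theta grows
```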
