Examples and comparisons of rigorous error bounds for MCMC estimates - - PowerPoint PPT Presentation

examples and comparisons of rigorous error bounds for
SMART_READER_LITE
LIVE PREVIEW

Examples and comparisons of rigorous error bounds for MCMC estimates - - PowerPoint PPT Presentation

Outline Examples and comparisons of rigorous error bounds for MCMC estimates Part I Wojciech Niemiro Nicolaus Copernicus University, Toru n, Poland & University of Warsaw, Poland joint work with Ba zej Miasojedow University of


slide-1
SLIDE 1

Outline

Examples and comparisons of rigorous error bounds for MCMC estimates

Part I Wojciech Niemiro Nicolaus Copernicus University, Toru´ n, Poland & University of Warsaw, Poland

joint work with Bła˙ zej Miasojedow University of Warsaw, Poland & LCTI Telecom ParisTech, France and Agnieszka Perduta-Pilarska Nicolaus Copernicus University, Toru´ n, Poland

MCQMC Sydney, February 2012

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-2
SLIDE 2

Outline

Outline

Introduction Computing integrals via MCMC Rigorous accuracy bounds Non-asymptotic bounds Basic bound on MSE Explicit bounds for geometrically ergodic chains Explicit bounds for uniformly ergodic chains Confidence estimation Median of Averages Examples Hierarchical Poisson/Gamma model Hierarchical model of variance components

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-3
SLIDE 3

Introduction Non-asymptotic bounds Confidence estimation Examples Computing integrals via MCMC Rigorous accuracy bounds

Computing integrals via MCMC

We are to compute Eπf = π(f) =

  • X

f(x)π(dx), where

◮ X – state space (high dimensional). ◮ π – probability distribution on X (posterior in a Bayesian model).

Markov chain X0, X1, . . . , Xn, . . . P(Xn ∈ ·) → π(·) (n → ∞). MCMC estimator ¯ fn = 1 n

n−1

  • i=0

f(Xi) → Eπf (n → ∞).

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-4
SLIDE 4

Introduction Non-asymptotic bounds Confidence estimation Examples Computing integrals via MCMC Rigorous accuracy bounds

Accuracy bounds

Rate of convergence of probability distributions: P(Xt ∈ ·) − π(·) ≤ ? considered in many papers ... Rate of convergence of sample averages. Mean square error: √ MSE =

  • E (¯

fn − Eπf)2 ≤ ? Confidence bounds: P(|¯ fn − Eπf| > ε) ≤ ?

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-5
SLIDE 5

Introduction Non-asymptotic bounds Confidence estimation Examples Computing integrals via MCMC Rigorous accuracy bounds

Convergence of probability distributions:

Gibbs Sampler in a hierarchical Bayesian model of variance

  • components. Trajectory of one of 1003 coordinates.

Time to reach stationarity: ∼ 5-10 steps (visual asessment).

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-6
SLIDE 6

Introduction Non-asymptotic bounds Confidence estimation Examples Computing integrals via MCMC Rigorous accuracy bounds

Convergence of sample averages:

Gibbs Sampler in a hierarchical Bayesian model of variance

  • components. The same computation. A graph of cumulative

averages. Time to stabilize estimates: at least ∼ 1000 steps (visual asessment).

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-7
SLIDE 7

Introduction Non-asymptotic bounds Confidence estimation Examples Computing integrals via MCMC Rigorous accuracy bounds

Why insist on non-asymptotic bounds?

A simulation of an ergodic chain (by courtesy of K. Łatuszy´ nski).

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-8
SLIDE 8

Introduction Non-asymptotic bounds Confidence estimation Examples Computing integrals via MCMC Rigorous accuracy bounds

Why insist on non-asymptotic bounds?

A longer version of the same simulation.

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-9
SLIDE 9

Introduction Non-asymptotic bounds Confidence estimation Examples Basic bound on MSE Explicit bounds for geometrically ergodic chains Explicit bounds for uniformly ergodic chains

Assumptions

Notation for the transition kernel: P(x, A) = P(Xn ∈ A|Xn−1 = x).

ASSUMPTIONS

◮ Stationary distribution. There exists a probability distribution π

  • n X such that πP = π and P is π-irreducible.

◮ Small set (minorization condition). There exist J ⊆ X with

π(J) > 0, a probability measure ν and β > 0 such that P(x, ·) ≥ β1(x ∈ J)ν(·). The minorization condition allows us to use an approach based on regeneration due to Nummelin, Athreya & Ney.

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-10
SLIDE 10

Introduction Non-asymptotic bounds Confidence estimation Examples Basic bound on MSE Explicit bounds for geometrically ergodic chains Explicit bounds for uniformly ergodic chains

Regeneration

Under the minorization condition we can partition a trajectory of the chain into independent “tours” (excursions) of random length: X0, . . . , XT1−1

  • T1

, XT1, . . . , XT2−1

  • T2−T1

, XT2, . . . , XT3−1

  • T3−T2

, . . . ↑ ↑ R R R ⇐ ⇒ Regeneration ⇐ ⇒ Xn ∼ ν(·). The excursions are i.i.d. except for the first one, which can have a different distribution).

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-11
SLIDE 11

Introduction Non-asymptotic bounds Confidence estimation Examples Basic bound on MSE Explicit bounds for geometrically ergodic chains Explicit bounds for uniformly ergodic chains

Basic bound for the Mean Square Error

THEOREM (Łatuszy´ nski, Miasojedow & Niemiro, 2011)

Under Assumptions Stationary distribution and Small set,

  • Eξ (¯

fn − Eπf)2 ≤ σas √n

  • 1 + C0

n

  • + C1

n + C2 n , where σ2

as = σ2 as(P, f) is the “asymptotic variance” in the CLT:

√n σas (¯ fn − Eπf) →d N(0, 1) (n → ∞). The leading term in our bound is optimal, because

  • Eξ (¯

fn − Eπf)2 ∼ σas √n (n → ∞). Constants C0, C1, C2 are expressed in terms of “regenerative tours”.

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-12
SLIDE 12

Introduction Non-asymptotic bounds Confidence estimation Examples Basic bound on MSE Explicit bounds for geometrically ergodic chains Explicit bounds for uniformly ergodic chains

Geometric drift condition

Explicit bounds for the MSE for geometrically ergodic chains. Notation for the transition operator: PV(x) =

  • X

P(x, dy)V(y) = E(V(Xn)|Xn−1 = x).

ASSUMPTION

◮ Geometric Drift. There exist a function V : X → [1, ∞[,

constants λ < 1 and K < ∞ such that PV(x) ≤

  • λV(x)

for x ∈ J, K for x ∈ J. J is a small set which appears in the minorization condition.

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-13
SLIDE 13

Introduction Non-asymptotic bounds Confidence estimation Examples Basic bound on MSE Explicit bounds for geometrically ergodic chains Explicit bounds for uniformly ergodic chains

Explicit bounds under drift condition

Bounds on constants σas, C0, C1, C2 (Łatuszy´ nski, Miasojedow & Niemiro, 2011). Under Assumptions Small Set and Geometric Drift, if f is such that f − π(f)V 1/2 = supx |f(x) − π(f)|/V(x)1/2 ≤ 1 then σ2

as ≤ 1 + λ1/2

1 − λ1/2 π(V) + 2(K 1/2 − λ1/2 − β) β(1 − λ1/2) π(V 1/2), C0 ≤ λ1/2 1 − λ1/2 π(V 1/2) + K 1/2 − λ1/2 − β β(1 − λ1/2) , C2

1 ≤

1 (1 − λ1/2)2 ξ(V) + 2(K 1/2 − λ1/2 − β) β(1 − λ1/2)2 ξ(V 1/2) + β(K − λ − β) + 2(K 1/2 − λ1/2 − β)2 β2(1 − λ1/2)2 , C2

2 ≤ analogous expression with ξ replaced by ξPn.

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-14
SLIDE 14

Introduction Non-asymptotic bounds Confidence estimation Examples Basic bound on MSE Explicit bounds for geometrically ergodic chains Explicit bounds for uniformly ergodic chains

Explicit bounds under drift condition

Further bounds on π(V 1/2), π(V), ξPn(V 1/2), ξPn(V), f − π(f)V 1/2. Under Assumptions Small Set and Geometric Drift, π(V 1/2) ≤ π(J)K 1/2 − λ1/2 1 − λ1/2 ≤ K 1/2 − λ1/2 1 − λ1/2 , π(V) ≤ π(J)K − λ 1 − λ ≤ K − λ 1 − λ , if ξ(V 1/2) ≤ K 1/2 1 − λ1/2 then ξPn(V 1/2) ≤ K 1/2 1 − λ1/2 , if ξ(V) ≤ K 1 − λ then ξPn(V) ≤ K 1 − λ, f − π(f)V 1/2 ≤ fV 1/2

  • 1 +

π(J)(K 1/2 − λ1/2) (1 − λ1/2) infx∈X V 1/2(x)

  • ≤ fV 1/2
  • 1 + K 1/2 − λ1/2

1 − λ1/2

  • .
  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-15
SLIDE 15

Introduction Non-asymptotic bounds Confidence estimation Examples Basic bound on MSE Explicit bounds for geometrically ergodic chains Explicit bounds for uniformly ergodic chains

Uniformly ergodic chains and perfect sampling

If Assumption Small Set is satisfied with J = X then XT2−1, XT3−1, . . . are i.i.d. and have exactly π-distribution. For a perfect sampler ˇ fr = 1 r

r

  • k=1

XTk+1−1, we obviously have the standard i.i.d. bound

  • E (ˇ

fr − Eπf)2 ≤ Dπf √r , where Dπf = √Varπf. Remember that to generate r perfect samples we have to run the chain r/β steps (on the average).

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-16
SLIDE 16

Introduction Non-asymptotic bounds Confidence estimation Examples Median of Averages

Confidence estimation via Median of Averages

Median of Averages (MA) (Jerrum, Valiant and Vazirani, 1986): Generate m independent copies of the Markov chain and compute averages: X (1)

0 , X (1) 1 , . . . , X (1) n−1

− → ¯ f (1)

n

=

n−1

  • i=0

f(X (1)

i

), · · · X (m) , X (m)

1

, . . . , X (m)

n−1

− → ¯ f (m)

n

=

n−1

  • i=0

f(X (m)

i

). Estimator MA: ˆ fm,n = med

  • ¯

f (1)

n , . . . ,¯

f (m)

n

  • .
  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-17
SLIDE 17

Introduction Non-asymptotic bounds Confidence estimation Examples Median of Averages

Confidence estimation via Median of Averages

Nonasymptotic level of confidence for the MA estimator: P(|ˆ fm,n − Eπf| > ε) ≤ α, To ensure given precision ε at a given level of confidence 1 − α we need mn samples, where n is such that

  • E (¯

fn − Eπf)2 ≤ 0.346 ε, , and m ≥ 2.315 log(2α)−1, (Niemiro and Pokarowski, 2009).

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-18
SLIDE 18

Introduction Non-asymptotic bounds Confidence estimation Examples Hierarchical Poisson/Gamma model Hierarchical model of variance components

Hierarchical Poisson/Gamma model

A Bayesian model applied to a well-known pump failure data set, analysed by Gelfand and Smith (1990), Tierney (1994), Rosenthal (1995), Mykland et al. (1995). Data consist of m = 10 pairs (yi, ti) where yi is the number of failures for ith pump, during ti observed

  • hours. The model assumes that:

yi ∼ Poiss(tiθi), conditionally independent for i = 1, . . . m, θi ∼ Gamma(α, r), conditionally i.i.d. for i = 1, . . . , m, r ∼ Gamma(σ, δ). where α, σ, δ are known hyperparameters. The numeric results correspond to the hyperparameter values used in the above cited papers: α = 1.802, σ = 0.01 and δ = 1.

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-19
SLIDE 19

Introduction Non-asymptotic bounds Confidence estimation Examples Hierarchical Poisson/Gamma model Hierarchical model of variance components

Hierarchical Poisson/Gamma model

Gibbs Sampler. In this example: x = (θ, r) = (θ1, . . . , θm, r) ∈ Rm × [0, ∞[. Gibbs Sampler updates cyclically r and θ using the following conditional distributions: r|θ, y ∼ Gamma

  • mα + σ, δ +
  • θi
  • θi|θ−i, r, y

∼ Gamma (yi + α, ti + r)

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-20
SLIDE 20

Introduction Non-asymptotic bounds Confidence estimation Examples Hierarchical Poisson/Gamma model Hierarchical model of variance components

Bounds based on drift/minorization

Rosenthal’s drift/minorization. Rosenthal (1995) constructed a small set J = {(θ, r) : 4 ≤ θi ≤ 9} which satisfies the one-step minorization condition and established a geometric drift towards J with V(θ, r) = 1 + ( θi − 6.5)2. The constants are the following: β = 0.14, λ = 0.46, K = 3.3. Remark: The choice of J and V is taylored for data/hyperparameter under consideration and probably close to optimum. Constants in our MSE bound: σas ≤ 171.6, C0 ≤ 27.5, C1 ≤ 547.7, C2 ≤ 676.1.

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-21
SLIDE 21

Introduction Non-asymptotic bounds Confidence estimation Examples Hierarchical Poisson/Gamma model Hierarchical model of variance components

Bounds based on drift/minorization

Rosenthal’s bound on the total variation distance: P(Xt ∈ ·) − π(·) ≤ (0.976)t + 6.2 · (0.951)t. This can be regarded as a bound on sufficient burn-in time t. A bound on the MSE: To get

  • E (¯

fn − Eπf)2 ≤ 171.6 √n

  • 1 + 27.5

n

  • + 1223.8

n . This can be regarded as a bound on sufficient simulation time n.

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-22
SLIDE 22

Introduction Non-asymptotic bounds Confidence estimation Examples Hierarchical Poisson/Gamma model Hierarchical model of variance components

Bounds based on drift/minorization

Rosenthal’s bound on the total variation distance: To get P(Xt ∈ ·) − π(·) ≤ 0.01 one needs t = 192 steps of GS. This can be regarded as a bound on sufficient burn-in time t. A bound on the MSE: To get

  • E (¯

fn − Eπf)2 ≤ 0.01 one needs n = 2.95 · 108 steps of GS. This can be regarded as a bound on sufficient simulation time n. A confidence bound for MA: To get P(|ˆ fm,n − Eπf| > 0.01) ≤ 0.01 one needs n = 2.47 · 109 steps

  • f GS, repeated m = 11 times.
  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-23
SLIDE 23

Introduction Non-asymptotic bounds Confidence estimation Examples Hierarchical Poisson/Gamma model Hierarchical model of variance components

Bounds for a perfect sampler

In fact, the GS in Poisson/Gamma model is uniformly ergodic. Minorization condition holds for the whole space with β = 1.64 · 10−8. If we use the bound Dπf ≤ 34.623 (from a drift condition!) then for a perfect sampling-based estimate ˇ fr we obtain that

  • E (ˇ

fr − Eπf)2 ≤ 0.01 requires r/β = 7.41014 steps of GS.

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-24
SLIDE 24

Introduction Non-asymptotic bounds Confidence estimation Examples Hierarchical Poisson/Gamma model Hierarchical model of variance components

Hierarchical model of variance components

The model examined by Rosenthal (1995) and Jones and Hobert (2004). yij ∼ N(θi, r −1) for i = 1, . . . k, j = 1, . . . m θi ∼ N(µ, q−1) for i = 1, . . . , k q ∼ Gamma(α, δ), p(r, µ) ∝ r −1. Remark: we consider the balanced model and an improper prior for r and µ.

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-25
SLIDE 25

Introduction Non-asymptotic bounds Confidence estimation Examples Hierarchical Poisson/Gamma model Hierarchical model of variance components

Hierarchical model of variance components

The conditionals used by a block Gibbs Sampler: r|θ, µ, q, y ∼ Gamma km 2 , V2(θ) + RSS 2

  • ,

q|θ, µ, r, y ∼ Gamma k 2 + α, V1(θ, µ) 2 + δ

  • ,

(θ, µ)|r, q, y ∼ N(Σ, ξ) where RSS =

k

  • i=1

m

  • j=1

(yij − ¯ yi)2, ESS =

k

  • i=1

(¯ y − ¯ yi)2, Σ−1 = (rm + q)I −q1 −q1T kq

  • ,

ξ = Σ [rm ¯ y1, . . . , rm ¯ yk, 0]T .

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-26
SLIDE 26

Introduction Non-asymptotic bounds Confidence estimation Examples Hierarchical Poisson/Gamma model Hierarchical model of variance components

Hierarchical model of variance components

V1(θ, µ) =

k

  • i=1

(θi − µ)2, V2(θ) =

k

  • i=1

m(θi − ¯ yi)2. Function VR(θ, µ) := A · V1(θ, µ) + B · V2(θ) satisfies the Rosenthal-type drift condition PVR(θ, µ) ≤ γVR(θ, µ) + b. with γ = max k(A + Bm) − A Bm(km − 2) , 1 2α + k − 2

  • ,

b = k(A + mB) − A m(km − 2) · RSS + max(A, Bm) · ESS + A · 2δ 2α + k − 2.

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-27
SLIDE 27

Introduction Non-asymptotic bounds Confidence estimation Examples Hierarchical Poisson/Gamma model Hierarchical model of variance components

Hierarchical model of variance components

Minorization condition is obtained exactly as in Jones and Hobert for J := {(θ, µ) : VR(θ, µ) ≤ d}. To apply our MSE bounds, we let V(θ, µ) := 1 + VR(θ, µ) and infer a Baxendale-type drift condition: PV(θ, µ) ≤

  • λV(θ, µ)

for (θ, µ) ∈ J; K for (θ, µ) ∈ J.

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-28
SLIDE 28

Introduction Non-asymptotic bounds Confidence estimation Examples Hierarchical Poisson/Gamma model Hierarchical model of variance components

Hierarchical model of variance components

The following data are taken from Jones and Hobert (2004): Simulated data i 1 2 3 4 5 ¯ yi −0.80247 −1.0014 −0.69090 −1.1413 −1.0125 m = 10, ¯ y = −0.92973, RSS = 32.990 Different prior specifications Hyperparameter setting α δ 1 2.5 1 2 0.1 0.1 3 0.01 0.01

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-29
SLIDE 29

Introduction Non-asymptotic bounds Confidence estimation Examples Hierarchical Poisson/Gamma model Hierarchical model of variance components

Hierarchical model of variance components

Total variation bounds for the block Gibbs sampler Hyperparameter Bound on setting γ β t TV 1 0.1827 8.322 x 10−3 7.875 x 103 0.00999 2 0.3125 6.807 x 10−4 9.267 x 104 0.00999 3 0.3311 7.67 x 10−4 7.501 x 104 0.00999 √ MSE nonasymptotic bounds for the block Gibbs sampler Hyperparameter Bound on setting λ K β n √ MSE 1 0.7241 1.991 8.82 x 10−2 4.9 x 106 0.01 2 0.836 2.215 2.89 x 10−2 5.98 x 107 0.01 3 0.79 2.31 4.8 x 10−3 2.5 x 108 0.01

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-30
SLIDE 30

Introduction Non-asymptotic bounds Confidence estimation Examples Hierarchical Poisson/Gamma model Hierarchical model of variance components

Hierarchical model of variance components

Remark: The choice of improper priors + some minor modifications of the bounds + fine tuning of the parameters A, B, d, . . . lead to some improvement of the TV bounds as compared to Jones and Hobert (2004). Comment (too vague to be a conclusion): Our focus is on the MSE bounds. They suggest that n (simulation time) should be ∼ 103 times as big as t (burn-in time). But both types of bounds are pessimistic.

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-31
SLIDE 31

Introduction Non-asymptotic bounds Confidence estimation Examples Hierarchical Poisson/Gamma model Hierarchical model of variance components

The end

Thank you.

  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC

slide-32
SLIDE 32

Introduction Non-asymptotic bounds Confidence estimation Examples Hierarchical Poisson/Gamma model Hierarchical model of variance components

Some papers with nonasymptotic bounds on MSE

WN, P . Pokarowski, Fixed precision MCMC Estimation by Median of Products

  • f Averages, J. Appl. Probab. 2009.
  • K. Łatuszy´

nski, WN, Rigorous confidence bounds for MCMC under a geometric drift condition, J. Complex. 2011.

  • K. Łatuszy´

nski, B. Miasojedow, WN, Nonasymptotic bounds on the estimation error for regenerative MCMC algorithms, MCQMC 2010 (to appear).

  • K. Łatuszy´

nski, B. Miasojedow, WN, Nonasymptotic bounds on the estimation error of MCMC algorithms, (submitted).

  • G. Fort, E. Moulines, Convergence of the Monte Carlo EM for curved

exponential families, Ann. Appl. Probab. 2003.

  • R. Douc, A. Guillin and E. Moulines, Bounds on regeneration times and limit

theorems for subgeometric Markov chains, Ann. Inst. H. Poincar´ e Probab.

  • Statist. 2008.
  • D. Rudolf, Explicit error bounds for lazy reversible Markov chain Monte Carlo,
  • J. Complex. 2009.
  • W. Niemiro, Toru´

n, Poland Examples of error bounds for MCMC