
Complexity Results for MCMC derived from Quantitative Bounds

Jun Yang (joint work with Jeffrey S. Rosenthal)

Department of Statistical Sciences University of Toronto

SSC 2018, McGill University

MCMC Complexity Bounds (J. Yang) 1


Motivation

◮ Quantitative bounds for MCMC

e.g. Drift and Minorization, Rosenthal, JASA, 1995.

◮ Big data (high-dimensional setting)

“Large p and large n” or “large p and small n”.

◮ Convergence complexity of MCMC

e.g. “little is known for MCMC complexity” by Yang, Wainwright, Jordan, Ann. Stats., 2016.

◮ Directly translating existing quantitative bounds into complexity bounds is problematic

e.g. Rajaratnam and Sparks, arXiv, 2015: “We therefore hope that one consequence of our work will be to motivate the proposal and development of new ideas analogous to those of Rosenthal that are suitable for high-dimensional settings.”



A Realistic MCMC Model

Consider the following example:

Yi | θi ∼ N(θi, 1),  1 ≤ i ≤ n,
θi | µ, A ∼ N(µ, A),  1 ≤ i ≤ n,
µ ∼ flat prior on R,
A ∼ IG(a, b).

◮ n observed data points: (Y1, . . . , Yn);

◮ p = n + 2 coordinates in the state: x = (A, µ, θ1, . . . , θn);

◮ Posterior distribution: π(·) = L(A, µ, θ1, . . . , θn | Y1, . . . , Yn).

◮ A Gibbs sampler for this model was originally analyzed by Rosenthal in 1996.

◮ Directly translated complexity bound: Ω(exp(p)).
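To make the model concrete, here is a minimal simulation sketch (not from the slides) using numpy; the hyperparameter values a = 3, b = 2 and the fixed µ = 0 are illustrative assumptions, since the flat prior on µ is improper and cannot itself be sampled.

```python
import numpy as np

def simulate_data(n, a=3.0, b=2.0, rng=None):
    """Draw (Y_1, ..., Y_n) from the hierarchical model:
    A ~ IG(a, b), theta_i | mu, A ~ N(mu, A), Y_i | theta_i ~ N(theta_i, 1).
    mu is fixed at 0 for simulation (the flat prior is improper)."""
    rng = np.random.default_rng(rng)
    A = 1.0 / rng.gamma(shape=a, scale=1.0 / b)  # inverse-gamma draw via 1/Gamma
    mu = 0.0                                     # illustrative fixed value
    theta = rng.normal(loc=mu, scale=np.sqrt(A), size=n)
    Y = rng.normal(loc=theta, scale=1.0, size=n)
    return Y, theta, mu, A
```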



Tight Complexity Bound for the Gibbs Sampler

Consider the Gibbs sampler:

µ(1) ∼ N( θ̄(0), A(0)/n ),

θi(1) ∼ N( (µ(1) + Yi A(0)) / (1 + A(0)),  A(0) / (1 + A(0)) ),  1 ≤ i ≤ n,

A(1) ∼ IG( a + (n − 1)/2,  b + (1/2) Σ_{i=1}^n (θi(1) − θ̄(1))² ).

We can show using the new approach:

◮ Mixing time is O(1) if the initial states are chosen as

θ̄(0) = Ȳ,  A(0) = Σ_{i=1}^n (Yi − Ȳ)² / (n − 1) − 1.

◮ Mixing time is O(log p) if initial states are not “too bad”.
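The three conditional updates and the recommended initialization can be sketched as a small numpy implementation; this is my reading of the slides, not the authors' code. The hyperparameters a, b and iteration count are illustrative, and the clipping of A(0) away from zero is a practical safeguard I have added, since the slide's formula can go negative when the sample variance of Y is below 1.

```python
import numpy as np

def gibbs_step(theta, A, Y, a, b, rng):
    """One scan of the Gibbs sampler for (mu, theta_1..n, A)."""
    n = len(Y)
    # mu | theta, A  ~  N(theta_bar, A / n)
    mu = rng.normal(theta.mean(), np.sqrt(A / n))
    # theta_i | mu, A, Y_i  ~  N((mu + Y_i A) / (1 + A), A / (1 + A))
    theta = rng.normal((mu + Y * A) / (1 + A), np.sqrt(A / (1 + A)))
    # A | theta  ~  IG(a + (n - 1)/2, b + (1/2) * sum (theta_i - theta_bar)^2)
    shape = a + (n - 1) / 2
    rate = b + 0.5 * np.sum((theta - theta.mean()) ** 2)
    A = 1.0 / rng.gamma(shape, 1.0 / rate)  # inverse-gamma via 1/Gamma
    return mu, theta, A

def run_gibbs(Y, a=3.0, b=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(Y)
    # Recommended initial states from the slide: theta_bar(0) = Y_bar,
    # A(0) = sample variance of Y minus 1 (clipped positive -- my safeguard).
    theta = np.full(n, Y.mean())
    A = max(np.sum((Y - Y.mean()) ** 2) / (n - 1) - 1, 0.1)
    mu = Y.mean()
    for _ in range(iters):
        mu, theta, A = gibbs_step(theta, A, Y, a, b, rng)
    return mu, theta, A
```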



Drift and Minorization

Drift Condition

For some function f : X → R+, some 0 < λ < 1, and some b < ∞,

E[f(X1) | X0 = x] ≤ λ f(x) + b,  ∀x ∈ X.

The key to a good complexity bound: b and λ have small complexity order.
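As a worked illustration of the definition (my example, not from the slides): for the AR(1) chain X1 = ρX0 + σZ with f(x) = 1 + x², we have E[f(X1) | X0 = x] = 1 + ρ²x² + σ², so the drift condition holds with λ = ρ² and b = 1 + σ². A quick Monte Carlo check:

```python
import numpy as np

def drift_check(rho=0.5, sigma=1.0, xs=(0.0, 2.0, 10.0), m=200_000, seed=0):
    """Monte Carlo check of E[f(X1) | X0 = x] <= lam * f(x) + b
    for the AR(1) chain X1 = rho * X0 + sigma * Z with f(x) = 1 + x^2.
    Analytically E[f(X1) | X0 = x] = 1 + rho^2 x^2 + sigma^2, so
    lam = rho^2 and b = 1 + sigma^2 work."""
    rng = np.random.default_rng(seed)
    lam, b = rho ** 2, 1 + sigma ** 2
    f = lambda x: 1 + x ** 2
    ok = []
    for x in xs:
        x1 = rho * x + sigma * rng.normal(size=m)   # draws of X1 given X0 = x
        ok.append(f(x1).mean() <= lam * f(x) + b)
    return all(ok)
```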

Generalized Drift Condition (Y. and Rosenthal, 2017)

Let R′ ⊆ X be a large set; the function f(·) satisfies

E[f(X1) | X0 = x, X1 ∈ R′] ≤ E[f(X1) | X0 = x] ≤ λ f(x) + b,  ∀x ∈ R′.
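The restricted expectation in the generalized condition can be estimated the same way. Here is a sketch (my construction, assuming the same AR(1) chain with f(x) = 1 + x² and taking the large set R′ = [−L, L]):

```python
import numpy as np

def generalized_drift_check(rho=0.5, sigma=1.0, L=20.0,
                            xs=(0.0, 2.0, 10.0), m=200_000, seed=0):
    """Estimate E[f(X1) | X0 = x, X1 in R'] for R' = [-L, L] and compare it
    with lam * f(x) + b, where lam = rho^2 and b = 1 + sigma^2 bound the
    unconditional expectation for this AR(1) chain."""
    rng = np.random.default_rng(seed)
    lam, b = rho ** 2, 1 + sigma ** 2
    f = lambda x: 1 + x ** 2
    ok = []
    for x in xs:
        x1 = rho * x + sigma * rng.normal(size=m)
        inside = np.abs(x1) <= L   # condition on X1 landing in R'
        ok.append(f(x1[inside]).mean() <= lam * f(x) + b)
    return all(ok)
```

Restricting to R′ discards the largest values of f(X1) here, so the restricted expectation satisfies the same bound.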



Modified Drift and Minorization

New Quantitative Bound (Y. and Rosenthal, 2017)

ε is established by the associated minorization condition; α and Λ are functions of λ and b:

‖P^k(x0, ·) − π(·)‖ ≤ (1 − ε)^{rk} + (αΛ)^{rk} α^{−k} ( 1 + b/(1 − λ) + f(x0) ) + k π((R′)^c) + Σ_{i=1}^k P^i(x0, (R′)^c),  ∀ 0 < r < 1.

The large set R′ should be chosen to balance the two parts: the geometric convergence terms versus the probabilities of escaping R′.

The Gibbs sampler example

For large enough n, we have

‖P^k(x0, ·) − π(·)‖ ≤ C1 γ^k + C2 k(1 + k)/p + C3 k/√p,

which implies the mixing time is O(1).
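To see why a bound of this shape gives O(1) mixing, note that for fixed k the second and third terms vanish as p → ∞, while the first decays geometrically in k. A small numeric sketch with illustrative constants C1 = C2 = C3 = 1 and γ = 1/2 (my assumptions, not values from the paper):

```python
import math

def tv_bound(k, p, C1=1.0, C2=1.0, C3=1.0, gamma=0.5):
    """Evaluate the bound C1*gamma^k + C2*k(1+k)/p + C3*k/sqrt(p)."""
    return C1 * gamma ** k + C2 * k * (1 + k) / p + C3 * k / math.sqrt(p)

def mixing_time(p, eps=0.25, **kw):
    """Smallest k with bound < eps (the eps-mixing time), or None."""
    for k in range(1, 10_001):
        if tv_bound(k, p, **kw) < eps:
            return k
    return None
```

With these constants the ε = 0.25 mixing time stays at a small constant (k ≤ 3) across p = 10³ to 10⁶, rather than growing with the dimension.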



References

◮ Rosenthal, Minorization conditions and convergence rates for Markov chain Monte Carlo, JASA, 1995.

◮ Rajaratnam and Sparks, MCMC-based inference in the era of big data: A fundamental analysis of the convergence complexity of high-dimensional chains, arXiv:1508.00947, 2015.

◮ Yang, Wainwright, and Jordan, On the computational complexity of high-dimensional Bayesian variable selection, Ann. Statist., 2016.

◮ Yang and Rosenthal, Complexity results for MCMC derived from quantitative bounds, arXiv:1708.00829, 2017.
