Introduction to the Read Paper Young Statisticians Section
Mark Girolami
Department of Statistical Science University College London
The Royal Statistical Society, Errol Street, London, October 13, 2010
Riemann manifold Langevin and Hamiltonian Monte Carlo methods
◮ Advancing MC methods via underlying geometry of fundamental objects
◮ Develop proposal mechanisms based on
  ◮ Stochastic diffusions on the Riemann manifold
  ◮ Deterministic mechanics on the Riemann manifold
◮ Focus on Hamiltonian Monte Carlo for the next 27 minutes
◮ Target density p(θ); introduce auxiliary variable p ∼ p(p) = N(p|0, M)
◮ Log-density L(θ) ≡ log p(θ), then
  H(θ, p) = −L(θ) + ½ log{(2π)^D |M|} + ½ pᵀM⁻¹p
◮ Interpreted as separable Hamiltonian in position & momentum variables
◮ Energy (approximately) conserving, volume preserving and reversible integrator
◮ Detailed balance satisfied by min{1, exp{−H(θ∗, p∗) + H(θ, p)}}
◮ The complete method to sample from the desired marginal p(θ) follows
◮ Integrator provides proposals for the p(θ|p) conditional
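The complete transition above can be sketched in a few lines. This is an illustrative implementation, not code from the talk; the function names, step size and trajectory length are assumptions chosen for clarity.

```python
import numpy as np

def hmc_step(theta, log_p, grad_log_p, eps=0.1, n_leapfrog=20, M=None, rng=None):
    """One HMC transition targeting p(theta) via the separable
    Hamiltonian H(theta, p) = -L(theta) + p^T M^{-1} p / 2 (+ const)."""
    rng = np.random.default_rng() if rng is None else rng
    D = theta.shape[0]
    M = np.eye(D) if M is None else M
    Minv = np.linalg.inv(M)
    # Gibbs step: resample the auxiliary momentum p ~ N(0, M)
    p = rng.multivariate_normal(np.zeros(D), M)
    H0 = -log_p(theta) + 0.5 * p @ Minv @ p
    th, pm = theta.copy(), p.copy()
    # Leapfrog integrator: volume preserving, reversible,
    # approximately energy conserving
    pm = pm + 0.5 * eps * grad_log_p(th)
    for _ in range(n_leapfrog - 1):
        th = th + eps * (Minv @ pm)
        pm = pm + eps * grad_log_p(th)
    th = th + eps * (Minv @ pm)
    pm = pm + 0.5 * eps * grad_log_p(th)
    H1 = -log_p(th) + 0.5 * pm @ Minv @ pm
    # Accept with min{1, exp(-H* + H)}: detailed balance holds
    if np.log(rng.uniform()) < H0 - H1:
        return th
    return theta
```

Iterating this step yields dependent draws whose marginal over θ is the target density.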
◮ Target density N(0, Σ), a bivariate Gaussian with correlation ρ
◮ For ρ large, e.g. 0.98, sampling from this distribution is challenging
◮ Overall Hamiltonian H(θ, p) = ½ θᵀΣ⁻¹θ + ½ pᵀM⁻¹p + const
[Figure: HMC trajectories in (θ1, θ2) and (p1, p2), and the Hamiltonian value at each integration step]
◮ Deterministic proposal for θ ensures greater efficiency over Metropolis
◮ Small fly in the ointment: tuning of the values of matrix M is essential for efficient sampling
◮ Diagonal elements of M reflect scale and off-diagonal elements capture correlation structure of the target
◮ Require knowledge of the target density to set M; this requires extensive pilot runs
◮ Common reason given for lack of HMC take-up in non-trivial applications
◮ Can this weakness be resolved?
◮ What is the distance between probabilities?
◮ Rao, 1945: distance between p(y; θ + δθ) and p(y; θ) follows as
  δθᵀ G(θ) δθ, with G(θ) = E_y{∇θ log p(y; θ) ∇θ log p(y; θ)ᵀ} the Fisher information
◮ Rao, 1945, noted that G(θ) is positive definite, so it defines a Riemann manifold structure on the parameter space
◮ Can this geometric structure also be employed in addressing problems in Monte Carlo simulation?
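As a concrete check of this distance (an illustration, not from the talk): for a Bernoulli model the Fisher information is G(θ) = 1/{θ(1 − θ)}, and the squared line element δθᵀG(θ)δθ agrees with twice the KL divergence to second order in δθ.

```python
import numpy as np

def fisher_info_bernoulli(theta):
    # G(theta) = E[(d/dtheta log p(y; theta))^2] = 1 / (theta (1 - theta))
    return 1.0 / (theta * (1.0 - theta))

def kl_bernoulli(a, b):
    # KL divergence between Bernoulli(a) and Bernoulli(b)
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

theta, delta = 0.3, 1e-3
rao_sq = delta * fisher_info_bernoulli(theta) * delta  # delta^T G(theta) delta
kl2 = 2.0 * kl_bernoulli(theta, theta + delta)         # ~ delta^T G(theta) delta
```

The two quantities agree to within the O(δθ³) remainder of the KL expansion.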
◮ Manifold defined by metric G(θ), hence Hamiltonian kinetic energy ½ θ̇ᵀG(θ)θ̇, giving
  H(θ, p) = −L(θ) + ½ log{(2π)^D |G(θ)|} + ½ pᵀG(θ)⁻¹p
◮ Hamiltonian defined on the Riemann manifold is non-separable
◮ Hamiltonian in HMC artificially imposes a position independent metric M
◮ Marginal density follows as required: ∫ exp{−H(θ, p)} dp ∝ p(θ)
◮ Complete sampler follows as a Gibbs scheme: draw p | θ ∼ N(0, G(θ)), then propose θ by simulating the Hamiltonian dynamics
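The non-separable Hamiltonian and the momentum Gibbs step can be sketched directly from the formulas above; this is an assumed illustration (function names and signatures are not from the talk).

```python
import numpy as np

def rmhmc_hamiltonian(theta, p, log_p, G):
    """Non-separable Hamiltonian on the manifold with metric G(theta):
    H = -L(theta) + 0.5 log{(2 pi)^D |G(theta)|} + 0.5 p^T G(theta)^{-1} p."""
    Gt = G(theta)
    D = theta.shape[0]
    _, logdet = np.linalg.slogdet(Gt)  # stable log-determinant
    return (-log_p(theta)
            + 0.5 * (D * np.log(2.0 * np.pi) + logdet)
            + 0.5 * p @ np.linalg.solve(Gt, p))

def sample_momentum(theta, G, rng):
    # Gibbs step p | theta ~ N(0, G(theta)); integrating exp(-H) over p
    # then recovers the desired marginal p(theta)
    return rng.multivariate_normal(np.zeros(theta.shape[0]), G(theta))
```

With a constant metric G(θ) = M this reduces exactly to the separable HMC Hamiltonian.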
◮ The dynamics of the non-separable Hamiltonian follow as
  dθi/dt = ∂H/∂pi = {G(θ)⁻¹p}i
  dpi/dt = −∂H/∂θi = ∂L/∂θi − ½ tr{G(θ)⁻¹ ∂G(θ)/∂θi} + ½ pᵀG(θ)⁻¹ ∂G(θ)/∂θi G(θ)⁻¹p
◮ Require a reversible, volume preserving integrator: the 2nd-order generalized leapfrog
  pτ+ε/2 = pτ − (ε/2) ∇θH(θτ, pτ+ε/2)
  θτ+ε = θτ + (ε/2) {∇pH(θτ, pτ+ε/2) + ∇pH(θτ+ε, pτ+ε/2)}
  pτ+ε = pτ+ε/2 − (ε/2) ∇θH(θτ+ε, pτ+ε/2)
◮ The implicit equations are solved for pτ+ε/2 and θτ+ε using fixed point iterations
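The generalized leapfrog step, with the implicit updates resolved by fixed point iteration, can be sketched as follows. This is an illustrative sketch, not the authors' code; the gradient callbacks and iteration count are assumptions.

```python
import numpy as np

def generalized_leapfrog(theta, p, grad_theta_H, grad_p_H, eps, n_fixed=6):
    """One step of the generalized (implicit) leapfrog: reversible and
    volume preserving for a non-separable Hamiltonian H(theta, p)."""
    # Implicit half-step in momentum:
    #   p' = p - (eps/2) dH/dtheta(theta, p'), solved by fixed point iteration
    p_half = p.copy()
    for _ in range(n_fixed):
        p_half = p - 0.5 * eps * grad_theta_H(theta, p_half)
    # Implicit full step in position:
    #   theta' = theta + (eps/2) [dH/dp(theta, p') + dH/dp(theta', p')]
    theta_new = theta.copy()
    for _ in range(n_fixed):
        theta_new = theta + 0.5 * eps * (grad_p_H(theta, p_half)
                                         + grad_p_H(theta_new, p_half))
    # Explicit half-step in momentum
    p_new = p_half - 0.5 * eps * grad_theta_H(theta_new, p_half)
    return theta_new, p_new
```

For a separable Hamiltonian the fixed point iterations converge immediately and the scheme reduces to the standard leapfrog.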
◮ Target density N(0, Σ) as before
◮ For ρ large, e.g. 0.98, sampling from this distribution is challenging
◮ Metric tensor defines a flat manifold: G(θ) = Σ⁻¹
◮ Overall Hamiltonian H(θ, p) = ½ θᵀΣ⁻¹θ + ½ pᵀΣp + const
◮ Only the standard Störmer-Verlet integrator is required, as the metric is constant
[Figure: manifold-based trajectories in (θ1, θ2) and (p1, p2), and the Hamiltonian value at each integration step]
[Figure: further sample trajectories in (θ1, θ2) and (p1, p2), and the Hamiltonian value at each integration step]
[Figure: autocorrelation functions (lags 0-100) and trace plots of x1 and x2 for each sampler]
◮ Number of points in cells on a 64 × 64 grid denoted by Y = {Yi,j}
◮ X = {Xi,j} ∼ GP, with E{x} = µ1 and Σ(i,j),(i′,j′) = σ² exp(−δ(i, i′, j, j′)/64β)
◮ The joint density is
  p(y, x) ∝ ∏i,j exp{yi,j xi,j − m exp(xi,j)} × exp{−(x − µ1)ᵀΣ⁻¹(x − µ1)/2}
◮ Latent field metric tensor defining the flat manifold is 4096 × 4096, with O(N³) cost
◮ Metropolis and HMC fail.... completely. MALA requires a transformation of the latent field
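Evaluating the log joint density of this log-Gaussian Cox model is straightforward; the sketch below is an assumed illustration on a small grid (the helper names and the distance function δ are not from the talk).

```python
import numpy as np

def lgcp_log_joint(y, x, mu, Sigma, m):
    """Unnormalized log joint of the log-Gaussian Cox model:
    sum_ij {y_ij x_ij - m exp(x_ij)} - (x - mu 1)^T Sigma^{-1} (x - mu 1) / 2."""
    r = x - mu
    return np.sum(y * x - m * np.exp(x)) - 0.5 * r @ np.linalg.solve(Sigma, r)

def grid_cov(n, sigma2, beta):
    # GP covariance on an n x n grid: sigma^2 exp(-d / (n beta)),
    # with d the Euclidean distance between cell centres (assumed delta)
    coords = np.array([(i, j) for i in range(n) for j in range(n)], dtype=float)
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    return sigma2 * np.exp(-d / (n * beta))
```

On the full 64 × 64 grid the same evaluation involves the 4096 × 4096 covariance, which is where the O(N³) cost of the latent field metric arises.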
[Figure: inferred latent fields on the 64 × 64 grid and trace plots comparing RM-HMC, mMALA, MALA (transient) and MALA (stationary)]
◮ HMC implicitly defines a flat manifold upon which the statistical model sits
◮ Exploiting the Riemannian structure of statistical models to devise MCMC methods
◮ No requirement for costly pilot runs and estimates of posterior covariance
◮ No requirement for costly tuning in the transient and stationary phases of the chain
◮ Assessed on complex high-dimensional latent variable models
◮ Theoretical analysis of the effect of integration error and the design of the metric are directions for further work
◮ Transition operators that DO NOT rely on an implicit integrator are desirable
◮ http://www.dcs.gla.ac.uk/inference/rmhmc
◮ Work funded by EPSRC Advanced Research Fellowship (Girolami)