Introduction to the Read Paper Young Statisticians Section
Mark Girolami
Department of Statistical Science University College London
The Royal Statistical Society, Errol Street, London, October 13, 2010
Riemann manifold Langevin and Hamiltonian Monte Carlo methods
◮ Advancing MC methods via underlying geometry of fundamental objects
◮ Develop proposal mechanisms based on
  ◮ Stochastic diffusions on the Riemann manifold
  ◮ Deterministic mechanics on the Riemann manifold
◮ Focus on Hamiltonian Monte Carlo for the next 27 minutes
◮ Target density p(θ); introduce auxiliary variable p ∼ p(p) = N(p|0, M)
◮ Log-density L(θ) ≡ log p(θ), then
  H(θ, p) = −L(θ) + ½ log{(2π)^D |M|} + ½ pᵀM⁻¹p
◮ Interpreted as separable Hamiltonian in position & momentum variables
◮ Energy (approximately) conserving, volume preserving and reversible integrator
◮ Detailed balance satisfied by min{1, exp{−H(θ∗, p∗) + H(θ, p)}}
◮ The complete method to sample from the desired marginal p(θ) follows
◮ Integrator provides proposals for the p(θ|p) conditional
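The complete transition above can be sketched in a few lines. This is an illustrative implementation, not code from the talk; the function names, step size and trajectory length are assumptions chosen for clarity.

```python
import numpy as np

def hmc_step(theta, log_p, grad_log_p, eps=0.1, n_leapfrog=20, M=None, rng=None):
    """One HMC transition targeting p(theta) via the separable
    Hamiltonian H(theta, p) = -L(theta) + p^T M^{-1} p / 2 (+ const)."""
    rng = np.random.default_rng() if rng is None else rng
    D = theta.shape[0]
    M = np.eye(D) if M is None else M
    Minv = np.linalg.inv(M)
    # Gibbs step: resample the auxiliary momentum p ~ N(0, M)
    p = rng.multivariate_normal(np.zeros(D), M)
    H0 = -log_p(theta) + 0.5 * p @ Minv @ p
    th, pm = theta.copy(), p.copy()
    # Leapfrog integrator: volume preserving, reversible,
    # approximately energy conserving
    pm = pm + 0.5 * eps * grad_log_p(th)
    for _ in range(n_leapfrog - 1):
        th = th + eps * (Minv @ pm)
        pm = pm + eps * grad_log_p(th)
    th = th + eps * (Minv @ pm)
    pm = pm + 0.5 * eps * grad_log_p(th)
    H1 = -log_p(th) + 0.5 * pm @ Minv @ pm
    # Accept with min{1, exp(-H* + H)}: detailed balance holds
    if np.log(rng.uniform()) < H0 - H1:
        return th
    return theta
```

Iterating this step yields dependent draws whose marginal over θ is the target density.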
◮ Target density N(0, Σ), a bivariate Gaussian with correlation ρ
◮ For ρ large, e.g. 0.98, sampling from this distribution is challenging
◮ Overall Hamiltonian H(θ, p) = ½ θᵀΣ⁻¹θ + ½ pᵀM⁻¹p + const
[Figure: HMC trajectories in (θ1, θ2) and (p1, p2), and the Hamiltonian value at each integration step]
◮ Deterministic proposal for θ ensures greater efficiency over Metropolis
◮ Small fly in the ointment: tuning of the values of matrix M is essential for efficient sampling
◮ Diagonal elements of M reflect scale and off-diagonal elements capture correlation structure of the target
◮ Require knowledge of the target density to set M; this requires extensive pilot runs
◮ Common reason given for lack of HMC take-up in non-trivial applications
◮ Can this weakness be resolved?
◮ What is the distance between probabilities?
◮ Rao, 1945: distance between p(y; θ + δθ) and p(y; θ) follows as
  δθᵀ G(θ) δθ, with G(θ) = E_y{∇θ log p(y; θ) ∇θ log p(y; θ)ᵀ} the Fisher information
◮ Rao, 1945, noted that G(θ) is positive definite, so it defines a Riemann manifold structure on the parameter space
◮ Can this geometric structure also be employed in addressing problems in Monte Carlo simulation?
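As a concrete check of this distance (an illustration, not from the talk): for a Bernoulli model the Fisher information is G(θ) = 1/{θ(1 − θ)}, and the squared line element δθᵀG(θ)δθ agrees with twice the KL divergence to second order in δθ.

```python
import numpy as np

def fisher_info_bernoulli(theta):
    # G(theta) = E[(d/dtheta log p(y; theta))^2] = 1 / (theta (1 - theta))
    return 1.0 / (theta * (1.0 - theta))

def kl_bernoulli(a, b):
    # KL divergence between Bernoulli(a) and Bernoulli(b)
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

theta, delta = 0.3, 1e-3
rao_sq = delta * fisher_info_bernoulli(theta) * delta  # delta^T G(theta) delta
kl2 = 2.0 * kl_bernoulli(theta, theta + delta)         # ~ delta^T G(theta) delta
```

The two quantities agree to within the O(δθ³) remainder of the KL expansion.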
◮ Manifold defined by metric G(θ), hence Hamiltonian kinetic energy ½ θ̇ᵀG(θ)θ̇, giving
  H(θ, p) = −L(θ) + ½ log{(2π)^D |G(θ)|} + ½ pᵀG(θ)⁻¹p
◮ Hamiltonian defined on the Riemann manifold is non-separable
◮ Hamiltonian in HMC artificially imposes a position independent metric M
◮ Marginal density follows as required: ∫ exp{−H(θ, p)} dp ∝ p(θ)
◮ Complete sampler follows as a Gibbs scheme: draw p | θ ∼ N(0, G(θ)), then propose θ by simulating the Hamiltonian dynamics
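The non-separable Hamiltonian and the momentum Gibbs step can be sketched directly from the formulas above; this is an assumed illustration (function names and signatures are not from the talk).

```python
import numpy as np

def rmhmc_hamiltonian(theta, p, log_p, G):
    """Non-separable Hamiltonian on the manifold with metric G(theta):
    H = -L(theta) + 0.5 log{(2 pi)^D |G(theta)|} + 0.5 p^T G(theta)^{-1} p."""
    Gt = G(theta)
    D = theta.shape[0]
    _, logdet = np.linalg.slogdet(Gt)  # stable log-determinant
    return (-log_p(theta)
            + 0.5 * (D * np.log(2.0 * np.pi) + logdet)
            + 0.5 * p @ np.linalg.solve(Gt, p))

def sample_momentum(theta, G, rng):
    # Gibbs step p | theta ~ N(0, G(theta)); integrating exp(-H) over p
    # then recovers the desired marginal p(theta)
    return rng.multivariate_normal(np.zeros(theta.shape[0]), G(theta))
```

With a constant metric G(θ) = M this reduces exactly to the separable HMC Hamiltonian.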
◮ The dynamics of the non-separable Hamiltonian follow as
  dθi/dt = ∂H/∂pi = {G(θ)⁻¹p}i
  dpi/dt = −∂H/∂θi = ∂L/∂θi − ½ tr{G(θ)⁻¹ ∂G(θ)/∂θi} + ½ pᵀG(θ)⁻¹ ∂G(θ)/∂θi G(θ)⁻¹p
◮ Require a reversible, volume preserving integrator: the 2nd-order generalized leapfrog
  pτ+ε/2 = pτ − (ε/2) ∇θH(θτ, pτ+ε/2)
  θτ+ε = θτ + (ε/2) {∇pH(θτ, pτ+ε/2) + ∇pH(θτ+ε, pτ+ε/2)}
  pτ+ε = pτ+ε/2 − (ε/2) ∇θH(θτ+ε, pτ+ε/2)
◮ The implicit equations are solved for pτ+ε/2 and θτ+ε using fixed point iterations
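The generalized leapfrog step, with the implicit updates resolved by fixed point iteration, can be sketched as follows. This is an illustrative sketch, not the authors' code; the gradient callbacks and iteration count are assumptions.

```python
import numpy as np

def generalized_leapfrog(theta, p, grad_theta_H, grad_p_H, eps, n_fixed=6):
    """One step of the generalized (implicit) leapfrog: reversible and
    volume preserving for a non-separable Hamiltonian H(theta, p)."""
    # Implicit half-step in momentum:
    #   p' = p - (eps/2) dH/dtheta(theta, p'), solved by fixed point iteration
    p_half = p.copy()
    for _ in range(n_fixed):
        p_half = p - 0.5 * eps * grad_theta_H(theta, p_half)
    # Implicit full step in position:
    #   theta' = theta + (eps/2) [dH/dp(theta, p') + dH/dp(theta', p')]
    theta_new = theta.copy()
    for _ in range(n_fixed):
        theta_new = theta + 0.5 * eps * (grad_p_H(theta, p_half)
                                         + grad_p_H(theta_new, p_half))
    # Explicit half-step in momentum
    p_new = p_half - 0.5 * eps * grad_theta_H(theta_new, p_half)
    return theta_new, p_new
```

For a separable Hamiltonian the fixed point iterations converge immediately and the scheme reduces to the standard leapfrog.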
◮ Target density N(0, Σ) as before
◮ For ρ large, e.g. 0.98, sampling from this distribution is challenging
◮ Metric tensor defines a flat manifold: G(θ) = Σ⁻¹
◮ Overall Hamiltonian H(θ, p) = ½ θᵀΣ⁻¹θ + ½ pᵀΣp + const
◮ Only the standard Störmer-Verlet integrator is required, as the metric is constant
[Figure: manifold-based trajectories in (θ1, θ2) and (p1, p2), and the Hamiltonian value at each integration step]
[Figure: further sample trajectories in (θ1, θ2) and (p1, p2), and the Hamiltonian value at each integration step]
[Figure: autocorrelation functions (lags 0-100) and trace plots of x1 and x2 for each sampler]
◮ Number of points in cells on a 64 × 64 grid denoted by Y = {Yi,j}
◮ X = {Xi,j} ∼ GP, with E{x} = µ1 and Σ(i,j),(i′,j′) = σ² exp(−δ(i, i′, j, j′)/64β)
◮ The joint density is
  p(y, x) ∝ ∏i,j exp{yi,j xi,j − m exp(xi,j)} × exp{−(x − µ1)ᵀΣ⁻¹(x − µ1)/2}
◮ Latent field metric tensor defining the flat manifold is 4096 × 4096, with O(N³) cost
◮ Metropolis and HMC fail.... completely. MALA requires a transformation of the latent field
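Evaluating the log joint density of this log-Gaussian Cox model is straightforward; the sketch below is an assumed illustration on a small grid (the helper names and the distance function δ are not from the talk).

```python
import numpy as np

def lgcp_log_joint(y, x, mu, Sigma, m):
    """Unnormalized log joint of the log-Gaussian Cox model:
    sum_ij {y_ij x_ij - m exp(x_ij)} - (x - mu 1)^T Sigma^{-1} (x - mu 1) / 2."""
    r = x - mu
    return np.sum(y * x - m * np.exp(x)) - 0.5 * r @ np.linalg.solve(Sigma, r)

def grid_cov(n, sigma2, beta):
    # GP covariance on an n x n grid: sigma^2 exp(-d / (n beta)),
    # with d the Euclidean distance between cell centres (assumed delta)
    coords = np.array([(i, j) for i in range(n) for j in range(n)], dtype=float)
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    return sigma2 * np.exp(-d / (n * beta))
```

On the full 64 × 64 grid the same evaluation involves the 4096 × 4096 covariance, which is where the O(N³) cost of the latent field metric arises.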
[Figure: inferred latent fields on the 64 × 64 grid and trace plots comparing RM-HMC, mMALA, MALA (transient) and MALA (stationary)]
◮ HMC implicitly defines a flat manifold upon which the statistical model sits
◮ Exploiting the Riemannian structure of statistical models to devise MCMC methods
◮ No requirement for costly pilot runs and estimates of posterior covariance
◮ No requirement for costly tuning in the transient and stationary phases of the chain
◮ Assessed on complex high-dimensional latent variable models
◮ Theoretical analysis of the effect of integration error and the design of the metric are directions for further work
◮ Transition operators that DO NOT rely on an implicit integrator are desirable
◮ http://www.dcs.gla.ac.uk/inference/rmhmc
◮ Work funded by EPSRC Advanced Research Fellowship (Girolami)