Recent, current, and future issues for large scale space-time and nonlinear INLA Finn Lindgren (finn.lindgren@ed.ac.uk)
Avignon 2018-11-08
Covariances for four reference points t + 2 s ,t + m s ,t M - - PowerPoint PPT Presentation
Recent, current, and future issues for large scale space-time and nonlinear INLA Finn Lindgren ( finn.lindgren@ed.ac.uk ) Avignon 2018-11-08 GMRFs based on SPDEs (Lindgren et al., 2011) GMRF representations of SPDEs can be constructed for
Avignon 2018-11-08
GMRF representations of SPDEs can be constructed for oscillating, anisotropic, non-stationary, non-separable spatio-temporal, and multivariate fields on manifolds.
GMRF representations of SPDEs can be constructed for oscillating, anisotropic, non-stationary, non-separable spatio-temporal, and multivariate fields on manifolds.
GMRF representations of SPDEs can be constructed for oscillating, anisotropic, non-stationary, non-separable spatio-temporal, and multivariate fields on manifolds.
∂t + κ2 s,t + ∇ · ms,t − ∇ · M s,t∇
∂t + κ2 s,t + ∇ · ms,t − ∇ · M s,t∇
On any sufficiently smooth manifold domain D,
holds, even if either ∇f or −∇ · ∇g are as generalised as white noise. For α = 2 in the Matérn SPDE,
j ψjxj
The covariance for the RHS of the SPDE is
by the definition of W. Matching the LHS and RHS distributions leads to the finite element approximation
EU Surface Temperatures for All Corners of Earth EUSTACE will give publicly available daily estimates of surface air temperature since 1850 across the globe for the first time by combining surface and satellite data using novel statistical techniques.
The iterated heat equation is a simple non-separable space-time SPDE family:
Fourier spectra are based on eigenfunctions eω(s) of −∆. On R2, −∆eω(s) = ω2eω(s), and eω are harmonic functions. On S2, −∆ek(s) = λkek(s) = k(k + 1)ek(s), and ek are spherical harmonics. The isotropic spectrum on S2 × R is
The finite element approximation has precision matrix structure
α+β+γ
i ⊗ M [s] i
even, e.g., if κ is spatially varying.
Observations of mean, max, and min. Model mean and range.
m
m
m
m
m
m
r
r
r
r
r
r
ǫ
ǫ
ǫ
Data sources
Conditional specifications, e.g.
m|T 1 m, Q0 m) ∼ N
m, Q0 m −1
The observational&calibration errors are modelled as three error components: independent (ǫ0), spatially correlated (ǫ1), and systematic (ǫ2), with distributions determined by the uncertainty information from WP1 E.g., yi = Tm(si, ti) + ǫ0(si, ti) + ǫ1(si, ti) + ǫ2(si, ti)
For station k at day ti
m = Tm(sk, ti) + Jk
j (ti)ek,j m + ǫk,i m ,
where Hk
j (t) are temporal step functions, ek,j m are latent bias variables, and ǫk,i m are
independent measurement and discretisation errors.
Observed daily Tmean and Trange for station FRW00034051
1955 1960 1965 −10 10 20
FRW00034051
time (year) Tmean 1955 1960 1965 5 10 15 20
FRW00034051
time (year) Trange
The quantile function (inverse cumulative distribution function) F −1
θ (p), p ∈ [0, 1], is
defined through a quantile blend of generalised Pareto distributions:
θ (p) =
2θ
1 2 log(2p),
θ (p) = −f − θ (1 − p) =
2θ
2 log(2(1 − p)),
θ (p) = θ0 + τ
θ3(p) + (1 + γ)f + θ4(p)
The parameters θ = (θ0, θ1 = log τ, θ2 = logit[(γ + 1)/2], θ3, θ4) control the median, spread/scale, skewness, and the left and right tail shape. This model is also known as the five parameter lambda model. A spatio-temporally dependent Gaussian field u(s, t) with expectation 0 and variance
θ(s,t)(Φ(u(s, t)),
where the parameters can vary with space and time.
After seasonal compensation:
5 10 15 20 25 30 35 0.00 0.05 0.10 0.15
RSM00025594 (BUHTA PROVIDENJA)
DTR (deg C) Density
10 15 20 25 30 35 5 10 20 30 POQ predicted DTR (deg C) DTR values (deg C)
10 15 20 25 30 35 5 10 20 30 Log−Normal predicted DTR (deg C) DTR values (deg C)
5 10 15 20 25 30 35 5 10 20 30 Gamma predicted DTR (deg C) DTR values (deg C) 5 10 15 0.00 0.10 0.20 0.30
SP000060040 (LANZAROTE/AEROPUERTO)
DTR (deg C) Density
10 15 5 10 15 POQ predicted DTR (deg C) DTR values (deg C)
10 15 5 10 15 Log−Normal predicted DTR (deg C) DTR values (deg C)
5 10 15 5 10 15 Gamma predicted DTR (deg C) DTR values (deg C)
For these stations, POQ does a slightly better job than a Gamma distribution.
After seasonal compensation:
10 20 30 40 50 0.00 0.04 0.08 0.12
ASN00005008 (MARDIE)
DTR (deg C) Density
20 30 40 50 10 20 30 40 50 POQ predicted DTR (deg C) DTR values (deg C)
20 30 40 50 10 20 30 40 50 Log−Normal predicted DTR (deg C) DTR values (deg C)
10 20 30 40 50 10 20 30 40 50 Gamma predicted DTR (deg C) DTR values (deg C) 10 20 30 40 0.00 0.04 0.08
ASN00023738 (MYPONGA)
DTR (deg C) Density
20 30 40 10 20 30 40 POQ predicted DTR (deg C) DTR values (deg C)
20 30 40 10 20 30 40 Log−Normal predicted DTR (deg C) DTR values (deg C)
20 30 40 10 20 30 40 Gamma predicted DTR (deg C) DTR values (deg C)
For these stations only POQ comes close to representing the distributions. Note: Some of the mixture-like distribution shapes may be an effect of unmodeled station inhomogeneities as well as temporal shift effects.
5 10 15 20 Time
5 10 15 20 −20 20 Time Tm 5 10 15 20 5 10 15 Time Tr
−60 −40 −20 20 1 2 3 4 5 10 15 20 25 30 1 2 3 4
February climatology
Feb
−60 −40 −20 20
Feb
1 2 3 4
Feb
5 10 15 20 25 30
Feb
1 2 3 4
February climatology
All Spatio-temporal latent random processes combined into x = (u, β, b), with joint expectation µx and precision Qx:
x )
(Prior)
y|x)
(Observations)
(Conditional posterior)
The conditional posterior distribution is
−1)
(Posterior)
−1A⊤Qy (y − Aµx)
All Spatio-temporal latent random processes combined into x = (u, β, b), with joint expectation µx and precision Qx:
x )
(Prior)
y|x)
(Observations)
(Conditional posterior)
For a non-linear h(Ax) with Jacobian J at x =
approx
−1)
(Approximate posterior)
−1
Problem: ∼ 1011 latent variables Solution: Iterative solvers
Let B⊤
k be a restriction matrix to subdomain Ωk, and let W k be a diagonal weight
K
k QBk)−1B⊤ k W kx
Let B⊤
c be a projection matrix to a coarse approximative model. Then a basic
multigrid step for Qx = b is
c QBc, rc = B⊤ c r0
5 10 15 20 4 3 2 1
Full multigrid sequence
Step Approximation level
The hierarchy of scales and preconditioning (x0 = Bx1 + fine scale variability):
Solving Qx|yx = b can be formulated using two solves with the upper (fine) block
By mapping the fine scale model onto the coarse basis used for the coarse model, we get an approximate (and sparse) Schur solve via
ignored
The block matrix can be interpreted as the precision of a bivariate field on a common, coarse spatio-temporal scale, and the same technique applied to this system, with
Also applies to the station data bias homogenisation coefficients.
Takahashi recursions compute S such that Sij = (Q−1)ij for all Qij = 0. Postprocessing of the (sparse) Cholesky factor.
Let x(j) be samples from a Gaussian posterior and let a⊤x be a linear combination
k)
J
Ω∗
k)
k)
k)
Ω∗
k) + 1
J
Ω∗
k) − E(a⊤x)
Efficient if aa⊤ sparsity matches S for each subdomain. Sidén et al (2018, JCGS): Iterated blockwise Takahashi
solves for
Parameter estimation: In the project, several ad hoc methods are used; Timeseries subsets used for diurnal range distributions and temporal correlation parameters. Local estimation of spatial dependence paramters blended into a full spacetime SPDE model
R-INLA A.data <- inla.spde.make.A(...) A.pred <- inla.spde.make.A(...) stack.data <- inla.stack(data=..., A=list(A.data, ...), effects=...) stack.pred <- inla.stack(data=..., A=list(A.pred, ...), effects=...) stack <- inla.stack(stack.data, stack.pred) formula <- y ~ ... + f(field, model=spde) result <- inla(...) ## Linear prediction: prediction <- result$summary.fitted.values[some.indices, "mean"]
components <- ~ ... + field(map=coordinates, model=spde) formula <- y ~ ... + field result <- bru(...) ## Non-linear prediction (via direct posterior sampling) prediction <- predict(..., ~ cos(field)) ## Extra: non-linear formulas and marked LGCP capabilities formula <- y ~ field1 * exp(field2) formula <- coordinates + size ~ field1 + dnorm(size, field2, sd=exp(theta), log=TRUE)
components <- myeffect(map = covariate(x, y), ...) + field(map = coordinates, model = spde) No user-side inla.stack or inla.spde.make.A calls needed!
formula <- y ~ myeffect + exp(field) formula <- coordinates ~ myeffect + exp(field) fit <- bru(components, like(family = "cp", formula = formula, ...), ...)
# Sample and compute nonlinear functionals: generate(fit, formula = ~ exp(myeffect + exp(field)), ...) # Sample and compute posterior summaries, # here a data level posterior probability function: predict( fit, formula = ~ dpois(0:200, sum(weight * exp(myeffect + exp(field)))), ...)
The linear system for the best estimate and for samples have structure Qx = b.
The choice of linearisation point is important.
and the next linearisation point the method is robustified
blockwise
weighting
10 20 30 40 50 0.8 1.0 1.2 1.4
eta1 eta2
Basic Newton, eta
10 20 30 40 50 0.8 1.0 1.2 1.4
eta1 eta2
Robust Newton, eta
10 20 30 40 50 0.8 1.0 1.2 1.4
eta1 eta2
Partially Robust Newton, eta
type
Linear Quadratic True
1 2 3 4 1.0 1.1 1.2 1.3 1.4 1.5
x1 x2
Newton, x
1 2 3 4 1.0 1.1 1.2 1.3 1.4 1.5
x1 x2
Robust Newton, x
1 2 3 4 1.0 1.1 1.2 1.3 1.4 1.5
x1 x2
Partial Robust Newton, x
block calculations.
Current work with Joaquín Martínez Minaya to convert Dirichlet likelihoods to independent Gaussians to "fool" INLA. Postprocess for full Laplace approximation.
c=1 yc = 1:
C
c
pseudo-observations.
5 10 0.0 0.1 0.2
β0 p(β0|y) R−jags R−INLA
Category 1
5 10 −0.3 −0.2 −0.1
β0 p(β0|y) R−jags R−INLA
Category 2
5 10 0.20 0.25 0.30 0.35 0.40
β0 p(β0|y) R−jags R−INLA
Category 3
5 10 0.00 0.05 0.10 0.15 0.20
β0 p(β0|y) R−jags R−INLA
Category 4
5 10 15 0.10 0.15 0.20 0.25
β1 p(β1|y) R−jags R−INLA
Category 1
5 10 15 0.25 0.30 0.35 0.40 0.45
β1 p(β1|y) R−jags R−INLA
Category 2
5 10 15 0.25 0.30 0.35 0.40
β1 p(β1|y) R−jags R−INLA
Category 3
5 10 15 −0.30 −0.25 −0.20 −0.15 −0.10
β1 p(β1|y) R−jags R−INLA
Category 4
block calculations.
Current work with Joaquín Martínez Minaya to convert Dirichlet likelihoods to independent Gaussians to "fool" INLA. Postprocess for full Laplace approximation.
Simpson and David Bolin)
models
complete INLA support; improve programmability of INLA features without breaking existing code
models! Missing piece: fast and accurate log-determinants
Minimise data reading:
space-time block is constructed
Notes:
Experiments show that approximate nested Schur complements (Fine-Coarse-Fine) leads to a robust and efficient preconditioner, but requires re-reading all the data multiple times (2 fine scale steps)
Not having to re-read data multiple times is faster; offsets the need for more iterations