

SLIDE 1

Recent, current, and future issues for large scale space-time and nonlinear INLA
Finn Lindgren (finn.lindgren@ed.ac.uk)

Avignon 2018-11-08

SLIDE 2

GMRFs based on SPDEs (Lindgren et al., 2011)

GMRF representations of SPDEs can be constructed for oscillating, anisotropic, non-stationary, non-separable spatio-temporal, and multivariate fields on manifolds.

(κ² − ∆)(τ x(s)) = W(s),  s ∈ R^d

SLIDE 3

GMRFs based on SPDEs (Lindgren et al., 2011)

GMRF representations of SPDEs can be constructed for oscillating, anisotropic, non-stationary, non-separable spatio-temporal, and multivariate fields on manifolds.

(κ² − ∆)(τ x(s)) = W(s),  s ∈ Ω

SLIDE 4

GMRFs based on SPDEs (Lindgren et al., 2011)

GMRF representations of SPDEs can be constructed for oscillating, anisotropic, non-stationary, non-separable spatio-temporal, and multivariate fields on manifolds.

[ ∂/∂t + κ²_{s,t} + ∇·m_{s,t} − ∇·M_{s,t}∇ ] (τ_{s,t} x(s, t)) = E(s, t),  (s, t) ∈ Ω × R

SLIDE 5

Covariances for four reference points

[ ∂/∂t + κ²_{s,t} + ∇·m_{s,t} − ∇·M_{s,t}∇ ] (τ_{s,t} x(s, t)) = E(s, t),  (s, t) ∈ Ω × R

SLIDE 6

Stochastic Green’s first identity

On any sufficiently smooth manifold domain D,

⟨f, −∇·∇g⟩_D = ⟨∇f, ∇g⟩_D − ⟨f, ∂_n g⟩_∂D

holds, even if either ∇f or −∇·∇g is as generalised as white noise. For α = 2 in the Matérn SPDE,

⟨ψ_i, (κ² − ∇·∇) Σ_j ψ_j x_j⟩_D = Σ_j [ κ² ⟨ψ_i, ψ_j⟩_D + ⟨∇ψ_i, ∇ψ_j⟩_D ] x_j = [(κ²C + G)x]_i

The covariance for the RHS of the SPDE is

Cov(⟨ψ_i, W⟩_D, ⟨ψ_j, W⟩_D) = ⟨ψ_i, ψ_j⟩_D = C_ij

by the definition of W. Matching the LHS and RHS distributions leads to the finite element approximation

x ∼ N(0, Q⁻¹),  Q = κ⁴C + 2κ²G + GC⁻¹G
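The precision construction above can be sketched numerically. This is a minimal Python illustration (not the deck's R code), assuming a uniform 1D mesh with piecewise linear basis functions; C is the mass matrix and G the stiffness matrix.

```python
import numpy as np

def fem_matrices_1d(n, h=1.0):
    """Mass matrix C and stiffness matrix G for piecewise linear
    basis functions on a uniform 1D mesh with n nodes, spacing h."""
    C = np.zeros((n, n))
    G = np.zeros((n, n))
    for e in range(n - 1):  # assemble element by element
        Ce = (h / 6.0) * np.array([[2.0, 1.0], [1.0, 2.0]])
        Ge = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
        idx = [e, e + 1]
        C[np.ix_(idx, idx)] += Ce
        G[np.ix_(idx, idx)] += Ge
    return C, G

def matern_precision(C, G, kappa):
    """GMRF precision for the alpha = 2 Matern SPDE:
    Q = kappa^4 C + 2 kappa^2 G + G C^{-1} G."""
    # In practice C is mass-lumped so C^{-1} stays diagonal and Q stays sparse;
    # the dense inverse here is only for illustration.
    Cinv = np.linalg.inv(C)
    return kappa**4 * C + 2 * kappa**2 * G + G @ Cinv @ G

C, G = fem_matrices_1d(50, h=0.1)
Q = matern_precision(C, G, kappa=2.0)
```

A valid precision matrix must be symmetric positive definite, which this construction guarantees by design.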

SLIDE 7

EUSTACE

EU Surface Temperatures for All Corners of Earth: EUSTACE will give publicly available daily estimates of surface air temperature since 1850 across the globe for the first time, by combining surface and satellite data using novel statistical techniques.

SLIDE 8

Matérn driven heat equation on the sphere

The iterated heat equation is a simple non-separable space-time SPDE family:

(κ² − ∆)^{γ/2} [ φ ∂/∂t + (κ² − ∆)^{α/2} ]^β x(s, t) = W(s, t)/τ

Fourier spectra are based on eigenfunctions e_ω(s) of −∆. On R², −∆e_ω(s) = ω²e_ω(s), and the e_ω are harmonic functions. On S², −∆e_k(s) = λ_k e_k(s) = k(k + 1)e_k(s), and the e_k are spherical harmonics. The isotropic spectrum on S² × R is

R(k, ω) ∝ (2k + 1) / ( τ² (κ² + λ_k)^γ [ φ²ω² + (κ² + λ_k)^α ]^β )

The finite element approximation has precision matrix structure

Q = Σ_{i=0}^{α+β+γ} M[t]_i ⊗ M[s]_i

even, e.g., if κ is spatially varying.
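The non-separability of the spectrum above is easy to check numerically: for a separable covariance the ratio R(k, ω₁)/R(k, ω₂) would not depend on k. A small Python sketch (all parameter values here are illustrative, not from the slides):

```python
import numpy as np

def heat_spde_spectrum(k, omega, kappa=1.0, tau=1.0, phi=1.0,
                       alpha=1.0, beta=1.0, gamma=1.0):
    """Isotropic power spectrum on S^2 x R for the iterated heat equation
    SPDE; k is the spherical harmonic degree, omega the temporal frequency."""
    lam = k * (k + 1.0)  # Laplacian eigenvalue on the sphere
    return (2 * k + 1) / (tau**2 * (kappa**2 + lam)**gamma
                          * (phi**2 * omega**2 + (kappa**2 + lam)**alpha)**beta)

# Non-separable case: the frequency ratio depends on the spatial degree k.
r5 = heat_spde_spectrum(5, 1.0) / heat_spde_spectrum(5, 2.0)
r20 = heat_spde_spectrum(20, 1.0) / heat_spde_spectrum(20, 2.0)

# Control: with alpha = 0 the temporal factor no longer depends on k,
# so the spectrum factorises and the ratio is constant in k.
s5 = heat_spde_spectrum(5, 1.0, alpha=0.0) / heat_spde_spectrum(5, 2.0, alpha=0.0)
s20 = heat_spde_spectrum(20, 1.0, alpha=0.0) / heat_spde_spectrum(20, 2.0, alpha=0.0)
```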

SLIDE 9

Partial hierarchical representation

Observations of mean, max, and min. Model mean and range.

[Diagram: hierarchical graph linking data sources y to temporal-scale components T0m, T1m, T2m and covariate effect βm for the mean, T0r, T1r, T2r and βr for the range, and error components ǫ0, ǫ1, ǫ2, each with an associated precision matrix Q.]

Conditional specifications, e.g.

(T0m | T1m, Q0m) ∼ N(T1m, (Q0m)⁻¹)

SLIDE 10

Observation models

Common satellite derived data error model framework

The observational and calibration errors are modelled as three error components: independent (ǫ0), spatially correlated (ǫ1), and systematic (ǫ2), with distributions determined by the uncertainty information from WP1. E.g.,

y_i = Tm(s_i, t_i) + ǫ0(s_i, t_i) + ǫ1(s_i, t_i) + ǫ2(s_i, t_i)

Station homogenisation

For station k at day t_i,

y^{k,i}_m = Tm(s_k, t_i) + Σ_{j=1}^{J_k} H^k_j(t_i) e^{k,j}_m + ǫ^{k,i}_m,

where the H^k_j(t) are temporal step functions, the e^{k,j}_m are latent bias variables, and the ǫ^{k,i}_m are independent measurement and discretisation errors.

SLIDE 11

Observed data

Observed daily Tmean and Trange for station FRW00034051

[Figure: observed daily Tmean and Trange for station FRW00034051, 1955-1965.]

SLIDE 12

Power tail quantile (POQ) model

The quantile function (inverse cumulative distribution function) F⁻¹_θ(p), p ∈ [0, 1], is defined through a quantile blend of generalised Pareto distributions:

f⁻_θ(p) = [1 − (2p)^{−θ}] / (2θ) for θ ≠ 0, and (1/2) log(2p) for θ = 0,

f⁺_θ(p) = −f⁻_θ(1 − p) = [(2(1 − p))^{−θ} − 1] / (2θ) for θ ≠ 0, and −(1/2) log(2(1 − p)) for θ = 0,

F⁻¹_θ(p) = θ₀ + (τ/2) [ (1 − γ) f⁻_{θ₃}(p) + (1 + γ) f⁺_{θ₄}(p) ].

The parameters θ = (θ₀, θ₁ = log τ, θ₂ = logit[(γ + 1)/2], θ₃, θ₄) control the median, spread/scale, skewness, and the left and right tail shapes. This model is also known as the five-parameter lambda model. A spatio-temporally dependent Gaussian field u(s, t) with expectation 0 and variance 1 can be transformed into a POQ field by

x(s, t) = F⁻¹_{θ(s,t)}(Φ(u(s, t))),

where the parameters can vary with space and time.
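A direct Python transcription of the POQ quantile blend. Note the 1/(2θ) scaling is the reconstruction used above (its θ → 0 limit reproduces the log branches); the parameter values in the example are arbitrary:

```python
import numpy as np
from math import erf, sqrt

def f_minus(p, theta):
    """Left tail generator; theta = 0 is the log limit of the GPD form."""
    if theta == 0.0:
        return 0.5 * np.log(2 * p)
    return (1.0 - (2 * p) ** (-theta)) / (2 * theta)

def f_plus(p, theta):
    """Right tail generator, defined by reflection: -f_minus(1 - p)."""
    return -f_minus(1.0 - p, theta)

def poq_quantile(p, theta0, tau, gamma, theta3, theta4):
    """Quantile function F^{-1}_theta(p) of the POQ / five-parameter lambda model."""
    return theta0 + 0.5 * tau * ((1 - gamma) * f_minus(p, theta3)
                                 + (1 + gamma) * f_plus(p, theta4))

def gauss_to_poq(u, theta0, tau, gamma, theta3, theta4):
    """Transform a standard Gaussian value u via x = F^{-1}_theta(Phi(u))."""
    p = 0.5 * (1.0 + erf(u / sqrt(2.0)))  # standard normal CDF Phi(u)
    return poq_quantile(p, theta0, tau, gamma, theta3, theta4)

# The median (p = 0.5) equals theta0, since both tail generators vanish there.
median = poq_quantile(0.5, theta0=3.0, tau=2.0, gamma=0.2, theta3=0.1, theta4=0.3)
```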

SLIDE 13

Diurnal range distributions

After seasonal compensation:

[Figure: DTR density histograms and predicted-vs-observed quantile plots for stations RSM00025594 (BUHTA PROVIDENJA) and SP000060040 (LANZAROTE/AEROPUERTO), comparing POQ, Log-Normal, and Gamma models.]

For these stations, POQ does a slightly better job than a Gamma distribution.

SLIDE 14

Diurnal range distributions; quantile model

After seasonal compensation:

[Figure: DTR density histograms and predicted-vs-observed quantile plots for stations ASN00005008 (MARDIE) and ASN00023738 (MYPONGA), comparing POQ, Log-Normal, and Gamma models.]

For these stations, only the POQ model comes close to representing the distributions. Note: some of the mixture-like distribution shapes may be an effect of unmodelled station inhomogeneities as well as temporal shift effects.

SLIDE 15

Multiscale model component samples

[Figure: sample paths of the multiscale model components over time.]

SLIDE 16

Combined model samples for Tm and Tr

[Figure: combined model sample paths for Tm and Tr over time.]

SLIDE 17

Median & scale for daily means and ranges

[Maps: February climatology of the median and scale for daily means and ranges.]

SLIDE 18

Estimates of median & scale for Tm and Tr

[Maps: February climatology estimates of the median and scale for Tm and Tr.]

SLIDE 19

Linearised inference

All spatio-temporal latent random processes are combined into x = (u, β, b), with joint expectation µ_x and precision Q_x:

(x | θ) ∼ N(µ_x, Q_x⁻¹)  (Prior)

(y | x, θ) ∼ N(Ax, Q_{y|x}⁻¹)  (Observations)

p(x | y, θ) ∝ p(x | θ) p(y | x, θ)  (Conditional posterior)

Linear Gaussian observations

The conditional posterior distribution is

(x | y, θ) ∼ N(µ̃, Q̃⁻¹)  (Posterior)

Q̃ = Q_x + A⊤Q_{y|x}A
µ̃ = µ_x + Q̃⁻¹A⊤Q_{y|x}(y − Aµ_x)
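The linear Gaussian update above in a small dense Python sketch (sizes and values invented; in the applications on these slides Q is huge and sparse, so the solve would use sparse Cholesky or PCG instead):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 8
mu_x = np.zeros(n)
Q_x = np.eye(n) * 2.0               # prior precision
A = rng.standard_normal((m, n))     # observation matrix
Q_yx = np.eye(m) * 4.0              # observation precision Q_{y|x}
y = rng.standard_normal(m)

# Posterior precision and mean, exactly as in the slide formulas.
Q_post = Q_x + A.T @ Q_yx @ A
mu_post = mu_x + np.linalg.solve(Q_post, A.T @ Q_yx @ (y - A @ mu_x))
```

A useful sanity check: the posterior mean satisfies the normal equations Q̃µ̃ = Q_x µ_x + A⊤Q_{y|x}y.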

SLIDE 20

Linearised inference

All spatio-temporal latent random processes are combined into x = (u, β, b), with joint expectation µ_x and precision Q_x:

(x | θ) ∼ N(µ_x, Q_x⁻¹)  (Prior)

(y | x, θ) ∼ N(h(Ax), Q_{y|x}⁻¹)  (Observations)

p(x | y, θ) ∝ p(x | θ) p(y | x, θ)  (Conditional posterior)

Non-linear and/or non-Gaussian observations

For a non-linear h(Ax) with Jacobian J at x = µ̃, iterate:

(x | y, θ) ∼ (approx) N(µ̃, Q̃⁻¹)  (Approximate posterior)

Q̃ = Q_x + J⊤Q_{y|x}J
µ̃′ = µ̃ + a Q̃⁻¹ { J⊤Q_{y|x}[y − h(Aµ̃)] − Q_x(µ̃ − µ_x) }

for some step size a > 0 chosen by line search.

Problem: ∼ 10¹¹ latent variables. Solution: iterative solvers.
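The iteration above, sketched in Python for a toy model with an elementwise exponential h and a Gaussian prior (all sizes, values, and the specific h are invented for illustration). The inner loop is exactly the slide's update: Gauss-Newton step with a backtracking line search on a:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 20
A = rng.standard_normal((m, n)) * 0.3
x_true = np.array([0.5, -0.2, 0.1])
Q_x = np.eye(n)                       # prior precision
Q_yx = np.eye(m) * 25.0               # observation precision
y = np.exp(A @ x_true) + 0.2 * rng.standard_normal(m)
mu_x = np.zeros(n)

def neg_log_post(v):
    """Negative log conditional posterior (up to a constant)."""
    r = y - np.exp(A @ v)
    return 0.5 * r @ Q_yx @ r + 0.5 * (v - mu_x) @ Q_x @ (v - mu_x)

mu = mu_x.copy()
for _ in range(30):
    h = np.exp(A @ mu)
    J = A * h[:, None]                # Jacobian of h(Ax) at x = mu
    Q_tilde = Q_x + J.T @ Q_yx @ J
    grad = J.T @ Q_yx @ (y - h) - Q_x @ (mu - mu_x)
    step = np.linalg.solve(Q_tilde, grad)
    a = 1.0                           # backtracking line search on the step size
    while a > 1e-8 and neg_log_post(mu + a * step) > neg_log_post(mu):
        a *= 0.5
    mu = mu + a * step

# At convergence the posterior-mode gradient should vanish.
h = np.exp(A @ mu)
J = A * h[:, None]
grad_norm = np.linalg.norm(J.T @ Q_yx @ (y - h) - Q_x @ (mu - mu_x))
```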

SLIDE 21

Triangulations for all corners of Earth

SLIDE 22

Triangulations for all corners of Earth

SLIDE 23

Domain decomposition and multigrid

Overlapping domain decomposition

Let B_k⊤ be a restriction matrix to subdomain Ω_k, and let W_k be a diagonal weight matrix. Then an additive Schwarz preconditioner is

M⁻¹x = Σ_{k=1}^{K} W_k B_k (B_k⊤ Q B_k)⁻¹ B_k⊤ W_k x
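A Python sketch of the additive Schwarz formula above, on a 1D Laplacian with two overlapping index blocks and equal-weight averaging on the overlap (the test matrix and block layout are invented):

```python
import numpy as np

n = 40
Q = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # SPD test matrix

blocks = [np.arange(0, 25), np.arange(15, 40)]        # overlapping subdomains
counts = np.zeros(n)
for idx in blocks:
    counts[idx] += 1
weights = [1.0 / counts[idx] for idx in blocks]       # diagonal of W_k

def apply_M_inv(x):
    """M^{-1} x = sum_k W_k B_k (B_k^T Q B_k)^{-1} B_k^T W_k x."""
    out = np.zeros_like(x)
    for idx, w in zip(blocks, weights):
        out[idx] += w * np.linalg.solve(Q[np.ix_(idx, idx)], w * x[idx])
    return out

# For a vector supported well inside one subdomain (away from all block
# boundaries and overlaps), the overlapping block solve is exact:
e = np.zeros(n)
e[7] = 1.0
check = apply_M_inv(Q @ e)
```

The restriction/prolongation pair B_k⊤/B_k is realised here by fancy indexing rather than explicit 0/1 matrices, which is how such preconditioners are usually implemented.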

Multigrid

Let B_c⊤ be a projection matrix to a coarse approximative model. Then a basic multigrid step for Qx = b is:

1. Apply a high frequency preconditioner to get x₀, and let r₀ = b − Qx₀
2. Project the problem to the coarser model: Q_c = B_c⊤QB_c, r_c = B_c⊤r₀
3. Apply multigrid to Q_c x_c = r_c
4. Update the solution: x₁ = x₀ + B_c x_c
5. Apply a high frequency preconditioner to get x₂
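The five steps above can be sketched as a two-grid cycle in Python, using damped Jacobi sweeps as the "high frequency preconditioner" and linear interpolation as B_c (a direct coarse solve stands in for the recursive step 3; the problem itself is an invented 1D Laplacian):

```python
import numpy as np

n = 33                                       # fine grid (odd, so the coarse grid nests)
Q = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

# Prolongation B_c: coarse node j maps to fine node 2j, halves to neighbours.
nc = (n + 1) // 2
Bc = np.zeros((n, nc))
for j in range(nc):
    Bc[2 * j, j] = 1.0
    if 2 * j - 1 >= 0:
        Bc[2 * j - 1, j] += 0.5
    if 2 * j + 1 < n:
        Bc[2 * j + 1, j] += 0.5

def smooth(x, b, sweeps=3, omega=2.0 / 3.0):
    """Damped Jacobi: removes the high frequency error components."""
    d = np.diag(Q)
    for _ in range(sweeps):
        x = x + omega * (b - Q @ x) / d
    return x

def twogrid(b, x):
    x0 = smooth(x, b)                        # 1. pre-smooth
    r0 = b - Q @ x0
    Qc, rc = Bc.T @ Q @ Bc, Bc.T @ r0        # 2. restrict to the coarse model
    xc = np.linalg.solve(Qc, rc)             # 3. coarse solve (recursion base)
    x1 = x0 + Bc @ xc                        # 4. coarse grid correction
    return smooth(x1, b)                     # 5. post-smooth

b = np.random.default_rng(3).standard_normal(n)
x = np.zeros(n)
for _ in range(10):
    x = twogrid(b, x)
rel_res = np.linalg.norm(b - Q @ x) / np.linalg.norm(b)
```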

SLIDE 24

Full multigrid

[Figure: full multigrid sequence, showing the approximation level (1-4) visited at each step.]

SLIDE 25

The hierarchy of scales and preconditioning (x0 = Bx1 + fine scale variability):

Multiscale Schur complement approximation

Solving Q_{x|y}x = b can be formulated using two solves with the upper (fine) block Q0 + A⊤QǫA, and one solve with the Schur complement

Q1 + B⊤Q0B − B⊤Q0 (Q0 + A⊤QǫA)⁻¹ Q0B

By mapping the fine scale model onto the coarse basis used for the coarse model, we get an approximate (and sparse) Schur solve via the block system

⎡ QB + B⊤A⊤QǫAB   −QB ⎤
⎣ −QB             Q1 + QB ⎦

solved for x1 with right-hand side b, where QB = B⊤Q0B and the remaining (dense) correction term is ignored.

The block matrix can be interpreted as the precision of a bivariate field on a common, coarse spatio-temporal scale, and the same technique can be applied recursively to this system, with x_{1,1} = B_{1|2} x_{1,2} + finer scale variability.

Also applies to the station data bias homogenisation coefficients.

SLIDE 26

Variance calculations

Sparse partial inverse

Takahashi recursions compute S such that S_ij = (Q⁻¹)_ij for all (i, j) with Q_ij ≠ 0, by postprocessing the (sparse) Cholesky factor.

Basic Rao-Blackwellisation of sample estimators

Let x⁽ʲ⁾ be samples from a Gaussian posterior and let a⊤x be a linear combination of interest. Then, for any subdomain Ω_k ⊂ Ω,

E(a⊤x) = E[ E(a⊤x | x_{Ω*_k}) ] ≈ (1/J) Σ_{j=1}^{J} E(a⊤x | x⁽ʲ⁾_{Ω*_k})

Var(a⊤x) = E[ Var(a⊤x | x_{Ω*_k}) ] + Var[ E(a⊤x | x_{Ω*_k}) ]
         ≈ Var(a⊤x | x⁽ʲ⁾_{Ω*_k}) + (1/J) Σ_{j=1}^{J} [ E(a⊤x | x⁽ʲ⁾_{Ω*_k}) − Ê(a⊤x) ]²

This is efficient if the aa⊤ sparsity matches S for each subdomain. Sidén et al. (2018, JCGS): iterated blockwise Takahashi recursions.
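The Rao-Blackwellised estimators above, sketched in Python for a small dense Gaussian (dimensions, seed, and the conditioning subset are invented). The conditional variance is the same for every sample in the Gaussian case, so sampling noise only enters through the conditional means:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
Aq = rng.standard_normal((n, n))
Q = Aq @ Aq.T + n * np.eye(n)            # SPD precision
Sigma = np.linalg.inv(Q)
mu = np.zeros(n)
a = rng.standard_normal(n)

k = [0, 1, 2]                            # conditioning subdomain indices
rest = [3, 4, 5]
# Conditional of x_rest | x_k: mean gain @ x_k (mu = 0), reduced covariance.
S_kk = Sigma[np.ix_(k, k)]
S_rk = Sigma[np.ix_(rest, k)]
gain = S_rk @ np.linalg.inv(S_kk)
cond_cov = Sigma[np.ix_(rest, rest)] - gain @ Sigma[np.ix_(k, rest)]

J = 4000
samples = rng.multivariate_normal(mu, Sigma, size=J)
a_k, a_r = a[k], a[rest]
# E(a^T x | x_k^{(j)}) for each sample j:
cond_means = samples[:, k] @ a_k + (samples[:, k] @ gain.T) @ a_r
# Var(a^T x | x_k) does not depend on the conditioning value:
cond_var = a_r @ cond_cov @ a_r
rb_mean = cond_means.mean()
rb_var = cond_var + cond_means.var()
exact_var = a @ Sigma @ a
```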

SLIDE 27

Method overview

◮ Hierarchical timescale combination of space-time random fields
◮ Preprocessing to estimate model parameters and non-Gaussianity
◮ Iterated linearisation in approximate Newton optimisation
◮ Distributed preconditioned conjugate gradient solves
◮ Information is passed between the scales
◮ Within each scale, approximate multigrid solves (not implemented)
◮ Overlapping space-time domain decomposition within each multigrid level
◮ Direct Monte Carlo sampling: add suitable randomness to the RHS of the Q_{x|y} solves for µ̃
◮ Rao-Blackwellised variance estimation

Parameter estimation: in the project, several ad hoc methods are used. Time-series subsets are used for diurnal range distributions and temporal correlation parameters, and local estimates of the spatial dependence parameters are blended into a full space-time SPDE model.

SLIDE 28

inlabru, a friendlier INLA interface

R-INLA

A.data <- inla.spde.make.A(...)
A.pred <- inla.spde.make.A(...)
stack.data <- inla.stack(data=..., A=list(A.data, ...), effects=...)
stack.pred <- inla.stack(data=..., A=list(A.pred, ...), effects=...)
stack <- inla.stack(stack.data, stack.pred)
formula <- y ~ ... + f(field, model=spde)
result <- inla(...)
## Linear prediction:
prediction <- result$summary.fitted.values[some.indices, "mean"]

http://inlabru.org

components <- ~ ... + field(map=coordinates, model=spde)
formula <- y ~ ... + field
result <- bru(...)
## Non-linear prediction (via direct posterior sampling):
prediction <- predict(..., ~ cos(field))
## Extra: non-linear formulas and marked LGCP capabilities
formula <- y ~ field1 * exp(field2)
formula <- coordinates + size ~ field1 + dnorm(size, field2, sd=exp(theta), log=TRUE)

SLIDE 29

inlabru features (with paraphrased code)

◮ Automated model structure mapping, sp spatial objects

components <- myeffect(map = covariate(x, y), ...) +
  field(map = coordinates, model = spde)

No user-side inla.stack or inla.spde.make.A calls needed!

◮ Nonlinear predictors, advanced likelihood wrappers, iterated INLA calls

formula <- y ~ myeffect + exp(field)
formula <- coordinates ~ myeffect + exp(field)
fit <- bru(components, like(family = "cp", formula = formula, ...), ...)

◮ Posterior sampling and prediction of (nearly) arbitrary expressions

# Sample and compute nonlinear functionals:
generate(fit, formula = ~ exp(myeffect + exp(field)), ...)
# Sample and compute posterior summaries,
# here a data level posterior probability function:
predict(fit,
        formula = ~ dpois(0:200, sum(weight * exp(myeffect + exp(field)))),
        ...)

SLIDE 30

Optimisation algorithms

Best estimate, with Newton iteration

1. Pick a linearisation point x₀
2. Set up the RHS from the gradient of the minimisation problem
3. Find the search direction d_k with a linear solve
4. Find a new linearisation point x_{k+1} = x_k + α_k d_k
5. Repeat from 2

Linear solve, with PCG iteration

The linear systems for the best estimate and for samples have the structure Qx = b.

1. Start at some x₀
2. Compute the residual r_k = b − Qx_k
3. Apply the preconditioner: d̃_k = M⁻¹r_k
4. Scale and rotate d̃_k to a conjugate search direction d_k
5. Find the new point x_{k+1} = x_k + α_k d_k
6. Repeat from 2
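The PCG steps above in a compact Python sketch, with a Jacobi (diagonal) preconditioner standing in for M and an invented 1D Laplacian as the test system:

```python
import numpy as np

def pcg(Q, b, M_inv_diag, tol=1e-10, maxit=500):
    """Preconditioned conjugate gradients for SPD Q with diagonal M^{-1}."""
    x = np.zeros_like(b)                 # 1. start at x0 = 0
    r = b - Q @ x                        # 2. residual
    z = M_inv_diag * r                   # 3. preconditioned residual
    d = z.copy()                         # 4. initial search direction
    rz = r @ z
    for _ in range(maxit):
        Qd = Q @ d
        alpha = rz / (d @ Qd)
        x = x + alpha * d                # 5. new point
        r = r - alpha * Qd
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = M_inv_diag * r
        rz_new = r @ z
        d = z + (rz_new / rz) * d        # conjugate direction update
        rz = rz_new
    return x

n = 50
Q = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # SPD model problem
b = np.ones(n)
x = pcg(Q, b, M_inv_diag=1.0 / np.diag(Q))
rel_res = np.linalg.norm(b - Q @ x) / np.linalg.norm(b)
```

In the applications on these slides, Q is never formed densely; only matrix-vector products and the preconditioner application are needed, which is what makes "matrix free" solvers possible.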
SLIDE 31

A toy problem for the diurnal range model

The choice of linearisation point is important.

◮ The linearised model is only close to the full model for short distances
◮ Large scale component x₁, daily component x₂
◮ η₁ = exp(x₁) · 0.3, η₂ = exp(x₁)G⁻¹(x₂)
◮ Linearise at some x₀
◮ Plain Newton may overshoot the target
◮ With a simple line search, minimising the distance between the current η-estimate and the next linearisation point, the method is robustified
◮ Only the daily component is local, and it is the least well determined; partially robustify blockwise
◮ Robustify each overlapping spatio-temporal block separately, and do blended weighting

SLIDE 32

Tracing the model predictor

[Figure: traces of (eta1, eta2) under basic Newton, robust Newton, and partially robust Newton, with linear, quadratic, and true predictor surfaces.]

◮ The plain Newton method overshoots the target, ending up in an extreme place.
◮ Shorter steps in all or some components accelerate convergence.

SLIDE 33

Tracing the model components

[Figure: traces of (x1, x2) under Newton, robust Newton, and partially robust Newton.]

◮ The plain Newton method overshoots the target, ending up in an extreme place.
◮ Shorter steps in all or some components accelerate convergence.

SLIDE 34

Partly solved and unsolved problems

◮ High order operator preconditioners for iterative "matrix free" solvers; overlapping block calculations.

◮ Multivariate likelihoods, e.g. Dirichlet: current work with Joaquín Martínez Minaya to convert Dirichlet likelihoods to independent Gaussians to "fool" INLA, postprocessing for a full Laplace approximation.

SLIDE 35

Quadratisation of Dirichlet likelihoods

◮ Multivariate likelihood for y_c ∈ (0, 1), Σ_{c=1}^{C} y_c = 1:

(y₁, . . . , y_C | α) ∼ Dirichlet(α₁, . . . , α_C), α_c > 0,

p(y₁, . . . , y_C | α) = (1/B(α)) Π_{c=1}^{C} y_c^{α_c − 1}

◮ Example model: α_ic = β_c0 + X_ic β_c1
◮ R-INLA restriction: y_i | η conditionally independent and only dependent on η_i.
◮ Iterative approximation solution:
  1. Construct a multivariate Gaussian approximation to the likelihood
  2. Rotate and scale the observations → conditionally independent N(0, 1) pseudo-observations
  3. Run INLA, and repeat the approximation at the new estimate
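Steps 1 and 2 can be sketched generically: quadratise the Dirichlet log-likelihood around a working point and whiten the resulting Gaussian with a Cholesky factor. This Python sketch uses finite differences for the gradient and Hessian purely for self-containedness (the actual scheme would use analytic digamma/trigamma derivatives), and all numeric values are invented:

```python
import numpy as np
from math import lgamma

def dirichlet_loglik(alpha, y):
    """Dirichlet log-density of one observation y as a function of alpha."""
    return (lgamma(alpha.sum()) - sum(lgamma(a) for a in alpha)
            + np.sum((alpha - 1) * np.log(y)))

def grad_hess(f, x, h=1e-4):
    """Central finite-difference gradient and Hessian (for illustration only)."""
    n = len(x)
    g = np.zeros(n)
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return g, H

y = np.array([0.2, 0.3, 0.5])
alpha0 = np.array([2.0, 2.0, 2.0])   # working point for the quadratisation
g, H = grad_hess(lambda a: dirichlet_loglik(a, y), alpha0)

# Gaussian approximation: precision -H (assumed negative definite here),
# mode = one Newton step from alpha0. Whitening with the Cholesky factor
# of -H turns the quadratic form into independent N(0, 1) pseudo-data.
prec = -H
L = np.linalg.cholesky(prec)
mode = alpha0 + np.linalg.solve(prec, g)
z = L.T @ (alpha0 - mode)            # pseudo-observations at alpha0
```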
SLIDE 36

Iterated-INLA and JAGS posterior densities: intercepts

[Figure: posterior densities p(β0|y) from R-jags and iterated R-INLA for categories 1-4.]

SLIDE 37

Iterated-INLA and JAGS posterior densities: slopes

[Figure: posterior densities p(β1|y) from R-jags and iterated R-INLA for categories 1-4.]

SLIDE 38

Partly solved and unsolved problems

◮ High order operator preconditioners for iterative "matrix free" solvers; overlapping block calculations.

◮ Multivariate likelihoods, e.g. Dirichlet: current work with Joaquín Martínez Minaya to convert Dirichlet likelihoods to independent Gaussians to "fool" INLA, postprocessing for a full Laplace approximation.

◮ Explicit stochastic boundary precision construction (ongoing work with Daniel Simpson and David Bolin)
◮ Blending local stationary SPDE parameter estimates into globally non-stationary models
◮ Making manifold SPDEs more accessible; brains and other internal organs!
◮ Software testing; INLA has no automated test suite!
◮ Generalising inlabru: new backend code to allow easier extensions and complete INLA support; improve programmability of INLA features without breaking existing code
◮ Documentation! Progressing, but slowly. Can now use roxygen2 also in INLA.
◮ "Large INLA": a hypothetical new INLA implementation aimed at laaaaaarge models! Missing piece: fast and accurate log-determinants

SLIDE 39

Practical preconditioner (extra slide)

Minimise data reading:

◮ Sweep through time for each spatial subregion
◮ For each temporal scale at the same time, accumulate information until a macro space-time block is constructed
◮ Approximate solve for a macro space-time block
◮ Homogenisation biases form their own blocks

Notes:

◮ Nested Schur complement alternative: experiments show that approximate nested Schur complements (Fine-Coarse-Fine) lead to a robust and efficient preconditioner, but require re-reading all the data multiple times (2 fine scale steps)
◮ Separate preconditioning for each scale is faster per iteration (1 fine scale step): not having to re-read data multiple times is faster, offsetting the need for more iterations
◮ The tradeoff is unknown

SLIDE 40

Spatial fields, observations, and stochastic models

◮ Partially observed spatial functions, or objects related to latent spatial functions
◮ Wanted: estimates of the true values at observed and unobserved locations
◮ Wanted: quantified uncertainty about those values
◮ Complex measurement errors can be modelled using hierarchical random effects

Spatio-temporal hierarchical model framework

◮ Observations y = {y_i, i = 1, . . . , n_y}
◮ Latent random field x(s, t), s ∈ Ω, t ∈ R
◮ Model parameters θ = {θ_j, j = 1, . . . , n_θ}