Interpolation for huge spatial datasets: theory, implementations and ideas



SLIDE 1

@ReinhardFurrer, I-Math/ICS, UZH
Uppsala, 2019/01/30

Source: NZZ.ch

Interpolation for huge spatial datasets: theory, implementations and ideas

SLIDE 2

Joint work with
◮ Emilio Porcu
◮ Florian Gerber
◮ Francois Bachoc

and with contributions from several others

SLIDE 3

Outlook

◮ Motivation
◮ Covariance approximations
◮ Gaussian equivalent measures
◮ Prediction and estimation
◮ Implementation and illustration

Questions and «Fast-Forward» appreciated!
Slides at http://www.math.uzh.ch/furrer/slides/

SLIDE 4

Spatial statistics: prediction

[Figure: sampled locations s1, s2, . . . , si, . . . , sn with observed values y(s1), y(s2), . . . , y(si), . . . , y(sn)]
Observations: y(s1), . . . , y(sn)

First law of geography (Waldo Tobler): Everything is related to everything else, but near things are more related than distant things.

Source: wikipedia.org

SLIDE 5

Spatial statistics: prediction

[Figure: observations at s1, . . . , sn decomposed into signal and noise]

Predict the quantity of interest at an arbitrary location.

Why?
◮ Fill in missing data
◮ Force data onto a regular grid
◮ Smooth out the measurement error

How?
◮ By eye
◮ Linear interpolation
◮ The correct way . . .

SLIDE 6

Visual example

Source: Gerber et al. (2018), TGRS

SLIDE 7

Visual example

Source: Gerber et al. (2018), TGRS

SLIDE 8

Spatial statistics: prediction

[Figure: observed locations s1, . . . , sn and an unobserved prediction location s0, marked "?"]

Predict Z(s0) given y(s1), . . . , y(sn)

Y(s) = x^T(s)β + µ(s) + Z(s) + ε(s)

Minimize the mean squared prediction error, over all linear unbiased predictors.

Best Linear Unbiased Predictor:

BLUP = Cov( Z(s_pred), Y(s_obs) ) · Var( Y(s_obs) )^{-1} · y_obs,   i.e.   Ẑ(s0) = c^T Σ^{-1} y

(one spatial process, no trend, known covariance structure; otherwise almost the same)
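To make the formula concrete: a minimal, dense R sketch of the simple-kriging predictor (toy locations, stand-in data and an assumed exponential covariance; not the talk's code).

# Simple kriging: Zhat(s0) = c^T Sigma^{-1} y with a known covariance C.
C <- function(h, sill = 1, range = 0.3) sill * exp(-h / range)  # exponential model

set.seed(1)
n   <- 100
loc <- cbind(runif(n), runif(n))             # observation locations s_1, ..., s_n
y   <- rnorm(n)                              # stand-in for the observations y(s_i)
s0  <- c(0.5, 0.5)                           # prediction location

Sigma <- C(as.matrix(dist(loc)))             # Sigma_ij = C(dist(s_i, s_j))
cvec  <- C(sqrt(colSums((t(loc) - s0)^2)))   # c_i = Cov(Z(s0), Y(s_i))
zhat  <- drop(crossprod(cvec, solve(Sigma, y)))            # the BLUP c^T Sigma^{-1} y
mspe  <- C(0) - drop(crossprod(cvec, solve(Sigma, cvec)))  # kriging variance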
SLIDE 9

Spatial statistics: prediction

[Figure: covariance C as a function of distance (lag h), with C(dist(s1, s2)) and C(dist(s1, sn)) marked]

Predict Z(s0) given y(s1), . . . , y(sn)

Y(s) = x^T(s)β + µ(s) + Z(s) + ε(s)

The covariance matrix Σ contains the elements C( dist(si, sj) ).
SLIDE 10

Issues of basic, classical kriging

Cov(pred, obs) · Var(obs)^{-1} · obs = c^T Σ^{-1} y

◮ "Simple" spatial interpolation . . . on paper or in class!
◮ BUT:

1. Complex mean structure
2. Unknown covariance function, unknown parameters
3. Large spatial fields
4. Non-stationary covariances
5. Multivariate/space-time/data on the sphere

Many R packages . . . many black boxes . . . (Heaton et al. (2019) JABES)

SLIDE 11

Outlook

◮ Motivation
◮ Covariance approximations
◮ Gaussian equivalent measures
◮ Prediction and estimation
◮ Implementation and illustration

SLIDE 12

Spatial modeling

Geostatistical model (GRF):

[Figure: covariance C as a function of distance (lag h)]

Covariance matrix: Σ

Lattice model (GMRF):

E( Zi | z_{-i} ) = β Σ_{j neighbour of i} zj,   Var( Zi | z_{-i} ) = τ^2

Under Gaussianity and regularity conditions:

Σ = τ^2 (I − B)^{-1}
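A minimal R sketch (toy one-dimensional lattice, assumed parameter values) of this link, using spam: the precision matrix Σ^{-1} = (I − B)/τ^2 is sparse, while Σ itself is dense.

library(spam)

n    <- 50
tau2 <- 1
beta <- 0.45   # |beta| < 1/2 keeps I - B symmetric diagonally dominant, hence p.d.

# B_ij = beta for the two neighbours j = i - 1 and j = i + 1, zero otherwise:
B <- as.spam(beta * (abs(outer(1:n, 1:n, "-")) == 1))
Q <- (diag.spam(n) - B) / tau2            # sparse precision matrix Sigma^{-1}
Sigma <- tau2 * solve(diag.spam(n) - B)   # Sigma = tau^2 (I - B)^{-1}, dense

# Conditional moments encoded by Q: E(Z_i | z_{-i}) = beta (z_{i-1} + z_{i+1}),
# Var(Z_i | z_{-i}) = tau2.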

SLIDE 13

Spatial modeling

Geostatistical model (GRF): Σ

Lattice model (GMRF): Σ^{-1}

[Figure: sparsity structure of Σ versus Σ^{-1}, and of the approximation Σ_app]

SLIDE 14

Sparseness

Use sparse covariance matrices for greater computational efficiency. Sparseness is guaranteed when
◮ the covariance function has compact support, or
◮ a compact support is (artificially) imposed: tapering

[Figure: left, Wendland taper functions; right, Matérn (ν = 1.5), Wendland, and the tapered product Matérn × Wendland, as functions of distance (lag h)]
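A minimal sketch of tapering with spam (toy locations; cov.exp and cov.wend1 are spam's covariance helpers, with theta assumed to be c(range, sill) as documented in ?covmat):

library(spam)

set.seed(1)
loc   <- cbind(runif(500), runif(500))  # toy locations in the unit square
delta <- 0.15                           # taper range: pairs farther apart are dropped

h <- nearest.dist(loc, delta = delta)   # sparse distance matrix, only h <= delta stored
Sigma_tap <- cov.exp(h, theta = c(0.1, 1)) * cov.wend1(h, theta = c(delta, 1))
diag(Sigma_tap) <- 1                    # set the distance-0 entries explicitly

# Sigma_tap inherits the sparsity of h, so all downstream algebra stays sparse.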

SLIDE 15

Tapering: asymptotic optimality

For an isotropic and stationary process with covariance C0(h), find a taper Cρ(h) such that kriging with C0(h) Cρ(h) is asymptotically optimal:

MSPE(s0, C0 Cρ) / MSPE(s0, C0) = MSPE_taper / MSPE_true → 1

ϱ(s0, C0 Cρ) / MSPE(s0, C0) = MSPE_naive / MSPE_true → 1

where ϱ(s0, C) = C(0) − c^T Σ^{-1} c.

SLIDE 16

Tapering: asymptotic optimality


◮ Proofs based on infill asymptotics and "misspecified" covariances
◮ Conditions on the tail behaviour of the spectrum of the (tapered) covariance

Furrer, Genton, Nychka (2006) JCGS; Stein (2013) JCGS; Bevilacqua et al. (2019) AoS

SLIDE 17

Covariance functions

Matérn:

Mν(r) = (2^{1−ν} / Γ(ν)) r^ν Kν(r),   ν > 0
M_{ν,α,σ^2}(r) = σ^2 Mν(r/α),   α > 0, σ^2 > 0

Generalized Wendland:

W_{µ,κ}(r) = ( 1 / B(2κ, µ + 1) ) ∫_r^1 u (u^2 − r^2)^{κ−1} (1 − u)^µ du   for 0 ≤ r < 1 (and 0 otherwise),

κ > 0,   µ ≥ (d + 1)/2 + κ
W_{µ,κ,β,σ^2}(r) = σ^2 W_{µ,κ}(r/β),   β > 0, σ^2 > 0
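Since W_{µ,κ} is defined through an integral, a direct numerical evaluation is occasionally handy. A minimal R sketch (hypothetical helper, parameterization as above):

# Generalized Wendland W_{mu,kappa,beta,sigma2}(r) by numerical integration.
gen_wendland <- function(r, mu, kappa, beta = 1, sigma2 = 1) {
  stopifnot(kappa > 0)
  norm_const <- base::beta(2 * kappa, mu + 1)   # Beta function B(2 kappa, mu + 1)
  sapply(r / beta, function(ri) {
    if (ri >= 1) return(0)                      # compact support: zero beyond range beta
    int <- integrate(function(u) u * (u^2 - ri^2)^(kappa - 1) * (1 - u)^mu,
                     lower = ri, upper = 1)$value
    sigma2 * int / norm_const
  })
}

# In d = 2 with kappa = 1, mu = 2.5 satisfies mu >= (d + 1)/2 + kappa:
gen_wendland(seq(0, 1.2, by = 0.2), mu = 2.5, kappa = 1)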

SLIDE 18

Outlook

◮ Motivation
◮ Covariance approximations
◮ Gaussian equivalent measures
◮ Prediction and estimation
◮ Implementation and illustration

SLIDE 19

Equivalence of Gaussian measures

P0, P1 are two probability measures on a measurable space {Ω, S}.

Definition. P0 and P1 are equivalent if P0(A) = 1 for any A ∈ S implies P1(A) = 1, and vice versa.

◮ Restrict A to the σ-algebra generated by {Z(s), s ∈ D}. Here, '≡' means equivalent on the paths of {Z(s), s ∈ D}.
◮ For Gaussian measures: equivalence or orthogonality; the first two moments discriminate: P(µ, C)
◮ Here: µ = 0 and D ⊂ R^d, d = 1, 2, 3

SLIDE 20

Equivalence of Gaussian measures

Covariance functions C0, C1, with associated spectral densities C0^F, C1^F. D ⊂ R^d any bounded infinite set, s0 ∈ D.

Theorem. (Stein 2004) If there exists a > 0 such that 0 < b_l ≤ C0^F(z) z^a ≤ b_u < ∞ as z → ∞, and if for some b > 0

∫_b^∞ z^{d−1} ( ( C1^F(z) − C0^F(z) ) / C0^F(z) )^2 dz < ∞,

then P(C0) ≡ P(C1).

SLIDE 21

Fourier transforms

Mν^F(z) = cte · 1 / (1 + z^2)^{ν + d/2},   z ≥ 0.

W_{µ,κ}^F(z) = cte · 1F2( λ; λ + µ/2, λ + µ/2 + 1/2; −z^2/4 ),   λ = (d + 1)/2 + κ,

≍ z^{−2λ} for z → ∞.

SLIDE 22

Equivalence of Gaussian measures

Theorem. (Zhang 2004) Let P(M_{ν,α0,σ0^2}), P(M_{ν,α1,σ1^2}), D ⊂ R^d, d = 1, 2, 3. Then

P(M_{ν,α0,σ0^2}) ≡ P(M_{ν,α1,σ1^2})   ⇐⇒   σ0^2 / α0^{2ν} = σ1^2 / α1^{2ν}

◮ d ≤ 3: not all parameters consistently estimable; microergodic parameter σ^2 / α^{2ν}
◮ d ≥ 5: all fine (Anderes 2010)
◮ d = 4: open . . .

SLIDE 23

Equivalence of Gaussian measures

Theorem. (BFFP 2019) Let P(W_{µ,κ,β0,σ0^2}), P(W_{µ,κ,β1,σ1^2}). For a given κ ≥ 0, let µ > d + 1/2 + κ. Then

P(W_{µ,κ,β0,σ0^2}) ≡ P(W_{µ,κ,β1,σ1^2})   ⇐⇒   σ0^2 / β0^{2κ+1} = σ1^2 / β1^{2κ+1}

Theorem. (BFFP 2019) Let P(M_{ν,α,σ0^2}) and P(W_{µ,κ,β,σ1^2}), D ⊂ R^d, d = 1, 2, 3, κ > 0. If ν = κ + 1/2 and µ > d + 1/2 + κ, then

P(M_{ν,α,σ0^2}) ≡ P(W_{µ,κ,β,σ1^2})   ⇐⇒   σ0^2 / α^{2ν} = cte_{ν,κ,µ} · σ1^2 / β^{2κ+1}
SLIDE 24

Outlook

◮ Motivation
◮ Covariance approximations
◮ Gaussian equivalent measures
◮ Prediction and estimation
◮ Implementation and illustration

SLIDE 25

Prediction: asymptotic optimality

Prediction of Z(s0) using the "BLUP" Ẑn(µ, κ, β) under misspecification.

Theorem. (BFFP 2019)
P(W_{µ,κ,β,σ^2}): if µ > (d + 1)/2 + κ, then for any fixed β1 > 0

Var_{µ,κ,β,σ^2}( Ẑn(µ, κ, β1) − Z(s0) ) / Var_{µ,κ,β,σ^2}( Ẑn(µ, κ, β) − Z(s0) ) → 1

P(M_{ν,α,σ^2}): if ν = κ + 1/2, then for any fixed β1 > 0

Var_{ν,α,σ^2}( Ẑn(µ, κ, β1) − Z(s0) ) / Var_{ν,α,σ^2}( Ẑn(ν, α) − Z(s0) ) → 1

SLIDE 26

Prediction: presumed MSPE

Prediction of Z(s0) using the "BLUP" Ẑn(µ, κ, β1) under misspecification.

Theorem. (BFFP 2019)
P(W_{µ,κ,β,σ^2}): if σ^2 / β^{2κ+1} = σ1^2 / β1^{2κ+1}, then

Var_{µ,κ,β1,σ1^2}( Ẑn(µ, κ, β1) − Z(s0) ) / Var_{µ,κ,β,σ^2}( Ẑn(µ, κ, β1) − Z(s0) ) → 1

P(M_{ν,α,σ^2}): if ν = κ + 1/2 and σ^2 / α^{2ν} = cte · σ1^2 / β1^{2κ+1}, then

Var_{µ,κ,β1,σ1^2}( Ẑn(µ, κ, β1) − Z(s0) ) / Var_{ν,α,σ^2}( Ẑn(µ, κ, β1) − Z(s0) ) → 1

SLIDE 27

Estimation: tapering

Likelihood, one-taper and two-taper versions:

−(1/2) log det( Σ(θ) ◦ T ) − (1/2) y^T ( Σ(θ) ◦ T )^{-1} y

−(1/2) log det( Σ(θ) ◦ T ) − (1/2) y^T ( ( Σ(θ) ◦ T )^{-1} ◦ T ) y

◮ Dimension plays a crucial role (Kaufman et al. (2008) JASA)
◮ The two-taper approach cannot be used in practice
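A minimal sketch of the one-taper log-likelihood with spam (exponential covariance and spherical taper as assumed stand-ins; parameters on the log scale for positivity):

library(spam)

# -2 x one-taper log-likelihood, up to an additive constant.
neg2loglik_taper <- function(theta, y, loc, delta) {
  sill  <- exp(theta[1])
  range <- exp(theta[2])
  h     <- nearest.dist(loc, delta = delta)       # sparse distances, h <= delta
  Sigma <- cov.exp(h, theta = c(range, sill)) *
           cov.sph(h, theta = c(delta, 1))        # Sigma(theta) o T, still sparse
  diag(Sigma) <- sill                             # distance-0 entries, explicitly
  logdet <- as.numeric(determinant(Sigma)$modulus)  # log det via sparse Cholesky
  logdet + drop(t(y) %*% solve(Sigma, y))           # + y^T (Sigma o T)^{-1} y
}

# e.g. optim(c(0, 0), neg2loglik_taper, y = y, loc = loc, delta = 0.2)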

SLIDE 28

Estimation: Matérn

Let P(M_{ν,α0,σ0^2}), with certain constraints on the parameter space.

Theorem. (Shaby, Kaufman 2013) Let {σ̂n^2, α̂n} be the ML estimator. Then, as n → ∞:

1. σ̂n^2 / α̂n^{2ν} → σ0^2 / α0^{2ν} almost surely
2. √n ( σ̂n^2 / α̂n^{2ν} − σ0^2 / α0^{2ν} ) → N( 0, 2 (σ0^2 / α0^{2ν})^2 ) in distribution

◮ Also holds for σ̂n^2(α) with misspecified α

SLIDE 29

Estimation: generalized Wendland

Let P(W_{µ,κ,β0,σ0^2}), with certain constraints on the parameter space.

Theorem. (BFFP 2019) Let σ̂n^2(β) be the ML estimator with "known" β. Then, as n → ∞:

1. σ̂n^2(β) / β^{2κ+1} → σ0^2 / β0^{2κ+1} almost surely
2. √n ( σ̂n^2(β) / β^{2κ+1} − σ0^2 / β0^{2κ+1} ) → N( 0, 2 (σ0^2 / β0^{2κ+1})^2 ) in distribution

SLIDE 30

Estimation: generalized Wendland

Let P(W_{µ,κ,β0,σ0^2}), with certain constraints on the parameter space.

Theorem. (BFFP 2019) Let {σ̂n^2(β̂n), β̂n} be the ML estimator. Then, as n → ∞:

1. σ̂n^2(β̂n) / β̂n^{2κ+1} → σ0^2 / β0^{2κ+1} almost surely
2. √n ( σ̂n^2(β̂n) / β̂n^{2κ+1} − σ0^2 / β0^{2κ+1} ) → N( 0, 2 (σ0^2 / β0^{2κ+1})^2 ) in distribution

SLIDE 31

Theory vs. practice

◮ Tapering is "obsolete", but:
– with practicable supports, convergence is slow
– evaluation of the covariance function
◮ Univariate setting: proofs based on infill asymptotics and "misspecified" covariances
◮ Multivariate setting: proofs based on an increasing-domain framework; weak conditions on the taper (Furrer, Du, Bachoc 2016)

◮ asymptotics in practice?? ◭

SLIDE 32

Outlook

◮ Motivation
◮ Covariance approximations
◮ Gaussian equivalent measures
◮ Prediction and estimation
◮ Implementation and illustration

SLIDE 33

Implementation: software

Software to exploit the sparse structure: spam, with spam64
◮ an R package for sparse matrix algebra
◮ storage economical and fast
◮ versatile, intuitive and simple

See Furrer et al. (2006) JCGS; Furrer, Sain (2010) JSS

◮ R objects have at most 2^31 − 1 elements (almost)
◮ R does not 'have' 64-bit integers: they are stored as doubles
◮ 64-bit exploitation consists of type conversions between front-end R and pre-compiled code

Gerber, Mösinger, Furrer (2017) CaGeo
Gerber, Mösinger, Furrer (2018) SoftwareX
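The 64-bit machinery is switched on at the R level; a minimal sketch following the design described in these references (the option name is taken from the spam64 documentation):

library(spam)
library(spam64)
options(spam.force64 = TRUE)   # use 64-bit indices even where 32 bits would do

# The R-level API is unchanged: index vectors are stored as doubles, so
# sparse matrices may exceed 2^31 - 1 nonzero entries.
S <- diag.spam(1e5)            # toy example; identical calls work for huge matrices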
SLIDE 34

Illustration

◮ residual field from the AVHRR NDVI3g product: y = y_{2000−2009} − y_{1990−1999}, n = 769,940 observations
◮ nonstationary Gaussian random field (zero mean)

SLIDE 35

Illustration: nonstationary covariance

Σ(κ, β, τ) = κ D(β) ( (1 − τ) I + τ R ) D(β)

◮ κ > 0: scaling parameter
◮ D(β) = diag(exp(Xβ)): controls the strength via covariates
◮ τ ∈ [0, 1]: "no spatial correlation" vs "spatial correlation"
◮ I: identity matrix
◮ R: stationary correlation matrix:
– compactly supported covariance
– range 50 km, sparsity 0.2%
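A minimal R sketch (illustrative helper, not the talk's code) assembling Σ(κ, β, τ) from a sparse correlation matrix R and a covariate matrix X:

library(spam)

build_Sigma <- function(kappa, beta, tau, X, R) {
  # D(beta) = diag(exp(X beta)): covariate-driven scaling
  D <- diag.spam(drop(exp(X %*% beta)))
  n <- nrow(X)
  core <- (1 - tau) * diag.spam(n) + tau * R   # convex combination (1 - tau) I + tau R
  kappa * (D %*% core %*% D)                   # stays sparse: R has ~0.2% nonzeros
}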

SLIDE 36

Illustration: optimization

◮ Fast is relative . . . optim is suboptimal

Task          Function                   Time     Memory   Sparsity
Distances     h <- nearest.dist(...)     23 min   1.4 Gb   0.2‰
Covariances   T <- cov.Wend(h, ...)      2 min    1.4 Gb   0.2‰
Cholesky      chol(T, ...)               29 min   8.5 Gb   1.9‰ (64-bit)

◮ Optimization strategy (sketched below):
– (iterative) grid search over τ: exploits the multicore architecture
– for given τ, use a quasi-Newton optimizer for κ and β
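The two-level strategy in R; nll() is a hypothetical stand-in for the negative log-likelihood of the previous slides, with two covariate coefficients as on the next slide:

library(parallel)

tau_grid <- seq(0.05, 0.95, by = 0.05)

fits <- mclapply(tau_grid, function(tau) {
  # quasi-Newton over (log kappa, beta1, beta2) for this fixed tau
  optim(par = c(0, 0, 0),
        fn  = function(p) nll(kappa = exp(p[1]), beta = p[2:3], tau = tau),
        method = "BFGS")
}, mc.cores = detectCores())                 # one tau per core (mc.cores = 1 on Windows)

best <- which.min(vapply(fits, function(f) f$value, numeric(1)))
c(tau = tau_grid[best], fits[[best]]$par)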

SLIDE 37

Illustration: nonstationary covariance

◮ with covariates "distance to nearest coast" and "elevation"
◮ diag(Σ̂): [map of the fitted marginal variances not shown]
◮ BIC improvement compared to the stationary model

. . . but not sufficiently flexible ◭

SLIDE 38

Theory vs. practice

1. Complex mean structure
2. Unknown covariance function, unknown parameters
3. Large spatial fields
4. Non-stationary covariances
5. Multivariate/space-time/data on the sphere

gapfill (distribution free / algorithmic)

SLIDE 39

Collaboration with:
– Emilio Porcu
– Florian Gerber
– Francois Bachoc
– Moreno Bevilacqua
– Kaspar Mösinger
– former & present 'Applied Statistics' team
. . . and many more

Grants: 143282, 144973, 175529