Predicting missing values in spatio-temporal satellite data - - PowerPoint PPT Presentation

predicting missing values in spatio temporal satellite
SMART_READER_LITE
LIVE PREVIEW

Predicting missing values in spatio-temporal satellite data - - PowerPoint PPT Presentation

Predicting missing values in spatio-temporal satellite data Reinhard Furrer, I-Math/ICS, UZH NZZ.ch ISNPS2018, Salerno, 2018/06/14 Joint work Florian Gerber Emilio Porcu and with contributions of several others 2 Arctic NDVI NDVI =


slide-1
SLIDE 1

Reinhard Furrer, I-Math/ICS, UZH ISNPS2018, Salerno, 2018/06/14

NZZ.ch

Predicting missing values in spatio-temporal satellite data

slide-2
SLIDE 2

2

Joint work ◮ Florian Gerber ◮ Emilio Porcu

and with contributions of several others

slide-3
SLIDE 3

Arctic NDVI

3

NDVI = NIR − R NIR + R

slide-4
SLIDE 4

Why do we care?

4

Scientifically: ◮ Complex interplay between climate and vegetation/ecosystems ◮ Reflectance measurements as proxy for “greenness”

slide-5
SLIDE 5

Why do we care?

5

Scientifically: ◮ Complex interplay between climate and vegetation/ecosystems ◮ Reflectance measurements as proxy for “greenness” Statistically: ◮ Large, spatio-temporal datasets with complex structures at low resolution ◮ . . . many missing values

  • Imperfect “products”
slide-6
SLIDE 6

Basic, classical kriging

6

Cov(pred, data) · Var(data)−1 · data = Σpred,data Σ−1

data y

◮ “Simple” spatial interpolation . . . . . . on paper or in class! ◮ Flexible models for covariances are required ◮ Computational limits are quickly attained

slide-7
SLIDE 7

Flexible models

7

Space-time covariance functions on the sphere: ◮ Spectral representations/stochastic expansions ◮ Scale mixture representations ◮ Lagrangian framework See: Jeong, Jun, Genton, StatSci 2017 Porcu, Alegria, F, ISR 2018

slide-8
SLIDE 8

Computational limits

8

Reduce “problem size” See Heaton et al, arXiv:1710.05013 ◮ enforce sparsity (‘misspecification’, tapering, . . . ) ◮ Use R’s long vector support spam64 (64-bit with tailored packages for type conversions between front-end R and pre-compiled code) See Gerber, M¨

  • singer, F, CaGeo 2017; . . .
slide-9
SLIDE 9

Arctic NDVI data

9

MODIS NDIV data (satellite product MOD13A1)

slide-10
SLIDE 10

Kriging is smoothing

10

slide-11
SLIDE 11

Kriging is smoothing

10

slide-12
SLIDE 12

Interpolation using gapfill

11

slide-13
SLIDE 13

Interpolation using gapfill

12

145 161 177 193 2004 2005 2006 2007

Day of the year

0.2 0.4 0.6 0.8

NDVI

slide-14
SLIDE 14

gapfill: ranking of the images

13

r = 1 2 3 4 5 6 7 8 9 10 11 12

  • 161

177 193 2004 2005 2006 2007

Day of the year Year

0.2 0.4 0.6 0.8

NDVI

low high

slide-15
SLIDE 15

gapfill: quantile regression

14

Date: 193 doy 2004 177 doy 2006 177 doy 2005 193 doy 2006 193 doy 2005 Score: 0.65 0.71 0.77 0.88 0.91 Rank: 8 9 10 11 12 ˆ q: NA 0.64 NA 0.12 0.77

slide-16
SLIDE 16

gapfill: prediction uncertainties

15

145 161 177 193 2004 2005 2006 2007

Day of the year Year

0.2 0.4 0.6 0.8

NDVI

145 161 177 193 2004 2005 2006 2007

Day of the year Year

0.2 0.4 0.6 0.8

interval length

data and predictions uncertainties

slide-17
SLIDE 17

gapfill: location

16

slide-18
SLIDE 18

gapfill: comparison

17 RMSE ×103

slide-19
SLIDE 19

gapfill: uncertainties

18 (l) Uncertainty contribution from the indicated four steps of the gapfill procedure. (m) Average width of the 90% prediction intervals (40% missing values). (r) Average interval widths and coverage rate per day of the year.

slide-20
SLIDE 20

Summary

19

Implementation: spam64

gapfill

Intuition: statistical conceptual Model: frequentist based distribution free Uncertainties: formal resampling type Practicality: play ground competitive

slide-21
SLIDE 21

20

Collaboration with: – Emilio Porcu – Alfredo Alegria – Florian Gerber – Kaspar M¨

  • singer

– former & present ‘Applied Statistics’ team . . . and many more URPP Global Change and Biodiversity 143282, 144973, 175529

slide-22
SLIDE 22

References (some, alphabetical)

21 Furrer Sain (2010) spam: A sparse matrix R package with emphasis on MCMC methods for Gaussian Markov random fields JSS 36 1–25 Furrer et al (2017) spam: Sparse Matrix algebra. R package version 2.2-0 Furrer et al (2017) spam64: 64-Bit Extension of the SPArse Matrix R Package ’spam’. R package version 2.2-0 Gerber Moesinger Furrer (2017) Extending R Packages to Support 64-bit Compiled Code: An Illustration with spam64 and GIMMS NDVI3g Data Comput Geosci 104 109-119 Gerber et al (2018) Predicting Missing Values in Spatio-Temporal Remote Sensing

  • Data. IEEE TGRS 56(5) 2841-2853

Gerber Moesinger Furrer (2017) dotCall64: An Efficient Interface to Compiled C/C++ and Fortran Code Supporting Long Vectors arXiv:1702.08188 Heaton et al (2017) A Case Study Competition among Methods for Analyzing Large Spatial Data arXiv:1710.05013 Porcu Alegria Furrer (2018) Modeling Temporally Evolving and Spatially Globally Dependent Data International Statistical Review first published.

slide-23
SLIDE 23

Appendix

22

slide-24
SLIDE 24

Goals and concepts

23

How to efficiently extend an R package for 64-bit use? ◮ EASY on maintainer: We do not want to re write source code! ◮ EASY on user: no loss for classical use (speed, complexity, . . . ) minimal effort for ‘expert’ use Illustrated here with use of .Fortran or .C calls (aka Foreign Function Interface).

slide-25
SLIDE 25

Implementation dotCall64

24

◮ choose 32-bit or 64-bit compiled code accordingly ◮ exploit DUP=FALSE use dotCall64

R function dotCall64 Compiled code 32-bit compiled code 64-bit compiled code Input Preprocess Postprocess Output Use 64-bit?

slide-26
SLIDE 26

Implementation spam64

25

◮ choose 32-bit or 64-bit compiled code accordingly ◮ exploit DUP=FALSE use ◮ spam64 is ‘sed‘ed out of spam ◮ .Fortran − → dotCall64::.C64

slide-27
SLIDE 27

Illustration

26

◮ residual field from AVHRR NDVI3g product: y = y2000−2009 − y1990−1999, n = 769, 940 observations ◮ Nonstationary Gaussian random field (zero mean)

slide-28
SLIDE 28

Nonstationary covariance matrix

27

Σ(κ, β, τ) = κ D(β)

  • (1 − τ)I + τR
  • D(β)

◮ κ > 0: scaling parameter ◮ D(β) = diag(exp(Xβ)): controls strength via covariates ◮ τ ∈ [0, 1]: “no spatial correlation” vs “spatial correlation” ◮ I: identity matrix ◮ R: stationary correlation matrix: – compactly supported covariance – range 50km, sparsity 0.2%

slide-29
SLIDE 29

Optimization

28

◮ Fast is relative . . . optim suboptimal Task Function Time Sparsity Distances h <- nearest.dist(...) 23min 1.4 Gb 0.2o /

  • Covariances

T <- cov.Wend(h, ...) 2min 1.4 Gb 0.2o /

  • Cholesky

chol(T, ...) 29min 8.5 Gb 1.9o /

  • o (64-bit)

◮ optimization strategy: – (iterative) grid search over τ exploit multicore architecture – for given τ use quasi-Newton optimizer to optimize κ, β

slide-30
SLIDE 30

Nonstationary covariance matrix

29

◮ with covariates “distance to nearest coast” and “elevation” ◮ diag(

Σ):

◮ BIC improvement (1%) compared to nonstationary model

. . . but not sufficiently flexible ◭