Imputing missing values in satellite data: From parametric to - - PowerPoint PPT Presentation

imputing missing values in satellite data from parametric
SMART_READER_LITE
LIVE PREVIEW

Imputing missing values in satellite data: From parametric to - - PowerPoint PPT Presentation

Imputing missing values in satellite data: From parametric to non-parametric approaches @ReinhardFurrer, I-Math/ICS, UZH NZZ.ch GI-Forum, ifgi, WWU M unster, 2018/11/13 Joint work Florian Gerber Emilio Porcu Francois Bachoc and


slide-1
SLIDE 1

@ReinhardFurrer, I-Math/ICS, UZH GI-Forum, ifgi, WWU M¨ unster, 2018/11/13

NZZ.ch

Imputing missing values in satellite data: From parametric to non-parametric approaches

slide-2
SLIDE 2

2

Joint work ◮ Florian Gerber ◮ Emilio Porcu ◮ Francois Bachoc

and with contributions of several others

slide-3
SLIDE 3

Outlook

3

◮ Motivation ◮ Parametric models and their issues ◮ A particular non-parametric approach ◮ Outcome of a biodiversity exercise Questions and ≪Fast-Forward≫ appreciated! Slides at http://www.math.uzh.ch/furrer/slides/

slide-4
SLIDE 4

Global Change and Biodiversity

4 www.gcb.uzh.ch

◮ University Research Priority Program, start in 2013 ◮ ≈ 50 members in 2 Faculties and 5 Institutes/Departments

slide-5
SLIDE 5

Global Change and Biodiversity

5 www.gcb.uzh.ch

slide-6
SLIDE 6

Linking MODIS with plot based data

6

Source: 10.1073/pnas.1703928114

“. . . we show that primary productivity, its temporal stability, and the decadal trend of a prolonged growing season strongly increase with biodiversity across heterogeneous landscapes, which is consistent over vast environmental, climatic, and altitudinal gradients. . . . ”

slide-7
SLIDE 7

Linking MODIS with plot based data

7

Now analysis in the Arctic, not Switzerland ◮ Species abundance plot scale measurements from the International Tundra Experiment (ITEX) ◮ NDVI satellite images and ASTER elevation data

Source: F. Gerber

slide-8
SLIDE 8

Arctic NDVI data

8

MODIS NDVI data (satellite product MOD13A1, NDVI = NIR − R NIR + R )

slide-9
SLIDE 9

Visual example

9

Source: Gerber et al (2018), TGRS

slide-10
SLIDE 10

Visual example

10

Source: Gerber et al (2018), TGRS

slide-11
SLIDE 11

Spatial statistics: prediction

12

s1 s sn

2

(s )

i

(s )

n

y y

i

s y(s )

2 1

(s ) y

Observations: y(s1), . . . , y(sn) Model: Y (s) = signal + noise Y (s) = trend + stochastic part + noise Y (s) = xT(s)β + α(s) + Z(s) + ε(s)

slide-12
SLIDE 12

Spatial statistics: prediction

13

s s s1 si

n 2

signal noise

Predict the quantity of interest at an arbitrary location. Why? ◮ Fill-in missing data ◮ Force data onto a regular grid ◮ Smooth out the measurement error How? ◮ By eye ◮ Linear interpolation ◮ The correct way . . .

slide-13
SLIDE 13

Spatial statistics: prediction

14

s s s1 si

n 2

Describing the covariance structure

0.0 0.2 0.4 0.6 0.8

Covariance Distance, lag h

  • C(dist(s1, s2))

C(dist(s1, sn))

Covariance matrix Σ contains elements C

  • dist(si, sj)
  • .
slide-14
SLIDE 14

Spatial statistics: prediction

15

s s s1 si

n 2

s0

?

Predict Z(s0) given y(s1), . . . , y(sn). Minimize mean squared prediction error (over all linear unbiased predictors)

  • Best Linear Unbiased Predictor:

BLUP = Cov

  • Z(spredict), Y (sobs)
  • Var
  • Y (sobs)

−1obs

  • Z(s0) = cT Σ−1 y

(one spatial process, no trend, known covariance structure;

  • therwise almost the same)
slide-15
SLIDE 15

Outlook

16

◮ Motivation ◮ Parametric models and their issues ◮ A particular non-parametric approach ◮ Outcome of a biodiversity exercise

slide-16
SLIDE 16

Issues of basic, classical kriging

17

Cov(pred, obs) · Var(obs)−1 · obs = c Σ−1 y ◮ “Simple” spatial interpolation . . . . . . on paper or in class! ◮ BUT:

  • 1. Complex mean structure
  • 2. Unknown parameters
  • 3. Large spatial fields
  • 4. Non-stationary covariances
  • 5. Space-time data on the sphere
slide-17
SLIDE 17

Issues of basic, classical kriging

18

  • 1. Complex mean structure
  • 2. Unknown parameters
  • 3. Large spatial fields
  • 4. Non-stationary covariances
  • 5. Space-time data on the sphere

◮ Parametric structure typically ok ◮ Non-parametric structure often creates “model clash”

slide-18
SLIDE 18

Issues of basic, classical kriging

19

  • 1. Complex mean structure
  • 2. Unknown parameters
  • 3. Large spatial fields
  • 4. Non-stationary covariances
  • 5. Space-time data on the sphere

◮ (method of moment estimation) ◮ Likelihood approaches

Cholesky factorizations

slide-19
SLIDE 19

Issues of basic, classical kriging

20

  • 1. Complex mean structure
  • 2. Unknown parameters
  • 3. Large spatial fields
  • 4. Non-stationary covariances
  • 5. Space-time data on the sphere

◮ Many R packages do perform kriging . . . . . . many black boxes . . . . . . to tailored situations See Heaton et al. arXiv:1710.05013/JABES forthcoming

Computational limits are quickly attained!

slide-20
SLIDE 20

Methods for large spatial datasets

21

◮ Sparse Covariance methods: — Covariance Tapering Furrer — Spatial Partitioning Heaton ◮ Sparse Precision methods: — Lattice Kriging Nychka — Multiresolution Approximations Katzfuss — Stochastic Partial Differential Equations Lindgren — Periodic Embedding Guinness — Nearest Neighbor Processes Datta ◮ Low rank approximation: — Fixed Rank Kriging Zammit-Mangion — Predictive Processes Finley ◮ Algorithmic approaches: — Gapfill Gerber — Local Approximate Gaussian Processes Gramacy — Metakriging Guhaniyogi

slide-21
SLIDE 21

Spatial modeling

22

Geostatistical model (GRF):

s s s1 si

n 2

0.0 0.2 0.4 0.6 0.8

Covariance Distance, lag h

  • C(dist(s1, s2))

C(dist(s1, sn))

Covariance matrix: Σ Lattice model (GMRF): E(Zi|z−i) = β

  • j neighbor of i

zj Var(Zi|z−i) = τ2 Gaussianity and regularity conditions:

Σ = τ2(I − B)−1

slide-22
SLIDE 22

Spatial modeling

23

Geostatistical model (GRF):

Σ

Lattice model (GMRF):

Σ−1 Σ Σapp

slide-23
SLIDE 23

Sparseness

25

Using sparse covariance functions for greater computational efficiency. Sparseness is guaranteed when ◮ the covariance function has a compact support ◮ a compact support is (artificially) imposed tapering

10 20 30 40

Distance, lag h Matern ν = 1.5 Wendland

10 20 30 40

Distance, lag h Matern ν = 1.5 Wendland Matern * Wendland

slide-24
SLIDE 24

Sparseness: prediction/estimation

26

◮ Univariate setting: Proofs based on infill asymptotics and “misspecified” covariances Conditions on the tail behaviour of the spectrum of the (tapered) covariance

Furrer, Genton, Nychka (2006) JCGS Kaufman, Schervish, Nychka (2008) JMVA Stein (2013) JCGS Bevilacqua et al (2018?) AoS

◮ Multivariate setting: Proofs based on domain increasing framework Weak conditions on the taper

Furrer, Du, Bachoc (2016) JMVA

slide-25
SLIDE 25

Software

27

Software to exploit the sparse structure spam64 for : ◮ an R package for sparse matrix algebra ◮ storage economical and fast ◮ versatile, intuitive and simple

See Furrer et al. (2006) JCGS; Furrer, Sain (2010) JSS

◮ R objects have at most 231 elements (almost) ◮ R does not ‘have’ 64-bit integers: stored as doubles ◮ 64-bit exploitation consists of type conversions between front-end R and pre-compiled code

Gerber, M¨

  • singer, Furrer (2017) CaGeo

Gerber, M¨

  • singer, Furrer (2018) SoftwareX
slide-26
SLIDE 26

Arctic NDVI data

30

MODIS NDIV data (satellite product MOD13A1, NDVI = NIR − R NIR + R )

slide-27
SLIDE 27

Kriging is smoothing

31

slide-28
SLIDE 28

Interpolation using gapfill

32

slide-29
SLIDE 29

Interpolation using gapfill

33

145 161 177 193 2004 2005 2006 2007

Day of the year

0.2 0.4 0.6 0.8

NDVI

slide-30
SLIDE 30

gapfill: ranking of the images

34

r = 1 2 3 4 5 6 7 8 9 10 11 12

  • 161

177 193 2004 2005 2006 2007

Day of the year Year

0.2 0.4 0.6 0.8

NDVI

low high

slide-31
SLIDE 31

gapfill: quantile regression

35

Date: 193 doy 2004 177 doy 2006 177 doy 2005 193 doy 2006 193 doy 2005 Score: 0.65 0.71 0.77 0.88 0.91 Rank: 8 9 10 11 12 ˆ q: NA 0.64 NA 0.12 0.77

slide-32
SLIDE 32

gapfill: prediction uncertainties

36

145 161 177 193 2004 2005 2006 2007

Day of the year Year

0.2 0.4 0.6 0.8

NDVI

145 161 177 193 2004 2005 2006 2007

Day of the year Year

0.2 0.4 0.6 0.8

interval length

data and predictions uncertainties

slide-33
SLIDE 33

gapfill: location

37

slide-34
SLIDE 34

gapfill: comparison

38 RMSE ×103

slide-35
SLIDE 35

gapfill: uncertainties

39 (l) Uncertainty contribution from the indicated four steps of the gapfill procedure. (m) Average width of the 90% prediction intervals (40% missing values). (r) Average interval widths and coverage rate per day of the year.

slide-36
SLIDE 36

Summary

40

Implementation: spam64

gapfill

Intuition: statistical conceptual Model: frequentist based distribution free Uncertainties: formal resampling type Practicality: play ground competitive

slide-37
SLIDE 37

Outlook

41

◮ Motivation ◮ Parametric models and their issues ◮ A particular non-parametric approach ◮ Outcome of a biodiversity exercise

slide-38
SLIDE 38

Biodiversity hypotheses

42

H1: Plant productivity (quantified through NDVI) is positively correlated with plot scale biodiversity H2: Landscape variability (quantified through NDVI and slope) is positively correlated with plot scale biodiversity H3: Slope induces a drainage effect and increases plot scale biodiversity

slide-39
SLIDE 39

Data

43

◮ Species abundance plot scale measurements from the International Tundra Experiment (ITEX) Shannon biodiversity index on site and plot scale ◮ Landsat NDVI satellite images and ASTER elevation data characterization of the landscape heterogeneity

Source: F. Gerber

slide-40
SLIDE 40

Data

43

◮ Species abundance plot scale measurements from the International Tundra Experiment (ITEX) Shannon biodiversity index on site and plot scale ◮ Landsat NDVI satellite images and ASTER elevation data characterization of the landscape heterogeneity

Source: F. Gerber

abisko alexfiord anwr atqasuk barrow bylot kanger kilpisjarvi sadvent sverdrup toolik zackenberg 1980 1985 1990 1995 2000 2005 2010 1980 1985 1990 1995 2000 2005 2010 1 3 5 7 9 11 1 3 5 7 9 11 1 3 5 7 9 11 1 3 5 7 9 11 1 3 5 7 9 11 1 3 5 7 9 11

month year

25 50 75 100

% Cloud cover

Source: F. Gerber

slide-41
SLIDE 41

Results

44

◮ Data did not provide evidence for the hypothesis H1–H3. ◮ Statistical power could be improved by adding additional plot data. ◮ Limited amount of Landsat images makes it difficult to measure their seasonal and annual variability. This confounds the temporal aggregation.

Source: F. Gerber

slide-42
SLIDE 42

45

Collaboration with: – Florian Gerber – Gabriela Schaepman-Strub – Rogier de Jong – Emilio Porcu – Francois Bachoc – Alfredo Alegria – Kaspar M¨

  • singer

– former & present ‘Applied Statistics’ team . . . and many more 143282, 144973, 175529

slide-43
SLIDE 43

Source: www.gcb.uzh.ch

www.gcb.uzh.ch/en/Events/URPP-GCB-Conferences/conference2019.html

slide-44
SLIDE 44

References (some, alphabetical)

Furrer Sain (2010) spam: A sparse matrix R package with emphasis on MCMC methods for Gaussian Markov random fields JSS 36 1–25 Furrer et al (2017) spam: Sparse Matrix algebra. R package version 2.2-0 Furrer et al (2017) spam64: 64-Bit Extension of the SPArse Matrix R Package ’spam’. R package version 2.2-0 Gerber Moesinger Furrer (2017) Extending R Packages to Support 64-bit Compiled Code: An Illustration with spam64 and GIMMS NDVI3g Data Comput Geosci 104 109-119 Gerber et al (2018) Predicting Missing Values in Spatio-Temporal Remote Sensing

  • Data. IEEE TGRS 56(5) 2841-2853

Gerber Moesinger Furrer (2017) dotCall64: An Efficient Interface to Compiled C/C++ and Fortran Code Supporting Long Vectors SoftwareX 7 217-221 Heaton et al (2017f) A Case Study Competition among Methods for Analyzing Large Spatial Data arXiv:1710.05013/JABES forthcoming Porcu Alegria Furrer (2018) Modeling Temporally Evolving and Spatially Globally Dependent Data International Statistical Review 86 344–377 Complete list at: www.math.uzh.ch/furrer/research/publications.shtml