Lecture 20: Spatial Random Effects Models + Point Reference Spatial Data

SLIDE 1

Lecture 20

Spatial Random Effects Models + Point Reference Spatial Data

Colin Rundel 04/03/2017

SLIDE 2

Spatial Random Effects Models

SLIDE 3

Scottish Lip Cancer Data

[Figure: maps of Observed and Expected lip cancer counts by district across Scotland (legend: value, 20 to 80)]

SLIDE 4

[Figure: maps by district of the Obs/Exp ratio (2 to 6) and % Agg Fish Forest (5 to 20)]

SLIDE 5

Neighborhood / weight matrix

SLIDE 6

Moran’s I

morans_I = function(y, w) {
  n = length(y)
  y_bar = mean(y)
  num = sum(w * (y - y_bar) %*% t(y - y_bar))
  denom = sum((y - y_bar)^2)
  (n / sum(w)) * (num / denom)
}

morans_I(y = lip_cancer$Observed, w = diag(rowSums(W)) %*% W)
## [1] 0.2258758

morans_I(y = lip_cancer$Observed / lip_cancer$Expected, w = diag(rowSums(W)) %*% W)
## [1] 0.5323161

ape::Moran.I(lip_cancer$Observed / lip_cancer$Expected, weight = diag(rowSums(W)) %*% W) %>% str()
## List of 4
##  $ observed: num 0.666
##  $ expected: num -0.0182
##  $ sd      : num 0.0784
##  $ p.value : num 0
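For intuition, the same statistic can be computed from scratch. A minimal sketch in Python (standard library only, so it runs anywhere); the four-region chain and values below are hypothetical toy data, not the lip cancer data:

```python
# Moran's I: (n / S0) * sum_ij w_ij (y_i - ybar)(y_j - ybar) / sum_i (y_i - ybar)^2
def morans_i(y, w):
    n = len(y)
    y_bar = sum(y) / n
    dev = [v - y_bar for v in y]
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    denom = sum(d * d for d in dev)
    s0 = sum(sum(row) for row in w)  # total weight
    return (n / s0) * (num / denom)

# Toy example: 4 regions on a line, values increasing along the line,
# so neighbors are similar and I should be positive
y = [1.0, 2.0, 3.0, 4.0]
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i(y, w))  # 1/3, positive spatial autocorrelation
```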

SLIDE 7

A hierarchical model for lip cancer

We have observed counts of lip cancer for 56 districts in Scotland. Let yi represent the number of lip cancer cases in district i.

yi ∼ Poisson(λi)

log(λi) = log(Ei) + xi β + ωi

ω ∼ N(0, σ²(D − ϕ W)⁻¹)

where Ei is the expected count for each region (and serves as an offset).
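For ω to have a proper distribution, the precision matrix (1/σ²)(D − ϕW) must be positive definite; with D the diagonal matrix of neighbor counts this holds for |ϕ| < 1. A minimal sketch in Python (standard library only) checks this via Sylvester's criterion on a hypothetical 3-district chain; the graph and ϕ value are illustrative, not the Scotland data:

```python
# Hypothetical 3-district chain: district 2 neighbors districts 1 and 3
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
# D = diag(number of neighbors)
D = [[sum(row) if i == j else 0 for j in range(3)] for i, row in enumerate(W)]

phi = 0.9
Q = [[D[i][j] - phi * W[i][j] for j in range(3)] for i in range(3)]

def det(m):
    # Determinant by cofactor expansion (fine for tiny matrices)
    n = len(m)
    if n == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(n))

# Sylvester's criterion: all leading principal minors positive => Q positive definite
minors = [det([row[:k] for row in Q[:k]]) for k in range(1, 4)]
print(minors)  # all positive, so D - phi*W is a valid precision matrix here
```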

SLIDE 8

Data prep & JAGS model

D = diag(rowSums(W))
X = model.matrix(~scale(lip_cancer$pcaff))
log_offset = log(lip_cancer$Expected)
y = lip_cancer$Observed

## model{
##   for(i in 1:length(y)) {
##     y[i] ~ dpois(lambda[i])
##     y_pred[i] ~ dpois(lambda[i])
##     log(lambda[i]) <- log_offset[i] + X[i,] %*% beta + omega[i]
##   }
##
##   for(i in 1:2) {
##     beta[i] ~ dnorm(0,1)
##   }
##
##   omega ~ dmnorm(rep(0,length(y)), tau * (D - phi*W))
##
##   sigma2 <- 1/tau
##   tau ~ dgamma(2, 2)
##   phi ~ dunif(0,0.99)
## }

SLIDE 9

Model Results

[Figure: MCMC trace and density plots for beta[1] and beta[2]]

SLIDE 10

[Figure: MCMC trace and density plots for phi and sigma2]

SLIDE 11

Predictions & Residuals

[Figure: maps by district of Predicted Cases (10 to 30) and Residuals (−2 to 2)]

SLIDE 12

Residuals + RMSE + Moran’s I

# RMSE
lip_cancer_pred$resid %>% .^2 %>% mean() %>% sqrt()
## [1] 1.498675

# Moran's I
morans_I(y = lip_cancer_pred$resid, w = diag(rowSums(W)) %*% W)
## [1] 0.05661104

ape::Moran.I(lip_cancer_pred$resid, weight = diag(rowSums(W)) %*% W) %>% str()
## List of 4
##  $ observed: num 0.0642
##  $ expected: num -0.0182
##  $ sd      : num 0.0803
##  $ p.value : num 0.305
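The RMSE above is just the square root of the mean squared residual; a one-function Python sketch with hypothetical residuals, for reference:

```python
import math

def rmse(resid):
    # Root mean squared error of a residual vector
    return math.sqrt(sum(r * r for r in resid) / len(resid))

print(rmse([1.0, -2.0, 2.0]))  # sqrt((1 + 4 + 4) / 3) = sqrt(3)
```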

SLIDE 13

Point Referenced Data

SLIDE 14

Example - PM2.5 from CSN

The Chemical Speciation Network (CSN) is a series of air quality monitors run by the EPA (221 locations in 2007). We'll look at a subset of the data from Nov 14th, 2007 (n=191) for just PM2.5.

[Figure: map of CSN monitor locations colored by pm25, and histogram of pm25 (10 to 40)]

SLIDE 15

csn
## # A tibble: 191 × 5
##        site  longitude latitude       date     pm25
##       <int>      <dbl>    <dbl>     <dttm>    <dbl>
## 1  10730023  -86.81500 33.55306 2007-11-14 19.43555
## 2  10732003  -86.92417 33.49972 2007-11-14 26.40000
## 3  10890014  -86.58637 34.68767 2007-11-14 13.40000
## 4  11011002  -86.25637 32.40712 2007-11-14 19.70000
## 5  11130001  -84.99917 32.47639 2007-11-14 22.60000
## 6  40139997 -112.09577 33.50383 2007-11-14 12.30000
## 7  40191028 -110.98230 32.29515 2007-11-14  7.20000
## 8  51190007  -92.28130 34.75619 2007-11-14 12.70000
## 9  60070002 -121.84222 39.75750 2007-11-14 10.00000
## 10 60190008 -119.77222 36.78139 2007-11-14 32.26205
## # ... with 181 more rows

SLIDE 16

Aside - Splines

SLIDE 17

Splines in 1d - Smoothing Splines

These are a mathematical analogue to the drafting splines, represented using a penalized regression model. We want to find a function f(x) that best fits our observed data y = y1, . . . , yn while being as smooth as possible:

arg min_{f(x)} ∑_{i=1}^{n} (yi − f(xi))² + λ ∫ f′′(x)² dx

Interestingly, this minimization problem has an exact solution, given by a mixture of weighted natural cubic splines (cubic splines that are linear in the tails) with knots at the observed data locations (the xi).

SLIDE 18

Splines in 1d - Smoothing Splines

These are a mathematical analogue to the drafting splines, represented using a penalized regression model. We want to find a function f(x) that best fits our observed data y = y1, . . . , yn while being as smooth as possible:

arg min_{f(x)} ∑_{i=1}^{n} (yi − f(xi))² + λ ∫ f′′(x)² dx

Interestingly, this minimization problem has an exact solution, given by a mixture of weighted natural cubic splines (cubic splines that are linear in the tails) with knots at the observed data locations (the xi).
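A discrete analogue of this penalty is easy to compute directly: replace f′′ with second differences and minimize ∑(yi − fi)² + λ∑(Δ²fi)², which reduces to solving (I + λDᵀD)f = y. This is a Whittaker-style smoother sketch in pure Python, not the exact natural-spline solution, and the data below are hypothetical:

```python
def second_diff_matrix(n):
    # D has shape (n-2, n): row i implements f[i] - 2 f[i+1] + f[i+2]
    D = [[0.0] * n for _ in range(n - 2)]
    for i in range(n - 2):
        D[i][i], D[i][i + 1], D[i][i + 2] = 1.0, -2.0, 1.0
    return D

def solve(A, b):
    # Gaussian elimination with partial pivoting (fine for small dense systems)
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def whittaker(y, lam):
    # Solve (I + lam * D^T D) f = y
    n = len(y)
    D = second_diff_matrix(n)
    A = [[(1.0 if i == j else 0.0) + lam * sum(D[r][i] * D[r][j] for r in range(n - 2))
          for j in range(n)] for i in range(n)]
    return solve(A, y)

y = [0.0, 1.2, 1.8, 3.1, 4.0, 4.9]
f0 = whittaker(y, 0.0)     # lam = 0: interpolates the data exactly
f_big = whittaker(y, 1e8)  # lam large: approaches a straight line (zero second differences)
```

The two limits mirror the continuous problem: λ = 0 reproduces the data, and λ → ∞ forces the curvature penalty to dominate, leaving a linear fit.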

SLIDE 19

Splines in 2d - Thin Plate Splines

Now imagine we have observed data of the form (xi, yi, zi) where we wish to predict zi given xi and yi for all i. We can naturally extend the smoothing spline model to two dimensions:

arg min_{f(x,y)} ∑_{i=1}^{n} (zi − f(xi, yi))² + λ ∫∫ (∂²f/∂x²)² + 2 (∂²f/∂x∂y)² + (∂²f/∂y²)² dx dy

The solution to this equation has a natural representation using a weighted sum of radial basis functions with knots at the observed data locations:

f(x, y) = ∑_{i=1}^{n} wi d(xi, yi)² log d(xi, yi)

where d(xi, yi) is the distance from (x, y) to the knot at (xi, yi).
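The r² log r radial basis above is straightforward to evaluate directly. A pure-Python sketch (the knots and weights below are hypothetical, not a fitted surface, and solving for the weights is omitted):

```python
import math

def tps_basis(r):
    # Thin plate spline radial basis: phi(r) = r^2 * log(r), with phi(0) = 0 by continuity
    return 0.0 if r == 0.0 else r * r * math.log(r)

def tps_eval(x, y, knots, weights):
    # f(x, y) = sum_i w_i * phi(distance from (x, y) to knot i)
    total = 0.0
    for (kx, ky), w in zip(knots, weights):
        r = math.hypot(x - kx, y - ky)
        total += w * tps_basis(r)
    return total

# Hypothetical knots and weights, purely for illustration
knots = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
weights = [1.0, -0.5, 0.25]
print(tps_eval(0.5, 0.5, knots, weights))
```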

SLIDE 20

Fitting a TPS

library(fields)

coords = select(csn, longitude, latitude) %>% as.matrix()
tps = Tps(x = coords, Y = csn$pm25)

pm25_pred = r
pm25_pred[cells] = predict(tps, pred_coords)
plot(pm25_pred)
points(coords, pch=16, cex=0.5)

[Figure: map of the TPS-predicted pm25 surface with monitor locations overlaid]

SLIDE 21

Gaussian Process Models / Kriging

SLIDE 22

Variogram

library(geoR)

coords = csn %>% select(latitude, longitude) %>% as.matrix()
d = dist(coords) %>% as.matrix()

variog(coords = coords, data = csn$pm25,
       messages = FALSE,
       uvec = seq(0, max(d)/2, length.out = 50)) %>%
  plot()

[Figure: empirical semivariogram, distance 5 to 25 vs semivariance 20 to 100]
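Behind `variog`, the empirical semivariogram bins pairs of points by separation distance h and computes γ(h) = (1 / 2|N(h)|) ∑ (yi − yj)² over pairs in each bin. A pure-Python sketch on hypothetical 1-d data (real use is 2-d, with distances from coordinates):

```python
def semivariogram(coords, values, bins):
    # bins: list of (lo, hi) distance intervals; returns the gamma estimate per bin
    gamma = []
    n = len(coords)
    for lo, hi in bins:
        sq_diffs = []
        for i in range(n):
            for j in range(i + 1, n):
                h = abs(coords[i] - coords[j])  # 1-d distance for this sketch
                if lo <= h < hi:
                    sq_diffs.append((values[i] - values[j]) ** 2)
        gamma.append(sum(sq_diffs) / (2 * len(sq_diffs)) if sq_diffs else None)
    return gamma

coords = [0.0, 1.0, 2.0]
values = [0.0, 1.0, 2.0]
print(semivariogram(coords, values, [(0.5, 1.5), (1.5, 2.5)]))  # [0.5, 2.0]
```

Semivariance growing with distance, as here, is the signature of spatial dependence: nearby observations are more alike than distant ones.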

SLIDE 23

variog(coords = coords, data = csn$pm25,
       messages = FALSE,
       uvec = seq(0, max(d)/4, length.out = 50)) %>%
  plot()

[Figure: empirical semivariogram, distance 2 to 14 vs semivariance 20 to 80]

SLIDE 24

Isotropy / Anisotropy

v4 = variog4(coords = coords, data = csn$pm25,
             messages = FALSE,
             uvec = seq(0, max(d)/4, length.out = 50))
plot(v4)

[Figure: directional semivariograms at 0°, 45°, 90°, and 135°]

SLIDE 25

GP Spatial Model

If we assume that our data is stationary and isotropic then we can use a Gaussian process model to fit the data. We will assume an exponential covariance structure.

y ∼ N(µ1, Σ)

{Σ}ij = σ² exp(−r ∥si − sj∥) + σ²_n 1{i=j}

We can also view this as a spatial random effects model where

y(s) = µ(s) + w(s) + ϵ(s)

w(s) ∼ N(0, Σ′)

ϵ(si) ∼ N(0, σ²_n)

{Σ′}ij = σ² exp(−r ∥si − sj∥)

SLIDE 26

GP Spatial Model

If we assume that our data is stationary and isotropic then we can use a Gaussian process model to fit the data. We will assume an exponential covariance structure.

y ∼ N(µ1, Σ)

{Σ}ij = σ² exp(−r ∥si − sj∥) + σ²_n 1{i=j}

We can also view this as a spatial random effects model where

y(s) = µ(s) + w(s) + ϵ(s)

w(s) ∼ N(0, Σ′)

ϵ(si) ∼ N(0, σ²_n)

{Σ′}ij = σ² exp(−r ∥si − sj∥)
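The covariance construction itself is simple to sketch in Python (standard library only); the coordinates and parameter values below are hypothetical, not the CSN fit:

```python
import math

def exp_cov(coords, sigma2, r, sigma2_n):
    # {Sigma}_ij = sigma2 * exp(-r * ||s_i - s_j||) + sigma2_n * 1{i = j}
    n = len(coords)
    Sigma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            Sigma[i][j] = sigma2 * math.exp(-r * d) + (sigma2_n if i == j else 0.0)
    return Sigma

# Hypothetical monitor locations and parameters
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
Sigma = exp_cov(coords, sigma2=33.0, r=1.0, sigma2_n=17.0)
```

The diagonal is σ² + σ²_n (the sill plus the nugget), off-diagonal entries decay exponentially with distance, and the matrix is symmetric by construction.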

SLIDE 27

Fitting with spBayes

library(spBayes)

n = nrow(csn)
n_samp = 20000

coords = select(csn, longitude, latitude) %>% as.matrix()
max_range = max(dist(coords)) / 4

starting = list(phi = 3/3, sigma.sq = 33, tau.sq = 17)
tuning = list("phi" = 0.1, "sigma.sq" = 0.1, "tau.sq" = 0.1)
priors = list(
  beta.Norm = list(0, 1000),
  phi.Unif = c(3/max_range, 6),
  sigma.sq.IG = c(2, 2),
  tau.sq.IG = c(2, 2)
)

SLIDE 28

m = spLM(pm25 ~ 1, data = csn, coords = coords,
         starting = starting, priors = priors,
         cov.model = "exponential",
         n.samples = n_samp, tuning = tuning,
         n.report = n_samp/2)

## ----------------------------------------
## General model description
## ----------------------------------------
## Model fit with 191 observations.
##
## Number of covariates 1 (including intercept if specified).
##
## Using the exponential spatial correlation model.
##
## Number of MCMC samples 20000.
##
## Priors and hyperpriors:
##   beta normal:
##     mu: 0.000
##     cov: 1000.000
##
##   sigma.sq IG hyperpriors shape=2.00000 and scale=2.00000
##   tau.sq IG hyperpriors shape=2.00000 and scale=2.00000
##   phi Unif hyperpriors a=0.21888 and b=6.00000
## -------------------------------------------------
## Sampling
## -------------------------------------------------
## Sampled: 10000 of 20000, 50.00%
## Report interval Metrop. Acceptance rate: 33.25%
## Overall Metrop. Acceptance rate: 33.25%
## -------------------------------------------------
## Sampled: 20000 of 20000, 100.00%
## Report interval Metrop. Acceptance rate: 33.53%
## Overall Metrop. Acceptance rate: 33.39%
## -------------------------------------------------

SLIDE 29

m = spRecover(m, start = n_samp/2 + 1, thin = (n_samp/2)/1000)

## -------------------------------------------------
## Recovering beta and w
## -------------------------------------------------
## Sampled: 99 of 1000, 9.90%
## Sampled: 199 of 1000, 19.90%
## Sampled: 299 of 1000, 29.90%
## Sampled: 399 of 1000, 39.90%
## Sampled: 499 of 1000, 49.90%
## Sampled: 599 of 1000, 59.90%
## Sampled: 699 of 1000, 69.90%
## Sampled: 799 of 1000, 79.90%
## Sampled: 899 of 1000, 89.90%
## Sampled: 999 of 1000, 99.90%

SLIDE 30

Parameter values

m$p.theta.recover.samples %>% mcmc() %>% plot()

[Figure: trace and density plots for sigma.sq, tau.sq, and phi]

SLIDE 31

m$p.beta.recover.samples %>% mcmc() %>% plot()

[Figure: trace and density plot for the (Intercept) parameter]

SLIDE 32

Predictions

m_pred = spPredict(m, pred_coords,
                   pred.covars = matrix(1, nrow = nrow(pred_coords)),
                   start = n_samp/2 + 1, thin = (n_samp/2)/1000)

## ----------------------------------------
## General model description
## ----------------------------------------
## Model fit with 191 observations.
##
## Prediction at 900 locations.
##
## Number of covariates 1 (including intercept if specified).
##
## Using the exponential spatial correlation model.
##
## -------------------------------------------------
## Sampling
## -------------------------------------------------
## Sampled: 100 of 1000, 9.90%
## Sampled: 200 of 1000, 19.90%
## Sampled: 300 of 1000, 29.90%
## Sampled: 400 of 1000, 39.90%
## Sampled: 500 of 1000, 49.90%
## Sampled: 600 of 1000, 59.90%
## Sampled: 700 of 1000, 69.90%
## Sampled: 800 of 1000, 79.90%
## Sampled: 900 of 1000, 89.90%
## Sampled: 1000 of 1000, 99.90%

m_pred_summary = post_summary(t(m_pred$p.y.predictive.samples))

SLIDE 33

splm_pm25_pred = r
splm_pm25_pred[cells] = m_pred_summary$post_mean
plot(splm_pm25_pred)
points(coords, pch=16, cex=0.5)

[Figure: map of the spLM posterior mean pm25 surface with monitor locations overlaid]

SLIDE 34

JAGS Model

## model{
##   for(i in 1:length(y)){
##     y[i] ~ dnorm(beta + w[i], tau)
##     mu_w[i] <- 0
##   }
##
##   for(i in 1:length(y)){
##     for(j in 1:length(y)){
##       Sigma_w[i,j] <- sigma2_w * exp(-phi * d[i,j])
##     }
##   }
##   Sigma_w_inv <- inverse(Sigma_w)
##   w ~ dmnorm(mu_w, Sigma_w_inv)
##
##   beta ~ dnorm(0, 1/1000)
##   sigma2_w ~ dgamma(2, 2)
##   sigma2 ~ dgamma(2, 2)
##   tau <- 1/sigma2
##   phi ~ dunif(3/14, 6)
## }

SLIDE 35

[Figure: trace and density plots for beta and phi]

SLIDE 36

[Figure: trace and density plots for sigma2 and sigma2_w]