Lecture 18 Fitting CAR and SAR Models Colin Rundel 11/07/2018 1 - - PowerPoint PPT Presentation

lecture 18
SMART_READER_LITE
LIVE PREVIEW

Lecture 18 Fitting CAR and SAR Models Colin Rundel 11/07/2018 1 - - PowerPoint PPT Presentation

Lecture 18 Fitting CAR and SAR Models Colin Rundel 11/07/2018 1 Fitting areal models Revised SAR Model Formula Model ( ) = + =1 1 (, 2 1 )


slide-1
SLIDE 1

Lecture 18

Fitting CAR and SAR Models

Colin Rundel 11/07/2018

1

slide-2
SLIDE 2

Fitting areal models

slide-3
SLIDE 3

Revised SAR Model

  • Formula Model

𝑧(𝑡𝑗) = 𝑌𝑗⋅𝛾 + 𝜚

𝑜

𝑘=1

𝐸−1

𝑘𝑘 𝐵𝑗𝑘 (𝑧(𝑡𝑘) − 𝑌𝑘⋅𝛾) + 𝜗𝑗

𝝑 ∼ 𝒪(𝟏, 𝜏2𝐄−1)

  • Joint Model

𝐳 ∼ 𝒪 (𝐘𝜸, (𝐉 − 𝜚𝐄−1𝐁)−1𝜏2𝐄−1((𝐉 − 𝜚𝐄−1𝐁)−1)

𝑢)

2

slide-4
SLIDE 4

Revised CAR Model

  • Conditional Model

𝑧(𝑡𝑗)|𝐳−𝑡𝑗 ∼ 𝒪 (𝑌𝑗⋅𝛾 + 𝜚

𝑜

𝑘=1

𝐵𝑗𝑘 𝐸𝑗𝑗 (𝑧(𝑡𝑘) − 𝑌𝑘⋅𝛾), 𝜏2𝐸−1

𝑗𝑗 )

  • Joint Model

𝐳 ∼ 𝒪(𝐘𝜸, 𝜏2(𝐄 − 𝜚𝐁)−1)

3

slide-5
SLIDE 5

Example - NC SIDS

34°N 34.5°N 35°N 35.5°N 36°N 36.5°N 84°W 82°W 80°W 78°W 76°W 34°N 34.5°N 35°N 35.5°N 36°N 36.5°N 84°W 82°W 80°W 78°W 76°W 5000 10000 15000 20000

BIR74

10 20 30 40

SID74

4

slide-6
SLIDE 6

ggplot() + geom_sf(data=nc, aes(fill=SID74/BIR74*1000))

34°N 34.5°N 35°N 35.5°N 36°N 36.5°N 84°W 82°W 80°W 78°W 76°W 0.0 2.5 5.0 7.5

SID74/BIR74 * 1000 5

slide-7
SLIDE 7

Using spautolm from spdep

library(spdep) A = st_touches(nc, sparse=FALSE) listW = mat2listw(A) listW ## Characteristics of weights list object: ## Neighbour list object: ## Number of regions: 100 ## Number of nonzero links: 490 ## Percentage nonzero weights: 4.9 ## Average number of links: 4.9 ## ## Weights style: M ## Weights constants summary: ## n nn S0 S1 S2 ## M 100 10000 490 980 10696

6

slide-8
SLIDE 8

nc_coords = nc %>% st_centroid() %>% st_coordinates() plot(st_geometry(nc)) plot(listW, nc_coords, add=TRUE, col=”blue”, pch=16)

7

slide-9
SLIDE 9

Moran’s I

spdep::moran.test(nc$SID74, listW) ## ## Moran I test under randomisation ## ## data: nc$SID74 ## weights: listW ## ## Moran I statistic standard deviate = 2.1707, p-value = 0.01498 ## alternative hypothesis: greater ## sample estimates: ## Moran I statistic Expectation Variance ## 0.119089049

  • 0.010101010

0.003542176 spdep::moran.test(1000*nc$SID74/nc$BIR74, listW) ## ## Moran I test under randomisation ## ## data: 1000 * nc$SID74/nc$BIR74 ## weights: listW ## ## Moran I statistic standard deviate = 3.6355, p-value = 0.0001387 ## alternative hypothesis: greater ## sample estimates: ## Moran I statistic Expectation Variance ## 0.210046454

  • 0.010101010

0.003666802

8

slide-10
SLIDE 10

Geary’s C

spdep::geary.test(nc$SID74, listW) ## ## Geary C test under randomisation ## ## data: nc$SID74 ## weights: listW ## ## Geary C statistic standard deviate = 0.91949, p-value = 0.1789 ## alternative hypothesis: Expectation greater than statistic ## sample estimates: ## Geary C statistic Expectation Variance ## 0.88988684 1.00000000 0.01434105 spdep::geary.test(nc$SID74/nc$BIR74, listW) ## ## Geary C test under randomisation ## ## data: nc$SID74/nc$BIR74 ## weights: listW ## ## Geary C statistic standard deviate = 3.0989, p-value = 0.0009711 ## alternative hypothesis: Expectation greater than statistic ## sample estimates: ## Geary C statistic Expectation Variance ## 0.67796679 1.00000000 0.01079878

9

slide-11
SLIDE 11

CAR Model

nc_car = spautolm(formula = SID74/BIR74 ~ 1, data = nc, listw = listW, family = ”CAR”) summary(nc_car) ## ## Call: spautolm(formula = SID74/BIR74 ~ 1, data = nc, listw = listW, ## family = ”CAR”) ## ## Residuals: ## Min 1Q Median 3Q Max ## -0.00213872 -0.00083535 -0.00022355 0.00055014 0.00768640 ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.00200203 0.00024272 8.2484 2.22e-16 ## ## Lambda: 0.13062 LR test value: 8.6251 p-value: 0.0033157 ## Numerical Hessian standard error of lambda: 0.030472 ## ## Log likelihood: 508.3767 ## ML residual variance (sigma squared): 2.1205e-06, (sigma: 0.0014562) ## Number of observations: 100 ## Number of parameters estimated: 3 ## AIC: -1010.8 10

slide-12
SLIDE 12

SAR Model

nc_sar = spautolm(formula = SID74/BIR74 ~ 1, data = nc, listw = listW, family = ”SAR”) summary(nc_sar) ## ## Call: spautolm(formula = SID74/BIR74 ~ 1, data = nc, listw = listW, ## family = ”SAR”) ## ## Residuals: ## Min 1Q Median 3Q Max ## -0.00209307 -0.00087039 -0.00020274 0.00051156 0.00762830 ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.00201084 0.00023622 8.5127 < 2.2e-16 ## ## Lambda: 0.079934 LR test value: 8.8911 p-value: 0.0028657 ## Numerical Hessian standard error of lambda: 0.0246 ## ## Log likelihood: 508.5097 ## ML residual variance (sigma squared): 2.1622e-06, (sigma: 0.0014704) ## Number of observations: 100 ## Number of parameters estimated: 3 ## AIC: -1011 11

slide-13
SLIDE 13

Predictions

34°N 34.5°N 35°N 35.5°N 36°N 36.5°N 84°W 82°W 80°W 78°W 76°W 34°N 34.5°N 35°N 35.5°N 36°N 36.5°N 84°W 82°W 80°W 78°W 76°W 0.0010 0.0015 0.0020 0.0025 0.0030 0.0035

car_pred

0.0015 0.0020 0.0025 0.0030

sar_pred

12

slide-14
SLIDE 14

Residuals

34°N 34.5°N 35°N 35.5°N 36°N 36.5°N 84°W 82°W 80°W 78°W 76°W 34°N 34.5°N 35°N 35.5°N 36°N 36.5°N 84°W 82°W 80°W 78°W 76°W 0.0000 0.0025 0.0050 0.0075

car_resid

0.0000 0.0025 0.0050 0.0075

sar_resid

13

slide-15
SLIDE 15

Predicted vs Observed

0.001 0.002 0.003 0.0015 0.0020 0.0025 0.0030 0.0000 0.0025 0.0050 0.0075 0.0100 0.0000 0.0025 0.0050 0.0075 0.0100

SID74/BIR74 SID74/BIR74 car_pred sar_pred 14

slide-16
SLIDE 16

What’s wrong?

Histogram of nc$SID74

nc$SID74 Frequency 10 20 30 40 10 20 30 40 50 60

Histogram of nc$SID74/nc$BIR74

nc$SID74/nc$BIR74 Frequency 0.000 0.004 0.008 5 10 15 20 25 30 35 −2 −1 1 2 −0.002 0.002 0.006

CAR Residuals

Theoretical Quantiles Sample Quantiles

15

slide-17
SLIDE 17

Comparing CAR vs SAR.

−0.002 0.000 0.002 0.004 0.006 0.008 −0.002 0.002 0.006

CAR vs SAR Residuals

nc$car_resid nc$sar_resid

16

slide-18
SLIDE 18

Stan CAR Model

car_model = ” data { int<lower=0> N; vector[N] y; matrix[N,N] A; matrix[N,N] D; } parameters { vector[N] w_s; real beta; real<lower=0> sigma2; real<lower=0> sigma2_w; real<lower=0,upper=1> phi; } transformed parameters { vector[N] y_pred = beta + w_s; } model { matrix[N,N] Sigma_inv = (D - phi * A) / sigma2; w_s ~ multi_normal_prec(rep_vector(0,N), Sigma_inv); beta ~ normal(0,10); sigma2 ~ cauchy(0,5); sigma2_w ~ cauchy(0,5); y ~ normal(beta+w_s, sigma2_w); } ” 17

slide-19
SLIDE 19

data = list( N = nrow(nc), y = nc$SID74 / nc$BIR74, A = A * 1, D = diag(rowSums(A)) ) car_fit = rstan::stan( model_code = car_model, data = data, iter = 10000, chains = 1, thin=20 )

Why don’t we use the conditional definition for the 𝑧’s?

18

slide-20
SLIDE 20

data = list( N = nrow(nc), y = nc$SID74 / nc$BIR74, A = A * 1, D = diag(rowSums(A)) ) car_fit = rstan::stan( model_code = car_model, data = data, iter = 10000, chains = 1, thin=20 )

Why don’t we use the conditional definition for the 𝑧’s?

18

slide-21
SLIDE 21

Model Results

beta phi sigma2 sigma2_w 50 100 150 200 250 0.0015 0.0020 0.0025 0.0030 0.4 0.6 0.8 1.0 5e−06 1e−05 0e+00 5e−04 1e−03

.iteration estimate 19

slide-22
SLIDE 22

Predictions

34°N 34.5°N 35°N 35.5°N 36°N 36.5°N 84°W 82°W 80°W 78°W 76°W 34°N 34.5°N 35°N 35.5°N 36°N 36.5°N 84°W 82°W 80°W 78°W 76°W 0.002 0.004 0.006 0.008

bayes_car_pred

0e+00 5e−04 1e−03

bayes_car_resid 20

slide-23
SLIDE 23

0.000 0.001 0.002 0.003 0.000 0.001 0.002 0.003

car_pred bayes_car_pred 21

slide-24
SLIDE 24

Brief Aside - SAR Precision Matrix

Σ𝑇𝐵𝑆 = (𝐉 − 𝜚𝐄−1 𝐁)−1𝜏2 𝐄−1 ((𝐉 − 𝜚𝐄−1 𝐁)−1)

𝑢

Σ−1

𝑇𝐵𝑆 = ((𝐉 − 𝜚𝐄−1 𝐁)−1𝜏2 𝐄−1 ((𝐉 − 𝜚𝐄−1 𝐁)−1) 𝑢) −1

= (((𝐉 − 𝜚𝐄−1 𝐁)−1)

𝑢) −1 1

𝜏2 𝐄 (𝐉 − 𝜚𝐄−1 𝐁) = 1 𝜏2 (𝐉 − 𝜚𝐄−1 𝐁)𝑢 𝐄 (𝐉 − 𝜚𝐄−1 𝐁)

22

slide-25
SLIDE 25

Jags SAR Model

sar_model = ” data { int<lower=0> N; vector[N] y; matrix[N,N] W_tilde; matrix[N,N] D; } transformed data { matrix[N,N] I = diag_matrix(rep_vector(1, N)); } parameters { vector[N] w_s; real beta; real<lower=0> sigma2; real<lower=0> sigma2_w; real<lower=0,upper=1> phi; } transformed parameters { vector[N] y_pred = beta + w_s; } model { matrix[N,N] C = I - phi * W_tilde; matrix[N,N] Sigma_inv = C' * D * C / sigma2; w_s ~ multi_normal_prec(rep_vector(0,N), Sigma_inv); beta ~ normal(0,10); sigma2 ~ cauchy(0,5); sigma2_w ~ cauchy(0,5); y ~ normal(beta + w_s, sigma2_w); } ” D = diag(rowSums(A)) D_inv = diag(1/diag(D)) data = list( N = nrow(nc), y = nc$SID74 / nc$BIR74, x = rep(1, nrow(nc)), D_inv = D_inv, W_tilde = D_inv %*% A ) sar_fit = rstan::stan( model_code = sar_model, data = data, iter = 10000, chains = 1, thin=20 )

23

slide-26
SLIDE 26

Model Results

beta phi sigma2 sigma2_w 50 100 150 200 250 0.0012 0.0016 0.0020 0.0024 0.0028 0.0032 0.25 0.50 0.75 5e−06 1e−05 0.0000 0.0005 0.0010 0.0015

.iteration estimate 24

slide-27
SLIDE 27

Predictions

34°N 34.5°N 35°N 35.5°N 36°N 36.5°N 84°W 82°W 80°W 78°W 76°W 34°N 34.5°N 35°N 35.5°N 36°N 36.5°N 84°W 82°W 80°W 78°W 76°W 0.002 0.004 0.006 0.008

bayes_sar_pred

−0.0005 0.0000 0.0005 0.0010 0.0015

bayes_sar_resid 25

slide-28
SLIDE 28

0.000 0.001 0.002 0.003 0.000 0.001 0.002 0.003

sar_pred bayes_sar_pred 26

slide-29
SLIDE 29

Comparing Predictions

# RMSE sqrt(mean(nc$bayes_car_resid^2)) ## [1] 0.0002092447 sqrt(mean(nc$bayes_sar_resid^2)) ## [1] 0.0002983034 sqrt(mean(nc$car_resid^2)) ## [1] 0.001448564 sqrt(mean(nc$sar_resid^2)) ## [1] 0.001470432

27

slide-30
SLIDE 30

Comparing Parameters

sigma2 sigma2_w beta phi 2.5e−06 5.0e−06 7.5e−06 1.0e−05 0.00000 0.00025 0.00050 0.00075 0.00100 0.00125 0.0015 0.0020 0.0025 0.2 0.4 0.6 0.8

estimate model

Stan CAR Stan SAR

28

slide-31
SLIDE 31

CAR vs SAR?

29

slide-32
SLIDE 32

Transforming the data

slide-33
SLIDE 33

Freeman-Tukey’s transformation

This is the transformation used by Cressie and Road in Spatial Data Analysis

  • f Regional Counts (1989).

𝐺𝑈 = √ 1000 (√ 𝑇𝐽𝐸74 𝐶𝐽𝑆74 + √𝑇𝐽𝐸74 + 1 𝐶𝐽𝑆74 ) Histogram of nc$FT

Frequency 0.0 1.0 2.0 3.0 10 20 30 40 −2 −1 1 2 0.0 0.5 1.0 1.5 2.0 2.5 3.0

Normal Q−Q Plot

Sample Quantiles

30

slide-34
SLIDE 34

Other options

nc = mutate(nc, sqrt = sqrt((SID74+1)/BIR74), log = log((SID74+1)/BIR74), )

Histogram of nc$sqrt

nc$sqrt Frequency 0.02 0.04 0.06 0.08 0.10 15 30 −2 −1 1 2 0.04 0.10

Normal Q−Q Plot

Theoretical Quantiles Sample Quantiles

Histogram of nc$log

nc$log Frequency −7.5 −6.5 −5.5 −4.5 15 30 −2 −1 1 2 −7.0 −5.0

Normal Q−Q Plot

Theoretical Quantiles Sample Quantiles 31

slide-35
SLIDE 35

log transformation

34°N 34.5°N 35°N 35.5°N 36°N 36.5°N 84°W 82°W 80°W 78°W 76°W −7.0 −6.5 −6.0 −5.5 −5.0

log ## ## Moran I test under randomisation ## ## data: nc$log ## weights: listW ## ## Moran I statistic standard deviate = 4.9895, p-value = 3.027e-07 ## alternative hypothesis: greater ## sample estimates: ## Moran I statistic Expectation Variance ## 0.299245438

  • 0.010101010

0.003843927 32

slide-36
SLIDE 36

log CAR Model

car_model = ” data { int<lower=0> N; vector[N] y; matrix[N,N] A; matrix[N,N] D; } parameters { vector[N] w_s; real beta; real<lower=0> sigma2; real<lower=0> sigma2_w; real<lower=0,upper=1> phi; } transformed parameters { vector[N] y_pred = beta + w_s; } model { matrix[N,N] Sigma_inv = (D - phi * A) / sigma2; w_s ~ multi_normal_prec(rep_vector(0,N), Sigma_inv); beta ~ normal(0,10); sigma2 ~ cauchy(0,5); sigma2_w ~ cauchy(0,5); y ~ normal(beta+w_s, sigma2_w); } ” data = list( N = nrow(nc), y = nc$log, x = rep(1, nrow(nc)), A = A * 1, D = diag(rowSums(A)) ) car_log_fit = rstan::stan( model_code = car_model, data = data, iter = 10000, thin=20, chains = 2, cores = 2 )

33

slide-37
SLIDE 37

Chains

sigma2 sigma2_w beta phi 50 100 150 200 250 50 100 150 200 250 0.4 0.6 0.8 1.0 0.1 0.2 0.3 0.4 0.5 −8 −7 −6 −5 0.5 1.0 1.5

.iteration .value .chain

1 2

34

slide-38
SLIDE 38

Posteriors

sigma2 sigma2_w beta phi 0.5 1.0 1.5 0.1 0.2 0.3 0.4 0.5 −8 −7 −6 −5 0.4 0.6 0.8 1.0 1 2 3 4 5 1 2 3 4 5

.value density .chain

1 2

35

slide-39
SLIDE 39

Predictions

34°N 34.5°N 35°N 35.5°N 36°N 36.5°N 84°W 82°W 80°W 78°W 76°W 34°N 34.5°N 35°N 35.5°N 36°N 36.5°N 84°W 82°W 80°W 78°W 76°W −6.5 −6.0 −5.5

car_log_pred_post_mean

0.002 0.003 0.004 0.005 0.006

car_pred_post_mean 36

slide-40
SLIDE 40

Predicted vs Observed

post_pred_mean

  • bs_log_rate

84°W 82°W 80°W 78°W 76°W 34°N 34.5°N 35°N 35.5°N 36°N 36.5°N 34°N 34.5°N 35°N 35.5°N 36°N 36.5°N −7.0 −6.5 −6.0 −5.5 −5.0

value

37

slide-41
SLIDE 41

Predicted vs Observed (cont)

post_pred_mean

  • bs_rate

84°W 82°W 80°W 78°W 76°W 34°N 34.5°N 35°N 35.5°N 36°N 36.5°N 34°N 34.5°N 35°N 35.5°N 36°N 36.5°N 0.0025 0.0050 0.0075 0.0100

value

38

slide-42
SLIDE 42

Model Fit

−6.5 −6.0 −5.5 0.0025 0.0050 0.0075 0.0100

(SID74 + 1)/BIR74 car_log_pred_post_mean 39

slide-43
SLIDE 43

Model Fit (cont.)

−6.5 −6.0 −5.5 −7.0 −6.5 −6.0 −5.5 −5.0 −4.5

log((SID74 + 1)/BIR74) car_log_pred_post_mean 40

slide-44
SLIDE 44

Model Fit (cont.)

0.001 0.002 0.003 0.004 0.005 0.006 0.0025 0.0050 0.0075 0.0100

(SID74 + 1)/BIR74 car_pred_post_mean 41

slide-45
SLIDE 45

Residuals

nc = mutate(nc, car_log_resid = car_log_pred_post_mean - log((SID74+1)/BIR74)) ggplot(nc, aes(fill=car_log_resid)) + geom_sf()

34°N 34.5°N 35°N 35.5°N 36°N 36.5°N 84°W 82°W 80°W 78°W 76°W −0.6 −0.4 −0.2 0.0 0.2

car_log_resid spdep::moran.test(nc$car_log_resid, listW) ## ## Moran I test under randomisation ## ## data: nc$car_log_resid ## weights: listW ## ## Moran I statistic standard deviate = -1.1361, p-value = 0.872 ## alternative hypothesis: greater ## sample estimates: ## Moran I statistic Expectation Variance ##

  • 0.080372616
  • 0.010101010

0.003826085 42