Lecture 18 Models for areal data Colin Rundel 03/22/2017 1 areal - - PowerPoint PPT Presentation

lecture 18
SMART_READER_LITE
LIVE PREVIEW

Lecture 18 Models for areal data Colin Rundel 03/22/2017 1 areal - - PowerPoint PPT Presentation

Lecture 18 Models for areal data Colin Rundel 03/22/2017 1 areal / lattice data 2 Example - NC SIDS 3 SID79 E I 2 E I 2 Some properties of Morans I (when there is no spatial autocorrelation): E I 1 n 1 Var I EDA - Morans


slide-1
SLIDE 1

Lecture 18

Models for areal data

Colin Rundel 03/22/2017

1

slide-2
SLIDE 2

areal / lattice data

2

slide-3
SLIDE 3

Example - NC SIDS SID79

3

slide-4
SLIDE 4

EDA - Moran’s I

If we have observations at n spatial locations (s1, . . . sn) I = n

∑n

i=1

∑n

j=1 wij

∑n

i=1

∑n

j=1 wij

(

y(si) − ¯ y

)(

y(sj) − ¯ y

) ∑n

i=1

(

y(si) − ¯ y

)

where w is a spatial weights matrix. Some properties of Moran’s I (when there is no spatial autocorrelation):

  • E I

1 n 1

  • Var I

E I2 E I 2 Something ugly but closed form

  • Asymptotically,

I E I Var I

0 1

4

slide-5
SLIDE 5

EDA - Moran’s I

If we have observations at n spatial locations (s1, . . . sn) I = n

∑n

i=1

∑n

j=1 wij

∑n

i=1

∑n

j=1 wij

(

y(si) − ¯ y

)(

y(sj) − ¯ y

) ∑n

i=1

(

y(si) − ¯ y

)

where w is a spatial weights matrix. Some properties of Moran’s I (when there is no spatial autocorrelation):

  • E(I) = −1/(n − 1)
  • Var(I) = E(I2) − E(I)2 = Something ugly but closed form
  • Asymptotically,

I−E(I)

Var(I) ∼ N(0, 1) 4

slide-6
SLIDE 6

NC SIDS & Moran’s I

Lets start by using an adjacency matrix for w (shared county borders).

morans_I = function(y, w) { n = length(y) y_bar = mean(y) num = sum(w * (y-y_bar) %*% t(y-y_bar)) denom = sum( (y-y_bar)^2 ) (n/sum(w)) * (num/denom) } morans_I(y = nc$SID74, w = 1*st_touches(nc, sparse=FALSE)) ## [1] 0.119089 library(ape) Moran.I(nc$SID74, weight = 1*st_touches(nc, sparse=FALSE)) %>% str() ## List of 4 ## $ observed: num 0.148 ## $ expected: num -0.0101 ## $ sd : num 0.0627 ## $ p.value : num 0.0118 5

slide-7
SLIDE 7

EDA - Geary’s C

Like Moran’s I, if we have observations at n spatial locations (s1, . . . sn) C = n − 1 2 ∑n

i=1

∑n

j=1 wij

∑n

i=1

∑n

j=1 wij

(

y(si) − y(sj)

)2 ∑n

i=1

(

y(si) − ¯ y

)

where w is a spatial weights matrix. Some properties of Geary’s C:

C 2

  • If C

1 then no spatial autocorrelation

  • If C

1 then negative spatial autocorrelation

  • If C

1 then positive spatial autocorrelation

  • Geary’s C is inversely related to Moran’s I

6

slide-8
SLIDE 8

EDA - Geary’s C

Like Moran’s I, if we have observations at n spatial locations (s1, . . . sn) C = n − 1 2 ∑n

i=1

∑n

j=1 wij

∑n

i=1

∑n

j=1 wij

(

y(si) − y(sj)

)2 ∑n

i=1

(

y(si) − ¯ y

)

where w is a spatial weights matrix. Some properties of Geary’s C:

  • 0 < C < 2
  • If C ≈ 1 then no spatial autocorrelation
  • If C > 1 then negative spatial autocorrelation
  • If C < 1 then positive spatial autocorrelation
  • Geary’s C is inversely related to Moran’s I

6

slide-9
SLIDE 9

NC SIDS & Geary’s C

Again using an adjacency matrix for w (shared county borders).

gearys_C = function(y, w) { n = length(y) y_bar = mean(y) y_i = y %*% t(rep(1,n)) y_j = t(y_i) num = sum(w * (y_i-y_j)^2) denom = sum( (y-y_bar)^2 ) ((n-1)/(2*sum(w))) * (num/denom) } gearys_C(y = nc$SID74, w = 1*st_touches(nc, sparse=FALSE)) ## [1] 0.8898868 7

slide-10
SLIDE 10

Spatial Correlogram

d = nc %>% st_centroid() %>% st_distance() %>% strip_class() breaks = seq(0, max(d), length.out = 21) d_cut = cut(d, breaks) adj_mats = map( levels(d_cut), function(l) { (d_cut == l) %>% matrix(ncol=100) %>% ‘diag<-‘(0) } ) d = data_frame( dist = breaks[-1], morans = map_dbl(adj_mats, morans_I, y = nc$SID74), gearys = map_dbl(adj_mats, gearys_C, y = nc$SID74) ) 8

slide-11
SLIDE 11

morans gearys 2e+05 4e+05 6e+05 2e+05 4e+05 6e+05 0.0 0.5 1.0 0.0 0.2 0.4 0.6

dist value var

morans gearys

9

slide-12
SLIDE 12

0.0 0.5 1.0 0.0 0.2 0.4 0.6

morans gearys

10

slide-13
SLIDE 13

Autoregressive Models

11

slide-14
SLIDE 14

AR Models - Time

Lets just focus on the simplest case, an AR(1) process yt = δ + ϕ yt−1 + wt where wt ∼ N(0, σ2) and |ϕ| < 1, then E(yt) =

δ

1 − ϕ Var(yt) =

σ2

1 − ϕ

12

slide-15
SLIDE 15

AR Models - Time - Joint Distribution

Previously we saw that an AR(1) model can be represented using a multivariate normal distribution

     

y1 y2 . . . yn

     

∼ N

     

δ

1−ϕ

     

1 1 . . . 1

     

,

σ2

1−ϕ

     

1

ϕ · · · ϕn−1 ϕ

1

· · · ϕn−2

. . . . . . ... . . .

ϕn−1 ϕn−2 · · ·

1

           

In writing down the likelihood we also saw that an AR 1 is 1st order Markovian, f y1 yn f y1 f y2 y1 f y3 y2 y1 f yn yn

1 yn 2

y1 f y1 f y2 y1 f y3 y2 f yn yn

1 13

slide-16
SLIDE 16

AR Models - Time - Joint Distribution

Previously we saw that an AR(1) model can be represented using a multivariate normal distribution

     

y1 y2 . . . yn

     

∼ N

     

δ

1−ϕ

     

1 1 . . . 1

     

,

σ2

1−ϕ

     

1

ϕ · · · ϕn−1 ϕ

1

· · · ϕn−2

. . . . . . ... . . .

ϕn−1 ϕn−2 · · ·

1

           

In writing down the likelihood we also saw that an AR(1) is 1st order Markovian, f(y1, . . . , yn) = f(y1) f(y2|y1) f(y3|y2, y1) · · · f(yn|yn−1, yn−2, . . . , y1)

= f(y1) f(y2|y1) f(y3|y2) · · · f(yn|yn−1)

13

slide-17
SLIDE 17

Competing Definitions for yt

yt = δ + ϕ yt−1 + wt vs. yt|yt−1 ∼ N(δ + ϕ yt−1, σ2) In the case of time, both of these definitions result in the same multivariate distribution for y.

14

slide-18
SLIDE 18

Competing Definitions for yt

yt = δ + ϕ yt−1 + wt vs. yt|yt−1 ∼ N(δ + ϕ yt−1, σ2) In the case of time, both of these definitions result in the same multivariate distribution for y.

14

slide-19
SLIDE 19

AR in Space

s1 s2 s3 s4 s5 s6 s7 s8 s9 s10

Even in the simplest spatial case there is no clear / unique ordering,

f y s1 y s10 f y s1 f y s2 y s1 f y s10 y s9 y s8 y s1 f y s10 f y s9 y s10 f y s1 y s2 y s3 y s10

Instead we need to think about things in terms of their neighbors /

  • neighborhoods. We will define N si to be the set of neighbors of location si.
  • If we define the neighborhood based on “touching” then

N s3 s2 s4

  • If we use distance within 2 units then N s3

s1 s2 s3 s4

  • etc.

15

slide-20
SLIDE 20

AR in Space

s1 s2 s3 s4 s5 s6 s7 s8 s9 s10

Even in the simplest spatial case there is no clear / unique ordering,

f( y(s1), . . . , y(s10))

= f(

y(s1)) f( y(s2)|y(s1))

· · · f(

y(s10|y(s9), y(s8), . . . , y(s1))

= f(

y(s10)) f( y(s9)|y(s10))

· · · f(

y(s1|y(s2), y(s3), . . . , y(s10))

= ? Instead we need to think about things in terms of their neighbors /

  • neighborhoods. We will define N si to be the set of neighbors of location si.
  • If we define the neighborhood based on “touching” then

N s3 s2 s4

  • If we use distance within 2 units then N s3

s1 s2 s3 s4

  • etc.

15

slide-21
SLIDE 21

AR in Space

s1 s2 s3 s4 s5 s6 s7 s8 s9 s10

Even in the simplest spatial case there is no clear / unique ordering,

f( y(s1), . . . , y(s10))

= f(

y(s1)) f( y(s2)|y(s1))

· · · f(

y(s10|y(s9), y(s8), . . . , y(s1))

= f(

y(s10)) f( y(s9)|y(s10))

· · · f(

y(s1|y(s2), y(s3), . . . , y(s10))

= ? Instead we need to think about things in terms of their neighbors /

  • neighborhoods. We will define N(si) to be the set of neighbors of location si.
  • If we define the neighborhood based on “touching” then

N(s3) = {s2, s4}

  • If we use distance within 2 units then N(s3) = {s1, s2, s3, s4}
  • etc.

15

slide-22
SLIDE 22

Defining the Spatial AR model

Here we will consider a simple average of neighboring observations, just like with the temporal AR model we have two options in terms of defining the autoregressive process,

  • Simultaneous Autogressve (SAR)

y(s) = δ + ϕ 1

|N(s)|

s′∈N(s)

y(s′) + N(0, σ2)

  • Conditional Autoregressive (CAR)

y(s)|y−s ∼ N

 δ + ϕ

1

|N(s)|

s′∈N(s)

y(s′), σ2

 

16

slide-23
SLIDE 23

Simultaneous Autogressve (SAR)

Using y(s) = δ + ϕ 1

|N(s)|

s′∈N(s)

y(s′) + N(0, σ2) we want to find the distribution of y =

(

y(s1), y(s2), . . . , y(sn)

)t

. First we need to define a weight matrix W where W

ij

1 N si if j N si

  • therwise

then we can write y as follows, y W y where

2 I 17

slide-24
SLIDE 24

Simultaneous Autogressve (SAR)

Using y(s) = δ + ϕ 1

|N(s)|

s′∈N(s)

y(s′) + N(0, σ2) we want to find the distribution of y =

(

y(s1), y(s2), . . . , y(sn)

)t

. First we need to define a weight matrix W where

{W}ij =

{

1/|N(si)| if j ∈ N(si)

  • therwise

then we can write y as follows, y W y where

2 I 17

slide-25
SLIDE 25

Simultaneous Autogressve (SAR)

Using y(s) = δ + ϕ 1

|N(s)|

s′∈N(s)

y(s′) + N(0, σ2) we want to find the distribution of y =

(

y(s1), y(s2), . . . , y(sn)

)t

. First we need to define a weight matrix W where

{W}ij =

{

1/|N(si)| if j ∈ N(si)

  • therwise

then we can write y as follows, y = δ + ϕ W y + ϵ where

ϵ ∼ N(0, σ2 I)

17

slide-26
SLIDE 26

A toy example

W =

  

1/2 1/2 1/2 1/2 1/2 1/2

  

18

slide-27
SLIDE 27

Back to SAR

y = δ + ϕ W y + ϵ

19

slide-28
SLIDE 28

Conditional Autogressve (CAR)

This is a bit trickier, in the case of the temporal AR process we actually went from joint distribution → conditional distributions (which we were then able to simplify). Since we don’t have a natural ordering we can’t get away with this (at least not easily). Going the other way, conditional distributions → joint distribution is difficult because it is possible to specify conditional distributions that lead to an improper joint distribution.

20

slide-29
SLIDE 29

Brook’s Lemma

For sets of observations x and y where p(x) > 0 ∀ x ∈ x and p(y) > 0 ∀ y ∈ y then p(y) p(x) =

n

i=1

p(yi | y1, . . . , yi−1, xi+1, . . . , xn) p(xi | x1, . . . , xi−1, yi+1, . . . , yn)

=

n

i=1

p(yi | x1, . . . , xi−1, yi+1, . . . , yn) p(xi | y1, . . . , yi−1, xi+1, . . . , xn)

21

slide-30
SLIDE 30

A simplified example

Let y = (y1, y2) and x = (x1, x2) then we can derive Brook’s Lemma for this case, p(y1, y2) = p(y1|y2)p(y2)

= p(y1|y2)

p(y2|x1) p(x1) p(x1|y2)

=

p(y1|y2) p(x1|y2) p(y2|x1) p(x1)

=

p(y1|y2) p(x1|y2) p(y2|x1) p(x1)

(p(x2|x1)

p(x2|x1)

)

=

p(y1|y2) p(x1|y2) p(y2|x1) p(x2|x1) p(x1, x2) p(y1, y2) p(x1, x2) = p(y1|y2) p(x1|y2) p(y2|x1) p(x2|x1)

22

slide-31
SLIDE 31

Utility?

Lets repeat that last example but consider the case where y = (y1, y2) but now we let x = (y1 = 0, y2 = 0) p(y1, y2) p(x1, x2) = p(y1, y2) p(y1 = 0, y2 = 0) p(y1, y2) = p(y1|y2) p(y1 = 0|y2) p(y2|y1 = 0) p(y2 = 0|y1 = 0) p(y1 = 0, y2 = 0) p(y1, y2) ∝ p(y1|y2) p(y2|y1 = 0) p(y1 = 0|y2)

p(y2|y1) p(y1|y2 = 0) p(y2 = 0|y1)

23

slide-32
SLIDE 32

As applied to a simple CAR

s1 s2

y(s1)|y(s2) ∼ N(ϕW12 y(s2), σ2) y(s2)|y(s1) ∼ N(ϕW21 y(s1), σ2) p y s1 y s2 p y s1 y s2 p y s2 y s1 p y s1 0 y s2 exp

1 2

2

y s1 W12 y s2

2

exp

1 2

2

y s2 W21 0 2 exp

1 2

2

W12y s2

2

exp 1 2

2

y s1 W12 y s2

2

y s2

2

W12y s2

2

exp 1 2

2

y s1

2

2 W12 y s1 y s2 y s2

2

exp 1 2

2 y

1 W12 W12 1 y 0 t 24

slide-33
SLIDE 33

As applied to a simple CAR

s1 s2

y(s1)|y(s2) ∼ N(ϕW12 y(s2), σ2) y(s2)|y(s1) ∼ N(ϕW21 y(s1), σ2) p( y(s1), y(s2))

p( y(s1)|y(s2)) p( y(s2)|y(s1) = 0) p( y(s1) = 0|y(s2))

exp(

1 2σ2 (y(s1) − ϕ W12 y(s2))2)

exp(

1 2σ2 (y(s2) − ϕ W21 0)2)

exp(

1 2σ2 (0 − ϕW12y(s2))2)

∝ exp

(

1 2σ2

(

(y(s1) − ϕ W12 y(s2))2 + y(s2)2 − (ϕW12y(s2))2)) ∝ exp

(

1 2σ2

(

y(s1)2 − 2ϕ W12 y(s1) y(s2) + y(s2)2))

∝ exp

(

1 2σ2 (y − 0)

(

1

−ϕW12 −ϕW12

1

)

(y − 0)t)

24

slide-34
SLIDE 34

Implicatiomns for y

µ = 0 Σ−1 =

1

σ2

(

1

−ϕW12 −ϕW12

1

)

=

1

σ2 (I − ϕ W) Σ = σ2(I − ϕ W)−1

we can then conclude that for y y s1 y s2

t,

y

2 I

W

1

which generalizes for all mean 0 CAR models.

25

slide-35
SLIDE 35

Implicatiomns for y

µ = 0 Σ−1 =

1

σ2

(

1

−ϕW12 −ϕW12

1

)

=

1

σ2 (I − ϕ W) Σ = σ2(I − ϕ W)−1

we can then conclude that for y = (y(s1), y(s2))t, y ∼ N

(

0, σ2(I − ϕ W)−1) which generalizes for all mean 0 CAR models.

25

slide-36
SLIDE 36

General Proof

Let y = (y(s1), . . . , y(sn)) and 0 = (y(s1) = 0, . . . , y(sn) = 0) then by Brook’s lemma,

p(y) p(0) =

n

i=1

p(yi|y1, . . . , yi−1, 0i+1, . . . , 0n) p(0i|y1, . . . , yi−1, 0i+1, . . . , 0n)

=

n

i=1

exp

(

1 2σ2

(

yi − ϕ∑

j<i Wij yj − ϕ∑ j>i 0j

)2)

exp

(

1 2σ2

(

0i − ϕ∑

j<i Wij yj − ϕ∑ j>i 0j

)2)

= exp

(

1 2σ2

n

i=1

(

yi − ϕ

j<i

Wij yj

)2

+

1 2σ2

n

i=1

(

ϕ

j<i

Wij yj

)2)

= exp

(

1 2σ2

n

i=1

y2

i − 2ϕyi

j<i

Wij yj

)

= exp

(

1 2σ2

n

i=1

y2

i − ϕ n

i=1 n

j=1

yi Wij yj

) (

if Wij = Wji

)

= exp

(

1 2σ2 yt(I − ϕW)y

)

26

slide-37
SLIDE 37

CAR vs SAR

  • Simultaneous Autogressve (SAR)

y(s) = ϕ

s′

Ws s′ Ws · y(s′) + ϵ y ∼ N(0, σ2 ((I − ϕW)−1)((I − ϕW)−1)t)

  • Conditional Autoregressive (CAR)

y(s)|y−s ∼ N

(∑

s′

Ws s′ Ws · y(s′), σ2

)

y ∼ N(0, σ2 (I − ϕW)−1)

27

slide-38
SLIDE 38

Generalization

  • Adopting different weight matrices, W
  • Between SAR and CAR model we move to a generic weight matrix

definition (beyond average of nearest neighbors)

  • In time we varied p in the AR(p) model, in space we adjust the weight

matrix.

  • In general having a symmetric W is helpful, but not required
  • More complex Variance (beyond σ2 I)
  • σ2 can be a vector (differences between areal locations)
  • E.g. since areal data tends to be aggregated - adjust variance based on

sample size

28