[PPT] - Areal Unit Data Regular Grids or Lattices Large Point-referenced PowerPoint Presentation

SLIDE 1

Areal Unit Data

Regular Grids or Lattices Large Point-referenced Datasets

Is there spatial pattern? We usually want to smooth the data. But, a tricky question. Inference for new areal units? Descriptive/algorithmic vs. Model-based

Basics of Areal Data Models – p. 1

SLIDE 2

Areal unit data

Basics of Areal Data Models – p. 2

SLIDE 3

Proximity matrices

W, entries wij (with wii = 0). Choices for wij: wij = 1 if i, j share a common boundary (possibly a

common vertex)

wij is an inverse distance between units wij = 1 if distance between units is ≤ K wij = 1 for m nearest neighbors. W is typically symmetric, but need not be

W: standardize row i by wi+ =

j wij (so matrix is now

row stochastic, but probably no longer symmetric).

W elements often called “weights”; interpretation

Could also define first-order neighbors W (1), second-order neighbors W (2), etc.

Basics of Areal Data Models – p. 3

SLIDE 4

Measures of spatial association

Moran’s I: essentially an “areal covariogram"

I = n

i

j wij(Yi − ¯

Y )(Yj − ¯ Y ) (

i=j wij) i(Yi − ¯

Y )2

Geary’s C: essentially an “areal variogram"

C = (n − 1)

i

j wij(Yi − Yj)2

(

i=j wij) i(Yi − ¯

Y )2

Both are asymptotically normal if Yi are i.i.d.; Moran has mean −1/(n − 1) ≈ 0, Geary has mean 1 Better significance testing by comparing to a collection

f say 1000 random permutations of the Yi

Basics of Areal Data Models – p. 4

SLIDE 5

Measures of spatial association (cont’d)

<503 504-525 526-562 >563

Figure 1: Choropleth map of 1999 average verbal SAT scores, lower 48 U.S. states.

Basics of Areal Data Models – p. 5

SLIDE 6

Measures of spatial association (cont’d)

For these data, we obtain a Moran’s I of 0.5833, with associated standard error estimate 0.0920 ⇒ very strong evidence against H0 : no spatial correlation We obtain a Geary’s C of 0.3775, with associated standard error estimate 0.1008 ⇒ again, very strong evidence against H0 (departure from 1) Warning: These data have not been adjusted for covariates, such as the proportion of students who take the exam (Midwestern colleges have historically relied

n the ACT, not the SAT; only the best and brightest

students in these states would bother taking the SAT)

Basics of Areal Data Models – p. 6

SLIDE 7

Correlogram (via Moran’s I)

distance rho(d) 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4

1.0
0.5

0.0 0.5 1.0

Replace wij with w(1)

ij taken from W (1) ⇒ I(1)

Replace wij with w(2)

ij taken from W (2) ⇒ I(2), etc.

Plot I(r) vs. r; expect an initial decline across r followed by variation around 0 ⇒ spatial pattern! spatial analogue of the temporal lag autocorrelation plot

Basics of Areal Data Models – p. 7

SLIDE 8

Rasterized binary data map

Basics of Areal Data Models – p. 8

SLIDE 9

Binary data correlogram

A version of a correlogram for a binary map, using two-way tables and log odds ratios at pixel level Note strongest pattern is to the north (N), but in no direction are the values ≈ 0 even at 40 km

Basics of Areal Data Models – p. 9

SLIDE 10

Spatial smoothers

To smooth Yi, replace with ˆ

Yi =

i wijYj

wi+

More generally, we could include the value actually

bserved for unit i, and revise our smoother to

(1 − α)Yi + α ˆ Yi

For 0 < α < 1, this is a linear (convex) combination in “shrinkage" form. How to choose α? Finally, we could try model-based smoothing, i.e., based on E(Yi|Data), i.e., the mean of the predictive

distribution. Smoothers then emerge as byproducts of

the hierarchical spatial models we use to explain the Yi’s

Basics of Areal Data Models – p. 10

SLIDE 11

Markov random fields

Consider Y = (Y1, Y2, ..., Yn) and the set of densities

{p(yi|yj, j = i)}

We know p(y1, y2, ...yn) determines {p(yi|yj, j = i)} (the set of full conditional distributions) Does {p(yi|yj, j = i)} determine p(y1, y2, ...yn)??? We need the notion of compatibility. With two variables, when are p(y1|y2) and p(y2|y1) compatible? Not always, e.g., p(y1|y2) = N(a + by2, σ2

1) and

p(y2|y1) = N(c + dy3

1, σ2 2)

Basics of Areal Data Models – p. 11

SLIDE 12

Brook’s Lemma

If the full conditionals are compatible, then Brook’s Lemma provides a way to construct the joint distribution from the full conditionals We can write the joint distribution as

p(y1, . . . , yn) = p(y1, y2, . . . , yn) p(y10, y2, . . . , yn) p(y2, y10, y3, . . . , yn) p(y20, y10, y3, . . . , yn) . . . p(yn, y10, . . . , yn−1,0) p(yn0, y10, . . . , yn−1,0)p(y10, . . . , yn0)

Replacing each joint distributions with conditional × marginal, the marginal terms cancel and we have

p(y1, . . . , yn) = p(y1|y2, . . . , yn) p(y10|y2, . . . , yn) p(y2|y10, y3, . . . , yn) p(y20|y10, y3, . . . , yn) . . . p(yn|y10, . . . , yn−1,0) p(yn0|y10, . . . , yn−1,0)p(y10, . . . , yn0)

Basics of Areal Data Models – p. 12

SLIDE 13

Brook’s Lemma cont.

We have the joint distribution on the left side in terms of the full conditional distributions on the right side And, if left side is proper, since it integrates to 1, the normalizing constant is determined by integrating the right side and then rescaling to 1 We have a constructive way to retrieve the joint distribution from the full conditional distributions Useful in many other problems

Basics of Areal Data Models – p. 13

SLIDE 14

“Local” modeling

Suppose we specify the full conditionals such that

p(yi|yj, j = i) = p(yi|yj ∈ ∂i) ,

where ∂i is the set of neighbors of cell (region) i. When does {p(yi|yj ∈ ∂i)} determine p(y1, y2, ...yn)? Def’n: a clique is a set of cells such that each element is a neighbor of every other element Def’n: a potential function of order k is a positive function of k arguments that is exchangeable in these

arguments. Potential of order 2 is Q(yi, yj) with

Q(yi, yj) = Q(yj, yi)

Def’n: p(y1, . . . , yn) is a Gibbs distribution if, as a function of the yi, it is a product of potentials on cliques. With potentials of order 2, p(y1, . . . , yn) = Πi<jQ(yi, yj)

Basics of Areal Data Models – p. 14

SLIDE 15

“local” modeling, cont.

For a continuous variable, with k = 2, we might take

Q(yi, yj) = exp(−wi,j(yi − yj))2

For binary data, k = 2, we might take

Q(yi, yj) = I(yi = yj) = yiyj + (1 − yi)(1 − yj)

Cliques of size 1 ⇔ independence Cliques of size 2 with above Q for continuous variables and wi,j = I(i ∼ j) ⇔ pairwise difference form

p(y1, y2, ...yn) ∝ exp

  − 1

2τ2

i,j

(yi − yj)2I(i ∼ j)

   and therefore p(yi|yj, j = i) = N(

j∈∂i yj/mi, τ2/mi),

where mi is the number of neighbors of i No interest in k > 2.

Basics of Areal Data Models – p. 15

SLIDE 16

Two primary results

Hammersley-Clifford Theorem: If we have a Markov Random Field (i.e., {p(yi|yj ∈ ∂i)} uniquely determine

p(y1, y2, ...yn)), then the latter is a Gibbs distribution

Geman and Geman result : If we have a joint Gibbs distribution, i.e., as defined above, then we have a Markov Random Field

Basics of Areal Data Models – p. 16

SLIDE 17

Conditional autoregressive (CAR) model

Gaussian (autonormal) case

p(yi|yj, j = i) = N

 

j

bijyj, τ2

i

  Using Brook’s Lemma we can obtain

p(y1, y2, ...yn) ∝ exp

−1

2y′D−1(I − B)y

where B = {bij} and D is diagonal with Dii = τ2

i .

⇒ suggests a multivariate normal distribution with µY = 0 and ΣY = (I − B)−1D D−1(I − B) symmetric requires bij

τ 2

i = bji

τ 2

j for all i, j

Basics of Areal Data Models – p. 17

SLIDE 18

CAR Model (cont’d)

Returning to W, let bij = wij/wi+ and τ2

i = τ2/wi+, so

p(y1, y2, ...yn) ∝ exp

− 1

2τ2y′(Dw − W)y

.

Dw diagonal with (Dw)ii = wi+ and with algebra, p(y1, y2, ...yn) ∝ exp

  − 1

2τ2

i=j

wij(yi − yj)2

   Intrinsic autoregressive (IAR) model! Improper since (Dw − W)1 = 0, so requires a constraint – say,

i yi = 0

So, not a data model, a random effects model!

τ2 represents both dispersion and spatial dependence

Basics of Areal Data Models – p. 18

SLIDE 19

CAR Model Issues

With τ2 unknown, what to do with power of τ2 in joint distribution? (n− # of “islands")

Basics of Areal Data Models – p. 19

SLIDE 20

CAR Model Issues

With τ2 unknown, what to do with power of τ2 in joint distribution? (n− # of “islands") “Proper version:” replace Dw − W by Dw − ρW, and choose ρ so that Σy = (Dw − ρW)−1 exists! This in turn implies Yi|Yj=i ∼ N(ρ

j wijYj, τ2/mi)

Basics of Areal Data Models – p. 19

SLIDE 21

CAR Model Issues

With τ2 unknown, what to do with power of τ2 in joint distribution? (n− # of “islands") “Proper version:” replace Dw − W by Dw − ρW, and choose ρ so that Σy = (Dw − ρW)−1 exists! This in turn implies Yi|Yj=i ∼ N(ρ

j wijYj, τ2/mi)

“To ρ or not to ρ?”

Basics of Areal Data Models – p. 19

SLIDE 22

CAR Model Issues

With τ2 unknown, what to do with power of τ2 in joint distribution? (n− # of “islands") “Proper version:” replace Dw − W by Dw − ρW, and choose ρ so that Σy = (Dw − ρW)−1 exists! This in turn implies Yi|Yj=i ∼ N(ρ

j wijYj, τ2/mi)

“To ρ or not to ρ?”

Advantages:

Basics of Areal Data Models – p. 19

SLIDE 23

CAR Model Issues

With τ2 unknown, what to do with power of τ2 in joint distribution? (n− # of “islands") “Proper version:” replace Dw − W by Dw − ρW, and choose ρ so that Σy = (Dw − ρW)−1 exists! This in turn implies Yi|Yj=i ∼ N(ρ

j wijYj, τ2/mi)

“To ρ or not to ρ?”

Advantages: makes distribution proper

Basics of Areal Data Models – p. 19

SLIDE 24

CAR Model Issues

With τ2 unknown, what to do with power of τ2 in joint distribution? (n− # of “islands") “Proper version:” replace Dw − W by Dw − ρW, and choose ρ so that Σy = (Dw − ρW)−1 exists! This in turn implies Yi|Yj=i ∼ N(ρ

j wijYj, τ2/mi)

“To ρ or not to ρ?”

Advantages: makes distribution proper adds parametric flexibility

Basics of Areal Data Models – p. 19

SLIDE 25

CAR Model Issues

With τ2 unknown, what to do with power of τ2 in joint distribution? (n− # of “islands") “Proper version:” replace Dw − W by Dw − ρW, and choose ρ so that Σy = (Dw − ρW)−1 exists! This in turn implies Yi|Yj=i ∼ N(ρ

j wijYj, τ2/mi)

“To ρ or not to ρ?”

Advantages: makes distribution proper adds parametric flexibility

ρ = 0 interpretable as independence

Basics of Areal Data Models – p. 19

SLIDE 26

CAR Models with ρ parameter

Disadvantages: why should we expect Yi to be a proportion of average

f neighbors – a sensible spatial interpretation?

calibration of ρ as a correlation, e.g.,

ρ = 0.80 yields 0.1 ≤ Moran’s I ≤ 0.15, ρ = 0.90 yields 0.2 ≤ Moran’s I ≤ 0.25, ρ = 0.99 yields Moran’s I ≤ 0.5

So, used with random effects, scope of spatial pattern may be limited

Basics of Areal Data Models – p. 20

SLIDE 27

More CAR modeling

Again, CAR is an improper prior for modeling random effects with inverse covariance matrix 1

τ 2Q where

Q = (DW − W)

To make a data model, Yi = µi + φi + ǫi where φi from a CAR and ǫi are i.i.d. N(0, γ2), i.e.,

Yi|µi, φi ∼ N(µi + φi, γ2)

Two variance components - τ2 captures structured spatial variation, γ2 captures unstructured variation If Q were full rank Σφ+ǫ = τ2Q−1 + γ2In. With reparametrization becomes σ2(λQ−1 + (1 − λ)In)

Basics of Areal Data Models – p. 21

SLIDE 28

cont.

But Q not full rank so instead write (Leroux et al.)

Σ−1

φ+ǫ = 1 σ2(λQ + (1 − λ)In)

Now Σ−1

φ+ǫ full rank

Σφ+ǫ = σ2(λQ + (1 − λ)In)−1

Corresponds to marginalization. As a result, we have random effects ηi with η ∼ N(0, Σφ+ǫ) Now E(ηi|ηj, j = i) =

λ 1−λ+λmi

j∼i ηj

Now Var(ηi|ηj, j = i) =

σ2 1−λ+λmi

λ = 1 is IAR model, λ = 0 is independence model

Basics of Areal Data Models – p. 22

SLIDE 29

cont.

Another variation allows weights that are spatially varying, directional, adaptive Now let ˜

wij = wijeZi where Zi = Z(s∗

i ) with s∗ i the

centroid of areal unit i and Z(·) a mean 0 Gaussian process (Berrocal et al.) A conditional CAR model, given {Zi} Useful for image analysis, i.e., for models such as

Yi = µi + φi where i indexes pixels, Y is a blurred

version of true image µ and the φi follow this conditional CAR specification

Basics of Areal Data Models – p. 23

SLIDE 30

Comments

The CAR specifies Σ−1

Y , not ΣY (as in the point-level

modeling case), so does not directly model association

(Σ−1

Y )ii = 1/τ2 i ; (Σ−1 Y )ij = 0 ⇔ cond’l independence

Prediction at new sites is ad hoc, in that if

p(y0|y1, y2, ...yn) = N(

j