Optimal Design for Detecting Spatial Dependence
D. Gumprecht, W.G. Müller and J. Rodríguez-Díaz
University of Economics Vienna, Austria
Johannes-Kepler-University Linz, Austria
University of Salamanca, Spain
mODa 8, Almagro, Spain, June
Spatial dependence
“All things are related but nearby things are more related than distant things.” (Tobler, 1970: the first law of geography)
“Spatial dependency is the extent to which the value of an attribute in one location depends on the values of the attribute in nearby locations.” (Fotheringham et al, 2002).
“Spatial autocorrelation (…) is the correlation among values of a single variable strictly attributable to the proximity of those values in geographic space (…).” (Griffith, 2003).
“Hell is a place with no spatial dependence.” (Goodchild, 2002)
source: Anselin, 1988 (Columbus, Ohio crime)
Random or Clustered?
source: M. Goodchild, 2002
Spatial Randomness
– values observed at a location do not depend on values observed at neighboring locations
– observed spatial pattern of values is equally likely as any other spatial pattern
– the location of values may be altered without affecting the information content of the data
adapted from Goodchild, 2002
Spatial Proximity (Weight) Matrix
- Matrix $W$ ($n \times n$), where each element $w_{ij}$ represents a measure of nearness between regions $O_i$ and $O_j$
- Possible choices (with $w_{ij} = 0$ otherwise):
  $w_{ij} = 1$ if $O_i$ touches $O_j$
  $w_{ij} = 1$ if $\mathrm{distance}(O_i, O_j) \le d^*$

      A  B  C  D  E
  A   0  1  0  1  0
  B   1  0  1  1  1
  C   0  1  0  0  1
  D   1  1  0  0  1
  E   0  1  1  1  0
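The contiguity rule "$w_{ij} = 1$ if $O_i$ touches $O_j$" can be sketched in a few lines of plain Python. The neighbour lists below are read off the five-region example matrix; the function name `contiguity_matrix` is only an illustrative choice.

```python
# Regions and the regions they touch, read off the five-region example.
neighbours = {
    "A": ["B", "D"],
    "B": ["A", "C", "D", "E"],
    "C": ["B", "E"],
    "D": ["A", "B", "E"],
    "E": ["B", "C", "D"],
}

def contiguity_matrix(neighbours):
    """Return (labels, W) with w_ij = 1 if region i touches region j, else 0."""
    labels = sorted(neighbours)
    W = [[1 if b in neighbours[a] else 0 for b in labels] for a in labels]
    return labels, W

labels, W = contiguity_matrix(neighbours)
for label, row in zip(labels, W):
    print(label, *row)
```

Because "touches" is a symmetric relation, the resulting matrix is symmetric with a zero diagonal.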
Spatial weight matrices based on distance
- Distances $d_{ij}$ are usually measured centroid to centroid.
- Most common choices are the inverse distance $w_{ij} = (1 - \mathbb{1}\{i=j\}) / d_{ij}$,
- or the negative exponential $w_{ij} = \exp\{-\delta d_{ij}\} - \mathbb{1}\{i=j\}$.
- Row standardization $\tilde{w}_{ij} = w_{ij} / \sum_j w_{ij}$ is employed to keep spatial parameters comparable.
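A minimal sketch of the two distance-based schemes and of row standardization, in plain Python. The centroid coordinates and the value of `delta` are hypothetical, and the centroids are assumed to be distinct so that no off-diagonal distance is zero.

```python
import math

def distance_weights(centroids, kind="inverse", delta=1.0):
    """Build a distance-based weight matrix with a zero diagonal.

    kind="inverse":      w_ij = 1 / d_ij            for i != j
    kind="exponential":  w_ij = exp(-delta * d_ij)  for i != j
    """
    n = len(centroids)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # the indicator 1{i=j} zeroes the diagonal
            d = math.dist(centroids[i], centroids[j])
            W[i][j] = 1.0 / d if kind == "inverse" else math.exp(-delta * d)
    return W

def row_standardize(W):
    """Divide each row by its sum so that every row sums to one."""
    return [[w / sum(row) for w in row] for row in W]

# Hypothetical centroids, chosen only for illustration.
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
W_inv = row_standardize(distance_weights(centroids, "inverse"))
W_exp = row_standardize(distance_weights(centroids, "exponential", delta=0.5))
```

After standardization each row sums to one, so the spatial lag of a variable is a weighted average of the values at the other locations.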
source: Anselin, 1988
Moran Scatter Plots
We can now draw a scatter plot of a variable y against the “spatial lag” of y, Wy.
For a row-standardized W, the slope of the regression line is Moran’s I, which can be interpreted as the spatial autocorrelation, i.e. the correlation between the variable y and its “spatial lag” Wy.
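The link between the scatter plot and Moran’s I can be checked numerically. The sketch below uses the five-region contiguity example in row-standardized form and hypothetical values for y; the helper names are illustrative. It computes Moran’s I directly and as the least-squares slope of Wy on y.

```python
def morans_i(y, W):
    """Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = y - mean(y)."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    s0 = sum(sum(row) for row in W)
    cross = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(v * v for v in z)

def spatial_lag(y, W):
    """Spatial lag Wy: a weighted average of the other values when W is row-standardized."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]

def ols_slope(x, y):
    """Slope of the least-squares line of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Row-standardized contiguity matrix for regions A..E and hypothetical values y.
C = [[0, 1, 0, 1, 0], [1, 0, 1, 1, 1], [0, 1, 0, 0, 1], [1, 1, 0, 0, 1], [0, 1, 1, 1, 0]]
W = [[c / sum(row) for c in row] for row in C]
y = [2.0, 3.5, 1.0, 4.0, 2.5]

print(morans_i(y, W), ols_slope(y, spatial_lag(y, W)))
```

The two numbers agree up to rounding. Without row standardization the slope and Moran’s I generally differ.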
Tests for Spatial Dependence
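- Moran, 1950:
  $$I = \frac{n}{S_0} \, \frac{\sum_i \sum_j w_{ij} (y_i - \bar{y})(y_j - \bar{y})}{\sum_i (y_i - \bar{y})^2}, \qquad S_0 = \sum_i \sum_j w_{ij}$$
  (for a row-standardized $W$, $S_0 = n$ and the leading factor equals one)
- Cliff and Ord, 1981, for regression residuals from $y = X\beta + \varepsilon$:
  $$I = \frac{y^T M \tfrac{1}{2}(W + W^T) M y}{y^T M y}, \qquad M = \mathbf{I} - X (X^T X)^{-1} X^T$$
  with $\mathbf{I}$ the identity matrix
- Anselin and Kelejian, 1997, investigate $y = X\beta + \rho W y + \varepsilon$.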
Random or Clustered?
Moran’s I = -0.003
Moran’s I = 0.511
Distribution of Moran’s I under H0: no spatial autocorrelation
- Inference is usually based on a normal approximation, using a standardized z-value obtained from the mean and variance of the statistic, i.e. $z(I) = (I - \mathrm{E}[I]) / \sqrt{\mathrm{Var}[I]}$,
- which are given by (see Henshaw, 1966)
  $$\mathrm{E}[I \mid H_0] = \frac{\mathrm{tr}(K)}{n - k}, \qquad \mathrm{Var}[I \mid H_0] = \frac{2 \{ (n - k)\, \mathrm{tr}(K^2) - \mathrm{tr}(K)^2 \}}{(n - k)^2 (n - k + 2)},$$
  where $K = \tfrac{1}{2} M (W + W^T) M$.
- A saddle-point approximation and the exact distribution were derived by Tiefelsdorf, 2000.
- Asymptotic distributions under deviations can be found in Kelejian and Prucha, 2001.
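A small sketch of the null moments, under a simplifying assumption that is not part of the slides: the regression contains only an intercept (k = 1), so that M reduces to the centering matrix. Plain Python lists are used to keep the example dependency-free; the helper names are illustrative.

```python
import math

def matmul(A, B):
    """Plain-Python matrix product for small dense matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def moran_null_moments(W):
    """E[I|H0] and Var[I|H0] for an intercept-only regression (assumed: k = 1)."""
    n, k = len(W), 1
    # With X a column of ones, M = I - X (X'X)^{-1} X' reduces to I - 11'/n.
    M = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]
    S = [[0.5 * (W[i][j] + W[j][i]) for j in range(n)] for i in range(n)]
    K = matmul(matmul(M, S), M)  # K = 1/2 M (W + W') M
    tr_k, tr_k2 = trace(K), trace(matmul(K, K))
    mean = tr_k / (n - k)
    var = 2.0 * ((n - k) * tr_k2 - tr_k ** 2) / ((n - k) ** 2 * (n - k + 2))
    return mean, var

def z_value(i_obs, mean, var):
    """Standardized statistic z(I) = (I - E[I]) / sqrt(Var[I])."""
    return (i_obs - mean) / math.sqrt(var)

# Row-standardized five-region contiguity matrix.
C = [[0, 1, 0, 1, 0], [1, 0, 1, 1, 1], [0, 1, 0, 0, 1], [1, 1, 0, 0, 1], [0, 1, 1, 1, 0]]
W = [[c / sum(row) for c in row] for row in C]
mean, var = moran_null_moments(W)
print(mean, var)
```

For a row-standardized W and an intercept-only model, tr(K) = -1, so the mean equals the familiar -1/(n - 1). With more regressors, M has to be built from the full design matrix X.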
Distribution of Moran’s I under HA: spatial autocorrelation
- We assume that the data are generated by a so-called SAR model, i.e. $y = X\beta + \varepsilon$, where $\varepsilon = \rho W \varepsilon + u$, with $u$ i.i.d.
- The normal approximation holds, and the mean and variance are now given by (see Tiefelsdorf, 2000)
  $$\mathrm{E}[I \mid H_A] = \int_0^\infty \prod_{i=1}^{n-k} (1 + 2 \lambda_i t)^{-1/2} \cdot \sum_{i=1}^{n-k} \frac{h^*_{ii}}{1 + 2 \lambda_i t} \, dt,$$
  where the $h^*_{ii}$ are derived from functions of the covariance matrix of the errors, and
  $$\mathrm{Var}[I \mid H_A] = \mathrm{E}[I^2 \mid H_A] - \mathrm{E}[I \mid H_A]^2$$
  with
  $$\mathrm{E}[I^2 \mid H_A] = \int_0^\infty \prod_{i=1}^{n-k} (1 + 2 \lambda_i t)^{-1/2} \cdot \sum_{i=1}^{n-k} \sum_{j=1}^{n-k} \frac{h^*_{ii} h^*_{jj} + 2 (h^*_{ij})^2}{(1 + 2 \lambda_i t)(1 + 2 \lambda_j t)} \, t \, dt.$$
Random or Clustered?
Moran’s I = 0.511, z(I) = 5.675
Moran’s I = -0.003, z(I) = 0.190
A Design Criterion
- Purpose: minimize the Type II error, i.e. the probability that, given the alternative, Moran’s test accepts the null hypothesis of no spatial autocorrelation:
  $$\min \; P_{H_A}\!\left( \frac{I - \mathrm{E}(I \mid H_0)}{\sqrt{\mathrm{Var}(I \mid H_0)}} \le \Phi^{-1}(1 - \alpha) \right)$$
- This leads us to the following design problem:
  $$\xi^* = \arg\min_{\xi \in X} \Psi = \arg\min_{\xi \in X} \Phi\!\left( \frac{\Phi^{-1}(1 - \alpha) \sqrt{\mathrm{Var}[I \mid H_0]} + \mathrm{E}[I \mid H_0] - \mathrm{E}[I \mid H_A]}{\sqrt{\mathrm{Var}[I \mid H_A]}} \right)$$
- Of course we cannot use classical design theory, since the power $1 - \Psi$ is not convex.
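Given the four moments, the criterion Ψ and the power 1 - Ψ under the normal approximation take only a few lines. This is a sketch using the standard library’s `statistics.NormalDist`; the function names and the moment values in the usage lines are hypothetical and do not come from the Anselin data.

```python
from math import sqrt
from statistics import NormalDist

def type_two_error(e0, v0, ea, va, alpha=0.05):
    """Psi = Phi((Phi^{-1}(1 - alpha) * sqrt(v0) + e0 - ea) / sqrt(va)).

    e0, v0: mean and variance of Moran's I under H0
    ea, va: mean and variance of Moran's I under HA
    """
    std_normal = NormalDist()
    # Critical value of I on its original scale, from the one-sided test under H0.
    critical = std_normal.inv_cdf(1.0 - alpha) * sqrt(v0) + e0
    return std_normal.cdf((critical - ea) / sqrt(va))

def power(e0, v0, ea, va, alpha=0.05):
    """Power 1 - Psi of the one-sided Moran test."""
    return 1.0 - type_two_error(e0, v0, ea, va, alpha)

# Hypothetical moments, for illustration only.
weak = power(e0=-0.25, v0=0.04, ea=0.1, va=0.05)
strong = power(e0=-0.25, v0=0.04, ea=0.3, va=0.05)
print(weak, strong)
```

When the moments under HA coincide with those under H0, the power collapses to α, which is a convenient sanity check. A design search would evaluate this function once per candidate design ξ.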
Example: Anselin data
Moran’s I = 0.511, z(I) = 5.675, 1 - Ψ = 0.799
Exchange type algorithms
- E.g. from a given design ξ and a set of candidate points C, exchange the pair (one design point, one candidate point) which maximizes the decrease in Ψ (Fedorov, 1972; requires evaluation of the criterion n(N-n) times at each step).
- Iterate as long as there is improvement.
- Variants by Wynn, 1970, Meyer &
Nachtsheim, 1995, Nguyen, 2002, etc.
- Simulated annealing, genetic algorithms
as alternatives?
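A minimal sketch of a best-pair exchange step in the spirit of the Fedorov-type procedure described above. It is not the authors’ implementation: the function name, the `max_iter` safeguard and the toy criterion are assumptions, and in the spatial setting `criterion` would be a function returning Ψ for a design.

```python
def fedorov_exchange(design, candidates, criterion, max_iter=100):
    """Repeatedly apply the single swap that lowers the criterion the most."""
    design = list(design)
    best = criterion(design)
    for _ in range(max_iter):
        best_swap = None
        outside = [c for c in candidates if c not in design]
        for i in range(len(design)):      # n design points ...
            for c in outside:             # ... times (N - n) candidate points
                trial = design[:i] + [c] + design[i + 1:]
                value = criterion(trial)
                if value < best:
                    best, best_swap = value, (i, c)
        if best_swap is None:             # no exchange improves the criterion
            break
        i, c = best_swap
        design[i] = c
    return design, best

# Toy stand-in for Psi: prefer points close to 4.5 (purely illustrative).
def toy_criterion(points):
    return sum((x - 4.5) ** 2 for x in points)

final_design, final_value = fedorov_exchange([0, 1, 2], range(10), toy_criterion)
print(sorted(final_design), final_value)
```

Each pass evaluates the criterion n(N - n) times, matching the count quoted above. Like any greedy exchange, it may stop at a local optimum, which is one motivation for the stochastic alternatives mentioned.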
Example 2: Anselin data
Moran’s I = 0.511, z(I) = 5.675, 1 - Ψ = 0.799
Moran’s I = 0.417, z(I) = 1.914, 1 - Ψ = 0.983
References (www.ifas.jku.at)
- Anselin, Luc. 1988. Spatial Econometrics: Methods and Models. Dordrecht, Amsterdam.
- Cliff, Andrew, and Keith Ord. 1981. Spatial Processes: Models and Applications. London: Pion.
- Müller, Werner G. 2007. Collecting Spatial Data. Berlin, Heidelberg: Springer-Verlag.
- Tiefelsdorf, Michael. 2000. Modelling Spatial Processes. Berlin, Heidelberg, New York: Springer-Verlag.
www.endlessforest.org
Thank you for your attention!
source: O'Sullivan and Unwin, 2002
Is it Spatially Random? Tougher than it looks to decide!
- Fact: It is observed that about twice as many people sit catty-corner rather than opposite at tables in a restaurant
- Conclusion: psychological preference for nearness
- In actuality: an outcome to be expected from a random process: two ways to sit opposite, but four ways to sit catty-corner
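The two-versus-four count can be verified by enumeration. The sketch assumes a square table with one seat per side, which the slide does not state explicitly, and two diners choosing seats at random.

```python
from itertools import combinations

# Assumed layout: four seats around a square table, one per side, numbered clockwise.
seats = [0, 1, 2, 3]

def relation(a, b):
    """Seats two steps apart face each other; seats one step apart share a corner."""
    return "opposite" if (b - a) % 4 == 2 else "catty-corner"

counts = {"opposite": 0, "catty-corner": 0}
for a, b in combinations(seats, 2):
    counts[relation(a, b)] += 1

print(counts)  # {'opposite': 2, 'catty-corner': 4}
```

Of the six equally likely seat pairs, four are catty-corner and two are opposite, so the observed 2:1 ratio is what pure chance predicts.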
source: M. Goodchild
Why Spatial Autocorrelation Matters
- Spatial autocorrelation is of interest in its own right because it suggests the operation of a spatial process
- Additionally, most statistical analyses are based on the assumption that the values of observations in each sample are independent of one another
– Positive spatial autocorrelation violates this, because samples taken from nearby areas are related to each other and are not independent
- In ordinary least squares (OLS) regression, for example, the correlation coefficients will be biased and their precision exaggerated
– Bias implies correlation coefficients may be higher than they really are
- They are biased because the areas with higher concentrations of events will have a greater impact on the model estimate
– Exaggerated precision (lower standard error) implies they are more likely to be found “statistically significant”
- Precision is overestimated because, since events tend to be concentrated, there are actually fewer independent observations than is being assumed.