HELSINGIN YLIOPISTO HELSINGFORS UNIVERSITET UNIVERSITY OF HELSINKI

Spatial Data Mining: Spatial modelling

Antti Leino antti.leino@cs.helsinki.

Department of Computer Science

Spatial dependence

Everything is related to everything else, but nearby things are more related than distant things. This is usually true even for spatially discrete phenomena.

Such phenomena typically depend on underlying factors that are

  • numerous
  • not easy to measure
  • spatially continuous

In other words, spatial correlation is an approximation. Still, a useful one.

Different scales

It is sometimes useful to divide spatial dependence into two parts:

  • First order effects: differences in intensity and other large-scale variation
  • Second order effects: correlation between neighbouring places and other small-scale variation

First order variation

[Figure: distribution of the lake name Mustalampi 'Black Pond' and a kernel estimate of its intensity]

Second order variation

[Figure: K function for the lake name Mustalampi]

The K function is a measure of attraction between neighbouring instances. Red: theoretical value for no attraction; blue: estimated value assuming constant intensity; green: estimated value assuming variable intensity.

First or second order effects?

The same phenomenon can be modelled as either

  • small-scale variation in intensity, or
  • large-scale spatial autocorrelation

In other words,

  • first order methods can be used for detailed study
  • second order methods can be used at low resolutions

The distinction between first and second order effects is largely a decision made during modelling. The choice has to be based on the goals of the study.


Dealing with space

No (a priori) direction

Correlations exist in a two-dimensional space, so it is not reasonable to assume that correlation is directional.

Hence there is no obvious definition for

  • neighbourhood in point patterns
  • proximity in area data

Boundary effects

Observations do not typically cover the whole phenomenon. In reality, correlation extends into the unobserved areas, but these are not available for analysis.

Background concepts

Statistics commonly makes certain methodological assumptions:

  • Null hypothesis: the phenomenon is completely random
  • Goal: show that the null hypothesis is invalid
  • Usually: phenomena follow the normal distribution

What does this mean for spatial data?

  • Complete spatial randomness
  • A suitable probability distribution

Modelling spatial randomness

A spatial stochastic process is a statistical model for a spatial phenomenon. It is represented by the joint probability distribution of a set of random variables:

  • $\{X(s), s \in R\}$ for point data
  • $\{Y(A), A \subseteq R\}$ for area data

Normally only one realisation is observed: the actual values of the variable in each location.

Modelling point patterns

Randomness: the Poisson process, i.e. independent events happening with a constant intensity λ. In its basic form it is one-dimensional, e.g. in time:

  • The probability of an event happening is the same in every time slot of equal length
  • The expected number of events in a time slot of length t is $E(X(t)) = \lambda t$
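A minimal sketch of simulating the one-dimensional Poisson process in Python, using the standard fact that the gaps between consecutive events are independent exponential variables with rate λ. The function name `simulate_poisson_1d` is hypothetical, not from the course material.

```python
import random

def simulate_poisson_1d(lam, t_max, rng=random):
    """Event times of a homogeneous Poisson process with intensity lam on [0, t_max).

    The gaps between consecutive events are independent Exponential(lam) variables.
    """
    times = []
    t = rng.expovariate(lam)
    while t < t_max:
        times.append(t)
        t += rng.expovariate(lam)
    return times

rng = random.Random(1)
counts = [len(simulate_poisson_1d(2.0, 10.0, rng)) for _ in range(2000)]
print(sum(counts) / len(counts))  # should be close to lam * t = 20
```

Averaging over many simulated sequences recovers the expected count λt.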

Poisson process: example

Two time sequences generated from a Poisson process with λ = 2:

  • A (•): 24 events
  • B (◦): 17 events

Poisson process: example

Probability distribution of the number of events, with λ = 2 and t = 10:

$$X(t) \sim \text{Poisson}(20)$$
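The probabilities in the example can be computed directly from the Poisson probability mass function. A sketch with λt = 20 as above; `poisson_pmf` is a hypothetical helper, and 17 and 24 are the counts of sequences B and A.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu), computed in log space to avoid overflow."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

mu = 2.0 * 10.0  # lambda * t from the example
for k in (17, 20, 24):  # the counts of sequences B and A, and the mean
    print(k, round(poisson_pmf(k, mu), 4))

# Probability that a sequence has between 17 and 24 events (inclusive)
print(round(sum(poisson_pmf(k, mu) for k in range(17, 25)), 3))
```

Both observed counts lie within one standard deviation (√20 ≈ 4.5) of the mean, so two sequences from the same process can look quite different.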


Poisson process: from one to two dimensions

It is easy to extend the Poisson process to the two-dimensional case. Again, the intensity λ is constant, and the expected number of events in region A depends on the intensity and the area of A: $E(X(A)) = \lambda|A|$. The spatial Poisson process models what would happen if the events were independent of each other:

  • no first order variation
  • no second order effects

First order variation: intensity

Instead of a constant intensity λ, use an intensity function

$$\lambda(s) = \lim_{|ds| \to 0} \frac{E(X(ds))}{|ds|}$$

where

  • $ds$ is a neighbourhood of point $s$
  • $E(X(ds))$ is the expected number of points in this neighbourhood
  • $|ds|$ is the size of the neighbourhood

The intensity at point s can be viewed as the density of events in an infinitely small neighbourhood of s.

Using the intensity function

A Poisson process can use the intensity function instead of a constant intensity. Such a heterogeneous Poisson process models the first order variation of a point pattern. The expected number of events in a region A is

$$E(X(A)) = \int_A \lambda(s)\,ds$$
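The integral $\int_A \lambda(s)\,ds$ rarely has a closed form, so in practice it is approximated numerically. A minimal sketch using the midpoint rule on a rectangular region; the region, the grid size `n` and `example_intensity` are hypothetical. With a constant intensity the result reduces to $\lambda|A|$.

```python
def expected_count(lam, x0, x1, y0, y1, n=200):
    """Approximate E(X(A)), the integral of lam(x, y) over A = [x0, x1] x [y0, y1].

    Midpoint rule on an n-by-n grid of cells: sum lam at each cell centre,
    then multiply by the cell area.
    """
    dx = (x1 - x0) / n
    dy = (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * dx
        for j in range(n):
            y = y0 + (j + 0.5) * dy
            total += lam(x, y)
    return total * dx * dy

def example_intensity(x, y):
    """Hypothetical intensity: events become denser as x grows."""
    return 5.0 + 10.0 * x

print(expected_count(example_intensity, 0.0, 1.0, 0.0, 2.0))  # exact integral: 20
```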

Estimating intensity

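Kernel estimation: represent each point by a symmetrical two-dimensional density function, e.g. the normal distribution, and estimate the intensity function as the sum of these density functions:

$$\hat{\lambda}_\tau(s) = \frac{1}{\delta_\tau(s)} \sum_{i=1}^{n} \frac{1}{\tau^2}\, k\!\left(\frac{s - s_i}{\tau}\right)$$

where

  • $s_1, \ldots, s_n$ are the event points
  • $k$ is the kernel function
  • $\tau > 0$ is the bandwidth
  • $\delta_\tau(s)$ is an edge correction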

Kernel estimation

The bandwidth defines how far from each point the effect reaches. In effect, it specifies how detailed the variation in intensity is.
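A minimal sketch of the kernel estimate above, assuming a standard bivariate normal kernel $k(u) = e^{-|u|^2/2}/(2\pi)$ and no edge correction, i.e. $\delta_\tau(s) = 1$. Leaving out the edge correction underestimates the intensity near the boundary of the study area. `kernel_intensity` and the three sample points are hypothetical.

```python
import math

def kernel_intensity(s, points, tau):
    """Kernel estimate of the intensity at location s = (x, y).

    Each event contributes k((s - s_i) / tau) / tau**2 with the standard
    bivariate normal kernel k(u) = exp(-|u|**2 / 2) / (2 * pi);
    no edge correction is applied (delta_tau(s) = 1).
    """
    x, y = s
    total = 0.0
    for xi, yi in points:
        u2 = ((x - xi) ** 2 + (y - yi) ** 2) / tau ** 2  # |u|**2 for u = (s - s_i) / tau
        total += math.exp(-u2 / 2) / (2 * math.pi)
    return total / tau ** 2

pts = [(0.2, 0.3), (0.25, 0.35), (0.8, 0.7)]  # hypothetical event locations
for tau in (0.05, 0.2):
    print(tau, kernel_intensity((0.22, 0.32), pts, tau), kernel_intensity((0.5, 0.5), pts, tau))
```

With the smaller bandwidth the estimate is nearly zero at (0.5, 0.5), far from all points; with the larger one the effect of each point reaches that far.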

Simulating a Poisson process

Homogeneous Poisson process: two phases

  1. Number of events in area A: $n \sim \text{Poisson}(\lambda|A|)$
  2. The locations of the events are drawn from a uniform distribution over A

Similarly for a heterogeneous Poisson process:

  1. λ is not constant, so $n \sim \text{Poisson}\left(\int_A \lambda(s)\,ds\right)$
  2. The locations are drawn from a non-uniform distribution, with density proportional to λ(s)
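The two-phase recipe can be sketched as follows for a rectangular region $A = [0, w] \times [0, h]$. Phase 2 of the heterogeneous case uses rejection sampling, one standard way (not necessarily the one used in the course) to draw locations with density proportional to λ(s); it assumes an upper bound `lam_max` for λ on A. The integral of λ is passed in as `mean_count`, e.g. computed numerically. All function names are hypothetical, and Poisson counts are drawn with a small helper because Python's `random` module has no Poisson sampler.

```python
import random

def poisson_variate(mu, rng):
    """Draw n ~ Poisson(mu) by counting rate-1 exponential arrivals in [0, mu)."""
    n, t = 0, rng.expovariate(1.0)
    while t < mu:
        n += 1
        t += rng.expovariate(1.0)
    return n

def simulate_homogeneous(lam, width, height, rng):
    """Phase 1: n ~ Poisson(lam * |A|). Phase 2: n uniform locations in A."""
    n = poisson_variate(lam * width * height, rng)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def simulate_heterogeneous(lam, lam_max, mean_count, width, height, rng):
    """Phase 1: n ~ Poisson(mean_count), where mean_count is the integral of lam over A.
    Phase 2: n locations with density proportional to lam, by rejection sampling;
    lam_max must be an upper bound of lam on A."""
    n = poisson_variate(mean_count, rng)
    points = []
    while len(points) < n:
        x, y = rng.uniform(0, width), rng.uniform(0, height)
        if rng.random() * lam_max <= lam(x, y):  # accept with probability lam / lam_max
            points.append((x, y))
    return points

def east_heavy(x, y):
    """Hypothetical intensity on [0, 1] x [0, 2]: maximum 15, integral 20."""
    return 5.0 + 10.0 * x

rng = random.Random(42)
hom = simulate_homogeneous(10.0, 1.0, 2.0, rng)                      # 20 points expected
het = simulate_heterogeneous(east_heavy, 15.0, 20.0, 1.0, 2.0, rng)  # 20 points expected
print(len(hom), len(het))
```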

Measuring second order effects

Nearest neighbour measures

  • G(h): probability that the distance from a random event to the nearest other event is ≤ h
  • F(h): probability that the distance from a random location to the nearest event is ≤ h

If events are clustered, G(h) > F(h). These measures only show very small-scale attraction or repulsion; something else is required for scales larger than the nearest neighbour distance.
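A minimal sketch of naive estimators of G(h) and F(h), with no edge correction, for a pattern in a rectangular window. The reference locations for F(h) are drawn uniformly at random, an assumption; the function names and the clustered example pattern are hypothetical.

```python
import math
import random

def g_function(h, points):
    """Estimate G(h): share of events whose nearest other event is within distance h."""
    nearest = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
               for i, p in enumerate(points)]
    return sum(d <= h for d in nearest) / len(points)

def f_function(h, points, width, height, n_ref=1000, rng=random):
    """Estimate F(h): share of random reference locations in the
    [0, width] x [0, height] window whose nearest event is within distance h."""
    refs = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_ref)]
    nearest = [min(math.dist(r, q) for q in points) for r in refs]
    return sum(d <= h for d in nearest) / n_ref

# A hypothetical clustered pattern: 20 events around each of three centres
rng = random.Random(0)
centres = [(0.2, 0.2), (0.7, 0.3), (0.5, 0.8)]
clustered = [(rng.gauss(cx, 0.02), rng.gauss(cy, 0.02))
             for cx, cy in centres for _ in range(20)]
print(g_function(0.05, clustered), f_function(0.05, clustered, 1.0, 1.0, rng=rng))
```

For the clustered example, G(0.05) is close to 1 while F(0.05) is small, as expected for clustering.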

K function

A measure of second order effects. Basic case: constant λ, one point pattern:

$$\lambda K(h) = \text{expected number of other events within radius } h \text{ of a random event}$$

For a homogeneous Poisson process, $K(h) = \pi h^2$.

It is also possible to measure $K_{\mathrm{inhom}}(h)$ for a heterogeneous point pattern. For two point patterns:

$$\lambda_j K_{ij}(h) = \text{expected number of events of type } j \text{ within radius } h \text{ of a random event of type } i$$
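A minimal sketch of naive K function estimators, counting pairs within distance h and ignoring edge correction, which biases the estimate downwards at larger h. The function names and the uniformly random test pattern are hypothetical; for such a pattern the estimate should be close to $\pi h^2$.

```python
import math
import random

def k_function(h, points, area):
    """Naive K(h) estimate for one pattern, without edge correction:
    area * (ordered pairs of distinct events within h) / (n * (n - 1))."""
    n = len(points)
    pairs = sum(1 for i in range(n) for j in range(n)
                if i != j and math.dist(points[i], points[j]) <= h)
    return area * pairs / (n * (n - 1))

def cross_k_function(h, points_i, points_j, area):
    """Naive K_ij(h) estimate: events of type j within h of events of type i."""
    pairs = sum(1 for p in points_i for q in points_j if math.dist(p, q) <= h)
    return area * pairs / (len(points_i) * len(points_j))

# Hypothetical completely random pattern of 400 events in the unit square
rng = random.Random(3)
csr = [(rng.random(), rng.random()) for _ in range(400)]
h = 0.05
print(k_function(h, csr, 1.0), math.pi * h ** 2)  # similar; edges pull the estimate down
```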

K function: example

Two pairs of lake names:

  • Mustalampi 'Black Pond' and Valkealampi 'White Pond'
  • Kuikkalampi 'Diver Pond' and Ruunalampi 'Gelding Pond'

[Figure: spatial distributions and K functions. Blue line: homogeneous $K_{ij}$; green line: heterogeneous $K^{\mathrm{inhom}}_{ij}$]

Modelling second order variation

Poisson cluster process: start with a Poisson process

  • Normally, a homogeneous process
  • In principle a heterogeneous process is also possible, but difficult to estimate

This process generates parents. Each parent generates a random number of daughters, distributed independently around the parent. The daughters are the actual events.
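A minimal sketch of a Poisson cluster process on a rectangle, assuming a Poisson number of daughters per parent and normally distributed displacements (the variant known as a Thomas process). Daughters falling outside the window are dropped and parents outside it are never generated, a simplification that ignores boundary effects. All names and parameter values are hypothetical.

```python
import random

def poisson_variate(mu, rng):
    """Draw n ~ Poisson(mu) by counting rate-1 exponential arrivals in [0, mu)."""
    n, t = 0, rng.expovariate(1.0)
    while t < mu:
        n += 1
        t += rng.expovariate(1.0)
    return n

def simulate_cluster_process(parent_intensity, mean_daughters, sigma, width, height, rng):
    """Poisson cluster process on [0, width] x [0, height].

    Parents: homogeneous Poisson process. Each parent gets Poisson(mean_daughters)
    daughters, displaced independently by N(0, sigma^2) in x and y.
    Returns (parents, events); only daughters inside the window are events.
    """
    n_parents = poisson_variate(parent_intensity * width * height, rng)
    parents = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_parents)]
    events = []
    for px, py in parents:
        for _ in range(poisson_variate(mean_daughters, rng)):
            x, y = rng.gauss(px, sigma), rng.gauss(py, sigma)
            if 0 <= x <= width and 0 <= y <= height:
                events.append((x, y))
    return parents, events

rng = random.Random(11)
parents, events = simulate_cluster_process(10.0, 5.0, 0.02, 1.0, 1.0, rng)
print(len(parents), len(events))
```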

Spatially continuous phenomena

Observations come from distinct points in space, but this time they are measurements of a spatially continuous variable $\{Y(s), s \in R\}$. Goal: model the behaviour of Y across R. Again, it is useful to divide variation into first and second order effects.

First order properties of continuous data

Mean value surface $\{\mu(s), s \in R\}$, $\mu(s) = E(Y(s))$. This is an ordinary statistical regression problem:

  • linear regression of Y(s) on the spatial coordinates $s_x, s_y$: trend surface analysis
  • more sophisticated methods are available

Goal: interpolate the value of Y between the observation points, using

$$Y(s) = \mu(s)$$
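Trend surface analysis with a first-degree surface $\mu(s) = \beta_0 + \beta_1 s_x + \beta_2 s_y$ is ordinary least squares on the coordinates. A minimal sketch that solves the normal equations directly; the function names and the synthetic samples are hypothetical, and a real analysis would normally use a statistics library.

```python
def fit_linear_trend(samples):
    """Least-squares fit of mu(s) = b0 + b1 * sx + b2 * sy.

    samples: list of (sx, sy, y). Solves the normal equations (X^T X) b = X^T y,
    where each row of X is (1, sx, sy), by Gauss-Jordan elimination.
    """
    a = [[0.0] * 3 for _ in range(3)]
    r = [0.0] * 3
    for sx, sy, y in samples:
        row = (1.0, sx, sy)
        for i in range(3):
            r[i] += row[i] * y
            for j in range(3):
                a[i][j] += row[i] * row[j]
    for c in range(3):
        p = max(range(c, 3), key=lambda k: abs(a[k][c]))  # partial pivoting
        a[c], a[p] = a[p], a[c]
        r[c], r[p] = r[p], r[c]
        for k in range(3):
            if k != c:
                f = a[k][c] / a[c][c]
                for j in range(c, 3):
                    a[k][j] -= f * a[c][j]
                r[k] -= f * r[c]
    return [r[i] / a[i][i] for i in range(3)]

def predict_trend(b, sx, sy):
    """Interpolated trend value mu_hat(s) at location (sx, sy)."""
    return b[0] + b[1] * sx + b[2] * sy

# Hypothetical observations lying exactly on the plane 1 + 2 * sx - 0.5 * sy
samples = [(x, y, 1.0 + 2.0 * x - 0.5 * y) for x in range(5) for y in range(5)]
b = fit_linear_trend(samples)
print(b, predict_trend(b, 2.5, 1.5))
```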


Second order effects in continuous data

It is usually better to assume

$$Y(s) = \mu(s) + U(s)$$

where

  • $\mu(s)$ is the global trend
  • $U(s)$ is a spatially correlated residual, with $E(U(s)) = 0$ for all $s \in R$

U(s) can be used to model second order effects. A common assumption is that U(s) is stationary:

  • E(U(s)) and Var(U(s)) are constant
  • Cov(U(s), U(s′)) depends only on h = s′ − s

In other words, the process is the same in different parts of R.

It is often also assumed to be isotropic:

  • Cov(U(s), U(s′)) depends only on |h|

In other words, it is the same in all directions.
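Under stationarity and isotropy the covariance is a function of distance alone, so it can be estimated by averaging products of centred residuals over pairs of locations at similar distances. A minimal sketch of such a binned estimate; `empirical_covariance`, the bin layout and the chessboard example are hypothetical, and the mean is estimated from the data rather than assumed to be 0.

```python
import math

def empirical_covariance(samples, bin_width, n_bins):
    """Binned estimate of the isotropic covariance C(|h|) of a residual field.

    samples: list of (sx, sy, u) with u the residual at location (sx, sy).
    Bin b covers distances [b * bin_width, (b + 1) * bin_width); each entry is
    the mean of (u_i - u_mean) * (u_j - u_mean) over pairs in that bin, or None
    if the bin is empty.
    """
    mean = sum(u for _, _, u in samples) / len(samples)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, (xi, yi, ui) in enumerate(samples):
        for xj, yj, uj in samples[i + 1:]:
            b = int(math.hypot(xj - xi, yj - yi) / bin_width)
            if b < n_bins:
                sums[b] += (ui - mean) * (uj - mean)
                counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Hypothetical residuals on a 4 x 4 grid alternating in sign like a chessboard
grid = [(x, y, 1.0 if (x + y) % 2 == 0 else -1.0) for x in range(4) for y in range(4)]
cov = empirical_covariance(grid, 0.25, 9)
print(cov[4], cov[5])  # distance 1 (edge neighbours) and distance sqrt(2) (diagonal)
```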

Predicting with second order effects

If the residual process $\{U(s), s \in R\}$ is spatially correlated, it is possible to give better estimates than $\hat{Y}(s) = \hat{\mu}(s)$.

Kriging: $\hat{Y}(s) = \hat{\mu}(s) + \hat{U}(s)$. There are various methods for this:

  • beyond the scope of this course
  • no general criterion for choosing between them, beyond "see what works"

Bottom line: modelling both first and second order effects gives reasonably good predictions.

Proximity in area data

Proximity matrix W:

$$w_{ij} = \begin{cases} 1 & \text{if } A_i \text{ and } A_j \text{ share a border} \\ 0 & \text{otherwise} \end{cases}$$

[Figure: a map of six areas A to F and their proximity matrix W]

More elaborate measures of proximity are possible.
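A minimal sketch that builds W for a hypothetical rectangular grid of square areas, interpreting "share a border" as sharing an edge rather than only a corner (an assumption; other neighbourhood rules are possible). Cells are numbered row by row, and `grid_proximity_matrix` is a hypothetical name.

```python
def grid_proximity_matrix(n_rows, n_cols):
    """Binary proximity matrix W for an n_rows x n_cols grid of square areas.

    w_ij = 1 if cells i and j share an edge, else 0 (corners do not count).
    Cells are numbered row by row: index = row * n_cols + col.
    """
    n = n_rows * n_cols
    w = [[0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[i][rr * n_cols + cc] = 1
    return w

for row in grid_proximity_matrix(2, 3):
    print(row)
```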

First order variation

Simple option: moving averages. Replace the value of each area by the average of its neighbours' values:

$$\hat{\mu}_i = \frac{\sum_{j=1}^{n} w_{ij} y_j}{\sum_{j=1}^{n} w_{ij}}$$
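A minimal sketch of the moving average $\hat\mu_i$, taking W as a list of rows. Areas with no neighbours keep their own value, an assumption not covered by the formula (which would divide by zero). `moving_average` and the four-area chain are hypothetical.

```python
def moving_average(w, y):
    """Moving average: mu_hat_i = sum_j w_ij * y_j / sum_j w_ij.

    Areas without neighbours (a zero row in w) keep their own value y_i.
    """
    result = []
    for i, row in enumerate(w):
        weight = sum(row)
        if weight:
            result.append(sum(wij * yj for wij, yj in zip(row, y)) / weight)
        else:
            result.append(y[i])
    return result

# Hypothetical chain of four areas where consecutive areas share a border
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(moving_average(W, [1, 2, 3, 10]))  # [2.0, 2.0, 6.0, 3.0]
```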

Convert to point data: e.g. represent each area by its centre and perform kernel estimation.

Median polish: for regular grids. Represent each grid cell as

$$y_{ij} = \mu + r_i + c_j + \varepsilon_{ij}$$

where $r_i$ and $c_j$ are the row and column trends and $\varepsilon_{ij}$ is random error.
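A minimal sketch of Tukey's median polish for the additive model above: row and column medians are swept out of the table alternately, and the returned parts always add up to the original table. The fixed iteration count is an assumption (implementations usually stop when the residuals stop changing), and `median_polish` is a hypothetical name.

```python
from statistics import median

def median_polish(y, n_iter=10):
    """Decompose a grid y[i][j] into mu + r[i] + c[j] + e[i][j]."""
    e = [[float(v) for v in row] for row in y]  # residuals, start from the data
    n_rows, n_cols = len(e), len(e[0])
    mu, r, c = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep row medians out of the residuals into the row effects
        for i in range(n_rows):
            m = median(e[i])
            r[i] += m
            e[i] = [v - m for v in e[i]]
        m = median(c)  # move the typical column effect into the overall level
        mu += m
        c = [v - m for v in c]
        # Sweep column medians out of the residuals into the column effects
        for j in range(n_cols):
            m = median(e[i][j] for i in range(n_rows))
            c[j] += m
            for i in range(n_rows):
                e[i][j] -= m
        m = median(r)  # move the typical row effect into the overall level
        mu += m
        r = [v - m for v in r]
    return mu, r, c, e

mu, r, c, e = median_polish([[1, 5, 2], [7, 3, 9], [4, 4, 8]])
print(mu, r, c)
```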

Second order effects

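Moran's I statistic: spatial correlation

$$I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2 \sum_{i \neq j} w_{ij}}$$

  • Usually varies between −1 and +1; no autocorrelation when I ≈ 0 (its expected value under no autocorrelation is −1/(n−1))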
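A minimal sketch of Moran's I, following the formula above term by term with W as a list of rows; `morans_i` and the four-area chain are hypothetical. A smooth trend along the chain gives a positive value, alternating values a negative one.

```python
def morans_i(w, y):
    """Moran's I for values y and proximity matrix w (list of rows)."""
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]  # deviations from the mean
    num = sum(w[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    s0 = sum(w[i][j] for i in range(n) for j in range(n) if i != j)
    return n * num / (sum(v * v for v in d) * s0)

# Hypothetical chain of four areas where consecutive areas share a border
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i(W, [1, 2, 3, 4]))  # positive: neighbours have similar values
print(morans_i(W, [1, 0, 1, 0]))  # negative: neighbours have opposite values
```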

Geary's C statistic: variance of the difference of neighbouring values

$$C = \frac{(n-1) \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}(y_i - y_j)^2}{2 \sum_{i=1}^{n} (y_i - \bar{y})^2 \sum_{i \neq j} w_{ij}}$$

  • Usually varies between 0 and 2; no autocorrelation when C ≈ 1
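Geary's C, sketched the same way under the same assumptions (hypothetical `gearys_c`, W as a list of rows, hypothetical four-area chain). C moves in the opposite direction to I: values below 1 indicate positive autocorrelation.

```python
def gearys_c(w, y):
    """Geary's C for values y and proximity matrix w (list of rows)."""
    n = len(y)
    mean = sum(y) / n
    num = sum(w[i][j] * (y[i] - y[j]) ** 2 for i in range(n) for j in range(n))
    s0 = sum(w[i][j] for i in range(n) for j in range(n) if i != j)
    return (n - 1) * num / (2 * sum((v - mean) ** 2 for v in y) * s0)

# Hypothetical chain of four areas where consecutive areas share a border
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(gearys_c(W, [1, 2, 3, 4]))  # below 1: neighbours have similar values
print(gearys_c(W, [1, 0, 1, 0]))  # above 1: neighbours have opposite values
```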

Summary

There are lots of statistical methods for spatial modelling, with different methods for point patterns, area data and continuous data. Some of them are related to each other.

If you are still interested, take a course in spatial statistics.