SLIDE 1

Analysis of Areal Data: Should a Model with (Spatial) Dependence be Considered?

Petruţa C. Caragea

Iowa State University, Department of Statistics

ThinkSpatial, University of California at Santa Barbara, Fall 2010

  • P. Caragea (ISU, Statistics)

S-value 1 / 26

SLIDE 2

An Example – Drumlins in Ireland

Drumlins

  • Oval or elongated hills
  • Formed by the movement of glacial ice sheets across rock debris
  • County Down, Ireland (Hill, 1973)
SLIDE 3

Drumlins—details

  • Northern Ireland Drumlin Belt
  • Griffith (2006): 3 regions in County Down

SLIDE 4

Drumlins data

  • Overlaid 11 × 11 grids on each region (64 km²)
  • Tabulated the number of drumlins in each grid cell
  • Result: a regular lattice of count data

Sample means by region: Region 1, $\tilde{\kappa} = 1.934$; Region 2, $\tilde{\kappa} = 1.942$; Region 3, $\tilde{\kappa} = 1.264$

Griffith (2006) fits various models to count data on regular grids.

SLIDE 6

Spatial Context

  • Spatial domain $D$, locations $\{s_i : i = 1, \dots, n\}$, random variables $\{Y(s_i) : i = 1, \dots, n\}$, neighborhoods $N_i$
  • Markov property: $[Y(s_i) \mid \{y(s_j) : j \neq i\}] = [Y(s_i) \mid y(N_i)]$, $i = 1, \dots, n$

One-parameter exponential families

  • Conditional distribution:

$$f_i(y(s_i) \mid y(N_i)) = \exp\left[ A_i(y(N_i))\, y(s_i) - B_i(y(N_i)) + C(y(s_i)) \right]$$

  • Natural parameter function:

$$A_i(y(N_i)) = \tau^{-1}(\kappa_i) + \gamma \frac{1}{m} \sum_{s_j \in N_i} \{ y(s_j) - \kappa_j \}$$

  • Winsorized Poisson:

$$A_i(y(N_i)) = \log(\kappa) + \gamma \frac{1}{m} \sum_{s_j \in N_i} \{ y(s_j) - \kappa \}$$
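As a rough illustration of the Winsorized Poisson natural parameter, the sketch below evaluates $A_i$ for one site and turns it into a conditional distribution. It assumes that "Winsorized" means a Poisson count capped at $R$, that is $Y = \min(Z, R)$ with $Z$ Poisson with mean $\exp(A_i)$; that reading, and the function names, are assumptions made for illustration and are not taken from the slides.

```python
import numpy as np
from math import exp, lgamma, log

def natural_parameter(y_neighbors, kappa, gamma):
    """A_i = log(kappa) + gamma * (1/m) * sum over neighbors of (y_j - kappa)."""
    y_neighbors = np.asarray(y_neighbors, dtype=float)
    return log(kappa) + gamma * np.mean(y_neighbors - kappa)

def winsorized_poisson_pmf(a_i, R):
    """Probabilities on 0..R for min(Z, R), Z ~ Poisson(exp(a_i)) (assumed form)."""
    lam = exp(a_i)
    k = np.arange(R)
    # log Poisson pmf for k = 0..R-1: k*log(lam) - lam - log(k!)
    log_p = k * a_i - lam - np.array([lgamma(j + 1) for j in k])
    p = np.exp(log_p)
    # all remaining upper-tail mass is placed on the cap R
    return np.append(p, max(0.0, 1.0 - p.sum()))
```

When every neighbor equals $\kappa$, the dependence term vanishes and $A_i = \log(\kappa)$; neighbors above $\kappa$ with $\gamma > 0$ raise $A_i$.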

SLIDE 9

Construction—Part 1

Available Moment Estimators

  • For marginal expectations:

$$\hat{E}\{Y(s_i)\} = \frac{1}{n} \sum_{i=1}^{n} Y(s_i) \equiv \tilde{\kappa}$$

  • For conditional expectations:

$$\hat{E}\{Y(s_i) \mid s_i \in H_\ell\} = \frac{1}{|H_\ell|} \sum_{s_i \in H_\ell} Y(s_i) \equiv C_\ell$$

where, for each $\ell = 1, \dots, q$,

$$H_\ell \equiv \left\{ s_i : \frac{1}{m} \sum_{s_j \in N_i} y(s_j) = h_\ell \right\}$$
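The two moment estimators can be sketched for a regular lattice as follows. This is a minimal illustration, assuming a four-nearest-neighbor neighborhood ($m = 4$) and dropping edge cells that lack a full neighborhood; both choices, and the function names, are assumptions rather than the author's implementation.

```python
import numpy as np

def neighbor_means(y):
    """Mean of the four nearest neighbors for each interior cell of a 2-D lattice."""
    y = np.asarray(y, dtype=float)
    return (y[:-2, 1:-1] + y[2:, 1:-1] + y[1:-1, :-2] + y[1:-1, 2:]) / 4.0

def moment_estimators(y):
    """Return kappa_tilde, the distinct neighbor means h_l, and the group means C_l."""
    y = np.asarray(y, dtype=float)
    kappa_tilde = y.mean()                      # marginal moment estimator
    h_all = neighbor_means(y).ravel()           # neighbor mean at each interior site
    centers = y[1:-1, 1:-1].ravel()             # responses at the same sites
    h, inverse = np.unique(h_all, return_inverse=True)
    sums = np.bincount(inverse, weights=centers)
    counts = np.bincount(inverse)
    return kappa_tilde, h, sums / counts        # C_l = average of Y over H_l
```

Sites are grouped into $H_\ell$ by the exact value of their neighbor mean, which works for count or binary data because those means take only a few distinct values.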

SLIDE 11

Construction—Part 2

From model structure

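  • $E_c = E\{Y(s_i) \mid s_i \in H_\ell\} = \tau(A(h_\ell))$
  • $E_m = E\{Y(s_i)\} = \kappa$

Then

$$A_i(h_\ell) - \tau^{-1}(\kappa) = \gamma \times (h_\ell - \kappa)$$

$$\Rightarrow \tau^{-1}(E_c) - \tau^{-1}(E_m) = \gamma \times (h_\ell - E_m)$$

$$\Rightarrow \underbrace{\tau^{-1}(C_\ell) - \tau^{-1}(\tilde{\kappa})}_{r(C_\ell, \tilde{\kappa})} \approx \gamma \times \underbrace{(h_\ell - \tilde{\kappa})}_{D(h_\ell, \tilde{\kappa})}$$

Define the S-value:

$$S = \frac{\sum_{\ell=1}^{q} r(C_\ell, \tilde{\kappa})\, D(h_\ell, \tilde{\kappa})}{\sum_{\ell=1}^{q} \{D(h_\ell, \tilde{\kappa})\}^2}$$

Note: The S-value has the form of a crude estimator of $\gamma$.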
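The S-value is the slope of a least-squares line through the origin when $r$ is regressed on $D$. A minimal sketch follows, assuming the Winsorized Poisson case where $\tau^{-1} = \log$; the function name, the `inv_tau` argument, and the decision to drop groups with $C_\ell = 0$ (where the log is undefined) are illustrative assumptions.

```python
import numpy as np

def s_value(C, h, kappa_tilde, inv_tau=np.log):
    """S = sum(r * D) / sum(D**2), with r = inv_tau(C) - inv_tau(kappa_tilde)."""
    C = np.asarray(C, dtype=float)
    h = np.asarray(h, dtype=float)
    with np.errstate(divide="ignore"):
        r = inv_tau(C) - inv_tau(kappa_tilde)
    D = h - kappa_tilde
    keep = np.isfinite(r)          # assumption: skip groups where inv_tau(C) is infinite
    return float(np.sum(r[keep] * D[keep]) / np.sum(D[keep] ** 2))
```

If the conditional means followed $C_\ell = \tilde{\kappa} \exp\{\gamma (h_\ell - \tilde{\kappa})\}$ exactly, this ratio would return $\gamma$ itself, which is why the slide calls it a crude estimator of $\gamma$.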

SLIDE 12

Interpretation

Standard bound (Kaiser 2007)

  • $|\gamma| < \gamma_{sb}$ ensures $\kappa \approx E\{Y(s_i)\}$
  • $\gamma_{sb}$ is available for exponential family models
  • For Winsorized Poisson: $\gamma_{sb} = \dfrac{\log(R) - \log(\kappa)}{R - \kappa}$

Uses:

  1. $S/\gamma_{sb}$ is a measure of strength of dependence
  2. If $S \gg \gamma_{sb}$, then $\kappa \neq E\{Y(s_i)\}$ in the model, which can indicate:
     • directional dependencies
     • non-constant mean
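The Winsorized Poisson standard bound is a one-line computation. The sketch below simply codes the formula on this slide; the function name is hypothetical, and $\kappa = 5$, $R = 20$ are used as illustrative inputs.

```python
from math import log

def standard_bound_winsorized_poisson(kappa, R):
    """gamma_sb = (log(R) - log(kappa)) / (R - kappa), for 0 < kappa < R."""
    return (log(R) - log(kappa)) / (R - kappa)

# kappa = 5 and R = 20 give log(4) / 15, about 0.0924
gamma_sb = standard_bound_winsorized_poisson(5.0, 20)
```

Dividing an S-value by `gamma_sb` puts the strength of dependence on a scale where values well above 1 suggest that the model's mean structure or dependence structure needs another look.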
SLIDE 14

Uses of the S-value: Detecting strength of dependence

Winsorized Poisson with $\kappa = 5$, $R = 20$ and $\gamma = 0.0462$ ($\gamma_{sb} = 0.0924$) on a 30 × 30 regular lattice

Case 1

  • $\tilde{\kappa} = 4.921$
  • $S/\gamma_{sb} = 0.853$
  • PL estimates: $\hat{\kappa} = 4.854$, $\hat{\gamma} = 0.0827 \Rightarrow \hat{\gamma}/\gamma_{sb} = 0.895$

Case 2

  • $\tilde{\kappa} = 5.077$
  • $S/\gamma_{sb} = 0.353$
  • PL estimates: $\hat{\kappa} = 5.065$, $\hat{\gamma} = 0.0326 \Rightarrow \hat{\gamma}/\gamma_{sb} = 0.353$

[Figure: S-value plots of r against D. Case 1: S-value = 0.0788; Case 2: S-value = 0.0326]

SLIDE 15

Uses of the S-value: Detecting directional dependence

Directional Winsorized Poisson with $\kappa = 5$, $R = 20$, $\gamma_1 = 0.07$ and $\gamma_2 = 0.001$ ($\gamma_{sb} = 0.092$); $\tilde{\kappa} = 5.144$

Unidirectional

  • $S/\gamma_{sb} = 0.947$
  • PL estimates: $\hat{\kappa} = 5.071$, $\hat{\gamma} = 0.0899 \Rightarrow \hat{\gamma}/\gamma_{sb} = 0.973$

Directional

  • $S_1/\gamma_{sb} = 0.7576$, $S_2/\gamma_{sb} = 0.0086$
  • PL estimates: $\hat{\kappa} = 5.018$, $\hat{\gamma}_1 = 0.0854 \Rightarrow \hat{\gamma}_1/\gamma_{sb} = 0.924$, $\hat{\gamma}_2 = 0.0010 \Rightarrow \hat{\gamma}_2/\gamma_{sb} = 0.108$

[Figure: simulated lattice (U-Coordinate against V-Coordinate) and S-value plots of r against D. Unidirectional: S-value = 0.0875; U-direction: S-value = 0.0700; V-direction: S-value = 0.0008]

SLIDE 16

Uses of the S-value: Detecting spatial trend

Data generated with trend and unidirectional dependence, $\gamma = 0.05$

  • Constant mean: $S/\gamma_{sb} = 1.8$
  • With trend: $S/\gamma_{sb} = 0.47$

[Figure: simulated lattice (U-Coordinate against V-Coordinate) and S-value plots of r against D. Constant mean: S-value = 0.1744; median polish: S-value = 0.0685; OLS: S-value = 0.0450]

SLIDE 17

S-value in practice: Drumlins in Ireland

  • Neighborhood: 4 nearest neighbors
  • Winsorization value: R = 7

Calculated S-value

Region   Unidirectional   N-S      E-W      NE-SW    NW-SE
1        0.3681           0.3848   0.1848   0.2451   0.0688
2        0.3763           0.1574   0.3086   0.0948   0.1435
3        0.2197           0.0104   0.1961   0.0125   0.2015

Standard bound between 0.25 and 0.30.
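To read the table against the standard bound, each S-value can be divided by $\gamma_{sb}$ for its region. The sketch below assumes the sample mean $\tilde{\kappa}$ of each region is plugged in for $\kappa$ in the Winsorized Poisson bound with $R = 7$; that plug-in choice is an assumption, although it does reproduce the stated range of 0.25 to 0.30.

```python
from math import log

R = 7  # Winsorization value used for the drumlin counts
kappa_tilde = {1: 1.934, 2: 1.942, 3: 1.264}  # sample means by region
s_values = {
    1: {"Unidirectional": 0.3681, "N-S": 0.3848, "E-W": 0.1848, "NE-SW": 0.2451, "NW-SE": 0.0688},
    2: {"Unidirectional": 0.3763, "N-S": 0.1574, "E-W": 0.3086, "NE-SW": 0.0948, "NW-SE": 0.1435},
    3: {"Unidirectional": 0.2197, "N-S": 0.0104, "E-W": 0.1961, "NE-SW": 0.0125, "NW-SE": 0.2015},
}

def standard_bound(kappa, R):
    """Winsorized Poisson standard bound (log(R) - log(kappa)) / (R - kappa)."""
    return (log(R) - log(kappa)) / (R - kappa)

bounds = {region: standard_bound(k, R) for region, k in kappa_tilde.items()}
ratios = {
    region: {name: s / bounds[region] for name, s in row.items()}
    for region, row in s_values.items()
}
```

Under this assumption the unidirectional ratios for Regions 1 and 2 exceed 1, while Region 3 stays below 1.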

SLIDE 18

Drumlins in Ireland: Simulation from the model

2000 simulations using the sample mean and S-values.

[Figure: simulation results for Models 1, 2 and 3 in each region. Region 1: $\tilde{\kappa} = 1.934$; Region 2: $\tilde{\kappa} = 1.942$; Region 3: $\tilde{\kappa} = 1.264$]

Suggestion: Use a model with constant mean and directional (NE-SW) dependence.

SLIDE 24

Another Example – White Oaks in the Driftless Region

White Oak (Quercus alba)

  • Outstanding tree among all trees
  • High-grade wood, used for many things
  • Staves for barrels ⇒ “stave oak”

SLIDE 28

White Oaks in the study region

White Oak: $\tilde{\kappa} = 0.42$

[Figure: map of white oak presence (Pres) and absence (Abs) over the study region]

  • Binary data
  • Standard bound is 4
  • Is there spatial dependence?
  • Directional?
  • Trends?

Much larger sample size here: about 7200 locations.

Natural parameter function for binary conditionals:

$$A_i(y(N_i)) = \log\left(\frac{\kappa}{1 - \kappa}\right) + \gamma \frac{1}{m} \sum_{s_j \in N_i} \{ y(s_j) - \kappa \}$$
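For binary data the natural parameter is a log-odds, so the conditional probability of presence is its inverse logit. The sketch below illustrates this under that standard exponential-family relationship; the function name and the example inputs are hypothetical.

```python
import numpy as np

def binary_conditional_prob(y_neighbors, kappa, gamma):
    """P(Y(s_i) = 1 | neighbors) = exp(A_i) / (1 + exp(A_i))."""
    y_neighbors = np.asarray(y_neighbors, dtype=float)
    a_i = np.log(kappa / (1.0 - kappa)) + gamma * np.mean(y_neighbors - kappa)
    return float(1.0 / (1.0 + np.exp(-a_i)))

# With no dependence (gamma = 0) the conditional probability equals kappa;
# with positive gamma, present neighbors raise it and absent neighbors lower it.
p_indep = binary_conditional_prob([1, 0, 1, 0], kappa=0.42, gamma=0.0)
p_all_present = binary_conditional_prob([1, 1, 1, 1], kappa=0.42, gamma=2.0)
p_all_absent = binary_conditional_prob([0, 0, 0, 0], kappa=0.42, gamma=2.0)
```

Centering the neighbor sum at $\kappa$ is what keeps the marginal mean close to $\kappa$ when $\gamma$ is within the standard bound.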

SLIDE 30

Detect strength of dependence

[Figure: S-value plot of r against D; S-value = 1.909]

Results

  • $\tilde{\kappa} = 0.418$
  • $S/\gamma_{sb} = 0.477$
  • PL estimates: $\hat{\kappa} = 0.407$, $\hat{\gamma} = 2.044 \Rightarrow \hat{\gamma}/\gamma_{sb} = 0.511$

SLIDE 31

Detecting directional dependence

N-S direction: $S_1/\gamma_{sb} = 0.286$ (S-value = 1.145)

E-W direction: $S_2/\gamma_{sb} = 0.257$ (S-value = 1.029)

[Figure: directional S-value plots of r against D, one panel per direction]

SLIDE 32

Question: Is the percentage of clay in the top soil an important factor?

[Figure: two maps of the study region. White Oak: presence (Pres) and absence (Abs). Percent clay in top-level soil: scale from 10 to 40]

SLIDE 33

Detect spatial trend

  • No trend: S-value = 1.909
  • Predictor, Percent Clay (OLS): S-value = 1.012

[Figure: S-value plots of r against D for each case]

SLIDE 34

Models suggested by exploratory analysis:

Consider models with spatial dependence and Percent Clay as predictor.

Prediction Error

Model                  RMSPE
Cst. Mean, Indep       0.243
Perc. Clay, Indep      0.239
Cst. Mean, Spatial     0.226
Perc. Clay, Spatial    0.224

Sensitivity/specificity (cutoff = 0.5), %Absence and %Presence: 100 89 21 79 42 83 35

[Figure: ROC curves (Sensitivity against 1 − Specificity) for the Indep. Cst. Mean, Indep. Perc. Clay, Spatial Cst. Mean and Spatial Perc. Clay models]
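The sensitivity/specificity comparison at a fixed cutoff can be sketched as below. It assumes that %Absence and %Presence mean the percentage of observed absences and presences that are classified correctly when a predicted probability of at least 0.5 counts as "presence"; that reading of the column labels, and the function name, are assumptions.

```python
import numpy as np

def classification_rates(y, p, cutoff=0.5):
    """Return (% of absences predicted absent, % of presences predicted present)."""
    y = np.asarray(y)
    predicted_present = np.asarray(p, dtype=float) >= cutoff
    pct_absence = 100.0 * np.mean(~predicted_present[y == 0])   # specificity
    pct_presence = 100.0 * np.mean(predicted_present[y == 1])   # sensitivity
    return float(pct_absence), float(pct_presence)

# A constant predicted probability below the cutoff classifies every site as absent.
y_demo = np.array([0, 0, 1, 1, 1])
rates_constant = classification_rates(y_demo, np.full(5, 0.42))
```

Sweeping `cutoff` from 0 to 1 and plotting sensitivity against 1 − specificity traces out an ROC curve of the kind shown on the slide.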

SLIDE 38

Conclusions: Versatile exploratory tool

Guide model formulation

  • strength of dependence
  • type of dependence (e.g. directional)
  • aptness of mean structure

Connected to Model form

  • interpretation within distributional families
  • direct connection to dependence parameter(s)
SLIDE 39

Acknowledgements and further reading

Collaborators

  • Mark Kaiser, Professor of Statistics, ISU
  • Lisa Schulte and Kumudan Grubh, Forestry (NREM), ISU

Further reading

  • Centered autologistic models: JABES 2009
  • Exploring dependence with data on spatial lattices: Biometrics 2009
  • More details on standard bounds can be found at www.stat.iastate.edu (click on Publications, then Preprint Series); specifically, Musings about spatial dependence for MRFs: 2007-1