BAYESIAN MODEL SELECTION IN SPATIAL LATTICE MODELS




slide-1
SLIDE 1

BAYESIAN MODEL SELECTION IN SPATIAL LATTICE MODELS

Victor De Oliveira
Department of Management Science and Statistics
The University of Texas at San Antonio
San Antonio, TX, USA
victor.deoliveira@utsa.edu
http://faculty.business.utsa.edu/vdeolive

Joint work with J.J. Song

The Fourth Erich L. Lehmann Symposium, May 9–12, 2011

slide-2
SLIDE 2

Example 1: Phosphate Data

Raw phosphate concentrations (in mg P/100 g of soil) collected over a 16 by 16 regular lattice during several years in an archaeological region of Greece

[Figure: raw phosphate values printed at the sites of the lattice; axes s1 (meters) and s2 (meters), each with ticks at 5, 10, 15. Values in extraction order, lattice positions not recoverable:]

121 112 108 91 68 59 294 50 101 27 71 48 36 71 66 83 108 101 75 83 52 55 50 41 30 47 47 55 75 108 62 80 50 88 77 77 73 50 50 59 57 55 57 38 71 17 52 60 91 166 68 60 32 47 45 34 57 60 64 68 32 48 27 88 116 66 34 62 77 41 23 38 68 68 73 33 60 66 62 143 60 62 80 59 75 57 27 57 55 53 80 80 62 91 71 68 77 104 75 41 33 131 41 37 64 45 62 21 60 38 47 77 73 62 27 44 53 53 52 36 64 28 44 45 60 62 34 47 75 83 71 77 83 73 77 59 59 38 32 55 60 30 41 59 57 71 66 83 85 85 77 83 45 47 48 68 80 44 64 64 68 68 88 116 108 85 91 73 37 41 38 36 19 57 47 131 80 83 80 88 73 73 97 62 31 45 34 66 71 85 80 121 91 136 108 108 80 80 73 55 34 62 41 80 75 101 50 71 91 94 94 91 75 68 59 57 55 66 40 57 68 73 80 71 125 83 66 77 71 47 55 77 59 45 55 59 60 48 68 71 57 60 55 53 57 62 64

slide-3
SLIDE 3

Example 2: Crime Data

Homicide rates per 100,000 inhabitants for 1980 in the southern US, with n = 1412 counties

[Figure: county map of homicide rates; axes longitude (−105 to −75) and latitude (25 to 40); legend classes 0.00–4.80, 4.80–7.85, 7.85–10.92, 10.92–15.03, 15.03–42.34]

slide-4
SLIDE 4

Models for Spatial Lattice Data

  • Conditional Autoregressive (CAR) Models: mostly studied and applied in the statistical literature
  • Simultaneously Autoregressive (SAR) Models: mostly studied and applied in the econometrics/geography literature

Both classes require specifying a neighborhood system

slide-5
SLIDE 5

Neighborhood Systems

Sites $\{1, \ldots, n\}$ are endowed with a neighborhood system $\{N_i : i = 1, \ldots, n\}$, where $N_i$ is the set of neighbors of site $i$.

Examples:

  • $N_i = \{j : \text{site } j \text{ shares a boundary with site } i\}$
  • $N_i = \{j : 0 < d_{ij} < r\}$, with $r > 0$ and $d_{ij}$ the distance between sites $i$ and $j$
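The distance-based rule can be turned into a neighborhood matrix directly. The following is a minimal sketch, assuming sites on a unit-spaced regular lattice; the helper names `lattice_coords` and `distance_neighborhood` are hypothetical and not from the talk. On such a lattice a threshold just above 1 keeps the four nearest sites, and a threshold just above √2 adds the four diagonal sites, which is one common reading of first and second order neighborhoods.

```python
import numpy as np

def lattice_coords(nrow, ncol):
    """Coordinates of a unit-spaced nrow x ncol lattice, one row per site."""
    rows, cols = np.meshgrid(np.arange(nrow), np.arange(ncol), indexing="ij")
    return np.column_stack([rows.ravel(), cols.ravel()]).astype(float)

def distance_neighborhood(coords, r):
    """Neighborhood matrix with a_ij = 1 iff 0 < d_ij < r, and 0 otherwise."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))        # pairwise Euclidean distances
    return ((d > 0) & (d < r)).astype(int)

coords = lattice_coords(4, 4)                     # small illustrative lattice
A_first = distance_neighborhood(coords, r=1.1)    # sites at distance 1
A_second = distance_neighborhood(coords, r=1.5)   # adds sites at distance sqrt(2)

# Neighbor sets N_i as index lists, here for the interior site with index 5
N5_first = np.flatnonzero(A_first[5]).tolist()
N5_second = np.flatnonzero(A_second[5]).tolist()
```

Because the rule depends only on the symmetric distance d_ij, the resulting matrix is symmetric and has a zero diagonal.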

slide-6
SLIDE 6

First and second order neighborhood systems

[Figure: two lattice diagrams, each marking a site X and its neighbors]

slide-7
SLIDE 7

Goal Model selection for spatial lattice data using a default Bayesian approach, where the competing models:

  • Have the same mean structure
  • Have different covariance structures
slide-8
SLIDE 8

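CAR MODELS

Conditional Specification: For $i = 1, \ldots, n$

$$(Y_i \mid Y_{(i)}) \sim N\Big(x_i'\beta + \sum_{j=1}^{n} c_{ij}(Y_j - x_j'\beta),\ \tau_i^2\Big)$$

  • $Y_{(i)} = \{Y_j,\ j \neq i\}$
  • $x_j' = (x_{j1}, \ldots, x_{jp})$
  • $\beta \in \mathbb{R}^p$, $\tau_i > 0$
  • $c_{ij} \geq 0$, and $c_{ij} > 0$ iff $i \sim j$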

slide-9
SLIDE 9

Let $M = \mathrm{diag}(\tau_1^2, \ldots, \tau_n^2)$ and $C = (c_{ij})$ satisfy

  • $M^{-1}C$ is symmetric, so $c_{ij}\tau_j^2 = c_{ji}\tau_i^2$
  • $M^{-1}(I_n - C)$ is positive definite

Joint Specification:

$$Y \sim N_n\big(X\beta,\ (I_n - C)^{-1}M\big)$$

where $X = (x_1, \ldots, x_n)'$

slide-10
SLIDE 10

Parameterization

  • $M = \sigma^2 G$, with $\sigma^2 > 0$ unknown and $G$ diagonal (known)
  • $C = \phi W$, with $\phi$ the 'spatial parameter' and $W = (w_{ij})$ a known nonnegative "weight" matrix (not necessarily symmetric), with $w_{ij} > 0$ iff $i \sim j$

Let $A = (a_{ij})$ be the neighborhood matrix: $a_{ij} = 1$ if $i \sim j$, and $a_{ij} = 0$ otherwise

slide-11
SLIDE 11

Classes of CAR Models

  • Homogeneous CAR (HCAR):

$$G = I_n, \quad W = A$$

  • Weighted CAR (WCAR) (Besag et al., 1991):

$$G = \mathrm{diag}(|N_1|^{-1}, \ldots, |N_n|^{-1}), \quad W = GA, \quad \text{with } |N_i| = \sum_{j=1}^{n} a_{ij}$$

  • Autocorrelation CAR (ACAR) (Cressie & Chan, 1989):

$$G = \mathrm{diag}(|N_1|^{-1}, \ldots, |N_n|^{-1}), \quad W = G^{1/2} A G^{-1/2}$$
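The three classes differ only in how G and W are built from the neighborhood matrix A. The sketch below constructs both for each class and checks that G⁻¹W is symmetric, which is equivalent to the symmetry of M⁻¹C required earlier. The function name `car_matrices` and the three-site path example are illustrative assumptions, not part of the talk.

```python
import numpy as np

def car_matrices(A, kind):
    """Return (G, W) for kind in {'HCAR', 'WCAR', 'ACAR'} given neighborhood matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if kind == "HCAR":
        return np.eye(n), A.copy()
    counts = A.sum(axis=1)                       # |N_i| = sum_j a_ij
    if np.any(counts == 0):
        raise ValueError("every site needs at least one neighbor")
    G = np.diag(1.0 / counts)
    if kind == "WCAR":
        return G, G @ A                          # row-standardized weights
    if kind == "ACAR":
        root = np.sqrt(counts)                   # diagonal of G^{-1/2}
        # (G^{1/2} A G^{-1/2})_ij = a_ij * sqrt(|N_j|) / sqrt(|N_i|)
        return G, A * root[None, :] / root[:, None]
    raise ValueError(f"unknown model class: {kind}")

# Hypothetical example: three sites on a line, 0 ~ 1 ~ 2
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
symmetric = {}
for kind in ("HCAR", "WCAR", "ACAR"):
    G, W = car_matrices(A, kind)
    S = np.linalg.inv(G) @ W                     # G^{-1} W
    symmetric[kind] = bool(np.allclose(S, S.T))
```

For WCAR the rows of W sum to one, so each conditional mean is a plain average over neighbors scaled by φ.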

slide-12
SLIDE 12

Facts

Assume the above conditions hold and $G^{-1}W$ is symmetric. Then:

(a) $G^{-1/2} W G^{1/2}$ is symmetric

(b) $G^{-1/2} W G^{1/2}$ and $W$ have the same nonzero eigenvalues, and all are real

(c) $M$ and $C$ determine a CAR model iff $\sigma^2 > 0$ and $\phi \in (\lambda_n^{-1}, \lambda_1^{-1})$, with $\lambda_1 \geq \ldots \geq \lambda_n$ the ordered eigenvalues of $G^{-1/2} W G^{1/2}$

Parameter space:

$$\Omega = \mathbb{R}^p \times (0, \infty) \times (\lambda_n^{-1}, \lambda_1^{-1})$$
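Fact (c) gives a direct recipe for the admissible range of φ. The sketch below computes the interval from the eigenvalues of G^{-1/2} W G^{1/2} and checks positive definiteness of G⁻¹(I − φW) on either side of the upper bound. The names `phi_interval` and `is_valid` and the three-site HCAR example are illustrative assumptions; the code also assumes the smallest eigenvalue is negative and the largest positive, as happens when W has a zero diagonal and at least one neighbor pair.

```python
import numpy as np

def phi_interval(G, W):
    """Interval (1/lambda_n, 1/lambda_1) from the eigenvalues of G^{-1/2} W G^{1/2}."""
    g = np.sqrt(np.diag(G))
    S = W * g[None, :] / g[:, None]          # entries g_i^{-1/2} w_ij g_j^{1/2}
    S = 0.5 * (S + S.T)                      # remove rounding asymmetry
    lam = np.linalg.eigvalsh(S)              # ascending: lam[0] = lambda_n, lam[-1] = lambda_1
    return 1.0 / lam[0], 1.0 / lam[-1]

# Hypothetical HCAR example: three sites on a line, G = I, W = A
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
G, W = np.eye(3), A
lo, hi = phi_interval(G, W)                  # eigenvalues of A are -sqrt(2), 0, sqrt(2)

def is_valid(phi):
    """True when G^{-1}(I - phi W) is positive definite."""
    Q = np.linalg.inv(G) @ (np.eye(3) - phi * W)
    return bool(np.all(np.linalg.eigvalsh(0.5 * (Q + Q.T)) > 0))
```

Here the interval is (−1/√2, 1/√2), and `is_valid` changes from true to false as φ crosses either endpoint.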

slide-13
SLIDE 13

SAR MODELS

Conditional Specification: For $i = 1, \ldots, n$

$$Y_i = x_i'\beta + \sum_{j=1}^{n} b_{ij}(Y_j - x_j'\beta) + \epsilon_i$$

  • $\epsilon_i \sim N(0, \xi_i^2)$, independent
  • $\beta \in \mathbb{R}^p$, $\xi_i > 0$
  • $b_{ij} \geq 0$, and $b_{ij} > 0$ iff $i \sim j$

Let $M = \mathrm{diag}(\xi_1^2, \ldots, \xi_n^2)$ and $B = (b_{ij})$ be such that $I_n - B$ is nonsingular. Then

Joint Specification:

$$Y \sim N_n\big(X\beta,\ (I_n - B)^{-1} M (I_n - B')^{-1}\big)$$

slide-14
SLIDE 14

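Particular Model:

  • $M = \sigma^2 I_n$
  • $B = \phi A$

so

$$Y \sim N_n\big(X\beta,\ \sigma^2((I_n - \phi A)^2)^{-1}\big)$$

Parameter space: $\Omega = \mathbb{R}^p \times (0, \infty) \times (\lambda_n^{-1}, \lambda_1^{-1})$, with $\lambda_1 \geq \ldots \geq \lambda_n$ the ordered eigenvalues of $A$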
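The covariance of the particular model follows from the general SAR joint specification in one step. The short derivation below assumes the neighbor relation is symmetric, so that $A' = A$; the slides do not state this explicitly.

$$
\begin{aligned}
(I_n - B)^{-1} M (I_n - B')^{-1}
&= (I_n - \phi A)^{-1}\,\sigma^2 I_n\,(I_n - \phi A')^{-1} \\
&= \sigma^2 (I_n - \phi A)^{-1}(I_n - \phi A)^{-1} \\
&= \sigma^2 \big((I_n - \phi A)^2\big)^{-1}.
\end{aligned}
$$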

slide-15
SLIDE 15

MODEL SELECTION

Let $M_1, M_2, \ldots, M_k$ be the candidate models ($k \geq 2$).

$M_j$ is either HCAR, WCAR, ACAR or SAR, parameterized by

$$\eta_j = (\beta, \sigma_j^2, \phi_j) \in \Omega_j$$

with covariance depending on $G_j$ and $A_j$, and $\phi_j \in (1/\lambda_n^{(j)}, 1/\lambda_1^{(j)})$, where $\lambda_1^{(j)} \geq \lambda_2^{(j)} \geq \ldots \geq \lambda_n^{(j)}$ are the eigenvalues of:

  • $A_j$ in case of HCAR, ACAR and SAR
  • $G_j^{1/2} A_j G_j^{1/2}$ in case of WCAR

The approach proposed here assumes all models have the same mean structure

slide-16
SLIDE 16

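Likelihood for $M_j$

$$L_j(\eta_j; y) = (2\pi\sigma_j^2)^{-n/2}\, |\Sigma_{\phi_j}^{-1}|^{1/2} \exp\Big\{-\frac{1}{2\sigma_j^2}(y - X\beta)'\Sigma_{\phi_j}^{-1}(y - X\beta)\Big\}$$

where

$$\Sigma_{\phi_j}^{-1} = \begin{cases}
I_n - \phi_j A_j & \text{for HCAR models} \\
G_j^{-1} - \phi_j A_j & \text{for WCAR models} \\
G_j^{-1} - \phi_j G_j^{-1/2} A_j G_j^{-1/2} & \text{for ACAR models} \\
(I_n - \phi_j A_j)^2 & \text{for SAR models}
\end{cases}$$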
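The four precision matrices and the likelihood can be coded almost verbatim. The sketch below works on the log scale and assumes φ lies in the admissible interval for the chosen class; the function names `precision` and `log_likelihood` and the three-site data are hypothetical illustrations, not values from the talk.

```python
import numpy as np

def precision(A, phi, kind):
    """Sigma_phi^{-1} for HCAR, WCAR, ACAR or SAR, with G = diag(1/|N_i|)."""
    A = np.asarray(A, dtype=float)
    I = np.eye(A.shape[0])
    counts = A.sum(axis=1)                        # |N_i|, the diagonal of G^{-1}
    if kind == "HCAR":
        return I - phi * A
    if kind == "WCAR":
        return np.diag(counts) - phi * A
    if kind == "ACAR":
        root = np.sqrt(counts)                    # diagonal of G^{-1/2}
        return np.diag(counts) - phi * (root[:, None] * A * root[None, :])
    if kind == "SAR":
        B = I - phi * A
        return B @ B                              # (I - phi A)^2, A symmetric
    raise ValueError(f"unknown model class: {kind}")

def log_likelihood(y, X, beta, sigma2, phi, A, kind):
    """log L_j(eta_j; y); assumes phi is inside the admissible interval."""
    Q = precision(A, phi, kind)
    sign, logdet = np.linalg.slogdet(Q)
    if sign <= 0:                                 # catches some, not all, invalid phi
        return -np.inf
    resid = y - X @ beta
    n = y.size
    return (-0.5 * n * np.log(2 * np.pi * sigma2)
            + 0.5 * logdet
            - 0.5 * (resid @ Q @ resid) / sigma2)

# Hypothetical data on three sites arranged in a line
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
X = np.ones((3, 1))
y = np.array([1.0, 2.0, 4.0])
beta = np.array([2.0])
ll = {kind: log_likelihood(y, X, beta, 1.5, 0.3, A, kind)
      for kind in ("HCAR", "WCAR", "ACAR", "SAR")}
```

Using `slogdet` rather than `det` keeps the determinant term stable when n is large, as in the 1412-county example.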

slide-17
SLIDE 17

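Prior for $M_j$

$$\pi(\eta_j \mid M_j) \propto \frac{\pi(\phi_j \mid M_j)}{\sigma_j^2}\, 1_{\Omega_j}(\eta_j)$$

Two options for $\pi(\phi_j \mid M_j)$:

  • Uniform:

$$\pi^{U}(\phi_j \mid M_j) = 1_{(1/\lambda_n^{(j)},\, 1/\lambda_1^{(j)})}(\phi_j)$$

  • Independence Jeffreys:

$$\pi^{J1}(\phi_j \mid M_j) = \Bigg\{\sum_{i=1}^{n}\Big(\frac{\lambda_i^{(j)}}{1 - \phi_j\lambda_i^{(j)}}\Big)^2 - \frac{1}{n}\Big[\sum_{i=1}^{n}\frac{\lambda_i^{(j)}}{1 - \phi_j\lambda_i^{(j)}}\Big]^2\Bigg\}^{1/2} 1_{(1/\lambda_n^{(j)},\, 1/\lambda_1^{(j)})}(\phi_j)$$

(De Oliveira & Song, 2008; De Oliveira, 2011)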
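Both priors on φ depend on the model only through the eigenvalues. The sketch below evaluates them for a given eigenvalue vector; the function names and the three-site example spectrum are illustrative assumptions. The quantity inside the square root is n times the empirical variance of the terms λ_i/(1 − φλ_i), so it is nonnegative up to rounding.

```python
import numpy as np

def uniform_prior(phi, lam):
    """Indicator of the interval (1/lambda_n, 1/lambda_1)."""
    lo, hi = 1.0 / lam.min(), 1.0 / lam.max()
    return 1.0 if lo < phi < hi else 0.0

def indep_jeffreys(phi, lam):
    """Independence Jeffreys prior pi^{J1}(phi), unnormalized, for eigenvalues lam."""
    lo, hi = 1.0 / lam.min(), 1.0 / lam.max()
    if not (lo < phi < hi):
        return 0.0
    u = lam / (1.0 - phi * lam)
    val = np.sum(u ** 2) - np.sum(u) ** 2 / lam.size
    return float(np.sqrt(max(val, 0.0)))          # guard against tiny negative rounding

# Hypothetical spectrum: eigenvalues of the three-site path neighborhood matrix
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
lam = np.linalg.eigvalsh(A)                       # -sqrt(2), 0, sqrt(2)

at_zero = indep_jeffreys(0.0, lam)
near_edge = indep_jeffreys(0.70, lam)             # grows sharply near 1/lambda_1
```

The growth of `near_edge` relative to `at_zero` reflects how much more mass the independence Jeffreys prior places near the boundaries than the uniform prior does.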

slide-18
SLIDE 18

[Figure (a): prior densities $\pi(\phi)$ plotted against $\phi$ over (−0.2, 0.2); legend: indep. Jeffreys, Jeffreys-rule, uniform]

slide-19
SLIDE 19

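Bayes Factors & Posterior Model Probabilities

$$\frac{\pi(M_i \mid y)}{\pi(M_j \mid y)} = \frac{m(y \mid M_i)\,\pi(M_i)}{m(y \mid M_j)\,\pi(M_j)} = B_{ij} \times \text{prior odds}_{ij}$$

where

$$m(y \mid M_j) = \int_{\Omega_j} L_j(\eta_j; y)\,\pi(\eta_j \mid M_j)\,d\eta_j, \quad \text{and} \quad B_{ij} = \frac{m(y \mid M_i)}{m(y \mid M_j)}$$

Hence

$$\pi(M_j \mid y) = \Bigg[\sum_{l=1}^{k}\frac{\pi(M_l)}{\pi(M_j)}B_{lj}\Bigg]^{-1}, \quad j = 1, \ldots, k$$

$$\phantom{\pi(M_j \mid y)} = \frac{m(y \mid M_j)}{\sum_{l=1}^{k} m(y \mid M_l)}, \quad \text{when } \pi(M_j) = \frac{1}{k}$$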
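In practice the marginal likelihoods are tiny numbers, so posterior model probabilities are best computed from their logarithms. A minimal sketch, assuming the values log m(y | M_j) are already available; the function names are hypothetical.

```python
import numpy as np

def posterior_model_probs(log_m, prior=None):
    """pi(M_j | y) from log marginal likelihoods; equal prior probabilities by default."""
    log_m = np.asarray(log_m, dtype=float)
    k = log_m.size
    prior = np.full(k, 1.0 / k) if prior is None else np.asarray(prior, dtype=float)
    log_w = log_m + np.log(prior)
    log_w -= log_w.max()                 # subtract the maximum to avoid underflow
    w = np.exp(log_w)
    return w / w.sum()

def bayes_factor(log_m, i, j):
    """B_ij = m(y | M_i) / m(y | M_j)."""
    return float(np.exp(log_m[i] - log_m[j]))

# Hypothetical log marginals for two models with m(y | M_1) : m(y | M_2) = 1 : 3
log_m = np.log([1.0, 3.0]) - 2000.0      # the common offset cancels
probs = posterior_model_probs(log_m)
B12 = bayes_factor(log_m, 0, 1)
```

Subtracting the largest log weight before exponentiating leaves the ratios unchanged, which is why the common offset of −2000 has no effect on the result.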

slide-20
SLIDE 20

Remarks

  • Bayes factors and posterior model probabilities are, in general, undetermined when improper priors are used
  • An important exception occurs when the competing models have the same invariance structure, up to individual model parameters that have proper priors (Berger et al., 1998)
  • CAR and SAR models fit this exception when all the competing models have the same mean structure and $\pi(\phi_j \mid M_j)$ is proper

slide-21
SLIDE 21

Fact

As $\phi_j \to 1/\lambda_i^{(j)}$, $i = 1$ or $n$,

$$\pi^{J1}(\phi_j \mid M_j) = O\big((1 - \phi_j\lambda_i^{(j)})^{-1}\big)$$

so $\pi^{J1}(\phi_j \mid M_j)$ is not integrable (De Oliveira & Song, 2008).

Instead we use $(\pi^{J1}(\phi_j \mid M_j))^r$, with $r < 1$, which is proper and has the same "shape".

slide-22
SLIDE 22

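For $j = 1, \ldots, k$:

$$m(y \mid M_j) = K c_j \int_{1/\lambda_n^{(j)}}^{1/\lambda_1^{(j)}} h(\phi_j, M_j, y)\,d\phi_j$$

where

$$h(\phi_j, M_j, y) = |\Sigma_{\phi_j}^{-1}|^{1/2}\,|X'\Sigma_{\phi_j}^{-1}X|^{-1/2}\,(S_{\phi_j}^2)^{-(n-p)/2}\,\pi(\phi_j \mid M_j)$$

$$S_{\phi_j}^2 = (y - X\hat{\beta}_{\phi_j})'\Sigma_{\phi_j}^{-1}(y - X\hat{\beta}_{\phi_j})$$

$$\hat{\beta}_{\phi_j} = (X'\Sigma_{\phi_j}^{-1}X)^{-1}X'\Sigma_{\phi_j}^{-1}y$$

$$K = \frac{\Gamma\big(\frac{n-p}{2}\big)}{\pi^{\frac{n-p}{2}}}, \quad c_j = \Bigg[\int_{1/\lambda_n^{(j)}}^{1/\lambda_1^{(j)}} \pi(\phi_j \mid M_j)\,d\phi_j\Bigg]^{-1}$$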
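The integrand h and the constant K are easiest to handle on the log scale. The sketch below evaluates log h for a given precision matrix Σ_φ⁻¹ (passed in as `Q`) and log K from n and p. The function names and the toy data are hypothetical, and the prior enters through an optional `log_prior` argument that defaults to a flat contribution.

```python
import math
import numpy as np

def log_K(n, p):
    """log of K = Gamma((n - p)/2) / pi^((n - p)/2)."""
    return math.lgamma((n - p) / 2) - (n - p) / 2 * math.log(math.pi)

def log_h(Q, X, y, log_prior=0.0):
    """log h(phi, M, y) for precision Q = Sigma_phi^{-1} and log prior value log_prior."""
    n, p = X.shape
    _, logdet_Q = np.linalg.slogdet(Q)
    XtQ = X.T @ Q
    XtQX = XtQ @ X
    beta_hat = np.linalg.solve(XtQX, XtQ @ y)     # generalized least squares estimate
    resid = y - X @ beta_hat
    S2 = resid @ Q @ resid                        # S^2_phi
    _, logdet_XtQX = np.linalg.slogdet(XtQX)
    return (0.5 * logdet_Q - 0.5 * logdet_XtQX
            - 0.5 * (n - p) * math.log(S2) + log_prior)

# Hypothetical check with independent errors (Q = I) and a constant mean
X = np.ones((4, 1))
y = np.array([1.0, 2.0, 3.0, 4.0])
value = log_h(np.eye(4), X, y)
```

With Q = I and a constant mean, S² is the usual residual sum of squares (here 5), so `value` reduces to −½ log 4 − (3/2) log 5.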

slide-23
SLIDE 23

Note

  • For posterior model probabilities to be well defined and calibrated, the proportionality constants in the likelihoods and priors of all competing models should be retained
  • Computation of $m(y \mid M_j)$ involves one-dimensional integration over a bounded interval

slide-24
SLIDE 24

Computation

  • Computation of $\hat{c}_j$ is straightforward: numerical quadrature or Monte Carlo

$$\hat{c}_j = \Bigg[\Big(\frac{1}{\lambda_1^{(j)}} - \frac{1}{\lambda_n^{(j)}}\Big)\frac{1}{m}\sum_{l=1}^{m}\big(\pi^{J1}(\phi_j^{(l)} \mid M_j)\big)^{1/2}\Bigg]^{-1}$$

with $\phi_j^{(1)}, \ldots, \phi_j^{(m)} \overset{iid}{\sim} \mathrm{unif}(1/\lambda_n^{(j)}, 1/\lambda_1^{(j)})$

  • Computation of $m(y \mid M_j)$ requires more care: $h(\phi_j, M_j, y)$ is highly peaked and concentrated near the right boundary for moderate or large sample sizes. Hence it is almost constant and very close to zero over most of the integration region, and common numerical quadrature or Monte Carlo estimates are often zero.
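The Monte Carlo estimate of ĉ_j in the first bullet can be sketched as follows. The code draws φ uniformly on the admissible interval and averages the modified independence Jeffreys prior with exponent r = 1/2; the function names, the default sample size, the seed and the three-site example spectrum are illustrative assumptions. Setting r = 0 recovers the uniform prior, for which the constant is exactly the reciprocal of the interval length.

```python
import numpy as np

def indep_jeffreys(phi, lam):
    """pi^{J1}(phi), unnormalized, vectorized over an array of phi values."""
    phi = np.atleast_1d(phi)[:, None]
    u = lam[None, :] / (1.0 - phi * lam[None, :])
    val = (u ** 2).sum(axis=1) - u.sum(axis=1) ** 2 / lam.size
    return np.sqrt(np.clip(val, 0.0, None))

def c_hat(lam, m=200_000, r=0.5, seed=0):
    """Monte Carlo estimate of c_j = 1 / integral of (pi^{J1})^r over the interval."""
    rng = np.random.default_rng(seed)
    lo, hi = 1.0 / lam.min(), 1.0 / lam.max()
    phi = rng.uniform(lo, hi, size=m)             # iid unif(1/lambda_n, 1/lambda_1)
    return 1.0 / ((hi - lo) * np.mean(indep_jeffreys(phi, lam) ** r))

# Hypothetical spectrum: eigenvalues of the three-site path neighborhood matrix
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
lam = np.linalg.eigvalsh(A)

c_jeffreys = c_hat(lam)                           # modified independence Jeffreys, r = 1/2
c_uniform = c_hat(lam, r=0.0)                     # uniform prior: 1 / (hi - lo)
```

The integrand is unbounded at both endpoints, so the Monte Carlo average converges slowly; a quadrature rule that handles endpoint singularities is a reasonable alternative.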
slide-25
SLIDE 25

[Figure: six panels, each plotting $\pi(\phi \mid y)$ against $\phi$ over (−0.3, 0.3)]

slide-26
SLIDE 26

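A Solution (Importance Sampling)

Let $\tilde{\phi}_j$ be the value that maximizes $h(\phi_j, M_j, y)$, $t \in [3, 4]$ and $\omega_j = (1/\lambda_1^{(j)} - \tilde{\phi}_j)/t$. Then

$$\hat{m}(y \mid M_j) = \Bigg[\Phi(t) - \Phi\Bigg(\frac{t\,(1/\lambda_n^{(j)} - \tilde{\phi}_j)}{1/\lambda_1^{(j)} - \tilde{\phi}_j}\Bigg)\Bigg]\frac{\sqrt{2\pi}\,K c_j\,\omega_j}{m}\sum_{l=1}^{m}\frac{h(\phi_j^{(l)}, M_j, y)}{\exp\{-(\phi_j^{(l)} - \tilde{\phi}_j)^2/2\omega_j^2\}}$$

where $\phi_j^{(1)}, \ldots, \phi_j^{(m)} \overset{iid}{\sim} N(\tilde{\phi}_j, \omega_j^2)$ truncated to $(1/\lambda_n^{(j)}, 1/\lambda_1^{(j)})$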
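The estimator above can be sketched generically for any log-scale integrand. The code below estimates the integral of h over the interval; multiplying the result by K c_j gives m̂(y | M_j). The function names, the rejection sampler for the truncated normal, and the toy integrand are assumptions made for illustration. The toy integrand is a narrow Gaussian bump near the right boundary whose integral is known, standing in for the real h.

```python
import numpy as np
from statistics import NormalDist

def truncated_normal(rng, mean, sd, lo, hi, size):
    """Draws from N(mean, sd^2) truncated to (lo, hi) by simple rejection."""
    out = np.empty(0)
    while out.size < size:
        draw = rng.normal(mean, sd, size=size)
        out = np.concatenate([out, draw[(draw > lo) & (draw < hi)]])
    return out[:size]

def integral_is(log_h, phi_tilde, lo, hi, t=3.5, m=50_000, seed=0):
    """Importance sampling estimate of the integral of h over (lo, hi)."""
    rng = np.random.default_rng(seed)
    omega = (hi - phi_tilde) / t                   # right boundary sits t sd's from the mode
    phi = truncated_normal(rng, phi_tilde, omega, lo, hi, m)
    Phi = NormalDist().cdf
    mass = Phi(t) - Phi(t * (lo - phi_tilde) / (hi - phi_tilde))
    # log of h(phi) / exp{-(phi - phi_tilde)^2 / (2 omega^2)}
    log_ratio = log_h(phi) + 0.5 * ((phi - phi_tilde) / omega) ** 2
    shift = log_ratio.max()                        # stabilize the exponentials
    mean_ratio = np.exp(shift) * np.mean(np.exp(log_ratio - shift))
    return mass * np.sqrt(2.0 * np.pi) * omega * mean_ratio

# Hypothetical integrand: Gaussian bump at 0.2 with sd 0.01 on (-0.25, 0.25)
toy_log_h = lambda phi: -0.5 * ((phi - 0.2) / 0.01) ** 2
estimate = integral_is(toy_log_h, phi_tilde=0.2, lo=-0.25, hi=0.25)
exact = 0.01 * np.sqrt(2.0 * np.pi)                # up to a negligible truncation error
```

Centering the proposal at the mode and tying its spread to the distance from the right boundary keeps most draws where h is non-negligible, which is the situation a plain uniform Monte Carlo estimate handles poorly.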

slide-27
SLIDE 27

Example 1: Phosphate Data

  • Data were transformed to become closer to Gaussian
  • HCAR, WCAR, ACAR and SAR models as competing models
  • First and second order neighborhood systems were entertained
  • $E\{\tilde{Y}_i\}$ is $\beta_1$ ($p = 1$) or $\beta_1 + \beta_2 s_{i1} + \beta_3 s_{i2}$ ($p = 3$)

  • All models equally likely a priori
  • Both default priors were considered
slide-28
SLIDE 28

Results (posterior model probabilities)

models    HCAR-1  HCAR-2       WCAR-1  WCAR-2       ACAR-1  ACAR-2       SAR-1  SAR-2
modified independence Jeffreys prior
p = 1     0.099   2.2 × 10^-8  0.321   4.0 × 10^-8  0.443   5.1 × 10^-8  0.136  1.3 × 10^-5
p = 3     0.130   7.6 × 10^-8  0.249   9.2 × 10^-8  0.488   1.2 × 10^-7  0.132  1.9 × 10^-5
uniform prior
p = 1     0.085   4.3 × 10^-7  0.295   6.6 × 10^-7  0.416   6.6 × 10^-7  0.203  1.5 × 10^-5
p = 3     0.148   6.3 × 10^-7  0.221   1.6 × 10^-9  0.443   8.7 × 10^-7  0.186  2.1 × 10^-5

slide-29
SLIDE 29

Example 2: Crime Data

  • Significant explanatory variables: an index of resource deprivation, an index of population structure, median age, divorce rate and unemployment rate
  • HCAR, WCAR, ACAR and SAR models as competing models
  • Consider the adjacency neighborhood system (AC), and two distance-based neighborhood systems with r = 70 miles (D70) and r = 100 miles (D100)

  • All models equally likely a priori
  • Both default priors were considered
slide-30
SLIDE 30

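Results

models: HCAR, WCAR, ACAR, SAR

[Table: blank cells were lost in extraction; the column positions of the remaining entries are not recoverable]

modified independence Jeffreys prior
AC      4.2 × 10^-6
D70     0.857   0.065
D100    3.0 × 10^-3   0.074
uniform prior
AC      3.6 × 10^-6
D70     0.822   0.074
D100    3.4 × 10^-3   0.100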

slide-31
SLIDE 31

Conclusions

⊕ Method does not require nested competing models
⊕ Method provides interpretable measures of how strongly the data support each competing model
⊕ Method does not require assessing subjective priors for model parameters
⊖ Method requires all competing models to have the same mean structure