Bayesian inference & process convolution models. Dave Higdon, Statistical Sciences Group, LANL (PowerPoint presentation)



SLIDE 1

Bayesian inference & process convolution models
Dave Higdon, Statistical Sciences Group, LANL

SLIDE 2

MOVING AVERAGE SPATIAL MODELS

SLIDE 3

Kernel basis representation for spatial processes z(s)

Define m basis functions k_1(s), ..., k_m(s).

[Figure: the m basis functions plotted over s]

Here k_j(s) is a normal density centered at spatial location ω_j:

k_j(s) = (1/√(2π)) exp{ −½ (s − ω_j)² }

Set z(s) = Σ_{j=1}^m k_j(s) x_j, where x ∼ N(0, I_m).

Can represent z = (z(s_1), ..., z(s_n))^T as z = Kx, where K_ij = k_j(s_i).
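The representation above is easy to sketch in code. A minimal NumPy version (the knot layout, kernel sd, and number of sites below are illustrative choices, not the slide's exact values):

```python
import numpy as np

def make_K(s, omega, sd=1.0):
    """K[i, j] = k_j(s_i): normal density kernel centered at knot omega_j."""
    d = s[:, None] - omega[None, :]
    return np.exp(-0.5 * (d / sd) ** 2) / (np.sqrt(2 * np.pi) * sd)

rng = np.random.default_rng(0)
omega = np.linspace(0, 12, 7)        # m = 7 knot locations (illustrative)
s = np.linspace(2, 10, 50)           # n = 50 evaluation sites
K = make_K(s, omega)
x = rng.standard_normal(omega.size)  # x ~ N(0, I_m)
z = K @ x                            # discrete representation: z = Kx
```

Evaluating `make_K` on a fine grid of s values and plotting the columns reproduces the basis-function picture on the slide.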

SLIDE 4

x and k(s) determine the spatial process z(s)

[Figure: left panel shows the weighted bases k_j(s)x_j; right panel shows the resulting z(s)]

Continuous representation: z(s) = Σ_{j=1}^m k_j(s) x_j, where x ∼ N(0, I_m).

Discrete representation: for z = (z(s_1), ..., z(s_n))^T, z = Kx, where K_ij = k_j(s_i).

SLIDE 5

Estimate z(s) by specifying k_j(s) and estimating x

[Figure: fitted bases k_j(s)x_j and the resulting fit to data y(s)]

Discrete representation: for z = (z(s_1), ..., z(s_n))^T, z = Kx, where K_ij = k_j(s_i)
⇒ standard regression model: y = Kx + ε.

SLIDE 6

Formulation for the 1-d example

Data y = (y(s_1), ..., y(s_n))^T observed at locations s_1, ..., s_n. Once the knot locations ω_j, j = 1, ..., m, and kernel choice k(s) are specified, the remaining model formulation is trivial:

Likelihood:
L(y | x, λ_y) ∝ λ_y^{n/2} exp{ −½ λ_y (y − Kx)^T (y − Kx) }
where K_ij = k(ω_j − s_i).

Priors:
π(x | λ_x) ∝ λ_x^{m/2} exp{ −½ λ_x x^T x }
π(λ_x) ∝ λ_x^{a_x − 1} exp{ −b_x λ_x }
π(λ_y) ∝ λ_y^{a_y − 1} exp{ −b_y λ_y }

Posterior:
π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n/2 − 1} exp{ −λ_y [b_y + .5 (y − Kx)^T (y − Kx)] }
  × λ_x^{a_x + m/2 − 1} exp{ −λ_x [b_x + .5 x^T x] }
SLIDE 7

Posterior and full conditionals

Posterior:
π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n/2 − 1} exp{ −λ_y [b_y + .5 (y − Kx)^T (y − Kx)] }
  × λ_x^{a_x + m/2 − 1} exp{ −λ_x [b_x + .5 x^T x] }

Full conditionals:
π(x | ···) ∝ exp{ −½ [λ_y x^T K^T K x − 2 λ_y x^T K^T y + λ_x x^T x] }
π(λ_x | ···) ∝ λ_x^{a_x + m/2 − 1} exp{ −λ_x [b_x + .5 x^T x] }
π(λ_y | ···) ∝ λ_y^{a_y + n/2 − 1} exp{ −λ_y [b_y + .5 (y − Kx)^T (y − Kx)] }

Gibbs sampler implementation:
x | ··· ∼ N( (λ_y K^T K + λ_x I_m)^{−1} λ_y K^T y, (λ_y K^T K + λ_x I_m)^{−1} )
λ_x | ··· ∼ Γ( a_x + m/2, b_x + .5 x^T x )
λ_y | ··· ∼ Γ( a_y + n/2, b_y + .5 (y − Kx)^T (y − Kx) )
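These three conditional draws can be sketched directly in NumPy. The data below are synthetic and the hyperparameter values (a_x, b_x, a_y, b_y) are illustrative vague choices, not the slide's:

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic setup: Gaussian-kernel K, data y = K x_true + noise
omega = np.linspace(-2, 12, 20)                     # m = 20 knots
s = np.linspace(0, 10, 18)                          # n = 18 sites
K = np.exp(-0.5 * ((s[:, None] - omega[None, :]) / 2.0) ** 2)
m, n = omega.size, s.size
y = K @ rng.standard_normal(m) + 0.1 * rng.standard_normal(n)

ax, bx, ay, by = 1.0, 0.001, 1.0, 0.001             # vague Gamma priors (illustrative)
lam_x, lam_y = 1.0, 1.0
x = np.zeros(m)
draws = []
for it in range(500):
    # x | ... ~ N((lam_y K'K + lam_x I)^-1 lam_y K'y, (lam_y K'K + lam_x I)^-1)
    prec = lam_y * K.T @ K + lam_x * np.eye(m)
    L = np.linalg.cholesky(prec)
    mean = np.linalg.solve(prec, lam_y * K.T @ y)
    x = mean + np.linalg.solve(L.T, rng.standard_normal(m))
    # lam_x | ... ~ Gamma(ax + m/2, bx + .5 x'x)   (NumPy uses the scale parameterization)
    lam_x = rng.gamma(ax + m / 2, 1.0 / (bx + 0.5 * x @ x))
    # lam_y | ... ~ Gamma(ay + n/2, by + .5 (y - Kx)'(y - Kx))
    r = y - K @ x
    lam_y = rng.gamma(ay + n / 2, 1.0 / (by + 0.5 * r @ r))
    draws.append((x.copy(), lam_x, lam_y))

# posterior mean of z = Kx, discarding the first 100 draws as burn-in
z_mean = np.mean([K @ d[0] for d in draws[100:]], axis=0)
```

Drawing x through the Cholesky factor of the conditional precision avoids forming the covariance matrix explicitly.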

SLIDE 8

Gibbs sampler: intuition

Gibbs sampler for a bivariate normal density:
π(z) = π(z_1, z_2) ∝ |Σ|^{−1/2} exp{ −½ (z_1, z_2) Σ^{−1} (z_1, z_2)^T },  where Σ = [[1, ρ], [ρ, 1]]

Full conditionals of π(z):
z_1 | z_2 ∼ N(ρ z_2, 1 − ρ²)
z_2 | z_1 ∼ N(ρ z_1, 1 − ρ²)

• initialize the chain with z⁰ ∼ N(0, Σ)
• draw z¹_1 ∼ N(ρ z⁰_2, 1 − ρ²); now (z¹_1, z⁰_2)^T ∼ π(z)

[Figure: successive Gibbs moves z⁰ = (z⁰_1, z⁰_2) → (z¹_1, z⁰_2) → (z¹_1, z¹_2) = z¹ in the (z_1, z_2) plane]

SLIDE 9

Gibbs sampler: intuition

The Gibbs sampler gives z⁰, z¹, ..., z^T, which can be treated as dependent draws from π(z). If z⁰ is not a draw from π(z), then the initial realizations will not have the correct distribution. In practice, the first 100?, 1000? realizations are discarded. The draws can be used to make inference about π(z):

• Posterior mean of z estimated by (μ̂_1, μ̂_2)^T = (1/T) Σ_{k=1}^T (z^k_1, z^k_2)^T
• Posterior probabilities:
  P(z_1 > 1) ≈ (1/T) Σ_{k=1}^T I[z^k_1 > 1]
  P(z_1 > z_2) ≈ (1/T) Σ_{k=1}^T I[z^k_1 > z^k_2]
• 90% interval: (z_1^[5%], z_1^[95%]), the empirical 5th and 95th percentiles of the draws

[Figure: scatter of Gibbs draws in the (z_1, z_2) plane]
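The bivariate-normal Gibbs sampler and the Monte Carlo summaries above fit in a few lines of NumPy (ρ and the chain length are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.8
T = 20000
z = np.zeros(2)                      # initialize the chain at (0, 0)
draws = np.empty((T, 2))
for k in range(T):
    # full conditionals: z1|z2 ~ N(rho z2, 1-rho^2), z2|z1 ~ N(rho z1, 1-rho^2)
    z[0] = rng.normal(rho * z[1], np.sqrt(1 - rho ** 2))
    z[1] = rng.normal(rho * z[0], np.sqrt(1 - rho ** 2))
    draws[k] = z
draws = draws[1000:]                 # discard burn-in

mu_hat = draws.mean(axis=0)                     # estimate of the posterior mean
p_z1_gt_1 = (draws[:, 0] > 1).mean()            # estimate of P(z1 > 1)
lo, hi = np.percentile(draws[:, 0], [5, 95])    # empirical 90% interval for z1
```

For this target P(z_1 > 1) = 1 − Φ(1) ≈ 0.159, so the indicator average gives a quick sanity check on the chain.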

SLIDE 10

1-d example

m = 6 knots evenly spaced between −.3 and 1.2. n = 5 data points at s = .05, .25, .52, .65, .91. k(s) is N(0, sd = .3). a_y = 10, b_y = 10·(.25²) ⇒ strong prior at λ_y = 1/.25²; a_x = 1, b_x = .001.

[Figure: data y(s); basis functions k_j(s); MCMC traces of the x_j's and of λ_x, λ_y over 1000 iterations]

SLIDE 11

1-d example

From posterior realizations of the knot weights x, one can construct posterior realizations of the smooth fitted function z(s) = Σ_{j=1}^m k_j(s) x_j.

Note the strong prior on λ_y is required since n is small.

[Figure: posterior realizations of z(s); posterior mean and pointwise 90% bounds, overlaid on the data y(s)]

SLIDE 12

1-d example

m = 20 knots evenly spaced between −2 and 12. n = 18 data points evenly spaced between 0 and 10. k(s) is N(0, sd = 2).

[Figure: data; posterior mean and 80% CI for z(s); posterior distributions for λ_y and λ_x]

SLIDE 13

1-d example

m = 20 knots evenly spaced between −2 and 12. n = 18 data points evenly spaced between 0 and 10. k(s) is N(0, sd = 1).

[Figure: data; posterior mean and 80% CI for z(s); posterior distributions for λ_y and λ_x]

SLIDE 14

Basis representations for spatial processes z(s)

Represent z(s) at spatial locations s_1, ..., s_n: z = (z(s_1), ..., z(s_n))^T ∼ N(0, Σ_z). Recall z = Kx, where K K^T = Σ_z and x ∼ N(0, I_n). This gives a discrete representation of z(s) at the locations s_1, ..., s_n.

Discrete representation: z(s_i) = Σ_{j=1}^n K_ij x_j
Continuous representation: z(s) = Σ_{j=1}^n k_j(s) x_j

[Figure: discrete and continuous basis functions over s]

Columns of K give basis functions. Can use a subset of these basis functions to reduce dimensionality.

SLIDE 15

Decompositions of Σ_z: basis functions and covariance

[Figure: basis functions and resulting covariance over location s for three decompositions: Cholesky (with pivoting), SVD, and kernels]

SLIDE 16

How many basis kernels?

Define m basis functions k_1(s), ..., k_m(s).

[Figure: basis functions and implied covariance for m = 20, m = 10, and m = 6]

SLIDE 17

Moving average specifications for spatial models z(s)

• smoothing kernel k(s)
• white noise process x = (x_1, ..., x_m)^T at spatial locations ω_1, ..., ω_m, with x ∼ N(0, (1/λ_x) I_m)
• spatial process z(s) constructed by convolving x with the smoothing kernel k(s):
  z(s) = Σ_{j=1}^m x_j k(ω_j − s)

⇒ z(s) is a Gaussian process with mean 0 and covariance given by
Cov(z(s), z(s′)) = (1/λ_x) Σ_{j=1}^m k(ω_j − s) k(ω_j − s′)
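The covariance formula above can be checked numerically: build the implied covariance from the kernel sum and compare it to the sample covariance of simulated fields z = Kx. A sketch (knot density, kernel sd, and λ_x are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
omega = np.linspace(-2, 12, 40)          # dense knot grid (illustrative)
sd, lam_x = 1.0, 1.0

def k(u):                                # normal smoothing kernel
    return np.exp(-0.5 * (u / sd) ** 2) / (np.sqrt(2 * np.pi) * sd)

s = np.array([3.0, 3.5, 6.0])            # three locations to compare
Ks = k(omega[None, :] - s[:, None])      # Ks[i, j] = k(omega_j - s_i)
C = Ks @ Ks.T / lam_x                    # implied covariance: (1/lam_x) sum_j k(.)k(.)

# Monte Carlo check: z = Ks x with x ~ N(0, I/lam_x)
X = rng.standard_normal((100000, omega.size)) / np.sqrt(lam_x)
Z = X @ Ks.T
C_mc = np.cov(Z.T)
```

With enough simulated fields, `C_mc` agrees with `C` entrywise, confirming that the kernel sum is the process covariance.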

SLIDE 18

[Figure-only slide]

SLIDE 19

Example: constructing 1-d models for z(s)

[Figure: four frames showing z(s) and x(s) over s for different kernel choices]

m = 20 knot locations ω_1, ..., ω_m equally spaced between −2 and 12. x = (x_1, ..., x_m)^T ∼ N(0, I_m). z(s) = Σ_{k=1}^m k(ω_k − s) x_k.

k(s) is a normal density with sd = 1/4, 1/2, and 1; the 4th frame uses k(s) = 1.4 exp{−1.4|s|}.

General points:
• smooth kernels required
• knot spacing depends on kernel width: knot spacing ≤ 1 sd for normal k(s)
• kernel width is equivalent to the scale parameter in GP models

SLIDE 20

[Figure-only slide]

SLIDE 21

MRF formulation for the 1-d example

Data y = (y(s_1), ..., y(s_n))^T observed at locations s_1, ..., s_n. Once the knot locations ω_j, j = 1, ..., m, and kernel choice k(s) are specified, the remaining model formulation is trivial:

Likelihood:
L(y | x, λ_y) ∝ λ_y^{n/2} exp{ −½ λ_y (y − Kx)^T (y − Kx) }
where K_ij = k(ω_j − s_i).

Priors:
π(x | λ_x) ∝ λ_x^{m/2} exp{ −½ λ_x x^T W x }
π(λ_x) ∝ λ_x^{a_x − 1} exp{ −b_x λ_x }
π(λ_y) ∝ λ_y^{a_y − 1} exp{ −b_y λ_y }

Posterior:
π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n/2 − 1} exp{ −λ_y [b_y + .5 (y − Kx)^T (y − Kx)] }
  × λ_x^{a_x + m/2 − 1} exp{ −λ_x [b_x + .5 x^T W x] }
SLIDE 22

Posterior and full conditionals (MRF formulation)

Posterior:
π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n/2 − 1} exp{ −λ_y [b_y + .5 (y − Kx)^T (y − Kx)] }
  × λ_x^{a_x + m/2 − 1} exp{ −λ_x [b_x + .5 x^T W x] }

Full conditionals:
π(x | ···) ∝ exp{ −½ [λ_y x^T K^T K x − 2 λ_y x^T K^T y + λ_x x^T W x] }
π(λ_x | ···) ∝ λ_x^{a_x + m/2 − 1} exp{ −λ_x [b_x + .5 x^T W x] }
π(λ_y | ···) ∝ λ_y^{a_y + n/2 − 1} exp{ −λ_y [b_y + .5 (y − Kx)^T (y − Kx)] }

Gibbs sampler implementation:
x | ··· ∼ N( (λ_y K^T K + λ_x W)^{−1} λ_y K^T y, (λ_y K^T K + λ_x W)^{−1} )
λ_x | ··· ∼ Γ( a_x + m/2, b_x + .5 x^T W x )
λ_y | ··· ∼ Γ( a_y + n/2, b_y + .5 (y − Kx)^T (y − Kx) )

SLIDE 23

1-d example: using an MRF prior for x

[Figure: four rows comparing (a) iid x's, skinny k(s); (b) random-walk x's, skinny k(s); (c) iid x's, fat k(s); (d) GP model. Each row shows the estimated x and Kx, and the posterior mean & 80% CI for the estimated process z(s) over spatial location s]

SLIDE 24

8-hour max for ozone on a summer day in the Eastern US

[Figure: ozone measurements over the Eastern US; color scale 20–100]

n = 510 ozone measurements. The ω_k's are laid out on a hexagonal lattice as shown. k(s) is circular normal, with sd = lattice spacing.

Choice of width for k(s):
• could look at the empirical variogram
• could estimate using ML or cross-validation
• could treat as an additional parameter in the posterior

SLIDE 25

A multiresolution spatial model formulation

z(s) = z_coarse(s) + z_fine(s)

[Figure: coarse and fine hexagonal knot grids over the ozone data; color scale 20–100]

Coarse process: m_c = 27 locations ω^c_1, ..., ω^c_{m_c} on a hexagonal grid. x_c = (x_{c1}, ..., x_{c m_c})^T ∼ N(0, (1/λ_c) I_{m_c}). The coarse smoothing kernel k_c(s) is normal with sd = coarse grid spacing.

Fine process: m_f = 87 locations ω^f_1, ..., ω^f_{m_f} on a hexagonal grid. x_f = (x_{f1}, ..., x_{f m_f})^T ∼ N(0, (1/λ_f) I_{m_f}). The fine smoothing kernel k_f(s) is normal with sd = fine grid spacing.

Note: the coarse kernel width is twice the fine kernel width.

SLIDE 26

Multiresolution formulation and full conditionals

Model: y = K_c x_c + K_f x_f + ε = Kx + ε, where K = (K_c  K_f) and x = (x_c^T, x_f^T)^T.

Define W_x = diag(λ_c I_{m_c}, λ_f I_{m_f}).

The Gibbs sampler implementation then becomes:
x | ··· ∼ N( (λ_y K^T K + W_x)^{−1} λ_y K^T y, (λ_y K^T K + W_x)^{−1} )
λ_c | ··· ∼ Γ( a_x + m_c/2, b_x + .5 x_c^T x_c )
λ_f | ··· ∼ Γ( a_x + m_f/2, b_x + .5 x_f^T x_f )
λ_y | ··· ∼ Γ( a_y + n/2, b_y + .5 (y − Kx)^T (y − Kx) )

SLIDE 27

Multiresolution model for 8-hour max ozone

[Figure: fitted surfaces under the coarse formulation and the coarse + fine formulation; color scale 20–100]

SLIDE 28

Basic binary classification example

[Figure: data and true field over a 20 × 20 region]

• binary spatial process z*(s)
• spatial area partitioned into two regions: z*(s) = 1 and z*(s) = 0
• n = 10 measurements y = (y_1, ..., y_n)^T taken at spatial locations s_1, ..., s_n:
  y_i = z*(s_i) + ε_i, where ε_i iid ∼ N(0, 1), i = 1, ..., n

SLIDE 29

Constructing a binary spatial process z*(s)

[Figure: knot values x over (s_1, s_2); the smoothing kernel k(s); the convolved surface z(s); and the clipped binary field z*(s)]

x convolved with k(s) gives z(s): x ∼ N(0, I_m), where each x_j is located at ω_j, and

z(s) = Σ_{j=1}^m x_j k(s − ω_j)

Now define the binary field z*(s) by "clipping" z(s): z*(s) = I[z(s) > 0].
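A clipped Gaussian field of this form can be generated in a few lines. The grid size, kernel sd, and knot layout below are illustrative, not the slide's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(4)
sd = 3.0                                          # circular normal kernel sd (illustrative)

# m = 5 x 5 grid of knots over a 20 x 20 region
gx = np.linspace(2, 18, 5)
om = np.array([(a, b) for a in gx for b in gx])   # knot locations omega_j
x = rng.standard_normal(om.shape[0])              # x ~ N(0, I_m)

def z(s):
    """z(s) = sum_j x_j k(s - omega_j), with k a circular normal density."""
    d2 = ((s[None, :] - om) ** 2).sum(axis=1)
    kk = np.exp(-0.5 * d2 / sd ** 2) / (2 * np.pi * sd ** 2)
    return kk @ x

def z_star(s):
    """binary field by clipping: z*(s) = I[z(s) > 0]"""
    return float(z(s) > 0)

# evaluate the clipped field on a grid of spatial locations
grid = np.array([[z_star(np.array([a, b], dtype=float))
                  for b in range(1, 20)] for a in range(1, 20)])
```

Because z(s) is smooth, the clipped field partitions the region into contiguous 0/1 patches, as in the slide's picture.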

SLIDE 30

Model Formulation

Define n = 10 observations y = (y(s_1), ..., y(s_n))^T. Define z* = (z*(s_1), ..., z*(s_n))^T. Define x = (x(ω_1), ..., x(ω_m))^T to be the m = 25-vector of white noise knot values at spatial grid sites ω_1, ..., ω_m. Recall that z*(s), and hence the vector z*, is determined by the knot values x.

Likelihood: L(y | z*) ∝ exp{ −½ (y − z*)^T (y − z*) }
Independent normal prior for x: π(x) ∝ exp{ −½ x^T x }
Posterior distribution: π(x | y) ∝ exp{ −½ (y − z*)^T (y − z*) − ½ x^T x }

SLIDE 31

Sampling the posterior via Metropolis

Full conditional distributions:
π(x_j | x_{−j}, y) ∝ exp{ −½ (y − z*)^T (y − z*) − ½ x_j² }, j = 1, ..., m

Metropolis implementation for sampling from π(x | y): initialize x at x = 0, then cycle through the full conditionals, updating each x_j according to the Metropolis rules:
• generate a proposal x*_j ∼ U[x_j − r, x_j + r]
• compute the acceptance probability α = min{ 1, π(x*_j | x_{−j}, y) / π(x_j | x_{−j}, y) }
• update x_j to the new value: x_j^new = x*_j with probability α; x_j with probability 1 − α

Here we ran for T = 1000 scans, giving realizations x¹, ..., x^T from the posterior. Discarded the first 100 for burn-in.

Note: the proposal width r is tuned so that x*_j is accepted about half the time.
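The single-site Metropolis scheme above can be sketched for a scaled-down version of the clipped-field posterior. Everything below (grid sizes, kernel sd, proposal width r, synthetic data) is an illustrative stand-in for the slide's setup:

```python
import numpy as np

rng = np.random.default_rng(5)

# tiny synthetic clipped-field model: 3 x 3 knots, 10 observation sites
om = np.array([(a, b) for a in (5.0, 10.0, 15.0) for b in (5.0, 10.0, 15.0)])
m = om.shape[0]
s_obs = rng.uniform(2, 18, size=(10, 2))
Kobs = np.exp(-0.5 * ((s_obs[:, None, :] - om[None, :, :]) ** 2).sum(-1) / 3.0 ** 2)
x_true = rng.standard_normal(m)
y = (Kobs @ x_true > 0).astype(float) + rng.standard_normal(10)

def log_post(x):
    """log pi(x|y) up to a constant: Gaussian likelihood on z*, N(0, I) prior on x."""
    zstar = (Kobs @ x > 0).astype(float)          # clipped field at the data sites
    return -0.5 * ((y - zstar) ** 2).sum() - 0.5 * (x ** 2).sum()

x = np.zeros(m)                                   # initialize at x = 0
r = 2.0                                           # proposal half-width (would be tuned)
acc = 0
for scan in range(1000):                          # T = 1000 scans
    for j in range(m):                            # cycle through the components
        xp = x.copy()
        xp[j] = rng.uniform(x[j] - r, x[j] + r)   # x*_j ~ U[x_j - r, x_j + r]
        # accept with probability alpha = min{1, pi(x*)/pi(x)}
        if np.log(rng.uniform()) < log_post(xp) - log_post(x):
            x = xp
            acc += 1
rate = acc / (1000 * m)
```

In practice one would monitor `rate` and adjust `r` toward roughly 50% acceptance, as the slide notes.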

SLIDE 32

Sampling from non-standard multivariate distributions

Nick Metropolis, computing pioneer at Los Alamos National Laboratory:
• inventor of the Monte Carlo method
• inventor of Markov chain Monte Carlo: "Equation of State Calculations by Fast Computing Machines" (1953) by N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller and E. Teller, Journal of Chemical Physics; originally implemented on the MANIAC1 computer at LANL.

The algorithm constructs a Markov chain whose realizations are draws from the target (posterior) distribution, by constructing steps that maintain detailed balance.

SLIDE 33

Gibbs sampling and Metropolis for a bivariate normal density

π(z_1, z_2) ∝ |Σ|^{−1/2} exp{ −½ (z_1, z_2) Σ^{−1} (z_1, z_2)^T },  where Σ = [[1, ρ], [ρ, 1]]

Gibbs (also called heat bath): sampling from the full conditionals
z_1 | z_2 ∼ N(ρ z_2, 1 − ρ²)
z_2 | z_1 ∼ N(ρ z_1, 1 − ρ²)

Metropolis updating:
• generate z*_1 ∼ U[z_1 − r, z_1 + r]
• calculate α = min{ 1, π(z*_1, z_2)/π(z_1, z_2) } = min{ 1, π(z*_1 | z_2)/π(z_1 | z_2) }
• set z_1^new = z*_1 with probability α; z_1 with probability 1 − α

SLIDE 34

Posterior realizations of z*(s) = I[z(s) > 0]

[Figure: posterior realizations of the binary field]

SLIDE 35

Posterior mean of z*(s) = I[z(s) > 0]

[Figure: data, true field, and posterior mean over the 20 × 20 region]

SLIDE 36

Application: locating archeological sites

[Figure: grid of soil measurements over the site; "?" marks locations with missing values]

Data at 16 × 16 − 9 locations over a 10 m² area. Assume y(s_i) | z*(s_i) ∼ N(μ(z*(s_i)), 1), with μ(0) = 1, μ(1) = 2. Recall z(s) = Σ_{j=1}^m x_j k(s − ω_j) and z*(s) = I[z(s) > 0].

m = 8 × 8 lattice of knot locations ω_1, ..., ω_m; sd of k(s) equal to the knot spacing.

SLIDE 37

Posterior mean and realizations for z*(s) = I[z(s) > 0]

[Figure: posterior mean surface and posterior realizations of the site boundary]

SLIDE 38

Bayesian formulation

[Figure: breakthrough-time data at wells NW, N, NE, W, E, SW, S, SE over time 10–60]

L(y | η(z)) ∝ |Σ|^{−1/2} exp{ −½ (y − η(z))^T Σ^{−1} (y − η(z)) }
π(z | λ_z) ∝ λ_z^{m/2} exp{ −½ λ_z z^T W_z z }
π(λ_z) ∝ λ_z^{a_z − 1} exp{ −b_z λ_z }
π(z, λ_z | y) ∝ L(y | η(z)) × π(z | λ_z) × π(λ_z)

SLIDE 39

Posterior realizations of z under MRF and moving average priors

[Figure: well data (injection wells I and production wells P, with breakthrough times 0.42, 0.34, 0.93, 0.52, 0.17); two MRF realizations and the MRF posterior mean; two GP realizations and the GP posterior mean, on a 40 × 30 grid]

SLIDE 40

References

• D. Higdon (1998) A process-convolution approach to modeling temperatures in the North Atlantic Ocean (with discussion), Environmental and Ecological Statistics, 5:173–190.
• D. Higdon (2002) Space and space-time modeling using process convolutions, in Quantitative Methods for Current Environmental Issues (C. Anderson, V. Barnett, P. C. Chatwin and A. H. El-Shaarawi, eds), 37–56.
• D. Higdon, H. Lee and C. Holloman (2003) Markov chain Monte Carlo approaches for inference in computationally intensive inverse problems, in Bayesian Statistics 7, Proceedings of the Seventh Valencia International Meeting (Bernardo, Bayarri, Berger, Dawid, Heckerman, Smith and West, eds).

SLIDE 41

NON-STATIONARY SPATIAL CONVOLUTION MODELS

SLIDE 42

A convolution-based approach for building non-stationary spatial models

[Figure: white noise * smoothing kernel = spatial field]

z(s) = Σ_{j=1}^m x_j k(ω_j − s) = Σ_{j=1}^m x_j k_s(ω_j)

x ∼ N(0, I_m), where each x_j is located at ω_j over a regular 2-d grid.

SLIDE 43

Convolutions for constructing non-stationary spatial models

z(s) = Σ_{j=1}^m x_j k_s(ω_j), with x ∼ N(0, λ_x^{−1} I_m) at regular 2-d lattice locations ω_1, ..., ω_m.

⇒ z(s) ∼ GP(0, C(s_1, s_2)), where C(s_1, s_2) = λ_x^{−1} Σ_{j=1}^m k_{s_1}(ω_j) k_{s_2}(ω_j)

The smoothing kernel k_s(·) changes smoothly over spatial location.

SLIDE 44

Defining smoothly varying kernels via basis kernels

Given basis kernels k_1(·), k_2(·), k_3(·), define

k_s(·) = w_1(s) k_1(· − s) + w_2(s) k_2(· − s) + w_3(s) k_3(· − s)

k_s(·) is a weighted combination of kernels centered at s. Define weights that change smoothly over space.

SLIDE 45

Defining smoothly varying kernels via basis kernels

Define iid, mean-0 Gaussian processes φ_1(s), φ_2(s) and φ_3(s), and set

w_i(s) = exp{φ_i(s)} / (exp{φ_1(s)} + exp{φ_2(s)} + exp{φ_3(s)})

Estimate the φ_i(s)'s like any other parameter in the analysis.
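This weight construction is a softmax over the φ fields, which guarantees w_i(s) > 0 and Σ_i w_i(s) = 1 at every location. A sketch (the φ_i below are deterministic smooth stand-ins for GP draws):

```python
import numpy as np

def weights(phi):
    """w_i(s) = exp{phi_i(s)} / sum_l exp{phi_l(s)} -- softmax over basis kernels."""
    e = np.exp(phi - phi.max(axis=0))     # subtract the max for numerical stability
    return e / e.sum(axis=0)

# three smooth fields phi_i(s) on a 1-d grid of s values (stand-ins for GP realizations)
s = np.linspace(0, 10, 50)
phi = np.vstack([np.sin(0.3 * s), np.cos(0.2 * s), 0.1 * s])
w = weights(phi)    # w[i, j] = weight of basis kernel i at location s_j
```

Because the φ_i vary smoothly in s, the weights, and hence the mixed kernel k_s(·), also vary smoothly over space.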

SLIDE 46

An application to the Piazza Road Superfund site

[Figure]

SLIDE 47

MOVING AVERAGE/BASIS SPACE-TIME MODELS

SLIDE 48

A convolution-based approach for building space-time models

[Figure: marked point process * smoothing kernel = space-time field]

Define the space-time domain S × T. Define a discrete knot process x(s, t) on {(ω_1, τ_1), ..., (ω_m, τ_m)} within S × T. Define a smoothing kernel k(s, t). Construct the space-time process z(s, t):

z(s, t) = Σ_{j=1}^m k((s, t) − (ω_j, τ_j)) x(ω_j, τ_j)

or, with a varying kernel,

z(s, t) = Σ_{j=1}^m k_{st}(ω_j, τ_j) x_j

SLIDE 49

A space-time model for ocean temperatures

[Figure: ocean temperature at constant potential density over 30W–5W, 20N–50N; color scale 6–14 degrees C]

Data: y = (y_1, ..., y_n)^T at space-time locations (s_1, t_1), ..., (s_n, t_n). Times: 1910–1988. Assume the data are centered (ȳ = 0).

SLIDE 50

Knot locations and kernels

[Figure: smoothing kernels and the latent grid over the study region]

m knot locations over a space-time grid (ω_1, τ_1), ..., (ω_m, τ_m). Spatial knot locations shown; temporal knot spacing ∼ 7 years, covering 1900–1995. Kernels vary with spatial location: k_{st}(ω, τ) = k_s(ω, τ); use a spatial mixture of normal "basis" kernels.

SLIDE 51

Formulation for the ocean example

Likelihood:
L(y | x, λ_y) ∝ λ_y^{n/2} exp{ −½ λ_y (y − Kx)^T (y − Kx) }
where K_ij = k_{s_i t_i}(ω_j, τ_j).

Priors:
π(x | λ_x) ∝ λ_x^{m/2} exp{ −½ λ_x x^T x }
π(λ_x) ∝ λ_x^{a_x − 1} exp{ −b_x λ_x }
π(λ_y) ∝ λ_y^{a_y − 1} exp{ −b_y λ_y }

Posterior:
π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n/2 − 1} exp{ −λ_y [b_y + .5 (y − Kx)^T (y − Kx)] }
  × λ_x^{a_x + m/2 − 1} exp{ −λ_x [b_x + .5 x^T x] }
SLIDE 52

Full conditionals for the ocean formulation

Full conditionals:
π(x | ···) ∝ exp{ −½ [λ_y x^T K^T K x − 2 λ_y x^T K^T y + λ_x x^T x] }
π(λ_x | ···) ∝ λ_x^{a_x + m/2 − 1} exp{ −λ_x [b_x + .5 x^T x] }
π(λ_y | ···) ∝ λ_y^{a_y + n/2 − 1} exp{ −λ_y [b_y + .5 (y − Kx)^T (y − Kx)] }

Gibbs sampler implementation:
x | ··· ∼ N( (λ_y K^T K + λ_x I_m)^{−1} λ_y K^T y, (λ_y K^T K + λ_x I_m)^{−1} )
or, one component at a time,
x_j | ··· ∼ N( λ_y r_j^T k_j / (λ_y k_j^T k_j + λ_x), 1 / (λ_y k_j^T k_j + λ_x) )
λ_x | ··· ∼ Γ( a_x + m/2, b_x + .5 x^T x )
λ_y | ··· ∼ Γ( a_y + n/2, b_y + .5 (y − Kx)^T (y − Kx) )
where k_j is the jth column of K and r_j = y − Σ_{j′ ≠ j} k_{j′} x_{j′}.
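The single-site update for x_j can be verified numerically: its mean and variance must match the conditional of component j extracted from the joint normal draw. A sketch with illustrative sizes and random K, y:

```python
import numpy as np

rng = np.random.default_rng(8)
n, m = 12, 5
K = rng.standard_normal((n, m))
y = rng.standard_normal(n)
lam_y, lam_x = 2.0, 0.5
x = rng.standard_normal(m)           # current state of the chain

# single-site formula: x_j | ... ~ N(lam_y r_j'k_j / d, 1/d), d = lam_y k_j'k_j + lam_x
j = 2
kj = K[:, j]
rj = y - K @ x + kj * x[j]           # r_j = y - sum_{j' != j} k_{j'} x_{j'}
cond_var = 1.0 / (lam_y * kj @ kj + lam_x)
cond_mean = lam_y * (rj @ kj) * cond_var

# cross-check against the joint N(prec^{-1} b, prec^{-1}) with
# prec = lam_y K'K + lam_x I and b = lam_y K'y
prec = lam_y * K.T @ K + lam_x * np.eye(m)
b = lam_y * K.T @ y
idx = [i for i in range(m) if i != j]
cm = (b[j] - prec[j, idx] @ x[idx]) / prec[j, j]   # conditional mean of x_j | x_{-j}
cv = 1.0 / prec[j, j]                              # conditional variance
```

The two routes agree exactly, which is why single-site Gibbs scans and the block draw target the same full conditional.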

SLIDE 53

Posterior mean of the space-time temperature field

[Figure: posterior mean temperature fields for 1960, 1965, 1970, 1975, 1980 and 1985 over 30W–5W, 20N–50N; color scale 6–14 degrees C]

SLIDE 54

Deviations from the time-averaged mean temperature field

[Figure: deviation fields for 1960, 1965, 1970, 1975, 1980 and 1985; color scale −1 to 1]

SLIDE 55

Posterior probabilities of differing from the time-averaged mean field

[Figure: probability maps for 1960, 1965, 1970, 1975, 1980 and 1985; scale 0.0–0.8]

SLIDE 56

Alternative approaches for building space-time models

[Figure: latent process * spatial smoothing kernel = space-time field]

Define the space-time domain S × T with T = {1, ..., n_t}. Discrete knot process x(s, t) on {ω_1, ..., ω_{n_s}} × {1, ..., n_t}, with n_t · n_s = m. Define a spatial smoothing kernel k(s). Construct the space-time process z(s, t):

z(s, t) = Σ_{j=1}^{n_s} k(s − ω_j) x(ω_j, t)

or, with a varying kernel,

z(s, t) = Σ_{j=1}^{n_s} k_{st}(ω_j) x_{jt}

SLIDE 57

8-hour max for ozone over summer days in the Eastern US

[Figure: ozone measurements; color scale 20–100]

n = 510 ozone measurements on each of 30 days. The ω_k's are laid out on the same hexagonal lattice.

SLIDE 58

Temporally evolving latent x(s, t) process

Times: t ∈ T = {1, ..., n_t}, n_t = 30. Spatial knot locations: W = {ω_1, ..., ω_{n_s}}, n_s = 27. x(s, t) is defined on W × T. The space-time process z(s, t) is obtained by convolving x(s, t) with k(s):

z(s, t) = Σ_{j=1}^{n_s} k(ω_j − s) x(ω_j, t) = Σ_{j=1}^{n_s} k_s(ω_j) x_{jt}

Specify locally linear MRF priors for each x_j = (x_{j1}, ..., x_{j n_t})^T:

π(x_j | λ_x) ∝ λ_x^{n_t/2} exp{ −½ λ_x x_j^T W x_j }

where
W_ij = −1 if |i − j| = 1
W_ij = 1 if i = j = 1 or i = j = n_t
W_ij = 2 if 1 < i = j < n_t
W_ij = 0 otherwise

So for x = (x_{11}, x_{21}, ..., x_{n_s 1}, x_{12}, ..., x_{n_s 2}, ..., x_{1 n_t}, ..., x_{n_s n_t})^T,

π(x | λ_x) ∝ λ_x^{n_t n_s / 2} exp{ −½ λ_x x^T (W ⊗ I_{n_s}) x }

SLIDE 59

Formulation for temporally evolving z(s, t)

Data: at each time t, observe the n-vector y_t = (y_{1t}, ..., y_{nt})^T at sites s_1, ..., s_n.

Likelihood for data observed at time t:
L(y_t | x_t, λ_y) ∝ λ_y^{n/2} exp{ −½ λ_y (y_t − K_t x_t)^T (y_t − K_t x_t) }
where (K_t)_ij = k(ω_j − s_i).

Define the n·n_t-vector y = (y_1^T, ..., y_{n_t}^T)^T. Likelihood for the entire data y:
L(y | x, λ_y) ∝ λ_y^{n n_t / 2} exp{ −½ λ_y (y − Kx)^T (y − Kx) }
where K = diag(K_1, ..., K_{n_t}).

Priors:
π(x | λ_x) ∝ λ_x^{m/2} exp{ −½ λ_x x^T W x }
π(λ_x) ∝ λ_x^{a_x − 1} exp{ −b_x λ_x }
π(λ_y) ∝ λ_y^{a_y − 1} exp{ −b_y λ_y }

SLIDE 60

Posterior and full conditionals

Posterior:
π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n n_t/2 − 1} exp{ −λ_y [b_y + .5 (y − Kx)^T (y − Kx)] }
  × λ_x^{a_x + n_s n_t/2 − 1} exp{ −λ_x [b_x + .5 x^T W x] }

Full conditionals:
π(x | ···) ∝ exp{ −½ [λ_y x^T K^T K x − 2 λ_y x^T K^T y + λ_x x^T W x] }
π(λ_x | ···) ∝ λ_x^{a_x + m/2 − 1} exp{ −λ_x [b_x + .5 x^T W x] }
π(λ_y | ···) ∝ λ_y^{a_y + n n_t/2 − 1} exp{ −λ_y [b_y + .5 (y − Kx)^T (y − Kx)] }

Gibbs sampler implementation:
x | ··· ∼ N( (λ_y K^T K + λ_x W)^{−1} λ_y K^T y, (λ_y K^T K + λ_x W)^{−1} )
or, one component at a time,
x_{jt} | ··· ∼ N( (λ_y r_{tj}^T k_{tj} + n_j λ_x x̄_{∂jt}) / (λ_y k_{tj}^T k_{tj} + n_j λ_x), 1 / (λ_y k_{tj}^T k_{tj} + n_j λ_x) )
λ_x | ··· ∼ Γ( a_x + m/2, b_x + .5 x^T W x )
λ_y | ··· ∼ Γ( a_y + n n_t/2, b_y + .5 (y − Kx)^T (y − Kx) )
where k_{tj} is the jth column of K_t, r_{tj} = y_t − Σ_{j′ ≠ j} k_{tj′} x_{j′t}, n_j is the number of neighbors of x_{jt}, and x̄_{∂jt} is the mean of the neighbors of x_{jt}.

SLIDE 61

DLM setup for the ozone example

Given the latent process x_t = (x_{1,t}, ..., x_{27,t})^T, t = 1, ..., 30, and data y_t = (y_{1t}, ..., y_{n_y t})^T at sites s_{1t}, ..., s_{n_y t}:

y_t = K_t x_t + ε_t
x_t = x_{t−1} + ν_t

where K_t is the n_y × 27 matrix given by (K_t)_ij = k(s_{it} − ω_j), t = 1, ..., 30;
ε_t iid ∼ N(0, σ_ε²), ν_t iid ∼ N(0, σ_ν²), t = 1, ..., 30; and x_1 ∼ N(0, σ_x² I_27).

One can use the dynamic linear model (DLM)/Kalman filter machinery; single-site MCMC works too. See Stroud et al. (2001) for an alternative model.
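The Kalman filter for this random-walk DLM is short enough to sketch directly. The sizes, variances, and random K below are illustrative stand-ins for the ozone setup:

```python
import numpy as np

rng = np.random.default_rng(7)

# small stand-in for the ozone DLM (slide uses nt = 30, p = 27 knots)
nt, p, ny = 30, 4, 8
K = rng.uniform(0.0, 1.0, size=(ny, p))        # fixed kernel matrix K_t = K
sig_eps, sig_nu, sig_x = 0.3, 0.5, 1.0

# simulate data: x_1 ~ N(0, sig_x^2 I), x_t = x_{t-1} + nu_t, y_t = K x_t + eps_t
x = sig_x * rng.standard_normal(p)
ys = []
for t in range(nt):
    if t > 0:
        x = x + sig_nu * rng.standard_normal(p)
    ys.append(K @ x + sig_eps * rng.standard_normal(ny))

# Kalman filter: recursively compute p(x_t | y_1, ..., y_t)
mu, P = np.zeros(p), sig_x ** 2 * np.eye(p)
for t, y in enumerate(ys):
    if t > 0:
        P = P + sig_nu ** 2 * np.eye(p)               # predict (random-walk evolution)
    S = K @ P @ K.T + sig_eps ** 2 * np.eye(ny)       # innovation covariance
    G = P @ K.T @ np.linalg.inv(S)                    # Kalman gain
    mu = mu + G @ (y - K @ mu)                        # filtered mean
    P = P - G @ K @ P                                 # filtered covariance
```

A backward smoothing/sampling pass over the filtered moments would then give joint draws of x_1, ..., x_{n_t}, which is the DLM alternative to single-site MCMC mentioned on the slide.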

SLIDE 62

Posterior mean for the first 9 days

[Figure: posterior mean ozone fields for days 1–9; color scale 20–100]

SLIDE 63

Posterior mean of selected x_j's

[Figure: traces of latent process values x(s, t) over t = 1, ..., 30 for three knots, under the independent-over-time prior (left) and the random-walk-over-time prior (right)]

SLIDE 64

References

• D. Higdon (1998) A process-convolution approach to modeling temperatures in the North Atlantic Ocean (with discussion), Environmental and Ecological Statistics, 5:173–190.
• J. Stroud, P. Müller and B. Sanso (2001) Dynamic models for spatio-temporal data, Journal of the Royal Statistical Society, Series B, 63, 673–689.
• D. Higdon (2002) Space and space-time modeling using process convolutions, in Quantitative Methods for Current Environmental Issues (C. Anderson, V. Barnett, P. C. Chatwin and A. H. El-Shaarawi, eds), 37–56.
• M. West and J. Harrison (1997) Bayesian Forecasting and Dynamic Models (Second Edition), Springer-Verlag.