
slide-1
SLIDE 1

P-spline ANOVA-type interaction models for spatio-temporal smoothing

Dae-Jin Lee⋆ and María Durbán

Universidad Carlos III de Madrid Department of Statistics

IWSM Utrecht 2008

D.-J. Lee and M. Durban (UC3M) ’P-spline ANOVA-type models’ IWSM 2008 1 / 26

slide-2
SLIDE 2

Outline

1. Motivation
2. Penalized splines for Spatio-Temporal data
3. ANOVA-Type Interaction Models
4. Application to O3 pollution in Europe
5. Conclusions


slide-3
SLIDE 3
  • 1. Motivation
  • Air pollution
  • Environmental policies
  • Monitoring networks:

◮ European Environmental Agency (EEA)
◮ EMEP project (European Monitoring and Evaluation Programme)

  • Ozone (O3) is currently one of the air pollutants of most concern in Europe.


slide-4
SLIDE 4

Monitoring stations across Europe

[Figure: map of Europe showing the monitoring stations.]

  • sample of 45 monitoring stations

slide-5
SLIDE 5

O3 time series plot for selected locations

◮ Seasonal pattern:

[Figure: O3 time series, 1999 to 2005, for stations in Spain, Finland, France and the UK; horizontal axis: time.]

slide-6
SLIDE 6

O3 level from 01/2004 to 12/2005

[Animation: maps of the O3 level over the period.]

slide-7
SLIDE 7
  • 1. Motivation

◮ Spatio-temporal data

  • Response variable, $y_{ijt}$

◮ measured over geographical locations, $s = (x_i, x_j)$, with $i, j = 1, \dots, n$
◮ and over time periods, $x_t$, for $t = 1, \dots, T$

  • ISSUE: huge amount of data available

◮ e.g.: environmental data, epidemiologic studies, disease mapping applications, ...

  • Smoothing techniques:

◮ Study spatial and temporal trends.
◮ Space and time interactions.
◮ “Penalized Splines” (Eilers and Marx, 1996).

slide-8
SLIDE 8
  • 2. Penalized splines

◮ “The flexible smoother”

  • Methodology:

◮ Given the data $(x_i, y_i)$, $i = 1, \dots, n$.
◮ Fit a sum of local basis functions: $f(x) = B\theta$
◮ Minimize the Penalized Sum of Squares:

$$\sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2 + \text{Penalty}$$

◮ The Penalty controls the smoothness of the fit.

  • Smoothing parameter: λ
  • Apply a discrete penalty over coefficients θ, e.g. in 1d:

$$P = \lambda D'D$$

where D is a difference matrix acting on θ.
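A minimal sketch of the penalized least-squares fit described on this slide, in Python with NumPy. The helper names `bspline_basis` and `pspline_fit`, the simulated sine data, the number of segments and the value of λ are illustrative assumptions, not taken from the slides. The basis is built from differences of truncated power functions on equally spaced knots, which is one common construction.

```python
import numpy as np
from math import factorial

def bspline_basis(x, xl, xr, nseg, deg=3):
    """B-spline basis on equally spaced knots (hypothetical helper)."""
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    # Truncated powers (x - knot)_+^deg, one column per knot.
    T = np.maximum(x[:, None] - knots[None, :], 0.0) ** deg
    # Differences of order deg + 1 turn truncated powers into B-splines.
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0)
    return (-1) ** (deg + 1) * T @ D.T / (factorial(deg) * dx ** deg)

def pspline_fit(x, y, lam, nseg=20, deg=3, order=2):
    """Minimize ||y - B theta||^2 + lam * ||D theta||^2."""
    B = bspline_basis(x, x.min(), x.max(), nseg, deg)
    D = np.diff(np.eye(B.shape[1]), n=order, axis=0)  # difference matrix
    P = lam * D.T @ D                                 # penalty P = lam D'D
    theta = np.linalg.solve(B.T @ B + P, B.T @ y)
    return B, theta

# Simulated data (assumption): a sine curve plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
signal = np.sin(2 * np.pi * x)
y = signal + rng.normal(scale=0.1, size=x.size)

B, theta = pspline_fit(x, y, lam=1.0)
fit = B @ theta
rmse = float(np.sqrt(np.mean((fit - signal) ** 2)))
```

Increasing `lam` pulls neighbouring coefficients towards a straight line (for a second-order difference), which is how the penalty controls smoothness.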

slide-9
SLIDE 9
  • 2. Penalized splines

◮ “The flexible smoother”

  • For array data (Currie et al., 2006):

◮ Generalized Linear Array Methods (GLAM):

$$f(x_1, \dots, x_d) = B\theta$$

◮ where B is the Kronecker product of d B-spline bases:

$$B = B_1 \otimes B_2 \otimes \cdots \otimes B_d$$

◮ Efficient algorithms for smoothing on multidimensional grids (e.g. mortality data, images, etc.).

◮ Easy representation as a Mixed Model:

$$f(x_1, \dots, x_d) = X\beta + Z\alpha$$
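To illustrate why array methods are efficient, the sketch below (Python with NumPy) checks in the 2d case that the fit from the full Kronecker basis equals a pair of small matrix products on the coefficient array. The random matrices merely stand in for marginal B-spline bases, and the sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, c1, n2, c2 = 6, 4, 5, 3
B1 = rng.random((n1, c1))      # stand-in for the marginal basis in x1
B2 = rng.random((n2, c2))      # stand-in for the marginal basis in x2
Theta = rng.random((c1, c2))   # coefficients arranged as a c1 x c2 array

# Direct route: build the full (n1*n2) x (c1*c2) Kronecker basis.
full = np.kron(B1, B2) @ Theta.ravel()

# Array route: never forms the Kronecker product.
array_fit = B1 @ Theta @ B2.T  # n1 x n2 grid of fitted values

same = np.allclose(full, array_fit.ravel())
```

The equality relies on NumPy's row-major `ravel()` matching the ordering of `np.kron`; with a column-major vec the roles of `B1` and `B2` would swap.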

slide-10
SLIDE 10
  • 2. Penalized splines

◮ Example of GLAM:

  • 3d-case:

f (x1, x2, x3) = Bθ

  • Basis: B = B1 ⊗ B2 ⊗ B3

◮ θ can be expressed as a 3d-array $A = \{\theta_{ijk}\}$ of dim. $c_1 \times c_2 \times c_3$

[Figure: the 3d coefficient array, with rows $1, \dots, c_1$, columns $1, \dots, c_2$ and layers $1, \dots, c_3$; corner entries labelled $\theta_{(1,1,1)}, \dots, \theta_{(c_1,c_2,c_3)}$.]

slide-11
SLIDE 11
  • 3d Penalty matrix:

◮ Set penalties over the 3d-array A:

$$P = \underbrace{\lambda_1 D_1' D_1 \otimes I_{c_2} \otimes I_{c_3}}_{\text{row-wise}} + \underbrace{\lambda_2 I_{c_1} \otimes D_2' D_2 \otimes I_{c_3}}_{\text{column-wise}} + \underbrace{\lambda_t I_{c_1} \otimes I_{c_2} \otimes D_t' D_t}_{\text{layer-wise}}$$

◮ For spatio-temporal data:

$$f(\underbrace{\text{longitude}, \text{latitude}}_{\text{Space}}, \text{time})$$

  • Spatial anisotropy ($\lambda_1 \neq \lambda_2$): different amount of smoothing for latitude and longitude.
  • Temporal smoothing ($\lambda_t$).
  • Space-time interaction.
  • However, spatial data are not over a regular grid.
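The 3d penalty can be checked numerically. This sketch (Python with NumPy) builds P from Kronecker products and confirms that the quadratic form θ′Pθ equals the weighted sums of squared differences taken along the rows, columns and layers of the coefficient array. The array sizes, the λ values and the second-order differences are assumptions chosen for illustration.

```python
import numpy as np

def diff_matrix(c, order=2):
    """Difference matrix of the given order acting on c coefficients."""
    return np.diff(np.eye(c), n=order, axis=0)

c1, c2, c3 = 5, 4, 6
lam1, lam2, lamt = 2.0, 0.5, 3.0
D1, D2, Dt = diff_matrix(c1), diff_matrix(c2), diff_matrix(c3)
I1, I2, I3 = np.eye(c1), np.eye(c2), np.eye(c3)

P = (lam1 * np.kron(np.kron(D1.T @ D1, I2), I3)     # row-wise
     + lam2 * np.kron(np.kron(I1, D2.T @ D2), I3)   # column-wise
     + lamt * np.kron(np.kron(I1, I2), Dt.T @ Dt))  # layer-wise

rng = np.random.default_rng(2)
A = rng.normal(size=(c1, c2, c3))  # coefficient array
theta = A.ravel()                  # row-major vec, matching np.kron

quad = theta @ P @ theta
by_axis = (lam1 * np.sum(np.diff(A, n=2, axis=0) ** 2)
           + lam2 * np.sum(np.diff(A, n=2, axis=1) ** 2)
           + lamt * np.sum(np.diff(A, n=2, axis=2) ** 2))
```

Each term penalizes roughness along one axis only, so three separate λ's give a different amount of smoothing in each direction.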

slide-12
SLIDE 12
  • 2. Penalized splines

◮ Scattered data smoothing

  • For scattered data, Eilers et al. (2006) propose:

◮ “Row-wise Kronecker” product or Box-Product of B-spline bases.

  • Def. Box-Product:

$$B_1 \,\Box\, B_2 = (B_1 \otimes 1_{c_2}') \odot (1_{c_1}' \otimes B_2)$$

where $\odot$ is the element-wise product.

◮ We propose the use of $\Box$ for spatial data:

  • Although spatial data are not over a grid, the coefficients θ can be expressed in array form.
  • Choose a moderate number of knots to cover the spatial domain.
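The Box-Product definition translates almost literally into NumPy. In the sketch below, `box_product` is a hypothetical helper name and the random matrices stand in for the two marginal B-spline bases evaluated at the same n scattered locations. The check confirms that row i of the result is the Kronecker product of row i of each basis.

```python
import numpy as np

def box_product(B1, B2):
    """Row-wise Kronecker product of two bases with the same number of rows."""
    c1, c2 = B1.shape[1], B2.shape[1]
    # (B1 kron 1') repeats each column of B1 c2 times;
    # (1' kron B2) tiles B2 c1 times; then multiply element-wise.
    return np.kron(B1, np.ones((1, c2))) * np.kron(np.ones((1, c1)), B2)

rng = np.random.default_rng(3)
n, c1, c2 = 7, 4, 3
B1 = rng.random((n, c1))   # stand-in: basis for longitude at n sites
B2 = rng.random((n, c2))   # stand-in: basis for latitude at the same sites
Bs = box_product(B1, B2)   # n x (c1*c2) spatial basis

row_by_row = np.vstack([np.kron(B1[i], B2[i]) for i in range(n)])
```

The result has n rows rather than the n² rows of a full Kronecker product, which is what makes it suitable for data that do not lie on a grid.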

slide-13
SLIDE 13
  • 2. Penalized splines

◮ Spatio-Temporal data smoothing

  • For spatio-temporal data, we propose:

Spatio-temporal B-spline basis: $B = B_s \otimes B_t$, of dim. $nt \times c_1 c_2 c_3$,

where $B_s \equiv B_1 \,\Box\, B_2$ is the spatial B-spline basis and $B_t$ is the B-spline basis for time, of dim. $t \times c_3$.

  • Note that:

◮ GLAM framework
◮ Mixed models

slide-14
SLIDE 14
  • 2. Penalized splines

◮ Mixed Models representation

  • Reparameterize the basis B and coefficients θ:

$$B\theta = X\beta + Z\alpha$$

  • Currie et al. (2006) use the Singular Value Decomposition (SVD) of the penalty P, i.e.:

$$D'D = [U_n : U_s] \begin{bmatrix} 0_q & \\ & \Sigma \end{bmatrix} \begin{bmatrix} U_n' \\ U_s' \end{bmatrix}$$

  • The penalty becomes block-diagonal: $F = \lambda \Sigma$
  • Standard mixed model theory (REML)
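A small numerical sketch of this reparameterization in Python with NumPy. It assumes the mapping X = B·Un, Z = B·Us with β = Un′θ and α = Us′θ, which follows from the orthogonality of [Un : Us]; the random matrix standing in for the B-spline basis and the sizes are arbitrary. Because D′D is symmetric and positive semi-definite, its eigendecomposition is used in place of an SVD routine.

```python
import numpy as np

rng = np.random.default_rng(4)
n, c, q = 30, 8, 2                  # q = order of the difference penalty
B = rng.random((n, c))              # stand-in for a B-spline basis
D = np.diff(np.eye(c), n=q, axis=0)

evals, U = np.linalg.eigh(D.T @ D)  # eigenvalues in ascending order
Un, Us = U[:, :q], U[:, q:]         # null space / penalized part
Sigma = np.diag(evals[q:])          # non-zero eigenvalues

X, Z = B @ Un, B @ Us               # fixed- and random-effects bases

theta = rng.normal(size=c)
beta, alpha = Un.T @ theta, Us.T @ theta

same_fit = np.allclose(B @ theta, X @ beta + Z @ alpha)
same_penalty = np.isclose(theta @ D.T @ D @ theta, alpha @ Sigma @ alpha)
```

Only α is penalized, through Σ, while β (the q unpenalized directions) plays the role of the fixed effects.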

slide-15
SLIDE 15
  • 3. ANOVA-Type Interaction Models

◮ Smooth-ANOVA decomposition models

  • Chen (1993), Gu (2002):

◮ “Smoothing-Spline ANOVA” (SS-ANOVA).
◮ Interpretation as “main effects” and “interactions”.
◮ Models of type:

$$\begin{aligned}
y ={}& f(x_1) + f(x_2) + f(x_t) && \text{“Main/additive effects”} \\
&+ f(x_1, x_2) + f(x_1, x_t) + f(x_2, x_t) && \text{“2-way interactions”} \\
&+ f(x_1, x_2, x_t) && \text{“3-way interactions”}
\end{aligned}$$

◮ PROBLEM: basis dimension (“curse of dimensionality”)

slide-16
SLIDE 16
  • 3. ANOVA-Type Interaction Models

◮ Smooth-ANOVA decomposition models

  • We propose ANOVA-Type models:

◮ Computationally efficient methodology based on low-rank P-splines and GLAM.
◮ For spatio-temporal smoothing, interpretation as:

  • main spatial and temporal effects,
  • spatial 2d effects (anisotropy) and
  • space-time interaction

◮ Our approach is based on SVD properties and the mixed model representation.

slide-17
SLIDE 17
  • ANOVA-Type Interaction models:

◮ 3d model: $f(x_1, x_2, x_t)$ with basis $B = B_s \otimes B_t$ and smoothing parameters $(\lambda_1, \lambda_2, \lambda_t)$ can be decomposed as:

$$f(x_1) + f(x_2) + f(x_t) + f(x_1, x_2) + \dots + f(x_1, x_2, x_t)$$

◮ Reformulate as a mixed model and expand the bases X and Z.

slide-18
SLIDE 18

Expand X and Z

Basis          main effects     2-way interact.                    3-way interact.
X ≡ columns    x1 : x2 : x3     (x1, x2) : (x2, x3) : (x1, x3)     (x1, x2, x3)
Z ≡ blocks     ″                ″                                  ″

Penalty F ≡ blockdiag of λ1, λ2, λt and Σ1, Σ2, Σt

slide-19
SLIDE 19

◮ Full-ANOVA-type model:

$$f(x_1) + f(x_2) + f(x_t) + f(x_1, x_2) + \dots + f(x_1, x_2, x_t)$$

with different λ’s for each smooth f(·), and basis

$$B = [\, B_{1s} \otimes 1_t : B_{2s} \otimes 1_t : 1_n \otimes B_t : B_s \otimes 1_t : B_s \otimes B_t \,]$$

  • However, B is now NOT full column-rank (“linear dependency”).
  • The model is NOT identifiable.

◮ The mixed model representation and the expansion of X and Z allow us to identify the constraints to impose in order to maintain the identifiability of the model.
◮ In the P-splines context, constraints are applied over the regression coefficients $\theta_{i,j,k}$.
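The rank deficiency is easy to reproduce in a simplified 2d analogue (two covariates on a grid, not the spatio-temporal basis of the slide). The sketch below, in Python with NumPy, uses a hypothetical `hat_basis` helper (linear B-splines, whose rows sum to one) and arbitrary sizes. Because the rows of each marginal basis sum to one, the main-effect columns are linear combinations of the interaction columns.

```python
import numpy as np

def hat_basis(x, c):
    """Linear B-spline ('hat') basis with c equally spaced knots on [0, 1]."""
    knots = np.linspace(0.0, 1.0, c)
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - knots[None, :]) / h)

n1, n2, c1, c2 = 30, 20, 5, 4
B1 = hat_basis(np.linspace(0.0, 1.0, n1), c1)
B2 = hat_basis(np.linspace(0.0, 1.0, n2), c2)
ones1, ones2 = np.ones((n1, 1)), np.ones((n2, 1))

B_anova = np.hstack([
    np.kron(B1, ones2),   # main effect of x1
    np.kron(ones1, B2),   # main effect of x2
    np.kron(B1, B2),      # interaction
])

n_cols = B_anova.shape[1]               # c1 + c2 + c1*c2
rank = np.linalg.matrix_rank(B_anova)   # only c1*c2 independent columns
```

Here `rank` falls short of `n_cols` by c1 + c2, so the coefficients are not identifiable without constraints.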

slide-20
SLIDE 20

Equivalent to the constraints in a 3-way factorial design.

Main effects:

$$\sum_i \theta^{(1)}_i = \sum_j \theta^{(2)}_j = \sum_t \theta^{(3)}_t = 0$$

2-way interactions:

$$\sum_{i,j} \theta^{(1,2)}_{ij} = \sum_{i,t} \theta^{(1,3)}_{it} = \sum_{j,t} \theta^{(2,3)}_{jt} = 0$$

3-way interactions:

$$\sum_{i,j,t} \theta^{(1,2,3)}_{ijt} = 0$$

slide-21
SLIDE 21

Centering and scaling matrix: $(I_c - 11'/c)$

[Figure: the 3d coefficient array, with rows $1, \dots, c_1$, columns $1, \dots, c_2$ and layers $1, \dots, c_3$.]
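A short sketch (Python with NumPy) of how the centering matrix enforces sum-to-zero constraints on coefficients. It covers centering only, not any scaling, and applying the matrix along each axis of an interaction array is an assumption about how the constraints would be imposed. The name `centering` and the sizes are illustrative.

```python
import numpy as np

def centering(c):
    """Centering matrix I_c - 11'/c: removes the mean of a length-c vector."""
    return np.eye(c) - np.ones((c, c)) / c

rng = np.random.default_rng(5)
c1, c2 = 5, 4
C1, C2 = centering(c1), centering(c2)

# Main-effect coefficients: centred so that they sum to zero.
theta1 = C1 @ rng.normal(size=c1)

# 2-way interaction coefficients: centred along rows and columns.
Theta12 = C1 @ rng.normal(size=(c1, c2)) @ C2
```

The matrix is idempotent, so applying it twice changes nothing, and every row and column of `Theta12` sums to zero, as in a factorial design.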

slide-22
SLIDE 22
  • 4. Application to O3 pollution in Europe

◮ Data and models

  • Sample of 45 monitoring stations
  • Monthly averages of O3 levels (in µg/m³)
  • from January 1999 to December 2005 (t = 1, ..., 84)

Models:

  • Additive: Spatial 2d + time:

f (x1, x2) + f (t)

  • ANOVA-type:

◮ 3d model: f (x1, x2, t)
◮ f (x1) + ... + f (x1, x2) + ... + f (x1, x2, t)
◮ ANOVA: f (x1, x2) + f (t) + f (x1, x2, t)

slide-23
SLIDE 23
  • 4. Application to O3 pollution in Europe

◮ Summary of results:

Model              AIC         ED       Num. of λ’s
2d space + time    17629.67    61.92    2 + 1 = 3
3d                 21588.45    64.98    3
ANOVA              17521.21    89.68    2 + 1 + 3 = 6

  • The ANOVA model has the lowest (best) AIC
  • Effective Dimension (ED): Trace of Hat matrix
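The effective dimension can be computed without forming the n × n hat matrix. The sketch below (Python with NumPy) does this for a single 1d penalized fit with made-up inputs, so it does not reproduce the ED values in the table. The function name `effective_dimension`, the random stand-in basis and the λ values are assumptions.

```python
import numpy as np

def effective_dimension(B, D, lam):
    """Trace of the hat matrix H = B (B'B + lam D'D)^{-1} B'."""
    BtB = B.T @ B
    # trace(H) = trace((B'B + lam D'D)^{-1} B'B): only c x c matrices needed.
    return float(np.trace(np.linalg.solve(BtB + lam * D.T @ D, BtB)))

rng = np.random.default_rng(6)
n, c, q = 50, 10, 2
B = rng.random((n, c))                     # stand-in for a B-spline basis
D = np.diff(np.eye(c), n=q, axis=0)        # second-order differences

ed_none = effective_dimension(B, D, 0.0)   # no penalty
ed_mid = effective_dimension(B, D, 1.0)
ed_heavy = effective_dimension(B, D, 1e9)  # very heavy penalty
```

With no penalty the ED equals the number of basis columns; as λ grows it falls towards q, the dimension of the unpenalized part.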

slide-24
SLIDE 24

◮ Spatial 2d + time: f (x1, x2) + f (t)

[Figure: fitted spatial surface (map) and fitted smooth time trend f(time), 1999 to 2005.]

  • Space-time interaction is not considered
  • The smooth time trend is additive

slide-25
SLIDE 25

◮ ANOVA Space-Time Interaction Model

[Animation: fit of the ANOVA space-time interaction model.]

y = f (x1, x2) + f (t) + f (x1, x2, t)

slide-26
SLIDE 26
  • 5. Conclusions
  • P-splines as unified framework:

◮ Flexible multidimensional smoothing (mixed models)
◮ Low-rank basis
◮ GLAM for spatial and spatio-temporal data

  • ANOVA-type models:

◮ Interpretation as additive plus interaction smooth functions
◮ Identify which constraints to apply for model identifiability

  • More complex structures:

◮ Incorporation of additional covariates with their interactions (e.g. year-month).

slide-27
SLIDE 27

THANKS FOR YOUR ATTENTION !!!


slide-28
SLIDE 28

References

◮ P-splines:

  • Eilers, PHC. and Marx, BD. Stat. Sci. (1996), 11:89-121.
  • Eilers, PHC., Currie, ID. and Durbán, M. CSDA (2006), 50(1):61-76.
  • Currie, ID., Durbán, M. and Eilers, PHC. JRSSB (2006), 68:1-22.

◮ SS-ANOVA:

  • Chen, Z. JRSSB (1993), 55:473-491.
  • Gu, C. Springer (2002).