Spatio-Temporal Smoothing of CO 2 Retrievals Noel Cressie Program - - PowerPoint PPT Presentation

spatio temporal smoothing of co 2 retrievals
SMART_READER_LITE
LIVE PREVIEW

Spatio-Temporal Smoothing of CO 2 Retrievals Noel Cressie Program - - PowerPoint PPT Presentation

Spatio-Temporal Smoothing of CO 2 Retrievals Noel Cressie Program in Spatial Statistics and Environmental Statistics Department of Statistics, The Ohio State University Matthias Katzfu ur Angewandte Mathematik, Universit Institut f at


slide-1
SLIDE 1

Spatio-Temporal Smoothing of CO2 Retrievals

Noel Cressie Program in Spatial Statistics and Environmental Statistics Department of Statistics, The Ohio State University Matthias Katzfuß Institut f¨ ur Angewandte Mathematik, Universit¨ at Heidelberg Acknowledgment: AIST grant from NASA’s ESTO (PI: Amy Braverman)

– p.1/36

slide-2
SLIDE 2

AIRS CO2 Data

Global satellite measurements of CO2 from the AIRS instrument (data below 60◦S are not available) Challenges of global remote-sensing data: Massiveness Need dimension reduction Sparseness Need to take advantage of spatial and temporal correlations Nonstationarity Need a flexible model Goal: Using AIRS CO2 data, map CO2 globally after “de-noising” and “gap-filling”; also map uncertainty

– p.2/36

slide-3
SLIDE 3

Hierarchical Spatio-Temporal Model

Time is discrete Write the hidden spatio-temporal process as Yt(s) at time t and location s The true process is observed imperfectly and incompletely Data Model: Zt(s) = Yt(s) + εt(s) εt(·) ∼ Gau(0, σ2

ε,tvε(·)), is measurement error that is assumed uncorrelated

across time and space σ2

ε,t and vε(·) are known

Goal: For any s0 on the globe, estimate Yt(s0); t ∈ {1, . . . , T}, after observing {Zt(si,t)}; define Z1:T ≡ (Z′

1, . . . , Z′ T )′,

where Zt ≡ (Z(s1,t), . . . , Z(snt,t))′

Consider a time series of spatial processes (e.g., Cressie and Wikle, 2011). Using a state space model, we smooth the data (cf. filtering)

– p.3/36

slide-4
SLIDE 4

Hierarchical Spatio-Temporal Model, ctd.

Process Model: Do not use a transport model. Take a secular approach and use a dynamical statistical model; let the data speak. Model the spatio-temporal variability statistically: Yt(s) = xt(s)′βt + St(s)′ηt + ξt(s) The dimension of ηt is r < < nt (dimension reduction; e.g., nt = 12515, r = 380).

– p.4/36

slide-5
SLIDE 5

Hierarchical Spatio-Temporal Model, ctd.

In vector notation, Zt = Yt + εt Yt = Xtβt + Stηt + ξt Σt ≡ var(Zt) = StKtS′

t + Dt, where Kt ≡ var(ηt)

Dt ≡ σ2

ξ,tVξ,t + σ2 ε,tVε,t , is a diagonal nt × nt matrix

Temporal dynamics on r-dimensional space ηt = Hηt−1 + δt U =

var(δt)

Kt = HKt+1H′ + U The part of the model for Yt given by Stηt + ξt, is called a Spatio-Temporal Random Effects (STRE) model Goal: Estimate Yt from Z1:T

– p.5/36

slide-6
SLIDE 6

Fixed Rank Smoothing (FRS)

For the moment, assume parameters θ are known An extension of the Kalman smoother gives, for t ∈ {1, . . . , T}, ηt|T ≡ E(ηt|Z1:T ) , Pt|T ≡ var(ηt|Z1:T ) , Pt,t−1|T ≡ cov(ηt, ηt−1|Z1:T ) ξt|T (s0) ≡ E(ξt(s0)|Z1:T ) , ct|T (s0) ≡ var(ξt(s0)|Z1:T ) , Rt|T (s0) ≡ cov(ηt, ξt(s0)|Z1:T ) Then, FRS yields (Cressie, Shi, & Kang, 2010) the estimate of Yt(s0). For t = 1, . . . , T: ˆ Yt(s0) = xt(s0)′βt + S(s0)′ηt|T + ξt|T (s0) , and the mean squared prediction error (MSPE), E( ˆ Yt(s0) − Yt(s0))2, can also be

  • calculated. Our measure of uncertainty is (MSPE)1/2

Rapid computation: Need to invert the very large nt × nt covariance matrix Σt;

  • nly inversion of r × r matrices and nt × nt diagonal matrices are required. Use

the Sherman-Morrison-Woodbury identity, over and over: (I + PKP ′)−1 = I − P(K−1 + P ′P)−1P ′

– p.6/36

slide-7
SLIDE 7

Parameters

The parameters, θ, consist of {βt}, {σ2

ξ,t}, and the elements of K0, H, and U

In FRS, θ was assumed known. In this presentation, we take an empirical-Bayes approach and “plug in” estimates of the components of θ. Ideally, estimate θ by maximum likelihood estimation (MLE); see Katzfuss and Cressie (2011a) A fully Bayes approach has also been developed (Katzfuss and Cressie, 2011b)

– p.7/36

slide-8
SLIDE 8

Empirical Bayes: Estimate the Parameters

Define innovations, αt ≡ Zt − Xtβt − StE(ηt|Z1:t−1); t = 1, . . . , T αt

ind.

∼ Gau(0, Σαt),

where

Σαt ≡ Stvar(ηt|Z1:t−1)S′

t + Dt; t = 1, . . . , T

Likelihood (Shumway & Stoffer, 2006): −2 log L(θ) ≡ −2f(α1, . . . , αT |θ) =

T

X

t=1

log |Σαt(θ)| +

T

X

t=1

αt(θ)′Σαt(θ)−1αt(θ) Finding MLEs analytically is intractable, even for T=1 (Katzfuss and Cressie, 2009)

– p.8/36

slide-9
SLIDE 9

EM Algorithm: Review

Suppose we observe Xobs ∼ f(xobs|θ), and we are interested in finding the MLE of θ If maximizing L(θ) ≡ f(xobs|θ) is hard, but there are missing data xmis such that maximizing the complete likelihood, Lc(θ) ≡ f(xobs, xmis|θ), is easy, we can use the EM algorithm to find the MLE of θ The EM Algorithm (Dempster, Laird, and Rubin, 1977): Starting with θ[0] = θ0, repeat the following two steps until convergence:

E-Step: Calculate Q(θ; θ[l]) ≡ Eθ[l]{log f(xobs, Xmis|θ)|xobs} M-Step: Calculate θ[l+1] = arg max

θ

Q(θ; θ[l])

– p.9/36

slide-10
SLIDE 10

The EM Algorithm in this Context (Katzfuss & Cressie, 2011a)

Reminder: Zt = Xtβt + Stηt + ξt + εt Missing (mis) “data”: {ηt : t = 0, 1, . . . , T} and {ξt : t = 1, . . . , T} −2 log Lc(θ) = −2 log f(η1:T , Z1:T , ξ1:T |θ) = log |K0| + tr(K−1 η0η′

0)

+

T

X

t=1

nt log σ2

ξ,t +

1 σ2

ξ,t T

X

t=1

tr(V −1

ξ,t ξtξ′ t)

+

T

X

t=1

log |U| +

T

X

t=1

tr{U−1(ηt − Hηt−1)(ηt − Hηt−1)′} +

T

X

t=1

1 σ2

ε,t

tr{V −1

ε,t (Zt − Xtβt − Stηt − ξt)(Zt − Xtβt − Stηt − ξt)′}

E-Step: FRS provides conditional expectations and covariances of ηt and ξt at θ[l], given Z1:T M-Step: Taking partial derivatives of Q(θ; θ[l]) is easy due to terms being additive

– p.10/36

slide-11
SLIDE 11

Our EM Algorithm

Choose initial value θ[0] in the parameter space Θ While ||θ[l+1] − θ[l]|| > δ:

  • 1. Run FRS with θ[l] to obtain η[l]

t|T , P [l] t|T , P [l] t,t−1|T , ξ[l] t|T , and

C[l]

t|T ≡ diag{ct|T (s1), . . . , ct|T (sn)}

  • 2. Calculate θ[l+1] as

β[l+1]

t

= (X′

tV −1 ε,t Xt)−1X′ tV −1 ε,t [Zt − Stη[l] t|T − ξ[l] t|T ]

σ2

ξ,t [l+1] = tr(V −1 ξ,t [C[l] t|T + ξ[l] t|T ξ[l] t|T ′])/nt

K[l+1]

t

= P [l]

t|T + η[l] t|T η[l] t|T ′

L[l+1]

t

= P [l]

t,t−1|T + η[l] t|T η[l] t−1|T ′

H[l+1] = (PT

t=1 L[l+1] t

)(PT −1

t=0 K[l+1] t

)−1 U[l+1] = (PT

t=1 K[l+1] t

− H[l+1] PT

t=1 L[l+1] t ′)/(T − 1)

  • 3. Repeat

– p.11/36

slide-12
SLIDE 12

Properties of the EM Estimators, ˆ θEM

The first θ[l] that meets the convergence criterion is called ˆ θEM If θ[0] ∈ Θ, then θ[l] ∈ Θ, for all l = 1, 2, . . . Because Q(θ; θ[l]) is continuous in both arguments, ˆ θEM is generally a solution to the likelihood equations (Wu, 1983) For T=1, this algorithm is equivalent to the SRE (spatial-only) EM algorithm (Katzfuss and Cressie, 2009) used in Fixed Rank Kriging (FRK) The computational cost of the algorithm is linear in each nt

– p.12/36

slide-13
SLIDE 13

Summary: EM-FRS

Run EM on the observed data to obtain EM parameter estimates, ˆ θEM Do FRS with EM estimates plugged in to obtain ˆ Yt(s0) and its corresponding root mean squared prediction error at each location s0; for example, ˆ Yt(s0)EM ≡ xt(s0)′ ˆ β

EM + S(s0)′ηEM

t|T + ξt|T (s0)EM

– p.13/36

slide-14
SLIDE 14

AIRS CO2 Data

Measurements are taken by the Atmospheric Infrared Sounder (AIRS) instrument,

  • n NASA’s Aqua satellite

Mid-tropospheric CO2 at roughly 1.30pm local time on May 1-16, 2003. Data are considered daily (t = 1, . . . , 16) Global coverage: Between −60◦ and 90◦ latitude. (Retrievals are not available below 60◦S) n1 = 12515, n2 = 12959, n3 = 12971, n4 = 12495, n5 = 11862, n6 = 12317, n7 = 12577, n8 = 12527, n9 = 12651, n10 = 12252, n11 = 12681, n12 = 12076, n13 = 12245, n14 = 12447, n15 = 12372, n16 = 12532 We map our estimates onto a (hexagonal) grid of size mt = 57, 065 for each day t The measurement unit is parts per million (ppm)

– p.14/36

slide-15
SLIDE 15

AIRS CO2 Data, ctd

Mid-tropospheric CO2 on May 1-4 (for example), 2003, as measured by AIRS (gridded)

– p.15/36

slide-16
SLIDE 16

Assumptions and Specifications

Level 2 data were moved onto a fine regular hexagonal grid of approximately the resolution of the AIRS footprint. On some occasions, more than one Level 2 datum was in a grid cell: The datum Zt(si) was obtained by averaging all Level 2 data in grid cell i on day t. Variances of measurement error and fine-scale variation: σ2

ε ≡ σ2 ε,1 = . . . = σ2 ε,16 = 5.6 (estimated from the variogram)

σ2

ξ ≡ σ2 ξ,1 = . . . = σ2 ξ,16

vε,t(si) is equal to the inverse of the number of Level 2 data that were averaged over the i-th (hexagonal) grid cell on day t Basis functions: r = 380 bisquare functions at three spatial resolutions, the same for all t = 1, . . . , 16. The core basis function is the bisquare function: b(u) = (1 − u − s2)2I(u − s ≤ 1) Trend: x(s) = (1 , lat(s))′ Make maps on a hexagonal grid of size mt = 57, 065 for each day t = 1, . . . , 16

– p.16/36

slide-17
SLIDE 17

Basis Functions: Bisquare Centers

Basis function centers, for all three resolutions

Res 1 Res 2 Res 3

– p.17/36

slide-18
SLIDE 18

EM-FRS Results

21 iterations until convergence

  • Approx. 4 min. per iteration on an eight-core server

Total computing time for EM estimation: 96 min. for 16 days of data Total computing time to obtain estimates of {Yt(·): t = 1, . . . , 16} using FRS: 15 min. for 16 days of data Overall computing time for EM-FRS: 111 min. for 16 days of data AIRS CO2 animations are available: www.stat.osu.edu/∼sses/collab co2.html The next slides show EM-FRS for May 1-16, 2003

– p.18/36

slide-19
SLIDE 19

EM-FRS Estimates, Day 1

– p.19/36

slide-20
SLIDE 20

EM-FRS Estimates, Day 2

– p.20/36

slide-21
SLIDE 21

EM-FRS Estimates, Day 3

– p.21/36

slide-22
SLIDE 22

EM-FRS Estimates, Day 4

– p.22/36

slide-23
SLIDE 23

EM-FRS Estimates, Day 5

– p.23/36

slide-24
SLIDE 24

EM-FRS Estimates, Day 6

– p.24/36

slide-25
SLIDE 25

EM-FRS Estimates, Day 7

– p.25/36

slide-26
SLIDE 26

EM-FRS Estimates, Day 8

– p.26/36

slide-27
SLIDE 27

EM-FRS Estimates, Day 9

– p.27/36

slide-28
SLIDE 28

EM-FRS Estimates, Day 10

– p.28/36

slide-29
SLIDE 29

EM-FRS Estimates, Day 11

– p.29/36

slide-30
SLIDE 30

EM-FRS Estimates, Day 12

– p.30/36

slide-31
SLIDE 31

EM-FRS Estimates, Day 13

– p.31/36

slide-32
SLIDE 32

EM-FRS Estimates, Day 14

– p.32/36

slide-33
SLIDE 33

EM-FRS Estimates, Day 15

– p.33/36

slide-34
SLIDE 34

EM-FRS Estimates, Day 16

– p.34/36

slide-35
SLIDE 35

Conclusions

The spatio-temporal random effects (STRE) model and Fixed Rank Smoothing (FRS): handles massive datasets allows rapid computation of estimates allows for nonstationarity provides estimates of uncertainty, namely (MSPE)1/2 Parameter estimation EM allows rapid computation and requires minimal user input Further details: Noel Cressie (ncressie@stat.osu.edu) An SSES Web-Project: www.stat.osu.edu/∼sses/collab co2.html

– p.35/36

slide-36
SLIDE 36

References

Cressie, N. (1993). Statistics for Spatial Data, rev. edn. Wiley, New York, NY. Cressie, N., Shi, T., and Kang, E.L. (2010). Journal of Computational and Graphical Statistics, 19, 724-745. Cressie, N. and Wikle, C.K. (2011). Statistics for Spatio-Temporal Data. Wiley, Hoboken, NJ. Dempster, A.P ., Laird, N.M., and Rubin, D.B. (1977). Journal of the Royal Statistical Society, Series B, 39, 1-38. Katzfuss, M. and Cressie, N. (2009). In 2009 Proceedings of the Joint Statistical

  • Meetings. American Statistical Association, Alexandria, VA, 3378-3390.

Katzfuss, M. and Cressie, N. (2011a). Journal of Time Series Analysis, 32, 430-446. Katzfuss, M. and Cressie, N. (2011b). OSU Dept. of Statistics Tech Report No. 853. Under revision for Environmetrics. Shumway, R.H. and Stoffer, D.S. (2006). Time Series Analysis and Its Applications, with R Examples, 2nd edn. Springer, New York, NY. Wu, C.F .J. (1983). Annals of Statistics, 11, 95-103.

– p.36/36