SLIDE 1

Ergodicity in data assimilation methods

David Kelly Andy Majda Xin Tong

Courant Institute New York University New York NY www.dtbkelly.com

April 7, 2016. Statistics and Applied Math seminar, U Chicago.

David Kelly (CIMS) Data assimilation April 7, 2016 1 / 34

SLIDE 2

Part I: What is data assimilation?

SLIDE 3

What is data assimilation?

Suppose u satisfies du/dt = F(u) with some unknown initial condition u0. We are most interested in geophysical models, so think high dimensional, nonlinear, possibly stochastic.

Suppose we make partial, noisy observations at times t = h, 2h, . . . , nh, . . . :

yn = H un + ξn

where H is a linear operator (think low-rank projection), un = u(nh), and ξn ∼ N(0, Γ) iid.

The aim of data assimilation is to say something about the conditional distribution of un given the observations {y1, . . . , yn}.

SLIDE 4

Illustration (Initialization)

Figure: The blue circle represents our guess of u0. Due to the uncertainty in u0, this is a probability measure.

SLIDE 5

Illustration (Forecast step)

Figure: Apply the time-h flow map Ψ. This produces a new probability measure, which is our forecasted estimate of u1. This is called the forecast step.

SLIDE 6

Illustration (Make an observation)

Figure: We make an observation y1 = H u1 + ξ1. In the picture, we only observe the x variable.

SLIDE 7

Illustration (Analysis step)

Figure: Using Bayes' formula we compute the conditional distribution of u1 given y1. This new measure (called the posterior) is the new estimate of u1. The uncertainty of the estimate is reduced by incorporating the observation. The forecast distribution steers the update from the observation.

SLIDE 8

Bayes’ formula filtering update

Let Yn = {y0, y1, . . . , yn}. We want to compute the conditional density P(un+1 | Yn+1), using P(un | Yn) and yn+1. By Bayes' formula, we have

P(un+1 | Yn+1) = P(un+1 | Yn, yn+1) ∝ P(yn+1 | un+1) P(un+1 | Yn).

But we need to compute the integral

P(un+1 | Yn) = ∫ P(un+1 | Yn, un) P(un | Yn) dun.
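For intuition, the exact recursion above can be carried out numerically when the state is one-dimensional, by discretizing the densities on a grid. The following is a minimal sketch; the dynamics Psi, noise sizes, grid, and observations are all illustrative choices, not from the talk.

```python
import numpy as np

# Exact Bayes filtering recursion on a 1D grid (toy illustration).
grid = np.linspace(-5, 5, 501)
du = grid[1] - grid[0]

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def forecast(p, Psi, model_var):
    # P(u_{n+1} | Y_n) = integral of P(u_{n+1} | u_n) P(u_n | Y_n) du_n
    return np.array([np.sum(gauss(x, Psi(grid), model_var) * p) * du for x in grid])

def analysis(p_forecast, y, obs_var):
    # P(u_{n+1} | Y_{n+1}) is proportional to P(y_{n+1} | u_{n+1}) P(u_{n+1} | Y_n)
    post = gauss(y, grid, obs_var) * p_forecast
    return post / (np.sum(post) * du)

Psi = lambda u: 0.9 * u            # toy linear dynamics
p = gauss(grid, 0.0, 1.0)          # prior density for u0
for y in [0.8, 1.1, 0.9]:          # toy observations
    p = analysis(forecast(p, Psi, 0.1), y, 0.1)
```

Already at one dimension the cost is quadratic in the grid size per step, which is why this direct approach is hopeless in high dimensions.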

SLIDE 9

The ‘optimal filtering’ framework is impractical for high dimensional models, as the integrals are impossible to compute and densities impossible to store. In numerical weather prediction, we have O(109) variables for ocean-atmosphere models (discretized PDEs).

SLIDE 10

The Kalman Filter

For a linear model un+1 = M un + ηn+1, the Bayesian integral is Gaussian and can be computed explicitly. The conditional density is characterized by its mean and covariance:

mn+1 = (I − Kn+1 H) m̂n+1 + Kn+1 yn+1
Cn+1 = (I − Kn+1 H) Ĉn+1,

where

• (m̂n+1, Ĉn+1) is the forecast mean and covariance.
• Kn+1 = Ĉn+1 Hᵀ (Γ + H Ĉn+1 Hᵀ)⁻¹ is the Kalman gain.

The procedure of updating (mn, Cn) → (mn+1, Cn+1) is known as the Kalman filter.
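One full Kalman cycle can be sketched in a few lines. The forecast step m̂ = M m, Ĉ = M C Mᵀ + Σ follows the linear model; the matrices M, Σ, H, Γ below are illustrative toy choices, not from the talk.

```python
import numpy as np

# One Kalman filter cycle for u_{n+1} = M u_n + eta, y = H u + xi.
def kalman_step(m, C, y, M, Sigma, H, Gamma):
    # forecast mean and covariance
    m_hat = M @ m
    C_hat = M @ C @ M.T + Sigma
    # Kalman gain
    K = C_hat @ H.T @ np.linalg.inv(Gamma + H @ C_hat @ H.T)
    # analysis update
    I = np.eye(len(m))
    m_new = (I - K @ H) @ m_hat + K @ y
    C_new = (I - K @ H) @ C_hat
    return m_new, C_new

M = np.array([[0.9, 0.1], [0.0, 0.8]])
Sigma = 0.05 * np.eye(2)
H = np.array([[1.0, 0.0]])          # observe the first coordinate only
Gamma = np.array([[0.1]])
m, C = np.zeros(2), np.eye(2)
m, C = kalman_step(m, C, np.array([1.0]), M, Sigma, H, Gamma)
```

The posterior mean is pulled from the forecast toward the observation, and the posterior covariance shrinks in the observed direction.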

SLIDE 11

Extended Kalman filter

Suppose we have a nonlinear model: un+1 = Φ(un) + Σ^{1/2} ηn+1, where Φ is a nonlinear map and the ηn are iid Gaussian. The Extended Kalman filter is given by the same update formulas

mn+1 = (I − Kn+1 H) m̂n+1 + Kn+1 yn+1
Cn+1 = (I − Kn+1 H) Ĉn+1,

where now m̂n+1 = Φ(mn) and Ĉn+1 = DΦ(mn) Cn DΦ(mn)ᵀ + Σ. Thus we approximate the forecast distribution with a Gaussian. Still too expensive for O(10^9) variables...
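The only change from the linear filter is the forecast: propagate the mean through Φ and the covariance through the Jacobian DΦ. A minimal sketch, with Φ, DΦ, and all noise levels as toy choices:

```python
import numpy as np

# One extended Kalman filter step, linearizing Phi at the current mean.
def ekf_step(m, C, y, Phi, DPhi, Sigma, H, Gamma):
    m_hat = Phi(m)                      # nonlinear forecast of the mean
    J = DPhi(m)                         # Jacobian of Phi at m
    C_hat = J @ C @ J.T + Sigma         # linearized forecast covariance
    K = C_hat @ H.T @ np.linalg.inv(Gamma + H @ C_hat @ H.T)
    I = np.eye(len(m))
    return (I - K @ H) @ m_hat + K @ y, (I - K @ H) @ C_hat

Phi = lambda u: np.sin(u)               # toy nonlinear dynamics
DPhi = lambda u: np.diag(np.cos(u))
m, C = ekf_step(np.zeros(2), np.eye(2), np.array([0.5, -0.5]),
                Phi, DPhi, 0.01 * np.eye(2), np.eye(2), 0.1 * np.eye(2))
```

Storing and propagating the N x N covariance is exactly what becomes prohibitive at N ~ 10^9.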

SLIDE 12

Ensemble Kalman filter (Evensen 94)

Figure: Start with K ensemble members drawn from some distribution; an empirical representation of u0. The ensemble members are denoted u^(k)_0.

Only KN numbers are stored. Better than Kalman if K < N.

SLIDE 13

Ensemble Kalman filter (Forecast step)

Figure: Apply the dynamics Ψ to each ensemble member.

SLIDE 14

Ensemble Kalman filter (Make obs)

Figure: Make an observation.

SLIDE 15

Ensemble Kalman filter (Analysis step)

Figure: Approximate the forecast distribution with a Gaussian. Fit the Gaussian using the empirical statistics of the ensemble.

SLIDE 16

How to implement the Gaussian approximation

The naive method is to simply write

P(y1 | u1) P(u1) ∝ exp(−½ |Γ^{−1/2}(y1 − H u1)|²) exp(−½ |Ĉ1^{−1/2}(u1 − m̂1)|²)

with the empirical statistics

m̂1 = (1/K) Σ_{k=1}^{K} Ψ^(k)(u^(k)_0)

Ĉ1 = (1/(K−1)) Σ_{k=1}^{K} (Ψ^(k)(u^(k)_0) − m̂1)(Ψ^(k)(u^(k)_0) − m̂1)ᵀ.

In the linear model case Ψ(un) = M un + ηn, this produces an unbiased estimate of the posterior mean, but a biased estimate of the covariance.
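Computing these empirical statistics is a one-liner per quantity. In this sketch, Psi_k applies the dynamics with an independent noise realization per member; the linear dynamics, noise level, and ensemble size are illustrative choices.

```python
import numpy as np

# Empirical forecast mean and covariance of an ensemble (toy linear model).
rng = np.random.default_rng(0)
M = np.array([[0.9, 0.1], [0.0, 0.8]])

def Psi_k(u):
    # dynamics with an independent noise draw per member: M u + eta^(k)
    return M @ u + 0.1 * rng.standard_normal(2)

K = 50
ensemble0 = rng.standard_normal((K, 2))            # members u_0^(k)
forecast = np.array([Psi_k(u) for u in ensemble0]) # Psi^(k)(u_0^(k))

m_hat = forecast.mean(axis=0)                      # empirical mean
dev = forecast - m_hat
C_hat = dev.T @ dev / (K - 1)                      # empirical covariance
```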

SLIDE 17

How to implement the Gaussian approximation

A better approach is to sample using the Randomized Maximum Likelihood (RML) method: draw the sample u^(k)_1 by minimizing the functional

½ |Γ^{−1/2}(y^(k)_1 − H u)|² + ½ |Ĉ1^{−1/2}(u − Ψ^(k)(u^(k)_0))|²,

where y^(k)_1 = y1 + ξ^(k)_1 is a perturbed observation.

In the linear case Ψ(un) = M un + ηn, this produces iid Gaussian samples with mean and covariance satisfying the Kalman update equations, with Ĉ in place of the true forecast covariance. We end up with

u^(k)_1 = (I − K1 H) Ψ^(k)(u^(k)_0) + K1 y^(k)_1.
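The resulting analysis step is easy to sketch: each member is nudged toward its own perturbed observation using the ensemble Kalman gain. The update below is written in the equivalent innovation form û + K(y − Hû); the dynamics, observation operator, and noise levels are illustrative toy choices.

```python
import numpy as np

# Perturbed-observation EnKF analysis step (toy illustration).
rng = np.random.default_rng(1)
K_members, N = 50, 2
H = np.array([[1.0, 0.0]])                 # observe x only
Gamma = np.array([[0.1]])

# a toy forecast ensemble, centered near (1, 0)
forecast = rng.standard_normal((K_members, N)) + np.array([1.0, 0.0])
m_hat = forecast.mean(axis=0)
dev = forecast - m_hat
C_hat = dev.T @ dev / (K_members - 1)

y = np.array([1.2])                        # the actual observation
gain = C_hat @ H.T @ np.linalg.inv(Gamma + H @ C_hat @ H.T)

analysis = np.empty_like(forecast)
for k in range(K_members):
    y_k = y + rng.multivariate_normal(np.zeros(1), Gamma)  # perturbed obs
    # equivalent to (I - K H) Psi(u^(k)) + K y^(k)
    analysis[k] = forecast[k] + gain @ (y_k - H @ forecast[k])
```

The analysis ensemble is pulled toward the observation in the observed coordinate, and its spread there shrinks.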

SLIDE 18

Ensemble Kalman filter (Perturb obs)

x y Ψ

  • bs

Figure: Turn the observation into K artificial observations by perturbing by the same source of observational noise:

y^(k)_1 = y1 + ξ^(k)_1.

SLIDE 19

Ensemble Kalman filter (Analysis step)

x y Ψ

  • bs

Figure: Update each member using the Kalman update formula. The Kalman gain K1 is computed using the ensemble covariance.

u^(k)_1 = (I − K1 H) Ψ(u^(k)_0) + K1 y^(k)_1

K1 = Ĉ1 Hᵀ (Γ + H Ĉ1 Hᵀ)⁻¹

Ĉ1 = (1/(K−1)) Σ_{k=1}^{K} (Ψ(u^(k)_0) − m̂1)(Ψ(u^(k)_0) − m̂1)ᵀ

SLIDE 20

Part II: Ergodicity for DA methods:

Why are we interested in ergodicity? And what kind?

SLIDE 21

Two types of ergodicity

1 - Signal-filter ergodicity: ergodicity for the homogeneous Markov chain (un, u^(1)_n, . . . , u^(K)_n).

2 - Conditional ergodicity: let P^{Yn}_n(·; u^(1)_0, . . . , u^(K)_0) be the law of (u^(1)_n, . . . , u^(K)_n), given the observations Yn and the initialization (u^(1)_0, . . . , u^(K)_0). We call the filter conditionally ergodic when any two such measures with different initializations, but the same observations Yn, converge as n → ∞.

SLIDE 22

Two types of ergodicity

Signal-filter ergodicity tells us that:

  • The filter will not blow up (catastrophic filter divergence).
  • The long-time statistics of the filter are stable with respect to initialization.
  • The filter inherits 'physical properties' from the underlying model.

Conditional ergodicity suggests that the method is accurate: since all measures are synchronizing, they should be shadowing the true trajectory.

SLIDE 23

Animation I

The model du = −∇V (u)dt + σdW , where V (x, y) = (1 − x2 − y2)2. Observation ux(t) + ξ(t). For EnKF, the process is signal-filter ergodic, but it is not conditionally ergodic.
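The double-well-ring model above is cheap to simulate directly. A minimal Euler-Maruyama sketch, using the gradient ∇V = (−4x(1 − x² − y²), −4y(1 − x² − y²)); the step size, noise level σ, and initial condition are illustrative choices.

```python
import numpy as np

# Euler-Maruyama for du = -grad V(u) dt + sigma dW, V(x,y) = (1 - x^2 - y^2)^2.
def grad_V(u):
    s = 1.0 - u[0] ** 2 - u[1] ** 2
    return np.array([-4.0 * u[0] * s, -4.0 * u[1] * s])

rng = np.random.default_rng(2)
dt, sigma = 1e-3, 0.3
u = np.array([0.1, 0.0])
for _ in range(20000):
    u = u - grad_V(u) * dt + sigma * np.sqrt(dt) * rng.standard_normal(2)
# the trajectory settles near the ring x^2 + y^2 = 1, where V vanishes
```

Since V vanishes on the whole unit circle, the process diffuses around the ring, which is what makes observing only one coordinate so ambiguous.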

SLIDE 24

Animation II

The model du = −∇V (u)dt + σdW where V (x, y) = (1 − x2 − y2)2. Observation y = u1 + ξ. This filter (a type of particle filter) is both signal-filter ergodic and conditionally ergodic.

SLIDE 25

Today we focus on signal-filter ergodicity. Conditional ergodicity is much more difficult (but work in progress!).

SLIDE 26

The theoretical framework

A Markov chain {Xn}n∈N on a state space X is called geometrically ergodic if it has a unique invariant measure π and for any initialization X0 we have

|E_{X0} f(Xn) − ∫ f(x) π(dx)| ≤ C(X0) rⁿ

for some r ∈ (0, 1) and any measurable bounded f.

The Meyn-Tweedie approach is to verify two assumptions that guarantee geometric ergodicity using a coupling argument:

1 - Lyapunov function / energy dissipation: En|Xn+1|² ≤ α|Xn|² + β with α ∈ (0, 1), β > 0.

2 - Minorization: find a compact C ⊂ X, a measure ν supported on C, and κ > 0 such that P(x, A) ≥ κν(A) for all x ∈ C, A ⊂ X.
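For a linear Gaussian chain the Lyapunov condition can be checked in closed form; the following tiny sketch verifies it for the toy chain X_{n+1} = a X_n + noise with |a| < 1 (the coefficient and noise variance are illustrative choices).

```python
import numpy as np

# Check E_n |X_{n+1}|^2 <= alpha |X_n|^2 + beta for X_{n+1} = a X_n + noise.
a, noise_var = 0.9, 0.5

def conditional_second_moment(x):
    # E[|X_{n+1}|^2 | X_n = x] = a^2 x^2 + noise_var, exactly
    return a ** 2 * x ** 2 + noise_var

alpha, beta = a ** 2, noise_var        # the condition holds with equality
xs = np.linspace(-10, 10, 101)
ok = all(conditional_second_moment(x) <= alpha * x ** 2 + beta for x in xs)
```

The interesting question below is whether such a drift condition survives the filtering update, where the gain depends nonlinearly on the ensemble.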

SLIDE 27

Minorization is inherited

If we assume that the model and observational noise have nice densities (non-degenerate Gaussians, for instance), then the minorization condition is inherited from the model. Is the same true of the Lyapunov function?

SLIDE 28

Lyapunov function: inheriting an energy principle

Suppose we know the model satisfies an energy principle: En|Ψ(u)|² ≤ α|u|² + β for α ∈ (0, 1), β > 0. Does the filter inherit the energy principle:

En|u^(k)_{n+1}|² ≤ α′|u^(k)_n|² + β′ ?

SLIDE 29

Observable energy (Tong, Majda, K. 15)

We have

u^(k)_{n+1} = (I − Kn+1 H) Ψ(u^(k)_n) + Kn+1 y^(k)_{n+1}.

Start by looking at the observed part:

H u^(k)_{n+1} = (H − H Kn+1 H) Ψ(u^(k)_n) + H Kn+1 y^(k)_{n+1}.

But notice that (taking Γ = I)

H − H Kn+1 H = H − H Ĉn+1 Hᵀ (I + H Ĉn+1 Hᵀ)⁻¹ H = (I + H Ĉn+1 Hᵀ)⁻¹ H.

Hence

|(H − H Kn+1 H) Ψ(u^(k)_n)| ≤ |H Ψ(u^(k)_n)|.
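The key identity H − HKH = (I + HĈHᵀ)⁻¹H (with Γ = I) is easy to confirm numerically; here C and H are random toy choices.

```python
import numpy as np

# Numerical check of H - H K H = (I + H C H^T)^{-1} H, with Gamma = I.
rng = np.random.default_rng(3)
N, d = 5, 2                                  # state dim, obs dim
A = rng.standard_normal((N, N))
C = A @ A.T                                  # a random covariance
H = rng.standard_normal((d, N))

S = np.eye(d) + H @ C @ H.T
K = C @ H.T @ np.linalg.inv(S)               # Kalman gain with Gamma = I
lhs = H - H @ K @ H
rhs = np.linalg.inv(S) @ H
# lhs and rhs agree to machine precision
```

Since (I + HĈHᵀ)⁻¹ is a contraction whenever HĈHᵀ is positive semidefinite, the norm bound on the observed part follows immediately.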

SLIDE 30

Observable energy (Tong, Majda, K. 15)

We have the energy estimate

En|H u^(k)_{n+1}|² ≤ (1 + δ)|H Ψ(u^(k)_n)|² + β′

for arbitrarily small δ. Unfortunately, the same trick doesn't work for the unobserved variables... However, if we assume an observable energy criterion instead:

En|H Ψ(u^(k)_n)|² ≤ α|H u^(k)_n|² + β,   (⋆)

then we obtain a Lyapunov function for the observed components of the filter:

En|H u^(k)_{n+1}|² ≤ α′|H u^(k)_n|² + β′.

e.g. (⋆) is true for linear dynamics if there is no interaction between observed and unobserved variables at infinity.

SLIDE 31

This tells us that the observed components will be statistically bounded, but it is not a full Lyapunov function (unless we observe everything). Can we get around the problem by tweaking the algorithm?

SLIDE 32

Adaptive Covariance Inflation (Tong, Majda, K. 15)

We modify the algorithm by introducing an adaptive covariance inflation:

Ĉn+1 → Ĉn+1 + λn+1 I

where λn+1 ∝ Θn+1 1(Θn+1 > Λ),

Θn+1 = (1/K) Σ_{k=1}^{K} |y^(k)_{n+1} − H Ψ(u^(k)_n)|²,

and Λ is some constant threshold. If the predictions are near the observations, then there is no inflation.

Thm. The modified EnKF inherits an energy principle from the model:

Ex|Ψ(x)|² ≤ α|x|² + β ⇒ En|u^(k)_{n+1}|² ≤ α′|u^(k)_n|² + β′.

Consequently, the modified EnKF is signal-filter ergodic.
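The inflation rule itself is a few lines: inflate the forecast covariance only when the ensemble predictions sit far from the (perturbed) observations. In this sketch the threshold Lambda, the proportionality constant, and all inputs are toy choices.

```python
import numpy as np

# Adaptive covariance inflation: C_hat -> C_hat + lambda * I,
# with lambda proportional to Theta * 1(Theta > Lambda).
def inflate(C_hat, forecasts, perturbed_obs, H, Lambda, scale=1.0):
    K = len(forecasts)
    # Theta = (1/K) sum_k |y^(k) - H Psi(u^(k))|^2
    theta = np.mean([np.sum((y_k - H @ f) ** 2)
                     for y_k, f in zip(perturbed_obs, forecasts)])
    lam = scale * theta if theta > Lambda else 0.0
    return C_hat + lam * np.eye(C_hat.shape[0])

H = np.eye(2)
C_hat = 0.5 * np.eye(2)
forecasts = [np.zeros(2), np.zeros(2)]
near = [np.array([0.1, 0.0]), np.array([0.0, 0.1])]   # predictions near obs
far = [np.array([5.0, 0.0]), np.array([0.0, 5.0])]    # predictions far from obs
C_near = inflate(C_hat, forecasts, near, H, Lambda=1.0)
C_far = inflate(C_hat, forecasts, far, H, Lambda=1.0)
```

When the mismatch Θ is below the threshold, the filter is untouched; when it is large, the added λI spreads the ensemble and lets the observations pull it back, which is the mechanism behind the energy principle in the theorem.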

SLIDE 33

Adaptive inflation schemes allow us to use cheap integration schemes in the forecast dynamics; these would usually lead to numerical blow-up.

SLIDE 34

Figure: RMS error for EnKF on 5d Lorenz-96 with sparse observations (1 node), strong turbulence regime, Euler method with coarse step size; panels compare EnKF, EnKF-AI (adaptive inflation), EnKF-CI (constant inflation), and EnKF-CAI. The lower panel has additional constant inflation, which helps accuracy.

Applicable to more sophisticated geophysical models, such as 2-layer QG with coarse graining (Lee, Majda 2016).

SLIDE 35

Stability should not be taken for granted!

SLIDE 36

Catastrophic filter divergence

Lorenz-96: u̇_j = (u_{j+1} − u_{j−2}) u_{j−1} − u_j + F, with j = 1, . . . , 40 and periodic BCs. Observe every fifth node. (Harlim-Majda 10, Gottwald-Majda 12)

The true solution lies in a bounded set, but the filter blows up to machine infinity in finite time!
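For reference, the Lorenz-96 vector field and a short integration can be sketched as follows; F = 8 is the standard forcing choice and the step size is illustrative (not necessarily the settings behind the divergence experiments).

```python
import numpy as np

# Lorenz-96 vector field and a short RK4 integration.
def l96(u, F=8.0):
    # du_j/dt = (u_{j+1} - u_{j-2}) u_{j-1} - u_j + F, periodic in j
    return (np.roll(u, -1) - np.roll(u, 2)) * np.roll(u, 1) - u + F

def rk4_step(u, dt):
    k1 = l96(u)
    k2 = l96(u + 0.5 * dt * k1)
    k3 = l96(u + 0.5 * dt * k2)
    k4 = l96(u + dt * k3)
    return u + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

u = 8.0 + 0.01 * np.random.default_rng(4).standard_normal(40)
for _ in range(2000):                     # integrate to time 10 with dt = 0.005
    u = rk4_step(u, 0.005)
# the trajectory stays bounded; it is the *filter*, not the model, that blows up
```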

SLIDE 37

For complicated models, only heuristic arguments have been offered as explanation.

Can we prove it for a simpler constructive model?

SLIDE 38

The rotate-and-lock map (K., Majda, Tong. PNAS 15.)

The model Ψ : R² → R² is a composition of two maps, Ψ(x, y) = Ψlock(Ψrot(x, y)), where

Ψrot(x, y) = (ρ cos θ · x − ρ sin θ · y, ρ sin θ · x + ρ cos θ · y)

(rotation by θ combined with scaling by ρ), and Ψlock rounds the input to the nearest point in the grid

G = {(m, (2n + 1)ε) ∈ R² : m, n ∈ Z}.

It is easy to show that this model has an energy dissipation principle: |Ψ(x, y)|² ≤ α|(x, y)|² + β for α ∈ (0, 1) and β > 0.
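The map itself is tiny to implement. A sketch, with ρ, θ, ε as illustrative parameter choices (the divergence construction in the paper tunes them carefully):

```python
import numpy as np

# The rotate-and-lock map Psi = Psi_lock(Psi_rot(x, y)).
rho, theta, eps = 0.9, 1.0, 0.1

def psi_rot(v):
    c, s = rho * np.cos(theta), rho * np.sin(theta)
    return np.array([c * v[0] - s * v[1], s * v[0] + c * v[1]])

def psi_lock(v):
    # round to the nearest point of G = {(m, (2n+1)*eps) : m, n integers}
    x = np.round(v[0])
    n = np.round((v[1] / eps - 1.0) / 2.0)
    return np.array([x, (2.0 * n + 1.0) * eps])

def psi(v):
    return psi_lock(psi_rot(v))

v = psi(np.array([2.0, 0.5]))
# v lies on the grid: integer x, odd multiple of eps in y
```

The rotation contracts (ρ < 1), so the model dissipates energy; the locking step is what lets the analysis update be fooled, as the next frames show.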

SLIDE 39

(a)

Figure: The red square is the trajectory un = 0. The blue dots are the positions of the forecast ensemble Ψ(u⁺_0), Ψ(u⁻_0). Given the locking mechanism in Ψ, this is a natural configuration.

SLIDE 40

(b)

Figure: We make an observation (H shown below) and perform the analysis step. The green dots are u⁺_1, u⁻_1.

Observation matrix H = [[1, 0], [ε⁻², 1]]. Truth: un = (0, 0).

The filter is certain that the x-coordinate is x̂ (the dashed line). The filter thinks the observation must be (x̂, ε⁻²x̂ + u_{1,y}), but it is actually (0, 0) + noise. The filter concludes that u_{1,y} ≈ −ε⁻²x̂.

SLIDE 41

(c)

Figure: Beginning the next assimilation step. Apply Ψrot to the ensemble (blue dots)

SLIDE 42

(d)

Figure: Apply Ψlock. The blue dots are the forecast ensemble Ψ(u⁺_1), Ψ(u⁻_1). Exactly the same configuration as frame 1, but on a higher-energy orbit. The cycle repeats, leading to exponential growth.

SLIDE 43

Theorem (K.-Majda-Tong 15 PNAS)

For any N > 0 and any p ∈ (0, 1) there exists a choice of parameters such that

P(|u^(k)_n| ≥ Mn for all n ≤ N) ≥ 1 − p,

where Mn is an exponentially growing sequence. I.e., the filter can be made to grow exponentially for an arbitrarily long time with arbitrarily high probability.

SLIDE 44

References

1 - D. Kelly, K. Law & A. Stuart. Well-posedness and accuracy of the ensemble Kalman filter in discrete and continuous time. Nonlinearity (2014).

2 - D. Kelly, A. Majda & X. Tong. Concrete ensemble Kalman filters with rigorous catastrophic filter divergence. Proc. Nat. Acad. Sci. (2015).

3 - X. Tong, A. Majda & D. Kelly. Nonlinear stability and ergodicity of ensemble based Kalman filters. Nonlinearity (2016).

4 - X. Tong, A. Majda & D. Kelly. Nonlinear stability of the ensemble Kalman filter with adaptive covariance inflation. To appear in Comm. Math. Sci. (2016).

All my slides are on my website (www.dtbkelly.com). Thank you!
