SLIDE 1

Predicting chaotic systems with sparse data

lessons from numerical weather prediction

David Kelly, Courant Institute, NYU


SLIDE 2

A toy atmospheric model

We have a three-dimensional variable (x, y, z) evolving in time, governed by a differential equation called the Lorenz model:

    ẋ = σ(y − x)
    ẏ = x(ρ − z) − y
    ż = xy − βz

The model is chaotic: an uncertain estimate of the current state becomes more uncertain over time.

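To see the chaos numerically, here is a minimal Python sketch (my own illustration; the slide gives no parameter values, so it assumes the classical choices σ = 10, ρ = 28, β = 8/3) that integrates the model with a fourth-order Runge-Kutta scheme and shows two nearly identical initial states drifting apart:

    import numpy as np

    def lorenz(u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        # Right-hand side of the Lorenz model from the slide.
        x, y, z = u
        return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

    def rk4_step(f, u, dt):
        # One classical fourth-order Runge-Kutta step.
        k1 = f(u)
        k2 = f(u + 0.5 * dt * k1)
        k3 = f(u + 0.5 * dt * k2)
        k4 = f(u + dt * k3)
        return u + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

    # Two trajectories starting 1e-8 apart diverge to the size of the attractor.
    dt, n_steps = 0.01, 5000
    u_a = np.array([1.0, 1.0, 1.0])
    u_b = u_a + np.array([1e-8, 0.0, 0.0])
    for _ in range(n_steps):
        u_a, u_b = rk4_step(lorenz, u_a, dt), rk4_step(lorenz, u_b, dt)
    print("separation after 50 time units:", np.linalg.norm(u_a - u_b))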

SLIDE 3


A single trajectory

SLIDE 4


An ensemble of trajectories

Small errors in the initial state will get larger over time

SLIDE 5


Suppose that we make a partial observation of the ‘true’ trajectory every τ seconds:

    x(nτ) + ξn,   n = 1, 2, …

where ξn is a source of observational noise. Can we combine the model and the data to improve the estimate of the current state?

SLIDE 6

Assimilating the x data


The blue curve is the true trajectory. The dots are an ensemble of state estimates. The state estimates synchronize with the true state.

SLIDE 7

Not just for toy models …


This precise idea is used in numerical weather prediction (and many, many more applications). The model can be extremely high dimensional (~10^9 variables). The NWP model is a family of layered, discretized partial differential equations (PDEs).

SLIDE 8

Not just for toy models …


Enormous amounts of observational data, which are accurate (small noise) but very sparse (we do not observe the whole ocean, or the small scales).

Observational data are not simply ‘observing the x variable’, but complicated functions of many variables (e.g. how does this GPS-enabled buoy move through the ocean?).

SLIDE 9

The next 25 slides

  • 1. The math behind data assimilation
  • 2. The linear model scenario
  • 3. The nonlinear model scenario


SLIDE 10

The math of data assimilation


SLIDE 11


The model

We are tracking a d-dimensional vector un whose motion is governed by the discrete-time random dynamical system

    un+1 = ψ(un) + ηn

where ηn ∼ N(0, Σ) is iid Gaussian noise (zero in the Lorenz example) and ψ is a deterministic map.

The initial state u0 is not known exactly, but is known to be distributed like a Gaussian:

    u0 ∼ N(m0, C0)

SLIDE 12


The data

We make a partial, noisy observation at every time step:

    zn+1 = h(un+1) + ξn+1

where ξn+1 ∼ N(0, Γ) is iid Gaussian noise and h is the observation function.

SLIDE 13

Bayes’ formula


The state un is a random variable. We would like to know the conditional distribution of un given all the data up to time n, Zn = {z1, z2, …, zn}. By Bayes’ formula we have

    P(un+1 | Zn+1) = P(un+1 | Zn, zn+1)
                   = P(zn+1 | un+1, Zn) P(un+1 | Zn) / P(zn+1 | Zn)
                   = P(zn+1 | un+1) P(un+1 | Zn) / P(zn+1 | Zn)

SLIDE 14

The filtering cycle


Ignoring the normalization constant, we have

    P(un+1 | Zn+1) ∝ P(zn+1 | un+1) P(un+1 | Zn)

We can use this formula to perform an update procedure

    P(un | Zn) ↦ P(un+1 | Zn+1)

which is called the filtering cycle.
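As a concrete numerical picture of this update (a toy sketch of my own, not from the talk: a hypothetical scalar state on a grid, with a Gaussian forecast density and Gaussian observation noise), one cycle is a pointwise multiplication followed by renormalization:

    import numpy as np

    grid = np.linspace(-5.0, 5.0, 1001)
    dx = grid[1] - grid[0]

    forecast = np.exp(-0.5 * (grid - 1.0) ** 2)            # P(u_{n+1} | Z_n), up to a constant
    z, gamma = 0.2, 0.5                                    # observation and its noise std
    likelihood = np.exp(-0.5 * ((z - grid) / gamma) ** 2)  # P(z_{n+1} | u_{n+1})

    posterior = likelihood * forecast                      # Bayes' formula, unnormalized
    posterior /= posterior.sum() * dx                      # normalize numerically
    print("posterior mean:", (grid * posterior).sum() * dx)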

SLIDE 15

The conditional distribution at time n: P(un | Zn)

[Figure: cloud of points in the (x, y) plane; ‘obs’ marks the observation and Ψ the model map.]

SLIDE 16

The ‘forecast’ distribution (i.e. the prior): P(un+1 | Zn), obtained by applying ψ(un) + ηn

[Figure: the forecast cloud in the (x, y) plane; ‘obs’ marks the observation.]

SLIDE 17

Make an observation (here just the x variable)

[Figure: the observation ‘obs’ marked in the (x, y) plane.]

SLIDE 18

Re-weight the forecast distribution with the likelihood of the observation:

    P(un+1 | Zn+1) ∝ P(zn+1 | un+1) P(un+1 | Zn)

The green distribution is the posterior.

[Figure: the re-weighted cloud in the (x, y) plane; ‘obs’ marks the observation.]

SLIDE 19

In the linear model scenario, everything has a formula!


SLIDE 20

The Kalman filter


When the dynamics ψ and the observation function h are both linear, the conditional random variable un | Zn is Gaussian and is characterized completely by its mean and covariance (mn, Cn). The mean and covariance satisfy a recursion formula:

    mn+1 = (I − Kn+1 H) ψ(mn) + Kn+1 zn+1
    Cn+1 = (I − Kn+1 H)(ψ Cn ψᵀ + Σ)

The Kalman gain Kn+1 is the correct convex combination of model and data; it is determined by the forecast and data covariances.
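In code, one cycle of this recursion might look like the sketch below (my own, assuming the linear model map is given as a matrix Psi; the gain formula K = Ĉ Hᵀ (H Ĉ Hᵀ + Γ)⁻¹, with Ĉ the forecast covariance, is the standard expression determined by the forecast and data covariances):

    import numpy as np

    def kalman_step(m, C, Psi, Sigma, H, Gamma, z):
        # One filtering cycle in the slide's notation:
        #   m, C       - mean and covariance of u_n | Z_n
        #   Psi, Sigma - linear model map and model-noise covariance
        #   H, Gamma   - linear observation map and observation-noise covariance
        #   z          - the new observation z_{n+1}
        m_hat = Psi @ m                     # forecast mean
        C_hat = Psi @ C @ Psi.T + Sigma     # forecast covariance
        K = C_hat @ H.T @ np.linalg.inv(H @ C_hat @ H.T + Gamma)
        m_new = (np.eye(len(m)) - K @ H) @ m_hat + K @ z
        C_new = (np.eye(len(m)) - K @ H) @ C_hat
        return m_new, C_new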

SLIDE 21

What can we do for nonlinear models?


SLIDE 22

3DVAR algorithm


Obtain a state estimate mn using the Kalman update formula

    mn+1 = (I − K H) ψ(mn) + K zn+1

Since the model is nonlinear, distributions are no longer Gaussian and there is no ‘correct’ choice for the Kalman gain.

SLIDE 23

For the Lorenz example, take H = (1, 0, 0) and K = (1, 1, 1)ᵀ. The update becomes

    mn+1 = ( x((n+1)τ) + ξn+1,
             ψy(mn) + (x((n+1)τ) + ξn+1 − ψx(mn)),
             ψz(mn) + (x((n+1)τ) + ξn+1 − ψx(mn)) )
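A runnable sketch of this filter (my own; it reuses lorenz() and rk4_step() from the earlier Lorenz snippet, and τ, the noise level, and the initial estimate are illustrative choices):

    import numpy as np
    # Assumes lorenz() and rk4_step() from the earlier snippet are in scope.

    def flow(u, tau, dt=0.01):
        # The map psi: integrate the Lorenz model forward by time tau.
        for _ in range(int(round(tau / dt))):
            u = rk4_step(lorenz, u, dt)
        return u

    tau, n_obs, gamma = 0.1, 200, 0.1          # illustrative values
    K = np.array([1.0, 1.0, 1.0])              # the fixed gain K = (1, 1, 1)^T
    rng = np.random.default_rng(0)

    truth = np.array([1.0, 1.0, 1.0])
    m = np.array([5.0, -5.0, 20.0])            # badly wrong initial estimate
    for _ in range(n_obs):
        truth = flow(truth, tau)
        z = truth[0] + gamma * rng.standard_normal()  # observe x only: H = (1, 0, 0)
        m = flow(m, tau)
        m = m + K * (z - m[0])                 # 3DVAR update, as in the formula above
    print("final estimation error:", np.linalg.norm(m - truth))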

SLIDE 24


The 3DVAR algorithm is accurate (if properly tuned), in that the estimates get closer to the true state when the observational noise is small and when enough variables are observed.

The Ensemble Kalman filter (EnKF) uses an ensemble of ‘particles’ to find a state estimate and quantify the uncertainty of the estimate.

When the observations are sparse, we cannot expect accuracy. Instead we would like a set of samples from the posterior P(un | Zn).

SLIDE 25


The EnKF maintains an ensemble {u(1)n, u(2)n, …, u(K)n} to represent the whole posterior and not just the mean.

Each particle is updated much like the 3DVAR:

    u(k)n+1 = (I − Kn+1 H) ψ(u(k)n) + Kn+1 zn+1

But the Kalman gain is chosen using the empirical covariance of the ‘forecast ensemble’ {ψ(u(1)n), …, ψ(u(K)n)}.
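One analysis step of this scheme might be sketched as follows (my own, assuming a linear observation matrix H; a comment notes the ‘perturbed observation’ variant often used in practice):

    import numpy as np

    def enkf_analysis(forecast, H, Gamma, z):
        # forecast : (K, d) array of forecast particles psi(u(k)_n)
        # H, Gamma : observation matrix (p, d) and observation-noise covariance (p, p)
        # z        : the observation z_{n+1}, shape (p,)
        C_hat = np.cov(forecast.T)             # empirical forecast covariance
        gain = C_hat @ H.T @ np.linalg.inv(H @ C_hat @ H.T + Gamma)
        # Slide's update: u(k)_{n+1} = (I - K_{n+1} H) psi(u(k)_n) + K_{n+1} z_{n+1}.
        # (Operational EnKFs often give each particle its own perturbed copy of z.)
        return forecast + (z - forecast @ H.T) @ gain.T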

SLIDE 26

The ensemble {u(1)n, u(2)n, …, u(K)n} represents the posterior at time n

[Figure: particle cloud in the (x, y) plane; ‘obs’ marks the observation and Ψ the model map.]

SLIDE 27

The forecast ensemble {ψ(u(1)n), …, ψ(u(K)n)} represents the distribution P(un+1 | Zn)

[Figure: forecast particle cloud in the (x, y) plane; ‘obs’ marks the observation.]

SLIDE 28

The ensemble is updated to {u(1)n+1, …, u(K)n+1} to incorporate the observation. The convex combination uses the forecast covariance.

[Figure: updated particle cloud in the (x, y) plane; ‘obs’ marks the observation.]

SLIDE 29

The Canadian Weather Bureau uses EnKF for operational NWP, with around 100 particles for a ~10^9-dimensional state variable (!!!)

EnKF works surprisingly well with high dimensional models that are effectively lower dimensional: the covariance of the forecast ensemble {ψ(u(1)n), …, ψ(u(K)n)} only represents the directions of highest model variation, which often span a much smaller dimension than the state.

SLIDE 30

EnKF and the Sombrero SDE


Consider the two-dimensional stochastic differential equation for u = (x, y)

    du = −∇V(u) dt + σ dW

with the ‘Sombrero potential’ V(x, y) = (1 − x² − y²)². And we only observe the x variable.
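A minimal Euler-Maruyama simulation of this SDE (my own sketch; σ, the step size, and the horizon are illustrative choices):

    import numpy as np

    def grad_V(u):
        # Gradient of the Sombrero potential V(x, y) = (1 - x^2 - y^2)^2.
        x, y = u
        common = -4.0 * (1.0 - x**2 - y**2)
        return np.array([common * x, common * y])

    dt, n_steps, sigma = 1e-3, 20_000, 0.5
    rng = np.random.default_rng(1)
    u = np.array([1.0, 0.0])
    for _ in range(n_steps):
        u = u - grad_V(u) * dt + sigma * np.sqrt(dt) * rng.standard_normal(2)
    print("final state (hovers near the ring x^2 + y^2 = 1):", u)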

SLIDE 31


EnKF - only observing x; the big red star is the true state

SLIDE 32


EnKF - exact same truth, different model noise

SLIDE 33


Particle filter - sampling from the posterior, not just tracking the truth
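For contrast with the EnKF update above, here is a minimal bootstrap particle filter step (my own sketch, assuming a scalar observation of the x coordinate with noise standard deviation gamma; propagate stands for ψ plus model noise):

    import numpy as np

    def particle_filter_step(particles, propagate, z, gamma, rng):
        # particles : (K, d) array of samples from P(u_n | Z_n)
        # propagate : function applying psi plus model noise to one particle
        # z, gamma  : scalar observation of x and its noise std
        forecast = np.array([propagate(p, rng) for p in particles])
        # Weight each particle by the likelihood of the observation (h(u) = x here).
        log_w = -0.5 * ((z - forecast[:, 0]) / gamma) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # Resample: replace the weighted cloud by an unweighted one (may duplicate particles).
        idx = rng.choice(len(particles), size=len(particles), p=w)
        return forecast[idx]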

SLIDE 34

Pros and cons


  • Kalman filter - very efficient, but requires linearity; can be expensive in high dimensions.
  • 3DVAR - very efficient, but requires tuning and has no uncertainty quantification.
  • EnKF - very efficient, works in high dimensions, provides UQ, but not for non-Gaussians.
  • Particle filter - samples from the posterior, but very inefficient in high dimensions.

SLIDE 35

Thank you for listening!

Slides are on my website dtbkelly.com
