Data assimilation with stochastic model reduction, Fei Lu - PowerPoint PPT Presentation


SLIDE 1

Data assimilation with stochastic model reduction

Fei Lu

Department of Mathematics, Johns Hopkins University

Joint with: Alexandre J. Chorin (UC Berkeley), Kevin K. Lin (U. of Arizona), Xuemin Tu (U. of Kansas)

Nov. 21, 2017

Banff International Research Station

1 / 24

SLIDE 2

Motivation: weather/climate prediction

ECMWF: 16 km horizontal grid → 10^9 degrees of freedom.

[Figure: the Lorenz 96 system (Wilks 2005)]

[Schematic: high-dimensional full system → discrete partial data → prediction]

Full system:
x′ = f(x) + U(x, y),  y′ = g(x, y).

Observe only {x(nh)}_{n=1}^N. Forecast x(t), t ≥ Nh.

Complex full systems:
◮ can only afford to resolve x′ = f(x)
◮ y: unresolved variables (subgrid scales)

Discrete partial observations: missing initial conditions.
Ensemble prediction: needs many simulations.

SLIDE 3

Motivation: weather/climate prediction

→ Develop a reduced model that quantifies the model error U(x, y) and captures key statistical and dynamical properties.

SLIDE 4

Outline

Stochastic model reduction (a first step: reduction from simulated data)
◮ Discrete-time stochastic parametrization (NARMA)
◮ Application to chaotic dynamical systems

Data assimilation with the reduced model (an intermediate step: NARMA + noisy data → state estimation and prediction)

SLIDE 5

Stochastic model reduction

x′ = f(x) + U(x, y),  y′ = g(x, y).  Data {x(nh)}_{n=1}^N.

Memory effects (Mori, Zwanzig, Chorin, Ghil, Majda, Wilks, ...)
Takens theorem: delay embedding.
Mori-Zwanzig formalism: the "generalized Langevin equation"

dx/dt = f(x) [Markov term] + ∫_0^t K(x(t − s), s) ds [memory] + Ẇ_t [noise].

Goal: a non-Markovian stochastic reduced system for x.

SLIDE 6

Differential system or discrete-time system?

X′ = f(X) + Z(t, ω): informative, neat; inference¹ via the likelihood requires discretization², which introduces error.
X_{n+1} = X_n + R_h(X_n) + Z_n: messy, but the discretization error is corrected by the data.

¹ Talay, Mattingly, Stuart, Higham, Milstein, Tretyakov, ...
² Brockwell, Sørensen, Pokern, Wiberg, Samson, ...
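The trade-off can be illustrated numerically. A minimal sketch, not from the talk (the Ornstein-Uhlenbeck example and all parameter values are my own): fitting the drift of an exactly sampled OU process with the Euler conditional likelihood is biased when the sampling step h is large, because that likelihood is exact for the discretized model, not for the differential equation.

```python
import numpy as np

# Ornstein-Uhlenbeck process dX = -theta*X dt + sigma dW, sampled exactly
# at spacing h, then fitted with the Euler (discrete-time) likelihood.
rng = np.random.default_rng(0)
theta, sigma, h, N = 1.0, 1.0, 0.5, 200_000

# Exact sampling: X_{n+1} = a X_n + s xi_n with a = e^{-theta h}.
a = np.exp(-theta * h)
s = sigma * np.sqrt((1 - a**2) / (2 * theta))
X = np.empty(N)
X[0] = 0.0
xi = rng.standard_normal(N - 1)
for n in range(N - 1):
    X[n + 1] = a * X[n] + s * xi[n]

# Euler conditional-likelihood estimate of theta: regress increments on
# the state, theta_hat = -<X_n (X_{n+1} - X_n)> / (h <X_n^2>).
theta_hat = -np.dot(X[:-1], np.diff(X)) / (h * np.dot(X[:-1], X[:-1]))
print(theta_hat)
```

With h = 0.5 the estimate concentrates near (1 − e^{−θh})/h ≈ 0.79 rather than the true θ = 1: the "discretization error" is absorbed into the fitted coefficient, which is exactly what a discrete-time parametrization exploits.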

SLIDE 7

Discrete-time stochastic parametrization

NARMA(p, q):

X_n = X_{n−1} + R_h(X_{n−1}) + Z_n,  Z_n = Φ_n + ξ_n,

Φ_n = Σ_{j=1}^p a_j X_{n−j} + Σ_{j=1}^r Σ_{i=1}^s b_{i,j} P_i(X_{n−j})  [auto-regression]
    + Σ_{j=1}^q c_j ξ_{n−j}  [moving average]

R_h(X_{n−1}) comes from a numerical scheme for x′ ≈ f(x); Φ_n depends on the past.

Tasks:
◮ Structure derivation: the terms and orders (p, r, s, q) in Φ_n
  – physical laws, asymptotic behavior, discretization
◮ Parameter estimation: a_j, b_{i,j}, c_j, and σ
  – conditional likelihood methods
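A sketch of the estimation task for the Gaussian-noise case with q = 0, where the conditional likelihood reduces to least squares (the scalar system, the choice f(x) = −x, and the coefficient values are hypothetical, not from the talk):

```python
import numpy as np

# Discrete-time stochastic parametrization with q = 0 (a NAR model):
# the residual z_n = X_n - X_{n-1} - R_h(X_{n-1}) is regressed on
# past-state terms; with Gaussian xi_n this is the conditional MLE.
rng = np.random.default_rng(1)
h, N = 0.1, 50_000
f = lambda x: -x                         # resolved dynamics x' ~ f(x)
Rh = lambda x: h * f(x)                  # forward-Euler scheme
a_true, b_true, sig = -0.05, 0.02, 0.1   # hypothetical "model error" terms

# Simulate the discrete-time model X_n = X_{n-1} + Rh(X_{n-1}) + z_n.
X = np.empty(N)
X[0] = 0.5
xi = sig * rng.standard_normal(N - 1)
for n in range(1, N):
    z = a_true * X[n - 1] + b_true * X[n - 1] ** 2 + xi[n - 1]
    X[n] = X[n - 1] + Rh(X[n - 1]) + z

# Least squares on z_n = a X_{n-1} + b X_{n-1}^2 + xi_n.
z = X[1:] - X[:-1] - Rh(X[:-1])
A = np.column_stack([X[:-1], X[:-1] ** 2])
(a_hat, b_hat), *_ = np.linalg.lstsq(A, z, rcond=None)
print(a_hat, b_hat)  # should be close to (a_true, b_true)
```

The structure-derivation task decides which columns go into the regression matrix; the parameter-estimation task is then the linear solve above (or a nonlinear optimization when q > 0, since past ξ's are unobserved).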

SLIDE 8

Application to chaotic dynamical systems

Example I: the Lorenz 96 system

A chaotic dynamical system (a simplified atmospheric model):

dx_k/dt = x_{k−1}(x_{k+1} − x_{k−2}) − x_k + 10 − (1/J) Σ_j y_{k,j},
dy_{k,j}/dt = (1/ε) [y_{k,j+1}(y_{k,j−1} − y_{k,j+2}) − y_{k,j} + x_k],

where x ∈ R^18, y ∈ R^360. ε = 0.5 → no scale separation. (Wilks 2005)

Find a reduced system for x ∈ R^18 based on
➣ Data {x(nh)}_{n=1}^N
➣ dx_k/dt ≈ x_{k−1}(x_{k+1} − x_{k−2}) − x_k + 10.
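The two-scale system above can be integrated directly to generate the data {x(nh)}. A minimal sketch assuming J = 20 fast variables per slow variable, RK4 time stepping, and a step size of 0.005 (these choices are mine, not the talk's):

```python
import numpy as np

# Two-scale Lorenz 96: K = 18 slow variables x coupled to K*J = 360
# fast variables y, as on the slide, with eps = 0.5 and forcing F = 10.
K, J, eps, F = 18, 20, 0.5, 10.0

def rhs(x, y):
    # Slow ring: cyclic advection - damping + forcing - mean coupling.
    dx = (np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2))
          - x + F - y.reshape(K, J).mean(axis=1))
    # Fast ring: same structure on a finer ring, driven by x.
    dy = (np.roll(y, -1) * (np.roll(y, 1) - np.roll(y, -2))
          - y + np.repeat(x, J)) / eps
    return dx, dy

def rk4_step(x, y, h):
    dx1, dy1 = rhs(x, y)
    dx2, dy2 = rhs(x + h/2*dx1, y + h/2*dy1)
    dx3, dy3 = rhs(x + h/2*dx2, y + h/2*dy2)
    dx4, dy4 = rhs(x + h*dx3, y + h*dy3)
    return (x + h/6*(dx1 + 2*dx2 + 2*dx3 + dx4),
            y + h/6*(dy1 + 2*dy2 + 2*dy3 + dy4))

rng = np.random.default_rng(2)
x, y = F + rng.standard_normal(K), 0.1 * rng.standard_normal(K * J)
h, nsteps = 0.005, 2000
traj = np.empty((nsteps, K))   # the "data" {x(nh)} for model reduction
for n in range(nsteps):
    x, y = rk4_step(x, y, h)
    traj[n] = x
```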

SLIDE 9

NARMA model for the Lorenz 96 system:

x_n = x_{n−1} + R_h(x_{n−1}) + z_n;  z_n = Φ_n + ξ_n,

Φ_n = a + Σ_{j=1}^p Σ_{l=1}^{d_x} b_{j,l} (x_{n−j})^l + Σ_{j=1}^p c_j R_h(x_{n−j}) + Σ_{j=1}^q d_j ξ_{n−j}.

p = 2, d_x = 3;  q = 1 for h = 0.01, q = 0 for h = 0.05.

SLIDE 10

NARMA: x_n = x_{n−1} + R_h(x_{n−1}) + z_n;  z_n = Φ_n + ξ_n,

Φ_n = a + Σ_{j=1}^p Σ_{l=1}^{d_x} b_{j,l} (x_{n−j})^l + Σ_{j=1}^p c_j R_h(x_{n−j}) + Σ_{j=1}^q d_j ξ_{n−j}.

p = 2, d_x = 3;  q = 1 for h = 0.01, q = 0 for h = 0.05.

Polynomial autoregression (POLYAR)³:

dx_k/dt = x_{k−1}(x_{k+1} − x_{k−2}) − x_k + 10 + U,  U = P(x_k) + η_k,
with dη_k(t) = φ η_k(t) dt + dB_k(t), where P(x) = Σ_{j=0}^{d_x} a_j x^j. Optimal d_x = 5.

SLIDE 11

Long-term statistics

[Figure: empirical probability density function (PDF) of X1 for L96, POLYAR, and NARMA, at h = 0.01 and h = 0.05]

[Figure: empirical autocorrelation function (ACF) for L96, POLYAR, and NARMA, at h = 0.01 and h = 0.05]
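The two statistics compared on this slide are straightforward to compute from a trajectory. A minimal sketch (the AR(1) stand-in series and all parameters are mine, not the talk's):

```python
import numpy as np

# Empirical PDF (histogram) and empirical autocorrelation function of a
# scalar time series, the diagnostics used to validate a reduced model.
def empirical_pdf(x, bins=50):
    density, edges = np.histogram(x, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density

def empirical_acf(x, max_lag):
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / (len(x) * var)
                     for k in range(max_lag + 1)])

# Usage on a stand-in series: an AR(1) process with coefficient 0.9,
# whose ACF should decay roughly like 0.9**lag.
rng = np.random.default_rng(3)
x = np.empty(100_000)
x[0] = 0.0
xi = rng.standard_normal(len(x) - 1)
for n in range(len(x) - 1):
    x[n + 1] = 0.9 * x[n] + xi[n]

centers, density = empirical_pdf(x)
acf = empirical_acf(x, max_lag=20)
```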

SLIDE 12

Prediction (h = 0.05)

A typical ensemble forecast:

[Figure: forecast trajectories of X1 (cyan) vs. the true trajectory (blue) over t ∈ [0, 5], for POLYAR and NARMAX]

SLIDE 13

Prediction (h = 0.05)

A typical ensemble forecast:

[Figure: forecast trajectories of X1 (cyan) vs. the true trajectory (blue) over t ∈ [0, 5], for POLYAR and NARMAX]

RMSE of many forecasts:

[Figure: RMSE vs. lead time for POLYAR and NARMAX]

Forecast time: POLYAR: T ≈ 1.25; NARMA: T ≈ 2.5 (full model: T ≈ 2.5; the "best" forecast time is achieved!)
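The RMSE-of-many-forecasts curve can be sketched as follows (synthetic stand-in data; the array shapes and parameters are my own, not the talk's):

```python
import numpy as np

# RMSE vs. lead time: launch many ensemble forecasts from different
# initial times, compare each ensemble mean with the truth, and average
# the squared errors per lead time.
def forecast_rmse(truth, forecasts):
    """truth: (n_starts, n_leads); forecasts: (n_starts, n_ens, n_leads)."""
    ens_mean = forecasts.mean(axis=1)          # (n_starts, n_leads)
    sq_err = (ens_mean - truth) ** 2
    return np.sqrt(sq_err.mean(axis=0))        # one RMSE per lead time

# Usage on synthetic data where error grows with lead time by construction.
rng = np.random.default_rng(4)
n_starts, n_ens, n_leads = 500, 20, 40
truth = rng.standard_normal((n_starts, n_leads))
spread = np.linspace(0.1, 2.0, n_leads)        # growing forecast error
forecasts = truth[:, None, :] + spread * rng.standard_normal(
    (n_starts, n_ens, n_leads))
rmse = forecast_rmse(truth, forecasts)
```

The "forecast time" quoted on the slide is then read off as the lead time at which this curve crosses a chosen error threshold.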

SLIDE 14

Example II: the Kuramoto-Sivashinsky equation

v_t + v_xx + v_xxxx + v v_x = 0,  t > 0, x ∈ [0, 2πν], periodic.

Spatio-temporally chaotic; solved with 128 Fourier modes.
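A 128-mode spectral solution of the KSE can be produced with the standard ETDRK4 exponential integrator of Kassam and Trefethen; a minimal sketch, where the initial condition, step size, and integration time are my own choices, not the talk's:

```python
import numpy as np

# Kuramoto-Sivashinsky equation v_t + v_xx + v_xxxx + v*v_x = 0 on
# [0, 2*pi*nu], periodic, with N = 128 Fourier modes and nu = 3.43.
N, nu = 128, 3.43
L = 2 * np.pi * nu
x = L * np.arange(N) / N
v = np.cos(x / nu) * (1 + np.sin(x / nu))      # smooth initial datum
vhat = np.fft.fft(v)

k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)     # wavenumbers 2*pi*n/L
lin = k**2 - k**4                              # linear part in Fourier space
h = 0.25                                       # time step
E, E2 = np.exp(h * lin), np.exp(h * lin / 2)

# ETDRK4 coefficients via contour integration (avoids cancellation).
M = 16
r = np.exp(1j * np.pi * (np.arange(1, M + 1) - 0.5) / M)
LR = h * lin[:, None] + r[None, :]
Q = h * np.real(np.mean((np.exp(LR / 2) - 1) / LR, axis=1))
f1 = h * np.real(np.mean((-4 - LR + np.exp(LR) * (4 - 3*LR + LR**2)) / LR**3, axis=1))
f2 = h * np.real(np.mean((2 + LR + np.exp(LR) * (-2 + LR)) / LR**3, axis=1))
f3 = h * np.real(np.mean((-4 - 3*LR - LR**2 + np.exp(LR) * (4 - LR)) / LR**3, axis=1))

g = -0.5j * k                                  # -v*v_x = -(v^2/2)_x
g[N // 2] = 0                                  # drop the unpaired Nyquist mode
nonlin = lambda w: g * np.fft.fft(np.real(np.fft.ifft(w)) ** 2)

for n in range(400):                           # integrate to t = 100
    Nv = nonlin(vhat)
    a = E2 * vhat + Q * Nv
    Na = nonlin(a)
    b = E2 * vhat + Q * Na
    Nb = nonlin(b)
    c = E2 * a + Q * (2 * Nb - Nv)
    Nc = nonlin(c)
    vhat = E * vhat + Nv * f1 + 2 * (Na + Nb) * f2 + Nc * f3
v = np.real(np.fft.ifft(vhat))
```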

SLIDE 15

Example II: the Kuramoto-Sivashinsky equation

v_t + v_xx + v_xxxx + v v_x = 0,  t > 0, x ∈ [0, 2πν], periodic.

Spatio-temporally chaotic; solved with 128 Fourier modes.

Problem setting: ν = 3.43; observe only 5 Fourier modes and predict their evolution.

Reduced models:
◮ the truncated system: not accurate
◮ discrete-time stochastic parametrizationᵃ: derive the structure from the inertial manifold → an effective NARMA model

ᵃ Lu-Lin-Chorin 2017

SLIDE 16

Key point 1: long-term statistics ↔ large-time behavior of the PDE⁴

Inertial manifolds M:
- finite-dimensional, positively invariant manifolds
- exponentially attract all trajectories

Let v = u + w. Rewrite the KSE:
du/dt = Au + Pf(u + w),
dw/dt = Aw + Qf(u + w).

On M, w = ψ(u):
du/dt = Au + Pf(u + ψ(u)).

Approximate inertial manifolds (AIMs): approximate w = ψ(u) by
dw/dt ≈ 0 ⇒ w ≈ A⁻¹Qf(u + w),
Fixed-point iteration: ψ₀ = 0; ψ_{n+1} = A⁻¹Qf(u + ψ_n).

Key point 2: parametrize the AIM

AIM with 5 modes: unstable (an accurate AIM requires m = dim(u) to be large!)
→ use the AIM's terms; estimate their coefficients from data → an effective NARMA model

SLIDE 17

NARMA with AIMs⁵

The AIMs hint at how the high modes depend on the low modes:

|k| > K:  v̂_k ≈ ψ_{1,k} = [A⁻¹Qf(u)]_k  ⇒  v̂_k ≈ c_k Σ_{1≤|l|,|k−l|≤K} v̂_l v̂_{k−l}.

Augmented variables:
u^n_j = u^n_j for 1 ≤ j ≤ K;  u^n_j = i Σ_{l=j−K}^K u^n_l u^n_{j−l} for K < j ≤ 2K.

A discrete-time stochastic system (p = 2, q = 1):

u^{n+1}_k = u^n_k + h R^k_h(u^n) + h z^n_k,  z^n_k = Φ^n_k + ξ^n_k,

Φ^n_k(θ_k) = μ_k + Σ_{j=0}^p b_{k,j} u^{n−j}_k + Σ_{j=1}^K c_{k,j} u^n_{j+K} u^n_{j+K−k} + c_{k,K+1} R^k_h(u^n) + Σ_{j=1}^q d_{k,j} ξ^{n−j}_k.
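The quadratic closure v̂_k ≈ c_k Σ v̂_l v̂_{k−l} for the unresolved high modes can be sketched as follows (the function name, the dictionary representation, and the coefficient values are hypothetical; in the talk's approach the coefficients c_k are estimated from data, not prescribed):

```python
import numpy as np

# AIM-inspired closure: estimate high Fourier modes K < k <= 2K from a
# quadratic convolution of the K resolved low modes.
def high_mode_estimate(vlow, K, c):
    """vlow: dict mode index -> coefficient for 1 <= k <= K (conjugate
    symmetry vhat_{-k} = conj(vhat_k) is assumed); c: closure coefficients."""
    full = dict(vlow)
    full.update({-l: np.conj(vlow[l]) for l in vlow})
    est = {}
    for k in range(K + 1, 2 * K + 1):
        s = sum(full[l] * full[k - l]
                for l in range(-K, K + 1)
                if 1 <= abs(l) <= K and 1 <= abs(k - l) <= K)
        est[k] = c[k] * s
    return est

# Usage with K = 5 resolved modes and made-up closure coefficients.
rng = np.random.default_rng(5)
K = 5
vlow = {k: rng.standard_normal() + 1j * rng.standard_normal()
        for k in range(1, K + 1)}
c = {k: 0.1 for k in range(K + 1, 2 * K + 1)}
est = high_mode_estimate(vlow, K, c)
```

In the NARMA model above, these quadratic terms enter Φ_n as regressors whose coefficients are fitted jointly with the autoregressive and moving-average terms.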

SLIDE 18

Long-term statistics:

[Figure: probability density function of Re v̂₄ for the data, the truncated system, and NARMA]

[Figure: autocorrelation function for the data, the truncated system, and NARMA]

SLIDE 19

Prediction

A typical forecast:

[Figure: v̂₄ over time for the truncated system and for NARMA, vs. the data]

RMSE of many forecasts:

[Figure: RMSE vs. lead time for NARMA and the truncated system]

Forecast time: the truncated system: T ≈ 5; the NARMA system: T ≈ 50.

SLIDE 20

Part II: Data assimilation with the reduced model

x′ = f(x) + U(x, y),  y′ = g(x, y).

Noisy data: x(nh) + W(n), n = 1, 2, ...
Data assimilation: state estimation and prediction.
EnKF: ensemble from the forward model + Kalman update.

Assume we can simulate the full system offline → use the solution to quantify the model error U(x, y) by:
◮ tuning inflation and localization of the EnKF;
◮ deriving a NARMA reduced model.
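The Kalman update referred to here can be sketched with a generic perturbed-observation EnKF step with multiplicative inflation (the linear rotation example, the inflation factor, and all parameters are my own; localization is omitted for brevity):

```python
import numpy as np

# One EnKF cycle: propagate the ensemble with the forward model, inflate
# the spread, then apply the Kalman update with perturbed observations.
def enkf_step(ens, z, forward, H, R, rng, infl=1.02):
    """ens: (n_ens, dim) ensemble; z: observation; forward: model map;
    H: (obs_dim, dim) observation matrix; R: observation error covariance."""
    ens = np.array([forward(e) for e in ens])        # forecast step
    mean = ens.mean(axis=0)
    ens = mean + infl * (ens - mean)                 # multiplicative inflation
    A = ens - ens.mean(axis=0)                       # anomalies
    P = A.T @ A / (len(ens) - 1)                     # sample covariance
    Kg = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)    # Kalman gain
    # Perturbed observations keep the analysis spread statistically correct.
    zpert = z + rng.multivariate_normal(np.zeros(len(R)), R, size=len(ens))
    return ens + (zpert - ens @ H.T) @ Kg.T

# Usage: track a planar rotation observed only in its first coordinate.
th = 0.1
F = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.05]])
rng = np.random.default_rng(6)
truth = np.array([1.0, 0.0])
ens = rng.standard_normal((50, 2))
for n in range(100):
    truth = F @ truth
    z = H @ truth + rng.multivariate_normal([0.0], R)
    ens = enkf_step(ens, z, lambda e: F @ e, H, R, rng)
err = np.linalg.norm(ens.mean(axis=0) - truth)
```

The two routes on this slide correspond to two choices of `forward`: the truncated model plus tuned `infl` (and localization), or a NARMA model that carries the model error inside `forward` itself.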

SLIDE 21

The Lorenz 96 system (Wilks 2005)

Estimate and predict x based on:
➣ Noisy data z(n) = x(nh) + W(n)
➣ Forward models:
  L96x: dx_k/dt ≈ x_{k−1}(x_{k+1} − x_{k−2}) − x_k + 10
  (account for the model error by inflation and localization (IL) in the EnKF)
  NARMA (account for the model error by parametrization in the forward model)

SLIDE 22

Relative error of state estimation

[Figure: relative error vs. standard deviation of the observation noise for L96x + IL, NARMA, and the full model + IL]

Relative error for different observation noises. (Ensemble size: 1000 for L96x and NARMA; 10 for the full model.)

SLIDE 23

RMSE of state prediction

[Figure: RMSE vs. time for L96x + IL, NARMA, and the full model + IL]

RMSE of 10⁴ ensemble forecasts. (Ensemble size: 1000 for L96x and NARMA; 10 for the full model.)

Summary: NARMA modeling improves the performance of data assimilation.

SLIDE 24

Summary and ongoing work

x′ = f(x) + U(x, y),  y′ = g(x, y).  Data {x(nh)}_{n=1}^N.

[Schematic: data → inference/discretization → "X′ = f(X) + Z(t, ω)" vs. "X_{n+1} = X_n + R_h(X_n) + Z_n" for prediction]

Data-driven stochastic model reduction by discrete-time stochastic parametrization → an effective non-Markovian reduced model (NARMA):
◮ captures key statistical-dynamical features
◮ makes medium-range forecasts

Improves the performance of data assimilation.

Open and ongoing work: what if only noisy data are available?
◮ DA with non-Markovian models
◮ inference for hidden non-Markovian models

Thank you!