Data-driven stochastic model reduction Fei Lu joint with Alexandre - - PowerPoint PPT Presentation

data driven stochastic model reduction
SMART_READER_LITE
LIVE PREVIEW

Data-driven stochastic model reduction Fei Lu joint with Alexandre - - PowerPoint PPT Presentation

Data-driven stochastic model reduction Fei Lu joint with Alexandre J. Chorin and Kevin Lin UC Berkeley Lawrence Berkeley National Laboratory U of Arizona July 18, 2016 ICERM, Brown University 1 / 22 Outline Part I: Motivation for


slide-1
SLIDE 1

Data-driven stochastic model reduction

Fei Lu

joint with Alexandre J. Chorin and Kevin Lin

UC Berkeley Lawrence Berkeley National Laboratory U of Arizona

July 18, 2016 ICERM, Brown University

1 / 22

slide-2
SLIDE 2

Outline

Part I: Motivation for stochastic model reduction Part II: A discrete-time approach Part III: Examples

2 / 22

slide-3
SLIDE 3

Motivation

Climate modeling, numerical weather prediction:

Discrete partial data High-dimensional Full system Prediction

d dt x = f(x) + U(x, y), d dt y = g(x, y).

Observe only {x(nδ)}N

n=1.

Forecast x(t), t ≥ Nδ. Possible approaches: Estimate y, and solve the full system

◮ The full system is expensive to solve ◮ The full system may be unknown

Construct a reduced model for x

3 / 22

slide-4
SLIDE 4

Why stochastic models Data {x(nδ)} ⇒ properties of the full system: long-term statistics (ergodicity) 1 N

N

  • i=1

F(x(niδ)) → E[F(x)] positive Lyapunov exponents the effects of y act like random forces Goal: develop a stochastic reduced system for x that can reproduce long-term statistics; make medium range predictions.

4 / 22

slide-5
SLIDE 5

Outline

Part I: Motivation for stochastic model reduction Part II: A discrete-time approach

Previous work: continuous-time approach Discrete-time stochastic parametrization

Part III: Examples

5 / 22

slide-6
SLIDE 6

The Mori-Zwanzig Formalism (Mori (65) and Zwanzig (73))

Rewrite the deterministic system in a form which resembles a generalized Langevin equation (Chorin-Hald (06)):

dXt dt = f(Xt) + Z t Γ(t − s, Xs)ds + Wt

Chorin et al (00), Zwanzig(01), Chorin-Hald(06)

equation + a projection operator + Duhamel’

Markovian memory noise

Message: memory exists in a closed system for x. Multi-level regression Atmospheric sciences: Kondrashov et al (05-15), Wilks (05) Hypoelliptic SDEs (Majda-Harlim (13,14)) dXt = f(Xt)dt + Vtdt, dVt = a(Xt, Vt)dt + σdWt.

6 / 22

slide-7
SLIDE 7

Continuous-time approach: dXt = f(Xt)dt + Vtdt, dVt = a(Xt, Vt)dt + σdWt.

Discrete partial data Prediction discretization parameter estimation structure derivation A continuous-time stochastic system A discrete-time stochastic system

7 / 22

slide-8
SLIDE 8

Continuous-time approach: dXt = f(Xt)dt + Vtdt, dVt = a(Xt, Vt)dt + σdWt.

Discrete partial data Prediction discretization parameter estimation structure derivation A continuous-time stochastic system A discrete-time stochastic system

Discrete-time approach: Xn = Φ(Xn−p:n−1, ξn−q:n−1) + ξn

Discrete partial data Prediction parameter estimation structure derivation A discrete-time stochastic system

8 / 22

slide-9
SLIDE 9

Discrete-time stochastic parametrization

Identify a discrete-time stochastic system: NARMA (Chorin-Lu (15)) Xn = Xn−1 + δRδ(Xn−1) + δZn, Zn = Φn + ξn, Φn =

p

  • j=1

ajZn−j +

r

  • j=1

s

  • i=1

bi,jPi(Xn−j) +

q

  • j=1

cjξn−j;

Rδ(Xn−1) from a discrete scheme for x′ ≈ f(x); Φn = Φ(Zn−p:n−1, Xn−r:n−1, ξn−q:n−1) depends on the past; ξn are i.i.d N(0, σ2);

a Nonlinear AutoRegression Moving Average model (NARMA).

Structure: terms in Φn; Parameters: aj, bj, cj, and σ.

9 / 22

slide-10
SLIDE 10

Xn = Xn−1 + δRδ(Xn−1) + δZn, Zn = Φn + ξn, Φn =

p

  • j=1

ajZn−j +

r

  • j=1

s

  • i=1

bi,jPi(Xn−j) +

q

  • j=1

cjξn−j; Parameter estimation: conditional maximum likelihood estimator (MLE)

Conditional on ξ1, . . . , ξq, the log-likelihood of Xq+1:N is l(θ|ξ1, . . . , ξq) = − N

n=q+1 |Zn−Φn|2 2σ2

− N−q

2

ln σ2.

when q = 0, MLE = least squares estimator (LSE) when q > 0, set ξ1 = · · · = ξq = 0

◮ ergodicity ⇒ MLE is consistent ⇒ independent of ξ1, . . . , ξq

Structure derivation/selection: stability, long-term statistics numerical schemes analytical properties of the full system

10 / 22

slide-11
SLIDE 11

Outline

Part I: Motivation for stochastic model reduction Part II: A discrete-time approach Part III: Examples:

The Lorenz 96 system The Kuramoto-Sivashinsky equation

11 / 22

slide-12
SLIDE 12

Example I: the Lorenz 96 system

d dt xk = xk−1 (xk+1 − xk−2) − xk + 10 − 1 J

  • j

yj,k, d dt yj,k = 1 ε [yj+1,k(yj−1,k − yj+2,k) − yj,k + xk], where x ∈ RK, y ∈ RJK.

Wilks, 2005

a chaotic system K = 18, J = 20: y ∈ R360 Find a reduced system for x ∈ R18 based on

➣ Data {x(nδ)}N

n=1

➣ d

dt xk ≈ xk−1 (xk+1 − xk−2) − xk + 10. 12 / 22

slide-13
SLIDE 13

NARMA [Chorin-Lu (15)]: xn = xn−1 + δRδ(xn−1) + δzn; zn = Φn + ξn, Φn = µ +

p

  • j=1

ajzn−j +

r

  • j=1

dx

  • l=1

bj,l(xn−j)l +

s

  • j=1

dR

  • l=1

cj,l(Rδ(xn−j))l +

q

  • j=1

djξn−j. Polynomial autoregression (POLYAR) [Wilks (05)]: d dt xk = xk−1 (xk+1 − xk−2) − xk + 10 + U , U = P(xk) + ηk, with ηk(t + δ) = φηk(t) + ξ(t) where P(x) = d

j=0 ajxj; ξ(t) ∼ N(0, σ2). 13 / 22

slide-14
SLIDE 14

Long-term statistics run the reduced systems for a long time; compare with the data:

−10 −5 5 10 15 0.02 0.04 0.06 0.08 0.1 0.12 0.14 x1 pdf L96 POLYAR NARMAX

empirical probability density function

1 2 3 4 5 6 7 8 9 10 −0.6 −0.4 −0.2 0.2 0.4 0.6 0.8 1 time ACF L96 POLYAR NARMAX

autocorrelation function

14 / 22

slide-15
SLIDE 15

Short-term prediction A typical ensemble forecast:

0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 −5 5 10

time X1

0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 −5 5 10

time X1

POLYAR NARMAX

forecast trajectories in cyan true trajectory in blue

RMSE of many forecasts:

0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 2 4 6 8 10 12 14 16 18 20

lead time RMSE

POLYAR NARMAX

Forecast time: POLYAR: t ≈ 1 NARMA: t ≈ 2.5

15 / 22

slide-16
SLIDE 16

Example II: the Kuramoto-Sivashinsky equation

The KSE with periodic BC: v(x, t) = v(x + 2πν, t)

∂v ∂t + ∂2v ∂x2 + ∂4v ∂x4 + v ∂v ∂x = 0 , x ∈ R, t > 0. spatio-temporally chaotic: Fourier: v(x, t) = +∞

k=−∞ ˆ

vkeiqkx, (here qk = k

ν )

d dt ˆ vk = (q2

k − q4 k)ˆ

vk − iqk 2

  • l=−∞

ˆ vl ˆ vk−l.

Problem setting: ν = 3.43

  • bserve only K = 5 modes, (ˆ

vk, k = 1, . . . , K) predict their evolution

16 / 22

slide-17
SLIDE 17

The truncated system is not accurate:

d dt ˆ vk = (q2

k − q4 k)ˆ

vk − iqk 2

  • |l|≤K,|k−l|≤K

ˆ vl ˆ vk−l, |k| ≤ K. NARMA

Xn = Xn−1 + δRδ(Xn−1) + δZn, Zn = Φn + ξn, Φn =

p

  • j=1

ajZn−j +

q

  • j=1

cjξn−j +

r

  • j=1

s

  • i=1

bi,jPi(Xn−j).

compute Rδ(Xn−1) (from the K-mode

truncated system)

structure for Φn:

◮ different modes have different

dynamics

◮ nonlinear interaction between

modes

17 / 22

slide-18
SLIDE 18

Large time behavior of the KSE (Constantin, Jolly, Kevrekidis, Titi et al (88-94)) Inertial manifold M

Let v = u + w. Rewrite the KSE: du dt = Au + Pf(u + w) dw dt = Aw + Qf(u + w) if M is the graph of a function w = ψ(u), du dt = Au + Pf(u + ψ(u)).

Approximate inertial manifolds (AIMs) Approximate the function w = ψ(u):

dw dt ≈ 0 ⇒ w ≈ A−1Qf(u + w),

Fixed point: ψ0 = 0; ψn+1 = A−1Qf(u + ψn). An accurate AIM requires m = dim(u) to be large! (distance to the attractor ∼ |qm|−γ )

18 / 22

slide-19
SLIDE 19

NARMA with AIMs (Lu-Lin-Chorin (15))

The AIMs hint at how the high modes depend on the low modes: |k| > K : ˆ vk ≈ ψ1,k =

  • A−1Qf(u)
  • k ⇒ ˆ

vk ≈ ck

  • 1≤|l|,|k−l|≤K

ˆ vl ˆ vk−l. Important: keep the terms, and estimate the coefficients

  • un

j =

  • un

j ,

1 ≤ j ≤ K; i K

l=j−K un l un j−l,

K < j ≤ 2K. A discrete-time stochastic system:

un

k = un−1 k

+ δRδ

k (un−1) + δzn k ,

zn

k = Φn k + ξn k ,

Φn

k = µk + p

  • j=1

ak,jzn−j

k

+

r

  • j=0

bk,jun−j

k

+

K

  • j=1

ck,j un

j+K

un

j+K−k

+ ck,(K+1)Rδ

k (un) + q

  • j=1

dk,jξn−j

k

.

19 / 22

slide-20
SLIDE 20

Long-term statistics:

−0.4 −0.2 0.2 0.4 0.6 10

−2

10 x4 pdf data truncated system NARMAX

v

probability density function

10 20 30 40 50 −0.2 0.2 0.4 0.6 0.8 lead time ACF data truncated system NARMAX

time

auto-correlation function

20 / 22

slide-21
SLIDE 21

Short-term prediction A typical forecast:

20 40 60 80 −0.5 0.5 v4 20 40 60 80 −0.4 −0.2 0.2 0.4 time t v4 the truncated system NARMA

RMSE of many forecasts:

20 40 60 80 5 10 15 lead time RMSE

the truncated system NARMA ensemble size = 1 ensemble size = 20

time

Forecast time: the truncated system: t ≈ 15 the NARMA system: t ≈ 50

21 / 22

slide-22
SLIDE 22

Summary

Data-driven stochastic model reduction:

Discrete partial data High-dimensional Full system Prediction Discrete-time stochastic reduced system

Xn = Φ(Xn−p:n−1, ξn−q:n−1) + Ψ(Xn−p:n−1, ξn−q:n−1)ξn integrate numerical scheme into statistical inference easy parameter estimation flexible structure selection

Thanks for your attention!

22 / 22