SLIDE 1

Nonlinear filtering with local couplings

Alessio Spantini, Ricardo Baptista, Youssef Marzouk

Massachusetts Institute of Technology, Department of Aeronautics & Astronautics

ISDA 2019, January 22nd, 2019

SLIDE 2

Assimilation step in ensemble filtering algorithms

At any assimilation time k, we have a Bayesian inverse problem:

[Figure: samples xi from the prior πX and the posterior πX|Y=y∗]

◮ πX is the forecast distribution on Rn
◮ πY|X is the likelihood of the observations Y ∈ Rd
◮ πX|Y=y∗ is the filtering distribution for a realization y∗ of the data

Goal: sample the posterior given only M prior samples x1, . . . , xM
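For reference, the filtering density is given by Bayes' rule, combining the forecast distribution and the likelihood:

```latex
\pi_{X \mid Y = y^*}(x) \;\propto\; \pi_{Y \mid X}(y^* \mid x)\, \pi_X(x)
```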

SLIDE 3

Inference as a transportation of measures

◮ Seek a map T that pushes forward prior to posterior [Moselhy, 2012]

(x1, . . . , xM) ∼ πX =⇒ (T(x1), . . . , T(xM)) ∼ πX|Y=y∗

◮ The map induces a coupling between prior and posterior measures

[Figure: a transport map T sends prior samples xi from πX to posterior samples T(xi) from πX|Y=y∗]

How to construct a “good” coupling from samples?

SLIDE 4

Joint space =⇒ convex problem

[Figure: the map T(y, x) sends the joint πY,X to the posterior πX|Y=y∗]

◮ Construct a coupling T between the joint πY,X and the posterior
◮ T can be computed via convex optimization given samples from πY,X
◮ Sample πY,X using the forecast ensemble and the likelihood

(yi, xi), with yi ∼ πY|X=xi

◮ Intuition: generalization of perturbed observation EnKF
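This sampling step is straightforward to implement. Below is a minimal sketch, assuming a linear-Gaussian observation model y = Hx + ε with ε ∼ N(0, R); the function name and interface are illustrative, not from the talk.

```python
import numpy as np

def sample_joint(X, H, R, rng):
    """Draw joint samples (y^i, x^i) from pi_{Y,X} with y^i ~ pi_{Y|X=x^i}.

    X: (M, n) forecast ensemble; H: (d, n) observation operator;
    R: (d, d) observation-noise covariance; rng: np.random.Generator.
    """
    M, d = X.shape[0], H.shape[0]
    eps = rng.multivariate_normal(np.zeros(d), R, size=M)  # perturbed observations
    Y = X @ H.T + eps
    return Y, X
```

Note that the yi are exactly the perturbed observations of the stochastic EnKF, which is what the intuition above refers to.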

SLIDE 5

Couple the joint with a standard normal

[Figure: the map T : Rd+n → Rn sends the joint πY,X to the posterior πX|Y=y∗]

The problem of finding T can be solved by computing a Knothe-Rosenblatt rearrangement S between πY,X and N(0, Id+n)

[Figure: the map S : Rd+n → Rd+n sends the joint πY,X to N(0, Id+n)]

◮ In a few slides, we will show how to derive T from S

SLIDE 6

Knothe-Rosenblatt (KR) rearrangement

◮ Definition: for any pair of densities π, η on Rn, there exists a unique triangular and monotone map S : Rn → Rn such that

S♯ π = η

◮ Triangular function (nonlinear generalization of a triangular matrix)

S(x1, . . . , xn) = [ S1(x1), S2(x1, x2), . . . , Sn(x1, x2, . . . , xn) ]ᵀ

◮ Existence stems from general factorization properties of a density, e.g.,

π = πX1 πX2|X1 · · · πXn|X1,...,Xn−1
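A worked special case (standard, though not shown on the slide): if π = N(0, Σ) and η = N(0, In), the KR rearrangement is linear and lower triangular, given by the inverse of the Cholesky factor of Σ:

```latex
S(x) = L^{-1} x, \qquad \Sigma = L L^\top \quad \text{(Cholesky factorization, $L$ lower triangular)}.
```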

SLIDE 7

Triangular maps enable conditional simulation

S(x1, . . . , xn) = [ S1(x1), S2(x1, x2), . . . , Sn(x1, x2, . . . , xn) ]ᵀ

◮ Each component Sk links marginal conditionals of π and η
◮ For instance, if η = N(0, I), then for all (x1, . . . , xk−1) ∈ Rk−1,

ξ → Sk(x1, . . . , xk−1, ξ) pushes πXk|X1:k−1(ξ|x1:k−1) to N(0, 1)

◮ Simulate the conditional πXk|X1:k−1 by inverting the 1-D map ξ → Sk(x1:k−1, ξ) at Gaussian samples (this needs the triangular structure)
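A minimal sketch of this inversion, assuming only that S_k is a callable that is monotone in its last argument (the interface is hypothetical):

```python
import numpy as np
from scipy.optimize import brentq

def conditional_sample(S_k, x_prev, z, lo=-50.0, hi=50.0):
    """Sample pi_{Xk | X1:k-1 = x_prev} by inverting the monotone 1-D map
    xi -> S_k(x_prev, xi) at a standard-normal draw z ~ N(0, 1)."""
    # Monotonicity in xi guarantees a unique root; (lo, hi) must bracket it.
    return brentq(lambda xi: S_k(x_prev, xi) - z, lo, hi)

# Usage: z = np.random.default_rng(0).standard_normal()
#        x_k = conditional_sample(S_k, x_prev, z)
```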

SLIDE 8

Back to filtering: the analysis map

◮ We are interested in the KR map S that pushes πY,X to N(0, Id+n)
◮ The rearrangement has a typical block structure

S(y, x) = [ SY(y), SX(y, x) ]ᵀ,

which suggests two properties:

◮ SX pushes πY,X to N(0, In)
◮ ξ → SX(y∗, ξ) pushes πX|Y=y∗ to N(0, In)

◮ The analysis map that pushes πY,X to πX|Y=y∗ is then given by

T(y, x) = SX(y∗, ·)−1 ◦ SX(y, x)
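A minimal sketch of this composition, assuming SX is given component-wise as callables SX[k](y, x1:k−1, xk), each monotone in xk (names and interface are illustrative):

```python
import numpy as np
from scipy.optimize import brentq

def analysis_map(SX, y_star, y, x, lo=-50.0, hi=50.0):
    """Apply T(y, x) = SX(y*, .)^{-1} o SX(y, x) for triangular SX."""
    z = [S_k(y, x[:k], x[k]) for k, S_k in enumerate(SX)]  # forward pass SX(y, x)
    xa = []
    for k, S_k in enumerate(SX):
        # Invert the k-th 1-D map at z[k], conditioning on the analysis
        # components computed so far (this is where triangularity is used).
        xa.append(brentq(lambda xi: S_k(y_star, np.array(xa[:k]), xi) - z[k], lo, hi))
    return np.array(xa)
```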

SLIDE 9

The stochastic map algorithm

[Figure: joint samples (xi, yi), with yi ∼ πY|X=xi, are pushed through T(y, x) from the joint πY,X to the posterior πX|Y=y∗]

Transport map filter

1. Compute forecast ensemble x1, . . . , xM
2. Generate samples (yi, xi) from πY,X with yi ∼ πY|X=xi
3. Build an estimator T̂ of T
4. Compute analysis ensemble as xai = T̂(yi, xi) for i = 1, . . . , M
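Putting the steps together, one assimilation cycle might look like the following sketch (reusing sample_joint and analysis_map from the earlier sketches; fit_map is a hypothetical routine that returns the estimated components of SX, as discussed on the next slides):

```python
def assimilate(X, y_star, H, R, fit_map, rng):
    """One stochastic-map assimilation step for a forecast ensemble X of shape (M, n)."""
    Y, X = sample_joint(X, H, R, rng)   # step 2: joint samples (y^i, x^i)
    SX = fit_map(Y, X)                  # step 3: estimator of the map
    # Step 4: push every joint sample through the estimated analysis map.
    return np.array([analysis_map(SX, y_star, Y[i], X[i]) for i in range(len(X))])
```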

SLIDE 10

How to build an estimator for the analysis map

◮ Recall the form of the analysis map:

T(y, x) = SX(y∗, ·)−1 ◦ SX(y, x), where S(y, x) = [ SY(y), SX(y, x) ]ᵀ and S♯ πY,X = N(0, Id+n)

◮ We propose the following estimator T̂ of T:

T̂(y, x) = ŜX(y∗, ·)−1 ◦ ŜX(y, x), where Ŝ is a maximum likelihood estimator of S

◮ The MLE of S can be computed via convex optimization [Parno, 2014]

SLIDE 11

Estimation of the KR rearrangement from samples

Given M samples x1, . . . , xM from a density π on Rn, we want to estimate the KR rearrangement S that pushes forward π to N(0, In)

◮ Constrained MLE for S:

Ŝ ∈ arg min_{S ∈ H}  −(1/M) ∑_{i=1}^M log (S⁻¹♯ η)(xi),  η = N(0, In),

where S⁻¹♯ η denotes the pullback of η through S, and H is an approximation space for the rearrangement

◮ Each component Ŝk of Ŝ can be computed in parallel via differentiable convex optimization:

Ŝk ∈ arg min_{Sk ∈ Hk}  (1/M) ∑_{i=1}^M [ ½ Sk(xi)² − log ∂kSk(xi) ]
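As a concrete illustration of this objective, a hedged sketch with a linear ansatz Sk(x1:k) = c + aᵀx1:k−1 + b·xk; the exp-reparameterization of b enforces monotonicity ∂kSk = b > 0 (names are illustrative, and a practical implementation would use richer features):

```python
import numpy as np
from scipy.optimize import minimize

def fit_component_linear(X, k):
    """Fit S_k(x_1:k) = c + a^T x_1:k-1 + b x_k, with b = exp(logb) > 0, by
    minimizing (1/M) sum_i [ 0.5 S_k(x^i)^2 - log(dS_k/dx_k) ]."""
    Xprev, xk = X[:, :k], X[:, k]

    def objective(theta):
        c, a, logb = theta[0], theta[1:k + 1], theta[-1]
        Sk = c + Xprev @ a + np.exp(logb) * xk
        return np.mean(0.5 * Sk**2) - logb  # -log d_k S_k = -logb, same for all i

    return minimize(objective, np.zeros(k + 2)).x
```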

SLIDE 12

Parameterization of the map

Ŝk ∈ arg min_{Sk ∈ Hk}  (1/M) ∑_{i=1}^M [ ½ Sk(xi)² − log ∂kSk(xi) ]

◮ Optimization is not needed for nonlinear separable parameterizations of the form Sk(x1:k) = xk + g(x1:k−1): the MLE reduces to linear regression (see the sketch after this list)

◮ In general, need convex optimization (e.g., Newton’s method)
◮ Connection to EnKF: a linear parameterization of Ŝk yields a particular form of EnKF with “perturbed observations”

◮ Choice of approximation space allows one to control the bias–variance tradeoff of Ŝ
◮ Richer parameterizations yield less bias, but potentially higher variance
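The separable case mentioned above reduces to ordinary least squares; a minimal sketch with an affine g (illustrative, not the talk's parameterization):

```python
import numpy as np

def fit_component_separable(X, k):
    """Fit S_k(x_1:k) = x_k + g(x_1:k-1) with affine g: minimizing
    mean((x_k + g)^2) is the linear regression g ~ -x_k (no iterations)."""
    A = np.hstack([X[:, :k], np.ones((X.shape[0], 1))])  # affine features for g
    coef, *_ = np.linalg.lstsq(A, -X[:, k], rcond=None)
    return coef
```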

Strategy in high dimensions: depart gradually from the linear ansatz by introducing local nonlinearities + regularization

SLIDE 13

The map is easy to “localize” in high dimensions

◮ Regularize the estimator Ŝ of S by imposing sparsity, e.g.,

Ŝ(x1, . . . , x4) = [ Ŝ1(x1), Ŝ2(x1, x2), Ŝ3(x1, x2, x3), Ŝ4(x1, x2, x3, x4) ]ᵀ

◮ The sparsity of the kth component of Ŝ depends on the sparsity of the marginal conditional function πXk|X1:k−1(xk|x1:k−1)

◮ Quick heuristic: let each Ŝk depend on variables (xj)j<k that are within a distance ℓ from xk in state space; estimate the optimal ℓ offline (see the sketch after this list)

◮ An explicit link between the sparsity of a nonlinear S and conditional independence in non-Gaussian graphical models can be found in [Inference via low-dimensional couplings, Spantini et al., 2018]
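The distance-based heuristic is simple to implement; a minimal sketch on a cyclic one-dimensional state lattice (as in Lorenz-96 on the next slide; names are illustrative):

```python
def local_neighbors(k, n, ell):
    """Indices j < k whose cyclic lattice distance to k is at most ell;
    S_k is then restricted to depend only on these x_j (plus x_k itself)."""
    dist = lambda j: min(abs(j - k), n - abs(j - k))  # cyclic distance on n sites
    return [j for j in range(k) if dist(j) <= ell]
```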

SLIDE 14

Lorenz 96 in chaotic regime (40-dimensional state)

◮ A hard test-case configuration [Bengtsson et al., 2003]:

dZj/dt = (Zj+1 − Zj−2) Zj−1 − Zj + F,  j = 1, . . . , 40

Yj = Zj + Ej,  j = 1, 3, 5, . . . , 39

◮ F = 8 (chaotic) and Ej ∼ N(0, 0.5) (small noise for PF)
◮ Time between observations: ∆obs = 0.4 (large)
◮ Results computed over 2000 assimilation cycles
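For reference, a minimal sketch of the Lorenz-96 right-hand side defined above (vectorized with periodic indexing; suitable for, e.g., scipy.integrate.solve_ivp):

```python
import numpy as np

def lorenz96(t, z, F=8.0):
    """dZ_j/dt = (Z_{j+1} - Z_{j-2}) Z_{j-1} - Z_j + F, with periodic indices."""
    return (np.roll(z, -1) - np.roll(z, 2)) * np.roll(z, 1) - z + F
```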

              #particles: 400       #particles: 200
              EnKF     MapF         EnKF     MapF
RMSE median   0.73     0.58         0.73     0.66
RMSE mean     0.78     0.61         0.80     0.71
RMSE std      0.24     0.18         0.31     0.26
spread        0.78     0.69         0.82     0.75

◮ The nonlinear filter is ∼ 25% more accurate in RMSE than EnKF

SLIDE 15

Lorenz 96: details on the filtering approximation

◮ Observations were assimilated one at a time
◮ Impose sparsity of the map with a 5-way interaction model (figure)
◮ Fully separable and nonlinear parameterization of each component:

Sk(xj1, . . . , xjp, xk) = ψ(xj1) + . . . + ψ(xjp) + ψ(xk),

where ψ(x) = a0 + a1·x + ∑_{i>1} ai exp(−(x − ci)²/σ)

◮ Much more general parameterizations are of course possible!
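A minimal sketch of the univariate basis ψ above (the centers ci and width σ are assumed to be fixed offline, e.g. from sample quantiles; this is an illustration, not the talk's code):

```python
import numpy as np

def psi(x, a, centers, sigma):
    """psi(x) = a0 + a1*x + sum_{i>1} a_i exp(-(x - c_i)^2 / sigma)."""
    return a[0] + a[1] * x + np.sum(a[2:] * np.exp(-(x - centers) ** 2 / sigma))
```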

SLIDE 16

A tradeoff between nonlinearities and ensemble size

[Figure: average RMSE vs. number of particles (100, 200, 400, 600) for EnKF and for transport maps of order 1, 2, and 3; average RMSE ranges from about 0.6 to 0.9]

SLIDE 17

Discussion

Nonlinear filtering with local couplings:

◮ Nonlinear generalization of the EnKF: move the ensemble members via local nonlinear transport maps; no weights, no degeneracy
◮ Learn non-Gaussian features via convex optimization
◮ Easy to localize in high dimensions (sparse updates)

Extensions and open issues:

◮ There exists a square-root version of the filter
◮ Generalization to nonlinear smoothing
◮ How much nonlinearity in the update vs. ensemble size & structure?
◮ How to regularize the estimation? (e.g., LASSO)
◮ Accuracy and stability: is it possible to establish consistency by adapting the complexity of the maps to the forecast ensembles?