
SLIDE 1

Variational mapping particle filter: combining particle filters with local optimal transport

Manuel Pulido 1,2 and Peter Jan van Leeuwen 1,3

1 Data Assimilation Research Centre, University of Reading, UK
2 Department of Physics, Universidad Nacional del Nordeste, Argentina
3 Department of Atmospheric and Oceanic Sciences, University of Colorado, USA

SLIDE 2

Motivation

[Figure: 2-D marginalized posterior pdf (contours), particles from the prediction pdf, and resampled particles.]

Distribution of the particles in the SIR filter for the Lorenz-63 system using 100 particles after 115 cycles.

SLIDE 3

Motivation

[Figure: marginalized densities in x, y and z (panels d, e, f); legend: KDE, proposal q, posterior p; prior and posterior pdfs for the SIR filter.]

Resulting marginalized densities. Red lines are the inferences of the posterior made by the SIR filter; the true posterior density is in black. Sample impoverishment leads to a very crude representation of the posterior density, even for a reasonable number of particles.

Can we improve the sample representation of densities beyond resampling?

SLIDE 4

Homotopic particle flows

Given two densities, p and q, we want to express them as a homotopy:

$$q_\lambda(x|y) = \frac{p(y|x)^{\lambda}\, q(x)}{C_\lambda}$$

This is a transformation from the prior to the posterior density: $\lim_{\lambda \to 1} q_\lambda(x|y) = p(x|y)$. The parameter λ varies from 0 to 1 (a pseudo-time).

Working with sample points, the deformations can be thought of as particles moving in a flow according to a set of stochastic differential equations:

[Figure: target p and proposal q (left); intermediate densities $q_\lambda$ deforming from $q_0$ to $q_1$ (right).]

$$\frac{dx_\lambda}{d\lambda} = v_\lambda(x_\lambda) + \eta_\lambda(x_\lambda)$$

The particles are tracers that are moved along the streamlines of the flow.

Daum and Huang (SPST, 2008)
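As a minimal sketch (an assumed implementation, not the authors' code), a single Euler-Maruyama step of this flow in pseudo-time could look as follows; the drift `v` and the noise amplitude `sigma` are placeholders to be supplied by a particular flow:

```python
import numpy as np

def flow_step(x, v, d_lambda, sigma=0.0, rng=None):
    """One Euler-Maruyama step of dx/dlambda = v(x) + eta(x) in pseudo-time.

    x        : (Np, n) particle positions at pseudo-time lambda
    v        : callable, v(x) -> (Np, n) velocity field (drift term)
    d_lambda : pseudo-time increment
    sigma    : noise amplitude (sigma = 0 gives a deterministic flow)
    """
    rng = np.random.default_rng() if rng is None else rng
    eta = sigma * np.sqrt(d_lambda) * rng.standard_normal(x.shape)
    return x + d_lambda * v(x) + eta
```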

SLIDE 5

Approximations: Gaussian particle flows

Most of the work on particle flows (Daum and Huang 2008, 2011, 2013; Bunch and Godsill, JASA 2016; Li and Coates, IEEE 2017) is based on (different flavors of) the Gaussian approximation. The velocity field (drift term) is given by

$$v(x^{(j)}_\lambda, \lambda) = A(\lambda)\, x^{(j)}_\lambda + b(\lambda)$$

$$A(\lambda) = -\tfrac{1}{2} P H^{\mathrm{T}} (H P H^{\mathrm{T}} + \lambda R)^{-1} H$$
$$b(\lambda) = (I + 2\lambda A)\left[(I + \lambda A) P H^{\mathrm{T}} R^{-1} y + A x_0\right]$$

Gaussian particle flows are different from approximating the proposal distribution with the result of a Kalman filter: here the particles are moved smoothly, and the covariances P, and hence the velocity field, evolve in pseudo-time.
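A direct NumPy transcription of these formulas might read as below; it is a sketch under the assumption that $x_0$ denotes the prior mean, with `P` the prior covariance, `H` the (linearized) observation operator, `R` the observation error covariance and `y` the observation:

```python
import numpy as np

def gaussian_flow_velocity(x, lam, P, H, R, y, x0):
    """Drift v(x, lambda) = A(lambda) x + b(lambda) of the Gaussian flow,
    transcribed from the slide's expressions for A(lambda) and b(lambda)."""
    n = P.shape[0]
    I = np.eye(n)
    S = H @ P @ H.T + lam * R                     # (H P H^T + lambda R)
    A = -0.5 * P @ H.T @ np.linalg.solve(S, H)    # A(lambda)
    b = (I + 2.0 * lam * A) @ ((I + lam * A) @ (P @ H.T) @ np.linalg.solve(R, y)
                               + A @ x0)          # b(lambda)
    return A @ x + b
```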

SLIDE 6

In this work we propose to combine particle flows and optimal transport

SLIDE 7

Optimal transport

Deterministic flow (no diffusion processes), η = 0. The only unknown is v(x). Can we determine v(x) from optimal transport?

Given the proposal 'mass' distribution q(x) and the target 'mass' distribution p(z), we want to transport the mass units from q to p via a mapping T : x → z that minimizes the cost of transportation, given by

$$C = \int q(x)\, c(x, T(x))\, dx = \mathbb{E}_{x \sim q}\left[c(x, T(x))\right]$$

Rigorously, T is assumed to be a diffeomorphic transport map $T : \mathbb{R}^n \to \mathbb{R}^n$ between the two probability measures (i.e. smooth and invertible). The optimal transport map T is the one that gives the minimal C. This is the classical Monge-Kantorovich problem.
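Since the cost is an expectation over q, it admits a direct Monte Carlo estimate from samples. A minimal sketch (names are ours, not from the slides):

```python
import numpy as np

def transport_cost(samples_q, T, c):
    """Monte Carlo estimate of C = E_{x~q}[ c(x, T(x)) ].

    samples_q : (Np, n) draws from the proposal q
    T         : callable, candidate transport map x -> z
    c         : callable, per-pair transportation cost c(x, z)
    """
    return float(np.mean([c(x, T(x)) for x in samples_q]))
```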

SLIDE 8

Kullback-Leibler divergence

For the sake of feasibility we take the distance measure between the random fields X and Z to be

$$c(X, Z) = \log \frac{q(z)}{p(x)}$$

This is a divergence measure (enough for our purposes). Considering the coupling z = T(x), the resulting transportation cost is

$$C(p, q_T) = \int q_T(x) \log \frac{q_T(x)}{p(x)}\, dx = \mathrm{KL}(q_T \,\|\, p)$$

This is the Kullback-Leibler divergence between the transformed proposal $q_T$ and the target p. We want a map T that minimizes this KL divergence. In information terms, we are seeking to maximize the amount of information about p that is present in q, i.e. to minimize the relative entropy between q and p.
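As a concrete reference point (a standard closed form, not taken from the slides): for two Gaussians $q = \mathcal{N}(\mu_0, \Sigma_0)$ and $p = \mathcal{N}(\mu_1, \Sigma_1)$ in $\mathbb{R}^n$, the divergence being minimized is

$$\mathrm{KL}(q \,\|\, p) = \tfrac{1}{2}\left[\operatorname{tr}\!\left(\Sigma_1^{-1}\Sigma_0\right) + (\mu_1 - \mu_0)^{\mathrm{T}} \Sigma_1^{-1} (\mu_1 - \mu_0) - n + \ln\frac{\det \Sigma_1}{\det \Sigma_0}\right],$$

which vanishes exactly when the two densities coincide, the target of the mapping sequence.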

SLIDE 9

Gradient flows

T is a sequence of mappings $T_i$: we want a T that is a composition $T_1 \circ (T_2 \circ (T_3 \circ \cdots))$ that minimizes the difference between p and $q_T$ in the sense of the Kullback-Leibler divergence. Each transformation is a local optimal transport problem (Villani, 2008).

Suppose we produce a smooth transformation along an arbitrary direction v(x),

$$x_{\lambda + \epsilon} = T_\lambda(x_\lambda) = x_\lambda + \epsilon\, v_\lambda(x_\lambda)$$

assuming ε small, so that T(x) is a small perturbation of the identity transformation.

The optimal local map is the one that gives the steepest-descent direction of the transport cost function. Under mild smoothness conditions, these local maps achieve the optimal transport cost (Angenent et al., SIAM 2003).

We aim to find the direction v at x that gives the steepest descent of the KL divergence.

SLIDE 10

Embedding the map in a reproducing kernel Hilbert space

Consider as the space of mapping functions the unit ball of an RKHS, $\|v\|_{\mathcal{F}} \le 1$ (Liu and Wang, 2016). Any function in the RKHS can be expressed as an inner product with the kernel that defines the RKHS:

$$v(x) = \langle K(\cdot, x), v(\cdot) \rangle_{\mathcal{F}}$$

The directional derivative of the KL divergence results in

$$D_v \mathrm{KL} = -\left\langle \int q(x') \left[ K(x', \cdot)\, \nabla_{x'} \log p(x'|y) + \nabla_{x'} K(x', \cdot) \right] dx',\; v(\cdot) \right\rangle_{\mathcal{F}}$$

Since the gradient field is defined by $D_v f(x) = \langle \nabla f(x), v \rangle$ for $\|v\| = 1$, we have found the gradient of the KL divergence:

$$\nabla \mathrm{KL}(x) = -\mathbb{E}_{x' \sim q}\left[ K(x', x)\, \nabla_{x'} \log p(x'|y) + \nabla_{x'} K(x', x) \right].$$

SLIDE 11

Monte Carlo representation of the KL gradient

The density q is represented through a set of samples $x_i^{(1:N_p)}$, where i is the mapping iteration. Using Monte Carlo integration for the expectation over q in the KL gradient:

$$\nabla \mathrm{KL}(x) \approx -\frac{1}{N_p} \sum_{l=1}^{N_p} \left[ K(x_i^{(l)}, x)\, \nabla \log p(x_i^{(l)}) + \nabla K(x_i^{(l)}, x) \right].$$

The steepest-descent direction of the transport cost (KL) is $v(x) = -\nabla \mathrm{KL}(x)$.

Density transformation: $q_i(x_i^{(j)}) = q_{i-1}(x_{i-1}^{(j)})\, |J(x_{i-1}^{(j)})|$

The local mappings are defined as:

$$x_{i+1}^{(j)} = T(x_i^{(j)}) = x_i^{(j)} - \epsilon\, \nabla \mathrm{KL}(x_i^{(j)})$$

The particles are pushed forward in the gradient flow to minimize the KL divergence. The flow itself depends on the interacting particles at that mapping iteration.
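A NumPy sketch of this estimator and of the resulting local map (function names and array conventions are ours, not from the slides):

```python
import numpy as np

def kl_gradient(particles, grad_log_p, kernel, grad_kernel):
    """Monte Carlo estimate of grad KL at every particle position x^(j):

    grad KL(x) ~= -(1/Np) sum_l [ K(x_l, x) grad log p(x_l) + grad_{x_l} K(x_l, x) ]

    particles   : (Np, n) current sample positions
    grad_log_p  : callable, (Np, n) -> (Np, n), grad log p at each sample
    kernel      : callable, (X, Y) -> (Nx, Ny) Gram matrix K(x_l, x_j)
    grad_kernel : callable, (X, Y) -> (Nx, Ny, n), grad of K in its 1st argument
    """
    n_p = particles.shape[0]
    Kmat = kernel(particles, particles)         # K[l, j] = K(x^(l), x^(j))
    glp = grad_log_p(particles)                 # (Np, n)
    gK = grad_kernel(particles, particles)      # (Np, Np, n)
    # sum over l of K(x_l, x_j) grad log p(x_l) + grad_{x_l} K(x_l, x_j)
    drift = Kmat.T @ glp + gK.sum(axis=0)
    return -drift / n_p

def mapping_step(particles, grad_log_p, kernel, grad_kernel, eps):
    """One local map: x_{i+1}^(j) = x_i^(j) - eps * grad KL(x_i^(j))."""
    return particles - eps * kl_gradient(particles, grad_log_p,
                                         kernel, grad_kernel)
```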

SLIDE 12

Variational mapping particle filter

The VMPF conducts an optimization: it determines the optimal positions of the sample points. The sample points are the variational parameters.

Input: $\{x_{k-1}^{(j)}\}_{j=1}^{N}$, $y_k$, $M(\cdot)$, $p(y|x)$ and $p(\eta)$   ⊲ Parameters ε, $N_t$

$x_{k,0}^{(1:N)} \leftarrow M(x_{k-1}^{(1:N)}) + \eta_k$   ⊲ Forecast stage

Optimization module: given the initial $x_{k,0}^{(1:N)}$, we seek a new set of positions that minimizes the KL divergence.

while $i < I$ and convergence criterion do   ⊲ $N_{\mathrm{eff}} < N_t$ or $|\nabla \mathrm{KL}_i| / |\nabla \mathrm{KL}_{i-1}| < \delta$
  for $j = 1, \ldots, N$ do
    $x_{k,i}^{(j)} \leftarrow x_{k,i-1}^{(j)} - \epsilon\, \nabla \mathrm{KL}(x_{k,i-1}^{(j)})$   ⊲ ∇KL requires ∇ log p, K, ∇K
  end for
end while

Output: $\{x_{k,i}^{(j)}\}_{j=1}^{N}$
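A sketch of one assimilation cycle of this pseudocode, reusing `kl_gradient` from the earlier sketch; all names are illustrative, and only the gradient-ratio stopping rule is implemented here:

```python
import numpy as np

def vmpf_cycle(particles, model, sample_noise, grad_log_post,
               kernel, grad_kernel, eps=0.1, max_iter=100, delta=1e-2):
    """One VMPF cycle: forecast stage, then KL-minimizing mapping iterations.

    model         : the forward model M
    sample_noise  : draws the model error eta_k, given an array shape
    grad_log_post : grad log p(x | y_k), with the current observation baked in
    """
    # Forecast stage: x_{k,0} = M(x_{k-1}) + eta_k
    particles = model(particles) + sample_noise(particles.shape)

    # Optimization module: move the samples to minimize the KL divergence
    prev_norm = None
    for _ in range(max_iter):
        grad = kl_gradient(particles, grad_log_post, kernel, grad_kernel)
        norm = np.linalg.norm(grad)
        # convergence criterion from the slide: |grad KL_i|/|grad KL_{i-1}| < delta
        if prev_norm is not None and norm / prev_norm < delta:
            break
        particles = particles - eps * grad
        prev_norm = norm
    return particles
```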

SLIDE 13

Experiment with Lorenz-63

[Figure: distribution of particles of the prediction density, and of the posterior density after the VMPF, for the Lorenz-63 system.]

Model error: N(0, Q). Observational error: N(0, R).

Radial basis functions as kernel:

$$K(x, x') = \exp\left(-\tfrac{1}{2} \|x - x'\|_A^2\right)$$

a Mahalanobis distance. Kernel covariance: $A = \gamma^{1/2} Q$, with γ the bandwidth.
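This kernel and its gradient can be written in closed form; a sketch in the shapes expected by the `kl_gradient` code above (the factory-function structure is ours):

```python
import numpy as np

def make_rbf_kernel(A):
    """RBF kernel with Mahalanobis distance, K(x, x') = exp(-0.5 ||x - x'||_A^2).

    A is the kernel covariance (on this slide, proportional to the model
    error covariance Q through the bandwidth gamma). Returns the kernel
    and its gradient with respect to the first argument.
    """
    A_inv = np.linalg.inv(A)

    def kernel(X, Y):
        diff = X[:, None, :] - Y[None, :, :]                  # (Nx, Ny, n)
        sq = np.einsum('ijk,kl,ijl->ij', diff, A_inv, diff)   # Mahalanobis^2
        return np.exp(-0.5 * sq)

    def grad_kernel(X, Y):
        diff = X[:, None, :] - Y[None, :, :]                  # (Nx, Ny, n)
        K = kernel(X, Y)
        # grad_x K(x, x') = -A^{-1} (x - x') K(x, x')
        return -np.einsum('kl,ijl->ijk', A_inv, diff) * K[:, :, None]

    return kernel, grad_kernel
```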
SLIDE 14

Marginal distributions

[Figure: marginalized densities in x, y and z. Top row (a, b, c): MPF experiment, showing the last iteration, the initial q and the posterior p; the convergence is excellent. Bottom row (d, e, f): SIR experiment, showing the KDE of the proposal q and the posterior p.]

SLIDE 15

Mean square error

[Figure: RMSE as a function of assimilation cycle for (a) 100 particles, (b) 20 particles and (c) 5 particles; MPF vs SIR.]

The time-averaged RMSE exhibits robust behavior in the MPF: 0.482 for 100 particles and 0.489 for 5 particles!

SLIDE 16

Nonlinear observational operator

[Figure: scalar example with $H(x) = |x|$, $p(x) = \mathcal{N}(0.25, 1)$, $y = 2.0$, $R = 0.5$: particle positions in pseudo-time (left) and the prior, posterior and MPF densities (right).]

[Figure: Lorenz-63 with $H(x) = |x|$. Panels (a), (b), (c): marginalized densities in x, y and z (MPF, initial q, posterior); 2-D marginals of the prior and posterior.]

SLIDE 17

Lorenz-96. MPF comparison with stochastic EnKF

[Figure: RMSE and spread as functions of time for MPF (20 particles) and EnKF (20 and 100 members). Lorenz-96 with 40 variables, the full state observed, 20-100 particles/members.]

[Figure: (a) RMSE and (b) spread as functions of time for MPF and EnKF. Lorenz-96 with 40 variables, 13 sparse observations, 20 particles/members.]

SLIDE 18

High-dimensional test: Lorenz-96 with 1000 variables

[Figure: time series of $x_1$ for the MPF with the full state observed (50 particles) and with half the state observed (20 particles), against the truth and the observations.]

Time series of $x_1$, partially observed state.

An evaluation of the MPF in geophysical models, i.e. quasi-geostrophic and shallow-water models, is underway.

SLIDE 19

Conclusions

The MPF is a well-grounded bridge between Monte Carlo sampling and optimization.

Pros:
- In the limit of a single particle it gives the MAP estimate: for $N_p = 1$, $\nabla \mathrm{KL} = -\nabla \log p$ (the kernel-gradient term vanishes at the particle's own position and $K(x,x) = 1$), so the mapping iterations perform gradient ascent on the log posterior.
- As a result, it exhibits robust RMSE behavior for small numbers of particles.
- By construction, the MPF maximizes the amount of information about p that is present in q.

Cons:
- Complexity $O(N_p^2)$: more expensive than SIR (but without extra evaluations of M).
- It requires a good representation of the model error covariance Q; estimation methods are required!
- The kernel covariance (or its bandwidth) is also a free parameter.

Pulido and van Leeuwen, 2018. Available at arXiv:1805.11380.