 
              Variational mapping particle filter combining particle filters with local optimal transport Manuel Pulido 1 , 2 and Peter Jan vanLeeuwen 1 , 3 1 Data Assimilation Research Centre, University of Reading, UK 2 Department of Physics, Universidad Nacional del Nordeste, Argentina 3 Department of Atmospheric and Oceanic Sciences, University of Colorado, USA
Motivation 2-d marginalized posterior pdf −3 20 20 (contours) −4 19 19 −5 18 18 Particles from −6 17 17 y y z prediction pdf −7 16 16 −8 15 15 −9 14 14 −9 −8 −7 −6 −5 −4 −3 −9 −8 −7 −6 −5 −4 −3 −9 −8 −7 −6 −5 −4 −3 x x y −3 20 20 Resampled particles −4 19 19 −5 18 18 −6 17 17 y z z −7 16 16 −8 15 15 −9 14 14 −9 −8 −7 −6 −5 −4 −3 −9 −8 −7 −6 −5 −4 −3 −9 −8 −7 −6 −5 −4 −3 x x y Distribution of the particles in the SIR filter for the Lorenz-63 system using 100 particles after 115 cycles.
Motivation Posterior pdf (d) (e) (f) � ✠ � KDE 0.150 0.150 prop q 0.4 Prior pdf post p 0.125 SIR 0.125 ❈ ✄ 0.100 0.3 0.100 ❈ ✄ ❲ ❈ ✄ ✎ p 0.075 0.075 0.2 0.050 0.050 0.1 0.025 0.025 0.000 0.000 0.0 −8 −6 −4 −8 −6 −4 14 16 18 20 x y z Resulting marginalized densities. Red lines are the inferences made of the posterior by the SIR filter. True posterior density is in black lines. Sample impoverishement leads to a very crude representation of the posterior density even for a reasonable number of particles. Can we improve the sample representation of densities beyond resampling?
Homotopic particle flows 0.4 p target Given two densities, p and q , we want to express q proposal 0.3 them as a homotopy, 0.2 q λ ( x | y ) = p ( y | x ) λ q ( x ) 0.1 C λ 0.0 Transformation from the prior to the posterior den- −5 0 5 10 x sity. lim λ → 1 q λ ( x | y ) = p ( x | y ) 0.4 q λ q 1 The parameter λ varies from 0 to 1 (a pseudo-time) 0.3 q 0 Working with sample points, the deformations could 0.2 be thought as particles moving in a flow according 0.1 to a set of stochastic differential equations: 0.0 d x λ −5 0 5 10 d λ = v λ ( x λ ) + η λ ( x λ ) x The particles are tracers which are moved along the streamlines of the flow. Daum and Huang (SPST, 2008)
Approximations: Gaussian particle flows Most of the works (Daum and Huang 08,11,13; Bunch and Gosill JASA 2016, Li and Coates IEEE 2017) on particle flows are based on (different flavors of) the Gaussian approximation. The velocity field (drift term) is given by v ( x ( j ) λ , λ ) = A ( λ ) x ( j ) λ + b ( λ ) A ( λ ) = − 1 2 PH T ( HPH T + λ R ) − 1 H b ( λ ) = ( I + 2 λ A )[( I + λ A ) PH T R − 1 y + Ax 0 ] Gaussian particle flows are different from approximating the proposal distribution with the result of a Kalman filter. Here, the particles are moved smoothly and the covariances, P , are evolving (in pseudo-time) and so the velocity field.
In this work we propose to combine particle flows and optimal transport
Optimal transport Deterministic flow (no diffussion processes), η = 0. Only unknown v ( x ) . Can we determine v ( x ) from optimal transport? Given the proposal ‘mass’ distribution q ( x ) and the target ‘mass’ distribution p ( z ) , we want to transport the mass units from q to p via a mapping T : x → z that minimizes the cost of transportation given by � C = q ( x ) c ( x , T ( x )) d x = E x ∼ q [ c ( x , T ( x ))] Rigorously, T is assumed a diffeomorphic transport map T : R n → R n between the two probability measures (i.e. smooth and invertible). The optimal transport map T is the one that gives the minimal C . This is the classical Monge-Kantorovich problem.
Kullback-Leibler divergence For the sake of feasibility we take the distance measure between the random fields X and Z to be given by � q ( z ) � c ( X , Z ) = log p ( x ) This is a divergence measure (enough for our purposes). Considering the coupling z = T ( x ) , the resulting transportation cost is � � q ( T ( x )) � d x = KL ( p � q T ) C ( p , q T ) = q ( T ( x )) log p ( x ) This is the Kullback-Leibler divergence between p and q ( T ) . We want a map T that minimizes KL . In information terms, we are seeking to maximize the amount of information that is present in q from p , or to minimize the uncertainty, i.e. the relative entropy between q and p .
Gradient flows • T is a sequence of mappings T i . We want a T that is a composition of T 1 ◦ ( T 2 ◦ ( T 3 · · · ))) that minimizes the differences between p and q T in the sense of the Kullback-Leibler divergence. Each transformation is a local optimal transport problem (Villani, 2008). • Suppose we produce a smooth transformation along an arbitrary direction v ( x ) , x λ + ǫ = T λ ( x λ ) = x λ + ǫ v λ ( x λ ) assuming ǫ small. T ( x ) is a small perturbation to the identity transformation. • The optimal local map is the one that gives the steepest descent direction of the transport cost function. Under mild smoothness conditions, these local maps give the optimal transport cost (Angenet, et al SIAM 2003). We aim to find the direction v at x that gives the steepest descent of KL.
Embedding the map in a reproducing kernel Hilbert space Consider as space of mapping functions the unit ball of an RKHS such that � v � F ≤ 1 (Liu and Wang, 2016). Any function from the RKHS can be expressed as the dot product by the kernel that defines the RKHS, v ( x ) = � K ( · , x ) , v ( · ) � F The directional derivative of the KL results in � � � q ( x ′ ) [ K ( x ′ , · ) ∇ x log p ( x ′ | y ) + ∇ x K ( x ′ , · )] d x ′ , v ( · ) D v KL = − . F ∆ The definition of the gradient field is D v f ( x ) = �∇ f ( x ) , v � for | v | = 1, therefore, we have found the gradient of KL : ∇ KL ( x ) = −E x ′ ∼ q [ K ( x ′ , x ) ∇ x log p ( x ′ | y ) + ∇ x K ( x ′ , x )] .
Monte Carlo representation of the KL gradient The density q is represented through a set of samples x ( 1 : N p ) where i is the i mapping iteration. Using MC integration for the q expectation in the KL gradient N p ∇ KL ( x ) ≈ − 1 � � K ( x ( l ) i , x ) ∇ x log p ( x ( l ) i ) + ∇ x K ( x ( l ) � i , x ) . N p l = 1 The steepest descent direction of the transport cost (KL) is v ( x ) = −∇ KL ( x ) . Density transformation: q i ( x ( j ) i ) = q i − 1 ( x ( j ) i − 1 ) | J ( x ( j ) i − 1 ) | The local mappings are defined as: x ( j ) i + 1 = T ( x ( j ) i ) = x ( j ) − ǫ ∇ KL ( x ( j ) i ) i The particles are pushed forward in the gradient flow to minimize KL. The flow itself depends on the interacting particles at that mapping iteration.
Variational mapping particle filter VMPF conducts an optimization It determines the optimal positions of the sample points. The sample points are the variational parameters. Input: Given { x ( j ) k − 1 } N j = 1 , y k , M ( · ) , p ( y | x ) and p ( η ) ⊲ Parameters ǫ , N t x ( 1 : N ) ← M ( x ( 1 : N ) k − 1 ) + η k ⊲ Forecast stage k , 0 Optimization module: Given initial x ( 1 : N ) k , 0 , we seek a new set that minimizes KL. while i < I and convergence criterion do ⊲ N eff < N t or |∇ KL i | / |∇ KL i − 1 | < δ for j = 1 , N do x ( j ) k , i ← x ( j ) k , i − 1 − ǫ ∇ KL ( x ( j ) k , i − 1 ) ⊲ ∇ KL requires ∇ log p , K , ∇ K end for end while Output: { x ( j ) k , i } N j = 1
Experiment with Lorenz-63 −3 20 20 −4 19 19 −5 18 18 Distribution of particles −6 17 17 y y z of the prediction density −7 16 16 −8 15 15 −9 14 14 −9 −8 −7 −6 −5 −4 −3 −9 −8 −7 −6 −5 −4 −3 −9 −8 −7 −6 −5 −4 −3 x x y −3 20 20 −4 19 19 −5 18 18 Distribution of particles −6 17 17 y z z of the posterior density −7 16 16 −8 15 15 after the VMPF −9 14 14 −9 −8 −7 −6 −5 −4 −3 −9 −8 −7 −6 −5 −4 −3 −9 −8 −7 −6 −5 −4 −3 x x y Model error N ( 0 , Q ) Observational error N ( 0 , R ) − 1 / 2 � x − x ′ � 2 K ( x , x ′ ) = exp � � Radial basis functions as kernel: A Mahalanobis distance. Kernel covariance: A = γ 1 / 2 Q , γ bandwidth
Marginal distributions Excelent convergence (a) (b) (c) MPF experiment ❅ 0.175 0.175 0.175 last iter ❘ ❅ ini q 0.150 0.150 0.150 post p 0.125 0.125 0.125 0.100 0.100 0.100 p 0.075 0.075 0.075 0.050 0.050 0.050 0.025 0.025 0.025 0.000 0.000 0.000 −8 −6 −4 −8 −6 −4 14 16 18 20 x y z (d) (e) (f) SIR experiment KDE 0.150 0.150 prop q 0.4 post p 0.125 0.125 0.100 0.3 0.100 p 0.075 0.075 0.2 0.050 0.050 0.1 0.025 0.025 0.000 0.000 0.0 −8 −6 −4 −8 −6 −4 14 16 18 20 x y z
Mean square error 100 particles 20 particles 5 particles (a) (b) (c) 2.0 1.4 MPF 3.0 SIR 1.2 ✲ SIR 2.5 1.5 1.0 2.0 0.8 RMSE 1.0 1.5 0.6 1.0 0.4 0.5 ✲ MPF 0.5 0.2 0.0 0.0 0.0 0 50 100 150 200 0 50 100 150 200 0 50 100 150 200 Time Time Time RMSE as a function of assimilation cycle. The time-averaged RMSE exhibits a robust behavior in the MPF: For 100 particles is 0.482, for 5 particles is 0.489!.
Recommend
More recommend