

  1. Nonlinear filtering with local couplings
     Alessio Spantini, Ricardo Baptista, Youssef Marzouk
     Massachusetts Institute of Technology, Department of Aeronautics & Astronautics
     ISDA 2019, January 22nd, 2019

  2. Assimilation step in ensemble filtering algorithms
     At any assimilation time k, we have a Bayesian inverse problem: map the prior (forecast) samples x^i ∼ π_X to the posterior π_{X|Y=y*}.
     ◮ π_X is the forecast distribution on R^n
     ◮ π_{Y|X} is the likelihood of the observations Y ∈ R^d
     ◮ π_{X|Y=y*} is the filtering distribution for a realization y* of the data
     Goal: sample the posterior given only M prior samples x^1, …, x^M

  3. Inference as a transportation of measures
     ◮ Seek a map T that pushes forward the prior to the posterior [Moselhy, 2012]:
       (x^1, …, x^M) ∼ π_X  ⟹  (T(x^1), …, T(x^M)) ∼ π_{X|Y=y*}
     ◮ The map induces a coupling between the prior and posterior measures.
     How to construct a "good" coupling from samples?

  4. Joint space ⟹ convex problem
     ◮ Construct a coupling T between the joint π_{Y,X} and the posterior π_{X|Y=y*}
     ◮ T can be computed via convex optimization given samples from π_{Y,X}
     ◮ Sample π_{Y,X} using the forecast ensemble and the likelihood: pairs (y^i, x^i) with y^i ∼ π_{Y|X=x^i} (see the sketch below)
     ◮ Intuition: a generalization of the perturbed-observation EnKF
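The joint-sampling step is simple enough to sketch. A minimal sketch, assuming an additive-Gaussian likelihood y = h(x) + ε with ε ∼ N(0, R); the names `sample_joint`, `obs_operator`, and `obs_cov` are illustrative, not from the talk, and any likelihood you can simulate from would work the same way.

```python
import numpy as np

def sample_joint(forecast_ensemble, obs_operator, obs_cov, rng=None):
    """Augment each forecast member x_i with a simulated observation
    y_i ~ pi_{Y|X=x_i}, producing samples (y_i, x_i) from the joint pi_{Y,X}.

    Assumes an additive-Gaussian likelihood y = h(x) + eps, eps ~ N(0, obs_cov).
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.atleast_2d(forecast_ensemble)          # (M, n) forecast samples
    H = np.array([obs_operator(x) for x in X])    # (M, d) noiseless observations
    noise = rng.multivariate_normal(np.zeros(H.shape[1]), obs_cov, size=H.shape[0])
    Y = H + noise                                 # (M, d) perturbed observations
    return Y, X

# Example: observe the first coordinate of a 2-D state with noise variance 0.5
# Y, X = sample_joint(np.random.randn(100, 2), lambda x: x[:1], 0.5 * np.eye(1))
```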

  5. Couple the joint with a standard normal
     We seek T : R^{d+n} → R^n pushing the joint π_{Y,X} to the posterior π_{X|Y=y*}.
     The problem of finding T can be solved by computing a Knothe–Rosenblatt rearrangement S between π_{Y,X} and N(0, I_{d+n}):
       S : R^{d+n} → R^{d+n},  S pushes π_{Y,X} to N(0, I_{d+n}).
     ◮ In a few slides: we will show how to derive T from S

  6. Knothe–Rosenblatt (KR) rearrangement
     ◮ Definition: for any pair of densities π, η on R^n, there exists a unique triangular and monotone map S : R^n → R^n such that S_♯ π = η
     ◮ Triangular function (a nonlinear generalization of a triangular matrix):
       $$ S(x_1, \dots, x_n) = \begin{bmatrix} S^1(x_1) \\ S^2(x_1, x_2) \\ \vdots \\ S^n(x_1, x_2, \dots, x_n) \end{bmatrix} $$
     ◮ Existence stems from general factorization properties of a density, e.g., π = π_{X_1} π_{X_2|X_1} ⋯ π_{X_n|X_1,…,X_{n-1}}
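As a concrete Gaussian instance of the definition (an illustration, not shown in the talk): for π = N(0, Σ), the KR rearrangement pushing π to N(0, I_n) is linear, S(x) = L⁻¹x with L the lower Cholesky factor of Σ, which is indeed triangular and monotone. A minimal numerical check:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

# For pi = N(0, Sigma), the KR map to N(0, I) is S(x) = L^{-1} x, where
# Sigma = L L^T (lower Cholesky).  L^{-1} is lower triangular with positive
# diagonal, so each component is monotone in its last argument.
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])
L = cholesky(Sigma, lower=True)

def S(x):
    # Solve L z = x instead of forming L^{-1} explicitly
    return solve_triangular(L, x, lower=True)

rng = np.random.default_rng(0)
x = rng.multivariate_normal(np.zeros(2), Sigma, size=5000)
z = np.array([S(xi) for xi in x])
print(np.cov(z.T))   # approximately the 2x2 identity matrix
```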

  7. Triangular maps enable conditional simulation
       $$ S(x_1, \dots, x_n) = \begin{bmatrix} S^1(x_1) \\ S^2(x_1, x_2) \\ \vdots \\ S^n(x_1, x_2, \dots, x_n) \end{bmatrix} $$
     ◮ Each component S^k links marginal conditionals of π and η
     ◮ For instance, if η = N(0, I), then for all (x_1, …, x_{k-1}) ∈ R^{k-1} the map ξ ↦ S^k(x_1, …, x_{k-1}, ξ) pushes π_{X_k|X_{1:k-1}}(· | x_{1:k-1}) to N(0, 1)
     ◮ Simulate the conditional π_{X_k|X_{1:k-1}} by inverting the 1-D map ξ ↦ S^k(x_{1:k-1}, ξ) at Gaussian samples (this needs the triangular structure; see the sketch below)
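A minimal sketch of conditional simulation by 1-D inversion. It assumes only that `S_k(x_prev, ξ)` is increasing in ξ; the bracket-expansion and root-finding strategy is an illustrative choice, not the talk's implementation.

```python
import numpy as np
from scipy.optimize import brentq

def sample_conditional(S_k, x_prev, n_samples=1, rng=None):
    """Draw samples from pi_{X_k | X_{1:k-1} = x_prev} by inverting the
    monotone 1-D map  xi -> S_k(x_prev, xi)  at standard-normal draws.

    `S_k(x_prev, xi)` is any increasing-in-xi scalar function (hypothetical
    interface); the bracket is expanded until it contains the root.
    """
    rng = np.random.default_rng() if rng is None else rng
    samples = []
    for z in rng.standard_normal(n_samples):
        lo, hi = -1.0, 1.0
        while S_k(x_prev, lo) > z:      # expand bracket downward
            lo *= 2.0
        while S_k(x_prev, hi) < z:      # expand bracket upward
            hi *= 2.0
        samples.append(brentq(lambda xi: S_k(x_prev, xi) - z, lo, hi))
    return np.array(samples)
```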

  8. Back to filtering: the analysis map
     ◮ We are interested in the KR map S that pushes π_{Y,X} to N(0, I_{d+n})
     ◮ The rearrangement has a typical block structure
       $$ S(y, x) = \begin{bmatrix} S^Y(y) \\ S^X(y, x) \end{bmatrix}, $$
       which suggests two properties:
       - S^X pushes π_{Y,X} to N(0, I_n)
       - ξ ↦ S^X(y*, ξ) pushes π_{X|Y=y*} to N(0, I_n)
     ◮ The analysis map that pushes π_{Y,X} to π_{X|Y=y*} is then given by
       $$ T(y, x) = S^X(y^*, \cdot)^{-1} \circ S^X(y, x) $$
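A worked special case, added here for intuition (the affine form of S^X is an assumption for illustration, not stated on this slide): composing as above with an affine S^X gives a perturbed-observation EnKF-type update, consistent with the EnKF connection noted on slide 12.

```latex
% Assumption: S^X(y, x) = A y + B x + c with B invertible.
% Solving  S^X(y^*, T) = S^X(y, x)  for T gives
%   A y^* + B T + c = A y + B x + c
%   T(y, x) = x + B^{-1} A (y - y^*) = x + K (y^* - y),  with  K := -B^{-1} A,
% i.e. a perturbed-observation EnKF-type update with gain K.
\[
  T(y, x) \;=\; S^{X}(y^*, \cdot)^{-1} \circ S^{X}(y, x)
  \;=\; x + B^{-1} A\,(y - y^*) \;=\; x + K\,(y^* - y).
\]
```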

  9. The stochastic map algorithm
     From the joint π_{Y,X} (built from the forecast samples x^i ∼ π_X and the likelihood π_{Y|X=x^i}), the map T(y, x) carries the ensemble to the posterior π_{X|Y=y*}.
     Transport map filter:
     1. Compute the forecast ensemble x^1, …, x^M
     2. Generate samples (y^i, x^i) from π_{Y,X} with y^i ∼ π_{Y|X=x^i}
     3. Build an estimator T̂ of T
     4. Compute the analysis ensemble as x_a^i = T̂(y^i, x^i) for i = 1, …, M
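A self-contained sketch of one assimilation step. For concreteness it restricts the estimator of S^X to the simplest (linear) parameterization, so the fitted map is built from sample covariances; per slide 12 this special case recovers a perturbed-observation EnKF, while nonlinear parameterizations would replace the gain computation with the component-wise optimization of slides 11-12. `y_star` and the shapes of `X`, `Y` are the only assumed interface.

```python
import numpy as np

def stochastic_map_analysis(X, Y, y_star):
    """One assimilation step (steps 3-4 above), given the forecast ensemble X
    (M x n, step 1) and simulated observations Y (M x d) drawn as in the
    earlier joint-sampling sketch (step 2).  With a linear parameterization of
    S^X, the fitted analysis map takes the form T(y, x) = x + K (y* - y).
    """
    M = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    C_xy = Xc.T @ Yc / (M - 1)            # cross-covariance of state and obs
    C_yy = Yc.T @ Yc / (M - 1)            # covariance of simulated observations
    K = C_xy @ np.linalg.inv(C_yy)        # "gain" of the fitted linear map

    # Step 4: push every joint sample (y_i, x_i) through the analysis map
    return X + (y_star - Y) @ K.T
```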

  10. How to build an estimator for the analysis map
     ◮ Recall the form of the analysis map
       $$ T(y, x) = S^X(y^*, \cdot)^{-1} \circ S^X(y, x), \quad \text{where} \quad S(y, x) = \begin{bmatrix} S^Y(y) \\ S^X(y, x) \end{bmatrix}, \quad S_\sharp\, \pi_{Y,X} = N(0, I_{d+n}) $$
     ◮ We propose the following estimator T̂ of T:
       $$ \widehat T(y, x) = \widehat S^X(y^*, \cdot)^{-1} \circ \widehat S^X(y, x), $$
       where Ŝ is a maximum likelihood estimator of S
     ◮ The MLE of S can be computed via convex optimization [Parno, 2014]

  11. Estimation of the KR rearrangement from samples
     Given M samples x^1, …, x^M from a density π on R^n, we want to estimate the KR rearrangement S that pushes forward π to N(0, I_n).
     ◮ Constrained MLE for S:
       $$ \widehat S \in \arg\min_{S \in \mathcal H} \; -\frac{1}{M} \sum_{i=1}^{M} \log \underbrace{\big(S^{-1}_\sharp\, \eta\big)}_{\text{pullback}}(x^i), \qquad \eta = N(0, I_n), $$
       where H is an approximation space for the rearrangement
     ◮ Each component Ŝ^k of Ŝ can be computed in parallel via differentiable convex optimization (see the sketch below):
       $$ \widehat S^k \in \arg\min_{S^k \in \mathcal H^k} \; \frac{1}{M} \sum_{i=1}^{M} \Big[ \tfrac{1}{2}\, S^k(x^i)^2 - \log \partial_k S^k(x^i) \Big] $$
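A minimal sketch of fitting one component from samples. The monotone parameterization S^k(x_{1:k}) = aᵀx_{1:k-1} + e^β x_k is an illustrative assumption (not the talk's parameterization), chosen so that ∂_k S^k = e^β > 0 by construction; BFGS stands in for the Newton-type solvers mentioned on the next slide.

```python
import numpy as np
from scipy.optimize import minimize

def fit_component(X, k):
    """Fit the k-th map component (0-indexed) from samples X (M x n) by
    minimizing the sample objective
        (1/M) * sum_i [ 0.5 * S_k(x_i)^2 - log d_k S_k(x_i) ].
    Illustrative parameterization:  S_k(x_{1:k}) = a . x_{1:k-1} + exp(beta) * x_k.
    """
    X_prev, x_k = X[:, :k], X[:, k]

    def objective(theta):
        a, beta = theta[:-1], theta[-1]
        s = X_prev @ a + np.exp(beta) * x_k     # S_k evaluated at all samples
        return np.mean(0.5 * s**2 - beta)       # log d_k S_k = beta

    res = minimize(objective, np.zeros(k + 1), method="BFGS")
    a, beta = res.x[:-1], res.x[-1]
    return lambda x: x[:k] @ a + np.exp(beta) * x[k]

# Components are independent, so they could be fit in parallel:
# maps = [fit_component(samples, k) for k in range(samples.shape[1])]
```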

  12. Parameterization of the map
       $$ \widehat S^k \in \arg\min_{S^k \in \mathcal H^k} \; \frac{1}{M} \sum_{i=1}^{M} \Big[ \tfrac{1}{2}\, S^k(x^i)^2 - \log \partial_k S^k(x^i) \Big] $$
     ◮ Optimization is not needed for nonlinear separable parameterizations of the form Ŝ^k(x_{1:k}) = x_k + g(x_{1:k-1}) (linear regression; see the sketch below)
     ◮ In general, convex optimization is needed (e.g., Newton's method)
     ◮ Connection to EnKF: a linear parameterization of Ŝ^k yields a particular form of EnKF with "perturbed observations"
     ◮ The choice of approximation space allows one to control the bias–variance tradeoff of Ŝ
     ◮ Richer parameterizations yield less bias, but potentially higher variance
     Strategy in high dimensions: depart gradually from the linear ansatz by introducing local nonlinearities + regularization
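A sketch of why the separable case needs no optimizer: with Ŝ^k(x_{1:k}) = x_k + g(x_{1:k-1}) and g linear in its coefficients, ∂_k Ŝ^k = 1, the log term in the objective is constant, and what remains is (1/2M) Σ_i (x_k^i + g(x_{1:k-1}^i))², a linear least-squares problem. The `features` interface below is an illustrative assumption.

```python
import numpy as np

def fit_separable_component(X, k, features):
    """Separable case S_k(x_{1:k}) = x_k + g(x_{1:k-1}) with g = Phi(x_{1:k-1}) @ w
    for a user-supplied feature map (assumed interface).  Since d_k S_k = 1,
    the objective reduces to regressing -x_k on the features."""
    Phi = np.array([features(x[:k]) for x in X])          # (M, p) design matrix
    w, *_ = np.linalg.lstsq(Phi, -X[:, k], rcond=None)    # closed-form solution
    return lambda x: x[k] + features(x[:k]) @ w

# e.g. affine features of the preceding variables -> a linear map component:
# S2 = fit_separable_component(samples, 2, lambda xp: np.append(xp, 1.0))
```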

  13. The map is easy to "localize" in high dimensions
     ◮ Regularize the estimator Ŝ of S by imposing sparsity, e.g.,
       $$ \widehat S(x_1, \dots, x_4) = \begin{bmatrix} \widehat S^1(x_1) \\ \widehat S^2(x_1, x_2) \\ \widehat S^3(x_1, x_2, x_3) \\ \widehat S^4(x_2, x_3, x_4) \end{bmatrix} $$
     ◮ The sparsity of the k-th component of S depends on the sparsity of the marginal conditional π_{X_k|X_{1:k-1}}(x_k | x_{1:k-1})
     ◮ Quick heuristic: let each Ŝ^k depend on the variables (x_j)_{j<k} that are within a distance ℓ of x_k in state space; estimate the optimal ℓ offline (see the sketch below)
     ◮ An explicit link between the sparsity of a nonlinear S and conditional independence in non-Gaussian graphical models can be found in [Inference via low-dimensional couplings, Spantini et al., 2018]
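A sketch of the distance-ℓ heuristic on a 1-D periodic lattice (the natural geometry for models like Lorenz-96); the function name and interface are illustrative, and ℓ would be tuned offline as stated above.

```python
import numpy as np

def local_neighbors(k, n, ell, periodic=True):
    """Indices j < k within distance `ell` of site k on a 1-D state lattice,
    i.e. the variables the k-th map component is allowed to depend on
    (besides x_k itself) under the localization heuristic."""
    def dist(i, j):
        d = abs(i - j)
        return min(d, n - d) if periodic else d
    return [j for j in range(k) if dist(j, k) <= ell]

# e.g. n = 40 states, ell = 2: the last component depends on x_37, x_38 and,
# through the periodic boundary, on x_0 and x_1 (only indices j < 39 qualify
# under the triangular ordering):
# local_neighbors(39, 40, 2)  ->  [0, 1, 37, 38]
```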

  14. Lorenz-96 in a chaotic regime (40-dimensional state)
     ◮ A hard test-case configuration [Bengtsson et al., 2003]:
       $$ \frac{dZ_j}{dt} = (Z_{j+1} - Z_{j-2})\, Z_{j-1} - Z_j + F, \qquad j = 1, \dots, 40 \ (\text{periodic}), $$
       $$ Y_j = Z_j + E_j, \qquad j = 1, 3, 5, \dots, 39 $$
     ◮ F = 8 (chaotic regime) and E_j ∼ N(0, 0.5) (small observation noise, which is hard for particle filters)
     ◮ Time between observations: Δ_obs = 0.4 (large)
     ◮ RMSE statistics computed over 2000 assimilation cycles, for the EnKF and the map filter (MapF) with 200 and 400 particles (values listed in the column order of the original slide table):
       - median: 0.73, 0.58, 0.73, 0.66
       - mean: 0.78, 0.80, 0.61, 0.71
       - std: 0.24, 0.18, 0.31, 0.26
       - spread: 0.78, 0.69, 0.82, 0.75
     ◮ The nonlinear filter is ∼25% more accurate in RMSE than the EnKF
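A sketch of the forecast/observation setup under the stated configuration (F = 8, Δ_obs = 0.4, every other state variable observed). It assumes the 0.5 in E_j ∼ N(0, 0.5) is the noise variance; the integrator choice and initial condition are illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz96(t, z, F=8.0):
    """Right-hand side dZ_j/dt = (Z_{j+1} - Z_{j-2}) Z_{j-1} - Z_j + F
    with periodic indexing (np.roll handles the wrap-around)."""
    return (np.roll(z, -1) - np.roll(z, 2)) * np.roll(z, 1) - z + F

rng = np.random.default_rng(0)
z0 = 8.0 * np.ones(40) + 0.01 * rng.standard_normal(40)   # perturbed equilibrium

# One forecast step between observations (Delta_obs = 0.4) for a single member
sol = solve_ivp(lorenz96, (0.0, 0.4), z0, method="RK45", rtol=1e-8, atol=1e-8)
z_forecast = sol.y[:, -1]

# Observe Z_1, Z_3, ..., Z_39 (0-indexed: 0, 2, ..., 38), noise variance 0.5
obs_idx = np.arange(0, 40, 2)
y = z_forecast[obs_idx] + rng.normal(0.0, np.sqrt(0.5), size=obs_idx.size)
```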

  15. Lorenz-96: details on the filtering approximation
     ◮ Observations were assimilated one at a time
     ◮ Sparsity of the map was imposed with a 5-way interaction model (figure in the original slides)
     ◮ Fully separable and nonlinear parameterization of each component:
       $$ \widehat S^k(x_{j_1}, \dots, x_{j_p}, x_k) = \psi(x_{j_1}) + \dots + \psi(x_{j_p}) + \psi(x_k), \qquad \text{where} \quad \psi(x) = a_0 + a_1 x + \sum_{i > 1} a_i \exp\big(-(x - c_i)^2/\sigma\big) $$
     ◮ Much more general parameterizations are of course possible!
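A sketch of the univariate basis ψ above. The choice of RBF centers and width is left open on the slide; tying them to the ensemble (e.g., empirical quantiles) is an assumption here, and in practice each ψ in the sum carries its own coefficients.

```python
import numpy as np

def make_psi(coeffs, centers, sigma):
    """Build psi(x) = a0 + a1*x + sum_i a_i * exp(-(x - c_i)^2 / sigma).
    `coeffs` = [a0, a1, a2, ...]; `centers` and `sigma` set the RBF locations
    and width."""
    a0, a1, a_rbf = coeffs[0], coeffs[1], np.asarray(coeffs[2:])
    centers = np.asarray(centers)

    def psi(x):
        rbf = np.exp(-(np.subtract.outer(x, centers) ** 2) / sigma)
        return a0 + a1 * x + rbf @ a_rbf

    return psi

# e.g. a linear trend plus two bumps:
# psi = make_psi([0.0, 1.0, 0.3, -0.2], centers=[-1.0, 1.0], sigma=0.5)
# psi(np.linspace(-2.0, 2.0, 5))
```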

  16. A tradeoff between nonlinearities and ensemble size
     [Figure: average RMSE vs. number of particles (100 to 600) for the EnKF and for map parameterizations of order 1, 2, and 3; RMSE values roughly in the 0.6-0.9 range.]

  17. Discussion
     Nonlinear filtering with local couplings:
     ◮ A nonlinear generalization of the EnKF: move the ensemble members via local nonlinear transport maps, with no weights or degeneracy
     ◮ Learn non-Gaussian features via convex optimization
     ◮ Easy to localize in high dimensions (sparse updates)
     Extensions and open issues:
     ◮ There exists a square-root version of the filter
     ◮ Generalization to nonlinear smoothing
     ◮ How much nonlinearity in the update vs. ensemble size & structure?
     ◮ How to regularize the estimation? (e.g., LASSO)
     ◮ Accuracy and stability: is it possible to establish consistency by adapting the complexity of the maps to the forecast ensembles?
