 
High-dimensional estimation of nonlinear transformations for Bayesian filtering
Ricardo Baptista, Daniele Bigoni, Alessio Spantini, Youssef Marzouk
Massachusetts Institute of Technology, Department of Aeronautics & Astronautics
7th International Symposium on Data Assimilation, Kobe, Japan
January 23, 2019
Baptista (rsb@mit.edu), Estimating Transformations in Filtering

Bayesian Approach to Filtering
Non-Gaussian State-Space Model
- Model dynamics (transition kernel): $x_t \sim f(\cdot \mid x_{t-1})$
- Observations (likelihood model): $y_t \sim g(\cdot \mid x_t)$
[Figure: graphical model of the hidden Markov chain, with states $x_0, x_1, \dots, x_{t-1}, x_t, x_{t+1}$ and observations $y_1, \dots, y_{t-1}, y_t, y_{t+1}$]
Goal: Characterize the filtering distributions $\pi_{t|t} := \pi(x_t \mid y_1, \dots, y_t)$
Challenges of Filtering
- Complex nonlinear dynamics (e.g., chaotic systems)
- Sparse observations in space and time
- Limited model evaluations available (e.g., small ensemble sizes)
- High-dimensional states, $x_t \in \mathbb{R}^d$ for $d \sim O(10^6)$

Stochastic Maps Algorithm [Spantini et al., 2019]
Generalization of EnKF for Inference Step
Find a nonlinear map $T$ that couples the forecast $\pi_{t|t-1}$ and the analysis $\pi_{t|t}$
Main Idea
- Learn $T$ given $N \ll d$ forecast samples $x_t^{(i)} \sim \pi_{t|t-1}$
- Generate analysis samples $T(x_t^{(i)}) \sim \pi_{t|t}$ for $i = 1, \dots, N$

Building Block of Stochastic Maps
Transport Maps [Moselhy et al., 2012]
- Deterministic coupling between densities $\pi, \eta$ on $\mathbb{R}^d$ such that
  $$\pi(x) = S^\sharp \eta(x) := \eta \circ S(x) \, |\det \nabla S(x)|$$
- Coupling exists and is unique for triangular and monotone maps
  $$S(x) = \begin{bmatrix} S_1(x_1) \\ S_2(x_1, x_2) \\ \vdots \\ S_d(x_1, x_2, \dots, x_d) \end{bmatrix}$$
- For Gaussian $\eta$, find $S$ by solving decoupled convex problems
  $$\min_S D_{KL}(\pi \,\|\, S^\sharp \eta) \iff \min_{S_k} \mathbb{E}_\pi\left[ \tfrac{1}{2} S_k(x)^2 - \log |\partial_k S_k(x)| \right] \quad \forall k$$

Triangular Maps Enable Conditional Sampling
- Each component $S_k$ characterizes one marginal conditional of $\pi$
  $$\pi(x) = \pi(x_1) \, \pi(x_2 \mid x_1) \cdots \pi(x_d \mid x_1, \dots, x_{d-1})$$
- For $\pi(y, x)$ and $\eta(z_1, z_2)$, consider the triangular map
  $$S(y, x) = \begin{bmatrix} S^y(y) \\ S^x(y, x) \end{bmatrix}$$
- The map $x \mapsto S^x(y^*, x)$ pushes forward $\pi(x \mid y^*)$ to $\eta(z_2)$
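
As an illustration of conditional sampling with a triangular map, here is a minimal sketch for the special case of a zero-mean Gaussian joint density over (y, x) with known covariance C. The function name `conditional_sampler` and the use of an inverse Cholesky factor as the linear triangular map are assumptions made for this example, not part of the talk.

```python
import numpy as np

def conditional_sampler(C, n_y):
    """Build a sampler for x | y* when (y, x) ~ N(0, C), with y the first n_y coordinates."""
    G = np.linalg.cholesky(C)      # C = G G^T, G lower-triangular
    L = np.linalg.inv(G)           # lower-triangular, with L^T L = C^{-1}
    L_xy = L[n_y:, :n_y]           # dependence of S^x on y
    L_xx = L[n_y:, n_y:]           # dependence of S^x on x

    def sample(y_star, n_samples, rng):
        # Draw reference samples z ~ N(0, I), then invert x -> S^x(y*, x) = L_xy y* + L_xx x
        z = rng.standard_normal((n_samples, L_xx.shape[0]))
        rhs = z - L_xy @ y_star
        return np.linalg.solve(L_xx, rhs.T).T

    return sample

# Hypothetical 2D example: scalar y and scalar x
C = np.array([[2.0, 0.8], [0.8, 1.0]])
sampler = conditional_sampler(C, n_y=1)
x_cond = sampler(np.array([1.5]), 200_000, np.random.default_rng(0))
```

In this Gaussian case the samples can be checked against the closed-form conditional mean C_xy C_yy^{-1} y* and variance C_xx - C_xy^2 / C_yy.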

Stochastic Maps Algorithm
Forecast Step
1. Apply forward model to generate forecast ensemble $x_t^{(i)} \sim f(\cdot \mid x_{t-1}^{(i)})$
Analysis Step
1. Perturbed observations: Sample $y_t^{(i)} \sim g(\cdot \mid x_t^{(i)})$ using forecast
2. Estimate lower-triangular map $\hat{S}$ that couples $\pi_{y_t, x_t}$ and $N(0, I)$
   $$\hat{S}(y, x) = \begin{bmatrix} \hat{S}^y(y) \\ \hat{S}^x(y, x) \end{bmatrix}$$
3. Compose maps $\hat{T}(y, x) = \hat{S}^x(y^*, \cdot)^{-1} \circ \hat{S}^x(y, x)$
4. Generate analysis ensemble $(x_t^a)^{(i)} = \hat{T}(y_t^{(i)}, x_t^{(i)})$ for $i = 1, \dots, N$
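
The analysis step can be sketched for the simplest case, in which the estimated map is linear. The function `analysis_step`, the use of the joint sample covariance, and the requirement that the ensemble size exceed the joint dimension (so that the covariance is invertible) are assumptions of this example; the talk targets nonlinear maps and much smaller ensembles.

```python
import numpy as np

def analysis_step(X_f, Y_sim, y_star):
    """Linear stochastic-map update.

    X_f:    (N, d) forecast ensemble
    Y_sim:  (N, p) perturbed observations simulated from the forecast
    y_star: (p,)   actual observation
    """
    p = Y_sim.shape[1]
    W = np.hstack([Y_sim, X_f])                  # joint samples ordered as (y, x)
    C = np.cov(W, rowvar=False)                  # joint sample covariance (needs N > d + p)
    L = np.linalg.inv(np.linalg.cholesky(C))     # linear triangular map, L^T L = C^{-1}
    L_xy, L_xx = L[p:, :p], L[p:, p:]
    # Composing S^x(y*, .)^{-1} with S^x(y, x) gives x - L_xx^{-1} L_xy (y* - y)
    shift = np.linalg.solve(L_xx, L_xy @ (y_star - Y_sim).T).T
    return X_f - shift

# Hypothetical toy problem: observe the first two of three state components
rng = np.random.default_rng(1)
X_f = rng.standard_normal((200, 3))
Y_sim = X_f[:, :2] + 0.5 * rng.standard_normal((200, 2))
y_star = np.array([0.3, -0.2])
X_a = analysis_step(X_f, Y_sim, y_star)
```

Under these linear assumptions the update equals x + K (y* - y) with K = C_xy C_yy^{-1}, which is the perturbed-observation EnKF update.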

Performance of Stochastic Maps
Lorenz-96 Model
- $d = 40$ with $F = 8$, $\Delta t_{obs} = 0.4$ and 20 observations
- Structure for $S$ is based on tuning localization radius
Challenge: Build adaptive estimators for $S$ using $N \ll d$ samples
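
For reference, the Lorenz-96 test problem can be sketched as follows. The RK4 integrator, the inner time step `dt = 0.01`, and the choice to observe every other component are assumptions of this example; the slide only specifies d = 40, F = 8, an observation interval of 0.4, and 20 observations.

```python
import numpy as np

def lorenz96_rhs(x, F=8.0):
    """dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, with cyclic indices."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def forecast(x, dt_obs=0.4, dt=0.01, F=8.0):
    """Integrate one observation interval with classical RK4 steps of size dt."""
    for _ in range(int(round(dt_obs / dt))):
        k1 = lorenz96_rhs(x, F)
        k2 = lorenz96_rhs(x + 0.5 * dt * k1, F)
        k3 = lorenz96_rhs(x + 0.5 * dt * k2, F)
        k4 = lorenz96_rhs(x + dt * k3, F)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

d = 40
obs_idx = np.arange(0, d, 2)     # assumed observation pattern: 20 of 40 components
x0 = np.full(d, 8.0)
x0[0] += 0.01                    # small perturbation of the fixed point x = F
x1 = forecast(x0)
y1 = x1[obs_idx]
```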

Structure Inherited by Maps
Theorem: Sparsity of Transport Maps [Spantini et al., 2018]
Conditional independence of $\pi$ defines functional dependence of $S_k(x)$
Lorenz-96 Model
- Estimate forecast covariance $C_{t|t-1}$ over 1000 assimilation cycles
[Figure: two panels, "Average $C_{t|t-1}^{-1}$" and "Sparsity of $C_{t|t-1}^{-1}$"]

Learning Transport Maps with Sparse Structure
Key Idea
Learn rather than impose sparsity in the map's parameters
Linear Transport Maps
- Linear components: $S(x) = Lx$, with lower-triangular $L$
- Approximating density: $\pi = S^\sharp \eta = N(0, C)$ where $C^{-1} = L^T L$
Connection to Linear Regression
- Normalize diagonal: $S_k(x) = L_{kk}(\beta_1 x_1 + \cdots + \beta_{k-1} x_{k-1} + x_k)$
- Rewrite optimization problem for linear map parameters:
  $$\min_{L_{kk} > 0, \, \beta} \mathbb{E}_\pi\left[ \tfrac{1}{2} L_{kk}^2 (x_{1:k-1} \beta + x_k)^2 - \log |L_{kk}| \right]$$
- Using samples from $\pi$:
  $$\hat{\beta} \in \arg\min_\beta \frac{1}{2N} \| x_{1:k-1} \beta + x_k \|_2^2, \qquad \hat{L}_{kk} = \left( \frac{1}{N} \| x_{1:k-1} \hat{\beta} + x_k \|_2^2 \right)^{-1/2}$$
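
The regression view of the linear map can be sketched as below. The function name `fit_linear_triangular_map` is hypothetical, and the example assumes zero-mean samples with N > d so that each unpenalized least-squares problem is well posed.

```python
import numpy as np

def fit_linear_triangular_map(X):
    """Estimate lower-triangular L from samples X of shape (N, d), one row of L per regression."""
    N, d = X.shape
    L = np.zeros((d, d))
    for k in range(d):
        A, b = X[:, :k], X[:, k]          # predictors x_{1:k-1} and response x_k (0-based k)
        if k > 0:
            # beta minimizes ||A beta + b||_2^2, i.e. least squares with target -b
            beta, *_ = np.linalg.lstsq(A, -b, rcond=None)
            resid = A @ beta + b
        else:
            beta, resid = np.zeros(0), b
        L_kk = (resid @ resid / N) ** -0.5   # closed-form minimizer over L_kk > 0
        L[k, :k] = L_kk * beta
        L[k, k] = L_kk
    return L

# Hypothetical correlated Gaussian samples
rng = np.random.default_rng(0)
M = np.tril(np.ones((5, 5))) + np.eye(5)
X = rng.standard_normal((500, 5)) @ M.T
L_hat = fit_linear_triangular_map(X)
```

With these unpenalized estimates, `L_hat.T @ L_hat` equals the inverse of the sample second-moment matrix `X.T @ X / N`, which gives a simple consistency check.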

Learning Transport Maps with Sparse Structure
Proposed Approach
- Add $\ell_1$-penalty for sparse linear regression (LASSO):
  $$\hat{\beta} \in \arg\min_\beta \frac{1}{2N} \| x_{1:k-1} \beta + x_k \|_2^2 + \lambda_n \| \beta \|_1$$
Existing Work in Filtering
- Learn bandwidth of inverse covariance ($C^{-1}$) using BIC [Ueno, 2009]
- Add $\ell_1$-penalty to negative log-likelihood of $C^{-1}$ [Hou, 2016]
- Banding or tapering Cholesky factor of $C^{-1}$ [Nino-Ruiz, 2018]
Maps Generalize to non-Gaussian Densities
- Parametrize monotone nonlinear maps using:
  $$S_k(x_1, \dots, x_k) = \sum_j \beta_j \psi_j(x_{1:k-1}) + \int_0^{x_k} h_\alpha(x_{1:k-1}, t) \, dt$$
- Add $\ell_1$-penalty to learn sparsity of $\beta, \alpha$ parameters
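
A minimal sketch of the ℓ1-penalized regression for one linear map component follows. It uses proximal gradient descent (ISTA) as the solver, which is a choice made for this example and not necessarily the solver used by the authors; the names `soft_threshold` and `lasso_map_component` and the iteration count are likewise assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_map_component(A, b, lam, n_iter=500):
    """Minimize (1 / 2N) ||A beta + b||_2^2 + lam ||beta||_1 by ISTA.

    A: (N, k-1) samples of x_{1:k-1};  b: (N,) samples of x_k.
    """
    N, p = A.shape
    step = 1.0 / np.linalg.eigvalsh(A.T @ A / N)[-1]   # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = A.T @ (A @ beta + b) / N
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Hypothetical data: x_k depends only on the first of five predictors
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
b = 2.0 * A[:, 0] + 0.1 * rng.standard_normal(200)
beta_hat = lasso_map_component(A, b, lam=0.1)
```

Because the residual is written as `A @ beta + b`, the recovered coefficient on the active predictor is negative; coefficients on the inactive predictors are set exactly to zero by the soft-thresholding step.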

Theoretical Performance
Assumptions: sub-Gaussian density $\pi$ and basis functions $\psi_j(x)$
Theorem [BZM]
For polynomial maps of degree $m$ with sparsity $s$, with high probability
$$\mathbb{E}_\pi\left[ D_{KL}\left( \pi(x_k \mid x_{1:k-1}) \,\|\, \hat{S}_k^\sharp \eta \right) \right] \lesssim \sqrt{\frac{s^2 m \log k}{N}}$$
Takeaways
- Accurate estimation is feasible in high dimensions with $N \ll k$
- From the factorization property of the density, the error in the conditionals ensures
  $$D_{KL}(\pi \,\|\, \hat{S}^\sharp \eta) \lesssim d \sqrt{\frac{s^2 m \log d}{N}}$$
- $\ell_2$ regularization requires $N = O(k)$ samples for each component

Numerical Results
- Map components: $S_k(x) = \sum_j \beta_j \psi_j(x_{1:k-1}) + \alpha_k x_k$
- Solve $\ell_1$-penalized problem to estimate map coefficients
- Compare to oracle (known sparsity) and no regularization
Total-order degree 2 Hermite basis $\psi_j$ with random coefficients:
[Figure: two panels, "Error with increasing N" and "Error with increasing d"]
Accuracy extends to maps with nonlinear diagonal functions in practice
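
One way to build the features for such map components is sketched below. It assumes unnormalized probabilists' Hermite polynomials (He_1(x) = x, He_2(x) = x^2 - 1) and omits the constant term; the talk does not state the normalization, and the function name `hermite_features_deg2` is hypothetical.

```python
import numpy as np
from itertools import combinations

def hermite_features_deg2(X):
    """Total-order degree <= 2 Hermite features of X (N, p), constant term omitted."""
    N, p = X.shape
    linear = X                              # He_1(x_i)
    quad = X**2 - 1.0                       # He_2(x_i)
    pairs = [X[:, i] * X[:, j] for i, j in combinations(range(p), 2)]   # He_1(x_i) He_1(x_j)
    cross = np.column_stack(pairs) if pairs else np.empty((N, 0))
    return np.hstack([linear, quad, cross])

# p inputs give p + p + p (p - 1) / 2 features
Psi = hermite_features_deg2(np.array([[1.0, 2.0, 3.0]]))
```

The resulting feature matrix can take the place of the raw predictors x_{1:k-1} in a penalized regression for the coefficients β_j.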

Transport Maps for Posterior Inference
Linear Gaussian Problem
- Prior: $x \sim N(\mu, \Sigma_{pr})$ with exponential covariance
- Likelihood: Local observations $y = Hx + \epsilon$ with $\epsilon \sim N(0, \Gamma)$
Takeaway
- Learning sparse prior-to-posterior map matches oracle scaling
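
For this linear Gaussian problem the exact posterior is available in closed form, which is what makes it a useful reference. The sketch below computes it; the dimension, the correlation length `ell`, the noise level, and the observation pattern (every other component) are assumptions chosen for illustration.

```python
import numpy as np

def linear_gaussian_posterior(mu, Sigma_pr, H, Gamma, y):
    """Posterior mean and covariance for y = H x + eps, eps ~ N(0, Gamma), x ~ N(mu, Sigma_pr)."""
    S = H @ Sigma_pr @ H.T + Gamma               # covariance of the observations
    K = np.linalg.solve(S, H @ Sigma_pr).T       # gain Sigma_pr H^T S^{-1}
    mu_pos = mu + K @ (y - H @ mu)
    Sigma_pos = Sigma_pr - K @ H @ Sigma_pr
    return mu_pos, Sigma_pos

d, ell = 20, 3.0                                  # assumed dimension and correlation length
idx = np.arange(d)
Sigma_pr = np.exp(-np.abs(idx[:, None] - idx[None, :]) / ell)   # exponential covariance
H = np.eye(d)[::2]                                # assumed local observations of every other state
Gamma = 0.1 * np.eye(H.shape[0])
mu = np.zeros(d)
y = np.ones(H.shape[0])
mu_pos, Sigma_pos = linear_gaussian_posterior(mu, Sigma_pr, H, Gamma, y)
```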

Two Approaches for Posterior Sampling
1. $x \mid y^* \sim (\hat{S}^x)^\sharp \eta$
2. $x \mid y^* \sim \hat{T}_\sharp \pi_{y,x}$ for $\hat{T} = (\hat{S}^x)^{-1} \circ \hat{S}^x$
Takeaway
- Propagating forecast through composed maps has lower error

Performance of Stochastic Maps
Lorenz-96 Model
- $d = 40$ with $F = 8$, $\Delta t_{obs} = 0.4$ and 20 observations
Takeaway
- Best estimators adapt complexity to extract information from samples

Conclusion and Outlook
Summary
- Learned sparse transport maps for prior-to-posterior transformations
- Regularization via map sparsity extends to the nonlinear case
- Demonstrated log dependence of sample size on dimension
Outlook on Future Work
- Exploration of sparse nonlinear transports in filtering applications
- Relate approximation errors to RMSE and metrics on distributions

Thank You
Supported by the Air Force Office of Scientific Research