

SLIDE 1

High-dimensional estimation of nonlinear transformations for Bayesian filtering

Ricardo Baptista, Daniele Bigoni, Alessio Spantini, Youssef Marzouk

Massachusetts Institute of Technology Department of Aeronautics & Astronautics

7th International Symposium on Data Assimilation Kobe, Japan January 23, 2019

Baptista (rsb@mit.edu) Estimating Transformations in Filtering 1 / 16

SLIDE 2

Bayesian Approach to Filtering

Non-Gaussian State-Space Model

◮ Model dynamics (transition kernel): xt ∼ f(·|xt−1)
◮ Observations (likelihood model): yt ∼ g(·|xt)

[Hidden Markov model diagram: states x0, x1, . . . , xt−1, xt, xt+1 with observations y1, . . . , yt−1, yt, yt+1]
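As a toy illustration of such a state-space model (a hypothetical one-dimensional example, not the model used in the talk), the transition kernel and likelihood can be simulated directly:

```python
import numpy as np

def simulate(T, rng):
    """Simulate a toy nonlinear state-space model:
    x_t = sin(x_{t-1}) + process noise (transition kernel f),
    y_t = x_t + observation noise (likelihood model g)."""
    x = np.zeros(T)
    y = np.zeros(T)
    x[0] = rng.normal()
    for t in range(1, T):
        x[t] = np.sin(x[t - 1]) + 0.3 * rng.normal()  # x_t ~ f(.|x_{t-1})
        y[t] = x[t] + 0.5 * rng.normal()              # y_t ~ g(.|x_t)
    return x, y

xs, ys = simulate(100, np.random.default_rng(0))
```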

Goal: Characterize the filtering distributions πt|t := π(xt|y1, . . . , yt)

Challenges of Filtering

◮ Complex nonlinear dynamics (e.g., chaotic system)
◮ Sparse observations in space and time
◮ Limited model evaluations available (e.g., small ensemble sizes)
◮ High-dimensional states, xt ∈ Rd for d ∼ O(106)


SLIDE 3

Stochastic Maps Algorithm [Spantini et al., 2019]

Generalization of the EnKF for the Inference Step

Find a nonlinear map T that couples the forecast πt|t−1 and the analysis πt|t

Main Idea

◮ Learn T given N ≪ d forecast samples xt(i) ∼ πt|t−1

◮ Generate analysis samples T(xt(i)) ∼ πt|t for i = 1, . . . , N


SLIDE 4

Building Block of Stochastic Maps

Transport Maps [Moselhy et al., 2012]

◮ Deterministic coupling between densities π, η on Rd such that

π(x) = S#η(x) := η(S(x)) |det(∇S(x))|

◮ Coupling exists and is unique for triangular and monotone maps

S(x) = ( S1(x1), S2(x1, x2), . . . , Sd(x1, x2, . . . , xd) )ᵀ

◮ For Gaussian η, find S by solving decoupled convex problems

min_S DKL(π ‖ S#η)   ⇔   min_{Sk} Eπ[ ½ Sk(x)² − log |∂kSk(x)| ]   ∀k
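A one-dimensional sanity check of the decoupled objective (a minimal sketch, not the talk's solver): for a linear map S(x) = a·x with a > 0, the objective ½a²Eπ[x²] − log a is convex with closed-form minimizer a* = Eπ[x²]^(−1/2), which standardizes the samples.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0
x = sigma * rng.normal(size=50_000)   # samples from pi = N(0, sigma^2)

# For a 1-D linear map S(x) = a * x, the objective
#   E_pi[ 0.5 * S(x)^2 - log |S'(x)| ] = 0.5 * a^2 * E[x^2] - log a   (a > 0)
# is convex in a with minimizer a* = E[x^2]^(-1/2).
a_star = 1.0 / np.sqrt(np.mean(x ** 2))   # close to 1/sigma = 0.5

# S pushes pi (approximately) to the standard normal reference eta:
z = a_star * x
```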


SLIDE 5

Triangular Maps Enable Conditional Sampling

◮ Each component Sk characterizes one marginal conditional of π

π(x) = π(x1)π(x2|x1) · · · π(xd|x1, . . . , xd−1)

◮ For π(y, x) and η(z1, z2), consider the triangular map

S(y, x) = ( Sy(y), Sx(y, x) )ᵀ

◮ The map x ↦ Sx(y∗, x) pushes forward π(x|y∗) to η(z2)
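A small Gaussian example makes the conditioning concrete (illustrative setup with an assumed correlation ρ = 0.8): for a bivariate normal, the triangular map is available in closed form, and inverting its second component at a fixed y∗ draws from the conditional π(x|y∗).

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.8                      # assumed correlation for the toy example

# Lower-triangular (Knothe-Rosenblatt) map for the joint Gaussian pi(y, x):
#   S^y(y)    = y
#   S^x(y, x) = (x - rho * y) / sqrt(1 - rho^2)
# pushes pi forward to eta = N(0, I).

# Conditional sampling: fix y*, draw z2 ~ N(0, 1), invert x -> S^x(y*, x):
y_star = 1.0
z2 = rng.normal(size=100_000)
x_cond = rho * y_star + np.sqrt(1 - rho ** 2) * z2   # samples from pi(x | y*)
# Analytic check: pi(x | y*) = N(rho * y*, 1 - rho^2) = N(0.8, 0.36)
```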


SLIDE 6

Stochastic Maps Algorithm

Forecast Step

1. Apply the forward model to generate the forecast ensemble xt(i) ∼ f(·|xt−1(i))

Analysis Step

1. Perturbed observations: sample yt(i) ∼ g(·|xt(i)) using the forecast
2. Estimate the lower-triangular map S(y, x) = ( Sy(y), Sx(y, x) )ᵀ that couples πyt,xt and N(0, I)
3. Compose the maps: T(y, x) = Sx(y∗, ·)−1 ∘ Sx(y, x)
4. Generate the analysis ensemble (xa)t(i) = T(yt(i), xt(i)) for i = 1, . . . , N


SLIDE 7

Performance of Stochastic Maps

Lorenz-96 Model

◮ d = 40 with F = 8, ∆tobs = 0.4, and 20 observations
◮ Structure for S is based on a tuned localization radius
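For reference, the Lorenz-96 dynamics have the standard cyclic form dxk/dt = (xk+1 − xk−2)xk−1 − xk + F; a minimal integration sketch (RK4 with an assumed step size, not the talk's solver settings):

```python
import numpy as np

def lorenz96_rhs(x, F=8.0):
    """Lorenz-96 tendency: dx_k/dt = (x_{k+1} - x_{k-2}) x_{k-1} - x_k + F."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt, F=8.0):
    """One fourth-order Runge-Kutta step (dt is an assumed integration step)."""
    k1 = lorenz96_rhs(x, F)
    k2 = lorenz96_rhs(x + 0.5 * dt * k1, F)
    k3 = lorenz96_rhs(x + 0.5 * dt * k2, F)
    k4 = lorenz96_rhs(x + dt * k3, F)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

x = 8.0 * np.ones(40)        # d = 40, resting state x_k = F
x[0] += 0.01                 # small perturbation to trigger chaotic growth
for _ in range(100):
    x = rk4_step(x, 0.01)
```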

Challenge: Build adaptive estimators for S using N ≪ d samples


SLIDE 8

Structure Inherited by Maps

Theorem: Sparsity of Transport Maps [Spantini et al., 2018]

Conditional independence of π defines the functional dependence of Sk(x)

Lorenz-96 Model

◮ Estimate forecast covariance Ct|t−1 over 1000 assimilation cycles

[Figure: average forecast precision matrix C−1 t|t−1 and its sparsity pattern]


SLIDE 9

Learning Transport Maps with Sparse Structure

Key Idea

Learn rather than impose sparsity in the map's parameters

Linear Transport Maps

◮ Linear components: S(x) = Lx, with lower-triangular L
◮ Approximating density: π = S#η = N(0, C), where C−1 = LLᵀ

Connection to Linear Regression

◮ Normalize the diagonal: Sk(x) = Lkk(β1x1 + · · · + βk−1xk−1 + xk)
◮ Rewrite the optimization problem for the linear map parameters:

min_{Lkk>0, β}  Eπ[ ½ Lkk² (x1:k−1ᵀ β + xk)² − log |Lkk| ]

◮ Using samples from π:

β̂ ∈ arg min_β (1/2N) ‖x1:k−1 β + xk‖2² ,        L̂kk = ( (1/N) ‖x1:k−1 β̂ + xk‖2² )^(−1/2)
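A quick numerical check of this recipe on synthetic data (the factor L below is assumed for illustration; note I use the convention C−1 = LᵀL, under which Lx ∼ N(0, I) with L lower-triangular — transpose conventions for the Cholesky factor vary):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100_000
# Lower-triangular factor (assumed for illustration). Convention: if
# x ~ N(0, C) with C^{-1} = L^T L, then S(x) = L x is lower-triangular
# in its components and L x ~ N(0, I).
L_true = np.array([[1.0,  0.0, 0.0],
                   [0.5,  1.0, 0.0],
                   [0.0, -0.8, 2.0]])
z = rng.normal(size=(N, 3))
x = np.linalg.solve(L_true, z.T).T            # x ~ N(0, (L^T L)^{-1})

# Component k = 3: S^3(x) = L_33 (beta_1 x_1 + beta_2 x_2 + x_3),
# so the true (beta_1, beta_2) = (0, -0.4) and L_33 = 2.
A, b = x[:, :2], x[:, 2]
beta_hat = -np.linalg.lstsq(A, b, rcond=None)[0]   # argmin ||A beta + b||_2^2
resid = A @ beta_hat + b
L33_hat = np.mean(resid ** 2) ** -0.5
```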


SLIDE 10

Learning Transport Maps with Sparse Structure

Proposed Approach

◮ Add ℓ1-penalty for sparse linear regression (LASSO):

β̂ ∈ arg min_β (1/2N) ‖x1:k−1 β + xk‖2² + λN ‖β‖1
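A self-contained sketch of solving this ℓ1-penalized problem by iterative soft-thresholding (ISTA — a generic solver, not necessarily the one used in the talk; the data are synthetic, with a single true nonzero coefficient):

```python
import numpy as np

def lasso_ista(A, b, lam, n_iter=500):
    """Minimize (1/2N) ||A beta + b||_2^2 + lam * ||beta||_1
    by iterative soft-thresholding (ISTA)."""
    N = A.shape[0]
    step = N / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant of the gradient
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ beta + b) / N       # gradient of the smooth term
        u = beta - step * grad
        beta = np.sign(u) * np.maximum(np.abs(u) - step * lam, 0.0)  # prox of l1
    return beta

# Toy check (synthetic data): x_k depends on only one of ten predecessors.
rng = np.random.default_rng(5)
A = rng.normal(size=(500, 10))                    # samples of x_{1:k-1}
b = -0.7 * A[:, 3] + 0.1 * rng.normal(size=500)   # samples of x_k
beta_hat = lasso_ista(A, b, lam=0.05)             # recovers a single nonzero entry
```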

Existing Work in Filtering

◮ Learn the bandwidth of the inverse covariance C−1 using BIC [Ueno, 2009]
◮ Add an ℓ1-penalty to the negative log-likelihood of C−1 [Hou, 2016]
◮ Band or taper the Cholesky factor of C−1 [Nino-Ruiz, 2018]

Maps Generalize to non-Gaussian Densities

◮ Parametrize monotone nonlinear maps using:

Sk(x1, . . . , xk) = Σj βj ψj(x1:k−1) + ∫0^xk hα(x1:k−1, t) dt

◮ Add ℓ1-penalty to learn sparsity of β, α parameters


SLIDE 11

Theoretical Performance

Assumptions: sub-Gaussian density π and basis functions ψj(x)

Theorem [BZM]

For polynomial maps of degree m with sparsity s, with high probability

Eπ[ DKL( π(xk|x1:k−1) ‖ Sk#η ) ] ≲ s²m (log k) / N

Takeaways

◮ Accurate estimation is feasible in high dimensions with N ≪ k
◮ By the factorization property of the density, the per-conditional error bound ensures

DKL(π ‖ S#η) ≲ d · s²m (log d) / N

◮ ℓ2 regularization requires N = O(k) samples for each component


SLIDE 12

Numerical Results

◮ Map components: Sk(x) = Σj βjψj(x1:k−1) + αk xk
◮ Solve the ℓ1-penalized problem to estimate the map coefficients
◮ Compare to an oracle (known sparsity) and to no regularization

Total-order degree 2 Hermite basis ψj with random coefficients:

[Figure: estimation error with increasing N and with increasing d]
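The total-order degree-2 Hermite basis can be enumerated explicitly; a small sketch using probabilists' Hermite polynomials (He1(x) = x, He2(x) = x² − 1), with d = 4 inputs assumed for illustration:

```python
import numpy as np
from itertools import combinations

def he1(x):
    return x               # probabilists' Hermite He_1

def he2(x):
    return x ** 2 - 1.0    # probabilists' Hermite He_2

def total_degree2_hermite(X):
    """Total-order degree-2 basis: 1, He1(x_i), He2(x_i), He1(x_i)He1(x_j) for i < j."""
    N, d = X.shape
    cols = [np.ones(N)]
    cols += [he1(X[:, i]) for i in range(d)]
    cols += [he2(X[:, i]) for i in range(d)]
    cols += [he1(X[:, i]) * he1(X[:, j]) for i, j in combinations(range(d), 2)]
    return np.column_stack(cols)

# d = 4 gives 1 + 4 + 4 + 6 = 15 basis functions.
Phi = total_degree2_hermite(np.random.default_rng(6).normal(size=(1000, 4)))
```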

Accuracy extends to maps with nonlinear diagonal functions in practice


SLIDE 13

Transport Maps for Posterior Inference

Linear Gaussian Problem

◮ Prior: x ∼ N(µ, Σpr) with exponential covariance
◮ Likelihood: local observations y = Hx + ε with ε ∼ N(0, Γ)

Takeaway

◮ Learning sparse prior-to-posterior map matches oracle scaling


SLIDE 14

Two Approaches for Posterior Sampling

x|y∗ ∼ (Sx)#η        versus        x|y∗ ∼ T#πy,x  for  T = Sx(y∗, ·)−1 ∘ Sx

Takeaway

◮ Propagating forecast through composed maps has lower error


SLIDE 15

Performance of Stochastic Maps

Lorenz-96 Model

◮ d = 40 with F = 8, ∆tobs = 0.4 and 20 observations

Takeaway

◮ Best estimators adapt complexity to extract information from samples


SLIDE 16

Conclusion and Outlook

Summary

◮ Learned sparse transport maps for prior-to-posterior transformations
◮ Regularization via map sparsity extends to the nonlinear case
◮ Demonstrated log dependence of sample size on dimension

Outlook on Future Work

◮ Exploration of sparse nonlinear transports in filtering applications
◮ Relate approximation errors to RMSE and metrics on distributions


SLIDE 17


Thank You

Supported by the Air Force Office of Scientific Research
