Transport methods for sampling: low-dimensional structure and preconditioning


  1. Transport methods for sampling: low-dimensional structure and preconditioning
     Youssef Marzouk, joint work with Daniele Bigoni, Matthew Parno, Alessio Spantini, and Olivier Zahm
     Department of Aeronautics and Astronautics, Center for Computational Engineering, and Statistics and Data Science Center, Massachusetts Institute of Technology
     http://uqgroup.mit.edu
     Support from AFOSR, DARPA, and DOE
     12 July 2019

  2. Motivation: Bayesian inference in large-scale models
     Observations $y$, parameters $x$; Bayes' rule gives the posterior density
       $\pi_{\mathrm{pos}}(x) := \pi(x \mid y) \propto \pi(y \mid x)\, \pi_{\mathrm{pr}}(x)$
     - Goal: characterize the posterior distribution (density $\pi_{\mathrm{pos}}$)
     - This is a challenging task, since:
       - $x \in \mathbb{R}^n$ is typically high-dimensional (e.g., a discretized function)
       - $\pi_{\mathrm{pos}}$ is non-Gaussian
       - evaluations of the likelihood (and hence of $\pi_{\mathrm{pos}}$) may be expensive
       - $\pi_{\mathrm{pos}}$ can be evaluated only up to a normalizing constant

  3-4. Computational challenges
     - Extract information from the posterior (means, covariances, event probabilities, predictions) by evaluating posterior expectations (see the sketch after this slide):
       $\mathbb{E}_{\pi_{\mathrm{pos}}}[h(x)] = \int h(x)\, \pi_{\mathrm{pos}}(x)\, dx$
     - Key strategy for making this computationally tractable: efficient and structure-exploiting sampling schemes
     - This talk relates such schemes to notions of coupling and transport...
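For concreteness, a minimal sketch of the sample-average step: once a sampling scheme delivers (approximate) posterior draws, expectations become cheap averages. The 2-D Gaussian standing in for the posterior and the quantity of interest h below are illustrative assumptions, not part of the talk.

```python
import numpy as np

# Minimal sketch: given (approximate) posterior draws x_i ~ pi_pos,
# posterior expectations E[h(x)] become sample averages.
# A hypothetical 2-D Gaussian stands in for the posterior draws here.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean=[1.0, -2.0], cov=np.eye(2), size=10_000)

h = lambda x: x[:, 0] ** 2                          # hypothetical quantity of interest
print("posterior mean:       ", samples.mean(axis=0))
print("posterior covariance:\n", np.cov(samples.T))
print("E[h(x)] estimate:     ", h(samples).mean())  # ~ Var(x_1) + E[x_1]^2 = 2
```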

  5-8. Deterministic couplings of probability measures
     Core idea:
     - Choose a reference distribution $\eta$ (e.g., a standard Gaussian)
     - Seek a transport map $T : \mathbb{R}^n \to \mathbb{R}^n$ such that $T_\sharp \eta = \pi$
     - Equivalently, find $S = T^{-1}$ such that $S_\sharp \pi = \eta$
     - In principle, this enables exact (independent, unweighted) sampling, as in the sketch below!
     - Satisfying these conditions only approximately can still be useful!
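As a concrete, analytically solvable instance of the pushforward condition $T_\sharp \eta = \pi$: for a 1-D Gaussian target the exact map is affine. The target parameters mu and sigma below are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of sampling by pushforward: if T_# eta = pi, then z ~ eta
# implies T(z) ~ pi. For a 1-D Gaussian target pi = N(mu, sigma^2) and
# reference eta = N(0, 1), the exact map is affine: T(z) = mu + sigma * z.
mu, sigma = 2.0, 0.7                  # hypothetical target parameters
T = lambda z: mu + sigma * z          # exact transport map for this pair

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)      # independent reference samples
x = T(z)                              # independent, unweighted target samples
print(x.mean(), x.std())              # ~2.0 and ~0.7
```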

  9-10. Choice of transport map
     A useful building block is the Knothe-Rosenblatt rearrangement:
       $T(x) = \begin{bmatrix} T^1(x_1) \\ T^2(x_1, x_2) \\ \vdots \\ T^n(x_1, x_2, \ldots, x_n) \end{bmatrix}$
     - Unique triangular and monotone map satisfying $T_\sharp \eta = \pi$ for absolutely continuous $\eta, \pi$ on $\mathbb{R}^n$
     - Jacobian determinant is easy to evaluate
     - Monotonicity is essentially one-dimensional: $\partial_{x_k} T^k > 0$
     - "Exposes" marginals, enables conditional sampling...
     - Numerical approximations can employ a monotone parameterization guaranteeing $\partial_{x_k} T^k > 0$; for example (see the sketch below):
       $T^k(x_1, \ldots, x_k) = a_k(x_1, \ldots, x_{k-1}) + \int_0^{x_k} \exp\!\big(b_k(x_1, \ldots, x_{k-1}, w)\big)\, dw$
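A minimal numerical sketch of the monotone parameterization above, assuming simple linear choices for the hypothetical functions a_k and b_k, and approximating the integral with Gauss-Legendre quadrature. This is one plausible discretization, not the implementation used in the talk.

```python
import numpy as np

def monotone_component(a_k, b_k, x):
    """Evaluate T^k(x_1,...,x_k) = a_k(x_{<k}) + int_0^{x_k} exp(b_k(x_{<k}, w)) dw.

    a_k, b_k: callables for the unconstrained parameterized functions;
    x: array of length k. The integral is approximated by quadrature.
    """
    x_prev, x_k = x[:-1], x[-1]
    # 32-point Gauss-Legendre rule, mapped from [-1, 1] to [0, x_k]
    nodes, weights = np.polynomial.legendre.leggauss(32)
    w = 0.5 * x_k * (nodes + 1.0)
    integrand = np.exp([b_k(x_prev, wi) for wi in w])
    integral = 0.5 * x_k * np.dot(weights, integrand)
    return a_k(x_prev) + integral

# Hypothetical linear parameterizations, purely for illustration:
a = lambda x_prev: 0.5 * np.sum(x_prev)                 # a_k(x_1,...,x_{k-1})
b = lambda x_prev, w: 0.1 * np.sum(x_prev) + 0.3 * w    # b_k(x_1,...,x_{k-1}, w)

print(monotone_component(a, b, np.array([1.0, -0.5, 2.0])))  # T^3 at a test point
```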

  11-12. How to construct triangular maps?
     Construction #1: "maps from densities," i.e., a variational characterization of the direct map $T$ [Moselhy & Marzouk 2012]:
       $\min_{T \in \mathcal{T}^h_\triangle} D_{\mathrm{KL}}(T_\sharp \eta \,\|\, \pi) = \min_{T \in \mathcal{T}^h_\triangle} D_{\mathrm{KL}}(\eta \,\|\, T^{-1}_\sharp \pi)$
     - $\pi$ is the "target" density on $\mathbb{R}^n$; $\eta$ is, e.g., $\mathcal{N}(0, I_n)$
     - $\mathcal{T}^h_\triangle$ is a set of monotone lower triangular maps; as $h \to \infty$, it contains the Knothe-Rosenblatt rearrangement
     - The KL objective is an expectation with respect to the reference measure $\eta$; compute it via, e.g., Monte Carlo or sparse quadrature (see the sketch below)
     - Requires only unnormalized evaluations of $\pi$ and its gradients; no MCMC or importance sampling
     - In general non-convex, unless $\pi$ is log-concave
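To make Construction #1 concrete, the sketch below minimizes a Monte Carlo estimate of the KL objective over a 1-D affine monotone map, using a generic gradient-based optimizer (scipy.optimize.minimize). The Gaussian target and the affine map family are illustrative assumptions, far simpler than the triangular polynomial maps used in practice.

```python
import numpy as np
from scipy.optimize import minimize

# Minimal 1-D sketch of "maps from densities": fit an affine monotone map
# T(z) = alpha + exp(beta) * z so that T_# N(0,1) approximates a target pi,
# by minimizing a Monte Carlo estimate of D_KL(T_# eta || pi).
# Hypothetical unnormalized target: N(3, 0.5^2) up to a constant.
log_pi_bar = lambda x: -0.5 * ((x - 3.0) / 0.5) ** 2

rng = np.random.default_rng(0)
z = rng.standard_normal(1000)       # fixed reference samples (common random numbers)

def kl_objective(params):
    alpha, beta = params
    x = alpha + np.exp(beta) * z    # pushforward samples T(z)
    # E_eta[-log pi_bar(T(z)) - log T'(z)]; normalizing constants drop out
    return np.mean(-log_pi_bar(x)) - beta   # log T'(z) = beta, a constant here

result = minimize(kl_objective, x0=np.array([0.0, 0.0]))
alpha, beta = result.x
print(f"T(z) = {alpha:.3f} + {np.exp(beta):.3f} * z")   # expect ~ 3 + 0.5 z
```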

  13-16. Illustrative example (shown over four animation frames)
     $\min_T \mathbb{E}_\eta\big[{-\log \pi \circ T} - \textstyle\sum_k \log \partial_{x_k} T^k\big]$
     - Parameterized map $T \in \mathcal{T}^h_\triangle \subset \mathcal{T}_\triangle$
     - Optimize over the coefficients of the parameterization
     - Use gradient-based optimization
     - The posterior is in the tail of the reference
     [Figure: animation of the optimization iterates; only axis tick labels survive in this transcript]

  17-19. Useful features
     - Move samples; don't just reweight them
     - Independent and cheap samples: $x^i \sim \eta \Rightarrow T(x^i)$
     - Clear convergence criterion, even with an unnormalized target density $\bar{\pi}$ (see the sketch below):
       $D_{\mathrm{KL}}(T_\sharp \eta \,\|\, \pi) \approx \frac{1}{2}\, \mathrm{Var}_\eta\!\left[\log \frac{T^{-1}_\sharp \bar{\pi}}{\eta}\right]$
     - Can either accept the bias or reduce it by:
       - increasing the complexity of the map $T \in \mathcal{T}^h_\triangle$
       - sampling the pullback $T^{-1}_\sharp \pi$ using MCMC or importance sampling (more on this later)
     - Related transport constructions for inference and sampling: Stein variational gradient descent [Liu & Wang 2016; Detommaso et al. 2018], normalizing flows [Rezende & Mohamed 2015], SOS polynomial flow [Jaini et al. 2019], Gibbs flow [Heng et al. 2015], particle flow filter [Reich 2011], implicit sampling [Chorin et al. 2009-2015], etc.
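A minimal sketch of the variance diagnostic above, reusing the 1-D affine setup from the earlier sketch (map parameters hard-coded to the values that sketch converges to). Additive normalizing constants drop out of the variance, so only the unnormalized target is needed.

```python
import numpy as np

# Variance diagnostic: D_KL(T_# eta || pi) ~= 0.5 * Var_eta[log(T^{-1}_# pi_bar / eta)].
log_pi_bar = lambda x: -0.5 * ((x - 3.0) / 0.5) ** 2   # unnormalized log target
log_eta = lambda z: -0.5 * z**2                        # standard normal, up to a constant

alpha, beta = 3.0, np.log(0.5)                         # (approximately) fitted map
T = lambda z: alpha + np.exp(beta) * z

rng = np.random.default_rng(1)
z = rng.standard_normal(5000)
# log of the pullback density T^{-1}_# pi_bar at z, minus log eta(z);
# log T'(z) = beta for this affine map
log_ratio = log_pi_bar(T(z)) + beta - log_eta(z)
print("estimated KL divergence:", 0.5 * np.var(log_ratio))   # ~0 for a well-fit map
```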

  20-21. How to construct triangular maps?
     Construction #2: "maps from samples":
       $\min_{S \in \mathcal{S}^h_\triangle} D_{\mathrm{KL}}(S_\sharp \pi \,\|\, \eta) = \min_{S \in \mathcal{S}^h_\triangle} D_{\mathrm{KL}}(\pi \,\|\, S^{-1}_\sharp \eta)$
     - Suppose we have Monte Carlo samples $\{x^i\}_{i=1}^M \sim \pi$
     - For a standard Gaussian $\eta$, this problem is convex and separable
     - This is density estimation via transport! (cf. Tabak & Turner 2013)
     - Equivalent to maximum likelihood estimation of $S$, where $S^{-1}_\sharp \eta$ is the pullback of $\eta$ through $S$:
       $\hat{S} \in \arg\max_{S \in \mathcal{S}^h_\triangle} \frac{1}{M} \sum_{i=1}^M \log \big(S^{-1}_\sharp \eta\big)(x^i), \qquad \eta = \mathcal{N}(0, I_n)$
     - Each component $\hat{S}^k$ of $\hat{S}$ can be computed separately, via smooth convex optimization (see the sketch below):
       $\hat{S}^k \in \arg\min_{S^k \in \mathcal{S}^h_{\triangle,k}} \frac{1}{M} \sum_{i=1}^M \Big[ \tfrac{1}{2}\, S^k(x^i)^2 - \log \partial_k S^k(x^i) \Big]$
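A minimal 1-D sketch of the separable objective in Construction #2, fitting an affine monotone map $S(x) = a + e^{c} x$ to hypothetical target samples; with this parameterization $\log \partial_x S = c$, so the per-sample objective $\tfrac{1}{2} S(x)^2 - \log \partial_x S$ is smooth, and convex in $(a, e^c)$.

```python
import numpy as np
from scipy.optimize import minimize

# Minimal 1-D sketch of "maps from samples": given samples x_i ~ pi, fit a
# monotone map S(x) = a + exp(c) * x so that S_# pi ~= N(0, 1), by minimizing
# the sample average of 0.5 * S(x)^2 - log S'(x).
rng = np.random.default_rng(2)
x = 3.0 + 0.5 * rng.standard_normal(2000)    # hypothetical samples, pi = N(3, 0.5^2)

def objective(params):
    a, c = params
    s = a + np.exp(c) * x                    # S(x_i)
    return np.mean(0.5 * s**2) - c           # log S'(x) = c for all x

result = minimize(objective, x0=np.array([0.0, 0.0]))
a, c = result.x
print(f"S(x) = {a:.3f} + {np.exp(c):.3f} * x")   # expect ~ -6 + 2x: maps N(3, 0.25) to N(0, 1)
```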

  22. Low-dimensional structure of transport maps
     Underlying challenge: maps in high dimensions.
     - Major bottleneck: representation of the map, e.g., the cardinality of the map basis
     - How can we make the construction and representation of high-dimensional transports tractable?
