SLIDE 1

Transport methods for sampling: low-dimensional structure and preconditioning

Youssef Marzouk joint work with Daniele Bigoni, Matthew Parno, Alessio Spantini, & Olivier Zahm

Department of Aeronautics and Astronautics · Center for Computational Engineering · Statistics and Data Science Center · Massachusetts Institute of Technology · http://uqgroup.mit.edu

Support from AFOSR, DARPA, DOE

12 July 2019


SLIDE 2

Motivation: Bayesian inference in large-scale models

◮ Bayes' rule: observations $y$, parameters $x$

  $\pi_{\mathrm{pos}}(x) := \pi(x \mid y) \propto \pi(y \mid x)\,\pi_{\mathrm{pr}}(x)$

◮ Goal: characterize the posterior distribution (density $\pi_{\mathrm{pos}}$)
◮ This is a challenging task since:
  ◮ $x \in \mathbb{R}^n$ is typically high-dimensional (e.g., a discretized function)
  ◮ $\pi_{\mathrm{pos}}$ is non-Gaussian
  ◮ evaluations of the likelihood (hence $\pi_{\mathrm{pos}}$) may be expensive
  ◮ $\pi_{\mathrm{pos}}$ can be evaluated up to a normalizing constant


SLIDES 3–4

Computational challenges

◮ Extract information from the posterior (means, covariances, event probabilities, predictions) by evaluating posterior expectations:

  $\mathbb{E}_{\pi_{\mathrm{pos}}}[h(x)] = \int h(x)\,\pi_{\mathrm{pos}}(x)\,dx$

◮ Key strategy for making this computationally tractable: efficient and structure-exploiting sampling schemes
◮ This talk: relate these schemes to notions of coupling and transport. . .

SLIDES 5–8

Deterministic couplings of probability measures

[Diagram: the map T pushes the reference η forward to the target π; the inverse map S = T⁻¹ pulls π back to η]

Core idea

◮ Choose a reference distribution η (e.g., standard Gaussian)
◮ Seek a transport map $T : \mathbb{R}^n \to \mathbb{R}^n$ such that $T_\sharp \eta = \pi$
◮ Equivalently, find $S = T^{-1}$ such that $S_\sharp \pi = \eta$
◮ In principle, enables exact (independent, unweighted) sampling!
◮ Satisfying these conditions only approximately can still be useful!
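To make the pushforward condition concrete in one dimension, here is a minimal sketch, not one of the constructions discussed later: with $\eta = N(0,1)$ and a target with CDF $F$, the classical inverse-CDF coupling $T = F^{-1} \circ \Phi$ satisfies $T_\sharp\eta = \pi$, so mapped reference samples are exact target samples. The Gamma target below is an illustrative choice.

```python
import numpy as np
from scipy import stats

eta = stats.norm()            # reference: standard Gaussian
pi = stats.gamma(a=3.0)       # illustrative 1D target

def T(z):
    # inverse-CDF coupling: the one-dimensional Knothe-Rosenblatt map
    return pi.ppf(eta.cdf(z))

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)   # independent reference samples
x = T(z)                           # exact, unweighted target samples
print(x.mean(), pi.mean())         # both close to 3.0
```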

SLIDES 9–10

Choice of transport map

A useful building block is the Knothe–Rosenblatt rearrangement:

$$T(x) = \begin{bmatrix} T^1(x_1) \\ T^2(x_1, x_2) \\ \vdots \\ T^n(x_1, x_2, \ldots, x_n) \end{bmatrix}$$

◮ Unique triangular and monotone map satisfying $T_\sharp \eta = \pi$ for absolutely continuous η, π on $\mathbb{R}^n$
◮ Jacobian determinant easy to evaluate
◮ Monotonicity is essentially one-dimensional: $\partial_{x_k} T^k > 0$
◮ "Exposes" marginals, enables conditional sampling. . .
◮ Numerical approximations can employ a monotone parameterization guaranteeing $\partial_{x_k} T^k > 0$. For example (see the sketch below):

$$T^k(x_1, \ldots, x_k) = a_k(x_1, \ldots, x_{k-1}) + \int_0^{x_k} \exp\big(b_k(x_1, \ldots, x_{k-1}, w)\big)\, dw$$
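A minimal numerical sketch of this integrated-exponential parameterization: the coefficient functions `a` and `b` below are hypothetical stand-ins for, e.g., polynomial expansions, and the integral is approximated with a simple trapezoidal rule.

```python
import numpy as np

def monotone_component(x_prev, xk, a, b, n_quad=64):
    """Evaluate T^k(x_{1:k}) = a(x_prev) + int_0^{xk} exp(b(x_prev, w)) dw.

    The strictly positive integrand makes the result increasing in xk
    by construction; the integral uses a uniform trapezoidal rule.
    """
    w = np.linspace(0.0, xk, n_quad)        # quadrature nodes on [0, xk]
    integrand = np.exp(b(x_prev, w))        # positive by construction
    dw = w[1] - w[0]
    return a(x_prev) + 0.5 * dw * np.sum(integrand[1:] + integrand[:-1])

# Hypothetical low-order coefficient functions, for illustration only
a = lambda xp: 0.5 * xp[0]
b = lambda xp, w: 0.1 * xp[0] + 0.2 * w
print(monotone_component(np.array([1.0]), 2.0, a, b))
```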

SLIDES 11–12

How to construct triangular maps?

Construction #1: "maps from densities," i.e., a variational characterization of the direct map T [Moselhy & M 2012]:

$$\min_{T \in \mathcal{T}^h_\triangle} D_{\mathrm{KL}}\big( T_\sharp \eta \,\|\, \pi \big) = \min_{T \in \mathcal{T}^h_\triangle} D_{\mathrm{KL}}\big( \eta \,\|\, (T^{-1})_\sharp \pi \big)$$

◮ π is the "target" density on $\mathbb{R}^n$; η is, e.g., $N(0, I_n)$
◮ $\mathcal{T}^h_\triangle$ is a set of monotone lower triangular maps
◮ $\mathcal{T}^{h \to \infty}_\triangle$ contains the Knothe–Rosenblatt rearrangement
◮ Expectation is with respect to the reference measure η
◮ Compute via, e.g., Monte Carlo or sparse quadrature
◮ Uses unnormalized evaluations of π and its gradients
◮ No MCMC or importance sampling
◮ In general non-convex, unless π is log-concave

SLIDES 13–16

Illustrative example

$$\min_{T} \; \mathbb{E}_\eta\Big[ -\log \pi \circ T - \sum_k \log \partial_{x_k} T^k \Big]$$

◮ Parameterized map $T \in \mathcal{T}^h_\triangle \subset \mathcal{T}_\triangle$
◮ Optimize over the coefficients of the parameterization (a minimal sketch follows below)
◮ Use gradient-based optimization
◮ The posterior is in the tail of the reference

[Figure: successive iterations of the optimization, showing the pushforward $T_\sharp \eta$ converging to a banana-shaped target]
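A self-contained sketch of this maps-from-densities optimization for a two-dimensional banana-shaped target. The target, the quadratic triangular-map parameterization, and the sample size are illustrative assumptions, not the ones used on the slides.

```python
import numpy as np
from scipy.optimize import minimize

# Unnormalized 2D "banana" target: x2 concentrates around a quadratic of x1
def log_pi(x):
    x1, x2 = x[:, 0], x[:, 1]
    return -0.5 * x1**2 - 0.5 * (x2 - x1**2)**2 / 0.25

# Triangular map with coefficients c:
#   T^1(z1) = c0 + exp(c1)*z1
#   T^2(z1, z2) = c2 + c3*z1 + c4*z1**2 + exp(c5)*z2
# Exponentiating the diagonal coefficients enforces monotonicity.
def T(c, z):
    t1 = c[0] + np.exp(c[1]) * z[:, 0]
    t2 = c[2] + c[3] * z[:, 0] + c[4] * z[:, 0]**2 + np.exp(c[5]) * z[:, 1]
    return np.column_stack([t1, t2])

def objective(c, z):
    # Monte Carlo estimate of E_eta[-log pi(T(z)) - sum_k log dT^k/dz_k]
    logdet = c[1] + c[5]                   # log of the (constant) diagonal terms
    return np.mean(-log_pi(T(c, z))) - logdet

rng = np.random.default_rng(0)
z = rng.standard_normal((5000, 2))         # reference samples, eta = N(0, I)
res = minimize(objective, np.zeros(6), args=(z,), method="BFGS")
samples = T(res.x, z)                      # approximate target samples
```

Note that this particular parameterization can represent the banana target exactly, so the optimized pushforward matches the target up to Monte Carlo error.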

SLIDES 17–19

Useful features

◮ Move samples; don't just reweigh them
◮ Independent and cheap samples: $x_i \sim \eta \Rightarrow T(x_i)$
◮ Clear convergence criterion, even with an unnormalized target density $\bar\pi$ (a sample estimate is sketched below):

$$D_{\mathrm{KL}}\big( T_\sharp \eta \,\|\, \pi \big) \approx \frac{1}{2}\, \mathrm{Var}_\eta\Big[ \log \frac{\eta}{(T^{-1})_\sharp \bar\pi} \Big]$$

◮ Can either accept bias or reduce it by:
  ◮ Increasing the complexity of the map $T \in \mathcal{T}^h_\triangle$
  ◮ Sampling the pullback $(T^{-1})_\sharp \pi$ using MCMC or importance sampling (more on this later)
◮ Related transport constructions for inference and sampling: Stein variational gradient descent [Liu & Wang 2016; Detommaso et al. 2018], normalizing flows [Rezende & Mohamed 2015], SOS polynomial flow [Jaini et al. 2019], Gibbs flow [Heng et al. 2015], particle flow filter [Reich 2011], implicit sampling [Chorin et al. 2009–2015], etc.
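A minimal sketch of this variance diagnostic, assuming the user supplies callables for the map, its log-Jacobian determinant, the unnormalized target log-density, and the reference log-density:

```python
import numpy as np

def kl_diagnostic(T, log_det_T, log_pi_bar, log_eta, z):
    """Estimate D_KL(T_sharp eta || pi) ~ 0.5 * Var_eta[log eta - log pullback]
    from reference samples z; works with an unnormalized target log_pi_bar,
    since the unknown normalizing constant only shifts the mean.
    """
    # log of the pullback density (T^{-1})_sharp pi_bar evaluated at z:
    #   log pi_bar(T(z)) + log |det grad T(z)|
    log_pullback = log_pi_bar(T(z)) + log_det_T(z)
    return 0.5 * np.var(log_eta(z) - log_pullback)
```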

SLIDES 20–21

How to construct triangular maps?

Construction #2: "maps from samples"

$$\min_{S \in \mathcal{S}^h} D_{\mathrm{KL}}\big( S_\sharp \pi \,\|\, \eta \big) = \min_{S \in \mathcal{S}^h} D_{\mathrm{KL}}\big( \pi \,\|\, (S^{-1})_\sharp \eta \big)$$

◮ Suppose we have Monte Carlo samples $\{x_i\}_{i=1}^M \sim \pi$
◮ For standard Gaussian η, this problem is convex and separable
◮ This is density estimation via transport! (cf. Tabak & Turner 2013)
◮ Equivalent to maximum likelihood estimation of S:

$$\widehat{S} \in \arg\max_{S \in \mathcal{S}^h} \frac{1}{M} \sum_{i=1}^M \log \underbrace{\big[(S^{-1})_\sharp \eta\big]}_{\text{pullback}}(x_i), \qquad \eta = N(0, I_n)$$

◮ Each component $S^k$ of S can be computed separately, via smooth convex optimization (a sketch follows below):

$$\widehat{S}^k \in \arg\min_{S^k \in \mathcal{S}^h_{\triangle,k}} \frac{1}{M} \sum_{i=1}^M \Big[ \frac{1}{2} S^k(x_i)^2 - \log \partial_k S^k(x_i) \Big]$$
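A minimal sketch of fitting one component, under the simplifying assumption of the parameterization $S^k(x) = \sum_j \beta_j \psi_j(x_{1:k-1}) + \alpha x_k$ with constant $\alpha > 0$, for which $\partial_k S^k = \alpha$ and the objective above is smooth and convex in $(\beta, \alpha)$:

```python
import numpy as np
from scipy.optimize import minimize

def fit_component(Psi, xk):
    """Fit S^k(x) = Psi @ beta + alpha * x_k (alpha > 0) by minimizing
    (1/M) sum_i [ 0.5 * S^k(x_i)^2 - log alpha ].

    Psi : (M, p) matrix of features psi_j(x_{1:k-1}) at the samples
    xk  : (M,) values of the k-th variable
    """
    M, p = Psi.shape

    def objective(theta):
        beta, alpha = theta[:p], theta[p]
        Sk = Psi @ beta + alpha * xk
        return 0.5 * np.mean(Sk**2) - np.log(alpha)

    theta0 = np.zeros(p + 1); theta0[p] = 1.0
    bounds = [(None, None)] * p + [(1e-8, None)]   # keep alpha positive
    res = minimize(objective, theta0, method="L-BFGS-B", bounds=bounds)
    return res.x[:p], res.x[p]

# Illustrative usage on stand-in samples
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
beta, alpha = fit_component(np.column_stack([np.ones(500), X[:, 0]]), X[:, 1])
```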

SLIDES 22–23

Low-dimensional structure of transport maps

Underlying challenge: maps in high dimensions

◮ Major bottleneck: representation of the map, e.g., the cardinality of the map basis
◮ How to make the construction/representation of high-dimensional transports tractable?

Main ideas:

1. Exploit Markov structure of the target distribution
   ◮ Leads to sparsity and/or decomposability of transport maps [Spantini, Bigoni, & M JMLR 2018]
2. Exploit certain low-rank structure
   ◮ Near-identity or "lazy" maps [Bigoni et al. arXiv:1906.00031]

SLIDE 24

Markov random fields

◮ Let $Z_1, \ldots, Z_n$ be random variables with joint density π > 0
◮ $G = (V, E)$ encodes conditional independence (an I-map for π):

  $(i, j) \notin E$ iff $Z_i \perp\!\!\!\perp Z_j \mid Z_{V \setminus \{i, j\}}$

◮ Theorem [SBM 2018]: Define G s.t. $(i, j) \notin E$ iff $\partial^2_{x_i x_j} \log \pi = 0$. Then the resulting G is the unique minimal I-map for π.

[Figure: an undirected graph on the variables, illustrating the edge set E]

SLIDE 25

Sparsity of transport maps

◮ Focus on the inverse triangular map S, where $S_\sharp \pi = \eta$
◮ Theorem [SBM 2018]: S (a nonlinear function) inherits the same sparsity pattern as the Cholesky factor of the (properly scaled) incidence matrix of a graphical model for π, provided that $\eta(x) = \prod_i \eta(x_i)$

$$S(x) = \begin{bmatrix} S^1(x_1) \\ S^2(x_1, x_2) \\ S^3(x_1, x_2, x_3) \\ \vdots \\ S^n(x_1, x_2, \ldots, x_n) \end{bmatrix} \;\Longrightarrow\; \text{a sparse map in which each component } S^k \text{ retains only a subset of its arguments}$$

SLIDE 26

How to compute the sparsity pattern

[Figure: a sequence of marginal graphs $G_5, G_4, G_3, G_2$ on five nodes, each obtained from the previous by eliminating the highest-numbered node]

◮ Compute marginal graphs: $G_{i-1}$ is obtained from $G_i$ by removing node i and by turning its neighborhood into a clique (like variable elimination); a sketch of this procedure follows below
◮ Sparsity of the inverse transport: the i-th component of S can depend, at most, on the variables in a neighborhood of node i in $G_i$
◮ Sparsity depends on the ordering of the variables (similar to heuristics for sparse Cholesky factorization)
◮ The resulting pattern is $P_{kj} = \partial_{x_j} S^k$
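A minimal sketch of this elimination procedure on an undirected graph, using plain Python sets; the chain-graph example at the end is illustrative.

```python
def sparsity_pattern(edges, n):
    """For each k, compute the variables that the inverse-map component S^k
    may depend on, via variable elimination on the undirected graph of pi.

    edges : set of frozenset({i, j}) pairs, nodes labeled 1..n
    Returns a dict: k -> set of variable indices (including k itself).
    """
    G = {i: set() for i in range(1, n + 1)}
    for e in edges:
        i, j = tuple(e)
        G[i].add(j); G[j].add(i)

    pattern = {}
    for i in range(n, 0, -1):
        nbrs = {j for j in G[i] if j < i}      # neighborhood of i in G_i
        pattern[i] = nbrs | {i}                # S^i depends on these, at most
        for a in nbrs:                         # eliminate node i: make its
            G[a] |= (nbrs - {a})               # remaining neighbors a clique
            G[a].discard(i)
        del G[i]
    return pattern

# Example: a chain 1-2-3-4-5 gives S^k depending only on (x_{k-1}, x_k)
chain = {frozenset({i, i + 1}) for i in range(1, 5)}
print(sparsity_pattern(chain, 5))
```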

SLIDES 27–28

Decomposable transport maps

◮ Definition: a decomposable transport is a map $T = T_1 \circ \cdots \circ T_k$ that factorizes as the composition of finitely many maps of low effective dimension that are triangular (up to a permutation), e.g.,

$$T(x) = \underbrace{\begin{bmatrix} A_1(x_1, x_2, x_3) \\ B_1(x_2, x_3) \\ C_1(x_3) \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}}_{T_1} \circ \underbrace{\begin{bmatrix} x_1 \\ A_2(x_2, x_3, x_4, x_5) \\ B_2(x_3, x_4, x_5) \\ C_2(x_4, x_5) \\ D_2(x_5) \\ x_6 \end{bmatrix}}_{T_2} \circ \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ A_3(x_4) \\ B_3(x_4, x_5) \\ C_3(x_4, x_5, x_6) \end{bmatrix}}_{T_3}$$

◮ Theorem [SBM 2018]: Decomposable graphical models for π lead to decomposable direct maps T, provided that $\eta(x) = \prod_i \eta(x_i)$
◮ Example graph decomposition $V = (A, S, B)$; the effective dimension of each component map is $|A \cup S|$

[Figure: a six-node graph decomposed into subsets A and B separated by S]

SLIDE 29

Transport maps and graphical models: key message

◮ Inverse maps: enforce sparsity in the approximation space $\mathcal{S}_\triangle$, i.e., in solving $\min_{S \in \mathcal{S}_\triangle} D_{\mathrm{KL}}\big( \pi \,\|\, (S^{-1})_\sharp \eta \big)$
◮ Can also be used for structure learning in non-Gaussian graphical models [Morrison, Baptista, & M NeurIPS 2017]
◮ Direct maps: enforce decomposable structure in the approximation space $\mathcal{T}_\triangle$, i.e., when solving $\min_{T \in \mathcal{T}_\triangle} D_{\mathrm{KL}}\big( T_\sharp \eta \,\|\, \pi \big)$
◮ A general tool for modeling and computation with non-Gaussian Markov random fields

SLIDE 30

Transport maps and state-space models

[Figure: state-space graphical model with hidden states $Z_0, \ldots, Z_N$, observations $Y_0, \ldots, Y_N$, and static parameters Θ]

◮ In many situations, elements of the composition $T = T_1 \circ T_2 \circ \cdots \circ T_k$ can be constructed sequentially
◮ Yields new algorithms for smoothing and joint state-parameter inference in state-space models [SBM 2018; Houssineau, Jasra, Singh 2018]
◮ Recent offshoot: nonlinear ensemble filtering [arXiv:1907.00389]

SLIDE 31

Remainder of the talk: two vignettes

1. Exploiting the low-dimensional structure of transport
   ◮ Sparsity
   ◮ Decomposability
   ◮ Low rank
2. What to do with approximate maps?
   ◮ Preconditioned/adaptive MCMC

SLIDE 32

Vignette #1: low-rank structure

◮ Let $U = [U_r \; U_\perp] \in \mathbb{R}^{n \times n}$ be a unitary matrix, with $U_r \in \mathbb{R}^{n \times r}$. A lazy map $T : \mathbb{R}^n \to \mathbb{R}^n$ takes the form

  $T(z) = U_r\, \tau(z_1, \ldots, z_r) + U_\perp z_\perp$

  for some diffeomorphism $\tau : \mathbb{R}^r \to \mathbb{R}^r$.
◮ A map $T \in \mathcal{T}_r(U)$ departs from the identity only on an r-dimensional subspace
◮ Proposition: For any lazy map $T \in \mathcal{T}_r(U)$, there exists a strictly positive function $f : \mathbb{R}^r \to \mathbb{R}_+$ such that $T_\sharp \eta(x) = f(U_r^\top x)\, \eta(x)$ for all $x \in \mathbb{R}^n$, where $\eta = N(0, I_n)$. Conversely, any density of the form $f(U_r^\top x)\, \eta(x)$ for some $f : \mathbb{R}^r \to \mathbb{R}_+$ admits a lazy-map representation.

SLIDE 33

Low-rank structure

Why would such structure (approximately) appear?

◮ Bayesian inverse problems: the data are only partially informative; the posterior departs from the prior primarily on a low-dimensional subspace
◮ Formalized by the likelihood-informed subspace [Cui et al. 2014]; see also active subspaces [Constantine et al. 2015] and recent refinements/connections [Zahm et al. 2018]

SLIDE 34

Error bound and subspace

How to find a good $U_r$? (A sample-based sketch follows below.)

◮ Define

$$H_\pi := \int \nabla \log \frac{\pi}{\eta} \left( \nabla \log \frac{\pi}{\eta} \right)^{\!\top} d\pi\,.$$

◮ Let $(\lambda_i, u_i)$ be the i-th eigenpair of $H_\pi$ and put $U_r = [u_1 \; u_2 \; \cdots \; u_r]$.
◮ Theorem [Zahm et al. 2018]:

$$D_{\mathrm{KL}}\big( \pi \,\|\, T^\star_\sharp \eta \big) \le \frac{1}{2}\big( \lambda_{r+1} + \cdots + \lambda_d \big),$$

  where $T^\star_\sharp \eta = f^\star(U_r^\top x)\, \eta(x)$ and $f^\star(z_r) = \mathbb{E}_{X \sim \eta}\Big[ \frac{\pi(X)}{\eta(X)} \,\Big|\, U_r^\top X = z_r \Big]$.
◮ Good approximation when the spectrum of $H_\pi$ decays quickly
◮ Uses a ridge approximation of $d\pi/d\eta$ (e.g., the likelihood), with the optimal profile function $f^\star$
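A minimal Monte Carlo sketch of this construction. It assumes (approximate) samples from π and a callable for $\nabla \log(\pi/\eta)$ are available; in practice these might come from the prior or from MCMC iterates.

```python
import numpy as np

def informed_subspace(grad_log_ratio, samples, r):
    """Monte Carlo estimate of H = E_pi[g g^T], g = grad log(pi/eta),
    its dominant eigenvectors U_r, and the KL error bound
    0.5 * (lambda_{r+1} + ... + lambda_n).

    grad_log_ratio : callable returning grad log(pi/eta) at a point
    samples        : (M, n) array of (approximate) samples from pi
    """
    G = np.array([grad_log_ratio(x) for x in samples])   # (M, n) gradients
    H = G.T @ G / len(samples)                           # sample average of g g^T
    lam, U = np.linalg.eigh(H)                           # ascending eigenvalues
    lam, U = lam[::-1], U[:, ::-1]                       # sort descending
    bound = 0.5 * lam[r:].sum()                          # tail-sum KL bound
    return U[:, :r], lam, bound
```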

SLIDE 35

Layers of lazy maps

◮ What if the $(\lambda_i)$ do not decay quickly? What if we are limited to small r?
◮ Answer: layers of lazy maps, via a greedy construction (sketched in code below):
  ◮ Given $(\pi, \eta, r_1)$: compute $H_\pi$ and construct a first lazy map $T_1$
  ◮ Pull back π through $T_1$: $\pi_2 := (T_1^{-1})_\sharp \pi$
  ◮ Given $(\pi_2, \eta, r_2)$: compute $H_{\pi_2}$ and construct a next lazy map $T_2$ . . .
  ◮ Generic iteration: at stage ℓ, build a lazy map to the pullback $\pi_\ell := \big( (T_1 \circ T_2 \circ \cdots \circ T_{\ell-1})^{-1} \big)_\sharp \pi$
  ◮ Stop when $\frac{1}{2} \mathrm{Tr}(H_{\pi_\ell}) < \epsilon$

[Figure: inputs passed through alternating rotations and low-dimensional lazy maps to produce the outputs]
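A minimal sketch of the greedy loop. The interfaces `estimate_H`, `build_lazy_map`, and the map methods `.forward` / `.log_det` are hypothetical placeholders for whatever fitting routine (e.g., the maps-from-densities construction) is used for each layer.

```python
import numpy as np

def greedy_lazy_layers(log_pi, estimate_H, build_lazy_map, r, eps, max_layers=20):
    """Greedy construction of T = T_1 o ... o T_L from rank-r lazy layers.

    estimate_H(log_target)        -> Monte Carlo estimate of H for that target
    build_lazy_map(log_target, U) -> fitted lazy map with .forward(z) and
                                     .log_det(z) (hypothetical API)
    """
    def pullback(log_target, T):
        # log density of the pullback: log pi_l(T(z)) + log|det grad T(z)|
        return lambda z: log_target(T.forward(z)) + T.log_det(z)

    layers, log_target = [], log_pi
    for _ in range(max_layers):
        H = estimate_H(log_target)
        lam, U = np.linalg.eigh(H)            # ascending eigenvalues
        if 0.5 * lam.sum() < eps:             # trace stopping criterion
            break
        Ur = U[:, ::-1][:, :r]                # top-r informed directions
        T = build_lazy_map(log_target, Ur)
        layers.append(T)
        log_target = pullback(log_target, T)  # next stage's target
    return layers
```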

SLIDE 36

Layers of lazy maps

Example: rotated "banana" target distribution, r = 1 maps

[Figure: the target π alongside the pushforwards $T_\sharp \eta$ after each successive rank-1 lazy layer, progressively matching the target]

SLIDE 37

Example: log-Gaussian Cox process

[Figure: the true intensity field $\Lambda^\star$ with observations $y^\star$; realizations of $\Lambda \sim \pi_\Lambda$; and realizations of $\Lambda \mid y^\star$]

SLIDE 38

Example: log-Gaussian Cox process

◮ Parameter dimension n = 4096, 30 observations; fixed ranks r

[Figure: convergence of $\frac{1}{2} \mathrm{Tr}(H_{\pi_\ell})$ over lazy iterations for ranks r = 1, 3, 5, and the spectra of $H_{\pi_\ell}$ across iterations]

SLIDE 39

Example: elliptic PDE Bayesian inverse problem

$$-\nabla \cdot \big( e^{\kappa(x)} \nabla u(x) \big) = 0 \quad \text{for } x \in D := [0, 1]^2,$$
$$u(x) = 0 \ \text{for } x_1 = 0, \qquad u(x) = 1 \ \text{for } x_1 = 1, \qquad \frac{\partial u(x)}{\partial n} = 0 \ \text{for } x_2 \in \{0, 1\}$$

◮ Infer κ(x), discretized with n = 2601 parameters; 81 observations; lazy maps of r ≤ 4 and polynomial degree up to 2

[Figure: the solution u(x) with observation locations; convergence of $\frac{1}{2} \mathrm{Tr}(H_{\pi_\ell})$ and of the variance diagnostic over lazy iterations; posterior realizations of κ(x)]

SLIDES 40–41

Vignette #2: preconditioning

What to do when $T_\sharp \eta \neq \pi$?

◮ Maybe close enough? Can evaluate the variance diagnostic $\mathrm{Var}_\eta\big[ \log\big( \eta / (T^{-1})_\sharp \bar\pi \big) \big]$, bound $\mathrm{Tr}\big( H_{(T^{-1})_\sharp \pi} \big)$, etc.
◮ Enrich T, e.g., add a layer or expand $\mathcal{T}^h_\triangle$ in the given layer
◮ Work on the pullback: treat $(T^{-1})_\sharp \pi$ with an asymptotically exact scheme, e.g., Markov chain Monte Carlo

One possible construction: transport-accelerated MCMC

◮ The transport map "preconditions" the MCMC target; use the MCMC iterates in the maps-from-samples construction
◮ Can be understood in the framework of adaptive MCMC

SLIDE 42

Preconditioning MCMC

◮ An effective MCMC proposal is one adapted to the target
◮ Can we transform proposals or, equivalently, targets for better sampling?

SLIDE 43

Recall the maps-from-samples construction

◮ Suppose we have samples $x_i$ from the target π:

$$\widehat{S} \in \arg\max_{S \in \mathcal{S}^h} \frac{1}{M} \sum_{i=1}^M \log\big[ (S^{-1})_\sharp \eta \big](x_i), \qquad \eta = N(0, I_n)$$

◮ Easy to solve: convex and separable, regardless of the form of π
◮ View $\widehat{S}_\sharp \pi$ as the "preconditioned" target
◮ In the MCMC setting, $\{x_i\}_{i=1}^M$ comprises dependent MCMC samples
◮ $\widehat{S}_\sharp \pi$ may be far from standard Gaussian for small M and/or a crude $\mathcal{S}^h$

SLIDES 44–46

Map-accelerated MCMC

• Ingredient #1: static map
  – Perform MCMC in the reference space, on the "preconditioned" density $S_\sharp \pi$
  – A simple proposal $q_r$ in reference space (e.g., a random walk) corresponds to a more complex, tailored proposal directly on the target distribution
  – Metropolis–Hastings acceptance ratio (one such step is sketched below):

$$\alpha = \frac{\pi\big( S^{-1}(r') \big)\, \big| \nabla S^{-1} \big|_{r'}\; q_r(r \mid r')}{\pi\big( S^{-1}(r) \big)\, \big| \nabla S^{-1} \big|_{r}\; q_r(r' \mid r)}$$

[Figure: the reference density $\tilde p(r)$ and the target π(θ) linked by the map; a proposal $r \to r'$ in reference space induces a proposal $\theta \to \theta'$ on the target]
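A minimal sketch of one map-preconditioned random-walk Metropolis step. The callables `S_inv` and `log_det_S_inv` are assumed to be supplied by the fitted map, and the symmetric proposal makes the $q_r$ terms in the ratio above cancel.

```python
import numpy as np

def map_rwm_step(r, log_pi, S_inv, log_det_S_inv, step, rng):
    """One random-walk Metropolis step in reference space, targeting the
    'preconditioned' target S_sharp(pi).

    log_pi        : unnormalized log target density on theta
    S_inv         : maps a reference point r to theta = S^{-1}(r)
    log_det_S_inv : log |det grad S^{-1}(r)|
    """
    def log_pullback(r):
        return log_pi(S_inv(r)) + log_det_S_inv(r)

    r_prop = r + step * rng.standard_normal(r.shape)   # symmetric proposal
    log_alpha = log_pullback(r_prop) - log_pullback(r)
    if np.log(rng.uniform()) < log_alpha:
        return r_prop, True
    return r, False
```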
SLIDES 47–48

Map-accelerated MCMC

• Ingredient #2: adaptive map
  – Update the map with each MCMC iteration: more samples, a more accurate map S, better proposals
  – Adaptive MCMC [Haario 2001, Andrieu 2006], but with a nonlinear transformation to capture non-Gaussian structure

[Figure: the map $S_k$ and preconditioned target $(S_k)_\sharp \pi$ being refined to $S_{k+1}$ and $(S_{k+1})_\sharp \pi$ as samples accumulate]

SLIDES 49–51

Map-accelerated MCMC

• Ingredient #3: global proposals
  – If the map becomes sufficiently accurate, we would like to avoid random-walk behavior
  – Solution: delayed-rejection MCMC [Mira 2001]
  – First proposal = independent sample from η (global, more efficient); second proposal = random walk (local, more robust); one such two-stage step is sketched below

• The entire scheme is provably ergodic with respect to the exact posterior measure [Parno & M, SIAM JUQ 2018]
  – Requires enforcing some regularity conditions on the maps, to preserve the tail behavior of the transformed target

[Figure: reference-space random-walk and independence proposals, and the corresponding mapped proposals on the target]
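A minimal sketch of this delayed-rejection step on the preconditioned (reference-space) target, under the simplifying assumptions that stage 1 is an independence proposal from η and stage 2 is a symmetric random walk that ignores the stage-1 point; several proposal densities then cancel in Mira's second-stage acceptance ratio. Written with plain exponentials for clarity rather than numerical robustness.

```python
import numpy as np

def dr_step(r, log_pullback, log_eta, step, rng):
    """One delayed-rejection step: global independence proposal first,
    local random walk as the fallback."""
    def alpha1(a, b):
        # stage-1 acceptance for the independence proposal q1(a, b) = eta(b)
        return min(1.0, np.exp(log_pullback(b) - log_pullback(a)
                               + log_eta(a) - log_eta(b)))

    y1 = rng.standard_normal(r.shape)              # stage 1: draw from eta
    if rng.uniform() < alpha1(r, y1):
        return y1
    y2 = r + step * rng.standard_normal(r.shape)   # stage 2: random walk
    # Mira's second-stage ratio; q1 and symmetric q2 terms have canceled
    num = np.exp(log_pullback(y2)) * (1.0 - alpha1(y2, y1))
    den = np.exp(log_pullback(r)) * (1.0 - alpha1(r, y1))
    if den > 0 and rng.uniform() < num / den:
        return y2
    return r
```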

SLIDE 52

Example: biological oxygen demand model

◮ Likelihood model:

$$d = \theta_1\big( 1 - \exp(-\theta_2 x) \big) + \epsilon, \qquad \epsilon \sim N\big( 0,\; 2 \times 10^{-4} \big)$$

◮ 20 noisy observations at x = 5/5, 6/5, . . . , 25/5
◮ Degree-three polynomial map

[Figure: true posterior density]
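A minimal sketch of the corresponding unnormalized log-posterior; a flat prior and the synthetic data values below are assumptions for illustration, since the slide does not specify them.

```python
import numpy as np

def bod_log_posterior(theta, x, d, sigma2=2e-4):
    """Unnormalized log-posterior for d = theta1*(1 - exp(-theta2*x)) + eps,
    assuming a flat prior (an assumption; the slide does not state the prior)."""
    mean = theta[0] * (1.0 - np.exp(-theta[1] * x))
    return -0.5 * np.sum((d - mean) ** 2) / sigma2

# Synthetic data on the observation grid (illustrative values only)
x = np.arange(5, 26) / 5.0
rng = np.random.default_rng(1)
d = 1.0 * (1.0 - np.exp(-0.1 * x)) + np.sqrt(2e-4) * rng.standard_normal(x.size)
print(bod_log_posterior(np.array([1.0, 0.1]), x, d))
```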

SLIDE 53

Transformed distribution

[Figure: the original posterior π, and the pushforward of the posterior through the learned map, $\widehat{S}_\sharp \pi$]

SLIDE 54

Results: ESS per computational effort

ESS per 1,000 model evaluations (θ1):  TM+DR 161,  TM+NUTS 57,  TM+LAN 179,  NUTS 1.4,  DRAM 5.8
ESS per second (θ1):  TM+DRG 1468,  TM+DRL 487,  TM+MIX 1495,  NUTS 57,  DRAM 127

SLIDE 55

Example #2: predator-prey model

◮ Six-parameter ODE population model (coded below):

$$\frac{dP}{dt} = rP\Big( 1 - \frac{P}{K} \Big) - s\, \frac{PQ}{a + P}, \qquad \frac{dQ}{dt} = u\, \frac{PQ}{a + P} - vQ$$

◮ Five noisy observations of both populations
◮ Infer 6 parameters + 2 initial values; uniform priors
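A minimal forward-model sketch of this ODE system; the parameter values, initial state, and time grid below are hypothetical, chosen only to make the example run.

```python
import numpy as np
from scipy.integrate import solve_ivp

def predator_prey(t, y, r, K, s, a, u, v):
    """Right-hand side of the six-parameter predator-prey model."""
    P, Q = y
    dP = r * P * (1 - P / K) - s * P * Q / (a + P)
    dQ = u * P * Q / (a + P) - v * Q
    return [dP, dQ]

# Illustrative parameters (r, K, s, a, u, v) and initial populations
params = (0.6, 100.0, 1.2, 25.0, 0.5, 0.3)
sol = solve_ivp(predator_prey, (0.0, 50.0), [50.0, 5.0], args=params,
                t_eval=np.linspace(0.0, 50.0, 5))   # five observation times
```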

SLIDE 56

Predator-prey model: chains

[Figure: MCMC trace plots of P(0) for DRAM, sMMALA, AMALA, TMDRG, TMDRL, and TMRWM]

SLIDE 57

Example: maple sap dynamics model

◮ Coupled PDE system for ice, water, and gas locations [Ceseri & Stockie 2013]
◮ Measure gas pressure in the vessel
◮ Infer 10 physical model parameters
◮ Very challenging posterior!

[Figure: schematic of the fiber/vessel geometry, with gas, ice, and water regions; image from Ceseri and Stockie, 2013]

SLIDE 58

Maple posterior distribution

[Figure: pairwise posterior marginals of $\theta_1, \ldots, \theta_{10}$, with detailed views of the strongly non-Gaussian pairs $(\theta_1, \theta_7)$ and $(\theta_3, \theta_6)$]

SLIDE 59

Results: ESS per computational effort

ESS per 10,000 model evaluations:  TM+DRG 5.7,  TM+DRL 10,  TM+MIX 2.9,  DRAM 0.6
ESS per 1,000 seconds:  TM+DRG 18,  TM+DRL 26,  TM+MIX 7.1,  DRAM 2.3

SLIDE 60

Comments on MCMC with transport maps

Useful characteristics of the algorithm:
◮ Map construction is easily parallelizable
◮ Requires no gradients from the posterior density

Generalizes many current MCMC techniques:
◮ Adaptive Metropolis: the map enables non-Gaussian proposals and a natural mixing between local and global moves
◮ Manifold MCMC [Girolami & Calderhead 2011]: the map also defines a Riemannian metric

SLIDE 61

Looking to higher dimensions: regularized estimation of S

For simplicity, consider map components $S^k(x) = \sum_j \beta_j \psi_j(x_{1:k-1}) + \alpha_k x_k$ and solve (a sketch follows below):

$$\widehat{S}^k \in \arg\min_{S^k \in \mathcal{S}^h_{\triangle,k}} \frac{1}{N} \sum_{i=1}^N \Big[ \frac{1}{2} S^k(x_i)^2 - \log \partial_k S^k(x_i) \Big] + \lambda_N \|\beta\|_1$$

Assume sub-Gaussian π and basis functions $\psi_j(x)$.

Theorem [BZM]: For polynomial maps of degree m with sparsity s, with high probability,

$$\mathbb{E}_\pi\Big[ D_{\mathrm{KL}}\big( \pi(x_k \mid x_{1:k-1}) \,\big\|\, \widehat{S}^\sharp \eta\,(x_k \mid x_{1:k-1}) \big) \Big] \lesssim \frac{s^2 m \log k}{N}$$

Takeaways:
◮ Accurate estimation is feasible in high dimensions with N ≪ k
◮ From the factorization property of the density, the error in the conditionals ensures $D_{\mathrm{KL}}\big( \pi \,\|\, \widehat{S}^\sharp \eta \big) \lesssim d\, \frac{s^2 m \log d}{N}$
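A minimal sketch of the ℓ1-regularized component fit, assuming the same simple parameterization with a constant diagonal coefficient α. The standard split β = β⁺ − β⁻ makes the penalty smooth over the nonnegative orthant; coordinate descent or proximal methods would be the more usual choice at scale.

```python
import numpy as np
from scipy.optimize import minimize

def fit_sparse_component(Psi, xk, lam):
    """l1-regularized fit of S^k(x) = Psi @ beta + alpha * x_k (alpha > 0)."""
    M, p = Psi.shape

    def objective(theta):
        bp, bm, alpha = theta[:p], theta[p:2*p], theta[2*p]
        Sk = Psi @ (bp - bm) + alpha * xk
        # smooth convex objective + l1 penalty via the positive/negative split
        return 0.5 * np.mean(Sk**2) - np.log(alpha) + lam * np.sum(bp + bm)

    theta0 = np.zeros(2*p + 1); theta0[-1] = 1.0
    bounds = [(0, None)] * (2*p) + [(1e-8, None)]   # nonnegative split, alpha > 0
    res = minimize(objective, theta0, method="L-BFGS-B", bounds=bounds)
    return res.x[:p] - res.x[p:2*p], res.x[2*p]
```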

SLIDE 62

Conclusions

◮ Central idea: characterize complex/intractable distributions by constructing deterministic couplings
◮ Many kinds of low-dimensional structure (non-exhaustive):
  ◮ Sparse maps, decomposable maps
  ◮ Low-rank structure (lazy maps)
◮ Exploiting the pullback distribution:
  ◮ Compositions of approximate maps, constructed greedily
  ◮ Use approximate maps to precondition other sampling or cubature schemes

SLIDES 63–64

Conclusions

Extensions and ongoing work:

◮ Using sparse grids or QMC for map construction
◮ Preconditioning other sampling schemes: computational tradeoffs, impact on target geometry and regularity
◮ Zoo of map parameterizations and their approximation properties
◮ Additional varieties of low-dimensional structure
◮ Sequential variational inference with maps: filtering, smoothing, parameter estimation
◮ Maps from samples:
  ◮ Nonparametric estimation schemes
  ◮ As a tool for ensemble filtering [arXiv:1907.00389] or approximate Bayesian computation. . .

Thanks for your attention!

SLIDE 65

References

◮ A. Spantini, R. Baptista, Y. Marzouk. "Coupling techniques for nonlinear ensemble filtering." arXiv:1907.00389.
◮ D. Bigoni, O. Zahm, A. Spantini, Y. Marzouk. "Greedy inference with layers of lazy maps." arXiv:1906.00031.
◮ O. Zahm, T. Cui, K. Law, A. Spantini, Y. Marzouk. "Certified dimension reduction in nonlinear Bayesian inverse problems." arXiv:1807.03712.
◮ A. Spantini, D. Bigoni, Y. Marzouk. "Inference via low-dimensional couplings." JMLR 19(66): 1–71, 2018.
◮ M. Parno, Y. Marzouk. "Transport map accelerated Markov chain Monte Carlo." SIAM JUQ 6: 645–682, 2018.
◮ G. Detommaso, T. Cui, A. Spantini, Y. Marzouk, R. Scheichl. "A Stein variational Newton method." NeurIPS 2018.
◮ R. Morrison, R. Baptista, Y. Marzouk. "Beyond normality: learning sparse probabilistic graphical models in the non-Gaussian setting." NeurIPS 2017.
◮ Y. Marzouk, T. Moselhy, M. Parno, A. Spantini. "An introduction to sampling via measure transport." Handbook of Uncertainty Quantification, R. Ghanem, D. Higdon, H. Owhadi, eds. Springer, 2016. arXiv:1602.05023.
◮ General Python code at http://transportmaps.mit.edu