MK Optimal Transport and entropic relaxations Soumik Pal - PowerPoint PPT Presentation

MK Optimal Transport and entropic relaxations Soumik Pal University of Washington, Seattle Eigenfunctions seminar @ IISc Bangalore, August 30, 2019

Monge-Kantorovich Optimal Transport problem

Gaspard Monge, 1781. (Figure: by M. Cuturi.)
$P$, $Q$: probabilities on $X$, $Y$, respectively; say both are $\mathbb{R}^d$.
$c(x, y)$: cost of transport. E.g., $c(x, y) = \|x - y\|$ or $c(x, y) = \frac{1}{2}\|x - y\|^2$.
Monge problem: minimize $\int c(x, T(x))\, dP$ among maps $T : \mathbb{R}^d \to \mathbb{R}^d$ with $T_\# P = Q$.

Leonid Kantorovich, 1939. (Figure: by M. Cuturi.)
$\Pi(P, Q)$: couplings of $(P, Q)$ (joint distributions with the given marginals).
(Monge-)Kantorovich relaxation: minimize among $\nu \in \Pi(P, Q)$,
$$\inf_{\nu \in \Pi(P, Q)} \iint c(x, y)\, d\nu.$$

Duality: cost → price
Among all functions $\phi(y)$, $\psi(x)$ such that $\phi(y) - \psi(x) \le c(x, y)$, maximize profit:
$$\sup_{\phi, \psi} \left[ \int \phi(y)\, Q(dy) - \int \psi(x)\, P(dx) \right].$$
(Kantorovich duality) inf cost = sup profit.
For the optimal “Kantorovich potentials”, $\phi_c(x) - \psi_c(y) = c(x, y)$ almost surely under the “optimal coupling” $\nu_c$.

Quadratic cost: Brenier’s theorem
What does OT look like? Very special!
$c(x, y) = \frac{1}{2}\|x - y\|^2$. Assume $P$ has density $\rho_0$.
(Y. Brenier) $\exists$ a convex $F$ such that $(X, \nabla F(X))$, $X \sim \rho_0$, solves
$$\text{(MK-OT)} \qquad W_2^2(P, Q) := \inf_{\Pi(P, Q)} \iint c(x, y)\, d\nu.$$
K.-potentials? Let $F^*(y)$ be the Legendre convex dual of $F$. Then
$$\phi_c(x) = \tfrac{1}{2}\|x\|^2 - F(x), \qquad -\psi_c(y) = \tfrac{1}{2}\|y\|^2 - F^*(y),$$
and $\phi_c(x) - \psi_c(y) = \frac{1}{2}\|x - y\|^2$ for $y = \nabla F(x)$, i.e., a.s. $\nu_c$.
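A short worked check of the last identity, added here as a sketch. It uses only the potentials defined on this slide and the Fenchel-Young inequality for the Legendre dual $F^*$; the intermediate steps are not on the original slide.

```latex
\begin{align*}
\phi_c(x) - \psi_c(y)
  &= \tfrac{1}{2}\|x\|^2 + \tfrac{1}{2}\|y\|^2 - \bigl(F(x) + F^*(y)\bigr) \\
  &\le \tfrac{1}{2}\|x\|^2 + \tfrac{1}{2}\|y\|^2 - \langle x, y \rangle
   && \text{(Fenchel-Young: } F(x) + F^*(y) \ge \langle x, y \rangle\text{)} \\
  &= \tfrac{1}{2}\|x - y\|^2 = c(x, y).
\end{align*}
% Equality in Fenchel-Young holds exactly when y = \nabla F(x)
% (at points where F is differentiable), which is the support of \nu_c.
```

So the pair is feasible for the dual constraint everywhere, and the constraint is tight exactly along the graph of $\nabla F$.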

A generalized notion of convexity (Gangbo-McCann). (Figure: by C. Villani.)
Convex functions lie above their tangents.
A $c$-convex function $\psi(x)$ lies above the cost curve $c(\cdot, y)$, $y \in \partial_c \psi(x)$.
Optimal Kantorovich potentials are $c$-concave:
$$\psi_c(x) = \sup_y \left[ \phi_c(y) - c(x, y) \right], \qquad \phi_c(y) - \psi_c(x) = c(x, y), \quad y \in \partial_c \psi(x).$$

Convex cost: Gangbo-McCann ’96
$c(x, y) = g(x - y)$, $g$ strictly convex, and $P$ has density $\rho_0$.
$\exists$ a $c$-concave function $\psi_c(x)$ for which $T(x) = x - (\nabla g)^{-1} \circ \nabla \psi_c(x)$ is such that $(X, T(X))$, $X \sim \rho_0$, uniquely solves the MK OT problem.
$T(x) \in \partial_c \psi_c(x)$. The Monge solution is also the MK solution.
Does not cover $g(z) = \|z\|$ or $g(z) = 1\{z \neq 0\}$.

Existence of Monge solution
Sufficient conditions (Bernard-Buffoni, Villani, De Philippis):
$X$, $Y$ bounded, open. $P$, $Q$ have densities. $c(x, y) \in C^2$.
$y \mapsto D_x c(x, y)$ is injective for each $x$ (twist condition).
$x \mapsto D_y c(x, y)$ is injective for each $y$.
See the book by Villani, Chapter 10.
Smoothness of the optimal $T$: Ma-Trudinger-Wang ’05, Loeper ’09 (see Villani, Chapter 12).

Transport in one dimension
Suppose $X = \mathbb{R} = Y$. For all convex $c(x, y) = g(x - y)$ the OT map is well known:
monotone transport, AKA the inverse c.d.f. transform,
$$T(x) = G_1^{-1} \circ G_0(x),$$
where $G_0$, $G_1$ are the c.d.f.s of $P$, $Q$, respectively, both continuous.
Optimal, and unique if $g$ is strictly convex. (Homework)
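For two samples of equal size, the monotone map reduces to pairing order statistics. The following is a minimal sketch of that empirical version, not something from the slides: the helper names (`monotone_coupling`, `total_cost`, `brute_force_cost`) and the sample points are made up for illustration, and the brute-force comparison is only feasible for tiny samples.

```python
from itertools import permutations


def monotone_coupling(xs, ys):
    """Pair the k-th smallest x with the k-th smallest y.

    This is the empirical analogue of T = G1^{-1} o G0 when both
    samples have the same number of points (an assumption of this sketch).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many points on both sides")
    return list(zip(sorted(xs), sorted(ys)))


def total_cost(pairs, g):
    """Sum of g(x - y) over the given (x, y) pairs."""
    return sum(g(x - y) for x, y in pairs)


def brute_force_cost(xs, ys, g):
    """Minimum cost over all one-to-one pairings (only for tiny n)."""
    return min(total_cost(zip(xs, perm), g) for perm in permutations(ys))


# Hypothetical sample points, chosen only for illustration.
xs = [0.3, -1.2, 2.5, 0.9]
ys = [1.1, 4.0, -0.5, 2.2]
g = lambda z: 0.5 * z * z  # strictly convex, so c(x, y) = g(x - y)

pairs = monotone_coupling(xs, ys)
monotone_cost = total_cost(pairs, g)
best_cost = brute_force_cost(xs, ys, g)
```

On this example the sorted pairing attains the same cost as the exhaustive search over all pairings, which is what the slide's optimality claim predicts.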

Entropic Relaxation or Entropic Regularization

OT and statistics
Goal: fit data to model. Classical: MLE. Recent: minimize $W_2^2(\text{data}, \text{model})$.
Better estimates, more stable, high dimension, Adversarial Network training.
The problem is computation. Discrete MK-OT: given two empirical distributions
$$\sum_{i=1}^n p_i \delta_{x_i}, \qquad \sum_{j=1}^n q_j \delta_{y_j}, \qquad \sum_i p_i = 1 = \sum_j q_j,$$
minimize $\langle c, M \rangle := \sum_i \sum_j c(x_i, y_j) M_{ij}$ among all $n \times n$ matrices $M \ge 0$ with row sums $p$ and column sums $q$.
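To make the feasible set concrete, here is a small sketch that checks whether a matrix is a coupling and evaluates $\langle c, M \rangle$. The function names (`is_coupling`, `transport_cost`), the points, and the uniform weights are all assumptions made for this illustration; it compares the independent coupling $p_i q_j$, which is always feasible, with a coupling that pairs points in sorted order.

```python
def is_coupling(M, p, q, tol=1e-12):
    """True if M >= 0 has row sums p and column sums q (up to tol)."""
    n, m = len(p), len(q)
    nonneg = all(M[i][j] >= 0.0 for i in range(n) for j in range(m))
    rows_ok = all(abs(sum(M[i]) - p[i]) <= tol for i in range(n))
    cols_ok = all(abs(sum(M[i][j] for i in range(n)) - q[j]) <= tol for j in range(m))
    return nonneg and rows_ok and cols_ok


def transport_cost(C, M):
    """The pairing <c, M> = sum_ij C[i][j] * M[i][j]."""
    return sum(C[i][j] * M[i][j] for i in range(len(C)) for j in range(len(C[0])))


# Hypothetical data: three points on each side with uniform weights.
xs = [0.0, 1.0, 3.0]
ys = [2.0, 0.5, 3.5]
p = [1.0 / 3] * 3
q = [1.0 / 3] * 3
C = [[0.5 * (x - y) ** 2 for y in ys] for x in xs]

# Independent coupling M_ij = p_i * q_j: feasible for any p, q.
product = [[p[i] * q[j] for j in range(3)] for i in range(3)]

# Sorted pairing: x=0 -> y=0.5, x=1 -> y=2, x=3 -> y=3.5, each with mass 1/3.
sorted_pairing = [[0.0, 1.0 / 3, 0.0],
                  [1.0 / 3, 0.0, 0.0],
                  [0.0, 0.0, 1.0 / 3]]

product_cost = transport_cost(C, product)
sorted_cost = transport_cost(C, sorted_pairing)
```

Both matrices lie in the feasible set, but the sorted pairing is far cheaper here, which illustrates that the linear program is really selecting among couplings rather than just checking marginals.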

Entropic relaxation, Cuturi ’13
Linear programming in $M$. Simplex and interior point methods give complexity $O(n^3 \log n)$. Pretty bad.
Define
$$\mathrm{Ent}(M) = \sum_{i,j} M_{ij} \log M_{ij}, \qquad 0 \log 0 = 0.$$
For $h > 0$, minimize $\left[ \langle c, M \rangle + h\, \mathrm{Ent}(M) \right]$.
Penalizes degenerate solutions (sparse $M$). Approaches the optimal solution as $h \downarrow 0$.
Computational complexity $\approx O(n^2 \log n)$. How?

Entropic relaxation: solution
For $h > 0$, minimize $\left[ \langle c, M \rangle + h\, \mathrm{Ent}(M) \right]$.
Solution (Lagrange multipliers + calculus): $\exists\, u, v \in \mathbb{R}^n$ such that
$$M_c = \mathrm{Diag}(u) \exp\left( -\tfrac{1}{h} c \right) \mathrm{Diag}(v), \quad \text{i.e.,} \quad M_c(i, j) = u_i \exp\left( -\tfrac{1}{h} c(x_i, y_j) \right) v_j, \quad 1 \le i, j \le n.$$
Remember this form. We will get back to it in the continuum.
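The "Lagrange multipliers + calculus" step can be spelled out as follows. This is a sketch of the standard computation; the multiplier names $\alpha_i$, $\beta_j$ and the shorthand $c_{ij} = c(x_i, y_j)$ are introduced here and do not appear on the slide.

```latex
\begin{align*}
\mathcal{L}(M, \alpha, \beta)
  &= \sum_{i,j} c_{ij} M_{ij} + h \sum_{i,j} M_{ij} \log M_{ij}
     - \sum_i \alpha_i \Bigl( \sum_j M_{ij} - p_i \Bigr)
     - \sum_j \beta_j \Bigl( \sum_i M_{ij} - q_j \Bigr), \\
0 = \frac{\partial \mathcal{L}}{\partial M_{ij}}
  &= c_{ij} + h \bigl( \log M_{ij} + 1 \bigr) - \alpha_i - \beta_j, \\
M_{ij}
  &= \underbrace{e^{\alpha_i / h - 1/2}}_{u_i}\;
     e^{-c_{ij} / h}\;
     \underbrace{e^{\beta_j / h - 1/2}}_{v_j}.
\end{align*}
% The multipliers are then fixed by the marginal constraints
% \sum_j M_{ij} = p_i and \sum_i M_{ij} = q_j.
```

The split of the constant $e^{-1}$ between $u$ and $v$ is arbitrary: $(u, v)$ is only determined up to the rescaling $(t u, v / t)$, $t > 0$.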

Sinkhorn algorithm AKA IPFP
$M_c$ can be computed by the Iterative Proportional Fitting Procedure.
Start with $M_0 = \exp\left( -\tfrac{1}{h} c \right)$. Inductively:
Rescale the rows of $M_k$ to get $M_{k+1}$ with row sums $p$.
Rescale the columns of $M_{k+1}$ to get $M_{k+2}$ with column sums $q$.
Limit $= M_c$. Called Sinkhorn iterations in linear algebra.
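The alternating rescaling can be sketched in a few lines of plain Python. This is an illustrative implementation under stated assumptions, not the speaker's code: it tracks the scaling vectors `u`, `v` rather than the matrices $M_k$ themselves (the two descriptions are equivalent, since each rescaling multiplies rows or columns by constants), and the function name `sinkhorn`, the fixed iteration count, and the sample data are hypothetical. It makes no attempt at numerical stabilization for very small $h$.

```python
import math


def sinkhorn(C, p, q, h, n_iter=1000):
    """Alternately rescale rows and columns of exp(-C/h) to match p and q.

    Returns the scaled matrix M = Diag(u) K Diag(v) with K = exp(-C/h),
    together with the scaling vectors u and v.
    """
    n, m = len(p), len(q)
    K = [[math.exp(-C[i][j] / h) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Row step: pick u so that Diag(u) K Diag(v) has row sums p.
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Column step: pick v so that the column sums equal q.
        v = [q[j] / sum(u[i] * K[i][j] for i in range(n)) for j in range(m)]
    M = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    return M, u, v


# Hypothetical example: quadratic cost on three points per side.
xs = [0.0, 1.0, 3.0]
ys = [2.0, 0.5, 3.5]
C = [[0.5 * (x - y) ** 2 for y in ys] for x in xs]
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

M, u, v = sinkhorn(C, p, q, h=1.0)
row_sums = [sum(row) for row in M]
col_sums = [sum(M[i][j] for i in range(3)) for j in range(3)]
```

Each sweep costs $O(n^2)$ operations, which is where the improvement over a generic linear programming solver comes from; how many sweeps are needed depends on $h$ and on the cost matrix.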

Entropic relaxation in continuum
Recall $X, Y \subseteq \mathbb{R}^d$. Cost $c(x, y)$. $P$, $Q$ have densities $\rho_0$, $\rho_1$.
For a density $\nu \in \Pi(\rho_0, \rho_1)$,
$$\mathrm{Ent}(\nu) = \iint \nu(x, y) \log \nu(x, y)\, dx\, dy.$$
Entropic relaxation: for $h > 0$,
$$\text{minimize} \quad \iint c(x, y)\, \nu(x, y)\, dx\, dy + h\, \mathrm{Ent}(\nu), \qquad \nu \in \Pi(\rho_0, \rho_1).$$

Entropic relaxation: continuum solution
(Hobby-Pyke ’65, Rüschendorf-Thomsen ’93) Optimal solution:
$$\nu_c(x, y) = \exp\left( a(x) + b(y) - \tfrac{1}{h} c(x, y) \right) = u(x) \exp\left( -\tfrac{1}{h} c(x, y) \right) v(y).$$
Just like the discrete case. Can be computed by IPFP. Unfortunately, convergence is very slow.

Entropic duality
Recall duality for MK-OT:
$$\inf_{\Pi(\rho_0, \rho_1)} \iint c(x, y)\, \nu(x, y)\, dx\, dy = \sup_{\phi(y) - \psi(x) \le c(x, y)} \left[ \int \phi(y) \rho_1(y)\, dy - \int \psi(x) \rho_0(x)\, dx \right].$$
Duality for entropic relaxation: solve
$$\sup \left[ \int \phi(y) \rho_1(y)\, dy - \int \psi(x) \rho_0(x)\, dx - h \iint e^{\phi(y) - \frac{1}{h} c(x, y) - \psi(x)}\, dx\, dy \right].$$
Optimal solutions: $\psi(y) = b(y)$, $\phi(x) = -a(x)$. $a$, $b$ are Schrödinger potentials.

Schrödinger bridges, Large Deviations

Schrödinger’s problem: lazy gas experiment
Imagine $N \approx \infty$ independent gas molecules in a cold chamber.
Initial configuration of particles: $L_0 = \frac{1}{N} \sum_{i=1}^N \delta_{x_i} \approx P$.
Each particle performs an independent Brownian motion with $\sigma^2 \approx 0$.
Condition on the terminal configuration $L_1 = \frac{1}{N} \sum_{j=1}^N \delta_{y_j} \approx Q$.
(Schrödinger ’32) What is the probability of the above event?
What is the most likely path followed by an individual gas molecule?

Föllmer’s reformulation ’88
Relative entropy (RE) of $\mu$ w.r.t. $\nu$:
$$H(\mu \mid \nu) = \int \log\left( \frac{d\mu}{d\nu} \right) d\mu.$$
$R$: law of $\sigma^2$ BM on $C[0, 1]$ with initial distribution $P$.
Among all probabilities $\mu$ on $C[0, 1]$ such that $X_0 \sim P$, $X_1 \sim Q$, minimize $H(\mu \mid R)$.
The solution is the Schrödinger bridge between $P$ and $Q$. Take $\sigma^2 \downarrow 0$.

Föllmer’s disintegration
Brownian transition density:
$$p_\sigma(x, y) = \frac{1}{(\sqrt{2\pi}\, \sigma)^d} \exp\left( -\frac{1}{2\sigma^2} \|y - x\|^2 \right).$$
(Föllmer) Let $R_{01}$ be the law of $(X_0, X_1)$. Find $\nu \in \Pi(P, Q)$ to minimize $H(\nu \mid R_{01})$.
Generate $(X_0, X_1)$ from the minimizer. The Schrödinger bridge is the $\sigma^2$ Brownian bridge given $X_0 = x_0$, $X_1 = x_1$.

Entropic relaxation and Schrödinger bridge
Minimizing $H(\nu \mid R_{01})$ is the same problem as
$$\text{minimize} \quad \left[ \iint \tfrac{1}{2} \|y - x\|^2\, d\nu + \sigma^2\, \mathrm{Ent}(\nu) \right].$$
This is the entropic relaxation with $h = \sigma^2$ for the quadratic cost.
Schrödinger bridge description: solve the entropic relaxation and join the endpoints by a Brownian bridge.
What happens when $\sigma^2 \downarrow 0$?
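The equivalence can be checked directly. The computation below is a sketch that assumes $P$ has density $\rho_0$, so that $R_{01}$ has density $\rho_0(x)\, p_\sigma(x, y)$ with the Gaussian transition density $p_\sigma(x, y) = (2\pi\sigma^2)^{-d/2} \exp(-\|y - x\|^2 / (2\sigma^2))$, and that $\nu \in \Pi(\rho_0, \rho_1)$ has a density; these intermediate steps are not on the slide.

```latex
\begin{align*}
H(\nu \mid R_{01})
  &= \iint \nu \log \frac{\nu(x, y)}{\rho_0(x)\, p_\sigma(x, y)}\, dx\, dy \\
  &= \mathrm{Ent}(\nu) - \int \rho_0 \log \rho_0\, dx
     + \frac{1}{2\sigma^2} \iint \|y - x\|^2\, d\nu
     + \frac{d}{2} \log(2\pi\sigma^2), \\
\sigma^2 H(\nu \mid R_{01})
  &= \iint \tfrac{1}{2} \|y - x\|^2\, d\nu + \sigma^2\, \mathrm{Ent}(\nu)
     + \underbrace{\sigma^2 \Bigl[ \tfrac{d}{2} \log(2\pi\sigma^2)
       - \int \rho_0 \log \rho_0\, dx \Bigr]}_{\text{does not depend on } \nu}.
\end{align*}
% The second line uses that the first marginal of \nu is \rho_0,
% so \iint \nu \log \rho_0(x) dx dy = \int \rho_0 \log \rho_0 dx.
```

Since the bracketed term is the same for every coupling, the two minimization problems have the same minimizer.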

Large deviation
As $h = \sigma^2 \to 0^+$, the optimal entropic coupling converges to the MK-optimal coupling.
Recall Brenier: $P(dx) = \rho_0(x)\, dx$, $Q(dy) = \rho_1(y)\, dy$. $\exists\, F$ such that $y = \nabla F(x)$ gives the Monge solution.
The $\sigma^2$ Brownian bridge converges to a constant-velocity straight line joining $x$ and $y$.
This can be made precise by Large Deviation theory.
Let $\rho_t$ be the law at time $t$ of this limit: the McCann interpolation between $\rho_0$ and $\rho_1$. Remember this name for later.

$(f, g)$ transform of Markov processes
How to describe the law of Schrödinger bridges? SDE? PDE?
Markovian $(f, g)$ transform of the reversible Wiener measure $W$:
$$d\mu = f(X_0)\, g(X_1)\, dW, \qquad \mathbb{E}_W\left[ f(X_0)\, g(X_1) \right] = 1.$$
Similar to Girsanov / Doob’s $h$-transform, but on both sides.
Markovian diffusion both forward and backward.

Generators for Schrödinger bridges
Let $\mu_t$ be the law of the $\sigma^2 = 1$ Schrödinger bridge.
Recall the Schrödinger potentials $a(x)$, $b(y)$. Define the heat flows
$$b_t(y) = \log \mathbb{E}_W\left[ e^{b(X_1)} \mid X_t = y \right], \qquad a_t(x) = \log \mathbb{E}_W\left[ e^{a(X_0)} \mid X_t = x \right].$$
The Schrödinger bridge is BM with drift $\nabla b_t$ forward in time.
The Schrödinger bridge is BM with drift $\nabla a_t$ backward in time.
Most properties are poorly understood.

Dynamics and geometry

McCann interpolation. (Figure: by M. Cuturi.)
$\mathcal{P}_2(\mathbb{R}^d)$: square integrable probabilities.
Recall: $\rho_0$ is transported to $\rho_1$, with $c(x, y) = \frac{1}{2}\|y - x\|^2$.
The square root of the optimal cost, $W_2(\rho_0, \rho_1)$, is a metric.
$\rho_t$ = law of $(1 - t) X + t\, T(X)$, $X \sim \rho_0$, $0 \le t \le 1$.
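At the level of particles, the interpolation moves each point along a straight line from $x$ to $T(x)$. The sketch below illustrates this in one dimension. The function name `mccann_interpolation`, the grid of points, and the affine map `T` are assumptions for illustration only: an increasing affine map is monotone, so by the one-dimensional slide it is the optimal map between the law of `xs` and its image.

```python
def mccann_interpolation(xs, T, t):
    """Move each particle x to (1 - t) * x + t * T(x), for 0 <= t <= 1."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [(1.0 - t) * x + t * T(x) for x in xs]


def mean(values):
    return sum(values) / len(values)


# Hypothetical particles standing in for samples from rho_0.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]


# Hypothetical increasing affine map: shift by 3 and shrink the spread by half.
def T(x):
    return 3.0 + 0.5 * x


start = mccann_interpolation(xs, T, 0.0)   # particles of rho_0
middle = mccann_interpolation(xs, T, 0.5)  # particles of rho_{1/2}
end = mccann_interpolation(xs, T, 1.0)     # particles of rho_1
```

At the midpoint both the mean and the spread of the particles sit halfway between their initial and final values, in contrast with the mixture $(1 - t)\rho_0 + t\rho_1$, which would simply superimpose the two point clouds.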
