MK Optimal Transport and entropic relaxations
  1. MK Optimal Transport and entropic relaxations. Soumik Pal, University of Washington, Seattle. Eigenfunctions seminar @ IISc Bangalore, August 30, 2019.

  2. Monge-Kantorovich Optimal Transport problem

  3. Gaspard Monge 1781. Figure: by M. Cuturi. $P, Q$ - probabilities on $X, Y$, respectively, say both $\mathbb{R}^d$. $c(x, y)$ - cost of transport, e.g., $c(x, y) = \|x - y\|$ or $c(x, y) = \frac{1}{2}\|x - y\|^2$. Monge problem: minimize $\int c(x, T(x))\, dP$ among maps $T : \mathbb{R}^d \to \mathbb{R}^d$ with $T_{\#} P = Q$.

  4. Leonid Kantorovich 1939. Figure: by M. Cuturi. $\Pi(P, Q)$ - couplings of $(P, Q)$ (joint distributions with the given marginals). (Monge-)Kantorovich relaxation: minimize over couplings, $\inf_{\nu \in \Pi(P, Q)} \int c(x, y)\, d\nu$.

  5. Duality: cost → price. Among all functions $\varphi(y), \psi(x)$ s.t. $\varphi(y) - \psi(x) \le c(x, y)$, maximize the profit: $\sup_{\varphi, \psi} \left[ \int \varphi(y)\, Q(dy) - \int \psi(x)\, P(dx) \right]$. (Kantorovich duality) inf cost = sup profit. For the optimal "Kantorovich potentials", $\varphi_c(y) - \psi_c(x) = c(x, y)$ holds $\nu_c$-almost surely, where $\nu_c$ is the optimal coupling.

  6. Quadratic cost: Brenier's theorem. What does the optimal transport look like? Very special! $c(x, y) = \frac{1}{2}\|x - y\|^2$. Assume $P$ has density $\rho_0$. (Y. Brenier) $\exists$ a convex $F$ s.t. $(X, \nabla F(X))$, $X \sim \rho_0$, solves $W_2^2(P, Q) := \inf_{\Pi(P, Q)} \int c(x, y)\, d\nu$ (MK-OT). Kantorovich potentials? With $F^*(y)$ the Legendre convex dual of $F$: $\psi_c(x) = F(x) - \frac{1}{2}\|x\|^2$, $\varphi_c(y) = \frac{1}{2}\|y\|^2 - F^*(y)$. Then $\varphi_c(y) - \psi_c(x) = \frac{1}{2}\|x - y\|^2$ exactly when $F(x) + F^*(y) = \langle x, y \rangle$ (Fenchel-Young equality), i.e., for $y = \nabla F(x)$, i.e., $\nu_c$-a.s.
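
A quick sanity check, added here as an illustration (this example is not on the original slide): for one-dimensional centered Gaussians the Brenier map is linear. Take

$$P = N(0, \sigma_0^2), \quad Q = N(0, \sigma_1^2), \quad F(x) = \frac{\sigma_1}{2\sigma_0} x^2, \quad \nabla F(x) = \frac{\sigma_1}{\sigma_0} x.$$

$F$ is convex, $\nabla F(X) \sim N(0, \sigma_1^2)$, and $W_2^2(P, Q) = \mathbb{E}\left[\tfrac{1}{2}\big|X - \tfrac{\sigma_1}{\sigma_0} X\big|^2\right] = \tfrac{1}{2}(\sigma_1 - \sigma_0)^2$ (with the $\frac{1}{2}$-cost convention used throughout).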

  7. A generalized notion of convexity (Gangbo-McCann). Figure: by C. Villani. Convex functions lie above their tangents; a $c$-convex function $\psi(x)$ lies above the cost curves $c(\cdot, y)$ for $y \in \partial_c \psi(x)$. The optimal Kantorovich potentials are $c$-concave: $\psi_c(x) = \sup_y \left[ \varphi_c(y) - c(x, y) \right]$, with $\varphi_c(y) - \psi_c(x) = c(x, y)$ for $y \in \partial_c \psi_c(x)$.

  8. Convex cost: Gangbo-McCann '96. $c(x, y) = g(x - y)$, $g$ strictly convex, and $P$ has density $\rho_0$. Then $\exists$ a $c$-concave function $\psi_c$ for which $T(x) = x - (\nabla g)^{-1} \circ \nabla \psi_c(x)$ is such that $(X, T(X))$, $X \sim \rho_0$, is the unique solution of the MK OT problem; moreover $T(x) \in \partial_c \psi_c(x)$. The Monge solution is also the MK solution. Does not cover $g(z) = \|z\|$ or $g(z) = 1\{z \neq 0\}$.

  9. Existence of Monge solutions. Sufficient conditions (Bernard-Buffoni, Villani, De Philippis): $X, Y$ bounded, open; $P, Q$ have densities; $c(x, y) \in C^2$; $y \mapsto D_x c(x, y)$ is injective for each $x$ (twist condition); $x \mapsto D_y c(x, y)$ is injective for each $y$. See the book by Villani, Chapter 10. Smoothness of the optimal $T$: Ma-Trudinger-Wang '05, Loeper '09 (see Villani, Chapter 12).

  10. Transport in one dimension. Suppose $X = \mathbb{R} = Y$. For every convex cost $c(x, y) = g(x - y)$ the OT map is well known: the monotone transport, a.k.a. the inverse-c.d.f. transform, $T(x) = G_1^{-1} \circ G_0(x)$, where $G_0, G_1$ are the c.d.f.s of $P, Q$, respectively, assumed continuous. It is optimal, and unique if $g$ is strictly convex. (Homework)
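
The following minimal sketch (my illustration, not from the slides) approximates $T = G_1^{-1} \circ G_0$ from samples: sorting the $x$-samples and the $y$-samples and matching them by rank is exactly the empirical inverse-c.d.f. transform.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=1000)    # samples from P
    y = rng.exponential(2.0, size=1000)    # samples from Q

    # Empirical monotone transport: the k-th smallest x goes to the k-th smallest y.
    x_sorted = np.sort(x)
    y_sorted = np.sort(y)

    # Monte Carlo estimate of the optimal cost for c(x, y) = |x - y|^2 / 2.
    print(0.5 * np.mean((x_sorted - y_sorted) ** 2))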

  11. Entropic Relaxation or Entropic Regularization

  12-13. OT and statistics. Goal: fit data to model. Classical: MLE. Recent: minimize $W_2^2(\text{data}, \text{model})$. Better estimates, more stable in high dimension, adversarial network training. The problem is computation: discrete MK-OT. Given two empirical distributions $P = \sum_{i=1}^n p_i \delta_{x_i}$ and $Q = \sum_{j=1}^n q_j \delta_{y_j}$, minimize $\langle c, M \rangle := \sum_{i, j} c(x_i, y_j) M_{ij}$ among all $n \times n$ matrices $M \ge 0$ with row sums $p$ and column sums $q$.
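
This LP can be handed to any solver. A minimal sketch (my illustration; scipy is assumed available): the $n^2$ entries of $M$ are the variables, with $2n$ equality constraints for the marginals.

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(1)
    n = 5
    x, y = rng.normal(size=n), rng.normal(size=n)
    p = np.full(n, 1.0 / n)
    q = np.full(n, 1.0 / n)
    C = 0.5 * (x[:, None] - y[None, :]) ** 2   # cost matrix c(x_i, y_j)

    # Equality constraints: row i of M sums to p_i, column j sums to q_j.
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0       # row-sum constraint i
        A_eq[n + i, i::n] = 1.0                # column-sum constraint i
    b_eq = np.concatenate([p, q])

    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    M = res.x.reshape(n, n)
    print("optimal cost:", res.fun)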

  14-16. Entropic relaxation, Cuturi '13. Linear programming in $M$: simplex and interior-point methods give complexity $O(n^3 \log n)$. Pretty bad. Define $\mathrm{Ent}(M) = \sum_{i, j} M_{ij} \log M_{ij}$, with $0 \log 0 = 0$. For $h > 0$, minimize $\langle c, M \rangle + h\, \mathrm{Ent}(M)$. The entropy term penalizes degenerate (sparse) solutions $M$; the unregularized optimum is recovered as $h \downarrow 0$. Computational complexity $\approx O(n^2 \log n)$. How?

  17. Entropic relaxation: solution. For $h > 0$, minimize $\langle c, M \rangle + h\, \mathrm{Ent}(M)$. Solution (Lagrange multipliers + calculus): $\exists\, u, v \in \mathbb{R}^n$ such that $M_c = \mathrm{Diag}(u) \exp\left(-\frac{1}{h} c\right) \mathrm{Diag}(v)$, i.e., $M_c(i, j) = u_i \exp\left(-\frac{1}{h} c(x_i, y_j)\right) v_j$, $1 \le i, j \le n$. Remember this form; we will get back to it in the continuum.
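
Filling in the calculus step (standard, spelled out here for completeness): introduce multipliers $a_i, b_j$ for the marginal constraints and set the derivative in $M_{ij}$ to zero,

$$\frac{\partial}{\partial M_{ij}} \left[ \langle c, M \rangle + h\, \mathrm{Ent}(M) - \sum_i a_i \Big( \sum_j M_{ij} - p_i \Big) - \sum_j b_j \Big( \sum_i M_{ij} - q_j \Big) \right] = c_{ij} + h (1 + \log M_{ij}) - a_i - b_j = 0,$$

so $M_{ij} = \exp\left( \frac{a_i + b_j - c_{ij}}{h} - 1 \right) = u_i\, e^{-c_{ij}/h}\, v_j$ with, e.g., $u_i = e^{a_i/h - 1}$ and $v_j = e^{b_j/h}$.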

  18. Sinkhorn algorithm, a.k.a. IPFP. $M_c$ can be computed by the Iterative Proportional Fitting Procedure. Start with $M^0 = \exp\left(-\frac{1}{h} c\right)$. Inductively: rescale the rows of $M^k$ to get $M^{k+1}$ with row sums $p$; rescale the columns of $M^{k+1}$ to get $M^{k+2}$ with column sums $q$. The limit is $M_c$. Called Sinkhorn iterations in linear algebra.
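
A minimal Sinkhorn sketch (my code, not the speaker's): keeping track of the row and column scalings $u, v$ instead of the full matrix makes each iteration two matrix-vector products.

    import numpy as np

    def sinkhorn(C, p, q, h=0.1, n_iter=500):
        K = np.exp(-C / h)                # M^0 = exp(-c/h)
        u, v = np.ones_like(p), np.ones_like(q)
        for _ in range(n_iter):
            u = p / (K @ v)               # rescale rows: Diag(u) K Diag(v) gets row sums p
            v = q / (K.T @ u)             # rescale columns: ... and column sums q
        return u[:, None] * K * v[None, :]

    rng = np.random.default_rng(2)
    n = 5
    x, y = rng.normal(size=n), rng.normal(size=n)
    C = 0.5 * (x[:, None] - y[None, :]) ** 2
    p = q = np.full(n, 1.0 / n)
    M = sinkhorn(C, p, q)
    print(M.sum(axis=1), M.sum(axis=0))   # both close to p and q
    print("entropic cost:", (C * M).sum())

For small $h$ the entries of $\exp(-C/h)$ underflow; practical implementations run the same recursion in the log domain.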

  19. Entropic relaxation in the continuum. Recall $X, Y \subseteq \mathbb{R}^d$, cost $c(x, y)$, and $P, Q$ with densities $\rho_0, \rho_1$. For a density $\nu \in \Pi(\rho_0, \rho_1)$, $\mathrm{Ent}(\nu) = \int \nu(x, y) \log \nu(x, y)\, dx\, dy$. Entropic relaxation: for $h > 0$, minimize $\int c(x, y)\, \nu(x, y)\, dx\, dy + h\, \mathrm{Ent}(\nu)$ over $\nu \in \Pi(\rho_0, \rho_1)$.

  20. Entropic relaxation: continuum solution (Hobby-Pyke '65, Rüschendorf-Thomsen '93). Optimal solution: $\nu_c(x, y) = \exp\left( a(x) + b(y) - \frac{1}{h} c(x, y) \right) = u(x) \exp\left( -\frac{1}{h} c(x, y) \right) v(y)$. Just like the discrete case. Can be computed by IPFP; unfortunately, convergence is very slow.

  21. Entropic duality. Recall the duality for MK-OT: $\inf_{\Pi(\rho_0, \rho_1)} \int c(x, y)\, \nu(x, y)\, dx\, dy = \sup_{\varphi(y) - \psi(x) \le c(x, y)} \left[ \int \varphi(y) \rho_1(y)\, dy - \int \psi(x) \rho_0(x)\, dx \right]$. Duality for the entropic relaxation: solve the unconstrained problem $\sup_{\varphi, \psi} \left[ \int \varphi(y) \rho_1(y)\, dy - \int \psi(x) \rho_0(x)\, dx - h \int e^{\varphi(y) - \psi(x) - \frac{1}{h} c(x, y)}\, dx\, dy \right]$. Optimal solutions: $\varphi(y) = b(y)$, $\psi(x) = -a(x)$. $a, b$ are the Schrödinger potentials.

  22. Schrödinger bridges, Large Deviations

  23. Schrödinger's problem: lazy gas experiment. Imagine $N \approx \infty$ independent gas molecules in a cold chamber. Initial configuration of particles: $L_0 = \frac{1}{N} \sum_{i=1}^N \delta_{x_i} \approx P$. Each particle performs an independent Brownian motion with $\sigma^2 \approx 0$. Condition on the terminal configuration $L_1 = \frac{1}{N} \sum_{j=1}^N \delta_{y_j} \approx Q$. (Schrödinger '32) What is the probability of the above event? What is the most likely path followed by an individual gas molecule?

  24. Föllmer's reformulation '88. Relative entropy (RE) of $\mu$ w.r.t. $\nu$: $H(\mu \mid \nu) = \int \log\left( \frac{d\mu}{d\nu} \right) d\mu$. $R$ - law of $\sigma^2$-BM on $C[0, 1]$ with initial distribution $P$. Among all probabilities $\mu$ on $C[0, 1]$ s.t. $X_0 \sim P$, $X_1 \sim Q$, minimize $H(\mu \mid R)$. The solution is the Schrödinger bridge between $P$ and $Q$. Take $\sigma^2 \downarrow 0$.

  25. Föllmer's disintegration. Brownian transition density: $p_\sigma(x, y) = \frac{1}{(\sigma \sqrt{2\pi})^d} \exp\left( -\frac{1}{2\sigma^2} \|y - x\|^2 \right)$. (Föllmer) Let $R_{01}$ be the law of $(X_0, X_1)$. Find $\nu \in \Pi(P, Q)$ to minimize $H(\nu \mid R_{01})$. Generate $(X_0, X_1)$ from the minimizer; the Schrödinger bridge is the $\sigma^2$-Brownian bridge given $X_0 = x_0$, $X_1 = x_1$.
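
The second stage is easy to simulate. A small sketch (my illustration): given endpoints $(x_0, x_1)$, tilt a $\sigma^2$-Brownian path to end at $x_1$, which yields the Brownian bridge.

    import numpy as np

    def brownian_bridge(x0, x1, sigma=0.3, n_steps=100, rng=None):
        rng = rng or np.random.default_rng()
        t = np.linspace(0.0, 1.0, n_steps + 1)
        dW = sigma * np.sqrt(1.0 / n_steps) * rng.normal(size=n_steps)
        W = x0 + np.concatenate([[0.0], np.cumsum(dW)])  # sigma^2-BM started at x0
        return t, W + t * (x1 - W[-1])                   # bridge: forced to hit x1 at t = 1

    t, B = brownian_bridge(x0=-1.0, x1=2.0)
    print(B[0], B[-1])   # endpoints are exactly x0 and x1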

  26. Entropic relaxation and Schrödinger bridge. Minimizing $H(\nu \mid R_{01})$ is the same problem as minimizing $\int \frac{1}{2} \|y - x\|^2\, d\nu + \sigma^2\, \mathrm{Ent}(\nu)$: the entropic relaxation with $h = \sigma^2$ for the quadratic cost. Schrödinger bridge description: solve the entropic relaxation and join the endpoints by Brownian bridges. What happens when $\sigma^2 \downarrow 0$?
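
The equivalence is one line of algebra, spelled out here since the slide states it without proof. The density of $R_{01}$ is $r_{01}(x, y) = \rho_0(x)\, p_\sigma(x, y)$, so

$$H(\nu \mid R_{01}) = \int \nu \log \nu\, dx\, dy - \int \nu \log r_{01}\, dx\, dy = \mathrm{Ent}(\nu) + \frac{1}{2\sigma^2} \int \|y - x\|^2\, d\nu - \int \log \rho_0(x)\, d\nu + \text{const},$$

and the $\int \log \rho_0\, d\nu$ term is the same for every $\nu \in \Pi(\rho_0, \rho_1)$ because the first marginal is fixed. Multiplying through by $\sigma^2$ gives the stated objective up to constants.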

  27. Large deviations. As $h = \sigma^2 \to 0^+$, the optimal entropic coupling converges to the MK-optimal coupling. Recall Brenier: $P(dx) = \rho_0(x)\, dx$, $Q(dy) = \rho_1(y)\, dy$; $\exists F$ such that $y = \nabla F(x)$ gives the Monge solution. The $\sigma^2$-Brownian bridge converges to the constant-velocity straight line joining $x$ and $y$. This can be made precise by large deviation theory. Let $\rho_t$ be the law at time $t$ of this limit: the McCann interpolation between $\rho_0$ and $\rho_1$. Remember this name for later.

  28. $(f, g)$-transform of Markov processes. How do we describe the law of Schrödinger bridges? SDE? PDE? Markovian $(f, g)$-transform of the reversible Wiener measure $W$: $d\mu = f(X_0)\, g(X_1)\, dW$, $E_W[f(X_0) g(X_1)] = 1$. Similar to Girsanov / Doob's $h$-transform, but conditioned on both sides. A Markovian diffusion both forward and backward in time.

  29. Generators for Schrödinger bridges. Let $\mu_t$ be the law of the $\sigma^2 = 1$ Schrödinger bridge. Recall the Schrödinger potentials $a(x), b(y)$. Define the heat flows $b_t(y) = \log E_W\left[ e^{b(X_1)} \mid X_t = y \right]$ and $a_t(x) = \log E_W\left[ e^{a(X_0)} \mid X_t = x \right]$. The Schrödinger bridge is a BM with drift $\nabla b_t$ forward in time, and a BM with drift $\nabla a_t$ backward in time. Most properties are poorly understood.
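
One standard step behind the forward drift, added here as a gloss (my addition, following the usual $h$-transform computation): $h(t, y) := e^{b_t(y)}$ is space-time harmonic for the heat semigroup, so $b_t$ solves a Hamilton-Jacobi-Bellman equation,

$$\partial_t h + \tfrac{1}{2} \Delta h = 0, \quad h(1, \cdot) = e^{b} \qquad \Longrightarrow \qquad \partial_t b_t + \tfrac{1}{2} \Delta b_t + \tfrac{1}{2} |\nabla b_t|^2 = 0,$$

exactly as in Doob's $h$-transform, where the conditioned process acquires the drift $\nabla \log h = \nabla b_t$.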

  30. Dynamics and geometry

  31. McCann interpolation. Figure: by M. Cuturi. $\mathcal{P}_2(\mathbb{R}^d)$ - square-integrable probabilities. Recall: $\rho_0$ transported to $\rho_1$, $c(x, y) = \frac{1}{2} \|y - x\|^2$. The square root of the optimal cost, $W_2(\rho_0, \rho_1)$, is a metric. $\rho_t$ = law of $(1 - t) X + t\, T(X)$, $X \sim \rho_0$, $0 \le t \le 1$.
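
In one dimension this is immediate to compute from samples (my illustration, reusing the rank-matching map of slide 10): $\rho_t$ interpolates the particles, not the densities.

    import numpy as np

    rng = np.random.default_rng(3)
    x = np.sort(rng.normal(-2.0, 0.5, size=2000))  # samples from rho_0
    y = np.sort(rng.normal(3.0, 1.5, size=2000))   # T(x) by rank matching: samples from rho_1

    for t in (0.0, 0.5, 1.0):
        z = (1 - t) * x + t * y                    # samples from rho_t
        print(f"t={t}: mean={z.mean():.2f}, std={z.std():.2f}")

Unlike the mixture $(1 - t)\rho_0 + t \rho_1$, the mass actually moves: the mean travels from $-2$ to $3$ and the spread interpolates from $0.5$ to $1.5$.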
