  1. On entropic cost – optimal transport cost Soumik Pal University of Washington, Seattle arxiv:1905.12206 Eigenfunctions seminar @ IISc Bangalore, August 30, 2019

  2. MK OT and entropic relaxation. ρ0, ρ1 - probability densities on X = R^d = Y. Cost c(x, y) = g(x − y), with g strictly convex, g ≥ 0, and g(z) = 0 iff z = 0. Π(ρ0, ρ1) - set of couplings: probabilities on X × Y with marginals ρ0, ρ1. Monge-Kantorovich (MK) OT problem:

     W_g(ρ0, ρ1) := inf_{ν ∈ Π} ν(g(x − y)) = inf_{ν ∈ Π} ∫ g(x − y) dν.

  Entropic relaxation (Cuturi, Peyré). For h > 0,

     K'_h := inf_{ν ∈ Π} [ ν(g(x − y)) + h Ent(ν) ],   Ent(ν) = ∫ ν(x) log ν(x) dx.

  Fast algorithms exist for h > 0. Want h → 0.
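The fast algorithms referred to here are Sinkhorn iterations (Cuturi): for discrete marginals, the entropic problem is solved by alternately rescaling the rows and columns of the Gibbs kernel. A minimal Python sketch; the point clouds, weights, and value of h below are illustrative, not from the talk:

```python
import numpy as np

def sinkhorn(mu, nu, C, h, iters=2000):
    """Entropic OT: minimize <C, P> + h * sum(P log P) over couplings P
    of the discrete marginals mu, nu, where C[i, j] = g(x_i - y_j)."""
    K = np.exp(-C / h)                  # Gibbs kernel exp(-g/h)
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)              # enforce second marginal
        u = mu / (K @ v)                # enforce first marginal
    return u[:, None] * K * v[None, :]  # coupling P = diag(u) K diag(v)

# Tiny example: uniform weights on two point clouds, quadratic cost
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5, 2.5])
C = 0.5 * (x[:, None] - y[None, :]) ** 2
mu = np.full(3, 1 / 3)
nu = np.full(3, 1 / 3)
P = sinkhorn(mu, nu, C, h=0.5)
```

As h → 0 the returned coupling approaches an optimal (unregularized) transport plan, but the iteration count needed grows.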

  3. Entropic cost. An equivalent form of the entropic relaxation. Define a "transition kernel"

     p_h(x, y) = Λ_h^{−1} exp( −(1/h) g(x − y) ),

  and the joint distribution µ_h(x, y) = ρ0(x) p_h(x, y). Relative entropy:

     H(ν | µ) = ∫ log(dν/dµ) dν.

  Define the entropic cost

     K_h = inf_{ν ∈ couplings(ρ0, ρ1)} H(ν | µ_h).

  Then K_h = K'_h / h − Ent(ρ0) + log Λ_h.
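The stated identity between K_h and K'_h follows by expanding H(ν | µ_h) against the definition of µ_h, using that every ν ∈ Π has first marginal ρ0. A short derivation:

```latex
\begin{aligned}
H(\nu \mid \mu_h)
 &= \int \log \frac{\nu(x,y)}{\rho_0(x)\, p_h(x,y)} \, d\nu \\
 &= \operatorname{Ent}(\nu) - \int \log \rho_0(x)\, d\nu
    - \int \log p_h(x,y)\, d\nu \\
 &= \operatorname{Ent}(\nu) - \operatorname{Ent}(\rho_0)
    + \log \Lambda_h + \tfrac{1}{h}\, \nu\big(g(x-y)\big) \\
 &= \tfrac{1}{h}\Big[ \nu\big(g(x-y)\big) + h \operatorname{Ent}(\nu) \Big]
    - \operatorname{Ent}(\rho_0) + \log \Lambda_h .
\end{aligned}
```

Taking the infimum over ν ∈ Π(ρ0, ρ1) on both sides gives K_h = K'_h / h − Ent(ρ0) + log Λ_h.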

  4. Example: quadratic Wasserstein. Consider g(x − y) = (1/2)‖x − y‖². Then p_h(x, y) is the transition density of Brownian motion run for time h, with h playing the role of temperature:

     p_h(x, y) = (2πh)^{−d/2} exp( −‖x − y‖² / (2h) ).

  In general, there need not be a stochastic process behind p_h(x, y).

  Theorem (Y. Brenier ’87). There exists a unique convex φ such that T(x) = ∇φ(x) solves both the Monge and the Kantorovich OT problems for (ρ0, ρ1).

  5. Schrödinger’s problem. Brownian motion X at temperature h ≈ 0. "Condition" on X_0 ∼ ρ0, X_1 ∼ ρ1 - an exponentially rare event. On this rare event, what do the particles do? (Schrödinger ’31, Föllmer ’88, Léonard ’12.) A particle initially at x moves close to ∇φ(x), the Brenier map. In fact,

     lim_{h→0} h K_h = (1/2) W_2²(ρ0, ρ1).

  True in general: for any cost g(x − y),

     lim_{h→0} h K_h = W_g(ρ0, ρ1).

  Rate of convergence?

  6. Pointwise convergence. Theorem (P. ’19). Suppose ρ0, ρ1 are compactly supported and continuous (+ smoothness etc.) and the Kantorovich potential is uniformly convex. Then

     lim_{h→0+} [ K_h − (1/(2h)) W_2²(ρ0, ρ1) ] = (1/2) ( Ent(ρ1) − Ent(ρ0) ).

  Complementary results are known for Gamma convergence; pointwise convergence was left open. Adams, Dirr, Peletier, Zimmer ’11 (1-d); Duong, Laschos, Renger ’13; Erbar, Maas, Renger ’15 (multidimensional, Fokker-Planck).

  7. Divergence. To state the result for a general g, we need a new concept. For a convex function φ, the Bregman divergence is

     D[y | z] = φ(y) − φ(z) − (y − z) · ∇φ(z) ≥ 0.

  If x* = ∇φ(x), then

     D[y | x*] = (1/2)‖y − x‖² − φ_c(x) − φ*_c(y),

  where φ_c, φ*_c are the c-concave functions

     φ_c(x) = (1/2)‖x‖² − φ(x),   φ*_c(y) = (1/2)‖y‖² − φ*(y).

  For y ≈ x*,

     D[y | x*] ≈ (1/2) (y − x*)^T A(x*) (y − x*),   A(z) = ∇²φ*(z).
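The definition above is easy to check numerically for a concrete strictly convex function. A sketch; the choice φ(x) = Σ_i exp(x_i) is mine, purely for illustration:

```python
import numpy as np

def bregman(phi, grad, y, z):
    """Bregman divergence D[y|z] = phi(y) - phi(z) - (y - z) . grad(z)."""
    return phi(y) - phi(z) - (y - z) @ grad(z)

phi  = lambda x: np.sum(np.exp(x))   # a smooth, strictly convex function
grad = lambda x: np.exp(x)           # its gradient (componentwise exp)

rng = np.random.default_rng(0)
z = rng.normal(size=5)
y = z + 1e-3 * rng.normal(size=5)    # a nearby point

D = bregman(phi, grad, y, z)
# Second-order Taylor: D ~ (1/2) (y-z)^T Hess(z) (y-z), Hess = diag(exp(z))
quad = 0.5 * (y - z) @ (np.exp(z) * (y - z))
D_far = bregman(phi, grad, z + 1.0, z)   # nonnegative also far away
```

For y near z the divergence agrees with its quadratic approximation up to third-order terms, which is exactly how the matrix A(·) is extracted on the next slide.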

  8. Divergence. Generalize to cost g. The Monge solution is given by (Gangbo-McCann)

     x* = x − (∇g)^{−1} ∘ ∇ψ(x),

  for some c-concave function ψ, with dual c-concave function ψ*. Divergence:

     D[y | x*] = g(x − y) − ψ(x) − ψ*(y) ≥ 0.

  For y ≈ x*, extract the matrix A(x*) from the Taylor expansion. The divergence / A(·) measures the sensitivity of the Monge map. Related to the cross-difference of Kim & McCann ’10, McCann ’12, Yang & Wong ’19.

  9. Pointwise convergence. Theorem (P. ’19). ρ0, ρ1 compactly supported, continuous (+ smoothness etc.), A(·) "uniformly elliptic". Then

     lim_{h→0+} [ K_h − (1/h) W_g(ρ0, ρ1) ] = (1/2) ∫ ρ1(y) log det A(y) dy − (1/2) log det ∇²g(0).

  For g(x − y) = ‖x − y‖²/2, log det ∇²g(0) = 0, and with φ the Brenier potential,

     (1/2) ∫ ρ1(y) log det A(y) dy = (1/2) ∫ ρ1(y) log det ∇²φ*(y) dy,

  which equals (1/2)(Ent(ρ1) − Ent(ρ0)) by a simple calculation à la McCann.

  10. The Dirichlet transport

  11. Dirichlet transport, P.-Wong ’16. Δ_n - the unit simplex {(p_1, …, p_n) : p_i > 0, Σ_i p_i = 1}. (Δ_n, ⊙) is an abelian group with identity e = (1/n, …, 1/n): for p, q ∈ Δ_n,

     (p ⊙ q)_i = p_i q_i / Σ_{j=1}^n p_j q_j,   (p^{−1})_i = (1/p_i) / Σ_{j=1}^n (1/p_j).

  K-L divergence (relative entropy) as "distance":

     H(q | p) = Σ_{i=1}^n q_i log(q_i / p_i).

  Take X = Y = Δ_n and cost

     c(p, q) = H( e | p^{−1} ⊙ q ) = log( (1/n) Σ_{i=1}^n q_i/p_i ) − (1/n) Σ_{i=1}^n log(q_i/p_i) ≥ 0.
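The group operations and the cost are one-liners to compute. A small Python sketch (function names mine); the final cost expression is nonnegative by the AM-GM inequality applied to the ratios q_i/p_i:

```python
import numpy as np

def mult(p, q):
    """Group operation on the simplex: (p ⊙ q)_i ∝ p_i q_i."""
    r = p * q
    return r / r.sum()

def inv(p):
    """Group inverse: (p^{-1})_i ∝ 1 / p_i."""
    r = 1.0 / p
    return r / r.sum()

def cost(p, q):
    """c(p, q) = H(e | p^{-1} ⊙ q) = log(mean(q/p)) - mean(log(q/p))."""
    r = q / p
    return np.log(r.mean()) - np.log(r).mean()

n = 4
e = np.full(n, 1.0 / n)                 # identity element of (Δ_n, ⊙)
rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(n))
q = rng.dirichlet(np.ones(n))
```

One can verify p ⊙ p^{−1} = e, p ⊙ e = p, c(p, p) = 0, and c(p, q) > 0 for p ≠ q.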

  12. Some economic motivation. Market weights for n stocks: µ = (µ_1, …, µ_n), where µ_i is the proportion of the total market capital that belongs to the i-th stock. Investment portfolio: π = (π_1, …, π_n) ∈ Δ_n, where π_i is the proportion of the total portfolio value invested in the i-th stock. Markovian investments: π = π(µ) : Δ_n → Δ_n. How does one build robust portfolios that compare well with an index, say the S&P 500? The ONLY solutions are given by the Dirichlet transport.

  13. Exponentially concave functions. ϕ : Δ_n → R ∪ {−∞} is exponentially concave if e^ϕ is concave. For example, x ↦ (1/2) log x is e-concave, but x ↦ 2 log x is not. Examples (p, r ∈ Δ_n, 0 < λ < 1):

     ϕ(p) = (1/n) Σ_i log p_i,   ϕ(p) = log( Σ_i r_i p_i ),   ϕ(p) = (1/λ) log( Σ_i p_i^λ ).

  (Fernholz ’02, P. and Wong ’15.) Analog of Brenier’s theorem: if (p, q = F(p)) is the Monge solution, then p^{−1} is given by the (multiplicative) gradient map of an exponentially concave Kantorovich potential ϕ evaluated at q. Smoothness, MTW condition: Khan & Zhang ’19.
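The two one-line claims above can be illustrated numerically with a midpoint-concavity test; a sketch of my own, not from the talk. For ϕ(p) = (1/n) Σ_i log p_i, e^ϕ is the geometric mean, which is concave; for ϕ(x) = 2 log x, e^ϕ = x² is convex, so ϕ is not e-concave:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

def geo_mean(p):
    """e^{phi(p)} for phi(p) = (1/n) sum_i log p_i: the geometric mean."""
    return np.prod(p) ** (1.0 / n)

# Midpoint concavity of the geometric mean on random simplex pairs
ok = True
for _ in range(100):
    p = rng.dirichlet(np.ones(n))
    q = rng.dirichlet(np.ones(n))
    ok &= geo_mean(0.5 * (p + q)) >= 0.5 * (geo_mean(p) + geo_mean(q)) - 1e-12

# x -> 2 log x is NOT e-concave: e^{2 log x} = x^2 violates midpoint concavity
x, y = 1.0, 3.0
not_concave = ((x + y) / 2) ** 2 < (x ** 2 + y ** 2) / 2
```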

  14. Back to the Dirichlet transport. What is the corresponding probabilistic picture for the cost function c(p, q) = H(e | p^{−1} ⊙ q) on the unit simplex Δ_n? The symmetric Dirichlet distribution Dir(λ) is the probability distribution on the unit simplex with

     density ∝ Π_{j=1}^n p_j^{λ/n − 1}.

  If U ∼ Dir(λ), then

     E(U) = e,   Var(U_i) = O(1/λ).
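These moment facts are easy to check by simulation: the symmetric Dir(λ) here has all n Dirichlet parameters equal to λ/n, matching the density above. A sketch with illustrative values of n and λ:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

def dir_lambda(lam, size):
    """Sample the symmetric Dir(lam): all n parameters equal to lam/n."""
    return rng.dirichlet(np.full(n, lam / n), size=size)

e = np.full(n, 1.0 / n)
U1 = dir_lambda(10.0, 200_000)
U2 = dir_lambda(100.0, 200_000)

mean_err = np.abs(U1.mean(axis=0) - e).max()  # E(U) = e
v1 = U1.var(axis=0).max()                     # Var = O(1/lambda):
v2 = U2.var(axis=0).max()                     # shrinks as lambda grows
```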

  15. Dirichlet transition. The Haar measure on (Δ_n, ⊙) is Dir(0), with density ν(p) = Π_{i=1}^n p_i^{−1}. Consider the transition probability: for p ∈ Δ_n and U ∼ Dir(λ), let Q = p ⊙ U. Its density is

     f_λ(p, q) = c_λ ν(q) exp( −λ c(p, q) )   (P.-Wong ’18),

  where c_λ is a normalizing constant. Temperature: h = 1/λ. Let p_h(p, q) = f_{1/h}(p, q). As h → 0+, p_h → δ_p. As h → ∞, Q → Dir(0), the Haar measure.
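The low-temperature limit p_h → δ_p can be seen directly by simulating the transition Q = p ⊙ U: for large λ (small h = 1/λ) the kernel concentrates near p, while for small λ it spreads out. A sketch with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
p = np.array([0.1, 0.2, 0.3, 0.4])

def step(p, lam, size):
    """One Dirichlet transition: Q = p ⊙ U with U ~ Dir(lam)."""
    U = rng.dirichlet(np.full(n, lam / n), size=size)
    Q = p * U
    return Q / Q.sum(axis=1, keepdims=True)   # renormalize: ⊙ on the simplex

# Mean deviation of Q from p at low vs. high temperature h = 1/lam
d_cold = np.abs(step(p, 1e4, 50_000) - p).max(axis=1).mean()
d_hot  = np.abs(step(p, 1.0, 50_000) - p).max(axis=1).mean()
```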

  16. Multiplicative Schrödinger problem. Fix ρ0, ρ1 on Δ_n and let µ_h(p, q) = ρ0(p) p_h(p, q). Recall the relative entropy H(ν | µ) = ∫ log(dν/dµ) dν. Entropic cost:

     K_h = inf_{ν ∈ couplings(ρ0, ρ1)} H(ν | µ_h).

  For a density ρ on Δ_n, let Ent_0(ρ) = H(ρ | Dir(0)), the relative entropy w.r.t. the Haar measure.

  17. Pointwise convergence. Theorem (P. ’19). Suppose ρ0, ρ1 are compactly supported and the exponentially concave potential is "uniformly convex". Then

     lim_{h→0+} [ K_h − (1/h) C(ρ0, ρ1) ] = (1/2) ( Ent_0(ρ1) − Ent_0(ρ0) ).

  Here C(ρ0, ρ1) is the optimal cost of transport with cost c. Not a metric, but a divergence: not symmetric in (ρ0, ρ1). AFAIK, the only such example known. Related to Erbar ’14 (jump processes) and Maas ’11 (Markov chains).

  18. Idea of the proof: approximate Schrödinger bridge

  19. Idea of the proof: Brownian case. Recall, we want to condition Brownian motion to have marginals ρ0, ρ1. Here p_h(x, y) is the Brownian transition density at time h, and µ_h(x, y) = ρ0(x) p_h(x, y) is the joint distribution. If we can "guess" the conditioned coupling µ̂_h, then

     K_h = inf_{ν ∈ couplings(ρ0, ρ1)} H(ν | µ_h) = H(µ̂_h | µ_h).

  We can do so approximately for small h by a Taylor expansion in h.

  20. Idea of the proof: Brownian case. It is known (Rüschendorf) that µ̂_h must be of the form

     µ̂_h(x, y) = e^{a(x) + b(y)} µ_h(x, y) ∝ exp( −(1/h) g(x − y) + a(x) + b(y) ).

  With φ the convex Brenier potential,

     a(x) = (1/h) ( ‖x‖²/2 − φ(x) ) + ζ_h(x),   b(y) = (1/h) ( ‖y‖²/2 − φ*(y) ) + ξ_h(y),

  where ζ_h, ξ_h are O(1).

  21. Idea of the proof. Thus, up to lower order terms,

     µ̂_h(x, y) ∝ ρ0(x) exp( −(1/h) [ g(x − y) − φ_c(x) − φ*_c(y) ] ) = ρ0(x) exp( −(1/h) D[y | x*] ).

  If y − x* is large, it is penalized exponentially. Hence

     µ̂_h(x, y) ≈ ρ0(x) exp( −(1/(2h)) (y − x*)^T ∇²φ*(x*) (y − x*) ):

  a Gaussian transition kernel with mean x* and covariance h ( ∇²φ*(x*) )^{−1}.

  22. Idea of the proof. For h ≈ 0, the Schrödinger bridge is approximately Gaussian: sample X ∼ ρ0 and generate Y ∼ N( x*, h ( ∇²φ*(x*) )^{−1} ). Then

     µ̂_h(x, y) ≈ ρ0(x) (2πh)^{−d/2} √( det ∇²φ*(x*) ) exp( −(1/(2h)) (y − x*)^T ∇²φ*(x*) (y − x*) ).

  Y is not exactly distributed as ρ1; there are lower order corrections. Nevertheless,

     H(µ̂_h | µ_h) = (1/2) ∫ log det ∇²φ*(x*) ρ0(x) dx = (1/2) ( Ent(ρ1) − Ent(ρ0) ).
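A sanity check of this Gaussian picture in a case where everything is explicit (my own example, not from the talk): for 1-d centered Gaussians ρ0 = N(0, σ0²) and ρ1 = N(0, σ1²), the Brenier map is T(x) = (σ1/σ0)x, φ*(y) = (σ0/(2σ1))y², so ∇²φ* = σ0/σ1 and the recipe samples Y ∼ N(x*, h σ1/σ0), whose marginal variance is σ1² + O(h):

```python
import numpy as np

rng = np.random.default_rng(5)
s0, s1, h = 1.0, 2.0, 0.01
N = 400_000

X = s0 * rng.normal(size=N)          # X ~ rho_0 = N(0, s0^2)
x_star = (s1 / s0) * X               # Brenier map T(x) = (s1/s0) x
# Approximate bridge: Y ~ N(x*, h / phi*''(x*)), with phi*'' = s0/s1
Y = x_star + np.sqrt(h * s1 / s0) * rng.normal(size=N)

var_Y = Y.var()
target = s1 ** 2 + h * s1 / s0       # = s1^2 + O(h): rho_1 up to lower order
```

The simulated marginal of Y matches ρ1 up to the O(h) correction, which is exactly the "lower order corrections" mentioned above.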
