On entropic cost – optimal transport cost
Soumik Pal, University of Washington, Seattle
arXiv:1905.12206. Eigenfunctions seminar @ IISc Bangalore, August 30, 2019.


SLIDE 1

On entropic cost – optimal transport cost

Soumik Pal University of Washington, Seattle arxiv:1905.12206 Eigenfunctions seminar @ IISc Bangalore, August 30, 2019

SLIDE 2

MK OT and entropic relaxation

ρ0, ρ1: probability densities on X = R^d = Y. Cost c(x, y) = g(x − y), with g strictly convex, g ≥ 0, and g(z) = 0 iff z = 0. Π(ρ0, ρ1): the set of couplings, i.e. probabilities on X × Y with marginals ρ0, ρ1.

Monge–Kantorovich (MK) OT problem:

  W_g(ρ0, ρ1) := inf_{ν ∈ Π} ν(g(x − y)) = inf_{ν ∈ Π} ∫ g(x − y) dν.

Entropic relaxation (Cuturi, Peyré). For h > 0,

  K′_h := inf_{ν ∈ Π} [ ν(g(x − y)) + h Ent(ν) ],   Ent(ν) = ∫ ν log ν.

Fast algorithms exist for h > 0. Want h → 0.
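The "fast algorithms" alluded to here are Sinkhorn-type matrix scalings (Cuturi '13). A minimal sketch on a discrete grid with the quadratic cost; the discretization and all names are illustrative, not from the talk:

```python
import numpy as np

def sinkhorn(rho0, rho1, C, h, n_iter=1000):
    """Approximate argmin over couplings of <C, nu> + h * Ent(nu)."""
    K = np.exp(-C / h)                    # Gibbs kernel exp(-g(x - y)/h)
    u = np.ones_like(rho0)
    v = np.ones_like(rho1)
    for _ in range(n_iter):               # alternating marginal scalings
        u = rho0 / (K @ v)
        v = rho1 / (K.T @ u)
    return u[:, None] * K * v[None, :]    # entropic coupling nu_h

# two discretized densities on a common grid, quadratic cost g(z) = z^2/2
x = np.linspace(-1.0, 1.0, 60)
rho0 = np.exp(-((x + 0.3) ** 2)); rho0 /= rho0.sum()
rho1 = np.exp(-((x - 0.3) ** 2)); rho1 /= rho1.sum()
C = 0.5 * (x[:, None] - x[None, :]) ** 2
nu = sinkhorn(rho0, rho1, C, h=0.5)
print(np.abs(nu.sum(axis=1) - rho0).max(),
      np.abs(nu.sum(axis=0) - rho1).max())
```

Each iteration only needs two matrix–vector products, which is what makes the h > 0 problem cheap compared to exact OT.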

SLIDE 3

Entropic cost

An equivalent form of the entropic relaxation. Define the "transition kernel"

  p_h(x, y) = (1/Λ_h) exp( −(1/h) g(x − y) ),

and the joint distribution µ_h(x, y) = ρ0(x) p_h(x, y). Relative entropy:

  H(ν | µ) = ∫ log(dν/dµ) dν.

Define the entropic cost

  K_h = inf over couplings(ρ0, ρ1) of H(ν | µ_h).

The two costs are related by K_h = K′_h/h − Ent(ρ0) + log Λ_h.
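The stated identity between K_h and K′_h can be checked directly by expanding the relative entropy against µ_h = ρ0 ⊗ p_h; a short sketch, using that every coupling ν has first marginal ρ0:

```latex
\begin{aligned}
H(\nu \mid \mu_h)
 &= \int \log \frac{\nu(x,y)}{\rho_0(x)\,p_h(x,y)} \, d\nu \\
 &= \mathrm{Ent}(\nu) - \int \log \rho_0(x)\, d\nu
    + \frac{1}{h}\int g(x-y)\, d\nu + \log \Lambda_h \\
 &= \mathrm{Ent}(\nu) - \mathrm{Ent}(\rho_0)
    + \frac{1}{h}\, \nu(g(x-y)) + \log \Lambda_h .
\end{aligned}
```

Taking the infimum over ν ∈ Π(ρ0, ρ1) on both sides gives K_h = K′_h/h − Ent(ρ0) + log Λ_h.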

SLIDE 4

Example: quadratic Wasserstein

Consider g(x − y) = (1/2) |x − y|².

Then p_h(x, y) is the transition density of Brownian motion, with h playing the role of temperature:

  p_h(x, y) = (2πh)^{−d/2} exp( −|x − y|²/(2h) ).

In general, there need not be a stochastic process behind p_h(x, y).

Theorem (Y. Brenier '87)

There exists a unique convex φ such that T(x) = ∇φ(x) solves both the Monge and the Kantorovich OT problems for (ρ0, ρ1).
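In one dimension the Brenier map is the monotone rearrangement T = F1^{−1} ∘ F0, which is indeed the derivative of a convex function. A sketch checking this against sorted samples for two Gaussians, where T is affine; the example and all names are mine, not from the talk:

```python
import numpy as np

m0, s0, m1, s1 = -0.3, 1.0, 0.5, 2.0
T = lambda x: m1 + (s1 / s0) * (x - m0)   # Brenier map between the Gaussians

rng = np.random.default_rng(0)
n = 100_000
x = np.sort(rng.normal(m0, s0, n))        # sorted rho0-samples: F0-quantiles
y = np.sort(rng.normal(m1, s1, n))        # sorted rho1-samples: F1-quantiles
# the empirical monotone rearrangement sends the i-th quantile of rho0 to
# the i-th quantile of rho1; compare with T away from the noisy tails
mid = slice(n // 10, -n // 10)
print(np.abs(y[mid] - T(x[mid])).max())
```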

SLIDE 5

Schrödinger’s problem

Brownian motion X at temperature h ≈ 0. "Condition" on X_0 ∼ ρ0, X_1 ∼ ρ1: an exponentially rare event. On this rare event, what do the particles do? Schrödinger '31, Föllmer '88, Léonard '12. A particle initially at x moves close to ∇φ(x) (the Brenier map). In fact,

  lim_{h→0} h K_h = (1/2) W_2²(ρ0, ρ1).

True in general: for any g(x − y),

  lim_{h→0} h K_h = W_g(ρ0, ρ1).

Rate of convergence?

SLIDE 6

Pointwise convergence

Theorem (P. '19)

Assume ρ0, ρ1 are compactly supported and continuous (+ smoothness etc.), and the Kantorovich potential is uniformly convex. Then

  lim_{h→0+} [ K_h − (1/(2h)) W_2²(ρ0, ρ1) ] = (1/2) (Ent(ρ1) − Ent(ρ0)).

Complementary results are known for Gamma convergence; pointwise convergence was left open. Adams, Dirr, Peletier, Zimmer '11 (1-d); Duong, Laschos, Renger '13; Erbar, Maas, Renger '15 (multidimensional, Fokker–Planck).

SLIDE 7

Divergence

To state the result for a general g, we need a new concept. For a convex function φ, the Bregman divergence is

  D[y | z] = φ(y) − φ(z) − (y − z) · ∇φ(z) ≥ 0.

If x* = ∇φ(x), then

  D[y | x*] = (1/2) |x − y|² − φ_c(x) − φ*_c(y),

where φ_c, φ*_c are the c-concave functions

  φ_c(x) = (1/2) |x|² − φ(x),   φ*_c(y) = (1/2) |y|² − φ*(y).

For y ≈ x*,

  D[y | x*] ≈ (1/2) (y − x*)^T A(x*) (y − x*),   A(z) = ∇²φ*(z).
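A quick numeric check of these identities, with the illustrative choice φ(x) = e^x (so φ*(y) = y log y − y and (φ*)′ = log); the example is mine, not from the talk:

```python
import numpy as np

phi      = lambda x: np.exp(x)                 # convex potential (example)
phistar  = lambda y: y * np.log(y) - y         # its Legendre transform
grad_phi = lambda x: np.exp(x)

def bregman_phistar(y, z):
    # Bregman divergence of phi*: phi*(y) - phi*(z) - (y - z)(phi*)'(z)
    return phistar(y) - phistar(z) - (y - z) * np.log(z)

x = np.linspace(-1.0, 1.0, 7)
y = np.linspace(0.5, 2.5, 7)
xstar = grad_phi(x)                            # x* = grad phi(x)

phic     = lambda x: 0.5 * x**2 - phi(x)       # c-concave potentials
phicstar = lambda y: 0.5 * y**2 - phistar(y)
lhs = 0.5 * (x - y)**2 - phic(x) - phicstar(y)
rhs = bregman_phistar(y, xstar)
print(np.abs(lhs - rhs).max())                 # the two formulas agree

# quadratic approximation near x*: D[y|x*] ~ (1/2)(y - x*)^2 A(x*),
# where A(z) = (phi*)''(z) = 1/z for this phi
z, eps = xstar[3], 1e-3
print(abs(bregman_phistar(z + eps, z) - 0.5 * eps**2 / z))
```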

SLIDE 8

Divergence

Generalize to cost g. The Monge solution is given by (Gangbo–McCann) x* = x − (∇g)^{−1} ∘ ∇ψ(x), for some c-concave function ψ, with dual c-concave function ψ*. Divergence:

  D[y | x*] = g(x − y) − ψ(x) − ψ*(y) ≥ 0.

For y ≈ x*, extract the matrix A(x*) from the Taylor series. The divergence / A(·) measures the sensitivity of the Monge map. Related to the cross-difference of Kim & McCann '10, McCann '12, Yang & Wong '19.

SLIDE 9

Pointwise convergence

Theorem (P. '19)

Assume ρ0, ρ1 are compactly supported and continuous (+ smoothness etc.), and A(·) is "uniformly elliptic". Then

  lim_{h→0+} [ K_h − (1/h) W_g(ρ0, ρ1) ] = (1/2) ∫ ρ1(y) log det(A(y)) dy − (1/2) log det ∇²g(0).

For g(x − y) = |x − y|²/2, log det ∇²g(0) = 0, and for φ (Brenier)

  (1/2) ∫ ρ1(y) log det(A(y)) dy = (1/2) ∫ ρ1(y) log det(∇²φ*(y)) dy,

which equals (1/2) (Ent(ρ1) − Ent(ρ0)) by a simple calculation à la McCann.
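The "calculation à la McCann" rests on the Monge–Ampère equation for the transport ∇φ* pushing ρ1 forward to ρ0; sketched:

```latex
\rho_0(\nabla\phi^*(y))\,\det \nabla^2\phi^*(y) = \rho_1(y)
\;\Longrightarrow\;
\log\det \nabla^2\phi^*(y) = \log\rho_1(y) - \log\rho_0(\nabla\phi^*(y)),
```

and integrating against ρ1, with the change of variables x = ∇φ*(y) in the second term:

```latex
\tfrac12 \int \rho_1(y)\,\log\det\nabla^2\phi^*(y)\,dy
= \tfrac12\Big(\int \rho_1\log\rho_1\,dy - \int \rho_0\log\rho_0\,dx\Big)
= \tfrac12\big(\mathrm{Ent}(\rho_1)-\mathrm{Ent}(\rho_0)\big).
```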

SLIDE 10

The Dirichlet transport

SLIDE 11

Dirichlet transport, P.-Wong ’16

∆_n: the unit simplex {(p1, . . . , pn) : p_i > 0, Σ_i p_i = 1}.

∆_n is an abelian group with identity e = (1/n, . . . , 1/n). If p, q ∈ ∆_n, then

  (p ⊙ q)_i = p_i q_i / Σ_{j=1}^n p_j q_j,   (p^{−1})_i = (1/p_i) / Σ_{j=1}^n (1/p_j).

K-L divergence (relative entropy) as "distance": H(q | p) = Σ_{i=1}^n q_i log(q_i/p_i).

Take X = Y = ∆_n and

  c(p, q) = H( e | p^{−1} ⊙ q ) = log( (1/n) Σ_{i=1}^n q_i/p_i ) − (1/n) Σ_{i=1}^n log(q_i/p_i) ≥ 0.
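These operations transcribe directly into code; a sketch checking that ⊙-inverses recover the identity e, that the closed form of c above matches the definition, and that c is a nonnegative divergence (helper names are mine):

```python
import numpy as np

def mult(p, q):                       # group operation (p ⊙ q)_i ∝ p_i q_i
    r = p * q
    return r / r.sum()

def inv(p):                           # group inverse (p^{-1})_i ∝ 1/p_i
    r = 1.0 / p
    return r / r.sum()

def kl(q, p):                         # relative entropy H(q | p)
    return float(np.sum(q * np.log(q / p)))

def cost(p, q):                       # c(p, q) = H(e | p^{-1} ⊙ q)
    e = np.full(len(p), 1.0 / len(p))
    return kl(e, mult(inv(p), q))

rng = np.random.default_rng(1)
p, q = rng.dirichlet(np.ones(4)), rng.dirichlet(np.ones(4))
e = np.full(4, 0.25)
print(mult(p, inv(p)))                # = e: p ⊙ p^{-1} is the identity
print(cost(p, q), cost(p, p))         # c >= 0, with c(p, p) = 0
```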

SLIDE 12

Some economic motivation

Market weights for n stocks: µ = (µ1, . . . , µn), where µ_i is the proportion of the total market capital that belongs to the ith stock. Investment portfolio: π = (π1, . . . , πn) ∈ ∆_n, where π_i is the proportion of the portfolio value invested in the ith stock. Markovian investments: π = π(µ) : ∆_n → ∆_n. How does one build robust portfolios that compare favorably with an index, say the S&P 500? The ONLY solutions are given by the Dirichlet transport.

SLIDE 13

Exponentially concave functions

ϕ : ∆_n → R ∪ {−∞} is exponentially concave if e^ϕ is concave. For example, x ↦ (1/2) log x is e-concave, but x ↦ 2 log x is not.

Examples (p, r ∈ ∆_n, 0 < λ < 1):

  ϕ(p) = (1/n) Σ_i log p_i,   ϕ(p) = log( Σ_i r_i p_i ),   ϕ(p) = (1/λ) log( Σ_i p_i^λ ).

(Fernholz '02, P. and Wong '15). Analog of Brenier's theorem: if (p, q = F(p)) is the Monge solution, then p^{−1} = ∇ϕ(q) for an exponentially concave Kantorovich potential ϕ. Smoothness, MTW condition: Khan & Zhang '19.

SLIDE 14

Back to the Dirichlet transport

What is the corresponding probabilistic picture for the cost function c(p, q) = H( e | p^{−1} ⊙ q ) on the unit simplex ∆_n?

Symmetric Dirichlet distribution Dir(λ): density ∝ Π_{j=1}^n p_j^{λ/n − 1}, a probability distribution on the unit simplex. If U ∼ Dir(λ), then E(U) = e and Var(U_i) = O(1/λ).
SLIDE 15

Dirichlet transition

The Haar measure on (∆_n, ⊙) is Dir(0): ν(p) ∝ Π_{i=1}^n p_i^{−1}.

Consider the transition probability: p ∈ ∆_n, U ∼ Dir(λ), Q = p ⊙ U, with density

  f_λ(p, q) = c ν(q) exp( −λ c(p, q) )   (P.-Wong '18).

Temperature: h = 1/λ. Let p_h(p, q) = f_{1/h}(p, q). As h → 0+, p_h → δ_p. As h → ∞, Q → Dir(0), the Haar measure.
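A small simulation of this kernel, checking that Q = p ⊙ U concentrates at p as the temperature h = 1/λ shrinks; the helper names and parameters are illustrative:

```python
import numpy as np

def mult(p, q):                         # group operation on the simplex
    r = p * q
    return r / r.sum()

def transition(p, lam, rng):
    # symmetric Dirichlet Dir(lam): density prop. to prod_j p_j^(lam/n - 1)
    u = rng.dirichlet(np.full(len(p), lam / len(p)))
    return mult(p, u)

rng = np.random.default_rng(2)
p = np.array([0.5, 0.3, 0.2])
spread = {}
for lam in (10.0, 1000.0):              # h = 1/lam: large vs small temperature
    qs = np.array([transition(p, lam, rng) for _ in range(2000)])
    spread[lam] = float(np.abs(qs - p).mean())
print(spread)                           # spread shrinks as lam grows (h -> 0)
```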

SLIDE 16

Multiplicative Schrödinger problem

Fix ρ0, ρ1. Let µ_h(p, q) = ρ0(p) p_h(p, q). Recall the relative entropy H(ν | µ) = ∫ log(dν/dµ) dν.

Entropic cost:

  K_h = inf over couplings(ρ0, ρ1) of H(ν | µ_h).

For a density ρ on ∆_n, let Ent_0(ρ) = H( ρ | Dir(0) ), the relative entropy w.r.t. the Haar measure.

SLIDE 17

Pointwise convergence

Theorem (P. '19)

Assume ρ0, ρ1 are compactly supported and the exponentially concave potential is "uniformly convex". Then

  lim_{h→0+} [ K_h − (1/h − n/2) C(ρ0, ρ1) ] = (1/2) (Ent_0(ρ1) − Ent_0(ρ0)).

Here C(ρ0, ρ1) is the optimal cost of transport with cost c: not a metric, but a divergence, not symmetric in (ρ0, ρ1). AFAIK, this is the only such example known. Related to Erbar '14 (jump processes) and Maas '11 (Markov chains).

SLIDE 18

Idea of the proof: approximate Schrödinger bridge

SLIDE 19

Idea of the proof: Brownian case

Recall: we want to condition Brownian motion to have marginals ρ0, ρ1. Here p_h(x, y) is the Brownian transition density at time h, and µ_h(x, y) = ρ0(x) p_h(x, y) is the joint distribution. If I can "guess" the minimizing joint distribution µ̂_h, then

  K_h = inf over couplings(ρ0, ρ1) of H(ν | µ_h) = H(µ̂_h | µ_h).

This can be done approximately for small h by a Taylor expansion in h.

SLIDE 20

Idea of the proof: Brownian case

It is known (Rüschendorf) that µ̂_h must be of the form

  µ̂_h(x, y) = e^{a(x) + b(y)} µ_h(x, y) ∝ exp( −(1/h) g(x − y) + a(x) + b(y) ).

With φ the convex function giving the Brenier map,

  a(x) = (1/h) ( |x|²/2 − φ(x) ) + h ζ_h(x),   b(y) = (1/h) ( |y|²/2 − φ*(y) ) + h ξ_h(y),

where ζ_h, ξ_h are O(1).

SLIDE 21

Idea of the proof

Thus, up to lower-order terms,

  µ̂_h(x, y) ∝ ρ0(x) exp( −(1/h) g(x − y) + (1/h) φ_c(x) + (1/h) φ*_c(y) ) = ρ0(x) exp( −(1/h) D[y | x*] ).

If y − x* is large, it gets penalized exponentially. Hence

  µ̂_h(x, y) ∝ ρ0(x) exp( −(1/(2h)) (y − x*)^T ∇²φ*(x*) (y − x*) ),

a Gaussian transition kernel with mean x* and covariance h (∇²φ*(x*))^{−1}.
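This approximate bridge is easy to sample. A 1-d Gaussian illustration (my example, not from the talk), where φ is known in closed form, checking that the second marginal of the sampler matches ρ1 up to an O(h) correction:

```python
import numpy as np

# rho0 = N(0, s0^2), rho1 = N(0, s1^2); Brenier map T(x) = (s1/s0) x,
# phi*(y) = (s0/s1) y^2 / 2, so the Hessian A = phi*'' = s0/s1 is constant.
s0, s1, h = 1.0, 2.0, 0.01
rng = np.random.default_rng(3)

x = rng.normal(0.0, s0, 500_000)            # X ~ rho0
xstar = (s1 / s0) * x                       # mean of the Gaussian kernel
y = xstar + np.sqrt(h * s1 / s0) * rng.normal(size=x.size)
# covariance h * A^{-1} = h * s1/s0, so Var(Y) = s1^2 + h s1/s0: rho1 + O(h)
print(y.var(), s1**2 + h * s1 / s0)
```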

SLIDE 22

Idea of the proof

For h ≈ 0, the Schrödinger bridge is approximately Gaussian. Sample X ∼ ρ0 and generate Y ∼ N( x*, h (∇²φ*(x*))^{−1} ):

  µ̂_h(x, y) ≈ ρ0(x) det(∇²φ*(x*))^{1/2} (2πh)^{−d/2} exp( −(1/(2h)) (y − x*)^T ∇²φ*(x*) (y − x*) ).

Y is not exactly distributed as ρ1; there are lower-order corrections. Nevertheless, up to the leading transport term, H(µ̂_h | µ_h) contributes

  (1/2) ∫ log det( ∇²φ*(x*) ) ρ0(x) dx = (1/2) (Ent(ρ1) − Ent(ρ0)).

SLIDE 23

Gradient flow of entropy

Ambrosio–Gigli–Savaré; recent survey by Santambrogio. Consider the Cauchy problem in R^n:

  x′(t) = −∇F(x(t)),   x(0) = x0,

the gradient flow with potential F. Euler discretization: fix a small step parameter h > 0 and set

  x^h_{k+1} = argmin_x [ |x − x^h_k|² / (2h) + F(x) ].

First-order condition: (x^h_{k+1} − x^h_k)/h = −∇F(x^h_{k+1}), which converges to the gradient flow as h → 0+.
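For F(x) = x²/2 the minimizing step has a closed form, so the scheme can be checked against the exact flow x(t) = e^{−t} x0; an illustrative sketch (parameters are mine):

```python
import numpy as np

# argmin_x [ (x - xk)^2 / (2h) + x^2 / 2 ] = xk / (1 + h): the implicit
# Euler (minimizing-movement) step for F(x) = x^2 / 2
prox = lambda xk, h: xk / (1.0 + h)

h, T, x0 = 0.01, 1.0, 1.0
x = x0
for _ in range(int(round(T / h))):
    x = prox(x, h)
print(x, np.exp(-T))    # the discrete scheme tracks the exact gradient flow
```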

SLIDE 24

Heat equation as a gradient flow of entropy

Start with a density ρ_(0) = ρ0 and fix h > 0. Set

  ρ_(k+1) = argmin_ρ [ (1/(2h)) W_2²(ρ, ρ_(k)) + Ent(ρ) ].

Define the interpolation ρ^h(t) = ρ_(k) for kh ≤ t < (k + 1)h. Jordan–Kinderlehrer–Otto (JKO) '98: ρ^h(t) "converges" to the solution of the heat equation

  ∂ρ/∂t = ∂²ρ/∂x²,   ρ(0, x) = ρ0,

the gradient flow of entropy in the Wasserstein metric space.
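A full JKO step needs an OT solver, but the conclusion, that heat flow dissipates Ent(ρ) = ∫ ρ log ρ, can be sanity-checked with a plain finite-difference heat equation; a sketch with illustrative grid parameters:

```python
import numpy as np

dx, dt = 0.05, 0.001                    # dt < dx^2 / 2 for stability
x = np.arange(-3.0, 3.0, dx)
rho = np.exp(-x**2 / 0.1)
rho /= rho.sum() * dx                   # normalized density on the grid

def ent(r):                             # Ent(rho) = integral of rho log rho
    return float(np.sum(r * np.log(r)) * dx)

e0 = ent(rho)
for _ in range(200):                    # explicit finite-difference heat flow
    lap = (np.roll(rho, 1) - 2 * rho + np.roll(rho, -1)) / dx**2
    rho = rho + dt * lap
e1 = ent(rho)
print(e0, e1)                           # entropy decreases along the flow
```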

SLIDE 25

Entropic cost to gradient flow

How does the entropic cost imply the gradient flow for the heat equation? Run Brownian motion starting from ρ0, and let ρ(t) be its density at time t. Obviously ρ(h) = argmin_ρ K_h(ρ0, ρ), and more generally ρ((k+1)h) = argmin_ρ K_h(ρ(kh), ρ): relative entropy is minimized by the exact transition density. But

  K_h(ρ0, ρ) ≈ (1/(2h)) W_2²(ρ0, ρ) + (1/2) (Ent(ρ) − Ent(ρ0)).

This "morally" implies the gradient flow of entropy.

SLIDE 26

Gradient flow without a metric?

The Dirichlet transport has a similar structure:

  K_h(ρ0, ρ) ≈ (1/h − n/2) C(ρ0, ρ) + (1/2) (Ent_0(ρ) − Ent_0(ρ0)).

Hence successively multiplying (⊙) by a symmetric Dirichlet should be a gradient flow of entropy. BUT ... C(ρ0, ρ) is not a metric, and no such theory exists. Is there even a stochastic process?

SLIDE 27

Thank you very much for your attention. arXiv:1905.12206 (math.PR)