SLIDE 1

Data-driven optimal transport

Esteban G. Tabak, NYU - Courant Institute of Mathematical Sciences; Giulio Trigila, Technische Universität München. Machine Learning Seminar, November 5, 2013

SLIDE 2

The story in a picture

[Figure: sample points xᵢ drawn from ρ(x) in space X, sample points yⱼ drawn from µ(y) in space Y, and the coupling π(x, y) connecting them]

SLIDE 3

Optimal transport (Monge formulation)

Find a plan to transport material from one location to another that minimizes the total transportation cost. In terms of the probability densities ρ(x) and µ(y) with x, y ∈ Rⁿ,

min over y(x):  M(y) = ∫_{Rⁿ} c(x, y(x)) ρ(x) dx,

subject to

∫_{y⁻¹(Ω)} ρ(x) dx = ∫_Ω µ(y) dy  for all measurable sets Ω.

If y is smooth and one-to-one, this is equivalent to the point-wise relation

ρ(x) = µ(y(x)) J_y(x),

where J_y(x) is the Jacobian determinant of the map y.
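In one dimension the point-wise relation for an increasing map reads ρ(x) = µ(y(x)) y′(x), and it is solved by the monotone rearrangement y = G⁻¹∘F, with F and G the CDFs of ρ and µ. A minimal numpy sketch estimating it from samples (the two Gaussians below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 2000)   # samples x_i from rho (illustrative Gaussian)
y = rng.normal(3.0, 0.5, 2000)   # samples y_j from mu  (illustrative Gaussian)

# Empirical monotone rearrangement y(x) = G^{-1}(F(x)):
# sorting both samples pairs equal quantiles of rho and mu.
xs, ys = np.sort(x), np.sort(y)

def y_of_x(q):
    # interpolate the sorted-sample correspondence at the point q
    return np.interp(q, xs, ys)

print(y_of_x(0.0))   # median of rho maps near the median of mu, about 3.0
```

Sorting realizes the quantile coupling directly; no density estimate is needed.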

SLIDE 4

Kantorovich formulation

min over π(x, y):  K(π) = ∫ c(x, y) π(x, y) dx dy

subject to

ρ(x) = ∫ π(x, y) dy,   µ(y) = ∫ π(x, y) dx.

If c(x, y) is convex, min M(y) = min K(π), and the optimal plan concentrates on the graph of the optimal map:

π(S) = ρ({x : (x, y(x)) ∈ S}).

SLIDE 5

Dual formulation

Maximize

D(u, v) = ∫ u(x) ρ(x) dx + ∫ v(y) µ(y) dy

over all continuous functions u and v satisfying

u(x) + v(y) ≤ c(x, y).

SLIDE 6

The standard example: c(x, y) = ‖y − x‖²

min ∫ c(x, y) π(x, y) dx dy  →  max ∫ x·y π(x, y) dx dy,

with dual

min ∫ u(x) ρ(x) dx + ∫ v(y) µ(y) dy,   u(x) + v(y) ≥ x·y.

At the optimum the potentials are a pair of convex conjugates:

u(x) = max_y (x·y − v(y)) ≡ v*(x),
v(y) = max_x (x·y − u(x)) ≡ u*(y),

and the optimal map for Monge's problem is given by y(x) = ∇u(x), with the potential u(x) satisfying the Monge–Ampère equation

ρ(x) = µ(∇u(x)) det(D²u)(x).
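As a concrete check of this relation (an illustrative 1-D example, not from the talk): between two Gaussians the optimal potential is a convex quadratic, y(x) = u′(x) is affine, and the 1-D Monge–Ampère equation ρ(x) = µ(u′(x)) u″(x) holds identically.

```python
import numpy as np

# Two illustrative 1-D Gaussians: rho = N(mu1, s1^2), mu = N(mu2, s2^2)
mu1, s1, mu2, s2 = 0.0, 1.0, 2.0, 0.5

def rho(x): return np.exp(-(x - mu1)**2 / (2 * s1**2)) / (s1 * np.sqrt(2 * np.pi))
def mu(y):  return np.exp(-(y - mu2)**2 / (2 * s2**2)) / (s2 * np.sqrt(2 * np.pi))

# Convex potential u(x) = mu2*x + (s2/(2*s1))*(x - mu1)^2, so that:
du = lambda x: mu2 + (s2 / s1) * (x - mu1)   # y(x) = u'(x), the optimal map
d2u = s2 / s1                                # u''(x): the 1-D det(D^2 u)

x = np.linspace(-3.0, 3.0, 101)
assert np.allclose(rho(x), mu(du(x)) * d2u)  # Monge-Ampere, verified pointwise
```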

SLIDE 7

A data-based formulation

In the Kantorovich dual formulation, the objective function to maximize is a sum of two expected values:

D(u, v) = ∫ u(x) ρ(x) dx + ∫ v(y) µ(y) dy.

Hence, if ρ(x) and µ(y) are known only through samples, it is natural to pose the problem in terms of empirical means: maximize

D(u, v) = (1/m) Σᵢ u(xᵢ) + (1/n) Σⱼ v(yⱼ)

over functions u and v satisfying

u(x) + v(y) ≤ c(x, y).

SLIDE 8

A purely discrete reduction

Maximize

D(u, v) = (1/m) Σᵢ uᵢ + (1/n) Σⱼ vⱼ    (1)

over vectors u and v satisfying

uᵢ + vⱼ ≤ cᵢⱼ.    (2)

This is dual to the uncapacitated transportation problem: minimize

C(π) = Σᵢ,ⱼ cᵢⱼ πᵢⱼ    (3)

subject to

Σⱼ πᵢⱼ = n,    (4)
Σᵢ πᵢⱼ = m.    (5)
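This transportation problem is a small linear program and can be solved directly. A sketch with scipy on made-up sample data, with the row and column sums rescaled to 1/m and 1/n (a constant rescaling of the constraints, so the optimal cost matches the empirical objective); with m = n the optimal plan is a permutation, so the LP value can be cross-checked against the assignment problem:

```python
import numpy as np
from scipy.optimize import linprog, linear_sum_assignment

rng = np.random.default_rng(0)
m = n = 5
x = rng.normal(size=m)             # samples x_i from rho
y = rng.normal(2.0, 1.0, size=n)   # samples y_j from mu
C = (x[:, None] - y[None, :])**2   # c_ij = |x_i - y_j|^2

# Equality constraints: row sums 1/m, column sums 1/n
A_eq, b_eq = [], []
for i in range(m):
    row = np.zeros((m, n)); row[i, :] = 1.0
    A_eq.append(row.ravel()); b_eq.append(1.0 / m)
for j in range(n):
    col = np.zeros((m, n)); col[:, j] = 1.0
    A_eq.append(col.ravel()); b_eq.append(1.0 / n)

res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=(0, None), method="highs")

# Cross-check: for m = n the plan is (1/n) times a permutation matrix
ri, cj = linear_sum_assignment(C)
assert np.isclose(res.fun, C[ri, cj].sum() / n)
```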

SLIDE 9

Fully constrained scenario

D(u, v) = (1/m) Σᵢ u(xᵢ) + (1/n) Σⱼ v(yⱼ) = ∫ u(x) ρ(x) dx + ∫ v(y) µ(y) dy

with

ρ(x) = (1/m) Σᵢ δ(x − xᵢ),   µ(y) = (1/n) Σⱼ δ(y − yⱼ),

which brings us back to the purely discrete case.

SLIDE 10

The Lagrangian

L(π, u, v) = ∫ c(x, y) π(x, y) dx dy
  − ∫ [ρ(x) − (1/m) Σᵢ δ(x − xᵢ)] u(x) dx
  − ∫ [µ(y) − (1/n) Σⱼ δ(y − yⱼ)] v(y) dy,

where ρ(x) and µ(y) are shorthand notation for

ρ(x) = ∫ π(x, y) dy   and   µ(y) = ∫ π(x, y) dx.
SLIDE 11

In terms of the Lagrangian, the problem can be formulated as a game:

d :  max over u(x), v(y) of  min over π(x, y) ≥ 0 of  L(π, u, v),

with dual

p :  min over π(x, y) ≥ 0 of  max over u(x), v(y) of  L(π, u, v).

If the functions u(x) and v(y) are unrestricted, the problem p becomes

p :  min over π(x, y) ≥ 0 of  ∫ c(x, y) π(x, y) dx dy

subject to

ρ(x) = (1/m) Σᵢ δ(x − xᵢ),   µ(y) = (1/n) Σⱼ δ(y − yⱼ),

as before.

SLIDE 12

A suitable relaxation

Instead, restrict the space of functions F from which u and v can be selected. If F is invariant under dilations,

u ∈ F, λ ∈ R  →  λu ∈ F,

then the problem p becomes

p :  min over π(x, y) ≥ 0 of  ∫ c(x, y) π(x, y) dx dy

subject to

∫ [ρ(x) − (1/m) Σᵢ δ(x − xᵢ)] u(x) dx = 0,
∫ [µ(y) − (1/n) Σⱼ δ(y − yⱼ)] v(y) dy = 0

for all u(x), v(y) ∈ F, a weak formulation of the constraints.

SLIDE 13

Some possible choices for F

  • 1. Constants: the constraints just guarantee that ρ and µ integrate to one.
  • 2. Linear functions: the solution is a rigid displacement that moves the mean of {xᵢ} onto the mean of {yⱼ}.
  • 3. Quadratic functions: a linear transformation mapping the empirical mean and covariance matrix of {xᵢ} into those of {yⱼ}.
  • 4. Smooth functions with appropriate local bandwidth, chosen so as neither to over- nor under-resolve ρ(x) and µ(y).
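For choice 3, the linear map has a closed form in terms of the empirical moments. A sketch on made-up 2-D sample clouds: the matrix A = S₁^(−1/2) (S₁^(1/2) S₂ S₁^(1/2))^(1/2) S₁^(−1/2) is symmetric positive definite (hence the gradient of a convex quadratic), and the affine map built from it matches the target's empirical mean and covariance exactly.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2)) @ np.array([[1.0, 0.3], [0.0, 0.5]])                # {x_i}
Y = rng.normal(size=(500, 2)) @ np.array([[0.8, 0.0], [0.4, 1.2]]) + [2.0, -1.0]  # {y_j}

a, b = X.mean(0), Y.mean(0)        # empirical means
S1, S2 = np.cov(X.T), np.cov(Y.T)  # empirical covariances

# A = S1^{-1/2} (S1^{1/2} S2 S1^{1/2})^{1/2} S1^{-1/2}, symmetric positive definite
R1 = np.real(sqrtm(S1)); R1i = np.linalg.inv(R1)
A = R1i @ np.real(sqrtm(R1 @ S2 @ R1)) @ R1i

Z = (X - a) @ A.T + b              # affine map applied to the samples

assert np.allclose(Z.mean(0), b)   # means match
assert np.allclose(np.cov(Z.T), S2)  # covariances match
```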

SLIDE 14

A flow-based, primal-dual approach

c = ½‖x − y‖²,   y(x) = ∇u(x).

We think of the map y(x) as the endpoint of a time-dependent flow z(x, t), such that z(x, 0) = x and z(x, ∞) = y(x). z(x, t) follows the gradient flow

ż = −∇z [δD̃ₜ/δu]|_{u = ½‖z‖²},   z(x, 0) = x,

where

D̃ₜ = ∫ u(z) ρₜ(z) dz + ∫ u*(y) µ(y) dy

and ρₜ is the evolving probability distribution underlying the points z(x, t).

SLIDE 15

The variational derivative of D̃ₜ adopts the form

δD̃ₜ/δu = ρₜ − det(D²u) µ(∇u),

which, applied at u = ½‖z‖² (the potential corresponding to the identity map), yields the simpler expression

δD̃ₜ/δu (z) = ρₜ(z) − µ(z),

so

ż = −∇z [ρₜ(z) − µ(z)],   z(x, 0) = x.

SLIDE 16

The probability density ρₜ satisfies the continuity equation

∂ρₜ(z)/∂t + ∇z · [ρₜ(z) ż] = 0,

yielding

∂ρₜ(z)/∂t − ∇z · [ρₜ(z) ∇z(ρₜ(z) − µ(z))] = 0,

a nonlinear heat equation that converges to ρ_{t=∞}(z) = µ(z).

SLIDE 17

In terms of samples,

  • 1. Objective function:

    D̃ₜ = ∫ u(z) ρₜ(z) dz + ∫ u*(y) µ(y) dy  →  (1/m) Σᵢ u(zᵢ) + (1/n) Σⱼ u*(yⱼ).

  • 2. Test functions: general u(z) ∈ F → localized u(z, β), with β a finite-dimensional parameter and appropriate bandwidth.

  • 3. Gradient descent:

    −u̇ = δD̃ₜ/δu (z) = ρₜ(z) − µ(z)  →  −β̇ = ∇β D̃ₜ = (1/m) Σᵢ ∇β u(zᵢ) − (1/n) Σⱼ ∇β u(yⱼ).
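A deliberately simplified 1-D sketch of these three steps, using global linear and quadratic test functions (choices 2–3 from the earlier slide) in place of the localized u(z, β), so each descent step applies an affine map to the evolving particles; all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.normal(0.0, 1.0, 2000)   # evolving particles z(x, t), initially from rho
y = rng.normal(2.0, 0.7, 2000)   # fixed target samples from mu

eps = 0.1
for _ in range(1000):
    # u(z, beta) = z^2/2 + b1*z + b2*z^2/2, so grad_beta u = (z, z^2/2);
    # -beta_dot = (1/m) sum grad_beta u(z_i) - (1/n) sum grad_beta u(y_j)
    g1 = z.mean() - y.mean()
    g2 = 0.5 * (np.mean(z**2) - np.mean(y**2))
    b1, b2 = -eps * g1, -eps * g2    # small descent step in beta
    z = (1.0 + b2) * z + b1          # displace particles: z <- grad_z u(z)

# the flow matches the first two empirical moments of the target
print(z.mean(), z.std())
```

With only two moments matched, the limit is the best affine push-forward; the localized bumps of the next slide refine the map beyond second moments.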

SLIDE 18

Left to discuss:

  • 1. Form of u(z, β) and determination of the bandwidth:

    u(z, β) = ½‖z‖² + Σₖ βₖ Fₖ((z − zₖ)/αₖ),   αₖ ∝ (1/ρₜ(zₖ))^(1/d).

  • 2. Enforcement of the map's optimality.
SLIDE 19

Some applications and extensions

  • 1. Maps: fluid flow from tracers, effects of a medical treatment, resource allocation, ...
  • 2. Density estimation: ρ(x)
  • 3. Classification and clustering: ρₖ(x)
  • 4. Regression: π(x, y), dim(x) = dim(y), with

    c(x, y) = Σₖ wₖ ‖y − yₖ‖² / (αₖ² + ‖x − xₖ‖²)

  • 5. Determination of the worst scenario under given marginals: π(x₁, x₂, ..., xₙ)

SLIDE 20

Thanks!