data driven optimal transport
play

Data-driven optimal transport Esteban G. Tabak NYU - Courant - PowerPoint PPT Presentation

Data-driven optimal transport Esteban G. Tabak NYU - Courant Institute of Mathematical Sciences Giulio Trigila Technische Universit at M unchen Machine Learning Seminar, November 5 2013 The story in a picture (y) (x) Y X (x,y)


  1. Data-driven optimal transport Esteban G. Tabak NYU - Courant Institute of Mathematical Sciences Giulio Trigila Technische Universit¨ at M¨ unchen Machine Learning Seminar, November 5 2013

  2. The story in a picture µ (y) ρ (x) Y X π (x,y) x i y j

  3. Optimal transport (Monge formulation) Find a plan to transport material from one location to another that minimizes the total transportation cost. In terms of the probability densities ρ ( x ) and µ ( y ) with x , y ∈ R n , � min y ( x ) M ( y ) = R n c ( x , y ( x )) ρ ( x ) dx . subject to � � ρ ( x ) dx = µ ( y ) dy y − 1 (Ω) Ω for all measurable sets Ω. If y is smooth and one to one, this is equivalent to the point-wise relation ρ ( x ) = µ ( y ( x )) J y ( x ) ,

  4. Kantorovich formulation � π ( x , y ) K ( π ) = min c ( x , y ) π ( x , y ) dxdy subject to � � ρ ( x ) = π ( x , y ) dy , µ ( y ) = π ( x , y ) dx If c ( x , y ) is convex, min M ( y ) = max K ( π ) and π ( S ) = ρ ( { x : ( x , y ( x )) ∈ S } ) .

  5. Dual formulation Maximize � � D ( u , v ) = u ( x ) ρ ( x ) dx + v ( x ) µ ( y ) dy over all continuous functions u and v satisfying u ( x ) + v ( y ) ≤ c ( x , y ) .

  6. The standard example: c ( c , y ) = � y − x � 2 � � min c ( x , y ) π ( x , y ) dxdy → max x · y π ( x , y ) dxdy , with dual � � min u ( x ) ρ ( x ) dx + v ( y ) µ ( y ) dy , u ( x ) + v ( y ) ≥ x · y . y ( x · y − v ( y )) ≡ v ∗ ( x )  u ( x ) = max  x ( x · y − u ( x )) ≡ u ∗ ( y ) v ( y ) = max  and the optimal map for Monge’s problem is given by y ( x ) = ∇ u ( x ) , with potential u ( x ) satisfying the Monge-Ampere equation ρ ( x ) = µ ( ∇ u ( x )) det ( D 2 u )( x ) .

  7. A data-based formulation In Kantorovich dual formulation, the objective function to maximize is a sum of two expected values: � � D ( u , v ) = u ( x ) ρ ( x ) dx + v ( x ) µ ( y ) dy . Hence, if ρ ( x ) and µ ( y ) are known only through samples, it is natural to pose the problem in terms of empirical means: Maximize m n D ( u , v ) = 1 u ( x i ) + 1 � � v ( y j ) m n i =1 j =1 over functions u and v satisfying u ( x ) + v ( y ) ≤ c ( x , y ) .

  8. A purely discrete reduction Maximize m n D ( u , v ) = 1 u i + 1 � � v j (1) m n i =1 j =1 over vectors u and v satisfying u i + v j ≤ c ij . (2) This is dual to the uncapacitated transportation problem : Minimize � C ( π ) = c ij π ij (3) i , j subject to � = (4) π ij n j � π ij = m . (5) i

  9. Fully constrained scenario m n D ( u , v ) = 1 u ( x i )+1 � � � � v ( y j ) = u ( x ) ρ ( x ) dx + v ( x ) µ ( y ) dy m n i =1 j =1 with m n ρ ( x ) = 1 δ ( x − x i ) , µ ( y ) = 1 � � δ ( y − y j ) , m n i =1 j =1 which brings us back to the purely discrete case.

  10. The Lagrangian � L ( π, u , v ) = c ( x , y ) π ( x , y ) dxdy � � m � ρ ( x ) − 1 � − δ ( x − x i ) u ( x ) dx m i =1 �   n  µ ( y ) − 1 �  v ( y ) dy , − δ ( y − y j ) n j =1 where ρ ( x ) and µ ( y ) are shortcut notations for � ρ ( x ) = π ( x , y ) dy and � µ ( y ) = π ( x , y ) dx .

  11. In terms of the Lagrangian, the problem can be formulated as a game: d : max π ( x , y ) ≥ 0 L ( π, u , v ) , min u ( x ) , v ( y ) with dual p : min u ( x ) , v ( y ) L ( π, u , v ) . max π ( x , y ) ≥ 0 If the functions u ( x ) and v ( y ) are unrestricted, the problem p becomes � p : min c ( x , y ) π ( x , y ) dxdy π ( x , y ) ≥ 0 subject to m n ρ ( x ) = 1 δ ( x − x i ) , µ ( y ) = 1 � � δ ( y − y j ) , m n i =1 j =1 as before.

  12. A suitable relaxation Instead, restrictict the space of functions F from where u and v can be selected. If F is invariant under dilations: u ∈ F , λ ∈ R → λ u ∈ F , then the problem p becomes � p : min c ( x , y ) π ( x , y ) dxdy π ( x , y ) ≥ 0 � � m � ρ ( x ) − 1 � δ ( x − x i ) u ( x ) dx = 0 , m i =1 �   n  µ ( y ) − 1 �  v ( y ) dy δ ( y − y j ) = 0 n j =1 for all u ( x ) , v ( y ) ∈ F , a weak formulation of the constraints.

  13. Some possible choices for F 1. Constants: The constraints just guaranty that ρ and µ integrate to one. 2. Linear Functions: The solution is a rigid displacement that moves the mean of { x i } into the mean of { y j } . 3. Quadratic Functions: A linear transformation mapping the empirical mean and covariance matrix of { x i } into those of { y j } . 4. Smooth functions with appropriate local bandwidth, neither to over nor under-resolve ρ ( x ) and µ ( y ).

  14. A flow-based, primal-dual approach c = 1 2 � x − y � 2 , y ( x ) = ∇ u ( x ) . We think of the map y ( x ) as the endpoint of a time-dependent flow z ( x , t ), such that z ( x , 0) = x , and z ( x , ∞ ) = y ( x ) . z ( x , t ) follows the gradient flow  � � δ ˜ D t  z = −∇ z ˙   δ u u = 1 2 � x � 2   z ( x , 0) = x  where � � ˜ u ∗ ( y ) µ ( y ) dy D t = u ( z ) ρ t ( z ) dz + and ρ t is the evolving probability distribution underlying the points z ( x , t ).

  15. The variational derivative of ˜ D adopts the form δ ˜ D t � � D 2 u � � µ ( ∇ u ) δ u = ρ t − 2 � x � 2 (the potential corresponding to the which, applied at u = 1 identity map), yields the simpler expression δ ˜ D t δ u ( z ) = ρ t ( z ) − µ ( z ) , so � z = −∇ z [ ρ t ( z ) − µ ( z )] ˙ z ( x , 0) = x

  16. The probability density ρ t satisfies the continuity equation ∂ρ t ( z ) + ∇ z · [ ρ t ( z )˙ z ] = 0 , ∂ t yielding ∂ρ t ( z ) − ∇ z · [ ρ t ( z ) ∇ z ( ρ t ( z ) − µ ( z ))] = 0 , ∂ t a nonlinear heat equation that converges to ρ t = ∞ ( z ) = µ ( z ).

  17. In terms of samples, 1. Objetive function: u ∗ ( y ) µ ( y ) dy → 1 u ( z i )+1 � � ˜ � � u ∗ ( y j ) D t = u ( z ) ρ t ( z ) dz + m n i j 2. Test functions: General u ( z ) ∈ F → localized u ( z , β ), with β a finite-dimensional parameter and appropriate bandwidth. u = δ ˜ D t 3. Gradient descent: − ˙ δ u ( z ) = ρ t ( z ) − µ ( z ) → D t ( z ) = 1 ∇ β u ( z i ) − 1 − ˙ β = ∇ β ˜ � � ∇ β u ( y j ) m n i j

  18. Left to discuss: 1. Form of u ( z , β ), determination of bandwidth: u ( z , β ) = 1 � � z − z k � � 2 � x � 2 + � β k F k , α k k � 1 � 1 d α k ∝ . ρ t ( z k ) 2. Enforcement of the map’s optimality.

  19. Some applications and extensions 1. Maps: fluid flow from tracers, effects of a medical treatment, resource allocation, . . . 2. Density estimation: ρ ( x ) 3. Classification and clustering: ρ k ( x ) 4. Regression: π ( x , y ) , dim( x ) � = dim( y ), � y − y k � 2 � c ( x , y ) = w k α 2 k + � x − x k � 2 k 5. Determination of worst scenario under given marginals: π ( x 1 , x 2 , . . . , x n )

  20. Thanks!

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend