Convergence rates for discretized optimal transport

Quentin Mérigot (Université Paris-Sud 11)
Based on joint work with F. Chazal and A. Delalande
Workshop on numerical solutions of HJB equations, Paris, January 2020

1. Motivations


Motivation 2: "Linearization" of $W_2$

◮ We fix a reference measure $\rho = \mathrm{Leb}_X$, with $X \subseteq \mathbb{R}^d$ convex and compact and $|X| = 1$. Given $\mu \in \mathrm{Prob}_2(\mathbb{R}^d)$, we define $T_\mu$ as the unique map satisfying (i) $T_\mu = \nabla \phi_\mu$ a.e. for some convex function $\phi_\mu : X \to \mathbb{R}$, and (ii) $T_{\mu\#}\rho = \mu$.
◮ The map $\mu \in \mathrm{Prob}_2(\mathbb{R}^d) \mapsto T_\mu \in L^2(X)$ is injective, with image the space of (square-integrable) gradients of convex functions on $X$.
◮ This suggests the distance $W_{2,\rho}(\mu,\nu) := \| T_\mu - T_\nu \|_{L^2(\rho)}$ → [Ambrosio, Gigli, Savaré '04]

    Riemannian geometry                                                     | Optimal transport
    point $x \in M$                                                         | $\mu \in \mathrm{Prob}_2(\mathbb{R}^d)$
    geodesic distance $d_g(x, y)$                                           | $W_2(\mu, \nu)$
    tangent space $T_{x_0} M$                                               | $T_\rho \mathrm{Prob}_2(\mathbb{R}^d) \subseteq L^2(\rho, X)$
    inverse exponential map $\exp_{x_0}^{-1}(x) \in T_{x_0} M$              | $T_\mu \in T_\rho \mathrm{Prob}_2(X)$
    tangent distance $\|\exp_{x_0}^{-1}(x) - \exp_{x_0}^{-1}(y)\|_{g(x_0)}$ | $\|T_\mu - T_\nu\|_{L^2(\rho)}$

◮ Used in image analysis → [Wang, Slepčev, Basu, Ozolek, Rohde '13]
→ Represents a family of probability measures by a family of functions in $L^2(\rho)$.
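In dimension one with $\rho = \mathrm{Leb}_{[0,1]}$, the Brenier map $T_\mu$ is the quantile function of $\mu$, so the embedding $\mu \mapsto T_\mu$ and the distance $W_{2,\rho}$ can be evaluated in closed form for discrete measures. A minimal sketch (not from the talk; representing a measure as a list of (location, mass) pairs is an illustrative choice):

```python
import math

def lin_embedding_distance(mu, nu):
    """W_{2,rho}(mu, nu) = ||T_mu - T_nu||_{L2(rho)} for rho = Leb_[0,1], d = 1.

    mu, nu: discrete measures as lists of (location, mass), masses summing to 1.
    In dimension one the Brenier map from Leb_[0,1] is the quantile function,
    a step function of the cumulative mass; we integrate the squared
    difference of the two step functions over [0,1]."""
    def steps(m):
        m = sorted(m)                     # sort atoms by location
        cum, out = 0.0, []
        for y, w in m:
            cum += w
            out.append((cum, y))          # T(u) = y for u up to this cumulative mass
        out[-1] = (1.0, out[-1][1])       # guard against rounding in the masses
        return out
    a, b = steps(mu), steps(nu)
    i = j = 0
    u_prev, acc = 0.0, 0.0
    while i < len(a) and j < len(b):
        u = min(a[i][0], b[j][0])         # next breakpoint of either quantile map
        acc += (a[i][1] - b[j][1]) ** 2 * (u - u_prev)
        u_prev = u
        if a[i][0] <= u:
            i += 1
        if b[j][0] <= u:
            j += 1
    return math.sqrt(acc)
```

In $d = 1$ the embedding is in fact an isometry, $W_{2,\rho}(\mu,\nu) = W_2(\mu,\nu)$, which is the best possible answer to the question this slide raises; in higher dimension only the inequalities discussed below survive.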

Example: barycenter computation

◮ Barycenter in Wasserstein space: given $\mu_1, \dots, \mu_k \in \mathrm{Prob}_2(\mathbb{R}^d)$ and $\alpha_1, \dots, \alpha_k \ge 0$,
  $\bar\mu := \arg\min_\mu \sum_{1 \le i \le k} \alpha_i\, W_2^2(\mu, \mu_i)$.
→ Need to solve an optimisation problem every time the coefficients $\alpha_i$ are changed.
◮ "Linearized" Wasserstein barycenters: $\bar\mu := \bigl( \frac{1}{\sum_i \alpha_i} \sum_i \alpha_i T_{\mu_i} \bigr)_{\#}\, \rho$.
→ Simple expression once the transport maps $T_{\mu_i} : \rho \to \mu_i$ have been computed.
[Figure: supports $\mathrm{spt}(\mu_0)$ and $\mathrm{spt}(\mu_1)$, with interpolants such as $(0.8\, T_{\mu_1} + 0.2\, T_{\mu_0})_{\#}\rho$.]
What amount of the Wasserstein geometry is preserved by the embedding $\mu \mapsto T_\mu$?
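In the same one-dimensional setting ($\rho = \mathrm{Leb}_{[0,1]}$, where $T_{\mu_i}$ is the quantile map of $\mu_i$), the linearized barycenter is a plain pushforward of the weighted average of maps. A sketch, not from the talk; the grid size `n` and the discrete-measure representation are illustrative choices:

```python
def quantile(m, u):
    """Evaluate T_mu(u) for rho = Leb_[0,1]: the quantile map of a discrete
    measure m, given as a list of (location, mass) with masses summing to 1."""
    cum = 0.0
    for y, w in sorted(m):
        cum += w
        if u <= cum + 1e-12:
            return y
    return sorted(m)[-1][0]

def linearized_barycenter(measures, alphas, n=1000):
    """Sample ((1 / sum_i a_i) * sum_i a_i T_{mu_i})_# rho on a regular grid
    of [0,1]; each returned point carries mass 1/n."""
    s = sum(alphas)
    grid = [(k + 0.5) / n for k in range(n)]
    return [sum(a * quantile(m, u) for m, a in zip(measures, alphas)) / s
            for u in grid]
```

For two Diracs $\delta_0, \delta_1$ with weights $(0.2, 0.8)$, every point of $[0,1]$ is sent to $0.8$, so the linearized barycenter is $\delta_{0.8}$; in dimension one this agrees with the true Wasserstein barycenter, while in higher dimension it is only an approximation.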

Motivation 3: numerical analysis of optimal transport

Theorem (Brenier, McCann). Given $\rho \in \mathrm{Prob}^{\mathrm{ac}}(\mathbb{R}^d)$ and $\mu \in \mathrm{Prob}(\mathbb{R}^d)$, there exists a unique ($\rho$-a.e.) map $T_\mu : \mathbb{R}^d \to \mathbb{R}^d$ such that $T_{\mu\#}\rho = \mu$ and $T_\mu = \nabla\phi$ with $\phi$ convex.

To solve numerically an OT problem between $\rho \in \mathrm{Prob}^{\mathrm{ac}}(\mathbb{R}^d)$ and $\mu \in \mathrm{Prob}([0,1]^d)$:
◮ Approximate $\mu$ by a discrete measure, for instance
  $\mu_k = \sum_{1 \le i_1, \dots, i_d \le k} \mu(B_{i_1,\dots,i_d})\, \delta_{(i_1/k, \dots, i_d/k)}$,
  where $B_{i_1,\dots,i_d}$ is the cube $[(i_1 - 1)/k, i_1/k] \times \dots \times [(i_d - 1)/k, i_d/k]$. (Then $W_p(\mu_k, \mu) \lesssim \frac{1}{k}$.)
◮ Compute exactly the optimal transport map $T_{\mu_k}$ between $\rho$ and $\mu_k$ (using a semi-discrete optimal transport solver).

It is known that $T_{\mu_k}$ converges to $T_\mu$, but convergence rates are unknown in general. Indeed, the numerical analysis of optimal transport is virtually nonexistent, whatever the discretization method.
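The claimed rate $W_p(\mu_k, \mu) \lesssim 1/k$ is easy to check in dimension one, where $W_1(\mu_k, \mu) = \int |F_{\mu_k} - F_\mu|$. A sketch for $\mu = \mathrm{Leb}_{[0,1]}$ (an illustrative choice, not the talk's code; the quadrature size `n` is a tuning parameter):

```python
def discretize(k):
    """mu_k for mu = Leb_[0,1]: mass mu(B_i) = 1/k at the right endpoint i/k
    of each cell B_i = [(i-1)/k, i/k]."""
    return [(i / k, 1.0 / k) for i in range(1, k + 1)]

def w1_to_uniform(atoms, n=20000):
    """W1(mu_k, Leb_[0,1]) = integral over [0,1] of |F_{mu_k}(x) - x|
    (a one-dimensional identity), approximated by a midpoint rule."""
    def cdf(x):
        return sum(w for y, w in atoms if y <= x)
    return sum(abs((j + 0.5) / n - cdf((j + 0.5) / n)) for j in range(n)) / n
```

Here the exact value is $1/(2k)$, consistent with (and better than) the stated $1/k$ bound.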

2. Continuity of $\mu \mapsto T_\mu$

Elementary remarks

◮ The map $\mu \mapsto T_\mu$ is reverse-Lipschitz, i.e. $\|T_\mu - T_\nu\|_{L^2(\rho)} \ge W_2(\mu, \nu)$.
  Indeed: since $T_{\mu\#}\rho = \mu$ and $T_{\nu\#}\rho = \nu$, one has $\gamma := (T_\mu, T_\nu)_{\#}\rho \in \Gamma(\mu, \nu)$. Thus
  $W_2^2(\mu, \nu) \le \int \|x - y\|^2\, \mathrm{d}\gamma(x, y) = \int \|T_\mu(x) - T_\nu(x)\|^2\, \mathrm{d}\rho(x)$.
◮ The map $\mu \mapsto T_\mu$ is continuous.
◮ The map $\mu \mapsto T_\mu$ is not better than $\frac12$-Hölder.
  Take $\rho = \frac{1}{\pi} \mathrm{Leb}_{B(0,1)}$ on $\mathbb{R}^2$, and define $\mu_\theta = \frac{\delta_{x_\theta} + \delta_{x_{\theta+\pi}}}{2}$ with $x_\theta = (\cos\theta, \sin\theta)$.
  Then $T_{\mu_\theta}(x) = x_\theta$ if $\langle x_\theta \mid x \rangle \ge 0$ and $x_{\theta+\pi}$ otherwise, so that $\|T_{\mu_\theta} - T_{\mu_{\theta+\delta}}\|^2_{L^2(\rho)} \ge C\delta$. Since on the other hand $W_2(\mu_\theta, \mu_{\theta+\delta}) \le C\delta$, we obtain $\|T_{\mu_\theta} - T_{\mu_{\theta+\delta}}\|_{L^2(\rho)} \ge C\, W_2(\mu_\theta, \mu_{\theta+\delta})^{1/2}$.
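The two-Dirac example is easy to test numerically: the maps $T_{\mu_\theta}$ and $T_{\mu_{\theta+\delta}}$ disagree exactly on two wedges of angle $\delta$, where they differ by the antipodal distance $2$, so $\|T_{\mu_\theta} - T_{\mu_{\theta+\delta}}\|^2_{L^2(\rho)} = 4\delta/\pi$, while $W_2(\mu_\theta, \mu_{\theta+\delta}) = 2\sin(\delta/2) \approx \delta$. A seeded Monte Carlo sketch (not from the talk; sample size `n` is a tuning choice):

```python
import math, random

def map_dist_sq(delta, n=100000, seed=0):
    """Monte Carlo estimate of ||T_{mu_0} - T_{mu_delta}||^2_{L2(rho)} for
    rho = uniform measure on the unit disc, mu_theta = (d_{x_theta} + d_{x_{theta+pi}})/2.
    T_{mu_theta}(x) = x_theta if <x_theta | x> >= 0, else x_{theta+pi}, so the
    two maps differ (by the antipodal distance 2) exactly on the two wedges of
    angle delta between the two separating lines."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        while True:  # rejection-sample a uniform point of the unit disc
            x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            if x * x + y * y <= 1.0:
                break
        side0 = x >= 0.0                                          # sign of <x_0 | x>
        sided = x * math.cos(delta) + y * math.sin(delta) >= 0.0  # sign of <x_delta | x>
        if side0 != sided:
            total += 4  # squared jump ||x_theta - x_{theta+pi}||^2 = 4
    return total / n
```

The squared map distance scales like $\delta$ while $W_2^2$ scales like $\delta^2$: exactly the $\frac12$-Hölder lower bound of this slide.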

Local $\frac12$-Hölder continuity

Thm: Assume $\rho \in \mathrm{Prob}^{\mathrm{ac}}(X)$ and $\mu, \nu \in \mathrm{Prob}(Y)$ with $X, Y \subseteq \mathbb{R}^d$ compact. If $T_\mu$ is $L$-Lipschitz, then $\|T_\mu - T_\nu\|_2^2 \le C\, W_1(\mu, \nu)$ with $C = 4L\, \mathrm{diam}(X)$.

◮ ≃ [Ambrosio, Gigli '09], with a slightly better upper bound. See also [Berman '18].
◮ No regularity assumption on $\nu$ → consequences in statistics and numerical analysis.
◮ Let $\phi_\mu : X \to \mathbb{R}$ be convex s.t. $T_\mu = \nabla\phi_\mu$, and let $\psi_\mu : Y \to \mathbb{R}$ be its Legendre transform: $\psi_\mu(y) = \max_{x \in X} \langle x \mid y \rangle - \phi_\mu(x)$.

Prop: If $T_\mu$ is $L$-Lipschitz, then $\|T_\mu - T_\nu\|^2_{L^2(\rho)} \le -2L \int (\psi_\mu - \psi_\nu)\, \mathrm{d}(\mu - \nu)$.

◮ Prop ⟹ Thm: Kantorovich-Rubinstein theorem.

Proof of the Prop:
$\int \psi_\nu\, \mathrm{d}(\mu - \nu) = \int \psi_\nu\, \mathrm{d}(\nabla\phi_{\mu\#}\rho - \nabla\phi_{\nu\#}\rho) = \int \psi_\nu(\nabla\phi_\mu) - \psi_\nu(\nabla\phi_\nu)\, \mathrm{d}\rho$
$\ge \int \langle \nabla\phi_\mu - \nabla\phi_\nu \mid \nabla\psi_\nu(\nabla\phi_\nu) \rangle\, \mathrm{d}\rho$  (convexity: $\psi_\nu(y) - \psi_\nu(x) \ge \langle y - x \mid \nabla\psi_\nu(x) \rangle$)
$= \int \langle \nabla\phi_\mu - \nabla\phi_\nu \mid \mathrm{id} \rangle\, \mathrm{d}\rho$,
while $\int \psi_\mu\, \mathrm{d}(\nu - \mu) \ge \int \langle \nabla\phi_\nu - \nabla\phi_\mu \mid \mathrm{id} \rangle\, \mathrm{d}\rho + \frac{1}{2L} \|\nabla\phi_\mu - \nabla\phi_\nu\|^2_{L^2(\rho)}$
(since $T_\mu = \nabla\phi_\mu$ $L$-Lipschitz ⟺ $\psi_\mu = \phi_\mu^*$ is $\frac1L$-strongly convex). Summing the two inequalities yields the Prop.

Global Hölder continuity

Thm (Berman '18): Let $\rho \in \mathrm{Prob}^{\mathrm{ac}}(X)$ and $\mu, \nu \in \mathrm{Prob}(Y)$ with $X, Y$ compact. Then
$\|\nabla\psi_\mu - \nabla\psi_\nu\|^2_{L^2(Y)} \le C\, W_1(\mu, \nu)^\alpha$ with $\alpha = \frac{1}{2^{d-1}}$.

Corollary: $\|T_\mu - T_\nu\|^2_{L^2(\rho)} \le C\, W_1(\mu, \nu)^\alpha$ with $\alpha = \frac{1}{2^{d-1}(d+2)}$.

◮ The Hölder exponent is terrible, but the inequality holds without any assumptions on $\mu, \nu$!
◮ The proof of Berman's theorem relies on techniques from complex geometry.

3. Global, dimension-independent Hölder continuity of $\mu \mapsto T_\mu$

Main theorem

Thm (Mérigot, Delalande, Chazal '19): Let $X$ be convex compact with $|X| = 1$ and $\rho = \mathrm{Leb}_X$, and let $Y$ be compact. Then there exists $C$ s.t. for all $\mu, \nu \in \mathrm{Prob}(Y)$,
$\|T_\mu - T_\nu\|_{L^2(X)} \le C\, W_2(\mu, \nu)^{1/5}$.

◮ First global and dimension-independent stability result for optimal transport maps.
◮ Gap between the lower and upper bounds for the Hölder exponent: $\frac15 < \frac12$. The exponent $\frac15$ is certainly not optimal.
◮ The constant $C$ depends polynomially on $\mathrm{diam}(X)$ and $\mathrm{diam}(Y)$.
◮ The proof relies on the semidiscrete setting, i.e. the bound is established in the case $\mu = \sum_i \mu_i \delta_{y_i}$, $\nu = \sum_i \nu_i \delta_{y_i}$, and one concludes using a density argument.

Semidiscrete OT for $c(x, y) = -\langle x \mid y \rangle$

◮ Let $\rho \in \mathrm{Prob}^{\mathrm{ac}}_1(\mathbb{R}^d)$, $\mu \in \mathrm{Prob}_1(\mathbb{R}^d)$, and let $\Gamma(\rho, \mu)$ denote the couplings between $\rho$ and $\mu$:
$\mathcal{T}(\rho, \mu) = \max_{\gamma \in \Gamma(\rho, \mu)} \int \langle x \mid y \rangle\, \mathrm{d}\gamma(x, y)$
$= \min_{\phi \oplus \psi \ge \langle\cdot\mid\cdot\rangle} \int \phi\, \mathrm{d}\rho + \int \psi\, \mathrm{d}\mu$  (Kantorovich duality)
$= \min_\psi \int \psi^*\, \mathrm{d}\rho + \int \psi\, \mathrm{d}\mu$  (Legendre-Fenchel transform: $\psi^*(x) = \max_y \langle x \mid y \rangle - \psi(y)$)

◮ Let $\mu = \sum_{1 \le i \le N} \mu_i \delta_{y_i}$ and $\psi_i = \psi(y_i)$. Then $\psi^*|_{V_i(\psi)} = \langle \cdot \mid y_i \rangle - \psi_i$, where
$V_i(\psi) = \{ x \mid \forall j,\ \langle x \mid y_i \rangle - \psi_i \ge \langle x \mid y_j \rangle - \psi_j \}$.
[Figure: points $y_1, y_2, y_3$ and the corresponding cells $V_1(\psi), V_2(\psi), V_3(\psi)$.]
◮ Thus $\mathcal{T}(\rho, \mu) = \min_{\psi \in \mathbb{R}^N} \sum_i \int_{V_i(\psi)} \langle x \mid y_i \rangle - \psi_i\, \mathrm{d}\rho(x) + \sum_i \mu_i \psi_i$.
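On a grid discretization of $\rho$, the cells $V_i(\psi)$ and their masses can be computed by brute force, assigning each grid point to the index maximizing $\langle x \mid y_i \rangle - \psi_i$. A sketch for $\rho = \mathrm{Leb}_{[0,1]^2}$ (illustrative only, not a real semi-discrete solver; `grid_n` is a resolution parameter):

```python
def laguerre_masses(points, psi, grid_n=200):
    """Masses rho(V_i(psi)) of the cells
    V_i(psi) = { x | for all j, <x|y_i> - psi_i >= <x|y_j> - psi_j },
    for rho = Leb on [0,1]^2, approximated by assigning each point of a
    regular grid to the index maximizing <x|y_i> - psi_i."""
    masses = [0.0] * len(points)
    w = 1.0 / (grid_n * grid_n)            # mass of one grid cell
    for a in range(grid_n):
        for b in range(grid_n):
            x0, x1 = (a + 0.5) / grid_n, (b + 0.5) / grid_n
            i_best = max(range(len(points)),
                         key=lambda i: x0 * points[i][0] + x1 * points[i][1] - psi[i])
            masses[i_best] += w
    return masses
```

With $y_1 = (0.25, 0.5)$, $y_2 = (0.75, 0.5)$ and $\psi = (0, 0.25)$, the cell boundary is the vertical line splitting the square in half, so each cell carries mass $\frac12$.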

Optimality condition and economic interpretation

$\mathcal{T}(\rho, \mu) = \min_{\psi \in \mathbb{R}^N} \Phi(\psi) + \sum_i \mu_i \psi_i$, where $\Phi(\psi) := \sum_i \int_{V_i(\psi)} \langle x \mid y_i \rangle - \psi_i\, \mathrm{d}\rho(x)$.

◮ Gradient: $\nabla\Phi(\psi) = -(G_i(\psi))_{1 \le i \le N}$, where $G_i(\psi) = \rho(V_i(\psi))$. Hence
$\psi \in \mathbb{R}^N$ is a minimizer of the dual problem
⟺ $\forall i,\ \rho(V_i(\psi)) = \mu_i$
⟺ $G(\psi) = \mu$, with $G = (G_1, \dots, G_N)$ and $\mu = (\mu_1, \dots, \mu_N) \in \mathbb{R}^N$
⟺ $T = \nabla\psi^*$ transports $\rho$ onto $\sum_i \mu_i \delta_{y_i}$.

◮ Economic interpretation: $\rho$ = density of customers, $\{y_i\}_{1 \le i \le N}$ = product types.
→ Given prices $\psi \in \mathbb{R}^N$, a customer $x$ maximizes $\langle x \mid y_i \rangle - \psi_i$ over all products.
→ $V_i(\psi) = \{ x \mid i \in \arg\max_j \langle x \mid y_j \rangle - \psi_j \}$ = customers choosing product $y_i$.
→ $\rho(V_i)$ = amount of customers for product $y_i$.
Optimal transport = finding prices satisfying the capacity constraints $\rho(V_i(\psi)) = \mu_i$.
◮ Algorithm (Oliker-Prussner): coordinate-wise increments. Complexity: $O(N^3)$.
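The capacity constraints $\rho(V_i(\psi)) = \mu_i$ can also be reached by a damped gradient method on the dual: raise the price of an over-demanded product, lower the price of an under-demanded one. The talk's Oliker-Prussner scheme uses coordinate-wise increments instead; this self-contained sketch uses plain gradient steps for brevity, with an illustrative step size `lr` and grid resolution:

```python
def cell_masses(points, psi, grid_n=100):
    """rho(V_i(psi)) for rho = Leb on [0,1]^2 (brute-force grid assignment)."""
    masses = [0.0] * len(points)
    w = 1.0 / (grid_n * grid_n)
    for a in range(grid_n):
        for b in range(grid_n):
            x0, x1 = (a + 0.5) / grid_n, (b + 0.5) / grid_n
            j = max(range(len(points)),
                    key=lambda i: x0 * points[i][0] + x1 * points[i][1] - psi[i])
            masses[j] += w
    return masses

def fit_prices(points, mu, steps=50, lr=0.2, grid_n=100):
    """Drive G(psi) toward the target masses mu.  The minimality condition is
    G(psi) = mu, and psi_i += lr * (G_i(psi) - mu_i) raises the price where the
    cell is over-demanded and lowers it where demand falls short of mu_i.
    (lr is a tuning choice; too large a step oscillates.)"""
    psi = [0.0] * len(points)
    for _ in range(steps):
        g = cell_masses(points, psi, grid_n)
        psi = [p + lr * (gi - mi) for p, gi, mi in zip(psi, g, mu)]
    return psi
```

Each iteration moves $\psi$ by $\mathrm{lr}\,(G(\psi) - \mu)$, i.e. a gradient step for the convex dual objective $\Phi(\psi) + \sum_i \mu_i \psi_i$.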

  65. Hessian of Φ and Newton's algorithm

(Recall that G_i(ψ) = ρ(V_i(ψ)) and ∇Φ = −(G_1, …, G_N).)

Proposition: If ρ ∈ C^0(X) and (y_i)_{1≤i≤N} is generic, then Φ ∈ C^2(R^N) and
◮ ∀ i ≠ j, ∂G_i/∂ψ_j(ψ) = ∫_{Γ_ij(ψ)} ρ(x)/‖y_i − y_j‖ dx, where Γ_ij(ψ) = V_i(ψ) ∩ V_j(ψ),
◮ ∀ i, ∂G_i/∂ψ_i(ψ) = −Σ_{j≠i} ∂G_i/∂ψ_j(ψ).

Let E = {ψ ∈ R^N | ∀ i, G_i(ψ) > 0}.
◮ If Ω = {ρ > 0} is connected and ψ ∈ E, then Ker D^2Φ(ψ) = R(1, …, 1):
  ◮ consider the matrix L = DG(ψ) and the graph H with (i, j) ∈ H ⟺ L_ij > 0;
  ◮ if Ω is connected and ψ ∈ E, then H is connected;
  ◮ L is the Laplacian of a connected graph ⟹ Ker L = R · cst.

Corollary: Global convergence of a damped Newton algorithm. [Kitagawa, M., Thibert '16]

[Figure: Laguerre cells of y_1, …, y_5, with the interface Γ_15(ψ) = V_1(ψ) ∩ V_5(ψ) highlighted.]
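The damped Newton iteration can be sketched in one dimension, where DG(ψ) is a tridiagonal graph Laplacian. This is a toy illustration under stated assumptions (ρ = Lebesgue on [0, 1], cost ⟨x | y⟩ = x·y, cells = intervals), not the authors' multi-dimensional implementation; all function names are mine, and the damping rule (keep every cell mass above a fixed fraction of its initial/target minimum) follows the Kitagawa–Mérigot–Thibert criterion only in spirit.

```python
import numpy as np

def cell_masses(psi, y):
    """G_i(psi) = rho(V_i(psi)) for rho = Lebesgue on [0,1], cost <x|y> = x*y,
    with y sorted increasing; the cells are intervals."""
    b = np.clip(np.maximum.accumulate((psi[1:] - psi[:-1]) / (y[1:] - y[:-1])), 0.0, 1.0)
    return np.diff(np.concatenate(([0.0], b, [1.0])))

def jacobian(psi, y):
    """DG(psi): off-diagonal entry 1/|y_i - y_j| when the interface Gamma_ij lies
    inside (0,1) (here rho = 1), diagonal = minus the off-diagonal row sum.
    This is (a signed) weighted graph Laplacian with kernel R*(1,...,1)."""
    n = len(y)
    b = np.clip(np.maximum.accumulate((psi[1:] - psi[:-1]) / (y[1:] - y[:-1])), 0.0, 1.0)
    L = np.zeros((n, n))
    for i in range(n - 1):
        if 0.0 < b[i] < 1.0:
            L[i, i + 1] = L[i + 1, i] = 1.0 / (y[i + 1] - y[i])
    np.fill_diagonal(L, -L.sum(axis=1))
    return L

def damped_newton(y, mu, iters=30, tol=1e-12):
    n = len(y)
    # start from prices whose cells all have mass 1/n, so that psi lies in E
    psi = np.concatenate(([0.0], np.cumsum(np.arange(1, n) / n * np.diff(y))))
    eps = 0.5 * min(mu.min(), cell_masses(psi, y).min())  # damping threshold
    for _ in range(iters):
        r = mu - cell_masses(psi, y)
        if np.abs(r).max() < tol:
            break
        # lstsq returns the minimum-norm solution, orthogonal to Ker DG = R*(1,..,1)
        delta = np.linalg.lstsq(jacobian(psi, y), r, rcond=None)[0]
        tau = 1.0
        while cell_masses(psi + tau * delta, y).min() < eps and tau > 1e-8:
            tau /= 2.0  # damping: keep every cell's mass bounded away from 0
        psi = psi + tau * delta
    return psi

y = np.array([0.1, 0.4, 0.6, 0.9])
mu = np.array([0.1, 0.2, 0.3, 0.4])
psi = damped_newton(y, mu)
print(np.abs(cell_masses(psi, y) - mu).max())  # residual near machine precision
```

Since sum_i (µ_i − G_i(ψ)) = 0, the residual lies in the range of the singular Laplacian, so the least-squares solve is exact; in 1-D, G is even piecewise linear in ψ, so Newton typically lands on the solution in a single full step.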

  71. Numerical example

Source: ρ = uniform on [0, 1]^2. Target: µ = (1/N) Σ_{1≤i≤N} δ_{y_i} with y_i uniform i.i.d. in [0, 1/3]^2.

Iterates: ψ^0 = (1/2)‖·‖^2, ψ^1 = Newt(ψ^0), ψ^2 = Newt(ψ^1).
NB: The points y_i do not move; only the prices ψ (and hence the Laguerre cells) are updated.

Convergence is very fast when spt(ρ) is convex: 17 Newton iterations for N ≥ 10^7 in 3D.

  75. Proof ingredients

Thm (M., Delalande, Chazal '19): Let X be convex compact with |X| = 1 and ρ = Leb_X, and let Y be compact. Then there exists C such that for all µ, ν ∈ Prob(Y),
    ‖T_µ − T_ν‖_{L^2(X)} ≤ C W_2(µ, ν)^{1/5}.

◮ Strategy of proof: let µ^k = Σ_i µ^k_i δ_{y_i} for k ∈ {0, 1}, and assume all µ^k_i > 0.
Consider ψ^k ∈ R^Y such that G(ψ^k) = µ^k, and set ψ_t = ψ^0 + t v with v = ψ^1 − ψ^0. Then
    ⟨µ^1 − µ^0 | v⟩ = ⟨G(ψ^1) − G(ψ^0) | v⟩ = ∫_0^1 ⟨DG(ψ_t) v | v⟩ dt.

a) Control of the eigengap: ⟨DG(ψ_t) v | v⟩ ≤ −C(X) ‖v‖^2_{L^2(µ_t)} if ∫ v dµ_t = 0, with µ_t = G(ψ_t). → [Eymard, Gallouët, Herbin '00]

b) Control of µ_t: the Brunn–Minkowski inequality implies µ_t ≥ (1 − t)^d µ^0.

Combining a) and b), we get ‖ψ^1 − ψ^0‖^2_{L^2(µ^0)} ≲ |⟨µ^1 − µ^0 | ψ^1 − ψ^0⟩|.
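Spelling out how a) and b) combine (a sketch that suppresses the mean-zero normalization of v, which the full argument handles by subtracting the µ_t-mean):

```latex
\left|\langle \mu^1-\mu^0 \mid v\rangle\right|
  = \left|\int_0^1 \langle DG(\psi_t)\,v \mid v\rangle \,\mathrm{d}t\right|
  \;\ge\; C(X)\int_0^1 \|v\|_{L^2(\mu_t)}^2 \,\mathrm{d}t
  \;\ge\; C(X)\,\|v\|_{L^2(\mu^0)}^2 \int_0^1 (1-t)^d \,\mathrm{d}t
  \;=\; \frac{C(X)}{d+1}\,\|v\|_{L^2(\mu^0)}^2 .
```

The first inequality uses a) (the integrand is nonpositive), the second uses b) via ‖v‖²_{L²(µ_t)} ≥ (1 − t)^d ‖v‖²_{L²(µ⁰)}; with v = ψ^1 − ψ^0 this is exactly the claimed bound ‖ψ^1 − ψ^0‖²_{L²(µ⁰)} ≲ |⟨µ^1 − µ^0 | ψ^1 − ψ^0⟩|.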
