Convergence rates for discretized optimal transport - Quentin Mérigot - PowerPoint PPT Presentation



Convergence rates for discretized optimal transport

Quentin Mérigot, Université Paris-Sud 11

Based on joint work with F. Chazal and A. Delalande

Workshop on numerical solutions of HJB equations, Paris, January 2020

1. Motivations






Motivation 1: Monge-Kantorovich Quantiles

◮ Given µ ∈ Prob(R), there exists a unique nondecreasing Tµ ∈ L1([0, 1]) satisfying Tµ#ρ = µ, with ρ = Lebesgue measure on [0, 1].
NB: Tµ#ρ = µ ⟺ ∀B ⊆ R, ρ(Tµ⁻¹(B)) = µ(B) ⟺ ∀x ∈ R, ρ([0, Tµ⁻¹(x)]) = µ((−∞, x])

◮ Tµ is the inverse cdf, also called the quantile function. How can this notion be extended to a multivariate setting?

Theorem (Brenier, McCann). Given ρ ∈ Probac(Rd) and µ ∈ Prob(Rd), there exists a unique (ρ-a.e.) Tµ : Rd → Rd such that Tµ#ρ = µ and Tµ = ∇φ with φ convex.

◮ Monge-Kantorovich quantile := Tµ. This requires a reference probability density ρ. [Chernozhukov, Galichon, Hallin, Henry ’15]

◮ Tµ is unique ρ-a.e., but the convex function φµ is not necessarily unique.

◮ Tµ : spt(ρ) → Rd is monotone: ⟨Tµ(x) − Tµ(y) | x − y⟩ ≥ 0.
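In dimension one, Tµ can be written down directly from samples. The following sketch (illustrative, not from the slides; the helper name `quantile_map` is ours) builds the empirical quantile function and checks the pushforward property Tµ#ρ = µ numerically:

```python
import numpy as np

def quantile_map(samples):
    """Empirical quantile function T_mu: [0,1] -> R, the nondecreasing map
    pushing rho = Lebesgue on [0,1] onto the empirical measure mu."""
    sorted_samples = np.sort(samples)
    n = len(sorted_samples)

    def T(t):
        # generalized inverse of the empirical cdf
        idx = np.minimum((np.asarray(t) * n).astype(int), n - 1)
        return sorted_samples[idx]

    return T

rng = np.random.default_rng(0)
mu_samples = rng.normal(loc=2.0, scale=1.0, size=10_000)
T = quantile_map(mu_samples)

t = rng.uniform(size=10_000)   # samples from rho = Leb([0,1])
pushed = T(t)                  # samples from T#rho, distributed like mu
print(np.mean(pushed), np.mean(mu_samples))  # both close to 2.0
```

Monotonicity of T is automatic here, since sorting produces a nondecreasing array.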



Numerical Example: Monge-Kantorovich Depth

Source: ρ = uniform probability density on B(0, 1) ⊆ R2
Target: µ = (1/N) Σ_{1≤i≤N} δ_{yi} with N = 10⁴ points

“Monge-Kantorovich depth of yi” ≃ Tµ⁻¹(yi). [Chernozhukov, Galichon, Hallin, Henry]

(Figure: isovalues of the MK depth.)







Wasserstein space

◮ Let Prob_p(Rd) = {µ ∈ Prob(Rd) | ∫ ‖x‖^p dµ < +∞}. The p-Wasserstein distance between µ, ν ∈ Prob_p(Rd) is
W_p(µ, ν) = ( min_{γ∈Γ(µ,ν)} ∫ ‖x − y‖^p dγ(x, y) )^{1/p},
where Γ(µ, ν) = {couplings between µ and ν} ⊆ Prob(Rd × Rd).

◮ On Prob(X), with X ⊆ Rd compact, W_p metrizes narrow convergence, i.e.
lim_{n→+∞} W_p(µn, µ) = 0 ⟺ ∀φ ∈ C0(X), lim_{n→+∞} ∫ φ dµn = ∫ φ dµ.

◮ On Prob(R), any monotone coupling γ between µ and ν is optimal in the definition of W_p. For instance, γ := (Tµ, Tν)#ρ with ρ = Lebesgue on [0, 1] is monotone, implying
W_p(µ, ν) = ( ∫_{[0,1]} |Tµ(t) − Tν(t)|^p dt )^{1/p} = ‖Tµ − Tν‖_{L^p([0,1])}.

In particular, (Prob_p(R), W_p) embeds isometrically in L^p([0, 1])! This embedding fails in higher dimension: (Prob_p, W_p) is curved.
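The isometric embedding via quantile functions gives a direct way to compute W_p on the line. A minimal sketch (the function name `wasserstein_1d` is ours), assuming two empirical measures with the same number of equally weighted atoms:

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """W_p between (1/N) sum_i delta_{x_i} and (1/N) sum_i delta_{y_i}:
    the L^p([0,1]) distance between the two quantile functions, which
    here reduces to sorting both samples and comparing them pointwise."""
    xs, ys = np.sort(x), np.sort(y)
    return np.mean(np.abs(xs - ys) ** p) ** (1.0 / p)

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 3.0])
print(wasserstein_1d(x, y))  # translation by 1, so W_2 = 1.0
```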







Motivation 2: “Linearization” of W2

◮ We fix a reference measure ρ = Leb_X, with X ⊆ Rd convex compact and |X| = 1. Given µ ∈ Prob2(Rd), we define Tµ as the unique map satisfying (i) Tµ = ∇φµ a.e. for some convex function φµ : X → R and (ii) Tµ#ρ = µ.

◮ The map µ ∈ Prob2(Rd) → Tµ ∈ L2(X) is injective, with image the space of (square-integrable) gradients of convex functions on X. Set
W_{2,ρ}(µ, ν) := ‖Tµ − Tν‖_{L2(ρ)}. → [Ambrosio, Gigli, Savaré ’04]
→ Used in image analysis. [Wang, Slepcev, Basu, Ozolek, Rohde ’13]
→ Represents a family of probability measures by a family of functions in L2(ρ).

Analogy with Riemannian geometry:
- point x ∈ M ↔ µ ∈ Prob2(Rd)
- geodesic distance dg(x, y) ↔ W2(µ, ν)
- tangent space TρM ↔ TρProb2(Rd) ⊆ L2(ρ, X)
- inverse exponential map exp_ρ⁻¹(x) ∈ TρM ↔ Tµ ∈ TρProb2(X)
- distance in tangent space ‖exp_ρ⁻¹(x) − exp_ρ⁻¹(y)‖_{g(x0)} ↔ ‖Tµ − Tν‖_{L2(ρ)}









Example: barycenter computation

◮ Barycenter in Wasserstein space: given µ1, …, µk ∈ Prob2(Rd) and α1, …, αk ≥ 0,
µ := arg min_µ Σ_{1≤i≤k} αi W2²(µ, µi).
→ An optimisation problem must be solved every time the coefficients αi are changed.

◮ “Linearized” Wasserstein barycenters:
µ := ( (1/Σi αi) Σi αi Tµi )#ρ.
→ Simple expression once the transport maps Tµi : ρ → µi have been computed.

(Figure: spt(µ0), spt(µ1) and the interpolant (0.8 Tµ1 + 0.2 Tµ0)#ρ.)

What amount of the Wasserstein geometry is preserved by the embedding µ → Tµ?
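In 1D the transport maps are quantile functions, so the linearized barycenter can be sketched in a few lines (illustrative; `linearized_barycenter` is our name; in 1D this actually coincides with the true W2 barycenter, since averaging quantile functions preserves monotonicity):

```python
import numpy as np

def linearized_barycenter(samples_list, alphas):
    """Push rho = Leb([0,1]) through ((sum_i alpha_i T_mu_i) / sum_i alpha_i):
    in 1D each T_mu_i is the quantile function, i.e. the sorted samples."""
    alphas = np.asarray(alphas, dtype=float)
    alphas = alphas / alphas.sum()
    maps = [np.sort(np.asarray(s)) for s in samples_list]  # T_mu_i on a uniform grid
    return sum(a * T for a, T in zip(alphas, maps))        # atoms of the barycenter

mu0 = np.zeros(3)           # three atoms at 0
mu1 = 10.0 * np.ones(3)     # three atoms at 10
print(linearized_barycenter([mu0, mu1], [0.2, 0.8]))  # three atoms at 8.0
```

Once the maps are sorted, changing the weights αi only costs a weighted average, which is the point made above.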







Motivation 3: numerical analysis of optimal transport

Theorem (Brenier, McCann). Given ρ ∈ Probac(Rd) and µ ∈ Prob(Rd), there exists a unique (ρ-a.e.) Tµ : Rd → Rd such that Tµ#ρ = µ and Tµ = ∇φ with φ convex.

To solve numerically an OT problem between ρ ∈ Probac(Rd) and µ ∈ Prob([0, 1]d):

◮ Approximate µ by a discrete measure, for instance
µk = Σ_{1≤i1,…,id≤k} µ(B_{i1,…,id}) δ_{(i1/k,…,id/k)},
where B_{i1,…,id} is the cube [(i1 − 1)/k, i1/k] × ⋯ × [(id − 1)/k, id/k]. (Then, Wp(µk, µ) ≲ 1/k.)

◮ Compute exactly the optimal transport map Tµk between ρ and µk (using a semi-discrete optimal transport solver).

It is known that Tµk converges to Tµ, but convergence rates are unknown in general…

In general, the numerical analysis of optimal transport is virtually nonexistent, whatever the discretization method.
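The grid discretization above can be sketched as follows (2D, illustrative; `discretize_on_grid` is our name), binning the mass of an empirical measure on [0, 1]² into k² cubes:

```python
import numpy as np

def discretize_on_grid(points, k):
    """Return atoms (k^2, 2) and masses (k^2,) of mu_k: the mass of each
    cube B_{i1,i2} is placed on the single point (i1/k, i2/k)."""
    idx = np.minimum((points * k).astype(int), k - 1)   # cube index of each sample
    flat = idx[:, 0] * k + idx[:, 1]
    masses = np.bincount(flat, minlength=k * k) / len(points)
    ii, jj = np.meshgrid(np.arange(1, k + 1), np.arange(1, k + 1), indexing="ij")
    atoms = np.stack([ii.ravel() / k, jj.ravel() / k], axis=1)
    return atoms, masses

rng = np.random.default_rng(0)
pts = rng.uniform(size=(1000, 2))     # empirical mu on [0,1]^2
atoms, masses = discretize_on_grid(pts, k=8)
print(round(masses.sum(), 12))        # total mass is preserved: 1.0
```

Each sample moves by at most √2/k (the cube diagonal), which is the source of the Wp(µk, µ) ≲ 1/k bound.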


2. Continuity of µ → Tµ







Elementary remarks

◮ The map µ → Tµ is reverse-Lipschitz, i.e. ‖Tµ − Tν‖_{L2(ρ)} ≥ W2(µ, ν).
Indeed, since Tµ#ρ = µ and Tν#ρ = ν, one has γ := (Tµ, Tν)#ρ ∈ Γ(µ, ν). Thus,
W2²(µ, ν) ≤ ∫ ‖x − y‖² dγ(x, y) = ∫ ‖Tµ(x) − Tν(x)‖² dρ(x).

◮ The map µ → Tµ is continuous.

◮ The map µ → Tµ is not better than 1/2-Hölder.
Take ρ = (1/π) Leb_{B(0,1)} on R2, and define µθ = (δ_{xθ} + δ_{xθ+π})/2 with xθ = (cos θ, sin θ). Then
Tµθ(x) = xθ if ⟨xθ | x⟩ ≥ 0, and xθ+π otherwise,
so that ‖Tµθ − Tµθ+δ‖²_{L2(ρ)} ≥ Cδ. Since on the other hand W2(µθ, µθ+δ) ≤ Cδ, we get
‖Tµθ − Tµθ+δ‖_{L2(ρ)} ≥ C W2(µθ, µθ+δ)^{1/2}.
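The 1/2-Hölder example lends itself to a quick Monte-Carlo check (illustrative; `holder_gap` is our name): the L² gap between Tµ0 and Tµδ scales like √δ rather than δ.

```python
import numpy as np

def T(theta, x):
    """T_{mu_theta}: send x to x_theta if <x_theta | x> >= 0, else to x_{theta+pi}."""
    v = np.array([np.cos(theta), np.sin(theta)])
    sign = np.where(x @ v >= 0, 1.0, -1.0)
    return sign[:, None] * v          # note x_{theta+pi} = -x_theta

def holder_gap(delta, n=200_000, seed=0):
    """Monte-Carlo estimate of ||T_{mu_0} - T_{mu_delta}||_{L2(rho)},
    rho = uniform on the unit disc (rejection sampling from the square)."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-1, 1, size=(3 * n, 2))
    pts = pts[np.sum(pts ** 2, axis=1) <= 1][:n]
    d2 = np.mean(np.sum((T(0.0, pts) - T(delta, pts)) ** 2, axis=1))
    return np.sqrt(d2)

# Halving sqrt(delta) halves the gap: ~ 2*sqrt(delta/pi) for small delta.
print(holder_gap(0.04))   # ~ 0.23
print(holder_gap(0.01))   # ~ 0.11
```

The gap is dominated by the thin wedge of mass δ/π on which the two maps send points to nearly antipodal targets.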











Local 1/2-Hölder continuity

Thm: Assume ρ ∈ Probac(X) and µ, ν ∈ Prob(Y) with X, Y ⊆ Rd compact. If Tµ is L-Lipschitz, then ‖Tµ − Tν‖²_{L2(ρ)} ≤ C W1(µ, ν) with C = 4L diam(X).

◮ ≃ [Ambrosio, Gigli ’09], with a slightly better upper bound. See also [Berman ’18].
◮ No regularity assumption on ν → consequences in statistics and numerical analysis.

◮ Let φµ : X → R be convex with Tµ = ∇φµ, and let ψµ : Y → R be its Legendre transform: ψµ(y) = max_{x∈X} ⟨x | y⟩ − φµ(x).
(Tµ = ∇φµ is L-Lipschitz ⟺ ψµ = φµ* is (1/L)-strongly convex.)

Prop: If Tµ is L-Lipschitz, then ‖Tµ − Tν‖²_{L2(ρ)} ≤ −2L ∫ (ψµ − ψν) d(µ − ν).

◮ Prop ⟹ Thm: Kantorovich-Rubinstein theorem.

Proof of the Prop:
∫ ψν d(µ − ν) = ∫ ψν d(∇φµ#ρ − ∇φν#ρ) = ∫ ψν(∇φµ) − ψν(∇φν) dρ
(convexity: ψν(y) − ψν(x) ≥ ⟨y − x | ∇ψν(x)⟩)
≥ ∫ ⟨∇φµ − ∇φν | ∇ψν(∇φν)⟩ dρ = ∫ ⟨∇φµ − ∇φν | id⟩ dρ,
and symmetrically, using the strong convexity of ψµ,
∫ ψµ d(ν − µ) ≥ ∫ ⟨∇φν − ∇φµ | id⟩ dρ + (1/2L) ‖∇φµ − ∇φν‖²_{L2(ρ)}.
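Summing the two symmetric inequalities above yields the Prop; the concluding step, written out (our reconstruction of the omitted algebra):

```latex
\begin{align*}
\int \psi_\nu \, d(\mu-\nu)
  &\ge \int \langle \nabla\varphi_\mu - \nabla\varphi_\nu \mid \mathrm{id}\rangle \, d\rho,\\
\int \psi_\mu \, d(\nu-\mu)
  &\ge \int \langle \nabla\varphi_\nu - \nabla\varphi_\mu \mid \mathrm{id}\rangle \, d\rho
     + \tfrac{1}{2L}\,\|\nabla\varphi_\mu - \nabla\varphi_\nu\|_{L^2(\rho)}^2.
\end{align*}
\text{Adding them, the linear terms cancel:}
\[
  -\int (\psi_\mu - \psi_\nu)\, d(\mu-\nu)
  \;\ge\; \tfrac{1}{2L}\,\|T_\mu - T_\nu\|_{L^2(\rho)}^2 .
\]
```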





Global Hölder continuity

Thm (Berman ’18): Let ρ ∈ Probac(X) and µ, ν ∈ Prob(Y) with X, Y compact. Then
‖∇ψµ − ∇ψν‖²_{L2(Y)} ≤ C W1(µ, ν)^α with α = 1/2^{d−1}.

Corollary: ‖Tµ − Tν‖²_{L2(ρ)} ≤ C W1(µ, ν)^α with α = 1/(2^{d−1}(d + 2)).

◮ The Hölder exponent is terrible, but the inequality holds without any assumptions on µ, ν!
◮ The proof of Berman’s theorem relies on techniques from complex geometry.


2. Global, dimension-independent Hölder continuity of µ → Tµ





Main theorem

Thm (M., Delalande, Chazal ’19): Let X be convex compact with |X| = 1 and ρ = Leb_X, and let Y be compact. Then there exists C such that for all µ, ν ∈ Prob(Y),
‖Tµ − Tν‖_{L2(X)} ≤ C W2(µ, ν)^{1/5}.

◮ First global and dimension-independent stability result for optimal transport maps.
◮ Gap between the lower and upper bounds for the Hölder exponent: 1/5 < 1/2. The exponent 1/5 is certainly not optimal…
◮ The constant C depends polynomially on diam(X) and diam(Y).
◮ The proof relies on the semidiscrete setting, i.e. the bound is first established for µ = Σi µi δ_{yi}, ν = Σi νi δ_{yi}, and one concludes using a density argument.







Semidiscrete OT for c(x, y) = −x|y

= minφ⊕ψ≥·|·

  • φ d ρ +
  • ψ d µ

◮ Let µ =

1≤i≤N µiδyi and ψi = ψ(yi).

◮ Let ρ, ν ∈ Probac

1 (Rd) and Γ(ρ, µ) = couplings between ρ, µ,

T (ρ, µ) = maxγ∈Γ(ρ,µ)

  • x|y d γ(x, y)

= minψ

  • ψ∗ d ρ +
  • ψ d µ

Legendre-Fenchel transform: Then, ψ∗|Vi(ψ) := ·|yi − ψi where Vi(ψ) = {x | ∀j, x|yi − ψi ≥ x|yj − ψj} y1 y2 y3 V1(ψ) V3(ψ) V2(ψ) ψ∗(x) = maxyx|y − ψ(y) Thus, T (ρ, µ) = minψ∈RN

i

  • Vi(ψ)x|yi − ψi d ρ(x) +

i µiψi

Kantorovich duality
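The Laguerre-cell decomposition above is easy to experiment with numerically. Below is a minimal NumPy sketch (the instance, point count and sample size are hypothetical, not from the talk) that evaluates ψ* and its cells, and estimates the cell masses G_i(ψ) = ρ(V_i(ψ)) by Monte Carlo for ρ uniform on [0, 1]²:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: N = 5 support points y_i in R^2 and potentials psi_i.
Y = rng.random((5, 2))
psi = np.zeros(5)

def psi_star(x, Y, psi):
    """Legendre-Fenchel transform psi*(x) = max_i <x|y_i> - psi_i."""
    return np.max(x @ Y.T - psi, axis=-1)

def cell_index(x, Y, psi):
    """Index of the Laguerre cell V_i(psi) containing x (argmax of <x|y_i> - psi_i)."""
    return np.argmax(x @ Y.T - psi, axis=-1)

# Monte Carlo estimate of G_i(psi) = rho(V_i(psi)) for rho = uniform on [0, 1]^2.
X = rng.random((100_000, 2))
G = np.bincount(cell_index(X, Y, psi), minlength=len(psi)) / len(X)
print(G)  # estimated cell masses, summing to 1
```

With ψ = 0 the cells are ordinary "farthest-in-inner-product" regions; changing one ψ_i shrinks or grows the corresponding cell, which is exactly the mechanism the dual problem exploits.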

slide-72
SLIDES 72–82

16

Optimality condition and economic interpretation

T(ρ, µ) = min_{ψ∈R^N} Φ(ψ) + Σ_i µ_i ψ_i, where Φ(ψ) := Σ_i ∫_{V_i(ψ)} (⟨x|y_i⟩ − ψ_i) dρ(x)

◮ Gradient: ∇Φ(ψ) = −(G_i(ψ))_{1≤i≤N}, where G_i(ψ) = ρ(V_i(ψ)).

ψ ∈ R^N is a minimizer of the dual problem ⟺ ∀i, ρ(V_i(ψ)) = µ_i
⟺ G(ψ) = µ, with G = (G_1, ..., G_N), µ = (µ_1, ..., µ_N) ∈ R^N
⟺ T = ∇ψ* transports ρ onto Σ_i µ_i δ_{y_i}

◮ Economic interpretation: ρ = density of customers, {y_i}_{1≤i≤N} = product types
→ given prices ψ ∈ R^N, a customer x maximizes ⟨x|y_i⟩ − ψ_i over all products.
→ V_i(ψ) = {x | i ∈ argmax_j ⟨x|y_j⟩ − ψ_j} = customers choosing product y_i.
→ ρ(V_i) = amount of customers for product y_i.
Optimal transport = finding prices satisfying the capacity constraints ρ(V_i(ψ)) = µ_i.

◮ Algorithm (Oliker–Prussner): coordinate-wise increments. Complexity: O(N³).
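In dimension one the capacity constraints can be solved by hand, which makes the optimality condition concrete. The sketch below is my own illustration, not code from the talk; it assumes ρ = Lebesgue on [0, 1] and cost −xy, builds prices ψ whose cells are intervals with the prescribed masses, and checks G(ψ) = µ:

```python
import numpy as np

# 1D setting: for sorted y_1 < ... < y_N, the cell V_i(psi) is an interval whose
# right endpoint solves x*y_i - psi_i = x*y_{i+1} - psi_{i+1}, i.e.
#   t_i = (psi_{i+1} - psi_i) / (y_{i+1} - y_i).
# Prescribing the masses mu_i fixes the breakpoints t_i = mu_1 + ... + mu_i and
# hence psi up to an additive constant: this is the condition G(psi) = mu.

N = 6
y = np.sort(np.random.default_rng(1).random(N))
mu = np.full(N, 1.0 / N)
t = np.cumsum(mu)[:-1]                                   # interior breakpoints
psi = np.concatenate(([0.0], np.cumsum(t * np.diff(y))))  # prices, psi_1 = 0

def G(psi, y):
    """Exact cell masses G_i(psi) = rho(V_i(psi)) in this 1D setting."""
    t = (psi[1:] - psi[:-1]) / (y[1:] - y[:-1])
    edges = np.concatenate(([0.0], np.clip(t, 0.0, 1.0), [1.0]))
    return np.diff(edges)

print(G(psi, y))  # equals mu: these prices meet the capacity constraints
```

In the economic reading: raising the price ψ_i moves the breakpoints toward y_i and sends customers to the neighbouring products, which is the lever the Oliker–Prussner increments operate on.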

slide-83
SLIDES 83–88

17

Hessian of Φ and Newton's algorithm

(Recall that G_i(ψ) = ρ(V_i(ψ)) and ∇Φ = −(G_1, ..., G_N).)

Proposition:
◮ If ρ ∈ C⁰(X) and (y_i)_{1≤i≤N} is generic, then Φ ∈ C²(R^N) and

∀i ≠ j, ∂G_i/∂ψ_j(ψ) = (1/‖y_i − y_j‖) ∫_{Γ_ij(ψ)} ρ(x) dx, where Γ_ij(ψ) = V_i(ψ) ∩ V_j(ψ),
∀i, ∂G_i/∂ψ_i(ψ) = −Σ_{j≠i} ∂G_i/∂ψ_j(ψ).

◮ Let E = {ψ ∈ R^N | ∀i, G_i(ψ) > 0}. If Ω = {ρ > 0} is connected and ψ ∈ E, then Ker D²Φ(ψ) = R(1, ..., 1).

[Figure: cells of y_1, ..., y_5 with interface Γ_15(ψ)]

◮ Consider the matrix L = DG(ψ) and the graph H: (i, j) ∈ H ⟺ L_ij > 0.
◮ If Ω is connected and ψ ∈ E, then H is connected.
◮ L is the Laplacian of a connected graph ⟹ Ker L = R · cst.

Corollary: Global convergence of a damped Newton algorithm.

[Kitagawa, M., Thibert '16]
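The kernel computation behind the last two bullets can be checked directly: a weighted Laplacian of a connected graph annihilates exactly the constant vectors. A small sketch with a hypothetical 5-node path graph, unit edge weights standing in for the ρ-masses of the interfaces Γ_ij:

```python
import numpy as np

# Build the (standard-sign) Laplacian of a connected path graph on 5 nodes;
# up to sign this is the structure of DG(psi) = -D^2(Phi)(psi).
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0   # edge weights > 0 <=> (i, j) in H
L = np.diag(W.sum(axis=1)) - W        # graph Laplacian: rows sum to zero

eigvals, eigvecs = np.linalg.eigh(L)
print(eigvals)
# eigvals[0] = 0 with constant eigenvector; eigvals[1] > 0 (spectral gap)
# because the graph is connected, so Ker L = R * (1, ..., 1).
```

The spectral gap `eigvals[1]` is what makes the (restricted) Hessian invertible and the Newton step well defined once all cells carry positive mass.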

slide-89
SLIDES 89–92

18

Numerical example

Source: ρ = uniform on [0, 1]². Target: µ = (1/N) Σ_{1≤i≤N} δ_{y_i}, with y_i uniform i.i.d. in [0, 1/3]².

ψ⁰ = (1/2)‖·‖², ψ¹ = Newt(ψ⁰), ψ² = Newt(ψ¹). NB: The points do not move.

Convergence is very fast when spt(ρ) is convex: 17 Newton iterations for N ≥ 10⁷ in 3D.
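A damped Newton iteration in the spirit of [Kitagawa, M., Thibert '16] fits in a few lines in the 1D toy setting (ρ = Lebesgue on [0, 1], cost −xy), where the cells are intervals and G and DG have closed forms. This is my own sketch, not the solver used for the experiments; the damping halves the step until every cell keeps positive mass:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 8
y = np.sort(rng.random(N))
mu = rng.random(N); mu /= mu.sum()           # target masses mu_i > 0

def G(psi):
    """Cell masses: breakpoints t_i = (psi_{i+1}-psi_i)/(y_{i+1}-y_i), clipped to [0,1]."""
    t = np.clip((psi[1:] - psi[:-1]) / np.diff(y), 0.0, 1.0)
    return np.diff(np.concatenate(([0.0], t, [1.0])))

def DG(psi):
    """Jacobian of G: a (negative) weighted graph Laplacian on the chain of cells."""
    t = (psi[1:] - psi[:-1]) / np.diff(y)
    w = np.where((t > 0) & (t < 1), 1.0 / np.diff(y), 0.0)
    H = np.zeros((N, N))
    for i in range(N - 1):
        H[i, i + 1] = H[i + 1, i] = w[i]
        H[i, i] -= w[i]; H[i + 1, i + 1] -= w[i]
    return H

psi = 0.5 * y**2                             # strictly feasible start: all cells nonempty
for _ in range(50):
    r = mu - G(psi)
    if np.abs(r).max() < 1e-12:
        break
    H = DG(psi)[1:, 1:]                      # pin psi_1 = 0 to remove the constant kernel
    step = np.zeros(N)
    step[1:] = np.linalg.solve(H, r[1:])
    tau = 1.0
    while G(psi + tau * step).min() <= 0:    # damping: keep every cell mass positive
        tau /= 2
    psi = psi + tau * step
print(np.abs(mu - G(psi)).max())             # residual of the capacity constraints
```

As on the slide, the target points never move during the iteration; only the prices ψ (and hence the cell boundaries) do.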

slide-93
SLIDES 93–102

19

Proof ingredients

Thm (M., Delalande, Chazal '19): Let X be convex and compact with |X| = 1, let ρ = Leb_X, and let Y be compact. Then there exists C s.t. for all µ, ν ∈ Prob(Y),

‖T_µ − T_ν‖_{L²(X)} ≤ C W₂(µ, ν)^{1/5}.

◮ Strategy of proof: let µᵏ = Σ_i µᵏ_i δ_{y_i} for k ∈ {0, 1}, and assume all µᵏ_i > 0.
Consider ψᵏ ∈ R^Y s.t. G(ψᵏ) = µᵏ, and ψ_t = ψ⁰ + tv with v = ψ¹ − ψ⁰. Then,

⟨µ¹ − µ⁰|v⟩ = ⟨G(ψ¹) − G(ψ⁰)|v⟩ = ∫₀¹ ⟨DG(ψ_t)v|v⟩ dt

a) Control of the eigengap: ⟨DG(ψ_t)v|v⟩ ≤ −C(X)‖v‖²_{L²(µ_t)} if ∫ v dµ_t = 0, with µ_t = G(ψ_t) → [Eymard, Gallouët, Herbin '00].

b) Control of µ_t: the Brunn–Minkowski inequality implies µ_t ≥ (1 − t)^d µ⁰.

Combining a) and b), we get ‖ψ¹ − ψ⁰‖²_{L²(µ⁰)} ≲ |⟨µ¹ − µ⁰|ψ¹ − ψ⁰⟩|. Then, by Kantorovich–Rubinstein,

|⟨µ¹ − µ⁰|ψ¹ − ψ⁰⟩| ≤ Lip(ψ¹ − ψ⁰) W₁(µ⁰, µ¹) ≲ W₂(µ⁰, µ¹)

◮ We lose a little in the exponent to control the difference between the OT maps themselves.
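For intuition on the exponent: in dimension one no exponent is lost at all, since with ρ = Leb on [0, 1] the map T_µ is the quantile function and ‖T_µ − T_ν‖_{L²} = W₂(µ, ν) exactly. A small sketch of this classical 1D identity (my own illustration, not part of the talk), for empirical measures with n atoms of mass 1/n:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
a = np.sort(rng.normal(size=n))          # atoms of mu
b = np.sort(rng.normal(size=n) + 1.0)    # atoms of nu

def quantile_map(atoms, u):
    """T_mu(u) for the empirical measure of `atoms`: piecewise-constant quantile."""
    return atoms[np.minimum((u * len(atoms)).astype(int), len(atoms) - 1)]

u = (np.arange(4 * n) + 0.5) / (4 * n)   # midpoint grid on [0, 1]
L2 = np.sqrt(np.mean((quantile_map(a, u) - quantile_map(b, u)) ** 2))
W2 = np.sqrt(np.mean((a - b) ** 2))      # optimal coupling pairs sorted atoms
print(L2, W2)                            # the two quantities coincide
```

The exponent 1/5 (and the loss discussed above) is thus a genuinely higher-dimensional phenomenon.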

slide-103
SLIDE 103

20

A toy application

slide-104
SLIDES 104–108

21

Example: k-Means for MNIST digits

MNIST has M = 60 000 grayscale images (64 × 64 pixels) representing digits.

Each image αℓ ∈ M_64(R) is transformed into a probability measure on [0, 1]² via

µℓ = (1 / Σ_{i,j} αℓ_{ij}) Σ_{i,j} αℓ_{ij} δ_{(x_i, x_j)}, with x_i = i/63.

Tℓ = T_{µℓ} ∈ L²([0, 1]², R²)  [OT map from ρ = Leb_{[0,1]²} to µℓ]

We run the K-Means method on the transport maps, with K = 20. Each cluster X_k ⊆ {1, ..., M} yields an average transport map

S_k = (1/|X_k|) Σ_{ℓ∈X_k} Tℓ,

and S_k#ρ is the "reconstructed measure".

[Figure: reconstructed measure S_3#ρ]
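The clustering step itself is ordinary k-means in L²(ρ, R²) once each map Tℓ is discretized on a grid and flattened to a vector. A dependency-free sketch with Lloyd iterations; the synthetic vectors below merely stand in for the discretized maps, and this is not the talk's actual pipeline code:

```python
import numpy as np

rng = np.random.default_rng(4)
M, D, K = 300, 50, 4                         # hypothetical sizes (talk uses K = 20)
# Stand-ins for the flattened maps T_l: K well-separated synthetic groups.
T = np.concatenate([rng.normal(loc=c, scale=0.1, size=(M // K, D))
                    for c in range(K)])

centroids = T[rng.choice(M, K, replace=False)]
for _ in range(20):                          # Lloyd iterations
    d2 = ((T[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    labels = d2.argmin(axis=1)               # nearest-centroid assignment
    centroids = np.array([T[labels == k].mean(axis=0) if np.any(labels == k)
                          else centroids[k] for k in range(K)])

# S_k = average transport map of cluster k; pushing rho forward by S_k gives
# the "reconstructed measure" of the slide.
S = centroids
print(S.shape, np.bincount(labels, minlength=K))
```

Because the embedding µℓ ↦ Tℓ is into a Hilbert space, averaging cluster members is well defined, which is precisely what makes plain k-means applicable here.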

slide-109
SLIDES 109–111

22

Summary

Optimal transport can be used to embed Prob(R^d) into L²(ρ, R^d), with possible applications in data analysis. Computations can easily be performed using https://github.com/sd-ot

The analysis of this approach relies on the stability theory for µ ↦ T_µ with respect to W₂, where many questions remain open.

Thank you for your attention!