1
Convergence rates for discretized optimal transport
Quentin Mérigot, Université Paris-Sud
Based on joint work with F. Chazal and A. Delalande
Workshop on numerical solutions of HJB equations, Paris, January 2020

2
1. Motivations
3
◮ Given µ ∈ Prob(R), there exists a unique nondecreasing Tµ ∈ L¹([0, 1]) satisfying Tµ#ρ = µ, with ρ = Lebesgue measure on [0, 1].
NB: Tµ#ρ = µ ⟺ ∀B ⊆ R, ρ(Tµ⁻¹(B)) = µ(B) ⟺ ∀x ∈ R, ρ([0, Tµ⁻¹(x)]) = µ((−∞, x])
◮ Tµ is the inverse cdf, also called the quantile function. How can this notion be extended to a multivariate setting?
Theorem (Brenier, McCann): Given ρ ∈ Prob_ac(R^d) and µ ∈ Prob(R^d), there exists a ρ-a.e. unique Tµ : R^d → R^d such that Tµ#ρ = µ and Tµ = ∇φ with φ convex.
◮ Monge–Kantorovich quantile := Tµ. This requires a reference probability density ρ. [Chernozhukov, Galichon, Hallin, Henry ’15]
◮ Tµ is unique ρ-a.e., but the convex function φµ is not necessarily unique.
◮ Tµ : spt(ρ) → R^d is monotone: ⟨Tµ(x) − Tµ(y) | x − y⟩ ≥ 0.
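The one-dimensional statement above is easy to check numerically. A minimal sketch (assuming a hypothetical empirical target measure) that builds the quantile map Tµ and verifies Tµ#ρ = µ for ρ = Lebesgue on [0, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete target: mu = empirical measure of n sorted atoms.
atoms = np.sort(rng.normal(size=1000))

def T_mu(t):
    """Quantile map T_mu = F_mu^{-1} for the empirical measure of `atoms`:
    nondecreasing, and pushes Lebesgue on [0, 1] forward to mu."""
    n = len(atoms)
    idx = np.minimum((t * n).astype(int), n - 1)   # t in [k/n, (k+1)/n) -> atom k
    return atoms[idx]

# Check T_mu # rho = mu: pushing uniform samples through T_mu reproduces mu,
# i.e. rho({t : T_mu(t) <= x}) = mu((-inf, x]) at every x.
u = rng.uniform(0, 1, size=200_000)
pushed = T_mu(u)
for x in (-1.0, 0.0, 0.5):
    print(f"x={x:+.1f}: pushed cdf {np.mean(pushed <= x):.3f}"
          f" vs mu cdf {np.mean(atoms <= x):.3f}")
```

The map is nondecreasing by construction, which is exactly the 1D monotonicity singled out on this slide.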
4
Source: ρ = uniform probability density on B(0, 1) ⊆ R². Target: µ = (1/N) Σᵢ δ_{yᵢ}.
"Monge–Kantorovich depth of yᵢ" ≃ Tµ⁻¹(yᵢ). [Chernozhukov, Galichon, Hallin, Henry]
(Figure: isovalues of the MK depth.)
5 - 1
◮ Let Probp(Rd) = {µ ∈ Prob(Rd) |
p-Wasserstein distance between µ, ν ∈ Probp(Rd): Wp(µ, ν) =
1/p . where Γ(µ, ν) = couplings between µ and ν ⊆ Prob(Rd × Rd).
5 - 2
◮ Let Probp(Rd) = {µ ∈ Prob(Rd) |
p-Wasserstein distance between µ, ν ∈ Probp(Rd): Wp(µ, ν) =
1/p . where Γ(µ, ν) = couplings between µ and ν ⊆ Prob(Rd × Rd). ◮ On Prob(X), with X ⊆ Rd compact, Wp metrizes narrow convergence i.e. limn→+∞ Wp(µn, µ) = 0 ⇐ ⇒ ∀φ ∈ C0(X), limn→+∞
5 - 3
◮ Let Probp(Rd) = {µ ∈ Prob(Rd) |
p-Wasserstein distance between µ, ν ∈ Probp(Rd): Wp(µ, ν) =
1/p . where Γ(µ, ν) = couplings between µ and ν ⊆ Prob(Rd × Rd). ◮ On Prob(X), with X ⊆ Rd compact, Wp metrizes narrow convergence i.e. limn→+∞ Wp(µn, µ) = 0 ⇐ ⇒ ∀φ ∈ C0(X), limn→+∞
◮ On Prob(R), any monotone coupling γ between µ, ν is optimal in the def of Wp.
5 - 4
◮ Let Probp(Rd) = {µ ∈ Prob(Rd) |
p-Wasserstein distance between µ, ν ∈ Probp(Rd): Wp(µ, ν) =
1/p . where Γ(µ, ν) = couplings between µ and ν ⊆ Prob(Rd × Rd). ◮ On Prob(X), with X ⊆ Rd compact, Wp metrizes narrow convergence i.e. limn→+∞ Wp(µn, µ) = 0 ⇐ ⇒ ∀φ ∈ C0(X), limn→+∞
◮ On Prob(R), any monotone coupling γ between µ, ν is optimal in the def of Wp. For instance γ := (Tµ, Tν)#ρ with ρ = Lebesgue on [0, 1] is monotone, implying Wp(µ, ν) =
5 - 5
◮ Let Probp(Rd) = {µ ∈ Prob(Rd) |
p-Wasserstein distance between µ, ν ∈ Probp(Rd): Wp(µ, ν) =
1/p . where Γ(µ, ν) = couplings between µ and ν ⊆ Prob(Rd × Rd). ◮ On Prob(X), with X ⊆ Rd compact, Wp metrizes narrow convergence i.e. limn→+∞ Wp(µn, µ) = 0 ⇐ ⇒ ∀φ ∈ C0(X), limn→+∞
◮ On Prob(R), any monotone coupling γ between µ, ν is optimal in the def of Wp. For instance γ := (Tµ, Tν)#ρ with ρ = Lebesgue on [0, 1] is monotone, implying Wp(µ, ν) =
In particular, (Probp(R), Wp) embeds isometrically in Lp([0, 1]) !
5 - 6
◮ Let Probp(Rd) = {µ ∈ Prob(Rd) |
p-Wasserstein distance between µ, ν ∈ Probp(Rd): Wp(µ, ν) =
1/p . where Γ(µ, ν) = couplings between µ and ν ⊆ Prob(Rd × Rd). ◮ On Prob(X), with X ⊆ Rd compact, Wp metrizes narrow convergence i.e. limn→+∞ Wp(µn, µ) = 0 ⇐ ⇒ ∀φ ∈ C0(X), limn→+∞
◮ On Prob(R), any monotone coupling γ between µ, ν is optimal in the def of Wp. For instance γ := (Tµ, Tν)#ρ with ρ = Lebesgue on [0, 1] is monotone, implying Wp(µ, ν) =
In particular, (Probp(R), Wp) embeds isometrically in Lp([0, 1]) ! The previous embedding is false in higher dimension: (Probp, Wp) is curved.
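The isometric embedding can be illustrated directly: for two empirical measures with the same number of equal-mass atoms (an assumption made for convenience here), the monotone coupling matches sorted samples, so Wp is just an Lp distance between sorted vectors. A sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two empirical measures on R with n equal-mass atoms each: in this special
# case the monotone coupling matches the k-th smallest atom of a with the
# k-th smallest atom of b.
n = 5000
a = rng.normal(0.0, 1.0, size=n)
b = rng.normal(2.0, 0.5, size=n)

# Quantile functions T_a, T_b are the sorted samples, and
# W_p(a, b) = ||T_a - T_b||_{L^p([0,1])}.
Ta, Tb = np.sort(a), np.sort(b)
W2 = np.sqrt(np.mean((Ta - Tb) ** 2))
W1 = np.mean(np.abs(Ta - Tb))
print(f"W2 ≈ {W2:.3f}, W1 ≈ {W1:.3f}")

# Sanity check: any other coupling (here a random permutation) costs at least
# as much as the monotone one.
perm = rng.permutation(n)
W2_random = np.sqrt(np.mean((a - b[perm]) ** 2))
assert W2 <= W2_random + 1e-12
```

For these two Gaussian samples the limiting value is W₂² = (2 − 0)² + (1 − 0.5)² = 4.25, which the sorted-sample formula approximates.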
6
◮ We fix a reference measure ρ = Leb_X, with X ⊆ R^d convex compact and |X| = 1. Given µ ∈ Prob₂(R^d), we define Tµ as the unique map satisfying (i) Tµ = ∇φµ a.e. for some convex function φµ : X → R and (ii) Tµ#ρ = µ.
◮ The map µ ∈ Prob₂(R^d) ↦ Tµ ∈ L²(X) is injective, with image the space of (square-integrable) gradients of convex functions on X.
◮ W_{2,ρ}(µ, ν) := ‖Tµ − Tν‖_{L²(ρ)} → [Ambrosio, Gigli, Savaré ’04]
Used in image analysis → [Wang, Slepčev, Basu, Ozolek, Rohde ’13]
→ Represents a family of probability measures by a family of functions in L²(ρ).
Analogy between Riemannian geometry and optimal transport:
point x ∈ M ↔ µ ∈ Prob₂(R^d)
geodesic distance d_g(x, y) ↔ W₂(µ, ν)
tangent space T_{x₀}M ↔ T_ρProb₂(R^d) ⊆ L²(ρ, X)
inverse exponential map exp_{x₀}⁻¹(x) ∈ T_{x₀}M ↔ Tµ ∈ T_ρProb₂(X)
distance in tangent space ‖exp_{x₀}⁻¹(x) − exp_{x₀}⁻¹(y)‖_{g(x₀)} ↔ ‖Tµ − Tν‖_{L²(ρ)}
7
◮ Barycenter in Wasserstein space: given µ₁, ..., µ_k ∈ Prob₂(R^d) and α₁, ..., α_k ≥ 0,
µ̄ := arg min_µ Σ_{1≤i≤k} αᵢ W₂²(µ, µᵢ).
→ One needs to solve an optimisation problem every time the coefficients αᵢ are changed.
◮ "Linearized" Wasserstein barycenters: µ̄ := (Σ_{1≤i≤k} αᵢ Tµᵢ)#ρ.
→ Simple expression once the transport maps Tµᵢ : ρ → µᵢ have been computed.
(Figure: spt(µ₀), spt(µ₁), and the linearized barycenter (0.8 Tµ₁ + 0.2 Tµ₀)#ρ.)
What amount of the Wasserstein geometry is preserved by the embedding µ ↦ Tµ?
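A 1D sketch of the linearized barycenter (hypothetical Gaussian targets; in 1D the maps Tµᵢ are quantile functions, and since the embedding is isometric there, the linearized barycenter coincides with the true Wasserstein barycenter):

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretize rho = Lebesgue on [0,1] on a grid of n points; the maps T_mu_i
# are then the empirical quantile functions (sorted samples of each target).
n = 100_000
T0 = np.sort(rng.normal(0.0, 1.0, n))   # T_{mu_0}: quantiles of N(0, 1)
T1 = np.sort(rng.normal(4.0, 2.0, n))   # T_{mu_1}: quantiles of N(4, 2^2)

# Linearized barycenter with weights (alpha, 1 - alpha): push rho through
# the averaged map alpha*T0 + (1 - alpha)*T1. No optimisation problem is
# solved once T0, T1 are known.
alpha = 0.2
bary = alpha * T0 + (1 - alpha) * T1

# For Gaussian marginals the Wasserstein barycenter is
# N(alpha*m0 + (1-alpha)*m1, (alpha*s0 + (1-alpha)*s1)^2) = N(3.2, 1.8^2).
print(f"barycenter mean ≈ {bary.mean():.3f}, std ≈ {bary.std():.3f}")
```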
8
Theorem (Brenier, McCann): Given ρ ∈ Prob_ac(R^d) and µ ∈ Prob(R^d), there exists a ρ-a.e. unique Tµ : R^d → R^d such that Tµ#ρ = µ and Tµ = ∇φ with φ convex.
To solve an optimal transport problem numerically between ρ ∈ Prob_ac(R^d) and µ ∈ Prob([0, 1]^d):
◮ Approximate µ by a discrete measure, for instance µ_k = Σ_{i₁,...,i_d ≤ k} µ(B_{i₁,...,i_d}) δ_{(i₁/k,...,i_d/k)}, where B_{i₁,...,i_d} is the cube [(i₁ − 1)/k, i₁/k] × ... × [(i_d − 1)/k, i_d/k]. (Then Wp(µ_k, µ) ≲ 1/k.)
◮ Compute exactly the optimal transport map T_{µ_k} between ρ and µ_k (using a semi-discrete optimal transport solver).
It is known that T_{µ_k} converges to Tµ, but convergence rates are unknown in general.
In general, the numerical analysis of optimal transport is virtually nonexistent, whatever the discretization method.
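The bound Wp(µ_k, µ) ≲ 1/k follows from the coupling that moves each point of µ to the corner of its grid cell. A small check with a hypothetical empirical µ:

```python
import numpy as np

rng = np.random.default_rng(0)

# mu = empirical measure of n points in [0,1]^2; mu_k collapses each point to
# the corner (i1/k, i2/k) of its cell, as in the discretization on the slide.
n, k, d = 10_000, 32, 2
pts = rng.uniform(0, 1, size=(n, d))
snapped = np.ceil(pts * k) / k            # point in B_{i1,i2} -> (i1/k, i2/k)

# The "diagonal" coupling (each atom paired with its snapped copy) moves mass
# by at most sqrt(d)/k, hence W_p(mu_k, mu) <= sqrt(d)/k = O(1/k).
disp = np.linalg.norm(pts - snapped, axis=1)
W2_bound = np.sqrt(np.mean(disp ** 2))    # cost of this coupling, >= W2
print(f"coupling cost {W2_bound:.4f} <= sqrt(d)/k = {np.sqrt(d)/k:.4f}")
```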
9
10
◮ The map µ ↦ Tµ is reverse-Lipschitz, i.e. ‖Tµ − Tν‖_{L²(ρ)} ≥ W₂(µ, ν).
Indeed, since Tµ#ρ = µ and Tν#ρ = ν, one has γ := (Tµ, Tν)#ρ ∈ Γ(µ, ν). Thus W₂²(µ, ν) ≤ ∫ ‖x − y‖² dγ(x, y) = ‖Tµ − Tν‖²_{L²(ρ)}.
◮ The map µ ↦ Tµ is continuous.
◮ The map µ ↦ Tµ is not better than ½-Hölder:
Take ρ = (1/π) Leb_{B(0,1)} on R², and define µθ = (δ_{xθ} + δ_{xθ+π})/2 with xθ = (cos θ, sin θ). Then
Tµθ(x) = xθ if ⟨xθ | x⟩ ≥ 0, and xθ+π otherwise,
so that ‖Tµθ − Tµθ+δ‖²_{L²(ρ)} ≥ Cδ.
Since on the other hand W₂(µθ, µθ+δ) ≤ Cδ, we get ‖Tµθ − Tµθ+δ‖_{L²(ρ)} ≥ C W₂(µθ, µθ+δ)^{1/2}.
11
Thm: Assume ρ ∈ Prob_ac(X) and µ, ν ∈ Prob(Y) with X, Y ⊆ R^d compact. If Tµ is L-Lipschitz, then ‖Tµ − Tν‖²_{L²(ρ)} ≤ C W₁(µ, ν) with C = 4L diam(X).
◮ ≃ [Ambrosio, Gigli ’09], with a slightly better upper bound. See also [Berman ’18].
◮ No regularity assumption on ν → consequences in statistics and numerical analysis.
◮ Let φµ : X → R be convex with Tµ = ∇φµ, and let ψµ : Y → R be its Legendre transform: ψµ(y) = max_{x∈X} ⟨x|y⟩ − φµ(x).
(Tµ = ∇φµ is L-Lipschitz ⟺ ψµ = φµ* is (1/L)-strongly convex.)
Prop: If Tµ is L-Lipschitz, then ‖Tµ − Tν‖²_{L²(ρ)} ≤ −2L ∫ (ψµ − ψν) d(µ − ν).
(The proof combines the strong convexity of ψµ with the convexity inequality ψν(y) − ψν(x) ≥ ⟨y − x | ∇ψν(x)⟩.)
◮ Prop ⟹ Thm: the potentials ψµ, ψν are diam(X)-Lipschitz (up to a translation of X), so the Kantorovich–Rubinstein theorem gives −∫ (ψµ − ψν) d(µ − ν) ≤ 2 diam(X) W₁(µ, ν).
12
Thm (Berman ’18): Let ρ ∈ Prob_ac(X) and µ, ν ∈ Prob(Y) with X, Y compact. Then ‖∇ψµ − ∇ψν‖²_{L²(Y)} ≤ C W₁(µ, ν)^α with α = 1/2^{d−1}.
Corollary: ‖Tµ − Tν‖²_{L²(ρ)} ≤ C W₁(µ, ν)^α with α = 1/(2^{d−1}(d + 2)).
◮ The Hölder exponent deteriorates quickly as the dimension d grows.
◮ The proof of Berman’s theorem relies on techniques from complex geometry.
13
14
Thm (M., Delalande, Chazal ’19): Let X be convex compact with |X| = 1 and ρ = Leb_X, and let Y be compact. Then there exists C s.t. for all µ, ν ∈ Prob(Y), ‖Tµ − Tν‖_{L²(X)} ≤ C W₂(µ, ν)^{1/5}.
◮ First global and dimension-independent stability result for optimal transport maps.
◮ There is a gap between the lower and upper bounds for the Hölder exponent: 1/5 < 1/2. The exponent 1/5 is certainly not optimal.
◮ The constant C depends polynomially on diam(X) and diam(Y).
◮ The proof relies on the semi-discrete setting, i.e. the bound is established for µ = Σᵢ µᵢ δ_{yᵢ} and ν = Σᵢ νᵢ δ_{yᵢ}, and one concludes using a density argument.
15
◮ Let ρ, µ ∈ Prob₁(R^d) with ρ absolutely continuous, and let Γ(ρ, µ) = couplings between ρ and µ. By Kantorovich duality,
T(ρ, µ) = max_{γ∈Γ(ρ,µ)} ∫ ⟨x|y⟩ dγ(x, y)
        = min_{φ⊕ψ ≥ ⟨·|·⟩} ∫ φ dρ + ∫ ψ dµ
        = min_ψ ∫ ψ* dρ + ∫ ψ dµ,
where ψ*(x) = max_y ⟨x|y⟩ − ψ(y) is the Legendre–Fenchel transform.
◮ Let µ = Σ_{1≤i≤N} µᵢ δ_{yᵢ} and ψᵢ = ψ(yᵢ). Then ψ*|_{Vᵢ(ψ)} = ⟨·|yᵢ⟩ − ψᵢ, where Vᵢ(ψ) = {x | ∀j, ⟨x|yᵢ⟩ − ψᵢ ≥ ⟨x|yⱼ⟩ − ψⱼ} are the Laguerre cells. (Figure: cells V₁(ψ), V₂(ψ), V₃(ψ) of points y₁, y₂, y₃.)
Thus, T(ρ, µ) = min_{ψ∈R^N} Σᵢ ∫_{Vᵢ(ψ)} (⟨x|yᵢ⟩ − ψᵢ) dρ(x) + Σᵢ µᵢψᵢ.
16
T(ρ, µ) = min_{ψ∈R^N} Φ(ψ) + Σᵢ µᵢψᵢ, where Φ(ψ) := Σᵢ ∫_{Vᵢ(ψ)} (⟨x|yᵢ⟩ − ψᵢ) dρ(x).
◮ Gradient: ∇Φ(ψ) = −(Gᵢ(ψ))_{1≤i≤N}, where Gᵢ(ψ) = ρ(Vᵢ(ψ)).
ψ ∈ R^N is a minimizer of the dual problem ⟺ ∀i, ρ(Vᵢ(ψ)) = µᵢ
⟺ G(ψ) = µ, with G = (G₁, ..., G_N) and µ = (µ₁, ..., µ_N) ∈ R^N
⟺ T = ∇ψ* transports ρ onto Σᵢ µᵢ δ_{yᵢ}.
◮ Economic interpretation: ρ = density of customers, {yᵢ}_{1≤i≤N} = product types.
→ Given prices ψ ∈ R^N, a customer x maximizes ⟨x|yᵢ⟩ − ψᵢ over all products.
→ Vᵢ(ψ) = {x | i ∈ arg max_j ⟨x|yⱼ⟩ − ψⱼ} = customers choosing product yᵢ.
→ ρ(Vᵢ) = amount of customers for product yᵢ.
Optimal transport = finding prices satisfying the capacity constraints ρ(Vᵢ(ψ)) = µᵢ.
◮ Algorithm (Oliker–Prussner): coordinate-wise increments. Complexity: O(N³).
17
(Recall that Gᵢ(ψ) = ρ(Vᵢ(ψ)) and ∇Φ = −(G₁, ..., G_N).)
Proposition:
◮ If ρ ∈ C⁰(X) and (yᵢ)_{1≤i≤N} is generic, then Φ ∈ C²(R^N) and
∀i ≠ j, ∂Gᵢ/∂ψⱼ(ψ) = (1/‖yᵢ − yⱼ‖) ∫_{Γᵢⱼ(ψ)} ρ, where Γᵢⱼ(ψ) = Vᵢ(ψ) ∩ Vⱼ(ψ) is the interface between two Laguerre cells (figure: interface Γ₁₅(ψ) between the cells of y₁ and y₅);
∀i, ∂Gᵢ/∂ψᵢ(ψ) = −Σ_{j≠i} ∂Gᵢ/∂ψⱼ(ψ).
◮ Let E = {ψ ∈ R^N | ∀i, Gᵢ(ψ) > 0}. If Ω = {ρ > 0} is connected and ψ ∈ E, then Ker D²Φ(ψ) = R(1, ..., 1).
◮ Consider the matrix L = DG(ψ) and the graph H with (i, j) ∈ H ⟺ Lᵢⱼ > 0:
◮ If Ω is connected and ψ ∈ E, then H is connected.
◮ L is the Laplacian of a connected graph ⟹ Ker L = R · cst.
Corollary: Global convergence of a damped Newton algorithm. [Kitagawa, M., Thibert ’16]
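The key linear-algebra fact in the proposition, that the Laplacian of a connected graph has kernel R · cst, can be checked directly (here on a small hypothetical weighted graph; on the slide, L = DG(ψ) is minus such a Laplacian, so its kernel is the same):

```python
import numpy as np

# Connected weighted graph on 5 vertices (a weighted 5-cycle).
W = np.zeros((5, 5))
for i, j, w in [(0, 1, 0.5), (1, 2, 1.0), (2, 3, 0.2), (3, 4, 0.7), (0, 4, 0.3)]:
    W[i, j] = W[j, i] = w                  # symmetric positive edge weights

# Graph Laplacian: diagonal entry = sum of incident weights, mirroring
# dG_i/dpsi_i = -sum_{j != i} dG_i/dpsi_j on the slide.
L = np.diag(W.sum(axis=1)) - W

eigvals = np.linalg.eigvalsh(L)            # ascending eigenvalues
kernel_dim = int(np.sum(np.abs(eigvals) < 1e-10))
print("kernel dimension:", kernel_dim)     # connected graph => dimension 1
assert np.allclose(L @ np.ones(5), 0)      # constants lie in the kernel
```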
18
Source: ρ = uniform on [0, 1]². Target: µ = (1/N) ∑i δyi, with points yi ∈ [1/3, 2/3]².
Initialization: ψ0 = (1/2)‖·‖². Iterations: ψ1 = Newt(ψ0), ψ2 = Newt(ψ1), . . .
NB: The points do not move.
Convergence is very fast when spt(ρ) is convex: 17 Newton iterations for N ≥ 10^7 in 3D.
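In dimension one the damped Newton iteration fits in a few lines, which makes the fast convergence easy to observe. This is my own toy sketch, not the authors' solver (their experiments use a 2D/3D semi-discrete solver); the helpers `masses` and `DG` and the damping threshold `eps0` are illustrative choices.

```python
# Damped Newton for G(psi) = mu in 1D: Newton step on the singular Laplacian
# L = DG(psi) (solved in the least-squares sense), with the step damped so
# that all cell masses stay bounded below, as in [Kitagawa, M., Thibert '16].
import numpy as np

def masses(psi, y):
    b = (y[:-1] + y[1:]) / 2 + (psi[1:] - psi[:-1]) / (y[1:] - y[:-1])
    bounds = np.concatenate([[0.0], np.clip(b, 0.0, 1.0), [1.0]])
    return np.diff(bounds)

def DG(psi, y, eps=1e-6):
    G = masses(psi, y)
    return np.column_stack([(masses(psi + eps * e, y) - G) / eps
                            for e in np.eye(len(y))])

rng = np.random.default_rng(0)
N = 50
y = np.sort(rng.uniform(1/3, 2/3, N))   # target points, as on the slide
mu = np.full(N, 1 / N)                  # target masses
psi = np.zeros(N)                       # initial psi with all G_i(psi) > 0
eps0 = masses(psi, y).min() / 2         # lower bound kept along the iterations

for it in range(30):
    G = masses(psi, y)
    if np.abs(G - mu).sum() < 1e-10:
        break
    d = np.linalg.lstsq(DG(psi, y), mu - G, rcond=None)[0]  # Newton direction
    tau = 1.0
    while masses(psi + tau * d, y).min() < eps0:            # damping
        tau /= 2
    psi = psi + tau * d

assert np.abs(masses(psi, y) - mu).sum() < 1e-8
assert it <= 5   # G is piecewise linear in psi, so convergence is immediate here
```

In this 1D toy case G is piecewise linear in ψ, so a full Newton step is essentially exact; the damping only matters in harder configurations.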
19
Thm (M., Delalande, Chazal '19): Let X be convex and compact with |X| = 1 and ρ = LebX, and let Y be compact. Then, there exists C s.t. for all µ, ν ∈ Prob(Y),
‖Tµ − Tν‖L2(X) ≤ C W2(µ, ν)^{1/5}.
◮ Strategy of proof: let µk = ∑i µk_i δyi for k ∈ {0, 1}, and assume all µk_i > 0.
Consider ψk ∈ RY s.t. G(ψk) = µk, and ψt = ψ0 + tv with v = ψ1 − ψ0. Then,
⟨µ1 − µ0 | v⟩ = ⟨G(ψ1) − G(ψ0) | v⟩ = ∫0^1 ⟨DG(ψt)v | v⟩ dt.
a) Control of the eigengap: ⟨DG(ψt)v | v⟩ ≤ −C(X)‖v‖²L2(µt) if v has zero mean w.r.t. µt = G(ψt) → [Eymard, Gallouët, Herbin '00].
b) Control of µt: Brunn–Minkowski's inequality implies µt ≥ (1 − t)^d µ0.
Combining a) and b), and then applying Kantorovich–Rubinstein duality, we get
‖ψ1 − ψ0‖²L2(µ0) ≲ |⟨µ1 − µ0 | ψ1 − ψ0⟩| ≤ Lip(ψ1 − ψ0) W1(µ0, µ1) ≤ C W2(µ0, µ1).
◮ We lose a little in the exponent to pass from this bound on the potentials to the difference between the OT maps.
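The Kantorovich–Rubinstein step ⟨µ1 − µ0 | f⟩ ≤ Lip(f) W1(µ0, µ1) is easy to test numerically in dimension one, where W1 is the L1 distance between cumulative distribution functions. A small check (my own illustration; the grid and the test function f are arbitrary choices):

```python
# Verify <mu1 - mu0 | f> <= Lip(f) * W1(mu0, mu1) for discrete measures on a
# grid of [0, 1], using W1 = integral of |F0 - F1| (1D formula).
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = np.linspace(0, 1, n)                        # common support grid
mu0, mu1 = rng.random(n), rng.random(n)
mu0, mu1 = mu0 / mu0.sum(), mu1 / mu1.sum()     # two probability vectors

dx = x[1] - x[0]
f = np.cumsum(rng.uniform(-1, 1, n)) * dx       # a Lipschitz test function
lip = np.max(np.abs(np.diff(f))) / dx           # its (grid) Lipschitz constant

lhs = np.dot(mu1 - mu0, f)
w1 = np.sum(np.abs(np.cumsum(mu0 - mu1)[:-1])) * dx  # W1 = integral |F0 - F1|

assert lhs <= lip * w1 + 1e-12
```

The inequality is exact here: summation by parts turns ⟨µ1 − µ0 | f⟩ into a sum of increments of f weighted by F0 − F1, each bounded by Lip(f) times the local contribution to W1.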
21
MNIST has M = 60 000 grayscale images (64 × 64 pixels) representing digits.
Each image αℓ ∈ M64(R) is transformed into a probability measure on [0, 1]² via
µℓ = (1/∑i,j αℓ_{i,j}) ∑i,j αℓ_{i,j} δ(xi,xj), with xi = i/63.
Tℓ = Tµℓ ∈ L2([0, 1]², R²) [OT map from ρ = Leb[0,1]² to µℓ]
We run the K-means method on the transport maps, with K = 20. Each cluster Xk ⊆ {0, . . . , M} yields an average transport map Sk = (1/|Xk|) ∑ℓ∈Xk Tℓ, and Sk#ρ is the "reconstructed measure".
[Figure: the reconstructed measure S3#ρ.]
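A 1D caricature of this pipeline (my own sketch; the slide uses 2D semi-discrete OT computed with sd-ot, and the synthetic "images" and helper `quantile_map` are illustrative choices): each image becomes a probability measure, each measure becomes its OT map from Lebesgue — in 1D, the quantile function sampled on a grid — and K-means runs on the resulting vectors.

```python
# Embed 1D "images" into L^2([0,1]) via their quantile maps, then cluster.
import numpy as np

def quantile_map(weights, grid_pts=64):
    """Sampled quantile function of the measure sum_i w_i delta_{i/(n-1)}."""
    w = weights / weights.sum()
    cdf = np.cumsum(w)
    t = np.linspace(0, 1, grid_pts, endpoint=False) + 1e-9
    return np.searchsorted(cdf, t) / (len(w) - 1)   # values of T_mu on a grid

rng = np.random.default_rng(0)
# Two synthetic families of 1D "digits": bumps on the left vs on the right.
left = [np.exp(-((np.arange(32) - 8 - rng.normal()) ** 2) / 8) for _ in range(20)]
right = [np.exp(-((np.arange(32) - 24 - rng.normal()) ** 2) / 8) for _ in range(20)]
T = np.array([quantile_map(a) for a in left + right])   # the OT-map embedding

# Tiny K-means (K = 2) on the embedded maps, seeded with one map per family.
centers = T[[0, -1]].copy()
for _ in range(20):
    labels = np.argmin(((T[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([T[labels == k].mean(0) for k in (0, 1)])

# Each center is an "average map" S^k; its pushforward S^k_# rho plays the
# role of the reconstructed measure on the slide.
assert len(set(labels[:20])) == 1 and len(set(labels[20:])) == 1
assert labels[0] != labels[-1]
```

Because the embedding is Euclidean, the K-means barycenters are just averages of maps, which is exactly what makes the reconstruction step Sk#ρ well defined.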
22
Optimal transport can be used to embed Prob(Rd) into L2(ρ, Rd), with possible applications in data analysis. Computations can easily be performed using https://github.com/sd-ot
The analysis of this approach relies on the stability theory for µ → Tµ with respect to W2, which still has many open questions.