Convergence rates for discretized optimal transport - Quentin Mérigot - PowerPoint PPT Presentation



Convergence rates for discretized optimal transport

Quentin Mérigot, Université Paris-Sud 11

Based on joint work with F. Chazal and A. Delalande

Workshop on numerical solutions of HJB equations, Paris, January 2020

1. Motivations






Motivation 1: Monge-Kantorovich Quantiles

◮ Given µ ∈ Prob(R), there exists a unique nondecreasing Tµ ∈ L1([0, 1]) satisfying Tµ#ρ = µ, with ρ = Lebesgue measure on [0, 1].
NB: Tµ#ρ = µ ⟺ ∀B ⊆ R, ρ(Tµ⁻¹(B)) = µ(B) ⟺ ∀x ∈ R, ρ([0, Tµ⁻¹(x)]) = µ((−∞, x])

◮ Tµ is the inverse cdf, also called the quantile function. How can this notion be extended to a multivariate setting?

Theorem (Brenier, McCann). Given ρ ∈ Probac(Rd) and µ ∈ Prob(Rd), there exists a unique (ρ-a.e.) Tµ : Rd → Rd such that Tµ#ρ = µ and Tµ = ∇φ with φ convex.

◮ Monge-Kantorovich quantile := Tµ. This requires a reference probability density ρ. [Chernozhukov, Galichon, Hallin, Henry ’15]

◮ Tµ is unique ρ-a.e., but the convex function φµ is not necessarily unique.

◮ Tµ : spt(ρ) → Rd is monotone: ⟨Tµ(x) − Tµ(y) | x − y⟩ ≥ 0.
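In dimension one, Tµ can be written down directly from samples. The following sketch (illustrative, not from the slides; the helper name `quantile_map` is ours) builds the empirical quantile function and checks the pushforward property Tµ#ρ = µ numerically:

```python
import numpy as np

def quantile_map(samples):
    """Empirical quantile function T_mu: [0,1] -> R, the nondecreasing map
    pushing rho = Lebesgue on [0,1] onto the empirical measure mu."""
    sorted_samples = np.sort(samples)
    n = len(sorted_samples)

    def T(t):
        # generalized inverse of the empirical cdf
        idx = np.minimum((np.asarray(t) * n).astype(int), n - 1)
        return sorted_samples[idx]

    return T

rng = np.random.default_rng(0)
mu_samples = rng.normal(loc=2.0, scale=1.0, size=10_000)
T = quantile_map(mu_samples)

t = rng.uniform(size=10_000)   # samples from rho = Leb([0,1])
pushed = T(t)                  # samples from T#rho, distributed like mu
print(np.mean(pushed), np.mean(mu_samples))  # both close to 2.0
```

Monotonicity of T is automatic here, since sorting produces a nondecreasing array.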



Numerical Example: Monge-Kantorovich Depth

Source: ρ = uniform probability density on B(0, 1) ⊆ R2
Target: µ = (1/N) Σ_{1≤i≤N} δ_{yi} with N = 10⁴ points

“Monge-Kantorovich depth of yi” ≃ Tµ⁻¹(yi). [Chernozhukov, Galichon, Hallin, Henry]

(Figure: isovalues of the MK depth.)







Wasserstein space

◮ Let Prob_p(Rd) = {µ ∈ Prob(Rd) | ∫ ‖x‖^p dµ < +∞}. The p-Wasserstein distance between µ, ν ∈ Prob_p(Rd) is
W_p(µ, ν) = ( min_{γ∈Γ(µ,ν)} ∫ ‖x − y‖^p dγ(x, y) )^{1/p},
where Γ(µ, ν) = {couplings between µ and ν} ⊆ Prob(Rd × Rd).

◮ On Prob(X), with X ⊆ Rd compact, W_p metrizes narrow convergence, i.e.
lim_{n→+∞} W_p(µn, µ) = 0 ⟺ ∀φ ∈ C0(X), lim_{n→+∞} ∫ φ dµn = ∫ φ dµ.

◮ On Prob(R), any monotone coupling γ between µ and ν is optimal in the definition of W_p. For instance, γ := (Tµ, Tν)#ρ with ρ = Lebesgue on [0, 1] is monotone, implying
W_p(µ, ν) = ( ∫_{[0,1]} |Tµ(t) − Tν(t)|^p dt )^{1/p} = ‖Tµ − Tν‖_{L^p([0,1])}.

In particular, (Prob_p(R), W_p) embeds isometrically in L^p([0, 1])! This embedding fails in higher dimension: (Prob_p, W_p) is curved.
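The isometric embedding via quantile functions gives a direct way to compute W_p on the line. A minimal sketch (the function name `wasserstein_1d` is ours), assuming two empirical measures with the same number of equally weighted atoms:

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """W_p between (1/N) sum_i delta_{x_i} and (1/N) sum_i delta_{y_i}:
    the L^p([0,1]) distance between the two quantile functions, which
    here reduces to sorting both samples and comparing them pointwise."""
    xs, ys = np.sort(x), np.sort(y)
    return np.mean(np.abs(xs - ys) ** p) ** (1.0 / p)

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 3.0])
print(wasserstein_1d(x, y))  # translation by 1, so W_2 = 1.0
```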







Motivation 2: “Linearization” of W2

◮ We fix a reference measure ρ = Leb_X, with X ⊆ Rd convex compact and |X| = 1. Given µ ∈ Prob2(Rd), we define Tµ as the unique map satisfying (i) Tµ = ∇φµ a.e. for some convex function φµ : X → R and (ii) Tµ#ρ = µ.

◮ The map µ ∈ Prob2(Rd) → Tµ ∈ L2(X) is injective, with image the space of (square-integrable) gradients of convex functions on X. Set
W_{2,ρ}(µ, ν) := ‖Tµ − Tν‖_{L2(ρ)}. → [Ambrosio, Gigli, Savaré ’04]
→ Used in image analysis. [Wang, Slepcev, Basu, Ozolek, Rohde ’13]
→ Represents a family of probability measures by a family of functions in L2(ρ).

Analogy with Riemannian geometry:
- point x ∈ M ↔ µ ∈ Prob2(Rd)
- geodesic distance dg(x, y) ↔ W2(µ, ν)
- tangent space TρM ↔ TρProb2(Rd) ⊆ L2(ρ, X)
- inverse exponential map exp_ρ⁻¹(x) ∈ TρM ↔ Tµ ∈ TρProb2(X)
- distance in tangent space ‖exp_ρ⁻¹(x) − exp_ρ⁻¹(y)‖_{g(x0)} ↔ ‖Tµ − Tν‖_{L2(ρ)}









Example: barycenter computation

◮ Barycenter in Wasserstein space: given µ1, …, µk ∈ Prob2(Rd) and α1, …, αk ≥ 0,
µ := arg min_µ Σ_{1≤i≤k} αi W2²(µ, µi).
→ An optimisation problem must be solved every time the coefficients αi are changed.

◮ “Linearized” Wasserstein barycenters:
µ := ( (1/Σi αi) Σi αi Tµi )#ρ.
→ Simple expression once the transport maps Tµi : ρ → µi have been computed.

(Figure: spt(µ0), spt(µ1) and the interpolant (0.8 Tµ1 + 0.2 Tµ0)#ρ.)

What amount of the Wasserstein geometry is preserved by the embedding µ → Tµ?
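In 1D the transport maps are quantile functions, so the linearized barycenter can be sketched in a few lines (illustrative; `linearized_barycenter` is our name; in 1D this actually coincides with the true W2 barycenter, since averaging quantile functions preserves monotonicity):

```python
import numpy as np

def linearized_barycenter(samples_list, alphas):
    """Push rho = Leb([0,1]) through ((sum_i alpha_i T_mu_i) / sum_i alpha_i):
    in 1D each T_mu_i is the quantile function, i.e. the sorted samples."""
    alphas = np.asarray(alphas, dtype=float)
    alphas = alphas / alphas.sum()
    maps = [np.sort(np.asarray(s)) for s in samples_list]  # T_mu_i on a uniform grid
    return sum(a * T for a, T in zip(alphas, maps))        # atoms of the barycenter

mu0 = np.zeros(3)           # three atoms at 0
mu1 = 10.0 * np.ones(3)     # three atoms at 10
print(linearized_barycenter([mu0, mu1], [0.2, 0.8]))  # three atoms at 8.0
```

Once the maps are sorted, changing the weights αi only costs a weighted average, which is the point made above.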







Motivation 3: numerical analysis of optimal transport

Theorem (Brenier, McCann). Given ρ ∈ Probac(Rd) and µ ∈ Prob(Rd), there exists a unique (ρ-a.e.) Tµ : Rd → Rd such that Tµ#ρ = µ and Tµ = ∇φ with φ convex.

To solve numerically an OT problem between ρ ∈ Probac(Rd) and µ ∈ Prob([0, 1]d):

◮ Approximate µ by a discrete measure, for instance
µk = Σ_{1≤i1,…,id≤k} µ(B_{i1,…,id}) δ_{(i1/k,…,id/k)},
where B_{i1,…,id} is the cube [(i1 − 1)/k, i1/k] × ⋯ × [(id − 1)/k, id/k]. (Then, Wp(µk, µ) ≲ 1/k.)

◮ Compute exactly the optimal transport map Tµk between ρ and µk (using a semi-discrete optimal transport solver).

It is known that Tµk converges to Tµ, but convergence rates are unknown in general…

In general, the numerical analysis of optimal transport is virtually nonexistent, whatever the discretization method.
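The grid discretization above can be sketched as follows (2D, illustrative; `discretize_on_grid` is our name), binning the mass of an empirical measure on [0, 1]² into k² cubes:

```python
import numpy as np

def discretize_on_grid(points, k):
    """Return atoms (k^2, 2) and masses (k^2,) of mu_k: the mass of each
    cube B_{i1,i2} is placed on the single point (i1/k, i2/k)."""
    idx = np.minimum((points * k).astype(int), k - 1)   # cube index of each sample
    flat = idx[:, 0] * k + idx[:, 1]
    masses = np.bincount(flat, minlength=k * k) / len(points)
    ii, jj = np.meshgrid(np.arange(1, k + 1), np.arange(1, k + 1), indexing="ij")
    atoms = np.stack([ii.ravel() / k, jj.ravel() / k], axis=1)
    return atoms, masses

rng = np.random.default_rng(0)
pts = rng.uniform(size=(1000, 2))     # empirical mu on [0,1]^2
atoms, masses = discretize_on_grid(pts, k=8)
print(round(masses.sum(), 12))        # total mass is preserved: 1.0
```

Each sample moves by at most √2/k (the cube diagonal), which is the source of the Wp(µk, µ) ≲ 1/k bound.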


2. Continuity of µ → Tµ







Elementary remarks

◮ The map µ → Tµ is reverse-Lipschitz, i.e. ‖Tµ − Tν‖_{L2(ρ)} ≥ W2(µ, ν).
Indeed, since Tµ#ρ = µ and Tν#ρ = ν, one has γ := (Tµ, Tν)#ρ ∈ Γ(µ, ν). Thus,
W2²(µ, ν) ≤ ∫ ‖x − y‖² dγ(x, y) = ∫ ‖Tµ(x) − Tν(x)‖² dρ(x).

◮ The map µ → Tµ is continuous.

◮ The map µ → Tµ is not better than 1/2-Hölder.
Take ρ = (1/π) Leb_{B(0,1)} on R2, and define µθ = (δ_{xθ} + δ_{xθ+π})/2 with xθ = (cos θ, sin θ). Then
Tµθ(x) = xθ if ⟨xθ | x⟩ ≥ 0, and xθ+π otherwise,
so that ‖Tµθ − Tµθ+δ‖²_{L2(ρ)} ≥ Cδ. Since on the other hand W2(µθ, µθ+δ) ≤ Cδ, we get
‖Tµθ − Tµθ+δ‖_{L2(ρ)} ≥ C W2(µθ, µθ+δ)^{1/2}.
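The 1/2-Hölder example lends itself to a quick Monte-Carlo check (illustrative; `holder_gap` is our name): the L² gap between Tµ0 and Tµδ scales like √δ rather than δ.

```python
import numpy as np

def T(theta, x):
    """T_{mu_theta}: send x to x_theta if <x_theta | x> >= 0, else to x_{theta+pi}."""
    v = np.array([np.cos(theta), np.sin(theta)])
    sign = np.where(x @ v >= 0, 1.0, -1.0)
    return sign[:, None] * v          # note x_{theta+pi} = -x_theta

def holder_gap(delta, n=200_000, seed=0):
    """Monte-Carlo estimate of ||T_{mu_0} - T_{mu_delta}||_{L2(rho)},
    rho = uniform on the unit disc (rejection sampling from the square)."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-1, 1, size=(3 * n, 2))
    pts = pts[np.sum(pts ** 2, axis=1) <= 1][:n]
    d2 = np.mean(np.sum((T(0.0, pts) - T(delta, pts)) ** 2, axis=1))
    return np.sqrt(d2)

# Halving sqrt(delta) halves the gap: ~ 2*sqrt(delta/pi) for small delta.
print(holder_gap(0.04))   # ~ 0.23
print(holder_gap(0.01))   # ~ 0.11
```

The gap is dominated by the thin wedge of mass δ/π on which the two maps send points to nearly antipodal targets.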











Local 1/2-Hölder continuity

Thm: Assume ρ ∈ Probac(X) and µ, ν ∈ Prob(Y) with X, Y ⊆ Rd compact. If Tµ is L-Lipschitz, then ‖Tµ − Tν‖²_{L2(ρ)} ≤ C W1(µ, ν) with C = 4L diam(X).

◮ ≃ [Ambrosio, Gigli ’09], with a slightly better upper bound. See also [Berman ’18].
◮ No regularity assumption on ν → consequences in statistics and numerical analysis.

◮ Let φµ : X → R be convex with Tµ = ∇φµ, and let ψµ : Y → R be its Legendre transform: ψµ(y) = max_{x∈X} ⟨x | y⟩ − φµ(x).
(Tµ = ∇φµ is L-Lipschitz ⟺ ψµ = φµ* is (1/L)-strongly convex.)

Prop: If Tµ is L-Lipschitz, then ‖Tµ − Tν‖²_{L2(ρ)} ≤ −2L ∫ (ψµ − ψν) d(µ − ν).

◮ Prop ⟹ Thm: Kantorovich-Rubinstein theorem.

Proof of the Prop:
∫ ψν d(µ − ν) = ∫ ψν d(∇φµ#ρ − ∇φν#ρ) = ∫ ψν(∇φµ) − ψν(∇φν) dρ
(convexity: ψν(y) − ψν(x) ≥ ⟨y − x | ∇ψν(x)⟩)
≥ ∫ ⟨∇φµ − ∇φν | ∇ψν(∇φν)⟩ dρ = ∫ ⟨∇φµ − ∇φν | id⟩ dρ,
and symmetrically, using the strong convexity of ψµ,
∫ ψµ d(ν − µ) ≥ ∫ ⟨∇φν − ∇φµ | id⟩ dρ + (1/2L) ‖∇φµ − ∇φν‖²_{L2(ρ)}.
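Summing the two symmetric inequalities above yields the Prop; the concluding step, written out (our reconstruction of the omitted algebra):

```latex
\begin{align*}
\int \psi_\nu \, d(\mu-\nu)
  &\ge \int \langle \nabla\varphi_\mu - \nabla\varphi_\nu \mid \mathrm{id}\rangle \, d\rho,\\
\int \psi_\mu \, d(\nu-\mu)
  &\ge \int \langle \nabla\varphi_\nu - \nabla\varphi_\mu \mid \mathrm{id}\rangle \, d\rho
     + \tfrac{1}{2L}\,\|\nabla\varphi_\mu - \nabla\varphi_\nu\|_{L^2(\rho)}^2.
\end{align*}
\text{Adding them, the linear terms cancel:}
\[
  -\int (\psi_\mu - \psi_\nu)\, d(\mu-\nu)
  \;\ge\; \tfrac{1}{2L}\,\|T_\mu - T_\nu\|_{L^2(\rho)}^2 .
\]
```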





Global Hölder continuity

Thm (Berman ’18): Let ρ ∈ Probac(X) and µ, ν ∈ Prob(Y) with X, Y compact. Then
‖∇ψµ − ∇ψν‖²_{L2(Y)} ≤ C W1(µ, ν)^α with α = 1/2^{d−1}.

Corollary: ‖Tµ − Tν‖²_{L2(ρ)} ≤ C W1(µ, ν)^α with α = 1/(2^{d−1}(d + 2)).

◮ The Hölder exponent is terrible, but the inequality holds without any assumptions on µ, ν!
◮ The proof of Berman’s theorem relies on techniques from complex geometry.


2. Global, dimension-independent Hölder continuity of µ → Tµ





Main theorem

Thm (M., Delalande, Chazal ’19): Let X be convex compact with |X| = 1 and ρ = Leb_X, and let Y be compact. Then there exists C such that for all µ, ν ∈ Prob(Y),
‖Tµ − Tν‖_{L2(X)} ≤ C W2(µ, ν)^{1/5}.

◮ First global and dimension-independent stability result for optimal transport maps.
◮ Gap between the lower and upper bounds for the Hölder exponent: 1/5 < 1/2. The exponent 1/5 is certainly not optimal…
◮ The constant C depends polynomially on diam(X) and diam(Y).
◮ The proof relies on the semidiscrete setting, i.e. the bound is first established for µ = Σi µi δ_{yi}, ν = Σi νi δ_{yi}, and one concludes using a density argument.







Semidiscrete OT for c(x, y) = −x|y

= minφ⊕ψ≥·|·

  • φ d ρ +
  • ψ d µ

◮ Let µ =

1≤i≤N µiδyi and ψi = ψ(yi).

◮ Let ρ, ν ∈ Probac

1 (Rd) and Γ(ρ, µ) = couplings between ρ, µ,

T (ρ, µ) = maxγ∈Γ(ρ,µ)

  • x|y d γ(x, y)

= minψ

  • ψ∗ d ρ +
  • ψ d µ

Legendre-Fenchel transform: Then, ψ∗|Vi(ψ) := ·|yi − ψi where Vi(ψ) = {x | ∀j, x|yi − ψi ≥ x|yj − ψj} y1 y2 y3 V1(ψ) V3(ψ) V2(ψ) ψ∗(x) = maxyx|y − ψ(y) Thus, T (ρ, µ) = minψ∈RN

i

  • Vi(ψ)x|yi − ψi d ρ(x) +

i µiψi

Kantorovich duality
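The Laguerre-cell decomposition above is easy to experiment with numerically. Below is a minimal NumPy sketch (the instance, point count and sample size are hypothetical, not from the talk) that evaluates ψ* and its cells, and estimates the cell masses G_i(ψ) = ρ(V_i(ψ)) by Monte Carlo for ρ uniform on [0, 1]²:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: N = 5 support points y_i in R^2 and potentials psi_i.
Y = rng.random((5, 2))
psi = np.zeros(5)

def psi_star(x, Y, psi):
    """Legendre-Fenchel transform psi*(x) = max_i <x|y_i> - psi_i."""
    return np.max(x @ Y.T - psi, axis=-1)

def cell_index(x, Y, psi):
    """Index of the Laguerre cell V_i(psi) containing x (argmax of <x|y_i> - psi_i)."""
    return np.argmax(x @ Y.T - psi, axis=-1)

# Monte Carlo estimate of G_i(psi) = rho(V_i(psi)) for rho = uniform on [0, 1]^2.
X = rng.random((100_000, 2))
G = np.bincount(cell_index(X, Y, psi), minlength=len(psi)) / len(X)
print(G)  # estimated cell masses, summing to 1
```

With ψ = 0 the cells are ordinary "farthest-in-inner-product" regions; changing one ψ_i shrinks or grows the corresponding cell, which is exactly the mechanism the dual problem exploits.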

slide-72
SLIDES 72–82

16

Optimality condition and economic interpretation

T(ρ, µ) = min_{ψ∈R^N} Φ(ψ) + Σ_i µ_i ψ_i, where Φ(ψ) := Σ_i ∫_{V_i(ψ)} (⟨x|y_i⟩ − ψ_i) dρ(x)

◮ Gradient: ∇Φ(ψ) = −(G_i(ψ))_{1≤i≤N}, where G_i(ψ) = ρ(V_i(ψ)).

ψ ∈ R^N is a minimizer of the dual problem ⟺ ∀i, ρ(V_i(ψ)) = µ_i
⟺ G(ψ) = µ, with G = (G_1, ..., G_N), µ = (µ_1, ..., µ_N) ∈ R^N
⟺ T = ∇ψ* transports ρ onto Σ_i µ_i δ_{y_i}

◮ Economic interpretation: ρ = density of customers, {y_i}_{1≤i≤N} = product types
→ given prices ψ ∈ R^N, a customer x maximizes ⟨x|y_i⟩ − ψ_i over all products.
→ V_i(ψ) = {x | i ∈ argmax_j ⟨x|y_j⟩ − ψ_j} = customers choosing product y_i.
→ ρ(V_i) = amount of customers for product y_i.
Optimal transport = finding prices satisfying the capacity constraints ρ(V_i(ψ)) = µ_i.

◮ Algorithm (Oliker–Prussner): coordinate-wise increments. Complexity: O(N³).
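In dimension one the capacity constraints can be solved by hand, which makes the optimality condition concrete. The sketch below is my own illustration, not code from the talk; it assumes ρ = Lebesgue on [0, 1] and cost −xy, builds prices ψ whose cells are intervals with the prescribed masses, and checks G(ψ) = µ:

```python
import numpy as np

# 1D setting: for sorted y_1 < ... < y_N, the cell V_i(psi) is an interval whose
# right endpoint solves x*y_i - psi_i = x*y_{i+1} - psi_{i+1}, i.e.
#   t_i = (psi_{i+1} - psi_i) / (y_{i+1} - y_i).
# Prescribing the masses mu_i fixes the breakpoints t_i = mu_1 + ... + mu_i and
# hence psi up to an additive constant: this is the condition G(psi) = mu.

N = 6
y = np.sort(np.random.default_rng(1).random(N))
mu = np.full(N, 1.0 / N)
t = np.cumsum(mu)[:-1]                                   # interior breakpoints
psi = np.concatenate(([0.0], np.cumsum(t * np.diff(y))))  # prices, psi_1 = 0

def G(psi, y):
    """Exact cell masses G_i(psi) = rho(V_i(psi)) in this 1D setting."""
    t = (psi[1:] - psi[:-1]) / (y[1:] - y[:-1])
    edges = np.concatenate(([0.0], np.clip(t, 0.0, 1.0), [1.0]))
    return np.diff(edges)

print(G(psi, y))  # equals mu: these prices meet the capacity constraints
```

In the economic reading: raising the price ψ_i moves the breakpoints toward y_i and sends customers to the neighbouring products, which is the lever the Oliker–Prussner increments operate on.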

slide-83
SLIDES 83–88

17

Hessian of Φ and Newton's algorithm

(Recall that G_i(ψ) = ρ(V_i(ψ)) and ∇Φ = −(G_1, ..., G_N).)

Proposition:
◮ If ρ ∈ C⁰(X) and (y_i)_{1≤i≤N} is generic, then Φ ∈ C²(R^N) and

∀i ≠ j, ∂G_i/∂ψ_j(ψ) = (1/‖y_i − y_j‖) ∫_{Γ_ij(ψ)} ρ(x) dx, where Γ_ij(ψ) = V_i(ψ) ∩ V_j(ψ),
∀i, ∂G_i/∂ψ_i(ψ) = −Σ_{j≠i} ∂G_i/∂ψ_j(ψ).

◮ Let E = {ψ ∈ R^N | ∀i, G_i(ψ) > 0}. If Ω = {ρ > 0} is connected and ψ ∈ E, then Ker D²Φ(ψ) = R(1, ..., 1).

[Figure: cells of y_1, ..., y_5 with interface Γ_15(ψ)]

◮ Consider the matrix L = DG(ψ) and the graph H: (i, j) ∈ H ⟺ L_ij > 0.
◮ If Ω is connected and ψ ∈ E, then H is connected.
◮ L is the Laplacian of a connected graph ⟹ Ker L = R · cst.

Corollary: Global convergence of a damped Newton algorithm.

[Kitagawa, M., Thibert '16]
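The kernel computation behind the last two bullets can be checked directly: a weighted Laplacian of a connected graph annihilates exactly the constant vectors. A small sketch with a hypothetical 5-node path graph, unit edge weights standing in for the ρ-masses of the interfaces Γ_ij:

```python
import numpy as np

# Build the (standard-sign) Laplacian of a connected path graph on 5 nodes;
# up to sign this is the structure of DG(psi) = -D^2(Phi)(psi).
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0   # edge weights > 0 <=> (i, j) in H
L = np.diag(W.sum(axis=1)) - W        # graph Laplacian: rows sum to zero

eigvals, eigvecs = np.linalg.eigh(L)
print(eigvals)
# eigvals[0] = 0 with constant eigenvector; eigvals[1] > 0 (spectral gap)
# because the graph is connected, so Ker L = R * (1, ..., 1).
```

The spectral gap `eigvals[1]` is what makes the (restricted) Hessian invertible and the Newton step well defined once all cells carry positive mass.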

slide-89
SLIDES 89–92

18

Numerical example

Source: ρ = uniform on [0, 1]². Target: µ = (1/N) Σ_{1≤i≤N} δ_{y_i}, with y_i uniform i.i.d. in [0, 1/3]².

ψ⁰ = (1/2)‖·‖², ψ¹ = Newt(ψ⁰), ψ² = Newt(ψ¹). NB: The points do not move.

Convergence is very fast when spt(ρ) is convex: 17 Newton iterations for N ≥ 10⁷ in 3D.
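A damped Newton iteration in the spirit of [Kitagawa, M., Thibert '16] fits in a few lines in the 1D toy setting (ρ = Lebesgue on [0, 1], cost −xy), where the cells are intervals and G and DG have closed forms. This is my own sketch, not the solver used for the experiments; the damping halves the step until every cell keeps positive mass:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 8
y = np.sort(rng.random(N))
mu = rng.random(N); mu /= mu.sum()           # target masses mu_i > 0

def G(psi):
    """Cell masses: breakpoints t_i = (psi_{i+1}-psi_i)/(y_{i+1}-y_i), clipped to [0,1]."""
    t = np.clip((psi[1:] - psi[:-1]) / np.diff(y), 0.0, 1.0)
    return np.diff(np.concatenate(([0.0], t, [1.0])))

def DG(psi):
    """Jacobian of G: a (negative) weighted graph Laplacian on the chain of cells."""
    t = (psi[1:] - psi[:-1]) / np.diff(y)
    w = np.where((t > 0) & (t < 1), 1.0 / np.diff(y), 0.0)
    H = np.zeros((N, N))
    for i in range(N - 1):
        H[i, i + 1] = H[i + 1, i] = w[i]
        H[i, i] -= w[i]; H[i + 1, i + 1] -= w[i]
    return H

psi = 0.5 * y**2                             # strictly feasible start: all cells nonempty
for _ in range(50):
    r = mu - G(psi)
    if np.abs(r).max() < 1e-12:
        break
    H = DG(psi)[1:, 1:]                      # pin psi_1 = 0 to remove the constant kernel
    step = np.zeros(N)
    step[1:] = np.linalg.solve(H, r[1:])
    tau = 1.0
    while G(psi + tau * step).min() <= 0:    # damping: keep every cell mass positive
        tau /= 2
    psi = psi + tau * step
print(np.abs(mu - G(psi)).max())             # residual of the capacity constraints
```

As on the slide, the target points never move during the iteration; only the prices ψ (and hence the cell boundaries) do.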

slide-93
SLIDES 93–102

19

Proof ingredients

Thm (M., Delalande, Chazal '19): Let X be convex and compact with |X| = 1, let ρ = Leb_X, and let Y be compact. Then there exists C s.t. for all µ, ν ∈ Prob(Y),

‖T_µ − T_ν‖_{L²(X)} ≤ C W₂(µ, ν)^{1/5}.

◮ Strategy of proof: let µᵏ = Σ_i µᵏ_i δ_{y_i} for k ∈ {0, 1}, and assume all µᵏ_i > 0.
Consider ψᵏ ∈ R^Y s.t. G(ψᵏ) = µᵏ, and ψ_t = ψ⁰ + tv with v = ψ¹ − ψ⁰. Then,

⟨µ¹ − µ⁰|v⟩ = ⟨G(ψ¹) − G(ψ⁰)|v⟩ = ∫₀¹ ⟨DG(ψ_t)v|v⟩ dt

a) Control of the eigengap: ⟨DG(ψ_t)v|v⟩ ≤ −C(X)‖v‖²_{L²(µ_t)} if ∫ v dµ_t = 0, with µ_t = G(ψ_t) → [Eymard, Gallouët, Herbin '00].

b) Control of µ_t: the Brunn–Minkowski inequality implies µ_t ≥ (1 − t)^d µ⁰.

Combining a) and b), we get ‖ψ¹ − ψ⁰‖²_{L²(µ⁰)} ≲ |⟨µ¹ − µ⁰|ψ¹ − ψ⁰⟩|. Then, by Kantorovich–Rubinstein,

|⟨µ¹ − µ⁰|ψ¹ − ψ⁰⟩| ≤ Lip(ψ¹ − ψ⁰) W₁(µ⁰, µ¹) ≲ W₂(µ⁰, µ¹)

◮ We lose a little in the exponent to control the difference between the OT maps themselves.
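For intuition on the exponent: in dimension one no exponent is lost at all, since with ρ = Leb on [0, 1] the map T_µ is the quantile function and ‖T_µ − T_ν‖_{L²} = W₂(µ, ν) exactly. A small sketch of this classical 1D identity (my own illustration, not part of the talk), for empirical measures with n atoms of mass 1/n:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
a = np.sort(rng.normal(size=n))          # atoms of mu
b = np.sort(rng.normal(size=n) + 1.0)    # atoms of nu

def quantile_map(atoms, u):
    """T_mu(u) for the empirical measure of `atoms`: piecewise-constant quantile."""
    return atoms[np.minimum((u * len(atoms)).astype(int), len(atoms) - 1)]

u = (np.arange(4 * n) + 0.5) / (4 * n)   # midpoint grid on [0, 1]
L2 = np.sqrt(np.mean((quantile_map(a, u) - quantile_map(b, u)) ** 2))
W2 = np.sqrt(np.mean((a - b) ** 2))      # optimal coupling pairs sorted atoms
print(L2, W2)                            # the two quantities coincide
```

The exponent 1/5 (and the loss discussed above) is thus a genuinely higher-dimensional phenomenon.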

slide-103
SLIDE 103

20

A toy application

slide-104
SLIDES 104–108

21

Example: k-Means for MNIST digits

MNIST has M = 60 000 grayscale images (64 × 64 pixels) representing digits.

Each image αℓ ∈ M_64(R) is transformed into a probability measure on [0, 1]² via

µℓ = (1 / Σ_{i,j} αℓ_{ij}) Σ_{i,j} αℓ_{ij} δ_{(x_i, x_j)}, with x_i = i/63.

Tℓ = T_{µℓ} ∈ L²([0, 1]², R²)  [OT map from ρ = Leb_{[0,1]²} to µℓ]

We run the K-Means method on the transport maps, with K = 20. Each cluster X_k ⊆ {1, ..., M} yields an average transport map

S_k = (1/|X_k|) Σ_{ℓ∈X_k} Tℓ,

and S_k#ρ is the "reconstructed measure".

[Figure: reconstructed measure S_3#ρ]
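The clustering step itself is ordinary k-means in L²(ρ, R²) once each map Tℓ is discretized on a grid and flattened to a vector. A dependency-free sketch with Lloyd iterations; the synthetic vectors below merely stand in for the discretized maps, and this is not the talk's actual pipeline code:

```python
import numpy as np

rng = np.random.default_rng(4)
M, D, K = 300, 50, 4                         # hypothetical sizes (talk uses K = 20)
# Stand-ins for the flattened maps T_l: K well-separated synthetic groups.
T = np.concatenate([rng.normal(loc=c, scale=0.1, size=(M // K, D))
                    for c in range(K)])

centroids = T[rng.choice(M, K, replace=False)]
for _ in range(20):                          # Lloyd iterations
    d2 = ((T[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    labels = d2.argmin(axis=1)               # nearest-centroid assignment
    centroids = np.array([T[labels == k].mean(axis=0) if np.any(labels == k)
                          else centroids[k] for k in range(K)])

# S_k = average transport map of cluster k; pushing rho forward by S_k gives
# the "reconstructed measure" of the slide.
S = centroids
print(S.shape, np.bincount(labels, minlength=K))
```

Because the embedding µℓ ↦ Tℓ is into a Hilbert space, averaging cluster members is well defined, which is precisely what makes plain k-means applicable here.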

slide-109
SLIDES 109–111

22

Summary

Optimal transport can be used to embed Prob(R^d) into L²(ρ, R^d), with possible applications in data analysis. Computations can easily be performed using https://github.com/sd-ot

The analysis of this approach relies on the stability theory for µ ↦ T_µ with respect to W₂, where many questions remain open.

Thank you for your attention!