

SLIDE 1

Stochastic algorithm for optimal transport

Statistical aspects of stochastic algorithms for entropic optimal transportation between probability measures

Jérémie Bigot

Institut de Mathématiques de Bordeaux, Equipe Image, Optimisation et Probabilités (IOP)

Université de Bordeaux

Joint work with Bernard Bercu (IMB, Bordeaux)

Statistical modeling for shapes and imaging, The Mathematics of Imaging, IHP, March 2019

SLIDE 2

Stochastic algorithm for optimal transport
Motivations from a resource allocation problem

1. Motivations from a resource allocation problem
2. Wasserstein optimal transport
3. Regularized optimal transport and stochastic optimisation
4. Data-driven choice of the regularization parameter?

SLIDE 3

An example of a resource allocation problem

Data at hand (1):
- locations of Police stations in Chicago
- spatial locations of reported incidents of crime (with the exception of murders) in Chicago in 2014

Questions (of interest?):
- given the location of a crime, which Police station should intervene?
- how to update the answer in an "online fashion" along the year?

1. Open Data from Chicago: https://data.cityofchicago.org
SLIDE 4

An example of a resource allocation problem

Locations y1, . . . , yJ of Police stations in Chicago

SLIDE 5

An example of a resource allocation problem

Spatial location X1 of the first reported incident of crime in Chicago in the year 2014

SLIDE 6

An example of a resource allocation problem

Spatial locations X1, X2 of reported incidents of crime in Chicago in chronological order

SLIDE 7

An example of a resource allocation problem

Spatial locations X1, X2, X3 of reported incidents of crime in Chicago in chronological order

SLIDE 8

An example of a resource allocation problem

Spatial locations X1, . . . , X4 of reported incidents of crime in Chicago in chronological order

SLIDE 9

An example of a resource allocation problem

Spatial locations X1, . . . , X5 of reported incidents of crime in Chicago in chronological order

SLIDE 10

An example of a resource allocation problem

Spatial locations of reported incidents of crime in Chicago in chronological order (first 100)

SLIDE 11

An example of a resource allocation problem

Spatial locations of reported incidents of crime in Chicago in chronological order (first 1000)

SLIDE 12

An example of a resource allocation problem

Spatial locations X1, . . . , XN of reported incidents of crime in Chicago in chronological order (total N = 16104)

SLIDE 13

An example of a resource allocation problem

Heat map (kernel density estimation) of spatial locations of reported incidents of crime in Chicago in 2014

SLIDE 14

Stochastic algorithm for optimal transport
Wasserstein optimal transport

1. Motivations from a resource allocation problem
2. Wasserstein optimal transport
3. Regularized optimal transport and stochastic optimisation
4. Data-driven choice of the regularization parameter?

SLIDE 15

Statistical approach to resource allocation

Modeling assumptions:
- spatial locations of reported incidents of crime: a sequence of iid random variables X1, . . . , Xn sampled from an unknown probability measure µ with support X ⊂ R²
- locations of Police stations: a known discrete probability measure ν = ∑_{j=1}^J νj δ_{yj}, where
  - yj ∈ R² represents the spatial location of the j-th Police station
  - νj is a positive weight representing the "capacity" of each Police station (we took νj = 1/J, that is, uniform weights)

SLIDE 16

Statistical approach to resource allocation

Point of view in this talk: resource allocation can be solved by finding an optimal transport map T : X → {y1, . . . , yJ} which pushes forward µ onto ν = ∑_{j=1}^J νj δ_{yj} (notation: T#µ = ν), with respect to a given cost function, e.g. a distance on X:

    c(x, y) = ‖x − y‖_{ℓp} = ( ∑_{k=1}^d |xk − yk|^p )^{1/p},   x, y ∈ R^d (here d = 2)

Question: how to do on-line estimation of such a map using the observations X1, . . . , Xn ∼ iid µ?
SLIDE 17

Optimal transport between probability measures

Let T : X → {y1, . . . , yJ} be such that T#µ = ν. Let Π(µ, ν) be the set of probability measures on X × X with marginals µ and ν.

Definition. The optimal transport problem between µ and ν is

    W0(µ, ν) = min_{T : T#µ=ν} ∫_X c(x, T(x)) dµ(x)   (Monge's formulation)

or

    W0(µ, ν) = min_{π ∈ Π(µ,ν)} ∫_{X×X} c(x, y) dπ(x, y)   (Kantorovich's formulation)

where c(x, y) is the cost of moving mass from x to y.

SLIDE 18

An example of semi-discrete optimal transport

Optimal transport of an absolutely continuous measure µ onto a discrete measure ν (black dots)

SLIDE 19

An example of semi-discrete optimal transport

Optimal transport of µ onto the discrete measure ν (black dots) - Optimal map T for the Euclidean cost c(x, y) = ‖x − y‖_{ℓ2}

SLIDE 20

Semi-discrete optimal transport

Uniqueness of an optimal map T : supp(µ) → {y1, . . . , yJ} such that T#µ = ν, given, for all 1 ≤ j ≤ J, by (1)

    T⁻¹(yj) = { x ∈ supp(µ) : c(x, yj) − v*_{j,0} ≤ c(x, yk) − v*_{k,0} for all 1 ≤ k ≤ J }

where v*_0 ∈ R^J is any maximizer of the un-regularized semi-dual problem of the Kantorovich formulation of OT. The sets T⁻¹(yj) are the so-called Laguerre cells (an important concept from computational geometry).

1. Mérigot (2018), Cuturi and Peyré (2017)
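A pure-Python sketch of evaluating this map (the helper names are hypothetical, not from the talk): a point x belongs to the Laguerre cell of the site minimizing c(x, yj) − vj.

```python
import math

def laguerre_cell(x, sites, v, cost=math.dist):
    """Index j of the Laguerre cell containing x: argmin_j c(x, y_j) - v_j."""
    return min(range(len(sites)), key=lambda j: cost(x, sites[j]) - v[j])

sites = [(0.0, 0.0), (1.0, 0.0)]
# With equal dual weights, the cells are ordinary Voronoi cells.
j = laguerre_cell((0.2, 0.0), sites, [0.0, 0.0])   # nearest site wins
```

Raising one weight vj enlarges the j-th cell, which is exactly how the dual variables encode the capacities νj.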

SLIDE 21

Stochastic algorithm for optimal transport
Regularized optimal transport and stochastic optimisation

1. Motivations from a resource allocation problem
2. Wasserstein optimal transport
3. Regularized optimal transport and stochastic optimisation
4. Data-driven choice of the regularization parameter?

SLIDE 22

Optimal transport between probability measures

Problem: computational cost of optimal transport for data analysis (1)

Case of discrete measures: if µ = ∑_{i=1}^K µi δ_{xi} and ν = ∑_{j=1}^K νj δ_{yj}, then the cost of evaluating W0(µ, ν) (a linear program) is generally O(K³ log K).

1. See the recent book by Cuturi & Peyré (2018)
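To make the combinatorial size concrete: for uniform measures with the same number K of atoms, W0 reduces to an assignment problem, which for tiny K can even be brute-forced over all K! transport maps (a hypothetical helper for illustration only; practical solvers use the linear program instead).

```python
import itertools

def w0_uniform(xs, ys, cost):
    """W_0 between uniform measures on xs and ys (len(xs) == len(ys)),
    by exhaustive search over the K! one-to-one transport maps."""
    K = len(xs)
    return min(sum(cost(xs[i], ys[p[i]]) for i in range(K)) / K
               for p in itertools.permutations(range(K)))

# Three atoms each, shifted by 0.1: the optimal map is the identity matching.
w = w0_uniform([0.0, 1.0, 2.0], [0.1, 1.1, 2.1], lambda x, y: abs(x - y))
```

The K! enumeration explodes almost immediately, which is why the O(K³ log K) LP, and the regularized schemes below, matter.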

SLIDE 23

Regularized optimal transport

Definition (Cuturi (2013)). Let µ and ν be any probability measures supported on X. Then, the regularized optimal transport problem between µ and ν is

    Wε(µ, ν) = min_{π ∈ Π(µ,ν)} ∫_{X×X} c(x, y) dπ(x, y) + ε KL(π | µ ⊗ ν),

where ε > 0 is the regularization parameter and

    KL(π | ξ) = ∫_{X×X} ( log( dπ/dξ (x, y) ) − 1 ) dπ(x, y),   with ξ = µ ⊗ ν.

Case of discrete measures: for ε > 0, the Sinkhorn algorithm (an iterative scheme) computes Wε(µ, ν) at a computational cost of O(K²) per iteration.
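A minimal pure-Python Sinkhorn sweep, as a sketch of the iterative scheme just mentioned (toy inputs, not a reference implementation); each iteration indeed touches all K² kernel entries.

```python
import math

def sinkhorn(mu, nu, C, eps, n_iter=200):
    """Entropic OT plan between discrete mu and nu for cost matrix C.

    Each sweep rescales u and v to match the marginals, at O(K^2) cost."""
    K = [[math.exp(-c / eps) for c in row] for row in C]   # Gibbs kernel
    u = [1.0] * len(mu)
    v = [1.0] * len(nu)
    for _ in range(n_iter):
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(len(nu)))
             for i in range(len(mu))]
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(len(mu)))
             for j in range(len(nu))]
    # Transport plan pi_ij = u_i * K_ij * v_j
    return [[u[i] * K[i][j] * v[j] for j in range(len(nu))]
            for i in range(len(mu))]

pi = sinkhorn([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]], eps=0.1)
# Rows and columns of pi sum to the prescribed marginals; mass stays
# concentrated on the cheap diagonal for small eps.
```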

SLIDE 24

Stochastic optimal transport

Proposition (Genevay, Cuturi, Peyré and Bach (2016)). Let µ be any probability measure and ν = ∑_{j=1}^J νj δ_{yj}. For ε > 0, solve the smooth concave maximization problem

    Wε(µ, ν) = max_{v ∈ R^J} Hε(v),   where   Hε(v) := E[hε(X, v)]   (stochastic optimization)

where X is a random variable with distribution µ, and, for x ∈ X and v ∈ R^J,

    hε(x, v) = ∑_{j=1}^J vj νj − ε log( ∑_{j=1}^J exp( (vj − c(x, yj)) / ε ) νj ) − ε.
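In code, hε is a regularized soft-minimum; this small pure-Python transcription (an illustration, with made-up inputs) mirrors the formula above.

```python
import math

def h_eps(x, v, nu, sites, cost, eps):
    """h_eps(x,v) = sum_j v_j*nu_j - eps*log(sum_j nu_j*exp((v_j-c(x,y_j))/eps)) - eps."""
    lin = sum(vj * nj for vj, nj in zip(v, nu))
    soft = sum(nj * math.exp((vj - cost(x, y)) / eps)
               for vj, nj, y in zip(v, nu, sites))
    return lin - eps * math.log(soft) - eps

# Sanity check: with a single site (J = 1), nu = (1) and v = (0), the formula
# collapses to h_eps(x, v) = c(x, y_1) - eps.
val = h_eps(0.0, [0.0], [1.0], [0.7], lambda x, y: abs(x - y), eps=0.5)
# val ~= 0.7 - 0.5 = 0.2
```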
SLIDE 25

Stochastic algorithm (1)

For fixed ε > 0, a Robbins-Monro algorithm to compute a maximizer

    v* := v*_ε ∈ arg max_{v ∈ R^J} E[hε(X, v)].

Let X1, . . . , Xn ∼ iid µ, choose V0 ∈ R^J and a sequence (γn) of steps with ∑_{n≥1} γn = +∞ and ∑_{n≥1} γn² < +∞, and iterate

    Vn+1 = Vn + γn+1 ∇v hε(Xn+1, Vn).

Easy computation of gradients for ε > 0 (smooth optimization):

    ∇v hε(x, v) = ν − π(x, v),   where π(x, v) ∈ R^J with

    πj(x, v) = ( ∑_{k=1}^J νk exp((vk − c(x, yk))/ε) )⁻¹ νj exp((vj − c(x, yj))/ε).

1. Genevay, Cuturi, Peyré and Bach (2016), Galerne, Leclaire, Rabin (2018)
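As a quick illustration (not the authors' code), here is a pure-Python run of this recursion on a toy one-dimensional example; the measure µ, the sites, ε and γ below are all made up for the sketch.

```python
import math
import random

def pi_weights(x, v, nu, sites, cost, eps):
    """pi_j(x, v) from the gradient formula above (a softmax over sites)."""
    w = [nj * math.exp((vj - cost(x, y)) / eps)
         for vj, nj, y in zip(v, nu, sites)]
    s = sum(w)
    return [wi / s for wi in w]

def robbins_monro(samples, nu, sites, cost, eps, gamma):
    """Iterate V_{n} = V_{n-1} + (gamma / n) * (nu - pi(X_n, V_{n-1}))."""
    v = [0.0] * len(sites)
    for n, x in enumerate(samples, start=1):
        p = pi_weights(x, v, nu, sites, cost, eps)
        v = [vj + (gamma / n) * (nj - pj) for vj, nj, pj in zip(v, nu, p)]
    return v

random.seed(0)
samples = [random.random() for _ in range(2000)]   # toy stand-in for mu
sites, nu = [0.25, 0.75], [0.5, 0.5]               # two symmetric "stations"
v_hat = robbins_monro(samples, nu, sites, lambda x, y: abs(x - y),
                      eps=0.1, gamma=1.0)
# Each update nu - pi sums to zero, so sum(v_hat) stays at 0; by symmetry of
# this toy setup the two dual weights should also be nearly equal.
```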

SLIDE 26

Contribution in our work (1)

Main results on the sequence Vn: assume that the step is γn = γ/n, where γ > 0 satisfies γ > 1/(2ρ*) and ρ* denotes the (second) smallest eigenvalue of the Hessian matrix −∇²Hε(v) at v = v*, or that γn = γ/n^c with γ > 0 and 1/2 < c < 1.

Proposition. Then limn→∞ Vn = v* almost surely, and one has the asymptotic normality of √(n^c) (Vn − v*) as n → +∞.

1. Bercu, B. & Bigot, J. (2018) arXiv:1812.09150
SLIDE 27

Contribution in our work (1)

Main results on the sequence Vn: assume that the step is γn = γ/n, where γ > 0 satisfies γ > 1/(2ρ*) and ρ* denotes the (second) smallest eigenvalue of the Hessian matrix −∇²Hε(v) at v = v*, or that γn = γ/n^c with γ > 0 and 1/2 < c < 1.

Interestingly, one has that

    −∇²Hε(v*) = (1/ε) ( diag(ν) − E[ π(X, v*) π(X, v*)ᵀ ] ),

which is not far from the covariance matrix of a multinomial distribution, implying that

    (1/ε) min_{1≤j≤J} νj ≤ ρ* ≤ 1/ε,   hence we took γ = ε / (2 min_{1≤j≤J} νj).

1. Bercu, B. & Bigot, J. (2018) arXiv:1812.09150
SLIDE 28

Contribution in our work (1)

Main goal: estimation of the Wasserstein functional Wε(µ, ν) based on X1, . . . , Xn ∼ iid µ, assuming that ν is known.

A simple recursive estimator:

    Ŵn = (1/n) ∑_{k=1}^n hε(Xk, Vk−1).

Main results: a.s. convergence of Ŵn, and asymptotic normality under the same conditions on γn:

    √n ( Ŵn − Wε(µ, ν) ) →L N( 0, σ²ε(µ, ν) ),

where the asymptotic variance σ²ε(µ, ν) can also be estimated in a recursive manner:

    σ̂²n = (1/n) ∑_{k=1}^n h²ε(Xk, Vk−1) − Ŵ²n.

1. Bercu, B. & Bigot, J. (2018) arXiv:1812.09150
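Both recursions fit in a single pass over the data. The sketch below (toy inputs, hypothetical helper names, nothing reused from the actual experiments) maintains Vn, Ŵn and σ̂²n together.

```python
import math
import random

def h_eps(x, v, nu, sites, cost, eps):
    lin = sum(vj * nj for vj, nj in zip(v, nu))
    soft = sum(nj * math.exp((vj - cost(x, y)) / eps)
               for vj, nj, y in zip(v, nu, sites))
    return lin - eps * math.log(soft) - eps

def grad_h(x, v, nu, sites, cost, eps):
    w = [nj * math.exp((vj - cost(x, y)) / eps)
         for vj, nj, y in zip(v, nu, sites)]
    s = sum(w)
    return [nj - wj / s for nj, wj in zip(nu, w)]

def recursive_w(samples, nu, sites, cost, eps, gamma=1.0):
    """One pass: update V_n and accumulate W_n and sigma_n^2."""
    v = [0.0] * len(sites)
    s1 = s2 = 0.0
    for n, x in enumerate(samples, start=1):
        h = h_eps(x, v, nu, sites, cost, eps)     # uses V_{n-1}
        s1 += h
        s2 += h * h
        g = grad_h(x, v, nu, sites, cost, eps)
        v = [vj + (gamma / n) * gj for vj, gj in zip(v, g)]
    w_n = s1 / len(samples)
    sigma2_n = s2 / len(samples) - w_n ** 2
    return w_n, sigma2_n

random.seed(1)
xs = [random.random() for _ in range(4000)]       # toy mu = U[0, 1]
w_n, sigma2_n = recursive_w(xs, [1.0], [0.5], lambda x, y: abs(x - y), eps=0.1)
# With a single site the gradient vanishes and h_eps(x, 0) = |x - 0.5| - eps,
# so W_n estimates E|X - 0.5| - 0.1 = 0.15 and sigma2_n estimates Var|X - 0.5|.
```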
SLIDE 29

Contribution in our work

Main goal: estimation of the Wasserstein functional Wε(µ, ν) based on X1, . . . , Xn ∼ iid µ, assuming that ν is known.

A simple recursive estimator:

    Ŵn = (1/n) ∑_{k=1}^n hε(Xk, Vk−1).

Rate of convergence of the expected excess risk:

    Rn = Hε(v*) − E[Ŵn] = (1/n) ∑_{k=1}^n ( Hε(v*) − E[Hε(Vk−1)] ).

Here, Hε is not strongly concave, but it satisfies a generalized self-concordance property (1) allowing convergence of Rn faster than 1/√n for γn = γ/n^c with γ > 0 and 3/4 < c < 1.

1. Bach (2014)
SLIDE 30

Contribution in our work (1)

Proposition (Generalized self-concordance). For any v ∈ R^J, we have

    ‖∇Hε(v) − ∇²Hε(v*)(v − v*)‖ ≤ (1/(ε²√2)) ‖v − v*‖².

Moreover, assume that ‖v − v*‖ ≤ A for some A > 0. Then,

    ⟨∇Hε(v), v − v*⟩ ≤ −ρ* g(√2 A/ε) ‖v − v*‖²   if A ≤ 1,
    ⟨∇Hε(v), v − v*⟩ ≤ −(ρ*/A) g(√2/ε) ‖v − v*‖²   if A ≥ 1,

where g(η) := (1 − exp(−η))/η ≥ exp(−η).

1. Bercu, B. & Bigot, J. (2018) arXiv:1812.09150
SLIDE 31

Numerical experiments - Simulated data

Optimal transport of an absolutely continuous measure µ onto a discrete measure ν (black dots)

SLIDE 32

Numerical experiments - Simulated data

Samples X1, . . . , XN ∼ iid µ (with N = 20000) and discrete measure ν (black dots)

SLIDE 33

Numerical experiments - Simulated data

Convergence of the algorithm using the quadratic cost c(x, y) = ‖x − y‖²_{ℓ2}

SLIDE 34

Numerical experiments - Simulated data

Estimated optimal map T̂ε,N for the quadratic cost c(x, y) = ‖x − y‖²_{ℓ2} after N = 20000 iterations and ε = 0.005

SLIDE 35

Numerical experiments - Laguerre cells estimation

Estimation of the Laguerre cells after n iterations:

    T̂⁻¹ε,n(yj) = { x ∈ supp(µ) : c(x, yj) − Vn,j ≤ c(x, yk) − Vn,k for all 1 ≤ k ≤ J },

where Vn,j denotes the j-th entry of the vector Vn, considered as an estimator of a maximizer of the un-regularized semi-dual problem

    v*0 ∈ arg max_{v ∈ R^J} E[h0(X, v)],

where v ↦ h0(x, v) is not differentiable!

Question (of interest?): how to estimate the true Laguerre cells

    T⁻¹(yj) = { x ∈ supp(µ) : c(x, yj) − v*_{j,0} ≤ c(x, yk) − v*_{k,0} for all 1 ≤ k ≤ J }

by letting ε = εn → 0?
SLIDE 36

Numerical experiments - Real data

Spatial locations X1, . . . , XN of reported incidents of crime in Chicago in chronological order (total N = 16104)

SLIDE 37

Numerical experiments - Real data

Convergence of the algorithm using the Euclidean cost c(x, y) = ‖x − y‖_{ℓ2}

SLIDE 38

Numerical experiments - Real data

Estimated optimal map T̂ε,n for the Euclidean cost c(x, y) = ‖x − y‖_{ℓ2} after n = 100 iterations and ε = 0.005

SLIDE 39

Numerical experiments - Real data

Estimated optimal map T̂ε,n for the Euclidean cost c(x, y) = ‖x − y‖_{ℓ2} after n = 1000 iterations and ε = 0.005

SLIDE 40

Numerical experiments - Real data

Estimated optimal map T̂ε,n for the Euclidean cost c(x, y) = ‖x − y‖_{ℓ2} after n = 2000 iterations and ε = 0.005

SLIDE 41

Numerical experiments - Real data

Estimated optimal map T̂ε,n for the Euclidean cost c(x, y) = ‖x − y‖_{ℓ2} after n = 3000 iterations and ε = 0.005

SLIDE 42

Numerical experiments - Real data

Estimated optimal map T̂ε,N for the Euclidean cost c(x, y) = ‖x − y‖_{ℓ2} after N = 16104 iterations and ε = 0.005

SLIDE 43

Numerical experiments - Real data

Estimated optimal map T̂ε,N for the ℓ1 cost c(x, y) = ‖x − y‖_{ℓ1} after N = 16104 iterations and ε = 0.005

SLIDE 44

Numerical experiments - Real data

Estimated optimal map T̂ε,N: Euclidean versus ℓ1 cost with ε = 0.005

SLIDE 45

Numerical experiments - Real data

Estimated optimal map T̂ε,N: Euclidean versus ℓ1 cost with ε = 0.01

SLIDE 46

Numerical experiments - Real data

Estimated optimal map T̂ε,N: Euclidean versus ℓ1 cost with ε = 0.1

SLIDE 47

Numerical experiments - Real data

Estimated optimal map T̂ε,N: Euclidean versus ℓ1 cost with ε = 0.2

SLIDE 48

Stochastic algorithm for optimal transport
Data-driven choice of the regularization parameter?

1. Motivations from a resource allocation problem
2. Wasserstein optimal transport
3. Regularized optimal transport and stochastic optimisation
4. Data-driven choice of the regularization parameter?

SLIDE 49

Regularized Wasserstein barycenters (1)

Observations: n discrete measures ν̃_{pi} = (1/pi) ∑_{j=1}^{pi} δ_{Xi,j}, 1 ≤ i ≤ n, supported on X ⊂ R^d. Use of the entropically regularized Wasserstein cost:

    µ̂ε_{n,p} = arg min_{µ ∈ P2(X)} (1/n) ∑_{i=1}^n W²_{2,ε}(µ, ν̃_{pi})   (Sinkhorn barycenter),

where

    W²_{2,ε}(µ, ν) = inf_π ∫_X ∫_X |x − y|² π(x, y) dx dy − ε H(π),

and H(π) is the entropy of the transport plan π.

1. Bigot, J., Cazelles, E. & Papadakis, N. (2018) arXiv:1804.08962
SLIDE 50

Regularization using the Sinkhorn barycenter

A subset of 8 histograms (out of n = 15) from random variables sampled from one-dimensional Gaussian mixture distributions νi (with random means and variances), and binning of the data (Xi,j)_{1≤i≤n, 1≤j≤pi} on a grid of size N = 2^8, with p1 = . . . = pn = 50.

SLIDE 51

Regularization using the Sinkhorn barycenter (1)

Three Sinkhorn barycenters µ̂ε_{n,p} associated to the parameters ε = 0.18, 1.94, 9.5. The trade-off function ε ↦ B(ε) + b V(ε) attains its optimum at ε̂ = 1.94 using the Goldenshluger-Lepski principle (L-curve criterion).

1. Bigot, J., Cazelles, E. & Papadakis, N. (2018) arXiv:1804.08962
SLIDE 52

The Goldenshluger-Lepski principle (1)

Consider a finite collection of estimators (µ̂ε_{n,p})_{ε∈Λ}. The GL method consists in choosing a value ε̂ which minimizes the bias-variance trade-off function:

    ε̂ = arg min_{ε ∈ Λ} { B(ε) + b V(ε) },

with "bias term"

    B(ε) = sup_{ε̃ ≤ ε} [ ‖µ̂ε_{n,p} − µ̂ε̃_{n,p}‖² − b V(ε̃) ]₊

and a "variance term" V(ε) chosen proportional to an upper bound on the variance of the Sinkhorn barycenter µ̂ε_{n,p} (with b > 0 another tuning constant!).

1. e.g. for density estimation, see Lacour and Massart (2016)
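A toy transcription of the rule (all inputs are hypothetical: `dist2` and `V` stand in for the squared distance between barycenters and the variance bound, and the grid is assumed sorted in increasing order so that ε̃ ≤ ε means k ≤ i):

```python
def gl_select(eps_grid, estimators, dist2, V, b=1.0):
    """Pick eps minimizing B(eps) + b*V(eps), where
    B(eps) = max over eps~ <= eps of (dist2(mu_eps, mu_eps~) - b*V(eps~))_+ ."""
    def bias(i):
        return max(max(dist2(estimators[i], estimators[k])
                       - b * V(eps_grid[k]), 0.0)
                   for k in range(i + 1))
    scores = [bias(i) + b * V(eps_grid[i]) for i in range(len(eps_grid))]
    return eps_grid[min(range(len(eps_grid)), key=scores.__getitem__)]

# Toy run: scalar "estimators", squared distance, variance bound V(eps) = 1/eps.
eps_hat = gl_select([1.0, 2.0, 4.0], [0.0, 0.1, 1.0],
                    lambda a, c: (a - c) ** 2, lambda e: 1.0 / e)
```

Only pairwise distances between the candidate estimators and the bound V are needed, which is what makes the rule practical when no oracle risk is available.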
SLIDE 53

Flow cytometry data (1)

Measurements from n = 15 patients restricted to a bivariate projection: FSC versus SSC cell markers. Main issue: data alignment and density estimation for cell clustering.

1. Bigot, J., Cazelles, E. & Papadakis, N. (2018) arXiv:1804.08962
SLIDE 54

Flow cytometry data (1)

The trade-off function ε ↦ B(ε) + b V(ε); Sinkhorn barycenter associated to the parameter ε̂ = 3.1

1. Bigot, J., Cazelles, E. & Papadakis, N. (2018) arXiv:1804.08962
SLIDE 55

Data-driven smoothing of Laguerre cells?

Estimated optimal map T̂ε,N for various values of ε