Generative Models and Optimal Transport
Marco Cuturi
Joint work / work in progress with
- G. Peyré, A. Genevay (ENS), F. Bach (INRIA),
- G. Montavon, K-R Müller (TU Berlin)
Statistics 0.1: Density Fitting
We collect data
$$\nu_{\text{data}} = \frac{1}{N}\sum_{i=1}^{N} \delta_{x_i}$$
and fit a parametric model $p_\theta$, $\theta\in\Theta$:
$$\min_{\theta\in\Theta} \mathrm{KL}(\nu_{\text{data}} \,\|\, p_\theta)$$
z = (.32, .8, .34, . . . , .01)
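Maximum likelihood and KL minimization are the same problem:
$$\max_{\theta\in\Theta} \frac{1}{N}\sum_{i=1}^{N} \log p_\theta(x_i) \quad\Longleftrightarrow\quad \min_{\theta\in\Theta} \mathrm{KL}(\nu_{\text{data}} \,\|\, p_\theta)$$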
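On a finite support the equivalence can be checked directly: $\mathrm{KL}(\nu_{\text{data}}\|p_\theta) = -H(\nu_{\text{data}}) - \frac{1}{N}\sum_i \log p_\theta(x_i)$, and the entropy $H(\nu_{\text{data}})$ does not depend on $\theta$. A minimal sketch, assuming data on $\{0,\dots,k-1\}$ and a hypothetical categorical model $p_\theta = \mathrm{softmax}(\theta)$; all helper names are illustrative:

```python
import numpy as np

def empirical(xs, k):
    """nu_data = (1/N) sum_i delta_{x_i}, stored as a probability vector on {0, ..., k-1}."""
    return np.bincount(xs, minlength=k) / len(xs)

def log_model(theta):
    """log p_theta for the hypothetical categorical model p_theta = softmax(theta)."""
    return theta - np.log(np.sum(np.exp(theta)))

def avg_loglik(theta, xs):
    """(1/N) sum_i log p_theta(x_i)."""
    return np.mean(log_model(theta)[xs])

def kl(nu, theta):
    """KL(nu || p_theta), summed over the support of nu."""
    s = nu > 0
    return np.sum(nu[s] * (np.log(nu[s]) - log_model(theta)[s]))

xs = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 3])
nu = empirical(xs, 4)
ent = -np.sum(nu[nu > 0] * np.log(nu[nu > 0]))  # H(nu_data), does not depend on theta
for theta in (np.zeros(4), np.array([0.0, 1.0, 2.0, 3.0])):
    # KL(nu || p_theta) = -H(nu_data) - average log-likelihood
    print(kl(nu, theta), -ent - avg_loglik(theta, xs))
```

The two printed columns agree for every $\theta$, so maximizing the average log-likelihood and minimizing the KL divergence select the same parameter.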
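With a generative model, $p_\theta$ is the push-forward $f_\theta\sharp\mu$ of a latent measure $\mu$ by $f_\theta$:
$$\max_{\theta\in\Theta} \frac{1}{N}\sum_{i=1}^{N} \log f_\theta\sharp\mu(x_i) \quad\Longleftrightarrow\quad \min_{\theta\in\Theta} \mathrm{KL}(\nu_{\text{data}} \,\|\, f_\theta\sharp\mu)$$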
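$\mu$: latent space; $\nu_{\text{data}}$: data space.
$$\min_{\theta\in\Theta}\ \max_{\text{classifiers } g}\ \mathrm{Accuracy}_g\big((f_\theta\sharp\mu, +1), (\nu_{\text{data}}, -1)\big)$$
More generally, replace $\min_{\theta\in\Theta} \mathrm{KL}(\nu_{\text{data}} \,\|\, p_\theta)$ by $\min_{\theta\in\Theta} \Delta(\nu_{\text{data}}, p_\theta)$ for some discrepancy $\Delta$.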
Monge, Kantorovich, Dantzig, Brenier, McCann, Villani, Otto, Koopmans (Kantorovich and Koopmans: Nobel ’75; Villani: Fields ’10)
$$\min_{\theta\in\Theta} W(\nu_{\text{data}}, f_\theta\sharp\mu)$$
Examples of measures: empirical measures, i.e. data ($\mu$, $\nu$); color histograms ($h_1$, $h_2$); bags ($d$); statistical models ($p_\theta$, $p_{\theta'}$); brain activation maps.
[SDPC..’15]
Monge problem, over maps $T:\Omega\to\Omega$ with $T\sharp\mu=\nu$:
$$\inf_{T:\ T\sharp\mu=\nu} \int_\Omega D(x,T(x))^p\, d\mu(x)$$
$$\Pi(\mu,\nu) \overset{\text{def}}{=} \{P\in\mathcal{P}(\Omega\times\Omega) : P(A\times\Omega)=\mu(A),\ P(\Omega\times B)=\nu(B)\ \text{for all measurable } A, B\subset\Omega\}$$
[Figure: marginals $\mu(x)$ and $\nu(y)$, and couplings $P(x,y)$]
PRIMAL
$$W_p^p(\mu,\nu) \overset{\text{def}}{=} \min_{P\in\Pi(\mu,\nu)} \iint D(x,y)^p\, dP(x,y)$$
DUAL
$$W_p^p(\mu,\nu) = \sup_{\substack{\varphi\in L_1(\mu),\ \psi\in L_1(\nu)\\ \varphi(x)+\psi(y)\le D^p(x,y)}} \int\varphi\,d\mu + \int\psi\,d\nu$$
Stochastic Optimization
low dim.
$$W_p(\delta_x, \delta_y) = D(x,y)$$
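For $\mu = \frac{1}{n}\sum_{i=1}^n \delta_{x_i}$ and $\nu = \frac{1}{n}\sum_{j=1}^n \delta_{y_j}$:
$$W_p^p(\mu,\nu) = \min_{\sigma\in S_n} C(\sigma), \qquad C(\sigma) \overset{\text{def}}{=} \frac{1}{n}\sum_{i=1}^n D(x_i, y_{\sigma(i)})^p$$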
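For tiny $n$ the assignment formulation can be checked by brute force over all $n!$ permutations. A sketch assuming Euclidean $D$; `w_pp_assignment` is a hypothetical helper, not code from the talk:

```python
import itertools
import numpy as np

def w_pp_assignment(X, Y, p=2):
    """W_p^p between (1/n) sum_i delta_{x_i} and (1/n) sum_j delta_{y_j} as the min over
    sigma in S_n of C(sigma), by brute-force enumeration (n! permutations: tiny n only)."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1) ** p  # D(x_i, y_j)^p, Euclidean D
    return min(sum(D[i, s[i]] for i in range(n)) / n
               for s in itertools.permutations(range(n)))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
Y = X + np.array([0.5, 0.0])  # translate every point by t = (0.5, 0)
print(w_pp_assignment(X, Y))  # for p = 2, a translation by t gives |t|^2 = 0.25
```

Enumeration is only meant as a check; the linear-programming view of the general discrete case scales far better.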
For $\mu = \sum_{i=1}^n a_i \delta_{x_i}$ and $\nu = \sum_{j=1}^m b_j \delta_{y_j}$, define
$$U(a,b) \overset{\text{def}}{=} \{P \in \mathbb{R}_+^{n\times m} : P 1_m = a,\ P^T 1_n = b\}, \qquad M_{XY} \overset{\text{def}}{=} [D(x_i,y_j)^p]_{ij}.$$
Then
$$W_p^p(\mu,\nu) = \min_{P\in U(a,b)} \langle P, M_{XY}\rangle$$
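A small instance of the discrete primal can be handed to a generic LP solver. A sketch assuming SciPy's `scipy.optimize.linprog` with the HiGHS backend and a 1-D example with $D(x,y)=|x-y|$, $p=1$; `ot_primal` is an illustrative name:

```python
import numpy as np
from scipy.optimize import linprog

def ot_primal(a, b, M):
    """min_{P in U(a,b)} <P, M> as a linear program over vec(P) (row-major order)."""
    n, m = M.shape
    A_rows = np.kron(np.eye(n), np.ones((1, m)))  # (P 1_m)_i = sum_j P_ij
    A_cols = np.kron(np.ones((1, n)), np.eye(m))  # (P^T 1_n)_j = sum_i P_ij
    res = linprog(M.ravel(), A_eq=np.vstack([A_rows, A_cols]),
                  b_eq=np.concatenate([a, b]), bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(n, m)

x, y = np.array([0.0, 1.0]), np.array([0.0, 0.5, 1.0])
a, b = np.array([0.5, 0.5]), np.array([0.25, 0.25, 0.5])
M = np.abs(x[:, None] - y[None, :])  # M_XY with D(x, y) = |x - y| and p = 1
value, P = ot_primal(a, b, M)
print(value)
```

Here the optimal plan sends half of $x_1$'s mass to $y_1$ and half to $y_2$, and all of $x_2$'s mass to $y_3$, for a cost of $0.25\cdot 0.5 = 0.125$.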
$$W_p^p(\mu,\nu) = \max_{\substack{\alpha\in\mathbb{R}^n,\ \beta\in\mathbb{R}^m\\ \alpha_i+\beta_j\le D(x_i,y_j)^p}} \alpha^T a + \beta^T b$$
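Solving the same instance through the dual should return the same value, by LP strong duality. A sketch, again assuming `scipy.optimize.linprog`; `ot_dual` is an illustrative name:

```python
import numpy as np
from scipy.optimize import linprog

def ot_dual(a, b, M):
    """max alpha^T a + beta^T b s.t. alpha_i + beta_j <= M_ij; linprog minimizes, so negate."""
    n, m = M.shape
    A_ub = np.hstack([np.kron(np.eye(n), np.ones((m, 1))),   # row (i, j) selects alpha_i
                      np.kron(np.ones((n, 1)), np.eye(m))])  # row (i, j) selects beta_j
    res = linprog(-np.concatenate([a, b]), A_ub=A_ub, b_ub=M.ravel(),
                  bounds=(None, None), method="highs")      # alpha, beta are free variables
    return -res.fun, res.x[:n], res.x[n:]

x, y = np.array([0.0, 1.0]), np.array([0.0, 0.5, 1.0])
a, b = np.array([0.5, 0.5]), np.array([0.25, 0.25, 0.5])
M = np.abs(x[:, None] - y[None, :])
value, alpha, beta = ot_dual(a, b, M)
print(value)
```

The dual variables are only defined up to $(\alpha + c, \beta - c)$, so the solver may return any representative of that family.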
Note: flow/PDE formulations [Beckman’61]/[Benamou’98] can be used for p=1 with a sparse-graph metric and for p=2 with the Euclidean metric.
$W_p^p(\mu,\nu)$ is not differentiable.
$$E(P) \overset{\text{def}}{=} -\sum_{i,j=1}^{n,m} P_{ij}\log P_{ij}, \qquad W_\gamma(\mu,\nu) \overset{\text{def}}{=} \min_{P\in U(a,b)} \langle P, M_{XY}\rangle - \gamma E(P)$$
Note: the optimal solution is unique because of the strong concavity of the entropy.
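There exist $u\in\mathbb{R}^n_+$ and $v\in\mathbb{R}^m_+$ such that the optimum of $W_\gamma$ is
$$P_\gamma = \mathrm{diag}(u)\,K\,\mathrm{diag}(v), \qquad K \overset{\text{def}}{=} e^{-M_{XY}/\gamma}.$$
Indeed, with the Lagrangian
$$L(P,\alpha,\beta) = \sum_{ij} P_{ij}M_{ij} + \gamma P_{ij}\log P_{ij} + \alpha^T(P 1_m - a) + \beta^T(P^T 1_n - b),$$
$$\frac{\partial L}{\partial P_{ij}} = M_{ij} + \gamma(\log P_{ij} + 1) + \alpha_i + \beta_j = 0 \;\Rightarrow\; P_{ij} = e^{-\frac{\alpha_i}{\gamma}-\frac{1}{2}}\, e^{-\frac{M_{ij}}{\gamma}}\, e^{-\frac{\beta_j}{\gamma}-\frac{1}{2}} = u_i K_{ij} v_j.$$
[S..C..’15]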
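The scaling form $P_\gamma = \mathrm{diag}(u)K\mathrm{diag}(v)$ suggests alternately fixing $v$ and solving for $u$ (and vice versa) so that each marginal constraint holds; this is Sinkhorn's matrix-scaling iteration. A minimal NumPy sketch, assuming a fixed iteration count instead of a tolerance test; `sinkhorn` and `w_gamma` are illustrative names:

```python
import numpy as np

def sinkhorn(a, b, M, gamma, n_iter=2000):
    """Sinkhorn iterations for W_gamma: P = diag(u) K diag(v), K = exp(-M / gamma)."""
    K = np.exp(-M / gamma)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)    # rescale rows so that P 1_m = a
        v = b / (K.T @ u)  # rescale columns so that P^T 1_n = b
    P = u[:, None] * K * v[None, :]
    return P, u, v

def w_gamma(P, M, gamma):
    """<P, M> - gamma E(P), with E(P) = -sum_ij P_ij log P_ij."""
    return np.sum(P * M) + gamma * np.sum(P * np.log(P))

x, y = np.array([0.0, 1.0]), np.array([0.0, 0.5, 1.0])
a, b = np.array([0.5, 0.5]), np.array([0.25, 0.25, 0.5])
M = np.abs(x[:, None] - y[None, :])
P, u, v = sinkhorn(a, b, M, gamma=0.5)
print(P.round(3), w_gamma(P, M, 0.5))
```

Each iteration costs two matrix-vector products, which is what makes the GPU variants in the benchmark below practical.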
[Figure: log-scale plot (10^-6 to 10^4) against histogram dimension (64 to 4096) comparing FastEMD, Rubner’s emd, and Sinkhorn on CPU and GPU with γ=0.02 and γ=0.1; histograms sampled uniformly on the simplex, Sinkhorn tolerance 10^-2.]
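$$W_\gamma(\mu,\nu) = \max_{\alpha\in\mathbb{R}^n,\,\beta\in\mathbb{R}^m} \alpha^T a + \beta^T b - \gamma\sum_{i,j} e^{\frac{\alpha_i+\beta_j-M_{ij}}{\gamma}-1}$$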
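Plugging the Sinkhorn scalings back into the regularized dual gives a numerical check that its value matches the primal $W_\gamma$: with the sign of the multipliers flipped, $\alpha = \gamma\log u + \gamma/2$ and $\beta = \gamma\log v + \gamma/2$. A sketch under the conventions used above ($E(P) = -\sum P_{ij}\log P_{ij}$), assuming squared Euclidean costs for the example; function names are hypothetical:

```python
import numpy as np

def sinkhorn_scalings(a, b, K, n_iter=2000):
    """Scalings u, v such that diag(u) K diag(v) has marginals a and b."""
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u, v

def reg_dual(alpha, beta, a, b, M, gamma):
    """alpha^T a + beta^T b - gamma sum_ij exp((alpha_i + beta_j - M_ij) / gamma - 1)."""
    S = np.exp((alpha[:, None] + beta[None, :] - M) / gamma - 1)
    return alpha @ a + beta @ b - gamma * S.sum()

rng = np.random.default_rng(0)
X, Y = rng.random((4, 2)), rng.random((5, 2))
M = np.sum((X[:, None] - Y[None]) ** 2, axis=-1)       # D Euclidean, p = 2
a, b, gamma = np.full(4, 0.25), np.full(5, 0.2), 0.2
K = np.exp(-M / gamma)
u, v = sinkhorn_scalings(a, b, K)
P = u[:, None] * K * v[None, :]
primal = np.sum(P * M) + gamma * np.sum(P * np.log(P))  # <P, M> - gamma E(P)
alpha = gamma * np.log(u) + gamma / 2                   # from u_i = exp(-alpha_i / gamma - 1/2), sign flipped
beta = gamma * np.log(v) + gamma / 2
print(primal, reg_dual(alpha, beta, a, b, M, gamma))
```

Any other pair $(\alpha, \beta)$ gives a dual value below the primal one, so the dual is a smooth, unconstrained lower bound that can be maximized directly.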
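$$H_\nu^*: g\in\mathbb{R}^n \mapsto \gamma\,(\dots)$$
$$\min_{a\in\Sigma_n} H_\nu(a) + f(Aa) = \max_{g\in\mathbb{R}^d} -H_\nu^*(A^T g) - f^*(-g)$$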
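DUAL
$$W_p^p(\mu,\nu) = \sup_{\varphi(x)+\psi(y)\le D(x,y)^p} \int\varphi\,d\mu + \int\psi\,d\nu$$
REGULARIZED DUAL
$$\sup_{\varphi,\psi} C(\varphi,\psi), \qquad C(\varphi,\psi) = \int\varphi\,d\mu + \int\psi\,d\nu - \gamma\iint \Big(e^{\frac{\varphi(x)+\psi(y)-D(x,y)^p}{\gamma}} - 1\Big)\,d\mu(x)\,d\nu(y)$$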
SEMI-DUAL
$$W_p^p(\mu,\nu) = \sup_{\varphi} \int_x \varphi(x)\,d\mu(x) + \int_y \min_x \big(D(x,y)^p - \varphi(x)\big)\,d\nu(y)$$
REGULARIZED SEMI-DUAL (the min over $x$ becomes a soft-min)
$$\sup_{\varphi} \int_y \Big( \int_x \varphi(x)\,d\mu(x) - \gamma \log \int_x e^{\frac{\varphi(x)-D(x,y)^p}{\gamma}}\,d\mu(x) \Big)\, d\nu(y)$$
STOCHASTIC REGULARIZED SEMI-DUAL, for $\mu = \sum_{i=1}^n a_i\delta_{x_i}$:
$$\max_{\alpha\in\mathbb{R}^n} \mathbb{E}_\nu[f(\alpha,y)], \qquad f(\alpha,y) = \sum_{i=1}^n \alpha_i a_i - \gamma\log\sum_{i=1}^n a_i\, e^{\frac{\alpha_i - D(x_i,y)^p}{\gamma}}$$
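The stochastic semi-dual only needs samples $y\sim\nu$: the gradient of $f(\alpha,y)$ in $\alpha$ is $a - \pi(y)$, where $\pi(y)$ are softmax weights over the $x_i$. A minimal sketch of averaged stochastic gradient ascent, assuming Euclidean $D$, $p=2$, and a hypothetical step size $0.5/\sqrt{t}$; $\nu$ is discrete here only so that the exact objective can be evaluated for comparison:

```python
import numpy as np

def f_and_grad(alpha, y, X, a, gamma, p=2):
    """f(alpha, y) = sum_i alpha_i a_i - gamma log sum_i a_i exp((alpha_i - D(x_i, y)^p) / gamma)
    with Euclidean D, and its gradient a - pi(y) in alpha."""
    z = (alpha - np.linalg.norm(X - y, axis=1) ** p) / gamma + np.log(a)
    lse = z.max() + np.log(np.sum(np.exp(z - z.max())))  # stable log-sum-exp
    pi = np.exp(z - lse)                                  # softmax weights over the x_i
    return alpha @ a - gamma * lse, a - pi

def semi_dual_sgd(X, a, sample_y, gamma, n_steps=5000, lr=0.5, seed=0):
    """Averaged stochastic gradient ascent on E_nu[f(alpha, y)], one sample y ~ nu per step."""
    rng = np.random.default_rng(seed)
    alpha, avg = np.zeros(len(a)), np.zeros(len(a))
    for t in range(1, n_steps + 1):
        _, g = f_and_grad(alpha, sample_y(rng), X, a, gamma)
        alpha = alpha + lr / np.sqrt(t) * g   # assumed step-size schedule lr / sqrt(t)
        avg += (alpha - avg) / t              # running mean of the iterates
    return avg

X = np.array([[0.0, 0.0], [1.0, 0.0]])
a = np.array([0.5, 0.5])
Y = X.copy()
b = np.array([0.9, 0.1])
gamma = 0.5
alpha_hat = semi_dual_sgd(X, a, lambda rng: Y[rng.choice(len(Y), p=b)], gamma)

def F(alpha):
    """Exact objective E_nu[f(alpha, y)], computable here only because nu is discrete."""
    return sum(bj * f_and_grad(alpha, y, X, a, gamma)[0] for y, bj in zip(Y, b))

print(F(np.zeros(2)), F(alpha_hat))
```

Only `sample_y` touches $\nu$, so the same loop applies when $\nu$ is continuous and can only be sampled.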
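$$\overline{W}_\gamma(\mu,\nu) \overset{\text{def}}{=} 2W_\gamma(\mu,\nu) - W_\gamma(\mu,\mu) - W_\gamma(\nu,\nu), \qquad \mathrm{ED}(\mu,\nu) \overset{\text{def}}{=} 2\,\mathbb{E}\,D(X,Y) - \mathbb{E}\,D(X,X') - \mathbb{E}\,D(Y,Y')$$
$$\overline{W}_\gamma(\mu,\nu) \xrightarrow[\gamma\to\infty]{} \mathrm{ED}(\mu,\nu) \quad (p=1),$$
with $X, X' \sim \mu$ and $Y, Y' \sim \nu$ independent.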
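For $p=1$ and Euclidean $D$ the limit can be observed numerically: as $\gamma$ grows the Sinkhorn plan approaches $ab^T$, and the entropy terms cancel in the combination. A sketch under those assumptions, with uniform weights, a fixed iteration count, and illustrative names:

```python
import numpy as np

def w_gamma(a, b, M, gamma, n_iter=500):
    """<P, M> - gamma E(P) at the Sinkhorn solution P = diag(u) K diag(v)."""
    K = np.exp(-M / gamma)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]
    return np.sum(P * M) + gamma * np.sum(P * np.log(P))

def dist(X, Y):
    """Euclidean distance matrix, i.e. D(x, y)^p with p = 1."""
    return np.linalg.norm(X[:, None] - Y[None], axis=-1)

def sinkhorn_divergence(X, a, Y, b, gamma):
    """2 W_gamma(mu, nu) - W_gamma(mu, mu) - W_gamma(nu, nu)."""
    return (2 * w_gamma(a, b, dist(X, Y), gamma)
            - w_gamma(a, a, dist(X, X), gamma) - w_gamma(b, b, dist(Y, Y), gamma))

def energy_distance(X, a, Y, b):
    """2 E D(X, Y) - E D(X, X') - E D(Y, Y') with independent draws."""
    return 2 * a @ dist(X, Y) @ b - a @ dist(X, X) @ a - b @ dist(Y, Y) @ b

rng = np.random.default_rng(0)
X, Y = rng.random((6, 2)), rng.random((7, 2)) + 0.3
a, b = np.full(6, 1 / 6), np.full(7, 1 / 7)
for gamma in (1.0, 100.0, 1e4):
    print(gamma, sinkhorn_divergence(X, a, Y, b, gamma), energy_distance(X, a, Y, b))
```

At small $\gamma$ the combination behaves like an OT cost; at large $\gamma$ it only sees averaged pairwise distances.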
Gradients $\dfrac{\partial W_L}{\partial X}$ and $\dfrac{\partial W_L}{\partial a}$
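A minimal check of the gradient in the weights, assuming the converged $W_\gamma$ rather than the $L$-iteration $W_L$: by the envelope theorem the gradient in $a$ is the optimal dual variable, here $\gamma\log u$ up to an additive constant that vanishes for perturbations $h$ with $\sum_i h_i = 0$. Gradients through $W_L$ itself, and in $X$, would typically come from automatic differentiation, which this NumPy sketch does not do:

```python
import numpy as np

def w_gamma_and_u(a, b, M, gamma, n_iter=3000):
    """Converged W_gamma(a, b) = <P, M> - gamma E(P), and the row scaling u."""
    K = np.exp(-M / gamma)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]
    return np.sum(P * M) + gamma * np.sum(P * np.log(P)), u

rng = np.random.default_rng(1)
X, Y = rng.random((4, 2)), rng.random((5, 2))
M = np.sum((X[:, None] - Y[None]) ** 2, axis=-1)
a, b, gamma = np.full(4, 0.25), np.full(5, 0.2), 0.2
W, u = w_gamma_and_u(a, b, M, gamma)
grad_a = gamma * np.log(u)                      # optimal dual variable, up to an additive constant
h = 1e-5 * np.array([1.0, -1.0, 0.5, -0.5])     # sum(h) = 0 keeps a on the simplex
fd = (w_gamma_and_u(a + h, b, M, gamma)[0] - w_gamma_and_u(a - h, b, M, gamma)[0]) / 2
print(fd, grad_a @ h)
```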
Wasserstein barycenter:
$$\min_{\mu\in P(\Omega)} \sum_{i=1}^N W_p^p(\mu,\nu_i)$$
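With a fixed support $X$, a linear program:
$$\min_{P_1,\dots,P_N,\,a} \sum_{i=1}^N \langle P_i, M_{XY_i}\rangle \quad \text{s.t. } P_i 1_{m_i} = a,\ P_i^T 1_n = b_i,\ \forall i\le N$$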
[CD’14] Fast Computation of Wasserstein Barycenters, International Conference on Machine Learning 2014
$$\min_{\mu\in Q\subset P(\Omega)} \sum_{i=1}^N W_p^p(\mu,\nu_i)$$
Not a convex problem.
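$$a(\lambda) \overset{\text{def}}{=} \operatorname*{argmin}_{a} \sum_{i=1}^N \lambda_i W_\gamma(a, b_i), \qquad \lambda\in\Sigma_N$$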
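$$b_l \overset{\text{def}}{=} \prod_{s=1}^{N} \big(K^T U_l\big)_s^{\lambda_s}, \qquad V_{l+1} \overset{\text{def}}{=} \frac{b_l 1_N^T}{K^T U_l}, \qquad U_{l+1} \overset{\text{def}}{=} \frac{B}{K V_{l+1}},$$
with $B = [b_1,\dots,b_N] \in \mathbb{R}^{d\times N}$, $(\cdot)_s$ the $s$-th column, and elementwise powers and divisions.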
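The updates above can be sketched in NumPy, assuming histograms on a 1-D grid of size $d$, a kernel $K = e^{-M/\gamma}$ built from squared distances, and a fixed iteration count; `entropic_barycenter` is a hypothetical name:

```python
import numpy as np

def entropic_barycenter(B, K, lam, n_iter=2000):
    """Iterates b_l = prod_s (K^T U_l)_s^{lam_s}, V_{l+1} = b_l 1_N^T / (K^T U_l),
    U_{l+1} = B / (K V_{l+1}) for histograms B (d x N) and weights lam in the simplex."""
    V = np.ones_like(B)
    for _ in range(n_iter):
        U = B / (K @ V)                   # row marginals of P_s = diag(U_s) K diag(V_s) equal B_s
        KtU = K.T @ U
        bary = np.exp(np.log(KtU) @ lam)  # weighted geometric mean of the columns of K^T U
        V = bary[:, None] / KtU           # column marginals of every P_s equal bary
    return bary

grid = np.linspace(0, 1, 50)
gamma = 0.02
K = np.exp(-(grid[:, None] - grid[None, :]) ** 2 / gamma)
b1 = np.exp(-(grid - 0.25) ** 2 / (2 * 0.05 ** 2))
b1 /= b1.sum()
B = np.stack([b1, b1[::-1]], axis=1)      # two mirrored bumps
bary = entropic_barycenter(B, K, np.array([0.5, 0.5]))
print(bary.sum(), grid[np.argmax(bary)])
```

With mirrored inputs and equal weights the output is mirror-symmetric, which gives a quick sanity check of the update order.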
$\lambda \in \Sigma_N$
Wasserstein Barycentric Coordinates: Histogram Regression using Optimal Transport, SIGGRAPH’16
[BPC’16]
[Figure panels: Original; Euclidean projection; Wasserstein projection]
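Sinkhorn generative model: input data $(z_1,\dots,z_m)$ pass through the generator (parameters $\theta_1, \theta_2, \dots$) to give $(x_1,\dots,x_m)$, which are compared with the data $(y_1,\dots,y_n)$ through
$$C = (c(x_i,y_j))_{i,j}, \qquad K = e^{-C/\varepsilon}.$$
For $\ell = 1,\dots,L-1$ (then $\ell \leftarrow \ell+1$), with elementwise divisions:
$$a_{\ell+1} = \frac{1_m/m}{K b_\ell}, \qquad b_{\ell+1} = \frac{1_n/n}{K^T a_{\ell+1}},$$
and the output loss is
$$\hat{E}_L(\theta) = \langle (C\odot K)\, b_L,\ a_L\rangle.$$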
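The unrolled loop can be written directly. A NumPy sketch, assuming a squared Euclidean cost $c$, the initialization $b_1 = 1_n$, and a hypothetical affine map standing in for the generator $f_\theta$. In practice $\hat{E}_L(\theta)$ would be differentiated with respect to $\theta$ through the loop by automatic differentiation, which is not shown here:

```python
import numpy as np

def sinkhorn_loss(X, Y, eps, L):
    """E_L = <(C * K) b_L, a_L> after iterations l = 1, ..., L-1 from b_1 = 1_n (assumed),
    with uniform weights 1/m, 1/n and c(x, y) = |x - y|^2 (assumed)."""
    assert L >= 2
    m, n = len(X), len(Y)
    C = np.sum((X[:, None] - Y[None]) ** 2, axis=-1)   # C = (c(x_i, y_j))_{i,j}
    K = np.exp(-C / eps)
    b = np.ones(n)
    for _ in range(L - 1):
        a = (np.ones(m) / m) / (K @ b)     # a_{l+1}
        b = (np.ones(n) / n) / (K.T @ a)   # b_{l+1}
    return a @ (C * K) @ b, a[:, None] * K * b[None, :]

rng = np.random.default_rng(0)
Z = rng.random((8, 2))
X = 0.5 * Z + 0.25    # hypothetical stand-in for generator outputs x_i = f_theta(z_i)
Y = rng.random((10, 2))
loss, P = sinkhorn_loss(X, Y, eps=0.5, L=200)
print(loss)
```

The returned plan $\mathrm{diag}(a_L)K\mathrm{diag}(b_L)$ has exact column sums $1/n$ and approximate row sums $1/m$, and $\hat{E}_L$ equals $\langle P, C\rangle$ for that plan.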
[Figure panels: MMD-GAN; γ = 1000; γ = 10]
NIPS’17 WORKSHOP NIPS’17 TUTORIAL