SLIDE 1

Optimal mass transport as a distance measure between images

Axel Ringh¹

¹Department of Mathematics, KTH Royal Institute of Technology, Stockholm, Sweden.

21st of June 2018, INRIA, Sophia-Antipolis, France

SLIDE 2

Acknowledgements

This is based on joint work with Johan Karlsson¹.

[1] J. Karlsson and A. Ringh. Generalized Sinkhorn iterations for regularizing inverse problems using optimal mass transport. SIAM Journal on Imaging Sciences, 10(4), 1935-1962, 2017.

I acknowledge financial support from the Swedish Research Council (VR) and the Swedish Foundation for Strategic Research (SSF).

Code: https://github.com/aringh/Generalized-Sinkhorn-and-tomography
The code is based on ODL: https://github.com/odlgroup/odl

¹Department of Mathematics, KTH Royal Institute of Technology, Stockholm, Sweden

SLIDE 3

Outline

- Background
  - Inverse problems
  - Optimal mass transport
  - Sinkhorn iterations: solving discretized optimal transport problems
- Sinkhorn iterations as dual coordinate ascent
- Inverse problems with optimal mass transport priors
- Example in computerized tomography

SLIDES 4-5

Background

Inverse problems

Consider the problem of recovering f ∈ X from data g ∈ Y, given by

    g = A(f) + "noise".

Notation:
- X is called the reconstruction space.
- Y is called the data space.
- A : X → Y is the forward operator.
- A* : Y → X denotes the adjoint operator.

Problems of interest are ill-posed inverse problems: a solution might not exist, the solution might not be unique, and the solution might not depend continuously on the data. Simply put: A⁻¹ does not exist as a continuous bijection! It comes down to finding an approximate inverse A† so that

    g = A(f) + "noise"  ⟹  A†(g) ≈ f.

SLIDE 6

Background

Variational regularization

A common technique for solving ill-posed inverse problems is variational regularization:

    arg min_{f ∈ X} G(A(f), g) + λF(f)

- G : Y × Y → R is the data discrepancy functional.
- F : X → R is the regularization functional.
- λ is the regularization parameter; it controls the trade-off between data matching and regularization.

A common example in imaging is total variation regularization:

    G(h, g) = ‖h − g‖₂²,  F(f) = ‖∇f‖₁.

If A is linear this is a convex problem!
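To make the variational scheme concrete, here is a minimal numerical sketch. It deliberately uses a quadratic (Tikhonov) regularizer instead of the total variation term above, since the quadratic case has a closed-form solution; the toy forward operator A and all sizes are hypothetical.

```python
import numpy as np

# Minimal sketch of a variational solve. NOTE: this uses Tikhonov
# regularization F(f) = ||f||_2^2 instead of the TV term above, because
# the quadratic case has the closed-form solution
#     f* = (A^T A + lambda*I)^{-1} A^T g.
# The forward operator A and all sizes are hypothetical toy choices.
rng = np.random.default_rng(0)

n, m = 50, 30                                  # reconstruction / data sizes
A = rng.standard_normal((m, n))                # underdetermined: ill-posed
f_true = rng.standard_normal(n)
g = A @ f_true + 0.05 * rng.standard_normal(m) # noisy data

lam = 0.1                                      # regularization parameter
f_rec = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ g)

print("data misfit ||A f - g||_2 =", np.linalg.norm(A @ f_rec - g))
```

Increasing lam pulls the solution toward the regularizer's preference at the expense of the data fit, which is exactly the trade-off the slide describes.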

SLIDES 7-9

Background

Incorporating prior information in variational schemes

How can one incorporate prior information in such a scheme? One way: consider

    arg min_{f ∈ X} G(A(f), g) + λF(f) + γH(f̃, f)

- f̃ is the prior/template.
- H defines "closeness" to f̃.
- What is a good choice for H?

Scenarios where this is potentially of interest:
- incomplete measurements, e.g., limited-angle tomography;
- spatiotemporal imaging:
  - the data is a time series of data sets {gt}, t = 0, ..., T;
  - for each set, the underlying image has undergone a deformation;
  - each data set gt normally "contains less information": A†(gt) is a poor reconstruction.

Approach: solve the coupled inverse problems

    arg min_{f0, ..., fT ∈ X} Σ_{j=0}^{T} ( G(A(fj), gj) + λF(fj) ) + Σ_{j=1}^{T} γH(fj−1, fj)

SLIDES 10-12

Background

Measuring distances between functions: the Lp metrics

Given two functions f0(x) and f1(x), what is a suitable way to measure the distance between the two? One suggestion: measure it pointwise, e.g., using an Lp metric

    ‖f0 − f1‖_p = ( ∫_D |f0(x) − f1(x)|^p dx )^{1/p}.

Drawback: this is, for example, insensitive to shifts. It gives the same distance from f0 to f1 as from f0 to f2, even though f2 is shifted much further:

    ‖f0 − f1‖₁ = ‖f0 − f2‖₁ = 8.
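A minimal sketch of this effect, with hypothetical block signals chosen so the numbers match the slide: once the supports are disjoint, the L1 distance is 8 regardless of how far the block has moved.

```python
import numpy as np

# Sketch of the shift example with hypothetical block signals: f1 is a
# small shift of f0, f2 a large one. Once the supports are disjoint, the
# L1 distance is the same (here 8, matching the slide) no matter how far
# the block has moved.
x = np.arange(100)

def block(start, width=4):
    f = np.zeros(len(x))
    f[start:start + width] = 1.0
    return f

f0 = block(10)
f1 = block(16)    # shifted by 6 pixels
f2 = block(60)    # shifted by 50 pixels

print(np.abs(f0 - f1).sum())   # 8.0
print(np.abs(f0 - f2).sum())   # 8.0 -- the same distance!
```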

SLIDES 13-18

Background

Optimal mass transport - Monge formulation

Gaspard Monge formulated optimal mass transport in 1781, motivated by the optimal transport of soil for the construction of forts and roads.

[Portrait: Gaspard Monge]

Let c(x0, x1) : X × X → R+ describe the cost of transporting a unit mass from location x0 to location x1. Given two functions f0, f1 : X → R+, find the function φ : X → X minimizing the transport cost

    ∫_X c(x, φ(x)) f0(x) dx,

where φ is a mass-preserving map from f0 to f1:

    ∫_{x ∈ A} f1(x) dx = ∫_{φ(x) ∈ A} f0(x) dx   for all A ⊂ X.

This is a nonconvex problem!

[Figure: 1D example of two mass distributions f0 and f1 and a transport map φ between them.]

SLIDES 19-23

Background

Optimal mass transport - Kantorovich formulation

Leonid Kantorovich gave a convex formulation and duality theory in 1942. He received the 1975 Nobel Memorial Prize in Economics for "contributions to the theory of optimum allocation of resources."

[Portrait: Leonid Kantorovich]

Again, let c(x0, x1) denote the cost of transporting a unit mass from the point x0 to the point x1. Given two functions f0, f1 : X → R+, find a transport plan M : X × X → R+, where M(x0, x1) is the amount of mass moved from x0 to x1. Look for the M that minimizes the transportation cost:

    min_{M ≥ 0} ∫_{X×X} c(x0, x1) M(x0, x1) dx0 dx1
    s.t. f0(x0) = ∫_X M(x0, x1) dx1,  x0 ∈ X,
         f1(x1) = ∫_X M(x0, x1) dx0,  x1 ∈ X.

[Figure: 1D example of f0 and f1 together with the transportation plan M.]

SLIDES 24-25

Background

Measuring distances between functions: optimal mass transport

Define a distance between two functions f0(x) and f1(x) using optimal transport:

    T(f0, f1) := min_{M ≥ 0} ∫_{X×X} c(x0, x1) M(x0, x1) dx0 dx1
                 s.t. f0(x0) = ∫_X M(x0, x1) dx1,  x0 ∈ X,
                      f1(x1) = ∫_X M(x0, x1) dx0,  x1 ∈ X.

If d(x, y) is a metric on X and c(x, y) = d(x, y)^p, then T(f0, f1)^{1/p} is a metric on the space of measures.

Example revisited: let c(x, y) = ‖x − y‖₂². Then

    T(f0, f1) = 4 · (√2)² = 8,   T(f0, f2) = 4 · 5² = 100.

This indicates that optimal transport is a more natural distance between two images than Lp, at least if one is a deformation of the other.
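The arithmetic can be checked directly. The sketch below assumes one hypothetical configuration consistent with the slide's numbers: four unit point masses, shifted by (1, 1) to give f1 and by (3, 4) to give f2; for a pure translation under squared Euclidean cost, the optimal plan is the translation itself, so the cost is the sum of squared displacements.

```python
import numpy as np

# Hypothetical configuration matching the slide's numbers: four unit
# point masses; f1 shifts each by (1, 1) (distance sqrt(2)), f2 by
# (3, 4) (distance 5). For a pure translation under the cost
# c(x, y) = ||x - y||_2^2 the optimal plan is the translation itself,
# so the cost is the sum of squared displacements.
p0 = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
p1 = p0 + np.array([1.0, 1.0])
p2 = p0 + np.array([3.0, 4.0])

print(np.sum(np.sum((p1 - p0) ** 2, axis=1)))   # T(f0, f1) = 4 * 2  = 8
print(np.sum(np.sum((p2 - p0) ** 2, axis=1)))   # T(f0, f2) = 4 * 25 = 100
```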

SLIDES 26-29

Background

Optimal mass transport - discrete formulation

How does one solve the optimal transport problem? Here: "discretize, then optimize."

This gives a linear programming problem: for vectors f0 ∈ Rⁿ and f1 ∈ Rᵐ and a cost matrix C = [cij] ∈ R^{n×m}, where cij defines the transportation cost between pixels xi and xj, find the transportation plan M = [mij] ∈ R^{n×m}, where mij is the mass transported between pixels xi and xj:

    min_{mij ≥ 0} Σ_{i=1}^{n} Σ_{j=1}^{m} cij mij
    subject to  Σ_{j=1}^{m} mij = f0(i),  i = 1, ..., n,
                Σ_{i=1}^{n} mij = f1(j),  j = 1, ..., m,

or, equivalently, in matrix form,

    min_{M ≥ 0} trace(Cᵀ M)
    subject to  M 1_m = f0,
                Mᵀ 1_n = f1.

The number of variables is n · m. To compare the distance between two images of size n = m = 256 × 256: M ∈ R^{256² × 256²}, i.e., approximately 4 · 10⁹ variables. Prohibitively large!
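A minimal sketch of this linear program, solved with SciPy's generic LP solver on hypothetical 1D marginals. The n·m unknowns are exactly why this approach stops scaling long before image-sized problems.

```python
import numpy as np
from scipy.optimize import linprog

# Minimal sketch of the discrete OT linear program, solved with SciPy's
# generic LP solver on hypothetical 1D marginals.
n, m = 8, 8
x = np.linspace(0, 1, n)
C = (x[:, None] - x[None, :]) ** 2          # c_ij = |x_i - x_j|^2

rng = np.random.default_rng(0)
f0 = rng.random(n); f0 /= f0.sum()          # marginals with equal total mass
f1 = rng.random(m); f1 /= f1.sum()

# Row sums of M must equal f0, column sums must equal f1.
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0        # sum_j m_ij = f0(i)
for j in range(m):
    A_eq[n + j, j::m] = 1.0                 # sum_i m_ij = f1(j)
b_eq = np.concatenate([f0, f1])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print("T(f0, f1) =", res.fun)
```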

SLIDES 30-34

Background

Optimal mass transport - Sinkhorn iterations

The original problem is too computationally demanding. Solution: introduce an entropy barrier/regularization term D(M) = Σ_{i,j} (mij log(mij) − mij + 1) [1]:

    Tε(f0, f1) := min_{M ≥ 0} trace(Cᵀ M) + εD(M)
                  subject to  f0 = M 1,
                              f1 = Mᵀ 1.

Using Lagrangian relaxation gives

    L(M, λ0, λ1) = trace(Cᵀ M) + εD(M) + λ0ᵀ(f0 − M 1) + λ1ᵀ(f1 − Mᵀ 1).

Given dual variables λ0, λ1, the minimizing mij satisfies

    0 = ∂L(M, λ0, λ1)/∂mij = cij + ε log(mij) − λ0(i) − λ1(j),

which can be expressed as

    mij = e^{λ0(i)/ε} e^{−cij/ε} e^{λ1(j)/ε},  i.e.,  M = diag(e^{λ0/ε}) K diag(e^{λ1/ε}),

where K = exp(−C/ε). Here and in what follows exp(·), log(·), ./, ⊙ denote elementwise functions.

The solution M = diag(e^{λ0/ε}) K diag(e^{λ1/ε}) is specified by only n + m unknowns!

[1] M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pages 2292-2300, 2013.

SLIDES 35-37

Background

Optimal mass transport - Sinkhorn iterations

How does one find the values of λ0 and λ1? Let u0 = exp(λ0/ε) and u1 = exp(λ1/ε). The optimal solution M = diag(u0) K diag(u1) needs to satisfy

    diag(u0) K diag(u1) 1 = f0   and   diag(u1) Kᵀ diag(u0) 1 = f1.

Theorem (Sinkhorn iterations [2]). For any matrix K with positive elements there are diagonal matrices diag(u0), diag(u1) such that M = diag(u0) K diag(u1) has prescribed row and column sums f0 and f1. The vectors u0 and u1 can be obtained by alternating marginalization:

    u0 = f0 ./ (K u1)
    u1 = f1 ./ (Kᵀ u0).

Each iteration only requires the multiplications K u1 and Kᵀ u0; this is the bottleneck. The iteration has a linear convergence rate. It is thus highly computationally efficient, allowing large problems to be solved.

[2] R. Sinkhorn. Diagonal equivalence to matrices with prescribed row and column sums. The American Mathematical Monthly, 74(4), 402-405, 1967.
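A minimal sketch of these iterations on hypothetical 1D Gaussian marginals; the grid size, cost, and ε are illustrative choices only.

```python
import numpy as np

# Minimal sketch of the Sinkhorn iterations above, on hypothetical 1D
# Gaussian marginals; grid size, cost, and epsilon are illustrative.
n = 64
x = np.linspace(0, 1, n)
C = (x[:, None] - x[None, :]) ** 2           # squared-distance cost

def gaussian(mu, s=0.05):
    f = np.exp(-(x - mu) ** 2 / (2 * s ** 2))
    return f / f.sum()

f0, f1 = gaussian(0.3), gaussian(0.7)

eps = 0.01
K = np.exp(-C / eps)                          # K = exp(-C/eps), elementwise
u1 = np.ones(n)
for _ in range(500):
    u0 = f0 / (K @ u1)                        # u0 = f0 ./ (K u1)
    u1 = f1 / (K.T @ u0)                      # u1 = f1 ./ (K^T u0)

M = np.diag(u0) @ K @ np.diag(u1)             # transport plan
print("row-sum error :", np.abs(M.sum(axis=1) - f0).max())
print("col-sum error :", np.abs(M.sum(axis=0) - f1).max())
print("transport cost:", (C * M).sum())
```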

SLIDES 38-39

Sinkhorn iterations as dual coordinate ascent

There are several ways to motivate the Sinkhorn iterations [1]:
- diagonal matrix scaling,
- Bregman projections,
- Dykstra's algorithm.

Here we will introduce yet another interpretation:
- use a dual formulation;
- the Sinkhorn iteration corresponds to dual coordinate ascent;
- this allows us to generalize the Sinkhorn iterations;
- it gives an approach for addressing inverse problems with an optimal transport term.

[1] J.D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré. Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2), A1111-A1138, 2015.

SLIDES 40-43

Sinkhorn iterations as dual coordinate ascent

The Lagrangian relaxation gave the optimal form of the primal variable, M* = diag(u0) K diag(u1). The Lagrangian dual function is

    φ(u0, u1) := min_{M ≥ 0} L(M, u0, u1) = L(M*, u0, u1) = ...
               = ε log(u0)ᵀ f0 + ε log(u1)ᵀ f1 − ε u0ᵀ K u1 + ε n m.

The dual problem is thus

    max_{u0, u1} φ(u0, u1).

Setting the gradient with respect to u0 equal to zero gives 0 = f0 ./ u0 − K u1, and with respect to u1 gives 0 = f1 ./ u1 − Kᵀ u0. These are the Sinkhorn iterations!
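The coordinate-ascent interpretation can be checked numerically: each Sinkhorn half-step maximizes φ over one block of variables, so φ should never decrease. A sketch with hypothetical random data:

```python
import numpy as np

# Numerical check of the coordinate-ascent interpretation: each Sinkhorn
# half-step maximizes phi over one block, so phi should never decrease.
# Data here are hypothetical random marginals and costs.
rng = np.random.default_rng(1)
n = 32
eps = 0.05
K = np.exp(-rng.random((n, n)) / eps)
f0 = rng.random(n); f0 /= f0.sum()
f1 = rng.random(n); f1 /= f1.sum()

def phi(u0, u1):  # dual function from the slide
    return eps * (np.log(u0) @ f0 + np.log(u1) @ f1 - u0 @ K @ u1 + n * n)

u0, u1 = np.ones(n), np.ones(n)
vals = [phi(u0, u1)]
for _ in range(20):
    u0 = f0 / (K @ u1)         # ascent step in u0
    vals.append(phi(u0, u1))
    u1 = f1 / (K.T @ u0)       # ascent step in u1
    vals.append(phi(u0, u1))

print("phi nondecreasing:", np.all(np.diff(vals) >= -1e-12))
```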

SLIDES 44-46

Sinkhorn iterations as dual coordinate ascent

For g proper, convex, and lower semicontinuous, we can now consider problems of the form

    min_{f1} Tε(f0, f1) + g(f1)
    = min_{M ≥ 0, f1} trace(Cᵀ M) + εD(M) + g(f1)
      subject to  f0 = M 1,
                  f1 = Mᵀ 1.

This can be solved by dual coordinate ascent,

    u0 = f0 ./ (K u1)
    0 ∈ ∂g*(−ε log(u1)) ⊙ (1 ./ u1) − Kᵀ u0,

if the second inclusion can be solved efficiently. It is solvable, e.g., when ∂g*(·) acts component-wise. Examples of such cases:
- g(·) = I_{f̃}(·), the indicator function on {f̃}: recovers the original optimal transport problem;
- g(·) = ‖·‖₂².

SLIDES 47-50

Inverse problems with optimal mass transport priors

Consider the inverse problem

    min_{f1 ≥ 0} ‖∇f1‖₁ + γTε(f0, f1)
    subject to ‖A f1 − g‖₂ ≤ κ.

- TV regularization term: ‖∇f1‖₁.
- Forward model A, data g, and data mismatch term: ‖A f1 − g‖₂.
- Prior f0; the term γTε(f0, f1) keeps f1 close to f0.

This is a large convex optimization problem with several terms, so variable splitting is common: ADMM, the primal-dual hybrid gradient algorithm, or the primal-dual Douglas-Rachford algorithm. A common tool in these algorithms is the proximal operator [1] of the involved functionals F:

    Prox_{σF}(h) = argmin_f F(f) + (1/(2σ)) ‖f − h‖₂².

[1] R.T. Rockafellar. Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization, 14(5), 877-898, 1976.
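As a concrete instance, the sketch below computes the proximal operator of the quadratic prior term F(f) = γ‖f − f0‖₂² (the ℓ2² prior used in the experiments later), which is available in closed form; the prox of Tε(f0, ·) itself is the subject of the next slides.

```python
import numpy as np

# Concrete instance of a proximal operator: for the quadratic prior term
# F(f) = gamma * ||f - f0||_2^2, setting the gradient of
# F(f) + ||f - h||_2^2/(2*sigma) to zero gives the closed form
#     Prox_{sigma F}(h) = (h + 2*sigma*gamma*f0) / (1 + 2*sigma*gamma).
def prox_sq_l2(h, f0, gamma, sigma):
    return (h + 2 * sigma * gamma * f0) / (1 + 2 * sigma * gamma)

rng = np.random.default_rng(0)
f0, h = rng.random(5), rng.random(5)
gamma, sigma = 2.0, 0.5
f = prox_sq_l2(h, f0, gamma, sigma)

# The optimality condition should hold at the prox point:
grad = 2 * gamma * (f - f0) + (f - h) / sigma
print("max |gradient|:", np.abs(grad).max())   # ~ machine precision
```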

SLIDES 51-53

Inverse problems with optimal mass transport priors

Generalized Sinkhorn iterations

We want to compute the proximal operator of Tε(f0, ·), given by

    Prox_{σTε(f0,·)}(h) = argmin_{f1} Tε(f0, f1) + (1/(2σ)) ‖f1 − h‖₂².

Thus we want to solve

    min_{M ≥ 0, f1} trace(Cᵀ M) + εD(M) + (1/(2σ)) ‖f1 − h‖₂²
    subject to  f0 = M 1,
                f1 = Mᵀ 1.

Using dual coordinate ascent with g(·) = (1/(2σ)) ‖· − h‖₂², we get the algorithm

    1. u0 = f0 ./ (K u1)
    2. u1 = exp( h/(σε) − ω( h/(σε) + log(Kᵀ u0) + log(σε) ) ),

where ω denotes the (elementwise) Wright omega function, i.e., x = log(ω(x)) + ω(x). Compare with the standard Sinkhorn iterations:

    1. u0 = f0 ./ (K u1)
    2. u1 = f1 ./ (Kᵀ u0)

The second step is solved elementwise; the bottleneck is still the computation of K u1 and Kᵀ u0.

Theorem. The algorithm is globally convergent, with a linear convergence rate.
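A minimal sketch of the generalized update, using SciPy's elementwise Wright omega function (scipy.special.wrightomega). The marginal f0, the point h, and all sizes are hypothetical; this follows the update formulas above, not the authors' released code.

```python
import numpy as np
from scipy.special import wrightomega

# Minimal sketch of the generalized Sinkhorn update for the prox of
# T_eps(f0, .) with g = ||. - h||^2/(2*sigma). All data are hypothetical.
n = 64
x = np.linspace(0, 1, n)
C = (x[:, None] - x[None, :]) ** 2
eps, sigma = 0.01, 1.0
K = np.exp(-C / eps)

f0 = np.exp(-(x - 0.3) ** 2 / 0.02); f0 /= f0.sum()
h = np.exp(-(x - 0.6) ** 2 / 0.02);  h /= h.sum()   # point to be prox-ed

u1 = np.ones(n)
for _ in range(200):
    u0 = f0 / (K @ u1)
    z = h / (sigma * eps) + np.log(K.T @ u0) + np.log(sigma * eps)
    u1 = np.exp(h / (sigma * eps) - wrightomega(z).real)

M = np.diag(u0) @ K @ np.diag(u1)
prox_point = M.T @ np.ones(n)                # f1 = M^T 1
print("mass of prox point:", prox_point.sum())
```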

SLIDE 54

Example in computerized tomography

Computerized tomography (CT) is an imaging modality used in many areas, e.g., medicine. The object is probed with X-rays. Different materials attenuate X-rays differently, so the incoming and outgoing intensities give information about the object. The simplest model is

    ∫_{L_{r,θ}} f(x) dx = log(I0 / I),

where
- f(x) is the attenuation at the point x, which is what we want to reconstruct,
- L_{r,θ} is the line along which the X-ray beam travels,
- I0 and I are the incoming and outgoing intensities.

[Illustration from Wikipedia]
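This model comes from the Beer-Lambert law: along a single ray, I = I0 exp(−∫ f), so log(I0/I) recovers the line integral. A sketch with a hypothetical 1D attenuation profile:

```python
import numpy as np

# Sanity check of the CT model via the Beer-Lambert law: along one ray,
# I = I0 * exp(-integral of f), so log(I0/I) recovers the line integral.
# The 1D attenuation profile below is hypothetical.
dx = 0.01
t = np.arange(0, 10, dx)
f = np.where(np.abs(t - 5) < 1, 0.2, 0.0)    # attenuating slab on the ray

I0 = 1.0
I = I0 * np.exp(-f.sum() * dx)               # outgoing intensity
print("log(I0 / I)       :", np.log(I0 / I))
print("line integral of f:", f.sum() * dx)   # the two agree
```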

SLIDE 55

Example in computerized tomography

Parallel-beam 2D CT example:
- Reconstruction space: 256 × 256 pixels
- Angles: 30 in [π/4, 3π/4] (limited angle)
- Detector partition: uniform, 350 bins
- Noise level: 5%

[Figure: (a) Shepp-Logan phantom, (b) prior.]

SLIDES 56-57

Example in computerized tomography

TV regularization with an ℓ2² prior:

    min_{f1} γ‖f0 − f1‖₂² + ‖∇f1‖₁
    subject to ‖A f1 − w‖₂ ≤ κ.

TV regularization with an optimal transport prior:

    min_{f1} γTε(f0, f1) + ‖∇f1‖₁
    subject to ‖A f1 − w‖₂ ≤ κ.

[Figure: (a) Shepp-Logan phantom, (b) prior, (c) TV regularization, (d) TV regularization with ℓ2² prior (γ = 10), (e) TV regularization with optimal transport prior (γ = 4).]

SLIDE 58

Example in computerized tomography

Comparing different regularization parameters for the problem with the ℓ2² prior:

    min_{f1} γ‖f0 − f1‖₂² + ‖∇f1‖₁
    subject to ‖A f1 − w‖₂ ≤ κ.

[Figure: reconstructions using the ℓ2² prior with regularization parameters γ = 1, 10, 100, 1000, and 10 000.]

SLIDES 59-60

Example in computerized tomography

Parallel-beam 2D CT example:
- Reconstruction space: 256 × 256 pixels
- Angles: 30 in [0, π]
- Detector partition: uniform, 350 bins
- Noise level: 3%

[Figure: (a) phantom, (b) prior, (c) TV regularization, (d) TV regularization with ℓ2² prior (γ = 10), (e) TV regularization with optimal transport prior (γ = 4).]

SLIDES 61-62

Conclusions and further work

Conclusions:
- Optimal mass transport is a viable framework for imaging applications.
- Generalized Sinkhorn iterations compute the proximal operator of the optimal transport cost.
- Variable splitting is used for solving the inverse problem.
- Application to CT reconstruction using optimal transport priors.

Potential future directions:
- Application to spatiotemporal image reconstruction:

      arg min_{f0, ..., fT ∈ X} Σ_{j=0}^{T} ( G(A(fj), gj) + λF(fj) ) + Σ_{j=1}^{T} γH(fj−1, fj)

- More efficient ways of solving the dual problem?
- Learning for inverse problems using optimal transport as a loss function.

SLIDE 63

Thank you for your attention!

Questions?
