SLIDE 1

MADMM: a generic algorithm for non-smooth optimization on manifolds

Michael Bronstein
Faculty of Informatics, University of Lugano, Switzerland
Perceptual Computing Group, Intel Corporation, Israel

Louvain-la-Neuve, 25 September 2015

SLIDE 2

Application domains, by dimensionality of the data:
2D: image processing, image analysis
3D: geometry processing, shape analysis, computer vision, computer graphics
nD: pattern recognition, machine learning, graph analysis & processing

SLIDE 3

What is manifold optimization?

Manifold (or manifold-constrained) optimization problem
$$\min_{X \in \mathbb{R}^{n \times m}} f(X) \quad \text{s.t.} \quad X \in \mathcal{M}$$
where $f : \mathbb{R}^{n \times m} \to \mathbb{R}$ is a smooth function and $\mathcal{M}$ is a Riemannian submanifold of $\mathbb{R}^{n \times m}$.

Absil et al. 2009

SLIDES 4-8

Applications

Sphere: principal geodesic analysis¹, 1-bit compressed sensing²
Stiefel manifold: eigenvalue, assignment, and Procrustes problems³, orthogonal dictionary learning⁴, binary coding⁵
Product of Stiefel manifolds: functional correspondence⁶, manifold learning⁷, structure-from-motion⁸, sensor localization⁹
Fixed-rank PSD: max-cut problems, sparse PCA¹⁰, matrix completion¹¹, multidimensional scaling¹²
Oblique manifold: ICA¹³, blind source separation¹⁴

¹Zhang, Fletcher 2013; ²Boufounos, Baraniuk 2008; ³Ten Berge 1977; ⁴Sun et al. 2015; ⁵Xia et al. 2015; ⁶Kovnatsky et al. 2013; ⁷Eynard et al. 2015; ⁸Arie-Nachimson et al. 2012; ⁹Cucuringu et al. 2012; ¹⁰Journée et al. 2010; ¹¹Tan et al. 2014; ¹²Cayton, Dasgupta 2006; ¹³Absil, Gallivan 2006; ¹⁴Kleinsteuber, Shen 2012

SLIDES 9-10

Toy example: eigenvalue problem

$$\min_{x \in \mathbb{R}^n} x^\top A x \quad \text{s.t.} \quad x^\top x = 1$$

For symmetric $A$, the minimizer is the eigenvector corresponding to the smallest eigenvalue of $A$.
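As a quick sanity check (my addition, not on the slides), numpy can verify directly that the constrained minimizer is the smallest-eigenvalue eigenvector:

```python
import numpy as np

# Toy eigenvalue problem: min x'Ax s.t. x'x = 1.
# For symmetric A, np.linalg.eigh gives the minimizer directly.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
A = (A + A.T) / 2                              # symmetrize

w, V = np.linalg.eigh(A)                       # eigenvalues in ascending order
x_star = V[:, 0]                               # constrained minimizer
print(np.isclose(x_star @ A @ x_star, w[0]))   # objective = smallest eigenvalue
```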

SLIDES 11-16

Optimization on the manifold: main idea

$$\min_{X \in \mathcal{M}} f(X)$$

where $f : \mathcal{M} \to \mathbb{R}$ is a function on the manifold (a scalar field).

No global system of coordinates
Manifold $\mathcal{M}$ is locally homeomorphic to the tangent space $T_X\mathcal{M}$
Intrinsic gradient $\nabla_{\mathcal{M}} f : \mathcal{M} \to T\mathcal{M}$ such that $f(\text{``}X + dV\text{''}) = f(X) + \langle \nabla_{\mathcal{M}} f(X), dV \rangle_{T_X\mathcal{M}} + O(\|dV\|^2)$
Intrinsic gradient = projection of the extrinsic gradient: $\nabla_{\mathcal{M}} f(X) = P_{T_X\mathcal{M}} \nabla f(X)$
Exponential map $\exp_X : T_X\mathcal{M} \to \mathcal{M}$
Moving vectors on $\mathcal{M}$ requires parallel transport

Absil et al. 2009

SLIDES 17-20

Optimization on the manifold: main idea

[Figure: one iteration on $\mathcal{M}$: the extrinsic gradient $\nabla f(X^{(k)})$ is projected by $P_{X^{(k)}}$ onto the tangent plane $T_{X^{(k)}}\mathcal{M}$, scaled by the step size $\alpha^{(k)}$, and mapped back to the manifold by the retraction $R_{X^{(k)}}$, giving $X^{(k+1)}$]

Absil et al. 2009

SLIDES 21-24

Optimization on the manifold

Algorithm 1: Conceptual algorithm for smooth optimization on $\mathcal{M}$
repeat
    Compute extrinsic gradient $\nabla f(X^{(k)})$
    Projection: $\nabla_{\mathcal{M}} f(X^{(k)}) = P_{X^{(k)}}(\nabla f(X^{(k)}))$
    Compute step size $\alpha^{(k)}$ along the descent direction $-\nabla_{\mathcal{M}} f(X^{(k)})$
    Retraction: $X^{(k+1)} = R_{X^{(k)}}(-\alpha^{(k)} \nabla_{\mathcal{M}} f(X^{(k)}))$
    $k \leftarrow k + 1$
until convergence

Projection and retraction operators are manifold-dependent
Typically expressed in closed form
"Black box": need to provide only $f(X)$ and the gradient $\nabla f(X)$

Absil et al. 2009; Boumal et al. 2014
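To make Algorithm 1 concrete, here is a minimal numpy sketch (my illustration, not the talk's code) for the toy eigenvalue problem, where $\mathcal{M}$ is the unit sphere: the tangent-space projection is $P_x(v) = v - (x^\top v)x$ and a simple retraction is renormalization. Step size and iteration count are heuristic choices:

```python
import numpy as np

def sphere_min_eig(A, x0, alpha=0.01, iters=5000):
    """Algorithm 1 on the unit sphere for f(x) = x'Ax."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        g = 2 * A @ x                      # extrinsic gradient
        g = g - (x @ g) * x                # projection onto tangent space at x
        x = x - alpha * g                  # step along the descent direction
        x = x / np.linalg.norm(x)          # retraction: renormalize to the sphere
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50)); A = (A + A.T) / 2
x = sphere_min_eig(A, rng.standard_normal(50))
print(x @ A @ x, np.linalg.eigh(A)[0][0])  # the two values should nearly agree
```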

SLIDES 25-26

Prototype problem

Non-smooth manifold optimization problem
$$\min_{X \in \mathcal{M}} f(X) + g(AX)$$
$f : \mathbb{R}^{n \times m} \to \mathbb{R}$ is a smooth function
$g : \mathbb{R}^{k \times m} \to \mathbb{R}$ is a non-smooth function
$A$ is a $k \times n$ matrix
$\mathcal{M}$ is a Riemannian submanifold of $\mathbb{R}^{n \times m}$
Typical examples: $g(X) = \|X\|_1$, $\|X\|_{2,1}$, or $\|X\|_*$

Kovnatsky, B, Glashoff 2015

SLIDES 27-28

Prototype problem

Non-smooth manifold optimization problem
$$\min_{X \in \mathcal{M}} f(X) + g(AX)$$

Existing approaches:
Smoothing: approximate
Subgradient: problem dependent
Splitting: problem dependent

Smoothing: Chen 2012; Subgradient: Ferreira, Oliveira 1998; Ledyaev, Zhu 2007; Kleinsteuber, Shen 2012; Splitting: Lai, Osher 2014; Neumann et al. 2014; Rosman et al. 2014

SLIDE 29

Manifold ADMM

SLIDES 30-33

Manifold ADMM

Non-smooth manifold optimization problem
$$\min_{X \in \mathcal{M}} f(X) + g(AX)$$
equivalently written as
$$\min_{X \in \mathcal{M},\, Z} f(X) + g(Z) \quad \text{s.t.} \quad Z = AX$$
introducing an artificial variable $Z$ and a linear constraint.

Apply the method of multipliers only to the constraint $Z = AX$:
$$\min_{X \in \mathcal{M},\, Z} f(X) + g(Z) + \tfrac{\rho}{2}\|AX - Z + U\|_F^2$$
Solve by alternating w.r.t. $X$ and $Z$ and updating $U \leftarrow U + AX - Z$.

The problem breaks into:
a smooth manifold optimization sub-problem w.r.t. $X$, and
a non-smooth unconstrained sub-problem w.r.t. $Z$

Hestenes 1969; Powell 1969; Kovnatsky, Glashoff, B 2015
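For readers who have not seen ADMM in scaled form, the $+U$ inside the quadratic penalty is the scaled dual variable. This is a standard identity (not spelled out on the slide): completing the square in the augmented Lagrangian of the constraint $Z = AX$ gives

$$L_\rho(X, Z, Y) = f(X) + g(Z) + \langle Y, AX - Z \rangle + \tfrac{\rho}{2}\|AX - Z\|_F^2 = f(X) + g(Z) + \tfrac{\rho}{2}\|AX - Z + U\|_F^2 - \tfrac{\rho}{2}\|U\|_F^2,$$

with $U = Y/\rho$. Since the last term does not depend on $X$ or $Z$, minimizing over either variable with $U$ fixed is exactly the objective above, and $U \leftarrow U + AX - Z$ is the dual ascent step.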

SLIDES 34-37

MADMM

Algorithm 2: MADMM for non-smooth optimization on manifold $\mathcal{M}$
Initialize $k \leftarrow 1$, $Z^{(1)} = AX^{(1)}$, $U^{(1)} = 0$
repeat
    X-step: $X^{(k+1)} = \operatorname{argmin}_{X \in \mathcal{M}} f(X) + \tfrac{\rho}{2}\|AX - Z^{(k)} + U^{(k)}\|_F^2$
    Z-step: $Z^{(k+1)} = \operatorname{argmin}_{Z} g(Z) + \tfrac{\rho}{2}\|AX^{(k+1)} - Z + U^{(k)}\|_F^2$
    Update $U^{(k+1)} = U^{(k)} + AX^{(k+1)} - Z^{(k+1)}$
    $k \leftarrow k + 1$
until convergence

Free choices: the solver and the number of inner iterations in the X- and Z-steps
In some problems the X- and Z-steps have closed-form solutions
Parameter $\rho > 0$ can be fixed or adapted

Kovnatsky, Glashoff, B 2015
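A minimal Python skeleton of Algorithm 2 (my sketch; the authors' experiments use Manopt solvers for the X-step). `solve_x_step` and `prox_g` are placeholder callables supplied by the user:

```python
import numpy as np

def madmm(A, X0, solve_x_step, prox_g, rho=1.0, iters=100):
    """Generic MADMM loop (sketch of Algorithm 2).

    solve_x_step(Z, U, rho): any smooth manifold solver for
        argmin_{X in M} f(X) + rho/2 ||A X - Z + U||_F^2
        (e.g. a few iterations of a Manopt-style solver)
    prox_g(V, rho): argmin_Z g(Z) + rho/2 ||V - Z||_F^2
    """
    X = X0
    Z = A @ X0                             # Z^(1) = A X^(1)
    U = np.zeros_like(Z)                   # U^(1) = 0
    for _ in range(iters):
        X = solve_x_step(Z, U, rho)        # smooth manifold sub-problem
        Z = prox_g(A @ X + U, rho)         # non-smooth unconstrained sub-problem
        U = U + A @ X - Z                  # dual update
    return X

def prox_l1(V, rho, mu=1.0):
    """Z-step when g = mu*||.||_1: elementwise soft thresholding."""
    return np.sign(V) * np.maximum(np.abs(V) - mu / rho, 0.0)
```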

SLIDE 38

Compressed modes

SLIDE 39

Laplacian eigenfunctions

The first $k$ eigenfunctions of some Laplacian are used in spectral clustering, dimensionality reduction, and spectral distances.

Ng et al. 2001; Belkin, Niyogi 2001; Coifman, Lafon 2006

SLIDES 40-44

Laplacian eigenfunctions

Find the first $k$ eigenfunctions of an $n \times n$ Laplacian matrix $\Delta$:
$$\min_{\Phi \in \mathbb{R}^{n \times k}} \mathrm{tr}(\Phi^\top \Delta \Phi) \quad \text{s.t.} \quad \Phi^\top \Phi = I$$

$\mathrm{tr}(\Phi^\top \Delta \Phi) = \sum_{ij} w_{ij} \|\phi_i - \phi_j\|_2^2$, a.k.a. Dirichlet energy in physics
Many efficient solvers with global optimality guarantees
1D Euclidean Laplacian eigenfunctions = Fourier basis: $\Delta e^{-i\omega x} = -\omega^2 e^{-i\omega x}$
Globally supported!
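For concreteness, a small numpy sketch (mine) that builds a free-boundary 1D path-graph Laplacian and computes its first eigenfunctions with a globally optimal eigensolver:

```python
import numpy as np

# 1D path-graph Laplacian (free boundary), n points.
n = 100
D = np.diag(2.0 * np.ones(n)); D[0, 0] = D[-1, -1] = 1.0   # degree matrix
W = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # adjacency
L = D - W

w, Phi = np.linalg.eigh(L)       # globally optimal eigen-solver
eigenfunctions = Phi[:, :6]      # first 6 eigenfunctions: sampled cosines,
                                 # supported on the whole interval
```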

SLIDE 45

Laplacian eigenfunctions: 1D example

[Figure: the first eigenfunctions $\phi_1, \ldots, \phi_6$ of a 1D Euclidean Laplacian]

SLIDE 46

Laplacian eigenfunctions: non-Euclidean example

[Figure: the first eigenfunctions of a Laplacian on a triangular mesh, color-coded from min to max]

Neumann et al. 2014

SLIDES 47-50

Compressed modes

$$\min_{\Phi \in \mathbb{R}^{n \times k}} \mathrm{tr}(\Phi^\top \Delta \Phi) + \mu \|\Phi\|_1 \quad \text{s.t.} \quad \Phi^\top \Phi = I$$

Dirichlet energy = smoothness
$L_1$-norm = sparsity
Smoothness + sparsity = localization

Ozoliņš et al. 2013

SLIDE 51

Compressed modes: 1D example

[Figure: the first compressed modes $\phi_1, \ldots, \phi_6$ of a 1D Euclidean Laplacian]

Ozoliņš et al. 2013

SLIDES 52-53

Compressed modes: non-Euclidean example

[Figure: first compressed modes vs. first Laplacian eigenfunctions on a triangular mesh, color-coded from min to max]

Neumann et al. 2014

SLIDE 54

Wannier functions

[Figure: maximally-localized Wannier functions in Si and GaAs crystals]

Wannier 1937; Mostofi 2008

SLIDES 55-57

Splitting method for orthogonality constraints (SOC)

Original problem:
$$\min_{\Phi \in \mathbb{R}^{n \times k}} \mathrm{tr}(\Phi^\top \Delta \Phi) + \mu \|\Phi\|_1 \quad \text{s.t.} \quad \Phi^\top \Phi = I$$

Split form:
$$\min_{\Phi, P, Q \in \mathbb{R}^{n \times k}} \mathrm{tr}(\Phi^\top \Delta \Phi) + \mu \|Q\|_1 \quad \text{s.t.} \quad P = \Phi,\ Q = \Phi,\ P^\top P = I$$

Algorithm 3: SOC method for computing compressed modes
Initialize $k \leftarrow 1$, $\Phi^{(1)}$, $P^{(1)} = Q^{(1)} = \Phi^{(1)}$, $U^{(1)} = V^{(1)} = 0$
repeat
    $\Phi^{(k+1)} = \operatorname{argmin}_{\Phi} \mathrm{tr}(\Phi^\top \Delta \Phi) + \tfrac{\rho}{2}\|\Phi - Q^{(k)} + U^{(k)}\|_F^2 + \tfrac{\rho'}{2}\|\Phi - P^{(k)} + V^{(k)}\|_F^2$
    $Q^{(k+1)} = \operatorname{argmin}_{Q} \mu\|Q\|_1 + \tfrac{\rho}{2}\|\Phi^{(k+1)} - Q + U^{(k)}\|_F^2$
    $P^{(k+1)} = \operatorname{argmin}_{P} \tfrac{\rho'}{2}\|\Phi^{(k+1)} - P + V^{(k)}\|_F^2 \quad \text{s.t.} \quad P^\top P = I$
    $U^{(k+1)} = U^{(k)} + \Phi^{(k+1)} - Q^{(k+1)}$
    $V^{(k+1)} = V^{(k)} + \Phi^{(k+1)} - P^{(k+1)}$
    $k \leftarrow k + 1$
until convergence

Lai, Osher 2014

SLIDES 58-61

Compressed modes as manifold optimization

$$\min_{\Phi \in S(n,k)} \mathrm{tr}(\Phi^\top \Delta \Phi) + \mu \|\Phi\|_1$$

Stiefel manifold $S(n,k) = \{X \in \mathbb{R}^{n \times k} : X^\top X = I\}$

MADMM formulation:
$$\min_{\Phi \in S(n,k),\, Z} \mathrm{tr}(\Phi^\top \Delta \Phi) + \mu \|Z\|_1 + \tfrac{\rho}{2}\|\Phi - Z + U\|_F^2$$

Sub-problem w.r.t. $\Phi$: smooth manifold-constrained minimization of a quadratic function
$$\min_{\Phi \in S(n,k)} \mathrm{tr}(\Phi^\top \Delta \Phi) + \tfrac{\rho}{2}\|\Phi - Z + U\|_F^2$$

Sub-problem w.r.t. $Z$: sparse coding (Lasso) problem
$$\min_{Z} \mu\|Z\|_1 + \tfrac{\rho}{2}\|\Phi + U - Z\|_F^2$$

Kovnatsky, Glashoff, B 2015; Chen et al. 1995; Tibshirani 1996

SLIDE 62

Compressed modes by MADMM

Algorithm 4: MADMM for computing compressed modes
Input: $n \times n$ Laplacian matrix $\Delta$, parameter $\mu > 0$
Output: first $k$ compressed modes of $\Delta$
Initialize $k \leftarrow 1$, $\Phi^{(1)} \leftarrow$ some orthonormal matrix, $Z^{(1)} = \Phi^{(1)}$, $U^{(1)} = 0$
repeat
    $\Phi^{(k+1)} = \operatorname{argmin}_{\Phi \in S(n,k)} \mathrm{tr}(\Phi^\top \Delta \Phi) + \tfrac{\rho}{2}\|\Phi - Z^{(k)} + U^{(k)}\|_F^2$
    $Z^{(k+1)} = \mathrm{Shrink}_{\mu/\rho}(\Phi^{(k+1)} + U^{(k)})$
    Update $U^{(k+1)} = U^{(k)} + \Phi^{(k+1)} - Z^{(k+1)}$
    $k \leftarrow k + 1$
until convergence

where $\mathrm{Shrink}_\alpha(x) = \mathrm{sign}(x)\max\{0, |x| - \alpha\}$ is the (elementwise) shrinkage operator.

Kovnatsky, Glashoff, B 2015
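Below is a self-contained numerical sketch of Algorithm 4 (my illustration, with heuristic parameter values; the paper's experiments solve the X-step with Manopt's solvers). Here the X-step is approximated by a few projected-gradient steps with a QR-based retraction onto the Stiefel manifold, one standard choice:

```python
import numpy as np

def stiefel_retract(X):
    """QR-based retraction onto the Stiefel manifold."""
    Q, R = np.linalg.qr(X)
    d = np.sign(np.diag(R)); d[d == 0] = 1.0
    return Q * d                                      # fix column signs

def x_step(Delta, Z, U, Phi, rho, steps=30, lr=0.05):
    """Inexact X-step: a few projected-gradient iterations for
    min_{Phi'Phi=I} tr(Phi' Delta Phi) + rho/2 ||Phi - Z + U||_F^2."""
    for _ in range(steps):
        G = 2 * Delta @ Phi + rho * (Phi - Z + U)     # extrinsic gradient
        G = G - Phi @ (Phi.T @ G + G.T @ Phi) / 2     # tangent-space projection
        Phi = stiefel_retract(Phi - lr * G)           # retract back to manifold
    return Phi

def compressed_modes(Delta, k=6, mu=0.1, rho=1.0, iters=300, seed=0):
    """MADMM sketch for min tr(Phi' Delta Phi) + mu ||Phi||_1, Phi'Phi = I."""
    n = Delta.shape[0]
    Phi = stiefel_retract(np.random.default_rng(seed).standard_normal((n, k)))
    Z, U = Phi.copy(), np.zeros((n, k))
    for _ in range(iters):
        Phi = x_step(Delta, Z, U, Phi, rho)
        V = Phi + U
        Z = np.sign(V) * np.maximum(np.abs(V) - mu / rho, 0.0)  # shrinkage Z-step
        U = U + Phi - Z                                          # dual update
    return Phi
```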

SLIDE 63

Convergence

[Figure: convergence of MADMM with different random initializations (compressed modes problem of size $n = 500$, $k = 10$); cost vs. time (sec)]

Kovnatsky, Glashoff, B 2015

SLIDE 64

Convergence

[Figure: convergence of MADMM with different X-step solvers (trust regions and conjugate gradients, with 2, 3, or 5 inner iterations) on a compressed modes problem of size $n = 500$, $k = 10$; cost vs. time (sec)]

Kovnatsky, Glashoff, B 2015; Manopt: Boumal et al. 2014

SLIDE 65

Convergence

[Figure: example of convergence of different methods (Lai & Osher, Neumann et al., MADMM) on a compressed modes problem of size $n = 8 \times 10^3$, $k = 10$; cost vs. time (sec)]

Kovnatsky, Glashoff, B 2015; Lai, Osher 2014; Neumann et al. 2014

SLIDE 66

Scalability

[Figure: complexity of different methods (Lai & Osher, Neumann et al., MADMM) on the compressed modes problem, $k = 10$; time per iteration (sec) vs. problem size $n$]

Kovnatsky, Glashoff, B 2015; Lai, Osher 2014; Neumann et al. 2014

SLIDE 67

Functional correspondence

SLIDE 68

Applications of shape correspondence

[Figure: texture mapping and pose transfer examples]

B2, Kimmel 2007; Sumner et al. 2004

SLIDES 69-71

Shape correspondence

[Figure: shapes $S$ and $Q$ with corresponding points $s \mapsto q$ and $s' \mapsto q'$]

Point-wise map $t : S \to Q$
Minimum-distortion point-wise map $t : S \to Q$ (B2, Kimmel 2006)
Functional map: a linear operator $T : F(S) \to F(Q)$ taking functions on $S$ to functions on $Q$ (Ovsjanikov et al. 2012)

SLIDES 72-78

Functional correspondence

[Figure: corresponding functions $f$ on $S$ and $g$ on $Q$, expanded in the respective Laplacian eigenbases: $f \approx a_1\phi_1 + a_2\phi_2 + \cdots + a_k\phi_k$, $g \approx b_1\psi_1 + b_2\psi_2 + \cdots + b_k\psi_k$]

Representation in the truncated Laplacian eigenbasis: $T \approx \Psi C \Phi^\top$
If $T$ is area-preserving, $C^\top C = I$
Represent $C = XY^\top$; then $T \approx \hat{\Psi}\hat{\Phi}^\top = \Psi X (\Phi Y)^\top$ (rotation of bases)
Given known corresponding functions $F = (f_1, \ldots, f_q)$ and $G = (g_1, \ldots, g_q)$ with Fourier coefficients $A = \Phi^\top F$ and $B = \Psi^\top G$, find $C$ by solving the linear system $CA = B$

Ovsjanikov et al. 2012
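Recovering $C$ from $CA = B$ is an ordinary least-squares problem. A short sketch (my illustration, with random stand-in data for $A$ and $B$):

```python
import numpy as np

k, q = 30, 100                      # basis size, number of corresponding functions
rng = np.random.default_rng(0)
A = rng.standard_normal((k, q))     # A = Phi' F: Fourier coefficients on shape S
B = rng.standard_normal((k, q))     # B = Psi' G: Fourier coefficients on shape Q

# CA = B  <=>  A'C' = B': solve column-wise least squares for C.
C = np.linalg.lstsq(A.T, B.T, rcond=None)[0].T
```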

SLIDES 79-80

Functional correspondence in a shape collection

[Figure: a collection of shapes $S_1, S_2, \ldots, S_L$; for each pair $S_i, S_j$ the correspondence constraint $A_{ij}C_{ij} \approx B_{ij}$ is reparametrized as $A_{ij}X_i \approx B_{ij}X_j$ with per-shape basis rotations $X_1, \ldots, X_L$]

Kovnatsky, B2, Glashoff, Kimmel 2013; Kovnatsky, Glashoff, B 2015

SLIDES 81-85

Functional correspondence as manifold optimization

$$\min_{(X_1, \ldots, X_L) \in S^L(k,k)} \sum_{i \neq j} \|A_{ij}X_i - B_{ij}X_j\|_{2,1} + \mu \sum_{i=1}^{L} \mathrm{tr}(X_i^\top \Lambda_i X_i)$$

where $A_{ij}, B_{ij}$ are Fourier coefficients of given corresponding functions on shapes $S_i, S_j$, and $\Lambda_i$ is the diagonal matrix of the first $k$ eigenvalues of the Laplacian $\Delta_{S_i}$.

Joint diagonalization of Laplacians: find new bases $\hat{\Phi}_i = \Phi_i X_i$ that approximately diagonalize $\Delta_i$ and are coupled
Optimization on a product of Stiefel manifolds $S^L(k,k)$
The $L_{2,1}$-norm copes with outliers in the correspondence data
X-step: manifold-constrained minimization of a quadratic function
Z-step: one iteration of shrinkage

Eynard, Kovnatsky, B2, Glashoff 2012; Kovnatsky, Glashoff, B 2015
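The Z-step here is the proximal operator of the $L_{2,1}$-norm, which shrinks entire rows rather than individual entries (this is the standard group-lasso prox; the code is my sketch):

```python
import numpy as np

def shrink_l21(V, t):
    """Prox of t*||.||_{2,1}: shrink each row of V toward zero in l2 norm.
    Rows with norm <= t (e.g. outlier residuals) are zeroed entirely."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return scale * V
```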

SLIDE 86

Correspondence data

[Figure: example of correspondence data, with 10% outliers shown in red]

Kovnatsky, Glashoff, B 2015

SLIDE 87

Correspondence quality

[Figure: correspondences obtained with the robust (MADMM) formulation vs. least squares]

Kovnatsky, Glashoff, B 2015

SLIDE 88

Correspondence quality

[Figure: correspondence quality evaluated using the Princeton protocol; fraction of correspondences within a geodesic error threshold (% of geodesic diameter), LS vs. MADMM]

Kovnatsky, Glashoff, B 2015; data: B2, Kimmel 2008; benchmark: Kim et al. 2011

SLIDE 89

Convergence

[Figure: convergence of different methods: smoothing (curves labeled $10^{-4}$, $10^{-6}$, $10^{-8}$) vs. MADMM; cost vs. time (sec)]

Kovnatsky, Glashoff, B 2015

SLIDES 90-93

Multimodal spectral clustering

[Figure: clustering results, with accuracies per setting as labeled on the slides. Uncoupled, no outliers: 100%; coupled ($L_2$), no outliers: 53%; coupled ($L_2$), 10% outliers: 72%; coupled ($L_{2,1}$), 10% outliers: 82%]

Kovnatsky, Glashoff, B 2015; Eynard, Kovnatsky, B2, Glashoff 2012

SLIDE 94

Multidimensional scaling

SLIDE 95

Multidimensional scaling

[Figure: a squared-distance matrix $D$ and a point configuration $X$]

MDS problem: given an $n \times n$ (squared) distance matrix $D$, find a $k$-dimensional configuration of points $X \in \mathbb{R}^{n \times k}$ such that $\|x_i - x_j\|_2^2 \approx d_{ij}$

Cayton, Dasgupta 2006

SLIDES 96-97

Similarity vs distance

Equivalence between (squared) distances (EDM) and similarities (PSD matrices):
Similarity to distance: $\mathrm{dist}(B)_{ij} = b_{ii} + b_{jj} - 2b_{ij}$
Distance to similarity (double centering): $B = -\tfrac{1}{2}HDH$, where $H = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$ is the double-centering matrix
Nearest PSD matrix: $B^* = U\Lambda_+ U^\top$, keeping only the non-negative eigenvalues

Schönberg 1938; Dattorro 2005; Cayton, Dasgupta 2006
SLIDES 98-99

Classical MDS

Algorithm 5: Classical MDS
Input: squared distance matrix $D$
Compute similarity by double centering: $B = -\tfrac{1}{2}HDH$
Perform eigendecomposition $B = U\Lambda U^\top$; take the largest $k$ positive eigenvalues $\Lambda_k$ and corresponding eigenvectors $U_k$
Output: $X = U_k \Lambda_k^{1/2}$

Classical MDS as an optimization problem: minimize the strain
$$\min_{X \in \mathbb{R}^{n \times k}} \|HDH - XX^\top\|_F^2$$

Young, Householder 1938
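A compact numpy sketch of Algorithm 5 (mine):

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS from a squared distance matrix D (Algorithm 5 sketch)."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n       # double-centering matrix
    B = -0.5 * H @ D @ H                      # similarity matrix
    w, U = np.linalg.eigh(B)                  # ascending eigenvalues
    idx = np.argsort(w)[::-1][:k]             # largest k eigenvalues
    w_k = np.maximum(w[idx], 0.0)             # keep only the positive part
    return U[:, idx] * np.sqrt(w_k)           # X = U_k Lambda_k^{1/2}
```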

SLIDE 100

Sensitivity to outliers

Error dispersion by double centering $B = -\tfrac{1}{2}HDH$: a perturbation of size $\epsilon$ in a single entry of the (squared) distance matrix is spread by $H$ across the similarity matrix, producing errors on the order of $\epsilon$, $\epsilon/n$, and $\epsilon/n^2$ in its entries

Cayton, Dasgupta 2006

SLIDES 101-102

Sensitivity to outliers

[Figure: embeddings of 10 US cities (Seattle, SF, LA, Denver, NY, WDC, Atlanta, Miami, Houston, Chicago) computed with classical MDS: from clean distances, and with the distance between NY and LA doubled]

Kruskal, Wish 1978; Cayton, Dasgupta 2006

SLIDES 103-106

Robust Euclidean embedding (REE)

Minimize a robust norm (instead of the Frobenius norm):
$$\min_{D^* \in \mathrm{EDM}} \|D - D^*\|_1$$
and then recover the $k$-dimensional configuration $X$ from $D^*$ using classical MDS.

Non-smooth
Can be formulated as a semi-definite program (SDP), or
Solved by subgradient minimization

Cayton, Dasgupta 2006

SLIDE 107

REE as manifold optimization

$$\min_{B \in S_+(n,k)} \|D - \mathrm{dist}(B)\|_1$$

Manifold of fixed-rank positive semi-definite matrices $S_+(n,k) = \{X \in \mathbb{R}^{n \times n} : X = X^\top \succeq 0,\ \mathrm{rank}(X) = k\}$
Only the non-smooth term is present ($f \equiv 0$)
X-step: manifold-constrained minimization of a quadratic function
Z-step: one iteration of shrinkage

Kovnatsky, Glashoff, B 2015

SLIDE 108

REE by MADMM

Algorithm 6: MADMM for solving the REE problem
Input: squared distance matrix $D$
Initialize $k \leftarrow 1$, $Z^{(1)} = \mathrm{dist}(B^{(1)}) - D$, $U^{(1)} = 0$
repeat
    X-step: $B^{(k+1)} = \operatorname{argmin}_{B \in S_+(n,k)} \|\mathrm{dist}(B) - Z^{(k)} - D + U^{(k)}\|_F^2$
    Z-step: $Z^{(k+1)} = \mathrm{Shrink}_{1/\rho}(\mathrm{dist}(B^{(k+1)}) - D + U^{(k)})$
    Update $U^{(k+1)} = U^{(k)} + \mathrm{dist}(B^{(k+1)}) - D - Z^{(k+1)}$
    $k \leftarrow k + 1$
until convergence

Kovnatsky, Glashoff, B 2015
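Both the $\mathrm{dist}$ operator and the Z-step are one-liners in numpy; a sketch (mine) of the pieces used by Algorithm 6:

```python
import numpy as np

def dist(B):
    """Map a Gram (similarity) matrix B to squared distances:
    dist(B)_ij = b_ii + b_jj - 2 b_ij."""
    d = np.diag(B)
    return d[:, None] + d[None, :] - 2 * B

def shrink(X, t):
    """Elementwise soft thresholding, the prox of t*||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

# Z-step and dual update of Algorithm 6, given current B, U and data D:
# Z = shrink(dist(B) - D + U, 1.0 / rho)
# U = U + dist(B) - D - Z
```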

SLIDE 109

Robust Euclidean embedding example

[Figure: ground truth, classical MDS, and MADMM embeddings of distances between 500 US cities corrupted by sparse noise (doubling the distance between a few pairs of cities)]

Kovnatsky, Glashoff, B 2015

SLIDE 110

Scalability of REE

[Figure: complexity of different methods (SDP, subgradient, MADMM) for the REE problem; time per iteration (sec) vs. problem size $n$]

Kovnatsky, Glashoff, B 2015; Cayton, Dasgupta 2006

SLIDE 111

Convergence

[Figure: convergence of different methods on an REE problem of size $n = 500$: subgradient (curves labeled $10^{-2}$, $10^{-3}$, $10^{-4}$, $10^{-5}$) vs. MADMM; stress vs. time (sec)]

Kovnatsky, Glashoff, B 2015; Cayton, Dasgupta 2006

SLIDE 112

Conclusions

Non-smooth manifold optimization problems are ubiquitous in machine learning, pattern recognition, signal processing, and computer graphics applications
MADMM is a generic algorithm for such problems: any manifold, any function
Very simple to implement
No parameters to tune

A. Kovnatsky, K. Glashoff, M. M. Bronstein, "MADMM: a generic algorithm for non-smooth optimization on manifolds", arXiv:1505.07676, May 2015

SLIDE 113

A. Kovnatsky

Funded by [funding-agency logos]

SLIDE 114

Thank you!