

slide-1
SLIDE 1

Optimization and curve fitting on manifolds

Pierre-Antoine Absil (Dept. of Mathematical Engineering, UCLouvain)

Journée conjointe des GDR MIA et ISIS, Optimisation Géométrique sur les Variétés, 21st November 2014

1

slide-2
SLIDE 2

Optimization on Manifolds in one picture

[Figure: a manifold M and a smooth cost function f : M → R.]

2

slide-3
SLIDE 3

Optimization on Manifolds in one picture

[Figure: a manifold M, a cost function f : M → R, and a point x ∈ M.]

3

slide-4
SLIDE 4

A book

http://press.princeton.edu/titles/8586.html

Optimization Algorithms on Matrix Manifolds, by P.-A. Absil, R. Mahony, and R. Sepulchre, Princeton University Press, January 2008.

  • 1. Introduction
  • 2. Motivation and applications
  • 3. Matrix manifolds: first-order geometry
  • 4. Line-search algorithms
  • 5. Matrix manifolds: second-order geometry
  • 6. Newton’s method
  • 7. Trust-region methods
  • 8. A constellation of superlinear algorithms

4

slide-5
SLIDE 5

A toolbox

http://www.manopt.org/

Ref: Nicolas Boumal et al, Manopt, a Matlab toolbox for optimization on manifolds, JMLR 15(Apr) 1455-1459, 2014.

5

slide-6
SLIDE 6

Optimization on manifolds: an introduction Motivation and problem formulation

Optimization on Manifolds in one picture

[Figure: a manifold M, a cost function f : M → R, and a point x ∈ M.]

6

slide-7
SLIDE 7

Optimization on manifolds: an introduction Motivation and problem formulation

Why general manifolds? – Motivating examples

Given A = A^T ∈ R^{n×n} and N = diag(p, p−1, . . . , 1):

◮ min f(X) = −trace(X^T A X N), subject to X ∈ R^{n×p} with X^T X = I.
  Feasible set: the Stiefel manifold St(p, n) = {X ∈ R^{n×p} : X^T X = I},
  an embedded submanifold of R^{n×p}.

◮ min f(Y) = −trace((Y^T Y)^{−1} (Y^T A Y)), subject to Y ∈ R^{n×p} full rank.
  Since f(YM) = f(Y) for every invertible M ∈ R^{p×p}, the feasible set is really
  the Grassmann manifold Gr(p, n) = {{YM : M ∈ R^{p×p} invertible} : Y ∈ R^{n×p} full rank},
  a quotient manifold.
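The two cost functions above are easy to evaluate numerically. Below is a minimal Python/NumPy sketch (the random test data and helper names are illustrative, not from the slides) that evaluates both costs and checks the invariance f(YM) = f(Y) that motivates the Grassmann quotient.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 3
A = rng.standard_normal((n, n)); A = (A + A.T) / 2      # A = A^T
N = np.diag(np.arange(p, 0, -1))                         # N = diag(p, ..., 1)

def f_stiefel(X):
    # f(X) = -trace(X^T A X N), to be minimized over X^T X = I
    return -np.trace(X.T @ A @ X @ N)

def f_grassmann(Y):
    # f(Y) = -trace((Y^T Y)^{-1} Y^T A Y), Y full rank
    return -np.trace(np.linalg.solve(Y.T @ Y, Y.T @ A @ Y))

X, _ = np.linalg.qr(rng.standard_normal((n, p)))          # a point on St(p, n)
Y = rng.standard_normal((n, p))                           # a full-rank point
M = rng.standard_normal((p, p)) + 3 * np.eye(p)           # an invertible p x p matrix
print(f_stiefel(X))
print(f_grassmann(Y), f_grassmann(Y @ M))                 # equal up to rounding: f(YM) = f(Y)
```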

7

slide-8
SLIDE 8

Optimization on manifolds: an introduction Specific manifolds

Optimization on Manifolds in one picture

[Figure: a manifold M, a cost function f : M → R, and a point x ∈ M.]

8

slide-9
SLIDE 9

Optimization on manifolds: an introduction Specific manifolds

Specific manifolds, and where they appear

◮ Stiefel manifold St(p, n) and orthogonal group O_n = St(n, n):
  St(p, n) = {X ∈ R^{n×p} : X^T X = I_p}.
  Applications: computer vision; principal component analysis; independent component analysis...

◮ Grassmann manifold Gr(p, n): the set of all p-dimensional subspaces of R^n.
  Applications: various dimension reduction problems...

◮ Set of fixed-rank PSD matrices S+(p, n), with the quotient representation
  X ∼ Y ⇔ ∃Q ∈ O_p : Y = XQ.
  Applications: low-rank approximation of symmetric matrices; algorithms for (large-scale) semidefinite programming...

9

slide-10
SLIDE 10

Optimization on manifolds: an introduction Specific manifolds

Specific manifolds, and where they appear

◮ Low-rank manifold R^{m×n}_{rk p} = {M ∈ R^{m×n} : rk(M) = p}.
  Applications: dimensionality reduction; model for matrix completion...

◮ Shape manifold O_n \ R^{n×p}_*, with Y ∼ X ⇔ ∃U ∈ O_n : Y = UX.
  Applications: shape analysis...

◮ Oblique manifold R^{n×p}_* / S_{diag+} ≃ {Y ∈ R^{n×p} : diag(Y^T Y) = I_p}.
  Applications: blind source separation; factor analysis (oblique Procrustes problem)...

10

slide-11
SLIDE 11

Optimization on manifolds: an introduction Mathematical background

Smooth optimization problems on general manifolds

[Figure: f : M → R, a point x ∈ M, and two overlapping charts ϕ : U → R^d and ψ : V → R^d. Is f ∈ C^∞ at x? Yes iff f ∘ ϕ^{−1} ∈ C^∞ at ϕ(x); this is chart-independent because the transition maps ψ ∘ ϕ^{−1} and ϕ ∘ ψ^{−1} are C^∞ between ϕ(U ∩ V) and ψ(U ∩ V).]

11

slide-12
SLIDE 12

Optimization on manifolds: an introduction Mathematical background

Optimization on manifolds in its most abstract formulation

[Figure: f : M → R, a point x ∈ M, and two overlapping charts ϕ : U → R^d and ψ : V → R^d. Is f ∈ C^∞ at x? Yes iff f ∘ ϕ^{−1} ∈ C^∞ at ϕ(x); this is chart-independent because the transition maps ψ ∘ ϕ^{−1} and ϕ ∘ ψ^{−1} are C^∞ between ϕ(U ∩ V) and ψ(U ∩ V).]

Given:

◮ A set M endowed (explicitly or implicitly) with a manifold structure (i.e., a collection of compatible charts).

◮ A function f : M → R, smooth in the sense of the manifold structure.

Task: compute a local minimizer of f.

12

slide-13
SLIDE 13

Optimization on manifolds: an introduction Algorithms on abstract manifolds

Algorithms formulated on abstract manifolds

◮ Steepest descent. Needs: Riemannian structure and retraction.

◮ Newton. Needs: affine connection and retraction.

◮ Conjugate gradients. Needs: Riemannian structure, retraction, and vector transport.

◮ BFGS. Needs: Riemannian structure, retraction, and vector transport.

◮ Trust region. Needs: Riemannian structure and retraction.

13

slide-14
SLIDE 14

Optimization on manifolds: an introduction Algorithms on abstract manifolds

Steepest descent on abstract manifolds

Required: Riemannian manifold M; retraction R on M.

Iteration x_k ∈ M → x_{k+1} ∈ M defined by:

  1. Compute the steepest-descent direction in T_{x_k}M:  η_k = −grad f(x_k).
  2. Set x_{k+1} := R_{x_k}(t_k η_k), where t_k is chosen using a line-search method.

[Figure: one step from x to x+ along the retracted curve t ↦ R_x(−t grad f(x)).]

14
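As an illustration of this scheme, here is a minimal Python/NumPy sketch of Riemannian steepest descent on the unit sphere S^{n−1} for f(x) = x^T A x, using the normalization retraction R_x(η) = (x + η)/‖x + η‖ and a simple Armijo backtracking line search. The test matrix, tolerances, and line-search constants are illustrative choices, not taken from the slides.

```python
import numpy as np

def steepest_descent_sphere(A, x0, tol=1e-8, max_iter=500):
    x = x0 / np.linalg.norm(x0)
    for _ in range(max_iter):
        egrad = 2 * A @ x                        # Euclidean gradient of x^T A x
        grad = egrad - (x @ egrad) * x           # project onto T_x S^{n-1}
        if np.linalg.norm(grad) < tol:
            break
        eta, t = -grad, 1.0
        f = x @ A @ x
        while True:                              # Armijo backtracking line search
            x_new = x + t * eta
            x_new /= np.linalg.norm(x_new)       # retraction R_x(t*eta)
            if x_new @ A @ x_new <= f - 1e-4 * t * (grad @ grad) or t < 1e-12:
                break
            t /= 2
        x = x_new
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6)); A = (A + A.T) / 2
x = steepest_descent_sphere(A, rng.standard_normal(6))
print(x @ A @ x, np.linalg.eigvalsh(A)[0])       # typically converges to the smallest eigenvalue
```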

slide-15
SLIDE 15

Optimization on manifolds: an introduction Algorithms on abstract manifolds

Newton on abstract manifolds

Required: Riemannian manifold M; retraction R on M; affine connection ∇ on M; real-valued function f on M.

Iteration x_k ∈ M → x_{k+1} ∈ M defined by:

  1. Solve the Newton equation  Hess f(x_k) η_k = −grad f(x_k)  for the unknown η_k ∈ T_{x_k}M,
     where Hess f(x_k) η_k := ∇_{η_k} grad f.
  2. Set x_{k+1} := R_{x_k}(η_k).

15

slide-16
SLIDE 16

Optimization on manifolds: an introduction Algorithms on abstract manifolds

Newton on submanifolds of Rn

Required: Riemannian submanifold M of R^n; retraction R on M; real-valued function f on M.

Iteration x_k ∈ M → x_{k+1} ∈ M defined by:

  1. Solve the Newton equation  Hess f(x_k) η_k = −grad f(x_k)  for the unknown η_k ∈ T_{x_k}M,
     where Hess f(x_k) η_k := P_{T_{x_k}M} D grad f(x_k)[η_k].
  2. Set x_{k+1} := R_{x_k}(η_k).

16

slide-17
SLIDE 17

Optimization on manifolds: an introduction Algorithms on abstract manifolds

Newton on the unit sphere Sn−1

Required: real-valued function f on S^{n−1}.

Iteration x_k ∈ S^{n−1} → x_{k+1} ∈ S^{n−1} defined by:

  1. Solve the Newton equation
       P_{x_k} D(grad f)(x_k)[η_k] = −grad f(x_k),   x_k^T η_k = 0,
     for the unknown η_k ∈ R^n, where P_{x_k} = I − x_k x_k^T.
  2. Set x_{k+1} := (x_k + η_k) / ‖x_k + η_k‖.

17

slide-18
SLIDE 18

Optimization on manifolds: an introduction Algorithms on abstract manifolds

Newton for Rayleigh quotient optimization on unit sphere

Iteration x_k ∈ S^{n−1} → x_{k+1} ∈ S^{n−1} defined by:

  1. Solve the Newton equation
       P_{x_k} A P_{x_k} η_k − η_k (x_k^T A x_k) = −P_{x_k} A x_k,   x_k^T η_k = 0,
     for the unknown η_k ∈ R^n, where P_{x_k} = I − x_k x_k^T.
  2. Set x_{k+1} := (x_k + η_k) / ‖x_k + η_k‖.
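A minimal Python/NumPy sketch of this iteration for f(x) = x^T A x on S^{n−1}. The Newton equation is solved here by expressing it in an orthonormal basis Q of the tangent space T_{x_k}S^{n−1}; that reduction, the fixed iteration count, and the test data are illustrative choices, not prescribed by the slide.

```python
import numpy as np

def newton_rayleigh_sphere(A, x0, n_iter=10):
    x = x0 / np.linalg.norm(x0)
    for _ in range(n_iter):
        rho = x @ A @ x
        # Orthonormal basis Q of T_x S^{n-1} = {eta : x^T eta = 0}
        Qfull, _ = np.linalg.qr(x.reshape(-1, 1), mode="complete")
        Q = Qfull[:, 1:]
        # Reduced Newton system: (Q^T A Q - rho I) y = -Q^T A x, with eta = Q y
        y = np.linalg.solve(Q.T @ A @ Q - rho * np.eye(Q.shape[1]), -Q.T @ A @ x)
        eta = Q @ y
        x = (x + eta) / np.linalg.norm(x + eta)   # retraction, as in the slide's update
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6)); A = (A + A.T) / 2
x = newton_rayleigh_sphere(A, rng.standard_normal(6))
print(x @ A @ x)              # an eigenvalue of A (locally quadratic convergence)
print(np.linalg.eigvalsh(A))  # compare with the spectrum
```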

18

slide-19
SLIDE 19

Optimization on manifolds: an introduction Algorithms on abstract manifolds

Conjugate Gradients on abstract manifolds

Require: Riemannian manifold M; vector transport T on M with associated retraction R; real-valued function f on M; initial iterate x_0 ∈ M.

1: Set η_0 = −grad f(x_0).
2: for k = 0, 1, 2, . . . do
3:   Compute a step size α_k and set x_{k+1} = R_{x_k}(α_k η_k).   (1)
4:   Compute β_{k+1} and set η_{k+1} = −grad f(x_{k+1}) + β_{k+1} T_{α_k η_k}(η_k).   (2)
5: end for

Fletcher-Reeves:  β_{k+1} = ⟨grad f(x_{k+1}), grad f(x_{k+1})⟩ / ⟨grad f(x_k), grad f(x_k)⟩.
Polak-Ribière:    β_{k+1} = ⟨grad f(x_{k+1}), grad f(x_{k+1}) − T_{α_k η_k}(grad f(x_k))⟩ / ⟨grad f(x_k), grad f(x_k)⟩.

Ref: PAA et al [AMS08, §8.3], Sato & Iwai [SI13].
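A minimal Python/NumPy sketch of this scheme on S^{n−1} for f(x) = x^T A x, with the Fletcher-Reeves β, the normalization retraction, and orthogonal projection onto the new tangent space as the vector transport. The line-search constants and test data are illustrative, not from the slides.

```python
import numpy as np

def cg_sphere(A, x0, n_iter=200, tol=1e-10):
    proj = lambda x, v: v - (x @ v) * x          # projection onto T_x S^{n-1}
    x = x0 / np.linalg.norm(x0)
    g = proj(x, 2 * A @ x)                       # Riemannian gradient
    eta = -g
    for _ in range(n_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha, f = 1.0, x @ A @ x                # backtracking line search for alpha_k
        while True:
            x_new = x + alpha * eta
            x_new /= np.linalg.norm(x_new)       # retraction
            if x_new @ A @ x_new <= f + 1e-4 * alpha * (g @ eta) or alpha < 1e-14:
                break
            alpha /= 2
        g_new = proj(x_new, 2 * A @ x_new)
        beta = (g_new @ g_new) / (g @ g)         # Fletcher-Reeves beta
        eta = -g_new + beta * proj(x_new, eta)   # transported previous direction
        x, g = x_new, g_new
    return x

rng = np.random.default_rng(3)
A = rng.standard_normal((20, 20)); A = (A + A.T) / 2
x = cg_sphere(A, rng.standard_normal(20))
print(x @ A @ x, np.linalg.eigvalsh(A)[0])
```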

19

slide-20
SLIDE 20

Optimization on manifolds: an introduction Algorithms on abstract manifolds

BFGS on abstract manifolds

1: Given: Riemannian manifold M with Riemannian metric g; vector transport T on M with
   associated retraction R; smooth real-valued function f on M; initial iterate x_0 ∈ M;
   initial Hessian approximation B_0.
2: for k = 0, 1, 2, . . . do
3:   Obtain η_k ∈ T_{x_k}M by solving B_k η_k = −grad f(x_k).
4:   Compute a step size α_k and set x_{k+1} = R_{x_k}(α_k η_k).
5:   Define s_k = T_{α_k η_k}(α_k η_k) and y_k = grad f(x_{k+1}) − T_{α_k η_k}(grad f(x_k)).
6:   Define the linear operator B_{k+1} : T_{x_{k+1}}M → T_{x_{k+1}}M by
       B_{k+1} p = B̃_k p − [g(s_k, B̃_k p) / g(s_k, B̃_k s_k)] B̃_k s_k + [g(y_k, p) / g(y_k, s_k)] y_k   for all p ∈ T_{x_{k+1}}M,   (3)
     with B̃_k = T_{α_k η_k} ∘ B_k ∘ (T_{α_k η_k})^{−1}.   (4)
7: end for

Ref: Qi et al [QGA10], Ring & Wirth [RW12].

20

slide-21
SLIDE 21

Optimization on manifolds: an introduction Algorithms on abstract manifolds

Trust region on abstract manifolds

[Figure: at the current iterate y ∈ M, a model m_y is minimized over (a ball in) the tangent space T_yM, giving η ∈ T_yM and the next iterate y+.]

Refs: PAA et al [ABG07], Huang et al [HAG14].

21

slide-22
SLIDE 22

Optimization on manifolds: an introduction A brief history

Optimization on Manifolds in one picture

[Figure: a manifold M, a cost function f : M → R, and a point x ∈ M.]

22

slide-23
SLIDE 23

Optimization on manifolds: an introduction A brief history

Some classics on Optimization On Manifolds (I)

[Figure: a line-search step on M from x to x+, with f : M → R.]

Luenberger (1973), Introduction to linear and nonlinear programming. Luenberger mentions the idea of performing line search along geodesics, “which we would use if it were computationally feasible (which it definitely is not)”.

23

slide-24
SLIDE 24

Optimization on manifolds: an introduction A brief history

Some classics on Optimization On Manifolds (II)

Gabay (1982), Minimizing a differentiable function over a differential manifold. Steepest descent along geodesics; Newton's method along geodesics; quasi-Newton methods along geodesics.

Smith (1994), Optimization techniques on Riemannian manifolds. Levi-Civita connection ∇; Riemannian exponential; parallel translation. But Remark 4.9: if Algorithm 4.7 (Newton's iteration on the sphere for the Rayleigh quotient) is simplified by replacing the exponential update with the update x_{k+1} = (x_k + η_k) / ‖x_k + η_k‖, then we obtain the Rayleigh quotient iteration.

24

slide-25
SLIDE 25

Optimization on manifolds: an introduction A brief history

Some classics on Optimization On Manifolds (III)

Manton (2002), Optimization algorithms exploiting unitary constraints. "The present paper breaks with tradition by not moving along geodesics." The geodesic update Exp_x(η) is replaced by a projective update π(x + η), the projection of the point x + η onto the manifold.

Adler, Dedieu, Shub, et al. (2002), Newton's method on Riemannian manifolds and a geometric model for the human spine. The exponential update is relaxed to the general notion of retraction. The geodesic can be replaced by any (smoothly prescribed) curve tangent to the search direction.

25

slide-26
SLIDE 26

Optimization on manifolds: an introduction A brief history

Optimization on Manifolds in one picture

[Figure: a manifold M, a cost function f : M → R, and a point x ∈ M.]

26

slide-27
SLIDE 27

Application: curve fitting

An Application: Curve Fitting

27

slide-28
SLIDE 28

Application: curve fitting

Sources

◮ Nonparametric curve fitting on manifolds:

  ◮ Chafik Samir, PAA, Anuj Srivastava, Eric Klassen, A gradient-descent method for curve fitting on Riemannian manifolds, Foundations of Computational Mathematics, 12(1), pp. 49-73, 2012.

  ◮ Nicolas Boumal, PAA, Discrete regression methods on the cone of positive-definite matrices, ICASSP 2011.

  ◮ Nicolas Boumal, PAA, A discrete regression method on manifolds and its application to data on SO(n), IFAC World Congress 2011.

◮ Parametric curve fitting on manifolds (see Pierre-Yves's talk):

  ◮ C. Samir, P. Van Dooren, D. Laurent, K. A. Gallivan, PAA, Elastic morphing of 2D and 3D objects on a shape manifold, Lecture Notes in Computer Science, Volume 5627/2009, pp. 563-572, 2009.

  ◮ Pierre-Yves Gousenbourger, Chafik Samir, PAA, Piecewise-Bézier C1 interpolation on Riemannian manifolds with application to 2D shape morphing, ICPR 2014.

  ◮ Antoine Arnould, Pierre-Yves Gousenbourger, Chafik Samir, PAA, Fitting Smooth Paths on Riemannian Manifolds: Endometrial Surface Reconstruction and Preoperative MRI-Based Navigation, submitted.

28

slide-29
SLIDE 29

Application: curve fitting Motivation and problem formulation

Curve fitting on manifolds

[Figure: a curve γ : [0, 1] → M with γ(t0), γ(t1), γ(t2) close to the data points p0, p1, p2; fitting energy E : Γ → R.]

29

slide-30
SLIDE 30

Application: curve fitting Motivation and problem formulation

Curve fitting on manifolds: application to morphing

[Figure: a curve γ in a shape manifold (morphing between shapes); fitting energy E : Γ → R.]

30

slide-31
SLIDE 31

Application: curve fitting Motivation and problem formulation

Curve fitting on manifolds: possible applications

[Figure: a curve γ : [0, 1] → M with γ(t0), γ(t1), γ(t2) close to the data points p0, p1, p2; fitting energy E : Γ → R.]

Applications in noise reduction, resampling, and trajectory generation.

◮ Evolution of the paleomagnetic north pole, as in Jupp and Kent [JK87]: M = S², the sphere.

◮ Rigid body motion: M = SE(3), the special Euclidean group.

◮ Diffusion-tensor MRI: M = S_{++}(3), the set of all 3 × 3 symmetric positive-definite matrices.

◮ Morphing: M is a shape manifold.

◮ ...

31

slide-32
SLIDE 32

Application: curve fitting Motivation and problem formulation

Curve fitting on manifolds: problem considered

[Figure: a curve γ : [0, 1] → M with γ(t0), γ(t1), γ(t2) close to the data points p0, p1, p2; fitting energy E : Γ → R.]

Given: a Riemannian manifold M; data points p_0, . . . , p_N ∈ M; parameter values 0 = t_0 < · · · < t_N = 1.

Goal: find the curve γ : [0, 1] → M that minimizes

  E_2 : Γ_2 → R :  E_2(γ) = E_d(γ) + λ E_{s,2}(γ) = (1/2) Σ_{i=0}^{N} d²(γ(t_i), p_i) + (λ/2) ∫_0^1 ⟨D²γ/dt², D²γ/dt²⟩ dt,

where Γ_2 is the Sobolev space H²([0, 1], M).

32

slide-33
SLIDE 33

Application: curve fitting Previous work

Previous work

Machado and Silva Leite [ML06, Mac06] consider

  E_2 : Γ_2 → R :  E_2(γ) = (1/2) Σ_{i=0}^{N} d²(γ(t_i), p_i) + (λ/2) ∫_0^1 ⟨D²γ/dt², D²γ/dt²⟩ dt,

and obtain the Euler-Lagrange equations (stationarity conditions). On each subinterval,

  D⁴γ/dt⁴ + R(D²γ/dt², γ̇) γ̇ = 0,

and at the knot points,

  D^kγ/dt^k(t_i^+) − D^kγ/dt^k(t_i^−) =
    0,                              k = 0, 1  (i = 1, . . . , N−1),
    0,                              k = 2     (i = 0, . . . , N),
    (1/λ) Exp^{−1}_{γ(t_i)}(p_i),   k = 3     (i = 0, . . . , N),

with D²γ/dt²(t_0^−) = D³γ/dt³(t_0^−) = D²γ/dt²(t_N^+) = D³γ/dt³(t_N^+) = 0.

33

slide-34
SLIDE 34

Application: curve fitting Previous work

Gradient-descent for discretized E2

Objective:  E_2(γ) = (1/2) Σ_{i=0}^{N} d²(γ(t_i), p_i) + (λ/2) ∫_0^1 ⟨D²γ/dt², D²γ/dt²⟩ dt.

◮ Finite differences in R^n:

  ẍ_0 = [2/(Δt_f + Δt_b)] · [1/(Δt_f Δt_b)] · [Δt_b (x_f − x_0) + Δt_f (x_b − x_0)] + O(Δt)   (5)

34

slide-35
SLIDE 35

Application: curve fitting Previous work

Gradient-descent for discretized E2

Objective:  E_2(γ) = (1/2) Σ_{i=0}^{N} d²(γ(t_i), p_i) + (λ/2) ∫_0^1 ⟨D²γ/dt², D²γ/dt²⟩ dt.

◮ Finite differences in R^n:

  ẍ_0 = [2/(Δt_f + Δt_b)] · [1/(Δt_f Δt_b)] · [Δt_b (x_f − x_0) + Δt_f (x_b − x_0)] + O(Δt)   (5)

◮ Finite differences on a manifold:

  ẍ_0 ≈ [2/(Δt_f + Δt_b)] · [1/(Δt_f Δt_b)] · [Δt_b Log_{x_0}(x_f) + Δt_f Log_{x_0}(x_b)]   (6)

35

slide-36
SLIDE 36

Application: curve fitting Previous work

Gradient-descent for discretized E2

Objective:  E_2(γ) = (1/2) Σ_{i=0}^{N} d²(γ(t_i), p_i) + (λ/2) ∫_0^1 ⟨D²γ/dt², D²γ/dt²⟩ dt.

◮ Finite differences in R^n:

  ẍ_0 = [2/(Δt_f + Δt_b)] · [1/(Δt_f Δt_b)] · [Δt_b (x_f − x_0) + Δt_f (x_b − x_0)] + O(Δt)   (5)

◮ Finite differences on a manifold:

  ẍ_0 ≈ [2/(Δt_f + Δt_b)] · [1/(Δt_f Δt_b)] · [Δt_b Log_{x_0}(x_f) + Δt_f Log_{x_0}(x_b)]   (6)

◮ Discretized E_2:

  Ê_2 : M^{N_d} → R :  Ê_2(γ) = (1/2) Σ_{i=0}^{N} d²(p_i, γ_{s_i}) + (λ/2) Σ_{i=1}^{N_d} β_i ‖a_i‖²_{γ_i}
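A minimal Python/NumPy sketch, on M = S², of the discretized energy above: the data-attachment term uses the geodesic distance, and the acceleration a_i at each interior point of the discrete curve is obtained from the geometric finite difference (6) on a uniform grid. The quadrature weights β_i = Δt, the test curve, and the data points are illustrative choices, not from the slides.

```python
import numpy as np

def sphere_log(x, y):
    # Log_x(y): tangent vector at x pointing toward y, with norm d(x, y)
    c = np.clip(x @ y, -1.0, 1.0)
    u = y - c * x
    nu = np.linalg.norm(u)
    return np.zeros_like(x) if nu < 1e-15 else np.arccos(c) * u / nu

def sphere_dist(x, y):
    return np.arccos(np.clip(x @ y, -1.0, 1.0))

def discretized_E2(gamma, data, s_idx, dt, lam):
    # gamma: (Nd, 3) discrete curve on S^2; data: (N+1, 3) points p_i;
    # s_idx: indices such that gamma[s_idx[i]] approximates gamma(t_i).
    E_fit = 0.5 * sum(sphere_dist(p, gamma[s]) ** 2 for p, s in zip(data, s_idx))
    E_smooth = 0.0
    for i in range(1, len(gamma) - 1):           # acceleration via formula (6), uniform grid
        a = (sphere_log(gamma[i], gamma[i + 1]) + sphere_log(gamma[i], gamma[i - 1])) / dt**2
        E_smooth += dt * (a @ a)                 # beta_i = dt as a simple quadrature weight
    return E_fit + 0.5 * lam * E_smooth

# Tiny usage example: a geodesic discrete curve and three data points on it.
Nd, dt, lam = 21, 1.0 / 20, 1e-2
ts = np.linspace(0, np.pi / 2, Nd)
gamma = np.stack([np.cos(ts), np.sin(ts), np.zeros_like(ts)], axis=1)
data = np.array([[1.0, 0, 0], [np.sqrt(0.5), np.sqrt(0.5), 0], [0, 1.0, 0]])
print(discretized_E2(gamma, data, [0, 10, 20], dt, lam))   # close to zero for this geodesic
```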

36

slide-37
SLIDE 37

Application: curve fitting Previous work

Illustrations on the sphere

Objective:  E_2(γ) = (1/2) Σ_{i=0}^{N} d²(γ(t_i), p_i) + (λ/2) ∫_0^1 ⟨D²γ/dt², D²γ/dt²⟩ dt.

[Figure: fitting curves on the sphere S² for three values of the smoothing parameter: λ = 10^{−4}, λ = 10^{−3}, λ = 10^{0}.]

37

slide-38
SLIDE 38

Application: curve fitting Parametric curve fitting

Curve fitting on manifolds

[Figure: a curve γ : [0, 1] → M with γ(t0), γ(t1), γ(t2) close to the data points p0, p1, p2; fitting energy E : Γ → R.]

38

slide-39
SLIDE 39

Application: curve fitting Parametric curve fitting

Curve fitting on manifolds: application to morphing

[Figure: a curve γ in a shape manifold (morphing between shapes); fitting energy E : Γ → R.]

39

slide-40
SLIDE 40

Application: curve fitting Parametric curve fitting

Polynomial interpolation on manifolds

◮ Polynomial interpolation reminder: given (t_0, x_0), . . . , (t_n, x_n) in R^d with distinct t_k, there is one and only one polynomial p_n of degree at most n such that p_n(t_k) = x_k, k = 0, . . . , n.

40

slide-41
SLIDE 41

Application: curve fitting Parametric curve fitting

Polynomial interpolation on manifolds

◮ Polynomial interpolation reminder: given (t_0, x_0), . . . , (t_n, x_n) in R^d with distinct t_k, there is one and only one polynomial p_n of degree at most n such that p_n(t_k) = x_k, k = 0, . . . , n.

◮ p_n(t) can be computed with Neville's algorithm, based on the formula

    P_{i,j}(t) = P_{i,j−1}(t) + [(t − t_i)/(t_j − t_i)] (P_{i+1,j}(t) − P_{i,j−1}(t)),   (7)

  where P_{i,j} stands for the polynomial of degree at most j − i that interpolates (t_i, x_i), . . . , (t_j, x_j). We have p_n(t) = P_{0,n}(t).

41

slide-42
SLIDE 42

Application: curve fitting Parametric curve fitting

Polynomial interpolation on manifolds

◮ Polynomial interpolation reminder: given (t_0, x_0), . . . , (t_n, x_n) in R^d with distinct t_k, there is one and only one polynomial p_n of degree at most n such that p_n(t_k) = x_k, k = 0, . . . , n.

◮ p_n(t) can be computed with Neville's algorithm, based on the formula

    P_{i,j}(t) = P_{i,j−1}(t) + [(t − t_i)/(t_j − t_i)] (P_{i+1,j}(t) − P_{i,j−1}(t)),   (7)

  where P_{i,j} stands for the polynomial of degree at most j − i that interpolates (t_i, x_i), . . . , (t_j, x_j). We have p_n(t) = P_{0,n}(t).

◮ When x_0, . . . , x_n are on a manifold, (7) readily generalizes to

    P_{i,j}(t) = Exp_{P_{i,j−1}(t)} ( [(t − t_i)/(t_j − t_i)] Log_{P_{i,j−1}(t)}(P_{i+1,j}(t)) ).
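A minimal Python/NumPy sketch of this geometric Neville recursion on M = S², using the sphere's exponential and logarithm maps; the data points and parameter values below are illustrative.

```python
import numpy as np

def sphere_exp(x, v):
    nv = np.linalg.norm(v)
    return x if nv < 1e-15 else np.cos(nv) * x + np.sin(nv) * v / nv

def sphere_log(x, y):
    c = np.clip(x @ y, -1.0, 1.0)
    u = y - c * x
    nu = np.linalg.norm(u)
    return np.zeros_like(x) if nu < 1e-15 else np.arccos(c) * u / nu

def neville_sphere(ts, xs, t):
    # After each level, P[i] holds P_{i, i+level}(t); the recursion is
    # P_{i,j}(t) = Exp_{P_{i,j-1}(t)}( [(t - t_i)/(t_j - t_i)] Log_{P_{i,j-1}(t)}(P_{i+1,j}(t)) )
    P = [x.copy() for x in xs]
    n = len(xs) - 1
    for level in range(1, n + 1):
        for i in range(n - level + 1):
            j = i + level
            w = (t - ts[i]) / (ts[j] - ts[i])
            P[i] = sphere_exp(P[i], w * sphere_log(P[i], P[i + 1]))
    return P[0]

ts = [0.0, 0.5, 1.0]
xs = [np.array([1.0, 0, 0]), np.array([0.0, 1, 0]), np.array([0.0, 0, 1])]
print(neville_sphere(ts, xs, 0.25))   # a point on S^2 between the first two data points
print(neville_sphere(ts, xs, 0.5))    # recovers xs[1] (up to rounding)
```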

42

slide-43
SLIDE 43

Application: curve fitting Parametric curve fitting

Piecewise-polynomial interpolation on manifolds

◮ Polynomial interpolation in R^d is prone to the Runge phenomenon.

43

slide-44
SLIDE 44

Application: curve fitting Parametric curve fitting

Piecewise-polynomial interpolation on manifolds

◮ Polynomial interpolation in R^d is prone to the Runge phenomenon.

◮ Polynomial interpolation on manifolds is also prone to a Runge-like effect!

44

slide-45
SLIDE 45

Application: curve fitting Parametric curve fitting

Piecewise-polynomial interpolation on manifolds

◮ Polynomial interpolation in R^d is prone to the Runge phenomenon.

◮ Polynomial interpolation on manifolds is also prone to a Runge-like effect!

◮ Remedy: piecewise-polynomial interpolation on manifolds.

45

slide-46
SLIDE 46

Application: curve fitting Parametric curve fitting

Piecewise-polynomial interpolation on manifolds

◮ Polynomial interpolation in R^d is prone to the Runge phenomenon.

◮ Polynomial interpolation on manifolds is also prone to a Runge-like effect!

◮ Remedy: piecewise-polynomial interpolation on manifolds.

◮ See Pierre-Yves Gousenbourger's talk later today.

46

slide-47
SLIDE 47

Conclusion

Optimization on Manifolds in one picture

[Figure: a manifold M, a cost function f : M → R, and a point x ∈ M.]

47

slide-48
SLIDE 48

Conclusion

[ABG07] P.-A. Absil, C. G. Baker, and K. A. Gallivan, Trust-region methods on Riemannian manifolds, Found. Comput. Math. 7 (2007), no. 3, 303-330.

[AMS08] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, Princeton, NJ, 2008.

[HAG14] Wen Huang, P.-A. Absil, and K. A. Gallivan, A Riemannian symmetric rank-one trust-region method, Mathematical Programming (2014), accepted for publication.

[JK87] Peter E. Jupp and John T. Kent, Fitting smooth paths to spherical data, J. Roy. Statist. Soc. Ser. C 36 (1987), no. 1, 34-46.

[Mac06] Luís Miguel Faustino Machado, Least squares problems on Riemannian manifolds, Ph.D. thesis, Department of Mathematics, University of Coimbra, 2006.

48

slide-49
SLIDE 49

Conclusion

[ML06] Luís Machado and F. Silva Leite, Fitting smooth paths on Riemannian manifolds, Int. J. Appl. Math. Stat. 4 (2006), no. J06, 25-53.

[QGA10] Chunhong Qi, Kyle A. Gallivan, and P.-A. Absil, An efficient BFGS algorithm for Riemannian optimization, Proceedings of the 19th International Symposium on Mathematical Theory of Networks and Systems (MTNS 2010), 2010, pp. 2221-2227.

[RW12] Wolfgang Ring and Benedikt Wirth, Optimization methods on Riemannian manifolds and their application to shape space, SIAM J. Optim. 22 (2012), no. 2, 596-627.

[SI13] Hiroyuki Sato and Toshihiro Iwai, Convergence analysis for the Riemannian conjugate gradient method, 2013, arXiv:1302.0125.

49