Tutorial on Gradient methods for non-convex problems, Part 1


slide-1
SLIDE 1

Tutorial on Gradient methods for non-convex problems

Part 1

Guillaume Garrigos – November 28th – ENS

slide-2
SLIDE 2

What can we expect?

  • Does my algorithm converge? Does $x_\infty := \lim_{k \to +\infty} x_k$ exist?
  • What is the nature of the limit $x_\infty$?

Global/local minimum? Saddle?

slide-4
SLIDE 4

General results

Proposition. Let $f : \mathbb{R}^n \to \mathbb{R}$ be of class $C^{1,1}_L$ and consider

$x_{k+1} = x_k - \lambda_k \nabla f(x_k)$

Let $0 \ll \lambda_k \ll 2/L$, then:
  1) $f(x_k)$ is decreasing
  2) if $x_{k_n} \to x_\infty$ then $\nabla f(x_\infty) = 0$
  3) Isolated local minima are attractive

[Prop. 1.2.3, 1.2.5 & Ex. 1.2.18] Bertsekas, Nonlinear Programming, 1999.

$x_k$ can have no limit! No convergence ≠ lack of regularity, but rather wildness
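Item 1) of the proposition can be checked numerically. The sketch below is only an illustration: the test function $f(x) = \cos(x) + x^2/10$ is a hypothetical choice (not from the slides), picked because $|f''| \leq 1.2$, so $L = 1.2$ and the constant step 1.5 lies in $]0, 2/L[$.

```python
import math

# Hypothetical non-convex test function (an assumption, not from the slides).
def f(x):
    return math.cos(x) + x * x / 10.0

def grad_f(x):
    return -math.sin(x) + x / 5.0

L = 1.2      # |f''(x)| = |-cos(x) + 0.2| <= 1.2
step = 1.5   # constant step size in ]0, 2/L[

def gradient_descent(x0, n_iter=200):
    """Return the iterates of x_{k+1} = x_k - step * grad_f(x_k)."""
    xs = [x0]
    for _ in range(n_iter):
        xs.append(xs[-1] - step * grad_f(xs[-1]))
    return xs

xs = gradient_descent(0.5)
values = [f(x) for x in xs]
print("f(x_0) =", values[0], " f(x_200) =", values[-1])
print("|f'(x_200)| =", abs(grad_f(xs[-1])))
```

The values $f(x_k)$ decrease along the run and the final gradient is numerically zero, even though the iterates overshoot and oscillate around the limit.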

slide-5
SLIDE 5

General results

$f : \mathbb{R}^n \to \mathbb{R}$ is of class $C^{1,1}_L$

$x_{k+1} = x_k - \lambda_k \nabla f(x_k)$

[Ex. 3] Palis, de Melo, Geometric Theory of Dynamical Systems: An Introduction, 1982.
H. B. Curry, The method of steepest descent for nonlinear minimization problems, 1944.

slide-8
SLIDE 8

How to guarantee convergence?

  • A sufficient condition for $x(t)$ to converge is $\int_0^\infty \|\dot{x}(t)\| \, dt < \infty$
  • It is a classic result that `finite length' implies convergence
  • The converse is not true (but tricky): $x_n := \sum_{k=1}^{n} \frac{(-1)^k}{k} \to -\log(2)$ but $\sum_n |x_{n+1} - x_n| = \sum_n \frac{1}{n+1} = +\infty$
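The counter-example to the converse can be observed numerically. A minimal sketch, with helper names (`partial_sums`, `path_length`) that are hypothetical: the partial sums of the alternating harmonic series approach $-\log(2)$, while the accumulated length of the path keeps growing like $\log(n)$.

```python
import math

def partial_sums(n):
    """Return x_1, ..., x_n with x_m = sum_{k=1}^{m} (-1)^k / k."""
    xs, total = [], 0.0
    for k in range(1, n + 1):
        total += (-1) ** k / k
        xs.append(total)
    return xs

def path_length(xs):
    """Sum of |x_{m+1} - x_m| along the sequence."""
    return sum(abs(b - a) for a, b in zip(xs, xs[1:]))

for n in (10**3, 10**4, 10**5):
    xs = partial_sums(n)
    # Distance to the limit shrinks, the length does not stabilise.
    print(n, abs(xs[-1] + math.log(2)), path_length(xs))
```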

slide-12
SLIDE 12

How to guarantee convergence?

  • A sufficient condition for $x(t)$ to converge is $\int_0^\infty \|\dot{x}(t)\| \, dt < \infty$
  • Length is invariant up to a reparametrization in time
  • We have a natural diffeomorphism $f \circ x : [0, \infty[ \to ]s_\infty, s_0]$ where $s_0 = f(x_0)$ and $s_\infty = \lim_{t \to \infty} f(x(t))$
  • With $s = f(x(t))$ we can define $y(s) = x((f \circ x)^{-1}(s))$ s.t. $\dot{y}(s) = \nabla f(y(s)) \, \|\nabla f(y(s))\|^{-2}$
  • So the length becomes $\int_{s_\infty}^{s_0} \frac{1}{\|\nabla f(y(s))\|} \, ds$

Finite interval! (Ignore the case $\nabla f(y(s)) = 0$.)
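The change of variables can be written out step by step. This is a sketch under two assumptions: $x$ solves the gradient flow $\dot{x}(t) = -\nabla f(x(t))$, and $\nabla f(x(t)) \neq 0$ along the trajectory.

```latex
\begin{align*}
\frac{d}{dt} f(x(t)) &= \langle \nabla f(x(t)), \dot{x}(t) \rangle = -\|\nabla f(x(t))\|^2 < 0, \\
s = f(x(t)) \ &\Rightarrow\ \frac{dt}{ds} = -\frac{1}{\|\nabla f(y(s))\|^2}, \qquad y(s) = x(t(s)), \\
\dot{y}(s) &= \dot{x}(t)\,\frac{dt}{ds} = \frac{\nabla f(y(s))}{\|\nabla f(y(s))\|^2}, \\
\int_0^\infty \|\dot{x}(t)\|\,dt &= \int_{s_\infty}^{s_0} \|\dot{y}(s)\|\,ds
  = \int_{s_\infty}^{s_0} \frac{ds}{\|\nabla f(y(s))\|}.
\end{align*}
```

The infinite time horizon is traded for the bounded interval $]s_\infty, s_0]$, at the price of an integrand that may blow up near critical points.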

slide-17
SLIDE 17

How to guarantee convergence?

  • How to upper bound $\int_0^\infty \|\dot{x}(t)\| \, dt = \int_{s_\infty}^{s_0} \frac{1}{\|\nabla f(y(s))\|} \, ds$ ?
  • ``Naive'' hypothesis: $\|\nabla f(y)\| \geq C$ i.e. sharpness
  • ``Smart'' hypothesis: $\frac{1}{\|\nabla f(y(s))\|} \leq \varphi'(s)$ with $\varphi \geq 0$, $\varphi \uparrow$, so the length is $\leq \varphi(s_0) - \varphi(s_\infty) \leq \varphi(s_0)$
  • In other words $\varphi'(f(x(t))) \, \|\nabla f(x(t))\| \geq 1$ i.e. $\varphi \circ f$ is sharp: $\|\nabla (\varphi \circ f)(x)\| \geq 1$

slide-18
SLIDE 18

The Łojasiewicz property

Definition. We say that $f$ is Łojasiewicz at a critical point $x^*$ if $\varphi'(f(x) - f(x^*)) \, \|\nabla f(x)\| \geq 1$,
  • with $\varphi : [0, \infty[ \to [0, \infty[$ s.t. $\varphi(0) = 0$, $\varphi \uparrow$, $\varphi$ concave
  • for all $x \in \{ x' \in \mathbb{B}(x^*, \delta) \mid f(x^*) < f(x') < f(x^*) + r \}$

Definition.
  • $f$ is Łojasiewicz if it is Łojasiewicz at every critical point
  • $f$ is p-Łojasiewicz if it is Łojasiewicz at every critical point with $\varphi(s) \simeq s^{1/p}$ : $\mu (f(x) - f(x^*))^{p-1} \leq \|\nabla f(x)\|^p$
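A small sanity check of the definition on a hypothetical example, $f(x) = |x|^p$ with critical point $x^* = 0$. Under this assumption the p-Łojasiewicz inequality holds with equality for $\mu = p^p$, and $\varphi(s) = s^{1/p}$ makes $\varphi \circ f = |x|$ exactly sharp. The function names below are illustrative only.

```python
import math

def both_sides(x, p):
    """Return (mu * (f(x) - f(0))^(p-1), |f'(x)|^p) for f(x) = |x|^p, mu = p^p."""
    f_val = abs(x) ** p
    grad_norm = p * abs(x) ** (p - 1)
    return (p ** p) * f_val ** (p - 1), grad_norm ** p

def sharpness(x, p):
    """Return phi'(f(x)) * |f'(x)| for phi(s) = s^(1/p) and f(x) = |x|^p."""
    f_val = abs(x) ** p
    phi_prime = (1.0 / p) * f_val ** (1.0 / p - 1.0)
    return phi_prime * p * abs(x) ** (p - 1)

for p in (2, 3, 4):
    for x in (0.1, 0.5, 2.0):
        lhs, rhs = both_sides(x, p)
        print(p, x, lhs, rhs, sharpness(x, p))
```

Both sides of the inequality coincide up to rounding, and the sharpness value stays at 1 however flat $f$ is near 0.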

slide-19
SLIDE 19

The Łojasiewicz property : convergence

$x_{k+1} = x_k - \lambda_k \nabla f(x_k)$

Theorem (convergence). Let $f$ be Łojasiewicz and $\lambda_k \in ]0, 2/L[$. If $x_k$ is bounded, then it converges to some critical point $x_\infty$.

Theorem (capture). Let $f$ be Łojasiewicz and $\lambda_k \in ]0, 2/L[$. For every $x^* \in \operatorname{argmin} f$, if $x_0 \sim x^*$ then $x_k$ converges to $x_\infty \in \operatorname{argmin} f$.

Łojasiewicz, Sur les trajectoires du gradient d'une fonction analytique, 1984.
Absil, Mahony, Andrews, Convergence of the Iterates of Descent Methods for Analytic Cost Functions, 2005.

slide-26
SLIDE 26

The Łojasiewicz property : convergence

Sketch of proof : show that $\varphi'(s) \geq \|\dot{x}(t)\|$

$\varphi(f(x_k) - f(x^*)) - \varphi(f(x_{k+1}) - f(x^*))$
  $\geq \varphi'(f(x_k) - f(x^*)) \, (f(x_k) - f(x_{k+1}))$   because $\varphi$ concave
  $\geq \varphi'(f(x_k) - f(x^*)) \, c_{\lambda,L} \|x_{k+1} - x_k\|^2$   with Descent Lemma
  $= \varphi'(f(x_k) - f(x^*)) \, C_{\lambda,L} \|x_{k+1} - x_k\| \, \|\nabla f(x_k)\|$
  $\geq 1 \cdot C_{\lambda,L} \|x_{k+1} - x_k\|$   with the Łojasiewicz inequality
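Summing the chain of inequalities over $k$ gives the finite-length estimate. A sketch, writing $\Delta_k := \varphi(f(x_k) - f(x^*))$ (a shorthand introduced here), with $C_{\lambda,L}$ the constant of the last line of the proof sketch, and assuming the iterates stay in the region where the Łojasiewicz inequality holds:

```latex
\begin{align*}
C_{\lambda,L} \sum_{k=0}^{K} \|x_{k+1} - x_k\|
  &\leq \sum_{k=0}^{K} \big( \Delta_k - \Delta_{k+1} \big)
   = \Delta_0 - \Delta_{K+1}
   \leq \varphi\big(f(x_0) - f(x^*)\big), \\
\text{hence}\quad \sum_{k=0}^{\infty} \|x_{k+1} - x_k\| &< \infty ,
  \quad\text{so } (x_k) \text{ is a Cauchy sequence and converges.}
\end{align*}
```

This is the discrete counterpart of the bound "length $\leq \varphi(s_0)$" obtained for the continuous trajectory.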

slide-28
SLIDE 28

The Łojasiewicz property : in practice

Examples
  • If $f$ is $\mu$-strongly convex, then it is 2-Łojasiewicz with constant $\mu$
  • $f$ convex with growth $\mu \, d(x, \operatorname{argmin} f)^p \leq f(x) - \inf f$

Counter-example
  • There is a convex function $f : \mathbb{R}^2 \to \mathbb{R}$ which is not Łojasiewicz

Bolte, Daniilidis, Ley, Mazet, Characterizations of Łojasiewicz inequalities […], 2010.

slide-29
SLIDE 29

The Łojasiewicz property : in practice

Theorem. Any analytic function is p-Łojasiewicz at its critical points.

Theorem.
  • Any semi-algebraic function is p-Łojasiewicz at its critical points.
  • Any o-minimal function is Łojasiewicz.

Łojasiewicz, Ensembles semi-analytiques, 1965.
Kurdyka, On gradients of functions definable in o-minimal structures, 1998.
Bolte, Daniilidis, Lewis, Shiota, Clarke Subgradients of Stratifiable Functions, 2007.

slide-32
SLIDE 32

The Łojasiewicz property : in practice

Examples of semi-algebraic functions
  • Piecewise polynomials
  • $\alpha \|x\|_0 + \|Ax - b\|^2$, $\|x\|_*$, …

Theorem (``Tarski-Seidenberg''). The class of semi-algebraic functions is stable under:
  • addition, multiplication, division, sup, inf
  • restriction, composition, inverse $f^{-1}$
  • derivative

Coste, An Introduction to O-minimal Geometry, 2000.

slide-34
SLIDE 34

The Łojasiewicz property : in practice

Counter-examples of semi-algebraic functions
  • Exponential/Logarithmic stuff

Theorem. There exists a class of functions (o-minimal structure) which:
  • Includes the semi-algebraic structure
  • Contains the exponential function
  • Has the same stability properties as the semi-algebraic functions
  • Is also stable under integration (and resolution of 1st order ODEs)

Speissegger, The Pfaffian closure of an o-minimal structure, 1999.

slide-37
SLIDE 37

The Łojasiewicz property : in practice

Take-home message: Virtually any function you can think about is Łojasiewicz, as long as it does not involve $\mathbb{R} \to \mathbb{R}$, $x \mapsto \sin(x)$

So gradient descent ``always converges'' to a critical point

slide-38
SLIDE 38

The Łojasiewicz property : rates for free

Theorem (p=2). Let $f$ be globally 2-Łojasiewicz, $\lambda \in ]0, 2/L[$, and $x_k \to x^*$. Then we have linear convergence : $f(x_{k+1}) - f(x^*) \leq \theta \, (f(x_k) - f(x^*))$ where $\theta \in [0, 1[$, and $\theta = 1 - \mu/L$ if $\lambda = 1/L$.

  • If $f$ strongly convex we have $\theta = (1 - \kappa)^2$ for $\lambda = 1/L$.
  • If $f$ is [any weak s. convex notion] we have a better $\theta$.
  • Rates become asymptotic if local Łojasiewicz only.

Polyak, Gradient methods for the minimisation of functionals, 1963.
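The linear rate can be observed on a non-convex example. The sketch below uses $f(x) = x^2 + 3\sin^2(x)$, which is assumed here (it is not in the slides) to satisfy the 2-Łojasiewicz inequality globally; its minimum is $f(0) = 0$, and $|f''| \leq 8$ gives $L = 8$. The code only reports the observed contraction ratios and does not try to recover the constant $\theta$ of the theorem.

```python
import math

# Assumed example of a non-convex, globally 2-Lojasiewicz function.
def f(x):
    return x * x + 3.0 * math.sin(x) ** 2

def grad_f(x):
    return 2.0 * x + 3.0 * math.sin(2.0 * x)

L = 8.0          # |f''(x)| = |2 + 6 cos(2x)| <= 8
x = 3.0
values = [f(x)]  # f(x_k) - f(x^*), since f(x^*) = f(0) = 0
for _ in range(60):
    x -= grad_f(x) / L   # step size lambda = 1/L
    values.append(f(x))

# Ratios f(x_{k+1}) / f(x_k), skipping iterates already at rounding level.
ratios = [b / a for a, b in zip(values, values[1:]) if a > 1e-12]
worst_ratio = max(ratios)
print("worst observed ratio:", worst_ratio)
print("final value:", values[-1])
```

Every ratio stays below 1, which is the pattern $f(x_{k+1}) - f(x^*) \leq \theta (f(x_k) - f(x^*))$ predicted by the theorem.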
slide-39
SLIDE 39

The Łojasiewicz property : rates for free

Theorem (p>2). Let $f$ be globally p-Łojasiewicz, $\lambda \in ]0, 2/L[$, and $x_k \to x^*$. Then we have sublinear convergence : $f(x_k) - f(x^*) = O\left(k^{-\frac{p}{p-2}}\right)$

  • $\frac{p}{p-2} \to +\infty$ when $p \downarrow 2$ ; $\frac{p}{p-2} \to 1$ when $p \uparrow \infty$
  • Rates are matched for $f(x) = x^p$

Attouch, Bolte, On the convergence of the proximal algorithm for nonsmooth functions […], 2009.
Chouzenoux, Pesquet, Repetti, A block coordinate variable metric forward-backward algorithm, 2014.
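The claim that the rate is matched for $f(x) = x^p$ can be illustrated with $p = 4$, where the predicted exponent is $-p/(p-2) = -2$. A minimal sketch: the step size 0.1 and the starting point $x_0 = 1$ are assumptions chosen so that the iteration is stable (this $f$ is only locally of class $C^{1,1}$).

```python
import math

p = 4
step = 0.1   # assumed step size, stable for x_0 = 1

def run(n_iter, x0=1.0):
    """Return f(x_k) - f(x^*) after n_iter gradient steps on f(x) = x^p."""
    x = x0
    for _ in range(n_iter):
        x -= step * p * x ** (p - 1)
    return x ** p   # x^* = 0 and f(x^*) = 0

f_1k, f_10k = run(1_000), run(10_000)
# Slope of log f(x_k) against log k between k = 10^3 and k = 10^4.
slope = (math.log(f_10k) - math.log(f_1k)) / (math.log(10_000) - math.log(1_000))
print("observed exponent:", slope, " predicted:", -p / (p - 2))
```

The observed log-log slope is close to the predicted exponent, which suggests the $O(k^{-p/(p-2)})$ bound cannot be improved in general.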
slide-40
SLIDE 40

How to guarantee convergence?

What about other methods?
slide-42
SLIDE 42

How to guarantee convergence?

  • Nonsmooth 1st-order methods : works the same
  • Projected gradient
  • Forward Backward
  • Douglas Rachford
  • ADMM
  • Adapts to Maximal Monotone theory (saddle point problems)

Attouch, Bolte, Svaiter, Convergence of descent methods for semi-algebraic and tame problems […], 2013.

slide-44
SLIDE 44

How to guarantee convergence?

  • Nonsmooth 1st-order methods : works the same
  • Inertial (2nd order in time) methods :
  • Heavy-Ball ($\alpha(t) \equiv \alpha$): ok
  • Nesterov ($\alpha(t) \sim \alpha/t$) + Monotone: OK for p=2 global

$\ddot{x}(t) + \alpha(t) \dot{x}(t) + \nabla f(x(t)) = 0$

$x_{k+1} = y_k - \lambda \nabla f(y_k)$, $y_k = x_k + \frac{1}{1+\alpha_k} (x_k - x_{k-1})$

Bégout, Bolte, Jendoubi, On damped second-order gradient systems, 2015. + refs within!
Li et al., Convergence Analysis of Proximal Gradient with Momentum for Nonconvex Optimization, 2017.

slide-45
SLIDE 45

How to guarantee convergence?

  • Nonsmooth 1st-order methods : works the same
  • Inertial (2nd order in time) methods : ok
  • Newton-like (2nd order in space) methods: some results for trust-region methods, Landweber iterations. IDK for Newton/BFGS.

Frankel, Garrigos, Peypouquet, Splitting Methods with Variable Metric […], 2015. + Absil et al., and many others.

slide-46
SLIDE 46

How to guarantee convergence?

  • Nonsmooth 1st-order methods : works the same
  • Inertial (2nd order in time) methods : ok for Heavy-Ball
  • Newton-like (2nd order in space) methods: some results for trust-region methods, Landweber iterations. IDK for Newton/BFGS.
  • Stochastic methods: (under global 2-Łojasiewicz)
  • SAGA, SVRG: linear rates
  • SGD: rates $O(1/k)$, and linear if vanishing variance
  • SVRG + monotone Nesterov: linear rates

Reddi, Hefny, Sra, Poczos, Smola, Stochastic variance reduction for nonconvex optimization, 2016.
Karimi, Nutini, Schmidt, Linear Convergence of Gradient and Proximal-Gradient […], 2016.
Lei et al., Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions, 2019.

slide-51
SLIDE 51

What is the limit? The linear case

$A$ symmetric operator, $\dot{x}(t) + A x(t) = 0$

$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, $A = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$, $A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$

  • Positive eigs. are attractive, negative eigs. are repulsive.
  • Converging to the saddle point requires starting from $E_{1}(A)$

slide-52
SLIDE 52

What is the limit? The linear case

$A$ symmetric operator, $\dot{x}(t) + A x(t) = 0$

Definition. Let $\bar{x}$ be an equilibrium of the system. We define $W(\bar{x}) = \{ x \mid x(t) \to \bar{x} \text{ with } x(0) = x \}$

Theorem. $W(\bar{x}) \simeq \oplus_{\lambda > 0} E_\lambda(A)$

Corollary. If $\lambda_{\min}(A) < 0$, then $W(\bar{x})$ has Lebesgue measure 0.
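The linear picture can be reproduced with the exact solution $x(t) = e^{-tA} x(0)$ for a diagonal $A$. A minimal sketch with the saddle case $A = \mathrm{diag}(1, -1)$; the function name `flow` and the chosen starting points are illustrative assumptions.

```python
import math

def flow(x0, t, eigenvalues=(1.0, -1.0)):
    """Exact solution at time t of x'(t) + A x(t) = 0 for A = diag(eigenvalues)."""
    return tuple(math.exp(-lam * t) * xi for lam, xi in zip(eigenvalues, x0))

# Start inside the eigenspace of the positive eigenvalue: attracted to the saddle 0.
on_stable_space = flow((1.0, 0.0), t=20.0)
# Add a tiny component along the negative eigenvalue: repelled from the saddle.
generic_start = flow((1.0, 1e-6), t=20.0)

print(on_stable_space)
print(generic_start)
```

Only initial points with no component along the negative eigendirection converge to the equilibrium, and such points form a line in the plane, hence a set of Lebesgue measure 0.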

slide-54
SLIDE 54

What is the limit? The potential case

$f : \mathbb{R}^n \to \mathbb{R}$ is of class $C^2$, $\dot{x}(t) + \nabla f(x(t)) = 0$

Theorem (Stable Manifold Lemma). $W(\bar{x})$ is a submanifold of dimension smaller than the one of $\oplus_{\lambda > 0} E_\lambda(\nabla^2 f(\bar{x}))$

Corollary. If $\lambda_{\min}(\nabla^2 f(\bar{x})) < 0$, then $W(\bar{x})$ has Lebesgue measure 0.

Perron, Die Stabilitätsfrage bei Differentialgleichungen, 1930.
Smale, Differentiable dynamical systems, 1967.

slide-55
SLIDE 55

What is the limit? The potential case

$f : \mathbb{R}^n \to \mathbb{R}$ is of class $C^2$, $\dot{x}(t) + \nabla f(x(t)) = 0$

The 3 kinds of critical points:
  • The local minima (e.g. $\lambda_{\min}(\nabla^2 f(\bar{x})) > 0$), attractive
  • The strict saddles ($\lambda_{\min}(\nabla^2 f(\bar{x})) < 0$), repulsive
  • The degenerate ones (they have $\lambda_{\min}(\nabla^2 f(\bar{x})) = 0$), ???

slide-57
SLIDE 57

What is the limit? The potential case

$f : \mathbb{R}^n \to \mathbb{R}$ is of class $C^2$, $\ddot{x}(t) + \alpha \dot{x}(t) + \nabla f(x(t)) = 0$

Theorem (Stable Manifold Lemma). $W(\bar{x})$ is a submanifold of dimension smaller than the one of $\oplus_{\lambda > 0} E_\lambda(\nabla^2 f(\bar{x}))$

Corollary. If $\lambda_{\min}(\nabla^2 f(\bar{x})) < 0$, then $W(\bar{x})$ has Lebesgue measure 0.

Goudou, Munier, The gradient and heavy ball with friction dynamical systems: the quasiconvex case, 2007.

slide-58
SLIDE 58

What is the limit? The potential case

$f : \mathbb{R}^n \to \mathbb{R}$ is of class $C^{1,1}_L \cap C^2$, $x_{k+1} = x_k - \lambda \nabla f(x_k)$

Theorem. Let $\lambda \in ]0, 1/L[$. Then $W(\bar{x})$ is a submanifold of dimension smaller than the one of $\oplus_{\lambda > 0} E_\lambda(\nabla^2 f(\bar{x}))$

Corollary. If $\lambda_{\min}(\nabla^2 f(\bar{x})) < 0$, then $W(\bar{x})$ has Lebesgue measure 0.

Corollary. If $f$ has no degenerate critical points and is Łojasiewicz, then $x_k$ converges a.s. to a local minimum with random initialization.

Lee, Simchowitz, Jordan, Recht, Gradient Descent Converges to Minimizers, 2016.
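The last corollary can be illustrated on a toy problem. The sketch below assumes a hypothetical function $f(x, y) = x^2/2 + y^4/4 - y^2/2$ (not from the slides), with a strict saddle at $(0, 0)$ and local minima at $(0, \pm 1)$. On the box $[-1, 1]^2$ its Hessian eigenvalues lie in $[-1, 2]$, so $L = 2$ there and the step 0.1 is below $1/L$.

```python
import random

# Hypothetical test function: f(x, y) = x^2/2 + y^4/4 - y^2/2.
def grad_f(x, y):
    return x, y ** 3 - y

def gradient_descent(x, y, step=0.1, n_iter=2000):
    """Run x_{k+1} = x_k - step * grad f(x_k) and return the last iterate."""
    for _ in range(n_iter):
        gx, gy = grad_f(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y

random.seed(0)
# Random initialisation in the box [-1, 1]^2.
limits = [gradient_descent(random.uniform(-1, 1), random.uniform(-1, 1))
          for _ in range(20)]
print(limits[:3])

# Starting exactly on the stable manifold {y = 0} of the saddle.
saddle_limit = gradient_descent(0.7, 0.0)
print(saddle_limit)
```

All random starts end at one of the two minima, while the saddle is reached only from the line $\{y = 0\}$, which a random initialization hits with probability 0.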

slide-59
SLIDE 59

What is the limit? The potential case

It is time now for examples.

slide-60
SLIDE 60

What is the limit? The potential case

Some problems have no degenerate critical points, like the matrix factorization problem a.k.a. two-layer linear neural network:

$\min_{X \in \mathbb{R}^{d \times r}} f(X) = \|X^\top X - A\|_F^2$

Li et al., Symmetry, saddle points, and global geometry of nonconvex matrix factorization, 2016.

slide-61
SLIDE 61

What is the limit? The potential case

$x_{k+1} = x_k - \lambda \nabla f(x_k) + \xi_k$

Corollary. If $f$ has no degenerate critical points and is Łojasiewicz, then $x_k$ converges a.s. to a local minimum with random initialization.

Corollary. The above result remains true for the noisy gradient method.

The noise here must be isotropic! Not the case for SGD (proportional to eigenvalues), but proof can be adapted for RKHS learning with a loss s.t. $\ell'' = O(\ell')$.

Daneshmand et al., Escaping saddles with stochastic gradients, 2018.

slide-64
SLIDE 64

What can we expect?

  • Does my algorithm converge? Does $x_\infty := \lim_{k \to +\infty} x_k$ exist?
  • What is the nature of the limit $x_\infty$? Global/local minimum? Saddle?

Depends strongly on :
  • What your problem is
  • How you initialize

Yes, this bold statement will be my conclusion

slide-66
SLIDE 66

Any questions ?
