Convergence of Cubic Regularization for Nonconvex Optimization under Łojasiewicz Property



SLIDE 1

Convergence of Cubic Regularization for Nonconvex Optimization under Łojasiewicz Property

SLIDE 2

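Cubic-regularization (CR)

$$\min_{x \in \mathbb{R}^d} f(x)$$

CR: $$x_{k+1} \in \operatorname*{argmin}_{y} \; \langle y - x_k, \nabla f(x_k) \rangle + \frac{1}{2} (y - x_k)^\top \nabla^2 f(x_k) (y - x_k) + \frac{M}{6} \|y - x_k\|^3$$

  • Converges to a 2nd-order stationary point (Nesterov’06)
  • Escapes strict-saddle points
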
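A single CR update can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation. It assumes the update minimizes the cubic model $\langle s, \nabla f(x_k) \rangle + \frac{1}{2} s^\top \nabla^2 f(x_k) s + \frac{M}{6}\|s\|^3$ over the step $s = y - x_k$, and it solves that subproblem by an eigendecomposition plus bisection on the step norm. The names `cr_step` and `f_grad_hess`, the toy objective, the starting point, and the value `M = 10` are all illustrative assumptions, and the so-called hard case of the subproblem is deliberately left out.

```python
import numpy as np

def cr_step(grad, hess, M):
    """Return s minimizing <grad, s> + 0.5 s^T hess s + (M/6) ||s||^3.

    Generic case only: the "hard case" (grad orthogonal to the eigenvector
    of the smallest eigenvalue) is not handled in this sketch.
    """
    lam, Q = np.linalg.eigh(hess)      # hess = Q diag(lam) Q^T, lam ascending
    g = Q.T @ grad                     # gradient in the eigenbasis

    def phi(r):
        # ||s(r)|| - r, where s(r) solves (hess + (M r / 2) I) s = -grad
        return np.linalg.norm(g / (lam + 0.5 * M * r)) - r

    # Smallest r keeping hess + (M r / 2) I positive definite, plus a margin.
    lo = max(0.0, -2.0 * lam[0] / M) + 1e-10
    if phi(lo) <= 0.0:
        r = lo
    else:
        hi = lo + 1.0
        while phi(hi) > 0.0:           # phi is decreasing, so this terminates
            hi *= 2.0
        for _ in range(200):           # bisection on the step norm r
            mid = 0.5 * (lo + hi)
            if phi(mid) > 0.0:
                lo = mid
            else:
                hi = mid
        r = hi
    return Q @ (-g / (lam + 0.5 * M * r))

def f_grad_hess(x):
    """Gradient and Hessian of the toy f(x) = x1^2/2 + x2^4/4 - x2^2/2."""
    grad = np.array([x[0], x[1] ** 3 - x[1]])
    hess = np.diag([1.0, 3.0 * x[1] ** 2 - 1.0])
    return grad, hess

x = np.array([0.5, 1e-3])              # start close to the strict saddle (0, 0)
for _ in range(50):
    grad, hess = f_grad_hess(x)
    x = x + cr_step(grad, hess, M=10.0)
print(x)
```

The toy objective has a strict saddle at the origin and minimizers at $(0, \pm 1)$, so the run illustrates the saddle-escape bullet: the iterates leave the neighborhood of the saddle along the negative-curvature direction and settle at a minimizer.
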
SLIDE 3

Motivation and Contribution

  • General nonconvex optimization
      • global sublinear convergence (Nesterov’06)
  • Nonconvex + local geometry
      • gradient dominance (Nesterov’06): super-linear convergence
      • error bound (Yue’18): quadratic convergence
      • limited function class
  • Our contributions
      • general Łojasiewicz property

SLIDE 4

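Łojasiewicz Property

  • Satisfied by a large function class:
      • analytic functions, polynomials, exp-log functions, etc.
      • ML examples: Lasso, phase retrieval, blind deconvolution, etc.

Definition (Łojasiewicz Property). Let $f$ take a constant value $f^*$ on a compact set $\Omega$. There exist $\varepsilon, \lambda > 0$ such that for all $x \in \{x : \operatorname{dist}(x, \Omega) < \varepsilon,\ f^* < f(x) < f^* + \lambda\}$,

$$f(x) - f^* \le C \, \|\nabla f(x)\|^{\theta},$$

where $C > 0$ is a constant and $\theta \in (1, +\infty]$ is the Łojasiewicz exponent.
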
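To get a feel for the exponent, one can check it on the power functions $f(x) = |x|^p$. The sketch below assumes the property takes the form $f(x) - f^* \le C\,\|\nabla f(x)\|^{\theta}$, which is an assumption consistent with the exponent range $\theta \in (1, +\infty]$ used on the later slides. Under that form, $|x|^p$ with $f^* = 0$ has $\theta = p/(p-1)$. The helper name `lojasiewicz_ratio` and the sample points are hypothetical.

```python
import numpy as np

def lojasiewicz_ratio(p, xs):
    """For f(x) = |x|^p (f* = 0), return theta = p/(p-1) and f / |f'|^theta."""
    theta = p / (p - 1.0)
    f = np.abs(xs) ** p
    grad_norm = p * np.abs(xs) ** (p - 1.0)
    return theta, f / grad_norm ** theta

xs = np.array([1e-3, 1e-2, 1e-1, 1.0])
results = {p: lojasiewicz_ratio(p, xs) for p in (2.0, 3.0, 4.0)}
for p, (theta, ratio) in results.items():
    # The ratio is the same at every x, so C = p**(-theta) works in the inequality.
    print(f"p={p}: theta={theta:.4f}, ratio={ratio}")
```

Larger $p$ means a flatter minimum and an exponent closer to 1, while $p = 2$ gives $\theta = 2$ and $p = 3$ gives $\theta = 3/2$.
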
SLIDE 5

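Convergence to 2nd-order Stationary Point

| Łojasiewicz exponent $\theta$ | Convergence rate | Type |
|---|---|---|
| $\theta = +\infty$ | $\mu(x_k) = 0$ | finite-step |
| $\theta \in (\tfrac{3}{2}, +\infty)$ | $\mu(x_k) \le \Theta\big(\exp(-(2(\theta-1))^{k-k_0})\big)$ | super-linear |
| $\theta = \tfrac{3}{2}$ | $\mu(x_k) \le \Theta\big(\exp(-(k-k_0))\big)$ | linear |
| $\theta \in (1, \tfrac{3}{2})$ | $\mu(x_k) \le \Theta\big((k-k_0)^{(\cdot)}\big)$ | sub-linear |

The geometry ranges from sharp ($\theta = +\infty$) to flat ($\theta \to 1$).
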
SLIDE 6

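Convergence of Function Value

| Łojasiewicz exponent $\theta$ | Convergence rate |
|---|---|
| $\theta = +\infty$ | $f(x_k) - f^* = 0$ |
| $\theta \in (\tfrac{3}{2}, +\infty)$ | $f(x_k) - f^* \le \Theta\big(\exp(-(\cdot)^{k-k_0})\big)$ |
| $\theta = \tfrac{3}{2}$ | $f(x_k) - f^* \le \Theta\big(\exp(-(k-k_0))\big)$ |
| $\theta \in (1, \tfrac{3}{2})$ | $f(x_k) - f^* \le \Theta\big((k-k_0)^{(\cdot)}\big)$ |
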
SLIDE 7

Convergence of Variable Sequence

Theorem. Assume $f$ satisfies the Łojasiewicz property. Then the sequence $\{x_k\}$ generated by CR is absolutely summable:

$$\sum_{k=0}^{\infty} \|x_{k+1} - x_k\| < +\infty$$

  • Implies Cauchy-convergent
  • (Nesterov’06): cubic-summable, $\sum_{k=0}^{\infty} \|x_{k+1} - x_k\|^3 < +\infty$

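The step from absolute summability to Cauchy convergence only needs the triangle inequality. As a short worked step, for any indices $m > n$:

```latex
\[
\|x_m - x_n\|
\;\le\; \sum_{k=n}^{m-1} \|x_{k+1} - x_k\|
\;\le\; \sum_{k=n}^{\infty} \|x_{k+1} - x_k\|
\;\xrightarrow[n \to \infty]{}\; 0 .
\]
```

The tail of a convergent series vanishes, so $\{x_k\}$ is a Cauchy sequence and therefore converges. Cubic summability alone does not give this: steps of length $1/k$ have summable cubes but a divergent sum.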
SLIDE 8

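Convergence of Variable Sequence

| Łojasiewicz exponent $\theta$ | Convergence rate |
|---|---|
| $\theta = +\infty$ | $\|x_k - x^*\| = 0$ |
| $\theta \in (\tfrac{3}{2}, +\infty)$ | $\|x_k - x^*\| \le \Theta\big(\exp(-(\cdot)^{k-k_0})\big)$ |
| $\theta = \tfrac{3}{2}$ | $\|x_k - x^*\| \le \Theta\big(\exp(-(k-k_0))\big)$ |
| $\theta \in (1, \tfrac{3}{2})$ | $\|x_k - x^*\| \le \Theta\big((k-k_0)^{(\cdot)}\big)$ |
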
SLIDE 9

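Comparison with First-order Algorithm

| Łojasiewicz exponent $\theta$ | Gradient descent | Cubic-regularization |
|---|---|---|
| $\theta = +\infty$ | finite-step | finite-step |
| $\theta \in [2, +\infty)$ | linear | super-linear |
| $\theta \in [\tfrac{3}{2}, 2)$ | sub-linear | super-linear (linear at $\theta = \tfrac{3}{2}$) |
| $\theta \in (1, \tfrac{3}{2})$ | sub-linear, $\Theta\big(k^{(\cdot)}\big)$ | sub-linear, $\Theta\big(k^{(\cdot)}\big)$ |
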
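The gap between the two methods can be observed on one-dimensional power functions. The sketch below is a toy experiment, not the authors' setup. It assumes the Łojasiewicz inequality has the form $f(x) - f^* \le C\,|f'(x)|^{\theta}$, so that $|x|^p$ has exponent $\theta = p/(p-1)$, and it uses the closed-form minimizer of the one-dimensional cubic model. The names `cr_step_1d`, `run`, and `ratios`, the CR parameter `M = 6`, and the gradient-descent step size `eta = 0.1` are illustrative assumptions.

```python
import math

def cr_step_1d(g, h, M):
    """1-D CR step: for g != 0, minimizes g*s + 0.5*h*s^2 + (M/6)*|s|^3 over s."""
    if g == 0.0:
        return 0.0                       # degenerate case not treated in this sketch
    t = 2.0 * abs(g) / (h + math.sqrt(h * h + 2.0 * M * abs(g)))
    return -math.copysign(t, g)

def run(step, x0, iters):
    xs = [x0]
    for _ in range(iters):
        xs.append(step(xs[-1]))
    return xs

def ratios(xs):
    """Successive contraction ratios x_{k+1} / x_k."""
    return [b / a for a, b in zip(xs, xs[1:])]

M, eta = 6.0, 0.1                        # illustrative parameter choices

# f(x) = x^2 (p = 2, theta = 2): gradient 2x, Hessian 2
gd_quad = run(lambda x: x - eta * 2.0 * x, 1.0, 5)
cr_quad = run(lambda x: x + cr_step_1d(2.0 * x, 2.0, M), 1.0, 5)

# f(x) = |x|^3 (p = 3, theta = 3/2): gradient 3x|x|, Hessian 6|x|
gd_cub = run(lambda x: x - eta * 3.0 * x * abs(x), 1.0, 50)
cr_cub = run(lambda x: x + cr_step_1d(3.0 * x * abs(x), 6.0 * abs(x), M), 1.0, 50)

print("x^2,   GD:", ratios(gd_quad))       # constant ratio: linear
print("x^2,   CR:", ratios(cr_quad))       # ratio shrinks toward 0: super-linear
print("|x|^3, GD:", ratios(gd_cub)[-3:])   # ratio creeps toward 1: sub-linear
print("|x|^3, CR:", ratios(cr_cub)[-3:])   # constant ratio: linear
```

A constant contraction ratio indicates linear convergence, a ratio tending to 0 indicates super-linear convergence, and a ratio tending to 1 indicates sub-linear convergence, which matches the $\theta = 2$ and $\theta = 3/2$ cases of the comparison on this slide.
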
SLIDE 10

Come to our poster: Thursday 05:00 PM, Room 210 & 230 AB, #4

Thank You!