Convergence of Cubic Regularization for Nonconvex Optimization under Łojasiewicz Property


  1. Convergence of Cubic Regularization for Nonconvex Optimization under Łojasiewicz Property

  2. Cubic-regularization (CR)
     CR update:
         x_{k+1} ∈ argmin_{z ∈ ℝᵈ} ⟨∇f(x_k), z − x_k⟩ + ½ ⟨∇²f(x_k)(z − x_k), z − x_k⟩ + (M/6) ‖z − x_k‖³
     • converges to a 2nd-order stationary point (Nesterov '06)
     • escapes strict saddle points
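The update above can be sketched numerically. In the sketch below, the toy strict-saddle objective f(x, y) = (x² − 1)² + y² and the choice of M are illustrative assumptions, not from the slides, and a generic local optimizer (BFGS) stands in for the exact global subproblem solver that the CR theory assumes.

```python
import numpy as np
from scipy.optimize import minimize

def cr_step(x, grad_f, hess_f, M):
    """One cubic-regularization step: (approximately) minimize the cubic
    model around x. A local BFGS solve is a sketch; the theory assumes the
    subproblem's global minimizer."""
    g, H = grad_f(x), hess_f(x)
    def model(z):
        d = z - x
        return g @ d + 0.5 * d @ H @ d + (M / 6.0) * np.linalg.norm(d) ** 3
    return minimize(model, x, method="BFGS").x

# Hypothetical demo objective f(x, y) = (x^2 - 1)^2 + y^2: the origin is a
# strict saddle (negative curvature along x), which the cubic model exploits.
f = lambda v: (v[0] ** 2 - 1) ** 2 + v[1] ** 2
grad_f = lambda v: np.array([4 * v[0] * (v[0] ** 2 - 1), 2 * v[1]])
hess_f = lambda v: np.array([[12 * v[0] ** 2 - 4, 0.0], [0.0, 2.0]])

x = np.array([0.1, 1.0])   # start near the saddle
for _ in range(20):
    x = cr_step(x, grad_f, hess_f, M=10.0)
# x approaches a 2nd-order stationary point (±1, 0) rather than the saddle
```

Starting close to the saddle, the first step already moves toward a local minimizer because the negative-curvature direction lowers the cubic model.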

  3. Motivation and Contribution
     General nonconvex optimization
     • global sub-linear convergence (Nesterov '06)
     Nonconvex + local geometry
     • gradient dominance (Nesterov '06) ⇒ super-linear convergence
     • error bound (Yue '18) ⇒ quadratic convergence
     • but: limited function classes
     Our contribution: convergence rates of CR under the general Łojasiewicz property

  4. Łojasiewicz Property
     Definition (Łojasiewicz property). Let f take the constant value f* on a compact set Ω. Then there exist c, ε > 0 such that for all x with dist(x, Ω) ≤ ε,
         ‖∇f(x)‖ ≥ c |f(x) − f*|^(1/θ),
     where θ ∈ (1, +∞] is the Łojasiewicz exponent.
     • satisfied by a large function class: analytic functions, polynomials, exp-log functions, etc.
     • ML examples: Lasso, phase retrieval, blind deconvolution, etc.
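The definition can be checked numerically on a simple function. Assuming the exponent convention above (‖∇f(x)‖ ≥ c |f(x) − f*|^(1/θ)), the scalar polynomial f(x) = x⁴ satisfies it with θ = 4/3 and c = 4, since |f′(x)| = 4|x|³ = 4(x⁴)^(3/4); this example is an illustration of mine, not from the slides.

```python
import numpy as np

# Łojasiewicz inequality (convention assumed): |f'(x)| >= c |f(x) - f*|^(1/θ).
# For f(x) = x^4 with f* = 0: |f'(x)| = 4|x|^3 = 4 (x^4)^(3/4), so θ = 4/3, c = 4.
f = lambda x: x ** 4
grad = lambda x: 4 * x ** 3
theta, c = 4.0 / 3.0, 4.0

xs = np.linspace(-1.0, 1.0, 10001)
xs = xs[xs != 0.0]                          # the inequality is trivial at x = 0
lhs = np.abs(grad(xs))                      # gradient magnitude
rhs = c * np.abs(f(xs)) ** (1.0 / theta)    # Łojasiewicz lower bound
assert np.all(lhs >= rhs - 1e-9)            # holds (with equality) on the grid
```

Note that θ = 4/3 lies in the "flat" regime (1, 3/2), which is exactly where the later slides predict only sub-linear rates.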

  5. Convergence to 2nd-order Stationary Point
     Łojasiewicz exponent θ  | Convergence rate of the 2nd-order optimality measure μ(x_k)
     θ = +∞ (sharp)          | μ(x_k) = 0 after finitely many steps (finite-step)
     θ ∈ (3/2, +∞)           | μ(x_k) ≤ Θ(exp(−(2(θ − 1))^(k − k₀)))  — super-linear, order 2(θ − 1)
     θ = 3/2                 | μ(x_k) ≤ Θ(exp(−(k − k₀)))  — linear
     θ ∈ (1, 3/2) (flat)     | μ(x_k) ≤ Θ((k − k₀)^(−2(θ−1)/(3−2θ)))  — sub-linear

  6. Convergence of Function Value
     Łojasiewicz exponent θ  | Convergence rate of f(x_k) − f*
     θ = +∞                  | f(x_k) − f* = 0 after finitely many steps
     θ ∈ (3/2, +∞)           | f(x_k) − f* ≤ Θ(exp(−(2(θ − 1))^(k − k₀)))
     θ = 3/2                 | f(x_k) − f* ≤ Θ(exp(−(k − k₀)))
     θ ∈ (1, 3/2)            | f(x_k) − f* ≤ Θ((k − k₀)^(−2θ/(3−2θ)))

  7. Convergence of Variable Sequence
     Theorem. Assume f satisfies the Łojasiewicz property. Then the sequence {x_k} generated by CR is absolutely summable:
         Σ_{k=0}^∞ ‖x_{k+1} − x_k‖ < +∞
     • implies that {x_k} is a Cauchy sequence, hence convergent
     • in contrast, (Nesterov '06) only establishes cube-summability: Σ_{k=0}^∞ ‖x_{k+1} − x_k‖³ < +∞
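Absolute summability can be observed numerically. The sketch below uses a hypothetical example of mine, not from the slides: for f(x) = x⁴ (a "flat" Łojasiewicz landscape) the 1-D cubic subproblem has a closed-form minimizer, and the partial sums of the step lengths |x_{k+1} − x_k| stay bounded, converging to x₀ − lim x_k.

```python
import numpy as np

def cr_step_1d(x, M=10.0):
    """Exact CR step for f(x) = x^4 with x > 0. The cubic model
    g*h + (H/2)*h^2 + (M/6)*|h|^3, with g = 4x^3 and H = 12x^2, is
    minimized at the negative root h = (H - sqrt(H^2 + 2*M*g)) / M."""
    g, H = 4 * x ** 3, 12 * x ** 2
    return x + (H - np.sqrt(H ** 2 + 2 * M * g)) / M

x, steps = 1.0, []
for _ in range(2000):
    x_new = cr_step_1d(x)
    steps.append(abs(x_new - x))   # step lengths |x_{k+1} - x_k|
    x = x_new

total = sum(steps)  # partial sum; bounded above by x_0 = 1, so absolutely summable
```

The partial sums approach 1 from below while x_k decreases monotonically toward the minimizer 0, matching the Cauchy-convergence claim.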

  8. Convergence of Variable Sequence
     Łojasiewicz exponent θ  | Convergence rate of ‖x_k − x*‖
     θ = +∞                  | ‖x_k − x*‖ = 0 after finitely many steps
     θ ∈ (3/2, +∞)           | ‖x_k − x*‖ ≤ Θ(exp(−(2(θ − 1))^(k − k₀)))
     θ = 3/2                 | ‖x_k − x*‖ ≤ Θ(exp(−(k − k₀)))
     θ ∈ (1, 3/2)            | ‖x_k − x*‖ ≤ Θ((k − k₀)^(−2(θ−1)/(3−2θ)))

  9. Comparison with First-order Algorithm
     Łojasiewicz exponent θ  | Gradient descent            | Cubic-regularization
     θ = +∞                  | finite-step                 | finite-step
     θ ∈ [2, +∞)             | linear                      | super-linear
     θ ∈ [3/2, 2)            | sub-linear                  | super-linear
     θ ∈ (1, 3/2)            | sub-linear Θ(k^(−θ/(2−θ)))  | sub-linear Θ(k^(−2θ/(3−2θ)))
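The gap in the last row can be seen numerically. For f(x) = x⁴ (θ = 4/3 under the exponent convention used here; the example and step sizes are mine, not the slides'), the rates above predict a function-value gap of order k^(−θ/(2−θ)) = k^(−2) for gradient descent versus k^(−2θ/(3−2θ)) = k^(−8) for CR, so after the same budget of iterations CR should be many orders of magnitude closer to the minimum.

```python
import numpy as np

# f(x) = x^4, f* = 0; both methods start from x0 = 1.
x_gd = 1.0
for _ in range(1000):                 # gradient descent, stepsize 0.1
    x_gd -= 0.1 * 4 * x_gd ** 3

x_cr, M = 1.0, 10.0
for _ in range(1000):                 # exact 1-D cubic-regularization steps
    g, H = 4 * x_cr ** 3, 12 * x_cr ** 2
    x_cr += (H - np.sqrt(H ** 2 + 2 * M * g)) / M

f_gd, f_cr = x_gd ** 4, x_cr ** 4
# On this flat landscape CR's ~k^-8 rate dominates GD's ~k^-2 rate.
```

Both methods are sub-linear here, as the table predicts, but the higher-order model pays off: the CR gap is smaller than the GD gap by roughly a factor of k⁶.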

  10. Come to our poster: Thursday, 5:00 PM, Room 210 & 230 AB, Poster #4. Thank you!
