
From Nesterov’s Estimate Sequence To Riemannian Acceleration



  1. From Nesterov’s Estimate Sequence To Riemannian Acceleration Kwangjun Ahn, Suvrit Sra COLT 2020 arXiv: https://arxiv.org/abs/2001.08876

  2. Riemannian Optimization?
     • (Euclidean) Optimization: $\min_{x \in \mathbb{R}^n} f(x)$, where $f: \mathbb{R}^n \to \mathbb{R}$
     • Riemannian Optimization: $\min_{x \in M} f(x)$, where $f: M \to \mathbb{R}$ and $M$ = a Riemannian manifold

  3. Accel. Gradient Method!
     • Yurii Nesterov, 80's
     Accel. Gradient Descent: for $t = 0, 1, 2, \dots$
       $x_{t+1} = y_t + \alpha_{t+1} (z_t - y_t)$
       $y_{t+1} = x_{t+1} - \gamma_{t+1} \nabla f(x_{t+1})$
       $z_{t+1} = x_{t+1} + \beta_{t+1} (z_t - x_{t+1}) - \eta_{t+1} \nabla f(x_{t+1})$
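A minimal NumPy sketch of the three-sequence update on this slide (sequences x, y, z). The slide does not specify the step parameters; the constant choices below (α = q/(1+q), β = 1 − q, γ = 1/L, η = q/μ with q = √(μ/L)) are an assumption borrowed from Nesterov's constant-step scheme for μ-strongly convex, L-smooth functions. The function names and the quadratic test problem are illustrative only.

```python
import numpy as np

def agd(grad, x0, mu, L, iters):
    """Three-sequence accelerated gradient descent (constant parameters assumed)."""
    q = np.sqrt(mu / L)
    alpha = q / (1.0 + q)   # assumed: weight pulling y_t toward z_t
    beta = 1.0 - q          # assumed: weight pulling x_{t+1} toward z_t
    gamma = 1.0 / L         # assumed: gradient step used for y
    eta = q / mu            # assumed: gradient step used for z
    y = np.array(x0, dtype=float)
    z = y.copy()
    for _ in range(iters):
        x = y + alpha * (z - y)            # x_{t+1} = y_t + alpha (z_t - y_t)
        g = grad(x)
        y = x - gamma * g                  # y_{t+1} = x_{t+1} - gamma grad f(x_{t+1})
        z = x + beta * (z - x) - eta * g   # z_{t+1} = x_{t+1} + beta (z_t - x_{t+1}) - eta grad f(x_{t+1})
    return y

def gd(grad, x0, L, iters):
    """Plain gradient descent with step 1/L, for comparison."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - grad(x) / L
    return x

# Illustrative problem: f(x) = 0.5 * x^T A x with mu = 1, L = 100, minimizer x* = 0.
A = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x0 = np.array([1.0, 1.0])
gap_agd = f(agd(grad, x0, mu=1.0, L=100.0, iters=200))
gap_gd = f(gd(grad, x0, L=100.0, iters=200))
print(f"AGD gap: {gap_agd:.3e}, GD gap: {gap_gd:.3e}")
```

On this toy problem the accelerated iterate should end up many orders of magnitude closer to the optimum than plain gradient descent after the same number of steps.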

  4. Accel. Gradient Method: Theory
     • Yurii Nesterov, 80's
     C.f. Gradient Descent: for $\mu \preceq \nabla^2 f(x) \preceq L$,
       $f(x_t) - f(x^*) = O\big((1 - \mu/L)^t\big)$
       For an $\epsilon$-approx. solution, we need $O\big(\frac{L}{\mu} \log \frac{1}{\epsilon}\big)$ many iterations.
     Nesterov showed: for $\mu \preceq \nabla^2 f(x) \preceq L$,
       $f(y_t) - f(x^*) = O\big((1 - \sqrt{\mu/L})^t\big)$
       For an $\epsilon$-approx. solution, we only need $t = O\big(\sqrt{L/\mu} \, \log \frac{1}{\epsilon}\big)$.
     Acceleration! (and indeed optimal for this class!)
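To make the gap between the two iteration counts concrete, here is a small sketch that evaluates the leading-order terms (L/μ)·log(1/ε) and √(L/μ)·log(1/ε). The constants hidden in the O(·) are dropped, so the numbers are only indicative; the helper name and the values of κ = L/μ and ε are illustrative assumptions.

```python
import math

def iterations_needed(kappa, eps, accelerated):
    """Leading-order iteration count for an eps-approximate solution (O(.) constants dropped)."""
    rate = math.sqrt(kappa) if accelerated else kappa
    return math.ceil(rate * math.log(1.0 / eps))

kappa, eps = 1e4, 1e-6   # illustrative values; kappa = L / mu
n_gd = iterations_needed(kappa, eps, accelerated=False)
n_agd = iterations_needed(kappa, eps, accelerated=True)
print(n_gd, n_agd)       # the two counts differ by roughly sqrt(kappa) = 100
```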

  5. Natural Question..
     Could we develop such a landmark result for curved spaces (Riem. manifolds)?
     Turns out to be a challenging question:
     • Liu et al.'17 (NIPS) reduces the task to solving nonlinear equations.
       Not clear whether these equations are even feasible or tractably solvable.
     • Alimisis et al.'20 (AISTATS): continuous dynamic approach.
       Not clear whether the discretization yields accel.
     • Most concrete result: Zhang-Sra'18 (COLT) proposed an alg. guaranteed to accel. locally.
     Global accel.? Open!

  6. Challenge!
     • Nesterov's analysis is called the Estimate Sequence technique.
     • Nesterov's analysis relies on linear structure! Not clear if it generalizes to non-linear spaces like Riem. manifolds.
     • Nesterov's analysis entails non-trivial algebraic tricks! Hard to understand; its scope has puzzled researchers for years.

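  7. Riemannian Accel. GD
     (Euclidean) Accel. Gradient Descent:
       $x_{t+1} = y_t + \alpha_{t+1} (z_t - y_t)$
       $y_{t+1} = x_{t+1} - \gamma_{t+1} \nabla f(x_{t+1})$
       $z_{t+1} = x_{t+1} + \beta_{t+1} (z_t - x_{t+1}) - \eta_{t+1} \nabla f(x_{t+1})$
     Riemannian Accel. Gradient Descent:
       $x_{t+1} = \mathrm{Exp}_{y_t}\big(\alpha_{t+1} \cdot \mathrm{Exp}_{y_t}^{-1}(z_t)\big)$
       $y_{t+1} = \mathrm{Exp}_{x_{t+1}}\big(-\gamma_{t+1} \cdot \nabla f(x_{t+1})\big)$
       $z_{t+1} = \mathrm{Exp}_{x_{t+1}}\big(\beta_{t+1} \cdot \mathrm{Exp}_{x_{t+1}}^{-1}(z_t) - \eta_{t+1} \nabla f(x_{t+1})\big)$
     Space is curved, which causes “distortion”.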
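A hedged sketch of the Riemannian update on this slide, specialized to the unit sphere, where Exp and its inverse have closed forms. Everything specific here is an assumption for illustration: the choice of manifold, the objective (a Karcher mean of three nearby points), the rough estimates of μ and L, and the constant Euclidean-style parameters, which ignore the metric distortion and are not the parameter schedule from the talk.

```python
import numpy as np

def exp_map(x, v):
    """Exponential map on the unit sphere: move from x along tangent vector v."""
    nv = np.linalg.norm(v)
    if nv < 1e-16:
        return x
    y = np.cos(nv) * x + np.sin(nv) * (v / nv)
    return y / np.linalg.norm(y)   # guard against round-off drift

def log_map(x, y):
    """Inverse exponential map: tangent vector at x pointing to y, of length d(x, y)."""
    c = np.dot(x, y)
    u = y - c * x                  # component of y tangent to the sphere at x
    nu = np.linalg.norm(u)
    if nu < 1e-16:
        return np.zeros_like(x)
    return np.arctan2(nu, c) * (u / nu)

def riemannian_agd(rgrad, x0, alpha, beta, gamma, eta, iters):
    """Riemannian three-sequence update with constant (assumed) parameters."""
    y, z = x0.copy(), x0.copy()
    for _ in range(iters):
        x = exp_map(y, alpha * log_map(y, z))
        g = rgrad(x)
        y = exp_map(x, -gamma * g)
        z = exp_map(x, beta * log_map(x, z) - eta * g)
    return y

# Illustrative objective: f(x) = (1 / 2n) * sum_i d(x, p_i)^2 (Karcher mean).
north = np.array([0.0, 0.0, 1.0])
tangents = [np.array([0.30, 0.00, 0.0]),
            np.array([-0.10, 0.20, 0.0]),
            np.array([0.05, -0.25, 0.0])]
points = [exp_map(north, v) for v in tangents]

def rgrad(x):
    # Riemannian gradient of the Karcher-mean objective at x.
    return -np.mean([log_map(x, p) for p in points], axis=0)

# Rough (assumed) curvature-based estimates for points within distance 0.6.
L_smooth = 1.0
mu = 0.6 / np.tan(0.6)
q = np.sqrt(mu / L_smooth)
mean = riemannian_agd(rgrad, points[0], alpha=q / (1 + q), beta=1 - q,
                      gamma=1 / L_smooth, eta=q / mu, iters=50)
print(mean, np.linalg.norm(rgrad(mean)))
```

The only structural change from the Euclidean loop is that straight-line steps `a + t * (b - a)` become `exp_map(a, t * log_map(a, b))`, which is where the distortion discussed on the following slides enters.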

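  8. 1. How does this affect the convergence rate?
     • Non-linear recursive relation: $\frac{\xi_{t+1} (\xi_{t+1} - \mu/L)}{1 - \xi_{t+1}} = \frac{1}{\delta} \xi_t^2$
     • The more severe the distortion $\delta$ gets, the slower the convergence rate becomes!
     • No matter how severe the distortion, Riem. AGD is always faster than RGD!
     • To achieve full accel., i.e. $\sqrt{\mu/L}$, we need to bring $\delta$ down to 1!
     [Plot: $\tau(u) = \frac{u (u - \mu/L)}{1 - u}$ against $\theta(u) = u^2$, $\theta(u) = \frac{1}{2} u^2$, and $\theta(u) = \frac{1}{5} u^2$ (the case $\delta = 5$); axis marks at $\mu/L$, $\sqrt{\mu/L}$, and $1$]
     How do we control/estimate the distortion?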

  9. Global Accel for Riem. Case!
     Thm 2. Given $\xi_0 > 0$, find $\xi_{t+1} \in [2\mu\Delta, 1)$ such that
       $\frac{\xi_{t+1} (\xi_{t+1} - 2\mu\Delta)}{1 - \xi_{t+1}} = \frac{1}{\delta_{t+1}} \xi_t^2$
     where $\delta_{t+1} = T(d(x_t, z_t))$ for some computable function $T$ ($\delta_{t+1}$ is the magnitude of metric distortion at iteration $t$). Then
       $f(y_{t+1}) - f(x^*) = O\big((1 - \xi_1)(1 - \xi_2) \cdots (1 - \xi_{t+1})\big)$
     s.t.
     (1) $\xi_t > \mu/L$ for all $t$: strictly faster than nonaccel. GD!
     (2) $\xi_t$ quickly converges to $\sqrt{\mu/L}$: quickly achieves full acceleration!
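The recursion for ξ in this theorem can be solved in closed form at each step, since clearing the denominator gives a quadratic in the unknown. The sketch below iterates it under stated assumptions: the distortion δ is held constant (the talk computes it from d(x_t, z_t) via a function T that is not reproduced here), Δ is set to 1/(2L) so that 2μΔ = μ/L, and the starting value is arbitrary. The function names are hypothetical.

```python
import math

def next_xi(xi_t, delta, mu, Delta):
    """Root in [2*mu*Delta, 1) of xi*(xi - 2*mu*Delta)/(1 - xi) = xi_t**2 / delta."""
    a = 2.0 * mu * Delta
    c = xi_t ** 2 / delta
    # Clearing the denominator: xi^2 + (c - a) * xi - c = 0; take the positive root.
    return 0.5 * (-(c - a) + math.sqrt((c - a) ** 2 + 4.0 * c))

def limit_xi(delta, mu=1.0, L=100.0, iters=500):
    Delta = 1.0 / (2.0 * L)   # assumed, so that 2 * mu * Delta = mu / L
    xi = mu / L               # assumed starting value xi_0 > 0
    for _ in range(iters):
        xi = next_xi(xi, delta, mu, Delta)
    return xi

xi_no_distortion = limit_xi(delta=1.0)   # approaches sqrt(mu / L)
xi_distorted = limit_xi(delta=5.0)       # stays between mu / L and sqrt(mu / L)
print(xi_no_distortion, xi_distorted)
```

With δ = 1 the fixed point satisfies ξ² = μ/L, which is the fully accelerated rate; a larger constant δ pulls the limit down toward μ/L without reaching it.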

  10. Open problem
      Obtaining acceleration in the non-strongly convex case?
      Remarks
      ★ Using a strongly convex perturbation can be done
      ★ But, extra factor O(log 1/ϵ)
      ★ More crucially, our current proof needs to ensure all iterates remain within a set of specific size to be able to ensure acceleration. Removing this limitation is valuable
