1. Inexact Tensor Methods with Dynamic Accuracies
   Nikita Doikov, Yurii Nesterov (UCLouvain, Belgium), ICML 2020

2. Plan of the talk
   1. Introduction: Tensor Methods in Convex Optimization
   2. Inexact Tensor Methods
   3. Acceleration
   4. Numerical Example

3. Plan of the talk
   1. Introduction: Tensor Methods in Convex Optimization
   2. Inexact Tensor Methods
   3. Acceleration
   4. Numerical Example

4. Gradient Method
   Composite optimization problem:
       min_{x ∈ dom F} F(x) := f(x) + ψ(x),
   ◮ f is convex and smooth;
   ◮ ψ : R^n → R ∪ {+∞} is convex (possibly nonsmooth, but simple).
   The Gradient Method:
       x_{k+1} = argmin_y { ⟨∇f(x_k), y − x_k⟩ + (H/2)‖y − x_k‖² + ψ(y) },   k ≥ 0.
   ◮ The gradient of f is Lipschitz continuous: ‖∇f(y) − ∇f(x)‖ ≤ L₁‖y − x‖  ⇒  H := L₁.
   ◮ Global sublinear convergence: F(x_k) − F* ≤ O(1/k).
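A minimal Python sketch of one such step, assuming ψ(y) = λ‖y‖₁ (so the argmin reduces to soft-thresholding of a plain gradient step); the quadratic test function f, the data A, b and the parameter λ are hypothetical placeholders, not from the slides.

```python
import numpy as np

def gradient_step(x, grad_f, H, lam):
    """One composite gradient step:
    argmin_y <grad_f(x), y - x> + H/2 * ||y - x||^2 + lam * ||y||_1.
    For psi = lam * ||.||_1 the minimizer is soft-thresholding applied to
    the plain gradient step x - grad_f(x) / H."""
    z = x - grad_f(x) / H
    return np.sign(z) * np.maximum(np.abs(z) - lam / H, 0.0)

# Example: f(x) = 0.5 * ||A x - b||^2, so grad f(x) = A^T (A x - b), L1 = ||A||_2^2.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
grad_f = lambda x: A.T @ (A @ x - b)
H = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
x = np.zeros(10)
for k in range(100):
    x = gradient_step(x, grad_f, H, lam=0.1)
```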

5. Newton Method with Cubic Regularization
   ◮ The Hessian of f is Lipschitz continuous: ‖∇²f(y) − ∇²f(x)‖ ≤ L₂‖y − x‖.
   Cubic Newton:
       x_{k+1} = argmin_y { ⟨∇f(x_k), y − x_k⟩ + (1/2)⟨∇²f(x_k)(y − x_k), y − x_k⟩
                            + (H/6)‖y − x_k‖³ + ψ(y) },   k ≥ 0.
   ◮ H := 0  ⇒  classical Newton method.
   ◮ H := L₂  ⇒  [Nesterov-Polyak, 2006]; global convergence: F(x_k) − F* ≤ O(1/k²).
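A minimal sketch of one cubic-regularized step, assuming ψ ≡ 0 and a convex f (so ∇²f(x) ⪰ 0); grad_f and hess_f are hypothetical callables. The minimizer of the model satisfies (∇²f(x) + (H/2)‖s‖ I) s = −∇f(x), so the step reduces to a one-dimensional search over r = ‖s‖.

```python
import numpy as np

def cubic_newton_step(x, grad_f, hess_f, H, tol=1e-10, r_max=1e6):
    """One cubic Newton step with psi = 0: minimize over s = y - x
    <g, s> + 0.5 s^T A s + H/6 * ||s||^3, with g = grad_f(x), A = hess_f(x).
    We look for r = ||s|| solving ||s(r)|| = r by bisection, where
    s(r) = -(A + H/2 * r * I)^{-1} g."""
    g, A = grad_f(x), hess_f(x)
    n = len(x)

    def s_of(r):
        return np.linalg.solve(A + 0.5 * H * r * np.eye(n), -g)

    lo, hi = 0.0, 1.0
    while np.linalg.norm(s_of(hi)) > hi and hi < r_max:   # bracket the root
        hi *= 2.0
    for _ in range(100):                                   # bisection on r
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(s_of(mid)) > mid:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return x + s_of(hi)
```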

6. Tensor Methods
   Let x ∈ R^n be fixed; consider an arbitrary h ∈ R^n and the one-dimensional function
       φ(t) := f(x + th),   t ∈ R.
   Then φ(0) = f(x), φ′(0) = ⟨∇f(x), h⟩, φ″(0) = ⟨∇²f(x)h, h⟩. Denote D^p f(x)[h]^p := φ^(p)(0).
   The model:
       Ω_H(x; y) := ∑_{i=1}^{p} (1/i!) D^i f(x)[y − x]^i + (H/(p+1)!)‖y − x‖^{p+1} + ψ(y).
   Tensor Method of order p ≥ 1:
       x_{k+1} = argmin_y Ω_H(x_k; y),   k ≥ 0.
   ◮ The p-th derivative is Lipschitz continuous: ‖D^p f(y) − D^p f(x)‖ ≤ L_p‖y − x‖.
   ◮ Global convergence: F(x_k) − F* ≤ O(1/k^p). [Baes, 2009]
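A small sketch that just evaluates the model Ω_H(x; ·) at a candidate point, given callables for the directional derivatives D^i f(x)[h]^i; the quadratic example, the list derivs, and psi are hypothetical placeholders used only for illustration.

```python
import math
import numpy as np

def taylor_model(x, y, derivs, H, psi):
    """Evaluate Omega_H(x; y) = sum_{i=1}^p D^i f(x)[y-x]^i / i!
                               + H / (p+1)! * ||y - x||^{p+1} + psi(y).
    `derivs` is a list of callables; derivs[i-1](h) returns D^i f(x)[h]^i."""
    h = y - x
    p = len(derivs)
    value = sum(derivs[i - 1](h) / math.factorial(i) for i in range(1, p + 1))
    value += H / math.factorial(p + 1) * np.linalg.norm(h) ** (p + 1)
    return value + psi(y)

# Example with p = 2 for f(x) = 0.5 <A x, x>:  D^1 f(x)[h] = <A x, h>,
# D^2 f(x)[h]^2 = <A h, h>; here psi = 0.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
x, y = np.array([1.0, -1.0]), np.array([0.5, 0.0])
derivs = [lambda h: (A @ x) @ h, lambda h: (A @ h) @ h]
print(taylor_model(x, y, derivs, H=1.0, psi=lambda y: 0.0))
```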

7. Tensor Methods: Solving the Subproblem
   At each iteration k ≥ 0, the subproblem is
       min_y Ω_H(x_k; y) := ∑_{i=1}^{p} (1/i!) D^i f(x_k)[y − x_k]^i + (H/(p+1)!)‖y − x_k‖^{p+1} + ψ(y).
   ◮ H ≥ pL_p  ⇒  Ω_H(x_k; y) is convex in y. [Nesterov, 2018]
   ◮ For p = 3: efficient implementation, using the Gradient Method with the relative smoothness
     condition [Van Nguyen, 2017; Bauschke-Bolte-Teboulle, 2016; Lu-Freund-Nesterov, 2018].
     The cost of minimizing Ω_H(x_k; ·) is O(n³) + Õ(n).

8. Some Recent Results
   ◮ Accelerated Tensor Methods: F(x_k) − F* ≤ O(1/k^{p+1}) [Baes, 2009; Nesterov, 2018].
   ◮ Optimal Tensor Methods: F(x_k) − F* ≤ O(1/k^{(3p+1)/2}) [Gasnikov et al., 2019;
     Kamzolov-Gasnikov-Dvurechensky, 2020]. The oracle complexity matches the lower bound
     (up to a logarithmic factor) from [Arjevani-Shamir-Shiff, 2017].
   ◮ Universal Tensor Methods: [Grapiglia-Nesterov, 2019].
   ◮ Stochastic Tensor Methods: [Lucchi-Kohler, 2019].
   ◮ ...

9. Plan of the talk
   1. Introduction: Tensor Methods in Convex Optimization
   2. Inexact Tensor Methods
   3. Acceleration
   4. Numerical Example

10. Definition of Inexactness
    Use a point T = T_{H,δ}(x_k) with small residual in the function value of the model:
        Ω_H(x_k; T) − min_y Ω_H(x_k; y) ≤ δ.
    ◮ Easier to achieve by the inner method.
    ◮ Can be controlled in practice using the duality gap.
    Set H := pL_p. Then F(T) ≤ F(x_k) + δ.
    ◮ The inexact step can be nonmonotone.

11. Monotone Inexact Tensor Methods
    Initialization: choose x_0 ∈ dom F, set H := pL_p.
    Iterations, k ≥ 0:
    1: Pick δ_{k+1} ≥ 0.
    2: Compute an inexact monotone tensor step T, such that
           Ω_H(x_k; T) − min_y Ω_H(x_k; y) ≤ δ_{k+1}   and   F(T) < F(x_k).
    3: x_{k+1} := T.
    Theorem 1. Set δ_k := c/k^{p+1}, for c ≥ 0. Then F(x_k) − F* ≤ O(1/k^p).
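A sketch of the outer loop with the dynamic accuracy schedule of Theorem 1; inexact_step is a hypothetical inner solver (e.g., a gradient scheme on Ω_H(x_k; ·) stopped once an estimate of the residual drops below δ), not an API from the paper.

```python
def inexact_tensor_method(x0, F, inexact_step, c, p, iters):
    """Monotone inexact tensor method (sketch).
    `inexact_step(x, delta)` returns a point T with
    Omega_H(x; T) - min_y Omega_H(x; y) <= delta."""
    x = x0
    for k in range(iters):
        delta = c / (k + 1) ** (p + 1)   # dynamic accuracy, delta_{k+1} = c / (k+1)^{p+1}
        T = inexact_step(x, delta)
        if F(T) <= F(x):                 # keep the step only if it is monotone
            x = T
    return x
```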

12. Adaptive Strategy for Inner Accuracy
    Let us set δ_k := c (F(x_{k−2}) − F(x_{k−1})).
    Theorem 2 (general convex case). F(x_k) − F* ≤ O(1/k^p).
    Theorem 3 (uniformly convex objective). Let
        F(y) ≥ F(x) + ⟨F′(x), y − x⟩ + (σ_{p+1}/(p+1))‖y − x‖^{p+1}.
    Denote ω_p := max{ (p+1)² L_p / (p! σ_{p+1}), 1 }. Then we have the linear rate
        F(x_{k+1}) − F* ≤ ( 1 − p ω_p^{−1/p} / (2(p+1)) ) (F(x_k) − F*).
    ◮ This works for methods starting from p ≥ 1.
    Theorem 4. For p ≥ 2 and a strongly convex objective, we have a local superlinear rate.
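The adaptive rule only needs the history of function values; a minimal helper, where the bootstrap value delta0 used for the first two iterations is an assumption of this sketch, not specified on the slides.

```python
def adaptive_accuracy_schedule(F_values, c, delta0):
    """delta_k = c * (F(x_{k-2}) - F(x_{k-1})): the accuracy requested from the
    next subproblem is proportional to the progress made on the previous step.
    `F_values` is the history [F(x_0), ..., F(x_{k-1})]; `delta0` bootstraps
    the iterations before any progress has been observed."""
    if len(F_values) < 2:
        return delta0
    return c * (F_values[-2] - F_values[-1])
```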

13. Plan of the talk
    1. Introduction: Tensor Methods in Convex Optimization
    2. Inexact Tensor Methods
    3. Acceleration
    4. Numerical Example

14. Contracting Proximal Scheme
    ◮ Fix a prox-function d(x). Bregman divergence: β_d(x; y) := d(y) − d(x) − ⟨∇d(x), y − x⟩.
    ◮ Two sequences of points {x_k}_{k≥0}, {v_k}_{k≥0}, with v_0 = x_0.
    ◮ A sequence of positive coefficients {a_k}_{k≥1}, A_k := ∑_{i=1}^{k} a_i.
    Iterations, k ≥ 0:
    1. Compute
           v_{k+1} = argmin_y { A_{k+1} f( (a_{k+1} y + A_k x_k)/A_{k+1} ) + a_{k+1} ψ(y) + β_d(v_k; y) }.
    2. Put x_{k+1} = (a_{k+1} v_{k+1} + A_k x_k)/A_{k+1}.
    The rate of convergence: F(x_k) − F* ≤ β_d(x_0; x*)/A_k.  [Doikov-Nesterov, 2019]
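A sketch of the scheme, where inner_min is a hypothetical solver for the convex contracted subproblem, breg(v, y) evaluates β_d(v; y) for the chosen prox-function, and the coefficient choice is left to the callable a_seq.

```python
def contracting_proximal_method(x0, f, psi, breg, inner_min, a_seq, iters):
    """Contracting proximal scheme (sketch)."""
    x, v = x0, x0
    A = 0.0
    for k in range(iters):
        a = a_seq(k + 1)          # a_{k+1}
        A_next = A + a            # A_{k+1} = A_k + a_{k+1}

        def h(y, x=x, v=v, a=a, A=A, A_next=A_next):
            # contracted objective with Bregman regularization
            return A_next * f((a * y + A * x) / A_next) + a * psi(y) + breg(v, y)

        v = inner_min(h)                      # step 1: (inexact) proximal step
        x = (a * v + A * x) / A_next          # step 2: convex combination
        A = A_next
    return x
```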

15. Acceleration of Tensor Steps
    For the Tensor Method of order p ≥ 1:
    ◮ Set d(x) := (1/(p+1))‖x − x_0‖^{p+1}.
    ◮ A_{k+1} := (k+1)^{p+1}/L_p.
    For the contracted objective with regularization
        h_{k+1}(y) := A_{k+1} f( (a_{k+1} y + A_k x_k)/A_{k+1} ) + a_{k+1} ψ(y) + β_d(v_k; y),
    we compute an inexact minimizer v_{k+1}:
        h_{k+1}(v_{k+1}) − h*_{k+1} ≤ c/(k+1)^{p+2}.
    ◮ This requires Õ(1) inexact tensor steps.
    Theorem. For the outer iterations, we obtain the accelerated rate F(x_k) − F* ≤ O(1/k^{p+1}).
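With the coefficient choice A_k = k^{p+1}/L_p from this slide, the accelerated bound follows in one line from the contracting proximal rate of the previous slide:

```latex
F(x_k) - F^{*} \;\le\; \frac{\beta_d(x_0; x^{*})}{A_k}
             \;=\; \frac{L_p \, \beta_d(x_0; x^{*})}{k^{\,p+1}}
             \;=\; O\!\left(\frac{1}{k^{\,p+1}}\right).
```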

16. Plan of the talk
    1. Introduction: Tensor Methods in Convex Optimization
    2. Inexact Tensor Methods
    3. Acceleration
    4. Numerical Example

17. Log-sum-exp
        min_{x ∈ R^n} f(x) := μ log( ∑_{i=1}^{m} exp( (⟨a_i, x⟩ − b_i)/μ ) )   (SoftMax).
    ◮ a_1, ..., a_m, b — given data.
    ◮ μ > 0 — smoothing parameter.
    ◮ Denote B := ∑_{i=1}^{m} a_i a_i^T ⪰ 0, and use the norm ‖x‖ ≡ ⟨Bx, x⟩^{1/2}.
      We have L₁ ≤ 1/μ, L₂ ≤ 2/μ², L₃ ≤ 4/μ³.
    ◮ Cubic Newton (p = 2).
    ◮ Compute each step (inexactly) by the Fast Gradient Method.
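A small sketch of the test objective with its gradient and Hessian-vector product (the quantity counted on the horizontal axes of the plots that follow), assuming the data are stacked in a matrix A with rows a_i and a vector b.

```python
import numpy as np
from scipy.special import logsumexp, softmax

def f(x, A, b, mu):
    """Smoothed max (SoftMax): mu * log sum_i exp((<a_i, x> - b_i) / mu)."""
    return mu * logsumexp((A @ x - b) / mu)

def grad(x, A, b, mu):
    """Gradient: A^T pi, where pi is the softmax of the scaled residuals."""
    pi = softmax((A @ x - b) / mu)
    return A.T @ pi

def hess_vec(x, v, A, b, mu):
    """Hessian-vector product: (1/mu) * A^T (diag(pi) - pi pi^T) A v."""
    pi = softmax((A @ x - b) / mu)
    Av = A @ v
    return A.T @ (pi * Av - pi * (pi @ Av)) / mu
```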

18. Log-sum-exp: Constant strategies
    ◮ δ_k := const.
    [Figure: Log-sum-exp, μ = 0.05, constant strategies — functional residual vs. iterations and vs. Hessian-vector products.]

19. Log-sum-exp: Dynamic strategies
    ◮ δ_k := 1/k^α.
    [Figure: Log-sum-exp, μ = 0.05, dynamic strategies for several values of α — functional residual vs. iterations and vs. Hessian-vector products.]

20. Log-sum-exp: Adaptive strategies
    ◮ δ_k := (F(x_{k−1}) − F(x_k))^α.
    [Figure: Log-sum-exp, μ = 0.05, adaptive strategies (α = 1, 1.5, 2) — functional residual vs. iterations and vs. Hessian-vector products.]

21. Log-sum-exp: Cubic Newton vs. Tensor Method
    [Figure: Log-sum-exp, μ = 0.1 — functional residual vs. time (s) for Cubic Newton (p = 2), Tensor (p = 3) Exact, and Tensor (p = 3) Adaptive.]
    ◮ H is fixed.

22. Conclusion
    Inexact Tensor Methods of degree p ≥ 1:
    ◮ p = 1: Gradient Method.
    ◮ p = 2: Newton Method with Cubic regularization.
    ◮ p = 3: third-order Tensor Method.
    We allow solving the subproblem inexactly; δ_k is the accuracy in the functional residual of the subproblem.
    ◮ Dynamic strategy: δ_k := c/k^{p+1}.
    ◮ Adaptive strategy: δ_k := c (F(x_{k−2}) − F(x_{k−1})).
    Global rate of convergence: F(x_k) − F* ≤ O(1/k^p).
    ◮ Using contracting proximal iterations, we obtain the accelerated O(1/k^{p+1}) rate.
    Thank you for your attention!
