
Algorithms for unconstrained local optimization (Fabio Schoen, 2008)



  1. Algorithms for unconstrained local optimization. Fabio Schoen, 2008. http://gol.dsi.unifi.it/users/schoen

  2. Optimization Algorithms. Most common form for optimization algorithms: line search-based methods. Given a starting point $x_0$, a sequence is generated: $x_{k+1} = x_k + \alpha_k d_k$, where $d_k \in \mathbb{R}^n$ is the search direction and $\alpha_k > 0$ is the step. Usually $d_k$ is chosen first and then the step is obtained, often from a one-dimensional optimization.
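
  The scheme above can be written as a short driver loop. The sketch below is illustrative and not part of the original slides; `choose_direction` and `choose_step` are hypothetical callables standing in for the two choices the slide mentions (direction first, then a one-dimensional search for the step).

  ```python
  import numpy as np

  # Generic line-search scheme x_{k+1} = x_k + alpha_k * d_k (illustrative sketch).
  def line_search_method(grad_f, choose_direction, choose_step, x0,
                         tol=1e-8, max_iter=1000):
      x = np.asarray(x0, dtype=float)
      for _ in range(max_iter):
          g = grad_f(x)
          if np.linalg.norm(g) <= tol:      # approximate stationarity
              break
          d = choose_direction(x, g)        # search direction d_k
          alpha = choose_step(x, d)         # step alpha_k > 0, often from a 1-D search
          x = x + alpha * d
      return x

  # Example use: steepest descent with a crude constant step on f(x) = 0.5 ||x||^2.
  sol = line_search_method(grad_f=lambda x: x,
                           choose_direction=lambda x, g: -g,
                           choose_step=lambda x, d: 0.5,
                           x0=[4.0, -1.0])
  print(sol)   # close to the minimizer [0, 0]
  ```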

  3. Trust-region algorithms. A model $m(x)$ and a confidence region $U(x_k)$ containing $x_k$ are defined. The new iterate is chosen as the solution of the constrained optimization problem $\min_{x \in U(x_k)} m(x)$. The model and the confidence region are possibly updated at each iteration.
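
  As an illustration (not from the slides): when the model is the quadratic $m(x_k + p) = f(x_k) + g^T p + \frac{1}{2} p^T B p$ and the region is a ball of radius $\Delta$ around $x_k$, one standard cheap approximate solution of the subproblem is the Cauchy point. The names `g`, `B`, `delta` below are assumptions of this sketch, not notation from the slides.

  ```python
  import numpy as np

  def cauchy_point(g, B, delta):
      """Approximate minimizer of g^T p + 0.5 p^T B p over ||p|| <= delta,
      obtained along the steepest descent direction -g (the Cauchy point)."""
      gnorm = np.linalg.norm(g)
      if gnorm == 0.0:                 # model already stationary
          return np.zeros_like(g)
      gBg = g @ B @ g
      if gBg <= 0.0:                   # model not convex along -g: go to the boundary
          tau = 1.0
      else:
          tau = min(1.0, gnorm ** 3 / (delta * gBg))
      return -tau * (delta / gnorm) * g

  # Illustrative data: gradient g, Hessian approximation B, trust radius delta.
  g = np.array([4.0, -2.0])
  B = np.array([[2.0, 0.0], [0.0, 1.0]])
  print(cauchy_point(g, B, delta=0.5))   # a step on the boundary of the region
  ```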

  4. Speed measures. Let $x^\star$ be a local optimum. The error at $x_k$ may be measured, e.g., as $e(x_k) = \|x_k - x^\star\|$ or $e(x_k) = |f(x_k) - f(x^\star)|$. Given $\{x_k\} \to x^\star$, if there exist $q > 0$ and $\beta \in (0,1)$ such that (for $k$ large enough) $e(x_k) \le q \beta^k$, then $\{x_k\}$ is linearly convergent, or converges with order 1; $\beta$ is the convergence rate. A sufficient condition for linear convergence: $\limsup_{k \to \infty} e(x_{k+1})/e(x_k) \le \beta$ for some $\beta \in (0,1)$.

  5. Super-linear convergence. If for every $\beta \in (0,1)$ there exists $q$ such that $e(x_k) \le q \beta^k$, then convergence is super-linear. Sufficient condition: $\limsup_{k \to \infty} e(x_{k+1})/e(x_k) = 0$.

  6. Higher order convergence. If, given $p > 1$, there exist $q > 0$ and $\beta \in (0,1)$ such that $e(x_k) \le q \beta^{(p^k)}$, then $\{x_k\}$ is said to converge with order at least $p$. If $p = 2$ the convergence is quadratic. Sufficient condition: $\limsup_{k \to \infty} e(x_{k+1})/e(x_k)^p < \infty$.

  7.-11. Examples (built up incrementally over slides 7-11): $1/k$ converges to 0 with order 1 (linear convergence); $1/k^2$ converges to 0 with order 1; $2^{-k}$ converges to 0 with order 1; $k^{-k}$ converges to 0 with order 1, and the convergence is super-linear; $1/2^{2^k}$ converges to 0 with order 2 (quadratic convergence).
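
  A small script (an illustrative addition, not from the slides) can tabulate the ratios $e(x_{k+1})/e(x_k)$ and $e(x_{k+1})/e(x_k)^2$ from the sufficient conditions above for these example sequences.

  ```python
  import numpy as np

  # Ratios from the sufficient conditions: e_{k+1}/e_k (order 1) and
  # e_{k+1}/e_k^2 (order 2) for the example sequences above.
  ks = np.arange(1.0, 10.0)
  sequences = {
      "1/k":        1.0 / ks,
      "1/k^2":      1.0 / ks ** 2,
      "2^(-k)":     2.0 ** (-ks),
      "k^(-k)":     ks ** (-ks),
      "1/2^(2^k)":  1.0 / 2.0 ** (2.0 ** ks),
  }
  for name, e in sequences.items():
      r1 = e[1:] / e[:-1]          # order-1 ratio: limsup below 1 indicates linear convergence
      r2 = e[1:] / e[:-1] ** 2     # order-2 ratio: a bounded limsup indicates order at least 2
      print(f"{name:>10s}: e_k+1/e_k -> {r1[-1]:.3g},  e_k+1/e_k^2 -> {r2[-1]:.3g}")
  ```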

  12. Descent directions and the gradient. Let $f \in C^1(\mathbb{R}^n)$ and $x_k \in \mathbb{R}^n$ with $\nabla f(x_k) \neq 0$, and let $d \in \mathbb{R}^n$. If $d^T \nabla f(x_k) < 0$ then $d$ is a descent direction. Taylor expansion: $f(x_k + \alpha d) - f(x_k) = \alpha\, d^T \nabla f(x_k) + o(\alpha)$, i.e. $(f(x_k + \alpha d) - f(x_k))/\alpha = d^T \nabla f(x_k) + o(1)$. Thus, if $\alpha$ is small enough, $f(x_k + \alpha d) - f(x_k) < 0$. NB: $d$ might be a descent direction even if $d^T \nabla f(x_k) = 0$.
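
  A quick numerical illustration (not from the slides), using an arbitrary smooth $f$ and an arbitrary direction $d$: when $d^T \nabla f(x_k) < 0$, the difference $f(x_k + \alpha d) - f(x_k)$ becomes negative for small enough $\alpha$.

  ```python
  import numpy as np

  # If d^T grad f(x_k) < 0, then d is a descent direction and
  # f(x_k + alpha d) - f(x_k) < 0 for alpha small enough.
  def f(x):
      return 0.5 * x @ x + np.sin(x[0])

  def grad_f(x):
      return x + np.array([np.cos(x[0]), 0.0])

  x_k = np.array([1.0, 2.0])
  d = np.array([-1.0, -1.0])

  print("d^T grad f(x_k) =", d @ grad_f(x_k))        # negative here
  for alpha in (1e-1, 1e-2, 1e-3):
      print(alpha, f(x_k + alpha * d) - f(x_k))      # negative for these alpha
  ```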

  13. Convergence of line search methods. Suppose a sequence $x_{k+1} = x_k + \alpha_k d_k$ is generated in such a way that: $L_0 = \{x : f(x) \le f(x_0)\}$ is compact; $d_k \neq 0$ whenever $\nabla f(x_k) \neq 0$; $f(x_{k+1}) \le f(x_k)$ if $\nabla f(x_k) \neq 0$, for all $k$; and $\lim_{k \to \infty} d_k^T \nabla f(x_k)/\|d_k\| = 0$;

  14. and, if $d_k \neq 0$, $|d_k^T \nabla f(x_k)| \ge \sigma(\|\nabla f(x_k)\|)\, \|d_k\|$, where $\sigma$ is such that $\lim_{k \to \infty} \sigma(t_k) = 0 \Rightarrow \lim_{k \to \infty} t_k = 0$ ($\sigma$ is called a forcing function).

  15. Then either there exists a finite index $\bar{k}$ such that $\nabla f(x_{\bar{k}}) = 0$, or otherwise: $x_k \in L_0$ and all of its limit points are in $L_0$; $\{f(x_k)\}$ admits a limit; $\lim_{k \to \infty} \nabla f(x_k) = 0$; for every limit point $\bar{x}$ of $\{x_k\}$ we have $\nabla f(\bar{x}) = 0$.

  16. Comments on the assumptions. $f(x_{k+1}) \le f(x_k)$: most optimization methods choose $d_k$ as a descent direction; if $d_k$ is a descent direction, choosing $\alpha_k$ "sufficiently small" ensures the validity of the assumption. $\lim_{k \to \infty} d_k^T \nabla f(x_k)/\|d_k\| = 0$: for a normalized direction $d_k$, the scalar product $d_k^T \nabla f(x_k)$ is the directional derivative of $f$ along $d_k$; it is required that this goes to zero, which can be achieved through exact line searches (choosing the step so that $f$ is minimized along $d_k$). $|d_k^T \nabla f(x_k)| \ge \sigma(\|\nabla f(x_k)\|)\, \|d_k\|$: letting, e.g., $\sigma(t) = ct$ with $c > 0$, if $d_k$ is such that $d_k^T \nabla f(x_k) < 0$, the condition becomes $d_k^T \nabla f(x_k)/(\|d_k\|\, \|\nabla f(x_k)\|) \le -c$.

  17. Recalling that $\cos\theta_k = d_k^T \nabla f(x_k)/(\|d_k\|\, \|\nabla f(x_k)\|)$, the condition becomes $\cos\theta_k \le -c$, that is, the angle between $d_k$ and $\nabla f(x_k)$ is bounded away from orthogonality. (Figure: the angle $\theta_k$ between $d_k$ and $\nabla f(x_k)$.)

  18. Gradient Algorithms. General scheme: $x_{k+1} = x_k - \alpha_k D_k \nabla f(x_k)$, with $D_k \succ 0$ and $\alpha_k > 0$. If $\nabla f(x_k) \neq 0$ then $d_k = -D_k \nabla f(x_k)$ is a descent direction; in fact $d_k^T \nabla f(x_k) = -\nabla^T f(x_k) D_k \nabla f(x_k) < 0$.

  19. Steepest Descent, or "gradient" method: $D_k := I$, i.e. $x_{k+1} = x_k - \alpha_k \nabla f(x_k)$. If $\nabla f(x_k) \neq 0$ then $d_k = -\nabla f(x_k)$ is a descent direction. Moreover, it is the steepest one (w.r.t. the Euclidean norm): it solves $\min_{d \in \mathbb{R}^n,\ \|d\| \le 1} \nabla^T f(x_k)\, d$.
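
  A numerical sanity check (an illustrative addition): among unit-norm directions, $-\nabla f(x_k)/\|\nabla f(x_k)\|$ attains the smallest directional derivative. The vector `g` below is a stand-in for $\nabla f(x_k)$.

  ```python
  import numpy as np

  # Among unit-norm directions d, grad^T f(x_k) d is minimized by d = -g/||g||.
  rng = np.random.default_rng(0)
  g = np.array([3.0, -1.0, 2.0])                        # stand-in for grad f(x_k)

  D = rng.normal(size=(10000, 3))
  D /= np.linalg.norm(D, axis=1, keepdims=True)         # random unit directions
  print("best random direction :", (D @ g).min())       # close to -||g||
  print("steepest descent value:", -np.linalg.norm(g))  # exactly -||g||
  ```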

  20. (Figure: $\nabla f(x_k)$.)

  21. (continued) $\min_{d \in \mathbb{R}^n,\ \sqrt{d^T d} \le 1} \nabla^T f(x_k)\, d$. KKT conditions: in the interior, $\nabla f(x_k) = 0$; if the constraint is active, $\nabla f(x_k) + \lambda\, d/\|d\| = 0$, $\sqrt{d^T d} = 1$, $\lambda \ge 0$, which gives $d = -\nabla f(x_k)/\|\nabla f(x_k)\|$.

  22. Newton's method: $D_k := [\nabla^2 f(x_k)]^{-1}$. Motivation: Taylor expansion of $f$: $f(x) \approx f(x_k) + \nabla^T f(x_k)(x - x_k) + \frac{1}{2}(x - x_k)^T \nabla^2 f(x_k)(x - x_k)$. Minimizing the approximation: $\nabla f(x_k) + \nabla^2 f(x_k)(x - x_k) = 0$. If the Hessian is nonsingular, $x = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$.
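
  A minimal sketch of the resulting (pure) Newton iteration, assuming the Hessian is available; the Newton system is solved rather than the inverse formed explicitly, and `grad_f`, `hess_f` are caller-supplied names, not notation from the slides.

  ```python
  import numpy as np

  # Pure Newton iteration x_{k+1} = x_k - [hess f(x_k)]^{-1} grad f(x_k).
  def newton(grad_f, hess_f, x0, tol=1e-10, max_iter=50):
      x = np.asarray(x0, dtype=float)
      for _ in range(max_iter):
          g = grad_f(x)
          if np.linalg.norm(g) <= tol:
              break
          x = x + np.linalg.solve(hess_f(x), -g)   # Newton step with alpha = 1
      return x

  # On a strictly convex quadratic, one Newton step reaches the minimizer -Q^{-1} b.
  Q = np.array([[3.0, 1.0], [1.0, 2.0]])
  b = np.array([1.0, -1.0])
  print(newton(lambda x: Q @ x + b, lambda x: Q, [5.0, 5.0]))   # approx [-0.6, 0.8]
  ```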

  23. Step choice. Given $d_k$, how should $\alpha_k$ be chosen in $x_{k+1} = x_k + \alpha_k d_k$? The "optimal" choice (one-dimensional optimization) is $\alpha_k = \arg\min_{\alpha \ge 0} f(x_k + \alpha d_k)$. An analytical expression of the optimal step is available only in a few cases, e.g. if $f(x) = \frac{1}{2} x^T Q x + c^T x$ with $Q \succ 0$. Then $f(x_k + \alpha d_k) = \frac{1}{2}(x_k + \alpha d_k)^T Q (x_k + \alpha d_k) + c^T (x_k + \alpha d_k) = \frac{1}{2}\alpha^2 d_k^T Q d_k + \alpha (Q x_k + c)^T d_k + \beta$, where $\beta$ does not depend on $\alpha$.

  24. Minimizing w.r.t. $\alpha$: $\alpha\, d_k^T Q d_k + (Q x_k + c)^T d_k = 0 \Rightarrow \alpha = -\frac{(Q x_k + c)^T d_k}{d_k^T Q d_k} = -\frac{d_k^T \nabla f(x_k)}{d_k^T \nabla^2 f(x_k) d_k}$. E.g., in steepest descent: $\alpha_k = \frac{\|\nabla f(x_k)\|^2}{\nabla^T f(x_k)\, \nabla^2 f(x_k)\, \nabla f(x_k)}$.
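
  A short sketch applying the exact step to steepest descent on a convex quadratic; the matrix `Q` and vector `c` below are arbitrary illustrative data, with `Q` ill-conditioned on purpose so the characteristic zig-zagging slow convergence is visible.

  ```python
  import numpy as np

  # Steepest descent with the exact step alpha_k = ||g||^2 / (g^T Q g)
  # on f(x) = 0.5 x^T Q x + c^T x.
  Q = np.array([[10.0, 0.0], [0.0, 1.0]])
  c = np.array([-10.0, -1.0])

  x = np.array([0.0, 0.0])
  for _ in range(100):
      g = Q @ x + c                      # grad f(x)
      alpha = (g @ g) / (g @ Q @ g)      # exact one-dimensional minimizer
      x = x - alpha * g
  print(x, np.linalg.solve(Q, -c))       # iterate vs. exact minimizer [1, 1]
  ```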

  25. Approximate step size. Rules for choosing a step size (from the sufficient conditions for convergence): $f(x_{k+1}) < f(x_k)$ and $\lim_{k \to \infty} d_k^T \nabla f(x_k)/\|d_k\| = 0$. Often it is also required that $\|x_{k+1} - x_k\| \to 0$ and $d_k^T \nabla f(x_k + \alpha_k d_k) \to 0$. In general it is important to ensure a sufficient reduction of $f$ and a sufficiently large step $x_{k+1} - x_k$.

  26. Avoid too large steps (figure).

  27. Avoid too small steps (figure).

  28. Armijo's rule. Input: $\delta \in (0,1)$, $\gamma \in (0, 1/2)$, $\Delta_k > 0$. Set $\alpha := \Delta_k$; while $f(x_k + \alpha d_k) > f(x_k) + \gamma \alpha\, d_k^T \nabla f(x_k)$ do $\alpha := \delta \alpha$; return $\alpha$. Typical values: $\delta \in [0.1, 0.5]$, $\gamma \in [10^{-4}, 10^{-3}]$. On exit the returned step satisfies $f(x_k + \alpha d_k) \le f(x_k) + \gamma \alpha\, d_k^T \nabla f(x_k)$.
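
  A direct transcription of the rule into code (a sketch; the caller is assumed to supply $f$, $x_k$, $d_k$ and $\nabla f(x_k)$, and $d_k$ is assumed to be a descent direction so the loop terminates).

  ```python
  import numpy as np

  # Armijo backtracking: shrink alpha by delta until the sufficient-decrease
  # condition f(x_k + alpha d_k) <= f(x_k) + gamma alpha d_k^T grad f(x_k) holds.
  def armijo_step(f, x_k, d_k, grad_fk, delta=0.5, gamma=1e-4, delta_k=1.0):
      slope = gamma * (d_k @ grad_fk)    # gamma * d_k^T grad f(x_k), negative for descent d_k
      alpha = delta_k
      while f(x_k + alpha * d_k) > f(x_k) + alpha * slope:
          alpha *= delta
      return alpha

  # Usage on a simple quadratic with the steepest-descent direction:
  f = lambda x: 0.5 * x @ x
  x_k = np.array([2.0, -3.0])
  g = x_k                                # grad f(x) = x
  alpha = armijo_step(f, x_k, -g, g)
  print(alpha, f(x_k + alpha * (-g)))    # the returned step satisfies the condition
  ```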

  29. Acceptable steps (figure: the acceptable values of $\alpha$, with the lines $\gamma \alpha\, d_k^T \nabla f(x_k)$ and $\alpha\, d_k^T \nabla f(x_k)$).

  30. Line search in practice. How should the initial step size $\Delta_k$ be chosen? Let $\varphi(\alpha) = f(x_k + \alpha d_k)$. A possibility is to choose $\Delta_k = \alpha^\star$, the minimizer of a quadratic approximation to $\varphi(\cdot)$. Example: $q(\alpha) = c_0 + c_1 \alpha + \frac{1}{2} c_2 \alpha^2$, with $q(0) = c_0 := f(x_k)$ and $q'(0) = c_1 := d_k^T \nabla f(x_k)$. Then $\alpha^\star = -c_1/c_2$.
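
  A sketch of this choice in code. The slide fixes $c_0$ and $c_1$; how $c_2$ is obtained is not specified in the text above, so the version below assumes (as one possibility) that it is fitted from one extra evaluation of $\varphi$ at a trial step.

  ```python
  import numpy as np

  # Fit q(alpha) = c0 + c1*alpha + 0.5*c2*alpha^2 with q(0) = phi(0),
  # q'(0) = d_k^T grad f(x_k), and c2 from one extra value phi(alpha_try).
  def initial_step(phi0, slope0, phi_try, alpha_try):
      c0, c1 = phi0, slope0
      c2 = 2.0 * (phi_try - c0 - c1 * alpha_try) / alpha_try ** 2
      if c2 <= 0:              # model has no minimizer: fall back to the trial step
          return alpha_try
      return -c1 / c2          # minimizer of the quadratic model

  # Usage with phi(alpha) = f(x_k + alpha d_k) for f(x) = 0.5 x^T x,
  # x_k = [2, 0], d_k = -grad f(x_k) = [-2, 0]; the exact minimizer is alpha = 1.
  phi = lambda a: 0.5 * np.sum((np.array([2.0, 0.0]) + a * np.array([-2.0, 0.0])) ** 2)
  print(initial_step(phi(0.0), -4.0, phi(0.5), 0.5))   # prints 1.0
  ```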
