  1. Computational Optimization: Quasi-Newton Methods (2/22), NW Chapter 8

  2. Theorem 3.4: Suppose $f$ is twice continuously differentiable and the sequence of steepest-descent iterates converges to a point $x^*$ satisfying the second-order sufficient conditions (SOSC). Let $r \in \left(\frac{\lambda_n-\lambda_1}{\lambda_n+\lambda_1},\,1\right) = \left(\frac{\rho-1}{\rho+1},\,1\right)$, where $0 < \lambda_1 \le \cdots \le \lambda_n$ are the eigenvalues of $\nabla^2 f(x^*)$ and $\rho = \lambda_n/\lambda_1$. Then for all $k$ sufficiently large,
     $$f(x_{k+1}) - f(x^*) \le r^2\,\bigl[\,f(x_k) - f(x^*)\,\bigr].$$
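     A quick worked number (my own illustration, not from the slides) shows why this rate can be painfully slow when the Hessian is ill-conditioned:
     $$\rho = \frac{\lambda_n}{\lambda_1} = 100 \;\Rightarrow\; r \gtrsim \frac{99}{101} \approx 0.98, \qquad r^2 \approx 0.96,$$
     so each iteration removes only about 4% of the remaining error, and shrinking the error by a factor of $10^6$ takes roughly $\ln(10^{-6})/\ln(0.96) \approx 340$ iterations. This motivates the search for better directions on the next slide.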

  3. Choosing better directions. Steepest descent: simple and cheap per iteration, but can converge very slowly if the conditioning is bad. Modified Newton's method: expensive per iteration, but converges quickly. Goal: first-order methods with Newton-like behavior.

  4. Scaled Steepest Descent. Pick $D_k$, an approximation of the inverse of the Hessian, and iterate
     $$x_{k+1} = x_k - \alpha_k D_k \nabla f(x_k).$$
     Let $S = D_k^{1/2}$ and make the change of variables $x = Sy$. The problem becomes $\min_y h(y) = f(Sy)$.

  5. Scaled Steepest Descent (continued). Steepest descent in the $y$ variables is
     $$y_{k+1} = y_k - \alpha_k \nabla h(y_k) = y_k - \alpha_k S \nabla f(Sy_k).$$
     Multiply through by $S$:
     $$S y_{k+1} = S y_k - \alpha_k S S \nabla f(Sy_k) = S y_k - \alpha_k D \nabla f(Sy_k),$$
     i.e. $x_{k+1} = x_k - \alpha_k D \nabla f(x_k)$. Thus the convergence rate of steepest descent applies in the $y$ space. For the quadratic $f(x) = \tfrac{1}{2}x'Qx$, the transformed objective is $h(y) = f(Sy) = \tfrac{1}{2}\, y'SQS\,y$.

  6. Scaled Steepest Descent (continued). The convergence rate is governed by the eigenvalues of $SQS$: let $\lambda_1$ be the smallest and $\lambda_n$ the largest eigenvalue of $SQS$. Choose $S$ close to $Q^{-1/2}$ to make $\lambda_n/\lambda_1$ close to 1 (note that $Q^{-1/2}\,Q\,Q^{-1/2} = I$).
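     A minimal numpy sketch (my own illustration, with a made-up random $Q$) of why $S \approx Q^{-1/2}$ is the ideal scaling: with $S = Q^{-1/2}$ the scaled Hessian $SQS$ is the identity, so its condition number drops to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Q = A @ A.T + 5 * np.eye(5)        # a symmetric positive definite "Hessian"

w, V = np.linalg.eigh(Q)
S = V @ np.diag(w ** -0.5) @ V.T   # S = Q^{-1/2}

print(np.linalg.cond(Q))           # condition number of the original problem
print(np.linalg.cond(S @ Q @ S))   # ~1.0: the scaled problem is perfectly conditioned
```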

  7. Cheap Newton Approximation. Use just the diagonal of the Hessian:
     $$S = D_k = \begin{bmatrix} \left(\frac{\partial^2 f}{\partial x_1^2}\right)^{-1} & 0 & 0 \\ 0 & \left(\frac{\partial^2 f}{\partial x_2^2}\right)^{-1} & 0 \\ 0 & 0 & \left(\frac{\partial^2 f}{\partial x_3^2}\right)^{-1} \end{bmatrix}.$$
     Linear storage and computation, and the inverse is trivial, but the effectiveness is limited.
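     A hedged numpy sketch of one step with this diagonal scaling (the function name and test data are mine, not from the slides):

```python
import numpy as np

def diag_scaled_step(x, grad, hess_diag, alpha=1.0, eps=1e-8):
    """One step x_{k+1} = x_k - alpha * D_k grad f(x_k), with D_k = diag(1 / hess_diag)."""
    d = np.maximum(hess_diag, eps)      # guard against tiny or negative diagonal entries
    return x - alpha * grad / d

# On f(x) = 0.5 x'Qx with a diagonal Q the scaling is exact, so one unit step reaches
# the minimizer; on general problems it only partially fixes the conditioning.
Q = np.diag([1.0, 100.0])
x = np.array([1.0, 1.0])
print(diag_scaled_step(x, Q @ x, np.diag(Q)))   # [0., 0.]
```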

  8. Quasi-Newton Methods. Newton's method solves
     $$\nabla^2 f(x_k)\, p = -\nabla f(x_k).$$
     Instead, substitute an approximation $B_k$ and solve
     $$B_k\, p = -\nabla f(x_k)$$
     to get Newton-like directions.

  9. Better yet: estimate the Newton inverse directly. Quasi-Newton methods can maintain $H_k = B_k^{-1}$ directly, so the direction is simply $p_k = -H_k \nabla f(x_k)$, with no linear system to solve.

  10. 1-dimensional case. In 1-d one can estimate the second derivative from the change in the first derivative:
     $$f''(x_k) \approx \frac{f'(x_k) - f'(x_{k-1})}{x_k - x_{k-1}} \qquad \left(\frac{\text{change in derivative}}{\text{change in } x}\right).$$
     Using this in Newton's method gives the secant method:
     $$x_{k+1} = x_k - f'(x_k)\,\frac{x_k - x_{k-1}}{f'(x_k) - f'(x_{k-1})}.$$
     [Figure: one secant step, producing $x_{k+1}$ from $x_{k-1}$ and $x_k$.]
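     A short Python sketch of the secant iteration applied to $f'(x) = 0$ (my own code and test function, for illustration):

```python
def secant(fprime, x_prev, x_curr, tol=1e-10, max_iter=50):
    """Secant method for f'(x) = 0, using the update formula on the slide."""
    for _ in range(max_iter):
        g_prev, g_curr = fprime(x_prev), fprime(x_curr)
        if abs(g_curr) < tol or g_curr == g_prev:
            break
        # x_{k+1} = x_k - f'(x_k) (x_k - x_{k-1}) / (f'(x_k) - f'(x_{k-1}))
        x_next = x_curr - g_curr * (x_curr - x_prev) / (g_curr - g_prev)
        x_prev, x_curr = x_curr, x_next
    return x_curr

# Example: a stationary point of f(x) = x^4 - 3x^2 + x, i.e. a root of f'(x) = 4x^3 - 6x + 1
print(secant(lambda x: 4 * x**3 - 6 * x + 1, 1.0, 1.5))   # ~1.13
```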

  11. 1-d convergence. The secant method has superlinear convergence with rate
     $$r = \frac{1}{2}\left(1 + \sqrt{5}\right) \approx 1.618 \quad \text{(the "golden ratio" again!)}$$
     But the secant method only applies to 1-d problems.

  12. Secant Condition. The 1-d condition
     $$f''(x_k)(x_k - x_{k-1}) = f'(x_k) - f'(x_{k-1})$$
     generalizes to
     $$\nabla^2 f(x_k)(x_k - x_{k-1}) = \nabla f(x_k) - \nabla f(x_{k-1}).$$
     So we want
     $$B_k (x_k - x_{k-1}) = \nabla f(x_k) - \nabla f(x_{k-1}).$$

  13. Another way to think about it. Use the approximating quadratic model
     $$m_k(p) = f(x_k) + \nabla f(x_k)'p + \tfrac{1}{2}\, p' B_k p.$$
     Its gradient at $p = 0$ already matches the gradient of the current iterate: $\nabla m_k(0) = \nabla f(x_k)$. We also want it to match the gradient at the old iterate:
     $$\nabla m_k(x_{k-1} - x_k) = \nabla m_k(-\alpha_{k-1} p_{k-1}) = \nabla f(x_k) - \alpha_{k-1} B_k p_{k-1} = \nabla f(x_{k-1}).$$
     So
     $$\alpha_{k-1} B_k p_{k-1} = \nabla f(x_k) - \nabla f(x_{k-1}).$$

  14. Quadratic Case. For $\min \tfrac{1}{2}x'Qx - b'x$,
     $$\nabla f(x_k) - \nabla f(x_{k-1}) = (Qx_k - b) - (Qx_{k-1} - b) = Q(x_k - x_{k-1}).$$
     So $B$ should act like $Q$ along the direction $s_k = x_k - x_{k-1}$. Let $y_k = \nabla f(x_k) - \nabla f(x_{k-1})$. The quasi-Newton condition becomes
     $$B_{k+1} s_k = y_k.$$
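     A quick numpy check of this fact (my own toy data): for a quadratic, the gradient difference $y_k$ equals $Q s_k$ exactly, so any $B$ satisfying the quasi-Newton condition matches $Q$ along $s_k$.

```python
import numpy as np

Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad = lambda x: Q @ x - b           # gradient of 0.5 x'Qx - b'x

x_prev, x_curr = np.array([0.5, -0.2]), np.array([1.0, 0.7])
s = x_curr - x_prev
y = grad(x_curr) - grad(x_prev)
print(np.allclose(y, Q @ s))         # True
```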

  15. Choice of B. At each step we get information about $Q$ along the direction $x_k - x_{k-1}$. Use it to update our estimate of $Q$. There are many possible ways to do this and still satisfy the quasi-Newton condition.

  16. BFGS Update. Update by adding two outer-product (rank-one) matrices:
     $$B_{k+1} = B_k + \alpha\, a a' + \beta\, b b'.$$
     The quasi-Newton condition requires
     $$B_{k+1} s_k = B_k s_k + \alpha\, a a' s_k + \beta\, b b' s_k = y_k,$$
     so we make $\alpha\, a a' s_k = y_k$ and $\beta\, b b' s_k = -B_k s_k$.

  17. BFGS Update (continued). To make $\beta\, b b' s_k = -B_k s_k$, define $b = B_k s_k$. Then
     $$\beta\, b b' s_k = \beta\,(B_k s_k)(B_k s_k)' s_k = \beta\,(s_k' B_k s_k)\,(B_k s_k),$$
     so pick
     $$\beta = -\frac{1}{s_k' B_k s_k}.$$

  18. BFGS Update (continued). To make $\alpha\, a a' s_k = y_k$, define $a = y_k$. Then
     $$\alpha\, a a' s_k = \alpha\, y_k\,(y_k' s_k),$$
     so pick
     $$\alpha = \frac{1}{y_k' s_k}.$$

  19. BFGS Update (continued). The final update is
     $$B_{k+1} = B_k - \frac{(B_k s_k)(B_k s_k)'}{s_k' B_k s_k} + \frac{y_k y_k'}{y_k' s_k}.$$
     This is called the BFGS update, named for Broyden, Fletcher, Goldfarb, and Shanno.
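     A hedged numpy sketch of this update (function and variable names are mine); the sanity check confirms that the new matrix satisfies the quasi-Newton condition $B_{k+1}s_k = y_k$.

```python
import numpy as np

def bfgs_update(B, s, y):
    """B_{k+1} = B_k - (B_k s)(B_k s)' / (s' B_k s) + y y' / (y' s)."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

B = np.eye(2)
s = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])             # y's = 5 > 0, so the update is well defined
B_new = bfgs_update(B, s, y)
print(np.allclose(B_new @ s, y))     # True: the secant / quasi-Newton condition holds
```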

  20. Key Ideas. This is called a rank-2 update since it adds two rank-one matrices. We want $B_k$ to be positive definite and symmetric, and we want to solve
     $$B_k p_k = -\nabla f(x_k)$$
     efficiently. Two possible ways.

  21. Descent directions. We need $B$ to be positive definite. A necessary condition is the curvature condition:
     $$B_{k+1} s_k = y_k \;\Rightarrow\; s_k' B_{k+1} s_k = s_k' y_k > 0.$$
     For general functions, enforce this using the Wolfe or strong Wolfe conditions.

  22. Wolfe Conditions. For $0 < c_1 < c_2 < 1$:
     $$f(x_{k+1}) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)' p_k$$
     $$\nabla f(x_{k+1})' p_k \ge c_2 \nabla f(x_k)' p_k$$
     The second condition implies $\nabla f(x_{k+1})' s_k \ge c_2 \nabla f(x_k)' s_k$, so
     $$y_k' s_k = \bigl(\nabla f(x_{k+1}) - \nabla f(x_k)\bigr)' s_k \ge (c_2 - 1)\,\nabla f(x_k)' s_k = (c_2 - 1)\,\alpha_k \nabla f(x_k)' p_k > 0.$$
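     A small Python helper (names are my own, not from the slides) that checks both Wolfe conditions for a trial step length; a line-search routine would adjust $\alpha$ until this returns True, which by the argument above forces $y_k' s_k > 0$.

```python
import numpy as np

def satisfies_wolfe(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """True if step length alpha meets the (weak) Wolfe conditions at x along p."""
    g0 = grad(x) @ p                     # directional derivative, < 0 for a descent direction
    x_new = x + alpha * p
    sufficient_decrease = f(x_new) <= f(x) + c1 * alpha * g0
    curvature = grad(x_new) @ p >= c2 * g0
    return sufficient_decrease and curvature
```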

  23. Guaranteeing B p.d. and symmetric. Lemma 11.5 in Nash and Sofer: if $B_k$ is p.d. and symmetric, then $B_{k+1}$ is p.d. if and only if $y_k' s_k > 0$. So enforce this condition in the line-search procedure using the Wolfe conditions:
     $$\bigl[\nabla f(x_k) - \nabla f(x_{k-1})\bigr]'\bigl[x_k - x_{k-1}\bigr] > 0.$$

  24. Quasi-Newton Algorithm with BFGS update.
     Start with $x_0$ and $B_0$ (e.g. $B_0 = I$). For $k = 0, 1, \dots, K$:
     - If $x_k$ is optimal, then stop.
     - Solve $B_k p_k = -\nabla f(x_k)$ using a modified Cholesky factorization.
     - Perform a line search satisfying the Wolfe conditions: $x_{k+1} = x_k + \alpha_k p_k$.
     - Update $s_k = x_{k+1} - x_k$, $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$, and
       $$B_{k+1} = B_k - \frac{(B_k s_k)(B_k s_k)'}{s_k' B_k s_k} + \frac{y_k y_k'}{y_k' s_k}.$$
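     The loop above, written as a hedged numpy sketch. To stay short it uses simple Armijo backtracking plus a skip rule when $y_k' s_k \le 0$, rather than a full Wolfe line search and modified Cholesky solve; all names and the Rosenbrock test problem are my own choices.

```python
import numpy as np

def bfgs_minimize(f, grad, x0, tol=1e-6, max_iter=200):
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                            # B_0 = I
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:               # "if x_k is optimal then stop"
            break
        p = np.linalg.solve(B, -g)                # solve B_k p_k = -grad f(x_k)
        alpha = 1.0                               # Armijo backtracking line search
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p) and alpha > 1e-12:
            alpha *= 0.5
        x_new = x + alpha * p
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        if y @ s > 1e-12:                         # curvature condition: keeps B p.d.
            Bs = B @ s
            B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
        x, g = x_new, g_new
    return x

# Example (hypothetical test problem): the Rosenbrock function, minimizer at (1, 1)
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
print(bfgs_minimize(f, grad, [-1.2, 1.0]))        # should approach [1, 1]
```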

  25. Add a Wolfe condition to the linesearch. The Wolfe curvature condition is an approximation to the optimality condition for an exact line search:
     $$\min_\alpha f(x_k + \alpha p_k) = g(\alpha), \qquad \text{optimality: } g'(\alpha) = p_k' \nabla f(x_k + \alpha p_k) = 0.$$
     We want $|p_k' \nabla f(x_k + \alpha p_k)| \le \eta\, |p_k' \nabla f(x_k)|$ for some $0 < \eta < 1$, used together with the Armijo (sufficient decrease) condition.

  26. Theorem 8.5 (global convergence). Assume we start with a symmetric positive definite $B_0$, $f$ is twice continuously differentiable, the level set $\{x : f(x) \le f(x_0)\}$ is convex, and the eigenvalues of the Hessian on that level set are bounded and strictly positive. Then BFGS converges to the minimizer of $f$.

  27. Theorem 8.6. Assume the BFGS iterates converge to $x^*$ and the Hessian is Lipschitz continuous in a neighborhood of $x^*$. Then the quasi-Newton BFGS algorithm converges superlinearly.

  28. Easy update of the Cholesky factorization. With $B_k = LL'$, we want $B_{k+1} = \tilde{L}\tilde{L}'$ without refactorizing the whole matrix each time; only a much simpler matrix needs factoring.
     $$B_{k+1} = B_k - \frac{(B_k s_k)(B_k s_k)'}{s_k' B_k s_k} + \frac{y_k y_k'}{y_k' s_k} = LL' - \frac{(LL' s_k)(LL' s_k)'}{s_k' LL' s_k} + \frac{y_k y_k'}{y_k' s_k}$$
     $$= L\left(I - \frac{\hat{s}\hat{s}'}{\hat{s}'\hat{s}} + \frac{\hat{y}\hat{y}'}{y_k' s_k}\right)L', \qquad \text{where } \hat{s} = L' s_k, \;\; L\hat{y} = y_k$$
     $$= L\bar{L}\bar{L}'L', \qquad \text{where } \bar{L}\bar{L}' \text{ is the Cholesky factorization of the inner matrix.}$$
     The whole update costs $O(n^2)$.
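     A numpy check (my own illustration, with made-up data) that the bracketed inner-matrix form above really reproduces the BFGS update of $B_k = LL'$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
L = np.tril(rng.standard_normal((n, n))) + n * np.eye(n)   # Cholesky-type factor of B_k
B = L @ L.T
s = rng.standard_normal(n)
y = B @ s + 0.1 * rng.standard_normal(n)                    # keeps y's > 0 here

Bs = B @ s
B_next = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

s_hat = L.T @ s                                  # s-hat = L' s_k
y_hat = np.linalg.solve(L, y)                    # solve L y-hat = y_k
inner = (np.eye(n) - np.outer(s_hat, s_hat) / (s_hat @ s_hat)
         + np.outer(y_hat, y_hat) / (y @ s))
print(np.allclose(B_next, L @ inner @ L.T))      # True
```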

  29. Practical considerations (see pages 200-201). Line searches that don't satisfy the Wolfe conditions may not satisfy the curvature condition; then $B_{k+1}$ may not be positive definite, the next direction may not be a descent direction, and some kind of recovery strategy is needed (the book suggests damped Newton). One can also eliminate solving the Newton equation altogether, as follows.

  30. Calculating H. We want $H_k = B_k^{-1}$ directly:
     $$H_{k+1} = (I - \rho_k s_k y_k')\, H_k\, (I - \rho_k y_k s_k') + \rho_k s_k s_k', \qquad \rho_k = \frac{1}{y_k' s_k}.$$
     The book shows a derivation of this formula directly.
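     A hedged numpy sketch of the inverse-form update (names are mine); with $H_k$ available, the search direction is just a matrix-vector product $p_k = -H_k \nabla f(x_k)$, with no linear solve.

```python
import numpy as np

def inverse_bfgs_update(H, s, y):
    """H_{k+1} = (I - rho s y') H (I - rho y s') + rho s s',  rho = 1 / (y's)."""
    rho = 1.0 / (y @ s)
    V = np.eye(H.shape[0]) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

H = np.eye(2)
s, y = np.array([1.0, 2.0]), np.array([3.0, 1.0])
H = inverse_bfgs_update(H, s, y)
print(np.allclose(H @ y, s))     # True: H_{k+1} satisfies the secant condition H y = s
```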

  31. Finding H. We want $H_{k+1} y_k = s_k$ with $H_{k+1}$ as close as possible to $H_k$:
     $$\min_H \; \|H - H_k\| \quad \text{subject to } H = H', \;\; H y_k = s_k.$$
     One can go back and forth between $H$ and $B$ using the Sherman-Morrison-Woodbury formula (see page 605).
