  1. Reduced-Hessian Methods for Constrained Optimization
  Philip E. Gill, University of California, San Diego
  Joint work with: Michael Ferry & Elizabeth Wong
  11th US & Mexico Workshop on Optimization and its Applications, Huatulco, Mexico, January 8–12, 2018

  2. Our honoree . . .

  3. Outline
  1. Reduced-Hessian Methods for Unconstrained Optimization
  2. Bound-Constrained Optimization
  3. Quasi-Wolfe Line Search
  4. Reduced-Hessian Methods for Bound-Constrained Optimization
  5. Some Numerical Results

  4. Reduced-Hessian Methods for Unconstrained Optimization

  5. Definitions
Minimize f : R^n → R, f ∈ C^2, with a quasi-Newton line-search method.
Given x_k, let f_k = f(x_k), g_k = ∇f(x_k), and H_k ≈ ∇²f(x_k).
Choose p_k such that x_k + p_k minimizes the quadratic model
$$ q_k(x) = f_k + g_k^T (x - x_k) + \tfrac{1}{2} (x - x_k)^T H_k (x - x_k). $$
If H_k is positive definite, then p_k satisfies
$$ H_k p_k = -g_k. \qquad \text{(qN step)} $$
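
As a concrete illustration of the (qN step), the following sketch solves H_k p_k = −g_k via a Cholesky factorization. The function name and the use of NumPy are illustrative and not taken from the talk.

```python
import numpy as np

def quasi_newton_step(H_k, g_k):
    """Minimize the quadratic model q_k: with H_k positive definite,
    the minimizer x_k + p_k satisfies H_k p_k = -g_k."""
    L = np.linalg.cholesky(H_k)        # H_k = L L^T
    w = np.linalg.solve(L, -g_k)       # forward solve  L w = -g_k
    return np.linalg.solve(L.T, w)     # back solve     L^T p_k = w
```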

  6. Definitions
Define x_{k+1} = x_k + α_k p_k, where α_k is obtained from a line search on φ_k(α) = f(x_k + α p_k).
• Armijo condition:  φ_k(α) < φ_k(0) + η_A α φ'_k(0),  with η_A ∈ (0, 1/2).
• (Strong) Wolfe conditions:  φ_k(α) < φ_k(0) + η_A α φ'_k(0),  η_A ∈ (0, 1/2), and
  |φ'_k(α)| ≤ η_W |φ'_k(0)|,  η_W ∈ (η_A, 1).
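
A minimal sketch of the two acceptance tests, written as stand-alone predicates; the default parameter values are conventional choices, not values from the slides.

```python
def armijo(phi0, dphi0, phi_a, alpha, eta_A=1e-4):
    """Sufficient decrease: phi(alpha) < phi(0) + eta_A * alpha * phi'(0)."""
    return phi_a < phi0 + eta_A * alpha * dphi0

def strong_wolfe(phi0, dphi0, phi_a, dphi_a, alpha, eta_A=1e-4, eta_W=0.9):
    """Armijo plus the curvature test |phi'(alpha)| <= eta_W * |phi'(0)|."""
    return armijo(phi0, dphi0, phi_a, alpha, eta_A) and abs(dphi_a) <= eta_W * abs(dphi0)
```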

  7. [Figure: φ_k(α) = f(x_k + α p_k) plotted against α, illustrating the step lengths accepted by the Wolfe conditions.]

  8. Quasi-Newton Methods
Updating H_k:
• H_0 = σ I_n with σ > 0.
• Compute H_{k+1} as the BFGS update to H_k, i.e.,
$$ H_{k+1} = H_k - \frac{1}{s_k^T H_k s_k} H_k s_k s_k^T H_k + \frac{1}{y_k^T s_k} y_k y_k^T, $$
where s_k = x_{k+1} - x_k, y_k = g_{k+1} - g_k, and y_k^T s_k approximates the curvature of f along p_k.
• The Wolfe condition guarantees that y_k^T s_k > 0, so H_k can be updated (and remains positive definite).
One option to calculate p_k:
• Store the upper-triangular Cholesky factor R_k, where R_k^T R_k = H_k.
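
A dense-matrix sketch of the BFGS update above. The curvature guard mirrors the role of the Wolfe condition; the function name is illustrative.

```python
import numpy as np

def bfgs_update(H, s, y):
    """Return the BFGS update of H, using s = x_{k+1} - x_k and y = g_{k+1} - g_k."""
    Hs = H @ s
    curvature = y @ s                  # y^T s > 0 under the Wolfe conditions
    if curvature <= 0.0:
        return H                       # skip the update if the curvature test fails
    return H - np.outer(Hs, Hs) / (s @ Hs) + np.outer(y, y) / curvature
```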

  10. Reduced-Hessian Methods (Fenelon, 1981, and Siegel, 1992)
Let G_k = span(g_0, g_1, ..., g_k) and let G_k^⊥ be the orthogonal complement of G_k in R^n.
Result. Consider a quasi-Newton method with the BFGS update applied to a general nonlinear function. If H_0 = σ I (σ > 0), then:
• p_k ∈ G_k for all k;
• if z ∈ G_k and w ∈ G_k^⊥, then H_k z ∈ G_k and H_k w = σ w.
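
The result can be checked numerically on a small example. The sketch below runs a few BFGS iterations with H_0 = σI on a hypothetical convex quadratic (a fixed step length is used, so y_k^T s_k > 0 automatically) and verifies both bullets; all names, sizes, and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, alpha = 8, 2.0, 0.1                 # illustrative dimensions and constants

# Hypothetical convex quadratic: f(x) = 1/2 x^T A x + b^T x, gradient g(x) = A x + b.
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                   # symmetric positive definite
b = rng.standard_normal(n)

def grad(x):
    return A @ x + b

x = rng.standard_normal(n)
H = sigma * np.eye(n)                         # H_0 = sigma * I
gradients = [grad(x)]

for k in range(3):                            # a few BFGS iterations, fixed step length
    g = gradients[-1]
    p = np.linalg.solve(H, -g)
    Z, _ = np.linalg.qr(np.column_stack(gradients))
    assert np.allclose(p, Z @ (Z.T @ p))      # p_k lies in G_k = span(g_0, ..., g_k)
    s = alpha * p
    x = x + s
    gradients.append(grad(x))
    y = gradients[-1] - gradients[-2]
    H = H - np.outer(H @ s, H @ s) / (s @ H @ s) + np.outer(y, y) / (y @ s)

Z, _ = np.linalg.qr(np.column_stack(gradients))
w = rng.standard_normal(n)
w -= Z @ (Z.T @ w)                            # project w onto G_k^perp
assert np.allclose(H @ w, sigma * w)          # H_k acts as sigma * I on G_k^perp
```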

  14. Reduced-Hessian Methods
Significance of p_k ∈ G_k:
• No need to minimize the quadratic model over the full space.
• Search directions lie in an expanding sequence of subspaces.
Significance of H_k z ∈ G_k and H_k w = σ w:
• The curvature stored in H_k along any unit vector in G_k^⊥ is σ.
• All nontrivial curvature information in H_k can be stored in a smaller r_k × r_k matrix, where r_k = dim(G_k).

  16. Reduced-Hessian Methods
Given a matrix B_k ∈ R^{n×r_k} whose columns span G_k, let
• B_k = Z_k T_k be the QR decomposition of B_k;
• W_k be a matrix whose orthonormal columns span G_k^⊥;
• Q_k = ( Z_k  W_k ).
Then H_k p_k = −g_k  ⇔  (Q_k^T H_k Q_k) Q_k^T p_k = −Q_k^T g_k, where
$$ Q_k^T H_k Q_k = \begin{pmatrix} Z_k^T H_k Z_k & Z_k^T H_k W_k \\ W_k^T H_k Z_k & W_k^T H_k W_k \end{pmatrix} = \begin{pmatrix} Z_k^T H_k Z_k & 0 \\ 0 & \sigma I_{n-r_k} \end{pmatrix}, \qquad Q_k^T g_k = \begin{pmatrix} Z_k^T g_k \\ 0 \end{pmatrix}. $$
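
The construction of Z_k, W_k and the reduced quantities can be sketched directly with a full QR factorization. The helper below is illustrative only; an actual implementation would maintain and update these factors rather than recompute them.

```python
import numpy as np

def reduced_system(B, H, g):
    """Form Z_k, W_k and the reduced quantities from B_k (columns span G_k)."""
    n, r = B.shape
    Q, _ = np.linalg.qr(B, mode='complete')   # full n x n orthogonal factor
    Z, W = Q[:, :r], Q[:, r:]                 # range(Z) = G_k, range(W) = G_k^perp
    return Z, W, Z.T @ H @ Z, Z.T @ g         # r x r reduced Hessian, reduced gradient
```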

  19. Reduced-Hessian Methods
A reduced-Hessian (RH) method obtains p_k from
$$ p_k = Z_k q_k, \quad \text{where } q_k \text{ solves } \; Z_k^T H_k Z_k \, q_k = -Z_k^T g_k, \qquad \text{(RH step)} $$
which is equivalent to (qN step). In practice, we use a Cholesky factorization R_k^T R_k = Z_k^T H_k Z_k.
• The new gradient g_{k+1} is accepted iff ‖(I − Z_k Z_k^T) g_{k+1}‖ > ε.
• Store and update Z_k, R_k, Z_k^T p_k, Z_k^T g_k, and Z_k^T g_{k+1}.
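
A minimal sketch of the (RH step) and of the gradient-acceptance test, assuming Z_k and the factor R_k are available; names and the tolerance are illustrative.

```python
import numpy as np

def rh_step(Z, R, g):
    """RH step: solve R^T R q = -Z^T g (R upper triangular, R^T R = Z^T H Z),
    then lift back with p = Z q.  np.linalg.solve is used for brevity; a real
    implementation would perform two triangular solves with R^T and R."""
    q = np.linalg.solve(R.T @ R, -(Z.T @ g))
    return Z @ q

def accept_gradient(Z, g_new, eps=1e-8):
    """Accept g_{k+1} into the basis iff ||(I - Z Z^T) g_{k+1}|| > eps."""
    return np.linalg.norm(g_new - Z @ (Z.T @ g_new)) > eps
```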

  21.
$$ H_k = Q_k (Q_k^T H_k Q_k) Q_k^T
       = \begin{pmatrix} Z_k & W_k \end{pmatrix}
         \begin{pmatrix} Z_k^T H_k Z_k & 0 \\ 0 & \sigma I_{n-r_k} \end{pmatrix}
         \begin{pmatrix} Z_k^T \\ W_k^T \end{pmatrix}
       = Z_k (Z_k^T H_k Z_k) Z_k^T + \sigma (I - Z_k Z_k^T). $$
⇒ any z such that Z_k^T z = 0 satisfies H_k z = σ z.

  22. Reduced-Hessian Method Variants
Reinitialization: if g_{k+1} ∉ G_k, the Cholesky factor R_k is updated as
$$ R_{k+1} \leftarrow \begin{pmatrix} R_k & 0 \\ 0 & \sqrt{\sigma_{k+1}} \end{pmatrix}, $$
where σ_{k+1} is based on the latest estimate of the curvature, e.g.,
$$ \sigma_{k+1} = \frac{y_k^T s_k}{s_k^T s_k}. $$
Lingering: restrict the search direction to a smaller subspace and allow the subspace to expand only when f is suitably minimized on that subspace.
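
A sketch of the reinitialized expansion of R_k when a new gradient enters the basis, using the curvature estimate σ_{k+1} = y_k^T s_k / s_k^T s_k from the slide; the helper name is illustrative.

```python
import numpy as np

def expand_with_reinitialization(R, s, y):
    """Append a row and column to R with new diagonal sqrt(sigma_{k+1});
    the Wolfe conditions keep y^T s > 0, so the square root is well defined."""
    sigma_new = (y @ s) / (s @ s)
    r = R.shape[0]
    R_new = np.zeros((r + 1, r + 1))
    R_new[:r, :r] = R
    R_new[r, r] = np.sqrt(sigma_new)
    return R_new
```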
