QUASI-NEWTON METHODS
David F. Gleich
February 29, 2012

The material here is from Chapter 6 in Nocedal and Wright, and Section 12.3 in Griva, Sofer, and Nash.



The idea behind Quasi-Newton methods is to make an optimization algorithm that uses only function values and gradients converge more quickly than steepest descent. That is, a Quasi-Newton method does not require a means to evaluate the Hessian matrix at the current iterate, as a Newton method does. Instead, the algorithm constructs a matrix that resembles the Hessian as it proceeds. In fact, there are many ways of doing this, and so there is really a family of Quasi-Newton methods.

1 quasi-newton in one variable: the secant method

In a one-dimensional problem, approximating the Hessian simplifies to approximating the second derivative:

$$ f''(x) \approx \frac{f'(x+h) - f'(x)}{h}. $$

Thus, the fact that this is possible is not unreasonable. Using a related approximation in a one-dimensional optimization algorithm results in a procedure called the Secant method. The one-dimensional Newton update

$$ x_{k+1} = x_k - \frac{1}{f''(x_k)}\, f'(x_k) $$

becomes

$$ x_{k+1} = x_k - \underbrace{\frac{x_k - x_{k-1}}{f'(x_k) - f'(x_{k-1})}}_{\approx\, 1/f''(x_k)}\, f'(x_k). $$
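The secant update above can be sketched in a few lines of Python; the example function and starting points are made up for illustration, not from the notes.

```python
# Secant method for 1-D minimization: Newton's update with f''(x_k)
# replaced by a difference of gradients. A sketch, not from the notes.
def secant_minimize(fprime, x0, x1, tol=1e-10, max_iter=100):
    """Find a stationary point of f using only its derivative fprime."""
    for _ in range(max_iter):
        g0, g1 = fprime(x0), fprime(x1)
        if abs(g1) < tol:
            break
        # (x1 - x0) / (g1 - g0) plays the role of 1 / f''(x1)
        x0, x1 = x1, x1 - (x1 - x0) / (g1 - g0) * g1
    return x1

# Example: f(x) = x^4 - 3x has f'(x) = 4x^3 - 3 and minimizer (3/4)^(1/3)
x_star = secant_minimize(lambda x: 4 * x**3 - 3, x0=0.5, x1=1.0)
```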

This new update is trying to approximate the Newton update by approximating the second-derivative information. The secant method converges superlinearly under appropriate conditions, so this idea checks out in one dimension.

2 quasi-newton in general

Quasi-Newton methods are line-search methods that compute the search direction by trying to approximate the Newton direction, "H(x_k) p = -g", without using the matrix H(x_k). They work by computing a matrix B_k that "behaves like" H(x_k). Once we compute x_{k+1}, we update B_k to B_{k+1}. Thus, a Quasi-Newton method has the general iteration:

initialize B_0, and k = 0
for k = 0, 1, ... and while x_k does not satisfy the conditions we want
    solve for the search direction: B_k p_k = -g_k
    compute a line search step length α_k
    update x_{k+1} = x_k + α_k p_k
    update B_{k+1} based on x_{k+1}

We can derive different Quasi-Newton methods by changing how we update B_{k+1} from B_k.
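The general iteration above can be sketched as follows. The Armijo backtracking line search and the example problem are assumptions of this sketch, not part of the notes; the update rule is left as a parameter, which is exactly the degree of freedom the rest of these notes explores.

```python
import numpy as np

def backtracking(f, grad, x, p, alpha=1.0, c=1e-4, shrink=0.5):
    # Simple Armijo backtracking line search; an assumption of this
    # sketch, not something specified in the notes.
    while f(x + alpha * p) > f(x) + c * alpha * (grad(x) @ p):
        alpha *= shrink
    return alpha

def quasi_newton(f, grad, x0, update_B, tol=1e-8, max_iter=200):
    # Generic quasi-Newton iteration: solve B_k p_k = -g_k, line search,
    # step, then update the Hessian approximation from s_k and y_k.
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                          # initialize B_0 = I
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:             # stopping condition
            break
        p = np.linalg.solve(B, -g)              # search direction
        alpha = backtracking(f, grad, x, p)     # step length alpha_k
        x_new = x + alpha * p                   # x_{k+1} = x_k + alpha_k p_k
        B = update_B(B, x_new - x, grad(x_new) - g)  # B_{k+1} from (s_k, y_k)
        x = x_new
    return x

# With the trivial update B_{k+1} = B_k = I this reduces to steepest
# descent; the sections below give genuinely better updates.
x_min = quasi_newton(lambda v: 0.5 * (v @ v), lambda v: v,
                     [3.0, -4.0], lambda B, s, y: B)
```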


3 the secant condition

While there are many ways of updating B_{k+1} from B_k, a random choice is unlikely to provide any benefit, and may make things considerably worse. Thus, we want to start from a principled approach. Recall that the Newton direction H_k p_k = -g_k arises as the unconstrained minimizer of

$$ m_k^N(p) = f_k + g_k^T p + \tfrac{1}{2} p^T H_k p $$

when H_k is positive definite. The model for Quasi-Newton methods uses B_k instead of H_k:

$$ m_k^Q(p) = f_k + g_k^T p + \tfrac{1}{2} p^T B_k p $$

so one common requirement for B_k is that it remains positive definite. This requirement is relaxed for some Quasi-Newton methods. However, all Quasi-Newton methods require:

$$ \nabla m_{k+1}^Q(0) = g(x_{k+1}) \quad \text{and} \quad \nabla m_{k+1}^Q(-\alpha_k p_k) = g(x_k). $$

In other words, a Quasi-Newton method has the property that the model function m_{k+1}^Q(p) has the same gradient as f at both x_k and x_{k+1}.
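These two gradient-matching conditions can be checked numerically on a quadratic objective, where the true Hessian satisfies them exactly. The matrix and iterates below are made-up example data, a sketch rather than anything from the notes.

```python
import numpy as np

# For f(x) = 0.5 x^T A x we have g(x) = A x, so choosing B_{k+1} = A
# (the true Hessian) satisfies both gradient-matching conditions.
A = np.array([[2.0, 0.5], [0.5, 1.0]])   # example Hessian (made up)
x_k = np.array([1.0, -1.0])
x_k1 = np.array([0.4, 0.2])              # plays the role of x_{k+1}
g = lambda x: A @ x                      # gradient of the quadratic
step = x_k1 - x_k                        # equals alpha_k p_k

def model_grad(p, B):
    # gradient of m^Q_{k+1}(p) = f_{k+1} + g(x_{k+1})^T p + 0.5 p^T B p
    return g(x_k1) + B @ p

assert np.allclose(model_grad(np.zeros(2), A), g(x_k1))   # matches at x_{k+1}
assert np.allclose(model_grad(-step, A), g(x_k))          # matches at x_k
```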

This requirement imposes some conditions on B_{k+1}:

$$ \nabla m_{k+1}^Q(-\alpha_k p_k) = g(x_{k+1}) - \alpha_k B_{k+1} p_k = g(x_k), $$

so

$$ B_{k+1} \alpha_k p_k = g(x_{k+1}) - g(x_k). $$

Note that \alpha_k p_k = x_{k+1} - x_k. If we define

$$ s_k = x_{k+1} - x_k \quad \text{and} \quad y_k = g(x_{k+1}) - g(x_k), $$

then Quasi-Newton methods require

$$ B_{k+1} s_k = y_k, $$

which is called the secant condition. If we write this out for a one-dimensional problem:

$$ b_{k+1}(x_{k+1} - x_k) = f'(x_{k+1}) - f'(x_k). $$

This equation is identical to the approximation of f''(x_k) used in the secant method.

Quiz Is it always possible to find such a B_{k+1}? Suppose that B_k is symmetric positive definite. Show that we need y_k^T s_k > 0 in order for B_{k+1} to be positive definite. If B_k = 1 for a one-dimensional problem, find a function where this isn't true.

4 finding the update

We are getting closer to figuring out how to find such an update. There are many ways to derive the following updates; I'll just list them and state their properties.

4.1 DAVIDON, FLETCHER, POWELL (DFP)

Let

$$ \rho_k = \frac{1}{y_k^T s_k}. $$

The DFP update is

$$ B_{k+1} = (I - \rho_k y_k s_k^T)\, B_k\, (I - \rho_k s_k y_k^T) + \rho_k y_k y_k^T. $$

Clearly this matrix is symmetric when B_k is. Also, B_{k+1} is positive definite.

Quiz Show that B_{k+1} is positive definite.

This choice of B_{k+1} has the following optimality property:

$$ \text{minimize } \|B - B_k\|_W \quad \text{subject to } B^T = B, \; B s_k = y_k, $$

where W is a weight based on the average Hessian.
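As a sketch, the DFP update is a few lines of NumPy, and the secant condition B_{k+1} s_k = y_k and the symmetry of B_{k+1} can be checked directly; the example data below is made up.

```python
import numpy as np

def dfp_update(B, s, y):
    """DFP update of the Hessian approximation B_k, given s_k and y_k."""
    rho = 1.0 / (y @ s)
    I = np.eye(B.shape[0])
    return (I - rho * np.outer(y, s)) @ B @ (I - rho * np.outer(s, y)) \
        + rho * np.outer(y, y)

# Made-up example data with the curvature condition y^T s = 2 > 0.
B0 = np.eye(2)
s, y = np.array([1.0, 0.0]), np.array([2.0, 1.0])
B1 = dfp_update(B0, s, y)   # satisfies B1 @ s == y and B1 == B1.T
```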


4.2 BROYDEN, FLETCHER, GOLDFARB, SHANNO (BFGS) – “STANDARD”

Because we compute the search direction by solving a system with the approximate Hessian matrix, B_k p_k = -g_k, the BFGS update constructs an approximation of the inverse Hessian instead. Suppose that T_k "behaves like" H(x)^{-1}. Then

$$ T_{k+1} y_k = s_k $$

is the secant condition for the inverse. This helps because now we can find search directions via p_k = -T_k g_k, a matrix-vector multiplication instead of a linear solve. The BFGS method uses the update:

$$ T_{k+1} = (I - \rho_k s_k y_k^T)\, T_k\, (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T. $$

By the same proof, this update also preserves positive definiteness. This choice has the following optimality property:

$$ \text{minimize } \|T - T_k\|_W \quad \text{subject to } T^T = T, \; T y_k = s_k, $$

where W is a weight based on the average Hessian.
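A minimal sketch of the BFGS inverse update, with a check of the inverse secant condition T_{k+1} y_k = s_k; the example data is made up.

```python
import numpy as np

def bfgs_inverse_update(T, s, y):
    """BFGS update of the inverse-Hessian approximation T_k."""
    rho = 1.0 / (y @ s)
    I = np.eye(T.shape[0])
    return (I - rho * np.outer(s, y)) @ T @ (I - rho * np.outer(y, s)) \
        + rho * np.outer(s, s)

# Made-up example data with y^T s = 2.5 > 0.
T0 = np.eye(2)
s, y = np.array([1.0, 0.5]), np.array([2.0, 1.0])
T1 = bfgs_inverse_update(T0, s, y)
# The next search direction is just a matrix-vector product: p = -T1 @ g.
```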

4.3 SYMMETRIC RANK-1 (SR1) – FOR TRUST REGION METHODS

Both of the previous updates were rank-2 changes to B_k (or T_k). The SR1 method is a rank-1 update to B_k. Unfortunately, this update does not preserve positive definiteness. Nonetheless, it's frequently used in practice and is a reasonable choice for Trust Region methods, which don't require a positive definite approximate Hessian. Any symmetric rank-1 matrix is σvv^T, and so the update is B_{k+1} = B_k + σvv^T. Applying the secant equation constrains v, and we have:

$$ B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^T}{(y_k - B_k s_k)^T s_k} $$

or

$$ T_{k+1} = T_k + \frac{(s_k - T_k y_k)(s_k - T_k y_k)^T}{(s_k - T_k y_k)^T y_k}. $$

The SR1 method tends to generate better approximations to the true Hessian than the other methods. For instance, if the search directions p_k are all linearly independent for k = 1, ..., n, and f(x) is a simple quadratic, then T_n is the inverse of the true Hessian.
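The exact-recovery property on quadratics is easy to verify numerically. The following sketch uses a made-up 2×2 Hessian and the coordinate directions as the (linearly independent) steps; after two SR1 updates, B equals A exactly.

```python
import numpy as np

def sr1_update(B, s, y):
    """Symmetric rank-1 update; need not stay positive definite."""
    r = y - B @ s
    return B + np.outer(r, r) / (r @ s)

# On a quadratic f(x) = 0.5 x^T A x, the exact relation y = A s holds,
# and SR1 with linearly independent steps reconstructs A. A is made up.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.eye(2)
for s in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    B = sr1_update(B, s, A @ s)   # feed s_k and y_k = A s_k
```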

4.4 BROYDEN CLASS

The Broyden class is a linear combination of the BFGS and DFP methods:

$$ B_{k+1} = (1 - \phi)\, B_{k+1}^{\text{BFGS}} + \phi\, B_{k+1}^{\text{DFP}}. $$

(This form requires the BFGS update for B and not T.) There are all sorts of great properties of the Broyden class; e.g., for the right choice of parameters, it will reproduce the CG method.
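A sketch of the Broyden-class update: since both endpoint updates satisfy the secant condition, any linear combination of them does as well. The BFGS-for-B form below and the example data are assumptions of this sketch.

```python
import numpy as np

def bfgs_B_update(B, s, y):
    # BFGS written as an update of B itself (not the inverse T).
    return B - np.outer(B @ s, B @ s) / (s @ B @ s) \
        + np.outer(y, y) / (y @ s)

def dfp_B_update(B, s, y):
    rho = 1.0 / (y @ s)
    I = np.eye(B.shape[0])
    return (I - rho * np.outer(y, s)) @ B @ (I - rho * np.outer(s, y)) \
        + rho * np.outer(y, y)

def broyden_update(B, s, y, phi):
    # Both endpoints satisfy B_{k+1} s = y, so any phi does too.
    return (1 - phi) * bfgs_B_update(B, s, y) + phi * dfp_B_update(B, s, y)

# Made-up example data with y^T s = 2.5 > 0.
B0 = np.eye(2)
s, y = np.array([1.0, 0.5]), np.array([2.0, 1.0])
B1 = broyden_update(B0, s, y, phi=0.3)
```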