MATH 4211/6211 Optimization Quasi-Newton Method Xiaojing Ye - PowerPoint PPT Presentation

MATH 4211/6211 – Optimization Quasi-Newton Method Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0

Quasi-Newton Method

Motivation: Approximate the inverse Hessian $(\nabla^2 f(x^{(k)}))^{-1}$ in Newton's method by some $H_k$:

$$x^{(k+1)} = x^{(k)} - \alpha_k H_k g^{(k)}$$

where $g^{(k)} = \nabla f(x^{(k)})$. That is, the search direction is set to $d^{(k)} = -H_k g^{(k)}$. Based on $H_k$, $x^{(k)}$, and $g^{(k)}$, the quasi-Newton method generates the next $H_{k+1}$, and so on.

Proposition. If $f \in C^1$, $g^{(k)} \neq 0$, and $H_k \succ 0$, then $d^{(k)} = -H_k g^{(k)}$ is a descent direction.

Proof. Let $x^{(k+1)} = x^{(k)} - \alpha H_k g^{(k)}$ for some $\alpha > 0$. Then by Taylor's expansion,

$$f(x^{(k+1)}) = f(x^{(k)}) - \alpha\, g^{(k)\top} H_k g^{(k)} + o(\|H_k g^{(k)}\| \alpha) < f(x^{(k)})$$

for $\alpha$ sufficiently small, because $g^{(k)\top} H_k g^{(k)} > 0$.
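The proposition can be checked numerically. The following is a minimal sketch, assuming a hypothetical test function `f`, a hypothetical point `x`, and a hypothetical symmetric positive definite matrix `H`; none of these come from the slides.

```python
# Minimal check that d = -H g is a descent direction when H is SPD.
# The function f, the point x, and the matrix H are hypothetical
# choices made only for illustration.

def f(x):
    return x[0] ** 4 + x[0] * x[1] + (1.0 + x[1]) ** 2

def grad(x):
    # Gradient of f computed by hand.
    return [4.0 * x[0] ** 3 + x[1], x[0] + 2.0 * (1.0 + x[1])]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

H = [[2.0, 0.5], [0.5, 1.0]]   # SPD: leading minors 2 and 1.75 are positive
x = [0.75, -1.25]
g = grad(x)
d = [-v for v in matvec(H, g)]

slope = dot(g, d)              # equals -g^T H g, negative whenever g != 0
steps = [0.1, 0.01, 0.001]
decreases = [f([xi + a * di for xi, di in zip(x, d)]) < f(x) for a in steps]
print(slope, decreases)
```

The directional derivative `slope` is negative, and each of the small step sizes tried reduces the function value, as the Taylor argument predicts.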

Recall that for quadratic functions with $Q \succ 0$, the Hessian is $H^{(k)} = Q$ for all $k$, and

$$g^{(k+1)} - g^{(k)} = Q(x^{(k+1)} - x^{(k)})$$

For notational simplicity, we denote

$$\Delta x^{(k)} = x^{(k+1)} - x^{(k)} \quad \text{and} \quad \Delta g^{(k)} = g^{(k+1)} - g^{(k)}$$

Then we can write the identity above as $\Delta g^{(k)} = Q \Delta x^{(k)}$, or equivalently $Q^{-1} \Delta g^{(k)} = \Delta x^{(k)}$.
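As a quick sanity check of the identity $\Delta g = Q \Delta x$, here is a small sketch for a quadratic $f(x) = \frac{1}{2} x^\top Q x - b^\top x$ with gradient $Qx - b$. The values of `Q`, `b`, and the two points are hypothetical.

```python
# Secant identity dg = Q dx for a quadratic with gradient g(x) = Q x - b.
# Q, b, x_old and x_new are hypothetical values chosen for illustration.

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

Q = [[4, 1], [1, 3]]   # symmetric positive definite
b = [1, 2]

def grad(x):
    return [q - bi for q, bi in zip(matvec(Q, x), b)]

x_old, x_new = [0, 0], [1, -1]
dx = [a - c for a, c in zip(x_new, x_old)]
dg = [a - c for a, c in zip(grad(x_new), grad(x_old))]

print(dg, matvec(Q, dx))   # prints [3, -2] [3, -2]
```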

In the quasi-Newton method, $H_k$ takes the place of $Q^{-1}$:

Newton: $x^{(k+1)} = x^{(k)} - \alpha_k Q^{-1} g^{(k)}$
Quasi-Newton: $x^{(k+1)} = x^{(k)} - \alpha_k H_k g^{(k)}$

Therefore we would like to have a sequence of $H_k$ with the same property as $Q^{-1}$:

$$H_{k+1} \Delta g^{(i)} = \Delta x^{(i)}, \quad 0 \le i \le k$$

for all $k = 0, 1, 2, \dots$.

If this is true, then at iteration $n$ we have

$$H_n \Delta g^{(0)} = \Delta x^{(0)}, \quad H_n \Delta g^{(1)} = \Delta x^{(1)}, \quad \dots, \quad H_n \Delta g^{(n-1)} = \Delta x^{(n-1)}$$

or

$$H_n [\Delta g^{(0)}, \dots, \Delta g^{(n-1)}] = [\Delta x^{(0)}, \dots, \Delta x^{(n-1)}].$$

On the other hand,

$$Q^{-1} [\Delta g^{(0)}, \dots, \Delta g^{(n-1)}] = [\Delta x^{(0)}, \dots, \Delta x^{(n-1)}].$$

If $[\Delta g^{(0)}, \dots, \Delta g^{(n-1)}]$ is invertible, then we have $H_n = Q^{-1}$. Then at iteration $n + 1$,

$$x^{(n+1)} = x^{(n)} - \alpha_n H_n g^{(n)} = x^*$$

since this is the same as Newton's update. Hence for quadratic functions, the quasi-Newton method would converge in at most $n$ steps.

Quasi-Newton method

$$d^{(k)} = -H_k g^{(k)}$$
$$\alpha_k = \arg\min_{\alpha \ge 0} f(x^{(k)} + \alpha d^{(k)})$$
$$x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}$$

where $H_0, H_1, \dots$ are symmetric. Moreover, for quadratic functions of the form $f(x) = \frac{1}{2} x^\top Q x - b^\top x$, the matrices $H_0, H_1, \dots$ are required to satisfy

$$H_{k+1} \Delta g^{(i)} = \Delta x^{(i)}, \quad 0 \le i \le k$$
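For the quadratic case, the exact line search has a closed form. The short derivation below is a sketch under the assumption that $f(x) = \frac{1}{2} x^\top Q x - b^\top x$ with $Q \succ 0$, so that $g^{(k)} = Q x^{(k)} - b$; the auxiliary function $\phi$ is introduced here only for illustration.

```latex
\begin{align*}
\phi(\alpha) &= f(x^{(k)} + \alpha d^{(k)})
  = f(x^{(k)}) + \alpha\, g^{(k)\top} d^{(k)}
  + \tfrac{1}{2} \alpha^2\, d^{(k)\top} Q d^{(k)}, \\
\phi'(\alpha_k) &= g^{(k)\top} d^{(k)} + \alpha_k\, d^{(k)\top} Q d^{(k)} = 0
  \quad \Longrightarrow \quad
  \alpha_k = -\frac{g^{(k)\top} d^{(k)}}{d^{(k)\top} Q d^{(k)}}
  = \frac{g^{(k)\top} H_k g^{(k)}}{d^{(k)\top} Q d^{(k)}}.
\end{align*}
```

In particular, $\alpha_k > 0$ whenever $H_k \succ 0$ and $g^{(k)} \neq 0$.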

Theorem. Consider a quasi-Newton algorithm applied to a quadratic function with symmetric $Q \succ 0$, such that for all $k = 0, 1, \dots, n - 1$,

$$H_{k+1} \Delta g^{(i)} = \Delta x^{(i)}, \quad 0 \le i \le k$$

and the $H_k$ are all symmetric. If $\alpha_i \neq 0$ for $0 \le i \le k$, then $d^{(0)}, \dots, d^{(n)}$ are $Q$-conjugate.

Proof. We prove by induction. It is trivial to show $g^{(1)\top} d^{(0)} = 0$, since $\alpha_0$ minimizes $f$ along $d^{(0)}$. Assume the claim holds for some $k < n - 1$. Using $\Delta x^{(i)} = \alpha_i d^{(i)}$, we have for $i \le k$ that

$$\begin{aligned}
d^{(k+1)\top} Q d^{(i)} &= -(H_{k+1} g^{(k+1)})^\top Q d^{(i)} \\
&= -g^{(k+1)\top} H_{k+1} \frac{Q \Delta x^{(i)}}{\alpha_i} \\
&= -g^{(k+1)\top} H_{k+1} \frac{\Delta g^{(i)}}{\alpha_i} \\
&= -g^{(k+1)\top} \frac{\Delta x^{(i)}}{\alpha_i} \\
&= -g^{(k+1)\top} d^{(i)}
\end{aligned}$$

Since $d^{(0)}, \dots, d^{(k)}$ are $Q$-conjugate, we know $g^{(k+1)\top} d^{(i)} = 0$ for all $i \le k$. Hence $d^{(0)}, \dots, d^{(k)}, d^{(k+1)}$ are $Q$-conjugate. By induction the claim holds.

The theorem above also shows that the quasi-Newton method is a conjugate direction method, and hence converges in $n$ steps for quadratic objective functions.

In practice, there are various ways to generate $H_k$ such that

$$H_{k+1} \Delta g^{(i)} = \Delta x^{(i)}, \quad 0 \le i \le k$$

We now study three algorithms that produce such $H_k$.

Rank one correction formula

Suppose we would like to update $H_k$ to $H_{k+1}$ by adding a rank one matrix $a_k z^{(k)} z^{(k)\top}$ for some $a_k \in \mathbb{R}$ and $z^{(k)} \in \mathbb{R}^n$:

$$H_{k+1} = H_k + a_k z^{(k)} z^{(k)\top}$$

Now let us derive what this $a_k z^{(k)} z^{(k)\top}$ should be. Since we need $H_{k+1} \Delta g^{(i)} = \Delta x^{(i)}$ for $i \le k$, we at least need $H_{k+1} \Delta g^{(k)} = \Delta x^{(k)}$. That is,

$$\Delta x^{(k)} = H_{k+1} \Delta g^{(k)} = (H_k + a_k z^{(k)} z^{(k)\top}) \Delta g^{(k)} = H_k \Delta g^{(k)} + a_k (z^{(k)\top} \Delta g^{(k)}) z^{(k)}$$

Therefore

$$z^{(k)} = \frac{\Delta x^{(k)} - H_k \Delta g^{(k)}}{a_k (z^{(k)\top} \Delta g^{(k)})}$$

and hence

$$H_{k+1} = H_k + \frac{(\Delta x^{(k)} - H_k \Delta g^{(k)})(\Delta x^{(k)} - H_k \Delta g^{(k)})^\top}{a_k (z^{(k)\top} \Delta g^{(k)})^2}$$

On the other hand, multiplying both sides of $\Delta x^{(k)} - H_k \Delta g^{(k)} = a_k (z^{(k)\top} \Delta g^{(k)}) z^{(k)}$ on the left by $\Delta g^{(k)\top}$, we obtain

$$\Delta g^{(k)\top} (\Delta x^{(k)} - H_k \Delta g^{(k)}) = a_k (z^{(k)\top} \Delta g^{(k)})^2.$$

Hence

$$H_{k+1} = H_k + \frac{(\Delta x^{(k)} - H_k \Delta g^{(k)})(\Delta x^{(k)} - H_k \Delta g^{(k)})^\top}{\Delta g^{(k)\top} (\Delta x^{(k)} - H_k \Delta g^{(k)})}$$

This is the rank one correction formula.
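The rank one correction formula translates directly into code. The sketch below uses exact rational arithmetic so that the secant equation $H_{k+1} \Delta g^{(k)} = \Delta x^{(k)}$ can be checked with equality; the function name `rank_one_update` and the sample values of `H0`, `dx`, and `dg` are hypothetical.

```python
from fractions import Fraction as F

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank_one_update(H, dx, dg):
    """Return H + r r^T / (dg^T r) with r = dx - H dg."""
    r = [a - c for a, c in zip(dx, matvec(H, dg))]
    denom = dot(dg, r)   # a zero denominator raises ZeroDivisionError
    n = len(r)
    return [[H[i][j] + r[i] * r[j] / denom for j in range(n)]
            for i in range(n)]

# Hypothetical data: H0 is the identity, dx and dg are arbitrary.
H0 = [[F(1), F(0)], [F(0), F(1)]]
dx = [F(1), F(0)]
dg = [F(2), F(1)]

H1 = rank_one_update(H0, dx, dg)
print(H1)               # entries 2/3 on the diagonal, -1/3 off the diagonal
print(matvec(H1, dg))   # equals dx, so the secant equation holds
```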

We obtained the formula by requiring $H_{k+1} \Delta g^{(k)} = \Delta x^{(k)}$. However, we also need $H_{k+1} \Delta g^{(i)} = \Delta x^{(i)}$ for $i < k$. This turns out to be true automatically:

Theorem. For the rank one algorithm applied to quadratic functions with symmetric Hessian $Q$, we have

$$H_{k+1} \Delta g^{(i)} = \Delta x^{(i)}, \quad 0 \le i \le k$$

for $k = 0, 1, \dots, n - 1$.

Proof. We have shown $H_{k+1} \Delta g^{(k)} = \Delta x^{(k)}$ for all $k = 0, 1, 2, \dots$. We use induction: assume the identities hold up to $k$, that is, $H_k \Delta g^{(i)} = \Delta x^{(i)}$ for $i < k$, and show they hold for $k + 1$. We only need to show $H_{k+1} \Delta g^{(i)} = \Delta x^{(i)}$ for $i < k$:

$$\begin{aligned}
H_{k+1} \Delta g^{(i)} &= \left( H_k + \frac{(\Delta x^{(k)} - H_k \Delta g^{(k)})(\Delta x^{(k)} - H_k \Delta g^{(k)})^\top}{\Delta g^{(k)\top} (\Delta x^{(k)} - H_k \Delta g^{(k)})} \right) \Delta g^{(i)} \\
&= \Delta x^{(i)} + \frac{(\Delta x^{(k)} - H_k \Delta g^{(k)})(\Delta x^{(k)} - H_k \Delta g^{(k)})^\top \Delta g^{(i)}}{\Delta g^{(k)\top} (\Delta x^{(k)} - H_k \Delta g^{(k)})}
\end{aligned}$$

Note that

$$(H_k \Delta g^{(k)})^\top \Delta g^{(i)} = \Delta g^{(k)\top} H_k \Delta g^{(i)} = \Delta g^{(k)\top} \Delta x^{(i)} = \Delta x^{(k)\top} Q \Delta x^{(i)} = \Delta x^{(k)\top} \Delta g^{(i)}$$

Hence $(\Delta x^{(k)} - H_k \Delta g^{(k)})^\top \Delta g^{(i)} = 0$, so the second term on the right is zero, and we obtain

$$H_{k+1} \Delta g^{(i)} = \Delta x^{(i)}$$

This completes the proof.
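The hereditary property can be observed on a small quadratic. The following sketch runs the rank one algorithm with exact line search on a hypothetical 2-dimensional problem (the values of `Q`, `b`, the starting point, and $H_0 = I$ are illustrative). One assumption beyond the slides: when $\Delta x^{(k)} - H_k \Delta g^{(k)} = 0$, the update would be $0/0$, so the sketch keeps $H_k$ unchanged, which already satisfies the secant equation.

```python
from fractions import Fraction as F

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank_one_update(H, dx, dg):
    r = [a - c for a, c in zip(dx, matvec(H, dg))]
    if all(ri == 0 for ri in r):
        return H  # assumption: skip the 0/0 update, H already maps dg to dx
    denom = dot(dg, r)
    n = len(r)
    return [[H[i][j] + r[i] * r[j] / denom for j in range(n)]
            for i in range(n)]

# Hypothetical quadratic f(x) = 0.5 x^T Q x - b^T x, gradient Q x - b.
Q = [[F(2), F(0)], [F(0), F(1)]]
b = [F(0), F(0)]
x = [F(1), F(2)]
H = [[F(1), F(0)], [F(0), F(1)]]
g = [q - bi for q, bi in zip(matvec(Q, x), b)]

history = []
for k in range(2):                       # n = 2 iterations
    d = [-v for v in matvec(H, g)]
    alpha = -dot(g, d) / dot(d, matvec(Q, d))   # exact line search
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    g_new = [q - bi for q, bi in zip(matvec(Q, x_new), b)]
    dx = [a - c for a, c in zip(x_new, x)]
    dg = [a - c for a, c in zip(g_new, g)]
    H = rank_one_update(H, dx, dg)
    history.append((dx, dg))
    x, g = x_new, g_new

# H_n should map every stored dg back to its dx.
hereditary = all(matvec(H, dg_i) == dx_i for dx_i, dg_i in history)
print(x, H, hereditary)
```

In this run the iterate reaches the minimizer at the origin after two steps, and the final matrix equals $Q^{-1}$, with diagonal entries 1/2 and 1.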

Issues with the rank one correction formula:

• $H_{k+1}$ may not be positive definite even if $H_k$ is. Hence $-H_{k+1} g^{(k+1)}$ may not be a descent direction;
• the denominator in the rank one correction is $\Delta g^{(k)\top} (\Delta x^{(k)} - H_k \Delta g^{(k)})$, which can be close to 0 and make the computation unstable.
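The first issue is easy to reproduce. The sketch below applies one rank one update to hypothetical data with $H_0 = I$ and $\Delta x^\top \Delta g > 0$ (so the pair is consistent with some positive definite $Q$), and the result is indefinite.

```python
from fractions import Fraction as F

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank_one_update(H, dx, dg):
    r = [a - c for a, c in zip(dx, matvec(H, dg))]
    denom = dot(dg, r)
    n = len(r)
    return [[H[i][j] + r[i] * r[j] / denom for j in range(n)]
            for i in range(n)]

# Hypothetical data, chosen so that dx . dg = 1/2 > 0.
H0 = [[F(1), F(0)], [F(0), F(1)]]
dx = [F(1, 2), F(2)]
dg = [F(1), F(0)]

H1 = rank_one_update(H0, dx, dg)
v = [F(0), F(1)]
quad = dot(v, matvec(H1, v))   # v^T H1 v

print(H1)     # rows (1/2, 2) and (2, -7)
print(quad)   # prints -7, so H1 is not positive definite
```

The secant equation still holds for `H1`, but the negative quadratic form shows that positive definiteness is lost.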

We now study the DFP algorithm, which improves the rank one correction formula by ensuring positive definiteness of $H_k$.

DFP algorithm [Davidon 1959, Fletcher and Powell 1963]

$$H_{k+1} = H_k + \frac{\Delta x^{(k)} \Delta x^{(k)\top}}{\Delta x^{(k)\top} \Delta g^{(k)}} - \frac{(H_k \Delta g^{(k)})(H_k \Delta g^{(k)})^\top}{\Delta g^{(k)\top} H_k \Delta g^{(k)}}$$
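A minimal sketch of the DFP update in exact rational arithmetic follows. The function name `dfp_update` and the sample values of `H0`, `dx`, and `dg` are hypothetical; the check at the end confirms the secant equation $H_{k+1} \Delta g^{(k)} = \Delta x^{(k)}$ for this data.

```python
from fractions import Fraction as F

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dfp_update(H, dx, dg):
    """Return H + dx dx^T / (dx^T dg) - (H dg)(H dg)^T / (dg^T H dg)."""
    Hdg = matvec(H, dg)
    s = dot(dx, dg)    # dx^T dg, must be nonzero
    t = dot(dg, Hdg)   # dg^T H dg, positive when H is SPD and dg != 0
    n = len(dx)
    return [[H[i][j] + dx[i] * dx[j] / s - Hdg[i] * Hdg[j] / t
             for j in range(n)] for i in range(n)]

# Hypothetical data: H0 is the identity, dx and dg are arbitrary.
H0 = [[F(1), F(0)], [F(0), F(1)]]
dx = [F(1), F(0)]
dg = [F(2), F(1)]

H1 = dfp_update(H0, dx, dg)
print(H1)               # rows (7/10, -2/5) and (-2/5, 4/5)
print(matvec(H1, dg))   # equals dx
```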

We first show that DFP is a quasi-Newton method.

Theorem. The DFP algorithm applied to quadratic functions satisfies

$$H_{k+1} \Delta g^{(i)} = \Delta x^{(i)}, \quad 0 \le i \le k$$

for all $k$.

Proof. We prove this by induction. It is trivial for $k = 0$. Assume the claim is true for $k$, i.e., $H_k \Delta g^{(i)} = \Delta x^{(i)}$ for all $i \le k - 1$. First, we have $H_{k+1} \Delta g^{(i)} = \Delta x^{(i)}$ for $i = k$ by direct computation. For $i < k$,

$$H_{k+1} \Delta g^{(i)} = H_k \Delta g^{(i)} + \frac{\Delta x^{(k)} (\Delta x^{(k)\top} \Delta g^{(i)})}{\Delta x^{(k)\top} \Delta g^{(k)}} - \frac{(H_k \Delta g^{(k)})(H_k \Delta g^{(k)})^\top \Delta g^{(i)}}{\Delta g^{(k)\top} H_k \Delta g^{(k)}}$$

Note that, by the induction assumption, $d^{(0)}, \dots, d^{(k)}$ are $Q$-conjugate, and hence

$$\Delta x^{(k)\top} \Delta g^{(i)} = \Delta x^{(k)\top} Q \Delta x^{(i)} = \alpha_k \alpha_i\, d^{(k)\top} Q d^{(i)} = 0$$

Similarly,

$$\Delta g^{(k)\top} H_k \Delta g^{(i)} = \Delta g^{(k)\top} \Delta x^{(i)} = 0.$$

Both correction terms therefore vanish, so $H_{k+1} \Delta g^{(i)} = H_k \Delta g^{(i)} = \Delta x^{(i)}$. This completes the proof.
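To see the theorem at work, the sketch below runs DFP with exact line search on a hypothetical 2-dimensional quadratic (the values of `Q`, `b`, the starting point, and $H_0 = I$ are illustrative). It records the search directions so that $Q$-conjugacy can be checked, and compares the final matrix with $Q^{-1}$.

```python
from fractions import Fraction as F

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dfp_update(H, dx, dg):
    Hdg = matvec(H, dg)
    s, t = dot(dx, dg), dot(dg, Hdg)
    n = len(dx)
    return [[H[i][j] + dx[i] * dx[j] / s - Hdg[i] * Hdg[j] / t
             for j in range(n)] for i in range(n)]

# Hypothetical quadratic f(x) = 0.5 x^T Q x - b^T x, gradient Q x - b.
Q = [[F(4), F(2)], [F(2), F(2)]]
b = [F(-1), F(1)]
x = [F(0), F(0)]
H = [[F(1), F(0)], [F(0), F(1)]]
g = [q - bi for q, bi in zip(matvec(Q, x), b)]

directions = []
for k in range(2):                       # n = 2 iterations
    d = [-v for v in matvec(H, g)]
    alpha = -dot(g, d) / dot(d, matvec(Q, d))   # exact line search
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    g_new = [q - bi for q, bi in zip(matvec(Q, x_new), b)]
    dx = [a - c for a, c in zip(x_new, x)]
    dg = [a - c for a, c in zip(g_new, g)]
    H = dfp_update(H, dx, dg)
    directions.append(d)
    x, g = x_new, g_new

conj = dot(directions[0], matvec(Q, directions[1]))   # d0^T Q d1
print(x)      # minimizer (-1, 3/2)
print(H)      # Q^{-1}, rows (1/2, -1/2) and (-1/2, 1)
print(conj)   # prints 0
```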

Next we show that $H_{k+1}$ inherits positive definiteness of $H_k$ in the DFP algorithm.

Theorem. Suppose $g^{(k)} \neq 0$. Then $H_k \succ 0$ implies $H_{k+1} \succ 0$ in DFP.

Proof. For any $x \in \mathbb{R}^n$,

$$x^\top H_{k+1} x = x^\top H_k x + \frac{(x^\top \Delta x^{(k)})^2}{\Delta x^{(k)\top} \Delta g^{(k)}} - \frac{(x^\top H_k \Delta g^{(k)})^2}{\Delta g^{(k)\top} H_k \Delta g^{(k)}}$$

For notational simplicity, we denote

$$a = H_k^{1/2} x \quad \text{and} \quad b = H_k^{1/2} \Delta g^{(k)}$$

where $H_k = H_k^{1/2} H_k^{1/2}$ (we know $H_k^{1/2}$ exists since $H_k$ is SPD).
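As an illustration of the theorem, the sketch below applies one DFP update to hypothetical data with $H_0 = I$ and $\Delta x^\top \Delta g > 0$ (the condition that exact line search guarantees when $g^{(k)} \neq 0$), then tests positive definiteness of the 2-by-2 result through its leading principal minors. The helper `is_spd_2x2` is an illustrative name, not part of the slides.

```python
from fractions import Fraction as F

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dfp_update(H, dx, dg):
    Hdg = matvec(H, dg)
    s, t = dot(dx, dg), dot(dg, Hdg)
    n = len(dx)
    return [[H[i][j] + dx[i] * dx[j] / s - Hdg[i] * Hdg[j] / t
             for j in range(n)] for i in range(n)]

def is_spd_2x2(H):
    # Sylvester's criterion: symmetric with positive leading minors.
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return H[0][1] == H[1][0] and H[0][0] > 0 and det > 0

# Hypothetical data with dx . dg = 1/2 > 0.
H0 = [[F(1), F(0)], [F(0), F(1)]]
dx = [F(1, 2), F(2)]
dg = [F(1), F(0)]

H1 = dfp_update(H0, dx, dg)
print(H1)               # rows (1/2, 2) and (2, 9)
print(is_spd_2x2(H1))   # prints True
```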
