SLIDE 1

Computational Optimization

Newton’s Method 2/5/08

SLIDE 2

Newton’s Method

Method for finding a zero of a function. Recall the FONC:

$$\nabla f(x) = 0$$

For the quadratic case:

$$g(x) = \tfrac{1}{2}x^T Q x + b^T x, \qquad \nabla g(x) = Qx + b = 0$$

The minimum must satisfy $Qx^* = -b$, so

$$x^* = -Q^{-1}b \quad \text{(unique if } Q \text{ is invertible)}$$
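A quick numerical check of the quadratic case, as a minimal MATLAB sketch (the 3×3 matrix Q and the vector b are made-up illustrative data):

    % Minimize g(x) = 0.5*x'*Q*x + b'*x for a symmetric p.d. Q
    Q = [4 1 0; 1 3 1; 0 1 2];    % illustrative positive definite matrix
    b = [1; -2; 3];
    xstar = -(Q \ b);             % solves Q*x = -b without forming inv(Q)
    disp(norm(Q*xstar + b))       % gradient at xstar: numerically zero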

SLIDE 3

General nonlinear functions

For non-quadratic f (twice cont. diff.): approximate f by its 2nd-order Taylor series approximation (TSA), then solve the FONC for the quadratic approximation.

$$f(y) \approx f(x) + \nabla f(x)^T (y - x) + \tfrac{1}{2}(y - x)^T \nabla^2 f(x)(y - x)$$

Calculate the FONC:

$$\nabla f(x) + \nabla^2 f(x)(y - x) = 0$$

Solve for $y$:

$$\nabla^2 f(x)(y - x) = -\nabla f(x) \quad \Rightarrow \quad y = x - \left[\nabla^2 f(x)\right]^{-1} \nabla f(x)$$

This gives the Pure Newton Direction.

SLIDE 4

Basic Newton’s Algorithm

Start with $x_0$.
For $k = 1, \ldots, K$:
  If $x_k$ is optimal then stop.
  Solve: $\nabla^2 f(x_k)\, p = -\nabla f(x_k)$
  $x_{k+1} = x_k + p$
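A minimal MATLAB sketch of this loop (hypothetical interface: fgh is assumed to return the value, gradient, and Hessian of f; tol and maxit are illustrative):

    function x = pure_newton(fgh, x0, tol, maxit)
    % Pure Newton: solve the Newton equation, take a unit step.
    x = x0;
    for k = 1:maxit
        [~, g, H] = fgh(x);       % value, gradient, Hessian at x
        if norm(g) < tol          % FONC approximately satisfied
            return
        end
        p = -(H \ g);             % Newton direction: solves H*p = -g
        x = x + p;                % unit stepsize (pure Newton)
    end
    end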

SLIDE 5

Theorem 3.5 (NW) Convergence of Pure Newton’s

Let $f$ be twice cont. diff. with a Lipschitz Hessian in a neighborhood of a solution $x^*$ that satisfies the SOSC. Then for $x_0$ sufficiently close to $x^*$, Newton's method converges to $x^*$, the rate of convergence of $\{x_k\}$ is quadratic, and $\|\nabla f(x_k)\|$ converges quadratically to 0.

SLIDE 6

Analysis of Newton’s Method

  • Newton's Method converges to a zero of ∇f very fast (quadratic convergence).
  • Expensive both in storage and time:
      must compute and store the Hessian;
      must solve the Newton equation.
  • May not find a local minimum; could find any stationary point.

SLIDE 7

Method 1: d = -H⁻¹g

Requires a matrix-vector multiplication for the n×n matrix: H⁻¹g takes n rows of (n multiplies + (n-1) adds), i.e., 2n² - n FLOPS, say O(n²). It also requires computing H⁻¹, which is O(n³). Matlab command: d = -inv(H)*g

SLIDE 8

Computing Inverse

Gaussian elimination, illustrated on a 3×3 example:

$$\begin{bmatrix} 4 & 2 & 1 \\ 2 & 5 & 3 \\ 1 & 3 & 7 \end{bmatrix}
\;\xrightarrow{\text{multiply row 1 by } 1/4}\;
\begin{bmatrix} 1 & 1/2 & 1/4 \\ 2 & 5 & 3 \\ 1 & 3 & 7 \end{bmatrix}
\;\xrightarrow{\text{multiply row 1 by } -2,\ \text{add to row 2}}\;
\begin{bmatrix} 1 & 1/2 & 1/4 \\ 0 & 4 & 5/2 \\ 1 & 3 & 7 \end{bmatrix}$$

Scaling the pivot row costs $n+1$ operations, and eliminating each of the $n-1$ rows below it costs $n+1$ operations, so clearing one column costs $(n+1) + (n+1)(n-1) = n^2 + n$ operations. Repeat roughly $n$ times: $O(n^3)$.

SLIDE 9

Method 2: Hp = -g

Solve by Gaussian elimination: takes about 2n³/3 FLOPS. Faster, but still O(n³). Matlab command: p = -H\g
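A quick comparison of Methods 1 and 2 (illustrative random data; both give the same direction, but backslash avoids forming the inverse):

    n = 500;
    A = randn(n); H = A'*A + eye(n);   % random p.d. stand-in for the Hessian
    g = randn(n, 1);
    d1 = -inv(H) * g;                  % Method 1: explicit inverse, O(n^3) then O(n^2)
    d2 = -(H \ g);                     % Method 2: Gaussian elimination, about 2n^3/3
    disp(norm(d1 - d2))                % rounding-level difference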

SLIDE 10

Method 3: Cholesky Factorization O(n³)

Factorize the matrix: H = LDL' = L*U, where L is lower triangular and U = DL' is upper triangular. Why? Solve Hx = b via LUx = b: first solve Ly = b by forward elimination, then Ux = y by backward elimination. The Matlab command to compute the Cholesky factorization is R = chol(H), but it only works if H is p.d.; it gives the factorization R'*R = H.
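A minimal sketch of using chol to get the Newton direction (assumes H is p.d.; variable names are illustrative):

    % Newton direction via Cholesky: H = R'*R with R upper triangular
    R = chol(H);          % errors out if H is not p.d.
    y = R' \ (-g);        % forward elimination:  R'*y = -g
    p = R \ y;            % backward elimination: R*p  = y, so H*p = -g

Each triangular solve is O(n²); the factorization itself is the O(n³) part, costing about n³/3 FLOPS, half the cost of general Gaussian elimination.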

SLIDE 11

Forward elimination for Ly=b

$$\begin{bmatrix} l_{11} & & & \\ l_{21} & l_{22} & & \\ \vdots & & \ddots & \\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$

$$l_{11} y_1 = b_1 \;\Rightarrow\; y_1 = \frac{b_1}{l_{11}}$$

$$l_{21} y_1 + l_{22} y_2 = b_2 \;\Rightarrow\; y_2 = \frac{b_2 - l_{21} y_1}{l_{22}}$$

$$l_{i1} y_1 + l_{i2} y_2 + \cdots + l_{ii} y_i = b_i \;\Rightarrow\; y_i = \frac{b_i - \sum_{m=1}^{i-1} l_{im} y_m}{l_{ii}}$$

  • O(n²) operations
SLIDE 12

Back Substitution for Ux=y

$$\begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ & u_{22} & & u_{2n} \\ & & \ddots & \vdots \\ & & & u_{nn} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} =
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

$$u_{nn} x_n = y_n \;\Rightarrow\; x_n = \frac{y_n}{u_{nn}}$$

$$u_{n-1,n-1} x_{n-1} + u_{n-1,n} x_n = y_{n-1} \;\Rightarrow\; x_{n-1} = \frac{y_{n-1} - u_{n-1,n} x_n}{u_{n-1,n-1}}$$

$$u_{ii} x_i + \cdots + u_{in} x_n = y_i \;\Rightarrow\; x_i = \frac{y_i - \sum_{m=i+1}^{n} u_{im} x_m}{u_{ii}}$$

  • O(n²) operations
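The mirror-image sketch for backward substitution (in practice U\y; again the name is illustrative):

    function x = back_sub(U, y)
    % Solve U*x = y for upper-triangular U, from the last row up.
    n = length(y);
    x = zeros(n, 1);
    for i = n:-1:1
        x(i) = (y(i) - U(i, i+1:n) * x(i+1:n)) / U(i, i);
    end
    end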
SLIDE 13

Problems with Newton’s

Newton's method may converge to a local max or some other stationary point. We would like to make sure we decrease the function at each iteration. Also, the Newton equation may not have a solution,

  • or may not have a unique solution.
SLIDE 14

Guarantee Descent

We would like to guarantee that a descent direction is picked. Any p.d. $H_k$ works, since for

$$p_k = -H_k \nabla f(x_k)$$

we get

$$p_k^T \nabla f(x_k) = -\nabla f(x_k)^T H_k \nabla f(x_k) < 0$$

SLIDE 15

What if Hessian is not p.d.

Add a diagonal matrix $\Delta$ to $\nabla^2 f(x_k)$ so that $\nabla^2 f(x_k) + \Delta$ is p.d. Modified Cholesky factorization does this automatically.
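One simple variant of this idea, as a sketch (adding a growing multiple of the identity until chol succeeds; this is not the full modified Cholesky algorithm, which modifies the factorization itself):

    % Make H + tau*I p.d. by increasing the shift tau until chol succeeds
    tau = 0;
    n = size(H, 1);
    [R, flag] = chol(H);                    % flag == 0 iff factorization worked
    while flag ~= 0
        tau = max(2 * tau, 1e-3);           % grow the shift (illustrative schedule)
        [R, flag] = chol(H + tau * eye(n));
    end
    p = -(R \ (R' \ g));                    % guaranteed descent direction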

SLIDE 16

Theorem: Superlinear Convergence of Newton Like Methods

Assume $f$ is twice cont. differentiable on an open set $S \subset \mathbb{R}^n$, with $\nabla^2 f(x)$ p.d. and Lipschitz continuous on $S$, i.e., for all $x, y \in S$ and some fixed finite $L$,

$$\|\nabla^2 f(x) - \nabla^2 f(y)\| \le L \|x - y\|.$$

Assume the sequence generated by $x_{k+1} = x_k + p_k$ satisfies $\{x_k\} \subset S$ and

$$\lim_{k \to \infty} x_k = x^* \in S.$$

SLIDE 17

Theorem 10.1(continued)

Then $\{x_k\}$ converges superlinearly to $x^*$ with $\nabla f(x^*) = 0$ if and only if

$$\lim_{k \to \infty} \frac{\|p_k - p_k^N\|}{\|p_k\|} = 0,$$

where $p_k^N$ is the Newton direction at $x_k$.

SLIDE 18

Alternative results

In NW, the superlinear convergence result holds iff

$$\lim_{k \to \infty} \frac{\|(B_k - \nabla^2 f(x^*))\, p_k\|}{\|p_k\|} = 0.$$

So we can use a quasi-Newton algorithm with a positive definite matrix $B_k$ approximating the Hessian.

SLIDE 19

Problems with Newton’s

Newton's method may converge to a local max or some other stationary point. Pure Newton's method may not converge at all: it only has local convergence, i.e., it only works if started sufficiently close to the solution.

SLIDE 20

Stepsize problems too

Newton's method has only local convergence: if started too far from the solution, it may not converge at all. Try this example starting from $x_0 = 1.1$:

$$f(x) = \ln(e^x + e^{-x}), \qquad f'(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}, \qquad f''(x) = 1 - \left(\frac{e^x - e^{-x}}{e^x + e^{-x}}\right)^2 > 0$$
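A short demo of the failure, as a sketch (for this f, f'(x) = tanh(x) and f''(x) = 1 - tanh(x)^2, so the iteration can be written directly):

    % Pure Newton on f(x) = log(exp(x) + exp(-x)) from x0 = 1.1:
    % the iterates alternate in sign and blow up.
    x = 1.1;
    for k = 1:8
        x = x - tanh(x) / (1 - tanh(x)^2);  % x <- x - f'(x)/f''(x)
        fprintf('k = %d, x = %g\n', k, x)
    end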

SLIDE 21

Need adaptive stepsize

Add a stepsize to each iteration:

$$x_{k+1} = x_k + \alpha_k p_k$$

We could use an exact stepsize algorithm like golden section search to solve

$$\alpha_k = \arg\min_{\alpha} f(x_k + \alpha p_k),$$

but that is unnecessarily expensive. We can instead do an approximate linesearch like the Armijo search (next lecture).
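A minimal backtracking sketch in this spirit (a preview of the Armijo idea; the constants 1e-4 and 0.5 are conventional choices, not from the slides, and f, x, p, g are assumed given):

    % Shrink alpha until a sufficient-decrease condition holds
    alpha = 1;                               % always try the full Newton step first
    while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g' * p)
        alpha = alpha / 2;                   % halve the step and retry
    end
    x = x + alpha * p;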

SLIDE 22

Final Newton’s Algorithm

Start with $x_0$.
For $k = 1, \ldots, K$:
  If $x_k$ is optimal then stop.
  Solve: $\nabla^2 f(x_k) + E = LDL'$ using modified Cholesky factorization.
  Solve: $LDL'\, p_k = -\nabla f(x_k)$
  Perform linesearch to determine $\alpha_k$.
  $x_{k+1} = x_k + \alpha_k p_k$
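Putting the pieces together, a minimal MATLAB sketch of the whole algorithm (same hypothetical fgh interface as before; the identity shift stands in for a true modified Cholesky factorization):

    function x = newton_safeguarded(fgh, x0, tol, maxit)
    % Newton's method with Hessian modification and backtracking linesearch.
    x = x0;
    for k = 1:maxit
        [fx, g, H] = fgh(x);
        if norm(g) < tol, return; end
        [R, flag] = chol(H);                 % try the unmodified Hessian first
        tau = 0;
        while flag ~= 0                      % shift until H + tau*I is p.d.
            tau = max(2 * tau, 1e-3);
            [R, flag] = chol(H + tau * eye(length(x)));
        end
        p = -(R \ (R' \ g));                 % descent direction
        alpha = 1;                           % backtracking linesearch
        while fgh(x + alpha * p) > fx + 1e-4 * alpha * (g' * p)
            alpha = alpha / 2;
        end
        x = x + alpha * p;
    end
    end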

SLIDE 23

Newton’s Method Summary

Pure Newton:
  • Very fast convergence (if it converges).
  • Each iteration is expensive: requires calculation/storage of the Hessian.
  • Must add a linesearch and modified Cholesky factorization to guarantee convergence globally.

SLIDE 24

Do Lab 3