Computational Optimization: Newton's Method (2/5/08)
Newton’s Method
Method for finding a zero of a function. Recall the FONC: $\nabla f(x) = 0$.

For the quadratic case
$$g(x) = \tfrac{1}{2}x'Qx + b'x, \qquad \nabla g(x) = Qx + b = 0,$$
the minimum must satisfy $Qx^* = -b$, so
$$x^* = -Q^{-1}b \quad \text{(unique if } Q \text{ is invertible).}$$
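A minimal MATLAB sketch of this quadratic case (the matrix Q and vector b below are made-up illustrative data, not from the slide):

    % Minimizer of g(x) = 0.5*x'*Q*x + b'*x for a p.d. Q
    Q = [4 1; 1 3];        % example positive definite matrix (illustrative)
    b = [-1; -2];          % example linear term (illustrative)
    xstar = -(Q \ b);      % solves Q*x = -b without forming inv(Q)
    grad  = Q*xstar + b;   % check: gradient should be (numerically) zero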
General nonlinear functions
For non-quadratic $f$ (twice cont. diff.): approximate $f$ by its 2nd-order Taylor series, then solve the FONC for the quadratic approximation.

$$f(y) \approx f(x) + \nabla f(x)'(y-x) + \tfrac{1}{2}(y-x)'\nabla^2 f(x)(y-x)$$

Calculate the FONC:
$$\nabla f(x) + \nabla^2 f(x)(y-x) = 0$$

Solve for $y$:
$$\nabla^2 f(x)(y-x) = -\nabla f(x) \;\Rightarrow\; y = x - \left[\nabla^2 f(x)\right]^{-1}\nabla f(x) \quad \text{(Pure Newton Direction)}$$
Basic Newton’s Algorithm
Start with $x_0$.
For $k = 1, \ldots, K$:
- If $x_k$ is optimal then stop.
- Solve $\nabla^2 f(x_k)\, p = -\nabla f(x_k)$ for $p$.
- Set $x_{k+1} = x_k + p$.
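A bare-bones MATLAB sketch of this loop (the function handles fgrad and fhess, the tolerance, and the optimality test are illustrative assumptions, not part of the slide):

    % Pure Newton iteration: x_{k+1} = x_k + p, where H(x_k)*p = -g(x_k)
    function x = pure_newton(fgrad, fhess, x, K, tol)
        for k = 1:K
            g = fgrad(x);
            if norm(g) < tol        % "x_k is optimal": gradient near zero
                return;
            end
            p = -(fhess(x) \ g);    % solve the Newton equation; never invert
            x = x + p;
        end
    end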
Theorem 3.5 (NW) Convergence of Pure Newton’s
Let $f$ be twice cont. diff. with a Lipschitz Hessian in a neighborhood of a solution $x^*$ that satisfies the SOSC. For $x_0$ sufficiently close to $x^*$, Newton's method converges to $x^*$, the rate of convergence of $\{x_k\}$ is quadratic, and $\{\|\nabla f(x_k)\|\}$ converges quadratically to 0.
Analysis of Newton’s Method
- Newton's Method converges to a zero of $\nabla f$ very fast (quadratic convergence).
- Expensive both in storage and time:
  - Must compute and store the Hessian.
  - Must solve the Newton equation.
- May not find a local minimum; could find any stationary point.
Method 1: $d = -H^{-1}g$

Requires a matrix-vector multiplication with the $n \times n$ matrix $H^{-1}$: $n$ rows, each taking $n$ multiplies and $n-1$ adds, so $2n^2 - n$ FLOPS, say $O(n^2)$. But it also requires computing $H^{-1}$, which is $O(n^3)$. Matlab command: d = -inv(H)*g
Computing Inverse
Gaussian Elimination

Example (first stage of elimination):
$$\begin{bmatrix} 4 & 2 & 1 \\ 2 & 5 & 3 \\ 1 & 3 & 7 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 1/2 & 1/4 \\ 0 & 4 & 5/2 \\ 0 & 5/2 & 27/4 \end{bmatrix}$$

- Multiply row 1 by $1/4$: $n+1$ operations.
- Multiply row 1 by $-2$ and add to row 2: $n+1$ operations; repeat for each row below the pivot ($n-1$ times).
- Total for the first column: $(n+1) + (n-1)(n+1) = n(n+1)$.
- Repeat roughly $n$ times: $O(n^3)$.
Method 2: $Hp = -g$

Solve by Gaussian elimination: takes about $2n^3/3$ FLOPS. Faster, but still $O(n^3)$. Matlab command: p = -H\g
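A quick way to see the difference between Methods 1 and 2 in MATLAB (timings vary by machine; the random s.p.d. test matrix is my illustrative choice):

    % Compare forming the inverse vs. solving the system directly
    n = 2000;
    A = randn(n); H = A'*A + n*eye(n);   % a well-conditioned s.p.d. matrix
    g = randn(n, 1);
    tic; d1 = -inv(H)*g; toc             % Method 1: O(n^3) inverse, then O(n^2) multiply
    tic; d2 = -(H\g);    toc             % Method 2: one O(n^3) solve, smaller constant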
Method 3: Cholesky Factorization, $O(n^3)$

Factorize the matrix: $H = LDL' = LU$, where $L$ is lower triangular and $U = DL'$ is upper triangular. Why? To solve $Ax = b$ for $x$, solve $LUx = b$:
- First solve $Ly = b$ by forward elimination.
- Then solve $Ux = y$ by backward elimination.
The Matlab command to compute the Cholesky factorization is R = chol(H), but it only works if H is p.d. It gives the factorization R'*R = H.
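A short MATLAB sketch of using the Cholesky factor to get the Newton step (variable names are illustrative; H is assumed p.d. here):

    % Newton step via Cholesky: H = R'*R with R upper triangular
    R = chol(H);        % errors out if H is not p.d.
    y = R' \ (-g);      % forward substitution:  R'*y = -g
    p = R  \ y;         % back substitution:     R*p  = y
    % MATLAB detects triangular systems, so each solve costs only O(n^2)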
Forward elimination for Ly=b
$$\begin{bmatrix} l_{11} & & & \\ l_{21} & l_{22} & & \\ \vdots & \vdots & \ddots & \\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$

$$l_{11}y_1 = b_1 \;\Rightarrow\; y_1 = \frac{b_1}{l_{11}}$$
$$l_{21}y_1 + l_{22}y_2 = b_2 \;\Rightarrow\; y_2 = \frac{b_2 - l_{21}y_1}{l_{22}}$$
$$\vdots$$
$$l_{m1}y_1 + \cdots + l_{mm}y_m = b_m \;\Rightarrow\; y_m = \frac{b_m - \sum_{i=1}^{m-1} l_{mi}y_i}{l_{mm}}$$

- $O(n^2)$ operations
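A direct MATLAB transcription of these formulas (a sketch; assumes L is square, lower triangular, with nonzero diagonal):

    % Forward substitution: solve L*y = b for lower-triangular L
    function y = forward_sub(L, b)
        n = length(b);
        y = zeros(n, 1);
        for m = 1:n
            % subtract the already-computed terms, then divide by the diagonal
            y(m) = (b(m) - L(m, 1:m-1) * y(1:m-1)) / L(m, m);
        end
    end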
Back Substitution for Ux=y
$$\begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ & u_{22} & \cdots & u_{2n} \\ & & \ddots & \vdots \\ & & & u_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

$$u_{nn}x_n = y_n \;\Rightarrow\; x_n = \frac{y_n}{u_{nn}}$$
$$u_{n-1,n-1}x_{n-1} + u_{n-1,n}x_n = y_{n-1} \;\Rightarrow\; x_{n-1} = \frac{y_{n-1} - u_{n-1,n}x_n}{u_{n-1,n-1}}$$
$$\vdots$$
$$x_m = \frac{y_m - \sum_{i=m+1}^{n} u_{mi}x_i}{u_{mm}}$$

- $O(n^2)$ operations
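The mirror-image MATLAB sketch, sweeping from the last row upward (same assumptions as forward_sub):

    % Back substitution: solve U*x = y for upper-triangular U
    function x = back_sub(U, y)
        n = length(y);
        x = zeros(n, 1);
        for m = n:-1:1
            % subtract the already-computed terms, then divide by the diagonal
            x(m) = (y(m) - U(m, m+1:n) * x(m+1:n)) / U(m, m);
        end
    end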
Problems with Newton’s
Newton's method may converge to a local max or some other stationary point. We would like to make sure we decrease the function at each iteration. Also, the Newton equation may not have a solution,
- or may not have a unique solution.
Guarantee Descent
We would like to guarantee that a descent direction is picked. Any p.d. $H_k$ works, since for
$$p_k = -H_k \nabla f(x_k)$$
we get
$$p_k' \nabla f(x_k) = -\nabla f(x_k)' H_k \nabla f(x_k) < 0.$$
What if Hessian is not p.d.
Add a diagonal matrix $\Delta$ to $\nabla^2 f(x_k)$ so that $\nabla^2 f(x_k) + \Delta$ is p.d. Modified Cholesky factorization does this automatically.
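Modified Cholesky is more refined than this, but a simple stand-in that captures the idea is to keep adding a multiple of the identity until chol succeeds (this shift-and-retry loop is my illustrative substitute, not the slide's algorithm):

    % Crude Hessian modification: factor H + tau*I once it is p.d.
    function [R, tau] = shifted_chol(H, tau0)
        tau = 0;
        while true
            [R, flag] = chol(H + tau*eye(size(H,1)));
            if flag == 0, return; end   % success: R'*R = H + tau*I
            tau = max(2*tau, tau0);     % grow the shift and retry
        end
    end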
Theorem: Superlinear Convergence of Newton-like Methods

Assume $f$ is twice cont. differentiable on an open set $S \subset \mathbb{R}^n$, with $\nabla^2 f(x)$ p.d. and Lipschitz continuous on $S$, i.e.,
$$\|\nabla^2 f(x) - \nabla^2 f(y)\| \le L \|x - y\|$$
for all $x, y \in S$ and some fixed finite $L$. Consider the sequence generated by
$$x_{k+1} = x_k + p_k, \qquad \{x_k\} \subset S, \qquad \lim_{k\to\infty} x_k = x^* \in S.$$
Theorem 10.1 (continued)

Then $\{x_k\}$ converges to $x^*$ superlinearly and $\nabla f(x^*) = 0$ if and only if
$$\lim_{k\to\infty} \frac{\|p_k - p_k^N\|}{\|p_k\|} = 0,$$
where $p_k^N$ is the Newton direction at $x_k$.
Alternative results
In NW, the superlinear convergence result holds iff
$$\lim_{k\to\infty} \frac{\|(B_k - \nabla^2 f(x^*))\, p_k\|}{\|p_k\|} = 0.$$
So we can use a quasi-Newton algorithm with a positive definite matrix $B_k$ approximating the Hessian.
Problems with Newton’s
Newton's method may converge to a local max or some other stationary point. Pure Newton's method may not converge at all: it only has local convergence, i.e., it is only guaranteed to converge if started sufficiently close to the solution.
Stepsize problems too
Newton's has only local convergence. If too far from the solution it may not converge at all. Try this example starting from $x_0 = 1.1$:
$$f(x) = \ln(e^x + e^{-x}), \qquad f'(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}, \qquad f''(x) = 1 - \left(\frac{e^x - e^{-x}}{e^x + e^{-x}}\right)^2 > 0$$
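A quick MATLAB check of this example: the iterates alternate in sign and grow in magnitude, even though f is smooth and strictly convex (note $f'(x) = \tanh(x)$, so the update simplifies as below):

    % Pure Newton on f(x) = log(exp(x) + exp(-x)), starting from x0 = 1.1
    x = 1.1;
    for k = 1:6
        fp  = tanh(x);          % f'(x)
        fpp = 1 - tanh(x)^2;    % f''(x) > 0 everywhere
        x = x - fp/fpp;         % Newton step: |x| grows, iterates diverge
        fprintf('k=%d  x=%.4f\n', k, x);
    end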
Need adaptive stepsize
Add a stepsize $\alpha_k$ in each iteration:
$$x_{k+1} = x_k + \alpha_k p_k$$
Could use an exact stepsize algorithm like golden section search to solve
$$\alpha_k = \arg\min_\alpha f(x_k + \alpha p_k),$$
but that is unnecessarily expensive. Can do an approximate linesearch like Armijo search instead (next lecture; a preview sketch follows).
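Armijo search is covered next lecture; as a preview, here is one common backtracking form (the constants 1e-4 and 0.5 are conventional illustrative choices, not prescribed by the slide):

    % Backtracking (Armijo) linesearch along a descent direction p
    function alpha = armijo(f, x, fx, g, p)
        alpha = 1;                                  % try the full Newton step first
        while f(x + alpha*p) > fx + 1e-4*alpha*(g'*p)
            alpha = 0.5*alpha;                      % shrink until sufficient decrease
        end
    end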
Final Newton’s Algorithm
Start with $x_0$.
For $k = 1, \ldots, K$:
- If $x_k$ is optimal then stop.
- Factorize $\nabla^2 f(x_k) + E = LDL'$ using modified Cholesky factorization.
- Solve $LDL'\, p_k = -\nabla f(x_k)$.
- Perform a linesearch to determine $\alpha_k$.
- Set $x_{k+1} = x_k + \alpha_k p_k$.
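Putting the pieces together in MATLAB (this reuses the shifted_chol and armijo sketches above as stand-ins for modified Cholesky and the linesearch; all names and tolerances are illustrative):

    % Globalized Newton: modified factorization + backtracking linesearch
    function x = newton_global(f, fgrad, fhess, x, K, tol)
        for k = 1:K
            g = fgrad(x);
            if norm(g) < tol, return; end
            R = shifted_chol(fhess(x), 1e-3);   % R'*R = H + tau*I, p.d.
            p = -(R \ (R' \ g));                % two O(n^2) triangular solves
            alpha = armijo(f, x, f(x), g, p);   % sufficient-decrease stepsize
            x = x + alpha*p;
        end
    end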
Newton’s Method Summary
Pure Newton:
- Very fast convergence (if it converges).
- Each iteration is expensive.
- Must add a linesearch and modified Cholesky factorization to guarantee global convergence.
- Requires calculation/storage of the Hessian.