SLIDE 1

Computational Optimization

Newton’s Method 2/5/08

SLIDE 2

Newton’s Method

Method for finding a zero of a function. Recall the FONC:

$$\nabla f(x) = 0$$

For the quadratic case:

$$g(x) = \tfrac{1}{2}x^T Q x + b^T x, \qquad \nabla g(x) = Qx + b = 0$$

The minimum must satisfy $Qx^* = -b$, so

$$x^* = -Q^{-1}b \quad \text{(unique if } Q \text{ is invertible)}$$
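A quick numerical check of the quadratic case, as a minimal MATLAB sketch (the 3×3 matrix Q and the vector b are made-up illustrative data):

    % Minimize g(x) = 0.5*x'*Q*x + b'*x for a symmetric p.d. Q
    Q = [4 1 0; 1 3 1; 0 1 2];    % illustrative positive definite matrix
    b = [1; -2; 3];
    xstar = -(Q \ b);             % solves Q*x = -b without forming inv(Q)
    disp(norm(Q*xstar + b))       % gradient at xstar: numerically zero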

SLIDE 3

General nonlinear functions

For non-quadratic f (twice cont. diff.): approximate f by its 2nd-order Taylor series approximation (TSA), then solve the FONC for the quadratic approximation.

$$f(y) \approx f(x) + \nabla f(x)^T (y - x) + \tfrac{1}{2}(y - x)^T \nabla^2 f(x)(y - x)$$

Calculate the FONC:

$$\nabla f(x) + \nabla^2 f(x)(y - x) = 0$$

Solve for $y$:

$$\nabla^2 f(x)(y - x) = -\nabla f(x) \quad \Rightarrow \quad y = x - \left[\nabla^2 f(x)\right]^{-1} \nabla f(x)$$

This gives the Pure Newton Direction.

SLIDE 4

Basic Newton’s Algorithm

Start with $x_0$.
For $k = 1, \ldots, K$:
  If $x_k$ is optimal then stop.
  Solve: $\nabla^2 f(x_k)\, p = -\nabla f(x_k)$
  $x_{k+1} = x_k + p$
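A minimal MATLAB sketch of this loop (hypothetical interface: fgh is assumed to return the value, gradient, and Hessian of f; tol and maxit are illustrative):

    function x = pure_newton(fgh, x0, tol, maxit)
    % Pure Newton: solve the Newton equation, take a unit step.
    x = x0;
    for k = 1:maxit
        [~, g, H] = fgh(x);       % value, gradient, Hessian at x
        if norm(g) < tol          % FONC approximately satisfied
            return
        end
        p = -(H \ g);             % Newton direction: solves H*p = -g
        x = x + p;                % unit stepsize (pure Newton)
    end
    end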

SLIDE 5

Theorem 3.5 (NW) Convergence of Pure Newton’s

Let $f$ be twice cont. diff. with a Lipschitz Hessian in a neighborhood of a solution $x^*$ that satisfies the SOSC. Then for $x_0$ sufficiently close to $x^*$, Newton's method converges to $x^*$, the rate of convergence of $\{x_k\}$ is quadratic, and $\|\nabla f(x_k)\|$ converges quadratically to 0.

SLIDE 6

Analysis of Newton’s Method

  • Newton's Method converges to a zero of ∇f very fast (quadratic convergence).
  • Expensive both in storage and time:
      must compute and store the Hessian;
      must solve the Newton equation.
  • May not find a local minimum; could find any stationary point.

SLIDE 7

Method 1: d = -H⁻¹g

Requires a matrix-vector multiplication for the n×n matrix: H⁻¹g takes n rows of (n multiplies + (n-1) adds), i.e., 2n² - n FLOPS, say O(n²). It also requires computing H⁻¹, which is O(n³). Matlab command: d = -inv(H)*g

SLIDE 8

Computing Inverse

Gaussian elimination, illustrated on a 3×3 example:

$$\begin{bmatrix} 4 & 2 & 1 \\ 2 & 5 & 3 \\ 1 & 3 & 7 \end{bmatrix}
\;\xrightarrow{\text{multiply row 1 by } 1/4}\;
\begin{bmatrix} 1 & 1/2 & 1/4 \\ 2 & 5 & 3 \\ 1 & 3 & 7 \end{bmatrix}
\;\xrightarrow{\text{multiply row 1 by } -2,\ \text{add to row 2}}\;
\begin{bmatrix} 1 & 1/2 & 1/4 \\ 0 & 4 & 5/2 \\ 1 & 3 & 7 \end{bmatrix}$$

Scaling the pivot row costs $n+1$ operations, and eliminating each of the $n-1$ rows below it costs $n+1$ operations, so clearing one column costs $(n+1) + (n+1)(n-1) = n^2 + n$ operations. Repeat roughly $n$ times: $O(n^3)$.

SLIDE 9

Method 2: Hp = -g

Solve by Gaussian elimination: takes about 2n³/3 FLOPS. Faster, but still O(n³). Matlab command: p = -H\g
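A quick comparison of Methods 1 and 2 (illustrative random data; both give the same direction, but backslash avoids forming the inverse):

    n = 500;
    A = randn(n); H = A'*A + eye(n);   % random p.d. stand-in for the Hessian
    g = randn(n, 1);
    d1 = -inv(H) * g;                  % Method 1: explicit inverse, O(n^3) then O(n^2)
    d2 = -(H \ g);                     % Method 2: Gaussian elimination, about 2n^3/3
    disp(norm(d1 - d2))                % rounding-level difference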

SLIDE 10

Method 3: Cholesky Factorization O(n³)

Factorize the matrix: H = LDL' = L*U, where L is lower triangular and U = DL' is upper triangular. Why? Solve Hx = b via LUx = b: first solve Ly = b by forward elimination, then Ux = y by backward elimination. The Matlab command to compute the Cholesky factorization is R = chol(H), but it only works if H is p.d.; it gives the factorization R'*R = H.
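A minimal sketch of using chol to get the Newton direction (assumes H is p.d.; variable names are illustrative):

    % Newton direction via Cholesky: H = R'*R with R upper triangular
    R = chol(H);          % errors out if H is not p.d.
    y = R' \ (-g);        % forward elimination:  R'*y = -g
    p = R \ y;            % backward elimination: R*p  = y, so H*p = -g

Each triangular solve is O(n²); the factorization itself is the O(n³) part, costing about n³/3 FLOPS, half the cost of general Gaussian elimination.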

SLIDE 11

Forward elimination for Ly=b

$$\begin{bmatrix} l_{11} & & & \\ l_{21} & l_{22} & & \\ \vdots & & \ddots & \\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$

$$l_{11} y_1 = b_1 \;\Rightarrow\; y_1 = \frac{b_1}{l_{11}}$$

$$l_{21} y_1 + l_{22} y_2 = b_2 \;\Rightarrow\; y_2 = \frac{b_2 - l_{21} y_1}{l_{22}}$$

$$l_{i1} y_1 + l_{i2} y_2 + \cdots + l_{ii} y_i = b_i \;\Rightarrow\; y_i = \frac{b_i - \sum_{m=1}^{i-1} l_{im} y_m}{l_{ii}}$$

  • O(n²) operations
SLIDE 12

Back Substitution for Ux=y

$$\begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ & u_{22} & & u_{2n} \\ & & \ddots & \vdots \\ & & & u_{nn} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} =
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

$$u_{nn} x_n = y_n \;\Rightarrow\; x_n = \frac{y_n}{u_{nn}}$$

$$u_{n-1,n-1} x_{n-1} + u_{n-1,n} x_n = y_{n-1} \;\Rightarrow\; x_{n-1} = \frac{y_{n-1} - u_{n-1,n} x_n}{u_{n-1,n-1}}$$

$$u_{ii} x_i + \cdots + u_{in} x_n = y_i \;\Rightarrow\; x_i = \frac{y_i - \sum_{m=i+1}^{n} u_{im} x_m}{u_{ii}}$$

  • O(n²) operations
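The mirror-image sketch for backward substitution (in practice U\y; again the name is illustrative):

    function x = back_sub(U, y)
    % Solve U*x = y for upper-triangular U, from the last row up.
    n = length(y);
    x = zeros(n, 1);
    for i = n:-1:1
        x(i) = (y(i) - U(i, i+1:n) * x(i+1:n)) / U(i, i);
    end
    end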
SLIDE 13

Problems with Newton’s

Newton's method may converge to a local max or some other stationary point. We would like to make sure we decrease the function at each iteration. Also, the Newton equation may not have a solution,

  • or may not have a unique solution.
SLIDE 14

Guarantee Descent

We would like to guarantee that a descent direction is picked. Any p.d. $H_k$ works, since for

$$p_k = -H_k \nabla f(x_k)$$

we get

$$p_k^T \nabla f(x_k) = -\nabla f(x_k)^T H_k \nabla f(x_k) < 0$$

SLIDE 15

What if Hessian is not p.d.

Add a diagonal matrix $\Delta$ to $\nabla^2 f(x_k)$ so that $\nabla^2 f(x_k) + \Delta$ is p.d. Modified Cholesky factorization does this automatically.
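One simple variant of this idea, as a sketch (adding a growing multiple of the identity until chol succeeds; this is not the full modified Cholesky algorithm, which modifies the factorization itself):

    % Make H + tau*I p.d. by increasing the shift tau until chol succeeds
    tau = 0;
    n = size(H, 1);
    [R, flag] = chol(H);                    % flag == 0 iff factorization worked
    while flag ~= 0
        tau = max(2 * tau, 1e-3);           % grow the shift (illustrative schedule)
        [R, flag] = chol(H + tau * eye(n));
    end
    p = -(R \ (R' \ g));                    % guaranteed descent direction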

SLIDE 16

Theorem: Superlinear Convergence of Newton Like Methods

Assume $f$ is twice cont. differentiable on an open set $S \subset \mathbb{R}^n$, with $\nabla^2 f(x)$ p.d. and Lipschitz continuous on $S$, i.e., for all $x, y \in S$ and some fixed finite $L$,

$$\|\nabla^2 f(x) - \nabla^2 f(y)\| \le L \|x - y\|.$$

Assume the sequence generated by $x_{k+1} = x_k + p_k$ satisfies $\{x_k\} \subset S$ and

$$\lim_{k \to \infty} x_k = x^* \in S.$$

SLIDE 17

Theorem 10.1(continued)

Then $\{x_k\}$ converges superlinearly to $x^*$ with $\nabla f(x^*) = 0$ if and only if

$$\lim_{k \to \infty} \frac{\|p_k - p_k^N\|}{\|p_k\|} = 0,$$

where $p_k^N$ is the Newton direction at $x_k$.

SLIDE 18

Alternative results

In NW, the superlinear convergence result holds iff

$$\lim_{k \to \infty} \frac{\|(B_k - \nabla^2 f(x^*))\, p_k\|}{\|p_k\|} = 0.$$

So we can use a quasi-Newton algorithm with a positive definite matrix $B_k$ approximating the Hessian.

SLIDE 19

Problems with Newton’s

Newton's method may converge to a local max or some other stationary point. Pure Newton's method may not converge at all: it only has local convergence, i.e., it only works if started sufficiently close to the solution.

SLIDE 20

Stepsize problems too

Newton's method has only local convergence: if started too far from the solution, it may not converge at all. Try this example starting from $x_0 = 1.1$:

$$f(x) = \ln(e^x + e^{-x}), \qquad f'(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}, \qquad f''(x) = 1 - \left(\frac{e^x - e^{-x}}{e^x + e^{-x}}\right)^2 > 0$$
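A short demo of the failure, as a sketch (for this f, f'(x) = tanh(x) and f''(x) = 1 - tanh(x)^2, so the iteration can be written directly):

    % Pure Newton on f(x) = log(exp(x) + exp(-x)) from x0 = 1.1:
    % the iterates alternate in sign and blow up.
    x = 1.1;
    for k = 1:8
        x = x - tanh(x) / (1 - tanh(x)^2);  % x <- x - f'(x)/f''(x)
        fprintf('k = %d, x = %g\n', k, x)
    end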

SLIDE 21

Need adaptive stepsize

Add a stepsize to each iteration:

$$x_{k+1} = x_k + \alpha_k p_k$$

We could use an exact stepsize algorithm like golden section search to solve

$$\alpha_k = \arg\min_{\alpha} f(x_k + \alpha p_k),$$

but that is unnecessarily expensive. We can instead do an approximate linesearch like the Armijo search (next lecture).
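A minimal backtracking sketch in this spirit (a preview of the Armijo idea; the constants 1e-4 and 0.5 are conventional choices, not from the slides, and f, x, p, g are assumed given):

    % Shrink alpha until a sufficient-decrease condition holds
    alpha = 1;                               % always try the full Newton step first
    while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g' * p)
        alpha = alpha / 2;                   % halve the step and retry
    end
    x = x + alpha * p;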

SLIDE 22

Final Newton’s Algorithm

Start with $x_0$.
For $k = 1, \ldots, K$:
  If $x_k$ is optimal then stop.
  Solve: $\nabla^2 f(x_k) + E = LDL'$ using modified Cholesky factorization.
  Solve: $LDL'\, p_k = -\nabla f(x_k)$
  Perform linesearch to determine $\alpha_k$.
  $x_{k+1} = x_k + \alpha_k p_k$
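Putting the pieces together, a minimal MATLAB sketch of the whole algorithm (same hypothetical fgh interface as before; the identity shift stands in for a true modified Cholesky factorization):

    function x = newton_safeguarded(fgh, x0, tol, maxit)
    % Newton's method with Hessian modification and backtracking linesearch.
    x = x0;
    for k = 1:maxit
        [fx, g, H] = fgh(x);
        if norm(g) < tol, return; end
        [R, flag] = chol(H);                 % try the unmodified Hessian first
        tau = 0;
        while flag ~= 0                      % shift until H + tau*I is p.d.
            tau = max(2 * tau, 1e-3);
            [R, flag] = chol(H + tau * eye(length(x)));
        end
        p = -(R \ (R' \ g));                 % descent direction
        alpha = 1;                           % backtracking linesearch
        while fgh(x + alpha * p) > fx + 1e-4 * alpha * (g' * p)
            alpha = alpha / 2;
        end
        x = x + alpha * p;
    end
    end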

SLIDE 23

Newton’s Method Summary

Pure Newton:
  • Very fast convergence (if it converges).
  • Each iteration is expensive: requires calculation/storage of the Hessian.
  • Must add a linesearch and modified Cholesky factorization to guarantee convergence globally.

SLIDE 24

Do Lab 3