Local, Unconstrained Function Optimization

COMPSCI 527 — Computer Vision

COMPSCI 527 — Computer Vision Local, Unconstrained Function Optimization 1 / 27

Outline

1. Gradient, Hessian, and Convexity
2. A Local, Unconstrained Optimization Template
3. Steepest Descent
4. Termination
5. Convergence Speed of Steepest Descent
6. Convergence Speed of Newton’s Method
7. Newton’s Method
8. Counting Steps versus Clocking


Motivation and Scope

  • Most estimation problems are solved by optimization
  • Machine learning:
    • Parametric predictor: h(x ; v) : R^d × R^m → Y
    • Risk: L_T(v) = (1/N) Σ_{n=1}^{N} ℓ(y_n, h(x_n ; v)) : R^m → R
    • Training: v̂ = arg min_{v ∈ R^m} L_T(v)
  • 3D reconstruction:
    • I = π(C, S), where I are the images, C are the camera positions and orientations, and S is the scene shape
    • Given I, find Ĉ, Ŝ = arg min_{C, S} ‖I − π(C, S)‖
  • In general, “solving” an equation E(z) = 0 can be viewed as ẑ = arg min_z ‖E(z)‖
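The machine-learning training problem above can be made concrete with a small sketch (my own illustration, not from the course): a linear predictor h, the squared loss ℓ, and the empirical risk L_T over a toy training set.

```python
import numpy as np

def predict(x, v):
    # Linear parametric predictor h(x; v) = x^T v (one simple choice of h)
    return x @ v

def risk(v, X, y):
    # Empirical risk L_T(v) = (1/N) * sum_n loss(y_n, h(x_n; v))
    # with the squared loss: loss(y, p) = (y - p)^2
    residuals = y - predict(X, v)
    return np.mean(residuals ** 2)

# Toy training set: N = 3 points in R^2 with targets generated by v = (1, 2)
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

print(risk(np.array([1.0, 2.0]), X, y))       # 0.0 at the generating parameters
print(risk(np.array([0.0, 0.0]), X, y) > 0)   # True elsewhere
```

Training then means searching over v ∈ R^m for the minimizer of `risk`, which is exactly the local, unconstrained optimization problem the rest of the lecture studies.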


Only Local Minimization

ẑ = arg min_{z ∈ ?} f(z)

  • All we know about f is a “black box” (think of a Python function)
  • For many problems, f has many local minima
  • Start somewhere (z_0), and take steps “down”: f(z_{k+1}) < f(z_k)
  • When we get stuck at a local minimum, we declare success
  • We would like global minima, but all we get is local ones
  • For some problems, f has a unique minimum...
  • ... or at least a single connected set of minima
Gradient, Hessian, and Convexity

Gradient

∇f(z) = ∂f/∂z = [∂f/∂z_1, …, ∂f/∂z_m]^T

  • We saw the gradient for the case z ∈ R^2
  • If ∇f(z) exists everywhere, the condition ∇f(z) = 0 is necessary and sufficient for a stationary point (max, min, or saddle)
  • Warning: only necessary for a minimum!
  • Reduces to the first derivative when f : R → R
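A quick way to make the gradient concrete, and to sanity-check an analytic gradient, is a central finite-difference approximation. The sketch below (an illustration, not part of the slides) does this for f(z) = z_1² + 3 z_2².

```python
import numpy as np

def f(z):
    # Example function f : R^2 -> R
    return z[0] ** 2 + 3.0 * z[1] ** 2

def grad_f(z):
    # Analytic gradient: [df/dz1, df/dz2]^T
    return np.array([2.0 * z[0], 6.0 * z[1]])

def numerical_gradient(f, z, h=1e-6):
    # Central finite differences, one coordinate at a time
    g = np.zeros_like(z)
    for i in range(len(z)):
        e = np.zeros_like(z)
        e[i] = h
        g[i] = (f(z + e) - f(z - e)) / (2.0 * h)
    return g

z0 = np.array([1.0, -2.0])
print(grad_f(z0))                 # [  2. -12.]
print(numerical_gradient(f, z0))  # approximately [2, -12]
```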

First Order Taylor Expansion

f(z) ≈ g_1(z) = f(z_0) + [∇f(z_0)]^T (z − z_0)

approximates f(z) near z_0 with a (hyper)plane through z_0

[Figure: the surface f(z) over (z_1, z_2), with the tangent plane at z_0]

  • ∇f(z_0) points in the direction of steepest increase of f at z_0
  • If we want to find z_1 where f(z_1) < f(z_0), going along −∇f(z_0) seems promising
  • This is the general idea of steepest descent
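A minimal numerical illustration of this idea (using the example function f(z) = z_1² + 3 z_2², which is my own choice, not from the slides): one step along −∇f(z_0) with a small step size decreases f.

```python
import numpy as np

def f(z):
    # Bowl-shaped example: f(z) = z1^2 + 3 z2^2, minimum at the origin
    return z[0] ** 2 + 3.0 * z[1] ** 2

def grad_f(z):
    return np.array([2.0 * z[0], 6.0 * z[1]])

z0 = np.array([1.0, 1.0])
alpha = 0.1                    # a small, fixed step size
z1 = z0 - alpha * grad_f(z0)   # step along the negative gradient

print(f(z0))          # 4.0
print(f(z1) < f(z0))  # True: the step decreased f
```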

Hessian

H(z) = ∂²f/∂z² =
  [ ∂²f/∂z_1²      …  ∂²f/∂z_1∂z_m ]
  [      ⋮         ⋱       ⋮       ]
  [ ∂²f/∂z_m∂z_1   …  ∂²f/∂z_m²    ]

  • Symmetric matrix because of Schwarz’s theorem: ∂²f/∂z_i∂z_j = ∂²f/∂z_j∂z_i
  • Eigenvalues are real because of symmetry
  • Reduces to d²f/dz² for f : R → R
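The symmetry that Schwarz’s theorem guarantees can be observed numerically. The sketch below (illustrative; the test function is my own) estimates the Hessian by central differences and compares it with the analytic one.

```python
import numpy as np

def f(z):
    # Smooth test function of two variables: f(z) = z1^2 * z2 + z2^3
    return z[0] ** 2 * z[1] + z[1] ** 3

def hessian(f, z, h=1e-5):
    # Central-difference estimate of all second partial derivatives
    m = len(z)
    H = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            ei = np.zeros(m); ei[i] = h
            ej = np.zeros(m); ej[j] = h
            H[i, j] = (f(z + ei + ej) - f(z + ei - ej)
                       - f(z - ei + ej) + f(z - ei - ej)) / (4.0 * h * h)
    return H

z0 = np.array([1.0, 2.0])
H = hessian(f, z0)
# Analytic Hessian at (1, 2): [[2*z2, 2*z1], [2*z1, 6*z2]] = [[4, 2], [2, 12]]
print(np.allclose(H, [[4.0, 2.0], [2.0, 12.0]], atol=1e-3))  # True
print(np.allclose(H, H.T, atol=1e-3))  # True: symmetric, as predicted
```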


Convexity

[Figure: a convex function, with the chord from (z, f(z)) to (z′, f(z′)) lying above the graph at u z + (1 − u) z′]

  • Convex everywhere: for all z, z′ in the (open) domain of f and for all u ∈ [0, 1],
    f(u z + (1 − u) z′) ≤ u f(z) + (1 − u) f(z′)
  • Convex at z_0: the function f is convex everywhere in some open neighborhood of z_0
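The defining inequality can be spot-checked numerically for a given function; the sketch below (my own illustration) does so for the convex function f(z) = ‖z‖² over a grid of u values. This checks sample points only, it is not a proof of convexity.

```python
import numpy as np

def f(z):
    # A convex function on R^2: f(z) = ||z||^2
    return float(np.dot(z, z))

# Check f(u z + (1-u) z') <= u f(z) + (1-u) f(z')
# for one pair of points and a grid of u values in [0, 1]
z, zp = np.array([3.0, -1.0]), np.array([-2.0, 4.0])
ok = all(
    f(u * z + (1 - u) * zp) <= u * f(z) + (1 - u) * f(zp) + 1e-12
    for u in np.linspace(0.0, 1.0, 11)
)
print(ok)  # True
```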

Convexity and Hessian

  • If H(z) is defined at a stationary point z of f, then z is a minimum iff H(z) ⪰ 0
  • “⪰” means positive semidefinite: z^T H z ≥ 0 for all z ∈ R^m
  • The above is the definition of H(z) ⪰ 0
  • To check computationally: all eigenvalues are nonnegative
  • H(z) ⪰ 0 reduces to d²f/dz² ≥ 0 for f : R → R
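The computational check mentioned above, verifying that all eigenvalues of the symmetric matrix H are nonnegative, takes a few lines (an illustrative sketch; the example Hessians are my own):

```python
import numpy as np

def is_positive_semidefinite(H, tol=1e-10):
    # All eigenvalues nonnegative (up to a numerical tolerance).
    # eigvalsh is appropriate because H is symmetric, so its
    # eigenvalues are real.
    return bool(np.all(np.linalg.eigvalsh(H) >= -tol))

H_bowl = np.array([[2.0, 0.0], [0.0, 6.0]])     # Hessian of z1^2 + 3 z2^2
H_saddle = np.array([[2.0, 0.0], [0.0, -2.0]])  # Hessian of z1^2 - z2^2

print(is_positive_semidefinite(H_bowl))    # True: the stationary point is a minimum
print(is_positive_semidefinite(H_saddle))  # False: saddle point
```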


Second Order Taylor Expansion

f(z) ≈ g_2(z) = f(z_0) + [∇f(z_0)]^T (z − z_0) + (1/2) (z − z_0)^T H(z_0) (z − z_0)

approximates f(z) near z_0 with a quadratic function through z_0

  • For minimization, this is useful only when H(z_0) ⪰ 0
  • Function looks locally like a bowl

[Figure: the quadratic approximation g_2 over (z_1, z_2), a bowl with z_0 on its side and z_1 at its bottom]

  • If we want to find z_1 where f(z_1) < f(z_0), going to the bottom of the bowl seems promising
  • This is the general idea of Newton’s method
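Going “to the bottom of the bowl” means minimizing g_2 exactly, i.e. taking the step z_1 = z_0 − H(z_0)⁻¹ ∇f(z_0). On an exactly quadratic f this reaches the minimum in a single step, as the sketch below illustrates (the example function is my own, not from the slides).

```python
import numpy as np

def f(z):
    # Quadratic bowl: f(z) = z1^2 + 3 z2^2, minimum at the origin
    return z[0] ** 2 + 3.0 * z[1] ** 2

def grad_f(z):
    return np.array([2.0 * z[0], 6.0 * z[1]])

H = np.array([[2.0, 0.0], [0.0, 6.0]])  # constant Hessian of the quadratic

z0 = np.array([4.0, -3.0])
# Newton step: solve H p = -grad f(z0), i.e. minimize g2 exactly
z1 = z0 - np.linalg.solve(H, grad_f(z0))

print(z1)  # [0. 0.]: the exact minimum, in one step
```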
A Local, Unconstrained Optimization Template

A Template

  • Regardless of method, most local unconstrained optimization methods fit the following template:

    k = 0
    while z_k is not a minimum:
        compute step direction p_k with ‖p_k‖ > 0
        compute step size α_k > 0
        z_{k+1} = z_k + α_k p_k
        k = k + 1
    end
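The template can be instantiated with steepest-descent choices: p_k = −∇f(z_k), a fixed α_k, and a gradient-norm termination test (anticipating the next slides; the code and example function are my own sketch, not the course’s implementation).

```python
import numpy as np

def minimize(f, grad, z0, alpha=0.1, tol=1e-8, max_iter=10_000):
    # The template, instantiated with steepest descent:
    #   p_k = -grad f(z_k)          (step direction)
    #   alpha_k = alpha              (fixed step size)
    #   "z_k is not a minimum" approximated by ||grad f(z_k)|| > tol
    z = np.asarray(z0, dtype=float)
    for _ in range(max_iter):
        p = -grad(z)
        if np.linalg.norm(p) <= tol:  # termination test
            break
        z = z + alpha * p             # z_{k+1} = z_k + alpha_k p_k
    return z

f = lambda z: z[0] ** 2 + 3.0 * z[1] ** 2
grad = lambda z: np.array([2.0 * z[0], 6.0 * z[1]])

z_hat = minimize(f, grad, [4.0, -3.0])
print(z_hat)  # very close to the minimum at the origin
```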


Design Decisions

  • Whether to stop (“while z_k is not a minimum”)
  • In what direction to proceed (p_k)
  • How long a step to take in that direction (α_k)
  • Different decisions for the last two lead to different methods with very different behaviors and computational costs
