AM 205: lecture 18


SLIDE 1

AM 205: lecture 18

◮ Last time: optimization methods
◮ Today: conditions for optimality

SLIDE 2

Newton’s Method

Example: Newton’s method for the two-point Gauss quadrature rule

Recall the system of equations:

F1(x1, x2, w1, w2) = w1 + w2 − 2 = 0
F2(x1, x2, w1, w2) = w1x1 + w2x2 = 0
F3(x1, x2, w1, w2) = w1x1² + w2x2² − 2/3 = 0
F4(x1, x2, w1, w2) = w1x1³ + w2x2³ = 0

SLIDE 3

Newton’s Method

We can solve this in Python using our own implementation of Newton’s method

To do this, we require the Jacobian of the system with respect to (x1, x2, w1, w2):

JF(x1, x2, w1, w2) =
⎡ 0         0         1    1   ⎤
⎢ w1        w2        x1   x2  ⎥
⎢ 2w1x1     2w2x2     x1²  x2² ⎥
⎣ 3w1x1²    3w2x2²    x1³  x2³ ⎦
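The iteration described above can be turned into a short script. This is a minimal sketch (the helper names `F`, `J`, and `newton` are our own, not from the lecture code):

```python
import numpy as np

def F(v):
    """Residual of the two-point Gauss quadrature system."""
    x1, x2, w1, w2 = v
    return np.array([w1 + w2 - 2,
                     w1*x1 + w2*x2,
                     w1*x1**2 + w2*x2**2 - 2/3,
                     w1*x1**3 + w2*x2**3])

def J(v):
    """Analytical Jacobian with respect to (x1, x2, w1, w2)."""
    x1, x2, w1, w2 = v
    return np.array([[0,          0,          1,     1],
                     [w1,         w2,         x1,    x2],
                     [2*w1*x1,    2*w2*x2,    x1**2, x2**2],
                     [3*w1*x1**2, 3*w2*x2**2, x1**3, x2**3]])

def newton(F, J, v0, tol=1e-12, maxit=50):
    """Newton's method: solve J(v) dv = -F(v), update v until ||dv|| < tol."""
    v = np.array(v0, dtype=float)
    for _ in range(maxit):
        dv = np.linalg.solve(J(v), -F(v))
        v += dv
        if np.linalg.norm(dv) < tol:
            break
    return v

v_star = newton(F, J, [-1.0, 1.0, 1.0, 1.0])
```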

SLIDE 4

Newton’s Method

Alternatively, we can use SciPy’s fsolve function (scipy.optimize.fsolve)

Note that fsolve computes a finite-difference approximation to the Jacobian by default (or we can pass in an analytical Jacobian if we want)

Matlab has an equivalent fsolve function
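A corresponding sketch with SciPy (assuming SciPy is installed; `F` is the residual from the previous slides, and since we omit `fprime`, fsolve builds a finite-difference Jacobian internally):

```python
import numpy as np
from scipy.optimize import fsolve

def F(v):
    """Residual of the two-point Gauss quadrature system."""
    x1, x2, w1, w2 = v
    return [w1 + w2 - 2,
            w1*x1 + w2*x2,
            w1*x1**2 + w2*x2**2 - 2/3,
            w1*x1**3 + w2*x2**3]

# finite-difference Jacobian by default; pass fprime=J for an analytical one
sol = fsolve(F, x0=[-1.0, 1.0, 1.0, 1.0])
```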

SLIDE 5

Newton’s Method

Python example: with either approach and with starting guess x0 = [−1, 1, 1, 1], we get

x∗ = [−0.577350269189626, 0.577350269189626, 1.000000000000000, 1.000000000000000]

i.e. x1,2 = ∓1/√3 and w1 = w2 = 1

SLIDE 6

Conditions for Optimality

SLIDE 7

Existence of Global Minimum

In order to guarantee existence and uniqueness of a global minimum we need to make assumptions about the objective function

e.g. if f is continuous on a closed¹ and bounded set S ⊂ Rn then it has a global minimum in S

In one dimension, this says f achieves a minimum on the interval [a, b] ⊂ R

In general f does not achieve a minimum on (a, b), e.g. consider f (x) = x

(Though inf_{x∈(a,b)} f (x), the greatest lower bound of f on (a, b), is well-defined)

¹A set is closed if it contains its own boundary

SLIDE 8

Existence of Global Minimum

Another helpful concept for existence of a global minimum is coercivity

A continuous function f on an unbounded set S ⊂ Rn is coercive if

lim_{‖x‖→∞} f (x) = +∞

That is, f (x) must be large whenever ‖x‖ is large

SLIDE 9

Existence of Global Minimum

If f is coercive on a closed, unbounded² set S, then f has a global minimum in S

Proof: From the definition of coercivity, for any M ∈ R, ∃r > 0 such that f (x) ≥ M for all x ∈ S with ‖x‖ ≥ r

Suppose that 0 ∈ S, and set M = f (0)

Let Y ≡ {x ∈ S : ‖x‖ ≥ r}, so that f (x) ≥ f (0) for all x ∈ Y

And we already know that f achieves a minimum (which is at most f (0)) on the closed, bounded set {x ∈ S : ‖x‖ ≤ r}

Hence f achieves a minimum on S

²e.g. S could be all of Rn, or a “closed strip” in Rn
SLIDE 10

Existence of Global Minimum

For example:

◮ f (x, y) = x² + y² is coercive on R2 (global min. at (0, 0))
◮ f (x) = x³ is not coercive on R (f → −∞ for x → −∞)
◮ f (x) = e^x is not coercive on R (f → 0 for x → −∞)

SLIDE 11

Convexity

An important concept for uniqueness is convexity

A set S ⊂ Rn is convex if it contains the line segment between any two of its points

That is, S is convex if for any x, y ∈ S, we have {θx + (1 − θ)y : θ ∈ [0, 1]} ⊂ S

SLIDE 12

Convexity

Similarly, we define convexity of a function f : S ⊂ Rn → R

f is convex if its graph along any line segment in S is on or below the chord connecting the function values

i.e. f is convex if for any x, y ∈ S and any θ ∈ (0, 1), we have

f (θx + (1 − θ)y) ≤ θf (x) + (1 − θ)f (y)

Also, if f (θx + (1 − θ)y) < θf (x) + (1 − θ)f (y) for all x ≠ y, then f is strictly convex
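The chord inequality can be spot-checked numerically. A small sketch (the helper `chord_gap` and the example functions are our own; random sampling is only a heuristic check, not a proof of convexity):

```python
import numpy as np

def chord_gap(f, x, y, theta):
    """theta*f(x) + (1-theta)*f(y) - f(theta*x + (1-theta)*y);
    nonnegative for every x, y, theta iff f is convex."""
    return theta*f(x) + (1 - theta)*f(y) - f(theta*x + (1 - theta)*y)

f_convex = lambda x: x**2          # strictly convex on R
f_nonconvex = lambda x: np.sin(x)  # not convex (concave on (0, pi))

# spot-check the inequality for x**2 at random points
rng = np.random.default_rng(0)
for _ in range(100):
    x, y, t = rng.uniform(-3, 3), rng.uniform(-3, 3), rng.uniform(0, 1)
    assert chord_gap(f_convex, x, y, t) >= -1e-12
```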

SLIDE 13

Convexity

[Figure: graph of a strictly convex function]

Strictly convex

SLIDE 14

Convexity

[Figure: graph of a non-convex function]

Non-convex

SLIDE 15

Convexity

[Figure: graph of a convex function with a flat section]

Convex (not strictly convex)

SLIDE 16

Convexity

If f is a convex function on a convex set S, then any local minimum of f must be a global minimum³

Proof: Suppose x is a local minimum, i.e. f (x) ≤ f (y) for y ∈ B(x, ε), where B(x, ε) ≡ {y ∈ S : ‖y − x‖ ≤ ε}

Suppose that x is not a global minimum, i.e. that there exists w ∈ S such that f (w) < f (x)

(Then we will show that this gives a contradiction)

³A global minimum is defined as a point z such that f (z) ≤ f (x) for all x ∈ S. Note that a global minimum may not be unique, e.g. if f (x) = − cos x then 0 and 2π are both global minima.

SLIDE 17

Convexity

Proof (continued...): For θ ∈ [0, 1] we have f (θw + (1 − θ)x) ≤ θf (w) + (1 − θ)f (x)

Let σ ∈ (0, 1] be sufficiently small so that z ≡ σw + (1 − σ)x ∈ B(x, ε)

Then f (z) ≤ σf (w) + (1 − σ)f (x) < σf (x) + (1 − σ)f (x) = f (x),

i.e. f (z) < f (x), which contradicts that x is a local minimum!

Hence we cannot have w ∈ S such that f (w) < f (x)

SLIDE 18

Convexity

Note that convexity does not guarantee uniqueness of the global minimum

e.g. a convex function can clearly have a “horizontal” section (see earlier plot)

If f is a strictly convex function on a convex set S, then a local minimum of f is the unique global minimum

Optimization of convex functions over convex sets is called convex optimization, which is an important subfield of optimization
SLIDE 19

Optimality Conditions

We have discussed existence and uniqueness of minima, but haven’t considered how to find a minimum The familiar optimization idea from calculus in one dimension is: set derivative to zero, check the sign of the second derivative This can be generalized to Rn

SLIDE 20

Optimality Conditions

If f : Rn → R is differentiable, then the gradient vector ∇f : Rn → Rn is

∇f (x) ≡ [∂f (x)/∂x1, ∂f (x)/∂x2, . . . , ∂f (x)/∂xn]T

The importance of the gradient is that ∇f points “uphill,” i.e. towards points with larger values than f (x)

And similarly −∇f points “downhill”

SLIDE 21

Optimality Conditions

This follows from Taylor’s theorem for f : Rn → R

Recall that f (x + δ) = f (x) + ∇f (x)Tδ + H.O.T.

Let δ ≡ −ε∇f (x) for ε > 0 and suppose that ∇f (x) ≠ 0, then:

f (x − ε∇f (x)) ≈ f (x) − ε∇f (x)T∇f (x) < f (x)

Also, we see from Cauchy–Schwarz that −∇f (x) is the steepest descent direction
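A quick numerical illustration of this Taylor argument (the example f and the step size ε are our own choices): a small step along −∇f decreases f, by approximately ε∇f(x)ᵀ∇f(x).

```python
import numpy as np

def f(x):
    # example smooth function of two variables (our choice)
    return x[0]**2 + 3*x[1]**2 + np.sin(x[0])

def grad_f(x):
    return np.array([2*x[0] + np.cos(x[0]), 6*x[1]])

x = np.array([1.0, -0.5])
eps = 1e-3
x_new = x - eps*grad_f(x)   # small step in the steepest descent direction
```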

SLIDE 22

Optimality Conditions

Similarly, we see that a necessary condition for a local minimum at x∗ ∈ S is that ∇f (x∗) = 0

In this case there is no “downhill direction” at x∗

The condition ∇f (x∗) = 0 is called a first-order necessary condition for optimality, since it only involves first derivatives

SLIDE 23

Optimality Conditions

A point x∗ ∈ S that satisfies the first-order optimality condition is called a critical point of f

But of course a critical point can be a local min., local max., or saddle point

(Recall that a saddle point is where some directions are “downhill” and others are “uphill”, e.g. (x, y) = (0, 0) for f (x, y) = x² − y²)

SLIDE 24

Optimality Conditions

As in the one-dimensional case, we can look to second derivatives to classify critical points

If f : Rn → R is twice differentiable, then the Hessian is the matrix-valued function Hf : Rn → Rn×n

Hf (x) ≡
⎡ ∂²f (x)/∂x1²     ∂²f (x)/∂x1∂x2   · · ·   ∂²f (x)/∂x1∂xn ⎤
⎢ ∂²f (x)/∂x2∂x1   ∂²f (x)/∂x2²     · · ·   ∂²f (x)/∂x2∂xn ⎥
⎢       ⋮                 ⋮           ⋱            ⋮        ⎥
⎣ ∂²f (x)/∂xn∂x1   ∂²f (x)/∂xn∂x2   · · ·   ∂²f (x)/∂xn²   ⎦

The Hessian is the Jacobian matrix of the gradient ∇f : Rn → Rn

If the second partial derivatives of f are continuous, then ∂²f /∂xi∂xj = ∂²f /∂xj∂xi, and Hf is symmetric

SLIDE 25

Optimality Conditions

Suppose we have found a critical point x∗, so that ∇f (x∗) = 0

From Taylor’s theorem, for δ ∈ Rn, we have

f (x∗ + δ) = f (x∗) + ∇f (x∗)Tδ + ½ δT Hf (x∗ + ηδ) δ = f (x∗) + ½ δT Hf (x∗ + ηδ) δ

for some η ∈ (0, 1)

SLIDE 26

Optimality Conditions

Recall positive definiteness: A is positive definite if xTAx > 0 for all x ≠ 0

Suppose Hf (x∗) is positive definite

Then (by continuity) Hf (x∗ + ηδ) is also positive definite for ‖δ‖ sufficiently small, so that:

δT Hf (x∗ + ηδ) δ > 0

Hence, we have f (x∗ + δ) > f (x∗) for ‖δ‖ sufficiently small, i.e. f (x∗) is a local minimum

Hence, in general, positive definiteness of Hf at a critical point x∗ is a second-order sufficient condition for a local minimum

SLIDE 27

Optimality Conditions

A matrix can also be negative definite: xTAx < 0 for all x ≠ 0

Or indefinite: there exist x, y such that xTAx < 0 < yTAy

Then we can classify critical points as follows:

◮ Hf (x∗) positive definite =⇒ x∗ is a local minimum
◮ Hf (x∗) negative definite =⇒ x∗ is a local maximum
◮ Hf (x∗) indefinite =⇒ x∗ is a saddle point

SLIDE 28

Optimality Conditions

Also, positive definiteness of the Hessian is closely related to convexity of f

If Hf (x) is positive definite, then f is convex on some convex neighborhood of x

If Hf (x) is positive definite for all x ∈ S, where S is a convex set, then f is convex on S

Question: How do we test for positive definiteness?

SLIDE 29

Optimality Conditions

Answer: a symmetric matrix A is positive (resp. negative) definite if and only if all eigenvalues of A are positive (resp. negative)⁴

Also, a symmetric matrix with both positive and negative eigenvalues is indefinite

Hence we can compute all the eigenvalues of A and check their signs

⁴This is related to the Rayleigh quotient, see Unit V

SLIDE 30

Heath Example 6.5

Consider f (x) = 2x1³ + 3x1² + 12x1x2 + 3x2² − 6x2 + 6

Then

∇f (x) = [6x1² + 6x1 + 12x2, 12x1 + 6x2 − 6]T

We set ∇f (x) = 0 to find critical points⁵ [1, −1]T and [2, −3]T

⁵In general solving ∇f (x) = 0 requires an iterative method

SLIDE 31

Heath Example 6.5, continued...

The Hessian is

Hf (x) = ⎡ 12x1 + 6   12 ⎤
         ⎣ 12          6 ⎦

and hence

Hf (1, −1) = ⎡ 18  12 ⎤ , which has eigenvalues 25.4, −1.4
             ⎣ 12   6 ⎦

Hf (2, −3) = ⎡ 30  12 ⎤ , which has eigenvalues 35.0, 1.0
             ⎣ 12   6 ⎦

Hence [2, −3]T is a local min. whereas [1, −1]T is a saddle point
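A sketch reproducing this classification with NumPy’s symmetric eigensolver (the helper name `hessian` is our own):

```python
import numpy as np

def hessian(x1, x2):
    """Hessian of f(x) = 2*x1**3 + 3*x1**2 + 12*x1*x2 + 3*x2**2 - 6*x2 + 6.
    Note it happens to be independent of x2 for this f."""
    return np.array([[12.0*x1 + 6.0, 12.0],
                     [12.0,          6.0]])

# eigvalsh returns eigenvalues of a symmetric matrix in ascending order
ev_saddle = np.linalg.eigvalsh(hessian(1, -1))  # mixed signs -> saddle point
ev_min = np.linalg.eigvalsh(hessian(2, -3))     # all positive -> local minimum
```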

SLIDE 32

Optimality Conditions: Equality Constrained Case

So far we have ignored constraints

Let us now consider equality constrained optimization

min_{x∈Rn} f (x) subject to g(x) = 0,

where f : Rn → R and g : Rn → Rm, with m ≤ n

Since g maps to Rm, we have m constraints

This situation is treated with Lagrange multipliers

SLIDE 33

Optimality Conditions: Equality Constrained Case

We illustrate the concept of Lagrange multipliers for f , g : R2 → R

Let f (x, y) = x + y and g(x, y) = 2x² + y² − 5

[Figure: level curves of f and the constraint curve S = {(x, y) : g(x, y) = 0}]

∇g is normal to S:⁶ at any x ∈ S we must move in direction (∇g(x))⊥ (tangent direction) to remain in S

⁶This follows from Taylor’s theorem: g(x + δ) ≈ g(x) + ∇g(x)Tδ

SLIDE 34

Optimality Conditions: Equality Constrained Case

Also, the change in f due to an infinitesimal step in direction (∇g(x))⊥ is

f (x ± ε(∇g(x))⊥) = f (x) ± ε∇f (x)T(∇g(x))⊥ + H.O.T.

Hence x∗ ∈ S is a stationary point if ∇f (x∗)T(∇g(x∗))⊥ = 0, or

∇f (x∗) = λ∗∇g(x∗), for some λ∗ ∈ R

[Figure: at the stationary points, ∇f is parallel to ∇g]

SLIDE 35

Optimality Conditions: Equality Constrained Case

This shows that for a stationary point with m = 1 constraints, ∇f cannot have any component in the “tangent direction” to S

Now consider the case with m > 1 equality constraints

Then g : Rn → Rm and we now have a set of constraint gradient vectors, ∇gi, i = 1, . . . , m

Then we have S = {x ∈ Rn : gi(x) = 0, i = 1, . . . , m}

Any “tangent direction” at x ∈ S must be orthogonal to all gradient vectors {∇gi(x), i = 1, . . . , m} to remain in S

SLIDE 36

Optimality Conditions: Equality Constrained Case

Let T (x) ≡ {v ∈ Rn : ∇gi(x)Tv = 0, i = 1, 2, . . . , m} denote the orthogonal complement of {∇gi(x), i = 1, . . . , m}

Then, for δ ∈ T (x) and ε ∈ R>0, εδ is a step in a “tangent direction” of S at x

Since we have f (x∗ + εδ) = f (x∗) + ε∇f (x∗)Tδ + H.O.T.

it follows that for a stationary point we need ∇f (x∗)Tδ = 0 for all δ ∈ T (x∗)

SLIDE 37

Optimality Conditions: Equality Constrained Case

Hence, we require that at a stationary point x∗ ∈ S we have ∇f (x∗) ∈ span{∇gi(x∗), i = 1, . . . , m}

This can be written succinctly as a linear system

∇f (x∗) = (Jg(x∗))Tλ∗

for some λ∗ ∈ Rm, where (Jg(x∗))T ∈ Rn×m

This follows because the columns of (Jg(x∗))T are the vectors {∇gi(x∗), i = 1, . . . , m}

SLIDE 38

Optimality Conditions: Equality Constrained Case

We can write equality constrained optimization problems more succinctly by introducing the Lagrangian function, L : Rn+m → R,

L(x, λ) ≡ f (x) + λTg(x) = f (x) + λ1g1(x) + · · · + λmgm(x)

Then we have

∂L(x, λ)/∂xi = ∂f (x)/∂xi + λ1 ∂g1(x)/∂xi + · · · + λm ∂gm(x)/∂xi, i = 1, . . . , n

∂L(x, λ)/∂λi = gi(x), i = 1, . . . , m

SLIDE 39

Optimality Conditions: Equality Constrained Case

Hence

∇L(x, λ) = ⎡ ∇xL(x, λ) ⎤ = ⎡ ∇f (x) + Jg(x)Tλ ⎤
           ⎣ ∇λL(x, λ) ⎦   ⎣ g(x)             ⎦

so that the first-order necessary condition for optimality for the constrained problem can be written as a nonlinear system:⁷

∇L(x, λ) = ⎡ ∇f (x) + Jg(x)Tλ ⎤ = 0
           ⎣ g(x)             ⎦

(As before, stationary points can be classified by considering the Hessian, though we will not consider this here...)

⁷n + m variables, n + m equations
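As a sketch, we can apply Newton’s method to ∇L(x, λ) = 0 for the earlier example f (x, y) = x + y, g(x, y) = 2x² + y² − 5 (the function names, Newton loop, and starting guess are our own choices; with L = f + λTg the stationarity conditions are 1 + 4λx = 0 and 1 + 2λy = 0):

```python
import numpy as np

def grad_L(v):
    """Gradient of L(x, y, lam) = x + y + lam*(2x^2 + y^2 - 5): n + m = 3 equations."""
    x, y, lam = v
    return np.array([1 + 4*lam*x,         # dL/dx
                     1 + 2*lam*y,         # dL/dy
                     2*x**2 + y**2 - 5])  # dL/dlam = g(x, y)

def hess_L(v):
    """Jacobian of grad_L with respect to (x, y, lam)."""
    x, y, lam = v
    return np.array([[4*lam, 0,     4*x],
                     [0,     2*lam, 2*y],
                     [4*x,   2*y,   0]])

v = np.array([-1.0, -2.0, 0.25])   # starting guess near the constrained minimum
for _ in range(50):
    dv = np.linalg.solve(hess_L(v), -grad_L(v))
    v += dv
    if np.linalg.norm(dv) < 1e-12:
        break
x, y, lam = v
```

With this guess the iteration converges to the constrained minimum x = −√(5/6), y = 2x on the ellipse.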

SLIDE 40

Optimality Conditions: Equality Constrained Case

See Lecture: Constrained optimization of cylinder surface area