SLIDE 1 AM 205: lecture 18
◮ Last time: optimization methods
◮ Today: conditions for optimality
SLIDE 2 Newton’s Method
Example: Newton’s method for the two-point Gauss quadrature rule
Recall the system of equations
F1(x1, x2, w1, w2) = w1 + w2 − 2 = 0
F2(x1, x2, w1, w2) = w1x1 + w2x2 = 0
F3(x1, x2, w1, w2) = w1x1² + w2x2² − 2/3 = 0
F4(x1, x2, w1, w2) = w1x1³ + w2x2³ = 0
SLIDE 3 Newton’s Method
We can solve this in Python using our own implementation of Newton’s method
To do this, we require the Jacobian of this system (columns ordered by the variables x1, x2, w1, w2):
JF(x1, x2, w1, w2) =
[ 0         0         1    1   ]
[ w1        w2        x1   x2  ]
[ 2w1x1     2w2x2     x1²  x2² ]
[ 3w1x1²    3w2x2²    x1³  x2³ ]
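As a sketch, a hand-rolled Newton iteration for this system might look as follows (the function and variable names here are illustrative, not taken from the original lecture code):

```python
import numpy as np

def F(z):
    """Residual of the two-point Gauss quadrature system; z = [x1, x2, w1, w2]."""
    x1, x2, w1, w2 = z
    return np.array([w1 + w2 - 2,
                     w1*x1 + w2*x2,
                     w1*x1**2 + w2*x2**2 - 2/3,
                     w1*x1**3 + w2*x2**3])

def JF(z):
    """Analytical Jacobian of F with respect to (x1, x2, w1, w2)."""
    x1, x2, w1, w2 = z
    return np.array([[0,          0,          1,     1],
                     [w1,         w2,         x1,    x2],
                     [2*w1*x1,    2*w2*x2,    x1**2, x2**2],
                     [3*w1*x1**2, 3*w2*x2**2, x1**3, x2**3]])

def newton(F, JF, z0, tol=1e-12, maxit=50):
    """Newton's method: solve JF(z) dz = -F(z), update z, repeat until dz is small."""
    z = np.array(z0, dtype=float)
    for _ in range(maxit):
        dz = np.linalg.solve(JF(z), -F(z))
        z += dz
        if np.linalg.norm(dz) < tol:
            break
    return z

z = newton(F, JF, [-1, 1, 1, 1])
```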
SLIDE 4
Newton’s Method
Alternatively, we can use the fsolve function from SciPy’s scipy.optimize module
Note that fsolve computes a finite difference approximation to the Jacobian by default
(Or we can pass in an analytical Jacobian if we want)
Matlab has an equivalent fsolve function
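A minimal sketch of the fsolve approach for the same system (the residual function name is an illustrative choice):

```python
from scipy.optimize import fsolve

def F(z):
    # Residual of the quadrature system; z = [x1, x2, w1, w2]
    x1, x2, w1, w2 = z
    return [w1 + w2 - 2,
            w1*x1 + w2*x2,
            w1*x1**2 + w2*x2**2 - 2/3,
            w1*x1**3 + w2*x2**3]

# By default fsolve builds a finite-difference Jacobian internally;
# an analytical one could be supplied via the fprime argument instead
z = fsolve(F, [-1, 1, 1, 1])
```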
SLIDE 5 Newton’s Method
Python example: With either approach and with starting guess x0 = [−1, 1, 1, 1], we get x k =
0.577350269189626 1.000000000000000 1.000000000000000
SLIDE 6
Conditions for Optimality
SLIDE 7 Existence of Global Minimum
In order to guarantee existence and uniqueness of a global minimum we need to make assumptions about the objective function
e.g. if f is continuous on a closed1 and bounded set S ⊂ Rn then it has a global minimum in S
In one dimension, this says f achieves a minimum on the interval [a, b] ⊂ R
In general f does not achieve a minimum on (a, b), e.g. consider f (x) = x
(Though inf_{x∈(a,b)} f (x), the greatest lower bound of f on (a, b), is well-defined)
1A set is closed if it contains its own boundary
SLIDE 8 Existence of Global Minimum
Another helpful concept for existence of a global minimum is coercivity
A continuous function f on an unbounded set S ⊂ Rn is coercive if
lim_{‖x‖→∞} f (x) = +∞
That is, f (x) must be large whenever ‖x‖ is large
SLIDE 9 Existence of Global Minimum
If f is coercive on a closed, unbounded2 set S, then f has a global minimum in S
Proof: From the definition of coercivity, for any M ∈ R, ∃r > 0 such that f (x) ≥ M for all x ∈ S with ‖x‖ ≥ r
Suppose that 0 ∈ S, and set M = f (0)
Let Y ≡ {x ∈ S : ‖x‖ ≥ r}, so that f (x) ≥ f (0) for all x ∈ Y
And we already know that f achieves a minimum (which is at most f (0)) on the closed, bounded set {x ∈ S : ‖x‖ ≤ r}
Hence f achieves a minimum on S
2e.g. S could be all of Rn, or a “closed strip” in Rn
SLIDE 10 Existence of Global Minimum
For example:
◮ f (x, y) = x² + y² is coercive on R2 (global min. at (0, 0))
◮ f (x) = x³ is not coercive on R (f → −∞ for x → −∞)
◮ f (x) = e^x is not coercive on R (f → 0 for x → −∞)
SLIDE 11
Convexity
An important concept for uniqueness is convexity A set S ⊂ Rn is convex if it contains the line segment between any two of its points That is, S is convex if for any x, y ∈ S, we have {θx + (1 − θ)y : θ ∈ [0, 1]} ⊂ S
SLIDE 12
Convexity
Similarly, we define convexity of a function f : S ⊂ Rn → R
f is convex if its graph along any line segment in S is on or below the chord connecting the function values
i.e. f is convex if for any x, y ∈ S and any θ ∈ (0, 1), we have f (θx + (1 − θ)y) ≤ θf (x) + (1 − θ)f (y)
Also, if f (θx + (1 − θ)y) < θf (x) + (1 − θ)f (y) for all x ≠ y in S and θ ∈ (0, 1), then f is strictly convex
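The chord inequality is easy to spot-check numerically; a small sketch for the (known convex) function f (x) = x², with randomly sampled points and θ:

```python
import numpy as np

# f(x) = x**2 is convex: verify the chord inequality at many random samples
f = lambda x: x**2
rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.uniform(-10, 10, size=2)
    theta = rng.uniform()  # theta in [0, 1)
    # f(theta*x + (1-theta)*y) must lie on or below the chord
    assert f(theta*x + (1 - theta)*y) <= theta*f(x) + (1 - theta)*f(y) + 1e-12
```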
SLIDE 13 Convexity
[Figure: plot of a strictly convex function]
SLIDE 14 Convexity
[Figure: plot of a non-convex function]
SLIDE 15 Convexity
[Figure: plot of a convex (but not strictly convex) function]
SLIDE 16 Convexity
If f is a convex function on a convex set S, then any local minimum of f must be a global minimum3
Proof: Suppose x is a local minimum, i.e. f (x) ≤ f (y) for y ∈ B(x, ǫ) (where B(x, ǫ) ≡ {y ∈ S : ‖y − x‖ ≤ ǫ})
Suppose that x is not a global minimum, i.e. that there exists w ∈ S such that f (w) < f (x)
(Then we will show that this gives a contradiction)
3A global minimum is defined as a point z such that f (z) ≤ f (x) for all
x ∈ S. Note that a global minimum may not be unique, e.g. if f (x) = − cos x then 0 and 2π are both global minima.
SLIDE 17
Convexity
Proof (continued...): For θ ∈ [0, 1] we have f (θw + (1 − θ)x) ≤ θf (w) + (1 − θ)f (x) Let σ ∈ (0, 1] be sufficiently small so that z ≡ σw + (1 − σ) x ∈ B(x, ǫ) Then f (z) ≤ σf (w) + (1 − σ) f (x) < σf (x) + (1 − σ) f (x) = f (x), i.e. f (z) < f (x), which contradicts that f (x) is a local minimum! Hence we cannot have w ∈ S such that f (w) < f (x)
SLIDE 18 Convexity
Note that convexity does not guarantee uniqueness of a global minimum
e.g. a convex function can clearly have a “horizontal” section (see earlier plot)
If f is a strictly convex function on a convex set S, then a local minimum of f is the unique global minimum
Optimization of convex functions over convex sets is called convex optimization, which is an important subfield of optimization
SLIDE 19
Optimality Conditions
We have discussed existence and uniqueness of minima, but haven’t considered how to find a minimum The familiar optimization idea from calculus in one dimension is: set derivative to zero, check the sign of the second derivative This can be generalized to Rn
SLIDE 20 Optimality Conditions
If f : Rn → R is differentiable, then the gradient vector ∇f : Rn → Rn is
∇f (x) ≡ [∂f (x)/∂x1, ∂f (x)/∂x2, . . . , ∂f (x)/∂xn]T
The importance of the gradient is that ∇f points “uphill,” i.e. towards points with larger values than f (x) And similarly −∇f points “downhill”
SLIDE 21
Optimality Conditions
This follows from Taylor’s theorem for f : Rn → R
Recall that f (x + δ) = f (x) + ∇f (x)Tδ + H.O.T.
Let δ ≡ −ǫ∇f (x) for ǫ > 0 and suppose that ∇f (x) ≠ 0, then:
f (x − ǫ∇f (x)) ≈ f (x) − ǫ∇f (x)T∇f (x) < f (x)
Also, we see from Cauchy–Schwarz that −∇f (x) is the steepest descent direction
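A quick numerical illustration of this descent property (the quadratic test function here is an arbitrary illustrative choice):

```python
import numpy as np

def f(x):
    # An arbitrary smooth illustrative function
    return x[0]**2 + 3*x[1]**2

def grad_f(x):
    # Its gradient, [2*x1, 6*x2]
    return np.array([2*x[0], 6*x[1]])

x = np.array([1.0, 1.0])
eps = 0.01
x_new = x - eps * grad_f(x)  # a small step in the direction -grad f(x)
assert f(x_new) < f(x)       # the step strictly decreases f
```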
SLIDE 22
Optimality Conditions
Similarly, we see that a necessary condition for a local minimum at x∗ ∈ S is that ∇f (x∗) = 0 In this case there is no “downhill direction” at x∗ The condition ∇f (x∗) = 0 is called a first-order necessary condition for optimality, since it only involves first derivatives
SLIDE 23
Optimality Conditions
A point x∗ ∈ S that satisfies the first-order optimality condition is called a critical point of f
But of course a critical point can be a local min., local max., or saddle point
(Recall that a saddle point is where some directions are “downhill” and others are “uphill”, e.g. (x, y) = (0, 0) for f (x, y) = x² − y²)
SLIDE 24 Optimality Conditions
As in the one-dimensional case, we can look to second derivatives to classify critical points
If f : Rn → R is twice differentiable, then the Hessian is the matrix-valued function Hf : Rn → Rn×n
Hf (x) ≡
[ ∂²f (x)/∂x1²      ∂²f (x)/∂x1∂x2    · · ·   ∂²f (x)/∂x1∂xn ]
[ ∂²f (x)/∂x2∂x1    ∂²f (x)/∂x2²      · · ·   ∂²f (x)/∂x2∂xn ]
[      . . .             . . .        . . .        . . .      ]
[ ∂²f (x)/∂xn∂x1    ∂²f (x)/∂xn∂x2    · · ·   ∂²f (x)/∂xn²   ]
The Hessian is the Jacobian matrix of the gradient ∇f : Rn → Rn If the second partial derivatives of f are continuous, then ∂2f /∂xi∂xj = ∂2f /∂xj∂xi, and Hf is symmetric
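Since the Hessian is the Jacobian of the gradient, it can be approximated by finite differences of ∇f; a sketch (the test function is an illustrative choice):

```python
import numpy as np

def grad_f(x):
    # Gradient of the illustrative function f(x, y) = x**2 * y + y**3
    return np.array([2*x[0]*x[1], x[0]**2 + 3*x[1]**2])

def hessian_fd(grad, x, h=1e-6):
    """Central-difference Jacobian of the gradient: an approximate Hessian."""
    n = len(x)
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        H[:, j] = (grad(x + e) - grad(x - e)) / (2*h)
    return H

H = hessian_fd(grad_f, np.array([1.0, 2.0]))
# Continuous second partials give a symmetric Hessian;
# the exact Hessian at (1, 2) is [[4, 2], [2, 12]]
```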
SLIDE 25
Optimality Conditions
Suppose we have found a critical point x∗, so that ∇f (x∗) = 0
From Taylor’s Theorem, for δ ∈ Rn, we have
f (x∗ + δ) = f (x∗) + ∇f (x∗)Tδ + ½δT Hf (x∗ + ηδ) δ = f (x∗) + ½δT Hf (x∗ + ηδ) δ
for some η ∈ (0, 1)
SLIDE 26
Optimality Conditions
Recall positive definiteness: A is positive definite if xTAx > 0 for all x ≠ 0
Suppose Hf (x∗) is positive definite
Then (by continuity) Hf (x∗ + ηδ) is also positive definite for δ sufficiently small, so that: δT Hf (x∗ + ηδ) δ > 0
Hence, we have f (x∗ + δ) > f (x∗) for δ sufficiently small, i.e. f (x∗) is a local minimum
Hence, in general, positive definiteness of Hf at a critical point x∗ is a second-order sufficient condition for a local minimum
SLIDE 27 Optimality Conditions
A matrix A can also be negative definite: xTAx < 0 for all x ≠ 0
Or indefinite: there exist x, y such that xTAx < 0 < yTAy
Then we can classify critical points as follows:
◮ Hf (x∗) positive definite =⇒ x∗ is a local minimum
◮ Hf (x∗) negative definite =⇒ x∗ is a local maximum
◮ Hf (x∗) indefinite =⇒ x∗ is a saddle point
SLIDE 28
Optimality Conditions
Also, positive definiteness of the Hessian is closely related to convexity of f If Hf (x) is positive definite, then f is convex on some convex neighborhood of x If Hf (x) is positive definite for all x ∈ S, where S is a convex set, then f is convex on S Question: How do we test for positive definiteness?
SLIDE 29 Optimality Conditions
Answer: A is positive (resp. negative) definite if and only if all eigenvalues of A are positive (resp. negative)4 Also, a matrix with positive and negative eigenvalues is indefinite Hence we can compute all the eigenvalues of A and check their signs
4This is related to the Rayleigh quotient, see Unit V
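A sketch of such an eigenvalue-based test for a symmetric matrix (the function name and tolerance are illustrative choices):

```python
import numpy as np

def classify(A, tol=1e-12):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(A)  # eigenvalues of a symmetric matrix, ascending
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam < -tol):
        return "negative definite"
    if lam[0] < -tol and lam[-1] > tol:
        return "indefinite"
    return "semidefinite"

print(classify(np.array([[2.0, 0.0], [0.0, 3.0]])))   # positive definite
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))  # indefinite
```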
SLIDE 30 Heath Example 6.5
Consider f (x) = 2x1³ + 3x1² + 12x1x2 + 3x2² − 6x2 + 6
Then
∇f (x) = [6x1² + 6x1 + 12x2, 12x1 + 6x2 − 6]T
We set ∇f (x) = 0 to find the critical points5 [1, −1]T and [2, −3]T
5In general solving ∇f (x) = 0 requires an iterative method
SLIDE 31 Heath Example 6.5, continued...
The Hessian is (writing matrices row by row)
Hf (x) = [12x1 + 6, 12; 12, 6]
Hf (1, −1) = [18, 12; 12, 6], which has eigenvalues 25.4, −1.4
Hf (2, −3) = [30, 12; 12, 6], which has eigenvalues 35.0, 1.0
Hence [2, −3]T is a local min. whereas [1, −1]T is a saddle point
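These eigenvalue computations are easy to reproduce numerically, e.g.:

```python
import numpy as np

def Hf(x1, x2):
    # Hessian from the example above: [12*x1 + 6, 12; 12, 6]
    return np.array([[12.0*x1 + 6.0, 12.0],
                     [12.0, 6.0]])

lam_saddle = np.linalg.eigvalsh(Hf(1, -1))  # one negative, one positive
lam_min = np.linalg.eigvalsh(Hf(2, -3))     # both positive
print(lam_saddle, lam_min)
```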
SLIDE 32 Optimality Conditions: Equality Constrained Case
So far we have ignored constraints
Let us now consider equality constrained optimization
min_{x∈Rn} f (x) subject to g(x) = 0,
where f : Rn → R and g : Rn → Rm, with m ≤ n
Since g maps to Rm, we have m constraints
This situation is treated with Lagrange multipliers
SLIDE 33 Optimality Conditions: Equality Constrained Case
We illustrate the concept of Lagrange multipliers for f , g : R2 → R Let f (x, y) = x + y and g(x, y) = 2x2 + y2 − 5
[Figure: level sets of f (x, y) = x + y with the constraint curve g(x, y) = 0]
∇g is normal to S:6 at any x ∈ S we must move in direction (∇g(x))⊥ (tangent direction) to remain in S
6This follows from Taylor’s Theorem: g(x + δ) ≈ g(x) + ∇g(x)Tδ
SLIDE 34 Optimality Conditions: Equality Constrained Case
Also, the change in f due to an infinitesimal step in direction (∇g(x))⊥ is
f (x ± ǫ(∇g(x))⊥) = f (x) ± ǫ∇f (x)T(∇g(x))⊥ + H.O.T.
Hence x∗ ∈ S is a stationary point if ∇f (x∗)T(∇g(x∗))⊥ = 0, or equivalently
∇f (x∗) = λ∗∇g(x∗), for some λ∗ ∈ R
[Figure: the constraint curve g(x, y) = 0 with ∇f parallel to ∇g at the stationary points]
SLIDE 35
Optimality Conditions: Equality Constrained Case
This shows that for a stationary point with m = 1 constraints, ∇f cannot have any component in the “tangent direction” to S Now, consider the case with m > 1 equality constraints Then g : Rn → Rm and we now have a set of constraint gradient vectors, ∇gi, i = 1, . . . , m Then we have S = {x ∈ Rn : gi(x) = 0, i = 1, . . . , m} Any “tangent direction” at x ∈ S must be orthogonal to all gradient vectors {∇gi(x), i = 1, . . . , m} to remain in S
SLIDE 36 Optimality Conditions: Equality Constrained Case
Let T (x) ≡ {v ∈ Rn : ∇gi(x)Tv = 0, i = 1, 2, . . . , m} denote the orthogonal complement of {∇gi(x), i = 1, . . . , m}
Then, for δ ∈ T (x) and ǫ ∈ R>0, ǫδ is a step in a “tangent direction” of S at x Since we have f (x∗ + ǫδ) = f (x∗) + ǫ∇f (x∗)Tδ + H.O.T. it follows that for a stationary point we need ∇f (x∗)Tδ = 0 for all δ ∈ T (x∗)
SLIDE 37
Optimality Conditions: Equality Constrained Case
Hence, we require that at a stationary point x∗ ∈ S we have ∇f (x∗) ∈ span{∇gi(x∗), i = 1, . . . , m} This can be written succinctly as a linear system ∇f (x∗) = (Jg(x∗))Tλ∗ for some λ∗ ∈ Rm, where (Jg(x∗))T ∈ Rn×m This follows because the columns of (Jg(x∗))T are the vectors {∇gi(x∗), i = 1, . . . , m}
SLIDE 38 Optimality Conditions: Equality Constrained Case
We can write equality constrained optimization problems more succinctly by introducing the Lagrangian function, L : Rn+m → R,
L(x, λ) ≡ f (x) + λTg(x) = f (x) + λ1g1(x) + · · · + λmgm(x)
Then we have
∂L(x, λ)/∂xi = ∂f (x)/∂xi + λ1 ∂g1(x)/∂xi + · · · + λm ∂gm(x)/∂xi, i = 1, . . . , n
∂L(x, λ)/∂λi = gi(x), i = 1, . . . , m
SLIDE 39 Optimality Conditions: Equality Constrained Case
Hence
∇L(x, λ) = [∇xL(x, λ); ∇λL(x, λ)] = [∇f (x) + Jg(x)Tλ; g(x)]
so that the first order necessary condition for optimality for the constrained problem can be written as a nonlinear system:7
∇L(x, λ) = [∇f (x) + Jg(x)Tλ; g(x)] = 0
(As before, stationary points can be classified by considering the Hessian, though we will not consider this here...)
7n + m variables, n + m equations
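As a sketch, this nonlinear system can be handed to scipy.optimize.fsolve for the earlier example f (x, y) = x + y with g(x, y) = 2x² + y² − 5 (the starting guess is an arbitrary illustrative choice):

```python
from scipy.optimize import fsolve

# Stationarity conditions grad L = 0 for L(x, y, lam) = f + lam*g,
# with f(x, y) = x + y and g(x, y) = 2*x**2 + y**2 - 5
def grad_L(z):
    x, y, lam = z
    return [1 + 4*lam*x,         # dL/dx = df/dx + lam*dg/dx
            1 + 2*lam*y,         # dL/dy = df/dy + lam*dg/dy
            2*x**2 + y**2 - 5]   # dL/dlam = g(x, y)

# n + m = 3 equations in the 3 unknowns (x, y, lam)
z = fsolve(grad_L, [1.0, 1.0, 1.0])
```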
SLIDE 40
Optimality Conditions: Equality Constrained Case
See Lecture: Constrained optimization of cylinder surface area