Optimality Conditions, Fabio Schoen, 2008 (slide transcript)



SLIDE 1

Optimality Conditions

Fabio Schoen 2008

http://gol.dsi.unifi.it/users/schoen

Optimality Conditions – p.

SLIDE 2

Optimality Conditions: descent directions

Let S ⊆ Rⁿ be a convex set and consider the problem

    min_{x∈S} f(x)

where f: S → R. Let x₁, x₂ ∈ S and d = x₂ − x₁; then d is a feasible direction. If there exists ε̄ > 0 such that f(x₁ + εd) < f(x₁) for all ε ∈ (0, ε̄), d is called a descent direction at x₁.

Elementary necessary optimality condition: if x* is a local optimum, no descent direction may exist at x*.

SLIDE 3

Optimality Conditions for Convex Sets

If x* ∈ S is a local optimum for f(·) and there exists a neighborhood U(x*) such that f ∈ C¹(U(x*)), then

    dᵀ∇f(x*) ≥ 0   for every feasible direction d.

SLIDE 4

[figure omitted]

SLIDE 5

Proof

Taylor expansion: f(x* + εd) = f(x*) + ε dᵀ∇f(x*) + o(ε). Since d cannot be a descent direction, if ε is sufficiently small then f(x* + εd) ≥ f(x*). Thus ε dᵀ∇f(x*) + o(ε) ≥ 0 and, dividing by ε,

    dᵀ∇f(x*) + o(ε)/ε ≥ 0.

Letting ε ↓ 0 the proof is complete.

SLIDE 6

Optimality Conditions: tangent cone

General case:

    min f(x)
    s.t. gᵢ(x) ≤ 0,  i = 1, …, m
         x ∈ X   (X: open set)

Let S = {x ∈ X : gᵢ(x) ≤ 0, i = 1, …, m}. The tangent cone to S at x̄ is

    T(x̄) = {d ∈ Rⁿ : d = lim_{xₖ→x̄} (xₖ − x̄)/‖xₖ − x̄‖, where xₖ ∈ S}.

SLIDE 7

[figure omitted]

SLIDE 8

Some examples

- S = Rⁿ ⇒ T(x) = Rⁿ for all x.
- S = {x : Ax = b} ⇒ T(x) = {d : Ad = 0}.
- S = {x : Ax ≤ b}; let I be the set of constraints active at x̄:

      aᵢᵀx̄ = bᵢ,  i ∈ I;    aᵢᵀx̄ < bᵢ,  i ∉ I.

SLIDE 9

[figure omitted]

SLIDE 10

Let d = limₖ (xₖ − x̄)/‖xₖ − x̄‖. Then, for i ∈ I,

    aᵢᵀd = aᵢᵀ limₖ (xₖ − x̄)/‖xₖ − x̄‖
         = limₖ aᵢᵀ(xₖ − x̄)/‖xₖ − x̄‖
         = limₖ (aᵢᵀxₖ − bᵢ)/‖xₖ − x̄‖
         ≤ 0.

Thus if d ∈ T(x̄), then aᵢᵀd ≤ 0 for i ∈ I.

SLIDE 11

Conversely, let xₖ = x̄ + αₖd with αₖ ↓ 0. If aᵢᵀd ≤ 0 for i ∈ I, then

    aᵢᵀxₖ = aᵢᵀ(x̄ + αₖd) = bᵢ + αₖ aᵢᵀd ≤ bᵢ,   i ∈ I,
    aᵢᵀxₖ = aᵢᵀ(x̄ + αₖd) < bᵢ,   i ∉ I, if αₖ is small enough,

so xₖ ∈ S and d ∈ T(x̄). Thus

    T(x̄) = {d : aᵢᵀd ≤ 0 for all i ∈ I}.

SLIDE 12

Example

Let S = {(x, y) ∈ R² : x² − y = 0} (a parabola). What is the tangent cone at (0, 0)? Let (xₖ, yₖ) → (0, 0), i.e. xₖ → 0, yₖ = xₖ². Then

    ‖(xₖ, yₖ) − (0, 0)‖ = √(xₖ² + xₖ⁴) = |xₖ| √(1 + xₖ²)

and

    lim_{xₖ→0⁺} xₖ / (|xₖ| √(1 + xₖ²)) = 1,    lim_{xₖ→0⁺} yₖ / (|xₖ| √(1 + xₖ²)) = 0,
    lim_{xₖ→0⁻} xₖ / (|xₖ| √(1 + xₖ²)) = −1,   lim_{xₖ→0⁻} yₖ / (|xₖ| √(1 + xₖ²)) = 0,

thus T(0, 0) = {(−1, 0), (1, 0)}.
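The two limit directions can be checked numerically. A small sketch (added here, not part of the slides): normalize the vector (xₖ, xₖ²) for xₖ close to 0 from the right and from the left.

```python
import math

def unit_direction(xk):
    # direction of ((xk, xk^2) - (0,0)), normalized
    yk = xk ** 2
    norm = math.hypot(xk, yk)  # = |xk| * sqrt(1 + xk^2)
    return (xk / norm, yk / norm)

# approach 0 from the right and from the left
right = unit_direction(1e-6)
left = unit_direction(-1e-6)
print(right)  # close to (1, 0)
print(left)   # close to (-1, 0)
```

Both normalized directions flatten onto the x-axis, matching T(0, 0) = {(−1, 0), (1, 0)}.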

SLIDE 13

Descent direction

d ∈ Rⁿ is a feasible direction at x̄ ∈ S if ∃ ᾱ > 0 : x̄ + αd ∈ S for all α ∈ [0, ᾱ). If d is feasible then d ∈ T(x̄), but in general the converse is false. If f(x̄ + αd) ≤ f(x̄) for all α ∈ (0, ᾱ), d is a descent direction.

SLIDE 14

First-order necessary optimality condition

Let x̄ ∈ S ⊆ Rⁿ be a local optimum for min_{x∈S} f(x), and let f ∈ C¹(U(x̄)). Then

    dᵀ∇f(x̄) ≥ 0   for all d ∈ T(x̄).

Proof. Let d = limₖ (xₖ − x̄)/‖xₖ − x̄‖. Taylor expansion:

    f(xₖ) = f(x̄) + ∇ᵀf(x̄)(xₖ − x̄) + o(‖xₖ − x̄‖)
          = f(x̄) + ∇ᵀf(x̄)(xₖ − x̄) + ‖xₖ − x̄‖ o(1).

x̄ local optimum ⇒ ∃ U(x̄) : f(x) ≥ f(x̄) for all x ∈ U ∩ S.

SLIDE 15

. . .

If k is large enough, xₖ ∈ U(x̄), so f(xₖ) − f(x̄) ≥ 0 and thus

    ∇ᵀf(x̄)(xₖ − x̄) + ‖xₖ − x̄‖ o(1) ≥ 0.

Dividing by ‖xₖ − x̄‖:

    ∇ᵀf(x̄)(xₖ − x̄)/‖xₖ − x̄‖ + o(1) ≥ 0,

and in the limit ∇ᵀf(x̄)d ≥ 0.

SLIDE 16

Examples

Unconstrained problems: every d ∈ Rⁿ belongs to the tangent cone, so at a local optimum

    ∇ᵀf(x̄)d ≥ 0   for all d ∈ Rⁿ.

Choosing d = eᵢ and d = −eᵢ we get ∇f(x̄) = 0. NB: the same is true if x̄ is a local minimum in the relative interior of the feasible region.

SLIDE 17

Linear equality constraints

    min f(x)
    s.t. Ax = b

Tangent cone: {d : Ad = 0}. Necessary condition: ∇ᵀf(x̄)d ≥ 0 for all d with Ad = 0. Equivalent statement:

    min_d ∇ᵀf(x̄)d = 0
    s.t.  Ad = 0

(a linear program).

SLIDE 18

Linear equality constraints

From LP duality:

    max 0ᵀλ = 0
    s.t. Aᵀλ = ∇f(x̄)

Thus at a local minimum point there exist Lagrange multipliers: ∃ λ : Aᵀλ = ∇f(x̄).
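The multiplier system Aᵀλ = ∇f(x̄) can be illustrated on a toy instance (assumed here, not from the slides): f(x) = ½‖x‖² with the single constraint 1ᵀx = 1, whose minimizer is x̄ = (1/n)·1.

```python
import numpy as np

# Toy instance: min ½‖x‖²  s.t.  1ᵀx = 1; the minimizer is x̄ = (1/n)·1
n = 4
A = np.ones((1, n))           # constraint matrix of Ax = b
x_bar = np.full(n, 1.0 / n)   # known optimum of this instance
grad = x_bar                  # ∇f(x) = x for f(x) = ½‖x‖²

# Lagrange multipliers: solve Aᵀλ = ∇f(x̄) in the least-squares sense
lam, *_ = np.linalg.lstsq(A.T, grad, rcond=None)
residual = A.T @ lam - grad
print(lam)                        # the single multiplier, 1/n
print(np.linalg.norm(residual))   # ≈ 0: the gradient lies in range(Aᵀ)
```

A zero residual confirms that ∇f(x̄) is a linear combination of the constraint rows, exactly as the LP-duality argument predicts.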

SLIDE 19

Linear inequalities

    min f(x)
    s.t. Ax ≤ b

Tangent cone at a local minimum x̄: {d ∈ Rⁿ : aᵢᵀd ≤ 0 ∀ i ∈ I(x̄)}. Let A_I be the rows of A associated with the constraints active at x̄. Then

    min_d ∇ᵀf(x̄)d = 0
    s.t.  A_I d ≤ 0

SLIDE 20

Linear inequalities

From LP duality:

    max 0ᵀλ = 0
    s.t. A_Iᵀλ = ∇f(x̄)
         λ ≤ 0

Thus, at a local optimum, the gradient is a nonpositive linear combination of the coefficient rows of the active constraints.

SLIDE 21

Farkas' Lemma

Let A be a matrix in R^{m×n} and b ∈ Rᵐ. One and only one of the following two systems

    Aᵀy ≤ 0, bᵀy > 0        and        Ax = b, x ≥ 0

has a solution.
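The alternative can be illustrated numerically on a tiny assumed instance (a randomized sketch, not from the slides): when Ax = b, x ≥ 0 is solvable, no sampled y with Aᵀy ≤ 0 achieves bᵀy > 0.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])

# System 2 is solvable here: x = b satisfies Ax = b, x ≥ 0
x = b.copy()
assert np.allclose(A @ x, b) and np.all(x >= 0)

# Then, by Farkas' Lemma, system 1 must be empty: sample candidate
# vectors y with Aᵀy ≤ 0 and check that bᵀy > 0 never holds.
violations = 0
for _ in range(10000):
    y = rng.normal(size=2)
    if np.all(A.T @ y <= 0) and b @ y > 0:
        violations += 1
print(violations)  # 0
```

Sampling is of course no proof; the point is only to make the mutual exclusivity of the two systems tangible.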

SLIDE 22

Geometrical interpretation

[figure: the cone {z : ∃ x ≥ 0, z = Ax} generated by the columns a₁, a₂ of A, and the cone {y : Aᵀy ≤ 0}; the vector b satisfies exactly one of the two alternatives]

SLIDE 23

Proof

1) If ∃ x ≥ 0 : Ax = b, then bᵀy = xᵀAᵀy. Thus if Aᵀy ≤ 0, then bᵀy ≤ 0, so the first system has no solution.

2) Premise (separating hyperplane theorem): let C and D be two nonempty convex sets with C ∩ D = ∅. Then there exist a ≠ 0 and β such that

    aᵀx ≤ β, x ∈ C;    aᵀx ≥ β, x ∈ D.

If C is a point and D is a closed convex set, the separation is strict, i.e. aᵀC < β and aᵀx > β for x ∈ D.

SLIDE 24

Farkas' Lemma (proof)

2) Let {x : Ax = b, x ≥ 0} = ∅. Let S = {y ∈ Rᵐ : ∃ x ≥ 0, Ax = y}. S is closed and convex, and b ∉ S. From the separating hyperplane theorem, ∃ α ∈ Rᵐ, α ≠ 0, and β ∈ R such that

    αᵀy ≤ β  ∀ y ∈ S,    αᵀb > β.

Since 0 ∈ S, β ≥ 0, so αᵀb > 0; and αᵀAx ≤ β for all x ≥ 0, which is possible iff αᵀA ≤ 0. Letting y = α we obtain a solution of Aᵀy ≤ 0, bᵀy > 0.

SLIDE 25

First order feasible variations cone

    G(x̄) = {d ∈ Rⁿ : ∇ᵀgᵢ(x̄)d ≤ 0, i ∈ I}

[figure omitted]

SLIDE 26

First order variations

G(x̄) ⊇ T(x̄). In fact, if {xₖ} ⊂ S is feasible and

    d = limₖ (xₖ − x̄)/‖xₖ − x̄‖,

then gᵢ(x̄) ≤ 0 and g(x̄ + limₖ (xₖ − x̄)) ≤ 0.

SLIDE 27

. . .

    g(x̄ + limₖ [(xₖ − x̄)/‖xₖ − x̄‖] ‖xₖ − x̄‖) ≤ 0
    g(x̄ + limₖ ‖xₖ − x̄‖ · lim (xₖ − x̄)/‖xₖ − x̄‖) ≤ 0
    g(x̄ + limₖ ‖xₖ − x̄‖ d) ≤ 0

Let αₖ = ‖xₖ − x̄‖; if αₖ ≈ 0:

    g(x̄ + αₖd) ≤ 0.

SLIDE 28

    gᵢ(x̄ + αₖd) = gᵢ(x̄) + αₖ∇ᵀgᵢ(x̄)d + o(αₖ),

where αₖ > 0 and d belongs to the tangent cone T(x̄). If the i-th constraint is active, then

    gᵢ(x̄ + αₖd) = αₖ∇ᵀgᵢ(x̄)d + o(αₖ) ≤ 0
    gᵢ(x̄ + αₖd)/αₖ = ∇ᵀgᵢ(x̄)d + o(αₖ)/αₖ ≤ 0.

Letting αₖ → 0 the result is obtained.

SLIDE 29

Example

A case where G(x̄) ≠ T(x̄): the constraints

    −x³ + y ≤ 0
    −y ≤ 0

at x̄ = (0, 0). Both constraints are active there, and G(x̄) = {d : d₂ = 0}, while T(x̄) contains only the directions with d₁ ≥ 0.

SLIDE 30

KKT necessary conditions

(Karush–Kuhn–Tucker) Let x̄ ∈ X ⊆ Rⁿ, X ≠ ∅, be a local optimum for

    min f(x)
    s.t. gᵢ(x) ≤ 0,  i = 1, …, m
         x ∈ X

Let I be the set of indices of the constraints active at x̄. If:

1. f(x), gᵢ(x) ∈ C¹(x̄) for i ∈ I;
2. the "constraint qualification" condition T(x̄) = G(x̄) holds at x̄;

then there exist Lagrange multipliers λᵢ ≥ 0, i ∈ I, such that

    ∇f(x̄) + Σ_{i∈I} λᵢ∇gᵢ(x̄) = 0.
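The KKT conditions can be verified on a toy instance (assumed here, not from the slides): min x₁² + x₂² subject to 1 − x₁ − x₂ ≤ 0, whose optimum is x̄ = (½, ½) with λ = 1.

```python
# Toy problem: min f(x) = x1² + x2²  s.t.  g(x) = 1 − x1 − x2 ≤ 0.
# Known optimum of this instance: x̄ = (1/2, 1/2), λ = 1.
x = (0.5, 0.5)
lam = 1.0

grad_f = (2 * x[0], 2 * x[1])   # ∇f(x̄)
grad_g = (-1.0, -1.0)           # ∇g(x̄)
g = 1 - x[0] - x[1]             # g(x̄): active, so I = {1}

# Stationarity: ∇f(x̄) + λ∇g(x̄) = 0
stationarity = tuple(gf + lam * gg for gf, gg in zip(grad_f, grad_g))
print(stationarity)                      # (0.0, 0.0)
print(lam >= 0, abs(lam * g) < 1e-12)    # λ ≥ 0 and λ·g(x̄) = 0
```

All three ingredients (stationarity, sign of the multiplier, complementarity) hold at the optimum, as the theorem requires.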

SLIDE 31

Proof

x̄ local optimum ⇒ if d ∈ T(x̄) then dᵀ∇f(x̄) ≥ 0. But d ∈ T(x̄) = G(x̄) ⇒ dᵀ∇gᵢ(x̄) ≤ 0, i ∈ I. Thus the system

    −∇ᵀf(x̄)d > 0,    ∇ᵀgᵢ(x̄)d ≤ 0, i ∈ I

has no solution. From Farkas' Lemma, there exists a solution of:

    Σ_{i∈I} λᵢ∇gᵢ(x̄) = −∇f(x̄),    λᵢ ≥ 0, i ∈ I.

SLIDE 32

Constraint qualifications: examples

- Polyhedra: X = Rⁿ and the gᵢ(x) are affine functions: Ax ≤ b.
- Linear independence: X open set, gᵢ(x), i ∉ I, continuous at x̄ and {∇gᵢ(x̄)}, i ∈ I, linearly independent.
- Slater condition: X open set, gᵢ(x), i ∈ I, convex differentiable functions at x̄, gᵢ(x), i ∉ I, continuous at x̄, and ∃ x̂ ∈ X strictly feasible: gᵢ(x̂) < 0, i ∈ I.

SLIDE 33

Convex problems

An optimization problem

    min_{x∈S} f(x)

is a convex problem if:

- S is a convex set, i.e. x, y ∈ S ⇒ λx + (1 − λ)y ∈ S for all λ ∈ [0, 1];
- f is a convex function on S, i.e. f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) for all λ ∈ [0, 1] and x, y ∈ S.

SLIDE 34

Standard convex problem

    min f(x)
    s.t. gᵢ(x) ≤ 0,  i = 1, …, m
         hⱼ(x) = 0,  j = 1, …, k

If f is convex, the gᵢ are convex and the hⱼ are affine (i.e. of the form αᵀx + β), then the problem is convex.

SLIDE 35

Convex problems

Every local optimum is a global one. Proof: let x̄ be a local optimum for min_S f(x) and x* a global optimum. S convex ⇒ λx* + (1 − λ)x̄ ∈ S. Thus, if λ ≈ 0,

    f(x̄) ≤ f(λx* + (1 − λ)x̄) ≤ λf(x*) + (1 − λ)f(x̄)  ⇒  f(x̄) ≤ f(x*),

and x̄ is also a global optimum.

SLIDE 36

Sufficiency of 1st order conditions

For a convex differentiable problem: if dᵀ∇f(x̄) ≥ 0 for all d ∈ T(x̄), then x̄ is a (global) optimum.

Proof: by convexity,

    f(y) ≥ f(x̄) + (y − x̄)ᵀ∇f(x̄)  ∀ y ∈ S.

But d = y − x̄ ∈ T(x̄), so

    f(y) ≥ f(x̄) + dᵀ∇f(x̄) ≥ f(x̄)  ∀ y ∈ S;

thus x̄ is a global minimum.

SLIDE 37

Convexity of the set of global optima

(For convex problems.) The set of global minima of a convex problem is a convex set. In fact, let x̄ and ȳ be global minima for the convex problem

    min_{x∈S} f(x).

Then, choosing λ ∈ [0, 1], we have λx̄ + (1 − λ)ȳ ∈ S, as S is convex. Moreover

    f(λx̄ + (1 − λ)ȳ) ≤ λf(x̄) + (1 − λ)f(ȳ) = λf* + (1 − λ)f* = f*,

where f* is the global minimum value. Thus equality holds and the proof is complete.

SLIDE 38

KKT for equality constraints

x̄: local optimum for

    min f(x)
    s.t. gᵢ(x) ≤ 0,  i = 1, …, m
         hⱼ(x) = 0,  j = 1, …, k
         x ∈ X ⊆ Rⁿ

Let I be the set of inequalities active at x̄. If f(x), gᵢ(x), i ∈ I, and hⱼ(x) are C¹ and "constraint qualifications" hold at x̄, then ∃ λᵢ ≥ 0 ∀ i ∈ I and μⱼ ∈ R ∀ j = 1, …, k:

    ∇f(x̄) + Σ_{i∈I} λᵢ∇gᵢ(x̄) + Σ_{j=1}^{k} μⱼ∇hⱼ(x̄) = 0.

SLIDE 39

Complementarity

KKT equivalent formulation:

    ∇f(x̄) + Σ_{i=1}^{m} λᵢ∇gᵢ(x̄) + Σ_{j=1}^{k} μⱼ∇hⱼ(x̄) = 0
    λᵢgᵢ(x̄) = 0,  i = 1, …, m

The condition λᵢgᵢ(x̄) = 0 is called the complementarity condition.

SLIDE 40

II order necessary conditions

If f, gᵢ, hⱼ ∈ C² at x̄ and the gradients of the constraints active at x̄ are linearly independent, then there exist multipliers λᵢ ≥ 0, i ∈ I, and μⱼ, j = 1, …, k, such that

    ∇f(x̄) + Σ_{i∈I} λᵢ∇gᵢ(x̄) + Σ_{j=1}^{k} μⱼ∇hⱼ(x̄) = 0

and dᵀ∇²L(x̄)d ≥ 0 for every direction d such that dᵀ∇gᵢ(x̄) ≤ 0, i ∈ I, and dᵀ∇hⱼ(x̄) = 0, where

    ∇²L(x) := ∇²f(x) + Σ_{i∈I} λᵢ∇²gᵢ(x) + Σ_{j=1}^{k} μⱼ∇²hⱼ(x).

SLIDE 41

Sufficient conditions

Let f, gᵢ, hⱼ be twice continuously differentiable, and let x*, λ*, μ* satisfy

    ∇f(x*) + Σ_{i∈I} λᵢ*∇gᵢ(x*) + Σ_{j=1}^{k} μⱼ*∇hⱼ(x*) = 0
    λᵢ* gᵢ(x*) = 0
    λᵢ* ≥ 0
    dᵀ∇²L(x*)d > 0  ∀ d ≠ 0 : dᵀ∇hⱼ(x*) = 0, dᵀ∇gᵢ(x*) = 0, i ∈ I.

Then x* is a local minimum.

SLIDE 42

Lagrange Duality

Problem:

    f* = min f(x)
         s.t. gᵢ(x) ≤ 0
              x ∈ X

Definition (Lagrange function):

    L(x; λ) = f(x) + Σᵢ λᵢgᵢ(x),   λ ≥ 0, x ∈ X.

SLIDE 43

Relaxation

Given an optimization problem

    min_{x∈S} f(x),

a relaxation is a problem

    min_{x∈Q} g(x)

where S ⊆ Q and g(x) ≤ f(x) for all x ∈ S.

Weak duality: the optimal value of a relaxation is a lower bound on the optimal value of the problem.

SLIDE 44

Lagrange minimization is a relaxation

Proof: the feasible set of the Lagrange problem is X (it contains the original one), and if g(x) ≤ 0 and λ ≥ 0 then

    L(x, λ) = f(x) + λᵀg(x) ≤ f(x).

SLIDE 45

Dual Lagrange function

With respect to the constraints g(x) ≤ 0:

    θ(λ) = inf_{x∈X} L(x, λ) = inf_{x∈X} (f(x) + λᵀg(x)).

For every choice of λ ≥ 0, θ(λ) is a lower bound on the objective value of every feasible solution and, in particular, a lower bound on the global minimum value of the problem.

SLIDE 46

Example (circle packing)

    min −r
    s.t. 4r² − (xᵢ − xⱼ)² − (yᵢ − yⱼ)² ≤ 0,  1 ≤ i < j ≤ N
         xᵢ, yᵢ ≤ 1,   i = 1, …, N
         −xᵢ, −yᵢ ≤ 0,  i = 1, …, N

SLIDE 47

When N = 2, relaxing the first constraint:

    θ(λ) = min_{x,y,r} −r + λ(4r² − (x₁ − x₂)² − (y₁ − y₂)²)
           s.t. 0 ≤ x₁, x₂, y₁, y₂ ≤ 1

SLIDE 48

Solution

Minimizing with respect to x, y gives |x₁ − x₂| = |y₁ − y₂| = 1, from which

    θ(λ) = min_r −r + 4λr² − 2λ.

The minimum is at r = 1/(8λ), so

    θ(λ) = −2λ − 1/(16λ).

This is a lower bound on the optimum value. The best possible lower bound is

    θ* = max_λ θ(λ):    λ* = 1/(4√2),   θ* = −√2/2.

SLIDE 49

Choosing (x₁, y₁) = (0, 0) and (x₂, y₂) = (1, 1), a feasible solution with r = √2/2 is obtained. The Lagrange dual gives a lower bound equal to −√2/2: the same as the objective function −r at this feasible solution ⇒ optimal solution! (An exception, not the rule!)
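The dual bound derived above can be reproduced numerically; this sketch just evaluates θ(λ) = −2λ − 1/(16λ) at λ* = 1/(4√2) and compares it with the feasible objective value −r = −√2/2.

```python
import math

# Dual function from the N = 2 circle-packing relaxation
def theta(lam):
    return -2 * lam - 1 / (16 * lam)

lam_star = 1 / (4 * math.sqrt(2))
theta_star = theta(lam_star)
print(theta_star)  # -sqrt(2)/2, about -0.7071

# Primal feasible point: (x1,y1) = (0,0), (x2,y2) = (1,1), r = sqrt(2)/2;
# its objective −r matches the dual lower bound, so it is optimal.
r = math.sqrt(2) / 2
assert abs(theta_star - (-r)) < 1e-12

# θ is concave here, and λ* is a maximizer: nearby values are no larger
for eps in (-1e-4, 1e-4):
    assert theta(lam_star + eps) <= theta_star
```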

SLIDE 50

Lagrange Dual

    θ* = max θ(λ)
         s.t. λ ≥ 0

This problem might:

1. be unbounded;
2. have a finite sup but no max;
3. have a unique maximum, attained in correspondence with a single solution x;
4. have many different maxima, each connected with a different solution x.

SLIDE 51

Equality constraints

    f* = min f(x)
         s.t. gᵢ(x) ≤ 0,  i = 1, …, m
              hⱼ(x) = 0,  j = 1, …, k
              x ∈ X

Lagrange function:

    L(x; λ, μ) = f(x) + λᵀg(x) + μᵀh(x),

where λ ≥ 0 but μ is free.

SLIDE 52

Linear Programming

    min cᵀx
    s.t. Ax ≤ b

Dual Lagrange function:

    θ(λ) = min_x cᵀx + λᵀ(Ax − b) = −λᵀb + min_x (cᵀ + λᵀA)x.

But:

    min_x (cᵀ + λᵀA)x = 0 if cᵀ + λᵀA = 0, −∞ otherwise.

SLIDE 53

. . .

Lagrange dual function:

    θ(λ) = −λᵀb if cᵀ + λᵀA = 0, −∞ otherwise.

Lagrange dual:

    max −λᵀb
    s.t. λᵀA + cᵀ = 0
         λ ≥ 0

which is equivalent to:

    max λᵀb
    s.t. λᵀA = cᵀ
         λ ≤ 0

SLIDE 54

Quadratic Programming (QP)

    min ½xᵀQx + cᵀx
    s.t. Ax = b

(Q symmetric). Lagrange dual function:

    θ(λ) = min_x ½xᵀQx + cᵀx + λᵀ(Ax − b)
         = −λᵀb + min_x ½xᵀQx + (cᵀ + λᵀA)x

SLIDE 55

QP – Case 1

If Q has at least one negative eigenvalue, then

    min_x ½xᵀQx + (cᵀ + λᵀA)x = −∞.

In fact ∃ d : dᵀQd < 0. Choosing x = αd with α > 0,

    ½xᵀQx + (cᵀ + λᵀA)x = ½α²dᵀQd + α(cᵀ + λᵀA)d,

and for large values of α this can be made as small as desired.
slide-56
SLIDE 56

QP – Case 2

Q positive definite ⇒minimum point of the dual Lagrange function: Q¯ x + (c + ATλ) = 0 i.e. ¯ x = −Q−1(c + ATλ)

Optimality Conditions – p. 5

SLIDE 57

. . .

Dual function value:

    θ(λ) = −λᵀb + ½x̄ᵀQx̄ + (cᵀ + λᵀA)x̄
         = −λᵀb + ½(c + Aᵀλ)ᵀQ⁻¹QQ⁻¹(c + Aᵀλ) − (cᵀ + λᵀA)Q⁻¹(c + Aᵀλ)
         = −λᵀb + ½(c + Aᵀλ)ᵀQ⁻¹(c + Aᵀλ) − (cᵀ + λᵀA)Q⁻¹(c + Aᵀλ)
         = −λᵀb − ½(c + Aᵀλ)ᵀQ⁻¹(c + Aᵀλ)

SLIDE 58

. . .

Lagrange dual (seen as a min problem):

    min_λ λᵀb + ½(c + Aᵀλ)ᵀQ⁻¹(c + Aᵀλ)

Optimality conditions:

    b + AQ⁻¹(c + Aᵀλ) = 0.

But recalling that x̄ = −Q⁻¹(c + Aᵀλ), this is b − Ax̄ = 0: feasibility of x̄. So if we find optimal multipliers λ (a linear system), we get the optimal solution x̄ (thanks to feasibility and weak duality)!
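The "solve for λ, then recover x̄" recipe can be sketched on a toy equality-constrained QP (the data below are assumed for illustration, not from the slides):

```python
import numpy as np

# Toy instance: min ½xᵀQx + cᵀx  s.t.  Ax = b, with Q positive definite
Q = np.array([[2.0, 0.0], [0.0, 4.0]])
c = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

Qinv = np.linalg.inv(Q)
# Dual optimality b + A Q⁻¹(c + Aᵀλ) = 0  →  (A Q⁻¹ Aᵀ) λ = −b − A Q⁻¹ c
lam = np.linalg.solve(A @ Qinv @ A.T, -b - A @ Qinv @ c)
x_bar = -Qinv @ (c + A.T @ lam)

print(x_bar)           # primal solution, here (2/3, 1/3)
print(A @ x_bar - b)   # ≈ 0: feasible, hence optimal by weak duality
```

For this instance the recovered point also satisfies stationarity Qx̄ + c + Aᵀλ = 0 exactly, so no further check is needed.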

SLIDE 59

Properties of the Lagrange dual

For any problem

    f* = min f(x)
         s.t. gᵢ(x) ≤ 0,  i = 1, …, m
              x ∈ X

where X is nonempty and compact, if f and the gᵢ are continuous then the Lagrange dual function is concave.

SLIDE 60

Proof

From the Weierstrass theorem,

    θ(λ) = min_{x∈X} f(x) + λᵀg(x)

exists and is finite. Then

    θ(ηa + (1 − η)b) = min_{x∈X} (f(x) + (ηa + (1 − η)b)ᵀg(x))
                     = min_{x∈X} (η(f(x) + aᵀg(x)) + (1 − η)(f(x) + bᵀg(x)))
                     ≥ η min_{x∈X} (f(x) + aᵀg(x)) + (1 − η) min_{x∈X} (f(x) + bᵀg(x))
                     = ηθ(a) + (1 − η)θ(b).
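The concavity of θ can also be observed numerically on a tiny assumed instance with a finite X, where θ is a pointwise minimum of affine functions of λ:

```python
# θ(λ) = min over a finite set X of f(x) + λ·g(x): a pointwise minimum of
# affine functions of λ, hence concave. Assumed toy instance:
X = [0.0, 1.0, 2.0]
f = lambda x: (x - 1.5) ** 2
g = lambda x: 1.0 - x          # constraint g(x) ≤ 0

def theta(lam):
    return min(f(x) + lam * g(x) for x in X)

# Concavity check: θ(ηa + (1−η)b) ≥ ηθ(a) + (1−η)θ(b)
for a, b in [(0.0, 2.0), (0.5, 3.0), (1.0, 4.0)]:
    for eta in (0.25, 0.5, 0.75):
        mid = theta(eta * a + (1 - eta) * b)
        assert mid >= eta * theta(a) + (1 - eta) * theta(b) - 1e-12
print("theta is concave on the sampled points")
```

Sampling a few chords is only an illustration; the proof above is what establishes concavity in general.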

SLIDE 61

Solution of the Lagrange dual

    max_λ θ(λ) = max_λ min_{x∈X} (f(x) + λᵀg(x))

is equivalent to

    max z
    s.t. z ≤ f(x) + λᵀg(x)  ∀ x ∈ X
         λ ≥ 0

After having computed f and g at x₁, x₂, …, xₖ, a restricted dual can be defined:

    max z
    s.t. z ≤ f(xⱼ) + λᵀg(xⱼ)  ∀ j = 1, …, k
         λ ≥ 0

SLIDE 62

. . .

Let λ̄ be the optimal solution of the restricted dual, with value z̄. Is it an optimal dual solution? Is it true that z̄ ≤ f(x) + λ̄ᵀg(x)? Check: we look for x̄, an optimal solution of

    min_{x∈X} f(x) + λ̄ᵀg(x).

If f(x̄) + λ̄ᵀg(x̄) ≥ z̄ then we have found the optimal solution of the dual; otherwise the pair x̄, f(x̄) is added to the restricted dual (as a new cut) and a new solution is computed.

SLIDE 63

Geometric programming

Unconstrained geometric program:

    min_{x>0} Σ_{k=1}^{m} cₖ Π_{j=1}^{n} xⱼ^{αₖⱼ},   αₖⱼ ∈ R, cₖ > 0

(non convex). Variable substitution: xⱼ = exp(yⱼ), yⱼ ∈ R.

SLIDE 64

Transformed problem:

    min_y Σ_{k=1}^{m} cₖ Π_{j=1}^{n} e^{αₖⱼyⱼ} = min_y Σ_{k=1}^{m} e^{αₖᵀy + βₖ},   βₖ = log cₖ

Still non convex, but its logarithm is convex.

SLIDE 65

Duality example

Dual of min f(x):

    min log Σ_{k=1}^{m} exp(αₖᵀx + βₖ)

With no constraints, the dual Lagrange function is identical to f(x)! Strong duality holds, but is useless. Simple transformation:

    min log Σ_{k=1}^{m} exp(yₖ)
    s.t. yₖ = αₖᵀx + βₖ

SLIDE 66

Solving the dual

Dual function:

    L(λ) = min_{x,y} log Σ_{k=1}^{m} exp(yₖ) + λᵀ(Ax + β − y)

Minimization in x is unconstrained: min λᵀAx ⇒ if λᵀA ≠ 0, L(λ) is unbounded; if λᵀA = 0, then

    L(λ) = min_y log Σ_{k=1}^{m} exp(yₖ) + λᵀ(β − y)

SLIDE 67

First-order (unconstrained) optimality conditions w.r.t. yᵢ:

    exp(yᵢ) / Σₖ exp(yₖ) − λᵢ = 0

⇒ Lagrange multipliers exist provided that Σᵢ λᵢ = 1 and λᵢ > 0 ∀ i.

SLIDE 68

Substituting λⱼ = exp(yⱼ) / Σₖ exp(yₖ), the y-dependent part of L(λ) becomes

    log Σⱼ exp(yⱼ) − Σⱼ λⱼyⱼ
  = log Σⱼ exp(yⱼ) − Σⱼ yⱼ exp(yⱼ) / Σₖ exp(yₖ)
  = (1 / Σₖ exp(yₖ)) Σₖ exp(yₖ)(log Σⱼ exp(yⱼ) − yₖ)
  = Σₖ (exp(yₖ) / Σⱼ exp(yⱼ))(log Σⱼ exp(yⱼ) − yₖ)
  = −Σₖ λₖ log λₖ,

so, adding the remaining term λᵀβ, L(λ) = βᵀλ − Σₖ λₖ log λₖ.
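The identity log Σⱼ e^{yⱼ} − Σⱼ λⱼyⱼ = −Σₖ λₖ log λₖ, with λ the softmax of y, can be checked numerically (a sketch with assumed values of y):

```python
import math

y = [0.3, -1.2, 2.0]                               # assumed sample point
lse = math.log(sum(math.exp(v) for v in y))        # log Σ exp(y_k)
lam = [math.exp(v - lse) for v in y]               # λ_k = softmax(y)_k

lhs = lse - sum(l * v for l, v in zip(lam, y))     # log Σ e^y − Σ λ_k y_k
rhs = -sum(l * math.log(l) for l in lam)           # entropy −Σ λ_k log λ_k
print(lhs, rhs)  # the two values agree
```

Note that the λₖ automatically satisfy Σₖ λₖ = 1 and λₖ > 0, which is exactly the condition found on the previous slide.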

SLIDE 69

Lagrange Dual

The Lagrange dual becomes:

    max_λ βᵀλ − Σₖ λₖ log λₖ
    s.t.  Σₖ λₖ = 1
          Aᵀλ = 0
          λ ≥ 0

SLIDE 70

Special cases: linear constraints

    min f(x)
    s.t. Ax ≥ b

Lagrange function: L(x, λ) = f(x) + λᵀ(b − Ax). Constraint qualifications always hold (polyhedron). If x* is a local optimum, there exists λ* ≥ 0:

    Ax* ≥ b
    ∇f(x*) = Aᵀλ*
    λ*ᵀ(b − Ax*) = 0

SLIDE 71

Non negativity constraints

    min f(x)
    s.t. x ≥ 0

Lagrange function: L(x, λ) = f(x) − λᵀx. KKT conditions:

    ∇f(x*) = λ*
    x* ≥ 0
    λ* ≥ 0
    (λ*)ᵀx* = 0

SLIDE 72

    λⱼ* = ∂f(x*)/∂xⱼ,  j = 1, …, n,

from which

    ∂f(x*)/∂xⱼ = 0  ∀ j : xⱼ* > 0
    ∂f(x*)/∂xⱼ ≥ 0  otherwise

SLIDE 73

Box constraints

    min f(x)
    s.t. ℓ ≤ x ≤ u    (ℓᵢ < uᵢ ∀ i)

Lagrange function: L(x, λ, μ) = f(x) + λᵀ(ℓ − x) + μᵀ(x − u). KKT conditions:

    ∇f(x*) = λ* − μ*
    (ℓ − x*)ᵀλ* = 0
    (x* − u)ᵀμ* = 0
    (λ*, μ*) ≥ 0

Given x*, let Jℓ = {j : xⱼ* = ℓⱼ}, Ju = {j : xⱼ* = uⱼ}, J₀ = {j : ℓⱼ < xⱼ* < uⱼ}.

SLIDE 74

Box constr. (cont)

Then, from complementarity,

    ∂f(x*)/∂xⱼ = λⱼ*,   j ∈ Jℓ
    ∂f(x*)/∂xⱼ = −μⱼ*,  j ∈ Ju
    ∂f(x*)/∂xⱼ = 0,    j ∈ J₀

SLIDE 75

Thus

    ∂f(x*)/∂xⱼ ≥ 0,  j ∈ Jℓ
    ∂f(x*)/∂xⱼ ≤ 0,  j ∈ Ju
    ∂f(x*)/∂xⱼ = 0,  j ∈ J₀

together with feasibility ℓ ≤ x* ≤ u.
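The sign conditions on the three index sets can be checked on a toy box-constrained problem (assumed data, not from the slides): f(x) = ½‖x − c‖², whose minimizer over the box is the projection of c onto the box.

```python
import numpy as np

# f(x) = ½‖x − c‖², box 0 ≤ x ≤ 1; the minimizer is the projection of c
c = np.array([-0.5, 0.4, 2.0])
lo, hi = np.zeros(3), np.ones(3)
x_star = np.clip(c, lo, hi)      # componentwise projection
grad = x_star - c                # ∇f(x*)

# Sign conditions from the slides, one coordinate at a time
for j in range(3):
    if x_star[j] == lo[j]:
        assert grad[j] >= 0          # j ∈ Jℓ: gradient pushes outward, ≥ 0
    elif x_star[j] == hi[j]:
        assert grad[j] <= 0          # j ∈ Ju: gradient ≤ 0
    else:
        assert abs(grad[j]) < 1e-12  # j ∈ J0: interior, gradient = 0
print("KKT sign conditions hold")
```

Here the first coordinate sits at the lower bound, the second is interior, and the third sits at the upper bound, so all three cases are exercised.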

SLIDE 76

Optimization over the simplex

    min f(x)
    s.t. 1ᵀx = 1
         x ≥ 0

Lagrange function: L(x, λ, μ) = f(x) − λᵀx + μ(1ᵀx − 1). KKT:

    ∇f(x*) = λ* − μ*1
    1ᵀx* = 1
    (x*, λ*) ≥ 0
    (λ*)ᵀx* = 0

SLIDE 77

Simplex. . .

    ∂f(x*)/∂xⱼ − λⱼ* = −μ*   (all equal).

Thus, from complementarity, if xⱼ* > 0 then λⱼ* = 0 and ∂f(x*)/∂xⱼ = −μ*; otherwise ∂f(x*)/∂xⱼ ≥ −μ*. Thus, if j is such that xⱼ* > 0,

    ∂f(x*)/∂xⱼ ≤ ∂f(x*)/∂xₖ  ∀ k.

SLIDE 78

Application: Min var portfolio

Given n assets with random returns R₁, …, Rₙ, how do we invest 1 € in such a way that the resulting portfolio has minimum variance? If xⱼ denotes the percentage of the investment on asset j, how do we compute the variance of this portfolio P(x)?

    Var = E(P(x) − E(P(x)))²
        = E(Σ_{j=1}^{n} (Rⱼ − E(Rⱼ))xⱼ)²
        = Σ_{i,j} E[(Rᵢ − E(Rᵢ))(Rⱼ − E(Rⱼ))] xᵢxⱼ
        = xᵀQx,

where Q is the variance-covariance matrix of the n assets.

SLIDE 79

Min var portfolio

Problem (objective multiplied by ½ for simpler computations):

    min ½xᵀQx
    s.t. 1ᵀx = 1
         x ≥ 0

SLIDE 80

Optimal portfolio

KKT: for all j such that xⱼ* > 0,

    Σᵢ Qⱼᵢxᵢ* ≤ Σᵢ Qₖᵢxᵢ*   ∀ k.

The vector Qx may be thought of as the vector of marginal contributions to the total risk (which is a weighted sum of the elements of Qx). Thus, in the optimal portfolio, all assets held at a positive level give an equal (and minimal) contribution to the total risk.
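A numerical sketch of the optimal-portfolio condition (the covariance matrix below is assumed for illustration, not from the slides): when the minimum-variance portfolio is interior, the KKT conditions give Qx* = μ·1, so x* is proportional to Q⁻¹1 and all marginal risk contributions coincide.

```python
import numpy as np

# Assumed toy covariance matrix (positive definite)
Q = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])

# If the minimum-variance portfolio has all weights positive, KKT for
# min ½xᵀQx s.t. 1ᵀx = 1 gives Qx* = μ·1, hence x* ∝ Q⁻¹1.
ones = np.ones(3)
w = np.linalg.solve(Q, ones)
x_star = w / w.sum()

print(x_star)            # portfolio weights, summing to 1
marginal = Q @ x_star    # marginal contributions to total risk
print(marginal)          # all equal, as the KKT conditions predict
```

For this Q all components of Q⁻¹1 happen to be positive, so the interior formula applies; with a different Q some weights could hit the bound x ≥ 0, and the inequality form of the condition would be needed instead.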