Lagrange Function and KKT Conditions (October 26, 2018)



SLIDE 1

Lagrange Function and KKT Conditions

October 26, 2018 265 / 429

SLIDE 2

How do you compute the table of Orthogonal Projections?

P_C(z) = prox_{I_C}(z) = argmin_x { (1/(2t))∥x − z∥² + I_C(x) } = argmin_{x∈C} (1/(2t))∥x − z∥². For t = 1, P_C(z) is given below.

Set C = ℜⁿ₊ : P_C(z) = [z]₊
Set C = Box[l, u], with l_i ≤ u_i : (P_C(z))_i = min{max{z_i, l_i}, u_i}
Set C = Ball[c, r] (the ∥·∥₂ ball with centre c ∈ ℜⁿ and radius r > 0) : P_C(z) = c + (r / max{∥z − c∥₂, r}) (z − c)
Set C = {x | Ax = b}, with A ∈ ℜ^{m×n}, b ∈ ℜᵐ, A full row rank : P_C(z) = z − Aᵀ(AAᵀ)⁻¹(Az − b)
Set C = {x | aᵀx ≤ b}, with 0 ≠ a ∈ ℜⁿ, b ∈ ℜ : P_C(z) = z − ([aᵀz − b]₊ / ∥a∥²) a
Set C = Δₙ (the unit simplex) : P_C(z) = [z − μ*e]₊, where μ* ∈ ℜ satisfies eᵀ[z − μ*e]₊ = 1
Set C = H_{a,b} ∩ Box[l, u], with 0 ≠ a ∈ ℜⁿ, b ∈ ℜ : P_C(z) = P_{Box[l,u]}(z − μ*a), where μ* ∈ ℜ satisfies aᵀP_{Box[l,u]}(z − μ*a) = b
Set C = H⁻_{a,b} ∩ Box[l, u], with 0 ≠ a ∈ ℜⁿ, b ∈ ℜ : P_C(z) = P_{Box[l,u]}(z) if aᵀP_{Box[l,u]}(z) ≤ b; otherwise P_C(z) = P_{Box[l,u]}(z − λ*a), where λ* > 0 satisfies aᵀP_{Box[l,u]}(z − λ*a) = b
Set C = B_{∥·∥₁}[0, α], with α > 0 : P_C(z) = z if ∥z∥₁ ≤ α; otherwise P_C(z) = [|z| − λ*e]₊ ⊙ sign(z), where λ* > 0 satisfies ∥[|z| − λ*e]₊∥₁ = α

October 26, 2018 266 / 429
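Several rows of the table above translate directly into code. The sketch below (function names are my own, and the sort-and-threshold search for μ* in the simplex row is one standard way to find it; it is not prescribed by the slides) assumes NumPy:

```python
import numpy as np

def proj_box(z, l, u):
    # Box[l, u] row: (P_C(z))_i = min{max{z_i, l_i}, u_i}
    return np.minimum(np.maximum(z, l), u)

def proj_ball(z, c, r):
    # Ball[c, r] row: P_C(z) = c + (r / max{||z - c||_2, r}) (z - c)
    return c + (r / max(np.linalg.norm(z - c), r)) * (z - c)

def proj_simplex(z):
    # Simplex row: P_C(z) = [z - mu*e]_+ with e^T [z - mu*e]_+ = 1.
    # mu* is found by sorting z and locating the active threshold.
    u = np.sort(z)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / np.arange(1, len(z) + 1) > 0)[0][-1]
    mu = (css[rho] - 1) / (rho + 1)
    return np.maximum(z - mu, 0)
```

For example, `proj_box(np.array([2., -1.]), np.zeros(2), np.ones(2))` clips to `[1., 0.]`, and `proj_simplex` always returns a nonnegative vector summing to 1.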

SLIDE 3

Lagrange Function and Necessary KKT Conditions

Can the Lagrange Multiplier construction be generalized to always find optimal solutions to a minimization problem? Instead of taking the iterative path again, assume everything can be computed analytically. Attributed to the mathematician Lagrange (born in 1736 in Turin). He worked largely on mechanics, the calculus of variations, probability, group theory, and number theory, and is credited with the choice of base 10 for the metric system (rather than 12).

October 26, 2018 267 / 429

SLIDE 4

Lagrange Function and Necessary KKT Conditions

Consider the equality constrained minimization problem (with D ⊆ ℜⁿ)

min_{x∈D} f(x) subject to g_i(x) = 0, i = 1, 2, ..., m   (67)

The figure shows some level curves of the function f and of a single constraint function g₁ (dotted lines). The gradient ∇g₁ of the constraint is not parallel to the gradient ∇f of the function at f = 10.4; it is therefore possible to reduce the value of f by moving along the constraint surface.

[Figure annotations:] At x′, ∇f has a non-zero component perpendicular to ∇g₁. Moving perpendicular to ∇g₁ keeps g₁(x) = 0 satisfied, and moving in the negative of that non-zero component reduces f. Goal: at an optimum we should not be able to reduce the value of f while still honoring g₁(x) = 0. All this shows that there cannot be a local minimum at x′. Note that a lot of the analysis that follows does not even assume convexity; necessary conditions often do NOT require convexity.

October 26, 2018 268 / 429

SLIDE 5

Lagrange Function and Necessary KKT Conditions

Consider the equality constrained minimization problem (with D ⊆ ℜⁿ)

min_{x∈D} f(x) subject to g_i(x) = 0, i = 1, 2, ..., m   (67)

The figure shows some level curves of the function f and of a single constraint function g₁ (dotted lines). The gradient ∇g₁ of the constraint is not parallel to the gradient ∇f of the function at f = 10.4; it is therefore possible to move along the constraint surface so as to further reduce f.

October 26, 2018 268 / 429

SLIDE 6

Lagrange Function and Necessary KKT Conditions

However, ∇g₁ and ∇f are parallel at f = 10.3, and any motion along g₁(x) = 0 will leave f unchanged.

[Figure annotations:] Any motion along g₁(x) = 0 lies along the perpendicular to ∇g₁ at that point, but the component of ∇f along that direction is 0. If we try to decrease the value of f, we will end up increasing or decreasing g₁ (unacceptable). If we move perpendicular to ∇g₁, no change in f is expected. So the gradients of f and g₁ pointing in the same/opposite directions is a necessary condition for a local minimum/maximum.

October 26, 2018 269 / 429

SLIDE 7

Lagrange Function and Necessary KKT Conditions

However, ∇g₁ and ∇f are parallel at f = 10.3, and any motion along g₁(x) = 0 will leave f unchanged. Hence, at the solution x*,

[Figure annotation:] ∇f(x*) is proportional to ∇g₁(x*).

October 26, 2018 269 / 429

SLIDE 8

Lagrange Function and Necessary KKT Conditions

However, ∇g₁ and ∇f are parallel at f = 10.3, and any motion along g₁(x) = 0 will leave f unchanged. Hence, at the solution x*, ∇f(x*) must be proportional to −∇g₁(x*), yielding ∇f(x*) = −λ∇g₁(x*) for some constant λ ∈ ℜ; λ is called a Lagrange multiplier. Often λ itself need never be computed, and it is therefore often qualified as the undetermined Lagrange multiplier.

October 26, 2018 269 / 429
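As a concrete illustration (a toy problem of my own, not from the slides): for f(x, y) = x² + y² with the single constraint g(x, y) = x + y − 1 = 0, the condition ∇f = −λ∇g together with g = 0 is a small linear system that can be solved directly:

```python
import numpy as np

# Hypothetical toy problem: minimize f(x, y) = x^2 + y^2
# subject to g(x, y) = x + y - 1 = 0.
# Stationarity of L = f + lambda*g gives the linear system
#   2x + lambda = 0,  2y + lambda = 0,  x + y = 1.
M = np.array([[2., 0., 1.],
              [0., 2., 1.],
              [1., 1., 0.]])
rhs = np.array([0., 0., 1.])
x, y, lam = np.linalg.solve(M, rhs)
print(x, y, lam)  # 0.5 0.5 -1.0
```

At the solution, ∇f(x*) = (1, 1) is indeed proportional to ∇g(x*) = (1, 1), with multiplier λ = −1.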

SLIDE 9

Lagrange Function and Necessary KKT Conditions

The necessary condition for an optimum at x* for the optimization problem in (67) with m = 1 can be stated as in (68); the gradient is now in ℜⁿ⁺¹.

[Annotation:] The gradient of the Lagrange function w.r.t. x* and λ* should vanish as a necessary condition for an optimum at (x*, λ*).

October 26, 2018 270 / 429

SLIDE 10

Lagrange Function and Necessary KKT Conditions

The necessary condition for an optimum at x* for the optimization problem in (67) with m = 1 can be stated as in (68); the gradient is now in ℜⁿ⁺¹, with its last component being a partial derivative with respect to λ.

∇L(x*, λ*) = ∇f(x*) + λ*∇g₁(x*) = 0,  g₁(x*) = 0   (68)

The solutions to (68) are the stationary points of the Lagrangian L; they are not necessarily local extrema of L.

▶ L is unbounded: given a point x that doesn't lie on the constraint, letting λ → ±∞ makes L arbitrarily large or small (a general property of linear functions; here L is linear in λ).
▶ However, under certain stronger assumptions (discussed a bit later), if the strong Lagrangian principle holds, the minima of f minimize the Lagrangian globally.

October 26, 2018 270 / 429

SLIDE 11

Lagrange Function and Necessary KKT Conditions

Let us extend the necessary condition for optimality of a minimization problem with a single constraint to minimization problems with multiple equality constraints (i.e., m > 1 in (67)). Let S be the subspace spanned by the ∇g_i(x) at any point x, and let S⊥ be its orthogonal complement. Let (∇f)⊥ be the component of ∇f in the subspace S⊥.

[Annotation:] Moving perpendicular to S keeps all constraints satisfied. At an optimal point x*, we should not be able to move perpendicular to S while reducing the value of f, so ∇f cannot have any component perpendicular to S, i.e., ∇f MUST lie in S.

October 26, 2018 271 / 429

SLIDE 12

Lagrange Function and Necessary KKT Conditions

Let us extend the necessary condition for optimality of a minimization problem with a single constraint to minimization problems with multiple equality constraints (i.e., m > 1 in (67)). Let S be the subspace spanned by the ∇g_i(x) at any point x, and let S⊥ be its orthogonal complement. Let (∇f)⊥ be the component of ∇f in the subspace S⊥.

At any solution x*, it must be true that the gradient of f has (∇f)⊥ = 0 (i.e., no component that is perpendicular to all of the ∇g_i), because otherwise you could move x* a little in that direction (or in the opposite direction) to increase (decrease) f without changing any of the g_i, i.e., without violating any constraints. Hence for multiple equality constraints, it must be true that at the solution x*, the space S contains the vector ∇f, i.e., there are some constants λ_i such that ∇f(x*) = Σ_i λ_i∇g_i(x*).

October 26, 2018 271 / 429
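This span condition can be checked numerically. In the sketch below (my own example, not from the slides), the problem min ∥x∥² s.t. x₁ + x₂ + x₃ = 1 and x₁ − x₃ = 0 has a closed-form solution, and a least-squares solve recovers the multipliers λ_i expressing ∇f(x*) in the span of the constraint gradients:

```python
import numpy as np

# Hypothetical example: minimize f(x) = ||x||^2 subject to
# g1(x) = x1 + x2 + x3 - 1 = 0 and g2(x) = x1 - x3 = 0.
A = np.array([[1., 1., 1.],
              [1., 0., -1.]])   # rows are grad g1 and grad g2
b = np.array([1., 0.])

# For min ||x||^2 s.t. Ax = b, the solution is the projection of 0
# onto the affine set: x* = A^T (A A^T)^{-1} b.
x_star = A.T @ np.linalg.solve(A @ A.T, b)

grad_f = 2 * x_star
# Express grad f(x*) in the span of the constraint gradients:
lam, *_ = np.linalg.lstsq(A.T, grad_f, rcond=None)
print(x_star, lam)
```

Here x* = (1/3, 1/3, 1/3), ∇f(x*) = (2/3, 2/3, 2/3) = (2/3)∇g₁(x*) + 0·∇g₂(x*), so (∇f)⊥ = 0 as the slide requires.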

SLIDE 13

Lagrange Multipliers with Inequality Constraints

We also need to impose that the solution is on the correct constraint surface (i.e., g_i = 0 ∀i). In the same manner as in the case of m = 1, this can be encapsulated by introducing the Lagrangian L(x, λ) = f(x) + Σ_{i=1}^{m} λ_i g_i(x), whose gradient with respect to both x and λ vanishes at the solution. This gives us the following necessary condition for optimality of (67):

October 26, 2018 272 / 429

SLIDE 14

Lagrange Multipliers with Inequality Constraints

Single equality constraint g₁(x) = 0, replaced with a single inequality constraint g₁(x) ≤ 0. The entire region labeled g₁(x) ≤ 0 in the Figure becomes feasible. At the solution x*, if g₁(x*) = 0, i.e., if the constraint is active, we must have

[Figure annotations:] INACTIVE CONSTRAINT ⟹ g₁(x*) < 0. In the active case, ∇f(x*) and ∇g₁(x*) lie in the same space (the active case is exactly the same as equality constrained optimization).

October 26, 2018 273 / 429

SLIDE 15

Lagrange Multipliers with Inequality Constraints

Single equality constraint g₁(x) = 0, replaced with a single inequality constraint g₁(x) ≤ 0. The entire region labeled g₁(x) ≤ 0 in the Figure becomes feasible. At the solution x*, if g₁(x*) = 0, i.e., if the constraint is active, we must have (as in the case of a single equality constraint) that ∇f is parallel to ∇g₁, by the same argument as before. Additionally, it is necessary for the two gradients to point in opposite directions.

[Annotation:] It is fine to reduce f while also reducing g₁; i.e., it is fine to move along −∇f(x*) if that direction also has a component along −∇g₁(x*).

October 26, 2018 273 / 429

SLIDE 16

Lagrange Multipliers with Inequality Constraints

Single equality constraint g₁(x) = 0, replaced with a single inequality constraint g₁(x) ≤ 0. The entire region labeled g₁(x) ≤ 0 in the Figure becomes feasible. At the solution x*, if g₁(x*) = 0, i.e., if the constraint is active, we must have (as in the case of a single equality constraint) that ∇f is parallel to ∇g₁, by the same argument as before. Additionally, it is necessary for the two gradients to point in opposite directions; else a move away from the surface g₁ = 0 and into the feasible region would further reduce f. With Lagrangian L = f + λ₁g₁, an additional constraint is that λ₁ ≥ 0.

October 26, 2018 273 / 429

SLIDE 17

Lagrange Multipliers with Inequality Constraints

If the constraint is not active at the solution (so ∇f(x*) = 0 already), then removing g₁ makes no difference and we can drop it from L = f + λg₁. This is equivalent to setting λ = 0. Thus, whether or not the constraint g₁(x) ≤ 0 is active, we can find the solution by requiring that

1. the gradient of the Lagrangian vanishes (w.r.t. x* only), and
2. λg₁(x*) = 0 (complementary slackness).

This latter condition is one of the important Karush-Kuhn-Tucker conditions of convex optimization theory; it can facilitate the search for the solution and will be more formally discussed subsequently.

October 26, 2018 274 / 429

SLIDE 18

Lagrange Multipliers with Inequality Constraints

Now consider the general inequality constrained minimization problem

min_{x∈D} f(x) subject to g_i(x) ≤ 0, i = 1, 2, ..., m   (70)

With multiple inequality constraints, for constraints that are active (as in the case of multiple equality constraints):

1. ∇f must lie in the space spanned by the ∇g_i's;
2. if the Lagrangian is L = f + Σ_{i=1}^{m} λ_i g_i, then we must also have λ_i ≥ 0 ∀i (since otherwise f could be reduced by moving into the feasible region).

October 26, 2018 275 / 429

SLIDE 19

Lagrange Multipliers with Inequality Constraints

As for an inactive constraint g_j (g_j < 0), removing g_j from L makes no difference, and we can drop ∇g_j from ∇f = −Σ_{i=1}^{m} λ_i∇g_i, or equivalently set λ_j = 0. Thus, the foregoing KKT condition generalizes to λ_i g_i(x*) = 0 ∀i. The necessary condition for optimality of (70) is summarized as:

∇L(x*, λ*) = ∇f(x*) + Σ_{i=1}^{m} λ_i∇g_i(x*) = 0,  λ_i g_i(x*) = 0 ∀i   (71)

[Annotation:] The gradient is w.r.t. x* only.

October 26, 2018 276 / 429
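A one-dimensional sketch of (71) (my own example, not from the slides): for min (x − 2)² s.t. g(x) = x − 1 ≤ 0, the constraint is active at x* = 1, and the KKT conditions hold with λ* = 2 ≥ 0:

```python
# Hypothetical 1-D example: min (x - 2)^2 subject to g(x) = x - 1 <= 0.
# The unconstrained minimum x = 2 is infeasible, so the constraint
# is active at the solution x* = 1 with multiplier lambda* = 2.
x_star, lam_star = 1.0, 2.0

grad_f = 2 * (x_star - 2)      # f'(x*) = -2
grad_g = 1.0                   # g'(x*) = 1

stationarity = grad_f + lam_star * grad_g   # grad_x L = 0 at (x*, lambda*)
comp_slack = lam_star * (x_star - 1)        # lambda* g(x*) = 0
print(stationarity, comp_slack, lam_star >= 0)  # 0.0 0.0 True
```

Note that all three parts of (71) are needed: stationarity, complementary slackness, and λ* ≥ 0 (the gradients point in opposite directions).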

SLIDE 20

A simple and often useful trick called the free constraint gambit is to solve ignoring one or more of the constraints, and then check that the solution satisfies those constraints, in which case you have solved the problem.

[Annotation:] E.g., take g₁ and see if ∇f(x*) + λ₁*∇g₁(x*) = 0 for some λ₁* and x*. If yes, then we have satisfied the necessary condition as discussed on the board.

October 26, 2018 277 / 429

SLIDE 21

A simple and often useful trick called the free constraint gambit is to solve ignoring one or more of the constraints, and then check that the solution satisfies those constraints, in which case you have solved the problem.

Some Algebraic Justification: Lagrange Multipliers with Inequality Constraints

October 26, 2018 277 / 429

SLIDE 22

Algebraic Justification: Lagrange Multipliers with Inequality Constraints

For the constrained optimization problem

min_{x∈D} f(x) subject to x ∈ C   (72)

x* = argmin_{x∈C} f(x) ⟺ x* = argmin_x f(x) + I_C(x), where I_C(x) = I{x ∈ C} = 0 if x ∈ C and ∞ if x ∉ C.

The normal cone of C at x is

N_C(x) = ∂I_C(x) = {h ∈ ℜⁿ | hᵀx ≥ hᵀz for any z ∈ C} = {h ∈ ℜⁿ | hᵀ(x − z) ≥ 0 for any z ∈ C}

Recap: the necessary condition for optimality at x* is 0 ∈ ∇f(x*) + N_C(x*), that is, −∇f(x*) ∈ N_C(x*), and therefore

∇ᵀf(x*)(z − x*) ≥ 0 for any z ∈ C   (73)

October 26, 2018 278 / 429
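Condition (73) ties back to the projection table: for f(x) = ½∥x − z∥², the optimality condition at x* = P_C(z) reads (x* − z)ᵀ(w − x*) ≥ 0 for every feasible w. The sketch below (my own check, assuming NumPy) verifies this for the box projection on randomly sampled feasible points:

```python
import numpy as np

rng = np.random.default_rng(0)
l, u = np.zeros(3), np.ones(3)
z = rng.normal(size=3) * 3
x_star = np.minimum(np.maximum(z, l), u)    # P_Box[l,u](z)

# Condition (73) for f(x) = 1/2 ||x - z||^2: the gradient at x* is
# x* - z, so we need (x* - z)^T (w - x*) >= 0 for every feasible w.
ok = all((x_star - z) @ (rng.uniform(l, u) - x_star) >= -1e-12
         for _ in range(1000))
print(ok)  # True
```

A sampled check like this is not a proof, but per-coordinate it is easy to see why (73) holds here: wherever z is clipped, (x* − z)_i and (w − x*)_i have the same sign, and elsewhere (x* − z)_i = 0.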