Lagrange Function and KKT Conditions
October 26, 2018 265 / 429
How do you compute the table of orthogonal projections?

P_C(z) = argmin_{x∈C} (1/2)∥x − z∥² = prox_{t I_C}(z) = argmin_x { I_C(x) + (1/2t)∥x − z∥² }

For t = 1 (in fact for any t > 0), P_C(z) = prox_{t I_C}(z).

Set C (with assumptions) → P_C(z):
- Nonnegative orthant ℜⁿ₊ → [z]₊
- Box[l, u], with l_i ≤ u_i → (P_C(z))_i = min{max{z_i, l_i}, u_i}
- Ball[c, r] (the ∥·∥₂ ball with centre c ∈ ℜⁿ and radius r > 0) → c + (r / max{∥z − c∥₂, r}) (z − c)
- {x | Ax = b}, with A ∈ ℜ^{m×n}, b ∈ ℜᵐ, A full row rank → z − Aᵀ(AAᵀ)⁻¹(Az − b)
- {x | aᵀx ≤ b}, with 0 ≠ a ∈ ℜⁿ, b ∈ ℜ → z − ([aᵀz − b]₊ / ∥a∥²) a
- Unit simplex ∆ₙ → [z − µ*e]₊, where µ* ∈ ℜ satisfies eᵀ[z − µ*e]₊ = 1
- H_{a,b} ∩ Box[l, u], with 0 ≠ a ∈ ℜⁿ, b ∈ ℜ → P_{Box[l,u]}(z − µ*a), where µ* ∈ ℜ satisfies aᵀ P_{Box[l,u]}(z − µ*a) = b
- H⁻_{a,b} ∩ Box[l, u], with 0 ≠ a ∈ ℜⁿ, b ∈ ℜ → P_{Box[l,u]}(z) if aᵀ P_{Box[l,u]}(z) ≤ b; otherwise P_{Box[l,u]}(z − λ*a), where λ* > 0 satisfies aᵀ P_{Box[l,u]}(z − λ*a) = b
- ℓ₁-ball B_{∥·∥₁}[0, α], with α > 0 → z if ∥z∥₁ ≤ α; otherwise [|z| − λ*e]₊ ⊙ sign(z), where λ* > 0 satisfies ∥[|z| − λ*e]₊∥₁ = α
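Several rows of the projection table can be sketched directly in NumPy. The following is a minimal illustration (function names are mine; the simplex row finds µ* by bisection on the scalar equation eᵀ[z − µe]₊ = 1 rather than via a closed form):

```python
import numpy as np

def proj_nonneg(z):
    """Projection onto the nonnegative orthant: [z]_+ componentwise."""
    return np.maximum(z, 0.0)

def proj_box(z, l, u):
    """Projection onto Box[l, u]: clip each coordinate to [l_i, u_i]."""
    return np.minimum(np.maximum(z, l), u)

def proj_ball(z, c, r):
    """Projection onto the Euclidean ball with centre c and radius r."""
    return c + (r / max(np.linalg.norm(z - c), r)) * (z - c)

def proj_affine(z, A, b):
    """Projection onto {x | Ax = b}, assuming A has full row rank."""
    return z - A.T @ np.linalg.solve(A @ A.T, A @ z - b)

def proj_halfspace(z, a, b):
    """Projection onto the half-space {x | a^T x <= b}."""
    return z - (max(a @ z - b, 0.0) / (a @ a)) * a

def proj_simplex(z):
    """Projection onto the unit simplex, locating mu* by bisection.

    s(mu) = sum([z - mu*e]_+) is nonincreasing in mu, with s >= 1 at
    mu = min(z) - 1 and s = 0 at mu = max(z), so bisection applies.
    """
    lo, hi = z.min() - 1.0, z.max()
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        if np.maximum(z - mu, 0.0).sum() > 1.0:
            lo = mu
        else:
            hi = mu
    return np.maximum(z - 0.5 * (lo + hi), 0.0)
```

Each function evaluates the corresponding closed-form row; only the simplex (and, similarly, the last three rows of the table) needs a one-dimensional root-finding step for its multiplier.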
min_{x∈D} f(x)  s.t.  g_i(x) = 0, i = 1, 2, ..., m

Suppose ∇f has a non-zero component perpendicular to ∇g1 at a point x'. Then we can reduce the value of f by moving in the negative of that non-zero component. Since this movement is perpendicular to ∇g1, the constraint g1(x) = 0 remains satisfied (to first order). Goal: at an optimum we should NOT be able to reduce the value of f while still honoring g1(x) = 0.
All this shows that there cannot be a local minimum at x'. Note that a lot of the analysis that follows does not even assume convexity: necessary conditions often do NOT require convexity.
At the optimum, the gradient of f along any direction that lies along the perpendicular to ∇g1(x) at that point must be 0. ==> If we try to decrease the value of f, we will land up increasing/decreasing g1 (unacceptable). ==> If we move along the perpendicular to ∇g1, no change is expected in f. SO the gradients of f and g1 being in the same/opposite directions is a necessary condition for a local minimum/maximum.
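This tangency argument can be checked numerically on a concrete example of my own choosing: minimize f(x) = x1 + x2 on the circle g1(x) = x1² + x2² − 1 = 0, whose constrained minimum is x* = −(1/√2, 1/√2).

```python
import math

# At x*, grad f and grad g1 are parallel, so the directional derivative
# of f along the tangent to the constraint (the direction perpendicular
# to grad g1) vanishes.
x1 = x2 = -1.0 / math.sqrt(2.0)
grad_f = (1.0, 1.0)                       # gradient of f(x) = x1 + x2
grad_g1 = (2.0 * x1, 2.0 * x2)            # gradient of g1 at x*
tangent = (grad_g1[1], -grad_g1[0])       # perpendicular to grad g1
dd = grad_f[0] * tangent[0] + grad_f[1] * tangent[1]
# dd is the rate of change of f along the constraint: 0 at the optimum
```

At any non-optimal feasible point the same dot product would be non-zero, signalling a feasible descent direction.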
∇f(x*) is proportional to ∇g1(x*).
The gradient of the Lagrange function L(x, λ) = f(x) + λ g1(x), taken wrt both x and λ, should vanish at (x*, λ*) as a necessary condition for an optimum at x*, λ*.
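For a quadratic objective and a linear constraint, this stationarity condition is a linear system. A worked example of my own: minimize f(x) = x1² + x2² subject to g1(x) = x1 + x2 − 1 = 0.

```python
import numpy as np

# L(x, lam) = x1^2 + x2^2 + lam * (x1 + x2 - 1).
# Setting the gradient of L wrt (x1, x2, lam) to zero:
#   dL/dx1  = 2*x1 + lam     = 0
#   dL/dx2  = 2*x2 + lam     = 0
#   dL/dlam = x1 + x2 - 1    = 0
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([0.0, 0.0, 1.0])
x1_star, x2_star, lam_star = np.linalg.solve(A, rhs)
# solution: x1* = x2* = 0.5, lam* = -1
```

Note that stationarity in λ is exactly the constraint g1(x) = 0, which is why the constraint row appears in the system.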
Do the minima of f minimize the Lagrangian globally? In general, no.
▶ L is unbounded: given a point x that doesn't lie on the constraint, letting λ → ±∞ makes L arbitrarily large or small. (This is a general property of linear functions; here, L is linear in λ.)
▶ However, under certain stronger assumptions, if the strong Lagrangian principle holds, the minima of f do minimize the Lagrangian globally.
A bit later
Let S be the span of the gradients ∇g_i(x) at any point x, and let S⊥ be its orthogonal complement. Moving perpendicular to S (i.e., along S⊥) ==> all constraints remain satisfied. ==> At an optimal point x*, we should not be able to move perpendicular to S while reducing the value of f. ==> ∇f cannot have any component along the perpendicular to S. ==> ∇f MUST lie in S.
Formally: let S be the span of the gradients ∇g_i(x) at any point x, and let S⊥ be its orthogonal complement. At a local minimum x*, the component of ∇f(x*) in S⊥ = 0 (i.e., no feasible direction of descent), so there exist multipliers λ_i such that ∇f(x*) = Σ_{i=1}^m λ_i ∇g_i(x*).
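Whether ∇f(x*) lies in S can be tested by least squares: project ∇f onto the columns spanning S and inspect the residual. A hypothetical example (objective, constraint, and point are my own): f(x) = x1² + x2² + x3² with g1(x) = x1 + x2 + x3 − 3 = 0, candidate x* = (1, 1, 1).

```python
import numpy as np

grad_f = np.array([2.0, 2.0, 2.0])      # grad f at x* = (1, 1, 1)
G = np.array([[1.0], [1.0], [1.0]])     # columns span S = span{grad g_i(x*)}
lam, *_ = np.linalg.lstsq(G, grad_f, rcond=None)
residual = grad_f - G @ lam             # component of grad f along S-perp
# residual ~ 0 here, so grad f(x*) lies in S, with lambda_1 = 2
```

A non-zero residual would expose a feasible descent direction, ruling out x* as a local minimum.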
L(x, λ) = f(x) + Σ_{i=1}^m λ_i g_i(x)
INACTIVE CONSTRAINT ==> g1(x*) < 0
ACTIVE CONSTRAINT: ∇f(x*) and ∇g1(x*) lie in the same space (the active case is exactly the same as equality-constrained optimization). But we have a problem: it is fine to reduce f while also reducing g1 ==> it is fine to move along −∇f(x*) if that direction also has a component along −∇g1(x*).
For an inequality constraint g1, an additional condition is therefore required: λ1 ≥ 0.
For the inequality constraint g1, at an optimum x*:
1. the gradient of the Lagrangian vanishes (the gradient is wrt x* only), and
2. λ1 g1(x*) = 0 (complementary slackness).
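Both conditions, stationarity in x and λ1 g1(x*) = 0, can be checked on a tiny one-dimensional problem of my own construction: min (x − a)² subject to g1(x) = x ≤ 0, with L(x, λ1) = (x − a)² + λ1 x.

```python
def kkt_point(a):
    """Return (x*, lam1*) for min (x - a)^2 s.t. x <= 0."""
    if a <= 0.0:
        return a, 0.0        # constraint inactive: g1(x*) <= 0, lam1* = 0
    return 0.0, 2.0 * a      # constraint active: g1(x*) = 0, lam1* = 2a > 0

for a in (-1.0, 1.0):
    x_star, lam1 = kkt_point(a)
    assert 2.0 * (x_star - a) + lam1 == 0.0   # dL/dx = 0 (wrt x only)
    assert lam1 * x_star == 0.0               # complementary slackness
```

In the inactive case the multiplier is zero; in the active case the constraint value is zero, so the product λ1 g1(x*) vanishes either way.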
min_{x∈D} f(x)  s.t.  g_i(x) ≤ 0, i = 1, 2, ..., m

1. ∇f must lie in the space spanned by the ∇g_i's,
2. if the Lagrangian is L = f + Σ_{i=1}^m λ_i g_i, then we must also have λ_i ≥ 0 ∀i (since otherwise f could be reduced by moving into the feasible region).
At an optimum x*, there must exist multipliers λ_i such that

∇f(x*) + Σ_{i=1}^m λ_i ∇g_i(x*) = 0
λ_i ≥ 0,  i = 1, 2, ..., m
λ_i g_i(x*) = 0,  i = 1, 2, ..., m    (71)

The gradient is wrt x*.
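Conditions (71) can be verified numerically at a candidate point. A hypothetical sketch (objective, constraints, and candidate multipliers are my own): min (x1−2)² + (x2−2)² subject to x1 + x2 ≤ 2, x1 ≥ 0, x2 ≥ 0, with candidate minimizer x* = (1, 1).

```python
import numpy as np

x = np.array([1.0, 1.0])                 # candidate x*
lam = np.array([2.0, 0.0, 0.0])          # candidate multipliers
grad_f = 2.0 * (x - 2.0)                 # = (-2, -2)
grad_g = np.array([[1.0, 1.0],           # rows are grad g_i:
                   [-1.0, 0.0],          # g1 = x1 + x2 - 2 <= 0
                   [0.0, -1.0]])         # g2 = -x1 <= 0, g3 = -x2 <= 0
g = np.array([x[0] + x[1] - 2.0, -x[0], -x[1]])

stationary = np.allclose(grad_f + lam @ grad_g, 0.0)  # first line of (71)
dual_feasible = bool(np.all(lam >= 0.0))              # second line
comp_slack = np.allclose(lam * g, 0.0)                # third line
feasible = bool(np.all(g <= 0.0))
```

Only the first constraint is active at x*, so only λ1 is allowed to be positive; the inactive constraints carry zero multipliers, exactly as complementary slackness demands.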
Eg: take g1 and see if ∇f(x*) + λ1* ∇g1(x*) = 0 for some λ1* and x*. If yes, then we have satisfied the necessary condition as discussed on the board.
The constrained problem min_{x∈C} f(x) can be rewritten as the unconstrained problem

min_x f(x) + I_C(x),  where I_C(x) = I{x ∈ C} = 0 if x ∈ C, +∞ otherwise.
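This indicator rewriting connects back to the projection table at the start of the section: the prox of t·I_C is exactly the projection onto C, for any t > 0. A minimal sketch for C = Box[l, u] (function names are mine):

```python
import math

def indicator_box(x, l, u):
    """I_C(x) for C = Box[l, u]: 0 if x is in the box, +inf otherwise."""
    return 0.0 if all(li <= xi <= ui for xi, li, ui in zip(x, l, u)) else math.inf

def prox_box_indicator(z, l, u, t=1.0):
    """prox_{t I_C}(z) = argmin_x I_C(x) + ||x - z||^2 / (2t).

    The minimizer must lie in C (else the objective is +inf), and among
    points of C the closest one to z wins, so the result is the box
    projection and is independent of t.
    """
    return [min(max(zi, li), ui) for zi, li, ui in zip(z, l, u)]
```

The same reasoning holds for any closed set C, which is why the projection table doubles as a table of prox operators for indicator functions.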