Lagrange Function and KKT Conditions (October 26, 2018)



SLIDE 1

Lagrange Function and KKT Conditions

October 26, 2018 265 / 429

SLIDE 2

How do you compute the table of Orthogonal Projections?

P_C(z) = prox_{I_C}(z) = argmin_x { (1/(2t))∥x − z∥² + I_C(x) } = argmin_{x∈C} (1/(2t))∥x − z∥². For t = 1, P_C(z) is given below.

Set C = ℜⁿ₊ : P_C(z) = [z]₊
Set C = Box[l, u], with l_i ≤ u_i : (P_C(z))_i = min{max{z_i, l_i}, u_i}
Set C = Ball[c, r] (the ∥·∥₂ ball with centre c ∈ ℜⁿ and radius r > 0) : P_C(z) = c + (r / max{∥z − c∥₂, r}) (z − c)
Set C = {x | Ax = b}, with A ∈ ℜ^{m×n}, b ∈ ℜᵐ, A full row rank : P_C(z) = z − Aᵀ(AAᵀ)⁻¹(Az − b)
Set C = {x | aᵀx ≤ b}, with 0 ≠ a ∈ ℜⁿ, b ∈ ℜ : P_C(z) = z − ([aᵀz − b]₊ / ∥a∥²) a
Set C = Δₙ (the unit simplex) : P_C(z) = [z − μ*e]₊, where μ* ∈ ℜ satisfies eᵀ[z − μ*e]₊ = 1
Set C = H_{a,b} ∩ Box[l, u], with 0 ≠ a ∈ ℜⁿ, b ∈ ℜ : P_C(z) = P_{Box[l,u]}(z − μ*a), where μ* ∈ ℜ satisfies aᵀP_{Box[l,u]}(z − μ*a) = b
Set C = H⁻_{a,b} ∩ Box[l, u], with 0 ≠ a ∈ ℜⁿ, b ∈ ℜ : P_C(z) = P_{Box[l,u]}(z) if aᵀP_{Box[l,u]}(z) ≤ b; otherwise P_C(z) = P_{Box[l,u]}(z − λ*a), where λ* > 0 satisfies aᵀP_{Box[l,u]}(z − λ*a) = b
Set C = B_{∥·∥₁}[0, α], with α > 0 : P_C(z) = z if ∥z∥₁ ≤ α; otherwise P_C(z) = [|z| − λ*e]₊ ⊙ sign(z), where λ* > 0 satisfies ∥[|z| − λ*e]₊∥₁ = α

October 26, 2018 266 / 429
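Several rows of the table above translate directly into code. The sketch below (function names are my own, and the sort-and-threshold search for μ* in the simplex row is one standard way to find it; it is not prescribed by the slides) assumes NumPy:

```python
import numpy as np

def proj_box(z, l, u):
    # Box[l, u] row: (P_C(z))_i = min{max{z_i, l_i}, u_i}
    return np.minimum(np.maximum(z, l), u)

def proj_ball(z, c, r):
    # Ball[c, r] row: P_C(z) = c + (r / max{||z - c||_2, r}) (z - c)
    return c + (r / max(np.linalg.norm(z - c), r)) * (z - c)

def proj_simplex(z):
    # Simplex row: P_C(z) = [z - mu*e]_+ with e^T [z - mu*e]_+ = 1.
    # mu* is found by sorting z and locating the active threshold.
    u = np.sort(z)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / np.arange(1, len(z) + 1) > 0)[0][-1]
    mu = (css[rho] - 1) / (rho + 1)
    return np.maximum(z - mu, 0)
```

For example, `proj_box(np.array([2., -1.]), np.zeros(2), np.ones(2))` clips to `[1., 0.]`, and `proj_simplex` always returns a nonnegative vector summing to 1.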

SLIDE 3

Lagrange Function and Necessary KKT Conditions

Can the Lagrange Multiplier construction be generalized to always find optimal solutions to a minimization problem? Instead of taking the iterative path again, assume everything can be computed analytically. Attributed to the mathematician Lagrange (born in 1736 in Turin). He worked largely on mechanics, the calculus of variations, probability, group theory, and number theory, and is credited with the choice of base 10 for the metric system (rather than 12).

October 26, 2018 267 / 429

SLIDE 4

Lagrange Function and Necessary KKT Conditions

Consider the equality constrained minimization problem (with D ⊆ ℜⁿ)

min_{x∈D} f(x) subject to g_i(x) = 0, i = 1, 2, ..., m   (67)

The figure shows some level curves of the function f and of a single constraint function g₁ (dotted lines). The gradient ∇g₁ of the constraint is not parallel to the gradient ∇f of the function at f = 10.4; it is therefore possible to reduce the value of f by moving along the constraint surface.

[Figure annotations:] At x′, ∇f has a non-zero component perpendicular to ∇g₁. Moving perpendicular to ∇g₁ keeps g₁(x) = 0 satisfied, and moving in the negative of that non-zero component reduces f. Goal: at an optimum we should not be able to reduce the value of f while still honoring g₁(x) = 0. All this shows that there cannot be a local minimum at x′. Note that a lot of the analysis that follows does not even assume convexity; necessary conditions often do NOT require convexity.

October 26, 2018 268 / 429

SLIDE 5

Lagrange Function and Necessary KKT Conditions

Consider the equality constrained minimization problem (with D ⊆ ℜⁿ)

min_{x∈D} f(x) subject to g_i(x) = 0, i = 1, 2, ..., m   (67)

The figure shows some level curves of the function f and of a single constraint function g₁ (dotted lines). The gradient ∇g₁ of the constraint is not parallel to the gradient ∇f of the function at f = 10.4; it is therefore possible to move along the constraint surface so as to further reduce f.

October 26, 2018 268 / 429

SLIDE 6

Lagrange Function and Necessary KKT Conditions

However, ∇g₁ and ∇f are parallel at f = 10.3, and any motion along g₁(x) = 0 will leave f unchanged.

[Figure annotations:] Any motion along g₁(x) = 0 lies along the perpendicular to ∇g₁ at that point, but the component of ∇f along that direction is 0. If we try to decrease the value of f, we will end up increasing or decreasing g₁ (unacceptable). If we move perpendicular to ∇g₁, no change in f is expected. So the gradients of f and g₁ pointing in the same/opposite directions is a necessary condition for a local minimum/maximum.

October 26, 2018 269 / 429

SLIDE 7

Lagrange Function and Necessary KKT Conditions

However, ∇g₁ and ∇f are parallel at f = 10.3, and any motion along g₁(x) = 0 will leave f unchanged. Hence, at the solution x*,

[Figure annotation:] ∇f(x*) is proportional to ∇g₁(x*).

October 26, 2018 269 / 429

SLIDE 8

Lagrange Function and Necessary KKT Conditions

However, ∇g₁ and ∇f are parallel at f = 10.3, and any motion along g₁(x) = 0 will leave f unchanged. Hence, at the solution x*, ∇f(x*) must be proportional to −∇g₁(x*), yielding ∇f(x*) = −λ∇g₁(x*) for some constant λ ∈ ℜ; λ is called a Lagrange multiplier. Often λ itself need never be computed, and it is therefore often qualified as the undetermined Lagrange multiplier.

October 26, 2018 269 / 429
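As a concrete illustration (a toy problem of my own, not from the slides): for f(x, y) = x² + y² with the single constraint g(x, y) = x + y − 1 = 0, the condition ∇f = −λ∇g together with g = 0 is a small linear system that can be solved directly:

```python
import numpy as np

# Hypothetical toy problem: minimize f(x, y) = x^2 + y^2
# subject to g(x, y) = x + y - 1 = 0.
# Stationarity of L = f + lambda*g gives the linear system
#   2x + lambda = 0,  2y + lambda = 0,  x + y = 1.
M = np.array([[2., 0., 1.],
              [0., 2., 1.],
              [1., 1., 0.]])
rhs = np.array([0., 0., 1.])
x, y, lam = np.linalg.solve(M, rhs)
print(x, y, lam)  # 0.5 0.5 -1.0
```

At the solution, ∇f(x*) = (1, 1) is indeed proportional to ∇g(x*) = (1, 1), with multiplier λ = −1.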

SLIDE 9

Lagrange Function and Necessary KKT Conditions

The necessary condition for an optimum at x* for the optimization problem in (67) with m = 1 can be stated as in (68); the gradient is now in ℜⁿ⁺¹.

[Annotation:] The gradient of the Lagrange function w.r.t. x* and λ* should vanish as a necessary condition for an optimum at (x*, λ*).

October 26, 2018 270 / 429

SLIDE 10

Lagrange Function and Necessary KKT Conditions

The necessary condition for an optimum at x* for the optimization problem in (67) with m = 1 can be stated as in (68); the gradient is now in ℜⁿ⁺¹, with its last component being a partial derivative with respect to λ.

∇L(x*, λ*) = ∇f(x*) + λ*∇g₁(x*) = 0,  g₁(x*) = 0   (68)

The solutions to (68) are the stationary points of the Lagrangian L; they are not necessarily local extrema of L.

▶ L is unbounded: given a point x that doesn't lie on the constraint, letting λ → ±∞ makes L arbitrarily large or small (a general property of linear functions; here L is linear in λ).
▶ However, under certain stronger assumptions (discussed a bit later), if the strong Lagrangian principle holds, the minima of f minimize the Lagrangian globally.

October 26, 2018 270 / 429

SLIDE 11

Lagrange Function and Necessary KKT Conditions

Let us extend the necessary condition for optimality of a minimization problem with a single constraint to minimization problems with multiple equality constraints (i.e., m > 1 in (67)). Let S be the subspace spanned by the ∇g_i(x) at any point x, and let S⊥ be its orthogonal complement. Let (∇f)⊥ be the component of ∇f in the subspace S⊥.

[Annotation:] Moving perpendicular to S keeps all constraints satisfied. At an optimal point x*, we should not be able to move perpendicular to S while reducing the value of f, so ∇f cannot have any component perpendicular to S, i.e., ∇f MUST lie in S.

October 26, 2018 271 / 429

SLIDE 12

Lagrange Function and Necessary KKT Conditions

Let us extend the necessary condition for optimality of a minimization problem with a single constraint to minimization problems with multiple equality constraints (i.e., m > 1 in (67)). Let S be the subspace spanned by the ∇g_i(x) at any point x, and let S⊥ be its orthogonal complement. Let (∇f)⊥ be the component of ∇f in the subspace S⊥.

At any solution x*, it must be true that the gradient of f has (∇f)⊥ = 0 (i.e., no component that is perpendicular to all of the ∇g_i), because otherwise you could move x* a little in that direction (or in the opposite direction) to increase (decrease) f without changing any of the g_i, i.e., without violating any constraints. Hence for multiple equality constraints, it must be true that at the solution x*, the space S contains the vector ∇f, i.e., there are some constants λ_i such that ∇f(x*) = Σ_i λ_i∇g_i(x*).

October 26, 2018 271 / 429
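This span condition can be checked numerically. In the sketch below (my own example, not from the slides), the problem min ∥x∥² s.t. x₁ + x₂ + x₃ = 1 and x₁ − x₃ = 0 has a closed-form solution, and a least-squares solve recovers the multipliers λ_i expressing ∇f(x*) in the span of the constraint gradients:

```python
import numpy as np

# Hypothetical example: minimize f(x) = ||x||^2 subject to
# g1(x) = x1 + x2 + x3 - 1 = 0 and g2(x) = x1 - x3 = 0.
A = np.array([[1., 1., 1.],
              [1., 0., -1.]])   # rows are grad g1 and grad g2
b = np.array([1., 0.])

# For min ||x||^2 s.t. Ax = b, the solution is the projection of 0
# onto the affine set: x* = A^T (A A^T)^{-1} b.
x_star = A.T @ np.linalg.solve(A @ A.T, b)

grad_f = 2 * x_star
# Express grad f(x*) in the span of the constraint gradients:
lam, *_ = np.linalg.lstsq(A.T, grad_f, rcond=None)
print(x_star, lam)
```

Here x* = (1/3, 1/3, 1/3), ∇f(x*) = (2/3, 2/3, 2/3) = (2/3)∇g₁(x*) + 0·∇g₂(x*), so (∇f)⊥ = 0 as the slide requires.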

SLIDE 13

Lagrange Multipliers with Inequality Constraints

We also need to impose that the solution is on the correct constraint surface (i.e., g_i = 0 ∀i). In the same manner as in the case of m = 1, this can be encapsulated by introducing the Lagrangian L(x, λ) = f(x) + Σ_{i=1}^{m} λ_i g_i(x), whose gradient with respect to both x and λ vanishes at the solution. This gives us the following necessary condition for optimality of (67):

October 26, 2018 272 / 429

SLIDE 14

Lagrange Multipliers with Inequality Constraints

Single equality constraint g₁(x) = 0, replaced with a single inequality constraint g₁(x) ≤ 0. The entire region labeled g₁(x) ≤ 0 in the Figure becomes feasible. At the solution x*, if g₁(x*) = 0, i.e., if the constraint is active, we must have

[Figure annotations:] INACTIVE CONSTRAINT ⟹ g₁(x*) < 0. In the active case, ∇f(x*) and ∇g₁(x*) lie in the same space (the active case is exactly the same as equality constrained optimization).

October 26, 2018 273 / 429

SLIDE 15

Lagrange Multipliers with Inequality Constraints

Single equality constraint g₁(x) = 0, replaced with a single inequality constraint g₁(x) ≤ 0. The entire region labeled g₁(x) ≤ 0 in the Figure becomes feasible. At the solution x*, if g₁(x*) = 0, i.e., if the constraint is active, we must have (as in the case of a single equality constraint) that ∇f is parallel to ∇g₁, by the same argument as before. Additionally, it is necessary for the two gradients to point in opposite directions.

[Annotation:] It is fine to reduce f while also reducing g₁; i.e., it is fine to move along −∇f(x*) if that direction also has a component along −∇g₁(x*).

October 26, 2018 273 / 429

SLIDE 16

Lagrange Multipliers with Inequality Constraints

Single equality constraint g₁(x) = 0, replaced with a single inequality constraint g₁(x) ≤ 0. The entire region labeled g₁(x) ≤ 0 in the Figure becomes feasible. At the solution x*, if g₁(x*) = 0, i.e., if the constraint is active, we must have (as in the case of a single equality constraint) that ∇f is parallel to ∇g₁, by the same argument as before. Additionally, it is necessary for the two gradients to point in opposite directions; else a move away from the surface g₁ = 0 and into the feasible region would further reduce f. With Lagrangian L = f + λ₁g₁, an additional constraint is that λ₁ ≥ 0.

October 26, 2018 273 / 429

SLIDE 17

Lagrange Multipliers with Inequality Constraints

If the constraint is not active at the solution (so ∇f(x*) = 0 already), then removing g₁ makes no difference and we can drop it from L = f + λg₁. This is equivalent to setting λ = 0. Thus, whether or not the constraint g₁(x) ≤ 0 is active, we can find the solution by requiring that

1. the gradient of the Lagrangian vanishes (w.r.t. x* only), and
2. λg₁(x*) = 0 (complementary slackness).

This latter condition is one of the important Karush-Kuhn-Tucker conditions of convex optimization theory; it can facilitate the search for the solution and will be more formally discussed subsequently.

October 26, 2018 274 / 429

SLIDE 18

Lagrange Multipliers with Inequality Constraints

Now consider the general inequality constrained minimization problem

min_{x∈D} f(x) subject to g_i(x) ≤ 0, i = 1, 2, ..., m   (70)

With multiple inequality constraints, for constraints that are active (as in the case of multiple equality constraints):

1. ∇f must lie in the space spanned by the ∇g_i's;
2. if the Lagrangian is L = f + Σ_{i=1}^{m} λ_i g_i, then we must also have λ_i ≥ 0 ∀i (since otherwise f could be reduced by moving into the feasible region).

October 26, 2018 275 / 429

SLIDE 19

Lagrange Multipliers with Inequality Constraints

As for an inactive constraint g_j (g_j < 0), removing g_j from L makes no difference, and we can drop ∇g_j from ∇f = −Σ_{i=1}^{m} λ_i∇g_i, or equivalently set λ_j = 0. Thus, the foregoing KKT condition generalizes to λ_i g_i(x*) = 0 ∀i. The necessary condition for optimality of (70) is summarized as:

∇L(x*, λ*) = ∇f(x*) + Σ_{i=1}^{m} λ_i∇g_i(x*) = 0,  λ_i g_i(x*) = 0 ∀i   (71)

[Annotation:] The gradient is w.r.t. x* only.

October 26, 2018 276 / 429
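A one-dimensional sketch of (71) (my own example, not from the slides): for min (x − 2)² s.t. g(x) = x − 1 ≤ 0, the constraint is active at x* = 1, and the KKT conditions hold with λ* = 2 ≥ 0:

```python
# Hypothetical 1-D example: min (x - 2)^2 subject to g(x) = x - 1 <= 0.
# The unconstrained minimum x = 2 is infeasible, so the constraint
# is active at the solution x* = 1 with multiplier lambda* = 2.
x_star, lam_star = 1.0, 2.0

grad_f = 2 * (x_star - 2)      # f'(x*) = -2
grad_g = 1.0                   # g'(x*) = 1

stationarity = grad_f + lam_star * grad_g   # grad_x L = 0 at (x*, lambda*)
comp_slack = lam_star * (x_star - 1)        # lambda* g(x*) = 0
print(stationarity, comp_slack, lam_star >= 0)  # 0.0 0.0 True
```

Note that all three parts of (71) are needed: stationarity, complementary slackness, and λ* ≥ 0 (the gradients point in opposite directions).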

SLIDE 20

A simple and often useful trick called the free constraint gambit is to solve ignoring one or more of the constraints, and then check that the solution satisfies those constraints, in which case you have solved the problem.

[Annotation:] E.g., take g₁ and see if ∇f(x*) + λ₁*∇g₁(x*) = 0 for some λ₁* and x*. If yes, then we have satisfied the necessary condition as discussed on the board.

October 26, 2018 277 / 429

SLIDE 21

A simple and often useful trick called the free constraint gambit is to solve ignoring one or more of the constraints, and then check that the solution satisfies those constraints, in which case you have solved the problem.

Some Algebraic Justification: Lagrange Multipliers with Inequality Constraints

October 26, 2018 277 / 429

SLIDE 22

Algebraic Justification: Lagrange Multipliers with Inequality Constraints

For the constrained optimization problem

min_{x∈D} f(x) subject to x ∈ C   (72)

x* = argmin_{x∈C} f(x) ⟺ x* = argmin_x f(x) + I_C(x), where I_C(x) = I{x ∈ C} = 0 if x ∈ C and ∞ if x ∉ C.

The normal cone of C at x is

N_C(x) = ∂I_C(x) = {h ∈ ℜⁿ | hᵀx ≥ hᵀz for any z ∈ C} = {h ∈ ℜⁿ | hᵀ(x − z) ≥ 0 for any z ∈ C}

Recap: the necessary condition for optimality at x* is 0 ∈ ∇f(x*) + N_C(x*), that is, −∇f(x*) ∈ N_C(x*), and therefore

∇ᵀf(x*)(z − x*) ≥ 0 for any z ∈ C   (73)

October 26, 2018 278 / 429
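Condition (73) ties back to the projection table: for f(x) = ½∥x − z∥², the optimality condition at x* = P_C(z) reads (x* − z)ᵀ(w − x*) ≥ 0 for every feasible w. The sketch below (my own check, assuming NumPy) verifies this for the box projection on randomly sampled feasible points:

```python
import numpy as np

rng = np.random.default_rng(0)
l, u = np.zeros(3), np.ones(3)
z = rng.normal(size=3) * 3
x_star = np.minimum(np.maximum(z, l), u)    # P_Box[l,u](z)

# Condition (73) for f(x) = 1/2 ||x - z||^2: the gradient at x* is
# x* - z, so we need (x* - z)^T (w - x*) >= 0 for every feasible w.
ok = all((x_star - z) @ (rng.uniform(l, u) - x_star) >= -1e-12
         for _ in range(1000))
print(ok)  # True
```

A sampled check like this is not a proof, but per-coordinate it is easy to see why (73) holds here: wherever z is clipped, (x* − z)_i and (w − x*)_i have the same sign, and elsewhere (x* − z)_i = 0.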