

  1. CS257 Linear and Convex Optimization, Lecture 7
Bo Jiang, John Hopcroft Center for Computer Science, Shanghai Jiao Tong University
October 19, 2020

  2. Recap: Convex Optimization Problem

min_x f(x)
s.t. g_i(x) ≤ 0, i = 1, 2, ..., m
     h_i(x) = 0, i = 1, 2, ..., k

1. f, g_i are convex functions
2. h_i are affine functions, i.e. h_i(x) = a_i^T x − b_i

Domain. D = dom f ∩ (∩_{i=1}^m dom g_i)
Feasible set. X = {x ∈ D : g_i(x) ≤ 0, 1 ≤ i ≤ m; h_i(x) = 0, 1 ≤ i ≤ k}
Optimal value. f* = inf_{x ∈ X} f(x)
Optimal point. x* ∈ X and f(x*) = f*, i.e. f(x*) ≤ f(x), ∀x ∈ X
First-order optimality condition. ∇f(x*)^T (x − x*) ≥ 0, ∀x ∈ X
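[Editor's note: a minimal numerical check of the first-order condition, as a Python/NumPy sketch not from the lecture. For f(x) = x² over the feasible interval X = [1, 2], the minimizer is x* = 1, and ∇f(x*)(x − x*) = 2(x − 1) ≥ 0 for every feasible x.]

    import numpy as np

    # f(x) = x^2 restricted to the feasible set X = [1, 2]; the minimizer is x* = 1
    grad = lambda x: 2 * x        # gradient of f
    x_star = 1.0
    feasible = np.linspace(1, 2, 50)
    # first-order optimality: grad(x*) * (x - x*) >= 0 for all feasible x
    print(all(grad(x_star) * (x - x_star) >= 0 for x in feasible))  # True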

  3. Recap: LP

General form:     min_x c^T x  s.t. Bx ≤ d, Ax = b
Standard form:    min_x c^T x  s.t. Ax = b, x ≥ 0
Inequality form:  min_x c^T x  s.t. Ax ≤ b

Conversion to equivalent problems:
• introducing slack variables (sketched in code below)
• eliminating equality constraints
• epigraph form
• representing a variable by two nonnegative variables, x = x⁺ − x⁻
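[Editor's note: a Python/NumPy sketch of the slack-variable conversion; the matrices A, b, c are arbitrary sample data, not from the slides. It rewrites an inequality-form LP min c^T x s.t. Ax ≤ b with free x into standard form over z = (x⁺, x⁻, s) ≥ 0.]

    import numpy as np

    # inequality-form data: min c^T x  s.t.  A x <= b, x free (sample numbers)
    A = np.array([[1.0, 1.0], [-1.0, 2.0]])
    b = np.array([6.0, 8.0])
    c = np.array([-1.0, -3.0])

    m, n = A.shape
    # standard form: min c_std^T z  s.t.  A_std z = b, z >= 0,
    # where z = (x+, x-, s), x = x+ - x-, and s holds the slack variables
    A_std = np.hstack([A, -A, np.eye(m)])
    c_std = np.concatenate([c, -c, np.zeros(m)])
    print(A_std.shape, c_std.shape)  # (2, 6) (2,)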

  4. Recap: Geometry of LP

min_x −x_1 − 3x_2            (general form: min_x c^T x s.t. Ax ≤ b)
s.t. x_1 + x_2 ≤ 6
     −x_1 + 2x_2 ≤ 8
     x_1, x_2 ≥ 0

[Figure: the feasible polyhedron in the (x_1, x_2)-plane with cost direction −c; the optimum is attained at the vertex x* = (4/3, 14/3).]

• optimization of a linear function over a polyhedron
• graphical solution of a simple LP
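[Editor's note: the LP above can also be checked numerically; a sketch using scipy.optimize.linprog, with SciPy being an editorial assumption and not part of the lecture.]

    from scipy.optimize import linprog

    # min -x1 - 3*x2  s.t.  x1 + x2 <= 6,  -x1 + 2*x2 <= 8,  x >= 0
    res = linprog(c=[-1, -3],
                  A_ub=[[1, 1], [-1, 2]],
                  b_ub=[6, 8],
                  bounds=[(0, None), (0, None)])
    print(res.x)  # approximately [1.333, 4.667], i.e. x* = (4/3, 14/3)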

  5. Contents
1. Some Canonical Problem Forms
1.1 QP and QCQP
1.2 Geometric Program

  6. Quadratic Program (QP)

min_x (1/2) x^T Qx + c^T x
s.t. Bx ≤ d
     Ax = b

QP is convex iff Q ⪰ O.

[Figure: two examples of a convex quadratic f minimized over a polyhedron; at the optimum x*, the negative gradient −∇f(x*) is normal to the active constraints.]
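[Editor's note: a sketch of solving a small convex QP with the cvxpy modeling package; cvxpy and the sample data Q, c, B, d are assumptions, not from the slides.]

    import cvxpy as cp
    import numpy as np

    # sample convex QP data: Q positive definite, one linear inequality
    Q = np.array([[2.0, 0.5], [0.5, 1.0]])
    c = np.array([1.0, 1.0])
    B = np.array([[1.0, 1.0]])
    d = np.array([1.0])

    x = cp.Variable(2)
    objective = cp.Minimize(0.5 * cp.quad_form(x, Q) + c @ x)
    prob = cp.Problem(objective, [B @ x <= d])
    prob.solve()        # convex since Q is PSD
    print(x.value)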

  7. Quadratically Constrained Quadratic Program (QCQP)

min_x (1/2) x^T Qx + c^T x
s.t. (1/2) x^T Q_i x + c_i^T x + d_i ≤ 0, i = 1, 2, ..., m
     Ax = b

QCQP is convex if Q ⪰ O and Q_i ⪰ O, ∀i.

[Figure: a quadratic objective minimized over a feasible set X with curved quadratic boundary; −∇f(x*) at the optimum.]
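[Editor's note: the same cvxpy pattern extends to a convex QCQP; a minimal sketch with toy data, again an editorial assumption.]

    import cvxpy as cp

    # minimize 0.5*||x||^2 - x1  subject to  0.5*||x||^2 - 1 <= 0
    x = cp.Variable(2)
    obj = cp.Minimize(0.5 * cp.sum_squares(x) - x[0])
    constr = [0.5 * cp.sum_squares(x) - 1 <= 0]   # convex quadratic constraint
    cp.Problem(obj, constr).solve()
    print(x.value)  # approximately (1, 0): the unconstrained minimizer is feasible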

  8. Example: Linear Least Squares Regression

Given y ∈ R^n, X ∈ R^{n×p}, find w ∈ R^p solving

min_w ‖y − Xw‖₂²

• convex QP with objective f(w) = w^T X^T Xw − 2y^T Xw + y^T y

Geometrically, we are looking for the orthogonal projection ŷ = Xw* of y onto the column space of X.

[Figure: y, its projection ŷ = Xw* onto the column space of X, and the residual y − Xw* orthogonal to that subspace.]

  9. Example: Linear Least Squares Regression (cont'd)

By the first-order optimality condition, w* is optimal iff ∇f(w*) = 0, i.e. w* is a solution of the normal equation

X^T Xw = X^T y

Case I. X has full column rank, i.e. rank X = p
• X^T X ≻ O
• unique solution w* = (X^T X)^{−1} X^T y

Note. In this case, the objective f(w) is strictly convex and coercive:

f(w) ≥ λ_min(X^T X) ‖w‖² − 2 ‖y^T X‖ · ‖w‖ + ‖y‖²
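[Editor's note: a NumPy sketch verifying the full-column-rank case on random sample data; the normal-equation solution matches the library least squares solver.]

    import numpy as np

    # random full-column-rank design: n = 50 samples, p = 3 features
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 3))
    y = rng.standard_normal(50)

    w_normal = np.linalg.solve(X.T @ X, X.T @ y)      # solve the normal equation
    w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # library least squares
    print(np.allclose(w_normal, w_lstsq))             # True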

  10. Example: Linear Least Squares Regression (cont'd)

Example. Solve min_w ‖y − Xw‖₂² with

X =
[ 2 0 ]
[ 0 1 ]
[ 0 0 ],   y = (3, 2, 2)^T.

Solution. The normal equation is X^T Xw = X^T y with

X^T X =
[ 4 0 ]
[ 0 1 ],   X^T y = (6, 2)^T.

Since X has full column rank,

w* = (X^T X)^{−1} X^T y = (1.5, 2)^T
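[Editor's note: a NumPy sketch verifying the worked example above with the same numbers.]

    import numpy as np

    X = np.array([[2.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])
    y = np.array([3.0, 2.0, 2.0])
    print(X.T @ X)                            # [[4, 0], [0, 1]]
    print(X.T @ y)                            # [6, 2]
    print(np.linalg.solve(X.T @ X, X.T @ y))  # [1.5, 2.0]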

  11. Example: Linear Least Squares Regression (cont'd)

Case II. rank X = r < p. WLOG assume the first r columns are linearly independent, i.e. X = (X_1, X_2) where X_1 ∈ R^{n×r} and rank X_1 = r.

Claim. There is a solution w* whose last p − r components are 0.
• X and X_1 have the same column space
• If w*_1 solves min_{w_1 ∈ R^r} ‖y − X_1 w_1‖, then w* = (w*_1; 0) solves min_{w ∈ R^p} ‖y − Xw‖
• w*_1 = (X_1^T X_1)^{−1} X_1^T y

Question. Is the solution unique in this case?
A. No: rank X < p ⟹ ∃ w_0 ≠ 0 s.t. Xw_0 = 0 ⟹ w* + w_0 is also a solution.

  12. Example: Linear Least Squares Regression (cont'd)

Example. Solve min_w ‖y − Xw‖₂² with

X =
[ 2 0  2 ]
[ 0 1 −1 ]
[ 0 0  0 ],   y = (3, 2, 2)^T.

Solution. Note rank X = 2 < 3.
• Let X_1 be the first two columns:

X_1 =
[ 2 0 ]
[ 0 1 ]
[ 0 0 ]

• By the previous example, w*_1 = (X_1^T X_1)^{−1} X_1^T y = (1.5, 2)^T is a solution to min_{w_1 ∈ R²} ‖y − X_1 w_1‖₂.
• Hence w* = (1.5, 2, 0)^T is a solution to min_{w ∈ R³} ‖y − Xw‖₂.

  13. Example: Linear Least Squares Regression (cont'd)

Example (cont'd). The normal equation of the original problem is X^T Xw = X^T y where

X^T X =
[ 4  0  4 ]
[ 0  1 −1 ]
[ 4 −1  5 ],   X^T y = (6, 2, 4)^T.

• Note X^T X is not invertible, so we cannot use the formula w* = (X^T X)^{−1} X^T y.¹
• The solution w* = (1.5, 2, 0)^T satisfies the normal equation.
• The normal equation has infinitely many solutions, given by w = (1.5, 2, 0)^T + α(−1, 1, 1)^T, α ∈ R. All of them are solutions to the least squares problem.

¹ The formula still applies if we use the so-called pseudoinverse of X^T X.
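[Editor's note: a NumPy sketch of the rank-deficient example; np.linalg.pinv returns the minimum-norm point on the solution line.]

    import numpy as np

    X = np.array([[2.0, 0.0,  2.0],
                  [0.0, 1.0, -1.0],
                  [0.0, 0.0,  0.0]])
    y = np.array([3.0, 2.0, 2.0])

    w = np.linalg.pinv(X) @ y                 # minimum-norm least squares solution
    print(w)                                  # one point of the form (1.5, 2, 0) + alpha*(-1, 1, 1)
    print(np.allclose(X.T @ X @ w, X.T @ y))  # True: w satisfies the normal equation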

  14. General Unconstrained QP

Minimize a quadratic function with Q ∈ R^{n×n}, Q ⪰ O:

min_x f(x) = (1/2) x^T Qx + b^T x + c

By the first-order condition, a solution satisfies ∇f(x) = Qx + b = 0.

Case I. Q ≻ O. There is a unique solution x* = −Q^{−1}b.

Example. n = 2, Q = diag{1, 1}, b = (1, 0)^T, c = 0:

f(x) = (1/2)(x_1² + x_2²) + x_1

The first-order condition becomes

[ 1 0 ] [ x_1 ]   [ 1 ]   [ 0 ]
[ 0 1 ] [ x_2 ] + [ 0 ] = [ 0 ]

which yields the unique optimal solution x* = (−1, 0).
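[Editor's note: Case I in code, a one-line NumPy check of x* = −Q^{−1}b for the example above.]

    import numpy as np

    Q = np.eye(2)                     # Q = diag{1, 1}, positive definite
    b = np.array([1.0, 0.0])
    x_star = -np.linalg.solve(Q, b)   # x* = -Q^{-1} b
    print(x_star)                     # [-1.  0.]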

  15. General Unconstrained QP (cont'd)

Case II. det Q = 0 and b ∉ column space of Q. There is no solution, and f* = −∞.

Example. n = 2, Q = diag{0, 1}, b = (1, 0)^T, c = 0:

f(x) = (1/2) x_2² + x_1

The first-order condition becomes

[ 0 0 ] [ x_1 ]   [ 1 ]   [ 0 ]
[ 0 1 ] [ x_2 ] + [ 0 ] = [ 0 ]

which has no solution. Indeed, it is easy to see that f(x) = (1/2) x_2² + x_1 is unbounded below, so f* = −∞.

  16. General Unconstrained QP (cont'd)

Case III. det Q = 0 and b ∈ column space of Q. There are infinitely many solutions.

Example. n = 2, Q = diag{1, 0}, b = (1, 0)^T, c = 0:

f(x) = (1/2) x_1² + x_1

The first-order condition becomes

[ 1 0 ] [ x_1 ]   [ 1 ]   [ 0 ]
[ 0 0 ] [ x_2 ] + [ 0 ] = [ 0 ]

which has infinitely many solutions of the form x = (−1, x_2) for any x_2 ∈ R, as f is actually independent of x_2.

  17. General Unconstrained QP (cont'd)

For the general case (Q non-diagonal):
• Diagonalize Q by an orthogonal matrix U, so Q = UΛU^T where Λ is diagonal.
• Let x = Uy and b̃ = U^T b. Then

f(x) = (1/2) y^T U^T QUy + b^T Uy + c = (1/2) y^T Λy + b̃^T y + c ≜ g(y)

In expanded form,

g(y) = Σ_{i=1}^n ( (1/2) λ_i y_i² + b̃_i y_i ) + c

• Minimizing f(x) is equivalent to minimizing g(y). We can minimize each term (1/2) λ_i y_i² + b̃_i y_i independently.

Exercise. Convince yourself that the previous three cases apply to the non-diagonal case.
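[Editor's note: a NumPy sketch of the change of variables; the non-diagonal Q below is sample data, and np.linalg.eigh returns the orthogonal eigendecomposition of a symmetric matrix.]

    import numpy as np

    Q = np.array([[2.0, 1.0], [1.0, 2.0]])   # symmetric, non-diagonal, Q > 0
    b = np.array([1.0, -1.0])

    lam, U = np.linalg.eigh(Q)               # Q = U @ diag(lam) @ U.T
    b_tilde = U.T @ b
    # each decoupled term 0.5*lam_i*y_i^2 + b_tilde_i*y_i is minimized at
    # y_i = -b_tilde_i / lam_i (Case I applies since all lam_i > 0 here)
    y_star = -b_tilde / lam
    x_star = U @ y_star
    print(np.allclose(x_star, -np.linalg.solve(Q, b)))  # True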

  18. Example: Lasso

Lasso (Least Absolute Shrinkage and Selection Operator). Given y ∈ R^n, X ∈ R^{n×p}, t > 0,

min_w ‖y − Xw‖₂²
s.t. ‖w‖₁ ≤ t

[Figure: y and its fit ŷ = Xw* in the column space of X.]

• convex problem? yes
• QP? no, but can be converted to QP
• optimal solution exists? yes
  – compact feasible set
• optimal solution unique?
  – yes if n ≥ p and X has full column rank (X^T X ≻ O, strictly convex)
  – no in general, e.g. p > n and t large enough for an unconstrained optimum to be feasible
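[Editor's note: a cvxpy sketch of the lasso in constrained form; cvxpy and the random data are editorial assumptions.]

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 5))
    y = rng.standard_normal(20)
    t = 1.0

    w = cp.Variable(5)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)),
                      [cp.norm(w, 1) <= t])   # l1-ball constraint
    prob.solve()
    print(w.value)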

  19. Example: Ridge Regression

Given y ∈ R^n, X ∈ R^{n×p}, t > 0,

min_w ‖y − Xw‖₂²
s.t. ‖w‖₂² ≤ t

[Figure: y and its fit ŷ = Xw* in the column space of X.]

• convex problem? yes
• QCQP? yes
• optimal solution exists? yes
  – compact feasible set
• optimal solution unique?
  – yes if n ≥ p and X has full column rank (X^T X ≻ O, strictly convex)
  – no in general
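[Editor's note: ridge regression differs from the lasso sketch only in the constraint, which makes it a QCQP; the same cvxpy pattern with the squared l2-norm ball.]

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 5))
    y = rng.standard_normal(20)
    t = 1.0

    w = cp.Variable(5)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)),
                      [cp.sum_squares(w) <= t])   # squared l2-norm constraint
    prob.solve()
    print(w.value)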

  20. Example: SVM

Linearly separable case:

min_{w,b} (1/2) ‖w‖²
s.t. y_i (w^T x_i + b) ≥ 1, i = 1, 2, ..., m

Soft margin SVM:

min_{w,b,ξ} (1/2) ‖w‖² + C Σ_{i=1}^m ξ_i
s.t. y_i (w^T x_i + b) ≥ 1 − ξ_i, i = 1, 2, ..., m
     ξ ≥ 0

Equivalent unconstrained form:

min_{w,b} (1/2) ‖w‖² + C Σ_{i=1}^m (1 − y_i (w^T x_i + b))₊
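[Editor's note: a cvxpy sketch of the equivalent unconstrained (hinge-loss) form on synthetic labels; the data and cvxpy are editorial assumptions.]

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((30, 2))
    y = np.sign(X[:, 0] + 0.1 * rng.standard_normal(30))   # labels in {-1, +1}
    C = 1.0

    w = cp.Variable(2)
    b = cp.Variable()
    hinge = cp.pos(1 - cp.multiply(y, X @ w + b))           # (1 - y_i(w^T x_i + b))_+
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(hinge)))
    prob.solve()
    print(w.value, b.value)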

  21. Geometric Program

A monomial is a function f : R^n_{++} = {x ∈ R^n : x > 0} → R of the form

f(x) = γ x_1^{a_1} x_2^{a_2} · · · x_n^{a_n}

for γ > 0, a_1, ..., a_n ∈ R. A posynomial is a sum of monomials,

f(x) = Σ_{k=1}^p γ_k x_1^{a_{k1}} x_2^{a_{k2}} · · · x_n^{a_{kn}}

A geometric program (GP) is an optimization problem of the form

min_x f(x)
s.t. g_i(x) ≤ 1, i = 1, ..., m
     h_j(x) = 1, j = 1, ..., r

where f, g_i, i = 1, ..., m are posynomials and h_j, j = 1, ..., r are monomials. The constraint x > 0 is implicit.
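[Editor's note: a tiny GP solved with cvxpy's geometric-programming mode; cvxpy and this particular instance are editorial assumptions, not from the slides.]

    import cvxpy as cp

    x = cp.Variable(pos=True)
    y = cp.Variable(pos=True)
    z = cp.Variable(pos=True)

    # monomial objective 1/(x*y*z); posynomial and monomial constraints
    prob = cp.Problem(cp.Minimize(1 / (x * y * z)),
                      [4 * x * y * z + 2 * x * z <= 10,
                       x <= 2 * y,
                       y <= 2 * x,
                       z >= 1])
    prob.solve(gp=True)   # solve as a geometric program
    print(prob.value, x.value, y.value, z.value)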
