16. Review of convex optimization


1. CS/ECE/ISyE 524 Introduction to Optimization, Spring 2017–18
16. Review of convex optimization
• Convex sets and functions
• Convex programming models
• Network flow problems
• Least squares problems
• Regularization and tradeoffs
• Duality
Laurent Lessard (www.laurentlessard.com)

2. Convex sets
A set C ⊆ ℝⁿ is convex if for all x, y ∈ C and all 0 ≤ α ≤ 1, we have αx + (1 − α)y ∈ C.
• every line segment joining points of the set must be contained in the set
• the set can include its boundary or not
• the set can be bounded or not
[Figure: a convex set next to a nonconvex set in which the segment between two points x and y leaves the set]
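To make the definition concrete, here is a minimal numerical sketch (not from the slides) that checks it for a halfspace; the vectors a, x, y are made-up data.

```python
import numpy as np

# Made-up halfspace C = {z : a.z <= b}; halfspaces are convex sets.
a = np.array([1.0, 2.0])
b = 3.0
x = np.array([1.0, 1.0])   # a.x = 3 <= 3, so x is in C
y = np.array([-1.0, 0.0])  # a.y = -1 <= 3, so y is in C

# Every convex combination alpha*x + (1-alpha)*y should stay in C.
for alpha in np.linspace(0.0, 1.0, 11):
    z = alpha * x + (1 - alpha) * y
    assert a @ z <= b + 1e-12
print("all convex combinations stayed in the halfspace")
```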

3. Examples
1. Polyhedron
• A linear inequality a_iᵀx ≤ b_i defines a halfspace.
• An intersection of halfspaces forms a polyhedron: Ax ≤ b.
[Figures: a halfspace in 3D; a polyhedron in 3D]

4. Examples
2. Ellipsoid
• A quadratic form looks like xᵀQx.
• If Q ≻ 0 (positive definite; all eigenvalues positive), then the set of x satisfying xᵀQx ≤ b is an ellipsoid.
[Figure: an ellipsoid]

5. Examples
3. Second-order cone constraint
• The set of points satisfying ‖Ax + b‖ ≤ cᵀx + d is called a second-order cone constraint.
• Example: robust linear programming, with constraints a_iᵀx + ρ‖x‖ ≤ b_i
[Figures: feasible region of the robust LP constraints; the second-order cone ‖x‖ ≤ y]
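A second-order cone constraint can be written directly in a convex modeling tool. Below is a minimal sketch using the cvxpy library; cvxpy itself and all the data A, b, c, d are assumptions for illustration, not part of the slides.

```python
import cvxpy as cp
import numpy as np

# Made-up problem data for illustration only.
np.random.seed(0)
A = np.random.randn(3, 2)
b = np.random.randn(3)
c = np.array([1.0, 0.5])
d = 2.0

x = cp.Variable(2)

# Second-order cone constraint: ||A x + b|| <= c.x + d  (a convex set).
constraints = [cp.norm(A @ x + b, 2) <= c @ x + d]
prob = cp.Problem(cp.Minimize(c @ x), constraints)
prob.solve()
print("optimal value:", prob.value)
print("optimal x:", x.value)
```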

6. Convex functions
A function f : D → ℝ is a convex function if:
1. the domain D ⊆ ℝⁿ is a convex set
2. for all x, y ∈ D and 0 ≤ α ≤ 1, the function f satisfies
   f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y)
• any line segment joining points of f lies above f
• f is continuous, but not necessarily smooth
• f is concave if −f is convex
[Figure: a convex function next to a nonconvex function whose chord dips below the graph]
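A tiny numerical check of the defining inequality (an illustrative sketch, not from the slides), using the convex function f(x) = x²:

```python
import numpy as np

def f(x):
    return x ** 2  # a convex function

rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.uniform(-5, 5, size=2)
    alpha = rng.uniform(0, 1)
    lhs = f(alpha * x + (1 - alpha) * y)
    rhs = alpha * f(x) + (1 - alpha) * f(y)
    assert lhs <= rhs + 1e-12  # f(ax + (1-a)y) <= a f(x) + (1-a) f(y)
print("convexity inequality held on all random samples")
```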

7. Convex programs

    minimize_{x ∈ D}  f₀(x)
    subject to: f_i(x) ≤ 0 for i = 1, …, m
                h_j(x) = 0 for j = 1, …, r

• the domain is the set D
• the cost function is f₀
• the inequality constraints are the f_i for i = 1, …, m
• the equality constraints are the h_j for j = 1, …, r
• feasible set: the x ∈ D satisfying all constraints
A model is convex if D is a convex set, all the f_i are convex functions, and the h_j are affine functions (linear + constant).
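A small instance of this standard form, sketched in cvxpy with made-up data (one convex inequality, one affine equality); the library choice and all numbers are assumptions:

```python
import cvxpy as cp
import numpy as np

x = cp.Variable(2)

f0 = cp.sum_squares(x - np.array([1.0, 2.0]))  # convex cost f0(x)
constraints = [
    cp.norm(x, 2) - 3 <= 0,   # convex inequality f1(x) <= 0
    x[0] + x[1] == 1,         # affine equality   h1(x) = 0
]
prob = cp.Problem(cp.Minimize(f0), constraints)
prob.solve()
print(prob.value, x.value)
```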

8. Examples
1. Linear program (LP)
• cost is affine
• all constraints are affine
• can be a maximization or a minimization
Important properties
• the feasible set is a polyhedron
• the problem can be optimal, infeasible, or unbounded
• an optimal point occurs at a vertex
[Figure: a polyhedral feasible set with the optimum at a vertex]
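A minimal LP sketch with made-up data, using scipy's linprog (an assumption for illustration; the slides specify no solver):

```python
import numpy as np
from scipy.optimize import linprog

# Maximize x1 + 2*x2 subject to affine constraints and x >= 0.
# linprog minimizes, so negate the objective.
c = np.array([-1.0, -2.0])
A_ub = np.array([[1.0, 1.0],
                 [1.0, 3.0]])
b_ub = np.array([4.0, 6.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)  # optimal vertex and optimal value
```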

9. Examples
2. Convex quadratic program (QP)
• cost is a convex quadratic
• all constraints are affine
• must be a minimization
Important properties
• the feasible set is a polyhedron
• an optimal point occurs on the boundary or in the interior
[Figure: a polyhedral feasible set with quadratic level curves]
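A small convex QP sketch with made-up data Q, c, A, b, again using cvxpy (an assumption, not from the slides):

```python
import cvxpy as cp
import numpy as np

# minimize x'Qx + c'x subject to Ax <= b.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # positive definite, so the cost is convex
c = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

x = cp.Variable(2)
prob = cp.Problem(cp.Minimize(cp.quad_form(x, Q) + c @ x), [A @ x <= b])
prob.solve()
print(x.value, prob.value)
```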

10. Examples
3. Convex quadratically constrained QP (QCQP)
• cost is a convex quadratic
• inequality constraints are convex quadratics
• equality constraints are affine
Important properties
• the feasible set is an intersection of ellipsoids
• an optimal point occurs on the boundary or in the interior
[Figure: a feasible set formed by intersecting ellipsoids]

11. Examples
4. Second-order cone program (SOCP)
• cost is affine
• inequality constraints are second-order cone constraints
• equality constraints are affine
Important properties
• the feasible set is convex
• an optimal point occurs on the boundary or in the interior

12. Hierarchy of complexity
From simplest to most complicated:
1. linear program
2. convex quadratic program
3. convex quadratically constrained quadratic program
4. second-order cone program
5. semidefinite program
6. general convex program
Important notes
• each class contains the ones before it: e.g. every LP is an SOCP (by setting the appropriate problem data to zero), but a general SOCP cannot be expressed as an LP
• in general: strive for the simplest model possible

13. Network flow problems
[Figure: a directed graph on nodes 1–8]
• Each edge (i, j) ∈ E has a flow x_ij ≥ 0.
• Each edge has a transportation cost c_ij.
• Each node i ∈ N is a source if b_i > 0, a sink if b_i < 0, or a relay if b_i = 0. The net flow out of node i must equal b_i.
• Find the flow that minimizes total transportation cost while satisfying demand at each node.

14. Network flow problems
[Figure: the same directed graph on nodes 1–8]
• Capacity constraints: p_ij ≤ x_ij ≤ q_ij for all (i, j) ∈ E.
• Balance constraints: Σ_{j:(i,j)∈E} x_ij − Σ_{j:(j,i)∈E} x_ji = b_i for all i ∈ N.
• Minimize total cost: Σ_{(i,j)∈E} c_ij x_ij
We assume Σ_{i∈N} b_i = 0 (a balanced graph). Otherwise, add a dummy node with zero-cost edges to balance the graph.
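A sketch of the resulting LP on a small made-up graph (not the graph from the slides), solved with scipy's linprog:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up graph: node 1 supplies 2 units, node 4 demands 2 units,
# nodes 2 and 3 are relays. Edge capacities are all [0, 2].
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
cost = np.array([1.0, 3.0, 1.0, 3.0, 1.0])   # c_ij per edge
b = np.array([2.0, 0.0, 0.0, -2.0])          # supplies/demands

# Incidence matrix: +1 where an edge leaves a node, -1 where it enters.
A = np.zeros((4, len(edges)))
for k, (i, j) in enumerate(edges):
    A[i - 1, k] = 1.0
    A[j - 1, k] = -1.0

res = linprog(cost, A_eq=A, b_eq=b, bounds=[(0, 2)] * len(edges))
print(res.x)   # optimal flows; integral, since A is totally unimodular
```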

15. Network flow problems
[Figure: the same directed graph on nodes 1–8]
Expanded form: the balance constraints stack into Ax = b, where A is the incidence matrix of the graph and x = (x₁₃, x₂₃, x₂₄, x₃₅, x₃₆, x₄₅, x₅₆, x₅₇, x₆₇, x₆₈, x₇₈):

    ⎡  1   0   0   0   0   0   0   0   0   0   0 ⎤       ⎡ b₁ ⎤
    ⎢  0   1   1   0   0   0   0   0   0   0   0 ⎥       ⎢ b₂ ⎥
    ⎢ −1  −1   0   1   1   0   0   0   0   0   0 ⎥       ⎢ b₃ ⎥
    ⎢  0   0  −1   0   0   1   0   0   0   0   0 ⎥  x =  ⎢ b₄ ⎥
    ⎢  0   0   0  −1   0  −1   1   1   0   0   0 ⎥       ⎢ b₅ ⎥
    ⎢  0   0   0   0  −1   0  −1   0   1   1   0 ⎥       ⎢ b₆ ⎥
    ⎢  0   0   0   0   0   0   0  −1  −1   0   1 ⎥       ⎢ b₇ ⎥
    ⎣  0   0   0   0   0   0   0   0   0  −1  −1 ⎦       ⎣ b₈ ⎦

A = incidence matrix (+1 where an edge leaves a node, −1 where it enters).
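The incidence matrix can be built mechanically from the edge list. A short sketch (the edge ordering matches the column order of x above):

```python
import numpy as np

# Edges of the graph on this slide, in the column order of x.
edges = [(1, 3), (2, 3), (2, 4), (3, 5), (3, 6), (4, 5),
         (5, 6), (5, 7), (6, 7), (6, 8), (7, 8)]
n_nodes = 8

A = np.zeros((n_nodes, len(edges)), dtype=int)
for k, (i, j) in enumerate(edges):
    A[i - 1, k] = 1    # edge (i, j) leaves node i
    A[j - 1, k] = -1   # ... and enters node j

print(A)              # the incidence matrix
print(A.sum(axis=0))  # each column sums to zero
```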

16. Integer solutions

    minimize_x  cᵀx
    subject to: Ax = b
                p ≤ x ≤ q

• If A is a totally unimodular matrix, then whenever the demands b_i and the capacities p_ij, q_ij are integers, the optimal flows x_ij are integers.
• All incidence matrices are totally unimodular.

17. Examples
• Transportation problem: every node is a source or a sink.
• Assignment problem: a transportation problem where each source has supply 1 and each sink has demand 1.
• Transshipment problem: like a transportation problem, but it also has relay nodes (warehouses).
• Shortest path problem: single source, single sink, and the edge costs are the path lengths.
• Max-flow problem: single source, single sink. Add a feedback edge with cost −1 and minimize the total cost.

18. Least squares
• We want to solve Ax = b where A ∈ ℝ^{m×n}.
• Typical case of interest: m > n (overdetermined). If there is no solution to Ax = b, we try instead to have Ax ≈ b.
• The least-squares approach: make the Euclidean norm ‖Ax − b‖ as small as possible.
Standard form:

    minimize_x  ‖Ax − b‖²

It's an unconstrained convex QP.
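A minimal sketch with made-up data, solving the least-squares problem two equivalent ways in numpy (the data and library choice are assumptions):

```python
import numpy as np

# Overdetermined made-up system: 100 equations, 3 unknowns.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))
b = rng.standard_normal(100)

# Least-squares solution: minimizes ||Ax - b||.
x, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)

# Normal-equations form (fine here since A has full column rank).
x_ne = np.linalg.solve(A.T @ A, A.T @ b)
print(np.allclose(x, x_ne))  # True
```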

19. Example: curve-fitting
• We are given noisy data points (x_i, y_i).
• We suspect they are related by y = px² + qx + r.
• Find the p, q, r that best agree with the data.
Writing all the equations y_i ≈ p x_i² + q x_i + r and stacking them:

    ⎡ x₁²  x₁  1 ⎤          ⎡ y₁ ⎤
    ⎢ x₂²  x₂  1 ⎥  ⎡ p ⎤   ⎢ y₂ ⎥
    ⎢  ⋮    ⋮   ⋮ ⎥  ⎢ q ⎥ ≈ ⎢  ⋮ ⎥
    ⎣ xₘ²  xₘ  1 ⎦  ⎣ r ⎦   ⎣ yₘ ⎦

• This is also called regression.
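The stacked system is solved like any least-squares problem. A sketch with synthetic data (the true coefficients 2, −1, 1 are made up for the demonstration):

```python
import numpy as np

# Noisy made-up data from y = 2x^2 - x + 1.
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = 2 * x**2 - x + 1 + 0.1 * rng.standard_normal(50)

# Stack the equations y_i ~ p x_i^2 + q x_i + r into a matrix.
M = np.column_stack([x**2, x, np.ones_like(x)])
(p, q, r), *_ = np.linalg.lstsq(M, y, rcond=None)
print(p, q, r)  # close to 2, -1, 1
```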

20. Regularization
Regularization: an additional penalty term added to the cost function to encourage a solution with desirable properties.
Regularized least squares:

    minimize_x  ‖Ax − b‖² + λR(x)

• R(x) is the regularizer (penalty function)
• λ is the regularization parameter
• The model has different names depending on R(x).

21. Examples

    minimize_x  ‖Ax − b‖² + λR(x)

1. If R(x) = ‖x‖² = x₁² + x₂² + · · · + x_n², it is called L2 regularization, Tikhonov regularization, or ridge regression depending on the application. It has the effect of smoothing the solution.
2. If R(x) = ‖x‖₁ = |x₁| + |x₂| + · · · + |x_n|, it is called L1 regularization or LASSO. It has the effect of sparsifying the solution (x̂ will have few nonzero entries).
3. If R(x) = ‖x‖_∞ = max{|x₁|, |x₂|, . . . , |x_n|}, it is called L∞ regularization and it has the effect of equalizing the solution (it makes most components equal in magnitude).
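A side-by-side sketch of ridge and LASSO in cvxpy on made-up data (library, data, and the value of λ are all assumptions):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
lam = 10.0

x = cp.Variable(20)

ridge = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b) + lam * cp.sum_squares(x)))
ridge.solve()
x_ridge = x.value

lasso = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b) + lam * cp.norm(x, 1)))
lasso.solve()
x_lasso = x.value

# LASSO tends to zero out entries; ridge only shrinks them.
print((np.abs(x_ridge) < 1e-6).sum(), (np.abs(x_lasso) < 1e-6).sum())
```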

22. Tradeoffs
• Suppose J₁ = ‖Ax − b‖² and J₂ = ‖Cx − d‖².
• We would like to make both J₁ and J₂ small.
• A sensible approach: solve the optimization problem

      minimize_x  J₁ + λJ₂

  where λ > 0 is a (fixed) tradeoff parameter.
• Then tune λ to explore the possible results.
  ◦ When λ → 0, we place more weight on J₁.
  ◦ When λ → ∞, we place more weight on J₂.
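Sweeping λ traces out the tradeoff numerically. A sketch with made-up data (see the Pareto curve on the next slide):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
A, b = rng.standard_normal((30, 10)), rng.standard_normal(30)
C, d = rng.standard_normal((30, 10)), rng.standard_normal(30)

x = cp.Variable(10)
J1 = cp.sum_squares(A @ x - b)
J2 = cp.sum_squares(C @ x - d)

# Each lambda yields one Pareto-optimal (J1, J2) pair.
for lam in np.logspace(-3, 3, 7):
    cp.Problem(cp.Minimize(J1 + lam * J2)).solve()
    print(f"lambda={lam:9.3f}  J1={J1.value:8.3f}  J2={J2.value:8.3f}")
```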

23. Pareto curve
[Figure: Pareto curve in the (J₁, J₂) plane, traced from λ → 0 to λ → ∞. Points above the curve are feasible but strictly suboptimal; points below it are infeasible; the curve itself consists of the Pareto-optimal points.]
• At a Pareto-optimal point, we can only improve J₁ at the expense of J₂, or vice versa.

24. Example: Min-norm least squares
Underdetermined case: A ∈ ℝ^{m×n} is a wide matrix (m ≤ n), so Ax = b has infinitely many solutions.
• Look to make both ‖Ax − b‖² and ‖x‖² small:

      minimize_x  ‖Ax − b‖² + λ‖x‖²

• In the limit λ → ∞, we get x = 0.
• In the limit λ → 0, we get the min-norm solution:

      minimize_x  ‖x‖²
      subject to: Ax = b
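The min-norm solution is computable directly via the pseudoinverse. A sketch on a made-up wide system:

```python
import numpy as np

# Wide made-up system: 3 equations, 8 unknowns (infinitely many solutions).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 8))
b = rng.standard_normal(3)

# The pseudoinverse gives the minimum-norm solution of Ax = b.
x_mn = np.linalg.pinv(A) @ b
print(np.allclose(A @ x_mn, b))   # True: it solves the system

# Any other solution (min-norm plus a nullspace component) has larger norm.
x_other = x_mn + (np.eye(8) - np.linalg.pinv(A) @ A) @ rng.standard_normal(8)
print(np.allclose(A @ x_other, b))                      # also a solution
print(np.linalg.norm(x_other) >= np.linalg.norm(x_mn))  # True
```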

25. Duality
Intuition: duality is all about finding bounds on the optimal value.
• If the primal problem is a minimization, every feasible point of the primal gives an upper bound on the optimal value.
• The dual problem is a maximization: every feasible point of the dual gives a lower bound on the optimal value.
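A numeric sketch of these bounds on a tiny made-up LP, using scipy's linprog (the LP and library are assumptions; for LPs the two bounds meet at the optimum):

```python
import numpy as np
from scipy.optimize import linprog

# primal: min c.x  s.t. Ax >= b, x >= 0
# dual:   max b.y  s.t. A'y <= c, y >= 0
c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])
b = np.array([3.0, 4.0])

# linprog uses <= constraints and x >= 0 bounds by default.
primal = linprog(c, A_ub=-A, b_ub=-b)   # Ax >= b written as -Ax <= -b
dual = linprog(-b, A_ub=A.T, b_ub=c)    # maximize b.y via minimizing -b.y
print(primal.fun, -dual.fun)            # equal: strong duality for LPs

# Any feasible dual point, e.g. y = (1, 0), gives a lower bound b.y.
y = np.array([1.0, 0.0])
print(b @ y <= primal.fun)              # True (weak duality)
```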
