SLIDE 1
Convex Optimization by Stephen Boyd and Lieven Vandenberghe
Optimization for Machine Learning by Suvrit Sra, Sebastian Nowozin, and Stephen J. Wright
Convex Optimization, Chapter 1: Introduction
SLIDE 2
SLIDE 3
Mathematical optimization
(mathematical) optimization problem

minimize   f₀(x)
subject to fᵢ(x) ≤ bᵢ,  i = 1, . . . , m

x = (x₁, . . . , xₙ): optimization variables
f₀ : Rⁿ → R: objective function
fᵢ : Rⁿ → R, i = 1, . . . , m: constraint functions

optimal solution x∗ has the smallest value of f₀ among all vectors that satisfy the constraints
SLIDE 4
Example
portfolio optimization
variables: amounts invested in different assets
constraints: budget, max./min. investment per asset, minimum return
objective: overall risk or return variance

data fitting
variables: model parameters
constraints: prior information, parameter limits
objective: measure of misfit or prediction error
SLIDE 5
Solving optimization problems
general optimization problem
very difficult to solve
methods involve some compromise, e.g., very long computation time, or not always finding the solution

exceptions: certain problem classes can be solved efficiently and reliably
least-squares problems
linear programming problems
convex optimization problems
SLIDE 6
Least-squares
minimize ‖Ax − b‖₂²

solving least-squares problems
analytical solution: x∗ = (AᵀA)⁻¹Aᵀb
reliable and efficient algorithms and software
computation time proportional to n²k (A ∈ Rᵏˣⁿ); less if structured
a mature technology

using least-squares
least-squares problems are easy to recognize
a few standard techniques increase flexibility (e.g., including weights, adding regularization terms)
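As a quick illustration, a minimal NumPy sketch (with hypothetical random data for A and b) comparing the analytical formula above to a library least-squares solver:

```python
import numpy as np

# Hypothetical data: A ∈ R^{k×n}, b ∈ R^k, drawn at random for illustration.
rng = np.random.default_rng(0)
k, n = 100, 10
A = rng.standard_normal((k, n))
b = rng.standard_normal(k)

# Library solver: uses an orthogonal factorization, which is preferred in
# practice over forming (A^T A)^{-1} A^T b explicitly.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)

# The analytical formula from the slide gives the same minimizer.
x_formula = np.linalg.solve(A.T @ A, A.T @ b)
print(np.allclose(x_star, x_formula))  # True
```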
SLIDE 7
Linear programming
minimize   cᵀx
subject to aᵢᵀx ≤ bᵢ,  i = 1, . . . , m

solving linear programs
no analytical formula for solution
reliable and efficient algorithms and software
computation time proportional to n²m if m ≥ n; less with structure
a mature technology

using linear programming
a few standard tricks used to convert problems into linear programs (e.g., problems involving ℓ₁- or ℓ∞-norms, piecewise-linear functions)
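A minimal sketch of solving such an LP with SciPy's linprog, on hypothetical random data (the box bounds on x are added only to keep the random instance bounded):

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical instance of: minimize c^T x subject to a_i^T x ≤ b_i.
rng = np.random.default_rng(0)
m, n = 20, 5
A_ub = rng.standard_normal((m, n))
b_ub = rng.standard_normal(m) + 5.0      # shifted so x = 0 is feasible
c = rng.standard_normal(n)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(-1, 1)] * n)
print(res.status, res.x)                 # status 0 means success
```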
SLIDE 8
Chebyshev approximation problem
minimize maxᵢ₌₁,...,ₖ |aᵢᵀx − bᵢ|

equivalent linear program:
minimize    t
subject to  aᵢᵀx − t ≤ bᵢ,   i = 1, . . . , k
           −aᵢᵀx − t ≤ −bᵢ,  i = 1, . . . , k
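A sketch of this reformulation in code, stacking the two constraint families into a single system over the variables z = (x, t), with hypothetical random data:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
k, n = 30, 4
A = rng.standard_normal((k, n))
b = rng.standard_normal(k)

# Variables z = (x, t); the objective picks out t.
c = np.r_[np.zeros(n), 1.0]
A_ub = np.vstack([np.hstack([A, -np.ones((k, 1))]),    #  a_i^T x − t ≤  b_i
                  np.hstack([-A, -np.ones((k, 1))])])  # −a_i^T x − t ≤ −b_i
b_ub = np.r_[b, -b]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))
x, t = res.x[:n], res.x[-1]
print(t, np.max(np.abs(A @ x - b)))      # the two values agree at the optimum
```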
SLIDE 9
Convex optimization problem
minimize   f₀(x)
subject to fᵢ(x) ≤ bᵢ,  i = 1, . . . , m

objective and constraint functions are convex:
fᵢ(αx + βy) ≤ αfᵢ(x) + βfᵢ(y)  if α + β = 1, α ≥ 0, β ≥ 0

includes least-squares problems and linear programs as special cases
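In practice such problems are usually stated through a modeling layer; a minimal CVXPY sketch (hypothetical data, a constrained least-squares instance) showing the objective/constraint structure above:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)

x = cp.Variable(10)
objective = cp.Minimize(cp.sum_squares(A @ x - b))  # convex objective f0
constraints = [cp.norm(x, 1) <= 2, x >= -1]         # convex constraints fi
prob = cp.Problem(objective, constraints)
prob.solve()
print(prob.value)
```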
SLIDE 10
Convex optimization problem
solving convex optimization problems
no analytical solution
reliable and efficient algorithms
computation time (roughly) proportional to max{n³, n²m, F}, where F is the cost of evaluating the fᵢ's and their first and second derivatives
almost a technology

using convex optimization
often difficult to recognize
many tricks for transforming problems into convex form
surprisingly many problems can be solved via convex optimization
SLIDE 11
Nonlinear optimization
traditional techniques for general nonconvex problems involve compromises

local optimization methods (nonlinear programming)
find a point that minimizes f₀ among feasible points near it
fast, can handle large problems
require initial guess
provide no information about distance to (global) optimum

global optimization methods
find the (global) solution
worst-case complexity grows exponentially with problem size

these algorithms are often based on solving convex subproblems
SLIDE 12
Optimization and Machine Learning
soft-margin support vector machine (SVM), primal form:

minimize over w, b, ξ:  (1/2)wᵀw + C Σᵢ₌₁ᵐ ξᵢ
subject to  yᵢ(wᵀxᵢ + b) ≥ 1 − ξᵢ,  ξᵢ ≥ 0,  1 ≤ i ≤ m

its dual:

minimize over α:  (1/2)αᵀYXᵀXYα − αᵀ1
subject to  Σᵢ yᵢαᵢ = 0,  0 ≤ αᵢ ≤ C

where Y = Diag(y₁, . . . , yₘ) and X = [x₁, . . . , xₘ] ∈ Rⁿˣᵐ

w = Σᵢ₌₁ᵐ yᵢαᵢxᵢ
f(x) = sgn(wᵀx + b)
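A minimal sketch on toy two-class data using scikit-learn's SVC, which solves (a variant of) this dual internally; its dual_coef_ attribute stores the products yᵢαᵢ for the support vectors, so w can be recovered as above:

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated Gaussian blobs, labels ±1 (hypothetical toy data).
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((50, 2)) + 2,
               rng.standard_normal((50, 2)) - 2])
y = np.r_[np.ones(50), -np.ones(50)]

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# w = sum_i y_i alpha_i x_i over the support vectors.
w = clf.dual_coef_ @ clf.support_vectors_
print(w, clf.intercept_)
```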
SLIDE 13
More powerful classifiers allow kernels: Kᵢⱼ := ⟨φ(xᵢ), φ(xⱼ)⟩ is the kernel matrix.

w = Σᵢ₌₁ᵐ yᵢαᵢφ(xᵢ)
f(x) = sgn[Σᵢ₌₁ᵐ yᵢαᵢK(xᵢ, x) + b]
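A sketch of the kernel trick with an RBF kernel on a toy problem: φ(x) is never formed, only kernel evaluations K(xᵢ, x); the last lines reproduce f(x) by hand from the fitted dual coefficients:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

# A circle-separable toy problem: not linearly separable in R^2.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = np.sign(X[:, 0] ** 2 + X[:, 1] ** 2 - 1.0)

clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)

# f(x) = sgn( sum_i y_i alpha_i K(x_i, x) + b ), summed over support vectors.
K = rbf_kernel(X[:5], clf.support_vectors_, gamma=1.0)
f = np.sign(K @ clf.dual_coef_.ravel() + clf.intercept_)
print(np.array_equal(f, clf.predict(X[:5])))  # True
```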
SLIDE 14
Themes of algorithms
General techniques for convex quadratic programming have limited appeal here because of (1) large problem size and (2) an ill-conditioned Hessian. Two themes recur:

1. decomposition: rather than computing a step in all components of α at once, these methods focus on a relatively small subset and fix the other components
2. regularized solutions
SLIDE 15
decomposition approach
Early approach: works with a subset B ⊂ {1, 2, . . . , s}, whose size is assumed to exceed the number of nonzero components of α; replaces one element of B at each iteration and then re-solves the reduced problem.

Sequential minimal optimization (SMO): works with just two components of α at each iteration, reducing each QP subproblem to triviality (a sketch of the two-variable step follows).
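A minimal sketch of the analytic two-variable step at the heart of SMO (a Platt-style update; a full solver would also maintain the bias b and a working-set selection heuristic):

```python
import numpy as np

def smo_pair_update(i, j, alpha, y, K, b, C):
    """One SMO step: re-optimize the dual over (alpha_i, alpha_j) in closed
    form, holding all other components fixed."""
    # Prediction errors E_k = f(x_k) − y_k under the current (alpha, b).
    f = K @ (alpha * y) + b
    E_i, E_j = f[i] - y[i], f[j] - y[j]

    # Feasible segment for alpha_j implied by sum_k y_k alpha_k = 0
    # together with the box constraints 0 ≤ alpha ≤ C.
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])

    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]  # curvature along the segment
    if eta <= 0 or L >= H:
        return alpha                          # degenerate pair; pick another

    alpha = alpha.copy()
    a_j = np.clip(alpha[j] + y[j] * (E_i - E_j) / eta, L, H)
    alpha[i] += y[i] * y[j] * (alpha[j] - a_j)
    alpha[j] = a_j
    return alpha
```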
SLIDE 16
decomposition approach
SVMlight: uses a linearization of the objective around the current point to choose the working set B as the indices most likely to give descent, with a fixed limit on the size of B. Shrinking reduces the workload further by eliminating computation associated with components of α that appear to be at their lower or upper bounds. The computation per iteration is more complex, however, so further computational savings are needed.

Interior-point methods: hardly efficient on large problems (due to ill-conditioning of the kernel matrix). Remedies:
replace the Hessian with a low-rank approximation (VVᵀ, where V ∈ Rᵐˣʳ for r ≪ m), as sketched below
coordinate relaxation procedures
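A sketch of the low-rank replacement K ≈ VVᵀ via a truncated eigendecomposition on a small hypothetical kernel matrix (on large problems a Nyström-type sampling scheme is the usual cheaper route to V):

```python
import numpy as np

# Hypothetical RBF kernel matrix on m = 200 random points.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-0.5 * sq_dists)

# Keep the r largest eigenpairs: K ≈ V V^T with V ∈ R^{m×r}, r ≪ m.
r = 10
vals, vecs = np.linalg.eigh(K)               # ascending eigenvalues
V = vecs[:, -r:] * np.sqrt(np.maximum(vals[-r:], 0.0))
print(np.linalg.norm(K - V @ V.T) / np.linalg.norm(K))  # small relative error
```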
SLIDE 17
regularized solutions
1. Regularized solutions generalize better.
2. Regularized solutions provide simplicity (w is sparse).

minimize over w:  φγ(w) = f(w) + γ r(w)

for the SVM: f(w) = Σᵢ ξᵢ, r(w) = (1/2)wᵀw, γ = 1/C

trade-off between minimizing the misclassification error and reducing ‖w‖²
SLIDE 18
Applications
Image denoising:
r: total-variation (TV) norm
result: large areas of constant intensity (a cartoon-like appearance)

Matrix completion:
W is the matrix variable
regularizer: the nuclear norm, the sum of the singular values of W
this regularizer favors matrices with low rank

Lasso procedure:
r: ℓ₁-norm
f: least squares (sketch below)
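A minimal lasso sketch with scikit-learn on hypothetical data with a sparse ground truth, illustrating how the ℓ₁ regularizer zeroes out most coefficients:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))
w_true = np.zeros(50)
w_true[:5] = 1.0                                   # only 5 active features
b = A @ w_true + 0.01 * rng.standard_normal(100)

model = Lasso(alpha=0.1).fit(A, b)
print(np.count_nonzero(model.coef_))               # few nonzeros survive
```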
SLIDE 19
Algorithm
1. Gradient and subgradient methods:

wₖ₊₁ ← wₖ − δₖgₖ,  gₖ ∈ ∂φγ(wₖ)

these methods have sublinear convergence, e.g. φγ(wₖ) − φγ(w∗) ≤ O(1/√k) for the subgradient method
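A sketch of the subgradient method on the ℓ₁-regularized least-squares objective, with a diminishing step size δₖ = δ₀/√k (the data and constants are illustrative):

```python
import numpy as np

# phi(w) = 0.5 ||Aw − b||^2 + gamma ||w||_1 on hypothetical data.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
b = rng.standard_normal(100)
gamma, w = 0.1, np.zeros(20)

for k in range(1, 2001):
    g = A.T @ (A @ w - b) + gamma * np.sign(w)  # g ∈ ∂phi(w)
    w -= (0.01 / np.sqrt(k)) * g                # delta_k = 0.01 / sqrt(k)

print(0.5 * np.sum((A @ w - b) ** 2) + gamma * np.abs(w).sum())
```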
SLIDE 20
Algorithm
2. Second approach (a prox-linear step):

wₖ₊₁ := arg min over w of  (w − wₖ)ᵀ∇f(wₖ) + γ r(w) + (1/2μ)‖w − wₖ‖₂²

It works well for f with Lipschitz continuous gradient. Sublinear rate of convergence O(1/k); in special cases O(1/k²).
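For r(w) = ‖w‖₁ the arg min above has a closed form (soft-thresholding), which gives the ISTA iteration; a sketch with μ = 1/L, where L is the Lipschitz constant of ∇f:

```python
import numpy as np

def soft_threshold(v, tau):
    # argmin_w  tau*||w||_1 + 0.5*||w − v||^2, applied componentwise
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
b = rng.standard_normal(100)
gamma = 0.1
mu = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L with L = sigma_max(A)^2

w = np.zeros(20)
for _ in range(500):
    grad = A.T @ (A @ w - b)           # ∇f(w) for f(w) = 0.5||Aw − b||^2
    w = soft_threshold(w - mu * grad, mu * gamma)

print(np.count_nonzero(w))             # the prox step returns a sparse iterate
```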