Introduction to Convex Optimization for Machine Learning, John Duchi (PowerPoint presentation)



SLIDE 1

Introduction to Convex Optimization for Machine Learning

John Duchi

University of California, Berkeley

Practical Machine Learning, Fall 2009

Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall 2009 1 / 53

SLIDE 2

Outline

◮ What is Optimization
◮ Convex Sets
◮ Convex Functions
◮ Convex Optimization Problems
◮ Lagrange Duality
◮ Optimization Algorithms
◮ Take Home Messages

SLIDE 3

What is Optimization (and why do we care?)

SLIDES 4-5

What is Optimization?

◮ Finding the minimizer of a function subject to constraints:

    minimize_x f0(x)
    s.t. fi(x) ≤ 0, i ∈ {1, . . . , k}
         hj(x) = 0, j ∈ {1, . . . , l}

◮ Example: Stock market. “Minimize variance of return subject to getting at least $50.”

SLIDE 6

Why do we care?

Optimization is at the heart of many (most practical?) machine learning algorithms.

◮ Linear regression:

    minimize_w ‖Xw − y‖²

◮ Classification (logistic regression or SVM):

    minimize_w Σ_{i=1}^n log(1 + exp(−yi xiᵀ w))

  or

    minimize_w ½‖w‖² + C Σ_{i=1}^n ξi   s.t. ξi ≥ 1 − yi xiᵀ w, ξi ≥ 0.

SLIDE 7

We still care...

◮ Maximum likelihood estimation:

    maximize_θ Σ_{i=1}^n log p_θ(xi)

◮ Collaborative filtering:

    minimize_w Σ_{i≺j} log(1 + exp(wᵀxi − wᵀxj))

◮ k-means:

    minimize_{µ1,...,µk} J(µ) = Σ_{j=1}^k Σ_{i∈Cj} ‖xi − µj‖²

◮ And more (graphical models, feature selection, active learning, control)

SLIDES 8-9

But generally speaking...

We’re screwed.

◮ Local (non-global) minima of f0
◮ All kinds of constraints (even restricting to continuous functions):

    h(x) = sin(2πx) = 0

[figure: surface plot of a wildly non-convex function with many local minima]

◮ Go for convex problems!

SLIDE 10

Convex Sets

SLIDE 11

Convex Sets

Definition. A set C ⊆ Rⁿ is convex if for x, y ∈ C and any α ∈ [0, 1],

    αx + (1 − α)y ∈ C.

[figure: two points x, y joined by a segment lying inside a convex set]

SLIDES 12-14

Examples

◮ All of Rⁿ (obvious)
◮ Non-negative orthant R₊ⁿ: let x ≥ 0, y ≥ 0 (componentwise); clearly αx + (1 − α)y ≥ 0.
◮ Norm balls: let ‖x‖ ≤ 1, ‖y‖ ≤ 1; then

    ‖αx + (1 − α)y‖ ≤ ‖αx‖ + ‖(1 − α)y‖ = α‖x‖ + (1 − α)‖y‖ ≤ 1.

SLIDE 15

Examples

◮ Affine subspaces: if Ax = b and Ay = b, then

    A(αx + (1 − α)y) = αAx + (1 − α)Ay = αb + (1 − α)b = b.

[figure: a plane through points x1, x2, x3]

SLIDE 16

More examples

◮ Arbitrary intersections of convex sets: let Ci be convex for i ∈ I and C = ∩_{i∈I} Ci; then

    x ∈ C, y ∈ C ⇒ αx + (1 − α)y ∈ Ci for all i ∈ I, so αx + (1 − α)y ∈ C.

SLIDE 17

More examples

◮ PSD matrices, a.k.a. the positive semidefinite cone S₊ⁿ ⊂ Rⁿˣⁿ. A ∈ S₊ⁿ means xᵀAx ≥ 0 for all x ∈ Rⁿ. For A, B ∈ S₊ⁿ,

    xᵀ(αA + (1 − α)B)x = αxᵀAx + (1 − α)xᵀBx ≥ 0.

◮ On right:

    S₊² = { [x z; z y] : x ≥ 0, y ≥ 0, xy ≥ z² }

[figure: boundary of the 2×2 PSD cone in (x, y, z) coordinates]

SLIDE 18

Convex Functions

SLIDE 19

Convex Functions

Definition. A function f : Rⁿ → R is convex if for x, y ∈ dom f and any α ∈ [0, 1],

    f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y).

[figure: the chord from (x, f(x)) to (y, f(y)), with αf(x) + (1 − α)f(y) lying above the graph of f]

SLIDE 20

First order convexity conditions

Theorem. Suppose f : Rⁿ → R is differentiable. Then f is convex if and only if for all x, y ∈ dom f,

    f(y) ≥ f(x) + ∇f(x)ᵀ(y − x).

[figure: the tangent line f(x) + ∇f(x)ᵀ(y − x) at (x, f(x)) lying below f(y)]

SLIDE 21

Actually, more general than that

Definition. The subgradient set, or subdifferential set, ∂f(x) of f at x is

    ∂f(x) = { g : f(y) ≥ f(x) + gᵀ(y − x) for all y }.

Theorem. f : Rⁿ → R is convex if and only if it has a non-empty subdifferential set everywhere.

[figure: a supporting line f(x) + gᵀ(y − x) at a kink (x, f(x)) of f]
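The subgradient inequality is easy to sanity-check numerically. A minimal sketch (not from the deck): for f(x) = |x| at the kink x = 0, every g ∈ [−1, 1] is a subgradient; the candidate slopes and the grid of test points are arbitrary illustrative choices.

```python
import numpy as np

# Verify the subgradient inequality f(y) >= f(x) + g*(y - x) for f(x) = |x|
# at the kink x = 0, where every g in [-1, 1] is a subgradient.
# The candidate g's and the grid of y's are illustrative choices.
f = abs
ok = all(
    f(y) >= f(0.0) + g * (y - 0.0) - 1e-12   # small tolerance for float error
    for g in [-1.0, -0.3, 0.0, 0.7, 1.0]
    for y in np.linspace(-2.0, 2.0, 9)
)
# ok is True; by contrast g = 1.5 would violate the inequality at y = 2,
# since |2| = 2 < 1.5 * 2 = 3.
```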

SLIDE 22

Second order convexity conditions

Theorem. Suppose f : Rⁿ → R is twice differentiable. Then f is convex if and only if for all x ∈ dom f, ∇²f(x) ⪰ 0.

[figure: a convex bowl-shaped quadratic surface]

SLIDE 23

Convex sets and convex functions

Definition. The epigraph of a function f is the set of points epi f = {(x, t) : f(x) ≤ t}.

◮ epi f is convex if and only if f is convex.
◮ Sublevel sets {x : f(x) ≤ a} are convex for convex f.

[figure: the epigraph epi f of a convex f, with a sublevel set at height a]

SLIDES 24-25

Examples

◮ Linear/affine functions: f(x) = bᵀx + c.
◮ Quadratic functions: f(x) = ½xᵀAx + bᵀx + c for A ⪰ 0. For regression:

    ½‖Xw − y‖² = ½wᵀXᵀXw − yᵀXw + ½yᵀy.

SLIDES 26-28

More examples

◮ Norms (like ℓ1 or ℓ2 for regularization):

    ‖αx + (1 − α)y‖ ≤ ‖αx‖ + ‖(1 − α)y‖ = α‖x‖ + (1 − α)‖y‖.

◮ Composition with an affine function, f(Ax + b):

    f(A(αx + (1 − α)y) + b) = f(α(Ax + b) + (1 − α)(Ay + b))
                            ≤ αf(Ax + b) + (1 − α)f(Ay + b)

◮ Log-sum-exp (via ∇²f(x) PSD):

    f(x) = log Σ_{i=1}^n exp(xi)

SLIDE 29

Important examples in Machine Learning

◮ SVM loss:

    f(w) = [1 − yi xiᵀ w]₊

◮ Binary logistic loss:

    f(w) = log(1 + exp(−yi xiᵀ w))

[figure: plots of the hinge loss [1 − x]₊ and the logistic loss log(1 + eˣ)]

SLIDE 30

Convex Optimization Problems

Definition. An optimization problem is convex if its objective is a convex function, the inequality constraints fi are convex, and the equality constraints hj are affine:

    minimize_x f0(x)      (convex function)
    s.t. fi(x) ≤ 0        (convex sets)
         hj(x) = 0        (affine)

SLIDE 31

It’s nice to be convex

Theorem. If x̂ is a local minimizer of a convex optimization problem, it is a global minimizer.

[figure: a convex function over a convex feasible set, with unique minimizer x*]

SLIDE 32

Even more reasons to be convex

Theorem. For differentiable convex f, ∇f(x) = 0 if and only if x is a global minimizer of f.

Proof.
◮ If ∇f(x) = 0: for any y, f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) = f(x).
◮ If ∇f(x) ≠ 0: there is a direction of descent, so x is not a minimizer.

SLIDE 33

LET’S TAKE A BREAK

SLIDE 34

Lagrange Duality

SLIDE 35

Goals of Lagrange Duality

◮ Get a certificate for optimality of a problem
◮ Remove constraints
◮ Reformulate the problem

SLIDE 36

Constructing the dual

◮ Start with the optimization problem:

    minimize_x f0(x)
    s.t. fi(x) ≤ 0, i ∈ {1, . . . , k}
         hj(x) = 0, j ∈ {1, . . . , l}

◮ Form the Lagrangian using Lagrange multipliers λi ≥ 0, νj ∈ R:

    L(x, λ, ν) = f0(x) + Σ_{i=1}^k λi fi(x) + Σ_{j=1}^l νj hj(x)

◮ Form the dual function:

    g(λ, ν) = inf_x L(x, λ, ν) = inf_x { f0(x) + Σ_{i=1}^k λi fi(x) + Σ_{j=1}^l νj hj(x) }

SLIDE 37

Remarks

◮ The original problem is equivalent to

    minimize_x sup_{λ⪰0, ν} L(x, λ, ν)

◮ The dual problem switches the min and the max:

    maximize_{λ⪰0, ν} inf_x L(x, λ, ν).

SLIDE 38

One Great Property of the Dual

Lemma (Weak Duality). If λ ⪰ 0, then g(λ, ν) ≤ f0(x*).

Proof. We have

    g(λ, ν) = inf_x L(x, λ, ν) ≤ L(x*, λ, ν)
            = f0(x*) + Σ_{i=1}^k λi fi(x*) + Σ_{j=1}^l νj hj(x*) ≤ f0(x*),

since fi(x*) ≤ 0 with λi ≥ 0, and hj(x*) = 0.

SLIDE 39

The Greatest Property of the Dual

Theorem. For reasonable¹ convex problems,

    sup_{λ⪰0, ν} g(λ, ν) = f0(x*).

¹There are conditions, called constraint qualifications, under which this is true.

SLIDE 40

Geometric Look

Minimize ½(x − c − 1)² subject to x² ≤ c.

[figure, left: true function (blue), constraint (green), L(x, λ) for different λ (dotted)]
[figure, right: dual function g(λ) (black), primal optimal value (dotted blue)]

SLIDES 41-43

Intuition

Can interpret duality as linear approximation.

◮ Let I₋(a) = ∞ if a > 0 and 0 otherwise, and let I₀(a) = ∞ unless a = 0, in which case I₀(a) = 0. Rewrite the problem as

    minimize_x f0(x) + Σ_{i=1}^k I₋(fi(x)) + Σ_{j=1}^l I₀(hj(x))

◮ Replace I₋(fi(x)) with λi fi(x), a measure of “displeasure” when λi ≥ 0 and fi(x) > 0; νj hj(x) lower-bounds I₀(hj(x)):

    minimize_x f0(x) + Σ_{i=1}^k λi fi(x) + Σ_{j=1}^l νj hj(x)

SLIDE 44

Example: Linearly constrained least squares

    minimize_x ½‖Ax − b‖²  s.t. Bx = d.

Form the Lagrangian:

    L(x, ν) = ½‖Ax − b‖² + νᵀ(Bx − d)

Take the infimum:

    ∇ₓL(x, ν) = AᵀAx − Aᵀb + Bᵀν = 0  ⇒  x = (AᵀA)⁻¹(Aᵀb − Bᵀν)

A simple unconstrained quadratic problem!

    inf_x L(x, ν) = ½‖A(AᵀA)⁻¹(Aᵀb − Bᵀν) − b‖² + νᵀ(B(AᵀA)⁻¹(Aᵀb − Bᵀν) − d)
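The recipe above can be carried out numerically: solve a small linear system for the multiplier ν that makes x(ν) feasible, then recover x. A sketch (not from the deck; A, B, b, d are illustrative choices with A of full column rank):

```python
import numpy as np

# Linearly constrained least squares via the dual:
# x(nu) = (A^T A)^(-1) (A^T b - B^T nu); choose nu so that B x(nu) = d.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])
B = np.array([[1.0, 1.0]])     # constraint: x1 + x2 = 2
d = np.array([2.0])

AtA_inv = np.linalg.inv(A.T @ A)
# Solve B (A^T A)^(-1) (A^T b - B^T nu) = d for nu (a tiny linear system).
rhs = B @ AtA_inv @ (A.T @ b) - d
nu = np.linalg.solve(B @ AtA_inv @ B.T, rhs)
x = AtA_inv @ (A.T @ b - B.T @ nu)
# x is feasible (Bx = d) and stationary: A^T A x - A^T b + B^T nu = 0.
```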

SLIDE 45

Example: Quadratically constrained least squares

    minimize_x ½‖Ax − b‖²  s.t. ½‖x‖² ≤ c.

Form the Lagrangian (λ ≥ 0):

    L(x, λ) = ½‖Ax − b‖² + ½λ(‖x‖² − 2c)

Take the infimum:

    ∇ₓL(x, λ) = AᵀAx − Aᵀb + λx = 0  ⇒  x = (AᵀA + λI)⁻¹Aᵀb

    inf_x L(x, λ) = ½‖A(AᵀA + λI)⁻¹Aᵀb − b‖² + (λ/2)‖(AᵀA + λI)⁻¹Aᵀb‖² − λc

A one-variable dual problem!

    g(λ) = −½ bᵀA(AᵀA + λI)⁻¹Aᵀb − λc + ½‖b‖².
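The formulas above are easy to check on a concrete instance. A sketch (not from the deck; A, b, c are illustrative, chosen so the dual optimum sits at λ = 1 and strong duality is visible numerically):

```python
import numpy as np

# Dual of quadratically constrained least squares, following the slide:
# x(lam) = (A^T A + lam*I)^(-1) A^T b and
# g(lam) = -1/2 b^T A (A^T A + lam*I)^(-1) A^T b - lam*c + 1/2 ||b||^2.
A = np.eye(2)
b = np.array([2.0, 0.0])
c = 0.5                      # constraint: 1/2 ||x||^2 <= c, i.e. ||x|| <= 1

def g(lam):
    M = np.linalg.inv(A.T @ A + lam * np.eye(2))
    return -0.5 * b @ A @ M @ A.T @ b - lam * c + 0.5 * b @ b

lam = 1.0                                                  # dual-optimal here
x = np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ b)    # x(1) = (1, 0)
primal = 0.5 * np.linalg.norm(A @ x - b) ** 2              # = 1/2
# g(1) equals the primal optimal value 1/2 (strong duality), and any other
# lam >= 0 gives g(lam) <= 1/2 (weak duality).
```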

SLIDES 46-47

Uses of the Dual

◮ Main use: certificate of optimality (a.k.a. duality gap). If we have a feasible x and know the dual value g(λ, ν), then

    g(λ, ν) ≤ f0(x*) ≤ f0(x)
    ⇒ f0(x*) − f0(x) ≥ g(λ, ν) − f0(x)
    ⇒ f0(x) − f0(x*) ≤ f0(x) − g(λ, ν).

◮ Also used in more advanced primal-dual algorithms (we won’t talk about these).

SLIDE 48

Optimization Algorithms

SLIDE 49

Gradient Descent

The simplest algorithm in the world (almost). Goal:

    minimize_x f(x)

Just iterate

    x_{t+1} = x_t − η_t ∇f(x_t)

where η_t is the stepsize.
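The iteration above fits in a few lines of NumPy. A minimal sketch (not from the deck; the quadratic objective, constant stepsize, and iteration count are illustrative choices):

```python
import numpy as np

# Gradient descent on f(x) = 1/2 ||Ax - b||^2 with a fixed stepsize.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return A.T @ (A @ x - b)   # gradient of f: A^T (Ax - b)

x = np.zeros(2)
eta = 0.1                      # constant stepsize eta_t = 0.1
for _ in range(500):
    x = x - eta * grad(x)
# The minimizer solves A^T A x = A^T b, i.e. x* = (0.5, 1.0) here.
```

In practice the stepsize would come from a line search (next slides) rather than being fixed by hand.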

SLIDE 50

Single Step Illustration

[figure: f(x) with the linear model f(x_t) − η∇f(x_t)ᵀ(x − x_t) drawn at the point (x_t, f(x_t))]

SLIDE 51

Full Gradient Descent

    f(x) = log(exp(x1 + 3x2 − 0.1) + exp(x1 − 3x2 − 0.1) + exp(−x1 − 0.1))

[figure: gradient descent iterates on the level sets of f]

SLIDES 52-53

Stepsize Selection

How do I choose a stepsize?

◮ Idea 1: exact line search

    η_t = argmin_η f(x − η∇f(x))

  Too expensive to be practical.

◮ Idea 2: backtracking (Armijo) line search. Let α ∈ (0, ½), β ∈ (0, 1). Multiply η by β (η ← βη) until

    f(x − η∇f(x)) ≤ f(x) − αη‖∇f(x)‖².

  Works well in practice.
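The backtracking rule above translates directly into code. A sketch (not from the deck; the test function and the constants α = 0.3, β = 0.5 are illustrative choices within the stated ranges):

```python
import numpy as np

# Backtracking (Armijo) line search, following the slide's rule:
# shrink eta by beta until f(x - eta*g) <= f(x) - alpha*eta*||g||^2.
def backtrack(f, grad_f, x, alpha=0.3, beta=0.5, eta0=1.0):
    g = grad_f(x)
    eta = eta0
    while f(x - eta * g) > f(x) - alpha * eta * np.dot(g, g):
        eta *= beta
    return eta

f = lambda x: 5.0 * np.dot(x, x)      # a steep quadratic, f(x) = 5||x||^2
grad_f = lambda x: 10.0 * x
x = np.array([1.0, 1.0])
eta = backtrack(f, grad_f, x)         # eta = 0.125 for this instance
# The accepted step strictly decreases f.
```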

SLIDE 54

Illustration of Armijo/Backtracking Line Search

[figure: f(x − η∇f(x)) and the line f(x) − αη‖∇f(x)‖², both as functions of the stepsize η]

As a function of the stepsize η, there is clearly a region where f(x − η∇f(x)) lies below the line f(x) − αη‖∇f(x)‖².

SLIDE 55

Newton’s method

Idea: use a second-order approximation to the function:

    f(x + Δx) ≈ f(x) + ∇f(x)ᵀΔx + ½Δxᵀ∇²f(x)Δx

Choose Δx to minimize the approximation:

    Δx = −[∇²f(x)]⁻¹∇f(x)

This is a descent direction:

    ∇f(x)ᵀΔx = −∇f(x)ᵀ[∇²f(x)]⁻¹∇f(x) < 0.
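A Newton step is one linear solve. A sketch (not from the deck; the quadratic f(x) = ½xᵀAx − bᵀx with illustrative A, b, on which a single Newton step is exact):

```python
import numpy as np

# One Newton step: dx = -[hessian]^(-1) gradient.
# For a quadratic f(x) = 1/2 x^T A x - b^T x, one step lands on the minimizer.
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 1.0])

grad = lambda x: A @ x - b               # gradient of f
hess = lambda x: A                       # Hessian of f (constant here)

x = np.array([5.0, -7.0])                # arbitrary starting point
dx = -np.linalg.solve(hess(x), grad(x))  # solve, don't invert explicitly
x = x + dx
# x now solves Ax = b exactly: Newton is exact on quadratics.
```

For non-quadratic f the step is combined with a line search, as on the previous slides.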

SLIDE 56

Newton step picture

[figure: f and its second-order approximation f̂; the step moves from (x, f(x)) to (x + Δx, f(x + Δx))]

f̂ is the 2nd-order approximation, f is the true function.

SLIDE 57

Convergence of gradient descent and Newton’s method

◮ Strongly convex case: ∇²f(x) ⪰ mI gives “linear convergence”: for some γ ∈ (0, 1),

    f(x_t) − f(x*) ≤ γᵗ,

  so on the order of log(1/ε) iterations suffice: t ≥ log(1/ε)/log(1/γ) ⇒ f(x_t) − f(x*) ≤ ε.

◮ Smooth case: ‖∇f(x) − ∇f(y)‖ ≤ C‖x − y‖ gives

    f(x_t) − f(x*) ≤ K/t²

  (the rate achieved by accelerated first-order methods).

◮ Newton’s method is often faster, especially when f has “long valleys”.

SLIDES 58-59

What about constraints?

◮ Linear constraints Ax = b are easy. For example, in Newton’s method (assuming Ax = b):

    minimize_Δx ∇f(x)ᵀΔx + ½Δxᵀ∇²f(x)Δx  s.t. AΔx = 0.

  The solution Δx satisfies A(x + Δx) = Ax + AΔx = b.

◮ Inequality constraints are a bit tougher:

    minimize_Δx ∇f(x)ᵀΔx + ½Δxᵀ∇²f(x)Δx  s.t. fi(x + Δx) ≤ 0

  is just as hard as the original problem.

SLIDE 60

Logarithmic Barrier Methods

Goal:

    minimize_x f0(x)  s.t. fi(x) ≤ 0, i ∈ {1, . . . , k}

Convert to

    minimize_x f0(x) + Σ_{i=1}^k I₋(fi(x))

Approximate I₋(u) ≈ −t log(−u) for small t:

    minimize_x f0(x) − t Σ_{i=1}^k log(−fi(x))
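A toy instance makes the t → 0 behavior concrete. A sketch (not from the deck; the one-dimensional problem, the closed-form inner minimization, and the t-schedule are all illustrative choices):

```python
# Barrier method sketch: minimize x subject to x >= 1,
# i.e. f0(x) = x and f1(x) = 1 - x <= 0, via the barrier problem
#     minimize x - t*log(x - 1),
# whose minimizer we can write down in closed form, then shrink t.
def barrier_minimizer(t):
    # d/dx [x - t*log(x - 1)] = 1 - t/(x - 1) = 0  =>  x = 1 + t
    return 1.0 + t

for t in [1.0, 0.1, 0.01, 0.001]:
    x = barrier_minimizer(t)
# x -> 1, the constrained optimum, as t -> 0; each barrier iterate is
# strictly feasible (x > 1), which is why these are "interior point" methods.
```

In a real solver each inner problem is minimized by Newton's method rather than in closed form, warm-starting from the previous t.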

SLIDE 61

The barrier function

[figure: I₋(u) (dotted line) and the approximations −t log(−u) for several values of t]

SLIDE 62

Illustration

Minimizing cᵀx subject to Ax ≤ b.

[figure: barrier minimizers for t = 5 and t = 1 inside the polyhedron, with the objective direction c]

SLIDE 63

Subgradient Descent

Really, the simplest algorithm in the world. Goal:

    minimize_x f(x)

Just iterate

    x_{t+1} = x_t − η_t g_t

where η_t is a stepsize and g_t ∈ ∂f(x_t).
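The method works even when f has kinks. A sketch (not from the deck; the objective f(x) = ‖x‖₁, the starting point, and the η_t ∝ 1/√t schedule are illustrative, and sign(x) is one valid choice of subgradient):

```python
import numpy as np

# Subgradient descent on the non-differentiable f(x) = ||x||_1.
# sign(x) is a valid subgradient (at a coordinate where x_i = 0,
# any value in [-1, 1] works; sign gives 0 there).
def subgrad_l1(x):
    return np.sign(x)

x = np.array([3.0, -2.0])
best = np.sum(np.abs(x))          # track the best value seen, as in the
for t in range(1, 2001):          # convergence analysis on later slides
    eta = 1.0 / np.sqrt(t)        # eta_t proportional to 1/sqrt(t)
    x = x - eta * subgrad_l1(x)
    best = min(best, np.sum(np.abs(x)))
# The iterates oscillate around the optimum x* = 0 at the scale of eta_t,
# so "best" approaches the optimal value f(x*) = 0.
```

Note that subgradient steps are not descent steps, which is why the analysis tracks the best iterate rather than the last one.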

SLIDE 64

Why subgradient descent?

◮ Lots of non-differentiable convex functions are used in machine learning:

    f(x) = [1 − aᵀx]₊,  f(x) = ‖x‖₁,  f(X) = Σ_{r=1}^k σ_r(X),

  where σ_r is the rth singular value of X.

◮ Easy to analyze
◮ Do not even need a true subgradient: just need E[g_t] ∈ ∂f(x_t).

SLIDES 65-66

Proof of convergence for subgradient descent

Idea: bound ‖x_{t+1} − x*‖ using the subgradient inequality. Assume that ‖g_t‖ ≤ G.

    ‖x_{t+1} − x*‖² = ‖x_t − ηg_t − x*‖² = ‖x_t − x*‖² − 2η g_tᵀ(x_t − x*) + η²‖g_t‖²

Recall that f(x*) ≥ f(x_t) + g_tᵀ(x* − x_t), so −g_tᵀ(x_t − x*) ≤ f(x*) − f(x_t), and

    ‖x_{t+1} − x*‖² ≤ ‖x_t − x*‖² + 2η[f(x*) − f(x_t)] + η²G².

Then

    f(x_t) − f(x*) ≤ (‖x_t − x*‖² − ‖x_{t+1} − x*‖²)/(2η) + (η/2)G².

SLIDES 67-68

Almost done...

Sum from t = 1 to T:

    Σ_{t=1}^T [f(x_t) − f(x*)] ≤ (1/(2η)) Σ_{t=1}^T [‖x_t − x*‖² − ‖x_{t+1} − x*‖²] + (Tη/2)G²
                               = (1/(2η))‖x_1 − x*‖² − (1/(2η))‖x_{T+1} − x*‖² + (Tη/2)G²

Now let D = ‖x_1 − x*‖, and keep track of the minimum along the run:

    f(x_best) − f(x*) ≤ D²/(2ηT) + (η/2)G².

Set η = D/(G√T), and

    f(x_best) − f(x*) ≤ DG/√T.

SLIDE 69

Extension: projected subgradient descent

Now we have a convex constraint set X. Goal:

    minimize_{x∈X} f(x)

Idea: take subgradient steps, projecting x_t back onto X at every iteration:

    x_{t+1} = Π_X(x_t − ηg_t)

The proof goes through because ‖Π_X(x_t) − x*‖ ≤ ‖x_t − x*‖ if x* ∈ X.

[figure: a point x_t outside X and its projection Π_X(x_t), with x* inside X]
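A sketch of the projected iteration (not from the deck). The slide's example constrains ‖x‖₁ ≤ 1; here the Euclidean ball is used instead because its projection is a one-liner, and the objective and the point c are illustrative choices:

```python
import numpy as np

# Projected subgradient step for: minimize f(x) s.t. x in X,
# with X = {x : ||x||_2 <= 1}, whose projection has a closed form.
def project_l2_ball(x, radius=1.0):
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

# minimize f(x) = 1/2 ||x - c||^2 over the unit ball
c = np.array([3.0, 0.0])
x = np.zeros(2)
for t in range(1, 501):
    g = x - c                                      # gradient of f
    x = project_l2_ball(x - (1.0 / np.sqrt(t)) * g)
# The constrained optimum is c / ||c|| = (1, 0): the closest feasible
# point to c, which the iterates reach and then stay at.
```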

SLIDES 70-77

Projected subgradient example

    minimize_x ½‖Ax − b‖  s.t. ‖x‖₁ ≤ 1

[figure: successive projected subgradient iterates moving along the boundary of the ℓ1 ball]

SLIDES 78-80

Convergence results for (projected) subgradient methods

◮ Any decreasing, non-summable stepsize (η_t → 0, Σ_{t=1}^∞ η_t = ∞) gives

    f(x_avg(t)) − f(x*) → 0.

◮ A slightly less brain-dead analysis than earlier shows that with η_t ∝ 1/√t,

    f(x_avg(t)) − f(x*) ≤ C/√t.

◮ The same convergence holds when g_t is random, i.e. E[g_t] ∈ ∂f(x_t). Example:

    f(w) = ½‖w‖² + C Σ_{i=1}^n [1 − yi xiᵀ w]₊

  Just pick a random training example.
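The stochastic variant on the SVM objective above can be sketched as follows (not from the deck; the two-point dataset, C, seed, and η_t = 1/t schedule are illustrative, and the sampled term is rescaled by n so its expectation is a subgradient of the full objective):

```python
import numpy as np

# Stochastic subgradient for f(w) = 1/2 ||w||^2 + C * sum_i [1 - y_i x_i^T w]_+,
# sampling one training example per step.
rng = np.random.default_rng(0)
X = np.array([[2.0, 0.0], [-2.0, 0.0]])   # two linearly separable points
y = np.array([1.0, -1.0])
C, n = 1.0, 2

w = np.zeros(2)
for t in range(1, 1001):
    i = rng.integers(n)                    # pick a random training example
    # subgradient of 1/2 ||w||^2 + C*n*[1 - y_i x_i^T w]_+ ,
    # whose expectation over i lies in the subdifferential of f
    g = w.copy()
    if y[i] * X[i] @ w < 1.0:              # hinge term is active
        g -= C * n * y[i] * X[i]
    w -= (1.0 / t) * g
# w hovers near the minimizer w* = (0.5, 0) of f for this dataset.
```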

SLIDE 81

Recap

◮ Defined convex sets and functions
◮ Saw why we want optimization problems to be convex (solvable)
◮ Sketched some of Lagrange duality
◮ First order methods are easy and (often) work well

SLIDE 82

Take Home Messages

◮ Many useful problems can be formulated as convex optimization problems
◮ If it is not convex and not an eigenvalue problem, you are out of luck
◮ If it is convex, you are golden