 
Convex Programs
COMPSCI 371D — Machine Learning
Logistic Regression → Support Vector Machines
Support Vector Machines (SVMs) and Convex Programs
• SVMs are linear predictors in their original form
• Defined for both regression and classification
• Multi-class versions exist
• We will cover only binary SVM classification
• Why do we need another linear classifier?
• We'll need some new math: Convex Programs
• Optimization of convex functions with affine constraints
Outline
1 Logistic Regression → Support Vector Machines
2 Local Convex Minimization → Convex Programs
3 Shape of the Solution Set
4 The Karush-Kuhn-Tucker Conditions
Logistic Regression → SVMs
• A logistic-regression classifier places the decision boundary somewhere (and approximately) between the two classes
• The loss is never zero, so the exact location of the boundary can be influenced by samples that are very distant from the boundary (even on the correct side of it)
• SVMs place the boundary "exactly half-way" between the two classes (with exceptions to allow for classes that are not linearly separable)
• Only samples close to the boundary matter: These are the support vectors
• A "kernel trick" allows going beyond linear classifiers
• We only look at the binary case
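The "loss is never zero" point can be checked numerically. The sketch below assumes labels in {-1, +1} and the usual logistic loss log(1 + exp(-margin)), where the margin is the signed score of a sample; the function name and the sample margins are hypothetical and not from the slides.

```python
import math

def logistic_loss(margin: float) -> float:
    """Logistic loss log(1 + exp(-margin)) for a signed margin y * score."""
    return math.log1p(math.exp(-margin))

# Hypothetical signed margins of correctly classified samples,
# from close to the boundary (0.5) to far from it (30.0).
margins = [0.5, 2.0, 10.0, 30.0]
losses = [logistic_loss(m) for m in margins]

for m, loss in zip(margins, losses):
    print(f"margin {m:5.1f}  loss {loss:.3e}")
```

The loss shrinks as the margin grows but stays positive, so even distant, correctly classified samples still contribute a small pull on the boundary.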
Roadmap for SVMs
• SVM training minimizes a convex function with constraints
• Convex: Unique minimum risk
• Constraints: Define a convex program as minimizing a convex function subject to affine constraints
• Representer theorem: The SVM hyperplane normal vector is a linear combination of a subset of training samples $(x_n, y_n)$. The $x_n$ are the support vectors.
• The proof of the representer theorem is based on a characterization of the solutions of a convex program
• Characterization for an unconstrained problem: $\nabla f(u) = 0$
• Characterization for a convex program: The Karush-Kuhn-Tucker (KKT) conditions
• The representer theorem leads to the kernel trick, through which SVMs can be turned into nonlinear classifiers
• Decision boundary is no longer necessarily a hyperplane
Roadmap Summary
Convex program → SVM formulation
KKT conditions → representer theorem → kernel trick
Local Convex Minimization → Convex Programs
• Convex function $f(u) : \mathbb{R}^m \to \mathbb{R}$
• $f$ differentiable, with continuous first derivatives
• Unconstrained minimization: $u^* \in \arg\min_{u \in \mathbb{R}^m} f(u)$
• Constrained minimization: $u^* \in \arg\min_{u \in C} f(u)$
• $C = \{u \in \mathbb{R}^m : A u + b \geq 0\}$
• $f$ is a convex function
• $C$ is a convex set: If $u, v \in C$, then for $t \in [0, 1]$, $t u + (1 - t) v \in C$
• The specific $C$ is bounded by hyperplanes
• This is a convex program
Convex Program
$u^* \in \arg\min_{u \in C} f(u)$ where $C \stackrel{\text{def}}{=} \{u \in \mathbb{R}^m : c(u) \geq 0\}$.
• $f$ differentiable, with continuous gradient, and convex
• The $k$ inequalities in $C$ are affine: $c(u) = A u + b \geq 0$.
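The feasible set of a convex program can be illustrated with a small sketch. It evaluates c(u) = A u + b, tests membership in C, and checks that points on the segment between two feasible points stay feasible. The unit-square constraints and all helper names are hypothetical choices for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def constraint_values(A: Matrix, b: Vector, u: Vector) -> Vector:
    """Return c(u) = A u + b, one entry per affine constraint."""
    return [sum(a_ij * u_j for a_ij, u_j in zip(row, u)) + b_i
            for row, b_i in zip(A, b)]

def is_feasible(A: Matrix, b: Vector, u: Vector, tol: float = 1e-12) -> bool:
    """True if every constraint satisfies c_i(u) >= 0 (up to tol)."""
    return all(c >= -tol for c in constraint_values(A, b, u))

def convex_combination(u: Vector, v: Vector, t: float) -> Vector:
    """Return t u + (1 - t) v."""
    return [t * ui + (1.0 - t) * vi for ui, vi in zip(u, v)]

# Hypothetical C: the unit square 0 <= u1 <= 1, 0 <= u2 <= 1,
# written as four affine inequalities A u + b >= 0.
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b = [0.0, 0.0, 1.0, 1.0]

u = [0.0, 1.0]
v = [1.0, 0.25]
segment_ok = all(is_feasible(A, b, convex_combination(u, v, k / 10))
                 for k in range(11))
outside = is_feasible(A, b, [1.5, 0.5])
print(segment_ok, outside)
```

Sampling eleven points on one segment is only a spot check, not a proof of convexity; the general argument is the definition of a convex set given above.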
Shape of the Solution Set
• Just as for the unconstrained problem:
• There is one $f^*$ but there can be multiple $u^*$ (a flat valley)
• The set of solution points $u^*$ is convex
• If $f$ is strictly convex at $u^*$, then $u^*$ is the unique solution point
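A one-dimensional sketch of a flat valley may help. The function f below is a hypothetical example (convex, continuously differentiable, and equal to zero on all of [-1, 1]); g is a strictly convex contrast with a single minimizer.

```python
def f(u: float) -> float:
    """Convex and continuously differentiable; flat (zero) on [-1, 1]."""
    return max(0.0, abs(u) - 1.0) ** 2

def g(u: float) -> float:
    """Strictly convex; its only minimizer is u = 0."""
    return u * u

f_star = 0.0            # the single minimum value of f
u_a, u_b = -1.0, 0.5    # two different minimizers of f

# Every convex combination of two minimizers is again a minimizer.
mixes = [t * u_a + (1.0 - t) * u_b for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
all_minimizers = all(f(u) == f_star for u in mixes)

# For the strictly convex g, every point other than 0 has a larger value.
g_unique = all(g(u) > g(0.0) for u in (-0.5, -1e-3, 1e-3, 0.5))
print(all_minimizers, g_unique)
```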
Zero Gradient → KKT Conditions
• For the unconstrained problem, the solution is characterized by $\nabla f(u) = 0$
• Constraints can generate new minima and maxima
• Example: $f(u) = e^u$
[Figure: three plots of f(u) against u, with 0 and 1 marked on the u axis]
• What is the new characterization?
• Karush-Kuhn-Tucker conditions, necessary and sufficient
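The example f(u) = e^u can be explored numerically. The sketch below assumes the constraint set is the interval [0, 1], a guess based on the axis marks in the plots; the grid search is only an illustration, not a method from the slides.

```python
import math

def f(u: float) -> float:
    return math.exp(u)

def df(u: float) -> float:
    """Derivative of e^u, which is again e^u and never zero."""
    return math.exp(u)

# Assumed constraint set: 0 <= u <= 1, sampled on a uniform grid.
grid = [k / 1000 for k in range(1001)]
u_min = min(grid, key=f)
u_max = max(grid, key=f)

print("constrained minimizer:", u_min, "derivative there:", df(u_min))
print("constrained maximizer:", u_max, "derivative there:", df(u_max))
```

Unconstrained, e^u has no minimizer at all. On the interval, a minimum and a maximum appear at the endpoints, and the derivative is nonzero at both, so the zero-gradient test cannot find them.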
The Karush-Kuhn-Tucker Conditions
Regular Points
[Figure: point u with gradient ∇f, direction s, and half-spaces H− and H+]
Corner Points
[Figure: two panels showing a corner u of C formed by constraints c1 and c2, with gradient ∇f, direction s, and half-spaces H− and H+]
The Convex Cone of the Constraint Gradients
[Figure: point u with constraints c1 and c2, gradient ∇f, and half-spaces H− and H+]
Inactive Constraints Do Not Matter
[Figure: set C with constraints c1, c2, c3, points u and v, gradient ∇f, and half-spaces H− and H+]
Conic Combinations
[Figure: point u with constraints c1 and c2, vectors n1, n2, a1, a2, and v, gradient ∇f, and half-spaces H− and H+]
$\{v : v = \alpha_1 a_1 + \alpha_2 a_2 \text{ with } \alpha_1, \alpha_2 \geq 0\}$
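Membership in the cone of conic combinations can be tested directly in the plane. The sketch below assumes a1 and a2 are linearly independent vectors in R^2, solves v = α1 a1 + α2 a2 by Cramer's rule, and then checks the signs of the coefficients. The function name and the example vectors are hypothetical.

```python
from typing import Optional, Tuple

Vec2 = Tuple[float, float]

def conic_coefficients(a1: Vec2, a2: Vec2, v: Vec2,
                       tol: float = 1e-12) -> Optional[Tuple[float, float]]:
    """Return (alpha1, alpha2) with v = alpha1*a1 + alpha2*a2 and both >= 0,
    or None if v lies outside the cone. Assumes a1, a2 linearly independent."""
    det = a1[0] * a2[1] - a1[1] * a2[0]
    if abs(det) < tol:
        raise ValueError("a1 and a2 must be linearly independent")
    # Cramer's rule for the 2x2 system [a1 a2] [alpha1, alpha2]^T = v.
    alpha1 = (v[0] * a2[1] - v[1] * a2[0]) / det
    alpha2 = (a1[0] * v[1] - a1[1] * v[0]) / det
    if alpha1 >= -tol and alpha2 >= -tol:
        return (alpha1, alpha2)
    return None

a1, a2 = (1.0, 0.0), (1.0, 1.0)
inside = conic_coefficients(a1, a2, (3.0, 1.0))   # 2*a1 + 1*a2
outside = conic_coefficients(a1, a2, (0.0, 1.0))  # would need alpha1 = -1
print(inside, outside)
```

With more than two generators, or in higher dimensions, the same question becomes a nonnegative least-squares problem rather than a single linear solve.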
The KKT Conditions
$u \in C$ is a solution to a convex program iff there exist $\alpha_i$ s.t.
$\nabla f(u) = \sum_{i \in \mathcal{A}(u)} \alpha_i \nabla c_i(u)$ with $\alpha_i \geq 0$
where $\mathcal{A}(u) = \{i : c_i(u) = 0\}$ is the active set at $u$
[Figure: point u with constraints c1 and c2, gradient ∇f, and half-spaces H− and H+]
Convention: $\sum_{i \in \emptyset} = 0$ (so the condition also holds in the interior of $C$)
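The KKT test can be sketched for a small case. The code below assumes u is in R^2, constraints of the form c(u) = A u + b ≥ 0 (so ∇c_i is row i of A), and at most two active constraints with linearly independent gradients. The objective f(u) = (u1 + 1)^2 + (u2 + 1)^2, the constraint set u1 ≥ 0, u2 ≥ 0, and all function names are hypothetical.

```python
from typing import List, Sequence

def active_set(A, b, u, tol: float = 1e-9) -> List[int]:
    """Indices i with c_i(u) = A[i] . u + b[i] = 0 (within tol)."""
    return [i for i, (row, bi) in enumerate(zip(A, b))
            if abs(row[0] * u[0] + row[1] * u[1] + bi) <= tol]

def satisfies_kkt(grad: Sequence[float], A, b, u, tol: float = 1e-9) -> bool:
    """Check grad f(u) = sum over active i of alpha_i * A[i], alpha_i >= 0."""
    # u must be feasible before the gradient condition is meaningful.
    if any(row[0] * u[0] + row[1] * u[1] + bi < -tol for row, bi in zip(A, b)):
        return False
    act = active_set(A, b, u, tol)
    if len(act) == 0:
        # Empty sum is zero: the gradient itself must vanish.
        return abs(grad[0]) <= tol and abs(grad[1]) <= tol
    if len(act) == 1:
        a = A[act[0]]
        # Best scalar multiple of a, then check nothing is left over.
        alpha = (grad[0] * a[0] + grad[1] * a[1]) / (a[0] ** 2 + a[1] ** 2)
        r0, r1 = grad[0] - alpha * a[0], grad[1] - alpha * a[1]
        return alpha >= -tol and abs(r0) <= tol and abs(r1) <= tol
    if len(act) == 2:
        a1, a2 = A[act[0]], A[act[1]]
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < tol:
            raise ValueError("active gradients must be linearly independent")
        alpha1 = (grad[0] * a2[1] - grad[1] * a2[0]) / det
        alpha2 = (a1[0] * grad[1] - a1[1] * grad[0]) / det
        return alpha1 >= -tol and alpha2 >= -tol
    raise ValueError("sketch handles at most two active constraints")

def grad_f(u):
    """Gradient of the hypothetical f(u) = (u1 + 1)^2 + (u2 + 1)^2."""
    return (2.0 * (u[0] + 1.0), 2.0 * (u[1] + 1.0))

A = [[1.0, 0.0], [0.0, 1.0]]   # constraints u1 >= 0 and u2 >= 0
b = [0.0, 0.0]

at_corner = satisfies_kkt(grad_f((0.0, 0.0)), A, b, (0.0, 0.0))
on_edge = satisfies_kkt(grad_f((1.0, 0.0)), A, b, (1.0, 0.0))
interior = satisfies_kkt(grad_f((1.0, 1.0)), A, b, (1.0, 1.0))
print(at_corner, on_edge, interior)
```

Only the corner (0, 0) passes: there the gradient (2, 2) equals 2 ∇c_1 + 2 ∇c_2 with both coefficients nonnegative. On the edge the gradient has a component that the single active constraint cannot supply, and in the interior the gradient would have to be zero.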