slide-1
SLIDE 1

SVM Duality summary

Lagrangian:

L(w, \alpha) = \frac{1}{2} \|w\|_2^2 + \sum_{i=1}^n \alpha_i (1 - y_i x_i^T w).

Primal maximum margin problem was

P(w) = \max_{\alpha \ge 0} L(w, \alpha) = \max_{\alpha \ge 0} \left[ \frac{1}{2} \|w\|_2^2 + \sum_{i=1}^n \alpha_i (1 - y_i x_i^T w) \right].

Dual problem:

D(\alpha) = \min_{w \in \mathbb{R}^d} L(w, \alpha)
          = L\!\left( \sum_{i=1}^n \alpha_i y_i x_i,\ \alpha \right)
          = \sum_{i=1}^n \alpha_i - \frac{1}{2} \Bigl\| \sum_{i=1}^n \alpha_i y_i x_i \Bigr\|_2^2
          = \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j x_i^T x_j.

Given dual optimum \hat{\alpha}:
◮ Corresponding primal optimum \hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i x_i;
◮ Strong duality: P(\hat{w}) = D(\hat{\alpha});
◮ \hat{\alpha}_i > 0 implies y_i x_i^T \hat{w} = 1, and these y_i x_i are support vectors.
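To make the algebra concrete, here is a small numpy sketch (illustrative only; the data, labels, and \alpha below are arbitrary) checking numerically that the norm form and the pairwise-sum form of D(\alpha) above agree:

```python
import numpy as np

def dual_objective_norm_form(alpha, X, y):
    """D(alpha) = sum_i alpha_i - 0.5 * || sum_i alpha_i y_i x_i ||_2^2."""
    v = (alpha * y) @ X
    return alpha.sum() - 0.5 * v @ v

def dual_objective_pairwise_form(alpha, X, y):
    """D(alpha) = sum_i alpha_i - 0.5 * sum_{i,j} alpha_i alpha_j y_i y_j x_i^T x_j."""
    G = (y[:, None] * X) @ (y[:, None] * X).T
    return alpha.sum() - 0.5 * alpha @ G @ alpha

rng = np.random.default_rng(0)
X, y = rng.normal(size=(10, 3)), rng.choice([-1.0, 1.0], size=10)
alpha = rng.uniform(size=10)
print(np.isclose(dual_objective_norm_form(alpha, X, y),
                 dual_objective_pairwise_form(alpha, X, y)))   # True
```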

slide-2
SLIDE 2
4. Non-separable case
slide-3
SLIDE 3

Soft-margin SVMs (Cortes and Vapnik, 1995)

When training examples are not linearly separable, the (primal) SVM optimization problem

\min_{w \in \mathbb{R}^d} \ \frac{1}{2} \|w\|_2^2
\quad \text{s.t.} \quad y_i x_i^T w \ge 1 \ \text{for all } i = 1, 2, \ldots, n

has no solution (it is infeasible).

slide-4
SLIDE 4

Soft-margin SVMs (Cortes and Vapnik, 1995)

When training examples are not linearly separable, the (primal) SVM optimization problem

\min_{w \in \mathbb{R}^d} \ \frac{1}{2} \|w\|_2^2
\quad \text{s.t.} \quad y_i x_i^T w \ge 1 \ \text{for all } i = 1, 2, \ldots, n

has no solution (it is infeasible). Introduce slack variables \xi_1, \ldots, \xi_n \ge 0, and a trade-off parameter C > 0:

\min_{w \in \mathbb{R}^d,\ \xi_1, \ldots, \xi_n \in \mathbb{R}} \ \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^n \xi_i
\quad \text{s.t.} \quad y_i x_i^T w \ge 1 - \xi_i \ \text{for all } i = 1, 2, \ldots, n,
\qquad \xi_i \ge 0 \ \text{for all } i = 1, 2, \ldots, n,

which is always feasible. This is called the soft-margin SVM.

(Slack variables are auxiliary variables; they are not needed to form the linear classifier.)

slide-5
SLIDE 5

Interpretation of slack variables

\min_{w \in \mathbb{R}^d,\ \xi_1, \ldots, \xi_n \in \mathbb{R}} \ \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^n \xi_i
\quad \text{s.t.} \quad y_i x_i^T w \ge 1 - \xi_i \ \text{for all } i = 1, 2, \ldots, n,
\qquad \xi_i \ge 0 \ \text{for all } i = 1, 2, \ldots, n.

For a given w, \xi_i / \|w\|_2 is the distance that x_i would have to move to satisfy y_i x_i^T w \ge 1.

slide-6
SLIDE 6

Another interpretation of slack variables

Constraints with non-negative slack variables:

\min_{w \in \mathbb{R}^d,\ \xi_1, \ldots, \xi_n \in \mathbb{R}} \ \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^n \xi_i
\quad \text{s.t.} \quad y_i x_i^T w \ge 1 - \xi_i \ \text{for all } i = 1, 2, \ldots, n,
\qquad \xi_i \ge 0 \ \text{for all } i = 1, 2, \ldots, n.

slide-7
SLIDE 7

Another interpretation of slack variables

Constraints with non-negative slack variables:

\min_{w \in \mathbb{R}^d,\ \xi_1, \ldots, \xi_n \in \mathbb{R}} \ \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^n \xi_i
\quad \text{s.t.} \quad y_i x_i^T w \ge 1 - \xi_i \ \text{for all } i = 1, 2, \ldots, n,
\qquad \xi_i \ge 0 \ \text{for all } i = 1, 2, \ldots, n.

Equivalent unconstrained form:

\min_{w \in \mathbb{R}^d} \ \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^n \bigl[ 1 - y_i x_i^T w \bigr]_+.

Notation: [a]_+ := \max\{0, a\} (ReLU!).

slide-8
SLIDE 8

Another interpretation of slack variables

Constraints with non-negative slack variables:

\min_{w \in \mathbb{R}^d,\ \xi_1, \ldots, \xi_n \in \mathbb{R}} \ \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^n \xi_i
\quad \text{s.t.} \quad y_i x_i^T w \ge 1 - \xi_i \ \text{for all } i = 1, 2, \ldots, n,
\qquad \xi_i \ge 0 \ \text{for all } i = 1, 2, \ldots, n.

Equivalent unconstrained form:

\min_{w \in \mathbb{R}^d} \ \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^n \bigl[ 1 - y_i x_i^T w \bigr]_+.

Notation: [a]_+ := \max\{0, a\} (ReLU!).

\bigl[ 1 - y x^T w \bigr]_+ is the hinge loss of w on example (x, y).
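Since the unconstrained form is just a regularized hinge-loss objective, it can be minimized directly. A minimal sketch using plain subgradient descent (the step size and iteration count are arbitrary illustrative choices, not part of the slides):

```python
import numpy as np

def soft_margin_svm_subgradient(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimize (1/2)||w||^2 + C * sum_i [1 - y_i x_i^T w]_+ by subgradient descent.

    X: (n, d) array, y: (n,) array with entries in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)                  # y_i x_i^T w
        active = margins < 1                   # examples with positive hinge loss
        # subgradient: w - C * sum over active examples of y_i x_i
        grad = w - C * (y[active, None] * X[active]).sum(axis=0)
        w -= lr * grad
    return w
```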

slide-9
SLIDE 9

Convex dual in non-separable case

Lagrangian:

L(w, \xi, \alpha) = \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^n \xi_i + \sum_{i=1}^n \alpha_i (1 - \xi_i - y_i x_i^T w).

Dual problem:

D(\alpha) = \min_{w \in \mathbb{R}^d,\ \xi \in \mathbb{R}^n_{\ge 0}} L(w, \xi, \alpha).

slide-10
SLIDE 10

Convex dual in non-separable case

Lagrangian:

L(w, \xi, \alpha) = \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^n \xi_i + \sum_{i=1}^n \alpha_i (1 - \xi_i - y_i x_i^T w).

Dual problem:

D(\alpha) = \min_{w \in \mathbb{R}^d,\ \xi \in \mathbb{R}^n_{\ge 0}} L(w, \xi, \alpha).

As before, setting the gradient with respect to w to zero gives w = \sum_{i=1}^n \alpha_i y_i x_i; plugging in,

D(\alpha) = \min_{\xi \in \mathbb{R}^n_{\ge 0}} L\!\left( \sum_{i=1}^n \alpha_i y_i x_i,\ \xi,\ \alpha \right)
          = \min_{\xi \in \mathbb{R}^n_{\ge 0}} \left[ \sum_{i=1}^n \alpha_i - \frac{1}{2} \Bigl\| \sum_{i=1}^n \alpha_i y_i x_i \Bigr\|_2^2 + \sum_{i=1}^n \xi_i (C - \alpha_i) \right].

slide-11
SLIDE 11

Convex dual in non-separable case

Lagrangian:

L(w, \xi, \alpha) = \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^n \xi_i + \sum_{i=1}^n \alpha_i (1 - \xi_i - y_i x_i^T w).

Dual problem:

D(\alpha) = \min_{w \in \mathbb{R}^d,\ \xi \in \mathbb{R}^n_{\ge 0}} L(w, \xi, \alpha).

As before, setting the gradient with respect to w to zero gives w = \sum_{i=1}^n \alpha_i y_i x_i; plugging in,

D(\alpha) = \min_{\xi \in \mathbb{R}^n_{\ge 0}} L\!\left( \sum_{i=1}^n \alpha_i y_i x_i,\ \xi,\ \alpha \right)
          = \min_{\xi \in \mathbb{R}^n_{\ge 0}} \left[ \sum_{i=1}^n \alpha_i - \frac{1}{2} \Bigl\| \sum_{i=1}^n \alpha_i y_i x_i \Bigr\|_2^2 + \sum_{i=1}^n \xi_i (C - \alpha_i) \right].

The goal is to maximize D; if \alpha_i > C, then letting \xi_i \uparrow \infty gives D(\alpha) = -\infty. Otherwise, the minimum over \xi_i is attained at \xi_i = 0.

slide-12
SLIDE 12

Convex dual in non-separable case

Lagrangian:

L(w, \xi, \alpha) = \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^n \xi_i + \sum_{i=1}^n \alpha_i (1 - \xi_i - y_i x_i^T w).

Dual problem:

D(\alpha) = \min_{w \in \mathbb{R}^d,\ \xi \in \mathbb{R}^n_{\ge 0}} L(w, \xi, \alpha).

As before, setting the gradient with respect to w to zero gives w = \sum_{i=1}^n \alpha_i y_i x_i; plugging in,

D(\alpha) = \min_{\xi \in \mathbb{R}^n_{\ge 0}} L\!\left( \sum_{i=1}^n \alpha_i y_i x_i,\ \xi,\ \alpha \right)
          = \min_{\xi \in \mathbb{R}^n_{\ge 0}} \left[ \sum_{i=1}^n \alpha_i - \frac{1}{2} \Bigl\| \sum_{i=1}^n \alpha_i y_i x_i \Bigr\|_2^2 + \sum_{i=1}^n \xi_i (C - \alpha_i) \right].

The goal is to maximize D; if \alpha_i > C, then letting \xi_i \uparrow \infty gives D(\alpha) = -\infty. Otherwise, the minimum over \xi_i is attained at \xi_i = 0. Therefore the dual problem is

\max_{\alpha \in \mathbb{R}^n,\ 0 \le \alpha_i \le C} \ \left[ \sum_{i=1}^n \alpha_i - \frac{1}{2} \Bigl\| \sum_{i=1}^n \alpha_i y_i x_i \Bigr\|_2^2 \right].

Can solve this with constrained convex optimization (e.g., projected gradient descent).
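A minimal sketch of the suggested projected-gradient approach on this dual (linear kernel, illustrative step size and iteration count); the box constraint 0 \le \alpha_i \le C is handled by clipping after each step:

```python
import numpy as np

def soft_margin_svm_dual(X, y, C=1.0, lr=0.001, iters=5000):
    """Maximize sum_i alpha_i - 0.5 * || sum_i alpha_i y_i x_i ||^2
    subject to 0 <= alpha_i <= C, by projected gradient ascent."""
    n, d = X.shape
    G = (y[:, None] * X) @ (y[:, None] * X).T       # G_ij = y_i y_j x_i^T x_j
    alpha = np.zeros(n)
    for _ in range(iters):
        grad = 1.0 - G @ alpha                      # gradient of the dual objective
        alpha = np.clip(alpha + lr * grad, 0.0, C)  # ascent step + projection onto the box
    w_hat = (alpha * y) @ X                         # recover the primal solution
    return alpha, w_hat
```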

slide-13
SLIDE 13

Nonseparable case: bottom line

Unconstrained primal:

\min_{w \in \mathbb{R}^d} \ \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^n \bigl[ 1 - y_i x_i^T w \bigr]_+.

Dual:

\max_{\alpha \in \mathbb{R}^n,\ 0 \le \alpha_i \le C} \ \left[ \sum_{i=1}^n \alpha_i - \frac{1}{2} \Bigl\| \sum_{i=1}^n \alpha_i y_i x_i \Bigr\|_2^2 \right].

Dual solution \hat{\alpha} gives primal solution \hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i x_i.

slide-14
SLIDE 14

Nonseparable case: bottom line

Unconstrained primal:

\min_{w \in \mathbb{R}^d} \ \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^n \bigl[ 1 - y_i x_i^T w \bigr]_+.

Dual:

\max_{\alpha \in \mathbb{R}^n,\ 0 \le \alpha_i \le C} \ \left[ \sum_{i=1}^n \alpha_i - \frac{1}{2} \Bigl\| \sum_{i=1}^n \alpha_i y_i x_i \Bigr\|_2^2 \right].

Dual solution \hat{\alpha} gives primal solution \hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i x_i.

Remarks.
◮ Can take C \to \infty to recover the separable case.
◮ The dual is a constrained convex quadratic (can be solved with projected gradient descent).
◮ Some presentations include a bias term in the primal (x_i^T w + b); this introduces the constraint \sum_{i=1}^n y_i \alpha_i = 0 in the dual.
◮ Some presentations replace \tfrac{1}{2} and C with \tfrac{\lambda}{2} and \tfrac{1}{n}, respectively.

slide-15
SLIDE 15
5. Kernels
slide-16
SLIDE 16

Looking at the dual again

The SVM dual problem only depends on the x_i through inner products x_i^T x_j:

\max_{\alpha_1, \alpha_2, \ldots, \alpha_n \ge 0} \ \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j x_i^T x_j.

slide-17
SLIDE 17

Looking at the dual again

The SVM dual problem only depends on the x_i through inner products x_i^T x_j:

\max_{\alpha_1, \alpha_2, \ldots, \alpha_n \ge 0} \ \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j x_i^T x_j.

If we use a feature expansion (e.g., quadratic expansion) x \mapsto \varphi(x), this becomes

\max_{\alpha_1, \alpha_2, \ldots, \alpha_n \ge 0} \ \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j \varphi(x_i)^T \varphi(x_j).

slide-18
SLIDE 18

Looking at the dual again

The SVM dual problem only depends on the x_i through inner products x_i^T x_j:

\max_{\alpha_1, \alpha_2, \ldots, \alpha_n \ge 0} \ \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j x_i^T x_j.

If we use a feature expansion (e.g., quadratic expansion) x \mapsto \varphi(x), this becomes

\max_{\alpha_1, \alpha_2, \ldots, \alpha_n \ge 0} \ \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j \varphi(x_i)^T \varphi(x_j).

The solution \hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i \varphi(x_i) is used in the following way:

x \mapsto \varphi(x)^T \hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i \varphi(x)^T \varphi(x_i).

slide-19
SLIDE 19

Looking at the dual again

The SVM dual problem only depends on the x_i through inner products x_i^T x_j:

\max_{\alpha_1, \alpha_2, \ldots, \alpha_n \ge 0} \ \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j x_i^T x_j.

If we use a feature expansion (e.g., quadratic expansion) x \mapsto \varphi(x), this becomes

\max_{\alpha_1, \alpha_2, \ldots, \alpha_n \ge 0} \ \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j \varphi(x_i)^T \varphi(x_j).

The solution \hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i \varphi(x_i) is used in the following way:

x \mapsto \varphi(x)^T \hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i \varphi(x)^T \varphi(x_i).

Key insight:
◮ Training and prediction only use \varphi(x)^T \varphi(x'), never an isolated \varphi(x);
◮ Sometimes computing \varphi(x)^T \varphi(x') is much easier than computing \varphi(x).

slide-20
SLIDE 20

Quadratic expansion

◮ \varphi: \mathbb{R}^d \to \mathbb{R}^{1 + 2d + \binom{d}{2}}, where

\varphi(x) = \Bigl( 1,\ \sqrt{2} x_1, \ldots, \sqrt{2} x_d,\ x_1^2, \ldots, x_d^2,\ \sqrt{2} x_1 x_2, \ldots, \sqrt{2} x_1 x_d, \ldots, \sqrt{2} x_{d-1} x_d \Bigr)

(Don't mind the \sqrt{2}'s. . . )

slide-21
SLIDE 21

Quadratic expansion

◮ \varphi: \mathbb{R}^d \to \mathbb{R}^{1 + 2d + \binom{d}{2}}, where

\varphi(x) = \Bigl( 1,\ \sqrt{2} x_1, \ldots, \sqrt{2} x_d,\ x_1^2, \ldots, x_d^2,\ \sqrt{2} x_1 x_2, \ldots, \sqrt{2} x_1 x_d, \ldots, \sqrt{2} x_{d-1} x_d \Bigr)

(Don't mind the \sqrt{2}'s. . . )
◮ Computing \varphi(x)^T \varphi(x') in O(d) time: \varphi(x)^T \varphi(x') = (1 + x^T x')^2.

slide-22
SLIDE 22

Quadratic expansion

◮ \varphi: \mathbb{R}^d \to \mathbb{R}^{1 + 2d + \binom{d}{2}}, where

\varphi(x) = \Bigl( 1,\ \sqrt{2} x_1, \ldots, \sqrt{2} x_d,\ x_1^2, \ldots, x_d^2,\ \sqrt{2} x_1 x_2, \ldots, \sqrt{2} x_1 x_d, \ldots, \sqrt{2} x_{d-1} x_d \Bigr)

(Don't mind the \sqrt{2}'s. . . )
◮ Computing \varphi(x)^T \varphi(x') in O(d) time: \varphi(x)^T \varphi(x') = (1 + x^T x')^2.
◮ Much better than d^2 time.

slide-23
SLIDE 23

Quadratic expansion

◮ \varphi: \mathbb{R}^d \to \mathbb{R}^{1 + 2d + \binom{d}{2}}, where

\varphi(x) = \Bigl( 1,\ \sqrt{2} x_1, \ldots, \sqrt{2} x_d,\ x_1^2, \ldots, x_d^2,\ \sqrt{2} x_1 x_2, \ldots, \sqrt{2} x_1 x_d, \ldots, \sqrt{2} x_{d-1} x_d \Bigr)

(Don't mind the \sqrt{2}'s. . . )
◮ Computing \varphi(x)^T \varphi(x') in O(d) time: \varphi(x)^T \varphi(x') = (1 + x^T x')^2.
◮ Much better than d^2 time.
◮ What if we change the exponent "2"?

slide-24
SLIDE 24

Quadratic expansion

◮ \varphi: \mathbb{R}^d \to \mathbb{R}^{1 + 2d + \binom{d}{2}}, where

\varphi(x) = \Bigl( 1,\ \sqrt{2} x_1, \ldots, \sqrt{2} x_d,\ x_1^2, \ldots, x_d^2,\ \sqrt{2} x_1 x_2, \ldots, \sqrt{2} x_1 x_d, \ldots, \sqrt{2} x_{d-1} x_d \Bigr)

(Don't mind the \sqrt{2}'s. . . )
◮ Computing \varphi(x)^T \varphi(x') in O(d) time: \varphi(x)^T \varphi(x') = (1 + x^T x')^2.
◮ Much better than d^2 time.
◮ What if we change the exponent "2"?
◮ What if we replace the additive "1" with 0?
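A quick numerical check (illustrative, not from the slides) that the explicit quadratic expansion and the O(d) formula (1 + x^T x')^2 give the same inner product:

```python
import numpy as np
from itertools import combinations

def phi_quadratic(x):
    """Explicit quadratic feature expansion with the sqrt(2) scaling from the slide."""
    d = len(x)
    feats = [1.0]
    feats += [np.sqrt(2) * xi for xi in x]                                      # sqrt(2) x_i
    feats += [xi ** 2 for xi in x]                                              # x_i^2
    feats += [np.sqrt(2) * x[i] * x[j] for i, j in combinations(range(d), 2)]   # sqrt(2) x_i x_j
    return np.array(feats)

rng = np.random.default_rng(0)
x, xp = rng.normal(size=5), rng.normal(size=5)
explicit = phi_quadratic(x) @ phi_quadratic(xp)   # O(d^2) features
kernel = (1 + x @ xp) ** 2                        # O(d) kernel evaluation
print(np.isclose(explicit, kernel))               # True
```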

slide-25
SLIDE 25

Affine expansion

◮ Let's be modest! \varphi: \mathbb{R}^d \to \mathbb{R}^{1+d}, where \varphi(x) = (1, x_1, \ldots, x_d).
◮ Note \varphi(x)^T \varphi(x') = 1 + x^T x'.

slide-26
SLIDE 26

Products of all feature subsets

◮ \varphi: \mathbb{R}^d \to \mathbb{R}^{2^d}, where

\varphi(x) = \left( \prod_{i \in S} x_i \right)_{S \subseteq \{1, 2, \ldots, d\}}

slide-27
SLIDE 27

Products of all feature subsets

◮ \varphi: \mathbb{R}^d \to \mathbb{R}^{2^d}, where

\varphi(x) = \left( \prod_{i \in S} x_i \right)_{S \subseteq \{1, 2, \ldots, d\}}

◮ Computing \varphi(x)^T \varphi(x') in O(d) time:

\varphi(x)^T \varphi(x') = \prod_{i=1}^d (1 + x_i x_i').

slide-28
SLIDE 28

Products of all feature subsets

◮ \varphi: \mathbb{R}^d \to \mathbb{R}^{2^d}, where

\varphi(x) = \left( \prod_{i \in S} x_i \right)_{S \subseteq \{1, 2, \ldots, d\}}

◮ Computing \varphi(x)^T \varphi(x') in O(d) time:

\varphi(x)^T \varphi(x') = \prod_{i=1}^d (1 + x_i x_i').

◮ Much better than 2^d time.
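An analogous check for the subset-products expansion; it is feasible only for small d, since the explicit map has 2^d coordinates (d = 6 is an arbitrary choice here):

```python
import numpy as np
from itertools import chain, combinations

def phi_subsets(x):
    """Explicit feature map: one coordinate per subset S of {1,...,d}, equal to prod_{i in S} x_i."""
    d = len(x)
    subsets = chain.from_iterable(combinations(range(d), r) for r in range(d + 1))
    return np.array([np.prod([x[i] for i in S]) for S in subsets])   # empty S gives 1

rng = np.random.default_rng(0)
x, xp = rng.normal(size=6), rng.normal(size=6)
explicit = phi_subsets(x) @ phi_subsets(xp)   # 2^d features
kernel = np.prod(1 + x * xp)                  # O(d) product formula
print(np.isclose(explicit, kernel))           # True
```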

slide-29
SLIDE 29

Infinite dimensional feature expansion

For any \sigma > 0, there is an infinite feature expansion \varphi: \mathbb{R}^d \to \mathbb{R}^\infty such that

\varphi(x)^T \varphi(x') = \exp\!\left( -\frac{\|x - x'\|_2^2}{2\sigma^2} \right),

which can be computed in O(d) time. (This is called the Gaussian kernel with bandwidth \sigma.)

slide-30
SLIDE 30

Gaussian kernel feature expansion

For simplicity, take d = 1, so \varphi: \mathbb{R} \to \mathbb{R}^\infty. What \varphi has \varphi(x)^T \varphi(y) = e^{-(x-y)^2/(2\sigma^2)}?

slide-31
SLIDE 31

Gaussian kernel feature expansion

For simplicity, take d = 1, so \varphi: \mathbb{R} \to \mathbb{R}^\infty. What \varphi has \varphi(x)^T \varphi(y) = e^{-(x-y)^2/(2\sigma^2)}?

Reverse engineer using the Taylor expansion:

e^{-(x-y)^2/(2\sigma^2)} = e^{-x^2/(2\sigma^2)} \cdot e^{-y^2/(2\sigma^2)} \cdot e^{xy/\sigma^2}

slide-32
SLIDE 32

Gaussian kernel feature expansion

For simplicity, take d = 1, so \varphi: \mathbb{R} \to \mathbb{R}^\infty. What \varphi has \varphi(x)^T \varphi(y) = e^{-(x-y)^2/(2\sigma^2)}?

Reverse engineer using the Taylor expansion:

e^{-(x-y)^2/(2\sigma^2)} = e^{-x^2/(2\sigma^2)} \cdot e^{-y^2/(2\sigma^2)} \cdot e^{xy/\sigma^2}
                         = e^{-x^2/(2\sigma^2)} \cdot e^{-y^2/(2\sigma^2)} \cdot \sum_{k=0}^\infty \frac{1}{k!} \left( \frac{xy}{\sigma^2} \right)^k

slide-33
SLIDE 33

Gaussian kernel feature expansion

For simplicity, take d = 1, so \varphi: \mathbb{R} \to \mathbb{R}^\infty. What \varphi has \varphi(x)^T \varphi(y) = e^{-(x-y)^2/(2\sigma^2)}?

Reverse engineer using the Taylor expansion:

e^{-(x-y)^2/(2\sigma^2)} = e^{-x^2/(2\sigma^2)} \cdot e^{-y^2/(2\sigma^2)} \cdot e^{xy/\sigma^2}
                         = e^{-x^2/(2\sigma^2)} \cdot e^{-y^2/(2\sigma^2)} \cdot \sum_{k=0}^\infty \frac{1}{k!} \left( \frac{xy}{\sigma^2} \right)^k

So let

\varphi(x) := e^{-x^2/(2\sigma^2)} \left( 1,\ \frac{x}{\sigma},\ \frac{1}{\sqrt{2!}} \left( \frac{x}{\sigma} \right)^2,\ \frac{1}{\sqrt{3!}} \left( \frac{x}{\sigma} \right)^3,\ \ldots \right).

slide-34
SLIDE 34

Gaussian kernel feature expansion

For simplicity, take d = 1, so \varphi: \mathbb{R} \to \mathbb{R}^\infty. What \varphi has \varphi(x)^T \varphi(y) = e^{-(x-y)^2/(2\sigma^2)}?

Reverse engineer using the Taylor expansion:

e^{-(x-y)^2/(2\sigma^2)} = e^{-x^2/(2\sigma^2)} \cdot e^{-y^2/(2\sigma^2)} \cdot e^{xy/\sigma^2}
                         = e^{-x^2/(2\sigma^2)} \cdot e^{-y^2/(2\sigma^2)} \cdot \sum_{k=0}^\infty \frac{1}{k!} \left( \frac{xy}{\sigma^2} \right)^k

So let

\varphi(x) := e^{-x^2/(2\sigma^2)} \left( 1,\ \frac{x}{\sigma},\ \frac{1}{\sqrt{2!}} \left( \frac{x}{\sigma} \right)^2,\ \frac{1}{\sqrt{3!}} \left( \frac{x}{\sigma} \right)^3,\ \ldots \right).

How to handle d > 1?

slide-35
SLIDE 35

Gaussian kernel feature expansion

For simplicity, take d = 1, so \varphi: \mathbb{R} \to \mathbb{R}^\infty. What \varphi has \varphi(x)^T \varphi(y) = e^{-(x-y)^2/(2\sigma^2)}?

Reverse engineer using the Taylor expansion:

e^{-(x-y)^2/(2\sigma^2)} = e^{-x^2/(2\sigma^2)} \cdot e^{-y^2/(2\sigma^2)} \cdot e^{xy/\sigma^2}
                         = e^{-x^2/(2\sigma^2)} \cdot e^{-y^2/(2\sigma^2)} \cdot \sum_{k=0}^\infty \frac{1}{k!} \left( \frac{xy}{\sigma^2} \right)^k

So let

\varphi(x) := e^{-x^2/(2\sigma^2)} \left( 1,\ \frac{x}{\sigma},\ \frac{1}{\sqrt{2!}} \left( \frac{x}{\sigma} \right)^2,\ \frac{1}{\sqrt{3!}} \left( \frac{x}{\sigma} \right)^3,\ \ldots \right).

How to handle d > 1?

e^{-\|x-y\|^2/(2\sigma^2)} = e^{-\|x\|^2/(2\sigma^2)} \cdot e^{-\|y\|^2/(2\sigma^2)} \cdot e^{x^T y/\sigma^2}
                           = e^{-\|x\|^2/(2\sigma^2)} \cdot e^{-\|y\|^2/(2\sigma^2)} \cdot \sum_{k=0}^\infty \frac{1}{k!} \left( \frac{x^T y}{\sigma^2} \right)^k.
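A small sketch for d = 1 (the truncation level K = 30 is an arbitrary choice) showing that a truncated version of this infinite expansion already reproduces the Gaussian kernel value to high accuracy:

```python
import numpy as np
from math import factorial

def phi_gauss_1d(x, sigma=1.0, K=30):
    """Truncated version of the infinite Gaussian-kernel feature map for d = 1.
    Keeps the first K Taylor terms."""
    ks = np.arange(K)
    coeffs = np.array([float(factorial(k)) for k in ks])   # k! as floats
    return np.exp(-x ** 2 / (2 * sigma ** 2)) * (x / sigma) ** ks / np.sqrt(coeffs)

x, y, sigma = 0.7, -1.3, 1.0
approx = phi_gauss_1d(x, sigma) @ phi_gauss_1d(y, sigma)
exact = np.exp(-(x - y) ** 2 / (2 * sigma ** 2))
print(approx, exact)   # agree to many decimal places
```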

slide-36
SLIDE 36

Kernels

A (positive definite) kernel function K: \mathcal{X} \times \mathcal{X} \to \mathbb{R} is a symmetric function satisfying: for any x_1, x_2, \ldots, x_n \in \mathcal{X}, the n \times n matrix whose (i, j)-th entry is K(x_i, x_j) is positive semidefinite. (This matrix is called the Gram matrix.)

slide-37
SLIDE 37

Kernels

A (positive definite) kernel function K: \mathcal{X} \times \mathcal{X} \to \mathbb{R} is a symmetric function satisfying: for any x_1, x_2, \ldots, x_n \in \mathcal{X}, the n \times n matrix whose (i, j)-th entry is K(x_i, x_j) is positive semidefinite. (This matrix is called the Gram matrix.)

For any kernel K, there is a feature mapping \varphi: \mathcal{X} \to \mathcal{H} such that \varphi(x)^T \varphi(x') = K(x, x').

\mathcal{H} is a Hilbert space—i.e., a special kind of inner product space—called the Reproducing Kernel Hilbert Space corresponding to K.
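A sketch of the definition in code: build the Gram matrix for a kernel on some sample points and check that its eigenvalues are nonnegative (the Gaussian kernel with \sigma = 1 and the random points are arbitrary illustrative choices):

```python
import numpy as np

def gram_matrix(kernel, xs):
    """Gram matrix for a kernel function evaluated on a list of points."""
    n = len(xs)
    return np.array([[kernel(xs[i], xs[j]) for j in range(n)] for i in range(n)])

gauss = lambda x, y: np.exp(-np.sum((x - y) ** 2) / 2.0)   # Gaussian kernel, sigma = 1

rng = np.random.default_rng(0)
xs = [rng.normal(size=3) for _ in range(20)]
G = gram_matrix(gauss, xs)
print(np.all(np.linalg.eigvalsh(G) >= -1e-10))   # eigenvalues nonnegative up to round-off
```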

slide-38
SLIDE 38

Kernel SVMs (Boser, Guyon, and Vapnik, 1992)

Solve

\max_{\alpha_1, \alpha_2, \ldots, \alpha_n \ge 0} \ \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j K(x_i, x_j).

slide-39
SLIDE 39

Kernel SVMs (Boser, Guyon, and Vapnik, 1992)

Solve

\max_{\alpha_1, \alpha_2, \ldots, \alpha_n \ge 0} \ \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j K(x_i, x_j).

The solution \hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i \varphi(x_i) is used in the following way:

x \mapsto \varphi(x)^T \hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i K(x, x_i).

slide-40
SLIDE 40

Kernel SVMs (Boser, Guyon, and Vapnik, 1992)

Solve

\max_{\alpha_1, \alpha_2, \ldots, \alpha_n \ge 0} \ \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j K(x_i, x_j).

The solution \hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i \varphi(x_i) is used in the following way:

x \mapsto \varphi(x)^T \hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i K(x, x_i).

◮ To represent the classifier, need to keep the support vector examples (x_i, y_i) and the corresponding \hat{\alpha}_i's.

slide-41
SLIDE 41

Kernel SVMs (Boser, Guyon, and Vapnik, 1992)

Solve

\max_{\alpha_1, \alpha_2, \ldots, \alpha_n \ge 0} \ \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j K(x_i, x_j).

The solution \hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i \varphi(x_i) is used in the following way:

x \mapsto \varphi(x)^T \hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i K(x, x_i).

◮ To represent the classifier, need to keep the support vector examples (x_i, y_i) and the corresponding \hat{\alpha}_i's.
◮ To compute the prediction on x, iterate through the support vector examples and compute K(x, x_i) for each support vector x_i . . .

slide-42
SLIDE 42

Kernel SVMs (Boser, Guyon, and Vapnik, 1992)

Solve

\max_{\alpha_1, \alpha_2, \ldots, \alpha_n \ge 0} \ \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j K(x_i, x_j).

The solution \hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i \varphi(x_i) is used in the following way:

x \mapsto \varphi(x)^T \hat{w} = \sum_{i=1}^n \hat{\alpha}_i y_i K(x, x_i).

◮ To represent the classifier, need to keep the support vector examples (x_i, y_i) and the corresponding \hat{\alpha}_i's.
◮ To compute the prediction on x, iterate through the support vector examples and compute K(x, x_i) for each support vector x_i . . .

Very similar to the nearest neighbor classifier: the predictor is represented using (a subset of) the training data.
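A minimal prediction routine matching the formula above; support_x, support_y, support_alpha, and kernel are hypothetical names for whatever the training stage produced:

```python
import numpy as np

def kernel_svm_predict(x, support_x, support_y, support_alpha, kernel):
    """Kernel SVM prediction: sign of sum_i alpha_i y_i K(x, x_i) over the support vectors.
    `kernel` is any kernel function K(x, x'), e.g. a Gaussian kernel."""
    score = sum(a * y * kernel(x, xi)
                for a, y, xi in zip(support_alpha, support_y, support_x))
    return np.sign(score)
```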

slide-43
SLIDE 43

The “kernel trick”

Some texts discuss a kernel trick.
◮ This refers to our ability to solve a potentially infinite-dimensional problem with an optimization over \alpha \in \mathbb{R}^n.
◮ Indeed, one can prove ("representer theorem") that adding components to the predictor outside the span of (\varphi(x_1), \ldots, \varphi(x_n)) does not reduce risk.
◮ In reality, we aren't really getting something for free; the O(n^2) complexity of SVM is prohibitive.

slide-44
SLIDE 44

Linear support vector machines (again)

[Figure: decision boundary contours on the same data for Logistic regression, Least squares, and SVM with affine expansion kernel.]

slide-45
SLIDE 45

Nonlinear support vector machines (again)

[Figure: decision boundary contours for a ReLU network, Quadratic SVM, RBF SVM (\sigma = 1), and RBF SVM (\sigma = 0.1).]

slide-46
SLIDE 46
6. Kernel workflow
slide-47
SLIDE 47

Making the most of the kernel trick

1. Start with a feature expansion \varphi that makes sense for your problem, and find an efficient way to compute \varphi(x)^T \varphi(x').

slide-48
SLIDE 48

Making the most of the kernel trick

1. Start with a feature expansion \varphi that makes sense for your problem, and find an efficient way to compute \varphi(x)^T \varphi(x').
2. Start with a similarity function K that makes sense for your problem (and is efficient to compute), and verify that it is a (positive semidefinite) kernel.

slide-49
SLIDE 49

Making the most of the kernel trick

1. Start with a feature expansion \varphi that makes sense for your problem, and find an efficient way to compute \varphi(x)^T \varphi(x').
2. Start with a similarity function K that makes sense for your problem (and is efficient to compute), and verify that it is a (positive semidefinite) kernel.
3. Build new kernels out of existing kernels.

slide-50
SLIDE 50

Example: String kernels

◮ Suppose we want to define K: Strings \times Strings \to \mathbb{R} such that K(x, x') = \# substrings x and x' have in common.

slide-51
SLIDE 51

Example: String kernels

◮ Suppose we want to define K: Strings \times Strings \to \mathbb{R} such that K(x, x') = \# substrings x and x' have in common.
◮ Define \varphi: Strings \to \{0, 1\}^{\text{Strings}}, where

\varphi(x) = \bigl( 1\{ s \text{ appears as a substring in } x \} : s \in \text{Strings} \bigr).

Then K(x, x') = \varphi(x)^T \varphi(x').

slide-52
SLIDE 52

Example: String kernels

◮ Suppose we want to define K: Strings \times Strings \to \mathbb{R} such that K(x, x') = \# substrings x and x' have in common.
◮ Define \varphi: Strings \to \{0, 1\}^{\text{Strings}}, where

\varphi(x) = \bigl( 1\{ s \text{ appears as a substring in } x \} : s \in \text{Strings} \bigr).

Then K(x, x') = \varphi(x)^T \varphi(x').
◮ Computing K(x, x'): for each substring s of x, check if s appears in x' and update the total. Efficient algorithm: O(|Alphabet| \times length(x) \times length(x')) time.
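A naive sketch of this string kernel using substring sets; it is quadratic in the string lengths, so slower than the efficient algorithm mentioned above, but it matches the indicator feature map (counting distinct common substrings):

```python
def string_kernel(x, xp):
    """Number of distinct (non-empty) substrings that x and xp have in common."""
    subs = lambda s: {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
    return len(subs(x) & subs(xp))

print(string_kernel("abab", "ba"))   # common substrings: "a", "b", "ba" -> 3
```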

slide-53
SLIDE 53

New kernels from old kernels

Suppose K_1 and K_2 are positive definite kernel functions.

slide-54
SLIDE 54

New kernels from old kernels

Suppose K_1 and K_2 are positive definite kernel functions.

1. Does K(x, y) := K_1(x, y) + K_2(x, y) define a positive definite kernel?

slide-55
SLIDE 55

New kernels from old kernels

Suppose K_1 and K_2 are positive definite kernel functions.

1. Does K(x, y) := K_1(x, y) + K_2(x, y) define a positive definite kernel?
2. Does K(x, y) := c \cdot K_1(x, y) (for c \ge 0) define a positive definite kernel?

slide-56
SLIDE 56

New kernels from old kernels

Suppose K_1 and K_2 are positive definite kernel functions.

1. Does K(x, y) := K_1(x, y) + K_2(x, y) define a positive definite kernel?
2. Does K(x, y) := c \cdot K_1(x, y) (for c \ge 0) define a positive definite kernel?
3. Does K(x, y) := K_1(x, y) \cdot K_2(x, y) define a positive definite kernel?

slide-57
SLIDE 57
7. Kernelized ridge regression
slide-58
SLIDE 58

Kernelization

Learning methods that only depend on the data through x_i^T x_j can be "kernelized".

Example: ridge regression

\min_{w \in \mathbb{R}^d} \ \lambda \|w\|_2^2 + \frac{1}{n} \sum_{i=1}^n (x_i^T w - y_i)^2.

slide-59
SLIDE 59

Kernelization

Learning methods that only depend on the data through x_i^T x_j can be "kernelized".

Example: ridge regression

\min_{w \in \mathbb{R}^d} \ \lambda \|w\|_2^2 + \frac{1}{n} \sum_{i=1}^n (x_i^T w - y_i)^2.

How to kernelize this?

slide-60
SLIDE 60

Kernelization

Learning methods that only depend on the data through x_i^T x_j can be "kernelized".

Example: ridge regression

\min_{w \in \mathbb{R}^d} \ \lambda \|w\|_2^2 + \frac{1}{n} \sum_{i=1}^n (x_i^T w - y_i)^2.

How to kernelize this?
◮ Option 1: Use Lagrange duality.

slide-61
SLIDE 61

Kernelization

Learning methods that only depend on the data through x_i^T x_j can be "kernelized".

Example: ridge regression

\min_{w \in \mathbb{R}^d} \ \lambda \|w\|_2^2 + \frac{1}{n} \sum_{i=1}^n (x_i^T w - y_i)^2.

How to kernelize this?
◮ Option 1: Use Lagrange duality.
◮ Option 2: Use linear algebra.

slide-62
SLIDE 62

Kernelization

Learning methods that only depend on the data through x_i^T x_j can be "kernelized".

Example: ridge regression

\min_{w \in \mathbb{R}^d} \ \lambda \|w\|_2^2 + \frac{1}{n} \sum_{i=1}^n (x_i^T w - y_i)^2.

How to kernelize this?
◮ Option 1: Use Lagrange duality.
◮ Option 2: Use linear algebra.

Define the n \times d matrix A and the n \times 1 column vector b by

A := \frac{1}{\sqrt{n}} \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix}, \qquad b := \frac{1}{\sqrt{n}} \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}.

So the ridge regression problem is

\min_{w \in \mathbb{R}^d} \ \lambda \|w\|_2^2 + \|Aw - b\|_2^2.

slide-63
SLIDE 63

Kernelization

Learning methods that only depend on the data through x_i^T x_j can be "kernelized".

Example: ridge regression

\min_{w \in \mathbb{R}^d} \ \lambda \|w\|_2^2 + \frac{1}{n} \sum_{i=1}^n (x_i^T w - y_i)^2.

How to kernelize this?
◮ Option 1: Use Lagrange duality.
◮ Option 2: Use linear algebra.

Define the n \times d matrix A and the n \times 1 column vector b by

A := \frac{1}{\sqrt{n}} \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix}, \qquad b := \frac{1}{\sqrt{n}} \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}.

So the ridge regression problem is

\min_{w \in \mathbb{R}^d} \ \lambda \|w\|_2^2 + \|Aw - b\|_2^2.

Solution: \hat{w} = (A^T A + \lambda I)^{-1} A^T b.

slide-64
SLIDE 64

Kernelizing ridge regression

Ridge regression solution: \hat{w} = (A^T A + \lambda I)^{-1} A^T b.

slide-65
SLIDE 65

Kernelizing ridge regression

Ridge regression solution: \hat{w} = (A^T A + \lambda I)^{-1} A^T b.

Linear algebraic fact: (A^T A + \lambda I)^{-1} A^T = A^T (A A^T + \lambda I)^{-1} for any \lambda > 0.

slide-66
SLIDE 66

Kernelizing ridge regression

Ridge regression solution: \hat{w} = (A^T A + \lambda I)^{-1} A^T b.

Linear algebraic fact: (A^T A + \lambda I)^{-1} A^T = A^T (A A^T + \lambda I)^{-1} for any \lambda > 0.

Therefore

\hat{w} = A^T (A A^T + \lambda I)^{-1} b = A^T \underbrace{ \bigl( \tfrac{1}{n} K + \lambda I \bigr)^{-1} b }_{=: \hat{\alpha}}

where K \in \mathbb{R}^{n \times n} is the matrix with K_{i,j} = x_i^T x_j.

slide-67
SLIDE 67

Kernelizing ridge regression

Ridge regression solution: \hat{w} = (A^T A + \lambda I)^{-1} A^T b.

Linear algebraic fact: (A^T A + \lambda I)^{-1} A^T = A^T (A A^T + \lambda I)^{-1} for any \lambda > 0.

Therefore

\hat{w} = A^T (A A^T + \lambda I)^{-1} b = A^T \underbrace{ \bigl( \tfrac{1}{n} K + \lambda I \bigr)^{-1} b }_{=: \hat{\alpha}}

where K \in \mathbb{R}^{n \times n} is the matrix with K_{i,j} = x_i^T x_j.

Moreover,

A^T \hat{\alpha} = \frac{1}{\sqrt{n}} \sum_{i=1}^n \hat{\alpha}_i x_i,

so, for any x \in \mathbb{R}^d,

x^T \hat{w} = \frac{1}{\sqrt{n}} \sum_{i=1}^n \hat{\alpha}_i x^T x_i.

slide-68
SLIDE 68

Kernelizing ridge regression

Ridge regression solution: \hat{w} = (A^T A + \lambda I)^{-1} A^T b.

Linear algebraic fact: (A^T A + \lambda I)^{-1} A^T = A^T (A A^T + \lambda I)^{-1} for any \lambda > 0.

Therefore

\hat{w} = A^T (A A^T + \lambda I)^{-1} b = A^T \underbrace{ \bigl( \tfrac{1}{n} K + \lambda I \bigr)^{-1} b }_{=: \hat{\alpha}}

where K \in \mathbb{R}^{n \times n} is the matrix with K_{i,j} = x_i^T x_j.

Moreover,

A^T \hat{\alpha} = \frac{1}{\sqrt{n}} \sum_{i=1}^n \hat{\alpha}_i x_i,

so, for any x \in \mathbb{R}^d,

x^T \hat{w} = \frac{1}{\sqrt{n}} \sum_{i=1}^n \hat{\alpha}_i x^T x_i.

Feature vectors are only involved in inner products!
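Putting the pieces together, a sketch of kernelized ridge regression with the linear kernel K_{ij} = x_i^T x_j, checked against the primal closed form; the 1/sqrt(n) factors come from the scaling of A and b above, and the data here is synthetic:

```python
import numpy as np

def kernel_ridge_fit(X, y, lam):
    """Solve for alpha_hat = ((1/n) K + lam I)^{-1} b with K_ij = x_i^T x_j and b = y / sqrt(n)."""
    n = len(y)
    K = X @ X.T
    return np.linalg.solve(K / n + lam * np.eye(n), y / np.sqrt(n))

def kernel_ridge_predict(x, X, alpha):
    """Prediction x^T w_hat written purely in terms of inner products x^T x_i."""
    n = len(alpha)
    return (alpha / np.sqrt(n)) @ (X @ x)   # 1/sqrt(n) comes from A^T = X^T / sqrt(n)

# Sanity check against the primal closed form (A^T A + lam I)^{-1} A^T b
rng = np.random.default_rng(0)
X, y, lam = rng.normal(size=(50, 3)), rng.normal(size=50), 0.1
n = len(y)
A, b = X / np.sqrt(n), y / np.sqrt(n)
w_hat = np.linalg.solve(A.T @ A + lam * np.eye(3), A.T @ b)
alpha = kernel_ridge_fit(X, y, lam)
x = rng.normal(size=3)
print(np.isclose(x @ w_hat, kernel_ridge_predict(x, X, alpha)))   # True
```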

slide-69
SLIDE 69
8. Kernel approximation
slide-70
SLIDE 70

Kernel approximation

Major downside of kernel methods: kernel matrix K is of size n × n, which can be computationally prohibitive to construct/store in memory when n is large.


slide-71
SLIDE 71

Kernel approximation

Major downside of kernel methods: kernel matrix K is of size n \times n, which can be computationally prohibitive to construct/store in memory when n is large.

Some alternatives:

◮ Nyström approximation
Find a low-rank approximation of the kernel matrix: K \approx B B^T for B \in \mathbb{R}^{n \times r}, where r \ll n. Can somehow do this in less time than is required to form K itself, and also extend to new (e.g., test data) points.

slide-72
SLIDE 72

Kernel approximation

Major downside of kernel methods: kernel matrix K is of size n \times n, which can be computationally prohibitive to construct/store in memory when n is large.

Some alternatives:

◮ Nyström approximation
Find a low-rank approximation of the kernel matrix: K \approx B B^T for B \in \mathbb{R}^{n \times r}, where r \ll n. Can somehow do this in less time than is required to form K itself, and also extend to new (e.g., test data) points.

◮ (Randomized) Fourier-based approximation
E.g., for the Gaussian kernel

K_{\sigma^2}(x, y) = \exp\!\left( -\frac{\|x - y\|_2^2}{2\sigma^2} \right).

Leverage the Fourier transform of K_{\sigma^2} to construct a feature expansion \varphi: \mathbb{R}^d \to \mathbb{R}^p such that \varphi(x)^T \varphi(y) \approx K_{\sigma^2}(x, y).

slide-73
SLIDE 73

Fourier transform

Characteristic function for a Gaussian random vector: for any \delta \in \mathbb{R}^d,

\exp\!\left( -\frac{\|\delta\|_2^2}{2\sigma^2} \right) = \int_{\mathbb{R}^d} \exp(i \delta^T t) \cdot \underbrace{ \frac{1}{(2\pi/\sigma^2)^{d/2}} \exp\!\left( -\frac{\sigma^2 \|t\|_2^2}{2} \right) }_{N(0,\ (1/\sigma^2) I)\ \text{density}} \, dt,

where i = \sqrt{-1}.

slide-74
SLIDE 74

Fourier transform

Characteristic function for a Gaussian random vector: for any \delta \in \mathbb{R}^d,

\exp\!\left( -\frac{\|\delta\|_2^2}{2\sigma^2} \right) = \int_{\mathbb{R}^d} \exp(i \delta^T t) \cdot \underbrace{ \frac{1}{(2\pi/\sigma^2)^{d/2}} \exp\!\left( -\frac{\sigma^2 \|t\|_2^2}{2} \right) }_{N(0,\ (1/\sigma^2) I)\ \text{density}} \, dt,

where i = \sqrt{-1}.

Therefore, if \theta \sim N(0, (1/\sigma^2) I), then for any x, y \in \mathbb{R}^d,

K_{\sigma^2}(x, y) = \mathbb{E}\bigl[ \exp(-i (x - y)^T \theta) \bigr].

slide-75
SLIDE 75

Fourier transform

Characteristic function for a Gaussian random vector: for any \delta \in \mathbb{R}^d,

\exp\!\left( -\frac{\|\delta\|_2^2}{2\sigma^2} \right) = \int_{\mathbb{R}^d} \exp(i \delta^T t) \cdot \underbrace{ \frac{1}{(2\pi/\sigma^2)^{d/2}} \exp\!\left( -\frac{\sigma^2 \|t\|_2^2}{2} \right) }_{N(0,\ (1/\sigma^2) I)\ \text{density}} \, dt,

where i = \sqrt{-1}.

Therefore, if \theta \sim N(0, (1/\sigma^2) I), then for any x, y \in \mathbb{R}^d,

K_{\sigma^2}(x, y) = \mathbb{E}\bigl[ \exp(-i (x - y)^T \theta) \bigr].

Moreover, using Euler's formula e^{iz} = \cos(z) + i \sin(z), we can write the real part of \exp(-i (x - y)^T \theta) as

\cos(x^T \theta) \cos(y^T \theta) + \sin(x^T \theta) \sin(y^T \theta) = \varphi_\theta(x)^T \varphi_\theta(y),

where \varphi_\theta(x) := \bigl( \cos(x^T \theta), \sin(x^T \theta) \bigr) \in [-1, 1]^2.

slide-76
SLIDE 76

Fourier transform

Characteristic function for a Gaussian random vector: for any \delta \in \mathbb{R}^d,

\exp\!\left( -\frac{\|\delta\|_2^2}{2\sigma^2} \right) = \int_{\mathbb{R}^d} \exp(i \delta^T t) \cdot \underbrace{ \frac{1}{(2\pi/\sigma^2)^{d/2}} \exp\!\left( -\frac{\sigma^2 \|t\|_2^2}{2} \right) }_{N(0,\ (1/\sigma^2) I)\ \text{density}} \, dt,

where i = \sqrt{-1}.

Therefore, if \theta \sim N(0, (1/\sigma^2) I), then for any x, y \in \mathbb{R}^d,

K_{\sigma^2}(x, y) = \mathbb{E}\bigl[ \exp(-i (x - y)^T \theta) \bigr].

Moreover, using Euler's formula e^{iz} = \cos(z) + i \sin(z), we can write the real part of \exp(-i (x - y)^T \theta) as

\cos(x^T \theta) \cos(y^T \theta) + \sin(x^T \theta) \sin(y^T \theta) = \varphi_\theta(x)^T \varphi_\theta(y),

where \varphi_\theta(x) := \bigl( \cos(x^T \theta), \sin(x^T \theta) \bigr) \in [-1, 1]^2.

Therefore, we have \mathbb{E}[\varphi_\theta(x)^T \varphi_\theta(y)] = K_{\sigma^2}(x, y).

slide-77
SLIDE 77

Randomized approximation of Gaussian kernel

Procedure to construct the feature expansion \varphi:
◮ Draw \theta_1, \ldots, \theta_p \sim_{\text{iid}} N(0, (1/\sigma^2) I).
◮ Construct the feature expansion \varphi: \mathbb{R}^d \to \mathbb{R}^{2p} by \varphi(x) := \frac{1}{\sqrt{p}} (\varphi_{\theta_1}(x), \ldots, \varphi_{\theta_p}(x)) for all x \in \mathbb{R}^d.

slide-78
SLIDE 78

Randomized approximation of Gaussian kernel

Procedure to construct the feature expansion \varphi:
◮ Draw \theta_1, \ldots, \theta_p \sim_{\text{iid}} N(0, (1/\sigma^2) I).
◮ Construct the feature expansion \varphi: \mathbb{R}^d \to \mathbb{R}^{2p} by \varphi(x) := \frac{1}{\sqrt{p}} (\varphi_{\theta_1}(x), \ldots, \varphi_{\theta_p}(x)) for all x \in \mathbb{R}^d.

Theorem. Let \varphi be as defined above. For any x, y \in \mathbb{R}^d, the random variable

\varphi(x)^T \varphi(y) = \frac{1}{p} \sum_{i=1}^p \bigl[ \cos(x^T \theta_i) \cos(y^T \theta_i) + \sin(x^T \theta_i) \sin(y^T \theta_i) \bigr]

has expectation K_{\sigma^2}(x, y) and variance O(1/p).

slide-79
SLIDE 79

Randomized approximation of Gaussian kernel

Procedure to construct the feature expansion \varphi:
◮ Draw \theta_1, \ldots, \theta_p \sim_{\text{iid}} N(0, (1/\sigma^2) I).
◮ Construct the feature expansion \varphi: \mathbb{R}^d \to \mathbb{R}^{2p} by \varphi(x) := \frac{1}{\sqrt{p}} (\varphi_{\theta_1}(x), \ldots, \varphi_{\theta_p}(x)) for all x \in \mathbb{R}^d.

Theorem. Let \varphi be as defined above. For any x, y \in \mathbb{R}^d, the random variable

\varphi(x)^T \varphi(y) = \frac{1}{p} \sum_{i=1}^p \bigl[ \cos(x^T \theta_i) \cos(y^T \theta_i) + \sin(x^T \theta_i) \sin(y^T \theta_i) \bigr]

has expectation K_{\sigma^2}(x, y) and variance O(1/p).

Can just use linear methods (e.g., linear SVM, linear regression) with \varphi. E.g., ridge regression: O(np^2) time; cf. kernel ridge regression: O(n^3) time.
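A sketch of the random Fourier feature construction (p and \sigma below are illustrative choices); the approximate inner products \varphi(x)^T \varphi(y) are compared against the exact Gaussian kernel:

```python
import numpy as np

def random_fourier_features(X, p=200, sigma=1.0, seed=0):
    """Random Fourier feature map phi: R^d -> R^{2p} approximating the Gaussian kernel."""
    d = X.shape[1]
    rng = np.random.default_rng(seed)
    Theta = rng.normal(scale=1.0 / sigma, size=(d, p))    # theta_j ~ N(0, (1/sigma^2) I)
    Z = X @ Theta
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(p)

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
Phi = random_fourier_features(X, p=5000)
approx = Phi @ Phi.T                                      # phi(x)^T phi(y) for all pairs
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
exact = np.exp(-sq_dists / 2.0)                           # Gaussian kernel, sigma = 1
print(np.abs(approx - exact).max())                       # small, shrinking like 1/sqrt(p)
```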

slide-80
SLIDE 80
9. Summary
slide-81
SLIDE 81

Summary

◮ The notion of a maximum margin predictor.
◮ The corresponding convex program (constrained form in the separable case, constrained form in the nonseparable case, unconstrained nonseparable form).
◮ Lagrangian, Lagrange multipliers, and the dual optimization problem.
◮ Support vectors.
◮ Kernels: positive semi-definite definition, rules for combining kernels.