

SLIDE 1

DM825 Introduction to Machine Learning Lecture 8

Support Vector Machines

Marco Chiarandini

Department of Mathematics & Computer Science University of Southern Denmark

SLIDE 2

Overview

Support Vector Machines:

1. Functional and Geometric Margins
2. Optimal Margin Classifier
3. Lagrange Duality
4. Karush Kuhn Tucker Conditions
5. Solving the Optimal Margin
6. Kernels
7. Soft margins
8. SMO Algorithm


SLIDE 3

In This Lecture

1. Functional and Geometric Margins
2. Optimal Margin Classifier
3. Lagrange Duality
4. Karush Kuhn Tucker Conditions
5. Solving the Optimal Margin


SLIDE 4

Introduction

◮ Binary classification.
◮ $y \in \{-1, 1\}$ (instead of $\{0, 1\}$ as in GLMs)
◮ Let $h(\theta, x)$ output values in $\{-1, 1\}$:
  $$f(z) = \operatorname{sign}(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ -1 & \text{if } z < 0 \end{cases}$$
  (hence no probabilities, unlike logistic regression)
◮ $h(\theta, x) = f(\theta^T x + \theta_0)$, with $x \in \mathbb{R}^n$, $\theta \in \mathbb{R}^n$, $\theta_0 \in \mathbb{R}$
◮ Assume for now that the training set is linearly separable.

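To make the hypothesis concrete, here is a minimal sketch in Python (my own illustration; names are not from the lecture):

```python
import numpy as np

def h(theta, theta0, x):
    """Hypothesis h(theta, x) = sign(theta^T x + theta_0); ties (z = 0) map to +1."""
    z = np.dot(theta, x) + theta0
    return 1 if z >= 0 else -1

# Example: theta = (1, 1), theta0 = 0 classifies by the side of the line x1 + x2 = 0
print(h(np.array([1.0, 1.0]), 0.0, np.array([2.0, 3.0])))    # +1
print(h(np.array([1.0, 1.0]), 0.0, np.array([-2.0, -1.0])))  # -1
```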

SLIDE 5

SVMs determine the model parameters by solving a convex optimization problem, hence a locally optimal solution is also globally optimal. Margin: the smallest distance between the decision boundary and any of the samples. The location of the boundary is determined by a subset of the data points, known as the support vectors.


SLIDE 6

Outline

1. Functional and Geometric Margins
2. Optimal Margin Classifier
3. Lagrange Duality
4. Karush Kuhn Tucker Conditions
5. Solving the Optimal Margin


SLIDE 7

Recap

◮ functional margin:
  $$\hat{\gamma}_i = y_i(\theta^T x_i + \theta_0) \;\Longrightarrow\; \hat{\gamma} = \min_i \hat{\gamma}_i$$
  (requires a normalization condition)
◮ geometric margin:
  $$\gamma_i = y_i\left( \left(\frac{\theta}{\|\theta\|}\right)^T x_i + \frac{\theta_0}{\|\theta\|} \right) \;\Longrightarrow\; \gamma = \min_i \gamma_i$$
  (scale invariant)
◮ $\gamma = \hat{\gamma} / \|\theta\|$
◮ if $\|\theta\| = 1$ then $\hat{\gamma}_i = \gamma_i$: the two margins coincide

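As a quick illustration of the two definitions, a minimal Python sketch (function and variable names are my own):

```python
import numpy as np

def margins(theta, theta0, X, y):
    """Return (functional margin, geometric margin) over a labelled sample."""
    gamma_hat_i = y * (X @ theta + theta0)         # per-example functional margins
    gamma_i = gamma_hat_i / np.linalg.norm(theta)  # per-example geometric margins
    return gamma_hat_i.min(), gamma_i.min()
```

Rescaling $(\theta, \theta_0)$ by a constant $c > 0$ multiplies the functional margin by $c$ but leaves the geometric margin unchanged, which is exactly the scale invariance noted above.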

SLIDE 8

Outline

1. Functional and Geometric Margins
2. Optimal Margin Classifier
3. Lagrange Duality
4. Karush Kuhn Tucker Conditions
5. Solving the Optimal Margin


SLIDE 9

Optimization Problem

Looking at the geometric margin:

$$(\text{OPT1}): \quad \max_{\gamma, \theta, \theta_0} \; \gamma \quad \text{s.t.} \quad \gamma \le y_i(\theta^T x_i + \theta_0) \;\; \forall i = 1, \ldots, m, \quad \|\theta\| = 1$$

Alternatively, looking at functional margins and recalling that $\gamma = \hat{\gamma} / \|\theta\|$:

$$(\text{OPT2}): \quad \max_{\hat{\gamma}, \theta, \theta_0} \; \frac{\hat{\gamma}}{\|\theta\|} \quad \text{s.t.} \quad \hat{\gamma} \le y_i(\theta^T x_i + \theta_0) \;\; \forall i = 1, \ldots, m$$


SLIDE 10

For the functional margin we can fix the scale arbitrarily (for the geometric margin there is no scaling issue), so we can fix $\hat{\gamma} = 1$:

$$(\text{OPT3}): \quad \min_{\theta, \theta_0} \; \frac{1}{2}\|\theta\|^2 \quad \text{s.t.} \quad 1 \le y_i(\theta^T x_i + \theta_0) \;\; \forall i = 1, \ldots, m$$

where we used that maximizing $1/\|\theta\|$ is equivalent to minimizing $\|\theta\|$, and dropped the square root because it is monotone in $\|\theta\| = \sqrt{\theta^T \theta}$. This is a convex optimization problem, with a convex quadratic objective function and linear constraints, hence it can be solved optimally and efficiently.

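Since (OPT3) is a convex QP, an off-the-shelf solver handles it directly. A minimal sketch using cvxpy (assuming it is installed; the toy data is purely illustrative):

```python
import numpy as np
import cvxpy as cp

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])  # m x n data matrix
y = np.array([1.0, 1.0, -1.0, -1.0])                                # labels in {-1, +1}

theta = cp.Variable(X.shape[1])
theta0 = cp.Variable()

# min (1/2)||theta||^2  s.t.  y_i (theta^T x_i + theta_0) >= 1 for all i
constraints = [cp.multiply(y, X @ theta + theta0) >= 1]
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(theta)), constraints).solve()

print(theta.value, theta0.value)
```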

SLIDE 11

Convex optimization problem

$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le b_i, \quad i = 1, \ldots, m \end{aligned}$$

objective and constraint functions are convex:

$$f_i(\alpha x + \beta y) \le \alpha f_i(x) + \beta f_i(y) \quad \text{if } \alpha + \beta = 1, \; \alpha \ge 0, \; \beta \ge 0$$

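For instance, $f(x) = x^2$ satisfies the definition (a worked check, not from the slides): with $\alpha + \beta = 1$ and $\alpha, \beta \ge 0$,

$$\alpha x^2 + \beta y^2 - (\alpha x + \beta y)^2 = \alpha\beta\,(x - y)^2 \ge 0,$$

so $f$ is convex; the objective $\frac{1}{2}\|\theta\|^2$ of (OPT3) is convex for the same reason.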

SLIDE 12

Outline

1. Functional and Geometric Margins
2. Optimal Margin Classifier
3. Lagrange Duality
4. Karush Kuhn Tucker Conditions
5. Solving the Optimal Margin


SLIDE 13

Lagrangian

standard form problem (not necessarily convex):

$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \ldots, m \\ & h_i(x) = 0, \quad i = 1, \ldots, p \end{aligned}$$

variable $x \in \mathbb{R}^n$, domain $D$, optimal value $p^*$

Lagrangian: $L : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$, with $\operatorname{dom} L = D \times \mathbb{R}^m \times \mathbb{R}^p$:

$$L(x, \alpha, \beta) = f_0(x) + \sum_{i=1}^m \alpha_i f_i(x) + \sum_{i=1}^p \beta_i h_i(x)$$

◮ weighted sum of objective and constraint functions
◮ $\alpha_i$ is the Lagrange multiplier associated with $f_i(x) \le 0$
◮ $\beta_i$ is the Lagrange multiplier associated with $h_i(x) = 0$
◮ $\alpha$ and $\beta$ are called dual or Lagrangian variables

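A tiny worked example for concreteness (my own, not from the slides): for $\min x^2$ subject to $x \ge 1$, written in standard form with $f_0(x) = x^2$ and $f_1(x) = 1 - x \le 0$, the Lagrangian is

$$L(x, \alpha) = x^2 + \alpha(1 - x), \qquad \alpha \ge 0.$$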

SLIDE 14

Lagrange dual function

Lagrange dual function: $L_D : \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$

$$L_D(\alpha, \beta) = \min_{x \in D} L(x, \alpha, \beta) = \min_{x \in D} \left( f_0(x) + \sum_{i=1}^m \alpha_i f_i(x) + \sum_{i=1}^p \beta_i h_i(x) \right)$$

$L_D$ is concave; it can be $-\infty$ for some $\alpha$ and $\beta$.

Lower bound property:

1. $\forall \alpha \ge 0, \beta$: $L_D(\alpha, \beta) \le p^*$
2. $d^* = \max_{\alpha \ge 0, \beta} L_D(\alpha, \beta) \le p^*$ (the best lower bound; it may be $= p^*$)

Proof of (1): for any feasible $\tilde{x}$ and $\alpha \ge 0$,

$$L(\tilde{x}, \alpha, \beta) = f_0(\tilde{x}) + \sum_{i=1}^m \alpha_i f_i(\tilde{x}) + \sum_{i=1}^p \beta_i h_i(\tilde{x}) \le f_0(\tilde{x})$$

hence $L_D(\alpha, \beta) = \min_{x \in D} L(x, \alpha, \beta) \le L(\tilde{x}, \alpha, \beta) \le f_0(\tilde{x})$. (2) holds because (1) holds for every $\alpha, \beta$.


SLIDE 15

If $f_0$ and the $f_i$ are convex and the $h_i$ affine (together with a mild constraint qualification such as Slater's condition), then

$$d^* = \max_{\alpha \ge 0, \beta} L_D(\alpha, \beta) = p^*$$

so we can solve the dual in place of the primal.

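Continuing the worked example from the Lagrangian slide: minimizing $L(x, \alpha) = x^2 + \alpha(1 - x)$ over $x$ gives $x = \alpha/2$, hence

$$L_D(\alpha) = \frac{\alpha^2}{4} + \alpha\left(1 - \frac{\alpha}{2}\right) = \alpha - \frac{\alpha^2}{4},$$

which is maximized at $\alpha = 2$ with $d^* = 1$. This equals $p^* = 1$ (attained at $x = 1$), as expected for a convex objective with affine constraints.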

SLIDE 16

Outline

1. Functional and Geometric Margins
2. Optimal Margin Classifier
3. Lagrange Duality
4. Karush Kuhn Tucker Conditions
5. Solving the Optimal Margin


SLIDE 17

Karush Kuhn Tucker Conditions

standard form problem (not necessarily convex):

$$\text{minimize} \quad f(x) \quad \text{subject to} \quad g_i(x) \le b_i, \quad i = 1, \ldots, m$$

variable $x \in \mathbb{R}^n$; $f, g$ possibly nonlinear, $f : \mathbb{R}^n \to \mathbb{R}$, $g : \mathbb{R}^n \to \mathbb{R}^m$.

Necessary conditions for optimality (valid locally):

$$\begin{cases} \nabla f(x_0) + \sum_{i=1}^m \lambda_i \nabla g_i(x_0) = 0 \\ \lambda_i \ge 0 \quad \forall i \\ \sum_{i=1}^m \lambda_i \left( g_i(x_0) - b_i \right) = 0 \\ g_i(x_0) - b_i \le 0 \end{cases}$$

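On the same toy problem ($f(x) = x^2$ with the constraint $g_1(x) = -x \le -1$) the conditions read

$$2x_0 - \lambda_1 = 0, \qquad \lambda_1 \ge 0, \qquad \lambda_1(1 - x_0) = 0, \qquad -x_0 \le -1,$$

whose unique solution is $x_0 = 1$, $\lambda_1 = 2$, matching the dual optimum $\alpha = 2$ found earlier.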

SLIDE 18

Outline

1. Functional and Geometric Margins
2. Optimal Margin Classifier
3. Lagrange Duality
4. Karush Kuhn Tucker Conditions
5. Solving the Optimal Margin


SLIDE 19

Let's go back to our problem:

$$(\text{OPT3}): \quad \min_{\theta, \theta_0} \; \frac{1}{2}\|\theta\|^2 \quad \text{s.t.} \quad 1 \le y_i(\theta^T x_i + \theta_0) \;\; \forall i = 1, \ldots, m$$

$$L(\theta, \theta_0, \alpha) = \frac{1}{2}\|\theta\|^2 - \sum_{i=1}^m \alpha_i \left[ y_i(\theta^T x_i + \theta_0) - 1 \right]$$

We find the dual form by minimizing over $\theta$ and $\theta_0$: $L_D(\alpha) = \min_{\theta, \theta_0} L(\theta, \theta_0, \alpha)$

$$\nabla_\theta L(\theta, \theta_0, \alpha) = \theta - \sum_{i=1}^m \alpha_i y_i x_i = 0 \;\Longrightarrow\; \theta = \sum_{i=1}^m \alpha_i y_i x_i$$

$$\frac{\partial L(\theta, \theta_0, \alpha)}{\partial \theta_0} = -\sum_{i=1}^m \alpha_i y_i = 0 \;\Longrightarrow\; \sum_{i=1}^m \alpha_i y_i = 0$$


SLIDE 20

Substituting into $L(\theta, \theta_0, \alpha)$:

$$\begin{aligned} L_D(\alpha) &= \frac{1}{2} \left( \sum_{i=1}^m \alpha_i y_i x_i \right)^T \left( \sum_{j=1}^m \alpha_j y_j x_j \right) - \sum_{i=1}^m \alpha_i \left[ y_i \left( \left( \sum_{j=1}^m \alpha_j y_j x_j \right)^T x_i + \theta_0 \right) - 1 \right] \\ &= \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^m y_i y_j \alpha_i \alpha_j \, x_i^T x_j \end{aligned}$$


SLIDE 21

We are left with the dual problem:

$$\begin{aligned} \max_{\alpha} \quad & W(\alpha) = \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^m y_i y_j \alpha_i \alpha_j \, x_i^T x_j \\ \text{s.t.} \quad & \alpha_i \ge 0 \quad \forall i = 1, \ldots, m \\ & \sum_{i=1}^m \alpha_i y_i = 0 \end{aligned}$$

◮ This problem is in $m$ variables, while problem (OPT3) is in $D$ variables, and quadratic programming can be solved in $O(D^3)$. If $D \ll m$, it seems we have not gained much.
◮ The form above allows us to use the kernel trick and work even with infinitely many dimensions ($D \gg m$).
◮ The use of a kernel, with its constraint of being positive semidefinite, ensures that the problem is bounded from below.

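A minimal sketch of the dual QP in cvxpy (assumed installed), reusing the toy data from the primal sketch; variable names are my own:

```python
import numpy as np
import cvxpy as cp

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m = len(y)

Z = y[:, None] * X
G = Z @ Z.T                    # G_ij = y_i y_j x_i^T x_j, PSD by construction

alpha = cp.Variable(m)
objective = cp.Maximize(cp.sum(alpha) - 0.5 * cp.quad_form(alpha, cp.psd_wrap(G)))
cp.Problem(objective, [alpha >= 0, y @ alpha == 0]).solve()

a = alpha.value
theta = (a * y) @ X            # theta = sum_i alpha_i y_i x_i
support = a > 1e-6             # support vectors are the points with alpha_i > 0
theta0 = float(np.mean(y[support] - X[support] @ theta))
```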

SLIDE 22

In addition, an optimal solution satisfies the KKT conditions on (OPT3):

$$y_i(\theta^T x_i + \theta_0) \ge 1 \qquad \alpha_i \left[ y_i(\theta^T x_i + \theta_0) - 1 \right] = 0 \quad \forall i$$

From these we can see that:

◮ if $\alpha_i > 0$, then $y_i(\theta^T x_i + \theta_0) = 1$ ($x_i$ is on the margin boundary)
◮ if $y_i(\theta^T x_i + \theta_0) > 1$, then $x_i$ is not on the boundary and $\alpha_i = 0$


SLIDE 23

Points where $y_i(\theta^T x_i + \theta_0) = 1$ are the support vectors.


SLIDE 24

Prediction

For a new point $x$, predict by:

$$h(\theta, x) = f(\theta^T x + \theta_0) = \operatorname{sign}(\theta^T x + \theta_0) = \operatorname{sign}\left( \left( \sum_{i=1}^m \alpha_i y_i x_i \right)^T x + \theta_0 \right) = \operatorname{sign}\left( \sum_{i=1}^m \alpha_i y_i \langle x_i, x \rangle + \theta_0 \right)$$

By the KKT conditions, most training data can be discarded after training: only the points that are support vectors need to be retained for this computation.

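A sketch of this prediction rule, keeping only the support vectors (names continue the dual sketch above):

```python
import numpy as np

def predict(alpha_sv, y_sv, X_sv, theta0, x):
    """sign( sum_i alpha_i y_i <x_i, x> + theta_0 ), summed over support vectors only."""
    z = np.sum(alpha_sv * y_sv * (X_sv @ x)) + theta0
    return 1 if z >= 0 else -1
```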

SLIDE 25

Intercept

We can derive $\theta_0$ by:

$$\theta_0 = -\frac{\max_{i: y_i = -1} \theta^T x_i + \min_{i: y_i = 1} \theta^T x_i}{2}$$

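In code the formula is a one-liner (with `X`, `y`, `theta` as in the earlier sketches):

```python
import numpy as np

# midpoint between the worst-scoring negative and worst-scoring positive example
theta0 = -(np.max(X[y == -1] @ theta) + np.min(X[y == 1] @ theta)) / 2
```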