SLIDE 1

DM825 Introduction to Machine Learning Lecture 9

Support Vector Machines

Marco Chiarandini

Department of Mathematics & Computer Science University of Southern Denmark

SLIDE 2

Overview

Support Vector Machines:

1. Functional and Geometric Margins
2. Optimal Margin Classifier
3. Lagrange Duality
4. Karush-Kuhn-Tucker Conditions
5. Solving the Optimal Margin
6. Kernels
7. Soft margins
8. SMO Algorithm

SLIDE 3

In This Lecture

1. Kernels
2. Soft margins
3. SMO Algorithm

SLIDE 4

Recap

$$
\begin{aligned}
\max_{\alpha}\quad & W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle \\
\text{s.t.}\quad & \alpha_i \ge 0 \quad \forall i = 1, \dots, m \\
& \sum_{i=1}^{m} \alpha_i y_i = 0
\end{aligned}
$$

At the optimum:

$$
\begin{aligned}
\theta &= \sum_{i=1}^{m} \alpha_i y_i x_i \\
y_i(\theta^T x_i + \theta_0) &\ge 1 \quad \forall i = 1, \dots, m \\
\alpha_i \left[ y_i(\theta^T x_i + \theta_0) - 1 \right] &= 0 \quad \forall i = 1, \dots, m
\end{aligned}
$$

Prediction:

$$
h(\theta, x) = \operatorname{sign}\left( \sum_{i=1}^{m} \alpha_i y_i \langle x_i, x \rangle + \theta_0 \right)
$$
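Note that both the optimization and the prediction involve the training inputs only through scalar products $\langle x_i, x_j \rangle$ and $\langle x_i, x \rangle$. A minimal numpy sketch of the prediction rule (hypothetical helper, not from the slides):

```python
import numpy as np

def dual_predict(alpha, y, X, theta0, x_new):
    """Dual SVM prediction: h(theta, x) = sign(sum_i alpha_i y_i <x_i, x> + theta_0).

    alpha : (m,) optimal dual variables (nonzero only for support vectors)
    y     : (m,) labels in {-1, +1};  X : (m, D) training inputs
    """
    scores = X @ x_new                    # <x_i, x_new> for every training point
    return np.sign(np.sum(alpha * y * scores) + theta0)
```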
SLIDE 5

Introduction

We saw:

1. $h(\theta, x)$: $\theta$ is fitted on the training data, which are then discarded (parametric method).
2. $k$-NN: the training data are kept and used during the prediction phase; a memory-based method (fast to train, slower to predict).
3. Locally weighted linear regression (see the sketch below):

$$
\theta = \arg\min_{\theta} \sum_i w_i \left( y_i - \theta^T x_i \right)^2, \qquad
w_i = \exp\left( -\frac{(x_i - x)^T (x_i - x)}{2\tau^2} \right)
$$

(a linear parametric method where predictions are based on a linear combination of kernel functions evaluated at the training data)
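A minimal sketch of item 3, refitting $\theta$ for each query point (numpy, hypothetical helper names):

```python
import numpy as np

def lwr_predict(X, y, x, tau):
    """Locally weighted linear regression: fit theta around the query x, then predict.

    X: (m, D) training inputs;  y: (m,) targets;  tau: kernel bandwidth.
    """
    diff = X - x                                        # x_i - x for every i
    w = np.exp(-np.sum(diff ** 2, axis=1) / (2 * tau ** 2))
    # argmin_theta sum_i w_i (y_i - theta^T x_i)^2 via sqrt-weighted least squares
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return theta @ x
```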

SLIDE 6

Outline

1. Kernels
2. Soft margins
3. SMO Algorithm

SLIDE 7

Kernels

With inputs $x_1, \dots, x_D$, if we want all polynomial terms up to degree 2:

$$
\phi(x) = \left( x_1^2, \; x_2^2, \; \dots, \; x_D^2, \; x_1 x_2, \; x_1 x_3, \; \dots, \; x_{D-1} x_D \right)^T
\qquad \binom{D}{2} \Rightarrow O(D^2) \text{ terms}
$$

For $D = 3$ (with a constant term and rescaled cross terms):

$$
\phi(x) = \left( 1, \; \sqrt{2}x_1, \; \sqrt{2}x_2, \; \sqrt{2}x_3, \; x_1^2, \; x_2^2, \; x_3^2, \; \sqrt{2}x_1x_2, \; \sqrt{2}x_1x_3, \; \sqrt{2}x_2x_3 \right)^T
$$

In SVM we need $\phi(x_i)^T \phi(x_j)$, i.e. $O(D^2)$ work, repeated $m^2$ times.

$$
\phi(x)^T \phi(z) = 1 + 2 \sum_{i=1}^{D} x_i z_i + \sum_{i=1}^{D} x_i^2 z_i^2 + 2 \sum_{i < j} x_i x_j z_i z_j
$$

Someone recognized that this is the same as $(1 + x^T z)^2$, which can be computed in $O(D)$. More generally, $k(x, z) = (1 + x^T z)^s$ is a kernel, and we may restrict ourselves to computing the kernel matrix.
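A quick numerical check (numpy sketch) that the explicit $D = 3$ feature map agrees with the $O(D)$ kernel shortcut:

```python
import numpy as np

def phi(x):
    """Degree-2 polynomial feature map for D = 3, as on the slide."""
    x1, x2, x3 = x
    s = np.sqrt(2)
    return np.array([1, s*x1, s*x2, s*x3,
                     x1**2, x2**2, x3**2,
                     s*x1*x2, s*x1*x3, s*x2*x3])

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])

explicit = phi(x) @ phi(z)     # O(D^2) features, then a dot product
shortcut = (1 + x @ z) ** 2    # O(D) kernel evaluation
assert np.isclose(explicit, shortcut)
```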

SLIDE 8

Kernels

For models with a fixed nonlinear feature space:

Definition (Kernel): $k(x, x') = \phi(x)^T \phi(x')$. It follows that $k(x, x') = k(x', x)$.

Kernel Trick: if we have an algorithm in which the input vector $x$ enters only in the form of scalar products, then we can replace the scalar product with some choice of kernel.

◮ This is our case with SVM: thanks to the dual formulation, both training and prediction can be done via scalar products.

◮ No need to define features.
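For instance, the `dual_predict` sketch from the recap becomes kernelized simply by replacing the scalar product with a kernel evaluation (again a hypothetical helper):

```python
import numpy as np

def kernel_dual_predict(alpha, y, X, theta0, x_new, k):
    """Kernel trick applied to dual prediction: <x_i, x> is replaced by k(x_i, x)."""
    scores = np.array([k(x_i, x_new) for x_i in X])
    return np.sign(np.sum(alpha * y * scores) + theta0)
```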

SLIDE 9

Constructing Kernels

It must hold that $k(x, x') = \phi(x)^T \phi(x')$ (a scalar product). Two approaches:

1. Define some basis functions $\phi(x)$:

$$
k(x, x') = \phi(x)^T \phi(x') = \sum_{i=1}^{D} \phi_i(x) \phi_i(x')
$$

2. Define the kernel directly, provided it is some scalar product in some (maybe infinite-dimensional) feature space, e.g.

$$
k(x, x') = (1 + x^T x')^2
$$

SLIDE 10

Constructing Kernels

Following approach 2:

Theorem (Mercer's Kernel): a necessary and sufficient condition for $k(\cdot, \cdot)$ to be a valid kernel is that the Gram matrix $K$, whose elements are $k(x_i, x_j)$, is positive semidefinite ($\forall x \in \mathbb{R}^n, \; x^T K x \ge 0$) for all choices of the set $\{x_i\}$.

Proof of symmetry: $K_{ij} = k(x_i, x_j) = \phi(x_i)^T \phi(x_j) = \phi(x_j)^T \phi(x_i) = K_{ji}$.
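Mercer's condition cannot be checked exhaustively, but a sketch like the following tests positive semidefiniteness of the Gram matrix for one sample of points via its eigenvalues:

```python
import numpy as np

def gram_matrix(kernel, X):
    """Gram matrix K with K[i, j] = k(x_i, x_j)."""
    m = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(m)] for i in range(m)])

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = gram_matrix(lambda a, b: (1 + a @ b) ** 2, X)
# Symmetric matrix => real eigenvalues; PSD <=> all eigenvalues >= 0
print(np.linalg.eigvalsh(K).min() >= -1e-9)
```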

SLIDE 11

Constructing Kernels

One easy way to construct kernels is by recombining known building blocks:

◮ Linear: $k(x, x') = x^T x'$

◮ Polynomial: $k(x, x') = (x^T x' + c)^s$

◮ Radial basis: $k(x, x') = \exp\left( -\|x - x'\|^2 / 2\sigma^2 \right)$ (has infinite dimensionality)

◮ Sigmoid: $k(x, x') = \tanh(\kappa \, x^T x' - \sigma)$
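A minimal sketch of these building blocks in code ($c$, $s$, $\sigma$, $\kappa$ are hyperparameters to choose):

```python
import numpy as np

def linear(x, z):
    return x @ z

def polynomial(x, z, c=1.0, s=2):
    return (x @ z + c) ** s

def rbf(x, z, sigma=1.0):
    # exp(-||x - z||^2 / (2 sigma^2)): infinite-dimensional feature space
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def sigmoid(x, z, kappa=1.0, sigma=0.0):
    return np.tanh(kappa * (x @ z) - sigma)
```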

SLIDE 12

Outline

1. Kernels
2. Soft margins
3. SMO Algorithm

SLIDE 13

Soft margins

What if data are not separable?


SLIDE 14

Soft margins

We allow some points to be on the wrong side and introduce slack variables $\xi = (\xi_1, \dots, \xi_m)$ in the formulation. The geometric margin becomes:

◮ $y_i(\theta^T x_i + \theta_0) > 0$ if predicted correctly

◮ $y_i(\theta^T x_i + \theta_0) > -\xi_i$ for the mispredicted points

In the formulation we modify $y_i(\theta^T x_i + \theta_0) > \gamma$ into $y_i(\theta^T x_i + \theta_0) > \gamma(1 - \xi_i)$ and include a regularization term to minimize:

$$
\text{(OPT)}: \quad \min_{\theta, \theta_0, \xi} \; \frac{1}{2} \|\theta\|^2 + C \sum_{i=1}^{m} \xi_i
$$
$$
\alpha_i: \quad 1 - \xi_i \le y_i(\theta^T x_i + \theta_0) \quad \forall i = 1, \dots, m
$$
$$
\mu_i: \quad \xi_i \ge 0 \quad \forall i = 1, \dots, m
$$

($\alpha_i$ and $\mu_i$ denote the Lagrange multipliers of the two constraint families.) This is still a convex optimization problem.
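With the functional margin normalized to $\gamma = 1$, the optimal slacks have the closed form $\xi_i = \max(0, 1 - y_i(\theta^T x_i + \theta_0))$; a numpy sketch:

```python
import numpy as np

def slacks(theta, theta0, X, y):
    """xi_i = max(0, 1 - y_i (theta^T x_i + theta0)).

    xi_i = 0 on or outside the margin, 0 < xi_i <= 1 inside the margin
    but correctly classified, xi_i > 1 for misclassified points.
    """
    margins = y * (X @ theta + theta0)
    return np.maximum(0.0, 1.0 - margins)
```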

SLIDE 15

$$
L(\theta, \theta_0, \xi, \alpha, \mu) = \frac{1}{2} \|\theta\|^2 + C \sum_{i=1}^{m} \xi_i
- \sum_{i=1}^{m} \alpha_i \left[ y_i(\theta^T x_i + \theta_0) - (1 - \xi_i) \right]
- \sum_{i=1}^{m} \mu_i \xi_i
$$

For fixed $\alpha, \mu$ we have the primal $L_P(\theta, \theta_0, \xi)$, which we minimize in $\theta, \theta_0, \xi$:

$$
\nabla_{\theta} L_P = 0 \;\Rightarrow\; \theta = \sum_{i=1}^{m} \alpha_i y_i x_i
$$
$$
\frac{\partial L_P}{\partial \theta_0} = 0 \;\Rightarrow\; 0 = \sum_{i=1}^{m} \alpha_i y_i
$$
$$
\frac{\partial L_P}{\partial \xi_i} = 0 \;\Rightarrow\; \alpha_i = C - \mu_i \quad \forall i
$$

Lagrange dual:

$$
L_D = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, x_i^T x_j
$$

SLIDE 16

$$
\begin{aligned}
\max \; L_D &= \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, x_i^T x_j && (1) \\
& 0 \le \alpha_i \le C && (2) \\
& \sum_{i=1}^{m} \alpha_i y_i = 0 && (3) \\
& \alpha_i \left[ y_i(x_i^T \theta + \theta_0) - (1 - \xi_i) \right] = 0 && (4) \\
& \mu_i \xi_i = 0 && (5) \\
& y_i(x_i^T \theta + \theta_0) - (1 - \xi_i) \ge 0 && (6) \\
& \mu_i \ge 0, \; \xi_i \ge 0 && (7)
\end{aligned}
$$

From (5) together with $\partial L_P / \partial \xi_i = 0$, the support vectors are:

◮ the points that lie on the edge of the margin ($\xi_i = 0$), hence $0 < \alpha_i < C$;

◮ the misclassified points ($\xi_i > 0$), which have $\alpha_i = C$.

The margin points can be used to solve (4) for $\theta_0$.
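For a margin point ($0 < \alpha_i < C$, so $\xi_i = 0$), condition (4) gives $y_i(\sum_j \alpha_j y_j x_j^T x_i + \theta_0) = 1$, hence $\theta_0 = y_i - \sum_j \alpha_j y_j x_j^T x_i$. A sketch that averages this over all margin points for numerical stability (hypothetical helper):

```python
import numpy as np

def intercept_from_margin_points(alpha, y, K, C, tol=1e-8):
    """Recover theta_0 from the margin support vectors (0 < alpha_i < C).

    K: (m, m) Gram matrix, K[i, j] = <x_i, x_j> (or a kernel evaluation).
    """
    on_margin = (alpha > tol) & (alpha < C - tol)
    decision = (alpha * y) @ K            # sum_j alpha_j y_j K[j, i] for every i
    return np.mean(y[on_margin] - decision[on_margin])
```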

SLIDE 17

Outline

1. Kernels
2. Soft margins
3. SMO Algorithm

SLIDE 18

Coordinate ascent

$$
\max_{\alpha} \; W(\alpha_1, \alpha_2, \dots, \alpha_m)
$$

repeat
    for $i = 1, \dots, m$ do
        $\alpha_i := \arg\max_{\hat{\alpha}_i} W(\alpha_1, \dots, \alpha_{i-1}, \hat{\alpha}_i, \alpha_{i+1}, \dots, \alpha_m)$
until convergence
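A generic numpy sketch of this scheme (`argmax_coordinate` is a hypothetical callback). Note that the SVM dual cannot be solved this way one coordinate at a time: the constraint $\sum_i \alpha_i y_i = 0$ determines each $\alpha_i$ from the others, which is why SMO (next slide) updates two variables at a time.

```python
import numpy as np

def coordinate_ascent(argmax_coordinate, alpha, sweeps=100, tol=1e-6):
    """Maximize W by exact maximization along one coordinate at a time.

    argmax_coordinate(i, alpha) must return the value of alpha_i that
    maximizes W with all other coordinates held fixed.
    """
    for _ in range(sweeps):
        old = alpha.copy()
        for i in range(len(alpha)):
            alpha[i] = argmax_coordinate(i, alpha)
        if np.max(np.abs(alpha - old)) < tol:   # convergence test
            break
    return alpha
```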

SLIDE 19

Sequential Minimal Optimization

$$
\max_{\alpha} \; W(\alpha_1, \alpha_2, \dots, \alpha_m) \qquad \text{s.t.} \quad \sum_{i=1}^{m} y_i \alpha_i = 0
$$

Fix all $\alpha$s except two and optimize over those two at a time:

repeat
    select $\alpha_i$ and $\alpha_j$ by some heuristic
    hold all $\alpha_l$, $l \ne i, j$, fixed and optimize $W(\alpha)$ in $\alpha_i, \alpha_j$
until convergence

For the pair $(\alpha_1, \alpha_2)$ the linear constraint gives

$$
\alpha_1 y_1 + \alpha_2 y_2 = -\sum_{i=3}^{m} \alpha_i y_i = \zeta \; (\text{const})
\;\Rightarrow\; \alpha_1 = \frac{\zeta - \alpha_2 y_2}{y_1}
$$
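A sketch of one analytic update of a pair, in the style of the simplified SMO variant from the CS229 course materials (an assumption; these slides do not spell out the update):

```python
import numpy as np

def smo_pair_update(alpha, y, K, theta0, C, i, j):
    """One analytic SMO step on (alpha_i, alpha_j); returns None if no progress.

    K is the Gram matrix; E_k = f(x_k) - y_k are the current prediction errors.
    """
    f = (alpha * y) @ K + theta0
    E_i, E_j = f[i] - y[i], f[j] - y[j]

    # Box [L, H] on alpha_j implied by 0 <= alpha <= C and the linear constraint
    if y[i] != y[j]:
        L, H = max(0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    if L == H:
        return None

    eta = 2 * K[i, j] - K[i, i] - K[j, j]   # curvature along the constraint line
    if eta >= 0:
        return None

    a_j = np.clip(alpha[j] - y[j] * (E_i - E_j) / eta, L, H)
    a_i = alpha[i] + y[i] * y[j] * (alpha[j] - a_j)   # keeps sum_k y_k alpha_k constant
    return a_i, a_j
```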

SLIDE 20

Example


SLIDE 21

SVM for K-Classes

1. Train $K$ SVMs; each SVM classifies one class against all the others.
2. Choose the indication of the SVM that makes the strongest prediction, i.e., the one for which the input point is furthest into the positive region (see the sketch below).
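A one-vs-rest decision sketch (hypothetical interface):

```python
import numpy as np

def one_vs_rest_predict(decision_functions, x):
    """decision_functions: K callables f_k(x) = theta_k^T x + theta0_k,
    each trained to separate class k from the rest. Returns the class
    whose SVM pushes x furthest into its positive region."""
    scores = np.array([f(x) for f in decision_functions])
    return int(np.argmax(scores))
```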

SLIDE 22

SVM for regression

With a quantitative response we try to fit as many points as possible within the margin; hence we change the objective function in (OPT3) into:

$$
\min \; \sum_{i=1}^{m} V\left( y_i - f(x_i) \right) + \frac{\lambda}{2} \|\theta\|^2
$$

with the $\epsilon$-insensitive loss

$$
V_{\epsilon}(r) =
\begin{cases}
0 & \text{if } |r| < \epsilon \\
|r| - \epsilon & \text{otherwise}
\end{cases}
$$
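The loss in code (a minimal numpy sketch): errors inside the $\epsilon$-tube are ignored, errors outside grow linearly.

```python
import numpy as np

def eps_insensitive(r, eps=0.1):
    """V_eps(r) = 0 if |r| < eps, |r| - eps otherwise."""
    return np.maximum(0.0, np.abs(r) - eps)
```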

SLIDE 23

SVM as Regularized Function
