
Kernel Machines

Steven J Zeil

Old Dominion Univ.

Fall 2010

Outline

1. Support Vector Machines
   • Optimal Separating HyperPlanes
   • Soft Margin HyperPlanes
2. Kernel Machines
   • Kernel Functions
   • Multi-Classes
   • Regression
   • Outlier Detection
   • Dimensionality Reduction


Seeking Separation

Recall the earlier discussion: the most specific hypothesis S is the tightest rectangle enclosing the positive examples.

prone to false negatives

The most general hypothesis G is the largest rectangle enclosing the positive examples but containing no negative examples.

prone to false positives

Perhaps we should choose something in between?


Support Vector Machines

  • Non-parametric, discriminant-based
  • Defines the discriminant as a combination of support vectors
  • A convex optimization problem with a unique solution


Separating HyperPlanes

In our earlier linear discrimination techniques, we sought any separating hyperplane.

  • Some of the points could come arbitrarily close to the border.
  • Distance from the plane was a measure of confidence.
  • If the classes were not linearly separable, too bad.


Support Vector Machines: Margins

SVMs seek the plane that maximizes the margin between the plane and the closest instances of each class.

  • After training, we do not insist on the margin,
  • but distance from the plane is still an indication of confidence.
  • With minor modification, this can be extended to classes that are not linearly separable.


Defining the Margin

$X = \{x^t, r^t\}$ where $r^t = +1$ if $x^t \in C_1$ and $r^t = -1$ if $x^t \in C_2$.

Find $w$ and $w_0$ such that

  • $w^T x^t + w_0 \ge +1$ for $r^t = +1$
  • $w^T x^t + w_0 \le -1$ for $r^t = -1$
  • or, equivalently, $r^t(w^T x^t + w_0) \ge +1$


Maximizing the Margin

The margin is the distance from the discriminant to the closest instances on either side.

Distance of $x^t$ to the hyperplane: $\dfrac{|w^T x^t + w_0|}{\|w\|}$

Let $\rho$ denote the margin size: $\forall t,\ \dfrac{|w^T x^t + w_0|}{\|w\|} \ge \rho$

If we try to maximize $\rho$ directly, there are infinitely many solutions formed by simply rescaling $w$.

Fix $\rho \|w\| = 1$, and minimize $\|w\|$ to maximize $\rho$:

Minimize $\frac{1}{2}\|w\|^2$ subject to $\forall t,\ r^t(w^T x^t + w_0) \ge +1$
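As a quick check, the distance and constraint formulas above can be computed directly; this is a minimal numpy sketch in which the hyperplane ($w$, $w_0$) and the data points are made-up values for illustration only:

```python
import numpy as np

# Illustrative hyperplane g(x) = w^T x + w0 (made-up values)
w = np.array([2.0, 1.0])
w0 = -1.0

X = np.array([[1.0, 2.0], [0.0, 0.5], [2.0, -1.0]])  # inputs x^t
r = np.array([+1, -1, +1])                            # labels r^t

# Distance of each x^t to the hyperplane: |w^T x^t + w0| / ||w||
dist = np.abs(X @ w + w0) / np.linalg.norm(w)
print(dist, dist.min())        # the margin rho is the smallest distance

# The rescaled constraint r^t (w^T x^t + w0) >= +1, point by point
print(r * (X @ w + w0) >= 1)   # False marks a margin violation
```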


Margin

Circled inputs are the ones that determine the border

After training, we could forget about the others. In fact, we might be able to eliminate some of the others before training.


Derivation (1/3)

Minimize $\frac{1}{2}\|w\|^2$ subject to $\forall t,\ r^t(w^T x^t + w_0) \ge +1$.

$$L_p = \frac{1}{2}\|w\|^2 - \sum_{t=1}^{N} \alpha^t \left[ r^t(w^T x^t + w_0) - 1 \right] = \frac{1}{2}\|w\|^2 - \sum_{t=1}^{N} \alpha^t r^t(w^T x^t + w_0) + \sum_{t=1}^{N} \alpha^t$$

$$\frac{\partial L_p}{\partial w} = 0 \Rightarrow w = \sum_{t=1}^{N} \alpha^t r^t x^t$$


Derivation (2/3)

$$\frac{\partial L_p}{\partial w} = 0 \Rightarrow w = \sum_{t=1}^{N} \alpha^t r^t x^t$$

$$\frac{\partial L_p}{\partial w_0} = 0 \Rightarrow \sum_{t=1}^{N} \alpha^t r^t = 0$$

Minimizing $L_p$ is equivalent to maximizing the dual

$$L_d = \frac{1}{2} w^T w - w^T \sum_t \alpha^t r^t x^t - w_0 \sum_t \alpha^t r^t + \sum_t \alpha^t$$

subject to $\sum_t \alpha^t r^t = 0$ and $\alpha^t \ge 0$.


Derivation (3/3)

$$L_d = \frac{1}{2} w^T w - w^T \sum_t \alpha^t r^t x^t - w_0 \sum_t \alpha^t r^t + \sum_t \alpha^t = -\frac{1}{2} w^T w + \sum_t \alpha^t = -\frac{1}{2} \sum_t \sum_s \alpha^t \alpha^s r^t r^s (x^t)^T x^s + \sum_t \alpha^t$$

subject to $\sum_t \alpha^t r^t = 0$ and $\alpha^t \ge 0$.

Solve numerically. Most $\alpha^t$ are 0. The small number with $\alpha^t > 0$ are the support vectors:

$$w = \sum_{t=1}^{N} \alpha^t r^t x^t$$
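The dual is a quadratic program that off-the-shelf solvers handle. A minimal numerical sketch, assuming scipy is available and using its generic SLSQP solver rather than a dedicated QP or SMO solver (the tiny separable data set is made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Tiny separable 2-D data set (made-up values)
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
r = np.array([+1.0, +1.0, -1.0, -1.0])
N = len(r)

# Matrix of r^t r^s (x^t)^T x^s terms from the dual
Q = (r[:, None] * r[None, :]) * (X @ X.T)

# Maximize L_d  <=>  minimize -L_d = 1/2 a^T Q a - sum(a)
def neg_dual(a):
    return 0.5 * a @ Q @ a - a.sum()

cons = {'type': 'eq', 'fun': lambda a: a @ r}  # sum_t alpha^t r^t = 0
bnds = [(0, None)] * N                          # alpha^t >= 0

alpha = minimize(neg_dual, np.zeros(N), bounds=bnds, constraints=cons).x

sv = alpha > 1e-6                # support vectors: alpha^t > 0
w = (alpha * r) @ X              # w = sum_t alpha^t r^t x^t
w0 = np.mean(r[sv] - X[sv] @ w)  # average over the support vectors
print(alpha.round(3), w, w0)
```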


Support Vectors

Circled inputs are the support vectors:

$$w = \sum_{t=1}^{N} \alpha^t r^t x^t$$

Compute $w_0$ as the average, over the support vectors, of $w_0 = r^t - w^T x^t$.


Demo

Applet on http://www.csie.ntu.edu.tw/~cjlin/libsvm/


Soft Margin HyperPlanes

Suppose the classes are almost, but not quite, linearly separable:

$$r^t(w^T x^t + w_0) \ge 1 - \xi^t$$

The $\xi^t \ge 0$ are slack variables storing the deviation from the margin:

  • $\xi^t = 0$ means $x^t$ is more than 1 away from the hyperplane
  • $0 < \xi^t < 1$ means $x^t$ is within the margin, but correctly classified
  • $\xi^t \ge 1$ means $x^t$ is misclassified

$\sum_t \xi^t$ is a measure of error. Add it as a penalty term:

$$L_p = \frac{1}{2}\|w\|^2 + C \sum_t \xi^t$$

$C$ is a penalty factor that trades off complexity against data misfit.


Soft Margin Derivation

$$L_p = \frac{1}{2}\|w\|^2 + C \sum_t \xi^t$$

$C$ is a penalty factor that trades off complexity against data misfit.

This leads to the same numerical optimization problem with the new constraint $0 \le \alpha^t \le C$.
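A sketch of the complexity/misfit trade-off in practice, using scikit-learn's SVC (which wraps LIBSVM, the library whose applet is demoed on these slides); the overlapping toy data are made up for illustration:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0], [1.5, 1.5]])
r = np.array([+1, +1, -1, -1, -1])   # the last point makes the classes overlap

for C in (0.1, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, r)
    # Larger C penalizes slack more heavily: fewer margin violations,
    # but a more complex fit (smaller margin)
    print(C, clf.support_.size, clf.coef_, clf.intercept_)
```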


Soft Margin Example

Applet on http://www.csie.ntu.edu.tw/~cjlin/libsvm/


Hinge Loss

$$L_{\text{hinge}}(y^t, r^t) = \begin{cases} 0 & \text{if } y^t r^t \ge 1 \\ 1 - y^t r^t & \text{otherwise} \end{cases}$$

Its behavior in $0..1$ makes it more robust than 0/1 and squared error. It is close to cross-entropy over much of its range.
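A small numpy comparison of the three losses over a range of $y^t r^t$ values (illustrative only):

```python
import numpy as np

yr = np.linspace(-2, 2, 9)            # y^t r^t values

hinge = np.maximum(0.0, 1.0 - yr)     # 0 once y*r >= 1, else 1 - y*r
zero_one = (yr < 0).astype(float)     # 0/1 misclassification loss
squared = (1.0 - yr) ** 2             # squared error against target 1

for row in zip(yr, hinge, zero_one, squared):
    print('%6.2f  %6.2f  %4.1f  %6.2f' % row)
```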


Non-Linear SVM

Replace the inputs $x$ by a sequence of basis functions $z = \phi(x)$.

Linear SVM:

$$w = \sum_t \alpha^t r^t x^t \qquad g(x) = w^T x = \sum_t \alpha^t r^t (x^t)^T x$$

Kernel SVM:

$$w = \sum_t \alpha^t r^t z^t = \sum_t \alpha^t r^t \phi(x^t) \qquad g(x) = w^T \phi(x) = \sum_t \alpha^t r^t \phi(x^t)^T \phi(x)$$


The Kernel

$$g(x) = \sum_t \alpha^t r^t (x^t)^T x$$

$(x^t)^T x$ is a measure of similarity between $x$ and a support vector.

$$g(x) = \sum_t \alpha^t r^t \phi(x^t)^T \phi(x)$$

$\phi(x^t)^T \phi(x)$ can be seen as a similarity measure in the non-linear basis space.

To generalize, let $K(x^t, x) = \phi(x^t)^T \phi(x)$:

$$g(x) = \sum_t \alpha^t r^t K(x^t, x)$$

$K$ is a kernel function.
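This expansion can be checked against a fitted model: in scikit-learn's SVC, `dual_coef_` stores the products $\alpha^t r^t$ for the support vectors, so $g(x)$ can be rebuilt by hand (a sketch with made-up data; the RBF kernel below matches the fitted one):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0]])
r = np.array([-1, -1, +1, +1])
clf = SVC(kernel='rbf', gamma=0.5).fit(X, r)

def K(xt, x, gamma=0.5):
    return np.exp(-gamma * np.sum((xt - x) ** 2))

x_new = np.array([1.5, 0.5])
# g(x) = sum_t alpha^t r^t K(x^t, x) + w0, summed over support vectors
g = clf.intercept_[0] + sum(
    a * K(xt, x_new) for a, xt in zip(clf.dual_coef_[0], clf.support_vectors_))
print(g, clf.decision_function([x_new]))  # the two should agree
```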


Polynomial Kernels

$$K_q(x^t, x) = \left( (x^t)^T x + 1 \right)^q$$

E.g.,

$$K(x, y) = (x^T y + 1)^2 = (x_1 y_1 + x_2 y_2 + 1)^2 = 1 + 2 x_1 y_1 + 2 x_2 y_2 + 2 x_1 x_2 y_1 y_2 + x_1^2 y_1^2 + x_2^2 y_2^2$$

$$\phi(x) = \left[ 1, \sqrt{2} x_1, \sqrt{2} x_2, \sqrt{2} x_1 x_2, x_1^2, x_2^2 \right]$$

(FWIW)
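A quick numerical confirmation (made-up vectors) that this kernel equals the inner product in the expanded basis:

```python
import numpy as np

def K(x, y):
    # Degree-2 polynomial kernel (x^T y + 1)^2
    return (x @ y + 1) ** 2

def phi(x):
    # The basis expansion listed above
    x1, x2 = x
    return np.array([1, np.sqrt(2)*x1, np.sqrt(2)*x2,
                     np.sqrt(2)*x1*x2, x1**2, x2**2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
print(K(x, y), phi(x) @ phi(y))   # both print 4.0
```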


Radial-basis Kernels

$$K(x^t, x) = \exp\left( -\frac{\|x^t - x\|^2}{2 s^2} \right)$$

Other options include sigmoidal (approximated as tanh).
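In code the radial-basis kernel is a one-liner (numpy sketch; the spread parameter name `s` follows the formula above):

```python
import numpy as np

def rbf(xt, x, s=1.0):
    # K(x^t, x) = exp(-||x^t - x||^2 / (2 s^2))
    return np.exp(-np.sum((xt - x) ** 2) / (2 * s ** 2))

print(rbf(np.array([0.0, 0.0]), np.array([1.0, 1.0])))  # exp(-1)
```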


Selecting Kernels

Kernels can be customized to the application. Choose appropriate measures of similarity:

  • Bag of words: normalized cosines between vocabulary vectors
  • Genetics: edit distance between strings
  • Graphs: length of the shortest path between nodes, or number of connecting paths

For input sets with very large dimension, it may be cheaper to pre-compute and save the matrix of kernel values (the Gram matrix) rather than keeping all the inputs available.
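Scikit-learn supports this route directly via `kernel='precomputed'`; a sketch with a made-up one-dimensional data set and the degree-2 polynomial kernel:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0], [1.0], [2.0], [3.0]])
r = np.array([-1, -1, +1, +1])

gram = (X @ X.T + 1) ** 2              # Gram matrix over the training inputs
clf = SVC(kernel='precomputed').fit(gram, r)

X_new = np.array([[2.5]])
gram_new = (X_new @ X.T + 1) ** 2      # kernel values against training inputs
print(clf.predict(gram_new))
```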


Multi-Classes

1 versus all

  • K separate N-variable problems (see the sketch after this slide)

Pairwise separation

  • K(K−1)/2 separate N-variable problems

Single multiclass optimization:

Minimize
$$\frac{1}{2} \sum_{i=1}^{K} \|w_i\|^2 + C \sum_i \sum_t \xi_i^t$$
subject to
$$w_{z^t}^T x^t + w_{z^t 0} \ge w_i^T x^t + w_{i0} + 2 - \xi_i^t, \quad \forall i \ne z^t$$
where $z^t$ is the index of the class of $x^t$.

  • One $K \cdot N$-variable problem
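A sketch of the 1-versus-all route using scikit-learn's wrapper (SVC on its own trains pairwise classifiers internally; the three-class toy data are made up):

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)
z = np.array([0, 0, 1, 1, 2, 2])   # class index z^t for each x^t

ovr = OneVsRestClassifier(SVC(kernel='linear')).fit(X, z)
print(len(ovr.estimators_))        # K = 3 separate problems
print(ovr.predict([[5.0, 5.5], [9.0, 0.5]]))
```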


Regression

Linear regression fits $f(x) = w^T x + w_0$. Instead of using the usual error measure $e_2(r^t, f(x^t)) = [r^t - f(x^t)]^2$, we use a linear, $\epsilon$-sensitive error: $e_\epsilon(r^t, f(x^t)) = \max(0, |r^t - f(x^t)| - \epsilon)$. Errors of less than $\epsilon$ are tolerated, and larger errors have only a linear effect, making the fit more tolerant of noise.
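A sketch of $\epsilon$-sensitive regression via scikit-learn's SVR (made-up noisy 1-D data; `epsilon` sets the tolerated tube around the fit):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 20).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0 + rng.normal(0, 0.1, 20)   # noisy line

svr = SVR(kernel='linear', epsilon=0.2, C=10.0).fit(X, y)
# Only points outside the epsilon-tube become support vectors
print(svr.support_.size, svr.predict([[2.5]]))
```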


Linear Regression

$$f(x) = w^T x + w_0 = \sum_t (\alpha_+^t - \alpha_-^t)(x^t)^T x + w_0$$

The function is a combination of a limited set of support vectors.


Kernel Regression

Again, we can replace the inputs by a basis function, eventually leading to a kernel function as a similarity measure. (Figure: polynomial kernel regression.)


Gaussian Kernel Regression


One-Class Kernel Machines

Consider a sphere with center $a$ and radius $R$.

Minimize $R^2 + C \sum_t \xi^t$ subject to

$$\|x^t - a\|^2 \le R^2 + \xi^t$$

An input $x$ is an outlier if it lies outside the sphere.
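A sketch of one-class outlier detection with scikit-learn's OneClassSVM; note that it uses a $\nu$ parameterization rather than the $C$ above, and the data are made up:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X = rng.normal(0, 1, (100, 2))        # inliers clustered around the origin

oc = OneClassSVM(kernel='rbf', nu=0.05).fit(X)
# Expect +1 (inside) for the first point, -1 (outlier) for the second
print(oc.predict([[0.1, -0.2], [6.0, 6.0]]))
```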


Gaussian Kernel One-Class


Dimensionality Reduction

Kernel PCA does PCA on the kernel matrix $\phi^T \phi$ instead of on the direct inputs.

For high-dimensional input spaces, we can work on an $N \times N$ problem instead of $D \times D$.
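A sketch with scikit-learn's KernelPCA (made-up data with N=50 points in D=10 dimensions); the eigenproblem is solved on the N×N kernel matrix rather than the D×D covariance:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(2)
X = rng.normal(0, 1, (50, 10))        # N=50 points, D=10 dimensions

kpca = KernelPCA(n_components=2, kernel='rbf', gamma=0.1)
Z = kpca.fit_transform(X)
print(Z.shape)                         # (50, 2)
```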
