
Kernel Machines. Steven J Zeil, Old Dominion Univ., Fall 2010.



  1. Kernel Machines. Steven J Zeil, Old Dominion Univ., Fall 2010.

  2. Support Vector Machines: Optimal Separating Hyperplanes, Soft Margin Hyperplanes. Kernel Machines: Kernel Functions, Multi-Classes, Regression, Outlier Detection, Dimensionality Reduction.

  3. Seeking Separation. Recall the earlier discussion: the most specific hypothesis S is the tightest rectangle enclosing the positive examples (prone to false negatives). The most general hypothesis G is the largest rectangle enclosing the positive examples but containing no negative examples (prone to false positives). Perhaps we should choose something in between?

  4. Support Vector Machines. Non-parametric and discriminant-based: the discriminant is defined as a combination of support vectors, and training is a convex optimization problem with a unique solution.

  5. Separating Hyperplanes. In our earlier linear discrimination techniques, we sought any separating hyperplane. Some of the points could come arbitrarily close to the border. Distance from the plane was a measure of confidence. If the classes were not linearly separable, too bad.

  6. Support Vector Machines: Margins. SVMs seek the plane that maximizes the margin between the plane and the closest instances of each class. At test time we do not insist on the margin, but distance from the plane is still an indication of confidence. With minor modification, this extends to classes that are not linearly separable.

  7. Defining the Margin. $X = \{\mathbf{x}^t, r^t\}$ where $r^t = +1$ if $\mathbf{x}^t \in C_1$ and $r^t = -1$ if $\mathbf{x}^t \in C_2$. Find $\mathbf{w}$ and $w_0$ such that $\mathbf{w}^T\mathbf{x}^t + w_0 \ge +1$ for $r^t = +1$ and $\mathbf{w}^T\mathbf{x}^t + w_0 \le -1$ for $r^t = -1$, or, equivalently, $r^t(\mathbf{w}^T\mathbf{x}^t + w_0) \ge +1$.

  8. Maximizing the Margin. The margin is the distance from the discriminant to the closest instances on either side. The distance of $\mathbf{x}$ to the hyperplane is $|\mathbf{w}^T\mathbf{x} + w_0| / \|\mathbf{w}\|$. Let $\rho$ denote the margin, so that $\forall t,\; |\mathbf{w}^T\mathbf{x}^t + w_0| / \|\mathbf{w}\| \ge \rho$. If we try to maximize $\rho$ directly, there are infinitely many solutions formed by simply rescaling $\mathbf{w}$. Instead, fix $\rho\|\mathbf{w}\| = 1$ and minimize $\|\mathbf{w}\|$ to maximize $\rho$: minimize $\frac{1}{2}\|\mathbf{w}\|^2$ subject to $\forall t,\; r^t(\mathbf{w}^T\mathbf{x}^t + w_0) \ge +1$.
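To make this formulation concrete, here is a minimal sketch, assuming scikit-learn and NumPy and a small hypothetical two-class dataset: a linear SVM with a very large C approximates the hard-margin problem, and the resulting margin is $1/\|\mathbf{w}\|$.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data: two small clusters in 2-D
X = np.array([[1.0, 1.0], [2.0, 1.5], [2.5, 2.0],
              [-1.0, -1.0], [-2.0, -1.5], [-2.5, -2.0]])
r = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin problem:
# minimize (1/2)||w||^2 subject to r^t (w^T x^t + w_0) >= 1
clf = SVC(kernel="linear", C=1e6).fit(X, r)

w, w0 = clf.coef_[0], clf.intercept_[0]
rho = 1.0 / np.linalg.norm(w)    # distance from the plane to the closest instances
print("w =", w, "w0 =", w0, "margin rho =", rho)

# Every instance should satisfy r^t (w^T x^t + w0) >= 1 (up to numerical tolerance)
print(r * (X @ w + w0))
```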

  9. Margin. Circled inputs are the ones that determine the border.

  10. Margin. Circled inputs are the ones that determine the border. After training, we could forget about the others.

  11. Margin. Circled inputs are the ones that determine the border. After training, we could forget about the others. In fact, we might be able to eliminate some of the others before training.

  12. Derivation (1/3). Minimize $\frac{1}{2}\|\mathbf{w}\|^2$ subject to $\forall t,\; r^t(\mathbf{w}^T\mathbf{x}^t + w_0) \ge +1$. The primal Lagrangian is
      $$L_p = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{t=1}^N \alpha^t\left[r^t(\mathbf{w}^T\mathbf{x}^t + w_0) - 1\right] = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{t=1}^N \alpha^t r^t(\mathbf{w}^T\mathbf{x}^t + w_0) + \sum_{t=1}^N \alpha^t.$$
      Setting $\frac{\partial L_p}{\partial\mathbf{w}} = 0$ gives $\mathbf{w} = \sum_{t=1}^N \alpha^t r^t \mathbf{x}^t$.
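The gradient step is standard but worth writing out term by term (this intermediate step is not on the slide):

```latex
\frac{\partial L_p}{\partial \mathbf{w}}
  = \frac{\partial}{\partial \mathbf{w}}\left[\frac{1}{2}\mathbf{w}^T\mathbf{w}
      - \sum_{t=1}^N \alpha^t r^t(\mathbf{w}^T\mathbf{x}^t + w_0)
      + \sum_{t=1}^N \alpha^t\right]
  = \mathbf{w} - \sum_{t=1}^N \alpha^t r^t \mathbf{x}^t = 0
  \quad\Longrightarrow\quad
  \mathbf{w} = \sum_{t=1}^N \alpha^t r^t \mathbf{x}^t
```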

  13. Derivation (2/3). $\frac{\partial L_p}{\partial\mathbf{w}} = 0 \Rightarrow \mathbf{w} = \sum_{t=1}^N \alpha^t r^t \mathbf{x}^t$, and $\frac{\partial L_p}{\partial w_0} = 0 \Rightarrow \sum_{t=1}^N \alpha^t r^t = 0$. Solving the primal is then equivalent to maximizing the dual
      $$L_d = \frac{1}{2}\mathbf{w}^T\mathbf{w} - \mathbf{w}^T\sum_t \alpha^t r^t \mathbf{x}^t - w_0\sum_t \alpha^t r^t + \sum_t \alpha^t$$
      subject to $\sum_t \alpha^t r^t = 0$ and $\alpha^t \ge 0$.

  14. Derivation (3/3).
      $$L_d = \frac{1}{2}\mathbf{w}^T\mathbf{w} - \mathbf{w}^T\sum_t \alpha^t r^t \mathbf{x}^t - w_0\sum_t \alpha^t r^t + \sum_t \alpha^t = -\frac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_t \alpha^t = -\frac{1}{2}\sum_t\sum_s \alpha^t \alpha^s r^t r^s (\mathbf{x}^t)^T\mathbf{x}^s + \sum_t \alpha^t,$$
      subject to $\sum_t \alpha^t r^t = 0$ and $\alpha^t \ge 0$. Solve numerically. Most $\alpha^t$ are 0; the small number with $\alpha^t > 0$ are the support vectors: $\mathbf{w} = \sum_{t=1}^N \alpha^t r^t \mathbf{x}^t$.
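As a rough illustration of "solve numerically", the following sketch (assuming NumPy and SciPy, with a tiny hypothetical 2-D dataset) maximizes the dual with a general-purpose constrained optimizer rather than a dedicated QP or SMO solver:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical, linearly separable toy data in 2-D
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -1.5]])
r = np.array([1.0, 1.0, -1.0, -1.0])

# G[t, s] = r^t r^s (x^t . x^s), so L_d(alpha) = sum(alpha) - 0.5 * alpha' G alpha
G = (r[:, None] * X) @ (r[:, None] * X).T

def neg_dual(alpha):
    # Negate L_d so that a minimizer maximizes the dual
    return 0.5 * alpha @ G @ alpha - alpha.sum()

cons = {"type": "eq", "fun": lambda a: a @ r}   # sum_t alpha^t r^t = 0
bnds = [(0.0, None)] * len(r)                   # alpha^t >= 0 (hard margin)
res = minimize(neg_dual, np.zeros(len(r)), bounds=bnds, constraints=cons)

alpha = res.x
w = (alpha * r) @ X                             # w = sum_t alpha^t r^t x^t
sv = alpha > 1e-6                               # support vectors: alpha^t > 0
w0 = np.mean(r[sv] - X[sv] @ w)                 # average of r^t - w^T x^t over the SVs
print("alpha =", alpha.round(3), " w =", w, " w0 =", w0)
```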

  15. Support Vectors. Circled inputs are the support vectors. $\mathbf{w} = \sum_{t=1}^N \alpha^t r^t \mathbf{x}^t$. Compute $w_0$ as the average, over the support vectors, of $w_0 = r^t - \mathbf{w}^T\mathbf{x}^t$.
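In scikit-learn terms (a sketch on a hypothetical toy dataset), a fitted model exposes exactly these quantities: the support vectors, the products $\alpha^t r^t$ for them (dual_coef_), and $w_0$ (intercept_), from which $\mathbf{w}$ can be reconstructed.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data
X = np.array([[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -1.5]])
r = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, r)

# dual_coef_[0] holds alpha^t r^t for the support vectors only
alpha_r = clf.dual_coef_[0]
sv = clf.support_vectors_

w = alpha_r @ sv                      # w = sum_t alpha^t r^t x^t over the support vectors
print(np.allclose(w, clf.coef_[0]))   # matches the coefficient vector reported by sklearn
print("w0 =", clf.intercept_[0])
```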

  16. Demo. Applet on http://www.csie.ntu.edu.tw/~cjlin/libsvm/

  17. Soft Margin Hyperplanes. Suppose the classes are almost, but not quite, linearly separable: require $r^t(\mathbf{w}^T\mathbf{x}^t + w_0) \ge 1 - \xi^t$. The $\xi^t \ge 0$ are slack variables storing the deviation from the margin: $\xi^t = 0$ means $\mathbf{x}^t$ is at least 1 away from the hyperplane; $0 < \xi^t < 1$ means $\mathbf{x}^t$ is within the margin but correctly classified; $\xi^t \ge 1$ means $\mathbf{x}^t$ is misclassified. $\sum_t \xi^t$ is a measure of error, added as a penalty term: $L_p = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_t \xi^t$, where $C$ is a penalty factor that trades off complexity against data misfit.
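A small sketch of the slack variables, assuming scikit-learn and a hypothetical, slightly overlapping dataset: after fitting a soft-margin linear SVM, $\xi^t = \max(0,\, 1 - r^t(\mathbf{w}^T\mathbf{x}^t + w_0))$ can be computed directly and interpreted as on the slide.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, slightly overlapping two-class data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, (20, 2)), rng.normal(-1.0, 1.0, (20, 2))])
r = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, r)    # C trades off complexity vs. misfit
w, w0 = clf.coef_[0], clf.intercept_[0]

xi = np.maximum(0.0, 1.0 - r * (X @ w + w0))   # slack for each instance
print("on or beyond the margin:", np.sum(xi <= 1e-6))
print("inside the margin, correctly classified:", np.sum((xi > 1e-6) & (xi < 1)))
print("misclassified:", np.sum(xi >= 1))
```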

  18. Soft Margin Derivation. $L_p = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_t \xi^t$, where $C$ is a penalty factor that trades off complexity against data misfit. This leads to the same numerical optimization problem with the new constraint $0 \le \alpha^t \le C$.

  19. Soft Margin Example. Applet on http://www.csie.ntu.edu.tw/~cjlin/libsvm/

  20. Hinge Loss. $L_{\text{hinge}}(y^t, r^t) = 0$ if $y^t r^t \ge 1$, and $1 - y^t r^t$ otherwise. Its behavior in the interval $0..1$ makes it more robust than the 0/1 and squared-error losses, and it is close to cross-entropy over much of its range.
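A direct NumPy rendering of the hinge loss (a sketch; following the slide's notation, y is the discriminant output and r the $\pm 1$ label):

```python
import numpy as np

def hinge_loss(y, r):
    """L_hinge(y, r) = 0 if y*r >= 1, else 1 - y*r, applied element-wise."""
    return np.maximum(0.0, 1.0 - y * r)

# Example: discriminant outputs y for three instances, all labeled +1
y = np.array([2.0, 0.3, -0.5])
r = np.array([1, 1, 1])
print(hinge_loss(y, r))   # [0.0, 0.7, 1.5]
```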

  21. Non-Linear SVM. Replace the inputs $\mathbf{x}$ by a set of basis functions, $\mathbf{z} = \boldsymbol{\phi}(\mathbf{x})$.
      Linear SVM: $\mathbf{w} = \sum_t \alpha^t r^t \mathbf{x}^t$ and $g(\mathbf{x}) = \mathbf{w}^T\mathbf{x} = \sum_t \alpha^t r^t (\mathbf{x}^t)^T\mathbf{x}$.
      Kernel SVM: $\mathbf{w} = \sum_t \alpha^t r^t \mathbf{z}^t = \sum_t \alpha^t r^t \boldsymbol{\phi}(\mathbf{x}^t)$ and $g(\mathbf{x}) = \mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}) = \sum_t \alpha^t r^t \boldsymbol{\phi}(\mathbf{x}^t)^T\boldsymbol{\phi}(\mathbf{x})$.

  22. The Kernel. $g(\mathbf{x}) = \sum_t \alpha^t r^t (\mathbf{x}^t)^T\mathbf{x}$. Here $\mathbf{x}^T\mathbf{x}^t$ is a measure of similarity between $\mathbf{x}$ and a support vector.

  23. The Kernel. $g(\mathbf{x}) = \sum_t \alpha^t r^t (\mathbf{x}^t)^T\mathbf{x}$. Here $\mathbf{x}^T\mathbf{x}^t$ is a measure of similarity between $\mathbf{x}$ and a support vector. In the non-linear case, $g(\mathbf{x}) = \sum_t \alpha^t r^t \boldsymbol{\phi}(\mathbf{x}^t)^T\boldsymbol{\phi}(\mathbf{x})$, and $\boldsymbol{\phi}(\mathbf{x}^t)^T\boldsymbol{\phi}(\mathbf{x})$ can be seen as a similarity measure in the non-linear basis space.

  24. The Kernel. $g(\mathbf{x}) = \sum_t \alpha^t r^t (\mathbf{x}^t)^T\mathbf{x}$. Here $\mathbf{x}^T\mathbf{x}^t$ is a measure of similarity between $\mathbf{x}$ and a support vector. In the non-linear case, $g(\mathbf{x}) = \sum_t \alpha^t r^t \boldsymbol{\phi}(\mathbf{x}^t)^T\boldsymbol{\phi}(\mathbf{x})$, and $\boldsymbol{\phi}(\mathbf{x}^t)^T\boldsymbol{\phi}(\mathbf{x})$ can be seen as a similarity measure in the non-linear basis space. To generalize, let $K(\mathbf{x}^t, \mathbf{x}) = \boldsymbol{\phi}(\mathbf{x}^t)^T\boldsymbol{\phi}(\mathbf{x})$, so that $g(\mathbf{x}) = \sum_t \alpha^t r^t K(\mathbf{x}^t, \mathbf{x})$.

  25. The Kernel. $g(\mathbf{x}) = \sum_t \alpha^t r^t (\mathbf{x}^t)^T\mathbf{x}$. Here $\mathbf{x}^T\mathbf{x}^t$ is a measure of similarity between $\mathbf{x}$ and a support vector. In the non-linear case, $g(\mathbf{x}) = \sum_t \alpha^t r^t \boldsymbol{\phi}(\mathbf{x}^t)^T\boldsymbol{\phi}(\mathbf{x})$, and $\boldsymbol{\phi}(\mathbf{x}^t)^T\boldsymbol{\phi}(\mathbf{x})$ can be seen as a similarity measure in the non-linear basis space. To generalize, let $K(\mathbf{x}^t, \mathbf{x}) = \boldsymbol{\phi}(\mathbf{x}^t)^T\boldsymbol{\phi}(\mathbf{x})$, so that $g(\mathbf{x}) = \sum_t \alpha^t r^t K(\mathbf{x}^t, \mathbf{x})$. $K$ is a kernel function.
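A sketch of $g(\mathbf{x}) = \sum_t \alpha^t r^t K(\mathbf{x}^t, \mathbf{x}) + w_0$ computed by hand from a fitted scikit-learn SVC (RBF kernel, hypothetical toy data): dual_coef_ stores $\alpha^t r^t$ for the support vectors, so the hand-computed value should match decision_function.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

# Hypothetical toy data
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(1.0, 1.0, (20, 2)), rng.normal(-1.0, 1.0, (20, 2))])
r = np.array([1] * 20 + [-1] * 20)

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, r)

# g(x) = sum_t alpha^t r^t K(x^t, x) + w0, summed over the support vectors only
K = rbf_kernel(X, clf.support_vectors_, gamma=gamma)   # K[i, t] = K(x^t, x_i)
g = K @ clf.dual_coef_[0] + clf.intercept_[0]

print(np.allclose(g, clf.decision_function(X)))        # expected: True
```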

  26. Polynomial Kernels. $K_q(\mathbf{x}^t, \mathbf{x}) = (\mathbf{x}^T\mathbf{x}^t + 1)^q$. E.g., $K(\mathbf{x}, \mathbf{y}) = (\mathbf{x}^T\mathbf{y} + 1)^2 = (x_1y_1 + x_2y_2 + 1)^2 = 1 + 2x_1y_1 + 2x_2y_2 + 2x_1x_2y_1y_2 + x_1^2y_1^2 + x_2^2y_2^2$, which corresponds to $\boldsymbol{\phi}(\mathbf{x}) = [1,\, \sqrt{2}x_1,\, \sqrt{2}x_2,\, \sqrt{2}x_1x_2,\, x_1^2,\, x_2^2]$ (FWIW).
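A quick NumPy check of the expansion above, verifying that the degree-2 polynomial kernel equals the dot product of the stated basis expansions (example vectors are hypothetical):

```python
import numpy as np

def phi(v):
    # Basis expansion implied by the (x^T y + 1)^2 kernel in 2-D
    x1, x2 = v
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     np.sqrt(2) * x1 * x2, x1 ** 2, x2 ** 2])

x = np.array([0.5, -1.2])
y = np.array([2.0, 0.7])

k_direct = (x @ y + 1.0) ** 2            # kernel evaluated directly
k_via_phi = phi(x) @ phi(y)              # same value through the explicit basis
print(np.isclose(k_direct, k_via_phi))   # True
```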

  27. Radial-Basis Kernels. $K(\mathbf{x}^t, \mathbf{x}) = \exp\!\left(-\frac{\|\mathbf{x}^t - \mathbf{x}\|^2}{2s^2}\right)$. Other options include the sigmoidal kernel (approximated with $\tanh$).
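A minimal sketch relating the slide's spread parameter $s$ to scikit-learn's gamma parameter (gamma $= 1/(2s^2)$), assuming sklearn's pairwise rbf_kernel and hypothetical example vectors:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf(xt, x, s):
    # K(x^t, x) = exp(-||x^t - x||^2 / (2 s^2))
    return np.exp(-np.sum((xt - x) ** 2) / (2.0 * s ** 2))

xt = np.array([1.0, 2.0])
x = np.array([0.0, 1.0])
s = 1.5

k_manual = rbf(xt, x, s)
k_sklearn = rbf_kernel(xt.reshape(1, -1), x.reshape(1, -1),
                       gamma=1.0 / (2 * s ** 2))[0, 0]
print(np.isclose(k_manual, k_sklearn))   # True
```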

  28. Selecting Kernels. Kernels can be customized to the application by choosing appropriate measures of similarity: bag of words (normalized cosines between vocabulary vectors), genetics (edit distance between strings), graphs (length of the shortest path between nodes, or number of connecting paths). For input sets of very large dimension, it may be cheaper to pre-compute and save the matrix of kernel values (the Gram matrix) rather than keep all the inputs available.
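A sketch of the precomputed-Gram-matrix route in scikit-learn, on hypothetical toy data: the kernel values are computed once up front, and prediction only needs kernel values between test and training points.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

# Hypothetical toy data split into train and test
rng = np.random.default_rng(2)
X_train = rng.normal(size=(30, 5))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(10, 5))

gamma = 0.2
gram_train = rbf_kernel(X_train, X_train, gamma=gamma)   # (n_train, n_train) Gram matrix
gram_test = rbf_kernel(X_test, X_train, gamma=gamma)     # kernel values vs. training set

clf = SVC(kernel="precomputed").fit(gram_train, y_train)
print(clf.predict(gram_test))
```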

  29. Multi-Classes. One versus all: $K$ separate $N$-variable problems. Pairwise separation: $K(K-1)/2$ separate problems. Single multiclass optimization: minimize $\frac{1}{2}\sum_{i=1}^K \|\mathbf{w}_i\|^2 + C\sum_t\sum_i \xi_i^t$ subject to $\mathbf{w}_{z^t}^T\mathbf{x}^t + w_{z^t 0} \ge \mathbf{w}_i^T\mathbf{x}^t + w_{i0} + 2 - \xi_i^t,\; \forall i \ne z^t$, where $z^t$ is the index of the class of $\mathbf{x}^t$; this is one $K \cdot N$-variable problem.
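The one-versus-all and pairwise strategies can be sketched with scikit-learn's generic wrappers around a binary SVC (note this is an illustration of those two reductions, not of the single joint multiclass optimization on the slide):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # K = 3 classes

ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)   # K binary problems
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)    # K(K-1)/2 binary problems

print(len(ovr.estimators_), len(ovo.estimators_))   # 3 and 3 (= 3*2/2) classifiers
```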
