
Learning with kernels and SVM

Šámalova chata, May 23, 2006

Petra Kudová


Outline

Introduction
Binary classification
Learning with Kernels
Support Vector Machines
Demo
Conclusion


Learning from data

The goal is to find a general rule that explains data given only as a sample of limited size; the data may contain measurement errors or noise.

Supervised learning: the data are a sample of input-output pairs, and the task is to find the input-output mapping. Examples: prediction, classification, function approximation, etc.

Unsupervised learning: the data are a sample of objects, and the task is to find some structure in them. Example: clustering.


Learning methods

A wide range of methods is available:

Statistical approaches

Neural networks: originally biologically motivated; multi-layer perceptrons, RBF networks, Kohonen maps

Kernel methods: modern and popular; SVM


Trends in machine learning

Articles on machine learning found by Google



Articles on neural networks found by Google


Articles on support vector machines found by Google

Source: http://yaroslavvb.blogspot.com/


Binary classification

Training set $\{(x_i, y_i)\}_{i=1}^{m}$, with inputs $x_i \in X$ and labels $y_i \in \{-1, +1\}$. The task: find a classifier that generalizes well beyond the training sample.


Simple Classifier

Suppose $X \subset \mathbb{R}^n$ and the classes are linearly separable. Assign $x$ to the class whose mean is closer:

$$c_+ = \frac{1}{m_+} \sum_{\{i \mid y_i = +1\}} x_i, \qquad c_- = \frac{1}{m_-} \sum_{\{i \mid y_i = -1\}} x_i, \qquad c = \frac{1}{2}(c_+ + c_-)$$

$$y = \mathrm{sgn}\,\langle x - c,\, w \rangle = \mathrm{sgn}\,\big\langle x - \tfrac{1}{2}(c_+ + c_-),\; c_+ - c_- \big\rangle = \mathrm{sgn}\big(\langle x, c_+ \rangle - \langle x, c_- \rangle + b\big)$$

$$b = \frac{1}{2}\big(\|c_-\|^2 - \|c_+\|^2\big)$$
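A minimal sketch of this mean-of-classes classifier in Python with NumPy (the toy data and function names are mine, chosen for illustration):

```python
import numpy as np

def fit_simple_classifier(X, y):
    """Class means c+ and c-, and offset b = (||c-||^2 - ||c+||^2) / 2."""
    c_plus = X[y == +1].mean(axis=0)
    c_minus = X[y == -1].mean(axis=0)
    b = 0.5 * (c_minus @ c_minus - c_plus @ c_plus)
    return c_plus, c_minus, b

def predict(x, c_plus, c_minus, b):
    # y = sgn(<x, c+> - <x, c-> + b)
    return np.sign(x @ c_plus - x @ c_minus + b)

# toy linearly separable data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (10, 2)), rng.normal(2.0, 0.5, (10, 2))])
y = np.array([-1] * 10 + [+1] * 10)

c_plus, c_minus, b = fit_simple_classifier(X, y)
print(predict(np.array([1.5, 1.5]), c_plus, c_minus, b))  # expect +1.0
```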


Mapping to the feature space

Life is not so easy: not all problems are linearly separable, and what if $X$ is not a dot-product space at all? Choose a mapping into some (possibly high-dimensional) dot-product space $H$, the feature space:

$$\Phi : X \to H$$


Mercer’s condition and Kernels

If a symmetric function $K(x, y)$ satisfies

$$\sum_{i,j=1}^{M} a_i a_j K(x_i, x_j) \ge 0$$

for all $M \in \mathbb{N}$, all $x_i$, and all $a_i \in \mathbb{R}$, then there exists a mapping $\Phi$ that maps $x$ into a dot-product feature space such that $K(x, y) = \langle \Phi(x), \Phi(y) \rangle$, and vice versa. Such a function $K$ is called a kernel.
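In matrix form, the condition says that every Gram matrix $(K(x_i, x_j))_{i,j}$ must be positive semidefinite. A quick numerical check of that property, sketched here for the plain dot-product kernel (the data and names are illustrative):

```python
import numpy as np

def gram(X, kernel):
    # Gram matrix K_ij = kernel(x_i, x_j)
    return np.array([[kernel(a, c) for c in X] for a in X])

dot_kernel = lambda x, z: x @ z

X = np.random.default_rng(1).normal(size=(30, 2))
K = gram(X, dot_kernel)
eigenvalues = np.linalg.eigvalsh(K)  # symmetric matrix -> real spectrum
print(eigenvalues.min() >= -1e-10)   # PSD up to numerical round-off
```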


Examples of kernels

Linear kernel: $K(x, y) = \langle x, y \rangle$

Polynomial kernel: $K(x, y) = (\langle x, y \rangle + 1)^d$. For $d = 2$ and 2-dimensional inputs:

$$K(x, y) = 1 + 2x_1 y_1 + 2x_2 y_2 + 2x_1 y_1 x_2 y_2 + x_1^2 y_1^2 + x_2^2 y_2^2 = \langle \Phi(x), \Phi(y) \rangle$$

$$\Phi(x) = \big(1,\ \sqrt{2}\,x_1,\ \sqrt{2}\,x_2,\ \sqrt{2}\,x_1 x_2,\ x_1^2,\ x_2^2\big)^T$$
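The identity can be verified numerically; a small sketch (phi follows the feature map written above):

```python
import numpy as np

def phi(x):
    # explicit feature map for (<x, y> + 1)^2 with 2-dimensional inputs
    return np.array([1.0,
                     np.sqrt(2) * x[0], np.sqrt(2) * x[1],
                     np.sqrt(2) * x[0] * x[1],
                     x[0] ** 2, x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print((x @ z + 1.0) ** 2, phi(x) @ phi(z))  # both print 4.0
```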


Examples of kernels

RBF kernel: $K(x, y) = \exp\left(-\dfrac{\|x - y\|^2}{d^2}\right)$

Other kernels: kernels exist for various objects such as graphs, strings, texts, etc. They enable us to apply dot-product-based algorithms to such data, and they act as a measure of similarity.
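A sketch of the RBF kernel acting as a similarity measure (the width d is a free parameter; the value used here is arbitrary):

```python
import numpy as np

def rbf_kernel(x, z, d=1.0):
    # K(x, z) = exp(-||x - z||^2 / d^2)
    return np.exp(-np.sum((x - z) ** 2) / d ** 2)

x = np.array([0.0, 0.0])
print(rbf_kernel(x, x))                      # 1.0: maximal similarity to itself
print(rbf_kernel(x, np.array([3.0, 4.0])))   # ~1.4e-11: distant points, near zero
```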


Simple Classifier - kernel version

Suppose now that $X$ is any set and $\Phi : X \to H$ is the mapping corresponding to a kernel $K$. The same classifier is built in the feature space, with the class means taken over the mapped points:

$$c_+ = \frac{1}{m_+} \sum_{\{i \mid y_i = +1\}} \Phi(x_i), \qquad c_- = \frac{1}{m_-} \sum_{\{i \mid y_i = -1\}} \Phi(x_i), \qquad c = \frac{1}{2}(c_+ + c_-)$$

$$y = \mathrm{sgn}\big(\langle \Phi(x), c_+ \rangle - \langle \Phi(x), c_- \rangle + b\big), \qquad b = \frac{1}{2}\big(\|c_-\|^2 - \|c_+\|^2\big)$$


Expanding the dot products, the classifier can be written purely in terms of kernel evaluations:

$$y = \mathrm{sgn}\left( \frac{1}{m_+} \sum_{\{i \mid y_i = +1\}} K(x, x_i) \;-\; \frac{1}{m_-} \sum_{\{i \mid y_i = -1\}} K(x, x_i) \;+\; b \right)$$

$$b = \frac{1}{2}\left( \frac{1}{m_-^2} \sum_{\{i,j \mid y_i = y_j = -1\}} K(x_i, x_j) \;-\; \frac{1}{m_+^2} \sum_{\{i,j \mid y_i = y_j = +1\}} K(x_i, x_j) \right)$$

Statistical approach: this is a special case of the Bayes classifier when $\int_X K(x, y)\,dx = 1$ for all $y \in X$ and $b = 0$.
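A kernelized sketch of the simple classifier above; note that it only ever evaluates K, never Φ (the function names are mine):

```python
import numpy as np

def kernel_simple_classifier(x, X, y, kernel):
    pos, neg = X[y == +1], X[y == -1]
    k_plus = np.mean([kernel(x, xi) for xi in pos])
    k_minus = np.mean([kernel(x, xi) for xi in neg])
    # b = ((1/m-^2) sum of K over (-,-) pairs - (1/m+^2) sum over (+,+) pairs) / 2
    b = 0.5 * (np.mean([[kernel(a, c) for c in neg] for a in neg])
               - np.mean([[kernel(a, c) for c in pos] for a in pos]))
    return np.sign(k_plus - k_minus + b)
```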


With $b = 0$ the classifier becomes

$$y = \mathrm{sgn}\left( \frac{1}{m_+} \sum_{\{i \mid y_i = +1\}} K(x, x_i) \;-\; \frac{1}{m_-} \sum_{\{i \mid y_i = -1\}} K(x, x_i) \right),$$

i.e. the sign of $p_+(x) - p_-(x)$, where $p_+$ and $p_-$ are Parzen window estimates of the two class densities. Under the normalization $\int_X K(x, y)\,dx = 1$ for all $y \in X$, and with $b = 0$, this is the Bayes classifier.
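A sketch of that reading, comparing two Parzen window estimates (the RBF-style window and its width are my illustrative choices):

```python
import numpy as np

def parzen_estimate(x, Xc, d=1.0):
    # class-conditional estimate p(x) = (1/m) sum_i K(x, x_i)
    return np.mean([np.exp(-np.sum((x - xi) ** 2) / d ** 2) for xi in Xc])

def classify(x, X_plus, X_minus):
    # sign of p+(x) - p-(x), i.e. the b = 0 kernel classifier above
    return np.sign(parzen_estimate(x, X_plus) - parzen_estimate(x, X_minus))
```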


Separating hyperplane

Classifier of the form $y(x) = \mathrm{sgn}(\langle w, x \rangle + b)$ with

$$\langle w, x_i \rangle + b \;\begin{cases} > 0 & \text{for } y_i = +1 \\ < 0 & \text{for } y_i = -1 \end{cases}$$

Every hyperplane $D(x) = \langle w, x \rangle + b = c$ with $-1 < c < 1$ is separating. The optimal separating hyperplane is the one with the maximal margin.


Separating hyperplane

classifier in a form y(x) = sgn(w, x + b) w, xi+b

  • ≥ 1

for yi = 1 ≤ −1 for yi = −1 each hyperplane D(x) = w, x+b = c, −1 < c < 1 is separating

  • ptimal separating hyperplane - the one with the maximal

margin


Classifier with maximal margin


$y(x) = \mathrm{sgn}(\langle w, x \rangle + b)$, where $w$ and $b$ solve

$$\min_{w, b} Q(w) = \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i(\langle w, x_i \rangle + b) \ge 1, \quad i = 1, \dots, M$$

Under the canonical constraints the margin equals $2 / \|w\|$, so minimizing $\frac{1}{2}\|w\|^2$ maximizes the margin. This is a quadratic programming problem: linear separability guarantees that a solution exists, and there are no local minima.


The constrained optimization problem

$$\min_{w} \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i(\langle w, x_i \rangle + b) \ge 1$$

can be handled by introducing Lagrange multipliers $\alpha_i \ge 0$:

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{m} \alpha_i \big( y_i(\langle w, x_i \rangle + b) - 1 \big)$$

minimized with respect to $w$ and $b$, maximized with respect to the $\alpha_i$.


Setting the derivatives of $L$ to zero, per the Karush-Kuhn-Tucker (KKT) conditions,

$$\frac{\partial L(w, b, \alpha)}{\partial w} = 0, \qquad \frac{\partial L(w, b, \alpha)}{\partial b} = 0,$$

we get

$$w = \sum_{i=1}^{m} \alpha_i y_i x_i, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0$$

Complementarity then sorts the training points: $y_i(\langle w, x_i \rangle + b) > 1 \Rightarrow \alpha_i = 0$ ($x_i$ is irrelevant), while $y_i(\langle w, x_i \rangle + b) = 1 \Rightarrow \alpha_i \ge 0$ ($x_i$ is a support vector).


Dual problem

Substituting these back into $L$, we get the dual problem:

$$\max_{\alpha \in \mathbb{R}^m} W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$$

$$\text{subject to} \quad \alpha_i \ge 0, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0$$

The resulting classifier is the (hard margin) support vector machine (SVM):

$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{m} y_i \alpha_i \langle x, x_i \rangle + b \right), \qquad b = y_i - \langle w, x_i \rangle \ \text{ for any support vector } x_i$$
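The dual is a standard QP, so any off-the-shelf solver applies; the slides do not prescribe one, so here is a minimal sketch with cvxopt, rewriting the problem as minimization of $\frac{1}{2}\alpha^T P \alpha - \mathbf{1}^T \alpha$ with $P_{ij} = y_i y_j \langle x_i, x_j \rangle$:

```python
import numpy as np
from cvxopt import matrix, solvers

def hard_margin_svm(X, y):
    y = np.asarray(y, dtype=float)
    m = len(y)
    P = matrix(np.outer(y, y) * (X @ X.T))   # P_ij = y_i y_j <x_i, x_j>
    q = matrix(-np.ones(m))                  # minus sign: max sum(alpha) -> min
    G = matrix(-np.eye(m))                   # -alpha_i <= 0, i.e. alpha_i >= 0
    h = matrix(np.zeros(m))
    A = matrix(y.reshape(1, -1))             # sum_i alpha_i y_i = 0
    b0 = matrix(0.0)
    solvers.options['show_progress'] = False
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b0)['x'])

    sv = alpha > 1e-6                        # support vectors: alpha_i > 0
    w = (alpha[sv] * y[sv]) @ X[sv]          # w = sum_i alpha_i y_i x_i
    b = np.mean(y[sv] - X[sv] @ w)           # b = y_i - <w, x_i>, averaged over SVs
    return w, b, sv
```

Points strictly outside the margin come out with $\alpha_i \approx 0$, so the sv mask recovers exactly the support vectors predicted by the KKT conditions above.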


Support Vector Machine

Work in the feature space and compute dot products via the kernel. Classifier:

$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{m} y_i \alpha_i K(x, x_i) + b \right)$$

$$\max_{\alpha \in \mathbb{R}^m} W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

$$\text{subject to} \quad \alpha_i \ge 0, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0$$
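In practice one rarely solves this QP by hand; for instance, scikit-learn wraps LIBSVM (listed under Software below). A sketch with assumed hyperparameter values (note that sklearn's gamma plays the role of $1/d^2$ in the RBF kernel above):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (20, 2)), rng.normal(1.0, 1.0, (20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

clf = SVC(kernel='rbf', C=10.0, gamma=0.5)  # C and gamma chosen arbitrarily here
clf.fit(X, y)
print(clf.support_.size)          # number of support vectors found
print(clf.predict([[2.0, 2.0]]))  # expect [+1]
```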


Soft margin SVM

A separating hyperplane may not exist (high level of noise, overlap of classes, etc.).

Introduce slack variables $\xi_i$:

$$y_i(\langle w, \Phi(x_i) \rangle + b) \ge 1 - \xi_i$$

and minimize

$$Q(w, \xi) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i^2$$

where $C > 0$ controls the trade-off between maximizing the margin and minimizing the training error (its appropriate value depends on the noise level).


The solution has the form

$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{m} y_i \alpha_i K(x, x_i) + b \right)$$

$$\max_{\alpha \in \mathbb{R}^m} W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \left( K(x_i, x_j) + \frac{\delta_{ij}}{C} \right)$$

$$\text{subject to} \quad \alpha_i \ge 0, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0$$
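Note that this is just the hard-margin dual with the Gram matrix shifted on the diagonal, so a hard-margin solver (such as the earlier QP sketch) can be reused unchanged; a sketch of that observation:

```python
import numpy as np

def soft_margin_gram(K, C):
    # 2-norm soft margin: K_ij -> K_ij + delta_ij / C, then solve
    # the same dual QP as in the hard-margin case
    return K + np.eye(K.shape[0]) / C
```

Large C shrinks the diagonal shift and approaches the hard margin; small C regularizes more heavily.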


SVM - summary

Input points are mapped to the feature space, and the dot product there is computed by means of the kernel function. Classification is done via the separating hyperplane with maximal margin. That hyperplane is determined by the support vectors; the other training samples are irrelevant. If the data are not separable in the feature space (noise, etc.), use the soft margin, and control the trade-off between maximal margin and minimal training error via $C$.


SVM vs. Neural Networks

+ maximization of generalization ability
+ no local minima
− extension to multiclass problems is not straightforward
− long training time: the number of variables equals the number of data points, though this is not necessarily prohibitive, as many techniques to reduce training time exist
− selection of parameters: the kernel function and C (as sketched below)
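For the parameter-selection point, cross-validated grid search is the usual remedy; a sketch with an assumed parameter grid (the values are arbitrary):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (30, 2)), rng.normal(1.0, 1.0, (30, 2))])
y = np.array([-1] * 30 + [+1] * 30)

param_grid = {'C': [0.1, 1.0, 10.0, 100.0], 'gamma': [0.01, 0.1, 1.0]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)  # 5-fold CV per setting
search.fit(X, y)
print(search.best_params_, search.best_score_)
```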


References

Shigeo Abe: Support Vector Machines for Pattern Classification. Springer, 2005.

Bernhard Schölkopf and Alex Smola: Learning with Kernels. MIT Press, Cambridge, MA, 2002. (Source of the "sheep vectors" illustrations.)

Ralf Herbrich: Learning Kernel Classifiers. MIT Press, Cambridge, MA, 2002.

http://www.kernel-machines.org/


Software

LIBSVM (by Chih-Chung Chang and Chih-Jen Lin): http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Matlab SVM toolbox (by S. Gunn): http://www.isis.ecs.soton.ac.uk/resources/svminfo/


Questions?