MACHINE LEARNING: Kernels (2012)


SLIDE 1

MACHINE LEARNING: Kernels

SLIDE 2

Kernels: Intuition

How to separate the red class from the grey class?

[Figure: the data plotted in Cartesian coordinates $(x_1, x_2)$ and in polar coordinates $(r, \theta)$, $\theta \in [0°, 360°]$]

In polar coordinates, the data become linearly separable.

SLIDE 3

Kernels: Intuition

How to separate the red class from the grey class? What is $\phi$?

[Figure: each datapoint $x^i = (x_1^i, x_2^i)$ is mapped to $\phi(x^i)$ in polar coordinates $(r, \theta)$, $\theta \in [0°, 360°]$]

$\phi: x^i \mapsto (r^i, \theta^i)$. Solve for $r^i, \theta^i$ s.t. $x_1^i = r^i \cos\theta^i, \; x_2^i = r^i \sin\theta^i$.

Assume a model (equation) for the transformation. Need at least 3 datapoints to solve.
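This intuition is easy to reproduce numerically. Below is a minimal sketch (assuming NumPy; the ring-shaped data are synthetic) showing that after the polar map the two classes separate with a single threshold on $r$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 100)
# Grey class: inner ring (r ~ 0.5); red class: outer ring (r ~ 2.0).
inner = rng.random(100) < 0.5
r = np.where(inner, 0.5, 2.0) + 0.05 * rng.normal(size=100)
x1, x2 = r * np.cos(theta), r * np.sin(theta)   # not linearly separable

# Polar map phi(x) = (r, theta): the radius alone now separates the classes.
r_hat = np.hypot(x1, x2)
print(np.all((r_hat < 1.25) == inner))           # True
```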

SLIDE 4

Kernels: Intuition

Idea: Send the data X into a feature space H through the nonlinear map $\phi$:

$X = \{x^i\}_{i=1\dots M} \subset \mathbb{R}^N \;\longrightarrow\; \phi(X) = \{\phi(x^1), \dots, \phi(x^M)\} \subset H$

[Figure: original space $(x_1, x_2)$ on the left; feature space H, here the polar coordinates $(r, \theta)$, on the right; each $x^i$ maps to $\phi(x^i)$]

In feature space, computation is simpler (e.g. perform linear classification).

SLIDE 5

Kernels: Intuition

In most cases, determining the transformation $\phi$ beforehand may be difficult. Which representation of the data makes it easy to classify the three groups of datapoints?

SLIDE 6

Kernels: Intuition

In most cases, determining the transformation $\phi$ beforehand may be difficult. What if the groups live in N dimensions, with N >> 1? Grouping may then require separate sets of dimensions and can no longer be visualized.

SLIDE 7

Kernel-Induced Feature Space

Idea: Send the data X into a feature space H through the nonlinear map $\phi$:

$X = \{x^i\}_{i=1\dots M} \subset \mathbb{R}^N \;\longrightarrow\; \phi(X) = \{\phi(x^1), \dots, \phi(x^M)\} \subset H$

While the dimension of the original space is N, the dimension of the feature space may be greater than N! X is "lifted" onto H. Determining $\phi$ is difficult → kernel trick.

[Figure: original space $(x_1, x_2)$ mapped into feature space H]

SLIDE 8

The Kernel Trick

In most cases, determining the transformation $\phi$ may be difficult → kernel trick.

Key idea behind the kernel trick: most algorithms for classification, regression or clustering compute an inner product across pairs of observations to determine the separating line, the fit, or the grouping of datapoints, respectively.

Inner product across two datapoints: $\langle x^i, x^j \rangle$

SLIDE 9

The Kernel Trick

In most cases, determining the transformation $\phi$ may be difficult. → There is no need to compute the transformation $\phi$ if one expresses everything as a function of the inner product in feature space. Proceed as follows:

1) Define a kernel function:

$k: X \times X \to \mathbb{R}, \quad k\big(x^i, x^j\big) = \big\langle \phi(x^i), \phi(x^j) \big\rangle.$

The function k can be used to determine a metric of similarity across datapoints in feature space. It can extract features that are either common to, or that distinguish, groups of datapoints.

2) Use this kernel function to perform classical classification, regression or clustering as in the linear case.
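For step 1, a minimal Python/NumPy sanity check of the identity $k(x^i, x^j) = \langle \phi(x^i), \phi(x^j) \rangle$, using the degree-2 homogeneous polynomial kernel in 2D, whose feature map is known in closed form (the example vectors are arbitrary):

```python
import numpy as np

def phi(x):
    """Explicit feature map of the degree-2 homogeneous polynomial kernel
    in 2D: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

def k(x, y):
    """Kernel trick: <x, y>^2 equals <phi(x), phi(y)> without building phi."""
    return np.dot(x, y) ** 2

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert np.isclose(np.dot(phi(x), phi(y)), k(x, y))  # both equal 1.0
```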

SLIDE 10

Use of Kernels: Example

Key idea: some problems are made simpler if you change the representation of the data. Which representation of the data makes the two groups of datapoints linearly separable?

SLIDE 11

Use of Kernels: Example

Key idea: some problems are made simpler if you change the representation of the data. The data become linearly separable when projected onto the first two principal components of kernel PCA with an RBF kernel (see next lecture).

SLIDE 12

Use of Kernels: Example

Key idea: some problems are made simpler if you change the representation of the data. Which representation of the data makes it easy to assign each of the three groups of datapoints to a different cluster?

SLIDE 13

Use of Kernels: Example

Key idea: some problems are made simpler if you change the representation of the data. The data are correctly clustered when using kernel K-means with an RBF kernel (see next week's lecture).

SLIDE 14

Popular Kernels

• Homogeneous polynomial kernels: $k(x, x') = \langle x, x' \rangle^p, \quad p \in \mathbb{N}$
• Inhomogeneous polynomial kernels: $k(x, x') = \big(\langle x, x' \rangle + c\big)^p, \quad p \in \mathbb{N}, \; c > 0$
• Gaussian / RBF kernel (translation-invariant): $k(x, x') = e^{-\|x - x'\|^2 / 2\sigma^2}$
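A direct Python/NumPy transcription of the three kernels (the function names are mine):

```python
import numpy as np

def k_poly_hom(x, xp, p):
    """Homogeneous polynomial kernel: <x, x'>^p."""
    return np.dot(x, xp) ** p

def k_poly_inhom(x, xp, p, c):
    """Inhomogeneous polynomial kernel: (<x, x'> + c)^p, with c > 0."""
    return (np.dot(x, xp) + c) ** p

def k_rbf(x, xp, sigma):
    """Gaussian / RBF kernel: exp(-||x - x'||^2 / (2 * sigma^2))."""
    d = np.asarray(x) - np.asarray(xp)
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))
```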

SLIDE 15

Kernels: Exercise I

Using the RBF kernel $k(x, x') = e^{-\|x - x'\|^2 / 2\sigma^2}$:

a) Draw the isolines for:
• one datapoint $x^1$, i.e. find all $x$ s.t. $k(x, x^1) = cst$;
• two datapoints $x^1, x^2$: find all $x$ s.t. $k(x, x^1) + k(x, x^2) = cst$;
• three datapoints.

b) Discuss the effect of $\sigma$ on the isolines (see the sketch below).

c) Determine a metric in feature space.
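One way to explore the exercise numerically: evaluate the summed kernel values on a grid and draw contour lines, which are exactly the isolines asked for. A sketch assuming NumPy/Matplotlib (the datapoint positions and $\sigma$ are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

sigma = 0.5                                    # kernel width: vary for part b)
pts = np.array([[0.0, 0.0], [1.0, 0.5]])       # M = 2 datapoints (arbitrary)

gx, gy = np.meshgrid(np.linspace(-2, 3, 200), np.linspace(-2, 3, 200))
grid = np.stack([gx, gy], axis=-1)             # shape (200, 200, 2)

# k(x, x^1) + ... + k(x, x^M) evaluated at every grid point.
z = sum(np.exp(-((grid - p) ** 2).sum(axis=-1) / (2 * sigma ** 2)) for p in pts)

plt.contour(gx, gy, z, levels=10)              # the isolines
plt.scatter(pts[:, 0], pts[:, 1], color="k")
plt.show()
```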

SLIDE 16

Kernels: Solution Exercise I

[Figure: RBF kernel; M = 1, i.e. 1 datapoint]

SLIDE 17

Kernels: Solution Exercise I

[Figure: Gaussian kernel; M = 2, i.e. 2 datapoints]

SLIDE 18

Kernels: Solution Exercise I

[Figure: Gaussian kernel; M = 2, i.e. 2 datapoints; panels: small kernel width vs. large kernel width]

SLIDE 19

Kernels: Solution Exercise I

[Figure: Gaussian kernel; M = 3, i.e. 3 datapoints]

SLIDE 20

Kernels: Solution Exercise I

[Figure: Gaussian kernel; M = 3, i.e. 3 datapoints]

SLIDE 21

Kernels: Solution Exercise I

[Figure: Gaussian kernel; M = 3, i.e. 3 datapoints]

SLIDE 22

Kernels: Exercise II

Using the homogeneous polynomial kernel $k(x, x') = \langle x, x' \rangle^p$, $p \in \mathbb{N}$, draw the isolines as in the previous exercise for:
a) one datapoint
b) two datapoints
c) three datapoints
Discuss the effect of $p$ on the isolines.

SLIDE 23

Kernels: Solution Exercise II

[Figure: polynomial kernel; order p = 1, 2, 3; M = 1, i.e. 1 datapoint; panels: p=1, p=2, p=3]

The isolines are lines perpendicular to the vector pointing from the origin to the datapoint. The order p does not change the geometry; it only changes the values of the isolines.

SLIDE 24

Kernels: Solution Exercise II

[Figure: polynomial kernel; order p = 1, 2, 3; M = 2, i.e. 2 datapoints; panels: p=1, p=2, p=3]

For p=1, the isolines are lines perpendicular to the combination of the two datapoint vectors. With p=2 we obtain ellipses; with p=3, hyperbolas. Orders p>3 are similar in concept, with the signs of the isoline values changing depending on whether p is odd or even.

SLIDE 25

Kernels: Solution Exercise II

[Figure: polynomial kernel; order p = 1, 2, 3; M = 3, i.e. 3 datapoints]

Solutions with p>1 present a symmetry around the origin.

SLIDE 26

Kernels: Exercise III

Another relatively popular kernel is the linear kernel: $k(x, x') = x^T x'$.
1) Can you tell what this kernel measures?
2) Find an application where using the linear kernel provides an interesting measure.

SLIDE 27

Kernels: Solution Exercise III

Bags of words: [machine, learning, kernel, rbf, robot, vision, dimension, blue, speed, ...]. You want to group webpages with common groups of words. Set $x \in \mathbb{R}^{1000}$, with each entry of $x$ set to 1 if the corresponding word is present, else zero. E.g. $x^1 = (1, 1, 1, 0, 0, 0, \dots)$ contains the words machine, learning and kernel and nothing else.

Features live in a low-dimensional space (a common group of webpages has a low number of combinations of words):

$k\big(x^i, x^j\big) = \big(x^i\big)^T x^j = x_1^i x_1^j + x_2^i x_2^j + x_3^i x_3^j + x_4^i x_4^j + \dots = \sum_k x_k^i x_k^j$

The isoline $k(x, x^j) = 3$ delineates the set of webpages that share the same set of three keywords as $x^j$.
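A small sketch of this bag-of-words reasoning (the vocabulary and pages are made up; only 6 words instead of 1000):

```python
import numpy as np

vocab = ["machine", "learning", "kernel", "rbf", "robot", "vision"]

def bag(words):
    """Binary bag-of-words vector: entry 1 if the vocabulary word is present."""
    return np.array([1.0 if w in words else 0.0 for w in vocab])

page_a = bag({"machine", "learning", "kernel"})
page_b = bag({"machine", "learning", "kernel", "robot"})
page_c = bag({"rbf", "vision"})

# The linear kernel x^T x' counts the keywords two pages share.
print(page_a @ page_b)   # 3.0 -> the pages share three keywords
print(page_a @ page_c)   # 0.0 -> no keywords in common
```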

SLIDE 28

Kernels: Solution Exercise III

Sequences of strings (e.g. genetic code): [IPTS, VBUV, ...]. We want to group strings with common substrings. Set $x \in \mathbb{R}^{1000}$, with each entry $x_L$ the number of times the substring $L$ (e.g. "QD") appears in the string. Apply the same reasoning as before for grouping.

SLIDE 29

How to choose kernels?

• There is no rule for choosing the right kernel; each kernel must be adapted to a particular problem.
• There now exist methods to learn the kernel (this is one of the proposed topics for the literature survey).
• Kernel design can start from the desired feature space or from the data.
• Some considerations are important:
  – Use the kernel to introduce a priori (domain) knowledge.
  – Be sure to keep some structure in the feature space.

SLIDE 30

Example: The Kernel Trick in Regression

Linear regression: search for a linear mapping between input $x$ and output $y$, parametrized by the slope vector $w$ and the intercept $b$:

$y = f(x; w, b) = w^T x + b, \qquad w, x \in \mathbb{R}^N, \; y, b \in \mathbb{R}$

[Figure: linear fit of y vs. x]

Omit $b$ by centering the data: $y' = y - \bar{y}$ and $x' = x - \bar{x}$, with $\bar{x}, \bar{y}$ the means of $x$ and $y$. Then $y' = w^T x'$, and the least-squares estimate of the intercept is $b^* = \bar{y} - w^T \bar{x}$.

SLIDE 31

The Kernel Trick in Regression

Linear regression: search for a linear mapping between input $x$ and output $y$, parametrized by the slope vector $w$:

$y = f(x; w) = w^T x, \qquad w, x \in \mathbb{R}^N, \; y \in \mathbb{R}$

[Figure: linear fit of y vs. x]

SLIDE 32

The Kernel Trick in Regression

[Figure: linear fit of y vs. x]

The problem consists of minimizing the loss function $L(x, y; w) = \big(y - \langle w, x \rangle\big)^2$: find $w^*$ such that $w^* = \arg\min_w L(x, y; w)$.

SLIDE 33

The Kernel Trick in Regression

[Figure: linear fit of y vs. x, with training points $(x^i, y^i)$]

Pairs of training points $X = [x^1 \dots x^M]$ and $\mathbf{y} = [y^1 \dots y^M]^T$, with $x^i \in \mathbb{R}^N$, $y^i \in \mathbb{R}$.

$L(X, \mathbf{y}; w) = \sum_{i=1}^{M} \big(y^i - \langle w, x^i \rangle\big)^2$

SLIDE 34

The Kernel Trick in Regression

$L(X, \mathbf{y}; w) = \frac{1}{2}\sum_{i=1}^{M}\big(y^i - \langle w, x^i\rangle\big)^2 = \frac{1}{2}\big(\mathbf{y} - X^T w\big)^T \big(\mathbf{y} - X^T w\big)$

with $\mathbf{y} = [y^1 \dots y^M]^T$ and $X$ the matrix with column vectors $x^i$.

The optimal $w^*$ is given by:

$w^* = \arg\min_w \frac{1}{2}\big(\mathbf{y} - X^T w\big)^T\big(\mathbf{y} - X^T w\big)$

This has an analytical solution:

$X X^T w = X\mathbf{y} \;\Rightarrow\; w^* = \big(X X^T\big)^{-1} X \mathbf{y}$
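A minimal NumPy sketch of this primal least-squares solution on synthetic data (dimensions, noise level and the true weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 3, 100                       # N dimensions, M datapoints
X = rng.normal(size=(N, M))         # columns are the datapoints x^i
w_true = np.array([1.0, -2.0, 0.5])
y = X.T @ w_true + 0.01 * rng.normal(size=M)    # slightly noisy targets

# w* = (X X^T)^{-1} X y; solving the linear system beats an explicit inverse.
w_star = np.linalg.solve(X @ X.T, X @ y)
print(w_star)                       # close to w_true
```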

SLIDE 35

The Kernel Trick in Regression

$w^*$ has the analytical solution $w^* = \big(X X^T\big)^{-1} X \mathbf{y}$.

Generally not too computationally intensive, as $w \in \mathbb{R}^N$ requires at minimum N datapoints to solve, and $N \ll M$. Requires $O(N^3)$ operations!

It has an exact solution if:
a) $X X^T$ is not singular (it is singular when there are not enough datapoints);
b) the data are not noisy (otherwise there is no single $w$ matching every pair $y^i$, $\langle w, x^i \rangle$).

SLIDE 36

The Kernel Trick in Regression

$w^*$ has the analytical solution $w^* = \big(X X^T\big)^{-1} X \mathbf{y}$.

If $X X^T$ is singular, two solutions:
a) approximate $\big(X X^T\big)^{-1}$ with the pseudo-inverse (minimize the norm);
b) trade off the size of the norm against the loss (Ridge Regression).

SLIDE 37

The Kernel Trick in Regression

Ridge Regression:

$\min_w L(X, \mathbf{y}; w) = \min_w \frac{1}{2}\big(\mathbf{y} - X^T w\big)^T\big(\mathbf{y} - X^T w\big) + \lambda \langle w, w \rangle$

Regularization term: introduces a penalty for large weights and reduces the number of solutions.

Take the derivative with respect to $w$ to obtain $w^*$:

$X\big(\mathbf{y} - X^T w\big) = \lambda w \;\Rightarrow\; w^* = \big(X X^T + \lambda I\big)^{-1} X \mathbf{y}$

Always invertible for $\lambda > 0$. Complexity still $O(N^3)$.

SLIDE 38

The Kernel Trick in Regression

Ridge Regression:

$\min_w L(X, \mathbf{y}; w) = \min_w \frac{1}{2}\big(\mathbf{y} - X^T w\big)^T\big(\mathbf{y} - X^T w\big) + \lambda \langle w, w \rangle$

Take the derivative with respect to $w$: $\lambda w = X\big(\mathbf{y} - X^T w\big)$, i.e. $w = X\alpha$ with $\alpha = \lambda^{-1}\big(\mathbf{y} - X^T w\big)$, which gives

$w^* = X\big(X^T X + \lambda I\big)^{-1}\mathbf{y}$: this solution is called the Dual.

$G = X^T X$: Gram matrix, $M \times M$, with M the number of datapoints. Complexity $O(M^3)$.

SLIDE 39

The Kernel Trick in Regression

Ridge Regression:

$\min_w L(X, \mathbf{y}; w) = \min_w \frac{1}{2}\big(\mathbf{y} - X^T w\big)^T\big(\mathbf{y} - X^T w\big) + \lambda \langle w, w \rangle$

Primal (take the derivative with respect to $w$):

$X\big(\mathbf{y} - X^T w\big) = \lambda w \;\Rightarrow\; w^* = \big(X X^T + \lambda I\big)^{-1} X \mathbf{y}$ — complexity $O(N^3)$.

Dual:

$w^* = X\big(X^T X + \lambda I\big)^{-1}\mathbf{y}$ — complexity $O(M^3)$, with $G = X^T X$ the $M \times M$ Gram matrix.

If $M \ll N$, solve with the dual; if $N \ll M$, solve with the primal.
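The equivalence of the two forms can be checked numerically; a sketch with $M \ll N$, where the dual is the cheaper route (the data and $\lambda$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, lam = 50, 10, 0.1             # M << N: the dual inverts a 10x10 matrix
X = rng.normal(size=(N, M))         # columns are the datapoints
y = rng.normal(size=M)

# Primal: w* = (X X^T + lam I_N)^{-1} X y   -- O(N^3)
w_primal = np.linalg.solve(X @ X.T + lam * np.eye(N), X @ y)

# Dual:   w* = X (X^T X + lam I_M)^{-1} y   -- O(M^3)
w_dual = X @ np.linalg.solve(X.T @ X + lam * np.eye(M), y)

print(np.allclose(w_primal, w_dual))  # True: identical solutions
```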

SLIDE 40

The Kernel Trick in Regression

The Gram matrix $G = X^T X$ ($M \times M$, with M the number of datapoints) is composed of inner products between training points: $G_{ij} = \langle x^i, x^j \rangle$ → use the kernel trick to transfer to non-linear regressions.

$w^* = X\big(X^T X + \lambda I\big)^{-1}\mathbf{y}$: this solution is called the Dual.

SLIDE 41

The Kernel Trick in Regression

Estimate a non-linear function $y = f(x; w)$. Assume that there exists a non-linear transformation $\phi$ such that the problem becomes linear: $\exists\, \phi$ s.t. $y = \langle \phi(x), w \rangle$.

If true, the solution with ridge regression is:

$w^* = \phi(X)\big(\phi(X)^T \phi(X) + \lambda I\big)^{-1}\mathbf{y}$, where the columns of $\phi(X)$ are the $\phi(x^i)$.

The predicted output for a query point $x$ is:

$y(x) = \phi(x)^T \phi(X)\big(\phi(X)^T \phi(X) + \lambda I\big)^{-1}\mathbf{y}$

SLIDE 42

The Kernel Trick in Regression

Replace all inner products between training points by the kernel function: $k: X \times X \to \mathbb{R}$, $k(x^i, x^j) = \langle \phi(x^i), \phi(x^j) \rangle$. The kernel function is easier to compute and does not require knowing $\phi$.

The predicted output for a query point $x$,

$y(x) = \phi(x)^T \phi(X)\big(\phi(X)^T \phi(X) + \lambda I\big)^{-1}\mathbf{y},$

then becomes:

$y(x) = \sum_{i=1}^{M} \alpha_i\, k(x, x^i) = k(x)^T\big(K + \lambda I\big)^{-1}\mathbf{y}$

with $K$ the Gram matrix in feature space, $K_{ij} = k(x^i, x^j)$, $[k(x)]_i = k(x, x^i)$ and $\alpha = \big(K + \lambda I\big)^{-1}\mathbf{y}$.
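Putting the pieces together, a compact kernel ridge regression sketch in Python/NumPy with an RBF kernel (the data, $\lambda$ and $\sigma$ are arbitrary):

```python
import numpy as np

def rbf(A, B, sigma):
    """Kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 * sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(2)
X = rng.uniform(-3.0, 3.0, size=(40, 1))         # M = 40 training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)  # noisy nonlinear targets

lam, sigma = 0.1, 0.7
K = rbf(X, X, sigma)                             # Gram matrix in feature space
alpha = np.linalg.solve(K + lam * np.eye(40), y)

Xq = np.linspace(-3.0, 3.0, 200)[:, None]        # query points
y_pred = rbf(Xq, X, sigma) @ alpha               # y(x) = k(x)^T (K + lam I)^{-1} y
```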

SLIDE 43

The Kernel Trick in Regression

See the MATLAB example in the supplementary documentation.

SLIDE 44

Kernels: Summary

• Kernels are continuous, symmetric, positive functions. The most popular is the RBF kernel. It enables grouping of datapoints; the tightness of the grouping depends on the value of the hyperparameter sigma.
• Kernels represent metrics in feature space (the inner product of the projections of the data in feature space).
• Kernels in ML are useful because they allow one to reduce the computation of complex non-linear problems to simpler computations on linear problems.