  1. Kernel Methods
     CMSC 422, Marine Carpuat, marine@cs.umd.edu
     Slide credit: Piyush Rai

  2. Beyond linear classification
     • Problem: linear classifiers
       – Easy to implement and easy to optimize
       – But limited to linear decision boundaries
     • What can we do about it?
       – Last week: Neural networks
         • Very expressive but harder to optimize (non-convex objective)
       – Today: Kernels

  3. Kernel Methods
     • Goal: keep advantages of linear models, but make them capture non-linear patterns in data!
     • How?
       – By mapping data to higher dimensions where it exhibits linear patterns

  4. Classifying non-linearly separable data with a linear classifier: examples
     Non-linearly separable data in 1D becomes linearly separable in a new 2D space defined by the following mapping:
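
     The slide's figure and mapping are not reproduced in this transcript. The standard example for this picture (an assumption on my part) lifts each scalar onto a parabola:

         $\phi(x) = (x, x^2)$

     A class sitting between the other class's points on the line then becomes separable by a threshold on the second coordinate.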

  5. Classifying non-linearly separable data with a linear classifier: examples
     Non-linearly separable data in 2D becomes linearly separable in the 3D space defined by the following transformation:
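
     Again, the slide's figure and transformation are not reproduced; the classic choice (an assumption on my part) for 2D data separated by a circle is

         $\phi(x_1, x_2) = (x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2)$

     under which a circular decision boundary in the original space corresponds to a plane in the new 3D space.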

  6. Defining feature mappings
     • Map an original feature vector to an expanded version
     • Example: the quadratic feature mapping represents feature combinations (a likely form is given below)
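
     The slide's formula is not shown in this transcript; a common form of the quadratic feature mapping over $\mathbf{x} \in \mathbb{R}^d$ (the one that pairs exactly with the kernel on slide 11) is

         $\phi(\mathbf{x}) = (1,\ \sqrt{2}\,x_1, \ldots, \sqrt{2}\,x_d,\ x_1 x_1,\ x_1 x_2, \ldots,\ x_d x_d)$

     It contains all pairwise products $x_i x_j$, so a linear classifier over $\phi(\mathbf{x})$ can exploit feature combinations. Note that the dimension grows from $d$ to $1 + d + d^2$.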

  7. Feature Mappings
     • Pros: can help turn a non-linear classification problem into a linear one
     • Cons: “feature explosion” creates issues when training a linear classifier in the new feature space
       – More computationally expensive to train
       – More training examples needed to avoid overfitting

  8. Kernel Methods
     • Goal: keep advantages of linear models, but make them capture non-linear patterns in data!
     • How?
       – By mapping data to higher dimensions where it exhibits linear patterns
       – By rewriting linear models so that the mapping never needs to be explicitly computed

  9. The Kernel Trick
     • Rewrite learning algorithms so they depend only on dot products between two examples
     • Replace the dot product with a kernel function, which computes the dot product implicitly

  10. Example of a kernel function
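
     The slide's worked example is not reproduced here. A standard first example (an assumption on my part) is the quadratic kernel in 2D: with $\phi(x_1, x_2) = (x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2)$,

         $\phi(\mathbf{x}) \cdot \phi(\mathbf{z}) = x_1^2 z_1^2 + 2 x_1 x_2 z_1 z_2 + x_2^2 z_2^2 = (\mathbf{x} \cdot \mathbf{z})^2$

     so $k(\mathbf{x}, \mathbf{z}) = (\mathbf{x} \cdot \mathbf{z})^2$ computes the 3D dot product without ever forming the 3D vectors.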

  11. Another example of a kernel function (see CIML 9.1)
     What is the function k(x, z) that can implicitly compute the dot product $\phi(\mathbf{x}) \cdot \phi(\mathbf{z})$ for the quadratic feature mapping?
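
     With the quadratic mapping as written under slide 6, the answer works out to the degree-2 polynomial kernel $k(\mathbf{x}, \mathbf{z}) = (1 + \mathbf{x} \cdot \mathbf{z})^2$. A minimal numeric check of this identity (not from the slides; function names are illustrative):

     ```python
     import numpy as np

     def phi_quadratic(x):
         # Explicit quadratic feature map: constant term, scaled linear terms,
         # and all ordered pairwise products (1 + d + d^2 features in total).
         return np.concatenate(([1.0], np.sqrt(2.0) * x, np.outer(x, x).ravel()))

     def k_quadratic(x, z):
         # The same dot product, computed implicitly in O(d) time.
         return (1.0 + np.dot(x, z)) ** 2

     rng = np.random.default_rng(0)
     x, z = rng.normal(size=5), rng.normal(size=5)
     print(np.dot(phi_quadratic(x), phi_quadratic(z)))  # explicit, O(d^2) features
     print(k_quadratic(x, z))                           # implicit, same value
     ```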

  12. Kernels: Formally defined
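
     The formal definition is not reproduced in this transcript; the standard statement is: a function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a valid kernel if there exists a feature mapping $\phi$ such that

         $k(\mathbf{x}, \mathbf{z}) = \phi(\mathbf{x}) \cdot \phi(\mathbf{z})$ for all $\mathbf{x}, \mathbf{z} \in \mathcal{X}$.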

  13. Kernels: Mercer’s condition
     • Can any function be used as a kernel function?
     • No! It must satisfy Mercer’s condition: for all square-integrable functions $f$,

         $\iint f(\mathbf{x})\, k(\mathbf{x}, \mathbf{z})\, f(\mathbf{z})\, d\mathbf{x}\, d\mathbf{z} \ge 0$

  14. Kernels: Constructing combinations of kernels
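
     The slide's list of rules is not reproduced. The standard closure properties are: if $k_1$ and $k_2$ are valid kernels and $a > 0$, then $a\,k_1$, $k_1 + k_2$, and $k_1 k_2$ are again valid kernels. A minimal sketch of these combinators (names are illustrative):

     ```python
     # If k1 and k2 are valid kernels and a > 0, each combinator below
     # returns another valid kernel (standard closure properties; the
     # slide's own list may differ).

     def scaled_kernel(k, a):
         return lambda x, z: a * k(x, z)

     def sum_kernel(k1, k2):
         return lambda x, z: k1(x, z) + k2(x, z)

     def product_kernel(k1, k2):
         return lambda x, z: k1(x, z) * k2(x, z)
     ```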

  15. Commonly Used Kernel Functions
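
     The slide's table is not reproduced in this transcript. The kernels most commonly listed in this setting are the linear, polynomial, and Gaussian (RBF) kernels; a minimal sketch, with illustrative parameter defaults:

     ```python
     import numpy as np

     def linear_kernel(x, z):
         return np.dot(x, z)

     def polynomial_kernel(x, z, c=1.0, d=2):
         # c >= 0 trades off the influence of lower-order terms; d is the degree.
         return (c + np.dot(x, z)) ** d

     def rbf_kernel(x, z, gamma=1.0):
         # Gaussian / RBF kernel; corresponds to an infinite-dimensional feature map.
         return np.exp(-gamma * np.sum((x - z) ** 2))
     ```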

  16. The Kernel Trick
     • Rewrite learning algorithms so they depend only on dot products between two examples
     • Replace the dot product with a kernel function, which computes the dot product implicitly

  17. “Kernelizing” the perceptron
     • Naïve approach: let’s explicitly train a perceptron in the new feature space
     • Can we apply the kernel trick? Not yet: we first need to rewrite the algorithm using only dot products between examples

  18. “Kernelizing” the perceptron
     • Perceptron representer theorem: “During a run of the perceptron algorithm, the weight vector w can always be represented as a linear combination of the expanded training data”
     • Proof by induction (on board; see CIML 9.2)
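
     In symbols (notation mine, matching the quoted statement): $\mathbf{w} = \sum_n \alpha_n\, \phi(\mathbf{x}_n)$, where each coefficient $\alpha_n$ starts at 0 and is incremented by $y_n$ every time the algorithm makes a mistake on example $n$.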

  19. “Kernelizing” the perceptron
     • We can use the perceptron representer theorem to compute activations as a dot product between examples
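
     Concretely (this equation is implied rather than quoted from the slide): for an input $\mathbf{z}$, the activation becomes

         $\mathbf{w} \cdot \phi(\mathbf{z}) = \sum_n \alpha_n\, \phi(\mathbf{x}_n) \cdot \phi(\mathbf{z}) = \sum_n \alpha_n\, k(\mathbf{x}_n, \mathbf{z})$

     so $\phi$ never has to be computed explicitly.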

  20. “Kernelizing” the perceptron
     • Same training algorithm, but it no longer refers to the weights w explicitly; it depends only on dot products between examples
     • We can apply the kernel trick! (a sketch follows below)
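
     A minimal sketch of the kernelized perceptron (variable and function names are illustrative, not from the slides):

     ```python
     import numpy as np

     def train_kernel_perceptron(X, y, kernel, epochs=10):
         # One coefficient alpha_n per training example; by the representer
         # theorem, the implicit weight vector is w = sum_n alpha_n * phi(x_n).
         n = len(X)
         alpha = np.zeros(n)
         # Precompute the Gram matrix K[i][j] = kernel(X[i], X[j]).
         K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
         for _ in range(epochs):
             for i in range(n):
                 activation = np.dot(alpha, K[:, i])  # w . phi(x_i), implicitly
                 if y[i] * activation <= 0:           # mistake: update alpha_i
                     alpha[i] += y[i]
         return alpha

     def predict(X_train, alpha, kernel, x):
         # Sign of sum_n alpha_n * k(x_n, x); labels are assumed to be +1 / -1.
         return np.sign(sum(a * kernel(xn, x) for a, xn in zip(alpha, X_train)))
     ```

     With the quadratic kernel from slide 11, this learns a quadratic decision boundary without ever constructing the expanded feature vectors.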

  21. Kernel Methods
     • Goal: keep advantages of linear models, but make them capture non-linear patterns in data!
     • How?
       – By mapping data to higher dimensions where it exhibits linear patterns
       – By rewriting linear models so that the mapping never needs to be explicitly computed

  22. Discussion
     • Other algorithms can be kernelized:
       – See CIML for K-means
       – We’ll talk about Support Vector Machines next
     • Do kernels address all the downsides of “feature explosion”?
       – They help reduce the computation cost during training
       – But overfitting remains an issue

  23. What you should know
     • Kernel functions: what they are, why they are useful, how they relate to feature combination
     • Kernelized perceptron: you should be able to derive it and implement it
