SLIDE 1

Kernel Methods

Lei Tang

Arizona State University

Jul. 26th, 2007

SLIDE 2

Introduction

  • Linear parametric models for regression and classification: the training data is used to learn a parameter vector and can then be discarded.
  • Memory-based methods (Parzen probability density estimation, k-nearest neighbor): store the entire training set in order to make predictions for future data. Fast to “train”, but slow at prediction.
  • Is it possible to connect these two different formulations?

SLIDE 5

Dual Representations

Many linear models for regression and classification can be reformulated in terms of a dual representation, in which the kernel function arises naturally. Consider the regularized sum-of-squares error:

$$J(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\left(\mathbf{w}^T\phi(\mathbf{x}_n) - t_n\right)^2 + \frac{\lambda}{2}\mathbf{w}^T\mathbf{w} \qquad (1)$$

SLIDE 6

Setting the derivative with respect to w to zero:

$$\nabla J(\mathbf{w}) = \sum_{n=1}^{N}\left(\mathbf{w}^T\phi(\mathbf{x}_n) - t_n\right)\phi(\mathbf{x}_n) + \lambda\mathbf{w} = 0$$

$$\implies \mathbf{w} = -\frac{1}{\lambda}\sum_{n=1}^{N}\left(\mathbf{w}^T\phi(\mathbf{x}_n) - t_n\right)\phi(\mathbf{x}_n) = \sum_{n=1}^{N} a_n\,\phi(\mathbf{x}_n) = \Phi^T\mathbf{a}$$

$$a_n = -\frac{1}{\lambda}\left(\mathbf{w}^T\phi(\mathbf{x}_n) - t_n\right)$$

SLIDE 7

Plugging the new formulation w = Φ^T a into J(w), and writing K = ΦΦ^T for the Gram matrix:

$$J(\mathbf{w}) = \frac{1}{2}(\Phi\mathbf{w} - \mathbf{t})^T(\Phi\mathbf{w} - \mathbf{t}) + \frac{\lambda}{2}\mathbf{w}^T\mathbf{w}$$

$$J(\mathbf{a}) = \frac{1}{2}\mathbf{a}^T K K \mathbf{a} - \mathbf{a}^T K\mathbf{t} + \frac{1}{2}\mathbf{t}^T\mathbf{t} + \frac{\lambda}{2}\mathbf{a}^T K\mathbf{a} \implies \mathbf{a} = (K + \lambda I_N)^{-1}\mathbf{t}$$

$$y(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x}) = \mathbf{a}^T\Phi\,\phi(\mathbf{x}) = \mathbf{k}(\mathbf{x})^T(K + \lambda I_N)^{-1}\mathbf{t} = \mathbf{a}^T\mathbf{k}(\mathbf{x})$$

where k(x) denotes the vector with elements k_n(x) = k(x_n, x).
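
As a sanity check, here is a minimal numpy sketch of this dual solution (kernel ridge regression). The Gaussian kernel choice, the toy data, and all names are illustrative assumptions, not part of the slides:

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    # K[i, j] = exp(-||X1[i] - X2[j]||^2 / (2 sigma^2))
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Toy 1-D regression data (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(30, 1))
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

lam = 0.1
K = rbf_kernel(X, X)
a = np.linalg.solve(K + lam * np.eye(len(X)), t)  # a = (K + lambda I_N)^{-1} t

X_new = np.array([[-2.0], [0.0], [2.0]])
y = rbf_kernel(X_new, X) @ a                      # y(x) = k(x)^T a
print(y)
```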

SLIDE 10

Advantages of dual methods

  • The dual formulation allows the solution to be expressed entirely in terms of the kernel function k(x, x′).
  • In the dual formulation, we need to invert an N × N matrix: a = (K + λI_N)^{-1} t.
  • In the original parameter space, we need to invert an M × M matrix: w = (λI_M + Φ^TΦ)^{-1} Φ^T t (see the sketch below).
  • If the number of instances N is smaller than the dimensionality M, the dual formulation is preferred.
  • The dual formulation works directly on kernels and avoids the explicit introduction of the feature vector φ(x).
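
A small numpy sketch contrasting the two solutions; the agreement of the resulting predictions is a known matrix identity, while the data, dimensions, and names are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 50, 3                       # N instances, M feature dimensions
Phi = rng.standard_normal((N, M))  # design matrix (rows are phi(x_n)^T)
t = rng.standard_normal(N)
lam = 0.5

# Primal: invert an M x M matrix
w = np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

# Dual: invert an N x N matrix (K = Phi Phi^T is the linear-kernel Gram matrix)
K = Phi @ Phi.T
a = np.linalg.solve(K + lam * np.eye(N), t)

# Both give identical predictions on the training inputs
print(np.allclose(Phi @ w, K @ a))  # True
```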

SLIDE 14

The Representer Theorem

More general case: Denote by Ω : [0, ∞) → R a strictly monotonically increasing function, by X a set, and by c an arbitrary loss function. Then each minimizer f ∈ H of the regularized risk

$$c\big((x_1, t_1, f(x_1)), \cdots, (x_N, t_N, f(x_N))\big) + \Omega(\|f\|_H)$$

admits a representation of the form

$$f(x) = \sum_{n=1}^{N} a_n\, k(x_n, x)$$

To be proved later ...

SLIDE 15

A toy example

Define

$$\phi([x]_1, [x]_2) = \big([x]_1^2,\ [x]_2^2,\ \sqrt{2}\,[x]_1[x]_2\big) \quad \text{or} \quad \phi([x]_1, [x]_2) = \big([x]_1^2,\ [x]_2^2,\ [x]_1[x]_2,\ [x]_2[x]_1\big)$$

Then

$$\langle\phi(x), \phi(x')\rangle = [x]_1^2[x']_1^2 + [x]_2^2[x']_2^2 + 2[x]_1[x]_2[x']_1[x']_2 = \big([x]_1[x']_1 + [x]_2[x']_2\big)^2 = \langle x, x'\rangle^2$$

The dot product in the 3-dim space can be computed without computing φ.
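
A quick numeric check of this identity; a minimal sketch, with the example vectors assumed:

```python
import numpy as np

def phi(x):
    # Explicit 3-dim feature map: (x1^2, x2^2, sqrt(2) x1 x2)
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
xp = np.array([3.0, -1.0])

lhs = phi(x) @ phi(xp)  # dot product in feature space
rhs = (x @ xp) ** 2     # kernel evaluated in input space: <x, x'>^2
print(lhs, rhs)         # both equal 1.0
```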

SLIDE 16

More general case

Suppose the input vector dimension is M, and we define the feature mapping as all the d-th order products (monomials) of the components of x:

$$[x]_{j_1} \cdot [x]_{j_2} \cdots [x]_{j_d}$$

After the mapping, the dimension becomes M^d, so computing the inner product explicitly requires at least O(M^d) operations. But

$$\langle\phi_d(x), \phi_d(x')\rangle = \sum_{j_1=1}^{M}\sum_{j_2=1}^{M}\cdots\sum_{j_d=1}^{M} [x]_{j_1}\cdots[x]_{j_d}\,[x']_{j_1}\cdots[x']_{j_d} = \left(\sum_{j_1=1}^{M}[x]_{j_1}[x']_{j_1}\right)\cdots\left(\sum_{j_d=1}^{M}[x]_{j_d}[x']_{j_d}\right) = \langle x, x'\rangle^d$$

which requires only O(M) computation to get the inner product.
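
A brute-force check of this equality; a sketch under assumed toy values (M = 3, d = 3):

```python
import numpy as np
from itertools import product

def phi_d(x, d):
    # All ordered d-th order monomials: one feature per index tuple (j_1, ..., j_d)
    return np.array([np.prod([x[j] for j in js])
                     for js in product(range(len(x)), repeat=d)])

x = np.array([0.5, -1.0, 2.0])
xp = np.array([1.0, 0.5, -0.5])
d = 3

explicit = phi_d(x, d) @ phi_d(xp, d)  # O(M^d) explicit features
kernel = (x @ xp) ** d                 # O(M) kernel evaluation
print(np.isclose(explicit, kernel))    # True
```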

SLIDE 18

Myths of Kernel

  • A kernel is a similarity measure.
  • A kernel corresponds to a dot product in a feature space H via a mapping φ:

$$k(x, x') = \langle\phi(x), \phi(x')\rangle$$

Questions:

1 What kind of kernel functions admit the above form?
2 Given a kernel, how to construct an associated feature space?

SLIDE 20

Positive Definite Kernels

Gram Matrix: Given a function k : X × X → R and inputs x_1, · · · , x_N ∈ X, the matrix with entries K_ij := k(x_i, x_j) is called the Gram matrix.

Positive Definite Kernel: A function k on X × X which, for any number of inputs x_1, x_2, · · · , x_N ∈ X, gives rise to a positive semi-definite Gram matrix is called a positive definite kernel.

A positive definite kernel can always be written as an inner product of some feature mapping!
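
One practical way to spot-check positive definiteness on sampled points (necessary, not sufficient, since it only tests one sample); the kernel and points here are assumed:

```python
import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma ** 2))

# Sample a few inputs and build the Gram matrix K[i, j] = k(x_i, x_j)
rng = np.random.default_rng(2)
X = rng.standard_normal((10, 4))
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

# Positive semi-definite <=> all eigenvalues of the symmetric Gram matrix are >= 0
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-10)  # True for a positive definite kernel
```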

SLIDE 23

A Wake-Up Quiz

Cauchy-Schwarz Inequality for Kernels: If k is a positive definite kernel, then

$$|k(x_1, x_2)|^2 \le k(x_1, x_1) \cdot k(x_2, x_2)$$
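
A standard one-line argument, added here as a hint (not on the original slide): the 2 × 2 Gram matrix of {x_1, x_2} is positive semi-definite, so its determinant is non-negative:

$$\det\begin{pmatrix} k(x_1, x_1) & k(x_1, x_2) \\ k(x_2, x_1) & k(x_2, x_2) \end{pmatrix} = k(x_1, x_1)\,k(x_2, x_2) - k(x_1, x_2)^2 \ge 0$$

using the symmetry k(x_1, x_2) = k(x_2, x_1).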

SLIDE 24

Reproducing Kernel Map

A positive definite kernel can always be written as an inner product of some feature mapping!

The strategy to prove it:

  • Define a feature mapping φ into some vector space.
  • Define a dot product (strictly, a positive definite bilinear form) on that space.
  • Show that k(x, x′) = ⟨φ(x), φ(x′)⟩.

SLIDE 25

Feature Map

Define a feature map φ from X into a space of functions:

$$\phi(x) = k(\cdot, x)$$

where k(·, x) denotes the function that assigns the value k(x′, x) to each x′ ∈ X.

SLIDE 26

Vector Space

Let the space consist of all functions that can be represented in the following form:

$$f(\cdot) = \sum_{i=1}^{m} \alpha_i\, k(\cdot, x_i)$$

where m ∈ N, α_i ∈ R, and x_1, x_2, · · · , x_m ∈ X are arbitrary. For

$$g(\cdot) = \sum_{j=1}^{m'} \beta_j\, k(\cdot, x'_j) \qquad (2)$$

with m′ ∈ N, β_j ∈ R, and x′_1, x′_2, · · · , x′_{m′} ∈ X, we define the dot product as

$$\langle f, g\rangle := \sum_{i=1}^{m}\sum_{j=1}^{m'} \alpha_i \beta_j\, k(x_i, x'_j)$$

Need to show the above is a valid inner product.
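
A small sketch of this construction, representing a function by its coefficients and expansion points; the kernel choice, class name, and data are assumptions:

```python
import numpy as np

def k(x, xp, sigma=1.0):
    # An assumed positive definite kernel (Gaussian, 1-D inputs)
    return np.exp(-(x - xp) ** 2 / (2 * sigma ** 2))

class KernelFunction:
    """f(.) = sum_i alpha_i k(., x_i), stored as (alphas, points)."""
    def __init__(self, alphas, points):
        self.alphas = np.asarray(alphas, dtype=float)
        self.points = np.asarray(points, dtype=float)

    def __call__(self, x):
        return sum(a * k(xi, x) for a, xi in zip(self.alphas, self.points))

def inner(f, g):
    # <f, g> = sum_i sum_j alpha_i beta_j k(x_i, x'_j)
    return sum(a * b * k(xi, xj)
               for a, xi in zip(f.alphas, f.points)
               for b, xj in zip(g.alphas, g.points))

f = KernelFunction([1.0, -0.5], [0.0, 1.0])
g = KernelFunction([2.0], [0.5])

# Reproducing property (proved two slides ahead): <k(., x), f> = f(x)
kx = KernelFunction([1.0], [0.7])
print(np.isclose(inner(kx, f), f(0.7)))  # True
print(inner(f, g))
```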

SLIDE 28

Review of Inner Product

Bilinear Form: A bilinear form on a vector space H is a function Q : H × H → R such that

$$Q(\lambda x + \lambda' x',\, x'') = \lambda Q(x, x'') + \lambda' Q(x', x'')$$
$$Q(x'',\, \lambda x + \lambda' x') = \lambda Q(x'', x) + \lambda' Q(x'', x')$$

where x, x′, x′′ ∈ H and λ, λ′ ∈ R. If Q(x, x′) = Q(x′, x), then Q is a symmetric bilinear form.

Dot Product: A dot product on a vector space H is a symmetric bilinear form that is strictly positive definite; in other words, for all x ∈ H, ⟨x, x⟩ ≥ 0, with equality only for x = 0.

SLIDE 30

Recall

$$\langle f, g\rangle := \sum_{i=1}^{m}\sum_{j=1}^{m'} \alpha_i \beta_j\, k(x_i, x'_j)$$

It is bilinear, as

$$\langle f, g\rangle = \sum_{j=1}^{m'} \beta_j\, f(x'_j) = \sum_{i=1}^{m} \alpha_i\, g(x_i)$$

It is symmetric, as ⟨f, g⟩ = ⟨g, f⟩. It is positive semi-definite, as

$$\langle f, f\rangle = \sum_{i,j=1}^{m} \alpha_i \alpha_j\, k(x_i, x_j) \ge 0 \quad \text{(definition of a positive definite kernel)}$$

Remains to show ⟨f, f⟩ = 0 ⟺ f = 0.

SLIDE 34

Reproducing Kernel

By the definition of the dot product,

$$\langle k(\cdot, x), f\rangle = \sum_{i=1}^{m} \alpha_i\, k(x, x_i) = f(x)$$

and in particular

$$\langle k(\cdot, x), k(\cdot, x')\rangle = k(x, x') \quad \text{(reproducing kernel property)}$$

So positive definite kernels k are also called reproducing kernels. Note that ⟨·, ·⟩ is itself a positive definite kernel on the space of functions, as

$$\sum_{i,j} \gamma_i \gamma_j \langle f_i, f_j\rangle = \Big\langle \sum_i \gamma_i f_i,\ \sum_j \gamma_j f_j \Big\rangle \ge 0$$

Based on the result of our quiz (Cauchy-Schwarz), we have

$$|f(x)|^2 = |\langle k(\cdot, x), f\rangle|^2 \le k(x, x)\cdot\langle f, f\rangle$$

So ⟨f, f⟩ = 0 ⟹ f(x) = 0 for all x, i.e. f = 0.

SLIDE 37

Revisit Feature Map

Define a feature map φ from X into the space of functions: φ(x) = k(·, x), where k(·, x) denotes the function that assigns the value k(x′, x) to each x′ ∈ X. Any positive definite kernel can thus be thought of as a dot product in another space. Our construction is one possible instantiation of the feature space associated with a kernel, but it is not unique.

SLIDE 38

Reproducing Kernel Hilbert Spaces (RKHS)

In the previous example, the space of functions is a dot product space, or equivalently a pre-Hilbert space. A Hilbert space generalizes the notion of Euclidean space: it extends the methods of vector algebra from the two-dimensional plane and three-dimensional space to spaces of any (possibly infinite) dimension.

A Hilbert space is an inner product space, an abstract vector space in which distances and angles can be measured. A Hilbert space is also “complete”, meaning that if a sequence of vectors approaches a limit, then that limit is guaranteed to be in the space as well.

SLIDE 39

Reproducing Kernel Hilbert Spaces (RKHS)

RKHS: Let X be a nonempty set (often called the index set) and H a Hilbert space of functions f : X → R. Then H is called a reproducing kernel Hilbert space endowed with the dot product ⟨·, ·⟩ (and the norm ||f|| := √⟨f, f⟩) if there exists a function k : X × X → R with the following properties:

1 k has the reproducing property: ⟨f, k(x, ·)⟩ = f(x) for all f ∈ H; in particular, ⟨k(x, ·), k(x′, ·)⟩ = k(x, x′).

2 k spans H.

An RKHS uniquely determines k: assume two different reproducing kernels k and k′. Applying each reproducing property to the other kernel gives ⟨k(x, ·), k′(x′, ·)⟩ = k(x, x′) and also ⟨k(x, ·), k′(x′, ·)⟩ = k′(x′, x) = k′(x, x′), so k = k′. Contradiction!

SLIDE 42

Mercer’s Kernel Map

Define another feature mapping from x into a Hilbert space via an integral operator. The kernel then decomposes as a sum over the eigenfunctions of that operator. It turns out Mercer’s kernel map is also positive definite. Too complicated to cover here, so we skip the details...

SLIDE 43

Kernel Trick

Kernel Trick: Given an algorithm which is formulated in terms of a positive definite kernel k (or inner products), one can construct an alternative algorithm by replacing k with another positive definite kernel k̂.

Examples of kernels (see the sketch after this list):

  • Linear kernel: k(x, x′) = x^T x′
  • Polynomial: k(x, x′) = ⟨x, x′⟩^d
  • Inhomogeneous polynomial: k(x, x′) = (⟨x, x′⟩ + c)^d
  • Gaussian: k(x, x′) = exp(−||x − x′||² / (2σ²))
  • ......
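
These four kernels as a minimal numpy sketch (function names and test vectors are my own):

```python
import numpy as np

def linear(x, xp):
    return x @ xp

def polynomial(x, xp, d=2):
    return (x @ xp) ** d

def inhomogeneous_poly(x, xp, c=1.0, d=2):
    return (x @ xp + c) ** d

def gaussian(x, xp, sigma=1.0):
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma ** 2))

x, xp = np.array([1.0, 2.0]), np.array([0.5, -1.0])
for kern in (linear, polynomial, inhomogeneous_poly, gaussian):
    print(kern.__name__, kern(x, xp))
```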

SLIDE 45

Constructing Kernels

A valid kernel should be positive definite, i.e. it can be written as an inner product in some feature space.

SLIDE 46

Gaussian Kernel

The Gaussian kernel

$$k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)$$

is a valid kernel:

$$k(x, x') = \exp\left(-\frac{x^T x}{2\sigma^2}\right)\exp\left(\frac{x^T x'}{\sigma^2}\right)\exp\left(-\frac{x'^T x'}{2\sigma^2}\right) = f(x)\,\exp(x^T x'/\sigma^2)\,f(x')$$

Quiz: Show that the feature vector corresponding to the Gaussian kernel has infinite dimensionality.
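
A hint for the quiz (my addition, not on the slide): expand the middle factor as a power series,

$$\exp\left(\frac{x^T x'}{\sigma^2}\right) = \sum_{n=0}^{\infty} \frac{1}{n!}\left(\frac{x^T x'}{\sigma^2}\right)^n$$

Each term ⟨x, x′⟩^n is an n-th order polynomial kernel with a finite-dimensional feature map, so the combined feature vector stacks features of every order n.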

SLIDE 49

Kernel and Distance

Since a kernel can be considered a similarity, we can calculate a distance based on kernels. In the feature space,

$$\|\phi(x) - \phi(x')\|^2 = \langle\phi(x), \phi(x)\rangle + \langle\phi(x'), \phi(x')\rangle - 2\langle\phi(x), \phi(x')\rangle = k(x, x) + k(x', x') - 2k(x, x')$$

The Gaussian kernel can then be extended to distance measures other than the Euclidean distance:

$$\tilde{k}(x, x') = \exp\left(-\frac{1}{2\sigma^2}\big(k(x, x) + k(x', x') - 2k(x, x')\big)\right)$$
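
A small sketch computing this kernel-induced distance (the base kernel and inputs are assumed):

```python
import numpy as np

def k(x, xp):
    # Assumed base kernel: inhomogeneous polynomial of degree 2
    return (x @ xp + 1.0) ** 2

def kernel_distance_sq(x, xp):
    # ||phi(x) - phi(x')||^2 = k(x, x) + k(x', x') - 2 k(x, x')
    return k(x, x) + k(xp, xp) - 2.0 * k(x, xp)

x, xp = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(kernel_distance_sq(x, xp))  # squared distance in the implicit feature space
```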

SLIDE 50

Kernels over sets

Kernels extend to inputs that are symbolic, rather than simply vectors of real numbers: kernels can be defined over objects such as graphs, sets, strings, and text documents. As a toy example, fix a set and define a nonvectorial space consisting of all possible subsets of this set. If A₁ and A₂ are two such subsets, then one simple choice of kernel is

$$k(A_1, A_2) = 2^{|A_1 \cap A_2|}$$

Quiz: Show this is a valid kernel.
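
A numeric sketch of this set kernel, also hinting at the quiz: 2^{|A₁ ∩ A₂|} counts the subsets common to both A₁ and A₂, suggesting a feature map with one indicator feature per subset (the example sets are assumed):

```python
from itertools import chain, combinations

def powerset(s):
    s = list(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def set_kernel(A1, A2):
    return 2 ** len(A1 & A2)

A1, A2 = {1, 2, 3}, {2, 3, 4}

# Feature map phi_U(A) = 1 if U is a subset of A, one feature per subset U
universe = A1 | A2
common = sum(1 for U in powerset(universe) if U <= A1 and U <= A2)
print(set_kernel(A1, A2), common)  # both 4 = 2^{|{2, 3}|}
```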

SLIDE 52

Kernels to connect generative/discriminative models(1)

  • Generative models can naturally handle missing data and, in the case of hidden Markov models, sequences of varying length.
  • Discriminative models perform better on discriminative tasks.
  • One way to combine them is to use a generative model to define a kernel, and then use this kernel in a discriminative approach.
  • One example: k(x, x′) = p(x)p(x′). Two inputs are similar if they both have high probability.

SLIDE 53

Kernels to connect generative/discriminative models(2)

Two inputs are similar if they have significant probability under a range of different components:

$$k(x, x') = \int p(x|z)\,p(x'|z)\,p(z)\,dz$$

where z is a latent variable. Suppose the data consist of ordered sequences of length L, so an observation is X = {x₁, · · · , x_L} with hidden states Z = {z₁, · · · , z_L}. Then

$$K(X, X') = \sum_{Z} P(X|Z)\,P(X'|Z)\,P(Z)$$

This model can easily be extended to allow sequences of different lengths to be compared.
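
A minimal discrete sketch of this marginalization kernel, with a single latent variable z rather than a full HMM; all distributions here are made-up toy numbers:

```python
import numpy as np

# Toy model: latent z in {0, 1}, observation x in {0, 1, 2}
p_z = np.array([0.4, 0.6])                # p(z)
p_x_given_z = np.array([[0.7, 0.2, 0.1],  # p(x | z=0)
                        [0.1, 0.3, 0.6]]) # p(x | z=1)

def latent_kernel(x, xp):
    # k(x, x') = sum_z p(x|z) p(x'|z) p(z)
    return np.sum(p_x_given_z[:, x] * p_x_given_z[:, xp] * p_z)

print(latent_kernel(0, 0), latent_kernel(0, 2))
```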

SLIDE 54

Fisher Kernel

Consider the gradient with respect to θ, which defines a vector in a “feature” space having the same dimensionality as θ. The Fisher score is

$$g(\theta, x) = \nabla_\theta \ln p(x|\theta)$$

and the Fisher kernel is defined by

$$k(x, x') = g(\theta, x)^T F^{-1} g(\theta, x') \qquad (3)$$

where F is the Fisher information matrix, given by

$$F = E_x\left[g(\theta, x)\,g(\theta, x)^T\right] \qquad (4)$$

Empirically, F is estimated by the sample average, which corresponds to the covariance matrix of the Fisher scores. The Fisher kernel has been applied to document retrieval.
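
A sketch for a univariate Gaussian with unknown mean θ and known unit variance, where the Fisher score reduces to g(θ, x) = x − θ; the model choice and data are assumptions:

```python
import numpy as np

# Model: p(x | theta) = N(x; theta, 1), so ln p = -0.5 (x - theta)^2 + const
def fisher_score(theta, x):
    return x - theta  # d/dtheta ln p(x | theta)

theta = 0.5
rng = np.random.default_rng(3)
sample = rng.normal(theta, 1.0, size=1000)

# Empirical Fisher information: sample average of squared scores (scalar case)
F = np.mean(fisher_score(theta, sample) ** 2)

def fisher_kernel(x, xp):
    # k(x, x') = g(theta, x) F^{-1} g(theta, x')
    return fisher_score(theta, x) * fisher_score(theta, xp) / F

print(fisher_kernel(1.0, 2.0))
```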

SLIDE 55

Sigmoidal Kernel

$$k(x, x') = \tanh(a\,x^T x' + b)$$

Its Gram matrix is in general not positive semi-definite, so it is not a valid kernel. It gives the SVM a superficial resemblance to neural network models. A Bayesian neural network with an appropriate prior reduces to a Gaussian process, which we’ll discuss next time.
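
A quick numeric demonstration of the invalidity (the parameters a = 1, b = −1 and the points are my choice): with a negative offset, even a diagonal Gram entry k(x, x) can be negative, which a PSD matrix never allows:

```python
import numpy as np

def sigmoid_kernel(x, xp, a=1.0, b=-1.0):
    return np.tanh(a * x * xp + b)

X = np.array([0.0, 0.5, 1.0])
K = np.array([[sigmoid_kernel(xi, xj) for xj in X] for xi in X])

print(K[0, 0])                      # tanh(-1) < 0: negative diagonal entry
print(np.linalg.eigvalsh(K).min())  # negative eigenvalue => Gram matrix not PSD
```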

SLIDE 56

Radial Basis Function Network

Regression can be based on fixed basis functions. A radial basis function has the property that it depends only on the radial distance (typically Euclidean) from a centre µ_j:

$$\phi_j(x) = h(\|x - \mu_j\|)$$

Historically, radial basis functions were introduced for exact function interpolation:

$$f(x) = \sum_{n=1}^{N} w_n\, h(\|x - x_n\|) \qquad (5)$$

With the same number of coefficients as constraints, the result fits every target value exactly. Over-fitting! There are also motivations from other perspectives: regularization theory, noisy inputs.
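
A sketch of exact RBF interpolation with a Gaussian basis h (the basis choice and toy data are assumed); note the exact, and typically over-fit, reproduction of the targets:

```python
import numpy as np

def h(r, sigma=0.5):
    return np.exp(-r ** 2 / (2 * sigma ** 2))  # Gaussian radial basis

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 1, 8))
t = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(8)

# One basis function centred on each data point: solve H w = t exactly
H = h(np.abs(x[:, None] - x[None, :]))
w = np.linalg.solve(H, t)

f_at_data = H @ w
print(np.allclose(f_at_data, t))  # True: every target is fit exactly
```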

SLIDE 57

Radial Basis Function Network

Normalization might be required in practice. How do we choose the centres when the training set is large?

  • Randomly choose a subset of the data points.
  • Orthogonal least squares: a sequential selection process in which, at each step, the next data point chosen as a basis-function centre is the one that gives the greatest reduction in error.

This is the same problem faced by the Reduced SVM.

SLIDE 58

Nadaraya-Watson model

Use a Parzen density estimator to model the joint distribution p(x, t):

$$p(x, t) = \frac{1}{N}\sum_{n=1}^{N} f(x - x_n,\ t - t_n) \qquad (6)$$

where f(x, t) is the component density function, with one component centred on each data point.

SLIDE 60

The result

$$y(x) = \sum_{n} k(x, x_n)\, t_n$$

is known as the Nadaraya-Watson model, or kernel regression. Notice that the weights are normalized: Σ_{n=1}^N k(x, x_n) = 1. The conditional probability p(t|x) can also be calculated from the joint density (6).
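
A minimal sketch of kernel regression with Gaussian weights normalized to sum to one; the bandwidth and toy data are assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
x_train = np.sort(rng.uniform(0, 1, 40))
t_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(40)

def nadaraya_watson(x, h=0.05):
    # Normalized kernel weights: k(x, x_n) = f(x - x_n) / sum_m f(x - x_m)
    weights = np.exp(-(x - x_train) ** 2 / (2 * h ** 2))
    weights /= weights.sum()
    return weights @ t_train  # y(x) = sum_n k(x, x_n) t_n

print(nadaraya_watson(0.25))  # close to sin(2 pi 0.25) = 1
```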

SLIDE 62

Summary

  • Dual representation
  • Kernels: how to construct a kernel, various kernels
  • Radial basis functions
  • Gaussian process
