Kernel Methods - I (PowerPoint PPT Presentation)


SLIDE 1

Kernel Methods - I

Henrik I Christensen

Robotics & Intelligent Machines @ GT, Georgia Institute of Technology, Atlanta, GA 30332-0280, hic@cc.gatech.edu


SLIDE 2

Outline

1. Introduction
2. Dual Representations
3. Kernel Design
4. Radial Basis Functions
5. Summary


SLIDE 3

Introduction

- So far the process has been about data compression and optimal regression/discrimination.
- Once the process is complete, the training set is discarded and the model is used for processing.
- What if the data were kept and used directly for estimation? Why, you ask? The decision boundaries might not be simple, or the modelling may be too complicated.
- We have already discussed Nearest Neighbor (NN) as an example of direct data processing; it is one of a complete class of memory-based techniques.
- Q: how do we measure similarity between a data point and the samples in memory?


SLIDE 4

Kernel Methods

- What if we could predict based on a linear combination of features?
- Assume a mapping to a new feature space using φ(x).
- A kernel function is defined by k(x, x′) = φ(x)^T φ(x′).
- Characteristics:
  - The function is symmetric: k(x, x′) = k(x′, x).
  - It can be used on both continuous and symbolic data.
- The simplest kernel, k(x, x′) = x^T x′, is the linear kernel (the feature map is the identity).
- A kernel is basically an inner product performed in a feature/mapped space.
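A minimal sketch in Python (NumPy), assuming nothing beyond the definition above; the test vectors are illustrative:

```python
import numpy as np

def linear_kernel(x, xp):
    # Linear kernel: inner product under the identity feature map phi(x) = x.
    return x @ xp

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert linear_kernel(x, xp) == linear_kernel(xp, x)  # symmetry: k(x, x') = k(x', x)
```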


SLIDE 5

Kernels

Consider a complete set of data in memory. How can we interpolate new values based on the training values? I.e.,

y(x) = (1/k) Σ_{n=1}^{N} k(x, x_n) x_n

Consider k(·, ·) a weight function that determines the contribution based on the distance between x and x_n.
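One possible choice of such a weight function is a Gaussian in the distance (a sketch; the functional form and σ = 0.5 are assumptions for illustration, not from the slides):

```python
import numpy as np

def gaussian_weight(x, xn, sigma=0.5):
    # Weight that decays with the distance ||x - xn||.
    return np.exp(-np.linalg.norm(x - xn) ** 2 / (2 * sigma ** 2))
```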


SLIDE 6

Outline

1. Introduction
2. Dual Representations
3. Kernel Design
4. Radial Basis Functions
5. Summary


SLIDE 7

Dual Representation

Consider a regression problem as seen earlier:

J(w) = (1/2) Σ_{n=1}^{N} (w^T φ(x_n) − t_n)^2 + (λ/2) w^T w

with the solution

w = −(1/λ) Σ_{n=1}^{N} (w^T φ(x_n) − t_n) φ(x_n) = Σ_{n=1}^{N} a_n φ(x_n) = Φ^T a

where a is defined by a_n = −(1/λ) (w^T φ(x_n) − t_n). Substitute w = Φ^T a into J(w) to obtain

J(a) = (1/2) a^T Φ Φ^T Φ Φ^T a − a^T Φ Φ^T t + (1/2) t^T t + (λ/2) a^T Φ Φ^T a

which is termed the dual representation.


SLIDE 8

Dual Representation II

Define the Gram matrix K = Φ Φ^T to get

J(a) = (1/2) a^T K K a − a^T K t + (1/2) t^T t + (λ/2) a^T K a

where K_{nm} = φ(x_n)^T φ(x_m) = k(x_n, x_m). J(a) is then minimized by

a = (K + λ I_N)^{−1} t

Through substitution we obtain

y(x) = w^T φ(x) = a^T Φ φ(x) = k(x)^T (K + λ I_N)^{−1} t

where the vector k(x) has elements k_n(x) = k(x_n, x). We have in reality mapped the problem to another (dual) space in which it is possible to optimize the regression/discrimination problem. Typically N ≫ M, so the immediate advantage is not obvious; see later.
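A minimal sketch of this closed form in Python (NumPy), assuming the Gram matrix K and target vector t are already available; the function names are my own:

```python
import numpy as np

def fit_dual(K, t, lam):
    # a = (K + lambda I_N)^(-1) t, solved as a linear system for numerical stability.
    N = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(N), t)

def predict(k_x, a):
    # y(x) = k(x)^T a, where k_x[n] = k(x_n, x) for the new input x.
    return k_x @ a
```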


SLIDE 9

Outline

1. Introduction
2. Dual Representations
3. Kernel Design
4. Radial Basis Functions
5. Summary


SLIDE 10

Constructing Kernels

How would we construct kernel functions? One approach is to choose a mapping and find the corresponding kernel. A one-dimensional example:

k(x, x′) = φ(x)^T φ(x′) = Σ_{i=1}^{M} φ_i(x) φ_i(x′)

where the φ_i(·) are basis functions.
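A sketch of this construction in Python; the polynomial basis φ_i(x) = x^i is a hypothetical choice for illustration:

```python
def kernel_from_basis(x, xp, basis):
    # k(x, x') = sum_i phi_i(x) phi_i(x') for a list of basis functions.
    return sum(phi(x) * phi(xp) for phi in basis)

# Hypothetical basis: phi_i(x) = x^i for i = 0..3.
poly_basis = [lambda x, i=i: x ** i for i in range(4)]
print(kernel_from_basis(0.5, 2.0, poly_basis))  # sum_i (x*x')^i = 4.0 here, since x*x' = 1
```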


SLIDE 11

Kernel Basis Functions - Example

[Figure: example basis functions and the corresponding kernel functions, plotted over the interval [−1, 1].]

SLIDE 12

Construction of Kernels

We can also design kernels directly; the kernel must correspond to a scalar product in "some" space. Consider k(x, z) = (x^T z)^2 for a 2-dimensional space x = (x1, x2):

k(x, z) = (x^T z)^2 = (x1 z1 + x2 z2)^2
        = x1^2 z1^2 + 2 x1 z1 x2 z2 + x2^2 z2^2
        = (x1^2, √2 x1 x2, x2^2) (z1^2, √2 z1 z2, z2^2)^T
        = φ(x)^T φ(z)

In general, the kernel function is valid if the Gram matrix K is positive semi-definite for all possible choices of the data.
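This identity is easy to verify numerically (a sketch; the test vectors are arbitrary):

```python
import numpy as np

def phi(x):
    # Feature map read off from the expansion of (x^T z)^2 in 2-D.
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
assert np.isclose((x @ z) ** 2, phi(x) @ phi(z))  # (x^T z)^2 == phi(x)^T phi(z)
```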


SLIDE 13

Techniques for construction of kernels

Given valid kernels k1(x, x′) and k2(x, x′), the following constructions also yield valid kernels:

- k(x, x′) = c k1(x, x′), with c > 0
- k(x, x′) = f(x) k1(x, x′) f(x′), for any function f(·)
- k(x, x′) = q(k1(x, x′)), where q(·) is a polynomial with non-negative coefficients
- k(x, x′) = exp(k1(x, x′))
- k(x, x′) = k1(x, x′) + k2(x, x′)
- k(x, x′) = k1(x, x′) k2(x, x′)
- k(x, x′) = x^T A x′, where A is a symmetric positive semi-definite matrix
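For instance, the sum and product rules let us combine simple kernels (a sketch; the two base kernels and σ = 1 are illustrative choices):

```python
import numpy as np

def k1(x, xp):
    # Linear kernel (valid).
    return x @ xp

def k2(x, xp):
    # Gaussian kernel with sigma = 1 (valid).
    return np.exp(-np.linalg.norm(x - xp) ** 2 / 2.0)

def k_sum(x, xp):
    # Sum of valid kernels is a valid kernel.
    return k1(x, xp) + k2(x, xp)

def k_prod(x, xp):
    # Product of valid kernels is a valid kernel.
    return k1(x, xp) * k2(x, xp)
```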


SLIDE 14

More kernel examples/generalizations

We could generalize k(x, x′) = (x^T x′)^2 in various ways:

1. k(x, x′) = (x^T x′ + c)^2
2. k(x, x′) = (x^T x′)^M
3. k(x, x′) = (x^T x′ + c)^M

An example application is correlation between image regions. Another option is

k(x, x′) = exp(−‖x − x′‖^2 / (2σ^2))

called the "Gaussian kernel". Several more examples are given in the book.
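A sketch of the Gaussian kernel and of assembling a Gram matrix from it (Python; gram_matrix is a helper of my own, not from the slides):

```python
import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    # k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    return np.exp(-np.linalg.norm(x - xp) ** 2 / (2 * sigma ** 2))

def gram_matrix(X, kernel):
    # K[n, m] = k(x_n, x_m) for the rows x_n of X.
    return np.array([[kernel(xn, xm) for xm in X] for xn in X])
```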


SLIDE 15

Outline

1. Introduction
2. Dual Representations
3. Kernel Design
4. Radial Basis Functions
5. Summary


SLIDE 16

Radial Basis Functions

What is a radial basis function? φ_j(x) = h(‖x − x_j‖). How do we average/smooth across the entire data set based on distance?

y(x) = Σ_{n=1}^{N} w_n h(‖x − x_n‖)

The weights w_n could be estimated using least squares (LSQ). A popular interpolation strategy is

y(x) = Σ_{n=1}^{N} t_n h(x − x_n), where h(x − x_n) = ν(x − x_n) / Σ_j ν(x − x_j)
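A sketch of this normalized interpolation for one-dimensional inputs (Python; the Gaussian form of ν and σ = 0.3 are illustrative assumptions):

```python
import numpy as np

def nu(r, sigma=0.3):
    # One possible basis: an unnormalized Gaussian bump.
    return np.exp(-r ** 2 / (2 * sigma ** 2))

def rbf_interpolate(x, X_train, t_train):
    # y(x) = sum_n t_n h(x - x_n), with h(x - x_n) = nu(x - x_n) / sum_j nu(x - x_j).
    w = nu(x - X_train)
    return (w / w.sum()) @ t_train
```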


SLIDE 17

The effect of normalization?

[Figure: two panels over x ∈ [−1, 1] comparing the basis functions without and with normalization.]

SLIDE 18

Nadaraya-Watson Models

Let's interpolate across all the data! Using a Parzen density estimator we have

p(x, t) = (1/N) Σ_{n=1}^{N} f(x − x_n, t − t_n)

We can then estimate

y(x) = E[t|x] = ∫_{−∞}^{∞} t p(t|x) dt = ∫ t p(x, t) dt / ∫ p(x, t) dt
     = Σ_n g(x − x_n) t_n / Σ_m g(x − x_m)
     = Σ_n k(x, x_n) t_n

where k(x, x_n) = g(x − x_n) / Σ_m g(x − x_m) and g(x) = ∫ f(x, t) dt.
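A minimal sketch of the resulting estimator with a Gaussian component density (Python; the noisy-sine data mirrors the example on the following slides, with illustrative parameters):

```python
import numpy as np

def nadaraya_watson(x, X_train, t_train, sigma=0.2):
    # y(x) = sum_n k(x, x_n) t_n, with k(x, x_n) = g(x - x_n) / sum_m g(x - x_m);
    # g is taken to be Gaussian here (an illustrative choice).
    g = np.exp(-(x - X_train) ** 2 / (2 * sigma ** 2))
    return (g / g.sum()) @ t_train

# Illustrative data: noisy samples of a sine function.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 20)
t = np.sin(2 * np.pi * X) + rng.normal(0.0, 0.1, 20)
print(nadaraya_watson(0.5, X, t))
```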

SLIDE 19

Gaussian Mixture Example

- Assume a particular one-dimensional function (here a sine) with noise.
- Each data point is represented by an isotropic Gaussian kernel.
- Smoothing factors are determined for the interpolation.


SLIDE 20

Gaussian Mixture Example

[Figure: the noisy sine data and the resulting interpolation, plotted for x ∈ [0, 1] and t ∈ [−1.5, 1.5].]


SLIDE 21

Outline

1. Introduction
2. Dual Representations
3. Kernel Design
4. Radial Basis Functions
5. Summary


SLIDE 22

Summary

- Memory-based methods: keeping the data!
- Design of distance metrics for weighting of the data in the learning set.
- Kernels: a distance metric based on a dot product in some feature space.
- Being creative about the design of kernels.
- We'll come back to the complexity issues.
