Designing Kernel Functions Using the Karhunen-Loève Expansion

July 7, 2004
Designing Kernel Functions Using the Karhunen-Loève Expansion
Masashi Sugiyama (1,2) and Hidemitsu Ogawa (2)
(1) Fraunhofer FIRST, Germany
(2) Tokyo Institute of Technology, Japan

2 Learning with Kernels
• Kernel methods: Approximate unknown function $f(x)$ by
  $\hat{f}(x) = \sum_{i=1}^{n} \alpha_i K(x, x_i)$
  - $\alpha_i$: Parameters
  - $K(x, x')$: Kernel function
  - $x_i$: Training points
• Kernel methods are known to generalize very well, given appropriate kernel function.
• Therefore, how to choose (or design) kernel function is critical in kernel methods.
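A minimal sketch of the kernel model above, assuming 1-D inputs, a Gaussian kernel, and a ridge-regularized least-squares fit. The function names, the width 0.2, and the ridge value 1e-6 are illustrative choices, not taken from the talk.

```python
import numpy as np

def gaussian_kernel(x, centers, width):
    """Matrix K[i, j] = exp(-(x_i - c_j)^2 / (2 * width^2)) for 1-D inputs."""
    diff = x[:, None] - centers[None, :]
    return np.exp(-diff**2 / (2.0 * width**2))

def fit_kernel_model(x_train, y_train, width, ridge):
    """Solve (K + ridge * I) alpha = y for the parameters alpha_i."""
    K = gaussian_kernel(x_train, x_train, width)
    return np.linalg.solve(K + ridge * np.eye(len(x_train)), y_train)

def predict(x, x_train, alpha, width):
    """Evaluate f_hat(x) = sum_i alpha_i K(x, x_i)."""
    return gaussian_kernel(x, x_train, width) @ alpha

# Toy data (assumed for illustration): six samples of a sine wave.
x_train = np.linspace(0.0, 1.0, 6)
y_train = np.sin(2.0 * np.pi * x_train)
alpha = fit_kernel_model(x_train, y_train, width=0.2, ridge=1e-6)
y_fit = predict(x_train, x_train, alpha, width=0.2)
```

With such a small ridge value the fitted model nearly interpolates the training outputs; changing the kernel family only requires swapping `gaussian_kernel`.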

3 Recent Development in Kernel Design
• Recently, a lot of attention has been paid to designing kernel functions for non-vectorial structured data, e.g., strings, sequences, trees, graphs.
• In this talk, however, we discuss the problem of designing kernel functions for standard vectorial data.

4 Choice of Kernel Function
• A kernel function is specified by
  - A family of functions (Gaussian, polynomial, etc.)
  - Kernel parameters (width, order, etc.)
• We usually focus on a particular family (say Gaussian), and optimize kernel parameters by, e.g., cross-validation (CV).
• In principle, it is possible to optimize the family of kernels by CV.
• However, this does not seem so common because of too many degrees of freedom.

5 Goal of Our Research
• We propose a method for finding the optimal family of kernel functions using some prior knowledge on the problem domain.
• We focus on
  - Regression (squared loss)
  - Translation-invariant kernels: $K(x, x') = K(x - x')$
• We do not assume the kernel is positive semi-definite, since the "kernel trick" is not needed in some regression methods (e.g., ridge).

6 Outline of The Talk
• A general method for designing translation-invariant kernels.
• Example of kernel design for binary regression.
• Implication of the results.

7 Specialty of Learning with Translation-Invariant Kernels
• Ordinary linear models:
  $\hat{f}(x) = \sum_{i=1}^{p} \alpha_i \varphi_i(x)$
  - $\alpha_i$: Parameters
  - $\varphi_i(x)$: Basis function
• Kernel models:
  $\hat{f}(x) = \sum_{i=1}^{n} \alpha_i K(x - x_i)$
  - $K(x - x')$: Translation-invariant kernel
  - $x_i$ is the center of the kernel.
• All basis functions have the same shape!

8 Local Approximation by Kernels
• Intuitively, each kernel function is responsible for local approximation in the vicinity of each training input point.
  (Figure: training input points $x_i$ and $x_j$.)
• Therefore, we consider the problem of approximating a function locally by a single kernel function.

9 Set of Local Functions and Function Space
• $\psi(x)$: A local function centered at $x'$
• $\Psi$: Set of all local functions
• $H$: A functional Hilbert space which contains $\Psi$ (i.e., space of local functions)
• Suppose $\psi(x)$ is a probabilistic function.
  (Figure: local functions $\psi(x)$ centered at $x'$, drawn as elements of $H$.)

10 Optimal Approximation to Set of Local Functions
• We are looking for the optimal approximation to the set $\Psi$ of local functions $\psi(x)$.
• Since we are interested in optimizing the family of functions, scaling is not important.
• We search the optimal direction $\phi_{opt}$ in $H$:
  $\phi_{opt} = \arg\min_{\phi \in H} E_\psi \|\psi - \psi_\phi\|^2$
  - $E_\psi$: Expectation over $\psi$
  - $\psi_\phi$: Projection of $\psi$ onto $\phi$
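The criterion above can be unpacked in one step. The following short derivation assumes a real Hilbert space $H$ and that "projection onto $\phi$" means the orthogonal projection onto the one-dimensional subspace spanned by $\phi$; both are assumptions made here for illustration.

```latex
\psi_\phi = \frac{\langle \psi, \phi \rangle}{\|\phi\|^2}\,\phi,
\qquad
\|\psi - \psi_\phi\|^2 = \|\psi\|^2 - \frac{\langle \psi, \phi \rangle^2}{\|\phi\|^2},
\qquad
\arg\min_{\phi \in H} E_\psi \|\psi - \psi_\phi\|^2
= \arg\max_{\phi \in H} \frac{E_\psi \langle \psi, \phi \rangle^2}{\|\phi\|^2}.
```

The middle identity is the Pythagorean theorem, and the last step uses the fact that $E_\psi \|\psi\|^2$ does not depend on $\phi$. The objective is invariant to rescaling $\phi$, which is why only the direction matters.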

11 Karhunen-Loève Expansion
  $\phi_{opt} = \arg\min_{\phi \in H} E_\psi \|\psi - \psi_\phi\|^2$
• $R$: Correlation operator of local functions
  $R\varphi = E_\psi[\langle \varphi, \psi \rangle \psi]$
  - $\langle \cdot, \cdot \rangle$: Inner product in $H$
  - If $\psi$ is a vector, $R = E_\psi[\psi \psi^\top]$.
• Optimal direction $\phi_{opt}$ is given by the eigenfunction $\phi_{max}$ associated with the largest eigenvalue $\lambda_{max}$ of $R$:
  $R\phi_{max} = \lambda_{max}\phi_{max}$
• Similar to PCA, but $E_\psi[\psi] \neq 0$.
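A finite-dimensional sketch of this step, assuming each local function is represented as a vector of samples. The helper name `principal_direction` and the toy data are hypothetical. Unlike standard PCA, the data are not centered before forming the correlation matrix.

```python
import numpy as np

def principal_direction(samples):
    """Largest eigenpair of the uncentered correlation matrix R = E[psi psi^T].

    samples: array of shape (n_samples, dim), one local function per row.
    """
    R = samples.T @ samples / samples.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    return eigvals[-1], eigvecs[:, -1]

# Toy data (assumed): vectors scattered along one direction u with a
# clearly non-zero mean, plus a little isotropic noise.
rng = np.random.default_rng(0)
u = np.array([3.0, 4.0]) / 5.0
coeffs = rng.uniform(1.0, 2.0, size=500)
samples = coeffs[:, None] * u[None, :] + 0.01 * rng.standard_normal((500, 2))

lam_max, phi_max = principal_direction(samples)
alignment = abs(phi_max @ u)  # close to 1 when phi_max is parallel to u
```

The sign of an eigenvector is arbitrary, so the sketch compares directions through the absolute inner product.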

12 Principal Component Kernel
• Using $\phi_{opt}$, we define the kernel function by
  $K(x, x') = \phi_{opt}\left(\frac{x - x'}{c}\right)$
  - $x'$: Center
  - $c$: Width
• Since the above kernel consists of the principal component of the correlation operator, we call it the principal component (PC) kernel.

13 Example of Kernel Design: Binary Regression Problem
• Learning target function is binary.
  (Figure: a target function $f(x)$ taking the values 0 and 1.)
• The set of local functions is a set of rectangular functions with different widths.
  (Figure: a rectangular local function $\psi(x)$ taking the values 0 and 1 around $x_i$.)

14 Widths of Rectangular Functions
• We assume that the width of rectangular functions is bounded (and normalized).
• Since we do not have prior knowledge on the width, we should define its distribution in an "unbiased" manner.
• We use uniform distribution for the width since it is non-informative:
  $\theta_l, \theta_r \sim U(0, 1)$
  (Figure: a rectangular function of height 1 with left width $\theta_l$ and right width $\theta_r$.)
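A Monte Carlo sketch of the correlation function implied by this width distribution, restricted to the right half $x \in [0, 1]$ of a rectangular function centered at 0. That restriction, the grid, and the sample size are assumptions made here. On this half, $\psi(x)\psi(y) = 1$ exactly when $\theta_r \ge \max(x, y)$, so the expectation should be $1 - \max(x, y)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_grid = 20000, 50

# Midpoint grid on [0, 1] and right-hand widths theta_r ~ U(0, 1).
x = (np.arange(n_grid) + 0.5) / n_grid
theta_r = rng.uniform(0.0, 1.0, size=n_samples)

# Each row is one sampled rectangular function evaluated on the grid.
psi = (x[None, :] <= theta_r[:, None]).astype(float)

# Empirical correlation E[psi(x) psi(y)] versus the closed form 1 - max(x, y).
r_empirical = psi.T @ psi / n_samples
r_closed_form = 1.0 - np.maximum(x[:, None], x[None, :])
max_error = np.abs(r_empirical - r_closed_form).max()
```

The maximum deviation shrinks roughly like one over the square root of the number of samples.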

15 Eigenvalue Problem
• We use $L_2$-space as a function space $H$.
• Considering the symmetry, the eigenvalue problem $R\phi = \lambda\phi$ is expressed as
  $\int_0^1 r(x, y)\,\phi(y)\,dy = \lambda\phi(x)$, where $r(x, y) = 1 - \max(x, y)$
• The principal component is given by
  $\phi_{max}(x) = \sqrt{2}\cos\left(\frac{\pi}{2}x\right)$
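A numerical check of this eigenvalue problem, sketched with a midpoint-rule discretization of the integral operator (the grid size 400 is an arbitrary choice). Differentiating the integral equation twice gives $\lambda\phi'' = -\phi$ with $\phi'(0) = 0$ and $\phi(1) = 0$, so the largest eigenvalue should be $4/\pi^2$.

```python
import numpy as np

n = 400
x = (np.arange(n) + 0.5) / n                      # midpoints of [0, 1]
r = 1.0 - np.maximum(x[:, None], x[None, :])      # r(x, y) = 1 - max(x, y)

# Midpoint rule: the integral operator becomes the matrix r / n.
eigvals, eigvecs = np.linalg.eigh(r / n)
lam_max = eigvals[-1]

# Rescale to unit L2 norm on [0, 1] and fix the arbitrary sign.
phi_max = eigvecs[:, -1] * np.sqrt(n)
if phi_max[0] < 0:
    phi_max = -phi_max

phi_closed_form = np.sqrt(2.0) * np.cos(0.5 * np.pi * x)
max_error = np.abs(phi_max - phi_closed_form).max()
```

The factor $\sqrt{2}$ only normalizes $\phi_{max}$ to unit $L_2$ norm on $[0, 1]$; it plays no role in the kernel shape.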

16 PC Kernel for Binary Regression
  $K(x, x') = \begin{cases} \cos\left(\frac{\pi}{2}\,\frac{\|x - x'\|}{c}\right) & \text{if } \frac{\|x - x'\|}{c} \le 1 \\ 0 & \text{otherwise} \end{cases}$
  - $x'$: Center
  - $c$: Width
  (Figure: the PC kernel plotted for $x' = 0$, $c = 1$.)
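A direct transcription of this kernel as a sketch. The function name `pc_kernel`, the row-per-point input layout, and the use of the Euclidean norm for multi-dimensional inputs are assumptions made here.

```python
import numpy as np

def pc_kernel(x, center, width=1.0):
    """PC kernel for binary regression.

    x: array of shape (n, d), one input point per row.
    center: array of shape (d,), the kernel center x'.
    Returns cos(pi/2 * r) where r = ||x - center|| / width <= 1, else 0.
    """
    x = np.asarray(x, dtype=float)
    center = np.asarray(center, dtype=float)
    r = np.linalg.norm(x - center, axis=1) / width
    return np.where(r <= 1.0, np.cos(0.5 * np.pi * r), 0.0)

# Evaluate the kernel centered at 0 with width 1 at a few 1-D points.
points = np.array([[0.0], [0.5], [1.0], [2.0]])
values = pc_kernel(points, center=np.array([0.0]), width=1.0)
```

The kernel has compact support: it is exactly zero beyond distance `width` from the center and reaches zero continuously at the boundary.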

17 Implication of The Result
• Binary classification is often solved as binary regression with squared loss (e.g., regularization networks, least-squares SVMs).
• Although a binary function is not smooth at all, the smooth Gaussian kernel often works very well in practice.
• Why?

18 Implication of The Result (cont.)
• By proper scaling, it can be confirmed that the shape of the obtained PC kernel is similar to Gaussian kernel.
• Both kernels work similarly in experiments.

  Dataset     PC kernel     Gauss kernel
  Banana      10.8 ± 0.6    11.4 ± 0.9
  B.Cancer    27.1 ± 4.6    27.1 ± 4.9
  Diabetes    23.2 ± 1.8    23.3 ± 1.7
  F.Solar     33.6 ± 1.6    33.5 ± 1.6
  Heart       16.1 ± 3.3    16.2 ± 3.4
  Ringnorm     2.9 ± 0.3     6.7 ± 0.9
  Thyroid      6.4 ± 3.0     6.1 ± 2.9
  Titanic     22.7 ± 1.4    22.7 ± 1.0
  Twonorm      2.6 ± 0.2     3.0 ± 0.2
  Waveform    10.1 ± 0.7    10.0 ± 0.5
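One way to make the shape comparison concrete is sketched below. It assumes that "proper scaling" can be approximated by a least-squares fit of the Gaussian width to the PC kernel with $x' = 0$, $c = 1$; the talk does not say how the scaling was done, so the fitting procedure, grid, and search range are illustrative only.

```python
import numpy as np

# PC kernel with center 0 and width 1, evaluated slightly beyond its support.
x = np.linspace(-1.5, 1.5, 601)
pc = np.where(np.abs(x) <= 1.0, np.cos(0.5 * np.pi * x), 0.0)

# Grid search over Gaussian widths, minimizing the mean squared difference.
widths = np.linspace(0.2, 1.0, 161)
errors = [np.mean((np.exp(-x**2 / (2.0 * w**2)) - pc) ** 2) for w in widths]
best_width = widths[int(np.argmin(errors))]

# Largest pointwise gap between the fitted Gaussian and the PC kernel.
gauss = np.exp(-x**2 / (2.0 * best_width**2))
max_gap = np.abs(gauss - pc).max()
```

The two curves agree closely near the center; the largest gap sits near the edge of the PC kernel's support, where the cosine reaches zero while the Gaussian keeps a tail.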

19 Implication of The Result (cont.)
• This implies that a Gaussian-like bell-shaped function approximates binary functions very well.
• This partially explains why the smooth Gaussian kernel is suitable for non-smooth classification tasks.

20 Conclusions
• Optimizing the family of kernel functions is a difficult task because it has infinitely many degrees of freedom.
• We proposed a method for designing kernel functions in regression scenarios.
• The optimal kernel shape is given by the principal component of the correlation operator of local functions.
• We can beneficially use prior knowledge on the problem domain (e.g., binary).
