SLIDE 1 Kernel methods for Network Analysis: An introduction
Chiranjib Bhattacharyya
Machine Learning lab Dept of CSA, IISc chiru@csa.iisc.ernet.in http://drona.csa.iisc.ernet.in/~chiru
13th Jan, 2013
SLIDE 2
Computational Biology: Which super-family does this protein structure belong to?
SLIDE 3
Multimedia Who are the actors?
SLIDE 4
Social Networks: How can one run a successful ad campaign on this network?
SLIDE 5
Data Representation as a vector
SLIDE 6
Data Representation as a vector
SLIDE 7
Data Representation as a vector
SLIDE 8
Data Representation as a vector
SLIDE 9
Data Representation as a vector (feature map)
SLIDE 10
When we have feature maps: Linear Classifiers, Principal Component Analysis
SLIDE 11
Similarity may be readily available
Problem: feature maps are not readily available
SLIDE 12
Kernel functions: a formal notion of similarity. Kernel functions are essentially similarity functions. One can easily generalize many existing algorithms using kernel functions; this is sometimes called the kernel trick. Kernels can also help integrate different sources of data.
SLIDE 13 Agenda
1 Kernel Trick
SVMs and Non-linear Classification Principal Component Analysis What can we compute with the dot product in feature spaces?
2 Mathematical Foundations
RKHS, Representer theorem
3 Kernels on Graphs aka Networks
Kernels on vertices of a Graph Kernels on graphs
4 Advanced Topics: Multiple Kernel Learning
SLIDE 14 1 Kernel Trick
SVMs and Non-linear Classification Principal Component Analysis What can we compute with the dot product in feature spaces?
2 Mathematical Foundations
RKHS, Representer theorem
3 Kernels on Graphs aka Networks
Kernels on vertices of a Graph Kernels on graphs
4 Advanced Topics: Multiple Kernel Learning
SLIDE 15
PART 1: KERNEL TRICK
SLIDE 16 1 Kernel Trick
SVMs and Non-linear Classification Principal Component Analysis What can we compute with the dot product in feature spaces?
2 Mathematical Foundations
RKHS, Representer theorem
3 Kernels on Graphs aka Networks
Kernels on vertices of a Graph Kernels on graphs
4 Advanced Topics: Multiple Kernel Learning
SLIDE 17 The problem of classification
Given: training data D = {(x_i, y_i) | i = 1,...,m}, with class labels y_i ∈ {−1, 1}
Find: a classifier f : X → {−1, 1}, f(x) = sign(w⊤x + b)
SLIDE 18 Regularized risk
min_{w,b} C ∑_{i=1}^m max(1 − y_i(w⊤x_i + b), 0) + (1/2)‖w‖²
The (1/2)‖w‖² term is the regularization.
SLIDE 19 Regularized risk
min_{w,b} C ∑_{i=1}^m max(1 − y_i(w⊤x_i + b), 0) + (1/2)‖w‖²   ((1/2)‖w‖² is the regularization term)
The SVM formulation:
min_{w,b,ξ} (1/2)‖w‖² + C ∑_{i=1}^m ξ_i
subject to y_i(w⊤x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0 ∀ i ∈ [m]
SLIDE 20 SVM formulation
maximize_α ∑_{i=1}^m α_i − (1/2) ∑_{ij} α_i α_j y_i y_j x_i⊤x_j
subject to 0 ≤ α_i, ∑_{i=1}^m α_i y_i = 0
SLIDE 21 SVM formulation
maximize_α ∑_{i=1}^m α_i − (1/2) ∑_{ij} α_i α_j y_i y_j x_i⊤x_j
subject to 0 ≤ α_i, ∑_{i=1}^m α_i y_i = 0
w = ∑_{i=1}^m α_i y_i x_i
f(x) = sign(∑_{i=1}^m α_i y_i x_i⊤x + b)
SLIDE 22 C-SVM in feature spaces
Let us work with a feature map Φ(x).
maximize_α −(1/2) ∑_{ij} α_i α_j y_i y_j Φ(x_i)⊤Φ(x_j) + ∑_{i=1}^m α_i
subject to 0 ≤ α_i, ∑_i α_i y_i = 0
f(x) = sign(∑_{i=1}^m α_i y_i Φ(x_i)⊤Φ(x) + b)
Let the dot product between any pair of examples computed in the feature space be denoted by K(x,z) = Φ(x)⊤Φ(z).
SLIDE 23 C-SVM in feature spaces
Let us work with a feature map Φ(x).
maximize_α −(1/2) ∑_{ij} α_i α_j y_i y_j K(x_i, x_j) + ∑_{i=1}^m α_i
subject to 0 ≤ α_i, ∑_i α_i y_i = 0
f(x) = sign(∑_{i=1}^m α_i y_i K(x_i, x) + b)
Let the dot product between any pair of examples computed in the feature space be denoted by K(x,z) = Φ(x)⊤Φ(z).
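Below is a minimal sketch of this kernelized C-SVM in code, using scikit-learn's SVC with a precomputed Gram matrix; the toy dataset and the Gaussian kernel are illustrative assumptions, not part of the slides.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Toy data (assumption: any labelled sample works here).
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_tr, y_tr, X_te, y_te = X[:150], y[:150], X[150:], y[150:]

def rbf_kernel(A, B, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2), computed for all pairs."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# The solver only ever sees K(x_i, x_j), never a feature map Phi.
K_tr = rbf_kernel(X_tr, X_tr)
clf = SVC(C=1.0, kernel="precomputed").fit(K_tr, y_tr)

# Prediction also needs only kernel evaluations K(x, x_i).
K_te = rbf_kernel(X_te, X_tr)
print("test accuracy:", clf.score(K_te, y_te))
```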
SLIDE 24 1 Kernel Trick
SVMs and Non-linear Classification Principal Component Analysis What can we compute with the dot product in feature spaces?
2 Mathematical Foundations
RKHS, Representer theorem
3 Kernels on Graphs aka Networks
Kernels on vertices of a Graph Kernels on graphs
4 Advanced Topics: Multiple Kernel Learning
SLIDE 25 Principal Component Analysis(PCA)
Principal directions: Given X = [x_1,...,x_m], find directions of maximum variance (Jolliffe 2002). The direction of maximum variance, v, is given by (1/m) XX⊤v = λv (assuming that Xe = 0). Define v = Xα; then (1/m) XX⊤Xα = λXα, leading to the eigenvalue problem (1/m) Kα = λα, where (K)_{ij} = (X⊤X)_{ij} = x_i⊤x_j.
SLIDE 26 Nonlinear component analysis(Scholkopf et al. 1996)
Compute PCA in feature spaces: replace x_i⊤x_j by Φ(x_i)⊤Φ(x_j).
Principal component of x: in the input space, v⊤x; in the feature space, ∑_{i=1}^m α_i K(x_i, x).
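A small numpy sketch of the kernel PCA computation above; centring of the Gram matrix (the feature-space analogue of the Xe = 0 assumption) is done explicitly, and the Gaussian kernel on random data is only an illustrative choice.

```python
import numpy as np

def kernel_pca(K, n_components=2):
    """Project onto principal directions in feature space using only the Gram matrix K."""
    m = K.shape[0]
    # Centre the kernel matrix: equivalent to subtracting the feature-space mean.
    one = np.ones((m, m)) / m
    Kc = K - one @ K - K @ one + one @ K @ one
    # (1/m) K alpha = lambda alpha  <=>  eigenvectors of the centred Gram matrix.
    eigval, eigvec = np.linalg.eigh(Kc)
    idx = np.argsort(eigval)[::-1][:n_components]
    eigval, eigvec = eigval[idx], eigvec[:, idx]
    # Rescale alpha so that the implicit direction v = sum_i alpha_i Phi(x_i) has unit norm.
    alpha = eigvec / np.sqrt(np.maximum(eigval, 1e-12))
    # Principal components of the training points: sum_i alpha_i K(x_i, x).
    return Kc @ alpha

# Usage with a Gaussian kernel on random data (illustrative):
X = np.random.RandomState(0).randn(50, 5)
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
Z = kernel_pca(np.exp(-0.5 * sq), n_components=2)
```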
SLIDE 27 We just need the dot product
Let x ∈ IR² and Φ(x) = [x_1², x_2², √2 x_1x_2]⊤. Then K(x,z) = Φ(x)⊤Φ(z) = x_1²z_1² + 2x_1x_2z_1z_2 + x_2²z_2² = (x⊤z)².
More generally, K(x,z) = (x⊤z)^r is a dot product in a feature space of dimension C(d+r−1, r) for x,z ∈ IR^d. If d = 256 and r = 4, the feature space has size 635,376. However, if we know K, one can still solve the SVM formulation without explicitly evaluating Φ.
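A quick numeric check of the d = 2, r = 2 example above, assuming nothing beyond the explicit map Φ(x) = [x_1², x_2², √2 x_1x_2]⊤.

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map from the slide.
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert np.isclose(phi(x) @ phi(z), (x @ z) ** 2)  # K(x,z) = (x^T z)^2
```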
SLIDE 28 1 Kernel Trick
SVMs and Non-linear Classification Principal Component Analysis What can we compute with the dot product in feature spaces?
2 Mathematical Foundations
RKHS, Representer theorem
3 Kernels on Graphs aka Networks
Kernels on vertices of a Graph Kernels on graphs
4 Advanced Topics: Multiple Kernel Learning
SLIDE 29 Norms, Distances
Norm: ‖Φ(x)‖ = √K(x,x)
Normalized features: Φ̂(x) = Φ(x)/‖Φ(x)‖, so K̂(x,z) = Φ̂(x)⊤Φ̂(z) = K(x,z)/√(K(x,x)K(z,z))
Distances: ‖Φ(x) − Φ(z)‖² = (Φ(x) − Φ(z))⊤(Φ(x) − Φ(z)) = K(x,x) + K(z,z) − 2K(x,z). If Φ is normalized (K(x,x) = 1), then ‖Φ(x) − Φ(z)‖² = 2 − 2K(x,z).
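The distance identity above translates directly into code; this sketch assumes only that some kernel function k is available.

```python
import numpy as np

def kernel_distance(k, x, z):
    """||Phi(x) - Phi(z)|| computed without ever forming Phi."""
    return np.sqrt(max(k(x, x) + k(z, z) - 2.0 * k(x, z), 0.0))

# With a normalised kernel (k(x,x) = 1) this reduces to sqrt(2 - 2 k(x,z)),
# e.g. for the Gaussian kernel:
rbf = lambda x, z: float(np.exp(-0.5 * np.sum((x - z) ** 2)))
x, z = np.array([0.0, 1.0]), np.array([2.0, 0.5])
print(kernel_distance(rbf, x, z), np.sqrt(2 - 2 * rbf(x, z)))
```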
SLIDE 30
In the sequel
Will formalize these notions conditions on K will be discussed K for graphs
SLIDE 31 1 Kernel Trick
SVMs and Non-linear Classification Principal Component Analysis What can we compute with the dot product in feature spaces?
2 Mathematical Foundations
RKHS, Representer theorem
3 Kernels on Graphs aka Networks
Kernels on vertices of a Graph Kernels on graphs
4 Advanced Topics: Multiple Kernel Learning
SLIDE 32
Definition of Kernel functions
SLIDE 33
Kernel function
A function K : X × X → IR is a kernel function if K(x,z) = K(z,x) (symmetric) and K is positive semidefinite, i.e. ∀n and x_1,...,x_n ∈ X, the matrix K_{ij} = K(x_i, x_j) is psd. Recall that a matrix K ∈ IR^{d×d} is psd if u⊤Ku ≥ 0 for all u ∈ IR^d.
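The definition can be checked numerically on any finite sample: build the Gram matrix and verify symmetry and non-negative eigenvalues. The Gaussian kernel on random points below is an illustrative choice.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(30, 4)                          # arbitrary x_1, ..., x_n
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K = np.exp(-0.1 * sq)                         # K_ij = K(x_i, x_j)

assert np.allclose(K, K.T)                    # symmetry
assert np.linalg.eigvalsh(K).min() > -1e-10   # positive semidefinite (up to round-off)
```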
SLIDE 34 Examples of Kernel functions
K(x,z) = φ(x)⊤φ(z), where φ : X → IR^d.
K is symmetric, i.e. K(x,z) = K(z,x).
Positive semidefinite: let D = {x_1, x_2,...,x_n} be a set of n arbitrarily chosen elements of X and define K_{ij} = φ(x_i)⊤φ(x_j). For any u ∈ IR^n it is straightforward to see that u⊤Ku = ‖∑_{i=1}^n u_i φ(x_i)‖₂² ≥ 0.
SLIDE 35 Examples of Kernel functions
K(x,z) = x⊤z, with Φ(x) = x.
K(x,z) = (x⊤z)^r, with Φ_{t_1 t_2 ... t_d}(x) = √(r!/(t_1! t_2! ... t_d!)) x_1^{t_1} x_2^{t_2} ... x_d^{t_d}, where ∑_{i=1}^d t_i = r.
K(x,z) = e^{−γ‖x−z‖²}.
SLIDE 36 Kernel Construction
Let K_1 and K_2 be two valid kernels. The following are also valid kernels:
K(x,y) = φ(x)⊤φ(y) for any feature map φ
K(u,v) = K_1(u,v) K_2(u,v)
K = αK_1 + βK_2, with α,β ≥ 0
K̂(x,y) = K(x,y)/√(K(x,x)K(y,y))
SLIDE 37 Kernel Construction
Let K_1 and K_2 be two valid kernels. The following are also valid kernels:
K(x,y) = φ(x)⊤φ(y) for any feature map φ
K(u,v) = K_1(u,v) K_2(u,v)
K = αK_1 + βK_2, with α,β ≥ 0
K̂(x,y) = K(x,y)/√(K(x,x)K(y,y))
Example: K(x,y) = x⊤y is a kernel, hence so is K(x,y) = (x⊤y)^i, and so is K(x,y) = lim_{N→∞} ∑_{i=0}^N (x⊤y)^i / i! = e^{x⊤y}. Normalizing gives K̂(x,y) = e^{−(1/2)‖x−y‖²}.
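A quick numeric confirmation of the construction above: normalizing K(x,y) = e^{x⊤y} yields exactly e^{−(1/2)‖x−y‖²}.

```python
import numpy as np

x, y = np.array([0.3, -1.2, 0.7]), np.array([1.0, 0.4, -0.5])
K = lambda u, v: np.exp(u @ v)
K_hat = K(x, y) / np.sqrt(K(x, x) * K(y, y))        # normalised kernel
assert np.isclose(K_hat, np.exp(-0.5 * np.sum((x - y) ** 2)))
```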
SLIDE 38 Kernel function and feature map
A theorem due to Mercer guarantees a feature map for symmetric, psd kernel functions. Loosely stated: for a symmetric kernel K : X × X → IR, there exists an expansion K(x,z) = Φ(x)⊤Φ(z) iff K is positive semidefinite.
SLIDE 39 What is a Dot product(aka Inner Product)
Let X be a vector space. A dot product ⟨·,·⟩ on X satisfies:
Symmetry: ⟨u,v⟩ = ⟨v,u⟩ for u,v ∈ X
Bilinearity: ⟨αu + βv, w⟩ = α⟨u,w⟩ + β⟨v,w⟩ for u,v,w ∈ X
Positive semidefiniteness: ⟨u,u⟩ ≥ 0 for u ∈ X, and ⟨u,u⟩ = 0 iff u = 0
The induced norm is ‖x‖ = √⟨x,x⟩, so ‖x‖ = 0 ⟹ x = 0.
SLIDE 40 Examples of Dot products
X = IR^n, ⟨u,v⟩ = u⊤v
X = IR^n, ⟨u,v⟩ = ∑_{i=1}^n λ_i u_i v_i with λ_i ≥ 0
X = L₂(X) = {f : ∫ f(x)² dx < ∞}, with ⟨f,g⟩ = ∫ f(x) g(x) dx for f,g ∈ X
SLIDE 41 Cauchy–Schwarz inequality
Cauchy–Schwarz inequality: Let X be an inner product space. Then |⟨x,y⟩| ≤ ‖x‖‖y‖ for all x,y ∈ X, and equality holds iff x = αy for some scalar α.
Proof: For all α ∈ IR, ‖x − αy‖² ≥ 0, i.e. ‖x‖² − 2α⟨x,y⟩ + α²‖y‖² ≥ 0. Taking α = ⟨x,y⟩/‖y‖², the inequality follows by taking square roots. The claim about equality follows from the definition of the norm.
SLIDE 42
Hilbert Space: Basic facts. Definition: An inner product space (H, ⟨·,·⟩_H) is a Hilbert space if it is separable and complete. Denote the norm by ‖·‖_H.
SLIDE 43 Projections in Hilbert space
The orthogonal complement of M ⊂ H is defined as M⊥ = {z | ⟨x,z⟩_H = 0 ∀x ∈ M}.
Hilbert space projection theorem: Let M be a subspace of the Hilbert space (H, ⟨·,·⟩_H). For every x ∈ H the following hold:
There exists a unique Π_M(x) ∈ M such that Π_M(x) = argmin_{z∈M} ‖x − z‖_H.
x − Π_M(x) ∈ M⊥, i.e. ⟨z, x − Π_M(x)⟩_H = 0 ∀z ∈ M.
‖x‖²_H = ‖Π_M(x)‖²_H + ‖y‖²_H, where x = Π_M(x) + y and y ∈ M⊥.
SLIDE 44 1 Kernel Trick
SVMs and Non-linear Classification Principal Component Analysis What can we compute with the dot product in feature spaces?
2 Mathematical Foundations
RKHS, Representer theorem
3 Kernels on Graphs aka Networks
Kernels on vertices of a Graph Kernels on graphs
4 Advanced Topics: Multiple Kernel Learning
SLIDE 45 Reproducing kernel Hilbert Space(RKHS)
Let K be any kernel function. Consider the following set:
H = {f | f(·) = ∑_{i=1}^m α_i K(·, x_i), x_i ∈ X, m ∈ N}
Reproducing property: ∀f ∈ H,
f(x) = ∑_{i=1}^m α_i K(x, x_i) = ⟨∑_{i=1}^m α_i K(·, x_i), K(·, x)⟩ = ⟨f(·), K(·, x)⟩
SLIDE 46 Dot product in RKHS
Dot product: For f,g ∈ H with f(·) = ∑_{i=1}^{m_1} α_i K(·, x_i) and g(·) = ∑_{j=1}^{m_2} β_j K(·, x_j), define
⟨f,g⟩_H = ∑_{i=1}^{m_1} ∑_{j=1}^{m_2} α_i β_j K(x_i, x_j)
As K is symmetric, ⟨f,g⟩_H = ⟨g,f⟩_H.
⟨f,f⟩_H = ∑_{i=1}^m ∑_{j=1}^m α_i α_j K(x_i, x_j). Recall that the matrix K is psd if K is a kernel function, and so ⟨f,f⟩_H ≥ 0.
The Cauchy–Schwarz inequality holds, so |f(x)| = |⟨f, K(·,x)⟩_H| ≤ ‖f‖_H √K(x,x).
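For functions of the form f = ∑_i α_i K(·, x_i), the RKHS quantities above reduce to matrix expressions; a small sketch (the Gaussian kernel and random centres are illustrative assumptions).

```python
import numpy as np

rng = np.random.RandomState(1)
X = rng.randn(20, 3)                                  # centres x_1,...,x_m
alpha = rng.randn(20)                                 # coefficients of f
k = lambda a, b: np.exp(-0.5 * np.sum((a - b) ** 2))  # kernel K(.,.)
K = np.array([[k(a, b) for b in X] for a in X])

# ||f||_H^2 = sum_ij alpha_i alpha_j K(x_i, x_j) = alpha^T K alpha >= 0
norm_sq = alpha @ K @ alpha
# Reproducing property: f(x) = <f, K(., x)>_H = sum_i alpha_i K(x_i, x)
x = rng.randn(3)
f_x = alpha @ np.array([k(xi, x) for xi in X])
# Cauchy-Schwarz bound from the slide: |f(x)| <= ||f||_H * sqrt(K(x, x))
assert abs(f_x) <= np.sqrt(norm_sq) * np.sqrt(k(x, x)) + 1e-12
```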
SLIDE 47 Representer theorem
Representer theorem: Let K be a valid kernel defined on X and let H be the corresponding RKHS. Let Ω be an increasing function. The optimization problem
min_{g∈H} G(g) = ∑_{i=1}^m l(g(x_i), y_i) + Ω(‖g‖²_H)
is solved by some g* = ∑_{i=1}^m α_i K(·, x_i).
SLIDE 48 Representer theorem
Representer theorem: Let K be a valid kernel defined on X and let H be the corresponding RKHS. Let Ω be an increasing function. The optimization problem
min_{g∈H} G(g) = ∑_{i=1}^m l(g(x_i), y_i) + Ω(‖g‖²_H)
is solved by some g* = ∑_{i=1}^m α_i K(·, x_i).
Proof: Let M = {∑_{i=1}^m α_i K(·, x_i)}, the span of the functions K(·, x_i), i = 1,...,m. Clearly M is a subspace of H. Take any g ∈ H and write g = g_M + g_per, with g_M ∈ M and g_per ∈ M⊥. Then
g(x_i) = ⟨g, K(·, x_i)⟩ = ⟨g_M + g_per, K(·, x_i)⟩ = ⟨g_M, K(·, x_i)⟩ + ⟨g_per, K(·, x_i)⟩ = ⟨g_M, K(·, x_i)⟩ = g_M(x_i)
As Ω is an increasing function, Ω(‖g‖²_H) ≥ Ω(‖g_M‖²_H).
SLIDE 49 Back to C-SVM formulation
Given a kernel function K defined on X, one can create the RKHS
H = {∑_{i=1}^n β_i K(·, z_i) | z_i ∈ X, n ∈ N}
Classifier: f(x) = sign(g(x) + b), with
min_{g∈H, b∈IR} ∑_{i=1}^m max(0, 1 − y_i(g(x_i) + b)) + ‖g‖²_H
At optimality, g(·) = ∑_{i=1}^m γ_i K(·, x_i) (Representer theorem).
SLIDE 50 1 Kernel Trick
SVMs and Non-linear Classification Principal Component Analysis What can we compute with the dot product in feature spaces?
2 Mathematical Foundations
RKHS, Representer theorem
3 Kernels on Graphs aka Networks
Kernels on vertices of a Graph Kernels on graphs
4 Advanced Topics: Multiple Kernel Learning
SLIDE 51
Many Applications
Graphs are ideal to model molecules, protein-protein interaction networks, metabolic networks, and social networks.
SLIDE 52
Graph Kernels
Kernels on vertices of a graph G = (V,E): compute K(v_i, v_j), where v_i, v_j ∈ V.
Kernels on graphs: compute K(G_1, G_2), where G_1, G_2 are two graphs.
SLIDE 53 1 Kernel Trick
SVMs and Non-linear Classification Principal Component Analysis What can we compute with the dot product in feature spaces?
2 Mathematical Foundations
RKHS, Representer theorem
3 Kernels on Graphs aka Networks
Kernels on vertices of a Graph Kernels on graphs
4 Advanced Topics: Multiple Kernel Learning
SLIDE 54 Diffusion Kernels(Kondor and Lafferty 2002)
Let X = {1,...,m} and let there be some associated edges between the elements; let the adjacency matrix of the resulting graph be A.
Diffusion kernel: K = lim_{s→∞} (I + (β/s) H)^s, where H = A − D and D is diagonal with d_ii = ∑_j a_ij.
K is positive definite and symmetric. Computation is O(m³).
SLIDE 55 Diffusion Kernels(Kondor and Lafferty 2002)
Let X = {1,...,m} and let there be some associated edges between the elements; let the adjacency matrix of the resulting graph be A.
Diffusion kernel: K = lim_{s→∞} (I + (β/s) H)^s, where H = A − D and D is diagonal with d_ii = ∑_j a_ij.
K is positive definite and symmetric. Computation is O(m³).
Since lim_{s→∞} (1 + (β/s) x)^s = e^{βx}, we have K = e^{βH} = ∑_{i=1}^m v_i e^{βλ_i} v_i⊤, where (λ_i, v_i) are the (eigenvalue, eigenvector) pairs of H.
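A sketch of the diffusion-kernel computation above via the matrix exponential; scipy's expm and the small random graph are illustrative choices.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.RandomState(0)
A = (rng.rand(8, 8) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T                 # symmetric adjacency, no self-loops
H = A - np.diag(A.sum(axis=1))                 # H = A - D

beta = 0.5
K = expm(beta * H)                             # K = e^{beta H}

# Equivalent eigendecomposition route from the slide: K = sum_i e^{beta lambda_i} v_i v_i^T
lam, V = np.linalg.eigh(H)
K_eig = (V * np.exp(beta * lam)) @ V.T
assert np.allclose(K, K_eig)
assert np.linalg.eigvalsh(K).min() > 0         # K is positive definite
```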
SLIDE 56 Diffusion Kernels(Kondor and Lafferty 2002)
Sometimes K can be computed in closed form for special graphs, e.g. for the complete graph on m vertices:
K(i,j) = (1 + (m−1)e^{−mβ})/m if i = j, and K(i,j) = (1 − e^{−mβ})/m if i ≠ j.
It has a very interesting analogue with the diffusion equation in physics.
SLIDE 57 1 Kernel Trick
SVMs and Non-linear Classification Principal Component Analysis What can we compute with the dot product in feature spaces?
2 Mathematical Foundations
RKHS, Representer theorem
3 Kernels on Graphs aka Networks
Kernels on vertices of a Graph Kernels on graphs
4 Advanced Topics: Multiple Kernel Learning
SLIDE 58
Kernels on graphs
Graph isomorphism: find a mapping g from the vertices of G_1 = (V_1,E_1) to the vertices of G_2 = (V_2,E_2) such that G_1 and G_2 are identical: if (u,v) ∈ E_1 iff (g(u),g(v)) ∈ E_2, then g is an isomorphism.
Subgraph isomorphism: is there a subgraph S of G_1 and a subgraph T of G_2 such that S and T are isomorphic? This is NP-hard, so we need computationally efficient approximations.
SLIDE 59
Desiderata for a kernel function
Computationally efficient; positive definite; can relate graph structures; applicable to a wide variety of graphs.
SLIDE 60 Some Definitions
Let A be an m×n matrix and B a p×q matrix. The Kronecker product A ⊗ B is the mp×nq block matrix
A ⊗ B = [a_11 B ··· a_1n B; ... ; a_m1 B ··· a_mn B]
SLIDE 61 Definitions: Product graph
Let G_1 = (V_1,E_1) and G_2 = (V_2,E_2) be two graphs. G = (V,E) is the product graph of G_1 and G_2 if V = V_1 × V_2 and ((i,i′),(j,j′)) ∈ E iff (i,j) ∈ E_1 and (i′,j′) ∈ E_2. Its adjacency matrix is A(G) = A(G_1) ⊗ A(G_2).
SLIDE 62 Random walk kernel between two graphs (Vishwanathan et al. 2010)
Random walk kernel:
K(G_1,G_2) = ∑_{i,j=1}^{|V|} ∑_{t=0}^∞ λ^t [A^t]_{ij} = e⊤(I − λA)^{−1} e
where |V| is the number of vertices of the product graph of G_1 and G_2, A = A(G_1) ⊗ A(G_2), and e is the all-ones vector.
It counts the number of common walks obtained by performing simultaneous random walks on G_1 and G_2.
Computational complexity is O(n⁶), where n = |V(G_1)| = |V(G_2)|.
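A direct sketch of the random walk kernel above: form the product-graph adjacency with a Kronecker product and solve the linear system instead of forming the inverse. The two toy graphs and the decay λ are illustrative; λ must be small enough for the geometric series to converge.

```python
import numpy as np

A1 = np.array([[0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]], dtype=float)   # triangle
A2 = np.array([[0, 1, 0],
               [1, 0, 1],
               [0, 1, 0]], dtype=float)   # path on 3 vertices

def random_walk_kernel(A1, A2, lam=0.05):
    """K(G1, G2) = e^T (I - lam * A_x)^{-1} e, with A_x = A(G1) kron A(G2)."""
    Ax = np.kron(A1, A2)                   # adjacency of the product graph
    n = Ax.shape[0]
    e = np.ones(n)
    # lam must be small enough that sum_t lam^t Ax^t converges.
    return e @ np.linalg.solve(np.eye(n) - lam * Ax, e)

print(random_walk_kernel(A1, A2))
```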
SLIDE 63
Can we compute it more efficiently(Vishwanathan et al. 2010)
Sylvester equation: Given S, T, and M_0, one can solve for M in M = SMT⊤ + M_0 in O(n³) time; this is the key to evaluating the random walk kernel more efficiently than the naive O(n⁶) computation.
SLIDE 64 1 Kernel Trick
SVMs and Non-linear Classification Principal Component Analysis What can we compute with the dot product in feature spaces?
2 Mathematical Foundations
RKHS, Representer theorem
3 Kernels on Graphs aka Networks
Kernels on vertices of a Graph Kernels on graphs
4 Advanced Topics: Multiple Kernel Learning
SLIDE 65
PART 5: Multiple Kernel Learning
SLIDE 66 Recap of SVMs
On a dataset D = {(x_i, y_i) | i = 1,...,m}, SVMs solve the following problem:
ω(K) = max_α ∑_i α_i − (1/2) α⊤YKYα    (1)
subject to 0 ≤ α_i ≤ C, ∑_i α_i y_i = 0    (2)
where K_{ij} = k(x_i, x_j) is the kernel function evaluated on examples x_i and x_j, and Y = diag(y_1,...,y_m).
The final classifier is y = sign(∑_i α_i y_i K(x, x_i) + b).
SLIDE 67
Recap of SVMs
Does not scale well.
The function ω(K) is a pointwise maximum of a set of functions of K and hence is convex.
If the maximization over α is not unique, then ω(K) is not differentiable; ω(K) may not be differentiable, but subgradients exist.
Let us relax the problem a little and say that µ_i ≥ 0.
SLIDE 68
Learning a linear combination of multiple kernels: Let {K_1,...,K_l} be a given library of kernels. Given a training set of m examples, each K_i = K_i⊤ ∈ IR^{m×m}.
MKL (Lanckriet et al. 2004):
min_K ω(K)  subject to  K = ∑_{i=1}^l µ_i K_i,  trace(K) = c,  K ⪰ 0
SLIDE 69 MKL is a Semi-definite Programming problem
min_z c⊤z  s.t.  F(z) = ∑_{i=1}^l z_i F_i ⪰ 0,  Bz = d
where z ∈ IR^l and F_i = F_i⊤ ∈ IR^{m×m}.
The constraint requires F(z) to be positive semidefinite. This is an instance of a convex optimization problem and can be solved by interior point methods.
SLIDE 70 MKL formulation
SDP formulation:
min_{µ,t,λ,ν≥0} t    (3)
s.t.  [ ∑_{i=1}^l µ_i Y K_i Y⊤   e + ν + λy ; (e + ν + λy)⊤   t ] ⪰ 0    (4)
      ∑_{i=1}^l µ_i K_i ⪰ 0    (5)
SLIDE 71 Reformulation of MKL
The SDP problem can be recast as a QCQP:
max_{α,t} α⊤e − ct    (6)
s.t.  α⊤ Y K_i Y α ≤ r_i t,  i = 1,...,l    (7)
      α⊤y = 0,  0 ≤ α ≤ C    (8)
where r_i = trace(K_i).
QCQPs are instances of SOCPs:
min_z c⊤z    (9)
s.t.  ‖A_i z + b_i‖₂ ≤ c_i⊤ z + d_i    (10)
where A_i ∈ IR^{n_i×l}, b_i ∈ IR^{n_i}, c_i, c, z ∈ IR^l, d_i ∈ IR.
SLIDE 72 Equivalence with Block L1 regularization
Bach et al. (2004) showed that the QCQP formulation is equivalent to
min_{w,b,ξ} (1/2) (∑_{i=1}^l d_i ‖w_i‖)² + C ∑_{i=1}^m ξ_i    (11)
s.t.  y_i (∑_j w_j⊤ φ_j(x_i) + b) ≥ 1 − ξ_i  ∀ i ∈ {1,...,m},  ξ_i ≥ 0    (12)
for a proper choice of d_i. The block L1 norm promotes sparsity, i.e. most of the µ_i = 0.
SLIDE 73 Efficient algorithms for MKL
A trick: Let γ ∈ IB_n = {γ ∈ IR^n | γ_i ≥ 0, ∑_{i=1}^n γ_i = 1}. For any a_i ∈ IR, i = 1,...,n,
(∑_{i=1}^n |a_i|)² ≤ ∑_{i=1}^n a_i²/γ_i
This implies that
(∑_{i=1}^n ‖w_i‖)² ≤ ∑_{i=1}^n ‖w_i‖²/γ_i
where γ lies in the probability simplex. This can be helpful in reformulating the L1 formulation.
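A tiny numeric check of the inequality above, including the fact that equality is attained at γ_i ∝ |a_i| (the random a_i are illustrative).

```python
import numpy as np

rng = np.random.RandomState(0)
a = rng.randn(5)
gamma = rng.rand(5); gamma /= gamma.sum()              # arbitrary point of the simplex

lhs = np.sum(np.abs(a)) ** 2
assert lhs <= np.sum(a ** 2 / gamma) + 1e-12           # (sum |a_i|)^2 <= sum a_i^2 / gamma_i

gamma_star = np.abs(a) / np.sum(np.abs(a))             # minimiser: gamma_i proportional to |a_i|
assert np.isclose(lhs, np.sum(a ** 2 / gamma_star))    # equality at the minimiser
```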
SLIDE 74 Solving MKL by reusing SVM solvers
The following problem is equivalent to the block L1 formulation (Rakotomamonjy et al. 2007). Let S_m = {α | 0 ≤ α_i ≤ C, α⊤y = 0} and IB = {µ | µ_i ≥ 0, ∑_{i=1}^l µ_i = 1}:
min_{µ∈IB} J(µ),  where  J(µ) = max_{α∈S_m} ∑_i α_i − (1/2) ∑_{i=1}^l µ_i α⊤ Y K_i Y α
A gradient descent algorithm iterates:
1) Solve the SVM problem with kernel K = ∑_{i=1}^l µ_i K_i.
2) Differentiate J w.r.t. µ and update µ.
See also Sonnenburg et al. 2006.
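A rough sketch of that alternating scheme, reusing scikit-learn's SVC as the inner solver. The step size, the number of iterations, and the crude projection onto the simplex are illustrative assumptions, not details from the cited papers.

```python
import numpy as np
from sklearn.svm import SVC

def simple_mkl(kernels, y, C=1.0, steps=20, lr=0.5):
    """Gradient scheme on mu over the simplex; inner SVM solved with the combined kernel."""
    l, m = len(kernels), len(y)
    mu = np.ones(l) / l
    for _ in range(steps):
        K = sum(m_i * K_i for m_i, K_i in zip(mu, kernels))
        svm = SVC(C=C, kernel="precomputed").fit(K, y)
        # Recover the full vector of y_i * alpha_i from the sparse dual coefficients.
        dual = np.zeros(m)
        dual[svm.support_] = svm.dual_coef_[0]
        # dJ/dmu_i = -1/2 * alpha^T Y K_i Y alpha = -1/2 * dual^T K_i dual
        grad = np.array([-0.5 * dual @ K_i @ dual for K_i in kernels])
        mu = np.clip(mu - lr * grad, 0.0, None)
        mu = mu / mu.sum() if mu.sum() > 0 else np.ones(l) / l   # crude projection to the simplex
    return mu

# Usage (illustrative): two candidate Gaussian kernels on random data.
rng = np.random.RandomState(0)
X = rng.randn(60, 4); y = np.sign(X[:, 0] + 0.3 * rng.randn(60)).astype(int)
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
mu = simple_mkl([np.exp(-0.1 * sq), np.exp(-2.0 * sq)], y)
print("learned kernel weights:", mu)
```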
SLIDE 75
References
Kernel Methods in Computational Biology, Schölkopf et al., 2004
Kernel Methods for Pattern Analysis, John Shawe-Taylor and Nello Cristianini
Learning with Kernels, Schölkopf and Smola, 2002