SLIDE 1

Kernel machines and sparsity

Stéphane Canu & Alain Rakotomamonjy

stephane.canu@litislab.eu

July 2, 2009, ENBIS'09, Saint-Étienne

SLIDE 2

Roadmap

1. Introduction: a typical learning problem; kernel machines: a definition
2. Tools: the functional framework: in the beginning was the kernel; kernel and hypothesis set
3. Kernel machines and regularization path: non-sparse kernel machines; regularization path; piecewise linear regularization path; sparse kernel machines: SVR
4. Tuning the kernel: MKL: the multiple kernel problem; SimpleMKL: the multiple kernel solution
5. Conclusion

SLIDE 3

Optical character recognition

Example (the MNIST database)
◮ MNIST¹: data = "image-label" pairs
◮ n = 60,000; d = 700; 10 classes
◮ kernel error rate: 0.56%
◮ best error rate: 0.4%

[Figure: sample digit images labelled 7, 8, 7, 9.]

¹ http://yann.lecun.com/exdb/mnist/index.html

SLIDE 4

Learning challenges: the size effect

3 key issues:
1. learn any problem: functional universality
2. from data: statistical consistency
3. with large data sets: computational efficiency

[Figure: learning problems plotted by number of variables (10² to 10⁵) against sample size (10⁴ to 10⁷): MNIST, object recognition, geostatistics, speech, scene analysis, text, RCV1, census, bio-computing, Lucy.]

Kernel machines address these three issues (up to a certain point regarding efficiency).

L. Bottou, 2006
SLIDE 5

Kernel machines

Definition (kernel machine): a decision rule $\mathcal{A}$ built from data $(x_i, y_i)_{i=1,n}$ of the form
$$\mathcal{A}(x) = \psi\Big(\sum_{i=1}^n \alpha_i k(x, x_i) + \sum_{j=1}^p \beta_j q_j(x)\Big)$$
with $\alpha$ and $\beta$ the parameters to be estimated.

Examples
◮ splines: $\mathcal{A}(x) = \sum_{i=1}^n \alpha_i (x - x_i)_+^3 + \beta_0 + \beta_1 x$
◮ SVM: $\mathcal{A}(x) = \mathrm{sign}\Big(\sum_{i\in I} \alpha_i \exp\big(-\tfrac{\|x - x_i\|^2}{b}\big) + \beta_0\Big)$
◮ exponential family: $P(y|x) = \tfrac{1}{Z} \exp\Big(\sum_{i\in I} \alpha_i \mathbb{1}_{\{y=y_i\}} (x^\top x_i + b)^2\Big)$
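To make the definition concrete, here is a minimal numerical sketch (not from the slides; the data, coefficients and bandwidth below are made up) of the SVM-style example above, with a Gaussian kernel:

```python
# Minimal sketch of A(x) = sign( sum_i alpha_i exp(-||x - x_i||^2 / b) + beta_0 ).
# All values (X, alpha, beta0, b) are synthetic placeholders.
import numpy as np

def gaussian_kernel(s, t, b=1.0):
    """k(s, t) = exp(-||s - t||^2 / b)."""
    return np.exp(-np.sum((s - t) ** 2) / b)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))     # training points x_i
alpha = rng.normal(size=20)      # coefficients, estimated from data in practice
beta0 = 0.1                      # offset

def A(x):
    return np.sign(sum(a * gaussian_kernel(x, xi) for a, xi in zip(alpha, X)) + beta0)

print(A(np.zeros(2)))            # decision at a new point
```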

SLIDE 6

Roadmap (outline as on Slide 2)

SLIDE 7

In the beginning was the kernel...

Definition (kernel): a function of two variables $k$ from $\mathcal{X}\times\mathcal{X}$ to $\mathbb{R}$.

Definition (positive kernel): a kernel $k(s,t)$ on $\mathcal{X}$ is said to be positive
◮ if it is symmetric: $k(s,t) = k(t,s)$
◮ and if, for any finite positive integer $n$:
$$\forall \{\alpha_i\}_{i=1,n} \in \mathbb{R},\; \forall \{x_i\}_{i=1,n} \in \mathcal{X},\quad \sum_{i=1}^n\sum_{j=1}^n \alpha_i\alpha_j k(x_i, x_j) \ge 0$$
It is strictly positive if, for $\alpha \ne 0$,
$$\sum_{i=1}^n\sum_{j=1}^n \alpha_i\alpha_j k(x_i, x_j) > 0$$

SLIDE 8

Examples of positive kernels

the linear kernel: $s, t \in \mathbb{R}^d$, $k(s,t) = s^\top t$
◮ symmetric: $s^\top t = t^\top s$
◮ positive:
$$\sum_{i=1}^n\sum_{j=1}^n \alpha_i\alpha_j k(x_i,x_j) = \sum_{i=1}^n\sum_{j=1}^n \alpha_i\alpha_j x_i^\top x_j = \Big(\sum_{i=1}^n \alpha_i x_i\Big)^\top\Big(\sum_{j=1}^n \alpha_j x_j\Big) = \Big\|\sum_{i=1}^n \alpha_i x_i\Big\|^2 \ge 0$$

the product kernel: $k(s,t) = g(s)g(t)$ for some $g: \mathbb{R}^d \to \mathbb{R}$
◮ symmetric by construction
◮ positive:
$$\sum_{i=1}^n\sum_{j=1}^n \alpha_i\alpha_j k(x_i,x_j) = \sum_{i=1}^n\sum_{j=1}^n \alpha_i\alpha_j g(x_i)g(x_j) = \Big(\sum_{i=1}^n \alpha_i g(x_i)\Big)^2 \ge 0$$

$k$ is positive ⇔ its "square root" exists ⇔ $k(s,t) = \langle \phi_s, \phi_t\rangle$

J.P. Vert, 2006
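A quick numerical check of the positivity identity above, on synthetic data (a sketch, not part of the original deck):

```python
# alpha^T K alpha = ||sum_i alpha_i x_i||^2 >= 0 for the linear kernel,
# so the Gram matrix has no negative eigenvalue (up to round-off).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
K = X @ X.T                                   # linear-kernel Gram matrix
alpha = rng.normal(size=50)
v = (alpha[:, None] * X).sum(axis=0)          # sum_i alpha_i x_i
print(np.isclose(alpha @ K @ alpha, v @ v))   # the identity from the slide
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # positive semi-definite
```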

SLIDE 9

Positive definite kernel (PDK) algebra (closure)

if $k_1(s,t)$ and $k_2(s,t)$ are two positive kernels
◮ PDK form a convex cone: $\forall a_1 \in \mathbb{R}^+$, $a_1 k_1(s,t) + k_2(s,t)$ is a PDK
◮ the product $k_1(s,t)\,k_2(s,t)$ is a PDK

proofs
◮ cone, by linearity:
$$\sum_{i=1}^n\sum_{j=1}^n \alpha_i\alpha_j\big(a_1 k_1(x_i,x_j) + k_2(x_i,x_j)\big) = a_1\sum_{i=1}^n\sum_{j=1}^n \alpha_i\alpha_j k_1(x_i,x_j) + \sum_{i=1}^n\sum_{j=1}^n \alpha_i\alpha_j k_2(x_i,x_j) \ge 0$$
◮ product, assuming $\exists \psi_\ell$ such that $k_1(s,t) = \sum_\ell \psi_\ell(s)\psi_\ell(t)$:
$$\sum_{i=1}^n\sum_{j=1}^n \alpha_i\alpha_j\, k_1(x_i,x_j)k_2(x_i,x_j) = \sum_\ell \sum_{i=1}^n\sum_{j=1}^n \big(\alpha_i\psi_\ell(x_i)\big)\big(\alpha_j\psi_\ell(x_j)\big)\, k_2(x_i,x_j) \ge 0$$

N. Cristianini and J. Shawe-Taylor, Kernel Methods for Pattern Analysis, 2004
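The two closure properties are easy to check numerically; a sketch on synthetic Gram matrices (the kernels and data are arbitrary choices):

```python
# If K1 and K2 are PSD, so are a1*K1 + K2 (convex cone) and the elementwise
# product K1 * K2 (the feature-expansion argument above; Schur product theorem).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
K1 = X @ X.T                                           # linear kernel
D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, -1)  # squared distances
K2 = np.exp(-D2)                                       # Gaussian kernel
for K in (2.0 * K1 + K2, K1 * K2):
    print(np.linalg.eigvalsh(K).min() >= -1e-8)        # PSD up to round-off
```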
SLIDE 10

Kernel engineering: building PDK

◮ for any polynomial $\varphi$ with positive coefficients from $\mathbb{R}$ to $\mathbb{R}$, $\varphi\big(k(s,t)\big)$ is a PDK
◮ if $\Psi$ is a function from $\mathbb{R}^d$ to $\mathbb{R}^{d'}$, $k\big(\Psi(s), \Psi(t)\big)$ is a PDK
◮ if $\varphi$, from $\mathbb{R}^d$ to $\mathbb{R}^+$, has its minimum at 0, then $k(s,t) = \varphi(s+t) - \varphi(s-t)$ is a PDK
◮ the convolution of two positive kernels is a positive kernel: $K_1 \star K_2$

the Gaussian kernel is a PDK:
$$\exp(-\|s-t\|^2) = \exp(-\|s\|^2)\exp(-\|t\|^2)\exp(2\, s^\top t)$$
◮ $s^\top t$ is a PDK, and $\exp$ is the limit of a series expansion with positive coefficients, so $\exp(2\, s^\top t)$ is a PDK
◮ $\exp(-\|s\|^2)\exp(-\|t\|^2)$ is a PDK as a product kernel
◮ and the product of two PDK is a PDK

O. Catoni, master lecture, 2005
SLIDE 11

Some examples of PD kernels...

type        name          k(s, t)
radial      Gaussian      $\exp(-r^2/b)$, $r = \|s-t\|$
radial      Laplacian     $\exp(-r/b)$
radial      rational      $1 - \frac{r^2}{r^2+b}$
radial      loc. Gauss.   $\max\big(0,\, 1-\frac{r}{3b}\big)^d \exp(-r^2/b)$
non stat.   $\chi^2$      $\exp(-r/b)$, $r = \sum_k \frac{(s_k-t_k)^2}{s_k+t_k}$
projective  polynomial    $(s^\top t)^p$
projective  affine        $(s^\top t + b)^p$
projective  cosine        $s^\top t / (\|s\|\|t\|)$
projective  correlation   $\exp\big(\frac{s^\top t}{\|s\|\|t\|} - b\big)$

Most of these kernels depend on a quantity $b$ called the bandwidth.
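A few entries of the table written out in code; a sketch, with b the bandwidth and r or s⊤t precomputed as in the table:

```python
# Some kernels from the table above, as functions of r = ||s - t|| or s^T t.
import numpy as np

def radial_gaussian(r, b):   return np.exp(-r**2 / b)
def radial_laplacian(r, b):  return np.exp(-r / b)
def radial_rational(r, b):   return 1.0 - r**2 / (r**2 + b)
def proj_polynomial(st, p):  return st**p                # (s^T t)^p
def proj_affine(st, b, p):   return (st + b)**p          # (s^T t + b)^p
def proj_cosine(s, t):       return (s @ t) / (np.linalg.norm(s) * np.linalg.norm(t))

s, t = np.array([1.0, 2.0]), np.array([0.5, -1.0])
r = np.linalg.norm(s - t)
print(radial_gaussian(r, b=1.0), proj_affine(s @ t, b=1.0, p=2), proj_cosine(s, t))
```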
SLIDE 12

Kernels for objects and structures

kernels on histograms and probability distributions:
$$k(p, q) = \int k_i\big(p(x), q(x)\big)\, d\mathbb{P}(x)$$

kernels on strings
◮ spectral string kernel, using subsequences: $k(s,t) = \sum_u \phi_u(s)\phi_u(t)$
◮ similarities by alignments: $k(s,t) = \sum_\pi \exp\big(\beta(s,t,\pi)\big)$

kernels on graphs
◮ the pseudo-inverse of the (regularized) graph Laplacian $L = D - A$, with $A$ the adjacency matrix and $D$ the degree matrix
◮ diffusion kernels: $\frac{1}{Z(b)}\exp(bL)$
◮ subgraph kernel convolution (using random walks)

and kernels on heterogeneous data (images), HMM, automata...

Shawe-Taylor & Cristianini's book, 2004; J.-P. Vert, 2006

SLIDE 13

Roadmap (outline as on Slide 2)

SLIDE 14

From kernel to functions

$$\mathcal{H}_0 = \Big\{ f \;\Big|\; m_f < \infty;\; f_j \in \mathbb{R};\; t_j \in \mathcal{X};\; f(x) = \sum_{j=1}^{m_f} f_j\, k(x, t_j) \Big\}$$

define the bilinear form (with $g(x) = \sum_{i=1}^{m_g} g_i\, k(x, s_i)$):
$$\forall f, g \in \mathcal{H}_0, \quad \langle f, g \rangle_{\mathcal{H}_0} = \sum_{j=1}^{m_f}\sum_{i=1}^{m_g} f_j\, g_i\, k(t_j, s_i)$$

Evaluation functional: $\forall x \in \mathcal{X}$,
$$f(x) = \langle f(\cdot),\, k(x, \cdot)\rangle_{\mathcal{H}_0}$$

from $k$ to $\mathcal{H}$: with any positive kernel, a hypothesis set $\mathcal{H} = \overline{\mathcal{H}}_0$ can be constructed, together with its metric.
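A sketch of these definitions in code (1-D Gaussian kernel assumed; centers and coefficients are made up), checking the evaluation property $f(x) = \langle f(\cdot), k(x,\cdot)\rangle_{\mathcal{H}_0}$:

```python
# f(x) = sum_j f_j k(x, t_j); <f, g>_H0 = sum_j sum_i f_j g_i k(t_j, s_i).
# Since k(x, .) lies in H0 (one center x, coefficient 1), evaluating f at x
# is the same as taking the inner product of f with k(x, .).
import numpy as np

def k(u, v, b=1.0):
    return np.exp(-(u - v) ** 2 / b)          # 1-D Gaussian kernel

t = np.array([0.0, 1.0, 2.0])                 # centers of f
f_c = np.array([1.0, -0.5, 2.0])              # coefficients of f

def inner(fc, tc, gc, sc):
    """<f, g>_H0 given coefficients and centers of f and g."""
    return fc @ k(tc[:, None], sc[None, :]) @ gc

x = 0.7
print(np.isclose(f_c @ k(t, x),                                   # f(x) directly
                 inner(f_c, t, np.array([1.0]), np.array([x]))))  # <f, k(x,.)>
```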

SLIDE 15

RKHS

Definition (reproducing kernel Hilbert space (RKHS)): a Hilbert space $\mathcal{H}$ endowed with the inner product $\langle\cdot,\cdot\rangle_{\mathcal{H}}$ is said to have a reproducing kernel if there exists a positive kernel $k$ such that
$$\forall s \in \mathcal{X},\; k(\cdot, s) \in \mathcal{H} \quad\text{and}\quad \forall f \in \mathcal{H},\; f(s) = \langle f(\cdot),\, k(s,\cdot)\rangle_{\mathcal{H}}$$

positive kernel ⇔ RKHS

◮ any function is pointwise defined
◮ the kernel defines the inner product
◮ it defines the regularity (smoothness) of the hypothesis set

SLIDE 16

Functional differentiation in RKHS

Let $J$ be a functional: $J: \mathcal{H} \to \mathbb{R}$, $f \mapsto J(f)$; examples: $J_1(f) = \|f\|^2_{\mathcal{H}}$, $J_2(f) = f(x)$.

directional derivative of $J$ in direction $g$ at point $f$:
$$dJ(f, g) = \lim_{\varepsilon\to0} \frac{J(f+\varepsilon g) - J(f)}{\varepsilon}$$

gradient $\nabla J(f)$: the map $\nabla J: \mathcal{H}\to\mathcal{H}$, $f\mapsto \nabla J(f)$, such that $dJ(f,g) = \langle \nabla J(f), g\rangle_{\mathcal{H}}$

exercise: work out $\nabla J_1(f)$ and $\nabla J_2(f)$ (a worked sketch follows).
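A worked sketch of the exercise (standard RKHS calculus, not spelled out on the slide): for $J_1(f) = \|f\|^2_{\mathcal{H}}$,
$$dJ_1(f,g) = \lim_{\varepsilon\to0} \frac{\|f+\varepsilon g\|^2_{\mathcal{H}} - \|f\|^2_{\mathcal{H}}}{\varepsilon} = 2\langle f, g\rangle_{\mathcal{H}}, \quad\text{hence}\quad \nabla J_1(f) = 2f;$$
for $J_2(f) = f(x)$, linearity gives $dJ_2(f,g) = g(x) = \langle k(x,\cdot), g\rangle_{\mathcal{H}}$ by the reproducing property, hence $\nabla J_2(f) = k(x,\cdot)$.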

SLIDE 17

Other kernels (what really matters)

◮ finite kernels: $k(s,t) = \big(\phi_1(s),\dots,\phi_p(s)\big)^\top \big(\phi_1(t),\dots,\phi_p(t)\big)$
◮ Mercer kernels: positive on a compact set ⇔ $k(s,t) = \sum_{j=1}^p \lambda_j \phi_j(s)\phi_j(t)$
◮ positive kernels
◮ positive semi-definite kernels
◮ conditionally positive kernels (for some functions $p_j$):
$$\forall\{x_i\}_{i=1,n},\; \forall\alpha_i \text{ with } \sum_{i=1}^n \alpha_i\, p_j(x_i) = 0,\; j = 1,\dots,p: \quad \sum_{i=1}^n\sum_{j=1}^n \alpha_i\alpha_j k(x_i,x_j) \ge 0$$
◮ symmetric non-positive: $k(s,t) = \tanh(s^\top t + \alpha_0)$
◮ non-symmetric, non-positive

the key property, $\nabla J_t(f) = k(t,\cdot)$, still holds

C. Ong et al., ICML, 2004
SLIDE 18

Let's summarize

◮ positive kernels ⇔ RKHS $\mathcal{H}$ ⇔ regularity $\|f\|^2_{\mathcal{H}}$
◮ the key property $\nabla J_t(f) = k(t,\cdot)$ holds, and not only for positive kernels
◮ $f(x_i)$ exists (pointwise-defined functions)
◮ universal consistency in RKHS
◮ the Gram matrix summarizes the pairwise comparisons

SLIDE 19

Plan (outline as on Slide 2)

SLIDE 20

Interpolation splines

find $f \in \mathcal{H}$ such that $f(x_i) = y_i$, $i = 1,\dots,n$

This is an ill-posed problem.

SLIDE 21

Interpolation splines: minimum norm interpolation

$$\min_{f\in\mathcal{H}} \tfrac12\|f\|^2_{\mathcal{H}} \quad\text{such that}\quad f(x_i) = y_i,\; i=1,\dots,n$$

The Lagrangian ($\alpha_i$ the Lagrange multipliers):
$$L(f, \alpha) = \tfrac12\|f\|^2 - \sum_{i=1}^n \alpha_i\big(f(x_i) - y_i\big)$$

optimality for $f$:
$$\nabla_f L(f, \alpha) = 0 \iff f(x) = \sum_{i=1}^n \alpha_i k(x_i, x)$$

dual formulation (eliminating $f$ from the Lagrangian):
$$Q(\alpha) = -\tfrac12\sum_{i=1}^n\sum_{j=1}^n \alpha_i\alpha_j k(x_i,x_j) + \sum_{i=1}^n \alpha_i y_i$$

solution: $\max_{\alpha\in\mathbb{R}^n} Q(\alpha) \iff K\alpha = y$
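A numerical sketch of the result (Gaussian kernel and data made up): the dual optimality condition reduces to solving the linear system $K\alpha = y$, and the resulting $f$ interpolates the data exactly.

```python
# Minimum-norm interpolation: solve K alpha = y, then f(x) = sum_i alpha_i k(x_i, x).
import numpy as np

def gram(u, v, b=0.5):
    return np.exp(-(u[:, None] - v[None, :]) ** 2 / b)   # Gaussian Gram matrix

x = np.linspace(0, 1, 8)
y = np.sin(2 * np.pi * x)                # synthetic targets
K = gram(x, x)
alpha = np.linalg.solve(K, y)            # K alpha = y
f = lambda xnew: gram(np.atleast_1d(xnew), x) @ alpha
print(np.allclose(f(x), y))              # interpolates the data exactly
```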


SLIDE 24

Representer theorem

Theorem (representer theorem). Let $\mathcal{H}$ be an RKHS with kernel $k(s,t)$, let $\ell$ be a loss function, and let $\Phi$ be a non-decreasing function from $\mathbb{R}$ to $\mathbb{R}$. If there exists a function $f^*$ minimizing
$$f^* = \operatorname*{argmin}_{f\in\mathcal{H}} \sum_{i=1}^n \ell\big(y_i, f(x_i)\big) + \Phi\big(\|f\|^2_{\mathcal{H}}\big)$$
then there exists a vector $\alpha\in\mathbb{R}^n$ such that
$$f^*(x) = \sum_{i=1}^n \alpha_i k(x, x_i)$$

It can be generalized to the semi-parametric case, where $f^*(x)$ includes an extra term $\sum_{j=1}^m \beta_j \phi_j(x)$.

SLIDE 25

Smoothing splines

introducing the error (the slack) $\xi_i = f(x_i) - y_i$:
$$(S)\quad \min_{f\in\mathcal{H}} \tfrac12\|f\|^2_{\mathcal{H}} + \frac{1}{2\lambda}\sum_{i=1}^n \xi_i^2 \quad\text{such that}\quad f(x_i) = y_i + \xi_i,\; i=1,\dots,n$$

three equivalent definitions:
$$(S')\quad \min_{f\in\mathcal{H}} \tfrac12\sum_{i=1}^n\big(f(x_i)-y_i\big)^2 + \frac{\lambda}{2}\|f\|^2_{\mathcal{H}}$$
$$\min_{f\in\mathcal{H}} \tfrac12\|f\|^2_{\mathcal{H}} \;\text{ such that }\; \sum_{i=1}^n\big(f(x_i)-y_i\big)^2 \le C'$$
$$\min_{f\in\mathcal{H}} \sum_{i=1}^n\big(f(x_i)-y_i\big)^2 \;\text{ such that }\; \|f\|^2_{\mathcal{H}} \le C''$$

using the representer theorem:
$$(S'')\quad \min_{\alpha\in\mathbb{R}^n} \tfrac12\|K\alpha - y\|^2 + \frac{\lambda}{2}\alpha^\top K\alpha \iff (K+\lambda I)\alpha = y$$

using instead $\min_{\alpha\in\mathbb{R}^n} \tfrac12\|K\alpha-y\|^2 + \frac{\lambda}{2}\alpha^\top\alpha$ gives $\alpha = (K^\top K + \lambda I)^{-1}K^\top y$.
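A sketch of the $(S'')$ solve (synthetic data, Gaussian kernel): the single system $(K+\lambda I)\alpha = y$ gives the whole estimator, and $\lambda$ trades data fit against smoothness.

```python
# Smoothing spline via the representer theorem: (K + lambda I) alpha = y.
import numpy as np

def gram(u, v, b=0.5):
    return np.exp(-(u[:, None] - v[None, :]) ** 2 / b)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=30)    # noisy observations
K = gram(x, x)
for lam in (1e-8, 1e-1, 1e3):
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    print(f"lambda={lam:g}  fit error={np.linalg.norm(K @ alpha - y):.3f}")
```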

SLIDE 26

From 0 to interpolation

◮ problem:
$$\min_{f\in\mathcal{H}} \tfrac12\sum_{i=1}^n\big(f(x_i)-y_i\big)^2 + \frac{\lambda}{2}\|f\|^2_{\mathcal{H}}$$
◮ solution: $\alpha(\lambda) = (K+\lambda I)^{-1}y$
◮ $\lambda = 0$: $\alpha(0) = K^{-1}y$, interpolation
◮ $\lambda \to \infty$: $\alpha(\infty) = 0$

Regularization path: $\mathcal{S}$, the set of solutions as a function of $\lambda$:
$$\mathcal{S} = \big\{\alpha(\lambda) \;\big|\; \lambda\in[0,\infty)\big\}$$
also called the solution path.
SLIDE 27

1-D ridge regression in the costs domain

the loss term $L$ and the penalty term $P$ as functions of $\alpha$:
$$\min_{\alpha\in\mathbb{R}} \sum_{i=1}^n (x_i\alpha - y_i)^2 + \lambda\alpha^2, \qquad L(\alpha) = \sum_{i=1}^n (x_i\alpha - y_i)^2, \quad P(\alpha) = \alpha^2$$

eliminating $\alpha$: $L(P) = aP \pm b\sqrt{P} + c$, with $a$, $b$ and $c \in \mathbb{R}$

[Figure: the regularization path plotted in the $(P, L)$ plane.]

SLIDE 28

How to tune the regularization parameter λ?

◮ brute force: for each $\lambda_1 < \lambda_2 < \dots < \lambda_K$, compute $\alpha_k = (K+\lambda_k I)^{-1}y$, $k = 1,\dots,K$; cost $O(Kn^3)$
◮ warm start: $\alpha_k = \Phi(\alpha_{k-1})$ (using $\ell$ conjugate-gradient iterations); cost $O(K\ell n^2)$; see the sketch after this list
◮ warm start + prediction step: $\alpha_k^{(p)} = \alpha_{k-1} + \rho\nabla_\alpha\big(L(\alpha_{k-1}) + \lambda_k P(\alpha_{k-1})\big)$ (prediction), then $\alpha_k = \Phi(\alpha_k^{(p)})$ (correction step using CG); cost $O(K\ell' n^2)$
◮ use only the prediction step! $\alpha_k = \alpha_{k-1} + \lambda_k\Psi(\alpha_{k-1})$; cost $O(Kn^2)$; for this to work, the regularization path has to be piecewise linear
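A sketch of the warm-start strategy (second bullet), with kernel ridge as the inner problem; the data and the λ grid are arbitrary. Each conjugate-gradient solve starts from the previous α, so only a few iterations are needed per grid point:

```python
# Warm start: sweep lambda_1 < ... < lambda_K and reuse the previous alpha
# as the CG starting point when solving (K + lambda_k I) alpha = y.
import numpy as np
from scipy.sparse.linalg import cg

def gram(u, v, b=0.5):
    return np.exp(-(u[:, None] - v[None, :]) ** 2 / b)

x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x)
K = gram(x, x)
alpha = np.zeros_like(y)
for lam in np.logspace(-2, 2, 10):
    n_iter = [0]
    alpha, info = cg(K + lam * np.eye(len(y)), y, x0=alpha,
                     callback=lambda _: n_iter.__setitem__(0, n_iter[0] + 1))
    print(f"lambda={lam:.3g}  cg iterations={n_iter[0]}")
```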
SLIDE 32

How to choose L and P to get a linear regularization path?

The solution path is piecewise linear ⇔ one cost is piecewise quadratic and the other one is piecewise linear.

convex case [Rosset & Zhu, 07]: $\min_{\alpha\in\mathbb{R}^d} L(\alpha) + \lambda P(\alpha)$

1. piecewise linearity: $\lim_{\varepsilon\to0} \frac{\alpha(\lambda+\varepsilon)-\alpha(\lambda)}{\varepsilon} = \text{constant}$
2. optimality: $\nabla L(\alpha(\lambda)) + \lambda\nabla P(\alpha(\lambda)) = 0$ and $\nabla L(\alpha(\lambda+\varepsilon)) + (\lambda+\varepsilon)\nabla P(\alpha(\lambda+\varepsilon)) = 0$
3. using a Taylor expansion:
$$\lim_{\varepsilon\to0} \frac{\alpha(\lambda+\varepsilon)-\alpha(\lambda)}{\varepsilon} = -\big(\nabla^2 L(\alpha(\lambda)) + \lambda\nabla^2 P(\alpha(\lambda))\big)^{-1}\nabla P(\alpha(\lambda))$$
which is constant when $\nabla^2 L(\alpha(\lambda))$ is constant and $\nabla^2 P(\alpha(\lambda)) = 0$.

SLIDE 33

Standard formulation

◮ portfolio optimization (Markowitz, 1952): return vs. risk
$$\min_\alpha \tfrac12\alpha^\top Q\alpha \quad\text{with}\quad e^\top\alpha = C$$
the efficiency frontier is piecewise linear (critical path algorithm)
◮ sensitivity analysis: standard formulation (Heller, 1954)
$$\min_\alpha \tfrac12\alpha^\top Q\alpha + (c + \lambda\,\Delta c)^\top\alpha \quad\text{with}\quad A\alpha = b + \mu\,\Delta b$$
◮ parametric programming (see T. Gal's book, 1968)
◮ in the general case of pQP, the regularization path is piecewise linear
◮ PLP and multi-parametric programming

SLIDE 34

Piecewise linear regularization path algorithms

L    P    regression    classification    clustering
L2   L1   Lasso/LARS    L1 L2-SVM         L1 PCA/SVD
L1   L2   SVR           SVM               OC-SVM
L1   L1   L1 LAD        L1 SVM            Dantzig selector

Table: examples of piecewise linear regularization path algorithms.

penalties $P$: $L_p = \sum_{j=1}^d |\beta_j|^p$
losses $L$: $L_p$: $|f(x)-y|^p$; hinge: $\big(1 - y f(x)\big)_+^p$; $\varepsilon$-insensitive: $0$ if $|f(x)-y| < \varepsilon$, $|f(x)-y| - \varepsilon$ otherwise; Huber's loss: $|f(x)-y|^2$ if $|f(x)-y| < t$, $2t|f(x)-y| - t^2$ otherwise

SLIDE 35

The world is changing

"The Gaussian Hare and the Laplacian Tortoise: Computability of L1 vs. L2 Regression Estimators", Portnoy & Koenker, 1997

SLIDE 36

The consequences of having a useful regularization path

$$\min_{\alpha\in\mathbb{R}^d} L(\alpha) + \lambda P(\alpha) \iff \big\{\alpha(\lambda)\;\big|\;\lambda\in[0,\infty]\big\}$$

◮ efficient computing ⇒ piecewise linearity: $\alpha_{\mathrm{NEW}} = \alpha_{\mathrm{OLD}} + (\lambda_{\mathrm{NEW}} - \lambda_{\mathrm{OLD}})\,u$
◮ piecewise linearity ⇒ either $L$ or $P$ is $L_1$
◮ $L_1$ criterion ⇒ sparsity: a lot of $\alpha_j = 0$ (sparsity and active constraints)

why does $L_1$ provide sparsity?

SLIDE 37

Definition (strong homogeneity set of variables): $I_0 = \big\{ j\in\{1,\dots,d\} \;\big|\; \alpha_j = 0 \big\}$

Theorem (Nikolova, 2000)
◮ regular case: if $L(\alpha)+\lambda P(\alpha)$ is differentiable and $I_0(y) \neq \emptyset$, then $\forall\varepsilon>0$, $\exists y'\in B(y,\varepsilon)$ such that $I_0(y') \neq I_0(y)$
◮ singular case: if $L(\alpha)+\lambda P(\alpha)$ is NON-differentiable and $I_0(y)\neq\emptyset$, then $\exists\varepsilon>0$, $\forall y'\in B(y,\varepsilon)$, $I_0(y') = I_0(y)$

singular criterion ⇒ sparsity. $L_1$ criteria are singular at 0: singularity provides sparsity.

SLIDE 38

L1 splines: introducing sparsity

The L1 error:
$$(S_1)\quad \min_{f\in\mathcal{H}} \tfrac12\|f\|^2_{\mathcal{H}} + \frac{1}{\lambda}\sum_{i=1}^n |\xi_i| \quad\text{such that}\quad f(x_i) = y_i + \xi_i,\; i=1,\dots,n$$

representer theorem:
$$f^*(x) = \sum_{i=1}^n (\alpha_i^+ - \alpha_i^-)\, k(x, x_i)$$

The dual:
$$(D_1)\quad \min_{\alpha^+,\alpha^-} \tfrac12(\alpha^+ - \alpha^-)^\top K(\alpha^+ - \alpha^-) + (\alpha^+ + \alpha^-)^\top y \quad\text{such that}\quad 0\le\alpha_i^+\le\tfrac1\lambda,\; 0\le\alpha_i^-\le\tfrac1\lambda,\; i=1,\dots,n$$

a typical parametric quadratic program (pQP), with many $\alpha_i = 0$

SLIDE 39

K-Lasso (kernel basis pursuit)

The kernel Lasso:
$$(S_1)\quad \min_{\alpha\in\mathbb{R}^n} \tfrac12\|K\alpha - y\|^2 + \lambda\sum_{i=1}^n|\alpha_i|$$
◮ a typical parametric quadratic program (pQP) with many $\alpha_i = 0$
◮ piecewise linear regularization path

The dual:
$$(D_1)\quad \min_\alpha \tfrac12\|K\alpha\|^2 \quad\text{such that}\quad \|K^\top(K\alpha - y)\|_\infty \le t$$
◮ the K-Dantzig selector can be treated the same way
◮ requires computing $K^\top K$: no more function $f$!
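A sketch of the K-Lasso with an off-the-shelf solver (synthetic data; note that scikit-learn's Lasso rescales the quadratic term by 1/(2n), so its alpha plays the role of λ/n):

```python
# K-Lasso: use the Gram matrix K as the design matrix of an L1-penalized
# least-squares problem; the L1 term sets most alpha_i exactly to zero.
import numpy as np
from sklearn.linear_model import Lasso

def gram(u, v, b=0.5):
    return np.exp(-(u[:, None] - v[None, :]) ** 2 / b)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=100)
K = gram(x, x)
model = Lasso(alpha=1e-3, fit_intercept=False, max_iter=50_000).fit(K, y)
print("nonzero coefficients:", int(np.sum(model.coef_ != 0)), "of", len(x))
```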

SLIDE 40

Support vector regression (SVR)

adapting the Lasso's dual, from
$$\min_\alpha \tfrac12\|K\alpha\|^2 \;\text{ s.t. }\; \|K^\top(K\alpha - y)\|_\infty \le t$$
to
$$\min_{f\in\mathcal{H}} \tfrac12\|f\|^2_{\mathcal{H}} \;\text{ s.t. }\; |f(x_i) - y_i| \le t,\; i=1,\dots,n$$

support vector regression introduces slack variables:
$$(SVR)\quad \min_{f\in\mathcal{H}} \tfrac12\|f\|^2_{\mathcal{H}} + C\sum_i|\xi_i| \quad\text{such that}\quad |f(x_i)-y_i|\le t + \xi_i,\; \xi_i\ge0,\; i=1,\dots,n$$

◮ a typical multi-parametric quadratic program (mpQP)
◮ piecewise linear regularization path:
$$\alpha(C, t) = \alpha(C_0, t_0) + \Big(\frac1C - \frac1{C_0}\Big)u + \frac1{C_0}(t - t_0)\,v$$
◮ a 2-D Pareto front (the tube width and the regularity)
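A sketch of SVR with a precomputed Gaussian Gram matrix (synthetic data; scikit-learn's epsilon is the tube width, playing the role of t):

```python
# epsilon-insensitive SVR on a precomputed kernel; the number of support
# vectors (sparsity) and the smoothness both move with C and the tube width.
import numpy as np
from sklearn.svm import SVR

def gram(u, v, b=0.5):
    return np.exp(-(u[:, None] - v[None, :]) ** 2 / b)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=100)
K = gram(x, x)
for C in (0.1, 100.0):
    svr = SVR(kernel="precomputed", C=C, epsilon=0.1).fit(K, y)
    print(f"C={C}: {len(svr.support_)} support vectors out of {len(x)}")
```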

SLIDE 41

Support vector regression illustration

[Figure: two SVR fits on 1-D data; left: C large, right: C small.]

◮ there exist other formulations, such as LP-SVR...

SLIDE 42

Large scale pQP

Not yet adapted
◮ reweighted LS
◮ interior points
◮ projected gradient

Adapted
◮ homotopy (regularization path) and other pQP methods
◮ active set
◮ decomposition
◮ coordinate-wise (Gauss-Seidel)

Others: cutting plane, proximal...
SLIDE 43

Checkerboard

◮ 2 classes
◮ 500 examples
◮ separable

SLIDE 44

A separable case

[Figure: results with n = 500 data points and with n = 5000 data points.]

SLIDE 45

Empirical complexity

[Figure: training time (CPU seconds, log scale), number of support vectors (log scale), and error rate (%, over 2000 unseen points), each as a function of training size (log scale), for CVM, LibSVM and SimpleSVM; results for C = 1, C = 1000 and C = 1000000, with γ = 1 (left) and γ = 0.3 (right).]

G. Loosli et al., JMLR, 2007
SLIDE 46

Plan (outline as on Slide 2)

SLIDE 47

Multiple Kernel

The model:
$$f(x) = \sum_{i=1}^n \alpha_i K(x, x_i) + b$$

Given M kernel functions $K_1, \dots, K_M$ that are potentially well suited for a given problem, find a positive linear combination of these kernels such that the resulting kernel $K$ is "optimal":
$$K(x, x') = \sum_{m=1}^M d_m K_m(x, x'), \quad\text{with}\quad d_m \ge 0,\; \sum_m d_m = 1$$

The kernel coefficients $d_m$ and the SVR parameters $\alpha_i$, $b$ need to be learned jointly.
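A sketch of the combined kernel (the candidate kernels here, Gaussians with different bandwidths, are an arbitrary choice):

```python
# K = sum_m d_m K_m with d_m >= 0 and sum_m d_m = 1 is again a positive kernel.
import numpy as np

def gaussian_gram(X, b):
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-D2 / b)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Ks = [gaussian_gram(X, b) for b in (0.1, 1.0, 10.0)]   # K_1, ..., K_M
d = np.array([0.2, 0.5, 0.3])                          # on the simplex
K = sum(dm * Km for dm, Km in zip(d, Ks))              # combined kernel
print(np.linalg.eigvalsh(K).min() >= -1e-8)            # still PSD
```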

SLIDE 48

Multiple Kernel functional Learning

The problem (for given C and t):
$$\min_{\{f_m\},b,\xi,d} \tfrac12\sum_m \frac{1}{d_m}\|f_m\|^2_{\mathcal{H}_m} + C\sum_i \xi_i \quad\text{s.t.}\quad \Big|\sum_m f_m(x_i) + b - y_i\Big| \le t + \xi_i\;\forall i,\quad \xi_i\ge0\;\forall i,\quad \sum_m d_m = 1,\; d_m\ge0\;\forall m$$

regularization formulation:
$$\min_{\{f_m\},b,d} \tfrac12\sum_m \frac{1}{d_m}\|f_m\|^2_{\mathcal{H}_m} + C\sum_i \max\Big(\Big|\sum_m f_m(x_i) + b - y_i\Big| - t,\, 0\Big) \quad\text{s.t.}\quad \sum_m d_m = 1,\; d_m\ge0\;\forall m$$

equivalently:
$$\min_{\{f_m\},b,d} \sum_i \max\Big(\Big|\sum_m f_m(x_i) + b - y_i\Big| - t,\, 0\Big) + \frac{1}{2C}\sum_m \frac{1}{d_m}\|f_m\|^2_{\mathcal{H}_m} + \mu\sum_m |d_m|$$

SLIDE 49

Multiple Kernel functional Learning

The same problem, treated as a bi-level optimization task:
$$\min_{d\in\mathbb{R}^M} J(d) \quad\text{s.t.}\quad \sum_m d_m = 1,\; d_m\ge0\;\forall m$$
where
$$J(d) = \min_{\{f_m\},b,\xi}\; \tfrac12\sum_m \frac{1}{d_m}\|f_m\|^2_{\mathcal{H}_m} + C\sum_i \xi_i \quad\text{s.t.}\quad \Big|\sum_m f_m(x_i) + b - y_i\Big| \le t + \xi_i,\; \xi_i \ge 0\;\forall i$$

SLIDE 50

Multiple Kernel Algorithm

Use a reduced gradient algorithm²:
$$\min_{d\in\mathbb{R}^M} J(d) \quad\text{s.t.}\quad \sum_m d_m = 1,\; d_m\ge0\;\forall m$$

SimpleMKL algorithm:
  set $d_m = \frac{1}{M}$ for $m = 1,\dots,M$
  while the stopping criterion is not met do
    compute $J(d)$ using a QP solver with $K = \sum_m d_m K_m$
    compute $\partial J/\partial d_m$, the Hessian, and the descent direction $D$
    $\gamma \leftarrow$ compute the optimal stepsize
    $d \leftarrow d + \gamma D$
  end while

→ Recent improvements reported using the Hessian.

²Rakotomamonjy et al., JMLR 08
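A structural sketch of the loop above, with two loudly flagged simplifications: kernel ridge regression replaces the SVM as the inner solver (so $J(d) = \frac12 y^\top(K(d)+\lambda I)^{-1}y$ has a closed form, with gradient $\partial J/\partial d_m = -\frac12\alpha^\top K_m\alpha$ where $\alpha = (K(d)+\lambda I)^{-1}y$), and a projected-gradient step with a fixed stepsize replaces SimpleMKL's reduced-gradient direction and line search:

```python
# Not a faithful SimpleMKL reimplementation: a projected-gradient sketch of
# alternating between an inner kernel solve and an update of d on the simplex.
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {d >= 0, sum d = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - 1) / (np.arange(len(v)) + 1) > 0)[0][-1]
    return np.maximum(v - (css[rho] - 1) / (rho + 1), 0)

def gaussian_gram(X, b):
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-D2 / b)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=60)
Ks = [gaussian_gram(X, b) for b in (0.1, 1.0, 10.0)]   # candidate kernels
lam = 1e-2
d = np.full(len(Ks), 1.0 / len(Ks))                    # d_m = 1/M initialization
for _ in range(50):
    K = sum(dm * Km for dm, Km in zip(d, Ks))
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)       # inner solver
    grad = np.array([-0.5 * alpha @ Km @ alpha for Km in Ks])  # dJ/dd_m
    d_new = project_simplex(d - 0.1 * grad)                    # fixed stepsize
    if np.linalg.norm(d_new - d) < 1e-6:                       # stopping criterion
        break
    d = d_new
print("learned kernel weights:", np.round(d, 3))
```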

SLIDE 51

Complexity

For each iteration:
◮ SVM training: $O(n\,n_{sv} + n_{sv}^3)$
◮ inverting $K_{sv,sv}$ is $O(n_{sv}^3)$, but it might already be available as a by-product of the SVM training
◮ computing $H$: $O(M n_{sv}^2)$
◮ finding $d$: $O(M^3)$

The number of iterations is usually less than 10. → When $M < n_{sv}$, computing $d$ is not more expensive than the QP.

SLIDE 52

Multiple Kernel experiments

[Figure: the four benchmark signals (LinChirp, Wave, Blocks, Spikes) with their estimates.]

Data set   Single kernel    Kernel Dil               Kernel Dil-Trans
           Norm. MSE (%)    #Kernel   Norm. MSE      #Kernel   Norm. MSE
LinChirp   1.46 ± 0.28      7.0       1.00 ± 0.15    21.5      0.92 ± 0.20
Wave       0.98 ± 0.06      5.5       0.73 ± 0.10    20.6      0.79 ± 0.07
Blocks     1.96 ± 0.14      6.0       2.11 ± 0.12    19.4      1.94 ± 0.13
Spike      6.85 ± 0.68      6.1       6.97 ± 0.84    12.8      5.58 ± 0.84

Table: normalized mean square error averaged over 20 runs.


SLIDE 54

Conclusion

◮ kernels
◮ sparsity: L1
◮ efficient algorithms
◮ some limits, with possible remedies:
  ◮ instability: coupling with active sets
  ◮ large data sets: randomize
  ◮ when to stop: derive relevant bounds
  ◮ non-convexity: iterative L1

Perspectives

◮ more algorithms, more criteria, more applications

SLIDE 55

Questions?

Software:
◮ LAR(S) (and svmpath) in R: www-stat.stanford.edu/~hastie
◮ LARS in Matlab: asi.insa-rouen.fr/~vguigue/LARS.html
◮ kernlab: cran.r-project.org/src/contrib/Descriptions/kernlab.html
◮ SVR in Matlab: asi.insa-rouen.fr/~arakotom
◮ Dantzig selector: www.l1-magic.org/

SLIDE 56

Bibliography

◮ Least Angle Regression, B. Efron et al., Annals of Statistics, 32(2), pp. 407-499, 2004.
◮ Piecewise Linear Regularized Solution Paths, S. Rosset and J. Zhu, Annals of Statistics, 2007.
◮ Local strong homogeneity of a regularized estimator, M. Nikolova, SIAM Journal on Applied Mathematics, 61(2), pp. 633-658, 2000.
◮ Two-Dimensional Solution Path for Support Vector Regression, G. Wang, D.-Y. Yeung, F. H. Lochovsky, ICML, 2006.
◮ Algorithmic linear dimension reduction in the l1 norm for sparse vectors, A. C. Gilbert, M. J. Strauss, J. A. Tropp, R. Vershynin, 2006 (submitted).
◮ A tutorial on support vector regression, A. Smola and B. Schölkopf, Statistics and Computing, 14(3), pp. 199-222, 2004.
◮ The Dantzig selector: statistical estimation when p is much larger than n, E. Candès and T. Tao, submitted to IEEE Transactions on Information Theory, June 2005.
◮ An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, I. Daubechies, M. Defrise, C. De Mol, Comm. Pure Appl. Math., 57, pp. 1413-1541, 2004.
◮ Pathwise coordinate optimization, J. Friedman, T. Hastie, R. Tibshirani, technical report, May 2007.