Kernel machines and sparsity
  1. Kernel machines and sparsity
  ENBIS'09, Saint Etienne, July 2, 2009
  Stéphane Canu & Alain Rakotomamonjy
  stephane.canu@litislab.eu

  2. Roadmap
  1 Introduction: a typical learning problem; kernel machines: a definition
  2 Tools: the functional framework: in the beginning was the kernel; kernel and hypothesis set
  3 Kernel machines and regularization path: non-sparse kernel machines regularization path; piecewise linear regularization path; sparse kernel machines: SVR
  4 Tuning the kernel: MKL: the multiple kernel problem; simpleMKL: the multiple kernel solution
  5 Conclusion

  3. Optical character recognition
  Example (the MNIST database¹):
  ◮ data = « image-label » pairs
  ◮ n = 60,000; d = 700; classes = 10
  ◮ kernel error rate = 0.56%
  ◮ best error rate = 0.4%
  [Figure: sample digit images 7, 8, 7, 9]
  ¹ http://yann.lecun.com/exdb/mnist/index.html

  4. Learning challenges: the size effect
  [Figure: number of variables (10² to 10⁵) vs. sample size (10⁴ to 10⁷) for various tasks: census, geostatistics, speech, MNIST, text (RCV1), bio, scene analysis, object recognition, translation; L. Bottou, 2006]
  3 key issues:
  1. learn any problem: functional universality
  2. from data: statistical consistency
  3. with large data sets: computational efficiency
  Kernel machines address these three issues (up to a certain point regarding efficiency).

  5. Kernel machines
  Definition (kernel machines):
    A_{(x_i, y_i)_{i=1,n}}(x) = \psi\Big( \sum_{i=1}^{n} \alpha_i k(x, x_i) + \sum_{j=1}^{p} \beta_j q_j(x) \Big)
  α and β: the parameters to be estimated.
  Examples:
  splines:            A(x) = \sum_{i=1}^{n} \alpha_i (x - x_i)_+^3 + \beta_0 + \beta_1 x
  SVM:                A(x) = \mathrm{sign}\Big( \sum_{i \in I} \alpha_i \exp\big(-\|x - x_i\|^2 / b\big) + \beta_0 \Big)
  exponential family: P(y \mid x) = \frac{1}{Z} \exp\Big( \sum_{i \in I} \alpha_i \, \mathbb{1}_{\{y = y_i\}} \, (x^\top x_i + b)^2 \Big)
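The generic form above is easy to sketch numerically. A minimal Python illustration of the SVM-style example, assuming a Gaussian kernel with bandwidth b; the weights alpha and offset beta0 here are hand-picked toy values, not estimated parameters:

```python
import numpy as np

def gaussian_kernel(s, t, b=1.0):
    # k(s, t) = exp(-||s - t||^2 / b)
    return np.exp(-np.sum((s - t) ** 2) / b)

def kernel_machine(x, X, alpha, beta0, b=1.0):
    # A(x) = sign( sum_i alpha_i k(x, x_i) + beta_0 )
    score = sum(a * gaussian_kernel(x, xi, b) for a, xi in zip(alpha, X))
    return np.sign(score + beta0)

# toy support points and hand-picked (not fitted) parameters
X = np.array([[0.0, 0.0], [1.0, 1.0]])
alpha = [1.0, -0.2]
print(kernel_machine(np.array([0.0, 0.0]), X, alpha, beta0=0.0))  # prints 1.0
```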

  6. Roadmap (next: Tools: the functional framework: in the beginning was the kernel; kernel and hypothesis set)

  7. In the beginning was the kernel...
  Definition (kernel): a function of two variables k from 𝒳 × 𝒳 to ℝ.
  Definition (positive kernel): a kernel k(s, t) on 𝒳 is said to be positive
  ◮ if it is symmetric: k(s, t) = k(t, s)
  ◮ and if for any finite positive integer n:
    \forall \{\alpha_i\}_{i=1,n} \in \mathbb{R}, \ \forall \{x_i\}_{i=1,n} \in \mathcal{X}, \quad \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j k(x_i, x_j) \ge 0
  It is strictly positive if, for {α_i} not all zero,
    \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j k(x_i, x_j) > 0
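The positivity condition says that every Gram matrix K with K[i, j] = k(x_i, x_j) is symmetric positive semi-definite. A quick numerical check for the Gaussian kernel on random points (the 1e-10 tolerance is an arbitrary allowance for floating-point error):

```python
import numpy as np

def gram(k, X):
    # Gram matrix K[i, j] = k(x_i, x_j)
    return np.array([[k(xi, xj) for xj in X] for xi in X])

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
k = lambda s, t: np.exp(-np.sum((s - t) ** 2))  # Gaussian kernel, b = 1

K = gram(k, X)
assert np.allclose(K, K.T)                   # symmetry: k(s, t) = k(t, s)
assert np.linalg.eigvalsh(K).min() > -1e-10  # positive semi-definite
```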

  8. Examples of positive kernels
  ◮ the linear kernel: s, t ∈ ℝ^d, k(s, t) = s^⊤ t
    symmetric: s^⊤ t = t^⊤ s
    positive:
      \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j k(x_i, x_j) = \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j x_i^\top x_j
        = \Big( \sum_{i=1}^{n} \alpha_i x_i \Big)^{\!\top} \Big( \sum_{j=1}^{n} \alpha_j x_j \Big) = \Big\| \sum_{i=1}^{n} \alpha_i x_i \Big\|^2 \ge 0
  ◮ the product kernel: k(s, t) = g(s) g(t) for some g: ℝ^d → ℝ
    symmetric by construction
    positive:
      \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j k(x_i, x_j) = \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j g(x_i) g(x_j)
        = \Big( \sum_{i=1}^{n} \alpha_i g(x_i) \Big) \Big( \sum_{j=1}^{n} \alpha_j g(x_j) \Big) = \Big( \sum_{i=1}^{n} \alpha_i g(x_i) \Big)^2 \ge 0
  k is positive ⇔ its square root exists ⇔ k(s, t) = ⟨φ_s, φ_t⟩
  (J.P. Vert, 2006)
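The linear-kernel positivity proof rests on the identity Σ_ij α_i α_j x_i^⊤ x_j = ‖Σ_i α_i x_i‖², which can be checked numerically on random data (a sketch; the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 4))   # n = 10 points in R^4
alpha = rng.normal(size=10)

quad = alpha @ (X @ X.T) @ alpha          # sum_ij alpha_i alpha_j x_i^T x_j
norm2 = np.linalg.norm(X.T @ alpha) ** 2  # || sum_i alpha_i x_i ||^2
assert np.isclose(quad, norm2) and quad >= 0
```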

  9. Positive definite kernel (PDK) algebra (closure)
  If k₁(s, t) and k₂(s, t) are two positive kernels:
  ◮ PDKs form a convex cone: ∀ a₁ ∈ ℝ⁺, a₁ k₁(s, t) + k₂(s, t) is a PDK
  ◮ the product kernel k₁(s, t) k₂(s, t) is a PDK
  Proofs:
  ◮ by linearity:
    \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \big( a_1 k_1(x_i, x_j) + k_2(x_i, x_j) \big) = a_1 \sum_{i,j} \alpha_i \alpha_j k_1(x_i, x_j) + \sum_{i,j} \alpha_i \alpha_j k_2(x_i, x_j) \ge 0
  ◮ for the product, assume ∃ ψ_ℓ such that k₁(s, t) = Σ_ℓ ψ_ℓ(s) ψ_ℓ(t); then
    \sum_{i,j} \alpha_i \alpha_j k_1(x_i, x_j) k_2(x_i, x_j) = \sum_{\ell} \sum_{i,j} \big( \alpha_i \psi_\ell(x_i) \big) \big( \alpha_j \psi_\ell(x_j) \big) k_2(x_i, x_j) \ge 0
    since each inner double sum is a positive quadratic form of k₂ with coefficients α_i ψ_ℓ(x_i).
  (N. Cristianini and J. Shawe-Taylor, Kernel Methods for Pattern Analysis, 2004)
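Both closure rules can be observed directly on Gram matrices: a positive combination of two PSD Gram matrices stays PSD, and so does their elementwise (Schur) product, which is the Gram matrix of the product kernel. A numerical sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(15, 3))

K1 = X @ X.T                                            # linear-kernel Gram (PSD)
K2 = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1))  # Gaussian-kernel Gram (PSD)

def min_eig(K):
    return np.linalg.eigvalsh(K).min()

a1 = 0.7
assert min_eig(a1 * K1 + K2) > -1e-10  # convex cone: a1*k1 + k2 stays PSD
assert min_eig(K1 * K2) > -1e-10       # product kernel: Schur product stays PSD
```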

  10. Kernel engineering: building PDKs
  ◮ for any polynomial φ with positive coefficients, from ℝ to ℝ: φ(k(s, t)) is a PDK
  ◮ if Ψ is a function from ℝ^d to ℝ^d: k(Ψ(s), Ψ(t)) is a PDK
  ◮ if ϕ from ℝ^d to ℝ⁺ has its minimum in 0: k(s, t) = ϕ(s + t) − ϕ(s − t) is a PDK
  ◮ the convolution of two positive kernels is a positive kernel: K₁ ⋆ K₂
  The Gaussian kernel is a PDK:
    \exp(-\|s - t\|^2) = \exp(-\|s\|^2 - \|t\|^2 + 2 s^\top t) = \exp(-\|s\|^2) \exp(-\|t\|^2) \exp(2 s^\top t)
  ◮ s^⊤ t is a PDK, and exp is the limit of a series expansion with positive coefficients, so exp(2 s^⊤ t) is a PDK
  ◮ exp(−‖s‖²) exp(−‖t‖²) is a PDK as a product kernel
  ◮ the product of two PDKs is a PDK
  (O. Catoni, master lecture, 2005)
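The factorization above follows from expanding ‖s − t‖² = ‖s‖² + ‖t‖² − 2 s^⊤ t inside the exponential; a one-line numerical check on random vectors:

```python
import numpy as np

rng = np.random.default_rng(3)
s, t = rng.normal(size=3), rng.normal(size=3)

lhs = np.exp(-np.sum((s - t) ** 2))                       # exp(-||s - t||^2)
rhs = np.exp(-s @ s) * np.exp(-t @ t) * np.exp(2 * (s @ t))
assert np.isclose(lhs, rhs)
```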

  11. Some examples of PD kernels
  type        name          k(s, t)
  radial      gaussian      exp(−r²/b), r = ‖s − t‖
  radial      laplacian     exp(−r/b)
  radial      rational      1 − r²/(r² + b)
  radial      loc. gauss.   max(0, 1 − r/(3b))^d exp(−r²/b)
  non stat.   χ²            exp(−r/b), r = Σ_k (s_k − t_k)²/(s_k + t_k)
  projective  polynomial    (s^⊤ t)^p
  projective  affine        (s^⊤ t + b)^p
  projective  cosine        s^⊤ t / (‖s‖ ‖t‖)
  projective  correlation   exp( s^⊤ t / (‖s‖ ‖t‖) − b )
  Most of these kernels depend on a quantity b called the bandwidth.
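A few entries of the table written out in Python as a minimal sketch; the bandwidth b and degree p are free parameters, and the function names simply mirror the table:

```python
import numpy as np

def gaussian(s, t, b=1.0):
    return np.exp(-np.sum((s - t) ** 2) / b)

def laplacian(s, t, b=1.0):
    return np.exp(-np.linalg.norm(s - t) / b)

def polynomial(s, t, p=2):
    return (s @ t) ** p

def cosine(s, t):
    return (s @ t) / (np.linalg.norm(s) * np.linalg.norm(t))

v = np.array([1.0, 2.0])
assert np.isclose(gaussian(v, v), 1.0)  # r = 0 when s = t
assert np.isclose(cosine(v, v), 1.0)    # a vector is fully aligned with itself
```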

  12. Kernels for objects and structures
  kernels on histograms and probability distributions:
    k(p, q) = \int k_i\big( p(x), q(x) \big) \, P(x) \, dx
  kernels on strings:
  ◮ spectral string kernel: k(s, t) = Σ_u φ_u(s) φ_u(t)
  ◮ using subsequences
  ◮ similarities by alignments: k(s, t) = Σ_π exp(β(s, t, π))
  kernels on graphs:
  ◮ the pseudo-inverse of the (regularized) graph Laplacian L = D − A, where A is the adjacency matrix and D the degree matrix
  ◮ diffusion kernels: (1/Z(b)) exp(bL)
  ◮ subgraph kernel convolution (using random walks)
  and kernels on heterogeneous data (images), HMMs, automata...
  (Shawe-Taylor & Cristianini's book, 2004; J.P. Vert, 2006)
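The spectral string kernel k(s, t) = Σ_u φ_u(s) φ_u(t), where φ_u(s) counts the occurrences in s of each length-p substring u, is simple to sketch; the choice p = 2 below is arbitrary:

```python
from collections import Counter

def spectrum(s, p):
    # phi_u(s): occurrence count of each length-p substring u of s
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))

def spectrum_kernel(s, t, p=2):
    phi_s, phi_t = spectrum(s, p), spectrum(t, p)
    return sum(phi_s[u] * phi_t[u] for u in phi_s)

# "abab" contains "ab" twice and "ba" once: k = 2*2 + 1*1 = 5
print(spectrum_kernel("abab", "abab"))  # prints 5
```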

  13. Roadmap (next: kernel and hypothesis set)

  14. From kernel to functions
    \mathcal{H}_0 = \Big\{ f \;\Big|\; f(x) = \sum_{j=1}^{m_f} f_j k(x, t_j) \;;\; m_f < \infty \;;\; f_j \in \mathbb{R} \;;\; t_j \in \mathcal{X} \Big\}
  Define the bilinear form (with g(x) = \sum_{i=1}^{m_g} g_i k(x, s_i)):
    \forall f, g \in \mathcal{H}_0, \quad \langle f, g \rangle_{\mathcal{H}_0} = \sum_{j=1}^{m_f} \sum_{i=1}^{m_g} f_j g_i k(t_j, s_i)
  Evaluation functional: ∀ x ∈ 𝒳, f(x) = ⟨f(·), k(x, ·)⟩_{H₀}
  From k to H: with any positive kernel, a hypothesis set H = closure(H₀) can be constructed, together with its metric.
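The evaluation functional is the reproducing property: for f = Σⱼ fⱼ k(·, tⱼ) and g = k(x, ·) (so m_g = 1, g₁ = 1, s₁ = x), the bilinear form gives ⟨f, k(x, ·)⟩ = Σⱼ fⱼ k(tⱼ, x) = f(x). A numerical sketch with a Gaussian kernel:

```python
import numpy as np

def k(s, t):
    return np.exp(-np.sum((s - t) ** 2))  # Gaussian kernel, b = 1

rng = np.random.default_rng(4)
T = rng.normal(size=(5, 2))  # centers t_j
f_coef = rng.normal(size=5)  # coefficients f_j

def f(x):
    # f(x) = sum_j f_j k(x, t_j), an element of H_0
    return sum(fj * k(x, tj) for fj, tj in zip(f_coef, T))

x = rng.normal(size=2)
# <f(.), k(x, .)>_{H0} = sum_j f_j * k(t_j, x)
inner = sum(fj * k(tj, x) for fj, tj in zip(f_coef, T))
assert np.isclose(f(x), inner)  # reproducing property
```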
