Scalable Learning in Reproducing Kernel Kreĭn Spaces
  1. Scalable Learning in Reproducing Kernel Kreĭn Spaces. Dino Oglic (1), Thomas Gärtner (2). (1) Department of Informatics, King's College London; (2) School of Computer Science, University of Nottingham. In Proceedings of the 36th International Conference on Machine Learning (ICML 2019).

  2. Learning in Reproducing Kernel Kreĭn Spaces: Motivation. In learning problems with structured data (e.g., time series, strings, graphs), it is relatively easy to devise a pairwise (dis)similarity function based on the intuition of a domain expert. To find an optimal hypothesis with standard kernel methods, positive definiteness of the kernel/similarity function needs to be established, yet a large number of pairwise (dis)similarity functions devised by experts are indefinite (e.g., edit distances for strings and graphs, dynamic time warping, Wasserstein and Hausdorff distances). Goal: scalable kernel methods for learning with any notion of (dis)similarity between instances. Kreĭn space (Bognár, 1974; Azizov & Iokhvidov, 1981): a vector space $\mathcal{K}$ with a bilinear form $\langle \cdot, \cdot \rangle_{\mathcal{K}}$ is called a Kreĭn space if it admits a decomposition into a direct sum $\mathcal{K} = \mathcal{H}_+ \oplus \mathcal{H}_-$ of $\langle \cdot, \cdot \rangle_{\mathcal{K}}$-orthogonal Hilbert spaces $\mathcal{H}_\pm$ such that $\langle \cdot, \cdot \rangle_{\mathcal{K}}$ can be written as $\langle f, g \rangle_{\mathcal{K}} = \langle f_+, g_+ \rangle_{\mathcal{H}_+} - \langle f_-, g_- \rangle_{\mathcal{H}_-}$, where $\mathcal{H}_\pm$ are endowed with inner products $\langle \cdot, \cdot \rangle_{\mathcal{H}_\pm}$, $f = f_+ \oplus f_-$, $g = g_+ \oplus g_-$, and $f_\pm, g_\pm \in \mathcal{H}_\pm$.
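To make the definition concrete, here is a minimal sketch (an illustration, not code from the paper; the function names and bandwidth values are assumptions): an indefinite kernel built as the difference of two Gaussian kernels, so each summand reproduces in one of the Hilbert components $\mathcal{H}_\pm$.

```python
# Minimal sketch (illustrative, not from the paper): a Kreĭn kernel as the
# difference of two positive definite kernels, mirroring K = H_+ (+) H_- and
# <f, g>_K = <f_+, g_+>_{H_+} - <f_-, g_->_{H_-}.
import numpy as np

def rbf(X, Y, gamma):
    """Positive definite Gaussian kernel matrix between rows of X and Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def krein_kernel(X, Y, gamma_pos=1.0, gamma_neg=0.1):
    # Each summand reproduces in a Hilbert space (H_+ resp. H_-), so the
    # difference reproduces in the Kreĭn space H_+ (+) H_-.
    return rbf(X, Y, gamma_pos) - rbf(X, Y, gamma_neg)

X = np.random.default_rng(0).normal(size=(50, 3))
K = krein_kernel(X, X)
print(np.linalg.eigvalsh(K).min())  # typically negative: the kernel is indefinite
```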

  3. Learning in Reproducing Kernel Kreĭn Spaces: Overview. Associated Hilbert space: for a decomposition $\mathcal{K} = \mathcal{H}_+ \oplus \mathcal{H}_-$, the Hilbert space $\mathcal{H}_{\mathcal{K}} = \mathcal{H}_+ \oplus \mathcal{H}_-$ endowed with the inner product $\langle f, g \rangle_{\mathcal{H}_{\mathcal{K}}} = \langle f_+, g_+ \rangle_{\mathcal{H}_+} + \langle f_-, g_- \rangle_{\mathcal{H}_-}$ ($f_\pm, g_\pm \in \mathcal{H}_\pm$) can be associated with $\mathcal{K}$. All the norms $\| \cdot \|_{\mathcal{H}_{\mathcal{K}}}$ generated by different decompositions of $\mathcal{K}$ into direct sums of Hilbert spaces are topologically equivalent (Langer, 1962); the topology on $\mathcal{K}$ defined by the norm of an associated Hilbert space is called the strong topology on $\mathcal{K}$. As $\langle f, f \rangle_{\mathcal{K}} = \| f_+ \|_{\mathcal{H}_+}^2 - \| f_- \|_{\mathcal{H}_-}^2$ and $\exists f \in \mathcal{K} \colon \langle f, f \rangle_{\mathcal{K}} < 0$, the bilinear form does not induce a norm on a reproducing kernel Kreĭn space $\mathcal{K}$. The complexity of hypotheses can instead be penalized via the decomposition components $\mathcal{H}_\pm$ and the strong topology. Scalability: computational and space complexities are often quadratic in the number of instances, and in several approaches the computational complexity is cubic.
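The associated Hilbert space has a simple finite-sample analogue: splitting an indefinite kernel matrix by the sign of its eigenvalues gives empirical counterparts of $\mathcal{H}_+$ and $\mathcal{H}_-$, and their sum is the positive semidefinite matrix on which a strong-topology penalty can operate. A hedged sketch, with my own naming:

```python
# Finite-sample sketch (illustrative): split an indefinite kernel matrix into
# empirical analogues of the H_+ and H_- components; their sum plays the role
# of the associated Hilbert space H_K used for strong-topology regularization.
import numpy as np

def associated_decomposition(M):
    lam, U = np.linalg.eigh((M + M.T) / 2)        # symmetrize for stability
    M_pos = (U * np.clip(lam, 0, None)) @ U.T     # H_+ component
    M_neg = (U * np.clip(-lam, 0, None)) @ U.T    # H_- component
    M_assoc = M_pos + M_neg                       # U |Lambda| U^T, PSD
    return M_pos, M_neg, M_assoc

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 50))
M = (A + A.T) / 2                                 # any symmetric indefinite matrix
M_pos, M_neg, M_assoc = associated_decomposition(M)
assert np.allclose(M, M_pos - M_neg)              # Kreĭn bilinear form: a difference
```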

  4. Nyström Method for Indefinite Kernels: Overview. $\mathcal{X}$ is an instance space; $X = \{ x_1, \ldots, x_n \}$ is an independent sample from a probability measure defined on $\mathcal{X}$; $k \colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a reproducing Kreĭn kernel with $k(x, x') = \langle k(x, \cdot), k(x', \cdot) \rangle_{\mathcal{K}}$. [Figure: evaluation functionals $k(x_p, \cdot)$, $k(x_q, \cdot)$, $k(x_r, \cdot)$, $k(x_s, \cdot)$, $k(x_u, \cdot)$, $k(x_v, \cdot)$, $k(x_w, \cdot)$ shown as points in feature space.]

  5. Nyström Method for Indefinite Kernels: Landmarks. $Z = \{ z_1, \ldots, z_m \}$ is a set of landmarks (not necessarily a subset of $X$). [Figure: the same feature-space view with landmark functionals $k(z_1, \cdot)$, $k(z_2, \cdot)$, $k(z_3, \cdot)$ placed among the sample points.]

  6. Nyström Method for Indefinite Kernels: Projections onto $\mathcal{L}_Z = \operatorname{span}(\{ k(z_1, \cdot), \ldots, k(z_m, \cdot) \})$. For a given set of landmarks $Z$, the Nyström method approximates the kernel matrix $K$ with a low-rank matrix $\tilde{K}$ given by $\tilde{K}_{ij} = \langle \tilde{k}(x_i, \cdot), \tilde{k}(x_j, \cdot) \rangle_{\mathcal{K}}$, where $k(x, \cdot) = \tilde{k}(x, \cdot) + k_\perp(x, \cdot)$ with $\tilde{k}(x, \cdot) = \sum_{i=1}^{m} \alpha_{i,x} \, k(z_i, \cdot)$ and $\langle k_\perp(x, \cdot), \mathcal{L}_Z \rangle_{\mathcal{K}} = 0$. In matrix form, $\tilde{K} = K_{n,m} K_{m,m}^{-1} K_{m,n} = \tilde{U}_m \Sigma_m \tilde{U}_m^\top$ with $\tilde{U}_m = K_{n,m} U_m \Sigma_m^{-1}$, where $K_{m,m} = U_m \Sigma_m U_m^\top$ is the eigendecomposition of the landmark kernel matrix. [Figure: the projections $\tilde{k}(x_p, \cdot)$, $\tilde{k}(x_q, \cdot)$, $\tilde{k}(x_r, \cdot)$, $\tilde{k}(x_s, \cdot)$ of the evaluation functionals onto $\mathcal{L}_Z$.]
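In code, the approximation reduces to an eigendecomposition of the landmark matrix followed by two matrix products. The sketch below is my illustration (not the paper's reference implementation); it inverts $K_{m,m}$ through its eigendecomposition, dropping near-zero eigenvalues as a pseudo-inverse:

```python
# Hedged sketch (not the paper's reference implementation): the Nyström
# approximation K~ = K_nm K_mm^{-1} K_mn for an indefinite kernel, computed
# through the eigendecomposition K_mm = U_m Sigma_m U_m^T.
import numpy as np

def nystroem_indefinite(kernel, X, Z, eps=1e-10):
    K_nm = kernel(X, Z)                          # n x m cross-kernel matrix
    K_mm = kernel(Z, Z)                          # m x m landmark matrix (indefinite)
    lam, U_m = np.linalg.eigh((K_mm + K_mm.T) / 2)
    keep = np.abs(lam) > eps                     # pseudo-inverse: drop null directions
    lam, U_m = lam[keep], U_m[:, keep]
    U_tilde = K_nm @ U_m / lam                   # U~ = K_nm U_m Sigma_m^{-1}
    return (U_tilde * lam) @ U_tilde.T           # K~ = U~ Sigma_m U~^T

rng = np.random.default_rng(1)
Z = X[rng.choice(len(X), size=10, replace=False)]   # X, K, krein_kernel as above
K_tilde = nystroem_indefinite(krein_kernel, X, Z)
print(np.linalg.norm(K - K_tilde))               # approximation error
```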

  7. Scalable Learning in Reproducing Kernel Kreĭn Spaces: Contributions. First mathematically complete derivation of the Nyström method for indefinite kernels. An approach for efficient low-rank eigendecomposition of indefinite kernel matrices. Two effective landmark selection strategies for the Nyström method with indefinite kernels. Nyström-based scalable least squares methods for learning in reproducing kernel Kreĭn spaces. Nyström-based scalable support vector machine for learning in reproducing kernel Kreĭn spaces. Effective regularization via decomposition components $\mathcal{H}_\pm$ and the strong topology. Python package for learning in reproducing kernel Kreĭn spaces (in preparation, early version available upon request).
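As a rough indication of how these pieces combine into a scalable learner, here is one plausible instantiation of Nyström-based least squares (my sketch under stated assumptions, not the paper's algorithm): Nyström features rescaled by $|\Sigma_m|^{-1/2}$, i.e., the associated Hilbert-space geometry, followed by ordinary ridge regression so that the penalty acts through the strong topology.

```python
# One plausible instantiation (illustrative, not the paper's algorithm):
# Nyström features scaled by |Sigma_m|^{-1/2}, i.e., the associated
# Hilbert-space geometry, followed by ridge-regularized least squares.
import numpy as np

def nystroem_features(kernel, X, Z, eps=1e-10):
    lam, U_m = np.linalg.eigh(kernel(Z, Z))
    keep = np.abs(lam) > eps
    lam, U_m = lam[keep], U_m[:, keep]
    # psi(x) = |Sigma_m|^{-1/2} U_m^T k_Z(x); the absolute value encodes the
    # associated Hilbert space, so ||w||^2 acts as a strong-topology penalty.
    return kernel(X, Z) @ U_m / np.sqrt(np.abs(lam))

def ridge_fit(Phi, y, reg=1e-2):
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + reg * np.eye(d), Phi.T @ y)

Phi = nystroem_features(krein_kernel, X, Z)      # X, Z, krein_kernel as above
y = np.sin(X[:, 0])                              # toy regression targets
w = ridge_fit(Phi, y)                            # predictions: Phi @ w
```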
