Slide 1

Scalable Learning in Reproducing Kernel Kreĭn Spaces

Dino Oglic¹, Thomas Gärtner²

¹ Department of Informatics, King’s College London; ² School of Computer Science, University of Nottingham

In Proceedings of the 36th International Conference on Machine Learning (ICML 2019)

Slide 2

Learning in Reproducing Kernel Kreĭn Spaces

Motivation

- In learning problems with structured data (e.g., time series, strings, graphs), it is relatively easy to devise a pairwise (dis)similarity function based on the intuition of a domain expert.
- To find an optimal hypothesis with standard kernel methods, positive definiteness of the kernel/similarity function needs to be established.
- A large number of pairwise (dis)similarity functions devised by experts are indefinite (e.g., edit distances for strings and graphs, the dynamic time-warping algorithm, and the Wasserstein and Hausdorff distances).

Goal: Scalable kernel methods for learning with any notion of (dis)similarity between instances.

Kreĭn Space (Bognár, 1974; Azizov & Iokhvidov, 1981). A vector space $\mathcal{K}$ with a bilinear form $\langle \cdot, \cdot \rangle_{\mathcal{K}}$ is called a Kreĭn space if it admits a decomposition into a direct sum $\mathcal{K} = \mathcal{H}_+ \oplus \mathcal{H}_-$ of $\langle \cdot, \cdot \rangle_{\mathcal{K}}$-orthogonal Hilbert spaces $\mathcal{H}_\pm$ such that $\langle \cdot, \cdot \rangle_{\mathcal{K}}$ can be written as

$$\langle f, g \rangle_{\mathcal{K}} = \langle f_+, g_+ \rangle_{\mathcal{H}_+} - \langle f_-, g_- \rangle_{\mathcal{H}_-},$$

where $\mathcal{H}_\pm$ are endowed with inner products $\langle \cdot, \cdot \rangle_{\mathcal{H}_\pm}$, $f = f_+ \oplus f_-$, $g = g_+ \oplus g_-$, and $f_\pm, g_\pm \in \mathcal{H}_\pm$.
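To make the sign structure concrete, here is a minimal NumPy sketch (illustrative, not from the paper or its package) of the indefinite bilinear form on a toy finite-dimensional Kreĭn space, with $f$ and $g$ represented by their components in $\mathcal{H}_\pm$; all names are assumptions of this sketch.

```python
import numpy as np

def krein_inner(f_pos, f_neg, g_pos, g_neg):
    """Indefinite bilinear form <f, g>_K = <f+, g+>_{H+} - <f-, g->_{H-}.

    H+ and H- are modelled as finite-dimensional Euclidean spaces and
    f = f+ (+) f-, g = g+ (+) g- are given by their components.
    """
    return np.dot(f_pos, g_pos) - np.dot(f_neg, g_neg)

# The "squared norm" of f under the indefinite form can be negative:
f_pos, f_neg = np.array([1.0]), np.array([2.0])
print(krein_inner(f_pos, f_neg, f_pos, f_neg))  # 1 - 4 = -3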


Slide 3

Learning in Reproducing Kernel Kreĭn Spaces

Overview

Associated Hilbert Space. For a decomposition $\mathcal{K} = \mathcal{H}_+ \oplus \mathcal{H}_-$, the Hilbert space $\mathcal{H}_{\mathcal{K}} = \mathcal{H}_+ \oplus \mathcal{H}_-$ endowed with the inner product

$$\langle f, g \rangle_{\mathcal{H}_{\mathcal{K}}} = \langle f_+, g_+ \rangle_{\mathcal{H}_+} + \langle f_-, g_- \rangle_{\mathcal{H}_-} \qquad (f_\pm, g_\pm \in \mathcal{H}_\pm)$$

can be associated with $\mathcal{K}$.

- All the norms $\|\cdot\|_{\mathcal{H}_{\mathcal{K}}}$ generated by different decompositions of $\mathcal{K}$ into direct sums of Hilbert spaces are topologically equivalent (Langer, 1962).
- The topology on $\mathcal{K}$ defined by the norm of an associated Hilbert space is called the strong topology on $\mathcal{K}$.
- As $\exists f \in \mathcal{K} \colon \langle f, f \rangle_{\mathcal{K}} < 0$, the form $\langle f, f \rangle_{\mathcal{K}} = \|f_+\|^2_{\mathcal{H}_+} - \|f_-\|^2_{\mathcal{H}_-}$ does not induce a norm on a reproducing kernel Kreĭn space $\mathcal{K}$.
- The complexity of hypotheses can be penalized via the decomposition components $\mathcal{H}_\pm$ and the strong topology (see the sketch below).

Scalability! Computational and space complexities are often quadratic in the number of instances, and in several approaches the computational complexity is cubic.
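Under the same toy finite-dimensional representation as above, a minimal sketch of the associated Hilbert-space inner product and the norm it induces; the flipped sign on the $\mathcal{H}_-$ term is what turns the indefinite form into a valid norm usable for regularization. Function names are illustrative assumptions, not the paper's API.

```python
import numpy as np

def associated_inner(f_pos, f_neg, g_pos, g_neg):
    """Inner product of the associated Hilbert space H_K:
    <f, g>_{H_K} = <f+, g+>_{H+} + <f-, g->_{H-} (note the + sign)."""
    return np.dot(f_pos, g_pos) + np.dot(f_neg, g_neg)

def strong_norm(f_pos, f_neg):
    """Norm induced by an associated Hilbert space; it defines the strong
    topology and can penalize hypothesis complexity in both components."""
    return np.sqrt(associated_inner(f_pos, f_neg, f_pos, f_neg))

f_pos, f_neg = np.array([1.0]), np.array([2.0])
print(strong_norm(f_pos, f_neg))  # sqrt(1 + 4) ~ 2.236, a proper norm
```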


Slide 4

Nyström Method for Indefinite Kernels

Overview

- $\mathcal{X}$ is an instance space
- $X = \{x_1, \ldots, x_n\}$ is an independent sample from a probability measure defined on $\mathcal{X}$
- $k \colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a reproducing Kreĭn kernel with $k(x, x') = \langle k(x, \cdot), k(x', \cdot) \rangle_{\mathcal{K}}$

[Figure: the sample represented through its evaluation functionals $k(x_p, \cdot), k(x_q, \cdot), k(x_r, \cdot), k(x_s, \cdot), k(x_u, \cdot), k(x_v, \cdot), k(x_w, \cdot)$]
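As a concrete stand-in for an expert-designed indefinite similarity, the sketch below (illustrative, not from the paper) builds the kernel matrix of the sigmoid kernel, which is not positive definite in general, and checks that eigenvalues of both signs appear.

```python
import numpy as np

def sigmoid_kernel(A, B, a=1.0, b=-1.0):
    """Sigmoid kernel k(x, x') = tanh(a <x, x'> + b); indefinite in
    general, a stand-in for expert-designed (dis)similarity functions."""
    return np.tanh(a * A @ B.T + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # sample x_1, ..., x_n
K = sigmoid_kernel(X, X)             # n x n kernel matrix

# Indefiniteness shows up as eigenvalues of both signs:
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min(), eigvals.max())  # typically min < 0 < max
```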


Slide 5

Nyström Method for Indefinite Kernels

Landmarks

$Z = \{z_1, \ldots, z_m\}$ is a set of landmarks (not necessarily a subset of $X$)

[Figure: sample functionals $k(x_p, \cdot), k(x_q, \cdot), k(x_r, \cdot), k(x_s, \cdot)$ together with landmark functionals $k(z_1, \cdot), k(z_2, \cdot), k(z_3, \cdot)$]
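A minimal sketch of the simplest baseline, uniform subsampling of landmarks from the training set; the paper's two proposed selection strategies are more effective, and in general landmarks need not be elements of $X$ at all. Names here are illustrative.

```python
import numpy as np

def uniform_landmarks(X, m, rng):
    """Draw m landmarks uniformly without replacement from the sample X
    (the simplest baseline; the paper studies more effective strategies)."""
    idx = rng.choice(len(X), size=m, replace=False)
    return X[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Z = uniform_landmarks(X, m=20, rng=rng)  # landmark set Z = {z_1, ..., z_m}
```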


Slide 6

Nyström Method for Indefinite Kernels

Projections onto $\mathcal{L}_Z = \operatorname{span}(\{k(z_1, \cdot), \ldots, k(z_m, \cdot)\})$

For a given set of landmarks $Z$, the Nyström method approximates the kernel matrix $K$ with a low-rank matrix $\tilde{K}$ given by

$$\tilde{K}_{ij} = \tilde{k}(x_i, x_j) = \langle \tilde{k}(x_i, \cdot), \tilde{k}(x_j, \cdot) \rangle_{\mathcal{K}},$$

where $k(x, \cdot) = \tilde{k}(x, \cdot) + k_\perp(x, \cdot)$ with

$$\tilde{k}(x, \cdot) = \sum_{i=1}^{m} \alpha_{i,x}\, k(z_i, \cdot) \quad \wedge \quad \langle k_\perp(x, \cdot), \mathcal{L}_Z \rangle_{\mathcal{K}} = 0.$$

[Figure: sample functionals $k(x_p, \cdot), \ldots, k(x_s, \cdot)$ and their projections $\tilde{k}(x_p, \cdot), \ldots, \tilde{k}(x_s, \cdot)$ onto the span of $\tilde{k}(z_1, \cdot), \tilde{k}(z_2, \cdot), \tilde{k}(z_3, \cdot)$]

$$\tilde{K} = K_{n,m} K_{m,m}^{-1} K_{m,n} = \tilde{U}_m \tilde{\Lambda}_m \tilde{U}_m^{\top} \quad \text{with} \quad \tilde{U}_m^{\top} \tilde{U}_m = I_m$$
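The factorization above can be obtained from an $m \times m$ eigenproblem. The sketch below (an illustrative reconstruction under assumed names, not the paper's reference implementation) computes the low-rank eigendecomposition of $\tilde{K}$ via a thin QR factorization of $K_{n,m}$ followed by an eigendecomposition of the $m \times m$ core matrix; this route works for indefinite $K_{m,m}$, where a Cholesky-based approach would fail, and yields orthonormal $\tilde{U}_m$ with possibly negative eigenvalues in $\tilde{\Lambda}_m$.

```python
import numpy as np

def nystroem_eig(X, Z, kernel):
    """Low-rank eigendecomposition of the Nystroem approximation
    K_tilde = K_nm K_mm^{-1} K_mn for a (possibly indefinite) kernel.
    Returns U_m (n x m, orthonormal columns) and eigenvalues lam_m
    such that K_tilde = U_m diag(lam_m) U_m^T."""
    K_nm = kernel(X, Z)            # n x m cross-kernel matrix
    K_mm = kernel(Z, Z)            # m x m landmark kernel matrix
    Q, R = np.linalg.qr(K_nm)      # thin QR: Q is n x m, R is m x m
    # m x m core matrix; pinv tolerates indefinite/ill-conditioned K_mm
    S = R @ np.linalg.pinv(K_mm) @ R.T
    S = (S + S.T) / 2              # symmetrize for numerical stability
    lam_m, U = np.linalg.eigh(S)   # eigenvalues may be of both signs
    return Q @ U, lam_m

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Z = X[rng.choice(200, size=20, replace=False)]
kernel = lambda A, B: np.tanh(A @ B.T - 1.0)  # indefinite sigmoid kernel
U_m, lam_m = nystroem_eig(X, Z, kernel)
K_tilde = U_m @ np.diag(lam_m) @ U_m.T        # rank-m approximation of K
```

With $m \ll n$ this costs $O(nm^2)$ time and $O(nm)$ space, in contrast to the quadratic and cubic costs noted on Slide 3.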


Slide 7

Scalable Learning in Reproducing Kernel Kreĭn Spaces

Contributions

- First mathematically complete derivation of the Nyström method for indefinite kernels
- An approach for efficient low-rank eigendecomposition of indefinite kernel matrices
- Two effective landmark selection strategies for the Nyström method with indefinite kernels
- Nyström-based scalable least squares methods for learning in reproducing kernel Kreĭn spaces
- Nyström-based scalable support vector machine for learning in reproducing kernel Kreĭn spaces
- Effective regularization via decomposition components $\mathcal{H}_\pm$ and the strong topology
- Python package for learning in reproducing kernel Kreĭn spaces (in preparation, early version available upon request)
