Slide 1

Scalable Learning in Reproducing Kernel Kreĭn Spaces

Dino Oglic¹, Thomas Gärtner²

¹ Department of Informatics, King’s College London; ² School of Computer Science, University of Nottingham

In Proceedings of the 36th International Conference on Machine Learning (ICML 2019)

Slide 2

Learning in Reproducing Kernel Kreĭn Spaces

Motivation

- In learning problems with structured data (e.g., time series, strings, graphs), it is relatively easy to devise a pairwise (dis)similarity function based on the intuition of a domain expert.
- To find an optimal hypothesis with standard kernel methods, positive definiteness of the kernel/similarity function needs to be established.
- A large number of pairwise (dis)similarity functions devised by experts are indefinite (e.g., edit distances for strings and graphs, the dynamic time-warping algorithm, and the Wasserstein and Hausdorff distances).

Goal: Scalable kernel methods for learning with any notion of (dis)similarity between instances.

Kreĭn Space (Bognár, 1974; Azizov & Iokhvidov, 1981). A vector space $\mathcal{K}$ with a bilinear form $\langle \cdot, \cdot \rangle_{\mathcal{K}}$ is called a Kreĭn space if it admits a decomposition into a direct sum $\mathcal{K} = \mathcal{H}_+ \oplus \mathcal{H}_-$ of $\langle \cdot, \cdot \rangle_{\mathcal{K}}$-orthogonal Hilbert spaces $\mathcal{H}_\pm$ such that $\langle \cdot, \cdot \rangle_{\mathcal{K}}$ can be written as

$$\langle f, g \rangle_{\mathcal{K}} = \langle f_+, g_+ \rangle_{\mathcal{H}_+} - \langle f_-, g_- \rangle_{\mathcal{H}_-},$$

where $\mathcal{H}_\pm$ are endowed with inner products $\langle \cdot, \cdot \rangle_{\mathcal{H}_\pm}$, $f = f_+ \oplus f_-$, $g = g_+ \oplus g_-$, and $f_\pm, g_\pm \in \mathcal{H}_\pm$.
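To make the sign structure concrete, here is a minimal NumPy sketch (illustrative, not from the paper or its package) of the indefinite bilinear form on a toy finite-dimensional Kreĭn space, with $f$ and $g$ represented by their components in $\mathcal{H}_\pm$; all names are assumptions of this sketch.

```python
import numpy as np

def krein_inner(f_pos, f_neg, g_pos, g_neg):
    """Indefinite bilinear form <f, g>_K = <f+, g+>_{H+} - <f-, g->_{H-}.

    H+ and H- are modelled as finite-dimensional Euclidean spaces and
    f = f+ (+) f-, g = g+ (+) g- are given by their components.
    """
    return np.dot(f_pos, g_pos) - np.dot(f_neg, g_neg)

# The "squared norm" of f under the indefinite form can be negative:
f_pos, f_neg = np.array([1.0]), np.array([2.0])
print(krein_inner(f_pos, f_neg, f_pos, f_neg))  # 1 - 4 = -3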


Slide 3

Learning in Reproducing Kernel Kreĭn Spaces

Overview

Associated Hilbert Space. For a decomposition $\mathcal{K} = \mathcal{H}_+ \oplus \mathcal{H}_-$, the Hilbert space $\mathcal{H}_{\mathcal{K}} = \mathcal{H}_+ \oplus \mathcal{H}_-$ endowed with the inner product

$$\langle f, g \rangle_{\mathcal{H}_{\mathcal{K}}} = \langle f_+, g_+ \rangle_{\mathcal{H}_+} + \langle f_-, g_- \rangle_{\mathcal{H}_-} \qquad (f_\pm, g_\pm \in \mathcal{H}_\pm)$$

can be associated with $\mathcal{K}$.

- All the norms $\|\cdot\|_{\mathcal{H}_{\mathcal{K}}}$ generated by different decompositions of $\mathcal{K}$ into direct sums of Hilbert spaces are topologically equivalent (Langer, 1962).
- The topology on $\mathcal{K}$ defined by the norm of an associated Hilbert space is called the strong topology on $\mathcal{K}$.
- As $\exists f \in \mathcal{K} \colon \langle f, f \rangle_{\mathcal{K}} < 0$, the form $\langle f, f \rangle_{\mathcal{K}} = \|f_+\|^2_{\mathcal{H}_+} - \|f_-\|^2_{\mathcal{H}_-}$ does not induce a norm on a reproducing kernel Kreĭn space $\mathcal{K}$.
- The complexity of hypotheses can be penalized via the decomposition components $\mathcal{H}_\pm$ and the strong topology (see the sketch below).

Scalability! Computational and space complexities are often quadratic in the number of instances, and in several approaches the computational complexity is cubic.
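Under the same toy finite-dimensional representation as above, a minimal sketch of the associated Hilbert-space inner product and the norm it induces; the flipped sign on the $\mathcal{H}_-$ term is what turns the indefinite form into a valid norm usable for regularization. Function names are illustrative assumptions, not the paper's API.

```python
import numpy as np

def associated_inner(f_pos, f_neg, g_pos, g_neg):
    """Inner product of the associated Hilbert space H_K:
    <f, g>_{H_K} = <f+, g+>_{H+} + <f-, g->_{H-} (note the + sign)."""
    return np.dot(f_pos, g_pos) + np.dot(f_neg, g_neg)

def strong_norm(f_pos, f_neg):
    """Norm induced by an associated Hilbert space; it defines the strong
    topology and can penalize hypothesis complexity in both components."""
    return np.sqrt(associated_inner(f_pos, f_neg, f_pos, f_neg))

f_pos, f_neg = np.array([1.0]), np.array([2.0])
print(strong_norm(f_pos, f_neg))  # sqrt(1 + 4) ~ 2.236, a proper norm
```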


Slide 4

Nyström Method for Indefinite Kernels

Overview

- $\mathcal{X}$ is an instance space
- $X = \{x_1, \ldots, x_n\}$ is an independent sample from a probability measure defined on $\mathcal{X}$
- $k \colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a reproducing Kreĭn kernel with $k(x, x') = \langle k(x, \cdot), k(x', \cdot) \rangle_{\mathcal{K}}$

[Figure: the sample represented through its evaluation functionals $k(x_p, \cdot), k(x_q, \cdot), k(x_r, \cdot), k(x_s, \cdot), k(x_u, \cdot), k(x_v, \cdot), k(x_w, \cdot)$]
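As a concrete stand-in for an expert-designed indefinite similarity, the sketch below (illustrative, not from the paper) builds the kernel matrix of the sigmoid kernel, which is not positive definite in general, and checks that eigenvalues of both signs appear.

```python
import numpy as np

def sigmoid_kernel(A, B, a=1.0, b=-1.0):
    """Sigmoid kernel k(x, x') = tanh(a <x, x'> + b); indefinite in
    general, a stand-in for expert-designed (dis)similarity functions."""
    return np.tanh(a * A @ B.T + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # sample x_1, ..., x_n
K = sigmoid_kernel(X, X)             # n x n kernel matrix

# Indefiniteness shows up as eigenvalues of both signs:
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min(), eigvals.max())  # typically min < 0 < max
```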


Slide 5

Nyström Method for Indefinite Kernels

Landmarks

$Z = \{z_1, \ldots, z_m\}$ is a set of landmarks (not necessarily a subset of $X$)

[Figure: sample functionals $k(x_p, \cdot), k(x_q, \cdot), k(x_r, \cdot), k(x_s, \cdot)$ together with landmark functionals $k(z_1, \cdot), k(z_2, \cdot), k(z_3, \cdot)$]
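A minimal sketch of the simplest baseline, uniform subsampling of landmarks from the training set; the paper's two proposed selection strategies are more effective, and in general landmarks need not be elements of $X$ at all. Names here are illustrative.

```python
import numpy as np

def uniform_landmarks(X, m, rng):
    """Draw m landmarks uniformly without replacement from the sample X
    (the simplest baseline; the paper studies more effective strategies)."""
    idx = rng.choice(len(X), size=m, replace=False)
    return X[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Z = uniform_landmarks(X, m=20, rng=rng)  # landmark set Z = {z_1, ..., z_m}
```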


Slide 6

Nyström Method for Indefinite Kernels

Projections onto $\mathcal{L}_Z = \operatorname{span}(\{k(z_1, \cdot), \ldots, k(z_m, \cdot)\})$

For a given set of landmarks $Z$, the Nyström method approximates the kernel matrix $K$ with a low-rank matrix $\tilde{K}$ given by

$$\tilde{K}_{ij} = \tilde{k}(x_i, x_j) = \langle \tilde{k}(x_i, \cdot), \tilde{k}(x_j, \cdot) \rangle_{\mathcal{K}},$$

where $k(x, \cdot) = \tilde{k}(x, \cdot) + k_\perp(x, \cdot)$ with

$$\tilde{k}(x, \cdot) = \sum_{i=1}^{m} \alpha_{i,x}\, k(z_i, \cdot) \quad \wedge \quad \langle k_\perp(x, \cdot), \mathcal{L}_Z \rangle_{\mathcal{K}} = 0.$$

[Figure: sample functionals $k(x_p, \cdot), \ldots, k(x_s, \cdot)$ and their projections $\tilde{k}(x_p, \cdot), \ldots, \tilde{k}(x_s, \cdot)$ onto the span of $\tilde{k}(z_1, \cdot), \tilde{k}(z_2, \cdot), \tilde{k}(z_3, \cdot)$]

$$\tilde{K} = K_{n,m} K_{m,m}^{-1} K_{m,n} = \tilde{U}_m \tilde{\Lambda}_m \tilde{U}_m^{\top} \quad \text{with} \quad \tilde{U}_m^{\top} \tilde{U}_m = I_m$$
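The factorization above can be obtained from an $m \times m$ eigenproblem. The sketch below (an illustrative reconstruction under assumed names, not the paper's reference implementation) computes the low-rank eigendecomposition of $\tilde{K}$ via a thin QR factorization of $K_{n,m}$ followed by an eigendecomposition of the $m \times m$ core matrix; this route works for indefinite $K_{m,m}$, where a Cholesky-based approach would fail, and yields orthonormal $\tilde{U}_m$ with possibly negative eigenvalues in $\tilde{\Lambda}_m$.

```python
import numpy as np

def nystroem_eig(X, Z, kernel):
    """Low-rank eigendecomposition of the Nystroem approximation
    K_tilde = K_nm K_mm^{-1} K_mn for a (possibly indefinite) kernel.
    Returns U_m (n x m, orthonormal columns) and eigenvalues lam_m
    such that K_tilde = U_m diag(lam_m) U_m^T."""
    K_nm = kernel(X, Z)            # n x m cross-kernel matrix
    K_mm = kernel(Z, Z)            # m x m landmark kernel matrix
    Q, R = np.linalg.qr(K_nm)      # thin QR: Q is n x m, R is m x m
    # m x m core matrix; pinv tolerates indefinite/ill-conditioned K_mm
    S = R @ np.linalg.pinv(K_mm) @ R.T
    S = (S + S.T) / 2              # symmetrize for numerical stability
    lam_m, U = np.linalg.eigh(S)   # eigenvalues may be of both signs
    return Q @ U, lam_m

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Z = X[rng.choice(200, size=20, replace=False)]
kernel = lambda A, B: np.tanh(A @ B.T - 1.0)  # indefinite sigmoid kernel
U_m, lam_m = nystroem_eig(X, Z, kernel)
K_tilde = U_m @ np.diag(lam_m) @ U_m.T        # rank-m approximation of K
```

With $m \ll n$ this costs $O(nm^2)$ time and $O(nm)$ space, in contrast to the quadratic and cubic costs noted on Slide 3.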


Slide 7

Scalable Learning in Reproducing Kernel Kreĭn Spaces

Contributions

- First mathematically complete derivation of the Nyström method for indefinite kernels
- An approach for efficient low-rank eigendecomposition of indefinite kernel matrices
- Two effective landmark selection strategies for the Nyström method with indefinite kernels
- Nyström-based scalable least squares methods for learning in reproducing kernel Kreĭn spaces
- Nyström-based scalable support vector machine for learning in reproducing kernel Kreĭn spaces
- Effective regularization via decomposition components $\mathcal{H}_\pm$ and the strong topology
- Python package for learning in reproducing kernel Kreĭn spaces (in preparation, early version available upon request)
