Skoltech Skolkovo Institute of Science and Technology Kernel - PowerPoint PPT Presentation

Quadrature-based Features for Kernel Approximation
Marina Munkhoeva, Yermek Kapushev, Evgeny Burnaev, Ivan Oseledets
Skoltech (Skolkovo Institute of Science and Technology)

Kernel Methods Refresher
• Kernel trick: compute $\langle \psi(x), \psi(z) \rangle$ via a kernel function, $k(x, z) = \langle \psi(x), \psi(z) \rangle$
• Inner product in an implicit space using input features
• Naively, kernel methods scale poorly with the number of samples
[Figure: the map $\psi$ takes the input space to the feature space]

Scalable Kernel Methods
• Revert the trick: $k(x, z) \approx \phi(x)^\top \phi(z)$
• Use linear methods with mapped objects $x \to \phi(x)$
• How to generate the approximate mapping $\phi(\cdot)$?
[Figure: input space and feature space, with $k(x, y) = \langle \psi(x), \psi(y) \rangle \approx \phi(x)^\top \phi(y)$]

Kernel Function Approximation
Consider kernels that allow an integral representation:
$$k(x, y) = \mathbb{E}_{p(w)} f_{xy}(w) = \int_{\mathbb{R}^d} f_{xy}(w)\, p(w)\, dw = I(f),$$
$$f_{xy}(w) = \phi(w^\top x)\, \phi(w^\top y) = f(w), \qquad p(w) = (2\pi)^{-d/2} e^{-\|w\|^2/2}$$
• Shift-invariant kernels (e.g. the radial basis function (RBF) kernel)
• Pointwise Nonlinear Gaussian kernels (e.g. arc-cosine kernels)

Random Fourier Features (RFF) [Rahimi and Recht, 2008]
RFF mapping $\phi(\cdot)$: $k(x, z) = \mathbb{E}[\phi_w(x)^\top \phi_w(z)]$, with $\phi_w(x) = [\cos(w^\top x), \sin(w^\top x)]$ and $w \sim p(w)$
RFF are a Monte Carlo approximation of $I(f)$
• Orthogonal points $w$: more accurate
• Structured $w$: faster
• Orthogonal + structured $w$: more accurate and faster
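The Monte Carlo view of RFF can be checked numerically. The following is a minimal sketch, assuming the Gaussian kernel $k(x, z) = e^{-\|x - z\|^2/2}$ that corresponds to $w \sim \mathcal{N}(0, I)$; the function names `rff_features` and `gaussian_kernel`, the data sizes and the seed are illustrative choices, not part of the slides.

```python
import numpy as np

def rff_features(X, W):
    """Map rows of X to [cos(Xw), sin(Xw)] / sqrt(n) for the n rows w of W."""
    proj = X @ W.T                      # (m, n) matrix of projections w_i^T x
    n = W.shape[0]
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(n)

def gaussian_kernel(X, Z):
    """Exact k(x, z) = exp(-||x - z||^2 / 2), the kernel matching w ~ N(0, I)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2 * X @ Z.T
    return np.exp(-0.5 * sq)

rng = np.random.default_rng(0)          # seed chosen only for reproducibility
d, n = 5, 20000                         # input dimension, number of samples w
X = rng.normal(size=(10, d)) / np.sqrt(d)
W = rng.normal(size=(n, d))             # w ~ N(0, I), plain Monte Carlo points

Phi = rff_features(X, W)
K_hat = Phi @ Phi.T                     # approximate Gram matrix
K = gaussian_kernel(X, X)               # exact Gram matrix
rff_max_err = np.max(np.abs(K_hat - K))
```

Because $\cos(a)\cos(b) + \sin(a)\sin(b) = \cos(a - b)$, each entry of `K_hat` is the sample mean of $\cos(w^\top (x - z))$, so the error shrinks at the usual Monte Carlo rate as `n` grows.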

Our method uses polar form of the integral
Change to polar coordinates: $w = r z$, $\|z\|_2 = 1$
$$I(f) = (2\pi)^{-d/2} \int_{\mathbb{R}^d} e^{-\|w\|^2/2} f(w)\, dw = \frac{(2\pi)^{-d/2}}{2} \int_{U_d} \int_{-\infty}^{\infty} e^{-r^2/2}\, |r|^{d-1} f(r z)\, dr\, dz$$
Integration over the radius $r$: $\int_{-\infty}^{\infty} e^{-r^2/2}\, |r|^{d-1} h(r)\, dr$
Use radial rules: $R(h) = \sum_{i=0}^{l} \hat{w}_i\, \frac{h(\rho_i) + h(-\rho_i)}{2}$
Integration over the unit $d$-sphere $U_d$: $\int_{U_d} s(z)\, dz$
Use spherical rules: $S_Q(s) = \sum_{j=1}^{p} \tilde{w}_j\, s(Q z_j)$

Quadrature-based Features
[Genz and Monahan, 1998] introduced Spherical-Radial (SR) rules:
$$SR^{3,3}_{Q,\rho}(f_{xy}) = \left(1 - \frac{d}{\rho^2}\right) f_{xy}(0) + \frac{d}{d+1} \sum_{j=1}^{d+1} \left[ \frac{f_{xy}(-\rho Q v_j) + f_{xy}(\rho Q v_j)}{2\rho^2} \right]$$
We propose to estimate the integral by SR rules:
$$I(f_{xy}) = \mathbb{E}_{Q,\rho}\left[SR^{3,3}_{Q,\rho}(f_{xy})\right] \approx \hat{I}(f_{xy}) = \frac{1}{n} \sum_{i=1}^{n} SR^{3,3}_{Q_i,\rho_i}(f_{xy})$$
$\mathcal{O}(\varepsilon^{-2})$ sample complexity, with a smaller constant than RFF
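The SR rule of degree (3, 3) and its averaged estimator can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the $v_j$ are taken to be the vertices of a regular simplex on the unit sphere, $Q$ is drawn as a random orthogonal matrix via QR, and $\rho \sim \chi(d+2)$, none of which is spelled out on the slide. All function names are hypothetical.

```python
import numpy as np

def simplex_vertices(d):
    """Rows are the d+1 vertices of a regular simplex on the unit sphere in R^d."""
    E = np.eye(d + 1) - 1.0 / (d + 1)       # centered standard basis of R^{d+1}
    U, _, _ = np.linalg.svd(E)
    V = E @ U[:, :d]                        # coordinates in the d-dim subspace
    return V / np.linalg.norm(V, axis=1, keepdims=True)

def random_orthogonal(d, rng):
    """Random orthogonal matrix from the QR factorization of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.normal(size=(d, d)))
    return Q * np.sign(np.diag(R))          # sign fix on the columns

def sr33(f, Q, rho, V):
    """One SR rule of degree (3, 3) applied to a scalar function f on R^d."""
    d = Q.shape[0]
    pts = rho * V @ Q.T                     # row j is rho * Q v_j
    vals = sum(f(p) + f(-p) for p in pts)
    return (1 - d / rho**2) * f(np.zeros(d)) + d / (d + 1) * vals / (2 * rho**2)

def sr33_estimate(f, d, n, rng):
    """Average n independent SR rules; rho ~ chi(d + 2) is an assumption."""
    V = simplex_vertices(d)
    total = 0.0
    for _ in range(n):
        Q = random_orthogonal(d, rng)
        rho = np.sqrt(rng.chisquare(d + 2))
        total += sr33(f, Q, rho, V)
    return total / n

rng = np.random.default_rng(0)              # seed chosen only for reproducibility
d = 4
x = np.array([0.5, 0.0, 0.0, 0.0])
y = np.array([-0.5, 0.0, 0.0, 0.0])
# For the Gaussian kernel, f_xy(w) = cos(w^T (x - y)).
k_hat = sr33_estimate(lambda w: np.cos(w @ (x - y)), d, 2000, rng)
k_exact = np.exp(-0.5 * np.sum((x - y) ** 2))
```

A useful sanity check on such a sketch is that a single rule integrates quadratics exactly: for $f(w) = w^\top A w + c$ it returns $\operatorname{tr}(A) + c$ for any orthogonal $Q$ and any $\rho > 0$.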

Our method generalizes RFF and ORF
RFF are SR rules of degree (1, 1):
$$SR^{1,1}_{Q,\rho} = \frac{f(\rho Q z) + f(-\rho Q z)}{2}, \qquad \rho \sim \chi(d), \quad \rho Q z \sim \mathcal{N}(0, I) \implies SR^{1,1}_{Q,\rho} = f(w), \quad w \sim \mathcal{N}(0, I)$$
Orthogonal Random Features (ORF) are SR rules of degree (1, 3):
$$SR^{1,3}_{Q,\rho} = \sum_{i=1}^{d} \frac{f(\rho Q e_i) + f(-\rho Q e_i)}{2}, \qquad \rho \sim \chi(d)$$
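The points $\rho Q e_i$ used by the degree (1, 3) rule can be generated directly. The sketch below is a hedged illustration: it draws one radius $\rho_i \sim \chi(d)$ per direction, as is common for ORF, whereas the rule above is written with a single $\rho$; it also averages over the $d$ directions when forming the kernel estimate, which is a normalization chosen for this sketch. The name `orf_weights` is hypothetical.

```python
import numpy as np

def orf_weights(d, rng):
    """Rows are rho_i * Q e_i: orthogonal directions with chi(d)-distributed norms."""
    G = rng.normal(size=(d, d))
    Q, R = np.linalg.qr(G)
    Q = Q * np.sign(np.diag(R))                 # sign fix on the columns of Q
    rho = np.sqrt(rng.chisquare(d, size=d))     # rho_i ~ chi(d), one per direction
    return rho[:, None] * Q.T, rho              # row i equals rho_i * (Q e_i)

rng = np.random.default_rng(0)                  # seed chosen only for reproducibility
d = 8
W, rho = orf_weights(d, rng)
gram = W @ W.T                                  # diagonal when rows are orthogonal

# Gaussian-kernel estimate from the d orthogonal points, averaged over directions.
x = rng.normal(size=d) / np.sqrt(d)
y = rng.normal(size=d) / np.sqrt(d)
k_hat = np.mean(np.cos(W @ (x - y)))
```

With only $d$ points the estimate is noisy; in practice several independent blocks of this form would be stacked to reach the desired number of features.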

Faster mapping with orthogonal Q
Use orthogonal butterfly matrices with structured factors:
$$B^{(4)} = \begin{pmatrix} c_1 & -s_1 & 0 & 0 \\ s_1 & c_1 & 0 & 0 \\ 0 & 0 & c_3 & -s_3 \\ 0 & 0 & s_3 & c_3 \end{pmatrix} \begin{pmatrix} c_2 & 0 & -s_2 & 0 \\ 0 & c_2 & 0 & -s_2 \\ s_2 & 0 & c_2 & 0 \\ 0 & s_2 & 0 & c_2 \end{pmatrix} = \begin{pmatrix} c_1 c_2 & -s_1 c_2 & -c_1 s_2 & s_1 s_2 \\ s_1 c_2 & c_1 c_2 & -s_1 s_2 & -c_1 s_2 \\ c_3 s_2 & -s_3 s_2 & c_3 c_2 & -s_3 c_2 \\ s_3 s_2 & c_3 s_2 & s_3 c_2 & c_3 c_2 \end{pmatrix}$$
Allow fast matrix-vector multiplication in $\mathcal{O}(n \log n)$
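The factorization of $B^{(4)}$ can be verified numerically. This sketch assumes that $c_i$ and $s_i$ are the cosine and sine of an angle $\theta_i$ (the slide does not define them); the function names and the test angles are illustrative.

```python
import numpy as np

def butterfly4_factors(theta):
    """Return the two sparse factors of B(4) for angles theta = (t1, t2, t3)."""
    c1, c2, c3 = np.cos(theta)
    s1, s2, s3 = np.sin(theta)
    F1 = np.array([[c1, -s1, 0, 0],
                   [s1,  c1, 0, 0],
                   [0, 0, c3, -s3],
                   [0, 0, s3,  c3]])
    F2 = np.array([[c2, 0, -s2, 0],
                   [0, c2, 0, -s2],
                   [s2, 0, c2, 0],
                   [0, s2, 0, c2]])
    return F1, F2

def butterfly4_matvec(theta, x):
    """Apply B(4) to x in two stages, each touching two entries per output."""
    c1, c2, c3 = np.cos(theta)
    s1, s2, s3 = np.sin(theta)
    # Stage for the second factor: mixes coordinate pairs (0, 2) and (1, 3).
    y = np.array([c2 * x[0] - s2 * x[2], c2 * x[1] - s2 * x[3],
                  s2 * x[0] + c2 * x[2], s2 * x[1] + c2 * x[3]])
    # Stage for the first factor: rotates pairs (0, 1) and (2, 3).
    return np.array([c1 * y[0] - s1 * y[1], s1 * y[0] + c1 * y[1],
                     c3 * y[2] - s3 * y[3], s3 * y[2] + c3 * y[3]])

theta = np.array([0.3, 1.1, -0.7])      # arbitrary angles for the check
F1, F2 = butterfly4_factors(theta)
B = F1 @ F2                             # dense B(4)
x = np.array([1.0, -2.0, 0.5, 3.0])
fast = butterfly4_matvec(theta, x)
```

Each stage costs a constant amount of work per coordinate, and a size-$n$ butterfly has $\log_2 n$ such stages, which is where the $\mathcal{O}(n \log n)$ matrix-vector cost comes from.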

Kernel Approximation Accuracy (ours: B)
[Figure: kernel approximation error $\|K - \hat{K}\| / \|K\|$ versus $n$ for the Arc-cosine 0, Arc-cosine 1 and Gaussian kernels on the Powerplant, LETTER, USPS, MNIST, CIFAR100 and LEUKEMIA datasets. Methods compared: G, Gort, ROM, QMC, GQ, B.]

Summary
Our method, quadrature-based features:
• is applicable to a wide range of kernels
• achieves higher accuracy
• uses structured matrices
• generalizes previous work
Poster #130


