
Quadrature-based Features for Kernel Approximation

Marina Munkhoeva, Yermek Kapushev, Evgeny Burnaev, Ivan Oseledets
Skoltech (Skolkovo Institute of Science and Technology)


  1. Quadrature-based Features for Kernel Approximation
     Marina Munkhoeva, Yermek Kapushev, Evgeny Burnaev, Ivan Oseledets
     Skoltech (Skolkovo Institute of Science and Technology)

  2. Kernel Methods Refresher
     • Kernel trick: compute $K(x, z) = \langle \psi(x), \psi(z) \rangle$ via the kernel function $k(x, z)$
     • Inner product in an implicit space using input features
     • Naively, kernel methods scale poorly with the number of samples
     [Figure: the map $\psi$ from the input space to the feature space]

  3. Scalable Kernel Methods
     • Revert the trick: $k(x, z) \approx \phi(x)^\top \phi(z)$
     • Use linear methods with mapped objects $x \to \phi(x)$
     • How to generate the approximate mapping $\phi(\cdot)$?
     [Figure: the map $\psi$ from the input space to the feature space, with $k(x, y) = \langle \psi(x), \psi(y) \rangle \approx \phi(x)^\top \phi(y)$]

  4. Kernel Function Approximation
     Consider kernels that allow an integral representation:
     $$k(x, y) = \mathbb{E}_{p(w)} f_{xy}(w) = \int_{\mathbb{R}^d} f_{xy}(w)\, p(w)\, dw = I(f),$$
     $$f_{xy}(w) = \phi(w^\top x)\, \phi(w^\top y) = f(w), \qquad p(w) = (2\pi)^{-d/2} e^{-\|w\|^2 / 2}.$$
     • Shift-invariant kernels (e.g. the radial basis function (RBF) kernel)
     • Pointwise Nonlinear Gaussian kernels (e.g. arc-cosine kernels)
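As a worked instance of the integral representation, here is the standard identity for the Gaussian kernel. It assumes unit bandwidth, which the slides do not state; any other bandwidth rescales $w$.

```latex
\begin{aligned}
\mathbb{E}_{w \sim \mathcal{N}(0, I)} \cos\big(w^\top (x - y)\big) &= e^{-\|x - y\|^2 / 2}, \\
\cos\big(w^\top (x - y)\big) &= \cos(w^\top x)\cos(w^\top y) + \sin(w^\top x)\sin(w^\top y).
\end{aligned}
```

The second line shows why the integrand splits into products of functions of $w^\top x$ and $w^\top y$, which is what the cosine and sine features exploit.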

  7. Random Fourier Features (RFF) [Rahimi and Recht, 2008]
     RFF mapping $\phi(\cdot)$:
     $$k(x, z) = \mathbb{E}\left[\phi_w(x)^\top \phi_w(z)\right], \qquad \phi_w(x) = [\cos(w^\top x), \sin(w^\top x)]^\top, \quad w \sim p(w)$$
     RFF is a Monte Carlo approximation of $I(f)$
     • Orthogonal points $w$: more accurate
     • Structured $w$: faster
     • Orthogonal + structured $w$: more accurate and faster
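A minimal sketch of the RFF Monte Carlo estimate, assuming a unit-bandwidth Gaussian kernel and NumPy. The function names `rff_features` and `gaussian_kernel`, the toy data, and the number of frequencies are illustrative choices, not taken from the slides.

```python
import numpy as np

def rff_features(X, W):
    """Map rows of X to random Fourier features using the frequency rows of W."""
    proj = X @ W.T                       # projections w^T x, shape (n_samples, n_freq)
    n_freq = W.shape[0]
    # Stack cos and sin parts; the 1/sqrt(n_freq) scaling makes the inner
    # product of two feature vectors a Monte Carlo average over w.
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(n_freq)

def gaussian_kernel(X, Y):
    """Exact unit-bandwidth Gaussian kernel exp(-||x - y||^2 / 2)."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / 2)

rng = np.random.default_rng(0)
d, n_freq = 5, 5000
X = rng.normal(size=(20, d)) / np.sqrt(d)   # toy data (assumption)
W = rng.normal(size=(n_freq, d))            # w ~ N(0, I)

Z = rff_features(X, W)
K_exact = gaussian_kernel(X, X)
K_approx = Z @ Z.T
rel_err = np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact)
print(f"relative Frobenius error: {rel_err:.4f}")
```

After the mapping, any linear method can be run on `Z` in place of a kernel method on `X`.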

  8. Our method uses polar form of the integral
     Change to polar coordinates ($w = r z$, $\|z\|_2 = 1$):
     $$I(f) = (2\pi)^{-d/2} \int_{\mathbb{R}^d} e^{-\|w\|^2 / 2} f(w)\, dw = \frac{(2\pi)^{-d/2}}{2} \int_{U_d} \int_{-\infty}^{\infty} e^{-r^2 / 2}\, |r|^{d-1} f(r z)\, dr\, dz$$
     Integration over the radius $r$:
     $$\int_{-\infty}^{\infty} e^{-r^2 / 2}\, |r|^{d-1} h(r)\, dr$$
     Use radial rules:
     $$R(h) = \sum_{i=0}^{l} \hat{w}_i\, \frac{h(\rho_i) + h(-\rho_i)}{2}$$
     Integration over the unit $d$-sphere $U_d$:
     $$\int_{U_d} s(z)\, dz$$
     Use spherical rules:
     $$S_Q(s) = \sum_{j=1}^{p} \tilde{w}_j\, s(Q z_j)$$

  12. Quadrature-based Features
     [Genz and Monahan, 1998] introduced Spherical-Radial (SR) rules:
     $$SR^{3,3}_{Q,\rho}(f_{xy}) = \left(1 - \frac{d}{\rho^2}\right) f_{xy}(0) + \frac{d}{d+1} \sum_{j=1}^{d+1} \frac{f_{xy}(-\rho Q v_j) + f_{xy}(\rho Q v_j)}{2\rho^2}$$
     We propose to estimate the integral by SR rules:
     $$I(f_{xy}) = \mathbb{E}_{Q,\rho}\left[SR^{3,3}_{Q,\rho}(f_{xy})\right] \approx \hat{I}(f_{xy}) = \frac{1}{n} \sum_{i=1}^{n} SR^{3,3}_{Q_i,\rho_i}(f_{xy})$$
     • $\mathcal{O}(\varepsilon^{-2})$ sample complexity, with a constant smaller than that of RFF
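The following is a sketch of the averaged SR rule of degree (3, 3) under stated assumptions, not the authors' implementation. The slide does not define $v_j$, $Q$ or $\rho$; here $v_j$ are taken to be the vertices of a unit regular simplex, $Q$ a Haar-distributed random orthogonal matrix, and $\rho \sim \chi(d+2)$, following the usual Genz and Monahan construction. All function names are hypothetical.

```python
import numpy as np

def simplex_vertices(d):
    """Rows are d+1 unit vectors in R^d forming a regular simplex (assumed v_j)."""
    P = np.eye(d + 1) - 1.0 / (d + 1)    # projector orthogonal to the all-ones vector
    V = np.linalg.svd(P)[2][:d].T        # (d+1, d) coordinates inside that subspace
    return V / np.linalg.norm(V, axis=1, keepdims=True)

def random_orthogonal(d, rng):
    """Random orthogonal matrix via QR of a Gaussian matrix, with sign correction."""
    Q, R = np.linalg.qr(rng.normal(size=(d, d)))
    return Q * np.sign(np.diag(R))

def sr33(f, d, rng):
    """One sample of the SR rule of degree (3, 3) for a scalar function f on R^d."""
    rho = np.sqrt(rng.chisquare(d + 2))          # assumption: rho ~ chi(d + 2)
    Q = random_orthogonal(d, rng)
    pts = rho * simplex_vertices(d) @ Q.T        # rows are rho * Q v_j
    s = sum(f(p) + f(-p) for p in pts)
    return (1 - d / rho**2) * f(np.zeros(d)) + d / (d + 1) * s / (2 * rho**2)

rng = np.random.default_rng(0)
d = 5
x, y = np.zeros(d), np.zeros(d)
y[0] = 1.0                                       # so that ||x - y|| = 1

# Integrand for the unit-bandwidth Gaussian kernel (assumption).
f_xy = lambda w: np.cos(w @ (x - y))
estimate = np.mean([sr33(f_xy, d, rng) for _ in range(2000)])
exact = np.exp(-0.5)
print(f"SR estimate: {estimate:.4f}, exact kernel value: {exact:.4f}")

# A single sample already integrates quadratics exactly: E[w^T A w] = trace(A).
A = np.diag(np.arange(1.0, d + 1))
quad = sr33(lambda w: w @ A @ w, d, rng)
```

The last two lines illustrate the "degree 3" part of the rule: for a quadratic integrand, one draw of $(Q, \rho)$ returns the exact Gaussian expectation, with no averaging needed.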

  13. Our method generalizes RFF and ORF
     RFF are SR rules of degree (1, 1):
     $$SR^{(1,1)}_{Q,\rho} = \frac{f(\rho Q z) + f(-\rho Q z)}{2}, \qquad \rho \sim \chi(d), \quad \rho Q z \sim \mathcal{N}(0, I) \implies SR^{(1,1)}_{Q,\rho} = f(w), \quad w \sim \mathcal{N}(0, I)$$
     Orthogonal Random Features (ORF) are SR rules of degree (1, 3):
     $$SR^{(1,3)}_{Q,\rho} = \sum_{i=1}^{d} \frac{f(\rho Q e_i) + f(-\rho Q e_i)}{2}, \qquad \rho \sim \chi(d)$$
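A quick numerical sanity check of the claim that $\rho Q z$ is standard normal when $\rho \sim \chi(d)$: the sketch below samples $w = \rho Q z$ and compares its empirical mean and covariance with those of $\mathcal{N}(0, I)$. It assumes $Q$ is a Haar-distributed random orthogonal matrix and $z$ a fixed unit vector; matching two moments is consistent with the claim but does not prove Gaussianity.

```python
import numpy as np

def random_orthogonal(d, rng):
    """Random orthogonal matrix via QR of a Gaussian matrix, with sign correction."""
    Q, R = np.linalg.qr(rng.normal(size=(d, d)))
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(0)
d, n = 3, 5000
z = np.zeros(d)
z[0] = 1.0                                   # any fixed unit vector

# Each row is one draw of w = rho * Q z with rho ~ chi(d).
W = np.array([
    np.sqrt(rng.chisquare(d)) * random_orthogonal(d, rng) @ z
    for _ in range(n)
])

mean = W.mean(axis=0)
cov = np.cov(W, rowvar=False)
print("empirical mean:", np.round(mean, 3))
print("empirical covariance:\n", np.round(cov, 3))
```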

  15. Faster mapping with orthogonal Q
     Use orthogonal butterfly matrices with structured factors:
     $$B^{(4)} = \begin{pmatrix} c_1 & -s_1 & 0 & 0 \\ s_1 & c_1 & 0 & 0 \\ 0 & 0 & c_3 & -s_3 \\ 0 & 0 & s_3 & c_3 \end{pmatrix} \begin{pmatrix} c_2 & 0 & -s_2 & 0 \\ 0 & c_2 & 0 & -s_2 \\ s_2 & 0 & c_2 & 0 \\ 0 & s_2 & 0 & c_2 \end{pmatrix} = \begin{pmatrix} c_1 c_2 & -s_1 c_2 & -c_1 s_2 & s_1 s_2 \\ s_1 c_2 & c_1 c_2 & -s_1 s_2 & -c_1 s_2 \\ c_3 s_2 & -s_3 s_2 & c_3 c_2 & -s_3 c_2 \\ s_3 s_2 & c_3 s_2 & s_3 c_2 & c_3 c_2 \end{pmatrix}$$
     Allow fast matrix-vector multiplication ($\mathcal{O}(n \log n)$)
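The sketch below builds the $4 \times 4$ butterfly matrix from its two structured factors and applies it to a vector without forming the dense product. It assumes $c_i = \cos\theta_i$ and $s_i = \sin\theta_i$ for angles $\theta_1, \theta_2, \theta_3$, which the slide does not spell out; the function names are illustrative.

```python
import numpy as np

def butterfly4_dense(t1, t2, t3):
    """Dense 4x4 butterfly matrix as the product of its two structured factors."""
    c1, s1 = np.cos(t1), np.sin(t1)
    c2, s2 = np.cos(t2), np.sin(t2)
    c3, s3 = np.cos(t3), np.sin(t3)
    F1 = np.array([[c1, -s1, 0, 0],
                   [s1,  c1, 0, 0],
                   [0, 0, c3, -s3],
                   [0, 0, s3,  c3]])
    F2 = np.array([[c2, 0, -s2, 0],
                   [0, c2, 0, -s2],
                   [s2, 0, c2, 0],
                   [0, s2, 0, c2]])
    return F1 @ F2

def butterfly4_apply(x, t1, t2, t3):
    """Compute B x with two passes of 2x2 rotations instead of a dense product."""
    c1, s1 = np.cos(t1), np.sin(t1)
    c2, s2 = np.cos(t2), np.sin(t2)
    c3, s3 = np.cos(t3), np.sin(t3)
    # Second factor: rotate the pairs (x0, x2) and (x1, x3) by angle t2.
    y = np.array([c2 * x[0] - s2 * x[2],
                  c2 * x[1] - s2 * x[3],
                  s2 * x[0] + c2 * x[2],
                  s2 * x[1] + c2 * x[3]])
    # First factor: rotate (y0, y1) by t1 and (y2, y3) by t3.
    return np.array([c1 * y[0] - s1 * y[1],
                     s1 * y[0] + c1 * y[1],
                     c3 * y[2] - s3 * y[3],
                     s3 * y[2] + c3 * y[3]])

rng = np.random.default_rng(0)
t1, t2, t3 = rng.uniform(0, 2 * np.pi, size=3)
x = rng.normal(size=4)

B = butterfly4_dense(t1, t2, t3)
fast = butterfly4_apply(x, t1, t2, t3)
print("orthogonal:", np.allclose(B @ B.T, np.eye(4)))
print("fast apply matches dense:", np.allclose(B @ x, fast))
```

Each factor touches every coordinate once, so a size-$n$ butterfly with $\log_2 n$ such factors costs $\mathcal{O}(n \log n)$ per matrix-vector product.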

  16. Kernel Approximation Accuracy (ours: B)
     [Figure: kernel approximation error $\|K - \hat{K}\| / \|K\|$ versus $n$ for the Arc-cosine 0, Arc-cosine 1, and Gaussian kernels on the Powerplant, LETTER, USPS, MNIST, CIFAR100, and LEUKEMIA datasets. Methods compared: G, Gort, ROM, QMC, GQ, B.]

  17. Summary
     Our method: quadrature-based features
     • applicable to a wide range of kernels
     • achieves higher accuracy
     • uses structured matrices
     • generalizes previous work
     Poster #130
