Skoltech
Skolkovo Institute of Science and Technology
Quadrature-based Features for Kernel Approximation
Marina Munkhoeva, Yermek Kapushev, Evgeny Burnaev, Ivan Oseledets
Feature map ψ: input space → feature space
K(x, z) = ⟨ψ(x), ψ(z)⟩
1/9
Approximate the kernel with an explicit feature map ϕ:
k(x, y) = ⟨ψ(x), ψ(y)⟩ ≈ ϕ(x)⊤ϕ(y)
2/9
Consider kernels that admit an integral representation:
$$k(x, y) = \mathbb{E}_{p(w)} f_{xy}(w) = \int_{\mathbb{R}^d} f_{xy}(w)\, p(w)\, dw = I(f),$$
$$p(w) = (2\pi)^{-d/2} e^{-\|w\|^2 / 2}, \qquad f_{xy}(w) = \phi(w^\top x)\, \phi(w^\top y) = f(w).$$
3/9
RFF [Rahimi and Recht, 2008]: Monte Carlo approximation for $I(f)$
$$k(x, z) = \mathbb{E}\left[\phi_w(x)^\top \phi_w(z)\right]$$
RFF mapping:
$$\phi_w(x) = [\cos(w^\top x), \sin(w^\top x)], \qquad w \sim p(w)$$
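As a minimal sketch of the Monte Carlo estimate above, assume the Gaussian kernel k(x, z) = exp(−‖x − z‖²/2), for which p(w) is the standard normal density. The function names and the 1/√n scaling below are illustrative choices, not taken from the slides.

```python
import numpy as np

def rff_features(X, W):
    """Map rows of X to [cos(w^T x), sin(w^T x)] features for every row w of W."""
    proj = X @ W.T                                   # shape (num_points, n)
    feats = np.hstack([np.cos(proj), np.sin(proj)])  # shape (num_points, 2n)
    return feats / np.sqrt(W.shape[0])               # dot products then average over the n samples

def rff_gaussian_kernel(x, z, n, rng):
    """Monte Carlo estimate of exp(-||x - z||^2 / 2) from n samples w ~ N(0, I)."""
    W = rng.standard_normal((n, x.shape[0]))
    return (rff_features(x[None, :], W) @ rff_features(z[None, :], W).T).item()

rng = np.random.default_rng(0)
x = np.array([0.3, -0.2, 0.5])
z = np.array([0.1, 0.4, -0.3])
approx = rff_gaussian_kernel(x, z, n=20000, rng=rng)
exact = np.exp(-np.sum((x - z) ** 2) / 2)
print(approx, exact)
```

The feature inner product equals the sample mean of cos(wᵀ(x − z)), so the error shrinks at the usual Monte Carlo rate as n grows.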
4/9
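Change to polar coordinates ($w = rz$, $z \in U_d$):
$$I(f) = (2\pi)^{-d/2} \int_{\mathbb{R}^d} e^{-\|w\|^2 / 2} f(w)\, dw = \frac{(2\pi)^{-d/2}}{2} \int_{U_d} \int_{-\infty}^{\infty} e^{-r^2 / 2}\, |r|^{d-1} f(rz)\, dr\, dz$$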
Integration over radius:
$$\int_{-\infty}^{\infty} e^{-r^2 / 2}\, |r|^{d-1} h(r)\, dr$$
Use radial rules:
$$R(h) = \sum_{i=0}^{l} \hat{w}_i\, \frac{h(\rho_i) + h(-\rho_i)}{2}$$
Integration over the unit d-sphere $U_d$:
$$\int_{U_d} s(z)\, dz$$
Use spherical rules:
$$S_Q(s) = \sum_{j=1}^{p} \tilde{w}_j\, s(Q z_j)$$
5/9
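[Genz and Monahan, 1998] introduced Spherical-Radial (SR) rules.
We propose to estimate the integral by SR rules: sample complexity $\mathcal{O}(\varepsilon^{-2})$ with a constant smaller than for RFF.
$$SR^{3,3}_{Q,\rho}(f_{xy}) = \left(1 - \frac{d}{\rho^2}\right) f_{xy}(0) + \frac{d}{d+1} \sum_{j=1}^{d+1} \left[\frac{f_{xy}(-\rho Q v_j) + f_{xy}(\rho Q v_j)}{2\rho^2}\right]$$
$$I(f_{xy}) = \mathbb{E}_{Q,\rho}\left[SR^{3,3}_{Q,\rho}(f_{xy})\right] \approx \hat{I}(f_{xy}) = \frac{1}{n} \sum_{i=1}^{n} SR^{3,3}_{Q_i,\rho_i}(f_{xy})$$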
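The degree (3, 3) rule can be sketched in NumPy as follows. The slide does not define the points v_j or the distribution of ρ, so this sketch assumes the usual Genz-Monahan setup: v_j are the d + 1 unit-norm vertices of a regular simplex, Q is a uniformly random orthogonal matrix, and ρ ∼ χ(d + 2). All function names are hypothetical.

```python
import numpy as np

def simplex_vertices(d):
    """Rows are the d+1 unit-norm vertices of a regular simplex in R^d (assumed v_j)."""
    m = np.eye(d + 1) - 1.0 / (d + 1)        # centred standard basis of R^{d+1}
    _, _, vt = np.linalg.svd(m)              # first d right singular vectors span the centred subspace
    v = m @ vt[:d].T                         # coordinates in that d-dimensional subspace
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def random_orthogonal(d, rng):
    """Uniformly distributed orthogonal matrix via QR with sign correction."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))

def sr33(f, d, rng):
    """One draw of the SR^{3,3} rule for a scalar function f on R^d."""
    rho = np.sqrt(rng.chisquare(d + 2))      # assumption: rho ~ chi(d + 2)
    q = random_orthogonal(d, rng)
    pts = rho * simplex_vertices(d) @ q.T    # row j is rho * Q v_j
    vals = sum(f(p) + f(-p) for p in pts) / (2 * rho**2)
    return (1 - d / rho**2) * f(np.zeros(d)) + d / (d + 1) * vals

def sr33_estimate(f, d, n, rng):
    """Average of n independent SR^{3,3} rules, the estimator written as I-hat above."""
    return np.mean([sr33(f, d, rng) for _ in range(n)])

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))
A = (A + A.T) / 2
quad = lambda w: w @ A @ w + 2.0             # E[quad(w)] = trace(A) + 2 for w ~ N(0, I)
single_rule = sr33(quad, d, rng)
```

Under these assumptions a single draw already integrates the quadratic test function exactly, which is what degree 3 exactness means; for a kernel integrand such as cos(wᵀ(x − y)) one would average several draws with `sr33_estimate`.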
6/9
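RFF are SR rules of degree (1, 1):
$$SR^{(1,1)}_{Q,\rho} = \frac{f(\rho Q z) + f(-\rho Q z)}{2}, \quad \rho \sim \chi(d), \quad \rho Q z \sim \mathcal{N}(0, I) \implies SR^{(1,1)}_{Q,\rho} = f(w), \quad w \sim \mathcal{N}(0, I)$$
Orthogonal Random Features (ORF) are SR rules of degree (1, 3):
$$SR^{(1,3)}_{Q,\rho} = \sum_{i=1}^{d} \frac{f(\rho Q e_i) + f(-\rho Q e_i)}{2}, \quad \rho \sim \chi(d)$$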
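The claim behind the degree (1, 1) case, that ρQz with ρ ∼ χ(d) is a standard normal vector, can be checked empirically. The sketch below assumes Q is a uniformly random orthogonal matrix and z is any fixed unit vector; the helper names are illustrative.

```python
import numpy as np

def random_orthogonal(d, rng):
    """Uniformly distributed orthogonal matrix via QR with sign correction."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))

def sample_rho_q_z(d, num, rng):
    """Draw num samples of rho * Q z with rho ~ chi(d) and a fresh Q per sample."""
    z = np.zeros(d)
    z[0] = 1.0                                   # any fixed unit vector works
    rho = np.sqrt(rng.chisquare(d, size=num))    # chi(d) is the square root of chi-square(d)
    return np.stack([r * (random_orthogonal(d, rng) @ z) for r in rho])

rng = np.random.default_rng(0)
w = sample_rho_q_z(d=3, num=5000, rng=rng)
cov = w.T @ w / len(w)                           # should be close to the identity
print(np.round(cov, 2))
```

The empirical second-moment matrix should be close to the identity, consistent with w ∼ N(0, I).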
7/9
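Use orthogonal butterfly matrices with structured factors. They allow fast matrix-vector multiplication ($O(d \log d)$).
With $c_i = \cos\theta_i$ and $s_i = \sin\theta_i$:
$$B^{(4)} = \begin{bmatrix} c_1 & -s_1 & 0 & 0 \\ s_1 & c_1 & 0 & 0 \\ 0 & 0 & c_3 & -s_3 \\ 0 & 0 & s_3 & c_3 \end{bmatrix} \begin{bmatrix} c_2 & 0 & -s_2 & 0 \\ 0 & c_2 & 0 & -s_2 \\ s_2 & 0 & c_2 & 0 \\ 0 & s_2 & 0 & c_2 \end{bmatrix} = \begin{bmatrix} c_1 c_2 & -s_1 c_2 & -c_1 s_2 & s_1 s_2 \\ s_1 c_2 & c_1 c_2 & -s_1 s_2 & -c_1 s_2 \\ c_3 s_2 & -s_3 s_2 & c_3 c_2 & -s_3 c_2 \\ s_3 s_2 & c_3 s_2 & s_3 c_2 & c_3 c_2 \end{bmatrix}$$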
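A small NumPy sketch of the 4 × 4 butterfly factorization shown above, assuming c_i and s_i are the cosine and sine of angles θ_i. It builds the two structured factors, multiplies them, and checks that the result is orthogonal; the function names are hypothetical.

```python
import numpy as np

def rot(theta):
    """2 x 2 rotation block [[c, -s], [s, c]]."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def butterfly4(t1, t2, t3):
    """4 x 4 butterfly matrix as the product of its two structured factors."""
    left = np.zeros((4, 4))
    left[:2, :2] = rot(t1)                 # block-diagonal factor: rotations by t1 and t3
    left[2:, 2:] = rot(t3)
    right = np.kron(rot(t2), np.eye(2))    # [[c2 I, -s2 I], [s2 I, c2 I]]
    return left @ right

t1, t2, t3 = 0.3, 1.1, -0.7
B = butterfly4(t1, t2, t3)
print(np.allclose(B @ B.T, np.eye(4)))
```

Each factor has only two nonzeros per row, which is why applying the factors one after another is cheaper than a dense matrix-vector product.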
8/9
[Figure: relative kernel approximation error ‖K − K̂‖ / ‖K‖ versus n. Rows: Arc-cosine 0, Arc-cosine 1, and Gaussian kernels. Columns: Powerplant, LETTER, USPS, MNIST, CIFAR100, and LEUKEMIA datasets. Methods compared: G, Gort, ROM, QMC, GQ, B.]
9/9
Our method: quadrature-based features