SLIDE 1

Performance Guarantees for Random Fourier Features – Limitations and Merits

Zoltán Szabó

Joint work with Bharath K. Sriperumbudur (PSU)

ML@SITraN, University of Sheffield, June 25, 2015

Zoltán Szabó • Random Fourier Features – Limitations and Merits

SLIDE 2

Context

Given: $k(x, y) = \int_{\mathbb{R}^d} e^{i\omega^T (x-y)} \, d\Lambda(\omega) = \int_{\mathbb{R}^d} \cos\left(\omega^T (x - y)\right) d\Lambda(\omega)$.

$\hat{k}(x, y)$: Monte-Carlo estimator of $k(x, y)$ using $(\omega_j)_{j=1}^m \overset{\text{i.i.d.}}{\sim} \Lambda$ [Rahimi and Recht, 2007].

Motivation:

  • Primal form: fast linear solvers.
  • Kernel function approximation: out-of-sample extension.
  • Online applications.
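
The Monte-Carlo estimator above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the Gaussian kernel $k(x, y) = e^{-\|x - y\|_2^2 / (2 s^2)}$, whose spectral measure $\Lambda$ is $N(0, s^{-2} I)$; the function names and the bandwidth `s` are hypothetical.

```python
import numpy as np

def rff_kernel_estimate(x, y, omegas):
    """Monte-Carlo estimate (1/m) * sum_j cos(omega_j^T (x - y))."""
    return float(np.mean(np.cos(omegas @ (x - y))))

def gaussian_kernel(x, y, s=1.0):
    """Exact Gaussian kernel with bandwidth s (assumed example kernel)."""
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * s ** 2)))

rng = np.random.default_rng(0)
d, m, s = 3, 20000, 1.0
# For the Gaussian kernel, Lambda is the normal distribution N(0, s^-2 I).
omegas = rng.normal(scale=1.0 / s, size=(m, d))
x, y = rng.normal(size=d), rng.normal(size=d)

k_hat = rff_kernel_estimate(x, y, omegas)
k_true = gaussian_kernel(x, y, s)
```

Increasing `m` shrinks the gap between `k_hat` and `k_true`; the remaining slides quantify how fast this happens uniformly over a set $S$.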

SLIDE 3

Performance measures

Uniform ($r = \infty$): $\|k - \hat{k}\|_S := \sup_{x,y \in S} |k(x, y) - \hat{k}(x, y)|$.

$L^r$ ($1 \le r < \infty$): $\|k - \hat{k}\|_{L^r(S)} := \left( \int_S \int_S |k(x, y) - \hat{k}(x, y)|^r \, dx \, dy \right)^{1/r}$.
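
Both performance measures can be approximated on a grid. The sketch below is only an illustration under stated assumptions: $d = 1$, $S = [-1, 1]$ discretised into 41 points, the unit-bandwidth Gaussian kernel, and a Riemann sum in place of the double integral. The helper `lr_error` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 1000
omegas = rng.normal(size=m)              # 1-D frequencies, Lambda = N(0, 1)

grid = np.linspace(-1.0, 1.0, 41)        # discretised S = [-1, 1]
diff = grid[:, None] - grid[None, :]     # all differences x - y on the grid
k_true = np.exp(-0.5 * diff ** 2)        # Gaussian kernel, unit bandwidth
k_hat = np.cos(diff[..., None] * omegas).mean(axis=-1)

err = np.abs(k_true - k_hat)
uniform_err = err.max()                  # grid proxy for the sup over S x S

def lr_error(err, cell_area, r):
    """Riemann-sum proxy for the L^r(S) norm of the error."""
    return (np.sum(err ** r) * cell_area) ** (1.0 / r)

h = grid[1] - grid[0]
l2_err = lr_error(err, h * h, r=2.0)
```

On the grid the $L^r$ proxy is bounded by the uniform error times the $r$-th root of the total cell area, which mirrors the "uniform consequence" route to the $L^r$ guarantee.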

SLIDE 4

Approximation of kernel derivatives

One could also consider the kernel derivatives $\partial^{p,q} k$. Motivation [Zhou, 2008, Shi et al., 2010, Rosasco et al., 2010, Rosasco et al., 2013, Ying et al., 2012, Sriperumbudur et al., 2014]:

  • semi-supervised learning with gradient information,
  • nonlinear variable selection,
  • fitting of infinite-dimensional exponential family distributions.

Many of the presented results also hold for derivatives ($[p; q] \neq 0$).

SLIDE 5

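Goal

Large deviation inequalities:

$\Lambda^m\left( \|k - \hat{k}\|_S \le \epsilon \right) \ge f_1(\epsilon, d, m, |S|)$,

$\Lambda^m\left( \|k - \hat{k}\|_{L^r(S)} \le \epsilon \right) \ge f_2(\epsilon, d, m, |S|)$,

where $|S|$ denotes the diameter of $S$.

Scaling of $|S|$ and $m$ ensuring a.s. convergence?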

SLIDE 6

Existing results on the approximation quality

Notations: $X_n = O_p(r_n)$ ($O_{a.s.}(r_n)$) denotes that $\frac{X_n}{r_n}$ is bounded in probability (almost surely).

[Rahimi and Recht, 2007]: $\|\hat{k} - k\|_S = O_p\left( |S| \sqrt{\frac{\log m}{m}} \right)$.

[Sutherland and Schneider, 2015]: better constants.

SLIDE 7

Contents

  • Uniform guarantee (empirical process theory).
  • Two $L^r$ guarantees (consequence of the uniform bound, direct proof).
  • Kernel derivatives.

SLIDE 8

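High-level proof

1. Empirical process form: $\|k - \hat{k}\|_S = \sup_{g \in \mathcal{G}} |\Lambda g - \Lambda_m g| = \|\Lambda - \Lambda_m\|_{\mathcal{G}}$.

2. $\|\Lambda - \Lambda_m\|_{\mathcal{G}}$ concentrates by its bounded difference property: $\|\Lambda - \Lambda_m\|_{\mathcal{G}} \lesssim \mathbb{E}_{\omega_{1:m}} \|\Lambda - \Lambda_m\|_{\mathcal{G}} + \frac{1}{\sqrt{m}}$.

3. $\mathcal{G}$ is a uniformly bounded, separable Carathéodory family $\Rightarrow$ $\mathbb{E}_{\omega_{1:m}} \|\Lambda - \Lambda_m\|_{\mathcal{G}} \lesssim \mathbb{E}_{\omega_{1:m}} \mathcal{R}(\mathcal{G}, \omega_{1:m})$.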

SLIDE 9

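High-level proof

4. Using Dudley's entropy integral: $\mathcal{R}(\mathcal{G}, \omega_{1:m}) \lesssim \frac{1}{\sqrt{m}} \int_0^{|\mathcal{G}|_{L^2(\Lambda_m)}} \sqrt{\log N(\mathcal{G}, L^2(\Lambda_m), r)} \, dr$.

5. $\mathcal{G}$ is smoothly parameterized by a compact set $\Rightarrow$ $\sqrt{\log N(\mathcal{G}, L^2(\Lambda_m), r)} \lesssim \sqrt{\log\left( \frac{C(\omega_{1:m})}{r} + 1 \right)}$, hence $\mathbb{E}_{\omega_{1:m}} \mathcal{R}(\mathcal{G}, \omega_{1:m}) \lesssim \frac{1}{\sqrt{m}}$.

6. Putting together: $\|k - \hat{k}\|_S \lesssim \frac{1}{\sqrt{m}} + \frac{1}{\sqrt{m}} = O\left( \sqrt{\frac{\log |S|}{m}} \right)$.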

SLIDE 11

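Step-1: empirical process form

Notation: $\Lambda g = \int g(\omega) \, d\Lambda(\omega)$, $\Lambda_m g = \int g(\omega) \, d\Lambda_m(\omega) = \frac{1}{m} \sum_{j=1}^m g(\omega_j)$.

Reformulation of the objective:

$\sup_{x,y \in S} |k(x, y) - \hat{k}(x, y)| = \sup_{g \in \mathcal{G}} |\Lambda g - \Lambda_m g| =: \|\Lambda - \Lambda_m\|_{\mathcal{G}}$,

where $\mathcal{G} = \{g_z : z \in S_\Delta\}$, $S_\Delta = S - S = \{x - y : x, y \in S\}$, $g_z : \omega \mapsto \cos\left(\omega^T z\right)$.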
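
The reformulation can be checked numerically on a finite stand-in for $S$. The sketch below assumes $d = 1$, the unit-bandwidth Gaussian kernel (so $\Lambda = N(0, 1)$ and $\Lambda g_z = e^{-z^2/2}$) and a 21-point grid; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 500
omegas = rng.normal(size=m)                    # Lambda = N(0, 1), d = 1

S = np.linspace(0.0, 1.0, 21)                  # finite stand-in for S
# S_Delta = S - S; rounding merges differences that coincide up to float error
S_delta = np.unique(np.round(S[:, None] - S[None, :], 12))

def Lambda_g(z):
    """Exact integral of cos(omega * z) under N(0, 1): exp(-z^2 / 2)."""
    return np.exp(-0.5 * z ** 2)

def Lambda_m_g(z, omegas):
    """Empirical average (1/m) * sum_j cos(omega_j * z) for each z."""
    return np.mean(np.cos(z[:, None] * omegas[None, :]), axis=1)

# Supremum over the function class G = {g_z : z in S_Delta}.
sup_over_G = np.max(np.abs(Lambda_g(S_delta) - Lambda_m_g(S_delta, omegas)))

# Supremum over all pairs (x, y) in S x S.
X, Y = np.meshgrid(S, S, indexing="ij")
k_true = np.exp(-0.5 * (X - Y) ** 2)
k_hat = np.mean(np.cos((X - Y)[..., None] * omegas), axis=-1)
sup_over_pairs = np.max(np.abs(k_true - k_hat))
```

The two suprema agree because both $k$ and $\hat{k}$ depend on $(x, y)$ only through $z = x - y$.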

SLIDE 12

Step-2: bounded difference property of $\|\Lambda - \Lambda_m\|_{\mathcal{G}}$

McDiarmid inequality: Let $\omega_1, \ldots, \omega_m \in D$ be independent random variables, and let $f : D^m \to \mathbb{R}$ satisfy the bounded difference property ($\forall r$):

$\sup_{u_1, \ldots, u_m, u_r' \in D} \left| f(u_1, \ldots, u_m) - f(u_1, \ldots, u_{r-1}, u_r', u_{r+1}, \ldots, u_m) \right| \le c_r$.

Then for $\forall \beta > 0$:

$P\left( f(\omega_1, \ldots, \omega_m) - \mathbb{E}[f(\omega_1, \ldots, \omega_m)] \ge \beta \right) \le e^{-\frac{2\beta^2}{\sum_{r=1}^m c_r^2}}$.

SLIDE 17

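Step-2: bounded difference property of $\|\Lambda - \Lambda_m\|_{\mathcal{G}}$

Our choice: $f(\omega_1, \ldots, \omega_m) := \|\Lambda - \Lambda_m\|_{\mathcal{G}}$.

$\left| f(\omega_1, \ldots, \omega_{r-1}, \omega_r, \omega_{r+1}, \ldots, \omega_m) - f(\omega_1, \ldots, \omega_{r-1}, \omega_r', \omega_{r+1}, \ldots, \omega_m) \right|$

$= \left| \sup_{g \in \mathcal{G}} \left| \Lambda g - \frac{1}{m} \sum_{j=1}^m g(\omega_j) \right| - \sup_{g \in \mathcal{G}} \left| \Lambda g - \frac{1}{m} \sum_{j=1}^m g(\omega_j) + \frac{1}{m} \left[ g(\omega_r) - g(\omega_r') \right] \right| \right|$

$\overset{(*)}{\le} \frac{1}{m} \sup_{g \in \mathcal{G}} \left| g(\omega_r) - g(\omega_r') \right|$

$\le \frac{1}{m} \sup_{g \in \mathcal{G}} \left( |g(\omega_r)| + |g(\omega_r')| \right)$

$\le \frac{1}{m} \left[ \sup_{g \in \mathcal{G}} |g(\omega_r)| + \sup_{g \in \mathcal{G}} |g(\omega_r')| \right]$

$\le \frac{1 + 1}{m} = \frac{2}{m}$.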
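
The $\frac{2}{m}$ bounded difference constant can be probed empirically. The sketch below is an illustration only: it assumes $d = 1$, the unit-bandwidth Gaussian kernel, and a finite grid `z_grid` standing in for $S_\Delta$; it resamples one coordinate $\omega_r$ at a time and records the largest observed change of $f$.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 200
z_grid = np.linspace(-2.0, 2.0, 201)          # finite stand-in for S_Delta

def f(omegas):
    """sup_z |Lambda g_z - Lambda_m g_z| for the unit Gaussian kernel, d = 1."""
    lam_g = np.exp(-0.5 * z_grid ** 2)
    lam_m_g = np.cos(np.outer(z_grid, omegas)).mean(axis=1)
    return np.max(np.abs(lam_g - lam_m_g))

omegas = rng.normal(size=m)
f_ref = f(omegas)

max_change = 0.0
for _ in range(500):
    r = rng.integers(m)                       # coordinate to replace
    perturbed = omegas.copy()
    perturbed[r] = rng.normal()               # omega_r -> omega_r'
    max_change = max(max_change, abs(f_ref - f(perturbed)))
```

In such runs `max_change` stays below `2 / m`, as the chain of inequalities predicts; the bound is a worst case, so observed changes are typically much smaller.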

SLIDE 22

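Step-2: (*) = reverse triangle inequality with sup

Lemma: Let $\mathcal{G}$ be a set of functions and $a, b : \mathcal{G} \to \mathbb{R}$ maps; then

$\left| \sup_{g \in \mathcal{G}} |a(g)| - \sup_{g \in \mathcal{G}} |a(g) + b(g)| \right| \le \sup_{g \in \mathcal{G}} |b(g)|$.

Proof: combine

$\sup_{g \in \mathcal{G}} |a(g) + b(g)| \le \sup_{g \in \mathcal{G}} \left( |a(g)| + |b(g)| \right) \le \sup_{g \in \mathcal{G}} |a(g)| + \sup_{g \in \mathcal{G}} |b(g)|$,

$\sup_{g \in \mathcal{G}} |a(g)| = \sup_{g \in \mathcal{G}} |a(g) + b(g) - b(g)| \le \sup_{g \in \mathcal{G}} |a(g) + b(g)| + \sup_{g \in \mathcal{G}} |b(g)|$

$\Rightarrow \pm \left[ \sup_{g \in \mathcal{G}} |a(g)| - \sup_{g \in \mathcal{G}} |a(g) + b(g)| \right] \le \sup_{g \in \mathcal{G}} |b(g)|$.

Our choice: $a(g) = \Lambda g - \frac{1}{m} \sum_{j=1}^m g(\omega_j)$, $b(g) = \frac{1}{m} \left[ g(\omega_r) - g(\omega_r') \right]$.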
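
A quick randomized sanity check of the lemma, as a sketch: $\mathcal{G}$ is replaced by 50 hypothetical elements, and $a$, $b$ by arbitrary real vectors indexed by them.

```python
import numpy as np

rng = np.random.default_rng(4)

worst_gap = -np.inf
for _ in range(1000):
    a = rng.normal(size=50)      # a(g) for 50 hypothetical elements g
    b = rng.normal(size=50)      # b(g) for the same elements
    lhs = abs(np.max(np.abs(a)) - np.max(np.abs(a + b)))
    rhs = np.max(np.abs(b))
    worst_gap = max(worst_gap, lhs - rhs)   # lemma says lhs - rhs <= 0
```

The lemma holds for any set $\mathcal{G}$, so `worst_gap` never exceeds zero (up to floating-point rounding).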

SLIDE 23

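Step-2

Applying McDiarmid to $f$ ($c_r = \frac{2}{m}$): with probability at least $1 - e^{-\tau}$,

$\|\Lambda - \Lambda_m\|_{\mathcal{G}} \le \mathbb{E}_{\omega_{1:m}} \|\Lambda - \Lambda_m\|_{\mathcal{G}} + \frac{\sqrt{2\tau}}{\sqrt{m}}$.

Indeed, $\sum_{r=1}^m c_r^2 = \frac{4}{m}$, so the McDiarmid bound reads $e^{-\beta^2 m / 2}$; setting it equal to $e^{-\tau}$ gives $\beta = \sqrt{2\tau / m}$. The expectation term is bounded in Step-3.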

SLIDE 27

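Step-3: bounding $\mathbb{E}_{\omega_1, \ldots, \omega_m} \|\Lambda - \Lambda_m\|_{\mathcal{G}}$

$\mathcal{G} = \{g_z : z \in S_\Delta\}$ is a separable Carathéodory family, i.e.

1. $\omega \mapsto \cos\left(\omega^T z\right)$: measurable for $\forall z \in S_\Delta$.
2. $z \mapsto \cos\left(\omega^T z\right)$: continuous for $\forall \omega$.
3. $\mathbb{R}^d$ is separable, $S_\Delta \subseteq \mathbb{R}^d$ $\Rightarrow$ $S_\Delta$: separable.

Thus, by [Steinwart and Christmann, 2008, Prop. 7.10],

$\mathbb{E}_{\omega_{1:m}} \|\Lambda - \Lambda_m\|_{\mathcal{G}} \le 2 \mathbb{E}_{\omega_{1:m}} \left[ \mathcal{R}(\mathcal{G}, \omega_{1:m}) \right]$, where $\mathcal{R}(\mathcal{G}, \omega_{1:m}) := \mathbb{E}_{\epsilon} \sup_{g \in \mathcal{G}} \left| \frac{1}{m} \sum_{j=1}^m \epsilon_j g(\omega_j) \right|$,

using the uniform boundedness of $\mathcal{G}$ ($\sup_{g \in \mathcal{G}} \|g\|_\infty \le 1$).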
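
The empirical Rademacher complexity $\mathcal{R}(\mathcal{G}, \omega_{1:m})$ can be estimated by averaging over random sign vectors $\epsilon$. The sketch below assumes $d = 1$, $\Lambda = N(0, 1)$ and a finite grid in place of $S_\Delta$; `empirical_rademacher` and its parameters are hypothetical names.

```python
import numpy as np

def empirical_rademacher(omegas, z_grid, n_draws=2000, seed=0):
    """Monte-Carlo estimate of E_eps sup_z |(1/m) sum_j eps_j cos(omega_j z)|."""
    rng = np.random.default_rng(seed)
    G = np.cos(np.outer(z_grid, omegas))                  # G[i, j] = g_{z_i}(omega_j)
    eps = rng.choice([-1.0, 1.0], size=(n_draws, omegas.size))
    sups = np.max(np.abs(eps @ G.T), axis=1) / omegas.size
    return float(sups.mean())

rng = np.random.default_rng(5)
z_grid = np.linspace(-2.0, 2.0, 101)          # finite stand-in for S_Delta
R_small = empirical_rademacher(rng.normal(size=100), z_grid)
R_large = empirical_rademacher(rng.normal(size=1600), z_grid)
```

Going from $m = 100$ to $m = 1600$ should shrink the estimate by roughly a factor of four, in line with the $\frac{1}{\sqrt{m}}$ rate derived in the next steps.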

SLIDE 31

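Step-4: bounding $\mathcal{R}$

$\mathcal{R}\left( \mathcal{G}, (\omega_j)_{j=1}^m \right) \le \frac{8\sqrt{2}}{\sqrt{m}} \int_0^{|\mathcal{G}|_{L^2(\Lambda_m)}} \sqrt{\log N(\mathcal{G}, L^2(\Lambda_m), r)} \, dr$,

where

  • $L^2(\Lambda_m) = L^2(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \Lambda_m)$, $\|g\|_{L^2(\Lambda_m)} = \sqrt{\frac{1}{m} \sum_{j=1}^m g^2(\omega_j)}$,
  • $|\mathcal{G}|_{L^2(\Lambda_m)} = \sup_{g_1, g_2 \in \mathcal{G}} \|g_1 - g_2\|_{L^2(\Lambda_m)}$,
  • $N(\mathcal{G}, L^2(\Lambda_m), r)$: $r$-covering number.

$r$-net: $S \subseteq \mathcal{G}$ such that for $\forall g \in \mathcal{G}$ $\exists s \in S$ with $\|g - s\|_{L^2(\Lambda_m)} \le r$. $N$: size of the smallest $r$-net of $\mathcal{G}$.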
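
To make the covering number concrete, the sketch below builds an $r$-net greedily in the empirical $L^2(\Lambda_m)$ metric. A greedy net is not the smallest one, so its size is only an upper bound on $N$. The setup ($d = 1$, $\Lambda = N(0, 1)$, a finite grid for $S_\Delta$) and all names are assumptions made for illustration.

```python
import numpy as np

def l2_empirical_dist(G):
    """Pairwise L2(Lambda_m) distances; G[i, j] = g_i(omega_j)."""
    diff = G[:, None, :] - G[None, :, :]
    return np.sqrt(np.mean(diff ** 2, axis=-1))

def greedy_r_net(D, r):
    """Indices of centres such that every element is within r of some centre."""
    centres = []
    uncovered = np.ones(D.shape[0], dtype=bool)
    while uncovered.any():
        c = int(np.argmax(uncovered))     # first element not yet covered
        centres.append(c)
        uncovered &= D[c] > r             # drop everything within r of c
    return centres

rng = np.random.default_rng(6)
omegas = rng.normal(size=200)
z_grid = np.linspace(-2.0, 2.0, 101)      # finite stand-in for S_Delta
G = np.cos(np.outer(z_grid, omegas))
D = l2_empirical_dist(G)

net_coarse = greedy_r_net(D, r=0.5)
net_fine = greedy_r_net(D, r=0.05)
```

Shrinking $r$ forces more centres, which is the growth that the entropy integral accumulates.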

SLIDE 33

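Step-5: bound on $|\mathcal{G}|_{L^2(\Lambda_m)}$

$|\mathcal{G}|_{L^2(\Lambda_m)} = \sup_{g_1, g_2 \in \mathcal{G}} \|g_1 - g_2\|_{L^2(\Lambda_m)} \le \sup_{g_1, g_2 \in \mathcal{G}} \left( \|g_1\|_{L^2(\Lambda_m)} + \|g_2\|_{L^2(\Lambda_m)} \right) \le \sup_{g_1 \in \mathcal{G}} \|g_1\|_{L^2(\Lambda_m)} + \sup_{g_2 \in \mathcal{G}} \|g_2\|_{L^2(\Lambda_m)} \le 2 \times 1$,

since

$\sup_{g \in \mathcal{G}} \|g\|_{L^2(\Lambda_m)} = \sup_{z \in S_\Delta} \sqrt{\frac{1}{m} \sum_{j=1}^m g_z^2(\omega_j)} = \sup_{z \in S_\Delta} \sqrt{\frac{1}{m} \sum_{j=1}^m \cos^2\left( \omega_j^T z \right)} \le \sup_{z \in S_\Delta} \sqrt{\frac{1}{m} \sum_{j=1}^m 1} = 1$.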

SLIDE 34

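Step-5: bound on $N(\mathcal{G}, L^2(\Lambda_m), r)$

Let $g_{z_1}, g_{z_2} \in \mathcal{G}$. We want to bound $\|g_{z_1} - g_{z_2}\|_{L^2(\Lambda_m)}$. One term, by the mean value theorem and the Cauchy-Schwarz inequality:

$\left| \cos\left( \omega^T z_1 \right) - \cos\left( \omega^T z_2 \right) \right| \le \left\| \nabla_z \cos\left( \omega^T z_c \right) \right\|_2 \|z_1 - z_2\|_2 = \left\| -\sin\left( \omega^T z_c \right) \omega \right\|_2 \|z_1 - z_2\|_2 \le \|\omega\|_2 \|z_1 - z_2\|_2$,

where $z_c \in (z_1, z_2)$.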

SLIDE 35

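Step-5: bound on $N(\mathcal{G}, L^2(\Lambda_m), r)$

Smooth parameterization:

$\|g_{z_1} - g_{z_2}\|_{L^2(\Lambda_m)} \le \sqrt{\frac{1}{m} \sum_{j=1}^m \left( \|\omega_j\|_2 \|z_1 - z_2\|_2 \right)^2} = \|z_1 - z_2\|_2 \underbrace{\sqrt{\frac{1}{m} \sum_{j=1}^m \|\omega_j\|_2^2}}_{=: A}$.

$r$-net on $(S_\Delta, \|\cdot\|_2)$ $\Rightarrow$ $r' = Ar$-net on $(\mathcal{G}, L^2(\Lambda_m))$. In other words,

$N\left( \mathcal{G}, L^2(\Lambda_m), r \right) \le N\left( S_\Delta, \|\cdot\|_2, \frac{r}{A} \right)$.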
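
The Lipschitz-type bound $\|g_{z_1} - g_{z_2}\|_{L^2(\Lambda_m)} \le A \|z_1 - z_2\|_2$ can be checked on random pairs. This is a sketch with assumed settings ($d = 3$, $\Lambda = N(0, I)$, $z$ drawn uniformly from $[-1, 1]^d$); the names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
d, m = 3, 400
omegas = rng.normal(size=(m, d))                      # Lambda = N(0, I)
# A = sqrt((1/m) * sum_j ||omega_j||_2^2)
A = np.sqrt(np.mean(np.sum(omegas ** 2, axis=1)))

def emp_l2_dist(z1, z2):
    """||g_{z1} - g_{z2}|| in L2(Lambda_m), with g_z(omega) = cos(omega^T z)."""
    diff = np.cos(omegas @ z1) - np.cos(omegas @ z2)
    return np.sqrt(np.mean(diff ** 2))

worst_ratio = 0.0
for _ in range(2000):
    z1 = rng.uniform(-1.0, 1.0, size=d)
    z2 = rng.uniform(-1.0, 1.0, size=d)
    ratio = emp_l2_dist(z1, z2) / (A * np.linalg.norm(z1 - z2))
    worst_ratio = max(worst_ratio, ratio)
```

A ratio of at most one means that any $\frac{r}{A}$-net of the parameter set $S_\Delta$ maps to an $r$-net of $\mathcal{G}$.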

SLIDE 36

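Step-5: bound on $N(\mathcal{G}, L^2(\Lambda_m), r)$

Note that $S_\Delta \subseteq B_{\|\cdot\|_2}\left( t, \frac{|S_\Delta|}{2} \right)$ for some $t \in \mathbb{R}^d$.

$N\left( B_{\|\cdot\|_2}(s, R), \|\cdot\|_2, \epsilon \right) \le \left( \frac{2R}{\epsilon} + 1 \right)^d$ for $\forall s \in \mathbb{R}^d$.

Thus $N\left( \mathcal{G}, L^2(\Lambda_m), r \right) \le \left( \frac{2|S|A}{r} + 1 \right)^d$ by $|S_\Delta| \le 2|S|$ and the compactness of $S_\Delta$.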

SLIDE 38

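Step-5: bound on $\mathcal{R}$

Combining the obtained results

  • $\mathcal{R}(\mathcal{G}, \omega_{1:m}) \le \frac{8\sqrt{2}}{\sqrt{m}} \int_0^{|\mathcal{G}|_{L^2(\Lambda_m)}} \sqrt{\log N(\mathcal{G}, L^2(\Lambda_m), r)} \, dr$,
  • $|\mathcal{G}|_{L^2(\Lambda_m)} \le 2$,
  • $\log N\left( \mathcal{G}, L^2(\Lambda_m), r \right) \le d \log\left( \frac{2|S|A}{r} + 1 \right)$,

we have ($r \le 2$)

$\mathcal{R}(\mathcal{G}, \omega_{1:m}) \le \frac{8\sqrt{2d}}{\sqrt{m}} \int_0^2 \sqrt{\log \frac{2|S|A + 2}{r}} \, dr$.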

SLIDE 39

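Step-5: bound on $\mathcal{R}$

Using $|S|A + 1 \le (|S| + 1)(A + 1)$:

$\mathcal{R}(\mathcal{G}, \omega_{1:m}) \le \frac{8\sqrt{2d}}{\sqrt{m}} \int_0^2 \sqrt{\log \frac{2|S|A + 2}{r}} \, dr$

$\le \frac{8\sqrt{2d}}{\sqrt{m}} \left[ \int_0^2 \sqrt{\log \frac{2(|S| + 1)}{r}} \, dr + 2\sqrt{\log(A + 1)} \right]$

$= \frac{16\sqrt{2d}}{\sqrt{m}} \left[ \int_0^1 \sqrt{\log \frac{|S| + 1}{r}} \, dr + \sqrt{\log(A + 1)} \right]$.

Applying $\int_0^1 \sqrt{\log \frac{a}{r}} \, dr \le \sqrt{\log a} + \frac{1}{2\sqrt{\log a}}$ ($a > 1$)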


Step-5: bound on R

we get R (G, ω1:m) ≤ (1) 16 √ 2d √m

  • log(|S| + 1) +

1 2

  • log(|S| + 1)

+

  • log(A + 1)
  • .

By the Jensen inequality Eω1:m

  • log(A + 1) ≤
  • Eω1:m log(A + 1) ≤
  • log(Eω1:mA + 1),

Eω1:mA ≤

  • 1

m

m

  • j=1

Eωj

  • ωj2

2

  • =: σ. ⇒

Eω1:mR (G, ω1:m) ≤ (1), but with A → σ.
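
The integral inequality $\int_0^1 \sqrt{\log \frac{a}{r}} \, dr \le \sqrt{\log a} + \frac{1}{2\sqrt{\log a}}$ used to reach (1) can be checked numerically. The sketch below uses a midpoint rule, which is an approximation chosen here for simplicity (the integrand has an integrable singularity at $r = 0$); the function names are hypothetical.

```python
import numpy as np

def entropy_integral(a, n=200_000):
    """Midpoint-rule value of int_0^1 sqrt(log(a / r)) dr for a > 1."""
    r = (np.arange(n) + 0.5) / n          # midpoints of n equal cells in (0, 1)
    return float(np.mean(np.sqrt(np.log(a / r))))

def closed_form_bound(a):
    """sqrt(log a) + 1 / (2 sqrt(log a)), the right-hand side of the inequality."""
    return float(np.sqrt(np.log(a)) + 0.5 / np.sqrt(np.log(a)))

checks = {a: (entropy_integral(a), closed_form_bound(a))
          for a in (1.5, 2.0, 10.0, 1e3)}
```

The inequality follows from $\sqrt{x + t} \le \sqrt{x} + \frac{t}{2\sqrt{x}}$ with $x = \log a$, $t = \log \frac{1}{r}$, together with $\int_0^1 \log \frac{1}{r} \, dr = 1$.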

SLIDE 46

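Step-6: putting together

Result: Let $k$ be a continuous, shift-invariant kernel; for any $\tau > 0$ and compact set $S \neq \emptyset$,

$\Lambda^m\left( \sup_{x,y \in S} |\hat{k}(x, y) - k(x, y)| \ge \underbrace{\frac{h(d, |S|, \sigma) + \sqrt{2\tau}}{\sqrt{m}}}_{=: \epsilon} \right) \le e^{-\tau}$,

$h(d, |S|, \sigma) := 32\sqrt{2d \log(|S| + 1)} + 32\sqrt{2d \log(\sigma + 1)} + 16\sqrt{\frac{2d}{\log(|S| + 1)}}$.

Equivalently,

$\Lambda^m\left( \|\hat{k} - k\|_S \ge \epsilon \right) \le e^{-\frac{\left[ \epsilon\sqrt{m} - h(d, |S|, \sigma) \right]^2}{2}}$.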
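
The result can be turned into numbers. The sketch below evaluates $h$ as stated on the slide and the corresponding error level $\epsilon$; the function names and the example values ($d = 2$, $|S| = 1$, $\sigma = \sqrt{2}$, which corresponds to the unit-bandwidth Gaussian kernel in $\mathbb{R}^2$) are assumptions for illustration.

```python
import math

def h(d, diam, sigma):
    """h(d, |S|, sigma) as stated on the slide; diam = |S| > 0."""
    log_s = math.log(diam + 1.0)
    return (32.0 * math.sqrt(2.0 * d * log_s)
            + 32.0 * math.sqrt(2.0 * d * math.log(sigma + 1.0))
            + 16.0 * math.sqrt(2.0 * d / log_s))

def uniform_error_level(d, diam, sigma, m, tau):
    """epsilon = (h + sqrt(2 tau)) / sqrt(m), exceeded with probability <= exp(-tau)."""
    return (h(d, diam, sigma) + math.sqrt(2.0 * tau)) / math.sqrt(m)

def tail_bound(eps, d, diam, sigma, m):
    """exp(-(eps sqrt(m) - h)^2 / 2); trivial (1.0) when eps sqrt(m) < h."""
    gap = eps * math.sqrt(m) - h(d, diam, sigma)
    return 1.0 if gap < 0 else math.exp(-0.5 * gap ** 2)

d, diam, sigma, tau = 2, 1.0, math.sqrt(2.0), 3.0
eps_small_m = uniform_error_level(d, diam, sigma, m=10**4, tau=tau)
eps_large_m = uniform_error_level(d, diam, sigma, m=10**6, tau=tau)
```

Multiplying $m$ by 100 divides $\epsilon$ by 10, the $\frac{1}{\sqrt{m}}$ rate. The explicit constants are large, so the bound is informative mainly for its scaling in $m$, $d$ and $|S|$.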

SLIDE 49

Discussion (Borel-Cantelli lemma)

A.s. convergence on compact sets: $\hat{k} \xrightarrow{m \to \infty} k$ at rate $\sqrt{\frac{\log |S|}{m}}$.

Growing diameter: $\frac{\log |S_m|}{m} \xrightarrow{m \to \infty} 0$ is enough (i.e., $|S_m| = e^{o(m)}$) $\leftrightarrow$ Old: $|S_m| = o\left( \sqrt{m / \log m} \right)$.

Specifically: this is an asymptotically optimal result [Csörgő and Totik, 1983, Theorem 2] (if $\psi$ vanishes at $\infty$); if $|S_m|$ grew at a faster rate, even convergence in probability would fail.

SLIDE 50

Direct consequence: $L^r$ guarantee ($1 < r$)

Idea: Note that

$\|\hat{k} - k\|_{L^r(S)} = \left( \int_S \int_S |\hat{k}(x, y) - k(x, y)|^r \, dx \, dy \right)^{1/r} \le \|\hat{k} - k\|_{S \times S} \, \mathrm{vol}^{2/r}(S)$.

$\mathrm{vol}(S) \le \mathrm{vol}(B)$, where $B := \left\{ x \in \mathbb{R}^d : \|x\|_2 \le \frac{|S|}{2} \right\}$, $\mathrm{vol}(B) = \frac{\pi^{d/2} |S|^d}{2^d \Gamma\left( \frac{d}{2} + 1 \right)}$.

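
The volume factor in the $L^r$ consequence is easy to evaluate. A minimal sketch, with hypothetical function names, of $\mathrm{vol}(B)$ for the ball of diameter $|S|$ and of the multiplier $\mathrm{vol}(B)^{2/r}$:

```python
import math

def ball_volume(d, diam):
    """Volume of the Euclidean ball of diameter `diam` in R^d."""
    return math.pi ** (d / 2) * diam ** d / (2 ** d * math.gamma(d / 2 + 1))

def lr_factor(d, diam, r):
    """vol(B)^(2/r): multiplies the uniform error to bound the L^r(S) error."""
    return ball_volume(d, diam) ** (2.0 / r)

area_unit_disc = ball_volume(2, 2.0)      # disc of radius 1
vol_unit_ball = ball_volume(3, 2.0)       # ball of radius 1 in R^3
```

For fixed $d$ the factor grows like $|S|^{2d/r}$, which is where the polynomial dependence on $|S|$ in the $L^r$ bound comes from.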
SLIDE 51

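$L^r$ large deviation inequality

Under the previous assumptions:

$\Lambda^m\left( \|\hat{k} - k\|_{L^r(S)} \ge \left( \frac{\pi^{d/2} |S|^d}{2^d \Gamma\left( \frac{d}{2} + 1 \right)} \right)^{2/r} \frac{h(d, |S|, \sigma) + \sqrt{2\tau}}{\sqrt{m}} \right) \le e^{-\tau}$.

In other words, $\|\hat{k} - k\|_{L^r(S)} = O_{a.s.}\left( m^{-1/2} |S|^{2d/r} \sqrt{\log |S|} \right)$.

For $2 \le r$: direct $L^r$ proof $\Rightarrow$ the $\sqrt{\log(|S|)}$ factor can be discarded.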

SLIDE 52

Kernel derivatives

If supp($\Lambda$) is bounded:

  • the proof for $k$ can be extended (the $L^r$ one as well), but this does not cover the Gaussian kernel.

[Rahimi and Recht, 2007]'s proof:

  • Hoeffding inequality (boundedness!) + Lipschitzness.
  • Bernstein + Lipschitzness: handles $\partial^{p,q} k$ with moment constraints on $\Lambda$ (example: Gaussian kernel), at slightly worse rates.

SLIDE 53

Conclusion

  • Kernel + derivative approximations.
  • Performance: uniform, $L^r$.
  • Detailed finite-sample analysis, optimal rates.
  • Paper (submitted to NIPS):
      RFF: http://arxiv.org/abs/1506.02155,
      infinite-dimensional exponential family fitting: http://arxiv.org/abs/1506.02564.

SLIDE 54

Thank you for the attention!

Acknowledgments: This work was supported by the Gatsby Charitable Foundation.

SLIDE 55

Csörgő, S. and Totik, V. (1983). On how long interval is the empirical characteristic function uniformly consistent? Acta Sci. Math. (Szeged), 45:141–149.

Rahimi, A. and Recht, B. (2007). Random features for large-scale kernel machines. In Neural Information Processing Systems (NIPS), pages 1177–1184.

Rosasco, L., Santoro, M., Mosci, S., Verri, A., and Villa, S. (2010). A regularization approach to nonlinear variable selection. JMLR W&CP – International Conference on Artificial Intelligence and Statistics (AISTATS), 9:653–660.

Rosasco, L., Villa, S., Mosci, S., Santoro, M., and Verri, A. (2013). Nonparametric sparsity and regularization. Journal of Machine Learning Research, 14:1665–1714.

Shi, L., Guo, X., and Zhou, D.-X. (2010). Hermite learning with gradient data. Journal of Computational and Applied Mathematics, 233:3046–3059.

Sriperumbudur, B. K., Fukumizu, K., Gretton, A., Hyvärinen, A., and Kumar, R. (2014). Density estimation in infinite dimensional exponential families. Technical report. http://arxiv.org/pdf/1312.3516.pdf.

Steinwart, I. and Christmann, A. (2008). Support Vector Machines. Springer.

Sutherland, D. and Schneider, J. (2015). On the error of random Fourier features. In Conference on Uncertainty in Artificial Intelligence (UAI).

Ying, Y., Wu, Q., and Campbell, C. (2012). Learning the coordinate gradients. Advances in Computational Mathematics, 37:355–378.

Zhou, D.-X. (2008). Derivative reproducing properties for kernel methods in learning theory. Journal of Computational and Applied Mathematics, 220:456–463.

SLIDE 58

Support of a measure

Ingredients:

  • $(X, \tau)$: topological space with a countable basis.
  • $\mathcal{B} = \sigma(\tau)$: sigma-algebra generated by $\tau$.
  • $\Lambda$: measure on $(X, \mathcal{B})$.

Then $\mathrm{supp}(\Lambda) = X \setminus \bigcup \{A \in \tau : \Lambda(A) = 0\}$, i.e., the complement of the union of all open $\Lambda$-null sets. Our choice: $X = \mathbb{R}^d$.
