Statistical and Computational Trade-Offs in Kernel K-Means

SLIDE 1

NeurIPS, December 2018

Statistical and Computational Trade-Offs in Kernel K-Means

Daniele Calandriello, Lorenzo Rosasco
LCSL - IIT/MIT and Università di Genova

SLIDE 3

K-Means

Given n points, partition them into k clusters.

$$C = \arg\min_{[c_1,\dots,c_k]} \frac{1}{n} \sum_{i=1}^{n} \min_{j=1,\dots,k} \|x_i - c_j\|^2$$

Problem: only linear separation
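To make the objective concrete, here is a minimal NumPy sketch that evaluates the cost above and runs plain Lloyd iterations. The function names (`kmeans_cost`, `lloyd`) and the toy data are illustrative assumptions, not part of the talk, and Lloyd's algorithm is only a local heuristic for this objective.

```python
import numpy as np

def kmeans_cost(X, centroids):
    """Average squared distance from each point to its nearest centroid."""
    # Pairwise squared distances, shape (n, k).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def lloyd(X, init_centroids, n_iter=20):
    """Plain Lloyd iterations (hypothetical helper); finds a local optimum only."""
    centroids = np.array(init_centroids, dtype=float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)          # assign each point to nearest centroid
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members) > 0:            # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Toy data (assumption): two well-separated groups on the real line.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
centroids, labels = lloyd(X, init_centroids=X[[0, 2]])
cost = kmeans_cost(X, centroids)
```

Because each point is assigned to the nearest centroid in Euclidean distance, the boundary between any two clusters is a hyperplane, which is the "only linear separation" limitation.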

SLIDE 5

Kernel K-Means

Given n points, partition them into k clusters.

$$C = \arg\min_{[c_1,\dots,c_k]} \frac{1}{n} \sum_{i=1}^{n} \min_{j=1,\dots,k} \|\phi(x_i) - c_j\|^2$$

Feature map $\phi(\cdot) : \mathbb{R}^d \to \mathbb{R}^D$ (e.g., $\phi([x, y]) = [x, y, x^2 + y^2]$)
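A small sketch of why the example feature map helps, assuming toy data on two concentric circles (the data and the threshold are illustrative assumptions): after the map, the third coordinate is the squared radius, so a plane separates the rings.

```python
import numpy as np

def phi(points):
    """Map [x, y] to [x, y, x^2 + y^2], the example feature map from the slide."""
    x, y = points[:, 0], points[:, 1]
    return np.column_stack([x, y, x**2 + y**2])

# Hypothetical toy data: points on two concentric circles (radius 1 and 3).
angles = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
inner = 1.0 * circle
outer = 3.0 * circle

# In the embedded space the third coordinate equals the squared radius,
# so the plane z = 5 separates the two rings.
inner_z = phi(inner)[:, 2]
outer_z = phi(outer)[:, 2]
separable = inner_z.max() < 5.0 < outer_z.min()
```

No line in the original plane separates the two rings, since the inner ring sits inside the outer one.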

SLIDE 8

Computing Kernel K-Means

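$$C = \arg\min_{[c_1,\dots,c_k]} \frac{1}{n} \sum_{i=1}^{n} \min_{j=1,\dots,k} \|\phi(x_i) - c_j\|^2$$

$$\|\phi(x_i) - \phi(x_j)\|^2 = \|\phi(x_i)\|^2 + \|\phi(x_j)\|^2 - 2\,\underbrace{\phi(x_i)^{\mathsf{T}} \phi(x_j)}_{K(x_i, x_j)\ \text{(kernel)}}$$

Kernel matrix $K = [K(x_i, x_j)]_{i,j=1}^{n}$ (e.g., entry $K(x_3, x_1)$)

Space: $n^2$, Construct $K$: $n^2$, Iter. time: $n^2$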
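The identity above means distances in feature space can be computed from $K$ alone. The sketch below checks this on random data, assuming the kernel induced by the earlier example map, $K(a, b) = a^{\mathsf{T}} b + \|a\|^2 \|b\|^2$; the helper names and the fixed cluster labels are illustrative assumptions.

```python
import numpy as np

def phi(points):
    """Explicit feature map [x, y] -> [x, y, x^2 + y^2], used only for checking."""
    x, y = points[:, 0], points[:, 1]
    return np.column_stack([x, y, x**2 + y**2])

def kernel(A, B):
    """K(a, b) = a.b + |a|^2 |b|^2, the inner product induced by phi above."""
    return A @ B.T + np.outer((A**2).sum(axis=1), (B**2).sum(axis=1))

def feature_space_sq_dists(K):
    """Squared distances between embedded points using only the n x n matrix K."""
    diag = np.diag(K)
    return diag[:, None] + diag[None, :] - 2.0 * K

def sq_dists_to_centroids(K, labels, k):
    """Squared distance from each phi(x_i) to each cluster mean, from K alone.

    Assumes every cluster is non-empty. Touches O(n^2) entries of K per pass.
    """
    out = np.empty((K.shape[0], k))
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        # |phi(x_i) - c_j|^2 = K_ii - 2 mean_l K_il + mean_{l,l'} K_ll'
        out[:, j] = (np.diag(K) - 2.0 * K[:, idx].mean(axis=1)
                     + K[np.ix_(idx, idx)].mean())
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
K = kernel(X, X)                       # n^2 kernel evaluations and n^2 memory
F = phi(X)
explicit = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
labels = np.array([0, 0, 0, 1, 1, 1])  # arbitrary assignment for the check
D = sq_dists_to_centroids(K, labels, k=2)
```

For kernels whose feature map is infinite-dimensional (a Gaussian kernel, for instance), only the kernel-based route is available, which is where the $n^2$ costs come from.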

SLIDE 13

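K-Means with Uniform Nyström Embedding

$$C = \arg\min_{[c_1,\dots,c_k]} \frac{1}{n} \sum_{i=1}^{n} \min_{j=1,\dots,k} \|\phi_m(x_i) - c_j\|^2$$

$$\|\phi_m(x_i) - \phi_m(x_j)\|^2 = \|\phi_m(x_i)\|^2 + \|\phi_m(x_j)\|^2 - 2\,\underbrace{\phi_m(x_i)^{\mathsf{T}} \phi_m(x_j)}_{K_m(x_i, x_j)\ \text{(Nystr\"om approximation)}}$$

Space: $nm$ (vs. $n^2$), Construct $K_m$: $nm^2$ (vs. $n^2$), Iter. time: $nmk$ (vs. $n^2$)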

How to choose m for optimal statistical vs computational trade-off?
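One common way to build such an embedding is sketched below: sample $m$ landmark points uniformly, then set $\Phi_m = K_{nm} K_{mm}^{-1/2}$ so that $\Phi_m \Phi_m^{\mathsf{T}} = K_{nm} K_{mm}^{+} K_{mn}$. This is a sketch under assumptions, not necessarily the authors' exact construction: the Gaussian kernel, the bandwidth `sigma`, the eigenvalue cutoff, and all function names are illustrative choices.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between rows of A and rows of B (assumed kernel)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def nystrom_embedding(X, m, rng, sigma=1.0):
    """Return Phi (n x m) with Phi @ Phi.T = K_nm pinv(K_mm) K_mn."""
    n = X.shape[0]
    landmarks = rng.choice(n, size=m, replace=False)  # uniform sampling
    K_nm = gaussian_kernel(X, X[landmarks], sigma)    # n*m kernel evaluations
    K_mm = K_nm[landmarks]                            # m x m landmark block
    w, U = np.linalg.eigh(K_mm)
    keep = w > 1e-10 * w.max()                        # drop numerically null directions
    inv_sqrt = np.where(keep, 1.0 / np.sqrt(np.where(keep, w, 1.0)), 0.0)
    return K_nm @ (U * inv_sqrt)                      # O(n m^2) matrix product

def lloyd(Z, k, rng, n_iter=20):
    """Ordinary k-means on the embedded points; O(n m k) per iteration."""
    centroids = Z[rng.choice(Z.shape[0], size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):                   # skip empty clusters
                centroids[j] = Z[labels == j].mean(axis=0)
    return centroids, labels

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))                          # toy data (assumption)
Phi = nystrom_embedding(X, m=8, rng=rng)
centroids, labels = lloyd(Phi, k=3, rng=rng)
```

Once `Phi` is built, the full $n \times n$ kernel matrix is never formed; with $m = n$ the product `Phi @ Phi.T` recovers $K$ up to numerical error.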

SLIDE 17

Main result

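Let $x_i \sim \mu$ and the test error $E(C) = \mathbb{E}_{x \sim \mu}\left[\min_{j=1,\dots,k} \|\phi(x) - c_j\|^2\right]$

Theorem

$$E(C) \le \underbrace{O(k/\sqrt{n})}_{\text{statistical error}} + \underbrace{O(k/m)}_{\text{computational error}}$$

$m = \sqrt{n}$ is sufficient for the $k/\sqrt{n}$ rate! Previous results require $m = n$.

                    Space          Construct K / K_m    Iter. time
Kernel k-means      $n^2$          $n^2$                $n^2$
Nyström k-means     $n\sqrt{n}$    $n^2$                $n\sqrt{n}\,k$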
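To get a feel for the table, the following sketch plugs in $n = 60{,}000$ (the MNIST-60k size used in the experiment) and $m = \lfloor\sqrt{n}\rfloor$. The counts are leading-order terms with constants dropped, and $k = 10$ is an assumption (one cluster per digit), not a value stated on the slides.

```python
import math

def costs(n, k, m=None):
    """Leading-order counts from the table (constants dropped).

    m=None means exact kernel k-means; otherwise Nystrom with m landmarks.
    """
    if m is None:
        return {"space": n**2, "construct": n**2, "iter": n**2}
    return {"space": n * m, "construct": n * m**2, "iter": n * m * k}

n, k = 60_000, 10        # k = 10 is an assumption (ten digit classes)
m = math.isqrt(n)        # floor(sqrt(60000)) = 244

exact = costs(n, k)
nystrom = costs(n, k, m)
space_ratio = exact["space"] / nystrom["space"]   # equals n / m
```

With $m \approx \sqrt{n}$, space and per-iteration time shrink by roughly a factor of $\sqrt{n}$ (up to $k$), while the construction cost $nm^2$ stays close to $n^2$, matching the last row of the table.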

SLIDE 18

MNIST-60k: test cost vs embedding size m

[Figure: test cost $E(C)$ versus embedding size $m$, with $\sqrt{n}$ marked.]

SLIDE 20

Recap

  • Improved statistical vs computational trade-off for k-means
  • First computation saving with no loss of statistical accuracy
  • Similar results for k-means++ (efficient)
  • Open question: fast $O(k/n)$ rate?

Taking suggestions at poster #129
