NeurIPS, December 2018
Statistical and Computational Trade-Offs in Kernel K-Means
Daniele Calandriello, Lorenzo Rosasco
LCSL - IIT/MIT and Università di Genova
K-Means
Given $n$ points $x_1, \dots, x_n \in \mathbb{R}^d$, partition them into $k$ clusters.
$$\widehat{\mathcal{C}} = \arg\min_{\mathcal{C} = [c_1, \dots, c_k]} \frac{1}{n} \sum_{i=1}^{n} \min_{j=1,\dots,k} \|x_i - c_j\|^2$$
Problem: k-means can only separate clusters linearly (the boundary between two clusters is a hyperplane).
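A minimal pure-Python sketch of Lloyd's algorithm, a standard heuristic for the k-means objective above (the slides do not name a solver; `lloyd`, `assign`, and `kmeans_cost` are hypothetical helpers). Each iteration assigns every point to its nearest center and moves each center to the mean of its cluster. The boundary between two clusters is the set of points equidistant from their centers, which is a hyperplane: this is the linear-separation limitation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """The k-means objective (1/n) * sum_i min_j ||x_i - c_j||^2."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]


def lloyd(points: Sequence[Point], init: Sequence[Point], iters: int = 20) -> Tuple[List[Point], List[int]]:
    """Lloyd's heuristic: alternate assignment and mean-update steps."""
    centers = [tuple(c) for c in init]
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers: List[Point] = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # move the center to the mean of its cluster
                new_centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:  # keep the center of an empty cluster where it is
                new_centers.append(c)
        if new_centers == centers:  # converged: assignments will not change
            break
        centers = new_centers
    return centers, assign(points, centers)


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
centers, labels = lloyd(pts, [(0.0, 0.0), (10.0, 0.0)])
print(centers, labels, kmeans_cost(pts, centers))  # [(0.0, 0.5), (10.0, 0.5)] [0, 0, 1, 1] 0.25
```

Lloyd's algorithm only reaches a local minimum that depends on the initial centers; it is shown here to make the objective concrete, not as the method analyzed in the talk.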
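Kernel K-Means
Given $n$ points, partition them into $k$ clusters in feature space.
$$\widehat{\mathcal{C}} = \arg\min_{\mathcal{C} = [c_1, \dots, c_k]} \frac{1}{n} \sum_{i=1}^{n} \min_{j=1,\dots,k} \|\phi(x_i) - c_j\|^2$$
Feature map $\phi(\cdot) : \mathbb{R}^d \to \mathbb{R}^D$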
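For example, $\phi([x, y]) = [x, y, x^2 + y^2]$ maps points in the plane to $\mathbb{R}^3$, where clusters that are not linearly separable in $\mathbb{R}^2$ (such as concentric rings) can become linearly separable.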
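A small sketch of why the example map helps, using two concentric rings of radius 1 and 3 (the data and the threshold 5 are illustrative choices, not from the slides). No line in the plane separates the rings, but after the lift the third coordinate $x^2 + y^2$ equals 1 on the inner ring and 9 on the outer ring, so the plane $z = 5$ in $\mathbb{R}^3$ separates them.

```python
import math
from typing import List, Tuple


def phi(p: Tuple[float, float]) -> Tuple[float, float, float]:
    """Example feature map phi([x, y]) = [x, y, x^2 + y^2] from the slide."""
    x, y = p
    return (x, y, x * x + y * y)


def ring(radius: float, n: int) -> List[Tuple[float, float]]:
    """n points evenly spaced on a circle of the given radius around the origin."""
    return [(radius * math.cos(2 * math.pi * t / n), radius * math.sin(2 * math.pi * t / n)) for t in range(n)]


inner, outer = ring(1.0, 8), ring(3.0, 8)
# Third lifted coordinate of every point: about 1 on the inner ring, about 9 on the outer one.
inner_z = [phi(p)[2] for p in inner]
outer_z = [phi(p)[2] for p in outer]
# The plane z = 5 is a linear decision boundary between the lifted rings.
print(max(inner_z) < 5 < min(outer_z))  # True
```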
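Computing Kernel K-Means
$$\widehat{\mathcal{C}} = \arg\min_{\mathcal{C} = [c_1, \dots, c_k]} \frac{1}{n} \sum_{i=1}^{n} \min_{j=1,\dots,k} \|\phi(x_i) - c_j\|^2$$
Distances in feature space only require inner products, which are given by the kernel $K$ (in particular $\|\phi(x_i)\|^2 = K(x_i, x_i)$):
$$\|\phi(x_i) - \phi(x_j)\|^2 = \|\phi(x_i)\|^2 + \|\phi(x_j)\|^2 - 2\,\underbrace{\phi(x_i)^\top \phi(x_j)}_{\text{kernel } K(x_i, x_j)}$$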
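A quick numerical check of the identity above, assuming the example map $\phi([x, y]) = [x, y, x^2 + y^2]$ from the previous slide. Its kernel can be evaluated without forming $\phi$: $K(a, b) = a^\top b + \|a\|^2 \|b\|^2$. The helper names `kernel`, `feature_dist2`, and `kernel_dist2` are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def phi(p: Point) -> Tuple[float, float, float]:
    """Explicit feature map phi([x, y]) = [x, y, x^2 + y^2]."""
    x, y = p
    return (x, y, x * x + y * y)


def kernel(a: Point, b: Point) -> float:
    """K(a, b) = phi(a)^T phi(b), computed directly in the input space."""
    sq_a = a[0] ** 2 + a[1] ** 2
    sq_b = b[0] ** 2 + b[1] ** 2
    return a[0] * b[0] + a[1] * b[1] + sq_a * sq_b


def feature_dist2(a: Point, b: Point) -> float:
    """||phi(a) - phi(b)||^2 computed explicitly in feature space."""
    return sum((u - v) ** 2 for u, v in zip(phi(a), phi(b)))


def kernel_dist2(a: Point, b: Point) -> float:
    """Same distance via K(a, a) + K(b, b) - 2 K(a, b), never forming phi."""
    return kernel(a, a) + kernel(b, b) - 2 * kernel(a, b)


a, b = (1.0, 2.0), (-0.5, 3.0)
print(feature_dist2(a, b), kernel_dist2(a, b))  # 21.3125 21.3125
```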
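Kernel matrix: $\mathbf{K} \in \mathbb{R}^{n \times n}$ with entries $\mathbf{K}_{ij} = K(x_i, x_j)$ (e.g., entry $(3, 1)$ is $K(x_3, x_1)$).
Space: $n^2$, construct $\mathbf{K}$: $n^2$, time per iteration: $n^2$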
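A minimal sketch of kernel k-means that touches the data only through the $n \times n$ kernel matrix, which is where the $n^2$ space and per-iteration costs come from. It uses the standard fact that each center is the mean of the feature vectors in its cluster $S_j$, so $\|\phi(x_i) - c_j\|^2 = \mathbf{K}_{ii} - \frac{2}{|S_j|}\sum_{l \in S_j} \mathbf{K}_{il} + \frac{1}{|S_j|^2}\sum_{l, l' \in S_j} \mathbf{K}_{ll'}$. The Gaussian kernel, its bandwidth, the initial labels, and the function names are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence

Vec = Sequence[float]


def gaussian(a: Vec, b: Vec, sigma: float = 1.0) -> float:
    """Gaussian kernel exp(-||a - b||^2 / (2 sigma^2))."""
    return math.exp(-sum((u - v) ** 2 for u, v in zip(a, b)) / (2 * sigma ** 2))


def kernel_matrix(xs: Sequence[Vec], k: Callable[[Vec, Vec], float]) -> List[List[float]]:
    """Full n x n kernel matrix: n^2 space and n^2 kernel evaluations."""
    return [[k(a, b) for b in xs] for a in xs]


def kernel_kmeans(K: List[List[float]], labels: List[int], k: int, iters: int = 20) -> List[int]:
    """Lloyd-style kernel k-means on a precomputed kernel matrix K."""
    n = len(K)
    labels = list(labels)
    for _ in range(iters):
        clusters = [[i for i in range(n) if labels[i] == j] for j in range(k)]
        # ||c_j||^2 = (1/|S_j|^2) sum_{l, l' in S_j} K[l][l'] (at most n^2 terms in total).
        center_sq = [sum(K[l][q] for l in S for q in S) / len(S) ** 2 if S else 0.0 for S in clusters]
        new_labels = []
        for i in range(n):
            best_j, best_d = labels[i], math.inf
            for j, S in enumerate(clusters):
                if not S:
                    continue  # an empty cluster has no center to compare against
                cross = sum(K[i][l] for l in S) / len(S)  # phi(x_i)^T c_j
                d = K[i][i] - 2 * cross + center_sq[j]
                if d < best_d:
                    best_j, best_d = j, d
            new_labels.append(best_j)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


xs = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
K = kernel_matrix(xs, gaussian)
print(kernel_kmeans(K, [0, 0, 0, 1], k=2))  # [0, 0, 1, 1]
```

The cross terms $\mathbf{K}_{il}$ are summed over all $n$ points for every $i$, so each iteration reads the whole matrix once.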
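K-Means with Uniform Nyström Embedding
$$\widehat{\mathcal{C}} = \arg\min_{\mathcal{C} = [c_1, \dots, c_k]} \frac{1}{n} \sum_{i=1}^{n} \min_{j=1,\dots,k} \|\phi_m(x_i) - c_j\|^2$$
Here $\phi_m$ is the Nyström embedding built from $m$ points sampled uniformly at random from the data.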
$$\|\phi_m(x_i) - \phi_m(x_j)\|^2 = \|\phi_m(x_i)\|^2 + \|\phi_m(x_j)\|^2 - 2\,\underbrace{\phi_m(x_i)^\top \phi_m(x_j)}_{\text{Nyström approximation } K_m(x_i, x_j)}$$
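A minimal sketch of a uniform Nyström embedding, assuming the standard construction: sample $m$ landmarks $z_1, \dots, z_m$ uniformly from the data (without replacement here), factor the landmark kernel matrix $\mathbf{K}_{mm} = LL^\top$ by Cholesky, and set $\phi_m(x) = L^{-1} k_m(x)$ with $k_m(x) = [K(z_1, x), \dots, K(z_m, x)]^\top$. Then $\phi_m(x)^\top \phi_m(x') = k_m(x)^\top \mathbf{K}_{mm}^{-1} k_m(x')$, the Nyström approximation $K_m(x, x')$ (usually written with the pseudo-inverse; a tiny diagonal jitter keeps $\mathbf{K}_{mm}$ invertible). The Gaussian kernel, the jitter, and all function names are assumptions, not taken from the slides.

```python
import math
import random
from typing import List, Sequence

Vec = Sequence[float]


def gaussian(a: Vec, b: Vec, sigma: float = 1.0) -> float:
    """Gaussian kernel exp(-||a - b||^2 / (2 sigma^2))."""
    return math.exp(-sum((u - v) ** 2 for u, v in zip(a, b)) / (2 * sigma ** 2))


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][p] * L[j][p] for p in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L: List[List[float]], b: List[float]) -> List[float]:
    """Solve L y = b for lower-triangular L (O(m^2) operations)."""
    y: List[float] = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][p] * y[p] for p in range(i))) / L[i][i])
    return y


def nystrom_embedding(xs: Sequence[Vec], m: int, seed: int = 0, jitter: float = 1e-10) -> List[List[float]]:
    """Map every point to R^m using m landmarks sampled uniformly without replacement."""
    rng = random.Random(seed)
    landmarks = [xs[i] for i in rng.sample(range(len(xs)), m)]
    K_mm = [[gaussian(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(landmarks)]
            for i, a in enumerate(landmarks)]
    L = cholesky(K_mm)  # O(m^3), done once
    # Per point: m kernel evaluations plus an O(m^2) solve, so O(n m^2) overall.
    return [forward_solve(L, [gaussian(z, x) for z in landmarks]) for x in xs]


xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0), (3.0, 1.0)]
emb = nystrom_embedding(xs, m=2)
print([len(v) for v in emb])  # [2, 2, 2, 2, 2]
```

With $m = n$ the approximation reproduces the exact kernel, which is a handy sanity check; with $m < n$ it never overestimates the diagonal $K(x, x)$.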
Space: $nm \ll n^2$, construct $\mathbf{K}_m$: $nm^2 \ll n^2$, time per iteration: $nmk \ll n^2$
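Once every point has an $m$-dimensional embedding, each iteration is ordinary k-means in $\mathbb{R}^m$: the assignment step computes $n \cdot k$ squared distances of $m$ coordinates each, hence the $nmk$ per-iteration cost. The sketch below counts those coordinate operations for one assignment step; the function name, the operation counter, and the synthetic embedding are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def assign_with_count(emb: Sequence[Sequence[float]], centers: Sequence[Sequence[float]]) -> Tuple[List[int], int]:
    """Nearest-center assignment in R^m, counting per-coordinate operations."""
    ops = 0
    labels = []
    for x in emb:
        dists = []
        for c in centers:
            dists.append(sum((u - v) ** 2 for u, v in zip(x, c)))
            ops += len(x)  # one subtract-square-add per coordinate
        labels.append(min(range(len(centers)), key=dists.__getitem__))
    return labels, ops


n, m, k = 100, 10, 3
emb = [[float((i * 7 + d) % 5) for d in range(m)] for i in range(n)]  # synthetic embedding
centers = [emb[0], emb[1], emb[2]]
labels, ops = assign_with_count(emb, centers)
print(ops == n * m * k)  # True
```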
How should $m$ be chosen to get the best statistical vs. computational trade-off?
Main result
Let $x_i \sim \mu$ and define the test error $\mathcal{E}(\widehat{\mathcal{C}}) = \mathbb{E}_{x \sim \mu}\left[\min_{j=1,\dots,k} \|\phi(x) - c_j\|^2\right]$.
Theorem
$$\mathcal{E}(\widehat{\mathcal{C}}) \le \underbrace{O(k/\sqrt{n})}_{\text{statistical error}} + \underbrace{O(k/m)}_{\text{computational error}}$$
Choosing $m = \sqrt{n}$ is sufficient for the $O(k/\sqrt{n})$ rate, since then the computational error $O(k/m) = O(k/\sqrt{n})$ matches the statistical error. Previous results required $m = n$.
| | Space | Construct $\mathbf{K}$ / $\mathbf{K}_m$ | Time per iteration |
|---|---|---|---|
| Kernel k-means | $n^2$ | $n^2$ | $n^2$ |
| Nyström k-means ($m = \sqrt{n}$) | $n\sqrt{n}$ | $n^2$ | $n\sqrt{n}\,k$ |
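To get a feel for the table, the sketch below plugs in $n = 60{,}000$ (the size of MNIST-60k on the next slide), $m = \lceil \sqrt{n} \rceil$, and $k = 10$. The value of $k$ and the use of leading-order counts without constants are illustrative assumptions, not measurements.

```python
import math


def ceil_sqrt(n: int) -> int:
    """Smallest integer m with m * m >= n (for n >= 1)."""
    return math.isqrt(n - 1) + 1


def leading_costs(n: int, m: int, k: int) -> dict:
    """Leading-order counts from the table above (constants ignored)."""
    return {
        "kernel k-means": {"space": n * n, "construct": n * n, "iter": n * n},
        "Nystrom k-means": {"space": n * m, "construct": n * m * m, "iter": n * m * k},
    }


n, k = 60_000, 10
m = ceil_sqrt(n)  # 245
c = leading_costs(n, m, k)
for key in ("space", "construct", "iter"):
    ratio = c["kernel k-means"][key] / c["Nystrom k-means"][key]
    print(f"{key}: {ratio:.1f}x")  # space: 244.9x, construct: 1.0x, iter: 24.5x
```

Construction cost is essentially unchanged at $m = \sqrt{n}$ (since $nm^2 = n^2$), while memory and per-iteration time shrink by factors of roughly $\sqrt{n}$ and $\sqrt{n}/k$.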
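MNIST-60k: test cost vs. embedding size $m$
[Figure: test cost $\mathcal{E}(\widehat{\mathcal{C}})$ plotted against $m$, with $m = \sqrt{n}$ marked on the $m$ axis.]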
Recap
Icons designed by Freepik from Flaticon.
- Improved statistical vs. computational trade-off for k-means
- First computational savings with no loss of statistical accuracy
- Similar results for k-means++ (efficient)
- Open question: can the fast $O(k/n)$ rate be achieved?
Taking suggestions at poster #129.