Statistical and Computational Trade-Offs in Kernel K-Means


  1. Statistical and Computational Trade-Offs in Kernel K-Means. Daniele Calandriello, Lorenzo Rosasco. LCSL - IIT/MIT and Università di Genova. NeurIPS, December 2018.

  2. K-Means

     Given $n$ points, partition them into $k$ clusters:

     $$\widehat{C} = \min_{[c_1, \dots, c_k]} \frac{1}{n} \sum_{i=1}^{n} \min_{j=1,\dots,k} \| x_i - c_j \|^2$$

     Problem: only linear separation.
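
The objective on this slide can be evaluated directly for any candidate set of centroids. The following is a minimal sketch, not the authors' code: the function name `kmeans_cost` and the toy points are made up for illustration. It computes the average squared distance from each point to its nearest centroid.

```python
def kmeans_cost(points, centroids):
    """Average squared distance from each point to its nearest centroid."""
    total = 0.0
    for x in points:
        # Inner minimum over j = 1..k of ||x_i - c_j||^2.
        total += min(
            sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centroids
        )
    return total / len(points)


# Toy data (hypothetical): two tight groups, ten units apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = [(0.0, 1.0), (10.0, 1.0)]  # one centroid per group
bad = [(5.0, 0.0), (5.0, 2.0)]    # both centroids between the groups

print(kmeans_cost(points, good))  # 1.0
print(kmeans_cost(points, bad))   # 25.0
```

K-means searches for the centroids that make this cost as small as possible; the sketch only scores a given choice.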

  4. Kernel K-Means

     Given $n$ points, partition them into $k$ clusters:

     $$\widehat{C} = \min_{[c_1, \dots, c_k]} \frac{1}{n} \sum_{i=1}^{n} \min_{j=1,\dots,k} \| \varphi(x_i) - c_j \|^2$$

     Feature map $\varphi(\cdot) : \mathbb{R}^d \to \mathbb{R}^D$ (e.g., $\varphi([x, y]) = [x, y, x^2 + y^2]$).
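
To see why the example feature map helps, consider two concentric rings, which no straight line in the plane can separate. The sketch below uses made-up data (radii 1 and 3, and a threshold of 5 chosen by hand) and shows that the third coordinate $x^2 + y^2$ separates the rings with a single plane.

```python
import math


def phi(p):
    """Feature map from the slide: [x, y] -> [x, y, x^2 + y^2]."""
    x, y = p
    return (x, y, x * x + y * y)


def ring(radius, count=8):
    """Evenly spaced points on a circle of the given radius (toy data)."""
    return [
        (radius * math.cos(2 * math.pi * t / count),
         radius * math.sin(2 * math.pi * t / count))
        for t in range(count)
    ]


inner = ring(1.0)  # third feature is about 1
outer = ring(3.0)  # third feature is about 9

# The plane z = 5 splits the mapped rings; the value 5 is a hand-picked assumption.
threshold = 5.0
inner_ok = all(phi(p)[2] < threshold for p in inner)
outer_ok = all(phi(p)[2] > threshold for p in outer)
print(inner_ok, outer_ok)  # True True
```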

  6. Computing Kernel K-Means

     $$\widehat{C} = \min_{[c_1, \dots, c_k]} \frac{1}{n} \sum_{i=1}^{n} \min_{j=1,\dots,k} \| \varphi(x_i) - c_j \|^2$$

     $$\| \varphi(x_i) - \varphi(x_j) \|^2 = \| \varphi(x_i) \|^2 + \| \varphi(x_j) \|^2 - 2 \underbrace{\varphi(x_i)^\top \varphi(x_j)}_{K(x_i, x_j)\ \text{(kernel)}}$$

     [Figure: the kernel matrix $K$, with the entry $K(x_3, x_1)$ labeled]

     Space: $n^2$. Construct $K$: $n^2$. Iter. time: $n^2$.
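
The same expansion gives the distance from a mapped point to a cluster centroid using only kernel-matrix entries. The sketch below assumes the usual k-means convention that a centroid is the mean of the mapped points in its cluster, which the slide does not spell out. The helper name `sq_dist_to_centroid` and the toy points are hypothetical. For checking purposes it uses the kernel induced by the example feature map $[x, y, x^2 + y^2]$, so the result can be compared against an explicit computation.

```python
def phi(p):
    """Example feature map: [x, y] -> [x, y, x^2 + y^2]."""
    x, y = p
    return (x, y, x * x + y * y)


def kernel(p, q):
    """K(p, q) = phi(p)^T phi(q), the kernel induced by phi."""
    return sum(a * b for a, b in zip(phi(p), phi(q)))


def sq_dist_to_centroid(K, i, members):
    """||phi(x_i) - c||^2 with c the mean of phi(x_l), l in members,
    computed from kernel-matrix entries only."""
    s = len(members)
    cross = sum(K[i][l] for l in members) / s
    within = sum(K[a][b] for a in members for b in members) / (s * s)
    return K[i][i] - 2.0 * cross + within


points = [(1.0, 0.0), (0.0, 1.0), (3.0, 0.0), (0.0, -3.0)]
K = [[kernel(p, q) for q in points] for p in points]  # n x n kernel matrix

members = [2, 3]  # indices of the points assigned to one cluster
via_kernel = sq_dist_to_centroid(K, 0, members)

# Explicit check in feature space (possible here because phi is known).
feats = [phi(points[l]) for l in members]
centroid = [sum(col) / len(members) for col in zip(*feats)]
explicit = sum((a - b) ** 2 for a, b in zip(phi(points[0]), centroid))

print(via_kernel, explicit)  # 66.5 66.5
```

Storing `K` takes $n^2$ entries, which is the cost that the slide's space and time figures refer to.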

  9. K-Means with Uniform Nyström Embedding

     $$\widehat{C} = \min_{[c_1, \dots, c_k]} \frac{1}{n} \sum_{i=1}^{n} \min_{j=1,\dots,k} \| \varphi_m(x_i) - c_j \|^2$$

     How to choose $m$ for optimal statistical vs computational trade-off?

     $$\| \varphi_m(x_i) - \varphi_m(x_j) \|^2 = \| \varphi_m(x_i) \|^2 + \| \varphi_m(x_j) \|^2 - 2 \underbrace{\varphi_m(x_i)^\top \varphi_m(x_j)}_{K_m(x_i, x_j)\ \text{(Nyström approximation)}}$$

     Space: $nm$ instead of $n^2$. Construct $K_m$: $nm^2$ instead of $n^2$. Iter. time: $nmk$ instead of $n^2$.
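
One standard way to build an $m$-dimensional Nyström embedding is sketched below. It is an illustration under stated assumptions, not necessarily the authors' construction: it uses a Gaussian kernel, samples $m$ landmarks uniformly, factors the landmark kernel matrix as $K_{mm} = L L^\top$ (Cholesky), and sets $\varphi_m(x) = L^{-1} [K(x, \text{landmark}_1), \dots, K(x, \text{landmark}_m)]^\top$. All function names and the toy data are hypothetical.

```python
import math
import random


def gaussian_kernel(p, q, sigma=1.0):
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    m = len(A)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(L[i][t] * L[j][t] for t in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        s = sum(L[i][t] * z[t] for t in range(i))
        z.append((b[i] - s) / L[i][i])
    return z


def nystrom_embedding(points, m, seed=0):
    """Return (embedding, landmark indices); each embedded point has m coordinates."""
    rng = random.Random(seed)
    landmark_idx = rng.sample(range(len(points)), m)  # uniform sampling
    landmarks = [points[i] for i in landmark_idx]
    K_mm = [[gaussian_kernel(a, b) for b in landmarks] for a in landmarks]
    L = cholesky(K_mm)
    emb = [
        forward_solve(L, [gaussian_kernel(x, a) for a in landmarks])
        for x in points
    ]
    return emb, landmark_idx


# Toy data (hypothetical): 12 distinct points in the plane.
points = [(0.7 * i, 0.6 * (i * i % 5)) for i in range(12)]
emb, idx = nystrom_embedding(points, m=4)

# Inner products in the embedding reproduce the kernel exactly whenever
# one of the two points is a landmark; other pairs are only approximated.
max_err = max(
    abs(sum(u * v for u, v in zip(emb[i], emb[a]))
        - gaussian_kernel(points[i], points[a]))
    for i in range(len(points)) for a in idx
)
print(max_err < 1e-9)
```

Ordinary k-means can then be run on the rows of `emb`, which is where the $nm$ storage and the $nmk$ per-iteration cost come from.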

  14. Main result

      Let $x_i \sim \mu$ and the test error

      $$\mathcal{E}(\widehat{C}) = \mathbb{E}_{x \sim \mu} \left[ \min_{j=1,\dots,k} \| \varphi(x) - \widehat{c}_j \|^2 \right]$$

      Theorem:

      $$\mathcal{E}(\widehat{C}) \le \underbrace{O(k / \sqrt{n})}_{\text{statistical error}} + \underbrace{O(k / m)}_{\text{computational error}}$$

      $m = \sqrt{n}$ is sufficient for the $k / \sqrt{n}$ rate! Previous results require $m = n$.

      | Method            | Construct $K$ / $K_m$ | Space        | Iter. time     |
      | Kernel $k$-means  | $n^2$                 | $n^2$        | $n^2$          |
      | Nyström $k$-means | $n^2$                 | $n \sqrt{n}$ | $n \sqrt{n} k$ |
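
To get a feel for the savings at $m = \sqrt{n}$, the leading-order counts can be worked out for a concrete size. The sketch below is back-of-the-envelope arithmetic only: it ignores constants and logarithmic factors, takes $n = 60{,}000$ from the MNIST-60k experiment, and assumes $k = 10$ clusters, a value the slides do not state. The function name `cost_table` is hypothetical.

```python
import math


def cost_table(n, k):
    """Leading-order counts (constants ignored) for exact kernel k-means
    and for Nystrom k-means with m = sqrt(n) landmarks."""
    m = math.ceil(math.sqrt(n))  # m = sqrt(n), rounded up to an integer
    exact = {"construct": n * n, "space": n * n, "iter": n * n}
    nystrom = {"construct": n * m * m, "space": n * m, "iter": n * m * k}
    return m, exact, nystrom


n, k = 60_000, 10  # k = 10 is an assumption made for this example
m, exact, nystrom = cost_table(n, k)

print(m)                                   # 245
print(exact["space"], nystrom["space"])    # 3600000000 14700000
print(exact["iter"], nystrom["iter"])      # 3600000000 147000000
print(nystrom["construct"])                # 3601500000
```

Construction stays at the order of $n^2$, while storage and per-iteration time drop by a factor on the order of $\sqrt{n}$ (up to the factor $k$ in the iteration cost).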

  18. MNIST-60k: test cost vs embedding size $m$

      [Figure: test cost $\mathcal{E}(\widehat{C})$ on the vertical axis against embedding size $m$ on the horizontal axis, with $m = \sqrt{n}$ marked]

  19. Recap

      - Improved statistical vs computational trade-off for $k$-means
      - First computation saving with no loss of statistical accuracy
      - Similar results for $k$-means++ (efficient)
      - Open question: fast $O(k / n)$ rate?

      Taking suggestions at poster #129.
