
A better k-means++ Algorithm via Local Search



  1. A better k-means++ Algorithm via Local Search. Silvio Lattanzi (Google Research), Christian Sohler (Google Research). ICML 2019

  2. k-means: Find a set $C$ of $k$ centers that minimizes $\phi(X, C) = \sum_{x \in X} \min_{c \in C} d^2(x, c)$. Constant approximation algorithms are known. The goal is to design a constant approximation algorithm that is efficient, easy to implement, and has good experimental results.
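To make the objective concrete, here is a minimal sketch of the cost $\phi(X, C)$ in NumPy. It assumes $d$ is the Euclidean distance and that points and centers are rows of 2-D arrays; the function name `kmeans_cost` is only illustrative and does not come from the slides.

```python
import numpy as np

def kmeans_cost(X, C):
    """phi(X, C): sum over all points of the squared distance to the nearest center."""
    # Pairwise squared Euclidean distances, shape (n_points, n_centers).
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    # For each point keep the closest center, then sum over points.
    return d2.min(axis=1).sum()

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0]])
C = np.array([[0.0, 1.0], [10.0, 0.0]])
print(kmeans_cost(X, C))  # 1 + 1 + 0 = 2.0
```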

  3. k-means++ seeding: Elegant and simple algorithm. Experimentally gives good results when combined with Lloyd's algorithm. The solution is an $O(\log k)$ approximation in expectation. David Arthur, Sergei Vassilvitskii: k-means++: the advantages of careful seeding. SODA 2007: 1027-1035
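The slide does not spell out the seeding procedure, so the following is a sketch of the standard $D^2$ sampling rule from the cited k-means++ paper: pick the first center uniformly at random, then pick each further center with probability proportional to its squared distance to the nearest center chosen so far. The function name and the use of `numpy.random.Generator` are assumptions made for illustration, and the sketch assumes the data contains at least $k$ distinct points.

```python
import numpy as np

def kmeanspp_seeding(X, k, rng):
    """Return k rows of X chosen by D^2 sampling (k-means++ seeding)."""
    n = X.shape[0]
    # First center: uniform over the data points.
    centers = [X[rng.integers(n)]]
    # d2[i] = squared distance from X[i] to its nearest chosen center.
    d2 = ((X - centers[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        # Sample the next center with probability proportional to d2.
        # Assumes d2.sum() > 0, i.e. at least k distinct points.
        idx = rng.choice(n, p=d2 / d2.sum())
        centers.append(X[idx])
        # Update nearest-center distances with the new center.
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(axis=1))
    return np.array(centers)

rng = np.random.default_rng(0)
X = np.arange(20, dtype=float).reshape(10, 2)
print(kmeanspp_seeding(X, 4, rng))
```

Points already chosen have squared distance zero and are therefore never sampled again, so on distinct data the returned centers are distinct.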

  4. Local search: Elegant and simple algorithm. It returns a constant approximation and gives good experimental results, but the algorithm is a bit slow. Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, Angela Y. Wu: A local search approximation algorithm for k-means clustering. Comput. Geom. 28(2-3): 89-112 (2004)

  5. Combining the two algorithms: Elegant and simple algorithm. It returns a constant approximation, is slightly slower than k-means++, and has better experimental results.
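The slides do not give pseudocode for the combined algorithm. The sketch below shows one plausible reading, stated as an assumption: seed with $D^2$ sampling, then repeat a fixed number of local search steps, each of which samples a candidate point by $D^2$ sampling, tries swapping it with every current center, and keeps the best swap only if it lowers the cost. All names (`local_search_pp`, `steps`, and the helpers) are hypothetical, and the sketch assumes the data has at least $k$ distinct points.

```python
import numpy as np

def sq_dists(X, C):
    # Pairwise squared Euclidean distances, shape (n_points, n_centers).
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)

def cost(X, C):
    # k-means cost: each point pays its squared distance to the nearest center.
    return sq_dists(X, C).min(axis=1).sum()

def d2_sample(X, C, rng):
    # Sample one point with probability proportional to its current cost.
    d2 = sq_dists(X, C).min(axis=1)
    return X[rng.choice(len(X), p=d2 / d2.sum())]

def local_search_pp(X, k, steps, rng):
    # Seeding phase: first center uniform, the rest by D^2 sampling.
    C = X[rng.integers(len(X))][None, :].copy()
    while len(C) < k:
        C = np.vstack([C, d2_sample(X, C, rng)])
    best = cost(X, C)
    # Local search phase: try to swap a sampled point in for one center.
    for _ in range(steps):
        if best == 0.0:
            break  # nothing left to improve, and D^2 sampling is undefined
        p = d2_sample(X, C, rng)
        trials = []
        for i in range(k):
            T = C.copy()
            T[i] = p
            trials.append((cost(X, T), i))
        c, i = min(trials)
        if c < best:  # accept the swap only if it reduces the cost
            C[i] = p
            best = c
    return C, best

data_rng = np.random.default_rng(0)
X = np.vstack([data_rng.normal(loc=m, size=(30, 2)) for m in (0.0, 5.0, 10.0)])
C, final_cost = local_search_pp(X, k=3, steps=25, rng=np.random.default_rng(1))
print(final_cost)
```

Each step here recomputes the cost for all $k$ candidate swaps from scratch, which is simple but not the most efficient choice; a practical implementation would likely cache nearest and second-nearest center distances.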

  6. Main theoretical result: The main idea is to adapt the local search analysis to show that in every step, with constant probability, we reduce the cost of the solution by a multiplicative factor $\left(1 - \frac{1}{100k}\right)$.
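As a back-of-the-envelope reading of this factor (not a statement taken from the slides), suppose $t$ steps each succeed in shrinking the cost by $\left(1 - \frac{1}{100k}\right)$. The symbols $\phi_0$, $\phi_t$, and $\alpha$ below are introduced only for this illustration, and the sketch ignores any conditions under which the per-step guarantee holds.

```latex
\phi_t \;\le\; \left(1 - \frac{1}{100k}\right)^{t} \phi_0
       \;\le\; e^{-t/(100k)}\, \phi_0 ,
\qquad\text{so}\qquad
t \;\ge\; 100\,k \ln \alpha
\;\;\Longrightarrow\;\;
\phi_t \;\le\; \frac{\phi_0}{\alpha}.
```

Under these assumptions, starting from a k-means++ solution that is an $O(\log k)$ approximation in expectation, taking $\alpha = O(\log k)$ suggests that on the order of $k \log \log k$ successful steps would be enough to bring the cost down to a constant factor of the optimum.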

  7. Experimental results. Datasets:
     - RNA: 8 features from 488565 RNA input sequence pairs (Uzilov et al., 2006)
     - KDD-BIO: 145751 samples with 74 features measuring the match between a protein and a native sequence (KDD)
     - KDD-PHY: 100000 samples with 78 features representing a quantum physics task (KDD)

  8. Experimental results (plots for KDD-BIO, RNA, and KDD-PHY; figures not preserved)

  9. Thanks
