A better k-means++ Algorithm via Local Search
ICML 2019
Silvio Lattanzi (Google Research), Christian Sohler (Google Research)
Constant approximation algorithms are known.
Find a set $C$ of $k$ centers minimizing $\phi(X, C) = \sum_{x \in X} \min_{c \in C} d^2(x, c)$.
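The objective above can be evaluated directly. The following is a minimal sketch, assuming points are tuples of floats and d is the Euclidean distance; the names `sq_dist` and `kmeans_cost` are illustrative and do not come from the slides.

```python
from typing import Sequence

Point = Sequence[float]


def sq_dist(x: Point, c: Point) -> float:
    """Squared Euclidean distance d^2(x, c)."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """phi(X, C): every point pays the squared distance to its nearest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in points)


# Tiny example: two points near (0, 1) and one point sitting on a center.
X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
C = [(0.0, 1.0), (10.0, 0.0)]
cost = kmeans_cost(X, C)
```

Evaluating the cost this way takes time proportional to the number of points times the number of centers, which is the basic unit of work in the algorithms discussed next.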
The goal is to design a constant-approximation algorithm that is efficient, easy to implement, and performs well in experiments.
k-means++ is an elegant and simple algorithm. Its solution is an $O(\log k)$-approximation in expectation, and it gives good experimental results when combined with Lloyd's algorithm.
David Arthur, Sergei Vassilvitskii: k-means++: the advantages of careful seeding. SODA 2007: 1027-1035
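The seeding step of k-means++ (often called D² sampling) can be sketched as follows. This is a minimal illustration and not a tuned implementation: the first center is drawn uniformly, and each later center is drawn with probability proportional to its squared distance to the nearest center chosen so far. The function name `kmeanspp_seed` and the uniform fallback when all weights are zero are assumptions made for this sketch.

```python
import random


def sq_dist(x, c):
    """Squared Euclidean distance d^2(x, c)."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def kmeanspp_seed(points, k, rng):
    """Pick k initial centers from `points` by D^2 sampling."""
    centers = [rng.choice(points)]  # first center: uniform at random
    # d2[i] = squared distance from points[i] to its nearest chosen center
    d2 = [sq_dist(x, centers[0]) for x in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point already coincides with a center (assumed fallback).
            new = rng.choice(points)
        else:
            new = rng.choices(points, weights=d2, k=1)[0]
        centers.append(new)
        # Adding a center can only shrink each point's nearest distance.
        d2 = [min(d, sq_dist(x, new)) for d, x in zip(d2, points)]
    return centers


seeds = kmeanspp_seed([(0.0, 0.0), (0.1, 0.0), (9.0, 9.0)], 2, random.Random(0))
```

In practice the returned centers are then handed to Lloyd's algorithm as its starting point.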
Local search is an elegant and simple algorithm. It returns a constant approximation and gives good experimental results, but it is a bit slow.
Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, Angela Y. Wu: A local search approximation algorithm for k-means clustering. Comput. Geom. 28(2-3): 89-112 (2004)
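To see why local search tends to be slow, consider a simplified single-swap variant. This sketch is not the exact procedure of the cited paper: it assumes candidate centers are the data points themselves, and it accepts a swap only if the cost drops by a relative margin `min_gain`, a hypothetical parameter. Each pass tries every (point, center) pair and re-evaluates the full cost for each one.

```python
def sq_dist(x, c):
    """Squared Euclidean distance d^2(x, c)."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def kmeans_cost(points, centers):
    """Sum over points of the squared distance to the nearest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in points)


def single_swap_local_search(points, centers, min_gain=1e-3):
    """Repeat improving single swaps until no swap helps enough."""
    centers = list(centers)
    cost = kmeans_cost(points, centers)
    improved = True
    while improved:
        improved = False
        for p in points:                   # candidate new center
            for i in range(len(centers)):  # existing center to replace
                trial = centers[:i] + [p] + centers[i + 1:]
                trial_cost = kmeans_cost(points, trial)
                # Accept only swaps that shrink the cost by the margin.
                if trial_cost < (1.0 - min_gain) * cost:
                    centers, cost, improved = trial, trial_cost, True
    return centers, cost


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
final_centers, final_cost = single_swap_local_search(pts, [(0.0, 0.0), (0.0, 1.0)])
```

With n points and k centers, one pass of this sketch performs n·k trial swaps, each costing on the order of n·k distance computations.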
Our algorithm is elegant and simple. It returns a constant approximation, is slightly slower than k-means++, and has better experimental results.
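One way to combine the two ingredients is to let D² sampling propose the point to swap in, so each local search step tries only k swaps instead of n·k. The sketch below is one reading of that idea under stated assumptions and should not be taken as the authors' reference implementation: `local_search_pp_step` is a hypothetical name, and ties are resolved by keeping the current centers.

```python
import random


def sq_dist(x, c):
    """Squared Euclidean distance d^2(x, c)."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def kmeans_cost(points, centers):
    """Sum over points of the squared distance to the nearest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in points)


def local_search_pp_step(points, centers, rng):
    """Sample one point by D^2 weights and apply the best improving swap."""
    centers = list(centers)
    d2 = [min(sq_dist(x, c) for c in centers) for x in points]
    cost = sum(d2)
    if cost == 0:
        return centers, cost  # nothing left to improve
    # Points far from their center are more likely to be proposed.
    p = rng.choices(points, weights=d2, k=1)[0]
    best_centers, best_cost = centers, cost
    for i in range(len(centers)):
        trial = centers[:i] + [p] + centers[i + 1:]
        trial_cost = kmeans_cost(points, trial)
        if trial_cost < best_cost:  # keep only strict improvements
            best_centers, best_cost = trial, trial_cost
    return best_centers, best_cost


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
new_centers, new_cost = local_search_pp_step(
    pts, [(0.0, 0.0), (0.0, 1.0)], random.Random(0)
)
```

A full run under these assumptions would start from k-means++ seeds and repeat this step a fixed number of times; the cost never increases because a swap is applied only when it strictly improves the solution.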
The main idea is to adapt the local search analysis to show that in every step, with constant probability, the cost of the solution is reduced by a multiplicative factor of $\left(1 - \frac{1}{100k}\right)$.
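A back-of-the-envelope calculation shows what this factor buys. It assumes the reduction applies at every successful step while the solution is still far from optimal, and that the starting cost $\phi_0$ comes from a seeding that is an $O(\log k)$-approximation; the symbols $\phi_t$ and $t$ are introduced here for illustration, with $k \ge 3$.

```latex
% phi_t: cost after t successful steps; phi_0: cost of the initial seeding
\phi_t \le \left(1 - \frac{1}{100k}\right)^{t} \phi_0 \le e^{-t/(100k)}\,\phi_0 ,
\qquad
e^{-t/(100k)} \le \frac{1}{\ln k} \iff t \ge 100\,k \ln\ln k .
```

Under these assumptions, on the order of $k \log\log k$ successful steps are enough to cancel the logarithmic factor of the seeding.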
Datasets:
al., 2006)
between a protein and a native sequence (KDD)
quantum physics task (KDD)
[Figure: one panel per dataset: KDD-BIO, RNA, KDD-PHY]