A better k-means++ Algorithm via Local Search


SLIDE 1

A better k-means++ Algorithm via Local Search

ICML 2019

Silvio Lattanzi, Google Research
Christian Sohler, Google Research

SLIDE 2

k-means

Find a set of k centers $C$ minimizing

$$\phi(X, C) = \sum_{x \in X} \min_{c \in C} d^2(x, c)$$

Constant approximation algorithms are known.

The goal is to design a constant approximation algorithm that is efficient, easy to implement, and has good experimental results.
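The objective can be made concrete with a small sketch. It assumes that d is the Euclidean distance and that points and centers are rows of NumPy arrays; the function name `kmeans_cost` and the toy data are illustrative only and do not come from the slides.

```python
import numpy as np

def kmeans_cost(X, C):
    """phi(X, C): sum over all points of the squared distance to the nearest center.

    Assumes d is the Euclidean distance (an assumption, not stated on the slide).
    """
    # Pairwise squared distances, shape (n_points, n_centers).
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    # Each point pays for its closest center only.
    return d2.min(axis=1).sum()

# Toy data: two pairs of points, with one center in the middle of each pair.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
C = np.array([[0.0, 1.0], [10.0, 1.0]])
print(kmeans_cost(X, C))
```

Every point in the toy data lies at squared distance 1 from its nearest center, so the printed cost is the number of points.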

SLIDE 3

k-means++ seeding

Elegant and simple algorithm.

The solution is an $O(\log k)$-approximation in expectation.

Experimentally gives good results when combined with Lloyd's algorithm.

David Arthur, Sergei Vassilvitskii: k-means++: the advantages of careful seeding. SODA 2007: 1027-1035
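For reference, here is a minimal sketch of k-means++ seeding as it is usually described: the first center is drawn uniformly at random, and each further center is drawn with probability proportional to its squared distance to the nearest center chosen so far. It assumes Euclidean distance and at least k distinct points; the function name `kmeanspp_seeding` is illustrative.

```python
import numpy as np

def kmeanspp_seeding(X, k, rng):
    """Pick k rows of X as initial centers by D^2 sampling.

    Assumes Euclidean distance and that X has at least k distinct rows.
    """
    n = X.shape[0]
    # First center: uniform at random.
    centers = [X[rng.integers(n)]]
    # d2[i] = squared distance from point i to its nearest chosen center.
    d2 = ((X - centers[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        # Sample the next center proportionally to d2.
        idx = rng.choice(n, p=d2 / d2.sum())
        centers.append(X[idx])
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(axis=1))
    return np.array(centers)

rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
print(kmeanspp_seeding(X, 2, rng))
```

Points that are already centers have sampling weight zero, so the chosen centers are distinct whenever the input points are.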

SLIDE 4

Local search

Elegant and simple algorithm.

It returns a constant approximation and gives good experimental results.

The algorithm is a bit slow.

Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, Angela Y. Wu: A local search approximation algorithm for k-means clustering. Comput. Geom. 28(2-3): 89-112 (2004)
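The following is a simplified sketch of single-swap local search, not the exact procedure of the cited paper. It assumes Euclidean distance, uses the data points themselves as candidate centers, and accepts any strictly improving swap; the names `cost` and `single_swap_local_search` are illustrative. Trying every (point, center) pair in each round also shows why the approach is slow.

```python
import numpy as np

def cost(X, C):
    # Sum of squared Euclidean distances to the nearest center.
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def single_swap_local_search(X, C, max_rounds=100):
    """Repeatedly swap one center for one data point while the cost drops."""
    C = C.copy()
    best = cost(X, C)
    for _ in range(max_rounds):
        improved = False
        for p in X:                    # candidate new center
            for i in range(len(C)):    # center to replace
                trial = C.copy()
                trial[i] = p
                c = cost(X, trial)
                if c < best:           # keep strictly improving swaps only
                    C, best, improved = trial, c, True
        if not improved:
            break                      # local optimum for single swaps
    return C, best

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
start = np.array([[0.0, 0.0], [0.0, 2.0]])   # both centers in the left pair
C, best = single_swap_local_search(X, start)
print(cost(X, start), best)
```

On this toy input the starting centers both sit in the left pair, and a single swap moves one of them to the right pair.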

SLIDE 5

Combining the two algorithms

Elegant and simple algorithm.

It returns a constant approximation, is slightly slower than k-means++, and has better experimental results.
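The slides do not spell out how the two algorithms are combined. One natural reading, sketched here under that assumption, is to seed with D^2 sampling and then run a fixed number of local search steps, where each step samples one candidate point by D^2 and swaps it for whichever existing center gives the lowest cost, if that lowers the cost at all. Euclidean distance, the function names, and the `steps` parameter are assumptions made for illustration.

```python
import numpy as np

def cost(X, C):
    # Sum of squared Euclidean distances to the nearest center.
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def d2_sample(X, C, rng):
    # Draw one point with probability proportional to its current cost.
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    return X[rng.choice(len(X), p=d2 / d2.sum())]

def seed_then_local_search(X, k, steps, rng):
    """D^2 seeding followed by `steps` sampled single-swap steps.

    Assumes X has at least k distinct rows.
    """
    # Seeding phase: uniform first center, then D^2 sampling.
    C = X[rng.integers(len(X))][None, :]
    while len(C) < k:
        C = np.vstack([C, d2_sample(X, C, rng)])
    best = cost(X, C)
    # Local search phase.
    for _ in range(steps):
        if best == 0:
            break                      # nothing left to improve or sample from
        p = d2_sample(X, C, rng)
        trials = []
        for i in range(k):             # try replacing each center by p
            T = C.copy()
            T[i] = p
            trials.append(T)
        costs = [cost(X, T) for T in trials]
        j = int(np.argmin(costs))
        if costs[j] < best:            # apply the best swap only if it helps
            C, best = trials[j], costs[j]
    return C, best

rng = np.random.default_rng(1)
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
C, best = seed_then_local_search(X, 2, 5, rng)
print(C, best)
```

Each step evaluates only k swaps for a single sampled point, rather than all (point, center) pairs, which is consistent with the claim that the combination is only slightly slower than k-means++.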

SLIDE 6

Main theoretical result

The main idea is to adapt the local search analysis to show that in every step, with constant probability, we reduce the cost of the solution by a multiplicative factor of $\left(1 - \frac{1}{100k}\right)$.
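As a back-of-the-envelope reading of this factor (a heuristic calculation, not the proof), suppose the seeding produces a solution of cost $\phi_0 \le c \log k \cdot \mathrm{OPT}$, and suppose each successful step shrinks the cost by the stated factor for as long as the cost is still above a constant multiple of the optimum. The symbols $c$, $\mathrm{OPT}$, $\phi_t$, and $t$ (the number of successful steps) are notation introduced here. Using $(1 - x)^t \le e^{-xt}$:

$$\phi_t \le \left(1 - \frac{1}{100k}\right)^{t} \phi_0 \le e^{-t/(100k)} \cdot c \log k \cdot \mathrm{OPT}$$

$$t \ge 100\, k \ln \log k \quad\Longrightarrow\quad \phi_t \le c \cdot \mathrm{OPT}$$

Under these assumptions, $O(k \log\log k)$ successful steps are enough to remove the $\log k$ factor. Because each step succeeds with constant probability, the expected total number of steps is larger only by a constant factor.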
SLIDE 7

Experimental results


Datasets:

  • RNA: 8 features from 488565 RNA input sequence pairs (Uzilov et al., 2006)

  • KDD-BIO: 145751 samples with 74 features measuring the match between a protein and a native sequence (KDD)

  • KDD-PHY: 100000 samples with 78 features representing a quantum physics task (KDD)

SLIDE 8

Experimental results

[Figure: experimental result plots, one panel per dataset: KDD-BIO, RNA, KDD-PHY]

SLIDE 9

Thanks
