  1. Clustering, K-Means, and K-Nearest Neighbors CMSC 678 UMBC Most slides courtesy Hamed Pirsiavash

  2. Recap from last time…

  3. Geometric Rationale of LDiscA & PCA Objective: to rigidly rotate the axes of the D-dimensional space to new positions (principal axes), ordered such that principal axis 1 has the highest variance, axis 2 has the next highest variance, ..., and axis D has the lowest variance; the covariance among each pair of the principal axes is zero (the principal axes are uncorrelated). Courtesy Antano Ε½ilinsko

  4. L-Dimensional PCA 1. Compute the class means ΞΌ_l, priors, and common (pooled) covariance Ξ£: ΞΌ_l = (1/N_l) Ξ£_{j : z_j = l} x_j, Ξ£ = (1/N) Ξ£_l Ξ£_{j : z_j = l} (x_j βˆ’ ΞΌ_l)(x_j βˆ’ ΞΌ_l)^T. 2. Sphere the data (zero-mean, unit covariance). 3. Compute the (top L) eigenvectors from the sphere-d data, via the eigendecomposition X* = V D_B V^T. 4. Project the data.

  5. Outline Clustering basics K-means: basic algorithm & extensions Cluster evaluation Non-parametric mode finding: density estimation Graph & spectral clustering Hierarchical clustering K-Nearest Neighbor

  6. Clustering Basic idea: group together similar instances Example: 2D points

  7. Clustering Basic idea: group together similar instances Example: 2D points One option: small Euclidean distance (squared) Clustering results are crucially dependent on the measure of similarity (or distance) between points to be clustered

  8. Clustering algorithms Simple clustering: organize elements into k groups (k-means, mean shift, spectral clustering). Hierarchical clustering: organize elements into a hierarchy (bottom up: agglomerative; top down: divisive).

  9. Clustering examples: Image Segmentation image credit: Berkeley segmentation benchmark

  10. Clustering examples: News Feed Clustering news articles

  11. Clustering examples: Image Search Clustering queries

  12. Outline Clustering basics K-means: basic algorithm & extensions Cluster evaluation Non-parametric mode finding: density estimation Graph & spectral clustering Hierarchical clustering K-Nearest Neighbor

  13. Clustering using k-means Data: D-dimensional observations (x_1, x_2, …, x_n). Goal: partition the n observations into k (≀ n) sets S = {S_1, S_2, …, S_k} so as to minimize the within-cluster sum of squared distances, argmin_S Ξ£_{i=1}^{k} Ξ£_{x ∈ S_i} β€–x βˆ’ ΞΌ_iβ€–Β², where ΞΌ_i is the cluster center (the mean of S_i).

  14. Lloyd’s algorithm for k-means Initialize k centers by picking k points randomly among all the points. Repeat till convergence (or max iterations): assign each point to the nearest center (assignment step); estimate the mean of each group (update step). https://www.csee.umbc.edu/courses/graduate/678/spring18/kmeans/
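
A minimal NumPy sketch of Lloyd's algorithm, assuming squared Euclidean distance; the function name kmeans and its defaults are illustrative, not taken from the slides or the linked course demo.

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Lloyd's algorithm sketch. X: (n, d) array of observations; returns (centers, assignments)."""
    rng = np.random.default_rng(seed)
    # initialize k centers by picking k points at random among all the points
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    assign = None
    for _ in range(max_iters):
        # assignment step: each point goes to the nearest center (squared Euclidean distance)
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break  # partitions unchanged -> converged
        assign = new_assign
        # update step: re-estimate the mean of each group (keep the old center if a group is empty)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return centers, assign
```

For example, kmeans(np.random.rand(500, 2), k=3) groups 2D points as in the slides' running example.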

  15. Properties of Lloyd’s algorithm Guaranteed to converge in a finite number of iterations: the objective decreases monotonically; a local minimum is reached once the partitions no longer change; and since there are only finitely many partitions, the k-means algorithm must converge. Running time per iteration: assignment step O(NKD); computing cluster means O(ND). Issues with the algorithm: worst-case running time is super-polynomial in the input size; no guarantees about global optimality; optimal clustering even for 2 clusters is NP-hard [Aloise et al., 09].

  16. k-means++ algorithm A way to pick good initial centers; intuition: spread out the k initial cluster centers. k-means++ algorithm for initialization: 1. Choose one center uniformly at random among all the points. 2. For each point x, compute D(x), the distance between x and the nearest center that has already been chosen. 3. Choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D(x)^2. 4. Repeat steps 2 and 3 until k centers have been chosen. The algorithm proceeds normally once the centers are initialized. [Arthur and Vassilvitskii’07]: the approximation quality is O(log k) in expectation.
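
A minimal sketch of the k-means++ seeding step described above; kmeans_pp_init is an illustrative name, and the function only produces the initial centers, after which Lloyd's algorithm runs as usual.

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    """k-means++ seeding sketch; X: (n, d) array; returns the k initial centers."""
    rng = np.random.default_rng(seed)
    # step 1: choose the first center uniformly at random among all the points
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        # step 2: D(x)^2 = squared distance from each point to its nearest chosen center
        d2 = ((X[:, None, :] - np.asarray(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # step 3: choose the next center with probability proportional to D(x)^2
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        # step 4: repeat until k centers have been chosen
    return np.asarray(centers)
```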

  17. k-means for image segmentation Grouping pixels based on intensity similarity; feature space: intensity value (1D). Example segmentations shown for K=2 and K=3.
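
As an illustration of the 1D intensity feature space, a short sketch that reuses the hypothetical kmeans function above; the random image is a stand-in, not the slides' example image.

```python
import numpy as np

img = np.random.rand(64, 64)                    # stand-in grayscale image
pixels = img.reshape(-1, 1)                     # feature space: intensity value (1D)
centers, labels = kmeans(pixels, k=3)           # K=3 intensity clusters
segmented = centers[labels].reshape(img.shape)  # replace each pixel by its cluster's mean intensity
```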

  18. Outline Clustering basics K-means: basic algorithm & extensions Cluster evaluation Non-parametric mode finding: density estimation Graph & spectral clustering Hierarchical clustering K-Nearest Neighbor

  19. Clustering Evaluation (Classification: accuracy, recall, precision, F-score.) Greedy mapping: one-to-one. Optimistic mapping: many-to-one. Rigorous/information-theoretic: V-measure.

  20. Clustering Evaluation: One-to-One Each modeled cluster can map to at most one gold tag type, and vice versa. Greedily select the mapping to maximize accuracy.
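
A minimal sketch of the greedy one-to-one mapping, assuming integer-coded cluster and gold-tag arrays; the function and variable names are illustrative.

```python
import numpy as np

def one_to_one_accuracy(clusters, tags):
    """Greedy one-to-one mapping: each cluster maps to at most one gold tag, and vice versa."""
    ks, cs = np.unique(clusters), np.unique(tags)
    # overlap counts between every (cluster, tag) pair
    counts = np.array([[np.sum((clusters == k) & (tags == c)) for c in cs] for k in ks])
    used_k, used_c, correct = set(), set(), 0
    # greedily take the largest remaining overlap, never reusing a cluster or a tag
    for i, j in sorted(np.ndindex(*counts.shape), key=lambda ij: -counts[ij]):
        if i not in used_k and j not in used_c:
            used_k.add(i)
            used_c.add(j)
            correct += counts[i, j]
    return correct / len(clusters)
```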

  21. Clustering Evaluation: Many (classes)-to-One (cluster) Each modeled cluster can map to at most one gold tag type, but multiple clusters can map to the same gold tag. For each cluster: select the majority tag.
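
A minimal sketch of the many-to-one scheme, where each cluster is simply scored by its majority gold tag (so several clusters may share a tag); names are illustrative.

```python
import numpy as np

def many_to_one_accuracy(clusters, tags):
    """Map every cluster to its majority gold tag and count the matches."""
    correct = 0
    for k in np.unique(clusters):
        # majority tag within this cluster
        _, tag_counts = np.unique(tags[clusters == k], return_counts=True)
        correct += tag_counts.max()
    return correct / len(clusters)
```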

  22. Clustering Evaluation: V-Measure Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness. Entropy: H(X) = βˆ’ Ξ£_i p(x_i) log p(x_i).

  23. Clustering Evaluation: V-Measure Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness. Entropy: H(X) = βˆ’ Ξ£_i p(x_i) log p(x_i). entropy(point mass) = 0; entropy(uniform) = log K.

  24. Clustering Evaluation: V-Measure Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness (k βž” cluster, c βž” gold class). Homogeneity: how well does each gold class map to a single cluster? homogeneity = 1 if H(C, K) = 0; otherwise 1 βˆ’ H(C|K)/H(C). β€œIn order to satisfy our homogeneity criteria, a clustering must assign only those datapoints that are members of a single class to a single cluster. That is, the class distribution within each cluster should be skewed to a single class, that is, zero entropy.” The relative entropy H(C|K)/H(C) is maximized when a cluster provides no new info on the class grouping β†’ not very homogeneous.

  25. Clustering Evaluation: V-Measure Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness (k βž” cluster, c βž” gold class). Completeness: how well does each learned cluster cover a single gold class? completeness = 1 if H(K, C) = 0; otherwise 1 βˆ’ H(K|C)/H(K). β€œIn order to satisfy the completeness criteria, a clustering must assign all of those datapoints that are members of a single class to a single cluster.” The relative entropy H(K|C)/H(K) is maximized when each class is represented (relatively) uniformly across clusters β†’ not very complete.

  26. Clustering Evaluation: V-Measure Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness (k βž” cluster, c βž” gold class). Homogeneity: how well does each gold class map to a single cluster? homogeneity = 1 if H(C, K) = 0; otherwise 1 βˆ’ H(C|K)/H(C). Completeness: how well does each learned cluster cover a single gold class? completeness = 1 if H(K, C) = 0; otherwise 1 βˆ’ H(K|C)/H(K).

  27. Clustering Evaluation: V-Measure Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness. a_ck = # elements of class c in cluster k. Homogeneity: how well does each gold class map to a single cluster? homogeneity = 1 if H(C, K) = 0; otherwise 1 βˆ’ H(C|K)/H(C), with H(C|K) = βˆ’ Ξ£_k Ξ£_c (a_ck / N) log( a_ck / Ξ£_{c'} a_{c'k} ). Completeness: how well does each learned cluster cover a single gold class? completeness = 1 if H(K, C) = 0; otherwise 1 βˆ’ H(K|C)/H(K), with H(K|C) = βˆ’ Ξ£_c Ξ£_k (a_ck / N) log( a_ck / Ξ£_{k'} a_{ck'} ).

  28. Clustering Evaluation: V-Measure Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness. Homogeneity: how well does each gold class map to a single cluster? Completeness: how well does each learned cluster cover a single gold class? Worked example, with count table a_ck (rows: classes; columns: clusters):
  a_ck      K=1  K=2  K=3
  class 1:   3    1    1
  class 2:   1    1    3
  class 3:   1    3    1
  Using H(C|K) = βˆ’ Ξ£_k Ξ£_c (a_ck / N) log( a_ck / Ξ£_{c'} a_{c'k} ) and H(K|C) = βˆ’ Ξ£_c Ξ£_k (a_ck / N) log( a_ck / Ξ£_{k'} a_{ck'} ): Homogeneity = Completeness = V-Measure = 0.14.
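
A sketch that computes homogeneity, completeness, and V-measure directly from the count table a_ck; the 3Γ—3 matrix below reproduces the slide's example, and all three scores come out near 0.14. The helper names and the handling of degenerate cases (returning 1 when the relevant entropy is zero) are illustrative.

```python
import numpy as np

def v_measure(a):
    """a: (num_classes, num_clusters) count matrix a_ck; returns (homogeneity, completeness, V)."""
    p = a / a.sum()
    p_c = p.sum(axis=1)                      # class marginals p(c)
    p_k = p.sum(axis=0)                      # cluster marginals p(k)
    H = lambda q: -np.sum(q[q > 0] * np.log(q[q > 0]))
    nz = p > 0
    H_C_given_K = -np.sum(p[nz] * np.log((p / p_k[None, :])[nz]))   # H(C|K)
    H_K_given_C = -np.sum(p[nz] * np.log((p / p_c[:, None])[nz]))   # H(K|C)
    h = 1.0 if H(p_c) == 0 else 1 - H_C_given_K / H(p_c)            # homogeneity
    c = 1.0 if H(p_k) == 0 else 1 - H_K_given_C / H(p_k)            # completeness
    return h, c, 2 * h * c / (h + c)                                # V-measure: harmonic mean

a_ck = np.array([[3, 1, 1],     # class 1 counts in clusters K=1, K=2, K=3
                 [1, 1, 3],     # class 2
                 [1, 3, 1]])    # class 3
print(v_measure(a_ck))          # roughly (0.14, 0.14, 0.14)
```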

  29. Outline Clustering basics K-means: basic algorithm & extensions Cluster evaluation Non-parametric mode finding: density estimation Graph & spectral clustering Hierarchical clustering K-Nearest Neighbor

  30. Clustering using density estimation One issue with k-means is that it is sometimes hard to pick k. The mean shift algorithm seeks modes, or local maxima of density, in the feature space; mean shift automatically determines the number of clusters. Kernel density estimator: f(x) = (1 / (n h^D)) Ξ£_i K((x βˆ’ x_i)/h). A small bandwidth h implies more modes (a bumpy distribution).

  31. Mean shift algorithm For each point x_i: find m_i, the amount to shift each point x_i to its centroid. Return {m_i}.

  32. Mean shift algorithm For each point x_i: set m_i = x_i; while not converged: set m_i to the weighted average of the neighboring points. Return {m_i}.
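
A minimal sketch of this procedure, assuming a flat window of radius h in place of a kernel-weighted average; the bandwidth h, the tolerance, and the function name are illustrative choices, not from the slides.

```python
import numpy as np

def mean_shift(X, h=1.0, max_iters=100, tol=1e-5):
    """Shift every point to the mean of its neighbors until the points stop moving."""
    modes = X.astype(float)                          # m_i starts at x_i
    for _ in range(max_iters):
        shifted = np.empty_like(modes)
        for i, m in enumerate(modes):
            d = np.linalg.norm(X - m, axis=1)
            # flat window: all points within radius h of m_i (fall back to the nearest point)
            nbrs = X[d <= h] if np.any(d <= h) else X[[d.argmin()]]
            shifted[i] = nbrs.mean(axis=0)           # move m_i to the (unweighted) average of its neighbors
        moved = np.linalg.norm(shifted - modes)
        modes = shifted
        if moved < tol:
            break                                    # converged: points stopped moving
    return modes                                     # points that share a mode form one cluster
```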
