  1. Clustering Algorithms Dalya Baron (Tel Aviv University) XXX Winter School, November 2018

  2. Clustering [figure: a scatter of objects in the Feature 1 vs. Feature 2 plane]

  3. Clustering [figure: the same objects, now grouped into cluster #1 and cluster #2 in the Feature 1 vs. Feature 2 plane]

  4. Clustering. Why should we look for clusters? [figure: cluster #1 and cluster #2 in the Feature 1 vs. Feature 2 plane]

  5. Clustering

  6. K-means. Input: measured features, and the number of clusters, k. The algorithm classifies all the objects in the sample into k clusters. [figure: Feature 1 vs. Feature 2]

  7. K-means. (I) The algorithm randomly places k points that represent the cluster centroids. It then performs several iterations, in each of which: (II) it associates each object with a single cluster, according to the object's distance from the cluster centroids; (III) it recalculates each cluster centroid from the objects that are associated with it. [figure: Feature 1 vs. Feature 2]

  8. K-means (cont.) [figure: two centroids are randomly placed]

  9. K-means (cont.) [figure: the objects are associated with the closest cluster centroid (Euclidean distance)]

  10. K-means (cont.) [figure: new cluster centroids are computed as the average location of the cluster members]

  11. K-means (cont.) [figure: the objects are again associated with the closest cluster centroid (Euclidean distance)]

  12. K-means (cont.) The process stops when the objects associated with a given cluster no longer change. [figure: Feature 1 vs. Feature 2]
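The loop on slides 7-12 is short enough to write out directly. Below is a minimal NumPy sketch of it; this is not the lecture's own code, and the function and variable names are illustrative:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-means. X: (n_objects, n_features) array of measured features."""
    rng = np.random.default_rng(seed)
    # (I) randomly place k centroids by picking k distinct objects
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        # (II) associate each object with the nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # the process stops when no object changes cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # (III) recompute each centroid as the mean of its current members
        for j in range(k):
            if np.any(labels == j):  # skip clusters that lost all their members
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```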

  13. The anatomy of K-means. Internal choices and/or internal cost function: (I) Initial centroids are randomly selected from the set of examples. (II) The global cost function that is minimized by K-means is
  $$ J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2, $$
  where the $\mu_j$ are the cluster centroids, the inner sum runs over the cluster members $x_i \in C_j$, and $\lVert \cdot \rVert$ denotes the Euclidean distance.
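The cost function above translates into a few lines of code. A small helper sketch (the names are mine, not the lecture's):

```python
import numpy as np

def kmeans_cost(X, labels, centroids):
    """Global K-means cost J: the sum of squared Euclidean distances
    between each object and the centroid of the cluster it belongs to."""
    diffs = X - centroids[labels]  # x_i - mu_j for every object i in cluster j
    return float(np.sum(diffs ** 2))
```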

  14. The anatomy of K-means (cont.) [figure: k=3, and two different random placements of the initial centroids]
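Because the initial centroids are random, different placements can converge to different locally optimal solutions, which is what the figure illustrates. A common remedy is to rerun K-means several times and keep the run with the lowest cost; scikit-learn's KMeans does exactly this through its n_init parameter. A sketch with made-up toy data:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# toy data: three blobs in the Feature 1 / Feature 2 plane
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ((0, 0), (3, 0), (0, 3))])

# n_init=10 restarts K-means from 10 random centroid placements
# and keeps the solution with the lowest value of the cost function
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.inertia_)  # the minimized global cost J
print(km.labels_)   # cluster assignment of each object
```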

  15. The anatomy of K-means. Input dataset: a list of objects with measured features. For which datasets should we use K-means? [two example datasets in the Feature 1 vs. Feature 2 plane]

  16.-17. The anatomy of K-means. Input dataset: a list of objects with measured features. What happens when we have an outlier in the dataset? [figure: Feature 1 vs. Feature 2, with a single point marked as an outlier]
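One way to answer the slide's question is to try it: because the cost sums squared distances, a sufficiently extreme outlier is very expensive to absorb into an existing cluster, and K-means will typically spend an entire cluster on it instead. A hypothetical demo, not from the slides:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# two well-separated blobs...
X = np.vstack([rng.normal((0, 0), 0.3, (50, 2)),
               rng.normal((3, 0), 0.3, (50, 2))])
# ...plus one extreme outlier
X_out = np.vstack([X, [[50.0, 50.0]]])

for data in (X, X_out):
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
    print(np.round(km.cluster_centers_, 2))
# without the outlier: one centroid per blob; with it, the outlier
# typically grabs its own centroid and the two real blobs merge
```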

  18. The anatomy of K-means. Input dataset: a list of objects with measured features. What happens when the features have different physical units? [two panels: the input dataset and the K-means output]

  19. The anatomy of K-means. Input dataset: a list of objects with measured features. What happens when the features have different physical units? How can we avoid this? [two panels: the input dataset and the K-means output]
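A standard answer to the slide's question (my suggestion here, not quoted from the slide) is to standardize each feature to zero mean and unit variance before clustering, so that no feature dominates the Euclidean distance purely through its units. A sketch:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# two features with wildly different physical units / dynamic ranges,
# e.g. a colour of order 1 and a velocity of order 10^4
X = np.column_stack([rng.normal(0.0, 1.0, 200),
                     rng.normal(0.0, 1e4, 200)])

# rescale each feature to zero mean and unit variance first
X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
```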

  20. The anatomy of K-means. Hyper-parameters: the number of clusters, k. Can we find the optimal k using the cost function? [three panels: K-means solutions for k=2, k=3, and k=5]

  21. The anatomy of K-means (cont.) [figure: minimal cost function vs. number of clusters, with the characteristic 'elbow' marked]
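The elbow plot in the figure is straightforward to reproduce: run K-means over a range of k, record the minimized cost (inertia), and look for the bend where adding clusters stops paying off. A sketch with made-up data:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(c, 0.4, (60, 2))
               for c in ((0, 0), (4, 0), (2, 3))])

ks = range(1, 9)
costs = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
         for k in ks]

plt.plot(list(ks), costs, "o-")
plt.xlabel("Number of clusters k")
plt.ylabel("Minimal cost function")  # the bend (the 'elbow') suggests k
plt.show()
```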

  22. Questions?

  23. Hierarchical Clustering, or: how to visualize complicated similarity measures (Correa-Gallego+ 2016)

  24. Hierarchical Clustering. Input: measured features, or a distance matrix that represents the pair-wise distances between the objects. We must also specify a linkage method. Initialization: each object is a cluster of size 1. [figure: Feature 1 vs. Feature 2]

  25. Hierarchical Clustering (cont.) Next: the algorithm merges the two closest clusters into a single cluster, and then re-calculates the distance of the newly formed cluster to all the rest. [figure: Feature 1 vs. Feature 2]
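In practice this procedure is available, for example, in scipy, whose linkage routine accepts either a feature matrix or a precomputed condensed distance matrix and takes the linkage method as an argument. A sketch with made-up data:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(4)
X = np.vstack([rng.normal((0, 0), 0.3, (20, 2)),
               rng.normal((3, 1), 0.3, (20, 2))])

# linkage() starts from singleton clusters and repeatedly merges the two
# closest ones; 'method' is the linkage method, i.e. how the distance
# between two clusters is defined ('single', 'complete', 'average', 'ward')
Z = linkage(X, method="average", metric="euclidean")

dendrogram(Z)           # the merge history: height = merge distance
plt.ylabel("distance")
plt.show()
```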

  26. Hierarchical Clustering (cont.) [figure: the data in the Feature 1 vs. Feature 2 plane and the corresponding dendrogram, with merge distance on the vertical axis]

  27.-32. Hierarchical Clustering (cont.) [the same slide repeated as successive pairs of clusters are merged and the dendrogram grows]

  33. Hierarchical Clustering. The process stops when all the objects are merged into a single cluster. [figure: Feature 1 vs. Feature 2 and the completed dendrogram]
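Since the tree always ends in one all-encompassing cluster, a usable partition is obtained by cutting the dendrogram. scipy's fcluster cuts either at a chosen merge distance or at a requested number of flat clusters; a sketch continuing the example above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(5)
X = np.vstack([rng.normal((0, 0), 0.3, (20, 2)),
               rng.normal((3, 1), 0.3, (20, 2))])
Z = linkage(X, method="average")

# cut the tree at a chosen distance: every merge above t is undone
labels = fcluster(Z, t=1.5, criterion="distance")

# or ask directly for a fixed number of flat clusters
labels_2 = fcluster(Z, t=2, criterion="maxclust")
```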
