BBM406 Fundamentals of Machine Learning Lecture 21: Clustering

BBM406 Fundamentals of Machine Learning, Lecture 21: Clustering, K-Means. Aykut Erdem // Hacettepe University // Fall 2019. Photo by Unsplash user @foodiesfeed. Last time: Boosting. Idea: given a weak learner, run it multiple times on


  1. Clustering algorithms
     • Partitioning algorithms
       - Construct various partitions and then evaluate them by some criterion
       • K-means
       • Mixture of Gaussians
       • Spectral Clustering
     • Hierarchical algorithms
       - Create a hierarchical decomposition of the set of objects using some criterion
       - Bottom-up – agglomerative
       - Top-down – divisive
     slide by Eric Xing 38

  2. Desirable Properties of a Clustering Algorithm
     • Scalability (in terms of both time and space)
     • Ability to deal with different data types
     • Minimal requirements for domain knowledge to determine input parameters
     • Ability to deal with noisy data
     • Interpretability and usability
     • Optional
       - Incorporation of user-specified constraints
     slide by Andrew Moore 39

  3. K-Means Clustering 40

  4. K-Means Clustering Benefits • Fast • Conceptually straightforward • Popular slide by Tamara Broderick 41

  5. K-Means: Preliminaries slide by Tamara Broderick 42

  6. K-Means: Preliminaries Datum: Vector of continuous values slide by Tamara Broderick 43

  7. K-Means: Preliminaries
     Datum: Vector of continuous values
     [Figure: data plot; axes: Distance East, Distance North]
     slide by Tamara Broderick 44

  8. K-Means: Preliminaries
     Datum: Vector of continuous values
     [Figure: data plot; axes: Distance East, Distance North; one point marked at Distance East = 1.5, Distance North = 6.2]
     slide by Tamara Broderick 45

  9. K-Means: Preliminaries
     Datum: Vector of continuous values
     $x_3 = (1.5, 6.2)$
     Data table:
            East   North
       x_1  1.2    5.9
       x_2  4.3    2.1
       x_3  1.5    6.3
       ...
       x_N  4.1    2.3
     [Figure: data plot; axes: Distance East, Distance North; $x_3$ marked]
     slide by Tamara Broderick 46

  10. K-Means: Preliminaries
      Datum: Vector of continuous values
      $x_3 = (1.5, 6.2)$
      Data table:
             Feature 1   Feature 2
        x_1  1.2         5.9
        x_2  4.3         2.1
        x_3  1.5         6.3
        ...
        x_N  4.1         2.3
      [Figure: data plot; axes: Feature 1, Feature 2; $x_3$ marked]
      slide by Tamara Broderick 47

  11. K-Means: Preliminaries
      Datum: Vector of continuous values
      $x_3 = (1.5, 6.2) = (x_{3,1}, x_{3,2})$
      Data table:
             Feature 1   Feature 2
        x_1  x_{1,1}     x_{1,2}
        x_2  x_{2,1}     x_{2,2}
        x_3  x_{3,1}     x_{3,2}
        ...
        x_N  x_{N,1}     x_{N,2}
      [Figure: data plot; axes: Feature 1, Feature 2; $x_3$ marked]
      slide by Tamara Broderick 48

  12. K-Means: Preliminaries
      Datum: Vector of D continuous values
      $x_3 = (x_{3,1}, x_{3,2})$
      Data table: rows $x_1, \ldots, x_N$; columns Feature 1, Feature 2; entries $x_{n,1}, x_{n,2}$
      [Figure: data plot; axes: Feature 1, Feature 2; $x_3$ marked]
      slide by Tamara Broderick 49

  13. K-Means: Preliminaries
      Datum: Vector of D continuous values
      $x_3 = (x_{3,1}, x_{3,2})$
      [Figure: data plot; axes: Feature 1, Feature 2; $x_3$ marked]
      slide by Tamara Broderick 50

  14. K-Means: Preliminaries
      Dissimilarity: Distance as the crow flies
      [Figure: data plot; axes: Feature 1, Feature 2; $x_3 = (x_{3,1}, x_{3,2})$ marked]
      slide by Tamara Broderick 51

  15. K-Means: Preliminaries
      Dissimilarity: Distance as the crow flies
      [Figure: data plot; axes: Feature 1, Feature 2; points $x_3$ and $x_{17}$ marked]
      slide by Tamara Broderick 52

  17. K-Means: Preliminaries
      Dissimilarity: Euclidean distance
      [Figure: data plot; axes: Feature 1, Feature 2; points $x_3$ and $x_{17}$ marked]
      slide by Tamara Broderick 54

  18. K-Means: Preliminaries
      Dissimilarity: Squared Euclidean distance
      $\mathrm{dis}(x_3, x_{17}) = (x_{3,1} - x_{17,1})^2 + (x_{3,2} - x_{17,2})^2$
      [Figure: data plot; axes: Feature 1, Feature 2; points $x_3$ and $x_{17}$ marked]
      slide by Tamara Broderick 55

  19. K-Means: Preliminaries
      Dissimilarity: Squared Euclidean distance
      $\mathrm{dis}(x_3, x_{17}) = \sum_{d=1}^{D} (x_{3,d} - x_{17,d})^2$
      (the sum over $d$ runs over each feature)
      [Figure: data plot; axes: Feature 1, Feature 2; points $x_3$ and $x_{17}$ marked]
      slide by Tamara Broderick 56
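
The squared Euclidean dissimilarity can be sketched in a few lines of plain Python. This is only an illustration: the function name `dis` follows the slide notation, and the coordinates chosen for `x_17` are hypothetical because the slides do not give them.

```python
def dis(x, y):
    """Squared Euclidean distance between two feature vectors of length D."""
    if len(x) != len(y):
        raise ValueError("both vectors need the same number of features")
    # One squared difference per feature d = 1, ..., D, then summed.
    return sum((x_d - y_d) ** 2 for x_d, y_d in zip(x, y))


x_3 = (1.5, 6.2)   # the example datum from the slides
x_17 = (4.5, 2.2)  # hypothetical coordinates; the slides give none for x_17
print(dis(x_3, x_17))  # 3**2 + 4**2, up to floating-point rounding
```

No square root is taken, so the value is the squared straight-line distance rather than the distance itself.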

  20. K-Means: Preliminaries
      Dissimilarity
      [Figure: data plot; axes: Feature 1, Feature 2]
      slide by Tamara Broderick 57

  21. K-Means: Preliminaries
      Cluster summary
      K = number of clusters
      [Figure: data plot; axes: Feature 1, Feature 2]
      slide by Tamara Broderick 58

  22. K-Means: Preliminaries
      Cluster summary
      • K cluster centers
      [Figure: data plot; axes: Feature 1, Feature 2]
      slide by Tamara Broderick 59

  25. K-Means: Preliminaries
      Cluster summary
      • K cluster centers
      [Figure: data plot; axes: Feature 1, Feature 2; centers $\mu_1$, $\mu_2$, $\mu_3$ marked]
      slide by Tamara Broderick 62

  26. K-Means: Preliminaries
      Cluster summary
      • K cluster centers
      $\mu_1 = (\mu_{1,1}, \mu_{1,2})$
      [Figure: data plot; axes: Feature 1, Feature 2; centers $\mu_1$, $\mu_2$, $\mu_3$ marked]
      slide by Tamara Broderick 63

  27. K-Means: Preliminaries
      Cluster summary
      • K cluster centers $\mu_1, \mu_2, \ldots, \mu_K$
      $\mu_1 = (\mu_{1,1}, \mu_{1,2})$
      [Figure: data plot; axes: Feature 1, Feature 2; centers $\mu_1$, $\mu_2$, $\mu_3$ marked]
      slide by Tamara Broderick 64

  28. K-Means: Preliminaries
      Cluster summary
      • K cluster centers $\mu_1, \mu_2, \ldots, \mu_K$
      • Data assignments to clusters
      [Figure: data plot; axes: Feature 1, Feature 2]
      slide by Tamara Broderick 65

  30. K-Means: Preliminaries
      Cluster summary
      • K cluster centers $\mu_1, \mu_2, \ldots, \mu_K$
      • Data assignments to clusters
        $S_k$ = set of points in cluster $k$
      [Figure: data plot; axes: Feature 1, Feature 2]
      slide by Tamara Broderick 67

  31. K-Means: Preliminaries
      Cluster summary
      • K cluster centers $\mu_1, \mu_2, \ldots, \mu_K$
      • Data assignments to clusters $S_1, S_2, \ldots, S_K$
        $S_k$ = set of points in cluster $k$
      [Figure: data plot; axes: Feature 1, Feature 2]
      slide by Tamara Broderick 68

  32. K-Means: Preliminaries
      Cluster summary
      • K cluster centers $\mu_1, \mu_2, \ldots, \mu_K$
      • Data assignments to clusters $S_1, S_2, \ldots, S_K$
        $S_k$ = set of points in cluster $k$
      [Figure: data plot; axes: Feature 1, Feature 2; centers $\mu_1$, $\mu_2$, $\mu_3$ marked]
      slide by Tamara Broderick 69

  33. K-Means: Preliminaries
      Dissimilarity
      [Figure: data plot; axes: Feature 1, Feature 2]
      slide by Tamara Broderick 70

  34. K-Means: Preliminaries
      Dissimilarity (global)
      $\mathrm{dis}_{\mathrm{global}} = \sum_{k=1}^{K} \sum_{n : x_n \in S_k} \sum_{d=1}^{D} (x_{n,d} - \mu_{k,d})^2$
      slide by Tamara Broderick 71

  35. K-Means: Preliminaries
      Dissimilarity (global)
      $\mathrm{dis}_{\mathrm{global}} = \sum_{k=1}^{K} \sum_{n : x_n \in S_k} \sum_{d=1}^{D} (x_{n,d} - \mu_{k,d})^2$
      • Sum over $k$: for each cluster
      slide by Tamara Broderick 72

  36. K-Means: Preliminaries
      Dissimilarity (global)
      $\mathrm{dis}_{\mathrm{global}} = \sum_{k=1}^{K} \sum_{n : x_n \in S_k} \sum_{d=1}^{D} (x_{n,d} - \mu_{k,d})^2$
      • Sum over $k$: for each cluster
      • Sum over $n$: for each data point in the kth cluster
      slide by Tamara Broderick 73

  37. K-Means: Preliminaries
      Dissimilarity (global)
      $\mathrm{dis}_{\mathrm{global}} = \sum_{k=1}^{K} \sum_{n : x_n \in S_k} \sum_{d=1}^{D} (x_{n,d} - \mu_{k,d})^2$
      • Sum over $k$: for each cluster
      • Sum over $n$: for each data point in the kth cluster
      • Sum over $d$: for each feature
      slide by Tamara Broderick 74
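
The three nested sums of the global dissimilarity map directly onto nested loops. The sketch below assumes clusters are stored as a list of point lists (`clusters[k]` plays the role of $S_k$) and centers as a parallel list; that layout and the toy numbers are illustrative choices, not something the slides specify.

```python
def dis(x, y):
    """Squared Euclidean distance: the innermost sum, over each feature."""
    return sum((x_d - y_d) ** 2 for x_d, y_d in zip(x, y))


def dis_global(clusters, centers):
    """Global dissimilarity; clusters[k] holds the points of S_k, centers[k] is mu_k."""
    total = 0.0
    for S_k, mu_k in zip(clusters, centers):  # for each cluster
        for x_n in S_k:                       # for each data point in the kth cluster
            total += dis(x_n, mu_k)           # for each feature (inside dis)
    return total


# Toy example with made-up points: two clusters in two dimensions.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0)]]
centers = [(1.0, 0.0), (5.0, 5.0)]
print(dis_global(clusters, centers))  # 1 + 1 + 0
```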

  39. K-Means Algorithm
      • Initialize K cluster centers
      • Repeat until convergence:
        ✦ Assign each data point to the cluster with the closest center.
        ✦ Assign each cluster center to be the mean of its cluster’s data points
      slide by Tamara Broderick 76

  41. K-Means Algorithm
      • For k = 1,…,K
        ✦ Randomly draw n from 1,…,N without replacement
        ✦ $\mu_k \leftarrow x_n$
      • Repeat until convergence:
        ✦ Assign each data point to the cluster with the closest center.
        ✦ Assign each cluster center to be the mean of its cluster’s data points
      slide by Tamara Broderick 78
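
The initialization step (draw K distinct indices, copy those data points as the starting centers) could look like the sketch below. The function name `init_centers` and the `seed` parameter are illustrative assumptions; `random.Random.sample` is used because it draws without replacement.

```python
import random


def init_centers(X, K, seed=None):
    """Pick K distinct data points from X as the initial cluster centers."""
    rng = random.Random(seed)  # seed is only here to make runs repeatable
    # Draw K indices n from 0, ..., N-1 without replacement.
    indices = rng.sample(range(len(X)), K)
    # mu_k <- x_n for each drawn index.
    return [tuple(X[n]) for n in indices]


# The four rows shown in the data table on the slides.
X = [(1.2, 5.9), (4.3, 2.1), (1.5, 6.3), (4.1, 2.3)]
print(init_centers(X, K=2, seed=0))
```

Since indices are drawn without replacement, no data point is chosen twice, although two centers can still coincide if the data contain duplicate points.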

  45. K-Means Algorithm
      • For k = 1,…,K
        ✦ Randomly draw n from 1,…,N without replacement
        ✦ $\mu_k \leftarrow x_n$
      • Repeat until $S_1, \ldots, S_K$ don’t change:
        ✦ Assign each data point to the cluster with the closest center.
        ✦ Assign each cluster center to be the mean of its cluster’s data points
      slide by Tamara Broderick 82

  46. K-Means Algorithm
      • For k = 1,…,K
        ✦ Randomly draw n from 1,…,N without replacement
        ✦ $\mu_k \leftarrow x_n$
      • Repeat until $S_1, \ldots, S_K$ don’t change (or no change in $\mathrm{dis}_{\mathrm{global}}$):
        ✦ Assign each data point to the cluster with the closest center.
        ✦ Assign each cluster center to be the mean of its cluster’s data points
      slide by Tamara Broderick 83

  48. K-Means Algorithm
      • For k = 1,…,K
        ✦ Randomly draw n from 1,…,N without replacement
        ✦ $\mu_k \leftarrow x_n$
      • Repeat until $S_1, \ldots, S_K$ don’t change:
        ✦ For n = 1,…,N
          ❖ Find k with smallest $\mathrm{dis}(x_n, \mu_k)$
          ❖ Put $x_n \in S_k$ (and no other $S_j$)
        ✦ Assign each cluster center to be the mean of its cluster’s data points
      slide by Tamara Broderick 85
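
The assignment step above can be sketched as follows. The list-of-lists representation of $S_1, \ldots, S_K$ and the rule that ties go to the lowest k are assumptions made for this illustration; the slides do not say how ties are broken.

```python
def dis(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x_d - y_d) ** 2 for x_d, y_d in zip(x, y))


def assign(X, centers):
    """Return clusters, where clusters[k] lists the points whose closest center is centers[k]."""
    K = len(centers)
    clusters = [[] for _ in range(K)]
    for x_n in X:
        # Find k with the smallest dis(x_n, mu_k); min() keeps the first (lowest) k on ties.
        k = min(range(K), key=lambda j: dis(x_n, centers[j]))
        # Put x_n in S_k and in no other S_j.
        clusters[k].append(x_n)
    return clusters


# Made-up points and centers for illustration.
X = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]
print(assign(X, centers))  # [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 10.0)]]
```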

  52. K-Means Algorithm
      • For k = 1,…,K
        ✦ Randomly draw n from 1,…,N without replacement
        ✦ $\mu_k \leftarrow x_n$
      • Repeat until $S_1, \ldots, S_K$ don’t change:
        ✦ For n = 1,…,N
          ✤ Find k with smallest $\mathrm{dis}(x_n, \mu_k)$
          ✤ Put $x_n \in S_k$ (and no other $S_j$)
        ✦ For k = 1,…,K
          ✤ $\mu_k \leftarrow |S_k|^{-1} \sum_{n : x_n \in S_k} x_n$
      slide by Tamara Broderick 89
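
Putting initialization, assignment, and mean update together gives the sketch below. It is one possible reading of the pseudocode, not the lecture's reference implementation: cluster membership is stored as a list of labels, ties go to the lowest k, and an empty cluster keeps its previous center. The last two are assumptions, since the slides do not cover those cases.

```python
import random


def dis(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x_d - y_d) ** 2 for x_d, y_d in zip(x, y))


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    D = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(D))


def kmeans(X, K, seed=None):
    """Run K-means on X; return (centers, labels) with labels[n] = cluster index of X[n]."""
    rng = random.Random(seed)
    # Initialization: K distinct data points become the starting centers.
    centers = [tuple(X[n]) for n in rng.sample(range(len(X)), K)]
    labels = None
    while True:
        # Assignment step: labels[n] = k with the smallest dis(x_n, mu_k).
        new_labels = [min(range(K), key=lambda k: dis(x, centers[k])) for x in X]
        if new_labels == labels:  # S_1, ..., S_K did not change
            return centers, labels
        labels = new_labels
        # Update step: mu_k <- |S_k|^{-1} * (sum of the points in S_k).
        for k in range(K):
            S_k = [x for x, z in zip(X, labels) if z == k]
            if S_k:  # assumption: an empty cluster keeps its previous center
                centers[k] = mean(S_k)


# Made-up data: two well-separated groups of three points each.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(X, K=2, seed=0)
print(centers, labels)
```

The loop stops when an assignment pass reproduces the previous labels, which matches the "repeat until $S_1, \ldots, S_K$ don't change" criterion.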

  62. K-Means: Evaluation slide by Tamara Broderick 99

  63. K-Means: Evaluation • Will it terminate? Yes. Always. slide by Tamara Broderick 100
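
One standard way to see why termination is guaranteed: the assignment step cannot increase $\mathrm{dis}_{\mathrm{global}}$ (each point moves to a center that is at least as close), the mean update cannot increase it either (the mean minimizes the summed squared distance within a cluster), and N points can be split into K clusters in only finitely many ways. The sketch below records $\mathrm{dis}_{\mathrm{global}}$ after every step so the non-increasing behavior can be inspected. The fixed starting centers, the tie rule (lowest k), and the toy data are assumptions chosen to keep the run reproducible.

```python
def dis(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x_d - y_d) ** 2 for x_d, y_d in zip(x, y))


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    D = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(D))


def dis_global(X, labels, centers):
    """Sum of squared distances from each point to its assigned center."""
    return sum(dis(x, centers[z]) for x, z in zip(X, labels))


def kmeans_trace(X, centers):
    """Run K-means from the given centers; return dis_global after every step."""
    centers = [tuple(c) for c in centers]
    K = len(centers)
    labels, trace = None, []
    while True:
        new_labels = [min(range(K), key=lambda k: dis(x, centers[k])) for x in X]
        if new_labels == labels:  # assignments unchanged: stop
            return trace
        labels = new_labels
        trace.append(dis_global(X, labels, centers))  # after the assignment step
        for k in range(K):
            S_k = [x for x, z in zip(X, labels) if z == k]
            if S_k:  # assumption: an empty cluster keeps its previous center
                centers[k] = mean(S_k)
        trace.append(dis_global(X, labels, centers))  # after the update step


# Made-up data; both starting centers deliberately sit in the same group.
X = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
trace = kmeans_trace(X, centers=[X[0], X[1]])
print(trace)  # each value is no larger than the one before it
```

Termination says nothing about quality: the run can stop at a local optimum that depends on the starting centers.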
