
NPFL103: Information Retrieval (10) – Document clustering
Pavel Pecina, Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics

Outline: Introduction · K-means · Evaluation · How many clusters? · Hierarchical clustering · Variants


  1.–18. Worked Example (figures): the algorithm alternates between assigning each point to the closest centroid (“Worked Example: Assignment” / “Assign points to closest centroid”) and recomputing the cluster centroids (“Worked Example: Recompute cluster centroids”), ending with “Centroids and assignments after convergence”.

  19. K-means is guaranteed to converge: Proof
  ▶ RSS = sum of all squared distances between document vector and closest centroid
  ▶ RSS decreases during each reassignment step,
  ▶ because each vector is moved to a closer centroid.
  ▶ RSS decreases during each recomputation step.
  ▶ See the book for a proof.
  ▶ There is only a finite number of clusterings.
  ▶ Thus: We must reach a fixed point.
  ▶ Assumption: Ties are broken consistently.
  ▶ Finite set & monotonically decreasing → convergence

  20. Convergence and optimality of K-means
  ▶ K-means is guaranteed to converge …
  ▶ … but we don’t know how long convergence will take!
  ▶ If we don’t care about a few docs switching back and forth, then convergence is usually fast (< 10–20 iterations).
  ▶ However, complete convergence can take many more iterations.
  ▶ Convergence ≠ optimality
  ▶ Convergence does not mean that we converge to the optimal clustering!
  ▶ This is the great weakness of K-means.
  ▶ If we start with a bad set of seeds, the resulting clustering can be horrible.

  21. Exercise: Suboptimal clustering
  (Figure: six points, d1 d2 d3 in an upper row and d4 d5 d6 in a lower row.)
  ▶ What is the optimal clustering for K = 2?
  ▶ Do we converge on this clustering for arbitrary seeds d_i, d_j?

  22. Initialization of K-means
  ▶ Random seed selection is just one of many ways K-means can be initialized.
  ▶ Random seed selection is not very robust: It’s easy to get a suboptimal clustering.
  ▶ Better ways of computing initial centroids:
  ▶ Select seeds not randomly, but using some heuristic (e.g., filter out outliers or find a set of seeds that has “good coverage” of the document space)
  ▶ Use hierarchical clustering to find good seeds
  ▶ Select i (e.g., i = 10) different random sets of seeds, do a K-means clustering for each, select the clustering with lowest RSS

  23. Time complexity of K-means
  ▶ Computing one distance of two vectors is O(M).
  ▶ Reassignment step: O(KNM) (we need to compute KN document-centroid distances)
  ▶ Recomputation step: O(NM) (we need to add each of the document’s < M values to one of the centroids)
  ▶ Assume number of iterations bounded by I
  ▶ Overall complexity: O(IKNM) – linear in all important dimensions
  ▶ However: This is not a real worst-case analysis.
  ▶ In pathological cases, complexity can be worse than linear.
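To make the per-step costs concrete, here is a minimal NumPy sketch of the algorithm (mine, not from the slides; the function name kmeans and its parameters are illustrative). The reassignment loop computes the KN document-centroid distances and the recomputation loop averages each cluster's vectors:

```python
import numpy as np

def kmeans(X, K, n_iter=20, seed=0):
    """Minimal K-means sketch. X: N x M matrix of document vectors."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)].copy()  # random seeds
    for _ in range(n_iter):                        # at most I iterations
        # reassignment step: K*N document-centroid distances, O(KNM)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # recomputation step: each document vector is added to one centroid, O(NM)
        for k in range(K):
            if np.any(assign == k):
                centroids[k] = X[assign == k].mean(axis=0)
    rss = ((X - centroids[assign]) ** 2).sum()     # residual sum of squares
    return assign, centroids, rss
```

Running the sketch with several different seeds and keeping the result with the lowest RSS is exactly the multi-restart initialization suggested on the previous slide.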

  24. Evaluation

  25. What is a good clustering?
  ▶ Internal criteria
  ▶ Example of an internal criterion: RSS in K-means
  ▶ But an internal criterion often does not evaluate the actual utility of a clustering in the application.
  ▶ Alternative: External criteria
  ▶ Evaluate with respect to a human-defined classification

  26. External criteria for clustering quality
  ▶ Based on a gold standard data set, e.g., the Reuters collection we also used for the evaluation of classification
  ▶ Goal: Clustering should reproduce the classes in the gold standard
  ▶ (But we only want to reproduce how documents are divided into groups, not the class labels.)
  ▶ First measure for how well we were able to reproduce the classes: purity

  27. External criterion: Purity

    purity(Ω, C) = (1/N) ∑_k max_j |ω_k ∩ c_j|

  ▶ Ω = {ω_1, ω_2, …, ω_K} is the set of clusters and C = {c_1, c_2, …, c_J} is the set of classes.
  ▶ For each cluster ω_k: find class c_j with most members n_kj in ω_k
  ▶ Sum all n_kj and divide by total number of points

  28. Example for computing purity
  (Figure: 17 points in three clusters; cluster 1 contains mostly x’s, cluster 2 mostly o’s, cluster 3 mostly ⋄’s.)
  To compute purity: 5 = max_j |ω_1 ∩ c_j| (class x, cluster 1); 4 = max_j |ω_2 ∩ c_j| (class o, cluster 2); and 3 = max_j |ω_3 ∩ c_j| (class ⋄, cluster 3). Purity is (1/17) × (5 + 4 + 3) ≈ 0.71.
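The computation above can be reproduced with a small sketch (my illustration, not part of the lecture); the label values are arbitrary Python objects:

```python
from collections import Counter

def purity(clusters, classes):
    """purity(Omega, C) = (1/N) * sum_k max_j |omega_k intersect c_j|"""
    N = len(classes)
    total = 0
    for k in set(clusters):
        # class counts within cluster k; take the majority class
        members = Counter(c for c, w in zip(classes, clusters) if w == k)
        total += members.most_common(1)[0][1]
    return total / N

# the o/diamond/x example: (5 + 4 + 3) / 17 ~ 0.71
example_clusters = [1] * 6 + [2] * 6 + [3] * 5
example_classes = ['x'] * 5 + ['o'] + ['x'] + ['o'] * 4 + ['d'] + ['x'] * 2 + ['d'] * 3
print(purity(example_clusters, example_classes))
```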

  29. Another external criterion: Rand index
  ▶ Purity can be increased easily by increasing K – a measure that does not have this problem: Rand index.

    RI = (TP + TN) / (TP + FP + FN + TN)

  ▶ Based on a 2×2 contingency table of all pairs of documents:

                          same cluster            different clusters
    same class            true positives (TP)     false negatives (FN)
    different classes     false positives (FP)    true negatives (TN)

  ▶ TP + FN + FP + TN is the total number of pairs, C(N, 2) for N docs.
  ▶ Each pair is either positive or negative (the clustering puts the two documents in the same or in different clusters) …
  ▶ … and either “true” (correct) or “false” (incorrect): the clustering decision is correct or incorrect.

  30. Example: compute Rand index for the o/⋄/x example
  ▶ We first compute TP + FP. The three clusters contain 6, 6, and 5 points, respectively, so the total number of “positives” or pairs of documents that are in the same cluster is:

    TP + FP = C(6, 2) + C(6, 2) + C(5, 2) = 40

  ▶ Of these, the x pairs in cluster 1, the o pairs in cluster 2, the ⋄ pairs in cluster 3, and the x pair in cluster 3 are true positives:

    TP = C(5, 2) + C(4, 2) + C(3, 2) + C(2, 2) = 20

  ▶ Thus, FP = 40 − 20 = 20.
  ▶ FN and TN are computed similarly.

  31. Rand index for the o/⋄/x example

                          same cluster    different clusters
    same class            TP = 20         FN = 24
    different classes     FP = 20         TN = 72

  RI is then (20 + 72)/(20 + 20 + 24 + 72) ≈ 0.68.
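The same counts can be derived mechanically from two label lists; the following sketch (mine, with illustrative names) computes TP, FP, FN, TN from pair counts and reproduces RI ≈ 0.68 for the labels used in the purity example:

```python
from math import comb
from collections import Counter

def rand_index(clusters, classes):
    N = len(classes)
    pairs = comb(N, 2)                                                   # TP+FP+FN+TN
    same_cluster = sum(comb(n, 2) for n in Counter(clusters).values())   # TP+FP
    same_class = sum(comb(n, 2) for n in Counter(classes).values())      # TP+FN
    # true positives: pairs that share both cluster and class
    tp = sum(comb(n, 2) for n in Counter(zip(clusters, classes)).values())
    fp, fn = same_cluster - tp, same_class - tp
    tn = pairs - tp - fp - fn
    return (tp + tn) / pairs
```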

  32. Two other external evaluation measures
  ▶ Two other measures
  ▶ Normalized mutual information (NMI)
  ▶ How much information does the clustering contain about the classification?
  ▶ Singleton clusters (number of clusters = number of docs) have maximum MI
  ▶ Therefore: normalize by entropy of clusters and classes
  ▶ F measure
  ▶ Like Rand, but “precision” and “recall” can be weighted
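As a hedged sketch of the NMI definition just described (mutual information between cluster and class labels, normalized by the average of the two entropies; the function name is mine):

```python
from collections import Counter
from math import log

def nmi(clusters, classes):
    N = len(classes)
    wk, cj = Counter(clusters), Counter(classes)
    joint = Counter(zip(clusters, classes))
    # I(Omega; C): how much information the clustering gives about the classification
    mi = sum((n / N) * log(N * n / (wk[k] * cj[j])) for (k, j), n in joint.items())
    entropy = lambda cnt: -sum((n / N) * log(n / N) for n in cnt.values())
    return mi / ((entropy(wk) + entropy(cj)) / 2)   # normalization penalizes many clusters
```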

  33. Evaluation results for the o/⋄/x example

                          purity   NMI    RI     F5
    lower bound           0.0      0.0    0.0    0.0
    maximum               1.0      1.0    1.0    1.0
    value for example     0.71     0.36   0.68   0.46

  All measures range from 0 (bad clustering) to 1 (perfect clustering).

  34. How many clusters?

  35. How many clusters?
  ▶ Number of clusters K is given in many applications.
  ▶ E.g., there may be an external constraint on K.
  ▶ What if there is no external constraint? Is there a “right” number of clusters?
  ▶ One way to go: define an optimization criterion
  ▶ Given docs, find K for which the optimum is reached.
  ▶ What optimization criterion can we use?
  ▶ We can’t use RSS or average squared distance from centroid as criterion: always chooses K = N clusters.

  36. Simple objective function for K: Basic idea
  ▶ Start with 1 cluster (K = 1)
  ▶ Keep adding clusters (= keep increasing K)
  ▶ Add a penalty for each new cluster
  ▶ Then trade off cluster penalties against average squared distance from centroid
  ▶ Choose the value of K with the best tradeoff

  37. Simple objective function for K: Formalization
  ▶ Given a clustering, define the cost for a document as (squared) distance to centroid
  ▶ Define total distortion RSS(K) as sum of all individual document costs (corresponds to average distance)
  ▶ Then: penalize each cluster with a cost λ
  ▶ Thus for a clustering with K clusters, total cluster penalty is Kλ
  ▶ Define the total cost of a clustering as distortion plus total cluster penalty: RSS(K) + Kλ
  ▶ Select K that minimizes (RSS(K) + Kλ)
  ▶ Still need to determine good value for λ …
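A sketch of this selection rule, assuming scikit-learn's KMeans as the clusterer and an externally supplied λ (neither is prescribed by the slides; KMeans.inertia_ plays the role of RSS(K)):

```python
import numpy as np
from sklearn.cluster import KMeans

def choose_k(X, lam, k_max=10):
    """Select K minimizing the total cost RSS(K) + K * lambda."""
    costs = []
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        costs.append(km.inertia_ + k * lam)   # inertia_ is the distortion RSS(K)
    return 1 + int(np.argmin(costs))
```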

  38. Finding the “knee” in the curve
  (Figure: residual sum of squares (≈ 1750–1950) plotted against the number of clusters (2–10).)
  Pick the number of clusters where the curve “flattens”. Here: 4 or 9.

  39. Hierarchical clustering

  40. Introduction oil & gas clustering. TOP regions industries Kenya China UK K -means France poultry cofgee we saw earlier in Reuters: Our goal in hierarchical clustering is to create a hierarchy like the one Hierarchical clustering Variants Hierarchical clustering How many clusters? Evaluation 68 / 114 ▶ We want to create this hierarchy automatically. ▶ We can do this either top-down or botuom-up. ▶ The best known botuom-up method is hierarchical agglomerative

  41. Hierarchical agglomerative clustering (HAC)
  ▶ HAC creates a hierarchy in the form of a binary tree.
  ▶ Assumes a similarity measure for determining similarity of two clusters.
  ▶ Up to now, our similarity measures were for documents.
  ▶ We will look at four different cluster similarity measures.

  42. HAC: Basic algorithm
  ▶ Start with each document in a separate cluster
  ▶ Then repeatedly merge the two clusters that are most similar
  ▶ until there is only one cluster.
  ▶ The history of merging is a hierarchy in the form of a binary tree.
  ▶ The standard way of depicting this history is a dendrogram.

  43. A dendrogram
  (Figure: a dendrogram.)
  ▶ The history of mergers can be read off from bottom to top.
  ▶ The horizontal line of each merger tells us what the similarity of the merger was.
  ▶ We can cut the dendrogram at a particular point (e.g., at 0.1 or 0.4) to get a flat clustering.

  44. Divisive clustering
  ▶ Divisive clustering is top-down.
  ▶ Alternative to HAC (which is bottom-up).
  ▶ Divisive clustering:
  ▶ Start with all docs in one big cluster
  ▶ Then recursively split clusters
  ▶ Eventually each node forms a cluster on its own.
  ▶ → Bisecting K-means at the end
  ▶ For now: HAC (= bottom-up)

  45. Naive HAC algorithm

    SimpleHAC(d_1, …, d_N)
      for n ← 1 to N do
        for i ← 1 to N do
          C[n][i] ← Sim(d_n, d_i)
        I[n] ← 1                 (keeps track of active clusters)
      A ← []                     (collects clustering as a sequence of merges)
      for k ← 1 to N − 1 do
        ⟨i, m⟩ ← arg max {⟨i, m⟩ : i ≠ m ∧ I[i]=1 ∧ I[m]=1} C[i][m]
        A.Append(⟨i, m⟩)         (store merge)
        for j ← 1 to N do        (use i as representative for ⟨i, m⟩)
          C[i][j] ← Sim(⟨i, m⟩, j)
          C[j][i] ← Sim(⟨i, m⟩, j)
        I[m] ← 0                 (deactivate cluster)
      return A
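Below is a compact Python rendering of SimpleHAC (my sketch, not the lecture's reference code). Because the pseudocode leaves Sim(⟨i, m⟩, j) abstract, the sketch assumes the single-link update; swapping np.maximum for np.minimum gives complete-link:

```python
import numpy as np

def simple_hac(sim):
    """Naive HAC over an N x N document similarity matrix.
    Returns the clustering as a sequence of (i, m) merges."""
    N = sim.shape[0]
    C = sim.astype(float).copy()
    np.fill_diagonal(C, -np.inf)           # never merge a cluster with itself
    I = np.ones(N, dtype=bool)             # keeps track of active clusters
    A = []                                 # collects the merge sequence
    for _ in range(N - 1):
        masked = np.where(np.outer(I, I), C, -np.inf)   # O(N^2) scan for the max
        i, m = np.unravel_index(np.argmax(masked), masked.shape)
        A.append((i, m))
        # use i as representative for <i, m>; single-link update of row/column i
        C[i, :] = np.maximum(C[i, :], C[m, :])
        C[:, i] = C[i, :]
        C[i, i] = -np.inf
        I[m] = False                       # deactivate cluster m
    return A
```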

  46. Computational complexity of the naive algorithm
  ▶ First, we compute the similarity of all N × N pairs of documents.
  ▶ Then, in each of N iterations:
  ▶ We scan the O(N × N) similarities to find the maximum similarity.
  ▶ We merge the two clusters with maximum similarity.
  ▶ We compute the similarity of the new cluster with all other (surviving) clusters.
  ▶ There are O(N) iterations, each performing a O(N × N) “scan” operation.
  ▶ Overall complexity is O(N³).
  ▶ We’ll look at more efficient algorithms later.

  47. Key question: How to define cluster similarity
  ▶ Single-link: Maximum similarity
  ▶ Maximum similarity of any two documents
  ▶ Complete-link: Minimum similarity
  ▶ Minimum similarity of any two documents
  ▶ Centroid: Average “intersimilarity”
  ▶ Average similarity of all document pairs (but excluding pairs of docs in the same cluster)
  ▶ This is equivalent to the similarity of the centroids.
  ▶ Group-average: Average “intrasimilarity”
  ▶ Average similarity of all document pairs, including pairs of docs in the same cluster

  48. Cluster similarity: Example (figure: four points)

  49. Single-link: Maximum similarity (figure)

  50. Complete-link: Minimum similarity (figure)

  51. Centroid: Average intersimilarity (figure; intersimilarity = similarity of two documents in different clusters)

  52. Group average: Average intrasimilarity (figure; intrasimilarity = similarity of any pair, including cases in the same cluster)

  53.–57. Cluster similarity: Larger example (figures): the same four measures – single-link, complete-link, centroid, group average – illustrated on twenty points.

  58. Single-link HAC
  ▶ The similarity of two clusters is the maximum intersimilarity – the maximum similarity of a document from the first cluster and a document from the second cluster.
  ▶ Once we have merged two clusters, how do we update the similarity matrix?
  ▶ This is simple for single link:

    sim(ω_i, ω_k1 ∪ ω_k2) = max(sim(ω_i, ω_k1), sim(ω_i, ω_k2))

  59. Single-link dendrogram
  (Figure: dendrogram produced by single-link clustering.)
  ▶ Notice: many small clusters (1 or 2 members) being added to the main cluster.
  ▶ There is no balanced 2-cluster or 3-cluster clustering that can be derived by cutting the dendrogram.

  60. Complete-link HAC
  ▶ The similarity of two clusters is the minimum intersimilarity – the minimum similarity of a document from the first cluster and a document from the second cluster.
  ▶ Once we have merged two clusters, how do we update the similarity matrix?
  ▶ Again, this is simple:

    sim(ω_i, ω_k1 ∪ ω_k2) = min(sim(ω_i, ω_k1), sim(ω_i, ω_k2))

  ▶ We measure the similarity of two clusters by computing the diameter of the cluster that we would get if we merged them.
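Both update rules fit in one illustrative helper (a sketch; the names are mine):

```python
def merged_similarity(sim_i_k1, sim_i_k2, linkage="single"):
    """Similarity of cluster omega_i to the merge of omega_k1 and omega_k2."""
    if linkage == "single":      # maximum intersimilarity
        return max(sim_i_k1, sim_i_k2)
    if linkage == "complete":    # minimum intersimilarity
        return min(sim_i_k1, sim_i_k2)
    raise ValueError(f"unknown linkage: {linkage}")
```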

  61. Complete-link dendrogram
  (Figure: dendrogram produced by complete-link clustering.)
  ▶ Notice that this dendrogram is much more balanced than the single-link one.
  ▶ We can create a 2-cluster clustering with two clusters of about the same size.

  62. Exercise: Compute single and complete link clusterings
  (Figure: eight points, d1 d2 d3 d4 in an upper row and d5 d6 d7 d8 in a lower row.)

  63. Single-link clustering (figure: single-link solution for the exercise)

  64. Complete-link clustering (figure: complete-link solution for the exercise)

  65. Single-link vs. complete-link clustering (figure: the two solutions side by side)

  66. Single-link: Chaining
  (Figure: two rows of twelve points each.)
  Single-link clustering often produces long, straggly clusters. For most applications, these are undesirable.

  67. What 2-cluster clustering will complete-link produce?
  (Figure: five points d1–d5 on a line from 0 to 7.)
  Coordinates: 1 + 2ε, 4, 5 + 2ε, 6, 7 − ε.

  68. Complete-link: Sensitivity to outliers
  (Figure: the same five points d1–d5.)
  ▶ The complete-link clustering of this set splits d2 from its right neighbors – clearly undesirable.
  ▶ The reason is the outlier d1.
  ▶ This shows that a single outlier can negatively affect the outcome of complete-link clustering.
  ▶ Single-link clustering does better in this case.
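The split can be reproduced with an off-the-shelf HAC implementation; this sketch uses SciPy's complete-link clustering (my choice of library, not the lecture's) on the five one-dimensional coordinates from the previous slide:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

eps = 0.01
# d1 ... d5 at 1 + 2e, 4, 5 + 2e, 6, 7 - e (coordinates from the previous slide)
X = np.array([[1 + 2 * eps], [4.0], [5 + 2 * eps], [6.0], [7 - eps]])
labels = fcluster(linkage(X, method="complete"), t=2, criterion="maxclust")
print(labels)   # d1 and d2 share a label; d3, d4, d5 share the other:
                # the outlier d1 pulls d2 away from its right neighbors
```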

  69. Centroid HAC
  ▶ The similarity of two clusters is the average intersimilarity – the average similarity of documents from the first cluster with documents from the second cluster.
  ▶ A naive implementation of this definition is inefficient (O(N²)), but the definition is equivalent to computing the similarity of the centroids:

    sim-cent(ω_i, ω_j) = μ(ω_i) · μ(ω_j)

  ▶ Hence the name: centroid HAC
  ▶ Note: this is the dot product, not cosine similarity!
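A minimal sketch of this cluster similarity (illustrative names; dot product of the centroids, as stated, not cosine similarity):

```python
import numpy as np

def centroid_similarity(cluster_a, cluster_b):
    """Centroid HAC cluster similarity: dot product of the two centroids.
    Equal to the average dot product between documents of the two clusters."""
    mu_a = np.mean(cluster_a, axis=0)
    mu_b = np.mean(cluster_b, axis=0)
    return float(mu_a @ mu_b)
```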

  70. Exercise: Compute centroid clustering
  (Figure: six points d1–d6 on a 7 × 5 grid.)

  71. Centroid clustering
  (Figure: the six points from the exercise with the centroids µ1, µ2, µ3 of the resulting clusters.)

  72. Inversion in centroid clustering
  ▶ In an inversion, the similarity increases during a merge sequence. Results in an “inverted” dendrogram.
  ▶ Below: Similarity of the first merger (d1 ∪ d2) is −4.0, similarity of second merger ((d1 ∪ d2) ∪ d3) is ≈ −3.5.
  (Figure: the three points d1, d2, d3 and the corresponding inverted dendrogram.)
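The slide's numbers can be checked with a tiny script. The coordinates below are my assumption of a configuration that reproduces them (they are not given in the text), with cluster similarity taken as negative Euclidean distance between centroids:

```python
import numpy as np

eps = 0.01
d1 = np.array([1 + eps, 1.0])                 # assumed coordinates, not from the slide
d2 = np.array([5.0, 1.0])
d3 = np.array([3.0, 1 + 2 * np.sqrt(3)])

sim = lambda a, b: -np.linalg.norm(a - b)     # similarity = negative distance
print(sim(d1, d2))                            # first merger (d1 u d2):  ~ -4.0
mu_12 = (d1 + d2) / 2                         # centroid of the first merger
print(sim(mu_12, d3))                         # second merger with d3:   ~ -3.5  -> inversion
```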
