CSCE 478/878 Lecture 8: Clustering (Stephen Scott, sscott@cse.unl.edu)


  1. Title slide: CSCE 478/878 Lecture 8: Clustering. Stephen Scott, sscott@cse.unl.edu.

  2. Introduction
     - If no label information is available, we can still perform unsupervised learning.
     - We look for structural information about the instance space instead of a label prediction function.
     - Approaches: density estimation, clustering, dimensionality reduction.
     - Clustering algorithms group similar instances together based on a similarity measure.
     - [Figure: instances plotted in the (x_1, x_2) plane before and after running a clustering algorithm.]

  3. Outline
     - Clustering background
     - Similarity/dissimilarity measures
     - k-means clustering
     - Hierarchical clustering

  4. Clustering Background
     - Goal: place patterns into "sensible" clusters that reveal similarities and differences.
     - The definition of "sensible" depends on the application.
     - [Figure: the same instances clustered four different ways, by (a) how they bear young, (b) existence of lungs, (c) environment, (d) both (a) and (b).]

  5. Clustering Background (cont'd)
     Types of clustering problems:
     - Hard (crisp): partition the data into non-overlapping clusters; each instance belongs to exactly one cluster.
     - Fuzzy: each instance can be a member of multiple clusters, with a real-valued function indicating its degree of membership in each.
     - Hierarchical: partition instances into numerous small clusters, then group the clusters into larger ones, and so on (applicable to phylogeny); the end result is a tree with instances at the leaves.

  6. Clustering Background: (Dis-)similarity Measures Between Instances
     - Dissimilarity measure: weighted L_p norm,
         L_p(x, y) = \left( \sum_{i=1}^{n} w_i |x_i - y_i|^p \right)^{1/p}
     - Special cases include weighted Euclidean distance (p = 2), weighted Manhattan distance
         L_1(x, y) = \sum_{i=1}^{n} w_i |x_i - y_i|,
       and the weighted L_\infty norm
         L_\infty(x, y) = \max_{1 \le i \le n} \{ w_i |x_i - y_i| \}
     - Similarity measure: dot product between two vectors (kernel).
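
To make these measures concrete, here is a small NumPy sketch; it is not part of the lecture, and the function names (weighted_lp, weighted_linf) and sample vectors are illustrative only.

```python
import numpy as np

def weighted_lp(x, y, w, p=2):
    """Weighted L_p dissimilarity: (sum_i w_i |x_i - y_i|^p)^(1/p)."""
    return float(np.sum(w * np.abs(x - y) ** p) ** (1.0 / p))

def weighted_linf(x, y, w):
    """Weighted L_infinity dissimilarity: max_i w_i |x_i - y_i|."""
    return float(np.max(w * np.abs(x - y)))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 3.0])
w = np.ones(3)                        # uniform weights

print(weighted_lp(x, y, w, p=1))      # weighted Manhattan distance
print(weighted_lp(x, y, w, p=2))      # weighted Euclidean distance
print(weighted_linf(x, y, w))         # weighted L_infinity distance
print(float(x @ y))                   # dot-product similarity (kernel)
```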

  7. Clustering Background: (Dis-)similarity Measures Between Instances (cont'd)
     If attributes come from {0, ..., k-1}, we can use the measures for real-valued attributes, plus:
     - Hamming distance: a dissimilarity measure counting the number of places where x and y differ.
     - Tanimoto measure: a similarity measure counting the number of places where x and y are the same, divided by the total number of places, ignoring places i where x_i = y_i = 0. Useful for ordinal features where x_i is the degree to which x possesses the i-th feature.
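
The sketch below is again illustrative rather than from the slides; it implements Hamming distance and one reading of the Tanimoto measure described above (agreements divided by the number of places, skipping places where both coordinates are 0).

```python
import numpy as np

def hamming_distance(x, y):
    """Dissimilarity: number of places where x and y differ."""
    return int(np.sum(x != y))

def tanimoto_similarity(x, y):
    """Similarity: fraction of places where x and y agree,
    ignoring places i where x_i = y_i = 0."""
    keep = ~((x == 0) & (y == 0))
    if not np.any(keep):
        return 0.0
    return float(np.sum(x[keep] == y[keep])) / int(np.sum(keep))

x = np.array([0, 2, 1, 0, 3])
y = np.array([0, 2, 0, 1, 3])
print(hamming_distance(x, y))      # 2 places differ
print(tanimoto_similarity(x, y))   # 2 agreements among 4 non-(0,0) places -> 0.5
```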

  8. Clustering Background: (Dis-)similarity Measures Between Instance and Set
     - We might want to measure the proximity of a point x to an existing cluster C.
     - The proximity α can be measured using all points of C or using a representative of C.
     - If all points of C are used, common choices are
         α^{ps}_{max}(x, C) = \max_{y \in C} \{ α(x, y) \}
         α^{ps}_{min}(x, C) = \min_{y \in C} \{ α(x, y) \}
         α^{ps}_{avg}(x, C) = \frac{1}{|C|} \sum_{y \in C} α(x, y),
       where α(x, y) is any measure between x and y.
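
A hedged sketch of the three all-points proximities, with Euclidean distance standing in for α and all names chosen for illustration:

```python
import numpy as np

def euclidean(x, y):
    return float(np.linalg.norm(x - y))

def point_set_proximity(x, C, alpha=euclidean, mode="avg"):
    """Proximity of point x to cluster C using all points of C.
    alpha is any point-point measure; mode selects max, min, or avg."""
    values = [alpha(x, y) for y in C]
    if mode == "max":
        return max(values)
    if mode == "min":
        return min(values)
    return sum(values) / len(values)       # "avg"

C = [np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.array([1.0, 1.0])]
x = np.array([1.0, 0.0])
for mode in ("min", "max", "avg"):
    print(mode, point_set_proximity(x, C, mode=mode))
```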

  9. Clustering Background: (Dis-)similarity Measures Between Instance and Set (cont'd)
     Alternative: measure the distance between the point x and a representative of the cluster C.
     - Mean vector (need not belong to C):
         m_p = \frac{1}{|C|} \sum_{y \in C} y
     - Mean center m_c ∈ C:
         \sum_{y \in C} d(m_c, y) \le \sum_{y \in C} d(z, y)   ∀ z ∈ C,
       where d(·,·) is a dissimilarity measure (if a similarity measure is used, reverse the inequality).
     - Median center: for each point y ∈ C, find the median dissimilarity from y to all other points of C, then take the minimum; so m_med ∈ C is defined by
         med_{y \in C} \{ d(m_med, y) \} \le med_{y \in C} \{ d(z, y) \}   ∀ z ∈ C
     - We can then measure the proximity between C's representative and x with the standard point-point measures.
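
These representatives can be computed directly; the sketch below (my own naming, with Euclidean distance as d) mirrors the three definitions.

```python
import numpy as np

def euclidean(x, y):
    return float(np.linalg.norm(x - y))

def mean_vector(C):
    """Mean vector m_p: coordinate-wise average (need not belong to C)."""
    return np.mean(C, axis=0)

def mean_center(C, d=euclidean):
    """Mean center m_c in C: member minimizing the summed dissimilarity
    to all points of C (for a DM; reverse the criterion for an SM)."""
    return min(C, key=lambda z: sum(d(z, y) for y in C))

def median_center(C, d=euclidean):
    """Median center m_med in C: member minimizing the median
    dissimilarity to the points of C."""
    return min(C, key=lambda z: float(np.median([d(z, y) for y in C])))

C = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([10.0, 0.0])]
print(mean_vector(C))      # [11/3, 0]; not a member of C
print(mean_center(C))      # [1, 0]; an actual member of C
print(median_center(C))
```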

  10. Clustering Background: (Dis-)similarity Measures Between Sets
      Given sets of instances C_i and C_j and a proximity measure α(·,·):
      - Max: α^{ss}_{max}(C_i, C_j) = \max_{x \in C_i, y \in C_j} \{ α(x, y) \}
      - Min: α^{ss}_{min}(C_i, C_j) = \min_{x \in C_i, y \in C_j} \{ α(x, y) \}
      - Average: α^{ss}_{avg}(C_i, C_j) = \frac{1}{|C_i| |C_j|} \sum_{x \in C_i} \sum_{y \in C_j} α(x, y)
      - Representative (mean): α^{ss}_{mean}(C_i, C_j) = α(m_{C_i}, m_{C_j})
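
These set-set measures become the merge criteria in hierarchical clustering (the min, max, and average versions are what is elsewhere called single, complete, and average linkage). A small illustrative sketch, assuming Euclidean α and names of my own choosing:

```python
import numpy as np
from itertools import product

def euclidean(x, y):
    return float(np.linalg.norm(x - y))

def set_set_proximity(Ci, Cj, alpha=euclidean, mode="avg"):
    """Proximity between clusters Ci and Cj under point-point measure alpha."""
    if mode == "mean":                          # compare the clusters' mean vectors
        return alpha(np.mean(Ci, axis=0), np.mean(Cj, axis=0))
    values = [alpha(x, y) for x, y in product(Ci, Cj)]
    if mode == "max":
        return max(values)
    if mode == "min":
        return min(values)
    return sum(values) / (len(Ci) * len(Cj))    # "avg"

Ci = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
Cj = [np.array([4.0, 0.0]), np.array([6.0, 0.0])]
for mode in ("min", "max", "avg", "mean"):
    print(mode, set_set_proximity(Ci, Cj, mode=mode))
```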

  11. k-Means Clustering
      - Very popular clustering algorithm.
      - Represents cluster i (out of k total) by specifying its representative m_i (not necessarily part of the original set of instances X).
      - Each instance x ∈ X is assigned to the cluster with the nearest representative.
      - Goal: find a set of k representatives such that the sum of distances between instances and their representatives is minimized; this is NP-hard in general.
      - We will use an algorithm that alternates between determining representatives and assigning clusters until convergence (in the style of the EM algorithm).

  12. k-Means Clustering Algorithm
      - Choose a value for the parameter k.
      - Initialize k arbitrary representatives m_1, ..., m_k, e.g., k randomly selected instances from X.
      - Repeat until the representatives m_1, ..., m_k don't change:
        1. For all x ∈ X, assign x to the cluster C_j such that \|x - m_j\| (or another measure) is minimized, i.e., to the nearest representative.
        2. For each j ∈ {1, ..., k}, set
             m_j = \frac{1}{|C_j|} \sum_{y \in C_j} y
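
A minimal NumPy sketch of this loop; the function name k_means, the seeding, the empty-cluster handling, and the toy data are my own choices rather than part of the lecture.

```python
import numpy as np

def k_means(X, k, max_iters=100, seed=0):
    """Alternate (1) nearest-representative assignment and
    (2) representative recomputation until the m_j stop changing."""
    rng = np.random.default_rng(seed)
    # Initialize representatives as k randomly selected instances from X
    reps = X[rng.choice(len(X), size=k, replace=False)].copy()
    assign = np.zeros(len(X), dtype=int)
    for _ in range(max_iters):
        # Step 1: assign each x to the cluster with the nearest representative
        dists = np.linalg.norm(X[:, None, :] - reps[None, :, :], axis=2)
        assign = np.argmin(dists, axis=1)
        # Step 2: set each representative to the mean of its cluster
        new_reps = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                             else reps[j] for j in range(k)])
        if np.allclose(new_reps, reps):     # representatives unchanged: converged
            break
        reps = new_reps
    return reps, assign

# Toy data: two well-separated blobs in the plane
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(8, 1, (50, 2))])
reps, assign = k_means(X, k=2)
print(reps)
```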

  13. k-Means Clustering Example with k = 2
      [Figure: four scatter plots in the (x_1, x_2) plane showing the initial configuration and the clusterings after 1, 2, and 3 iterations.]

  14. Hierarchical Clustering
      - Useful for capturing hierarchical relationships, e.g., the evolutionary tree of biological sequences.
      - The end result is a sequence (hierarchy) of clusterings.
      - Two types of algorithms:
        - Agglomerative: repeatedly merge two clusters into one.
        - Divisive: repeatedly divide one cluster into two.

  15. Hierarchical Clustering Definitions
      - Let C_t = {C_1, ..., C_{m_t}} be a level-t clustering of X = {x_1, ..., x_N}, where C_t meets the definition of a hard clustering.
      - C_t is nested in C_{t'} (written C_t ⊏ C_{t'}) if each cluster in C_t is a subset of a cluster in C_{t'} and at least one cluster in C_t is a proper subset of some cluster in C_{t'}.
      - Example:
          C_1 = {{x_1, x_3}, {x_4}, {x_2, x_5}} ⊏ {{x_1, x_3, x_4}, {x_2, x_5}},
          but C_1 is not nested in {{x_1, x_4}, {x_3}, {x_2, x_5}}.

  16. Hierarchical Clustering Definitions (cont'd)
      - Agglomerative algorithms start with C_0 = {{x_1}, ..., {x_N}} and at each step t merge two clusters into one, yielding |C_{t+1}| = |C_t| - 1 and C_t ⊏ C_{t+1}.
      - At the final step (step N - 1) we have the hierarchy
          C_0 = {{x_1}, ..., {x_N}} ⊏ C_1 ⊏ ··· ⊏ C_{N-1} = {{x_1, ..., x_N}}.
      - Divisive algorithms start with C_0 = {{x_1, ..., x_N}} and at each step t split one cluster into two, yielding |C_{t+1}| = |C_t| + 1 and C_{t+1} ⊏ C_t.
      - At step N - 1 we have the hierarchy
          C_{N-1} = {{x_1}, ..., {x_N}} ⊏ ··· ⊏ C_0 = {{x_1, ..., x_N}}.
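
To make the agglomerative case concrete, here is an illustrative sketch (not the lecture's pseudocode; the single-linkage choice of set-set measure and all names are mine) that builds the full hierarchy C_0 ⊏ C_1 ⊏ ··· ⊏ C_{N-1} by repeatedly merging the two closest clusters.

```python
import numpy as np

def single_link(Ci, Cj):
    """Set-set dissimilarity: minimum Euclidean distance between members."""
    return min(float(np.linalg.norm(x - y)) for x in Ci for y in Cj)

def agglomerative(X, proximity=single_link):
    """Start with singletons and merge the two closest clusters at each step,
    returning the whole hierarchy of clusterings C_0, C_1, ..., C_{N-1}."""
    clusters = [[x] for x in X]                      # C_0: one cluster per instance
    hierarchy = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest set-set dissimilarity
        i, j = min(((a, b) for a in range(len(clusters))
                           for b in range(a + 1, len(clusters))),
                   key=lambda ab: proximity(clusters[ab[0]], clusters[ab[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
        hierarchy.append([list(c) for c in clusters])
    return hierarchy

X = [np.array([0.0]), np.array([1.0]), np.array([10.0]), np.array([11.0])]
for level, clustering in enumerate(agglomerative(X)):
    print(level, [[float(v[0]) for v in c] for c in clustering])
```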
