Clustering ! Unsupervised learning: learning without a teacher ! - PowerPoint PPT Presentation

So far in the course Supervised learning: learning with a teacher ! ‣ You had training data which was (feature, label) pairs and the goal was to learn a mapping from features to labels Clustering ! Unsupervised learning: learning without a teacher ! ‣ Only features and no labels Subhransu Maji Why is unsupervised learning useful? ! CMPSCI 689: Machine Learning ‣ Discover hidden structures in the data — clustering today ‣ Visualization — dimensionality reduction 2 April 2015 ➡ lower dimensional features might help learning 7 April 2015 CMPSCI 689 Subhransu Maji (UMASS) 2 /48 Clustering Clustering Basic idea: group together similar instances ! Basic idea: group together similar instances ! Example: 2D points Example: 2D points ! ! ! ! ! ! ! What could similar mean? ! ‣ One option: small Euclidean distance (squared) ! dist( x , y ) = || x − y || 2 2 ! ‣ Clustering results are crucially dependent on the measure of similarity (or distance) between points to be clustered CMPSCI 689 Subhransu Maji (UMASS) 3 /48 CMPSCI 689 Subhransu Maji (UMASS) 4 /48

Clustering algorithms Clustering examples Simple clustering: organize Image segmentation: break up the image into similar regions elements into k groups ! ‣ K-means ‣ Mean shift ‣ Spectral clustering ! ! ! Hierarchical clustering: organize elements into a hierarchy ! ‣ Bottom up - agglomerative ‣ Top down - divisive image credit: Berkeley segmentation benchmark CMPSCI 689 Subhransu Maji (UMASS) 5 /48 CMPSCI 689 Subhransu Maji (UMASS) 6 /48 Clustering examples Clustering examples Clustering news articles Clustering queries CMPSCI 689 Subhransu Maji (UMASS) 7 /48 CMPSCI 689 Subhransu Maji (UMASS) 8 /48

Clustering examples Clustering examples Clustering people by space and time Clustering species (phylogeny) phylogeny of canid species (dogs, wolves, foxes, jackals, etc) image credit: Pilho Kim [K. Lindblad-Toh, Nature 2005] CMPSCI 689 Subhransu Maji (UMASS) 9 /48 CMPSCI 689 Subhransu Maji (UMASS) 10 /48 Clustering using k-means Lloyd’s algorithm for k-means Given (x 1 , x 2 , …, x n ) partition the n observations into k ( ≤ n) sets Initialize k centers by picking k points randomly among all the points ! S = {S 1 , S 2 , …, S k } so as to minimize the within-cluster sum of Repeat till convergence (or max iterations) ! squared distances ‣ Assign each point to the nearest center (assignment step) ! k ! The objective is to minimize: X X || x − µ i || 2 arg min ! S k i =1 x ∈ S i ! X X || x − µ i || 2 arg min ‣ Estimate the mean of each group (update step) S i =1 x ∈ S i k X X || x − µ i || 2 arg min cluster center S i =1 x ∈ S i CMPSCI 689 Subhransu Maji (UMASS) 11 /48 CMPSCI 689 Subhransu Maji (UMASS) 12 /48

k-means in action Properties of the Lloyd’s algorithm Guaranteed to converge in a finite number of iterations ! ‣ The objective decreases monotonically over time ‣ Local minima if the partitions don’t change. Since there are finitely many partitions, k-means algorithm must converge ! Running time per iteration ! ‣ Assignment step: O(NKD) ‣ Computing cluster mean: O(ND) ! Issues with the algorithm: ! ‣ Worst case running time is super-polynomial in input size ‣ No guarantees about global optimality ➡ Optimal clustering even for 2 clusters is NP-hard [Aloise et al., 09] k-means++ http://simplystatistics.org/2014/02/18/k-means-clustering-in-a-gif/ CMPSCI 689 Subhransu Maji (UMASS) 13 /48 CMPSCI 689 Subhransu Maji (UMASS) 14 /48 k-means++ algorithm k-means for image segmentation A way to pick the good initial centers ! ‣ Intuition: spread out the k initial cluster centers The algorithm proceeds normally once the centers are initialized ! K=2 k-means++ algorithm for initialization: ! 1.Chose one center uniformly at random among all the points 2.For each point x , compute D( x ), the distance between x and the nearest center that has already been chosen 3.Chose one new data point at random as a new center, using a K=3 Grouping pixels based weighted probability distribution where a point x is chosen with a on intensity similarity probability proportional to D( x ) 2 4.Repeat Steps 2 and 3 until k centers have been chosen ! [Arthur and Vassilvitskii’07] The approximation quality is O(log k) in expectation feature space: intensity value (1D) http://en.wikipedia.org/wiki/K-means%2B%2B CMPSCI 689 Subhransu Maji (UMASS) 15 /48 CMPSCI 689 Subhransu Maji (UMASS) 16 /48

Clustering using density estimation Mean shift algorithm One issue with k-means is that it is sometimes hard to pick k ! Mean shift procedure: ! ‣ For each point, repeat till convergence The mean shift algorithm seeks modes or local maxima of density in the feature space — automatically determines the number of clusters ‣ Compute mean shift vector − || x − x i || 2 ✓ ◆ ‣ Translate the kernel window by m(x) # exp h Simply following the gradient of density 2 ! " # x - x $ n i x g ∑ % ' ( & i ' h ( % & i 1 = ) * Kernel density estimator m x ( ) x = − % & 2 # x - x $ ✓ − || x − x i || 2 ◆ n K ( x ) = 1 % & X g i exp ∑ ' ( % & Z h ' h ( i 1 i = ) * , - Small h implies more modes (bumpy distribution) Slide&by&Y.&Ukrainitz&&&B.&Sarel CMPSCI 689 Subhransu Maji (UMASS) 17 /48 CMPSCI 689 Subhransu Maji (UMASS) 18 /48 Mean shift Mean shift Search   Search   window window Center of Center of mass mass Mean Shift Mean Shift vector vector Slide&by&Y.&Ukrainitz&&&B.&Sarel Slide&by&Y.&Ukrainitz&&&B.&Sarel CMPSCI 689 Subhransu Maji (UMASS) 19 /48 CMPSCI 689 Subhransu Maji (UMASS) 20 /48

Mean shift Mean shift Search   Search   window window Center of Center of mass mass Mean Shift Mean Shift vector vector Slide&by&Y.&Ukrainitz&&&B.&Sarel Slide&by&Y.&Ukrainitz&&&B.&Sarel CMPSCI 689 Subhransu Maji (UMASS) 21 /48 CMPSCI 689 Subhransu Maji (UMASS) 22 /48 Mean shift Mean shift Search   Search   window window Center of Center of mass mass Mean Shift Mean Shift vector vector Slide&by&Y.&Ukrainitz&&&B.&Sarel Slide&by&Y.&Ukrainitz&&&B.&Sarel CMPSCI 689 Subhransu Maji (UMASS) 23 /48 CMPSCI 689 Subhransu Maji (UMASS) 24 /48

Mean shift Mean shift clustering Search   Cluster all data points in the attraction basin of a mode ! window Attraction basin is the region for which all trajectories lead to the Center of same mode — correspond to clusters mass Slide&by&Y.&Ukrainitz&&&B.&Sarel Slide&by&Y.&Ukrainitz&&&B.&Sarel CMPSCI 689 Subhransu Maji (UMASS) 25 /48 CMPSCI 689 Subhransu Maji (UMASS) 26 /48 Mean shift for image segmentation Mean shift clustering results Feature: L*u*v* color values ! Initialize windows at individual feature points ! Perform mean shift for each window until convergence ! Merge windows that end up near the same “peak” or mode http://www.caip.rutgers.edu/~comanici/MSPAMI/msPamiResults.html CMPSCI 689 Subhransu Maji (UMASS) 27 /48 CMPSCI 689 Subhransu Maji (UMASS) 28 /48

Mean shift discussion Spectral clustering Pros: ! ‣ Does not assume shape on clusters ‣ One parameter choice (window size) ‣ Generic technique ‣ Finds multiple modes Cons: ! ‣ Selection of window size ‣ Is rather expensive: O(DN 2 ) per iteration ‣ Does not work well for high-dimensional features [Shi & Malik ‘00; Ng, Jordan, Weiss NIPS ‘01] Kristen Grauman CMPSCI 689 Subhransu Maji (UMASS) 29 /48 CMPSCI 689 Subhransu Maji (UMASS) 30 /48 Spectral clustering Spectral clustering Group points based on the links in a graph ! ! ! ! B A ! How do we create the graph? ! ‣ Weights on the edges based on similarity between the points ‣ A common choice is the Gaussian kernel ✓ − || x i − x j || 2 ◆ ! W ( i, j ) = exp ! 2 σ 2 One could create ! ‣ A fully connected graph ‣ k-nearest graph (each node is connected only to its k-nearest neighbors) [Figures from Ng, Jordan, Weiss NIPS ‘01] slide credit: Alan Fern CMPSCI 689 Subhransu Maji (UMASS) 31 /48 CMPSCI 689 Subhransu Maji (UMASS) 32 /48

Clustering ! Unsupervised learning: learning without a teacher ! - PowerPoint PPT Presentation

So far in the course Supervised learning: learning with a teacher ! You had training data which was (feature, label) pairs and the goal was to learn a mapping from features to labels Clustering ! Unsupervised learning: learning without a

Graph Clustering Graph Clustering What is clustering? What is clustering? Finding patterns

Subspace Clustering Ensemble Clustering Subspace Clustering, Ensemble Clustering, Alternative

Evolutionary Clustering Presenter: Lei Tang Evolutionary Clustering Evolutionary Clustering

Clustering A Categorization of Major Clustering Methods Partitioning Methods

Trust based Clustering for Group Trust based Clustering for Group Trust based Clustering for

Finding Clusters Types of Clustering Approaches: Linkage Based, e.g. Hierarchical Clustering

Clustering Hierarchical clustering and k-mean clustering Genome 373 Genomic Informatics

Cl Clustering t i A Categorization of Major Clustering Methods Partitioning Methods

Clustering Hierarchical clustering, k-mean clustering Genome 559: Introduction to Statistical and

CSCE 478/878 Lecture 8: Stephen Scott Clustering Introduction Outline Clustering Stephen

Clustering and Dimensionality Reduction Preview Clustering K -means clustering

Clustering kMeans, Expectation Maximization, Self-Organizing Maps Outline K-means

Lecture 23: Spectral clustering Hierarchical clustering What is a good clustering?

PAC-Bayesian Analysis of Co-clustering, Graph Clustering and Pairwise Clustering Yevgeny Seldin

Introduction to Machine Learning, Clustering and EM Barnab s P czos Contents Clustering

Graph Clustering Why graph clustering is useful? Distance matrices are graphs as useful as

Learning to map between ferns with differentiable binary embedding networks Maximilian Blendowski

Information & Communication Technology (ICT) Fern Green Primary School believes in leveraging

TLD CS231b 2015 Project 2 Link Iretiayo Akinola Josh Tennefoss Challenges Getting

The Insiders Guide to Vivas Will start shortly at 15:30 Audio Check Im now talking - Can

Counterfactual distributions: estimation and inference in Stata Victor Chernozhukov Ivn

Workshop 2.1: Data frames Murray Logan 15 Jul 2017 Section 1 Data importation and

Sequential pivotal mechanisms for public project problems Krzysztof R. Apt (so not Krzystof and

Anna Medvedovsky ICERM institute postdoc (ph.d. Brandeis University 2015) Research interests

Clustering ! Unsupervised learning: learning without a teacher ! - PowerPoint PPT Presentation

So far in the course Supervised learning: learning with a teacher ! You had training data which was (feature, label) pairs and the goal was to learn a mapping from features to labels Clustering ! Unsupervised learning: learning without a

Graph Clustering Graph Clustering What is clustering? What is clustering? Finding patterns

Subspace Clustering Ensemble Clustering Subspace Clustering, Ensemble Clustering, Alternative

Evolutionary Clustering Presenter: Lei Tang Evolutionary Clustering Evolutionary Clustering

Clustering A Categorization of Major Clustering Methods Partitioning Methods

Trust based Clustering for Group Trust based Clustering for Group Trust based Clustering for

Finding Clusters Types of Clustering Approaches: Linkage Based, e.g. Hierarchical Clustering

Clustering Hierarchical clustering and k-mean clustering Genome 373 Genomic Informatics

Cl Clustering t i A Categorization of Major Clustering Methods Partitioning Methods

Clustering Hierarchical clustering, k-mean clustering Genome 559: Introduction to Statistical and

CSCE 478/878 Lecture 8: Stephen Scott Clustering Introduction Outline Clustering Stephen

Clustering and Dimensionality Reduction Preview Clustering K -means clustering

Clustering kMeans, Expectation Maximization, Self-Organizing Maps Outline K-means

Lecture 23: Spectral clustering Hierarchical clustering What is a good clustering?

PAC-Bayesian Analysis of Co-clustering, Graph Clustering and Pairwise Clustering Yevgeny Seldin

Introduction to Machine Learning, Clustering and EM Barnab s P czos Contents Clustering

Graph Clustering Why graph clustering is useful? Distance matrices are graphs as useful as

Learning to map between ferns with differentiable binary embedding networks Maximilian Blendowski

Information &amp; Communication Technology (ICT) Fern Green Primary School believes in leveraging

TLD CS231b 2015 Project 2 Link Iretiayo Akinola Josh Tennefoss Challenges Getting

The Insiders Guide to Vivas Will start shortly at 15:30 Audio Check Im now talking - Can

Counterfactual distributions: estimation and inference in Stata Victor Chernozhukov Ivn

Workshop 2.1: Data frames Murray Logan 15 Jul 2017 Section 1 Data importation and

Sequential pivotal mechanisms for public project problems Krzysztof R. Apt (so not Krzystof and

Anna Medvedovsky ICERM institute postdoc (ph.d. Brandeis University 2015) Research interests

Information & Communication Technology (ICT) Fern Green Primary School believes in leveraging