Clustering. Lecture 8. David Sontag, New York University. Slides adapted from Luke Zettlemoyer, Vibhav Gogate, Carlos Guestrin, Andrew Moore, Dan Klein.
Clustering
Clustering:
– Unsupervised learning
– Requires data, but no labels
– Detect patterns, e.g. in
  - Grouping emails or search results
  - Customer shopping patterns
  - Regions of images
– Useful when you don't know what you're looking for
– But: can get gibberish
Clustering
- Basic idea: group together similar instances
- Example: 2D point patterns
- What could “similar” mean?
– One option: small Euclidean distance (squared): dist(x, y) = ‖x − y‖₂²
– Clustering results are crucially dependent on the measure of similarity (or distance) between the "points" to be clustered
Clustering algorithms
- Hierarchical algorithms
  – Bottom-up: agglomerative
  – Top-down: divisive
- Partition algorithms (flat)
  – K-means
  – Mixture of Gaussians
  – Spectral clustering
Clustering examples
Image segmentation. Goal: break up the image into meaningful or perceptually similar regions.
[Slide from James Hays]
Clustering examples
Clustering gene expression data
Eisen et al., PNAS 1998
Clustering examples
Cluster news articles
Clustering examples
Cluster people by space and time
[Image from Pilho Kim]
Clustering examples
Clustering languages
[Image from scienceinschool.org]
Clustering examples
Clustering languages
[Image from dhushara.com]
Clustering examples
Clustering species ("phylogeny")
[Lindblad-Toh et al., Nature 2005]
Clustering examples
Clustering search queries
K-Means
- An iterative clustering algorithm
  – Initialize: pick K random points as cluster centers
  – Alternate:
    1. Assign data points to the closest cluster center
    2. Change the cluster center to the average of its assigned points
  – Stop when no points' assignments change
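A minimal NumPy sketch of this loop (the function name `kmeans` and its defaults are mine, not from the lecture):

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Lloyd's algorithm. X: (N, D) array. Returns (centers, assignments)."""
    rng = np.random.default_rng(seed)
    # Initialize: pick K distinct random data points as the cluster centers.
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    assign = None
    for _ in range(max_iters):
        # Step 1: assign every point to its closest center (squared Euclidean).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (N, K)
        new_assign = d2.argmin(axis=1)
        # Stop when no point's assignment changes.
        if assign is not None and np.array_equal(new_assign, assign):
            break
        assign = new_assign
        # Step 2: move each center to the average of its assigned points.
        for k in range(K):
            members = X[assign == k]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[k] = members.mean(axis=0)
    return centers, assign
```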
K-means clustering: Example

- Pick K random points as cluster centers (means). Shown here for K = 2.
K-means clustering: Example

Iterative Step 1
- Assign data points to the closest cluster center
K-means clustering: Example

Iterative Step 2
- Change the cluster center to the average of the assigned points
K-means clustering: Example

- Repeat until convergence
Properties of K-means algorithm

- Guaranteed to converge in a finite number of iterations
- Running time per iteration:
  1. Assign data points to the closest cluster center: O(KN) time
  2. Change the cluster center to the average of its assigned points: O(N) time
!"#$%& '(%)#*+#%,#
!"#$%&'($
- . /012(340"05#!"
- 6. /01!#(340"05#
- – 7$8#3$*40$9:#*0)$40)#(; $%:,(5#*(2<#=$)#
!"#$%&'()#*+,
- !"#$-&'()#*+,
!"#$%& 4$8#&$%$94#*%$40%+(340"05$40(%$33*($,=2#$,=&4#30&+>$*$%4##:4( :#,*#$=#(?@#,40)#A 4=>&+>$*$%4##:4(,(%)#*+#
[Slide from Alan Fern] with respect to
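Writing out the derivative step the slide alludes to (a standard calculation; the name F for the objective is an assumption):

```latex
\frac{\partial F}{\partial \mu_j}
  = \frac{\partial}{\partial \mu_j} \sum_{i \in C_j} \lVert x_i - \mu_j \rVert^2
  = -2 \sum_{i \in C_j} (x_i - \mu_j) = 0
  \;\Longrightarrow\;
  \mu_j = \frac{1}{\lvert C_j \rvert} \sum_{i \in C_j} x_i
```

Setting the gradient to zero recovers exactly the "average of its assigned points" update of step 2.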
Example: K-Means for Segmentation
[Figure: K = 2 segmentation alongside the original image]

Goal of segmentation is to partition an image into regions, each of which has a reasonably homogeneous visual appearance.
Example: K-Means for Segmentation
[Figure: K = 2, K = 3, and K = 10 segmentations alongside the original image]
Example: Vector quantization
FIGURE 14.9. Sir Ronald A. Fisher (1890–1962) was one of the founders of modern-day statistics, to whom we owe maximum likelihood, sufficiency, and many other fundamental concepts. The image on the left is a 1024 × 1024 grayscale image at 8 bits per pixel. The center image is the result of 2 × 2 block VQ, using 200 code vectors, with a compression rate of 1.9 bits/pixel. The right image uses only four code vectors, with a compression rate of 0.50 bits/pixel.
[Figure from Hastie et al. book]
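A hedged sketch of the 2 × 2 block VQ described in the caption, reusing the `kmeans` sketch from earlier (the helper name and array shapes are assumptions). With 200 code vectors, each 2 × 2 patch costs log₂ 200 ≈ 7.6 bits for 4 pixels, i.e. roughly the 1.9 bits/pixel quoted above.

```python
import numpy as np

def block_vq(img, block=2, n_codes=200):
    """Compress a grayscale image by K-means clustering of its patches."""
    H, W = (img.shape[0] // block) * block, (img.shape[1] // block) * block
    # Cut the image into non-overlapping (block x block) patches.
    patches = (img[:H, :W]
               .reshape(H // block, block, W // block, block)
               .transpose(0, 2, 1, 3)
               .reshape(-1, block * block)
               .astype(float))
    # The codebook is the set of K-means centers; the code is the assignment.
    codebook, code = kmeans(patches, n_codes)
    # Reconstruct: replace every patch by its nearest code vector.
    recon = (codebook[code]
             .reshape(H // block, W // block, block, block)
             .transpose(0, 2, 1, 3)
             .reshape(H, W))
    return recon
```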
Initialization
- K-means algorithm is a heuristic
  – Requires initial means
  – It does matter what you pick!
  – What can go wrong?
  – Various schemes for preventing this kind of thing: variance-based split / merge, initialization heuristics
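One simple safeguard among the schemes named above, sketched assuming the `kmeans` function from earlier: multiple random restarts, keeping the run with the lowest within-cluster sum of squares.

```python
import numpy as np

def kmeans_restarts(X, K, n_restarts=10):
    """Run K-means from several random initializations and keep the best."""
    best_sse, best = np.inf, None
    for seed in range(n_restarts):
        centers, assign = kmeans(X, K, seed=seed)
        # Within-cluster sum of squares: the K-means objective itself.
        sse = ((X - centers[assign]) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (centers, assign)
    return best
```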
K-Means Getting Stuck
A local optimum:
Would be better to have one cluster here … and two clusters here.
K-means not able to properly cluster
[Figure: data plotted with axes X and Y]
Changing the features (distance function) can help
[Figure: the same data plotted with axes θ and R]
Hierarchical Clustering
Agglomerative Clustering
- Agglomerative clustering:
  – First merge very similar instances
  – Incrementally build larger clusters out of smaller clusters
- Algorithm:
  – Maintain a set of clusters
  – Initially, each instance is in its own cluster
  – Repeat:
    - Pick the two closest clusters
    - Merge them into a new cluster
    - Stop when there's only one cluster left
- Produces not one clustering, but a family of clusterings represented by a dendrogram
Agglomerative Clustering
- How should we define "closest" for clusters with multiple elements?
Agglomerative Clustering
- How should we define "closest" for clusters with multiple elements?
- Many options (see the sketch after this list):
  – Closest pair (single-link clustering)
  – Farthest pair (complete-link clustering)
  – Average of all pairs
- Different choices create different clustering behaviors
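A sketch combining the algorithm from the previous slide with these three linkage options (a naive, roughly O(N³) implementation; all names are illustrative):

```python
import numpy as np

LINKAGES = {
    "single":   min,                            # closest pair
    "complete": max,                            # farthest pair
    "average":  lambda ds: sum(ds) / len(ds),   # average of all pairs
}

def agglomerative(X, linkage="single"):
    """Naive agglomerative clustering; returns the merge history
    (a dendrogram recorded as a list of (cluster_a, cluster_b) merges)."""
    link = LINKAGES[linkage]
    # Initially, each instance is in its own cluster.
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        # Pick the two closest clusters under the chosen linkage.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ds = [np.linalg.norm(X[i] - X[j])
                      for i in clusters[a] for j in clusters[b]]
                d = link(ds)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merges.append((clusters[a], clusters[b]))
        # Merge them into a new cluster.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges
```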
Agglomerative Clustering
- How should we define "closest" for clusters with multiple elements?

[Figure: dendrograms over points 1–8 under farthest pair (complete-link) and closest pair (single-link) clustering]
[Pictures from Thorsten Joachims]
Clustering Behavior

[Figure: average-link, farthest-pair, and nearest-pair clusterings of mouse tumor data from Hastie et al.]
Agglomerative Clustering

When can this be expected to work?

- Strong separation property: all points are more similar to points in their own cluster than to any points in any other cluster
- Then the true clustering corresponds to some pruning of the tree obtained by single-link clustering!
- Slightly weaker (stability) conditions are handled by average-link clustering (Balcan et al., 2008)
Spectral Clustering
Slides adapted from James Hays, Alan Fern, and Tommi Jaakkola
Spectral clustering
[Shi & Malik ‘00; Ng, Jordan, Weiss NIPS ‘01]
[Figure: the "two circles" dataset, 2 clusters: K-means vs. spectral clustering]
Spectral clustering
[Figures from Ng, Jordan, Weiss NIPS ‘01]
[Figure: spectral clustering results on toy datasets: nips (8 clusters), line and balls (3 clusters), four clouds (2 clusters), squiggles (4 clusters), two circles (2 clusters), three circles joined (2 and 3 clusters)]
Spectral clustering

Group points based on links in a graph
[Figure: a graph linking two groups of points, A and B] [Slide from James Hays]
!"#$"%&'($'$)'*&(+),
- -$./0"11"2$"3/'(*(3//.(24'&2'5$"
0"1+3$'/.1.5(&.$67'$#''2"78'0$/
- 92'0"35:0&'($'
– ;<35560"22'0$':=&(+) – 42'(&'/$2'.=)7"&=&(+)>'(0)2":'./"256 0"22'0$':$".$/42'(&'/$2'.=)7"&/?
A B [Slide from Alan Fern]
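A sketch of both graph constructions under a Gaussian kernel (the bandwidth `sigma` and the symmetrization rule are assumptions):

```python
import numpy as np

def affinity_graph(X, sigma=1.0, k=None):
    """Gaussian-kernel similarity W(i,j) = exp(-||xi - xj||^2 / (2 sigma^2)).
    Fully connected by default; if k is given, keep only each node's
    k nearest neighbors (an edge survives if either endpoint keeps it)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    if k is not None:
        # Zero out everything except each row's k largest similarities.
        keep = np.argsort(W, axis=1)[:, -k:]
        mask = np.zeros_like(W, dtype=bool)
        mask[np.arange(len(W))[:, None], keep] = True
        W = np.where(mask | mask.T, W, 0.0)  # symmetrize
    return W
```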
Spectral clustering for segmentation
[Slide from James Hays]
Can we use minimum cut for clustering?
[Shi & Malik ‘00]
Graph partitioning
Graph terminologies

- Degree of a node: d_i = Σ_j W(i, j)
- Volume of a set: Vol(A) = Σ_{i ∈ A} d_i
Graph Cut

- Consider a partition of the graph into two parts, A and B
- Cut(A, B): the sum of the weights of the edges that connect the two groups
- An intuitive goal is to find the partition that minimizes the cut
Normalized Cut

- Consider the connectivity between the groups relative to the volume of each group:

  Ncut(A, B) = cut(A, B)/Vol(A) + cut(A, B)/Vol(B) = cut(A, B) · (Vol(A) + Vol(B)) / (Vol(A) · Vol(B))

- Ncut is minimized when Vol(A) and Vol(B) are equal, thus encouraging a balanced cut
Solving NCut

- How to minimize Ncut?
- Let W be the similarity matrix, with entries W(i, j); let D be the diagonal degree matrix, D(i, i) = Σ_j W(i, j); let x be a vector in {1, −1}^N with x(i) = 1 iff i ∈ A
- With some simplifications, we can show:

  min_x Ncut(x) = min_y ( yᵀ (D − W) y ) / ( yᵀ D y ), subject to yᵀ D 1 = 0

- This is a Rayleigh quotient
- NP-Hard! (y takes discrete values)
Solving NCut

- Relax the optimization problem into the continuous domain by solving the generalized eigenvalue system

  min_y yᵀ (D − W) y, subject to yᵀ D y = 1

- Which gives: (D − W) y = λ D y
- Note that (D − W) 1 = 0, so the first eigenvector is y₁ = 1, with eigenvalue 0
- The second smallest eigenvector is the real-valued solution to this problem!
2-way Normalized Cuts

- 1. Compute the affinity matrix W and the degree matrix D; D is diagonal, with D(i, i) = Σ_j W(i, j)
- 2. Solve (D − W) y = λ D y, where (D − W) is called the Laplacian matrix
- 3. Use the eigenvector with the second smallest eigenvalue to bipartition the graph into two parts
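A compact sketch of these three steps; SciPy's `scipy.linalg.eigh` handles the generalized symmetric eigenproblem, and thresholding at zero is option (a) from the next slide:

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(W):
    """2-way normalized cut on an affinity matrix W."""
    D = np.diag(W.sum(axis=1))   # degree matrix
    L = D - W                    # Laplacian matrix
    # Solve (D - W) y = lambda D y; eigh returns eigenvalues in ascending order.
    vals, vecs = eigh(L, D)
    y = vecs[:, 1]               # eigenvector with the 2nd smallest eigenvalue
    return y, y > 0              # the boolean mask defines the parts A and B
```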
Creating a Bipartition Using the 2nd Eigenvector

- Sometimes there is not a clear threshold to split on, since the second eigenvector takes continuous values
- How to choose the splitting point?
  a) Pick a constant value (0, or 0.5)
  b) Pick the median value as the splitting point
  c) Look for the splitting point that has the minimum Ncut value:
     1. Choose n possible splitting points
     2. Compute the Ncut value for each
     3. Pick the minimum
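Option (c) made concrete, assuming the affinity matrix W and the second eigenvector y from the sketch above:

```python
import numpy as np

def ncut_value(W, mask):
    """Ncut(A, B) = cut(A, B)/Vol(A) + cut(A, B)/Vol(B)."""
    d = W.sum(axis=1)
    cut = W[mask][:, ~mask].sum()
    return cut / d[mask].sum() + cut / d[~mask].sum()

def best_split(W, y, n_points=32):
    """Scan candidate thresholds over y and keep the split of minimum Ncut."""
    best_val, best_mask = np.inf, None
    for t in np.linspace(y.min(), y.max(), n_points):
        mask = y > t
        if mask.all() or not mask.any():   # skip degenerate splits
            continue
        val = ncut_value(W, mask)
        if val < best_val:
            best_val, best_mask = val, mask
    return best_mask
```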
Spectral clustering: example
[Slide from Tommi Jaakkola, MIT CSAIL]
Spectral clustering: example cont’d
Components of the eigenvector corresponding to the second largest eigenvalue
[Slide from Tommi Jaakkola, MIT CSAIL]
K-way Partition?

- Recursive bipartitioning (Hagen et al., '91)
  – Recursively apply the bipartitioning algorithm in a hierarchical, divisive manner
  – Disadvantages: inefficient, unstable
- Cluster multiple eigenvectors (see the sketch below)
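A sketch of the multiple-eigenvector route: embed each node with the first k generalized eigenvectors, then run K-means on the embeddings, in the spirit of Ng, Jordan & Weiss (the row normalization and the reuse of the earlier `kmeans` sketch are my choices):

```python
import numpy as np
from scipy.linalg import eigh

def spectral_kway(W, k):
    """K-way spectral clustering: embed nodes with the first k eigenvectors
    of (D - W) y = lambda D y, then cluster the embeddings with K-means."""
    D = np.diag(W.sum(axis=1))
    vals, vecs = eigh(D - W, D)    # eigenvalues in ascending order
    U = vecs[:, :k]                # one k-dimensional embedding per node
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize (assumes no zero rows)
    _, labels = kmeans(U, k)       # the kmeans sketch from earlier in these notes
    return labels
```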