Partitional Clustering


SLIDE 1

Partitional Clustering

  • Based on: David Arthur, Sergei Vassilvitskii. k-means++: The Advantages of Careful Seeding. In SODA 2007
  • Thanks to A. Gionis and S. Vassilvitskii for the slides
SLIDE 2

What is clustering?

  • a grouping of data objects such that the objects within a group are similar (or near) to one another and dissimilar (or far) from the objects in other groups

SLIDE 3

How to capture this objective?

  • a grouping of data objects such that the objects within a group are similar (or near) to one another and dissimilar (or far) from the objects in other groups
  • minimize intra-cluster distances
  • maximize inter-cluster distances

SLIDE 4

The clustering problem

  • Given a collection of data objects
  • Find a grouping so that
  • similar objects are in the same cluster
  • dissimilar objects are in different clusters

✦ Why do we care?
  ✦ stand-alone tool to gain insight into the data
  ✦ visualization
  ✦ preprocessing step for other algorithms
  ✦ indexing or compression often relies on clustering

SLIDE 5

Applications of clustering

  • image processing
  • cluster images based on their visual content
  • web mining
  • cluster groups of users based on their access patterns on webpages
  • cluster webpages based on their content
  • bioinformatics
  • cluster similar proteins together (similarity w.r.t. chemical structure and/or functionality, etc.)

  • many more...
SLIDE 6

The clustering problem

  • Given a collection of data objects
  • Find a grouping so that
  • similar objects are in the same cluster
  • dissimilar objects are in different clusters

✦ Basic questions:
  ✦ what does similar mean?
  ✦ what is a good partition of the objects? i.e., how is the quality of a solution measured?
  ✦ how to find a good partition?

SLIDE 7

Notion of a cluster can be ambiguous

How many clusters? [Figure: the same points shown as two, four, or six clusters]

SLIDE 8

Types of clusterings

  • Partitional
  • each object belongs in exactly one cluster
  • Hierarchical
  • a set of nested clusters organized in a tree
SLIDE 9

Hierarchical clustering

[Figure: points p1–p4 under a traditional and a non-traditional hierarchical clustering, with the corresponding traditional and non-traditional dendrograms]

SLIDE 10

Partitional clustering

[Figure: original points and a partitional clustering of them]

SLIDE 11

Partitional algorithms

  • partition the n objects into k clusters
  • each object belongs to exactly one cluster
  • the number of clusters k is given in advance
SLIDE 12

The k-means problem

  • consider a set X = {x1,...,xn} of n points in R^d
  • assume that the number k is given
  • problem:
  • find k points c1,...,ck (named centers or means) so that the cost

    $$\sum_{i=1}^{n} \min_{j} L_2^2(x_i, c_j) = \sum_{i=1}^{n} \min_{j} \lVert x_i - c_j \rVert_2^2$$

    is minimized

SLIDE 13

The k-means problem

  • consider a set X = {x1,...,xn} of n points in R^d
  • assume that the number k is given
  • problem:
  • find k points c1,...,ck (named centers or means)
  • and partition X into {X1,...,Xk} by assigning each point xi in X to its nearest cluster center,
  • so that the cost

    $$\sum_{i=1}^{n} \min_{j} \lVert x_i - c_j \rVert_2^2 = \sum_{j=1}^{k} \sum_{x \in X_j} \lVert x - c_j \rVert_2^2$$

    is minimized
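
To make the two equivalent forms of the objective concrete, here is a minimal NumPy sketch (the name kmeans_cost is ours, not from the slides):

```python
import numpy as np

def kmeans_cost(X, centers):
    """k-means objective: each point contributes its squared L2 distance
    to the nearest center; summing per cluster gives the same total."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    return d2.min(axis=1).sum()
```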

SLIDE 14

The k-means problem

  • k=1 and k=n are easy special cases (why?)
  • an NP-hard problem if the dimension of the data is at least 2 (d≥2)
  • for d≥2, finding the optimal solution in polynomial time is infeasible (unless P = NP)
  • for d=1 the problem is solvable in polynomial time
  • in practice, a simple iterative algorithm works quite well
SLIDE 15

The k-means algorithm

  • voted among the top-10 algorithms in data mining
  • one way of solving the k-means problem

SLIDE 16

The k-means algorithm

1. randomly (or with another method) pick k cluster centers {c1,...,ck}
2. for each j, set the cluster Xj to be the set of points in X that are closest to center cj
3. for each j, let cj be the center of cluster Xj (mean of the vectors in Xj)
4. repeat (go to step 2) until convergence
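
A minimal NumPy sketch of these four steps (this is Lloyd's iteration; the function and parameter names are ours):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # step 1: pick k initial centers at random from the data
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # step 2: assign each point to its nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # step 3: move each center to the mean of its cluster (keep empty clusters fixed)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        # step 4: repeat until the centers stop moving
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```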

SLIDE 17

Sample execution

SLIDE 18

Properties of the k-means algorithm

  • finds a local optimum
  • often converges quickly, but not always
  • the choice of initial points can have a large influence on the result

SLIDE 19

Effects of bad initialization

SLIDE 20

Limitations of k-means: different sizes

[Figure: original points vs. k-means (3 clusters)]

SLIDE 21

Limitations of k-means: different density

[Figure: original points vs. k-means (3 clusters)]

SLIDE 22

Limitations of k-means: non-spherical shapes

[Figure: original points vs. k-means (2 clusters)]

SLIDE 23

Discussion on the k-means algorithm

  • finds a local optimum
  • often converges quickly, but not always
  • the choice of initial points can have a large influence on the result
  • tends to find spherical clusters
  • outliers can cause a problem
  • different densities may cause a problem
SLIDE 24

Initialization

  • random initialization
  • random, but repeat many times and take the best solution
  • helps, but the solution can still be bad
  • pick points that are distant from each other
  • k-means++
  • provable guarantees
SLIDE 25

k-means++

David Arthur and Sergei Vassilvitskii. k-means++: The Advantages of Careful Seeding. SODA 2007

SLIDE 26

k-means algorithm: random initialization

SLIDE 27

k-means algorithm: random initialization

SLIDE 28

k-means algorithm: initialization with furthest-first traversal

[Figure: four centers picked in order 1–4 by furthest-first traversal]

SLIDE 29

k-means algorithm: initialization with furthest-first traversal

SLIDE 30

but... sensitive to outliers

[Figure: furthest-first traversal picks outliers as centers 1–3]

SLIDE 31

but... sensitive to outliers

SLIDE 32

Here random may work well

SLIDE 33

k-means++ algorithm

  • interpolate between the two methods
  • let D(x) be the distance between x and the nearest center selected so far
  • choose the next center with probability proportional to (D(x))^a = D^a(x), as in the sketch below

✦ a = 0: random initialization
✦ a = ∞: furthest-first traversal
✦ a = 2: k-means++
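
A sketch of this interpolation for finite a (seed_centers is a hypothetical name, not from the paper):

```python
import numpy as np

def seed_centers(X, k, a, seed=0):
    """Pick k centers, each new one drawn with probability proportional to D(x)^a.
    a = 0 gives random initialization; a = 2 gives k-means++; the a -> infinity
    limit (furthest-first traversal) would take an argmax instead of sampling."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]                 # first center: uniform at random
    for _ in range(k - 1):
        D = np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :],
                           axis=2).min(axis=1)          # D(x): distance to nearest chosen center
        w = D ** a
        centers.append(X[rng.choice(len(X), p=w / w.sum())])
    return np.array(centers)
```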

SLIDE 34

k-means++ algorithm

  • initialization phase:
  • choose the first center uniformly at random
  • choose each next center with probability proportional to D^2(x)
  • iteration phase:
  • iterate as in the k-means algorithm until convergence (a seeding sketch follows)
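
A minimal self-contained sketch of the seeding phase (assuming a Lloyd-style loop like the earlier kmeans sketch for the iteration phase):

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    """k-means++ seeding: first center uniform, each next one proportional to D^2(x)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        D2 = (((X[:, None, :] - np.array(centers)[None, :, :]) ** 2)
              .sum(axis=2).min(axis=1))                 # squared distance to nearest center
        centers.append(X[rng.choice(len(X), p=D2 / D2.sum())])
    return np.array(centers)

# iteration phase: run the usual k-means loop starting from kmeans_pp_init(X, k)
```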
SLIDE 35

k-means++ initialization

[Figure: three centers chosen in order by k-means++ seeding]

SLIDE 36

k-means++ result

SLIDE 37

k-means++ provable guarantee

Theorem: k-means++ is O(log k)-approximate in expectation

SLIDE 38

k-means++ provable guarantee

  • the approximation guarantee comes just from the first iteration (initialization)
  • subsequent iterations can only improve the cost

SLIDE 39

k-means++ analysis

  • consider the optimal clustering C*
  • assume that k-means++ selects a center from a new optimal cluster
  • then k-means++ is 8-approximate in expectation
  • intuition: if no points from a cluster are picked, then it probably does not contribute much to the overall error
  • an inductive proof shows that the algorithm is O(log k)-approximate

SLIDE 40

k-means++ proof: first cluster

  • fix an optimal clustering C*
  • the first center is selected uniformly at random
  • bound the total error of the points in the optimal cluster of the first center
SLIDE 41

k-means++ proof: first cluster

  • let A be the first cluster
  • each point a0 ∈ A is equally likely to be selected as the center

✦ expected error:

$$E[\phi(A)] = \sum_{a_0 \in A} \frac{1}{|A|} \sum_{a \in A} \lVert a - a_0 \rVert^2 = 2 \sum_{a \in A} \lVert a - \bar{A} \rVert^2 = 2\phi^*(A)$$
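
The second equality uses the standard identity relating pairwise squared distances to distances from the mean Ā; a short derivation (our addition, not on the slide):

```latex
\begin{align*}
\sum_{a_0 \in A} \sum_{a \in A} \lVert a - a_0 \rVert^2
  &= \sum_{a_0 \in A} \sum_{a \in A}
     \lVert (a - \bar{A}) - (a_0 - \bar{A}) \rVert^2 \\
  &= \sum_{a_0 \in A} \sum_{a \in A}
     \left( \lVert a - \bar{A} \rVert^2
       - 2\,(a - \bar{A})^{\top}(a_0 - \bar{A})
       + \lVert a_0 - \bar{A} \rVert^2 \right) \\
  &= 2\,\lvert A \rvert \sum_{a \in A} \lVert a - \bar{A} \rVert^2
\end{align*}
% the cross term vanishes because \sum_{a_0 \in A} (a_0 - \bar{A}) = 0
```

Dividing by |A| turns the left-hand side into E[φ(A)] and yields exactly 2φ*(A).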

SLIDE 42

k-means++ proof: other clusters

  • suppose the next center is selected from a new cluster in the optimal clustering C*
  • bound the total error of that cluster
SLIDE 43

k-means++ proof: other clusters

  • let B be the second cluster and b0 the center selected

$$E[\phi(B)] = \sum_{b_0 \in B} \frac{D^2(b_0)}{\sum_{b \in B} D^2(b)} \sum_{b \in B} \min\{D^2(b), \lVert b - b_0 \rVert^2\}$$

✦ triangle inequality: D(b0) ≤ D(b) + ||b − b0||, hence

$$D^2(b_0) \le 2D^2(b) + 2\lVert b - b_0 \rVert^2$$

SLIDE 44

k-means++ proof: other clusters

  • average over all points b in B

✦ recall: averaging the triangle-inequality bound over b ∈ B gives

$$D^2(b_0) \le \frac{2}{|B|} \sum_{b \in B} D^2(b) + \frac{2}{|B|} \sum_{b \in B} \lVert b - b_0 \rVert^2$$

✦ substituting into E[φ(B)] and bounding the min term by ||b − b0||² against the first summand and by D²(b) against the second:

$$E[\phi(B)] \le \frac{4}{|B|} \sum_{b_0 \in B} \sum_{b \in B} \lVert b - b_0 \rVert^2 = 8 \sum_{b \in B} \lVert b - \bar{B} \rVert^2 = 8\phi^*(B)$$

✦ the last step is the same pairwise-distance identity used for the first cluster

SLIDE 45

k-means++ analysis

  • if k-means++ selects a center from a new optimal cluster
  • then k-means++ is 8-approximate in expectation
  • an inductive proof shows that the algorithm is O(log k)-approximate

SLIDE 46

Lesson learned

  • no reason to use k-means and not k-means++
  • k-means++:
  • easy to implement
  • provable guarantee
  • works well in practice
SLIDE 47

The k-median problem

  • consider a set X = {x1,...,xn} of n points in R^d
  • assume that the number k is given
  • problem:
  • find k points c1,...,ck (named medians)
  • and partition X into {X1,...,Xk} by assigning each point xi in X to its nearest cluster median,
  • so that the cost

    $$\sum_{i=1}^{n} \min_{j} \lVert x_i - c_j \rVert_2 = \sum_{j=1}^{k} \sum_{x \in X_j} \lVert x - c_j \rVert_2$$

    is minimized
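
The only change from the k-means objective is the unsquared distance; a minimal sketch (kmedian_cost is our name):

```python
import numpy as np

def kmedian_cost(X, centers):
    """k-median objective: sum of (unsquared) L2 distances to the nearest center."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # shape (n, k)
    return d.min(axis=1).sum()   # k-means would sum d.min(axis=1) ** 2 instead
```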

SLIDE 48

the k-medoids algorithm

  • PAM (Partitioning Around Medoids)

1. randomly (or with another method) choose k medoids {c1,...,ck} from the original dataset X
2. assign the remaining n-k points in X to their closest medoid cj
3. for each cluster, replace the medoid by a point in the cluster that improves the cost
4. repeat (go to step 2) until convergence
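
A minimal sketch of these steps (names are ours; step 3 here replaces each medoid by the cluster point that minimizes the within-cluster cost, one simple variant of the PAM swap):

```python
import numpy as np

def pam(X, k, seed=0):
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # all pairwise distances
    medoids = rng.choice(len(X), size=k, replace=False)         # step 1: random medoids
    while True:
        labels = d[:, medoids].argmin(axis=1)                   # step 2: assign to closest medoid
        improved = False
        for j in range(k):                                      # step 3: improving replacements
            cluster = np.where(labels == j)[0]
            costs = d[np.ix_(cluster, cluster)].sum(axis=0)     # cost of each candidate medoid
            best = costs.argmin()
            if costs[best] < d[cluster, medoids[j]].sum():      # swap only on strict improvement
                medoids[j] = cluster[best]
                improved = True
        if not improved:                                        # step 4: converged
            return X[medoids], labels
```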

SLIDE 49

Discussion on the k-medoids algorithm

  • very similar to the k-means algorithm
  • same advantages and disadvantages
  • how about efficiency?
SLIDE 50

The k-center problem

  • consider a set X = {x1,...,xn} of n points in R^d
  • assume that the number k is given
  • problem:
  • find k points c1,...,ck (named centers)
  • and partition X into {X1,...,Xk} by assigning each point xi in X to its nearest cluster center,
  • so that the cost

    $$\max_{i=1}^{n} \min_{j=1}^{k} \lVert x_i - c_j \rVert_2$$

    is minimized
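
The max-min structure, as a sketch (kcenter_cost is our name):

```python
import numpy as np

def kcenter_cost(X, centers):
    """k-center objective: the largest distance from any point to its nearest center."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.min(axis=1).max()   # min over centers, then max over points
```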

SLIDE 51

Properties of the k-center problem

  • NP-hard for dimension d≥2
  • for d=1 the problem is solvable in polynomial time (how?)
  • a simple combinatorial algorithm works well
SLIDE 52

The k-center problem

  • consider a set X = {x1,...,xn} of n points in R^d
  • assume that the number k is given
  • problem:
  • find k points c1,...,ck (named centers)
  • and partition X into {X1,...,Xk} by assigning each point xi in X to its nearest cluster center,
  • so that the cost

    $$\max_{i=1}^{n} \min_{j=1}^{k} \lVert x_i - c_j \rVert_2$$

    is minimized

SLIDE 53

Furthest-first traversal algorithm

  • pick any data point and label it 1
  • for i=2,...,k
  • find the unlabeled point that is furthest from {1,2,...,i-1}
  • // use d(x,S) = min y∈S d(x,y)
  • label that point i
  • assign the remaining unlabeled data points to the closest labeled data point (a Python sketch follows)
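
A minimal sketch (furthest_first is our name); maintaining d(x,S) incrementally makes each new pick linear in n:

```python
import numpy as np

def furthest_first(X, k):
    centers = [0]                                    # pick any point (here the first), label it 1
    D = np.linalg.norm(X - X[0], axis=1)             # d(x, S) for the current center set S
    for _ in range(1, k):
        nxt = int(D.argmax())                        # unlabeled point furthest from S
        centers.append(nxt)
        D = np.minimum(D, np.linalg.norm(X - X[nxt], axis=1))   # update d(x, S)
    d = np.linalg.norm(X[:, None, :] - X[centers][None, :, :], axis=2)
    return X[centers], d.argmin(axis=1)              # centers and assignment of all points
```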

SLIDE 54

Furthest-first traversal algorithm: example

[Figure: example run of furthest-first traversal with four labeled centers]

SLIDE 55

Furthest-first traversal algorithm

  • the furthest-first traversal algorithm gives a factor-2 approximation

SLIDE 56

Furthest-first traversal algorithm

  • pick any data point and label it 1
  • for i=2,...,k
  • find the unlabeled point that is furthest from {1,2,...,i-1}
  • // use d(x,S) = min y∈S d(x,y)
  • label that point i
  • p(i) = argmin j<i d(i,j)   // the already-labeled point closest to i
  • Ri = d(i,p(i))
  • assign the remaining unlabeled data points to the closest labeled data point

SLIDE 57

Analysis

  • Claim 1: R1 ≥ R2 ≥ ... ≥ Rk
  • proof: for j > i,
    Rj = d(j,p(j)) = d(j,{1,2,...,j-1})
       ≤ d(j,{1,2,...,i-1})   // minimizing over a smaller set can only grow
       ≤ d(i,{1,2,...,i-1})   // i was the furthest unlabeled point at step i
       = Ri

SLIDE 58

Analysis

  • Claim 2:
  • let C be the clustering produced by the FFT algorithm
  • let R(C) be the cost of that clustering
  • then R(C) = Rk+1
  • proof: for any i > k we have
    d(i,{1,2,...,k}) ≤ d(k+1,{1,2,...,k}) = Rk+1
    with equality for i = k+1

SLIDE 59

Analysis

  • Theorem:
  • let C be the clustering produced by the FFT algorithm
  • let C* be the optimal clustering
  • then R(C) ≤ 2R(C*)
  • proof:
  • let C*1,…,C*k be the clusters of the optimal k-clustering
  • if each of these clusters contains exactly one of the points {1,…,k}, then R(C) ≤ 2R(C*) ✪ (see the next slide)
  • otherwise, suppose that one of these clusters contains two or more of the points in {1,…,k}
  • these points are at distance at least Rk from each other
  • so this (optimal) cluster must have radius at least ½ Rk ≥ ½ Rk+1 = ½ R(C), i.e. R(C*) ≥ ½ R(C)

SLIDE 60

✪ R(C) ≤ 2R(C*)

  • in this case every optimal cluster contains a labeled point
  • take any point; let x be its distance to the labeled point in its optimal cluster and z its distance to the optimal center; then

    R(C) ≤ x ≤ z + R(C*) ≤ 2R(C*)

[Figure: an optimal cluster of radius R(C*), a point at distance z from the optimal center, and the labeled point in the cluster]