

SLIDE 1

Clustering

CSE 6242 / CX 4242 Duen Horng (Polo) Chau
 Georgia Tech

Partly based on materials by 
 Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos, Le Song

SLIDE 2

Clustering in Google Image Search

http://googlesystem.blogspot.com/2011/05/google-image-search-clustering.html
Video: http://youtu.be/WosBs0382SE

How would you build this?

SLIDE 3

Clustering in Google Search


How would you build this?

SLIDE 4

Clustering

The most common type of unsupervised learning.

High-level idea: group similar things together.

“Unsupervised” because the clustering model is learned without any labeled examples (e.g., here are some pictures of dogs; group them by breed).

SLIDE 5

Applications of Clustering

  • Google News
  • IMDb (movie sites)
  • anomaly detection
  • detecting population subgroups (community detection), e.g., in healthcare
  • Twitter hashtags
  • text-based clustering
  • (age detection)

SLIDE 6

Clustering techniques you’ve got to know

  • K-means
  • Hierarchical Clustering
  • (DBSCAN)


SLIDE 7

K-means (the “simplest” technique)

Summary

  • We tell K-means the value of k (the number of clusters we want)
  • Randomly initialize the k cluster “means” (“centroids”)
  • Assign each item to the cluster whose mean it is closest to (so we need a similarity or distance function)
  • Update the “means” of all k clusters
  • If no item’s assignment changes, stop; otherwise, repeat from the assignment step


Demo: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
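The steps above can be sketched in a few lines of NumPy (a minimal illustration, not the demo applet's code; it skips refinements such as handling empty clusters or multiple random restarts):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means: repeat assign-to-nearest-mean, then update the means."""
    rng = np.random.default_rng(seed)
    # Randomly initialize the k "means" (centroids) as k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each item goes to the closest centroid (Euclidean)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned items
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Stop once the means (and hence the assignments) no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated blobs; k=2 should recover them
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
labels, centroids = kmeans(X, k=2)
```

Note the similarity function choice: this sketch hard-codes Euclidean distance, which is what the classic algorithm assumes.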

SLIDE 8

K-means: What’s the catch?

Need to decide k ourselves.

  • How do we find the optimal k?

Only locally optimal (vs. globally optimal)

  • Different initializations give different clusters
  • How to “fix” this?
  • “Bad” starting points can cause the algorithm to converge slowly

Can work for relatively large datasets

  • Time complexity is O(k·n·d) per iteration (n items in d dimensions), i.e., linear in n


http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html
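One common answer to “how do we find the optimal k” is the elbow method (a standard heuristic, not covered on the slide): run K-means for several values of k and look for the point where the within-cluster sum of squared distances (the “inertia”) stops dropping sharply. A rough sketch on a hypothetical toy dataset with three well-separated groups, where the elbow should appear at k = 3:

```python
import numpy as np

def kmeans_inertia(X, k, n_iter=50, seed=0):
    """Run plain K-means; return the within-cluster sum of squared distances."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(axis=1)
        # Keep the old centroid if a cluster ends up empty
        C = np.array([X[labels == j].mean(axis=0) if (labels == j).any() else C[j]
                      for j in range(k)])
    labels = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(axis=1)
    return float(((X - C[labels]) ** 2).sum())

# Three well-separated 2-D blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.2, (30, 2)) for c in (0.0, 4.0, 8.0)])
inertias = {k: kmeans_inertia(X, k) for k in range(1, 7)}
```

Inertia always shrinks as k grows (with k = n it reaches zero), so we look for the bend in the curve rather than the minimum.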

SLIDE 9

Hierarchical clustering

High-level idea: build a tree (hierarchy) of clusters.

Agglomerative (bottom-up)

  • Start with individual items
  • Then iteratively group them into larger clusters

Divisive (top-down)

  • Start with all items as one cluster
  • Then iteratively divide it into smaller clusters

http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletH.html
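The bottom-up (agglomerative) version can be tried directly with SciPy (an illustrative sketch; the slide itself only links to a Java applet): `linkage` builds the merge tree, and `fcluster` cuts it into a flat clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Five 1-D points forming two obvious groups: {0, 1, 2} and {10, 11}
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])

# Agglomerative: start with single items, repeatedly merge the closest clusters.
# Each row of Z records one merge: (cluster_a, cluster_b, distance, new size).
Z = linkage(X, method="single")

# Cut the tree where it has exactly 2 clusters to get flat labels
labels = fcluster(Z, t=2, criterion="maxclust")
```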

SLIDE 10

Ways to calculate distances between two clusters

Single linkage

  • minimum distance between the clusters
  • similarity of two clusters = similarity of the clusters’ most similar members

Complete linkage

  • maximum distance between the clusters
  • similarity of two clusters = similarity of the clusters’ most dissimilar members

Average linkage

  • average of all pairwise distances between the clusters’ members (using the distance between cluster centers instead is known as centroid linkage)
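The three definitions can be checked numerically on a small hypothetical pair of clusters; SciPy’s `cdist` computes all pairwise distances at once:

```python
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[0.0, 0.0], [1.0, 0.0]])  # cluster A
B = np.array([[4.0, 0.0], [9.0, 0.0]])  # cluster B

D = cdist(A, B)  # 2x2 matrix: distance between every pair (a in A, b in B)

single   = D.min()   # single linkage: closest pair, (1,0)-(4,0) -> 3
complete = D.max()   # complete linkage: farthest pair, (0,0)-(9,0) -> 9
average  = D.mean()  # average linkage: mean of all four pairwise distances -> 6
```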


SLIDE 11

Hierarchical clustering for large datasets?

  • OK for small datasets (e.g., <10K items)
  • Time complexity between O(n^2) and O(n^3), where n is the number of data items
  • Not good for millions of items or more
  • But great for understanding the concept of clustering



SLIDE 12

Visualizing Clusters

https://github.com/mbostock/d3/wiki/Hierarchy-Layout
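The linked D3 hierarchy layouts render cluster trees in the browser; as a rough Python-side parallel (an assumption, not from the slides), SciPy can compute the dendrogram structure that such visualizations draw, even without plotting:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Cluster a few 1-D points and extract the dendrogram layout (no plotting)
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
Z = linkage(X, method="average")

d = dendrogram(Z, no_plot=True)  # dict with leaf order ('ivl') and coordinates
leaf_order = d["ivl"]            # leaf labels as strings, grouped by cluster
```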

SLIDE 13

Visualizing Clusters

http://www.cc.gatech.edu/~dchau/papers/11-chi-apolo.pdf

SLIDE 14

Visualizing Clusters
