
Lecture 26: Clustering - Prof. Julia Hockenmaier (PowerPoint PPT Presentation)



  1. CS446 Introduction to Machine Learning (Spring 2015), University of Illinois at Urbana-Champaign, http://courses.engr.illinois.edu/cs446. Lecture 26: Clustering. Prof. Julia Hockenmaier, juliahmr@illinois.edu

  2. Clustering. What should a clustering algorithm achieve? – A cluster is a set of entities that are alike. – Entities in different clusters are not alike. What does "alike" mean? That depends on the application/task.

  3. Clustering. Can we formalize this? A cluster is a set of points such that the distance between any two points in the same cluster is less than the distance between any point in the cluster and any point not in it. The distance metric has to be appropriate for the task at hand.

  4. Distance Measures for Clustering

  5. Distance measures. In studying clustering techniques we will assume that we are given a matrix of distances between all pairs of data points x_1, …, x_m: an m × m matrix whose (i, j) entry is d(x_i, x_j).

  6. Distance measures. A distance measure d: R^d × R^d → R is a function that satisfies d(x, y) ≥ 0, with d(x, y) = 0 ⇔ x = y. d is a metric if it also satisfies: – the triangle inequality: d(x, y) + d(y, z) ≥ d(x, z), and – symmetry: d(x, y) = d(y, x).
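These axioms can be spot-checked numerically. Below is a minimal sketch (the helper name `looks_like_metric` and the sample points are my own, not from the lecture) that tests non-negativity, identity of indiscernibles, symmetry, and the triangle inequality on a finite sample; squared Euclidean distance is included as an example of a distance measure that fails to be a metric.

```python
import numpy as np

def looks_like_metric(d, points, tol=1e-9):
    """Spot-check non-negativity, identity, symmetry and the triangle
    inequality for a distance function d on a finite sample of points."""
    for x in points:
        if abs(d(x, x)) > tol:                          # d(x, x) must be 0
            return False
        for y in points:
            if d(x, y) < -tol:                          # non-negativity
                return False
            if abs(d(x, y) - d(y, x)) > tol:            # symmetry
                return False
            for z in points:
                if d(x, y) + d(y, z) < d(x, z) - tol:   # triangle inequality
                    return False
    return True

rng = np.random.default_rng(0)
sample = list(rng.normal(size=(15, 3)))
print(looks_like_metric(lambda a, b: float(np.linalg.norm(a - b)), sample))  # True: L2 is a metric

line = [np.array([0.0]), np.array([1.0]), np.array([2.0])]
print(looks_like_metric(lambda a, b: float(np.sum((a - b) ** 2)), line))     # False: squared L2
                                                                             # breaks the triangle inequality
```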

  7. Distance measures. Euclidean (L2) distance: d(x, y) = ( Σ_{i=1..d} (x_i − y_i)² )^{1/2}. Manhattan (L1) distance: d(x, y) = ||x − y||_1 = Σ_{i=1..d} |x_i − y_i|. Infinity (sup) distance (L∞): d(x, y) = max_{1 ≤ i ≤ d} |x_i − y_i|. Note that L∞ ≤ L2 ≤ L1, but different distances do not induce the same ordering on points.

  8. Distance measures. Example: x = (x1, x2), y = (x1 − 2, x2 + 4). Euclidean: (4² + 2²)^{1/2} ≈ 4.47. Manhattan: 4 + 2 = 6. Sup: max(4, 2) = 4.
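The three distances are one-liners in code; the sketch below (function names are mine) reproduces the numbers of this example for the displacement (−2, +4), taking x = (0, 0) for concreteness.

```python
import numpy as np

def euclidean(x, y):        # L2 distance
    return float(np.sqrt(np.sum((x - y) ** 2)))

def manhattan(x, y):        # L1 distance
    return float(np.sum(np.abs(x - y)))

def sup_dist(x, y):         # L-infinity (sup) distance
    return float(np.max(np.abs(x - y)))

x = np.array([0.0, 0.0])
y = np.array([-2.0, 4.0])   # y = (x1 - 2, x2 + 4) with x = (0, 0)

print(round(euclidean(x, y), 2))   # 4.47
print(manhattan(x, y))             # 6.0
print(sup_dist(x, y))              # 4.0
```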

  9. Distance Measures. Different distances do not induce the same ordering on points. Suppose a and b differ by (5, ε) while c and d differ by (4, 4): L∞(a, b) = 5 and L2(a, b) = (5² + ε²)^{1/2} ≈ 5, while L∞(c, d) = 4 and L2(c, d) = (4² + 4²)^{1/2} = 4√2 ≈ 5.66. So L∞(c, d) < L∞(a, b), but L2(c, d) > L2(a, b).
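The rank reversal above can be verified directly; the concrete coordinates below are illustrative choices of mine that realize the differences (5, ε) and (4, 4).

```python
import numpy as np

eps = 0.1
a, b = np.array([0.0, 0.0]), np.array([5.0, eps])   # a and b differ by (5, eps)
c, d = np.array([0.0, 0.0]), np.array([4.0, 4.0])   # c and d differ by (4, 4)

l_inf = lambda u, v: float(np.max(np.abs(u - v)))
l_2 = lambda u, v: float(np.linalg.norm(u - v))

print(l_inf(c, d) < l_inf(a, b))   # True: (c, d) is the closer pair under L-infinity (4 < 5)
print(l_2(c, d) > l_2(a, b))       # True: (c, d) is the farther pair under L2 (5.66 > ~5)
```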

  10. Distance measures. Clustering is sensitive to the distance measure. Sometimes it is beneficial to use a distance measure that is invariant to transformations that are natural to the problem: – Mahalanobis distance: shift and scale invariance

  11. Mahalanobis Distance: d(x, y) = ( (x − y)^T Σ^{-1} (x − y) )^{1/2}, where Σ is the (symmetric) covariance matrix of the data: with mean μ = (1/m) sum_{i=1..m} x_i, Σ = (1/m) sum_{i=1..m} (x_i − μ)(x_i − μ)^T, a d × d matrix. This effectively shifts and rescales all axes to mean 0 and variance 1 (shift and scale invariance).
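A minimal sketch of the Mahalanobis distance, assuming the covariance matrix is estimated from the data itself and is invertible (the function name and the example data are mine); the comparison against Euclidean distance illustrates the scale invariance.

```python
import numpy as np

def mahalanobis(x, y, data):
    """Mahalanobis distance between x and y, with the covariance estimated
    from `data` (one row per point).  Assumes the covariance is invertible."""
    mu = data.mean(axis=0)
    centered = data - mu
    cov = centered.T @ centered / len(data)      # d x d covariance matrix
    diff = x - y
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) * np.array([1.0, 100.0, 0.01])  # wildly different axis scales
print(np.linalg.norm(X[0] - X[1]))   # Euclidean distance, dominated by the large-scale axis
print(mahalanobis(X[0], X[1], X))    # Mahalanobis distance, insensitive to the rescaling
```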

  12. Distance measures. Some algorithms require distances between a point x and a set of points A, d(x, A). This might be defined, e.g., as the min/max/average distance between x and any point in A. Others require distances between two sets of points A and B, d(A, B). This might be defined, e.g., as the min/max/average distance between any point in A and any point in B.
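A sketch of these point-to-set and set-to-set variants, using plain Euclidean distance and min/max/average reductions (the helper names are mine, not from the lecture).

```python
import numpy as np

def point_to_set(x, A, reduce=min):
    """Distance from a point x to a set of points A (rows of an array),
    reduced with min, max, or an averaging function."""
    return reduce([float(np.linalg.norm(x - a)) for a in A])

def set_to_set(A, B, reduce=min):
    """Distance between two sets of points, reduced over all cross pairs."""
    return reduce([float(np.linalg.norm(a - b)) for a in A for b in B])

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[3.0, 4.0], [0.0, 2.0]])
x = np.array([2.0, 2.0])
avg = lambda ds: sum(ds) / len(ds)

print(point_to_set(x, A, reduce=min))   # distance to the closest point of A
print(set_to_set(A, B, reduce=max))     # farthest cross pair
print(set_to_set(A, B, reduce=avg))     # average over all cross pairs
```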

  13. Clustering Methods

  14. Clustering Methods. Do the clusters partition the data? – Hard (yes) vs. soft clustering (no). Do the clusters have structure? – Hierarchical (yes) vs. flat clustering (no). Is the hierarchy induced top-down or bottom-up? – Top-down: divisive. – Bottom-up: agglomerative. How do we represent the data points? – As vectors or as vertices in a graph?

  15. Graph-based clustering. Each data point is a vertex in an undirected graph. Edge weights correspond to (non-zero) similarities, not distances: 0 ≤ sim(x, y) ≤ 1. Clustering = graph partitioning! – Graph cuts, minimum spanning trees, etc.
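The lecture does not fix a particular similarity function; one common choice, shown below as an assumption of mine, is a Gaussian (RBF) similarity, which maps squared distances into edge weights in (0, 1] and can be thresholded to keep the graph sparse.

```python
import numpy as np

def similarity_graph(X, sigma=1.0, cutoff=1e-3):
    """Weighted adjacency matrix whose edge weights are Gaussian (RBF)
    similarities in (0, 1]; negligible weights are zeroed out."""
    diffs = X[:, None, :] - X[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)        # no self-loops
    W[W < cutoff] = 0.0             # sparsify: drop near-zero similarities
    return W

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
print(np.round(similarity_graph(X), 3))   # two clear blocks: {0, 1} and {2, 3}
```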

  16. Vector-based clustering. Each data point is a vector in a vector space. We can define a distance metric in this space.

  17. (Hard) clustering. Given an unlabeled dataset D = {x_1, …, x_N}, a distance metric d(x, x′) over pairs of points, and a clustering algorithm A, return a partition C of D. – Partition: a set of sets C = {C_1, …, C_k} such that each element of D belongs to exactly one C_i.
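The partition condition is easy to check mechanically; a small sketch (the helper name and examples are mine):

```python
def is_partition(D, clusters):
    """True iff every element of D belongs to exactly one cluster."""
    members = [x for C_i in clusters for x in C_i]
    return len(members) == len(set(members)) and set(members) == set(D)

D = {"a", "b", "c", "d", "e"}
print(is_partition(D, [{"a", "b"}, {"c", "d"}, {"e"}]))        # True
print(is_partition(D, [{"a", "b"}, {"b", "c"}, {"d", "e"}]))   # False: "b" is in two clusters
print(is_partition(D, [{"a", "b"}, {"c", "d"}]))               # False: "e" is not covered
```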

  18. Hierarchical Clustering. Hierarchical clustering is a nested sequence of partitions. Agglomerative: place each object in its own cluster and gradually merge the atomic clusters into larger and larger clusters. Divisive: start with all objects in one cluster and subdivide it into smaller clusters. Example on points a, b, c, d, e (agglomerative, bottom to top): {(a),(b),(c),(d),(e)} → {(a,b),(c),(d),(e)} → {(a,b),(c,d),(e)} → {(a,b,c,d),(e)} → {(a,b,c,d,e)}

  19. Agglomerative Clustering. Assume a distance measure between points, d(x1, x2), and define a distance measure between clusters, D(C1, C2). Algorithm: – Initialization: put each point in a separate cluster. – At each stage, merge the two closest clusters according to D (the two D-closest clusters). Different definitions of D (for the same d) give rise to radically different partitions of the data.
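A naive sketch of this agglomerative loop, parameterised by the cluster-level distance D (the function name is mine; in practice scipy.cluster.hierarchy.linkage computes the full dendrogram far more efficiently). The choice of D is exactly what distinguishes the linkage criteria on the next slide.

```python
import numpy as np

def agglomerative(points, D, n_clusters=1):
    """Bottom-up clustering: start with one cluster per point and repeatedly
    merge the two D-closest clusters until n_clusters remain.
    D(A, B) takes two arrays of points and returns a cluster distance.
    Returns a list of clusters, each a list of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = D(points[clusters[i]], points[clusters[j]])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])     # merge the two D-closest clusters
        del clusters[j]
    return clusters
```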

  20. Agglomerative Clustering. Single-link clustering: define the cluster distance as the distance of the closest pair, D_SL(C1, C2) = min_{x1 ∈ C1, x2 ∈ C2} d(x1, x2). Complete-link clustering: define the cluster distance as the distance of the furthest pair, D_CL(C1, C2) = max_{x1 ∈ C1, x2 ∈ C2} d(x1, x2). Group-average clustering: define the cluster distance as the average distance over all pairs, D_GA(C1, C2) = avg_{x1 ∈ C1, x2 ∈ C2} d(x1, x2). Error-sum-of-squares clustering (Ward): ESS(C) = Σ_{x ∈ C} (x − η_C)², where η_C is the cluster mean, and D_ESS(C1, C2) = ESS(C1 ∪ C2) − ESS(C1) − ESS(C2).
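Each of these cluster distances can be written directly from its definition, using Euclidean distance between points; the sketch below (helper names and example clusters are mine) evaluates all four on a pair of small clusters, and any of them can be plugged in as D in the agglomerative sketch above.

```python
import numpy as np

def cross_distances(C1, C2):
    """All pairwise Euclidean distances between points of C1 and points of C2."""
    return [float(np.linalg.norm(x1 - x2)) for x1 in C1 for x2 in C2]

def D_single(C1, C2):     # single link: distance of the closest pair
    return min(cross_distances(C1, C2))

def D_complete(C1, C2):   # complete link: distance of the furthest pair
    return max(cross_distances(C1, C2))

def D_average(C1, C2):    # group average: mean distance over all cross pairs
    d = cross_distances(C1, C2)
    return sum(d) / len(d)

def ess(C):               # error sum of squares around the cluster mean
    return float(np.sum((C - np.mean(C, axis=0)) ** 2))

def D_ward(C1, C2):       # increase in ESS caused by merging C1 and C2
    return ess(np.vstack([C1, C2])) - ess(C1) - ess(C2)

C1 = np.array([[0.0, 0.0], [0.2, 0.0]])
C2 = np.array([[5.0, 5.0], [5.1, 5.2]])
print(D_single(C1, C2), D_complete(C1, C2), D_average(C1, C2), D_ward(C1, C2))
```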

  21. Association-Dissociation. Given a collection of points, one way to define the goal of a clustering process is to use the following two measures: – a measure of similarity within a group of points, and – a measure of similarity between different groups. Ideally, we would like to define these so that the within-group similarity is maximized and the between-group similarity is minimized at the same time. This turns out to be hard, so we often optimize only one of these objectives.
