ADVANCED MACHINE LEARNING
Spectral Clustering

Outline of Today's Lecture
- Introduce the principle of spectral clustering
- Show extensions for other transformations of the space:
  - Multi-dimensional scaling
  - Laplacian Eigenmaps
  - Isomaps
- Exercise the principle of eigen-decomposition underlying these methods
Non-Linear Manifolds
PCA and Kernel PCA belong to a more general class of methods that create non-linear manifolds based on spectral decomposition.
(The spectral decomposition of matrices is more frequently referred to as an eigenvalue decomposition.)
Depending on which matrix we decompose, we get a different set of projections.
- PCA decomposes the covariance matrix of the dataset → generates rotations and projections in the original space.
- Kernel PCA decomposes the Gram matrix → generates partitions of the space by regrouping the datapoints (tight clusters with the RBF kernel, quadrants for the polynomial kernel).
Non-Linear Manifolds
- Spectral clustering decomposes the Graph Laplacian matrix; the Graph Laplacian is a matrix representation of a graph.
- The eigenvalue decomposition of this matrix determines relationships across datapoints induced by the similarities embedded in the graph.
- The spectral decomposition of the Graph Laplacian matrix can be used to generate various projections, including scaling of the space, flattening and clustering.
Embed Data in a Graph
- Build a similarity graph
- Each vertex on the graph is a datapoint
[Figure: original dataset (left) and graph representation of the dataset (right)]
Measure Distances in Graph
Construct the similarity matrix S, whose entries denote whether points are close or far, to weight the edges of the graph:

S = \begin{pmatrix} 0.9 & 0.8 & \cdots & 0.2 \\ \vdots & & & \vdots \\ 0.2 & \cdots & 0.7 & 0.9 \end{pmatrix}
Disconnected Graphs
S = \begin{pmatrix} 1 & 1 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & \cdots & 1 & 1 \end{pmatrix}

Disconnected graph (binary entries): two datapoints are connected (S_{ij} = 1) if:
a) the similarity between them is higher than a threshold; or
b) they are k-nearest neighbours (according to the similarity metric).
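A minimal sketch of both constructions in Python (NumPy-based; the helper name binary_similarity is an illustrative assumption, not part of the lecture):

import numpy as np

def binary_similarity(X, eps=None, k=None):
    # X: (M, d) data matrix. Returns a symmetric 0/1 similarity matrix S.
    M = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    if eps is not None:
        S = (dist <= eps).astype(float)            # a) threshold on the distance
    else:
        S = np.zeros((M, M))
        nn = np.argsort(dist, axis=1)[:, 1:k + 1]  # b) k nearest neighbours (skip self)
        for i in range(M):
            S[i, nn[i]] = 1.0
        S = np.maximum(S, S.T)                     # symmetrise the kNN graph
        np.fill_diagonal(S, 1.0)
    return S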
Connected Components in a Graph
S = \begin{pmatrix} 1 & 1 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & \cdots & 1 & 1 \end{pmatrix}

If all blue connections have value zero in the similarity matrix, then the graph has 2 connected components (i.e. two disconnected blocks of datapoints; datapoints within a block are connected).
Connected Components in a Graph
- Next, we will see a method to discover the number of connected components.
- Knowing this number allows us to identify clusters according to the chosen similarity matrix.
Graph Laplacian
Given a similarity matrix S (4×4 example, nodes 1, 2, 3, 4):

S = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}

Construct the diagonal matrix D composed of the sum of each row of S:

D_{ii} = \sum_j S_{ij}; here D = \mathrm{diag}(2, 2, 2, 2),

and then build the Graph Laplacian matrix L = D - S:

L = \begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix}

L is positive semi-definite → a spectral decomposition is possible.
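As a check, a short NumPy sketch that reproduces this 4×4 example (illustrative, not part of the slides):

import numpy as np

S = np.array([[1., 0., 0., 1.],
              [0., 1., 1., 0.],
              [0., 1., 1., 0.],
              [1., 0., 0., 1.]])
D = np.diag(S.sum(axis=1))      # D_ii = sum_j S_ij (row sums), here diag(2, 2, 2, 2)
L = D - S                       # Graph Laplacian
print(np.linalg.eigvalsh(L))    # all >= 0 (PSD); here [0, 0, 2, 2]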
Graph Laplacian
Eigenvalue decomposition of the Graph Laplacian matrix: L = U \Lambda U^T.

All eigenvalues of L are non-negative and the smallest eigenvalue of L is zero. If we order the eigenvalues in increasing order: 0 = \lambda_1 \leq \lambda_2 \leq \dots \leq \lambda_M.

Theorem (see annexes): If the graph has k connected components, then the eigenvalue \lambda = 0 has multiplicity k.

The multiplicity of the eigenvalue 0 determines the number of connected components in a graph. The associated eigenvectors identify these connected components.
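In code, the theorem suggests counting the (numerically) zero eigenvalues; the tolerance tol is an assumption needed to absorb floating-point round-off:

import numpy as np

def n_connected_components(L, tol=1e-10):
    eigvals = np.linalg.eigvalsh(L)    # L symmetric PSD -> real, non-negative eigenvalues
    return int(np.sum(eigvals < tol))  # multiplicity of lambda = 0

For the 4×4 example above, this returns 2.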
Spectral Clustering
Let us do exercise I
Spectral Clustering: Exercise I
Consider a two-dimensional dataset composed of two points.
a) Build a similarity matrix using a threshold function on the Euclidean (norm-2) distance. The metric outputs 1 if the points are close enough according to a threshold, and zero otherwise. Consider two cases: when the two datapoints are close or far.
b) For each of the two cases above, build the Laplacian matrix, perform an eigenvalue decomposition and discuss the eigenvalues.
Spectral Clustering
The multiplicity of the eigenvalue 0 determines the number of connected components in a graph. The associated eigenvectors identify these connected components.
Identifying the number of clusters using the eigenvalue decomposition of the Laplacian matrix is then immediate (using the above) when the similarity matrix is sparse.

What happens when the similarity matrix is full?
Spectral Clustering
Similarity map S: \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}:

S = \begin{pmatrix} 1.0 & 0.8 & \cdots & 0.2 \\ \vdots & & & \vdots \\ 0.2 & \cdots & 0.7 & 1.0 \end{pmatrix}

Assume S is composed of continuous values; each entry is computed using the Gaussian kernel (Gram matrix):

S_{ij} = S(x^i, x^j) = e^{-\frac{\|x^i - x^j\|^2}{2\sigma^2}}
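A minimal sketch of this similarity map in Python, assuming the convention above with kernel width sigma as a free hyperparameter:

import numpy as np

def gaussian_similarity(X, sigma=1.0):
    # S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)); diagonal entries equal 1
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))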
Spectral Clustering: exercise II
Consider a two-dimensional dataset composed of two points (assume again two cases: the points are close to one another or far apart).
a) Build a similarity matrix using an RBF kernel. Build the Laplacian matrix, perform an eigenvalue decomposition and discuss the eigenvalues and eigenvectors for each of the two cases above.
b) Repeat (a) using a homogeneous polynomial kernel with p = 2.
Spectral Clustering
When the similarity matrix is not sparse, the eigenvalue decomposition of the Laplacian matrix rarely yields a solution with more than one zero eigenvalue. We then have a single eigenvector with eigenvalue zero; all other eigenvalues are positive. The first eigenvalue is still zero, but with multiplicity 1 only (fully connected graph)! However, some of the other positive eigenvalues may be very close to 0.

Idea: the smallest eigenvalues (close to zero) also provide information on the partitioning of the graph (see the solution of exercise II).
Spectral Clustering
Algorithm in the general case (S not binary):
1) Build the Laplacian matrix L = D - S.
2) Do the eigenvalue decomposition of the Laplacian matrix: L = U \Lambda U^T.
3) Order the eigenvalues in increasing order: \lambda_1 \leq \lambda_2 \leq \dots \leq \lambda_M.
4) Apply a threshold \varepsilon on the eigenvalues, such that small eigenvalues (\lambda < \varepsilon) are set to zero.
5) Determine the number of clusters by looking at the multiplicity of \lambda = 0 after step 4.

This provides an indication of the number of clusters K. We do not yet know how the points are partitioned into the clusters! Let us see now how we can infer the clusters from the eigenvalue decomposition.
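A sketch of steps 1-5 in Python; the threshold eps on the eigenvalues is a user choice, as in step 4:

import numpy as np

def estimate_n_clusters(S, eps=0.1):
    D = np.diag(S.sum(axis=1))
    L = D - S                          # step 1
    eigvals, U = np.linalg.eigh(L)     # steps 2-3 (eigh returns eigenvalues in increasing order)
    K = int(np.sum(eigvals < eps))     # steps 4-5: multiplicity of lambda ~ 0 after thresholding
    return K, eigvals, U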
Spectral Clustering

Eigenvectors of the Laplacian matrix in U: U = [e^1, e^2, \dots, e^M], with e^i = (e^i_1, \dots, e^i_M)^T.

Construct an embedding x^i \to y^i of each of the M datapoints through y^i = (e^1_i, \dots, e^M_i)^T, i.e. the i-th coordinate of each eigenvector.

This amounts to a non-linear mapping X = \{x^i\}_{i=1}^M \to Y = \{y^i\}_{i=1}^M.
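In code, the embedding is simply the rows of U (restricted to the first K eigenvectors once we reduce dimensionality); a minimal sketch:

import numpy as np

def spectral_embedding(S, K):
    D = np.diag(S.sum(axis=1))
    eigvals, U = np.linalg.eigh(D - S)  # columns of U = eigenvectors e^1, ..., e^M
    return U[:, :K]                     # row i is the image y^i of datapoint x^i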
Spectral Clustering
Construct an embedding of each of the M datapoints x^i \to y^i through:

y^1 = (e^1_1, \dots, e^M_1)^T
y^2 = (e^1_2, \dots, e^M_2)^T
y^3 = (e^1_3, \dots, e^M_3)^T
y^4 = (e^1_4, \dots, e^M_4)^T

[Figure: images y^1, y^2, y^3, y^4 of the datapoints in the embedded space]

Points well grouped in the original space generate grouped images y^i.

Reduce dimensionality by picking the eigenvectors e^i, i = 1 \dots K, K \leq M, on which the projections of the y^i, i = 1 \dots M, are well grouped.
Spectral Clustering

Example: 3 datapoints x^1, x^2, x^3 in a graph composed of 2 partitions. The similarity matrix is

S = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad L = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.

L has eigenvalue \lambda = 0 with multiplicity two.

The eigenvectors of L for \lambda = 0 are: e^1 = \frac{1}{\sqrt{2}}(1, 1, 0)^T, e^2 = (0, 0, 1)^T.

The images of the points are given by: y^1 = y^2 = (1/\sqrt{2}, 0)^T, y^3 = (0, 1)^T.

The coordinates of the images y^1, y^2 of the datapoints x^1, x^2 on the first two eigenvectors are equal.
Spectral Clustering

The images of the points are given by: y^1 = y^2 = (1/\sqrt{2}, 0)^T, y^3 = (0, 1)^T.

[Figure: the images y^1, y^2 superposed; y^3 orthogonal to them]

The images y^1, y^2 of the datapoints are superposed (when considering the first two dimensions only) and orthogonal to the image y^3 of the 3rd point.
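A quick numeric verification of this example (note that eigenvectors of a degenerate eigenvalue are only defined up to a rotation within the eigenspace, so the printed basis may differ from the slides while the geometry is the same):

import numpy as np

S = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [0., 0., 1.]])
L = np.diag(S.sum(axis=1)) - S
eigvals, U = np.linalg.eigh(L)
print(eigvals)      # ~[0, 0, 2]: eigenvalue 0 with multiplicity two
Y = U[:, :2]        # images y^i on the two zero-eigenvalue eigenvectors
print(Y)            # rows 1 and 2 coincide; row 3 is orthogonal to them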
Spectral Clustering

Example: 3 datapoints x^1, x^2, x^3 in a fully connected graph:

S = \begin{pmatrix} 1 & 0.9 & 0.02 \\ 0.9 & 1 & 0.02 \\ 0.02 & 0.02 & 1 \end{pmatrix}, \quad L = \begin{pmatrix} 0.92 & -0.90 & -0.02 \\ -0.90 & 0.92 & -0.02 \\ -0.02 & -0.02 & 0.04 \end{pmatrix}.

L has eigenvalue \lambda = 0 with multiplicity 1. The second eigenvalue is small, \lambda_2 \approx 0.06, whereas the 3rd one is large, \lambda_3 \approx 1.82, with associated eigenvectors:

e^1 = \frac{1}{\sqrt{3}}(1, 1, 1)^T, \quad e^2 = (0.4, 0.4, -0.8)^T, \quad e^3 = (0.7, -0.7, 0.0)^T.

It makes sense to group using the eigenvectors with the smallest eigenvalues.

The images of the points are given by:

y^1 = (1/\sqrt{3}, 0.4, 0.7)^T, \quad y^2 = (1/\sqrt{3}, 0.4, -0.7)^T, \quad y^3 = (1/\sqrt{3}, -0.8, 0.0)^T.

The coordinates of the images y^1, y^2 of the datapoints x^1, x^2 on the first two eigenvectors are again equal.
Spectral Clustering

Example: 3 datapoints x^1, x^2, x^3 in a fully connected graph. The similarity matrix is

S = \begin{pmatrix} 1 & 0.9 & 0.8 \\ 0.9 & 1 & 0.7 \\ 0.8 & 0.7 & 1 \end{pmatrix}.

L has eigenvalue \lambda = 0 with multiplicity 1. The second and third eigenvalues are both large, \lambda_2 \approx 2.23, \lambda_3 \approx 2.57, with associated eigenvectors:

e^1 = \frac{1}{\sqrt{3}}(1, 1, 1)^T, \quad e^2 = (0.21, 0.57, -0.79)^T, \quad e^3 = (0.78, -0.57, -0.21)^T.

The images of the points are given by:

y^1 = (1/\sqrt{3}, 0.21, 0.78)^T, \quad y^2 = (1/\sqrt{3}, 0.57, -0.57)^T, \quad y^3 = (1/\sqrt{3}, -0.79, -0.21)^T.

The entries are no longer equal! The 3rd point is now closer to the two other points.
Spectral Clustering
[Figure: graph over datapoints x^1, ..., x^6 with edge weights w_{12}, w_{21}, and their images y^1, ..., y^6]
Step 1: Embedding in y. Idea: points close to one another have almost the same coordinates on the eigenvectors of L with small eigenvalues.

Step 1: Do an eigenvalue decomposition of the Laplacian matrix L and project the images of the datapoints onto the first K eigenvectors with the smallest eigenvalues (hence reducing the dimensionality of the images y).
Spectral Clustering
Step 2: Perform K-Means on the set of vectors y^1, \dots, y^M \in \mathbb{R}^K. Cluster the datapoints x^i according to the cluster assignments of their images y^i after K-Means.

[Figure: graph over datapoints x^1, ..., x^6 with edge weights w_{12}, w_{21}, and their clustered images y^1, ..., y^6]
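Putting the two steps together, a compact sketch of the whole pipeline (scikit-learn's KMeans is assumed available; any K-Means implementation would do):

import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(S, K):
    L = np.diag(S.sum(axis=1)) - S
    _, U = np.linalg.eigh(L)
    Y = U[:, :K]                     # step 1: embedding on the K smallest eigenvectors
    labels = KMeans(n_clusters=K, n_init=10).fit_predict(Y)  # step 2: K-Means on the y^i
    return labels                    # datapoint x^i inherits the cluster of its image y^i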
Spectral Clustering: exercise III
Consider a dataset composed of four points, with two pairs of points that are close to each other, one pair being far from the other. More formally, assume that the similarity matrix looks as follows:

S = \begin{pmatrix} 1 & 0.8 & 0 & 0 \\ 0.8 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0.5 \\ 0 & 0 & 0.5 & 1 \end{pmatrix}

a) What are the eigenvalues and eigenvectors of L = D - S? How many connected components do you obtain?
b) What are the eigenvalues and eigenvectors of S? What do you notice? How could you infer clusters of points? (Hint: look at the ratio of the eigenvalues.)
Equivalency to other non-linear Embeddings

Example: 3 datapoints x^1, x^2, x^3 in a graph composed of 2 partitions. The similarity matrix is

S = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad L = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.

L has eigenvalue \lambda = 0 with multiplicity two. The eigenvectors of L are:

e^1 = \frac{1}{\sqrt{2}}(1, 1, 0)^T, \quad e^2 = (0, 0, 1)^T, \quad e^3 = \frac{1}{\sqrt{2}}(1, -1, 0)^T, with \lambda_1 = 0, \lambda_2 = 0, \lambda_3 = 2.

The eigenvalue decomposition of S (equivalent to kernel PCA on the Gram matrix) yields the set of dual eigenvectors:

\tilde{e}^1 = \frac{1}{\sqrt{2}}(1, 1, 0)^T, \quad \tilde{e}^2 = (0, 0, 1)^T, \quad \tilde{e}^3 = \frac{1}{\sqrt{2}}(1, -1, 0)^T, with eigenvalues 2, 1, 0.

The dual eigenvectors with non-zero eigenvalues are collinear to the set of eigenvectors of the Laplacian matrix! Careful: this is not true in arbitrary cases!
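A numeric check of this collinearity for the block-diagonal example (keeping in mind that, for degenerate eigenvalues, eigenvectors are only defined up to a rotation within the eigenspace):

import numpy as np

S = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [0., 0., 1.]])
L = np.diag(S.sum(axis=1)) - S
_, U_L = np.linalg.eigh(L)    # eigenvectors of the Laplacian
_, U_S = np.linalg.eigh(S)    # dual eigenvectors (kernel PCA on S)
print(np.abs(U_L.T @ U_S))    # |cosines|: values ~1 flag collinear pairs (up to order/sign)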
Equivalency to other non-linear Embeddings
Kernel PCA: eigenvalue decomposition of the similarity matrix S = U D U^T.

The choice of parameters in kernel K-Means can be initialized by doing a readout of the Gram matrix after kernel PCA. The number of large eigenvalues = number of clusters (here 2).
The choice of kernel and of the kernel's hyperparameters (e.g. the kernel width) also determines the number of existing clusters.

[Figure: from top to bottom, projections onto the first 3 dual eigenvectors with an RBF kernel, using kernel widths of 0.8, 1.5, 2.5, respectively]
Kernel PCA projections can help determine the kernel width
The largest eigenvalues grow as we get a better clustering
There exist several variants of the Laplacian non-linear mappings; see a few examples in the next slides.
Laplacian Eigenmaps
Projections of the images y^i, i = 1 \dots M, on each eigenvector e^i, i = 1 \dots K, generate different embeddings of the datapoints.

[Figure: Swissroll example. Image courtesy of A. Singh]
Solve the generalized eigenvalue problem: L e = \lambda D e (if D is invertible, equivalently (I - D^{-1} S) e = \lambda e).

This corresponds to \min_e e^T L e such that e^T D e = 1, which ensures minimal distortion while preventing arbitrary scaling.
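A minimal sketch using SciPy's symmetric generalized eigensolver (D is positive definite here since every node has positive degree):

import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(S, K):
    D = np.diag(S.sum(axis=1))
    L = D - S
    eigvals, E = eigh(L, D)   # solves L e = lambda D e, eigenvalues in increasing order
    return E[:, 1:K + 1]      # skip the trivial constant eigenvector (lambda = 0)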
Laplacian Eigenmaps
The projections on the pair e^1, e^3 generate a flat embedding that enables a linear partitioning.

Solve the generalized eigenvalue problem: L e = \lambda D e (if D is invertible, equivalently (I - D^{-1} S) e = \lambda e).

This corresponds to \min_e e^T L e such that e^T D e = 1, which ensures minimal distortion while preventing arbitrary scaling.
Multi-Dimensional Scaling (MDS)
Performs a scaled projection from the similarity matrix:

1) First center the similarity matrix:
S'_{ij} = S_{ij} - \frac{1}{M}\sum_k S_{ik} - \frac{1}{M}\sum_k S_{kj} + \frac{1}{M^2}\sum_{k,l} S_{kl}
2) Then perform an eigenvalue decomposition of S', yielding eigenvectors e^i, i = 1 \dots M.
3) Consider only the eigenvectors with positive eigenvalues.
4) Generate the scaled projections y^i = (\sqrt{\lambda_1}\, e^1_i, \dots, \sqrt{\lambda_K}\, e^K_i) (see the example of Isomap).
Flattens and normalizes but does not separate very well.
[Figure: MDS embedding of the Swissroll]
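A sketch of the four MDS steps above in NumPy (the double-centering matrix J implements step 1):

import numpy as np

def mds(S, K):
    M = S.shape[0]
    J = np.eye(M) - np.ones((M, M)) / M
    Sc = J @ S @ J                      # step 1: centre the similarity matrix
    eigvals, E = np.linalg.eigh(Sc)     # step 2: eigenvalue decomposition
    order = np.argsort(eigvals)[::-1]   # sort eigenvalues in decreasing order
    eigvals, E = eigvals[order], E[:, order]
    keep = eigvals[:K] > 0              # step 3: keep only positive eigenvalues
    return E[:, :K][:, keep] * np.sqrt(eigvals[:K][keep])  # step 4: scaled projections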
Isomap
Generalization of MDS using geodesic distances to generate S.

[Figure: graph over datapoints x^1, ..., x^6 with edge weights w_{12}, w_{21}; 2 neighbours]

The entries of S are the geodesic distances, computed as shortest paths through the k-nearest-neighbour graph (here k = 2):

S_{ij} = \min_{\text{paths through } k\text{-nearest neighbours}} d(x^i, x^j)

The geodesic distances encapsulate the neighbourhood structure well. Combined with the MDS flattening of the space, they allow the 2 classes to be extracted well.
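A sketch of Isomap along these lines, assuming SciPy/scikit-learn helpers for the k-nearest-neighbour graph and shortest paths (and a connected graph, so no infinite distances):

import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def isomap(X, K=2, n_neighbors=2):
    G = kneighbors_graph(X, n_neighbors, mode='distance')  # local Euclidean edges
    geo = shortest_path(G, directed=False)                 # geodesic (shortest-path) distances
    M = geo.shape[0]
    J = np.eye(M) - np.ones((M, M)) / M
    B = -0.5 * J @ (geo ** 2) @ J       # MDS double-centering of squared geodesic distances
    eigvals, E = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:K] # largest eigenvalues first
    return E[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0.0))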
Variants to generate non-linear Embeddings

Eigenvalue decomposition of the following set of matrices:

- Graph Laplacian L = D - S, or scaled similarity matrix (I - D^{-1} S)   (Laplacian Eigenmaps)
- Centered similarity matrix S'_{ij} = S_{ij} - \frac{1}{M}\sum_k S_{ik} - \frac{1}{M}\sum_k S_{kj} + \frac{1}{M^2}\sum_{k,l} S_{kl}, with scaled projections   (Multidimensional Scaling, MDS)

All lead to a non-linear embedding yielding a grouping of the points in the projected space of images Y = \{y^i\}_{i=1}^M. Applying K-Means on these projections amounts to spectral clustering.
Summary
We have seen several ways in which to perform a non-linear embedding of the space, namely:
- Kernel PCA: appropriate for data that live in a single modality
- Kernel CCA: appropriate to compare embeddings across different modalities encoding the data
- Kernel K-Means: proceeds to clustering and non-linear embedding simultaneously
- Spectral clustering: performs K-Means after a non-linear embedding using the eigenvectors of the Graph Laplacian