SLIDE 1

Kernel K-Means Low Rank Approximation for Spectral Clustering and Diffusion Maps

IDEAL 2014 Salamanca – Spain

Carlos M. Alaíz, Ángela Fernández, Yvonne Gala, José R. Dorronsoro

Departamento de Ingeniería Informática, Universidad Autónoma de Madrid

September 10, 2014

SLIDE 2

Contents

1 Introduction
2 SC, DM and Nyström
3 Kernel KASP
4 Numerical Experiments
5 Conclusions

SLIDE 3

Contents: Introduction

Introduction

1 Introduction

SLIDE 4

Introduction

Introduction

Spectral Clustering (SC) and Diffusion Maps (DM) are two of the leading methods for advanced clustering and dimensionality reduction. They require the eigenanalysis of a matrix whose dimension equals the sample size N, with complexity O(N³). It is also difficult to compute the SC or DM projections of new patterns, as these projections are eigenvector components.

The Nyström approach allows an eigenanalysis to be extended to new points, so it can be used for new patterns. To deal with costs, a common approach is to subsample the original patterns, retaining a small subset that is used to define a first embedding, which is then extended to the entire sample. A proper subsampling can be critical for the performance of this approach.

C. M. Alaíz et al. (EPS–UAM), KKM Approximation for SC and DM, September 10, 2014

SLIDE 5

Contents: SC, DM and Nyström

2 SC, DM and Nyström

Spectral Clustering
Diffusion Maps
Nyström Extension
SLIDE 6

Spectral Clustering

SC, DM and Nyström

Spectral Clustering (SC) is a manifold learning method for clustering. Scheme:

1 An appropriate similarity matrix W is built over the sample S = {x_1, . . . , x_N}. This defines a weighted graph G.

2 The random walk Laplacian is defined as L_rw = I − D^{−1}W = I − P, where D is the diagonal degree matrix, D_ii = d_i = Σ_j w_ij.

3 K-means is applied over the spectral projections v(x_i) = (v^1_i, . . . , v^m_i)⊤ of a sample point x_i, where {v^p}_{p=0}^{N−1} are the right eigenvectors of L_rw (or P) and m is the chosen projection dimension.

The SC coordinates (v^1_i, . . . , v^m_i)⊤ can also be used for dimensionality reduction purposes.
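The SC scheme above can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration, not the authors' implementation: the Gaussian similarity, the width parameter sigma, and the helper name spectral_clustering are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.cluster.vq import kmeans2

def spectral_clustering(X, n_clusters, m, sigma=1.0):
    # 1) Gaussian similarity matrix W over the sample (width sigma assumed).
    W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma ** 2))
    # 2) Random-walk transition matrix P = D^{-1} W; its right eigenvectors
    #    are shared with L_rw = I - P.
    P = W / W.sum(axis=1, keepdims=True)
    lam, V = np.linalg.eig(P)
    order = np.argsort(-lam.real)
    V = V[:, order].real
    # 3) K-means on the first m non-trivial spectral coordinates
    #    (the leading eigenvector of P is constant and is skipped).
    coords = V[:, 1:m + 1]
    _, labels = kmeans2(coords, n_clusters, minit="++", seed=0)
    return labels
```

For well-separated data the second eigenvector of P already separates the groups, which is why the constant leading eigenvector is skipped before running K-means.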


SLIDE 7

Diffusion Maps

SC, DM and Nyström

Diffusion Maps (DM) add some improvements to SC. Scheme:

1 W is normalized to reflect the role of the sample density. In particular, w^{(α)}_ij = w_ij / (d^α_i d^α_j) for 0 ≤ α ≤ 1. If α = 0, W^{(α)} is the previously defined W; if α = 1, the effect of the density is compensated.

2 A Markov probability matrix is defined on the graph G as P_α = (D_α)^{−1} W^{(α)}.

3 The diffusion distance for t steps over the graph G is given by D_t(x_i, x_j)² = Σ_{k=1}^{N−1} λ_k^{2t} (v^k_i − v^k_j)², with v^k and λ_k the eigenvectors and eigenvalues of P_α.

4 The embedding is given by Ψ_t(x_i) = (λ^t_1 v^1_i, . . . , λ^t_{N−1} v^{N−1}_i)⊤; the Euclidean distance between Ψ_t(x_i) and Ψ_t(x_j) is precisely D_t(x_i, x_j).

DM lends itself to dimensionality reduction and clustering, selecting the first m coordinates and using K-means on the Ψ projections.
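The four steps can be transcribed directly. This is a minimal sketch: the Gaussian similarity, the width sigma, and the helper name diffusion_map are assumptions, not part of the slides.

```python
import numpy as np
from scipy.spatial.distance import cdist

def diffusion_map(X, m, t=1, alpha=1.0, sigma=1.0):
    # Gaussian similarity matrix (width sigma is an assumed parameter).
    W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma ** 2))
    # Step 1: density normalization w_ij / (d_i^alpha d_j^alpha).
    d = W.sum(axis=1)
    W_alpha = W / np.outer(d ** alpha, d ** alpha)
    # Step 2: Markov matrix P_alpha = D_alpha^{-1} W_alpha.
    P = W_alpha / W_alpha.sum(axis=1, keepdims=True)
    # Step 3: eigenanalysis, eigenvalues sorted decreasingly.
    lam, V = np.linalg.eig(P)
    order = np.argsort(-lam.real)
    lam, V = lam[order].real, V[:, order].real
    # Step 4: Psi_t(x_i) = (lambda_k^t v_i^k), k = 1..m, dropping lambda_0 = 1.
    return (lam[1:m + 1] ** t) * V[:, 1:m + 1]
```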


SLIDE 8

Nyström Extension

SC, DM and Nyström

SC and DM share two drawbacks: the cost of the eigenanalysis they require, and the difficulty of computing the SC or DM projections of new, unseen patterns. Both can be dealt with using the Nyström extension.

For a kernel a(x_i, x_j) and its kernel matrix A, with AU = UΛ its eigendecomposition, the Nyström extension to a new pattern x is the approximation ũ^k(x) to the true u^k(x) given by

  ũ^k(x) = (1/λ_k) Σ_{j=1}^{N} a(x, x_j) u^k_j.

This approach can also be applied to the asymmetric matrix P, so its eigenvectors can be extended as

  ṽ^k(x) = (1/λ_k) Σ_{j=1}^{N} P(x, x_j) v^k_j.

Therefore, an embedding can be built using just a subsample, and then it can be extended to new points using Nyström.
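The extension formula is one vectorized line. A minimal sketch, with the helper name and the matrix-shaped interface as assumptions; note that for a point already in the sample, AU = UΛ makes the extension recover the eigenvector exactly.

```python
import numpy as np

def nystrom_extend(a_new, U, lam):
    """u~^k(x) = (1/lambda_k) sum_j a(x, x_j) u_j^k, vectorized over k.

    a_new: (n_new, N) kernel values a(x, x_j) against the sample,
    U:     (N, K) eigenvectors of the kernel matrix A,
    lam:   (K,) corresponding eigenvalues (assumed nonzero).
    """
    return (a_new @ U) / lam
```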

SLIDE 9

Reconstruction Error

SC, DM and Nyström

In order to compare different subsamples, some quality measure is needed. Let W and P be structured as

  W = [ W̃  B⊤ ],   P = D^{−1}W = [ P̃   B′_P ],
      [ B   C  ]                  [ B_P  C_P ]

where W̃ is the K × K similarity matrix of a K-pattern subsample S̃. Considering only the subsample of the first K patterns, the eigenanalysis W̃ = Ũ Λ̃ Ũ⊤ can be used to approximate that of W using Nyström, with

  U′ = [ Ũ          ],   W′ = U′ Λ̃ U′⊤ = [ W̃  B⊤          ],
       [ B Ũ Λ̃^{−1} ]                    [ B   B W̃^{−1} B⊤ ]

  P′ = D^{−1}W′ = [ P̃    B′_P            ].
                  [ B_P   B_P P̃^{−1} B′_P ]

A possible measure to compare different ways of selecting S̃ is the reconstruction error between the real P and the Nyström approximation P′,

  d_F(P, P′) = ‖P − P′‖_F = ‖C_P − B_P P̃^{−1} B′_P‖_F.
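The error formula only involves the four blocks of P, so it can be checked numerically with a short sketch (the helper name reconstruction_error is an assumption; when W has rank K and the leading block is invertible, the error is zero).

```python
import numpy as np

def reconstruction_error(P, K):
    P_tilde = P[:K, :K]     # P~ : K x K subsample block
    Bp_prime = P[:K, K:]    # B'_P
    Bp = P[K:, :K]          # B_P
    Cp = P[K:, K:]          # C_P, the block the Nystrom formula approximates
    # d_F(P, P') = || C_P - B_P P~^{-1} B'_P ||_F
    return np.linalg.norm(Cp - Bp @ np.linalg.inv(P_tilde) @ Bp_prime)
```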


SLIDE 10

Contents: Kernel KASP

Kernel KASP

3 Kernel KASP

KASP
KKASP

SLIDE 11

K-Means and KASP

Kernel KASP

K-Means scheme:

1 K initial centroids are chosen, {c⁰_k}_{k=1}^{K}.

2 Sample patterns x_p are associated with their nearest centroid, giving a first set of clusters {C⁰_k}_{k=1}^{K}, with x_p ∈ C⁰_k if k = arg min_ℓ ‖x_p − c⁰_ℓ‖.

3 The new centroids c¹_k are the means of the C⁰_k, which are used to define a new set of clusters C¹_k.

4 This is repeated until no changes are made.

This algorithm progressively minimizes the within-cluster sum of squares Σ_{k=1}^{K} Σ_{x_p ∈ C^i_k} ‖x_p − c^i_k‖².

K-Means-based Approximate Spectral Clustering (KASP)

It consists in using standard K-means to build a set of representative centroids over which spectral clustering is done. In order to compute d_F(P, P′), each centroid is approximated by its nearest pattern, and these pseudo-centroids are used as the subsample.
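The KASP pseudo-centroid selection can be sketched with SciPy's K-means. The helper name kasp_subsample is an assumption; the key step is replacing each centroid by its nearest sample pattern.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.spatial.distance import cdist

def kasp_subsample(X, K):
    # Standard K-means gives K representative centroids.
    centroids, _ = kmeans2(X, K, minit="++", seed=0)
    # Each centroid is approximated by its nearest sample pattern;
    # these pseudo-centroids form the subsample.
    idx = cdist(centroids, X).argmin(axis=1)
    return np.unique(idx)
```

Distinct centroids can map to the same nearest pattern, so the returned subsample may have fewer than K indices, matching the size collapse mentioned later in the experiments.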


SLIDE 12

Kernel K-Means and Kernel KASP

Kernel KASP

Kernel K-Means

K-means can be enhanced in a kernel setting by replacing the sample patterns x with non-linear extensions Φ(x). If Φ corresponds to a reproducing kernel K, the distances ‖Φ(x_p) − c^i_k‖² can be computed without working explicitly with Φ(x):

  ‖Φ(x_p) − c^i_k‖² = K(x_p, x_p) + (1/|C^i_k|²) Σ_{x_q, x_r ∈ C^i_k} K(x_q, x_r) − (2/|C^i_k|) Σ_{x_q ∈ C^i_k} K(x_p, x_q).

Thus the previous Euclidean K-means procedure extends straightforwardly to a kernel setting.

Our Proposal: Kernel KASP (kKASP)

Similar to the KASP approach, but based on kernel K-means. The centroids are not available explicitly, so they are substituted by the pseudo-centroids (with respect to the kernel).
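The kernelized distance formula can be transcribed directly from the kernel matrix alone; the helper name kernel_distances and its interface are assumptions. With a linear kernel the result must agree with ordinary Euclidean distances to the cluster mean, which gives a simple sanity check.

```python
import numpy as np

def kernel_distances(Kmat, labels, k):
    members = np.flatnonzero(labels == k)   # indices of cluster C_k
    n_k = len(members)
    # ||Phi(x_p) - c_k||^2 = K(x_p, x_p)
    #   + (1/|C_k|^2) sum_{q,r in C_k} K(x_q, x_r)
    #   - (2/|C_k|)   sum_{q in C_k}   K(x_p, x_q)
    return (np.diag(Kmat)
            + Kmat[np.ix_(members, members)].sum() / n_k ** 2
            - 2.0 * Kmat[:, members].sum(axis=1) / n_k)
```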


SLIDE 13

KKASP Algorithm

Kernel KASP

Algorithm

Require: S = (x_1, . . . , x_N); K, the subsample size.
1: Apply kernel K-means on S and select K pseudo-centroids S̃_K = {z_1, . . . , z_K};
2: Perform the eigenanalysis of the matrix P_K associated to S̃_K;
3: Compute the Nyström extensions Ṽ_K;
4: If desired, perform dimensionality reduction on the Ṽ_K and clustering.

The complexity analysis of the kKASP approach is easy:
Kernel K-means: O(KNI), with I the number of iterations, plus the cost O(N²) of pre-computing the similarity matrix.
Eigenanalysis of P: O(K³).
Nyström extensions: O(KN).

A DM over the entire sample would require the eigenanalysis of the complete matrix: O(N³).
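Steps 2 to 4 can be sketched end to end. This is only a structural illustration: it substitutes a plain random subsample for the kernel K-means pseudo-centroids of step 1, and the helper name kkasp_embedding is an assumption.

```python
import numpy as np

def kkasp_embedding(W, K, m, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1 stand-in: a random subsample instead of the kernel K-means
    # pseudo-centroids (assumption for the sake of the sketch).
    idx = rng.choice(len(W), size=K, replace=False)
    P = W / W.sum(axis=1, keepdims=True)        # full transition matrix
    # Step 2: eigenanalysis of the K x K block, cost O(K^3).
    lam, V = np.linalg.eig(P[np.ix_(idx, idx)])
    order = np.argsort(-lam.real)
    lam, V = lam[order].real, V[:, order].real
    # Step 3: Nystrom extension of the leading eigenvectors, cost O(KN).
    V_ext = (P[:, idx] @ V[:, :m + 1]) / lam[:m + 1]
    # Step 4: drop the trivial leading eigenvector.
    return V_ext[:, 1:]
```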


SLIDE 14

Contents: Numerical Experiments

Numerical Experiments

4 Numerical Experiments

Experimental Framework
Results

SLIDE 15

Framework (I)

Numerical Experiments: Experimental Framework

The similarity matrix W is defined with a Gaussian kernel, with width parameter σ set as the 10% percentile of all the distances. The distance d_F(P, P′) is used as a quality measure, where P = D^{−1}W is the transition probability matrix of SC.

Models:
Sr: random selection.
Sk: KASP selection.
Skk1: kKASP selection using as kernel parameter σ the 1% percentile. It is more local, thus producing more clusters.
Skk10: kKASP selection using as kernel parameter σ the 10% percentile. The kernel matrix is the similarity matrix W.

Sizes: 10, 50, 100, 200, 300, 400, 500, 750 and 1,000. For Sr and Sk these are the final sizes but, for Skk1 and Skk10, kernel K-means can collapse some of the clusters, giving a smaller subsample.


SLIDE 16

Framework (II)

Numerical Experiments: Experimental Framework

Datasets:
Synthetic fish-bowl (10,000 three-dimensional sample patterns).
Musk problem (6,598 patterns and 166 features).
Pen-based handwritten digits recognition problem (10,992 instances of dimension 16).
Image segmentation problem (2,310 randomized instances of dimension 19, from a database of 7 outdoor images).

Each experiment is repeated 25 times to deal with the dependence on the initialization of kernel K-means.


SLIDE 17

Results

Numerical Experiments: Results

[Figure: reconstruction error (median), on a log scale, as a function of the number of patterns, for Fishbowl (top left), Musk (top right), Digits (bottom left) and Image (bottom right), comparing Sr, Sk, Skk1 and Skk10.]

Skk1 and Skk10 are competitive with Sk, and all of them are better than the baseline Sr.


SLIDE 18

Contents: Conclusions

Conclusions

5 Conclusions

SLIDE 19

Conclusions and Further Work

Conclusions

Conclusions:
We have introduced kKASP, an extension of the K-means-based approximate spectral clustering procedure to a kernel framework.
We have compared the approximations of kKASP with those of KASP over four datasets, obtaining promising results in terms of the reconstruction error between P and P′.

Further Work:
The main goals of SC and DM are dimensionality reduction and clustering, for which the reconstruction error is just an initial metric that has to be refined. Other quality measures closer to the problem at hand can be used, such as confusion matrices comparing a full SC or DM clustering with their low-rank counterparts.
The results are given for the α = 0 case, while α = 1 is often a better choice.


SLIDE 20

Questions and Suggestions

Kernel K-Means Low Rank Approximation for Spectral Clustering and Diffusion Maps

Carlos M. Alaíz, Ángela Fernández, Yvonne Gala, José R. Dorronsoro

Thank you for your attention.