SLIDE 1

Kernel K-Means Low Rank Approximation for Spectral Clustering and Diffusion Maps

IDEAL 2014 Salamanca – Spain

Carlos M. Alaíz, Ángela Fernández, Yvonne Gala, José R. Dorronsoro

Departamento de Ingeniería Informática, Universidad Autónoma de Madrid

September 10, 2014

SLIDE 2

Contents

1 Introduction
2 SC, DM and Nyström
3 Kernel KASP
4 Numerical Experiments
5 Conclusions

SLIDE 3

Contents: Introduction

Introduction

1 Introduction

SLIDE 4

Introduction

Introduction

Spectral Clustering (SC) and Diffusion Maps (DM) are two of the leading methods for advanced clustering and dimensionality reduction. They require the eigenanalysis of a matrix whose dimension equals the sample size N, with complexity O(N³). It is also difficult to compute the SC or DM projections of new patterns, as these projections are eigenvector components.

The Nyström approach allows an eigenanalysis to be extended to new points, so it can be used for new patterns. To deal with costs, a common approach is to subsample the original patterns, retaining a small subset that is used to define a first embedding, which is then extended to the entire sample. A proper subsampling can be critical for the performance of this approach.

C. M. Alaíz et al. (EPS–UAM), KKM Approximation for SC and DM, September 10, 2014

SLIDE 5

Contents: SC, DM and Nyström

2 SC, DM and Nyström

Spectral Clustering
Diffusion Maps
Nyström Extension
SLIDE 6

Spectral Clustering

SC, DM and Nyström

Spectral Clustering (SC) is a manifold learning method for clustering. Scheme:

1 An appropriate similarity matrix W is built over the sample S = {x_1, . . . , x_N}. This defines a weighted graph G.

2 The random walk Laplacian is defined as L_rw = I − D^{−1}W = I − P, where D is the diagonal degree matrix, D_ii = d_i = Σ_j w_ij.

3 K-means is applied over the spectral projections v(x_i) = (v^1_i, . . . , v^m_i)⊤ of a sample point x_i, where {v^p}_{p=0}^{N−1} are the right eigenvectors of L_rw (or P) and m is the chosen projection dimension.

The SC coordinates (v^1_i, . . . , v^m_i)⊤ can also be used for dimensionality reduction purposes.
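The SC scheme above can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration, not the authors' implementation: the Gaussian similarity, the width parameter sigma, and the helper name spectral_clustering are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.cluster.vq import kmeans2

def spectral_clustering(X, n_clusters, m, sigma=1.0):
    # 1) Gaussian similarity matrix W over the sample (width sigma assumed).
    W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma ** 2))
    # 2) Random-walk transition matrix P = D^{-1} W; its right eigenvectors
    #    are shared with L_rw = I - P.
    P = W / W.sum(axis=1, keepdims=True)
    lam, V = np.linalg.eig(P)
    order = np.argsort(-lam.real)
    V = V[:, order].real
    # 3) K-means on the first m non-trivial spectral coordinates
    #    (the leading eigenvector of P is constant and is skipped).
    coords = V[:, 1:m + 1]
    _, labels = kmeans2(coords, n_clusters, minit="++", seed=0)
    return labels
```

For well-separated data the second eigenvector of P already separates the groups, which is why the constant leading eigenvector is skipped before running K-means.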


SLIDE 7

Diffusion Maps

SC, DM and Nyström

Diffusion Maps (DM) add some improvements to SC. Scheme:

1 W is normalized to reflect the role of the sample density. In particular, w^{(α)}_ij = w_ij / (d^α_i d^α_j) for 0 ≤ α ≤ 1. If α = 0, W^{(α)} is the previously defined W; if α = 1, the effect of the density is compensated.

2 A Markov probability matrix is defined on the graph G as P_α = (D_α)^{−1} W^{(α)}.

3 The diffusion distance for t steps over the graph G is given by D_t(x_i, x_j)² = Σ_{k=1}^{N−1} λ_k^{2t} (v^k_i − v^k_j)², with v^k and λ_k the eigenvectors and eigenvalues of P_α.

4 The embedding is given by Ψ_t(x_i) = (λ^t_1 v^1_i, . . . , λ^t_{N−1} v^{N−1}_i)⊤; the Euclidean distance between Ψ_t(x_i) and Ψ_t(x_j) is precisely D_t(x_i, x_j).

DM lends itself to dimensionality reduction and clustering, selecting the first m coordinates and using K-means on the Ψ projections.
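The four steps can be transcribed directly. This is a minimal sketch: the Gaussian similarity, the width sigma, and the helper name diffusion_map are assumptions, not part of the slides.

```python
import numpy as np
from scipy.spatial.distance import cdist

def diffusion_map(X, m, t=1, alpha=1.0, sigma=1.0):
    # Gaussian similarity matrix (width sigma is an assumed parameter).
    W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma ** 2))
    # Step 1: density normalization w_ij / (d_i^alpha d_j^alpha).
    d = W.sum(axis=1)
    W_alpha = W / np.outer(d ** alpha, d ** alpha)
    # Step 2: Markov matrix P_alpha = D_alpha^{-1} W_alpha.
    P = W_alpha / W_alpha.sum(axis=1, keepdims=True)
    # Step 3: eigenanalysis, eigenvalues sorted decreasingly.
    lam, V = np.linalg.eig(P)
    order = np.argsort(-lam.real)
    lam, V = lam[order].real, V[:, order].real
    # Step 4: Psi_t(x_i) = (lambda_k^t v_i^k), k = 1..m, dropping lambda_0 = 1.
    return (lam[1:m + 1] ** t) * V[:, 1:m + 1]
```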


SLIDE 8

Nyström Extension

SC, DM and Nyström

SC and DM share two drawbacks: the cost of the eigenanalysis they require, and the difficulty of computing the SC or DM projections of new, unseen patterns. Both can be dealt with using the Nyström extension.

For a kernel a(x_i, x_j) and its kernel matrix A, with AU = UΛ its eigendecomposition, the Nyström extension to a new pattern x is the approximation ũ^k(x) to the true u^k(x) given by

  ũ^k(x) = (1/λ_k) Σ_{j=1}^{N} a(x, x_j) u^k_j.

This approach can also be applied to the asymmetric matrix P, so its eigenvectors can be extended as

  ṽ^k(x) = (1/λ_k) Σ_{j=1}^{N} P(x, x_j) v^k_j.

Therefore, an embedding can be built using just a subsample, and then it can be extended to new points using Nyström.
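The extension formula is one vectorized line. A minimal sketch, with the helper name and the matrix-shaped interface as assumptions; note that for a point already in the sample, AU = UΛ makes the extension recover the eigenvector exactly.

```python
import numpy as np

def nystrom_extend(a_new, U, lam):
    """u~^k(x) = (1/lambda_k) sum_j a(x, x_j) u_j^k, vectorized over k.

    a_new: (n_new, N) kernel values a(x, x_j) against the sample,
    U:     (N, K) eigenvectors of the kernel matrix A,
    lam:   (K,) corresponding eigenvalues (assumed nonzero).
    """
    return (a_new @ U) / lam
```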

SLIDE 9

Reconstruction Error

SC, DM and Nyström

In order to compare different subsamples, some quality measure is needed. Let W and P be structured as

  W = [ W̃  B⊤ ],   P = D^{−1}W = [ P̃   B′_P ],
      [ B   C  ]                  [ B_P  C_P ]

where W̃ is the K × K similarity matrix of a K-pattern subsample S̃. Considering only the subsample of the first K patterns, the eigenanalysis W̃ = Ũ Λ̃ Ũ⊤ can be used to approximate that of W using Nyström, with

  U′ = [ Ũ          ],   W′ = U′ Λ̃ U′⊤ = [ W̃  B⊤          ],
       [ B Ũ Λ̃^{−1} ]                    [ B   B W̃^{−1} B⊤ ]

  P′ = D^{−1}W′ = [ P̃    B′_P            ].
                  [ B_P   B_P P̃^{−1} B′_P ]

A possible measure to compare different ways of selecting S̃ is the reconstruction error between the real P and the Nyström approximation P′,

  d_F(P, P′) = ‖P − P′‖_F = ‖C_P − B_P P̃^{−1} B′_P‖_F.
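The error formula only involves the four blocks of P, so it can be checked numerically with a short sketch (the helper name reconstruction_error is an assumption; when W has rank K and the leading block is invertible, the error is zero).

```python
import numpy as np

def reconstruction_error(P, K):
    P_tilde = P[:K, :K]     # P~ : K x K subsample block
    Bp_prime = P[:K, K:]    # B'_P
    Bp = P[K:, :K]          # B_P
    Cp = P[K:, K:]          # C_P, the block the Nystrom formula approximates
    # d_F(P, P') = || C_P - B_P P~^{-1} B'_P ||_F
    return np.linalg.norm(Cp - Bp @ np.linalg.inv(P_tilde) @ Bp_prime)
```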


SLIDE 10

Contents: Kernel KASP

Kernel KASP

3 Kernel KASP

KASP
KKASP

SLIDE 11

K-Means and KASP

Kernel KASP

K-Means scheme:

1 K initial centroids are chosen, {c⁰_k}_{k=1}^{K}.

2 Sample patterns x_p are associated with their nearest centroid, giving a first set of clusters {C⁰_k}_{k=1}^{K}, with x_p ∈ C⁰_k if k = arg min_ℓ ‖x_p − c⁰_ℓ‖.

3 The new centroids c¹_k are the means of the C⁰_k, which are used to define a new set of clusters C¹_k.

4 This is repeated until no changes are made.

This algorithm progressively minimizes the within-cluster sum of squares Σ_{k=1}^{K} Σ_{x_p ∈ C^i_k} ‖x_p − c^i_k‖².

K-Means-based Approximate Spectral Clustering (KASP)

It consists in using standard K-means to build a set of representative centroids over which spectral clustering is done. In order to compute d_F(P, P′), each centroid is approximated by its nearest pattern, and these pseudo-centroids are used as the subsample.
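The KASP pseudo-centroid selection can be sketched with SciPy's K-means. The helper name kasp_subsample is an assumption; the key step is replacing each centroid by its nearest sample pattern.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.spatial.distance import cdist

def kasp_subsample(X, K):
    # Standard K-means gives K representative centroids.
    centroids, _ = kmeans2(X, K, minit="++", seed=0)
    # Each centroid is approximated by its nearest sample pattern;
    # these pseudo-centroids form the subsample.
    idx = cdist(centroids, X).argmin(axis=1)
    return np.unique(idx)
```

Distinct centroids can map to the same nearest pattern, so the returned subsample may have fewer than K indices, matching the size collapse mentioned later in the experiments.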


SLIDE 12

Kernel K-Means and Kernel KASP

Kernel KASP

Kernel K-Means

K-means can be enhanced in a kernel setting by replacing the sample patterns x with non-linear extensions Φ(x). If Φ corresponds to a reproducing kernel K, the distances ‖Φ(x_p) − c^i_k‖² can be computed without working explicitly with Φ(x):

  ‖Φ(x_p) − c^i_k‖² = K(x_p, x_p) + (1/|C^i_k|²) Σ_{x_q, x_r ∈ C^i_k} K(x_q, x_r) − (2/|C^i_k|) Σ_{x_q ∈ C^i_k} K(x_p, x_q).

Thus the previous Euclidean K-means procedure extends straightforwardly to a kernel setting.

Our Proposal: Kernel KASP (kKASP)

Similar to the KASP approach, but based on kernel K-means. The centroids are not available explicitly, so they are substituted by the pseudo-centroids (with respect to the kernel).
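The kernelized distance formula can be transcribed directly from the kernel matrix alone; the helper name kernel_distances and its interface are assumptions. With a linear kernel the result must agree with ordinary Euclidean distances to the cluster mean, which gives a simple sanity check.

```python
import numpy as np

def kernel_distances(Kmat, labels, k):
    members = np.flatnonzero(labels == k)   # indices of cluster C_k
    n_k = len(members)
    # ||Phi(x_p) - c_k||^2 = K(x_p, x_p)
    #   + (1/|C_k|^2) sum_{q,r in C_k} K(x_q, x_r)
    #   - (2/|C_k|)   sum_{q in C_k}   K(x_p, x_q)
    return (np.diag(Kmat)
            + Kmat[np.ix_(members, members)].sum() / n_k ** 2
            - 2.0 * Kmat[:, members].sum(axis=1) / n_k)
```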


SLIDE 13

KKASP Algorithm

Kernel KASP

Algorithm

Require: S = (x_1, . . . , x_N); K, the subsample size.
1: Apply kernel K-means on S and select K pseudo-centroids S̃_K = {z_1, . . . , z_K};
2: Perform the eigenanalysis of the matrix P_K associated to S̃_K;
3: Compute the Nyström extensions Ṽ_K;
4: If desired, perform dimensionality reduction on the Ṽ_K and clustering.

The complexity analysis of the kKASP approach is easy:
Kernel K-means: O(KNI), with I the number of iterations, plus the cost O(N²) of pre-computing the similarity matrix.
Eigenanalysis of P: O(K³).
Nyström extensions: O(KN).

A DM over the entire sample would require the eigenanalysis of the complete matrix: O(N³).
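Steps 2 to 4 can be sketched end to end. This is only a structural illustration: it substitutes a plain random subsample for the kernel K-means pseudo-centroids of step 1, and the helper name kkasp_embedding is an assumption.

```python
import numpy as np

def kkasp_embedding(W, K, m, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1 stand-in: a random subsample instead of the kernel K-means
    # pseudo-centroids (assumption for the sake of the sketch).
    idx = rng.choice(len(W), size=K, replace=False)
    P = W / W.sum(axis=1, keepdims=True)        # full transition matrix
    # Step 2: eigenanalysis of the K x K block, cost O(K^3).
    lam, V = np.linalg.eig(P[np.ix_(idx, idx)])
    order = np.argsort(-lam.real)
    lam, V = lam[order].real, V[:, order].real
    # Step 3: Nystrom extension of the leading eigenvectors, cost O(KN).
    V_ext = (P[:, idx] @ V[:, :m + 1]) / lam[:m + 1]
    # Step 4: drop the trivial leading eigenvector.
    return V_ext[:, 1:]
```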


SLIDE 14

Contents: Numerical Experiments

Numerical Experiments

4 Numerical Experiments

Experimental Framework
Results

SLIDE 15

Framework (I)

Numerical Experiments: Experimental Framework

The similarity matrix W is defined with a Gaussian kernel, with width parameter σ set as the 10% percentile of all the distances. The distance d_F(P, P′) is used as a quality measure, where P = D^{−1}W is the transition probability matrix of SC.

Models:
Sr: random selection.
Sk: KASP selection.
Skk1: kKASP selection using as kernel parameter σ the 1% percentile. It is more local, thus producing more clusters.
Skk10: kKASP selection using as kernel parameter σ the 10% percentile. The kernel matrix is the similarity matrix W.

Sizes: 10, 50, 100, 200, 300, 400, 500, 750 and 1,000. For Sr and Sk these are the final sizes but, for Skk1 and Skk10, kernel K-means can collapse some of the clusters, giving a smaller subsample.


SLIDE 16

Framework (II)

Numerical Experiments: Experimental Framework

Datasets:
Synthetic fish-bowl (10,000 three-dimensional sample patterns).
Musk problem (6,598 patterns and 166 features).
Pen-based handwritten digits recognition problem (10,992 instances of dimension 16).
Image segmentation problem (2,310 randomized instances of dimension 19, from a database of 7 outdoor images).

Each experiment is repeated 25 times to deal with the dependence on the initialization of kernel K-means.


SLIDE 17

Results

Numerical Experiments: Results

[Figure: reconstruction error (median), on a log scale, as a function of the number of patterns, for Fishbowl (top left), Musk (top right), Digits (bottom left) and Image (bottom right), comparing Sr, Sk, Skk1 and Skk10.]

Skk1 and Skk10 are competitive with Sk, and all of them are better than the baseline Sr.


SLIDE 18

Contents: Conclusions

Conclusions

5 Conclusions

SLIDE 19

Conclusions and Further Work

Conclusions

Conclusions:
We have introduced kKASP, an extension of the K-means-based approximate spectral clustering procedure to a kernel framework.
We have compared the approximations of kKASP with those of KASP over four datasets, obtaining promising results in terms of the reconstruction error between P and P′.

Further Work:
The main goals of SC and DM are dimensionality reduction and clustering, for which the reconstruction error is just an initial metric that has to be refined. Other quality measures closer to the problem at hand can be used, such as confusion matrices comparing a full SC or DM clustering with their low-rank counterparts.
The results are given for the α = 0 case, while α = 1 is often a better choice.


SLIDE 20

Questions and Suggestions

Kernel K-Means Low Rank Approximation for Spectral Clustering and Diffusion Maps

Carlos M. Alaíz, Ángela Fernández, Yvonne Gala, José R. Dorronsoro

Thank you for your attention.