

SLIDE 1

Adaptive Affinity Matrix for Unsupervised Metric Learning

Yaoyi Li, Junxuan Chen, Yiru Zhao and Hongtao Lu

Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, P.R. China

July 2016

Yaoyi Li et al. (SJTU) Adaptive Affinity Matrix July 2016 1 / 18

SLIDE 2

Background: Spectral Clustering

Spectral clustering is a nonlinear feature reduction technique. The distribution of real data is not always uniform or Gaussian; spectral clustering can preserve the local neighborhood information.

[Figure: three scatter plots, panels (a), (b) and (c); axis ticks omitted]

SLIDE 3

Background: Spectral Clustering

Spectral clustering demonstrates splendid performance on many challenging data sets. Objective function:

y = argmin_{y^T D y = 1} Σ_{i,j} w_ij ||y_i − y_j||²₂,

where w_ij is the similarity between data samples x_i and x_j (a.k.a. the affinity graph).

Shortcomings of spectral clustering:

  • Out-of-sample extension is not straightforward
  • Cubic time complexity
  • Sensitive to the affinity graph
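The objective above reduces to a generalized eigenproblem on the graph Laplacian. A minimal illustrative sketch (the function and test data are ours, not from the slides; it assumes a small dense affinity matrix W):

```python
import numpy as np
from scipy.linalg import eigh

def spectral_embedding(W, dim):
    """Minimize sum_ij w_ij (y_i - y_j)^2 subject to y^T D y = 1 by solving
    the generalized eigenproblem L y = lambda D y, where L = D - W."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    vals, vecs = eigh(L, D)       # eigenvalues in ascending order
    # The first eigenvector is the trivial constant solution; skip it.
    return vecs[:, 1:dim + 1]
```

On two weakly linked pairs of points, the resulting 1-D embedding separates the two groups by sign.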

SLIDE 4

Background: Locality Preserving Projections

Locality Preserving Projections (LPP) [HN04] is the linear approximation of the Laplacian Eigenmap. LPP performs dimensionality reduction by solving the optimization problem:

a = argmin_{a^T X D X^T a = 1} Σ_{i,j} w_ij ||a^T x_i − a^T x_j||²₂

The superiority of LPP:

  • Explicit projection for out-of-sample extension
  • Reduced complexity
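LPP likewise amounts to a generalized eigenproblem, now on the projected data. A hedged sketch (our own minimal implementation, not the authors' code; the `reg` ridge term is a hypothetical addition for numerical stability):

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, W, dim, reg=1e-8):
    """X: d x n matrix with samples as columns; W: n x n affinity.
    Minimizes sum_ij w_ij (a^T x_i - a^T x_j)^2 s.t. a^T X D X^T a = 1,
    via the generalized eigenproblem X L X^T a = lambda X D X^T a."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    M_num = X @ L @ X.T
    M_den = X @ D @ X.T + reg * np.eye(X.shape[0])  # small ridge for stability
    vals, vecs = eigh(M_num, M_den)                 # ascending eigenvalues
    return vecs[:, :dim]                            # d x dim projection
```

Unlike spectral clustering, the returned matrix can project unseen samples directly, which is the out-of-sample advantage stated above.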

SLIDE 5

Motivation

The performance of spectral clustering methods depends heavily on the robustness of the affinity graph. Weighting methods such as the k-NN heat kernel can be corrupted by noise.

Our goal:

  • Learn a robust affinity graph efficiently by optimization.
  • Optimize the linear projection and the affinity graph simultaneously.

SLIDE 6

Related Works

  • Dominant Neighbors [PP07] reduces the noise of the affinity matrix via maximal cliques.
  • Consensus k-NNs [PK13] builds an affinity graph from consensus information.
  • ClustRF-Strct [ZLG14] constructs an affinity graph via clustering random forests.
  • CAN and PCAN [NWH14] learn data similarity and cluster structure simultaneously.

SLIDE 7

AdaAM: Assumption

Assumption 1: The affinity matrix W is a positive semidefinite matrix. Hence we have W = PP^T. This assumption also appears in [CC11].

Assumption 2: The ideal affinity matrix W is a low-rank matrix (1 for samples in the same class, 0 for the others).

[Diagram: factorization W = P × P^T]

SLIDE 8

AdaAM: Diagram

A glance of our algorithm

[Diagram: the low-rank ∆ from projection P is combined with the k-NN graph W; after sparsification, projection A yields the learned metric]

SLIDE 9

AdaAM: Intermediate Affinity Matrix ∆

Let ∆ be the intermediate affinity matrix, and assume ∆ = PP^T. Compute P by solving the optimization problem:

min_{P^T P = I} tr(X^T (D_∆ − PP^T) X)

⇒ min_{P^T P = I} tr(X^T D_∆ X) + tr(X^T (−PP^T) X)   (similar to spectral clustering)

When X is normalized to zero mean, we have D_∆ = 0. The problem above is then equivalent to:

P = argmax_{P^T P = I} tr(P^T X X^T P)
SLIDE 10

AdaAM: Final Adaptive Affinity Matrix

With the intermediate affinity matrix ∆, we can solve the following problem for a linear projection A:

A = argmin_{A^T A = I} tr(A^T X^T (L + L_∆) X A)

where L + L_∆ is the combination of the Laplacian of the k-NN heat kernel graph and that of the intermediate affinity matrix. With the linear projection A, we can rewrite the affinity optimization problem and update the matrix P (D_∆ = 0 still holds):

P = argmax_{P^T P = I} tr(P^T X A A^T X^T P)
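Given a projection A, the updated P is again a set of top singular vectors, now of the projected data XA. An illustrative sketch under the same conventions as before (X with one sample per row; the helper name is ours):

```python
import numpy as np

def update_P(X, A, rank):
    """Solve P = argmax tr(P^T X A A^T X^T P) s.t. P^T P = I:
    the top left singular vectors of the projected data X A."""
    Xc = X - X.mean(axis=0)            # keep D_Delta = 0, as on the slide
    U, _, _ = np.linalg.svd(Xc @ A, full_matrices=False)
    return U[:, :rank]
```

The returned columns are orthonormal, so P^T P = I is satisfied by construction.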

SLIDE 11

Experiments

We evaluate the proposed approach on five image data sets:

UMIST, COIL20, USPS, MNIST, ExYaleB

We impose the same parameter selection criteria on all the algorithms in our experiments:

  • neighborhood size k = Round(log2(n/c))
  • the projected dimension equals the number of classes

We denote 10 runs of k-Means as a round, and select the clustering result with the minimal within-cluster sum as the result of each round of k-Means.
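The neighborhood-size rule k = Round(log2(n/c)) can be written directly (a trivial helper, named by us):

```python
import math

def neighborhood_size(n, c):
    """k = Round(log2(n / c)) for n samples and c classes."""
    return round(math.log2(n / c))
```

For instance, 1024 samples in 4 classes give k = round(log2(256)) = 8.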

SLIDE 12

Accuracy

100 rounds of k-Means are applied to each algorithm to evaluate performance.

Table: Clustering accuracy on image data sets (%)

            AdaAM         k-NN          Cons-kNN      DN            ClustRF-Bi    PCAN-kMeans   PCAN
            Avg    Max    Avg    Max    Avg    Max    Avg    Max    Avg    Max    Avg    Max
  UMIST     66.06  75.65  58.16  65.39  60.27  69.22  59.15  66.96  64.63  74.44  53.79  56.52  55.30
  COIL20    74.72  87.29  71.89  81.18  75.53  84.31  71.95  82.01  76.50  85.07  72.28  83.75  81.74
  USPS      69.36  69.61  68.25  68.35  68.21  68.34  68.08  68.31  58.74  65.90  64.04  67.95  64.20
  MNIST     60.84  61.34  48.13  48.27  47.88  48.00  49.72  49.76  51.93  52.03  58.93  58.98  59.83
  ExYaleB   54.36  57.87  24.17  26.76  25.63  28.75  24.21  27.42  23.10  26.43  25.74  27.63  25.89

SLIDE 13

Accuracy

10 rounds of k-Means for the experiment on sensitivity to the neighborhood size.

[Figure: accuracy vs. neighborhood size (4–20) for kNN, cons-kNN, DN, ClustRF-Bi, PCAN-kMeans and AdaAM; panels (d) UMIST and (e) COIL20]

Figure: Comparison between different methods with different neighborhood sizes k

SLIDE 14

Accuracy

[Figure: accuracy vs. neighborhood size (4–20) for kNN, cons-kNN, DN, ClustRF-Bi, PCAN-kMeans and AdaAM; panels (a) USPS and (b) ExYaleB]

Figure: Comparison between different methods with different neighborhood sizes k

AdaAM requires more information from the pairwise similarity, so for small k it sometimes does not perform well.

SLIDE 15

Time Consumption

[Figure: time consumption (seconds) of kNN, cons-kNN, DN, ClustRF-Bi, PCAN-kMeans and AdaAM for different numbers of data instances]

Figure: Time consumption of six approaches with different numbers of data instances

SLIDE 16

Conclusion & Future Work

Conclusion:

  • We present a novel affinity learning approach for unsupervised metric learning.
  • The affinity matrix is learned within the same framework as spectral clustering.
  • The affinity learning reduces to a singular value decomposition problem.
  • We employ the low-rank trick to make our approach more efficient.

Future Work:

  • A better way to learn the sparsification parameter
  • A better way to fuse the low-rank ∆ and the k-NN W
  • More applications

SLIDE 17

Thanks

Thanks for your attention.

SLIDE 18

References

[CC11] Xinlei Chen and Deng Cai, Large scale spectral clustering with landmark-based representation, AAAI, 2011.

[HN04] Xiaofei He and Partha Niyogi, Locality preserving projections, NIPS, vol. 16, 2004, p. 153.

[NWH14] Feiping Nie, Xiaoqian Wang, and Heng Huang, Clustering and projected clustering with adaptive neighbors, Proc. 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2014, pp. 977–986.

[PK13] Vittal Premachandran and Ramakrishna Kakarala, Consensus of k-NNs for robust neighborhood selection on graph-based manifolds, CVPR, IEEE, 2013, pp. 1594–1601.

[PP07] Massimiliano Pavan and Marcello Pelillo, Dominant sets and pairwise clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (2007), no. 1, 167–172.

[ZLG14] Xiatian Zhu, Chen Change Loy, and Shaogang Gong, Constructing robust affinity graphs for spectral clustering, CVPR, IEEE, 2014, pp. 1450–1457.
