Distance Metric Learning with Joint Representation Diversification (PowerPoint presentation)


  1. Distance Metric Learning with Joint Representation Diversification. Xu Chu (1,2), Yang Lin (1,2), Yasha Wang (2,3), Xiting Wang (4), Hailong Yu (1,2), Xin Gao (1,2), Qi Tong (2,5). (1) School of Electronics Engineering and Computer Science, Peking University; (2) Key Laboratory of High Confidence Software Technologies, Ministry of Education; (3) National Engineering Research Center of Software Engineering, Peking University; (4) Microsoft Research Asia; (5) School of Software and Microelectronics, Peking University. July 14, 2020.

  2. The goal of distance metric learning (DML): learn a mapping f_θ from the original feature space to a representation space in which similar examples lie closer together than dissimilar examples.
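To make the mapping concrete, here is a minimal sketch, assuming a small PyTorch embedding network; the architecture, dimensions, and use of L2-normalized embeddings are illustrative assumptions, not the presentation's setup.

```python
# Minimal sketch (not from the presentation): a learned mapping f_theta sends inputs
# to an embedding space where distances are meant to reflect semantic similarity.
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    def __init__(self, in_dim=512, emb_dim=128):
        super().__init__()
        self.f_theta = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, x):
        z = self.f_theta(x)
        return nn.functional.normalize(z, dim=-1)  # unit-norm embeddings

f = EmbeddingNet()
x_anchor, x_pos, x_neg = torch.randn(3, 4, 512).unbind(0)  # toy batch of 4 triplets
d_pos = (f(x_anchor) - f(x_pos)).norm(dim=-1)  # should become small after training
d_neg = (f(x_anchor) - f(x_neg)).norm(dim=-1)  # should become large after training
```

Training then adjusts θ so that d_pos shrinks and d_neg grows, using a loss such as those on the next slide.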

  3. The training objectives of deep DML methods encourage intra-class compactness and inter-class separability.
     Embedding Loss:
       Contrastive loss [Chopra et al., 2005]: ℓ_contrastive = [d(x_a, x_p) − m_pos]_+ + [m_neg − d(x_a, x_n)]_+
       Triplet loss [Schroff et al., 2015]: ℓ_triplet = [d(x_a, x_p) − d(x_a, x_n) + m]_+
     Classification Loss:
       AMSoftmax loss [Wang et al., 2018]: ℓ_AM = −log [ e^{s(Sim(x_i, w_{y_i}) − m)} / ( e^{s(Sim(x_i, w_{y_i}) − m)} + Σ_{j≠y_i}^{C} e^{s·Sim(x_i, w_j)} ) ]
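As a concrete reference, here is a hedged PyTorch sketch of the three losses listed above; the margins m_pos, m_neg, m and the scale s are illustrative defaults, not the presentation's settings.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_a, z_p, z_n, m_pos=0.0, m_neg=1.0):
    # [d(x_a, x_p) - m_pos]_+ + [m_neg - d(x_a, x_n)]_+
    d_ap = (z_a - z_p).norm(dim=-1)
    d_an = (z_a - z_n).norm(dim=-1)
    return (F.relu(d_ap - m_pos) + F.relu(m_neg - d_an)).mean()

def triplet_loss(z_a, z_p, z_n, m=0.2):
    # [d(x_a, x_p) - d(x_a, x_n) + m]_+
    d_ap = (z_a - z_p).norm(dim=-1)
    d_an = (z_a - z_n).norm(dim=-1)
    return F.relu(d_ap - d_an + m).mean()

def am_softmax_loss(z, y, W, s=30.0, m=0.35):
    # -log e^{s(Sim - m)} / (e^{s(Sim - m)} + sum_{j != y_i} e^{s Sim_j}), Sim = cosine similarity
    sim = F.normalize(z, dim=-1) @ F.normalize(W, dim=0)   # (N, C) cosine similarities to class weights
    one_hot = F.one_hot(y, num_classes=sim.size(1)).bool()
    logits = s * torch.where(one_hot, sim - m, sim)        # subtract margin m only on the true class
    return F.cross_entropy(logits, y)                      # softmax cross-entropy yields the -log ratio
```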

  4. Trade-off between intra-class compactness and inter-class separability. Enforcing intra-class compactness risks filtering out useful factors (a particular concern for open-set classification); enforcing inter-class separability risks introducing nuisance factors. [Figure: seen classes (Blue Jay, Florida Jay) versus unseen classes (Hooded Warbler, Yellow Warbler, Wilson Warbler, Orange Crowned Warbler).]

  5. Motivation: Is it possible to find a better balance between intra-class compactness and inter-class separability? How can the hierarchical representations of DNNs be leveraged to improve the DML representation? Results: (1) Additional explicit penalization of intra-class distances of representations is risky for classification-loss methods (AMSoftmax). (2) Encouraging inter-class separability by penalizing distributional similarities of joint representations is beneficial for classification-loss methods (AMSoftmax). (3) We propose a framework, distance metric learning with joint representation diversification (JRD), built on this observation (a hedged sketch of such an objective follows below).
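The following is a high-level sketch, not the authors' exact objective, of how results (2) and (3) fit together: keep a classification loss such as AMSoftmax (reusing am_softmax_loss from the earlier sketch) and add a penalty on the distributional similarity of joint multi-layer representations between different classes. `joint_similarity` is a hypothetical placeholder (a kernel-based estimate of such a similarity is sketched after the definitions on the next slide), and the weight `lam` is an illustrative assumption.

```python
def jrd_style_objective(feats_by_layer, y, W, joint_similarity, lam=0.1):
    # feats_by_layer: list of per-layer representations, each of shape (N, d_l);
    # the last entry is the embedding fed to the AMSoftmax classifier.
    cls_loss = am_softmax_loss(feats_by_layer[-1], y, W)
    div_penalty = 0.0
    classes = y.unique()
    for i, c1 in enumerate(classes):
        for c2 in classes[i + 1:]:
            f1 = [f[y == c1] for f in feats_by_layer]  # joint representation samples of class c1
            f2 = [f[y == c2] for f in feats_by_layer]  # joint representation samples of class c2
            div_penalty = div_penalty + joint_similarity(f1, f2)
    return cls_loss + lam * div_penalty  # diversify: keep cross-class joint distributions dissimilar
```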

  6. Challenge: How to measure the similarities of joint distributions of representations across multiple layers? Solution: representers of probability measures in a reproducing kernel Hilbert space (RKHS).
     Definition 1 (kernel mean embedding). Let M¹₊(X) be the space of all probability measures P on a measurable space (X, Σ), and let RKHS be a reproducing kernel Hilbert space with reproducing kernel k. The kernel mean embedding is the mapping μ : M¹₊(X) → RKHS, P ↦ ∫ k(·, x) dP(x) =: μ_P.
     Definition 2 (cross-covariance operator). Let M¹₊(×_{l=1}^L X_l) be the space of all probability measures P on ×_{l=1}^L X_l, and let ⊗_{l=1}^L RKHS_l = RKHS_1 ⊗ ··· ⊗ RKHS_L be the tensor-product space with reproducing kernels {k_l}_{l=1}^L. The cross-covariance operator is the mapping C_{X^{1:L}} : M¹₊(×_{l=1}^L X_l) → ⊗_{l=1}^L RKHS_l, P ↦ ∫_{×_{l=1}^L X_l} (⊗_{l=1}^L k_l(·, x^l)) dP(x¹, …, x^L) =: C_{X^{1:L}}(P).
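A minimal sketch of how both definitions can be estimated from samples, assuming Gaussian RBF kernels and plain empirical averages (kernel choice and bandwidths are illustrative assumptions): inner products of (joint) embeddings reduce to averages of kernel values over sample pairs, and the tensor-product kernel is the product of per-layer kernels.

```python
import torch

def rbf_kernel(a, b, sigma=1.0):
    # k(x, z) = exp(-||x - z||^2 / (2 sigma^2)); a: (n, d), b: (m, d) -> (n, m) kernel matrix
    return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))

def mean_embedding_inner(x, z, sigma=1.0):
    # Empirical <mu_P, mu_Q> (Definition 1): average of k(x_i, z_j) over all sample pairs.
    return rbf_kernel(x, z, sigma).mean()

def cross_covariance_inner(xs, zs, sigmas=None):
    # Empirical <C_{X^{1:L}}(P), C_{X^{1:L}}(Q)> (Definition 2): the tensor-product kernel
    # evaluates to the product of per-layer kernels, so the estimate is the average over
    # sample pairs of prod_l k_l(x_i^l, z_j^l).
    # xs, zs: lists of per-layer samples, xs[l]: (n, d_l), zs[l]: (m, d_l).
    sigmas = sigmas if sigmas is not None else [1.0] * len(xs)
    K = torch.ones(xs[0].shape[0], zs[0].shape[0], device=xs[0].device)
    for x_l, z_l, s_l in zip(xs, zs, sigmas):
        K = K * rbf_kernel(x_l, z_l, s_l)
    return K.mean()
```

An estimate like cross_covariance_inner could play the role of the joint_similarity placeholder in the earlier JRD-style sketch: a large value means two classes' joint representation distributions look alike, which is exactly what the diversification term penalizes.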

