G²DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition (PowerPoint PPT presentation)
  1. G²DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition. Qilong Wang¹, Peihua Li¹, Lei Zhang². ¹Dalian University of Technology, ²Hong Kong Polytechnic University.

  2. Tendency of CNN architectures: LeNet-5 → AlexNet-8 → VGG-VD-19 / GoogLeNet-22 → ResNet-152 / Inception-V4. CNN architectures tend to be deeper and wider, and more accurate! Only convolution, non-linearity (ReLU), and pooling.

  3. Trainable structural layers. Modeling the outputs of the last convolutional layer with trainable structural layers (images → conv. layers → structural layer → loss): O²P layer (log-COV) [DeepO²P, ICCV'15]; bilinear pooling (COV) [B-CNN, ICCV'15]; mean map embedding [DMMs, arXiv'15]; VLAD coding [NetVLAD, CVPR'16].

  4. Trainable structural layers: fine-grained visual classification. B-CNN [D,D] (84.1, 84.1, 91.3) vs. VGG-VD16 (76.4, 74.1, 79.8): ~8% improvement. T.-Y. Lin, A. RoyChowdhury, and S. Maji. Bilinear CNN models for fine-grained visual recognition. In ICCV, 2015.

  5. Trainable structural layers: place recognition (Pitts30k). NetVLAD + AlexNet (85.6) vs. AlexNet (69.8): ~15% improvement. R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, and J. Sivic. NetVLAD: CNN architecture for weakly supervised place recognition. In CVPR, 2016.

  6. Trainable structural layers: scene categorization (Place205). DMMs + GoogLeNet (49.00) vs. GoogLeNet (47.5). J. B. Oliva, D. J. Sutherland, B. Póczos, and J. G. Schneider. Deep mean maps. arXiv, abs/1511.04150, 2015.

  7. Trainable structural layers: scene categorization (Place205). DMMs + GoogLeNet (49.00) vs. GoogLeNet (47.5). Integration of trainable structural layers into deep CNNs achieves significant improvements in many challenging vision tasks. J. B. Oliva, D. J. Sutherland, B. Póczos, and J. G. Schneider. Deep mean maps. arXiv, abs/1511.04150, 2015.

  8. Parametric probability distribution modeling. ① Models abundant statistics of features. ② Produces fixed-size representations regardless of varying feature sizes. Promising modeling performance (> coding methods): Gaussian (Nakayama et al., CVPR'10), Gaussian-Laplacian model (Serra et al., CVIU'15), Gaussian mixture model (Wang et al., CVPR'16). High computational efficiency: closed-form solution for parameter estimation.
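The closed-form parameter estimation mentioned above is what keeps Gaussian modeling cheap. A minimal numpy sketch (variable names and sizes are illustrative, not from the paper), treating the rows of X as N local convolutional features:

```python
import numpy as np

# Maximum-likelihood estimation of a global Gaussian from N local
# d-dimensional features (rows of X) is closed-form: no iterative fitting.
rng = np.random.default_rng(0)
N, d = 196, 8                        # e.g. a 14x14 feature map with 8 channels
X = rng.standard_normal((N, d))

mu = X.mean(axis=0)                  # mean vector
Sigma = (X - mu).T @ (X - mu) / N    # (biased) ML covariance

# Identity exploited later by the embedding layer:
# Sigma + mu mu^T = (1/N) X^T X
assert np.allclose(Sigma + np.outer(mu, mu), X.T @ X / N)
```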

  9. Embedding of global Gaussian in CNN: images → conv. layers → global Gaussian → loss.

  10. Global Gaussian distribution embedding network (G²DeNet). Global Gaussian: $\mathcal{N}(\mu, \Sigma) \mapsto \begin{pmatrix} \Sigma + \mu\mu^T & \mu \\ \mu^T & 1 \end{pmatrix}^{1/2}$. The global Gaussian embedding layer (images → conv. layers → embedding layer → loss) consists of a matrix partition sub-layer, $Y = f_{\mathrm{MPL}}(X) = \frac{1}{N}\left(A X^T X A^T + 2\left[A X^T \mathbf{1} b^T\right]_{\mathrm{sym}}\right) + B$, followed by a square-rooted SPD matrix sub-layer, $Z = f_{\mathrm{ESRL}}(Y) = Y^{1/2}$. • A trainable global Gaussian embedding layer for modeling convolutional features. • The first attempt to plug a parametric probability distribution into deep CNNs.

  11. Challenges. Q: How to construct our trainable global Gaussian embedding layer? A: The key is to give explicit forms of the Gaussian distributions. Forward propagation must respect the Riemannian geometry and algebraic structure of the space of Gaussians; backward propagation requires the layer to be differentiable.

  12. Gaussian embedding. The space of Gaussians is a Riemannian manifold with special geometric structure; [TPAMI'17] shows it is also equipped with a Lie group structure. Via Cholesky decomposition $\Sigma = L L^T$, a Gaussian $\mathcal{N}(\mu, \Sigma)$ is identified with the positive upper triangular matrix $A_{(\mu, L)} = \begin{pmatrix} L & \mu \\ \mathbf{0}^T & 1 \end{pmatrix}$; the left polar decomposition $A = P O$ then maps it to the SPD matrix $P = (A A^T)^{1/2} = \begin{pmatrix} \Sigma + \mu\mu^T & \mu \\ \mu^T & 1 \end{pmatrix}^{1/2}$. [TPAMI'17] Peihua Li, Qilong Wang et al. Local Log-Euclidean Multivariate Gaussian Descriptor and Its Application to Image Classification. TPAMI, 2017.
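The Cholesky-then-polar chain above can be verified numerically. A numpy sketch (assumptions: a random SPD covariance, numpy's lower-triangular Cholesky; the block matrix $A A^T$ is unchanged by the triangular orientation of L):

```python
import numpy as np

# Gaussian -> triangular matrix -> SPD matrix chain:
# Sigma = L L^T (Cholesky), A = [[L, mu], [0, 1]],
# left polar decomposition A = P O with P = (A A^T)^{1/2} SPD, O orthogonal.
rng = np.random.default_rng(1)
d = 4
mu = rng.standard_normal(d)
M = rng.standard_normal((d, d))
Sigma = M @ M.T + d * np.eye(d)          # a valid SPD covariance

L = np.linalg.cholesky(Sigma)            # Sigma = L L^T
A = np.block([[L, mu[:, None]],
              [np.zeros((1, d)), np.ones((1, 1))]])

G = np.block([[Sigma + np.outer(mu, mu), mu[:, None]],
              [mu[None, :], np.ones((1, 1))]])
assert np.allclose(A @ A.T, G)           # A A^T recovers the Gaussian's SPD form

w, U = np.linalg.eigh(G)                 # SPD square root via eigendecomposition
P = U @ np.diag(np.sqrt(w)) @ U.T        # P = (A A^T)^{1/2}: the embedding
O = np.linalg.solve(P, A)                # remaining polar factor, O = P^{-1} A
assert np.allclose(O @ O.T, np.eye(d + 1), atol=1e-6)
```

The last assertion confirms that O is orthogonal, i.e. A = P O really is a polar decomposition.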

  13. Global Gaussian embedding layer. Gaussian embedding: $\mathcal{N}(\mu, \Sigma) \mapsto P = \begin{pmatrix} \Sigma + \mu\mu^T & \mu \\ \mu^T & 1 \end{pmatrix}^{1/2}$. 1. Matrix partition sub-layer: $Y = f_{\mathrm{MPL}}(X) = \frac{1}{N}\left(A X^T X A^T + 2\left[A X^T \mathbf{1} b^T\right]_{\mathrm{sym}}\right) + B$, so Y is a function of the convolutional features X. 2. Square-rooted SPD matrix sub-layer: $Z = f_{\mathrm{ESRL}}(Y) = Y^{1/2} = U \Lambda^{1/2} U^T$, computing the square root of Y via SVD.
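A minimal numpy sketch of this forward pass, assuming X stores N local d-dimensional features as rows, with A, b, B chosen as the fixed padding matrices that place $X^T X$, $X^T \mathbf{1}$ and the constant 1 into a (d+1)×(d+1) block matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 49, 6                                   # e.g. a 7x7 map, 6 channels
X = rng.standard_normal((N, d))
one = np.ones((N, 1))

A = np.vstack([np.eye(d), np.zeros((1, d))])   # (d+1) x d, pads a zero row
b = np.zeros((d + 1, 1)); b[-1, 0] = 1.0       # selects the last row/column
B = b @ b.T                                    # puts 1 in the bottom-right corner
sym = lambda M: 0.5 * (M + M.T)

# 1. Matrix partition sub-layer (MPL)
Y = (A @ X.T @ X @ A.T + 2 * sym(A @ X.T @ one @ b.T)) / N + B

# Y equals the Gaussian block matrix [[Sigma + mu mu^T, mu], [mu^T, 1]]
mu = X.mean(axis=0)
Sigma = np.cov(X.T, bias=True)
G = np.block([[Sigma + np.outer(mu, mu), mu[:, None]],
              [mu[None, :], np.ones((1, 1))]])
assert np.allclose(Y, G)

# 2. Square-rooted SPD matrix sub-layer (ESRL): Z = Y^{1/2}
w, U = np.linalg.eigh(Y)
Z = U @ np.diag(np.sqrt(w)) @ U.T
assert np.allclose(Z @ Z, Y)
```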

  14. BP for the global Gaussian embedding layer. With $X \to Y = f_{\mathrm{MPL}}(X) \to Z = f_{\mathrm{ESRL}}(Y)$ as above, the goal is to compute $\partial f / \partial X$; the first step is to compute $\partial f / \partial Y$.

  15. BP for the square-rooted SPD matrix sub-layer. With the eigendecomposition $Y = U \Lambda U^T$, compute $\partial f / \partial Y$ from $\partial f / \partial U$ and $\partial f / \partial \Lambda$ [DeepO²P, ICCV'15]: from $\mathrm{d}U = 2 U \left[K^T \circ (U^T \mathrm{d}Y\, U)\right]_{\mathrm{sym}}$ and $\mathrm{d}\Lambda = (U^T \mathrm{d}Y\, U)_{\mathrm{diag}}$, we obtain $\frac{\partial f}{\partial Y} = U \left( 2\left[K^T \circ \left(U^T \frac{\partial f}{\partial U}\right)\right]_{\mathrm{sym}} + \left(\frac{\partial f}{\partial \Lambda}\right)_{\mathrm{diag}} \right) U^T$, where $K_{ij} = 1/(\lambda_i - \lambda_j)$ for $i \neq j$ and $K_{ij} = 0$ otherwise. [DeepO²P, ICCV'15]: Catalin Ionescu et al. Matrix Backpropagation for Deep Networks with Structured Layers. ICCV, 2015.
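A numpy sketch of this matrix-backprop rule (the helper name and the toy loss are illustrative). The sanity check uses $f(Y) = \frac{1}{2}\sum_i \lambda_i^2 = \frac{1}{2}\|Y\|_F^2$, whose true gradient is Y itself:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
M = rng.standard_normal((n, n))
Y = M @ M.T + n * np.eye(n)            # SPD with distinct eigenvalues (a.s.)
lam, U = np.linalg.eigh(Y)

sym = lambda A: 0.5 * (A + A.T)

def backprop_eig(U, lam, dF_dU, dF_dlam):
    """Assemble dF/dY from gradients w.r.t. eigenvectors and eigenvalues."""
    diff = lam[:, None] - lam[None, :]
    # K_ij = 1/(lam_i - lam_j) off the diagonal, 0 on it
    K = np.where(np.eye(len(lam), dtype=bool), 0.0,
                 1.0 / np.where(diff == 0, 1.0, diff))
    inner = 2.0 * sym(K.T * (U.T @ dF_dU)) + np.diag(dF_dlam)
    return U @ inner @ U.T

# f depends only on the eigenvalues: dF_dU = 0, dF_dlam = lam
dF_dY = backprop_eig(U, lam, np.zeros_like(U), lam)
assert np.allclose(dF_dY, Y)           # matches the analytic gradient
```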

  16. BP for the square-rooted SPD matrix sub-layer. Compute $\partial f / \partial U$ and $\partial f / \partial \Lambda$ from $\partial f / \partial Z$. With $Z = f_{\mathrm{ESRL}}(Y) = Y^{1/2} = U \Lambda^{1/2} U^T$, we have $\mathrm{d}Z = 2\left[\mathrm{d}U\, \Lambda^{1/2} U^T\right]_{\mathrm{sym}} + \frac{1}{2} U \Lambda^{-1/2} \mathrm{d}\Lambda\, U^T$, which gives $\frac{\partial f}{\partial U} = 2 \left[\frac{\partial f}{\partial Z}\right]_{\mathrm{sym}} U \Lambda^{1/2}$ and $\frac{\partial f}{\partial \Lambda} = \frac{1}{2} \Lambda^{-1/2}\, U^T \frac{\partial f}{\partial Z}\, U$.
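A numpy sketch of this chain-rule step (helper name and toy loss are illustrative). The check uses $f(Z) = \mathrm{tr}(Z) = \sum_i \sqrt{\lambda_i}$, so $\partial f / \partial Z = I$ and the true eigenvalue gradient is $\frac{1}{2}\lambda_i^{-1/2}$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n))
Y = M @ M.T + n * np.eye(n)            # SPD input to the sub-layer
lam, U = np.linalg.eigh(Y)             # Z = U diag(sqrt(lam)) U^T

sym = lambda A: 0.5 * (A + A.T)

def backprop_sqrt_factors(U, lam, dF_dZ):
    """Propagate dF/dZ back to the eigen-factors U and Lambda."""
    dF_dU = 2.0 * sym(dF_dZ) @ U @ np.diag(np.sqrt(lam))
    dF_dlam_mat = 0.5 * np.diag(1.0 / np.sqrt(lam)) @ U.T @ dF_dZ @ U
    return dF_dU, np.diag(dF_dlam_mat)  # only the diagonal of dF/dLambda is used

dF_dU, dF_dlam = backprop_sqrt_factors(U, lam, np.eye(n))
assert np.allclose(dF_dlam, 0.5 / np.sqrt(lam))
assert np.allclose(dF_dU, 2.0 * U @ np.diag(np.sqrt(lam)))
```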

  17. BP for the global Gaussian embedding layer. The goal is to compute $\partial f / \partial X$ given $\partial f / \partial Y$. With $Y = f_{\mathrm{MPL}}(X) = \frac{1}{N}\left(A X^T X A^T + 2\left[A X^T \mathbf{1} b^T\right]_{\mathrm{sym}}\right) + B$, we obtain $\frac{\partial f}{\partial X} = \frac{2}{N}\left(X A^T + \mathbf{1} b^T\right) \left[\frac{\partial f}{\partial Y}\right]_{\mathrm{sym}} A$.
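A numpy sketch that checks this gradient against central finite differences, under the same assumptions on A, b, B as before and with an illustrative toy loss $f = \frac{1}{2}\|Y\|_F^2$ (so that $\partial f / \partial Y = Y$):

```python
import numpy as np

rng = np.random.default_rng(5)
N, d = 7, 3
X = rng.standard_normal((N, d))
one = np.ones((N, 1))
A = np.vstack([np.eye(d), np.zeros((1, d))])
b = np.zeros((d + 1, 1)); b[-1, 0] = 1.0
B = b @ b.T
sym = lambda M: 0.5 * (M + M.T)

def Y_of(X):
    return (A @ X.T @ X @ A.T + 2 * sym(A @ X.T @ one @ b.T)) / N + B

def f(X):                               # toy loss: f = (1/2)||Y||_F^2
    Y = Y_of(X)
    return 0.5 * np.sum(Y * Y)

# Analytic gradient from the slide: dF/dX = (2/N)(X A^T + 1 b^T)[dF/dY]_sym A
dF_dY = Y_of(X)                         # since dF/dY = Y for this loss
dF_dX = (2.0 / N) * (X @ A.T + one @ b.T) @ sym(dF_dY) @ A

# Central finite-difference check, entry by entry
eps = 1e-5
num = np.zeros_like(X)
for i in range(N):
    for j in range(d):
        Xp = X.copy(); Xp[i, j] += eps
        Xm = X.copy(); Xm[i, j] -= eps
        num[i, j] = (f(Xp) - f(Xm)) / (2 * eps)
assert np.allclose(dF_dX, num, atol=1e-6)
```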

  18. Global Gaussian distribution embedding network (G²DeNet). Summary: the global Gaussian embedding layer (matrix partition sub-layer, then square-rooted SPD matrix sub-layer) sits between the convolutional layers and the loss. • Gaussian embedding: $\mathcal{N}(\mu, \Sigma) \mapsto \begin{pmatrix} \Sigma + \mu\mu^T & \mu \\ \mu^T & 1 \end{pmatrix}^{1/2}$. • Structural backpropagation of $\partial f / \partial X$ and $\partial f / \partial Y$.

  19. Experiments on MS-COCO. 890k segmented instances from the MS-COCO dataset: 80 classes, ~600k training instances, ~290k validation ones. Comparison of classification errors on MS-COCO:
      DeepO²P [ICCV'15]: 28.6 | DeepO²P-FC (S) [ICCV'15]: 28.9 | DeepO²P-FC [ICCV'15]: 25.2
      G²DeNet (Ours): 24.4 | G²DeNet-FC (S) (Ours): 22.6 | G²DeNet-FC (Ours): 21.5
      (Figure: convergence curve of our G²DeNet-FC with AlexNet on MS-COCO.)

  20. Experiments on MS-COCO. 890k segmented instances from the MS-COCO dataset: 80 classes, ~600k training instances, ~290k validation ones. Comparison of classification errors on MS-COCO:
      AlexNet (baseline): 25.3 | DeepO²P [ICCV'15]: 28.6 | DeepO²P-FC (S) [ICCV'15]: 28.9 | DeepO²P-FC [ICCV'15]: 25.2
      DMMs-FC [arXiv'15]: 24.6 | G²DeNet (Ours): 24.4 | G²DeNet-FC (S) (Ours): 22.6 | G²DeNet-FC (Ours): 21.5
      (Figure: convergence curve of our G²DeNet-FC with AlexNet on MS-COCO.)

  21. Experiments on FGVR: benchmarks.
      Birds CUB-200-2011: 200 classes, 5,994 training / 5,794 test
      FGVC-Aircraft: 100 classes, 6,667 training / 3,333 test
      FGVC-Cars: 196 classes, 8,144 training / 8,041 test

  22. Experiments on FGVR: results. Comparison of different counterparts using VGG-VD16 without bounding box & part annotations, sharing the same settings as B-CNN.
      Methods           | Birds CUB-200-2011 | FGVC-Aircraft | FGVC-Cars
      FC-CNN            | 76.4               | 74.1          | 79.8
      FV-CNN            | 77.5               | 77.6          | 85.7
      VLAD-CNN          | 79.0               | 80.6          | 85.6
      NetFV [TPAMI'17]  | 79.9               | 79.0          | 86.2
      NetVLAD [CVPR'16] | 81.9               | 81.8          | 88.6
      B-CNN [ICCV'15]   | 84.1               | 84.1          | 91.3
      G²DeNet (Ours)    | 87.1               | 89.0          | 92.5
      NetFV [TPAMI'17]: Lin et al. Bilinear CNNs for Fine-grained Visual Recognition. TPAMI, 2017.
