G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition
Qilong Wang1 Peihua Li1 Lei Zhang2
1Dalian University of Technology, 2Hong Kong Polytechnic University
Tendency of CNN architectures
LeNet-5 ……
Bilinear pooling (COV) [B-CNN, ICCV’15]
O2P layer (LogCOV) [DeepO2P, ICCV’15]
Mean Map Embedding [DMMs, arXiv’15]
VLAD Coding [NetVLAD, CVPR’16]
T.-Y. Lin, A. RoyChowdhury, and S. Maji. Bilinear CNN models for fine-grained visual recognition. In ICCV, 2015.
NetVLAD (+AlexNet) (85.6) vs. AlexNet (69.8)
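R. Arandjelović, P. Gronat, A. Torii, T. Pajdla, and J. Sivic. NetVLAD: CNN architecture for weakly supervised place recognition. In CVPR, 2016.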
Gaussian Distribution Gaussian Mixture Model
A trainable global Gaussian embedding layer for modeling convolutional features. The first attempt to plug a parametric probability distribution into deep CNNs.
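Global Gaussian embedding layer:
X -> Matrix Partition Sub-layer -> Y -> Square-rooted SPD Matrix Sub-layer -> Z

Matrix Partition Sub-layer:
$$f_{MPL}(X) = \frac{1}{N} A X^T X A^T + \frac{2}{N}\left(A X^T \mathbf{1} b^T\right)_{sym} + B$$

Square-rooted SPD Matrix Sub-layer:
$$f_{ESRL}(Y) = Y^{1/2}$$

Global Gaussian:
$$\mathcal{N}(\mu, \Sigma) \mapsto \begin{bmatrix} \Sigma + \mu\mu^T & \mu \\ \mu^T & 1 \end{bmatrix}^{1/2}$$

Loss: $f(Z)$. Backward pass: $\partial f(Z)/\partial Y$, then $\partial f(Z)/\partial X$.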
Q: How to construct our trainable global Gaussian embedding layer?
A: The key is to give the explicit forms of Gaussian distributions.
Forward Propagation: Riemannian geometry structure, algebraic structure
Backward Propagation: differentiable
[TPAMI’17] shows that the space of Gaussians is equipped with a Lie group structure. The space of Gaussians is a Riemannian manifold with a special geometric structure.
[TPAMI’17] Peihua Li, Qilong Wang et al. Local Log-Euclidean Multivariate Gaussian Descriptor and Its Application to Image Classification. TPAMI, 2017.
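Gaussian -> Positive upper triangular matrix -> SPD matrix

Cholesky decomp. ($\Sigma^{-1} = L L^T$):
$$\mathcal{N}(\mu, \Sigma) \mapsto A_{\mu, L^{-T}} = \begin{bmatrix} L^{-T} & \mu \\ \mathbf{0}^T & 1 \end{bmatrix}$$

Left polar decomp. ($A_{\mu, L^{-T}} = P O$):
$$P = \begin{bmatrix} \Sigma + \mu\mu^T & \mu \\ \mu^T & 1 \end{bmatrix}^{1/2}$$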
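The chain Gaussian -> positive upper triangular matrix -> SPD matrix can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: it assumes the Cholesky factor is taken of the inverse covariance (so that L^{-T} L^{-1} equals the covariance), and the helper names `gaussian_to_upper_triangular`, `spd_sqrt`, and `left_polar_spd_factor` are hypothetical. It builds the upper triangular matrix for a toy Gaussian, takes the SPD factor of its left polar decomposition, and compares its square with the block matrix [Σ + μμ^T, μ; μ^T, 1].

```python
import numpy as np

def gaussian_to_upper_triangular(mu, sigma):
    """Map N(mu, sigma) to [[L^{-T}, mu], [0^T, 1]], assuming sigma^{-1} = L L^T."""
    d = mu.shape[0]
    L = np.linalg.cholesky(np.linalg.inv(sigma))  # lower triangular Cholesky factor
    A = np.zeros((d + 1, d + 1))
    A[:d, :d] = np.linalg.inv(L).T                # L^{-T} is upper triangular
    A[:d, d] = mu
    A[d, d] = 1.0
    return A

def spd_sqrt(M):
    """Symmetric square root of an SPD matrix via its eigendecomposition."""
    lam, U = np.linalg.eigh(M)
    return (U * np.sqrt(lam)) @ U.T

def left_polar_spd_factor(A):
    """SPD factor P of the left polar decomposition A = P O, i.e. P = (A A^T)^{1/2}."""
    return spd_sqrt(A @ A.T)

# Toy Gaussian (illustrative values only).
rng = np.random.default_rng(0)
d = 3
mu = rng.normal(size=d)
R = rng.normal(size=(d, d))
sigma = R @ R.T + d * np.eye(d)                   # SPD covariance

A = gaussian_to_upper_triangular(mu, sigma)
P = left_polar_spd_factor(A)
O = np.linalg.solve(P, A)                         # orthogonal factor, O = P^{-1} A

# Block matrix whose square root is the Gaussian embedding.
E = np.block([[sigma + np.outer(mu, mu), mu[:, None]],
              [mu[None, :], np.ones((1, 1))]])

print("P @ P matches the block matrix:", np.allclose(P @ P, E))
print("O is orthogonal:", np.allclose(O @ O.T, np.eye(d + 1)))
```

The check relies only on A A^T = [Σ + μμ^T, μ; μ^T, 1], which follows from L^{-T} L^{-1} = Σ.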
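Gaussian Embedding:

Y is a function of the convolutional features X:
$$Y = f_{MPL}(X) = \frac{1}{N} A X^T X A^T + \frac{2}{N}\left(A X^T \mathbf{1} b^T\right)_{sym} + B$$

The square root of Y is computed via SVD, with $Y = U \Lambda U^T$:
$$Z = f_{ESRL}(Y) = Y^{1/2} = U \Lambda^{1/2} U^T$$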
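The forward pass of the two sub-layers can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the authors' implementation: X is taken to be an N x d matrix of convolutional features, A = [I; 0^T] is (d+1) x d, b is the last standard basis vector, B = b b^T, and (M)_sym = (M + M^T)/2. These definitions are not spelled out in the slides; they are chosen so that Y equals the block matrix [Σ + μμ^T, μ; μ^T, 1]. The function names are hypothetical.

```python
import numpy as np

def sym(M):
    """Symmetric part of a square matrix (assumed meaning of the 'sym' subscript)."""
    return 0.5 * (M + M.T)

def mpl_forward(X):
    """Matrix partition sub-layer: Y = (1/N) A X^T X A^T + (2/N) (A X^T 1 b^T)_sym + B."""
    N, d = X.shape
    A = np.vstack([np.eye(d), np.zeros((1, d))])  # assumed A = [I; 0^T]
    b = np.zeros((d + 1, 1))
    b[d, 0] = 1.0                                 # assumed b = last basis vector
    B = b @ b.T                                   # assumed B = b b^T
    ones = np.ones((N, 1))
    return (A @ X.T @ X @ A.T) / N + (2.0 / N) * sym(A @ X.T @ ones @ b.T) + B

def esrl_forward(Y):
    """Square-rooted SPD matrix sub-layer: Z = Y^{1/2}, computed via SVD of the SPD matrix Y."""
    U, lam, _ = np.linalg.svd(Y)
    return (U * np.sqrt(lam)) @ U.T

# Toy features (illustrative values only).
rng = np.random.default_rng(0)
N, d = 50, 4
X = rng.normal(size=(N, d))

Y = mpl_forward(X)
Z = esrl_forward(Y)

# Sample mean and (biased) covariance of the rows of X.
mu = X.mean(axis=0)
sigma = X.T @ X / N - np.outer(mu, mu)

print("top-left block is sigma + mu mu^T:", np.allclose(Y[:d, :d], sigma + np.outer(mu, mu)))
print("last column holds mu:", np.allclose(Y[:d, d], mu))
print("Z squared recovers Y:", np.allclose(Z @ Z, Y))
```

Writing Y through the constant matrices A, b, and B keeps it an explicit function of X, which is what makes the layer differentiable end to end.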
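BP for global Gaussian embedding layer
Backward propagation through the square-rooted SPD matrix sub-layer follows the matrix backpropagation of [DeepO2P, ICCV’15]: the variations of U and of the diagonal factor in the decomposition of Y are written in terms of U^T dY U, using a matrix K and the (·)_sym and (·)_diag operators. The resulting gradient with respect to Y is then propagated through the matrix partition sub-layer to X.
[Equations: gradient expressions for the ESRL and MPL sub-layers; not recoverable from the extracted text.]
[DeepO2P, ICCV’15]: Catalin Ionescu et al. Matrix Backpropagation for Deep Networks with Structured Layers. ICCV, 2015.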
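As a sanity check on backpropagation through the layer, the sketch below implements one possible backward pass and compares it with finite differences. It is not the formulation from the slides: the gradient of the matrix square root is written in the eigenbasis of Y by dividing by sqrt(λ_i) + sqrt(λ_j) (the Sylvester-equation form) instead of using the K matrix of [DeepO2P, ICCV’15], and the gradient of the matrix partition sub-layer is derived by hand from its forward formula. The definitions of A, b, and (·)_sym, the toy loss sum(W * Z), and all function names are assumptions made for illustration.

```python
import numpy as np

def sym(M):
    return 0.5 * (M + M.T)

def forward(X):
    """Forward pass X -> Y -> Z = Y^{1/2}; returns Z and a cache for the backward pass."""
    N, d = X.shape
    A = np.vstack([np.eye(d), np.zeros((1, d))])  # assumed A = [I; 0^T]
    b = np.zeros((d + 1, 1))
    b[d, 0] = 1.0                                 # assumed b = last basis vector
    ones = np.ones((N, 1))
    Y = (A @ X.T @ X @ A.T) / N + (2.0 / N) * sym(A @ X.T @ ones @ b.T) + b @ b.T
    lam, U = np.linalg.eigh(Y)                    # Y = U diag(lam) U^T
    Z = (U * np.sqrt(lam)) @ U.T
    return Z, (A, b, ones, lam, U)

def backward(X, grad_Z, cache):
    """Gradient of the loss w.r.t. X, given grad_Z = dl/dZ."""
    A, b, ones, lam, U = cache
    N = X.shape[0]
    s = np.sqrt(lam)
    # Square-root sub-layer: solve Z G + G Z = sym(grad_Z) in the eigenbasis of Y.
    G = U.T @ sym(grad_Z) @ U
    grad_Y = U @ (G / (s[:, None] + s[None, :])) @ U.T
    # Matrix partition sub-layer: dl/dX = (2/N) (X A^T + 1 b^T) (dl/dY)_sym A.
    return (2.0 / N) * (X @ A.T + ones @ b.T) @ sym(grad_Y) @ A

def numerical_grad(X, W, eps=1e-6):
    """Central finite differences of the toy loss l(X) = sum(W * Z(X))."""
    g = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Xp = X.copy()
            Xp[i, j] += eps
            Xm = X.copy()
            Xm[i, j] -= eps
            g[i, j] = (np.sum(W * forward(Xp)[0]) - np.sum(W * forward(Xm)[0])) / (2 * eps)
    return g

# Toy problem (illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
W = rng.normal(size=(4, 4))                       # dl/dZ for the toy loss

Z, cache = forward(X)
analytic = backward(X, W, cache)
numeric = numerical_grad(X, W)
max_abs_diff = np.max(np.abs(analytic - numeric))
print("max |analytic - numeric| =", max_abs_diff)
```

Dividing by sqrt(λ_i) + sqrt(λ_j) avoids the differences of eigenvalues that appear in decomposition-based formulas, so this sketch stays well defined even when eigenvalues of Y coincide.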
Gaussian Embedding. Structural Backpropagation: gradients with respect to Y and X.
Convergence curve of our G2DeNet-FC with AlexNet on MS-COCO.

Method                      Err. (%)
AlexNet (baseline)          25.3
DeepO2P [ICCV’15]           28.6
DeepO2P-FC (S) [ICCV’15]    28.9
DeepO2P-FC [ICCV’15]        25.2
DMMs-FC [arXiv’15]          24.6
G2DeNet (Ours)              24.4
G2DeNet-FC (S) (Ours)       22.6
G2DeNet-FC (Ours)           21.5

Comparison of classification errors on MS-COCO.
Birds CUB-200-2011: 200 classes, 5,994 training / 5,794 test
FGVC-Aircraft: 100 classes, 6,667 training / 3,333 test
FGVC-Cars: 196 classes, 8,144 training / 8,041 test
Methods              Birds CUB-200-2011   FGVC-Aircraft   FGVC-Cars
FC-CNN               76.4                 74.1            79.8
FV-CNN               77.5                 77.6            85.7
VLAD-CNN             79.0                 80.6            85.6
NetFV [TPAMI’17]     79.9                 79.0            86.2
NetVLAD [CVPR’16]    81.9                 81.8            88.6
B-CNN [ICCV’15]      84.1                 84.1            91.3
G2DeNet (Ours)       87.1                 89.0            92.5

Comparison of different counterparts using VGG-VD16 without bounding boxes or part annotations, sharing the same settings as B-CNN.
NetFV [TPAMI’17]: Lin et al. Bilinear CNNs for Fine-grained Visual Recognition. TPAMI, 2017.
Methods                   Birds CUB-200-2011   FGVC-Aircraft   FGVC-Cars   Remarks
PG-Alignment [CVPR’15]    82.0                 -               -           PG-Alignment + BB
PD [CVPR’16]              84.5                 -               -           -
BoT [CVPR’16]             -                    -               92.5        Bag of Triplets + BB
SPDA-CNN [CVPR’16]        85.1                 -               -           -
Boosted CNN [BMVC’16]     86.2                 88.5            92.1        Boosted CNN + B-CNN
RA-CNN [CVPR’17]          85.3                 -               -           Recurrent attention CNN
CVL [CVPR’17]             85.6                 -               -           -
KP-CNN [CVPR’17]          -                    -               -           Kernel Pooling for CNN
G2DeNet (Ours)            87.1 (87.5)          89.0            92.5        VGG-VD16 w/o BB & Part

Comparison of various state-of-the-art methods.
VD16-NoTr: Global Gaussian embedding layer + pre-trained VGG-VD16 on ImageNet.
VD16-FT: Global Gaussian embedding layer + fine-tuned VGG-VD16.
G2DeNet: Pre-trained VGG-VD16 on ImageNet + train G2DeNet in an end-to-end manner.
Effects of different training methods on G2DeNet using VGG-VD16 on Birds dataset.
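Comparison of different Gaussian embedding methods for G2DeNet on Birds dataset.

Method | Gaussian Embedding | Acc.
Nakayama et al. [CVPR’2010] | $\left[\mu^T, \mathrm{vec}(\Sigma + \mu\mu^T)^T\right]^T$ | 83.5
Calvo et al. or Lovric et al. [JMVA’1990 & JMVA’2000] | $\begin{bmatrix} \Sigma + \mu\mu^T & \mu \\ \mu^T & 1 \end{bmatrix}$ | 84.1
Calvo et al. or Lovric et al. + Log-Euclidean [ICCV’2013] | $\log \begin{bmatrix} \Sigma + \mu\mu^T & \mu \\ \mu^T & 1 \end{bmatrix}$ | 83.8
Ours | $\begin{bmatrix} \Sigma + \mu\mu^T & \mu \\ \mu^T & 1 \end{bmatrix}^{1/2}$ | 87.1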