

SLIDE 1

Learning Kernel-matrix-based Representation for Fine-grained Image Recognition

Lei Wang, VILA group, School of Computing and Information Technology, University of Wollongong, Australia

11-Dec-2019

SLIDE 2

Introduction

  • Fine-grained image recognition

Image courtesy of “Wei et al., Deep learning for fine-grained image analysis: A survey”


SLIDE 5

Introduction

  • Feature: how to represent an image?

– Scale, rotation, illumination, occlusion, deformation, …
– Differences with respect to other classes
– Ultimate goal: “invariant and discriminative”

SLIDE 6
  • 1. Before year 2000
  • Hand-crafted, global features
    – Color, texture, shape, structure, etc.
    – An image becomes a feature vector
  • Shallow classifiers
    – K-nearest neighbor, SVMs, Boosting, …
SLIDE 7
  • 2. Days of Bag of Features model (2003-12)
  • Local Invariant Features
    – Interest point detection
    – Dense sampling
    – Invariant to view angle, rotation, scale, illumination, clutter, ...
  • An image becomes “a set of feature vectors”

SLIDE 8
  • 3. Era of Deep Learning (since 2012)

Deep Local Descriptors

“Cat”

(Feature map dimensions: depth × height × width)

Again, an image becomes “A set of feature vectors”

SLIDE 9

Image(s): a set of points/vectors

How to pool a set of points/vectors to obtain a global visual representation?

Applications: object recognition, image set classification, action recognition, neuroimaging analysis

SLIDE 10

Pooling operation

A set of local descriptors x1, x2, . . ., xn: how to pool?

  • Max pooling, average (sum) pooling, etc.
  • Covariance pooling (second-order pooling)
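The contrast between first- and second-order pooling can be sketched in a few lines of NumPy; the descriptor count, dimensionality and variable names below are illustrative, not taken from the slides.

```python
import numpy as np

# A toy set of n local descriptors, each d-dimensional (rows = descriptors).
rng = np.random.default_rng(0)
n, d = 196, 8                  # e.g. a 14x14 feature map with 8 channels
X = rng.standard_normal((n, d))

# First-order pooling collapses the set into a single d-dim vector.
avg_pool = X.mean(axis=0)      # average (sum) pooling
max_pool = X.max(axis=0)       # max pooling

# Covariance (second-order) pooling keeps a d x d matrix capturing
# pairwise correlations between feature components.
mu = X.mean(axis=0)
cov_pool = (X - mu).T @ (X - mu) / (n - 1)

print(avg_pool.shape, cov_pool.shape)   # (8,) (8, 8)
```

Note the output sizes: first-order pooling gives a d-dimensional vector, while covariance pooling gives a d × d symmetric matrix.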
SLIDE 11
Outline

  • Introduction on Covariance representation
  • Our research work
    – Moving to Kernel-matrix-based Representation (KSPD)
    – End-to-end learning of KSPD in deep neural networks
  • Conclusion

SLIDE 12

Introduction on Covariance representation

Covariance Matrix


SLIDE 13

Introduction on Covariance representation

Use a Covariance matrix as a feature representation


Image is from http://www.statsref.com/HTML/index.html?multivariate_distributions.html

SLIDE 14

Introduction on Covariance representation

A covariance matrix belongs to the set of Symmetric Positive Definite (SPD) matrices, which resides on a manifold rather than filling the whole Euclidean space.
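Both properties are easy to verify numerically; the sample size and dimension in this NumPy sketch are arbitrary choices for illustration.

```python
import numpy as np

# Estimate a covariance matrix from enough random samples (n > d).
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))        # 50 samples, 5 dimensions (arbitrary)
C = np.cov(X, rowvar=False)

# Symmetric ...
print(np.allclose(C, C.T))              # True
# ... and positive definite: every eigenvalue is strictly positive.
print(np.linalg.eigvalsh(C).min() > 0)  # True

# SPD matrices do not fill the whole space of symmetric matrices:
# -C is symmetric but not SPD, so the SPD set forms a manifold (a cone).
print(np.linalg.eigvalsh(-C).max() < 0)  # True
```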

SLIDE 15

Introduction on Covariance representation

How to measure the similarity of two SPD matrices?

SLIDE 16

Introduction on SPD matrix

Similarity measures for SPD matrices

  • Geodesic distance
  • Euclidean mapping
  • Kernel methods

SLIDE 17

Introduction on SPD matrix

Geodesic distance

  • Förstner W, Moonen B. A metric for covariance matrices. Geodesy: The Challenge of the 3rd Millennium, 2003.
  • Fletcher P T, et al. Principal geodesic analysis on symmetric spaces: Statistics of diffusion tensors. Computer Vision and Mathematical Methods in Medical and Biomedical Image Analysis, 2004.
  • Pennec X, Fillard P, Ayache N. A Riemannian framework for tensor computing. IJCV, 2006.
  • Lenglet C, et al. Statistics on the manifold of multivariate normal distributions: Theory and application to diffusion tensor MRI processing. Journal of Mathematical Imaging and Vision, 2006.
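The affine-invariant geodesic distance used in this line of work can be sketched with a plain eigendecomposition; the helper names and the small 2×2 test matrices below are illustrative.

```python
import numpy as np

def spd_logm(M):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.log(w)) @ V.T

def spd_invsqrt(M):
    """Inverse matrix square root of an SPD matrix."""
    w, V = np.linalg.eigh(M)
    return (V / np.sqrt(w)) @ V.T

def geodesic_dist(A, B):
    """Affine-invariant distance: ||log(A^{-1/2} B A^{-1/2})||_F."""
    Ais = spd_invsqrt(A)
    return np.linalg.norm(spd_logm(Ais @ B @ Ais), 'fro')

A = np.array([[2.0, 0.5], [0.5, 1.0]])   # two small SPD matrices
B = np.array([[1.0, 0.2], [0.2, 3.0]])

print(round(geodesic_dist(A, A), 6))                      # 0.0
print(abs(geodesic_dist(A, B) - geodesic_dist(B, A)) < 1e-8)  # True: symmetric
```

The Log-Euclidean metric of Arsigny et al. (next slide) simplifies this to the Euclidean distance between `spd_logm(A)` and `spd_logm(B)`.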

SLIDE 18

Introduction on SPD matrix

Euclidean mapping

  • Veeraraghavan A, et al. Matching shape sequences in video with applications in human movement analysis. IEEE TPAMI, 2005.
  • Arsigny V, et al. Log-Euclidean metrics for fast and simple calculus on diffusion tensors. Magnetic Resonance in Medicine, 2006.
  • Tuzel O, et al. Pedestrian detection via classification on Riemannian manifolds. IEEE TPAMI, 2008.

SLIDE 19

Introduction on SPD matrix

Kernel methods

  • Sra S. Positive definite matrices and the S-divergence. arXiv:1110.1773, 2011.
  • Harandi M, et al. Sparse coding and dictionary learning for SPD matrices: A kernel approach. ECCV, 2012.
  • Wang R, et al. Covariance discriminative learning: A natural and efficient approach to image set classification. CVPR, 2012.
  • Vemulapalli R, Pillai J K, Chellappa R. Kernel learning for extrinsic classification of manifold features. CVPR, 2013.
  • Jayasumana S, et al. Kernel methods on the Riemannian manifold of symmetric positive definite matrices. CVPR, 2013.
  • Quang M H, et al. Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces. NIPS, 2014.

SLIDE 20

Introduction on SPD matrix

Integration with deep learning

  • Lin et al. Bilinear CNN Models for Fine-grained Visual Recognition. ICCV 2015.
  • Ionescu et al. Matrix Backpropagation for Deep Networks with Structured Layers. ICCV 2015.
  • Gao et al. Compact Bilinear Pooling. CVPR 2016.
  • Li et al. Is Second-order Information Helpful for Large-scale Visual Recognition? ICCV 2017.
  • Huang et al. A Riemannian Network for SPD Matrix Learning. AAAI 2017.
  • Lin and Maji. Improved Bilinear Pooling with CNN. BMVC 2017.
  • Wang et al. G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition. CVPR 2017.
  • Cui et al. Kernel Pooling for Convolutional Neural Networks. CVPR 2017.
  • Koniusz et al. A Deeper Look at Power Normalizations. CVPR 2018.

SLIDE 21

Introduction on SPD matrix

Integration with deep learning (continued)

  • Li et al. Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization. CVPR 2018.
  • Wei et al. Kernelized Subspace Pooling for Deep Local Descriptors. CVPR 2018.
  • Yu et al. Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition. ECCV 2018.
  • Yu and Salzmann. Statistically-motivated Second-order Pooling. ECCV 2018.
  • Lin et al. Second-order Democratic Aggregation. ECCV 2018.
  • Gao et al. Global Second-order Pooling Convolutional Networks. CVPR 2019.
  • Wang et al. Deep Global Generalized Gaussian Networks. CVPR 2019.
  • Fang et al. Bilinear Attention Networks for Person Retrieval. ICCV 2019.
  • Brooks et al. Riemannian Batch Normalization for SPD Neural Networks. NeurIPS 2019.
  • Zheng et al. Learning Deep Bilinear Transformation for Fine-grained Image Representation. NeurIPS 2019.

SLIDE 22

Outline

  • Introduction on Covariance representation
  • Our research work
    – Moving to Kernel-matrix-based Representation (KSPD)
    – End-to-end learning of KSPD in deep neural networks

  • Conclusion
SLIDE 23

Motivation

Covariance Matrix

Covariance matrix needs to be estimated from data

SLIDE 24

Motivation

  • Covariance estimate becomes singular with
    – High-dimensional (d) features
    – Small sample size (n)
  • Covariance matrix only characterises linear correlation between feature components
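The singularity issue is easy to reproduce; this is an illustrative NumPy experiment (the sizes are invented, not the slides' numbers).

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 10, 64                         # small sample, high dimension
X = rng.standard_normal((n, d))
C = np.cov(X, rowvar=False)           # d x d covariance estimate

# The rank of C is at most n - 1, far below d, so C is singular:
print(np.linalg.matrix_rank(C) <= n - 1)        # True
# its smallest eigenvalue is (numerically) zero, not strictly positive.
print(abs(np.linalg.eigvalsh(C).min()) < 1e-8)  # True
```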

SLIDE 25

Introduction

Applications with high dimensions but a small-sample issue

Small sample size: 10 ~ 300; high dimensions: 50 ~ 400

SLIDE 26

Introduction

Look into the covariance representation

Each entry (i, j) compares the i-th feature and the j-th feature: it is just a linear kernel function!

SLIDE 27

Proposed kernel-matrix representation (ICCV15)

Let’s use a kernel function instead

Advantages:

  • Models nonlinear relationships between features.
  • For many kernels, M is guaranteed to be nonsingular, no matter what the feature dimension (d) and sample size (n) are.
  • Maintains the size of the representation at d × d, and therefore the computational load.

From covariance matrix to kernel matrix!
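A minimal sketch of this idea (the Ker-RP representation, ICCV15): replace the linear kernel implicit in covariance with an RBF kernel evaluated between feature components. The function name and the bandwidth `gamma` are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def ker_rp_rbf(X, gamma=0.1):
    """Kernel-matrix representation: M[i, j] = k(f_i, f_j), where f_i is
    the i-th feature component observed over n samples (row i of X)."""
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise ||f_i - f_j||^2
    return np.exp(-gamma * np.maximum(D2, 0.0))       # d x d RBF kernel matrix

rng = np.random.default_rng(3)
d, n = 64, 10                       # high dimension, tiny sample (illustrative)
X = rng.standard_normal((d, n))     # rows = feature components

M = ker_rp_rbf(X)
print(M.shape)                          # (64, 64): same size as the covariance
# Unlike the covariance estimate (rank <= n - 1), M is nonsingular here.
print(np.linalg.eigvalsh(M).min() > 0)  # True
```

The representation stays d × d, so downstream classifiers built for covariance matrices apply unchanged.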

SLIDE 28

Application to skeletal action recognition


* Cov_JH_SVM uses a kernel function to map each of the n samples into an infinite-dimensional space and implicitly computes a covariance matrix there.

SLIDE 29

Application to skeletal action recognition


SLIDE 30

Application to object recognition (handcrafted features)


SLIDE 31

Application to scene recognition (extracted deep features)

[Bar chart: Comparison on the MIT Indoor Scenes dataset (classification accuracy in percentage): AlexNet (F7), VGG-19 Net (Conv5), Fisher Vector (CVPR15)*, Cov-RP, Ker-RP (RBF)]

* Cimpoi et al., Deep filter banks for texture recognition and segmentation, CVPR 2015

SLIDE 32

Outline

  • Introduction on Covariance representation
  • Our research work
    – Moving to Kernel-matrix-based Representation (KSPD)
    – End-to-end learning of KSPD in deep neural networks

  • Conclusion
SLIDE 33

Covariance representation

Integration with Deep Learning

Bilinear CNN Models for Fine-grained Visual Recognition, Lin et al, ICCV2015

SLIDE 34

Covariance representation

Integration with Deep Learning

Matrix Backpropagation for Deep Networks with Structured Layers, Ionescu et al, ICCV2015

SLIDE 35

Covariance representation

Integration with Deep Learning

G^2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition, Wang et al, CVPR2017

SLIDE 36

Covariance representation

Integration with Deep Learning

Improved Bilinear Pooling with CNN, Lin and Maji, BMVC2017

SLIDE 37

Covariance representation

Integration with Deep Learning

Is Second-order Information Helpful for Large-scale Visual Recognition?, Li et al., ICCV2017

SLIDE 38

Proposed DeepKSPD (ECCV18)

Motivation

  • The kernel-matrix-based SPD (KSPD) representation
    – has not been developed upon deep local descriptors
    – has not been jointly learned via deep learning
  • Existing matrix backpropagation for learning covariance representation via deep networks
    – encounters numerical stability issues

SLIDE 39

Proposed DeepKSPD

Architecture and layers


SLIDE 40

Proposed DeepKSPD

Matrix backpropagation


SLIDE 41

Proposed DeepKSPD

Matrix backpropagation

How to backpropagate through the matrix normalisation H = f(K) applied to the kernel matrix K?

SLIDE 42

Proposed DeepKSPD

Existing matrix backpropagation

Matrix Backpropagation for Deep Networks with Structured Layers, Ionescu et al, ICCV2015

SLIDE 43

Proposed DeepKSPD

Result from the literature of Operator Theory (1951)

SLIDE 44

Proposed DeepKSPD

Existing matrix backpropagation (Ionescu et al, ICCV2015) vs. the proposed matrix backpropagation: what is their relationship?

SLIDE 45

Proposed DeepKSPD

Generalize to matrix α-rooting normalization
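The forward pass of matrix α-rooting can be sketched via eigendecomposition. This is a hypothetical NumPy forward pass only; DeepKSPD additionally derives the backward pass and learns α end-to-end.

```python
import numpy as np

def matrix_alpha_rooting(K, alpha=0.5, eps=1e-12):
    """H = K^alpha for a symmetric PSD matrix K; alpha = 0.5 recovers
    the familiar matrix square-root normalisation."""
    w, V = np.linalg.eigh(K)
    w = np.maximum(w, eps)          # guard against tiny negative eigenvalues
    return (V * w**alpha) @ V.T     # V diag(w^alpha) V^T

K = np.array([[4.0, 0.0], [0.0, 9.0]])
H = matrix_alpha_rooting(K, alpha=0.5)
print(H)   # the matrix square root of K, i.e. diag(2, 3)
```

Setting alpha=1.0 returns K unchanged, so α interpolates between no normalisation and stronger spectrum flattening.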

SLIDE 46

Experimental Result

Fine-grained Image Recognition

SLIDE 47

Experimental Result (ECCV18)

Fine-grained Image Recognition

SLIDE 48

Experimental Result

Numerical stability of backpropagation

SLIDE 49

Experimental Result

DeepKSPD vs DeepCOV

SLIDE 50

Experimental Result

Ablation study

  • Learning width θ in the GRBF kernel
  • Learning α in matrix α-rooting normalisation
SLIDE 51

Our archive website

https://saimunur.github.io/spd-archive/


SLIDE 53

Our archive website

[Bar chart: classification accuracy on the CUB-200-2011 dataset, SPD & related methods]

SLIDE 54

Our archive website

[Bar chart: classification accuracy on the FGVC-Aircraft dataset, SPD & related methods]

SLIDE 55

Our archive website

[Bar chart: classification accuracy on the Stanford Cars dataset, SPD & related methods]

SLIDE 56

Research trends on learning SPD representation

  • Compactness of second-order feature representations and computational efficiency
  • Efficient training of SPD structural layers by considering the underlying manifold structure
  • Second-order correlation across layers
  • Deeper integration into convolutional neural networks
  • More applications explored:
    – Generic and fine-grained image recognition
    – Image segmentation, person re-identification and retrieval
    – Action parsing and analysis, image super-resolution
    – More to be explored…
SLIDE 57

Key references

  • 1. R. Wang, H. Guo, L. S. Davis and Q. Dai, "Covariance discriminative learning: A natural and efficient approach to image set classification," CVPR 2012, pp. 2496-2503.
  • 2. S. Jayasumana, R. Hartley, M. Salzmann, H. Li and M. Harandi, "Kernel Methods on the Riemannian Manifold of Symmetric Positive Definite Matrices," CVPR 2013, pp. 73-80.
  • 3. T. Lin, A. RoyChowdhury and S. Maji, "Bilinear CNN Models for Fine-Grained Visual Recognition," ICCV 2015, pp. 1449-1457.
  • 4. P. Li, J. Xie, Q. Wang and W. Zuo, "Is Second-Order Information Helpful for Large-Scale Visual Recognition?" ICCV 2017, pp. 2089-2097.
  • 5. L. Wang, J. Zhang, L. Zhou, C. Tang and W. Li, "Beyond Covariance: Feature Representation with Nonlinear Kernel Matrices," ICCV 2015, pp. 4570-4578.
  • 6. M. Engin, L. Wang, L. Zhou and X. Liu, "DeepKSPD: Learning Kernel-matrix-based SPD Representation for Fine-grained Image Recognition," ECCV 2018, pp. 629-645.
  • 7. Second- and Higher-order Representations in Computer Vision, Tutorial at ICCV 2019.
  • 8. SPD Representations Methods Archive, https://saimunur.github.io/spd-archive/
SLIDE 58

Q&A

Images courtesy of Google Images