Learning SPD-matrix-based Representation for Visual Recognition


SLIDE 1

Learning SPD-matrix-based Representation for Visual Recognition

Lei Wang, VILA group, School of Computing and Information Technology, University of Wollongong, Australia

22-OCT-2018

SLIDE 2

Introduction

  • How to represent an image?

– Scale, rotation, illumination, occlusion, background clutter, deformation, …

[Figure: images of a cat under such variations]

SLIDE 3
1. Before year 2000

  • Hand-crafted, global features

– Color, texture, shape, structure, etc.
– Goal: “Invariant and discriminative”

  • Classifier

– K-nearest neighbor, SVMs, Boosting, …
SLIDE 4
2. Days of the Bag of Features (BoF) model

Local Invariant Features

  • Invariant to view angle, rotation, scale, illumination, clutter, ...
  • Obtained by interest point detection or dense sampling

An image becomes “a bag of features”.

SLIDE 5
3. Era of Deep Learning

Deep Local Descriptors

[Figure: a CNN maps an image to a Depth × Height × Width feature map, predicting the label “Cat”; each spatial location yields a deep local descriptor.]

SLIDE 6

Image(s): a set of points/vectors

How to pool a set of points/vectors to obtain a global visual representation?

Applications: object detection & classification, image set classification, action recognition, neuroimaging analysis.

SLIDE 7

Covariance representation

A set of local descriptors x1, x2, ..., xn: how to pool?

  • Max pooling, average (sum) pooling, etc.
  • Covariance pooling: essentially a second-order pooling
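To make the second-order pooling concrete, here is a minimal NumPy sketch (array shapes and names are illustrative, not from the talk):

```python
import numpy as np

def covariance_pool(X):
    """Pool n local d-dimensional descriptors (rows of X) into a d x d
    sample covariance matrix, i.e. a second-order representation."""
    Xc = X - X.mean(axis=0)              # center the descriptors
    return Xc.T @ Xc / (X.shape[0] - 1)  # d x d covariance (symmetric, PSD)

X = np.random.randn(196, 64)             # e.g. 14 x 14 CNN locations, 64 channels
C = covariance_pool(X)                   # global representation of the image
```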

SLIDE 8
  • Introduction on Covariance representation
  • Our research work

– Discriminatively Learning Covariance Representation
– Exploring Sparse Inverse Covariance Representation
– Moving to Kernel-matrix-based Representation (KSPD)
– Learning KSPD in deep neural networks

  • Conclusion
SLIDE 9

Introduction on Covariance representation

Covariance Matrix


SLIDE 10

Introduction on Covariance representation

Use a Covariance matrix as a feature representation


Image is from http://www.statsref.com/HTML/index.html?multivariate_distributions.html

SLIDE 11

Introduction on Covariance representation

A covariance matrix is a Symmetric Positive Definite (SPD) matrix; it resides on a manifold instead of the whole Euclidean space.

SLIDE 12

Introduction on Covariance representation


How to measure the similarity of two SPD matrices?

SLIDE 13

Introduction on SPD matrix

Similarity measures for SPD matrices

Three routes to similarity measures for SPD matrices: geodesic distance, Euclidean mapping, and kernel methods.

SLIDE 14

Introduction on SPD matrix

Geodesic distance

  • Förstner W, Moonen B. A metric for covariance matrices. Geodesy-The Challenge of the 3rd Millennium, 2003.
  • Fletcher P T. Principal geodesic analysis on symmetric spaces: Statistics of diffusion tensors. Computer Vision and Mathematical Methods in Medical and Biomedical Image Analysis, 2004.
  • Pennec X, Fillard P, Ayache N. A Riemannian framework for tensor computing. IJCV, 2006.
  • Lenglet C, et al. Statistics on the manifold of multivariate normal distributions: Theory and application to diffusion tensor MRI processing. Journal of Mathematical Imaging and Vision, 2006.
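For reference, the geodesic (affine-invariant) distance from the Riemannian framework of Pennec et al. cited above takes the form

```latex
d_{\mathrm{AIRM}}(A,B) = \left\| \log\!\left(A^{-1/2}\, B\, A^{-1/2}\right) \right\|_{F},
```

where log is the matrix logarithm and the norm is the Frobenius norm.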

SLIDE 15

Introduction on SPD matrix

Euclidean mapping

  • Veeraraghavan A, et al. Matching shape sequences in video with applications in human movement analysis. IEEE Transactions on PAMI, 2005.
  • Arsigny V, et al. Log-Euclidean metrics for fast and simple calculus on diffusion tensors. Magnetic Resonance in Medicine, 2006.
  • Tuzel O, et al. Pedestrian detection via classification on Riemannian manifolds. IEEE Transactions on PAMI, 2008.
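The Log-Euclidean mapping of Arsigny et al. flattens the manifold through the matrix logarithm, so SPD matrices can be compared with an ordinary Euclidean norm:

```latex
d_{\mathrm{LE}}(A,B) = \left\| \log A - \log B \right\|_{F}
```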

SLIDE 16

Introduction on SPD matrix

Kernel methods

  • Sra S. Positive definite matrices and the S-divergence. arXiv preprint arXiv:1110.1773, 2011.
  • Harandi M, et al. Sparse coding and dictionary learning for SPD matrices: a kernel approach. ECCV, 2012.
  • Wang R, et al. Covariance discriminative learning: A natural and efficient approach to image set classification. CVPR, 2012.
  • Vemulapalli R, Pillai J K, Chellappa R. Kernel learning for extrinsic classification of manifold features. CVPR, 2013.
  • Jayasumana S, et al. Kernel methods on the Riemannian manifold of symmetric positive definite matrices. CVPR, 2013.
  • Quang M H, et al. Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces. NIPS, 2014.

SLIDE 17

Introduction on SPD matrix

Integration with deep learning

  • Lin et al. Bilinear CNN Models for Fine-grained Visual Recognition. ICCV, 2015.
  • Ionescu et al. Matrix Backpropagation for Deep Networks with Structured Layers. ICCV, 2015.
  • Huang et al. A Riemannian Network for SPD Matrix Learning. AAAI, 2017.
  • Li et al. Is Second-order Information Helpful for Large-scale Visual Recognition? ICCV, 2017.
  • Lin and Maji. Improved Bilinear Pooling with CNN. BMVC, 2017.
  • Koniusz et al. A Deeper Look at Power Normalizations. CVPR, 2018.

SLIDE 18
  • Introduction on Covariance representation
  • Our research work

– Discriminatively Learning Covariance Representation
– Exploring Sparse Inverse Covariance Representation
– Moving to Kernel-matrix-based Representation (KSPD)
– Learning KSPD in deep neural networks

  • Conclusion
SLIDE 19

Motivation

Covariance Matrix

The covariance matrix needs to be estimated from data.

SLIDE 20

Motivation

  • Covariance estimates become unreliable with

– High-dimensional (d) features
– Small sample size (n)

  • Existing work

– Does not consider the quality of the covariance representation
– Especially the estimate of its eigenvalues

SLIDE 21

Motivation

Stein Kernel

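The formula on this slide is presumably the Stein kernel built on the S-divergence (Sra, 2011, cited earlier):

```latex
S(A,B) = \log\det\!\left(\frac{A+B}{2}\right) - \frac{1}{2}\log\det(AB),
\qquad
k(A,B) = e^{-\theta\, S(A,B)}
```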

SLIDE 22

Motivation

  • Issue 1: Eigenvalue estimation becomes biased when the number of samples is inadequate

SLIDE 23

Motivation

  • Issue 2: The eigenvalues are not collectively manipulated toward greater discrimination

[Figure: eigenvalue spectra of Class 1 vs. Class 2]

SLIDE 24

Proposed method

Let’s do a data-dependent “eigenvalue massage”

[Figure: eigenvalue spectra of Class 1 and Class 2, before and after the adjustment]

SLIDE 25

Proposed method

We propose a “Discriminative Covariance Representation” with two schemes:

  • Power-based adjustment
  • Coefficient-based adjustment

SLIDE 26
Proposed method

Discriminative Stein kernel (DSK)

  • Adjusted S-divergence
  • Power-based adjustment
  • Coefficient-based adjustment
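As a sketch of the two schemes (notation mine; the exact parameterisation is in the TNNLS 2016 paper listed in the conclusion): with the eigen-decomposition X = U diag(λ1, ..., λd) Uᵀ, the adjusted matrix entering the S-divergence is

```latex
\hat{X}_{\mathrm{power}} = U \operatorname{diag}\!\left(\lambda_1^{\alpha_1},\ldots,\lambda_d^{\alpha_d}\right) U^{\top},
\qquad
\hat{X}_{\mathrm{coef}} = U \operatorname{diag}\!\left(c_1\lambda_1,\ldots,c_d\lambda_d\right) U^{\top},
```

with the adjustment parameters (the powers or the coefficients) learned from data.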

SLIDE 27

Proposed method

Discriminative Stein kernel (DSK): how to learn the optimal adjustment parameters?

  • Kernel Alignment based method
  • Class Separability based method
  • Radius-margin Bound based framework
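As one concrete criterion, kernel-target alignment scores how well a kernel matrix matches the ideal label kernel; a minimal sketch (function and variable names are illustrative, not the talk's exact formulation):

```python
import numpy as np

def kernel_alignment(K, y):
    """Alignment between an n x n kernel matrix K and labels y in {-1, +1}:
    <K, y y^T>_F / (||K||_F * ||y y^T||_F). Larger means better aligned."""
    Y = np.outer(y, y)  # ideal target kernel
    return (K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y))
```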

SLIDE 28

Experimental Result

Data sets

  • Brodatz texture
  • ADNI rs-fMRI
  • ETH-80 object
  • FERET face

SLIDE 29

Experimental Result

The most difficult 15 pairs of the Brodatz texture data set

SLIDE 30

Experimental Result

The most difficult 15 pairs of the Brodatz texture data set

SLIDE 31

Discussion

DSK vs. eigenvalue-estimation improvement methods

[1] X. Mestre, “Improved estimation of eigenvalues and eigenvectors of covariance matrices using their sample estimates,” IEEE Trans. Inf. Theory, vol. 54, pp. 5113–5129, Nov. 2008.
[2] B. Efron and C. Morris, “Multivariate empirical Bayes and estimation of covariance matrices,” Ann. Stat., vol. 4, pp. 22–32, 1976.
[3] A. Ben-David and C. E. Davidson, “Eigenvalue estimation of hyper-spectral Wishart covariance matrices from limited number of samples,” IEEE Trans. Geosci. Remote Sens., vol. 50, pp. 4384–4396, May 2012.

SLIDE 32
  • Introduction on Covariance representation
  • Our research work

– Discriminatively Learning Covariance Representation
– Exploring Sparse Inverse Covariance Representation
– Moving to Kernel-matrix-based Representation (KSPD)
– Learning KSPD in deep neural networks

  • Conclusion
SLIDE 33

Introduction

Applications with high feature dimensions but small sample sizes

Small sample size: 10 ~ 300; high dimensions: 50 ~ 400

SLIDE 34

Introduction

This results in a singular covariance estimate, which adversely affects the representation. How to address this? Combine data with prior knowledge: explore the underlying structure of visual features.

SLIDE 35

Proposed SICE representation

Structure sparsity in skeletal human action recognition:

  • Only a small number of joints are directly linked.
  • How to represent such direct links?

Sparse Inverse Covariance Estimation (SICE)
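SICE regularises the inverse covariance with an L1 penalty, which zeroes out entries for feature pairs with no direct link. A minimal sketch using scikit-learn's graphical lasso (data shapes are illustrative):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# SICE solves: max_Theta  log det(Theta) - tr(S Theta) - alpha * ||Theta||_1,
# where S is the sample covariance and alpha controls the structure sparsity.
X = np.random.randn(40, 20)               # e.g. 40 frames x 20 joint features
model = GraphicalLasso(alpha=0.1).fit(X)
Theta = model.precision_                  # sparse inverse covariance (SICE matrix)
```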

SLIDE 36

Proposed SICE representation


SLIDE 37

Proposed SICE representation

Properties of the SICE representation:

  • It is guaranteed to be nonsingular
  • It reduces over-fitting, giving a more reliable representation
  • It measures partial correlations, allowing the sparsity prior to be conveniently imposed

SLIDE 38

Application to Skeletal Action Recognition


SLIDE 39

Application to Skeletal Action Recognition

SLIDE 40

Application to other tasks

The principle of “bet on sparsity”

SLIDE 41
  • Introduction on Covariance representation
  • Our research work

– Discriminatively Learning Covariance Representation
– Exploring Sparse Inverse Covariance Representation
– Moving to Kernel-matrix-based Representation (KSPD)
– Learning KSPD in deep neural networks

  • Conclusion
SLIDE 42

Introduction

Again, look into the covariance representation

SLIDE 43

Introduction

Again, look into the covariance representation

Entry (i, j) relates the i-th feature to the j-th feature: it is just a linear kernel function! (See the identity below.)
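In symbols (notation mine): let f̃ᵢ be the centered vector of the i-th feature's n observations across the local descriptors; then each covariance entry is a linear kernel between feature vectors,

```latex
C_{ij} = \frac{1}{n-1}\,\tilde f_i^{\top} \tilde f_j = k_{\mathrm{lin}}\!\left(\tilde f_i, \tilde f_j\right).
```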

SLIDE 44

Introduction

Covariance representation

Resulting issues:

  • It only models linear correlations between features.
  • It has a single, fixed representation form.
  • The covariance estimate can be unreliable or even singular.
SLIDE 45

Proposed kernel-matrix representation

Let’s use a kernel matrix M instead.

Advantages:

  • It models nonlinear relationships between features;
  • For many kernels, M is guaranteed to be nonsingular, no matter what the feature dimension and sample size are;
  • It maintains the size of the covariance representation and its computational load.

Like the covariance matrix, M is an SPD matrix! (A sketch follows.)
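A minimal sketch of this idea: a kernel matrix over the feature dimensions, here with an RBF kernel (bandwidth handling and names are illustrative; see the ICCV 2015 paper in the conclusion for the exact formulation):

```python
import numpy as np

def ker_rp_rbf(X, gamma=0.5):
    """X: n x d matrix of n local descriptors. Returns the d x d SPD matrix
    M[i, j] = exp(-gamma * ||f_i - f_j||^2), where f_i holds the i-th
    feature's n (centered) observations."""
    F = (X - X.mean(axis=0)).T                    # d x n: one row per feature
    sq = (F ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * F @ F.T  # pairwise squared distances
    return np.exp(-gamma * np.maximum(D2, 0.0))   # RBF kernel matrix (SPD)

M = ker_rp_rbf(np.random.randn(196, 64))          # same 64 x 64 size as Cov-RP
```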

SLIDE 46

Application to Skeletal Action Recognition


SLIDE 47

Application to Skeletal Action Recognition


SLIDE 48

Application to Object Recognition


SLIDE 49

Application to Deep Learning Features


[Bar chart: comparison on the MIT Indoor Scenes data set (classification accuracy, %) for AlexNet (F7), VGG-19 Net (Conv5), Fisher Vector (CVPR15), Cov-RP, and Ker-RP (RBF).]

SLIDE 50

Discussion

SICE vs. Kernel matrix: which is better?


SLIDE 51

Discussion


SICE vs. Kernel matrix representation: which is better?

SLIDE 52
  • Introduction on Covariance representation
  • Our research work

– Discriminatively Learning Covariance Representation
– Exploring Sparse Inverse Covariance Representation
– Moving to Kernel-matrix-based Representation (KSPD)
– Learning KSPD in deep neural networks

  • Conclusion
SLIDE 53

Covariance representation

Integration with Deep Learning

Bilinear CNN Models for Fine-grained Visual Recognition, Lin et al, ICCV2015

SLIDE 54

Covariance representation

Integration with Deep Learning

Matrix Backpropagation for Deep Networks with Structured Layers, Ionescu et al, ICCV2015

SLIDE 55

Covariance representation

Integration with Deep Learning

Improved Bilinear Pooling with CNN, Lin and Maji, BMVC2017

SLIDE 56

Covariance representation

Integration with Deep Learning

Is Second-order Information Helpful for Large-scale Visual Recognition?, Li et al., ICCV2017

SLIDE 57

Proposed DeepKSPD

Motivation

  • The kernel-matrix-based SPD representation
– has not been developed upon deep local descriptors
– has not been jointly learned via deep learning
  • Existing matrix backpropagation for learning covariance representation via deep networks
– encounters numerical stability issues

SLIDE 58

Proposed DeepKSPD

Architecture and layers

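A hedged sketch of the KSPD head described on this slide (GRBF kernel on deep local descriptors, then matrix normalisation; names, shapes, and defaults are illustrative, not the paper's exact design):

```python
import numpy as np

def deep_kspd_head(F, gamma=0.5, alpha=0.5):
    """F: d x n deep local descriptors (d channels, n spatial locations).
    Returns a vectorised, alpha-rooted kernel-matrix (KSPD) representation."""
    sq = (F ** 2).sum(axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * F @ F.T))  # d x d GRBF kernel
    w, U = np.linalg.eigh(K)                                        # K is SPD
    H = (U * np.maximum(w, 1e-12) ** alpha) @ U.T                   # matrix alpha-rooting
    return H[np.triu_indices_from(H)]                               # upper triangle as vector
```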

SLIDE 59

Proposed DeepKSPD

Matrix backpropagation


SLIDE 60

Proposed DeepKSPD

Matrix backpropagation

Given a matrix function H = f(K) applied to the kernel matrix K, how do we backpropagate the loss gradient through it?

SLIDE 61

Proposed DeepKSPD

Existing matrix backpropagation

Matrix Backpropagation for Deep Networks with Structured Layers, Ionescu et al, ICCV2015

SLIDE 62

Proposed DeepKSPD

Result from the literature of Operator Theory (1951)
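This is presumably a Daleckii-Krein-type result on differentiating matrix functions. With K = U diag(λ) Uᵀ and H = f(K) = U diag(f(λ)) Uᵀ, the loss gradient backpropagates in one step through the Loewner matrix of first divided differences:

```latex
P_{ij} =
\begin{cases}
\dfrac{f(\lambda_i)-f(\lambda_j)}{\lambda_i-\lambda_j}, & \lambda_i \neq \lambda_j,\\[1ex]
f'(\lambda_i), & \lambda_i = \lambda_j,
\end{cases}
\qquad
\frac{\partial L}{\partial K} = U \left( P \circ \left( U^{\top} \frac{\partial L}{\partial H} U \right) \right) U^{\top}
```

Because the divided differences of a smooth f stay bounded, this form avoids the raw 1/(λᵢ - λⱼ) terms that cause the numerical stability issue noted in the motivation.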

SLIDE 63

Proposed DeepKSPD

Existing matrix backpropagation (Ionescu et al., ICCV 2015) vs. the proposed matrix backpropagation: what is their relationship?

SLIDE 64

Proposed DeepKSPD

Generalise to matrix α-rooting normalisation

SLIDE 65

Experimental Result

Fine-grained Image Recognition

SLIDE 66

Experimental Result

Fine-grained Image Recognition

SLIDE 67

Experimental Result

Numerical stability of backpropagation

SLIDE 68

Experimental Result

DeepKSPD vs DeepCOV

SLIDE 69

Experimental Result

Ablation study

  • Learning width θ in the GRBF kernel
  • Learning α in matrix α-rooting normalisation
SLIDE 70

Research trend on learning SPD representation

  • Consider higher-order feature relationship

Kernel Pooling for Convolutional Neural Networks, Cui et al, CVPR2017

SLIDE 71

Research trend on learning SPD representation

  • Improve the computational efficiency

– Compact Bilinear Pooling, Gao et al., CVPR 2016
– Low-rank Bilinear Pooling for Fine-Grained Classification, Kong et al., CVPR 2017
– Statistically-motivated Second-order Pooling, Yu and Salzmann, ECCV 2018

SLIDE 72

Conclusion

  • Discriminative Stein kernel to address two issues in covariance representation
  • SICE representation to incorporate structure sparsity
  • Kernel matrix representation to move beyond the linear, fixed covariance representation
  • End-to-end deep learning of the KSPD representation

Publications:

1. J. Zhang, L. Wang, L. Zhou, and W. Li, Learning Discriminative Stein Kernel for SPD Matrices and Its Applications, IEEE Transactions on Neural Networks and Learning Systems (TNNLS), Vol. 27, Issue 5, pp. 1020-1033, May 2016.
2. J. Zhang, L. Wang, L. Zhou, and W. Li, Exploiting Structure Sparsity for Covariance-based Visual Representation, arXiv:1610.08619 [cs.CV].
3. L. Wang, J. Zhang, L. Zhou, C. Tang and W. Li, Beyond Covariance: Feature Representation with Nonlinear Kernel Matrices, IEEE International Conference on Computer Vision (ICCV), December 2015.
4. M. Engin, L. Wang, L. Zhou, and X. Liu, DeepKSPD: Learning Kernel-matrix-based SPD Representation for Fine-grained Image Recognition, The 15th European Conference on Computer Vision (ECCV), September 2018.

SLIDE 73

Conclusion

On-going Issues

  • Better understand SPD-matrix-based representation
– What is it modelling? What is its relationship to other pooling schemes?
  • Learn the optimal SPD representation from data
– Optimisation on manifolds, kernel learning, prior knowledge?
  • Computational issues
– How to deal with high-dimensional features and large data sets?
  • Beyond SPD representation
– Rectangular matrices
– Higher-order information
– Spatial or temporal order

SLIDE 74

Other related publications

  • J. Zhang, L. Zhou and L. Wang, Subject-adaptive Integration of Multiple SICE Brain Networks with Different Sparsity, Pattern Recognition, 63, 642-652, 2017.
  • L. Zhou, L. Wang, J. Zhang, Y. Shi and Y. Gao, Revisiting Distance Metric Learning for SPD Matrix based Visual Representation, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
  • L. Zhou, L. Wang, L. Liu, P. Ogunbona, and D. Shen, Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Volume 38, Issue 11, Nov. 2016.
  • J. Zhang, L. Zhou, L. Wang, and W. Li, Functional Brain Network Classification With Compact Representation of SICE Matrices, IEEE Transactions on Biomedical Engineering, 62(6), 1623-1634, 2015.
  • L. Zhou, L. Wang and P. Ogunbona, Discriminative Sparse Inverse Covariance Matrix: Application in Brain Functional Network Classification, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), June 2014.
  • L. Zhou, L. Wang, L. Liu, P. Ogunbona and D. Shen, Max-margin Based Learning for Discriminative Bayesian Network from Neuroimaging Data, In the 17th International Conference on MICCAI, September 2014.

SLIDE 75

Q&A

Images courtesy of Google Images