Learning Semantic Visual Codebook for Action Recognition by Embedding into Concept Space - PowerPoint PPT Presentation



SLIDE 1

Learning Semantic Visual Codebook for Action Recognition by Embedding into Concept Space

Behrouz Saghafi

SLIDE 2

Using Spatio-temporal Features

Action recognition using silhouettes or optical flow encounters difficulties when dealing with non-uniform backgrounds, severe camera jitter and noise. Local spatio-temporal features are fast and easy to extract, and reliable.

SLIDE 3

Bag of Words model

The raw features are clustered based on their appearance rather than their semantic relations. By utilizing the semantics, the recognition accuracy will improve.
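The appearance-based clustering step criticized above can be illustrated with a minimal numpy sketch of the classic bag-of-words pipeline: cluster local descriptors into a codebook, then quantize each video's descriptors into a word histogram. The descriptor dimensionality, vocabulary size, and the naive k-means loop are illustrative assumptions, not the exact settings from the slides.

```python
import numpy as np

def build_codebook(descriptors, k, iters=20, seed=0):
    """Naive k-means over local feature descriptors (appearance-based codebook)."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center (squared Euclidean distance)
        d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            pts = descriptors[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers

def bow_histogram(descriptors, centers):
    """Quantize one video's descriptors against the codebook and histogram them."""
    d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(1)
    h = np.bincount(labels, minlength=len(centers)).astype(float)
    return h / h.sum()  # normalized visual-word histogram

# toy example: 200 random 32-D descriptors, a vocabulary of 8 visual words
rng = np.random.default_rng(1)
desc = rng.normal(size=(200, 32))
centers = build_codebook(desc, k=8)
hist = bow_histogram(desc, centers)
```

The histogram `hist` is the per-video representation that the proposed method later replaces with a semantically clustered one.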

SLIDE 4

Incorporating Semantics into the BoW model (Related work)

Generative methods

  • Build a model for each category and fit the query to one of the models in an unsupervised framework, e.g. probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA).
  • Their unsupervised nature limits their performance.
  • The number of topics equals the number of categories, which limits their efficiency.

Discriminative methods

  • Try to construct a semantic vocabulary and use it with a classifier.
  • Liu and Shah (CVPR 2008): maximization of mutual information between visual words and videos → the formed clusters do not necessarily represent topics or synonymous words.
  • Liu et al. (CVPR 2009): use Diffusion Maps (DM) to construct a semantic visual vocabulary → considering connectivity when measuring semantic distance is not appropriate in the presence of polysemy.

SLIDE 5

Embedding into Concept Space (Proposed)

  • We propose a framework for constructing a semantic visual vocabulary by computing a rich semantic space (the concept space). The concept space is computed by latent semantic models or Canonical Correlation Analysis.
  • The visual words are embedded into the concept space to form meaningful clusters representing semantic topics; consequently the resulting histograms are more discriminative.
  • As opposed to generative methods, which do not use category labels, our method uses a classifier trained on the training histograms.
  • The number of topics can be larger than the number of categories, unlike in the unsupervised framework, which allows a more detailed analysis.
  • By using pLSA to construct the concept space, the problem of polysemy is handled.

SLIDE 6

Overview of the proposed framework

[Diagrams: constructing the semantic visual vocabulary; training steps of the proposed method]

SLIDE 7

Latent Semantic Analysis (LSA) (1)

  • Latent Semantic Analysis (LSA), originally used in text-mining applications, is the factorization of the word-video co-occurrence matrix into linear subspaces of words and videos.
  • The word vectors reveal the semantic relations of words, since semantically synonymous words occur in similar documents.

[Figure: the M×N word-video co-occurrence matrix; each row is a word vector, each column is a video vector]

SLIDE 8

Latent Semantic Analysis (LSA) (2)

  • The word vectors are sparse, so their correlation may not be representative of their semantic relations. Therefore we need to find a reduced-dimensional space.
  • Rank-L optimal representation: C ≈ C_L = U_L Σ_L V_Lᵀ, where C is the M×N word-video co-occurrence matrix (words × videos), U_L is M×L (words × topics), Σ_L is L×L (topics × topics), and V_Lᵀ is L×N (topics × videos).
  • The correlation of words based on word vectors: C Cᵀ ≈ C_L C_Lᵀ = (U_L Σ_L)(U_L Σ_L)ᵀ.
  • The rows of U_L Σ_L are a good representation of the rows of C (the words), in the sense that they approximate the correlations between words.
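The rank-L factorization can be sketched with numpy's SVD. The toy co-occurrence counts below are made up for illustration, and the code follows the slide's convention that the co-occurrence matrix has words as rows and videos as columns.

```python
import numpy as np

# toy word-video co-occurrence matrix C: M=6 words (rows) x N=4 videos (columns)
C = np.array([[3., 0., 1., 0.],
              [2., 0., 1., 0.],
              [0., 4., 0., 2.],
              [0., 3., 0., 1.],
              [1., 1., 1., 1.],
              [0., 0., 2., 3.]])

L = 2  # number of latent topics
U, s, Vt = np.linalg.svd(C, full_matrices=False)
U_L, S_L, Vt_L = U[:, :L], np.diag(s[:L]), Vt[:L]

C_L = U_L @ S_L @ Vt_L  # rank-L optimal approximation of C
words = U_L @ S_L       # row i = word i in the L-dimensional concept space

# the word correlations C C^T are approximated by words @ words.T,
# which equals C_L @ C_L.T because the rows of Vt_L are orthonormal
```

Words with similar co-occurrence patterns (e.g. the first two rows of `C`) land close together in `words`, which is what makes clustering in this space semantic rather than purely appearance-based.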

SLIDE 9

Embedding into concept space using LSA

Row i of the M×L word-coordinate matrix U_L Σ_L is the representation of word i in the L-dimensional concept space.

SLIDE 10

Probabilistic Latent Semantic Analysis (pLSA)

[Graphical model: d → z → w. Observed: the word distributions per document; latent: the word distributions per topic P(w|z) and the topic distributions per document P(z|d).]

SLIDE 11

Probabilistic Latent Semantic Analysis (pLSA)

[Graphical model: d → z → w; d and w are observed (known), z is latent (unknown)]

Likelihood: P(w, d) = P(d) Σ_z P(z|d) P(w|z), so the log-likelihood is Σ_d Σ_w n(d, w) log P(w, d), where n(d, w) is the count of word w in document d.

SLIDE 12

Probabilistic Latent Semantic Analysis (pLSA)

[Graphical model: d → z → w]

Maximum likelihood by EM:

E-step: P(z|d, w) = P(z|d) P(w|z) / Σ_{z'} P(z'|d) P(w|z')
M-step: P(w|z) ∝ Σ_d n(d, w) P(z|d, w),  P(z|d) ∝ Σ_w n(d, w) P(z|d, w)

SLIDE 13

Embedding into concept space using pLSA

SLIDE 14

Embedding into concept space using pLSA

The topic distribution conditioned on word i serves as the representation of word i in the L-dimensional concept space.
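One plausible reading of this embedding (an assumption on my part, since the slide's formula did not survive the transcript) is that word i is represented by its topic posterior P(z|w_i), obtained from a fitted pLSA by Bayes' rule. The `p_w_z` values and the uniform topic prior below are made-up illustrations.

```python
import numpy as np

# assumed inputs: p_w_z (L x W) = P(w|z) from a fitted pLSA, p_z (L,) = topic priors
p_w_z = np.array([[0.6, 0.3, 0.1, 0.0],
                  [0.0, 0.1, 0.4, 0.5]])
p_z = np.array([0.5, 0.5])  # uniform prior, an illustrative assumption

# Bayes' rule: P(z|w) is proportional to P(w|z) P(z);
# column i is then the L-dimensional concept-space representation of word i
joint = p_w_z * p_z[:, None]
p_z_w = joint / joint.sum(0, keepdims=True)
```

Words generated almost exclusively by one topic (like word 0 here) map to near-one-hot coordinates, while polysemous words spread their mass over several topics.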

SLIDE 15

Using LSA vs pLSA

  • pLSA can handle polysemy.
    – Polysemes are words that have more than one meaning (slide example: "table").

SLIDE 16

Using LSA vs pLSA

  • LSA can perform faster.

                                          LSA        pLSA
Mean training time
(having the initial vocabulary)           62 sec     4261 sec
Mean testing time
(having learned the concept space)        0.54 sec   0.71 sec

SLIDE 17

Canonical Correlation Analysis (CCA)

  • Given a pair of vector sets, CCA finds a direction for each set such that the projections of the vectors onto these directions have maximal correlation.
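The definition above can be sketched in numpy: whiten each view so its covariance is the identity, then take the SVD of the whitened cross-covariance; the singular values are the canonical correlations. The toy paired data and the ridge term `reg` are illustrative assumptions, not the slides' formulation.

```python
import numpy as np

def cca(X, Y, reg=1e-6):
    """Leading canonical directions for two paired views X (n x p) and Y (n x q)."""
    X = X - X.mean(0); Y = Y - Y.mean(0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # whiten each view (W^T C W = I), then SVD the whitened cross-covariance
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    return Wx @ U, Wy @ Vt.T, s   # direction columns for X, for Y, correlations

# toy paired views sharing one latent signal z in different coordinates
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = np.column_stack([z + 0.1 * rng.normal(size=200), rng.normal(size=200)])
Y = np.column_stack([rng.normal(size=200), z + 0.1 * rng.normal(size=200)])
A, B, s = cca(X, Y)
corr = np.corrcoef((X - X.mean(0)) @ A[:, 0], (Y - Y.mean(0)) @ B[:, 0])[0, 1]
```

The empirical correlation of the two leading projections matches the top singular value, which is exactly the "maximal correlation" the slide describes.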

SLIDE 18

Canonical Correlation Analysis (CCA) (2)

SLIDE 19

Embedding into concept space using CCA

[Diagram: the raw feature representation is noisy; in the semantic representation the noise covariance is reduced]

SLIDE 20

Constructing the semantic visual vocabulary using CCA

SLIDE 21

Local Feature Extractor

SLIDE 22

Performance of proposed method (Latent Semantic Space) on KTH dataset with different number of topics

[Plots: recognition accuracy vs. number of topics, for LSA and for pLSA]

SLIDE 23

Comparison of results with the classic framework for different sizes of vocabulary (KTH)

SLIDE 24

Confusion matrix for the best result achieved using Latent Semantic Space (KTH)

Best recognition accuracy: 93.94% by pLSA with L=50, Kf=400.

SLIDE 25

Effect of changing the vocabulary size (CCA Space)

SLIDE 26

Confusion matrix for the best result on KTH dataset using CCA

Best recognition accuracy: 93.39% by Kf=700.

SLIDE 27

Comparison with reported results on KTH dataset