1 BMVC '09 London
Heng WANG1,3, Muhammad Muneeb ULLAH2, Alexander KLÄSER1, Ivan LAPTEV2, Cordelia SCHMID1
1LEAR, INRIA, LJK – Grenoble, France 2VISTA, INRIA – Rennes, France 3LIAMA, NLPR, CASIA – Beijing, China
Evaluation of local spatio-temporal features for action recognition
– Different experimental settings
– Different datasets
– Evaluations limited to only a few descriptors
– Same datasets (varying difficulty):
– Same train / test data
– Same classification method
Space-time patches
– Detection of feature / interest points
– Description of space-time patches
– Patch representation as feature vector v = (v1, v2, ..., vn)
Bag of space-time features + SVM [Schuldt’04, Niebles’06, Zhang’07]
– Training feature vectors are clustered with k-means (k=4000)
– Each feature vector is assigned to its closest cluster center (visual word)
– An entire video sequence is represented as an occurrence histogram of visual words
– Classification with a non-linear SVM and χ2-kernel
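The bag-of-features steps listed here can be sketched in a few lines. This is a minimal, standard-library illustration, not the authors' implementation: the function names, the L1 normalisation of the histogram, and the `scale` parameter of the kernel are assumptions, and the toy example uses far fewer clusters than the k=4000 mentioned on the slide.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(vector, centers):
    """Index of the closest cluster center, i.e. the visual word."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(vector, centers[i]))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; initial centers are k random training vectors."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest_center(v, centers)].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bow_histogram(video_features, centers):
    """Occurrence histogram of visual words for one video.

    Assumption: the histogram is L1-normalised so that videos with
    different numbers of features are comparable.
    """
    hist = [0.0] * len(centers)
    for v in video_features:
        hist[nearest_center(v, centers)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def chi2_distance(h1, h2):
    """Chi-square distance between two histograms (empty bins are skipped)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def chi2_kernel(h1, h2, scale=1.0):
    """Exponential chi-square kernel; `scale` is a hypothetical tuning parameter."""
    return math.exp(-chi2_distance(h1, h2) / scale)
```

A matrix of `chi2_kernel` values between all pairs of training histograms could then be passed to any SVM implementation that accepts a precomputed kernel.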
3x3x2x4 bins HOG descriptor
descriptor
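As a small worked example, the length of such a descriptor follows from its layout. The sketch below assumes that "3x3x2x4" denotes a 3x3x2 grid of space-time cells with a 4-bin histogram per cell; the helper name `descriptor_length` is hypothetical.

```python
from math import prod


def descriptor_length(grid, bins_per_cell):
    """Number of entries when one histogram per cell is concatenated."""
    return prod(grid) * bins_per_cell


# Assumed reading of "3x3x2x4 bins": 3x3x2 cells, 4 orientation bins each.
hog_length = descriptor_length((3, 3, 2), 4)
print(hog_length)
```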
– Training samples from 16 people – Testing samples from 9 people
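A person-based split like this keeps every sample of a given subject on one side only, so the classifier is never tested on a person it has seen. The sketch below is illustrative: the sample layout (dicts with a `subject` key) and the choice of which nine subject IDs form the test set are placeholders, not the split used in the talk.

```python
def split_by_subject(samples, test_subjects):
    """Split samples so that no subject appears in both train and test."""
    train = [s for s in samples if s["subject"] not in test_subjects]
    test = [s for s in samples if s["subject"] in test_subjects]
    return train, test


# Toy data: 25 subjects, 2 clips each. The IDs 17..25 are placeholder
# test subjects chosen only to give a 16 / 9 split.
samples = [{"subject": sid, "clip": c} for sid in range(1, 26) for c in range(2)]
train, test = split_by_subject(samples, test_subjects=set(range(17, 26)))
```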
– Large number of features on static background
– We extend the dataset by flipping videos
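Flipping as a form of data augmentation can be sketched as follows. The example assumes horizontal mirroring and represents a video as a list of frames, each frame a list of pixel rows; the function names are hypothetical and a real pipeline would operate on image arrays instead.

```python
def flip_frame(frame):
    """Mirror one frame left-to-right by reversing every pixel row."""
    return [row[::-1] for row in frame]


def flip_video(frames):
    """Mirror every frame of a video; the temporal order is unchanged."""
    return [flip_frame(frame) for frame in frames]


def augment_with_flips(videos):
    """Return the original (frames, label) pairs plus a flipped copy of each."""
    augmented = []
    for frames, label in videos:
        augmented.append((frames, label))
        augmented.append((flip_video(frames), label))
    return augmented
```

The flipped copy keeps the label of the original, so the number of samples doubles without any new annotation.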
– Importance of realistic video data
– Limitations of current feature detectors
– Note: large number of features (15-20 times more)
– HOG/HOF > HOG3D > Cuboids > ESURF & HOG
– Combination of gradients + optical flow seems a good choice
                Harris3D + HOG/HOF  Hessian + ESURF  Cuboid Det.+Desc.  Dense + HOG3D  Dense + HOG/HOF
Frames/sec      1.6                 4.6              0.9                0.8            1.2
Features/frame  31                  19               44                 643            643