Localisation and Recognition of Human Actions, Ioannis Patras (PowerPoint presentation)

SLIDE 1

CVPR 2011

Ioannis Patras

Localisation and Recognition of Human Actions

Ioannis Patras

School of Electronic Engineering and Computer Science Queen Mary University of London

in collaboration with

A. Oikonomopoulos and M. Pantic, Imperial College London
I. Kotsia and Guo Weiwei, Queen Mary University of London

SLIDE 2

Related research in QMUL

Scene analysis (Izquierdo, Diplaros): object detection, semantic segmentation

Motion Analysis (Lagendijk, Hendriks, Hancock): motion estimation / segmentation, object tracking

Facial (Expression) Analysis (Pantic, Koelstra, Rudovic): head tracking / facial feature tracking, facial expression recognition

Action / Gesture Recognition (Kotsia, Guo, Kumar, Pantic): spatio-temporal representations for action recognition, pose estimation

Brain Computer Interfaces

Research areas: Static Analysis; Dynamic Vision; Looking at / sensing people

URL: www.eecs.qmul.ac.uk/~ioannisp/

SLIDE 3

Looking at/sensing people

Facial (Expression) Analysis: head tracking / facial feature tracking, facial expression recognition

Action / Gesture Recognition: action recognition and localisation, pose estimation, tensor-based space-time analysis

Brain Computer Interfaces

SLIDE 4

Localisation of Human Actions

Oikonomopoulos, Patras, Pantic, IEEE Transactions on Image Processing, Mar. 2011.

Goal:
Recognize categories of actions
Localize them in terms of their bounding box (space + time)

Challenges: occlusions, clutter, variations, …

Hypothesis: analysis can be restricted to a set of spatiotemporally 'interesting'/salient events

SLIDE 5

Information theoretical spatial saliency

T. Kadir and M. Brady, IJCV, Nov. 2001

Proposal: Use signal unpredictability as an indicator of saliency

[Figure: two example image regions with entropy $H_D = 3.866$ and $H_D = 7.201$]

Spatial Saliency: Unpredictability in a single frame
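As a rough illustration of "unpredictability in a single frame", the sketch below computes the Shannon entropy of the intensity histogram of a small patch. The function name `patch_entropy`, the 16-bin quantisation and the two toy patches are assumptions made for this example, not details from the slides.

```python
import math
from collections import Counter

def patch_entropy(patch, bins=16, max_value=256):
    """Shannon entropy (in bits) of the quantised intensity histogram of a patch."""
    values = [v for row in patch for v in row]
    # Quantise each intensity into one of `bins` equal-width bins.
    counts = Counter(v * bins // max_value for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical 8x8 patches: a flat one and a strongly varying one.
flat = [[128] * 8 for _ in range(8)]
textured = [[(17 * (r * 8 + c)) % 256 for c in range(8)] for r in range(8)]

print("flat:", patch_entropy(flat))
print("textured:", patch_entropy(textured))
```

The flat patch falls into a single histogram bin and is fully predictable, while the textured patch spreads over all bins and scores higher.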

SLIDE 6

Towards scale invariance

The entropy maxima reveal the spatial scale(s) of a salient region

[Figure: entropy plotted against scale (circle radius)]

Detected salient points in a single frame
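The scale-selection idea can be sketched as follows: compute the entropy inside discs of increasing radius around a point and keep the radii at which the entropy peaks. The helper names `disc_entropy` and `entropy_peaks`, the synthetic image and the range of radii are all assumptions for illustration only.

```python
import math
from collections import Counter

def disc_entropy(image, cx, cy, radius, bins=16, max_value=256):
    """Entropy (bits) of the intensity histogram inside a disc around (cx, cy)."""
    counts = Counter()
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                counts[v * bins // max_value] += 1
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_peaks(radii, entropies):
    """Radii at which the entropy is a strict local maximum over scale."""
    return [radii[i] for i in range(1, len(radii) - 1)
            if entropies[i - 1] < entropies[i] > entropies[i + 1]]

# Hypothetical 41x41 image: a textured blob of radius 8 on a flat background.
size, centre = 41, 20
image = [[(37 * (x + 3 * y)) % 256
          if (x - centre) ** 2 + (y - centre) ** 2 <= 64 else 0
          for x in range(size)] for y in range(size)]

radii = list(range(2, 20))
entropies = [disc_entropy(image, centre, centre, r) for r in radii]
print(entropy_peaks(radii, entropies))
```

A full detector would repeat this at every candidate pixel and keep only the strongest peaks; the sketch only shows the per-point scale search.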

SLIDE 7

Spatial and spatiotemporal saliency

Oikonomopoulos, Patras, Pantic, IEEE Transactions on SMC, Part B, 2006

Spatiotemporal saliency: driven by signal unpredictability in a spatiotemporal volume (cylinder / sphere)

Examine the entropy:

$$Y(v_k) = H(v_k)\, w(v_k)$$

where $H$ is the entropy's 'height' and $w$ is the entropy's 'peakness':

$$w(s, d, u) = s \int_{q \in D} \left| \frac{\partial}{\partial s}\, p(s, d, u) \right| dq \;+\; d \int_{q \in D} \left| \frac{\partial}{\partial d}\, p(s, d, u) \right| dq$$

SLIDE 8

Descriptor extraction – codebook creation

Input sequence → optical flow (after median subtraction) → spatiotemporal salient point detection

Descriptors: optical flow + spatial gradient. Bin in histograms and concatenate.

Feature ensembles (O. Boiman & M. Irani, ICCV'05) → feature selection → ensemble codewords

Codebook (class-specific): c1, c2, …, cN
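The "bin in histograms and concatenate" step might look like the sketch below. It assumes magnitude-weighted orientation histograms and treats median subtraction as removal of the median flow vector; both choices, and all function names, are assumptions for illustration and not the exact descriptor used in the cited work.

```python
import math
from statistics import median

def subtract_median(flow):
    """Remove the median flow vector (assumed here to compensate global motion)."""
    mx = median(dx for dx, _ in flow)
    my = median(dy for _, dy in flow)
    return [(dx - mx, dy - my) for dx, dy in flow]

def orientation_histogram(vectors, bins=8):
    """Magnitude-weighted histogram of vector orientations, L1-normalised."""
    hist = [0.0] * bins
    for dx, dy in vectors:
        mag = math.hypot(dx, dy)
        if mag == 0.0:
            continue  # zero vectors carry no orientation
        angle = math.atan2(dy, dx) % (2 * math.pi)
        hist[min(int(angle / (2 * math.pi) * bins), bins - 1)] += mag
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

def descriptor(flow_vectors, gradient_vectors, bins=8):
    """Concatenate the flow histogram and the spatial-gradient histogram."""
    return (orientation_histogram(flow_vectors, bins)
            + orientation_histogram(gradient_vectors, bins))

# Hypothetical flow and gradient samples around one salient point.
flow = subtract_median([(1.0, 0.2), (1.2, 0.1), (0.1, 0.0), (0.9, 0.3)])
grad = [(0.0, 2.0), (0.5, 1.5), (-1.0, 0.5)]
print(descriptor(flow, grad))
```

Descriptors of this kind would then be clustered per class to form the codebook c1, …, cN.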

SLIDE 9

Class-dependent Spatio-temporal probabilistic voting

[Figure: timeline of an action of length T; the current frame lies t frames after the start and T-t frames before the end]

Parameters stored for each ensemble in the training set

average spatial position of the ensemble with respect to the subject center and lower bound
distance in frames of the activated ensemble from the start/end of the action
average spatiotemporal scale of the ensemble

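Localisation model learned for codeword/cluster $c_i$:

$$p(\theta \mid c_i) = w_i \sum_{d} p(\theta \mid e_d)\, p(e_d \mid c_i), \qquad \theta \in \{X, T, S\}$$

where $e_d$ are the training ensembles and $X$, $T$, $S$ denote spatial position, temporal position and scale.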

SLIDE 10

Discriminative learning

Higher weights for pdfs with low localisation entropy

The class dictionary comprises discriminative codewords

Adaboost on the codeword similarities

Weight of the localisation pdf $p(\theta \mid c_i)$ of codeword $c_i$:

$$w_i = \exp\left( \int p(\theta \mid c_i)\, \log p(\theta \mid c_i)\, d\theta \right)$$

SLIDE 11

Discriminative learning

Higher weights for pdfs with low temporal localisation entropy
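A minimal sketch of such a weighting, assuming the weight is the exponential of the negative entropy of a discretised localisation pdf (the function name and the toy pdfs are hypothetical):

```python
import math

def localisation_weight(pdf):
    """exp(-entropy) of a discrete pdf: 1 for a single spike, 1/n for uniform over n bins."""
    entropy = -sum(p * math.log(p) for p in pdf if p > 0)
    return math.exp(-entropy)

# Hypothetical temporal localisation pdfs of two codewords (4 time bins each).
peaked = [1.0, 0.0, 0.0, 0.0]      # always fires at the same phase of the action
spread = [0.25, 0.25, 0.25, 0.25]  # fires anywhere, so it localises poorly

print(localisation_weight(peaked), localisation_weight(spread))
```

A codeword that always appears at the same point of the action therefore votes with a higher weight than one that appears throughout it.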

SLIDE 12

Spatio-temporal probabilistic voting

Extension to the space-time domain of the 'Implicit Shape Model', Leibe et al., ECCV'04
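The voting step can be sketched as below, under simplifying assumptions: each activated codeword stores a few (offset, weight) pairs, votes are cast for the subject centre and the start frame of the action, and a weighted Gaussian mean shift finds the mode. The codebook contents, the vote space (x, y, start frame) and the bandwidth are hypothetical.

```python
import math

def cast_votes(detections, codebook):
    """Each detection (codeword id, x, y, t) votes for (centre x, centre y, start frame)."""
    votes = []
    for cid, x, y, t in detections:
        for (dx, dy, dt), weight in codebook[cid]:
            votes.append(((x - dx, y - dy, t - dt), weight))
    return votes

def mean_shift_mode(votes, start, bandwidth=5.0, iters=100, tol=1e-6):
    """Weighted Gaussian-kernel mean shift in the voting space, started at `start`."""
    centre = list(start)
    for _ in range(iters):
        total, acc = 0.0, [0.0] * len(centre)
        for point, w in votes:
            d2 = sum((p - c) ** 2 for p, c in zip(point, centre))
            k = w * math.exp(-d2 / (2 * bandwidth ** 2))
            total += k
            acc = [a + k * p for a, p in zip(acc, point)]
        if total == 0.0:
            break  # no vote within numerical reach of the current centre
        new = [a / total for a in acc]
        shift = math.sqrt(sum((n - c) ** 2 for n, c in zip(new, centre)))
        centre = new
        if shift < tol:
            break
    return tuple(centre)

# Hypothetical codebook: offsets (dx, dy, frames since action start) and weights.
codebook = {0: [((10, 0, 5), 1.0)], 1: [((-10, 0, 12), 1.0)], 2: [((0, 5, 20), 0.5)]}
detections = [(0, 60, 40, 35), (1, 40, 40, 42), (2, 50, 45, 50), (0, 200, 10, 90)]

votes = cast_votes(detections, codebook)
start = max(votes, key=lambda v: v[1])[0]  # begin at the strongest vote
print(mean_shift_mode(votes, start))
```

Three of the four detections agree on the same hypothesis, so the mode sits there and the isolated clutter vote has negligible influence.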

SLIDE 13

Hypothesis verification with Relevance Vector Machine classification

Mean-shift responses used as features in RVM-based classification

Two class classification problem (one-vs-all)
Select class l that maximizes the posterior probability

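Feature vector of mean-shift responses:

$$F = (f_1, \ldots, f_i, \ldots)$$

Kernel:

$$K(F, F') = e^{-\frac{D(F, F')^2}{2C^2}}$$

Classifier for class $l$:

$$c_l(F; \mathbf{w}) = \sum_{j=1}^{N} w_{lj}\, K(F, F_j) + w_{l0}$$

Posterior probability:

$$p(l \mid F) = \frac{1}{1 + e^{-c_l(F; \mathbf{w})}}$$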
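The verification step can be illustrated with already-trained one-vs-all models: a kernel expansion per class, a sigmoid to obtain a posterior, and an argmax over classes. The Euclidean distance inside the kernel, the toy relevance vectors, weights and biases are assumptions; training of the RVM itself is not shown.

```python
import math

def gaussian_kernel(f, g, c=1.0):
    """Gaussian kernel on the (assumed Euclidean) distance between feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(f, g))
    return math.exp(-d2 / (2 * c ** 2))

def posterior(f, relevance_vectors, weights, bias, c=1.0):
    """Sigmoid of the kernel expansion: bias + sum_j w_j K(f, F_j)."""
    score = bias + sum(w * gaussian_kernel(f, r, c)
                       for w, r in zip(weights, relevance_vectors))
    return 1.0 / (1.0 + math.exp(-score))

def classify(f, models):
    """Select the class whose one-vs-all model gives the highest posterior."""
    return max(models, key=lambda label: posterior(f, *models[label]))

# Hypothetical trained models: label -> (relevance vectors, weights, bias).
models = {
    "walk": ([(1.0, 0.0)], [4.0], -2.0),
    "run": ([(0.0, 1.0)], [4.0], -2.0),
}
print(classify((0.9, 0.1), models))
```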

SLIDE 14

Localisation of single actions

SLIDE 15

Localisation accuracy (KTH)

SLIDE 16

Localisation accuracy (KTH)

[SS-PE] Shechtman, E., Irani, M.: Matching local self-similarities across images and videos. CVPR 2007

SLIDE 17

Action recognition

KTH dataset – average : 88%
HoHA dataset – average : 37%

SLIDE 18

Localisation under artificial occlusions (KTH)

SLIDE 19

Localisation under clutter (KTH)

SLIDE 20

Conclusions

Voting schemes based on local descriptors are robust to occlusions
Good localisation and recognition accuracy
Relies on annotation in terms of action localisation
More suitable for gestures than for less 'structured' actions

SLIDE 21

Support Tensor Learning

I. Kotsia and I. Patras, "Support Tucker Machines", CVPR 2011, Thursday afternoon
I. Kotsia and I. Patras, "Relative Margin Support Tensor Machines for gait and action recognition", in CIVR 2010.

SLIDE 22

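$$\min_{w, b, \xi}\ \frac{1}{2} w^T w + C \sum_{j=1}^{N} \xi_j \quad \text{s.t.} \quad y_j \left( w^T \phi(g_j) + b \right) \ge 1 - \xi_j, \quad \xi_j \ge 0, \quad j = 1, \ldots, N$$
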

Vector-based methods ignore the space(-time) structure of the visual data

Motivation

Large dimensionality in the case of linear SVMs

SLIDE 23

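$$\min_{W, b, \xi}\ f(W) + C \sum_{j=1}^{N} \xi_j, \quad \text{where } f(W) \text{ is a regularisation term, e.g. } f(W) = \langle W, W \rangle,$$

$$\text{s.t.} \quad y_j \left( \langle X_j, W \rangle + b \right) \ge 1 - \xi_j, \quad \xi_j \ge 0, \quad j = 1, \ldots, N$$
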

Variants of linear SVMs, where constraints are imposed on the separating tensorplane

Tensor Machines

Smaller dimensionality, structural constraints
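To make the dimensionality argument concrete, the sketch below counts free weight parameters for a vectorised linear SVM, a rank-one weight tensor and a Tucker-structured weight tensor. The input size 32×32×16 and the Tucker core size 4×4×2 are hypothetical, and bias terms are ignored.

```python
from math import prod

def vector_params(dims):
    """Linear SVM on the vectorised input: one weight per tensor entry."""
    return prod(dims)

def rank_one_params(dims):
    """Rank-one weight tensor: one weight vector per mode."""
    return sum(dims)

def tucker_params(dims, ranks):
    """Tucker-structured weights: core tensor plus one factor matrix per mode."""
    return prod(ranks) + sum(d * r for d, r in zip(dims, ranks))

dims = (32, 32, 16)   # hypothetical height x width x frames
ranks = (4, 4, 2)     # hypothetical Tucker core size

print(vector_params(dims), rank_one_params(dims), tucker_params(dims, ranks))
```

The structured models trade expressiveness for far fewer parameters, with the Tucker form sitting between the rank-one and the unconstrained case.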

Support Tensor Machines

[16] D. Tao et al., Knowledge and Information Systems, 13(1):1–42, 2007

I. Kotsia, I. Patras, CVPR 2011

Support Tucker Machines

I. Kotsia, I. Patras, CVPR 2011

Σ/Σw Support Tucker Machines

I. Kotsia, I. Patras, CVPR 2011

SLIDE 24

Non-convex optimization problem w.r.t. A, B, C and the core tensor G
But: convex w.r.t. A or B or C or G alone
Block coordinate optimization: e.g. optimization w.r.t. G keeping A, B, C fixed

Each step can be reduced to a vector-based SVM-like constrained optimization problem, e.g.

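$$\min_{G_{(1)}, b, \xi}\ \frac{1}{2} \left( A\, \mathrm{vec}(G_{(1)}) \right)^T \left( A\, \mathrm{vec}(G_{(1)}) \right) + C \sum_{i=1}^{M} \xi_i,$$

$$\text{s.t.} \quad y_i \left[ \left( A\, \mathrm{vec}(G_{(1)}) \right)^T \mathrm{vec}(X_{i(1)}) + b \right] \ge 1 - \xi_i$$
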

Supervised learning
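The block coordinate idea can be sketched on the simplest case, a rank-one matrix weight W = a bᵀ (the two-mode Support Tensor Machine setting rather than the full Tucker model). With b fixed, aᵀXb is linear in a with features Xb, and vice versa, so each block is a vector SVM-like problem. Here a plain subgradient descent on the hinge loss stands in for a proper SVM solver; the data, step size and iteration counts are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(X, v):
    return [dot(row, v) for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def svm_block_step(w, bias, feats, labels, reg, C=1.0, lr=0.01, steps=50):
    """Subgradient descent on 0.5*reg*||w||^2 + C*sum(hinge); stand-in for an SVM solver."""
    for _ in range(steps):
        grad_w, grad_b = [reg * wi for wi in w], 0.0
        for z, y in zip(feats, labels):
            if y * (dot(w, z) + bias) < 1:  # margin violated
                grad_w = [g - C * y * zi for g, zi in zip(grad_w, z)]
                grad_b -= C * y
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        bias -= lr * grad_b
    return w, bias

def fit_rank1(Xs, ys, outer=10):
    """Alternate between the two convex sub-problems of the rank-one model."""
    a, b, bias = [1.0] * len(Xs[0]), [1.0] * len(Xs[0][0]), 0.0
    for _ in range(outer):
        # Fix b: the decision function is linear in a with features X b.
        a, bias = svm_block_step(a, bias, [matvec(X, b) for X in Xs], ys, reg=dot(b, b))
        # Fix a: the decision function is linear in b with features X^T a.
        b, bias = svm_block_step(b, bias, [matvec(transpose(X), a) for X in Xs], ys, reg=dot(a, a))
    return a, b, bias

def predict(X, a, b, bias):
    return 1 if dot(a, matvec(X, b)) + bias >= 0 else -1

# Hypothetical 2x2 training "tensors" with labels +1 / -1.
Xs = [[[1, 1], [1, 1]], [[2, 1], [1, 0]], [[-1, -1], [-1, -1]], [[-2, -1], [-1, 0]]]
ys = [1, 1, -1, -1]
a, b, bias = fit_rank1(Xs, ys)
print([predict(X, a, b, bias) for X in Xs])
```

The regulariser of each block is scaled by the squared norm of the fixed factor, since the squared norm of a bᵀ is the product of the squared norms of a and b.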

SLIDE 25

| Probe set | State of the art (five methods) | SVMs | STMs [16] (w vector) | STMs (W tensor) | RMSTMs (W tensor) | STuMs (W tensor) | Σw-STuMs (W tensor) |
|---|---|---|---|---|---|---|---|
| A | 100/100 | 80/97 | 92/100 | 99/100 | 100/100 | 99/100 | 100/100 |
| B | 89/90 | 79/93 | 81/90 | 85/93 | 89/97 | 85/93 | 87/95 |
| C | 83/88 | 68/85 | 73/88 | 79/93 | 83/95 | 79/90 | 81/91 |
| D | 39/55 | 30/54 | 47/67 | 53/72 | 56/75 | 53/71 | 55/74 |
| E | 33/55 | 23/46 | 48/79 | 62/88 | 65/91 | 63/86 | 65/90 |
| F | 30/46 | 24/49 | 29/49 | 41/71 | 44/74 | 42/63 | 44/66 |
| G | 29/48 | 12/37 | 31/71 | 50/88 | 53/90 | 52/87 | 54/90 |
| Average | | 45/62 | 57/68 | 67/86 | 70/89 | 68/84 | 69/87 |

Gait Recognition (USF dataset)

Significant improvements in comparison to state of the art

SLIDE 26

KTH recognition

[7] T.K. Kim and R. Cipolla, "Canonical Correlation Analysis of Video Volume Tensors for Action Categorization and Detection", IEEE PAMI, vol. 31, no. 8, pp. 1415-1428, August 2009

Input features: dense oriented gradients (at each pixel)
Results comparable to state of the art, using very simple features

SLIDE 27

Conclusions

Tensors exploit the topology of the data better than vectors
The proposed algorithms (STuMs and Σ/Σw-STuMs) consistently outperform previous approaches, producing state of the art results

Limitations:

Requires good alignment of the input data
More suitable for gestures than for less 'structured' actions

SLIDE 28

References

A. Oikonomopoulos, I. Patras and M. Pantic, "Spatiotemporal Localization and Categorization of Human Actions in Unsegmented Image Sequences", IEEE Trans. Image Processing, vol. 20, no. 4, pp. 1126-1140, Mar. 2011

I. Kotsia and I. Patras, "Support Tucker Machines", Int'l Conf. Computer Vision and Pattern Recognition, Jun. 2011, Colorado, USA

I. Kotsia and I. Patras, "Relative Margin Support Tensor Machines for gait and action recognition", Int'l Conf. Image and Video Retrieval, 5-7 July 2010, Xi'an, China

S. Koelstra, M. Pantic and I. Patras, "A Dynamic Texture based Approach to Recognition of Facial Actions and their Temporal Models", IEEE Trans. Pattern Analysis and Machine Intelligence, Nov. 2010

O. Rudovic, I. Patras and M. Pantic, "Coupled Gaussian Process Regression for pose-invariant facial expression recognition", European Conf. Computer Vision (ECCV'10), pp. 350-363, Heraklion, Crete, Greece, Sept. 2010