Localisation and Recognition of Human Actions, Ioannis Patras



SLIDE 1

CVPR 2011

Ioannis Patras


Localisation and Recognition of Human Actions

Ioannis Patras

School of Electronic Engineering and Computer Science Queen Mary University of London

in collaboration with

  • A. Oikonomopoulos and M. Pantic, Imperial College London
  • I. Kotsia and Guo Weiwei, Queen Mary University of London

SLIDE 2

Related research in QMUL

  • Scene analysis (Izquierdo, Diplaros): object detection, semantic segmentation
  • Motion Analysis (Lagendijk, Hendriks, Hancock): motion estimation / segmentation, object tracking
  • Facial (Expression) Analysis (Pantic, Koelstra, Rudovic): head tracking / facial feature tracking, facial expression recognition
  • Action / Gesture Recognition (Kotsia, Guo, Kumar, Pantic): spatio-temporal representations for action recognition, pose estimation
  • Brain Computer Interfaces

Groupings shown on the slide: Dynamic Vision; Looking at / sensing people; Static Analysis

URL: www.eecs.qmul.ac.uk/~ioannisp/

SLIDE 3

Looking at/sensing people

  • Facial (Expression) Analysis: head tracking / facial feature tracking, facial expression recognition
  • Action / Gesture Recognition: action recognition and localisation, pose estimation, tensor-based space-time analysis
  • Brain Computer Interfaces

SLIDE 4

Localisation of Human Actions

Oikonomopoulos, Patras, Pantic, IEEE Transactions on Image Processing, Mar. 2011.

Goal:
  • Recognize categories of actions
  • Localize them in terms of their bounding box (space + time)

Challenges: occlusions, clutter, variations, …

Hypothesis: Analysis can be restricted to a set of spatiotemporally 'interesting'/salient events

SLIDE 5

Information theoretical spatial saliency

  • T. Kadir and M. Brady. IJCV, Nov. 2001

Proposal: Use signal unpredictability as an indicator of saliency

Figure: two example image regions, with entropy $H_D = 3.866$ and $H_D = 7.201$

Spatial Saliency: Unpredictability in a single frame
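
As a rough illustration of "unpredictability as saliency", the sketch below computes the Shannon entropy of the grey levels in a patch. It is a minimal stand-in, not the Kadir and Brady detector: the function name `patch_entropy` and the two toy patches are assumptions made for this example.

```python
import math
from collections import Counter

def patch_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical toy patches (flattened lists of grey levels).
flat_patch = [128] * 16                     # uniform region: fully predictable
textured_patch = list(range(0, 256, 16))    # 16 distinct grey levels

flat_h = patch_entropy(flat_patch)
textured_h = patch_entropy(textured_patch)
print(flat_h, textured_h)
```

The flat patch carries no uncertainty, while the patch with 16 equally likely grey levels reaches the maximum of 4 bits, so it would be ranked as more salient.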

SLIDE 6

Figure: entropy as a function of scale (circle radius)

Towards scale invariance

The entropy maxima reveal the spatial scale(s) of a salient region

Detected salient points in a single frame
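
Scale selection by entropy maxima can be sketched as follows: entropy is measured in discs of growing radius around a point, and radii where the entropy peaks are kept. This is a toy under stated assumptions: the helper names, the 11x11 synthetic image and the "every blob pixel has a unique grey level" texture are invented for the example.

```python
import math
from collections import Counter

def disc_entropy(image, cy, cx, radius):
    """Entropy (bits) of the grey levels inside a disc of the given radius."""
    values = [image[y][x]
              for y in range(len(image))
              for x in range(len(image[0]))
              if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2]
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def entropy_peaks(image, cy, cx, radii):
    """Radii at which the entropy is a strict local maximum over `radii`."""
    h = [disc_entropy(image, cy, cx, r) for r in radii]
    return [radii[k] for k in range(1, len(h) - 1) if h[k - 1] < h[k] > h[k + 1]]

# Synthetic image: a textured blob of radius 2 on a constant background.
SIZE, CENTRE, BLOB_RADIUS = 11, 5, 2
image = [[y * SIZE + x + 1
          if (y - CENTRE) ** 2 + (x - CENTRE) ** 2 <= BLOB_RADIUS ** 2 else 0
          for x in range(SIZE)] for y in range(SIZE)]

peaks = entropy_peaks(image, CENTRE, CENTRE, [1, 2, 3, 4])
print(peaks)  # [2]: the entropy peaks at the blob radius
```

Below the blob radius the disc sees too few grey levels; above it the constant background dominates, so the entropy drops again.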

SLIDE 7

Spatial and spatiotemporal saliency

Oikonomopoulos, Patras, Pantic, IEEE Transactions on SMC, Part B, 2006

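Spatiotemporal Saliency: Driven by signal unpredictability in a spatiotemporal volume (cylinder / sphere)

Examine entropy:

$$ Y(\mathbf{v}_k) = H(\mathbf{v}_k)\, w(\mathbf{v}_k) $$

where $H$ is the entropy's 'height' and $w$ is the entropy's 'peakness':

$$ w(s, d, \mathbf{u}) = s \int_{q \in D} \left| \frac{\partial}{\partial s}\, p(s, d, \mathbf{u}) \right| dq \; + \; d \int_{q \in D} \left| \frac{\partial}{\partial d}\, p(s, d, \mathbf{u}) \right| dq $$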

SLIDE 8

Descriptor extraction – codebook creation

Pipeline (figure):
  • Input sequence (over time t)
  • Optical flow, after median subtraction
  • Spatiotemporal salient point detection
  • Optical flow + spatial gradient descriptors: bin in histograms and concatenate
  • Feature ensembles (O. Boiman & M. Irani, ICCV'05)
  • Feature selection
  • Ensemble codewords c1, c2, …, cN, forming a class-specific codebook
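
The descriptor step (median-subtracted optical flow and spatial gradients, binned in histograms and concatenated) might look roughly like the sketch below. The function names, the 4-bin orientation quantisation, the magnitude weighting and the toy vectors are assumptions for illustration; the slide does not specify these details.

```python
import math
from statistics import median

def subtract_median_flow(flow):
    """Remove the dominant motion by subtracting the per-component median."""
    mu = median(u for u, _ in flow)
    mv = median(v for _, v in flow)
    return [(u - mu, v - mv) for u, v in flow]

def orientation_histogram(vectors, n_bins=4):
    """Magnitude-weighted histogram of vector orientations, L1-normalised."""
    hist = [0.0] * n_bins
    for dx, dy in vectors:
        mag = math.hypot(dx, dy)
        if mag == 0.0:
            continue  # zero vectors have no orientation
        angle = math.atan2(dy, dx) % (2 * math.pi)
        hist[int(angle / (2 * math.pi) * n_bins) % n_bins] += mag
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def describe(flow, gradients, n_bins=4):
    """Concatenate the flow histogram and the gradient histogram."""
    return (orientation_histogram(subtract_median_flow(flow), n_bins)
            + orientation_histogram(gradients, n_bins))

# Hypothetical neighbourhood of a salient point: (u, v) flow and (gx, gy) gradients.
flow = [(2, 0), (2, 0), (2, 0), (3, 1), (1, -1)]
gradients = [(1, 1), (-1, 1)]
descriptor = describe(flow, gradients)
print(descriptor)
```

Subtracting the median flow (2, 0) leaves only the two vectors that move relative to the dominant motion, which is what the histogram then encodes.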

SLIDE 9

Class-dependent Spatio-temporal probabilistic voting

Figure: current frame at time t within an action of duration T (t frames from the start, T-t frames from the end)

  • Parameters stored for each ensemble in the training set:
      • average spatial position of the ensemble with respect to the subject center and lower bound
      • distance in frames of the activated ensemble from the start/end of the action
      • average spatiotemporal scale of the ensemble

  • Localisation model learned for codeword/cluster $c_i$:

$$ p(\theta \mid c_i) = w_i \sum_{e_d} p(e_d \mid c_i)\, p(\theta \mid e_d) $$

Figure labels: X, T, S; ensembles $e_d$ and codewords $c_i$; $p_X(x \mid c_i)$

SLIDE 10

Discriminative learning

  • Higher weights for pdfs with low localisation entropy
  • Class dictionary consists of discriminative codewords
  • AdaBoost on the codeword similarities

$$ w_i = \exp\left( \int p(\theta \mid c_i)\, \log p(\theta \mid c_i)\, d\theta \right) $$
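
The idea of giving higher weights to pdfs with low localisation entropy can be illustrated with a discrete toy. The sketch assumes the weight is exp(-entropy) of a pdf given as a list of probabilities; the function name and the two example pdfs are made up for this example.

```python
import math

def localisation_weight(pdf):
    """exp(-entropy) of a discrete pdf: peaked pdfs get weights close to 1."""
    neg_entropy = sum(p * math.log(p) for p in pdf if p > 0)
    return math.exp(neg_entropy)

# Hypothetical localisation pdfs of two codewords over four position bins.
peaked = [1.0, 0.0, 0.0, 0.0]       # always votes for the same position
spread = [0.25, 0.25, 0.25, 0.25]   # votes are uninformative

w_peaked = localisation_weight(peaked)
w_spread = localisation_weight(spread)
print(w_peaked, w_spread)
```

A codeword whose votes always land in the same place keeps full weight, while one that votes uniformly over four bins is down-weighted by a factor of four.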

SLIDE 11

Discriminative learning

Higher weights for pdfs with low temporal localisation entropy

SLIDE 12

Spatio-temporal probabilistic voting

Extension in the space time domain of ‘Implicit Shape Model’, Leibe et al., ECCV’04
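
A heavily simplified sketch of voting in space-time: each activated codeword casts weighted votes for the action centre and start frame, and the votes are accumulated. A coarse grid accumulator stands in here for the mode-seeking step, and all names, offsets and weights are hypothetical values chosen for the example.

```python
from collections import defaultdict

def cast_votes(detections, cell=10):
    """Accumulate weighted votes for (x_centre, y_centre, t_start) in a coarse grid."""
    acc = defaultdict(float)
    for (x, y, t), offsets in detections:
        for (dx, dy, dt), weight in offsets:
            key = ((x + dx) // cell, (y + dy) // cell, (t + dt) // cell)
            acc[key] += weight
    return acc

def best_hypothesis(acc, cell=10):
    """Return the centre of the strongest grid cell and its accumulated score."""
    key = max(acc, key=acc.get)
    return tuple(k * cell + cell // 2 for k in key), acc[key]

# Each detection: position (x, y, frame) and its learned (offset, weight) votes.
detections = [
    ((100, 50, 12), [((-20, 30, -10), 0.6), ((40, 0, -5), 0.2)]),
    ((60, 110, 20), [((22, -28, -17), 0.7)]),
    ((90, 70, 30), [((-5, 12, -28), 0.5)]),
]

centre, score = best_hypothesis(cast_votes(detections))
print(centre, score)
```

Three of the four votes agree on roughly the same centre and start frame, so that cell wins even though one codeword also votes elsewhere; this is why such schemes tolerate missing or spurious local features.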

SLIDE 13

Hypothesis verification with Relevance Vector Machine classification

  • Mean-shift responses used as features in RVM-based classification
  • Two-class classification problem (one-vs-all)
  • Select the class l that maximizes the posterior probability

$$ F = (f_1, \ldots, f_i, \ldots) $$

$$ K(F, F') = e^{-\frac{D^2(F, F')}{2C^2}} $$

$$ c_l(F; \mathbf{w}) = \sum_{j=1}^{N} w_{lj}\, K(F, F_j) + w_{l0} $$

$$ p(l \mid F) = \frac{1}{1 + e^{-c_l(F; \mathbf{w})}} $$
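
The verification step can be sketched as a kernel expansion per class followed by a logistic link, with the class of highest posterior selected. This is only the prediction side, not RVM training; it assumes a Euclidean distance inside a Gaussian-type kernel, and the class names, relevance vectors, weights and biases are invented for the example.

```python
import math

def kernel(f, g, c=1.0):
    """Gaussian-type kernel on the squared Euclidean distance (an assumption)."""
    d2 = sum((a - b) ** 2 for a, b in zip(f, g))
    return math.exp(-d2 / (2 * c ** 2))

def class_score(f, relevance_vectors, weights, bias):
    """Kernel expansion: bias plus weighted kernel responses."""
    return bias + sum(w * kernel(f, r) for w, r in zip(weights, relevance_vectors))

def posterior(score):
    """Logistic link mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-score))

def classify(f, models):
    """One-vs-all: pick the class whose model gives the highest posterior."""
    post = {label: posterior(class_score(f, *params)) for label, params in models.items()}
    return max(post, key=post.get), post

# Hypothetical one-vs-all models: label -> (relevance vectors, weights, bias).
models = {
    "walking": ([[1.0, 0.0]], [2.0], -1.0),
    "boxing": ([[0.0, 1.0]], [2.0], -1.0),
}
label, post = classify([0.9, 0.1], models)
print(label, post)
```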

SLIDE 14

Localisation of single actions

SLIDE 15

Localisation accuracy (KTH)

SLIDE 16

Localisation accuracy (KTH)

[SS-PE] Shechtman, E., Irani, M.: Matching local self-similarities across images and videos. CVPR 2007

SLIDE 17

Action recognition

  • KTH dataset – average : 88%
  • HoHA dataset – average : 37%

SLIDE 18

Localisation under artificial occlusions (KTH)

SLIDE 19

Localisation under clutter (KTH)

SLIDE 20

Conclusions

  • Voting schemes based on local descriptors are robust to occlusions
  • Good localisation and recognition accuracy
  • Relies on annotation in terms of action localisation.
  • More suitable for gestures rather than less 'structured' actions

SLIDE 21

Support Tensor Learning

  • I. Kotsia and I. Patras, “Support Tucker Machines” CVPR 2011, Thursday afternoon
  • I. Kotsia and I. Patras, "Relative Margin Support Tensor Machines for gait and action recognition", in CIVR 2010.

SLIDE 22

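Motivation

$$ \min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{j=1}^{N}\xi_j \quad \text{s.t.} \quad y_j\left(\mathbf{w}^T\phi(\mathbf{g}_j) + b\right) \ge 1 - \xi_j, \;\; \xi_j \ge 0, \;\; j = 1, \ldots, N $$

  • Vector-based methods ignore the space (time) structure of the visual data
  • Large dimensionality in the case of linear SVMs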

SLIDE 23

Tensor Machines

Variants of linear SVMs, where constraints are imposed on the separating tensorplane. Smaller dimensionality, structural constraints.

$$ \min_{\mathcal{W}, b, \boldsymbol{\xi}} \; f(\mathcal{W}) + C\sum_{j=1}^{N}\xi_j \quad \text{s.t.} \quad y_j\left(\langle \mathcal{X}_j, \mathcal{W} \rangle + b\right) \ge 1 - \xi_j, \;\; \xi_j \ge 0, \;\; j = 1, \ldots, N $$

where $f(\mathcal{W})$ is a regularisation term, e.g. $f(\mathcal{W}) = \langle \mathcal{W}, \mathcal{W} \rangle$.

  • Support Tensor Machines: [16] D. Tao et al., Knowledge and Information Systems, 13(1):1–42, 2007; I. Kotsia, I. Patras, CVPR 2011
  • Support Tucker Machines: I. Kotsia, I. Patras, CVPR 2011
  • Σ/Σw Support Tucker Machines: I. Kotsia, I. Patras, CVPR 2011
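
To make the dimensionality argument concrete, the sketch below counts free weight parameters for an unconstrained (vectorised) linear classifier, a rank-one weight tensor (one weight vector per mode) and a Tucker-structured weight tensor. The input size and the Tucker ranks are made-up illustration values, and the function names are not from the slides.

```python
import math

def vector_params(shape):
    """Unconstrained weights: one per tensor entry."""
    return math.prod(shape)

def rank_one_params(shape):
    """Rank-one weight tensor: one weight vector per mode."""
    return sum(shape)

def tucker_params(shape, ranks):
    """Tucker-structured weights: core tensor plus one factor matrix per mode."""
    return math.prod(ranks) + sum(i * r for i, r in zip(shape, ranks))

shape = (32, 32, 16)   # hypothetical height x width x frames volume
ranks = (3, 3, 2)      # hypothetical Tucker ranks

print(vector_params(shape), rank_one_params(shape), tucker_params(shape, ranks))
# 16384 80 242
```

The structural constraints shrink the number of parameters by orders of magnitude, at the cost of restricting the separating tensorplane to the chosen decomposition.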

SLIDE 24

Supervised learning

  • Non-convex optimization problem w.r.t. A, B, C and core tensor G
  • But: convex w.r.t. A or B or C or G alone
  • Block coordinate optimization, e.g. optimization w.r.t. G keeping A, B, C fixed
  • Each step can be reduced to a vector-based SVM-like constrained optimization problem, e.g.

$$ \min_{G, b, \boldsymbol{\xi}} \; \frac{1}{2}\left(A^{(1)}\operatorname{vec}(G)\right)^T\left(A^{(1)}\operatorname{vec}(G)\right) + C\sum_{i=1}^{M}\xi_i \quad \text{s.t.} \quad y_i\left[\left(A^{(1)}\operatorname{vec}(G)\right)^T\operatorname{vec}(X_i) + b\right] \ge 1 - \xi_i $$
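
Why a single block step becomes a vector problem can be checked numerically. Assuming the weight tensor has the Tucker form built from a core G and factor matrices A, B, C, the sketch verifies that the inner product of the weight tensor with a sample X equals the inner product of G with X projected through A, B, C. With A, B, C fixed, the decision function is therefore linear in the entries of G. Function names and the small integer tensors are assumptions for the example.

```python
from itertools import product

def tucker(G, A, B, C):
    """W[i][j][k] = sum over p,q,r of G[p][q][r] * A[i][p] * B[j][q] * C[k][r]."""
    P, Q, R = len(G), len(G[0]), len(G[0][0])
    return [[[sum(G[p][q][r] * A[i][p] * B[j][q] * C[k][r]
                  for p, q, r in product(range(P), range(Q), range(R)))
              for k in range(len(C))] for j in range(len(B))] for i in range(len(A))]

def project(X, A, B, C):
    """Z[p][q][r] = sum over i,j,k of X[i][j][k] * A[i][p] * B[j][q] * C[k][r]."""
    I, J, K = len(A), len(B), len(C)
    return [[[sum(X[i][j][k] * A[i][p] * B[j][q] * C[k][r]
                  for i, j, k in product(range(I), range(J), range(K)))
              for r in range(len(C[0]))] for q in range(len(B[0]))] for p in range(len(A[0]))]

def inner(S, T):
    """Inner product of two 3-mode tensors of equal shape."""
    return sum(s * t for ms, mt in zip(S, T) for rs, rt in zip(ms, mt) for s, t in zip(rs, rt))

# Small integer example: X is 3x2x2, the core G is 2x1x2.
A = [[1, 0], [2, -1], [0, 3]]
B = [[1], [-2]]
C = [[2, 1], [0, 1]]
G = [[[1, -1]], [[3, 2]]]
X = [[[1, 2], [0, -1]], [[3, 1], [2, 2]], [[-1, 0], [4, 1]]]

lhs = inner(tucker(G, A, B, C), X)    # score computed with the full weight tensor
rhs = inner(G, project(X, A, B, C))   # same score, linear in the entries of G
print(lhs == rhs)
```

In a block step over G, the projected samples play the role of ordinary feature vectors, which is what allows a standard SVM-like solver to be reused.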

SLIDE 25

Gait Recognition (USF dataset)

Probe set   SotA (five methods)   SVMs    STMs [16] (w vector)   STMs (W tensor)   RMSTMs (W tensor)   STuMs (W tensor)   Σw-STuMs (W tensor)
A           100/100               80/97   92/100                 99/100            100/100             99/100             100/100
B           89/90                 79/93   81/90                  85/93             89/97               85/93              87/95
C           83/88                 68/85   73/88                  79/93             83/95               79/90              81/91
D           39/55                 30/54   47/67                  53/72             56/75               53/71              55/74
E           33/55                 23/46   48/79                  62/88             65/91               63/86              65/90
F           30/46                 24/49   29/49                  41/71             44/74               42/63              44/66
G           29/48                 12/37   31/71                  50/88             53/90               52/87              54/90
Average     -                     45/62   57/68                  67/86             70/89               68/84              69/87

  • Significant improvements in comparison to state of the art

SLIDE 26

KTH recognition

[7] T.K. Kim and R. Cipolla, "Canonical Correlation Analysis of Video Volume Tensors for Action Categorization and Detection", IEEE PAMI, vol. 31, no. 8, pp. 1415-1428, August 2009

  • Input features: dense oriented gradients (at each pixel)
  • Results comparable to state of the art, using very simple features

SLIDE 27

Conclusions

  • Tensors exploit topology of data better than vectors
  • The proposed algorithms (STuMs and Σ/Σw-STuMs) consistently outperform previous approaches, producing state of the art results

Limitations:

  • Requires good alignment of the input data
  • More suitable for gestures rather than less 'structured' actions

SLIDE 28

References

  • A. Oikonomopoulos, I. Patras and M. Pantic, "Spatiotemporal Localization and Categorization of Human Actions in Unsegmented Image Sequences", IEEE Trans. Image Processing, vol. 20, no. 4, pp. 1126-1140, Mar. 2011.
  • I. Kotsia and I. Patras, "Support Tucker Machines", Int'l Conf. Computer Vision and Pattern Recognition, Colorado, USA, Jun. 2011.
  • I. Kotsia and I. Patras, "Relative Margin Support Tensor Machines for gait and action recognition", Int'l Conf. Image and Video Retrieval, Xi'an, China, 5-7 July 2010.
  • S. Koelstra, M. Pantic and I. Patras, "A Dynamic Texture based Approach to Recognition of Facial Actions and their Temporal Models", IEEE Trans. Pattern Analysis and Machine Intelligence, Nov. 2010.
  • O. Rudovic, I. Patras and M. Pantic, "Coupled Gaussian Process Regression for pose-invariant facial expression recognition", European Conf. Computer Vision (ECCV'10), pp. 350-363, Heraklion, Crete, Greece, Sept. 2010.