M4 meeting@Delft 25-26.06.2003 Institute for Human-Machine Communication Munich University of Technology
Face Tracking and Person Action Recognition
Martin Zobl, Frank Wallhoff
M4 meeting@Delft 25-26.09.2003 Martin Zobl 2/18 Institute for Human-Machine Communication Munich University of Technology
- Recapitulation of methodology for action recognition
- Face tracking with particle filters
- Head orientation estimation
- Action segmentation with the Bayesian Information Criterion
- Recognition performance comparison on actions from the PETS-ICVS 2003 and the m4 dataset
- Outlook
Overview
Person Action Recognition
Processing stages: extraction of person locations, temporal segmentation, feature calculation, classification of segments
Methods: face detection/tracking, background subtraction, global motion features, Bayesian Information Criterion, Hidden Markov Models
Output: actions, timestamps
- Actions are represented by global motions in the hot-spot R_i
- Feature extraction based on difference images I_d

Center of motion:
\[ m'_{x,y}(t) = \frac{\sum_{(x,y) \in R_i} (x,y) \cdot I_d(x,y,t)}{\sum_{(x,y) \in R_i} I_d(x,y,t)} \]

Variance of motion (deviation taken as a per-coordinate magnitude):
\[ \sigma_{x,y}(t) = \frac{\sum_{(x,y) \in R_i} I_d(x,y,t) \cdot \left| (x,y) - m'_{x,y}(t) \right|}{\sum_{(x,y) \in R_i} I_d(x,y,t)} \]

Intensity of motion:
\[ i(t) = \frac{\sum_{(x,y) \in R_i} I_d(x,y,t)}{\sum_{(x,y) \in R_i} 1} \]

- Person location normalized center of motion:
\[ m_{x,y}(t) = m'_{x,y}(t) - p_{x,y}(t) \]
- Derivatives of the center of motion:
\[ \Delta m_{x,y}(t) = m_{x,y}(t) - m_{x,y}(t-1) \]
- Composition of a 7-dimensional feature vector:
\[ \vec{x}_t = [m_x, m_y, \Delta m_x, \Delta m_y, \sigma_x, \sigma_y, i]^T \]
Computation of Global Motion Features
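The feature computation on this slide can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the `(x0, y0, x1, y1)` region layout, the use of absolute deviations for the spread, and the fallback for frames without motion are all hypothetical choices.

```python
import numpy as np

def global_motion_features(diff, region, person_xy, prev_m=None):
    """Return (7-dim feature vector, normalized center of motion).

    diff      -- 2-D array holding the difference image I_d at time t
    region    -- hot-spot R_i as (x0, y0, x1, y1), end-exclusive (assumed layout)
    person_xy -- person location p(t) as (x, y)
    prev_m    -- normalized center of motion at t-1, or None for the first frame
    """
    x0, y0, x1, y1 = region
    w = np.abs(diff[y0:y1, x0:x1]).astype(float)   # motion weights per pixel
    ys, xs = np.mgrid[y0:y1, x0:x1]                # pixel coordinates in the image
    total = w.sum()
    intensity = total / w.size                     # mean difference in the region
    if total == 0.0:
        # Assumption: without motion, place the center on the person location.
        raw = np.asarray(person_xy, dtype=float)
        sigma = np.zeros(2)
    else:
        # Weighted mean position and weighted mean absolute deviation.
        raw = np.array([(xs * w).sum(), (ys * w).sum()]) / total
        sigma = np.array([(np.abs(xs - raw[0]) * w).sum(),
                          (np.abs(ys - raw[1]) * w).sum()]) / total
    m = raw - np.asarray(person_xy, dtype=float)   # normalize by person location
    dm = np.zeros(2) if prev_m is None else m - np.asarray(prev_m, dtype=float)
    return np.concatenate([m, dm, sigma, [intensity]]), m
```

The second return value is meant to be passed back in as `prev_m` on the next frame, so that the derivative components are filled from frame two onwards.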
[Figure: actual image, background difference I_db and difference image I_d, with the center of person p_{x,y}(t), the center of motion m'_{x,y}(t) and the derivative of m_{x,y}(t) marked]
Visualized Motion Features
Markov state-space model
- Hidden system state: x_t
- Observation: y_t
- Recursive filtering distribution:
\[ p(x_t \mid y_1, \dots, y_t) \propto p(y_t \mid x_t) \int_{x_{t-1}} p(x_t \mid x_{t-1}) \, p(x_{t-1} \mid y_1, \dots, y_{t-1}) \, dx_{t-1} \]
  - Likelihood p(y_t | x_t): update
  - Dynamic model p(x_t | x_{t-1}): prediction
  - Prior distribution p(x_{t-1} | y_1, ..., y_{t-1})

Particle Filter
- N weighted particles:
\[ \{ (x_t^{(i)}, \pi_t^{(i)}),\ i = 1, \dots, N \} \]
- Sampling the filtering distribution:
\[ \hat{p}_N(x_t \mid y_1, \dots, y_t) = \sum_{i=1}^{N} \pi_t^{(i)} \, \delta(x_t - x_t^{(i)}) \]
- Updating using their likelihood:
\[ \pi_t^{(i)} = \pi_{t-1}^{(i)} \, p(y_t \mid x_t^{(i)}) \]
- Resampling to avoid degradation of particles
Face Tracking
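The predict, update and resample cycle can be illustrated with a generic particle filter step. This is a sketch under assumptions, not the tracker from the talk: the function names are hypothetical, the toy random-walk dynamics and Gaussian likelihood stand in for the face model, and resampling only when the effective sample size drops below N/2 is one common choice that the slides do not specify.

```python
import numpy as np

def particle_filter_step(particles, weights, y, predict, likelihood, rng):
    """One filtering step: prediction, likelihood update, optional resampling."""
    particles = predict(particles, rng)              # draw from the dynamic model
    weights = weights * likelihood(y, particles)     # pi_t = pi_{t-1} * p(y_t | x_t)
    n = len(weights)
    total = weights.sum()
    if total <= 0.0:
        weights = np.full(n, 1.0 / n)                # all weights vanished: reset
    else:
        weights = weights / total                    # normalize to sum to one
    # Assumption: resample when the effective sample size falls below N/2.
    if 1.0 / np.sum(weights ** 2) < 0.5 * n:
        idx = rng.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights

def random_walk(particles, rng):
    """Toy dynamic model (hypothetical): Gaussian random walk."""
    return particles + rng.normal(0.0, 0.1, size=particles.shape)

def gaussian_likelihood(y, particles):
    """Toy observation model (hypothetical): Gaussian around the observation."""
    return np.exp(-0.5 * ((y - particles) / 0.5) ** 2)
```

A state estimate can then be read off as the weighted mean, for example `np.average(particles, weights=weights)`.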
- Particle i:
\[ x_{t-1}^{(i)} = ( T_{t-1}^{(i)}, s_{t-1}^{(i)}, \Delta T_{t-1}^{(i)}, \Delta s_{t-1}^{(i)} ) \]
- Observations
- Skin color ratio:
\[ p_{scr}(y_t \mid x_t^{(i)}) = s_{x_t^{(i)}} / N_{sc} \]
Face Tracking (2)
- Face likelihood:
\[ p_{MLP}(y_t \mid x_t^{(i)}) \]
  sample x_t^{(i)} → preprocessing (correction, equalization) → classification (MLP) → p_MLP(y_t | x_t^{(i)})
- Prediction with linear autoregressive model:
\[ x_t^{(i)} = A \, x_{t-1}^{(i)} + B \, w_t \]
  Model trained with ADALINE
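The autoregressive prediction step and an ADALINE-style fit of its matrix can be sketched as follows. The state layout, matrix values, learning rate and function names are illustrative assumptions; the slides only state that the model is linear autoregressive and trained with ADALINE, whose learning rule is the Widrow-Hoff (LMS) update used here.

```python
import numpy as np

def predict(particles, A, B, rng):
    """Propagate all particles: x_t = A x_{t-1} + B w_t with w_t ~ N(0, I).

    particles -- (N, d) array, one state vector per row
    A, B      -- (d, d) matrices (placeholder values are up to the caller)
    """
    w = rng.standard_normal(particles.shape)
    return particles @ A.T + w @ B.T

def lms_update(A, x_prev, x_next, mu=0.05):
    """One Widrow-Hoff (LMS) step that moves A toward predicting x_next from x_prev."""
    error = x_next - A @ x_prev
    return A + mu * np.outer(error, x_prev)
```

Calling `lms_update` repeatedly over pairs of consecutive states from training tracks yields an estimate of `A`; the step size `mu` has to be small relative to the squared norm of the states for the iteration to remain stable.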
- Automatic initialization by pyramid sampling and MLP classification
- Particle Filtering
Face Tracking (3)
Head Orientation Estimation
Training data: FERET + mugshot database

Particle i → MLP 1, MLP 2, ..., MLP 8 → p^{(i)}(face), p^{(i)}(left), p^{(i)}(half left), p^{(i)}(quarter left), p^{(i)}(frontal), p^{(i)}(quarter right), p^{(i)}(half right), p^{(i)}(right)

Orientation HO    Angle ϕ(HO)
left              180°
half left         135°
quarter left      115°
frontal           90°
quarter right     65°
half right        45°
right             0°

For each particle i, the orientation class is \arg\max_{HO} [ p^{(i)}(HO) ]; the head angle ϕ is obtained from the class angles ϕ(HO) over the particles i = 1, ..., N.
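One way to turn per-particle MLP outputs into a single head angle is sketched below. The angle table is taken from the slide; combining the per-particle angles by a (weighted) mean is an assumption, since the exact combination rule is not recoverable from the slide text, and the function and variable names are hypothetical.

```python
import numpy as np

# Class angles in degrees, as listed on the slide.
ANGLES = {
    "left": 180.0, "half left": 135.0, "quarter left": 115.0, "frontal": 90.0,
    "quarter right": 65.0, "half right": 45.0, "right": 0.0,
}
CLASSES = list(ANGLES)

def head_orientation(probs, weights=None):
    """Estimate the head angle from per-particle class scores.

    probs   -- (N, 7) array, one row of orientation scores per particle,
               columns ordered as in CLASSES
    weights -- optional particle weights; equal weights if omitted (assumption)
    """
    best = np.argmax(probs, axis=1)                        # arg max over HO
    angles = np.array([ANGLES[CLASSES[k]] for k in best])  # phi(HO) per particle
    return float(np.average(angles, weights=weights))
```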
Head Orientation Estimation
- Already successfully applied for speech segmentation, speaker turn detection and other clustering applications
- Split the window at position i and compute the ΔBIC_i value for this position:
\[ \Delta BIC_i = -\frac{n}{2} \log |\Sigma_w| + \frac{i}{2} \log |\Sigma_f| + \frac{n-i}{2} \log |\Sigma_s| + \frac{1}{2} \lambda \left( d + \frac{1}{2} d (d+1) \right) \log n \]
- Segment boundary at the most negative value of all ΔBIC_i
- d = dimension of the vectors; n = number of vectors in the window; Σ_w, Σ_f, Σ_s = covariance matrices of the entire window, the first and the second segment; λ is a penalty weight
Action segmentation with BIC
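The boundary search can be sketched directly from the ΔBIC expression. This is an illustrative sketch, not the authors' code: the function names, the small diagonal regularization `eps` (needed when a segment holds too few vectors for a full-rank covariance) and the default `margin` are assumptions.

```python
import numpy as np

def _logdet_cov(x, eps):
    """Log-determinant of the (regularized) sample covariance of the rows of x."""
    d = x.shape[1]
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True)) + eps * np.eye(d)
    return np.linalg.slogdet(cov)[1]

def delta_bic(window, i, lam=1.0, eps=1e-6):
    """Delta-BIC for splitting an (n, d) window into rows [0, i) and [i, n)."""
    n, d = window.shape
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (-0.5 * n * _logdet_cov(window, eps)
            + 0.5 * i * _logdet_cov(window[:i], eps)
            + 0.5 * (n - i) * _logdet_cov(window[i:], eps)
            + penalty)

def best_boundary(window, lam=1.0, margin=None, eps=1e-6):
    """Return (position, score) of the most negative Delta-BIC, or (None, score)
    if no split position scores below zero."""
    n, d = window.shape
    margin = d + 1 if margin is None else margin   # assumed minimum segment length
    positions = list(range(margin, n - margin + 1))
    if not positions:
        return None, None
    scores = [delta_bic(window, i, lam, eps) for i in positions]
    k = int(np.argmin(scores))
    return (positions[k] if scores[k] < 0 else None), scores[k]
```

A larger `lam` raises the penalty term and therefore suppresses boundaries, which is how the penalty weight trades off over-segmentation against missed boundaries.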
- BIC segmentation based on feature vectors
  [Plots: n=15, λ=0.9 and n=15, λ=1.1]
- BIC segmentation based on energy vectors
  [Plots: n=15, λ=6.5 and n=20, λ=6.5]
Application of Automatic Stream Segmentation
n=15, λ=6.5
Action Segmentation with BIC
Artificial training data, HMMs (5 states, 2 mixtures)

Action         Sit down   Get up   Raising hand   Nodding   Shaking head   Score
Sit down       50%        33%      17%            0%        0%             50%
Get up         17%        83%      0%             0%        0%             83%
Raising hand   21%        4%       63%            0%        15%            63%
Nodding        0%         0%       0%             42%       58%            42%
Shaking head   0%         0%       0%             8%        92%            92%
Overall                                                                    66%
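The overall figure agrees with the unweighted mean of the five per-action scores, which a short check confirms. The dictionary below simply restates the scores from the table; treating the overall score as a plain average is an observation about the numbers, not a statement about how the authors computed it.

```python
# Per-action scores from the PETS table, in percent.
scores = {
    "Sit down": 50, "Get up": 83, "Raising hand": 63,
    "Nodding": 42, "Shaking head": 92,
}

overall = sum(scores.values()) / len(scores)
print(round(overall))  # 66
```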
Recognition Performance PETS
- Classification results in an acceptable recognition performance, considering:
  – The limited amount of available training examples
  – Large variations between artificial training and test material, such as size and view direction
Performance Discussion PETS
Training: m4 training data (TRN 01-30); test: m4 test data (TST 01-30); HMMs (9 states, 3 mixtures)
3 32 30 225 Nodding 42% 1 4 18 Shaking head 82% Overall 96% 69 Pointing 86% 25 471 22 Writing 78% 8 5 48 3 1 Nodding 86% 1 12 1 Stand up 90% 1 9 Sit down Score Pointing Writing Shaking head Stand up Sit down
Recognition Performance m4
– Improved recognition performance due to real training data
– Dramatically varying action lengths
– Singular action region initialization not sufficient
Performance Discussion m4
- Head orientation tracking
- Improving the feature stream by smoothing with action-specialized Kalman filters
- Action detection on m4 data
- Connection to Meeting Segmentation / Multimodal Recognizer
Outlook