Human Action Recognition using Pose-based Discriminant Embedding
Behrouz Saghafi
Advantages of Silhouettes
- Informative features for describing actions
- Capture the spatio-temporal characteristics of motion with lower computational cost
- No need for an explicit human body model
General frameworks for using silhouettes in action recognition
Frame recognition framework
- Classify sequences on a frame-by-frame basis
- Label for the query sequence is obtained by a voting scheme
- Ignore the temporal information and kinematics
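The frame-by-frame voting described above can be sketched as follows. This is a minimal illustration, not the presentation's implementation: the function name `classify_by_frame_voting`, the use of Euclidean distance, and the 1-nearest-neighbor rule per frame are all assumptions made for the example.

```python
import math
from collections import Counter

def classify_by_frame_voting(query_frames, train_frames, train_labels):
    """Label a query sequence by majority vote over per-frame decisions.

    Each query frame votes for the label of its nearest training frame
    (Euclidean distance, an assumption). Frame order is never used, which
    is why this framework ignores temporal information.
    """
    votes = Counter()
    for q in query_frames:
        nearest = min(range(len(train_frames)),
                      key=lambda i: math.dist(q, train_frames[i]))
        votes[train_labels[nearest]] += 1
    return votes.most_common(1)[0][0]

# Toy usage: two training frames, three query frames.
train_frames = [(0.0, 0.0), (10.0, 10.0)]
train_labels = ["walk", "run"]
query = [(1.0, 0.0), (0.0, 1.0), (9.0, 9.0)]
label = classify_by_frame_voting(query, train_frames, train_labels)
```

Two of the three toy query frames lie closest to the "walk" frame, so the vote returns "walk".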
Sequence recognition framework
- Classify the sequence as a whole
- Compare actions based on distances defined between sequences of points
- Kinematics is involved
Why embedding into lower dimensional space?
- Recognition methods operating in a high-dimensional space suffer from the curse of dimensionality
- The information provided in the high-dimensional image space is far more than is needed to describe an action
- The structure of the human body imposes a constraint on possible postures
Examples of postures for run and its trajectory in a possible action space
Embeddings used to find the underlying action space
- PCA (Principal Components Analysis)
- LDA (Linear Discriminant Analysis)
- LPP (Locality Preserving Projections)
- LLE (Locally Linear Embedding)
- LE (Laplacian Eigenmaps)
- Kernel PCA
- LSTDE (Local Spatio-Temporal Discriminant Embedding)
> In all these methods, the embedding is defined based on the distance between points rather than between sequences.
> Thus they are not guaranteed to give optimal results in the sequence recognition framework.
Distances between sets of points
- Median Hausdorff Distance (MHD)
- Spatiotemporal Correlation Distance (SCD)
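The formulas for these two distances did not survive in this transcript, so the sketch below uses commonly used forms as stated assumptions: MHD as the sum of the two directed median nearest-point distances, and SCD as the sum of squared distances between temporally corresponding frames of two equal-length sequences. The function names are hypothetical, and the slides' exact definitions may differ.

```python
import math
from statistics import median

def median_hausdorff(seq_a, seq_b):
    """Assumed MHD: median over points in A of the distance to the nearest
    point in B, plus the same quantity with A and B swapped.
    Frame order is ignored."""
    d_ab = median(min(math.dist(a, b) for b in seq_b) for a in seq_a)
    d_ba = median(min(math.dist(b, a) for a in seq_a) for b in seq_b)
    return d_ab + d_ba

def spatiotemporal_correlation_distance(seq_a, seq_b):
    """Assumed SCD: sum of squared distances between frames at the same
    time index. Requires sequences of equal length (e.g. after warping)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same number of frames")
    return sum(math.dist(a, b) ** 2 for a, b in zip(seq_a, seq_b))

# Toy usage with 2-D "frames".
A = [(0.0, 0.0), (1.0, 0.0)]
C = [(0.0, 1.0), (1.0, 1.0)]
mhd_ac = median_hausdorff(A, C)
scd_ac = spatiotemporal_correlation_distance(A, C)
```

Under these assumptions MHD treats a sequence as an unordered set of postures, while SCD compares postures frame by frame and is therefore sensitive to temporal alignment.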
Optimal Embedding Computation
We propose an embedding such that in the embedded space (action space), based on SCD as the distance metric:
- The intra-class sequences are as close as possible
- The inter-class sequences are as far apart as possible
Optimal Embedding Computation
- Intra-class sequences should be as close as possible in the action space
- The sum of all pairwise SCDs between embedded intra-class sequences should be minimized with respect to A
Optimal Embedding Computation
- For the optimization of inter-class sequences:
Optimal Embedding Computation
Generalized eigenvalue problem
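The equations on these slides were lost in extraction. The block below is a hedged sketch of how such an objective typically reduces to a generalized eigenvalue problem, not a reconstruction of the slides. It assumes a linear embedding y = A^T x, sequences warped to a common length T, and SCD taken as the sum of squared frame-wise distances. The symbols x^p_t (frame t of sequence p), c_p (class of sequence p), S_w and S_b are introduced here for illustration only.

```latex
% Assumptions (not from the slides): y = A^T x, all sequences have T frames,
% SCD = sum of squared distances between temporally corresponding frames.
\begin{align*}
\mathrm{SCD}(Y^p, Y^q)
  &= \sum_{t=1}^{T} \lVert A^\top x^p_t - A^\top x^q_t \rVert^2
   = \operatorname{tr}\!\Big( A^\top \Big[ \sum_{t=1}^{T}
     (x^p_t - x^q_t)(x^p_t - x^q_t)^\top \Big] A \Big) \\
J_w(A) &= \sum_{p,q:\; c_p = c_q} \mathrm{SCD}(Y^p, Y^q)
        = \operatorname{tr}(A^\top S_w A),
\qquad S_w = \sum_{p,q:\; c_p = c_q} \sum_{t=1}^{T}
     (x^p_t - x^q_t)(x^p_t - x^q_t)^\top \\
J_b(A) &= \sum_{p,q:\; c_p \neq c_q} \mathrm{SCD}(Y^p, Y^q)
        = \operatorname{tr}(A^\top S_b A),
\qquad S_b = \sum_{p,q:\; c_p \neq c_q} \sum_{t=1}^{T}
     (x^p_t - x^q_t)(x^p_t - x^q_t)^\top \\
&\max_A \operatorname{tr}(A^\top S_b A)
  \ \ \text{s.t.}\ \ A^\top S_w A = I
  \quad\Longrightarrow\quad S_b\, a_k = \lambda_k\, S_w\, a_k
\end{align*}
```

Under these assumptions the columns a_k of A are the generalized eigenvectors associated with the largest eigenvalues, which makes inter-class sequences far apart while keeping intra-class sequences close.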
Overview of Approach
> Action recognition is done by comparing the similarity between test and train sequences in the low-dimensional action space in the nearest neighbor framework.
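A minimal sketch of nearest-neighbor classification over whole sequences is given below. The helper `framewise_sq_distance` is a stand-in for the sequence distance (an assumption, since the exact SCD formula is not in this transcript), and the sequences are assumed to be already embedded in the action space and warped to a common length.

```python
import math

def framewise_sq_distance(seq_a, seq_b):
    """Stand-in sequence distance (assumption): sum of squared distances
    between frames at the same time index of equal-length sequences."""
    return sum(math.dist(a, b) ** 2 for a, b in zip(seq_a, seq_b))

def nearest_neighbor_label(query, train_sequences, train_labels,
                           distance=framewise_sq_distance):
    """Return the label of the training sequence closest to the query."""
    best = min(range(len(train_sequences)),
               key=lambda i: distance(query, train_sequences[i]))
    return train_labels[best]

# Toy usage: 1-D embedded sequences of three frames each.
train_sequences = [[(0.0,), (1.0,), (0.0,)],   # oscillating
                   [(0.0,), (1.0,), (2.0,)]]   # steadily increasing
train_labels = ["wave1", "walk"]
query = [(0.1,), (0.9,), (1.8,)]
predicted = nearest_neighbor_label(query, train_sequences, train_labels)
```

Because the comparison is made between whole sequences, the temporal order of postures contributes to the decision, unlike frame-by-frame voting.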
Period Estimation (1)
- Actions can be considered semantically periodic.
- Using a single period is more computationally efficient than using the entire length.
- To estimate the action period, we used the method based on absolute correlation between frames and improved it:
The object’s self-similarity is computed by:
Period Estimation (2)
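Pipeline: S → a column z → linearly detrend → $\hat{z}$ → autocorrelation → $R_{\hat{z}\hat{z}}(m)$
$R_{\hat{z}\hat{z}}(m) = E[\hat{z}_{n+m}\,\hat{z}_n] \approx \frac{1}{N-m} \sum_{n=1}^{N-m} \hat{z}_{n+m}\,\hat{z}_n$
[Figure: S, the detrended column $\hat{z}$, and its autocorrelation $R_{\hat{z}\hat{z}}(m)$]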
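The detrend-then-autocorrelate step can be sketched as below. This is a simplified illustration under stated assumptions: the input `z` stands for one column of the self-similarity data, the autocorrelation uses the 1/(N−m) normalization, and the period is taken as the first local maximum whose value reaches a fraction `min_ratio` of the zero-lag value. The peak-picking rule, the function names, and `min_ratio` are hypothetical and are not the improved method from the presentation.

```python
import math

def detrend(z):
    """Subtract the least-squares straight line from z."""
    n = len(z)
    t_mean = (n - 1) / 2
    z_mean = sum(z) / n
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - z_mean) for t, v in enumerate(z)) / denom
    return [v - (z_mean + slope * (t - t_mean)) for t, v in enumerate(z)]

def autocorrelation(z_hat, max_lag):
    """R(m) = 1/(N-m) * sum_n z_hat[n+m] * z_hat[n], for m = 0..max_lag."""
    n = len(z_hat)
    return [sum(z_hat[i + m] * z_hat[i] for i in range(n - m)) / (n - m)
            for m in range(max_lag + 1)]

def estimate_period(z, min_ratio=0.5):
    """Lag of the first sufficiently strong autocorrelation peak, or None."""
    z_hat = detrend(z)
    max_lag = len(z) // 2
    r = autocorrelation(z_hat, max_lag)
    for m in range(1, max_lag):
        is_peak = r[m] > r[m - 1] and r[m] >= r[m + 1]
        if is_peak and r[m] >= min_ratio * r[0]:
            return m
    return None

# Toy usage: a sinusoid with period 10 frames plus a linear drift.
signal = [math.sin(2 * math.pi * t / 10) + 0.05 * t for t in range(60)]
period = estimate_period(signal)
```

Detrending matters here because a drift in the signal would otherwise dominate the autocorrelation and hide the periodic peaks.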
Period Estimation (3)
False peaks detected by the zero-derivative method are marked by red vertical lines
Warping
Bicubic interpolation technique
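Temporal warping of one period to a fixed number of frames T can be sketched as follows. Note the substitution: the slide names bicubic interpolation, while this example uses plain linear interpolation between neighboring frames to stay short. The function name `warp_to_length` and the flat-list frame representation are assumptions for the example.

```python
def warp_to_length(frames, T):
    """Resample a sequence of frames (equal-length lists of floats) to
    exactly T frames using linear interpolation along the time axis.
    Linear interpolation is a stand-in for the bicubic technique."""
    n = len(frames)
    if T == 1 or n == 1:
        return [list(frames[0]) for _ in range(T)]
    warped = []
    for i in range(T):
        pos = i * (n - 1) / (T - 1)      # fractional source index
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        warped.append([(1 - frac) * a + frac * b
                       for a, b in zip(frames[lo], frames[hi])])
    return warped

# Toy usage: stretch a 2-frame sequence to 3 frames.
stretched = warp_to_length([[0.0], [2.0]], 3)
```

After warping, all periods have the same length, so frame-by-frame sequence distances can be computed between any pair of sequences.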
Aligning
Experimental Results (Datasets)
- Weizmann database
- Maryland database
- KTH database
Weizmann database
9 subjects. 10 actions:
bending (bend), jumping jack (jack), jumping-forward-on-two-legs (jump), jumping-in-place-on-two-legs (pjump), running (run), galloping sideways (side), skipping (skip), walking (walk), waving-one-hand (wave1), and waving-two-hands (wave2)
Recognition accuracy vs. dimension for different values of T for SCD (Weizmann)
Studying the effect of T: Comparing the maximum and mean of recognition rate for different values of T for SCD (Weizmann)
Recognition accuracy vs. dimension for different values of T for MHD (Weizmann)
Studying the effect of T: Comparing the maximum and mean of recognition rate for different values of T for MHD (Weizmann)
Recognition accuracy vs. dimension, using test sequences without warping (Weizmann)
T=6 using MHD
Best accuracy: 98.89%
Comparison of different dimension reduction methods (Weizmann)
Experimental Results (Weizmann)
Comparison with different dimension reduction methods
[Figure panels: LDA, PCA, Supervised LPP (SLPP), PDE]
Comparison with other results on Weizmann dataset
Results of Robustness to Noise (Weizmann)
Weizmann’s robustness database for deformations
Results of Weizmann’s Deformation robustness test
Weizmann’s robustness database for viewpoint
Results of Weizmann’s viewpoint robustness test
Experimental Results (Maryland)
Result: 100% recognition rate
10 actions: pick up object, jog in place, push, squat, wave, kick, bend to the side, throw, turn around and talk on cell phone
Comparison of different dimension reduction methods (Maryland)
KTH dataset
6 actions, 25 subjects, 4 scenarios
Scenarios: s1, s2, s3, s4