IIIT Hyderabad
Human Pose Search using Deep Poselets
Nataraj Jammalamadaka*, Andrew Zisserman§, C. V. Jawahar*
*CVIT, IIIT Hyderabad, India
§Visual Geometry Group, Department of Engineering, University of Oxford
Human pose is an important precursor to gesture and action.
Examples: walking, gesturing, cover drive.
Example queries: retrieve cover drive shots; retrieve Bharatanatyam poses.
Pipeline:
1. Take a query.
2. Build a feature $(y_1, \dots, y_n)$.
3. Search through the video DB.
4. Return the retrieved results.
Outline:
- Deep Poselets (convolutional neural networks): poselet discovery, training, detection
- Pose retrieval
Datasets:
- Buffy Stickmen (Season 1, 5 episodes)
- ETH Pascal dataset (Flickr images)
- H3D (Flickr images)
- FLIC dataset (30 Hollywood movies)
- Movie dataset (ours; 22 Hollywood movies, no overlap with FLIC)
Dataset                      Train  Validation  Test   Total
H3D                            238           -     -     238
ETHZ Pascal                      -           -   548     548
Buffy                          747           -     -     747
Buffy-2                        396           -     -     396
Movie                         1098         491  2172    3756
FLIC                          2724        2279     -    5003
Total stickmen annotations    5198        2764  2720   10682
+ Flipped version            10396        5528  5440   21364
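The "+ Flipped version" row doubles every split by adding a mirrored copy of each annotated example. The following is a minimal sketch of such a flip, assuming each stickman is stored as an array of line segments; the part names, their order, and the array layout are hypothetical and not taken from the slides.

```python
import numpy as np

# Hypothetical part order for an upper-body stickman (an assumption).
PARTS = ["torso", "left_upper_arm", "right_upper_arm",
         "left_lower_arm", "right_lower_arm", "head"]

def flip_example(image, sticks):
    """Mirror an image and its stickman annotation left to right.

    image:  (H, W, C) array.
    sticks: (num_parts, 4) array of segments (x1, y1, x2, y2) in pixels.
    """
    width = image.shape[1]
    flipped_image = image[:, ::-1, :]

    flipped = np.asarray(sticks, dtype=float).copy()
    flipped[:, [0, 2]] = (width - 1) - flipped[:, [0, 2]]  # mirror x coordinates

    # After mirroring, the left arm sits where the right arm was,
    # so the left/right labels are swapped.
    swapped = flipped.copy()
    for i, name in enumerate(PARTS):
        if name.startswith("left_"):
            j = PARTS.index(name.replace("left_", "right_"))
            swapped[i], swapped[j] = flipped[j], flipped[i]
    return flipped_image, swapped
```

Applying the flip twice returns the original example, which is a quick sanity check for the label swap.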
Poselets model body parts in a particular spatial configuration.
Figure: three example poselets (Poselet 1, Poselet 2, Poselet 3).
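Poselet discovery:
1. Start from training data with ground-truth stickmen annotations.
2. Reorganize the annotations into sets of body parts, with LA = left arm and RA = right arm: LA + head; RA + head; LA + head + torso; RA + head + torso; all parts except head.
3. For each set, get pose descriptors.
4. Run K-means clustering on the descriptors.
5. Output: poselet average images.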
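The clustering step can be sketched with plain Lloyd's K-means in NumPy. This is an illustration only: the contents of the pose descriptor (for example, flattened stick coordinates) and the number of clusters are assumptions, since the slides do not state them.

```python
import numpy as np

def assign(data, centers):
    """Index of the nearest center (squared Euclidean distance) for each row."""
    dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(descriptors, k, num_iters=50, seed=0):
    """Plain Lloyd's K-means. descriptors: (n, d). Returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(descriptors, dtype=float)
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(num_iters):
        labels = assign(data, centers)
        new_centers = centers.copy()
        for c in range(k):
            members = data[labels == c]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[c] = members.mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return centers, assign(data, centers)
```

Under this sketch, each cluster of descriptors would correspond to one candidate poselet, and averaging the image crops assigned to a cluster would give its average image.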
Network: input → convolutional layers (convolution followed by pooling) → fully connected layers → softmax layer → deep poselet labels.
Layers labeled in the figure: Layer 2, Layer 5, Layer 6, Layer 7, Layer 8.
Convolution example: a 30 × 30 × 3 input convolved with 5 × 5 filters gives 26 × 26 × 50; 3 × 3 max pooling gives 13 × 13 × 50.
ReLU non-linearity: $f(y) = \max(0, y)$
Softmax layer: $f(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$
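The two non-linearities and the layer sizes in the figure can be checked with a few lines of NumPy. This is a sketch, not the authors' implementation: the pooling stride of 2 and the round-up rule in `pooled_size` are assumptions chosen because they reproduce the 26 to 13 reduction shown on the slide.

```python
import math
import numpy as np

def relu(y):
    """ReLU non-linearity: max(0, y), applied elementwise."""
    return np.maximum(0.0, y)

def softmax(y):
    """Softmax over a 1-D score vector."""
    shifted = y - np.max(y)  # does not change the result, avoids overflow
    exp = np.exp(shifted)
    return exp / exp.sum()

def valid_conv_size(input_size, filter_size, stride=1):
    """Output width of a convolution without padding."""
    return (input_size - filter_size) // stride + 1

def pooled_size(input_size, window, stride):
    """Output width of pooling when partial windows are kept (assumed)."""
    return math.ceil((input_size - window) / stride) + 1

conv_width = valid_conv_size(30, 5)         # 30 - 5 + 1 = 26
pool_width = pooled_size(conv_width, 3, 2)  # ceil(23 / 2) + 1 = 13
```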
Training: stochastic gradient descent, $x = x - \eta \frac{\partial L}{\partial x}$
Input image: $y$
Model parameters: $x$
Ground truth: $g$
Output: $z = f(y, x)$
Loss function: $L = -\sum_i g_i \log(z_i)$
Architecture from Krizhevsky et al., NIPS 2012
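To make the update rule concrete, here is a minimal sketch of one stochastic gradient descent step with a cross-entropy loss. A linear softmax classifier stands in for the network $f$, which is an assumption made to keep the example short; `weights` plays the role of the model parameters, `features` the input, and `target` a one-hot ground truth. The learning rate value is hypothetical.

```python
import numpy as np

def softmax(scores):
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def sgd_step(weights, features, target, learning_rate=0.1):
    """One SGD update for a linear softmax classifier.

    weights:  (num_classes, d) parameters.
    features: (d,) input vector.
    target:   (num_classes,) one-hot ground truth.
    Returns (updated weights, loss before the update).
    """
    z = softmax(weights @ features)
    loss = -np.sum(target * np.log(z + 1e-12))  # cross-entropy
    grad = np.outer(z - target, features)       # dL/dW for softmax + cross-entropy
    return weights - learning_rate * grad, loss

weights = np.zeros((3, 4))
features = np.ones(4)
target = np.array([1.0, 0.0, 0.0])
losses = []
for _ in range(20):
    weights, loss = sgd_step(weights, features, target)
    losses.append(loss)
```

With all-zero initial weights the first loss is log(3), and repeated steps on the same example drive it down.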
Challenge:
Solution:
data present.
Fine tuning procedure:
using ImageNet data (1.2 million images).
random initialization.
Given a test image, run all the deep poselets.
regions within an upper body detection.
center points of poselets.
accuracy.
Figure: expected center points of poselets.
Problem: The three detections fired in the same area.
Detection scores: 0.3, 0.2, 0.7.
Objective: Rescore detection 2 to 1 and detections 1 and 3 to 0.
Rescored: 0.3 → 0, 0.2 → 0, 0.7 → 1.
Solution: For each poselet, learn regression function whose
average precision.
using HOG feature.
Method                  MAP-test
HOG                     32.6
CNN before fine-tuning  48.6
CNN after fine-tuning   56.0
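For readers unfamiliar with the metric, the sketch below computes average precision (AP) for one ranked list; MAP is the mean of AP over poselets or queries. It uses the plain non-interpolated definition, which is an assumption: the slides do not say which AP variant was used.

```python
import numpy as np

def average_precision(scores, labels):
    """Mean of precision@k over the ranks k at which a positive is retrieved."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranked = np.asarray(labels)[order].astype(bool)
    if not ranked.any():
        return 0.0
    hits = np.cumsum(ranked)
    precision_at_k = hits / np.arange(1, len(ranked) + 1)
    return float(precision_at_k[ranked].mean())

# Positives at ranks 1 and 3: AP = (1/1 + 2/3) / 2 = 5/6
example_ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```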
Figure: example results at ranks 1, 6, 11, 16, 21, 26, 31, 36 for two poselets.
- Poselet with AP 78.1: 1863 positives in the train set
- Poselet with AP 40.4: 698 positives in the train set
For each frame in the video DB collection:
1. Descriptor: max-pool the deep poselet detections into a 122-D vector.
2. Index in a database.
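The descriptor step can be sketched as follows. The 122-D length comes from the slide; treating each dimension as one poselet, representing detections as `(poselet_index, score)` pairs, and using 0 for poselets that did not fire are all assumptions made for illustration.

```python
import numpy as np

NUM_POSELETS = 122  # descriptor length from the slide; one slot per poselet is assumed

def frame_descriptor(detections, num_poselets=NUM_POSELETS):
    """Max-pool detection scores per poselet.

    detections: iterable of (poselet_index, score) pairs for one frame,
                with non-negative scores (assumed).
    """
    descriptor = np.zeros(num_poselets)
    for poselet_index, score in detections:
        descriptor[poselet_index] = max(descriptor[poselet_index], score)
    return descriptor

descriptor = frame_descriptor([(0, 0.3), (0, 0.7), (5, 0.2)])
```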
Given a query image:
1. Build the bag of deep poselets.
2. Using cosine distance, search through the database.
3. Return the retrieved results.
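A brute-force version of the cosine-distance search is sketched below, assuming the database is a matrix with one descriptor per row. The function name and the `top_k` parameter are hypothetical; a real system might use an index structure instead of a full scan.

```python
import numpy as np

def cosine_search(query, database, top_k=5):
    """Return (indices, distances) of the top_k rows closest to the query."""
    database = np.asarray(database, dtype=float)
    query = np.asarray(query, dtype=float)
    eps = 1e-12  # guards against division by zero for all-zero descriptors
    db_unit = database / (np.linalg.norm(database, axis=1, keepdims=True) + eps)
    q_unit = query / (np.linalg.norm(query) + eps)
    distances = 1.0 - db_unit @ q_unit  # cosine distance = 1 - cosine similarity
    order = np.argsort(distances, kind="stable")[:top_k]
    return order, distances[order]

indices, distances = cosine_search([2.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Because cosine distance ignores vector length, a query of `[2, 0]` matches the row `[1, 0]` exactly.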
Methods compared against:
- BOVW: detect SIFT → K-means (K = 1000) → VQ.
- BPL: run poselets → bag of parts.
- HPE [1]: run human pose estimation algorithms → concatenate (sin(x), cos(x)) of all the body part angles.
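The angle descriptor used for the pose-estimation baseline can be sketched as below. It assumes each estimated body part is given as a line segment and that the part angle is the segment's orientation; both are assumptions, since the slides only state that (sin(x), cos(x)) of all part angles are concatenated.

```python
import numpy as np

def angle_descriptor(sticks):
    """Concatenate (sin, cos) of each part's orientation.

    sticks: (num_parts, 4) array of segments (x1, y1, x2, y2).
    Returns a (2 * num_parts,) vector [sin(a_1), cos(a_1), ..., sin(a_n), cos(a_n)].
    """
    sticks = np.asarray(sticks, dtype=float)
    dx = sticks[:, 2] - sticks[:, 0]
    dy = sticks[:, 3] - sticks[:, 1]
    angles = np.arctan2(dy, dx)
    return np.column_stack([np.sin(angles), np.cos(angles)]).ravel()

# One horizontal part (angle 0) and one vertical part (angle pi/2).
pose_vector = angle_descriptor([[0, 0, 1, 0], [0, 0, 0, 2]])
```

Encoding each angle as a (sin, cos) pair avoids the discontinuity at ±π that a raw angle value would have, so nearby orientations stay nearby in the descriptor.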
Results:
Method   MAP
BOVW     14.2
BPL      15.3
HPE [1]  17.5
Ours     34.6
[1] Y. Yang and D. Ramanan. “Articulated pose estimation with flexible mixtures-of-parts.” In CVPR, 2011.
Comparison with the state-of-the-art
Figure: percentage of queries versus average precision.
- HPE [1]: MAP 17.5; 75% of queries < 20% AP; 5% of queries > 50% AP
- Ours: MAP 34.6; 45% of queries < 20% AP; 25% of queries > 50% AP
proposals weighted by their likelihood
detections are wrong.
to wrong pose.
perform poorly.
Figure: HPE versus ours, showing ground truth and detections (scores S: 0.2, 0.3, 0.7).
Retrieval examples (each figure shows the query, its precision-recall curve, and the results at ranks 1, 5, 10, 15, 20, 25):
- Query 1: AP 59.4
- Query 2: AP 44.5
- Query 3: AP 40.3