Human Pose Search using Deep Poselets


  1. Human Pose Search using Deep Poselets
     Nataraj Jammalamadaka*, Andrew Zisserman§, C. V. Jawahar*
     * CVIT, IIIT Hyderabad, India
     § Visual Geometry Group, Department of Engineering, University of Oxford

  2. Human Pose: Gesture and action. Examples: cover drive, walking, gesturing. Human pose is a very important precursor to gesture and action.

  3. Pose Search: Motivation. Retrieve cover drive shots; retrieve Bharatanatyam poses.

  4. Pose Search: System. Take a query → build a feature $(x_1, \ldots, x_n)$ → search through video DB → return the retrieved results.

  5. Overview
     Deep Poselets:
     • Poselet discovery: cluster pose space.
     • Training: train poselets using convolutional neural networks.
     • Detection: detect poselets.
     Pose retrieval:
     • Given a query image.
     • Build Bag of Deep Poselets.
     • Return the retrieved results.

  6. Datasets: Buffy Stickmen (Season 1, 5 episodes); ETHZ Pascal dataset (Flickr images); H3D (Flickr images).

  7. Datasets: FLIC dataset (30 Hollywood movies); Movie dataset (ours; 22 Hollywood movies, no overlap with FLIC).

  8. Datasets
     Dataset                      Train  Validation  Test  Total
     H3D                            238           0     0    238
     ETHZ Pascal                      0           0   548    548
     Buffy                          747           0     0    747
     Buffy-2                        396           0     0    396
     Movie                         1098         491  2172   3756
     FLIC                          2724        2279     0   5003
     Total stickmen annotations    5198        2764  2720  10682
     + Flipped version            10396        5528  5440  21364

  9. Overview
     Deep Poselets:
     • Poselet discovery: cluster pose space.
     • Training: train poselets using convolutional neural networks.
     • Detection: detect poselets.
     Pose retrieval:
     • Given a query image.
     • Build Bag of Deep Poselets.
     • Return the retrieved results.

  10. Poselets: Poselets model body parts in a particular spatial configuration.

  11. Poselets: Poselets model body parts in a particular spatial configuration. Example: Poselet 1.

  12. Poselets: Poselets model body parts in a particular spatial configuration. Example: Poselet 2.

  13. Poselets: Poselets model body parts in a particular spatial configuration. Example: Poselet 3.

  14. Poselets: Discovery
      • Start from training data with ground truth stickmen annotations.
      • Reorganize the annotations into body-part sets: left arm (LA); LA + head; LA + head + torso; right arm (RA); RA + head; RA + head + torso; all parts except head.
      • For each set, get pose descriptors: for each body part, note the angle.
      • Cluster on the angles with K-means clustering.
      [Figure: poselet average images.]
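The discovery step above clusters pose descriptors built from body-part angles. The following is a minimal sketch of that idea, not the authors' implementation. It assumes each pose is a short list of part angles in radians, uses plain Lloyd's k-means on the raw angles, and ignores angle wrap-around. The function names and toy data are hypothetical.

```python
import random

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

# Toy "left arm" set: (upper-arm angle, lower-arm angle) per annotation.
poses = [[0.10, 0.20], [0.15, 0.25], [0.05, 0.18],
         [2.00, 2.50], [2.10, 2.40], [1.90, 2.60]]
centers, labels = kmeans(poses, k=2)
```

Each resulting cluster would play the role of one poselet for that body-part set; in practice the number of clusters per set is a design choice that the slides do not state.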

  15. Deep Poselets: CNNs
      Network: input → convolutional layers (convolution followed by pooling) → fully connected layers → softmax layer → deep poselet labels. [Diagram labels: Layer 2, Layer 5, Layer 6, Layer 7, Layer 8.]
      ReLU non-linearity: $f(x) = \max(0, x)$
      Softmax layer: $f(x_i) = e^{x_i} / \sum_j e^{x_j}$
      [Figure: illustrations of max pooling and convolution.]
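The two non-linearities named on this slide can be written out directly. This is a small illustrative sketch in plain Python; the max-subtraction inside `softmax` is a standard numerical-stability trick added here and is not mentioned on the slide.

```python
import math

def relu(x):
    """ReLU non-linearity: max(0, x)."""
    return max(0.0, x)

def softmax(xs):
    """Softmax over a list of scores: exp(x_i) / sum_j exp(x_j)."""
    m = max(xs)  # subtracting the max leaves the result unchanged
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

activations = [relu(v) for v in [-1.5, 0.0, 2.0]]
probs = softmax([1.0, 2.0, 3.0])
```

The softmax outputs are positive and sum to one, so they can be read as a probability for each deep poselet label.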

  16. Deep Poselets: Training
      Input image: $x$. Model parameters: $w$. Ground truth: $g$. Output: $y = f(x, w)$.
      Loss function: $L = -\sum_j g_j \log(y_j)$
      Training: stochastic gradient descent, $w \leftarrow w - \eta \, \partial L / \partial w$
      Architecture from Krizhevsky et al., NIPS 2012.
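To make the training rule concrete, here is a hedged sketch of stochastic gradient descent with a cross-entropy loss. A single linear layer followed by a softmax stands in for the full CNN, which is an assumption made purely to keep the example short. The names `sgd_step`, `data`, and `lr` are hypothetical; `x`, `w`, `g`, and `y` mirror the input, parameters, ground truth, and output on the slide.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(g, y):
    """L = -sum_j g_j log(y_j); terms with g_j = 0 contribute nothing."""
    return -sum(gj * math.log(yj) for gj, yj in zip(g, y) if gj > 0)

def sgd_step(w, x, g, lr):
    """One update w <- w - lr * dL/dw for y = softmax(w x); returns the loss."""
    logits = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
    y = softmax(logits)
    for j, row in enumerate(w):
        for i in range(len(row)):
            # For softmax + cross-entropy, dL/dw[j][i] = (y_j - g_j) * x_i.
            row[i] -= lr * (y[j] - g[j]) * x[i]
    return cross_entropy(g, y)

# Two toy samples with one-hot ground truth.
data = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
w = [[0.0, 0.0], [0.0, 0.0]]
rng = random.Random(0)
losses = []
for epoch in range(50):
    rng.shuffle(data)  # visit samples in random order each epoch
    epoch_loss = sum(sgd_step(w, x, g, lr=0.5) for x, g in data)
    losses.append(epoch_loss / len(data))
```

The average loss per epoch falls as training proceeds, which is the behaviour the update rule is meant to produce.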

  17. Deep Poselets: Fine tuning
      Challenge:
      -- Network has 40 million parameters.
      -- Required training data: ~1-2 million.
      -- Available training data: ~50K.
      Solution:
      -- Train the network on a task with enough data present.
      -- Fine-tune the network to the current task.
      Fine tuning procedure:
      -- Train an image classification task using ImageNet data of size 1.2 million.
      -- Replace the softmax layer with a randomly initialized one.
      -- Run the gradient descent.

  18. Deep Poselets: Detection
      Given a test image, run all the deep poselets.
      • Each poselet occurs in a localized region within an upper body detection.
      • Run the classifiers on the “expected center points of poselets”.
      • This improves both the speed and accuracy.
      [Figure: expected center points of poselets.]

  19. Deep Poselets: Spatial reasoning. Problem: the three detections fired in the same area. Detection 1 score: 0.3; detection 2 score: 0.7; detection 3 score: 0.2.

  20. Deep Poselets: Spatial reasoning
      Problem: the three detections fired in the same area (detection 1: 0.3, detection 2: 0.7, detection 3: 0.2).
      Objective: rescore detection 2 to 1 and detections 1 and 3 to 0 (0.3 → 0, 0.7 → 1, 0.2 → 0).
      Solution: for each poselet, learn a regression function with
      -- Input: scores of other poselet detections.
      -- Output: new score.

  21. Deep Poselets: Results
      • Evaluation measure: mean average precision (MAP).
      • Comparison: poselets are trained using HOG features.
      Method                    MAP (test)
      HOG                       32.6
      CNN before fine-tuning    48.6
      CNN after fine-tuning     56.0
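For readers unfamiliar with the evaluation measure, the sketch below computes average precision (AP) for one ranked list and the mean over several lists (MAP). It uses the common non-interpolated definition and assumes every relevant item appears somewhere in the ranked list; the slides do not say which AP variant was used, so treat this as an illustration only.

```python
def average_precision(ranked_relevance):
    """AP of one ranked list; entry i is True if the item at rank i+1 is relevant."""
    hits = 0
    precisions = []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at this recall point
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(ranked_lists):
    """Mean of the per-list average precisions."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)

ap_a = average_precision([True, False, True, False])  # (1/1 + 2/3) / 2
ap_b = average_precision([False, True])               # 1/2
map_score = mean_average_precision([[True, False, True, False], [False, True]])
```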

  22. Deep Poselets: Results. [Figure: two example poselets with their detections at ranks 1, 6, 11, 16, 21, 26, 31 and 36. Values shown, in slide order: AP 40.4 and 78.1; #positives in train set 1863 and 698.]

  23. Overview
      Deep Poselets:
      • Poselet discovery: cluster pose space.
      • Training: train poselets using convolutional neural networks.
      • Detection: detect poselets.
      Pose retrieval:
      • Given a query image.
      • Build Bag of Deep Poselets.
      • Return the retrieved results.

  24. Pose Search: Indexing
      For each frame in the video DB collection:
      • Detect the upper body.
      • Run all the poselets.
      • Perform spatial reasoning.
      Descriptor: max pool the Deep Poselet detections into a 122-D vector.
      Index the descriptor in a database.
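The descriptor step can be sketched as follows. This assumes the 122 dimensions correspond to one entry per deep poselet, that each detection is a `(poselet_id, score)` pair with a non-negative score, and that a poselet with no detection gets 0. All three are assumptions; the slide only states that detections are max pooled into a 122-D vector.

```python
NUM_POSELETS = 122  # descriptor length taken from the slide

def bag_of_deep_poselets(detections, num_poselets=NUM_POSELETS):
    """Max pool detection scores per poselet into a fixed-length descriptor."""
    descriptor = [0.0] * num_poselets
    for poselet_id, score in detections:
        # Keep only the strongest response of each poselet.
        descriptor[poselet_id] = max(descriptor[poselet_id], score)
    return descriptor

# Hypothetical detections for one frame: poselet 3 fired twice, poselet 10 once.
frame_detections = [(3, 0.2), (3, 0.7), (10, 0.4)]
descriptor = bag_of_deep_poselets(frame_detections)
```

Because every frame maps to a vector of the same length, the descriptors can be stored in a simple index and compared directly at query time.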

  25. Pose Search: Retrieval. Given a query image → build Bag of Deep Poselets → search through the database using cosine distance → return the retrieved results.
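A minimal sketch of the retrieval step, assuming the database is a small in-memory mapping from a frame identifier to its descriptor and that search is a brute-force ranking by cosine distance. The frame names, the 3-D toy descriptors, and the handling of all-zero descriptors are illustrative assumptions.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; all-zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

def search(query, database, top_k=5):
    """Return the ids of the top_k database entries closest to the query."""
    ranked = sorted(database.items(),
                    key=lambda item: cosine_distance(query, item[1]))
    return [frame_id for frame_id, _ in ranked[:top_k]]

database = {
    "frame_a": [0.9, 0.1, 0.0],
    "frame_b": [0.0, 0.8, 0.6],
    "frame_c": [0.7, 0.3, 0.1],
}
results = search([1.0, 0.2, 0.0], database, top_k=2)
```

Here `results` lists `frame_a` first and `frame_c` second, since their descriptors point in nearly the same direction as the query.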

  26. Pose Search: Results
      Experimental setup:
      • Database: test data of size 5440 is used as the database.
      • Queries: all the samples in the test data are used as queries.
      • Evaluation metric: mean average precision (MAP).
      Methods compared against:
      • Bag of visual words (BOVW): detect SIFT → K-means (K = 1000) → VQ.
      • Berkeley Poselets (BPL): run poselets → bag of parts.
      • Human pose estimation (HPE) [1]: run human pose estimation algorithms; concatenate (sin(x), cos(x)) of all the body part angles.
      Results:
      Method     MAP
      BOVW       14.2
      BPL        15.3
      HPE [1]    17.5
      Ours       34.6
      [1] Y. Yang and D. Ramanan. “Articulated pose estimation with flexible mixtures-of-parts.” In CVPR, 2011.
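The HPE baseline descriptor concatenates (sin(x), cos(x)) for every body part angle. A short sketch, assuming angles are given in radians; the function names are hypothetical. One likely reason for this encoding is that it keeps angles on either side of the 0 / 2π wrap-around close together, which the last lines illustrate.

```python
import math

def angle_descriptor(angles):
    """Concatenate (sin(x), cos(x)) for each body part angle x (radians)."""
    descriptor = []
    for x in angles:
        descriptor.extend((math.sin(x), math.cos(x)))
    return descriptor

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

d = angle_descriptor([0.0, math.pi / 2])

# Two nearly identical orientations on opposite sides of the wrap-around.
near_zero = angle_descriptor([0.01])
near_two_pi = angle_descriptor([2 * math.pi - 0.01])
wrap_gap = euclidean(near_zero, near_two_pi)
```

The raw angles differ by almost 2π, yet `wrap_gap` is about 0.02, so the two poses are treated as close.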

  27. Pose Search: Results (comparison with the state-of-the-art)
      [Figure: histogram of percentage of queries (y-axis) against average precision (x-axis).]
      • HPE [1]: MAP 17.5; 75% of queries < 20% AP; 5% of queries > 50% AP.
      • Ours: MAP 34.6; 45% of queries < 20% AP; 25% of queries > 50% AP.

  28. Pose Search: Analysis
      HPE [figure: ground truth vs. detection]:
      • Pose detection algorithms often commit to a wrong pose.
      • Pose search systems based on them perform poorly.
      Ours [figure: detections with scores S: 0.3, S: 0.2, S: 0.7]:
      • The bag of poselets descriptor encodes multiple proposals weighted by their likelihood.
      • Hence it can recover when some of the detections are wrong.

  29. Pose Search: Results. AP: 59.4. [Figure: query image, precision-recall curve, and retrieved results at ranks 1, 5, 10, 15, 20 and 25.]

  30. Pose Search: Results. AP: 44.5. [Figure: query image, precision-recall curve, and retrieved results at ranks 1, 5, 10, 15, 20 and 25.]

  31. Pose Search: Results. AP: 40.3. [Figure: query image, precision-recall curve, and retrieved results at ranks 1, 5, 10, 15, 20 and 25.]

  32. Summary
      • We propose a novel Deep Poselets based method for a human pose search system.
      • Our Deep Poselet method outperforms HOG based poselets by 25% MAP.
      • Our pose retrieval method improves on the current state-of-the-art system by 17% MAP.

  33. Thank you. Questions?
