Modeling Mutual Context of Object and Human Pose in Human-object Interaction Activities

Bangpeng Yao, Li Fei-Fei. Presented by Sahil Shah.


  1. Modeling Mutual Context of Object and Human Pose in Human-object Interaction Activities • Bangpeng Yao • Li Fei-Fei Presented by Sahil Shah

  2. Agenda • Introduction • Problem Formulation • Learning • Inference • Results

  3. Agenda • Introduction • Problem Formulation • Learning • Inference • Results

  4. Introduction • Note on the author (Li Fei-Fei) – Pioneer of the ImageNet dataset – Must-see TED talk in March 2015

  5. Introduction • Problem: Detecting objects in cluttered scenes and estimating articulated human body parts, especially in human-object interaction (HOI) activities

  6. Introduction

  7. Introduction

  8. Introduction • Key insight: Mutual Context – Automatically discover relevant poses – Automatically discover spatial relationships – Optimize for mutual co-occurrence of object and pose

  9. Introduction • Contribution – Builds on Prof. Gupta’s work – First to use mutual context – Jointly solves object detection & pose estimation

  10. Agenda • Introduction • Problem Formulation • Learning • Inference • Results

  11. Problem Formulation • Goal: Given an image of an HOI activity, estimate the human pose (H), detect the object (O), and classify the HOI activity (A) • Model – Hierarchical random field – A, O, and H contribute to the detection of each other – H is a hidden variable – Body parts {P_n} are found using feature-based detectors, and they compose to form H

  12. Problem Formulation • [Figure: panels titled "Golf Swing" and "Tennis Forehand"]

  13. Problem Formulation

  14. Problem Formulation • Why do we need to learn the structure? – The model captures important connections between the object and the body parts – Which parts of the body should be connected to the overall pose (H) and the object (O)?

  15. Problem Formulation • Model
     – Overall model: $\Psi = \sum_e w_e \psi_e$
     – A, O, H: $\psi_e(A, O)$, $\psi_e(A, H)$, and $\psi_e(O, H)$ • Counting co-occurrence frequencies
     – Spatial relationships: $\psi_e(O, P_n)$ and $\psi_e(P_m, P_n)$ • $\mathrm{bin}(l_O - l_{P_n}) \cdot \mathrm{bin}(\theta_O - \theta_{P_n}) \cdot \mathcal{N}(s_O / s_{P_n})$
     – Compatibility: $\psi_e(H, P_n)$ • $\mathrm{bin}(l_{P_n} - l_{P_1}) \cdot \mathrm{bin}(\theta_{P_n}) \cdot \mathcal{N}(s_{P_n})$
     – Object & body parts: $\psi_e(O, f_O)$ and $\psi_e(P_n, f_{P_n})$ • Shape context feature-based detectors
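
The overall score on this slide is a weighted sum of edge potentials. A minimal Python sketch of that sum follows, assuming each potential has already been reduced to a single number per edge. The function name `model_score`, the dictionary layout, and the example values are illustrative assumptions, not taken from the slides.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def model_score(weights: Dict[Edge, float], potentials: Dict[Edge, float]) -> float:
    """Compute Psi = sum over edges e of w_e * psi_e.

    Assumption: every potential psi_e is a scalar here; this is a
    simplification made for illustration only.
    """
    missing = set(potentials) - set(weights)
    if missing:
        raise KeyError(f"no weight for edges: {sorted(missing)}")
    return sum(weights[e] * potentials[e] for e in potentials)


# Hypothetical potentials for the three co-occurrence edges among A, O and H.
example_potentials = {("A", "O"): 0.5, ("A", "H"): 0.25, ("O", "H"): 1.0}
example_weights = {("A", "O"): 2.0, ("A", "H"): 4.0, ("O", "H"): 1.0}

score = model_score(example_weights, example_potentials)
print(score)
```

Edges between the object or pose and individual body parts P_n would enter the same sum as additional dictionary entries.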

  16. Agenda • Introduction • Problem Formulation • Learning • Inference • Results

  17. Learning • Input and Output – Input: images with labeled objects, body parts & HOI → Model Learning → Output: a set of models, each for one human pose in a particular HOI activity

  18. Learning • Overall Algorithm

  19. Learning • Hill-climbing structure learning – Run for each pose in each HOI activity class – Add/remove an edge and check whether the score improves – Keep a tabu list to avoid revisiting solutions – Randomly initialize three times to avoid local optima
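
The search described on this slide can be sketched as a small tabu-style hill climb over edge sets. This is only an illustration under stated assumptions: `hill_climb_structure`, the toy scoring function, and the node names are hypothetical, and the real objective used to score a structure is not given on the slide.

```python
import random
from itertools import combinations


def hill_climb_structure(nodes, score_fn, n_restarts=3, max_steps=20, seed=0):
    """Search over edge sets by toggling one edge at a time.

    A tabu set stops the search from revisiting structures within a
    restart, and several random restarts reduce the risk of ending in
    a poor local optimum.
    """
    rng = random.Random(seed)
    candidate_edges = [frozenset(pair) for pair in combinations(nodes, 2)]
    best_edges, best_score = None, float("-inf")
    for _ in range(n_restarts):
        # Random initial structure: each candidate edge kept with probability 0.5.
        current = frozenset(e for e in candidate_edges if rng.random() < 0.5)
        tabu = {current}
        for _ in range(max_steps + 1):
            current_score = score_fn(current)
            if current_score > best_score:
                best_edges, best_score = current, current_score
            # Neighbours differ from the current structure by exactly one edge.
            neighbours = [current ^ {e} for e in candidate_edges]
            neighbours = [n for n in neighbours if n not in tabu]
            if not neighbours:
                break
            current = max(neighbours, key=score_fn)
            tabu.add(current)
    return best_edges, best_score


# Toy objective (an assumption): reward agreement with a known target structure.
target = frozenset({frozenset(("H", "P1")), frozenset(("O", "P1")), frozenset(("H", "P2"))})


def toy_score(edges):
    return -len(edges ^ target)


best_edges, best_score = hill_climb_structure(["H", "O", "P1", "P2"], toy_score)
print(sorted(sorted(e) for e in best_edges), best_score)
```

The tabu set is kept per restart here; sharing it across restarts would be an equally plausible reading of the slide.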

  20. Learning • Max-margin for parameter estimation – Maximize discrimination between different A – Each A has subclasses, hence multiple models and multiple weight vectors – Training sample: $(x_i, c_i, y(c_i))$, where $y$ maps $c_i$ to its class label – Classifier $F$: $y(F(x_i)) = y(c_i)$, with $F(x_i) = \arg\max_r \{w_r \cdot x_i\}$, where $w_r$ is the weight vector for the $r$-th subclass
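
The decision rule on this slide picks the subclass with the highest linear score and then maps it to its activity class. A minimal sketch of that rule only (not of the max-margin training) is shown here; the weight values, feature vectors, and the subclass-to-class table are made-up examples.

```python
def predict_subclass(weights, x):
    """Return the index r that maximizes w_r . x."""
    scores = [sum(w_k * x_k for w_k, x_k in zip(w_r, x)) for w_r in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def predict_class(weights, x, subclass_to_class):
    """Apply y(F(x)): map the winning subclass to its activity class."""
    return subclass_to_class[predict_subclass(weights, x)]


# Hypothetical model: three subclasses (poses) spread over two activity classes.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
subclass_to_class = ["tennis_forehand", "tennis_forehand", "golf_swing"]

label_a = predict_class(weights, [0.2, 0.9], subclass_to_class)
label_b = predict_class(weights, [-1.0, 0.1], subclass_to_class)
print(label_a, label_b)
```

A prediction counts as correct whenever the winning subclass belongs to the right class, even if it is not the labeled subclass, which is what the condition y(F(x_i)) = y(c_i) expresses.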

  21. Learning • Overall Algorithm

  22. Agenda • Introduction • Problem Formulation • Learning • Inference • Results

  23. Inference • Given a test image (I), estimate the pose, detect the object, and classify the activity – To detect the object (O), we maximize the likelihood of the models given that object, denoted $\max_{O,H} \Psi(A_k, O, H, I)$ – To detect the human pose (H), compute $\max_{O,H} \Psi(A_k, O, H, I)$ for each $A_k$ and select the one corresponding to the maximum-likelihood (ML) score
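
The maximization above can be illustrated by brute-force enumeration over small candidate sets. This sketch assumes the candidates for O and H are given as short lists and that a scoring function for a fixed image is available; `infer_joint`, `toy_psi`, and all labels are hypothetical, and exhaustive search is a stand-in for whatever inference procedure the original work uses.

```python
def infer_joint(activities, object_candidates, pose_candidates, psi):
    """Return (score, A, O, H) maximizing psi(A, O, H) by exhaustive search."""
    best = None
    for a in activities:
        for o in object_candidates:
            for h in pose_candidates:
                s = psi(a, o, h)
                if best is None or s > best[0]:
                    best = (s, a, o, h)
    return best


# Toy score table (an assumption); unlisted combinations score 0.
toy_scores = {
    ("golf_swing", "club", "pose_1"): 2.0,
    ("tennis_forehand", "racket", "pose_2"): 3.5,
}


def toy_psi(a, o, h):
    return toy_scores.get((a, o, h), 0.0)


result = infer_joint(
    ["golf_swing", "tennis_forehand"],
    ["club", "racket"],
    ["pose_1", "pose_2"],
    toy_psi,
)
print(result)
```

The winning tuple gives the activity label, the detected object, and the pose together, which is the joint estimate the slide describes.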

  24. Inference

  25. Agenda • Introduction • Problem Formulation • Learning • Inference • Results

  26. Results

  27. Results

  28. Results • Object Detection – Compare with two experiments 1. Sliding window as baseline 2. Pedestrian detector for human’s location context

  29. Results

  30. Results • Pose Estimation

  31. Results • HOI classification – Compare with SVM with BoW – Compare with Gupta et al.

  32. Results • Upper-left → object detection by mutual context • Lower-left → object detection by a scanning window • Upper-right → pose estimation by mutual context • Lower-right → pose estimation by the state-of-the-art pictorial structure method

  33. Results • Upper-left → object detection by mutual context • Lower-left → object detection by a scanning window • Upper-right → pose estimation by mutual context • Lower-right → pose estimation by the state-of-the-art pictorial structure method

  34. Thank you!