Modeling Mutual Context of Object and Human Pose in Human-object Interaction Activities



SLIDE 1

Modeling Mutual Context of Object and Human Pose in Human-object Interaction Activities

  • Bangpeng Yao
  • Li Fei-Fei

Presented by Sahil Shah

SLIDE 2

Agenda

  • Introduction
  • Problem Formulation
  • Learning
  • Inference
  • Results
SLIDE 3

Agenda

  • Introduction
  • Problem Formulation
  • Learning
  • Inference
  • Results
SLIDE 4

Introduction

  • Note on the author (Li Fei-Fei)

– Pioneer of the ImageNet dataset
– Must-see TED talk from March 2015

SLIDE 5

Introduction

  • Problem: Detecting objects in cluttered scenes and estimating articulated human body parts, especially in human-object interaction (HOI) activities

SLIDE 6

Introduction

SLIDE 7

Introduction

SLIDE 8

Introduction

  • Key insight: Mutual Context

– Automatically discover relevant poses
– Automatically discover spatial relationships
– Optimize for mutual co-occurrence of object and pose

SLIDE 9

Introduction

  • Contribution

– Builds on Prof. Gupta’s work
– First to use mutual context
– Jointly solves object detection & pose estimation

SLIDE 10

Agenda

  • Introduction
  • Problem Formulation
  • Learning
  • Inference
  • Results
SLIDE 11

Problem Formulation

  • Goal: Given an image of an HOI activity, estimate the human pose (H), detect the object (O), and classify the HOI activity (A)
  • Model

– Hierarchical random field
– A, O and H contribute to the detection of each other
– H is a hidden variable
– Body parts {P_n} are found using feature-based detectors and compose to form H

SLIDE 12

Problem Formulation

[Figure panels: Golf Swing, Tennis Forehand]

SLIDE 13

Problem Formulation

SLIDE 14

Problem Formulation

  • Why do we need to learn the structure?

– The model captures important connections between the object and the body parts
– Which parts of the body should be connected to the overall pose (H) and the object (O)?
SLIDE 15

Problem Formulation

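  • Model

– Overall model: $\Psi = \sum_e w_e \phi_e$
– A, O, H: $\phi_e(A, O)$, $\phi_e(A, H)$, and $\phi_e(O, H)$

  • Counting co-occurrence frequencies

– Spatial relationships: $\phi_e(O, P_n)$ and $\phi_e(P_m, P_n)$

  • $\mathrm{bin}(l_O - l_{P_n}) \cdot \mathrm{bin}(\theta_O - \theta_{P_n}) \cdot \mathcal{N}(s_O / s_{P_n})$

– Compatibility: $\phi_e(H, P_n)$

  • $\mathrm{bin}(l_{P_n} - l_{P_1}) \cdot \mathrm{bin}(\theta_{P_n}) \cdot \mathcal{N}(s_{P_n})$

– Object & body parts: $\phi_e(O, f_O)$ and $\phi_e(P_n, f_{P_n})$

  • Shape context feature based detectors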
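
As a rough illustration of how a binned spatial potential and the weighted sum Ψ = Σ w_e φ_e fit together, here is a minimal Python sketch. It is not the authors' implementation. The helper names (`bin_index`, `spatial_feature`, `model_score`), the dictionary layout of an object or part, the use of distance bins for the location offset, and all bin edges and Gaussian parameters are assumptions made only for this example.

```python
import math

def one_hot(index, size):
    """Indicator vector with a single 1.0 at `index`."""
    v = [0.0] * size
    v[index] = 1.0
    return v

def bin_index(value, edges):
    """Index of the bin `value` falls into (len(edges) + 1 bins in total)."""
    return sum(1 for e in edges if value >= e)

def gaussian(x, mean, std):
    """Unnormalised Gaussian response in (0, 1]."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2)

def spatial_feature(obj, part, dist_edges, angle_edges, ratio_mean, ratio_std):
    """Sketch of phi_e(O, P_n): bin(l_O - l_Pn) * bin(theta_O - theta_Pn) * N(s_O / s_Pn).

    Assumption: the location offset is binned by Euclidean distance only.
    """
    dx, dy = obj["x"] - part["x"], obj["y"] - part["y"]
    loc = one_hot(bin_index(math.hypot(dx, dy), dist_edges), len(dist_edges) + 1)
    dtheta = (obj["theta"] - part["theta"]) % (2 * math.pi)
    ang = one_hot(bin_index(dtheta, angle_edges), len(angle_edges) + 1)
    g = gaussian(obj["scale"] / part["scale"], ratio_mean, ratio_std)
    # Flattened outer product of the two indicator vectors, scaled by the Gaussian.
    return [a * b * g for a in loc for b in ang]

def model_score(features, weights):
    """Psi = sum over edges e of w_e . phi_e (both given as dicts keyed by edge name)."""
    return sum(
        sum(w * f for w, f in zip(weights[e], phi))
        for e, phi in features.items()
    )
```

A call such as `model_score({"O-P1": phi}, {"O-P1": w})` then gives the contribution of a single object-part edge. In a full model the dictionary would hold one entry per edge of the learned structure.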
SLIDE 16

Agenda

  • Introduction
  • Problem Formulation
  • Learning
  • Inference
  • Results
SLIDE 17

Learning

  • Input and Output

– Input: images with labeled objects, body parts & HOI activities
– Output (after model learning): a set of models, each for one human pose in a particular HOI activity

SLIDE 18

Learning

  • Overall Algorithm
SLIDE 19

Learning

  • Hill climbing structure learning

– Run for each pose in each HOI activity class
– Add or remove an edge and check whether the result is an optimum
– Keep a tabu list to avoid revisiting solutions
– Randomly initialize three times to avoid local optima
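
The search loop on this slide can be sketched in a few lines of Python. This is a simplified illustration, not the authors' code. The function name `hill_climb_structure`, the `score` callback (standing in for whatever objective the structure is evaluated with), the 50% random edge initialization, and the step limit are all assumptions.

```python
import random

def hill_climb_structure(candidate_edges, score, n_restarts=3, max_steps=100, seed=0):
    """Greedy edge add/remove search with a tabu set and random restarts.

    candidate_edges: list of hashable edges, e.g. ("O", "P1").
    score: hypothetical callable mapping a frozenset of edges to a number.
    """
    rng = random.Random(seed)
    best_edges = frozenset()
    best_score = score(best_edges)
    for _ in range(n_restarts):
        # Random starting structure: keep each candidate edge with probability 0.5.
        current = frozenset(e for e in candidate_edges if rng.random() < 0.5)
        current_score = score(current)
        tabu = {current}
        for _ in range(max_steps):
            # Neighbours differ from the current structure by exactly one edge.
            neighbours = [current ^ {e} for e in candidate_edges]
            neighbours = [n for n in neighbours if n not in tabu]
            if not neighbours:
                break
            candidate = max(neighbours, key=score)
            tabu.add(candidate)
            candidate_score = score(candidate)
            if candidate_score <= current_score:
                break  # no single-edge change improves the score
            current, current_score = candidate, candidate_score
        if current_score > best_score:
            best_edges, best_score = current, current_score
    return best_edges, best_score
```

In this sketch the tabu set stores whole structures, which is affordable only for small edge sets. A practical version might store recent moves instead.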

SLIDE 20

Learning

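  • Max-margin for parameter estimation

– Maximize discrimination between different activities A
– Each A has sub-classes, hence multiple models and multiple weight vectors
– Training sample: $(\mathbf{x}_i, c_i, y(c_i))$, where $y$ maps the sub-class $c_i$ to its class label
– Classifier $F$ should satisfy $y(F(\mathbf{x}_i)) = y(c_i)$, with $F(\mathbf{x}_i) = \arg\max_r \{\mathbf{w}_r \cdot \mathbf{x}_i\}$
– $\mathbf{w}_r$: weights for the $r$-th sub-class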
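
The decision rule F(x) = argmax_r w_r · x and the label map y can be illustrated with a short Python sketch. The function names and the list-based weight layout are assumptions for this example. The `margin_violation` helper shows one plausible hinge-style slack, in which only sub-classes of a different activity count as rivals. The slide does not spell out the constraint, so treat that part as an assumption too.

```python
def dot(w, x):
    """Plain dot product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))

def predict_subclass(weights, x):
    """F(x) = argmax_r w_r . x, where weights[r] is the vector for sub-class r."""
    return max(range(len(weights)), key=lambda r: dot(weights[r], x))

def predict_activity(weights, subclass_to_activity, x):
    """y(F(x)): map the winning sub-class to its activity label."""
    return subclass_to_activity[predict_subclass(weights, x)]

def margin_violation(weights, subclass_to_activity, x, c, margin=1.0):
    """Slack for a training sample x with true sub-class c.

    Assumption: only sub-classes belonging to a different activity are rivals,
    so confusing two poses of the same activity is not penalised.
    """
    true_score = dot(weights[c], x)
    rivals = [
        dot(w, x)
        for r, w in enumerate(weights)
        if subclass_to_activity[r] != subclass_to_activity[c]
    ]
    if not rivals:
        return 0.0
    return max(0.0, margin + max(rivals) - true_score)
```

A zero return value from `margin_violation` means the true sub-class beats every sub-class of the other activities by at least `margin`.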

SLIDE 21

Learning

  • Overall Algorithm
SLIDE 22

Agenda

  • Introduction
  • Problem Formulation
  • Learning
  • Inference
  • Results
SLIDE 23

Inference

  • Given a test image (I), estimate the pose, detect the object, and classify the activity

– To detect the object (O), maximize the likelihood of the models given that object, denoted $\max_{O,H} \Psi(A_k, O, H, I)$
– To estimate the human pose (H), compute $\max_{O,H} \Psi(A_k, O, H, I)$ for each $A_k$ and select the one with the maximum-likelihood (ML) score
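
The maximization over O and H for each activity A_k can be sketched as follows. The slides do not describe the search procedure, so this example assumes small, finite candidate sets and enumerates them exhaustively purely for illustration. The `psi` callback and the candidate lists are hypothetical.

```python
from itertools import product

def best_configuration(psi, activity, objects, poses, image):
    """max over (O, H) of Psi(A_k, O, H, I) for one activity A_k.

    Returns a (score, object, pose) tuple.
    """
    return max(
        ((psi(activity, o, h, image), o, h) for o, h in product(objects, poses)),
        key=lambda t: t[0],
    )

def infer(psi, activities, objects, poses, image):
    """Score every activity, then keep the activity with the highest maximum."""
    results = {
        a: best_configuration(psi, a, objects, poses, image) for a in activities
    }
    best_activity = max(results, key=lambda a: results[a][0])
    score, obj, pose = results[best_activity]
    return best_activity, obj, pose, score
```

The same call returns the activity label, the detected object hypothesis, and the pose together, which mirrors the joint nature of the model.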

SLIDE 24

Inference

SLIDE 25

Agenda

  • Introduction
  • Problem Formulation
  • Learning
  • Inference
  • Results
SLIDE 26

Results

SLIDE 27

Results

SLIDE 28

Results

  • Object Detection

– Compare with two experiments

  1. Sliding window as baseline
  2. Pedestrian detector for the human’s location context
SLIDE 29

Results

SLIDE 30

Results

  • Pose Estimation
SLIDE 31

Results

  • HOI classification

– Compare with an SVM with bag-of-words (BoW) features
– Compare with Gupta et al.

SLIDE 32

Results

  • Upper-left → object detection by mutual context
  • Lower-left → object detection by a scanning window
  • Upper-right → pose estimation by mutual context
  • Lower-right → pose estimation by the state-of-the-art pictorial structure method
SLIDE 33

Results

  • Upper-left → object detection by mutual context
  • Lower-left → object detection by a scanning window
  • Upper-right → pose estimation by mutual context
  • Lower-right → pose estimation by the state-of-the-art pictorial structure method
SLIDE 34

Thank you!