HICO: A Benchmark for Recognizing Human-Object Interactions in Images




SLIDE 1

HICO: A Benchmark for Recognizing Human-Object Interactions in Images

Yu-Wei Chao, Zhan Wang, Yugeng He, Jiaxuan Wang, and Jia Deng ICCV 2015

Presented by Chia-Wen Cheng, Chia-Cheng Hsu

SLIDE 2

HICO

~47,000 labeled images in 600 human-object interaction (HOI) categories

Object-Verb:

  • sports ball - block
  • sports ball - carry
  • sports ball - sign
  • sports ball - hold
  • wine glass - fill
  • apple - peel
  • ....

Per-category label marks shown on the slide: V .... V X X ? ?

SLIDE 3

Human-Object Interaction Prediction

  • Horse-Ride
  • Horse-Sit on

SLIDE 4

Evaluate the best proposed model

SLIDE 5

Pipeline of the DNN Model

AlexNet (pretrained on ImageNet) -> feature vector -> one binary SVM per category (SVM, SVM, SVM, ...)
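The prediction step of this pipeline can be sketched as follows. This is a minimal illustration, not the presenters' code: it assumes each per-category SVM is linear and is given as a hypothetical `(weights, bias)` pair, and that a category is predicted when its score is positive. The names `svm_score` and `predict_categories` are made up for this sketch.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation of one trained linear SVM: (weights, bias).
LinearSVM = Tuple[Sequence[float], float]


def svm_score(features: Sequence[float], model: LinearSVM) -> float:
    """Signed decision value w . x + b of one binary classifier."""
    weights, bias = model
    return sum(w * x for w, x in zip(weights, features)) + bias


def predict_categories(features: Sequence[float],
                       models: Sequence[LinearSVM]) -> List[int]:
    """Indices of all categories whose classifier fires (score > 0).

    Each classifier decides independently, so the result can contain
    several categories, one category, or none at all.
    """
    return [k for k, model in enumerate(models)
            if svm_score(features, model) > 0.0]


if __name__ == "__main__":
    # Three toy classifiers over a 3-dimensional feature vector.
    toy_models = [([1.0, 0.0, 0.0], -0.5),
                  ([0.0, 1.0, 0.0], -0.5),
                  ([0.0, 0.0, 1.0], -3.0)]
    print(predict_categories([1.0, 0.0, 2.0], toy_models))
    print(predict_categories([0.0, 0.0, 0.0], toy_models))
```

Because the classifiers are independent, nothing forces an image to receive at least one label.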

SLIDE 6

Weird Output Distribution

[Figure: histogram. x-axis: number of predicted labels per image; y-axis: % of test images]

SLIDE 7

Weird Output Distribution

[Figure: histogram. x-axis: number of predicted labels per image; y-axis: % of test images]

Many test images are not predicted as belonging to any category.
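A distribution like the one plotted here could be computed along these lines. This is only a sketch: it assumes the predictions are available as one list of predicted category indices per test image, and the function name `label_count_distribution` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def label_count_distribution(predictions: Sequence[List[int]]) -> Dict[int, float]:
    """Map 'number of predicted labels' to the percentage of images with that count."""
    counts = Counter(len(labels) for labels in predictions)
    total = len(predictions)
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}


if __name__ == "__main__":
    # Four toy test images: two with no predicted label, one with one, one with two.
    toy_predictions = [[], [], [3], [1, 7]]
    print(label_count_distribution(toy_predictions))
    # → {0: 50.0, 1: 25.0, 2: 25.0}
```

A large bar at 0 corresponds to the observation above that many images receive no label.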

SLIDE 8

Long Tail Distribution of Categories

SLIDE 9
SLIDE 10

Weighted Loss for Unbalanced Dataset

Binary classifier for Class 1:

  • Positive samples: Class 1
  • Negative samples: Classes 2, 3, …, 600

Total loss = w_p * (loss on positive samples) + w_n * (loss on negative samples)
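The weighted total loss can be illustrated with a small sketch. The slide does not say which per-sample loss is used; the hinge loss below is an assumption, chosen because the classifiers are SVMs. The function names are hypothetical.

```python
from typing import Sequence


def hinge(score: float, target: int) -> float:
    """Hinge loss for one sample; target is +1 (positive) or -1 (negative)."""
    return max(0.0, 1.0 - target * score)


def weighted_loss(scores: Sequence[float],
                  is_positive: Sequence[bool],
                  w_p: float,
                  w_n: float) -> float:
    """Total loss = w_p * loss on positives + w_n * loss on negatives."""
    pos = sum(hinge(s, +1) for s, p in zip(scores, is_positive) if p)
    neg = sum(hinge(s, -1) for s, p in zip(scores, is_positive) if not p)
    return w_p * pos + w_n * neg


if __name__ == "__main__":
    scores = [0.5, -2.0, 0.0]        # classifier outputs for three samples
    labels = [True, False, False]    # only the first sample is positive
    print(weighted_loss(scores, labels, w_p=1.0, w_n=1.0))   # → 1.5
    print(weighted_loss(scores, labels, w_p=10.0, w_n=1.0))  # → 6.0
```

Raising w_p relative to w_n makes mistakes on the few positive samples cost more, which counteracts the imbalance between one positive class and 599 negative ones.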

SLIDE 11

Experiments on w_p/w_n

w_p/w_n    mAP (%)
1          18.58
3          19.05
10         19.39
30         19.24

SLIDE 12

Experiment on w_p/w_n

w_p/w_n    mAP (%)
1          18.58
3          19.05
10         19.39
30         19.24

SLIDE 13

Our Implementation: End-to-End Network

SLIDE 14

Multi-Label Classification

CNN -> logistic sigmoid layer -> cross entropy against the ground-truth label vector (1 1 . .)
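The loss for this multi-label setup can be sketched in plain Python. This is an illustration under stated assumptions, not the presenters' TensorFlow code: it treats each of the outputs as an independent sigmoid unit, averages the per-category cross entropy, and uses a numerically stable formulation. The function name is hypothetical.

```python
import math
from typing import Sequence


def sigmoid_cross_entropy(logits: Sequence[float],
                          targets: Sequence[int]) -> float:
    """Mean binary cross entropy between sigmoid(logits) and 0/1 targets.

    Uses the stable form  max(z, 0) - z * y + log(1 + exp(-|z|)),
    which equals  -[y * log(p) + (1 - y) * log(1 - p)]  with p = sigmoid(z).
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


if __name__ == "__main__":
    # Ground truth marks categories 0 and 1 as present, 2 and 3 as absent.
    ground_truth = [1, 1, 0, 0]
    confident_right = [8.0, 8.0, -8.0, -8.0]
    uninformative = [0.0, 0.0, 0.0, 0.0]
    print(sigmoid_cross_entropy(confident_right, ground_truth))
    print(sigmoid_cross_entropy(uninformative, ground_truth))
```

Unlike a softmax, the sigmoid outputs do not compete with each other, so several categories can be active for the same image.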

SLIDE 15

Experimental Setting

CNN Model:

  • Inception v3
  • Output layer: softmax replaced by a logistic sigmoid layer
  • Number of output classes changed to 600

Training:

  • Use pretrained model on ImageNet
  • Fine-tune only the last layer
  • Optimizer: Adam
  • Learning rate: 0.001
  • Batch size: 64
  • Epochs: 10
SLIDE 16

Source Code

  • Implemented in TensorFlow
  • TF-Slim Library
  • Github: https://github.com/chiawen/multi-label-classification-hico
SLIDE 17

Performance

Method                                   mAP (%)
DNN (fine-tune O)                        19.38
DNN (ImageNet) + weighted loss (ours)    19.39
Inception V3 + fine-tune (ours)          26.31
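For readers unfamiliar with the metric, mean average precision (mAP) can be sketched as below. This is a generic, non-interpolated formulation and an assumption on our part; the exact HICO evaluation protocol may differ in details. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP for one category: mean of precision@rank over the ranks of positives."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    # A category without positives contributes 0.0 in this sketch.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(
        per_category: Sequence[Tuple[Sequence[float], Sequence[int]]]) -> float:
    """mAP: average of the per-category AP values."""
    aps = [average_precision(scores, labels) for scores, labels in per_category]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    category_a = ([0.9, 0.8, 0.3], [1, 0, 1])  # AP = (1/1 + 2/3) / 2
    category_b = ([0.2, 0.7], [0, 1])          # AP = 1.0
    print(100.0 * mean_average_precision([category_a, category_b]))
```

Averaging over categories gives every one of the 600 classes equal weight, so rare categories in the long tail influence the score as much as frequent ones.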

SLIDE 18

Related Work

SLIDE 19

Performance of HICO Benchmark

Arun Mallya and Svetlana Lazebnik. Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering. In ECCV, 2016.

Method                                   mAP (%)
DNN (fine-tune O)                        19.38
DNN (ImageNet) + weighted loss (ours)    19.39
Inception V3 + fine-tune (ours)          26.31