Image Retrieval using Scene Graphs



SLIDE 1

Image Retrieval using Scene Graphs

Justin Johnson, Ranjay Krishna, Michael Stark, Li-Jia Li, David A. Shamma, Michael S. Bernstein, Li Fei-Fei

CVPR 2015

Presented by Youngki Kwon

SLIDE 2

Contents

  • Introduction
  • Background
  • Main approach
  • Experiment
  • Conclusion
SLIDE 3

Introduction

  • There is a need to retrieve semantically similar images by describing the detailed semantics of a scene
  • A scene graph can represent a scene
  • How about using a scene graph as the query?

Ideal Result

SLIDE 4

Introduction

  • Develop a novel framework for semantic image retrieval based on the notion of a scene graph
  • Use scene graphs as queries
  • Introduce a novel dataset of 5K human-generated scene graphs grounded to images

[Figure labels: Object & Attribute; Relationship; Query; Output; Measure Score]

SLIDE 5

Background

  • A scene graph is a data structure that describes the contents of a scene
  • It encodes object instances, attributes of objects, and relationships between objects

<Ranjay Krishna et al. IJCV16>
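A scene graph as described above could be sketched in code roughly as follows. This is only an illustration: the class names `SceneObject` and `SceneGraph`, the field names, and the example scene are hypothetical and do not come from the slides or the dataset.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SceneObject:
    """One object instance: a class label plus a set of attributes."""
    name: str
    attributes: frozenset = frozenset()


@dataclass
class SceneGraph:
    """Objects plus directed relationships between them."""
    objects: list
    # Each relationship is (subject_index, predicate, object_index),
    # where the indices point into `objects`.
    relationships: list = field(default_factory=list)


# Hypothetical example scene: "a tall man wearing a red hat".
graph = SceneGraph(
    objects=[
        SceneObject("man", frozenset({"tall"})),
        SceneObject("hat", frozenset({"red"})),
    ],
    relationships=[(0, "wearing", 1)],
)

for subj, predicate, obj in graph.relationships:
    print(graph.objects[subj].name, predicate, graph.objects[obj].name)
```

Storing relationships as index triples keeps two instances of the same class (for example, two men) distinct, which matters because a scene graph encodes object instances rather than object classes.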

SLIDE 6

Background

  • Attribute can be
  • Relationship can be

<Ali Farhadi et al. CVPR09> <Cewu Lu et al. ECCV16>

SLIDE 7

Main approach

  • Assume that a scene graph query is given and that the image is represented by a set of candidate bounding boxes
  • Measure the agreement between the query scene graph and an unannotated test image
  • Examine the best possible grounding of the scene graph to the image
  • Perform maximum a posteriori (MAP) inference to find the most likely grounding
  • The likelihood of this MAP solution is taken as the score measuring the agreement between the scene graph and the image
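The scoring idea can be illustrated with a toy exhaustive search over groundings. This is a minimal sketch under strong assumptions: it enumerates every assignment of query objects to boxes, which is only feasible for tiny inputs and is not claimed to be the inference procedure used in the paper. The function `map_grounding` and the toy potentials are hypothetical.

```python
from itertools import product


def map_grounding(objects, relations, boxes, unary, binary):
    """Return the highest-scoring grounding and its score by brute force.

    objects:   list of query object names
    relations: list of (subject, predicate, object) tuples
    boxes:     list of candidate boxes
    unary:     function (object, box) -> probability-like score
    binary:    function (subject, predicate, object, box_s, box_o) -> score
    """
    best, best_score = None, 0.0
    for assignment in product(range(len(boxes)), repeat=len(objects)):
        grounding = dict(zip(objects, assignment))
        score = 1.0
        for obj in objects:
            score *= unary(obj, boxes[grounding[obj]])
        for subj, pred, obj in relations:
            score *= binary(subj, pred, obj,
                            boxes[grounding[subj]], boxes[grounding[obj]])
        if best is None or score > best_score:
            best, best_score = grounding, score
    return best, best_score


# Toy potentials (made-up numbers, for illustration only).
boxes = ["box0", "box1"]
unary_table = {("man", "box0"): 0.8, ("man", "box1"): 0.2,
               ("boat", "box0"): 0.3, ("boat", "box1"): 0.7}


def toy_unary(obj, box):
    return unary_table[(obj, box)]


def toy_binary(subj, pred, obj, box_s, box_o):
    # Prefer groundings that place the two objects in different boxes.
    return 0.9 if box_s != box_o else 0.1


grounding, score = map_grounding(
    ["man", "boat"], [("man", "on", "boat")], boxes, toy_unary, toy_binary)
print(grounding, score)
```

The returned score plays the role of the agreement measure: running the same query against every image and sorting by this score gives a retrieval ranking.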

SLIDE 8

Main approach

  • Scene Graph Grounding
  • G = (O, E) is a scene graph
  • B is a set of bounding boxes in the image
  • γ is a grounding of the scene graph to the image
  • Model the distribution over possible groundings as a product of unary and binary potentials

[Equation not preserved; its two factors are labeled Unary Potential and Binary Potential]
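The factorization itself did not survive in this transcript. Assuming it follows the CRF form of the Johnson et al. CVPR 2015 paper, and writing the grounding as γ with γ_o the box assigned to object o, it would read as below: the first product collects the unary potentials and the second the binary potentials.

```latex
P(\gamma \mid G, B) =
  \prod_{o \in O} P(\gamma_o \mid o)
  \prod_{(o, r, o') \in E} P(\gamma_o, \gamma_{o'} \mid o, r, o')
```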

SLIDE 9

Main approach

  • Unary Potential
  • Models how well the bounding box γ_o agrees with the known object class and attributes of the object o
  • If o = (c, A), then this term is decomposed into a class term and attribute terms [equation not preserved]

[Figure: R-CNN maps an input box to output scores (0.113, 0.4213, ..., 0.712) over Class 1, Class 2, ..., Class N and Attribute 1, ..., Attribute M]
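A minimal sketch of the unary term follows, assuming the decomposition is a product of one class probability and one probability per attribute, P(γ_o | o) = P(γ_o | c) · ∏ P(γ_o | a) over a in A. The function name and the score dictionaries are hypothetical stand-ins for the per-box outputs of the detector.

```python
def unary_potential(class_probs, attribute_probs, obj_class, obj_attributes):
    """Score one box against one query object o = (c, A).

    class_probs:     dict mapping class name -> probability for this box
    attribute_probs: dict mapping attribute name -> probability for this box
    """
    score = class_probs[obj_class]
    for attribute in obj_attributes:
        score *= attribute_probs[attribute]
    return score


# Made-up detector outputs for a single candidate box.
class_probs = {"man": 0.5, "boat": 0.25}
attribute_probs = {"tall": 0.5, "old": 0.25}

print(unary_potential(class_probs, attribute_probs, "man", ["tall"]))  # 0.25
```

An object with no attributes reduces to the class probability alone, so simpler queries are still scored consistently.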

SLIDE 10

Main approach

  • Binary Potential
  • Models how well the pair of bounding boxes γ_o, γ_o′ expresses the tuple (o, r, o′)
  • Extract features f(γ_o, γ_o′) encoding their relative position and scale
  • Train a Gaussian mixture model (GMM) to model P(f(γ_o, γ_o′) | c, r, c′) using training data, and use the GMM density function as the probability

[Figure: the GMM maps input tuples (o, r, o′) 1, 2, ..., N to output scores (0.482, 0.134, ..., 0.772)]
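The two steps above can be sketched as follows. The slide does not spell out the feature vector, so the encoding below (offsets normalized by the first box's size, plus width and height ratios) is an assumption, as are the `(x, y, w, h)` box format, the diagonal-covariance mixture, and all parameter values. In practice the mixture parameters would be fitted to training data.

```python
import math


def relative_features(box_a, box_b):
    """Relative position and scale of box_b with respect to box_a.

    Boxes are (x, y, w, h). Assumed encoding, not taken from the slide.
    """
    x, y, w, h = box_a
    x2, y2, w2, h2 = box_b
    return ((x - x2) / w, (y - y2) / h, w2 / w, h2 / h)


def gmm_density(features, weights, means, variances):
    """Density of a diagonal-covariance Gaussian mixture at `features`."""
    total = 0.0
    for weight, mean, var in zip(weights, means, variances):
        component = weight
        for value, mu, sigma2 in zip(features, mean, var):
            component *= (math.exp(-(value - mu) ** 2 / (2 * sigma2))
                          / math.sqrt(2 * math.pi * sigma2))
        total += component
    return total


feats = relative_features((10, 20, 40, 20), (30, 25, 20, 10))
print(feats)  # (-0.5, -0.25, 0.5, 0.5)

# Hypothetical two-component mixture for one (c, r, c') combination.
density = gmm_density(
    feats,
    weights=[0.6, 0.4],
    means=[(-0.5, -0.25, 0.5, 0.5), (0.5, 0.0, 1.0, 1.0)],
    variances=[(1.0, 1.0, 1.0, 1.0), (0.5, 0.5, 0.5, 0.5)],
)
print(density)
```

Normalizing by the first box's width and height makes the features independent of image resolution, so one mixture per relationship type can be shared across images.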

SLIDE 11

Experiment

  • Perform image retrieval experiments using two types of scene graphs as queries
  • Full ground-truth scene graph
  • Simple scene graph
  • Evaluate the groundings found by the proposed models
  • Check object localization performance

[1] Full scene graph [2] Simple scene graph
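Object localization is commonly checked by comparing each predicted box with its ground-truth box using intersection over union (IoU). The slide does not name the measure, so treating IoU as the overlap criterion here is an assumption, as is the `(x, y, w, h)` box format with `(x, y)` as the top-left corner.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 2, 2)))  # overlap of 1 over a union of 7
```

A grounding can then be summarized by the IoU of each grounded object, for example by counting how many objects exceed a chosen threshold.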

SLIDE 12

Experiment

  • Full scene graph queries
SLIDE 13

Experiment

  • Simple scene graph queries
SLIDE 14

Experiment

Success Case

[1] [2] [3]

SLIDE 15

Experiment

Failure case

SLIDE 16

Conclusion

  • Use the scene graph as a novel representation for detailed semantics in visual scenes
  • Introduce a dataset of scene graphs grounded to real-world images
  • Construct a CRF model for semantic image retrieval using scene graphs as queries

SLIDE 17

Reference

  • Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations - Ranjay Krishna et al. (IJCV16)
  • Describing objects by their attributes - Ali Farhadi et al. (CVPR09)
  • Visual Relationship Detection with Language Priors - Cewu Lu et al. (ECCV16)

SLIDE 18
SLIDE 19

Quiz

  • 1. A scene graph consists of objects, attributes, and ( ).
  • A. relationship
  • B. tag
  • C. visual feature
  • D. relative position
  • 2. To measure the score, the method examines the best possible ( ) of the scene graph to the image.
  • A. reconstruction
  • B. grounding
  • C. resizing
  • D. transformation