Image Retrieval using Scene Graphs




  1. Image Retrieval using Scene Graphs Justin Johnson, Ranjay Krishna, Michael Stark, Li-Jia Li, David A. Shamma, Michael S. Bernstein, Li Fei-Fei CVPR 2015 Presented by Youngki Kwon

  2. Contents ● Introduction ● Background ● Main approach ● Experiment ● Conclusion

  3. Introduction ● There is a need to retrieve semantically similar images by describing the detailed semantics of a scene ● A scene graph can represent a scene ● How about using a scene graph as the query? [Figure: Ideal Result]

  4. Introduction ● Develop a novel framework for semantic image retrieval based on the notion of a scene graph ● Use scene graphs as queries ● Introduce a novel dataset of 5K human-generated scene graphs grounded to images [Figure: Query (Object & Attribute, Relationship) → Measure Score → Output]

  5. Background ● A scene graph is a data structure that describes the contents of a scene ● It encodes object instances, attributes of objects, and relationships between objects <Ranjay Krishna et al. IJCV16>
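To make the G = (O, E) structure concrete, here is a minimal Python sketch. The class and method names (SceneObject, SceneGraph, add_object, add_relationship, triples) are hypothetical and not taken from the paper; each object is stored as o = (c, A), a class plus a set of attributes, and each relationship as an (o, r, o') triple of object indices.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SceneObject:
    """An object o = (c, A): a class name c and a set of attribute names A."""
    cls: str
    attributes: frozenset = frozenset()


@dataclass
class SceneGraph:
    """G = (O, E): a list of objects O and relationship tuples E."""
    objects: list = field(default_factory=list)
    # Each edge is (subject_index, predicate, object_index), i.e. (o, r, o').
    edges: list = field(default_factory=list)

    def add_object(self, cls, *attributes):
        """Append an object and return its index for use in relationships."""
        self.objects.append(SceneObject(cls, frozenset(attributes)))
        return len(self.objects) - 1

    def add_relationship(self, subj, predicate, obj):
        self.edges.append((subj, predicate, obj))

    def triples(self):
        """Yield readable (subject class, predicate, object class) triples."""
        for s, r, o in self.edges:
            yield (self.objects[s].cls, r, self.objects[o].cls)


# Hypothetical query: "a man wearing a red shirt"
g = SceneGraph()
man = g.add_object("man")
shirt = g.add_object("shirt", "red")
g.add_relationship(man, "wearing", shirt)
print(list(g.triples()))  # [('man', 'wearing', 'shirt')]
```

Storing edges by object index rather than by class name keeps two objects of the same class (for example, two "man" nodes) distinct.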

  6. Background ● Attribute can be <Ali Farhadi et al. CVPR09> ● Relationship can be <Cewu Lu et al. ECCV16>

  7. Main approach ● Assume a query scene graph is given and the image is represented by a set of candidate bounding boxes ● Measure the agreement between the query scene graph and an unannotated test image ● Examine the best possible grounding of the scene graph to the image ● Perform maximum a posteriori (MAP) inference to find the most likely grounding ● The likelihood of this MAP solution is taken as the score measuring the agreement between the scene graph and the image
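The retrieval procedure on this slide reduces to two steps: score every image by the likelihood of its MAP grounding, then sort images by that score. The sketch below assumes a caller-supplied map_score function (hypothetical) and uses a toy stand-in scorer only to illustrate the ranking step; the toy scorer is not the paper's model.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def rank_images(
    query,
    images: Dict[str, Sequence],
    map_score: Callable[[object, Sequence], float],
) -> List[Tuple[str, float]]:
    """Rank images by how well their best (MAP) grounding agrees with `query`.

    `images` maps an image id to its candidate boxes B; `map_score(query, boxes)`
    is assumed to return the log-likelihood of the MAP grounding.
    """
    scored = [(image_id, map_score(query, boxes)) for image_id, boxes in images.items()]
    # Highest score first; sorted() is stable, so ties keep the input order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical stand-in for MAP inference: "boxes" are just detected class
# labels, and the score is the log of a smoothed fraction of query classes found.
def toy_map_score(query_classes, boxes):
    found = sum(1 for c in query_classes if c in boxes)
    return math.log((found + 1) / (len(query_classes) + 1))


images = {
    "beach": ["man", "surfboard"],
    "street": ["man", "car", "shirt"],
    "kitchen": ["cup"],
}
ranking = rank_images(["man", "shirt"], images, toy_map_score)
print([image_id for image_id, _ in ranking])  # ['street', 'beach', 'kitchen']
```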

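  8. Main approach ● Scene Graph Grounding ● G = (O, E) is a scene graph ● B is a set of bounding boxes in the image ● $\gamma$ is a grounding of the scene graph to the image ● Model the distribution over possible groundings as $P(\gamma \mid G, B) = \prod_{o \in O} P(\gamma_o \mid o) \prod_{(o, r, o') \in E} P(\gamma_o, \gamma_{o'} \mid o, r, o')$, where the first product is the unary potential and the second is the binary potential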
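The grounding distribution factorizes into unary terms P(γ_o | o), one per object, and binary terms P(γ_o, γ_o' | o, r, o'), one per relationship tuple. As a hedged sketch, the code below evaluates that product in log space and finds the MAP grounding by exhaustive enumeration over all |B|^|O| assignments. Enumeration is a swapped-in technique that only scales to tiny graphs and is not the inference method used in the paper; the unary and binary callbacks and the toy probabilities are hypothetical.

```python
import itertools
import math


def log_grounding_score(grounding, objects, edges, unary, binary):
    """Log of the product of unary and binary potentials (treated as unnormalized).

    grounding[i] is the box assigned to object i; edges are (i, r, j) triples.
    """
    total = sum(math.log(unary(objects[i], box)) for i, box in enumerate(grounding))
    for i, r, j in edges:
        total += math.log(binary(objects[i], r, objects[j], grounding[i], grounding[j]))
    return total


def map_grounding(objects, edges, boxes, unary, binary):
    """Exhaustive MAP inference: try every assignment of boxes to objects."""
    best, best_score = None, -math.inf
    for grounding in itertools.product(boxes, repeat=len(objects)):
        score = log_grounding_score(grounding, objects, edges, unary, binary)
        if score > best_score:
            best, best_score = grounding, score
    return best, best_score


# Toy example with hypothetical potentials.
objects = ["man", "shirt"]
edges = [(0, "wearing", 1)]
boxes = ["b1", "b2"]
UNARY = {("man", "b1"): 0.9, ("man", "b2"): 0.2, ("shirt", "b1"): 0.6, ("shirt", "b2"): 0.7}


def unary(obj, box):
    return UNARY[(obj, box)]


def binary(obj, rel, obj2, box, box2):
    # Penalize grounding both ends of a relationship to the same box.
    return 0.1 if box == box2 else 0.8


grounding, score = map_grounding(objects, edges, boxes, unary, binary)
print(grounding)  # ('b1', 'b2')
```

The MAP score returned here is the quantity that would be used to rank images for retrieval.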

  9. Main approach ● Unary Potential ● Model how well the bounding box $\gamma_o$ agrees with the known object class and attributes of the object o ● If o = (c, A), then we decompose this term as $P(\gamma_o \mid o) = P(\gamma_o \mid c) \prod_{a \in A} P(\gamma_o \mid a)$ [Figure: R-CNN takes an input box and outputs a probability for each of Class 1 … Class N and Attribute 1 … Attribute M]
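A sketch of the unary decomposition in log space, assuming each candidate box already has a calibrated probability for every class and attribute (for example from R-CNN-style detectors). The dictionary layout and the function name unary_log_potential are assumptions for illustration.

```python
import math
from typing import Dict, Iterable


def unary_log_potential(box_scores: Dict[str, float], cls: str,
                        attributes: Iterable[str]) -> float:
    """log P(gamma_o | o) = log P(gamma_o | c) + sum over a in A of log P(gamma_o | a).

    box_scores maps a class or attribute name to a detector probability for
    this box, assumed to be already calibrated to [0, 1].
    """
    eps = 1e-12  # guard against log(0) when a detector outputs exactly 0
    total = math.log(max(box_scores[cls], eps))
    for a in attributes:
        total += math.log(max(box_scores[a], eps))
    return total


# Hypothetical detector outputs for one candidate box.
box_scores = {"shirt": 0.8, "red": 0.5, "striped": 0.25}
score = unary_log_potential(box_scores, "shirt", ["red"])
print(round(math.exp(score), 6))  # 0.4
```

Working in log space turns the product over attributes into a sum, which avoids numerical underflow when an object has many attributes.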

  10. Main approach ● Binary Potential ● Model how well the pair of bounding boxes $\gamma_o, \gamma_{o'}$ expresses the tuple $(o, r, o')$ ● Extract features $f(\gamma_o, \gamma_{o'})$ encoding their relative position and scale ● Train a Gaussian mixture model (GMM) on training data to model $P(f(\gamma_o, \gamma_{o'}) \mid c, r, c')$, and use the GMM density function as the probability [Figure: a GMM takes a box pair as input and outputs a score for each tuple (o, r, o')_1 … (o, r, o')_N]
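A hedged sketch of the binary potential: compute relative position and scale features for a box pair, then evaluate a diagonal-covariance GMM log density at those features. The feature set ((x − x')/w, (y − y')/h, w'/w, h'/h) for boxes given as (x, y, w, h) and the GMM parameters in the example are assumptions; in practice the parameters would be fitted on annotated training pairs, for example with EM.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h): top-left corner, width, height


def relative_features(b: Box, b2: Box) -> List[float]:
    """Relative position and scale of box b2 with respect to box b (assumed feature set)."""
    x, y, w, h = b
    x2, y2, w2, h2 = b2
    return [(x - x2) / w, (y - y2) / h, w2 / w, h2 / h]


def gmm_log_density(v: Sequence[float], weights, means, variances) -> float:
    """Log density of a diagonal-covariance Gaussian mixture at v."""
    comps = []
    for pi, mu, var in zip(weights, means, variances):
        ll = math.log(pi)
        for vd, md, sd in zip(v, mu, var):
            ll -= 0.5 * (math.log(2 * math.pi * sd) + (vd - md) ** 2 / sd)
        comps.append(ll)
    # Log-sum-exp over components for numerical stability.
    m = max(comps)
    return m + math.log(sum(math.exp(c - m) for c in comps))


man_box = (100.0, 50.0, 80.0, 200.0)
shirt_box = (110.0, 90.0, 60.0, 70.0)
v = relative_features(man_box, shirt_box)
# Hypothetical 2-component GMM for ("man", "wearing", "shirt").
weights = [0.7, 0.3]
means = [[-0.1, -0.2, 0.75, 0.35], [0.0, 0.0, 0.5, 0.5]]
variances = [[0.05, 0.05, 0.1, 0.1], [0.2, 0.2, 0.2, 0.2]]
print(gmm_log_density(v, weights, means, variances))
```

Because a density can exceed 1, this value behaves as an unnormalized potential rather than a probability. If scikit-learn is available, sklearn.mixture.GaussianMixture(covariance_type="diag") can fit the parameters, and its score_samples method returns the same per-sample log density.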

  11. Experiment ● Perform image retrieval experiments using two types of scene graphs as queries ● Full ground-truth scene graphs ● Simple scene graphs ● Evaluate the groundings found by the proposed model ● Check object localization performance [Figure: [1] Full scene graph, [2] Simple scene graph]
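The slide does not spell out the evaluation metrics. A common way to check object localization is intersection over union (IoU) with a 0.5 threshold, and retrieval quality is often summarized as recall@k over the rank of the target image. Both the metric choices and the threshold in this sketch are assumptions, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def localized(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """Count a grounded box as correct if it overlaps the ground truth enough."""
    return iou(pred, gt) >= threshold


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose target image appears at 1-based rank <= k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7
print(recall_at_k([1, 3, 7, 2], 3))  # 0.75
```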

  12. Experiment ● Full scene graph queries

  13. Experiment ● Simple scene graph queries

  14. Experiment ● Success cases [Figure panels 1–3]

  15. Experiment ● Failure case

  16. Conclusion ● Use scene graphs as a novel representation of detailed semantics in visual scenes ● Introduce a dataset of scene graphs grounded to real-world images ● Construct a CRF model for semantic image retrieval using scene graphs as queries

  17. Reference ● Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations - Ranjay Krishna et al. (IJCV16) ● Describing objects by their attributes - Ali Farhadi et al. (CVPR09) ● Visual Relationship Detection with Language Priors - Cewu Lu et al. (ECCV16)

  18. Quiz ● 1. A scene graph consists of objects, attributes, and ( ). ● A. relationships ● B. tags ● C. visual features ● D. relative positions ● 2. To measure the score, we examine the best possible ( ) of the scene graph to the image. ● A. reconstruction ● B. grounding ● C. resizing ● D. transformation
