Structured Query-Based Image Retrieval using Scene Graphs
Brigit Schroeder, UCSC
Subarna Tripathi, Intel Labs
Complexity of Object Interactions for Retrieval
- Structured queries capture the complexity of object interactions, unlike single-object queries.
- A visual relationship is a directed subgraph with the subject and object as nodes connected by a predicate.
- We propose to retrieve images from such queries (NOT from RGB image features) utilizing a learned scene embedding space; a minimal sketch of such a query follows.
(Figure: "woman rides motorcycle" vs. the unordered pair "woman, motorcycle")
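Below is a minimal sketch of a visual-relationship query as a directed subgraph: the subject and object are nodes, and the predicate is the directed edge between them. The class and field names are our illustration, not the paper's code.

    # Illustrative only: a triplet query as a directed subject->object edge.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class TripletQuery:
        subject: str    # e.g. "woman"
        predicate: str  # e.g. "rides"
        obj: str        # e.g. "motorcycle"

    query = TripletQuery("woman", "rides", "motorcycle")
    # Direction matters: <woman, rides, motorcycle> is a different query
    # than <motorcycle, rides, woman>.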
Related Work
- Image Retrieval Using Scene Graphs (Johnson et al., CVPR 2015)
⇒ Uses a CRF model to match the best possible bounding-box groundings from the scene graph to the image for retrieval.
- Cross-Modal Scene Graph Matching for Relationship-Aware Image-Text Retrieval (Wang et al., WACV 2020)
⇒ Uses cross-modal scene graphs for image-text retrieval, relying on both word embeddings and image features.
Subgraph Query for Retrieval
- Directed subgraphs are extracted from scene graphs (objects as nodes, predicates as edges); see the extraction sketch after this list.
- Each subgraph contains a subject and an object as nodes connected by an edge representing a predicate relationship.
- Visual relationships, represented as such subgraphs, are posed as structured queries.
- A similarity metric in the scene embedding space is used for retrieval.
- The scene embedding is learned via a pretext task (described on the next slide).
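A minimal sketch of subgraph extraction, assuming the scene graph is stored as a node list plus directed (subject, predicate, object) edges; the function name and storage format are our assumptions, not the paper's code:

    # Illustrative: pull <subject, predicate, object> subgraphs out of a
    # scene graph stored as labelled nodes and directed edges.
    def extract_subgraphs(nodes, edges):
        """nodes: object labels; edges: (subj_idx, predicate, obj_idx) triples."""
        return [(nodes[s], p, nodes[o]) for (s, p, o) in edges]

    nodes = ["giraffe", "giraffe", "tree"]
    edges = [(0, "left of", 1), (1, "by", 2)]
    print(extract_subgraphs(nodes, edges))
    # [('giraffe', 'left of', 'giraffe'), ('giraffe', 'by', 'tree')]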
Scene Graph Embeddings from Layout Prediction
- The scene graph embedding is learned via a pretext task: layout prediction.
- Layout prediction utilizes object localization for individual objects AND a triplet-superbox regression network and a triplet-mask prediction network (described on the next slide).
- A visual relationship, as a directed subgraph, is posed as a structured query, e.g. <giraffe, left of, giraffe>.
- Euclidean distance in this scene embedding space is used for retrieval, as sketched below.
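A minimal retrieval sketch using Euclidean distance in the embedding space, assuming the database images' scene subgraphs have already been embedded into vectors; the helper names and shapes are illustrative, not the authors' code:

    # Illustrative: nearest-neighbour retrieval by Euclidean distance.
    import numpy as np

    def retrieve(query_emb, db_embs, k=5):
        """Return indices of the k database embeddings closest to the query."""
        dists = np.linalg.norm(db_embs - query_emb, axis=1)
        return np.argsort(dists)[:k]

    db_embs = np.random.randn(1000, 128)  # stand-in database embeddings (N x D)
    query_emb = np.random.randn(128)      # stand-in embedded structured query
    top_k = retrieve(query_emb, db_embs, k=5)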
Learning Scene Graph Embedding
Triplet Mask Network
Triplet mask prediction: triplets containing a <subject, predicate, object> found in a scene graph are used to predict corresponding triplet masks, labelling pixels as either subject or object. The mask prediction is used as a supervisory signal during training; a sketch follows.
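A minimal sketch of how such mask supervision could be implemented as a per-pixel cross-entropy loss over {background, subject, object}; this is our reading of the slide, not the authors' training code:

    # Illustrative: per-pixel cross-entropy on predicted triplet masks.
    import torch
    import torch.nn.functional as F

    def triplet_mask_loss(pred_logits, gt_mask):
        """pred_logits: (B, 3, H, W) scores for {background, subject, object};
        gt_mask: (B, H, W) integer labels in {0, 1, 2}."""
        return F.cross_entropy(pred_logits, gt_mask)

    pred_logits = torch.randn(2, 3, 64, 64)     # stand-in network output
    gt_mask = torch.randint(0, 3, (2, 64, 64))  # stand-in ground-truth masks
    loss = triplet_mask_loss(pred_logits, gt_mask)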
Qualitative Retrieval Results
Image Retrieval Results. Retrieval for structured queries with object types of varying frequency in the COCO-Stuff dataset: (a) head classes (person, tree), (b) medium-frequency long-tail classes (zebra, truck), and (c) low-frequency long-tail classes (skateboard, skis). The query is in the left-most column, corresponding to the red boxes.
Quantitative Results
Image Retrieval Performance. Recall@k (sketched below) for all classes (left) and long-tail vs. head classes (right) in the COCO-Stuff dataset.
Retrieval Performance (with NO input RGB image features)
Adding a visual relationship-inspired (triplet) loss boosts our recall by 10% in the best case.
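For reference, a minimal sketch of the Recall@k metric: the fraction of queries whose ground-truth image appears among the top-k retrieved results (an illustrative helper, not the evaluation code used in the paper):

    # Illustrative: Recall@k over a set of queries.
    def recall_at_k(ranked_results, ground_truth, k):
        """ranked_results: one ranked list of image ids per query;
        ground_truth: the correct image id for each query."""
        hits = sum(gt in results[:k]
                   for results, gt in zip(ranked_results, ground_truth))
        return hits / len(ground_truth)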
Conclusions
- We have trained scene graph embeddings for layout prediction with
triplet-based loss functions.
- For the downstream application of image retrieval, we use structured
queries formed using the learned embeddings instead of input image content.
- Our approach achieves high recall even on long-tailed object classes in
the COCO-Stuff dataset.
Thank You!
Brigit Schroeder UC Santa Cruz brschroe@ucsc.edu
http://www.cs.uml.edu/~bschroed/
Subarna Tripathi Intel Labs subarna.tripathi@intel.com
https://subarnatripathi.github.io/
Please check out our paper online: https://arxiv.org/pdf/2005.06653.pdf