ReferItGame: Referring to Objects in Photographs of Natural Scenes


  1. ReferItGame: Referring to Objects in Photographs of Natural Scenes

  2. Motivation
     ● First large-scale referring expression dataset
     ● Referring expressions are the natural way people talk
       – Of psychological interest in the ‘70s: Grice, Rosch, Winograd
     ● Applications to human-computer interaction and robots
     ● Introduces
       – A large-scale dataset of referring expressions
       – A benchmark model for generating referring expressions

  3. Motivation
     ● Natural referring expressions are free-form
       – ‘smiling boy’; only subject
       – ‘man on left’; subject and preposition
     ● Other work requires the expression as (subj, prep, obj)
       – ‘cat on the chair’

  4. Dataset
     ● Builds on the SAIAPR TC-12 dataset with 238 object categories
     ● Visual features include segmentations with
       – absolute properties: area, boundary, width, height…
       – relative properties: adjacent, disjoint, beside, X-aligned, above…
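
As a rough illustration of the absolute properties listed on this slide, the sketch below derives area, width, and height from a binary segmentation mask. The mask representation (a list of rows of 0/1 values) and the function name `absolute_properties` are assumptions made for this example; the slides do not say how the dataset computes or stores these features.

```python
from typing import Dict, List

# Hypothetical representation: 1 = pixel belongs to the object, 0 = background.
Mask = List[List[int]]


def absolute_properties(mask: Mask) -> Dict[str, int]:
    """Return area, bounding-box width, and bounding-box height of the object."""
    # Row indices that contain at least one object pixel.
    rows = [r for r, row in enumerate(mask) if any(row)]
    # Column index of every object pixel.
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return {"area": 0, "width": 0, "height": 0}
    return {
        "area": len(cols),  # one entry per object pixel
        "width": max(cols) - min(cols) + 1,
        "height": rows[-1] - rows[0] + 1,
    }


example_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
props = absolute_properties(example_mask)
```

Relative properties such as "above" or "beside" would compare two such masks; their exact definitions are not given in the slides, so they are left out here.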

  5. Dataset
     ● Player 1 writes an expression referencing the segmented object
     ● Player 2 clicks on where that object should be
       – This verifies the expression is reasonable
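
The verification step described on this slide could be sketched as a point-in-mask check: the expression counts as verified when Player 2's click lands on the segmented object. The function name `click_hits_object`, the (x, y) pixel convention, and the 0/1 mask layout are all assumptions for illustration, not details taken from the game's implementation.

```python
from typing import List


def click_hits_object(mask: List[List[int]], x: int, y: int) -> bool:
    """Return True if the click at column x, row y lands on an object pixel."""
    # Clicks outside the image can never hit the object.
    in_bounds = 0 <= y < len(mask) and 0 <= x < len(mask[y])
    return in_bounds and mask[y][x] == 1


# Hypothetical 2x2 image whose top-right pixel belongs to the object.
mask = [
    [0, 1],
    [0, 0],
]
verified = click_hits_object(mask, x=1, y=0)
```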

  6. Dataset
     ● Collected through Turkers and volunteers
       – ~130,000 expressions
       – ~100,000 distinct objects
       – ~20,000 photographs
     ● www.referitgame.com is down, unfortunately

  7. Dataset
     ● Parse expressions into a 7-tuple set of attributes, R
       – entry-level category; ‘bird’
       – color; ‘blue’
       – size; ‘tiny’
       – absolute location; ‘top of the image’
       – relative location relation; ‘to the left of’ in ‘the car to the left of the tree’
       – relative location object; ‘tree’ in ‘the car to the left of the tree’
       – generic; ‘wooden’, ‘round’
     ● Example: ‘The big old white cabin beside the tree’
       – R = {cabin, white, big, Ø, beside, tree, old}
     ● Parsed with the Stanford CoreNLP parser and an attribute template
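
One way to picture the 7-tuple R is as a record with one optional slot per attribute, in the order the slide lists them. The class name `ReferringAttributes`, its field names, and the helper `is_category_only` are hypothetical; only the seven slots and the cabin example come from the slide.

```python
from typing import NamedTuple, Optional


class ReferringAttributes(NamedTuple):
    """Seven attribute slots; None plays the role of the empty slot (Ø)."""
    category: Optional[str] = None
    color: Optional[str] = None
    size: Optional[str] = None
    absolute_location: Optional[str] = None
    relative_relation: Optional[str] = None
    relative_object: Optional[str] = None
    generic: Optional[str] = None


def is_category_only(r: ReferringAttributes) -> bool:
    """True when the category is the only slot that is filled."""
    return r.category is not None and all(value is None for value in r[1:])


# "The big old white cabin beside the tree", as parsed on the slide.
cabin = ReferringAttributes(
    category="cabin",
    color="white",
    size="big",
    absolute_location=None,
    relative_relation="beside",
    relative_object="tree",
    generic="old",
)
bird = ReferringAttributes(category="bird")
```

A record like `bird` corresponds to the category-only descriptions that make up roughly half of the parsed data.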

  8. Dataset
     ● Psychology analysis
       – ‘woman’ often replaced with ‘person’

  9. Dataset
     ● Attribute use
       – Roughly half of parsed descriptions are just the category

  10. Model
      ● Optimize R over P and S using integer linear programming (ILP)
        – R is the 7-tuple set of attributes
        – P is the visual features of the object being referred to
        – S is the visual features of the scene
      ● Different hand-engineered distributions for different attributes
      ● Unary priors between attribute and object
      ● Pairwise priors between pairs of attributes
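
To give a feel for how unary and pairwise terms can interact when choosing which attributes to mention, here is a toy sketch. It is not the model from the slides: it swaps the ILP for a brute-force search over attribute subsets, and every score, attribute name, and the function `best_attribute_set` is invented for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple


def best_attribute_set(
    unary: Dict[str, float],
    pairwise: Dict[FrozenSet[str], float],
) -> Tuple[Set[str], float]:
    """Pick the attribute subset with the highest total score.

    Total score = sum of unary scores of the chosen attributes
                + sum of pairwise scores for every chosen pair.
    Exhaustive search stands in for the ILP; it is only practical
    because the number of attribute slots is small.
    """
    names = sorted(unary)
    best: Tuple[str, ...] = ()
    best_score = 0.0  # score of mentioning no optional attribute
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            score = sum(unary[a] for a in subset)
            score += sum(
                pairwise.get(frozenset(pair), 0.0)
                for pair in combinations(subset, 2)
            )
            if score > best_score:
                best, best_score = subset, score
    return set(best), best_score


# Made-up scores: color and absolute location are individually useful,
# but mentioning both together is slightly penalized.
unary = {"color": 0.6, "size": -0.2, "absolute_location": 0.5}
pairwise = {
    frozenset({"color", "absolute_location"}): -0.3,
    frozenset({"color", "size"}): 0.1,
}
chosen, total = best_attribute_set(unary, pairwise)
```

In the slides' setting the unary terms would depend on P and S rather than being fixed constants, and an ILP solver would replace the enumeration.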

  11. Evaluation
      ● Three test sets of 500 images each
        – A contains interesting objects
        – B contains the most frequently occurring interesting objects
        – C contains interesting objects when multiple are present
      ● Baseline model
        – Incorporates only the priors, so no S or attributes
      ● Humans: ~72% accuracy

  12. Critique
      ● How important is the scene for the attributes?
        – S is only used for the relative location {relation, object} attributes
        – Absolute location is the most commonly used attribute
        – Over half of parsed descriptions only include the object category
      ● Why don’t the authors include more information on the visual features?
        – Which visual features are most important?
      ● Better metric than precision and recall?
        – Just ask AMT workers if the description is reasonable?
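
For context on the precision and recall question raised here, the sketch below scores a generated attribute set against a human one. It assumes attributes are compared by exact string match, which the slides do not confirm; the function name `precision_recall` and the example sets are illustrative only.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Precision and recall of predicted attributes against a human reference."""
    overlap = len(predicted & reference)
    # Define both metrics as 0.0 when their denominator would be zero.
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical model output versus a hypothetical human description.
predicted = {"cabin", "white", "big"}
reference = {"cabin", "white", "beside", "tree"}
precision, recall = precision_recall(predicted, reference)
```

Exact matching penalizes reasonable alternatives, such as ‘person’ for ‘woman’, which is part of why a human judgment of reasonableness is raised as an alternative.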

  13. Critique
      ● Why don’t the authors analyze the training referring expressions more?
        – Turk workers were paid per 10 images
        – Some human expressions are just the object

  14. Future Work
      ● Scale up the dataset and train end-to-end with the best neural networks
      ● Identify the referred object instead of generating the expression
        – Done in the upcoming MAttNet paper
      ● Make the images and expressions more challenging
