Reasoning about Fine-grained Attribute Phrases using Reference Games - PowerPoint PPT Presentation


SLIDE 1

Reasoning about Fine-grained Attribute Phrases using Reference Games

Jong-Chyi Su* Chenyun Wu* Huaizu Jiang Subhransu Maji
 University of Massachusetts, Amherst

ICCV 2017

SLIDE 2

Expert-designed Attributes


✔ Modular: an instance can be described by a set of attributes
✘ A fixed set of attributes designed by experts before collecting the dataset (49 attributes from OID-Aircraft [1])
Is it a military plane? No. Is it a propeller plane? No.

[1] Vedaldi et al., Understanding Objects in Detail with Fine-grained Attributes, CVPR, 2014.

SLIDE 3

Image Captions


A large Air France jet sitting on top of a runway.

Usually a longer sentence describing many aspects
✔ Compositional, language-based
✘ Not designed to describe differences between a pair of images

SLIDE 4

Image Captions

A large airplane on a runway.


A large Air France jet sitting on top of a runway.

Usually a longer sentence describing many aspects
✔ Compositional, language-based
✘ Not designed to describe differences between a pair of images

SLIDE 5

New Dataset - “Attribute Phrases”

Facing right vs. Facing left · In the air vs. On the ground · Closed cockpit vs. Open cockpit · White and green vs. White and blue color · Propeller spinning vs. Propeller stopped
Propeller vs. Jet engine · Red and white body vs. Two-tone gray body · Flat nose vs. Pointed nose · In flight vs. Grounded · Pilot visible vs. No pilot visible

  • Short phrases describing visual differences within a pair of images sampled from different categories
  • 9400 image pairs in total

✔ Modular like attributes ✔ Compositional and free-form like image captions ✔ More expressive and discriminative at fine-grained level


SLIDE 6

Attribute Phrases

  • How to generate?

“Red plane” “Blue plane vs. Red plane”

  • Use a reference game
  • How to evaluate?


SLIDE 7

Reference Game

  • ReferItGame [1]
  • RefCOCO [2]

[1] Kazemzadeh et al., “ReferItGame: Referring to Objects in Photographs of Natural Scenes”, EMNLP, 2014.
[2] Yu et al., “Modeling Context in Referring Expressions”, ECCV, 2016.

Two tasks: Generation and Comprehension

  • Refer to a specific object in an image
  • Usually focus on the category, spatial relationship, etc.
  • Our task focuses on attributes that enable fine-grained discrimination among instances of a category


SLIDE 8

Overview of Our Model

[Diagram: Speaker generates “Red plane”; Listener resolves it]

  • Generation task - speaker model
  • Comprehension task - listener model


  1. Train the speaker and listener models separately
  2. Use the listener model to evaluate the speaker model
  3. Re-rank phrases by the listener, then evaluate by humans
SLIDE 9

Use Listener Model for Comprehension Task

  • Task: Given an attribute phrase and two images, find which image it is referring to
  • Method: Measure the similarity between the attribute phrase and the images in a common embedding space

[Diagram: Listener resolves “Red plane” between two images]
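The comprehension step can be sketched as a similarity comparison in the shared space. Below is a toy version with hand-made 3-d vectors standing in for the learned phrase and image embeddings; the paper trains neural encoders, so every vector and name here is illustrative only.

```python
# Toy sketch of the listener's comprehension step: pick the image whose
# embedding is more similar to the phrase embedding in a common space.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def listener_choose(phrase_vec, img1_vec, img2_vec):
    """Return 0 if the phrase refers to image 1, else 1."""
    return 0 if cosine(phrase_vec, img1_vec) >= cosine(phrase_vec, img2_vec) else 1

# Hypothetical 3-d embeddings: dimension 0 loosely encodes "redness".
phrase_red = [1.0, 0.1, 0.0]   # embedding of "red plane"
img_red    = [0.9, 0.2, 0.1]
img_blue   = [0.0, 0.2, 0.9]
# listener_choose(phrase_red, img_red, img_blue) -> 0
```

The real listener learns the common space end to end; the comparison rule itself is this simple.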


SLIDE 10

Use Speaker Model for Generation Task

  • Task: Given two images, generate discriminative attribute phrases
  • Method: Use an image captioning model [1] as the speaker model

[Diagram: Speaker generates “Red plane” from the image pair]


[1] Vinyals et al., Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, TPAMI, 2016.
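A captioning-style speaker decodes a phrase one word at a time. A minimal greedy-decoding sketch follows; `next_word_probs` is a hypothetical stub standing in for the learned CNN+LSTM of [1], so the vocabulary and probabilities are invented for illustration.

```python
def greedy_decode(next_word_probs, image, max_len=5):
    """Greedily emit the most likely next word until <end>."""
    phrase = []
    for _ in range(max_len):
        # next_word_probs yields (word, probability) pairs for the prefix.
        word = max(next_word_probs(image, phrase), key=lambda kv: kv[1])[0]
        if word == "<end>":
            break
        phrase.append(word)
    return " ".join(phrase)

def next_word_probs(image, prefix):
    # Hypothetical stub: a real speaker conditions an LSTM on CNN features.
    table = {
        (): {"red": 0.6, "blue": 0.4},
        ("red",): {"plane": 0.9, "<end>": 0.1},
        ("red", "plane"): {"<end>": 0.8, "flying": 0.2},
    }
    return table[tuple(prefix)].items()

# greedy_decode(next_word_probs, None) -> "red plane"
```

Beam search would keep several prefixes instead of one; greedy decoding is the simplest case.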

SLIDE 11

Variants of the Speaker Model

[Diagram: Speaker generates “Red plane”; Listener resolves it]

  • Simple Speaker (SS): Given one image, generate one phrase
  • Discerning Speaker (DS): Given two images, generate a pair of phrases


  • Use the listener model to evaluate the quality of the generated phrases
  • DS generates better attribute phrases than SS

Example: SS generates “Red plane”; DS generates “Red vs. Blue”

Speaker   Top-k   Accuracy (%)
SS        1       81.7
SS        5       80.6
SS        10      80.0
DS        1       92.8
DS        5       91.4
DS        10      90.5

DS improves over SS by ~10%.
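One plausible reading of the Top-1/5/10 numbers, as a sketch: listener accuracy averaged over each pair's top-k generated phrases. The `toy_speaker` and `toy_listener_correct` stubs below are illustrative, not the paper's models.

```python
def top_k_accuracy(pairs, speaker, listener_correct, k=1):
    """Mean listener accuracy over the top-k phrases generated per pair."""
    total = correct = 0
    for img1, img2 in pairs:
        for phrase in speaker(img1, img2)[:k]:
            total += 1
            correct += bool(listener_correct(phrase, img1, img2))
    return correct / total

# Illustrative stubs: the speaker returns ranked phrases,
# the listener judges whether each phrase picks out the right image.
def toy_speaker(img1, img2):
    return ["red plane", "plane", "blue plane"]

def toy_listener_correct(phrase, img1, img2):
    return phrase in {"red plane", "plane"}

# top_k_accuracy([("a", "b")], toy_speaker, toy_listener_correct, k=1) -> 1.0
```

Averaging over the top-k, rather than counting any-of-k hits, is consistent with the table's accuracy decreasing as k grows.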

SLIDE 12

Ground Truth (human generated):
1) small size VS large size
2) single seat VS more seated
3) facing left VS facing right
4) private VS commercial
5) wings at the top VS wings at the bottom

DS:
1) private plane VS commercial plane
2) private VS commercial
3) small plane VS large plane
4) facing left VS facing right
5) short VS long
6) white VS red
7) high wing VS low wing
8) small VS large
9) glider VS jetliner
10) white and blue color VS white red and blue color

SS:
1) no engine
2) small
3) private plane
4) on the ground
5) propellor engine
6) on ground
7) glider
8) white color
9) small plane
10) no propeller


Discerning Speaker Generates Better Phrases

Some phrases are correct but not discriminative

SLIDE 13
  • 1. Use speaker to generate attribute phrases
  • 2. Re-rank the phrases by the scores from the listener model

More discriminative phrases move to the top

[1] Andreas et al., “Reasoning About Pragmatics with Neural Listeners and Speakers”, EMNLP, 2016

SS:
✔ passenger plane
? white
✔ jet engine
? facing right
✔ commercial plane
✘ _UNK
? on the ground
✔ large
✔ large size
✔ on runway

DS:
✔ commercial plane
? facing right
✔ turbofan engine
✔ on concrete
✔ t tail
✔ jet engine
✔ twin engine
✔ multi seater
✔ white and red
✔ white colour with red stripes

SS + Re-ranking:
✔ commercial plane
✔ large
✔ large size
✔ jet engine
✔ on runway
✔ passenger plane
? on the ground
✘ _UNK
? white
? facing right

DS + Re-ranking:
✔ commercial plane
✔ jet engine
✔ turbofan engine
✔ twin engine
✔ on concrete
✔ multi seater
✔ t tail
✔ white and red
? facing right
✔ white colour with red stripes


Speaker output, then re-ranked by the listener:
Red plane, Glider, ? Facing left, Propellor engine, … → Red plane, Propellor engine, ? Facing left, Glider, …
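One simple way to sketch the re-ranking step: sort the speaker's candidates by a listener margin, the score on the target image minus the score on the distractor. The `toy_score` table and all names below are illustrative, not the learned listener.

```python
def rerank(phrases, listener_score, target_img, other_img):
    """Put the most discriminative phrases first: high listener score
    on the target image, low score on the distractor."""
    def margin(phrase):
        return listener_score(phrase, target_img) - listener_score(phrase, other_img)
    return sorted(phrases, key=margin, reverse=True)

# Toy listener scores for two images, "red" and "blue".
scores = {
    ("red plane", "red"): 0.9, ("red plane", "blue"): 0.1,
    ("glider", "red"): 0.5, ("glider", "blue"): 0.5,
    ("propellor engine", "red"): 0.8, ("propellor engine", "blue"): 0.2,
}
toy_score = lambda p, im: scores[(p, im)]
# rerank(["glider", "propellor engine", "red plane"], toy_score, "red", "blue")
#   -> ["red plane", "propellor engine", "glider"]
```

Non-discriminative phrases like "glider" (margin 0 here) sink to the bottom, which is exactly the effect shown above.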

Pragmatic Speaker Helps

SLIDE 14

Speaker              Top-k   Original Acc. (%)   After Re-ranking Acc. (%)
Discerning Speaker   1       82.0                95.0
Discerning Speaker   5       80.2                90.0
Discerning Speaker   7       79.1                86.7

Re-ranking improves ~10% on top-5 accuracy

  • Use human listeners for evaluation
  • Given an attribute phrase, users choose which of the two images it refers to


Pragmatic Speaker Helps

SLIDE 15

Are Attribute Phrases Better than Expert-designed Attributes?

  • Use attributes as features for a fine-grained classification task
  • Use our listener model to score each image against the top-k most frequent attribute phrases
  • Compare with the 46 expert-designed attributes from the OID dataset
  • Test on the FGVC-Aircraft dataset [1] (100 classes)
  • ~20% improvement

[1] Maji et al., Fine-grained Visual Classification of Aircraft, arXiv:1306.5151, 2013.
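As a sketch of the feature construction: an image is represented by its vector of listener scores against the k most frequent attribute phrases, which any off-the-shelf classifier can then consume. Everything below (phrase list, scoring function) is illustrative.

```python
from collections import Counter

def top_phrases(all_phrases, k):
    """The k most frequent attribute phrases in the dataset."""
    return [p for p, _ in Counter(all_phrases).most_common(k)]

def phrase_features(image, phrases, listener_score):
    """Represent an image by its listener score for each phrase."""
    return [listener_score(p, image) for p in phrases]

vocab = top_phrases(["jet engine", "jet engine", "red", "glider"], k=2)
# vocab -> ["jet engine", "red"]  (ties break by first occurrence)
```

The resulting fixed-length vector plays the same role as a binary expert-attribute vector, but its dimensions come from free-form phrases.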


[Chart: classification accuracy — values shown: ~24%, OID attributes ~12%, attribute phrases ~32%]

SLIDE 16

Generate Attribute for Sets

  • Select two categories (A, B) and generate attribute phrases for randomly selected image pairs (Im1∈A, Im2∈B)
  • Sort the phrases by frequency

Example (pair of categories: 747-400 vs. ATR-42):
First column: large plane, more windows, commercial plane, more windows on body, big plane, commercial, jet engine, turbofan engine, engines under wings, on ground
Second column: private plane, less windows, medium plane, propellor engine, fewer windows on body, small plane, private, propeller engine, stabilizer on top of tail, british airways
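The set-level procedure above can be sketched as pooling the phrases generated for many sampled pairs and keeping the most frequent. The `stub` speaker below is invented for illustration; the paper uses the trained DS model.

```python
from collections import Counter

def set_attributes(image_pairs, speaker, k=3):
    """Describe category A vs. B: pool the phrases generated for
    sampled pairs (Im1 in A, Im2 in B), keep the most frequent."""
    counts = Counter()
    for im1, im2 in image_pairs:
        counts.update(speaker(im1, im2))
    return [p for p, _ in counts.most_common(k)]

# Illustrative speaker stub describing only the A-side image.
stub = lambda a, b: ["large plane", "jet engine"] if a == "747" else ["small plane"]
# set_attributes([("747", "atr"), ("747", "atr"), ("cessna", "atr")], stub, k=2)
#   -> ["large plane", "jet engine"]
```

Frequency acts as a crude consensus filter: phrases that recur across random pairs are likely to describe the categories rather than individual images.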


SLIDE 17

Use the Listener Model for Image Retrieval


  • Query: attribute phrase(s)
  • Score the query phrase against the test images with the listener model
  • Show the top 18 images ranked by these scores
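The retrieval step reduces to sorting test images by their listener score for the query phrase. A minimal sketch, with an invented `toy` score table in place of the learned listener:

```python
def retrieve(query_phrase, images, listener_score, top=18):
    """Rank all test images by the listener's score for the query
    phrase and return the best `top` matches."""
    ranked = sorted(images, key=lambda im: listener_score(query_phrase, im),
                    reverse=True)
    return ranked[:top]

# Toy scores: higher means the image matches "red plane" better.
toy = {"a": 0.2, "b": 0.9, "c": 0.5}
# retrieve("red plane", ["a", "b", "c"], lambda p, im: toy[im], top=2)
#   -> ["b", "c"]
```

Multi-phrase queries could sum or average the per-phrase scores before sorting.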

SLIDE 18

t-SNE Embeddings of Attribute Phrases from the Listener Model


[t-SNE plot: clusters include large commercial planes and military planes]

SLIDE 19

Thank you!


Dataset and Code are available at: