Knowledge Guided Attention and Inference for Describing Images - - PowerPoint PPT Presentation



SLIDE 1

KIT – The Research University in the Helmholtz Association

ADAPTIVE DATA ANALYTICS GROUP INSTITUTE OF APPLIED INFORMATICS AND FORMAL DESCRIPTION METHODS (AIFB)

www.kit.edu

Knowledge Guided Attention and Inference for Describing Images Containing Unseen Objects

Aditya Mogadala, Umanga Bista, Lexing Xie, Achim Rettinger

rettinger@kit.edu, http://www.aifb.kit.edu/web/Achim_Rettinger/en, http://www.aifb.kit.edu/web/Inproceedings3603

SLIDE 2

Adaptive Data Analytics Group Institute AIFB 2 PD Dr. Achim Rettinger Knowledge Guided Attention and Inference for Describing Images Containing Unseen Objects

Multi-Lingual Text, Knowledge Graphs, Images

SLIDE 3

Can we aggregate complementary information across modalities? Yes. Cross-modal embeddings do better on several benchmarks.

Steffen Thoma, Achim Rettinger, Fabian Both. Towards Holistic Concept Representations: Embedding Relational Knowledge, Visual Attributes, and Distributional Word Semantics. The Semantic Web – ISWC 2017, Springer, October 2017.

SLIDE 4

Fabian Both, Steffen Thoma, Achim Rettinger. Cross-modal Knowledge Transfer: Improving the Word Embedding of Apple by Looking at Oranges. K-CAP 2017, The 9th International Conference on Knowledge Capture, ACM, December 2017.


Can we extrapolate cross-modal information to entities unseen in some of the other modalities? Yes. Specifically, hyponyms profit more than hypernyms.

SLIDE 5

Can we extrapolate knowledge about translating entities across modalities without having seen them during training?

Aditya Mogadala, Umanga Bista, Lexing Xie and Achim Rettinger. Knowledge Guided Attention and Inference for Describing Images Which Contain Unseen Objects, ESWC 2018

?

SLIDE 6

IMAGE CAPTION GENERATION

SLIDE 7

Visual Object Detection

Images on the Web depict a huge variety of visual objects

Truffle, Mammoth, Blackbird, Papaya: 642 Visual Object Categories (ImageNet)

SLIDE 8

Description Generation for Images

Training data for image captioning (i.e., image-caption pairs) covers only a fraction of the objects that image classifiers can detect.

80 MSCOCO Visual Object Categories

SLIDE 9

Challenge - Missing Captions for Images

Caption Generation with Standard Model vs. Expected from Model

Parallel caption training examples are missing for images containing visual object category “pizza”.

Standard model: A man is making a sandwich in a restaurant.
Expected: A man is holding a pizza in his hands.

SLIDE 10

Related Work

Caption generation approaches that can handle unseen objects.

SLIDE 11

Missing in Related Work

Attention: Our attention mechanism learns to focus on the salient aspects in the image for caption generation.

Inference: Related approaches transfer either before or during inference. We do both.

SLIDE 12

KNOWLEDGE GUIDED ATTENTION AND INFERENCE

SLIDE 13

Our Contributions

ESA: Introduce an attention mechanism, External Semantic Attention (ESA), into the caption generation model, driven by external semantic knowledge provided by a knowledge graph (KG).

CI: Constrain the model before and during inference (CI) to transfer information between seen words and unseen visual object categories, again exploiting the external semantic knowledge provided by the KG.
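Conceptually, ESA is soft attention over the entity vectors grounded in the image. The following numpy sketch shows one standard way such entity-level attention can be computed; the projection names (W_h, W_e, v) and shapes are illustrative assumptions, not the paper's exact parametrization:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def entity_attention(h, E, W_h, W_e, v):
    """Soft attention over KG entity vectors (shapes are illustrative).

    h   : (d_h,)   decoder LSTM hidden state at the current time step
    E   : (n, d_e) entity vectors for the entities grounded in the image
    W_h : (d_a, d_h), W_e : (d_a, d_e), v : (d_a,) learned projections
    Returns per-entity attention weights and the attended entity context.
    """
    scores = np.array([v @ np.tanh(W_h @ h + W_e @ e) for e in E])
    alpha = softmax(scores)   # one weight per grounded entity
    context = alpha @ E       # (d_e,) weighted summary of entity vectors
    return alpha, context
```

The context vector is then available to the language model at each decoding step, which is what lets KG knowledge steer word prediction.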

SLIDE 14

Knowledge-Guided Assistance Caption Generation (KGA-CGM)

[Figure: KGA-CGM architecture. Visual features feed a multi word-label classifier; a partial scene graph grounding (Image->KB) produces entity vectors for a multi entity-label classifier (e.g. {pizza, restaurant, hat, chef, camera}); an LSTM language model with a TSV layer and softmax emits the caption word by word until EOS.]

SLIDE 15

External Semantic Attention

[Figure: KGA-CGM architecture as on Slide 14, shown for the External Semantic Attention step.]
SLIDE 16

TSV Layer

[Figure: KGA-CGM architecture as on Slide 14, shown for the TSV Layer step.]

SLIDE 17

Inference – Generating unseen objects

Input: M = {W_he, W_h2t, W_ct, W_It}
Output: M_new

Initialize List(closest) = cosine_distance(List(unseen), vocabulary)
Initialize W_ct[v_unseen, :], W_h2t[v_unseen, :], W_It[v_unseen, :] = 0
Function BeforeInference
    forall items T in closest and Z in unseen do
        if T and Z in vocabulary then
            W_ct[v_Z, :] = W_ct[v_T, :]
            W_h2t[v_Z, :] = W_h2t[v_T, :]
            W_It[v_Z, :] = W_It[v_T, :]
        end
        if i_T and i_Z in visual features then
            W_It[i_Z, i_T] = 0
            W_It[i_T, i_Z] = 0
        end
    end
    M_new = M
    return M_new
end

[UnseenObj17]
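A compact Python rendering of the procedure above, assuming numpy weight matrices and a word-embedding table learned from unpaired text (all names here are illustrative, and only the weight-copying branch of the algorithm is sketched):

```python
import numpy as np

def closest_seen_words(embeddings, seen_words, unseen_words):
    """Map each unseen word to its closest seen word by cosine similarity.

    embeddings : dict word -> 1-D numpy vector (e.g. from an unpaired corpus)
    """
    def unit(v):
        return v / np.linalg.norm(v)
    return {
        z: max(seen_words, key=lambda w: unit(embeddings[z]) @ unit(embeddings[w]))
        for z in unseen_words
    }

def transfer_before_inference(W_c, W_h, W_I, vocab, closest):
    """Copy each unseen word's output-layer rows from its closest seen word.

    W_c, W_h, W_I : (vocab_size, dim) output-layer weight matrices
    vocab         : dict word -> row index
    closest       : dict unseen word -> closest seen word
    """
    for z, t in closest.items():
        if z in vocab and t in vocab:
            W_c[vocab[z], :] = W_c[vocab[t], :]
            W_h[vocab[z], :] = W_h[vocab[t], :]
            W_I[vocab[z], :] = W_I[vocab[t], :]
    return W_c, W_h, W_I
```

After this transfer, the decoder can assign non-trivial probability to an unseen word (e.g. "pizza" inheriting from "sandwich") even though no image-caption pair containing it was ever seen during training.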

SLIDE 18

EVALUATION

SLIDE 19

Evaluation Setup

  • 8 held-out objects from MSCOCO: Microwave, Racket, Bottle, Zebra, Pizza, Couch, Bus, Suitcase
  • Image-Caption Pairs: 70K training, 20K validation, 20K testing
  • CNN Architecture: VGG16 [Simonyan et al. 2014]
  • Unpaired Textual Corpora: British National Corpus, Wikipedia, SBU1M
  • Entity Vectors: RDF2Vec [Ristoski et al. 2016]
  • Evaluation Metrics: METEOR, SPICE, F1
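In this evaluation line of work, F1 typically measures whether the held-out object word is mentioned in the generated caption for images whose ground truth contains it. A simplified single-reference sketch (the function name and whitespace tokenization are illustrative assumptions; the actual protocol uses multiple references per image):

```python
def unseen_object_f1(generated, references, obj):
    """Sketch of the F1 metric for one held-out object word.

    An image is a true positive when both the generated caption and the
    reference caption mention the object word; precision and recall are
    computed over the whole test set.
    """
    tp = fp = fn = 0
    for gen, ref in zip(generated, references):
        pred = obj in gen.lower().split()   # model mentions the object
        gold = obj in ref.lower().split()   # ground truth mentions it
        if pred and gold:
            tp += 1
        elif pred:
            fp += 1
        elif gold:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

METEOR and SPICE then measure overall caption quality, so the two kinds of metrics are complementary: F1 checks that the unseen object appears at all, METEOR/SPICE check that the surrounding sentence remains fluent and accurate.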

SLIDE 20

Qualitative Results

Unseen Object: Pizza
Predicted Entity-Labels (Top-3): Pizza, Restaurant, Hat
Base: A man is making a sandwich in a restaurant.
NOC: A man standing next to a table with a pizza in front of it.
KGA-CGM: A man is holding a pizza in his hands.

Unseen Object: Zebra
Predicted Entity-Labels (Top-3): Zebra, Enclosure, Zoo
Base: A couple of animals that are standing in a field.
NOC: Zebras standing together in a field with zebras.
KGA-CGM: A group of zebras standing in a line.

SLIDE 21

Quantitative Results

F1-Score comparison; KGA-CGM is our proposed model. Underline represents the second-best result.

SLIDE 22

Quantitative Results

METEOR comparison; KGA-CGM is our proposed model. Underline represents the second-best result.

SLIDE 23

Scaling it by an order of magnitude

Unseen Object: Truffle
Guidance Before Inference: food → truffle
Base: A person holding a piece of paper.
KGA-CGM: A close up of a person holding truffle.

Unseen Object: Papaya
Guidance Before Inference: banana → papaya
Base: A woman standing in a garden.
KGA-CGM: These are ripe papaya hanging on a tree.

SLIDE 24

Quantitative Analysis: Out-of-domain Object Description

[Chart: F1 for out-of-domain ImageNet objects (models trained on MSCOCO plus BNC, Wikipedia and SBU1M text), comparing NOC, LSTM-C and KGA-CGM.]

SLIDE 25

Can we extrapolate knowledge about translating entities across modalities without having seen them during training? Yes. KG embeddings help to generalize to unseen entities.

SLIDE 26

References

[Hendricks et al. 2016] Anne Hendricks, L., Venugopalan, S., Rohrbach, M., Mooney, R., Saenko, K. and Darrell, T., 2016. Deep compositional captioning: Describing novel object categories without paired training data. CVPR (pp. 1-10).

[Venugopalan et al. 2017] Venugopalan, S., Hendricks, L.A., Rohrbach, M., Mooney, R., Darrell, T. and Saenko, K., 2017. Captioning images with diverse objects. CVPR.

[Anderson et al. 2017] Anderson, P., Fernando, B., Johnson, M. and Gould, S., 2017. Guided open vocabulary image captioning with constrained beam search. EMNLP.

[Yao et al. 2017] Yao, T., Pan, Y., Li, Y. and Mei, T., 2017. Incorporating copying mechanism in image captioning for learning novel objects. CVPR.

[Simonyan et al. 2014] Simonyan, K. and Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

[Ordonez et al. 2011] Ordonez, V., Kulkarni, G. and Berg, T.L., 2011. Im2text: Describing images using 1 million captioned photographs. NIPS (pp. 1143-1151).

[Ristoski et al. 2016] Ristoski, P. and Paulheim, H., 2016. RDF2Vec: RDF graph embeddings for data mining. ISWC (pp. 498-514). Springer.

SLIDE 27

Crossmodal Representation Learning and Transfer
Image Caption Generation
Knowledge Guided Attention and Inference
Evaluation

A man is holding a pizza in his hands.

[Figure: KGA-CGM architecture overview, as on Slide 14.]


rettinger@kit.edu

http://www.aifb.kit.edu/ web/Inproceedings3603