NEURO-SYMBOLIC VISUAL REASONING: DISENTANGLING “VISUAL” FROM “REASONING”
HAMID PALANGI (HPALANGI@MICROSOFT.COM), SAEED AMIZADEH (SAAMIZAD@MICROSOFT.COM), ALEX POLOZOV (POLOZOV@MICROSOFT.COM), YICHEN HUANG (YICHUANG@MIT.EDU), KAZUHITO KOISHIDA (KAZUKOI@MICROSOFT.COM)
8/14/2020 NEURO-SYMBOLIC VISUAL REASONING 1

VISUAL QUESTION ANSWERING [GQA: Hudson & Manning, 2019]
Q: “What color is the food on the red object left of the small girl that is holding a hamburger?”
A: “Yellow.”
The VQA model receives a language signal (the question) and a visual signal (the image), and combines visual perception with reasoning to produce the answer.

REASONING = LOGICAL REASONING + EXTRA CAPABILITIES
Pure logical reasoning often does not suffice for visual reasoning because visual perception is noisy and uncertain.
Example: imperfect visual perception can leave an object's classification ambiguous. Yet a phrase such as “in the living room”, or the visual context, should resolve the ambiguity.

RESEARCH QUESTIONS
1. Given a visual featurization of a visual scene, how informative is the featurization on its own for answering a question about the scene without learned reasoning?
2. How solvable is VQA/GQA given perfect vision?
3. For an arbitrary VQA model, how much can its reasoning abilities compensate for the imperfections in perception to solve the task?

OUR CONTRIBUTIONS
(I) Differentiable First-Order Logic ($\nabla$-FOL) for visual description and reasoning.
(II) Evaluation of reasoning vs. perception for VQA models using $\nabla$-FOL: the Base Model $\mathcal{M}_{\emptyset}$ splits Test-Dev into an Easy Set and a Hard Set.

FIRST ORDER LOGIC FOR SCENE DESCRIPTION
A scene can be described by a scene graph (here the objects Cat, Mug, Phone and Pen, connected by Left relations) or by an FOL representation of a statement such as “There is a cat to the left of all objects.”
- Variables enumerate over detected objects.
- Atomic predicates represent object names, attributes and binary relations.
- Formulas represent a statement or a question about the scene.
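As a rough illustration of the three ingredients (variables, atomic predicates, formulas), the sketch below evaluates the example statement on a crisp toy scene graph. It assumes the statement reads as "there exists X such that Cat(X) and, for every other object Y, Left(X, Y)"; the object list, the relation set and the function name `cat_left_of_all` are hypothetical and not taken from the slides.

```python
# Toy scene graph; object names and relations are illustrative assumptions.
objects = ["cat", "mug", "phone", "pen"]
left_of = {("cat", "mug"), ("cat", "phone"), ("cat", "pen")}


def cat_left_of_all(objs, left_pairs):
    """Assumed reading: exists X with Cat(X) and Left(X, Y) for every other Y."""
    return any(
        x == "cat" and all((x, y) in left_pairs for y in objs if y != x)
        for x in objs
    )


statement_holds = cat_left_of_all(objects, left_of)
statement_fails = cat_left_of_all(objects, left_of - {("cat", "pen")})
```

Here the variables X and Y range over the detected objects, the membership tests play the role of atomic predicates, and the nested `any`/`all` is the formula.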

FOL FOR POSING A HYPOTHETICAL QUESTION
The statement “There is a cat to the left of all objects.” becomes the question “Is there a cat to the left of all objects?”, posed over the same scene graph and FOL representation.
This question $Q$ can be answered probabilistically by evaluating the likelihood that $Q$ holds, which is exponentially hard to calculate directly.
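To make the cost of direct evaluation concrete, here is a minimal sketch that computes the likelihood of a much simpler question, "Is there a cat?", by summing over every possible truth assignment of Cat(x_i). It assumes the per-object probabilities are independent, which is an assumption of this example rather than something stated on the slide; the function name is hypothetical. The loop visits 2**N worlds for N objects.

```python
from itertools import product


def exists_likelihood_bruteforce(probs):
    """Pr(exists X: Cat(X)) by enumerating all 2**N truth assignments.

    probs[i] is the assumed independent probability that object i is a cat.
    """
    total = 0.0
    for world in product([False, True], repeat=len(probs)):
        if any(world):  # the formula holds in this world
            weight = 1.0
            for p, is_true in zip(probs, world):
                weight *= p if is_true else 1.0 - p
            total += weight
    return total


likelihood = exists_likelihood_bruteforce([0.9, 0.2, 0.1])
```

Under the same independence assumption the result matches the closed form 1 - (1 - 0.9)(1 - 0.2)(1 - 0.1), but the enumeration itself doubles in cost with every additional object.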

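$\nabla$-FOL: INFERENCE IN POLYNOMIAL TIME
In order to do inference in polynomial time, we introduce the intermediate notion of attention on the object $x_i$ w.r.t. formula $F$, written $\alpha(F \mid x_i)$ and evaluated on $F_{X = x_i}$, i.e. the formula with its variable $X$ bound to $x_i$.
Then the answer likelihood can be reduced to computing attention via the aggregation operators $A_\forall$ and $A_\exists$:
$$A_\forall = \prod_{i=1}^{N} \alpha(F \mid x_i), \qquad A_\exists = 1 - \prod_{i=1}^{N} \bigl(1 - \alpha(F \mid x_i)\bigr)$$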
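A minimal sketch of the two aggregation operators follows. It assumes the usual independence-based reading, in which "for all" is the product of the per-object attentions and "exists" is one minus the product of their complements; the function names are hypothetical. Each call is linear in the number of objects.

```python
import math


def aggregate_forall(attention):
    """Assumed 'for all' aggregation: every object must satisfy the formula."""
    return math.prod(attention)


def aggregate_exists(attention):
    """Assumed 'exists' aggregation: at least one object satisfies the formula."""
    return 1.0 - math.prod(1.0 - a for a in attention)


attention = [0.9, 0.2, 0.1]  # illustrative per-object attention values
p_all = aggregate_forall(attention)
p_any = aggregate_exists(attention)
```

Note that `math.prod` requires Python 3.8 or later.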

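$\nabla$-FOL: RECURSIVE CALCULATION OF ATTENTION
Every FOL formula $F$ is built from a smaller FOL formula $G$ in one of three ways:
- Negation operator ($F$ = NOT $G$): $\alpha(F \mid x_i) = 1 - \alpha(G \mid x_i) \triangleq \mathrm{Neg}[\alpha(G \mid x_i)]$
- Filter operator ($F$ = unary predicate $\pi$ AND $G$): $\alpha(F \mid x_i) = \alpha(\pi \mid x_i) \cdot \alpha(G \mid x_i) \triangleq \mathrm{Filter}_{\pi}[\alpha(G \mid x_i)]$
- Relate operator ($F$ = binary predicates $\Pi_{XY}$ AND $G$): $\alpha(F \mid x_i) = A_q\Bigl(\bigodot_{\pi \in \Pi_{XY}} \alpha(\pi \mid x_i, Y) \odot \alpha(G \mid Y)\Bigr) \triangleq \mathrm{Relate}_{q, \Pi_{XY}}[\alpha(G \mid Y)]$, $\forall i \in 1..N$, where $q \in \{\exists, \forall\}$ is the quantifier of $Y$.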
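The three operators can be sketched on plain Python lists as below. This is an illustrative reading, not the authors' implementation: attentions are lists of per-object values, the relation is a single N-by-N matrix (several binary predicates would first be multiplied elementwise), and the aggregation rules assume independence. All function and variable names are hypothetical.

```python
import math


def neg(att_g):
    """Negation: one minus the attention of the smaller formula."""
    return [1.0 - a for a in att_g]


def filter_(pred, att_g):
    """Filter: multiply by the unary predicate likelihood of each object."""
    return [p * a for p, a in zip(pred, att_g)]


def relate(quantifier, rel, att_g):
    """Relate: rel[i][j] is the likelihood that x_i relates to y_j.

    The row for x_i is multiplied elementwise with the attention on Y, then
    aggregated over Y with the quantifier of Y ('exists' or 'forall').
    """
    out = []
    for row in rel:
        joint = [r * a for r, a in zip(row, att_g)]
        if quantifier == "exists":
            out.append(1.0 - math.prod(1.0 - v for v in joint))
        else:
            out.append(math.prod(joint))
    return out


# Two objects: object 0 is "on" object 1, and object 1 is a table.
on = [[0.0, 1.0], [0.0, 0.0]]
is_table = [0.0, 1.0]
on_a_table = relate("exists", on, filter_(is_table, [1.0, 1.0]))
not_on_a_table = neg(on_a_table)
```

Because each operator consumes the attention of a smaller formula, a whole formula is evaluated by a chain of such calls, with the relate step costing on the order of N squared.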

THE LANGUAGE SYSTEM: FROM NATURAL LANGUAGE TO FOL FORMULA
1. Natural language: “Is there a ball on the table?”
2. Semantic parsing (task-dependent) into the DSL: Select(Table) → Relate(on, Ball) → Exists(?)
3. Compilation (task-independent) into first-order logic and its $\nabla$-FOL equivalent: an existential aggregation ($\exists$) over Ball, a relation (on, with quantifier $\exists$) and Table.
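The pipeline for “Is there a ball on the table?” can be sketched as a tiny interpreter over the three DSL steps. Everything here is an assumption made for illustration: the tuple encoding of the program, the function `run_program`, the toy likelihoods, and the choice to read Relate(on, Ball) as "objects that are balls and are on some attended object".

```python
import math


def exists(att):
    """Assumed 'exists' aggregation under independence."""
    return 1.0 - math.prod(1.0 - a for a in att)


def run_program(program, unary, binary, n):
    """Run a hypothetical Select/Relate/Exists program over n objects."""
    att = [1.0] * n
    for op, *args in program:
        if op == "select":
            (name,) = args
            att = list(unary[name])
        elif op == "relate":
            rel, name = args
            att = [
                unary[name][i]
                * exists([binary[rel][i][j] * att[j] for j in range(n)])
                for i in range(n)
            ]
        elif op == "exists":
            return exists(att)
    return att


# Toy likelihoods for three detected objects (illustrative values only).
unary = {"ball": [0.9, 0.1, 0.1], "table": [0.1, 0.8, 0.1]}
binary = {"on": [[0.0, 0.7, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]}
program = [("select", "table"), ("relate", "on", "ball"), ("exists",)]
answer_likelihood = run_program(program, unary, binary, 3)
```

With these numbers only object 0 contributes, giving 0.9 × (0.7 × 0.8) = 0.504 as the likelihood of answering "yes".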

GQA DOMAIN SPECIFIC LANGUAGE

VISUAL SYSTEM: FROM IMAGE TO PREDICATES
An off-the-shelf object detector (e.g. Faster-RCNN, Ren et al. 2015) produces a featurization for each detected object $i$. A neural visual oracle then maps each object featurization to the queried predicates (e.g. Man, Cat, Dog).

THE WHOLE SYSTEM

USING $\nabla$-FOL TO EVALUATE PERCEPTION
Q1: Given a visual featurization for a certain VR task, how informative is the featurization on its own for solving the task using mere FOL for reasoning?
For GQA: the visual featurization is the Faster-RCNN featurization [Ren et al., 2015].

BUILDING THE BASE MODEL
1) Put $\nabla$-FOL on top of a neural visual oracle.
2) Train the resulting architecture using the Faster-RCNN featurization, the golden programs and the golden answers in GQA, via indirect supervision from the answer.
3) Denote the result as the Base Model $\mathcal{M}_{\emptyset}$.

USING $\nabla$-FOL TO EVALUATE PERCEPTION
Q1 (continued): $\nabla$-FOL has no trainable parameters, so the accuracy of the Base Model $\mathcal{M}_{\emptyset}$ on test data indirectly captures the amount of information in the visual featurization.

USING $\nabla$-FOL TO MEASURE THE IMPORTANCE OF PERCEPTION
Q2: How well can a VR task be solved given perfect vision?
For GQA: what happens if we replace the visual system with the golden scene graphs?

BUILDING THE PERFECT MODEL
1) Replace the trained visual oracle in the Base Model $\mathcal{M}_{\emptyset}$ with the golden GQA scene graphs (marked with a superscript ∗).
2) Denote the result as the Perfect Model $\mathcal{M}^{*}$, which runs the golden programs on the golden scene graphs.

USING $\nabla$-FOL TO MEASURE THE IMPORTANCE OF PERCEPTION
Q2 (continued): The accuracy of the Perfect Model $\mathcal{M}^{*}$ on the GQA validation set is 96%. Achieving such a high upper bound shows that:
- $\nabla$-FOL is sound.
- The GQA task is heavily vision-dependent.

USING $\nabla$-FOL TO EVALUATE REASONING
Q3: How much can the reasoning abilities of a candidate model compensate for the imperfections in perception to solve the task?
Important: the candidate model is arbitrary! It need not be DFOL-based.
For GQA: we compare MAC Network [Hudson & Manning, 2018] vs. LXMERT [Tan & Bansal, 2019].

HARD SET VS EASY SET
The Base Model $\mathcal{M}_{\emptyset}$ splits Test-Dev into an Easy Set and a Hard Set.
The accuracy of the candidate model on the hard set ($\mathrm{Acc}_h$) captures how much its reasoning process compensates for its imperfect perception.
The error of the candidate model on the easy set ($\mathrm{Err}_e$) captures the degree to which its reasoning process distorts the informative visual signals.
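The two metrics can be sketched as follows. The split rule is an assumption of this example: a question is placed in the easy set when the base model answers it correctly and in the hard set otherwise. The function names and the boolean-list encoding of per-question correctness are hypothetical.

```python
def split_easy_hard(base_correct):
    """Assumed split: easy = base model correct, hard = base model wrong."""
    easy = [i for i, ok in enumerate(base_correct) if ok]
    hard = [i for i, ok in enumerate(base_correct) if not ok]
    return easy, hard


def acc_hard_err_easy(base_correct, candidate_correct):
    """Candidate accuracy on the hard set and candidate error on the easy set.

    Both sets are assumed to be non-empty.
    """
    easy, hard = split_easy_hard(base_correct)
    acc_h = sum(candidate_correct[i] for i in hard) / len(hard)
    err_e = sum(not candidate_correct[i] for i in easy) / len(easy)
    return acc_h, err_e


base = [True, True, False, False]       # illustrative base-model outcomes
candidate = [True, False, True, False]  # illustrative candidate outcomes
acc_h, err_e = acc_hard_err_easy(base, candidate)
```

In this toy case the candidate recovers one of the two hard questions and breaks one of the two easy ones, so both values are 0.5.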

CONCLUDING REMARKS
In this work, we
1. Proposed a differentiable visual description and reasoning formalism directly derived from first-order logic.
2. Proposed a coherent methodology for separately evaluating perception and reasoning using our differentiable first-order logic formalism.
3. Applied our framework to the GQA task and two of its well-known models, and arrived at insightful observations.
Thank you

SUPPLEMENTAL MATERIALS

MODELING OPEN QUESTIONS USING FOL
For open questions, we generate all potential options for the answer, treat each option as a binary question and choose the one with the highest likelihood.
For example, “What is the color of the ball on the left of all objects?” can be answered by answering a set of binary questions:
- “Is the ball on the left of all objects blue?” → $Q_1$
- “Is the ball on the left of all objects red?” → $Q_2$
- “Is the ball on the left of all objects green?” → $Q_3$
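The selection step amounts to an argmax over the binary-question likelihoods. In the sketch below the likelihood values are made up for illustration and the function name is hypothetical; in practice each value would come from evaluating the corresponding binary question.

```python
def answer_open_question(option_likelihoods):
    """Return the option whose binary question has the highest likelihood."""
    return max(option_likelihoods, key=option_likelihoods.get)


# Illustrative likelihoods for the three binary questions.
color_likelihoods = {"blue": 0.1, "red": 0.7, "green": 0.2}
best_color = answer_open_question(color_likelihoods)
```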

BEYOND PURE LOGICAL REASONING: TOP-DOWN CONTEXTUAL CALIBRATION
Example of a reasoning technique beyond pure DFOL.
Reminder: suppose imperfect visual perception leaves an object's classification ambiguous. However, the context “in the living room” should help resolve the ambiguity. In other words, the context can be used to calibrate the attention values in a top-down manner.
