slide-1
SLIDE 1

Learning quantities from vision and language

Raffaella Bernardi

University of Trento

March 23, 2017

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 1 / 44

slide-2
SLIDE 2

Cardinals and Quantifiers

Three of the animals are dogs. vs. Most of the animals are dogs.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 2 / 44

slide-4
SLIDE 4

Quantifiers: are they in a scale?

Expected abstract scale: <no, few, some, most, all>

  • Q. How do we learn they are in this order?
  • Q. Do we take this order into account when using them?

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 3 / 44

slide-7
SLIDE 7

Literal vs. Pragmatic meaning

What do we learn from language, what from vision, what from both?

Conjecture 1: we can learn their literal meaning (respecting the abstract scale) from images.
Conjecture 2: they can be represented by a cross-modal function.
Conjecture 3: text corpora could help in learning their use.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 4 / 44

slide-10
SLIDE 10

New Challenge for CV

From content words to function words

Most tasks considered so far involve processing objects and lexicalised relations among objects (content words). Humans (even pre-school children) can abstract over raw data to perform certain types of higher-level reasoning, expressed in natural language by function words.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 5 / 44

slide-11
SLIDE 11

Operations involved in quantifying

A logical strategy

Quantifiers require:

1. an approximate number estimation mechanism, acting over the relevant sets in the image;
2. a quantification comparison step.

A "logical" strategy:

1. from raw data to abstract set representations;
2. from the latter to quantifiers.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 6 / 44

slide-13
SLIDE 13

Comparison step

"Look, some green circles!": Learning to quantify from images (Sorodoc et al., 2016). Very high results: NNs should be able to learn the second subtask quite easily. Is the "logical" strategy a good one?

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 7 / 44

slide-15
SLIDE 15

Learning quantification from images

Layout

1. Learning quantification from images
2. Quantifiers vs. Cardinals
3. Behavioral Study

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 8 / 44

slide-16
SLIDE 16

Learning quantification from images

Learning quantification from images

Pay attention to those sets! Learning quantification from images (Sorodoc et al., just submitted).

[Five example scenarios, (a)-(e).] Query: fish are red. Answers: (a) All, (b) Most, (c) Some, (d) Few, (e) No.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 9 / 44

slide-18
SLIDE 18

Learning quantification from images

Not raw data: All sorts of variances in place

The system cannot memorize correlations between:

  • type of objects and quantifiers
  • property of objects and quantifiers
  • number of objects and quantifiers

Quite challenging!

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 10 / 44

slide-19
SLIDE 19

Learning quantification from images

Quantifiers as proportions

Q of the fish (restrictor) are red (scope).

We take quantifiers to be a fixed relation:

|scope ∩ restrictor| / |restrictor|   (e.g. |red ∩ fish| / |fish|)

Prevalence estimates (Khemlani et al., 2009): No: 0%; Few: 1%-17% (incl.); Some: 17%-70%; Most: 70% (incl.)-99% (incl.); All: 100%.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 11 / 44
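
A minimal sketch of this proportion-to-quantifier mapping, assuming the prevalence bands above; the exact handling of the band boundaries is my assumption where the slide leaves it open:

```python
def quantifier_label(n_target: int, n_restrictor: int) -> str:
    """Map |scope ∩ restrictor| / |restrictor| to a quantifier label,
    following the prevalence bands above (boundary handling assumed)."""
    ratio = n_target / n_restrictor
    if ratio == 0.0:
        return "no"
    if ratio <= 0.17:
        return "few"
    if ratio < 0.70:
        return "some"
    if ratio < 1.0:
        return "most"
    return "all"

print(quantifier_label(2, 10))   # some  (20% falls in the 17%-70% band)
print(quantifier_label(9, 10))   # most  (90% falls in the 70%-99% band)
```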

slide-22
SLIDE 22

Learning quantification from images

Computer Vision Models

Start simple: concatenation (CNN+BOW).

Zhou et al. (2015), Simple Baseline for Visual Question Answering (iBOWIMG): memorizes correlations, no higher-level abstraction.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 12 / 44
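
A minimal numpy sketch of such a concatenation baseline; all sizes and names here are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, IMG_DIM, N_Q = 1000, 400, 5     # toy vocabulary, image dim, 5 quantifiers

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def bow_vector(word_ids, vocab=VOCAB):
    """One-hot bag-of-words over the query words."""
    v = np.zeros(vocab)
    v[word_ids] = 1.0
    return v

# Softmax classifier over the concatenation [BOW ; CNN image features].
W = rng.normal(scale=0.01, size=(N_Q, VOCAB + IMG_DIM))
b = np.zeros(N_Q)

query = bow_vector([12, 47])           # e.g. "dog", "black"
img = rng.normal(size=IMG_DIM)         # stand-in for CNN features
probs = softmax(W @ np.concatenate([query, img]) + b)
print(probs.argmax())                  # index of the predicted quantifier
```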

slide-23
SLIDE 23

Learning quantification from images

Computer Vision Models

Lesson learned from SoA: Memory and Attention

Memory: process new information based on previous information (LSTM, GRU).
Attention mechanism: use language to help make the representation of the image more focused.
Stacked attention: use language to focus the visual representation, and use the latter to focus the linguistic representation.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 13 / 44

slide-26
SLIDE 26

Learning quantification from images

Sequential Processing

CNN+LSTM model

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 14 / 44

slide-27
SLIDE 27

Learning quantification from images

Attention Mechanism: SAN’s attention layer

Yang, Z., et al. (CVPR 2016). Stacked attention networks (SAN) for image question answering.

[Diagram: SAN's attention layer. Linguistic and visual inputs go through linear and nonlinear (tanh) transformations, are combined, and a softmax transformation produces the attention weights that yield the gist.]

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 15 / 44
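
A minimal numpy sketch of one such attention layer, following the construction in Yang et al. (2016); dimensions and variable names are my assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
K, D, H = 49, 400, 256                 # regions, feature dim, attention dim

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def san_attention(V, q, Wv, Wq, wp):
    """One SAN attention layer: fuse each region with the query through a
    tanh layer, score regions with a softmax, and pool them into a gist."""
    h = np.tanh(V @ Wv.T + q @ Wq.T)   # (K, H) region-query fusion
    p = softmax(h @ wp)                # (K,) attention over regions
    gist = p @ V                       # (D,) attention-weighted image
    return gist, p

V = rng.normal(size=(K, D))            # visual region vectors
q = rng.normal(size=D)                 # linguistic query vector
Wv, Wq = rng.normal(size=(H, D)), rng.normal(size=(H, D))
wp = rng.normal(size=H)
gist, p = san_attention(V, q, Wv, Wq, wp)
```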

slide-28
SLIDE 28

Learning quantification from images

Stacked Attention Model

Yang, Z., et al. (CVPR 2016). Stacked attention networks for image question answering.

[Diagram: the stacked attention model. The query words ("dog", "black") pass through LSTM cells to form the linguistic representation; attention layer 1 yields visual gist 1 and an intermediate gist; attention layer 2 yields visual gist 2 and the final gist, which drives the quantifier prediction.]

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 16 / 44
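
Continuing the sketch above (it reuses `san_attention` and the toy tensors), stacking lets each hop's gist refine the query for the next hop, in the spirit of Yang et al.'s query update:

```python
def stacked_attention(V, q, layers):
    """Stack attention hops: each gist refines the query (u = gist + u)
    before the next hop, as in Yang et al. (2016)."""
    u = q
    for Wv, Wq, wp in layers:
        gist, _ = san_attention(V, u, Wv, Wq, wp)
        u = gist + u                   # refined query for the next hop
    return u                           # final gist fed to the classifier

layers = [(rng.normal(size=(H, D)), rng.normal(size=(H, D)), rng.normal(size=H))
          for _ in range(2)]           # two hops, as in the figure
final_gist = stacked_attention(V, q, layers)
```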

slide-29
SLIDE 29

Learning quantification from images

Linguistically motivated NNs with attention

Q Memory Network

[Diagram: Q Memory Network. "Dog" and "black" embeddings (400-d) and the VGG+SVD region vectors are linearly mapped to 300-d vectors (V1). Similarity vector S1 (restrictor) weights the regions, W1 = S1 * V1 (restrictor gist); similarity vector S2 (scope) re-weights these, W2 = S2 * W1 (scope ∩ restrictor gist). The concatenation of gists feeds the prediction over no / few / some / most / all.]

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 17 / 44
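
A numpy sketch of my reading of this diagram: the restrictor's similarity vector weights the region vectors, and the scope's similarity vector re-weights the result; the sum pooling and the use of cosine similarity are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
K, D = 16, 300                         # regions per scenario, shared 300-d space

def l2norm(M):
    return M / (np.linalg.norm(M, axis=-1, keepdims=True) + 1e-9)

def attend(word_vec, regions):
    """Similarity vector S over regions, then S-weighted regions (W = S * V)."""
    S = l2norm(regions) @ l2norm(word_vec)    # (K,) similarities
    return S[:, None] * regions               # (K, D) weighted vectors

V1 = rng.normal(size=(K, D))                  # regions after the linear map
dog = rng.normal(size=D)                      # mapped restrictor embedding
black = rng.normal(size=D)                    # mapped scope embedding

W1 = attend(dog, V1)                          # W1 = S1 * V1
W2 = attend(black, W1)                        # W2 = S2 * W1
restrictor_gist = W1.sum(axis=0)              # pooled gists (pooling assumed)
scope_gist = W2.sum(axis=0)
features = np.concatenate([restrictor_gist, scope_gist])  # → quantifier softmax
```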

slide-30
SLIDE 30

Learning quantification from images

Linguistically motivated NNs with stacked attention

QSAN

[Diagram: QSAN. A Restrictor SAN module attends to the image with the restrictor's linguistic representation ("dog"), producing a restrictor gist and restrictor probabilities; a Scope ∩ Restrictor SAN module attends with the scope's linguistic representation ("black"), producing a scope ∩ restrictor gist; the final gist feeds the quantifier prediction.]

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 18 / 44

slide-31
SLIDE 31

Learning quantification from images

Datasets: Q-COCO

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 19 / 44

slide-32
SLIDE 32

Learning quantification from images

Datasets: Q-ImageNet

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 20 / 44

slide-33
SLIDE 33

Learning quantification from images

Experiments

Uncontrolled: random sample of the dataset (balanced w.r.t. quantifiers).
Unseen Objects: queries in the test set contain queried objects never queried in the training data.
Unseen Properties: queries in the test set contain queried properties never queried in the training data.
Unseen O, P combination: queries in the test set contain queried object-property combinations never queried in the training data.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 21 / 44

slide-34
SLIDE 34

Learning quantification from images

How do the models go?

Accuracies

Q-ImageNet    UNC     UnsObj   UnsProp   UnsQue
Blind BOW     25.5    25.2     20.3      25.2
Blind LSTM    31.35   23.9     21.8      22.3
CNN+BOW       26.7    24.8     18.9      25.5
CNN+LSTM      34.75   23.9     20.4      22.8
SAN           37.5    26       20.5      23.4
QMN           34.1    23.2     22        28.3
QSAN          45.2    28.6     22.1      26
chance        20.0    20.0     20.0      20.0

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 22 / 44

slide-35
SLIDE 35

Learning quantification from images

How do the models go?

Results by quantifier

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 23 / 44

slide-36
SLIDE 36

Learning quantification from images

Confusion Matrix

UNC, Q-ImageNet

QSAN      no    few   some  most  all
no        149   149   65    7     10
few       137   180   69    22    8
some      54    70    167   65    37
most      16    23    70    170   135
all       6     11    34    108   238

SAN       no    few   some  most  all
no        161   160   50    9     0
few       150   174   61    30    1
some      99    74    134   83    3
most      37    65    102   183   27
all       21    40    62    177   97

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 24 / 44

slide-37
SLIDE 37

Learning quantification from images

Conjecture 1: Conclusion

Attend to the restrictor, then to its composition with the scope

SAN: we first showed that letting the network compose scope and restrictor on the language side, and using this representation to attend to the image, resulted in underperforming models. QMN and QSAN: encoding into the model the fact that quantifiers express a relation between sets, to guide the attention mechanism, produced much better results.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 25 / 44

slide-38
SLIDE 38

Learning quantification from images

Conjecture 1: Conclusion

Approximation is a good strategy

Precisely identifying the composition of the sets is not only beyond current state-of-the-art models but perhaps even detrimental to a task that is most efficiently performed by refining the approximate numerosity estimator of the system. The actual challenge of visual quantification is to find the right strategies to deal with uncertainty in object and property recognition. Humans appeal extensively to their approximate number sense to quantify. This may be more than an efficiency mechanism: as demonstrated by the QSAN model's combination of soft attention and gist, approximation goes a long way in manoeuvring through the difficulties of matching words and vision.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 26 / 44

slide-42
SLIDE 42

Quantifiers vs. Cardinals

Layout

1. Learning quantification from images
2. Quantifiers vs. Cardinals
3. Behavioral Study

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 27 / 44

slide-43
SLIDE 43

Quantifiers vs. Cardinals

Quantifiers or Cardinals

Most of the animals are dogs. vs. Three of the animals are dogs. In humans, Qs and Cs are underpinned by different cognitive and neural mechanisms. What about NNs?

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 28 / 44

slide-44
SLIDE 44

Quantifiers vs. Cardinals

Dataset

Synthetic Scenarios

Pezzelle et al. (EACL 2017), Be Precise or Fuzzy: Learning the Meaning of Cardinals and Quantifiers from Vision.

We build a dataset of synthetic scenarios by joining together 1-9 real images from ImageNet (each image depicting one object). Balanced number of scenarios depicting no, few, most, all (Qs) and 1, 2, 3, 4 (Cs). Q percentages defined a priori (0%, 1-49%, 51-99%, 100%, respectively). Train and Test differ w.r.t. the combinations of targets and distractors.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 29 / 44
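
A minimal sketch of how such target/distractor combinations could be sampled from the a-priori bands above; `sample_combination` and its interface are mine, not the paper's generation code:

```python
import random

random.seed(0)

# Quantifier bands from the slide: no = 0%, few = 1-49%, most = 51-99%, all = 100%.
BANDS = {"no": (0.0, 0.0), "few": (0.01, 0.49),
         "most": (0.51, 0.99), "all": (1.0, 1.0)}

def sample_combination(quantifier, max_total=9):
    """Sample a (targets, total) pair whose ratio falls in the quantifier's band."""
    lo, hi = BANDS[quantifier]
    candidates = [(t, n) for n in range(1, max_total + 1)
                  for t in range(0, n + 1) if lo <= t / n <= hi]
    return random.choice(candidates)

print(sample_combination("few"))       # e.g. (2, 5): 2 targets, 3 distractors
```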

slide-45
SLIDE 45

Quantifiers vs. Cardinals

Dataset

Combinations

Train-q:   no   few  most  all      Train-c:   one  two  three  four
           0/1  1/6  2/3   1/1                 1/1  2/2  3/3    4/4
           0/2  2/5  3/4   2/2                 1/3  2/3  3/4    4/5
           0/3  2/7  3/5   3/3                 1/4  2/5  3/5    4/6
           0/4  3/8  4/5   4/4                 1/6  2/7  3/8    4/7

Test-q:    no   few  most  all      Test-c:    one  two  three  four
           0/5  1/7  4/6   5/5                 1/2  2/4  3/7    4/8
           0/8  4/9  6/8   9/9                 1/7  2/9  3/9    4/9

Table: Combinations in Train and Test (targets / targets+distractors).

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 30 / 44

slide-46
SLIDE 46

Quantifiers vs. Cardinals

Analysis

Only Vision: Cosine-sim(Target-Scenario) vs Dot-sim(Target-Scenario)

Figure: Left: quantifiers against cosine distance. Right: cardinals against dot product.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 31 / 44

slide-47
SLIDE 47

Quantifiers vs. Cardinals

Leading idea

Q and C are (cross-modal) functions

"Few"/"two" are matrices that, given the linguistic vector of an object (e.g. dog), retrieve the scenarios such that few/two of the objects are dogs.

Model: single-layer neural network (ReLU).

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 32 / 44

slide-48
SLIDE 48

Quantifiers vs. Cardinals

Q vs. C: leading idea

Learning Strategies

Learning strategy for Qs: the network learns to obtain, out of the linguistic vector of "dog", the visual vector that is most similar (by cosine similarity) to the visual vectors of the scenarios with few dogs.
Learning strategy for Cs: the network learns to obtain, out of the linguistic vector of "dog", the visual vector that is most similar (by dot product) to the visual vectors of the scenarios with 2 dogs.

Intuition: cosine is a "fuzzy" measure, while the dot product is an "exact" measure.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 33 / 44
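
A minimal numpy sketch of the two retrieval criteria, with a quantifier modelled as a matrix applied to a word embedding (ReLU, as on the previous slide); names and dimensions are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
D_TXT = D_VIS = 400                    # dims as in the slides' setup

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def apply_quantifier(W, word_vec):
    """A quantifier as a matrix: map a word embedding to visual space."""
    return np.maximum(0.0, W @ word_vec)   # ReLU, as on the previous slide

W_few = rng.normal(scale=0.01, size=(D_VIS, D_TXT))   # the "few" function
dog = rng.normal(size=D_TXT)                          # word embedding
scenarios = rng.normal(size=(10, D_VIS))              # scenario vectors

pred = apply_quantifier(W_few, dog)
# Retrieval: quantifiers ranked by cosine ("fuzzy"), cardinals by dot
# product ("exact"), mirroring the two learning strategies above.
rank_q = np.argsort([-cosine(pred, s) for s in scenarios])
rank_c = np.argsort(-(scenarios @ pred))
print(rank_q[:3], rank_c[:3])
```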

slide-49
SLIDE 49

Quantifiers vs. Cardinals

Results

Cross-modal: image retrieval

            lin            nn-cos         nn-dot
            mAP    P2      mAP    P2      mAP    P2
no          0.78   0.65    0.87   0.77    0.54   0.37
few         0.59   0.39    0.68   0.51    0.59   0.43
most        0.61   0.36    0.60   0.29    0.62   0.45
all         0.75   0.66    1      1       0.33   0.12
one         0.44   0.30    0.38   0.21    0.61   0.45
two         0.35   0.15    0.38   0.21    0.57   0.43
three       0.38   0.16    0.36   0.13    0.56   0.40
four        0.65   0.47    0.75   0.60    0.76   0.61

Table: R-target. mAP and P2 for each model.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 34 / 44
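
For reference, a small sketch of the two retrieval metrics in the table: P2 is precision over the top 2 retrieved scenarios, and mAP averages the per-query average precision shown here over all queries:

```python
def precision_at_k(ranked_relevance, k=2):
    """P@k: fraction of relevant items among the top k."""
    return sum(ranked_relevance[:k]) / k

def average_precision(ranked_relevance):
    """AP over one ranking; mAP averages this over all queries."""
    hits, score = 0, 0.0
    for i, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            score += hits / i
    return score / max(hits, 1)

# 1 = the scenario at this rank matches the query, 0 = it does not.
ranked = [1, 0, 1, 1, 0]
print(precision_at_k(ranked), average_precision(ranked))  # 0.5 0.805...
```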

slide-50
SLIDE 50

Quantifiers vs. Cardinals

Conjecture 2: Conclusion

Each Q can be represented by a multimodal function from language to vision. Low Cs can be learned by mapping language into vision.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 35 / 44

slide-51
SLIDE 51

Behavioral Study

Layout

1. Learning quantification from images
2. Quantifiers vs. Cardinals
3. Behavioral Study

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 36 / 44

slide-52
SLIDE 52

Behavioral Study

Ongoing work: What about humans?

Behavioral studies

With Sandro Pezzelle and Manuela Piazza.

Question: which factors influence our decision to use one Q instead of another when, quantity-wise, they are very similar? Currently visual factors: size of the image, color, location, cardinality, ratio.

Only-vision study: given a visual scene containing animals and artifacts, subjects have to choose the Q out of 9 options: none, almost none, very few, few, some, many, most, almost all, all.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 37 / 44

slide-54
SLIDE 54

Behavioral Study

Example

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 38 / 44

slide-55
SLIDE 55

Behavioral Study

Conclusion

Conjecture 1: we can learn their literal meaning (respecting the abstract scale) from images. Yes, by creating the gists of the compared sets.
Conjecture 2: they can be represented by a cross-modal function. Yes, from the word embedding of the noun to the visual scene, using cosine as the objective.
Conjecture 3: text corpora could help in learning their use. Still unexplored.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 39 / 44

slide-58
SLIDE 58

Behavioral Study

The team

Ionut, Sandro, Aurelie, and me

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 40 / 44

slide-59
SLIDE 59

Behavioral Study

Descriptive statistics of the two Q datasets

                                  Q-COCO        Q-ImageNet
unique objects                    29            161
unique properties                 44            24
properties per object (mean)      15.7          8.0
objects per property (mean)       10.34         53.67
objects per scenario (mean)       8.49          16
objects per scenario (min-max)    6 - 22        16 - 16
BBs per object (mean)             826.14        48.38
BBs per object (min-max)          16 - 4741     13 - 1149
BBs per property (mean)           2,090.39      728.12
BBs per property (min-max)        616 - 8,320   23 - 2,689
total images                      2,888         7,790
total BBs                         23,958        7,790
total queries                     58,673        40,000

Table: Descriptive statistics for Q-COCO and Q-ImageNet datasets.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 41 / 44

slide-60
SLIDE 60

Behavioral Study

Vector Representations

Visual input: for each bounding box in each scenario, we extract a visual representation using a Convolutional Neural Network. We use the VGG-19 model pre-trained on the ImageNet ILSVRC data and the MatConvNet toolbox for feature extraction. Each bounding box is represented by a 4096-dimension vector extracted from the 7th fully connected layer (fc7). For computational efficiency, we subsequently reduce the vectors to 400 dimensions by applying Singular Value Decomposition (SVD).

Linguistic input: similarly, each word in a query is represented by a 400-dimension vector built with the Word2Vec CBOW architecture, using the parameters that were shown to perform best in Baroni et al. (2014). The corpus used for building the semantic space is a 2.8-billion-token concatenation of the web-based UKWaC, a mid-2009 dump of the English Wikipedia, and the British National Corpus (BNC).

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 42 / 44
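
A minimal sketch of the SVD reduction step, assuming the fc7 matrix has already been extracted (random numbers stand in for real features here; the slides use MatConvNet for extraction, while this sketch uses scikit-learn's TruncatedSVD for the reduction):

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Stand-in for the fc7 matrix: one 4096-d vector per bounding box.
fc7 = np.random.default_rng(0).normal(size=(5000, 4096))

svd = TruncatedSVD(n_components=400, random_state=0)
fc7_400 = svd.fit_transform(fc7)       # (5000, 400) reduced vectors
print(fc7_400.shape)
```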

slide-61
SLIDE 61

Behavioral Study

Results by ratios

Figure: QSAN. Accuracy in UNC plotted against the ratios of target objects over restrictors. Left: 'few'. Center: 'some'. Right: 'most'.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 43 / 44

slide-62
SLIDE 62

Behavioral Study

CNN+BOW

This model is an adaptation of iBOWIMG. It uses the same linguistic input as BOW above, concatenated with a visual input. As in BOW, the query is first converted to a one-hot bag-of-words vector, which is further transformed into a 'word feature' embedding. This linguistic embedding is concatenated with an 'image feature' obtained from a convolutional neural network (CNN). The resulting embedding is sent to a softmax classifier which predicts one of five quantifiers, as above.

In order to have one single vector for the visual input, we simply concatenate the visual vectors of the individual bounding boxes in each one of our scenarios. For the Q-COCO dataset, where the number of objects contained in one image ranges from 6 to 22, we concatenate our 'frozen' visual vectors into an 8,800-dimension vector (i.e. 22*400 dimensions) and we fill the 'empty' cells of the scenario with zero vectors.

Raffaella Bernardi (University of Trento) LaVi: quantifiers March 23, 2017 44 / 44
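
A minimal sketch of this zero-padded concatenation, under the Q-COCO sizes quoted above (the function name is mine):

```python
import numpy as np

D, MAX_OBJECTS = 400, 22               # per-box dim and the Q-COCO maximum

def scenario_vector(box_vectors):
    """Concatenate the per-box vectors and zero-pad to 22 * 400 = 8,800 dims."""
    out = np.zeros(MAX_OBJECTS * D)
    flat = np.concatenate(box_vectors)
    out[:flat.size] = flat             # remaining cells stay zero vectors
    return out

boxes = [np.random.default_rng(i).normal(size=D) for i in range(6)]
print(scenario_vector(boxes).shape)    # (8800,)
```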