Language and Vision: where we are and where we could go next - PowerPoint PPT Presentation

Language and Vision: where we are and where we could go next Raffaella Bernardi University of Trento June 9th, 2017 Raffaella Bernardi (University of Trento) Language and Vision: where we are and where we could go next June 9th, 2017 1 / 26

Language and Vision: shared tasks. Image Captioning; Visual Question Answering (VQA).

LaVi models (Parikh et al.)

CV models. CNN: feature hierarchy.

CV models. Visualizing CNN layers (Aravindh Mahendran and Andrea Vedaldi, 2015).

LaVi@UniTN: FOIL. Each original caption is paired with a foil caption in which exactly one word is replaced:
- Verb. Original: A little girl trying to push a skateboard with other standing around. FOIL: A little girl trying to pull a skateboard with other standing around.
- Adjective. Original: A narrow room with various luggage and two men. FOIL: A broad room with various luggage and two men.
- Adverb. Original: A child wearing a very large and loosely tied necktie. FOIL: A child wearing a very large and narrowly tied necktie.
- Preposition. Original: A young boy on a couch holding two stuffed animals. FOIL: A young boy beside a couch holding two stuffed animals.
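In every FOIL pair on this slide, the foil caption differs from the original by a single word. A minimal sketch of how that word could be located, assuming whitespace tokenization and exactly one substitution; the function name `locate_foil` is hypothetical and not part of any released FOIL tooling.

```python
def locate_foil(original: str, foil: str) -> tuple[int, str, str]:
    """Return (token index, original word, foil word) for a one-word substitution.

    Assumes both captions are whitespace-tokenized and differ in exactly one token.
    """
    orig_tokens = original.split()
    foil_tokens = foil.split()
    if len(orig_tokens) != len(foil_tokens):
        raise ValueError("captions differ in length; not a single-word substitution")
    # Collect every position where the two captions disagree.
    diffs = [
        (i, o, f)
        for i, (o, f) in enumerate(zip(orig_tokens, foil_tokens))
        if o != f
    ]
    if len(diffs) != 1:
        raise ValueError(f"expected exactly one substituted word, found {len(diffs)}")
    return diffs[0]


print(locate_foil(
    "A young boy on a couch holding two stuffed animals",
    "A young boy beside a couch holding two stuffed animals",
))  # (3, 'on', 'beside')
```

The same check applies to the verb, adjective and adverb pairs; only the part of speech of the swapped word changes.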

LaVi@UniTN: Quantification.

LaVi@UniTN: Quantifiers vs. cardinals. "Most of the animals are dogs" vs. "Three of the animals are dogs."
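One way to see the contrast is through the standard generalized-quantifier reading (an assumption here; the slide gives no formal semantics): "most" compares the dogs with the animals that are not dogs, so it depends on a proportion, while a cardinal only needs the number of dogs. Reading "three" as "exactly three" is also an assumption; an "at least three" reading would use `>=`. The scenes below are hypothetical.

```python
def most(restrictor: set, scope: set) -> bool:
    """'Most A are B': more A's are B than are not B, i.e. |A & B| > |A - B|."""
    return len(restrictor & scope) > len(restrictor - scope)


def exactly(n: int):
    """Cardinal 'n of the A are B', read here as exactly n (an assumption)."""
    def quantifier(restrictor: set, scope: set) -> bool:
        return len(restrictor & scope) == n
    return quantifier


three = exactly(3)

# Hypothetical scene 1: 3 dogs and 2 cats.
animals = {"dog1", "dog2", "dog3", "cat1", "cat2"}
dogs = {"dog1", "dog2", "dog3"}
print(most(animals, dogs), three(animals, dogs))  # True True

# Hypothetical scene 2: the same 3 dogs, but 4 cats.
animals2 = dogs | {"cat1", "cat2", "cat3", "cat4"}
print(most(animals2, dogs), three(animals2, dogs))  # False True
```

Adding cats changes the truth of "most" but not of "three", which is why the two sentence types probe different abilities of a model.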

Mass and count nouns: a linguistic distinction (Stanford Encyclopedia of Philosophy).
Mass nouns (e.g., milk, furniture, wisdom) are invariable in grammatical number. Depending on the language [..] in English, mass nouns can be used with determiners like "much" and "a lot of", but neither with "one" nor "many".
Count nouns (e.g., rabbit, table, idea) can be used in the singular and in the plural. [..] In English, count nouns can be employed with numerals like "one" and determiners like "many", but not with "much".
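The determiner tests in the quoted passage can be written down as a small lookup table. This is a sketch that encodes only the judgments the passage states; pairs it does not mention (e.g. "a lot of" with count nouns) are returned as unknown rather than guessed. The names `JUDGMENTS`, `NOUN_CLASS` and `licensed` are hypothetical.

```python
# Determiner judgments taken only from the quoted passage (English).
# True = licensed, False = ruled out; pairs not mentioned are left unknown.
JUDGMENTS = {
    ("mass", "much"): True,
    ("mass", "a lot of"): True,
    ("mass", "one"): False,
    ("mass", "many"): False,
    ("count", "one"): True,
    ("count", "many"): True,
    ("count", "much"): False,
}

# Example nouns from the passage and their class.
NOUN_CLASS = {
    "milk": "mass", "furniture": "mass", "wisdom": "mass",
    "rabbit": "count", "table": "count", "idea": "count",
}


def licensed(determiner: str, noun: str):
    """Return True/False if the passage decides the pair, else None."""
    return JUDGMENTS.get((NOUN_CLASS[noun], determiner))


print(licensed("much", "milk"))       # True
print(licensed("many", "furniture"))  # False
print(licensed("a lot of", "table"))  # None (not covered by the passage)
```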

Mass and count: is there a perceptual distinction? Mass: milk, furniture, wisdom. Count: rabbit, table, idea.

Mass (substance) and count (objects) dataset: construction. Starting point: the Bochum English Countability Lexicon (BECL; Kiss et al., 2016).

Mass (substance) and count (objects) dataset: sample of mass nouns.

| Noun | Synset | Description | # occurrences | # images |
|---|---|---|---|---|
| dough | dough.n.01 | a flour mixture stiff enough to knead or roll | 45 | 497 |
| soil/dirt | soil.n.02 | the part of the earth's surface consisting of humus and disintegrated rock | 398/169 | 235 |
| milk | milk.n.01 | a white nutritious liquid secreted by mammals and used as food by human beings | 386 | 196 |
| coffee | coffee.n.01 | a beverage consisting of an infusion of ground coffee beans | 356 | 159 |
| coffee | coffee.n.02 | any of several small trees and shrubs native to the tropical Old World yielding coffee beans | 356 | 70 |

Mass and count: example images for dough.n.01.

Mass and count dataset: numbers. Open American National Corpus (OANC) metrics in BECL.

| | # synsets | # unique nouns | # images (avg) | # images (range) | OANC freq. (avg) | OANC freq. (range) |
|---|---|---|---|---|---|---|
| mass | 58 | 56 | 214.66 | 64-705 | 112.6 | 10-447 |
| count | 58 | 53 | 303.93 | 60-1467 | 1435.16 | 33-4121 |

Mass and count: variances.

CV models: Convolutional Neural Network. We used the VGG-19 model (Simonyan and Zisserman, 2014), trained to classify objects.
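For orientation, VGG-19 (configuration E in Simonyan and Zisserman's paper) has 16 convolutional layers grouped into five blocks, each block closed by max pooling, followed by three fully connected layers: 16 + 3 = 19 weight layers. The sketch below only enumerates layer names in the common `conv{block}_{index}` convention, as in "Conv5_1"; the `pool`/`fc` names follow a widely used naming scheme and are not specified on the slide.

```python
# Number of 3x3 convolutional layers in each of VGG-19's five blocks.
CONV_PER_BLOCK = [2, 2, 4, 4, 4]


def vgg19_layer_names() -> list[str]:
    """List VGG-19 layers in order, using conv{block}_{index} names."""
    names = []
    for block, n_conv in enumerate(CONV_PER_BLOCK, start=1):
        names += [f"conv{block}_{i}" for i in range(1, n_conv + 1)]
        names.append(f"pool{block}")  # max pooling closes every block
    names += ["fc6", "fc7", "fc8"]  # fully connected layers; fc8 gives class scores
    return names


layers = vgg19_layer_names()
weight_layers = [n for n in layers if not n.startswith("pool")]
print(len(weight_layers))  # 19
print(layers[:6])  # ['conv1_1', 'conv1_2', 'pool1', 'conv2_1', 'conv2_2', 'pool2']
```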

Mass and count: feature layers of a CNN. Each Conv block consists of several convolutional layers followed by a max-pooling step, which reduces the dimensions by keeping the most salient features. The Conv layers encode low-level visual features (edges, texture, color), whereas the fully connected (fc) layers encode more abstract features. We compute the variances of the outputs of the first and last layers of blocks Conv2 to Conv5 (low-level features) and of the fc layers (abstract features).
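A minimal sketch of the layer selection and of one possible per-synset variance. The slides do not define how variance is computed, so this assumes it is the mean, over feature dimensions, of the population variance across a synset's images, with each layer output already flattened to a vector. The names `LOW_LEVEL_LAYERS`, `ABSTRACT_LAYERS` and `synset_variance` are hypothetical, and restricting the fc layers to fc6 and fc7 is also an assumption.

```python
from statistics import pvariance

CONV_PER_BLOCK = {1: 2, 2: 2, 3: 4, 4: 4, 5: 4}  # VGG-19 conv layers per block

# First and last conv layer of blocks 2-5, as described on the slide.
LOW_LEVEL_LAYERS = [
    f"conv{b}_{i}" for b in range(2, 6) for i in (1, CONV_PER_BLOCK[b])
]
# Assumption: the two hidden fc layers; the slide does not say whether fc8 is used.
ABSTRACT_LAYERS = ["fc6", "fc7"]


def synset_variance(features: list[list[float]]) -> float:
    """Mean per-dimension (population) variance of one synset's image features.

    `features` holds one flattened layer output per image of the synset.
    This definition of 'variance' is an assumption, not taken from the slides.
    """
    n_dims = len(features[0])
    per_dim = [pvariance([vec[d] for vec in features]) for d in range(n_dims)]
    return sum(per_dim) / n_dims


print(LOW_LEVEL_LAYERS)
# ['conv2_1', 'conv2_2', 'conv3_1', 'conv3_4', 'conv4_1', 'conv4_4', 'conv5_1', 'conv5_4']
print(synset_variance([[0.0, 1.0], [2.0, 1.0]]))  # 0.5
```

Averaging over dimensions keeps scores comparable across layers of different width, which matters when comparing Conv and fc layers.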

Mass and count. Variance: at which perceptual level?
- ratio(count/mass) = 1: the variances of the two groups are equal.
- ratio(count/mass) > 1: the variance of mass nouns is lower than that of count nouns.
- ratio(count/mass) < 1: the variance of mass nouns is higher than that of count nouns.
Significance: *** p < .001; ** p < .01; * p < .05.
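A sketch of the ratio and of a significance check, under stated assumptions: the ratio is taken as the mean per-synset variance of count synsets divided by that of mass synsets, and significance is assessed with a two-sided permutation test on the difference of group means. The slides do not say which test was used, so the permutation test is a swapped-in choice; the star thresholds follow the slide. All function names and the example values are hypothetical.

```python
import random
from statistics import fmean


def variance_ratio(count_vars: list[float], mass_vars: list[float]) -> float:
    """ratio(count/mass); > 1 means mass synsets vary less than count synsets."""
    return fmean(count_vars) / fmean(mass_vars)


def permutation_p_value(count_vars, mass_vars, n_perm=10_000, seed=0) -> float:
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(fmean(count_vars) - fmean(mass_vars))
    pooled = list(count_vars) + list(mass_vars)
    k = len(count_vars)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of synsets as count/mass
        if abs(fmean(pooled[:k]) - fmean(pooled[k:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


def stars(p: float) -> str:
    """Significance markers with the thresholds used on the slide."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Hypothetical per-synset variances, for illustration only.
count_vars = [10.0] * 8
mass_vars = [1.0] * 8
print(variance_ratio(count_vars, mass_vars))  # 10.0
p = permutation_p_value(count_vars, mass_vars)
print(p, stars(p))
```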

Mass and count. Variance: Conv5_1?

Mass and count: synsets with the highest vs. lowest variances at Conv5_1 ((c) = count, (m) = mass).

| Intra-variance top-10 | Intra-variance bottom-10 | Inter-variance top-10 | Inter-variance bottom-10 |
|---|---|---|---|
| magazine 01 (c) | range 04 (c) | magazine 01 (c) | egg yolk 01 (m) |
| salad 01 (m) | dough 01 (m) | shop 01 (c) | range 04 (c) |
| shop 01 (c) | mountain 01 (c) | salad 01 (m) | dough 01 (m) |
| church 02 (c) | mesa 01 (c) | machine 01 (c) | mountain 01 (c) |
| machine 01 (c) | flour 01 (m) | church 02 (c) | mesa 01 (c) |
| floor 02 (c) | milk 01 (m) | stage 03 (c) | milk 01 (m) |
| press 03 (c) | glacier 01 (m) | press 03 (c) | flour 01 (m) |
| stage 03 (c) | butter 01 (m) | floor 02 (c) | butter 01 (m) |
| pasta 01 (m) | egg yolk 01 (m) | brunch 01 (m) | glacier 01 (m) |
| brunch 01 (m) | floor 04 (c) | building 01 (c) | sugar 01 (m) |
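Lists like these come from ranking synsets by a per-synset score. A minimal sketch, assuming the scores are already available as a dictionary from synset to variance; the function name `top_and_bottom` and the toy values are hypothetical, and the same ranking would apply to intra- or inter-variance scores alike.

```python
def top_and_bottom(variances: dict[str, float], k: int = 10):
    """Return the k synsets with the highest and the k with the lowest variance.

    The top list runs from highest down; the bottom list from lowest up.
    """
    ranked = sorted(variances, key=variances.get, reverse=True)
    top = ranked[:k]
    bottom = ranked[::-1][:k]
    return top, bottom


# Hypothetical variance values, for illustration only.
toy = {"syn_a": 5.0, "syn_b": 4.0, "syn_c": 1.0, "syn_d": 2.0}
top, bottom = top_and_bottom(toy, k=2)
print(top)     # ['syn_a', 'syn_b']
print(bottom)  # ['syn_c', 'syn_d']
```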

Mass and count: next step. Can a CNN learn to quantify both objects and substances? "Most of the animals are dogs." "Most of the sand is dirty."
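One way to frame the question: "most" compares two quantities, but for substances the quantity cannot be a count of individuals. A sketch assuming (hypothetically) that the sand in an image has been segmented into regions with known pixel areas, and that "most" for mass nouns compares areas rather than numbers of regions. The region names, areas and the `measure` parameter are all illustrative assumptions.

```python
def most(restrictor: set, scope: set, measure=len) -> bool:
    """'Most A is/are B', comparing measure(A & B) with measure(A - B)."""
    return measure(restrictor & scope) > measure(restrictor - scope)


# Hypothetical segmentation: three sand regions with their pixel areas.
AREA = {"region1": 500, "region2": 300, "region3": 50}


def area(regions: set) -> int:
    """Total pixel area of a set of regions."""
    return sum(AREA[r] for r in regions)


sand = {"region1", "region2", "region3"}
dirty = {"region1"}

print(most(sand, dirty))                # False: 1 dirty region vs 2 clean ones
print(most(sand, dirty, measure=area))  # True: 500 dirty pixels vs 350 clean
```

With objects, counting (`len`) is the natural measure; with substances, counting arbitrary regions gives the wrong answer, which is what makes the mass case a harder test.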

Mass and count: next step. Can a CNN learn that mass nouns (substances, liquids) are uncountable? [Figure: two visual "sums" of the form image + image = image.]

UniTN team: Ionut, Sandro, Ravi, Addisson, Aurelie, and me.
