Learning the visual interpretation of sentences C. L. - PowerPoint PPT Presentation

Learning the visual interpretation of sentences
C. L. Zitnick, D. Parikh, and L. Vanderwende*, ICCV 13
Presenter: Shenlong Wang
CSC 2523
*Many images from Larry Zitnick's ICCV 13 and slides, Coyne SIGGRAPH 01

We will discuss…
• Text to clip art images
– Learning the Visual Interpretation of Sentences, ICCV 2013, C. L. Zitnick, D. Parikh, and L. Vanderwende
– Bringing Semantics Into Focus Using Visual Abstraction, CVPR 2013 (Oral), C. L. Zitnick and D. Parikh
• Text to 3D scene
– WordsEye: an automatic text-to-scene conversion system, SIGGRAPH 2001, B. Coyne and R. Sproat
– Learning Spatial Knowledge for Text to 3D Scene Generation, A. Chang, M. Savva, C. Manning, EMNLP 2014

Brief Review
• Image to Sentence
– Retrieval
– Generation
• Sentence to Image
– Retrieval
– Generation?

Goal
• To generate semantically meaningful images
Zitnick, 2013

Two professors converse in front of a blackboard.

Two professors converse in front of a blackboard.
[Figure labels: Blackboard, Receding hairline, Equation, Person, Gaze, Mustache, Tie, Table]
Zitnick, 2013

Two professors converse in front of a blackboard.
Zitnick, 2013

[Figure labels (object detections): Face, Person, Cat, Dining table]
Felzenszwalb, 2010

Two professors converse in front of a blackboard.
Zitnick, 2013

Two professors converse in front of a blackboard.
Image from 123RF.com

Photorealism is not necessary for learning visual interpretation of semantics
Coyne, 2001

Abstract scenes via 2D Clip Art
• Avoids the challenging vision problems of real images (detection, segmentation, attributes, etc.).
• Reduces the variation among real-world images that share the same semantic meaning.
[Figure labels: Jenny, Mike]
Zitnick, 2013

Summary of the dataset
• Clip art
– 56 objects, 80 pieces of clip art, 10000 scenes
– 3D location with facing direction
– Attributes for humans
• MTurk workers label the data
– Image to Sentence
– Sentence to Image

Zitnick, 2013

Target: ?
Jenny is catching the ball. Mike is kicking the ball. The table is next to the tree.

Sentence Parsing
Tuple form: <primary object> <relation> <secondary object>
• Jenny is catching the ball. : <Jenny> <catch> <ball>
• Mike is kicking the ball. : <Mike> <kick> <ball>
• The table is next to the tree. : <table> <next to> <tree>
• Jenny and Mike are running from the snake. : <Jenny> <run from> <snake>, <Mike> <run from> <snake>
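
As a rough illustration of the tuple form above, the following Python sketch stores each tuple in a small record and shows how a conjoined subject ("Jenny and Mike") becomes one tuple per subject. The names `SceneTuple` and `expand_subjects` are hypothetical and only for illustration; the sketch does not parse text and is not the parser used in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SceneTuple:
    """One <primary object> <relation> <secondary object> tuple (hypothetical type)."""
    primary: str
    relation: str
    secondary: Optional[str] = None  # assumption: the secondary object may be absent


def expand_subjects(subjects: List[str], relation: str,
                    secondary: Optional[str] = None) -> List[SceneTuple]:
    """Emit one tuple per subject, so a conjoined subject yields
    several tuples that share the same relation and secondary object."""
    return [SceneTuple(s, relation, secondary) for s in subjects]


# The four example sentences from the slide, already split by hand.
tuples = (
    expand_subjects(["Jenny"], "catch", "ball")
    + expand_subjects(["Mike"], "kick", "ball")
    + expand_subjects(["table"], "next to", "tree")
    + expand_subjects(["Jenny", "Mike"], "run from", "snake")
)

for t in tuples:
    print(f"<{t.primary}> <{t.relation}> <{t.secondary}>")
```

The last sentence contributes two tuples, matching the slide's <Jenny> <run from> <snake> and <Mike> <run from> <snake>.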

CRF model
\[
\log P(c, \Phi, \Psi \mid S, \theta) = \sum_i \Big( \overbrace{\psi_i(c_i, S; \theta_c)}^{\text{occurrence}} + \overbrace{\lambda_i(\Phi_i, S; \theta_\lambda)}^{\text{abs. location}} + \overbrace{\pi_i(\Psi_i, S; \theta_\pi)}^{\text{attributes}} \Big) + \sum_{ij} \overbrace{\phi_{ij}(\Phi_i, \Phi_j, S; \theta_\phi)}^{\text{rel. location}} - \log Z(S, \theta) \tag{1}
\]
• c_i: occurrence of object
• \Phi_i = \{x_i, y_i, z_i, d_i\}: absolute location of object (3D location + facing direction)
• \Psi_i = \{e_i, g_i, h_i\}: attributes of persons (expression, pose, accessory)
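
To make the structure of Equation (1) concrete, here is a minimal Python sketch that evaluates the unnormalized part of the log-probability (everything except the log Z term) for one candidate scene. The `ObjectState` type, the function name, the choice to sum the pairwise term over unordered pairs, and the toy potentials at the bottom are all assumptions made for illustration; they are not the learned potentials from the paper.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, Set, Tuple

Location = Tuple[float, float, float, int]  # (x, y, z, d): 3D location + facing
Attributes = Tuple[str, str, str]           # (e, g, h) for persons


@dataclass
class ObjectState:
    """Hypothetical per-object state: c_i, Phi_i and Psi_i."""
    present: bool
    location: Location
    attributes: Attributes


def unnormalized_log_score(scene: Dict[str, ObjectState], sentence: Set[str],
                           psi: Callable, lam: Callable,
                           pi: Callable, phi: Callable) -> float:
    """Sum the unary terms over objects and the pairwise term over
    unordered object pairs (an assumption of this sketch)."""
    score = 0.0
    for name, obj in scene.items():
        score += psi(name, obj.present, sentence)     # occurrence
        score += lam(name, obj.location, sentence)    # absolute location
        score += pi(name, obj.attributes, sentence)   # attributes
    for a, b in combinations(sorted(scene), 2):
        score += phi(scene[a].location, scene[b].location, sentence)  # relative location
    return score


# Toy potentials, invented for this example only.
def toy_psi(name, present, sentence):
    return 1.0 if (name in sentence) == present else -1.0

def toy_zero(name, value, sentence):
    return 0.0

def toy_phi(loc_a, loc_b, sentence):
    return -abs(loc_a[0] - loc_b[0])  # prefer objects that are close in x


scene = {
    "Jenny": ObjectState(True, (1.0, 0.0, 0.0, 1), ("happy", "run", "none")),
    "ball": ObjectState(True, (1.5, 0.0, 0.0, 1), ("", "", "")),
}
score = unnormalized_log_score(scene, {"Jenny", "ball"},
                               toy_psi, toy_zero, toy_zero, toy_phi)
print(score)
```

Because log Z(S, θ) does not depend on the scene, comparing candidate scenes for a fixed sentence only needs this unnormalized score.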

Learning & Inference
• Learning
– Noun mapping
– Update parameters according to empirical probability
• Inference
– Iterated conditional modes
– Random selection
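
The conditional-modes inference step listed above can be sketched generically: each variable in turn is set to the value that maximizes the score while all other variables stay fixed, and sweeps repeat until nothing changes. The function `icm`, the variable names, and the toy score below are assumptions for illustration, not the paper's implementation.

```python
from typing import Callable, Dict, Hashable, List, Tuple

State = Dict[str, Hashable]


def icm(init: State, domains: Dict[str, List[Hashable]],
        score: Callable[[State], float],
        max_sweeps: int = 20) -> Tuple[State, float]:
    """Coordinate-wise maximization over discrete variables."""
    state = dict(init)
    for _ in range(max_sweeps):
        changed = False
        for var, values in domains.items():
            # Score of the full state with only `var` replaced by `v`.
            def score_with(v, var=var):
                return score({**state, var: v})
            best_value = max(values, key=score_with)
            if score_with(best_value) > score_with(state[var]):
                state[var] = best_value
                changed = True
        if not changed:  # no variable moved in a full sweep: converged
            break
    return state, score(state)


# Toy score: the ball wants to sit one unit right of Jenny,
# and Jenny wants to stand at x = 2.
def toy_score(s: State) -> int:
    return -(s["ball_x"] - s["jenny_x"] - 1) ** 2 - (s["jenny_x"] - 2) ** 2


domains = {"ball_x": [0, 1, 2, 3], "jenny_x": [0, 1, 2, 3]}
final_state, final_score = icm({"ball_x": 0, "jenny_x": 0}, domains, toy_score)
print(final_state, final_score)
```

In this toy run the procedure stops at ball_x = 2, jenny_x = 1 with score -1, although ball_x = 3, jenny_x = 2 scores 0. Coordinate-wise updates of this kind are only guaranteed to reach a local optimum, so the result depends on the starting state.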

Occurrence and Position
Zitnick, 2014

Attributes
Zitnick, 2013

Relative Location
Zitnick, 2014

Results
Zitnick, 2014

Quantitative Results
Zitnick, 2014

Results
Figure 19 (panels: GT, Full-CRF, BoW, Noun-CRF, Random): Input description: Jenny is kicking the football. The pizza is on the table. The airplane is flying over Jenny. Tuples: Jenny kick football; pizza be table; airplane fly:p:over Jenny;
Figure 20 (panels: GT, Full-CRF, BoW, Noun-CRF, Random): Input description: Mike is sitting next to a cat. Mike is angry because he fell down. Jenny is running towards Mike to help him. Tuples: Mike sit:p:next to cat; Mike be:pa:angry ; he fall ; Jenny run:p:towards Mike; Jenny help ;
Zitnick, 2014

Failure cases
Figure 21 (panels: GT, Full-CRF, BoW, Noun-CRF, Random): Input description: It is lighting out. Mike and Jenny are upset. Mike and Jenny are sitting on the ground with there legs crossed. Tuples: it light ; Mike sit ground; Jenny sit ground; ground with leg;
Cause: failed sentence parsing, relative location prior
Figure 43 (panels: GT, Full-CRF, BoW, Noun-CRF, Random): Input description: Mike is mad his ice melted. Jenny is scared of the bear. The bear is wearing a viking hat. Tuples: Mike be:pa:mad ; Jenny be:pa:scared ; bear wear hat;
Cause: rare co-occurrence
Zitnick, 2014

Conclusion
• Conclusion
– New approach for learning "common sense" knowledge about our visual world.
– Don't need to wait for object recognition to be solved.
• Future Work
– Better language model?
– Larger photorealistic dataset?

Text to 3D Scene
Figure 8: The bird is in the bird cage. The bird cage is on the chair.
Coyne, 2001

WordsEye
Figure 2: Dependency structure for "John said that the cat was on the table."
Figure 6: Spatial tags for "base" and "cup".
Figure 11: John rides the bicycle. John plays the trumpet.
Coyne, 2001

Objects not depictable
• Textualization
• Emblematization
– Light bulb for idea, church for religion
• Characterization
– Football player will wear a football helmet
• Conventional icon
– Don't think
• Degeneralization
– Chair for furniture

Text to 3D Scene
Figure 16: John does not believe the radio is green.
Figure 15: The blue daisy is not in the army boot.
Coyne, 2001

Text to 3D Scene
Figure 17: The devil is in the details.
Figure 14: The cat is facing the wall.
Coyne, 2001

WordsEye 2014
the large radio is on the small car. the large woman is 8 feet behind the car. she is facing the car. the woman is unreflective. the small chair is 2 feet to the east of the car. the small chair is facing the car. the small barn is 5 feet to the left of the woman. the small barn is facing the woman. the large plant is on the chair. the chair is white. the small dog is under the chair. the large pig is .2 feet to the right of the dog. the pig is unreflective. the pig is facing the dog. the man is 1 feet in front of the car. he is facing the car. the man is unreflective. it is sunset. the ground is dark texture. camera-light is red. the light is 5 feet above the plant.
