Learning Multi-Modal Grounded Linguistic Semantics by Playing “I Spy” - PowerPoint PPT Presentation


SLIDE 1

Learning Multi-Modal Grounded Linguistic Semantics by Playing “I Spy”

Jesse Thomason Jivko Sinapov, Maxwell Svetlik, Peter Stone, and Raymond J. Mooney The University of Texas at Austin

SLIDE 2

Grounded Linguistic Semantics

  • Service robots are present in stores, factory floors, hospitals, and offices
  • Need to understand language commands about the environment

SLIDE 3

Grounded Linguistic Semantics

  • “Bring me the empty cup”
  • Learn word meanings in terms of robot perception

SLIDE 4

Grounded Linguistic Semantics

  • Traditionally done in vision space
  • Predicates like “red” and “rectangle” can be learned through vision alone
  • But looking isn’t all humans do
  • “Empty”, “heavy”, “rattles”
  • To understand some predicates, a robot needs to interact with objects beyond vision
  • Equip a robot with both a camera and an arm

SLIDE 5

Multi-Modal Grounded Linguistic Semantics

  • Interact with objects beyond just looking

Behaviors: Grasp, Lift, Lower, Drop, Press, Push

SLIDE 6

Multi-Modal Grounded Linguistic Semantics

  • Represent objects with features from all behaviors
  • Traditional and deep vision features from looking
  • Audio, haptic, and proprioceptive features from manipulation behaviors
  • Different types of features form sensory modalities (see the sketch below)
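
A minimal sketch of this representation in Python with NumPy; every behavior, modality, and function name here is an illustrative assumption, not the authors' code:

```python
import numpy as np

# Hypothetical context inventory: vision modalities come from looking;
# audio/haptic/proprioceptive modalities come from manipulation behaviors.
VISION_MODALITIES = ["color", "shape", "deep"]
MANIPULATION_BEHAVIORS = ["grasp", "lift", "lower", "drop", "press", "push"]
MANIPULATION_MODALITIES = ["audio", "haptics", "proprioception"]

CONTEXTS = [("look", m) for m in VISION_MODALITIES] + [
    (b, m) for b in MANIPULATION_BEHAVIORS for m in MANIPULATION_MODALITIES
]

def make_object_representation(raw_signals):
    """Map each (behavior, modality) context to a fixed-length feature vector."""
    return {ctx: np.asarray(raw_signals[ctx], dtype=float)
            for ctx in CONTEXTS if ctx in raw_signals}
```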

SLIDE 7

Multi-Modal Grounded Linguistic Semantics

  • Every combination of behavior and modality forms an understanding context
  • “Red” in the look + color context
  • “Empty” in the lift + haptic context
  • “Tall” in the look + shape and press + auditory contexts
  • Predicate classifiers composed of confidence-weighted votes from context classifiers

SLIDE 8

Learning Multi-Modal Grounded Linguistic Semantics

  • Connect human language to features of sensory contexts
  • Need labeled training data

– This object is pink and short

  • How do humans describe objects in question?
  • Past work uses the “I Spy” game (Parde et al., 2015)

SLIDE 9

Learning Multi-Modal Grounded Linguistic Semantics by Playing “I Spy”

  • Let the human and robot take turns describing objects
  • Human descriptions give positive examples
  • Robot descriptions are followed up with dialog for both positive and negative examples (see the sketch below)
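
A high-level sketch of one game round under these rules; the `human` and `robot` interfaces are hypothetical stand-ins for the dialog system:

```python
def play_round(objects, human, robot):
    # Human turn: the described (and eventually confirmed) object gives
    # positive examples for each word in the description.
    target, words = human.take_turn(objects)
    robot.guess_until_correct(words, objects, target)
    robot.add_examples(target, words, label=+1)

    # Robot turn: describe a chosen object, then ask follow-up questions
    # that yield both positive and negative examples.
    chosen, used_words = robot.take_turn(objects)
    for word, answer in robot.ask_followups(chosen, human):
        robot.add_examples(chosen, [word], label=+1 if answer else -1)
```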

SLIDE 10

“An empty metallic aluminum container”

SLIDE 11

Initially, the robot has no training data and guesses objects at random (see the sketch below).
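
One way to implement the guess, sketched with an assumed score(word, obj) interface returning decisions in [-1, 1]: rank objects by summed predicate confidence over the known words, breaking ties randomly, so an untrained robot guesses uniformly at random:

```python
import random

def guess_object(description_words, objects, score, known_predicates):
    """Pick the object whose summed predicate confidence is highest."""
    known = [w for w in description_words if w in known_predicates]
    scored = [(sum(score(w, obj) for w in known), random.random(), obj)
              for obj in objects]
    return max(scored)[2]  # highest total score; random tie-break
```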

SLIDE 12

Learning Multi-Modal Grounded Linguistic Semantics by Playing “I Spy”

  • System remembered positive and negative object examples for each predicate

Predicates: empty, metallic, aluminum, container, pink, yellow

SLIDE 13

Learning Multi-Modal Grounded Linguistic Semantics by Playing “I Spy”

  • Train predicate classifiers from positive and negative object examples (see the sketch below)

empty: positive and negative example objects
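
A minimal training sketch using scikit-learn (an assumption; the paper's exact learner may differ), fitting one classifier per sensory context from the remembered object examples:

```python
from sklearn.svm import SVC

def train_predicate_classifiers(labeled_objects, features):
    """
    labeled_objects: list of (object_id, +1 or -1) pairs for one predicate.
    features: object_id -> {context: feature_vector}.
    Returns one trained classifier per context.
    """
    classifiers = {}
    labels = [y for _, y in labeled_objects]
    if len(set(labels)) < 2:
        return classifiers  # need at least one positive and one negative
    contexts = next(iter(features.values())).keys()
    for ctx in contexts:
        X = [features[obj][ctx] for obj, _ in labeled_objects]
        classifiers[ctx] = SVC(kernel="linear").fit(X, labels)
    return classifiers
```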

SLIDE 14

Learning Multi-Modal Grounded Linguistic Semantics by Playing “I Spy”

  • Predicate classifiers are a weighted vote of trained context classifiers giving decisions in [-1, 1] representing confidence (see the sketch below)

empty?

Behavior / Modality   color   …   audio   haptics
look                  0.02    …   …       …
lift                  …       …   0.04    0.8
drop                  …       …   0.4     0.02
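
A minimal sketch of the confidence-weighted vote, assuming per-context decisions like those in the table above and hypothetical reliability weights:

```python
def predicate_decision(context_decisions, weights):
    """
    context_decisions: context -> decision in [-1, 1] for one object.
    weights: context -> reliability estimate in [0, 1] (an assumed
    stand-in for the system's confidence weighting).
    Returns an overall decision in [-1, 1].
    """
    norm = sum(weights[c] for c in context_decisions) or 1.0
    return sum(weights[c] * d for c, d in context_decisions.items()) / norm
```

Normalizing by the total weight keeps the combined decision in [-1, 1], so a predicate like “empty” can be dominated by a reliable lift + haptics context even when vision contexts are uninformative.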

SLIDE 15

Learning Multi-Modal Grounded Linguistic Semantics by Playing “I Spy”

  • Use predicate classifier confidences to decide how to describe a chosen object to the human (see the sketch below)

tub (+.8), light (+.7), tall (+.9), pink (+.02), short (-.8), half-full (-.05), empty (+.6)
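
A sketch of the selection step over scores like these (the threshold and cutoff are illustrative assumptions): keep the predicates whose confidence clears a threshold and describe with the top few.

```python
def choose_description(scores, k=3, threshold=0.5):
    """scores: predicate -> decision in [-1, 1] on the chosen object."""
    confident = sorted((s, p) for p, s in scores.items() if s > threshold)
    return [p for _, p in confident[-k:]]  # k most confident predicates

scores = {"tub": .8, "light": .7, "tall": .9, "pink": .02,
          "short": -.8, "half-full": -.05, "empty": .6}
print(choose_description(scores))  # ['light', 'tub', 'tall']
```

On these scores the sketch yields “light”, “tub”, and “tall”, matching the robot's description on the next slide.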

SLIDE 16
  • Follow-up dialog gathers both positive and negative examples


“I am thinking of an object I would describe as light and tall and tub.”

Robot Turn

SLIDE 17

“Would you describe this object as light?”
“Would you describe this object as tall?”
“Would you describe this object as tub?”
“Would you describe this object as pink?”
“Would you describe this object as half-full?”

Robot Turn
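
A sketch of generating these follow-ups; asking about each predicate used plus a few extras matches the questions above, but the policy for picking the extras is an assumption:

```python
import random

def followup_questions(used, vocabulary, n_extra=2):
    """Ask about each predicate used in the description, plus a few
    other predicates that still need labels."""
    candidates = [p for p in vocabulary if p not in used]
    extras = random.sample(candidates, min(n_extra, len(candidates)))
    return [f"Would you describe this object as {p}?"
            for p in list(used) + extras]
```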

SLIDE 18

Playing “I Spy”

  • Divided 32 objects into training folds of 8 each
  • 10 participants played 4 games each with the robot; 4 objects per game

SLIDE 19

Playing “I Spy”

  • Robot started with no vocabulary for the first fold of 8 objects
  • After each fold, a learning phase allowed lexical acquisition and grounding
  • Measured game performance on novel objects as more learning had taken place

SLIDE 20

Evaluating Multi-Modal Grounding

  • Two learning algorithms compared: a vision-only baseline and the multi-modal system
  • During learning, the vision-only baseline considered only the look behavior (see the sketch below)
  • Users were unaware there were multiple systems but interacted with both in 2 games each

– All 8 objects were seen by both systems per user

  • Measured how many guesses the robot needed to find the correct object
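
In sketch form, the baseline amounts to filtering the context inventory from the earlier representation sketch down to the look behavior:

```python
def vision_only_contexts(contexts):
    """Keep only (behavior, modality) contexts produced by looking."""
    return [(behavior, modality) for behavior, modality in contexts
            if behavior == "look"]
```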

SLIDE 21

Results for Robot Guesses

Table of robot guesses per fold. Bold: lower than the fold 0 average. *: lower than the vision-only baseline.

SLIDE 22

Results for Predicate Agreement

  • Leave-one-object-out cross validation across predicate labels on objects (74 predicates learned in total)
  • *: significantly greater with p < 0.05
  • +: trending greater with p < 0.1

Metric      vision only   multi-modal
precision   .250          .378+
recall      .179          .348*
F1          .196          .354*
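
A sketch of the evaluation protocol with scikit-learn metrics (the train/predict interfaces are assumptions): hold out each object, train on the rest, and score predicted predicate labels on the held-out object.

```python
from sklearn.metrics import precision_recall_fscore_support

def leave_one_object_out(objects, labels, train_fn, predict_fn):
    """
    labels: object -> list of (predicate, +1 or -1) human judgments.
    train_fn(objects) -> model; predict_fn(model, predicate, obj) -> +1/-1.
    Returns (precision, recall, F1) over all held-out predictions.
    """
    y_true, y_pred = [], []
    for held_out in objects:
        model = train_fn([o for o in objects if o != held_out])
        for predicate, label in labels[held_out]:
            y_true.append(label)
            y_pred.append(predict_fn(model, predicate, held_out))
    return precision_recall_fscore_support(
        y_true, y_pred, average="binary", pos_label=1)[:3]
```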

SLIDE 23

Correlations to Physical Properties

  • Pearson’s r between a predicate’s decisions in [-1, 1] on objects and their height and weight
  • The vision-only system learns no predicates with correlations p < 0.05 and |r| > 0.5
  • The multi-modal system learns correlated predicates:

– “tall” with height (r = .521)
– “small” against weight (r = -.665)
– “water” with weight (r = .549)
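
The correlation itself is plain Pearson's r, e.g. with SciPy (the decision and height values below are illustrative, not the paper's data):

```python
from scipy.stats import pearsonr

# One predicate's decisions in [-1, 1] over objects, and the objects'
# measured heights; r is near +1 here by construction.
decisions = [0.9, 0.7, -0.3, -0.8, 0.5]
heights_cm = [30.0, 25.0, 12.0, 8.0, 20.0]
r, p = pearsonr(decisions, heights_cm)
```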

SLIDE 24

“A tall blue cylindrical container”

SLIDE 25

Conclusions

  • We move beyond vision for grounding language predicates
  • Auditory, haptic, and proprioceptive senses help understand the words humans use to describe objects
  • Some predicates are assisted by multi-modal grounding

– “tall”, “wide”, “small”

  • Some can be impossible without it

– “half-full”, “rattles”, “empty”

SLIDE 26

Future Work

  • Use one-class classification to remove the need for negative examples

– Move beyond “I Spy” to object retrieval alone

  • Detect polysemy across modalities, as for the predicate “light” (color versus weight)
  • Explore only as needed on novel objects (see the sketch below)

– If a predicate like “pink” has known relevant context look + color, perform only the look behavior to decide
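
A future-work sketch of the exploration idea; the reliable-context map is hypothetical:

```python
def behaviors_needed(predicate, reliable_contexts):
    """
    reliable_contexts: predicate -> set of (behavior, modality) contexts
    known to be informative for it. Perform only those behaviors.
    """
    return {behavior for behavior, _ in reliable_contexts.get(predicate, ())}

# e.g. behaviors_needed("pink", {"pink": {("look", "color")}}) == {"look"}
```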

SLIDE 27

Learning Multi-Modal Grounded Linguistic Semantics by Playing “I Spy”

Jesse Thomason Jivko Sinapov, Maxwell Svetlik, Peter Stone, and Raymond J. Mooney The University of Texas at Austin

SLIDE 28

Learning Multi-Modal Grounded Linguistic Semantics by Playing “I Spy”


https://youtu.be/jLHzRXPCi_w