

slide-1
SLIDE 1

Continually Improving Grounded Natural Language Understanding through Human-Robot Dialog

Jesse Thomason University of Texas at Austin Ph.D. Defense

slide-2
SLIDE 2

Human-Robot Dialog

2

slide-3
SLIDE 3

Human-Robot Dialog

3

“alert me if her heart rate decreases” “bring me his chart” “go and get the family” “scalpel” “text me when the speaker arrives” “grab the empty, green bottle” “lead him to alice’s office” “get out of the way”

slide-4
SLIDE 4

NLP Robotics Dialog Human-Robot Dialog

4

slide-5
SLIDE 5

5

Dialog Human-Robot Dialog

Natural Language Understanding Dialog Policy Robot Behavior Robot Perception Corpus of Language Commands Corpus of conversations Algorithms for this Platform

NLP Robotics

slide-6
SLIDE 6

6

Dialog Human-Robot Dialog

Natural Language Understanding Dialog Policy Robot Behavior Robot Perception Corpus of Language Commands Corpus of conversations Algorithms for this Platform

NLP Robotics

slide-7
SLIDE 7

Robot Dialog has Multiple Low-Resource Problems

7

  • My work:

○ Develop algorithms for human-robot understanding that overcome sparse training data.
○ Use dialog to correctly perform user requests and better understand future requests.

slide-8
SLIDE 8

Dialog Papers before proposal

Polysemy Induction and Synonymy Detection (IJCAI’17) Improving Semantic Parsing through Dialog (IJCAI’15) Learning Groundings with Human Interaction (IJCAI’16)

Human-Robot Dialog

8

NLP Robotics

slide-9
SLIDE 9

Human-Robot Dialog Dialog

Faster Object Exploration for Grounding (AAAI’18) Learning Groundings with Opportunistic Active Learning (CoRL’17) Jointly Improving Parsing & Perception (in submission)

Papers since proposal

9

NLP Robotics

slide-10
SLIDE 10

Dialog Human-Robot Dialog

10

Next Directions NLP Robotics

slide-11
SLIDE 11

NLP Dialog Papers before proposal

Polysemy Induction and Synonymy Detection (IJCAI’17) Improving Semantic Parsing through Dialog (IJCAI’15) Learning Groundings with Human Interaction (IJCAI’16)

Human-Robot Dialog

11

[Thomason et al., IJCAI’15]

Robotics

slide-12
SLIDE 12

User Natural Language Understanding Dialog Agent Dialog Policy I think I should navigate to room 3

Dialog for Robots

12

“Walk to the kitchen by the lab.” task: navigate goal: room_3 “You want me to go to room 3?”

slide-13
SLIDE 13

User Natural Language Understanding Dialog Agent Dialog Policy I should navigate to room 3

Dialog for Robots

13

“Yes.” task: navigate goal: room_3 Robot Behavior

slide-14
SLIDE 14

Natural Language Understanding

Natural Language Understanding

14

Semantic Parser Annotated World Knowledge task: navigate goal: something that is both a kitchen and is adjacent to a lab

[Thomason et al., IJCAI’15]

“Walk to the kitchen by the lab.” task: navigate goal: room_3
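As a rough illustration of this step (a minimal sketch, not the thesis implementation; the knowledge-base layout and the rooms other than room_3 are assumptions), the parsed goal "something that is both a kitchen and is adjacent to a lab" can be resolved against annotated world knowledge like this:

```python
# Minimal sketch: resolving a conjunctive goal against annotated world knowledge.
# Room names other than room_3 and the KB format are illustrative assumptions.
WORLD = {
    "kitchen": {"room_3"},
    "lab": {"room_2", "room_4"},
    "adjacent": {("room_3", "room_2"), ("room_2", "room_3"),
                 ("room_4", "room_5"), ("room_5", "room_4")},
}

def ground_goal():
    """Rooms that are kitchens and are adjacent to at least one lab."""
    return {room for room in WORLD["kitchen"]
            if any((room, lab) in WORLD["adjacent"] for lab in WORLD["lab"])}

# ground_goal() -> {"room_3"}, so "Walk to the kitchen by the lab."
# grounds to task: navigate, goal: room_3.
```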

slide-15
SLIDE 15

Semantic Parser

15

  • The parser can be initialized with low annotator effort.

○ In our experiments, we annotate five sentences.
○ This satisfies the low-resource constraints of human-robot dialog.

  • But more training data will improve performance.

[Thomason et al., IJCAI’15]

slide-16
SLIDE 16

16

Inducing New Training Examples from Dialog

[Thomason et al., IJCAI’15; Artzi and Zettlemoyer, EMNLP’11]

slide-17
SLIDE 17

17

Inducing New Training Examples from Dialog

[Thomason et al., IJCAI’15]

slide-18
SLIDE 18

18

Inducing New Training Examples from Dialog

[Thomason et al., IJCAI’15]

slide-19
SLIDE 19

19

Inducing New Training Examples from Dialog

Induced Training Pairs (fed to the Semantic Parser):
“please bring the item in slot five to dave daniel” → bring(calendar, dave)
“calander” → calendar
“a day planner” → calendar

[Thomason et al., IJCAI’15]

slide-20
SLIDE 20

20

Demonstration

[Thomason et al., IJCAI’15]

slide-21
SLIDE 21

21

Demonstration

[Thomason et al., IJCAI’15]

slide-22
SLIDE 22

22

Demonstration

[Thomason et al., IJCAI’15]

slide-23
SLIDE 23

23

Demonstration

[Thomason et al., IJCAI’15]

slide-24
SLIDE 24

24

Dialogs that Clarify Meaning and Provide Supervision

[Thomason et al., IJCAI’15]

Agent Belief (task, goal, item, person) | Request | Question
(?, ?, ?, ?) | all | “How can I help?” / “Can you reword your original request?”
(navigate, ?, _, _) | goal | “Where should I walk?”
(deliver, _, ?, p) | item | “What should I bring to p?”
(navigate, r, _, _) | confirm | “You want me to walk to r?”
... | ... | ...
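A minimal sketch of this static clarification policy (not the thesis dialog manager; the fallback question for belief states elided on the slide is an assumption):

```python
# Minimal sketch of the clarification policy table above.
def next_question(belief):
    """belief = (task, goal, item, person); None marks an unknown slot."""
    task, goal, item, person = belief
    if task is None:
        return "How can I help?"  # or: "Can you reword your original request?"
    if task == "navigate" and goal is None:
        return "Where should I walk?"
    if task == "navigate":
        return f"You want me to walk to {goal}?"            # confirmation row
    if task == "deliver" and item is None and person is not None:
        return f"What should I bring to {person}?"
    # Remaining rows of the policy table are elided on the slide.
    return "Can you reword your original request?"

# Example: next_question(("deliver", None, None, "dave_daniel"))
# -> "What should I bring to dave_daniel?"
```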

slide-25
SLIDE 25

25

Dialogs that Clarify Meaning and Provide Supervision

[Thomason et al., IJCAI’15]

Agent Belief (task, goal, item, person) | Request | Question
(?, ?, ?, ?) | all | “How can I help?” / “Can you reword your original request?”
(navigate, ?, _, _) | goal | “Where should I walk?”
(deliver, _, ?, p) | item | “What should I bring to p?”
(navigate, r, _, _) | confirm | “You want me to walk to r?”
... | ... | ...

slide-26
SLIDE 26

26

Dialogs that Clarify Meaning and Provide Supervision

[Thomason et al., IJCAI’15]

Expect whole command → Expect item → Expect item → Expect item; final belief: task: deliver, item: calendar, person: dave_daniel

slide-27
SLIDE 27

Technical Contributions

27

  • Design a dialog policy that allows us to pair human language with latent meaning representations.
  • Improve semantic parsing given very little initial in-domain data.

[Thomason et al., IJCAI’15]

slide-28
SLIDE 28

Experiments via Amazon Mechanical Turk

x 50

Semantic Parser Induced Training Pairs

x 4

28

[Thomason et al., IJCAI’15]

slide-29
SLIDE 29

Navigation Dialog Turns

29

[Thomason et al., IJCAI’15]

slide-30
SLIDE 30

Navigation Dialog Turns

30

Induced Training Pairs: “go” → go(room_2)

Dialog excerpt: Robot: How can I help? / Human: go / … / Human: go to dave daniel’s office

[Thomason et al., IJCAI’15]

slide-31
SLIDE 31

Delivery Dialog Turns

31

  • Statistically significant decrease.
  • More arguments: harder to understand, so more to gain from parser training.

[Thomason et al., IJCAI’15]

Qualitative: One user wrote “the robot even fixed my typo when I mispelled calendar!”

slide-32
SLIDE 32

32

Other Findings

[Thomason et al., IJCAI’15]

  • Users rate the system as more understanding and less frustrating.
  • Results replicable on a physical platform.

slide-33
SLIDE 33

NLP Dialog Papers before proposal

Polysemy Induction and Synonymy Detection (IJCAI’17) Improving Semantic Parsing through Dialog (IJCAI’15) Learning Groundings with Human Interaction (IJCAI’16)

Human-Robot Dialog

33

[Thomason et al., IJCAI’15]

Robotics

slide-34
SLIDE 34

NLP Dialog Papers before proposal

Polysemy Induction and Synonymy Detection (IJCAI’17) Improving Semantic Parsing through Dialog (IJCAI’15) Learning Groundings with Human Interaction (IJCAI’16)

Human-Robot Dialog

34

[Thomason et al., IJCAI’16]

Robotics

slide-35
SLIDE 35

User Dialog Agent Dialog Policy Agent Belief Question

We do not yet handle perception information

35

“Get the empty bottle.” Meaning Robot Behavior Natural Language Understanding Semantic Parser Annotated World Knowledge

slide-36
SLIDE 36

User Dialog Agent Dialog Policy Agent Belief Question

We need to perform language grounding

36

“Get the empty bottle.” Meaning Robot Behavior Natural Language Understanding Semantic Parser Annotated World Knowledge Perception Models

slide-37
SLIDE 37

37

empty?

Language Grounding

Perception Models

yes

slide-38
SLIDE 38

38

Language Grounding

  • Symbol grounding problem.
  • Historically use visual space.
  • We use more than vision.

[Harnad, Physica D’90]

slide-39
SLIDE 39

39

Haptic sensors from arm give force information. Audio signals from mic give sound information.

Language Grounding

slide-40
SLIDE 40

Perceptual Grounding

40

Look [Sinapov et al., IJCAI’16; Thomason et al., IJCAI’16; Simonyan and Zisserman, CoRR’14]

color, shape, and deep VGG features.

slide-41
SLIDE 41

Building Perceptual Classifiers

41

SVM trained for predicate p and sensorimotor context c result on object o (p: squishy, c: press-haptics)

[Thomason et al., IJCAI’16]

Few labeled examples, but SVMs can operate on this sparse data.
slide-42
SLIDE 42

Building Perceptual Classifiers

42

SVM trained for predicate p and sensorimotor context c result on object o Decision

[Thomason et al., IJCAI’16]

slide-43
SLIDE 43

Building Perceptual Classifiers

43

SVM trained for predicate p and sensorimotor context c result on object o Decision Sensorimotor Contexts

[Thomason et al., IJCAI’16]

slide-44
SLIDE 44

Building Perceptual Classifiers

44

SVM trained for predicate p and sensorimotor context c result on object o Decision Sensorimotor Contexts Context SVM result

[Thomason et al., IJCAI’16]

slide-45
SLIDE 45

Building Perceptual Classifiers

45

SVM trained for predicate p and sensorimotor context c result on object o Decision Reliability Weight Context SVM result Sensorimotor Contexts

[Thomason et al., IJCAI’16]

slide-46
SLIDE 46

Building Perceptual Classifiers

46

SVM trained for predicate p and sensorimotor context c result on object o

Reliability weights for “squishy”:
sensorimotor context | weight
press-haptics | 0.5
grasp-haptics | 0.3
... | ...
look-VGG | 0.01

Reliability weights estimated from cross-validation

[Thomason et al., IJCAI’16]
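A minimal sketch of this ensemble (feature dictionaries and object arrays are illustrative assumptions; the reliability statistic in the paper is estimated by cross-validation, and plain accuracy is only a stand-in here):

```python
# Minimal sketch: one SVM per (predicate, sensorimotor context), combined with
# reliability weights estimated by cross-validation, as on the slides above.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

class PredicateEnsemble:
    def __init__(self, contexts):
        self.contexts = contexts      # e.g. ["press-haptics", "grasp-haptics", "look-vgg"]
        self.svms = {}                # context -> trained SVM
        self.weights = {}             # context -> reliability weight in [0, 1]

    def fit(self, features, labels):
        """features[context] is an (n_objects, d) array; labels is +/-1 per object."""
        y = np.asarray(labels)
        for c in self.contexts:
            clf = SVC(kernel="linear")
            # Reliability from cross-validation (assumes >= 3 examples per class);
            # the exact statistic in the paper may differ from accuracy.
            acc = cross_val_score(clf, features[c], y, cv=3).mean()
            self.weights[c] = max(0.0, 2 * acc - 1)
            self.svms[c] = clf.fit(features[c], y)

    def decide(self, obj_features):
        """Reliability-weighted vote over contexts for one object's feature dict."""
        score = sum(self.weights[c] * self.svms[c].predict(obj_features[c][None, :])[0]
                    for c in self.contexts)
        return 1 if score > 0 else -1
```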

slide-47
SLIDE 47

Building Perceptual Classifiers

47

SVM trained for predicate p and sensorimotor context c result on object o; reliability weights estimated from cross-validation

Reliability weights for “squishy”:
sensorimotor context | weight
press-haptics | 0.5
grasp-haptics | 0.3
... | ...
look-VGG | 0.01

[Thomason et al., IJCAI’16]

press haptic

slide-48
SLIDE 48

Building Perceptual Classifiers

48

SVM trained for predicate p and sensorimotor context c result on object o; reliability weights estimated from cross-validation

Reliability weights for “squishy”:
sensorimotor context | weight
press-haptics | 0.5
grasp-haptics | 0.3
... | ...
look-VGG | 0.01

[Thomason et al., IJCAI’16]

look VGG

slide-49
SLIDE 49

Technical Contributions

49

  • Ensemble SVMs over multi-modal object features to perform language grounding.
  • Get language labels from a natural language game with human users.

[Thomason et al., IJCAI’16]

squishy press haptic

slide-50
SLIDE 50

50

[Thomason et al., IJCAI’16]

slide-51
SLIDE 51

Experiments Playing I Spy

vs multi-modal vision only

51

[Thomason et al., IJCAI’16]

slide-52
SLIDE 52

Experiments Playing I Spy

Four folds of objects for four rounds of training.

52

[Thomason et al., IJCAI’16]

slide-53
SLIDE 53

Problematic I Spy Object

53

[Thomason et al., IJCAI’16]

Future: Be mindful of object novelty both for the learning algorithm and for human users.

slide-54
SLIDE 54

NLP Dialog Papers before proposal

Polysemy Induction and Synonymy Detection (IJCAI’17) Improving Semantic Parsing through Dialog (IJCAI’15) Learning Groundings with Human Interaction (IJCAI’16)

Human-Robot Dialog

54

[Thomason et al., IJCAI’16]

Robotics

slide-55
SLIDE 55

NLP Robotics Dialog Papers before proposal

Polysemy Induction and Synonymy Detection Improving Semantic Parsing through Dialog (IJCAI’15) Learning Groundings with Human Interaction (IJCAI’16)

Human-Robot Dialog

55

[Thomason et al., IJCAI’17]

slide-56
SLIDE 56

Unsupervised Word Synset Induction

“kiwi” “chinese grapefruit” “kiwi vine”

[Thomason et al., IJCAI’17]

56

slide-57
SLIDE 57

Unsupervised Word Synset Induction

[Thomason et al., IJCAI’17]

“kiwi”, “chinese grapefruit”, “kiwi vine” “kiwi” “kiwi”

57

slide-58
SLIDE 58

NLP Robotics Dialog Papers before proposal

Polysemy Induction and Synonymy Detection Improving Semantic Parsing through Dialog (IJCAI’15) Learning Groundings with Human Interaction (IJCAI’16)

Human-Robot Dialog

58

[Thomason et al., IJCAI’17]

slide-59
SLIDE 59

Human-Robot Dialog NLP Dialog

Faster Object Exploration for Grounding (AAAI’18) Learning Groundings with Opportunistic Active Learning (CoRL’17) Jointly Improving Parsing & Perception (in submission)

Papers since proposal

59

Robotics

slide-60
SLIDE 60

Human-Robot Dialog NLP Dialog

Faster Object Exploration for Grounding (AAAI’18) Learning Groundings with Opportunistic Active Learning (CoRL’17) Jointly Improving Parsing & Perception (in submission)

Papers since proposal

60

[Thomason et al., AAAI’18]

Robotics

slide-61
SLIDE 61

Exploratory Behaviors

61

104s to explore an object once. 520s to explore an object five times. 4.5 hours to fully explore 32 objects.

+hold (5.7s) +look (0.8s)

[Thomason et al., AAAI’18]

slide-62
SLIDE 62

Guiding Exploratory Behaviors

rigid: squishy? press haptic look VGG press? look?

62

[Thomason et al., AAAI’18]

slide-63
SLIDE 63

Guiding Exploratory Behaviors

rigid: squishy press haptic look VGG press haptic look VGG

63

[Thomason et al., AAAI’18]

slide-64
SLIDE 64

Guiding Exploratory Behaviors

64

similarity(rigid, squishy) = cosine similarity between their word-embedding vectors

(2D embedding-space projection along dimensions d1 and d2, showing “mug”, “tall”, “rigid”, and “squishy”.)

[Thomason et al., AAAI’18; Mikolov et al., NIPS’13]

slide-65
SLIDE 65

Shared Structure: Embeddings and Features

65

2D-projection of word embeddings 2D-projection of behavior context features

[Thomason et al., AAAI’18]

slide-66
SLIDE 66

Guiding Exploratory Behaviors using Embeddings

66

Nearest word-embedding predicates to q → reliability weights of the trained neighbor classifiers p → surrogate reliability weights for the new classifiers for q

[Thomason et al., AAAI’18]
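A minimal sketch of this guidance step (the data structures and the neighbor-averaging rule are illustrative assumptions, not the thesis implementation):

```python
# Minimal sketch: surrogate reliability weights for a new predicate q, taken from
# its nearest trained predicates in word-embedding space, so the robot runs only
# the most promising exploratory behaviors first.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def surrogate_weights(q, embeddings, trained_weights, k=3):
    """
    q: new predicate word, e.g. "squishy"
    embeddings: word -> vector (e.g. word2vec [Mikolov et al., NIPS'13])
    trained_weights: predicate -> {context: reliability weight} for known predicates
    """
    neighbors = sorted(trained_weights,
                       key=lambda p: cosine(embeddings[q], embeddings[p]),
                       reverse=True)[:k]
    contexts = trained_weights[neighbors[0]].keys()
    # Average the neighbors' reliability weights per sensorimotor context.
    return {c: float(np.mean([trained_weights[p][c] for p in neighbors]))
            for c in contexts}

# The robot would then explore a new object with the behaviors whose surrogate
# weight is highest (e.g. press-haptics before look-vgg for "squishy").
```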

slide-67
SLIDE 67

Technical Contributions

67

  • Reduce exploration time when learning a new target word.
  • Use word embeddings and human annotations to guide behaviors.

[Thomason et al., AAAI’18]

slide-68
SLIDE 68

Results

(Plots: agreement with gold labels vs. number of behaviors for contents, color, and weight predicates; dotted lines show standard error.)

68

[Thomason et al., AAAI’18]

slide-69
SLIDE 69

69

Other Findings

[Thomason et al., AAAI’18]

  • Human annotations help: “how would you tell if an object is tall?”
  • Human annotations + word embeddings work better than either alone.

(Annotation interface residue: behaviors grasp, drop, lift, lower, look, press, push, hold; object states on table, held, lifted.)

slide-70
SLIDE 70

Human-Robot Dialog NLP Dialog

Faster Object Exploration for Grounding (AAAI’18) Learning Groundings with Opportunistic Active Learning (CoRL’17) Jointly Improving Parsing & Perception (in submission)

Papers since proposal

70

[Thomason et al., AAAI’18]

Robotics

slide-71
SLIDE 71

Human-Robot Dialog NLP Dialog

Faster Object Exploration for Grounding (AAAI’18) Learning Groundings with Opportunistic Active Learning Jointly Improving Parsing & Perception (in submission)

Papers since proposal

71

[Thomason et al., CoRL’17]

Robotics

slide-72
SLIDE 72

d(bottle, ) = 0.8 d(bottle, ) = -0.2 d(bottle, ) = 0.4

Active Learning for Perceptual Questions

72

d(bottle, ) = -0.6

The object for which the predicate classifier is least sure of the predicted label.

[Thomason et al., CoRL’17]

slide-73
SLIDE 73

Active Learning for Perceptual Questions

73

empty: sensorimotor context | w_p,c
lift-haptics | ?
lift-audio | ?
... | ...
look-vgg | ?

bottle: sensorimotor context | w_p,c
look-shape | 0.6
look-vgg | 0.5
... | ...
lower-haptics | 0.02

[Thomason et al., CoRL’17]

slide-74
SLIDE 74

Ask for a label with probability proportional to the classifier’s uncertainty about its least confident training object. Ask for a positive example for any predicate we have insufficient data for.

74

Active Learning for Perceptual Questions

[Thomason et al., CoRL’17]
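A minimal sketch of this question-selection rule (the data structures, normalized decision margins, and the minimum-label threshold are illustrative assumptions):

```python
# Minimal sketch of opportunistic question selection: label queries target the
# object the predicate classifier is least confident about; example queries ask
# for a positive instance of predicates with too little training data.
import random

def pick_label_query(predicate, decision_margin, objects):
    """decision_margin(predicate, obj) returns d in [-1, 1]; |d| is confidence."""
    obj = min(objects, key=lambda o: abs(decision_margin(predicate, o)))
    uncertainty = 1.0 - abs(decision_margin(predicate, obj))
    if random.random() < uncertainty:   # ask with probability proportional to uncertainty
        return f"Could you use the word '{predicate}' when describing this object?", obj
    return None, None

def pick_example_query(predicates, num_labels, min_labels=3):
    """Ask for a positive example of a predicate that lacks training data."""
    starved = [p for p in predicates if num_labels.get(p, 0) < min_labels]
    if starved:
        return f"Can you show me something {random.choice(starved)}?"
    return None
```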

slide-75
SLIDE 75

Ask for a label with probability proportional to the classifier’s uncertainty about its least confident training object. Ask for a positive example for any predicate we have insufficient data for.

75

Active Learning for Perceptual Questions

“Can you show me something empty?” “Could you use the word bottle when describing this object?”

[Thomason et al., CoRL’17]

slide-76
SLIDE 76

[Thomason et al., CoRL’17]

76

slide-77
SLIDE 77

Technical Contributions

77

  • Introduce an opportunistic active learning strategy for getting high-value labels.
  • Show that off-topic questions improve performance.

[Thomason et al., CoRL’17]

“A full, yellow bottle.” “Would you describe this object as full?”

“Show me something red.”

slide-78
SLIDE 78

Experiments with Object Identification

vs Baseline Agent Inquisitive Agent

78

[Thomason et al., CoRL’17]

“Would you describe this object as full?”

“Show me something red.”

slide-79
SLIDE 79

Results

79

[Thomason et al., CoRL’17]

“Would you describe this object as full?”

“Show me something red.”

Baseline Agent

Rated less annoying.

Inquisitive Agent

Correct object more often. Rated better for real-world use.

slide-80
SLIDE 80

Human-Robot Dialog NLP Dialog

Faster Object Exploration for Grounding (AAAI’18) Learning Groundings with Opportunistic Active Learning Jointly Improving Parsing & Perception (in submission)

Papers since proposal

80

[Thomason et al., CoRL’17]

Robotics

slide-81
SLIDE 81

Human-Robot Dialog NLP Dialog

Faster Object Exploration for Grounding (AAAI’18) Learning Groundings with Opportunistic Active Learning (CoRL’17) Jointly Improving Parsing & Perception (in submission)

Papers since proposal

81

[in submission]

Robotics

slide-82
SLIDE 82

User Dialog Agent Dialog Policy Agent Belief Question

Human-Robot Dialog

82

Utterance Meaning Robot Behavior Natural Language Understanding Semantic Parser Annotated World Knowledge Perception Models

slide-83
SLIDE 83

Jointly Improving Parsing and Perception

“Move a rattling container from lounge by the conference room to Bob’s office.”

[in submission]

83

slide-84
SLIDE 84

Experiments via Amazon Mechanical Turk

Semantic Parser Induced Training Pairs

84

[in submission]

Object / Predicate Labels Perception Models

x 113 Training

slide-85
SLIDE 85

Experiments via Amazon Mechanical Turk

Semantic Parser

85

[in submission]

Perception Models

x ~45 Testing - Baseline

slide-86
SLIDE 86

Experiments via Amazon Mechanical Turk

Semantic Parser

86

[in submission]

Perception Models

x ~45 Testing - Perception

Object / Predicate Labels Perception Models

slide-87
SLIDE 87

Getting Object/Predicate Labels in Dialog

87

[in submission]

Object / Predicate Labels Perception Models

slide-88
SLIDE 88

Getting Object/Predicate Labels in Dialog

88

[in submission]

Object / Predicate Labels Perception Models

slide-89
SLIDE 89

Experiments via Amazon Mechanical Turk

Semantic Parser

89

[in submission]

Perception Models

x ~45 Testing - Parsing + Perception

Object / Predicate Labels Perception Models Induced Training Pairs

slide-90
SLIDE 90

Inducing New Training Examples from Dialog

90

[in submission]

Semantic Parser Induced Training Pairs

slide-91
SLIDE 91

Inducing New Training Examples from Dialog

91

[in submission]

Expect whole command → Expect goal; task: navigate, goal: room_3

slide-92
SLIDE 92

92

Inducing New Training Examples from Dialog

Induced Utterance/Denotation Pairs:
“go to the middle lab” → navigate(room_3)
“the lab in the middle” → room_3

[in submission]

slide-93
SLIDE 93

Natural Language Understanding

Natural Language Understanding

93

Semantic Parser + Annotated World Knowledge + Perception Models: candidate meanings of “the lab in the middle” and their denotations:
something that is a lab → room_3, room_7, ...
something that is both a lab and is central → room_3
something that is central → room_3, room_1, ...
... → ...

[in submission]

slide-94
SLIDE 94

94

Inducing New Training Examples from Dialog

Induced Utterance/Denotation Pairs (for the Semantic Parser):
“go to the middle lab” → navigate(room_3)
“the lab in the middle” → room_3

With Annotated World Knowledge and Perception Models, these become Induced Parser Training Data:
“go to the middle lab” → navigate(lab+central)
“the lab in the middle” → lab+central

slide-95
SLIDE 95

Using Embeddings for Out-of-Vocabulary Words

Semantic Parser Induced Training Pairs

95

“deliver java to bob” → task: deliver, item: coffee, person: bob
Word Embeddings: “deliver” → “bring”, “java” → “coffee”

[Mikolov et al., NIPS’13; in submission]

“deliver java to bob”
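A minimal sketch of this out-of-vocabulary substitution (the vocabulary and embedding table are illustrative assumptions):

```python
# Minimal sketch: replace an out-of-vocabulary word with its nearest neighbor
# among words the parser already knows, using word2vec-style embeddings
# [Mikolov et al., NIPS'13].
import numpy as np

def nearest_known_word(oov_word, embeddings, parser_vocab):
    """embeddings: word -> vector; parser_vocab: words with existing lexical entries."""
    v = embeddings[oov_word]
    return max(parser_vocab,
               key=lambda w: np.dot(v, embeddings[w]) /
                             (np.linalg.norm(v) * np.linalg.norm(embeddings[w])))

# e.g. nearest_known_word("java", emb, {"bring", "coffee", "office"}) -> "coffee",
# letting "deliver java to bob" parse as deliver(coffee, bob).
```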

slide-96
SLIDE 96

Using Embeddings to Find Perception Words

96

[Mikolov et al., NIPS’13; in submission]

(2D embedding-space projection along dimensions d1 and d2, showing “long”, “white”, “tall”, and “tower”.)

slide-97
SLIDE 97

Technical Contributions

97

  • Improve both parsing and perception from conversations.
  • Use word embeddings to guide search for synonyms and novel perceptual predicates.

[in submission]

Semantic Parser Perception Models Induced Training Pairs Object / Predicate Labels d1 d2 long white tall tower

slide-98
SLIDE 98

Experiments via Amazon Mechanical Turk

Semantic Parser

98

[in submission]

Perception Models

Untrained Baseline

Semantic Parser Perception Models

Perception Training

Semantic Parser Perception Models

Parsing + Perception Training

Induced Training Pairs Object / Predicate Labels Object / Predicate Labels

slide-99
SLIDE 99

Metric - Semantic F1

99

[in submission]
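The exact scoring details are in the paper; one standard formulation, assuming the predicted semantic form \(\hat{S}\) and the gold form \(S^{*}\) are compared as sets of semantic components (task and argument slots), is:

```latex
P = \frac{|\hat{S} \cap S^{*}|}{|\hat{S}|}, \qquad
R = \frac{|\hat{S} \cap S^{*}|}{|S^{*}|}, \qquad
F_1 = \frac{2PR}{P + R}
```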

slide-100
SLIDE 100

Results - Navigation Task

100

[in submission]

Quantitative - Semantic F1 Qualitative - Usability Rating

slide-101
SLIDE 101

Results - Delivery Task

101

[in submission]

Quantitative - Semantic F1 Qualitative - Usability Rating

slide-102
SLIDE 102

Results - Relocation Task

102

[in submission]

Quantitative - Semantic F1 Qualitative - Usability Rating

slide-103
SLIDE 103

103

[in submission]

slide-104
SLIDE 104

Human-Robot Dialog NLP Dialog

Faster Object Exploration for Grounding (AAAI’18) Learning Groundings with Opportunistic Active Learning (CoRL’17) Jointly Improving Parsing & Perception (in submission)

Papers since proposal

104

[in submission]

Robotics

[in submission]

slide-105
SLIDE 105

NLP Dialog Human-Robot Dialog

105

Next Directions Robotics

slide-106
SLIDE 106

Grounded Predicate Synset Induction

“light” “pale” “small”

106

slide-107
SLIDE 107

Grounded Predicate Synset Induction

107

“light”/“pale” “light”/“small”

slide-108
SLIDE 108

Guided Exploration of New Objects

“Move a rattling container from the kitchen to bob’s office.”

108

rattling?

Perception Models

yes / no

Guided Behavior(s)

slide-109
SLIDE 109

Moving Forward

109

  • The intersection of problems in human-robot dialog is

inherently low-resource.

  • Other parts of NLP, Robotics, and Dialog are not.
  • We can use big data and techniques from these fields

when solving problems in human-robot dialog.

slide-110
SLIDE 110

110

Moving Forward - Using Big Data Where We Can

Very Large Corpus of Unstructured Text

Latent Language Information

Word Embeddings World Knowledge Statistical Scripts ...

slide-111
SLIDE 111

111

Moving Forward - Using Big Data Where We Can

VGG Net

Very Large Corpus of Training Examples

Crowd-sourced (ImageNet) bottle

[Thomason et al., IJCAI’16; Simonyan and Zisserman, CoRR’14]

good features

slide-112
SLIDE 112

112

Moving Forward - Using Big Data Where We Can

Corpus of Object Representations from Exploratory Behaviors

[Burchfiel et al., RSS’17]

good features? Latent Representations

Autoencoders GANs ....

slide-113
SLIDE 113

User Robot

113

Robot Behavior

Moving Forward - Transfer Learning

Corpus of Human-Robot Dialogs

Similar domain shared commands Sharing object representations

slide-114
SLIDE 114

NLP Dialog Papers before proposal

Polysemy Induction and Synonymy Detection (IJCAI’17) Improving Semantic Parsing through Dialog (IJCAI’15) Learning Groundings with Human Interaction (IJCAI’16)

Human-Robot Dialog

114

Robotics

slide-115
SLIDE 115

Human-Robot Dialog NLP Dialog

Faster Object Exploration for Grounding (AAAI’18) Learning Groundings with Opportunistic Active Learning (CoRL’17) Jointly Improving Parsing & Perception (in submission)

Papers since proposal

115

Robotics

slide-116
SLIDE 116

Acknowledgments

Ray Mooney Peter Stone Scott Niekum Stefanie Tellex

slide-117
SLIDE 117

Acknowledgments

Jivko Sinapov Shiqi Zhang Aishwarya Padmakumar Rodolfo Corona Harel Yedidsion Piyush Khandelwal Justin Hart Nick Walker Subhashini Venugopalan Yuqian Jiang

slide-118
SLIDE 118
  • Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog.

Jesse Thomason, Aishwarya Padmakumar, Jivko Sinapov, Nick Walker, Harel Yedidsion, Justin Hart, Peter Stone, Raymond J. Mooney. (in submission)

  • Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions.

Jesse Thomason, Jivko Sinapov, Raymond J. Mooney, and Peter Stone. AAAI’18.

  • Improving Black-box Speech Recognition using Semantic Parsing.

Rodolfo Corona, Jesse Thomason, and Raymond J. Mooney. IJCNLP’17.

  • Opportunistic Active Learning for Grounding Natural Language Descriptions.

Jesse Thomason, Aishwarya Padmakumar, Jivko Sinapov, Justin Hart, Peter Stone, and Raymond J. Mooney. CoRL’17.
  • Multi-Modal Word Synset Induction.

Jesse Thomason and Raymond J. Mooney. IJCAI’17.

  • Integrated Learning of Dialog Strategies and Semantic Parsing.

Aishwarya Padmakumar, Jesse Thomason, Raymond J. Mooney. EACL’17.

  • BWIBots: A platform for bridging the gap between AI and human--robot interaction research.

Piyush Khandelwal, Shiqi Zhang, Jivko Sinapov, Matteo Leonetti, Jesse Thomason, Fangkai Yang, Ilaria Gori, Maxwell Svetlik, Priyanka Khante, Vladimir Lifschitz, J. K. Aggarwal, Raymond Mooney, and Peter Stone. IJRR’17.
  • Learning Multi-Modal Grounded Linguistic Semantics by Playing "I Spy".

Jesse Thomason, Jivko Sinapov, Maxwell Svetlik, Peter Stone, and Raymond J. Mooney. IJCAI’16.

  • Learning to Interpret Natural Language Commands through Human-Robot Dialog.

Jesse Thomason, Shiqi Zhang, Raymond J. Mooney, and Peter Stone. IJCAI’15.

118

slide-119
SLIDE 119

Graded Adjectives

  • Think of gradation as a form of polysemy
  • Semantic parser can use surrounding context
  • Re-ranking of parses, as discussed, can help disambiguate

119

slide-120
SLIDE 120

words

120

“heavy” “mug” “plate”

slide-121
SLIDE 121

words predicates

121

“heavy” “mug” “plate” plate0 mug0 heavy0 heavy1

slide-122
SLIDE 122

Comparative Adjectives

  • E.g. “taller”, “heavier”; take two arguments: obj1, obj2
  • Train classifier on the feature differences between obj1, obj2
  • Can otherwise be handled with existing architecture
  • Superlatives: majority winner object in pairwise comparatives (see the sketch below)
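A minimal sketch of these two bullets (the feature representation and data format are illustrative assumptions):

```python
# Minimal sketch: a comparative-adjective classifier trained on feature differences,
# with superlatives resolved by a pairwise-majority vote.
import numpy as np
from sklearn.svm import SVC

def fit_comparative(pairs, labels, features):
    """pairs: list of (obj1, obj2); labels: 1 if obj1 is, e.g., 'taller' than obj2."""
    X = np.array([features[a] - features[b] for a, b in pairs])
    return SVC(kernel="linear").fit(X, labels)

def superlative(clf, objects, features):
    """Return the object that wins the most pairwise comparisons (e.g. 'tallest')."""
    wins = {o: 0 for o in objects}
    for a in objects:
        for b in objects:
            if a != b and clf.predict((features[a] - features[b])[None, :])[0] == 1:
                wins[a] += 1
    return max(wins, key=wins.get)
```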

122

slide-123
SLIDE 123

Mechanical Turk Qualitative Results

123

[Thomason, IJCAI’15]

slide-124
SLIDE 124

Mechanical Turk Qualitative Results

124

[Thomason, IJCAI’15]

slide-125
SLIDE 125

Multi-modal Representation

125

  • LSA embedding text features; VGG image features

“... most of the oldest known, definitely identified bat fossils were already very similar to modern microbats … ” “... a baseball bat is divided into several regions …” “... about 70% of bat species are insectivores … “ “... hickory has fallen into disfavor over its greater weight, which slows down bat speed … “

Bat Bat Bat Bat

[Thomason et al., IJCAI’17; Deerwester et al., 1990; Simonyan and Zisserman, CoRR’14]
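A minimal sketch of clustering such multi-modal representations into senses; the clustering algorithm here (k-means) is a stand-in assumption and not necessarily the method of the IJCAI’17 paper:

```python
# Minimal sketch: represent each observation of a word by concatenated text (LSA)
# and image (VGG) features, then cluster a word's observations into senses,
# e.g. "bat" (animal) vs. "bat" (baseball).
import numpy as np
from sklearn.cluster import KMeans

def induce_senses(text_feats, image_feats, n_senses=2):
    """text_feats: (n_obs, d_text); image_feats: (n_obs, d_img); returns a sense id per observation."""
    X = np.hstack([text_feats, image_feats])        # multi-modal representation
    return KMeans(n_clusters=n_senses, n_init=10).fit_predict(X)

# Synonymy detection would then compare sense clusters of different words
# (e.g. "kiwi" and "chinese grapefruit") and merge clusters that are close.
```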

slide-126
SLIDE 126

Technical Contributions

126

  • Perform unsupervised, multi-modal sense induction and synonymy detection.
  • Create an ImageNet-like resource without manual annotation.

[Thomason et al., IJCAI’17]

slide-127
SLIDE 127

Results

127

ImageNet Text-only Vision-only Multi-modal

[Thomason et al., IJCAI’17]

slide-128
SLIDE 128

Results

128

[Thomason et al., IJCAI’17]

slide-129
SLIDE 129

Results - Correct Object Selected

Same Question Budget

129

slide-130
SLIDE 130

Results - Users Feeling Understood

130

Same Question Budget

slide-131
SLIDE 131

Results - Users Annoyed

Same Question Budget

131

slide-132
SLIDE 132

Results - Viable for Deployment

132

Same Question Budget

slide-133
SLIDE 133
  • Given an utterance-denotation pair, find a semantic form that is plausible for both

Learning from Denotations

133

“rattling container”

( , )

[Liang and Potts, Annual Review of Linguistics’15]

slide-134
SLIDE 134
  • Use the parser to produce a beam of parses
  • Use the grounder to find the denotations of those parses

Learning from Denotations

134

“rattling container”

( , )

slide-135
SLIDE 135

Learning from Denotations

135

“rattling container”

( , , )

the(λy.(rattling(y)))
the(λy.(rattling(y) ⋀ container(y)))
the(λy.(container(y)))
rattling ⋀ container
...

Semantic Parser

slide-136
SLIDE 136

Learning from Denotations

136

“rattling container”

( , , )

the(λy.(rattling(y)))
the(λy.(rattling(y) ⋀ container(y)))
the(λy.(container(y)))
rattling ⋀ container
...

Grounding Modules Semantic Parser

slide-137
SLIDE 137

Learning from Denotations

137

“rattling container”

( , , )

the(λy.(rattling(y) ⋀ container(y)))

slide-138
SLIDE 138

Learning from Denotations

138

“rattling container”

( , )

the(λy.(rattling(y) ⋀ container(y)))
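A minimal sketch of this procedure (the parser and grounder interfaces are illustrative assumptions):

```python
# Minimal sketch of learning from denotations: parse the utterance into a beam of
# candidate semantic forms, ground each one, and keep as new training data the
# parse whose denotation matches the denotation observed in the dialog.
def induce_parser_example(utterance, observed_denotation, parser, grounder, beam=10):
    """
    parser.parse_beam(utterance, beam) -> candidate logical forms, best-first
    grounder.denote(form) -> set of objects/rooms the form refers to
    """
    for form in parser.parse_beam(utterance, beam):
        if grounder.denote(form) == observed_denotation:
            return (utterance, form)   # e.g. ("rattling container",
                                       #       "the(λy.(rattling(y) ⋀ container(y)))")
    return None                        # no candidate parse is consistent
```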

slide-139
SLIDE 139

[ongoing]

139

slide-140
SLIDE 140

Neural Parsing Methods

  • Recurrent Neural Networks (RNNs) with Attention
  • Sequence-to-Tree encoder-decoder networks

RNN+Attention: “Walk to Alice’s office.” → task: navigate, goal: room_1

[Jia, ACL’16; Dong, ACL’16]

Seq-2-Tree: “Walk to Alice’s office.” → task: navigate, goal: room_1
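A minimal sketch of a sequence-to-sequence parser with dot-product attention in this spirit (dimensions, vocabulary sizes, and the linearized output format are assumptions, not the cited models):

```python
# Minimal sketch of an RNN encoder-decoder parser with dot-product attention.
import torch
import torch.nn as nn

class Seq2SeqParser(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(2 * dim, tgt_vocab)   # [decoder state; attention context]

    def forward(self, src_ids, tgt_ids):
        enc_out, state = self.encoder(self.src_emb(src_ids))         # (B, S, D)
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)      # (B, T, D)
        # Dot-product attention over encoder states at every decoder step.
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))         # (B, T, S)
        context = torch.bmm(torch.softmax(scores, dim=-1), enc_out)  # (B, T, D)
        return self.out(torch.cat([dec_out, context], dim=-1))       # (B, T, V)

# Usage: logits over tokens of a linearized logical form, e.g. "navigate ( room_1 )",
# trained with cross-entropy against gold parses.
parser = Seq2SeqParser(src_vocab=1000, tgt_vocab=200)
logits = parser(torch.randint(0, 1000, (2, 6)), torch.randint(0, 200, (2, 5)))
```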

140

slide-141
SLIDE 141

Neural Perception Models

[Gao, ICRA’16]

141

  • Compress high-dimensional sensorimotor context information using Convolutional Neural Networks (CNNs)

(Diagram: Visual CNN + Haptic CNN → Fusion → “textured?” → yes)
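A minimal sketch of such a fusion model (input shapes and layer sizes are assumptions, loosely following the visual + haptic fusion idea cited above):

```python
# Minimal sketch: a visual CNN and a haptic CNN fused to answer a perceptual
# predicate question such as "textured?".
import torch
import torch.nn as nn

class MultiModalPredicate(nn.Module):
    def __init__(self):
        super().__init__()
        # Visual branch: an RGB image patch of the object (assumed 3x64x64).
        self.visual = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(16 * 4 * 4, 64))
        # Haptic branch: a 6-channel force/torque time series from the arm.
        self.haptic = nn.Sequential(
            nn.Conv1d(6, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(4), nn.Flatten(), nn.Linear(16 * 4, 64))
        # Fusion: concatenate both embeddings, predict yes/no for the predicate.
        self.fusion = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, image, haptics):
        z = torch.cat([self.visual(image), self.haptic(haptics)], dim=-1)
        return torch.sigmoid(self.fusion(z))   # P("textured" applies to the object)

model = MultiModalPredicate()
p = model(torch.randn(1, 3, 64, 64), torch.randn(1, 6, 100))
```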

slide-142
SLIDE 142

Embodied Question Answering

[Das et al., CVPR’18]

142

  • End-to-end deep model for joint parsing and perception