Dialog as a Vehicle for Lifelong Learning of Grounded Language Understanding Systems - PowerPoint PPT Presentation



SLIDE 1

Dialog as a Vehicle for Lifelong Learning of Grounded Language Understanding Systems

Aishwarya Padmakumar

Doctoral Dissertation Defense

1

SLIDE 2

Grounded Language Understanding

2

Mapping natural language to real-world entities

Bring the blue mug from Alice’s office

SLIDE 3

Applications in Service Robotics

Bring the blue mug from Alice’s office

3

SLIDE 4

Standard Supervised Learning Pipeline

Collect Labelled Data → Train Model → Test Model

4

SLIDE 5
Sources of Imperfect Understanding

  • Missing domain specific knowledge
    – Alice’s office is missing in the directory
    – There is no category for mugs in the object detector.
  • Domain shift
    – Train and test images differ (example images omitted)

5

SLIDE 6

Dialog - Clarification

Bring the blue mug from Alice’s office

6

bring( ,●)

SLIDE 7

Dialog - Clarification

Bring the blue mug from Alice’s office
Where should I bring a blue mug from?
Alice Ashcraft’s office
I should bring a blue mug from 3502?
Yes

7

SLIDE 8

Dialog - Improve Models

Bring the blue mug from Alice’s office
Where should I bring a blue mug from?
Alice Ashcraft’s office
I should bring a blue mug from 3502?
Yes

8

Alice’s office ≍ Alice Ashcraft’s office ≍ 3502

SLIDE 9

Dialog - Acquiring Labels

Bring the blue mug from Alice’s office

9

Blue?

SLIDE 10

Dialog - Acquiring Labels

Bring the blue mug from Alice’s office
Would you use the word “blue” to refer to this object?
Yes

10

SLIDE 11

Lifelong Learning

Initial Task(s), Data → Train Model → Test Model; Additional Task(s), Data feed back into Train Model

11

SLIDE 12

Lifelong Learning

Lifelong learning can make models more

  • Generalizable - adapt to a variety of test data distributions
  • Versatile - the same model can be shared between multiple tasks that are not necessarily pre-defined

12

SLIDE 13

Dialog as a Vehicle for Lifelong Learning

  • Lifelong learning systems assume that additional labelled data can be obtained from test-time usage.
  • Dialog systems interact with users by design - these interactions can be leveraged to obtain labelled data.

13

SLIDE 14

My Work

Designing dialog interactions to improve grounded language understanding systems and enabling them to perform lifelong learning.

14

SLIDE 15

Outline

  • Background
  • Integrating Learning of Dialog Strategies and Semantic Parsing (Padmakumar et al., 2017)
  • Opportunistic Active Learning for Grounding Natural Language Descriptions (Thomason et al., 2017)
  • Learning a Policy for Opportunistic Active Learning (Padmakumar et al., 2018)
  • Dialog Policy Learning for Joint Clarification and Active Learning Queries (Padmakumar and Mooney, in submission)
  • Summary
  • New Directions (Padmakumar and Mooney, RoboDial 2020)

15

SLIDE 16

Outline

  • Background
  • Integrating Learning of Dialog Strategies and Semantic Parsing (Padmakumar et al., 2017)
  • Opportunistic Active Learning for Grounding Natural Language Descriptions (Thomason et al., 2017)
  • Learning a Policy for Opportunistic Active Learning (Padmakumar et al., 2018)
  • Dialog Policy Learning for Joint Clarification and Active Learning Queries (Padmakumar and Mooney, in submission)
  • Summary
  • New Directions (Padmakumar and Mooney, RoboDial 2020)

16

Pre-proposal Work

SLIDE 17

Outline

  • Background
  • Integrating Learning of Dialog Strategies and Semantic Parsing (Padmakumar et al., 2017)
  • Opportunistic Active Learning for Grounding Natural Language Descriptions (Thomason et al., 2017)
  • Learning a Policy for Opportunistic Active Learning (Padmakumar et al., 2018)
  • Dialog Policy Learning for Joint Clarification and Active Learning Queries (Padmakumar and Mooney, in submission)
  • Summary
  • New Directions (Padmakumar and Mooney, RoboDial 2020)

17

Post-proposal Work

SLIDE 18

Outline

  • Background
  • Integrating Learning of Dialog Strategies and Semantic Parsing (Padmakumar et al., 2017)
  • Opportunistic Active Learning for Grounding Natural Language Descriptions (Thomason et al., 2017)
  • Learning a Policy for Opportunistic Active Learning (Padmakumar et al., 2018)
  • Dialog Policy Learning for Joint Clarification and Active Learning Queries (Padmakumar and Mooney, in submission)
  • Summary
  • New Directions (Padmakumar and Mooney, RoboDial 2020)

18

SLIDE 19

Background: Parts of a Dialog System

Bring the blue mug from Alice’s office → Semantic Understanding → Grounding → Dialog Policy → Natural Language Generation → Where should I bring a blue mug from?

19

SLIDE 20

Background: Semantic Understanding

Bring the blue mug from Alice’s office → Semantic Understanding → Grounding → Dialog Policy → Natural Language Generation → Where should I bring a blue mug from?

20

SLIDE 21

Background: Semantic Understanding

21

Convert natural language into a machine understandable representation

SLIDE 22

Background: Semantic Understanding

22

Convert natural language into a machine understandable representation

Bring the blue mug from Alice’s office

Semantic parsing -

  • Converts language to a structured meaning representation
  • Compositionality - meaning of “blue mug” from meaning of “blue” and meaning of “mug”

SLIDE 23

Background: Semantic Understanding

23

Convert natural language into a machine understandable representation

Bring the blue mug from Alice’s office

Vector space representations -

  • Convert words/sentences to vectors that represent meaning.
  • Less initial handcrafting
  • More training data
SLIDE 24

Background: Grounding

Bring the blue mug from Alice’s office → Semantic Understanding → Grounding → Dialog Policy → Natural Language Generation → Where should I bring a blue mug from?

24

SLIDE 25

Background: Grounding

Bring the blue mug from Alice’s office → Semantic Understanding → Grounding → Dialog Policy → Natural Language Generation → Where should I bring a blue mug from?

25

SLIDE 26

Background: Grounding

26

Map meaning representations to real world entities

SLIDE 27

Background: Grounding

Map meaning representations to real world entities - Knowledge Base Grounding

Person   Office
alice    3502
bob      3324

→ 3502

27

SLIDE 28

Background: Grounding

28

Map meaning representations to real world entities - Perceptual Grounding

Per-word classifiers over objects: blue / not blue, mug / not mug (classifier figure)
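The per-word classifier idea on this slide can be sketched in a few lines; the threshold classifiers and the two-dimensional "features" below are invented for illustration, not the system's actual models.

```python
# Toy per-word perceptual grounding: one binary classifier per word, applied
# to object feature vectors. Toy features per object: [blueness, mug-shape].

def make_threshold_classifier(index, threshold):
    """Return a classifier that fires when feature `index` exceeds `threshold`."""
    return lambda features: features[index] > threshold

classifiers = {
    "blue": make_threshold_classifier(0, 0.5),
    "mug": make_threshold_classifier(1, 0.5),
}

def grounds(words, features):
    """An object grounds a description if every known word's classifier accepts it."""
    return all(classifiers[w](features) for w in words if w in classifiers)

blue_mug = [0.9, 0.8]
red_plate = [0.1, 0.2]
print(grounds(["blue", "mug"], blue_mug))   # True
print(grounds(["blue", "mug"], red_plate))  # False
```

A real system would replace the threshold functions with classifiers trained on perceptual features.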

SLIDE 29

Background: Dialog Policy

Bring the blue mug from Alice’s office → Semantic Understanding → Grounding → Dialog Policy → Natural Language Generation → Where should I bring a blue mug from?

29

SLIDE 30

Background: Dialog Policy

Bring the blue mug from Alice’s office → Semantic Understanding → Grounding → Dialog Policy → Natural Language Generation → Where should I bring a blue mug from?

30

SLIDE 31

Background: Dialog Policy

Bring the blue mug from Alice’s office → Confirm / Ask Question / Execute

31

Plans the system’s next response.

SLIDE 32

Background: Dialog Policy

  • Dialog state - Information from the dialog so far
  • Dialog policy - Mapping from dialog states to dialog actions (response types / responses)
  • Learned using Reinforcement Learning

32

SLIDE 33

Background: Reinforcement Learning

Markov Decision Process (MDP): the agent takes actions in the environment and receives states and rewards in return (agent-environment loop figure).

33
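The agent-environment loop can be sketched as follows; the toy dialog-flavored environment, its dynamics, and its rewards are illustrative assumptions only.

```python
# Toy MDP loop: the agent picks an action from the state; the environment
# returns the next state and a reward. Dynamics and rewards are invented.

def environment_step(state, action):
    """Deterministic toy dynamics: executing in state 'ready' ends the episode."""
    if state == "ready" and action == "execute":
        return "done", 1.0
    return "ready", -0.1  # small per-turn cost otherwise

def policy(state):
    return "execute" if state == "ready" else "ask"

state, total_reward = "ready", 0.0
while state != "done":
    action = policy(state)
    state, reward = environment_step(state, action)
    total_reward += reward
print(total_reward)  # 1.0
```

In a POMDP (next slide) the agent would act on a belief computed from observations rather than on the state directly.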

SLIDE 34

Background: Reinforcement Learning

Partially Observable Markov Decision Process (POMDP): the environment state is hidden; the agent maintains a belief, takes actions, and receives observations and rewards (agent-environment loop figure).

34

SLIDE 35

Background: Natural Language Generation

Bring the blue mug from Alice’s office → Semantic Understanding → Grounding → Dialog Policy → Natural Language Generation → Where should I bring a blue mug from?

35

SLIDE 36

Background: Natural Language Generation

Bring the blue mug from Alice’s office → Semantic Understanding → Grounding → Dialog Policy → Natural Language Generation → Where should I bring a blue mug from?

36

SLIDE 37

Background: Natural Language Generation

37

ask_param(action=bring, patient=●, src=?) → Where should I bring a blue mug from?

Converting an action to a natural language response

SLIDE 38

Outline

  • Background
  • Integrating Learning of Dialog Strategies and Semantic Parsing (Padmakumar et al., 2017)
  • Opportunistic Active Learning for Grounding Natural Language Descriptions (Thomason et al., 2017)
  • Learning a Policy for Opportunistic Active Learning (Padmakumar et al., 2018)
  • Dialog Policy Learning for Joint Clarification and Active Learning Queries (Padmakumar and Mooney, in submission)
  • Summary
  • New Directions (Padmakumar and Mooney, RoboDial 2020)

38

SLIDE 39

Integrating Learning of Dialog Strategies and Semantic Parsing

Bring the blue mug from Alice’s office → Semantic Understanding → Grounding → Dialog Policy → Natural Language Generation → Where should I bring a blue mug from?

39

[Padmakumar et al., 2017]

SLIDE 40

Prior work: Improving Semantic Parsers from Clarification Dialogs

40

Bring the blue mug from Alice’s office
Where should I bring a blue mug from?
Alice Ashcraft’s office
I should bring a blue mug from 3502?
Yes

Alice’s office ≍ Alice Ashcraft’s office ≍ 3502

[Thomason et al., 2015]

SLIDE 41

Prior Work: Dialog Policy Learning

Bring the blue mug from Alice’s office → Confirm / Ask Question / Execute

41

Learns the best next response by modelling the dialog system as a Partially Observable Markov Decision Process (POMDP).

SLIDE 42

Summary

  • Jointly improving a semantic parser and dialog policy from human interactions is more effective than improving either alone.
  • The training procedure needs to enable changes in components to be propagated to each other for joint learning to be effective.

42

SLIDE 43

Outline

  • Background
  • Integrating Learning of Dialog Strategies and Semantic Parsing (Padmakumar et al., 2017)
  • Opportunistic Active Learning for Grounding Natural Language Descriptions (Thomason et al., 2017)
  • Learning a Policy for Opportunistic Active Learning (Padmakumar et al., 2018)
  • Dialog Policy Learning for Joint Clarification and Active Learning Queries (Padmakumar and Mooney, in submission)
  • Summary
  • New Directions (Padmakumar and Mooney, RoboDial 2020)

43

SLIDE 44

Opportunistic Active Learning for Grounding Natural Language Descriptions

Bring the blue mug from Alice’s office → Semantic Understanding → Grounding → Dialog Policy → Natural Language Generation → Where should I bring a blue mug from?

44

[Thomason et al., 2017]

SLIDE 45

Opportunistic Active Learning

45

  • A framework for incorporating active learning queries into test-time interactions.
  • The agent asks locally convenient questions during an interactive task to collect labeled examples for supervised learning.
  • Questions may not be useful for the current interaction but are expected to help future tasks.

SLIDE 46

Opportunistic Active Learning

Bring the blue mug from Alice’s office

46

Blue?

SLIDE 47

Opportunistic Active Learning

Bring the blue mug from Alice’s office
Would you use the word “blue” to refer to this object?
Yes

47

SLIDE 48

Opportunistic Active Learning

Bring the blue mug from Alice’s office

48

Tall? bring(●, 3502) Heavy?

SLIDE 49

Opportunistic Active Learning

Bring the blue mug from Alice’s office
Would you use the word “tall” to refer to this object?
Yes

49

SLIDE 50

Opportunistic Active Learning

50

?

Query for labels most likely to improve the model.

SLIDE 51

Opportunistic Active Learning

Why ask off-topic queries?

  • Robot may have good models for on-topic concepts.
  • No useful on-topic queries.
  • Some off-topic concepts may be more important because they are used in more interactions.

51

SLIDE 52

Opportunistic Active Learning - Challenges

Some other object might be a better candidate for the question

52

Purple?

SLIDE 53

Opportunistic Active Learning - Challenges

The question interrupts another task and may be seen as unnatural

53

Bring the blue mug from Alice’s office
Would you use the word “tall” to refer to this object?

SLIDE 54

Opportunistic Active Learning - Challenges

The information needs to be useful for a future task.

54

Red?

SLIDE 55

Object Retrieval Task

55

SLIDE 56

Object Retrieval Task

56

  • User describes an object in the active test set
  • Robot needs to identify which object is being described

SLIDE 57

Object Retrieval Task

57

  • Robot can ask questions about objects on the sides to learn object attributes

SLIDE 58

Two Types of Questions

58

SLIDE 59

Two Types of Questions

59

SLIDE 60

Experimental Conditions

60

A yellow water bottle

  • Baseline (on-topic) - the robot can only ask about “yellow”, “water” and “bottle”
  • Inquisitive (on and off topic) - the robot can ask about any concept it knows, possibly “red” or “heavy”

SLIDE 61

Results

  • The inquisitive robot performs better at understanding object descriptions.
  • Users find the robot more comprehending, fun, and usable in a real-world setting when it is opportunistic.

61

SLIDE 62

Outline

  • Background
  • Integrating Learning of Dialog Strategies and Semantic Parsing (Padmakumar et al., 2017)
  • Opportunistic Active Learning for Grounding Natural Language Descriptions (Thomason et al., 2017)
  • Learning a Policy for Opportunistic Active Learning (Padmakumar et al., 2018)
  • Dialog Policy Learning for Joint Clarification and Active Learning Queries (Padmakumar and Mooney, in submission)
  • Summary
  • New Directions (Padmakumar and Mooney, RoboDial 2020)

62

SLIDE 63

Bring the blue mug from Alice’s office → Semantic Understanding → Grounding → Dialog Policy → Natural Language Generation → Where should I bring a blue mug from?

63

Learning a Policy for Opportunistic Active Learning

[Padmakumar et al., 2018]

SLIDE 64

Opportunistic Active Learning

Bring the blue mug from Alice’s office
Would you use the word “tall” to refer to this object?
Yes

64

SLIDE 65

Dialog Policy Learning

Bring the blue mug from Alice’s office

65

Tall? bring(●, 3502) Heavy?

SLIDE 66

Learning a Policy for Opportunistic Active Learning

Learn a dialog policy that decides how many and which questions to ask to improve grounding models.

66

SLIDE 67

Learning a Policy for Opportunistic Active Learning

67

To learn an effective policy, the agent needs to learn -
  – To identify good queries in the opportunistic setting.
  – When a guess is likely to be successful.
  – To trade off between model improvement and task completion.

SLIDE 68

Task Setup

68

Target Description

SLIDE 69

Task Setup

69

SLIDE 70

Task Setup

70

SLIDE 71

Grounding Model

71

“A white umbrella” → {white, umbrella} → Pretrained CNN features → per-word SVMs (white / not white, umbrella / not umbrella)

SLIDE 72

Opportunistic Active Learning

  • Agent starts with no classifiers.
  • Labeled examples are acquired through questions and used to train the classifiers.
  • Agent needs to learn a policy to balance active learning with task completion.

72

SLIDE 73

MDP Model

Dialog Agent ↔ User

State:
  • Target description
  • Active train and test objects
  • Agent’s perceptual classifiers

Action:
  • Label query: <yellow, train_1>
  • Label query: <yellow, train_2>
  • Label query: <white, train_1>
  • Label query: <white, train_2>
  • ...
  • Example query: yellow
  • Example query: white
  • ...
  • Guess

Reward: Max correct guesses with short dialogs

73

SLIDE 74

Challenges

Dialog Agent ↔ User

State:
  • Target description
  • Active train and test objects
  • Agent’s perceptual classifiers

Action:
  • Label query: <yellow, train_1>
  • Label query: <yellow, train_2>
  • Label query: <white, train_1>
  • Label query: <white, train_2>
  • ...
  • Example query: yellow
  • Example query: white
  • ...
  • Guess

Reward: Max correct guesses with short dialogs

How to represent classifiers for policy learning?

74

SLIDE 75

Challenges

Dialog Agent ↔ User

State:
  • Target description
  • Active train and test objects
  • Agent’s perceptual classifiers

Action:
  • Label query: <yellow, train_1>
  • Label query: <yellow, train_2>
  • Label query: <white, train_1>
  • Label query: <white, train_2>
  • ...
  • Example query: yellow
  • Example query: white
  • ...
  • Guess

Reward: Max correct guesses with short dialogs

How to handle a variable and growing action space?

75

SLIDE 76

Tackling challenges

  • Features based on active learning metrics
    – Representing classifiers
  • Featurize state-action pairs
    – Variable number of actions and classifiers
  • Sampling a beam of promising queries
    – Large action space

76
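The first two ideas can be sketched together: compute an active learning metric (here, classifier margin) and pack it into a fixed-length feature vector per state-action pair, so one policy can score a variable number of candidate queries. The feature choices and names are hypothetical, not the exact features used in the thesis.

```python
# Featurizing a state-action (query) pair with active learning metrics.

def margin(probability):
    """Distance of a classifier probability from the 0.5 decision boundary."""
    return abs(probability - 0.5)

def query_features(classifier_probs, candidate, num_labels_so_far):
    """Fixed-length feature vector for one candidate label query."""
    return [
        margin(classifier_probs[candidate]),  # low margin => uncertain => informative
        float(num_labels_so_far),             # how much data the classifier already has
    ]

probs = {"obj_1": 0.52, "obj_2": 0.95}
features = {o: query_features(probs, o, 3) for o in probs}
# A margin-based scorer would query the most uncertain object first.
best = min(probs, key=lambda o: margin(probs[o]))
print(best)  # obj_1
```

Because every query maps to the same fixed-length vector, the action space can grow without changing the policy's input size.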

SLIDE 77

Feature Groups

  • Query features - Active learning metrics used to determine whether a query is useful
  • Guess features - Features that use the predictions and confidences of classifiers to determine whether a guess will be correct

77

SLIDE 78

Experiment Setup

  • Policy learning using REINFORCE.
  • Baseline - a hand-coded dialog policy that asks a fixed number of questions selected using the sampling distribution that provides candidates to the learned policy.

78
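A minimal REINFORCE update for a two-action softmax policy, assuming one weight per action and episode return G; this illustrates the algorithm named above, not the actual policy model used in the experiments.

```python
import math

# REINFORCE: theta += lr * G * grad(log pi(action)). Repeatedly rewarding
# action 0 drives its probability toward 1.

theta = [0.0, 0.0]  # one weight per action

def action_probs(features):
    scores = [theta[a] * features[a] for a in range(2)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # stable softmax
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_update(features, action, G, lr=0.1):
    """Gradient of log-softmax: (1[a == action] - pi(a)) * features[a]."""
    probs = action_probs(features)
    for a in range(2):
        indicator = 1.0 if a == action else 0.0
        theta[a] += lr * G * (indicator - probs[a]) * features[a]

features = [1.0, 1.0]
for _ in range(100):
    reinforce_update(features, action=0, G=1.0)
print(action_probs(features)[0] > 0.9)  # True
```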

SLIDE 79

Experiment Phases

  • Initialization - Collect experience using the baseline to initialize the policy.
  • Training - Improve the policy from on-policy experience.
  • Testing - Policy weights are fixed, and we run a new set of interactions, starting with no classifiers, over an independent test set with different predicates.

79

SLIDE 80

Results

80

  • Systems evaluated on dialog success rate and average dialog length.

SLIDE 81

Results

81

  • Systems evaluated on dialog success rate and average dialog length.
  • We prefer high success rate and low dialog length (top left corner).

SLIDE 82

Results

82

(Legend: Static, Learned)

  • The learned policy is more successful than the baseline, while also using shorter dialogs on average.

SLIDE 83

Results

83

(Legend: Static, Learned, Query ablation, Guess ablation)

  • If we ablate either group of features, the success rate drops considerably, but dialogs are also much shorter.
  • In both cases, the system chooses to ask very few queries.

SLIDE 84

Summary

  • We can learn a dialog policy that acquires knowledge of predicates through opportunistic active learning.
  • The learned policy is more successful at object retrieval than a static baseline, using fewer dialog turns on average.

84

SLIDE 85

Outline

  • Background
  • Integrating Learning of Dialog Strategies and Semantic Parsing (Padmakumar et al., 2017)
  • Opportunistic Active Learning for Grounding Natural Language Descriptions (Thomason et al., 2017)
  • Learning a Policy for Opportunistic Active Learning (Padmakumar et al., 2018)
  • Dialog Policy Learning for Joint Clarification and Active Learning Queries (Padmakumar and Mooney, in submission)
  • Summary
  • New Directions (Padmakumar and Mooney, RoboDial 2020)

85

SLIDE 86

Outline

  • Dialog Policy Learning for Joint Clarification and Active Learning Queries
    – Dialog Policy Learning for Joint Clarification and Active Learning Queries (Padmakumar and Mooney, in submission)
    – Human Evaluation
    – Extension to Joint Embedding Based Grounding Model

86

SLIDE 87

Bring the blue mug from Alice’s office → Semantic Understanding → Grounding → Dialog Policy → Natural Language Generation → Where should I bring a blue mug from?

87

Dialog Policy Learning for Joint Clarification and Active Learning Queries

[Padmakumar and Mooney, in submission]

SLIDE 88

Previous Work

Bring the blue mug from Alice’s office

88

Tall? bring(●, 3502) Heavy?

SLIDE 89

This Work

Bring the blue mug from Alice’s office

89

Tall? bring(●,3502) Heavy?

SLIDE 90

This Work

Bring the blue mug from Alice’s office

90

What should I bring?
Would you use the word “tall” to refer to this object?

SLIDE 91

Dialog Policy Learning for Joint Clarification and Active Learning Queries

91

Clarification + Opportunistic Active Learning + Dialog Policy Learning → This Work

SLIDE 92

Dialog Policy Learning for Joint Clarification and Active Learning Queries

Learn a dialog policy to trade off -

  • Model improvement with opportunistic active learning, to better understand future commands
  • Clarification, to better understand and complete the current command

92

SLIDE 93

Attribute Based Clarification: Motivation

93

Bring the blue mug from Alice’s office
What should I bring?
bring(●, 3502)

SLIDE 94

Attribute Based Clarification: Motivation

94

Bring the blue mug from Alice’s office
What should I bring?
The blue coffee mug
What should I bring?

SLIDE 95

Attribute Based Clarification: Motivation

95

Bring the blue mug from Alice’s office
Is this the object I should bring?
No
Is this the object I should bring?

SLIDE 96

Attribute Based Clarification: Motivation

96

[De Vries et al., 2017] [Das et al., 2017]

SLIDE 97

Attribute Based Clarification

  • More specific than a new description.
  • More general than showing each possible object.
  • Provide ground truth answers to questions for training in simulation.
  • Attribute - any property that can be used in a description: categories, colors, shapes, domain specific properties.

97

SLIDE 98

Attribute Based Clarification: Motivation

98

Bring the blue mug from Alice’s office
Is the object I should bring a cup?

SLIDE 99

Task Setup

  • Motivated by an online shopping application
  • Use clarifications to help refine search queries
  • Use active learning to improve the model retrieving images

99

SLIDE 100

Dataset

  • We simulate dialogs using the iMaterialist Fashion Attribute dataset.
  • Images have associated product titles and are annotated with binary labels for 228 attributes.
  • Attributes: Dress, Shirt, Red, Blue, V-Neck, Pleats, ...

100

SLIDE 101

Task Setup

101

Active Training Set Active Test Set

SLIDE 102

Task Setup

102

What can I help you find?
A Polka Dot Chiffon Blouse
Would you like one which is black?
Yes
Yes
Can you show me something you would describe as chiffon?
Would you describe this as sleeveless?
Is this what you were searching for?
Yes

SLIDE 103

Visual Attribute Classifier

103

SLIDE 104

Visual Attribute Classifier

104

SLIDE 105

Visual Attribute Classifier

105

SLIDE 106

Visual Attribute Classifier

106

SLIDE 107

Visual Attribute Classifier

107

Cross Entropy Loss Over All Examples

SLIDE 108

Visual Attribute Classifier

108

SLIDE 109

Visual Attribute Classifier

109

Cross Entropy Loss Over Positive Labels

SLIDE 110

Grounding Model

A Polka Dot Chiffon Blouse

110

{Polka Dot, Chiffon, Blouse}

SLIDE 111

Grounding Model

111

Belief:

Attributes mentioned in description: “A Polka Dot Chiffon Blouse” → {Polka Dot, Chiffon, Blouse}

SLIDE 112

Grounding Model

112

Belief:

  • Classifier probability that attribute w is positive for image i
  • w-th value in classifier output for image i

A Polka Dot Chiffon Blouse → {Polka Dot, Chiffon, Blouse}

SLIDE 113

Grounding Model

Agent: Would you like one which is black? User: Yes

113

<Black, 1>

Belief:

Clarifications that get the answer “Yes”

SLIDE 114

Grounding Model

Agent: Would you like one which is black? User: No

114

<Black, 0>

Belief:

Clarifications that get the answer “No”

SLIDE 115

Grounding Model

Best guess: Image in active test set with maximum belief b(i)

115
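The belief update and guessing rule sketched on slides 111-115 might look like this, assuming each image's belief is scaled by the classifier probability consistent with the yes/no answer and then renormalized; the exact update in the dissertation may differ.

```python
# Belief update over candidate images from a yes/no clarification answer,
# followed by guessing the max-belief image.

def update_belief(belief, attr_probs, attribute, answer_yes):
    scaled = {}
    for image, b in belief.items():
        p = attr_probs[image][attribute]
        scaled[image] = b * (p if answer_yes else 1.0 - p)
    z = sum(scaled.values()) or 1.0
    return {image: s / z for image, s in scaled.items()}  # renormalize

belief = {"img_1": 0.5, "img_2": 0.5}
attr_probs = {"img_1": {"black": 0.9}, "img_2": {"black": 0.2}}

# Agent: "Would you like one which is black?"  User: "Yes"
belief = update_belief(belief, attr_probs, "black", answer_yes=True)
guess = max(belief, key=belief.get)
print(guess)  # img_1
```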

SLIDE 116

Information Gain

  • For estimating the utility of clarifications
  • Estimated using classifier probabilities
  • Estimate based on Lee et al., 2018

116

SLIDE 117

Information Gain

117

SLIDE 118

Information Gain

118

Objects in Active Test Set

SLIDE 119

Information Gain

119

Possible answers to a clarification: No and Yes

SLIDE 120

Information Gain

120

Belief of image i

SLIDE 121

Information Gain

121

Probability of the answer

SLIDE 122

Information Gain

122

Probability of the answer

For “Yes” Answer:

SLIDE 123

Information Gain

123

Probability of the answer

For “No” Answer:
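Putting slides 117-123 together: the information-gain estimate is the expected reduction in entropy of the belief over the active test set, taken over both possible answers, each weighted by its probability. The helper names are hypothetical; the thesis bases this estimate on Lee et al., 2018.

```python
import math

# Expected information gain of a yes/no clarification: current belief entropy
# minus the expected posterior entropy over both answers.

def entropy(belief):
    return -sum(b * math.log(b) for b in belief.values() if b > 0)

def posterior(belief, attr_probs, attribute, answer_yes):
    scaled = {i: b * (attr_probs[i][attribute] if answer_yes
                      else 1.0 - attr_probs[i][attribute])
              for i, b in belief.items()}
    p_answer = sum(scaled.values())  # probability of this answer under the belief
    z = p_answer or 1.0
    return {i: s / z for i, s in scaled.items()}, p_answer

def information_gain(belief, attr_probs, attribute):
    gain = entropy(belief)
    for answer_yes in (True, False):
        post, p_answer = posterior(belief, attr_probs, attribute, answer_yes)
        gain -= p_answer * entropy(post)
    return gain

belief = {"img_1": 0.5, "img_2": 0.5}
attr_probs = {"img_1": {"black": 0.99}, "img_2": {"black": 0.01}}
# A question that nearly separates the two candidates has high gain.
print(information_gain(belief, attr_probs, "black") > 0.5)  # True
```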

SLIDE 124

Dialog as MDP

Dialog Agent ↔ User

State:
  • Target description
  • Active train and test objects
  • Agent’s perceptual classifiers

Action:
  • Clarifications
  • Label queries
  • Example queries
  • Guess

Reward: Max correct guesses with short dialogs

124

SLIDE 125

Policy Learning

  • Hierarchical Dialog Policy -
    – Clarification policy - chooses the best clarification
    – Active learning policy - chooses the best active learning query
    – Decision policy - chooses between a guess, the best clarification, and the best active learning query

  • Featurize state-action pairs
  • Q-Learning and A3C for policy learning

125
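The hierarchy above can be sketched as two sub-policies feeding a decision policy; the placeholder scoring functions and thresholds below stand in for the learned Q-learning/A3C policies.

```python
# Hierarchical policy sketch: sub-policies nominate the best clarification and
# the best active learning query; a decision policy chooses the action type.

def best_by_score(candidates, score_fn):
    """Sub-policy: return the highest-scoring candidate action."""
    return max(candidates, key=score_fn)

def decision_policy(guess_confidence, clar_value, al_value, turn, max_turns=10):
    """Choose an action type from summary features; thresholds are placeholders."""
    if guess_confidence > 0.8 or turn >= max_turns:
        return "guess"
    return "clarify" if clar_value >= al_value else "active_learning_query"

clarification = best_by_score(["black?", "sleeveless?"], score_fn=len)
al_query = best_by_score(["chiffon", "knit"], score_fn=len)
print(decision_policy(guess_confidence=0.3, clar_value=0.5, al_value=0.2, turn=2))
# clarify
```

The decision policy only sees summary values from the sub-policies, which keeps its input fixed-length regardless of how many candidate queries exist.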

SLIDE 126

Policy Features

  • Clarification Policy Features - Metrics about current beliefs, information gain
  • Active Learning Policy Features - Margin, fraction of previous uses and successes
  • Decision Policy Features - Metrics about current beliefs, information gain, margin, dialog length

126

SLIDE 127

Static Baseline

  • Clarification: Choose the query with maximum information gain
  • Active Learning: Uncertainty sampling
  • Decision Policy
    – Fixed dialog length
    – Clarification till the belief reaches a threshold
    – Active learning for the second half of the dialog

127
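A sketch of this static baseline, with hypothetical thresholds: uncertainty sampling picks the candidate the classifier is least certain about, and a fixed schedule clarifies early (while the belief is low) before switching to active learning.

```python
# Static baseline sketch: uncertainty sampling plus a fixed decision schedule.

def uncertainty_sampling(classifier_probs):
    """Pick the candidate whose probability is closest to 0.5 (least certain)."""
    return min(classifier_probs, key=lambda c: abs(classifier_probs[c] - 0.5))

def static_decision(top_belief, turn, max_turns=10, threshold=0.7):
    if turn >= max_turns:
        return "guess"
    if top_belief < threshold and turn < max_turns // 2:
        return "clarify"                      # first half: clarify
    return "active_learning_query"            # second half: active learning

print(uncertainty_sampling({"red": 0.51, "blue": 0.93}))  # red
print(static_decision(top_belief=0.4, turn=1))            # clarify
```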

SLIDE 128

Experiment Phases

  • Classifier Initialization - Train classifiers using paired images and labels
  • Policy Initialization - Collect experience using the baseline to initialize the policy.
  • Policy Training - Improve the policy from on-policy experience.
  • Policy Testing - Policy weights are fixed; we run a new set of interactions over an independent test set with different predicates, resetting classifiers to their state at the end of classifier initialization.

128

SLIDE 129

Results

129

SLIDE 130

Results

130

Fully learned policy is significantly more successful than the baseline, while also having significantly shorter dialogs on average

SLIDE 131

Results

131

If we replace either the clarification or active learning policies with static policies, we find that the success rate drops considerably.

SLIDE 132

Results

132

If we replace only the decision policy with a static policy, we find that it remains more successful than the baseline but is unable to shorten dialogs.

SLIDE 133

Action Types - Learned Policy

133

SLIDE 134

Utility of Clarifications

134

SLIDE 135

Utility of Clarifications

135

SLIDE 136

Summary

  • We train a hierarchical dialog policy to trade off opportunistic active learning, attribute based clarification, and task completion in a language based image retrieval task.
  • Our learned policy is more successful than a static baseline while using fewer dialog turns on average.
  • In our task setup, both good clarifications and active learning queries are necessary to improve performance over direct retrieval.

136

SLIDE 137

Outline

  • Dialog Policy Learning for Joint Clarification and Active Learning Queries
    – Dialog Policy Learning for Joint Clarification and Active Learning Queries (Padmakumar and Mooney, in submission)
    – Human Evaluation
    – Extension to Joint Embedding Based Grounding Model

137

SLIDE 138

Human Evaluation - Experiment Changes

  • Descriptions from human users contained far fewer attributes than product titles
  • Changes in task setup -
    – Provide one attribute from the product title as the simulated description
    – Smaller and easier active test set

138

SLIDE 139

Experiment Interface

139

SLIDE 140

Experiment Interface

140

SLIDE 141

Experiment

  • Initialization, training and test phases run in the new simulated setup
  • Run a single batch of interactions on Amazon Mechanical Turk with the final policy and classifiers
141

SLIDE 142

Results

142

The learned policy is considerably more successful in the new simulated setup but is unable to shorten dialogs compared to the baseline.

SLIDE 143

Results

143

  • The performance of both policies drops in AMT interactions.
  • The learned policy is still somewhat more successful (p <= 0.1)
SLIDE 144

Outline

  • Dialog Policy Learning for Joint Clarification and Active Learning Queries
    – Dialog Policy Learning for Joint Clarification and Active Learning Queries (Padmakumar and Mooney, in submission)
    – Human Evaluation
    – Extension to Grounding Model Based on Joint Embeddings

144

SLIDE 145

Motivation

  • Independent classifiers cannot identify correlations between properties
  • Multilabel classifiers assume a fixed set of properties

145

SLIDE 146

Grounding Model

146

SLIDE 147

Grounding Model

147

  • Represent words and images as vectors in the same space.
  • Words are near images they apply to, and vice versa.

SLIDE 148

Grounding Model

To ground a description, such as “blue mug”, find the image which minimizes the sum of distances to the words.

148
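The grounding rule just stated can be written directly, assuming image embeddings f(i) and word embeddings g(w) in a shared space; the 2-d toy embeddings below are made up for illustration.

```python
import math

# Sum-of-distances grounding in a joint embedding space: pick the image whose
# embedding minimizes the total distance to the description's word embeddings.

f = {"img_mug": (0.9, 0.8), "img_plate": (0.1, 0.1)}  # image embeddings (toy)
g = {"blue": (1.0, 0.7), "mug": (0.8, 0.9)}           # word embeddings (toy)

def ground(words):
    return min(f, key=lambda i: sum(math.dist(f[i], g[w]) for w in words))

print(ground(["blue", "mug"]))  # img_mug
```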

SLIDE 149

Grounding Model

To ground a description, such as “blue mug”, find the image which minimizes the sum of distances to the words.

149

SLIDE 150

Grounding Model

To ground a description, such as “blue mug”, find the image which minimizes the sum of distances to the words.

150

SLIDE 151

Grounding Model

To ground a description, such as “blue mug”, find the image which minimizes the sum of distances to the words.

151

SLIDE 152

Grounding Model

152

SLIDE 153

Grounding Model

d(f(●), g(blue)) ≤ d(f(●), g(blue))
d(f(●), g(blue)) ≤ d(f(●), g(pink))

153

  • Constraints captured using a ranking loss
  • Platt scaling parameters are trained using log loss

SLIDE 154

Preliminary Results

Clarifications with a high estimate of information gain do not necessarily increase the belief of the correct target image.

154

SLIDE 155

Discussion

Possible reasons why our estimate of information gain is not able to identify helpful clarifications -

  • Noise in annotations used to provide responses
  • Grounding model does not produce a true probability distribution

155

SLIDE 156

Future Work

  • Better learned spaces - possibly using pretrained models such as ViLBERT, LXMERT
  • Techniques such as an adversarial loss to make the learned space smoother

156

SLIDE 157

Outline

  • Background
  • Integrating Learning of Dialog Strategies and Semantic Parsing (Padmakumar et al., 2017)
  • Opportunistic Active Learning for Grounding Natural Language Descriptions (Thomason et al., 2017)
  • Learning a Policy for Opportunistic Active Learning (Padmakumar et al., 2018)
  • Dialog Policy Learning for Joint Clarification and Active Learning Queries (Padmakumar and Mooney, in submission)
  • Summary
  • New Directions (Padmakumar and Mooney, RoboDial 2020)

157

SLIDE 158

Dialog as a Vehicle for Lifelong Learning of Grounded Language Understanding Systems

158

SLIDE 159

Joint Parser and Policy Learning

Bring the blue mug from Alice’s office → Semantic Understanding → Grounding → Dialog Policy → Natural Language Generation → Where should I bring a blue mug from?

159

SLIDE 160

Policy Learning for Opportunistic Active Learning

160

SLIDE 161

Dialog Policy Learning for Joint Clarification and Active Learning Queries

161

What can I help you find?
A Polka Dot Chiffon Blouse
Would you like one which is black?
Yes
Yes
Can you show me something you would describe as knit?
Would you describe this as sleeveless?
Is this what you were searching for?
Yes

SLIDE 162

Outline

  • Background
  • Integrating Learning of Dialog Strategies and Semantic Parsing (Padmakumar et al., 2017)
  • Opportunistic Active Learning for Grounding Natural Language Descriptions (Thomason et al., 2017)
  • Learning a Policy for Opportunistic Active Learning (Padmakumar et al., 2018)
  • Dialog Policy Learning for Joint Clarification and Active Learning Queries (Padmakumar and Mooney, in submission)
  • Summary
  • New Directions (Padmakumar and Mooney, RoboDial 2020)

162

SLIDE 163

Dialog as a Vehicle for Lifelong Learning

  • New challenge area for dialog researchers
  • Goal: Design dialog systems that can better support lifelong learning

[Padmakumar and Mooney, RoboDial 2020]

163

SLIDE 164

Challenges: Active Learning

  • Improving sample complexity
  • Few-shot adaptation of pretrained models
  • Better robustness and transferability of RL policies for active learning

164

SLIDE 165

Challenge: Dialog Act Design

Design new dialog acts that collect labeled data or combine this with task-completion objectives.

Can you show me how to open this with a knife?

165

SLIDE 166

Challenges: Dataset Collection and Simulation

  • Designing simulations to answer a wide range of queries.
  • Providing “correct” answers in simulation.
  • Sim2Real transfer

166

SLIDE 167

Challenges: User Experience

  • Prosodic analysis to identify urgency, stress, sarcasm and frustration in users, to determine when it is appropriate to include or avoid data collection queries.
  • Demonstrating few-shot learning to keep users motivated.

167

SLIDE 168

Dialog as a Vehicle for Lifelong Learning of Grounded Language Understanding Systems

Aishwarya Padmakumar

Doctoral Dissertation Defense

168