Opening the pod bay doors building intelligent agents that can - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Opening the pod bay doors


building intelligent agents that can interpret, generate and learn from natural language

Jacob Andreas, MIT / Microsoft web.mit.edu/jda/www / @jacobandreas

slide-2
SLIDE 2

Following natural language instructions

slide-3
SLIDE 3

https://www.youtube.com/watch?v=-3m-Zu3qgM4

slide-4
SLIDE 4

https://www.youtube.com/watch?v=G_v5B_gYceM

slide-5
SLIDE 5

https://www.marcelvarallo.com/the-ballad-of-roomba-part2/

slide-6
SLIDE 6

[self-driving cars]

https://www.freep.com/story/money/cars/general-motors/2019/07/24/gms-self-driving-car-robot-taxi

slide-7
SLIDE 7

Following natural language instructions

slide-8
SLIDE 8

Instruction following: ingredients

Context: Environment, Actions
Data: Instructions, Supervision

slide-9
SLIDE 9

Context: Environments & Actions

s1 s4 s3 s0

slide-10
SLIDE 10

Context: Environments & Actions

[Figure: the instruction "… Turn left and go towards the sofa ..." in two action spaces. Low-level visuomotor space: turn left, turn left, turn left, turn left, go forward. Panoramic action space: "go towards this direction!"]

s1 s4 s3 s0

slide-11
SLIDE 11

Context: Environments & Actions

s1 s4 s3 s0

a1: go_forward, a2: turn_right

slide-12
SLIDE 12

Context: Environments & Actions

s1 s4 s3 s0

a1: go_forward, a2: turn_right

Looks like an MDP!

slide-13
SLIDE 13

Instructions

s1 s3 s0

go_forward turn_right

slide-14
SLIDE 14

Instructions

s1 s3 s0

go_forward turn_right

Go forward, then turn to face right.

slide-15
SLIDE 15

Instructions

s1 s3 s0

go_forward turn_right

Go forward, then turn to face right. Find the sofa.

slide-16
SLIDE 16

Instructions

s1 s3 s0

go_forward turn_right

Go forward, then turn to face right. Find the sofa. Go forward then face the sofa.

slide-17
SLIDE 17

Supervision

s1 s3 s0

go_forward turn_right

s4

Find the sofa.

slide-18
SLIDE 18

Supervision

s1 s3 s0

go_forward turn_right

s4

Find the sofa.

slide-19
SLIDE 19

Supervision

s1 s3 s0

go_forward turn_right

s4

Find the sofa.

slide-20
SLIDE 20

Supervision

s1 s3 s0

go_forward turn_right

s4

Go forward, then turn to face right. Find the sofa.

slide-21
SLIDE 21

Instruction following: formally

Context:
States S
Actions A
Transitions T: S ⨉ A → S

Data:
Instruction X
Demo Y
Reward R

Goal: find a policy S ⨉ X → A
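A minimal sketch of these ingredients, assuming a grid world with three discrete actions. The state encoding, the `transition` dynamics, and the hand-written `scripted_policy` are illustrative assumptions, not something taken from the talk.

```python
from typing import Callable, List, Tuple

# Hypothetical state encoding: (x, y, heading), heading 0=N, 1=E, 2=S, 3=W.
State = Tuple[int, int, int]
STEP = {0: (0, 1), 1: (1, 0), 2: (0, -1), 3: (-1, 0)}


def transition(state: State, action: str) -> State:
    """Deterministic T: S x A -> S on an unbounded grid (assumed dynamics)."""
    x, y, h = state
    if action == "go_forward":
        dx, dy = STEP[h]
        return (x + dx, y + dy, h)
    if action == "turn_left":
        return (x, y, (h - 1) % 4)
    if action == "turn_right":
        return (x, y, (h + 1) % 4)
    raise ValueError(f"unknown action: {action}")


def rollout(policy: Callable[[State, str], str], instruction: str,
            start: State, max_steps: int = 10) -> Tuple[State, List[str]]:
    """Run a policy S x X -> A until it emits 'stop' or max_steps is reached."""
    state, trace = start, []
    for _ in range(max_steps):
        action = policy(state, instruction)
        if action == "stop":
            break
        trace.append(action)
        state = transition(state, action)
    return state, trace


def scripted_policy(state: State, instruction: str) -> str:
    """Toy stand-in for a learned policy; it ignores the instruction text."""
    if state == (0, 0, 0):
        return "go_forward"
    if state == (0, 1, 0):
        return "turn_right"
    return "stop"


final_state, actions = rollout(
    scripted_policy, "Go forward, then turn to face right.", (0, 0, 0))
```

A learned policy would replace `scripted_policy` and actually read the instruction; the rollout loop stays the same.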

slide-22
SLIDE 22

As machine translation

Move into the living room. Go forward then face the sofa.

go_forward turn_left turn_left go_forward turn_right

slide-23
SLIDE 23

As machine translation

Move into the living room. Go forward then face the sofa.

go_forward turn_left turn_left go_forward turn_right

slide-24
SLIDE 24

As machine translation

Move into the living room. Go forward then face the sofa.

go_forward turn_left turn_left go_forward turn_right

slide-25
SLIDE 25

Approach 1: predicting action sequences

slide-26
SLIDE 26

From instructions to actions

Go through the door and end facing into the next room.

turn_right turn_left go_forward stop

slide-27
SLIDE 27

From instructions to actions

Go through the door and end facing into the next room.

go_forward stop

slide-28
SLIDE 28

From instructions to actions

Key idea: solve this like a normal MDP, with the instruction as part of the state observation.
slide-29
SLIDE 29

From instructions to actions

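Training:
$\max_{\theta}\; p(\text{action} \mid \text{text}, \text{state}; \theta)$
$\max_{\theta}\; \mathbb{E}_{\text{state} \mid \theta}\, R(\text{action} \mid \text{state})$

Evaluation:
$\max_{\text{action}}\; p(\text{action} \mid \text{text}, \text{state}; \theta)$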
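A minimal sketch of the likelihood objective and the evaluation rule, assuming a log-linear policy over bag-of-words indicator features. The feature function, learning rate, and training example are hypothetical; the reward-based training objective is not covered here.

```python
import math
from typing import Dict, List

ACTIONS = ["go_forward", "turn_left", "turn_right", "stop"]


def features(text: str, state: str, action: str) -> List[str]:
    """Hypothetical indicator features: each word and the state, paired with the action."""
    return [f"{w}&{action}" for w in text.lower().split()] + [f"{state}&{action}"]


def action_probs(theta: Dict[str, float], text: str, state: str) -> Dict[str, float]:
    """p(action | text, state; theta) as a softmax over summed feature weights."""
    scores = {a: sum(theta.get(f, 0.0) for f in features(text, state, a))
              for a in ACTIONS}
    top = max(scores.values())
    exp = {a: math.exp(s - top) for a, s in scores.items()}
    total = sum(exp.values())
    return {a: v / total for a, v in exp.items()}


def train_step(theta: Dict[str, float], text: str, state: str,
               gold_action: str, lr: float = 0.5) -> None:
    """One gradient-ascent step on log p(gold_action | text, state; theta)."""
    probs = action_probs(theta, text, state)
    for a in ACTIONS:
        grad = (1.0 if a == gold_action else 0.0) - probs[a]
        for f in features(text, state, a):
            theta[f] = theta.get(f, 0.0) + lr * grad


def act(theta: Dict[str, float], text: str, state: str) -> str:
    """Evaluation: pick the action with the highest probability."""
    probs = action_probs(theta, text, state)
    return max(probs, key=probs.get)


theta: Dict[str, float] = {}
for _ in range(20):
    train_step(theta, "turn left", "s0", "turn_left")
predicted = act(theta, "turn left", "s0")
```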

slide-30
SLIDE 30

Are we there yet?

Go through the door and end facing into the next room.

turn_right turn_left go_forward stop

slide-31
SLIDE 31

Are we there yet?

Go through the door and end facing into the next room.

turn_right turn_left go_forward stop

slide-32
SLIDE 32

Are we there yet?

Go through the door and end facing into the next room.

turn_right turn_left go_forward stop

slide-33
SLIDE 33

Are we there yet?

Key idea: make the state space track both “reading state” and physical state.

slide-34
SLIDE 34

Augmented state spaces

Environment states Se
Environment actions Ae
Reading states Sr
Reading actions Ar

slide-35
SLIDE 35

Augmented state spaces

Environment states Se
Environment actions Ae
Reading states Sr
Reading actions Ar

s1 s3

turn_right

slide-36
SLIDE 36

Augmented state spaces

Environment states Se
Environment actions Ae
Reading states Sr
Reading actions Ar

s1 s3

turn_right

Go forward then face the sofa. Go forward then face the sofa. Go forward then face the sofa. Go forward then face the sofa.

slide-37
SLIDE 37

Augmented state spaces

States S = Se ⨉ Sr
Actions A = Ae ⋃ Ar
Transitions T: S ⨉ A → S
Goal: find a policy S ⨉ X → A
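A minimal sketch of the augmented state space, assuming the reading state is simply the index of the sentence currently being read. The `next_sentence` reading action and the `toy_env` dynamics are hypothetical names introduced for illustration.

```python
from typing import Callable, NamedTuple


class AugState(NamedTuple):
    env: str      # element of Se (environment state)
    reading: int  # element of Sr (index of the sentence being read)


ENV_ACTIONS = {"go_forward", "turn_left", "turn_right"}  # Ae
READ_ACTIONS = {"next_sentence"}                          # Ar (hypothetical)


def step(state: AugState, action: str, n_sentences: int,
         env_transition: Callable[[str, str], str]) -> AugState:
    """T: S x A -> S over S = Se x Sr and A = Ae U Ar."""
    if action in READ_ACTIONS:
        # Reading actions move the pointer and leave the environment alone.
        return AugState(state.env, min(state.reading + 1, n_sentences - 1))
    if action in ENV_ACTIONS:
        # Environment actions change the world and leave the pointer alone.
        return AugState(env_transition(state.env, action), state.reading)
    raise ValueError(f"unknown action: {action}")


def toy_env(env: str, action: str) -> str:
    """Stand-in environment dynamics that just record the action taken."""
    return f"{env}>{action}"


s = AugState("s0", 0)
s = step(s, "go_forward", 2, toy_env)
s = step(s, "next_sentence", 2, toy_env)
```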

slide-38
SLIDE 38

Augmented state spaces: training

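Training:
$\max_{\theta}\; p(\text{action} \mid \text{text}, \text{state}; \theta)$
$\max_{\theta}\; \mathbb{E}_{\text{state} \mid \theta}\, R(\text{action} \mid \text{state})$

Evaluation:
$\max_{\text{action}}\; p(\text{action} \mid \text{text}, \text{state}; \theta)$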

[Branavan et al., ACL ’09]

slide-39
SLIDE 39
slide-40
SLIDE 40

clear the two long columns, and then the row

slide-41
SLIDE 41

Augmented state spaces: better training

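Training:
$\max_{\theta}\; p(\text{action} \mid \text{text}, \text{state}; \theta)$
$\max_{\theta}\; \mathbb{E}_{\text{state} \mid \theta}\, R(\text{action} \mid \text{state})$

Evaluation:
$\max_{\text{action}}\; p(\text{action} \mid \text{text}, \text{state}; \theta)$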

slide-42
SLIDE 42

Learning the reading state

Move into the living room. Go forward then face the sofa.

go_forward turn_left turn_left go_forward turn_right

slide-43
SLIDE 43

Learning the reading state

Move into the living room. Go forward then face the sofa.

go_forward turn_left turn_left go_forward turn_right

slide-44
SLIDE 44

Learning the reading state

Key idea: move “reading state” into the hidden state of an RNN.

[Mei et al., AAAI ’16]

slide-45
SLIDE 45

Learning the reading state

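Training:
$\max_{\theta}\; p(\text{action} \mid \text{text}, \text{state}; \theta)$
$\max_{\theta}\; \mathbb{E}_{\text{state} \mid \theta}\, R(\text{action} \mid \text{state})$

Evaluation:
$\max_{\text{action}}\; p(\text{action} \mid \text{text}, \text{state}; \theta)$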

slide-46
SLIDE 46

human: Walk past hall table. Walk into bedroom. Make left at table clock. Wait at bathroom door threshold.

slide-47
SLIDE 47

Approach 2: predicting constraints

slide-48
SLIDE 48

Actions, goals, constraints

Find a table next to a chair.

go_forward go_forward turn_left go_forward turn_left

slide-49
SLIDE 49

Actions, goals, constraints

[Find] [a table] [next to] [a chair].

go_forward go_forward turn_left go_forward turn_left

slide-50
SLIDE 50

Actions, goals, constraints

[Find] [a table] [next to] [a chair].

slide-51
SLIDE 51

Actions, goals, constraints

[Find] [a table] [next to] [a chair].

slide-52
SLIDE 52

Actions, goals, constraints

Key idea: predict constraints rather than action sequences, and let a planner do the rest of the work.
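A minimal sketch of the division of labor, assuming the predicted constraint has already been turned into a goal test and the planner is plain breadth-first search. The grid dynamics, object layout, and function names are hypothetical.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple

State = Tuple[int, int, int]  # hypothetical (x, y, heading); 0=N, 1=E, 2=S, 3=W
ACTIONS = ["go_forward", "turn_left", "turn_right"]
STEP = {0: (0, 1), 1: (1, 0), 2: (0, -1), 3: (-1, 0)}
TABLES = {(2, 0), (0, 3)}  # assumed object layout
CHAIRS = {(2, 1)}


def transition(state: State, action: str) -> State:
    """Assumed deterministic grid dynamics."""
    x, y, h = state
    if action == "go_forward":
        dx, dy = STEP[h]
        return (x + dx, y + dy, h)
    return (x, y, (h + (1 if action == "turn_right" else -1)) % 4)


def at_table_next_to_chair(state: State) -> bool:
    """Goal test standing in for the constraint 'a table next to a chair'."""
    x, y, _ = state
    neighbors = {(x + dx, y + dy) for dx, dy in STEP.values()}
    return (x, y) in TABLES and bool(neighbors & CHAIRS)


def plan(start: State, goal_test: Callable[[State], bool],
         max_depth: int = 8) -> Optional[List[str]]:
    """Breadth-first search for a shortest action sequence satisfying goal_test."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if goal_test(state):
            return path
        if len(path) >= max_depth:
            continue
        for action in ACTIONS:
            nxt = transition(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None


route = plan((0, 0, 1), at_table_next_to_chair)
```

The language model only has to get the goal test right; the action sequence falls out of search.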

slide-53
SLIDE 53

Predicting constraints

[Find] [a table] [next to] [a chair].

x1 x2 x3 x4 x5 x6

slide-54
SLIDE 54

Predicting constraints

[Find] [a table] [next to] [a chair].

x1 x2 x3 x4 x5 x6

x1? x3? x4?

slide-55
SLIDE 55

Predicting constraints

[Find] [a table] [next to] [a chair].

x1 x2 x3 x4 x5 x6

x6? x5?

slide-56
SLIDE 56

Predicting constraints

[Find] [a table] [next to] [a chair].

x1 x2 x3 x4 x5 x6

x6? x5? adj

slide-57
SLIDE 57

Predicting constraints

[Find] [a table] [next to] [a chair].

x1 x2 x3 x4 x5 x6

x6? x5? adj

slide-58
SLIDE 58

Predicting constraints

[Find] [a table] [next to] [a chair].

x1 x2 x3 x4 x5 x6

? ? ?

slide-59
SLIDE 59

Predicting constraints

[Find] [a table] [next to] [a chair].

x1 x2 x3 x4 x5 x6

? ? ?

slide-60
SLIDE 60

Predicting constraints

[Find] [a table] [next to] [a chair].

x1 x2 x3 x4 x5 x6

rel? obj? obj?

slide-61
SLIDE 61

Learning a constraint parser

[Find] [a table] [next to] [a chair].

x1 x2 x3 x4 x5 x6

rel? obj? obj?

x6? x5? adj

$\max_{\theta}\; p(\text{labels} \mid \text{text}, \text{graph}; \theta)$

slide-62
SLIDE 62

Inferring constraints

[Find] [a table] [next to] [a chair].

x1 x2 x3 x4 x5 x6

rel? obj? obj?

x6? x5? adj

$\max_{\text{labels}}\; p(\text{labels} \mid \text{text}, \text{graph}; \theta)$

slide-63
SLIDE 63

Inferring constraints

[Put] [the cup] [on] [the table].

x1 x2 x3 x4 x5 x6

rel? obj? obj?

x6? x5? adj

$\max_{\text{labels}}\; p(\text{labels} \mid \text{text}, \text{graph}; \theta)$

[Tellex et al., NCAI ’11]

slide-64
SLIDE 64

Logical constraint languages

Find a table next to a chair.

at( x1 ) table( x1 ) next_to( x1 , x2 ) chair( x2 )

$\max_{\theta}\; p(\text{constraint} \mid \text{text}; \theta)$

$\max_{\text{constraint}}\; p(\text{constraint} \mid \text{text}; \theta)$
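A minimal sketch of how a logical constraint like the one above could be checked against a world model, assuming entities with a type and a grid position. The `WORLD` table is hypothetical, and `at` takes the agent position as an extra argument here, which is an assumption about how the slide's one-argument `at( x1 )` would be grounded.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical world model: entity -> (type, grid position).
WORLD: Dict[str, Tuple[str, Tuple[int, int]]] = {
    "t1": ("table", (2, 0)),
    "t2": ("table", (0, 3)),
    "c1": ("chair", (2, 1)),
}


def table(e: str) -> bool:
    return WORLD[e][0] == "table"


def chair(e: str) -> bool:
    return WORLD[e][0] == "chair"


def next_to(a: str, b: str) -> bool:
    """True when the two entities occupy adjacent grid cells."""
    (ax, ay), (bx, by) = WORLD[a][1], WORLD[b][1]
    return abs(ax - bx) + abs(ay - by) == 1


def at(agent_pos: Tuple[int, int], e: str) -> bool:
    """True when the agent stands at the entity's position."""
    return WORLD[e][1] == agent_pos


def satisfying_assignments(agent_pos: Tuple[int, int]) -> List[Tuple[str, str]]:
    """All (x1, x2) with at(x1), table(x1), next_to(x1, x2), chair(x2)."""
    return [(x1, x2) for x1, x2 in product(WORLD, repeat=2)
            if at(agent_pos, x1) and table(x1) and next_to(x1, x2) and chair(x2)]
```

A state satisfies the constraint when at least one assignment exists, so `satisfying_assignments((2, 0))` is non-empty while `satisfying_assignments((0, 3))` is empty in this toy world.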

slide-65
SLIDE 65
slide-66
SLIDE 66

Logical constraint languages

Find a table next to a chair.

at( x1 ) table( x1 ) next_to( x1 , x2 ) chair( x2 )

$\max_{\theta}\; p(\text{constraint} \mid \text{text}; \theta)$

$\max_{\text{constraint}}\; p(\text{constraint} \mid \text{text}; \theta)$

slide-67
SLIDE 67

Logical constraint languages

[Figure: map with positions A to E on a grid (x and y from 1 to 5; headings 0°, 90°, 180°, 270°)]

(a) chair: λx.chair(x) ⇢ {D, E}
(b) hall: λx.hall(x) ⇢ {A, B}
(c) the chair: ιx.chair(x) ⇢ E
(d) you: you ⇢ C
(e) blue hall: λx.hall(x) ∧ blue(x) ⇢ {B}
(f) chair in the intersection: λx.chair(x) ∧ intersect(ιy.junction(y), x) ⇢ {E}
(g) in front of you: λx.in_front_of(you, x) ⇢ {A, B, E}

[Artzi et al., TACL ’13]

slide-68
SLIDE 68

Constraints without logic

go_forward turn_left turn_left go_forward turn_right


Find a table next to a chair.

slide-69
SLIDE 69

Constraints without logic

Key idea: use freeform learned potential functions rather than symbolic constraints

[Andreas & Klein, EMNLP ’16]

slide-70
SLIDE 70

Constraints without logic

go_forward turn_left turn_left go_forward turn_right


Find a table next to a chair.

slide-71
SLIDE 71

Constraints without logic

go_forward turn_left turn_left go_forward turn_right


Find a table next to a chair.

slide-72
SLIDE 72

Constraints without logic

Find a table next to a chair.

$\max_{\theta,\,\text{alignment}}\; \dfrac{f(\text{plan}, \text{alignment} \mid \text{text}; \theta)}{\sum_{\text{plan}',\,\text{alignment}'} f(\text{plan}', \text{alignment}' \mid \text{text}; \theta)}$

$\max_{\text{plan},\,\text{alignment}}\; f(\text{plan}, \text{alignment} \mid \text{text}; \theta)$

slide-73
SLIDE 73

Constraints without logic

Clear the columns, then the row

slide-74
SLIDE 74

Constraints without logic

Clear the columns, then the row

(no “column”!)

slide-75
SLIDE 75

[Janner et al., TACL ’18]

slide-76
SLIDE 76

Our toolkit so far

slide-77
SLIDE 77

Instruction following

Track progress over time

In the underlying state space or RNN state

Plan ahead and reason about outcomes

With a symbolic planner or learned cost function

Act in complex environments

With expressive policies that condition on instructions and observations

slide-78
SLIDE 78

What else can we do?

slide-79
SLIDE 79

Application: instruction generation

slide-80
SLIDE 80

Instruction following

Move into the living room. Go forward then face the sofa.

go_forward turn_left turn_left go_forward turn_right

slide-81
SLIDE 81

Instruction generation

Move into the living room. Go forward then face the sofa.

go_forward turn_left turn_left go_forward turn_right

slide-82
SLIDE 82

Predicting action sequences

find a sofa

go_forward turn_left turn_left go_forward turn_right

slide-83
SLIDE 83

Instruction generation

Key idea: a good instruction gets readers to their goal with high probability (whatever the training data says!)

slide-84
SLIDE 84

Instruction generation

$\max_{\text{text}}\; p(\text{text} \mid \text{plan}; \theta)$

Max posterior probability (“how do people describe this?”)

slide-85
SLIDE 85

Instruction generation

$\max_{\text{text}}\; p(\text{text} \mid \text{plan}; \theta)$

Max posterior probability (“how do people describe this?”)

$\max_{\text{text}}\; p(\text{plan} \mid \text{text}; \theta)$

Min Bayes risk (“how do I make people do this?”)
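A minimal sketch contrasting the two selection rules, assuming both models have already scored a small candidate set for one fixed plan. The probability tables are made-up numbers for illustration only; in practice they would come from a learned speaker and a learned instruction follower.

```python
from typing import Dict, List

# Hypothetical scores for one fixed plan (turning left at a brick intersection).
SPEAKER: Dict[str, float] = {   # p(text | plan)
    "I will make a turn.": 0.5,
    "I will go straight through.": 0.1,
    "I will turn left at the brick intersection.": 0.4,
}
LISTENER: Dict[str, float] = {  # p(plan | text)
    "I will make a turn.": 0.3,
    "I will go straight through.": 0.05,
    "I will turn left at the brick intersection.": 0.9,
}


def max_posterior(candidates: List[str]) -> str:
    """'How do people describe this?': argmax over text of p(text | plan)."""
    return max(candidates, key=lambda text: SPEAKER[text])


def listener_aware(candidates: List[str]) -> str:
    """'How do I make people do this?': argmax over text of p(plan | text)."""
    return max(candidates, key=lambda text: LISTENER[text])


candidates = list(SPEAKER)
typical = max_posterior(candidates)
effective = listener_aware(candidates)
```

With these made-up numbers the first rule picks the common but vague sentence, while the second picks the one most likely to make a listener recover the intended plan.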

slide-86
SLIDE 86

Reasoning about outcomes

Instruction follower

I will make a turn.

$\max_{\text{text}}\; p(\text{plan} \mid \text{text}; \theta)$

slide-87
SLIDE 87

Reasoning about outcomes

Listener I will make a turn.

$\max_{\text{text}}\; p(\text{plan} \mid \text{text}; \theta)$

slide-88
SLIDE 88

Reasoning about outcomes

Listener I will go straight through.

$\max_{\text{text}}\; p(\text{plan} \mid \text{text}; \theta)$

slide-89
SLIDE 89

Reasoning about outcomes

Listener I will turn left at the brick intersection.

$\max_{\text{text}}\; p(\text{plan} \mid \text{text}; \theta)$

[Fried et al., NAACL ’18]

slide-90
SLIDE 90

Reasoning about belief

[Frank & Goodman, Trends in Cog. Sci. ’12]

I will turn left at the brick intersection.

slide-91
SLIDE 91

listener: Walk past the dining room table and chairs and take a right into the living room. Stop once you are on the rug.

speaker: Walk past the dining room table and chairs and wait there.

human: Turn right and walk through the kitchen. Go right into the living room and stop by the rug.

slide-92
SLIDE 92

Application: machine teaching

slide-93
SLIDE 93

Instructions as scaffolds for RL

slide-94
SLIDE 94

Instructions as parameter-tying schemes

slide-95
SLIDE 95

Instructions as parameter-tying schemes

Environment states Se
Environment actions Ae
Reading states Sr
Reading actions Ar

s1 s3

turn_right

Go forward then face the sofa. Go forward then face the sofa. Go forward then face the sofa. Go forward then face the sofa.

slide-96
SLIDE 96

Instructions as parameter-tying schemes

go north, go north, go west
go north, go east, go south
go north, go east, go north, …

slide-97
SLIDE 97

Go north. Go east. Go north.

[Andreas et al., ICML ’17]

slide-98
SLIDE 98

Learning interactively from corrections

slide-99
SLIDE 99

Supervision

s1 s3 s0

go_forward turn_right

s4

slide-100
SLIDE 100

Conditioning on the past

Push the chair against the wall.

go_forward grasp turn_left go_forward release

slide-101
SLIDE 101

go_forward grasp turn_left go_forward release

Conditioning on the past

Push the chair against the wall. No, the red chair.

turn_left grasp go_forward go_forward release

slide-102
SLIDE 102

go_forward grasp turn_left go_forward release

Conditioning on the past

Push the chair against the wall. Now a little to the left.

turn_left grasp go_forward turn_left release turn_left grasp go_forward go_forward release

No, the red chair.

slide-103
SLIDE 103

Conditioning on the past

Key idea: learn to solve problems interactively by conditioning on the whole history of instructions.

[Co-Reyes et al., ICLR ’19]

slide-104
SLIDE 104

Touch cyan block. Move closer to magenta block. Move a lot up. Move a little up.

slide-105
SLIDE 105

Learning with latent language

slide-106
SLIDE 106

Language learning as pretraining

FORWARD

reach the heart

slide-107
SLIDE 107

Structured exploration

slide-108
SLIDE 108

Structured exploration

R = -1 reach the heart

slide-109
SLIDE 109

Structured exploration

reach the heart: R = -1
east of the gold star: R = 0
go to the east of the heart: R = 3

slide-110
SLIDE 110

Structured exploration

Language learning Reinforcement learning

go east of the heart

[Andreas et al., NAACL ’18]
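A minimal sketch of exploring in the space of descriptions, assuming a fixed instruction-following policy whose rollouts can be scored with a reward. The `explore` helper is hypothetical, and the reward table is a stand-in for real rollouts; its values echo the R values shown on the slides, with the pairing to specific descriptions assumed.

```python
from typing import Callable, Dict, List, Tuple


def explore(candidates: List[str],
            reward_of: Callable[[str], float]) -> Tuple[str, Dict[str, float]]:
    """Score each candidate description and keep the best-performing one."""
    rewards = {text: reward_of(text) for text in candidates}
    best = max(rewards, key=rewards.get)
    return best, rewards


# Stand-in for "condition the pretrained policy on this text, roll out, measure reward".
TOY_REWARDS: Dict[str, float] = {
    "reach the heart": -1.0,
    "east of the gold star": 0.0,
    "go to the east of the heart": 3.0,
}

best, rewards = explore(list(TOY_REWARDS), TOY_REWARDS.__getitem__)
```

The search happens over strings rather than over policy parameters, so the policy itself stays frozen during exploration.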

slide-111
SLIDE 111

Structured few-shot learning

examples: emboldens → emboldecs, kisses → kisses, loneliness → locelicess, vein → veic, dogtrot → dogtrot
input: loonies

true description: change any n to a c
pred. description: replace all n s with c

true output: loocies
pred. output: loocies
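A minimal sketch of few-shot learning through a latent description, assuming a small fixed set of candidate descriptions that each map to an executable string transform. The candidate set and the scoring rule are hypothetical; a learned system would propose descriptions and interpret them with trained models.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical candidate descriptions, each paired with an interpreter for it.
CANDIDATES: Dict[str, Callable[[str], str]] = {
    "replace all n s with c": lambda w: w.replace("n", "c"),
    "change any s to a z": lambda w: w.replace("s", "z"),
    "leave the word unchanged": lambda w: w,
}

EXAMPLES: List[Tuple[str, str]] = [
    ("emboldens", "emboldecs"),
    ("kisses", "kisses"),
    ("loneliness", "locelicess"),
    ("vein", "veic"),
    ("dogtrot", "dogtrot"),
]


def score(description: str) -> int:
    """Number of examples the description reproduces exactly."""
    transform = CANDIDATES[description]
    return sum(transform(src) == tgt for src, tgt in EXAMPLES)


# Infer the description that best explains the examples, then apply it to a new input.
best_description = max(CANDIDATES, key=score)
prediction = CANDIDATES[best_description]("loonies")
```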

slide-112
SLIDE 112

Structured few-shot learning

examples: emboldens → emboldecs, kisses → kisses, loneliness → locelicess, vein → veic, dogtrot → dogtrot
input: loonies

true description: change any n to a c
pred. description: replace all n s with c

true output: loocies
pred. output: loocies

slide-113
SLIDE 113
slide-114
SLIDE 114

Future challenges

slide-115
SLIDE 115

Fake data

Touch cyan block. Move closer to magenta block. Move a lot up.

slide-116
SLIDE 116

Fake data

Touch cyan block. Move closer to magenta block. Move a lot up.

“Instructions” are synthesized from a grammar because of sample inefficiency!

slide-117
SLIDE 117

Fake data

[Figure: chart with an axis from 25 to 100; train on synthetic language, then compare test on synthetic with test on real]

[Blukis et al., CoRL ’18]

slide-118
SLIDE 118

Fake data

Start by eliciting real user utterances
Use synthetic data to augment, not replace, natural language
Sim-to-real transfer?

slide-119
SLIDE 119

Neural planning

Clear the columns, then the row

slide-120
SLIDE 120

Natural language subgoals

Solve the puzzle. Clear out the right half of the puzzle. Clear the remaining blocks. Remove all the long columns. Clear a row. Clear a column.

slide-121
SLIDE 121

Conclusions

Instruction following ⇒ other tasks

Language generation, machine teaching, structured exploration

Challenges

Better data efficiency, smarter inference

Instruction following ⇔ policy learning

But need to think carefully about state tracking, 
 planning, compositionality

slide-122
SLIDE 122

References

Branavan et al. Reinforcement learning for mapping instructions to actions. ACL 2009.
Mei et al. Listen, attend, and walk: neural mapping of navigational instructions to action sequences. AAAI 2016.
Tellex et al. Understanding natural language commands for robotic navigation and mobile manipulation. NCAI 2011.
Andreas & Klein. Alignment-based compositional semantics for instruction following. EMNLP 2016.
Andreas et al. Modular multitask reinforcement learning with policy sketches. ICML 2017.
Fried et al. Unified pragmatic models for generating and following instructions. NAACL 2018.
Co-Reyes et al. Guiding policies with language via meta-learning. ICLR 2019.
Andreas et al. Learning with latent language. NAACL 2018.