Linguistic scaffolds for policy learning (Jacob Andreas, Berkeley)

slide-1
SLIDE 1

Jacob Andreas Berkeley → Microsoft Semantic Machines → MIT

Linguistic scaffolds for policy learning

slide-2
SLIDE 2

Linguistic scaffolds for policy learning

(what can language do for RL?)

Jacob Andreas Berkeley → Microsoft Semantic Machines → MIT

slide-3
SLIDE 3

An NLPer’s view of RL

( , R)

slide-4
SLIDE 4

An NLPer’s view of RL

( , R)

memorize 1 reward fn

slide-5
SLIDE 5

An NLPer’s view of RL

( , R)

( , R1) ( , R2)

[e.g. Taylor & Stone 09]

memorize k reward fns

slide-6
SLIDE 6

An NLPer’s view of RL

( , R)

( , R1) ( , R2)

( , R1)

(-2, 3)

( , R1)

(-2, -2)

Learn to accomplish new goals!

[e.g. Schaul et al. 15]
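
One way to read the goal-conditioned setting on this slide, as a minimal Python sketch: the policy receives the goal (for example (-2, 3)) as an extra input, so one function can be asked to reach targets it has not seen before. The grid world, the sparse `goal_reward`, and the hand-written `goal_conditioned_policy` are illustrative assumptions, not the method of the cited work.

```python
from typing import Tuple

Pos = Tuple[int, int]

# Assumed action set for a toy grid world.
ACTIONS = {"NORTH": (0, 1), "SOUTH": (0, -1), "EAST": (1, 0), "WEST": (-1, 0)}


def goal_reward(state: Pos, goal: Pos) -> float:
    """Assumed sparse reward: 1 at the goal, 0 elsewhere."""
    return 1.0 if state == goal else 0.0


def goal_conditioned_policy(state: Pos, goal: Pos) -> str:
    """Hand-written stand-in for a learned pi(a | s, g): step toward the goal."""
    def distance_after(action: str) -> int:
        dx, dy = ACTIONS[action]
        return abs(state[0] + dx - goal[0]) + abs(state[1] + dy - goal[1])

    return min(ACTIONS, key=distance_after)


def rollout(start: Pos, goal: Pos, max_steps: int = 20):
    """Run the policy until the goal is reached or the step budget runs out."""
    state, total = start, 0.0
    for _ in range(max_steps):
        if state == goal:
            break
        dx, dy = ACTIONS[goal_conditioned_policy(state, goal)]
        state = (state[0] + dx, state[1] + dy)
        total += goal_reward(state, goal)
    return state, total


# The same policy is reused for both goals shown on the slide.
final_a, reward_a = rollout((0, 0), (-2, 3))
final_b, reward_b = rollout((0, 0), (-2, -2))
```

Swapping the coordinate pair for a string such as "run northwest" is the step the next slide takes: the conditioning input becomes an instruction rather than a goal vector.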

slide-7
SLIDE 7

An NLPer’s view of RL

( , R)

( , R1) ( , R2)

( , R1)

run northwest

( , R1)

go southwest

( , R1)

(-2, 3)

( , R1)

(-2, -2)

Learn to follow instructions!

slide-8
SLIDE 8

Instructions as observations

( , R)

( , R1) ( , R2)

( , R1)

run northwest

( , R1)

go southwest

( , R1)

(-2, 3)

( , R1)

(-2, -2)

slide-9
SLIDE 9

Instructions as observations

( , R)

( , R1) ( , R2)

( , R1)

run northwest

( , R1)

go southwest

( , R1)

(-2, 3)

( , R1)

(-2, -2)

slide-10
SLIDE 10

Beyond observations

(1) Instructions are moves in a game, not observations of an environment.

( , R1)

run northwest

( , R1)

go southwest

( , R1)

(-2, 3)

( , R1)

(-2, -2)

slide-11
SLIDE 11

Beyond goals

( , R1)

???

( , R1)

not so fast

( , R1)

run northwest

( , R1)

go southwest

(2) There’s more to language learning than instruction following!

slide-12
SLIDE 12

Language use as gameplay

slide-13
SLIDE 13

Generation & understanding

[Anderson et al. 18]

Turn right and walk through the kitchen. Go right into the living room and stop by the rug.

slide-14
SLIDE 14

A reference game

[Frank & Goodman 12]

slide-15
SLIDE 15

“glasses”

[Frank & Goodman 12]

slide-16
SLIDE 16

“glasses”

[Frank & Goodman 12]

slide-17
SLIDE 17

“glasses”

[Frank & Goodman 12]

slide-18
SLIDE 18

“glasses”

[Frank & Goodman 12]

slide-19
SLIDE 19

The rational speech acts model

[Frank & Goodman 12, Degen 13]

L0( . | glasses): 1/2, 1/2

L0( . | hat): 1

slide-20
SLIDE 20

The rational speech acts model

L0( . | glasses): 1/2, 1/2

L0( . | hat): 1

S1( glasses | . ) ∝ L0( . | glasses): 1, 1/3

S1( hat | . ): 2/3

[Frank & Goodman 12, Degen 13]

slide-21
SLIDE 21

The rational speech acts model

S1( glasses | . ) ∝ L0( . | glasses): 1, 1/3

S1( hat | . ): 2/3

L1( . | glasses ) ∝ S1( glasses | . ): 3/4, 1/4

L1( . | hat ): 1

[Frank & Goodman 12, Degen 13]
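
The numbers on these rational speech acts slides can be reproduced with a short Python sketch. It assumes three referents (a plain face, a face with glasses only, and a face with a hat and glasses), two utterances, a uniform prior, and no utterance cost; the names `OBJECTS`, `TRUE_OF`, and the three functions are illustrative, not taken from the cited papers.

```python
from fractions import Fraction

# Assumed referents and the utterances that are literally true of them.
OBJECTS = ["plain", "glasses_only", "hat_and_glasses"]
UTTERANCES = ["glasses", "hat"]
TRUE_OF = {
    "glasses": {"glasses_only", "hat_and_glasses"},
    "hat": {"hat_and_glasses"},
}


def normalize(weights):
    """Scale weights to sum to 1; leave an all-zero table unchanged."""
    total = sum(weights.values())
    if total == 0:
        return dict(weights)
    return {key: value / total for key, value in weights.items()}


def literal_listener(utterance):
    """L0( . | utterance): uniform over referents the utterance is true of."""
    return normalize(
        {obj: Fraction(int(obj in TRUE_OF[utterance])) for obj in OBJECTS}
    )


def speaker(obj):
    """S1(utterance | obj), proportional to L0(obj | utterance)."""
    return normalize({utt: literal_listener(utt)[obj] for utt in UTTERANCES})


def pragmatic_listener(utterance):
    """L1( . | utterance), proportional to S1(utterance | . )."""
    return normalize({obj: speaker(obj)[utterance] for obj in OBJECTS})


l1_glasses = pragmatic_listener("glasses")
l1_hat = pragmatic_listener("hat")
```

Under these assumptions `l1_glasses` puts 3/4 on the glasses-only face and 1/4 on the face with hat and glasses, and `l1_hat` puts all its mass on the face with the hat, matching the values shown on the slide.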

slide-22
SLIDE 22

Pragmatics

Q: Do you know what time it is?

slide-23
SLIDE 23

Pragmatics

Q: Do you know what time it is?

A: Yes

slide-24
SLIDE 24

Pragmatics

Q: Do you know what time it is?

A: Yes

I find his cooking very interesting.

[Grice 70]

slide-25
SLIDE 25

RSA game tree

[Figure: game tree. Player: speaker. Branch labels: hat, glasses.]

slide-26
SLIDE 26

RSA game tree: as speaker

[Figure: game tree. Players: speaker, listener. Branch labels: hat, glasses. Payoffs: −1, +1, −1, +1.]

slide-27
SLIDE 27

RSA game tree: as speaker

[Figure: game tree. Players: speaker, listener. Branch labels: hat, glasses. Payoffs: −1, +1, −1, +1.]

slide-28
SLIDE 28

RSA game tree: as listener

[Figure: game tree. Players: speaker, listener. Visible branch labels: glasses, glasses. The remaining nodes are marked "?".]

slide-29
SLIDE 29

A recipe for pragmatic language understanding

1. Train a base speaker model

[Figure: faces with example descriptions: smiley; plain; glasses man; glasses; hat & glasses; guy with hat.]

slide-30
SLIDE 30

A recipe for pragmatic language understanding

1. Train a base speaker model
2. Solve this POMDP:

[Figure: game tree. Branch labels: hat, glasses. Payoffs: −1, +1, −1, +1.]

Daniel Fried, Ronghang Hu, Volkan Cirik

Speaker-follower models for vision-and-language navigation. NeurIPS 18.
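
The two-step recipe above can be sketched in Python as a rescoring step: given an instruction and a set of candidate interpretations, the listener prefers the candidate under which a base speaker would most likely have produced that instruction. The function `pragmatic_choice`, the candidate names, and the toy probability table are hypothetical stand-ins for trained models; the cited paper's actual models and inference procedure are not reproduced here.

```python
import math
from typing import Callable, Dict, Sequence, Tuple


def pragmatic_choice(
    instruction: str,
    candidates: Sequence[str],
    speaker_logprob: Callable[[str, str], float],
) -> Tuple[str, Dict[str, float]]:
    """Pick the candidate c maximizing the base speaker's log P(instruction | c)."""
    scores = {c: speaker_logprob(instruction, c) for c in candidates}
    # Normalize into a posterior over candidates (uniform prior assumed).
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    posterior = {c: math.exp(s - log_z) for c, s in scores.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


# Hypothetical base speaker: a lookup table instead of a trained model.
TOY_SPEAKER = {
    ("stop in front of the bed", "route_to_bed"): math.log(0.6),
    ("stop in front of the bed", "route_to_hallway"): math.log(0.2),
}


def toy_speaker_logprob(instruction: str, candidate: str) -> float:
    return TOY_SPEAKER[(instruction, candidate)]


best, posterior = pragmatic_choice(
    "stop in front of the bed",
    ["route_to_hallway", "route_to_bed"],
    toy_speaker_logprob,
)
```

In a navigation setting the candidates would be trajectories proposed by a follower model, which is why the speaker only needs to score a handful of options rather than every possible route.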

slide-31
SLIDE 31

Application: instruction following

human: Go through the door on the right and continue straight. Stop in the next room in front of the bed.

instruction: Go through the door on the right and continue straight. Stop in the next room in front of the bed.

[Figure: top-down overview of trajectories. (a) orange: trajectory without pragmatic inference; (b) green: trajectory with pragmatic inference. Panel labels: baseline policy, reasoning.]

slide-32
SLIDE 32

Application: instruction generation

reasoning: Walk past the dining room table and chairs and take a right into the living room. Stop once you are on the rug.

seq2seq: Walk past the dining room table and chairs and wait there.

human: Turn right and walk through the kitchen. Go right into the living room and stop by the rug.

slide-33
SLIDE 33

Lesson

Utterances are chosen to facilitate correct interpretation in context.

(This makes the learning problem easier!)

slide-34
SLIDE 34

Language as a scaffold for learning

slide-35
SLIDE 35

What else is an instruction follower good for?

Language learning Reinforcement learning

go east of the heart

Learning with latent language. A, Klein & Levine. NAACL 18.

slide-36
SLIDE 36

f( · ; η, )

Pretraining via language learning

NORTH

go east of the heart

[Branavan et al., 09]

π
slide-37
SLIDE 37

L(f( · ; η, ), · )

(Standard) reinforcement learning

???

π

R
slide-38
SLIDE 38

Concept learning

find the horse

L(f( · ; η, ), · )

π

R

NORTH,…

slide-39
SLIDE 39

Concept learning

find the horse: −0.52

L(f( · ; η, ), · )

NORTH,…

π

R
slide-40
SLIDE 40

Concept learning

left of heart: 0.33

find the horse: −0.52

L(f( · ; η, ), · )

SOUTH,…

π

R
slide-41
SLIDE 41

Concept learning

left of the heart: 0.33

find the horse: −0.52

heart east side: 0.95

L(f( · ; η, ), · )

SOUTH,…

π

R
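
The search pictured across these concept-learning slides can be sketched in Python as: propose candidate descriptions, run the pretrained instruction follower on each one in the new task, and keep the description that earns the most reward. The `evaluate` callable is a hypothetical stand-in for rolling out f( · ; η, description) and measuring R. The table below reuses the three scores from the slide; pairing each score with a description, and reading the third score as negative, are assumptions based on the order in which they appear.

```python
from typing import Callable, Iterable, Tuple


def best_description(
    candidates: Iterable[str],
    evaluate: Callable[[str], float],
) -> Tuple[str, float]:
    """Return the candidate description with the highest evaluated reward."""
    scored = [(evaluate(description), description) for description in candidates]
    reward, description = max(scored)
    return description, reward


# Assumed pairing of descriptions and rewards (see the note above).
SLIDE_SCORES = {
    "find the horse": -0.52,
    "left of the heart": 0.33,
    "heart east side": 0.95,
}

# Iterating over the dict yields the candidate descriptions.
description, reward = best_description(SLIDE_SCORES, SLIDE_SCORES.__getitem__)
```

The follower's parameters η stay fixed during this search; only the description changes, so the language acts as the space of hypotheses being searched.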
slide-42
SLIDE 42

As multitask learning

go east of the heart find the triangle

arg min_η L(f( | ; η, ))

arg min_η L(f( | ; η, ))

???

[Caruana 97]

Language learning Reinforcement learning

π

π

R

R
slide-43
SLIDE 43

As a language game…

go east of the heart

speaker model listener loss

arg min

???

−0.52

π R
slide-44
SLIDE 44

Results

reach cell on left of triangle

reach square left of triangle

True description

Pred description

slide-45
SLIDE 45

Results: RL

[Plot: average reward against timestep (×1000). Curves: L3 (this work), Multitask, Scratch.]

slide-46
SLIDE 46

Results

(a)

examples: emboldens → emboldecs, kisses → kisses, loneliness → locelicess, vein → veic, dogtrot → dogtrot

input: loonies

true description: change any n to a c

true output: loocies

pred. description: replace all n s with c

pred. output: loocies

slide-47
SLIDE 47

Results: programming by demonstration

Identity: 18

Multitask: 50

Meta: 62

This Work: 76

slide-48
SLIDE 48

Results: locomotion

Modular multitask reinforcement learning with policy sketches. A, Klein & Levine. ICML 2017

north, east, north

slide-49
SLIDE 49

Generalization

[Bar chart: Training vs. Adaptation, comparing This work and Multitask. Bar values: 47, 89, 76, 42.]

slide-50
SLIDE 50

Learning with corrections

Language learning Reinforcement learning

go north a bit more

slide-51
SLIDE 51

f( · ; η, )

Pretraining by learning to correct

NORTH

further east

π

JD Co-Reyes

Guiding policies with language via meta-learning. ICLR 19.

slide-52
SLIDE 52

further east further east

f( · ; η, )

Pretraining by learning to correct

NORTH

further east

π
slide-53
SLIDE 53

f( · ; η, )

Learning from corrections

WEST,…

π

f( · ; η, )

π

f( · ; η, )

π

NORTH,… NORTH,…

go to top further west
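
The interaction on this slide can be read as a loop: the agent acts, receives a short correction such as "further west", and acts again conditioned on every correction so far. The Python sketch below illustrates only that loop. `toy_policy` and `toy_corrector` are hypothetical stand-ins for the learned model f( · ; η, corrections) and the human giving advice, and nothing here reproduces the meta-learning procedure of the cited work.

```python
from typing import List, Optional, Tuple

Pos = Tuple[int, int]

# Assumed vocabulary of corrections and the shift each one suggests.
SHIFT = {
    "further east": (1, 0),
    "further west": (-1, 0),
    "further north": (0, 1),
    "further south": (0, -1),
}


def toy_policy(corrections: List[str]) -> Pos:
    """Stand-in for the policy: the endpoint reached given all corrections so far."""
    x = sum(SHIFT[c][0] for c in corrections)
    y = sum(SHIFT[c][1] for c in corrections)
    return (x, y)


def toy_corrector(endpoint: Pos, goal: Pos) -> Optional[str]:
    """Hypothetical human: names one direction in which the agent fell short."""
    if endpoint[0] < goal[0]:
        return "further east"
    if endpoint[0] > goal[0]:
        return "further west"
    if endpoint[1] < goal[1]:
        return "further north"
    if endpoint[1] > goal[1]:
        return "further south"
    return None


def learn_from_corrections(goal: Pos, max_rounds: int = 10):
    """Alternate acting and receiving advice until no correction is needed."""
    corrections: List[str] = []
    for _ in range(max_rounds):
        advice = toy_corrector(toy_policy(corrections), goal)
        if advice is None:
            break
        corrections.append(advice)
    return toy_policy(corrections), corrections


endpoint, history = learn_from_corrections((-2, 1))
```

Note that the agent never observes the goal or a reward directly in this sketch; the corrections are its only source of information about the task.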

slide-54
SLIDE 54

Touch cyan block. Move closer to magenta block. Move a lot up. Move a little up.

slide-55
SLIDE 55

Enter the blue room. Enter the red room. Exit the blue room. Pick up the blue triangle

slide-56
SLIDE 56

Lesson

Language is useful as side information, not just a goal specification.

Use it with / instead of instructions as a representational bottleneck or interactive advice
slide-57
SLIDE 57

So what comes next?

slide-58
SLIDE 58

What comes next?

Challenges for the field:

slide-59
SLIDE 59

What comes next?

Challenges for the field:

  • huge datasets
slide-60
SLIDE 60

What comes next?

Challenges for the field:

  • huge datasets
  • with fake annotations
slide-61
SLIDE 61

What comes next?

Challenges for the field:

  • huge datasets
  • with fake annotations
  • that look very little like natural language
slide-62
SLIDE 62

What comes next?

Challenges for the field:

  • huge datasets → Learn to make do without an annotation for every rollout!
  • with fake annotations
  • that look very little like natural language

slide-63
SLIDE 63

What comes next?

Challenges for the field:

  • huge datasets → Learn to make do without an annotation for every rollout!
  • with fake annotations → Learn to generalize from fake strings to real ones!
  • that look very little like natural language

slide-64
SLIDE 64

What comes next?

Challenges for the field:

  • huge datasets → Learn to make do without an annotation for every rollout!
  • with fake annotations → Learn to generalize from fake strings to real ones!
  • that look very little like natural language → Pay attention to human evals (or scope claims accordingly)!

slide-65
SLIDE 65

Learn more: Luketina et al., A survey of reinforcement learning informed by natural language

https://arxiv.org/abs/1906.03926

[Figure from the survey: agent-environment loop. Labels: Agent, Environment, Action, State, Reward.]

Task-dependent

  • Language-assisted: Key: Opens a door of the same color as the key. Skull: They come in two varieties, rolling skulls and bouncing skulls ... you must jump over rolling skulls and walk under bouncing skulls.
  • Language-conditional: Go down the ladder and walk right immediately to avoid falling off the conveyor belt, jump to the yellow rope and again to the platform on the right.

Task-independent

  • [...] having the correct key can open the lock [...]
  • [...] known lock and key device was discovered [...]
  • [...] unless the correct key is inserted [...]

Pre-training, Pre-trained: v_key, v_skull, v_ladder, v_rope