Language Understanding for Text-based Games Using Deep - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Language Understanding for Text-based Games Using Deep Reinforcement Learning

Karthik Narasimhan, Tejas Kulkarni, Regina Barzilay MIT

slide-2
SLIDE 2

Text-based games

MUDs: predecessors to modern graphical games

(State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind.

>> go east

(State 2: Ruined gatehouse) The old gatehouse is near collapse. Part of its northern wall has already fallen down ... East of the gatehouse leads out to a small open area surrounded by the remains of the castle. …

slide-3
SLIDE 3

Why are they challenging?

(State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind.

Location: Bridge 1, Wind level: 3, Time: 8pm

No symbolic representation available

Branavan et al., 2011

slide-4
SLIDE 4

Understanding ≈ Actionable intelligence

Can a computer understand language well enough to play these games?

slide-5
SLIDE 5

Inspiration: Playing graphical games directly from raw pixels (DeepMind)

Can a computer understand language well enough to play these games?

slide-6
SLIDE 6

Our Approach

Reinforcement Learning utilizing in-game feedback to:

✦ Learn control policies for gameplay.
✦ Learn good representations for text descriptions of game state.

slide-7
SLIDE 7

Traditional RL framework

[Diagram: state sequence s1, s2, s3, …, st with actions a1, a2, a3 and a reward signal]

s = Location: Bridge 1, Wind level: 3, Time: 8pm

Q(s, a)

Q-value is the agent’s notion of discounted future reward
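As a rough illustration of "discounted future reward", the sketch below sums a reward sequence with a discount factor; Q(s, a) is the expected value of this quantity when the agent takes action a in state s. The reward sequence and the value of `gamma` are hypothetical and not taken from the slides.

```python
def discounted_return(rewards, gamma=0.5):
    """Sum of rewards, each discounted by gamma per elapsed time step."""
    total = 0.0
    # Work backwards so each step folds in the already-discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: no reward for two steps, then +1 (e.g. a quest completed).
rewards = [0.0, 0.0, 1.0]
print(discounted_return(rewards, gamma=0.5))  # 0.25
```

A smaller `gamma` makes the agent value immediate rewards more heavily than distant ones.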

slide-8
SLIDE 8

Text-based games

[Diagram: state sequence s1, s2, s3, …, st with actions a1, a2, a3, a reward signal, and the label "s ="]

(State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ...

Location: Bridge 1, Wind level: 3, Time: 8pm

slide-9
SLIDE 9

Text-based games: BOW representation

[Diagram: state sequence s1, s2, s3, …, st with actions a1, a2, a3, a reward signal, and the label "s ="]

(State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ...

Bag of words?

[Figure: column vector]

slide-10
SLIDE 10

[Diagram: Input text → T (bag of words) → Q (control policy)]
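A minimal sketch of a bag-of-words state vector, assuming a simple lowercase alphabetic tokenizer and a vocabulary built from the descriptions themselves (both are illustrative choices, not the authors' preprocessing). The example sentences are taken from the Home World slides.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(texts):
    """Assign each distinct word a fixed index."""
    words = sorted({w for t in texts for w in tokenize(t)})
    return {w: i for i, w in enumerate(words)}


def bag_of_words(text, vocab):
    """Count vector over the vocabulary; word order is discarded."""
    vec = [0] * len(vocab)
    for word, count in Counter(tokenize(text)).items():
        if word in vocab:
            vec[vocab[word]] = count
    return vec


descriptions = [
    "This room has two sofas, chairs and a chandelier.",
    "You are not sleepy now but you are hungry now.",
]
vocab = build_vocab(descriptions)
state_vector = bag_of_words(descriptions[1], vocab)

# Reordering the words leaves the vector unchanged.
same = bag_of_words("now you are hungry", vocab) == bag_of_words(
    "you are hungry now", vocab
)
print(same)  # True
```

Because the vector ignores word order, two descriptions with different meanings but the same words map to the same state.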

Can we do better?

slide-11
SLIDE 11

Model

[Diagram: Input text → T → v → Q → Q values for all commands]

T: Recurrent NN to map text to vector representation v

slide-12
SLIDE 12

Model

[Diagram: Input text → T → v → Q → Q values for all commands]

T: Recurrent NN to map text to vector representation v
Q: NN for control policy

slide-13
SLIDE 13

LSTM-DQN

[Architecture diagram]

Representation Generator (φR): words w1, w2, w3, …, wn → LSTM (one step per word) → Mean Pooling → vs

Action-Object Scorer (φA): vs → Linear → ReLU → two Linear layers → Q(s, a) and Q(s, o)
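The data flow from the LSTM outputs to the two Q-value heads can be sketched in plain Python. This is a shape-level illustration only: the per-word LSTM outputs are replaced by random stand-in vectors, the weights are untrained, and every size (`HIDDEN`, `N_ACTIONS`, `N_OBJECTS`) is a hypothetical choice.

```python
import random


def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, value) for value in x]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero bias (untrained, for illustration)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def score(word_outputs, params):
    """Mean-pool per-word vectors, then score actions and objects."""
    v_s = mean_pool(word_outputs)                   # state representation
    hidden = relu(linear(v_s, *params["shared"]))   # Linear + ReLU
    q_actions = linear(hidden, *params["action"])   # Q(s, a)
    q_objects = linear(hidden, *params["object"])   # Q(s, o)
    return q_actions, q_objects


HIDDEN, N_ACTIONS, N_OBJECTS = 4, 3, 5  # hypothetical sizes
rng = random.Random(0)
params = {
    "shared": init_layer(HIDDEN, HIDDEN, rng),
    "action": init_layer(N_ACTIONS, HIDDEN, rng),
    "object": init_layer(N_OBJECTS, HIDDEN, rng),
}

# Stand-in for the LSTM output at each of six words in a description.
word_outputs = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(6)]
q_actions, q_objects = score(word_outputs, params)
print(len(q_actions), len(q_objects))  # 3 5
```

Scoring actions and objects with separate heads keeps the output size at the sum, rather than the product, of the two vocabularies.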

slide-14
SLIDE 14

Algorithm (1)

(State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind.

Q

Q(s,a)

Obtain Q-values

slide-15
SLIDE 15

Algorithm (2)

(State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind.

Take action a* using ε-greedy
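A minimal sketch of ε-greedy selection over a list of Q-values: with probability ε the agent explores with a uniformly random action, otherwise it takes the highest-scoring one. The Q-values and the value of `epsilon` below are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the largest Q-value.
    return max(range(len(q_values)), key=q_values.__getitem__)


q_values = [0.1, 0.9, 0.3]  # hypothetical Q(s, a) for three commands
greedy_action = epsilon_greedy(q_values, epsilon=0.0)
print(greedy_action)  # 1
```

In practice ε is often decreased over training so the agent explores early and exploits later; whether and how the authors anneal it is not stated on the slide.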

slide-16
SLIDE 16

Algorithm (3)

(State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind.

a*

(State 2: Ruined gatehouse) The old gatehouse is near collapse. Part of its northern wall has already fallen down ... East of the gatehouse leads out …

+ reward

slide-17
SLIDE 17

Algorithm (4)

. . .

Store transition in experience memory

(State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be

a

(State 2: Ruined gatehouse) The old gatehouse is near collapse. Part of its northern wall has already fallen down ... East of the gatehouse leads out … + reward

∼ Sample transitions for updates
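The store-then-sample step can be sketched with a fixed-capacity buffer. The class name `ReplayMemory`, the capacity, and the use of uniform sampling are assumptions for illustration; the slide does not say how transitions are drawn.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "state action reward next_state")


class ReplayMemory:
    """Fixed-size store of transitions; the oldest entries are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        """Uniformly sample up to batch_size stored transitions."""
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=2)
memory.store("The old bridge", "go east", 0.0, "Ruined gatehouse")
memory.store("Ruined gatehouse", "go east", 0.0, "Open area")
memory.store("Open area", "go north", 1.0, "Castle")  # evicts the first entry
batch = memory.sample(batch_size=5)
print(len(memory), len(batch))  # 2 2
```

Sampling from stored transitions, rather than learning only from the most recent one, lets each experience contribute to several updates.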

slide-18
SLIDE 18

Parameter update

(State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind.

a*

(State 2: Ruined gatehouse) The old gatehouse is near collapse. Part of its northern wall has already fallen down ... East of the gatehouse leads out … + reward

$$\nabla_{\theta_i} L_i(\theta_i) = \mathbb{E}_{\hat{s},\hat{a}}\left[ 2\,\big(y_i - Q(\hat{s},\hat{a};\theta_i)\big)\, \nabla_{\theta_i} Q(\hat{s},\hat{a};\theta_i) \right]$$

where

$$y_i = \mathbb{E}_{\hat{s},\hat{a}}\left[ r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) \,\middle|\, \hat{s},\hat{a} \right]$$
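For a single sampled transition, the target y_i and the squared error that the update reduces can be computed as in the sketch below. The numbers are hypothetical, and the `terminal` flag (no bootstrapping once the episode ends) is a common convention assumed here rather than something shown on the slide.

```python
def td_target(reward, next_q_values, gamma, terminal=False):
    """y = r + gamma * max_a' Q(s', a'), using the previous parameters' Q-values."""
    if terminal or not next_q_values:
        return reward  # assumption: no future reward after the episode ends
    return reward + gamma * max(next_q_values)


def squared_td_error(q_sa, target):
    """(y - Q(s, a))^2 for one transition."""
    return (target - q_sa) ** 2


# Hypothetical values for one transition (s, a, r, s').
reward = 1.0
next_q_values = [0.2, 0.5]  # Q(s', a') for each command, from the older parameters
y = td_target(reward, next_q_values, gamma=0.5)
loss = squared_td_error(q_sa=1.0, target=y)
print(y, loss)  # 1.25 0.0625
```

Averaging this error over a batch of sampled transitions gives an estimate of the expectation in the update.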

slide-19
SLIDE 19

Game Environment

Evennia: a highly extensible Python framework for MUD games

Two worlds:

✦ A small game to demonstrate the task and analyze learnt representations.
✦ A pre-existing Fantasy world.

slide-20
SLIDE 20

Home World

  • Number of different quests: 16
  • Vocabulary: 84 words
  • Words per description (avg.): 10.5
  • Multiple descriptions per room/object.
slide-21
SLIDE 21

Home World

This room has two sofas, chairs and a chandelier. You are not sleepy now but you are hungry now. > go east

slide-22
SLIDE 22

Home World

This area has plants, grass and rabbits. You are not sleepy now but you are hungry now. > go south

slide-23
SLIDE 23

Home World

You have arrived in the kitchen. You can find food and drinks here. You are not sleepy now but you are hungry now. > eat apple Reward: +1

slide-24
SLIDE 24

Fantasy World

  • Number of rooms: > 56
  • Vocabulary: 1340 words
  • Avg. no. of words/description: 65.21
  • Max descriptions per room: 100
  • Considerably more complex
  • Varying descriptions per state, created by game developers

(State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind.

slide-25
SLIDE 25

Evaluation

Two metrics:

✦ Quest completion
✦ Cumulative reward per episode

  • Positive rewards for quest fulfillment
  • Negative rewards for bad actions

Epoch: Training for n episodes followed by evaluation on n episodes

slide-26
SLIDE 26

Baselines

[Diagram: Input text → T (bag-of-words vector) → Q → Q values]

  • Bag of words: unigrams and bigrams
  • Randomly select actions
slide-27
SLIDE 27

Agent Performance (Home)

Random agent performs poorly

slide-28
SLIDE 28

Agent Performance (Home)

LSTM-DQN has delayed performance jump

slide-29
SLIDE 29

Agent Performance (Fantasy)

Good representation is essential for successful gameplay

slide-30
SLIDE 30

Visualizing Learnt Representations

“Kitchen” “Living room” “Bedroom” “Garden”

t-SNE visualization of vectors learnt by agent on Home world

slide-31
SLIDE 31

Visualizing Learnt Representations

“Kitchen” “Living room” “Bedroom” “Garden”

t-SNE visualization of vectors learnt by agent on Home world

“Garden”

slide-32
SLIDE 32

Nearby states: Similar representations

slide-33
SLIDE 33

Transfer Learning (Home)

Play on world with same vocabulary but different physical configuration

slide-34
SLIDE 34

Conclusions

  • Addressed the task of end-to-end learning of

control policies for textual games.

  • Learning good representations for text is

essential for gameplay.


Code and game framework are available at: http://people.csail.mit.edu/karthikn/mud-play/