Language Understanding for Text-based Games Using Deep - PowerPoint PPT Presentation

Language Understanding for Text-based Games Using Deep Reinforcement Learning Karthik Narasimhan, Tejas Kulkarni, Regina Barzilay MIT

Text-based games (State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind. >> go east (State 2: Ruined gatehouse) The old gatehouse is near collapse. Part of its northern wall has already fallen down ... East of the gatehouse leads out to a small open area surrounded by the remains of the castle. … MUDs: predecessors to modern graphical games

Why are they challenging? (State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind. Symbolic state (Branavan et al., 2011): Location: Bridge 1; Wind level: 3; Time: 8pm. No symbolic representation available

Can a computer understand language well enough to play these games? Understanding ≈ Actionable intelligence

Can a computer understand language well enough to play these games? Inspiration: Playing graphical games directly from raw pixels (DeepMind)

Our Approach Reinforcement Learning utilizing in-game feedback to: ✦ Learn control policies for gameplay. ✦ Learn good representations for text description of game state.

Traditional RL framework States s_1, s_2, s_3, …, s_t; actions a_1, a_2, a_3; Reward; Q(s, a). s = (Location: Bridge 1; Wind level: 3; Time: 8pm). Q-value is the agent’s notion of discounted future reward
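To make "discounted future reward" concrete, here is a minimal sketch that computes the discounted sum of one reward sequence; a Q-value is the agent's estimate of the expected value of this quantity. The function name, the reward list, and the discount factor 0.5 are illustrative assumptions, not values from the slides.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k], accumulated from the last step backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical episode: two steps without reward, then a reward of 1.
example = discounted_return([0.0, 0.0, 1.0], gamma=0.5)
```

With a discount factor below 1, a reward that arrives later contributes less to the return than the same reward arriving earlier.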

Text-based games States s_1, s_2, s_3, …, s_t; actions a_1, a_2, a_3; Reward. s = (Location: Bridge 1; Wind level: 3; Time: 8pm). (State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ...

Text-based games: BOW representation States s_1, s_2, s_3, …, s_t; actions a_1, a_2, a_3; Reward. s = [0, 1, 0, …, 0]^T. (State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... Bag of words?

Input text → bag of words [0, 1, 0, …, 0]^T → Q (control policy). Can we do better?

Model Input text → recurrent NN to map text to vector representation → v → Q → Q-values for all commands

Model Input text → recurrent NN to map text to vector representation → v → NN for control policy (Q) → Q-values for all commands

LSTM-DQN Representation Generator (φ_R): words w_1, w_2, w_3, …, w_n → LSTM → Mean Pooling → v_s. Action-Object Scorer (φ_A): v_s → Linear → ReLU → two Linear layers → Q(s, a) and Q(s, o)
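The part of the architecture after the LSTM can be sketched in plain Python. This is a rough illustration only: the LSTM itself is omitted and its per-word output vectors are taken as given, and the parameter names (W_h, W_a, W_o and their biases) and toy values are assumptions made for the example.

```python
def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise to get v_s."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def linear(weights, bias, x):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise max(0, x)."""
    return [max(0.0, xi) for xi in x]


def action_object_scores(lstm_outputs, params):
    """Mean-pool LSTM outputs, apply Linear + ReLU, then two linear heads."""
    v_s = mean_pool(lstm_outputs)
    hidden = relu(linear(params["W_h"], params["b_h"], v_s))
    q_actions = linear(params["W_a"], params["b_a"], hidden)  # Q(s, a)
    q_objects = linear(params["W_o"], params["b_o"], hidden)  # Q(s, o)
    return q_actions, q_objects


# Toy, hand-picked parameters (hypothetical, for illustration only).
params = {
    "W_h": [[1.0, -1.0], [-1.0, 0.0]], "b_h": [0.0, 0.0],
    "W_a": [[2.0, 0.0], [0.5, 1.0]], "b_a": [0.0, 0.0],
    "W_o": [[1.0, 1.0]], "b_o": [0.5],
}
q_a, q_o = action_object_scores([[1.0, 0.0], [3.0, 2.0]], params)
```

Both heads share the same hidden layer, so the action scores and object scores are computed from one state representation.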

Algorithm (1) (State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind. → Q → Q(s, a). Obtain Q-values

Algorithm (2) (State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind. → a*. Take action using ε-greedy
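A minimal sketch of ε-greedy action selection over a list of Q-values follows. The function name and the `rng` parameter are assumptions introduced for the example; the slides do not specify an implementation.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action index, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


# With epsilon = 0 the choice is purely greedy.
greedy_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

A larger ε makes the agent explore more; ε = 0 always exploits the current Q-value estimates.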

Algorithm (3) (State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind. → a* → (State 2: Ruined gatehouse) The old gatehouse is near collapse. Part of its northern wall has already fallen down ... East of the gatehouse leads out … + reward

Algorithm (4) (State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be ... → a → (State 2: Ruined gatehouse) The old gatehouse is near collapse. Part of its northern wall has already fallen down ... East of the gatehouse leads out … + reward. Store transition in experience memory. Sample transitions for updates
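The store-and-sample step can be sketched with a fixed-size buffer. This is a simplified illustration: uniform random sampling and the class and method names are assumptions, since the slide only says that transitions are stored and later sampled for updates.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once full.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw transitions uniformly at random, without replacement."""
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=2)
memory.store("The old bridge ...", "go east", 0.0, "Ruined gatehouse ...")
memory.store("Ruined gatehouse ...", "go east", 0.0, "Open area ...")
memory.store("Open area ...", "go west", -0.5, "Ruined gatehouse ...")
batch = memory.sample(5)
```

Sampling from memory rather than always using the most recent transition means consecutive, highly correlated steps are not the only data behind each update.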

Parameter update (State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind. → a* → (State 2: Ruined gatehouse) The old gatehouse is near collapse. Part of its northern wall has already fallen down ... East of the gatehouse leads out … + reward. $\nabla_{\theta_i} L_i(\theta_i) = \mathbb{E}_{\hat{s},\hat{a}}\left[2\left(y_i - Q(\hat{s},\hat{a};\theta_i)\right)\nabla_{\theta_i} Q(\hat{s},\hat{a};\theta_i)\right]$, where $y_i = \mathbb{E}_{\hat{s},\hat{a}}\left[r + \gamma \max_{a'} Q(s',a';\theta_{i-1}) \mid \hat{s},\hat{a}\right]$
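As a worked illustration of the target y_i and the resulting update, the sketch below uses a tabular Q (one entry per state-action pair) in place of the neural network, so the gradient of Q with respect to its parameter is simply 1 for the entry being updated. The table layout, learning rate, and toy numbers are assumptions for the example, the constant factor 2 is folded into the learning rate, and terminal states are not treated.

```python
import copy


def td_target(reward, next_q_values, gamma):
    """y = r + gamma * max_a' Q(s', a'), using the previous parameters."""
    return reward + gamma * max(next_q_values)


def update_q(q_table, old_q_table, transition, gamma, lr):
    """Move Q(s, a) a step of size lr toward the target y; return y - Q(s, a)."""
    state, action, reward, next_state = transition
    y = td_target(reward, old_q_table[next_state], gamma)
    error = y - q_table[state][action]
    q_table[state][action] += lr * error
    return error


# Toy table with two states and two actions each (hypothetical values).
q = {"bridge": [0.0, 0.0], "gatehouse": [1.0, 0.0]}
q_old = copy.deepcopy(q)  # plays the role of theta_{i-1}
err = update_q(q, q_old, ("bridge", 0, 1.0, "gatehouse"), gamma=0.5, lr=0.5)
```

Here y = 1.0 + 0.5 × 1.0 = 1.5, so the entry for ("bridge", action 0) moves from 0.0 halfway toward 1.5.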

Game Environment Evennia: a highly extensible Python framework for MUD games. Two worlds: ✦ A small game to demonstrate the task and analyze learnt representations. ✦ A pre-existing Fantasy world.

Home World • Number of different quests: 16 • Vocabulary: 84 words • Words per description (avg.): 10.5 • Multiple descriptions per room/object.

Home World This room has two sofas, chairs and a chandelier. You are not sleepy now but you are hungry now. > go east

Home World This area has plants, grass and rabbits. You are not sleepy now but you are hungry now. > go south

Home World Reward: +1 You have arrived in the kitchen. You can find food and drinks here. You are not sleepy now but you are hungry now. > eat apple

Fantasy World (State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind. • Number of rooms: > 56 • Vocabulary: 1340 words • Avg. no. of words/description: 65.21 • Max descriptions per room: 100 • Considerably more complex • Varying descriptions per state created by game developers

Evaluation Two metrics: ✦ Quest completion ✦ Cumulative reward per episode • Positive rewards for quest fulfillment • Negative rewards for bad actions Epoch : Training for n episodes followed by evaluation on n episodes
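The two metrics can be computed from per-episode logs roughly as follows. The log format (a dict with a `rewards` list and a `quest_completed` flag) and the numbers are hypothetical, chosen only to illustrate the calculation.

```python
def evaluate(episodes):
    """Return (quest completion rate, average cumulative reward per episode)."""
    n = len(episodes)
    completion = sum(1 for e in episodes if e["quest_completed"]) / n
    avg_reward = sum(sum(e["rewards"]) for e in episodes) / n
    return completion, avg_reward


# Two made-up evaluation episodes.
logs = [
    {"rewards": [-0.5, 1.0], "quest_completed": True},
    {"rewards": [-0.5, -0.5], "quest_completed": False},
]
completion_rate, average_reward = evaluate(logs)
```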

Baselines • Randomly select actions • Bag of words: unigrams and bigrams. Input text → [0, 1, 0, …, 0]^T → Q → Q-values

Agent Performance (Home) Random agent performs poorly

Agent Performance (Home) LSTM-DQN has delayed performance jump

Agent Performance (Fantasy) Good representation is essential for successful gameplay

Visualizing Learnt Representations “Kitchen” “Bedroom” “Living room” “Garden” t-SNE visualization of vectors learnt by agent on Home world

Visualizing Learnt Representations “Kitchen” “Bedroom” “Garden” “Living room” “Garden” t-SNE visualization of vectors learnt by agent on Home world

Nearby states: Similar representations

Transfer Learning (Home) Play on world with same vocabulary but different physical configuration

Conclusions ‣ Addressed the task of end-to-end learning of control policies for textual games. ‣ Learning good representations for text is essential for gameplay. Code and game framework are available at: http://people.csail.mit.edu/karthikn/mud-play/
