Learning to Win by Reading Manuals in a Monte-Carlo Framework - - PowerPoint PPT Presentation



SLIDE 1

Learning to Win by Reading Manuals in a Monte-Carlo Framework

S.R.K. Branavan, David Silver, Regina Barzilay

MIT

SLIDE 2

Semantic Interpretation

Traditional view: map text into an abstract representation.
Alternative view: map text into a representation which helps performance in a control application.

SLIDE 3

Semantic Interpretation for Control Applications

[Figure: a complex strategy game; sequences of actions (action 1, action 2, action 3) lead to end results of won or lost.]

Traditional approach: learn an action-selection policy from game feedback.
Our contribution: use textual advice to guide the action-selection policy.

SLIDE 4

Leveraging Textual Advice: Challenges

  • 1. Find sentences relevant to given game state.

Game state (settler, city) vs. strategy document:

"You start with two settler units. Although settlers are capable of performing a variety of useful tasks, your first task is to move the settlers to a site that is suitable for the construction of your first city. Use settlers to build the city on grassland with a river running through it if possible. You can also use settlers to irrigate land near your city. In order to survive and grow …"

SLIDE 5

Leveraging Textual Advice: Challenges

  • 1. Find sentences relevant to given game state. (Same game state and strategy document, continued.)

SLIDE 6

Leveraging Textual Advice: Challenges

  • 1. Find sentences relevant to given game state. (Same game state and strategy document; the settler-related sentences are matched against the state's settler and city.)
SLIDE 7

Leveraging Textual Advice: Challenges

  • 2. Label sentences with predicate structure.

"Move the settler to a site suitable for building a city, onto grassland with a river if possible."

Candidate predicates: move_settlers_to(), settlers_build_city()

Label words as action, state or background.

SLIDE 8

Leveraging Textual Advice: Challenges

  • 3. Guide action selection using relevant text.

"Build the city on plains or grassland with a river running through it if possible."

Candidate actions from state S:
  a1 – move_settlers_to(7,3)
  a2 – settlers_build_city()
  a3 – settlers_irrigate_land()
SLIDE 9

Learning from Game Feedback

Goal: learn from game feedback as the only source of supervision.
Key idea: better parameter settings will lead to more victories.

[Figure: two copies of the model with different parameters θ1 and θ2, each reading the same game manual and choosing actions (a1, a2, a3) from state S; one playout ends in a win, the other in a loss.]

SLIDE 10

Model Overview

Monte-Carlo Search Framework

  • Learn action selection policy from simulations
  • Very successful in complex games like Go and Poker.

Our Algorithm

  • Learn text interpretation from simulation feedback
  • Bias action selection policy using text


SLIDE 11

Monte-Carlo Search

[Figure: the actual game is at State 1; the state is copied into a simulation, candidate actions (e.g., Irrigate) are tried, and the simulated game is played out (e.g., to a loss).]

Select actions via simulations; the game and opponent can be stochastic.
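The rollout loop above can be sketched as follows. Here `simulate` is a hypothetical stand-in for the Civilization II engine (not the authors' actual interface): it copies the state, applies the action, plays on for a fixed depth, and returns a score.

```python
import random

def monte_carlo_select(state, actions, simulate, n_rollouts=20, depth=10, rng=None):
    """Pick the action whose simulated rollouts score best on average.

    `simulate(state, action, depth, rng)` is a hypothetical game-engine hook:
    it plays `action` from a copy of `state`, continues for `depth` steps
    (the game and opponent may be stochastic), and returns a final score.
    """
    rng = rng or random.Random(0)
    best_action, best_mean = None, float("-inf")
    for action in actions:
        # Average the rollout scores for this candidate action.
        mean = sum(simulate(state, action, depth, rng)
                   for _ in range(n_rollouts)) / n_rollouts
        if mean > best_mean:
            best_action, best_mean = action, mean
    return best_action
```

With enough rollouts, noise in individual simulations averages out and the highest-value action is selected.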

SLIDE 12

Monte-Carlo Search

Try many candidate actions from the current state and see how well they perform.

[Figure: rollouts from the current game state (State 1) down to a fixed rollout depth, yielding game scores such as 0.1, 0.4, 3.5, 1.2.]

SLIDE 13

Monte-Carlo Search

Try many candidate actions from the current state and see how well they perform. Learn feature weights from simulation outcomes.

  • f : feature function
  • θ : model parameters

[Figure: each rollout's score is modeled as the dot product of the model parameters with a sparse feature vector, giving scores such as 0.1, 0.4, 1.2.]
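The score of each rollout comes from a weighted feature sum. A minimal sketch of that dot product, with illustrative feature names (the actual feature set is described on a later slide):

```python
def q_value(theta, features):
    # Linear action-value estimate: Q(s, a) = theta . f(s, a).
    # `features` is a sparse dict from feature name to value; any feature
    # absent from `theta` contributes zero to the sum.
    return sum(theta.get(name, 0.0) * value for name, value in features.items())
```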

SLIDE 14

Model Overview

Monte-Carlo Search Framework

  • Learn action selection policy from simulations

Our Algorithm

  • Bias action selection policy using text
  • Learn text interpretation from simulation feedback


SLIDE 15

Modeling Requirements

  • Identify sentence relevant to game state
  • Label sentence with predicate structure
  • Estimate value of candidate actions

[Figure: the sentence "Build cities near rivers or ocean." guides value estimates for candidate actions such as Fortify, Irrigate and Build city.]

SLIDE 16

Sentence Relevance

Identify the sentence relevant to the game state and action.

Log-linear model over the state s, candidate action a, and document d, where sentence y is selected as relevant:

  p(y | s, a, d) ∝ exp(θ · f(s, a, d, y))

  • θ : weight vector
  • f : feature function
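The log-linear relevance model amounts to a softmax over the document's sentences. A sketch, where `feature_fn` is a hypothetical feature extractor over (sentence, state, action):

```python
import math

def relevance_distribution(theta, feature_fn, state, action, sentences):
    # Log-linear model: p(y | s, a, d) proportional to exp(theta . f(y, s, a, d)).
    scores = [sum(theta.get(k, 0.0) * v
                  for k, v in feature_fn(sent, state, action).items())
              for sent in sentences]
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)                        # partition function
    return [e / z for e in exps]
```

The most relevant sentence is then the argmax of this distribution.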

SLIDE 17

Predicate Structure

Select word labels based on the sentence plus dependency information.

Log-linear model over the word index i, sentence y, and dependency info q:

  p(z_i | i, y, q) ∝ exp(θ · f(i, y, q, z_i))

  • θ : weight vector
  • f : feature function

E.g., "Build cities near rivers or ocean."
Predicate labels = { action, state, background }
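Word labeling can likewise be sketched as a per-word log-linear classifier over the three labels. The feature extractor below is a hypothetical placeholder, and for clarity this sketch scores words independently rather than jointly:

```python
LABELS = ("action", "state", "background")

def label_words(theta, feature_fn, words, dep_info):
    # Score each (word, label) pair with a linear model and keep the argmax.
    # `feature_fn(i, word, label, dep_info)` returns a sparse feature dict;
    # its feature names are illustrative assumptions, not the paper's.
    labeling = []
    for i, word in enumerate(words):
        scores = {lab: sum(theta.get(k, 0.0) * v
                           for k, v in feature_fn(i, word, lab, dep_info).items())
                  for lab in LABELS}
        labeling.append(max(scores, key=scores.get))
    return labeling
```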

SLIDE 18

Final Q function approximation

Predict the expected value of a candidate action.

Linear model over the state s and candidate action a, together with the document d, relevant sentence y, and predicate labeling z:

  Q(s, a) ≈ θ · f(s, a, d, y, z)

  • θ : weight vector
  • f : feature function

SLIDE 19

Model Representation

Multi-layer neural network: each layer represents a different stage of analysis.

Input (game state, candidate action, document text) → select most relevant sentence → predict sentence predicate structure → Q-function approximation → predicted action value.
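The layered architecture can be sketched as a forward pass in which each stage's output feeds the next. The three callables are hypothetical stand-ins for the components on the previous slides:

```python
def predict_action_value(select_sentence, label_words, q_value, state, action, document):
    # Layer 1: pick the sentence most relevant to (state, action).
    sentence = select_sentence(state, action, document)
    # Layer 2: label the sentence's words with predicate structure.
    labels = label_words(sentence)
    # Layer 3: a final linear layer predicts the action's expected value.
    return q_value(state, action, sentence, labels)
```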

SLIDE 20

Parameter Estimation

Objective: minimize the mean squared error between the predicted utility and the observed utility.

[Figure: for a state and action, the model's predicted utility is compared against the utility observed from the game rollout.]

SLIDE 21

Parameter Estimation

Method: gradient descent, i.e., backpropagation. Parameter updates follow the gradient of the squared-error objective.
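For the final linear layer, the squared-error gradient step takes a simple form. This is only a sketch of that one layer; the full model backpropagates the error through the relevance and labeling layers as well:

```python
def sgd_update(theta, features, observed_utility, lr=0.01):
    # One stochastic gradient step on L = (theta . f - R)^2 / 2,
    # whose gradient w.r.t. theta is (Q - R) * f(s, a).
    predicted = sum(theta.get(k, 0.0) * v for k, v in features.items())
    error = predicted - observed_utility
    for name, value in features.items():
        theta[name] = theta.get(name, 0.0) - lr * error * value
    return theta
```

Repeated updates shrink the error geometrically for a fixed feature vector.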

SLIDE 22

Features


State features:

  • Amount of gold in treasury
  • Government type
  • Terrain surrounding current unit

Action features:

  • Unit type (settler, worker, archer, etc.)
  • Unit action type

Text features:

  • Word
  • Parent word in dependency tree
  • Word matches text label of unit
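These feature classes might be realized as sparse indicator features. A sketch, where every dictionary field is an illustrative assumption rather than the paper's actual encoding:

```python
def extract_features(state, action, word, parent_word):
    # Sparse binary features in the spirit of the slide's examples.
    feats = {
        f"gold={state['gold']}": 1.0,               # state: treasury amount (bucketed)
        f"government={state['government']}": 1.0,   # state: government type
        f"terrain={state['terrain']}": 1.0,         # state: terrain around current unit
        f"unit={action['unit']}": 1.0,              # action: unit type
        f"action={action['type']}": 1.0,            # action: unit action type
        f"word={word}": 1.0,                        # text: the word itself
        f"parent={parent_word}": 1.0,               # text: parent in dependency tree
    }
    if word == action["unit"]:
        feats["word_matches_unit"] = 1.0            # text: word matches unit's text label
    return feats
```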
SLIDE 23

Experimental Domain

Game:

  • Complex, stochastic, turn-based strategy game: Civilization II.
  • Branching factor: 10^20

Text statistics:

  • Document: official game manual of Civilization II
  • Sentences: 2083
  • Avg. words per sentence: 16.7
  • Vocabulary: 3638
SLIDE 24

Experimental Setup

Game opponent:

  • Built-in AI of the game.
  • Domain-knowledge-rich AI, built to challenge humans.

Primary evaluation:

  • Games won within first 100 game steps.
  • Averaged over 200 independent experiments.
  • Avg. experiment runtime: 1.5 hours

Secondary evaluation:

  • Full games won.
  • Averaged over 50 independent experiments.
  • Avg. experiment runtime: 4 hours
SLIDE 25

Results

[Bar chart: % of games won within the first 100 turns, averaged over 200 runs, for the Full model vs. the Built-in AI baseline.]

SLIDE 26

Does Text Help?

[Bar chart: % of games won within 100 turns, averaged over 200 runs: Full model vs. Game only (linear Q-function approximation, no text) vs. Built-in AI.]

SLIDE 27

Text vs. Representational Capacity

[Bar chart: % of games won within 100 turns, averaged over 200 runs: Full model vs. Game only vs. Latent variable (non-linear Q-function approximation, no text) vs. Built-in AI.]

SLIDE 28

Linguistic Complexity vs. Performance Gain

[Bar chart: % of games won within 100 turns, averaged over 200 runs: Full model vs. Sentence relevance vs. Game only vs. Latent variable vs. Built-in AI.]

SLIDE 29

Results: Sentence Relevance

Problem: sentence relevance depends on the game state. States are game-specific, and not known a priori!
Solution: add known non-relevant sentences to the text, e.g., sentences from the Wall Street Journal corpus.
Results: 71.8% sentence-relevance accuracy. Surprisingly poor accuracy given the game win rate!

SLIDE 30

Results: Sentence Relevance

[Figure: sentence-relevance accuracy, alongside the relative importance of text features vs. game features.]

SLIDE 31

Results: Full Games

[Bar chart: percentage of full games won, averaged over 50 runs: Full model vs. Latent variable vs. Game only.]

SLIDE 32

Related Work

Grounded language acquisition: instruction interpretation (Branavan et al. 2009, 2010; Vogel & Jurafsky 2010)

  • Imperative descriptions of action sequences
  • Assume relevance of text to current world state

Language analysis in games:

Eisenstein et al. 2009

  • Extract high-level semantic representation from text
  • Learn game rules from labeled traces + extracted formulae

Gorniak & Roy 2005

  • Interpret spoken commands to control game character
  • Learn from labeled parallel corpus

SLIDE 33

Conclusions

  • Human knowledge encoded in natural language can be automatically leveraged to improve control applications.
  • Environment feedback is a powerful supervision signal for language analysis.
  • The method is applicable to control applications that have an inherent success signal and can be simulated.

Code, data & experimental framework available at: http://groups.csail.mit.edu/rbg/code/civ

SLIDE 34


SLIDE 35

Monte-Carlo Search: Summary

[Figure: Monte-Carlo rollouts (simulations) from game states and actions; observed rollout scores are used to select the game action.]

SLIDE 36

Model Complexity, Time and Performance

SLIDE 37

Dependency Information