
Learning to Win by Reading Manuals in a Monte-Carlo Framework - PowerPoint PPT Presentation



  1. Learning to Win by Reading Manuals in a Monte-Carlo Framework S.R.K. Branavan, David Silver, Regina Barzilay MIT

  2. Semantic Interpretation. Traditional view: map text into an abstract representation. Alternative view: map text into a representation that helps performance in a control application.

  3. Semantic Interpretation for Control Applications. (Figure: in a complex strategy game, each candidate action leads to an end result of won or lost.) Traditional approach: learn an action-selection policy from game feedback. Our contribution: use textual advice to guide the action-selection policy.

  4. Leveraging Textual Advice: Challenges. 1. Find sentences relevant to the given game state. (Figure: game state with city and settler units, shown next to a strategy-document excerpt.) "You start with two settler units. Although settlers are capable of performing a variety of useful tasks, your first task is to move the settlers to a site that is suitable for the construction of your first city. Use settlers to build the city on grassland with a river running through it if possible. You can also use settlers to irrigate land near your city. In order to survive and grow …"

  5. Leveraging Textual Advice: Challenges. 1. Find sentences relevant to the given game state. (Same game state and document excerpt as slide 4.)

  6. Leveraging Textual Advice: Challenges. 1. Find sentences relevant to the given game state. (Same game state and document excerpt as slide 4.)

  7. Leveraging Textual Advice: Challenges. 2. Label sentences with predicate structure. E.g., "Move the settler to a site suitable for building a city, onto grassland with a river if possible." Does it describe move_settlers_to() or settlers_build_city()? The correct labeling maps it to move_settlers_to(). Label words as action, state, or background.

  8. Leveraging Textual Advice: Challenges. 3. Guide action selection using relevant text. (Figure: from state s, candidate actions a1 = move_settlers_to(7,3), a2 = settlers_build_city(), a3 = settlers_irrigate_land(); relevant sentence: "Build the city on plains or grassland with a river running through it if possible.")

  9. Learning from Game Feedback. Goal: learn from game feedback as the only source of supervision. Key idea: better parameter settings will lead to more victories. (Figure: the same game manual and model, run once with parameters θ1, ends in a win, and run again with parameters θ2, ends in a loss.)

  10. Model Overview. Monte-Carlo Search Framework: learn an action-selection policy from simulations; very successful in complex games like Go and Poker. Our Algorithm: learn text interpretation from simulation feedback; bias the action-selection policy using text.

  11. Monte-Carlo Search. Select actions via simulations; the game and opponent can be stochastic. (Figure: the actual game at State 1 is copied into a simulation, a candidate action such as Irrigate is tried, and the simulated game is played out to its outcome, e.g. game lost.)

  12. Monte-Carlo Search. Try many candidate actions from the current state and see how well they perform. (Figure: rollouts of a fixed depth from the current game state, State 1, each ending in a game score, e.g. 0.1, 0.4, 1.2, 3.5.)

  13. Monte-Carlo Search. Try many candidate actions from the current state and see how well they perform; learn feature weights from the simulation outcomes. (Figure: each rollout's binary feature vector, weighted by the model parameters, yields its game score, e.g. 0.1, 0.4, 1.2; the value of an action is a linear function of a feature function and the model parameters.)
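
The rollout-and-score loop can be made concrete with a short sketch. This is a minimal illustration, not the paper's implementation: the game interface (copy, legal_actions, step, score, state) and the binary feature function are hypothetical stand-ins, and the linear estimate theta · f(s, a) mirrors the feature-weight idea on this slide.

import random
import numpy as np

def rollout(game, action, depth):
    # Apply the candidate action on a copy of the game, then play `depth`
    # random moves and return the resulting game score.
    sim = game.copy()
    sim.step(action)
    for _ in range(depth):
        sim.step(random.choice(sim.legal_actions()))
    return sim.score()

def features(state, action, dim=16):
    # Stand-in binary feature function f(s, a); the real model uses
    # game-state, action, and text features (see the Features slide).
    rng = np.random.default_rng(abs(hash((state, action))) % (2**32))
    return (rng.random(dim) < 0.3).astype(float)

def select_action(game, theta, n_rollouts=50, depth=10, lr=0.01):
    # Try candidate actions in simulation, regress the feature weights toward
    # the observed scores, then pick the action with the highest predicted value.
    for _ in range(n_rollouts):
        a = random.choice(game.legal_actions())
        f = features(game.state, a)
        observed = rollout(game, a, depth)            # simulation outcome
        predicted = theta @ f                         # linear estimate theta . f
        theta += lr * (observed - predicted) * f      # squared-error gradient step
    return max(game.legal_actions(), key=lambda a: theta @ features(game.state, a))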

  14. Model Overview. Monte-Carlo Search Framework: learn an action-selection policy from simulations. Our Algorithm: bias the action-selection policy using text; learn text interpretation from simulation feedback.

  15. Modeling Requirements. Identify the sentence relevant to the game state, e.g. "Build cities near rivers or ocean." Label the sentence with predicate structure. Estimate the value of candidate actions, e.g. Irrigate: -10, Fortify: -5, …, Build city: 25.

  16. Sentence Relevance (stage 1 of 3). Identify the sentence relevant to the game state and action. Given state s, candidate action a, and document d, sentence y is selected as relevant with a log-linear model: p(y | s, a, d) ∝ exp(u · φ(y, s, a, d)), where u is the weight vector and φ the feature function.
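
As a rough sketch, the relevance layer can be read as a softmax over the document's sentences, scored by u · φ(y, s, a, d). The feature function below is a placeholder, not the paper's feature set, and the toy document and action names are illustrative only.

import numpy as np

def phi(sentence, state, action, dim=8):
    # Placeholder feature function; the real features include word,
    # dependency, and game-state/action indicators.
    rng = np.random.default_rng(abs(hash((sentence, state, action))) % (2**32))
    return rng.random(dim)

def relevance_distribution(sentences, state, action, u):
    # p(y | s, a, d) proportional to exp(u . phi(y, s, a, d)) over sentences y in d.
    scores = np.array([u @ phi(y, state, action) for y in sentences])
    scores -= scores.max()                      # numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()

doc = ["You start with two settler units.",
       "Use settlers to build the city on grassland with a river if possible."]
print(relevance_distribution(doc, state="start", action="settlers_build_city()", u=np.zeros(8)))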

  17. Predicate Structure (stage 2 of 3). Select word labels based on the sentence plus dependency information, e.g. "Build cities near rivers or ocean." Given word index j, sentence y, and dependency info q, the predicate label e_j ∈ {action, state, background} is chosen with a log-linear model: p(e_j | j, y, q) ∝ exp(v · ψ(e_j, j, y, q)), where v is the weight vector and ψ the feature function.
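
In the same spirit, the labeling layer can be sketched as a per-word softmax over the three labels. The feature function over (word, dependency parent, label) is again a stand-in, not the paper's feature templates.

import numpy as np

LABELS = ["action", "state", "background"]

def psi(word, parent, label, dim=6):
    # Placeholder feature function over the word, its dependency parent, and the label.
    rng = np.random.default_rng(abs(hash((word, parent, label))) % (2**32))
    return rng.random(dim)

def label_distribution(word, parent, v):
    # p(e_j | word j, dependency info) proportional to exp(v . psi(word, parent, e_j)).
    scores = np.array([v @ psi(word, parent, e) for e in LABELS])
    scores -= scores.max()
    probs = np.exp(scores)
    return dict(zip(LABELS, probs / probs.sum()))

print(label_distribution("Build", parent="ROOT", v=np.zeros(6)))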

  18. Final Q-function Approximation (stage 3 of 3). Predict the expected value of a candidate action. Given state s, candidate action a, document d, relevant sentence y, and predicate labeling e, the action value is a linear model: Q(s, a) ≈ w · f(s, a, d, y, e), where w is the weight vector and f the feature function.

  19. Model Representation. Multi-layer neural network: each layer represents a different stage of analysis. Input: game state, candidate action, and document text. First layer: select the most relevant sentence. Second layer: predict the sentence's predicate structure. Output layer: Q-function approximation, i.e. the predicted action value.
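
The layered structure can be sketched as a single forward pass: pick the highest-scoring sentence, label its words, then score the candidate action with the linear output layer. The feature generator and toy document below are illustrative assumptions, not the paper's features.

import numpy as np

LABELS = ["action", "state", "background"]

def feat(key, dim):
    # Deterministic placeholder feature vector for any hashable key.
    rng = np.random.default_rng(abs(hash(key)) % (2**32))
    return rng.random(dim)

def forward(state, action, document, u, v, w):
    # Layer 1: select the most relevant sentence (argmax of the relevance scores).
    rel_scores = [u @ feat(("rel", y, state, action), u.size) for y in document]
    sentence = document[int(np.argmax(rel_scores))]
    # Layer 2: label each word as action / state / background.
    labels = [max(LABELS, key=lambda e: v @ feat(("lab", word, e), v.size))
              for word in sentence.split()]
    # Output layer: linear estimate of the candidate action's value.
    q = w @ feat(("q", state, action, sentence, tuple(labels)), w.size)
    return q, sentence, labels

u, v, w = np.ones(8), np.ones(8), np.ones(8)
doc = ["You start with two settler units.",
       "Build the city on grassland with a river running through it."]
print(forward("start", "settlers_build_city()", doc, u, v, w))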

  20. Parameter Estimation. Objective: minimize the mean squared error between the predicted utility and the observed utility. (Figure: for a given state and action, the predicted utility Q(s, a) is compared against the utility observed from a game rollout, e.g. 25.)

  21. Parameter Estimation. Method: gradient descent, i.e. backpropagation. Parameter updates follow the gradient of the squared-error objective from the previous slide.
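
A minimal sketch of one update on the output-layer weights, assuming the squared-error objective from the previous slide; the full method also backpropagates the error into the sentence-relevance and labeling layers, which is omitted here.

import numpy as np

def sgd_step(w, f_sa, observed_utility, lr=0.01):
    # One gradient step on L = 0.5 * (w . f_sa - observed_utility)^2.
    predicted = w @ f_sa
    error = predicted - observed_utility
    return w - lr * error * f_sa              # dL/dw = error * f_sa

w = np.zeros(4)
f_sa = np.array([1.0, 0.0, 1.0, 1.0])         # toy feature vector for (state, action)
w = sgd_step(w, f_sa, observed_utility=25.0)  # observed utility from a game rollout
print(w)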

  22. Features. State features: amount of gold in treasury, government type, terrain surrounding the current unit. Action features: unit type (settler, worker, archer, etc.), unit action type. Text features: word, parent word in dependency tree, whether the word matches the text label of the unit.
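
To make the feature list concrete, here is one way such indicator features could be encoded as a sparse dictionary. The attribute names, bucketing, and matching rule are assumptions for illustration only, not the paper's exact feature templates.

def extract_features(state, action, word, parent_word, unit_label):
    # Sparse binary indicator features keyed by "template=value" strings.
    feats = {
        f"state:gold_bucket={state['gold'] // 100}": 1.0,   # amount of gold, bucketed
        f"state:government={state['government']}": 1.0,
        f"state:terrain={state['terrain']}": 1.0,
        f"action:unit_type={action['unit_type']}": 1.0,
        f"action:action_type={action['action_type']}": 1.0,
        f"text:word={word}": 1.0,
        f"text:parent_word={parent_word}": 1.0,
    }
    if word.lower().rstrip("s") == unit_label.lower():       # word matches the unit's text label
        feats["text:word_matches_unit"] = 1.0
    return feats

state = {"gold": 150, "government": "despotism", "terrain": "grassland"}
action = {"unit_type": "settler", "action_type": "build_city"}
print(extract_features(state, action, word="settlers", parent_word="move", unit_label="settler"))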

  23. Experimental Domain. Game: the complex, stochastic, turn-based strategy game Civilization II, with a branching factor of roughly 10^20. Document: the official game manual of Civilization II. Text statistics: 2083 sentences, an average of 16.7 words per sentence, vocabulary of 3638 words.

  24. Experimental Setup. Game opponent: the game's built-in AI, a domain-knowledge-rich AI built to challenge human players. Primary evaluation: games won within the first 100 game steps, averaged over 200 independent experiments (avg. runtime 1.5 hours per experiment). Secondary evaluation: full games won, averaged over 50 independent experiments (avg. runtime 4 hours per experiment).

  25. Results. (Bar chart: % of games won within 100 turns, averaged over 200 runs; the built-in AI wins 0%, compared against the full model.)

  26. Does Text Help? (Bar chart: % of games won within 100 turns, averaged over 200 runs, comparing the built-in AI (0%), a game-only baseline (linear Q-function approximation, no text), and the full model.)

  27. Text vs. Representational Capacity. (Bar chart: % of games won within 100 turns, averaged over 200 runs, comparing the built-in AI (0%), the game-only baseline, a latent-variable baseline (non-linear Q-function approximation, no text), and the full model.)

  28. Linguistic Complexity vs. Performance Gain. (Bar chart: % of games won within 100 turns, averaged over 200 runs, comparing the built-in AI (0%), the game-only baseline, the latent-variable baseline, a sentence-relevance model, and the full model.)

  29. Results: Sentence Relevance. Problem: sentence relevance depends on the game state, but states are game-specific and not known a priori. Solution: add known non-relevant sentences to the text, e.g. sentences from the Wall Street Journal corpus. Results: 71.8% sentence-relevance accuracy, which is surprisingly poor given the game win rate.
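
One way to read the evaluation described here, as a sketch: mix known non-relevant distractor sentences into the document and count how often the model's top-ranked sentence comes from the manual. The scoring function and sentences below are stand-ins, not the paper's model or data.

def relevance_accuracy(score, manual_sents, distractor_sents, test_states):
    # score(sentence, state) is a stand-in for the learned relevance model;
    # accuracy = fraction of states whose top-ranked sentence is from the manual.
    pool = manual_sents + distractor_sents
    hits = sum(max(pool, key=lambda y: score(y, state)) in manual_sents
               for state in test_states)
    return hits / len(test_states)

# Toy demo with a length-based stand-in scorer.
manual = ["Build the city on grassland with a river running through it."]
wsj = ["Stocks closed higher in heavy trading on Tuesday."]
print(relevance_accuracy(lambda y, s: len(y), manual, wsj, test_states=["start"]))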

  30. Results: Sentence Relevance. (Plots: sentence-relevance accuracy, and the relative importance of text features compared with game features.)

  31. Results: Full Games. (Bar chart: percentage of full games won, averaged over 50 runs, comparing the game-only baseline, the latent-variable baseline, and the full model; axis from 0% to 100%.)
