Nash Q-Learning for General-Sum Stochastic Games (Hu & Wellman)

  1. Nash Q-Learning for General-Sum Stochastic Games (Hu & Wellman). March 6th, 2006, CS286r. Presented by Ilan Lobel.

  2. Outline  Stochastic Games and Markov Perfect Equilibria  Bellman’s Operator as a Contraction Mapping  Stochastic Approximation of a Contraction Mapping  Application to Zero-Sum Markov Games  Minimax-Q Learning  Theory of Nash-Q Learning  Empirical Testing of Nash-Q Learning

  3. How do we model games that evolve over time?  Stochastic Games!  Current Game = State  Ingredients: – Agents (N) – States (S) – Payoffs (R) – Transition Probabilities (P) – Discount Factor (δ)
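The slide only names the ingredients; as a rough Python sketch (not from the paper, with assumed field names and array layouts), they could be collected in a single container:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class StochasticGame:
    """Illustrative container for the ingredients listed on the slide."""
    n_agents: int            # N
    n_states: int            # |S|
    n_actions: tuple         # actions per agent, e.g. (|A1|, |A2|)
    rewards: np.ndarray      # R[j, s, a1, a2]: payoff to agent j in state s
    transitions: np.ndarray  # P[s, a1, a2, s']: probability of moving to s'
    delta: float = 0.9       # discount factor
```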

  4. Example of a Stochastic Game (δ = 0.9). Two stage games, with payoff pairs listed as (row player, column player):
     Game 1 – rows A, B vs. columns C, D: row A: 1,2 | 3,4; row B: 5,6 | 7,8. Move with 50% probability when (A,C) or (A,D) is played.
     Game 2 – rows A, B vs. columns C, D, E: row A: -1,2 | -3,4 | 0,0; row B: -5,6 | -7,8 | -10,10. Move with 30% probability when (B,D) is played.

  5. A Markov Game is a generalization of… Repeated Games: add states to get Markov Games.

  6. A Markov Game is a generalization of… Repeated Games (add states) and MDPs (add agents).

  7. Markov Perfect Equilibrium (MPE)  A strategy maps states into randomized actions – πi : S → Δ(A)  No agent has an incentive to unilaterally change her policy.

  8. Cons & Pros of MPEs  Cons: – Can’t implement everything described by the Folk Theorems (i.e., no trigger strategies)  Pros: – MPEs always exist in finite Markov Games (Fink, 64) – Easier to “search for”

  9. Learning in Stochastic Games  Learning is especially important in Markov Games because MPEs are hard to compute.  Do we know: – Our own payoffs? – Others’ rewards? – Transition probabilities? – Others’ strategies?

  10. Learning in Stochastic Games  Adapted from Reinforcement Learning: – Minimax-Q Learning (zero-sum games) – Nash-Q Learning – CE-Q Learning

  11. Zero-Sum Stochastic Games  Nice properties: – All equilibria have the same value. – Any equilibrium strategy of player 1 against any equilibrium strategy of player 2 produces an MPE. – There is a Bellman-type equation.

  12. Bellman’s Equation in DP  Bellman Operator T: (TV)(s) = max_a [ R(s,a) + δ Σ_s' P(s'|s,a) V(s') ]  Bellman’s Equation rewritten as a fixed point: V* = TV*
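For concreteness, a minimal tabular sketch of this operator in Python (array shapes and names are assumptions, not from the presentation):

```python
import numpy as np

def bellman_operator(V, R, P, delta):
    """(T V)(s) = max_a [ R(s,a) + delta * sum_s' P(s'|s,a) V(s') ]."""
    # V: (S,) values; R: (S, A) rewards; P: (S, A, S) transition probabilities.
    return np.max(R + delta * (P @ V), axis=1)
```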

  13. Contraction Mapping  The Bellman Operator has the contraction property: ||TV − TU||∞ ≤ δ ||V − U||∞ for all value functions V, U.  Bellman’s Equation is a direct consequence of the contraction (Banach fixed-point theorem).

  14. The Shapley Operator for Zero-Sum Stochastic Games  (TV)(s) = Val[ R(s,a1,a2) + δ Σ_s' P(s'|s,a1,a2) V(s') ], where Val is the minimax value of the zero-sum stage game.  The Shapley Operator is a contraction mapping. (Shapley, 53)  Hence, it also has a fixed point V* = TV*, which is an MPE.
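A hedged sketch of how the Shapley operator could be computed: the value of each zero-sum stage game is obtained from the standard linear-programming formulation, and the operator plugs the discounted continuation values into that stage game. The names, array layouts, and use of scipy.optimize.linprog are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Minimax value of a zero-sum matrix game M (row player maximizes) via an LP."""
    m, n = M.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                 # maximize v  <=>  minimize -v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])    # v - x^T M[:, j] <= 0 for every column j
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])  # x is a probability vector
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]    # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

def shapley_operator(V, R, P, delta):
    """(T V)(s) = Val[ R(s,.,.) + delta * sum_s' P(s'|s,.,.) V(s') ]."""
    # R: (S, A1, A2) payoffs to player 1; P: (S, A1, A2, S); V: (S,)
    return np.array([matrix_game_value(R[s] + delta * (P[s] @ V))
                     for s in range(R.shape[0])])
```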

  15. Value Iteration for Zero-Sum Stochastic Games  Iterate Vk+1 = T Vk.  Direct consequence of contraction.  Converges to the fixed point of the operator.
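Value iteration is then just repeated application of the operator; a short sketch reusing the shapley_operator from the previous block (tolerance and iteration cap are arbitrary choices):

```python
import numpy as np

def value_iteration(R, P, delta, tol=1e-8, max_iters=10_000):
    """Iterate V <- T V until the contraction has effectively converged."""
    V = np.zeros(R.shape[0])
    for _ in range(max_iters):
        V_next = shapley_operator(V, R, P, delta)  # defined in the sketch above
        if np.max(np.abs(V_next - V)) < tol:
            return V_next
        V = V_next
    return V
```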

  16. Q-Learning  Another consequence of a contraction mapping: – Q-Learning converges!  Q-Learning can be described as an approximation of value iteration: – Value iteration with noise.

  17. Q-Learning Convergence  Q-Learning is called a Stochastic Iterative Approximation of Bellman’s operator: – Learning Rate of 1/t. – Noise is zero-mean and has bounded variance.  It converges if all state-action pairs are visited infinitely often. (Neuro-Dynamic Programming – Bertsekas, Tsitsiklis)
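As an illustration of the stochastic-approximation view, one step of single-agent tabular Q-learning might look like this (the 1/t learning rate follows the slide; everything else is an assumed sketch):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, t, delta):
    """One noisy application of the Bellman operator at the visited pair (s, a)."""
    alpha = 1.0 / t                          # learning rate of 1/t
    target = r + delta * np.max(Q[s_next])   # sampled one-step Bellman backup
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
    return Q
```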

  18. Minimax-Q Learning Algorithm For Zero-Sum Stochastic Games  Initialize your Q0(s,a1,a2) for all states and actions.  Update rule: Qk+1(sk,a1,a2) = (1 − αk) Qk(sk,a1,a2) + αk [ rk + δ · Val( Qk(sk+1,·,·) ) ], where Val is the minimax value of the stage game at sk+1.  Player 1 then chooses action u1 in the next stage sk+1.
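The slide's update rule, from player 1's point of view, can be sketched as below, reusing the matrix_game_value helper from the Shapley-operator sketch; all names are illustrative:

```python
def minimax_q_update(Q, s, a1, a2, r, s_next, t, delta):
    """Q(s,a1,a2) <- (1-a) Q(s,a1,a2) + a [ r + delta * Val(Q(s_next,.,.)) ]."""
    alpha = 1.0 / t
    v_next = matrix_game_value(Q[s_next])    # minimax value of the next stage game (LP)
    Q[s, a1, a2] = (1 - alpha) * Q[s, a1, a2] + alpha * (r + delta * v_next)
    return Q
```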

  19. Minimax-Q Learning  It’s a Stochastic Iterative Approximation of the Shapley Operator.  It converges to a Nash Equilibrium if all state-action-action triplets are visited infinitely often. (Littman, 96)

  20. Can we extend it to General-Sum Stochastic Games?  Yes & No.  Nash-Q Learning is such an extension.  However, it has much worse computational and theoretical properties.

  21. Nash-Q Learning Algorithm  Initialize Q0j(s,a1,a2) for all states, actions, and every agent j. – You must simulate everyone’s Q-factors.  Update rule: Qjk+1(sk,a1,a2) = (1 − αk) Qjk(sk,a1,a2) + αk [ rjk + δ · NashQj(sk+1) ], where NashQj(sk+1) is agent j’s payoff in a Nash equilibrium of the stage game defined by the current Q-factors at sk+1.  Choose the randomized action generated by the Nash operator.
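A heavily hedged two-agent sketch of this update: the Nash operator is replaced here by a toy search over pure-strategy equilibria, whereas the actual algorithm computes a (possibly mixed) equilibrium of the stage game and needs all agents to break ties consistently. Every name below is an assumption:

```python
import itertools

def pure_nash_payoffs(Q1, Q2):
    """Toy stage-game solver: payoffs of the first pure-strategy Nash equilibrium
    of the bimatrix game (Q1, Q2); the real Nash operator allows mixed equilibria."""
    for a1, a2 in itertools.product(range(Q1.shape[0]), range(Q1.shape[1])):
        if Q1[a1, a2] >= Q1[:, a2].max() and Q2[a1, a2] >= Q2[a1, :].max():
            return Q1[a1, a2], Q2[a1, a2]
    raise ValueError("no pure-strategy Nash equilibrium in this stage game")

def nash_q_update(Q, s, a1, a2, rewards, s_next, t, delta):
    """Q[j][s, a1, a2] is agent j's simulated Q-factor (both agents are simulated)."""
    alpha = 1.0 / t
    nash_vals = pure_nash_payoffs(Q[0][s_next], Q[1][s_next])
    for j in (0, 1):
        Q[j][s, a1, a2] = ((1 - alpha) * Q[j][s, a1, a2]
                           + alpha * (rewards[j] + delta * nash_vals[j]))
    return Q
```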

  22. The Nash Operator and the Principle of Optimality  The Nash Operator finds the Nash equilibrium of a stage game.  Find the Nash of the stage game with Q-factors as payoffs: the current reward plus the (discounted) payoffs for the rest of the Markov Game.

  23. The Nash Operator  Unknown complexity even for 2 players.  In comparison, the minimax operator can be solved in polynomial time (there’s a linear programming formulation).  For convergence, all players must break ties in favor of the same Nash Equilibrium.  Why not go model-based if computation is so expensive?

  24. Convergence Results  If every stage game encountered during learning has a global optimum, Nash-Q converges.  If every stage game encountered during learning has a saddle point, Nash-Q converges.  Both of these are VERY strong assumptions.

  25. Convergence Result Analysis  The global optimum assumption implies full cooperation between agents.  The saddle point assumption implies no cooperation between agents.  Are these equivalent to DP Q-Learning and minimax-Q Learning, respectively?

  26. Empirical Testing: The Grid-world  [Figure: grid-world, World 1, showing some of its Nash equilibria]

  27. Empirical Testing: Nash Equilibria  [Figure: World 2 with all of its Nash equilibria, annotated 3%, 3%, and 97%]

  28. Empirical Performance  In very small and simple games, Nash-Q learning often converged even though the theory did not guarantee it.  In particular, if all Nash Equilibria have the same value, Nash-Q did better than expected.

  29. Conclusions  Nash-Q is a nice step forward: – It can be used for any Markov Game. – It uses the Principle of Optimality in a smart way.  But there is still a long way to go: – Convergence results are weak. – There are no computational complexity results.
