SLIDE 1

Convergence Problems of General-Sum Multiagent Reinforcement Learning

Michael Bowling Carnegie Mellon University Computer Science Department ICML 2000

SLIDE 2

Overview

  • Stochastic Game Framework
  • Q-Learning for General-Sum Games [Hu & Wellman, 1998]
  • Counterexample and Flaw
  • Discussion
SLIDE 3

Stochastic Game Framework

Stochastic Games

  • Multiple State
  • Multiple Agent

MDPs

  • Single Agent
  • Multiple State

Matrix Games

  • Single State
  • Multiple Agent
SLIDE 4

Markov Decision Processes

A Markov decision process (MDP) is a tuple (S, A, T, R), where:

  • S is the set of states,
  • A is the set of actions,
  • T is a transition function S × A × S → [0, 1],
  • R is a reward function S × A → ℜ.

[Diagram: from state s, taking action a yields reward R(s, a) and moves the agent to state s′ with probability T(s, a, s′).]
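To make the tuple concrete, here is the standard single-agent tabular Q-learning backup over an MDP, for later comparison with the multiagent variants on the following slides. This is a minimal sketch added here, not code from the talk; the table layout, function name, and default step sizes are illustrative choices.

```python
from collections import defaultdict

# Q-table over (state, action) pairs; unseen entries default to 0.
Q = defaultdict(float)

def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning backup for an MDP (S, A, T, R):
    Q(s, a) <- (1 - alpha) Q(s, a) + alpha (r + gamma max_a' Q(s', a'))."""
    best_next = max(Q[(s_next, a_next)] for a_next in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
```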

SLIDE 5

Matrix Games

A matrix game is a tuple (n, A1...n, R1...n), where:

  • n is the number of players,
  • Ai is the set of actions available to player i,

– A is the joint action space A1 × . . . × An,

  • Ri is player i’s payoff function A → ℜ.

[Diagram: the payoff matrices R1 and R2, with rows and columns indexed by the players’ actions a1, a2, . . . ; the entry for joint action a is Ri(a).]

SLIDE 6

Matrix Game – Examples

Matching Pennies

Rrow = [  1  −1 ]        Rcol = [ −1   1 ]
       [ −1   1 ]               [  1  −1 ]

  • This is a zero-sum matrix game.

Coordination Game

Rrow = [ 2  0 ]        Rcol = [ 2  0 ]
       [ 0  2 ]               [ 0  2 ]

  • This is a general-sum matrix game.
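In code, the two example games look as follows. This is an illustration added here, not from the slides; in particular, the off-diagonal coordination payoffs of 0 are an assumption, since only the diagonal entries survive in the source.

```python
import numpy as np

# Matching Pennies: the column player's payoffs are the negation of the
# row player's, which is exactly the zero-sum property.
R_row_mp = np.array([[1, -1], [-1, 1]])
R_col_mp = np.array([[-1, 1], [1, -1]])
assert np.all(R_row_mp + R_col_mp == 0)  # zero-sum check

# Coordination Game: both players receive the same payoff, and both
# "coordinate on action 1" and "coordinate on action 2" look attractive.
# The off-diagonal zeros are an assumption made for this illustration.
R_row_cg = np.array([[2, 0], [0, 2]])
R_col_cg = R_row_cg.copy()
```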
SLIDE 7

Matrix Games – Solving

  • No optimal opponent-independent strategies.
  • Mixed (i.e., stochastic) strategies do not help.
  • Opponent-dependent strategies:

Definition 1 For a game, define the best-response function for player i, BRi(σ−i), to be the set of all, possibly mixed, strategies that are optimal given that the other player(s) play the possibly mixed joint strategy σ−i.
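As a concrete aside (added here for illustration; the function and variable names are not from the slides), the pure strategies in BRi(σ−i) can be read off the payoff matrix, and any mixture over them is also a best response:

```python
import numpy as np

def pure_best_responses(R_i, sigma_opp):
    """Pure actions of player i that maximize expected payoff against the
    opponent's mixed strategy sigma_opp, where R_i[a_i, a_opp] is player
    i's payoff. Any mixture over these actions is also in BR_i."""
    expected = R_i @ sigma_opp  # expected payoff of each pure action
    return np.flatnonzero(np.isclose(expected, expected.max()))
```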

SLIDE 8

Matrix Games – Solving

  • Best-response equilibrium [Nash, 1950],

Definition 2 A Nash equilibrium is a collection of strategies (possibly mixed) for all players, σi, with σi ∈ BRi(σ−i).

  • Example Games:

– Matching Pennies: both players play each action with equal probability (see the check below).
– Coordination Game: both players play action 1 or both players play action 2.
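A quick numeric check, added here for illustration, that the uniform strategy is an equilibrium of Matching Pennies: against a uniform opponent, both of a player's pure actions earn the same expected payoff, so no deviation is profitable. The game is symmetric, so the same holds for the column player.

```python
import numpy as np

R_row = np.array([[1, -1], [-1, 1]])
sigma = np.array([0.5, 0.5])  # uniform strategy for the column player

# Expected payoff of each of the row player's pure actions against sigma.
payoffs = R_row @ sigma
assert np.allclose(payoffs, 0)  # both actions earn 0: no profitable deviation
```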

SLIDE 9

Stochastic Game Framework

Stochastic Games

  • Multiple State
  • Multiple Agent

MDPs

  • Single Agent
  • Multiple State

Matrix Games

  • Single State
  • Multiple Agent
SLIDE 10

Stochastic Game Framework

A stochastic game is a tuple (n, S, A1...n, T, R1...n), where:

  • n is the number of agents,
  • S is the set of states,
  • Ai is the set of actions available to agent i,

– A is the joint action space A1 × . . . × An,

  • T is the transition function S × A × S → [0, 1],
  • Ri is the reward function for the ith agent S × A → ℜ.

[Diagram: from state s, the joint action a = (a1, a2) yields reward Ri(s, a) for each agent i and moves the agents to state s′ with probability T(s, a, s′).]

SLIDE 11

Q-Learning for Zero-Sum Games: Minimax-Q [Littman, 1994]

  • Explicitly learn equilibrium policy.
  • Maintain Q value for state/joint-action pairs.
  • Update rule:

Q(s, a) ← (1 − α)Q(s, a) + α(r + γV(s′)),

where V(s′) = Value[ Q(s′, ā) : ā ∈ A ] is the minimax value of the matrix game formed by the Q values at s′.

  • Converges to the game’s equilibrium, with the usual assumptions.
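The Value[·] operator for the zero-sum case can be computed with the standard linear program: maximize the guaranteed payoff v over mixed strategies x. The sketch below, using scipy, is an illustration added here rather than code from the talk.

```python
import numpy as np
from scipy.optimize import linprog

def minimax_value(R):
    """Value of a zero-sum matrix game for the row player with payoffs
    R[i, j]: max over mixed x of min over columns j of sum_i x_i R[i, j]."""
    m, n = R.shape
    # Variables: the mixed strategy x (m entries) followed by the value v.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                # linprog minimizes, so minimize -v
    A_ub = np.hstack([-R.T, np.ones((n, 1))])   # v - x.R[:, j] <= 0 for every j
    b_ub = np.zeros(n)
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)  # strategy sums to 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]
```

For Matching Pennies, minimax_value(np.array([[1, -1], [-1, 1]])) returns 0, the value of the game.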

SLIDE 12

Q-Learning for General-Sum Games [Hu & Wellman, 1998]

  • Explicitly learn equilibrium policy.
  • Maintain n Q values for state/joint-action pairs.
  • Update rule:

Qi(s, a) ← (1 − α)Qi(s, a) + α(ri + γVi(s′)),

where Vi(s′) = Valuei[ Qk(s′, ā) : ā ∈ A, k = 1 . . . n ] is agent i’s payoff under a Nash equilibrium of the matrix game formed by all agents’ Q values at s′.
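In sketch form for two agents (an illustration added here, not Hu & Wellman's code): pure_nash_values is a naive helper invented for this sketch that only handles pure equilibria; real stage games may need a mixed-strategy solver, and which equilibrium gets selected is exactly where the trouble discussed next arises.

```python
import numpy as np

def pure_nash_values(R1, R2):
    """Naive equilibrium selector for this sketch: enumerate joint actions
    and return both agents' payoffs at the first pure Nash equilibrium of
    the bimatrix game (R1, R2)."""
    m, n = R1.shape
    for i in range(m):
        for j in range(n):
            if R1[i, j] >= R1[:, j].max() and R2[i, j] >= R2[i, :].max():
                return R1[i, j], R2[i, j]
    raise ValueError("no pure equilibrium; a mixed-strategy solver is needed")

def nash_q_update(Q1, Q2, s, a, r1, r2, s_next, alpha, gamma):
    """One Hu & Wellman style backup for two agents. Q1[s] and Q2[s] are
    payoff matrices over joint actions; a is a joint action (i, j)."""
    v1, v2 = pure_nash_values(Q1[s_next], Q2[s_next])
    Q1[s][a] = (1 - alpha) * Q1[s][a] + alpha * (r1 + gamma * v1)
    Q2[s][a] = (1 - alpha) * Q2[s][a] + alpha * (r2 + gamma * v2)
```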

Does this converge to an equilibrium?

SLIDE 13

Q-Learning for General-Sum Games

Assumption 1 A Nash equilibrium (π1(s), π2(s)) for all matrix games (Q1t(s), Q2t(s)), as well as (Q1∗(s), Q2∗(s)), satisfies one of the following properties:

1.) The equilibrium is a global optimum:

∀ρ, k  π1(s)Qk(s)π2(s) ≥ ρ1(s)Qk(s)ρ2(s)

2.) The equilibrium receives a higher payoff if the other agent deviates from the equilibrium strategy:

∀ρ  π1(s)Q1(s)π2(s) ≤ π1(s)Q1(s)ρ2(s)
    π1(s)Q2(s)π2(s) ≤ ρ1(s)Q2(s)π2(s)

SLIDE 14

Q-Learning for General-Sum Games

  • Proof depends on the update rule being a contraction mapping:

∀Qk  ||PktQk − PktQk∗|| ≤ γ||Qk − Qk∗||,

where PktQk(s) = rkt + γ Valuek[ Q(s′) ].

  • I.e., the update always moves Qk closer to Qk∗, the Q values of the equilibrium; a contraction would guarantee convergence to this fixed point. Unfortunately, this is not true under their stated assumption.

SLIDE 15

Counterexample

[Diagram: a three-state game; from s0 the agents move, with reward (0, 0), to the matrix game at s1, and from s1 to the absorbing state s2.]

Q∗(s0) = (γ(1 − ǫ), γ(1 − ǫ))

Q∗(s1) = [ (1, 1)            (1 − 2ǫ, 1 + ǫ) ]
         [ (1 + ǫ, 1 − 2ǫ)   (1 − ǫ, 1 − ǫ)  ]

Q∗(s2) = (0, 0)

Q∗ satisfies Property 2 of the Assumption.

SLIDE 16

Counterexample

[Diagram: the same game as on the previous slide.]

Q(s0) = (γ, γ)

Q(s1) = [ (1 + ǫ, 1 + ǫ)   (1 − ǫ, 1)        ]
        [ (1, 1 − ǫ)       (1 − 2ǫ, 1 − 2ǫ)  ]

Q(s2) = (0, 0)

||Q − Q∗|| = ǫ

Q satisfies Property 1 of the Assumption.

SLIDE 17

Counterexample

[Diagram: the same game as on the previous slides.]

Q(s0) = (γ, γ)

Q(s1) = [ (1 + ǫ, 1 + ǫ)   (1 − ǫ, 1)        ]
        [ (1, 1 − ǫ)       (1 − 2ǫ, 1 − 2ǫ)  ]

Q(s2) = (0, 0)

PQ(s0) = (γ(1 + ǫ), γ(1 + ǫ))

PQ(s1) = [ (1, 1)            (1 − 2ǫ, 1 + ǫ) ]
         [ (1 + ǫ, 1 − 2ǫ)   (1 − ǫ, 1 − ǫ)  ]

PQ(s2) = (0, 0)

||PQ − PQ∗|| = 2γǫ > ǫ (whenever γ > 1/2), even though ||Q − Q∗|| = ǫ: the update moves Q farther from Q∗, so it is not a contraction.
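The gap can be checked numerically. The sketch below, added here for illustration, hardcodes the equilibrium values identified on the slides and picks concrete values for ǫ and γ, which the slides leave symbolic.

```python
# Numeric check of the counterexample; eps and gamma are illustrative choices.
eps, gamma = 0.1, 0.9

# Equilibrium values of the matrix game at s1, as identified on the slides:
# Q*(s1)'s Property-2 equilibrium pays (1 - eps, 1 - eps), while Q(s1)'s
# Property-1 (global optimal) equilibrium pays (1 + eps, 1 + eps).
v_star_s1 = 1 - eps
v_s1 = 1 + eps

# The operator P backs up the discounted equilibrium value into s0.
PQ_star_s0 = gamma * v_star_s1   # = gamma * (1 - eps)
PQ_s0 = gamma * v_s1             # = gamma * (1 + eps)

dist_before = eps                      # ||Q - Q*||, from the slides
dist_after = abs(PQ_s0 - PQ_star_s0)   # = 2 * gamma * eps

# A gamma-contraction would require dist_after <= gamma * dist_before;
# here the distance instead grows whenever gamma > 1/2.
assert dist_after > dist_before
print(dist_before, dist_after)         # 0.1  0.18
```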

SLIDE 18

Proof Flaw

  • The proof of the Lemma handles the following cases:

– When Q∗(s) meets Property 1 of the Assumption.
– When Q(s) meets Property 2 of the Assumption.

                           Q∗(s) meets
                      Property 1   Property 2
Q(s) meets Property 1      X
           Property 2      X            X

  • It fails to handle the case where Q∗(s) meets Property 2 and Q(s) meets Property 1.

– This is the case of the counterexample.

SLIDE 19

Strengthening the Assumption

Easy Answer: rule out the unhandled case.

Assumption 2 The Nash equilibria of all matrix games Qt(s), as well as Q∗(s), must satisfy Property 1 of Assumption 1, OR the Nash equilibria of all matrix games Qt(s), as well as Q∗(s), must satisfy Property 2 of Assumption 1.

SLIDE 20

Discussion: Applicability of the Theorem

  • Qt satisfies the assumption ⇏ Qt+1 satisfies the assumption.

– Problem with their original assumption.
– Magnified by the further restrictions of the new assumption.

  • All Qt values must satisfy the same property as the unknown Q∗.

These limitations prevent a real guarantee of convergence.

SLIDE 21

Discussion: Other Issues

Why is convergence in general-sum games difficult?

  • Short answer: small changes in Q values can cause a large change in the state’s equilibrium value.
  • But some general-sum games are “easy”:

– Fully collaborative (Ri = Rj ∀i, j) [Claus & Boutilier, 1998]
– Iterated dominance solvable [Fudenberg & Levine, 1999]

  • Other general-sum games are also “easy”:

– Even games with multiple equilibria.
– See paper.

SLIDE 22

Conclusion

There is still much work to be done on learning equilibria in general-sum games.

Thanks to Manuela Veloso, Nicolas Meuleau, and Leslie Kaelbling for helpful discussions and ideas.