Multi-agent learning Satising pla y Gerard Vreeswijk , Intelligent - PowerPoint PPT Presentation

What if none of these assumptions holds? ■ Players don’t know the This takes the problem out of structure of the game. game theory into machine learning. ● Players don’t know who’s What can we do? playing. ■ Reinforcement learning. ● Players don’t know the arsenal of other players. Disadvantages: ■ Players can’t observe other ● No reference to past player’s actions. average payoffs. ■ Players can’t observe other ● Difficult theory. player’s payoffs. Alternative: ■ Players aren’t aware that they ■ Satisficing learning. are in a game. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 3

Herbert A. Simon on maximising vs. satisficing Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 4

Herbert A. Simon on maximising vs. satisficing “A decision maker who chooses the best available alternative ac- cording to some criteria is said to optimise; one who chooses an alternative that meets or exceeds specified criteria, but that is not guaranteed to be either unique or in any sense the best, is said to satisfice .” a a H. Simon “Models of bounded rationality” in: Empirically grounded economic reasons , Vol. 3. MIT Press, 1997. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 4

Herbert A. Simon on maximising vs. satisficing “A decision maker who chooses the best available alternative ac- cording to some criteria is said to optimise; one who chooses an alternative that meets or exceeds specified criteria, but that is not guaranteed to be either unique or in any sense the best, is said to satisfice .” a a H. Simon “Models of bounded rationality” in: Empirically grounded economic reasons , Vol. 3. MIT Press, 1997. “Decision makers can satisfice either by finding optimum solutions for a simplified world, or by finding satisfactory solutions for a more realistic world. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 4

Herbert A. Simon on maximising vs. satisficing “A decision maker who chooses the best available alternative ac- cording to some criteria is said to optimise; one who chooses an alternative that meets or exceeds specified criteria, but that is not guaranteed to be either unique or in any sense the best, is said to satisfice .” a a H. Simon “Models of bounded rationality” in: Empirically grounded economic reasons , Vol. 3. MIT Press, 1997. “Decision makers can satisfice either by finding optimum solutions for a simplified world, or by finding satisfactory solutions for a more realistic world. Neither approach, in general, domi- nates the other, and both have continued to co-exist in the world of management science.” Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 4

Herbert A. Simon on maximising vs. satisficing “A decision maker who chooses the best available alternative ac- cording to some criteria is said to optimise; one who chooses an alternative that meets or exceeds specified criteria, but that is not guaranteed to be either unique or in any sense the best, is said to satisfice .” a a H. Simon “Models of bounded rationality” in: Empirically grounded economic reasons , Vol. 3. MIT Press, 1997. “Decision makers can satisfice either by finding optimum solutions for a simplified world, or by finding satisfactory solutions for a more realistic world. Neither approach, in general, domi- nates the other, and both have continued to co-exist in the world of management science.” a a H. Simon “Rational decision making in business organizations” in: The American Economic Review, Vol. 69(4), pp. 493-513. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 4

Karandikar et al. ’s algorithm for satisficing play (1989) Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 5

aspiration level p ersisten e rate Satisficing algorithm Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 6

aspiration level p ersisten e rate Satisficing algorithm ■ At any time, t , the agent’s state is a tuple ( A t , α t ) . Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 6

aspiration level p ersisten e rate Satisficing algorithm ■ At any time, t , the agent’s state is a tuple ( A t , α t ) . A t is the current action. ● Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 6

p ersisten e rate Satisficing algorithm ■ At any time, t , the agent’s state is a tuple ( A t , α t ) . A t is the current action. ● ● α t is the current level . aspiration Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 6

p ersisten e rate Satisficing algorithm ■ At any time, t , the agent’s state is a tuple ( A t , α t ) . A t is the current action. ● ● α t is the current level . Updated as aspiration α t + 1 = Def λα t + ( 1 − λ ) π t , Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 6

Satisficing algorithm ■ At any time, t , the agent’s state is a tuple ( A t , α t ) . A t is the current action. ● ● α t is the current level . Updated as aspiration α t + 1 = Def λα t + ( 1 − λ ) π t , where λ is the p ersisten e rate Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 6

Satisficing algorithm ■ At any time, t , the agent’s state is a tuple ( A t , α t ) . A t is the current action. ● ● α t is the current level . Updated as aspiration α t + 1 = Def λα t + ( 1 − λ ) π t , where λ is the rate , and π t is the payoff in round t . p ersisten e Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 6

Satisficing algorithm ■ At any time, t , the agent’s state is a tuple ( A t , α t ) . A t is the current action. ● ● α t is the current level . Updated as aspiration α t + 1 = Def λα t + ( 1 − λ ) π t , where λ is the rate , and π t is the payoff in round t . p ersisten e ● It is up to the programmer to choose an initial action A 0 , and an initial aspiration level α 0 . Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 6

Satisficing algorithm ■ At any time, t , the agent’s state is a tuple ( A t , α t ) . A t is the current action. ● ● α t is the current level . Updated as aspiration α t + 1 = Def λα t + ( 1 − λ ) π t , where λ is the rate , and π t is the payoff in round t . p ersisten e ● It is up to the programmer to choose an initial action A 0 , and an initial aspiration level α 0 . ■ Satisficing algorithm: � A t if π t ≥ α t , A t + 1 = any other action else. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 6

Satisficing algorithm ■ At any time, t , the agent’s state is a tuple ( A t , α t ) . A t is the current action. ● ● α t is the current level . Updated as aspiration α t + 1 = Def λα t + ( 1 − λ ) π t , where λ is the rate , and π t is the payoff in round t . p ersisten e ● It is up to the programmer to choose an initial action A 0 , and an initial aspiration level α 0 . ■ Satisficing algorithm: � A t if π t ≥ α t , A t + 1 = any other action else. Also works if “any other action” is replaced by “any action”. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 6

Example of satisficing play Game: prisoner’s dilemma. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t Α 5 4 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 Α 5 4 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C Α 5 4 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C Α 5 4 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 Α 5 4 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 4 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 4 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C 4 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 4 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 C t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 C D t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 C D 5 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 C D 5 1.046875 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 C D 5 1.046875 8 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 C D 5 1.046875 8 D t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 C D 5 1.046875 8 D D t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 C D 5 1.046875 8 D D 1 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 C D 5 1.046875 8 D D 1 3.0234375 t 0 2 4 6 8 10 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 C D 5 1.046875 8 D D 1 3.0234375 t 0 2 4 6 8 10 9 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 C D 5 1.046875 8 D D 1 3.0234375 t 0 2 4 6 8 10 9 D Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 C D 5 1.046875 8 D D 1 3.0234375 t 0 2 4 6 8 10 9 D C Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 C D 5 1.046875 8 D D 1 3.0234375 t 0 2 4 6 8 10 9 D C 0 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 C D 5 1.046875 8 D D 1 3.0234375 t 0 2 4 6 8 10 9 D C 0 2.01171875 Progress of aspirations. Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 C D 5 1.046875 8 D D 1 3.0234375 t 0 2 4 6 8 10 9 D C 0 2.01171875 Progress of aspirations. 10 Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 C D 5 1.046875 8 D D 1 3.0234375 t 0 2 4 6 8 10 9 D C 0 2.01171875 Progress of aspirations. 10 C Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 C D 5 1.046875 8 D D 1 3.0234375 t 0 2 4 6 8 10 9 D C 0 2.01171875 Progress of aspirations. 10 C D Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Example of satisficing play Game: prisoner’s dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state ( A 0 , α 0 ) = ( C , 5 ) . Persistence rate: λ = 0.5. t TFT A t π t α t 0 C C 3 5 Α 5 1 C D 5 4 4 2 D D 1 4.5 3 D C 0 2.75 3 4 C D 5 1.375 5 D D 1 3.1875 2 6 D C 0 2.09375 1 7 C D 5 1.046875 8 D D 1 3.0234375 t 0 2 4 6 8 10 9 D C 0 2.01171875 Progress of aspirations. 10 C D 5 Author: Gerard Vreeswijk. Slides last modified on June 16 th , 2020 at 12:21 Multi-agent learning: Satisficing play, slide 7

Multi-agent learning Satising pla y Gerard Vreeswijk , Intelligent - PowerPoint PPT Presentation

Multi-agent learning Satising pla y Gerard Vreeswijk , Intelligent Software Systems, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands. Tuesday 16 th June, 2020 Assumptions in game playing Author: Gerard

Multi-agent learning Multi-agent reinforcement learning Gerard Vreeswijk , Intelligent Systems

Overview Multi-Agent Systems Introduction to multi-agent systems and agent societies Agent

Multi-agent learning Gerard Vreeswijk , Intelligent Systems Group, Computer Science Department,

REINFORCEMENT LEARNING IN MULTI-AGENT SYSTEMS MACHINE LEARNING MEETUP DR. ANA PELETEIRO

An Agent Architecture An Agent Architecture An Agent Architecture An Agent Architecture for

S S S S calable calable Agent calable calable Agent Agent Plat forms Agent Plat forms

Agent-Based Systems Agent communication Speech act theory Michael Rovatsos Agent

Multi-agent learning Simplied Poker Yannick Bitane , April 14th, 2011. Yannick Bitane. Slides

Learning Agent Learning Agents An Agent that observes its performance and adapts its

ROMA: Multi-Agent Reinforcement Learning with Emerging Roles Tonghan Wang, Heng Dong, Victor

The Player Agent The Player Agent Are they the most important league official right now? right

Rational Agents (Ch. 2) Rational agent An agent/robot must be able to perceive and interact with

Agent-Based Systems Michael Rovatsos mrovatso@inf.ed.ac.uk Lecture 6 Agent Communication 1

Agent Training Welcome Blues Agent Portal Training e-Learning on the BCBSM Agent Portal

W HAT S AN A GENT ? Weiss, p. 29 [after Wooldridge and Jennings]: An agent is a

M ULTI -A GENT S YSTEMS Overview and Research Directions Whats an agent? AI Class 12 (C H .

WELCOME! DTRF Western Regional Patient Meeting South San Francisco, April 13, 2019 Our Mission

Protocol to Patient (P2P) Ghulam Warsi 1 , Kert Viele 2 , Lebedinsky Claudia 1 , , Parasuraman

Cancer treatment in Mexico and the Seguro Popular August, 2012 1 Contents 1. Cancer as a public

Rx: Medical Neighborhoods The Medical Legal Partnership and Social Determinants of Health

Bounded rationality in the description of research excellence in Early Career Researchers in the

On Satisficing Planning with Admissible Empirical Evaluation Heuristics Discussion Summary and

Know ledge-Based Systems IS430 Decision Making, Systems, Modeling, and Support Mostafa Z. Ali

An Overview of the International Planning Competition Part 1: Classical Tracks Amanda Coles 1