
Multi-agent learning: Satisficing play

Gerard Vreeswijk, Intelligent Software Systems, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands. Tuesday 16th June, 2020.


Assumptions in game playing: what if none of these assumptions holds?

■ Players don’t know the structure of the game.
● Players don’t know who’s playing.
● Players don’t know the arsenal of other players.
■ Players can’t observe other players’ actions.
■ Players can’t observe other players’ payoffs.
■ Players aren’t aware that they are in a game.

This takes the problem out of game theory into machine learning. What can we do?

■ Reinforcement learning. Disadvantages:
● No reference to past average payoffs.
● Difficult theory.

Alternative:
■ Satisficing learning.

Herbert A. Simon on maximising vs. satisficing

“A decision maker who chooses the best available alternative according to some criteria is said to optimise; one who chooses an alternative that meets or exceeds specified criteria, but that is not guaranteed to be either unique or in any sense the best, is said to satisfice.”
— H. Simon, “Models of bounded rationality”, in: Empirically Grounded Economic Reason, Vol. 3. MIT Press, 1997.

“Decision makers can satisfice either by finding optimum solutions for a simplified world, or by finding satisfactory solutions for a more realistic world. Neither approach, in general, dominates the other, and both have continued to co-exist in the world of management science.”
— H. Simon, “Rational decision making in business organizations”, in: The American Economic Review, Vol. 69(4), pp. 493–513.

Karandikar et al.’s algorithm for satisficing play (1998)

Satisficing algorithm

■ At any time t, the agent’s state is a tuple (A_t, α_t).
● A_t is the current action.
● α_t is the current aspiration level, updated as

    α_{t+1} =_Def λ·α_t + (1 − λ)·π_t,

  where λ is the persistence rate and π_t is the payoff received in round t.
● It is up to the programmer to choose an initial action A_0 and an initial aspiration level α_0.
■ Satisficing rule:

    A_{t+1} = A_t                if π_t ≥ α_t,
              any other action   otherwise.

This also works if “any other action” is replaced by “any action”. A minimal implementation is sketched below.
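To make the state, the aspiration update, and the switching rule concrete, here is a minimal sketch in Python. The class and parameter names are illustrative (not from the slides); it assumes an agent with at least two actions.

```python
import random

class SatisficingAgent:
    """Satisficing learner: keep the current action while the round's
    payoff meets the aspiration level; otherwise switch. Aspirations
    track payoffs via an exponential moving average with persistence
    rate lambda. A sketch; names are illustrative, not from the slides."""

    def __init__(self, actions, initial_action, initial_aspiration,
                 persistence=0.5):
        assert len(actions) >= 2, "switching needs at least two actions"
        self.actions = list(actions)
        self.action = initial_action          # A_t
        self.aspiration = initial_aspiration  # alpha_t
        self.persistence = persistence        # lambda, the persistence rate

    def choose(self):
        """Play the current action A_t."""
        return self.action

    def update(self, payoff):
        """Observe payoff pi_t and move to state (A_{t+1}, alpha_{t+1})."""
        if payoff < self.aspiration:
            # Dissatisfied: switch to any *other* action (the slides note
            # that switching to "any action" also works).
            self.action = random.choice(
                [a for a in self.actions if a != self.action])
        # alpha_{t+1} = lambda * alpha_t + (1 - lambda) * pi_t
        self.aspiration = (self.persistence * self.aspiration
                           + (1 - self.persistence) * payoff)
```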

Example of satisficing play

Game: prisoner’s dilemma. Strategy player 1: tit-for-tat (TFT). Strategy player 2: satisficing with initial state (A_0, α_0) = (C, 5) and persistence rate λ = 0.5.

  t   TFT   A_t   π_t   α_t
  0    C     C     3    5
  1    C     D     5    4
  2    D     D     1    4.5
  3    D     C     0    2.75
  4    C     D     5    1.375
  5    D     D     1    3.1875
  6    D     C     0    2.09375
  7    C     D     5    1.046875
  8    D     D     1    3.0234375
  9    D     C     0    2.01171875
 10    C     D     5

[Figure: “Progress of aspirations” — α_t plotted against t for t = 0, …, 10, on a vertical axis from 1 to 5.]

From round 1 onwards the satisficing player settles into a three-round cycle D, D, C: defecting against TFT’s C yields payoff 5, which meets the aspiration, so D is repeated; mutual defection then yields 1, which falls short, so the player switches to C; the resulting sucker payoff 0 again falls short, so the player switches back to D. The simulation below reproduces the table.
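As a check, the run can be reproduced programmatically. The sketch below assumes the standard prisoner’s-dilemma payoffs T = 5, R = 3, P = 1, S = 0; these are not stated on the slides, but they are consistent with the π_t column of the table.

```python
# Tit-for-tat vs. a satisficing learner in the iterated prisoner's dilemma.
# Payoffs keyed by (own action, opponent action); the values T=5, R=3,
# P=1, S=0 are an assumption consistent with the pi_t column above.
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def simulate(rounds=11, persistence=0.5, action='C', aspiration=5.0):
    tft = 'C'  # tit-for-tat opens with cooperation
    print(f"{'t':>2} {'TFT':>3} {'A_t':>3} {'pi_t':>4}  alpha_t")
    for t in range(rounds):
        payoff = PAYOFF[(action, tft)]        # satisficer's payoff pi_t
        print(f"{t:>2} {tft:>3} {action:>3} {payoff:>4}  {aspiration:.10g}")
        tft = action                          # TFT copies the last move seen
        if payoff < aspiration:               # dissatisfied: switch C <-> D
            action = 'D' if action == 'C' else 'C'
        # alpha_{t+1} = lambda * alpha_t + (1 - lambda) * pi_t
        aspiration = persistence * aspiration + (1 - persistence) * payoff

simulate()
```

With only two actions, “any other action” is deterministic, so the run matches the table exactly; with λ = 0.5 the aspiration is simply the average of its previous value and the latest payoff.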
