LECTURE 6: MULTIAGENT INTERACTIONS
An Introduction to MultiAgent Systems
http://www.csc.liv.ac.uk/~mjw/pubs/imas
MultiAgent Systems
- A multiagent system contains a number of agents…
- …which interact through communication…
- …are able to act in an environment…
- …have different "spheres of influence" (which may coincide)…
- …will be linked by other (organizational) relationships

Utilities and Preferences
- Assume we have just two agents: Ag = {i, j}
- Agents are assumed to be self-interested: they have preferences over how the environment is
- Assume Ω = {ω1, ω2, …} is the set of "outcomes" that agents have preferences over
- We capture preferences by utility functions:
    u_i : Ω → ℝ
    u_j : Ω → ℝ
- Utility functions lead to preference orderings over outcomes:
    ω ≽_i ω′ means u_i(ω) ≥ u_i(ω′)
    ω ≻_i ω′ means u_i(ω) > u_i(ω′)

What is Utility?
- Utility is not money (but it is a useful analogy)
- Typical relationship between utility and money (figure not reproduced here)

Multiagent Encounters
- We need a model of the environment in which these agents will act…
- Agents simultaneously choose an action to perform, and as a result of the actions they select, an outcome in Ω will result
- The actual outcome depends on the combination of actions
- Assume each agent has just two possible actions that it can perform: C ("cooperate") and D ("defect")
- Environment behavior is given by a state transformer function mapping the pair of chosen actions to an outcome: τ : Ac × Ac → Ω
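These definitions can be made concrete with a short sketch in Python. The outcome names, the particular mapping in `tau`, and the utility values below are illustrative assumptions, not from the slides; the point is only the shape of the model: a state transformer function from action pairs to outcomes, and utility functions that induce preference orderings.

```python
# Hypothetical two-agent encounter: actions C/D, outcomes w1..w4 (assumed names).
ACTIONS = ("C", "D")

def tau(a_i, a_j):
    """State transformer: a pair of actions determines an outcome in Omega."""
    return {("C", "C"): "w1", ("C", "D"): "w2",
            ("D", "C"): "w3", ("D", "D"): "w4"}[(a_i, a_j)]

u_i = {"w1": 3, "w2": 1, "w3": 4, "w4": 2}  # agent i's utility function (assumed values)
u_j = {"w1": 3, "w2": 4, "w3": 1, "w4": 2}  # agent j's utility function (assumed values)

def prefers_i(w, w_prime):
    """w >_i w' means u_i(w) > u_i(w')."""
    return u_i[w] > u_i[w_prime]

print(tau("C", "C"))          # outcome of mutual cooperation
print(prefers_i("w3", "w1"))  # does i strictly prefer w3 to w1?
```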

Multiagent Encounters (continued)
- Here is a state transformer function (mapping not reproduced here): this environment is sensitive to the actions of both agents.
- Here is another: neither agent has any influence in this environment.
- And here is another: this environment is controlled by j.

Rational Action
- Suppose we have the case where both agents can influence the outcome, and they have utility functions as follows (values not reproduced here)
- With a bit of abuse of notation, agent i's preferences can then be written directly over action pairs
- "C" is the rational choice for i, because i prefers all outcomes that arise through C over all outcomes that arise through D

Payoff Matrices
- We can characterize the previous scenario in a payoff matrix
- Agent i is the column player; agent j is the row player

Dominant Strategies
- Given any particular strategy (either C or D) of agent i, there will be a number of possible outcomes
- We say s1 dominates s2 if every outcome possible by i playing s1 is preferred over every outcome possible by i playing s2
- A rational agent will never play a dominated strategy
- So in deciding what to do, we can delete dominated strategies
- Unfortunately, there isn't always a unique undominated strategy

Competitive and Zero-Sum Interactions
- Where the preferences of agents are diametrically opposed, we have strictly competitive scenarios
- Zero-sum encounters are those where utilities sum to zero: u_i(ω) + u_j(ω) = 0 for all ω ∈ Ω
- Zero sum implies strictly competitive
- Zero-sum encounters in real life are very rare… but people tend to act in many scenarios as if they were zero sum

Nash Equilibrium
- In general, we will say that two strategies s1 and s2 are in Nash equilibrium if:
  1. under the assumption that agent i plays s1, agent j can do no better than play s2; and
  2. under the assumption that agent j plays s2, agent i can do no better than play s1.
- Neither agent has any incentive to deviate from a Nash equilibrium
- Unfortunately:
  1. Not every interaction scenario has a Nash equilibrium
  2. Some interaction scenarios have more than one Nash equilibrium
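Both dominance and Nash equilibrium can be checked mechanically for a 2 x 2 game. A minimal sketch in Python, assuming illustrative payoff values (with these particular numbers, D strictly dominates C for agent i, and (D, D) is the unique pure Nash equilibrium):

```python
# Payoffs as (u_i, u_j), indexed by (i's action, j's action). Values are assumptions.
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (1, 4),
           ("D", "C"): (4, 1), ("D", "D"): (2, 2)}
ACTIONS = ("C", "D")

def dominates_i(s1, s2):
    """s1 dominates s2 for agent i if every outcome possible by playing s1
    is preferred to every outcome possible by playing s2."""
    return all(PAYOFFS[(s1, b)][0] > PAYOFFS[(s2, b)][0] for b in ACTIONS)

def is_nash(s_i, s_j):
    """(s_i, s_j) is a Nash equilibrium if neither agent can do better
    by unilaterally deviating."""
    best_i = all(PAYOFFS[(s_i, s_j)][0] >= PAYOFFS[(d, s_j)][0] for d in ACTIONS)
    best_j = all(PAYOFFS[(s_i, s_j)][1] >= PAYOFFS[(s_i, d)][1] for d in ACTIONS)
    return best_i and best_j

print(dominates_i("D", "C"))  # D dominates C for i under these payoffs
print(is_nash("D", "D"))      # mutual defection is a Nash equilibrium
```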

The Prisoner's Dilemma
- Two men are collectively charged with a crime and held in separate cells, with no way of meeting or communicating. They are told that:
  - if one confesses and the other does not, the confessor will be freed, and the other will be jailed for three years
  - if both confess, then each will be jailed for two years
- Both prisoners know that if neither confesses, then they will each be jailed for one year

The Prisoner's Dilemma
- Payoff matrix for the prisoner's dilemma:
  - Top left: if both defect, then both get the punishment for mutual defection
  - Top right: if i cooperates and j defects, i gets the sucker's payoff of 1, while j gets 4
  - Bottom left: if j cooperates and i defects, j gets the sucker's payoff of 1, while i gets 4
  - Bottom right: the reward for mutual cooperation

The Prisoner's Dilemma
- The individually rational action is to defect: this guarantees a payoff of no worse than 2, whereas cooperating guarantees a payoff of at most 1
- So defection is the best response to all possible strategies: both agents defect, and get payoff = 2
- But intuition says this is not the best outcome: surely they should both cooperate and each get a payoff of 3!

The Prisoner's Dilemma
- This apparent paradox is the fundamental problem of multi-agent interactions. It appears to imply that cooperation will not occur in societies of self-interested agents.
- Real-world examples:
  - nuclear arms reduction ("why don't I keep mine…")
  - free rider systems: public transport; in the UK, television licenses
- The prisoner's dilemma is ubiquitous.
- Can we recover cooperation?

Arguments for Recovering Cooperation
- Conclusions that some have drawn from this analysis:
  - the game-theoretic notion of rational action is wrong!
  - somehow the dilemma is being formulated wrongly
- Arguments to recover cooperation:
  - We are not all Machiavelli! (Hurrah!)
  - The other prisoner is my twin!
  - The shadow of the future…

The Iterated Prisoner's Dilemma
- One answer: play the game more than once
- If you know you will be meeting your opponent again, then the incentive to defect appears to evaporate
- Cooperation is the rational choice in the infinitely repeated prisoner's dilemma
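The "defect guarantees no worse than 2" argument is a worst-case (maximin) computation, sketched below using the conventional payoff values from the matrix description (4 = temptation, 3 = reward, 2 = punishment, 1 = sucker's payoff); the function name is illustrative:

```python
# Prisoner's dilemma payoffs for agent i, indexed by (i's action, j's action).
PD = {("C", "C"): 3, ("C", "D"): 1, ("D", "C"): 4, ("D", "D"): 2}

def guaranteed(action):
    """Worst-case (maximin) payoff for i when playing `action`,
    whatever the other agent does."""
    return min(PD[(action, other)] for other in ("C", "D"))

print(guaranteed("D"))  # defecting guarantees a payoff of no worse than 2
print(guaranteed("C"))  # cooperating guarantees a payoff of at most 1
```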

Backwards Induction
- But… suppose you both know that you will play the game exactly n times
- On round n - 1, you have an incentive to defect, to gain that extra bit of payoff…
- But this makes round n - 2 the last "real" round, and so you have an incentive to defect there, too
- This is the backwards induction problem
- Playing the prisoner's dilemma with a fixed, finite, pre-determined, commonly known number of rounds, defection is the best strategy

Axelrod's Tournament
- Suppose you play the iterated prisoner's dilemma against a range of opponents… What strategy should you choose, so as to maximize your overall payoff?
- Axelrod (1984) investigated this problem, with a computer tournament for programs playing the prisoner's dilemma

Strategies in Axelrod's Tournament
- ALLD: "always defect", the hawk strategy
- TIT-FOR-TAT: on round u = 0, cooperate; on round u > 0, do what your opponent did on round u - 1
- TESTER: on the 1st round, defect; if the opponent retaliated, then play TIT-FOR-TAT; otherwise intersperse cooperation and defection
- JOSS: as TIT-FOR-TAT, except periodically defect

Recipes for Success in Axelrod's Tournament
- Axelrod suggests the following rules for succeeding in his tournament:
  1. Don't be envious: don't play as if it were zero sum!
  2. Be nice: start by cooperating, and reciprocate cooperation
  3. Retaliate appropriately: always punish defection immediately, but use "measured" force; don't overdo it
  4. Don't hold grudges: always reciprocate cooperation immediately

Game of Chicken
- Consider another type of encounter: the game of chicken. (Think of James Dean in Rebel Without a Cause: swerving = cooperate, driving straight = defect.)
- Difference to the prisoner's dilemma: mutual defection is the most feared outcome. (Whereas the sucker's payoff is most feared in the prisoner's dilemma.)
- Strategies (C, D) and (D, C) are in Nash equilibrium

Other Symmetric 2 x 2 Games
- Given the 4 possible outcomes of (symmetric) cooperate/defect games, there are 24 possible orderings on outcomes, e.g.:
  - CC ≻_i CD ≻_i DC ≻_i DD: cooperation dominates
  - DC ≻_i DD ≻_i CC ≻_i CD: deadlock (you will always do best by defecting)
  - DC ≻_i CC ≻_i DD ≻_i CD: prisoner's dilemma
  - DC ≻_i CC ≻_i CD ≻_i DD: chicken
  - CC ≻_i DC ≻_i DD ≻_i CD: stag hunt
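The tournament strategies above can be sketched as simple functions from the game history so far to a move. This is a toy reconstruction, not Axelrod's actual tournament code: the payoff values are the conventional ones, and JOSS's defection period is an assumption (the slides only say "periodically").

```python
# Payoffs as (player_a, player_b), indexed by (a's action, b's action).
PD = {("C", "C"): (3, 3), ("C", "D"): (1, 4),
      ("D", "C"): (4, 1), ("D", "D"): (2, 2)}

def alld(my_hist, opp_hist):
    """ALLD: always defect (the hawk strategy)."""
    return "D"

def tit_for_tat(my_hist, opp_hist):
    """Cooperate on round 0; thereafter copy the opponent's previous move."""
    return "C" if not opp_hist else opp_hist[-1]

def joss(my_hist, opp_hist, period=6):
    """As TIT-FOR-TAT, except periodically defect (period is an assumption)."""
    if len(my_hist) % period == period - 1:
        return "D"
    return tit_for_tat(my_hist, opp_hist)

def play(strat_a, strat_b, rounds=10):
    """Iterate the prisoner's dilemma, returning cumulative payoffs."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a = strat_a(hist_a, hist_b)
        b = strat_b(hist_b, hist_a)
        pa, pb = PD[(a, b)]
        hist_a.append(a)
        hist_b.append(b)
        score_a += pa
        score_b += pb
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat, 10))  # mutual cooperation throughout
print(play(alld, tit_for_tat, 10))         # ALLD exploits only the first round
```

Note how the repeated game changes the picture: ALLD beats TIT-FOR-TAT head to head, but only by the one-round temptation payoff, while two TIT-FOR-TAT players sustain cooperation and score far higher overall.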
