
Game Theory (CS 188: Artificial Intelligence)



CS 188: Artificial Intelligence, Spring 2006
Lecture 26: Game Theory
4/25/2006
Dan Klein – UC Berkeley

Game Theory
- Game theory: study of strategic situations, usually simultaneous actions
- A game has:
  - Players
  - Actions
  - Payoff matrix
- Example: prisoner’s dilemma

  Prisoner’s Dilemma
                  A: Testify   A: Refuse
    B: Testify     -5,-5        -10,0
    B: Refuse       0,-10       -1,-1

Strategies
- Strategy = policy
- Pure strategy
  - Deterministic policy
  - In a one-move game, just a move
- Mixed strategy
  - Randomized policy
  - Ever good to use one?
- Strategy profile: a spec of one strategy per player
- Outcome: each strategy profile results in an (expected) number for each player

  Two-Finger Morra
                O: One    O: Two
    E: One      -2,2      3,-3
    E: Two      3,-3      -4,4

Dominance and Optimality
- Strategy Dominance:
  - A strategy s for A (strictly) dominates s’ if it produces a better outcome for A, for any B strategy
- Outcome Dominance:
  - An outcome o Pareto dominates o’ if all players prefer o to o’
  - An outcome is Pareto optimal if there is no outcome that all players would prefer

Equilibria
- In the prisoner’s dilemma:
  - What will A do?
  - What will B do?
  - What’s the dilemma?
- Both testifying is a (Nash) equilibrium
  - Neither player can benefit from a unilateral change in strategy
  - I.e., it’s a local optimum (not necessarily global)
- Nash showed that every game has such an equilibrium
  - Note: not every game has a dominant strategy equilibrium
- What do we have to change for the prisoners to refuse?
  - Change the payoffs
  - Consider repeated games
  - Limit the computational ability of the agents
- How would we model a “code of thieves”?

Coordination Games
- No dominant strategy
- But, two (pure) Nash equilibria
- What should agents do?
  - Can sometimes choose Pareto optimal Nash equilibrium
  - But may be ties!
- Naturally gives rise to communication
- Also: correlated equilibria

  Technology Choice
                  A: DVD    A: HD-DVD
    B: DVD        5,5       -2,-1
    B: HD-DVD     -2,-1     8,8

  Driving Direction
                  A: Left   A: Right
    B: Left       1,1       -1,-1
    B: Right      -1,-1     1,1
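The definitions of Nash equilibrium and Pareto optimality above can be checked mechanically on the slide tables. The following is a minimal sketch, not part of the lecture: the function names are hypothetical, and it assumes each payoff pair is written as (column player, row player), which matches the usual reading of the prisoner’s dilemma table (A is the column player) and of Morra (O is the column player).

```python
# Each game maps (row action, column action) -> (column payoff, row payoff).
# ASSUMPTION: payoff pairs in the slide tables are ordered (column, row).
PRISONERS_DILEMMA = {  # rows: B, columns: A
    ("Testify", "Testify"): (-5, -5),
    ("Testify", "Refuse"): (-10, 0),
    ("Refuse", "Testify"): (0, -10),
    ("Refuse", "Refuse"): (-1, -1),
}
TECHNOLOGY_CHOICE = {  # rows: B, columns: A
    ("DVD", "DVD"): (5, 5),
    ("DVD", "HD-DVD"): (-2, -1),
    ("HD-DVD", "DVD"): (-2, -1),
    ("HD-DVD", "HD-DVD"): (8, 8),
}
MORRA = {  # rows: E, columns: O
    ("One", "One"): (-2, 2),
    ("One", "Two"): (3, -3),
    ("Two", "One"): (3, -3),
    ("Two", "Two"): (-4, 4),
}


def pure_nash_equilibria(game):
    """Profiles where neither player gains from a unilateral change of action."""
    rows = {r for r, _ in game}
    cols = {c for _, c in game}
    equilibria = []
    for (r, c), (col_pay, row_pay) in game.items():
        # Column player deviates while the row action stays fixed.
        col_ok = all(game[(r, c2)][0] <= col_pay for c2 in cols)
        # Row player deviates while the column action stays fixed.
        row_ok = all(game[(r2, c)][1] <= row_pay for r2 in rows)
        if col_ok and row_ok:
            equilibria.append((r, c))
    return equilibria


def pareto_optimal(game):
    """Profiles for which no other outcome is strictly preferred by every player."""
    return [
        profile
        for profile, pay in game.items()
        if not any(
            all(o > p for o, p in zip(other, pay)) for other in game.values()
        )
    ]


if __name__ == "__main__":
    for name, game in [
        ("Prisoner's dilemma", PRISONERS_DILEMMA),
        ("Technology choice", TECHNOLOGY_CHOICE),
        ("Two-finger Morra", MORRA),
    ]:
        print(name, pure_nash_equilibria(game), pareto_optimal(game))
```

Under these assumptions the sketch reports both testifying as the only pure equilibrium of the prisoner’s dilemma (and not Pareto optimal), two pure equilibria for the technology choice game, and none for Morra.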

Mixed Strategy Games (Zero-Sum)
- What’s the Nash equilibrium?
  - No pure strategy equilibrium
  - Must look at mixed strategies
- Mixed strategies:
  - Distribution over actions per state
  - In a one-move game, a single distribution
  - For Morra, a single number p_even specifies the strategy
- How to choose the optimal mixed strategy?

  Two-Finger Morra
                O: One    O: Two
    E: One      -2,2      3,-3
    E: Two      3,-3      -4,4

Minimax Strategies
- Idea: force one player to choose and declare a strategy first
  - Say E reveals first
  - For each E strategy, O has a minimax response
  - Utility of the root favors O (why?) and is -3 (from E’s perspective)
  - If O goes first, root is 2 (for E)
  - If these two utilities matched, we would know the utility of the maximum equilibrium
  - Must look at mixed strategies

  E reveals first (leaf utilities for E):
    E plays 1: O plays 1 -> 2, O plays 2 -> -3
    E plays 2: O plays 1 -> -3, O plays 2 -> 4
  O reveals first (leaf utilities for E):
    O plays 1: E plays 1 -> 2, E plays 2 -> -3
    O plays 2: E plays 1 -> -3, E plays 2 -> 4

Continuous Minimax
- Imagine a minimax tree:
  - Instead of the two pure strategies, first player has infinitely many mixed ones
  - Note that second player should always respond with a pure strategy (why?)
- Here, can calculate the minimax (and maximin) values
  - Both are 1/12 (from O’s perspective)
  - Correspond to [7/12; 1, 5/12; 2] for both players

  E commits to [p;1, (1-p);2], then O responds (expected utilities for E):
    O plays 1 -> (2)(p)+(-3)(1-p)
    O plays 2 -> (-3)(p)+(4)(1-p)

Repeated Games
- What about repeated games?
  - E.g. repeated prisoner’s dilemma
  - Future responses, retaliation become an issue
  - Strategy can condition on past experience
- Repeated prisoner’s dilemma
  - A fixed number of games causes repeated betrayal
  - If agents are unsure of the number of future games, other options
  - E.g. perpetual punishment: silent until you’re betrayed, then testify thereafter
  - E.g. tit-for-tat: do what was done to you last round
- It’s enough for your opponent to believe you are incapable of remembering the number of games played (doesn’t actually matter whether the limitation really exists)

Partially Observed Games
- Much harder to analyze
  - You have to work with trees of belief states
  - Problem: you don’t know your opponent’s belief state!
- Newer techniques can solve some partially observable games
  - Mini-poker analysis shows, e.g., that bluffing can be a rational action
  - Randomization: not just for being unpredictable, also useful for minimizing what opponent can learn from your actions

The Ultimatum Game
- Game theory can reveal interesting issues in social psychology
- E.g. the ultimatum game
  - Proposer: receives $x, offers split $k / $(x-k)
  - Accepter: either
    - Accepts: gets $k, proposer gets $(x-k)
    - Rejects: neither gets anything
- Nash equilibrium?
  - Any strategy profile where proposer offers $k and accepter will accept $k or greater
  - But that’s not the interesting part…
- Issues:
  - Why do people tend to reject offers which are very unfair (e.g. $20 from $100)?
  - Irrationality?
  - Utility of $20 exceeded by utility of punishing the unfair proposer?
  - What about if x is very very large?
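The Morra minimax values from the slides above can be reproduced with a few lines of exact arithmetic. This is a rough sketch rather than the lecture’s own method: the function names are hypothetical, the matrix holds E’s payoffs (rows are E’s moves, columns are O’s), and the closed-form solution assumes a 2x2 zero-sum game with no pure-strategy equilibrium, so that each player’s best mixture makes the opponent indifferent between their two pure responses.

```python
from fractions import Fraction

# E's payoffs in two-finger Morra; rows = E plays One/Two, columns = O plays One/Two.
E_PAYOFF = [[2, -3], [-3, 4]]


def pure_maximin(m):
    """Root value when the row player (E) must reveal a pure move first."""
    return max(min(row) for row in m)


def pure_minimax(m):
    """Root value (for E) when the column player (O) must reveal a pure move first."""
    columns = list(zip(*m))
    return min(max(col) for col in columns)


def solve_2x2_zero_sum(m):
    """Return (p, q, value) for a 2x2 zero-sum game without a saddle point.

    p is the probability that the row player picks the first row, q the
    probability that the column player picks the first column, and value is
    the row player's expected payoff at equilibrium.
    ASSUMPTION: no pure equilibrium exists, so the denominator is nonzero
    and both probabilities fall strictly between 0 and 1.
    """
    (a, b), (c, d) = m
    denom = Fraction(a - b - c + d)
    # Row player: a*p + c*(1-p) == b*p + d*(1-p)  (column player indifferent)
    p = (d - c) / denom
    # Column player: a*q + b*(1-q) == c*q + d*(1-q)  (row player indifferent)
    q = (d - b) / denom
    value = a * p + c * (1 - p)
    return p, q, value


if __name__ == "__main__":
    print("E reveals a pure move first:", pure_maximin(E_PAYOFF))
    print("O reveals a pure move first:", pure_minimax(E_PAYOFF))
    p, q, value = solve_2x2_zero_sum(E_PAYOFF)
    print("E plays One with probability", p)
    print("O plays One with probability", q)
    print("Value for E:", value, "and for O:", -value)
```

With the slide’s payoffs, the pure-strategy roots are -3 and 2 for E, both players mix with probability 7/12 on One, and the game is worth -1/12 to E, that is 1/12 to O.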

Mechanism Design
- One use of game theory: mechanism design
  - Designing a game which induces desired behavior in rational agents
- E.g. avoiding tragedies of the commons
  - Classic example: farmers share a common pasture
  - Each chooses how many goats to graze
  - Adding a goat gains utility for that farmer
  - Adding a goat slightly degrades the pasture
  - Inevitable that each farmer will keep adding goats until the commons is destroyed (tragedy!)
- Classic solution: charge for use of the commons
  - Prices need to be set to produce the right behavior

Auctions
- Example: auctions
  - Consider auction for one item
  - Each bidder i has value v_i and bids b_i for item
- English auction: increasing bids
  - How should bidder i bid?
  - What will the winner pay?
  - Why is this not an optimal result?
- Sealed single-bid auction, highest pays bid
  - How should bidder i bid?
  - Why is bidding your value no longer dominant?
  - Why is this auction not optimal?
- Sealed single-bid second-price auction
  - How should bidder i bid?
  - Bid v_i – why?
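One way to explore the two sealed-bid questions above is a brute-force check over a small grid of bids. The sketch below is illustrative only: all function names are hypothetical, bidder 0 is the bidder under study, ties are assumed to go to the lowest-indexed bidder, and losing is assumed to yield zero utility.

```python
from itertools import product


def second_price_auction(bids):
    """Return (winner index, price): highest bid wins and pays the second-highest bid.

    ASSUMPTION: ties are broken in favor of the lowest index.
    """
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    return order[0], bids[order[1]]


def second_price_utility(value, my_bid, other_bids):
    """Utility of bidder 0 with private value `value` in a second-price auction."""
    winner, price = second_price_auction([my_bid] + list(other_bids))
    return value - price if winner == 0 else 0


def first_price_utility(value, my_bid, other_bids):
    """Utility of bidder 0 when the highest bidder pays their own bid."""
    wins = all(my_bid >= b for b in other_bids)  # ties go to bidder 0
    return value - my_bid if wins else 0


def truthful_is_weakly_dominant(value, grid, n_others=2):
    """Check on a finite grid that no bid ever beats bidding `value` (second price)."""
    for others in product(grid, repeat=n_others):
        truthful = second_price_utility(value, value, others)
        if any(second_price_utility(value, b, others) > truthful for b in grid):
            return False
    return True


if __name__ == "__main__":
    print(second_price_auction([10, 6, 4]))
    print(truthful_is_weakly_dominant(10, range(21)))
    # In the first-price auction, shading the bid can beat bidding the value.
    print(first_price_utility(10, 10, [6, 4]), first_price_utility(10, 7, [6, 4]))
```

On this grid no alternative bid does better than bidding the value in the second-price auction, whereas in the first-price auction a truthful bid of 10 against bids of 6 and 4 earns 0 while a shaded bid of 7 earns 3. A grid search is evidence, not a proof, of dominance.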
