CMU-Q 15-381 Lecture 20: Game Theory I. Teacher: Gianni A. Di Caro




SLIDE 1

CMU-Q 15-381

Lecture 20: Game Theory I

Teacher: Gianni A. Di Caro

SLIDE 2

ICE-CREAM WARS

http://youtu.be/jILgxeNBK_8

SLIDE 3

GAME THEORY

§ Game theory is the formal study of conflict and cooperation in (rational) multi-agent systems
§ Decision-making where several players must make choices that potentially affect the interests of the other players: the effects of the actions of several agents are interdependent (and the agents are aware of it)
§ Example: auctions!

Psychology: Theory of social situations

SLIDE 4

ELEMENTS OF A GAME

§ The players: how many players are there? Does nature/chance play a role? Players are assumed to be rational

§ A complete description of what the players can do: the set of all possible actions.

SLIDE 5

ELEMENTS OF A GAME

§ A description of the payoff / consequences for each player for every possible combination of actions chosen by all players playing the game
§ A description of all players’ preferences over payoffs

Utility function for each player

SLIDE 6

AGENT DESIGN VS. MECHANISM DESIGN

§ Agent strategy design (ASD): game theory can be used to compute the expected utility of each decision, and to use it to determine the best strategy (and its expected return) against a rational player
§ System-level mechanism design: define the rules of the game such that the collective utility of the agents is maximized when each agent's strategy is designed, according to ASD, to maximize its own utility

Strategy ≡ Policy

SLIDE 7

MAKING DECISIONS: BASIC DEFINITIONS

§ Decision-making can involve one action or a sequence of actions
§ Action outcomes can be certain or subject to uncertainty
§ A set $A$ of alternative actions to choose from is given; it can be either discrete (finite or countable) or continuous (infinite)
§ $A = \{a_1, a_2, \dots, a_n\}$, $A = \{a \mid a \in [0, 10]\}$
§ Strategy (= Policy): tells a player what to do in every possible situation (state) throughout the game (a complete algorithm for playing the game). It can be deterministic or stochastic

$A_1 = \{a_1, a_2\}$, $A_2 = \{b_1, b_2\}$, $A_3 = \{c_1, c_2\}$, $A_T = \emptyset$; States: $\{1, 2, 3, T\}$

[Figure: one-player decision tree over states 1, 2, 3 with actions $a_1, a_2$, $b_1, b_2$, $c_1, c_2$]

§ Strategy set $S$: the set of all strategies available for the players to play. The set $S$ can be finite or infinite

Sequential game, one player: $S = \{a_1 b_1, a_1 b_2, a_2 c_1, a_2 c_2\}$. E.g., strategy: $s = \{a_1 b_1\}$
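The enumeration of the one-player strategy set above can be sketched in Python. The tree layout (action $a_1$ leading to state 2, $a_2$ to state 3, and every other action to the terminal state $T$) is an assumption read off the listed strategies; `TREE` and `strategies` are hypothetical names used only for illustration.

```python
# Hypothetical encoding of the one-player sequential tree: each state maps an
# action to the next state; "T" is the terminal state (no actions available).
TREE = {
    1: {"a1": 2, "a2": 3},
    2: {"b1": "T", "b2": "T"},
    3: {"c1": "T", "c2": "T"},
}


def strategies(state=1):
    """Enumerate every action sequence that leads from `state` to T."""
    if state == "T":
        return [[]]
    result = []
    for action, nxt in TREE[state].items():
        for rest in strategies(nxt):
            result.append([action] + rest)
    return result


S = ["".join(seq) for seq in strategies()]
print(S)  # ['a1b1', 'a1b2', 'a2c1', 'a2c2']
```

Each strategy is a complete plan from the initial state to $T$, so the size of $S$ equals the number of root-to-leaf paths in the tree.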

SLIDE 8

MAKING DECISIONS: BASIC DEFINITIONS

$A_1 = \{a_1, a_2\}$, $A_2 = \{b_1, b_2\}$, $A_3 = \{c_1, c_2\}$, $A_T = \emptyset$; States: $\{1, 2, 3, T\}$

§ One-action (static) games

[Figure: one decision per state: actions $a_1, a_2$ in state 1; $b_1, b_2$ in state 2; $c_1, c_2$ in state 3]

§ The strategy defines the behavior of an agent
§ The observed behavior of an agent following a given strategy is the outcome of the strategy

$S = \{(1, a_1), (1, a_2), (2, b_1), (2, b_2), (3, c_1), (3, c_2)\}$. E.g., strategy: $s = \{(1, a_1), (2, b_2), (3, c_1)\}$
§ Pure strategy: a strategy with no randomization: one specific action from the set $A$ is selected with certainty at each state / decision node
§ The strategy set $S$ is also called the pure strategy set
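Since a pure strategy picks exactly one action in each state, the pure strategies of the one-action game above are the Cartesian product of the per-state action sets. A minimal sketch, assuming the three states and action labels from the slide; `ACTIONS` and `pure_strategies` are illustrative names.

```python
from itertools import product

# Actions available in each decision state (states 1, 2, 3 from the slide).
ACTIONS = {1: ["a1", "a2"], 2: ["b1", "b2"], 3: ["c1", "c2"]}

# A pure strategy fixes one action per state (no randomization), so the pure
# strategy set is the Cartesian product of the per-state action sets.
states = sorted(ACTIONS)
pure_strategies = [
    dict(zip(states, choice))
    for choice in product(*(ACTIONS[st] for st in states))
]

print(len(pure_strategies))  # 8 = 2 * 2 * 2
print({1: "a1", 2: "b2", 3: "c1"} in pure_strategies)  # True
```

The example strategy $s = \{(1, a_1), (2, b_2), (3, c_1)\}$ is one of the 8 dictionaries produced.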

SLIDE 9

PAYOFFS AND UTILITIES

§ How do we choose the strategy?
§ Rational agents: Principle of Maximum Expected Utility
§ Payoffs ~ rewards in MDPs: what results from taking an action
§ Payoff (for a single agent): a function that associates a numerical value with every action in $A$: $\pi: A \to \mathbb{R}$
§ Payoff (for a multi-agent scenario): the payoff of action $a$ for agent $i$ depends on the actions of the other players! $\pi_i: A \times A \times \cdots \times A \to \mathbb{R}$
§ Utility: it can be any convenient additive function $u$ of the payoffs
§ In the following, the payoffs will coincide with the utilities of the agents (this makes full sense for the static games that we will consider)
§ Notation: we will use $\pi_i$ and $u_i$ quite interchangeably
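The Principle of Maximum Expected Utility for a single agent can be sketched as follows. The two actions, their outcome probabilities, and the payoff values are all made up for illustration; only the rule "pick the action with the largest expected utility" comes from the slide.

```python
# Hypothetical single-agent decision: each action yields outcomes with known
# probabilities. All numbers are made up for illustration.
OUTCOMES = {
    "safe": [(1.0, 5.0)],                 # list of (probability, payoff)
    "risky": [(0.5, 12.0), (0.5, 0.0)],
}


def expected_utility(action):
    """Probability-weighted sum of the payoffs of an action's outcomes."""
    return sum(p * payoff for p, payoff in OUTCOMES[action])


# Principle of Maximum Expected Utility: pick the action with the largest EU.
best = max(OUTCOMES, key=expected_utility)
print(best, expected_utility(best))  # risky 6.0
```

In the multi-agent case the same rule no longer suffices on its own, because each payoff also depends on the actions the other players choose.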

SLIDE 10

INFORMATION AND TYPES OF GAMES

§ Complete information game: utility functions, payoffs, strategies and “types” of players are common knowledge
§ Incomplete information game: players may not possess full information about their opponents (e.g., in auctions, each player knows its own utility but not those of the other players). “Parameters” of the game are not fully known
§ Perfect information game: each player, when making any decision, is perfectly informed of all the events that have previously occurred (e.g., chess) [Full observability]
§ Imperfect information game: not all information is accessible to the player (e.g., poker, prisoner’s dilemma) [Partial observability]

SLIDE 11

TURN-TAKING VS. SIMULTANEOUS MOVES

§ Dynamic games
  • Turn-taking games
  • Fully observable ↔ Perfect-information games
  • Complete information
  • Repeated moves

[Figure: game tree with alternating max and min levels]

§ Static games
  • All players take actions “simultaneously”
  • → Imperfect-information games
  • Complete information
  • Single-move games

[Figure: Morra]

SLIDE 12

(STRATEGIC-) NORMAL-FORM GAME

§ Let’s focus on static games
§ There is a strategic interaction among players

§ Strategy profile: a set of strategies for all players which fully specifies all actions in a game. It must include one and only one strategy for every player

§ A game in normal form consists of:
  • Set of players $N = \{1, \dots, n\}$
  • Set of actions available to each player, which defines the strategy set $S = \{s_1, s_2, \dots, s_m\}$
  • For each $i \in N$, a utility function $u_i$ defined over the set of all possible strategy profiles: $u_i : S^n \to \mathbb{R}$

[Figure: payoff matrix in a 2-player game]

If each player $j \in N$ plays the strategy $s_j \in S$, the utility of player $i$ is $u_i(s_1, \dots, s_n)$, that is, player $i$’s payoff when the strategy profile $(s_1, \dots, s_n)$ is chosen
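A normal-form game can be stored as a table from strategy profiles to utility tuples. The sketch below is a hypothetical minimal representation (the class name and methods are illustrative, not a library API); players are indexed from 0 in code rather than from 1, and the example payoffs are made up.

```python
from itertools import product


class NormalFormGame:
    """Minimal normal-form game sketch: players are indexed 0..n-1, all share
    the strategy set S, and a table maps every profile in S^n to a tuple of
    utilities, one entry per player."""

    def __init__(self, strategies, payoffs):
        self.strategies = list(strategies)
        self.payoffs = dict(payoffs)
        self.n = len(next(iter(self.payoffs)))  # profile length = number of players
        # The utility functions must be defined on every strategy profile.
        if set(self.payoffs) != set(product(self.strategies, repeat=self.n)):
            raise ValueError("payoff table must cover every profile in S^n")

    def utility(self, i, profile):
        """u_i(s_1, ..., s_n): payoff of player i under the given profile."""
        return self.payoffs[tuple(profile)][i]


# Hypothetical 2-player game with made-up payoffs, for illustration only.
game = NormalFormGame(["x", "y"], {
    ("x", "x"): (2, 2), ("x", "y"): (0, 3),
    ("y", "x"): (3, 0), ("y", "y"): (1, 1),
})
print(game.utility(0, ("y", "x")))  # 3
```

For two players this dictionary is exactly the payoff matrix: rows are player 0's strategies, columns are player 1's, and each cell holds the pair of utilities.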

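SLIDE 13

THE ICE CREAM WARS

§ $N = \{1, 2\}$
§ $S = [0, 1]$
§ $s_i$ is the position of vendor $i$ on the beach, as a fraction of the beach length
§ …..

$$
u_i(s_i, s_j) =
\begin{cases}
\frac{s_i + s_j}{2}, & s_i < s_j \\
1 - \frac{s_i + s_j}{2}, & s_i > s_j \\
\frac{1}{2}, & s_i = s_j
\end{cases}
$$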
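The payoff $u_i(s_i, s_j)$ above can be coded directly, assuming (as the formula implies) customers spread uniformly along the beach who walk to the nearest vendor. Since $S = [0, 1]$ is continuous, the sketch searches a discretized grid of positions; the grid step of 0.01 and the names `payoff`, `GRID`, `best_response` are assumptions for illustration.

```python
def payoff(s_i, s_j):
    """Share of the beach won by a vendor at s_i against a rival at s_j,
    assuming uniform customers who walk to the nearest vendor."""
    if s_i < s_j:
        return (s_i + s_j) / 2          # everything left of the midpoint
    if s_i > s_j:
        return 1 - (s_i + s_j) / 2      # everything right of the midpoint
    return 0.5                          # same spot: split the beach


GRID = [k / 100 for k in range(101)]    # discretized positions (an assumption)


def best_response(s_j):
    """Best position on the grid against a rival fixed at s_j."""
    return max(GRID, key=lambda s_i: payoff(s_i, s_j))


print(best_response(0.8))  # 0.79: stand just on the rival's larger side
# With both vendors in the middle, no single move on the grid does better.
print(all(payoff(x, 0.5) <= payoff(0.5, 0.5) for x in GRID))  # True
```

On the continuous beach the best reply to a rival at 0.8 is "as close as possible on the left", which has no exact maximizer; the grid makes that supremum concrete.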

SLIDE 14

THE PRISONER’S DILEMMA (1962)

§ Two men are charged with a crime. The police suspect they are the authors of the crime but do not have enough evidence
§ They are taken into custody and cannot communicate with each other
§ They are told that:
  • If one rats out and the other does not, the rat will be freed and the other jailed for 9 years
  • If both rat out, both will be jailed for 6 years
§ They also know that if neither rats out, both will be jailed for 1 year

§ $N = \{1, 2\}$
§ $S = \{\text{Confess}, \text{Don't confess}\}$
§ Strategy profiles: $\{(C, C), (C, D), (D, C), (D, D)\}$
§ $u_1(C, C) = -6$, $u_1(C, D) = 0$, $u_1(D, C) = -9$, $u_1(D, D) = -1$ (years in jail count as negative utility)
§ Symmetric for $u_2$
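The utilities of the prisoner's dilemma can be written down for player 1 only, with player 2's derived by symmetry. A minimal sketch, assuming "C" = Confess, "D" = Don't confess, and years in jail counted as negative utility (consistent with the payoff matrix); `U1`, `u1`, `u2` are illustrative names.

```python
# Player 1's utilities: profile (s1, s2) -> u_1. Years in jail count as
# negative utility; "C" = Confess, "D" = Don't confess.
U1 = {("C", "C"): -6, ("C", "D"): 0, ("D", "C"): -9, ("D", "D"): -1}


def u1(s1, s2):
    return U1[(s1, s2)]


def u2(s1, s2):
    # Symmetric game: player 2 in profile (s1, s2) is in the same position
    # as player 1 in the swapped profile (s2, s1).
    return u1(s2, s1)


for profile in [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]:
    print(profile, u1(*profile), u2(*profile))
```

Printing both utilities per profile reproduces the cells of the 2-player payoff matrix.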

SLIDE 15

THE PRISONER’S DILEMMA (1962)

SLIDE 16

PRISONER’S DILEMMA: PAYOFF MATRIX

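Rows: A's action; columns: B's action; entries: (A's payoff, B's payoff)

| A \ B         | Don't Confess | Confess |
|---------------|---------------|---------|
| Don't Confess | -1, -1        | -9, 0   |
| Confess       | 0, -9         | -6, -6  |

What would you do?

Don't confess = Don't rat out = Cooperate with each other
Confess = Rat out = Don't cooperate with each other, act selfishly!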

SLIDE 17

PRISONER’S DILEMMA: PAYOFF MATRIX

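Rows: A's action; columns: B's action; entries: (A's payoff, B's payoff)

| A \ B         | Don't Confess | Confess |
|---------------|---------------|---------|
| Don't Confess | -1, -1        | -9, 0   |
| Confess       | 0, -9         | -6, -6  |

B Don't confess:

§ If A doesn't confess, B gets -1
§ If A confesses, B gets -9

B Confess:

§ If A doesn't confess, B gets 0
§ If A confesses, B gets -6

Rational agent B opts to Confess: whatever A does, confessing pays more (0 > -1 and -6 > -9)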
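B's case analysis above amounts to computing B's best reply to each of A's actions. A minimal sketch using the payoff matrix from the slide ("D" = Don't confess, "C" = Confess); `PAYOFF` and `best_reply_B` are illustrative names.

```python
# Payoff matrix from the slide: (A's action, B's action) -> (A's payoff, B's payoff).
PAYOFF = {
    ("D", "D"): (-1, -1), ("D", "C"): (-9, 0),
    ("C", "D"): (0, -9),  ("C", "C"): (-6, -6),
}
ACTIONS = ["D", "C"]


def best_reply_B(a_action):
    """B's payoff-maximizing action when A plays a_action."""
    return max(ACTIONS, key=lambda b: PAYOFF[(a_action, b)][1])


for a in ACTIONS:
    print(f"A plays {a}: B's best reply is {best_reply_B(a)}")
# A plays D: B's best reply is C
# A plays C: B's best reply is C
```

The best reply is the same whatever A plays, which is what makes Confess a dominant strategy for B.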

SLIDE 18

PRISONER’S DILEMMA

§ Confess (Defection = Acting selfishly) is a dominant strategy for B: no matter what A plays, B's best reply is always to confess
§ (Strictly) dominant strategy: yields a player a strictly higher payoff than any other strategy, regardless of which decision(s) the other player(s) choose
§ Weakly dominant strategy: yields a payoff at least as high as any other strategy regardless of the other players' choices, with ties in some cases
§ Because of symmetry, Confess is a dominant strategy also for A
§ A will reason as follows: B's dominant strategy is to Confess; therefore, given that we are both rational agents, B will also Confess and we will both get 6 years
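The strict and weak dominance definitions can be checked mechanically on the payoff matrix. With only two actions per player, "dominant" reduces to "dominates the other action". A minimal sketch for player B; `dominates` is a hypothetical helper, not a library function.

```python
# (A's action, B's action) -> (A's payoff, B's payoff); "D" = Don't confess, "C" = Confess.
PAYOFF = {
    ("D", "D"): (-1, -1), ("D", "C"): (-9, 0),
    ("C", "D"): (0, -9),  ("C", "C"): (-6, -6),
}
ACTIONS = ["D", "C"]


def payoff_B(a, b):
    return PAYOFF[(a, b)][1]


def dominates(x, y, strict=True):
    """Does B's action x dominate y against every action of A?
    strict=True: strictly better everywhere.
    strict=False: never worse, and strictly better at least once (weak)."""
    diffs = [payoff_B(a, x) - payoff_B(a, y) for a in ACTIONS]
    if strict:
        return all(d > 0 for d in diffs)
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


print(dominates("C", "D"))                # True: Confess strictly dominates
print(dominates("D", "C", strict=False))  # False
```

A strictly dominant action is also weakly dominant; the converse fails when there are ties.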

SLIDE 19

PRISONER’S DILEMMA

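§ But is the dominant-strategy outcome (Confess, Confess) the best outcome?

Rows: A's action; columns: B's action; entries: (A's payoff, B's payoff)

| A \ B         | Don't Confess | Confess |
|---------------|---------------|---------|
| Don't Confess | -1, -1        | -9, 0   |
| Confess       | 0, -9         | -6, -6  |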

SLIDE 20

PARETO OPTIMALITY VS. EQUILIBRIA

§ Being selfish is a dominant strategy, but the players can do much better by cooperating: (-1, -1), which is a Pareto-optimal outcome
§ Pareto optimality: an outcome such that no other outcome makes any player better off without making at least one other player worse off → Outcome (Don't Confess, Don't Confess): (-1, -1)
§ A strategy profile forms an equilibrium if no player can benefit by switching strategies, given that every other player sticks with the same strategy, which is the case of (Confess, Confess)
§ An equilibrium is a local optimum in the space of the strategies
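Both notions above can be tested by brute force over the four strategy profiles. A minimal sketch with illustrative helper names: `pareto_dominated` compares a profile's outcome with every other outcome, and `is_equilibrium` tries every unilateral switch. Note that, by the definition above, the asymmetric outcomes (-9, 0) and (0, -9) also turn out to be Pareto-optimal, since improving the sucker's payoff hurts the rat.

```python
from itertools import product

# (A's action, B's action) -> (A's payoff, B's payoff); "D" = Don't confess, "C" = Confess.
PAYOFF = {
    ("D", "D"): (-1, -1), ("D", "C"): (-9, 0),
    ("C", "D"): (0, -9),  ("C", "C"): (-6, -6),
}
ACTIONS = ["D", "C"]
PROFILES = list(product(ACTIONS, repeat=2))


def pareto_dominated(p):
    """True if some outcome makes nobody worse off and someone better off."""
    u = PAYOFF[p]
    return any(
        all(v[k] >= u[k] for k in range(2)) and any(v[k] > u[k] for k in range(2))
        for v in (PAYOFF[q] for q in PROFILES)
    )


def is_equilibrium(p):
    """No player gains by switching alone while the other keeps its strategy."""
    for i in range(2):
        for alt in ACTIONS:
            deviation = list(p)
            deviation[i] = alt
            if PAYOFF[tuple(deviation)][i] > PAYOFF[p][i]:
                return False
    return True


pareto_optimal = [p for p in PROFILES if not pareto_dominated(p)]
equilibria = [p for p in PROFILES if is_equilibrium(p)]
print(pareto_optimal)  # [('D', 'D'), ('D', 'C'), ('C', 'D')]
print(equilibria)      # [('C', 'C')]
```

The only equilibrium, (Confess, Confess), is exactly the profile that is not Pareto-optimal: this is the dilemma.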

SLIDE 21

UNDERSTANDING THE DILEMMA

§ (Self-interested and rational) agents would choose a strategy that does not bring the maximal reward
§ The dilemma is that the equilibrium outcome, which derives from the dominant strategy, is worse for both players than the outcome they would get if both refused to confess

https://en.wikipedia.org/wiki/Tragedy_of_the_commons

§ Related to the tragedy of the commons:

A situation in a shared-resource system where individual users, acting independently according to their own self-interest, behave contrary to the common good of all users by depleting or spoiling that resource through their collective action
§ CO2 emissions / climate, oceans, water, energy, welfare, …

SLIDE 22

ON TV: GOLDEN BALLS

http://youtu.be/S0qjK3TWZE8

§ If both choose Split, they each receive half the jackpot
§ If one chooses Steal and the other chooses Split, the Steal contestant wins the entire jackpot
§ If both choose Steal, neither contestant wins any money
§ Watch the video!
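The Golden Balls rules can be written as a payoff matrix and checked against the dominance definitions from the prisoner's dilemma. A minimal sketch, assuming a jackpot normalized to 1 (an illustrative choice) and ignoring anything said on the show; `compare` is a hypothetical helper. Unlike Confess in the prisoner's dilemma, Steal is only weakly dominant here: it ties with Split when the opponent steals.

```python
JACKPOT = 1.0  # normalized jackpot (an assumption for illustration)

# (player 1's choice, player 2's choice) -> (player 1's payoff, player 2's payoff)
PAYOFF = {
    ("Split", "Split"): (JACKPOT / 2, JACKPOT / 2),
    ("Split", "Steal"): (0.0, JACKPOT),
    ("Steal", "Split"): (JACKPOT, 0.0),
    ("Steal", "Steal"): (0.0, 0.0),
}
ACTIONS = ["Split", "Steal"]


def compare(x, y):
    """Player 1's payoff with x minus payoff with y, per opponent choice."""
    return [PAYOFF[(x, o)][0] - PAYOFF[(y, o)][0] for o in ACTIONS]


diffs = compare("Steal", "Split")
strict = all(d > 0 for d in diffs)
weak = all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)
print(diffs)   # [0.5, 0.0]
print(strict)  # False: tie when the opponent steals
print(weak)    # True: Steal weakly dominates Split
```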