
SLIDE 1

Game Theory - Lecture #12

Outline:

  • Randomized actions
  • vNM & Bernoulli payoff functions
  • Mixed strategies & Nash equilibrium
  • Hawk/Dove & Mixed strategies
SLIDE 2

Randomized action profiles

  • Original strategic setup:

  – Set of players, {1, 2, ..., n}
  – For each player, a set of actions Ai
  – For each player, preferences on action profiles characterized by a payoff function Ui : A → R

  • Question: How do we extend preferences to lotteries over action profiles?
  • Extension: Strategic game with vNM (Von Neumann and Morgenstern) preferences

  – Set of players
  – For each player, a set of actions Ai
  – For each player, preferences on lotteries over action profiles characterized by a (vNM) payoff function Ui : ∆(A) → R

  Notation: ∆(Set) denotes the probability distributions over a Set of outcomes

  • Important special case: vNM preferences given by expected utility over action profiles

(Bernoulli payoff)

  • Key observation: Payoff values define preferences over distributions

  – Original setting: preferences ⇔ payoffs over profiles
  – Extension: preferences ⇔ payoffs over profiles ⇒ preferences over distributions

  • Concern: Moving further away from true preferences


SLIDE 3

Example

  • Original setting: preferences ⇔ payoffs over profiles

  – Fact: Several payoff functions reflect the same preferences

         C     D               C     D
     C  2, 2  0, 3         C  3, 3  0, 4
     D  3, 0  1, 1         D  4, 0  1, 1

  – These are the same game (Prisoner’s dilemma) in terms of original ordinal preferences

  • Extension: preferences ⇔ payoffs over profiles ⇒ preferences over distributions

  – These are different games in terms of preferences over probability distributions
  – Player 1’s vNM utility depends on the probabilities of {CC, CD, DC, DD}:
    ∗ Left game: U1(p) = pCC · 2 + pCD · 0 + pDC · 3 + pDD · 1
    ∗ Right game: U1(p) = pCC · 3 + pCD · 0 + pDC · 4 + pDD · 1
    (Similarly for Player 2)
  – Compare the following probability distributions: (2/5, 3/5, 0, 0) vs (0, 0, 0, 1)

  • Payoff values take on heightened importance in the extended setting.
  • Dependence on payoff values can result in peculiar outcomes.
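The comparison at the end of the slide can be checked numerically. A minimal sketch (not from the lecture; `expected_payoff` is a hypothetical helper) evaluating Player 1's expected utility in both payoff assignments under the two distributions:

```python
def expected_payoff(payoffs, dist):
    """Expected utility: sum of payoff times probability over profiles."""
    return sum(u * p for u, p in zip(payoffs, dist))

# Player 1 payoffs over profiles ordered (CC, CD, DC, DD)
left = [2, 0, 3, 1]
right = [3, 0, 4, 1]

mix = (2/5, 3/5, 0, 0)   # lottery over CC and CD
sure_dd = (0, 0, 0, 1)   # DD for sure

print(expected_payoff(left, mix), expected_payoff(left, sure_dd))    # left game: mixture worse than sure DD
print(expected_payoff(right, mix), expected_payoff(right, sure_dd))  # right game: mixture better than sure DD
```

The same ordinal game yields opposite preferences over these two lotteries (0.8 vs 1 on the left, 1.2 vs 1 on the right), which is exactly the peculiarity the slide flags.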


SLIDE 4

Expected payoff peculiarities

  • In the new framework, the preferences are over probability distributions
  • Issue: Are expected payoffs “reasonable”?
  • Example: Allais paradox

  – Consider the following two lotteries (amounts in millions):
        A: $2 with probability 1
        a: $10 w.p. 0.1, $2 w.p. 0.89, $0 w.p. 0.01
    Most prefer A to a...
  – Consider another two lotteries:
        B: $10 w.p. 0.1, $0 w.p. 0.9
        b: $2 w.p. 0.11, $0 w.p. 0.89
    Most prefer B to b...
  – Q: Are there choices of u(10), u(2), u(0) such that expected utilities result in the preferences (A > a) and (B > b)?
  – Preference evaluation for (A > a):
        u(2) > 0.1 u(10) + 0.89 u(2) + 0.01 u(0)
  – Subtract 0.89 u(2) from each side and add 0.89 u(0) to each side:
        0.11 u(2) + 0.89 u(0) > 0.1 u(10) + 0.9 u(0)
  – The left side is the expected utility of lottery b and the right side that of lottery B, so (A > a) implies that the expected payoff of lottery b exceeds that of lottery B!

  • Conclusion: A decision maker’s preferences cannot always be represented by an expected payoff function. Nonetheless, we will make use of expected payoffs.
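The algebra above shows (A > a) and (B > b) are jointly infeasible for any expected-utility representation. A sketch (not from the lecture) that samples utility values and confirms no assignment satisfies both:

```python
import random

def prefers_A_and_B(u10, u2, u0):
    """True iff expected utilities rank A over a AND B over b."""
    eu_A = u2
    eu_a = 0.10 * u10 + 0.89 * u2 + 0.01 * u0
    eu_B = 0.10 * u10 + 0.90 * u0
    eu_b = 0.11 * u2 + 0.89 * u0
    return eu_A > eu_a and eu_B > eu_b

random.seed(0)
for _ in range(100_000):
    # any ordering u(0) < u(2) < u(10)
    u0, u2, u10 = sorted(random.uniform(-100, 100) for _ in range(3))
    assert not prefers_A_and_B(u10, u2, u0)
```

This mirrors the derivation: eu_A > eu_a is algebraically equivalent to eu_b > eu_B, so the two strict preferences can never hold at once.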


SLIDE 5

Mixed strategies

  • A mixed strategy is a probability distribution over a player’s actions. Specifically, player i selects αi ∈ ∆(Ai)

  • Consequences:

  – Joint action probabilities are products of the individual players’ probabilities
  – The Bernoulli payoff becomes an expected utility with independent players
  – New notation: Ui(αi, α−i)

  • Continuing previous example:

  – Player 1 chooses α1 = (α1C, α1D)
  – Player 2 chooses α2 = (α2C, α2D)
  – Resulting probability distribution over joint actions:
        (pCC, pCD, pDC, pDD) = (α1C α2C, α1C α2D, α1D α2C, α1D α2D)
  – Inherited expected utilities:
    ∗ Left game: U1(α1, α2) = 2 · α1C α2C + 0 · α1C α2D + 3 · α1D α2C + 1 · α1D α2D
    ∗ Right game: U1(α1, α2) = 3 · α1C α2C + 0 · α1C α2D + 4 · α1D α2C + 1 · α1D α2D
    (Likewise for U2(·))

  • Reconciled viewpoint: The new setup is the same as the old setup with

  – Set of players
  – “New” set of actions: αi ∈ ∆(Ai)
  – “New” payoff functions Ui(αi, α−i), the expected value of the original payoff functions assuming independent players
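The expected utility under independent mixing can be sketched directly. A minimal illustration (not from the lecture; `U1` is a hypothetical helper) using the left-hand Prisoner's dilemma payoffs:

```python
def U1(payoffs, alpha1, alpha2):
    """Expected utility of Player 1 when players mix independently.
    payoffs: dict (a1, a2) -> Player 1 payoff; alpha_i: dict action -> prob."""
    return sum(alpha1[a1] * alpha2[a2] * u for (a1, a2), u in payoffs.items())

# left-hand game from the earlier slide
left = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}

a1 = {"C": 0.5, "D": 0.5}
a2 = {"C": 0.8, "D": 0.2}
# joint probability of (C, C) is 0.5 * 0.8 = 0.4, etc.
print(U1(left, a1, a2))  # ≈ 2.1
```

Note the product form alpha1[a1] * alpha2[a2] encodes the independence assumption; correlated randomization would need a full joint distribution instead.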


SLIDE 6

Mixed strategy best response

  • Define the best response function, Bi(·), as

        Bi(α−i) = {αi ∈ ∆(Ai) : Ui(αi, α−i) ≥ Ui(α′i, α−i) for all α′i ∈ ∆(Ai)}

    Note that the best response “function” is actually a “set”

  • This definition is exactly as before except:

  – Player actions are replaced with mixed strategies
  – Player utilities are replaced with expected utilities assuming independent players

  • Example: Generic two player/two action game

        L     R
    T  a, A  b, B
    B  c, C  d, D

  – Assume mixed strategies α1 = (p, 1 − p) for the row player and α2 = (q, 1 − q) for the column player
  – Player 1 must maximize over p ∈ [0, 1]:

        p [q · a + (1 − q) · b] + (1 − p) [q · c + (1 − q) · d]

  – Fact:

        Brow(q) = 1       if (q · a + (1 − q) · b) > (q · c + (1 − q) · d)
                  0       if (q · a + (1 − q) · b) < (q · c + (1 − q) · d)
                  [0, 1]  if (q · a + (1 − q) · b) = (q · c + (1 − q) · d)

  – Similar analysis derives Bcol(p)
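The Fact above translates directly into code. A sketch (not from the lecture; `B_row` is a hypothetical helper) returning the row player's best-response set of probabilities on the top action:

```python
def B_row(q, a, b, c, d):
    """Row's best-response set of probabilities p on T, versus column mix (q, 1-q)."""
    top = q * a + (1 - q) * b       # expected payoff of playing T
    bottom = q * c + (1 - q) * d    # expected payoff of playing B
    if top > bottom:
        return [1.0]                # play T for sure
    if top < bottom:
        return [0.0]                # play B for sure
    return [0.0, 1.0]               # indifferent: any p in [0, 1] is a best response

# Hawk/Dove row payoffs (a, b, c, d) = (0, 6, 1, 3), used on a later slide
print(B_row(0.5, 0, 6, 1, 3))   # [1.0]
print(B_row(0.75, 0, 6, 1, 3))  # [0.0, 1.0]
```

Returning the endpoints [0.0, 1.0] is shorthand for the whole interval; only the indifference case yields interior mixed best responses.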


SLIDE 7

Mixed strategy Nash equilibrium

  • The mixed strategy profile α∗ = (α∗1, ..., α∗n) is a mixed strategy Nash equilibrium if, for every player i, α∗i ∈ Bi(α∗−i)

  • Celebrated Nash theorem: Every strategic game with vNM preferences in which each player has finitely many actions has a mixed strategy Nash equilibrium.

  • Nash result due to (advanced) fixed point theory

  – Want to find (α∗1, ..., α∗n) such that

        α∗ → (B1(·), ..., Bn(·)) → α∗

  – Illustration: A continuous function f mapping the closed interval [0, 1] into itself must have a “fixed point”, i.e., an x ∈ [0, 1] such that x = f(x)
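The one-dimensional illustration can be made concrete. A sketch (not from the lecture): since g(x) = f(x) − x is nonnegative at 0 and nonpositive at 1 when f maps [0, 1] into itself, bisection locates a fixed point. The general Nash result instead rests on fixed point theorems such as Brouwer's and Kakutani's.

```python
import math

def fixed_point(f, lo=0.0, hi=1.0):
    """Bisection on g(x) = f(x) - x, assuming f maps [lo, hi] into itself."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) - mid > 0:
            lo = mid   # g(lo) stays >= 0: fixed point lies to the right
        else:
            hi = mid   # g(hi) stays <= 0: fixed point lies to the left
    return (lo + hi) / 2

x = fixed_point(math.cos)  # cos maps [0, 1] into itself
print(x)                   # ≈ 0.7391, where cos(x) = x
```

This is illustration only: bisection works in one dimension, whereas best-response maps live in a product of simplices and are set-valued.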


SLIDE 8

Hawk/Dove

        H     D
    H  0, 0  6, 1
    D  1, 6  3, 3

  • Setup:

  – H: hawk = aggressive
  – D: dove = passive
  – Models the game of “chicken” or a traffic intersection

  • First look: What are the pure (i.e., non-randomized) action NE?

  – Best response function for the row player: Brow(H) = D & Brow(D) = H
  – Symmetric for the column player
  – NE: (H, D) and (D, H)
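The pure-equilibrium check above can be sketched as a brute-force search (not from the lecture; `is_pure_ne` is a hypothetical helper):

```python
# (row action, column action) -> (row payoff, column payoff)
payoffs = {
    ("H", "H"): (0, 0), ("H", "D"): (6, 1),
    ("D", "H"): (1, 6), ("D", "D"): (3, 3),
}
actions = ["H", "D"]

def is_pure_ne(r, c):
    """No player gains by a unilateral deviation from (r, c)."""
    row_ok = all(payoffs[(r, c)][0] >= payoffs[(r2, c)][0] for r2 in actions)
    col_ok = all(payoffs[(r, c)][1] >= payoffs[(r, c2)][1] for c2 in actions)
    return row_ok and col_ok

ne = [(r, c) for r in actions for c in actions if is_pure_ne(r, c)]
print(ne)  # [('H', 'D'), ('D', 'H')]
```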


SLIDE 9

Hawk/Dove: Mixed strategies

        H     D
    H  0, 0  6, 1
    D  1, 6  3, 3

  • Second look: What are the mixed strategy NE?
  • As before, we construct best response function, but for mixed strategies

  – Row: Pr(H) = p and Pr(D) = 1 − p
  – Column: Pr(H) = q and Pr(D) = 1 − q
  – Players select from {H, D} independently

  • Best response for row player: Need to maximize expected payoff, i.e.,

        max(p ∈ [0, 1])  p [0 · q + 6 · (1 − q)] + (1 − p) [1 · q + 3 · (1 − q)]

    with best response

        Brow(q) = 1       if [0 · q + 6 · (1 − q)] > [1 · q + 3 · (1 − q)]
                  [0, 1]  if [0 · q + 6 · (1 − q)] = [1 · q + 3 · (1 − q)]
                  0       if [0 · q + 6 · (1 − q)] < [1 · q + 3 · (1 − q)]
  • Conclusion:

        Brow(q) = 1       if q < 3/4           Bcol(p) = 1       if p < 3/4
                  [0, 1]  if q = 3/4      &              [0, 1]  if p = 3/4
                  0       if q > 3/4                     0       if p > 3/4
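The threshold q = 3/4 in the conclusion can be checked numerically (a sketch, assuming the Hawk/Dove payoffs above):

```python
def payoff_H(q):
    """Row's expected payoff from H when column mixes (q, 1 - q) over (H, D)."""
    return 0 * q + 6 * (1 - q)

def payoff_D(q):
    """Row's expected payoff from D against the same column mix."""
    return 1 * q + 3 * (1 - q)

print(payoff_H(0.5) > payoff_D(0.5))     # True: H strictly better for q < 3/4
print(payoff_H(0.75) == payoff_D(0.75))  # True: indifferent exactly at q = 3/4
print(payoff_H(0.9) < payoff_D(0.9))     # True: D strictly better for q > 3/4
```

At q = 3/4 both actions earn 1.5, which is why the best response there is the whole interval [0, 1].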


SLIDE 10

H/D: Best response plots

    [Figure: best response curves Brow(q) and Bcol(p) plotted on the unit square, with p and q each ranging from 0 to 1]

  • NE occur at intersection of best response plots

  – NE of the original pure strategy game are still present
  – New “mixed strategy” NE: (p∗, q∗) = (3/4, 3/4)

  • Peculiarity: At the mixed strategy NE, players are indifferent, i.e.,

        Brow(3/4) = [0, 1]  &  Bcol(3/4) = [0, 1]

    i.e., at the NE, the best response is to play (H, D) with any probability combination.

  • The mixed strategy NE makes both players indifferent
  • Question: Are there other outcomes that could lead to more desirable behavior?
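As a numeric footnote to that question (a sketch, not from the lecture; `U_row` is a hypothetical helper): the row player's expected payoff at each of the three NE, compared against the non-equilibrium profile (D, D).

```python
def U_row(p, q):
    """Row's expected payoff in Hawk/Dove when row plays H w.p. p, column w.p. q."""
    return p * q * 0 + p * (1 - q) * 6 + (1 - p) * q * 1 + (1 - p) * (1 - q) * 3

print(U_row(1, 0), U_row(0, 1))  # pure NE (H, D) and (D, H): payoffs 6 and 1
print(U_row(0.75, 0.75))         # mixed NE: 1.5 (each player, by symmetry)
print(U_row(0, 0))               # (D, D): 3 each, better than the mixed NE, but not an NE
```

The symmetric mixed NE gives each player only 1.5, while (D, D) would give 3 each yet is not self-enforcing, which motivates the question above.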
