SLIDE 1

Preference-dependent learning in the Centipede Game

Astrid Gamba¹, Tobias Regner²

¹University of Milan-Bicocca, ²Max Planck Institute, Jena

Game Theory at the Universities of Milano III, 24 May 2013

  • A. Gamba, T. Regner

07/05 1 / 36

SLIDE 2

Introduction

Aim of this paper: to explain heterogeneous behavior in the Centipede Game by means of an experiment driven by the theory of self-confirming equilibrium (Battigalli, 1987; Fudenberg and Levine, 1993; Dekel et al., 2004).

Contribution: long-run behavior is the result of a learning process driven by players’ preference types and based on their own observations of co-players’ behavior. Given very plausible limitations on the evidence that can be collected, off-path prediction errors may persist in the long run and help sustain heterogeneity of behavior (some agents unravel and some do not).

SLIDE 3

Self-confirming equilibrium

Hahn, 1973; Battigalli, 1987; Battigalli and Guaitoli, 1988; Fudenberg and Levine, 1993a

SCE describes steady states in which agents best respond to confirmed beliefs about play. Two conditions:
  • rationality: players maximize their (subjective) expected utility;
  • confirmation of beliefs (instead of correctness, as in Nash equilibrium): agents’ equilibrium conjectures about the opponents’ strategies are consistent with the evidence they can collect given the strategies they adopt.

SLIDE 4

Learning interpretation of Self-confirming equilibrium

Basic intuition: agents’ beliefs come from a large set of observations of the opponent’s play, acquired through recurrent play of the same game. The evidence on the opponent’s strategy is partial (own payoff? terminal node? ...), so subjective probabilities of the opponent’s strategies may differ from their objective probabilities. Crucially, especially in an extensive-form game, off-path prediction errors may persist in the long run. Learning foundations for SCE: Fudenberg and Levine, 1993b and 2006; Fudenberg and Kreps, 1995.
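A minimal Python sketch of why off-path errors can persist, under assumptions that are not taken from the slides: White moves at nodes 1, 3, 5 and Black at nodes 2, 4, 6; Black’s Leave probabilities are hypothetical; and the agent keeps simple count-based (Beta-prior) beliefs updated only from the realized path. A White agent who always takes at node 3 never reaches Black’s nodes 4 and 6.

```python
import random

# Hypothetical probabilities that a Black agent chooses Leave at each of
# Black's decision nodes (White is assumed to move at nodes 1, 3, 5).
BLACK_LEAVE_PROB = {2: 0.9, 4: 0.7, 6: 0.4}


def learn_from_terminal_nodes(white_take_node, rounds, seed=0, prior=(1, 1)):
    """Beliefs of one White agent who repeats a fixed plan.

    white_take_node: node at which White takes (1, 3 or 5), or 7 = never take.
    Feedback is the realized path only, so White observes Black's action at
    a Black node only when play actually reaches that node.
    Returns the estimated probability of Leave at each Black node, using
    Beta(prior) pseudo-counts for (Leave, Take).
    """
    rng = random.Random(seed)
    a, b = prior
    leaves = {node: 0 for node in BLACK_LEAVE_PROB}
    visits = {node: 0 for node in BLACK_LEAVE_PROB}
    for _ in range(rounds):
        for node in range(1, 7):
            if node % 2 == 1:            # White's decision node
                if node == white_take_node:
                    break                # White takes: the game ends here
            else:                        # Black's node is reached and observed
                visits[node] += 1
                if rng.random() < BLACK_LEAVE_PROB[node]:
                    leaves[node] += 1    # Black leaves: play continues
                else:
                    break                # Black takes: the game ends here
    return {node: (leaves[node] + a) / (visits[node] + a + b)
            for node in BLACK_LEAVE_PROB}


# A White agent who always takes at node 3 learns about node 2 only.
beliefs = learn_from_terminal_nodes(white_take_node=3, rounds=5000)
print(beliefs)
```

The estimate at node 2 approaches 0.9, while the estimates at nodes 4 and 6 stay exactly at the prior mean 0.5: nothing this agent observes can contradict them.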

SLIDE 5

Other experiments related to Self-confirming equilibrium

Fudenberg and Levine (1997) measure payoff losses due to limited information about play. Maniadis (2011) studies experimentally whether releasing aggregate information causes more or less pro-social behavior in the Centipede Game (SCE of an incomplete-information game with a small fraction of altruists).

SLIDE 6

Other solution concepts applied to the Centipede Game

Other solution concepts used to rationalize behavior in the Centipede Game:
  • Agent Quantal Response Equilibrium (McKelvey and Palfrey, 1992): agents respond imperfectly to correct beliefs about play.
  • Analogy-based Expectation Equilibrium (Jehiel, 2005; Huck and Jehiel, 2004): agents best respond to coarse beliefs about play (they bundle the opponent’s information sets into analogy classes).

Cox and James (2012): “Exploration of the impact of exogenously varied provision of information on past play in these games is an interesting topic for future research, and one that could help further establish the suitability of candidate explanatory models.” (Econometrica, 80(2), p. 902)

SLIDE 7

The model underlying our experiment

Two-player extensive-form game (6-stage Centipede Game, CG). For each role/player i there is a large population of agents with heterogeneous preferences (more or less joint-payoff maximizing): θ ∈ [0, 1], distributed according to qi and qj. Agents are drawn at random to play the CG and use pure strategies. Each player (role) i therefore plays a mixed strategy σi ∈ ∆(Si), induced by qi and the pure strategy si,θ adopted by each preference type in role i. We allow agents to hold heterogeneous conjectures about the opponent’s (mixed) strategy: µi,θ ∈ ∆(Sj).
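A minimal Python sketch of how a role’s mixed strategy σi is induced by qi and the pure strategies si,θ. It assumes a discretized type distribution with hypothetical masses and plans, and writes a pure strategy as the node at which the agent takes (7 = never take), a simplification for the Centipede Game.

```python
from collections import defaultdict


def induced_mixed_strategy(type_mass, type_plan):
    """Mixed strategy sigma_i of a role induced by q_i and s_{i,theta}.

    type_mass: {theta: mass}, a discretized stand-in for the distribution q_i.
    type_plan: {theta: pure strategy of that type}, where a pure strategy is
        written as the node at which the agent takes (7 = never take).
    Returns {pure strategy: probability that a randomly drawn agent uses it}.
    """
    sigma = defaultdict(float)
    for theta, mass in type_mass.items():
        sigma[type_plan[theta]] += mass
    return dict(sigma)


# Hypothetical White population: four type values and their planned take nodes.
q_white = {0.0: 0.3, 0.25: 0.2, 0.75: 0.3, 1.0: 0.2}
plan_white = {0.0: 1, 0.25: 5, 0.75: 7, 1.0: 7}
sigma_white = induced_mixed_strategy(q_white, plan_white)
```

Here two types (θ = 0.75 and θ = 1) share the plan “never take”, so σ assigns that plan their combined mass 0.5; the population mixed strategy alone does not reveal which types stand behind each pure strategy.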

SLIDE 8

The model underlying our experiment

Assume that agents do not know the distribution of preference types in either population. Denote by π(z|si,θ; qj, σ) the objective probability that preference type θ observes terminal node z, given his own move, the move by Nature and the mixed strategy of the opponent. Denote by ρ(z|si,θ; µi,θ) the subjective probability of observing terminal node z, as assessed by preference type θ given his own strategy and his conjecture about the opponent’s mixed strategy. Assume that, after playing, agents can only observe the terminal node reached.
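A minimal Python sketch of π and ρ, assuming pure strategies are written as planned take nodes (7 = never take) and that Nature’s draw of the opponent’s type is already folded into the opponent population’s mixed strategy. The same function gives the objective π when fed the true strategy and the subjective ρ when fed a conjecture; the strategies below are hypothetical.

```python
def terminal_node_distribution(own_take, opponent_strategy):
    """Distribution of the terminal node z given one's own pure strategy.

    own_take: node at which this agent takes (7 = never take).
    opponent_strategy: {opponent's take node: probability}. Passing the true
        population mixed strategy gives the objective pi(z | s); passing the
        agent's conjecture mu gives the subjective rho(z | s; mu).
    Play stops at the first planned take, so z = min(own_take, opponent_take),
    with z = 7 when neither player takes.
    """
    dist = {}
    for opp_take, prob in opponent_strategy.items():
        z = min(own_take, opp_take)
        dist[z] = dist.get(z, 0.0) + prob
    return dist


sigma_black = {2: 0.1, 4: 0.2, 6: 0.3, 7: 0.4}   # hypothetical true strategy
mu_selfish = {2: 1.0}                            # hypothetical conjecture
pi = terminal_node_distribution(5, sigma_black)  # White plans to take at 5
rho = terminal_node_distribution(5, mu_selfish)
```

With White planning to take at node 5, π puts mass 0.7 on z = 5 while ρ puts all its mass on z = 2, so observed play would expose the conjecture. With a plan of taking at node 1, both distributions put all mass on z = 1 whatever the conjecture.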

SLIDE 9

Self-confirming equilibrium of an extensive-form game with heterogeneous preference types

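Definition

A profile of mixed strategies (σi)i∈I is a self-confirming equilibrium if, for each preference type θ, we can find a conjecture µi,θ such that, for each si,θ ∈ supp σi:

$$\text{(i)}\quad s_{i,\theta} \in \arg\max_{s_i \in S_i} \sum_{s_j \in S_j} \mu_{i,\theta}(s_j)\, U_\theta(s_i, s_j)$$

and

$$\text{(ii)}\quad \forall z \in Z:\; \rho(z \mid s_{i,\theta}; \mu_{i,\theta}) = \pi(z \mid s_{i,\theta}; q_j, \sigma)$$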
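A minimal Python sketch that checks conditions (i) and (ii) for a single White type. It assumes hypothetical, exponentially growing payoffs (not the experiment’s), White moving at nodes 1, 3, 5, pure strategies written as take nodes (7 = never take), and a utility Uθ = own payoff + θ · co-player’s payoff; the slides do not state the functional form of Uθ. The example shows a selfish type who takes at node 1 on the wrong conjecture that Black always takes at node 2.

```python
# Hypothetical (White, Black) payoffs at terminal nodes 1..7, where node k < 7
# means the mover at node k took and 7 means nobody took.
PAYOFFS = {1: (4, 1), 2: (2, 8), 3: (16, 4), 4: (8, 32),
           5: (64, 16), 6: (32, 128), 7: (256, 64)}
WHITE_PLANS = (1, 3, 5, 7)   # take at node 1, 3 or 5, or never take


def terminal_dist(own_take, opponent_strategy):
    """Terminal-node distribution: play stops at min(own, opponent) take node."""
    dist = {}
    for opp_take, prob in opponent_strategy.items():
        z = min(own_take, opp_take)
        dist[z] = dist.get(z, 0.0) + prob
    return dist


def utility(theta, z):
    """Assumed U_theta: own payoff plus theta times the co-player's payoff."""
    own, other = PAYOFFS[z]
    return own + theta * other


def expected_utility(theta, own_take, opponent_strategy):
    return sum(p * utility(theta, z)
               for z, p in terminal_dist(own_take, opponent_strategy).items())


def is_self_confirming(theta, plan, conjecture, true_strategy, tol=1e-9):
    """Check conditions (i) and (ii) for one White type, plan and conjecture."""
    # (i) the plan maximizes expected utility given the conjecture
    best = max(expected_utility(theta, s, conjecture) for s in WHITE_PLANS)
    rational = expected_utility(theta, plan, conjecture) >= best - tol
    # (ii) subjective rho and objective pi agree at every terminal node
    rho = terminal_dist(plan, conjecture)
    pi = terminal_dist(plan, true_strategy)
    confirmed = all(abs(rho.get(z, 0.0) - pi.get(z, 0.0)) <= tol for z in PAYOFFS)
    return rational and confirmed


sigma_black = {2: 0.1, 4: 0.2, 6: 0.3, 7: 0.4}   # hypothetical true strategy
print(is_self_confirming(0.0, 1, {2: 1.0}, sigma_black))     # True
print(is_self_confirming(0.0, 1, sigma_black, sigma_black))  # False
```

The first call holds because taking at node 1 is a best response to the conjecture and play never reaches node 2, so nothing contradicts it. The second fails condition (i): under the true strategy, never taking yields a higher expected payoff than taking at node 1.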

SLIDE 10

Self-confirming equilibrium of an extensive-form game with heterogeneous preference types: an example (Gamba 2013)

Joint-payoff maximizers always choose “across”, whatever their conjectures. Assume that selfish agents in role 1 believe that A′ and a are unlikely to be played (probability < 1/3 on the set of the co-player’s strategies that prescribe A′, and also on the set of those that prescribe a): then they always play “down”.

SLIDE 11

The experiment

Research question: what is the role of incorrect off-path beliefs in determining long-run outcomes of the CG, and how do they interact with social preferences? How: we study the behavior of different (social) preference types over 40 rounds of the CG. We manipulate access to information about the opponent’s play (personal/public) and study how long-run outcomes vary across treatments (from SCE to Nash?).

SLIDE 12

The design

  • Jena, 8 sessions with 32 subjects each (256 in total)
  • 40 repetitions of the 6-stage Centipede Game; anonymous matches
  • Elicitation of preferences two weeks before the sessions
  • Elicitation of behavior in the CG: sequential response method and strategy method
  • Elicitation of beliefs in round 1 (before playing) and in rounds 17, 18, 19 and 40 (after playing)
  • Two treatments with two different ex post information structures: personal versus public information

SLIDE 13

Preference Elicitation

Two steps:

(1) Elicitation via Social Value Orientation (Murphy et al., 2011): 15 menus of allocations of payoff for self (πs) and payoff for other (πo). We consider only the subset of 4 sliders that are relevant for joint-payoff-maximizing concerns, and we obtain θ by computing (and normalizing)

$$\arctan\left(\frac{\pi_o - 50}{\pi_s - 50}\right)$$

We split types at θ = 1/2.
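A minimal Python sketch of the θ computation. The slides say only “computing (and normalizing)”, so two steps here are assumptions: allocations are averaged over the sliders used, as in the SVO angle of Murphy et al., and the angle is normalized by dividing by 45 degrees and clipping to [0, 1], which maps an individualistic angle of 0 to θ = 0 and a joint-payoff-maximizing angle of 45 degrees to θ = 1. Placing θ = 1/2 exactly in the high group is also an assumption.

```python
import math


def svo_theta(allocations):
    """Preference type from SVO slider choices.

    allocations: list of (payoff to self, payoff to other) chosen on the
        sliders used (the slides use a subset of 4).
    The mean allocations are centered at 50 and turned into an angle in
    degrees. atan2 equals arctan of the ratio whenever the mean self-payoff
    exceeds 50, and avoids dividing by zero. The normalization (divide by
    45 degrees, clip to [0, 1]) is an assumption, not the authors' stated one.
    """
    mean_self = sum(s for s, _ in allocations) / len(allocations)
    mean_other = sum(o for _, o in allocations) / len(allocations)
    angle = math.degrees(math.atan2(mean_other - 50, mean_self - 50))
    return min(max(angle / 45.0, 0.0), 1.0)


def preference_group(theta):
    # Split at theta = 1/2; assigning theta == 1/2 to "high" is an assumption
    return "high" if theta >= 0.5 else "low"


selfish = svo_theta([(100, 50)] * 4)   # always maximizes own payoff
joint = svo_theta([(75, 75)] * 4)      # equal split at a high joint payoff
print(selfish, preference_group(selfish), joint, preference_group(joint))
```

An agent who always picks (100, 50) gets θ = 0 and falls in the low group; one who always picks (75, 75) sits at a 45-degree angle and gets θ = 1.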

SLIDE 14

Preference Elicitation

(2) Check that types elicited via the SVO test are meaningful in the context of a trust game (played before the 40 rounds of the CG).

SLIDE 15

Preference Elicitation

67% of the agents choosing b2 are high types and 68% of the agents choosing a2 are high types.

SLIDE 16

The Centipede Game

SLIDE 17

Two ex post information structures

We manipulate access to information, i.e., the information feedback about the opponent’s moves after each round of play. In the SEQUENTIAL RESPONSE METHOD:
  • Personal information: agents observe the terminal node reached in their own past match (the actions of the agent they just met).
  • Public information: agents are informed about the average conditional frequencies of the opponent’s actions (averaged across all agents in the opponent population).
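A minimal Python sketch of the two kinds of feedback after one round, written for what White agents learn about Black. The match data are hypothetical, pure strategies are written as take nodes (7 = never take) with White moving at nodes 1, 3, 5, and computing the public frequencies from the current round only is an assumption.

```python
def round_feedback(matches):
    """Feedback after one round in the sequential response method.

    matches: list of (white_take, black_take) pairs, each the node at which
        that agent took (7 = never take).
    Returns (personal, public):
      personal[k]: terminal node observed by both agents in match k;
      public: for each Black node (2, 4, 6), the share of Leave among the
              Black agents who actually reached that node in this round
              (None if nobody reached it).
    """
    personal = [min(w, b) for w, b in matches]
    public = {}
    for node in (2, 4, 6):
        # A Black agent reached `node` iff nobody took before it
        reached = [b for w, b in matches if min(w, b) >= node]
        # Having reached it, he leaves iff his take node lies further on
        public[node] = (sum(b > node for b in reached) / len(reached)
                        if reached else None)
    return personal, public


personal, public = round_feedback([(1, 4), (5, 2), (7, 6), (3, 7)])
print(personal, public)
```

Under personal information the White agent in the first match sees only z = 1 and learns nothing about Black; the public report still gives Leave shares at nodes 2, 4 and 6 from the other matches (2/3, 1 and 0 here). Feedback to Black agents about White would be computed symmetrically.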

SLIDE 18

Two ex post information structures

In the STRATEGY METHOD:
  • Personal information (as above): agents observe the terminal node reached in their own past match (the actions of the agent they just met).
  • Public information: agents are informed about the frequencies of the strategies implemented by agents in the opponent population in the round just played.
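A minimal Python sketch of strategy-method public feedback with hypothetical submitted strategies, written as take nodes (7 = never take). Because whole strategies are reported, the implied conditional Leave share is available even at nodes no match reached; `conditional_leave` is a helper introduced here for illustration.

```python
from collections import Counter
from fractions import Fraction


def strategy_frequencies(opponent_plans):
    """Public feedback in the strategy method: the share of opponents whose
    submitted strategy takes at each node (7 = never take)."""
    counts = Counter(opponent_plans)
    n = len(opponent_plans)
    return {plan: Fraction(c, n) for plan, c in sorted(counts.items())}


def conditional_leave(freqs, node):
    """Leave share at an opponent node implied by the strategy frequencies:
    among strategies that do not take before `node`, the fraction that also
    do not take at it. Defined even if no match actually reached the node."""
    reaching = sum(f for plan, f in freqs.items() if plan >= node)
    leaving = sum(f for plan, f in freqs.items() if plan > node)
    return leaving / reaching if reaching else None


black_plans = [2, 6, 6, 7]                 # hypothetical submitted strategies
freqs = strategy_frequencies(black_plans)  # {2: 1/4, 6: 1/2, 7: 1/4}
print(conditional_leave(freqs, 6))         # Leave share at Black's last node
```

Even if every White had taken at node 1 in that round, so that no match reached Black’s node 6, the reported frequencies still imply a Leave share of 1/3 there.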

SLIDE 19

Ex post information structures and learning

  • Personal information: agents in population i learn the conditional frequencies of the opponent’s actions at the opponent’s information sets that they personally visit with positive frequency under (si, σj);
  • Public information in the sequential response method: agents i learn the conditional frequencies of the opponent’s actions at the opponent’s information sets visited with positive frequency by population i under (σi, σj);
  • Public information in the strategy method: agents i learn the objective probabilities of the strategies adopted by the opponent population j.
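A minimal Python sketch of which of Black’s information sets a White agent can eventually learn about under each structure. It assumes White moves at nodes 1, 3, 5, pure strategies written as take nodes (7 = never take), and hypothetical population strategies; the labels "personal", "public_sequential" and "public_strategy" are names introduced here.

```python
BLACK_NODES = (2, 4, 6)


def reaches(white_plans, black_strategy, node):
    """True if Black's `node` is reached with positive probability when
    White uses some plan in `white_plans` and Black mixes per black_strategy.
    White reaches an even node only if he takes later (plan > node); Black
    reaches it only if he does not take earlier (plan >= node)."""
    return any(w > node and b >= node and p > 0
               for w in white_plans for b, p in black_strategy.items())


def learnable_black_nodes(info, own_plan, white_population, black_strategy):
    """Black nodes at which a White agent can learn Black's conditional
    Leave frequency in the long run, under each information structure."""
    if info == "personal":
        plans = [own_plan]
    elif info == "public_sequential":
        plans = [w for w, q in white_population.items() if q > 0]
    elif info == "public_strategy":
        # The full strategy distribution is reported, so every node that some
        # Black strategy reaches has a well-defined conditional frequency
        return {k for k in BLACK_NODES
                if any(b >= k and p > 0 for b, p in black_strategy.items())}
    else:
        raise ValueError(f"unknown information structure: {info}")
    return {k for k in BLACK_NODES if reaches(plans, black_strategy, k)}


sigma_white = {1: 0.3, 5: 0.2, 7: 0.5}           # hypothetical
sigma_black = {2: 0.1, 4: 0.2, 6: 0.3, 7: 0.4}   # hypothetical
for info in ("personal", "public_sequential", "public_strategy"):
    print(info, learnable_black_nodes(info, 1, sigma_white, sigma_black))
```

A selfish White who takes at node 1 learns nothing about Black under personal information, but can learn Black’s behavior at all three nodes under public information as long as some Whites in his population play on.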
SLIDE 20

Results

Aggregate behavior

There is no significant difference between the direct response and strategy methods in the personal information treatment.

SLIDE 21

Results

Aggregate behavior

Effect of public information: agents take at later nodes in both the direct response and the strategy method. Stabilization of aggregate play:
  • in the personal information treatment there is some unraveling, but play stabilizes towards the end (rounds 31-40);
  • in the public information treatment the average final node shifts to the right and stabilizes quickly, but moves slightly to the left at the end.

Interpretation: information release generally moves the final node to the right, but more informative feedback destabilizes long-run aggregate behavior → revision of beliefs.

SLIDE 22

Aggregate behavior across information treatments in the direct response method

SLIDE 23

Aggregate behavior across information treatments in the strategy method

SLIDE 24

First round behavior across preference types in both methods

Regression of conditional frequencies of actions in round 1 on theta

| | White Leave1 | White Leave2 | White Leave3 | Black Leave1 | Black Leave2 | Black Leave3 |
|---|---|---|---|---|---|---|
| theta | 0.019 (0.51) | 0.256 (2.85)*** | 0.071 (0.42) | 0.142 (2.69)*** | 0.267 (2.09)** | 0.355 (1.76)* |
| meth. | 0.031 (1.40) | −0.044 (0.81) | 0.177 (1.79)* | −0.000 (0.00) | 0.006 (0.08) | 0.007 (0.06) |
| cons | 0.961 (44.10)*** | 0.807 (14.78)*** | 0.469 (4.53)*** | 0.906 (28.52)*** | 0.679 (8.64)*** | 0.156 (1.15) |
| R² | 0.02 | 0.07 | 0.03 | 0.06 | 0.04 | 0.04 |
| N | 128 | 124 | 100 | 126 | 117 | 72 |

* p < 0.1; ** p < 0.05; *** p < 0.01

SLIDE 25

First round behavior across preference types in both methods and both information treatments

Average stopping node across preference types of Black in round 1

| type | mean | sd | min | max | N |
|---|---|---|---|---|---|
| low | 5.372 | 1.455 | 2 | 7 | 51 |
| high | 5.875 | 1.064 | 4 | 7 | 48 |

Mann–Whitney p-value: 0.0951

Average stopping node across preference types of White in round 1

| type | mean | sd | min | max | N |
|---|---|---|---|---|---|
| low | 5.327 | 1.667 | 1 | 7 | 55 |
| high | 6.133 | 1.241 | 1 | 7 | 60 |

Mann–Whitney p-value: 0.0064
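The slides report Mann–Whitney p-values without stating the software or options used. For reference, a stdlib-only Python sketch of the two-sided test with the normal approximation and the tie correction, which matters for data such as stopping nodes that take only the values 1 to 7. Whether the reported p-values used exactly these settings (for example, no continuity correction) is an assumption; the function name `mann_whitney` is introduced here.

```python
import math


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test, normal approximation.

    Ties get average ranks and the variance of U is tie-corrected; no
    continuity correction is applied. Returns (U statistic of x, p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool the observations, remembering which sample (0 = x, 1 = y) each came from
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2           # mean of ranks i+1, ..., j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += (j - i) ** 3 - (j - i)   # zero for untied values
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:                           # all observations tied
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
```

Applied to two lists of round-1 stopping nodes, one per preference group, it returns U for the first group and the two-sided p-value; swapping the groups gives U' = n1·n2 − U and the same p-value.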

SLIDE 26

Initial beliefs across preference types

Correlation between conditional subjective probabilities of the opponent’s actions and preference type at t = 0

Average conditional subjective probabilities across preference types at t = 0

SLIDE 27

Long run (t > 35) behavior across preference types in the sequential response method

  • Avg. conditional frequencies of actions in personal info treatm.
  • Avg. conditional frequencies of actions in public info treatm.
SLIDE 28

Long run behavior across preference types in the strategy method

SLIDE 29

Long run behavior across preference types in the strategy method

SLIDE 30

Long run behavior across preference types in the strategy method

SLIDE 31

Long run behavior across preference types in the strategy method

SLIDE 32

Long run beliefs

There is no significant correlation between beliefs in round t = 40 and preference type. Long-run beliefs move in the same direction as information: they are much more “optimistic” in the public information treatment than in the personal information treatment. The beliefs of selfish Whites (theta = 0) that Black will Leave at the last node are .03 with personal information and .35 with public information → it becomes rational for a selfish White to Leave at his last decision node.
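Why a shift in beliefs from .03 to .35 can flip the optimal action, as a worked inequality in notation introduced here (the slides do not list the payoffs). At his last decision node a selfish White compares Take, worth u_T, with Leave, worth u_LL if Black then leaves and u_LT if Black takes. With p the subjective probability that Black leaves:

```latex
\[
p\,u_{LL} + (1-p)\,u_{LT} \;\ge\; u_T
\quad\Longleftrightarrow\quad
p \;\ge\; p^{*} \equiv \frac{u_T - u_{LT}}{u_{LL} - u_{LT}},
\qquad \text{assuming } u_{LL} > u_{LT}.
\]
```

In a centipede game u_T > u_LT, so the threshold p* is strictly positive; the slide’s statement amounts to p* lying above .03 and at or below .35 for the payoffs used in the experiment.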

SLIDE 33

Long run beliefs

Regression of long-run beliefs on own observations and initial beliefs in the personal information treatment

Dependent variable: conditional subjective probability of the opponent’s Leave at info set 3

| | Coefficient |
|---|---|
| subj. prob. at t = 0 | 0.143 (2.39)** |
| own observations up to t = 40 | 0.354 (3.35)*** |
| theta | 0.005 (0.08) |
| role is White | −0.026 (0.66) |
| Constant | 0.056 (1.02) |
| R² | 0.17 |
| N | 113 |

* p < 0.1; ** p < 0.05; *** p < 0.01

SLIDE 34

Long run beliefs

There is a significant correlation between subjective probabilities and real frequencies only in the public information treatment. Initial beliefs have a significant impact on final beliefs in the personal information treatment, but not in the public information treatment. Both own observations and real frequencies have a significant impact on final beliefs in the public information treatment.
SLIDE 35

A further insight on behavior

Dependent variable: Leave at info set 3 by White, at any t

| | Personal | Public |
|---|---|---|
| theta | 0.123 (0.77) | 0.001 (0.01) |
| own obs. | 0.744 (3.83)*** | 0.266 (1.81)* |
| subj. prob. | 0.998 (4.71)*** | 0.009 (0.04) |
| real freq. | −0.173 (0.91) | 0.443 (2.46)** |
| cons | −0.178 (1.55) | 0.365 (2.86)*** |
| R² | 0.18 | 0.04 |
| N | 228 | 275 |

* p < 0.1; ** p < 0.05; *** p < 0.01

SLIDE 36

Conclusion

  • We find evidence that behavior in the CG differs across preference types.
  • Heterogeneity decreases when we release public information.
  • Heterogeneity of behavior is not reflected in heterogeneity of beliefs.
  • All the results suggest that heterogeneity comes from social preferences; in particular, off-path prediction errors affect the behavior of more selfish types.
  • Further analysis: test whether social preferences change along the learning process.
