SLIDE 1

Learned Impatience? Dispersed Reinforcement and Time Discounting

David Poensgen (Goethe University Frankfurt) February 22, 2019

Sloan-Nomis Workshop on the Cognitive Foundations of Economic Behavior

SLIDE 2

Motivation

  • 1. Individuals learn from consequences of past actions.
  • 2. Actions often have a series of consequences: some follow soon, some later.
  • 3. How does this ordering affect learning?

Plausibly: Easiest to learn from soonest consequences.

  • 4. Then: Immediate consequences will be over-weighted.

Behavior biased towards impatience. May help explain why myopic behavior is so widespread and persistent.


SLIDE 12

Background

  • Reinforcement becomes less effective as the delay to feedback grows (e.g. Mazur 2002).
  • Typically linked to the speed of learning, not to time discounting.
  • Explained via the accumulation of noise by Commons, Woodford et al. (1982, 1991).
  • Feedback delay modulates the neural circuitries involved in learning (Foerde/Shohamy 2011, Foerde et al. 2013, Arbel et al. 2017).
  • Associative learning tasks with singular feedback; performance not affected.
  • Gabaix & Laibson (2017) also link time discounting to information frictions.
  • Formally applicable here, but with a different interpretation of the source of noise.
  • Melioration theory: behavior guided by the immediate, not the overall, reinforcement rate (Herrnstein et al.).
  • Important experimental paradigm: the “Harvard game” (review: Prelec 2014).
  • Critique by Sims et al. (2013): Bayesian algorithms need thousands of trials to find the solution; melioration is a rational response to task complexity.

SLIDE 13

Design: Overview

  • 6 abstract options (= colors).
  • Subjects face a sequence of 105 binary choices.
  • Payoff and feedback mechanism:
  • Each color x is associated with a payoff vector (x1, x2).
  • Values are initially unknown, but can be learned.
  • Choosing x has 2 consequences:
  • x1 + ϵ points shown and awarded immediately.
  • x2 + ϵ′ points shown and awarded with one round of delay.
  • ϵ, ϵ′ are disturbances drawn uniformly from {1, 2, 3, 4}.
  • The total value of x is x1 + x2.
  • Goal: collect as many points as possible.
  • All points are rewarded simultaneously after the experiment.
  • All rules and mechanisms are clearly communicated to subjects.
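The payoff and feedback mechanism above can be sketched in a few lines. This is a minimal illustration of the rules as stated on the slide, not the experiment's actual software; the function name is made up for the example.

```python
import random

def draw_feedback(x, rng=random):
    """Simulate one choice of option x = (x1, x2).

    Per the slide's rules: x1 + eps points are shown and awarded
    immediately, x2 + eps' points with one round of delay, where the
    disturbances eps, eps' are drawn uniformly from {1, 2, 3, 4}.
    """
    x1, x2 = x
    eps = rng.choice([1, 2, 3, 4])
    eps_prime = rng.choice([1, 2, 3, 4])
    return x1 + eps, x2 + eps_prime

# Example: the option (11, 7) has total value 18, but a subject only ever
# sees two noisy components, split across rounds.
immediate, delayed = draw_feedback((11, 7))
assert 12 <= immediate <= 15 and 8 <= delayed <= 11
```

Because both components carry the same noise distribution, the immediate and delayed consequences are equally informative in principle; any over-weighting of the immediate component is a property of the learner, not of the signal.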


SLIDE 16

Design: Example Screen

[Screenshot of the choice interface]

SLIDE 33

Design: Payoff Vectors

  Total value   Group A (immediate, delayed)   Group B (immediate, delayed)
  18            (11, 7)                        (7, 11)
  16            (6, 10)                        (10, 6)
  14            (9, 5)                         (5, 9)
  12            (4, 8)                         (8, 4)
  10            (7, 3)                         (3, 7)
  8             (2, 6)                         (6, 2)

Hypotheses:
  • (11, 7)A chosen more often than (7, 11)B; (10, 6)B more often than (6, 10)A; ...
  • Choice frequencies of (11, 7)A and (6, 10)A further apart than those of (6, 10)A and (9, 5)A.
  • Potentially even: (9, 5)A preferred to (6, 10)A.


SLIDE 35

Results: Choice Frequencies

[Figure: choice frequencies]

SLIDE 40

Results: Bias over time

[Figure: bias over time]

SLIDE 41

Summary: Further Results

  • Estimated latent utility function: u(x) = x1 + 0.4 x2.
  • Elicited beliefs are in accordance with choice behavior.
  • Considerable heterogeneity in the degree of bias.
  • Correlated with impatience in hypothetical intertemporal choice.
  • (To do: incentivized choice or field measures of impatience.)
  • Treatment: learning by observation.
  • Subjects are passively presented with feedback for 63 rounds.
  • Directly afterwards: 42 own decisions.
  • Bias attenuated; low right after the learning phase, then gradually increasing.
  • Suggests the emergence of the bias is connected to active decision making.
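The estimated latent utility u(x) = x1 + 0.4 x2 can be checked against the Group A payoff vectors from the design slide. A small sketch (illustrative code, not the author's estimation procedure) showing the preference reversal the hypotheses anticipate:

```python
# Group A payoff vectors (immediate, delayed) from the payoff-vector slide.
group_a = [(11, 7), (6, 10), (9, 5), (4, 8), (7, 3), (2, 6)]

def total(x):
    """True total value: immediate plus delayed points."""
    return x[0] + x[1]

def u(x):
    """Estimated latent utility: delayed points weighted at only 0.4."""
    return x[0] + 0.4 * x[1]

by_total = sorted(group_a, key=total, reverse=True)
by_u = sorted(group_a, key=u, reverse=True)

# By true value, (6, 10) with 16 points beats (9, 5) with 14 points...
assert by_total == [(11, 7), (6, 10), (9, 5), (4, 8), (7, 3), (2, 6)]
# ...but under the latent utility the front-loaded (9, 5) wins: 11.0 vs. 10.0.
assert by_u == [(11, 7), (9, 5), (6, 10), (7, 3), (4, 8), (2, 6)]
```

This reproduces the payoff-vector slide's strongest prediction: with delayed points weighted at 0.4, (9, 5)A is preferred to the higher-value (6, 10)A.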


SLIDE 45

Outlook

  • Relation to actual reward discounting – ideally with a field measure.
  • Relation to working memory, known to affect reward discounting (Wesley/Bickel 2014).
  • Potential explanation: differential precision in memory.
  • Investigate this using...
  • response time data
  • more fine-grained belief data
  • variations in timing, payoff vectors