

SLIDE 1

Dynamic Delegation of Experimentation

Yingni Guo

Northwestern University

Yingni Guo (NU) · Delegation of Experimentation · 1 / 55

SLIDE 4

Introduction

Delegating Experimentation

I study how to manage innovation in a hierarchical organization. A principal delegates experimentation to an agent.

Private information: the agent knows more precisely the prospect of the experimentation.
Misaligned preferences: the agent prefers more experimentation.
Tradeoff: using the agent's information and constraining his bias.

SLIDE 6

Introduction

What is the Optimal Delegation Contract?

The principal has full commitment. The principal cannot use transfers. The principal can only impose limitations on the agent's behavior.

As new information arrives over time, how should the principal adjust the flexibility that the agent has? Is the optimal delegation contract time-consistent?

SLIDE 9

Introduction

Short Answer: Main Result

The optimal contract is a cutoff rule in the belief space and can be implemented as a sliding deadline:

The principal initially sets a deadline for experimentation;
Whenever encouraging information arrives, the deadline is extended;
The agent has full flexibility before the deadline but none after.

The cutoff rule is time-consistent. The most promising products are under-experimented, whereas less promising ones are over-experimented.

SLIDE 10

Introduction

Examples: Within Firms and Others

In-house innovation. Market learning. Public good provision. Research grants and funding.

SLIDE 11

Outline

1. Model
2. Single-player benchmark
3. Characterizing the policy space
4. Main results
5. More general results

SLIDE 13

I. Model
Experimentation

Players, Tasks and States

Time t ∈ [0, ∞) is continuous. Two risk-neutral players i ∈ {α, ρ}: Agent (he) and Principal (she). One unit of a divisible resource per unit of time. Agent continually splits the resource between two tasks:

S: known (deterministic flow) payoff;
R: unknown state of the world ω ∈ {0, 1}.

For the talk: focus on Poisson bandits (conclusive news). All results generalize to Lévy bandits.

SLIDE 14

I. Model
Experimentation

Tasks and Payoffs

Over [t, t + dt), the agent allocates a fraction πt of the unit resource to R and 1 − πt to S.

S yields to player i the flow payoff (1 − πt)si dt.
R yields a success with probability πtλω dt.
Each success is worth hi to player i.

SLIDE 18

I. Model
Experimentation

Tasks and Payoffs (cont.)

Conditional on ω, the expected payoff increment to player i is

(1 − πt)si dt + πtλhiω dt = (si, λhiω) · ((1 − πt)dt, πt dt).

For i ∈ {α, ρ}, λhi > si > 0. Preferred allocation coincides if the state is known.

SLIDE 21

I. Model
Experimentation

Conflict of Interests

Let ηi be the (net) benefit-cost ratio from the experimentation: ηi = (λhi − si)/si. Parameters are such that ηα > ηρ.

Interpretations:
High cost of Principal's resources;
Principal's moderate benefit from one out of her many responsibilities;
Agent's career advancement as an extra benefit.

SLIDE 23

I. Model
Experimentation

Private Information

Players do not observe the state. Agent has private information: his type is his prior belief that the state is 1. Agent's type is denoted θ, drawn from Θ ≡ [θ̲, θ̄] ⊂ (0, 1). F is the cdf, f the pdf. Actions and successes are publicly observed.

SLIDE 24

I. Model
Experimentation

Policy and Payoffs

A resource allocation policy is a non-anticipative stochastic process π = {πt}t≥0.

πt ∈ [0, 1]: the fraction allocated to R at time t, which may depend only on the history of events up to t.

The space of all (mixed) policies is Π.

SLIDE 25

I. Model
Experimentation

Examples of Policies

Allocate all resource to R until a fixed time and switch to S if no success occurs by then;
Allocate all resource to R until the 1st success and then allocate a fixed fraction to R;
Allocate all resource to R until the 2nd success and then switch to S; ...

SLIDE 27

I. Model
Experimentation

Policy and Payoffs (cont.)

Players discount payoffs at rate r > 0. Nt: the number of successes observed up to time t. Player i's payoff given policy π ∈ Π and prior p0 ∈ [0, 1] is

Ui(π, p0) ≡ E[ ∫₀^∞ re^{−rt} [(1 − πt)si dt + hi dNt] | π, p0 ].

By the Law of Iterated Expectations, Ui(π, p0) can be rewritten as

Ui(π, p0) = E[ ∫₀^∞ re^{−rt} [(1 − πt)si + πtλhiω] dt | π, p0 ].

SLIDE 28

I. Model
Delegation

Delegation

Principal has full commitment and cannot use transfers. She determines a delegation contract at time 0. By the Revelation Principle, Principal offers a direct mechanism π : Θ → Π:

sup ∫Θ Uρ(π(θ), θ) dF(θ), subject to Uα(π(θ), θ) ≥ Uα(π(θ′), θ) ∀θ, θ′ ∈ Θ.

SLIDE 29

Outline

1. Model
2. Single-player benchmark
3. Characterizing the policy space
4. Main results
5. More general results

SLIDE 30

II. Single-Player Benchmark

Posterior Beliefs

Given prior p0 and the history of events up to time t, pt = Pt[ω = 1]. Before the first success, pt satisfies the differential equation ṗt = −λπtpt(1 − pt). At the first success, pt jumps to one.
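The belief dynamics above can be checked numerically. A minimal sketch, assuming a hypothetical success rate λ = 1 (not a value from the talk): with full-intensity experimentation (πt = 1) and no success, Bayes' rule gives pt in closed form, and it agrees with integrating the drift equation ṗt = −λπtpt(1 − pt).

```python
import math

LAM = 1.0  # hypothetical Poisson success rate lambda

def belief_closed_form(p0, t):
    """Posterior that omega = 1 after experimenting at full intensity
    for time t with no success: Bayes' rule with no-success
    likelihoods exp(-LAM*t) (state 1) vs 1 (state 0)."""
    num = p0 * math.exp(-LAM * t)
    return num / (num + 1.0 - p0)

def belief_euler(p0, t, steps=200_000):
    """Integrate dp/dt = -LAM * p * (1 - p) (pi_t = 1, no success)."""
    p, dt = p0, t / steps
    for _ in range(steps):
        p += -LAM * p * (1.0 - p) * dt
    return p

print(belief_closed_form(0.6, 2.0), belief_euler(0.6, 2.0))
```

The two should agree to several decimals, and the belief drifts down over time until a success sends it to one.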

SLIDE 31

II. Single-Player Benchmark

Single Player's Preferred Policy

Player i's preferred policy is Markov wrt pt, characterized by a cutoff p∗i s.t.

πt = 1 if pt > p∗i, and πt = 0 if pt ≤ p∗i.

The cutoff belief is p∗i = r / (r + (λ + r)ηi). Agent's cutoff is lower than Principal's: p∗α < p∗ρ.
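The cutoff formula is easy to evaluate. A sketch under hypothetical parameter values (r, λ, hi, si below are illustrative, not from the talk), showing that ηα > ηρ delivers p∗α < p∗ρ:

```python
def eta(lam, h, s):
    """Net benefit-cost ratio eta_i = (lam * h_i - s_i) / s_i."""
    return (lam * h - s) / s

def cutoff_belief(r, lam, eta_i):
    """Single player's cutoff p*_i = r / (r + (lam + r) * eta_i):
    experiment while the posterior exceeds p*_i."""
    return r / (r + (lam + r) * eta_i)

r, lam = 0.1, 1.0
eta_a = eta(lam, h=2.0, s=1.0)   # agent (higher benefit-cost ratio)
eta_p = eta(lam, h=1.5, s=1.0)   # principal
p_a = cutoff_belief(r, lam, eta_a)
p_p = cutoff_belief(r, lam, eta_p)
print(p_a, p_p)  # p_a < p_p: the agent stops at a lower belief
```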

SLIDE 35

II. Single-Player Benchmark

Agency Problem Revisited

τi(θ): Player i's preferred stopping time given θ.

For a given prior, Agent prefers to experiment longer than Principal: τα(θ) > τρ(θ).
Higher priors warrant longer experimentation: θ′ < θ implies τρ(θ′) < τρ(θ).
Lower types (those with lower θ) have incentives to mimic higher types.

[Figure: belief paths from priors θ′ < θ drifting down to the cutoffs p∗ρ and p∗α, with stopping times τρ(θ′) < τρ(θ) and τα(θ′) < τα(θ).]

SLIDE 36

Outline

1. Model
2. Single-player benchmark
3. Characterizing the policy space
4. Main results
5. More general results

SLIDE 37

III. Characterizing the Policy Space
A Policy as a Pair of Numbers

(Total Expected Discounted) Resource Pair

For a fixed policy π, define w1(π) and w0(π) as follows:

w1(π) ≡ E[ ∫₀^∞ re^{−rt} πt dt | π, ω = 1 ] ∈ [0, 1],
w0(π) ≡ E[ ∫₀^∞ re^{−rt} πt dt | π, ω = 0 ] ∈ [0, 1].

w1(π): (total expected discounted) resource allocated to R under π in state 1.
w0(π): (total expected discounted) resource allocated to R under π in state 0.
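For a concrete policy, the resource pair has a closed form. A sketch, under conclusive Poisson news and hypothetical values of r and λ, for the stopping-time policy that runs R at full intensity until a fixed time T (and forever after a success):

```python
import math

def resource_pair_stopping_time(T, r, lam):
    """(w1, w0) for the stopping-time policy: all resource to R until T;
    on a success, R forever; with no success by T, switch to S.
    State 0 never produces a success, so w0 = 1 - exp(-r*T).
    State 1 adds the post-T resource when a success arrives by T,
    which happens with probability 1 - exp(-lam*T)."""
    w0 = 1.0 - math.exp(-r * T)
    w1 = w0 + (1.0 - math.exp(-lam * T)) * math.exp(-r * T)
    return w1, w0

r, lam = 0.1, 1.0  # hypothetical discount and success rates
for T in (0.5, 1.0, 2.0, 5.0):
    w1, w0 = resource_pair_stopping_time(T, r, lam)
    print(T, round(w1, 4), round(w0, 4))  # w1 >= w0 for every T
```

Longer deadlines move the pair up along the stopping-time boundary of the feasible set: both coordinates increase with T, and w1 ≥ w0 because state 1 keeps the resource on R after a success.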

SLIDE 39

III. Characterizing the Policy Space
A Policy as a Pair of Numbers

Summary Statistic for the Payoffs

Lemma 1 (A Policy as a Pair of Numbers). For a given policy π ∈ Π and prior p0 ∈ [0, 1], player i's payoff can be written as

Ui(π, p0) − si = (p0, 1 − p0) · ((λhi − si) w1(π), (0 − si) w0(π)).

(w1(π), w0(π)) is a summary statistic of π for the payoffs.

SLIDE 40

III. Characterizing the Policy Space
Feasible Set

Feasible Set

Feasible set Γ: the set of feasible resource pairs

Γ = { (w1, w0) | (w1, w0) = (w1(π), w0(π)), π ∈ Π }.

SLIDE 46

III. Characterizing the Policy Space
Feasible Set

Characterizing the Feasible Set

ŵ ∈ bd(Γ) ⟺ ∃p ∈ R², ‖p‖ = 1, ŵ ∈ argmax_{w∈Γ} p · w.

Lemma 2 (Feasible Set). Γ = co{ (w1(π), w0(π)), π ∈ ΠM }, where ΠM are Markov policies (wrt p).

SLIDE 47

III. Characterizing the Policy Space
Feasible Set

Canonical Markov Policies: Poisson Conclusive News

Stopping-time policies (lower-cutoff Markov policies):
allocate all resource to R until a fixed time; if at least one success occurs by then, allocate all resource to R forever; otherwise, switch to S.

Slack-after-success policies (upper-cutoff Markov policies):
allocate all resource to R until the first success; then allocate a fixed fraction to R.

SLIDE 53

III. Characterizing the Policy Space
Feasible Set

Canonical Markov Policies: Poisson Conclusive News

[Figure: the (w1(π), w0(π)) unit square, with the stopping-time policies tracing one boundary curve and the slack-after-success policies the other. Marked points:
A: allocate all resource to S;
B: switch to S at some fixed time if no success occurs;
C: allocate all resource to R;
D: allocate all resource to R until 1st success; then allocate some fixed fraction to R;
E: allocate all resource to R until 1st success; then switch to S.]

SLIDE 55

III. Characterizing the Policy Space
Feasible Set

Feasible Set: Poisson Conclusive News

[Figure: the feasible set Γ, bounded by the stopping-time and slack-after-success policy curves in the (w1(π), w0(π)) square.]

Lemma 3 (Feasible Set: Poisson Conclusive News). The feasible set is the convex hull of the image of stopping-time and slack-after-success policies.

SLIDE 58

III. Characterizing the Policy Space
Preferences over Feasible Pairs

Preferences over Feasible Pairs

Player i's payoff given π and θ is

Ui(π, θ) − si = [θηiw1(π) − (1 − θ)w0(π)] · si.

Player i's preferences over (w1, w0) ∈ Γ are determined by
θ: the prior belief that the state is 1;
ηi: the benefit-cost ratio from the experimentation.
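The payoff formula above makes the indifference curves linear in (w1, w0) with slope θ/(1 − θ) · ηi. A minimal sketch, with hypothetical values of θ and the ratios ηi, showing that the agent's curves are steeper and that moving along a curve leaves the payoff unchanged:

```python
def normalized_payoff(theta, eta_i, w1, w0):
    """Player i's payoff net of s_i, in units of s_i:
    theta * eta_i * w1 - (1 - theta) * w0."""
    return theta * eta_i * w1 - (1.0 - theta) * w0

def indifference_slope(theta, eta_i):
    """dw0/dw1 along an indifference curve: theta/(1-theta) * eta_i."""
    return theta / (1.0 - theta) * eta_i

theta, eta_a, eta_p = 0.5, 1.0, 0.5  # hypothetical values
# eta_a > eta_p makes the agent's indifference curves steeper
assert indifference_slope(theta, eta_a) > indifference_slope(theta, eta_p)
# stepping along the agent's indifference curve keeps his payoff constant
u0 = normalized_payoff(theta, eta_a, 0.4, 0.0)
u1 = normalized_payoff(theta, eta_a, 0.6, 0.2 * indifference_slope(theta, eta_a))
assert abs(u0 - u1) < 1e-12
```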

SLIDE 62

III. Characterizing the Policy Space
Preferences over Feasible Pairs

Preferences over Feasible Pairs: Indifference Curves

[Figure: in the (w1(π), w0(π)) square, Principal's indifference curve given θ has slope θ/(1 − θ) · ηρ and Agent's has slope θ/(1 − θ) · ηα. P = (w1ρ(θ), w0ρ(θ)) marks Principal's preferred pair, A = (w1α(θ), w0α(θ)) Agent's.]

SLIDE 66

III. Characterizing the Policy Space
Delegation Problem Reformulated

Delegation Problem Reformulated

Replace policy space Π with feasible set Γ: a pair (w1, w0) ∈ Γ gives player i the (normalized) payoff θηiw1 − (1 − θ)w0.

Principal offers a direct mechanism (w1, w0) : Θ → Γ:

max ∫Θ [θηρw1(θ) − (1 − θ)w0(θ)] dF(θ),

subject to (θηα/(1 − θ))w1(θ) − w0(θ) ≥ (θηα/(1 − θ))w1(θ′) − w0(θ′) ∀θ, θ′ ∈ Θ.

The problem is determined by: payoff parameters ηα > ηρ; feasible set Γ; type distribution F.

SLIDE 67

Outline

1. Model
2. Single-player benchmark
3. Characterizing the policy space
4. Main results
5. More general results

SLIDE 68

IV. Main Results
The Cutoff Rule

The Cutoff Rule

Definition 1. The cutoff rule is the contract (w1, w0) s.t.

(w1(θ), w0(θ)) = (w1α(θ), w0α(θ)) if θ ≤ θ∗, and (w1α(θ∗), w0α(θ∗)) if θ > θ∗.

SLIDE 72

IV. Main Results
The Cutoff Rule

Delegation Set under Cutoff Rule

[Figure: the feasible set Γ with Principal's preferred policies (supporting slopes from θ̲ηρ to θ̄ηρ), Agent's preferred policies (slopes from θ̲ηα to θ̄ηα), and the point at slope θ∗ηα where the delegation set is capped.]

SLIDE 73

IV. Main Results
Optimality

Main assumption. For all θ ≤ θ∗, the following condition is satisfied:

ηα/(ηα − ηρ) ≥ (3θ − 1) − (f′(θ)/f(θ)) θ(1 − θ).

Proposition 1. The cutoff rule is optimal if the main assumption holds.
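The main assumption is straightforward to verify numerically for a given type density. A sketch, checking the inequality on a grid of types (the grid, the uniform density, and the ηi values are illustrative assumptions; the assumption itself only needs to hold for θ ≤ θ∗):

```python
def main_assumption_holds(eta_a, eta_p, f, f_prime, thetas):
    """Check eta_a/(eta_a - eta_p) >= (3*theta - 1)
    - f'(theta)/f(theta) * theta*(1 - theta) on a grid of types."""
    lhs = eta_a / (eta_a - eta_p)
    return all(
        lhs >= (3.0 * t - 1.0) - f_prime(t) / f(t) * t * (1.0 - t)
        for t in thetas
    )

# uniform types: f = 1 and f' = 0, so the condition reduces to
# eta_a/(eta_a - eta_p) >= 3*theta - 1
thetas = [i / 100 for i in range(10, 91)]
print(main_assumption_holds(1.0, 0.5, lambda t: 1.0, lambda t: 0.0, thetas))
```

With a uniform density the right-hand side is at most 3θ̄ − 1 < 2, so the condition holds whenever the conflict of interest is not too large (ηρ ≥ ηα/2 in this illustration).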

SLIDE 74

IV. Main Results
Implementation

Implementing the Cutoff Rule

Calibrated belief pt:
prior belief p0 = θ∗;
without any success, it drifts down according to ṗt = −λπtpt(1 − pt);
upon the first success, it jumps to one.

Behavior: a cutoff imposed at p∗α.
Agent has full flexibility if the belief stays above the cutoff;
Agent is required to stop once the cutoff is reached.
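Under conclusive news, the initial deadline implied by this cutoff has a closed form in log-odds. A sketch, with hypothetical values of θ∗, p∗α, and λ:

```python
import math

def initial_deadline(theta_star, p_star, lam):
    """Time for the calibrated belief to drift from theta* down to the
    cutoff p* under full-intensity experimentation with no success:
    theta*·e^{-lam·T} / (theta*·e^{-lam·T} + 1 - theta*) = p*, i.e.
    T = (1/lam) · ln( (theta*/(1-theta*)) · ((1-p*)/p*) )."""
    odds_start = theta_star / (1.0 - theta_star)
    odds_stop = p_star / (1.0 - p_star)
    return math.log(odds_start / odds_stop) / lam

# hypothetical values: theta* = 0.5, cutoff p*_alpha = 0.1, lam = 1
print(initial_deadline(0.5, 0.1, 1.0))  # ln(9), about 2.197
```

A success resets the belief to one, so experimentation never stops afterwards; a higher θ∗ or a lower cutoff lengthens the deadline.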

SLIDE 80

IV. Main Results
Implementation

Implementing the Cutoff Rule (cont.)

[Figure: the calibrated belief pt starts at p0 = θ∗ and drifts down toward the cutoff p∗α; at the 1st success it jumps to one. Annotations mark the times at which type θ̲ stops and at which types with θ ≥ θ∗ stop.]

SLIDE 82

IV. Main Results
Time Consistency

Time Consistency

Definition 2. Fix a (direct or indirect) mechanism. It is time-consistent if Principal finds it optimal to fulfill the mechanism after any history on path.

Proposition 2. The cutoff rule is time-consistent if the main assumption holds.

SLIDE 88

IV. Main Results
Time Consistency

Time Consistency: Principal's Posterior Belief

[Figure: the calibrated belief path from p0(θ∗) with both cutoffs p∗α and p∗ρ marked; annotations mark the times at which type θ̲ stops and at which types with θ ≥ θ∗ stop.]

SLIDE 89

IV. Main Results
Cutoff Type

The Cutoff Type

The cutoff type θ∗: the lowest value in Θ s.t. Agent's preferred policy given θ∗ equals Principal's preferred policy if she believes that θ ≥ θ∗. For any θ̂ > θ∗, Agent's preferred policy given θ̂ is above Principal's preferred policy if she believes that θ ≥ θ̂.

SLIDE 92

IV. Main Results
Cutoff Type

The Cutoff Type (cont.)

[Figure: the feasible set Γ with Agent's and Principal's preferred policy curves; Agent's preferred point at slope θ∗ηα coincides with Principal's preferred point at slope θ∗ηρ, pinning down the cutoff type θ∗.]

SLIDE 98

IV. Main Results
Cutoff Type

Over- and Under-Experimentation

[Figure: stopping time τ against type θ on [θ̲, θ̄]. Agent's preferred stopping time lies above Principal's. The delegation rule follows Agent's preferred stopping time for θ ≤ θ∗ and is capped at τα(θ∗) for θ > θ∗. Relative to Principal's preferred stopping time, low types over-experiment and high types (the most promising products) under-experiment.]

SLIDE 99

Outline

1. Model
2. Single-player benchmark
3. Characterizing the policy space
4. Main results
5. More general results

SLIDE 100

V. More General Results
Poisson Inconclusive News

Feasible Set: Poisson Inconclusive News

[Figure: the feasible set Γ in the (w1(π), w0(π)) square, bounded by the lower-cutoff Markov policies and the upper-cutoff Markov policies.]

slide-101
SLIDE 101
  • V. More general results

Poisson Inconclusive News

The Cutoff Rule: Poisson Inconclusive News

[Figure: the feasible set Γ in the (w1(π), w0(π)) unit square, marking the agent's preferred policies, the principal's preferred policies, and the policy assigned to the cutoff type θ∗.]


slide-102
SLIDE 102
  • V. More general results

Poisson Inconclusive News

Implementation: Poisson Inconclusive News

Types with θ ≥ θ∗ stop.

[Figure: the calibrated belief pt over time t, starting from p0 = θ∗, with experimentation stopping when pt reaches the cutoff p∗α.]


slide-103
SLIDE 103
  • V. More general results

Poisson Inconclusive News

Sliding Deadline: Poisson Inconclusive News

The principal initially sets a deadline for experimentation. Whenever a success realizes, the deadline is extended. The agent is free to switch to S before the deadline. Once the deadline is reached, the agent is required to switch to S.
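The stopping rule above can be sketched as a small function. This is a minimal illustration, not the paper's formal mechanism: the function name, the reset-style extension rule (each success pushes the deadline to the success time plus a fixed extension), and all parameter values are assumptions for concreteness.

```python
def sliding_deadline_stop(arrival_times, initial_deadline, extension):
    """Time at which experimentation must stop under a sliding deadline.

    One simple variant of the rule on the slide: each success arriving
    before the current deadline pushes the deadline to the success time
    plus `extension`; once the deadline passes with no further success,
    the agent must switch to the safe arm S.
    """
    deadline = initial_deadline
    for t in sorted(arrival_times):
        if t <= deadline:
            deadline = t + extension  # success realized in time: extend
        else:
            break                     # success came too late: already stopped
    return deadline
```

With no successes the agent stops at the initial deadline; each timely success buys more experimentation time.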


slide-104
SLIDE 104
  • V. More general results

Lévy Bandits

Lévy Bandits

Proposition 3. The cutoff rule is optimal if the main assumption holds.

Proposition 4. The cutoff rule is time-consistent if the main assumption holds.


slide-105
SLIDE 105
  • V. More general results

Lévy Bandits

Optimal Contract with Transfers

The principal can make transfers to the agent. The agent is protected by limited liability. For each type, the principal specifies an experimentation policy and a transfer scheme, (w1, w0; t1, t0).


slide-109
SLIDE 109
  • V. More general results

Lévy Bandits

Optimal Contract with Transfers

[Figure: stopping time τ against type θ, comparing the principal's preferred stopping time, the agent's preferred stopping time, the delegation rule with cutoff θ∗, and the optimal allocation with transfers with cutoff θ∗∗.]


slide-111
SLIDE 111

Discussion

Implications and Applications

A (sliding) deadline should be in place as a safeguard against the abuse of resources. The continuation of a project is permitted only upon demonstrated successes. The agent should have full flexibility over resource allocation before the (sliding) deadline is reached.

Google: the once highly publicized and well-funded Google Wave was canceled in August 2010 after it failed to achieve the goals set by Google executives.


slide-112
SLIDE 112

Discussion

Future Work

Asymmetric learning. Multi-dimensional hidden information. Allocation vs. investment.


slide-113
SLIDE 113

Discussion

Thank you!


slide-114
SLIDE 114

Appendix


slide-115
SLIDE 115

Appendix

A Policy is a Non-Anticipative Process

Suppose the process L is the Lévy process L1 with probability p ∈ (0, 1) and L0 with probability 1 − p. Let F^L_t be the sigma-algebra generated by the process (L(s))_{s≤t}. The process π is then required to satisfy {∫_0^t π_s ds ≤ t′} ∈ F^L_{t′} for any t, t′ ∈ [0, ∞).


slide-116
SLIDE 116

Appendix

Mixed Policies

I define mixed policies following Aumann (1964). Let Π∗ be the set of all pure policies. A mixed policy is a measurable function π̂ : [0, 1] → Π∗: a value x is drawn uniformly from [0, 1], and then the pure policy π̂(x) is implemented. Stochastic mechanisms are measurable functions π̂ : Θ × [0, 1] → Π∗.
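The Aumann-style randomization can be illustrated with a step function π̂ over finitely many pure policies. The helper below is a hypothetical sketch: the function name and the equal-subinterval step function are assumptions, with the uniform draw standing in for the domain [0, 1].

```python
import random

def play_mixed_policy(pure_policies, x=None):
    """Aumann (1964)-style mixed policy: draw x uniformly from [0, 1],
    then implement the pure policy hat_pi(x).  Here hat_pi is the step
    function mapping equal-length subintervals of [0, 1] to the finitely
    many pure policies supplied."""
    if x is None:
        x = random.random()
    # which subinterval x falls into; clamp so x = 1.0 is still valid
    index = min(int(x * len(pure_policies)), len(pure_policies) - 1)
    return pure_policies[index]
```

Passing an explicit x makes the draw reproducible, which is convenient for checking the step function.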


slide-117
SLIDE 117

Appendix

A Policy as a Pair of Numbers

Player i’s payoff given policy π ∈ Π and prior p0 ∈ [0, 1] is

Ui(π, p0) = E[ ∫_0^∞ r e^{−rt} ((1 − πt) si + πt λω hi) dt | π, p0 ]

= p0 E[ ∫_0^∞ r e^{−rt} (si + πt (λ1 hi − si)) dt | π, 1 ] + (1 − p0) E[ ∫_0^∞ r e^{−rt} (si + πt (λ0 hi − si)) dt | π, 0 ]

= p0 (λ1 hi − si) E[ ∫_0^∞ r e^{−rt} πt dt | π, 1 ] + (1 − p0) (λ0 hi − si) E[ ∫_0^∞ r e^{−rt} πt dt | π, 0 ] + si

= p0 (λ1 hi − si) w1(π) + (1 − p0) (λ0 hi − si) w0(π) + si.


slide-118
SLIDE 118

Appendix

Characterization of Feasible Set

Given γ = (γ1, γ0) ∈ R², define the supremum score in direction γ and the associated half-space as

K(γ) ≡ sup_{π∈Π} { γ1 w1(π) + γ0 w0(π) },    H(γ) ≡ { v ∈ R² : γ · v ≤ K(γ) }.

Define the intersection of all half-spaces as H ≡ ∩_{γ∈R²} H(γ). Since Γ ⊂ H(γ) for any γ, it follows that Γ ⊂ H. Conversely, the feasible set Γ is convex, since the policy space Π and hence Γ are convexified; a standard separation argument then yields Γ = H. Since the extreme points of H are attained by Markov policies, this completes the proof.
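The half-space step can be illustrated on a finite menu of payoff pairs standing in for the policy space Π. This is a toy sketch of the convex-duality idea only: the points, the function names, and the finite grid of directions approximating γ ∈ R² are all made up.

```python
import math

def score(points, gamma):
    """K(gamma): the supremum score gamma . (w1, w0) over the menu."""
    g1, g0 = gamma
    return max(g1 * w1 + g0 * w0 for (w1, w0) in points)

def in_intersection(v, points, directions, tol=1e-9):
    """Membership in H = intersection of half-spaces H(gamma):
    gamma . v <= K(gamma) for every direction gamma checked."""
    return all(g[0] * v[0] + g[1] * v[1] <= score(points, g) + tol
               for g in directions)

# hypothetical extreme points (from Markov policies, in the slide's language)
points = [(0.0, 0.0), (1.0, 0.2), (0.6, 1.0)]
# finite grid of directions approximating all gamma in R^2
directions = [(math.cos(a), math.sin(a))
              for a in (2 * math.pi * k / 64 for k in range(64))]
```

A convex combination of the extreme points passes every half-space test, while a point outside their hull fails in some direction.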


slide-119
SLIDE 119

Appendix

Formal Definition of Time Consistency

A history of length t on path is ht = ((πs)_{s≤t}, (Ns)_{s≤t}). The set of histories of length t on path is denoted Ht. The set of all histories on path is H = ∪_{t≥0} Ht.

Let F(ht) be the cdf of the agent's belief of state 1 at time t after history ht, with support Θ(ht). A delegation rule C : Θ → Π admits a time-consistent implementation if, for any ht on path,

C(ht) ∈ argmax_{π(·)} ∫_{Θ(ht)} Uρ(θ, π(θ)) dF(ht),

subject to Uα(θ, π(θ)) ≥ Uα(θ, π(θ′)) for all θ, θ′ ∈ Θ(ht).
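For finitely many types, the truth-telling constraints in this definition can be checked mechanically. The snippet below is a toy illustration with made-up payoff numbers, where the dictionary U_alpha[θ][π] plays the role of Uα(θ, π) and the type and policy names are hypothetical.

```python
def is_incentive_compatible(U_alpha, assignment):
    """Check U_alpha(theta, pi(theta)) >= U_alpha(theta, pi(theta'))
    for all type pairs: no type gains by mimicking another type."""
    types = list(assignment)
    return all(U_alpha[t][assignment[t]] >= U_alpha[t][assignment[s]]
               for t in types for s in types)

# hypothetical payoffs: two types, two continuation policies
U_alpha = {"low": {"short": 2.0, "long": 1.0},
           "high": {"short": 0.0, "long": 3.0}}
```

Assigning each type its preferred policy is incentive compatible here, while swapping the assignments gives the low type a profitable deviation.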
