SLIDE 1

Empirical-evidence Equilibria in Stochastic Games

Nicolas Dudebout

SLIDE 2

Outline

  • Stochastic games
  • Empirical-evidence equilibria (EEEs)
  • Open questions in EEEs

SLIDE 3

Stochastic Games

  • Game theory
  • Markov decision processes

SLIDE 4

Game Theory

Decision making  𝑣 : 𝒝 β†’ ℝ  ⟹  π‘βˆ— ∈ arg max_{π‘βˆˆπ’} 𝑣(𝑏)

Game theory  𝑣1 : 𝒝1 Γ— 𝒝2 β†’ ℝ,  𝑣2 : 𝒝1 Γ— 𝒝2 β†’ ℝ

Nash equilibrium
  β€’ 𝑏1βˆ— ∈ arg max_{𝑏1βˆˆπ’1} 𝑣1(𝑏1, 𝑏2βˆ—)
  β€’ 𝑏2βˆ— ∈ arg max_{𝑏2βˆˆπ’2} 𝑣2(𝑏1βˆ—, 𝑏2)

SLIDE 5

Example: Battle of the Sexes

        F       O
F     2, 2    0, 1
O     0, 0    1, 3

Nash equilibria

  β€’ (𝐹, 𝐹)
  β€’ (𝑂, 𝑂)
  β€’ (3/4 𝐹 + 1/4 𝑂, 1/3 𝐹 + 2/3 𝑂)
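
To see these check out against the definition on the previous slide, here is a minimal Python sketch (not part of the talk; the payoff matrices transcribe the table above):

```python
import numpy as np

# Payoff matrices for the game above: rows = player 1's action (F, O),
# columns = player 2's action (F, O).
V1 = np.array([[2, 0], [0, 1]])  # player 1's payoffs
V2 = np.array([[2, 1], [0, 3]])  # player 2's payoffs

def is_nash(p, q, tol=1e-9):
    """Check that mixed strategies p (player 1) and q (player 2)
    are mutual best responses."""
    u1 = V1 @ q          # player 1's payoff for each pure action vs q
    u2 = p @ V2          # player 2's payoff for each pure action vs p
    # Every action played with positive probability must be optimal.
    ok1 = all(u1[i] >= u1.max() - tol for i in range(2) if p[i] > 0)
    ok2 = all(u2[j] >= u2.max() - tol for j in range(2) if q[j] > 0)
    return ok1 and ok2

F, O = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(is_nash(F, F))                                        # (F, F): True
print(is_nash(O, O))                                        # (O, O): True
print(is_nash(np.array([3/4, 1/4]), np.array([1/3, 2/3])))  # mixed: True
```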

SLIDE 6

Markov Decision Process (MDP)

Dynamic  𝑦+ ∼ 𝑔(𝑦, 𝑏)  ⟺  𝑦𝑒+1 ∼ 𝑔(𝑦𝑒, 𝑏𝑒)
Stage cost  𝑣(𝑦, 𝑏)
History  β„Žπ‘’ = (𝑦0, 𝑦1, … , 𝑦𝑒, 𝑏0, 𝑏1, … , 𝑏𝑒)
Strategy  𝜏 : β„‹ β†’ 𝒝
Utility  𝑉(𝜏) = 𝔼𝑔,𝜏[βˆ‘_{𝑒=0}^{∞} πœ€^𝑒 𝑣(𝑦𝑒, 𝑏𝑒)]

Bellman’s equation  π‘‰βˆ—(𝑦) = max_{π‘βˆˆπ’} {𝑣(𝑦, 𝑏) + πœ€ 𝔼𝑔[π‘‰βˆ—(𝑦+) | 𝑦, 𝑏]}

Dynamic programming  use knowledge of 𝑔
Reinforcement learning  learn 𝑔 from repeated interaction

SLIDE 8

Markov Decision Process (MDP)

Dynamic  𝑦+ ∼ 𝑔(𝑦, 𝑏)  ⟺  𝑦𝑒+1 ∼ 𝑔(𝑦𝑒, 𝑏𝑒)
Stage cost  𝑣(𝑦, 𝑏)
History  β„Žπ‘’ = (𝑦0, 𝑦1, … , 𝑦𝑒, 𝑏0, 𝑏1, … , 𝑏𝑒)
Strategy  𝜏 : 𝒴 β†’ 𝒝  (for an MDP an optimal strategy can be taken Markov, a function of the current state only)
Utility  𝑉(𝜏) = 𝔼𝑔,𝜏[βˆ‘_{𝑒=0}^{∞} πœ€^𝑒 𝑣(𝑦𝑒, 𝑏𝑒)]

Bellman’s equation  π‘‰βˆ—(𝑦) = max_{π‘βˆˆπ’} {𝑣(𝑦, 𝑏) + πœ€ 𝔼𝑔[π‘‰βˆ—(𝑦+) | 𝑦, 𝑏]}

Dynamic programming  use knowledge of 𝑔
Reinforcement learning  learn 𝑔 from repeated interaction
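
As an illustration of solving Bellman's equation, a minimal value-iteration sketch in Python; the MDP (kernel 𝑔, stage values 𝑣, discount πœ€) is invented for the example:

```python
import numpy as np

# Toy MDP: 3 states, 2 actions. g[y, b] is a distribution over next states,
# v[y, b] is the stage value (the slides maximize, so treat v as a reward).
g = np.array([[[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]],
              [[0.0, 0.8, 0.2], [0.5, 0.4, 0.1]],
              [[0.1, 0.1, 0.8], [0.3, 0.3, 0.4]]])
v = np.array([[1.0, 0.5], [0.0, 2.0], [0.3, 0.1]])
eps = 0.9  # discount factor (the slides' Ξ΅)

V = np.zeros(3)
for _ in range(1000):
    # Bellman update: V*(y) = max_b { v(y,b) + Ξ΅ E_g[V*(y+) | y, b] }
    Q = v + eps * g @ V          # Q[y, b]
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-10:
        break
    V = V_new

tau = Q.argmax(axis=1)  # optimal Markov strategy Ο„ : 𝒴 β†’ 𝒝
print(V, tau)
```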

SLIDE 9

Imperfect Information (POMDP)

Dynamic π‘₯+ ∼ π‘œ(π‘₯, 𝑏) Signal 𝑑 ∼ πœ‰(π‘₯) History β„Žπ‘’ = (𝑑0, 𝑑1, … , 𝑑𝑒, 𝑏0, 𝑏1, … , 𝑏𝑒) Strategy 𝜏 ∢ β„‹ β†’ 𝒝 Belief β„™π‘œ,πœ‰,𝜏[π‘₯ | β„Ž]

SLIDE 11

Imperfect Information (POMDP)

Dynamic π‘₯+ ∼ π‘œ(π‘₯, 𝑏) Signal 𝑑 ∼ πœ‰(π‘₯) History β„Žπ‘’ = (𝑑0, 𝑑1, … , 𝑑𝑒, 𝑏0, 𝑏1, … , 𝑏𝑒) Strategy 𝜏 ∢ Ξ”(𝒳 ) β†’ 𝒝 Belief β„™π‘œ,πœ‰,𝜏[π‘₯ | β„Ž]

SLIDE 12

Stochastic Games

Dynamic π‘₯+ ∼ π‘œ(π‘₯, 𝑏1, 𝑏2) Signals { 𝑑1 ∼ πœ‰1(π‘₯) 𝑑2 ∼ πœ‰2(π‘₯) Histories { β„Žπ‘’

1 = (𝑑1 0, 𝑑1 1, … , 𝑑𝑒 1, 𝑏1 0, 𝑏1 1, … , 𝑏𝑒 1)

β„Žπ‘’

2 = (𝑑2 0, 𝑑2 1, … , 𝑑𝑒 2, 𝑏2 0, 𝑏2 1, … , 𝑏𝑒 2)

Strategies { 𝜏1 ∢ β„‹1 β†’ 𝒝1 𝜏2 ∢ β„‹2 β†’ 𝒝2 Beliefs { β„™π‘œ,πœ‰1,𝜏1,πœ‰2,𝜏2[π‘₯, β„Ž2 | β„Ž1] β„™π‘œ,πœ‰1,𝜏1,πœ‰2,𝜏2[π‘₯, β„Ž1 | β„Ž2]

SLIDE 14

Existing Approaches

  • (Weakly) belief-free equilibrium
  • Mean-field equilibrium
  • Incomplete theories

SLIDE 15

Empirical-evidence Equilibria

SLIDE 16

Motivation

[Diagram: Agent 1 and Agent 2 interacting through Nature]

  • 0. Pick arbitrary strategies
  • 1. Formulate simple but consistent models
  • 2. Design strategies optimal w.r.t. models, then, back to 1.

Empirical-evidence equilibrium is a fixed point:

  • Strategies optimal w.r.t. models
  • Models consistent with strategies

SLIDE 17

Example: Asset Management

Trading one asset on the stock market.

Model based on

  β€’ information published by the company
  β€’ observed trading activity

The model is very different for each agent.

SLIDE 18

Multiple to Single Agent

[Diagram: Agent 1 and Agent 2 interacting through Nature]

SLIDE 19

Multiple to Single Agent

[Diagram: Agent 2 and Nature merged, from Agent 1's viewpoint, into a single block Nature 1]

SLIDE 20

Single Agent Setup

[Diagram: an agent interacting with Nature]

SLIDE 21

Single Agent Setup

[Diagram: the agent's state evolves as 𝑦+ ∼ 𝑔(𝑦, 𝑏, 𝑑); Nature is still a black box]

SLIDE 22

Single Agent Setup

[Diagram: as above, with Nature now emitting the signal 𝑑]

SLIDE 23

Single Agent Setup

𝑦+ ∼ 𝑔(𝑦, 𝑏, 𝑑) π‘₯+ ∼ π‘œ(π‘₯, 𝑦, 𝑏) 𝑑 ∼ πœ‰(π‘₯) 𝑑

SLIDE 24

Example: Asset Management

𝑦+ ∼ 𝑔(𝑦, 𝑏, 𝑑) π‘₯+ ∼ π‘œ(π‘₯, 𝑦, 𝑏) 𝑑 ∼ πœ‰(π‘₯) 𝑑 State holding 𝑦 ∈ {0 .. 𝑁} Action sell one, hold, or buy one 𝑏 ∈ {βˆ’1, 0, 1} Signal price π‘ž ∈ {Low, High} Stage cost π‘ž β‹… 𝑏 Nature π‘₯ represents market sentiment, political climate,

  • ther traders

SLIDE 25

Single Agent Setup

𝑦+ ∼ 𝑔(𝑦, 𝑏, 𝑑)    𝑏 ∼ 𝜏(β„Ž)    π‘₯+ ∼ π‘œ(π‘₯, 𝑦, 𝑏)    𝑑 ∼ πœ‰(π‘₯)

Model  𝑑̂
  β€’ 𝑑̂ consistent with 𝜏
  β€’ 𝜏 optimal w.r.t. 𝑑̂

SLIDE 30

Single Agent Setup

𝑦+ ∼ 𝑔(𝑦, 𝑏, 𝑑̂)    𝑏 ∼ 𝜏(β„Ž)    π‘₯+ ∼ π‘œ(π‘₯, 𝑦, 𝑏)    𝑑 ∼ πœ‰(π‘₯)

Model  𝑑̂
  β€’ 𝑑̂ consistent with 𝜏
  β€’ 𝜏 optimal w.r.t. 𝑑̂

SLIDE 32

Depth-𝑙 Consistency

Consider a binary stochastic process 𝑑:

0100010001001010010110111010000111010101...

  β€’ 0-characteristic: β„™[𝑑 = 0], β„™[𝑑 = 1]
  β€’ 1-characteristic: β„™[𝑑𝑑+ = 00], β„™[𝑑𝑑+ = 01], β„™[𝑑𝑑+ = 10], β„™[𝑑𝑑+ = 11]
  β€’ ...
  β€’ 𝑙-characteristic: probabilities of strings of length 𝑙 + 1

Definition  Two processes 𝑑 and 𝑑′ are depth-𝑙 consistent if they have the same 𝑙-characteristic.
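
A short Python sketch (illustrative, not from the talk) that estimates the 𝑙-characteristic of a binary string as the empirical distribution of its length-(𝑙 + 1) windows, and tests depth-𝑙 consistency of two strings up to a tolerance:

```python
from collections import Counter

def l_characteristic(seq: str, l: int) -> dict:
    """Empirical distribution of substrings of length l + 1."""
    windows = [seq[i:i + l + 1] for i in range(len(seq) - l)]
    counts = Counter(windows)
    total = len(windows)
    return {w: c / total for w, c in counts.items()}

def depth_l_consistent(seq1: str, seq2: str, l: int, tol: float = 0.05) -> bool:
    """Do the two sequences have (approximately) the same l-characteristic?"""
    c1, c2 = l_characteristic(seq1, l), l_characteristic(seq2, l)
    keys = set(c1) | set(c2)
    return all(abs(c1.get(k, 0.0) - c2.get(k, 0.0)) <= tol for k in keys)

d = "0100010001001010010110111010000111010101"
print(l_characteristic(d, 0))   # 0-characteristic: P[d = 0], P[d = 1]
print(l_characteristic(d, 1))   # 1-characteristic: P[dd+ = 00], ..., P[dd+ = 11]
print(depth_l_consistent(d, d[::-1], 1))
```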

SLIDE 35

Depth-𝑙 Consistency: Example

1 π‘¨βˆ…

0.5 0.5

𝑨0 𝑨1 1

1 1 0.3 0.7 0.3 0.7

SLIDE 36

Complete picture

Fix a depth 𝑙 ∈ β„•.

𝑦+ ∼ 𝑔(𝑦, 𝑏, 𝑑)    𝑏 ∼ 𝜏(β„Ž)    π‘₯+ ∼ π‘œ(π‘₯, 𝑦, 𝑏)    𝑑 ∼ πœ‰(π‘₯)

Model  𝑑̂
  β€’ 𝜏 ↦ 𝜈 consistent with 𝜏
  β€’ 𝜈 ↦ 𝜏 optimal w.r.t. 𝜈

𝑨 contains the last 𝑙 observed signals:
𝜈(𝑨 = (𝑑1, 𝑑2, … , 𝑑𝑙))[𝑑𝑙+1] = β„™πœ[𝑑𝑒+1 = 𝑑𝑙+1 | 𝑑𝑒 = 𝑑𝑙, … , π‘‘π‘’βˆ’π‘™+1 = 𝑑1]

SLIDE 37

Complete picture

Fix a depth 𝑙 ∈ β„•.

𝑦+ ∼ 𝑔(𝑦, 𝑏, 𝑑)    𝑏 ∼ 𝜏(β„Ž)    π‘₯+ ∼ π‘œ(π‘₯, 𝑦, 𝑏)    𝑑 ∼ πœ‰(π‘₯)
𝑨+ ∼ 𝑛𝑙(𝑨)    𝑑̂ ∼ 𝜈(𝑨)

Model  𝑑̂
  β€’ 𝜏 ↦ 𝜈 consistent with 𝜏
  β€’ 𝜈 ↦ 𝜏 optimal w.r.t. 𝜈

𝑨 contains the last 𝑙 observed signals:
𝜈(𝑨 = (𝑑1, 𝑑2, … , 𝑑𝑙))[𝑑𝑙+1] = β„™πœ[𝑑𝑒+1 = 𝑑𝑙+1 | 𝑑𝑒 = 𝑑𝑙, … , π‘‘π‘’βˆ’π‘™+1 = 𝑑1]

SLIDE 38

Complete picture

Fix a depth 𝑙 ∈ β„•.

𝑦+ ∼ 𝑔(𝑦, 𝑏, 𝑑)    𝑏 ∼ 𝜏(𝑦, 𝑨)    π‘₯+ ∼ π‘œ(π‘₯, 𝑦, 𝑏)    𝑑 ∼ πœ‰(π‘₯)
𝑨+ ∼ 𝑛𝑙(𝑨)    𝑑̂ ∼ 𝜈(𝑨)

Model  𝑑̂
  β€’ 𝜏 ↦ 𝜈 consistent with 𝜏
  β€’ 𝜈 ↦ 𝜏 optimal w.r.t. 𝜈

𝑨 contains the last 𝑙 observed signals:
𝜈(𝑨 = (𝑑1, 𝑑2, … , 𝑑𝑙))[𝑑𝑙+1] = β„™πœ[𝑑𝑒+1 = 𝑑𝑙+1 | 𝑑𝑒 = 𝑑𝑙, … , π‘‘π‘’βˆ’π‘™+1 = 𝑑1]
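
One way to read the consistency map 𝜏 ↦ 𝜈: simulate the closed loop under a fixed 𝜏 and tabulate, for each window 𝑨 of the last 𝑙 signals, the empirical distribution of the next signal. A hedged Python sketch, with Nature and the strategy as invented stand-ins:

```python
import random
from collections import Counter, defaultdict

def consistent_model(step, strategy, l, horizon=100_000):
    """Estimate nu(A)[d+] by simulating the loop under a fixed strategy.

    step(state, b) -> (state, d) is a stand-in for Nature (o and xi);
    strategy(A) -> b is a stand-in for tau restricted to the window A.
    """
    counts = defaultdict(Counter)
    state, window = 0, (0,) * l          # A starts as l dummy signals
    for _ in range(horizon):
        b = strategy(window)
        state, d = step(state, b)
        counts[window][d] += 1
        window = window[1:] + (d,)       # A+ = n_l(A): shift in the new signal
    return {A: {d: c / sum(ctr.values()) for d, c in ctr.items()}
            for A, ctr in counts.items()}

# Stand-in Nature: a sticky two-state chain whose signal is its state.
def step(state, b):
    state = state if random.random() < 0.8 else 1 - state
    return state, state

nu = consistent_model(step, strategy=lambda A: 0, l=1)
print(nu)   # e.g. nu[(0,)] is approximately {0: 0.8, 1: 0.2}
```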

SLIDE 40

Definition

(𝜏, 𝜈) is an empirical-evidence optimum (EEO) for 𝑙 iff

  β€’ 𝜏 is optimal w.r.t. 𝜈
  β€’ 𝜈 is depth-𝑙 consistent with 𝜏

(𝜏, 𝜈) is an πœ—-empirical-evidence optimum (πœ—-EEO) for 𝑙 iff

  β€’ 𝜏 is πœ—-optimal w.r.t. 𝜈
  β€’ 𝜈 is depth-𝑙 consistent with 𝜏

SLIDE 42

Existence Result

Theorem  For all 𝑙 and πœ—, there exists an πœ—-EEO for 𝑙.

Proof sketch
  β€’ Prove continuity of the composition 𝜏 ↦ 𝜈 ↦ 𝜏
  β€’ 𝜏 : 𝒴 Γ— 𝒢 β†’ Ξ”(𝒝), so 𝜏 is parametrized over a simplex (convex and compact)
  β€’ Apply Brouwer’s fixed-point theorem

SLIDE 44

Complete picture

Fix a depth 𝑙 ∈ β„•.

𝑦+ ∼ 𝑔(𝑦, 𝑏, 𝑑)    𝑏 ∼ 𝜏(𝑦, 𝑨)    π‘₯+ ∼ π‘œ(π‘₯, 𝑦, 𝑏)    𝑑 ∼ πœ‰(π‘₯)
𝑨+ ∼ 𝑛𝑙(𝑨)    𝑑̂ ∼ 𝜈(𝑨)

Model  𝑑̂
  β€’ 𝜏 ↦ 𝜈 consistent with 𝜏
  β€’ 𝜈 ↦ 𝜏 optimal w.r.t. 𝜈

𝑨 contains the last 𝑙 observed signals:
𝜈(𝑨 = (𝑑1, 𝑑2, … , 𝑑𝑙))[𝑑𝑙+1] = β„™πœ[𝑑𝑒+1 = 𝑑𝑙+1 | 𝑑𝑒 = 𝑑𝑙, … , π‘‘π‘’βˆ’π‘™+1 = 𝑑1]

SLIDE 45

Multiagent Setting

𝑦𝑗+ ∼ 𝑔𝑗(𝑦𝑗, 𝑏𝑗, 𝑑𝑗)    𝑏𝑗 ∼ πœπ‘—(𝑦𝑗, 𝑨𝑗)    π‘₯+ ∼ π‘œ(π‘₯, 𝑦, 𝑏)    𝑑 ∼ πœ‰(π‘₯)
𝑨𝑗+ ∼ 𝑛𝑗,𝑙(𝑨𝑗)    𝑑̂𝑗 ∼ πœˆπ‘—(𝑨𝑗)

𝑦 = (𝑦1, 𝑦2, … , 𝑦𝑂)    𝑏 = (𝑏1, 𝑏2, … , 𝑏𝑂)    𝑑 = (𝑑1, 𝑑2, … , 𝑑𝑂)

SLIDE 46

Empirical-evidence Equilibrium

(𝜏, 𝜈) is an empirical-evidence equilibrium (EEE) for 𝐿 = (𝑙1, 𝑙2, … , 𝑙𝑂) iff

  β€’ for all 𝑗, πœπ‘— is optimal w.r.t. πœˆπ‘—
  β€’ for all 𝑗, πœˆπ‘— is depth-𝑙𝑗 consistent with 𝜏

Theorem  For all 𝐿 and πœ—, there exists an πœ—-EEE for 𝐿.

SLIDE 48

Open Questions

  • endogenous model depending on action
  • large number of agents
  • large 𝑙
  • relating EEE to other concepts (MFE, optimum)
  • offline computation
  • online learning using empirical evidence

SLIDE 52

Example: Asset Management

State  holdings 𝑦𝑗 ∈ {0 .. 𝑁}
Action  sell one, hold, or buy one: 𝑏𝑗 ∈ {βˆ’1, 0, 1}
Signal  price π‘ž ∈ {Low, High}
Dynamic  𝑦𝑗+ = 𝑦𝑗 + 𝑏𝑗
Stage cost  π‘ž β‹… 𝑏𝑗
Nature  market trend 𝑐 ∈ {Bull, Bear}, π‘₯ = (𝑐, π‘ž); Nature is a sticky bear

SLIDE 53

Example: Asset Management

  • 0. Pick arbitrary models 𝜈
  • 1. Design strategies 𝜏 optimal w.r.t. models 𝜈
  • 2. Formulate consistent models 𝜈upd, then, back to 1.

Depth-0 consistency:

  • 𝜈1 = 1
  • 𝜈2 = 0

πœˆπ‘’+1

𝑗

= (1 βˆ’ 𝛽)πœˆπ‘’

𝑗 + 𝛽(πœˆπ‘’ 𝑗,upd βˆ’ πœˆπ‘’ 𝑗)

SLIDE 54

Learning Results: Offline

[Plot: prediction πœˆπ‘—^𝑒[High] versus time 𝑒 (0 to 100) for agents 𝑗 = 1 and 𝑗 = 2]

SLIDE 55

Learning Results: Online

[Plot: prediction πœˆπ‘—^𝑒[High] versus time 𝑒 (0 to 100) for agents 𝑗 = 1 and 𝑗 = 2]

SLIDE 56

Empirical-evidence Equilibria

  • Introduce
  • Contrast
  • Compute
