Empirical-evidence Equilibria in Stochastic Games Nicolas Dudebout - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Empirical-evidence Equilibria in Stochastic Games

Nicolas Dudebout

Georgia Institute of Technology

slide-2
SLIDE 2

Empirical-evidence Equilibria (EEEs)

[Diagram: Agent 1, Nature, Agent 2]

At a Nash equilibrium in a stochastic game, each agent is playing an optimal strategy for a POMDP.

EEE approach:

  • 0. Pick arbitrary strategies
  • 1. Formulate simple but consistent models
  • 2. Design strategies optimal w.r.t. the models, then go back to 1.

The fixed points are EEEs

Example

Asset management on the stock market
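
The three steps of the EEE approach form a fixed-point iteration, which can be sketched as below. This is a minimal sketch, not the author's implementation: `find_fixed_point`, `fit_model`, and `best_response` are hypothetical names, the toy stand-ins at the bottom are invented for illustration, and nothing here guarantees that the loop converges.

```python
from typing import Callable, Optional, Tuple, TypeVar

S = TypeVar("S")  # strategy profile
M = TypeVar("M")  # model profile


def find_fixed_point(
    initial_strategy: S,
    fit_model: Callable[[S], M],
    best_response: Callable[[M], S],
    max_iters: int = 100,
) -> Optional[Tuple[S, M]]:
    """Alternate steps 1 and 2 until the strategy stops changing."""
    strategy = initial_strategy              # step 0: arbitrary start
    for _ in range(max_iters):
        model = fit_model(strategy)          # step 1: model consistent with strategy
        new_strategy = best_response(model)  # step 2: strategy optimal w.r.t. model
        if new_strategy == strategy:         # fixed point reached
            return strategy, model
        strategy = new_strategy
    return None                              # no fixed point within the budget


# Toy stand-ins (hypothetical): strategies and models are small integers.
result = find_fixed_point(
    initial_strategy=0,
    fit_model=lambda s: s,
    best_response=lambda m: min(m + 1, 3),
)
```

In a real instance, `fit_model` would estimate a model from signals generated under the current strategies, and `best_response` would solve each agent's decision problem against its model.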

slide-5
SLIDE 5

Empirical-evidence Equilibria (EEEs)

[Diagram: Agent 1, Nature, Agent 2, with the additional label Nature 1]

At a Nash equilibrium in a stochastic game, each agent is playing an optimal strategy for a POMDP.

EEE approach:

  • 0. Pick arbitrary strategies
  • 1. Formulate simple but consistent models
  • 2. Design strategies optimal w.r.t. the models, then go back to 1.

The fixed points are EEEs

Example

Asset management on the stock market

slide-6
SLIDE 6

Single-agent Setup

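[Diagram: Agent, Nature]

Model: $z^+ \sim m(z)$, $\hat{s}^+ \sim \mu(z)$

Signals: $\hat{s}$ (predicted)

$$\max_{\sigma} \; \mathbb{E}_{\sigma}\left[\sum_{t=0}^{\infty} \delta^t u(x_t, a_t, s_t)\right]$$

  • $\mu$ consistent with $\sigma$
  • $\sigma$ optimal w.r.t. $\mu$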

slide-7
SLIDE 7

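Single-agent Setup

Agent: $x^+ \sim f(x, a, s)$, $a \sim \sigma(h)$

[Diagram box: Nature]

Model: $z^+ \sim m(z)$, $\hat{s}^+ \sim \mu(z)$

Signals: $s$ (observed), $\hat{s}$ (predicted)

$$\max_{\sigma} \; \mathbb{E}_{\sigma}\left[\sum_{t=0}^{\infty} \delta^t u(x_t, a_t, s_t)\right]$$

  • $\mu$ consistent with $\sigma$
  • $\sigma$ optimal w.r.t. $\mu$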

slide-8
SLIDE 8

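Single-agent Setup

Agent: $x^+ \sim f(x, a, s)$, $a \sim \sigma(h)$

Nature: $w^+ \sim n(w, x, a)$, $s \sim \nu(w)$

Model: $z^+ \sim m(z)$, $\hat{s}^+ \sim \mu(z)$

Signals: $s$ (observed), $\hat{s}$ (predicted)

$$\max_{\sigma} \; \mathbb{E}_{\sigma}\left[\sum_{t=0}^{\infty} \delta^t u(x_t, a_t, s_t)\right]$$

  • $\mu$ consistent with $\sigma$
  • $\sigma$ optimal w.r.t. $\mu$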

slide-9
SLIDE 9

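Single-agent Setup

Agent: $x^+ \sim f(x, a, s)$, $a \sim \sigma(h)$

Nature: $w^+ \sim n(w, x, a)$, $s \sim \nu(w)$

[Diagram box: Model]

Signals: $s$ (observed), $\hat{s}$ (predicted)

$$\max_{\sigma} \; \mathbb{E}_{\sigma}\left[\sum_{t=0}^{\infty} \delta^t u(x_t, a_t, s_t)\right]$$

  • $\mu$ consistent with $\sigma$
  • $\sigma$ optimal w.r.t. $\mu$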

slide-10
SLIDE 10

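Single-agent Setup

Agent: $x^+ \sim f(x, a, s)$, $a \sim \sigma(x, z)$

Nature: $w^+ \sim n(w, x, a)$, $s \sim \nu(w)$

Model: $z^+ \sim m(z)$, $\hat{s}^+ \sim \mu(z)$

Signals: $s$ (observed), $\hat{s}$ (predicted)

$$\max_{\sigma} \; \mathbb{E}_{\sigma}\left[\sum_{t=0}^{\infty} \delta^t u(x_t, a_t, s_t)\right]$$

  • $\mu$ consistent with $\sigma$
  • $\sigma$ optimal w.r.t. $\mu$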

slide-12
SLIDE 12

Depth-$k$ Consistency

Binary stochastic process $s$: 0100010001001010010110111010000111010101...

  • 0 characteristic: $\mathbb{P}[s = 0]$, $\mathbb{P}[s = 1]$
  • 1 characteristic: $\mathbb{P}[s s^+ = 00]$, $\mathbb{P}[s s^+ = 10]$, $\mathbb{P}[s s^+ = 01]$, $\mathbb{P}[s s^+ = 11]$
  • ...
  • $k$ characteristic: probability of strings of length $k + 1$

Definition

Two processes $s$ and $\hat{s}$ are depth-$k$ consistent if they have the same $k$ characteristic
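
To make the characteristic concrete, the sketch below estimates it from one observed binary string by counting sliding windows. It is an illustration under assumptions: the function names `characteristic` and `same_characteristic` are hypothetical, and window frequencies from a single finite sample only approximate the probabilities in the definition.

```python
from collections import Counter
from itertools import product
from typing import Dict


def characteristic(bits: str, depth: int) -> Dict[str, float]:
    """Empirical frequency of every binary string of length depth + 1."""
    width = depth + 1
    n_windows = len(bits) - width + 1
    if n_windows <= 0:
        raise ValueError("sequence is shorter than depth + 1")
    counts = Counter(bits[i:i + width] for i in range(n_windows))
    # Report every possible string, including those never observed.
    return {
        "".join(w): counts["".join(w)] / n_windows
        for w in product("01", repeat=width)
    }


def same_characteristic(p: Dict[str, float], q: Dict[str, float],
                        tol: float = 1e-9) -> bool:
    """Check whether two characteristics of the same depth agree within tol."""
    return p.keys() == q.keys() and all(abs(p[w] - q[w]) <= tol for w in p)


sample = "0100010001001010010110111010000111010101"
depth0 = characteristic(sample, 0)   # frequencies of "0" and "1"
depth1 = characteristic(sample, 1)   # frequencies of "00", "01", "10", "11"
```

Two sampled processes would then be compared by calling `same_characteristic` on their estimates at the chosen depth, with a tolerance suited to the sample size.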

slide-13
SLIDE 13

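Complete Picture

Agent: $x^+ \sim f(x, a, s)$, $a \sim \sigma(x, z)$

Nature: $w^+ \sim n(w, x, a)$, $s \sim \nu(w)$

Model: $z^+ \sim m_k(z)$, $\hat{s} \sim \mu(z)$

Signals: $s$ (observed), $\hat{s}$ (predicted)

Fix a depth $k \in \mathbb{N}$; $z$ contains the last $k$ observed signals.

$$\mu(z = (s_1, s_2, \dots, s_k))[s_{k+1}] = \mathbb{P}_{\sigma}[s_{t+1} = s_{k+1} \mid s_t = s_k, \dots, s_{t-k+1} = s_1]$$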
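
The model defined on this slide is a conditional distribution of the next signal given the most recent signals, so it can be estimated from an observed signal sequence by counting. The sketch below is a minimal illustration; `fit_depth_model` is a hypothetical name, and the estimate is only as good as the sample it is built from.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple

Memory = Tuple[Hashable, ...]


def fit_depth_model(signals: Sequence[Hashable],
                    depth: int) -> Dict[Memory, Dict[Hashable, float]]:
    """Estimate P[next signal | last `depth` signals] by counting transitions."""
    counts = defaultdict(Counter)
    for t in range(depth, len(signals)):
        memory = tuple(signals[t - depth:t])  # the model state: last `depth` signals
        counts[memory][signals[t]] += 1       # signal observed right after that memory
    return {
        memory: {s: c / sum(nxt.values()) for s, c in nxt.items()}
        for memory, nxt in counts.items()
    }


observed = [0, 1, 0, 1, 0, 1]
model0 = fit_depth_model(observed, 0)  # one entry, keyed by the empty memory ()
model1 = fit_depth_model(observed, 1)  # one entry per memory seen in the data
```

With depth 0 the memory is empty and the model reduces to the marginal signal frequencies; memories that never occur in the sample get no entry.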

slide-14
SLIDE 14

Empirical-evidence Optimality

Definition

$(\sigma, \mu)$ is an empirical-evidence optimum (EEO) for $k$ iff

  • $\sigma$ is optimal w.r.t. $\mu$
  • $\mu$ is depth-$k$ consistent with $\sigma$

Definition

$(\sigma, \mu)$ is an $\epsilon$ empirical-evidence optimum ($\epsilon$ EEO) for $k$ iff

  • $\sigma$ is $\epsilon$ optimal w.r.t. $\mu$
  • $\mu$ is depth-$k$ consistent with $\sigma$

slide-16
SLIDE 16

Existence Result

Theorem

For all $k$ and $\epsilon$, there exists an $\epsilon$ EEO for $k$

Proof sketch

  • Technical assumption ensures ergodicity of $s$
  • $T \colon \sigma \xrightarrow{\text{consistency}} \mu \xrightarrow{\epsilon\ \text{optimality}} \sigma$ is continuous
  • $\sigma \colon \mathcal{X} \times \mathcal{Z} \to \Delta(\mathcal{A})$ is parametrized over a simplex
  • Apply Brouwer's fixed point theorem to $T$
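
The proof sketch can be unpacked one step further. The following is a hedged restatement, writing σ for the strategy, μ for the model, ε for the optimality tolerance, and T for the map in the sketch; the names $C$ and $O_\epsilon$ for the two stages of that map are labels introduced here for illustration and do not appear on the slides.

```latex
\begin{align*}
C &\colon \sigma \mapsto \mu
  && \text{(the depth-$k$ model consistent with $\sigma$)} \\
O_\epsilon &\colon \mu \mapsto \sigma'
  && \text{(an $\epsilon$ optimal strategy w.r.t. $\mu$)} \\
T &= O_\epsilon \circ C \\
\sigma^\ast = T(\sigma^\ast),\ \mu^\ast = C(\sigma^\ast)
  &\implies (\sigma^\ast, \mu^\ast) \text{ is an $\epsilon$ EEO for $k$}
\end{align*}
```

Brouwer's theorem supplies the fixed point because a continuous map from a nonempty compact convex set into itself always has one; here that set is the set of strategies, parametrized over a simplex.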

slide-18
SLIDE 18

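Multiagent Setup

Agent 1: $x_1^+ \sim f_1(x_1, a_1, s_1)$, $a_1 \sim \sigma_1(x_1, z_1)$

Agent 2: $x_2^+ \sim f_2(x_2, a_2, s_2)$, $a_2 \sim \sigma_2(x_2, z_2)$

Nature: $w^+ \sim n(w, x_1, a_1, x_2, a_2)$, $(s_1, s_2) \sim \nu(w)$

Model 1: $z_1^+ \sim m_{1,k_1}(z_1)$, $\hat{s}_1 \sim \mu_1(z_1)$

Model 2: $z_2^+ \sim m_{2,k_2}(z_2)$, $\hat{s}_2 \sim \mu_2(z_2)$

Signals: $s_1$, $s_2$ (observed), $\hat{s}_1$, $\hat{s}_2$ (predicted)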

slide-19
SLIDE 19

Empirical-evidence Equilibrium

Strategies $\sigma = (\sigma_1, \sigma_2, \dots, \sigma_N)$

Models $\mu = (\mu_1, \mu_2, \dots, \mu_N)$

Depths $k = (k_1, k_2, \dots, k_N)$

Definition

$(\sigma, \mu)$ is an empirical-evidence equilibrium (EEE) for $k$ iff

  • for all $i$, $\sigma_i$ is optimal w.r.t. $\mu_i$
  • for all $i$, $\mu_i$ is depth-$k_i$ consistent with $\sigma$

Theorem

For all $k$ and $\epsilon$, there exists an $\epsilon$ EEE for $k$

slide-20
SLIDE 20

Learning Setup

State: holdings $x_i \in \{0, \dots, M\}$

Action: sell one, hold, or buy one, $a_i \in \{-1, 0, 1\}$

Signal: price $p \in \{\text{Low}, \text{High}\}$

Dynamic: $x_i^+ = x_i + a_i$

Stage cost: $p \cdot a_i$

Nature: market trend $b \in \{\text{Bull}, \text{Bear}\}$, $w = (b, p)$

  • 0. Pick arbitrary depth-0 models $\mu$
  • 1. Design strategies $\sigma$ optimal w.r.t. models $\mu$
  • 2. Formulate consistent models $\mu_{\text{upd}}$, then go back to 1.

$$\mu_i^{t+1} = (1 - \alpha)\mu_i^t + \alpha(\mu_{i,\text{upd}}^t - \mu_i^t)$$
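
The model update in step 2 can be sketched as a damped step from the current depth-0 model toward the newly fitted one. This is a minimal sketch under a stated assumption: it uses the plain convex combination of the two models, which keeps the prediction a valid probability distribution and is not necessarily the exact rule written on the slide. The name `update_model`, the step size, and the sample numbers are hypothetical.

```python
from typing import Dict


def update_model(mu: Dict[str, float], mu_upd: Dict[str, float],
                 alpha: float) -> Dict[str, float]:
    """Move a depth-0 price model a fraction alpha of the way toward mu_upd.

    Assumed rule (convex combination): (1 - alpha) * mu + alpha * mu_upd.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {s: (1 - alpha) * mu[s] + alpha * mu_upd[s] for s in mu}


# Hypothetical numbers: current prediction and a freshly fitted model.
mu = {"Low": 0.5, "High": 0.5}
mu_upd = {"Low": 0.2, "High": 0.8}
for _ in range(3):
    mu = update_model(mu, mu_upd, alpha=0.5)
```

In the learning loop itself the fitted model would change at every iteration, because the strategies are redesigned against the current model before new signals are observed; it is held fixed here only to keep the example short.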

slide-21
SLIDE 21

Learning Results: Offline

[Plot: prediction $\mu_i^t[\text{High}]$ against time $t$; vertical axis ticks 0.2 to 1, horizontal axis ticks 20 to 100; one curve each for $i = 1$ and $i = 2$]

slide-22
SLIDE 22

Learning Results: Online

[Plot: prediction $\mu_i^t[\text{High}]$ against time $t$; vertical axis ticks 0.2 to 1, horizontal axis ticks 20 to 100; one curve each for $i = 1$ and $i = 2$]

slide-23
SLIDE 23

Concluding Remarks

Comparison with mean-field equilibria

  • Identical agents with a specific signal
  • Depth-0 model
  • Large number of agents to recover Nash equilibrium

Future directions

  • Endogenous model ($z^+ \sim m(z, x, a)$)
  • Quality of EEEs
  • Learning EEEs
