Empirical-evidence Equilibria in Stochastic Games
Nicolas Dudebout
Georgia Institute of Technology
Empirical-evidence Equilibria (EEEs)
[Diagram: Agent 1, Nature, Agent 2]
At a Nash equilibrium in a stochastic game, each agent plays an optimal strategy for a POMDP.
EEE approach:
- 0. Pick arbitrary strategies
- 1. Formulate simple but consistent models
- 2. Design strategies optimal w.r.t. models, then, back to 1.
The fixed points are EEEs
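The alternation in steps 0 to 2 can be sketched as a plain fixed-point loop. This is only an illustration: the helper names `fit_model` and `best_response`, the toy strategies, and the toy probabilities are all hypothetical and do not come from the talk, and such a loop is not guaranteed to converge in general.

```python
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")  # strategy type (hypothetical placeholder)
M = TypeVar("M")  # model type (hypothetical placeholder)


def iterate_to_fixed_point(
    strategy: S,
    fit_model: Callable[[S], M],
    best_response: Callable[[M], S],
    max_iters: int = 100,
) -> Tuple[S, M, bool]:
    """Alternate model fitting and strategy design until the strategy repeats."""
    for _ in range(max_iters):
        model = fit_model(strategy)          # step 1: model consistent with current play
        new_strategy = best_response(model)  # step 2: strategy optimal w.r.t. the model
        if new_strategy == strategy:
            return strategy, model, True     # fixed point reached
        strategy = new_strategy
    return strategy, fit_model(strategy), False  # no fixed point found in max_iters


# Toy stand-ins (assumptions, for illustration only): the "model" is the
# probability of a High price induced by the strategy, and the best response
# is a simple threshold rule on that probability.
def toy_fit_model(strategy: str) -> float:
    return 0.8 if strategy == "buy" else 0.3


def toy_best_response(model: float) -> str:
    return "buy" if model > 0.5 else "hold"


result = iterate_to_fixed_point("sell", toy_fit_model, toy_best_response)
```

In this toy instance both ("hold", 0.3) and ("buy", 0.8) are fixed points, which illustrates that the outcome can depend on the arbitrary starting strategy.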
Example
Asset management on the stock market
Single-agent Setup
Agent: $x^+ \sim f(x, a, s)$, $a \sim \sigma(h)$, later restricted to $a \sim \sigma(x, z)$
Nature: $w^+ \sim n(w, x, a)$, $s \sim \nu(w)$, emitting the signal $s$
Model: $z^+ \sim m(z)$, $\hat{s}^+ \sim \mu(z)$, emitting the predicted signal $\hat{s}$
$$\max_{\sigma} \; \mathbb{E}_{\sigma}\left[\sum_{t=0}^{\infty} \delta^t u(x_t, a_t, s_t)\right]$$
- $\mu$ consistent with $\sigma$
- $\sigma$ optimal w.r.t. $\mu$
Depth-$k$ Consistency
Binary stochastic process $s$: 0100010001001010010110111010000111010101...
- 0 characteristic: $\mathbb{P}[s = 0]$, $\mathbb{P}[s = 1]$
- 1 characteristic: $\mathbb{P}[s s^+ = 00]$, $\mathbb{P}[s s^+ = 10]$, $\mathbb{P}[s s^+ = 01]$, $\mathbb{P}[s s^+ = 11]$
- ...
- $k$ characteristic: probability of strings of length $k + 1$
Definition
Two processes $s$ and $\hat{s}$ are depth-$k$ consistent if they have the same $k$ characteristic
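A minimal sketch of the $k$ characteristic, assuming it is estimated from a finite sample by counting sliding windows of length $k + 1$. The definition above concerns the probabilities of the process itself, so empirical frequencies are only an approximation; the function names and the tolerance parameter are hypothetical.

```python
from collections import Counter
from typing import Dict


def characteristic(signals: str, k: int) -> Dict[str, float]:
    """Empirical frequency of every length-(k + 1) window in `signals`."""
    if len(signals) <= k:
        raise ValueError("need more than k signals")
    windows = [signals[i:i + k + 1] for i in range(len(signals) - k)]
    counts = Counter(windows)
    return {w: c / len(windows) for w, c in counts.items()}


def depth_k_consistent(s: str, s_hat: str, k: int, tol: float = 1e-9) -> bool:
    """True if both samples share the same empirical k characteristic (up to tol)."""
    cs, ch = characteristic(s, k), characteristic(s_hat, k)
    return all(abs(cs.get(w, 0.0) - ch.get(w, 0.0)) <= tol for w in set(cs) | set(ch))


# "0101" and "0011" have the same symbol frequencies but different pair frequencies.
same_depth_0 = depth_k_consistent("0101", "0011", k=0)
same_depth_1 = depth_k_consistent("0101", "0011", k=1)
```

The two short samples agree at depth 0 but not at depth 1, which is the distinction the definition draws between successive depths.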
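Complete Picture
Agent: $x^+ \sim f(x, a, s)$, $a \sim \sigma(x, z)$
Nature: $w^+ \sim n(w, x, a)$, $s \sim \nu(w)$, emitting the signal $s$
Model: $z^+ \sim m_k(z)$, $\hat{s} \sim \mu(z)$, emitting the predicted signal $\hat{s}$
Fix a depth $k \in \mathbb{N}$; $z$ contains the last $k$ observed signals
$$\mu(z = (s_1, s_2, \dots, s_k))[s_{k+1}] = \mathbb{P}_{\sigma}[s_{t+1} = s_{k+1} \mid s_t = s_k, \dots, s_{t-k+1} = s_1]$$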
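The depth-$k$ predictor can be approximated from data by counting which signal follows each window of the last $k$ signals. The sketch below assumes a single observed sample path stored as a string; the function name and the dictionary layout are hypothetical, and the counts estimate the conditional probabilities rather than compute them exactly.

```python
from collections import Counter, defaultdict
from typing import Dict, Tuple


def fit_depth_k_model(signals: str, k: int) -> Dict[Tuple[str, ...], Dict[str, float]]:
    """Estimate mu(z)[next signal] by counting what follows each length-k window z."""
    counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for t in range(k, len(signals)):
        z = tuple(signals[t - k:t])  # model state: the last k observed signals
        counts[z][signals[t]] += 1   # the signal observed right after z
    return {
        z: {s: c / sum(ctr.values()) for s, c in ctr.items()}
        for z, ctr in counts.items()
    }


depth_0 = fit_depth_k_model("0101", k=0)  # a single state, the empty window
depth_1 = fit_depth_k_model("0101", k=1)  # one state per previous signal
```

With $k = 0$ the model has a single state and reduces to the marginal signal frequencies; larger $k$ trades a bigger state space for a finer match to the observed process.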
Empirical-evidence Optimality
Definition
$(\sigma, \mu)$ is an empirical-evidence optimum (EEO) for $k$ iff
- $\sigma$ is optimal w.r.t. $\mu$
- $\mu$ is depth-$k$ consistent with $\sigma$
Definition
$(\sigma, \mu)$ is an $\varepsilon$ empirical-evidence optimum ($\varepsilon$ EEO) for $k$ iff
- $\sigma$ is $\varepsilon$ optimal w.r.t. $\mu$
- $\mu$ is depth-$k$ consistent with $\sigma$
Existence Result
Theorem
For all $k$ and $\varepsilon$, there exists an $\varepsilon$ EEO for $k$
Proof sketch
- A technical assumption ensures ergodicity of $s$
- The composite map $\sigma \xrightarrow{\text{consistency}} \mu \xrightarrow{\text{optimality}} \sigma$ is continuous
- $\sigma : \mathcal{X} \times \mathcal{Z} \to \Delta(\mathcal{A})$ is parametrized over a simplex
- Apply Brouwer's fixed-point theorem to this composite map
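Multiagent Setup
Agent 1: $x_1^+ \sim f_1(x_1, a_1, s_1)$, $a_1 \sim \sigma_1(x_1, z_1)$
Agent 2: $x_2^+ \sim f_2(x_2, a_2, s_2)$, $a_2 \sim \sigma_2(x_2, z_2)$
Nature: $w^+ \sim n(w, x_1, a_1, x_2, a_2)$, $(s_1, s_2) \sim \nu(w)$, emitting the signals $s_1$ and $s_2$
Model 1: $z_1^+ \sim m_{1,k_1}(z_1)$, $\hat{s}_1 \sim \mu_1(z_1)$, emitting the predicted signal $\hat{s}_1$
Model 2: $z_2^+ \sim m_{2,k_2}(z_2)$, $\hat{s}_2 \sim \mu_2(z_2)$, emitting the predicted signal $\hat{s}_2$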
Empirical-evidence Equilibrium
Strategies $\sigma = (\sigma_1, \sigma_2, \dots, \sigma_N)$
Models $\mu = (\mu_1, \mu_2, \dots, \mu_N)$
Depths $k = (k_1, k_2, \dots, k_N)$
Definition
$(\sigma, \mu)$ is an empirical-evidence equilibrium (EEE) for $k$ iff
- for all $i$, $\sigma_i$ is optimal w.r.t. $\mu_i$
- for all $i$, $\mu_i$ is depth-$k_i$ consistent with $\sigma$
Theorem
For all $k$ and $\varepsilon$, there exists an $\varepsilon$ EEE for $k$
Learning Setup
State: holdings $x_i \in \{0, \dots, M\}$
Action: sell one, hold, or buy one, $a_i \in \{-1, 0, 1\}$
Signal: price $p \in \{\text{Low}, \text{High}\}$
Dynamic: $x_i^+ = x_i + a_i$
Stage cost: $p \cdot a_i$
Nature: market trend $b \in \{\text{Bull}, \text{Bear}\}$, $w = (b, p)$
- 0. Pick arbitrary depth-0 models $\mu$
- 1. Design strategies $\sigma$ optimal w.r.t. models $\mu$
- 2. Formulate consistent models $\mu_{\text{upd}}$, then, back to 1.
$$\mu_i^{t+1} = \mu_i^t + \alpha(\mu_{i,\text{upd}}^t - \mu_i^t) = (1 - \alpha)\mu_i^t + \alpha\,\mu_{i,\text{upd}}^t$$
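The model update in the learning loop can be read as a relaxation step that moves the current model part of the way toward the newly fitted one. The sketch below assumes the convex-combination form $(1 - \alpha)\mu + \alpha\,\mu_{\text{upd}}$ for a depth-0 model over {Low, High}; the function name, the dictionary representation, and the sample numbers are hypothetical.

```python
from typing import Dict


def relaxed_update(
    mu: Dict[str, float], mu_upd: Dict[str, float], alpha: float
) -> Dict[str, float]:
    """Move the current depth-0 model a fraction alpha toward the updated model."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # A convex combination of two distributions is again a distribution.
    return {s: (1.0 - alpha) * mu[s] + alpha * mu_upd[s] for s in mu}


current = {"Low": 0.5, "High": 0.5}   # arbitrary initial depth-0 model
observed = {"Low": 0.0, "High": 1.0}  # hypothetical consistent model from step 2
blended = relaxed_update(current, observed, alpha=0.5)
```

A small step size changes the model slowly between iterations, while a step size of 1 replaces the model outright with the newly fitted one.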
Learning Results: Offline
[Figure: prediction $\mu_i^t[\text{High}]$ (0 to 1) versus time $t$ (0 to 100), for agents $i = 1$ and $i = 2$]
Learning Results: Online
[Figure: prediction $\mu_i^t[\text{High}]$ (0 to 1) versus time $t$ (0 to 100), for agents $i = 1$ and $i = 2$]
Concluding Remarks
Comparison with mean-field equilibria
- Identical agents with a specific signal
- Depth-0 model
- Large number of agents to recover Nash equilibrium
Future directions
- Endogenous model ($z^+ \sim m(z, x, a)$)
- Quality of EEEs
- Learning EEEs