  1. Empirical-evidence Equilibria in Stochastic Games (Nicolas Dudebout)

  2. Outline
     • Stochastic games
     • Empirical-evidence equilibria (EEEs)
     • Open questions in EEEs

  3. Stochastic Games
     • Game theory
     • Markov decision processes

  4. Game Theory
     Decision making: 𝑣 ∶ 𝒝 → ℝ ⟹ 𝑏* ∈ arg max_{𝑏 ∈ 𝒝} 𝑣(𝑏)
     Game theory: 𝑣¹ ∶ 𝒝¹ × 𝒝² → ℝ and 𝑣² ∶ 𝒝¹ × 𝒝² → ℝ
     Nash equilibrium:
       𝑏*¹ ∈ arg max_{𝑏¹ ∈ 𝒝¹} 𝑣¹(𝑏¹, 𝑏*²)
       𝑏*² ∈ arg max_{𝑏² ∈ 𝒝²} 𝑣²(𝑏*¹, 𝑏²)
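These definitions translate directly into a brute-force search. The sketch below is a minimal illustration, not part of the original slides: pure_nash is a hypothetical helper that enumerates action pairs and keeps those from which neither player gains by deviating unilaterally.

```python
# Minimal sketch: enumerate the pure Nash equilibria of a two-player
# bimatrix game. v1[i][j] and v2[i][j] are the players' utilities when
# player 1 plays action i and player 2 plays action j.

def pure_nash(v1, v2):
    n1, n2 = len(v1), len(v1[0])
    equilibria = []
    for i in range(n1):
        for j in range(n2):
            # i must be a best response to j, and vice versa.
            best1 = all(v1[i][j] >= v1[k][j] for k in range(n1))
            best2 = all(v2[i][j] >= v2[i][l] for l in range(n2))
            if best1 and best2:
                equilibria.append((i, j))
    return equilibria
```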

  5. Example: Battle of the Sexes
     Two actions per player, 𝐺 and 𝑃, with payoffs (player 1, player 2):

              𝐺       𝑃
       𝐺    2, 2    0, 1
       𝑃    0, 0    1, 3

     Nash equilibria:
     • (𝐺, 𝐺)
     • (𝑃, 𝑃)
     • (3/4 𝐺 + 1/4 𝑃, 1/3 𝐺 + 2/3 𝑃)
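The three equilibria can be checked mechanically. The payoff numbers below follow the table reconstructed above, so treat them as an assumption; the asserts verify that no unilateral deviation improves the pure equilibria and that the stated mixture makes each player indifferent.

```python
# Payoff matrices, rows/columns ordered (G, P).
v1 = [[2, 0], [0, 1]]  # player 1
v2 = [[2, 1], [0, 3]]  # player 2

# (G, G) and (P, P): no unilateral deviation is profitable.
assert v1[0][0] >= v1[1][0] and v2[0][0] >= v2[0][1]  # (G, G)
assert v1[1][1] >= v1[0][1] and v2[1][1] >= v2[1][0]  # (P, P)

# Mixed equilibrium (3/4 G + 1/4 P, 1/3 G + 2/3 P): each player is
# indifferent between G and P against the opponent's mixture.
p, q = 3 / 4, 1 / 3  # probabilities of playing G

def close(a, b):
    return abs(a - b) < 1e-9

assert close(q * v1[0][0] + (1 - q) * v1[0][1],
             q * v1[1][0] + (1 - q) * v1[1][1])
assert close(p * v2[0][0] + (1 - p) * v2[1][0],
             p * v2[0][1] + (1 - p) * v2[1][1])
```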

  6. Markov Decision Process (MDP)
     Dynamic: 𝑦⁺ ∼ 𝑔(𝑦, 𝑏), that is, 𝑦_{𝑢+1} ∼ 𝑔(𝑦_𝑢, 𝑏_𝑢)
     Stage cost: 𝑣(𝑦, 𝑏)
     History: ℎ_𝑢 = (𝑦_0, 𝑦_1, …, 𝑦_𝑢, 𝑏_0, 𝑏_1, …, 𝑏_𝑢)
     Strategy: 𝜏 ∶ ℋ → 𝒝; an optimal strategy can be chosen Markov, 𝜏 ∶ 𝒴 → 𝒝
     Utility: 𝑉(𝜏) = 𝔼_{𝑔,𝜏}[∑_{𝑢=0}^∞ 𝜀^𝑢 𝑣(𝑦_𝑢, 𝑏_𝑢)]
     Bellman's equation: 𝑉*(𝑦) = max_{𝑏 ∈ 𝒝} {𝑣(𝑦, 𝑏) + 𝜀 𝔼_𝑔[𝑉*(𝑦⁺) | 𝑦, 𝑏]}
     Dynamic programming: uses knowledge of 𝑔
     Reinforcement learning: learns 𝑔 from repeated interaction
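Bellman's equation suggests the classic value-iteration algorithm. The sketch below is a minimal illustration under assumed data structures (a table g[y][b] of (probability, next state) pairs and a table v[y][b] of stage values); it iterates the Bellman operator to a fixed point.

```python
# Minimal value-iteration sketch for the Bellman equation above.
# g[y][b]: list of (prob, next_state) pairs; v[y][b]: stage value;
# eps: discount factor. All data structures are assumptions here.

def value_iteration(g, v, eps, tol=1e-9):
    V = [0.0] * len(v)
    while True:
        V_new = [
            max(
                v[y][b] + eps * sum(p * V[yn] for p, yn in g[y][b])
                for b in range(len(v[y]))
            )
            for y in range(len(v))
        ]
        if max(abs(a - c) for a, c in zip(V_new, V)) < tol:
            return V_new  # fixed point of the Bellman operator
        V = V_new
```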

  7. Imperfect Information (POMDP)
     Dynamic: 𝑥⁺ ∼ 𝑜(𝑥, 𝑏)
     Signal: 𝑡 ∼ 𝜉(𝑥)
     History: ℎ_𝑢 = (𝑡_0, 𝑡_1, …, 𝑡_𝑢, 𝑏_0, 𝑏_1, …, 𝑏_𝑢)
     Strategy: 𝜏 ∶ ℋ → 𝒝
     Belief: ℙ_{𝑜,𝜉,𝜏}[𝑥 | ℎ]
     An optimal strategy can act on the belief instead of the full history: 𝜏 ∶ Δ(𝒳) → 𝒝
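The belief is computed recursively by a Bayes filter: push the current belief through the dynamic, then reweight by the likelihood of the observed signal. A minimal sketch, assuming dictionary-based distributions (o[x][b] and xi[x] map to {outcome: probability}):

```python
def belief_update(belief, b, t, o, xi):
    # Predict: push the belief through the dynamic x+ ~ o(x, b).
    predicted = {}
    for x, px in belief.items():
        for xn, p in o[x][b].items():
            predicted[xn] = predicted.get(xn, 0.0) + px * p
    # Correct: reweight by the likelihood of the observed signal t ~ xi(x).
    posterior = {x: p * xi[x].get(t, 0.0) for x, p in predicted.items()}
    z = sum(posterior.values())  # assumes t has positive probability
    return {x: p / z for x, p in posterior.items()}
```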

  8. Stochastic Games
     Dynamic: 𝑥⁺ ∼ 𝑜(𝑥, 𝑏¹, 𝑏²)
     Signals: 𝑡¹ ∼ 𝜉¹(𝑥) and 𝑡² ∼ 𝜉²(𝑥)
     Histories: ℎ_𝑢¹ = (𝑡_0¹, …, 𝑡_𝑢¹, 𝑏_0¹, …, 𝑏_𝑢¹) and ℎ_𝑢² = (𝑡_0², …, 𝑡_𝑢², 𝑏_0², …, 𝑏_𝑢²)
     Strategies: 𝜏¹ ∶ ℋ¹ → 𝒝¹ and 𝜏² ∶ ℋ² → 𝒝²
     Beliefs: ℙ_{𝑜,𝜉¹,𝜏¹,𝜉²,𝜏²}[𝑥, ℎ² | ℎ¹] and ℙ_{𝑜,𝜉¹,𝜏¹,𝜉²,𝜏²}[𝑥, ℎ¹ | ℎ²]
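This definition is easy to exercise by forward simulation. In the minimal rollout sketch below, the function signatures (o, xi1, xi2 returning {outcome: probability} dictionaries, strategies consuming their private histories) are assumptions of the sketch, not part of the original slides.

```python
import random

def sample(dist):
    # Draw from a {value: probability} dictionary.
    values, probs = zip(*dist.items())
    return random.choices(values, probs)[0]

def rollout(x, o, xi1, xi2, tau1, tau2, steps):
    h1, h2 = [], []  # private histories of signals and actions
    for _ in range(steps):
        t1, t2 = sample(xi1(x)), sample(xi2(x))  # t^i ~ xi^i(x)
        h1.append(t1)
        h2.append(t2)
        b1, b2 = tau1(h1), tau2(h2)  # each agent sees only its own history
        h1.append(b1)
        h2.append(b2)
        x = sample(o(x, b1, b2))     # x+ ~ o(x, b1, b2)
    return h1, h2
```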

  9. Existing Approaches
     • (Weakly) belief-free equilibrium
     • Mean-field equilibrium
     • Incomplete theories

  10. Empirical-evidence Equilibria

  11. Motivation
      [Diagram: Agent 1 and Agent 2 interacting through nature.]
      0. Pick arbitrary strategies.
      1. Formulate simple but consistent models.
      2. Design strategies optimal w.r.t. the models; then go back to 1.
      An empirical-evidence equilibrium is a fixed point of this process (sketched in code below):
      • strategies optimal w.r.t. the models
      • models consistent with the strategies
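The loop above can be written down abstractly. In this sketch, consistent_model and best_response are hypothetical placeholders for "formulate a simple but consistent model" and "design a strategy optimal w.r.t. the model"; only the loop structure comes from the slide.

```python
def eee_iteration(strategies, consistent_model, best_response, rounds):
    n = len(strategies)
    for _ in range(rounds):
        # Step 1: each agent fits a simple model consistent with play
        # under the current strategy profile.
        models = [consistent_model(i, strategies) for i in range(n)]
        # Step 2: each agent picks a strategy optimal w.r.t. its model.
        strategies = [best_response(i, models[i]) for i in range(n)]
    return strategies  # at a fixed point, the profile is an EEE
```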

  12. Example: Asset Management
      Trading one asset on the stock market.
      Model based on:
      • information published by the company
      • observed trading activity
      The resulting model is very different for each agent.

  13. Multiple to Single Agent
      [Diagram: Agent 1, Agent 2, and nature; Agent 2 and nature are then lumped into a single aggregate nature from Agent 1's point of view.]

  14. Single Agent Setup
      Agent: 𝑦⁺ ∼ 𝑔(𝑦, 𝑏, 𝑡)
      Nature: 𝑥⁺ ∼ 𝑜(𝑥, 𝑦, 𝑏), emitting the signal 𝑡 ∼ 𝜉(𝑥)

  15. Example: Asset Management
      State: holding 𝑦 ∈ {0, …, 𝑁}
      Action: sell one, hold, or buy one, 𝑏 ∈ {−1, 0, 1}
      Signal: price 𝑞 ∈ {Low, High}
      Stage cost: 𝑞 ⋅ 𝑏
      Dynamics: 𝑦⁺ ∼ 𝑔(𝑦, 𝑏, 𝑡), 𝑥⁺ ∼ 𝑜(𝑥, 𝑦, 𝑏), 𝑡 ∼ 𝜉(𝑥)
      Nature's state 𝑥 represents market sentiment, political climate, and the other traders.
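The example can be simulated end to end. In the sketch below, the coin-flip price process is a hypothetical stand-in for nature (the slides leave 𝑜 and 𝜉 unspecified), and the buy-low/sell-high strategy is only for illustration.

```python
import random

LOW, HIGH = 0.9, 1.1  # hypothetical Low/High prices
N = 10                # maximum holding

def step(y, q, b):
    # Clip the order so the holding stays in {0, ..., N}.
    if not 0 <= y + b <= N:
        b = 0
    cost = q * b                       # stage cost q . b
    y_next = y + b                     # y+ = g(y, b, t), with t the price
    q_next = LOW if random.random() < 0.5 else HIGH  # stand-in for xi(x)
    return y_next, q_next, cost

y, q, total = 0, LOW, 0.0
for _ in range(100):
    b = 1 if q == LOW else -1          # illustration: buy low, sell high
    y, q, cost = step(y, q, b)
    total += cost
print(y, total)
```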

  16. Single Agent Setup
      Closed loop: 𝑏 ∼ 𝜏(ℎ), 𝑡 ∼ 𝜉(𝑥), 𝑦⁺ ∼ 𝑔(𝑦, 𝑏, 𝑡), 𝑥⁺ ∼ 𝑜(𝑥, 𝑦, 𝑏)
      Model: the agent replaces the signal 𝑡 by a modeled process 𝑡̂ and plans against 𝑦⁺ ∼ 𝑔(𝑦, 𝑏, 𝑡̂)
      • 𝑡̂ consistent with the signal process 𝑡 induced by 𝜏
      • 𝜏 optimal w.r.t. 𝑡̂

  17. Depth-𝑙 Consistency
      Consider a binary stochastic process 𝑡:
      0100010001001010010110111010000111010101...
      • 0-characteristic: ℙ[𝑡 = 0], ℙ[𝑡 = 1]
      • 1-characteristic: ℙ[𝑡𝑡⁺ = 00], ℙ[𝑡𝑡⁺ = 01], ℙ[𝑡𝑡⁺ = 10], ℙ[𝑡𝑡⁺ = 11]
      • …
      • 𝑙-characteristic: the probabilities of all strings of length 𝑙 + 1
      Definition: Two processes 𝑡 and 𝑡′ are depth-𝑙 consistent if they have the same 𝑙-characteristic.
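On a finite sample path, the 𝑙-characteristic reduces to counting windows of length 𝑙 + 1. A minimal sketch, assuming the process is given as a binary string:

```python
from collections import Counter

def l_characteristic(t, l):
    # Empirical frequencies of all substrings of length l + 1.
    windows = [t[i : i + l + 1] for i in range(len(t) - l)]
    freq = Counter(windows)
    return {w: c / len(windows) for w, c in freq.items()}

# Two processes are depth-l consistent if these dictionaries agree.
print(l_characteristic("0100010001001010010110111010000111010101", 1))
```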

  18. Depth-𝑙 Consistency: Example
      [Figure: an automaton over the states 𝑨_∅, 𝑨_0, and 𝑨_1, with transition probabilities 0.5/0.5, 0.7/0.3, and 1.]

  19. Complete Picture
      Fix a depth 𝑙 ∈ ℕ; the memory 𝑨 contains the last 𝑙 observed signals.
      Closed loop: 𝑏 ∼ 𝜏(ℎ), 𝑡 ∼ 𝜉(𝑥), 𝑦⁺ ∼ 𝑔(𝑦, 𝑏, 𝑡), 𝑥⁺ ∼ 𝑜(𝑥, 𝑦, 𝑏), with model 𝑡̂
      𝜈(𝑨 = (𝑡_1, 𝑡_2, …, 𝑡_𝑙))[𝑡_{𝑙+1}] = ℙ_𝜏[𝑡_{𝑢+1} = 𝑡_{𝑙+1} | 𝑡_𝑢 = 𝑡_𝑙, …, 𝑡_{𝑢−𝑙+1} = 𝑡_1]
      • 𝜏 ↦ 𝜈: 𝜈 consistent with 𝜏
      • 𝜈 ↦ 𝜏: 𝜏 optimal w.r.t. 𝜈
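The map 𝜏 ↦ 𝜈 can be estimated from empirical evidence: simulate the loop under 𝜏 and tabulate, for each memory content 𝑨, the empirical distribution of the next signal. A minimal sketch, assuming the observed signals are collected in a list:

```python
from collections import Counter, defaultdict

def estimate_nu(signals, l):
    # nu(A)[t]: empirical probability of the next signal t given that
    # the memory A holds the last l observed signals.
    counts = defaultdict(Counter)
    for u in range(l, len(signals)):
        A = tuple(signals[u - l : u])
        counts[A][signals[u]] += 1
    return {A: {t: c / sum(ctr.values()) for t, c in ctr.items()}
            for A, ctr in counts.items()}
```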
