Empirical-evidence Equilibria in Stochastic Games
Nicolas Dudebout
Georgia Institute of Technology

Empirical-evidence Equilibria (EEEs)
[Diagram: Agent 1, Nature, and Agent 2; from the viewpoint of Agent 1, Nature and Agent 2 together form "Nature 1".]
At a Nash equilibrium of a stochastic game, each agent plays an optimal strategy for a POMDP: with the other agents' strategies fixed, those agents become part of its environment.
EEE approach:
0. Pick arbitrary strategies.
1. Formulate simple but consistent models.
2. Design strategies optimal w.r.t. the models, then go back to 1.
The fixed points are EEEs.
Example: asset management on the stock market.
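The EEE approach is a fixed-point iteration on strategies. The Python sketch below shows one way to write it. It assumes two caller-supplied callables, which are hypothetical and not from the slides: `fit_model` for step 1 and `best_response` for step 2. A `distance` callable detects the fixed point. The slides do not claim that this plain iteration converges; the sketch only illustrates what a fixed point of steps 1 and 2 means.

```python
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")  # strategy type (hypothetical placeholder)
M = TypeVar("M")  # model type (hypothetical placeholder)


def eee_iteration(
    sigma0: S,
    fit_model: Callable[[S], M],       # hypothetical: step 1, model consistent with sigma
    best_response: Callable[[M], S],   # hypothetical: step 2, strategy optimal w.r.t. mu
    distance: Callable[[S, S], float],
    tol: float = 1e-9,
    max_iter: int = 1000,
) -> Tuple[S, M]:
    """Alternate steps 1 and 2 until the strategies stop changing (up to tol)."""
    sigma = sigma0  # step 0: arbitrary initial strategies
    for _ in range(max_iter):
        mu = fit_model(sigma)          # step 1
        new_sigma = best_response(mu)  # step 2
        if distance(new_sigma, sigma) <= tol:
            # Approximate fixed point: new_sigma is (nearly) optimal w.r.t.
            # a model that is itself consistent with new_sigma.
            return new_sigma, fit_model(new_sigma)
        sigma = new_sigma
    raise RuntimeError("no fixed point found within max_iter iterations")
```

Consider the scalar stand-ins `fit_model = lambda s: s` and `best_response = lambda m: 0.5 * m + 0.25`. The iterates approach the fixed point 0.5. With `best_response = lambda m: 1.0 - m`, a fixed point still exists at 0.5, but the iterates cycle between 0 and 1 and the function raises. This is one reason to damp the updates.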

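Single-agent Setup
Agent: $\max_\sigma \mathbb{E}_\sigma\left[\sum_{t=0}^{\infty} \delta^t u(x_t, a_t, s_t)\right]$, with action $a \sim \sigma(x, z)$ and state $x_+ \sim f(x, a, s)$
Nature: $w_+ \sim n(w, x, a)$, signal $s \sim \nu(w)$
Model: $z_+ \sim m(z)$, predicted signal $\hat{s} \sim \mu(z)$
• $\mu$ consistent with $\sigma$
• $\sigma$ optimal w.r.t. $\mu$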
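The Python sketch below makes the closed loop concrete. It rolls out nature, signal, action, and state update, and accumulates the discounted payoff truncated at a finite horizon. Several choices are assumptions for illustration:
• Finite distributions are plain `{outcome: probability}` dicts.
• The memory $z$ holds the last $k$ observed signals.
• The function names and argument order are hypothetical.
The model branch $\hat{s} \sim \mu(z)$ is left out because it only matters when the strategy is designed.

```python
import random
from collections import deque
from typing import Callable, Dict, Hashable, Tuple

Dist = Dict[Hashable, float]  # finite distribution {outcome: probability}


def sample(dist: Dist, rng: random.Random) -> Hashable:
    """Draw one outcome from a finite distribution."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]


def discounted_rollout(
    x0, w0,
    sigma: Callable[[Hashable, Tuple], Dist],            # a ~ sigma(x, z)
    f: Callable[[Hashable, Hashable, Hashable], Dist],   # x+ ~ f(x, a, s)
    n: Callable[[Hashable, Hashable, Hashable], Dist],   # w+ ~ n(w, x, a)
    nu: Callable[[Hashable], Dist],                      # s ~ nu(w)
    u: Callable[[Hashable, Hashable, Hashable], float],  # stage payoff u(x, a, s)
    k: int, delta: float, horizon: int, seed: int = 0,
) -> float:
    """One sample of sum_{t < horizon} delta^t u(x_t, a_t, s_t) (hypothetical helper)."""
    rng = random.Random(seed)
    x, w = x0, w0
    z = deque(maxlen=k)  # memory: last k observed signals, oldest first
    total = 0.0
    for t in range(horizon):
        s = sample(nu(w), rng)               # nature emits a signal
        a = sample(sigma(x, tuple(z)), rng)  # agent acts on its state and memory only
        total += delta ** t * u(x, a, s)
        # Both transitions use the current (x, w): the right-hand side is
        # evaluated completely before x and w are rebound.
        x, w = sample(f(x, a, s), rng), sample(n(w, x, a), rng)
        z.append(s)  # maxlen=k drops the oldest signal; k = 0 keeps nothing
    return total
```

Averaging `discounted_rollout` over many seeds gives a Monte Carlo estimate of the truncated objective for a fixed strategy $\sigma$.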

Depth-$k$ Consistency
Binary stochastic process $s$: 0100010001001010010110111010000111010101...
Definition: Two processes $s$ and $\hat{s}$ are depth-$k$ consistent if they have the same $k$-characteristic, where
• 0-characteristic: $\mathbb{P}[s = 0]$, $\mathbb{P}[s = 1]$
• 1-characteristic: $\mathbb{P}[s s_+ = 00]$, $\mathbb{P}[s s_+ = 01]$, $\mathbb{P}[s s_+ = 10]$, $\mathbb{P}[s s_+ = 11]$
• ...
• $k$-characteristic: probabilities of the strings of length $k + 1$
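The $k$-characteristic can be estimated from a single long sample path when the process is stationary and ergodic. The Python sketch below assumes this. The function names are hypothetical. The tolerance `tol` stands in for exact equality, because empirical frequencies from finite samples almost never match exactly.

```python
from collections import Counter
from itertools import product
from typing import Dict


def characteristic(seq: str, k: int) -> Dict[str, float]:
    """Empirical k-characteristic of a binary string (hypothetical helper):
    frequency of every length-(k + 1) window, including windows that never occur."""
    if len(seq) <= k:
        raise ValueError("sequence must be longer than k")
    windows = [seq[i:i + k + 1] for i in range(len(seq) - k)]
    counts = Counter(windows)
    total = len(windows)
    return {
        "".join(bits): counts["".join(bits)] / total
        for bits in product("01", repeat=k + 1)
    }


def depth_consistent(seq_a: str, seq_b: str, k: int, tol: float) -> bool:
    """Approximate depth-k consistency test between two sample paths."""
    ca, cb = characteristic(seq_a, k), characteristic(seq_b, k)
    return all(abs(ca[w] - cb[w]) <= tol for w in ca)
```

For example, `"0011" * 25` and `"01" * 50` have the same 0-characteristic (half zeros, half ones). Their 1-characteristics differ sharply, because only the first contains `00` and `11`. They are therefore depth-0 consistent but not depth-1 consistent.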

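Complete Picture
Fix a depth $k \in \mathbb{N}$; the model state $z$ contains the last $k$ observed signals.
Agent: $a \sim \sigma(x, z)$, $x_+ \sim f(x, a, s)$
Nature: $w_+ \sim n(w, x, a)$, $s \sim \nu(w)$
Model: $z_+ \sim m_k(z)$, $\hat{s} \sim \mu(z)$, where
$\mu(z = (s_1, s_2, \ldots, s_k))[s_{k+1}] = \mathbb{P}_\sigma[s_{t+1} = s_{k+1} \mid s_t = s_k, \ldots, s_{t-k+1} = s_1]$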
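The consistency formula defines $\mu$ as the conditional distribution of the next signal given the last $k$ signals. Under the same stationarity and ergodicity caveat, it can be estimated by counting along one trajectory. In the Python sketch below, the function name is hypothetical. Contexts $z = (s_1, \ldots, s_k)$ are stored oldest first, matching the formula. Contexts that never occur are simply absent from the result.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def fit_depth_k_model(
    signals: Sequence[Hashable], k: int
) -> Dict[Tuple, Dict[Hashable, float]]:
    """Empirical estimate of mu(z)[s'] = P[s_{t+1} = s' | last k signals = z]
    (hypothetical helper)."""
    counts: Dict[Tuple, Counter] = defaultdict(Counter)
    for t in range(k, len(signals)):
        z = tuple(signals[t - k:t])  # the k signals preceding s_t, oldest first
        counts[z][signals[t]] += 1
    model = {}
    for z, next_counts in counts.items():
        total = sum(next_counts.values())
        model[z] = {s: c / total for s, c in next_counts.items()}
    return model
```

With $k = 0$ the only context is the empty tuple. The estimate then reduces to the marginal signal frequencies, which is what a depth-0 model predicts.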

Empirical-evidence Optimality
Definition: $(\sigma, \mu)$ is an empirical-evidence optimum (EEO) for $k$ iff
• $\sigma$ is optimal w.r.t. $\mu$
• $\mu$ is depth-$k$ consistent with $\sigma$
Definition: $(\sigma, \mu)$ is an $\epsilon$-empirical-evidence optimum ($\epsilon$-EEO) for $k$ iff
• $\sigma$ is $\epsilon$-optimal w.r.t. $\mu$
• $\mu$ is depth-$k$ consistent with $\sigma$
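The first condition, "$\sigma$ optimal w.r.t. $\mu$", amounts to solving the agent's decision problem with the signal replaced by the model's prediction $\hat{s} \sim \mu(z)$. The Python sketch below runs value iteration on the augmented state $(x, z)$. It rests on assumptions that the slides do not state:
• All sets are finite.
• Dynamics are deterministic, $x_+ = f(x, a, s)$.
• Inside the model, the memory is shifted with the predicted signal.
• The result is a deterministic greedy policy.
All names are hypothetical. Stopping at a finite tolerance yields a near-optimal rather than exactly optimal strategy.

```python
from itertools import product
from typing import Callable, Dict, Hashable, List, Tuple

State = Tuple[Hashable, Tuple]  # augmented state (x, z)


def shift(z: Tuple, s: Hashable, k: int) -> Tuple:
    """Append s to a length-k memory and drop the oldest signal."""
    return (z + (s,))[1:] if k > 0 else ()


def optimal_policy(
    xs: List[Hashable], actions: List[Hashable], signals: List[Hashable], k: int,
    f: Callable[[Hashable, Hashable, Hashable], Hashable],  # assumed deterministic, must land in xs
    u: Callable[[Hashable, Hashable, Hashable], float],     # stage payoff, maximized
    mu: Callable[[Tuple], Dict[Hashable, float]],           # model: s_hat ~ mu(z)
    delta: float, tol: float = 1e-10,
) -> Tuple[Dict[State, float], Dict[State, Hashable]]:
    """Value iteration for the agent's problem under model mu (hypothetical helper)."""
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    states = [(x, z) for x in xs for z in product(signals, repeat=k)]
    value = {st: 0.0 for st in states}

    def q(st: State, a: Hashable) -> float:
        x, z = st
        # Expected payoff plus discounted continuation, averaging over s_hat ~ mu(z).
        return sum(
            p * (u(x, a, s) + delta * value[(f(x, a, s), shift(z, s, k))])
            for s, p in mu(z).items()
        )

    while True:
        new_value = {st: max(q(st, a) for a in actions) for st in states}
        gap = max(abs(new_value[st] - value[st]) for st in states)
        value = new_value
        if gap < tol:
            break
    policy = {st: max(actions, key=lambda a: q(st, a)) for st in states}
    return value, policy
```

Take a toy instance with $k = 0$, $f(x, a, s) = a$, $u(x, a, s) = x s$, $\mu = \{1: 0.7, 0: 0.3\}$ and $\delta = 0.9$. The greedy policy chooses $a = 1$ in both states, with values $7$ for $x = 1$ and $6.3$ for $x = 0$.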

Existence Result
Theorem: For all $k$ and $\epsilon$, there exists an $\epsilon$-EEO for $k$.
Proof sketch:
• A technical assumption ensures ergodicity of $s$.
• $T : \sigma \xrightarrow{\text{consistency}} \mu \xrightarrow{\epsilon\text{-optimality}} \sigma$ is continuous.
• $\sigma : \mathcal{X} \times \mathcal{Z} \to \Delta(\mathcal{A})$ is parametrized over a simplex.
• Apply Brouwer's fixed-point theorem to $T$.
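Brouwer's theorem is non-constructive, and the proof sketch applies it to strategies parametrized over a simplex. In one dimension the theorem reduces to the intermediate value theorem, which shows why continuity of $T$ matters. Suppose a continuous $T$ maps $[0, 1]$ into itself. Then $g(p) = T(p) - p$ satisfies $g(0) \ge 0 \ge g(1)$, and bisection on $g$ traps a fixed point. The Python sketch below uses a hypothetical scalar stand-in for $T$; the actual $T$ acts on whole strategies.

```python
import math
from typing import Callable


def fixed_point_1d(T: Callable[[float], float], iters: int = 60) -> float:
    """Bisection for a fixed point of a continuous map T: [0, 1] -> [0, 1]
    (hypothetical one-dimensional illustration).

    Invariant: T(lo) >= lo and T(hi) <= hi. It holds initially because T maps
    [0, 1] into itself, and continuity then traps a fixed point in [lo, hi].
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if T(mid) >= mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # cos maps [0, 1] into [cos 1, 1], a subset of [0, 1], so a fixed point exists.
    print(fixed_point_1d(math.cos))  # approximately 0.739085
```

Unlike the naive iteration $p \leftarrow T(p)$, bisection does not need $T$ to be a contraction. For $T(p) = 1 - p$, naive iteration from 0 cycles between 0 and 1, while bisection returns 0.5.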

Multiagent Setup
Nature: $w_+ \sim n(w, x^1, a^1, x^2, a^2)$, $(s^1, s^2) \sim \nu(w)$
Agent 1: $x^1_+ \sim f^1(x^1, a^1, s^1)$, $a^1 \sim \sigma^1(x^1, z^1)$
Model 1: $z^1_+ \sim m^1_{k^1}(z^1)$, $\hat{s}^1 \sim \mu^1(z^1)$
Agent 2: $x^2_+ \sim f^2(x^2, a^2, s^2)$, $a^2 \sim \sigma^2(x^2, z^2)$
Model 2: $z^2_+ \sim m^2_{k^2}(z^2)$, $\hat{s}^2 \sim \mu^2(z^2)$

Empirical-evidence Equilibrium
Strategies $\sigma = (\sigma^1, \sigma^2, \ldots, \sigma^N)$, models $\mu = (\mu^1, \mu^2, \ldots, \mu^N)$, depths $k = (k^1, k^2, \ldots, k^N)$
Definition: $(\sigma, \mu)$ is an empirical-evidence equilibrium (EEE) for $k$ iff
• for all $i$, $\sigma^i$ is optimal w.r.t. $\mu^i$
• for all $i$, $\mu^i$ is depth-$k^i$ consistent with $\sigma$
Theorem: For all $k$ and $\epsilon$, there exists an $\epsilon$-EEE for $k$.

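Learning Setup
0. Pick arbitrary depth-0 models $\mu_0$.
1. Design strategies $\sigma$ optimal w.r.t. the models $\mu_t$.
2. Formulate consistent models $\mu_t^{\text{upd}}$ and update $\mu^i_{t+1} = \mu^i_t + \alpha(\mu^{i,\text{upd}}_t - \mu^i_t) = (1 - \alpha)\mu^i_t + \alpha\,\mu^{i,\text{upd}}_t$, then go back to 1.
Example: asset management
• State: holdings $x^i \in \{0, \ldots, M\}$
• Action: sell one, hold, or buy one, $a^i \in \{-1, 0, 1\}$
• Signal: price $p \in \{\text{Low}, \text{High}\}$
• Nature: market trend $b \in \{\text{Bull}, \text{Bear}\}$, $w = (b, p)$
• Dynamic: $x^i_+ = x^i + a^i$
• Stage cost: $p \cdot a^i$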
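The update in step 2 is a relaxation, or exponential smoothing, of each model toward the newly formulated consistent model. In the asset example with depth-0 models, each $\mu^i$ is simply a distribution over {Low, High}. The Python sketch below performs one update step with step size $\alpha$. Representing models as dicts is an assumption, and the function name is hypothetical. The consistent model $\mu^{i,\text{upd}}_t$ would come from the strategies designed in step 1, which is not shown.

```python
from typing import Dict

Model = Dict[str, float]  # depth-0 model, e.g. {"Low": P[Low], "High": P[High]}


def relax(mu: Model, mu_upd: Model, alpha: float) -> Model:
    """mu_{t+1} = mu_t + alpha * (mu_upd - mu_t) = (1 - alpha) * mu_t + alpha * mu_upd
    (hypothetical helper)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # A convex combination of two distributions is again a distribution.
    return {s: mu[s] + alpha * (mu_upd[s] - mu[s]) for s in mu}
```

If $\mu^{i,\text{upd}}_t$ were held fixed, $n$ steps would shrink the gap to it by a factor $(1 - \alpha)^n$. In the actual scheme the target moves, because step 2 recomputes it from the strategies designed in step 1.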

Learning Results: Offline
[Figure: prediction $\mu^i_t[\text{High}]$ versus time $t$ for agents $i = 1$ and $i = 2$]

Learning Results: Online
[Figure: prediction $\mu^i_t[\text{High}]$ versus time $t$ for agents $i = 1$ and $i = 2$]

Concluding Remarks
Comparison with mean-field equilibria:
• Identical agents with a specific signal
• Depth-0 model
• A large number of agents is needed to recover a Nash equilibrium
Future directions:
• Endogenous model ($z_+ \sim m(z, x, a)$)
• Quality of EEEs
• Learning EEEs
