
Operator approach to stochastic games with varying stage duration
G. Vigeral (with S. Sorin)
CEREMADE, Université Paris Dauphine
26 January 2016, ADGO II, Santiago de Chile

Table of contents
1. Zero-sum stochastic games
2. Exact games with varying stage duration
   - Finite horizon
   - Discounted evaluation
3. Discretization of a continuous-time game
4. Conclusion and remarks

1. Zero-sum stochastic games

Zero-sum stochastic game
A zero-sum stochastic game $\Gamma$ is a 5-tuple $(\Omega, I, J, g, \rho)$ where:
- $\Omega$ is the set of states;
- $I$ (resp. $J$) is the action set of Player 1 (resp. Player 2);
- $g : I \times J \times \Omega \to [-1, 1]$ is the payoff function (which Player 1 maximizes and Player 2 minimizes);
- $\rho : I \times J \times \Omega \to \Delta(\Omega)$ is the transition probability.
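For experimentation, the 5-tuple can be encoded for finite $\Omega$, $I$, $J$ as nested lists. The sketch below is one possible encoding, not taken from the slides: the class `StochasticGame`, its fields and the two-state `toy` game are hypothetical. `validate` checks the two requirements of the definition, namely $g \in [-1, 1]$ and $\rho(i, j, \omega) \in \Delta(\Omega)$.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StochasticGame:
    """Finite zero-sum stochastic game (Omega, I, J, g, rho), hypothetical encoding.

    States and actions are integer indices:
      g[w][i][j]   : stage payoff to Player 1 in state w for actions (i, j)
      rho[w][i][j] : probability vector over the next state
    """
    g: List[List[List[float]]]
    rho: List[List[List[List[float]]]]

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError unless g takes values in [-1, 1] and rho in Delta(Omega)."""
        n_states = len(self.g)
        for w in range(n_states):
            for i, row in enumerate(self.g[w]):
                for j, payoff in enumerate(row):
                    if not -1.0 <= payoff <= 1.0:
                        raise ValueError(f"g{(i, j, w)} = {payoff} is outside [-1, 1]")
                    probs = self.rho[w][i][j]
                    if len(probs) != n_states or min(probs) < 0.0:
                        raise ValueError(f"rho{(i, j, w)} is not a distribution on states")
                    if abs(sum(probs) - 1.0) > tol:
                        raise ValueError(f"rho{(i, j, w)} does not sum to 1")


# Hypothetical toy game: in state 0 the payoffs are matching pennies and the
# action pair (0, 1) moves to state 1, which is absorbing with payoff 0.5.
toy = StochasticGame(
    g=[[[1.0, -1.0], [-1.0, 1.0]],
       [[0.5, 0.5], [0.5, 0.5]]],
    rho=[[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]],
         [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]],
)
toy.validate()  # passes silently for a valid game
```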

How the game is played
An initial state $\omega_1$ is given and known to both players. At each stage $k \in \mathbb{N}$:
- the players observe the current state $\omega_k$;
- depending on the past history, Player 1 (resp. Player 2) chooses a mixed action $x_k \in X = \Delta(I)$ (resp. $y_k \in Y = \Delta(J)$), independently of the other player;
- an action $i_k$ of Player 1 (resp. $j_k$ of Player 2) is drawn according to his mixed action $x_k$ (resp. $y_k$);
- this gives the payoff at stage $k$: $g_k = g(i_k, j_k, \omega_k)$;
- a new state $\omega_{k+1}$ is drawn according to $\rho(i_k, j_k, \omega_k)$.
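The protocol above can be simulated directly. This is a minimal sketch under illustrative assumptions: both players use stationary mixed actions (a special case of history-dependent choices), and `G`, `RHO` and `simulate` describe a hypothetical two-state game in which state 0 has matching-pennies payoffs and the action pair $(0, 1)$ leads to an absorbing state 1 paying 0.5 at every stage.

```python
import random

# Hypothetical toy game: G[w][i][j] = g(i, j, w), RHO[w][i][j] = rho(i, j, w).
# State 0: matching pennies; the pair (0, 1) moves to the absorbing state 1,
# which pays 0.5 at every stage.
G = [[[1.0, -1.0], [-1.0, 1.0]],
     [[0.5, 0.5], [0.5, 0.5]]]
RHO = [[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]],
       [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]]


def simulate(x, y, w1, n_stages, rng):
    """Play n_stages with stationary mixed actions x[w] in Delta(I), y[w] in Delta(J).

    Returns the visited states omega_1..omega_n and the stage payoffs g_1..g_n.
    """
    w, states, payoffs = w1, [], []
    for _ in range(n_stages):
        states.append(w)                                     # omega_k is observed
        i = rng.choices(range(2), weights=x[w])[0]           # i_k drawn from x_k
        j = rng.choices(range(2), weights=y[w])[0]           # j_k drawn from y_k, independently
        payoffs.append(G[w][i][j])                           # g_k = g(i_k, j_k, omega_k)
        w = rng.choices(range(2), weights=RHO[w][i][j])[0]   # omega_{k+1} ~ rho(i_k, j_k, omega_k)
    return states, payoffs


uniform = [[0.5, 0.5], [0.5, 0.5]]
states, payoffs = simulate(uniform, uniform, w1=0, n_stages=10, rng=random.Random(0))
print(states, payoffs)
```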

The $n$-stage game
For any stochastic game $\Gamma$, any finite horizon $n \in \mathbb{N}$ and any starting state $\omega_1$, the $n$-stage game $\Gamma_n(\omega_1)$ is the zero-sum game with payoff
$$\mathbb{E}\Big[\sum_{k=1}^{n} g_k\Big],$$
which Player 1 maximizes and Player 2 minimizes. The value of $\Gamma_n(\omega_1)$ is denoted by $V_n(\omega_1)$, and the normalized value is $v_n = V_n / n$.
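To make the $n$-stage payoff concrete, the sketch below computes $\mathbb{E}[\sum_{k=1}^n g_k]$ exactly, by propagating the law of $\omega_k$, for one fixed pair of stationary mixed actions. It evaluates a single strategy pair, not the value $V_n$ (which requires optimizing over strategies). The toy game and the function name `n_stage_payoff` are hypothetical.

```python
# Hypothetical toy game: G[w][i][j] = g(i, j, w), RHO[w][i][j] = rho(i, j, w).
# State 0: matching pennies, pair (0, 1) leads to state 1; state 1: absorbing, pays 0.5.
G = [[[1.0, -1.0], [-1.0, 1.0]],
     [[0.5, 0.5], [0.5, 0.5]]]
RHO = [[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]],
       [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]]


def n_stage_payoff(x, y, w1, n):
    """Exact E[g_1 + ... + g_n] from omega_1 = w1 under stationary mixed actions x, y."""
    dist = [0.0] * len(G)
    dist[w1] = 1.0                             # law of omega_1
    total = 0.0
    for _ in range(n):
        nxt = [0.0] * len(G)
        for w, pw in enumerate(dist):
            for i, xi in enumerate(x[w]):
                for j, yj in enumerate(y[w]):
                    p = pw * xi * yj           # P(omega_k = w, i_k = i, j_k = j)
                    total += p * G[w][i][j]
                    for w2, q in enumerate(RHO[w][i][j]):
                        nxt[w2] += p * q
        dist = nxt                             # law of omega_{k+1}
    return total


uniform = [[0.5, 0.5], [0.5, 0.5]]
print(n_stage_payoff(uniform, uniform, w1=0, n=10) / 10)  # normalized payoff of this pair
```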

The discounted game
For any stochastic game $\Gamma$, any discount factor $\lambda \in\, ]0, 1[$ and any starting state $\omega_1$, the discounted game $\Gamma_\lambda(\omega_1)$ is the zero-sum game with payoff
$$\mathbb{E}\Big[\sum_{k=1}^{+\infty} (1-\lambda)^{k-1} g_k\Big],$$
which Player 1 maximizes and Player 2 minimizes. The value of $\Gamma_\lambda(\omega_1)$ is denoted by $W_\lambda(\omega_1)$, and the normalized value is $w_\lambda = \lambda W_\lambda$.
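Similarly, for fixed stationary mixed actions the discounted payoff solves the linear equation $W = r + (1-\lambda) P W$, where $r$ is the expected stage payoff and $P$ the induced transition matrix; this map is a $(1-\lambda)$-contraction, so fixed-point iteration converges. A sketch on a hypothetical toy game (`discounted_payoff`, `G` and `RHO` are illustrative names); again this is the payoff of one strategy pair, not $W_\lambda$.

```python
# Hypothetical toy game: G[w][i][j] = g(i, j, w), RHO[w][i][j] = rho(i, j, w).
# State 0: matching pennies, pair (0, 1) leads to state 1; state 1: absorbing, pays 0.5.
G = [[[1.0, -1.0], [-1.0, 1.0]],
     [[0.5, 0.5], [0.5, 0.5]]]
RHO = [[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]],
       [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]]


def discounted_payoff(x, y, lam, iters=2000):
    """W(w) = E_w[sum_k (1 - lam)^(k-1) g_k] for every initial state w, under stationary x, y.

    Solves W = r + (1 - lam) P W by fixed-point iteration ((1 - lam)-contraction).
    """
    n = len(G)
    r = [sum(x[w][i] * y[w][j] * G[w][i][j] for i in range(2) for j in range(2))
         for w in range(n)]                                   # expected stage payoff
    P = [[sum(x[w][i] * y[w][j] * RHO[w][i][j][v] for i in range(2) for j in range(2))
          for v in range(n)] for w in range(n)]               # induced transition matrix
    W = [0.0] * n
    for _ in range(iters):
        W = [r[w] + (1 - lam) * sum(P[w][v] * W[v] for v in range(n)) for w in range(n)]
    return W


lam = 0.2
W = discounted_payoff([[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]], lam)
print([lam * v for v in W])   # normalized payoff lam * W
```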

Recursive structure
Shapley (1953) proved that the values satisfy a recursive structure:
$$V_n(\omega) = \sup_{x \in X} \inf_{y \in Y} \Big[g(x, y, \omega) + \mathbb{E}_{\rho(x, y, \omega)}\big(V_{n-1}(\cdot)\big)\Big] = \inf_{y \in Y} \sup_{x \in X} \Big[g(x, y, \omega) + \mathbb{E}_{\rho(x, y, \omega)}\big(V_{n-1}(\cdot)\big)\Big],$$
$$W_\lambda(\omega) = \sup_{x \in X} \inf_{y \in Y} \Big[g(x, y, \omega) + (1-\lambda)\,\mathbb{E}_{\rho(x, y, \omega)}\big(W_\lambda(\cdot)\big)\Big] = \inf_{y \in Y} \sup_{x \in X} \Big[g(x, y, \omega) + (1-\lambda)\,\mathbb{E}_{\rho(x, y, \omega)}\big(W_\lambda(\cdot)\big)\Big],$$
where $g$ and $\rho$ are extended bilinearly to mixed actions $(x, y) \in X \times Y$.
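For a finite game, $\sup_x \inf_y$ over mixed actions is the value of a matrix game, so the recursion can be run numerically. The sketch assumes a hypothetical two-state game with $2 \times 2$ action sets, for which the matrix-game value has a closed form (`val2x2`); a general finite game would need a linear-programming solver instead. `shapley` and `n_stage_values` are illustrative names.

```python
# Hypothetical toy game (not from the slides): two states, two actions each.
# State 0: matching-pennies payoffs; the pair (i, j) = (0, 1) moves to state 1.
# State 1: absorbing, payoff 0.5 at every stage.
G = [[[1.0, -1.0], [-1.0, 1.0]],
     [[0.5, 0.5], [0.5, 0.5]]]                                     # G[w][i][j] = g(i, j, w)
RHO = [[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]],
       [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]]       # RHO[w][i][j] = rho(i, j, w)


def val2x2(m):
    """Value of the 2x2 zero-sum matrix game m; the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))   # maxmin over pure actions
    upper = min(max(a, c), max(b, d))   # minmax over pure actions
    if upper - lower <= 1e-12:          # (numerically) a pure saddle point
        return lower
    # No saddle point: completely mixed equilibrium, equal to (ad - bc) / (a + d - b - c).
    return a - (a - b) * (a - c) / (a + d - b - c)


def shapley(f):
    """Shapley operator: Psi(f)(w) = val[g(., ., w) + E_rho(., ., w) f]."""
    return [val2x2([[G[w][i][j] + sum(p * fv for p, fv in zip(RHO[w][i][j], f))
                     for j in range(2)] for i in range(2)])
            for w in range(2)]


def n_stage_values(n):
    """Return [V_0, ..., V_n] with V_0 = 0 and V_k = Psi(V_{k-1})."""
    values = [[0.0, 0.0]]
    for _ in range(n):
        values.append(shapley(values[-1]))
    return values


V = n_stage_values(100)
v_100 = [x / 100 for x in V[100]]   # normalized value v_n = V_n / n
print(V[1], V[2], v_100)
```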

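Shapley operator
This can be summarized by
$$V_n = \Psi(V_{n-1}) = \Psi^n(0), \qquad W_\lambda = \Psi\big((1-\lambda) W_\lambda\big), \qquad w_\lambda = \lambda \Psi\Big(\frac{1-\lambda}{\lambda}\, w_\lambda\Big) = \lim_{m \to \infty} \Big[\lambda \Psi\Big(\frac{1-\lambda}{\lambda}\,\cdot\Big)\Big]^m(0),$$
where the last limit exists because $f \mapsto \lambda \Psi\big(\frac{1-\lambda}{\lambda} f\big)$ is a $(1-\lambda)$-contraction, and where the operator $\Psi$ is defined by
$$\Psi(f)(\omega) = \sup_{x \in X} \inf_{y \in Y} \Big[g(x, y, \omega) + \mathbb{E}_{\rho(x, y, \omega)}\big(f(\cdot)\big)\Big] = \inf_{y \in Y} \sup_{x \in X} \Big[g(x, y, \omega) + \mathbb{E}_{\rho(x, y, \omega)}\big(f(\cdot)\big)\Big].$$
$\Psi$ is nonexpansive for the sup norm: $\|\Psi(f) - \Psi(f')\|_\infty \le \|f - f'\|_\infty$.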
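Both properties on this slide can be checked numerically on a hypothetical toy game: the sketch samples random pairs $(f, f')$ to test $\|\Psi(f) - \Psi(f')\|_\infty \le \|f - f'\|_\infty$, and computes $w_\lambda$ by iterating $f \mapsto \lambda \Psi\big(\frac{1-\lambda}{\lambda} f\big)$ from 0. A random test is evidence, not a proof; all function names are hypothetical.

```python
import random

# Hypothetical toy game (not from the slides): two states, two actions each.
# State 0: matching-pennies payoffs; the pair (i, j) = (0, 1) moves to state 1.
# State 1: absorbing, payoff 0.5 at every stage.
G = [[[1.0, -1.0], [-1.0, 1.0]],
     [[0.5, 0.5], [0.5, 0.5]]]
RHO = [[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]],
       [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]]


def val2x2(m):
    """Value of the 2x2 zero-sum matrix game m; the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if upper - lower <= 1e-12:          # pure saddle point
        return lower
    return a - (a - b) * (a - c) / (a + d - b - c)   # completely mixed equilibrium


def shapley(f):
    """Psi(f)(w) = val[g(., ., w) + E_rho f]."""
    return [val2x2([[G[w][i][j] + sum(p * fv for p, fv in zip(RHO[w][i][j], f))
                     for j in range(2)] for i in range(2)])
            for w in range(2)]


def sup_dist(f, h):
    return max(abs(a - b) for a, b in zip(f, h))


def worst_nonexpansiveness_gap(trials=2000, seed=1):
    """Max over random pairs of ||Psi f - Psi h|| - ||f - h||; nonexpansive means <= 0."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        f = [rng.uniform(-5.0, 5.0) for _ in range(2)]
        h = [rng.uniform(-5.0, 5.0) for _ in range(2)]
        worst = max(worst, sup_dist(shapley(f), shapley(h)) - sup_dist(f, h))
    return worst


def normalized_discounted_value(lam, iters=3000):
    """Iterate f -> lam * Psi((1 - lam)/lam * f), a (1 - lam)-contraction, from f = 0."""
    w = [0.0, 0.0]
    for _ in range(iters):
        w = [lam * x for x in shapley([(1 - lam) / lam * y for y in w])]
    return w


print(worst_nonexpansiveness_gap())       # expected to be <= 0 up to rounding
print(normalized_discounted_value(0.1))   # w_lambda for lambda = 0.1
```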

Framework
Shapley proved this in the finite case, but it holds in a much wider framework, for example when:
- $\Omega$ is finite, $X$ and $Y$ are compact, and $g$ and $\rho$ are continuous;
- $\Omega$, $X$ and $Y$ are compact metric spaces, and $g$ and $\rho$ are continuous.
See Maitra–Parthasarathy, Nowak, and Mertens–Sorin–Zamir for more general frameworks.

2. Exact games with varying stage duration

Definition
The definition is due to Neyman (2013). Instead of playing at times $1, 2, \dots, n, \dots$, the players play at times $t_1, t_2, \dots, t_n, \dots$. The intensity of both payoff and transition at time $t_k$ is $h_k = t_{k+1} - t_k \in\, ]0, 1]$, that is
$$g_h = h\, g, \qquad \rho_h = (1-h)\,\mathrm{Id} + h\,\rho:$$
with probability $1-h$ the state is kept, and with probability $h$ it moves according to $\rho$. Since the value commutes with adding a constant and with multiplying by $h > 0$, the Shapley operator of the "exact game" with duration $h$ is
$$\Psi_h = (1-h)\,\mathrm{Id} + h\,\Psi.$$
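The formula $\Psi_h = (1-h)\,\mathrm{Id} + h\,\Psi$ can be checked by building the exact game explicitly, with payoff $h g$ and transition $(1-h)\,\delta_\omega + h\,\rho$, and comparing its Shapley operator with the formula. The sketch below does this on a hypothetical two-state toy game; `shapley_h` and `shapley_exact_game` are illustrative names.

```python
import random

# Hypothetical toy game (not from the slides): two states, two actions each.
# State 0: matching-pennies payoffs; the pair (i, j) = (0, 1) moves to state 1.
# State 1: absorbing, payoff 0.5 at every stage.
G = [[[1.0, -1.0], [-1.0, 1.0]],
     [[0.5, 0.5], [0.5, 0.5]]]
RHO = [[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]],
       [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]]


def val2x2(m):
    """Value of the 2x2 zero-sum matrix game m; the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if upper - lower <= 1e-12:          # pure saddle point
        return lower
    return a - (a - b) * (a - c) / (a + d - b - c)   # completely mixed equilibrium


def shapley(f):
    """Psi(f)(w) = val[g(., ., w) + E_rho f]."""
    return [val2x2([[G[w][i][j] + sum(p * fv for p, fv in zip(RHO[w][i][j], f))
                     for j in range(2)] for i in range(2)])
            for w in range(2)]


def shapley_h(f, h):
    """Psi_h = (1 - h) Id + h Psi, as on the slide."""
    return [(1 - h) * a + h * b for a, b in zip(f, shapley(f))]


def shapley_exact_game(f, h):
    """Shapley operator computed directly from g_h = h g and rho_h = (1 - h) Id + h rho."""
    out = []
    for w in range(2):
        m = [[h * G[w][i][j]                                         # payoff intensity h
              + (1 - h) * f[w]                                       # state kept w.p. 1 - h
              + h * sum(p * fv for p, fv in zip(RHO[w][i][j], f))   # rho applied w.p. h
              for j in range(2)] for i in range(2)]
        out.append(val2x2(m))
    return out


f, h = [0.4, -1.3], 0.25
print(shapley_h(f, h), shapley_exact_game(f, h))   # should agree up to rounding
```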

Some natural questions
1. What happens, for a fixed horizon $t$ or discount factor $\lambda$, when the duration $h_i$ of each stage vanishes? Does the value converge, and to which limit?
2. What happens, for a fixed sequence of stage durations $h_i$, when the horizon goes to infinity or the discount factor goes to 0? Does the normalized value converge, and to which limit?
3. What happens when both $\lambda$ (or $1/n$) and $h_i$ go to 0?
4. What can be said of optimal strategies in games with varying duration?
Neyman answers questions 1, 3 and 4 for finite discounted games. Here we use the operator approach to give a general answer to questions 1, 2 and 3.

Finite horizon
Game with finite horizon and varying duration
Finite horizon $t$, finite sequence of stage durations $h_1, \dots, h_n$ with $\sum_i h_i = t$. The value $V$ of such a game satisfies $V = z_n$, starting from the zero function, with
$$z_{i+1} = \Psi_{h_i}(z_i) = (1-h_i)\, z_i + h_i\, \Psi(z_i), \qquad \text{i.e.} \qquad \frac{z_{i+1} - z_i}{h_i} = -(\mathrm{Id} - \Psi)(z_i).$$
This is the Euler scheme associated with the evolution equation $f' = -(\mathrm{Id} - \Psi)(f)$. One can therefore use general results on such schemes, valid for any nonexpansive operator defined on a Banach space.
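The scheme translates directly into code: each stage of duration $h_i$ applies $\Psi_{h_i}$ to the current vector, starting from 0 and following the slide's indexing $z_{i+1} = \Psi_{h_i}(z_i)$. A sketch on a hypothetical two-state toy game in which state 1 is absorbing with payoff 0.5, so that its value on horizon $t$ is exactly $0.5\,t$ for every partition; `finite_horizon_value` is an illustrative name, and durations are required to lie in $]0, 1]$ so that $\rho_h$ is a probability.

```python
# Hypothetical toy game (not from the slides): two states, two actions each.
# State 0: matching-pennies payoffs; the pair (i, j) = (0, 1) moves to state 1.
# State 1: absorbing, payoff 0.5 at every stage.
G = [[[1.0, -1.0], [-1.0, 1.0]],
     [[0.5, 0.5], [0.5, 0.5]]]
RHO = [[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]],
       [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]]


def val2x2(m):
    """Value of the 2x2 zero-sum matrix game m; the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if upper - lower <= 1e-12:          # pure saddle point
        return lower
    return a - (a - b) * (a - c) / (a + d - b - c)   # completely mixed equilibrium


def shapley(f):
    """Psi(f)(w) = val[g(., ., w) + E_rho f]."""
    return [val2x2([[G[w][i][j] + sum(p * fv for p, fv in zip(RHO[w][i][j], f))
                     for j in range(2)] for i in range(2)])
            for w in range(2)]


def finite_horizon_value(durations):
    """Euler scheme z_{i+1} = (1 - h_i) z_i + h_i Psi(z_i) from z = 0; returns the last z."""
    z = [0.0, 0.0]
    for h in durations:
        if not 0.0 < h <= 1.0:
            raise ValueError("stage durations must lie in (0, 1]")
        z = [(1 - h) * a + h * b for a, b in zip(z, shapley(z))]
    return z


# Several partitions of the same horizon t = 2.
for durations in ([1.0, 1.0], [0.5] * 4, [0.25, 0.75, 0.5, 0.5], [0.01] * 200):
    print(len(durations), finite_horizon_value(durations))
```

With all durations equal to 1 the scheme reduces to $\Psi^n(0) = V_n$, the value of the original game.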

Eulerian schemes in Banach spaces
For a general nonexpansive $\Psi$, let $f_t(z_0)$ denote the solution at time $t$ of $f' = -(\mathrm{Id} - \Psi)(f)$ with initial condition $z_0$.
Proposition (Miyadera–Oharu '70, Crandall–Liggett '71)
$$\big\|f_{nh}(z_0) - \Psi_h^n(z_0)\big\| \le \|z_0 - \Psi(z_0)\|\, h\sqrt{n}.$$
Proposition (V. '10)
If $z_{i+1} = (1-h_i)\, z_i + h_i\, \Psi(z_i)$, then
$$\big\|f_t(z_0) - z_n\big\| \le \|z_0 - \Psi(z_0)\| \sqrt{\sum_{i=1}^{n} h_i^2}, \qquad \text{with } t = \sum_{i=1}^{n} h_i.$$
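The second proposition can be probed numerically without knowing $f_t$ in closed form: a very fine uniform mesh is itself within its own bound of the exact flow, so the distance between a coarse Euler run and a fine one can be at most the sum of the two bounds. The sketch checks this on a hypothetical toy game with an irregular random mesh; `euler` and `proposition_bound` are illustrative names, and a uniform mesh ($h_i = h$) recovers the factor $h\sqrt{n}$ of the first proposition.

```python
import math
import random

# Hypothetical toy game (not from the slides): two states, two actions each.
# State 0: matching-pennies payoffs; the pair (i, j) = (0, 1) moves to state 1.
# State 1: absorbing, payoff 0.5 at every stage.
G = [[[1.0, -1.0], [-1.0, 1.0]],
     [[0.5, 0.5], [0.5, 0.5]]]
RHO = [[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]],
       [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]]


def val2x2(m):
    """Value of the 2x2 zero-sum matrix game m; the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if upper - lower <= 1e-12:          # pure saddle point
        return lower
    return a - (a - b) * (a - c) / (a + d - b - c)   # completely mixed equilibrium


def shapley(f):
    """Psi(f)(w) = val[g(., ., w) + E_rho f]."""
    return [val2x2([[G[w][i][j] + sum(p * fv for p, fv in zip(RHO[w][i][j], f))
                     for j in range(2)] for i in range(2)])
            for w in range(2)]


def euler(durations, z0):
    """z_{i+1} = (1 - h_i) z_i + h_i Psi(z_i), applied along the given durations."""
    z = list(z0)
    for h in durations:
        z = [(1 - h) * a + h * b for a, b in zip(z, shapley(z))]
    return z


def proposition_bound(durations, z0):
    """||z0 - Psi(z0)||_inf * sqrt(sum_i h_i^2)."""
    gap = max(abs(a - b) for a, b in zip(z0, shapley(z0)))
    return gap * math.sqrt(sum(h * h for h in durations))


rng = random.Random(0)
z0 = [0.3, -0.2]                                       # arbitrary starting point
coarse = [rng.uniform(0.05, 1.0) for _ in range(6)]    # irregular mesh with h_i <= 1
t = sum(coarse)
fine = [t / 20000] * 20000                             # stand-in for the exact flow f_t(z0)
gap_observed = max(abs(a - b) for a, b in zip(euler(coarse, z0), euler(fine, z0)))
allowed = proposition_bound(coarse, z0) + proposition_bound(fine, z0)
print(f"t = {t:.3f}: observed {gap_observed:.3e} <= allowed {allowed:.3e}")
```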

Result with $t$ fixed
Let $h = \max_i h_i$ and $t = \sum_i h_i$. Then
$$\|V - f(t)\| \le K\sqrt{h\,t},$$
where $f(t) = f_t(0)$ is the solution of $f' = -(\mathrm{Id} - \Psi)(f)$ with $f(0) = 0$. This follows from the previous proposition with $z_0 = 0$, since $\sum_i h_i^2 \le h \sum_i h_i = h\,t$; one can take $K = \|\Psi(0)\|_\infty \le 1$. Hence, as the mesh $h$ goes to 0, the value of the game converges to $f(t)$, which can be interpreted as the value of a game played in continuous time (Neyman '13).
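A mesh-refinement study illustrates the rate $\sqrt{h\,t}$. In this sketch (hypothetical toy game, illustrative names) a run with $2^{14}$ uniform stages serves as a proxy for $f(t)$, so each observed error is compared with $K\sqrt{h\,t}$ plus the reference's own bound; here $K = \|\Psi(0)\|_\infty = 0.5$.

```python
import math

# Hypothetical toy game (not from the slides): two states, two actions each.
# State 0: matching-pennies payoffs; the pair (i, j) = (0, 1) moves to state 1.
# State 1: absorbing, payoff 0.5 at every stage.
G = [[[1.0, -1.0], [-1.0, 1.0]],
     [[0.5, 0.5], [0.5, 0.5]]]
RHO = [[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]],
       [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]]


def val2x2(m):
    """Value of the 2x2 zero-sum matrix game m; the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if upper - lower <= 1e-12:          # pure saddle point
        return lower
    return a - (a - b) * (a - c) / (a + d - b - c)   # completely mixed equilibrium


def shapley(f):
    """Psi(f)(w) = val[g(., ., w) + E_rho f]."""
    return [val2x2([[G[w][i][j] + sum(p * fv for p, fv in zip(RHO[w][i][j], f))
                     for j in range(2)] for i in range(2)])
            for w in range(2)]


def value_uniform(t, n):
    """Value of the exact game on horizon t with n stages of equal duration t / n."""
    h = t / n
    z = [0.0, 0.0]
    for _ in range(n):
        z = [(1 - h) * a + h * b for a, b in zip(z, shapley(z))]
    return z


t = 2.0
K = max(abs(x) for x in shapley([0.0, 0.0]))   # K = ||Psi(0)||_inf <= 1
n_ref = 2 ** 14
reference = value_uniform(t, n_ref)             # proxy for f(t)
rows = []
for n in (2, 4, 8, 16, 32, 64):
    h = t / n
    err = max(abs(a - b) for a, b in zip(value_uniform(t, n), reference))
    rows.append((h, err, K * math.sqrt(h * t)))
    print(f"h = {h:.4f}   |V - V_ref| = {err:.2e}   K*sqrt(h t) = {K * math.sqrt(h * t):.2e}")
```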

Asymptotic results
For any durations $h_i \le 1$ (so that $h\,t \le t$),
$$\Big\|\frac{V}{t} - \frac{f(t)}{t}\Big\| \le \frac{K}{\sqrt{t}}.$$
Hence all the repeated games with varying stage duration have the same (normalized) asymptotic behavior. In particular, the normalized value in continuous time $f(t)/t$ and the normalized value $v_n$ of the original game (all $h_i = 1$) have the same asymptotic behavior.
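The asymptotic statement compares two meshes on the same horizon. The sketch (hypothetical toy game, illustrative names) contrasts the original game ($h_i = 1$, so $V = V_t$) with a finer mesh $h_i = 0.05$ for growing $t$; since both are within $K\sqrt{h\,t}$ of $f(t)$, their normalized gap is at most $K(1 + \sqrt{0.05})/\sqrt{t}$.

```python
import math

# Hypothetical toy game (not from the slides): two states, two actions each.
# State 0: matching-pennies payoffs; the pair (i, j) = (0, 1) moves to state 1.
# State 1: absorbing, payoff 0.5 at every stage.
G = [[[1.0, -1.0], [-1.0, 1.0]],
     [[0.5, 0.5], [0.5, 0.5]]]
RHO = [[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]],
       [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]]


def val2x2(m):
    """Value of the 2x2 zero-sum matrix game m; the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if upper - lower <= 1e-12:          # pure saddle point
        return lower
    return a - (a - b) * (a - c) / (a + d - b - c)   # completely mixed equilibrium


def shapley(f):
    """Psi(f)(w) = val[g(., ., w) + E_rho f]."""
    return [val2x2([[G[w][i][j] + sum(p * fv for p, fv in zip(RHO[w][i][j], f))
                     for j in range(2)] for i in range(2)])
            for w in range(2)]


def value_uniform(t, n):
    """Value of the exact game on horizon t with n stages of equal duration t / n."""
    h = t / n
    z = [0.0, 0.0]
    for _ in range(n):
        z = [(1 - h) * a + h * b for a, b in zip(z, shapley(z))]
    return z


K = max(abs(x) for x in shapley([0.0, 0.0]))   # ||Psi(0)||_inf = 0.5 here
results = []
for t in (4, 16, 64, 256):
    original = value_uniform(t, t)       # h = 1: the original game, V = V_t
    finer = value_uniform(t, 20 * t)     # h = 0.05
    gap = max(abs(a - b) for a, b in zip(original, finer)) / t
    bound = K * (1 + math.sqrt(0.05)) / math.sqrt(t)
    results.append((t, gap, bound))
    print(f"t = {t:3d}: |V/t - V_fine/t| = {gap:.2e} <= {bound:.2e}")
```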

Discounted evaluation
Game with discount factor and varying duration
The discount factor $\lambda$ is the weight of the payoff on $[0, 1]$ relative to $[0, +\infty)$. Consider an infinite sequence of stage durations $h_1, \dots, h_n, \dots$. When the duration $h$ is constant, the normalized value satisfies
$$w^h_\lambda = \lambda\, \Psi_h\Big(\frac{1 - \lambda h}{\lambda}\, w^h_\lambda\Big).$$
In general, the normalized value is
$$w = \prod_{i=1}^{+\infty} D^{h_i}_\lambda(0) := \lim_{n \to \infty} D^{h_1}_\lambda \circ \cdots \circ D^{h_n}_\lambda(0), \qquad \text{with} \qquad D^h_\lambda(f) = \lambda\, \Psi_h\Big(\frac{1 - \lambda h}{\lambda}\, f\Big).$$
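The infinite composition can be approximated by truncation. Because each $D^h_\lambda$ is $(1-\lambda h)$-Lipschitz and normalized values lie in $[-1, 1]$, stopping after $N$ stages changes the result by at most $\prod_{i \le N}(1 - \lambda h_i)$. A sketch on a hypothetical toy game with an arbitrary periodic duration sequence; `D` and `discounted_value` are illustrative names.

```python
import math

# Hypothetical toy game (not from the slides): two states, two actions each.
# State 0: matching-pennies payoffs; the pair (i, j) = (0, 1) moves to state 1.
# State 1: absorbing, payoff 0.5 at every stage.
G = [[[1.0, -1.0], [-1.0, 1.0]],
     [[0.5, 0.5], [0.5, 0.5]]]
RHO = [[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]],
       [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]]


def val2x2(m):
    """Value of the 2x2 zero-sum matrix game m; the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if upper - lower <= 1e-12:          # pure saddle point
        return lower
    return a - (a - b) * (a - c) / (a + d - b - c)   # completely mixed equilibrium


def shapley(f):
    """Psi(f)(w) = val[g(., ., w) + E_rho f]."""
    return [val2x2([[G[w][i][j] + sum(p * fv for p, fv in zip(RHO[w][i][j], f))
                     for j in range(2)] for i in range(2)])
            for w in range(2)]


def D(f, lam, h):
    """D^h_lam(f) = lam * Psi_h((1 - lam*h)/lam * f), with Psi_h = (1 - h) Id + h Psi."""
    u = [(1 - lam * h) / lam * a for a in f]
    return [lam * ((1 - h) * a + h * b) for a, b in zip(u, shapley(u))]


def discounted_value(lam, durations):
    """Truncation D^{h_1} o ... o D^{h_N}(0) of the infinite composition.

    The truncation error is at most prod_i (1 - lam*h_i).
    """
    w = [0.0, 0.0]
    for h in reversed(durations):   # the innermost map, D^{h_N}, is applied first
        w = D(w, lam, h)
    return w


lam = 0.2
durations = [0.3, 1.0, 0.05] * 400   # hypothetical periodic sequence of stage durations
w = discounted_value(lam, durations)
tail = math.prod(1 - lam * h for h in durations)
print(w, tail)
```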

Result with $\lambda$ fixed and vanishing duration
For a uniform duration $h$, $w^h_\lambda = w_\mu$ with $\mu = \frac{\lambda}{1 + \lambda - \lambda h}$: expanding $\Psi_h$ in the fixed-point equation for $w^h_\lambda$ gives $w^h_\lambda = \mu\, \Psi\big(\frac{1-\mu}{\mu}\, w^h_\lambda\big)$, which characterizes $w_\mu$.
For any $\lambda$ and any durations $h_i \le h$, the value $w$ of the $\lambda$-discounted game with stage durations $h_i$ satisfies
$$\|w - \hat{w}_\lambda\| \le K h, \qquad \text{with } \hat{w}_\lambda := w_{\frac{\lambda}{1+\lambda}}.$$
Hence, as the mesh $h$ goes to 0, the value of the game converges to $w_{\lambda/(1+\lambda)}$. This was already known when the game is finite (Neyman 2013). $\hat{w}_\lambda$ can be interpreted as the value of a game played in continuous time (Neyman '13).
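The identity $w^h_\lambda = w_\mu$ is easy to confirm numerically: the sketch computes both fixed points by iteration on a hypothetical toy game for several durations $h$, and also the limit $\hat w_\lambda = w_{\lambda/(1+\lambda)}$. Function names are illustrative; the iteration counts are chosen so that the contraction factors $1 - \lambda h$ and $1 - \mu$ have driven the error far below the tolerance.

```python
# Hypothetical toy game (not from the slides): two states, two actions each.
# State 0: matching-pennies payoffs; the pair (i, j) = (0, 1) moves to state 1.
# State 1: absorbing, payoff 0.5 at every stage.
G = [[[1.0, -1.0], [-1.0, 1.0]],
     [[0.5, 0.5], [0.5, 0.5]]]
RHO = [[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]],
       [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]]


def val2x2(m):
    """Value of the 2x2 zero-sum matrix game m; the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if upper - lower <= 1e-12:          # pure saddle point
        return lower
    return a - (a - b) * (a - c) / (a + d - b - c)   # completely mixed equilibrium


def shapley(f):
    """Psi(f)(w) = val[g(., ., w) + E_rho f]."""
    return [val2x2([[G[w][i][j] + sum(p * fv for p, fv in zip(RHO[w][i][j], f))
                     for j in range(2)] for i in range(2)])
            for w in range(2)]


def w_original(mu, iters=4000):
    """w_mu: fixed point of f -> mu * Psi((1 - mu)/mu * f), iterated from 0."""
    w = [0.0, 0.0]
    for _ in range(iters):
        w = [mu * b for b in shapley([(1 - mu) / mu * a for a in w])]
    return w


def w_uniform(lam, h, iters=4000):
    """w^h_lam: fixed point of D^h_lam, a (1 - lam*h)-contraction, iterated from 0."""
    w = [0.0, 0.0]
    for _ in range(iters):
        u = [(1 - lam * h) / lam * a for a in w]
        w = [lam * ((1 - h) * a + h * b) for a, b in zip(u, shapley(u))]
    return w


lam = 0.3
gaps = {}
for h in (1.0, 0.5, 0.2, 0.1):
    mu = lam / (1 + lam - lam * h)
    gaps[h] = max(abs(a - b) for a, b in zip(w_uniform(lam, h), w_original(mu)))
w_hat = w_original(lam / (1 + lam))   # the h -> 0 limit, hat{w}_lam
print(gaps, w_hat)
```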

Asymptotic results
Assumption: there exist a nondecreasing $k : \,]0, 1] \to \mathbb{R}_+$ and $\ell : [0, +\infty) \to \mathbb{R}_+$ with $k(\lambda) = o(\sqrt{\lambda})$ as $\lambda$ goes to 0 and
$$\|D^1_\lambda(z) - D^1_\mu(z)\| \le k(|\lambda - \mu|)\, \ell(\|z\|) \qquad \text{for all } (\lambda, \mu) \in\, ]0, 1]^2 \text{ and all } z \text{ in the Banach space } Z.$$
This is always true for Shapley operators of games with bounded payoff.
Then, for any $\lambda$ and any durations $h_i$, the value $w$ of the $\lambda$-discounted game with stage durations $h_i$ satisfies
$$\|w - w_\lambda\| \le K\lambda.$$
Hence all the repeated games with varying stage duration have the same (normalized) asymptotic behavior as $\lambda$ goes to 0. In particular, the normalized value in continuous time $\hat{w}_\lambda$ and the normalized value $w_\lambda$ of the original game have the same asymptotic behavior.
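One way to see why the assumption holds for bounded payoffs: $D^1_\lambda(z)(\omega) = \lambda\Psi\big(\frac{1-\lambda}{\lambda} z\big)(\omega)$ is the value of the one-shot game with payoff $\lambda g + (1-\lambda)\,\mathbb{E}_\rho z$, and a sup-inf value is 1-Lipschitz in the payoff, so $\|D^1_\lambda(z) - D^1_\mu(z)\| \le |\lambda - \mu|\,(\|g\|_\infty + \|z\|)$. With $|g| \le 1$ this gives $k(x) = x$, which is $o(\sqrt{x})$, and $\ell(r) = 1 + r$. The sketch tests this inequality at random points on a hypothetical toy game; it is a sanity check of the derivation, not part of the original argument, and the function names are illustrative.

```python
import random

# Hypothetical toy game (not from the slides): two states, two actions each.
# State 0: matching-pennies payoffs; the pair (i, j) = (0, 1) moves to state 1.
# State 1: absorbing, payoff 0.5 at every stage.
G = [[[1.0, -1.0], [-1.0, 1.0]],
     [[0.5, 0.5], [0.5, 0.5]]]
RHO = [[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]],
       [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]]


def val2x2(m):
    """Value of the 2x2 zero-sum matrix game m; the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if upper - lower <= 1e-12:          # pure saddle point
        return lower
    return a - (a - b) * (a - c) / (a + d - b - c)   # completely mixed equilibrium


def shapley(f):
    """Psi(f)(w) = val[g(., ., w) + E_rho f]."""
    return [val2x2([[G[w][i][j] + sum(p * fv for p, fv in zip(RHO[w][i][j], f))
                     for j in range(2)] for i in range(2)])
            for w in range(2)]


def D1(z, lam):
    """D^1_lam(z) = lam * Psi((1 - lam)/lam * z)."""
    return [lam * b for b in shapley([(1 - lam) / lam * a for a in z])]


def worst_violation(trials=3000, seed=2):
    """Max of ||D1(z, lam) - D1(z, mu)|| - |lam - mu| * (1 + ||z||) over random samples."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        lam, mu = rng.uniform(0.01, 1.0), rng.uniform(0.01, 1.0)
        z = [rng.uniform(-3.0, 3.0) for _ in range(2)]
        lhs = max(abs(a - b) for a, b in zip(D1(z, lam), D1(z, mu)))
        rhs = abs(lam - mu) * (1 + max(abs(a) for a in z))   # k(x) = x, l(r) = 1 + r
        worst = max(worst, lhs - rhs)
    return worst


print(worst_violation())   # expected to be <= 0 up to rounding
```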
