
Multigrid methods for two player zero-sum stochastic games



  1. Multigrid methods for two player zero-sum stochastic games. Sylvie Detournay, INRIA Saclay and CMAP, École Polytechnique. PhD thesis defense, September 25, 2012. Sections: Discounted Stochastic Games; Stochastic Games with mean payoff.

  2. Outline
     Zero-sum two player stochastic game with discounted payoff
       Dynamic programming equations
       Policy iteration and multigrid: AMGπ
       Numerical results
     Zero-sum two player stochastic game with mean payoff
       Unichain case
         Dynamic programming equations
         Policy iteration and multigrid: AMGπ
         Numerical results
       Multichain case
         Dynamic programming equations
         Policy iteration for multichain
         Numerical results
     Conclusions

  3. Dynamic programming equation of zero-sum two-player stochastic games
     $$ v(x) = \max_{a \in \mathcal{A}(x)} \min_{b \in \mathcal{B}(x,a)} \Big[ \sum_{y \in \mathcal{X}} \gamma\, P(y \mid x,a,b)\, v(y) + r(x,a,b) \Big] \quad \forall x \in \mathcal{X} \tag{DP} $$
     $\mathcal{X}$: state space
     $v(x)$: the value of the game starting at $x \in \mathcal{X}$
     $a$, $b$: actions of the 1st and 2nd players, MAX and MIN
     $r(x,a,b)$: reward paid by MIN to MAX
     $P(y \mid x,a,b)$: transition probability from $x$ to $y$ given the actions $a$, $b$
     $\gamma < 1$: discount factor
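As a rough illustration of (DP), the sketch below applies the right-hand side of the equation to a value vector and iterates it to a fixed point. The two-state game (`P`, `r`, `GAMMA`) and the function names are hypothetical and chosen only for this example. Plain fixed-point (value) iteration is used here for simplicity; it is not the policy iteration method that the slides study.

```python
# Hypothetical toy game (not from the slides): 2 states, integer-labelled actions.
# P[x][a][b] lists the transition probabilities P(y | x, a, b) over next states y,
# r[x][a][b] is the reward paid by MIN to MAX.
GAMMA = 0.9  # discount factor, assumed value

P = {
    0: {0: {0: [1.0, 0.0], 1: [0.5, 0.5]},
        1: {0: [0.0, 1.0]}},
    1: {0: {0: [0.2, 0.8], 1: [1.0, 0.0]}},
}
r = {
    0: {0: {0: 1.0, 1: 3.0},
        1: {0: 5.0}},
    1: {0: {0: 0.0, 1: -1.0}},
}


def dp_operator(v):
    """Return F(v), with F(v; x) = max_a min_b [gamma * sum_y P(y|x,a,b) v(y) + r(x,a,b)]."""
    return [
        max(
            min(
                GAMMA * sum(p * vy for p, vy in zip(P[x][a][b], v)) + r[x][a][b]
                for b in P[x][a]
            )
            for a in P[x]
        )
        for x in sorted(P)
    ]


def value_iteration(tol=1e-10, max_iter=10_000):
    """Iterate v <- F(v); this converges because F is a gamma-contraction in the sup norm."""
    v = [0.0] * len(P)
    for _ in range(max_iter):
        w = dp_operator(v)
        if max(abs(wi - vi) for wi, vi in zip(w, v)) < tol:
            return w
        v = w
    return v


v_star = value_iteration()
# Residual of the fixed-point equation v = F(v).
residual = max(abs(f - v) for f, v in zip(dp_operator(v_star), v_star))
```

The number of sweeps needed grows as the discount factor approaches 1, which is one reason to prefer policy iteration for such problems.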

  4. Value of the game starting in $x$
     $$ v(x) = \max_{(a_k)_{k \ge 0}} \min_{(b_k)_{k \ge 0}} \mathbb{E}\Big[ \sum_{k=0}^{\infty} \gamma^k\, r(X_k, a_k, b_k) \Big] $$
     where
     $$ a_k = a_k(X_k, b_{k-1}, a_{k-1}, X_{k-1}, \dots), \qquad b_k = b_k(X_k, a_k, \dots) $$
     are strategies and the dynamics of the state process $X_k$ satisfies
     $$ P(X_{k+1} = y \mid X_k = x,\, a_k = a,\, b_k = b) = P(y \mid x,a,b). $$

  5. Deterministic zero-sum two-player game
     [Figure: game graph with weighted edges.]
     Circles: MAX plays. Squares: MIN plays.
     Weight on the edges: payment made by MIN to MAX.

  6-9. [Figure: same game graph.] If MAX initially moves to 2′, he eventually loses 5 per turn.

  10-13. [Figure: same game graph.] But if MAX initially moves to 1′, he eventually loses only (1 + 0 + 2 + 3) / 2 = 3 per turn.

  14. Feedback strategies or policies
     $$ v(x) = \max_{(a_k)_{k \ge 0}} \min_{(b_k)_{k \ge 0}} \mathbb{E}\Big[ \sum_{k=0}^{\infty} \gamma^k\, r(X_k, a_k, b_k) \Big] $$
     For $\alpha : x \mapsto \alpha(x) \in \mathcal{A}(x)$ and $\beta : (x,a) \mapsto \beta(x,a) \in \mathcal{B}(x,a)$, the strategies
     $$ a_k = \alpha(X_k), \qquad b_k = \beta(X_k, a_k) $$
     are such that $X_k$ is a Markov chain with transition matrix $P^{\alpha,\beta}$, where
     $$ P^{\alpha,\beta}_{xy} := P(y \mid x, \alpha(x), \beta(x, \alpha(x))), \qquad x, y \in \mathcal{X}. $$
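A minimal sketch of this construction: given feedback policies α and β, it builds the transition matrix of the induced Markov chain and computes the discounted value of that pair of policies from the linear system (I − γP)v = r. The toy game data, the policies and all function names are assumptions made for illustration. A small dense Gaussian elimination stands in for the linear solver; the slides use algebraic multigrid (AMG) for the linear systems instead.

```python
# Hypothetical toy game (not from the slides): P[x][a][b] is the row P(. | x, a, b),
# r[x][a][b] is the reward paid by MIN to MAX.
GAMMA = 0.9  # discount factor, assumed value
P = {
    0: {0: {0: [1.0, 0.0], 1: [0.5, 0.5]},
        1: {0: [0.0, 1.0]}},
    1: {0: {0: [0.2, 0.8], 1: [1.0, 0.0]}},
}
r = {
    0: {0: {0: 1.0, 1: 3.0},
        1: {0: 5.0}},
    1: {0: {0: 0.0, 1: -1.0}},
}


def chain_of(alpha, beta):
    """Transition matrix and reward vector of the chain induced by policies alpha, beta."""
    P_ab, r_ab = [], []
    for x in sorted(P):
        a = alpha[x]          # a_k = alpha(X_k)
        b = beta[(x, a)]      # b_k = beta(X_k, a_k)
        P_ab.append(list(P[x][a][b]))
        r_ab.append(r[x][a][b])
    return P_ab, r_ab


def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def policy_value(alpha, beta):
    """Discounted value of the pair (alpha, beta): solution of v = gamma * P_ab v + r_ab."""
    P_ab, r_ab = chain_of(alpha, beta)
    n = len(P_ab)
    A = [[(1.0 if i == j else 0.0) - GAMMA * P_ab[i][j] for j in range(n)]
         for i in range(n)]
    return solve(A, r_ab)


alpha = {0: 0, 1: 0}                          # MAX: one action per state
beta = {(0, 0): 1, (0, 1): 0, (1, 0): 0}      # MIN: one action per (state, MAX action)
P_ab, r_ab = chain_of(alpha, beta)
v = policy_value(alpha, beta)
```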

  15. Dynamic programming operator and optimal policy
     $$ v(x) = \max_{a \in \mathcal{A}(x)} \min_{b \in \mathcal{B}(x,a)} \underbrace{\Big[ \sum_{y \in \mathcal{X}} \gamma\, P(y \mid x,a,b)\, v(y) + r(x,a,b) \Big]}_{F(v;(x,a),b)} =: F(v;x) $$
     $\alpha$: policy maximizing the (DP) equation for MAX
     $\beta$: policy minimizing $F(v;(x,a),b)$ for MIN
     The dynamic programming operator $F$ is monotone and additively sub-homogeneous ($F(\lambda + v) \le \lambda + F(v)$, $\lambda \ge 0$).
     Method to solve the (DP) equations: policy iteration algorithm [Howard, 60 (1-player game)], [Denardo, 67 (2-player game)].
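Both properties of $F$ can be checked directly from (DP). The short derivation below is a sketch that uses only $P(y \mid x,a,b) \ge 0$, $\sum_y P(y \mid x,a,b) = 1$ and $\gamma < 1$.

```latex
% Additive sub-homogeneity: adding a constant \lambda >= 0 to v.
\begin{align*}
F(\lambda + v; x)
 &= \max_{a}\min_{b}\Bigl[\gamma\sum_{y} P(y\mid x,a,b)\,\bigl(\lambda + v(y)\bigr) + r(x,a,b)\Bigr] \\
 &= \gamma\lambda + F(v; x) \;\le\; \lambda + F(v; x), \qquad \lambda \ge 0 .
\end{align*}
% Monotonicity: if v <= w componentwise, then P >= 0 gives
% F(v;(x,a),b) <= F(w;(x,a),b) for every (x,a,b);
% max and min preserve this order, hence F(v) <= F(w).
```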

  16. Dynamic programming equation of zero-sum two-player stochastic differential games
     PDE of Isaacs (or Hamilton-Jacobi-Bellman for one player):
     $$ -\lambda v(x) + H\Big(x, \frac{\partial v}{\partial x_i}, \frac{\partial^2 v}{\partial x_i \partial x_j}\Big) = 0, \qquad x \in \mathcal{X} \tag{I} $$
     where
     $$ H(x,p,K) = \max_{a \in \mathcal{A}(x)} \min_{b \in \mathcal{B}(x,a)} \Big[ p \cdot f(x,a,b) + \tfrac{1}{2} \operatorname{tr}\big( \sigma(x,a,b)\, \sigma^T(x,a,b)\, K \big) + r(x,a,b) \Big]. $$
     Discretization of (I) with monotone schemes yields (DP).
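To see how a monotone scheme can turn (I) into (DP), here is a sketch in one space dimension with scalar $\sigma$ and $\lambda > 0$, using upwind finite differences on a uniform grid and ignoring boundary conditions. The grid step $h$, the rates $q_\pm$ and the scaling parameter $\Delta$ are notation introduced for this illustration only; the slides do not say which scheme is used.

```latex
% Assumptions (not from the slides): dimension 1, uniform grid of step h,
% upwind differences for the drift, centered differences for the diffusion.
\begin{align*}
f\,\partial_x v &\approx f^{+}\,\frac{v(x+h)-v(x)}{h} + f^{-}\,\frac{v(x-h)-v(x)}{h},
  \qquad f^{\pm} = \max(\pm f, 0),\\
\tfrac12 \sigma^2\,\partial_{xx} v &\approx \frac{\sigma^2}{2h^2}\,\bigl[v(x+h) - 2v(x) + v(x-h)\bigr].
\end{align*}
% With q_\pm(x,a,b) = \sigma^2/(2h^2) + f^\pm/h >= 0, equation (I) becomes
\[
-\lambda v(x) + \max_{a}\min_{b}\Bigl[\sum_{\pm} q_{\pm}\,\bigl(v(x\pm h) - v(x)\bigr) + r(x,a,b)\Bigr] = 0 .
\]
% Multiply by \Delta > 0 chosen so that \Delta (q_+ + q_-) <= 1 for all (x,a,b),
% add v(x) to both sides, and divide by 1 + \lambda\Delta:
\[
v(x) = \max_{a}\min_{b}\Bigl[\gamma \sum_{y} P(y\mid x,a,b)\,v(y) + \gamma\Delta\, r(x,a,b)\Bigr],
\qquad \gamma = \frac{1}{1+\lambda\Delta} < 1,
\]
\[
P(x\pm h\mid x,a,b) = \Delta\,q_{\pm},\qquad P(x\mid x,a,b) = 1 - \Delta\,(q_+ + q_-) \ge 0 .
\]
```

Under these assumptions the discrete equation has the form (DP), with transition probabilities supported on the neighbouring grid points and a rescaled reward $\gamma\Delta\, r$.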

  17. Motivation
     Solve dynamic programming equations arising from the discretization of Isaacs equations or of other DP equations of diffusions (e.g. variational inequalities). Applications: pursuit-evasion games, finance, ...
     Solve large-scale zero-sum stochastic games (with discrete state space), for example problems arising from the web, or problems in the verification of programs in computer science, ...
     → Use the policy iteration algorithm, where the linear systems involved are solved using AMG.

  18. Policy Iteration (PI) algorithm for games
     $$ v(x) = \max_{a \in \mathcal{A}(x)} \underbrace{\min_{b \in \mathcal{B}(x,a)} \Big[ \sum_{y \in \mathcal{X}} \gamma\, P(y \mid x,a,b)\, v(y) + r(x,a,b) \Big]}_{F(v;x,a)} $$
     Start with $\alpha_0 : x \mapsto \alpha_0(x) \in \mathcal{A}(x)$, and apply successively:
     1. The value $v^{k+1}$ of policy $\alpha_k$ is the solution of $v^{k+1}(x) = F(v^{k+1}; x, \alpha_k(x))$ $\forall x \in \mathcal{X}$.
     2. Improve the policy: select $\alpha_{k+1}$ optimal for $v^{k+1}$: $\alpha_{k+1}(x) \in \operatorname{argmax}_{a \in \mathcal{A}(x)} F(v^{k+1}; x, a)$ $\forall x \in \mathcal{X}$.
     Until $\alpha_{k+1}(x) = \alpha_k(x)$ $\forall x \in \mathcal{X}$.
     Step 1 is itself solved by PI.

  19. Policy Iteration (PI) for 1-player games (Howard, 60)
     Start with $\beta_{k,0}$, and apply successively:
     1. The value $v^{k,s+1}$ of policy $\beta_{k,s}$ is the solution of $v^{k,s+1} = \gamma\, P^{\alpha_k,\beta_{k,s}}\, v^{k,s+1} + r^{\alpha_k,\beta_{k,s}}$, where $P^{\alpha,\beta}_{xy} := P(y \mid x, \alpha(x), \beta(x, \alpha(x)))$.
     2. Improve the policy: find $\beta_{k,s+1}$ optimal for $v^{k,s+1}$.
     Until $\beta_{k,s+1} = \beta_{k,s}$.
     [Diagram: external PI loop (PI ext) over $\alpha_0, \dots, \alpha_k$; for each $\alpha_k$, internal PI loop (PI int) over $\beta_{k,0}, \dots, \beta_{k,s}$.]
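The two nested loops can be sketched as follows on a small example. The toy game, the tie-breaking tolerance and the function names are assumptions made for this illustration. The linear systems are solved by a small dense Gaussian elimination, which stands in for the AMG solver used in the slides. In the inner loop, MIN's policy is stored as a map from states to actions, since α is fixed there.

```python
# Hypothetical toy game (not from the slides); states are 0..n-1.
GAMMA = 0.9
P = {
    0: {0: {0: [1.0, 0.0], 1: [0.5, 0.5]},
        1: {0: [0.0, 1.0]}},
    1: {0: {0: [0.2, 0.8], 1: [1.0, 0.0]}},
}
r = {
    0: {0: {0: 1.0, 1: 3.0},
        1: {0: 5.0}},
    1: {0: {0: 0.0, 1: -1.0}},
}
STATES = sorted(P)
TOL = 1e-12  # keep the current action on near-ties so the stopping test is reliable


def solve(A, rhs):
    """Gaussian elimination with partial pivoting (stand-in for AMG)."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def q_value(v, x, a, b):
    """F(v; (x, a), b) = gamma * sum_y P(y|x,a,b) v(y) + r(x,a,b)."""
    return GAMMA * sum(p * vy for p, vy in zip(P[x][a][b], v)) + r[x][a][b]


def F_xa(v, x, a):
    """F(v; x, a) = min_b F(v; (x, a), b)."""
    return min(q_value(v, x, a, b) for b in P[x][a])


def evaluate(alpha, beta):
    """Solve v = gamma * P^{alpha,beta} v + r^{alpha,beta}."""
    A = [[(1.0 if x == y else 0.0) - GAMMA * P[x][alpha[x]][beta[x]][y]
          for y in STATES] for x in STATES]
    return solve(A, [r[x][alpha[x]][beta[x]] for x in STATES])


def inner_pi(alpha):
    """Internal loop: PI for MIN's 1-player game with alpha fixed."""
    beta = {x: min(P[x][alpha[x]]) for x in STATES}  # arbitrary initial policy
    while True:
        v = evaluate(alpha, beta)
        new_beta = {}
        for x in STATES:
            a = alpha[x]
            best = min(P[x][a], key=lambda b: q_value(v, x, a, b))
            keep = q_value(v, x, a, beta[x]) <= q_value(v, x, a, best) + TOL
            new_beta[x] = beta[x] if keep else best
        if new_beta == beta:
            return v
        beta = new_beta


def policy_iteration():
    """External loop: PI for MAX, each value being computed by the internal loop."""
    alpha = {x: min(P[x]) for x in STATES}  # arbitrary initial policy
    while True:
        v = inner_pi(alpha)
        new_alpha = {}
        for x in STATES:
            best = max(P[x], key=lambda a: F_xa(v, x, a))
            keep = F_xa(v, x, alpha[x]) >= F_xa(v, x, best) - TOL
            new_alpha[x] = alpha[x] if keep else best
        if new_alpha == alpha:
            return v, alpha
        alpha = new_alpha


v_star, alpha_star = policy_iteration()
```

On exit, `v_star` satisfies the (DP) equation up to rounding, which can be checked by comparing `max(F_xa(v_star, x, a) for a in P[x])` with `v_star[x]` in every state.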

  20. $(v^k)_{k \ge 1}$ is non-decreasing (MAX player); $(v^{k,s})_{s \ge 1}$ is non-increasing (MIN player).
     PI stops after a finite time when the sets of actions are finite.
     Internal loop (1-player game): PI ≈ Newton algorithm in which differentials are replaced by superdifferentials of the (DP) operator.
     External loop (2-player game): PI ≈ Newton algorithm in which the (DP) operator is approximated from below by piecewise affine and concave maps.
     → Expect superlinear convergence in good cases.
