Multigrid methods for two player zero-sum stochastic games
Sylvie Detournay, INRIA Saclay and CMAP, École Polytechnique
PhD thesis defense, 25 September 2012
Sections: Discounted Stochastic Games; Stochastic Games with mean payoff

Outline
Zero-sum two player stochastic game with discounted payoff
    Dynamic programming equations
    Policy iteration and multigrid: AMGπ
    Numerical results
Zero-sum two player stochastic game with mean payoff
    Unichain case
        Dynamic programming equations
        Policy iteration and multigrid: AMGπ
        Numerical results
    Multichain case
        Dynamic programming equations
        Policy iteration for multichain
        Numerical results
Conclusions

Dynamic programming equation of zero-sum two-player stochastic games
\[ v(x) = \max_{a \in \mathcal{A}(x)} \min_{b \in \mathcal{B}(x,a)} \Big[ \sum_{y \in \mathcal{X}} \gamma P(y \mid x,a,b)\, v(y) + r(x,a,b) \Big] \quad \forall x \in \mathcal{X} \tag{DP} \]
$\mathcal{X}$: state space
$v(x)$: the value of the game starting at $x \in \mathcal{X}$
$a$, $b$: actions of the 1st player (MAX) and of the 2nd player (MIN)
$r(x,a,b)$: reward paid by MIN to MAX
$P(y \mid x,a,b)$: transition probability from $x$ to $y$ given the actions $a$, $b$
$\gamma < 1$: discount factor
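To make the right-hand side of (DP) concrete, here is a minimal Python sketch that applies the operator to a vector $v$ and iterates it to a fixed point. The two-state game `GAME`, its rewards and its transition rows are made up for illustration, and the names `dp_operator` and `value_iteration` are hypothetical. The plain fixed-point iteration used here is not the method of the talk (which uses policy iteration); it is only the simplest way to see the equation at work when $\gamma < 1$.

```python
# Hypothetical 2-state game (illustrative data, not from the talk).
# GAME[x][a][b] = (r(x,a,b), [P(y|x,a,b) for y in states]).
# The number of replies b may depend on the action a, as in B(x,a).
GAME = [
    [  # state 0
        [(1.0, [0.5, 0.5]), (3.0, [1.0, 0.0])],   # a = 0, replies b = 0, 1
        [(2.0, [0.0, 1.0]), (0.0, [0.5, 0.5])],   # a = 1, replies b = 0, 1
    ],
    [  # state 1
        [(-1.0, [1.0, 0.0]), (4.0, [0.0, 1.0])],  # a = 0, replies b = 0, 1
        [(0.0, [0.5, 0.5])],                      # a = 1, single reply b = 0
    ],
]


def dp_operator(game, v, gamma):
    """Return the vector x -> max_a min_b [gamma * sum_y P(y|x,a,b) v(y) + r(x,a,b)]."""
    return [
        max(
            min(r + gamma * sum(p * vy for p, vy in zip(row, v)) for r, row in replies)
            for replies in actions
        )
        for actions in game
    ]


def value_iteration(game, gamma, tol=1e-10, max_iter=10_000):
    """Iterate v <- F(v) until successive iterates differ by less than tol."""
    v = [0.0] * len(game)
    for _ in range(max_iter):
        w = dp_operator(game, v, gamma)
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v


v = value_iteration(GAME, gamma=0.9)
print("approximate solution of (DP):", v)
```

Since the operator is a contraction of modulus $\gamma$ in the sup norm, the loop stops for any $\gamma < 1$, but the number of iterations grows as $\gamma$ approaches 1.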

Value of the game starting in $x$
\[ v(x) = \max_{(a_k)_{k \ge 0}} \min_{(b_k)_{k \ge 0}} \mathbb{E}\Big[ \sum_{k=0}^{\infty} \gamma^k r(X_k, a_k, b_k) \Big] \]
where
\[ a_k = a_k(X_k, b_{k-1}, a_{k-1}, X_{k-1}, \dots), \qquad b_k = b_k(X_k, a_k, \dots) \]
are strategies and the state process $(X_k)$ satisfies
\[ P(X_{k+1} = y \mid X_k = x, a_k = a, b_k = b) = P(y \mid x,a,b). \]

Deterministic zero-sum two-player game
Circles: MAX plays. Squares: MIN plays.
Weights on the edges: payment made by MIN to MAX.
[Figure: weighted directed graph of the game; the nodes include 1', 2', 3' and 4'.]

If MAX initially moves to 2', he eventually loses 5 per turn.

But if MAX initially moves to 1', he eventually loses only (1 + 0 + 2 + 3) / 2 = 3 per turn.

Feedback strategies or policies
\[ v(x) = \max_{(a_k)_{k \ge 0}} \min_{(b_k)_{k \ge 0}} \mathbb{E}\Big[ \sum_{k=0}^{\infty} \gamma^k r(X_k, a_k, b_k) \Big] \]
For $\alpha : x \mapsto \alpha(x) \in \mathcal{A}(x)$ and $\beta : (x,a) \mapsto \beta(x,a) \in \mathcal{B}(x,a)$, the strategies
\[ a_k = \alpha(X_k), \qquad b_k = \beta(X_k, a_k) \]
are such that $(X_k)$ is a Markov chain with transition matrix $P^{\alpha,\beta}$, where
\[ P^{\alpha,\beta}_{xy} := P(y \mid x, \alpha(x), \beta(x, \alpha(x))), \qquad x, y \in \mathcal{X}. \]
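As an illustration of how a pair of feedback policies freezes the game into a Markov chain, the following sketch builds $P^{\alpha,\beta}$ and the matching reward vector from a small game table. The game data, the list encoding of $\alpha$ and $\beta$, and the function name `induced_chain` are all assumptions made for this example.

```python
# Hypothetical 2-state game (illustrative data, not from the talk).
# GAME[x][a][b] = (r(x,a,b), [P(y|x,a,b) for y in states]).
GAME = [
    [
        [(1.0, [0.5, 0.5]), (3.0, [1.0, 0.0])],
        [(2.0, [0.0, 1.0]), (0.0, [0.5, 0.5])],
    ],
    [
        [(-1.0, [1.0, 0.0]), (4.0, [0.0, 1.0])],
        [(0.0, [0.5, 0.5])],
    ],
]


def induced_chain(game, alpha, beta):
    """Return (P, r) with P[x][y] = P(y | x, alpha(x), beta(x, alpha(x)))
    and r[x] = r(x, alpha(x), beta(x, alpha(x))).

    alpha[x] is MAX's action in state x; beta[x][a] is MIN's reply to a in x.
    """
    P, r = [], []
    for x, actions in enumerate(game):
        a = alpha[x]
        b = beta[x][a]
        reward, row = actions[a][b]
        P.append(list(row))
        r.append(reward)
    return P, r


alpha = [0, 1]             # MAX plays a=0 in state 0 and a=1 in state 1
beta = [[1, 0], [0, 0]]    # MIN's reply for every (state, action) pair
P, r = induced_chain(GAME, alpha, beta)
print(P, r)
```

Each row of `P` is copied from one transition row of the game, so `P` is row-stochastic whenever the game data is.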

Dynamic programming operator and optimal policy
\[ v(x) = \max_{a \in \mathcal{A}(x)} \min_{b \in \mathcal{B}(x,a)} \underbrace{\Big[ \sum_{y \in \mathcal{X}} \gamma P(y \mid x,a,b)\, v(y) + r(x,a,b) \Big]}_{F(v;(x,a),b)} =: F(v;x) \]
$\alpha$: policy maximizing the (DP) equation for MAX
$\beta$: policy minimizing $F(v;(x,a),b)$ for MIN
The dynamic programming operator $F$ is monotone and additively sub-homogeneous ($F(\lambda + v) \le \lambda + F(v)$, $\lambda \ge 0$).
Method to solve the (DP) equations: policy iteration algorithm [Howard, 60 (1-player game)], [Denardo, 67 (2-player game)]
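The two properties of $F$ can be checked numerically on a toy instance. The sketch below samples random vectors and tests monotonicity ($v \le w$ implies $F(v) \le F(w)$) and additive sub-homogeneity ($F(\lambda + v) \le \lambda + F(v)$ for $\lambda \ge 0$). The game data and the names `dp_operator` and `check_properties` are hypothetical, and a passing random test is of course only a sanity check, not a proof.

```python
import random

# Hypothetical 2-state game (illustrative data, not from the talk).
# GAME[x][a][b] = (r(x,a,b), [P(y|x,a,b) for y in states]).
GAME = [
    [
        [(1.0, [0.5, 0.5]), (3.0, [1.0, 0.0])],
        [(2.0, [0.0, 1.0]), (0.0, [0.5, 0.5])],
    ],
    [
        [(-1.0, [1.0, 0.0]), (4.0, [0.0, 1.0])],
        [(0.0, [0.5, 0.5])],
    ],
]


def dp_operator(game, v, gamma):
    """Return x -> max_a min_b [gamma * sum_y P(y|x,a,b) v(y) + r(x,a,b)]."""
    return [
        max(
            min(r + gamma * sum(p * vy for p, vy in zip(row, v)) for r, row in replies)
            for replies in actions
        )
        for actions in game
    ]


def check_properties(game, gamma, trials=200, seed=0, eps=1e-12):
    """Randomly test monotonicity and additive sub-homogeneity of the operator."""
    rng = random.Random(seed)
    n = len(game)
    for _ in range(trials):
        v = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        w = [vi + rng.uniform(0.0, 3.0) for vi in v]   # w >= v componentwise
        lam = rng.uniform(0.0, 3.0)                    # lambda >= 0
        Fv = dp_operator(game, v, gamma)
        Fw = dp_operator(game, w, gamma)
        Fshift = dp_operator(game, [vi + lam for vi in v], gamma)
        if any(a > b + eps for a, b in zip(Fv, Fw)):
            return False                               # monotonicity violated
        if any(s > lam + f + eps for s, f in zip(Fshift, Fv)):
            return False                               # sub-homogeneity violated
    return True


print(check_properties(GAME, gamma=0.9))
```

For this discounted operator the shift satisfies $F(\lambda + v) = \gamma\lambda + F(v)$, because each transition row sums to 1, which is why the inequality holds for every $\lambda \ge 0$ when $\gamma < 1$.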

Dynamic programming equation of zero-sum two-player stochastic differential games
PDE of Isaacs (or Hamilton-Jacobi-Bellman for one player):
\[ -\lambda v(x) + H\Big( x, \frac{\partial v}{\partial x_i}, \frac{\partial^2 v}{\partial x_i \partial x_j} \Big) = 0, \qquad x \in \mathcal{X} \tag{I} \]
where
\[ H(x,p,K) = \max_{a \in \mathcal{A}(x)} \min_{b \in \mathcal{B}(x,a)} \Big[ p \cdot f(x,a,b) + \tfrac{1}{2} \operatorname{tr}\big( \sigma(x,a,b)\, \sigma^{T}(x,a,b)\, K \big) + r(x,a,b) \Big] \]
Discretization of (I) with monotone schemes yields (DP).

Motivation
Solve dynamic programming equations arising from the discretization of Isaacs equations or of other DP equations of diffusions (e.g. variational inequalities). Applications: pursuit-evasion games, finance, ...
Solve large scale zero-sum stochastic games (with discrete state space), for example problems arising from the web or problems in the verification of programs in computer science, ...
→ Use the policy iteration algorithm, where the linear systems involved are solved using AMG.

Policy Iteration (PI) algorithm for games
\[ v(x) = \max_{a \in \mathcal{A}(x)} \underbrace{\min_{b \in \mathcal{B}(x,a)} \Big[ \sum_{y \in \mathcal{X}} \gamma P(y \mid x,a,b)\, v(y) + r(x,a,b) \Big]}_{F(v;x,a)} \]
Start with $\alpha_0 : x \mapsto \alpha_0(x) \in \mathcal{A}(x)$, then apply successively:
1. The value $v^{k+1}$ of policy $\alpha_k$ is the solution of $v^{k+1}(x) = F(v^{k+1}; x, \alpha_k(x))$ $\forall x \in \mathcal{X}$.
2. Improve the policy: select $\alpha_{k+1}$ optimal for $v^{k+1}$: $\alpha_{k+1}(x) \in \operatorname{argmax}_{a \in \mathcal{A}(x)} F(v^{k+1}; x, a)$ $\forall x \in \mathcal{X}$.
Until $\alpha_{k+1}(x) = \alpha_k(x)$ $\forall x \in \mathcal{X}$.
Step 1 is itself solved by PI.

Policy Iteration (PI) for 1-player games (Howard, 60)
Start with $\beta_{k,0}$, then apply successively:
1. The value $v^{k,s+1}$ of policy $\beta_{k,s}$ is the solution of
\[ v^{k,s+1} = \gamma P^{\alpha_k,\beta_{k,s}} v^{k,s+1} + r^{\alpha_k,\beta_{k,s}}, \qquad \text{where } P^{\alpha,\beta}_{xy} := P(y \mid x, \alpha(x), \beta(x, \alpha(x))). \]
2. Improve the policy: find $\beta_{k,s+1}$ optimal for $v^{k,s+1}$.
Until $\beta_{k,s+1} = \beta_{k,s}$.
[Diagram: the external PI loop runs over $\alpha_0, \dots, \alpha_k$; for each policy of MAX, the internal PI loop runs over the policies of MIN, e.g. $\beta_{0,0}, \dots, \beta_{0,s}$ for $\alpha_0$.]
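The nested scheme can be sketched in a few dozen lines of Python: an internal Howard loop for MIN with MAX's policy frozen, wrapped in an external loop that improves MAX's policy. Everything below is an assumption made for illustration: the toy game data, the function names, the tie-breaking tolerance `EPS`, and in particular the dense Gaussian elimination in `solve_linear`, which stands in for the AMG solver that the talk uses on the linear systems.

```python
# Hypothetical 2-state game (illustrative data, not from the talk).
# GAME[x][a][b] = (r(x,a,b), [P(y|x,a,b) for y in states]).
GAME = [
    [[(1.0, [0.5, 0.5]), (3.0, [1.0, 0.0])], [(2.0, [0.0, 1.0]), (0.0, [0.5, 0.5])]],
    [[(-1.0, [1.0, 0.0]), (4.0, [0.0, 1.0])], [(0.0, [0.5, 0.5])]],
]
EPS = 1e-12  # a policy is changed only on strict improvement, to avoid cycling on ties


def solve_linear(A, b):
    """Dense Gaussian elimination with partial pivoting (stand-in for AMG)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def q_value(game, v, gamma, x, a, b):
    """gamma * sum_y P(y|x,a,b) v(y) + r(x,a,b)."""
    r, row = game[x][a][b]
    return r + gamma * sum(p * vy for p, vy in zip(row, v))


def evaluate(game, gamma, alpha, beta):
    """Solve (I - gamma P) v = r for the chain induced by alpha and beta."""
    n = len(game)
    A, rhs = [], []
    for x in range(n):
        r, row = game[x][alpha[x]][beta[x]]
        A.append([(1.0 if x == y else 0.0) - gamma * row[y] for y in range(n)])
        rhs.append(r)
    return solve_linear(A, rhs)


def inner_pi(game, gamma, alpha):
    """Internal loop: Howard's PI for MIN; beta[x] is the reply to alpha[x]."""
    beta = [0] * len(game)
    while True:
        v = evaluate(game, gamma, alpha, beta)
        new_beta = []
        for x in range(len(game)):
            q = [q_value(game, v, gamma, x, alpha[x], b) for b in range(len(game[x][alpha[x]]))]
            best = min(range(len(q)), key=q.__getitem__)
            new_beta.append(best if q[best] < q[beta[x]] - EPS else beta[x])
        if new_beta == beta:
            return v
        beta = new_beta


def policy_iteration(game, gamma):
    """External loop: improve MAX's policy using the values from inner_pi."""
    alpha = [0] * len(game)
    while True:
        v = inner_pi(game, gamma, alpha)
        new_alpha = []
        for x in range(len(game)):
            F = [min(q_value(game, v, gamma, x, a, b) for b in range(len(game[x][a])))
                 for a in range(len(game[x]))]
            best = max(range(len(F)), key=F.__getitem__)
            new_alpha.append(best if F[best] > F[alpha[x]] + EPS else alpha[x])
        if new_alpha == alpha:
            return v, alpha
        alpha = new_alpha


v, alpha = policy_iteration(GAME, gamma=0.9)
print("value:", v, "policy of MAX:", alpha)
```

The matrix $I - \gamma P^{\alpha,\beta}$ is strictly diagonally dominant for $\gamma < 1$, so each linear solve is well posed. In a large-scale setting the dense solver would be replaced by a sparse or multigrid solver.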

$(v^k)_{k \ge 1}$ is non-decreasing (MAX player).
$(v^{k,s})_{s \ge 1}$ is non-increasing (MIN player).
PI stops after a finite time when the sets of actions are finite.
Internal loop (1-player game): PI ≈ Newton algorithm where differentials are replaced by superdifferentials of the (DP) operator.
External loop (2-player game): PI ≈ Newton algorithm where the (DP) operator is approximated from below by piecewise affine and concave maps.
→ Expect superlinear convergence in good cases.
