2-Player Zero-Sum Stochastic Differential Games

Roscoff, March 2010

2-Player Zero-Sum Stochastic Differential Games
based on joint work with Rainer Buckdahn (Université de Bretagne Occidentale)

Juan Li (Shandong University, branch of Weihai)

SIAM J. on Control Opt. 47(1), 2008; arXiv

Objective of the lecture

Generalization of the results of the pioneering work of Fleming and Souganidis on zero-sum two-player stochastic differential games (SDGs):
• the cost functionals are defined through controlled BSDEs;
• the admissible control processes can depend on events occurring before the beginning of the game.

This latter extension has the consequence that the cost functionals become random. However, by making use of a Girsanov transformation we prove that the upper and the lower value functions of the game remain deterministic.

This approach, combined with the BSDE method, yields in a direct way: upper and lower value functions are deterministic → Dynamic Programming Principle → Hamilton-Jacobi-Bellman-Isaacs equations.

At the end of the lecture: some remarks on extensions of the above SDGs, such as SDGs defined through reflected BSDEs.

Main results

The dynamics of the SDG is given by the controlled SDE

$$
\begin{cases}
dX_s^{t,x;u,v} = b(s, X_s^{t,x;u,v}, u_s, v_s)\,ds + \sigma(s, X_s^{t,x;u,v}, u_s, v_s)\,dB_s, & s \in [t,T], \\
X_t^{t,x;u,v} = x \ (\in \mathbb{R}^n).
\end{cases} \tag{1}
$$

The cost functional (interpreted as a payoff for Player I and as a cost for Player II) is introduced by a BSDE:

$$
\begin{cases}
-dY_s^{t,x;u,v} = f(s, X_s^{t,x;u,v}, Y_s^{t,x;u,v}, Z_s^{t,x;u,v}, u_s, v_s)\,ds - Z_s^{t,x;u,v}\,dB_s, & s \in [t,T], \\
Y_T^{t,x;u,v} = \Phi(X_T^{t,x;u,v}).
\end{cases} \tag{2}
$$

The cost functional is given by

$$ J(t,x;u,v) = Y_t^{t,x;u,v}. \tag{3} $$

We define the lower value function as follows:

$$ W(t,x) := \operatorname*{essinf}_{\beta \in \mathcal{B}_{t,T}} \operatorname*{esssup}_{u \in \mathcal{U}_{t,T}} J(t,x;u,\beta(u)), \tag{4} $$

and the upper value function is given by

$$ U(t,x) := \operatorname*{esssup}_{\alpha \in \mathcal{A}_{t,T}} \operatorname*{essinf}_{v \in \mathcal{V}_{t,T}} J(t,x;\alpha(v),v). \tag{5} $$

The main results state that $W$ and $U$ are deterministic continuous viscosity solutions of the Bellman-Isaacs equations

$$
\begin{cases}
\dfrac{\partial}{\partial t} W(t,x) + H^-(t,x,W,DW,D^2W) = 0, & (t,x) \in [0,T) \times \mathbb{R}^n, \\
W(T,x) = \Phi(x), & x \in \mathbb{R}^n,
\end{cases} \tag{6}
$$

and

$$
\begin{cases}
\dfrac{\partial}{\partial t} U(t,x) + H^+(t,x,U,DU,D^2U) = 0, & (t,x) \in [0,T) \times \mathbb{R}^n, \\
U(T,x) = \Phi(x), & x \in \mathbb{R}^n,
\end{cases} \tag{7}
$$

respectively, associated with the Hamiltonians

$$ H^-(t,x,y,p,X) = \sup_{u \in U} \inf_{v \in V} H(t,x,y,p,X,u,v), $$

$$ H^+(t,x,y,p,X) = \inf_{v \in V} \sup_{u \in U} H(t,x,y,p,X,u,v), $$

$(t,x,y,p,X) \in [0,T] \times \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^n \times S^n$ (recall that $S^n$ denotes the set of all $n \times n$ symmetric matrices), where

$$ H(t,x,y,p,X,u,v) = \tfrac{1}{2} \operatorname{tr}\big(\sigma\sigma^T(t,x,u,v)\,X\big) + p \cdot b(t,x,u,v) + f\big(t,x,y,p \cdot \sigma(t,x,u,v),u,v\big). \tag{8} $$
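To make the two Hamiltonians concrete, here is a minimal numerical sketch, not taken from the lecture. It assumes the scalar case $n = d = 1$, $f \equiv 0$, a constant $\sigma$, and finite grids standing in for the compact sets $U$ and $V$; the names `lower_upper_hamiltonian` and `toy_H` and the two drift functions are purely illustrative. Since $\sup\inf \le \inf\sup$ always holds, $H^- \le H^+$; the two coincide for the separable drift $u + v$ but differ for the coupled drift $(u - v)^2$.

```python
import numpy as np

def lower_upper_hamiltonian(H_grid):
    """H_grid[i, j] = H(..., u_i, v_j) on finite grids for U and V.

    Returns (H_minus, H_plus) = (sup_u inf_v H, inf_v sup_u H).
    """
    h_minus = H_grid.min(axis=1).max()  # inf over v first, then sup over u
    h_plus = H_grid.max(axis=0).min()   # sup over u first, then inf over v
    return float(h_minus), float(h_plus)

def toy_H(p, X, u, v, drift, sigma=1.0):
    """Hamiltonian (8) for n = d = 1 and f = 0 (illustrative assumption)."""
    return 0.5 * sigma**2 * X + p * drift(u, v)

# Hypothetical control sets U = V = [-1, 1], discretized on 5 points each.
u = np.linspace(-1.0, 1.0, 5)[:, None]
v = np.linspace(-1.0, 1.0, 5)[None, :]

separable = lower_upper_hamiltonian(toy_H(1.0, 0.0, u, v, lambda u, v: u + v))
coupled = lower_upper_hamiltonian(toy_H(1.0, 0.0, u, v, lambda u, v: (u - v) ** 2))
```

In this toy setting `separable` has equal components while `coupled` does not, which shows that equations (6) and (7) can genuinely differ.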

Preliminaries. Framework

$(\Omega, \mathcal{F}, P)$ is the canonical Wiener space: for a given finite time horizon $T > 0$,
• $\Omega = C_0([0,T]; \mathbb{R}^d)$ (endowed with the supremum norm);
• $B_t(\omega) = \omega(t)$, $t \in [0,T]$, $\omega \in \Omega$: the coordinate process;
• $P$: the Wiener measure on $(\Omega, \mathcal{B}(\Omega))$, i.e. the unique probability measure with respect to which $B$ is a standard Brownian motion;
• $\mathcal{F} = \mathcal{B}(\Omega) \vee \mathcal{N}_P$;
• $\mathbb{F} = (\mathcal{F}_t)_{t \in [0,T]}$ with $\mathcal{F}_t = \mathcal{F}_t^B = \sigma\{B_s, s \le t\} \vee \mathcal{N}_P$.

$(\Omega, \mathcal{F}, \mathbb{F}, P; B)$ is the complete, filtered probability space on which we will work.

Dynamics of the game

Initial data: $t \in [0,T]$, $\zeta \in L^2(\Omega, \mathcal{F}_t, P; \mathbb{R}^n)$; associated doubly controlled stochastic system:

$$
\begin{cases}
dX_s^{t,\zeta;u,v} = b(s, X_s^{t,\zeta;u,v}, u_s, v_s)\,ds + \sigma(s, X_s^{t,\zeta;u,v}, u_s, v_s)\,dB_s, & s \in [t,T], \\
X_t^{t,\zeta;u,v} = \zeta,
\end{cases} \tag{1}
$$

Player I: $u \in \mathcal{U} := L^0_{\mathbb{F}}(0,T;U)$; Player II: $v \in \mathcal{V} := L^0_{\mathbb{F}}(0,T;V)$; $U$, $V$ are compact metric spaces, and the mappings

$$ b : [0,T] \times \mathbb{R}^n \times U \times V \to \mathbb{R}^n, \qquad \sigma : [0,T] \times \mathbb{R}^n \times U \times V \to \mathbb{R}^{n \times d} $$

are continuous on $[0,T] \times \mathbb{R}^n \times U \times V$ (for simplicity) and Lipschitz in $x$, uniformly w.r.t. $(t,u,v)$, i.e., for some $L \in \mathbb{R}_+$,

$$ |\sigma(s,x,u,v) - \sigma(s,x',u,v)|,\ |b(s,x,u,v) - b(s,x',u,v)| \le L|x - x'|; $$

$$ |\sigma(s,x,u,v)|,\ |b(s,x,u,v)| \le L(1 + |x|). $$

Existence and uniqueness of the solution $X^{t,\zeta;u,v} \in S^2_{\mathbb{F}}(t,T;\mathbb{R}^n)$; from standard estimates: for all $p \ge 2$ there is some $C_p\,(= C_{p,L}) \in \mathbb{R}_+$ s.t.

$$ E\Big[\sup_{s \in [t,T]} |X_s^{t,\zeta;u,v} - X_s^{t,\zeta';u,v}|^p \,\Big|\, \mathcal{F}_t\Big] \le C_p |\zeta - \zeta'|^p, \quad P\text{-a.s.}, $$

$$ E\Big[\sup_{s \in [t,T]} |X_s^{t,\zeta;u,v}|^p \,\Big|\, \mathcal{F}_t\Big] \le C_p (1 + |\zeta|^p), \quad P\text{-a.s.} $$
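The controlled system (1) can be explored numerically. The following is a minimal Euler-Maruyama sketch, not part of the lecture, under simplifying assumptions: the scalar case $n = d = 1$, a deterministic initial value, and controls given as feedback functions of time and current state (one simple way of producing adapted control processes). The function name `euler_maruyama` and all parameter names are hypothetical.

```python
import numpy as np

def euler_maruyama(b, sigma, x0, t, T, u, v, n_steps, rng):
    """Simulate one path of dX = b ds + sigma dB on [t, T] (scalar case).

    b, sigma : callables (s, x, u, v) -> float (drift and diffusion)
    u, v     : callables (s, x) -> control value (illustrative feedback form)
    """
    dt = (T - t) / n_steps
    s = t + dt * np.arange(n_steps + 1)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dB = rng.normal(0.0, np.sqrt(dt))          # Brownian increment over dt
        uk, vk = u(s[k], x[k]), v(s[k], x[k])      # controls use current info only
        x[k + 1] = (x[k]
                    + b(s[k], x[k], uk, vk) * dt
                    + sigma(s[k], x[k], uk, vk) * dB)
    return s, x

# Usage with a toy drift b = u - v and constant diffusion coefficient 0.3.
rng = np.random.default_rng(0)
s, x = euler_maruyama(
    b=lambda s, x, u, v: u - v,
    sigma=lambda s, x, u, v: 0.3,
    x0=0.0, t=0.0, T=1.0,
    u=lambda s, x: 1.0, v=lambda s, x: -1.0,
    n_steps=200, rng=rng,
)
```

Setting the diffusion coefficient to zero reduces the scheme to the explicit Euler method for an ODE, which gives a quick sanity check.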

Definition of the associated cost functionals

The cost functional is defined with the help of a backward SDE (BSDE). Associated with $(t,\zeta) \in [0,T] \times L^2(\Omega, \mathcal{F}_t, P; \mathbb{R}^n)$, $u \in \mathcal{U}$ and $v \in \mathcal{V}$, we consider the BSDE:

$$
\begin{cases}
dY_s^{t,\zeta;u,v} = -f(s, X_s^{t,\zeta;u,v}, Y_s^{t,\zeta;u,v}, Z_s^{t,\zeta;u,v}, u_s, v_s)\,ds + Z_s^{t,\zeta;u,v}\,dB_s, & s \in [t,T], \\
Y_T^{t,\zeta;u,v} = \Phi(X_T^{t,\zeta;u,v}),
\end{cases} \tag{2}
$$

where
⋄ Final cost: $\Phi : \mathbb{R}^n \to \mathbb{R}$, Lipschitz;
⋄ Running cost: $f : [0,T] \times \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^d \times U \times V \to \mathbb{R}$, continuous; Lipschitz in $(x,y,z)$, uniformly w.r.t. $(t,u,v)$.

Under the above assumptions: existence and uniqueness of the solution of BSDE (2):

$$ (Y^{t,\zeta;u,v}, Z^{t,\zeta;u,v}) \in S^2_{\mathbb{F}}(t,T;\mathbb{R}) \times L^2_{\mathbb{F}}(t,T;\mathbb{R}^d). $$

From standard estimates for BSDEs, using the corresponding results for the controlled stochastic system: for all $p \ge 2$ there is some $C_p\,(= C_{p,L}) \in \mathbb{R}_+$ s.t., for any $\zeta, \zeta' \in L^2(\Omega, \mathcal{F}_t, P; \mathbb{R}^n)$,

$$ E\Big[\sup_{s \in [t,T]} |Y_s^{t,\zeta;u,v} - Y_s^{t,\zeta';u,v}|^p \,\Big|\, \mathcal{F}_t\Big] \le C_p |\zeta - \zeta'|^p, \quad P\text{-a.s.}; $$

$$ E\Big[\sup_{s \in [t,T]} |Y_s^{t,\zeta;u,v}|^p \,\Big|\, \mathcal{F}_t\Big] \le C_p (1 + |\zeta|^p), \quad P\text{-a.s.} $$

In particular,

$$ |Y_t^{t,\zeta;u,v} - Y_t^{t,\zeta';u,v}| \le C|\zeta - \zeta'|, \quad P\text{-a.s.}, $$

$$ |Y_t^{t,\zeta;u,v}| \le C(1 + |\zeta|), \quad P\text{-a.s.} $$

Let $t \in [0,T]$, $\zeta = x \in \mathbb{R}^n$ (deterministic initial data), $u \in \mathcal{U}$, $v \in \mathcal{V}$; the associated cost functional for the game over the time interval $[t,T]$ is

$$ J(t,x;u,v) := Y_t^{t,x;u,v} \quad \big(\in L^2(\Omega, \mathcal{F}_t, P)\big). $$

Remark 1:
(i) If $f \equiv 0$: $J(t,x;u,v) = E[\Phi(X_T^{t,x;u,v}) \mid \mathcal{F}_t]$;
(ii) If $f$ doesn't depend on $(y,z)$:

$$ J(t,x;u,v) = E\Big[\Phi(X_T^{t,x;u,v}) + \int_t^T f(s, X_s^{t,x;u,v}, u_s, v_s)\,ds \,\Big|\, \mathcal{F}_t\Big]. $$

Notice: From $J(t,x;u,v) := Y_t^{t,x;u,v}$ and the standard estimates for $Y_t^{t,x;u,v}$:

$$ J(t,x;u,v) \in L^\infty(\Omega, \mathcal{F}_t, P), \quad (t,x,u,v) \in [0,T] \times \mathbb{R}^n \times \mathcal{U} \times \mathcal{V}, $$

and:
• $|J(t,x;u,v) - J(t,x';u,v)| \le C|x - x'|$,
• $|J(t,x;u,v)| \le C(1 + |x|)$,
$P$-a.s., for all $x, x' \in \mathbb{R}^n$, $(t,u,v) \in [0,T] \times \mathcal{U} \times \mathcal{V}$.
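The representation in Remark 1 (ii) follows in two steps from BSDE (2). The short derivation below is a sketch added for the reader, using only the assumptions already stated; the conditional expectation of the stochastic integral vanishes because $Z^{t,x;u,v} \in L^2_{\mathbb{F}}(t,T;\mathbb{R}^d)$ makes that integral a true martingale.

```latex
\begin{align*}
% Step 1: BSDE (2) in integral form, with f independent of (y, z)
Y_t^{t,x;u,v}
  &= \Phi(X_T^{t,x;u,v})
   + \int_t^T f(s, X_s^{t,x;u,v}, u_s, v_s)\,ds
   - \int_t^T Z_s^{t,x;u,v}\,dB_s . \\
% Step 2: condition on F_t; Y_t is F_t-measurable and the dB-integral has zero conditional mean
J(t,x;u,v) = Y_t^{t,x;u,v}
  &= E\Big[\Phi(X_T^{t,x;u,v})
   + \int_t^T f(s, X_s^{t,x;u,v}, u_s, v_s)\,ds \,\Big|\, \mathcal{F}_t\Big].
\end{align*}
```

Case (i) is the special case $f \equiv 0$ of the same computation.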

Which kind of game shall we study?

Objective of Player I: maximization of $J(t,x;u,v)$ over $u \in \mathcal{U}$;
Objective of Player II: minimization of $J(t,x;u,v)$ over $v \in \mathcal{V}$.

Both players have the same cost functional: it is the gain for Player I and the loss for Player II. One speaks of "2-player zero-sum stochastic differential games".

In non-zero-sum games, Player $i$ has cost functional $J_i(t,x;u_1,u_2,\dots)$, $i \ge 1$, and the players want to maximize their cost functionals; this leads to the problem of the existence and the characterization of Nash equilibrium points.

Game "Control against Control"?
• In general there is no value of the game, i.e., the result of the game depends on which player begins, and this even if Isaacs' condition is fulfilled (made precise later); example: pursuit games (example in another slide).
• Games "Control against Control" have a value if: $n = d$; $\sigma = \sigma(x) \in \mathbb{R}^{n \times n}$ is independent of $(u,v)$ and invertible (as a matrix); $\sigma^{-1} : \mathbb{R}^n \to \mathbb{R}^{n \times n}$ is Lipschitz (S. HAMADENE, J.-P. LEPELTIER, S. PENG 1997).

Game "Strategy against Control":
This concept has been known in deterministic differential game theory (A. FRIEDMAN, W.H. FLEMING, ...) and was later translated by W.H. FLEMING and P.E. SOUGANIDIS (1989) to the theory of stochastic differential games. Here: a generalization of the concept of W.H. FLEMING and P.E. SOUGANIDIS (1989); a comparison of their concept with ours follows later.

Admissible controls, admissible strategies

Definition 1 (admissible controls for a game over the time interval $[t,T]$):
• for Player I: $\mathcal{U}_{t,T} := L^0_{\mathbb{F}}(t,T;U)$;
• for Player II: $\mathcal{V}_{t,T} := L^0_{\mathbb{F}}(t,T;V)$.

Notice: Unlike in the concept of FLEMING and SOUGANIDIS, the controls $u \in \mathcal{U}_{t,s}$, $v \in \mathcal{V}_{t,s}$ are not supposed to be independent of $\mathcal{F}_t$.

Definition 2 (admissible strategies for a game over the time interval $[t,T]$):
• for Player II: $\beta : \mathcal{U}_{t,T} \to \mathcal{V}_{t,T}$ nonanticipating, i.e., for any $\mathbb{F}$-stopping time $S : \Omega \to [t,T]$ and any admissible controls $u_1, u_2 \in \mathcal{U}_{t,T}$:

$$ u_1 = u_2 \ \ ds\,dP\text{-a.e. on } [[t,S]] \ \Longrightarrow\ \beta(u_1) = \beta(u_2) \ \ ds\,dP\text{-a.e. on } [[t,S]]. $$

$\mathcal{B}_{t,T} := \{\beta : \mathcal{U}_{t,T} \to \mathcal{V}_{t,T} \mid \beta \text{ is nonanticipating}\}$.
• Analogously, for Player I: $\mathcal{A}_{t,T} := \{\alpha : \mathcal{V}_{t,T} \to \mathcal{U}_{t,T} \mid \alpha \text{ is nonanticipating}\}$.

Value functions

Notice: From $J(t,x;u,v) := Y_t^{t,x;u,v}$ and the standard estimates for $Y_t^{t,x;u,v}$:
