

SLIDE 1

Jean-François Raskin Université Libre de Bruxelles

MOVEP 2020 Grenoble

Two-Player Zero-sum Games 
 Played on Graphs:
 ω-Regular and Quantitative Objectives

SLIDE 2

Introduction: Motivations and 
 Basic Definitions

SLIDE 3

The Basic Model

SLIDE 4

Two-player games 
 played on graphs

A (finite) directed graph
Two types of vertices (states): Player 1 and Player 2

[Figure: example game arena with states 1–5]

SLIDE 5

How to play?

  • One token is placed on the initial state
  • Players play ω rounds; in each round, the player that owns the current state moves the token to an adjacent state
  • Outcome = ω-path


Two-player games 
 played on graphs

SLIDE 6

1

SLIDE 7

1 ➞ 2

SLIDE 8

1 ➞ 2 ➞ 1

SLIDE 9

1 ➞ 2 ➞ 1 ➞ 4

SLIDE 10

1 ➞ 2 ➞ 1 ➞ 4 ➞ 5

SLIDE 11

1 ➞ 2 ➞ 1 ➞ 4 ➞ 5 ➞ 3

SLIDE 12

1 ➞ 2 ➞ 1 ➞ 4 ➞ 5 ➞ 3 ➞ 4

SLIDE 13

1 ➞ 2 ➞ 1 ➞ 4 ➞ 5 ➞ 3 ➞ 4 ➞ 5

SLIDE 14

1 ➞ 2 ➞ 1 ➞ 4 ➞ 5 ➞ 3 ➞ 4 ➞ 5 ➞ 4 ➞ ...

SLIDE 15

Motivations

SLIDE 16

Reactive system synthesis 
 as solving a game

☞ support the design process with automatic synthesis

[Diagram: synthesize Sys such that Sys || Env ⊨ ψ]

  • Sys is constructed by an algorithm
  • Sys is correct by construction
  • Underlying theory: 2-player zero-sum games
  • Env is adversarial (worst-case assumption)
  • Winning strategy = Correct Sys

SLIDE 17

Games for solving 
 automata problems

Nondeterministic finite tree automata emptiness = two-player zero-sum reachability game

Language inclusion between non-deterministic Büchi automata = two-player zero-sum parity game with imperfect information

… and many more …

SLIDE 18

Checking refinements between two systems

[Diagram: Spec A and Implementation B]

Simulation?
  • Safety game between Prover and Spoiler
  • Spoiler proposes moves in B
  • Prover tries to match them in A
  • Prover wins if he never fails to match the proposed moves

SLIDE 19

Model-checking 
 Mu-Calculus

Sys ⊨? φ

The model-checking problem for the Mu-Calculus can be reduced to the problem of deciding the winner in a zero-sum two-player parity game

SLIDE 20

Controller synthesis with quantitative objectives

  • Embedded control
  • Parts of OS/chipset
  • Communication protocols
  • Security protocols

In most of those examples, quantitative measures of performance are important: not only a matter of correctness!
SLIDE 21

Controller synthesis with quantitative objectives

[Diagram: synthesize Sys such that Sys || Env ⊨ ψ]

both the model and the property exhibit quantities

SLIDE 22

Plan of the lectures

  • Introduction - Motivations - Basic definitions
  • Qualitative games
  • Reachability and safety games
  • Büchi, coBüchi and parity games
  • Quantitative games
  • Sup, Inf, LimSup, LimInf games
  • Mean-payoff and energy games
  • one and multiple dimensions

SLIDE 23

Part I: Qualitative Games

SLIDE 24

Basic Definitions

SLIDE 25

Game arena

A finite (turn-based) two-player game arena is a tuple A=(S1,S2,E,sinit) where:

  • S1 is the finite set of states owned by Player 1 (the protagonist)
  • S2 is the finite set of states owned by Player 2 (the antagonist) 



 We write S for S1∪S2

  • E ⊆ (S1∪S2)×(S1∪S2) is the set of transitions. Unless otherwise stated, we assume that E is total, i.e. ∀s∈S, ∃s’∈S: (s,s’)∈E

  • sinit ∈ S1∪S2 is the initial state of the game arena

Sometimes, we consider arenas without a designated initial state.

SLIDE 26

Plays and histories

  • Let A=(S1,S2,E,sinit) be a two-player game arena
  • A play in A is an infinite sequence of states ρ=s0 s1 … sn… ∈Sω 


such that for all i≥0, (si,si+1) ∈ E

  • A play ρ=s0 s1 … sn… is initial if s0=sinit
  • Let ρ=s0 s1 … sn…, we use the following notations:
  • ρ(i) for the state si
  • ρ(i..j) for the infix si si+1… sj
  • ρ(i..) for the suffix si si+1 … sn …
  • visit(ρ) is the set of states that appear along ρ
  • inf(ρ) is the set of states that appear infinitely often along ρ
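
In practice one often works with ultimately periodic ("lasso") plays, given by a finite prefix followed by a cycle repeated forever; visit and inf then become computable. A minimal sketch, not from the lecture (the lasso representation and function names are mine):

```python
# Sketch: visit(ρ) and inf(ρ) for an ultimately periodic play
# ρ = prefix · cycle^ω, represented by two finite lists of states.

def visit(prefix, cycle):
    """States that appear along the play prefix · cycle^ω."""
    return set(prefix) | set(cycle)

def inf(prefix, cycle):
    """States appearing infinitely often: exactly the states of the cycle."""
    return set(cycle)
```

For example, the play 1 2 (4 5)^ω has visit = {1,2,4,5} and inf = {4,5}.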
SLIDE 27

Plays and histories

  • We denote by Play(A) and InitPlay(A) the set of

plays and the set of initial plays of A

  • A history π=s0 s1 … sn ∈ S* is a finite sequence of states; it belongs to Player 1 if its last state, noted last(π)=sn, is in S1, and to Player 2 if last(π)∈S2

  • We note Hist1(A)=S*.S1 and Hist2(A)=S*.S2 the sets of histories of Player 1 and Player 2 respectively
SLIDE 28

Rounds and play

  • The two players play on the game arena as follows: a game in A is played by the two players for an infinite number of rounds
  • Initially, the token is on sinit
  • Each round is as follows: at the start of a round, the token is on a state s of the arena; the player that owns the state s chooses a successor state s’ which is E-adjacent to s, i.e. (s,s’) ∈ E. The token is then moved to s’ and a new round is started from there
  • This interaction generates a play ρ∈Play(A), which is initial if the interaction is started from sinit

SLIDE 29

Strategies

  • Players play according to strategies
  • A strategy for Player j, j ∈ {1,2}, is a function that

maps histories to states: 
 
 λj : S*.Sj → S
 
 with the restriction that for all histories π ∈ Histj(A), (last(π),λj(π)) ∈ E, i.e. the strategy always proposes a (next) state that is E-adjacent to the last state of the history.

SLIDE 30

Strategies

[Figure: unfolding of the game graph]

SLIDE 31

Strategies

[Figure: tree unfolding]

SLIDE 32

Strategies

[Figure: tree unfolding]

SLIDE 33

Strategies

[Figure: tree unfolding, with Player 1's choices highlighted]

Strategy for Player 1 = one choice in each node of Player 1 in the tree unfolding

SLIDE 34

Outcomes of a strategy

  • A play ρ ∈ Sω is compatible with a strategy λj of Player j, if for all

i≥0 s.t. last(ρ(0..i)) ∈ Sj, ρ(i+1)= λj(ρ(0..i)), i.e. the position that follows a position that belongs to Player j is determined by λj.

  • We note OutcomeA(s,λj) the set of plays starting in s that are compatible with λj. We write OutcomeA(s,λ1,λ2) for the unique play that starts in s and is compatible with both λ1 and λ2

  • We omit A if clear from the context, and when s is omitted then it is

assumed to be equal to sinit, the initial state of A

SLIDE 35

Types of strategies

  • A strategy λj : S*.Sj → S is memoryless if 


for all histories π,π’ ∈ S*.Sj such that last(π)=last(π’), λj(π)=λj(π’), i.e. the strategy only depends on the last state of the history

  • So a memoryless strategy is equivalent to a function λj : Sj → S
  • A strategy λj : S*.Sj → S is finite memory if 


there exists an equivalence relation ≈ on Histj(A) of finite index such that for all π,π’ ∈ S*.Sj s.t. π≈π’: λj(π)=λj(π’)

  • If ≈ is computable by a finite state machine then a finite memory

strategy can be represented as a synchronous transducer

SLIDE 36

Memoryless strategy - An example

[Figure: the example arena, with Player 1's choices highlighted]

A memoryless strategy for Player 1

SLIDE 37

Finite memory strategy - An example

[Figure: the example arena and a two-state memory transducer]

Strategy:

  • in 1 always play 4
  • in 4 always play 5
  • in 3 play 4 every odd visit and 1 every even visit
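
The finite-memory strategy above can be sketched as a small transducer: one memory bit records the parity of the number of visits to state 3. A sketch under that encoding (the function names and the 0/1 memory representation are mine, not the lecture's):

```python
# Sketch: the slide's finite-memory strategy as a Moore-style transducer.
# Memory bit m records the parity of the number of visits to state 3.

def update(m, s):
    """Memory update: toggle the bit each time state 3 is visited."""
    return 1 - m if s == 3 else m

def choose(m, s):
    """Player 1's choice, given memory m and current state s."""
    if s == 1:
        return 4
    if s == 4:
        return 5
    if s == 3:
        return 4 if m == 1 else 1  # odd visit -> 4, even visit -> 1
    raise ValueError("not a Player 1 state in this example")

def next_move(history):
    """Run the transducer over the history, then output the choice."""
    m = 0
    for s in history:
        m = update(m, s)
    return choose(m, history[-1])
```

Running the transducer over a history ending in state 3 yields 4 on the first visit and 1 on the second, as the slide prescribes.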

SLIDE 38

Winning objective

  • A winning objective for Player j is a set Winj⊆Sω that defines the « good » plays for Player j

  • A game (A,Win1,Win2) is specified by a game arena

and two winning objectives, one for each player

  • A game is zero-sum if Win2=Sω\Win1. In this series of

lectures, we will only consider zero-sum games and we will most often leave the winning objective of Player 2 implicit, because in this case Win2=Sω\Win1

SLIDE 39

Winning strategy

  • In a game (A,Win1,Win2), a strategy for Player j is a

winning strategy if OutcomeA(sinit,λj)⊆Winj

  • Clearly, in a zero-sum game, a play is either

winning for one player or for the other

  • But does a zero-sum game always have a winner,

i.e. one player that has a winning strategy ?

SLIDE 40

Determinacy

  • Let A=(S1,S2,E,sinit) be a two-player turn based arena, and Win1 be the

winning objective for Player 1

  • The game (A,Win1) is determined if either there exists a strategy λ1 for Player 1 s.t. Outcome(λ1)⊆Win1, or there exists a strategy λ2 for Player 2 s.t. Outcome(λ2)⊆Sω\Win1=Win2
  • Theorem [Martin 1975]. For all two-player turn based arenas A=(S1,S2,E,sinit) with a zero-sum winning condition, for all Borel definable objectives Win1, (A,Win1) is determined

A Borel set is any set in a topological space that can be formed from open sets using countable union, countable intersection, and complement.
Rule of thumb: any reasonable set included in Sω that you can think of is a Borel set.

SLIDE 41

Controllable Predecessors and Attractors

SLIDE 42

Controllable predecessors

  • Let A=(S1,S2,E,sinit) be a two-player game arena.
  • Let T⊆S be a set of states.
  • CPre1(T) = { s∈S1 | ∃s’∈T: (s,s’)∈E } ∪ { s∈S2 | ∀s’∈S: (s,s’)∈E ⇒ s’∈T }

  • CPre2(T) = { s∈S2 | ∃s’∈T: (s,s’)∈E } ∪ { s∈S1 | ∀s’∈S: (s,s’)∈E ⇒ s’∈T }

  • if s∈CPrej(T) then from s, Player j has a strategy to

enforce the set T in one step
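
A sketch of CPrej in Python, assuming an arena encoded as an ownership map owner[s] ∈ {1,2} and successor lists succ[s] (this encoding and the names are mine, not the lecture's):

```python
# Sketch: controllable predecessors CPre_j(T).
# A state of Player j is in CPre_j(T) if SOME successor is in T;
# a state of the opponent is in CPre_j(T) if ALL successors are in T.

def cpre(owner, succ, T, j):
    """CPre_j(T): states from which Player j can force T in one step."""
    return {s for s in succ
            if (owner[s] == j and any(t in T for t in succ[s]))
            or (owner[s] != j and all(t in T for t in succ[s]))}
```

Totality of E is assumed, so every succ[s] is non-empty.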

SLIDE 43

CPre1(T)

[Figure: example states marked ✔/✕ for membership in CPre1(T)]

SLIDE 44

CPre2(T)

[Figure: example states marked ✔/✕ for membership in CPre2(T)]

SLIDE 45

Attractors

  • The attractor of T⊆S for Player j, j∈{1,2}, is defined as the limit of the following ⊆-increasing chain of sets:
  • A0=T
  • and for all i≥1, Ai=Ai-1 ∪ CPrej(Ai-1)
  • As the sequence is ⊆-increasing and S is finite, the sequence

stabilizes in at most |S| steps, and we note A* the set on which the sequence stabilizes.

  • We also note this set Attrj(T) and call it the attractor for

Player j of the set T.
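
The chain A0 ⊆ A1 ⊆ … can be computed directly by iterating CPrej until stabilization. A self-contained sketch with the same hypothetical owner/succ encoding as before (names mine, not the lecture's):

```python
# Sketch: attractor computation, iterating CPre_j to a fixpoint.
# Stabilizes in at most |S| iterations since the chain is ⊆-increasing.

def cpre(owner, succ, T, j):
    """One-step controllable predecessors of T for Player j."""
    return {s for s in succ
            if (owner[s] == j and any(t in T for t in succ[s]))
            or (owner[s] != j and all(t in T for t in succ[s]))}

def attractor(owner, succ, T, j):
    """Attr_j(T): limit of A0=T, Ai = Ai-1 ∪ CPre_j(Ai-1)."""
    A = set(T)
    while True:
        A_next = A | cpre(owner, succ, A, j)
        if A_next == A:
            return A
        A = A_next
```

On a hypothetical 6-state arena (edges invented, chosen so the result matches the example on the next slide), Attr1({4}) comes out as {1,2,3,4}.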

SLIDE 46

Attractor - An example

[Figure: game arena G with states 1–6]

A0 = T = {4}
A1 = A0 ∪ CPre1(A0) = {1,3,4}
A2 = A1 ∪ CPre1(A1) = {1,2,3,4}
A3 = A2 ∪ CPre1(A2) = {1,2,3,4} = A2
Attr1({4}) = {1,2,3,4}

SLIDE 47

Attractor properties

  • Lemma. For all s∈Ai, there exists a memoryless strategy λj for Player

j, such that for all ρ∈Outcome(λj), there exists k, 0 ≤ k ≤ i, s.t. ρ(k)∈T, i.e. Ai is exactly the set of states from which Player j can force T in i steps or less.
 


  • Proof. We define a memoryless strategy for Player j as follows: for each state s that belongs to Sj and Ai, either s∈T, or let k, 0≤k≤i, be the minimal index such that s∈Ak; then there exists a choice s’ s.t. (s,s’)∈E and s’∈Ak-1, and the strategy in s always chooses such an s’. It is easy to see that this memoryless strategy forces T in i steps or less from all s∈Ai.

  • Corollary. For all s∈A*, there exists a memoryless strategy λj for

Player j, such that for all ρ∈Outcome(λj), there exists k≥0 s.t. ρ(k)∈T.

SLIDE 48

Trap

  • A trap for Player j, j∈{1,2}, is a set U⊆S s.t. for all

s∈U∩Sj, for all s’∈S s.t. (s,s’)∈E, s’∈U and for all s∈U∩S3-j, there exists s’∈S s.t. (s,s’)∈E and s’∈U. So a trap for Player j is a set of states from which this player cannot escape (and the other can keep staying there).

  • Lemma. Let T⊆S, j∈{1,2}, S\Attrj(T) is a trap for

Player j.

  • Exercise. Prove the lemma.
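
The trap condition can be checked mechanically: every Player j state in U must have all its successors in U, and every opponent state in U must have at least one successor in U. A sketch with the same hypothetical owner/succ encoding (names mine):

```python
# Sketch: check whether U is a trap for Player j.

def is_trap(owner, succ, U, j):
    """U is a trap for Player j: Player j cannot leave U,
    and the opponent can always stay in U."""
    for s in U:
        if owner[s] == j:
            if not all(t in U for t in succ[s]):
                return False  # Player j could escape from s
        else:
            if not any(t in U for t in succ[s]):
                return False  # the opponent cannot stay in U from s
    return True
```

On the hypothetical 6-state arena used earlier, {5,6} is a trap for Player 1, illustrating the lemma with T={4}.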
SLIDE 49

Trapping strategy

  • Let U be a trap for Player j, a trapping strategy in U for

Player 3-j is a strategy that in a state s∈U∩S3-j always chooses a successor which is in U (such a successor exists by definition).

  • Note that when there is a trapping strategy in U for Player 3-j, then there is a memoryless trapping strategy (which always chooses the same successor of s each time the play is in a state s of Player 3-j).

  • Lemma. For all s∈U=S\Attrj(T), there exists a memoryless strategy λ3-j of Player 3-j such that for all ρ∈Outcome(s,λ3-j), for all k≥0: ρ(k)∉T.

SLIDE 50

Trap and sub-arena

  • Let A=(S1,S2,E) and U⊆S. If for all s∈U, there exists s’∈U such that (s,s’)∈E, then A[U]=(S1∩U, S2∩U, E∩(U×U)) is a sub-arena, and it is total if A is total.

  • Let A=(S1,S2,E) be a total arena and U⊆S be a trap,

then A[U] is a sub-arena.

SLIDE 51

Trap - An example

[Figure: game arena G with states 1–6]

Attr1({4}) = {1,2,3,4}
S\Attr1({4}) = {5,6} is a trap for Player 1
It defines the sub-arena A[{5,6}]

SLIDE 52

Qualitative Games

SLIDE 53

Qualitative objectives

Let A=(S1,S2,E,sinit) be a two-player turn based arena

  • a reachability objective is defined by a subset of states T ⊆ S that has to be reached:
 reach(T)={ ρ∈Sω | visit(ρ) ∩ T ≠ ∅}

  • a safety objective is defined by a subset of states T ⊆ S to remain in forever:
 safe(T)={ ρ∈Sω | visit(ρ) ⊆ T }

SLIDE 54

Qualitative objectives

Let A=(S1,S2,E,sinit) be a two-player turn based arena

  • a Büchi objective is defined by a subset of states T ⊆ S that has to be

visited infinitely often:
 Büchi(T)={ ρ∈Sω | inf(ρ) ∩ T ≠ ∅}

  • a coBüchi objective is defined by a subset of states T ⊆ S that has to be eventually reached and never left again:
 coBüchi(T)={ ρ∈Sω | inf(ρ) ⊆ T }

  • let d∈ℕ, a parity objective with d+1 priorities is defined by a priority

function p : S → {0,1,…,d} as the set of sequences in Sω s.t. the smallest priority visited infinitely often is even:
 parity(p)={ ρ∈Sω | min{ p(s) | s ∈ inf(ρ) } is even }.
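
On ultimately periodic plays prefix · cycleω, all of these objectives are easy to evaluate, since visit is the set of all states occurring and inf is exactly the set of cycle states. A sketch (the lasso representation and names are mine; the priority function p is a dict):

```python
# Sketch: evaluating the qualitative objectives on a lasso play
# ρ = prefix · cycle^ω.  visit(ρ) = prefix ∪ cycle states; inf(ρ) = cycle states.

def reach(prefix, cycle, T):
    return bool((set(prefix) | set(cycle)) & T)

def safe(prefix, cycle, T):
    return (set(prefix) | set(cycle)) <= T

def buchi(prefix, cycle, T):
    return bool(set(cycle) & T)

def cobuchi(prefix, cycle, T):
    return set(cycle) <= T

def parity(prefix, cycle, p):
    # prefix is irrelevant: only states visited infinitely often count
    return min(p[s] for s in set(cycle)) % 2 == 0
```

For instance, the play 1 2 (4 5)^ω satisfies Büchi({4}) but not coBüchi({4}).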

SLIDE 55

Reachability and Safety Games

SLIDE 56

Reachability games Decision problem

Given a two-player game arena with a reachability objective A=(S1,S2,E,sinit,reach(T)), decide if Player 1 has a winning strategy in this game, i.e. if there exists a strategy λ1 for Player 1 such that Outcome(λ1) ⊆ reach(T).

SLIDE 57

Illustrations - Reachability

[Figure: game arena G with states 1–6]

What are the states from which Player 1 can force the play to reach T={4}?

SLIDE 58

Attractor solution

  • Clearly a reachability game can be solved by

computing an attractor.

  • The following theorem is a direct consequence of the existence of memoryless strategies in attractors.

  • Theorem. Player 1 has a winning strategy from

state s∈S for the reachability objective reach(T) iff s∈Attr1(T). Furthermore, Player 1 has a winning strategy if and only if Player 1 has a memoryless winning strategy.

SLIDE 59

Illustrations - Reachability

[Figure: game arena G with states 1–6]

A0 = T = {4}
A1 = A0 ∪ CPre1(A0) = {1,3,4}
A2 = A1 ∪ CPre1(A1) = {1,2,3,4}
A3 = A2 ∪ CPre1(A2) = {1,2,3,4} = A2
Attr1({4}) = {1,2,3,4}

SLIDE 60

Safety games Decision problem

Given a two-player game arena with a safety objective A=(S1,S2,E,sinit,safe(T)), we want to decide if Player 1 has a winning strategy in this game, i.e. if there exists a strategy λ1 for Player 1 such that Outcome(λ1) ⊆ safe(T).

SLIDE 61

Attractor solution

  • We exploit here the duality between reachability and safety

games: safe(T)=Sω\reach(S\T)

  • By determinacy of safety and reachability games, and the

duality between the two objectives, we get:
 


  • Lemma. Let A=(S1,S2,E,sinit) be a two-player arena; for all states s, Player 1 has a winning strategy from s in A for safe(T) if and only if Player 2 does not have a winning strategy from s in A for reach(S\T).

  • Corollary. Player 1 has a winning strategy from state s∈S

for the safety objective safe(T) iff s∉Attr2(S\T).

SLIDE 62

Solution using CPre1

  • Consider the following decreasing sequence
  • U0=T
  • for all i≥1, Ui=Ui-1∩CPre1(Ui-1)
  • As the sequence of Ui is ⊆-decreasing and U0 contains at most |S| elements, the sequence stabilizes on a set U* after at most |S| steps.

  • Lemma. For all s∈Ui, there exists a strategy λ1 for Player 1, such that for all ρ∈Outcome(s,λ1), for all k, 0 ≤ k ≤ i, ρ(k)∈T, i.e. from every state s∈Ui, Player 1 has a strategy to stay within T for at least i steps.

  • Corollary. U* is exactly the set of states from which Player 1 has a winning

strategy for the safety objective safe(T).
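
The decreasing sequence Ui can be computed as a fixpoint. A self-contained sketch using the same hypothetical owner/succ encoding as before (names mine, not the lecture's):

```python
# Sketch: winning states for safe(T), via U0 = T, Ui = Ui-1 ∩ CPre1(Ui-1),
# iterated until stabilization on U*.

def cpre(owner, succ, T, j):
    """One-step controllable predecessors of T for Player j."""
    return {s for s in succ
            if (owner[s] == j and any(t in T for t in succ[s]))
            or (owner[s] != j and all(t in T for t in succ[s]))}

def safety_winning(owner, succ, T):
    """U*: states from which Player 1 wins safe(T)."""
    U = set(T)
    while True:
        U_next = U & cpre(owner, succ, U, 1)
        if U_next == U:
            return U
        U = U_next
```

Note the duality with the attractor computation: here the chain is ⊆-decreasing and the winning set is the greatest fixpoint.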

SLIDE 63

Memoryless determinacy of safety games

  • Theorem. Player 1 has a winning strategy in the game

A=(S1,S2,E,sinit,safety(T)) if and only if Player 1 has a winning memoryless strategy in this game if and only if Player 2 has no winning memoryless strategy in this game.

  • Proof. We know that U* is a trap contained in T. So, Player 1 has a memoryless strategy from all states s∈U* to stay within T. For all states that are not in U*, Player 2 has a memoryless strategy to reach an unsafe state in S\T, as its objective is a reachability objective and we know that those objectives are memoryless determined.

SLIDE 64

Memoryless determinacy of reachability and safety games

The following theorem summarizes the previous results.

  • Theorem. Player 1 has a winning strategy from state

s∈S for the reachability objective reach(T) iff he has a memoryless winning strategy. Player 2 has a winning strategy from s∈S for the safety objective safe(S\T)=Sω\reach(T) iff he has a memoryless winning strategy.

SLIDE 65

Büchi and coBüchi Games

SLIDE 66

Büchi games Decision problem

Given a two-player game arena with a Büchi objective A=(S1,S2,E,sinit,Büchi(T)), decide if Player 1 has a winning strategy, i.e. if there exists a strategy λ1 for Player 1 such that Outcome(λ1) ⊆ Büchi(T).

SLIDE 67

A recursive algorithm

SolveBüchi
 input: A, T
 output: W

if S=∅ then return W:=∅;
Win1 := Attr1(T);
if Win1 = S then return W := Win1;
else Lose1 := Attr2(S\Win1);
return W := SolveBüchi(A[S\Lose1], T\Lose1)
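
A runnable sketch of the recursive procedure, following the pseudocode above; the arena encoding (owner/succ dicts) and the helper names are mine, not the lecture's:

```python
# Sketch: SolveBüchi, recursing on ever-smaller sub-arenas.

def cpre(owner, succ, T, j):
    """One-step controllable predecessors of T for Player j."""
    return {s for s in succ
            if (owner[s] == j and any(t in T for t in succ[s]))
            or (owner[s] != j and all(t in T for t in succ[s]))}

def attractor(owner, succ, T, j):
    """Attr_j(T), iterated to a fixpoint."""
    A = set(T)
    while True:
        A_next = A | cpre(owner, succ, A, j)
        if A_next == A:
            return A
        A = A_next

def restrict(owner, succ, U):
    """Sub-arena A[U] (assumes U induces a total sub-arena)."""
    return ({s: owner[s] for s in U},
            {s: [t for t in succ[s] if t in U] for s in U})

def solve_buchi(owner, succ, T):
    """States from which Player 1 wins Büchi(T)."""
    S = set(succ)
    if not S:
        return set()
    win1 = attractor(owner, succ, set(T) & S, 1)
    if win1 == S:
        return win1
    lose1 = attractor(owner, succ, S - win1, 2)  # Player 2 wins here
    o2, s2 = restrict(owner, succ, S - lose1)
    return solve_buchi(o2, s2, set(T) - lose1)
```

Each recursive call removes the non-empty set Lose1, so the recursion terminates after at most |S| calls.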

SLIDE 68

Correctness of the recursive algorithm

  • Theorem. The procedure SolveBüchi returns the set of states from which Player

1 has a winning strategy for the objective Win1=Büchi(T).
 


  • Proof. (1) If Attr1(T)=S. In this case, Player 1 can force a visit to T from all states

s∈S. His strategy to win Büchi(T) is as follows: play the attractor strategy up to a first visit in T (which exists from all states in S), and repeat.
 
 (2) If Attr1(T)≠S. Then for all states s∈S\Attr1(T), Player 2 has a strategy to always avoid T (as S\Attr1(T) is a trap for Player 1). So, Player 1 should avoid at all cost entering a state in S\Attr1(T). Clearly Player 2 also wins in Attr2(S\Attr1(T)). So, we remove all states in this set and recursively call the procedure SolveBüchi on the sub-arena A[S\Attr2(S\Attr1(T))] with target states T\Attr2(S\Attr1(T)), which gives us, by induction, the set of states from which Player 1 can win.

SLIDE 69

Memoryless determinacy of Büchi and coBüchi games

  • Theorem. Player 1 has a winning strategy from state s∈S for the Büchi objective Büchi(T) iff he has a memoryless winning strategy. Player 2 has a winning strategy from s∈S for the coBüchi objective coBüchi(T)=Sω\Büchi(T) iff he has a memoryless winning strategy.
 


  • Proof. We have shown in the correctness proof that Player 1 simply plays repeatedly an attractor strategy to reach T. Such a strategy can be chosen to be memoryless. To prove the existence of optimal memoryless strategies for Player 2, we reason by induction: if s has been removed at the first call to the procedure SolveBüchi, then s belongs to a trap that does not contain any state in T, and a trapping strategy for Player 2 can be assumed to be memoryless. Otherwise, s has been removed at call number i. For states that have been removed at any call j<i, we make the induction hypothesis that there exists from those states a memoryless strategy that wins the objective coBüchi(T). Now in state s, Player 2 plays an attractor strategy to reach a state that has been removed at an earlier round. Such an attractor strategy can be chosen to be memoryless.