Jean-François Raskin Université Libre de Bruxelles
MOVEP 2020 Grenoble
Two-Player Zero-sum Games Played on Graphs: ω-Regular and Quantitative Objectives
Introduction: Motivations and Basic Definitions
The Basic Model
[Game graph over states 1–5]
How to play? A token is placed on the initial state. The players play ω rounds: in each round, the player that owns the current state moves the token to an adjacent state. The outcome is an ω-path.
☞ support the design process with automatic synthesis
[Synthesis view: Sys || Env ⊨ ψ, where Sys is to be constructed]
• Sys is constructed by an algorithm
• Sys is correct by construction
• Underlying theory: 2-player zero-sum games
• Env is adversarial (worst-case assumption)
• Winning strategy = correct Sys
Nondeterministic finite tree automata emptiness = two-player zero-sum reachability game
Language inclusion between nondeterministic Büchi automata = two-player zero-sum parity game with imperfect information
… and many more …
Spec A ≽ Implementation B? Simulation is a safety game between Prover and Spoiler: Spoiler proposes moves in B, and Prover tries to match them in A. Prover wins if he never fails to match the proposed moves.
The model-checking problem for the Mu-Calculus can be reduced to the problem of deciding the winner in a zero-sum two-player parity game
Embedded Control, Parts of OS/Chipset, Communication Protocols, Security Protocols
In most of those examples, quantitative measures matter: both the model (Sys || Env ⊨ ψ) and the property exhibit quantitative dimensions.
A finite (turn-based) two-player game arena is a tuple A=(S1,S2,E,sinit) where S1 and S2 are the disjoint sets of states owned by Player 1 and Player 2 respectively, E ⊆ S×S is the set of edges, and sinit is the initial state. We write S for S1∪S2. Unless otherwise stated, we assume that E is total, i.e. ∀s∈S, ∃s'∈S: (s,s')∈E. Sometimes, we consider arenas without a designated initial state.
A play is an infinite sequence of states ρ = s0 s1 s2 … such that for all i≥0, (si,si+1) ∈ E. This defines the set of plays of A and the set of initial plays of A (those starting in sinit). A history π = s0 s1 … sn is a finite prefix of a play; it belongs to Player 1 if its last state, noted last(π)=sn, is in S1, and to Player 2 if last(π)∈S2.
In each round, the player that owns the current state s chooses a successor state s' which is E-adjacent to s, i.e. (s,s')∈E. The token is then moved to s' and a new round is started from there. The interaction is started from sinit.
A strategy λj for Player j maps histories of Player j to states: λj : S*.Sj → S, with the restriction that for all histories π ∈ Histj(A), (last(π),λj(π)) ∈ E, i.e. the strategy always proposes a (next) state that is E-adjacent to the last state of the history.
Strategy for Player 1 = One choice in each node
A play ρ is compatible with a strategy λj of Player j if for all i≥0 s.t. last(ρ(0..i)) ∈ Sj, ρ(i+1) = λj(ρ(0..i)), i.e. the position that follows a position that belongs to Player j is determined by λj. We write OutcomeA(s,λj) for the set of plays starting in s that are compatible with λj, and OutcomeA(s,λ1,λ2) for the unique play that starts in s and is compatible with both λ1 and λ2. When s is omitted, it is assumed to be equal to sinit, the initial state of A.
A strategy λj is memoryless if for all histories π,π' ∈ S*.Sj such that last(π)=last(π'), λj(π)=λj(π'), i.e. the strategy only depends on the last state of the history.
A strategy λj is finite-memory if there exists an equivalence relation ≈ on Histj(A) of finite index such that for all π,π' ∈ S*.Sj s.t. π≈π': λj(π)=λj(π'). Such a strategy can be represented as a synchronous transducer (a Moore machine).
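As an illustration, a finite-memory strategy can be implemented as such a transducer. The sketch below (Python; the two-valued memory and the concrete moves are made-up examples, not taken from the slides) alternates its choice at state 3:

```python
class TransducerStrategy:
    """A finite-memory strategy as a synchronous transducer (Moore machine):
    the memory is updated after each observed state, and the output
    function picks the next move from the current (state, memory) pair."""

    def __init__(self, init_mem, update, output):
        self.mem = init_mem      # current memory value
        self.update = update     # (state, mem) -> new mem
        self.output = output     # (state, mem) -> chosen successor

    def next_move(self, state):
        move = self.output(state, self.mem)
        self.mem = self.update(state, self.mem)
        return move

# Hypothetical two-memory strategy: from state 3, play 4 on every odd
# visit to 3 and play 1 on every even visit; in the other states of
# Player 1 play a fixed memoryless choice.
fixed = {1: 3, 4: 3}
strat = TransducerStrategy(
    init_mem=0,
    update=lambda s, m: 1 - m if s == 3 else m,   # flip memory on visits to 3
    output=lambda s, m: (4 if m == 0 else 1) if s == 3 else fixed[s],
)
moves = [strat.next_move(s) for s in [3, 4, 3, 1]]
print(moves)  # [4, 3, 1, 3]
```

The equivalence relation ≈ of finite index corresponds here to the finitely many memory values: two histories are equivalent when they lead to the same memory state.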
[Game graph over states 1–5, with a memoryless strategy for Player 1 highlighted]
[Transducer example] Strategy: from state 3, play 4 on every odd visit and 1 on every even visit.
A winning objective Winj ⊆ Sω is a set of plays: those are the « good » plays for Player j. A game is an arena together with two winning objectives, one for each player. In these lectures, we will only consider zero-sum games, and we will most often leave the winning objective of Player 2 implicit, because in this case Win2=Sω\Win1.
A strategy λj of Player j is a winning strategy if OutcomeA(sinit,λj)⊆Winj.
Is a game always winning for one player or for the other, i.e. is there always one player that has a winning strategy? A game (A,Win1), where Win1 is the winning objective for Player 1, is determined if either there exists a strategy λ1 for Player 1 s.t. Outcome(λ1)⊆Win1, or there exists a strategy λ2 for Player 2 s.t. Outcome(λ2)∩Win1=∅.
For all two-player turn-based arenas A=(S1,S2,E,sinit) with a zero-sum winning condition, for all Borel-definable objectives Win1, the game (A,Win1) is determined (Martin's Borel determinacy theorem). A Borel set is any set in a topological space that can be formed from open sets using countable unions, countable intersections, and complements. Rule of thumb: any reasonable set included in Sω that you can think of is a Borel set.
CPrej(T) = { s∈Sj | ∃s'∈T: (s,s')∈E } ∪ { s∈S3-j | ∀s'∈S: (s,s')∈E ⇒ s'∈T }, i.e. the set of states from which Player j can enforce the set T in one step.
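A minimal sketch of this one-step operator, assuming states are hashable values and E is given as a set of pairs (the small arena below is a made-up toy example):

```python
def cpre(S1, S2, E, T, j=1):
    """CPre_j(T): states from which Player j can force the token into T
    in exactly one step (existential choice at Player-j states,
    universal choice at the opponent's states)."""
    succ = {}
    for s, t in E:
        succ.setdefault(s, set()).add(t)
    own = S1 if j == 1 else S2
    other = S2 if j == 1 else S1
    some = {s for s in own if succ.get(s, set()) & T}        # ∃ successor in T
    every = {s for s in other
             if succ.get(s, set()) and succ[s] <= T}         # ∀ successors in T
    return some | every

# Toy arena (hypothetical): Player 1 owns {1, 3}, Player 2 owns {2}.
S1, S2 = {1, 3}, {2}
E = {(1, 2), (1, 3), (2, 3), (3, 3)}
print(cpre(S1, S2, E, {3}))  # {1, 2, 3}
```

State 2 belongs to Player 2 but has 3 as its only successor, so it is in CPre1({3}) even though Player 2 chooses the move there.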
Consider the sequence A0=T, Ai+1=Ai∪CPrej(Ai). This sequence is monotone, so it stabilizes in at most |S| steps, and we note A* the set on which the sequence stabilizes. A* is called the attractor for Player j of the set T, noted Attrj(T).
Example (G, states 1–6): A0=T={4}; A1=A0∪CPre1(A0)={1,3,4}; A2=A1∪CPre1(A1)={1,2,3,4}; A3=A2∪CPre1(A2)={1,2,3,4}=A2. So Attr1({4})={1,2,3,4}.
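The attractor computation can be sketched as the fixpoint iteration below (Python; the five-state arena in the snippet is a hypothetical one, not the graph G of the example):

```python
def attractor(S1, S2, E, T, j=1):
    """Attr_j(T): least fixpoint of A_{i+1} = A_i ∪ CPre_j(A_i) with
    A_0 = T; stabilizes after at most |S| iterations."""
    succ = {}
    for s, t in E:
        succ.setdefault(s, set()).add(t)
    own = S1 if j == 1 else S2
    A = set(T)
    changed = True
    while changed:
        changed = False
        for s in (S1 | S2) - A:
            outs = succ.get(s, set())
            # Player j needs SOME successor in A; the opponent must have
            # ALL successors in A for s to be added.
            if (outs & A) if s in own else (outs and outs <= A):
                A.add(s)
                changed = True
    return A

# Hypothetical arena: Player 1 owns {1, 3}, Player 2 owns {2, 4, 5}.
S1, S2 = {1, 3}, {2, 4, 5}
E = {(1, 2), (2, 3), (2, 5), (3, 4), (4, 4), (5, 5)}
print(attractor(S1, S2, E, {4}))  # {3, 4}
```

State 2 is not attracted: Player 2 can move from 2 to the sink 5 and avoid the target forever.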
For all i≥0, from every state s∈Ai there exists a strategy λj for Player j such that for all ρ∈Outcome(s,λj), there exists k, 0 ≤ k ≤ i, s.t. ρ(k)∈T, i.e. Ai is exactly the set of states from which Player j can force T in i steps or less.
Moreover, a memoryless strategy suffices: for each state s that belongs to Sj and Ai, either s∈T, or let k, 0≤k≤i, be the minimal index such that s∈Ak; then there exists a choice s' s.t. (s,s')∈E and s'∈Ak-1, and the strategy in s always chooses such an s'. It is easy to see that this memoryless strategy forces T in i steps or less from all s∈Ai.
So from all states in Attrj(T), there is a (memoryless) strategy λj for Player j such that for all ρ∈Outcome(λj), there exists k≥0 s.t. ρ(k)∈T.
A set U⊆S is a trap for Player j if for all s∈U∩Sj, for all s'∈S s.t. (s,s')∈E: s'∈U, and for all s∈U∩S3-j, there exists s'∈S s.t. (s,s')∈E and s'∈U. So a trap for Player j is a set of states from which this player cannot escape (and the other player can keep the play there).
A trapping strategy for Player 3-j is a strategy that in a state s∈U∩S3-j always chooses a successor which is in U (such a successor exists by definition). If U is a trap for Player j, then there is a memoryless trapping strategy (one which always chooses the same successor s' when entering a state s∈U∩S3-j).
So if U is a trap for Player j with U∩T=∅, then there is a (memoryless) strategy λ3-j of Player 3-j such that for all ρ∈Outcome(λ3-j), for all k≥0: ρ(k)∉T.
Given U⊆S such that every s∈U has a successor s'∈U with (s,s')∈E, A[U] denotes the sub-arena A[U]=(S1∩U,S2∩U,E∩(U×U)); it is total if A is total. In particular, if U is a trap, then A[U] is a sub-arena.
Example (G, states 1–6): Attr1({4})={1,2,3,4}, and S\Attr1({4})={5,6} is a trap for Player 1. It defines the sub-arena A[{5,6}].
Let A=(S1,S2,E,sinit) be a two-player turn-based arena. The reachability objective asks to visit a set of target states T ⊆ S: reach(T)={ ρ∈Sω | visit(ρ) ∩ T ≠ ∅ }, where visit(ρ) denotes the set of states visited by ρ. The safety objective asks a set of states T ⊆ S to remain in forever: safe(T)={ ρ∈Sω | visit(ρ) ⊆ T }.
Let A=(S1,S2,E,sinit) be a two-player turn-based arena. The Büchi objective asks a set of states T ⊆ S to be visited infinitely often: Büchi(T)={ ρ∈Sω | inf(ρ) ∩ T ≠ ∅ }, where inf(ρ) denotes the set of states visited infinitely often by ρ. The coBüchi objective asks a set of states T ⊆ S to be reached eventually and never be left: coBüchi(T)={ ρ∈Sω | inf(ρ) ⊆ T }. The parity objective is defined from a priority function p : S → {0,1,…,d} as the set of sequences in Sω s.t. the smallest priority visited infinitely often is even: parity(p)={ ρ∈Sω | min{ p(s) | s ∈ inf(ρ) } is even }.
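For ultimately periodic (lasso-shaped) plays prefix·cycle^ω, inf(ρ) is exactly the set of states on the cycle, so these objectives can be checked directly. A sketch (the priority function p and the sets below are made-up examples):

```python
def wins_buchi(T, cycle):
    """Büchi(T) on a lasso prefix·cycle^ω: some cycle state is in T."""
    return bool(set(cycle) & T)

def wins_cobuchi(T, cycle):
    """coBüchi(T) on a lasso: every state repeated forever is in T."""
    return set(cycle) <= T

def wins_parity(p, cycle):
    """parity(p) on a lasso: the minimum priority seen on the cycle
    (= the smallest priority visited infinitely often) is even."""
    return min(p[s] for s in cycle) % 2 == 0

# Hypothetical priorities on states 1..3
p = {1: 1, 2: 0, 3: 2}
print(wins_parity(p, [1, 3]))        # False: min priority on the cycle is 1
print(wins_parity(p, [2, 3]))        # True: min priority on the cycle is 0
print(wins_buchi({2}, [1, 3]))       # False: state 2 not repeated forever
print(wins_cobuchi({2, 3}, [2, 3]))  # True: the cycle stays inside {2, 3}
```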
Given a two-player game arena with a reachability objective reach(T), Player 1 wins if he has a winning strategy in this game, i.e. if there exists a strategy λ1 for Player 1 such that Outcome(λ1) ⊆ reach(T).
[Graph G, states 1–6] What are the states from which Player 1 can force reaching T={4}?
Solving a reachability game amounts to computing an attractor, and memoryless strategies suffice by the existence of memoryless strategies in attractors.
Player 1 has a winning strategy from state s∈S for the reachability objective reach(T) iff s∈Attr1(T). Furthermore, Player 1 has a winning strategy if and only if Player 1 has a memoryless winning strategy.
Given a two-player game arena with a safety objective safe(T), Player 1 wins if he has a winning strategy in this game, i.e. if there exists a strategy λ1 for Player 1 such that Outcome(λ1) ⊆ safe(T).
Safety games are dual to reachability games: safe(T)=Sω\reach(S\T). By determinacy and the duality between the two objectives, we get:
For every state s, Player 1 has a winning strategy from s in A for safe(T) if and only if Player 2 does not have a winning strategy from s in A for reach(S\T).
So Player 1 has a winning strategy from s for the safety objective safe(T) iff s∉Attr2(S\T).
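This duality gives a direct way to solve safety games. A sketch in Python, reusing the attractor fixpoint from before (the three-state arena is a hypothetical example):

```python
def attractor(S1, S2, E, T, j=1):
    """Attr_j(T): fixpoint of A_{i+1} = A_i ∪ CPre_j(A_i), A_0 = T."""
    succ = {}
    for s, t in E:
        succ.setdefault(s, set()).add(t)
    own = S1 if j == 1 else S2
    A = set(T)
    changed = True
    while changed:
        changed = False
        for s in (S1 | S2) - A:
            outs = succ.get(s, set())
            if (outs & A) if s in own else (outs and outs <= A):
                A.add(s)
                changed = True
    return A

def solve_safety(S1, S2, E, T):
    """Winning states of Player 1 for safe(T), by duality:
    W = S \\ Attr_2(S \\ T)."""
    S = S1 | S2
    return S - attractor(S1, S2, E, S - T, j=2)

# Hypothetical arena: Player 1 owns {1, 3}, Player 2 owns {2};
# state 3 is unsafe, i.e. T = {1, 2}.
S1, S2 = {1, 3}, {2}
E = {(1, 1), (1, 2), (2, 1), (2, 3), (3, 3)}
print(solve_safety(S1, S2, E, {1, 2}))  # {1}
```

Player 1 wins only from state 1, where the self-loop lets him stay safe forever; entering state 2 lets Player 2 move to the unsafe sink 3.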
Alternatively, consider the decreasing sequence U0=T, Ui+1=Ui∩CPre1(Ui). As Ui+1⊆Ui for all i and S has finitely many elements, the sequence stabilizes on the set U* after at most |S| steps. From every state s∈Ui, Player 1 has a strategy λ1 such that for all ρ∈Outcome(s,λ1), for all k, 0 ≤ k ≤ i, ρ(k)∈T, i.e. from all states s∈Ui, Player 1 has a strategy to stay within T for at least i steps. U* is exactly the set of winning states of Player 1 for the safety objective safe(T).
Player 1 has a winning strategy in A=(S1,S2,E,sinit,safe(T)) if and only if Player 1 has a winning memoryless strategy in this game, if and only if Player 2 has no winning memoryless strategy in this game. Indeed, Player 1 has a memoryless strategy from all states s∈U* to stay within T, and for all states that are not in U*, Player 2 has a memoryless strategy to reach an unsafe state in S\T, as his attractor strategy can be chosen memoryless.
So both objectives are memoryless determined.
The following theorem summarizes the previous results.
Player 1 has a winning strategy from s∈S for the reachability objective reach(T) iff he has a memoryless winning strategy; Player 2 has a winning strategy from s∈S for the safety objective safe(S\T)=Sω\reach(T) iff he has a memoryless winning strategy.
Given a two-player game arena with a Büchi objective Büchi(T), Player 1 wins if he has a winning strategy, i.e. if there exists a strategy λ1 for Player 1 such that Outcome(λ1) ⊆ Büchi(T).
SolveBüchi
input: A, T
  if S=∅ then return W:=∅
  Win1 := Attr1(T)
  if Win1 = S then return W := Win1
  else
    Lose1 := Attr2(S\Win1)
    return W := SolveBüchi(A[S\Lose1], T\Lose1)
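A possible Python rendering of this procedure (the arena in the example is a made-up one; `attractor` is the fixpoint computation described earlier):

```python
def attractor(S1, S2, E, T, j=1):
    """Attr_j(T): fixpoint of A_{i+1} = A_i ∪ CPre_j(A_i), A_0 = T."""
    succ = {}
    for s, t in E:
        succ.setdefault(s, set()).add(t)
    own = S1 if j == 1 else S2
    A = set(T)
    changed = True
    while changed:
        changed = False
        for s in (S1 | S2) - A:
            outs = succ.get(s, set())
            if (outs & A) if s in own else (outs and outs <= A):
                A.add(s)
                changed = True
    return A

def solve_buchi(S1, S2, E, T):
    """Winning states of Player 1 for Büchi(T), following SolveBüchi:
    if Attr_1(T) covers the arena Player 1 wins everywhere; otherwise
    remove Attr_2(S \\ Attr_1(T)) and recurse on the sub-arena."""
    S = S1 | S2
    if not S:
        return set()
    win1 = attractor(S1, S2, E, T & S)
    if win1 == S:
        return win1
    lose1 = attractor(S1, S2, E, S - win1, j=2)
    keep = S - lose1
    E2 = {(s, t) for s, t in E if s in keep and t in keep}
    return solve_buchi(S1 & keep, S2 & keep, E2, T & keep)

# Hypothetical arena: Player 2 can escape to the sink 3 and avoid T={1}.
S1, S2 = {1}, {2, 3}
E = {(1, 2), (2, 1), (2, 3), (3, 3)}
print(solve_buchi(S1, S2, E, {1}))  # set()
```

Each recursive call strictly shrinks the arena, so there are at most |S| calls, each dominated by two attractor computations.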
The procedure computes the set of states from which Player 1 has a winning strategy for the objective Win1=Büchi(T). (1) If Attr1(T)=S, then Player 1 wins from every s∈S. His strategy to win Büchi(T) is as follows: play the attractor strategy up to a first visit to T (which exists from all states in S), and repeat. (2) If Attr1(T)≠S, then for all states s∈S\Attr1(T), Player 2 has a strategy to always avoid T (as S\Attr1(T) is a trap for Player 1). So, Player 1 should avoid at all costs entering a state in S\Attr1(T). Clearly, Player 2 also wins from Attr2(S\Attr1(T)). So, we remove all states in this set and call the procedure SolveBüchi recursively on the sub-arena A[S\Attr2(S\Attr1(T))] with target states T\Attr2(S\Attr1(T)), which gives us, by induction, the set of states from which Player 1 can win.
Player 1 has a winning strategy from s∈S for Büchi(T) iff he has a memoryless winning strategy; Player 2 has a winning strategy from s∈S for the complement Sω\Büchi(T)=coBüchi(S\T) iff he has a memoryless winning strategy.
For Player 1, the winning strategy repeatedly plays an attractor strategy to reach T; such a strategy can be chosen to be memoryless. To prove the existence of memoryless winning strategies for Player 2, we reason by induction: if s was removed at the first call to the procedure SolveBüchi, then s belongs to a trap that does not contain any state of T, and a trapping strategy for Player 2 can be assumed to be memoryless. Otherwise, s was removed at a later call, and by induction there exists from the states removed earlier a memoryless strategy that wins the objective Sω\Büchi(T). In state s, Player 2 plays an attractor strategy to reach a state that was removed at an earlier round; such an attractor strategy can be chosen to be memoryless.