The dynamic logic of policies and contingent planning


SLIDE 1

The dynamic logic of policies and contingent planning

Andreas Herzig

CNRS, IRIT

joint work with Thomas Bolander, Thorsten Engesser, Robert Mattmüller, Bernhard Nebel

Journées MAFTEC, Rennes, Dec. 6, 2018

SLIDE 2

Motivation

understand policies

  • standard concept in the planning literature
  • in the game theory literature: strategies
  • central in contingent planning
  • only semantically defined

syntactic counterpart?

can we describe policies by PDL programs?

long-term aim: understand policies for epistemic planning

⇒ understand implicitly coordinated policies

SLIDE 3

Action models

set of action names Act

PDL model = triple MAct = (W, {Ra}a∈Act, V) where

W: non-empty set

(‘states’, ‘possible worlds’)

Ra ⊆ W × W

(‘action a’s transition relation’)

V : W −→ 2Prp

(‘valuation’)
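A model like this can be sketched directly as plain Python data. The concrete states, actions, and propositions below are my own illustration (echoing the lottery example used later in the talk), not anything fixed by the slides:

```python
# PDL model M_Act = (W, {R_a}_{a in Act}, V), as plain Python data.
# States, actions, and the valuation are illustrative only.
W = {"s0", "s_win", "s_lose", "s_ferrari"}

# Each R_a is a set of (source, target) pairs; playLottery is nondeterministic.
R = {
    "playLottery": {("s0", "s_win"), ("s0", "s_lose")},
    "buyFerrari":  {("s_win", "s_ferrari")},
}

# V maps each state to the atomic propositions true there.
V = {
    "s0": set(),
    "s_win": {"Rich"},
    "s_lose": set(),
    "s_ferrari": {"Rich", "HaveFerrari"},
}

def successors(R, a, s):
    """R_a(s): the set of a-successors of state s."""
    return {t for (u, t) in R.get(a, set()) if u == s}
```

For instance, `successors(R, "playLottery", "s0")` returns both lottery outcomes, reflecting the nondeterminism of the action.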

SLIDE 4

Outline

1. Planning tasks and strong policies
2. PDL: language and semantics
3. Extending PDL to account for policies
4. Programs for policies

SLIDE 5

Planning tasks

(S0, γ, MAct)

where

S0 ⊆ W

(‘set of initial states’)

γ = boolean formula

(‘goal’)

MAct = PDL model

SLIDE 6

Policies

policy = relation

Λ ⊆ W × (Act ∪ {stop})

  • cf. ‘state-action table’ [Cimatti&Roveri 2003]

Λ is defined at a set of states S ⊆ W iff for every s ∈ S there is an x ∈ Act ∪ {stop} such that (s, x) ∈ Λ

Λ is strongly executable iff for every (s, a) ∈ Λ:

1. Ra(s) ≠ ∅
2. Λ is defined at Ra(s)
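The two conditions above can be checked mechanically. A minimal sketch (my encoding: a policy is a Python set of pairs, the model names are illustrative):

```python
# A policy Λ ⊆ W × (Act ∪ {stop}), encoded as a set of pairs.
# Small illustrative model, repeated here for self-containment.
R = {
    "playLottery": {("s0", "s_win"), ("s0", "s_lose")},
    "buyFerrari":  {("s_win", "s_ferrari")},
}
STOP = "stop"

def defined_at(policy, S):
    """Λ is defined at S iff every s in S occurs in some pair of Λ."""
    covered = {s for (s, _) in policy}
    return S <= covered

def strongly_executable(policy, R):
    """For every (s, a) in Λ with a an action: R_a(s) is nonempty
    and Λ is defined at R_a(s)."""
    for (s, a) in policy:
        if a == STOP:
            continue
        succ = {t for (u, t) in R.get(a, set()) if u == s}
        if not succ or not defined_at(policy, succ):
            return False
    return True

# Stops after playing the lottery, whatever the outcome:
safe = {("s0", "playLottery"), ("s_win", STOP), ("s_lose", STOP)}
# Undefined at s_lose and at buyFerrari's successor s_ferrari:
risky = {("s0", "playLottery"), ("s_win", "buyFerrari")}
```

Here `safe` passes both conditions, while `risky` fails: it says nothing about the losing outcome of the lottery.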

SLIDE 7

Strong solutions to planning tasks

Stop(Λ) = {t ∈ W : (t, stop) ∈ Λ}

‘license to stop’

all ‘stop states’ must satisfy the goal (v.i.)

if a policy contains (s, a) and (s, stop) then at s one may both stop and perform a: both must lead to the goal

necessary if we want nondeterministic policies

Λ is a strong solution of planning task (S0, γ, MAct) iff:

1. Λ acyclic, finite, strongly executable
2. Λ defined at S0
3. MAct, Stop(Λ) ⊨ γ

example:

task = (S0, HaveFerrari, MAct)
sequential plan = playLottery; buyFerrari

  • a weak solution, not a strong solution

SLIDE 8

Properties

If Λ is a strong solution of (S, γ, MAct), then Λ is a strong solution of (Ra(S), γ, MAct).

If Λ1 and Λ2 are both strong solutions of (S0, γ, MAct) and Λ1 ∪ Λ2 is acyclic, then Λ1 ∪ Λ2 is a strong solution of (S0, γ, MAct).

SLIDE 9

Outline

1. Planning tasks and strong policies
2. PDL: language and semantics
3. Extending PDL to account for policies
4. Programs for policies

SLIDE 10

PDL: language

grammar of formulas ϕ and programs π:

ϕ ::= p | ¬ϕ | ϕ ∧ ϕ | ⟨π⟩ϕ | [π]ϕ
π ::= a | π; π | π∪π | ϕ?

where p ∈ Prp and a ∈ Act

[π]ϕ = “ϕ true after every possible execution of π”
⟨π⟩ϕ = “ϕ true after some possible execution of π”

so: ⟨π⟩ϕ ↔ ¬[π]¬ϕ

N.B.: no Kleene star

SLIDE 11

PDL: semantics

interpretation of programs:

Rπ1;π2 = Rπ1 ◦ Rπ2
Rπ1∪π2 = Rπ1 ∪ Rπ2
Rψ? = {(s, s) : MAct, s ⊨ ψ}

interpretation of formulas:

MAct, s ⊨ p iff p ∈ V(s)
MAct, s ⊨ ¬ϕ iff . . .
MAct, s ⊨ ϕ ∧ ψ iff . . .
MAct, s ⊨ ⟨π⟩ϕ iff MAct, t ⊨ ϕ for some t ∈ Rπ(s)
MAct, s ⊨ [π]ϕ iff MAct, t ⊨ ϕ for every t ∈ Rπ(s)
            iff MAct, Rπ(s) ⊨ ϕ

where:

Rπ(S) = ⋃s∈S Rπ(s)
MAct, S ⊨ ϕ iff MAct, s ⊨ ϕ for every s ∈ S
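These clauses translate almost line by line into a small evaluator. A sketch (my encoding: formulas and programs as nested tuples; the model is illustrative). Note that on the lottery example both ⟨plan⟩HaveFerrari and [plan]HaveFerrari come out true at s0, since the box is vacuously satisfied on the losing branch where buyFerrari cannot run:

```python
# Evaluator for star-free PDL over a small illustrative model.
R = {
    "playLottery": {("s0", "s_win"), ("s0", "s_lose")},
    "buyFerrari":  {("s_win", "s_ferrari")},
}
V = {"s0": set(), "s_win": {"Rich"}, "s_lose": set(),
     "s_ferrari": {"Rich", "HaveFerrari"}}

def run(prog, s):
    """R_pi(s): states reachable from s by one execution of prog."""
    op = prog[0]
    if op == "act":     # atomic action
        return {t for (u, t) in R.get(prog[1], set()) if u == s}
    if op == "seq":     # R_{pi1;pi2} = R_pi1 composed with R_pi2
        return {t2 for t1 in run(prog[1], s) for t2 in run(prog[2], t1)}
    if op == "choice":  # R_{pi1 ∪ pi2} = R_pi1 ∪ R_pi2
        return run(prog[1], s) | run(prog[2], s)
    if op == "test":    # R_{psi?} = identity on psi-states
        return {s} if holds(prog[1], s) else set()
    raise ValueError(op)

def holds(phi, s):
    """M_Act, s ⊨ phi."""
    op = phi[0]
    if op == "top":  return True
    if op == "atom": return phi[1] in V[s]
    if op == "not":  return not holds(phi[1], s)
    if op == "and":  return holds(phi[1], s) and holds(phi[2], s)
    if op == "dia":  return any(holds(phi[2], t) for t in run(phi[1], s))
    if op == "box":  return all(holds(phi[2], t) for t in run(phi[1], s))
    raise ValueError(op)

plan = ("seq", ("act", "playLottery"), ("act", "buyFerrari"))
weak = holds(("dia", plan, ("atom", "HaveFerrari")), "s0")  # some run succeeds
box = holds(("box", plan, ("atom", "HaveFerrari")), "s0")   # vacuous on s_lose
```

That `box` is true here is exactly why plain [π] cannot express strong solutions, as the next slides argue.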

SLIDE 12

PDL: weak executability

inexecutability:

[π]⊥ = “⊥ true after every possible execution of π” = “π is inexecutable”

executability:

wx π =def ¬[π]⊥ = ⟨π⟩⊤ = “π is weakly executable”

weak: I can get by with a little help from nature. . .

wx playLottery; buyFerrari

cannot account for strong solutions of planning tasks

SLIDE 13

Outline

1. Planning tasks and strong policies
2. PDL: language and semantics
3. Extending PDL to account for policies
4. Programs for policies

SLIDE 14

PDL: a new modality

when a is deterministic:

“a guarantees outcome ϕ” = ⟨a⟩ϕ

when actions can be nondeterministic:

“a guarantees outcome ϕ” = ⟨a⟩⊤ ∧ [a]ϕ
                        = wx a ∧ [a]ϕ
                        =def ⟨[a]⟩ϕ

SLIDE 15

PDL: a new modality (ctd.)

straightforward extension to sequential plans/programs:

“a1; · · · ; an guarantees ϕ”
= “a1 guarantees (a2; · · · ; an guarantees ϕ)”
= ⟨[a1]⟩ · · · ⟨[an]⟩ϕ
=def ⟨[a1; · · · ; an]⟩ϕ

characterises strong sequential solutions to planning tasks:

a1; · · · ; an strong solution of (S0, γ, MAct) iff MAct, S0 ⊨ ⟨[a1; · · · ; an]⟩γ

example:

MAct, S0 ⊨ ⟨playLottery; buyFerrari⟩HaveFerrari   (weak)
MAct, S0 ⊭ ⟨[playLottery; buyFerrari]⟩HaveFerrari   (not strong)

SLIDE 16

PDL: strong executability

strong executability:

sx a1; · · · ; an
=def ⟨[a1; · · · ; an]⟩⊤
= ⟨[a1]⟩ · · · ⟨[an]⟩⊤
= ⟨a1⟩⊤ ∧ [a1](⟨[a2]⟩ · · · ⟨[an]⟩⊤)
= . . .
↔ wx a1 ∧ [a1]wx a2 ∧ · · · ∧ [a1] · · · [an−1]wx an
= “a1; · · · ; an is strongly executable”

examples:

sx playLottery = wx playLottery
sx playLottery; buyFerrari = wx playLottery ∧ [playLottery]sx buyFerrari

SLIDE 17

PDL: strong executability (ctd.)

⟨[π]⟩ϕ can be reduced to sx π:

⟨[a1; · · · ; an]⟩ϕ
= ⟨[a1]⟩ · · · ⟨[an]⟩ϕ
= ⟨a1⟩⊤ ∧ [a1](⟨[a2]⟩ · · · ⟨[an]⟩ϕ)
= . . .
↔ wx a1 ∧ [a1]wx a2 ∧ · · · ∧ [a1] · · · [an−1]wx an ∧ [a1][a2] · · · [an]ϕ
= sx a1; · · · ; an ∧ [a1; · · · ; an]ϕ

strong executability of a nondeterministic composition?

sx π1∪π2 = ???

SLIDE 18

Extended PDL: language

formulas ϕ and programs π:

ϕ ::= p | ¬ϕ | ϕ ∧ ϕ | ⟨π⟩ϕ | [π]ϕ | ⟨[π]⟩ϕ
π ::= a | π; π | π∪π | ϕ?

where p ∈ Prp and a ∈ Act

⟨[π]⟩ϕ = “π is strongly executable and guarantees ϕ”

nondeterministic choice π1∪π2:

  • nature chooses between (possible executions of) π1 and π2
  • nature can only choose a πi that is weakly executable
  • π1∪π2 equivalent to (wx π1?; π1)∪(wx π2?; π2)
  • consequence: ⟨[π1∪π2]⟩ϕ ↔ wx (π1∪π2) ∧ (wx π1 → ⟨[π1]⟩ϕ) ∧ (wx π2 → ⟨[π2]⟩ϕ)
  • cf. [π1∪π2]ϕ ↔ (wx π1 ∨ wx π2) ∧ (wx π1 → [π1]ϕ) ∧ (wx π2 → [π2]ϕ)

SLIDE 19

Extended PDL: interpretation of formulas

MAct, s ⊨ ⟨[a]⟩ϕ iff Ra(s) ≠ ∅ and MAct, Ra(s) ⊨ ϕ

MAct, s ⊨ ⟨[π1; π2]⟩ϕ iff MAct, s ⊨ ⟨[π1]⟩⟨[π2]⟩ϕ

MAct, s ⊨ ⟨[π1∪π2]⟩ϕ iff MAct, s ⊨ ⟨π1⟩⊤ or MAct, s ⊨ ⟨π2⟩⊤, and
    if MAct, s ⊨ ⟨π1⟩⊤ then MAct, s ⊨ ⟨[π1]⟩ϕ, and
    if MAct, s ⊨ ⟨π2⟩⊤ then MAct, s ⊨ ⟨[π2]⟩ϕ

MAct, s ⊨ ⟨[ψ?]⟩ϕ iff MAct, s ⊨ ψ ∧ ϕ
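The four clauses can be executed directly. A sketch (my encoding: programs as nested tuples, ϕ as a Python predicate on states; the model, including the extra action workHard, is my own illustration):

```python
# M, s ⊨ ⟨[prog]⟩phi, clause by clause, over a small illustrative model.
R = {
    "playLottery": {("s0", "s_win"), ("s0", "s_lose")},
    "workHard":    {("s0", "s_rich")},
}
RICH = {"s_win", "s_rich"}  # states where Rich holds (illustrative)

def run(prog, s):
    """R_pi(s), as in plain PDL."""
    op = prog[0]
    if op == "act":    return {t for (u, t) in R.get(prog[1], set()) if u == s}
    if op == "seq":    return {t2 for t1 in run(prog[1], s) for t2 in run(prog[2], t1)}
    if op == "choice": return run(prog[1], s) | run(prog[2], s)
    if op == "test":   return {s} if prog[1](s) else set()

def strong(prog, phi, s):
    """M, s ⊨ ⟨[prog]⟩phi."""
    op = prog[0]
    if op == "act":     # some successor, and phi at all of them
        succ = run(prog, s)
        return bool(succ) and all(phi(t) for t in succ)
    if op == "seq":     # ⟨[pi1]⟩⟨[pi2]⟩phi
        return strong(prog[1], lambda t: strong(prog[2], phi, t), s)
    if op == "choice":  # only weakly executable disjuncts count
        w1, w2 = bool(run(prog[1], s)), bool(run(prog[2], s))
        return ((w1 or w2)
                and (not w1 or strong(prog[1], phi, s))
                and (not w2 or strong(prog[2], phi, s)))
    if op == "test":    # psi ∧ phi
        return prog[1](s) and phi(s)

rich = lambda t: t in RICH
lottery = ("act", "playLottery")
work = ("act", "workHard")
blocked = ("seq", lottery, ("test", lambda t: False))  # playLottery; ⊥?
```

Running it reproduces the nondeterminism examples of the next slide: `strong(lottery, rich, "s0")` fails because nature may pick the losing outcome, while prefixing the lottery branch with ⊥? removes it from nature's menu and makes the choice with workHard strong.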

SLIDE 20

Extended PDL: nondeterminism

two kinds of nondeterminism:

1. atomic actions: nature chooses among possible executions

MAct, s ⊭ ⟨[playLottery]⟩Rich

2. π1∪π2: nature chooses among (weakly) executable disjuncts

MAct, s ⊭ ⟨[playLottery∪workHard]⟩Rich
MAct, s ⊨ ⟨[(playLottery; ⊥?)∪workHard]⟩Rich
SLIDE 21

Axiomatics

axiomatics of dynamic logic, plus reduction axioms for program operators:

⟨[ψ?]⟩ϕ ↔ ψ ∧ ϕ
⟨[π1; π2]⟩ϕ ↔ ⟨[π1]⟩⟨[π2]⟩ϕ
⟨[π1∪π2]⟩ϕ ↔ (⟨π1⟩⊤ ∨ ⟨π2⟩⊤) ∧ (⟨π1⟩⊤ → ⟨[π1]⟩ϕ) ∧ (⟨π2⟩⊤ → ⟨[π2]⟩ϕ)

reduction axiom for atomic programs:

⟨[a]⟩ϕ ↔ ⟨a⟩⊤ ∧ [a]ϕ

SLIDE 22

Alternative axiomatisation

⟨[π]⟩ϕ can be reduced to sx π:

⟨[π]⟩ϕ ↔ sx π ∧ [π]ϕ

axiomatics of strong executability:

sx a ↔ wx a
sx ψ? ↔ ψ
sx (π1; π2) ↔ sx π1 ∧ [π1]sx π2
sx (π1∪π2) ↔ (sx π1 ∧ sx π2) ∨ (sx π1 ∧ ¬wx π2) ∨ (¬wx π1 ∧ sx π2)
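These sx axioms are directly executable as a recursion on program syntax. A sketch (my encoding; the coin model is my own, anticipating the toss example on the next slide):

```python
# sx computed by the reduction axioms above, over a tiny coin model.
R = {"toss": {("s", "heads"), ("s", "tails")}}

def run(prog, s):
    """R_pi(s), as in plain PDL."""
    op = prog[0]
    if op == "act":    return {t for (u, t) in R.get(prog[1], set()) if u == s}
    if op == "seq":    return {t2 for t1 in run(prog[1], s) for t2 in run(prog[2], t1)}
    if op == "choice": return run(prog[1], s) | run(prog[2], s)
    if op == "test":   return {s} if prog[1](s) else set()

def wx(prog, s):
    """Weak executability: some execution of prog exists."""
    return bool(run(prog, s))

def sx(prog, s):
    """Strong executability, following the axioms clause by clause."""
    op = prog[0]
    if op == "act":  return wx(prog, s)          # sx a ↔ wx a
    if op == "test": return prog[1](s)           # sx psi? ↔ psi
    if op == "seq":  # sx pi1 ∧ [pi1] sx pi2
        return sx(prog[1], s) and all(sx(prog[2], t) for t in run(prog[1], s))
    if op == "choice":
        s1, s2 = sx(prog[1], s), sx(prog[2], s)
        w1, w2 = wx(prog[1], s), wx(prog[2], s)
        return (s1 and s2) or (s1 and not w2) or (not w1 and s2)

toss = ("act", "toss")
heads = ("test", lambda t: t == "heads")
tails = ("test", lambda t: t == "tails")
split = ("choice", ("seq", toss, heads), ("seq", toss, tails))
merged = ("seq", toss, ("choice", heads, tails))
```

At state `s`, `sx(toss, "s")` holds but `sx(split, "s")` does not, while `sx(merged, "s")` holds again, matching the (non-)equivalences on the next slide.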

SLIDE 23

Properties

some programs are no longer equivalent:

⊭ sx toss ↔ sx ((toss; Heads?)∪(toss; ¬Heads?))

reason: MAct, s ⊨ wx (toss; Heads?) but MAct, s ⊭ sx (toss; Heads?)

however: ⊨ sx toss ↔ sx (toss; (Heads?∪¬Heads?))

SLIDE 24

Outline

1. Planning tasks and strong policies
2. PDL: language and semantics
3. Extending PDL to account for policies
4. Programs for policies

SLIDE 25

From programs to policies

given program π and set of states S, recursively define policy Pol

  • S, π
  • :

Pol

  • S, a
  • = (S × {a}) ∪ (Ra(S) × {stop})

Pol

  • {s0}, playL
  • = {(s0, playL), (swin, stop), (sloose, stop)}

Pol

  • S, π1; π2
  • = Pol
  • S, π1

−stop ∪ Pol

  • Stop(Pol
  • S, π1
  • ), π2
  • Pol
  • {s0}, playL; buyF
  • = {(s0, playL), (swin, buyF)}

Pol

  • S, π1∪π2
  • = Pol
  • S, π1
  • ∪ Pol
  • S, π2
  • Pol
  • {L, ¬L}, L?∪(¬L?; toggle)
  • = {(L, stop), (¬L, toggle)}

Theorem

Let MAct be acyclic and finitely branching. Let S0 be finite. If MAct, S0 (

[π] )γ then Pol

  • S0, π
  • strong solution for (S0, γ, MAct).
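The recursion for Pol can be sketched directly (my encoding; the ψ? clause is my reconstruction from the L?∪(¬L?; toggle) example, and the slide's abbreviated examples may elide pairs on dead branches that a literal reading of the clauses produces):

```python
# Pol⟨S, π⟩ following the clauses above; model names are illustrative.
STOP = "stop"
R = {
    "playL": {("s0", "s_win"), ("s0", "s_lose")},
    "buyF":  {("s_win", "s_ferrari")},
}

def pol(S, prog):
    """The policy Pol⟨S, prog⟩ as a set of (state, action-or-stop) pairs."""
    op = prog[0]
    if op == "act":    # (S × {a}) ∪ (R_a(S) × {stop})
        a = prog[1]
        succ = {t for (u, t) in R.get(a, set()) if u in S}
        return {(s, a) for s in S} | {(t, STOP) for t in succ}
    if op == "seq":    # drop pi1's stop pairs, continue with pi2 from them
        p1 = pol(S, prog[1])
        stops = {s for (s, x) in p1 if x == STOP}
        return {(s, x) for (s, x) in p1 if x != STOP} | pol(stops, prog[2])
    if op == "choice":
        return pol(S, prog[1]) | pol(S, prog[2])
    if op == "test":   # assumption: stop at the states satisfying the test
        return {(s, STOP) for s in S if prog[1](s)}

lottery_then_car = ("seq", ("act", "playL"), ("act", "buyF"))
```

On the atomic case this reproduces the slide's first example exactly; on `lottery_then_car` it contains the pairs (s0, playL) and (swin, buyF) from the second example.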

SLIDE 26

From policies to programs

for a finite model MAct such that valuations can be identified with states:

define a characteristic formula χs for each state s:
MAct, s ⊨ χt iff t = s

for a finite and acyclic policy Λ and a given set of states S0, recursively define the program progΛ,S0:

  • uses the characteristic formulas χs
  • well-defined because Λ is finite and acyclic

Theorem
Let MAct be such that V(s) = V(s′) implies s = s′. If Λ is a strong solution for (S0, γ, MAct) then MAct, S0 ⊨ ⟨[progΛ,S0]⟩γ.

SLIDE 27

Conclusion

logic of ⟨[π]⟩ when π is a complex program

difficulty: good semantics of nondeterministic choice
  • cf. semantics of strong permission in deontic logic

characterisation of strong executability

⟨[·]⟩ is almost a normal modal operator:
  • satisfies all principles but necessitation
  • moreover satisfies axiom D: ¬⟨[a]⟩⊥

syntactic counterpart of policies: there is a strong solution of (S0, γ, MAct) iff there is a π such that MAct, S0 ⊨ ⟨[π]⟩γ

perspective: implicitly coordinated policies
