Knowledge-Based Policies for Qualitative Decentralized POMDPs


Knowledge-Based Policies for Qualitative Decentralized POMDPs. Abdallah Saffidine, Bruno Zanuttini, François Schwarzentruber. 68NQRT, January 25th, 2018.


SLIDE 1

Knowledge-based programs Semantics Mathematical properties Conclusion

Knowledge-Based Policies for Qualitative Decentralized POMDPs

Abdallah Saffidine, Bruno Zanuttini, François Schwarzentruber

68NQRT

January 25th, 2018

SLIDE 2

Automation of complex tasks

Building surveillance; Nuclear decommissioning; Intelligent farming.

SLIDE 3

Multiple robots

more robust/efficient than

SLIDE 6

Multiple robots

more robust/efficient than

Settings

Cooperative agents; Common goal; Imperfect information; Decentralized execution.

SLIDE 7

Methodology

[Diagram: Model, Goal, Planning; a's program, b's program, c's program]

SLIDE 8

Need: understandable system

Motivation

Legal issues in case of failure; Interaction with humans.

SLIDE 9

Our contribution: use of knowledge-based programs

KBP for agent a

listenRadio
if a knows strike
    toStation
else
    toAirport

KBP for agent b

readNewsPaper
if b knows strike
    toStation
else
    toAirport

Operational Semantics for Knowledge-based programs; (Un)decidability/complexity and succinctness. Extends: [Lang, Zanuttini, ECAI2012, TARK2013]

SLIDE 10

Outline

1. Knowledge-based programs: Epistemic formulas; Program constructions
2. Semantics
3. Mathematical properties
4. Conclusion

SLIDE 12

Properties expressed in epistemic logic

Language constructions

Atomic propositions: room 43 is safe; door 12 is locked; ...
Boolean connectives: not ...; (... or ...); (... and ...); (... → ...)
Epistemic operators: (... knows ...); (... knowswhether ...)

Example

(a knows door 12 is locked) and not (c knows door 12 is locked)

a knowswhether (c knows door 12 is locked)
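The language constructions above can be mirrored by a small abstract syntax tree. The following is only a sketch: the class names, the `show` helper and the choice to unfold `knowswhether` as "knows ϕ or knows not ϕ" (the usual reading of that operator) are assumptions made for illustration, not taken from the slides. Implication is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    sub: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


@dataclass(frozen=True)
class Knows:
    agent: str
    sub: object


def knows_whether(agent, phi):
    """Unfold 'agent knowswhether phi' as (agent knows phi) or (agent knows not phi)."""
    return Or(Knows(agent, phi), Knows(agent, Not(phi)))


def show(phi):
    """Render a formula in the textual syntax used on the slide."""
    if isinstance(phi, Atom):
        return phi.name
    if isinstance(phi, Not):
        return "not " + show(phi.sub)
    if isinstance(phi, And):
        return "(" + show(phi.left) + " and " + show(phi.right) + ")"
    if isinstance(phi, Or):
        return "(" + show(phi.left) + " or " + show(phi.right) + ")"
    if isinstance(phi, Knows):
        return "(" + phi.agent + " knows " + show(phi.sub) + ")"
    raise TypeError("unknown formula: %r" % (phi,))


# The two example formulas from the slide.
locked = Atom("door 12 is locked")
example1 = And(Knows("a", locked), Not(Knows("c", locked)))
example2 = knows_whether("a", Knows("c", locked))
```

Frozen dataclasses make formulas hashable and comparable by structure, which is convenient if they are later used as guards in programs.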

SLIDE 14

Program constructions

Language constructions

Actions: turn left; stay; broadcast temperature
Sequence: ...; ...
Conditional: if ϕ then ... else ...
Loop: while ϕ do ...

Example (knowledge-based program for agent a)

if a knows (door 12 is locked and justobserved( )) then
    turn left
    broadcast temperature
else
    stay
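As a rough illustration of these program constructions, the sketch below represents a program as a tree and walks it. All names (`Act`, `Seq`, `If`, `While`, `run`, `while_free`) are hypothetical. Guards are kept as opaque strings and evaluated by a caller-supplied function `holds`; deciding what an agent actually knows is the hard part and is deliberately left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Act:
    name: str


@dataclass(frozen=True)
class Seq:
    first: object
    second: object


@dataclass(frozen=True)
class If:
    guard: str
    then: object
    orelse: object


@dataclass(frozen=True)
class While:
    guard: str
    body: object


def run(prog, holds):
    """Yield the actions of prog in order; holds(guard) decides each test."""
    if isinstance(prog, Act):
        yield prog.name
    elif isinstance(prog, Seq):
        yield from run(prog.first, holds)
        yield from run(prog.second, holds)
    elif isinstance(prog, If):
        branch = prog.then if holds(prog.guard) else prog.orelse
        yield from run(branch, holds)
    elif isinstance(prog, While):
        while holds(prog.guard):
            yield from run(prog.body, holds)
    else:
        raise TypeError("unknown program: %r" % (prog,))


def while_free(prog):
    """True iff prog contains no while loop."""
    if isinstance(prog, While):
        return False
    if isinstance(prog, Seq):
        return while_free(prog.first) and while_free(prog.second)
    if isinstance(prog, If):
        return while_free(prog.then) and while_free(prog.orelse)
    return True


# The example program for agent a; the guard is an uninterpreted label here.
example = If(
    "a knows (door 12 is locked and justobserved( ))",
    Seq(Act("turn left"), Act("broadcast temperature")),
    Act("stay"),
)
```

Because `run` is a generator, the guard function is consulted lazily, at the moment each test is reached, which matches the idea that tests are evaluated against the agent's knowledge at that point of the execution.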

SLIDE 15

Outline

1. Knowledge-based programs
2. Semantics: Models: QdecPOMDP; Interlude: semantics of epistemic formulas; Operational semantics of KBPs
3. Mathematical properties
4. Conclusion

SLIDE 17

QdecPOMDP

Qualitative decentralized Partially Observable Markov Decision Processes = Concurrent game structures with observations.

Transitions of the form: state1 → state2, labelled with one action per agent (a: stay, b: turn left) and one observation per agent (a: [image], b: [image]);
A non-empty set of possible initial states;
A set of goal states.
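One possible way to hold such a model in memory is sketched below. The field names, the encoding of joint actions and joint observations as tuples ordered like `agents`, and the observation labels `"beep"` and `"nothing"` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class QdecPOMDP:
    agents: tuple          # e.g. ("a", "b")
    initial: frozenset     # non-empty set of possible initial states
    goals: frozenset       # set of goal states
    # (state, joint action) -> set of (next state, joint observation)
    transitions: dict

    def __post_init__(self):
        if not self.initial:
            raise ValueError("the set of possible initial states must be non-empty")

    def successors(self, state, joint_action):
        """All (next state, joint observation) pairs; non-determinism is qualitative."""
        return self.transitions.get((state, joint_action), set())


# Toy model with the single transition pictured on the slide.
toy = QdecPOMDP(
    agents=("a", "b"),
    initial=frozenset({"state1"}),
    goals=frozenset({"state2"}),
    transitions={
        ("state1", ("stay", "turn left")): {("state2", ("beep", "nothing"))},
    },
)
```

The transition relation maps to a set rather than a probability distribution, reflecting the qualitative (non-probabilistic) setting.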

SLIDE 18

States

Typically, a state describes: positions of agents; battery levels; etc.

SLIDE 20

Prototype

http://people.irisa.fr/Francois.Schwarzentruber/hintikkasworld/

SLIDE 21

Semantics of epistemic formulas

Epistemic structure $S$, world $w$:

$S, w \models a \text{ knows } \varphi$ iff for all $u$, $w \sim_a u$ implies $S, u \models \varphi$.
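The truth condition for "knows" translates directly into a recursive check over a finite epistemic structure. This is a minimal sketch: the tuple encoding of formulas, the names `val` and `rel`, and the two-world strike example are assumptions chosen for illustration.

```python
def holds(val, rel, world, phi):
    """Check S, world |= phi.

    val[w] is the set of atoms true in world w;
    rel[agent] is the set of pairs (w, u) such that w ~agent u.
    Formulas are tuples: ("atom", p), ("not", f), ("and", f, g), ("knows", agent, f).
    """
    kind = phi[0]
    if kind == "atom":
        return phi[1] in val[world]
    if kind == "not":
        return not holds(val, rel, world, phi[1])
    if kind == "and":
        return holds(val, rel, world, phi[1]) and holds(val, rel, world, phi[2])
    if kind == "knows":
        agent, sub = phi[1], phi[2]
        # phi must hold in every world the agent cannot tell apart from `world`.
        return all(holds(val, rel, u, sub) for (w, u) in rel[agent] if w == world)
    raise ValueError("unknown formula: %r" % (phi,))


# Two worlds: a strike is on in w1, not in w2.
val = {"w1": {"strike"}, "w2": set()}
rel = {
    "a": {("w1", "w1"), ("w2", "w2")},                              # a can tell them apart
    "b": {("w1", "w1"), ("w1", "w2"), ("w2", "w1"), ("w2", "w2")},  # b cannot
}
strike = ("atom", "strike")
```

In world `w1`, agent a knows `strike`, agent b does not, and a knows that b does not know it, which is an instance of higher-order knowledge.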

SLIDE 23

Operational semantics

Epistemic structure

Higher-order knowledge about: the current state of the QdecPOMDP; the current program counters in KBPs.

SLIDE 24

Assumptions

Common knowledge of: the QdecPOMDP; the KBPs; synchronicity of the system;

tests last 0 units of time; actions last 1 unit of time.

KBP for agent a

listenRadio
if a knows strike
    toStation
else
    toAirport

KBP for agent b

readNewsPaper
if b knows strike
    toStation
else
    toAirport

SLIDE 25

Epistemic structures at time T: worlds

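listenRadio
if K_a strike then
    toStation
else
    toAirport

Worlds = consistent (defined a few slides later) histories of the form

$s_0 \, \overrightarrow{pc}_0 \; \overrightarrow{obs}_1 \, s_1 \, \overrightarrow{pc}_1 \; \dots \; \overrightarrow{obs}_T \, s_T \, \overrightarrow{pc}_T$

where

$\overrightarrow{obs}_t$: vector of observations at time $t$;
$s_t$: state at time $t$;
$\overrightarrow{pc}_t$: vector of program counters at time $t$.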

SLIDE 26

Epistemic structures at time t: indistinguishability relations

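Agent a confuses two histories iff she has received the same observations.

$s_0 \, \overrightarrow{pc}_0 \; \overrightarrow{obs}_1 \, s_1 \, \overrightarrow{pc}_1 \; \dots \; \overrightarrow{obs}_T \, s_T \, \overrightarrow{pc}_T \;\sim_a\; s'_0 \, \overrightarrow{pc}'_0 \; \overrightarrow{obs}'_1 \, s'_1 \, \overrightarrow{pc}'_1 \; \dots \; \overrightarrow{obs}'_T \, s'_T \, \overrightarrow{pc}'_T$

iff for all $t \in \{1, \dots, T\}$, $(\overrightarrow{obs}_t)_a = (\overrightarrow{obs}'_t)_a$.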
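The indistinguishability relation only compares an agent's own components of the observation vectors. Below is a small sketch under assumed encodings: a history is a triple `(s0, pc0, steps)`, each step is `(obs, state, pcs)`, and `obs` maps agent names to observations. The concrete observation labels are made up for the example.

```python
def indistinguishable(agent, h1, h2):
    """True iff `agent` received the same observations along h1 and h2."""
    steps1, steps2 = h1[2], h2[2]
    # The relation is defined between histories of the same length T
    # (the system is synchronous), so different lengths are never confused.
    if len(steps1) != len(steps2):
        return False
    return all(
        obs1[agent] == obs2[agent]
        for (obs1, _, _), (obs2, _, _) in zip(steps1, steps2)
    )


# Histories of length T = 1; states and program counters are placeholders.
h_strike = ("strike", "pc0", [({"a": "radio: strike", "b": "nothing"}, "strike", "pc1")])
h_calm = ("no strike", "pc0", [({"a": "radio: no strike", "b": "nothing"}, "no strike", "pc1'")])
```

Note that states and program counters are ignored by the comparison: two histories may differ in both and still be confused by an agent whose observations coincide.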

SLIDE 27

Program counters

Definition (Program counter): a triple (guard, action just executed, continuation).

listenRadio
if K_a strike then
    toStation
else
    toAirport

(⊤, start, ...)
(⊤, listenRadio, ...)
(K_a strike, toStation, ...)
(¬K_a strike, toAirport, ...)

SLIDE 28

Control-flow graph

listenRadio
if K_a strike then
    toStation
else
    toAirport

Nodes of the control-flow graph:

(⊤, start, ...)
(⊤, listenRadio, ...)
(K_a strike, toStation, ...)
(¬K_a strike, toAirport, ...)
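For while-free programs, the program counters and the control-flow graph can be enumerated mechanically. The sketch below is one simplified reading of the (guard, action just executed, continuation) triples: the tuple encoding of programs, the helper names and the way nested guards are conjoined are assumptions, not the construction used by the authors.

```python
from collections import deque

TOP = "⊤"


def conj(guard, cond):
    """Conjoin a new condition onto the guard accumulated so far."""
    return cond if guard == TOP else "(" + guard + " ∧ " + cond + ")"


def next_pcs(cont, guard=TOP):
    """Yield the program counters reachable by executing one action of `cont`.

    `cont` is a tuple of statements; a statement is ("act", name) or
    ("if", cond, then_branch, else_branch), with branches being tuples too.
    """
    if not cont:
        return
    head, rest = cont[0], cont[1:]
    if head[0] == "act":
        yield (guard, head[1], rest)
    else:
        _, cond, then_b, else_b = head
        # Tests take no time: keep going until an action is reached.
        yield from next_pcs(then_b + rest, conj(guard, cond))
        yield from next_pcs(else_b + rest, conj(guard, "¬" + cond))


def control_flow_graph(program):
    """Breadth-first exploration from the initial program counter."""
    start = (TOP, "start", program)
    nodes, edges, todo = {start}, set(), deque([start])
    while todo:
        pc = todo.popleft()
        for nxt in next_pcs(pc[2]):
            edges.add((pc, nxt))
            if nxt not in nodes:
                nodes.add(nxt)
                todo.append(nxt)
    return nodes, edges


program = (
    ("act", "listenRadio"),
    ("if", "K_a strike", (("act", "toStation"),), (("act", "toAirport"),)),
)
nodes, edges = control_flow_graph(program)
```

On the running example this yields four nodes and three edges: start to listenRadio, then listenRadio to either toStation (guard K_a strike) or toAirport (guard ¬K_a strike).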

SLIDE 29

Consistent histories (explained with one agent)

In the QdecPOMDP:

$s_0 \xrightarrow{\text{listenRadio},\ \dots} s_1$ and $s_1 \xrightarrow{\text{toStation},\ \dots} s_2$

KBP control-flow graph:

listenRadio
if K_a strike then
    toStation
else
    toAirport

(⊤, start, ...)
(⊤, listenRadio, ...)
(K_a strike, toStation, ...)
(¬K_a strike, toAirport, ...)

Consistent history:

s0 with (⊤, start, ...);
s1 with (⊤, listenRadio, ...), where |= K_a strike;
s2 with (K_a strike, toStation, ...).

SLIDE 30

Outline

1. Knowledge-based programs
2. Semantics
3. Mathematical properties: Verification; Execution problem; Succinctness
4. Conclusion

SLIDE 32

Verification problem

Input: a QdecPOMDP model; knowledge-based programs, one for each agent.
Output: yes if all executions of the KBPs lead to a goal state; no otherwise.

SLIDE 33

Verification problem for while-free KBPs

Theorem. The verification problem for while-free KBPs is PSPACE-complete.

Proof idea. Upper bound: on-the-fly model checking; Lower bound: reduction from TQBF.

[Figure: agent 1, value of p1; agent 2, value of p2; agent 3, value of p3]
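For readers unfamiliar with TQBF (deciding the truth of a fully quantified Boolean formula), here is a brute-force evaluator. It is not the reduction from the proof, only an illustration of the source problem; the encoding of the prefix as a list of `(quantifier, variable)` pairs and of the matrix as a Python function is an assumption.

```python
def eval_qbf(prefix, matrix, assignment=None):
    """Evaluate a closed quantified Boolean formula.

    prefix: list of ("forall" | "exists", variable name), outermost first.
    matrix: function taking a dict {variable: bool} and returning a bool.
    """
    assignment = dict(assignment or {})
    if not prefix:
        return matrix(assignment)
    (quantifier, var), rest = prefix[0], prefix[1:]
    if quantifier not in ("forall", "exists"):
        raise ValueError("unknown quantifier: %r" % (quantifier,))
    # Lazily try both truth values of the outermost variable.
    branches = (
        eval_qbf(rest, matrix, {**assignment, var: value})
        for value in (False, True)
    )
    return all(branches) if quantifier == "forall" else any(branches)


def differ(v):
    """Matrix of the example: p1 and p2 take different values."""
    return v["p1"] != v["p2"]
```

The recursion depth equals the number of variables and only one partial assignment is alive per level, so the memory used is polynomial even though the running time is exponential. For instance, "for all p1 there exists p2 with p1 ≠ p2" is true, while swapping the quantifier order makes it false.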

SLIDE 37

Verification problem for general KBPs

Theorem. The verification problem for general KBPs is undecidable.

Proof idea. Reduction from the halting problem of a Turing machine on input ε.

SLIDE 39

Execution problem

Input: an agent a; a QdecPOMDP model; policies (e.g. KBPs), one for each agent; a local view of the history for agent a.
Output: the action act that agent a should take.

SLIDE 40

Execution problem (decision version)

Input: an agent a; a QdecPOMDP model; policies (e.g. KBPs), one for each agent; a local view of the history for agent a; an action act.
Output: yes if the next action of agent a is act; no otherwise.

SLIDE 41

Reactive policy representation

Definition (reactive policy representation). A class of policy representations is reactive iff its corresponding execution problem is in P.

Example (tree policies are a reactive policy representation)

if justobserved( ) then turn left else stay

Unless P = PSPACE, KBPs are not reactive. Indeed:

Proposition. The execution problem for KBPs is PSPACE-complete.
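To see why tree policies are reactive, note that executing one amounts to a single walk down the tree guided by the agent's own observations. The sketch below assumes a hypothetical encoding: a node is `("act", action, rest)`, `("if", observation, then, else)` or `None`, and a test compares against the observation received after the most recent action. The observation label `"door open"` is made up, since the original argument of `justobserved` is a picture.

```python
def next_action(policy, observations):
    """Return the next action given the observations received so far.

    observations[i] is what the agent observed after its i-th action.
    Returns None once the policy has no action left.
    """
    node, consumed, last = policy, 0, None
    while node is not None:
        if node[0] == "if":
            _, expected, then_b, else_b = node
            node = then_b if last == expected else else_b
        else:
            _, action, rest = node
            if consumed == len(observations):
                return action          # this action has not been executed yet
            last = observations[consumed]
            consumed += 1
            node = rest
    return None


policy = (
    "act", "stay",
    ("if", "door open", ("act", "turn left", None), ("act", "stay", None)),
)
```

Each loop iteration moves one level down the tree, so the running time is linear in the depth of the policy, with no reasoning about the other agents' knowledge. This is the contrast with KBPs, where evaluating a guard requires epistemic model checking.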

SLIDE 43

Modal depth

Modal depth = number of nested ‘... knows ...’ operators.

Formula                 Modal depth
justobserved( )         0
a knows p               1
a knows (b knows p)     2
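Modal depth is a straightforward structural recursion on formulas. A minimal sketch, assuming formulas are encoded as nested tuples (`("atom", p)`, `("not", f)`, `("and", f, g)`, `("or", f, g)`, `("knows", agent, f)`), an encoding chosen here only for illustration:

```python
def modal_depth(phi):
    """Maximum number of nested 'knows' operators in phi."""
    kind = phi[0]
    if kind == "atom":
        return 0
    if kind == "not":
        return modal_depth(phi[1])
    if kind in ("and", "or"):
        return max(modal_depth(phi[1]), modal_depth(phi[2]))
    if kind == "knows":
        return 1 + modal_depth(phi[2])
    raise ValueError("unknown formula: %r" % (phi,))


p = ("atom", "p")
rows = [
    ("atom", "justobserved"),
    ("knows", "a", p),
    ("knows", "a", ("knows", "b", p)),
]
depths = [modal_depth(phi) for phi in rows]
```

Boolean connectives take the maximum over their arguments, so a conjunction of two depth-1 formulas still has depth 1; only nesting under `knows` increases the depth.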

SLIDE 45

Succinctness

Theorem ([Lang, Zanuttini, 2012] for d = 1; [AAAI2018] for d > 1)

Let $d \geq 1$. There is a poly(n)-size QdecPOMDP family $(M_{n,d})_{n \in \mathbb{N}}$ for which:

1. there is a valid KBP family of modal depth d and size poly(n);
2. there is no valid KBP family of modal depth d − 1;
3. assuming NP ⊄ P/poly, for any reactive policy representation, there is no valid policy family of size poly(n).

Proof idea. $M_{n,d}$: run a poly(n)-time protocol revealing a poly(n)-size 3-CNF β; β is satisfiable iff an epistemic property holds that is expressible at modal depth d but not at modal depth d − 1.

SLIDE 47

Conclusion

[Diagram: Model, Goal, Planning; a's KBP, b's KBP, c's KBP; a's reactive policy, b's reactive policy, c's reactive policy]

SLIDE 48

Perspectives

Implementation of the verification problem; Heuristics for the planning problem; More tractable fragments; decPOMDP (with probabilities); Temporal properties; Strategic reasoning; Develop proof systems for KBPs. Use of Coq?

SLIDE 49

Coming soon... New graphics for Hintikka’s world...

SLIDE 50

Trugarez bras. Merci. Thank you. Dank u wel. Feel free to use it!

http://people.irisa.fr/Francois.Schwarzentruber/hintikkasworld/
