Knowledge-Based Programs as Explainable Policies for Contingent Planning
- J. Lang, A. Saffidine, F. Schwarzentruber, B. Zanuttini
MAFTEC, April 1, 2019
Lang, Saffidine, Schwarzentruber, Zanuttini KBPs as Explainable Policies 1/40
[Figure: policy tree for a Minesweeper instance; nodes are cells (i, j) to click, edges are labelled with the observed mine counts]
◮ X = {x_1, ..., x_n}: propositional variables
◮ A = {a_1, ..., a_k}: actions
◮ O = {o_1, ..., o_p}: observations
◮ ϕ_δ: transition function
◮ variables X: m_{i,j}, c_{i,j} (∀ i, j)
◮ actions A: click_{i,j} (∀ i, j)
◮ observations O: o_0, ..., o_8 + o_lost
◮ ontic effects: change current state (nondeterministic)
◮ epistemic effects: yield observation (nondeterministic, ambiguous)
◮ ϕ_δ for click_{i,j} (only a fragment survives): … = c′_{i,j} ∧ (m_{i,j} → o_lost) ∧ …
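The two kinds of effects can be illustrated on the Minesweeper action click_{i,j}. The sketch below is not the formula from the slides: it assumes the standard Minesweeper rules (clicking a mine yields o_lost, otherwise the observation is the number of adjacent mines), and the names `click`, `neighbours` and `LOST` are hypothetical.

```python
from itertools import product

LOST = "lost"  # stands for the observation o_lost

def neighbours(i, j, rows, cols):
    """Cells adjacent to (i, j), diagonals included, that lie on the board."""
    return [(i + di, j + dj)
            for di, dj in product((-1, 0, 1), repeat=2)
            if (di, dj) != (0, 0)
            and 1 <= i + di <= rows and 1 <= j + dj <= cols]

def click(mines, clicked, cell, rows, cols):
    """Apply click_{i,j} and return (new clicked set, observation).

    Ontic effect (assumed): c_{i,j} becomes true, nothing else changes.
    Epistemic effect (assumed): o_lost if m_{i,j} holds, otherwise o_k
    where k is the number of mines adjacent to the clicked cell.
    """
    i, j = cell
    new_clicked = clicked | {cell}
    if cell in mines:
        return new_clicked, LOST
    k = sum(1 for n in neighbours(i, j, rows, cols) if n in mines)
    return new_clicked, k
```

For instance, with a single mine at (1, 1) on a 4 x 3 board, `click({(1, 1)}, frozenset(), (2, 2), 4, 3)` marks (2, 2) as clicked and returns the observation 1.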
◮ Domain + initial belief state + goal states
◮ Same as POMDPs, except without probabilities
◮ initial belief state:
◮ goals: ⋀_{i,j} (c_{i,j} ⊕ m_{i,j})
◮ Prescribes which action the agent should take
◮ Cannot be a function of the current state (which is not observable)
◮ Function from histories of actions/observations to actions
◮ Abstract notion
◮ let (p_t)_t = (1, 1), (1, 2), (1, 3), ...
◮ π defined by π(h) := click_{p_{|h|}}
◮ π′ defined by …
◮ current state s_t (not observable)
◮ current history h_t = a_0 o_0 a_1 o_1 ... a_{t−1} o_{t−1}
◮ action a_t = π(h_t) executed (or "stop")
◮ observation o_t + new state s_{t+1} chosen nondeterministically w.r.t. ϕ_δ
◮ o_t given to agent
◮ s_{t+1} = new current state
◮ new current history h_{t+1} = h_t a_t o_t
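The execution loop can be sketched as follows. This is only an illustration under stated assumptions: `run`, `enumerate_cells`, `toy_step` and the 2 x 3 board are hypothetical names and data, the nondeterministic choice is simulated with `random`, and the enumeration of cells is indexed from 0.

```python
import random

def run(policy, step, state, max_steps=100, rng=random):
    """Execute `policy` from the hidden `state` and return the final history.

    policy(history) returns an action, or None for "stop".
    step(state, action) returns the list of (observation, next_state)
    pairs allowed by the transition function.
    """
    history = ()                          # h_0 is the empty history
    for _ in range(max_steps):
        action = policy(history)          # a_t = pi(h_t)
        if action is None:                # "stop"
            break
        obs, state = rng.choice(step(state, action))  # nondeterministic choice
        history += (action, obs)          # h_{t+1} = h_t a_t o_t
    return history

# The policy pi(h) := click_{p_{|h|}} over a fixed enumeration of the cells.
CELLS = [(i, j) for i in range(1, 3) for j in range(1, 4)]   # toy 2 x 3 board

def enumerate_cells(history):
    t = len(history) // 2                 # number of actions taken so far
    return ("click", CELLS[t]) if t < len(CELLS) else None

# Toy deterministic domain: every click yields the same observation "seen".
def toy_step(state, action):
    return [("seen", state | {action[1]})]

history = run(enumerate_cells, toy_step, frozenset())
```

The policy only looks at the length of the history, which is why it can be written down without any reference to the hidden state.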
[Figure: step-by-step execution of the policy tree on a Minesweeper board; hidden cells are shown as "?", revealed cells show their observed mine count]
◮ Node = action, edge = observation
◮ One history = one branch
◮ No child: stop
◮ typical output of planners
◮ policy typically found as a tree
◮ DAGs when equivalent situations detected
◮ very verbose
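A tree policy of this kind can be sketched as a nested structure in which following a history means following its observations edge by edge. The encoding below, the name `tree_action` and the example tree are assumptions made for illustration only.

```python
# A tree policy: each node is (action, {observation: child}).
def tree_action(tree, history):
    """Follow the branch selected by the observations in `history`.

    `history` is a flat tuple (a_0, o_0, a_1, o_1, ...). Returns the action
    at the node reached, or None ("stop") if the branch has ended.
    """
    node = tree
    for obs in history[1::2]:             # edges are labelled by observations
        if node is None:
            return None
        _action, children = node
        node = children.get(obs)          # no child for this observation: stop
    return None if node is None else node[0]

# Hypothetical two-level tree: click (1,1); then click (1,2) after
# observing 0, or click (2,1) after observing 1.
TREE = ("click11", {0: ("click12", {}), 1: ("click21", {})})
```

Each history selects exactly one branch, so the tree needs one node per reachable history, which is where the verbosity comes from.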
[Figure: graph whose nodes are labelled with the actions a_0, a_1, a_0, a_2, a_3, a_4]
◮ direct search in policy space
◮ representation of infinite policies
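Assuming this slide is about finite-state controllers (the FSCs that the succinctness results later compare against), the idea of a cyclic graph representing an unbounded policy can be sketched as below. The class `FSC` and the "sense until clear" controller are hypothetical.

```python
class FSC:
    """Finite-state controller: nodes carry actions, edges carry observations."""

    def __init__(self, action_of, successor, start):
        self.action_of = action_of      # node -> action (None means "stop")
        self.successor = successor      # (node, observation) -> next node
        self.node = start

    def act(self):
        return self.action_of[self.node]

    def observe(self, obs):
        self.node = self.successor[(self.node, obs)]

# Hypothetical controller: keep sensing until "clear" is observed, then stop.
# The self-loop on q0 lets a finite object encode arbitrarily long executions.
fsc = FSC(action_of={"q0": "sense", "q1": None},
          successor={("q0", "busy"): "q0", ("q0", "clear"): "q1"},
          start="q0")
trace = []
for obs in ["busy", "busy", "clear"]:
    trace.append(fsc.act())
    fsc.observe(obs)
```

The controller has two nodes, yet it prescribes an action for histories of any length, which a finite tree cannot do.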
◮ there are other representations
◮ implicit representation as DAG (Brafman & Hoffmann 2005)
◮ with "pseudo-epistemic" literals (Albore, Geffner & Palacios 2009)
◮ branching on observations
◮ "reactive": execution is instantaneous at each timestep
◮ unreadable
◮ subjective epistemic formula over variables X
◮ jo(o) for some observation o
… ⋀_{i,j} (c_{i,j} ⊕ m_{i,j}) do …
◮ S5 semantics
◮ Belief state = set of possible states B
◮ B ⊨ …
◮ B ⊨ …
[Figure: example Minesweeper boards with mines marked "*" and mine counts in the other cells]
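Under the usual single-agent S5 reading, K φ holds in a belief state B exactly when φ holds in every state of B. The sketch below evaluates formulas on that basis. The tuple encoding of formulas and the names `holds` and `knows` are assumptions, and nested K operators are deliberately not handled.

```python
def holds(state, phi):
    """Objective formula `phi` evaluated in one state (a set of true variables)."""
    op = phi[0]
    if op == "var":
        return phi[1] in state
    if op == "not":
        return not holds(state, phi[1])
    if op == "and":
        return holds(state, phi[1]) and holds(state, phi[2])
    if op == "or":
        return holds(state, phi[1]) or holds(state, phi[2])
    raise ValueError(f"unknown connective {op!r}")

def knows(belief, phi):
    """Subjective formula evaluated in a belief state (a set of states).

    K phi holds iff phi holds in every state the agent considers possible.
    """
    op = phi[0]
    if op == "K":
        return all(holds(s, phi[1]) for s in belief)
    if op == "not":
        return not knows(belief, phi[1])
    if op == "and":
        return knows(belief, phi[1]) and knows(belief, phi[2])
    if op == "or":
        return knows(belief, phi[1]) or knows(belief, phi[2])
    raise ValueError(f"unknown connective {op!r}")

# Two possible states: a mine at (1,1) for sure, a mine at (1,2) uncertain.
B = {frozenset({"m11"}), frozenset({"m11", "m12"})}
m11, m12 = ("var", "m11"), ("var", "m12")
```

Here `knows(B, ("K", m11))` is true, while neither `("K", m12)` nor `("K", ("not", m12))` holds: the agent does not know whether (1,2) is mined.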
◮ Agent maintains current belief state B_t
◮ Agent maintains last observation received o_{t−1}
◮ Same for [while Φ ...] and [while jo(o) ...]
◮ A KBP represents a policy given B_0 and o_{−1}
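One step of this execution scheme can be sketched as a belief update followed by a test of a knowledge condition. The one-variable domain, the action "look" and the functions `update`, `step` and `kbp` are hypothetical, and the program is hard-coded rather than interpreted from KBP syntax.

```python
def update(belief, action, obs, step):
    """B_{t+1}: states reachable from B_t via `action` and consistent with `obs`.

    step(state, action) returns the possible (observation, next_state) pairs.
    """
    return {nxt for s in belief
                for o, nxt in step(s, action) if o == obs}

# Hypothetical domain: "look" reveals whether x holds and changes nothing.
def step(state, action):
    return [("x" if "x" in state else "not-x", state)]

def kbp(belief):
    """Hard-coded program: if K x or K (not x) then stop, else look."""
    if all("x" in s for s in belief) or all("x" not in s for s in belief):
        return None                       # the agent knows whether x holds
    return "look"

B0 = {frozenset(), frozenset({"x"})}      # initially x is unknown
a0 = kbp(B0)                              # the agent decides to look
B1 = update(B0, a0, "x", step)            # the agent observes x
a1 = kbp(B1)                              # the agent now knows x and stops
```

The branching condition is evaluated on the belief state, not on the raw observation, which is what distinguishes this from a reactive policy.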
◮ with description polysize in n
◮ with valid KBPs (κ_n)_n polysize in n
◮ with no valid reactive policy (π_n)_n polysize in n
◮ for any reactive representation
◮ more succinct than FSCs in the sense of knowledge compilation (modulo representation of KBPs)
◮ proof based on 3SAT, but could have been based on Minesweeper
◮ no auxiliary variable in KBP → polysize memory
◮ + polysize π → termination in at most exponential time, or loop
◮ valid reactive π cannot execute for more than O(|π| 2^n) steps
◮ no auxiliary variable but 2^{2^n} belief states
◮ there is a polysize KBP going through all of them and terminating
◮ can be used as a 2^{2^n}-step clock
◮ a very very small clock... or a very very slow KBP!
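To make the counting argument concrete, here is a small worked computation. It assumes, as the bullets suggest, that states are truth assignments to n variables and that a belief state is any set of states, S denoting the set of all states.

```latex
\[
|S| = 2^{n}, \qquad \bigl|\{B : B \subseteq S\}\bigr| = 2^{|S|} = 2^{2^{n}}
\]
\[
n = 3:\; 2^{3} = 8 \text{ states},\quad 2^{8} = 256 \text{ belief states};
\qquad
n = 5:\; 2^{5} = 32 \text{ states},\quad 2^{32} = 4\,294\,967\,296 \text{ belief states}
\]
```

The gap between the singly exponential bound O(|π| 2^n) for reactive policies and the doubly exponential number of belief states is what leaves room for a "clock" KBP that runs for 2^{2^n} steps.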
◮ embed policy in low-memory device: robot, satellite...
◮ transmit policy quickly or over low bandwidth: other planet...
◮ enhances readability/explainability by humans
◮ succinct + clear and simple semantics
◮ without jo: abstracts away from sensor model
◮ missions with high cost, high risk, high stake
◮ policy reviewed/written by experts
◮ autonomous agents in society: airport surveillance...
◮ reactive representations
◮ execution in low polynomial time at each step
◮ deciding next action to take: complete for Θ_2^P (P^NP with O(log n) oracle queries)
◮ however belief state is maintained
◮ P^NP: use SAT solvers
◮ factored belief tracking (Brafman & Shani, Geffner)
◮ given: contingent planning problem Π
◮ given: some policy π (e.g., written by experts)
◮ decide whether π is valid for Π
◮ reactive representations: in PSPACE
◮ KBPs with while: EXPSPACE-hard
◮ without while: Π_2^P-hard (vs. in NP)
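For an explicit tree policy, the verification question can be answered by exhaustively following every observation branch from the initial belief state. The sketch below shows that brute-force idea only. It says nothing about the complexity bounds listed above, and `is_valid`, the "toggle" domain and both example trees are hypothetical.

```python
def is_valid(tree, belief, step, is_goal):
    """True iff every execution of `tree` from every state in `belief` ends in a goal.

    tree: (action, {observation: subtree}), or None for "stop".
    step(state, action): possible (observation, next_state) pairs.
    """
    if tree is None:                              # the policy stops here
        return all(is_goal(s) for s in belief)
    action, children = tree
    by_obs = {}                                   # group successors by observation
    for s in belief:
        for obs, nxt in step(s, action):
            by_obs.setdefault(obs, set()).add(nxt)
    return all(is_valid(children.get(obs), succ, step, is_goal)
               for obs, succ in by_obs.items())

# Hypothetical domain: "toggle" flips x and reports the new value of x.
def step(state, action):
    nxt = state ^ frozenset({"x"})
    return [("on" if "x" in nxt else "off", nxt)]

def goal(state):
    return "x" in state

B0 = {frozenset(), frozenset({"x"})}              # x initially unknown
good = ("toggle", {"off": ("toggle", {})})        # toggle again if x went off
bad = ("toggle", {})                              # always stop after one toggle
```

With these definitions `is_valid(good, B0, step, goal)` holds, while `bad` fails because it stops in a non-goal state when x was initially true.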
◮ let experts write KBPs
◮ automatically verify
◮ keep/store/transmit succinct representation
◮ if needed by application, unroll (parts) into reactive representation
◮ example use case: next plans for robot on Mars
◮ initially intended to be so
◮ literature taking this view (Van Der Meyden)
◮ automatic refinement of specifications through operational semantics
◮ succinct (provably more than FSCs)
◮ more high-level than standard policies
◮ abstract away from sensor model
◮ but harder to execute and verify
◮ as readable specification language for policies
◮ for low memory or low bandwidth
◮ explainable policies for robots in society
◮ Hello…
◮ OK, and …
◮ I broke …
◮ I …
◮ Alice takes the train, …
◮ So Alice knows that Bob knows that there is a strike …
◮ So Alice knows that Bob will come and pick her up at …
◮ same as standard policies
◮ 2-EXPTIME-complete in the general case
◮ better complexity for some subclasses
◮ regression from goals
◮ need for efficient data structures (BDD-like)
◮ work in progress
◮ with Ana¨