SLIDE 1

Completeness of Online Planners for Partially Observable Deterministic Tasks

Blai Bonet Gabriel Formica Melecio Ponte

Universidad Simón Bolívar, Venezuela

ICAPS, Pittsburgh, USA. June 2017.
SLIDE 2

Motivation

Many online planners for partially observable deterministic tasks

(e.g. Brafman & Shani 2016, B. & Geffner 2014, Maliah et al. 2014, . . . )

Some planners offer guarantees over classes of problems

But theoretical analyses are often overly complex and specific to the planners and tasks

Want to develop general framework for analysis of online planning

SLIDE 3

Model for POD Tasks

Partially observable deterministic tasks correspond to tuples P = (S, A, Sinit, SG, f, O, Ω) where:
– S is finite state space
– A is finite set of actions, where A(s) is set of actions applicable at s
– Sinit ⊆ S is set of possible initial states
– SG ⊆ S is set of goal states
– f : S × A → S is deterministic transition function
– O is finite set of observation tokens
– Ω : S × A → O is deterministic sensing model
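The tuple above can be sketched as a small Python record. The encoding (dicts and callables over state/action names) and every identifier are illustrative assumptions, not taken from the talk:

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

# Minimal sketch of a POD task P = (S, A, Sinit, SG, f, O, Omega).
# Field names and the dict/callable encoding are illustrative choices.
@dataclass(frozen=True)
class PODTask:
    states: FrozenSet[str]              # S: finite state space
    actions: Dict[str, FrozenSet[str]]  # A(s): actions applicable at state s
    init: FrozenSet[str]                # Sinit: possible initial states
    goals: FrozenSet[str]               # SG: goal states
    f: Callable[[str, str], str]        # f(s, a): deterministic transition
    omega: Callable[[str, str], str]    # Omega(s', a): deterministic sensing
```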

SLIDE 4

Executions and Belief States

Agent sees observable executions; an observable execution is a finite interleaved sequence of actions and observations: τ = a0, o0, a1, o1, . . .

Belief bτ = states deemed possible after seeing execution τ:
– bε = Sinit (for the empty execution ε)
– bτ,a = { s′ ∈ S : there is s ∈ bτ with s′ = f(s, a) } (progression)
– bτ,a,o = { s′ ∈ bτ,a : Ω(s′, a) = o } (filtering)

bτ —a→ bτ,a —o→ bτ,a,o

Belief tracking on factored models is intractable!
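The progression and filtering steps can be sketched directly on flat, set-based beliefs (function names are mine; on factored representations this exact computation is what becomes intractable):

```python
def progress(belief, a, f, applicable):
    # Progression: b_{tau,a} = { f(s, a) : s in b_tau with a applicable at s }
    return {f(s, a) for s in belief if a in applicable(s)}

def filter_obs(progressed, a, o, omega):
    # Filtering: b_{tau,a,o} = { s' in b_{tau,a} : Omega(s', a) = o }
    return {s2 for s2 in progressed if omega(s2, a) == o}

def update(belief, a, o, f, applicable, omega):
    # One step of exact belief tracking: b_tau --a--> b_{tau,a} --o--> b_{tau,a,o}
    return filter_obs(progress(belief, a, f, applicable), a, o, omega)
```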

SLIDE 5

Online Planner: Closed-Loop Controller

[Diagram: closed loop between Planner and World. The planner sends an action a to the world; the world returns an observation token o. Planner π maps the current execution τ to a set of possible actions π(τ) = π(P, τ).]

SLIDE 6

Two Components in Online Planners

[Diagram: Planner π has two components. Belief Tracking takes the execution τ and computes an approximation bπτ of the exact belief, with bτ ⊆ bπτ. Action Selection uses the approximate belief to produce π(τ).]
SLIDE 7

Online Protocol

Use of planner in online setting is modeled by a protocol. Protocol L = (P, s) determined by task P and initial state s:

1. Let λ = s be initial state trajectory seeded at s
2. Let τ = ⟨⟩ be empty execution
3. While bπτ ⊈ SG (i.e. agent isn't sure of having reached goal) do
4.   Run planner π on input τ to obtain set of applicable actions π(τ)
5.   If π(τ) is empty, terminate with FAILURE
6.   Non-deterministically choose action a ∈ π(τ)
7.   Let s′ := f(Last(λ), a) and token o := Ω(s′, a)
8.   Update λ := λ, s′ and τ := τ, a, o

where bπτ is approximation of bτ computed by agent
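Steps 1–8 can be read as code. This is a sketch under two simplifying assumptions: exact belief tracking stands in for bπτ, and `random.choice` stands in for the non-deterministic choice; the `task`/`tracker` interfaces are mine:

```python
import random

def run_protocol(task, s, planner, tracker, max_steps=100):
    """Online protocol L = (P, s); returns (status, state trajectory)."""
    lam = [s]                          # trajectory seeded at s (hidden from planner)
    tau = []                           # observable execution: actions and tokens
    belief = tracker.initial(task)     # b for the empty execution
    for _ in range(max_steps):
        if belief <= task.goals:       # agent is sure the goal is reached
            return "SUCCESS", lam
        acts = planner(task, tau)      # set of applicable actions pi(tau)
        if not acts:
            return "FAILURE", lam
        a = random.choice(sorted(acts))   # stand-in for non-determinism
        s2 = task.f(lam[-1], a)        # hidden next state f(Last(lam), a)
        o = task.omega(s2, a)          # observation token Omega(s', a)
        lam.append(s2)
        tau += [a, o]
        belief = tracker.update(belief, a, o, task)
    return "TIMEOUT", lam
```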

SLIDE 8

Main Goal

Formulate formal properties of components and their relation in order to guarantee completeness over solvable tasks

Definition (Completeness)

Online planner π is complete on task P if for each initial state s ∈ Sinit, the protocol L(P, s) terminates successfully on π

We would like to reason about completeness; e.g.
– Is planner π complete on P?
– Why isn't π complete on P?
– How do we make π complete on P?
– . . .

SLIDE 9

Solvable Tasks

Two definitions:

Definition (Solvable Tasks)

Task P is solvable (or goal connected) if there is a plan for each state s in P

Definition (Strongly Solvable Tasks)

Task P is strongly solvable (or goal connected in belief space) if for each initial state s and execution τ compatible with s, there is an extension τ′ = τ, τ′′ compatible with s such that bτ′ is a goal belief

Definitions are incomparable: there are tasks that are solvable but not strongly solvable, and vice versa

SLIDE 10

Reasons for Incompleteness

– Belief tracking is too weak; i.e. approximation bπτ of bτ is too coarse
– Action selection is bad or uncommitted
– Combination of belief tracking and action selection isn't good enough

SLIDE 11

Uncommitted Planner Fails in Simple Example

– Agent is thirsty and wants a drink; it can move and gulp a drink
– There are two drinks
– No need for belief tracking as state is always known
– Agent may loop even if selected action always moves "toward goal" (e.g. Left, Right, Left, Right, . . . )
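The failure mode is easy to reproduce in code. Below is a toy reconstruction (not the talk's exact example): the agent sits between two drinks, every greedy action moves "toward" some drink, and an adversarial scheduler that re-chooses among greedy actions at every step makes the uncommitted agent oscillate forever:

```python
def greedy_actions(pos, drinks):
    """Actions that strictly reduce the distance to at least one drink."""
    acts = set()
    for a, nxt in (("Left", pos - 1), ("Right", pos + 1)):
        if any(abs(nxt - d) < abs(pos - d) for d in drinks):
            acts.add(a)
    return acts

def adversarial_run(pos, drinks, steps):
    """Adversary alternates among greedy actions; uncommitted agent loops."""
    trace = []
    for i in range(steps):
        acts = sorted(greedy_actions(pos, drinks))
        a = acts[i % len(acts)]        # re-chosen each step: no commitment
        pos += -1 if a == "Left" else 1
        trace.append(a)
        if pos in drinks:              # gulp: goal reached
            break
    return pos, trace
```

Starting at position 2 with drinks at 0 and 4, the run alternates Left, Right, Left, Right, . . . and never reaches a drink.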

SLIDE 12

Properties for Belief Tracking

– Exact: beliefs computed by π are exact; i.e., bπτ = bτ for each τ
– Monotone: for every execution τ and prefix τ′ of τ, |bπτ| ≤ |bπτ′| (i.e. non-increasing "amount of uncertainty" along executions)
– Asserting: there is asserting inference for pair (τ, τ′) (where τ′ is proper prefix of τ) if |bπτ| < |bπτ′| (uncertainty decreases)

Exact inference ⟹ monotone inference (because of determinism)

SLIDE 13

Properties for Action Selection

For handling commitment, we do a slight reformulation and consider planners that return a set of action sequences (plans) on input τ. First action in each sequence σ must be applicable.

Properties:
– Committed: by caching last computed sequences, the planner sticks to selected plan "as much as possible"
– Weak: for each approximation bπτ:
  • each sequence σ returned by π is a plan for some state s ∈ bπτ
  • if bπτ is non-empty, π returns at least one sequence σ
– Covering: the first actions in sequences returned by π cover all applicable actions at exact belief bτ
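The "committed" property can be sketched as a wrapper that caches the last selected plan and follows it while its next action stays applicable. This is an illustrative construction under my own interface (the talk only states the property):

```python
class CommittedPlanner:
    """Wraps a base planner; sticks to the cached plan as much as possible."""

    def __init__(self, base):
        self.base = base   # base(tau) -> set of plans (tuples of actions)
        self.plan = ()     # remaining suffix of the plan being followed

    def next_action(self, tau, applicable_now):
        # Commitment: keep following the cached plan while possible.
        if self.plan and self.plan[0] in applicable_now:
            a, self.plan = self.plan[0], self.plan[1:]
            return a
        # Otherwise replan, keeping only plans whose first action is applicable.
        plans = [p for p in sorted(self.base(tau)) if p and p[0] in applicable_now]
        if not plans:
            return None    # no usable plan: protocol would report FAILURE
        self.plan = plans[0]
        a, self.plan = self.plan[0], self.plan[1:]
        return a
```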

SLIDE 14

Relation between Components

Do we need exact but intractable belief tracking for completeness?

SLIDE 15

Relation between Components

Do we need exact but intractable belief tracking for completeness? Fortunately not! A sufficient condition:

– Planner π is weak: given execution τ, π returns at least one plan σ for some state s ∈ bπτ (state s may not be in bτ)
– Plan σ is applied while possible (i.e. committed planner)
– Belief tracking is monotone
– Planner is effective: if executed prefix of σ doesn't reach goal, planner π has asserting inference for (τ[σ], τ)

SLIDE 16

Main Formal Result

Theorem

Let P be a solvable task and π be a committed planner. If π is weak and effective, and has monotone inference, then π is complete for P.

SLIDE 17

Main Formal Result

Theorem

Let P be a solvable task and π be a committed planner. If π is weak and effective, and has monotone inference, then π is complete for P.

Sketch: For each protocol L = (P, s), the planner in the worst case generates a sequence of beliefs (associated to the ongoing execution):

bπ0 ⊇ bπ1 ⊇ bπ2 ⊇ · · · ⊇ bπn = {s∗}

that ends at a singleton. Once there, since π is weak and committed, π generates and applies a plan for the current hidden state s∗. QED

SLIDE 18

Another Result

Under randomized protocols where action selection is stochastic instead of just non-deterministic:

Theorem

Let P be a strongly solvable task with observable goals and π be a planner. If π is a covering planner, then π is complete under randomized protocols.

SLIDE 19

Another Result

Under randomized protocols where action selection is stochastic instead of just non-deterministic:

Theorem

Let P be a strongly solvable task with observable goals and π be a planner. If π is a covering planner, then π is complete under randomized protocols.

Sketch: Since the task is strongly solvable, there is always a plan from the current belief. Under the assumptions, this plan can be "followed" with non-zero probability. Upon reaching a goal state, the agent will know it since goals are observable. QED

Remark: there is no need for π to be weak or committed, or to have exact inference; it has to be covering though!

SLIDE 20

Experimental Results

See paper for details and experimental results on benchmarks

SLIDE 21

Wrap Up

– Framework for understanding and reasoning about online planning
– Preliminary theoretical results
– Played with planner LW1
– Future work:

  • Study necessary conditions for completeness
  • “Effectiveness” cannot be tested in an efficient manner
  • Novel action selection mechanisms
  • Novel tractable belief tracking methods

Lots of groundbreaking work to be done in the area
