Heuristics for Planning under Partial Observability with Sensing

Motivation | Landmarks for PPOS | The Heuristic Contingent Planner | Empirical Evaluation

Heuristics for Planning under Partial Observability with Sensing Actions
Shlomi Maliah, Guy Shani, Ronen Brafman, Erez Karpas
ICAPS 2013 Workshop on Heuristic Search for Domain-Independent Planning

Outline
1. Motivation
2. Landmarks for PPOS
3. The Heuristic Contingent Planner
4. Empirical Evaluation

Setting
PPOS: Planning under Partial Observability with Sensing Actions
- Partial observability
- Uncertainty about the initial state
- Actions
  - Deterministic
  - Observation effects
  - Conditional effects
⇒ Effects of actions during runtime are uncertain

Example PPOS Task 1: Wumpus
- Each Wumpus is in one of two possible locations
- Cells adjacent to a Wumpus have stench
- Goal is to reach the top right corner

Example PPOS Task 2: Mars Rover
- Rocks can be good/bad
- Activating the sensor tells whether there are good rocks in range of the antenna
- Goal is to sample a good rock

Formal Setting
PPOS task $\pi = \langle P, A, \varphi_I, G \rangle$
- $P$ is a set of propositions
- $A$ is a set of actions
- $\varphi_I$ is a formula that describes the set of possible initial states
- $G \subseteq P$ is the goal
Each action $a \in A$ consists of:
- $\mathit{pre}(a)$: a set of literals over $P$ denoting the action's preconditions
- $\mathit{effects}(a)$: a set of pairs $(c, e)$ denoting conditional effects, where $c$ is a conjunction of literals and $e$ is a single literal
- $\mathit{obs}(a) \subseteq P$: the propositions whose value is observed when $a$ is executed
Assume actions either have observations or effects, but not both
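The tuple above maps naturally onto a small data model. The following is only a sketch: the class names `Action` and `PPOSTask`, the `(proposition, polarity)` encoding of literals, and the `initial_clauses` field are illustrative assumptions, not structures taken from the authors' planner.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Assumed encoding: a literal is (proposition, polarity),
# e.g. ("at-2-3", True) or ("good-rock1", False).
Literal = Tuple[str, bool]


@dataclass(frozen=True)
class Action:
    name: str
    pre: FrozenSet[Literal] = frozenset()
    # Conditional effects as (condition, effect) pairs;
    # an empty condition stands for an unconditional effect.
    effects: Tuple[Tuple[FrozenSet[Literal], Literal], ...] = ()
    obs: FrozenSet[str] = frozenset()

    def __post_init__(self):
        # Assumption stated on the slide: observations or effects, never both.
        if self.effects and self.obs:
            raise ValueError(f"{self.name}: both effects and observations")


@dataclass(frozen=True)
class PPOSTask:
    propositions: FrozenSet[str]
    actions: Tuple[Action, ...]
    # Hypothetical representation of phi_I: ("or" | "oneof", literals) clauses.
    initial_clauses: Tuple[Tuple[str, Tuple[Literal, ...]], ...]
    goal: FrozenSet[str]
```

Enforcing the "observations or effects" assumption in the constructor keeps ill-formed actions from reaching later compilation steps.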

PPOS Solution
- Offline: prepare for every possible outcome in advance
  - Contingent plan / policy (possibly very big)
- Online
  - Choose the next action to execute online
  - Between every two sensing actions, there is a sequence of non-sensing actions
Key Insight
In simple domains, the sequence of non-sensing actions between every two sensing actions can be obtained by solving a classical planning problem over the original state space of the problem

Heuristic Contingent Planner: High Level Control
- If we can achieve the goal without sensing, do so
  - Classical planning, assuming all unknown propositions are false
- Otherwise, choose a reachable sensing action a
- Plan to execute a, and execute a
- Repeat
Main Contribution
A novel landmark-based heuristic for choosing the next sensing action in PPOS
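The control loop above can be sketched as follows. This is a hedged illustration only: the function `hcp_control_loop` and its four callback parameters (`plan_to_goal`, `choose_sensing`, `plan_to_sense`, `execute`) are hypothetical names, and the classical planner and the landmark-based choice of sensing action are assumed to be supplied by the caller rather than implemented here.

```python
from typing import Callable, List, Optional, Set


def hcp_control_loop(
    known: Set[str],
    plan_to_goal: Callable[[Set[str]], Optional[List[str]]],
    choose_sensing: Callable[[Set[str]], Optional[str]],
    plan_to_sense: Callable[[Set[str], str], Optional[List[str]]],
    execute: Callable[[Set[str], str], Set[str]],
    max_rounds: int = 100,
) -> List[str]:
    """Return the executed action sequence; all callbacks are assumptions."""
    trace: List[str] = []
    for _ in range(max_rounds):
        # Step 1: try to reach the goal without any sensing.
        plan = plan_to_goal(known)
        if plan is not None:
            for action in plan:
                known = execute(known, action)
                trace.append(action)
            return trace
        # Step 2: otherwise pick a reachable sensing action.
        sensing = choose_sensing(known)
        if sensing is None:
            raise RuntimeError("no reachable sensing action")
        # Step 3: plan to make it applicable, then execute it.
        prefix = plan_to_sense(known, sensing)
        if prefix is None:
            raise RuntimeError(f"cannot reach sensing action {sensing}")
        for action in prefix + [sensing]:
            known = execute(known, action)
            trace.append(action)
    raise RuntimeError("round limit reached")
```

Passing the planner and the heuristic in as callbacks keeps the sketch independent of any particular classical planner.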

Landmarks
- A landmark is a logical formula over the facts that must be satisfied by some state along every solution
- Landmark detection is hard even in classical planning
- Challenge for PPOS: must handle uncertainty and sensing
- Our solution:
  - Augment the problem with artificial reasoning actions
  - Join reasoning and observation actions
  - Relax the problem (as for classical planning)

Reasoning Actions: Example
- Suppose we know from $\varphi_I$ that $\textit{good-rock}_1 \lor \textit{good-rock}_2 \lor \textit{good-rock}_3 \lor \textit{good-rock}_4$
- Suppose we also know $\neg\textit{good-rock}_1$, $\neg\textit{good-rock}_2$, $\neg\textit{good-rock}_3$
- We create a reasoning action that can deduce that $\textit{good-rock}_4$ holds

Reasoning Actions
- Proposition $p \in P$ is constant if its value never changes (Geffner and Palacios)
  - Easy to check that $p$ does not appear in effects of any action
- Create reasoning actions from clauses of $\varphi_I$ containing only constant propositions
- For a disjunctive clause $c = \bigvee_{i=1..k} l_i$, create actions which "reason" that if $k-1$ of the literals are false, then the remaining one is true
  - $A_c = \{a_{l_i}\}_{i=1}^{k}$, with $\mathit{pre}(a_{l_i}) = \bigwedge_{j=1..k,\, j \neq i} \neg l_j$, and $\mathit{effects}(a_{l_i}) = l_i$
- For a oneof clause $c = \mathrm{oneof}_{i=1..k}\, l_i$, create actions which "reason" that if one of the literals is true, then all the others are false
  - $A_c = \{a_{l_i}\}_{i=1}^{k}$, with $\mathit{pre}(a_{l_i}) = l_i$, and $\mathit{effects}(a_{l_i}) = \bigwedge_{j=1..k,\, j \neq i} \neg l_j$
- Works only when initial state uncertainty is expressed using such clauses
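The two constructions above can be sketched in a few lines. The function name `reasoning_actions`, the dictionary layout of the generated actions, and the `(proposition, polarity)` literal encoding are assumptions made for illustration; only the precondition and effect sets follow the definitions of $A_c$ given on the slide.

```python
from typing import Dict, List, Tuple

# Assumed encoding: a literal is (proposition, polarity).
Literal = Tuple[str, bool]


def negate(lit: Literal) -> Literal:
    return (lit[0], not lit[1])


def reasoning_actions(kind: str, literals: List[Literal]) -> List[Dict]:
    """Build the k reasoning actions A_c for one clause of the initial formula.

    kind is "or" (disjunctive clause) or "oneof".
    """
    actions = []
    for i, lit in enumerate(literals):
        others = [l for j, l in enumerate(literals) if j != i]
        if kind == "or":
            # If the other k-1 literals are false, the remaining one is true.
            pre, eff = [negate(l) for l in others], [lit]
        elif kind == "oneof":
            # If this literal is true, all the others are false.
            pre, eff = [lit], [negate(l) for l in others]
        else:
            raise ValueError(f"unsupported clause kind: {kind}")
        actions.append({"name": f"reason-{kind}-{i}", "pre": pre, "eff": eff})
    return actions
```

For the four-rock clause in the earlier example, the action at index 3 has the three negated rock literals as preconditions and `good-rock4` as its effect.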

Joining Immediate Reasoning and Observations: Example
- Action activate-sensor-at-2-3
  - Pre: at-2-3
  - CE: $\textit{good-rock}_1$ → good-rocks-in-range
  - CE: $\textit{good-rock}_2$ → good-rocks-in-range
- Observation action observe-good-rocks-in-range observes fact good-rocks-in-range
- The only actions which affect good-rocks-in-range are activate-sensor-at-x-y, which are all mutex
- Create two joined actions, for $i = 1, j = 2$ and for $i = 2, j = 1$, where:
  - Pre: at-2-3 ∧ $\neg\textit{good-rock}_j$
  - Obs: good-rocks-in-range
  - Eff: $\textit{good-rock}_i$

Joining Immediate Reasoning and Observations
Can split propositions into 3 sets:
- Known (e.g., location of rover/android)
- Unknown, but observable (e.g., stench/good-rocks-in-range)
- Unknown and unobservable (e.g., location of wumpus/"goodness" of specific rock)

Joining Immediate Reasoning and Observations
- Let $a$ be an action with conditional effects $\{(c_i, e)\}_{i=1}^{k}$ where
  - $c_i$ is unknown and unobservable, and
  - $e$ is observable, and
  - there is no other action that affects the value of $e$ which is not mutually exclusive with $a$
- Let $a_{obs}$ be an action that observes $e$
- We create $k$ new actions $a_i \circ a_{obs}$ where:
  - $\mathit{pre}(a_i \circ a_{obs}) = \mathit{pre}(a) \land \mathit{pre}(a_{obs}) \land \bigwedge_{j \neq i} \neg c_j$
  - $\mathit{obs}(a_i \circ a_{obs}) = \{e\}$
  - $\mathit{effects}(a_i \circ a_{obs}) = \mathit{effects}_u(a) \land c_i$, where $\mathit{effects}_u(a)$ are the unconditional effects of $a$
- Although this is ad-hoc and not complete, this works in many benchmarks
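A minimal sketch of this joining step follows. It assumes each condition $c_i$ is a single literal (as in the rover example), so that $\neg c_j$ is again a literal; the function name `join_with_observation`, its parameters, and the dictionary layout are hypothetical, and the mutex check on other actions affecting $e$ is assumed to have been done by the caller.

```python
from typing import Dict, List, Tuple

# Assumed encoding: a literal is (proposition, polarity).
Literal = Tuple[str, bool]


def join_with_observation(
    a_name: str,
    a_pre: List[Literal],
    a_uncond: List[Literal],
    conditions: List[Literal],   # c_1..c_k, each assumed to be one literal
    e: str,                      # the observable proposition
    obs_pre: List[Literal],      # preconditions of the observing action
) -> List[Dict]:
    """Create the k joined actions a_i . a_obs from the definition above."""
    joined = []
    for i, c_i in enumerate(conditions):
        # Conjunction of the negated conditions c_j for all j != i.
        others_false = [(p, not v) for j, (p, v) in enumerate(conditions) if j != i]
        joined.append({
            "name": f"{a_name}-observe-{e}-{i}",
            "pre": list(a_pre) + list(obs_pre) + others_false,
            "obs": [e],
            "eff": list(a_uncond) + [c_i],
        })
    return joined
```

Applied to activate-sensor-at-2-3 with conditions `good-rock1` and `good-rock2`, this yields the two joined actions listed in the example slide.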

Action Relaxation
1. Ignore delete effects
2. Given action $a \in A$ with $k$ conditional effects $\{(c_i, e_i) : i = 1..k\}$, generate $k$ actions where $a^{(c_i, e_i)}$ is defined by
   - $\mathit{pre}(a^{(c_i, e_i)}) = \mathit{pre}(a) \land c_i$
   - $\mathit{effects}(a^{(c_i, e_i)}) = \mathit{effects}(a) \land e_i$
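The relaxation can be sketched as below, under two loudly stated assumptions that are not on the slide: negative literal effects are treated as the delete effects to ignore, and $\mathit{effects}(a)$ in the second equation is read as the unconditional effects of $a$. The function name `relax` and the dictionary layout are hypothetical; an actual implementation may first compile negated literals into separate propositions.

```python
from typing import Dict, List, Tuple

# Assumed encoding: a literal is (proposition, polarity).
Literal = Tuple[str, bool]


def relax(
    name: str,
    pre: List[Literal],
    uncond: List[Literal],
    conditional: List[Tuple[List[Literal], Literal]],
) -> List[Dict]:
    """Drop delete effects and split each conditional effect into its own action."""
    # Assumption: effects with polarity False are the delete effects.
    keep = [lit for lit in uncond if lit[1]]
    if not conditional:
        # No conditional effects: the relaxed action is the action minus deletes.
        return [{"name": name, "pre": list(pre), "eff": keep}]
    relaxed = []
    for i, (cond, eff) in enumerate(conditional):
        if not eff[1]:
            continue  # a conditional delete effect is ignored as well
        relaxed.append({
            "name": f"{name}-ce{i}",
            "pre": list(pre) + list(cond),   # pre(a) and c_i
            "eff": keep + [eff],             # effects(a) and e_i
        })
    return relaxed
```

Moving each condition into the precondition turns every generated action into an ordinary unconditional one, which is what a classical landmark detection procedure expects.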

Landmark Detection
- We use a landmark detection algorithm for a classical task
- The classical task is generated by:
  - Adding reasoning actions
  - Joining reasoning and observation actions
  - Relaxing the actions in the original task
- One modification to classical landmark detection: "optimistic" sensing, i.e., we assume a sensing action will sense the required value
