CMU-Q 15-381
Lecture 5: Classical Planning Factored Representations STRIPS
Teacher: Gianni A. Di Caro
Automated Planning: Factored State Representations
Planning for (more) complex worlds
§ Searching for a plan of action to achieve one’s goal is a critical part of AI (in both open and closed loop) § In fact, planning is glorified search § But search needs more powerful state representations than those used so far in order to be effective § So far: states are indivisible, they have no internal structure § → Planning exploits a structured representation of states § … And let’s keep living in deterministic, known, fully observable worlds
§ This is what is commonly termed “Classical Planning”
[Figure: three levels of state representation: (a) Atomic, (b) Factored, (c) Structured. Atomic used so far; factored from now on]
The vacuum-world example
§ The goal is to reach the banana, but achieving the goal requires achieving, in the correct sequence, a number of sub-goals that overall make up a plan
§ STRIPS = Stanford Research Institute Problem Solver (1971)
§ (Logic-based) Language expressive enough to describe a wide variety of problems, but restrictive enough to allow efficient algorithms to operate over it § PDDL = Planning Domain Definition Language (1998 - ), the standard language for defining planning domains and problems; it includes the original STRIPS plus more advanced features
§ A state is a conjunction of propositions, e.g., at(Truck1,Shadyside) ∧ at(Truck2,Oakland)
§ States are transformed via operators (actions) that have the form Preconditions ⇒ Postconditions (effects)
STRIPS / PDDL language(s) to represent / solve planning problems based on propositional (factored) state representation
[Figure: blocks world, the hand holding A, with B on C on the Table]
Predicates that can be used to describe the world: § on(X,Y) § onTable(X) § clear(X) § holding(X) § handEmpty § Negation of all the above
Objects of the world: § Block A § Block B § Block C § Table § Hand
Actions: § …
§ Fact(ored) representation of states! § State 1 = { holding(A), clear(B), on(B,C), onTable(C)}
World states are represented as sets of facts: conjunctions of propositions (conditions). Closed World Assumption (CWA): facts not listed in a state are assumed to be false. Under the CWA the agent is assumed to have full observability, and only positive facts need to be stated
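The CWA makes factored states very easy to implement. A minimal sketch in Python (the fact-string encoding is illustrative, not from the slides): a state is just the set of facts that hold.

```python
# Sketch: under the Closed World Assumption a state is the set of facts
# that hold; any fact not listed is assumed false.

def holds(state, fact):
    """A fact is true in a state iff it is listed (CWA)."""
    return fact in state

state1 = {"holding(A)", "clear(B)", "on(B,C)", "onTable(C)"}

print(holds(state1, "on(B,C)"))   # True
print(holds(state1, "on(A,B)"))   # False: unlisted, hence false under CWA
```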
§ State: a set of propositions, e.g.:
At(y, Rome) ∧ At(z, Tokyo)
At(Alex, NY) ∧ Father(Alex, Tom)
The world is represented through a set of features / objects (people, cities), and each proposition states a fact that attributes “values” to features (objects + state propositions + CWA)
§ Goals, like world states, are represented as sets of facts § Example: the state { on(A,B) } can be set as a goal
§ State 1 is not a goal state for the goal { on(A,B) } § State 2 is a goal state for the goal { on(A,B) }
A goal state is any state that includes all the goal facts
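With states and goals as sets of facts, the goal test is just a subset check. A small illustrative sketch (fact strings are assumptions, not from the slides):

```python
def is_goal(state, goal):
    """A state is a goal state iff it includes all the goal facts."""
    return goal <= state  # subset test on sets of facts

state2 = {"on(A,B)", "onTable(B)", "clear(A)", "handEmpty"}
goal = {"on(A,B)"}

print(is_goal(state2, goal))        # True: on(A,B) is included
print(is_goal({"on(B,A)"}, goal))   # False
```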
§ Goals: A conjunction of facts, At(P1, JFK) ∧ At(P2, SFO), that may also contain variables, such as: At(p, JFK) ∧ Plane(p) → to have any plane at JFK § The aim is to reach a state that entails a goal: OnTable(A) ∧ OnTable(B) ∧ OnTable(D) ∧ On(C, D) ∧ Clear(A) ∧ Clear(B) ∧ Clear(C) satisfies the goal to stack C on D
We can focus on getting individual sub-goals. Not possible in atomic representations!
A goal g is a conjunction of sub-goals! g = g1 ∧ g2 ∧… ∧ gn
Goals are reached through sequences of actions (the plan)
§ Pre-cond is a conjunction of positive and negative conditions that must be satisfied to apply the operation § Post-cond is a conjunction of positive and negative conditions that become true when the operation is applied
A STRIPS action definition specifies: § A set PRE of precondition facts § A set ADD of add-effect facts (added to the state facts) § A set DEL of delete-effect facts (removed from the state facts). PutDown(A,B): with PRE = robot hand is holding A + B’s top is clear → the action puts A down on top of B
Actions: Operators with Preconditions + Effects (Postconditions)
In STRIPS only positive preconditions are used
An action schema to fly a plane from one location to another: Action(Fly(p, from, to), PRECOND: At(p, from) ∧ Plane(p) ∧ Airport(from) ∧ Airport(to) EFFECT: ¬At(p, from) ∧ At(p, to)) § An action is applicable in state s if s entails the preconditions § The facts negated by the effect of the action are removed from s, while the positive facts resulting from the action are added to s § Action schema: a number of different actions that can be derived by universal quantification of the variables
§ Action schema: Action(Name(v1, v2, …, vn), PRECONDITIONS: P1(v) ∧ P2(v) ∧ … ∧ Pm(v) ADD-LIST: {F1(v), F2(v), …, Fq(v)} DELETE-LIST: {S1(v), S2(v), …, Sk(v)}) § RESULT(s, a) = (s − DEL(a)) ∪ ADD(a)
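RESULT(s, a) = (s − DEL(a)) ∪ ADD(a) translates directly into set operations. A sketch using a ground instance of the Fly schema (the fact-string encoding is an assumption for illustration):

```python
def result(state, action):
    """STRIPS successor: RESULT(s, a) = (s - DEL(a)) | ADD(a)."""
    pre, add, delete = action
    assert pre <= state, "action not applicable in this state"
    return (state - delete) | add

# Ground instance of Fly(P1, A, B): (PRE, ADD, DEL)
fly_p1_a_b = (
    {"At(P1,A)", "Plane(P1)", "Airport(A)", "Airport(B)"},  # PRE
    {"At(P1,B)"},                                           # ADD
    {"At(P1,A)"},                                           # DEL
)

s = {"At(P1,A)", "Plane(P1)", "Airport(A)", "Airport(B)"}
s2 = result(s, fly_p1_a_b)
print("At(P1,B)" in s2, "At(P1,A)" in s2)   # True False
```

Note that static facts like Plane(P1) are never deleted, so they persist across applications.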
§ Planning domain: the predicates and the action schemas § Planning problem (instance): a domain plus the objects, the initial state, and the goal § Solution of the planning problem: a sequence of actions that, starting from the initial state, ends in a state s that entails the goal
An action-state model M = ⟨S, s_start, S_goal, A, T, c, G⟩ § S: the set of states (can be atomic, factored) § s_start ∈ S: the initial state § S_goal ⊆ S: the subset of goal states § A: the set of possible actions, can be defined as A(s) § T: S × A → S: the successor / state transition function § c: S × A → ℝ: the step cost for taking action a in state s § G: S → {0,1}: criterion to check whether or not a state is a goal / terminal state
Solution plan: path [s_start, s1, s2, …, s_goal] associated with the feasible sequence of actions [a1, a2, …, an] such that cost(path) is minimized
Air cargo transportation problem (from R&N) § Predicates: At, Cargo, Plane, Airport, In § Objects: C1 (cargo container), C2, P1 (plane), P2, SFO, JFK § Actions: Load, Unload, Fly
[Figure: blocks-world Start and Goal configurations with blocks A, B, C]
§ MoveToTable(X, Y): clear(X) ∧ on(X,Y) ⇒ on(X,Table) ∧ clear(Y) ∧ ¬on(X,Y) § Move(X, From, To): clear(X) ∧ on(X,From) ∧ clear(To) ∧ block(X) ∧ block(To) ⇒ on(X,To) ∧ clear(From) ∧ ¬clear(To) ∧ ¬on(X,From) § MoveFromTable(X, Y): clear(X) ∧ on(X,Table) ∧ clear(Y) ∧ block(Y) ⇒ on(X,Y) ∧ ¬clear(Y) ∧ ¬on(X,Table)
Plan: MoveToTable(C, A) → MoveFromTable(B, C) → MoveFromTable(A, B)
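The plan can be checked mechanically by applying the ground actions as set operations. A sketch (assumes a start with C on A and A, B on the table; action definitions mirror the schemas on this slide, with an illustrative onTable(X) encoding):

```python
def apply(state, pre, add, delete):
    """Apply a ground STRIPS action: (state - DEL) | ADD."""
    assert pre <= state, "preconditions not satisfied"
    return (state - delete) | add

def move_to_table(x, y):        # returns (PRE, ADD, DEL)
    return ({f"clear({x})", f"on({x},{y})"},
            {f"onTable({x})", f"clear({y})"},
            {f"on({x},{y})"})

def move_from_table(x, y):
    return ({f"clear({x})", f"clear({y})", f"onTable({x})"},
            {f"on({x},{y})"},
            {f"onTable({x})", f"clear({y})"})

# Start: C on A, A and B on the table
state = {"on(C,A)", "onTable(A)", "onTable(B)", "clear(C)", "clear(B)"}
plan = [move_to_table("C", "A"),
        move_from_table("B", "C"),
        move_from_table("A", "B")]
for a in plan:
    state = apply(state, *a)

goal = {"on(A,B)", "on(B,C)", "onTable(C)"}
print(goal <= state)   # True: the plan reaches the goal
```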
§ PLANSAT is the problem of determining whether a given planning problem is satisfiable § In general PLANSAT is PSPACE-complete (solvable using polynomial space, but believed to require exponential time in general) § Bounded PLANSAT = decide if a plan of a given length exists § (Bounded) PLANSAT is decidable but PSPACE-hard § Disallowing negative effects: (Bounded) PLANSAT is NP-hard § Disallowing negative preconditions: PLANSAT is in P, but finding the optimal (shortest) plan is still NP-hard
§ (Forward) Search from initial state to goal § Can use search techniques, including heuristic search
[Figure: forward search from the state At(P1,A) ∧ At(P2,A), branching over Fly(P1,A,B) and Fly(P2,A,B) to the states At(P1,B) ∧ At(P2,A) and At(P1,A) ∧ At(P2,B)]
§ In absence of function symbols, the state space of a planning problem is finite → Any graph search algorithm that is complete will be a complete planning algorithm § Irrelevant action problem: all applicable actions are considered at each state! § The resulting branching factor b is typically large and the state space is exponential in b → Need for good heuristics!
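A minimal breadth-first forward planner over ground STRIPS actions, as a sketch (complete because the state space is finite; the tiny two-block instance and fact strings are illustrative assumptions):

```python
from collections import deque

def bfs_plan(init, goal, actions):
    """Forward (progression) search.
    actions: dict name -> (PRE, ADD, DEL), all sets of fact strings."""
    start = frozenset(init)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                        # goal entailed
            return plan
        for name, (pre, add, delete) in actions.items():
            if pre <= state:                     # applicable action
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None                                  # goal unreachable

# Tiny instance: put A on B
actions = {
    "MoveFromTable(A,B)": ({"clear(A)", "clear(B)", "onTable(A)"},
                           {"on(A,B)"},
                           {"onTable(A)", "clear(B)"}),
}
init = {"onTable(A)", "onTable(B)", "clear(A)", "clear(B)"}
print(bfs_plan(init, {"on(A,B)"}, actions))   # ['MoveFromTable(A,B)']
```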
At home → get milk, bananas and a cordless drill → return home
§ Air Cargo Example § Initial state: 10 airports, each airport has 5 planes and 20 pieces of cargo § Goal: transport all the cargo at airport A to airport B § Solution: load the 20 pieces of cargo at A into one of the planes at A and fly it to B § Avg branching factor b: each of the 50 planes can fly to 9 other airports, and each of the 200 packages can be unloaded (if it is loaded) or loaded into any plane at its airport (if it is unloaded) → ~2000 possible actions per state § Number of states to explore: O(b^d) ∼ 2000^41
§ Define a Relaxed problem: § (Potentially) Easy to solve § The solution of the relaxed problem gives an admissible heuristic for A*
Any ideas about how to perform a general relaxation?
§ Relaxation: remove all preconditions from actions § → Every action is always applicable, and any condition (sub-goal) can potentially be achieved in one step (if there is an action that sets the sub-goal literal to true; otherwise the problem is impossible) § h(s) = cost-to-go(al) of the relaxed problem from state s § Equivalent to adding edges to the state graph: including forbidden actions
§ Solving the relaxed problem should be easy enough; we can even take a shortcut, setting the solution to be the number of unsatisfied sub-goals in the current state s …
§ h(s) ≈ number of unsatisfied sub-goals in the current state s
maybe not? (we need admissibility!) § Impossible to derive such a heuristic with atomic states! There the successor function is a black box; here we exploit the structure
§ The heuristic is domain-independent! § ☛ With atomic states, in general only domain-specific heuristics are possible
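On factored states the shortcut heuristic is a one-line set difference: count the goal facts not yet in the current state (a sketch; fact strings are illustrative):

```python
def h_unsatisfied(state, goal):
    """Domain-independent heuristic: number of unsatisfied sub-goals."""
    return len(goal - state)

goal = {"on(A,B)", "on(B,C)", "onTable(C)"}
print(h_unsatisfied({"onTable(C)", "on(B,C)"}, goal))  # 1: only on(A,B) missing
print(h_unsatisfied(goal, goal))                       # 0: at a goal state
```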
a. Some operations achieve multiple sub-goals (have multiple post-conditions) b. Some operations undo the effects of others
1. Just a 2. Just b 3. Both a and b
a. Some operations achieve multiple sub-goals (have multiple post-conditions) § h(s) doesn’t correspond to the solution of the relaxed problem + it violates admissibility b. Some operations undo the effects of others § h(s) doesn’t correspond to the solution of the relaxed problem
§ To avoid actions that can cancel each other, remove the preconditions and all effects from actions, except those effects that are facts gi, i=1,…,n, in the goal g (i.e., sub-goals) → Exploit factored structure! § h(s) = from s, the minimum number of actions such that the union of their effects contains all n sub-goals gi → Admissible § Computing h(s) = solving a SET COVER problem: NP-hard! § Greedy log n approximation:
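The greedy log n approximation can be sketched as follows (illustrative names; effects maps each action to the set of sub-goals its add effects cover):

```python
def greedy_cover(subgoals, effects):
    """Greedy set cover: repeatedly pick the action covering the most
    still-unsatisfied sub-goals.  Returns the number of actions used,
    a log n approximation of the optimal (NP-hard) cover."""
    remaining = set(subgoals)
    used = 0
    while remaining:
        best = max(effects.values(), key=lambda e: len(e & remaining))
        if not best & remaining:
            return None                # some sub-goal is unreachable
        remaining -= best
        used += 1
    return used

effects = {"a1": {"g1", "g2"}, "a2": {"g2", "g3"}, "a3": {"g3"}}
print(greedy_cover({"g1", "g2", "g3"}, effects))   # 2: a1 then a2
```

Greedy is not optimal in general, but here it happens to find the optimal cover of size 2.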
[Figure: set-cover instance, sub-goals G1…G4 as rows and actions A2…A7 as columns, with x marking which sub-goals each action’s effects cover]
§ Ignore specific preconditions to derive domain-specific heuristics § Sliding-block puzzle, Move(u, t1, t2) action: § On(u, t1) ∧ Blank(t2) ∧ Adjacent(t1, t2) ⇒ On(u, t2) ∧ Blank(t1) ∧ ¬On(u, t1) ∧ ¬Blank(t2) § Consider two options for removing specific preconditions from Move():
a. Removing Blank(t2) ∧ Adjacent(t1, t2)  b. Removing Blank(t2)
§ Poll: match option to heuristic:
1. a ↔ ∑ Manhattan, b ↔ # misplaced tiles  2. a ↔ # misplaced tiles, b ↔ ∑ Manhattan  3. b ↔ # misplaced tiles, a is inadmissible  4. b ↔ ∑ Manhattan, a is inadmissible
[Figure: 8-puzzle example state and goal state]
§ Searching from a goal state to the initial state (regression)
§ We only need to consider actions that are relevant to the goal (or current state) → Relevant-state search § This can make a strong reduction in the branching factor, such that it may be more efficient than forward (progression) search § “Imagine trying to figure out how to get to some small place with few traffic connections from somewhere with a lot of traffic connections”
[Figure: backward (regression) search from the goal At(P1,B) ∧ At(P2,B), regressing over Fly(P1,A,B) and Fly(P2,A,B)]
§ Regression from a (goal) state g over the action a gives the state g′ = (g − ADD(a)) ∪ PRE(a) § DEL(a) doesn’t appear: we don’t know whether the facts negated by DEL(a) were true or not before a, therefore nothing can be said about them § Variables can be included, such that a set of states is defined:
Cargo(C2) ∧ Plane(p) ∧ Airport(SFO)
§ How to select actions? § Relevant actions only: at least one of the action’s effects must unify with one of the goal conditions. For the goal At(C2, SFO), Unload(C2, p, SFO) is relevant, Fly(p, JFK, SFO) is not § Consistent actions only: the action must not negate (undo) any goal condition, otherwise it is not consistent
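Regression with the relevance and consistency filters can be sketched as set operations, following g′ = (g − ADD(a)) ∪ PRE(a) (the ground Unload action and fact strings are illustrative assumptions):

```python
def regress(goal, action):
    """Regress a goal through an action; None if not relevant/consistent."""
    pre, add, delete = action
    if not add & goal:        # relevant: must achieve at least one goal fact
        return None
    if delete & goal:         # consistent: must not undo a goal fact
        return None
    return (goal - add) | pre

# Illustrative ground Unload(C2, P, SFO): (PRE, ADD, DEL)
unload = ({"In(C2,P)", "At(P,SFO)"},
          {"At(C2,SFO)"},
          {"In(C2,P)"})

g = {"At(C2,SFO)"}
print(regress(g, unload))            # the preconditions In(C2,P), At(P,SFO)
print(regress({"At(C2,JFK)"}, unload))   # None: not relevant to this goal
```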
§ How to define good heuristics?
§ Representations using factored states allow us to reason about the (factored) structure of the states (in terms of sets of variables) and exploit it § A goal state is a conjunction of sub-goals that can be individually satisfied § STRIPS / PDDL are languages to express problem domains and problem instances in a way which is expressive enough while allowing for efficient solutions § Forward search can in principle be applied, but the state space and, more importantly, the branching factor b are expected to be very large § Uninformed search can’t really be used § Informed search, A*, is an option, but it needs very good heuristics! § It may not be obvious to define general, problem-independent, admissible heuristics (in polynomial time) § Backward search can potentially be “easier” since it partially overcomes the problem of irrelevant actions; however, defining heuristics for backward search is even more difficult than for forward search § We need a better tool, possibly a way of generating tight admissible heuristics for A* automatically
[Figures: planning graph construction, alternating condition levels S0, S1, … and operation levels O0, O1, …; edges connect conditions to the operations whose preconditions they satisfy, and operations to their postconditions; No-Op (persistence) actions carry each condition forward to the next level]
§ Add a condition to level Sj if it is a postcondition of an operation (it is in the ADD or DELETE lists) in level Oj−1 § Keep a previous condition if no action negates it (persistence, no-op action)
[Figure: #Conditions increase monotonically (always carried forward by no-ops), e.g., from {q, ¬r, ¬s} to {q, ¬r, ¬s, ¬q}]
§ → The level j at which a condition first appears is a (good) estimate of how hard it is to achieve § → Can optimistically estimate how many steps it takes to reach a goal g (or sub-goal gi) from the initial state: admissible heuristic! § Idea: Sj contains all conditions that could hold at stage j based on past actions; Oj contains all operations that could have their preconditions satisfied at time j § No ordering among the operations is assumed at each stage, they could be executed in parallel
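The level construction and the first-appearance heuristic can be sketched as follows (a relaxed version under stated assumptions: no-ops are implicit, delete effects and mutexes are ignored, so the estimate is optimistic):

```python
def condition_levels(init, actions, max_levels=20):
    """Build condition levels S0, S1, ... of a relaxed planning graph.
    actions: list of (PRE, ADD) pairs; only positive effects are used."""
    level = set(init)
    levels = [set(level)]
    for _ in range(max_levels):
        new = set(level)                  # no-ops: carry everything forward
        for pre, add in actions:
            if pre <= level:              # operation applicable at this level
                new |= add
        if new == level:                  # graph has leveled off
            break
        level = new
        levels.append(set(level))
    return levels

def level_cost(levels, fact):
    """Optimistic estimate: level at which the fact first appears."""
    for j, s in enumerate(levels):
        if fact in s:
            return j
    return None                           # never appears: unreachable

actions = [({"p"}, {"q"}), ({"q"}, {"r"})]
levels = condition_levels({"p"}, actions)
print(level_cost(levels, "r"))   # 2: r first appears at level S2
```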
[Figure: #Operations also increase monotonically (as a result of the monotonic increase of conditions, which keep previous preconditions holding and make new preconditions true)]
§ As it is, the graph would be too optimistic! § The graph data structure also records conflicts between actions or conditions: two operations or conditions are mutually exclusive (mutex) if no valid plan can contain both at the same time § A bit more formally: two operations are mutex if their preconditions or postconditions conflict (inconsistent effects, competing needs, interference); two conditions are mutex if one is the negation of the other, or if every pair of operations achieving them is mutex (inconsistent support)
§ Inconsistent postconditions (two ops): a postcondition of one operation negates a postcondition of the other; Eat(Cake) and the no-op Have(Cake) § Interference (two ops): a postcondition of one operation negates a precondition of the other; Eat(Cake) and the no-op Have(Cake) (an issue in parallel execution: the order should not matter, but here it would)
[Figure: inconsistent postconditions and interference, shown on operations that respectively add B and delete B]
§ Competing needs (two ops): a precondition of one operation is mutex with a precondition of the other, either because they are negations of each other, as for Bake(Cake) and Eat(Cake), or because they have inconsistent support § Inconsistent support (two conditions): every pair of operations that achieve the two conditions is mutex. Have(Cake) and Eaten(Cake) are mutex in S1 but not in S2, because there they can be achieved by Bake(Cake) and the no-op Eaten(Cake)
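The two purely local operation-mutex tests can be sketched directly on (PRE, ADD, DEL) triples; competing needs is omitted here because it also requires the condition-mutex relation of the previous level (the cake actions below follow the slides, the encoding is illustrative):

```python
def ops_mutex(a1, a2):
    """Partial mutex test for two operations at the same level: detects
    inconsistent effects and interference only (competing needs would
    also need the condition-mutex relation, omitted in this sketch)."""
    pre1, add1, del1 = a1
    pre2, add2, del2 = a2
    inconsistent_effects = bool(add1 & del2) or bool(add2 & del1)
    interference = bool(del1 & pre2) or bool(del2 & pre1)
    return inconsistent_effects or interference

eat_cake  = ({"Have(Cake)"}, {"Eaten(Cake)"}, {"Have(Cake)"})   # (PRE, ADD, DEL)
noop_have = ({"Have(Cake)"}, {"Have(Cake)"}, set())
bake_cake = ({"¬Have(Cake)"}, {"Have(Cake)"}, set())

print(ops_mutex(eat_cake, noop_have))   # True: Eat deletes Have(Cake)
print(ops_mutex(bake_cake, noop_have))  # False here; mutex only via competing needs
```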
§ STRIPS Language for Automated Planning § Factored state representation § Actions and action schema § Planning problem § Complexity of planning § Planning as search problem § Forward and Backward search § Need for domain-independent heuristics § Difficulty to define admissible heuristics for A* § Planning graph data structure: construction and stored information