
Planning and Optimization

B5. SAT Planning: Core Idea and Sequential Encoding

Malte Helmert and Gabriele Röger

Universität Basel

B5.1 Introduction
B5.2 Formula Overview
B5.3 Initial State, Goal, Operator Selection
B5.4 Transitions
B5.5 Summary

Content of this Course

Planning
◮ Classical: Foundations, Logic, Heuristics, Constraints
◮ Probabilistic: Explicit MDPs, Factored MDPs


B5.1 Introduction


SAT Solvers

◮ SAT solvers (algorithms that find satisfying assignments to CNF formulas) are one of the major success stories in solving hard combinatorial problems.
◮ Can we leverage them for classical planning?

This idea is known as SAT planning (a.k.a. planning as satisfiability).

Background on SAT solvers: Foundations of Artificial Intelligence course, Ch. 31–32


Complexity Mismatch

◮ The SAT problem is NP-complete, while PlanEx is PSPACE-complete.
◮ Hence a one-shot polynomial reduction from PlanEx to SAT is not possible (unless NP = PSPACE).


Solution: Iterative Deepening

◮ We can generate a propositional formula that tests whether task $\Pi$ has a plan with horizon (length bound) $T$ in time $O(\|\Pi\|^k \cdot T)$ (a pseudo-polynomial reduction).
◮ We use this as a building block of an algorithm that probes increasing horizons (a bit like IDA*).
◮ This can be efficient if there exist plans that are not excessively long.


SAT Planning: Main Loop

Basic SAT planning algorithm:

    def satplan(Π):
        for T ∈ {0, 1, 2, ...}:
            ϕ := build_sat_formula(Π, T)
            I := sat_solver(ϕ)        ⊲ returns a model or none
            if I is not none:
                return extract_plan(Π, T, I)

Termination criterion for unsolvable tasks?
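The loop above can be sketched in runnable Python as follows. This is only an illustration: the three helper callables are passed in as parameters, `max_horizon` is a hypothetical cutoff that is not part of the slides (it is one crude answer to the termination question), and the `toy_*` functions are stand-ins that pretend a task is just the number of steps it needs.

```python
from itertools import count

def satplan(task, build_sat_formula, sat_solver, extract_plan, max_horizon=None):
    """Probe horizons T = 0, 1, 2, ... until the formula for T is satisfiable.

    max_horizon is a hypothetical cutoff (an assumption, not from the slides)
    so that the loop can give up instead of running forever.
    """
    for T in count():
        if max_horizon is not None and T > max_horizon:
            return None  # no plan found within the cutoff
        formula = build_sat_formula(task, T)
        model = sat_solver(formula)  # a model, or None if unsatisfiable
        if model is not None:
            return extract_plan(task, T, model)

# Toy stand-ins: the "task" is simply the number of steps a plan needs.
def toy_formula(task, T):
    return (task, T)

def toy_solver(formula):
    needed, T = formula
    return {} if T >= needed else None  # "satisfiable" once T is large enough

def toy_extract(task, T, model):
    return ["step"] * T

print(satplan(3, toy_formula, toy_solver, toy_extract))                 # ['step', 'step', 'step']
print(satplan(5, toy_formula, toy_solver, toy_extract, max_horizon=2))  # None
```

Because horizons are probed in increasing order, the first plan found uses the smallest horizon for which the formula is satisfiable.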


B5.2 Formula Overview


SAT Formula: CNF?

◮ SAT solvers require conjunctive normal form (CNF), i.e., formulas expressed as a collection of clauses.
◮ We will make sure that our SAT formulas are in CNF when our input is a STRIPS task.
◮ We do allow fully general propositional tasks, but then the formula may need additional conversion to CNF.


SAT Formula: Variables

◮ given propositional planning task $\Pi = \langle V, I, O, \gamma \rangle$
◮ given horizon $T \in \mathbb{N}_0$

Variables of the SAT Formula
◮ propositional variables $v^i$ for all $v \in V$, $0 \le i \le T$ encode the state after $i$ steps of the plan
◮ propositional variables $o^i$ for all $o \in O$, $1 \le i \le T$ encode the operator(s) applied in the $i$-th step of the plan
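One possible way to realize these variables in code is to number them with positive integers, as SAT solvers commonly expect. The sketch below assumes that state variables and operators are given as lists of distinct names; the function name and the `(name, i)` key format are illustrative choices, not part of the slides.

```python
def make_variable_index(state_vars, operators, T):
    """Map each time-stamped variable to a positive integer.

    Keys are (name, i) pairs: state variables for 0 <= i <= T,
    operator variables for 1 <= i <= T.
    Assumes state variable names and operator names do not collide.
    """
    index = {}
    for i in range(T + 1):
        for v in state_vars:
            index[(v, i)] = len(index) + 1
    for i in range(1, T + 1):
        for o in operators:
            index[(o, i)] = len(index) + 1
    return index

index = make_variable_index(["a", "b"], ["o1", "o2", "o3"], T=2)
print(len(index))  # (T + 1) * |V| + T * |O| = 3 * 2 + 2 * 3 = 12
```

Note that there is no operator variable for time 0, since operators are applied in steps 1 to T.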


Formulas with Time Steps

Definition (Time-Stamped Formulas)
Let $\varphi$ be a propositional logic formula over the variables $V$. Let $0 \le i \le T$. We write $\varphi^i$ for the formula obtained from $\varphi$ by replacing each $v \in V$ with $v^i$.

Example: $((a \land b) \lor \lnot c)^3 = (a^3 \land b^3) \lor \lnot c^3$
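The definition can be illustrated with a short recursive function. The formula representation is an assumption made for this sketch: a plain string is a variable, and compound formulas are nested tuples such as `("and", f, g)`, `("or", f, g)` and `("not", f)`. Time-stamped variables are written as strings like `"a^3"`.

```python
def time_stamp(formula, i):
    """Return formula^i: every variable v is replaced by the string 'v^i'."""
    if isinstance(formula, str):
        return f"{formula}^{i}"
    op, *args = formula
    # Keep the connective and time-stamp each subformula recursively.
    return (op, *(time_stamp(arg, i) for arg in args))

phi = ("or", ("and", "a", "b"), ("not", "c"))
print(time_stamp(phi, 3))
# ('or', ('and', 'a^3', 'b^3'), ('not', 'c^3'))
```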


SAT Formula: Motivation

We want to express a formula whose models are exactly the plans/traces with $T$ steps. For this, the formula must express four things:
◮ The variables $v^0$ ($v \in V$) define the initial state.
◮ The variables $v^T$ ($v \in V$) define a goal state.
◮ We select exactly one operator variable $o^i$ ($o \in O$) for each time step $1 \le i \le T$.
◮ If we select $o^i$, then the variables $v^{i-1}$ and $v^i$ ($v \in V$) describe a state transition from the $(i-1)$-th state of the plan to the $i$-th state of the plan (that uses operator $o$).

The final formula is the conjunction of all these parts.


B5.3 Initial State, Goal, Operator Selection


SAT Formula: Initial State

initial state clauses:
◮ $v^0$ for all $v \in V$ with $I(v) = \mathbf{T}$
◮ $\lnot v^0$ for all $v \in V$ with $I(v) = \mathbf{F}$


SAT Formula: Goal

goal clauses:
◮ $\gamma^T$

For STRIPS, this is a conjunction of unit clauses. For general goals, this may not be in clause form.
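For the STRIPS case, the initial state and goal clauses might be generated as sketched below. The literal format `(name, time, polarity)` and both function names are assumptions made for illustration; the goal is assumed to be given as a collection of atoms that must all be true.

```python
def initial_state_clauses(state_vars, init_true):
    """One unit clause per state variable at time 0.

    The literal is positive if the variable is initially true, negative otherwise.
    """
    return [[(v, 0, v in init_true)] for v in state_vars]

def goal_clauses(goal_atoms, T):
    """STRIPS goal (a conjunction of atoms): one positive unit clause per atom at time T."""
    return [[(v, T, True)] for v in goal_atoms]

print(initial_state_clauses(["a", "b", "c"], {"a"}))
# [[('a', 0, True)], [('b', 0, False)], [('c', 0, False)]]
print(goal_clauses(["c"], T=2))
# [[('c', 2, True)]]
```

The initial state fixes every variable at time 0, while the goal only constrains the variables it mentions at time T.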


SAT Formula: Operator Selection

Let $O = \{o_1, \dots, o_n\}$.

operator selection clauses:
◮ $o_1^i \lor \dots \lor o_n^i$ for all $1 \le i \le T$

operator exclusion clauses:
◮ $\lnot o_j^i \lor \lnot o_k^i$ for all $1 \le i \le T$, $1 \le j < k \le n$
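A minimal sketch of how these "exactly one operator per step" clauses could be generated, again using the assumed literal format `(name, time, polarity)` and a hypothetical function name:

```python
from itertools import combinations

def operator_selection_clauses(operators, T):
    """Selection and exclusion clauses for every time step 1..T."""
    clauses = []
    for i in range(1, T + 1):
        # Selection: at least one operator is applied at step i.
        clauses.append([(o, i, True) for o in operators])
        # Exclusion: no two different operators are applied at step i.
        for o_j, o_k in combinations(operators, 2):
            clauses.append([(o_j, i, False), (o_k, i, False)])
    return clauses

clauses = operator_selection_clauses(["o1", "o2", "o3"], T=2)
print(len(clauses))  # T * (1 + n * (n - 1) / 2) = 2 * (1 + 3) = 8
```

The pairwise exclusion clauses grow quadratically in the number of operators, which is worth keeping in mind for tasks with many operators.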


B5.4 Transitions


SAT Formula: Transitions

We now get to the interesting/challenging bit: encoding the transitions.

Key observations: if we apply operator $o$ at time $i$,
◮ its precondition must be satisfied at time $i-1$: $o^i \to \mathit{pre}(o)^{i-1}$
◮ variable $v$ is true at time $i$ iff its regression is true at time $i-1$: $o^i \to (v^i \leftrightarrow \mathit{regr}(v, \mathit{eff}(o))^{i-1})$

Question: Why $\mathit{regr}(v, \mathit{eff}(o))$, not $\mathit{regr}(v, o)$?


Simplifications and Abbreviations

◮ Let us pick the last formula apart to understand it better (and also get a CNF representation along the way).
◮ Let us call the formula $\tau$ (“transition”): $\tau = o^i \to (v^i \leftrightarrow \mathit{regr}(v, \mathit{eff}(o))^{i-1})$.
◮ First, some abbreviations:
  ◮ Let $e = \mathit{eff}(o)$.
  ◮ Let $\rho = \mathit{regr}(v, e)$ (“regression”). We have $\rho = \mathit{effcond}(v, e) \lor (v \land \lnot \mathit{effcond}(\lnot v, e))$.
  ◮ Let $\alpha = \mathit{effcond}(v, e)$ (“added”).
  ◮ Let $\delta = \mathit{effcond}(\lnot v, e)$ (“deleted”).

$\tau = o^i \to (v^i \leftrightarrow \rho^{i-1})$ with $\rho = \alpha \lor (v \land \lnot\delta)$
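To get a feeling for $\rho = \alpha \lor (v \land \lnot\delta)$, it can help to tabulate it. The sketch below treats $\alpha$, $v$ and $\delta$ simply as truth values in the predecessor state (an illustrative simplification; in general $\alpha$ and $\delta$ are formulas) and prints for each combination whether $v$ is true after applying the operator.

```python
from itertools import product

def rho(alpha, v, delta):
    """Regression of v: v holds afterwards iff it is added, or it held and is not deleted."""
    return alpha or (v and not delta)

# Columns: alpha (added), v (true before), delta (deleted) -> v true afterwards
for alpha, v, delta in product([False, True], repeat=3):
    print(alpha, v, delta, "->", rho(alpha, v, delta))
```

In particular, `rho(True, False, True)` is `True`: if an effect both adds and deletes $v$, the add wins.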


Picking it Apart (1)

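Reminder: $\tau = o^i \to (v^i \leftrightarrow \rho^{i-1})$ with $\rho = \alpha \lor (v \land \lnot\delta)$

\[
\begin{aligned}
\tau &= o^i \to (v^i \leftrightarrow \rho^{i-1}) \\
&\equiv o^i \to ((v^i \to \rho^{i-1}) \land (\rho^{i-1} \to v^i)) \\
&\equiv \underbrace{(o^i \to (v^i \to \rho^{i-1}))}_{\tau_1} \land \underbrace{(o^i \to (\rho^{i-1} \to v^i))}_{\tau_2}
\end{aligned}
\]

We consider these as two separate constraints $\tau_1$ and $\tau_2$.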


Picking it Apart (2)

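Reminder: $\tau_1 = o^i \to (v^i \to \rho^{i-1})$ with $\rho = \alpha \lor (v \land \lnot\delta)$

\[
\begin{aligned}
\tau_1 &= o^i \to (v^i \to \rho^{i-1}) \\
&\equiv o^i \to (\lnot\rho^{i-1} \to \lnot v^i) \\
&\equiv (o^i \land \lnot\rho^{i-1}) \to \lnot v^i \\
&\equiv (o^i \land \lnot(\alpha^{i-1} \lor (v^{i-1} \land \lnot\delta^{i-1}))) \to \lnot v^i \\
&\equiv (o^i \land (\lnot\alpha^{i-1} \land (\lnot v^{i-1} \lor \delta^{i-1}))) \to \lnot v^i \\
&\equiv \underbrace{((o^i \land \lnot\alpha^{i-1} \land \lnot v^{i-1}) \to \lnot v^i)}_{\tau_{11}} \land \underbrace{((o^i \land \lnot\alpha^{i-1} \land \delta^{i-1}) \to \lnot v^i)}_{\tau_{12}}
\end{aligned}
\]

We consider these as two separate constraints $\tau_{11}$ and $\tau_{12}$.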


Interpreting the Constraints (1)

Can we give an intuitive description of $\tau_{11}$ and $\tau_{12}$? Yes!
◮ $\tau_{11} = (o^i \land \lnot\alpha^{i-1} \land \lnot v^{i-1}) \to \lnot v^i$
  “When applying $o$, if $v$ is false and $o$ does not add it, it remains false.”
  ◮ called negative frame clause
  ◮ in clause form: $\lnot o^i \lor \alpha^{i-1} \lor v^{i-1} \lor \lnot v^i$
◮ $\tau_{12} = (o^i \land \lnot\alpha^{i-1} \land \delta^{i-1}) \to \lnot v^i$
  “When applying $o$, if $o$ deletes $v$ and does not add it, it is false afterwards.” (Note the add-after-delete semantics.)
  ◮ called negative effect clause
  ◮ in clause form: $\lnot o^i \lor \alpha^{i-1} \lor \lnot\delta^{i-1} \lor \lnot v^i$

For STRIPS tasks, these are indeed clauses. (And in general?)


Picking it Apart (3)

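Almost done!

Reminder: $\tau_2 = o^i \to (\rho^{i-1} \to v^i)$ with $\rho = \alpha \lor (v \land \lnot\delta)$

\[
\begin{aligned}
\tau_2 &= o^i \to (\rho^{i-1} \to v^i) \\
&\equiv (o^i \land \rho^{i-1}) \to v^i \\
&\equiv (o^i \land (\alpha^{i-1} \lor (v^{i-1} \land \lnot\delta^{i-1}))) \to v^i \\
&\equiv \underbrace{((o^i \land \alpha^{i-1}) \to v^i)}_{\tau_{21}} \land \underbrace{((o^i \land v^{i-1} \land \lnot\delta^{i-1}) \to v^i)}_{\tau_{22}}
\end{aligned}
\]

We consider these as two separate constraints $\tau_{21}$ and $\tau_{22}$.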


Interpreting the Constraints (2)

How about an intuitive description of $\tau_{21}$ and $\tau_{22}$?
◮ $\tau_{21} = (o^i \land \alpha^{i-1}) \to v^i$
  “When applying $o$, if $o$ adds $v$, it is true afterwards.”
  ◮ called positive effect clause
  ◮ in clause form: $\lnot o^i \lor \lnot\alpha^{i-1} \lor v^i$
◮ $\tau_{22} = (o^i \land v^{i-1} \land \lnot\delta^{i-1}) \to v^i$
  “When applying $o$, if $v$ is true and $o$ does not delete it, it remains true.”
  ◮ called positive frame clause
  ◮ in clause form: $\lnot o^i \lor \lnot v^{i-1} \lor \delta^{i-1} \lor v^i$

For STRIPS tasks, these are indeed clauses. (But not in general.)
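As a sanity check on the derivation, the equivalence $\tau \equiv \tau_{11} \land \tau_{12} \land \tau_{21} \land \tau_{22}$ can be confirmed by brute force over all truth assignments. In this sketch $\alpha$ and $\delta$ are treated as truth values at time $i-1$, and the function and parameter names are illustrative.

```python
from itertools import product

def implies(p, q):
    return (not p) or q

def tau(o, v_next, alpha, v_prev, delta):
    """o^i -> (v^i <-> rho^{i-1}) with rho = alpha or (v and not delta)."""
    rho = alpha or (v_prev and not delta)
    return implies(o, v_next == rho)

def four_clauses(o, v_next, alpha, v_prev, delta):
    """Conjunction of the four constraints in clause form."""
    neg_frame = (not o) or alpha or v_prev or (not v_next)        # tau_11
    neg_effect = (not o) or alpha or (not delta) or (not v_next)  # tau_12
    pos_effect = (not o) or (not alpha) or v_next                 # tau_21
    pos_frame = (not o) or (not v_prev) or delta or v_next        # tau_22
    return neg_frame and neg_effect and pos_effect and pos_frame

# Compare both formulations on all 2^5 assignments.
equivalent = all(
    tau(*values) == four_clauses(*values)
    for values in product([False, True], repeat=5)
)
print(equivalent)  # True
```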


SAT Formula: Transitions

precondition clauses:
◮ $\lnot o^i \lor \mathit{pre}(o)^{i-1}$ for all $1 \le i \le T$, $o \in O$

positive and negative effect clauses:
◮ $\lnot o^i \lor \lnot\alpha^{i-1} \lor v^i$ for all $1 \le i \le T$, $o \in O$, $v \in V$
◮ $\lnot o^i \lor \alpha^{i-1} \lor \lnot\delta^{i-1} \lor \lnot v^i$ for all $1 \le i \le T$, $o \in O$, $v \in V$

positive and negative frame clauses:
◮ $\lnot o^i \lor \lnot v^{i-1} \lor \delta^{i-1} \lor v^i$ for all $1 \le i \le T$, $o \in O$, $v \in V$
◮ $\lnot o^i \lor \alpha^{i-1} \lor v^{i-1} \lor \lnot v^i$ for all $1 \le i \le T$, $o \in O$, $v \in V$

where $\alpha = \mathit{effcond}(v, \mathit{eff}(o))$, $\delta = \mathit{effcond}(\lnot v, \mathit{eff}(o))$.

For STRIPS, all except the precondition clauses are in clause form. The precondition clauses are easily convertible to CNF (one clause $\lnot o^i \lor v^{i-1}$ for each precondition atom $v$ of $o$).
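For STRIPS operators, $\alpha$ and $\delta$ are constants: $\alpha$ is true exactly if $v$ is an add effect of $o$, and $\delta$ is true exactly if $v$ is a delete effect. The sketch below uses this to emit the transition clauses in simplified form, dropping clauses that become trivially true. The operator format (a dict from names to `(pre, add, delete)` sets) and the literal format `(name, time, polarity)` are assumptions made for illustration.

```python
def transition_clauses(state_vars, operators, T):
    """Transition clauses for STRIPS operators given as name -> (pre, add, delete)."""
    clauses = []
    for i in range(1, T + 1):
        for name, (pre, add, delete) in operators.items():
            not_o = (name, i, False)
            # Precondition clauses: one per precondition atom.
            for p in sorted(pre):
                clauses.append([not_o, (p, i - 1, True)])
            for v in state_vars:
                added, deleted = v in add, v in delete
                if added:
                    # Positive effect clause (alpha is true).
                    clauses.append([not_o, (v, i, True)])
                elif deleted:
                    # Negative effect clause (alpha false, delta true).
                    clauses.append([not_o, (v, i, False)])
                if not deleted:
                    # Positive frame clause (delta is false).
                    clauses.append([not_o, (v, i - 1, False), (v, i, True)])
                if not added:
                    # Negative frame clause (alpha is false).
                    clauses.append([not_o, (v, i - 1, True), (v, i, False)])
    return clauses

# Hypothetical operator "o": requires a, adds b, deletes a.
ops = {"o": ({"a"}, {"b"}, {"a"})}
for clause in transition_clauses(["a", "b"], ops, T=1):
    print(clause)
```

Some emitted frame clauses are subsumed by an effect clause for the same variable (for example, the positive frame clause for an added variable). They are kept here to stay close to the encoding as stated; a practical implementation might omit them.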


B5.5 Summary


◮ SAT planning (planning as satisfiability) expresses a sequence of bounded-horizon planning tasks as SAT formulas.
◮ Plans can be extracted from satisfying assignments; unsolvable tasks are challenging for the algorithm.
◮ For each time step, there are propositions encoding which state variables are true and which operators are applied.
◮ We describe a basic sequential encoding where one operator is applied at every time step.
◮ The encoding produces a CNF formula for STRIPS tasks.
◮ The encoding follows naturally (with some work) from using regression to link state variables in adjacent time steps.
