

slide-1
SLIDE 1

Agent Planning Programs

A New Method to Design and Control Intelligent Agents Behaviour

Sebastian Sardina
School of Computer Science and IT, RMIT University, Melbourne, Australia
sebastian.sardina@rmit.edu.au

Joint work with Giuseppe De Giacomo & Fabio Patrizi (Sapienza Università di Roma), Alfonso Gerevini and Alessandro Saetti (Università degli Studi di Brescia)

HYBRIS Workshop, November 6-7, 2017, Aachen, Germany

1 / 50

slide-2
SLIDE 2

Outline

General Motivation and Goal
Brief Introduction to AI planning
Planning Programs to Specify and Control Agents Behaviour
Building Planning Programs: Solutions to the Realization Problem
  • LTL Synthesis
  • Planning-based approach
Conclusions

2 / 50

slide-3
SLIDE 3

UNS (ARG) =⇒ UofT (CAN) =⇒ RMIT (AUS)

3 / 50


slide-5
SLIDE 5

Living & Working in Australia...

4 / 50

slide-6
SLIDE 6

Outline

General Motivation and Goal
Brief Introduction to AI planning
Planning Programs to Specify and Control Agents Behaviour
Building Planning Programs: Solutions to the Realization Problem
  • LTL Synthesis
  • Planning-based approach
Conclusions

5 / 50

slide-11
SLIDE 11

Control of Complex Activities in Different Contexts/Areas

  • Web-service Composition
  • Manufacturing
  • Robot ecologies
  • Smart houses
  • Workflows

6 / 50

slide-12
SLIDE 12

Control in Advanced Manufacturing

Produce variable volumes of highly customised products, rapidly and at low cost:

  • overall manufacturing process known: build part X first; then Y; assemble X with Y; then ...
  • ... but specific details unknown: the specific production line, and external decisions made at run time by human operators or external software.
  • allow the production control software greater autonomy over the details of how products will be manufactured.

7 / 50


slide-14
SLIDE 14

Composition Control in Smart Spaces

Diagram: the smart-house system receives a request from the user.

8 / 50


slide-16
SLIDE 16

Control in Workflows/Business Processes

  • general flow known by experts (e.g., emergency management).
  • some choices decided externally (e.g., doctor, game engine).
  • sub-processes achieve milestones (sub-goaling?).
  • complex steps: data-intensive, knowledge-based processes.
  • agnostic of actual implementation: different capabilities available.

9 / 50


slide-25
SLIDE 25

What we are after I

A new way of specifying & building controllers for task-oriented behavior such that:

1 Can encode some know-how information.
  • general, abstract, deployment independent.
2 Supports automatic decision making for realization details.
  • depends on underlying infrastructure.
  • too complex or demanding to develop by hand.
3 Allows continuous operation.
  • may never stop...
4 Accommodates external decision making/input.
  • human/agent decision or external software.

10 / 50

slide-26
SLIDE 26

AI Approaches to Intelligent Autonomous Behavior/Control

1 Behavior-based AI: set of independent simple reactive modules.
  • Not much internal representation; sensor to action.
  • Intelligent behavior emerges “implicitly” (popular in robotics since the ’80s).
2 Agent-oriented programming: control specified by programmer.
  • BDI-like systems (e.g., JADEX, SARL) and high-level languages (e.g., Golog).
3 Learning: learn control/how-to-act based on previous experience.
  • Many machine learning techniques (e.g., reinforcement learning).
4 Model-based planning: specify model by hand + derive control automatically.
  • Model: captures predictions (what each action does in the world and what sensors tell us about it).
  • Input: model of the world + initial state + goal to be achieved.
  • Output: plan or controller to achieve the goal.
  • Flexibility: no need to specify control knowledge (how to solve the problem).
11 / 50



slide-33
SLIDE 33

Planning vs Agent Programming

Model-based AI Planning        vs   Agent-oriented Programming
Declarative: “goals-to-be”          Procedural: “actions-to-do”
Think in terms of goals             Requires specific detailed solutions
Missing know-how info               Supports know-how info
One-shot problem                    Long-term behavior: “act as you go”
Computationally expensive           Reduced search space

12 / 50

slide-34
SLIDE 34

Outline

General Motivation and Goal
Brief Introduction to AI planning
Planning Programs to Specify and Control Agents Behaviour
Building Planning Programs: Solutions to the Realization Problem
  • LTL Synthesis
  • Planning-based approach
Conclusions

13 / 50


slide-38
SLIDE 38

Automated Planning: Model-based

Diagram: the planning system (solver) takes an initial state, operators/actions (preconditions + effects), and a goal state, and outputs a plan that achieves the goal from the initial state using the operators.

  • Domain-independent: no need to specify how to solve a problem.
  • Planning languages: specify operators (atomic actions), initial state, and goals (e.g., PDDL).
  • Planning algorithms: search heuristics automatically extracted from the model!
  • Tremendous improvements (modelling and algorithms) in the last 15 years.

14 / 50


slide-40
SLIDE 40

Environment Specified via a PDDL Domain Model

(define (domain academic-life)
  (:types location fuel_level)
  (:constants home parking dept pub - location
              empty low high - fuel_level)
  (:predicates (my-loc ?place - location)
               (car-loc ?place - location)
               (car-fuel ?level - fuel_level)
               (driven) ...)
  (:action go-by-car
    :parameters (?source ?place - location)
    :precondition (and (my-loc ?source) (car-loc ?source)
                       (not (car-fuel empty)))
    :effect (and (not (my-loc ?source)) (my-loc ?place)
                 (not (car-loc ?source)) (car-loc ?place)
                 (driven)
                 (when (car-fuel high)
                   (and (car-fuel low) (not (car-fuel high))))
                 (when (car-fuel low)
                   (and (car-fuel empty) (not (car-fuel low))))))
  (:action refuel-car
    :parameters (?place - location)
    :precondition (and (car-loc ?place) (my-loc ?place))
    :effect (and (car-fuel high)
                 (not (car-fuel low)) (not (car-fuel empty))))
  ...)

15 / 50


slide-42
SLIDE 42

State Model for (Classical) AI Planning

Definition (Planning State Model)

  • Finite and discrete state space S (state = set of ground predicates).
  • A known initial state s0 ∈ S.
  • A set SG ⊆ S of goal states.
  • A set A of actions.
  • Actions A(s) ⊆ A applicable in each s ∈ S.
  • A deterministic state transition function s′ = f (a, s) for a ∈ A(s).
  • Action costs c(a, s) > 0.

A solution is a sequence of applicable actions that maps s0 into SG. It is optimal if it minimizes the sum of action costs (e.g., the number of steps). The resulting controller is open-loop (no feedback).
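The state model above can be sketched directly in code: a minimal breadth-first search that returns a shortest applicable action sequence from s0 into SG (optimal under unit action costs). This is an illustrative toy, not the planners discussed in the talk; the encoding of states as frozensets of ground atoms and the tiny walking domain are assumptions for the example.

```python
from collections import deque

def bfs_plan(s0, goal_test, applicable, apply_action):
    """Breadth-first search over the state model: returns a shortest
    action sequence mapping s0 into a goal state, or None."""
    frontier = deque([(s0, [])])
    seen = {s0}
    while frontier:
        s, plan = frontier.popleft()
        if goal_test(s):
            return plan
        for a in applicable(s):            # A(s): actions applicable in s
            s2 = apply_action(a, s)        # f(a, s): deterministic transition
            if s2 not in seen:
                seen.add(s2)
                frontier.append((s2, plan + [a]))
    return None  # no plan exists

# Toy instance: walk between adjacent locations (names are illustrative).
adjacent = {"home": ["parking", "pub"], "parking": ["home", "dept"],
            "dept": ["parking"], "pub": ["home"]}

s0 = frozenset({"my-loc home"})
goal_test = lambda s: "my-loc dept" in s

def applicable(s):
    here = next(x.split()[1] for x in s if x.startswith("my-loc"))
    return [("walk", there) for there in adjacent[here]]

def apply_action(a, s):
    _, there = a
    return frozenset({"my-loc " + there})

print(bfs_plan(s0, goal_test, applicable, apply_action))
# → [('walk', 'parking'), ('walk', 'dept')]
```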

16 / 50

slide-43
SLIDE 43

Uncertainty but No Feedback: Conformant Planning

Differences with classical planning:

  • a set of possible initial states S0 ⊆ S.
  • a non-deterministic transition function F(a, s) ⊆ S for a ∈ A(s).

Uncertainty but no sensing; hence controller still open-loop. A solution is (still) an action sequence that achieves the goal in spite of the uncertainty.

  • i.e. for any possible initial state and any possible transition.

17 / 50

slide-44
SLIDE 44

Planning with Sensing

Like conformant planning plus:

  • a sensor model O(a, s) mapping actions and states into observations.

Solutions can be expressed in many forms:

  • policies mapping belief states (sets of states) into actions;
  • contingent trees;
  • finite-state controllers;
  • programs, etc.

Probabilistic version known as POMDP: Partially Obs. Markov Decision Process.

18 / 50

slide-45
SLIDE 45

Models, Languages, Control, Scalability

  • A planner is a solver over a class of models; it takes a model description and computes the corresponding control (plan).
  • Many dimensions/models: plan adaptation, soft goals, constraints, multi-agent, temporal, numerical, continuous change, uncertainty, feedback, etc.
  • Models described in compact form by means of planning languages (e.g., PDDL).
  • Different types of control:
    • open-loop vs. closed-loop (feedback used);
    • off-line vs. on-line (full policies vs. lookahead).
  • All models computationally intractable; the key challenge is scalability:
    • how not to be defeated by problem size;
    • need to use heuristics and exploit the structure of problems.

19 / 50

slide-46
SLIDE 46

Combinatorial Explosion: Example

Diagram: the blocks-world state graph, with block configurations as nodes and moves as edges, leading from the INIT configuration to the GOAL configurations.

  • Classical problem: move blocks to transform Init into Goal.
  • Problem becomes path finding in directed graph.
  • Difficulty is that graph size is exponential in number of blocks.
  • Problem simple for specialized Block Solver but difficult for General Solver.

20 / 50

slide-47
SLIDE 47

Dealing with the Combinatorial Explosion: Heuristics

Diagram: the same blocks-world graph annotated with heuristic values (h = 3, 3, 2, 2, 1, 0, ...) decreasing toward the GOAL.

Plans can be found/constructed with heuristic search:

  • Heuristic values h(s) estimate the “cost” from s to the goal; they provide a sense of direction.
  • They are derived automatically from the problem representation!
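One way to see how such automatically derived values guide search is greedy best-first search with the goal-count heuristic, h(s) = number of goal atoms not yet true in s; this is one of the simplest heuristics computable from the problem representation alone. The function names and the tiny demo instance are assumptions for illustration.

```python
import heapq

def greedy_search(s0, goal_atoms, applicable, apply_action):
    """Greedy best-first search guided by the goal-count heuristic."""
    h = lambda s: len(goal_atoms - s)      # unsatisfied goal atoms
    frontier = [(h(s0), 0, s0, [])]        # (h, tie-breaker, state, plan)
    seen, tie = {s0}, 0
    while frontier:
        _, _, s, plan = heapq.heappop(frontier)   # expand lowest h first
        if goal_atoms <= s:
            return plan
        for a in applicable(s):
            s2 = apply_action(a, s)
            if s2 not in seen:
                seen.add(s2)
                tie += 1                   # unique tie-breaker so states
                heapq.heappush(frontier, (h(s2), tie, s2, plan + [a]))
    return None

# Tiny demo: states are frozensets of atoms; each action adds one atom.
def applicable(s):
    return [("add", p) for p in ["x", "y", "z"] if p not in s]

def apply_action(a, s):
    return s | {a[1]}

plan = greedy_search(frozenset(), frozenset({"x", "y"}), applicable, apply_action)
print(plan)  # → [('add', 'x'), ('add', 'y')]
```

Note that the heuristic steers the search away from the irrelevant "z" atom without any hand-written control knowledge.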

21 / 50


slide-51
SLIDE 51

Status of Model-based Planning

  • Classical planners work reasonably well:
    • Large problems solved very fast (non-optimally).
    • Exploit different techniques: heuristics, landmarks, helpful actions.
    • Other approaches like SAT and local search (LPG) work well too.
  • Model simple but useful:
    • Operators not primitive; can be policies themselves.
    • Fast closed-loop replanning able to cope with uncertainty sometimes.
  • Beyond classical planning: temporal, numerical, soft goals, constraints, multi-agent, incomplete information, uncertainty, ...
    • Top-down approach: several general native solvers.
    • Bottom-up approach: some transformations into classical planning.
  • Other approaches: specialised algorithms/data structures.
    • E.g., path planning (many applications in robotics and games).
22 / 50

slide-52
SLIDE 52

Outline

General Motivation and Goal
Brief Introduction to AI planning
Planning Programs to Specify and Control Agents Behaviour
Building Planning Programs: Solutions to the Realization Problem
  • LTL Synthesis
  • Planning-based approach
Conclusions

23 / 50


slide-63
SLIDE 63

What we are after II

A new way of specifying & building controllers of agents’ task-oriented behavior such that:

1 Builds on declarative goals φ1, φ2, ...
  • achievement & maintenance goal types, maybe others...
2 Supports automatic plan synthesis for goals.
3 Allows relating/linking goals.
  • “high-level” know-how
  • e.g., after φ1, it “makes sense” to achieve φ2 or φ3, but not φ4.
4 Provides continuous control/behavior.
  • may never stop...
5 Allows for external decision making/input.
  • At run-time it is decided whether the system should achieve φ1 or φ2 at a given point.

Hybrid between AI Planning and Agent Programming!

24 / 50

slide-64
SLIDE 64

Agent Planning Programs: Academic Domain Example

25 / 50


slide-71
SLIDE 71

Agent Planning Programs

  • Finite state program (including conditionals and loops).
  • Non-controllable transitions: selected by external entity.
  • With an initial state and possibly non-terminating.
  • Atomic instructions: requests to “achieve goal φ while maintaining goal ψ”
  • Meant to run in a dynamic domain where agents act: the environment.

Diagram (program states t0, t1, t2; edges labelled with goal requests):
  G1: achieve MyLoc(dept) while maintaining ¬Fuel(empty)
  G2: achieve MyLoc(home) ∧ CarLoc(home) while maintaining ¬Fuel(empty)
  G3: achieve MyLoc(pub) while maintaining ¬Fuel(empty)
  G4: achieve MyLoc(pub)
  G5: achieve MyLoc(home) while maintaining ¬Driven

The agent chooses a goal to pursue!
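The structure described in the bullets can be sketched as a small data type: program states plus transitions labelled with “achieve φ while maintaining ψ” requests. Goal formulas are kept as strings here, and the wiring between t0, t1, t2 is an assumption (the original figure is not fully recoverable from this transcript).

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Transition:
    source: str        # program state the request is made from
    target: str        # program state reached once the goal is achieved
    achieve: str       # achievement goal φ
    maintain: str = "true"   # maintenance goal ψ (default: none)

@dataclass
class PlanningProgram:
    initial: str
    transitions: list = field(default_factory=list)

    def requests(self, t):
        """Transitions the external entity may request in program state t."""
        return [tr for tr in self.transitions if tr.source == t]

# The academic-life example (transition endpoints are illustrative).
app = PlanningProgram("t0", [
    Transition("t0", "t1", "MyLoc(dept)", "¬Fuel(empty)"),                 # G1
    Transition("t1", "t0", "MyLoc(home) ∧ CarLoc(home)", "¬Fuel(empty)"),  # G2
    Transition("t1", "t2", "MyLoc(pub)", "¬Fuel(empty)"),                  # G3
    Transition("t0", "t2", "MyLoc(pub)"),                                  # G4
    Transition("t2", "t0", "MyLoc(home)", "¬Driven"),                      # G5
])

print(len(app.requests("t1")))  # → 2
```

Because several transitions can leave the same state, the external agent retains the choice of which goal to pursue next, exactly as in the diagram.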

26 / 50


slide-73
SLIDE 73

Environment for Planning Programs (a simple example)

Planning programs are executed in an environment

State Propositions
  CarLoc, MyLoc: {home, parking, dept, pub}
  Fuel: {empty, low, high}
  Driven: {true, false}

Initial State
  CarLoc = home ∧ MyLoc = home ∧ Fuel = high ∧ Driven = false

Operators
  goByCar(x) with x ∈ {home, parking, dept, pub}
    prec: MyLoc = CarLoc ∧ Fuel ≠ empty
    post: MyLoc = CarLoc = x; Driven = true;
          (when (Fuel high) (Fuel low)); (when (Fuel low) (Fuel empty))
  (deterministic action)

27 / 50


slide-75
SLIDE 75

Environment for Planning Programs (a simple example)

Planning programs are executed in an environment

State Propositions
  CarLoc, MyLoc: {home, parking, dept, pub}
  Fuel: {empty, low, high}
  Driven: {true, false}

Initial State
  CarLoc = home ∧ MyLoc = home ∧ Fuel = high ∧ Driven = false

Operators
  goByCar(x) with x ∈ {home, parking, dept, pub}
    prec: MyLoc = CarLoc ∧ Fuel ≠ empty
    post: MyLoc = CarLoc = x; Driven = true;
          (when (Fuel high) (oneof (Fuel high) (Fuel low)));
          (when (Fuel low) (oneof (Fuel low) (Fuel empty)))
  (non-deterministic action)
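The non-deterministic version of goByCar can be sketched as a transition function F(a, s) that returns the set of possible successor states (fuel may or may not drop a level). States as plain dicts and the function name are assumptions for illustration.

```python
def go_by_car(s, x):
    """Non-deterministic goByCar(x): returns ALL possible successors."""
    assert s["MyLoc"] == s["CarLoc"] and s["Fuel"] != "empty"  # precondition
    base = dict(s, MyLoc=x, CarLoc=x, Driven=True)
    if s["Fuel"] == "high":
        fuels = ["high", "low"]      # (oneof (Fuel high) (Fuel low))
    else:                            # s["Fuel"] == "low"
        fuels = ["low", "empty"]     # (oneof (Fuel low) (Fuel empty))
    return [dict(base, Fuel=f) for f in fuels]

s0 = {"MyLoc": "home", "CarLoc": "home", "Fuel": "high", "Driven": False}
succ = go_by_car(s0, "dept")
print(sorted(t["Fuel"] for t in succ))  # → ['high', 'low']
```

A plan that must work against this action has to achieve the goal in every successor, which is what makes maintenance goals like ¬Fuel(empty) non-trivial.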

27 / 50



slide-82
SLIDE 82

Agent Planning Programs: Controller Synthesis

Diagram: the synthesis system (APP solver) takes the agent planning program (the possible requests) and the environment (the dynamic domain) and produces a controller that “realizes” the APP in the environment; at run time the controller receives requests and deploys plans.

29 / 50


slide-84
SLIDE 84

Agent Planning Programs: Execution and Control

Execution cycle

1 the environment is in a state s and the

planning program in state t;

2 the user requests a legal transition from

state t to state t′ to achieve goal φ;

3 controller deploys a plan π

that achieves φ from environment state s;

4 environment and planning program

states are updated; and

5 cycle restarts.

(Execution cycle diagram: REQUEST → PLAN → DEPLOY → UPDATE)

Find those plans!
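The cycle above can be sketched as a loop over requests. A minimal Python sketch, where all names (`run_cycle`, the controller table, the toy walk-only domain) are illustrative assumptions, not part of the formal framework:

```python
# Sketch of the APP execution cycle (request -> plan -> deploy -> update).
# The controller is a lookup table from (env state, program state, goal) to a plan.

def run_cycle(env_state, prog_state, controller, requests, apply_action):
    """Serve each goal request with a plan taken from the controller table."""
    for target, goal in requests:                        # step 2: a legal transition is requested
        plan = controller[(env_state, prog_state, goal)]  # step 3: controller supplies a plan
        for action in plan:                              # deploy the plan action by action
            env_state = apply_action(env_state, action)
        prog_state = target                              # step 4: update both states
    return env_state, prog_state                         # step 5: ready for the next request

# Toy walk-only domain: the environment state is just the agent's location.
apply = lambda loc, act: act[1] if act[0] == "walk" else loc
controller = {
    ("home", "t0", "at_pub"): [("walk", "pub")],
    ("pub", "t2", "at_home"): [("walk", "home")],
}
requests = [("t2", "at_pub"), ("t0", "at_home")]
print(run_cycle("home", "t0", controller, requests, apply))  # -> ('home', 't0')
```

The controller is precomputed offline; at run time the cycle only looks plans up and deploys them.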

30 / 50

slide-85
SLIDE 85

A Realization of a Planning Program

Realization (partial), ⟨domain state, program state, request⟩ → plan:
  ⟨MyLoc(home) ∧ Fuel(high), t0, ReqG1⟩ → goByCar(parking); walk(dept)
  ⟨MyLoc(dept) ∧ Fuel(high), t1, ReqG3⟩ → walk(parking); goByCar(home); walk(pub)
  ⟨MyLoc(dept) ∧ Fuel(high), t1, ReqG2⟩ → walk(parking); goByCar(home)
  ⟨MyLoc(pub), t2, ReqG5⟩ → walk(home)
  ⟨MyLoc(home), t0, ReqG4⟩ → walk(pub)
  . . .

Program over states t0, t1, t2, with goal-labelled transitions:
  G1: achieve MyLoc(dept) while maintaining ¬Fuel(empty)
  G2: achieve MyLoc(home) ∧ CarLoc(home) while maintaining ¬Fuel(empty)
  G3: achieve MyLoc(pub) while maintaining ¬Fuel(empty)
  G4: achieve MyLoc(pub)
  G5: achieve MyLoc(home) while maintaining ¬Driven

31 / 50

slide-87
SLIDE 87

A Realization of a Planning Program

Realization (partial), ⟨domain state, program state, request⟩ → plan:
  ⟨MyLoc(home) ∧ Fuel(high), t0, ReqG1⟩ → goByCar(parking); walk(dept)
  ⟨MyLoc(dept) ∧ Fuel(high), t1, ReqG3⟩ → walk(parking); goByCar(home); walk(pub)
  ⟨MyLoc(dept) ∧ Fuel(high), t1, ReqG2⟩ → walk(parking); goByCar(home)
  ⟨MyLoc(pub), t2, ReqG5⟩ → walk(home)
  ⟨MyLoc(home), t0, ReqG4⟩ → walk(pub)
  . . .

Program over states t0, t1, t2, with goal-labelled transitions:
  G1: achieve MyLoc(dept) while maintaining ¬Fuel(empty)
  G2: achieve MyLoc(home) ∧ CarLoc(home) while maintaining ¬Fuel(empty)
  G3: achieve MyLoc(pub) while maintaining ¬Fuel(empty)
  G4: achieve MyLoc(pub)
  G5: achieve MyLoc(home) while maintaining ¬Driven

Suppose there are no TakeBus actions (no bus): then the agent should not drive to the pub!

  • From dept, the agent should not drive to the pub (even if that is the optimal plan!)
  • Otherwise, the agent must leave the car at the pub when achieving G5 from t2.
  • Later, the agent cannot achieve G1 from t0! (long walks are not possible)

31 / 50

slide-88
SLIDE 88

A Realization of a Planning Program (cont.)

Realization (partial), ⟨domain state, program state, request⟩ → plan:
  ⟨MyLoc(home) ∧ Fuel(high), t0, ReqG1⟩ → goByCar(parking); walk(dept)
  ⟨MyLoc(dept) ∧ Fuel(high), t1, ReqG3⟩ → walk(parking); goByCar(home); walk(pub)
  ⟨MyLoc(dept) ∧ Fuel(low), t1, ReqG3⟩ → walk(parking); refuel; goByCar(home); walk(pub)
  ⟨MyLoc(dept) ∧ Fuel(high), t1, ReqG2⟩ → walk(parking); goByCar(home)
  ⟨MyLoc(pub), t2, ReqG5⟩ → walk(home)
  ⟨MyLoc(home), t0, ReqG4⟩ → walk(pub)
  . . .

Program over states t0, t1, t2, with goal-labelled transitions:
  G1: achieve MyLoc(dept) while maintaining ¬Fuel(empty)
  G2: achieve MyLoc(home) ∧ CarLoc(home) while maintaining ¬Fuel(empty)
  G3: achieve MyLoc(pub) while maintaining ¬Fuel(empty)
  G4: achieve MyLoc(pub)
  G5: achieve MyLoc(home) while maintaining ¬Driven

32 / 50

slide-89
SLIDE 89

A Realization of a Planning Program (cont.)

Realization (partial), ⟨domain state, program state, request⟩ → plan:
  ⟨MyLoc(home) ∧ Fuel(high), t0, ReqG1⟩ → goByCar(parking); walk(dept)
  ⟨MyLoc(dept) ∧ Fuel(high), t1, ReqG3⟩ → walk(parking); goByCar(home); walk(pub)
  ⟨MyLoc(dept) ∧ Fuel(low), t1, ReqG3⟩ → walk(parking); refuel; goByCar(home); walk(pub)
  ⟨MyLoc(dept) ∧ Fuel(high), t1, ReqG2⟩ → walk(parking); goByCar(home)
  ⟨MyLoc(pub), t2, ReqG5⟩ → walk(home)
  ⟨MyLoc(home), t0, ReqG4⟩ → walk(pub)
  . . .

Program over states t0, t1, t2, with goal-labelled transitions:
  G1: achieve MyLoc(dept) while maintaining ¬Fuel(empty)
  G2: achieve MyLoc(home) ∧ CarLoc(home) while maintaining ¬Fuel(empty)
  G3: achieve MyLoc(pub) while maintaining ¬Fuel(empty)
  G4: achieve MyLoc(pub)
  G5: achieve MyLoc(home) while maintaining ¬Driven

avoid empty tank!

32 / 50

slide-90
SLIDE 90

Planning Program Realization (formally)

Definition

A planning program T is realizable in dynamic domain/environment D iff there is a plan-based simulation PLAN between the initial states of T and D.

33 / 50

slide-91
SLIDE 91

Planning Program Realization (formally)

Definition

A planning program T is realizable in dynamic domain/environment D iff there is a plan-based simulation PLAN between the initial states of T and D.

  • Informally. (t, s) ∈PLAN means we can satisfy (with plans) all agent’s potential

requests from the current program state t when the dynamic domain is in state s.

33 / 50

slide-92
SLIDE 92

Planning Program Realization (formally)

Definition

A planning program T is realizable in dynamic domain/environment D iff there is a plan-based simulation PLAN between the initial states of T and D.

  • Informally. (t, s) ∈PLAN means we can satisfy (with plans) all agent’s potential

requests from the current program state t when the dynamic domain is in state s.

  • Formally. A binary relation PLAN is a plan-based simulation relation iff:

(t, s) ∈ PLAN implies that for all possible requests t −(achieve φ while maintaining ψ)→ t′ there exists a plan a1, a2, . . . , an such that:

33 / 50

slide-93
SLIDE 93

Planning Program Realization (formally)

Definition

A planning program T is realizable in dynamic domain/environment D iff there is a plan-based simulation PLAN between the initial states of T and D.

  • Informally. (t, s) ∈PLAN means we can satisfy (with plans) all agent’s potential

requests from the current program state t when the dynamic domain is in state s.

  • Formally. A binary relation PLAN is a plan-based simulation relation iff:

(t, s) ∈ PLAN implies that for all possible requests t −(achieve φ while maintaining ψ)→ t′ there exists a plan a1, a2, . . . , an such that:

  • s −a1→ s1 −a2→ · · · −an−1→ sn−1 −an→ sn (plan is executable)

33 / 50

slide-94
SLIDE 94

Planning Program Realization (formally)

Definition

A planning program T is realizable in dynamic domain/environment D iff there is a plan-based simulation PLAN between the initial states of T and D.

  • Informally. (t, s) ∈PLAN means we can satisfy (with plans) all agent’s potential

requests from the current program state t when the dynamic domain is in state s.

  • Formally. A binary relation PLAN is a plan-based simulation relation iff:

(t, s) ∈ PLAN implies that for all possible requests t −(achieve φ while maintaining ψ)→ t′ there exists a plan a1, a2, . . . , an such that:

  • s −a1→ s1 −a2→ · · · −an−1→ sn−1 −an→ sn (plan is executable)
  • si ⊨ ψ, for si = s, s1, . . . , sn−1 (maintenance goal is satisfied)

33 / 50

slide-95
SLIDE 95

Planning Program Realization (formally)

Definition

A planning program T is realizable in dynamic domain/environment D iff there is a plan-based simulation PLAN between the initial states of T and D.

  • Informally. (t, s) ∈PLAN means we can satisfy (with plans) all agent’s potential

requests from the current program state t when the dynamic domain is in state s.

  • Formally. A binary relation PLAN is a plan-based simulation relation iff:

(t, s) ∈ PLAN implies that for all possible requests t −(achieve φ while maintaining ψ)→ t′ there exists a plan a1, a2, . . . , an such that:

  • s −a1→ s1 −a2→ · · · −an−1→ sn−1 −an→ sn (plan is executable)
  • si ⊨ ψ, for si = s, s1, . . . , sn−1 (maintenance goal is satisfied)
  • sn ⊨ φ (achievement goal is satisfied)

33 / 50

slide-96
SLIDE 96

Planning Program Realization (formally)

Definition

A planning program T is realizable in dynamic domain/environment D iff there is a plan-based simulation PLAN between the initial states of T and D.

  • Informally. (t, s) ∈PLAN means we can satisfy (with plans) all agent’s potential

requests from the current program state t when the dynamic domain is in state s.

  • Formally. A binary relation PLAN is a plan-based simulation relation iff:

(t, s) ∈ PLAN implies that for all possible requests t −(achieve φ while maintaining ψ)→ t′ there exists a plan a1, a2, . . . , an such that:

  • s −a1→ s1 −a2→ · · · −an−1→ sn−1 −an→ sn (plan is executable)
  • si ⊨ ψ, for si = s, s1, . . . , sn−1 (maintenance goal is satisfied)
  • sn ⊨ φ (achievement goal is satisfied)
  • (t′, sn) ∈ PLAN (simulation holds in resulting state)
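On finite, fully enumerated systems, the greatest plan-based simulation can be computed by iteratively discarding pairs. The Python sketch below uses brute-force bounded reachability instead of a real planner, handles achievement goals only (maintenance goals would filter the reachability step), and all names and the toy instance are illustrative assumptions:

```python
from itertools import product

def plan_based_sim(prog_trans, dom_trans, prog_states, dom_states, horizon=4):
    """Greatest-fixpoint computation of the PLAN relation (achievement goals only)."""
    def reachable(s):
        # states reachable from s by some executable plan of length <= horizon
        seen, frontier = {s}, {s}
        for _ in range(horizon):
            frontier = {s2 for s1 in frontier for s2 in dom_trans.get(s1, set())}
            seen |= frontier
        return seen

    rel = set(product(prog_states, dom_states))
    changed = True
    while changed:            # drop (t, s) pairs until every request is servable
        changed = False
        for (t, s) in list(rel):
            ok = all(any(goal(s2) and (t2, s2) in rel for s2 in reachable(s))
                     for (goal, t2) in prog_trans.get(t, []))
            if not ok:
                rel.discard((t, s))
                changed = True
    return rel

# Toy instance: domain states home/pub connected both ways; program t0 <-> t1.
dom = {"home": {"pub"}, "pub": {"home"}}
prog = {"t0": [(lambda s: s == "pub", "t1")], "t1": [(lambda s: s == "home", "t0")]}
rel = plan_based_sim(prog, dom, {"t0", "t1"}, {"home", "pub"})
print(("t0", "home") in rel)  # -> True
```

The planning program is realizable exactly when the pair of initial states survives in the resulting relation.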

33 / 50

slide-97
SLIDE 97

Outline

General Motivation and Goal Brief Introduction to AI planning Planning Programs to Specify and Control Agents Behaviour Building Planning Programs: Solutions to the Realization Problem LTL Synthesis Planning-based approach Conclusions

34 / 50

slide-98
SLIDE 98

Planning Program Realization: How to Compute it?

Theorem (Complexity)

Checking whether an agent planning program is realizable in a planning domain from a given initial state is EXPTIME-complete.

(Diagram: the complexity hierarchy PTIME ⊆ NP ⊆ PSPACE ⊆ EXPTIME ⊆ EXPSPACE ⊆ · · · ⊆ 2EXPTIME, locating Agent Planning Programs alongside Classical Planning, Conformant Planning, and Non-deterministic Planning.)

35 / 50

slide-99
SLIDE 99

Planning Program Realization: How to Compute it?

Theorem (Complexity)

Checking whether an agent planning program is realizable in a planning domain from a given initial state is EXPTIME-complete. Two proposed approaches:

1 Reduction to LTL synthesis

[AAMAS’10]

  • Reduction to reactive synthesis for certain kinds of Linear-time Temporal Logic

(LTL) specifications based on model checking game structures.

  • Pros: Solvers available (TLV, NuGaT); handle non-determinism easily, yield

universal solutions.

  • Cons: Computationally challenging; scalability.

2 Planning-based approach

[ICAPS’11, AIJ’16]

  • Dedicated algorithm using automated planning to realise each transition.
  • Pros: can exploit fast planning technology and the program structure to

efficiently solve the problem

  • Cons: Mostly algorithms for deterministic domains; yield single solutions.

35 / 50

slide-100
SLIDE 100

Reduction to LTL Synthesis

LTL Synthesis [Pnueli & Rosner ’89]

Given a model E of the environment and an LTL formula specification φ, find a controller C such that E × C ⊨ φ.

  • E and C are generally automata.
  • φ a temporal formula: “always p”, “eventually q”, “always eventually p ∧ q”
  • E × C means the evolution of E when constrained by C.

36 / 50

slide-101
SLIDE 101

Reduction to LTL Synthesis

LTL Synthesis [Pnueli & Rosner ’89]

Given a model E of the environment and an LTL formula specification φ, find a controller C such that E × C ⊨ φ.

  • E and C are generally automata.
  • φ a temporal formula: “always p”, “eventually q”, “always eventually p ∧ q”
  • E × C means the evolution of E when constrained by C.

Agent Planning Programs via LTL synthesis [AAMAS’10, AIJ’16]

  • The environment/PDDL model is encoded into E.
  • Formula φ states that:
  • Always the current maintenance goal is true.
  • Eventually the current goal is achieved.
  • The controller C will deploy plans per goal-transition request.

36 / 50

slide-102
SLIDE 102

LTL synthesis is 2EXPTIME-complete!

Criticism: 2EXPTIME is a horrible complexity! Response:

  • 2EXPTIME is just the worst-case complexity.
  • There is a doubly exponential bound on the size of the smallest strategy,
  • so hand design cannot do better in the worst case... :-(

Criticism: the algorithms are not ready for practical implementation. Response:

  • Very true! More research is needed.
  • But good algorithms exist for special cases!
  • e.g., reachability, safety, GR(1)

37 / 50

slide-103
SLIDE 103

Agent Planning Programs via GR(1) Synthesis [AAMAS’10]

  • LTL realizability is 2EXPTIME-complete for general LTL formulas! :-(

(Notice that satisfiability or validity for LTL is PSPACE-complete)

  • But: Several interesting LTL patterns have been studied.

38 / 50

slide-104
SLIDE 104

Agent Planning Programs via GR(1) Synthesis [AAMAS’10]

  • LTL realizability is 2EXPTIME-complete for general LTL formulas! :-(

(Notice that satisfiability or validity for LTL is PSPACE-complete)

  • But: Several interesting LTL patterns have been studied.
  • “General Reactivity (1)” formulas shape: ϕass → ψreq
  • Synthesis can be reduced to µ-calculus model checking of a game structure!
  • Can exploit MC symbolic techniques (OBDD)!
  • Synthesis is polynomial in the size of the formula and the game structure.

38 / 50

slide-105
SLIDE 105

Agent Planning Programs via GR(1) Synthesis [AAMAS’10]

  • LTL realizability is 2EXPTIME-complete for general LTL formulas! :-(

(Notice that satisfiability or validity for LTL is PSPACE-complete)

  • But: Several interesting LTL patterns have been studied.
  • “General Reactivity (1)” formulas shape: ϕass → ψreq
  • Synthesis can be reduced to µ-calculus model checking of a game structure!
  • Can exploit MC symbolic techniques (OBDD)!
  • Synthesis is polynomial in the size of the formula and the game structure.

The Good News!

  • Solving agent planning programs can be done with GR(1) synthesis!
  • Can be solved by model checking game structures.
  • Is polynomial in the states of the planning domain.
  • Is EXPTIME in the representation as for model checking.
  • Can be practically implemented in model checking-based LTL synthesis such

as TLV, NuGAT, Anzu.
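As a flavour of the game-solving machinery behind such tools, a reachability game can be solved by the classic attractor (fixpoint) computation, the basic building block of GR(1)-style synthesis. The tiny game below is an illustrative assumption, not an encoding from the slides:

```python
def attractor(sys_moves, env_moves, targets):
    """States from which the system can force reaching `targets`,
    no matter how the environment moves (fixpoint computation)."""
    win = set(targets)
    changed = True
    while changed:
        changed = False
        for s, succs in sys_moves.items():
            # system node: winning if SOME move leads into the winning set
            if s not in win and any(s2 in win for s2 in succs):
                win.add(s); changed = True
        for s, succs in env_moves.items():
            # environment node: winning only if EVERY move stays winning
            if s not in win and succs and all(s2 in win for s2 in succs):
                win.add(s); changed = True
    return win

# System nodes choose an edge; environment nodes are adversarial.
sys_moves = {"a": {"e1"}, "b": {"goal"}}
env_moves = {"e1": {"b", "goal"}}
print(attractor(sys_moves, env_moves, {"goal"}))
```

Here both environment successors of `e1` end up winning, so the system wins from every node; symbolic tools compute the same fixpoint over OBDD-represented state sets.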

38 / 50

slide-106
SLIDE 106

Planning-based Approach (General Algorithm)

1 Open = set of joint (domain/program) states to process, initially the initial pair ⟨s0, v0⟩
2 Repeat
3   Select a pair ⟨s, v⟩ from Open
4   Foreach program transition d outgoing from v do
5     Construct a plan π achieving the goals of d from s
6     Update the realization function
7     Progress the program and world states, possibly generating a new joint state to process (pair added to Open)
8 Until Open is empty

Plans resulting in already generated domain states are preferred using soft goals
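The algorithm above can be sketched in Python. Here the planner call is a stub doing breadth-first search over an explicit transition relation (real implementations call a classical planner such as LAMA or LPG), there is no backtracking or soft-goal preference, and all names and the toy domain are illustrative assumptions:

```python
from collections import deque

def realize(s0, v0, prog_trans, dom_trans):
    """Build a realization table {(s, v, target_v): plan}, or None on failure."""
    def plan(s, goal):                       # planner stub: BFS to a goal state
        frontier, seen = deque([(s, ())]), {s}
        while frontier:
            cur, acts = frontier.popleft()
            if goal(cur):
                return list(acts), cur
            for a, nxt in dom_trans.get(cur, []):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, acts + (a,)))
        return None, None

    table, open_pairs, done = {}, [(s0, v0)], set()
    while open_pairs:                        # lines 2-8 of the slide's algorithm
        s, v = open_pairs.pop()
        if (s, v) in done:
            continue
        done.add((s, v))
        for goal, v2 in prog_trans.get(v, []):   # each outgoing program transition
            p, s2 = plan(s, goal)
            if p is None:
                return None                  # this sketch has no backtracking
            table[(s, v, v2)] = p            # update the realization function
            open_pairs.append((s2, v2))      # progress the joint state
    return table

# Toy instance: walk between home and pub; program alternates the two goals.
dom_trans = {"home": [("walk_pub", "pub")], "pub": [("walk_home", "home")]}
prog_trans = {"v0": [(lambda s: s == "pub", "v1")], "v1": [(lambda s: s == "home", "v0")]}
table = realize("home", "v0", prog_trans, dom_trans)
print(table)
```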

39 / 50

slide-107
SLIDE 107

A Planning-based Algorithm: Example (part 1 of 4)

Program: v0 —G1→ v1, v0 —G2→ v2, v1 —G0→ v0, v2 —G0→ v0
Goals: G0 = {(at P1 NY)}, G1 = {(at P1 WA)}, G2 = {(at P1 BO)}

Open = { ⟨s0, v0⟩ }   State(v0) = {s0}   State(v1) = {}   State(v2) = {}

Program realization function under construction (State | Transition | Plan):
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G2, v2⟩ | ?
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G1, v1⟩ | ?

Actions: a1: (fly A1 Bo NY), a2: (board P1 A1 NY), a3: (fly A1 NY Bo), a4: (debark P1 A1 Bo), a5: (fly A1 NY Wa), a6: (debark P1 A1 Wa), a7: (board P1 A1 Bo), a8: (debark P1 A1 NY), a9: (board P1 A1 Wa), a10: (fly A1 Wa NY)

40 / 50

slide-108
SLIDE 108

A Planning-based Algorithm: Example (part 1 of 4)

Program: v0 —G1→ v1, v0 —G2→ v2, v1 —G0→ v0, v2 —G0→ v0
Goals: G0 = {(at P1 NY)}, G1 = {(at P1 WA)}, G2 = {(at P1 BO)}

Open = { }   State(v0) = {s0}   State(v1) = {}   State(v2) = {}

Program realization function under construction (State | Transition | Plan):
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G2, v2⟩ | a1, a2, a3, a4
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G1, v1⟩ | a1, a2, a5, a6

Actions: a1: (fly A1 Bo NY), a2: (board P1 A1 NY), a3: (fly A1 NY Bo), a4: (debark P1 A1 Bo), a5: (fly A1 NY Wa), a6: (debark P1 A1 Wa), a7: (board P1 A1 Bo), a8: (debark P1 A1 NY), a9: (board P1 A1 Wa), a10: (fly A1 Wa NY)

Constructing plans for G1 and G2 from s0

40 / 50

slide-109
SLIDE 109

A Planning-based Algorithm: Example (part 1 of 4)

Program: v0 —G1→ v1, v0 —G2→ v2, v1 —G0→ v0, v2 —G0→ v0
Goals: G0 = {(at P1 NY)}, G1 = {(at P1 WA)}, G2 = {(at P1 BO)}

Open = { ⟨s1, v1⟩, ⟨s2, v2⟩ }   State(v0) = {s0}   State(v1) = {s1}   State(v2) = {s2}

Program realization function under construction (State | Transition | Plan):
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G2, v2⟩ | a1, a2, a3, a4
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G1, v1⟩ | a1, a2, a5, a6
  s1 = {(at P1 Bo), (at A1 Bo)} | ⟨v2, G0, v0⟩ | ?
  s2 = {(at P1 Wa), (at A1 Wa)} | ⟨v1, G0, v0⟩ | ?

Actions: a1: (fly A1 Bo NY), a2: (board P1 A1 NY), a3: (fly A1 NY Bo), a4: (debark P1 A1 Bo), a5: (fly A1 NY Wa), a6: (debark P1 A1 Wa), a7: (board P1 A1 Bo), a8: (debark P1 A1 NY), a9: (board P1 A1 Wa), a10: (fly A1 Wa NY)

The computed plans produce two new final states s1 for v1 and s2 for v2

40 / 50

slide-110
SLIDE 110

A Planning-based Algorithm: Example (part 2 of 4)

Program: v0 —G1→ v1, v0 —G2→ v2, v1 —G0→ v0, v2 —G0→ v0
Goals: G0 = {(at P1 NY)}, G1 = {(at P1 WA)}, G2 = {(at P1 BO)}

Open = { ⟨s1, v1⟩, ⟨s2, v2⟩ }   State(v0) = {s0}   State(v1) = {s1}   State(v2) = {s2}

Program realization function under construction (State | Transition | Plan):
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G2, v2⟩ | a1, a2, a3, a4
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G1, v1⟩ | a1, a2, a5, a6
  s1 = {(at P1 Bo), (at A1 Bo)} | ⟨v2, G0, v0⟩ | ?
  s2 = {(at P1 Wa), (at A1 Wa)} | ⟨v1, G0, v0⟩ | ?

Actions: a1: (fly A1 Bo NY), a2: (board P1 A1 NY), a3: (fly A1 NY Bo), a4: (debark P1 A1 Bo), a5: (fly A1 NY Wa), a6: (debark P1 A1 Wa), a7: (board P1 A1 Bo), a8: (debark P1 A1 NY), a9: (board P1 A1 Wa), a10: (fly A1 Wa NY)

41 / 50

slide-111
SLIDE 111

A Planning-based Algorithm: Example (part 2 of 4)

Program: v0 —G1→ v1, v0 —G2→ v2, v1 —G0→ v0, v2 —G0→ v0
Goals: G0 = {(at P1 NY)}, G1 = {(at P1 WA)}, G2 = {(at P1 BO)}

Open = { ⟨s2, v2⟩ }   State(v0) = {s0}   State(v1) = {s1}   State(v2) = {s2}

Program realization function under construction (State | Transition | Plan):
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G2, v2⟩ | a1, a2, a3, a4
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G1, v1⟩ | a1, a2, a5, a6
  s1 = {(at P1 Bo), (at A1 Bo)} | ⟨v2, G0, v0⟩ | a7, a1, a8
  s2 = {(at P1 Wa), (at A1 Wa)} | ⟨v1, G0, v0⟩ | ?

Actions: a1: (fly A1 Bo NY), a2: (board P1 A1 NY), a3: (fly A1 NY Bo), a4: (debark P1 A1 Bo), a5: (fly A1 NY Wa), a6: (debark P1 A1 Wa), a7: (board P1 A1 Bo), a8: (debark P1 A1 NY), a9: (board P1 A1 Wa), a10: (fly A1 Wa NY)

Constructing a plan for G0 preferring end state s0

41 / 50

slide-112
SLIDE 112

A Planning-based Algorithm: Example (part 2 of 4)

Program: v0 —G1→ v1, v0 —G2→ v2, v1 —G0→ v0, v2 —G0→ v0
Goals: G0 = {(at P1 NY)}, G1 = {(at P1 WA)}, G2 = {(at P1 BO)}

Open = { ⟨s2, v2⟩, ⟨s3, v0⟩ }   State(v0) = {s0, s3}   State(v1) = {s1}   State(v2) = {s2}

Program realization function under construction (State | Transition | Plan):
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G2, v2⟩ | a1, a2, a3, a4
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G1, v1⟩ | a1, a2, a5, a6
  s1 = {(at P1 Bo), (at A1 Bo)} | ⟨v2, G0, v0⟩ | a7, a1, a8
  s2 = {(at P1 Wa), (at A1 Wa)} | ⟨v1, G0, v0⟩ | ?
  s3 = {(at P1 NY), (at A1 NY)} | ⟨v0, G2, v2⟩ | ?
  s3 = {(at P1 NY), (at A1 NY)} | ⟨v0, G1, v1⟩ | ?

Actions: a1: (fly A1 Bo NY), a2: (board P1 A1 NY), a3: (fly A1 NY Bo), a4: (debark P1 A1 Bo), a5: (fly A1 NY Wa), a6: (debark P1 A1 Wa), a7: (board P1 A1 Bo), a8: (debark P1 A1 NY), a9: (board P1 A1 Wa), a10: (fly A1 Wa NY)

The computed plan produces a new final state s3 for v0

41 / 50

slide-113
SLIDE 113

A Planning-based Algorithm: Example (part 3 of 4)

Program: v0 —G1→ v1, v0 —G2→ v2, v1 —G0→ v0, v2 —G0→ v0
Goals: G0 = {(at P1 NY)}, G1 = {(at P1 WA)}, G2 = {(at P1 BO)}

Open = { ⟨s2, v2⟩, ⟨s3, v0⟩ }   State(v0) = {s0, s3}   State(v1) = {s1}   State(v2) = {s2}

Program realization function under construction (State | Transition | Plan):
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G2, v2⟩ | a1, a2, a3, a4
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G1, v1⟩ | a1, a2, a5, a6
  s1 = {(at P1 Bo), (at A1 Bo)} | ⟨v2, G0, v0⟩ | a7, a1, a8
  s2 = {(at P1 Wa), (at A1 Wa)} | ⟨v1, G0, v0⟩ | ?
  s3 = {(at P1 NY), (at A1 NY)} | ⟨v0, G2, v2⟩ | ?
  s3 = {(at P1 NY), (at A1 NY)} | ⟨v0, G1, v1⟩ | ?

Actions: a1: (fly A1 Bo NY), a2: (board P1 A1 NY), a3: (fly A1 NY Bo), a4: (debark P1 A1 Bo), a5: (fly A1 NY Wa), a6: (debark P1 A1 Wa), a7: (board P1 A1 Bo), a8: (debark P1 A1 NY), a9: (board P1 A1 Wa), a10: (fly A1 Wa NY)

42 / 50

slide-114
SLIDE 114

A Planning-based Algorithm: Example (part 3 of 4)

Program: v0 —G1→ v1, v0 —G2→ v2, v1 —G0→ v0, v2 —G0→ v0
Goals: G0 = {(at P1 NY)}, G1 = {(at P1 WA)}, G2 = {(at P1 BO)}

Open = { ⟨s3, v0⟩ }   State(v0) = {s0, s3}   State(v1) = {s1}   State(v2) = {s2}

Program realization function under construction (State | Transition | Plan):
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G2, v2⟩ | a1, a2, a3, a4
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G1, v1⟩ | a1, a2, a5, a6
  s1 = {(at P1 Bo), (at A1 Bo)} | ⟨v2, G0, v0⟩ | a7, a1, a8
  s2 = {(at P1 Wa), (at A1 Wa)} | ⟨v1, G0, v0⟩ | a9, a10, a8
  s3 = {(at P1 NY), (at A1 NY)} | ⟨v0, G2, v2⟩ | ?
  s3 = {(at P1 NY), (at A1 NY)} | ⟨v0, G1, v1⟩ | ?

Actions: a1: (fly A1 Bo NY), a2: (board P1 A1 NY), a3: (fly A1 NY Bo), a4: (debark P1 A1 Bo), a5: (fly A1 NY Wa), a6: (debark P1 A1 Wa), a7: (board P1 A1 Bo), a8: (debark P1 A1 NY), a9: (board P1 A1 Wa), a10: (fly A1 Wa NY)

Constructing a plan for G0 preferring end states s0 or s3

42 / 50

slide-115
SLIDE 115

A Planning-based Algorithm: Example (part 3 of 4)

Program: v0 —G1→ v1, v0 —G2→ v2, v1 —G0→ v0, v2 —G0→ v0
Goals: G0 = {(at P1 NY)}, G1 = {(at P1 WA)}, G2 = {(at P1 BO)}

Open = { ⟨s3, v0⟩ }   State(v0) = {s0, s3}   State(v1) = {s1}   State(v2) = {s2}

Program realization function under construction (State | Transition | Plan):
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G2, v2⟩ | a1, a2, a3, a4
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G1, v1⟩ | a1, a2, a5, a6
  s1 = {(at P1 Bo), (at A1 Bo)} | ⟨v2, G0, v0⟩ | a7, a1, a8
  s2 = {(at P1 Wa), (at A1 Wa)} | ⟨v1, G0, v0⟩ | a9, a10, a8
  s3 = {(at P1 NY), (at A1 NY)} | ⟨v0, G2, v2⟩ | ?
  s3 = {(at P1 NY), (at A1 NY)} | ⟨v0, G1, v1⟩ | ?

Actions: a1: (fly A1 Bo NY), a2: (board P1 A1 NY), a3: (fly A1 NY Bo), a4: (debark P1 A1 Bo), a5: (fly A1 NY Wa), a6: (debark P1 A1 Wa), a7: (board P1 A1 Bo), a8: (debark P1 A1 NY), a9: (board P1 A1 Wa), a10: (fly A1 Wa NY)

The computed plan produces the preferred final state s3 ∈ State(v0)

42 / 50

slide-116
SLIDE 116

A Planning-based Algorithm: Example (part 4 of 4)

Program: v0 —G1→ v1, v0 —G2→ v2, v1 —G0→ v0, v2 —G0→ v0
Goals: G0 = {(at P1 NY)}, G1 = {(at P1 WA)}, G2 = {(at P1 BO)}

Open = { ⟨s3, v0⟩ }   State(v0) = {s0, s3}   State(v1) = {s1}   State(v2) = {s2}

Program realization function under construction (State | Transition | Plan):
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G2, v2⟩ | a1, a2, a3, a4
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G1, v1⟩ | a1, a2, a5, a6
  s1 = {(at P1 Bo), (at A1 Bo)} | ⟨v2, G0, v0⟩ | a7, a1, a8
  s2 = {(at P1 Wa), (at A1 Wa)} | ⟨v1, G0, v0⟩ | a9, a10, a8
  s3 = {(at P1 NY), (at A1 NY)} | ⟨v0, G2, v2⟩ | ?
  s3 = {(at P1 NY), (at A1 NY)} | ⟨v0, G1, v1⟩ | ?

Actions: a1: (fly A1 Bo NY), a2: (board P1 A1 NY), a3: (fly A1 NY Bo), a4: (debark P1 A1 Bo), a5: (fly A1 NY Wa), a6: (debark P1 A1 Wa), a7: (board P1 A1 Bo), a8: (debark P1 A1 NY), a9: (board P1 A1 Wa), a10: (fly A1 Wa NY)

43 / 50

slide-117
SLIDE 117

A Planning-based Algorithm: Example (part 4 of 4)

Program: v0 —G1→ v1, v0 —G2→ v2, v1 —G0→ v0, v2 —G0→ v0
Goals: G0 = {(at P1 NY)}, G1 = {(at P1 WA)}, G2 = {(at P1 BO)}

Open = { }   State(v0) = {s0, s3}   State(v1) = {s1}   State(v2) = {s2}

Program realization function under construction (State | Transition | Plan):
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G2, v2⟩ | a1, a2, a3, a4
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G1, v1⟩ | a1, a2, a5, a6
  s1 = {(at P1 Bo), (at A1 Bo)} | ⟨v2, G0, v0⟩ | a7, a1, a8
  s2 = {(at P1 Wa), (at A1 Wa)} | ⟨v1, G0, v0⟩ | a9, a10, a8
  s3 = {(at P1 NY), (at A1 NY)} | ⟨v0, G2, v2⟩ | a2, a3, a4
  s3 = {(at P1 NY), (at A1 NY)} | ⟨v0, G1, v1⟩ | a2, a5, a6

Actions: a1: (fly A1 Bo NY), a2: (board P1 A1 NY), a3: (fly A1 NY Bo), a4: (debark P1 A1 Bo), a5: (fly A1 NY Wa), a6: (debark P1 A1 Wa), a7: (board P1 A1 Bo), a8: (debark P1 A1 NY), a9: (board P1 A1 Wa), a10: (fly A1 Wa NY)

Constructing plans for G1 and G2 preferring end states s1 and s2

43 / 50

slide-118
SLIDE 118

A Planning-based Algorithm: Example (part 4 of 4)

Program: v0 —G1→ v1, v0 —G2→ v2, v1 —G0→ v0, v2 —G0→ v0
Goals: G0 = {(at P1 NY)}, G1 = {(at P1 WA)}, G2 = {(at P1 BO)}

Open = { }   State(v0) = {s0, s3}   State(v1) = {s1}   State(v2) = {s2}

Program realization function under construction (State | Transition | Plan):
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G2, v2⟩ | a1, a2, a3, a4
  s0 = {(at P1 NY), (at A1 Bo)} | ⟨v0, G1, v1⟩ | a1, a2, a5, a6
  s1 = {(at P1 Bo), (at A1 Bo)} | ⟨v2, G0, v0⟩ | a7, a1, a8
  s2 = {(at P1 Wa), (at A1 Wa)} | ⟨v1, G0, v0⟩ | a9, a10, a8
  s3 = {(at P1 NY), (at A1 NY)} | ⟨v0, G2, v2⟩ | a2, a3, a4
  s3 = {(at P1 NY), (at A1 NY)} | ⟨v0, G1, v1⟩ | a2, a5, a6

Actions: a1: (fly A1 Bo NY), a2: (board P1 A1 NY), a3: (fly A1 NY Bo), a4: (debark P1 A1 Bo), a5: (fly A1 NY Wa), a6: (debark P1 A1 Wa), a7: (board P1 A1 Bo), a8: (debark P1 A1 NY), a9: (board P1 A1 Wa), a10: (fly A1 Wa NY)

The constructed plans produce the preferred final states s1 and s2 that are already in State(v1) and State(v2), respectively

43 / 50

slide-119
SLIDE 119

A Planning-based Algorithm: Backtracking

Planning for a transition can fail (the problem is too hard, or no plan exists). If, for a pair ⟨s, v⟩ and a transition outgoing from v, no realizing plan can be computed from s, then:

44 / 50

slide-120
SLIDE 120

A Planning-based Algorithm: Backtracking

Planning for a transition can fail (the problem is too hard, or no plan exists). If, for a pair ⟨s, v⟩ and a transition outgoing from v, no realizing plan can be computed from s, then:

1 State s becomes forbidden as the resulting state of any plan π realizing any transition t into v (tabu states).

2 The plan π for t is regenerated, producing a new state s′ ≠ s for v (backtracking).
3 The current realization function and the list Open are updated accordingly.
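The tabu-state idea can be illustrated with a planner that refuses forbidden resulting states. This is a sketch under assumed names and a toy domain, not the paper's algorithm:

```python
from collections import deque

def plan_avoiding(s, goal, dom_trans, tabu):
    """BFS planner that never ends a plan in a state currently marked tabu."""
    frontier, seen = deque([(s, ())]), {s}
    while frontier:
        cur, acts = frontier.popleft()
        if goal(cur) and cur not in tabu:     # skip forbidden resulting states
            return list(acts), cur
        for a, nxt in dom_trans.get(cur, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, acts + (a,)))
    return None, None

# Two goal states; when g1 becomes tabu, regeneration yields a plan to g2.
dom = {"s0": [("a", "g1"), ("b", "g2")]}
goal = lambda st: st in {"g1", "g2"}
print(plan_avoiding("s0", goal, dom, {"g1"}))  # -> (['b'], 'g2')
```

When the first plan's end state later proves unworkable, it is added to `tabu` and the plan is recomputed, mirroring the backtracking step above.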

44 / 50

slide-121
SLIDE 121

Implementation and Experiments

Approaches implemented and tested with 6 different program structures, 8 environments, and 3 incorporated planners (LAMA, LPG, HPlan-P). Overall, more than 1200 planning programs (each with a 1000-second CPU-time limit).

General observations

45 / 50

slide-122
SLIDE 122

Implementation and Experiments

Approaches implemented and tested with 6 different program structures, 8 environments, and 3 incorporated planners (LAMA, LPG, HPlan-P). Overall, more than 1200 planning programs (each with a 1000-second CPU-time limit).

General observations

1 The more general approach via reactive synthesis is much slower, as

expected.

  • Used TLV and NuGAT (based on NuSMV model checker).

2 The planning-based approach performs generally well (CPU time and space). 3 For program structures with cycles:

  • When the same program transition requires multiple plans, plan adaptation is

much faster than generation from scratch.

  • Planning with preferred goal states is very useful.

4 Still poor performance when the realization algorithm runs into many

dead-ends (backtracks).

45 / 50

slide-123
SLIDE 123

Examples of Experimental Results (ZenoTravel Domain)

(Plots: results for a sequence of binary chains and for a random sparse graph.)

46 / 50

slide-124
SLIDE 124

Global Dead-end Reasoning [IJCAI’17]

Program over states t0, t1, t2, with goal-labelled transitions:
  G1: achieve MyLoc(dept) while maintaining ¬Fuel(empty)
  G2: achieve MyLoc(home) ∧ CarLoc(home) while maintaining ¬Fuel(empty)
  G3: achieve MyLoc(pub) while maintaining ¬Fuel(empty)
  G4: achieve MyLoc(pub)
  G5: achieve MyLoc(home) while maintaining ¬Driven

Question: What would happen if while planning to solve goal G1, we “hit” a state s that is a dead-end for goal G3? What if s is a dead-end for goal G5?

47 / 50

slide-125
SLIDE 125

Global Dead-end Reasoning [IJCAI’17]

Program over states t0, t1, t2, with goal-labelled transitions:
  G1: achieve MyLoc(dept) while maintaining ¬Fuel(empty)
  G2: achieve MyLoc(home) ∧ CarLoc(home) while maintaining ¬Fuel(empty)
  G3: achieve MyLoc(pub) while maintaining ¬Fuel(empty)
  G4: achieve MyLoc(pub)
  G5: achieve MyLoc(home) while maintaining ¬Driven

Question: What would happen if while planning to solve goal G1, we “hit” a state s that is a dead-end for goal G3? What if s is a dead-end for goal G5? Answer: We can just prune that path for goal G1!

47 / 50

slide-126
SLIDE 126

Global Dead-end Reasoning [IJCAI’17]

Program over states t0, t1, t2, with goal-labelled transitions:
  G1: achieve MyLoc(dept) while maintaining ¬Fuel(empty)
  G2: achieve MyLoc(home) ∧ CarLoc(home) while maintaining ¬Fuel(empty)
  G3: achieve MyLoc(pub) while maintaining ¬Fuel(empty)
  G4: achieve MyLoc(pub)
  G5: achieve MyLoc(home) while maintaining ¬Driven

Question: What would happen if, while planning to solve goal G1, we “hit” a state s that is a dead-end for goal G3? What if s is a dead-end for goal G5? Answer: We can just prune that path for goal G1! In the IJCAI’17 paper (with Lukáš Chrpa & Nir Lipovetzky):

1 Incorporate future dead-ends (“traps”) into each planning transition. 2 Propose to execute APPs online. 3 Evaluate, empirically, the impact in offline and online versions of APPs!
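The pruning idea can be sketched as a brute-force check: while searching for the current goal, discard any state from which some future goal of the program is unreachable. Real work uses "traps" rather than explicit reachability; names and the one-way toy domain are illustrative assumptions:

```python
def reachable_states(s, dom_trans):
    """All states reachable from s by any action sequence (DFS)."""
    seen, stack = {s}, [s]
    while stack:
        cur = stack.pop()
        for _, nxt in dom_trans.get(cur, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def is_global_dead_end(s, future_goals, dom_trans):
    """True iff some future goal can never be achieved from s."""
    reach = reachable_states(s, dom_trans)
    return any(not any(g(s2) for s2 in reach) for g in future_goals)

dom = {"home": [("drive", "pub")]}        # one-way trip: no way back home
goals = [lambda st: st == "home"]         # a future goal: be home again
print(is_global_dead_end("pub", goals, dom))   # -> True: prune paths through pub
print(is_global_dead_end("home", goals, dom))  # -> False
```

States flagged this way are safe to prune even while planning for a different, currently-solvable goal.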

47 / 50

slide-127
SLIDE 127

Outline

General Motivation and Goal Brief Introduction to AI planning Planning Programs to Specify and Control Agents Behaviour Building Planning Programs: Solutions to the Realization Problem LTL Synthesis Planning-based approach Conclusions

48 / 50

slide-128
SLIDE 128

Conclusions

1 A novel paradigm for programming task-oriented behaviours of agents:

  • Mix of flexible declarative (tasks as goals) and high-level procedural

(“know-how” goal networks) specifications

  • Supports external (run time) interactions/decisions.
  • Supports continuous execution and control.

2 Two computational techniques: reactive synthesis & automated planning. 3 High worst-case computational complexity, but works well using automated

planning techniques (deterministic domains).

4 Future work includes:

  • A planning-based algorithm for non-deterministic domains.
  • Handling planning domains with many dead-ends.
  • Dealing with the realization quality.
  • Dynamic planning programs.
  • ... yours?

49 / 50

slide-129
SLIDE 129

Thanks

Thank you for your attention!

. . . and thanks to my collaborators: Giuseppe De Giacomo Fabio Patrizi Alessandro Saetti Alfonso Gerevini

50 / 50

slide-130
SLIDE 130

Main References

Giuseppe De Giacomo, Alfonso Gerevini, Fabio Patrizi, Alessandro Saetti, and Sebastian Sardina. Agent planning programs. Artificial Intelligence, 231:64–106, 2016.
Giuseppe De Giacomo, Fabio Patrizi, and Sebastian Sardina. Agent programming via planning programs. In Proceedings of the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), pages 491–498, 2010.
Alfonso Gerevini, Fabio Patrizi, and Alessandro Saetti. An effective approach to realizing planning programs. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), pages 323–326, 2011.
Giuseppe De Giacomo, Fabio Patrizi, and Sebastian Sardina. Automatic behavior composition synthesis. Artificial Intelligence, 196:106–142, 2013.
Lukáš Chrpa, Nir Lipovetzky, and Sebastian Sardina. Handling non-local dead-ends in agent planning programs. In Proceedings of IJCAI, to appear, 2017.

51 / 50

slide-131
SLIDE 131

Reduction to LTL Synthesis

LTL Synthesis [Pnueli & Rosner ’89]

Given a model E of the environment and an LTL formula specification φ, find a controller C such that E × C ⊨ φ.

  • E and C are generally automata.
  • φ a temporal formula: “always p”, “eventually q”, “always eventually p ∧ q”
  • E × C means the evolution of E when constrained by C.

Agent Planning Programs via LTL synthesis [AAMAS’10, AIJ’16]

  • The environment/PDDL model is encoded into E.
  • Formula φ states that:
  • Always the current maintenance goal is true.
  • Eventually the current goal is achieved.
  • The controller C will deploy plans per goal-transition request.

52 / 50

slide-132
SLIDE 132

Agent Planning Programs via GR(1) Synthesis [AAMAS’10]

  • LTL realizability is 2EXPTIME-complete for general LTL formulas! :-(

(Notice that satisfiability or validity for LTL is PSPACE-complete)

  • But: Several interesting LTL patterns have been studied.

53 / 50

slide-133
SLIDE 133

Agent Planning Programs via GR(1) Synthesis [AAMAS’10]

  • LTL realizability is 2EXPTIME-complete for general LTL formulas! :-(

(Notice that satisfiability or validity for LTL is PSPACE-complete)

  • But: Several interesting LTL patterns have been studied.
  • “General Reactivity (1)” formulas shape: ϕass → ψreq
  • Synthesis can be reduced to µ-calculus model checking of a game structure!
  • Can exploit MC symbolic techniques (OBDD)!
  • Synthesis is polynomial in the size of the formula and the game structure.

53 / 50

slide-134
SLIDE 134

Agent Planning Programs via GR(1) Synthesis [AAMAS’10]

  • LTL realizability is 2EXPTIME-complete for general LTL formulas! :-(

(Notice that satisfiability and validity for LTL are PSPACE-complete.)

  • But: several interesting LTL patterns have been studied.
      • “General Reactivity (1)” formulas have the shape ϕass → ψreq.
      • Synthesis can be reduced to µ-calculus model checking of a game structure!
      • Can exploit symbolic model-checking (MC) techniques (OBDDs)!
      • Synthesis is polynomial in the size of the formula and the game structure.

The Good News!

  • Solving agent planning programs can be done with GR(1) synthesis!
      • Can be solved by model checking game structures.
      • Is polynomial in the number of states of the planning domain.
      • Is EXPTIME in the size of the representation, as for model checking.
      • Can be practically implemented in model checking-based LTL synthesis tools
        such as TLV, NuGAT, and Anzu.

53 / 50

slide-137
SLIDE 137

Reduction to LTL Synthesis [Pnueli & Rosner ’89]

Propositional variables:

  • PE: environment variables
  • PS: system variables

Game:

  • Environment: chooses from 2^PE
  • System: chooses from 2^PS

Infinite play:

  • pe0, pe1, pe2, . . .
  • ps0, ps1, ps2, . . .

Infinite behavior: pe0 ∪ ps0, pe1 ∪ ps1, pe2 ∪ ps2, . . .
Specification: LTL formula over PE ∪ PS
Win: behavior ⊨ specification
Strategy: function f : (2^PE)∗ → 2^PS

54 / 50
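The game above can be rendered as a tiny executable sketch. All variable names here are illustrative (not from the slides): the environment picks a subset of PE, the system replies with a subset of PS, and a strategy f maps the history of environment moves in (2^PE)∗ to a system move in 2^PS.

```python
# Toy rendering of the Pnueli-Rosner synthesis game.

PE = {"request"}  # environment variables (hypothetical)
PS = {"grant"}    # system variables (hypothetical)

def strategy(history):
    """f : (2^PE)* -> 2^PS; grant exactly when the latest move requested."""
    return {"grant"} if "request" in history[-1] else set()

def play(env_moves, f):
    """Finite prefix of the behavior pe0 ∪ ps0, pe1 ∪ ps1, ..."""
    history, behavior = [], []
    for pe in env_moves:
        history.append(pe)
        behavior.append(pe | f(history))  # state i is pe_i ∪ ps_i
    return behavior

beh = play([{"request"}, set(), {"request"}], strategy)
# The specification "always (request -> grant)" holds on this prefix:
assert all("grant" in state for state in beh if "request" in state)
```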

slide-141
SLIDE 141

Encoding in LTL

LTL formula Φ to be realized/synthesized:

Init ∧ □(TransD ∧ TransT) → □Fulfill ∧ □♦Last

(□♦Last: finish plans infinitely often)

1 Dynamic domain (TransD):

MyLoc(x) ∧ Close(x, y) ∧ walk(y) → ○MyLoc(y)
CarLoc(x) ∧ walk(y) → ○CarLoc(x)

(dynamics of walking)

2 Planning program (TransT):

t ∧ “achieve φ while maintaining ψ” ∧ ¬Last → ○(t ∧ “achieve φ while maintaining ψ”)

(target request propagation)

t ∧ “achieve φ while maintaining ψ” ∧ Last → ○t′

(target advance)

3 Fulfillment of goals (Fulfill):

“achieve φ while maintaining ψ” ∧ Last → φ

(achievement goal is satisfied)

“achieve φ while maintaining ψ” → ψ

(maintenance goal is respected)

55 / 50
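The overall shape of Φ can be assembled mechanically from its pieces. A minimal Python sketch, where G and F stand for □ and ♦ (the concrete string syntax is illustrative, not a specific tool's input language):

```python
def gr1(assumptions, guarantees):
    """Assemble the GR(1)-shaped implication phi_ass -> psi_req as a string."""
    return "({}) -> ({})".format(" & ".join(assumptions),
                                 " & ".join(guarantees))

phi = gr1(
    assumptions=["Init", "G (TransD & TransT)"],
    guarantees=["G Fulfill", "G F Last"],  # G F Last: finish plans infinitely often
)
print(phi)
# (Init & G (TransD & TransT)) -> (G Fulfill & G F Last)
```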

slide-142
SLIDE 142

LTL synthesis is 2EXPTIME-complete!

Criticism: 2EXPTIME is a horrible complexity!
Response:

  • 2EXPTIME is just the worst-case complexity.
  • The doubly exponential bound is also on the size of the smallest strategy;
  • thus, hand design cannot do better in the worst case... :-(

Criticism: the algorithms are not ready for practical implementation.
Response:

  • Very true! More research is needed.
  • But good algorithms exist for special cases!

56 / 50

slide-145
SLIDE 145

GR(1) Formulas [Piterman, Pnueli, Sa’ar 2006]

  • LTL realizability is 2EXPTIME-complete for general LTL formulas!

(Notice that satisfiability or validity for LTL is PSPACE-complete)

  • Several interesting LTL patterns have been studied.
  • “General Reactivity (1)” formulas: ϕass → ψreq, of a special syntactic shape.

Here ϕass = Init ∧ □(TransD ∧ TransT) and ψreq = □Fulfill ∧ □♦Last.
Variables to control: {a | a is a domain action} ∪ {Last}

  • Good news:
      • Synthesis can be reduced to µ-calculus model checking of a game structure!
      • Can exploit symbolic model-checking (MC) techniques (OBDDs)!
      • Realizability is polynomial in the size of the formula and the game structure.

57 / 50
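The model-checking reduction can be sketched at its core. Below is an illustrative Python fixpoint computation (not the actual TLV/NuGAT/Anzu algorithm, and on an explicit rather than symbolic game graph): the set of states from which the system can force reaching a target T is the least fixpoint µZ. T ∪ CPre(Z), the building block that the GR(1) algorithm iterates.

```python
# Attractor computation on an explicit turn-based game graph.

def cpre(Z, edges, system_states):
    """Controllable predecessors: a system state needs SOME successor
    in Z; an environment state needs ALL successors in Z."""
    good = set()
    for s, succs in edges.items():
        if s in system_states:
            if any(t in Z for t in succs):
                good.add(s)
        elif succs and all(t in Z for t in succs):
            good.add(s)
    return good

def attractor(target, edges, system_states):
    """Least fixpoint  mu Z. target ∪ CPre(Z)."""
    Z = set(target)
    while True:
        Znew = Z | cpre(Z, edges, system_states)
        if Znew == Z:
            return Z
        Z = Znew

# Hypothetical 4-state game: the system owns {0, 2}, the environment {1, 3}.
edges = {0: {1}, 1: {2, 3}, 2: {2}, 3: {2}}
win = attractor({2}, edges, {0, 2})
assert win == {0, 1, 2, 3}  # in this toy game every state is winning
```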

slide-146
SLIDE 146

Results

  • Agent planning programs: programming with declarative goals.
  • Can be reduced to LTL synthesis of Generalized Reactivity GR(1) formulas.
      • Can be solved by model checking game structures.
      • Is polynomial in the number of states of the planning domain.
      • Is EXPTIME in the size of the representation, as for model checking.
      • Can be practically implemented in model checking-based LTL synthesis tools
        such as TLV, NuGAT, and Anzu.

58 / 50