SLIDE 1

Background Learning Shortcut Rules Empirical Evaluation

Optimal Planning and Shortcut Learning: An Unfulfilled Promise

Erez Karpas Carmel Domshlak

Faculty of Industrial Engineering and Management, Technion — Israel Institute of Technology

May 28, 2013

SLIDE 2

Outline

1. Background
2. Learning Shortcut Rules
3. Empirical Evaluation

SLIDE 3

STRIPS

A STRIPS planning problem with action costs is a 5-tuple Π = ⟨P, s0, G, A, C⟩, where:

- P is a set of boolean propositions
- s0 ⊆ P is the initial state
- G ⊆ P is the goal
- A is a set of actions; each action is a triple a = ⟨pre(a), add(a), del(a)⟩
- C : A → R0+ assigns a cost to each action

Applying action sequence ρ = a0, a1, …, an at state s leads to s[[ρ]].
The cost of action sequence ρ is ∑_{i=0}^{n} C(ai).
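These semantics can be sketched in Python; this is an illustrative set-based encoding, not the authors' implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset    # preconditions
    add: frozenset    # add effects
    dele: frozenset   # delete effects ('del' is a Python keyword)
    cost: float = 1.0

def apply_sequence(s0, rho):
    """Apply the action sequence rho at state s0, returning s0[[rho]].
    Raises ValueError if some action's preconditions are unmet."""
    s = frozenset(s0)
    for a in rho:
        if not a.pre <= s:
            raise ValueError(f"{a.name} is inapplicable")
        s = (s - a.dele) | a.add
    return s

def cost(rho):
    """Cost of an action sequence: the sum of C(a_i) over its actions."""
    return sum(a.cost for a in rho)

def drive(t, x, y):
    """Unit-cost drive action moving truck t from location x to y."""
    return Action(f"drive-{t}-{x}-{y}",
                  pre=frozenset({f"{t}-at-{x}"}),
                  add=frozenset({f"{t}-at-{y}"}),
                  dele=frozenset({f"{t}-at-{x}"}))

s0 = {"t1-at-A", "t2-at-A"}
rho = [drive("t1", "A", "B"), drive("t2", "A", "B")]
print(apply_sequence(s0, rho))  # frozenset({'t1-at-B', 't2-at-B'})
print(cost(rho))                # 2.0
```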

SLIDE 6

Intended Effects

Chicken logic: Why did the chicken cross the road? To get to the other side.

Observation: Every action along an optimal plan is there for a reason:
- to achieve a precondition for another action, or
- to achieve a goal.

SLIDE 8

Intended Effects — Example

[Figure: trucks t1 and t2 and object o at location A; after load-o-t1, o is in t1]

If load-o-t1 is the beginning of an optimal plan, then:
- there must be a reason for applying load-o-t1;
- load-o-t1 achieves o-in-t1;
- any continuation of this path to an optimal plan must use some action which requires o-in-t1.

SLIDE 13

Intended Effects — Formal Definition

Given a path π = a0, a1, …, an, a set of propositions X ⊆ s0[[π]] is an intended effect of π iff there exists a path π′ such that π · π′ is an optimal plan and π′ consumes exactly X, i.e., p ∈ X iff there is a causal link ⟨ai, p, aj⟩ in π · π′ with ai ∈ π and aj ∈ π′.

IE(π) denotes the set of all intended effects of π.
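For a fixed sequential plan, the producer in a causal link ⟨ai, p, aj⟩ is the most recent achiever of precondition p of aj. A sketch that computes the set X consumed by a given suffix (names and encoding illustrative, reusing a minimal action representation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset
    add: frozenset
    dele: frozenset

def consumed_facts(s0, plan, k):
    """Set X consumed by the suffix plan[k:] from the prefix plan[:k]:
    p is in X iff there is a causal link <a_i, p, a_j> with i < k <= j,
    i.e. a_i (in the prefix) is the last achiever of precondition p of a_j."""
    X = set()
    producer = {p: None for p in s0}   # None: provided by the initial state
    for j, a in enumerate(plan):
        for p in a.pre:
            i = producer.get(p)
            if i is not None and i < k <= j:
                X.add(p)
        for p in a.dele:
            producer.pop(p, None)       # fact no longer holds
        for p in a.add:
            producer[p] = j             # a_j is now the last achiever
    return X

def drive(t, x, y):
    return Action(f"drive-{t}-{x}-{y}",
                  frozenset({f"{t}-at-{x}"}),
                  frozenset({f"{t}-at-{y}"}),
                  frozenset({f"{t}-at-{x}"}))

plan = [drive("t1", "A", "B"), drive("t1", "B", "C")]
print(consumed_facts({"t1-at-A"}, plan, 1))  # {'t1-at-B'}
```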

SLIDE 14

Intended Effects: Complexity

Hard to find exactly: it is PSPACE-hard to find the intended effects of a path π.

Sound approximation: we can use supersets of IE(π) to derive constraints about any continuation of π.

SLIDE 18

Shortcuts and Approximate Intended Effects

Intuition: X cannot be an intended effect of π if there is a cheaper way to achieve X.

[Diagram: π leads from s0 to s; π′ leads from s0 to s′, with C(π′) < C(π)]

Any continuation of π into an optimal plan must use some fact in s \ s′.
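The resulting constraint is a disjunction over s \ s′. A minimal sketch using the two-trucks example (function name, facts, and costs are illustrative):

```python
def shortcut_constraint(s, cost_pi, s_shortcut, cost_shortcut):
    """If a shortcut reaches s_shortcut more cheaply than the current path
    reaches s, any optimal continuation of the current path must consume
    some fact in s \\ s_shortcut (a disjunctive landmark).
    Returns that fact set, or None if the shortcut yields no constraint."""
    if cost_shortcut < cost_pi:
        return set(s) - set(s_shortcut)
    return None

# Unit-cost drive actions:
s  = {"t1-at-B", "t2-at-B"}   # after drive-t1-A-B, drive-t2-A-B (cost 2)
s1 = {"t1-at-A", "t2-at-B"}   # after the shortcut drive-t2-A-B (cost 1)
print(shortcut_constraint(s, 2, s1, 1))  # {'t1-at-B'}: we must use t1-at-B
```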

SLIDE 26

Shortcuts and Approximate Intended Effects: Example

[Figure: trucks t1 and t2 at location A; locations A and B]

π = drive-t1-A-B, drive-t2-A-B

π′ = drive-t2-A-B: t2-at-B cannot be an intended effect of π — we must use t1-at-B.
π′′ = drive-t1-A-B: t1-at-B cannot be an intended effect of π — we must use t2-at-B.

We must use both t1-at-B and t2-at-B.

SLIDE 27

Finding Shortcuts

Where do the shortcuts come from? They can be dynamically generated for each path.

Our previous paper used the causal structure of the current path — a graph whose nodes are action occurrences, with an edge from ai to aj if there is a causal link in which ai provides some proposition for aj.

Previous shortcut rules attempted to remove some actions, according to the causal structure, to obtain a shortcut.

SLIDE 33

Shortcuts Example

[Causal structure diagram over locations A, B, C and trucks t1, t2]

π = drive-t1-A-B, drive-t2-A-B, drive-t1-B-C, drive-t1-C-A
π′ = drive-t2-A-B

SLIDE 36

Shortcut Rules that Add Actions

The previous shortcut rules only remove actions from π.

[Figure: locations A, B, C; truck t1]

π = drive-t1-A-B, drive-t1-B-C

The previous shortcut rules cannot generate the shortcut π′ = drive-t1-A-C.


SLIDE 38

Outline

1. Background
2. Learning Shortcut Rules
3. Empirical Evaluation

SLIDE 39

Learning Shortcut Rules that Add Actions

We developed techniques for learning shortcut rules that add new actions.

Spoiler: these shortcut rules do not improve performance.

SLIDE 40

When to Learn

Recall that a good shortcut π′ achieves almost everything the original path π did. In the extreme, π′ achieves everything π achieved.

The search algorithm detects when this happens — when a new path to an existing state is found. We learn whenever we have two paths reaching the same state, regardless of whether the new path is cheaper or not.

SLIDE 41

How to Learn

The input to our learning algorithm is two paths reaching the same state. Instead of looking at the state as a whole, we look at individual facts, and the causal structure leading to each fact.

[Diagram: causal structures of π = drive-t1-A-B, drive-t2-A-B, drive-t1-B-C and of π′ = drive-t1-A-C, drive-t2-A-B; both reach the facts t1-at-C and t2-at-B]


SLIDE 44

Shortcut Rules

From the pair of partial paths reaching each fact, we learn a new shortcut rule. Shortcut rules are used when a path π is evaluated, as follows:

1. Each shortcut rule is checked for applicability.
2. If it is applicable, a set of shortcuts is generated.
3. From each such shortcut, an ∃-opt landmark is derived.

There are three types of shortcut rules, which differ in the details of these steps.

SLIDE 52

Concrete Shortcut Rule

Rule: drive-t1-A-B, drive-t1-B-C ← drive-t1-A-C

Example 1:
π = drive-t2-C-A, drive-t1-A-B, drive-t1-B-C, drive-t2-A-B
The rule applies, yielding the shortcut drive-t2-C-A, drive-t1-A-C, drive-t2-A-B.

Example 2:
π = drive-t2-C-A, drive-t1-A-B, drive-t2-A-B, drive-t1-B-C
Not applicable: the two left-hand actions do not occur consecutively.
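The slides leave the matching condition implicit, but the "Not applicable" verdict in Example 2 suggests a concrete rule fires only when its left-hand side occurs as a consecutive, ordered subsequence of π. A minimal sketch under that assumption (function name illustrative):

```python
def apply_concrete_rule(path, lhs, rhs):
    """Concrete rule lhs <- rhs: if lhs occurs as a consecutive subsequence
    of path (in order), replace it with rhs. Returns the list of shortcuts
    (one per occurrence); an empty list means the rule is not applicable."""
    shortcuts = []
    n, m = len(path), len(lhs)
    for i in range(n - m + 1):
        if path[i:i + m] == lhs:
            shortcuts.append(path[:i] + rhs + path[i + m:])
    return shortcuts

rule_lhs = ["drive-t1-A-B", "drive-t1-B-C"]
rule_rhs = ["drive-t1-A-C"]

pi1 = ["drive-t2-C-A", "drive-t1-A-B", "drive-t1-B-C", "drive-t2-A-B"]
print(apply_concrete_rule(pi1, rule_lhs, rule_rhs))
# [['drive-t2-C-A', 'drive-t1-A-C', 'drive-t2-A-B']]

pi2 = ["drive-t2-C-A", "drive-t1-A-B", "drive-t2-A-B", "drive-t1-B-C"]
print(apply_concrete_rule(pi2, rule_lhs, rule_rhs))  # [] -- not applicable
```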

SLIDE 53

Unordered Shortcut Rule

Rule: {drive-t1-A-B, drive-t1-B-C} ← {drive-t1-A-C}

Example 1:
π = drive-t2-C-A, drive-t1-A-B, drive-t1-B-C, drive-t2-A-B
Shortcut: drive-t2-C-A, drive-t1-A-C, drive-t2-A-B

Example 2:
π = drive-t2-C-A, drive-t1-A-B, drive-t2-A-B, drive-t1-B-C
Shortcut: drive-t2-C-A, drive-t1-A-C, drive-t2-A-B (the rule now applies, since order is ignored)

SLIDE 54

Generalized Shortcut Rule

Rule: {drive-T-X-Y, drive-T-Y-Z} ← {drive-T-X-Z}

Example:
π = drive-t1-B-C, drive-t2-A-B, drive-t1-C-A
Shortcut: drive-t1-B-A, drive-t2-A-B (binding T = t1, X = B, Y = C, Z = A)
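A generalized rule matches modulo variable binding and order. A sketch, assuming actions are tokenized into tuples and the rule's variables are the tokens T, X, Y, Z (all names illustrative; the authors' matcher is not specified at this level of detail):

```python
from itertools import permutations

VARS = {"T", "X", "Y", "Z"}   # rule variables; every other token is a constant

def unify(pattern, action, binding):
    """Extend binding so that pattern matches the ground action;
    return the extended binding, or None on mismatch."""
    b = dict(binding)
    for p, a in zip(pattern, action):
        if p in VARS:
            if b.setdefault(p, a) != a:   # variable bound inconsistently
                return None
        elif p != a:                      # constant mismatch
            return None
    return b if len(pattern) == len(action) else None

def apply_generalized_rule(path, lhs, rhs):
    """Generalized rule lhs <- rhs over tokenized actions, e.g.
    ('drive', 'T', 'X', 'Y'). Matches lhs against any |lhs| actions of
    the path in any order; each consistent binding yields one shortcut."""
    shortcuts = []
    idx = range(len(path))
    for combo in permutations(idx, len(lhs)):
        b = {}
        for pat, i in zip(lhs, combo):
            b = unify(pat, path[i], b)
            if b is None:
                break
        if b is None:
            continue
        replaced = [tuple(b.get(t, t) for t in pat) for pat in rhs]
        rest = [path[i] for i in idx if i not in combo]
        shortcuts.append(replaced + rest)
    return shortcuts

lhs = [("drive", "T", "X", "Y"), ("drive", "T", "Y", "Z")]
rhs = [("drive", "T", "X", "Z")]
pi = [("drive", "t1", "B", "C"), ("drive", "t2", "A", "B"),
      ("drive", "t1", "C", "A")]
print(apply_generalized_rule(pi, lhs, rhs))
# [[('drive', 't1', 'B', 'A'), ('drive', 't2', 'A', 'B')]]
```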

SLIDE 55

The Utility Problem

Checking whether a shortcut rule is applicable takes time, and often the applicability check says the rule is not applicable. This is the well-known utility problem. We address it by keeping counts of how many times each rule was checked for applicability and how many times it was in fact applicable; low-utility shortcut rules are discarded.
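The counting scheme can be sketched as follows; the threshold and minimum sample size here are illustrative, since the slides do not give the actual discard criterion:

```python
class ShortcutRuleStats:
    """Track rule utility: how often each rule was checked vs. applicable.
    Rules whose applicability rate falls below a threshold are discarded."""

    def __init__(self, min_checks=100, min_rate=0.05):
        self.checks = {}       # rule -> times checked for applicability
        self.hits = {}         # rule -> times actually applicable
        self.min_checks = min_checks
        self.min_rate = min_rate

    def record(self, rule, applicable):
        self.checks[rule] = self.checks.get(rule, 0) + 1
        if applicable:
            self.hits[rule] = self.hits.get(rule, 0) + 1

    def low_utility(self, rule):
        c = self.checks.get(rule, 0)
        if c < self.min_checks:      # not enough evidence yet
            return False
        return self.hits.get(rule, 0) / c < self.min_rate

stats = ShortcutRuleStats(min_checks=10, min_rate=0.2)
for i in range(10):
    stats.record("rule-1", applicable=(i == 0))   # applicable 1 time in 10
print(stats.low_utility("rule-1"))  # True -- discard this rule
```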

SLIDE 56

Outline

1. Background
2. Learning Shortcut Rules
3. Empirical Evaluation

SLIDE 57

Coverage

Shortcut Rules   Solved Problems
none             661
concrete         609
unordered        585
generalized      487

None of the shortcut rules solves more problems than no shortcuts in any domain.

SLIDE 58

Expansions

Shortcut Rules   Total Expanded States
none             3,785,517
concrete         3,758,042
unordered        3,736,052
generalized      3,777,860

The reduction in the total number of expanded states for commonly solved problems is about 1%.

SLIDE 59

Discussion

Possible reasons for failure:
- Concrete shortcut rules are too strict.
- Unordered and generalized shortcut rules generate too many possible shortcuts, and we only look at some of them.
- The base heuristic (∃-opt and regular landmarks) is already very powerful.

Future work:
- Exploit the partial-order information from the causal structure.
- Smarter ways of applying unordered/generalized shortcut rules.
- Inter-problem learning with generalized shortcut rules.
- Insert your idea here.


SLIDE 61

Thank You