Planning and Optimization
G1. Heuristic Search: AO∗ & LAO∗ Part I


Slide 1

Planning and Optimization

G1. Heuristic Search: AO∗ & LAO∗ Part I

Gabriele Röger and Thomas Keller

Universität Basel

December 3, 2018

Slide 2

Sections: Heuristic Search · Motivation · A∗ with Backward Induction · Summary

Content of this Course

[Figure: course overview. Planning splits into Classical (Tasks, Progression/Regression, Complexity, Heuristics) and Probabilistic (MDPs, Blind Methods, Heuristic Search, Monte-Carlo Methods)]

Slide 3

Heuristic Search

Slide 4

Heuristic Search: Recap

Heuristic Search Algorithms: Heuristic search algorithms use heuristic functions to (partially or fully) determine the order of node expansion.
(From Lecture 15 of the AI course last semester)

Slide 5

Best-first Search: Recap

Best-first Search: A best-first search is a heuristic search algorithm that evaluates search nodes with an evaluation function f and always expands a node n with minimal f(n) value.
(From Lecture 15 of the AI course last semester)

Slide 6

A∗ Search: Recap

A∗ Search: A∗ is the best-first search algorithm with evaluation function f(n) = g(n) + h(n.state).
(From Lecture 15 of the AI course last semester)
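This evaluation function is easy to sketch in code. Below is a minimal A∗ with reopening over an explicit successor function; the graph encoding and all names are illustrative, not from the slides:

```python
import heapq

def astar(succ, h, s0, is_goal):
    """Minimal A* with reopening. succ(s) yields (cost, s2) pairs,
    h is the heuristic. Returns the cheapest goal cost, or None."""
    g = {s0: 0}                   # best known cost from the root
    queue = [(h(s0), s0)]         # priority queue ordered by f = g + h
    while queue:
        f, s = heapq.heappop(queue)
        if f > g[s] + h(s):       # stale entry: s was re-pushed with lower g
            continue
        if is_goal(s):
            return g[s]
        for cost, s2 in succ(s):
            if g[s] + cost < g.get(s2, float("inf")):  # improves g: (re)open
                g[s2] = g[s] + cost
                heapq.heappush(queue, (g[s2] + h(s2), s2))
    return None

# Toy graph (not the slides' example): s0 -> s2 directly costs 5,
# but the detour via s1 costs only 1 + 1 = 2.
graph = {"s0": [(1, "s1"), (5, "s2")], "s1": [(1, "s2")], "s2": []}
cost = astar(lambda s: graph[s], lambda s: 0, "s0", lambda s: s == "s2")
```

Stale queue entries are skipped rather than deleted, which is the usual way to implement reopening with a binary heap.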

Slide 7

A∗ Search (With Reopening): Example

[Figure: example graph with states s0–s6 and heuristic values h(s0) = 18, h(s1) = 12, h(s2) = 14, h(s3) = 12, h(s4) = 6, h(s5) = 4, h(s6) = 0]

Slide 8

A∗ Search (With Reopening): Example

[Figure: A∗ run on the example graph; nodes annotated with f = g + h: s0: 0 + 18, s2: 5 + 14, s1: 8 + 12, s5: 15 + 4, later reopened as 12 + 4, s4: 16 + 6, s3: 18 + 12, s6: 23 + 0, improved to 20 + 0]

Slide 9

Motivation

Slide 10

From A∗ to AO∗

The equivalent of A∗ in (acyclic) probabilistic planning is AO∗. Even though we know A∗ and the foundations of probabilistic planning, the generalization is far from straightforward:

• In A∗, g(n) is the cost from the root n0 to n.
• The equivalent in AO∗ is the expected cost from n0 to n.

Slide 11

Expected Cost to Reach State

Consider the following expansion of state s0:

[Figure: state s0 with actions a0 and a1 (each of cost 1); a0 reaches s1 (h = 100) with probability .99 and s2 (h = 1) with probability .01; a1 reaches s3 (h = 2) and s4 (h = 2) with probability .5 each]

The expected cost to reach any of the leaves is infinite or undefined (no leaf is reached with probability 1).

Slide 12

From A∗ to AO∗

The equivalent of A∗ in (acyclic) probabilistic planning is AO∗. Even though we know A∗ and the foundations of probabilistic planning, the generalization is far from straightforward:

• In A∗, g(n) is the cost from the root n0 to n.
• The equivalent in AO∗ is the expected cost from n0 to n.
• An alternative could be the expected cost from n0 to n given that n is reached.

Slide 13

Expected Cost to Reach State Given It Is Reached

Consider the following expansion of state s0:

[Figure: state s0 with actions a0 and a1 (each of cost 1); a0 reaches s1 (h = 100) with probability .99 and s2 (h = 1) with probability .01; a1 reaches s3 (h = 2) and s4 (h = 2) with probability .5 each]

The conditional probability is misleading: s2 would be expanded, even though it is not part of the best-looking option.

Slide 14

The Best Looking Action

Consider the following expansion of state s0:

[Figure: state s0 with actions a0 and a1 (each of cost 1); a0 reaches s1 (h = 100) with probability .99 and s2 (h = 1) with probability .01; a1 reaches s3 (h = 2) and s4 (h = 2) with probability .5 each]

The conditional probability is misleading: s2 would be expanded, even though it is not part of the best-looking option: with state-value estimate V̂(s) := h(s), the greedy action is aV̂(s0) = a1.
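The greedy choice of a1 can be checked numerically. The sketch below assumes, as read off the figure, that both actions cost 1 and that the leaves are valued by V̂(s) := h(s); the encoding is illustrative:

```python
# Hypothetical encoding of the example: action -> (cost, [(prob, h_of_outcome), ...])
actions = {
    "a0": (1, [(0.99, 100), (0.01, 1)]),  # outcomes s1 (h = 100) and s2 (h = 1)
    "a1": (1, [(0.5, 2), (0.5, 2)]),      # outcomes s3 (h = 2) and s4 (h = 2)
}

def q_value(cost, outcomes):
    # Q(s0, a) = c(a) + sum over outcomes s' of P(s') * h(s')
    return cost + sum(p * h for p, h in outcomes)

greedy = min(actions, key=lambda a: q_value(*actions[a]))
# a0 looks terrible overall (Q = 1 + 0.99*100 + 0.01*1 = 100.01) even though
# its outcome s2 (h = 1) is the single best-looking leaf; a1 has Q = 3.
```

This is exactly why the conditional-probability criterion is misleading: it would pick the leaf s2, which only the bad-looking action a0 can reach.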

Slide 15

Expansion in Best Solution Graph

AO∗ uses a different idea:

• AO∗ keeps track of the best solution graph.
• AO∗ expands a state that can be reached from s0 by applying only greedy actions.

⇒ no g-value equivalent required

Slide 16

Expansion in Best Solution Graph

AO∗ uses a different idea:

• AO∗ keeps track of the best solution graph.
• AO∗ expands a state that can be reached from s0 by applying only greedy actions.

⇒ no g-value equivalent required

An equivalent version of A∗ built on this idea can be derived ⇒ A∗ with backward induction. Since the change is non-trivial, we focus on this A∗ variant now and generalize later to acyclic probabilistic tasks (AO∗) and to probabilistic tasks in general (LAO∗).

Slide 17

A∗ with Backward Induction

Slide 18

Transition Systems

A∗ with backward induction distinguishes three transition systems:

• The transition system T = ⟨S, L, c, T, s0, S⋆⟩ ⇒ given implicitly
• The explicated graph T̂t = ⟨Ŝt, L, c, T̂t, s0, S⋆⟩ ⇒ the part of T explicitly considered during search
• The partial solution graph T̂⋆t = ⟨Ŝ⋆t, L, c, T̂⋆t, s0, S⋆⟩ ⇒ the part of T̂t that contains the best solution

[Figure: the three transition systems nested around s0: T ⊇ T̂t ⊇ T̂⋆t]

Slide 19

Explicated Graph

Expanding a state s at time step t explicates all successors s′ ∈ succ(s) by adding them to the explicated graph:

T̂t = ⟨Ŝt−1 ∪ succ(s), L, c, T̂t−1 ∪ {⟨s, l, s′⟩ ∈ T}, s0, S⋆⟩

Each explicated state is annotated with a state-value estimate V̂t(s) that describes the estimated cost to a goal at time step t. When a state s′ is explicated and s′ ∉ Ŝt−1, its state-value estimate is initialized to V̂t(s′) := h(s′). We call the leaf states of T̂t fringe states.

Slide 20

Partial Solution Graph

The partial solution graph T̂⋆t is the subgraph of T̂t that is spanned by the smallest set of states Ŝ⋆t that satisfies:

• s0 ∈ Ŝ⋆t
• if s ∈ Ŝ⋆t, s′ ∈ Ŝt and ⟨s, aV̂t(s), s′⟩ ∈ T̂t, then s′ ∈ Ŝ⋆t

The partial solution graph forms a sequence of states s0, …, sn, starting with the initial state s0 and ending in the greedy fringe state sn.

Slide 21

Backward Induction

A∗ with backward induction does not maintain a static open list:

• State-value estimates determine the partial solution graph.
• The partial solution graph determines which state is expanded.
• (Some) state-value estimates are updated in time step t by backward induction:

V̂t(s) = min over ⟨s, l, s′⟩ ∈ T̂t of c(l) + V̂t(s′)
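The update can be written down directly. A minimal deterministic sketch over an explicit transition list (the encoding is illustrative); the numbers reproduce the example graph's first backup, where expanding s0 turns V̂(s0) = 18 into min(8 + 12, 5 + 14) = 19:

```python
def backward_induction_update(V, transitions, s):
    """Deterministic backup: V[s] = min over (cost, s2) in transitions[s]
    of cost + V[s2]."""
    V[s] = min(cost + V[s2] for cost, s2 in transitions[s])

# First backup of the lecture's example: s0 reaches s1 (cost 8, V = 12)
# and s2 (cost 5, V = 14); the initial h(s0) = 18 is replaced by 19.
V = {"s0": 18, "s1": 12, "s2": 14}
transitions = {"s0": [(8, "s1"), (5, "s2")]}
backward_induction_update(V, transitions, "s0")
```

In the probabilistic generalization the same backup becomes a Bellman update, with the successor value replaced by an expectation over outcomes.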

Slide 22

A∗ with backward induction

A∗ with backward induction for classical planning task T:

    explicate s0
    while the greedy fringe state s ∉ S⋆:
        expand s
        perform backward induction of the states in T̂⋆t−1 in reverse order
    return T̂⋆t
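This loop can be fleshed out as a compact, runnable sketch for the deterministic case, where the partial solution graph is simply the greedy path from s0. It assumes every non-goal state has at least one successor; the helper names and the toy task are illustrative, not from the slides:

```python
def astar_backward_induction(succ, h, s0, is_goal):
    """Sketch of A* with backward induction for deterministic tasks.
    succ(s) -> list of (cost, s2) pairs; returns the greedy solution path."""
    V = {s0: h(s0)}                # value estimates of explicated states
    expanded = set()
    while True:
        # Trace greedy actions from s0 down to the greedy fringe state.
        path, s = [s0], s0
        while s in expanded:
            s = min(succ(s), key=lambda e: e[0] + V[e[1]])[1]
            path.append(s)
        if is_goal(s):
            return path            # greedy fringe state is a goal: done
        expanded.add(s)            # expand the greedy fringe state
        for cost, t in succ(s):
            V.setdefault(t, h(t))  # explicate successors, initialized with h
        for p in reversed(path):   # backward induction in reverse order
            V[p] = min(cost + V[t] for cost, t in succ(p))

# Toy task (not the slides' graph): two routes to a goal state "g".
graph = {"s0": [(8, "s1"), (5, "s2")], "s1": [(4, "g")], "s2": [(10, "g")], "g": []}
hvals = {"s0": 12, "s1": 4, "s2": 4, "g": 0}
plan = astar_backward_induction(lambda s: graph[s], lambda s: hvals[s],
                                "s0", lambda s: s == "g")
```

Note that there is no open list: which state gets expanded next is determined entirely by the value estimates, via the greedy trace from s0.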

Slide 23

A∗ with backward induction

[Figure: the example graph (states s0–s6, h-values 18, 12, 14, 12, 6, 4, 0); the explicated graph after explicating s0 contains only s0, with V̂(s0) = 18]

Slide 24

A∗ with backward induction

[Figure: after expanding s0, the successors s1 (V̂ = 12, cost 8) and s2 (V̂ = 14, cost 5) are explicated; backward induction updates V̂(s0) = min(8 + 12, 5 + 14) = 19]

Slide 25

A∗ with backward induction

[Figure: after expanding the greedy fringe state s2, its successor s5 (V̂ = 4, cost 10) is explicated; V̂(s2) = 10 + 4 = 14 and V̂(s0) = 19 remain unchanged]

Slide 26

A∗ with backward induction

[Figure: after expanding s5, the goal state s6 (cost 8) is explicated; backward induction updates V̂(s5) = 8, V̂(s2) = 18 and V̂(s0) = min(8 + 12, 5 + 18) = 20, so the greedy action at s0 now leads to s1]

Slide 27

A∗ with backward induction

[Figure: after expanding s1, its successors s3 (V̂ = 12) and s4 (V̂ = 6) are explicated; V̂(s1) = 12 and V̂(s0) = 20 remain unchanged]

Slide 28

A∗ with backward induction

[Figure: the greedy fringe state is now the goal state s6, so the search terminates with a solution of cost V̂(s0) = 20]

Slide 29

Equivalence of A∗ and A∗ with Backward Induction

Theorem: A∗ and A∗ with backward induction expand the same set of states if run with an identical admissible heuristic h and an identical tie-breaking criterion.

Proof sketch: The proof shows that there is always a unique state s in the greedy fringe of A∗ with backward induction such that f(s) = g(s) + h(s) is minimal among all fringe states:

• g(s) of the fringe node is encoded in the greedy action choices
• h(s) of the fringe node is equal to V̂t(s)

Slide 30

Summary

Slide 31

Summary

• It is non-trivial to generalize A∗ to probabilistic planning.
• For a better understanding of AO∗, we changed A∗ towards AO∗.
• We derived A∗ with backward induction, which is similar to AO∗ and expands the same states as A∗.