SLIDE 1

Logic Programming and MDPs for Planning

Alborz Geramifard
Winter 2009

SLIDE 2

Index

- Introduction
- Logic Programming
- MDP
- MDP + Logic + Programming


SLIDE 4

Why do we care about planning?

SLIDE 5

Sequential Decision Making

Desired property?
- Feasible: Logic (focuses on feasibility)
- Optimal: MDPs (scaling is hard)

SLIDE 11

Index

- Introduction
- Logic Programming
- MDP
- MDP + Logic + Programming

SLIDE 13

Logic

Goal-directed search: Start → Goal
- State: a set of propositions (e.g. ¬Quiet, Garbage, ¬Dinner)
- Feasible plan: a sequence of actions from start to goal

SLIDE 14

Actions

Precondition → Action → Effect
e.g. {Clean Hands, Ingredients} → Cook → {Dinner Ready}

STRIPS, Graph Plan (16.410/16.413)

SLIDE 15

Logic Programming

Graph Plan or forward search: not easily scalable.

GOLOG ("ALGOL in LOGIC"):
- Restricts the action space with programs
- Scales more easily
- Results depend on the high-level programs

[Levesque et al. 97]

SLIDE 16

Situation Calculus (Temporal Logic)

- Situation: S0, do(A, S)
  Example: do(putdown(A), do(walk(L), do(pickup(A), S0)))
- Relational fluent: a relation whose value depends on the situation
  Example: is_carrying(robot, p, s)
- Functional fluent: a situation-dependent function
  Example: loc(robot, s)

More expressive than LTL.
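To make the term structure concrete, here is a minimal Python sketch (mine, not from the deck) that represents situations as nested action terms and evaluates a relational fluent by replaying the action history; the evaluation rule for is_carrying is an illustrative assumption, not a successor-state axiom from the slides.

```python
# Sketch: situations as nested action terms, fluents evaluated by
# replaying the history. The rule for is_carrying is an assumption
# made for illustration only.
S0 = ()  # the initial situation

def do(action, situation):
    """do(A, S): the situation reached by performing A in S."""
    return situation + (action,)

def is_carrying(p, situation):
    """Relational fluent: did the last pickup/putdown of p leave it held?"""
    carrying = False
    for name, *args in situation:
        if name == "pickup" and args == [p]:
            carrying = True
        elif name == "putdown" and args == [p]:
            carrying = False
    return carrying

# The slide's example: do(putdown(A), do(walk(L), do(pickup(A), S0)))
s = do(("putdown", "A"), do(("walk", "L"), do(("pickup", "A"), S0)))
print(is_carrying("A", s))  # False: A was picked up and then put down
```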

SLIDE 23

GOLOG: Syntax

[Syntax table from the tutorial image: primitive actions a; tests φ?; sequence (δ1; δ2); nondeterministic branch (δ1 | δ2); nondeterministic choice of argument (πx. δ); nondeterministic iteration δ*; if-then-else and while loops.]

[Scott Sanner, ICAPS-08 tutorial]

Aren't we hard-coding the whole solution?

SLIDE 34

Blocks World

Actions: pickup(b), putOnTable(b), putOn(b1, b2)
Fluents: onTable(b, s), on(b1, b2, s)

GOLOG program:
while (∃b) ¬onTable(b) do (pickup(b); putOnTable(b)) endWhile

What does it do? It keeps nondeterministically choosing a block that is not on the table, picking it up, and putting it down, until every block sits directly on the table (see the simulation sketch below).
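To answer the slide's question concretely, here is a small Python simulation (my sketch, not part of the deck) of the loop's effect; towers are bottom-to-top lists, and the nondeterministic choice of b is played by random.choice.

```python
# Sketch: simulate the GOLOG while-loop on a concrete blocks world.
# State: a list of towers; each tower lists blocks from bottom to top.
import random

def stacked_and_clear(towers):
    """Clear blocks (tops of towers) that are not already on the table."""
    return [t[-1] for t in towers if len(t) > 1]

def run_program(towers):
    # while (exists b) not onTable(b) do (pickup(b); putOnTable(b))
    while stacked_and_clear(towers):
        b = random.choice(stacked_and_clear(towers))  # nondeterministic b
        for t in towers:
            if len(t) > 1 and t[-1] == b:
                t.pop()               # pickup(b)
                towers.append([b])    # putOnTable(b)
                break
    return towers

print(run_program([["a", "b", "c"], ["d", "e"]]))
# -> every block in its own one-block tower, e.g. [['a'], ['d'], ['c'], ['b'], ['e']]
```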

SLIDE 36

GOLOG Execution

- Given start S0, goal S′, and program δ
- Call Do(δ, S0, S′)
- First-order logic proof system: Prolog

Non-deterministic: [diagram of the situation tree S0 → S1 → S2 → S3, with an alternative branch ending in S4.]
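Do(δ, s, s′) holds when s′ is a situation the program can legally terminate in. The sketch below is an illustrative approximation of that idea for a tiny program algebra (primitive action, sequence, nondeterministic branch); the real Do is defined by macro-expansion into first-order axioms and executed in Prolog, not by a Python generator.

```python
# Sketch: enumerate the situations Do(delta, s0, s') can reach, for a
# tiny program algebra. Programs are tagged tuples; situations are
# tuples of action names, as in the earlier situation-calculus sketch.
def Do(program, s):
    """Yield every situation reachable by executing `program` from s."""
    kind = program[0]
    if kind == "act":                      # primitive action
        yield s + (program[1],)
    elif kind == "seq":                    # delta1 ; delta2
        for s1 in Do(program[1], s):
            yield from Do(program[2], s1)
    elif kind == "choice":                 # delta1 | delta2
        for branch in program[1:]:
            yield from Do(branch, s)

prog = ("seq", ("act", "pickup(A)"),
               ("choice", ("act", "walk(L)"), ("act", "walk(M)")))
for s_final in Do(prog, ()):
    print(s_final)
# ('pickup(A)', 'walk(L)') and ('pickup(A)', 'walk(M)')
```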

SLIDE 46

Index

- Introduction
- Logic Programming
- MDP
- MDP + Logic + Programming

SLIDE 48

MDP

$\langle S, A, P^a_{ss'}, R^a_{ss'}, \gamma \rangle$

[Diagram: an MDP over the states B.S., M.S., Ph.D. with actions Studying and Working; edges carry transition probabilities and rewards (100%, +85; 100%, +120; Studying 10%, −50 / 80%, −50; Studying 30%, −200 / 70%, −200; 10%, −300; Working 100%, +60).]

Sample experience: B.S., Working, +60, B.S., Studying, −50, M.S., ...
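Held in code, the tuple becomes a handful of tables. In this sketch the transition probabilities and rewards are stand-ins read off the partly garbled diagram, and γ = 0.9 is assumed since the slide does not state it; treat all the specific numbers as illustrative.

```python
# Sketch: the slide's student MDP as explicit tables. Probabilities,
# rewards, and gamma are illustrative assumptions, not the slide's
# authoritative values.
GAMMA = 0.9
STATES = ["BS", "MS", "PhD"]

# P[s][a] = list of (probability, next_state, reward)
P = {
    "BS":  {"Working":  [(1.0, "BS", +60)],
            "Studying": [(0.8, "MS", -50), (0.2, "BS", -50)]},
    "MS":  {"Working":  [(1.0, "MS", +85)],
            "Studying": [(0.7, "PhD", -200), (0.3, "MS", -200)]},
    "PhD": {"Working":  [(1.0, "PhD", +120)]},
}

def q_value(s, a, V):
    """One-step lookahead: expected reward plus discounted next value."""
    return sum(p * (r + GAMMA * V[ns]) for p, ns, r in P[s][a])

V0 = {s: 0.0 for s in STATES}
print(q_value("BS", "Studying", V0))  # -50.0
```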
SLIDE 49

MDP

$V^{\pi}(s) = E\!\left[\left.\sum_{t=1}^{\infty} \gamma^{t-1} r_t \,\right|\, s_0 = s, \pi\right]$

Policy π(s): how to act in each state.
Value function V(s): how good is it to be in each state?

SLIDE 50

MDP

Goal: find the optimal policy π*, which maximizes the value function for all states.

$V^*(s) = \max_a E\left[r + \gamma V^*(s')\right]$
$\pi^*(s) = \operatorname{argmax}_a E\left[r + \gamma V^*(s')\right]$
SLIDE 51

Example (Value Iteration)

$V(s) = \max_a E\left[r + \gamma V(s')\right]$

Reward: [goal cell icon]  Actions: [the four grid moves]

[Grid-world animation: successive sweeps of value iteration propagate values outward from the reward cell, filling the cells one step farther from the goal on each sweep (−1, then −2, then −3, ..., out to −6).]
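A runnable version of that animation, as a sketch under assumed details (a 4×3 grid with the goal in a corner, −1 reward per step, deterministic moves, undiscounted):

```python
# Sketch: value iteration on a small deterministic grid, in the spirit
# of the slide's animation. Layout, step reward, and gamma are
# assumptions for illustration.
GAMMA = 1.0
W, H = 4, 3
GOAL = (3, 2)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # the four grid moves

def step(s, a):
    """Deterministic move; bumping into a wall leaves you in place."""
    nx, ny = s[0] + a[0], s[1] + a[1]
    return (nx, ny) if 0 <= nx < W and 0 <= ny < H else s

states = [(x, y) for x in range(W) for y in range(H)]
V = {s: 0.0 for s in states}
for _ in range(20):  # each sweep applies V(s) = max_a [r + gamma * V(s')]
    V = {s: 0.0 if s == GOAL else
            max(-1.0 + GAMMA * V[step(s, a)] for a in ACTIONS)
         for s in states}

print(V[(0, 0)])  # -5.0: five steps from the goal, matching the sweeps
```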
SLIDE 56

Index

- Introduction
- Logic Programming
- MDP
- MDP + Logic + Programming

SLIDE 58

DT-GOLOG: Decision-Theoretic GOLOG

- MDP: scaling is challenging
- GOLOG: no optimization
- DT-GOLOG: all nondeterministic choice points are optimized using MDP concepts (uncertainty, reward)

[Boutilier et al. 00]

SLIDE 63

Decision-Theoretic GOLOG

Programming (GOLOG) ←→ Planning (MDP)

- Known solution: program the high-level structure (GOLOG)
- No good grasp of the solution: leave the low-level structure to the planner (MDP)

SLIDE 66

Example

Build a tower with the least effort.

GOLOG program:
- Pick a block as the base
- Stack all other blocks on top of it

Left to the MDP:
- Which block to use as the base?
- In which order to pick up the blocks?

Guaranteed to be optimal?

SLIDE 71

DT-GOLOG Execution

- Given start S0, goal S′, and program δ
- Add rewards
- Formulate the problem as an MDP

Grounding the nondeterministic choice: (∃b) ¬onTable(b), with b ∈ {b1, ..., bn}, becomes ¬onTable(b1) ∨ ¬onTable(b2) ∨ ... ∨ ¬onTable(bn).

[Diagram: the situation tree S0 → S1 → S2 → S3 with a branch to S4, now annotated with transition rewards (−1, +3, −2, −5).]
SLIDE 75

First-Order Dynamic Programming

The resulting MDP can still be intractable.

Idea: exploit the logical structure to build an abstract value function and avoid the curse of dimensionality!

[Sanner 07]

SLIDE 76

Symbolic Dynamic Programming (Deterministic)

Tabular: $V(s) = \max_a \left[r + \gamma V(s')\right]$
Symbolic: ?

What we need:
- A representation of rewards and values
- Adding rewards and values
- A max operator
- Finding s

SLIDE 82

Reward and Value Representation

Case representation:

rCase = [ ∃b. b ≠ b1 ∧ on(b, b1) : 10
          ∄b. b ≠ b1 ∧ on(b, b1) : 0 ]

[Picture: block b1 with another block stacked on it.]
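A case statement is just an ordered list of mutually exclusive (formula, value) partitions. The sketch below fixes a tiny representation that the next two operations build on; formulas are plain strings purely for illustration, and the 0-valued default partition is assumed.

```python
# Sketch: a case statement as a list of mutually exclusive
# (formula, value) partitions. String formulas are illustrative only.
rCase = [
    ("∃b. b ≠ b1 ∧ on(b, b1)", 10),
    ("¬∃b. b ≠ b1 ∧ on(b, b1)", 0),   # default value of 0 assumed
]

def case_value(case, holds):
    """Value of the first partition whose formula is true in the state.
    `holds` is a caller-supplied predicate deciding formula truth."""
    for formula, value in case:
        if holds(formula):
            return value
    raise ValueError("case partitions must be exhaustive")

# In a state where some other block sits on b1:
print(case_value(rCase, lambda f: f.startswith("∃")))  # 10
```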

SLIDE 83

Add Symbolically

[ A  : 10     ⊕   [ B  : 1     =   [ A ∧ B   : 11
  ¬A : 20 ]         ¬B : 2 ]         A ∧ ¬B  : 12
                                     ¬A ∧ B  : 21
                                     ¬A ∧ ¬B : 22 ]

⊖ and ⊗ are defined similarly. [Scott Sanner, ICAPS-08 tutorial]
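In code, ⊕ is a cross-product over partitions: conjoin the formulas, add the values. A minimal sketch on the string representation above (a real implementation would also prune inconsistent conjunctions):

```python
# Sketch: the cross-sum (+) of two case statements, as on the slide.
from itertools import product

def case_add(c1, c2):
    """Conjoin formulas pairwise and add their values.
    Inconsistent conjunctions would be pruned in a real implementation."""
    return [(f"{f1} ∧ {f2}", v1 + v2)
            for (f1, v1), (f2, v2) in product(c1, c2)]

c1 = [("A", 10), ("¬A", 20)]
c2 = [("B", 1), ("¬B", 2)]
for formula, value in case_add(c1, c2):
    print(formula, ":", value)
# A ∧ B : 11, A ∧ ¬B : 12, ¬A ∧ B : 21, ¬A ∧ ¬B : 22
```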

SLIDE 84

max Operator

max_a over the per-action cases (a_1 and a_2):

max( [ Φ1 : 10    [ Φ3 : 3   )   =   [ Φ1                   : 10
       Φ2 : 5 ] ,   Φ4 :   ]           ¬Φ1 ∧ Φ2              : 5
                                       ¬Φ1 ∧ ¬Φ2 ∧ Φ3        : 3
                                       ¬Φ1 ∧ ¬Φ2 ∧ ¬Φ3 ∧ Φ4  :    ]

[Scott Sanner, ICAPS-08 tutorial]
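The symbolic max sorts all partitions by value and guards each formula with the negations of every higher-valued one, exactly as in the slide's result. A sketch (the value of Φ4 is not on the slide, so 1 is assumed for the demo):

```python
# Sketch: symbolic max of case statements, sorted by value descending;
# each result formula asserts "my partition holds and no higher-valued
# one does".
def case_max(*cases):
    parts = sorted((p for c in cases for p in c), key=lambda p: -p[1])
    result, negated = [], []
    for formula, value in parts:
        result.append((" ∧ ".join(negated + [formula]), value))
        negated.append(f"¬({formula})")
    return result

a1 = [("Φ1", 10), ("Φ2", 5)]
a2 = [("Φ3", 3), ("Φ4", 1)]   # value for Φ4 is elided on the slide; 1 assumed
for formula, value in case_max(a1, a2):
    print(formula, ":", value)
# Φ1 : 10
# ¬(Φ1) ∧ Φ2 : 5
# ¬(Φ1) ∧ ¬(Φ2) ∧ Φ3 : 3
# ¬(Φ1) ∧ ¬(Φ2) ∧ ¬(Φ3) ∧ Φ4 : 1
```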

SLIDE 85

Find s? Isn't it obvious?

[Diagram: s —a→ s′]

Dynamic programming: given V(s′), find V(s). In tabular MDPs we have s explicitly; in the symbolic representation we only have it implicitly, so we have to build it.

SLIDE 88

Find s = Goal Regression

regress(Φ1, a): the weakest relation that ensures Φ1 holds after taking a.

Example: Φ1 = clear(b1), a = put(A, B):

regress(clear(b1), put(A, B)) = (clear(b1) ∧ B ≠ b1) ∨ on(A, b1)

[Pictures: the two qualifying pre-states: b1 already clear with A going onto some other block B, and A sitting on b1 about to be moved away.]
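As a toy illustration of the operation's shape, the sketch below hard-codes the slide's single regression rule; a real regressor derives such formulas from the domain's successor-state axioms rather than from string surgery.

```python
# Sketch: goal regression for the slide's example only, with formulas
# as strings. A real regressor works from successor-state axioms.
def regress(formula, action):
    """Weakest precondition ensuring `formula` after `action`
    (only clear(_) through put(_, _) is handled here)."""
    assert formula.startswith("clear(") and action[0] == "put"
    b = formula[len("clear("):-1]          # the block that must end up clear
    _, A, B = action
    return f"(clear({b}) ∧ {B} ≠ {b}) ∨ on({A}, {b})"

print(regress("clear(b1)", ("put", "A", "B")))
# (clear(b1) ∧ B ≠ b1) ∨ on(A, b1)
```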

SLIDE 90

Symbolic Dynamic Programming (Deterministic)

Tabular: $V(s) = \max_a \left[r + \gamma V(s')\right]$
Symbolic: $vCase = \max_a \left[\, rCase \oplus \gamma \cdot regr(vCase, a) \,\right]$
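Tying the pieces together, here is the shape of the symbolic value-iteration loop. The case_add and case_max helpers repeat the earlier sketches for self-containment; regress_case is a stub, since real regression needs the domain's successor-state axioms.

```python
# Sketch: the symbolic Bellman backup vCase = max_a [rCase ⊕ γ·regr(vCase, a)].
# regress_case is a placeholder; everything here is illustrative.
def case_add(c1, c2):
    return [(f"{f1} ∧ {f2}", v1 + v2) for f1, v1 in c1 for f2, v2 in c2]

def case_max(*cases):
    parts = sorted((p for c in cases for p in c), key=lambda p: -p[1])
    out, neg = [], []
    for f, v in parts:
        out.append((" ∧ ".join(neg + [f]), v))
        neg.append(f"¬({f})")
    return out

def regress_case(v_case, a):          # stub: a real regressor rewrites each
    return [(f"regr({f}, {a})", v) for f, v in v_case]   # formula through a

def symbolic_vi(r_case, actions, gamma, sweeps):
    v = r_case
    for _ in range(sweeps):
        v = case_max(*[case_add(r_case, [(f, gamma * val)
                                         for f, val in regress_case(v, a)])
                       for a in actions])
    return v

print(symbolic_vi([("Goal", 10), ("¬Goal", 0)], ["noop", "move"], 0.9, 1)[:2])
```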
SLIDE 92

Classical Example

[Picture: boxes, trucks, and cities.] Goal: have a box in Paris.

rCase = [ ∃b. BoxIn(b, Paris) : 10
          else                : 0 ]

SLIDE 93

Classical Example

- Actions: drive(t, c1, c2), load(b, t), unload(b, t), noop
- load and unload have a 10% chance of failure
- Fluents: BoxIn(b, c), BoxOn(b, t), TruckIn(t, c)
- Assumptions: all cities are connected; γ = 0.9

SLIDE 94

Example [Sanner 07]

s                                                  V*(s)   π*(s)
∃b. BoxIn(b, Paris)                                100     noop
else, ∃b,t. TruckIn(t, Paris) ∧ BoxOn(b, t)        89      unload(b, t)
else, ∃b,c,t. BoxOn(b, t) ∧ TruckIn(t, c)          80      drive(t, c, Paris)
else, ∃b,c,t. BoxIn(b, c) ∧ TruckIn(t, c)          72      load(b, t)
else, ∃b,c1,c2,t. BoxIn(b, c1) ∧ TruckIn(t, c2)    65      drive(t, c2, c1)
else                                                       noop

What did we gain by going through all of this? The value function and policy are compact case statements over first-order formulas: they hold for any number of boxes, trucks, and cities, without ever enumerating ground states.

SLIDE 96

Conclusion

- Logic Programming: Situation Calculus, GOLOG, planning
- MDP: review, value iteration
- MDP + Logic + Programming: DT-GOLOG, symbolic DP

SLIDE 100

References

- Levesque, H., Reiter, R., Lespérance, Y., Lin, F., and Scherl, R., "GOLOG: A Logic Programming Language for Dynamic Domains", Journal of Logic Programming, 31:59-84, 1997.
- Sutton, R. S., and Barto, A. G., "Reinforcement Learning: An Introduction", MIT Press, Cambridge, 1998.
- Boutilier, C., Reiter, R., Soutchanski, M., and Thrun, S., "Decision-Theoretic, High-Level Agent Programming in the Situation Calculus", AAAI/IAAI 2000, pp. 355-362.
- Sanner, S., and Kersting, K., "Symbolic Dynamic Programming", chapter in C. Sammut, editor, Encyclopedia of Machine Learning, Springer-Verlag, 2007.

SLIDE 101

References

- Boutilier, C., Reiter, R., and Price, B., "Symbolic Dynamic Programming for First-Order MDPs", Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), Seattle, pp. 690-697, 2001.

SLIDE 102

Questions?