Alborz Geramifard
Logic Programming and MDPs for Planning
Winter 2009
2
Index
Introduction
Logic Programming
MDP
MDP + Logic + Programming
4
Desired Property?
Feasible: Logic (focuses on feasibility)
Optimal: MDPs (scaling is hard)
5
Logic Programming
6
Goal-Directed Search
Start
Goal State: a set of propositions (e.g. ¬Quiet, Garbage, ¬Dinner)
Feasible Plan: a sequence of actions from start to goal
7
Precondition → Action → Effect
Clean Hands, Ingredients → Cook → Dinner Ready
STRIPS, Graph Plan (16.410/16.413)
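The precondition/action/effect triple above is the STRIPS view of an action. A minimal Python sketch of that view, using the cook example from the slide (the field and fluent names are illustrative):

```python
# A STRIPS-style action is a precondition/add/delete triple over a set of
# propositional fluents. Fluent names below mirror the slide's cook example.

def applicable(state, action):
    return action["pre"] <= state            # all preconditions hold in state

def apply_action(state, action):
    return (state - action["del"]) | action["add"]

cook = {
    "pre": {"cleanHands", "ingredients"},    # precondition
    "add": {"dinnerReady"},                  # effect: fluents made true
    "del": set(),                            # effect: fluents made false
}

s0 = {"cleanHands", "ingredients"}
if applicable(s0, cook):
    print(sorted(apply_action(s0, cook)))    # ['cleanHands', 'dinnerReady', 'ingredients']
```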
8
Graph Plan or Forward Search: not easily scalable
GOLOG: ALGOL in LOGIC
Restricts the action space with programs
Scales more easily
Results depend on the high-level programs
[Levesque et al. 97]
9
Situation: S0, do(A,S)
Example: do(putdown(A), do(walk(L), do(pickup(A), S0)))
Relational Fluent: a relation whose value depends on the situation
Example: is_carrying(robot, p, s)
Functional Fluent: a situation-dependent function
Example: loc(robot, s)
10
[Scott Sanner - ICAPS08 Tutorial]
Aren't we hard-coding the whole solution?
11
Actions: pickup(b), putOnTable(b), putOn(b1,b2)
Fluents: onTable(b, s), on(b1, b2, s)
GOLOG Program:
while (∃b) ¬onTable(b) then (pickup(b), putOnTable(b)) endWhile
What does it do?
12
Given start S0, goal S', and program δ
Call Do(δ, S0, S')
First-order logic proof system: Prolog
[Figure: search tree over situations S0, S1, S2, S3, S4; branches are non-deterministic choice points]
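One way to read Do(δ, S0, S'): the program constrains which actions may come next, and a backtracking search resolves the remaining non-determinism, exactly what a Prolog proof does. A toy Python sketch of this idea (not real GOLOG or Prolog; all names are illustrative), using the clear-the-table program from the earlier slide:

```python
# Toy sketch: Do(delta, S0, S') as backtracking search. The "program" limits
# the choices available in each state; depth-first search picks among them
# and backtracks from dead ends until a goal situation is reached.

def do_search(state, goal, program):
    """Return one terminating action sequence, or None if none exists."""
    if goal(state):
        return []                      # goal situation reached
    for actions, nxt in program(state):
        rest = do_search(nxt, goal, program)
        if rest is not None:
            return actions + rest      # first feasible execution (not optimal!)
    return None                        # dead end: backtrack

# Blocks world: state = frozenset of blocks not yet on the table.
def clear_table(state):
    for b in sorted(state):            # non-deterministic pick of a block b
        yield [f"pickup({b})", f"putOnTable({b})"], state - {b}

plan = do_search(frozenset({"b1", "b2"}), goal=lambda s: not s, program=clear_table)
print(plan)  # ['pickup(b1)', 'putOnTable(b1)', 'pickup(b2)', 'putOnTable(b2)']
```

Note that the search commits to the first feasible execution it finds; nothing here prefers one terminating run over another, which is exactly the gap the MDP machinery fills later.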
13
MDP
14
[Figure: example MDP with states B.S., M.S., Ph.D. and actions Studying, Working; transitions labeled with probabilities and rewards such as 100%,+85; 80%,-50; 30%,-200; 10%,-300; 100%,+60]
15
Policy π(s): how to act in each state
Value Function V(s): how good is it to be in each state?
16
Goal: find the optimal policy π*, which maximizes the value function for all states
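Finding π* for a small explicit MDP is a few lines of value iteration. A compact Python sketch on a tiny hand-made "student" MDP (the states, transition model, and rewards below are illustrative, not the numbers from the talk's figure):

```python
# Value iteration on a tiny illustrative MDP.
# P[s][a] = list of (probability, next_state, reward) outcomes.
P = {
    "s0": {"study": [(0.8, "s1", -50), (0.2, "s0", -50)],
           "work":  [(1.0, "s0", +60)]},
    "s1": {"work":  [(1.0, "s1", +120)]},
}
gamma = 0.9

def q(V, s, a):
    # expected one-step return of action a in state s under value estimate V
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

V = {s: 0.0 for s in P}
for _ in range(500):                   # repeat Bellman backups to (near) convergence
    V = {s: max(q(V, s, a) for a in P[s]) for s in P}

pi = {s: max(P[s], key=lambda a: q(V, s, a)) for s in P}   # greedy policy
print(pi)  # {'s0': 'study', 's1': 'work'}
```

Here studying is optimal in s0 despite its negative immediate reward, because the long-run value of reaching s1 dominates; this is the "optimal, not just feasible" behaviour that the logic-only planners above cannot express.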
17
[Reward and action definitions shown as equations/images in the original slides: Reward, Actions a]
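The equations on this slide survive only as images. As a hedged reconstruction (notation following Sutton & Barto, cited at the end; the slide's exact form may differ), the Bellman optimality equation that value iteration solves, and its iterative backup, are:

```latex
V^{*}(s) \;=\; \max_{a}\; \sum_{s'} P(s' \mid s, a)\,\bigl[\, R(s, a, s') \;+\; \gamma\, V^{*}(s') \,\bigr]

V^{t+1}(s) \;=\; \max_{a}\; \sum_{s'} P(s' \mid s, a)\,\bigl[\, R(s, a, s') \;+\; \gamma\, V^{t}(s') \,\bigr]
```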
18
MDP + Logic + Programming
19
MDP: scaling is challenging
GOLOG: no optimization
DT-GOLOG [Boutilier et al. 00]: all non-deterministic actions are optimized using MDP concepts (uncertainty, reward)
20
Programming (GOLOG): known solution, high-level structure
Planning (MDP): not enough grasp of the solution, low-level structure
21
Build a tower with least effort
GOLOG: pick a block as the base; stack all other blocks on top of it
MDP: which block to use as the base? In which order to pick up the blocks?
Guaranteed to be optimal?
22
Given start S0, goal S', and program δ
Add rewards (e.g. +3)
Formulate the problem as an MDP
(∃b) ¬onTable(b), b ∈ {b1, ..., bn}  ≡  ¬onTable(b1) ∨ ¬onTable(b2) ∨ ... ∨ ¬onTable(bn)
[Figure: search tree over situations S0, S1, S2, S3, S4 with rewards on the transitions]
23
First-Order Dynamic Programming
Idea: exploit the logical structure → abstract value function → avoid the curse of dimensionality!
[Sanner 07] The resulting MDP can still be intractable.
24
Symbolic Dynamic Programming (Deterministic)
[Slide contrasts the tabular and symbolic backups; equations shown as images]
Three ingredients: adding rewards and values, the max operator, finding s
25
Reward and Value Representation
Case representation (block b1):
  ∃b, b≠b1, on(b,b1) : 10
  ∄b, b≠b1, on(b,b1) :
26
Add Symbolically (⊕)
  A : 10          B : 1          A∧B : 11    A∧¬B : 12
  ¬A : 20    ⊕    ¬B : 2    =    ¬A∧B : 21   ¬A∧¬B : 22
Similarly defined for ⊖ and ⊗.
[Scott Sanner - ICAPS08 Tutorial]
27
max Operator
  max_a over the cases for a_1, a_2:
  Φ1 : 10, Φ2 : 5, Φ3 : 3, Φ4 :
  =  Φ1 : 10 ; ¬Φ1∧Φ2 : 5 ; ¬Φ1∧¬Φ2∧Φ3 : 3 ; ¬Φ1∧¬Φ2∧¬Φ3∧Φ4 :
[Scott Sanner - ICAPS08 Tutorial]
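A toy sketch of these two case-statement operations in Python. A "case" is represented here as a list of (formula, value) pairs, with a formula as a set of literal strings; this representation and the negation syntax are illustrative, not Sanner's actual data structures, and inconsistent partitions are not pruned (real implementations check consistency):

```python
# Toy case-statement operations used in symbolic dynamic programming.
# A case is a list of (formula, value) pairs; a formula is a frozenset of
# literal strings. No consistency checking of partitions is done here.

def case_add(c1, c2):
    # cross-sum (the slide's "add symbolically"): every pair of partitions
    # is conjoined and their values are summed
    return [(f1 | f2, v1 + v2) for f1, v1 in c1 for f2, v2 in c2]

def case_max(c1, c2):
    # casemax: sort partitions by value; each one also asserts the negation
    # of every higher-valued partition, so the result stays a partition
    out, negs = [], frozenset()
    for f, v in sorted(c1 + c2, key=lambda fv: -fv[1]):
        out.append((f | negs, v))
        negs = negs | {"¬(" + "∧".join(sorted(f)) + ")"}
    return out

A = [(frozenset({"A"}), 10), (frozenset({"¬A"}), 20)]
B = [(frozenset({"B"}), 1), (frozenset({"¬B"}), 2)]
print([v for _, v in case_add(A, B)])        # [11, 12, 21, 22]

c1 = [(frozenset({"Φ1"}), 10), (frozenset({"Φ2"}), 5)]
c2 = [(frozenset({"Φ3"}), 3)]
print([v for _, v in case_max(c1, c2)])      # [10, 5, 3]
```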
28
Find s? Isn't it obvious?
Dynamic programming: given V(s'), find V(s).
In MDPs we have s explicitly; in the symbolic representation we only have it implicitly, so we have to build it.
29
Find s = Goal Regression
regress(Φ1, a): the weakest relation that ensures Φ1 holds after taking a
Example: Φ1 = clear(b1), a = put(A,B): Clear(b1) ∧ B≠b1; On(A,b1)
[Figure: block configurations with blocks A, B, b1]
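For deterministic STRIPS-style actions, regression has a simple closed form: keep the goal literals the action does not already achieve, and add the action's preconditions, provided the action deletes nothing in the goal. A hedged Python sketch with illustrative action and fluent names (it deliberately ignores conditional cases like the On(A,b1) branch above):

```python
# Goal regression through a deterministic STRIPS-style action: the weakest
# precondition guaranteeing the goal after the action. Purely illustrative.

def regress(goal, action):
    if set(action["del"]) & goal:
        return None                        # action destroys part of the goal
    achieved = goal & set(action["add"])   # goal literals the action provides
    return (goal - achieved) | set(action["pre"])

put_A_on_B = {"pre": {"clear(A)", "clear(B)"},
              "add": {"on(A,B)"},
              "del": {"clear(B)"}}

print(sorted(regress({"on(A,B)"}, put_A_on_B)))   # ['clear(A)', 'clear(B)']
print(regress({"clear(B)"}, put_A_on_B))          # None: put(A,B) makes B unclear
```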
30
Symbolic Dynamic Programming (Deterministic)
[Tabular and symbolic backup equations shown as images]
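The backups on this slide were shown as images. As a hedged reconstruction (notation loosely following Sanner's tutorial; the slide's exact form may differ), the deterministic tabular backup and its symbolic analogue, where regression builds the pre-action partitions and ⊕/casemax replace + and max, are:

```latex
% Tabular (deterministic) value-iteration backup:
V^{t+1}(s) \;=\; \max_{a}\, \bigl[\, R(s,a) \;+\; \gamma\, V^{t}\bigl(s'(s,a)\bigr) \,\bigr]

% Symbolic analogue over case statements:
\mathit{vCase}^{t+1}(s) \;=\; \operatorname{casemax}_{a}\, \bigl[\, \mathit{rCase}(s,a) \;\oplus\; \gamma \cdot \operatorname{Regr}\bigl(\mathit{vCase}^{t}(s),\, a\bigr) \,\bigr]
```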
31
Classical Example (BoxWorld)
Box, Truck, City
Goal: have a box in Paris
Reward: ∃b, BoxIn(b,Paris) : 10 ; else :
32
Classical Example
Actions: drive(t,c1,c2), load(b,t), unload(b,t), noop
load and unload have a 10% chance of failure
Fluents: BoxIn(b,c), BoxOn(b,t), TruckIn(t,c)
Assumptions: all cities are connected; γ = 0.9
33
Example [Sanner 07]
s | V*(s) | π*(s)
∃b, BoxIn(b,Paris) | 100 | noop
else, ∃b,t, TruckIn(t,Paris) ∧ BoxOn(b,t) | 89 | unload(b,t)
else, ∃b,c,t, BoxOn(b,t) ∧ TruckIn(t,c) | 80 | drive(t,c,Paris)
else, ∃b,c,t, BoxIn(b,c) ∧ TruckIn(t,c) | 72 | load(b,t)
else, ∃b,c1,c2,t, BoxIn(b,c1) ∧ TruckIn(t,c2) | 65 | drive(t,c2,c1)
else | | noop
What did we gain by going through all of this?
34
Logic Programming: Situation Calculus, GOLOG, Planning
MDP: Review, Value Iteration
MDP + Logic + Programming: DT-GOLOG, Symbolic DP
35
Levesque, H., Reiter, R., Lespérance, Y., Lin, F., and Scherl, R., "GOLOG: A Logic Programming Language for Dynamic Domains", Journal of Logic Programming, 31:59-84, 1997.
Richard S. Sutton and Andrew G. Barto, "Reinforcement Learning: An Introduction", MIT Press, Cambridge, 1998.
Craig Boutilier, Raymond Reiter, Mikhail Soutchanski, and Sebastian Thrun, "Decision-Theoretic, High-Level Agent Programming in the Situation Calculus", AAAI/IAAI 2000: 355-362.
Chapter to appear in C. Sammut, editor, Encyclopedia of Machine Learning, Springer-Verlag, 2007.
36
Craig Boutilier, Ray Reiter, and Bob Price, "Symbolic Dynamic Programming for First-order MDPs", Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), Seattle, pp. 690-697, 2001.