Practical Linear-value Approximation Techniques for First-order MDPs
Scott Sanner & Craig Boutilier
University of Toronto
2
(:action load-box-truck-in-city
  :parameters (?b - box ?t - truck ?c - city)
  :precondition (and (BIn ?b ?c) (TIn ?t ?c))
  :effect (and (On ?b ?t) (not (BIn ?b ?c))))
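To make the schema concrete, here is a minimal Python sketch of what load-box-truck-in-city does to one ground state. It assumes a state is a set of ground atoms written as tuples; the function name `load` and the objects `b1` and `t1` are illustrative and do not come from the slides.

```python
def load(state, b, t, c):
    """Apply load-box-truck-in-city to a state given as a frozenset of ground atoms."""
    precondition = {("BIn", b, c), ("TIn", t, c)}
    if not precondition <= state:
        return None  # the action is not applicable in this state
    # Effect: add On(b, t) and delete BIn(b, c).
    return (state - {("BIn", b, c)}) | {("On", b, t)}


# Hypothetical initial state: box b1 and truck t1 are both in Paris.
s0 = frozenset({("BIn", "b1", "Paris"), ("TIn", "t1", "Paris")})
s1 = load(s0, "b1", "t1", "Paris")
```

The number of such ground states grows quickly with the number of boxes, trucks and cities, which is the motivation for working with the lifted representation on the following slides.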
Cities: London, Paris, Rome, Berlin, Moscow
3
4
loadS(b,t), unloadS(b,t), …
S0, do(loadS(b,t), S0), …
BIn(b,c,s), TIn(t,c,s), On(b,t,s)
F:
BIn(b,c,do(a,s)) ≡
  (1) BIn(b,c,s) AND a ≠ loadS(b,t)
  OR
  (2) for some t: a = unloadS(b,t) AND TIn(t,c,s)
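The axiom can be evaluated directly on a ground situation. The sketch below is one way to do that in Python, assuming the garbled relation in clause (1) reads a ≠ loadS(b,t), that a situation is summarised by a set of ground atoms, and that actions are tuples; the name `bin_after` and the sample objects are hypothetical.

```python
def bin_after(b, c, action, s):
    """Successor-state axiom for BIn(b, c, do(action, s)).

    `s` is a set of ground fluent atoms; `action` is a tuple such as
    ("loadS", box, truck) or ("unloadS", box, truck).
    """
    name, box, truck = action
    # (1) BIn held before and the action did not load this box.
    persists = ("BIn", b, c) in s and not (name == "loadS" and box == b)
    # (2) The box was unloaded from a truck that is located in city c.
    unloaded = name == "unloadS" and box == b and ("TIn", truck, c) in s
    return persists or unloaded


# Hypothetical situation: b1 is in Paris, b2 is on truck t1, t1 is in Paris.
s = {("BIn", "b1", "Paris"), ("TIn", "t1", "Paris"), ("On", "b2", "t1")}
b1_still_in_paris = bin_after("b1", "Paris", ("loadS", "b1", "t1"), s)
b2_now_in_paris = bin_after("b2", "Paris", ("unloadS", "b2", "t1"), s)
```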
Regr(ϕ) = ϕ’
ϕ describing a
ϕ’ describing
5
[ ¬∃x.A(x) : 20 ; ∃x.A(x) : 10 ]
⊕
[ ¬∃y.A(y)∧B(y) : 4 ; ∃y.A(y)∧B(y) : 3 ]
=
[ ¬∃x.A(x) ∧ ¬∃y.A(y)∧B(y) : 24
  ¬∃x.A(x) ∧ ∃y.A(y)∧B(y) : 23
  ∃x.A(x) ∧ ¬∃y.A(y)∧B(y) : 14
  ∃x.A(x) ∧ ∃y.A(y)∧B(y) : 13 ]
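The case sum on this slide is a cross product of partitions with values added pairwise. A minimal Python sketch, assuming formulas are kept as plain strings (so no logical simplification or consistency checking is attempted); `case_add` is an illustrative name:

```python
from itertools import product


def case_add(case1, case2):
    """Cross-sum of two case statements.

    Each case is a list of (formula, value) pairs. Formulas are plain
    strings, so conjunctions are built textually and never simplified.
    """
    return [(f"({f1}) ∧ ({f2})", v1 + v2)
            for (f1, v1), (f2, v2) in product(case1, case2)]


case_x = [("¬∃x.A(x)", 20), ("∃x.A(x)", 10)]
case_y = [("¬∃y.A(y)∧B(y)", 4), ("∃y.A(y)∧B(y)", 3)]
result = case_add(case_x, case_y)
```

Note that the partition ¬∃x.A(x) ∧ ∃y.A(y)∧B(y) is logically unsatisfiable; a string-based sketch like this one cannot detect that, whereas a system with a theorem prover could drop it.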
BoxWorld FOMDP as…
rCase(s) = [ ¬∀b,c. Dest(b,c) ⇒ BIn(b,c,s) : 0 ; ∀b,c. Dest(b,c) ⇒ BIn(b,c,s) : 1 ]
6
vCase(s) and user action, produces “Q-Fun”:
FODTR[ vCase(s), load(b,t) ] =
    Regr[ vCase( after loadS… ) ] ⊗ P( loadS… | load… )
  ⊕ Regr[ vCase( after loadF… ) ] ⊗ P( loadF… | load… )
P( loadS(b,t) | load(b,t) ) = [ ¬snow(s) : .5 ; snow(s) : .1 ]
P( loadF(b,t) | load(b,t) ) = 1 ⊖ P( loadS(b,t) | load(b,t) )
7
1) B^{A(x)}[vCase(s)] = rCase(s) ⊕ γ⋅FODTR[vCase(s)]
2) B^{A}[vCase(s)] = ∃x. B^{A(x)}[vCase(s)]   (action abstraction!)
3) B^{A}_max[vCase(s)] = max( B^{A}[vCase(s)] )

Let B^{load(b,t)}[vCase(s)] = [ ¬ϕ(b,t) : 0 ; ϕ(b,t) : .9 ]
    Think of as Q(A(x),s), note the free vars!

B^{load}[vCase(s)] = [ ∃b,t. ¬ϕ(b,t) : 0 ; ∃b,t. ϕ(b,t) : .9 ]
    Think of as ~Q(A,s), no free vars but now overlap!

B^{load}_max[vCase(s)] = [ ¬(∃b,t. ϕ(b,t)) ∧ ∃b,t. ¬ϕ(b,t) : 0 ; ∃b,t. ϕ(b,t) : .9 ]
    Think of as Q(A,s), no free vars and no overlap!
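The step from overlapping to non-overlapping partitions can be sketched as follows: visit partitions from highest to lowest value and conjoin each formula with the negation of every higher-valued one. This is a minimal Python illustration with formulas as plain strings; `case_max` is an illustrative name, and the value 0 for the ¬ϕ partition is an assumption, since the extracted slide only shows .9.

```python
def case_max(case):
    """Turn a case with overlapping partitions into a disjoint one.

    Partitions are visited from highest to lowest value; each formula is
    conjoined with the negation of every higher-valued formula.
    Formulas are plain strings, so nothing is simplified.
    """
    ordered = sorted(case, key=lambda pair: pair[1], reverse=True)
    result, higher = [], []
    for formula, value in ordered:
        guards = [f"¬({h})" for h in higher]
        result.append((" ∧ ".join(guards + [f"({formula})"]), value))
        higher.append(formula)
    return result


# Assumed value 0.0 for the ¬ϕ partition (not visible in the extraction).
b_load = [("∃b,t. ¬ϕ(b,t)", 0.0), ("∃b,t. ϕ(b,t)", 0.9)]
disjoint = case_max(b_load)
```

After this step every state satisfies at most one partition, so the case statement can be read as a single value per state.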
8
vCase(s) = w_1 • [ ¬∃b,c BIn(b,c,s) : 0 ; ∃b,c BIn(b,c,s) : 1 ] ⊕ … ⊕ w_k • [ ¬∃t,c TIn(t,c,s) : 0 ; ∃t,c TIn(t,c,s) : 1 ]

Vars: w_i ; i ≤ k
Minimize: Σ_s Σ_{i=1..k} w_i • bCase_i(s)
Subject to: 0 ≥ B^{a}_max[ ⊕_{i=1..k} w_i • bCase_i(s) ] ⊖ ⊕_{i=1..k} w_i • bCase_i(s) ; ∀a∈A, s
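A ground analogue may help in reading the linear program: with an enumerated state space, each constraint says that the weighted basis-function value at s is at least its one-step backup under action a. The Python sketch below only checks feasibility and evaluates the objective for given weights on a tiny two-state MDP; it does not solve the LP, and every state, action, probability and basis function in it is hypothetical. In the first-order setting the constraints range over case partitions rather than individual states.

```python
GAMMA = 0.9
STATES = ["not_at_dest", "at_dest"]
REWARD = {"not_at_dest": 0.0, "at_dest": 1.0}

# P[a][s][s2]: hypothetical transition probabilities.
P = {
    "move": {"not_at_dest": {"not_at_dest": 0.5, "at_dest": 0.5},
             "at_dest": {"not_at_dest": 0.0, "at_dest": 1.0}},
    "noop": {"not_at_dest": {"not_at_dest": 1.0, "at_dest": 0.0},
             "at_dest": {"not_at_dest": 0.0, "at_dest": 1.0}},
}

# Two basis functions: a constant and an indicator for at_dest.
BASIS = [lambda s: 1.0, lambda s: 1.0 if s == "at_dest" else 0.0]


def value(w, s):
    """Linear value estimate: sum_i w_i * b_i(s)."""
    return sum(wi * b(s) for wi, b in zip(w, BASIS))


def backup(w, a, s):
    """One-step backup of the linear value estimate under action a."""
    return REWARD[s] + GAMMA * sum(p * value(w, s2) for s2, p in P[a][s].items())


def feasible(w, tol=1e-9):
    """True if value(s) >= backup(a, s) holds for every action and state."""
    return all(value(w, s) + tol >= backup(w, a, s) for a in P for s in STATES)


def objective(w):
    """LP objective: the sum of the value estimate over all states."""
    return sum(value(w, s) for s in STATES)
```

For example, the weights `[10.0, 0.0]` satisfy every constraint, while `[0.0, 0.0]` violate the constraint at `at_dest`, where the reward alone exceeds the estimated value.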
9
πCase(s) = max( ∪_{i=1..m} B^{A_i}[vCase(s)] )
πCase_{A_i}(s) = { part ∈ πCase(s) s.t. part → A_i }
w_i^(0) = 0, πCase^(0)(s); … π^(j+1) = π^(j):
Vars: w_i^(j+1) ; i ≤ k
Minimize: φ^(j+1)
Subject to: φ^(j+1) ≥ | πCase^(j)_a(s) ⊕ B^{a}_max( ⊕_{i=1..k} w_i^(j+1) • bCase_i(s) ) ⊖ ⊕_{i=1..k} w_i^(j+1) • bCase_i(s) | ; ∀a∈A, s
10
∃b.BIn(b,P,s)
unload:
∃b.BIn(b,P,s) ∨ (∃b*,t*. TIn(t*,P,s) ∧ On(b*,t*,s))
11
rCase(s) = [ ∀b,c. Dest(b,c) ⇒ BIn(b,c,s) : 1 ; ¬“ : 0 ]
vCase_n(s) = [ n-1 boxes not at dest : … ; … ; 1 box not at dest : … ; ∀b,c. Dest(b,c) ⇒ BIn(b,c,s) : 1 ]
12
∀b,c. Dest(b,c) ⇒ BIn(b,c,s)
BIn(b*,c*,s)
Q_<b*,c*>(A,s) for ∀a∈A
{ Dest(b1,c1), Dest(b2,c2), Dest(b3,c3) }
Q(A,s) = Q_<b1,c1>(A,s) + Q_<b2,c2>(A,s) + Q_<b3,c3>(A,s) for ∀a∈A
Q(A,s)
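The additive decomposition suggests a simple way to act: evaluate the single-goal Q-function once per Dest(b,c) instance, sum the results per action, and pick the best total. A minimal Python sketch of that last step, with entirely hypothetical Q-values and the illustrative name `best_action`:

```python
def best_action(q_by_goal, actions):
    """Pick the action maximising the sum of per-goal Q-values.

    q_by_goal maps a goal such as ("b1", "c1") to a dict {action: Q-value}.
    """
    totals = {a: sum(q[a] for q in q_by_goal.values()) for a in actions}
    return max(totals, key=totals.get), totals


# Hypothetical per-goal Q-values for three Dest(b, c) instances.
q_by_goal = {
    ("b1", "c1"): {"load": 0.9, "unload": 0.1},
    ("b2", "c2"): {"load": 0.2, "unload": 0.5},
    ("b3", "c3"): {"load": 0.3, "unload": 0.1},
}
action, totals = best_action(q_by_goal, ["load", "unload"])
```

The sum treats the goals as independent, so it is an approximation rather than the exact joint Q-function.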
13
B of basis functions: must examine |B|
B’ of basis functions, |B’|
0 ≥ B^{a}_max … with 0 ≥ B^{a} …

(∃x A(x)) ∧ (∃x A(x)) ∧ (∃x A(x) ∧ B(x))

Prop Var    FOL Mapping         Impl.
a           ∃x A(x) ∧ B(x)      a ⇒ b
b           ∃x A(x)             ¬b ⇒ ¬a

[Diagram: decision nodes a and b with T/F branches; ∃x A(x) ∧ B(x)]
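Once first-order formulas are mapped to propositional variables, implications such as a ⇒ b rule out some truth assignments without further first-order reasoning. The Python sketch below enumerates the surviving assignments by brute force; the enumeration is a stand-in chosen for illustration, not a method stated on the slides, and the names `PROP_VARS`, `IMPLICATIONS` and `consistent_assignments` are hypothetical.

```python
from itertools import product

# Propositional variables standing for first-order formulas (from the slide).
PROP_VARS = {"a": "∃x A(x) ∧ B(x)", "b": "∃x A(x)"}
# Known implications as (premise, conclusion) pairs: a ⇒ b.
IMPLICATIONS = [("a", "b")]


def consistent_assignments():
    """Enumerate truth assignments to a, b that respect every implication."""
    names = sorted(PROP_VARS)
    keep = []
    for values in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        if all(not assignment[p] or assignment[q] for p, q in IMPLICATIONS):
            keep.append(assignment)
    return keep


assignments = consistent_assignments()
```

Only three of the four assignments survive; a = T with b = F is excluded, which corresponds to the contrapositive ¬b ⇒ ¬a in the table.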
14
Offline solution times for BoxWorld & BlocksWorld:
Without optimizations, cannot get past iteration 2 (> 36000 sec.)
BoxWorld: Policies simple, fewer constraints for FOAPI
BlocksWorld: Policies complex (lots of equality)
15
16
(KvOdR, 2004)
(KS, 2005)
(JKW, 2006)
(GKGK, 2003)
(GT, 2004)
(FYG, 2004)
17
nd place in ICAPS 2006 IPPC by # problems solved
(∀b,c. Dest(b,c) ⇒ BIn(b,c,s)) ∨ ∃b.BIn(b,Paris,s)
Σ_b (∀c. Dest(b,c) ⇒ BIn(b,c,s))