
Practical Linear-value Approximation Techniques for First-order MDPs

Scott Sanner & Craig Boutilier, University of Toronto


  1. Practical Linear-value Approximation Techniques for First-order MDPs
     Scott Sanner & Craig Boutilier, University of Toronto
     UAI 2006

  2. Why Solve First-order MDPs?
     - Relational desc. of (prob.) planning domain in (P)PDDL:
       Box World (cities shown: Paris, Moscow, London, Berlin, Rome)
         (:action load-box-on-truck-in-city
           :parameters (?b - box ?t - truck ?c - city)
           :precondition (and (BIn ?b ?c) (TIn ?t ?c))
           :effect (and (On ?b ?t) (not (BIn ?b ?c))))
     - Can solve a ground MDP for each domain instantiation
       (e.g., 3 trucks, 2 planes, 4 boxes)
     - Or solve first-order MDP for all domain inst. at once!
       - Lift PPDDL MDP specification to first-order (FOMDP)
       - Soln makes value distinctions for all dom. instantiations!
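
To give a rough sense of why solving one ground MDP per instantiation scales poorly, the sketch below counts ground states under a simplified Box World model. The model is an assumption made for illustration and is not taken from the slides: each box is either in exactly one city or on exactly one truck, each truck is in exactly one city, and planes are ignored. The function name `ground_state_count` is hypothetical.

```python
def ground_state_count(boxes: int, trucks: int, cities: int) -> int:
    """Count ground states under the simplified model described above."""
    # Each box is in one of `cities` cities or on one of `trucks` trucks.
    box_placements = (cities + trucks) ** boxes
    # Each truck is in exactly one city.
    truck_placements = cities ** trucks
    return box_placements * truck_placements


# Five cities as on the slide; box and truck counts are illustrative.
small = ground_state_count(boxes=4, trucks=3, cities=5)
large = ground_state_count(boxes=8, trucks=3, cities=5)
print(small, large)
```

Under these assumptions the count grows exponentially in the number of boxes, whereas a first-order solution is computed once and applies to every instantiation.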

  3. Background / Talk Outline
     1) Symbolic DP for first-order MDPs (BRP, 2001)
        - Defines FOMDP / operators / value iteration
        - Requires FO simplification for compactness
     2) First-order approx. linear prog. (SB, 2005)
        - Approximate value with linear comb. of basis funs.
        - No simplification → project onto weight space ☺
     3) Many practical questions remaining (SB, 2006)
        - Other algorithms: first-order API?
        - Where do basis functions come from?
        - How to efficiently handle universal rewards?
        - Optimizations for scalability?

  4. FOMDP Foundation: SitCalc
     - Deterministic Actions: loadS(b,t), unloadS(b,t), …
     - Situations: S_0, do(loadS(b,t), S_0), …
     - Fluents: BIn(b,c,s), TIn(t,c,s), On(b,t,s)
     - Successor-state axioms (SSAs) for each fluent F:
       - Describe how action affects fluent (like det. FO-DBN)
       - Ex: BIn(b,c,do(a,s)) ≡
             (1) BIn(b,c,s) AND a ≠ loadS(b,t)
          OR (2) for some t: a = unloadS(b,t) AND TIn(t,c,s)
     - Regression Operator: Regr(ϕ) = ϕ'
       - Takes a formula ϕ describing a post-action state
       - Uses SSAs to build ϕ' describing pre-action state
       - Crucial for backing up value fun to produce Q-fun!
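
The example SSA can be read as a recipe for answering "is BIn(b,c) true after action a?" using only facts about the pre-action situation, which is what regression exploits. Below is a minimal ground sketch of that reading. The state encoding (a dict of sets of tuples), the action encoding (a 3-tuple), and the name `bin_after` are assumptions for illustration. The real operator works on first-order formulas, not on ground states.

```python
def bin_after(box, city, action, state):
    """Truth of BIn(box, city, do(action, state)) under the example SSA."""
    kind, act_box, act_truck = action
    # Case (1): the box was already in the city and the action does not load it.
    stays = (box, city) in state["BIn"] and not (kind == "loadS" and act_box == box)
    # Case (2): the action unloads the box from a truck that is in the city.
    arrives = (kind == "unloadS" and act_box == box
               and (act_truck, city) in state["TIn"])
    return stays or arrives


# Hypothetical pre-action situation: b1 is in Paris, t1 in Paris, t2 in Rome.
s0 = {"BIn": {("b1", "paris")}, "TIn": {("t1", "paris"), ("t2", "rome")}}

print(bin_after("b1", "paris", ("loadS", "b1", "t1"), s0))    # loading removes b1
print(bin_after("b2", "rome", ("unloadS", "b2", "t2"), s0))   # unloading delivers b2
```

As on the slide, the axiom does not check action preconditions (for example, that b2 was actually on t2). Those are handled separately from the SSA.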

  5. FOMDP Case Representation
     - Case: Assign value to first-order state abstraction
       (cases are written here as [ formula : value ; … ])
       - E.g., can express reward in BoxWorld FOMDP as…
           rCase(s) = [  ∀b,c. Dest(b,c) ⇒ BIn(b,c,s) : 1 ;
                        ¬∀b,c. Dest(b,c) ⇒ BIn(b,c,s) : 0 ]
     - Operators: Define unary, binary case operations
       - E.g., can take "cross-sum" ⊕ (or ⊗, ⊖) of two cases…
           [ ∃x.A(x) : 10 ; ¬∃x.A(x) : 20 ]
             ⊕ [ ∃y.A(y) ∧ B(y) : 3 ; ¬∃y.A(y) ∧ B(y) : 4 ]
             = [  ∃x.A(x) ∧  ∃y.A(y) ∧ B(y) : 13 ;
                  ∃x.A(x) ∧ ¬∃y.A(y) ∧ B(y) : 14 ;
                 ¬∃x.A(x) ∧  ∃y.A(y) ∧ B(y) : 23   (inconsistent) ;
                 ¬∃x.A(x) ∧ ¬∃y.A(y) ∧ B(y) : 24 ]
     - Must remove inconsistent elements (marked with a red bar on the slide)
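
The cross-sum example can be reproduced with a small sketch. Each partition is stored as a label, a Python predicate, and a value. The consistency test used here is an assumption and only a stand-in: it enumerates every interpretation of A and B over a two-element domain, which is enough for this example but is not a first-order theorem prover. The names `interpretations`, `satisfiable`, and `cross_sum` are hypothetical.

```python
from itertools import product

DOMAIN = (0, 1)  # tiny illustrative domain (an assumption)


def interpretations():
    """Yield every pair (A, B) of unary relations over DOMAIN."""
    subsets = [frozenset(x for x, keep in zip(DOMAIN, bits) if keep)
               for bits in product((False, True), repeat=len(DOMAIN))]
    for A in subsets:
        for B in subsets:
            yield A, B


def satisfiable(formula):
    """True if some interpretation over DOMAIN makes the formula hold."""
    return any(formula(A, B) for A, B in interpretations())


def cross_sum(case1, case2):
    """Conjoin every pair of partitions, add their values, drop unsatisfiable ones."""
    result = []
    for (name1, f1, v1), (name2, f2, v2) in product(case1, case2):
        conj = lambda A, B, f1=f1, f2=f2: f1(A, B) and f2(A, B)
        if satisfiable(conj):
            result.append((f"{name1} ∧ {name2}", v1 + v2))
    return result


ex_a = lambda A, B: len(A) > 0        # ∃x.A(x)
ex_ab = lambda A, B: len(A & B) > 0   # ∃y.A(y) ∧ B(y)

case1 = [("∃x.A(x)", ex_a, 10), ("¬∃x.A(x)", lambda A, B: not ex_a(A, B), 20)]
case2 = [("∃y.A(y)∧B(y)", ex_ab, 3), ("¬∃y.A(y)∧B(y)", lambda A, B: not ex_ab(A, B), 4)]

summed = cross_sum(case1, case2)
for label, value in summed:
    print(value, label)
```

The partition with value 23 never appears in `summed`, because no interpretation has an empty A together with a non-empty intersection of A and B.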

  6. FOMDP Actions and FODTR
     - SitCalc is deterministic, how to handle probabilities?
       - User's stochastic actions: load(b,t)
       - Nature's deterministic choice: loadS(b,t), loadF(b,t)
       - Probability distribution over Nature's choice:
           P(loadS(b,t) | load(b,t)) = [ snow(s) : .1 ; ¬snow(s) : .5 ]
           P(loadF(b,t) | load(b,t)) = 1 ⊖ P(loadS(b,t) | load(b,t))
     - First-order decision-theoretic regression (FODTR):
       - Given value fun vCase(s) and user action, produces
         first-order description of "Q-fun" (modulo reward)
           "Q-fun" = FODTR[ vCase(s), load(b,t) ] =
               ( Regr[ vCase(after loadS…) ] ⊗ P(loadS… | load…) )
             ⊕ ( Regr[ vCase(after loadF…) ] ⊗ P(loadF… | load…) )
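
For a single ground state, the FODTR expression reduces to an ordinary expectation over Nature's two choices. The sketch below shows only that ground analogue, using the probabilities from the slide. The function names and the successor values passed in are hypothetical, and nothing here performs first-order regression.

```python
def p_load_success(snow: bool) -> float:
    """P(loadS | load) from the slide: .1 when snow(s) holds, .5 otherwise."""
    return 0.1 if snow else 0.5


def expected_successor_value(snow: bool, v_success: float, v_failure: float) -> float:
    """Weight each of Nature's outcomes by its probability and sum them.

    v_success and v_failure stand in for the value of the state reached after
    loadS and loadF respectively (illustrative inputs, not from the slides).
    """
    p = p_load_success(snow)
    return p * v_success + (1.0 - p) * v_failure


print(expected_successor_value(snow=True, v_success=10.0, v_failure=0.0))
print(expected_successor_value(snow=False, v_success=10.0, v_failure=0.0))
```

The reward and discount are deliberately left out, matching the slide's "modulo reward" remark.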

  7. FOMDP Backup Operators
     In fact, there are 3 types of "Q-funs"/backup operators:
     1) B^{A(x)}[vCase(s)] = rCase(s) ⊕ γ⋅FODTR[vCase(s), A(x)]
        Let B^{load(b,t)}[vCase(s)] = [ ϕ(b,t) : .9 ; ¬ϕ(b,t) : 0 ]
        Think of as Q(A(x),s), note the free vars!
     2) B^{A}[vCase(s)] = ∃x. B^{A(x)}[vCase(s)]   (action abstraction!)
        B^{load}[vCase(s)] = [ ∃b,t. ϕ(b,t) : .9 ; ∃b,t. ¬ϕ(b,t) : 0 ]
        Think of as ~Q(A,s), no free vars but now overlap!
     3) B^{A}_max[vCase(s)] = max( B^{A}[vCase(s)] )
        B^{load}_max[vCase(s)] = [ ∃b,t. ϕ(b,t) : .9 ;
                                   ¬(∃b,t. ϕ(b,t)) ∧ ∃b,t. ¬ϕ(b,t) : 0 ]
        Think of as Q(A,s), no free vars and no overlap!
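
The step from operator 2 to operator 3 removes overlap by letting higher-valued partitions take precedence. One way to sketch this, assuming partitions are plain (formula string, value) pairs and that no simplification or consistency checking is done, is to sort by value and conjoin each partition with the negation of every higher-valued one. The name `case_max` is hypothetical.

```python
def case_max(case):
    """Make overlapping partitions disjoint, giving priority to higher values.

    `case` is a list of (formula, value) pairs. Partitions with equal values
    are not merged here, which a fuller implementation would likely do.
    """
    ordered = sorted(case, key=lambda part: part[1], reverse=True)
    result, higher = [], []
    for formula, value in ordered:
        # Exclude every state already claimed by a higher-valued partition.
        negated = [f"¬({h})" for h in higher]
        result.append((" ∧ ".join(negated + [formula]), value))
        higher.append(formula)
    return result


b_load = [("∃b,t. ϕ(b,t)", 0.9), ("∃b,t. ¬ϕ(b,t)", 0.0)]
b_load_max = case_max(b_load)
for formula, value in b_load_max:
    print(value, formula)
```

On the slide's load example this yields the same two partitions as operator 3: the .9 partition unchanged, and the 0 partition guarded by ¬(∃b,t. ϕ(b,t)).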

  8. First-order Approx. Linear Prog. (FOALP)
     - Represent value fn as linear comb. of k basis fns:
         vCase(s) = w_1 ⋅ [ ∃b,c. BIn(b,c,s) : 1 ; ¬∃b,c. BIn(b,c,s) : 0 ]
                    ⊕ … ⊕
                    w_k ⋅ [ ∃t,c. TIn(t,c,s) : 1 ; ¬∃t,c. TIn(t,c,s) : 0 ]
     - Reduces MDP solution to finding good weights…
       generalize approx. LP used by (van Roy, GKP, SP):
         Vars:       w_i ; i ≤ k
         Minimize:   Σ_s Σ_{i=1..k} w_i ⋅ bCase_i(s)
         Subject to: 0 ≥ B^{a}_max[ ⊕_{i=1..k} w_i ⋅ bCase_i(s) ]
                         ⊖ ⊕_{i=1..k} w_i ⋅ bCase_i(s) ;  ∀ a ∈ A, s
     - FOALP issues resolved in (SB, 2005):
       - ∞ sum in objective: We give principled approximation
       - ∞ constraints: Only finite set of distinct constraints,
         solve exactly & efficiently w/ constraint gen. (SP)
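
For comparison, the ground approximate LP that this formulation generalizes can be written out for a finite MDP. The notation below is an assumption chosen to mirror the slide (state set S, reward R(s), transition model P(s' | s, a), discount γ, basis functions b_i) and does not appear in the slides themselves.

```latex
\begin{align*}
\min_{w_1,\dots,w_k} \quad
  & \sum_{s \in S} \sum_{i=1}^{k} w_i \, b_i(s) \\
\text{s.t.} \quad
  & \sum_{i=1}^{k} w_i \, b_i(s) \;\ge\;
    R(s) + \gamma \sum_{s' \in S} P(s' \mid s, a) \sum_{i=1}^{k} w_i \, b_i(s')
    \qquad \forall a \in A,\ s \in S
\end{align*}
```

Moving the left-hand side across gives the slide's form, 0 ≥ (backup of the approximate value) minus (the approximate value). In the first-order setting both the sum over s and the set of constraints range over infinitely many states, which is why the slide lists the objective approximation and constraint generation as the two issues to resolve.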
