Practical Linear-Value Approximation Techniques for First-order MDPs

Scott Sanner & Craig Boutilier
University of Toronto


SLIDE 1

Practical Linear-Value Approximation Techniques for First-order MDPs

Scott Sanner & Craig Boutilier
University of Toronto
UAI 2006

SLIDE 2

Why Solve First-order MDPs?

• Relational description of a (probabilistic) planning domain in (P)PDDL:

  (:action load-box-on-truck-in-city
    :parameters (?b - box ?t - truck ?c - city)
    :precondition (and (BIn ?b ?c) (TIn ?t ?c))
    :effect (and (On ?b ?t) (not (BIn ?b ?c))))

• Box World cities: London, Paris, Rome, Berlin, Moscow
• Can solve a ground MDP for each domain instantiation (e.g., 3 trucks, 2 planes, 4 boxes)
• Or solve a first-order MDP for all domain instantiations at once!
  • Lift the PPDDL MDP specification to first-order (FOMDP)
  • Soln makes value distinctions for all dom. instantiations!
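The grounding blowup motivating the lifted approach is easy to see; here is a minimal sketch (the domain sizes are illustrative, not taken from the competition problems) that enumerates the ground instances of the single lifted load action:

```python
from itertools import product

def ground_load_actions(boxes, trucks, cities):
    """Enumerate every ground instance of the lifted action
    load-box-on-truck-in-city(?b ?t ?c)."""
    return [f"load({b},{t},{c})" for b, t, c in product(boxes, trucks, cities)]

boxes  = [f"b{i}" for i in range(4)]            # 4 boxes
trucks = [f"t{i}" for i in range(3)]            # 3 trucks
cities = ["London", "Paris", "Rome", "Berlin", "Moscow"]

actions = ground_load_actions(boxes, trucks, cities)
print(len(actions))                             # 4 * 3 * 5 = 60 ground actions
```

A ground solver must reason about every such instance per problem instantiation; the FOMDP approach handles the one lifted schema and so covers all instantiations at once.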

SLIDE 3

Background / Talk Outline

1) Symbolic DP for first-order MDPs (BRP, 2001)
  • Defines FOMDP / operators / value iteration
  • Requires FO simplification for compactness
2) First-order approx. linear prog. (SB, 2005)
  • Approximate value with a linear comb. of basis funs.
  • No simplification → project onto weight space ☺
3) Many practical questions remaining (SB, 2006)
  • Other algorithms: first-order API?
  • Where do basis functions come from?
  • How to efficiently handle universal rewards?
  • Optimizations for scalability?

SLIDE 4

FOMDP Foundation: SitCalc

• Deterministic actions: loadS(b,t), unloadS(b,t), …
• Situations: S0, do(loadS(b,t), S0), …
• Fluents: BIn(b,c,s), TIn(t,c,s), On(b,t,s)
• Successor-state axioms (SSAs) for each fluent F:
  • Describe how each action affects the fluent (like a det. FO-DBN)
  • Ex: BIn(b,c,do(a,s)) ≡ (1) BIn(b,c,s) ∧ a ≠ loadS(b,t), or (2) for some t: a = unloadS(b,t) ∧ TIn(t,c,s)
• Regression operator: Regr(ϕ) = ϕ′
  • Takes a formula ϕ describing a post-action state
  • Uses the SSAs to build ϕ′ describing the pre-action state
  • Crucial for backing up a value fun to produce a Q-fun!
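To make the regression operator concrete, here is a toy sketch (not the authors' implementation) of Regr for the single fluent BIn, following the SSA above; formulas are represented as nested tuples:

```python
# Regress the fluent BIn(b,c) through a deterministic action, using the
# successor-state axiom from the slide:
#   BIn(b,c,do(a,s)) ≡ BIn(b,c,s) ∧ a ≠ loadS(b,t)
#                      ∨ ∃t. a = unloadS(b,t) ∧ TIn(t,c,s)

def regress_BIn(b, c, action):
    """Return a formula (nested tuple) over the PRE-action state that is
    equivalent to BIn(b, c) holding AFTER `action`."""
    name, *args = action
    if name == "loadS" and args[0] == b:
        # loadS(b,t) makes BIn(b,c) false: the box leaves the city.
        return ("false",)
    if name == "unloadS" and args[0] == b:
        # unloadS(b,t) makes BIn(b,c) true iff the truck was in c,
        # or the box was already there.
        t = args[1]
        return ("or", ("BIn", b, c), ("TIn", t, c))
    # Any other action leaves BIn(b,c) unchanged.
    return ("BIn", b, c)

print(regress_BIn("b1", "Paris", ("unloadS", "b1", "t1")))
# → ('or', ('BIn', 'b1', 'Paris'), ('TIn', 't1', 'Paris'))
```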

SLIDE 5

FOMDP Case Representation

• Case: Assign value to a first-order state abstraction
• E.g., can express the reward in the BoxWorld FOMDP as:

  rCase(s) =  [    ∀b,c. Dest(b,c) ⇒ BIn(b,c,s)  : 1 ]
              [ ¬( ∀b,c. Dest(b,c) ⇒ BIn(b,c,s) ) : 0 ]

• Operators: Define unary and binary case operations
  • E.g., can take the "cross-sum" ⊕ of two cases (other unary/binary ops are defined analogously)…
  • Must remove inconsistent elements (the crossed-out partition below):

  [ ¬∃x.A(x) : 20 ]     [ ¬∃y.A(y)∧B(y) : 4 ]     [ ¬∃x.A(x) ∧ ¬∃y.A(y)∧B(y) : 24 ]
  [  ∃x.A(x) : 10 ]  ⊕  [  ∃y.A(y)∧B(y) : 3 ]  =  [ ¬∃x.A(x) ∧  ∃y.A(y)∧B(y) : 23 ]  ← inconsistent, removed
                                                  [  ∃x.A(x) ∧ ¬∃y.A(y)∧B(y) : 14 ]
                                                  [  ∃x.A(x) ∧  ∃y.A(y)∧B(y) : 13 ]
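The cross-sum with inconsistency pruning can be sketched as follows; the `consistent` callback stands in for the first-order simplifier the approach assumes, and is hard-coded here just for this one example:

```python
from itertools import product

def cross_sum(case1, case2, consistent=lambda f: True):
    """⊕ on cases: conjoin partitions pairwise, add their values, and
    prune partitions whose conjoined formula is inconsistent."""
    result = []
    for (f1, v1), (f2, v2) in product(case1, case2):
        conj = f"({f1}) ∧ ({f2})"
        if consistent(conj):
            result.append((conj, v1 + v2))
    return result

c1 = [("¬∃x.A(x)", 20), ("∃x.A(x)", 10)]
c2 = [("¬∃y.A(y)∧B(y)", 4), ("∃y.A(y)∧B(y)", 3)]

# Hard-coded consistency check for this example: ¬∃x.A(x) contradicts
# ∃y.A(y)∧B(y), so that partition (value 23) is pruned.
def consistent(f):
    return not ("¬∃x.A(x)" in f and "∃y.A(y)∧B(y)" in f and "¬∃y" not in f)

for formula, value in cross_sum(c1, c2, consistent):
    print(value, formula)                 # 24, 14, 13 survive; 23 is pruned
```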

SLIDE 6

FOMDP Actions and FODTR

• SitCalc is deterministic; how to handle probabilities?
  • User's stochastic action: load(b,t)
  • Nature's deterministic choices: loadS(b,t), loadF(b,t)
  • Probability distribution over Nature's choice:

    P(loadS(b,t) | load(b,t)) =  [ ¬snow(s) : .5 ]
                                 [  snow(s) : .1 ]
    P(loadF(b,t) | load(b,t)) = 1 − P(loadS(b,t) | load(b,t))

• First-order decision-theoretic regression (FODTR):
  • Given a value fun vCase(s) and a user action, produces a first-order description of the "Q-fun" (modulo reward):

    "Q-fun" = FODTR[ vCase(s), load(b,t) ]
            =   P( loadS… | load… ) ⊗ Regr[ vCase( after loadS… ) ]
              ⊕ P( loadF… | load… ) ⊗ Regr[ vCase( after loadF… ) ]
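A numeric sketch of FODTR's structure, combining the probability case over Nature's choices with regressed value cases via ⊗ and ⊕. The regressed cases here are hypothetical placeholders, and the pruning of inconsistent cross partitions such as snow(s) ∧ ¬snow(s) is omitted:

```python
from itertools import product

def prod_case(p_case, v_case):
    """⊗: conjoin partitions pairwise, multiply values."""
    return [(f"{fp} ∧ {fv}", p * v) for (fp, p), (fv, v) in product(p_case, v_case)]

def sum_case(c1, c2):
    """⊕: conjoin partitions pairwise, add values."""
    return [(f"{f1} ∧ {f2}", v1 + v2) for (f1, v1), (f2, v2) in product(c1, c2)]

# Probability case over Nature's choice, from the slide; loadF gets the
# complementary probability mass.
p_loadS = [("snow(s)", 0.1), ("¬snow(s)", 0.5)]
p_loadF = [("snow(s)", 0.9), ("¬snow(s)", 0.5)]

# Hypothetical placeholders for Regr[vCase(after loadS)] and
# Regr[vCase(after loadF)].
regr_S = [("ψS(s)", 10.0)]
regr_F = [("ψF(s)", 0.0)]

# FODTR ≈ P(loadS|load) ⊗ Regr[… loadS …] ⊕ P(loadF|load) ⊗ Regr[… loadF …]
q = sum_case(prod_case(p_loadS, regr_S), prod_case(p_loadF, regr_F))
for f, v in q:
    print(v, f)
```

The consistent partitions give an expected value of 1.0 under snow and 5.0 without it; a real implementation would simplify away the snow(s) ∧ ¬snow(s) cross terms.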

SLIDE 7

FOMDP Backup Operators

In fact, there are 3 types of "Q-funs"/backup operators:

1) B^A(x)[vCase(s)] = rCase(s) ⊕ γ⋅FODTR[vCase(s)]
   • Think of as Q(A(x),s); note the free vars!
2) B^A[vCase(s)] = ∃x. B^A(x)[vCase(s)]   (action abstraction!)
   • Think of as ~Q(A,s); no free vars, but now overlap!
3) B^A_max[vCase(s)] = max( B^A[vCase(s)] )
   • Think of as Q(A,s); no free vars and no overlap!

Example. Let B^load(b,t)[vCase(s)] =  [  ϕ(b,t) : .9 ]
                                      [ ¬ϕ(b,t) : 0  ]

Then B^load[vCase(s)] =  [ ∃b,t. ϕ(b,t)  : .9 ]
                         [ ∃b,t. ¬ϕ(b,t) : 0  ]

And B^load_max[vCase(s)] =  [ ∃b,t. ϕ(b,t)                    : .9 ]
                            [ ¬(∃b,t. ϕ(b,t)) ∧ ∃b,t. ¬ϕ(b,t) : 0  ]
SLIDE 8

First-order Approx. Linear Prog. (FOALP)

• Represent the value fn as a linear comb. of k basis fns:

  vCase(s) = w1 ⊗ [  ∃b,c. BIn(b,c,s) : 1 ]  ⊕ … ⊕  wk ⊗ [  ∃t,c. TIn(t,c,s) : 1 ]
                  [ ¬∃b,c. BIn(b,c,s) : 0 ]              [ ¬∃t,c. TIn(t,c,s) : 0 ]

• Reduces the MDP solution to finding good weights… generalizes the approx. LP used by (van Roy, GKP, SP):

  Vars: wi, i = 1..k
  Minimize:  Σ_s Σ_{i=1..k} wi ⊗ bCasei(s)
  Subject to:  0 ≥ B^a_max[ ⊕_{i=1..k} wi ⊗ bCasei(s) ] ⊖ ⊕_{i=1..k} wi ⊗ bCasei(s) ,  ∀a ∈ A, s

• FOALP issues resolved in (SB, 2005):
  • ∞ sum in objective: we give a principled approximation
  • ∞ constraints: only a finite set of distinct constraints; solve exactly & efficiently w/ constraint generation (SP)
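The constraint-generation loop can be illustrated on a deliberately tiny ground, propositional analogue (this toy MDP, its single "move right" action, and its constant basis function are invented for illustration, not taken from the paper):

```python
# Toy, ground analogue of ALP constraint generation: value V(s) = w·b(s),
# and constraints  0 >= R(s) + γ·E[V(s')] − V(s)  for every state.
gamma = 0.9
states = [0, 1, 2]
b = {s: 1.0 for s in states}                      # one constant basis function
R = {0: 0.0, 1: 0.0, 2: 1.0}                      # reward
P = {s: {s2: 1.0 if s2 == min(s + 1, 2) else 0.0  # one action: "move right"
         for s2 in states} for s in states}

def violation(w, s):
    """How much state s violates its ALP constraint (want <= 0)."""
    backup = R[s] + gamma * sum(P[s][s2] * w * b[s2] for s2 in states)
    return backup - w * b[s]

def min_w(active):
    """Smallest w satisfying the active constraints (one-variable LP)."""
    bounds = []
    for s in active:
        coeff = b[s] - gamma * sum(P[s][s2] * b[s2] for s2 in states)
        if coeff > 0:
            bounds.append(R[s] / coeff)
    return max(bounds, default=0.0)

w, active = 0.0, set()
while True:
    s_viol = max(states, key=lambda s: violation(w, s))
    if violation(w, s_viol) <= 1e-9:
        break                                     # no violated constraint left
    active.add(s_viol)                            # add most violated constraint
    w = min_w(active)

print(round(w, 4))                                # → 10.0
```

The first-order version plays the same game, but each "most violated constraint" is found over case partitions rather than enumerated states, which is why only finitely many distinct constraints ever arise.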

SLIDE 9

First-order Approx. Policy Iter. (FOAPI)

• Need an explicit representation of a policy:

  πCase(s) = max( ∪_{i=1..m} B^Ai[vCase(s)] )

  • Each case partition should retain its mapping to Ai
• Now separate partitions into Ai-specific policies:

  πCase_Ai(s) = { part ∈ πCase(s) s.t. part → Ai }

  • Specifies the states where the policy would apply Ai
• FOAPI: direct generalization of GKP (exact objective!)
  • Start w/ wi^(0) = 0, πCase^0(s); iterate the LP soln until π^(j+1) = π^(j):

    Vars: wi^(j+1), i = 1..k
    Minimize:  φ^(j+1)
    Subject to:  φ^(j+1) ≥ | πCase^j_a(s) ⊗ ( B^a_max[ ⊕_{i=1..k} wi^(j+1) ⊗ bCasei(s) ] ⊖ ⊕_{i=1..k} wi^(j+1) ⊗ bCasei(s) ) | ,  ∀a ∈ A, s

  • Use cgen; if converges, obtain bounds on policy (GKP)!

SLIDE 10

Generating Basis Functions

• Where do basis functions come from? A major question for automation!
  • Huge candidate space if systematically building basis functions for all first-order formulae
• Idea (GT, 2004): Regressions from the goal make good candidate basis functions!
  • Given an initial basis function for the reward: ∃b. BIn(b,P,s)
  • Regr w/ unload: ∃b. BIn(b,P,s) ∨ (∃b*,t*. TIn(t*,P,s) ∧ On(b*,t*,s))
  • Render each basis disjoint from its parents; will use later
• Iteratively solve the FOMDP:
  • Retain all basis functions with wgt. > threshold τ
  • Generate new basis fns from the retained set
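The generate/solve/retain loop can be sketched schematically; `regress` and `solve_weights` below are hypothetical stubs standing in for first-order regression and the FOALP/FOAPI solver:

```python
def generate_basis(reward_basis, regress, solve_weights, tau, iters=3):
    """Grow the basis set: solve for weights, retain the useful basis
    functions, and propose their regressions as new candidates."""
    basis = [reward_basis]
    for _ in range(iters):
        weights = solve_weights(basis)                    # FOALP / FOAPI step
        retained = [f for f, w in zip(basis, weights) if abs(w) > tau]
        for f in retained:
            for cand in regress(f):                       # candidate children
                if cand not in basis:
                    basis.append(cand)
    return basis

# Toy illustration: "formulas" are plain strings, and regression merely
# tags the formula with the action it was regressed through.  (The paper
# additionally renders each child disjoint from its parent.)
regress = lambda f: [f + "<-unload", f + "<-load"]
solve_weights = lambda basis: [1.0] * len(basis)          # pretend all useful
print(len(generate_basis("∃b.BIn(b,P,s)", regress, solve_weights, tau=0.1, iters=1)))
# → 3
```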

SLIDE 11

Problems w/ Universal Rewards

• Universal rewards are difficult for FOMDPs, e.g., given the reward:

  rCase(s) =  [    ∀b,c. Dest(b,c) ⇒ BIn(b,c,s)  : 1 ]
              [ ¬( ∀b,c. Dest(b,c) ⇒ BIn(b,c,s) ) : 0 ]

• The exact n-stage-to-go value function has the form:

  vCase_n(s) =  [ n-1 boxes not at dest           : γ^(n-1) ]
                [ …                               : …       ]
                [ 1 box not at dest               : γ       ]
                [ ∀b,c. Dest(b,c) ⇒ BIn(b,c,s)    : 1       ]

• The exact value function has infinitely many values!
• Cannot compactly represent such structure with a piecewise-constant case approximation of the value fn

SLIDE 12

Additive Goal Decomposition

• Solution for universal rewards: when the reward is in simple implicative form, solve for a single goal with distinguished constants.
  • E.g., given: ∀b,c. Dest(b,c) ⇒ BIn(b,c,s)
  • Solve the FOMDP for: BIn(b*,c*,s)
  • Given the solution, gen. Q-funs Q(A,s)_<b*,c*> for all a ∈ A
• At run-time: given a concrete domain, e.g.
  • Instantiation: { Dest(b1,c1), Dest(b2,c2), Dest(b3,c3) }
  • Let overall Q(A,s) = Q(A,s)_<b1,c1> + Q(A,s)_<b2,c2> + Q(A,s)_<b3,c3> for all a ∈ A
• To execute the policy: select the action that maximizes the sum of values across all Q-funs, i.e., Q(A,s)
• Only a heuristic: works in many, but not all, cases
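Executing the additively decomposed policy amounts to an argmax over summed per-goal Q-values; a minimal sketch with hypothetical Q-numbers (not outputs of the solved FOMDP):

```python
def best_action(actions, q_funs, state):
    """Argmax over actions of the SUM of per-goal Q-values."""
    return max(actions, key=lambda a: sum(q(a, state) for q in q_funs))

# Hypothetical Q-functions for goal instances <b1,c1> and <b2,c2>.
q1 = lambda a, s: {"load(b1,t1)": 5.0, "load(b2,t1)": 0.0, "noop": 1.0}[a]
q2 = lambda a, s: {"load(b1,t1)": 0.0, "load(b2,t1)": 4.0, "noop": 1.0}[a]

choice = best_action(["load(b1,t1)", "load(b2,t1)", "noop"], [q1, q2], state=None)
print(choice)                    # → load(b1,t1)  (sum 5.0 beats 4.0 and 2.0)
```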

SLIDE 13

Optimizations

• Exploiting disjointness in basis functions:
  • Worst case for a set B of basis functions: must examine 2^|B| case partitions in constraint generation
  • But for any pairwise disjoint set B′ of basis functions, need examine only |B′| case partitions in cgen
  • Basis generation enforces disjointness b/w child and parent!
• Exploiting the implicit max in constraint generation:
  • In constraints, substitute 0 ≥ B^a_max[…] with 0 ≥ B^a[…]
• Removing internal redundancy/inconsistency w/ BDDs:
  • Given: (∃x A(x)) ∧ (∃x A(x)) ∧ (∃x A(x)∧B(x))
  • Map FO formulas to prop. vars: a ↦ ∃x A(x)∧B(x), b ↦ ∃x A(x), with the implication a ⇒ b (i.e., ¬b ⇒ ¬a)
  • Evaluating the conjunction as a BDD under this implication simplifies it to ∃x A(x)∧B(x)
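The implication-based pruning behind the BDD trick can be sketched without a BDD package: record which propositional variables entail which others, and drop any conjunct entailed by another conjunct:

```python
def simplify_conjunction(conjuncts, implies):
    """Drop any conjunct entailed by a DIFFERENT conjunct.
    `implies[x]` lists the variables that x entails."""
    kept = []
    for x in conjuncts:
        if not any(x in implies.get(y, []) for y in conjuncts if y != x):
            kept.append(x)
    return kept

# From the slide:  a ↦ ∃x.A(x)∧B(x),  b ↦ ∃x.A(x),  and a ⇒ b.
implies = {"a": ["b"]}
# (∃x A(x)) ∧ (∃x A(x)) ∧ (∃x A(x)∧B(x))  becomes  b ∧ b ∧ a,
# and both b conjuncts are redundant given a.
print(simplify_conjunction(["b", "b", "a"], implies))     # → ['a']
```

A real implementation uses BDDs so the same pruning also detects inconsistency (a conjunction reducing to false), not just redundancy.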

SLIDE 14

Empirical Results: Runtime

• Offline solution times for BoxWorld & BlocksWorld (runtime plots omitted):
  • Without optimizations, cannot get past iteration 2 (> 36000 sec.)
  • BoxWorld: policies simple, fewer constraints for FOAPI
  • BlocksWorld: policies complex (lots of equality)

SLIDE 15

Empirical Results: Performance

• Evaluated cumulative reward on the ICAPS 2004 Prob. Planning Comp. BoxWorld (bx) and BlocksWorld (bw) problems (performance charts omitted)
• Competitors: G2: temp. logic w/ control knowledge; P: RTDP-based; J1: human-coded policy; J2: inductive FO policy iter.; J3: deterministic FF-replanner

SLIDE 16

Related Work

• Direct value iteration:
  • ReBel algorithm for RMDPs (KvOdR, 2004)
  • FOVIA algorithm for the fluent calculus (KS, 2005)
  • First-order decision diagrams (JKW, 2006)
  → all disallow ∀ quant., e.g., universal cond. effects
• Sampling and/or inductive techniques:
  • Approx. linear programming for RMDPs (GKGK, 2003)
  • Inductive policy selection using FO regression (GT, 2004)
  • Approximate policy iteration (FYG, 2004)
  → sampled domain instantiations do not ensure generalization across all possible worlds
  → nonetheless, these methods have worked well empirically

SLIDE 17

Conclusions and Future Work

• Conclusions:
  • Developed domain-independent linear-value approximation techniques / optimizations for FOMDPs
  • Encouraging empirical results on the ICAPS 2004 IPPC
  • 2nd place in the ICAPS 2006 IPPC by # problems solved
• Future work:
  • Goal decomposition for complex ∀ rewards, e.g.: (∀b,c. Dest(b,c) ⇒ BIn(b,c,s)) ∨ ∃b. BIn(b,Paris,s)
  • Online search to "patch up" decomposition error
    • E.g., additive decomposition is inadequate to solve some difficult problems in BlocksWorld
  • More expressive rewards, e.g.: Σ_b (∀c. Dest(b,c) ⇒ BIn(b,c,s))