  1. Integrating decision-theoretic planning and programming for robot control in highly dynamic domains
     Christian Fritz
     Thesis, Final Presentation

  2. Introduction
     Goals:
     ◮ combine:
       - programming
       - decision-theoretic planning
       - on-line!
     ◮ extend planning with options
     ◮ evaluate in three diverse example domains:
       - grid world
       - RoboCup Simulation League
       - RoboCup Mid-Size League

  3. Programming: ICPGOLOG
     ◮ based on the situation calculus
     ◮ extends basic GOLOG:
       + on-line: incremental, sensing (active and passive)
       + continuous change
       + concurrency
       + progression
       + probabilistic projection
       - nondeterminism
     ◮ problems:
       - decision making is explicit; utility theory is missing
       - projection is comparatively slow

  4. Decision-Theoretic Planning
     Markov Decision Processes (MDPs): the standard model for decision-theoretic planning problems
     ◮ Formally: M = <S, A, T, R>, with
       - S a set of states
       - A a set of actions
       - T : S × A × S → [0, 1] a transition function
       - R : S → ℝ a reward function
     ◮ Here: fully observable MDPs
     ◮ Planning task: find an optimal policy, maximizing expected reward
     ◮ Note: S and A are usually finite!
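The finite-horizon planning task on this slide can be sketched with plain value iteration over an explicit MDP. The two-state toy domain below (states `s0`/`s1`, actions `stay`/`flip`) is an illustrative assumption, not part of the thesis:

```python
# Finite-horizon value iteration for an explicit MDP M = <S, A, T, R>.
# The concrete S, A, T, R below are illustrative assumptions.

def value_iteration(S, A, T, R, horizon):
    """Return a policy maximizing expected reward over `horizon` steps.

    T[s][a] is a list of (s_next, prob) pairs; R[s] is the reward of s.
    """
    V = {s: 0.0 for s in S}          # value with 0 steps to go
    policy = {}
    for _ in range(horizon):
        V_new, policy = {}, {}
        for s in S:
            best_a, best_q = None, float("-inf")
            for a in A:
                # expected future value of doing a in s
                q = sum(p * V[s2] for s2, p in T[s][a])
                if q > best_q:
                    best_a, best_q = a, q
            V_new[s] = R[s] + best_q
            policy[s] = best_a
        V = V_new
    return policy, V

# Tiny two-state example: 'stay' keeps the state, 'flip' swaps it;
# being in s1 yields reward 1, s0 yields 0.
S = ["s0", "s1"]
A = ["stay", "flip"]
T = {"s0": {"stay": [("s0", 1.0)], "flip": [("s1", 1.0)]},
     "s1": {"stay": [("s1", 1.0)], "flip": [("s0", 1.0)]}}
R = {"s0": 0.0, "s1": 1.0}
policy, V = value_iteration(S, A, T, R, horizon=3)
```

With finite S and A, one such sweep per remaining step suffices, which is why the slide stresses that S and A are usually finite.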

  5. Programming & Planning: DTGolog
     ◮ new Golog derivative DTGOLOG [Boutilier et al.]
     ◮ combines explicit agent programming with planning
     ◮ uses MDPs to model the planning problem:
       - S = situations
       - A = primitive actions
       - T = for each action a ∈ A, a list of outcomes and their respective probabilities
       - R : situations → ℝ
     ◮ applies decision-tree search to solve the MDP up to a given horizon
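The decision-tree search mentioned above can be sketched as an expectimax recursion to a fixed horizon: at each node, branch over actions, take the expectation over each action's listed outcomes, and accumulate rewards along the way. The `best_do` name echoes the BestDo predicate, but the function signatures and the toy corridor domain are illustrative assumptions:

```python
# Decision-tree search to a fixed horizon over stochastic action outcomes,
# in the style of DTGolog's solver. The corridor domain below is an
# illustrative assumption.

def best_do(state, horizon, actions, outcomes, reward):
    """Return (best_action, expected_value) with `horizon` steps left.

    `actions(s)` lists applicable actions; `outcomes(s, a)` yields
    (s_next, prob) pairs; `reward(s)` is the reward collected in s.
    """
    if horizon == 0:
        return None, reward(state)
    best_a, best_v = None, float("-inf")
    for a in actions(state):
        # expectation over the action's stochastic outcomes
        v = sum(p * best_do(s2, horizon - 1, actions, outcomes, reward)[1]
                for s2, p in outcomes(state, a))
        if v > best_v:
            best_a, best_v = a, v
    return best_a, reward(state) + best_v

# Toy corridor: 'right' moves one cell right with prob. 0.9, else stays;
# 'stay' does nothing; being further right is better.
def actions(state):
    return ["right", "stay"]

def outcomes(state, a):
    if a == "right":
        return [(state + 1, 0.9), (state, 0.1)]
    return [(state, 1.0)]

def reward(state):
    return float(state)

a, v = best_do(0, 2, actions, outcomes, reward)   # search to horizon 2
```

The tree has one branch per action and, below it, one branch per outcome, so its size is exponential in the horizon, which motivates the options and heuristics discussed later.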

  6. Programming & Planning: DTGolog
     Disadvantages:
     ◮ offline
     ◮ situations = states
       - infinite state space
       - inefficient

  7. READYLOG
     Contributions:
     ◮ re-added nondeterminism with decision-theoretic semantics
       → on-line decision-theoretic Golog
     ◮ added options to speed up the MDP solution
     ◮ preprocessor to minimize on-line interpretation effort

  8. Part I
     Extending DTGolog with Options

  9. Options? What's that?

  10. Options
     Idea:
     ◮ construct complex actions from primitive ones
     ◮ options are solutions to sub-MDPs
     ◮ generate models for them:
       - when is it possible to execute the option?
       - which outcomes can occur?
       - what probabilities do the outcomes have?
       - what are the expected rewards and costs? (expected value)
     ◮ these models can then be used in planning

  11. Integrating Options into Golog
     How do we integrate options into DTGolog/READYLOG?
     ◮ avoid the inconvenient identification "situations = states"
     ◮ instead, use mappings:
       - situations → states (when 'entering' an option)
       - states → situations (when 'leaving' an option)
     ◮ options...
       - ...are solutions to local MDPs...
       - ...encapsulated into a stochastic procedure.
     ◮ stochastic procedures...
       - ...are procedures with an explicit model (preconditions/effects/costs);
       - ...replace stochastic actions;
       - ...can model options.

  12. Generating Options
     How do we generate options?
     ◮ define:
       - φ: precondition (think: states where the option is applicable)
       - β: exit states → value, pseudo-rewards for the local MDP
       - θ: option skeleton, the one-step program to take in each step...
         • ...usually something like nondet([left, right, down, up]);
         • ...can contain ifs;
         • ...can build on options/stochastic procedures
       - and two mappings:
         • Φ: situations → states
         • Σ: states → situations
       - option_mapping(o, σ, Γ, ϕ)
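The model-generation step described above can be sketched as follows: once the local MDP has been solved, push probability mass forward under the local policy until it is absorbed in the exit states, recording the exit distribution and the expected cost per entry state. The 3-cell corridor and all names below are illustrative assumptions, not the thesis's grid world:

```python
# Build an option model in the spirit of opt_probability_list/opt_costs:
# for every entry state of the local MDP, compute the distribution over
# exit states and the expected cost (here: expected number of steps)
# under the already-solved local policy. Domain and names are assumptions.

def option_model(local_states, exit_states, step, policy, max_iters=1000):
    """For each entry state, return (exit_distribution, expected_cost).

    `step(s, a)` yields (s_next, prob) pairs; `policy[s]` is the solved
    local policy; exit states are absorbing; each step costs 1.
    """
    model = {}
    for entry in local_states:
        dist, exit_dist, cost = {entry: 1.0}, {}, 0.0
        for _ in range(max_iters):
            if not dist:
                break                       # all mass has reached an exit
            nxt = {}
            for s, p in dist.items():
                cost += p                   # one unit of cost per step taken
                for s2, q in step(s, policy[s]):
                    if s2 in exit_states:
                        exit_dist[s2] = exit_dist.get(s2, 0.0) + p * q
                    else:
                        nxt[s2] = nxt.get(s2, 0.0) + p * q
            dist = nxt
        model[entry] = (exit_dist, cost)
    return model

# Toy corridor 0 -> 1 -> 2, where cell 2 is the option's only exit state.
policy = {0: "go", 1: "go"}
step = lambda s, a: [(s + 1, 1.0)]
model = option_model([0, 1], {2}, step, policy)
```

Summing the surviving probability mass at each step gives the expected number of steps to absorption, which plays the role of the `opt_costs` entry; the accumulated `exit_dist` plays the role of the `opt_probability_list` entry.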

  13. Examples
     ◮ example policy:

       proc(room1_2,
            [exogf_Update,
             while(is_possible(room1_2),
                   [if(pos = [0, 0], go_right,
                    if(pos = [0, 1], go_right,
                    if(pos = [0, 2], go_up,
                    if(pos = [1, 0], go_right,
                    if(pos = [1, 1], go_right,
                    if(pos = [1, 2], go_right,
                    if(pos = [2, 0], go_down,
                    if(pos = [2, 1], go_right,
                    if(pos = [2, 2], go_up, []))))))))),
                    exogf_Update])]).

     ◮ example model (for state 'position = (0, 0)'):

       opt_costs(room1_2, [(pos, [0, 0])], 4.51650594972207).
       opt_probability_list(room1_2, [(pos, [0, 0])],
           [([(pos, [1, 3])], 0.00012),
            ([(pos, [3, 1])], 0.99987)]).

  14. Test Setting
     [Figure: grid-world map with numbered rooms (1-7), start position S, and goal positions G3, G4, and G11]

  15. Experimental Results
     [Figure: planning behavior of (a) full MDP, (b) heuristics, (c) options, in settings (A), (B), and (C)]

  16. Experimental Results
     [Figure: log-scale plot of planning time in seconds (from 0.03 up to 9555.86) against Manhattan distance from start to goal (3 to 11), for configurations A, A', B, B', and C]

  17. Part II
     On-line Decision-Theoretic Golog for Unpredictable Domains

  18. READYLOG: on-line DT planning
     on-line:
     ◮ incremental
       - solve(plan-skeleton, horizon)
       - execute the returned policy
     ◮ sensing / exogenous events
       - problem:
         • dynamic environment (changes while thinking)
         • imperfect models
         → the policy can become invalid
       ⇒ execution monitoring:
         • program and policy coexist
         • markers

  19. Execution Monitoring Semantics

     Trans(solve(p, h), s, δ', s') ≡
         ∃π, v, pr. BestDo(p, s, h, π, v, pr)
         ∧ δ' = applyPol(π) ∧ s' = s

     BestDo(if(ϕ, p1, p2); p, s, h, π, v, pr) ≐
           ϕ[s] ∧ ∃π1. BestDo(p1; p, s, h, π1, v, pr) ∧ π = M(ϕ, true); π1
         ∨ ¬ϕ[s] ∧ ∃π2. BestDo(p2; p, s, h, π2, v, pr) ∧ π = M(ϕ, false); π2

     Trans(applyPol(M(ϕ, v); π), s, δ', s') ≡ s = s' ∧
         (   v = true  ∧  ϕ[s] ∧ δ' = applyPol(π)
           ∨ v = false ∧ ¬ϕ[s] ∧ δ' = applyPol(π)
           ∨ v = true  ∧ ¬ϕ[s] ∧ δ' = nil
           ∨ v = false ∧  ϕ[s] ∧ δ' = nil )
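The marker mechanism in the applyPol semantics above can be sketched operationally: while a policy runs, each marker M(ϕ, v) re-checks that condition ϕ still has the truth value v it had at planning time; if not, the remainder of the policy is discarded (δ' = nil) and control returns to the program. The names and the toy soccer world below are illustrative assumptions:

```python
# Marker-based execution monitoring in the spirit of applyPol:
# a policy is a list of actions interleaved with ('marker', phi, v)
# entries recording the truth value phi had during planning.
# All names and the toy world are illustrative assumptions.

def apply_policy(policy, world, execute):
    """Run `policy`; return True if it completed, False if a marker
    detected that the world diverged from the planned conditions."""
    for step in policy:
        if isinstance(step, tuple) and step[0] == "marker":
            _, phi, expected = step
            if phi(world) != expected:   # world changed since planning
                return False             # drop the rest (delta' = nil)
        else:
            execute(step, world)         # ordinary policy action
    return True

# Usage: kick only if the ball is still close, as assumed at planning time.
world = {"ball_close": True}
log = []
execute = lambda a, w: log.append(a)
ok = apply_policy(
    [("marker", lambda w: w["ball_close"], True), "kick"], world, execute)
```

Run against a world where `ball_close` has meanwhile become false, the same policy aborts before executing `kick`, which is exactly the nil case of the Trans axiom.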

  20. READYLOG II
     ◮ options (…)
     ◮ preprocessor:
       - translates READYLOG functions, conditions, and definitions to Prolog code
       - creates successor state axioms from effect axioms
       - speed-up of about a factor of 16
     [Figure: log-scale plot of projection time in seconds against length of the situation term (200 to 2000), comparing uncompiled effect axioms with compiled successor state axioms]

  21. Experimental Results

  22. Experimental Results: SimLeague
     ◮ compared with ICPGOLOG (Norman's results)

       planning time in seconds:
                       ICPGOLOG   READYLOG
       goal shot         0.35       0.01
       direct pass       0.25       0.01

     ◮ speed-up due to the preprocessor

     Example where these are combined (demo):

       solve(nondet([goalKick(OwnNumber),
                     [pickBest(bestP, [2..11],
                               [directPass(OwnNumber, bestP, pass_NORMAL),
                                goalKick(bestP)])]]),
             Horizon)

  23. Experimental Results: MidSize
