Integrating decision-theoretic planning and programming for robot control in highly dynamic domains
Christian Fritz, Thesis Final Presentation


SLIDE 1

Integrating decision-theoretic planning and programming for robot control in highly dynamic domains

Christian Fritz, Thesis Final Presentation

SLIDE 2

Introduction

Goals:

◮ combine programming and decision-theoretic planning, on-line!
◮ extend planning with options
◮ evaluate in three diverse example domains:
  • grid world
  • RoboCup Simulation
  • RoboCup Mid-Size

SLIDE 3

Programming

ICPGOLOG

◮ based on the situation calculus
◮ extends basic GOLOG:
  + on-line: incremental, sensing (active and passive)
  + continuous change
  + concurrency
  + progression
  + probabilistic projection
  – nondeterminism
◮ problems:
  • decision making is explicit; a utility theory is missing
  • projection is comparatively slow

SLIDE 4

Decision-Theoretic Planning

Markov Decision Processes (MDPs): the standard model for decision-theoretic planning problems

◮ Formally: M = ⟨S, A, T, R⟩, with
  • S, a set of states
  • A, a set of actions
  • T : S × A × S → [0, 1], a transition function
  • R : S → ℝ, a reward function
◮ Here: fully observable MDPs
◮ Planning task: find an optimal policy, maximizing expected reward
◮ Note: S and A are usually finite!
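For intuition, here is a minimal value iteration sketch for solving such a finite MDP (a toy illustration with hypothetical encodings; DTGolog, introduced next, uses decision-tree search instead):

    # Value iteration for a finite MDP <S, A, T, R> (illustrative sketch).
    # T maps (s, a) to a list of (s_next, prob); R maps each state to its reward.
    def value_iteration(S, A, T, R, gamma=0.95, eps=1e-6):
        V = {s: 0.0 for s in S}
        while True:
            delta = 0.0
            for s in S:
                v = R[s] + gamma * max(
                    sum(p * V[s2] for s2, p in T[(s, a)]) for a in A)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < eps:
                return V  # an optimal policy picks the maximizing action per state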

SLIDE 5

Programming & Planning: DTGolog

◮ New Golog derivative DTGOLOG [Boutilier et al.]
◮ Combines explicit agent programming with planning
◮ Uses MDPs to model the planning problem:
  • S = situations
  • A = primitive actions
  • T = for each action a ∈ A, a list of outcomes and their respective probabilities
  • R : situations → ℝ
◮ applies decision-tree search to solve the MDP up to a given horizon
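In sketch form, that search maximizes over the agent's action choices and takes the expectation over nature's outcome choices, up to the horizon (hypothetical callbacks for illustration, not DTGolog's actual BestDo predicate):

    # Horizon-bounded decision-tree search over an MDP (illustrative sketch).
    # actions(s) lists executable actions; outcomes(s, a) lists (s_next, prob).
    def best_do(s, h, actions, outcomes, reward):
        # returns (expected value, first action of a best policy or None)
        if h == 0 or not actions(s):
            return reward(s), None
        best_v, best_a = float('-inf'), None
        for a in actions(s):                      # agent's choice: maximize
            v = sum(p * best_do(s2, h - 1, actions, outcomes, reward)[0]
                    for s2, p in outcomes(s, a))  # nature's choice: expectation
            if v > best_v:
                best_v, best_a = v, a
        return reward(s) + best_v, best_a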

SLIDE 6

Programming & Planning: DTGolog

Disadvantages:

◮ off-line
◮ situations = states:
  • every action history is a distinct situation, so the state space is infinite
  • inefficient

SLIDE 7

READYLOG

Contributions:

◮ re-added nondeterminism with decision-theoretic semantics
  → on-line decision-theoretic Golog
◮ added options to speed up the MDP solution
◮ preprocessor to minimize on-line interpretation

SLIDE 8

Part I

Extending DTGolog with Options

SLIDE 9

Options? What's that?

SLIDE 10

Options

Idea:

◮ construct complex actions from primitive ones
◮ options: solutions to sub-MDPs
◮ generate models for them:
  • when can they be executed?
  • which outcomes can occur?
  • what are the outcomes' probabilities?
  • what are the expected rewards and costs (expected value)?
◮ these models can then be used in planning

SLIDE 11

Integrating Options into Golog

How do we integrate options into DTGolog/ReadyLog?

◮ avoid the inconvenience "situations = states"
◮ instead, use mappings:
  • situations → states (when 'entering' an option)
  • states → situations (when 'leaving' an option)
◮ options..
  ..are solutions to local MDPs..
  ..encapsulated into a stochastic procedure.
◮ stochastic procedures..
  ..are procedures with an explicit model (preconditions/effects/costs);
  ..replace stochastic actions;
  ..can model options.

SLIDE 12

Generating Options

How do we generate options?

◮ define:
  • φ, the precondition (think: states where the option is applicable)
  • β : exit states → value, pseudo-rewards for the local MDP
  • θ, the option skeleton, a one-step program to take in each step..
    • ..usually something like nondet([left, right, down, up]);
    • ..can contain ifs;
    • ..can build on options/stochastic procedures
◮ and two mappings:
  • Φ : situations → states
  • Σ : states → situations
  • option_mapping(o, σ, Γ, ϕ)
SLIDE 13

Examples

◮ example policy:

    proc(room1_2,
         [exogf_Update,
          while(is_possible(room1_2),
                [if(pos = [0, 0], go_right,
                  if(pos = [0, 1], go_right,
                   if(pos = [0, 2], go_up,
                    if(pos = [1, 0], go_right,
                     if(pos = [1, 1], go_right,
                      if(pos = [1, 2], go_right,
                       if(pos = [2, 0], go_down,
                        if(pos = [2, 1], go_right,
                         if(pos = [2, 2], go_up, []))))))))),
                 exogf_Update])]).

◮ example model (for state 'position = (0, 0)'):

    pt_costs(room1_2, [(pos, [0, 0])], 4.51650594972207).
    pt_probability_list(room1_2, [(pos, [0, 0])],
        [([(pos, [1, 3])], 0.00012),
         ([(pos, [3, 1])], 0.99987)]).

SLIDE 14

Test Setting

[Figure: grid world test setting with rooms 1 to 7, start state S, and goals G3, G4, G11]

SLIDE 15

Experimental Results

[Figure, three panels: (a) full MDP planning (A); (b) heuristics (B); (c) options (C)]

SLIDE 16

Experimental Results

[Plot: planning time in seconds vs. Manhattan distance from start to goal (3 to 11), for variants A, A', B, B', and C]

SLIDE 17

Part II

On-line Decision-Theoretic Golog for Unpredictable Domains

SLIDE 18

READYLOG: on-line DT planning

On-line:

◮ incremental:
  • solve(plan-skeleton, horizon)
  • execute the returned policy
◮ sensing / exogenous events

Problem:

  • dynamic environment (changes while thinking)
  • imperfect models
  → the policy can become invalid

⇒ execution monitoring:

  • program and policy coexist
  • markers

SLIDE 19

Execution Monitoring Semantics

Trans(solve(p, h), s, δ′, s′) ≡
    ∃π, v, pr. BestDo(p, s, h, π, v, pr) ∧ δ′ = applyPol(π) ∧ s′ = s

BestDo(if(ϕ, p1, p2); p, s, h, π, v, pr) ≐
      ϕ[s] ∧ ∃π1. BestDo(p1; p, s, h, π1, v, pr) ∧ π = M(ϕ, true); π1
    ∨ ¬ϕ[s] ∧ ∃π2. BestDo(p2; p, s, h, π2, v, pr) ∧ π = M(ϕ, false); π2

Trans(applyPol(M(ϕ, v); π), s, δ′, s′) ≡ s′ = s ∧
    (  v = true ∧ ϕ[s] ∧ δ′ = applyPol(π)
     ∨ v = false ∧ ¬ϕ[s] ∧ δ′ = applyPol(π)
     ∨ v = true ∧ ¬ϕ[s] ∧ δ′ = nil
     ∨ v = false ∧ ϕ[s] ∧ δ′ = nil )
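Read operationally: solve computes a policy π in which every evaluated condition ϕ leaves a marker M(ϕ, v) recording its truth value v at planning time; applyPol rechecks each marker during execution and drops the remaining policy (δ′ = nil) whenever the condition has flipped, so the surrounding program can replan. A small Python rendering of that marker check (hypothetical encoding, for illustration only):

    # Execute a policy containing markers ('M', phi, v), cf. applyPol above.
    # holds(phi) evaluates phi in the current situation; execute(a) runs an action.
    def apply_policy(policy, holds, execute):
        for step in policy:
            if isinstance(step, tuple) and step[0] == 'M':
                _, phi, v = step
                if holds(phi) != v:   # condition changed since planning time
                    return False      # policy invalid: stop, the caller replans
            else:
                execute(step)
        return True                   # policy ran to completion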

SLIDE 20

READYLOG II

◮ options (..)
◮ preprocessor:
  • translates READYLOG functions, conditions, definitions.. to Prolog code
  • creates successor state axioms from effect axioms
  • speed-up of about a factor of 16

[Plot: seconds (1 to 1024) vs. length of the situation term (200 to 2000), comparing effect axioms (uncompiled) with successor state axioms (compiled)]
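The speed-up comes from not re-interpreting effect axioms over the ever-growing situation term at every query: a compiled successor state axiom determines each fluent in one step per action. A toy contrast (hypothetical fluent and actions, not the thesis' preprocessor output):

    # Uncompiled: deciding a fluent means scanning the whole situation term.
    def holds_broken_uncompiled(situation):   # situation = list of actions so far
        val = False
        for a in situation:                   # cost grows with the history length
            if a == 'drop':
                val = True                    # positive effect axiom
            elif a == 'repair':
                val = False                   # negative effect axiom
        return val

    # Compiled successor state axiom:
    #   broken(do(a, s)) ≡ a = drop ∨ (broken(s) ∧ a ≠ repair)
    def holds_broken_compiled(prev, a):       # constant cost per action
        return a == 'drop' or (prev and a != 'repair')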

SLIDE 21

Experimental Results

SLIDE 22

Experimental Results: SimLeague

◮ compared with ICPGOLOG (Norman's results), planning time in seconds:

                  ICPGOLOG   READYLOG
    goal shot     0.35       0.01
    direct pass   0.25       0.01

◮ speed-up due to the preprocessor

Example where these are combined (demo):

    solve(nondet([goalKick(OwnNumber),
                  [pickBest(bestP, [2..11],
                            [directPass(OwnNumber, bestP, pass_NORMAL),
                             goalKick(bestP)])]]),
          Horizon)

SLIDE 23

Experimental Results: MidSize

SLIDE 24

Experimental Results: MidSize, Code

    solve(nondet([kick(ownNumber, 40),
                  dribble_or_move_kick(ownNumber),
                  dribble_to_points(ownNumber),
                  if(isKickable(ownNumber),
                     pickBest(var_turnAngle, [-3.1, -2.3, 2.3, 3.1],
                              [turn_relative(ownNumber, var_turnAngle, 2),
                               nondet([[intercept_ball(ownNumber, 1),
                                        dribble_or_move_kick(ownNumber)],
                                       [intercept_ball(numberByRole(supporter), 1),
                                        dribble_or_move_kick(numberByRole(supporter))]])]),
                     nondet([[intercept_ball(ownNumber, 1),
                              dribble_or_move_kick(ownNumber)],
                             intercept_ball(ownNumber, 0.0, 1)]))]),
          4)

    proc(dribble_or_move_kick(Own),
         nondet([[dribble_to(Own, oppGoalBestCorner, 1)],
                 [move_kick(Own, oppGoalBestCorner, 1)]])).

    proc(dribble_to_points(Own),
         pickBest(var_pos,
                  [[2.5, -1.25], [2.5, -2.5], [2.5, 0.0], [2.5, 2.5], [2.5, 1.25]],
                  dribble_to(Own, var_pos, 1))).

SLIDE 25

Experimental Results: MidSize, Behavior

[Figures: ball behavior when turning with the ball; move_kick/dribble; move/dribble/intercept]

SLIDE 26

Experimental Results: MidSize, Example: Situation

SLIDE 27

Experimental Results: MidSize, Example: Plans


SLIDE 28

Experimental Results: MidSize, Example: Teamplay


SLIDE 29

Experimental Results: MidSize, Example: Decision Tree

[Figure: decision tree alternating agent choices (kick, turn, move_kick, intercept(me), intercept(TM)) and nature's choices (outcome probabilities 0.8/0.2), annotated with node values and action costs]

SLIDE 30

Experimental Results: MidSize, Computation

Planning time in seconds:

                   examples   min     avg     max
    without ball   698        0.0     0.094   0.450
    with ball      117        0.170   0.536   2.110

◮ variance due to processor load on the robots
◮ qualitatively: enough for rudimentary soccer play

SLIDE 31

Conclusion

◮ On-line decision-theoretic Golog..
  ..can be applied to highly dynamic domains with infinite/continuous state spaces,
  ..can coexist with passive sensing,
  ..motivates more sophisticated execution monitoring.
◮ Options..
  ..can be added to decision-theoretic Golog,
  ..provide good speed-ups,
  ..rely on finite state spaces(!).
