Integrating decision-theoretic planning and programming for robot control in highly dynamic domains
Christian Fritz, Thesis Final Presentation


SLIDE 1

Integrating decision-theoretic planning and programming for robot control in highly dynamic domains

Christian Fritz, Thesis Final Presentation

SLIDE 2

Introduction

Goals:

◮ combine programming and decision-theoretic planning, on-line!
◮ extend planning with options
◮ evaluate in three diverse example domains:
  • grid world
  • RoboCup Simulation
  • RoboCup Mid-Size

SLIDE 3

Programming

ICPGOLOG

◮ based on the situation calculus
◮ extends basic GOLOG:
  + on-line: incremental, sensing (active and passive)
  + continuous change
  + concurrency
  + progression
  + probabilistic projection
  – nondeterminism
◮ problems:
  • decision making is explicit; a utility theory is missing
  • projection is comparatively slow

SLIDE 4

Decision-Theoretic Planning

Markov Decision Processes (MDPs): the standard model for decision-theoretic planning problems

◮ Formally: M = ⟨S, A, T, R⟩, with
  • S, a set of states
  • A, a set of actions
  • T : S × A × S → [0, 1], a transition function
  • R : S → ℝ, a reward function
◮ Here: fully observable MDPs
◮ Planning task: find an optimal policy, maximizing expected reward
◮ Note: S and A are usually finite!
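For intuition, here is a minimal value iteration sketch for solving such a finite MDP (a toy illustration with hypothetical encodings; DTGolog, introduced next, uses decision-tree search instead):

    # Value iteration for a finite MDP <S, A, T, R> (illustrative sketch).
    # T maps (s, a) to a list of (s_next, prob); R maps each state to its reward.
    def value_iteration(S, A, T, R, gamma=0.95, eps=1e-6):
        V = {s: 0.0 for s in S}
        while True:
            delta = 0.0
            for s in S:
                v = R[s] + gamma * max(
                    sum(p * V[s2] for s2, p in T[(s, a)]) for a in A)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < eps:
                return V  # an optimal policy picks the maximizing action per state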

SLIDE 5

Programming & Planning: DTGolog

◮ New Golog derivative DTGOLOG [Boutilier et al.]
◮ Combines explicit agent programming with planning
◮ Uses MDPs to model the planning problem:
  • S = situations
  • A = primitive actions
  • T = for each action a ∈ A, a list of outcomes and their respective probabilities
  • R : situations → ℝ
◮ applies decision-tree search to solve the MDP up to a given horizon
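In sketch form, that search maximizes over the agent's action choices and takes the expectation over nature's outcome choices, up to the horizon (hypothetical callbacks for illustration, not DTGolog's actual BestDo predicate):

    # Horizon-bounded decision-tree search over an MDP (illustrative sketch).
    # actions(s) lists executable actions; outcomes(s, a) lists (s_next, prob).
    def best_do(s, h, actions, outcomes, reward):
        # returns (expected value, first action of a best policy or None)
        if h == 0 or not actions(s):
            return reward(s), None
        best_v, best_a = float('-inf'), None
        for a in actions(s):                      # agent's choice: maximize
            v = sum(p * best_do(s2, h - 1, actions, outcomes, reward)[0]
                    for s2, p in outcomes(s, a))  # nature's choice: expectation
            if v > best_v:
                best_v, best_a = v, a
        return reward(s) + best_v, best_a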

SLIDE 6

Programming & Planning: DTGolog

Disadvantages:

◮ off-line
◮ situations = states:
  • every action history is a distinct situation, so the state space is infinite
  • inefficient

SLIDE 7

READYLOG

Contributions:

◮ re-added nondeterminism with decision-theoretic semantics
  → on-line decision-theoretic Golog
◮ added options to speed up the MDP solution
◮ preprocessor to minimize on-line interpretation

SLIDE 8

Part I

Extending DTGolog with Options

SLIDE 9

Options? What's that?

SLIDE 10

Options

Idea:

◮ construct complex actions from primitive ones
◮ options: solutions to sub-MDPs
◮ generate models for them:
  • when can they be executed?
  • which outcomes can occur?
  • what are the outcomes' probabilities?
  • what are the expected rewards and costs (expected value)?
◮ these models can then be used in planning

SLIDE 11

Integrating Options into Golog

How do we integrate options into DTGolog/ReadyLog?

◮ avoid the inconvenience "situations = states"
◮ instead, use mappings:
  • situations → states (when 'entering' an option)
  • states → situations (when 'leaving' an option)
◮ options..
  ..are solutions to local MDPs..
  ..encapsulated into a stochastic procedure.
◮ stochastic procedures..
  ..are procedures with an explicit model (preconditions/effects/costs);
  ..replace stochastic actions;
  ..can model options.

SLIDE 12

Generating Options

How do we generate options?

◮ define:
  • φ, the precondition (think: states where the option is applicable)
  • β : exit states → value, pseudo-rewards for the local MDP
  • θ, the option skeleton, a one-step program to take in each step..
    • ..usually something like nondet([left, right, down, up]);
    • ..can contain ifs;
    • ..can build on options/stochastic procedures
◮ and two mappings:
  • Φ : situations → states
  • Σ : states → situations
  • option_mapping(o, σ, Γ, ϕ)
SLIDE 13

Examples

◮ example policy:

    proc(room1_2,
         [exogf_Update,
          while(is_possible(room1_2),
                [if(pos = [0, 0], go_right,
                  if(pos = [0, 1], go_right,
                   if(pos = [0, 2], go_up,
                    if(pos = [1, 0], go_right,
                     if(pos = [1, 1], go_right,
                      if(pos = [1, 2], go_right,
                       if(pos = [2, 0], go_down,
                        if(pos = [2, 1], go_right,
                         if(pos = [2, 2], go_up, []))))))))),
                 exogf_Update])]).

◮ example model (for state 'position = (0, 0)'):

    pt_costs(room1_2, [(pos, [0, 0])], 4.51650594972207).
    pt_probability_list(room1_2, [(pos, [0, 0])],
        [([(pos, [1, 3])], 0.00012),
         ([(pos, [3, 1])], 0.99987)]).

SLIDE 14

Test Setting

[Figure: grid world test setting with rooms 1 to 7, start state S, and goals G3, G4, G11]

SLIDE 15

Experimental Results

[Figure, three panels: (a) full MDP planning (A); (b) heuristics (B); (c) options (C)]

SLIDE 16

Experimental Results

[Plot: planning time in seconds vs. Manhattan distance from start to goal (3 to 11), for variants A, A', B, B', and C]

SLIDE 17

Part II

On-line Decision-Theoretic Golog for Unpredictable Domains

SLIDE 18

READYLOG: on-line DT planning

On-line:

◮ incremental:
  • solve(plan-skeleton, horizon)
  • execute the returned policy
◮ sensing / exogenous events

Problem:

  • dynamic environment (changes while thinking)
  • imperfect models
  → the policy can become invalid

⇒ execution monitoring:

  • program and policy coexist
  • markers

SLIDE 19

Execution Monitoring Semantics

Trans(solve(p, h), s, δ′, s′) ≡
    ∃π, v, pr. BestDo(p, s, h, π, v, pr) ∧ δ′ = applyPol(π) ∧ s′ = s

BestDo(if(ϕ, p1, p2); p, s, h, π, v, pr) ≐
      ϕ[s] ∧ ∃π1. BestDo(p1; p, s, h, π1, v, pr) ∧ π = M(ϕ, true); π1
    ∨ ¬ϕ[s] ∧ ∃π2. BestDo(p2; p, s, h, π2, v, pr) ∧ π = M(ϕ, false); π2

Trans(applyPol(M(ϕ, v); π), s, δ′, s′) ≡ s′ = s ∧
    (  v = true ∧ ϕ[s] ∧ δ′ = applyPol(π)
     ∨ v = false ∧ ¬ϕ[s] ∧ δ′ = applyPol(π)
     ∨ v = true ∧ ¬ϕ[s] ∧ δ′ = nil
     ∨ v = false ∧ ϕ[s] ∧ δ′ = nil )
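Read operationally: solve computes a policy π in which every evaluated condition ϕ leaves a marker M(ϕ, v) recording its truth value v at planning time; applyPol rechecks each marker during execution and drops the remaining policy (δ′ = nil) whenever the condition has flipped, so the surrounding program can replan. A small Python rendering of that marker check (hypothetical encoding, for illustration only):

    # Execute a policy containing markers ('M', phi, v), cf. applyPol above.
    # holds(phi) evaluates phi in the current situation; execute(a) runs an action.
    def apply_policy(policy, holds, execute):
        for step in policy:
            if isinstance(step, tuple) and step[0] == 'M':
                _, phi, v = step
                if holds(phi) != v:   # condition changed since planning time
                    return False      # policy invalid: stop, the caller replans
            else:
                execute(step)
        return True                   # policy ran to completion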

SLIDE 20

READYLOG II

◮ options (..)
◮ preprocessor:
  • translates READYLOG functions, conditions, definitions.. to Prolog code
  • creates successor state axioms from effect axioms
  • speed-up of about a factor of 16

[Plot: seconds (1 to 1024) vs. length of the situation term (200 to 2000), comparing effect axioms (uncompiled) with successor state axioms (compiled)]
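The speed-up comes from not re-interpreting effect axioms over the ever-growing situation term at every query: a compiled successor state axiom determines each fluent in one step per action. A toy contrast (hypothetical fluent and actions, not the thesis' preprocessor output):

    # Uncompiled: deciding a fluent means scanning the whole situation term.
    def holds_broken_uncompiled(situation):   # situation = list of actions so far
        val = False
        for a in situation:                   # cost grows with the history length
            if a == 'drop':
                val = True                    # positive effect axiom
            elif a == 'repair':
                val = False                   # negative effect axiom
        return val

    # Compiled successor state axiom:
    #   broken(do(a, s)) ≡ a = drop ∨ (broken(s) ∧ a ≠ repair)
    def holds_broken_compiled(prev, a):       # constant cost per action
        return a == 'drop' or (prev and a != 'repair')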

SLIDE 21

Experimental Results

SLIDE 22

Experimental Results: SimLeague

◮ compared with ICPGOLOG (Norman's results), planning time in seconds:

                  ICPGOLOG   READYLOG
    goal shot     0.35       0.01
    direct pass   0.25       0.01

◮ speed-up due to the preprocessor

Example where these are combined (demo):

    solve(nondet([goalKick(OwnNumber),
                  [pickBest(bestP, [2..11],
                            [directPass(OwnNumber, bestP, pass_NORMAL),
                             goalKick(bestP)])]]),
          Horizon)

SLIDE 23

Experimental Results: MidSize

SLIDE 24

Experimental Results: MidSize, Code

    solve(nondet([kick(ownNumber, 40),
                  dribble_or_move_kick(ownNumber),
                  dribble_to_points(ownNumber),
                  if(isKickable(ownNumber),
                     pickBest(var_turnAngle, [-3.1, -2.3, 2.3, 3.1],
                              [turn_relative(ownNumber, var_turnAngle, 2),
                               nondet([[intercept_ball(ownNumber, 1),
                                        dribble_or_move_kick(ownNumber)],
                                       [intercept_ball(numberByRole(supporter), 1),
                                        dribble_or_move_kick(numberByRole(supporter))]])]),
                     nondet([[intercept_ball(ownNumber, 1),
                              dribble_or_move_kick(ownNumber)],
                             intercept_ball(ownNumber, 0.0, 1)]))]),
          4)

    proc(dribble_or_move_kick(Own),
         nondet([[dribble_to(Own, oppGoalBestCorner, 1)],
                 [move_kick(Own, oppGoalBestCorner, 1)]])).

    proc(dribble_to_points(Own),
         pickBest(var_pos,
                  [[2.5, -1.25], [2.5, -2.5], [2.5, 0.0], [2.5, 2.5], [2.5, 1.25]],
                  dribble_to(Own, var_pos, 1))).

SLIDE 25

Experimental Results: MidSize, Behavior

[Figures: ball behavior when turning with the ball; move_kick/dribble; move/dribble/intercept]

SLIDE 26

Experimental Results: MidSize, Example: Situation

SLIDE 27

Experimental Results: MidSize, Example: Plans


SLIDE 28

Experimental Results: MidSize, Example: Teamplay


SLIDE 29

Experimental Results: MidSize, Example: Decision Tree

[Figure: decision tree alternating agent choices (kick, turn, move_kick, intercept(me), intercept(TM)) and nature's choices (outcome probabilities 0.8/0.2), annotated with node values and action costs]

SLIDE 30

Experimental Results: MidSize, Computation

Planning time in seconds:

                   examples   min     avg     max
    without ball   698        0.0     0.094   0.450
    with ball      117        0.170   0.536   2.110

◮ variance due to processor load on the robots
◮ qualitatively: enough for rudimentary soccer play

SLIDE 31

Conclusion

◮ On-line decision-theoretic Golog..
  ..can be applied to highly dynamic domains with infinite/continuous state spaces,
  ..can coexist with passive sensing,
  ..motivates more sophisticated execution monitoring.
◮ Options..
  ..can be added to decision-theoretic Golog,
  ..provide good speed-ups,
  ..rely on finite state spaces(!).
