Distributed Planning for Large Teams
Prasanna Velagapudi (thesis proposal slides)
SLIDE 1

Distributed Planning for Large Teams

Prasanna Velagapudi

Thesis Committee:
  • Katia Sycara (co-chair)
  • Paul Scerri (co-chair)
  • J. Andrew Bagnell
  • Edmund H. Durfee

SLIDE 2

Outline

  • Motivation
  • Background
  • Approach
    – SI-Dec-POMDP
    – DIMS
  • Preliminary Work
    – DPP
    – D-TREMOR
  • Proposed Work
  • Conclusion

SLIDE 3

Motivation

  • 100s to 1000s of robots, agents, and people
  • Complex, collaborative tasks
  • Dynamic, uncertain environment
  • Offline planning

SLIDE 4

Motivation

  • Scaling planning to large teams is hard
    – Need to plan (with uncertainty) for each agent in the team
    – Agents must consider the actions of a growing number of teammates
    – The full joint problem is NEXP-complete [Bernstein 2002]
  • Optimality is infeasible at this scale
  • Find and exploit structure in the problem
  • Make good plans in a reasonable amount of time

SLIDE 5

Motivation

  • Exploit three characteristics of these domains:
  1. Explicit interactions
    – Specific combinations of states and actions whose effects depend on more than one agent
  2. Sparsity of interactions
    – Many potential interactions could occur between agents
    – Only a few will occur in any given solution
  3. Distributed computation
    – Each agent has access to local computation
    – A centralized algorithm has access to 1 unit of computation; a distributed algorithm has access to N units

SLIDE 6

Example: Interactions

[Figure: rescue and cleaner robots in a grid world with debris and victims]

SLIDE 7

Example: Sparsity

SLIDE 8

Related Work

[Figure: planners positioned along scalability vs. generality axes]

SLIDE 9

Related Work

Structured Dec-(PO)MDP planners:
  – JESP [Nair 2003]
  – TD-Dec-POMDP [Witwicki 2010]
  – EDI-CR [Mostafa 2009]
  – SPIDER [Marecki 2009]

  • Restrict generality slightly to get scalability
  • High optimality

SLIDE 10

Related Work

Heuristic Dec-(PO)MDP planners:
  – TREMOR [Varakantham 2009]
  – OC-Dec-MDP [Beynier 2005]

  • Sacrifice optimality for scalability
  • High generality

SLIDE 11

Related Work

Structured multiagent path planners:
  – DPC [Bhattacharya 2010]
  – Optimal Decoupling [Van den Berg 2009]

  • Sacrifice generality further to get scalability
  • High optimality

SLIDE 12

Related Work

Heuristic multiagent path planners:
  – Dynamic Networks [Clark 2003]
  – Prioritized Planning [Van den Berg 2005]

  • Sacrifice optimality to get scalability
  • Poor generality

SLIDE 13

Related Work

Our approach:
  • Fix high scalability and generality
  • Explore what level of optimality is possible

SLIDE 14

Distributed, Iterative Planning

  • Inspiration:
    – TREMOR [Varakantham 2009]
    – JESP [Nair 2003]
  • Reduce the full joint problem into a set of smaller, independent sub-problems
  • Solve the independent sub-problems with a local algorithm
  • Modify sub-problems to push locally optimal solutions toward a high-quality joint solution

SLIDE 15

Thesis Statement

Agents in a large team with known sparse interactions can find computationally efficient high-quality solutions to planning problems through an iterative process of estimating the

actions of teammates, locally planning based on these

estimates, and refining their estimates by exchanging

coordination messages.

Distributed Planning for Large Teams 15

slide-16
SLIDE 16

Outline

  • Motivation
  • Background
  • Approach (Problem Formulation, Proposed Algorithm)
    – SI-Dec-POMDP
    – DIMS
  • Preliminary Work
    – DPP
    – D-TREMOR
  • Proposed Work
  • Conclusion

SLIDE 17

Problem Formulation

POMDP → Dec-POMDP → Sparse-Interaction Dec-POMDP

SLIDE 18

Review: POMDP

A POMDP is a tuple ⟨S, A, Ω, T, R, O⟩:
  S : set of states
  A : set of actions
  Ω : set of observations
  T : transition function
  R : reward function
  O : observation function

SLIDE 19

Review: Dec-POMDP

A Dec-POMDP extends the POMDP to a team of agents with joint actions and observations:
  T : joint transition function
  R : joint reward function
  O : joint observation function

SLIDE 20

Dec-POMDP → SI-Dec-POMDP

SLIDE 21

Sparse Interaction Dec-POMDP

SLIDE 22

Proposed Approach: DIMS

Distributed Iterative Model Shaping

  • Reduce the full joint problem into a set of smaller, independent sub-problems (one for each agent)
  • Solve the independent sub-problems with existing state-of-the-art algorithms
  • Modify sub-problems such that locally optimal solutions correspond to a high-quality joint solution

Stages: Task Allocation → Local Planning → Interaction Exchange → Model Shaping

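The stage loop above can be sketched from a single agent's perspective. This is a minimal sketch, not the actual DPP/D-TREMOR implementation: the four stage functions are hypothetical stand-ins supplied by the domain.

```python
def dims(local_problem, solve, estimate, exchange, shape, iterations=10):
    """Generic DIMS loop for one agent; the stage functions are stand-ins:
    solve    -- Local Planning: off-the-shelf solver for the sub-problem
    estimate -- compute probability/value of this agent's interactions
    exchange -- Interaction Exchange: swap small messages with teammates
    shape    -- Model Shaping: fold non-local effects into the sub-problem"""
    policy = None
    for _ in range(iterations):
        policy = solve(local_problem)
        msgs_out = estimate(local_problem, policy)
        msgs_in = exchange(msgs_out)
        local_problem = shape(local_problem, msgs_in)
    return policy
```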
SLIDE 23

Proposed Approach: DIMS

Distributed Iterative Model Shaping: Task Allocation

  • Assign tasks to agents
  • Reduce the search space considered by each agent
  • Define a local sub-problem for each robot

SLIDE 24

Proposed Approach: DIMS

Distributed Iterative Model Shaping: Task Allocation (continued)

Full SI-Dec-POMDP → Local (independent) POMDP

SLIDE 25

Proposed Approach: DIMS

Distributed Iterative Model Shaping: Local Planning

  • Solve local sub-problems using an off-the-shelf centralized solver
  • Result: a locally optimal policy

SLIDE 26

Proposed Approach: DIMS

Distributed Iterative Model Shaping: Interaction Exchange

  • Given a local policy: estimate the local probability and value of interactions
  • Communicate the local probability and value of relevant interactions to team members
  • Sparsity → relatively small number of messages

SLIDE 27

Proposed Approach: DIMS

Distributed Iterative Model Shaping: Model Shaping

  • Modify local sub-problems to account for the presence of interactions

SLIDE 28

Proposed Approach: DIMS

Distributed Iterative Model Shaping: iterate

  • Reallocate tasks or re-plan using the modified local sub-problem

SLIDE 29

Proposed Approach: DIMS

Distributed Iterative Model Shaping, stage implementations:

  • Task Allocation: any decentralized allocation mechanism (e.g. auctions)
  • Local Planning: stock graph, MDP, or POMDP solver
  • Interaction Exchange: lightweight local evaluation and low-bandwidth messaging
  • Model Shaping: methods to alter the local problem to incorporate non-local effects

SLIDE 30

Outline

  • Motivation
  • Background
  • Approach
    – SI-Dec-POMDP
    – DIMS
  • Preliminary Work
    – DPP
    – D-TREMOR
  • Proposed Work
  • Conclusion

SLIDE 31

Preliminary Results

Distributed Prioritized Planning (DPP)
Distributed Team REshaping of MOdels for Rapid execution (D-TREMOR)

  • P. Velagapudi, K. Sycara, and P. Scerri, “Decentralized prioritized planning in large multirobot teams,” IROS 2010.
  • P. Velagapudi, P. Varakantham, K. Sycara, and P. Scerri, “Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents,” AAMAS 2011 (in submission).

SLIDE 32

Preliminary Results

Distributed Prioritized Planning (DPP):
  • No uncertainty
  • Many potential interactions
  • Simple interactions

Distributed Team REshaping of MOdels for Rapid execution (D-TREMOR):
  • Action/observation uncertainty
  • Fewer potential interactions
  • Complex interactions

SLIDE 33

Multiagent Path Planning

[Figure: grid map with start and goal positions]

SLIDE 34

Multiagent Path Planning → SI-Dec-POMDP

  • Only one interaction type: collision
  • Many potential collisions
  • Few collisions in any solution

SLIDE 35

DIMS: Distributed Prioritized Planning

  • Task Allocation: (given)
  • Local Planning: A*
  • Interaction Exchange: path messages
  • Model Shaping: prioritized configuration-time obstacles

SLIDE 36
Prioritized Planning [van den Berg, et al 2005]

  • Assign priorities to agents based on path length
  • Longer path length estimate → higher priority

SLIDE 37

Prioritized Planning [van den Berg, et al 2005]

  • Sequentially plan from highest to lowest priority
    – Takes n steps for n agents
  • Use previously planned agents as dynamic obstacles

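The sequential scheme above can be sketched as follows. This is a minimal sketch: `plan_path` and the agent records are hypothetical stand-ins for the underlying single-agent planner and its inputs.

```python
def prioritized_planning(agents, plan_path):
    """Plan agents one at a time, highest priority first; earlier agents'
    paths become dynamic (configuration-time) obstacles for later agents.
    `plan_path(agent, dynamic_obstacles)` is a stand-in single-agent planner."""
    paths = {}
    # Higher priority = longer estimated path (ties broken arbitrarily).
    for agent in sorted(agents, key=lambda a: a["path_estimate"], reverse=True):
        paths[agent["id"]] = plan_path(agent, list(paths.values()))
    return paths
```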
SLIDE 38

Distributed Prioritized Planning

[Diagram: Local Planning → Interaction Exchange → Model Shaping loop, ordered by agent priority]

SLIDE 39

Large-Scale Path Solutions

SLIDE 41

Experimental Results

  • Scaling dataset
    – Number of robots varied: {40, 60, 80, 120, 160, 240}
    – Map density constant: 8 cells per robot
  • Density dataset
    – Number of robots constant: 240
    – Map density varied: {32, 24, 16, 12, 8} cells per robot
  • Cellular automata used to generate 15 random maps
  • Maps also solved with centralized prioritized planning

SLIDE 42

High quality solutions

[Figure: proportion of independent path cost vs. team size, and vs. map density (240 agents), for centralized and distributed PP]

Both centralized and distributed PP are near-optimal.

SLIDE 43

Few sequential iterations

[Figure: number of sequential planning iterations vs. team size (centralized prioritized planning takes 50–240 iterations) and vs. map density at 240 agents (centralized prioritized planning takes 240 iterations)]

DPP’s sequential iterations are a fraction of team size.

SLIDE 44

Summary of DPP

  • DPP achieves high-quality solutions
    – Same quality as centralized PP
  • Prioritization + sparsity = rapid convergence
    – Able to handle large numbers of collision interactions
    – Far fewer sequential planning iterations

SLIDE 45

Preliminary Results

Distributed Prioritized Planning (DPP):
  • No uncertainty
  • Many potential interactions
  • Simple interactions

Distributed Team REshaping of MOdels for Rapid execution (D-TREMOR):
  • Action/observation uncertainty
  • Fewer potential interactions
  • Complex interactions

SLIDE 46

A Simple Rescue Domain

[Figure legend: Rescue agent, Cleaner agent, Narrow corridor, Victim, Unsafe cell, Clearable debris]

SLIDE 47

A Simple (Large) Rescue Domain

SLIDE 48

Distributed POMDP with Coordination Locales [Varakantham, et al 2009]

  • Subset of SI-Dec-POMDP: only modifies the transition and reward functions
  • Coordination locales (CLs) are subtypes of interactions with an explicit time constraint; the interaction functions are constructed implicitly

SLIDE 49

DIMS: D-TREMOR (extending [Varakantham, et al 2009])

  • Task Allocation: decentralized auction
  • Local Planning: EVA POMDP solver
  • Interaction Exchange: policy sub-sampling and Coordination Locale (CL) messages
  • Model Shaping: prioritized/randomized reward and transition shaping

SLIDE 50

D-TREMOR: Task Allocation

  • Assign “tasks” using a decentralized auction
    – Greedy, nearest allocation
  • Create a local, independent sub-problem

SLIDE 51

D-TREMOR: Local Planning

  • Solve using an off-the-shelf algorithm (EVA)
  • Result: locally optimal policies

SLIDE 52

D-TREMOR: Interaction Exchange (finding Pr(CLi))

  • Evaluate the local policy by sampling [Kearns 2002]
  • Compute the frequency of the associated si, ai

Example: entered the corridor in 95 of 100 sampled runs → Pr(CLi) = 0.95

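The frequency estimate above can be sketched as a Monte Carlo loop. `policy_rollout` and `in_locale` are hypothetical stand-ins for policy sampling and the CL membership test.

```python
import random

def estimate_pr_cl(policy_rollout, in_locale, n_runs=100, seed=0):
    """Monte Carlo estimate of Pr(CL): the fraction of sampled policy
    rollouts that visit the coordination locale's state/action pair.
    `policy_rollout(rng)` draws one (state, action) trajectory;
    `in_locale(s, a)` tests whether a step triggers the locale."""
    rng = random.Random(seed)
    hits = sum(
        any(in_locale(s, a) for s, a in policy_rollout(rng))
        for _ in range(n_runs)
    )
    return hits / n_runs
```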
SLIDE 53

D-TREMOR: Interaction Exchange (finding Val(CLi))

  • Sample the local policy value with and without interactions [Kearns 2002]
    – Test interactions independently
  • Compute the change in value if the interaction occurred

Example: no collision vs. collision → Val(CLi) = -7

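The value estimate can be sketched the same way; the two input lists are assumed to hold sampled returns from [Kearns 2002]-style policy evaluation with and without the interaction.

```python
from statistics import mean

def estimate_val_cl(returns_with, returns_without):
    """Estimate Val(CL) as the change in expected local policy value if
    the interaction occurs: mean sampled return with the interaction
    minus mean sampled return without it (negative for harmful ones)."""
    return mean(returns_with) - mean(returns_without)
```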
SLIDE 54

D-TREMOR: Interaction Exchange

  • Send CL messages to teammates
  • Sparsity → relatively small number of messages

SLIDE 55

D-TREMOR: Model Shaping

  • Shape local model rewards/transitions based on remote interactions
  • Shaped model functions blend the independent and interaction model functions, weighted by the probability of interaction

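A minimal sketch of the blend implied by this slide's labels (probability of interaction, interaction model functions, independent model functions); the tabular representation and the exact mixture form are assumptions, not the actual D-TREMOR code.

```python
def shape_transition(t_independent, t_interaction, pr_cl):
    """Blend independent and interaction transition entries, weighted by
    the estimated probability that the coordination locale occurs.
    Inputs are dicts mapping (s, a, s') -> probability; the linear
    mixture form is an assumption based on the labels on this slide."""
    shaped = {}
    for key in set(t_independent) | set(t_interaction):
        p_ind = t_independent.get(key, 0.0)
        p_int = t_interaction.get(key, 0.0)
        shaped[key] = pr_cl * p_int + (1.0 - pr_cl) * p_ind
    return shaped
```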
SLIDE 56

D-TREMOR: Local Planning (again)

  • Re-solve the shaped local models to get new policies
  • Result: new locally optimal policies → new interactions

SLIDE 57

D-TREMOR: Adv. Model Shaping

  • In practice, we run into three common issues faced by concurrent optimization algorithms:
    – Slow convergence
    – Oscillation
    – Local optima
  • We can alter our model shaping to mitigate these by reasoning about the types of interactions we have

SLIDE 58

D-TREMOR: Adv. Model Shaping

  • Slow convergence → prioritization
    – The majority of interactions are collisions
    – Assign priorities to agents; only model-shape collision interactions for higher-priority agents
    – From DPP: prioritization can quickly resolve collision interactions
    – Similar properties hold for any purely negative interaction
  • Negative interaction: every agent is guaranteed a lower-valued local policy if the interaction occurs

SLIDE 59

D-TREMOR: Adv. Model Shaping

  • Oscillation → probabilistic shaping
    – Often caused by time dynamics between agents:
      • Agent 1 shapes based on Agent 2’s old policy
      • Agent 2 shapes based on Agent 1’s old policy
    – Each agent applies model shaping only with probability δ [Zhang 2005]
    – Breaks cycles between agent policies

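The δ-rule above can be sketched as a one-line gate around the shaping step; `shape` is a hypothetical stand-in for the agent's model-shaping function.

```python
import random

def maybe_shape(problem, messages, shape, delta, rng=random):
    """Apply model shaping only with probability delta; otherwise keep
    the current model. Randomly skipping updates desynchronizes agents
    whose alternating responses would otherwise cycle forever."""
    if rng.random() < delta:
        return shape(problem, messages)
    return problem
```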
SLIDE 60

D-TREMOR: Adv. Model Shaping

  • Local optima → optimistic initialization
    – Agents cannot detect mixed interactions (e.g. debris):
      • Rescue agent policies can only improve if debris is cleared
      • Cleaner agent policies can only worsen if they clear debris

Deadlock cycle: Pr(CL) = low, Val(CL) = low → if Val(CL) is low, the optimal policy is to do nothing → Pr(CL) = low, Val(CL) = low

SLIDE 61

D-TREMOR: Adv. Model Shaping

  • Local optima → optimistic initialization
    – Agents cannot detect mixed interactions (e.g. debris):
      • Rescue agent policies can only improve if debris is cleared
      • Cleaner agent policies can only worsen if they clear debris
    – Let each agent solve an initial model that uses an optimistic assumption of the interaction condition

SLIDE 62

Preliminary Results

Scaling Dataset / Density Dataset

SLIDE 63

Experimental Setup

  • D-TREMOR policies
    – Max-joint-value
    – Last iteration
  • Comparison policies
    – Independent
    – Optimistic
    – Do-nothing
    – Random
  • Scaling:
    – 10 to 100 agents
    – Random maps
  • Density:
    – 100 agents
    – Concentric ring maps
  • 3 problems per condition
  • 20 planning iterations
  • 7 time-step horizon
  • 1 CPU per agent

D-TREMOR produces reasonable policies for 100-agent planning problems in under 6 hrs (with some caveats).

SLIDE 64

Preliminary Results: Scaling

[Figure: normalized joint value vs. number of agents (20–100) for the naïve comparison policies and the D-TREMOR policies]

SLIDE 65

Preliminary Results: Density

[Figure: average joint value vs. number of rings (1–3) for the comparison and D-TREMOR policies]

Do-nothing does the best? Ignoring interactions = poor performance.

SLIDE 66

Preliminary Results: Density

[Figures: average number of collisions and average number of victims rescued vs. number of rings (1–3)]

D-TREMOR rescues the most victims, but does not resolve every collision.

SLIDE 67

Preliminary Results: Time

[Figure: time per iteration (min) vs. number of agents (20–100)]

Why is this increasing?

SLIDE 68

Preliminary Results: Time

[Figures: time per iteration and number of active CLs per agent vs. number of agents]

The increase in time is related to the number of CLs, not the number of agents.

SLIDE 69

Summary of D-TREMOR

D-TREMOR produces reasonable policies for 100-agent planning problems in under 6 hrs:
  – Partially observable, uncertain world
  – Multiple types of interactions and agents

  • Improves over independent planning
  • Resolves interactions in large problems
  • Still some convergence/efficiency issues

SLIDE 70

Outline

  • Motivation
  • Background
  • Approach
    – SI-Dec-POMDP
    – DIMS
  • Preliminary Work
    – DPP
    – D-TREMOR
  • Proposed Work
  • Conclusion

SLIDE 71

Proposed Work

SLIDE 72

Proposed Work

Stage                 | DPP                                      | D-TREMOR
----------------------|------------------------------------------|------------------------------------------
Task Allocation       | Given                                    | Auction
Local Planning        | A* (graph)                               | EVA (POMDP)
Interaction Exchange  | Policy evaluation & prioritized exchange | Policy sub-sampling & full exchange
Model Shaping         | Prioritized shaping                      | Stochastic shaping, optimistic initialization

SLIDE 73

Proposed Work

Consolidate and generalize DPP and D-TREMOR into the DIMS framework:

  1. Interaction classification
  2. Model-shaping heuristics
  3. Domain evaluation
    – Search & Rescue
    – Humanitarian Convoy

SLIDE 74

Interaction Classification

What are the different classes of possible interactions between agents in DIMS?

Model-shaping terms:
                       | Reward | Transition | Obs.
Collisions (DPP)       |   x    |            |
Collisions (D-TREMOR)  |   x    |     x      |
Debris clearing        |        |     x      |

Policy effects:
                       | Negative | Positive | Mixed
Collisions (DPP)       |    x     |          |
Collisions (D-TREMOR)  |    x     |          |
Debris clearing        |          |          |   x

SLIDE 75

Interaction Classification

  1. Determine the sets of interactions that occur in the domains of interest
  2. Formalize the characteristics of useful classes of interactions from this relevant set
    – Start by identifying differences between interactions in the preliminary work:
      • Collisions: reward-only, same-time
      • Collisions: reward + transition, same-time
      • Debris clearing: transition-only, different-time
  3. Classify potential interactions by common features

SLIDE 76

Model-shaping Heuristics

Given classes of relevant interactions, what do we need to do to find good solutions?

  • Known heuristics: prioritized shaping, stochastic shaping, optimistic initialization
  • Interaction classes: collisions (reward only), collisions (neg. reward + transition), debris clearing (transition + delay)
  • Open question: which heuristics apply to which classes?

SLIDE 77

Model-shaping Heuristics

  • Explore which, if any, of the existing heuristics apply to each class of interaction
  • Apply targeted heuristics for newly identified classes of interactions
  • Attempt to bound the performance of the heuristics for particular classes of interaction
    – e.g. proving that prioritization converges in n iterations for negative interactions

SLIDE 78

Domain Evaluation

Using our approach, how well can we do in realistic planning scenarios?

Domains: Search and Rescue, Humanitarian Convoy

SLIDE 79

Domain Evaluation

[Table: rows are model classes in order of increasing uncertainty — Graph (DPP), MDP, POMDP (D-TREMOR); columns are the Search and Rescue domain (Simple Model, USARSim) and the Humanitarian Convoy domain (Simple Model, VBS2); check marks indicate the combinations DIMS will be evaluated on]

SLIDE 80

Proposed Work: Timeline

  Nov 2010 – Feb 2011   Develop classification of interactions
  Feb 2011 – Mar 2011   Design heuristics for common interactions
  Mar 2011 – Jul 2011   Implementation of DIMS solver
  Jul 2011 – Oct 2011   Rescue experiments
  Oct 2011 – Jan 2012   Convoy experiments
  Feb 2012 – May 2012   Thesis preparation
  May 2012              Defend thesis

SLIDE 81

Outline

  • Motivation
  • Background
  • Approach
    – SI-Dec-POMDP
    – DIMS
  • Preliminary Work
    – DPP
    – D-TREMOR
  • Proposed Work
  • Conclusion

SLIDE 82

Conclusions (1/3): Work-to-date

  • DPP: distributed path planning for large teams
  • D-TREMOR: decentralized solutions for sparse Dec-POMDPs with many agents
  • Demonstrated complete distributability, fast heuristic interaction detection, and local message exchange to achieve high scalability
  • Empirical results in a simulated search and rescue domain

SLIDE 83

Conclusions (2/3): Contributions

  1. DIMS: a modular algorithm for solving planning problems in large teams with sparse interactions
    – A single framework, applied to path planning, MDPs, and POMDPs
  2. Empirical results of distributed planning using DIMS in teams of at least 100 agents across two domains
  3. A study of the characteristics of interaction in sparse planning problems
    – Provide a classification of interactions
    – Determine features for distinguishing interaction behaviors

SLIDE 84

Conclusions (3/3): Take-home Message

This thesis will demonstrate that it is possible to efficiently find, in a distributed fashion, high-quality solutions to planning problems with known sparse interactions, with and without uncertainty, for teams of at least a hundred agents.

SLIDE 86

The VECNA Bear (Yes, it exists!)

SLIDE 87

SI-Dec-POMDP vs. other models

  • DPCL
    – SI-Dec-POMDP extends the DPCL model
    – Adds observational interactions
    – Time is integrated in the state rather than explicit
  • EDI/EDI-CR
    – Adds complex transitions and observations
  • TD-Dec-MDP
    – Allows simultaneous interaction (within an epoch)
  • Factored MDP/POMDP
    – Adds interactions that span epochs

SLIDE 88

Proposed Approach: DIMS

Distributed Iterative Model Shaping

  • Task Allocation: assign tasks to agents; create local sub-problems
  • Local Planning: use a local solver to find an optimal solution to the sub-problem
  • Interaction Exchange: compute and exchange the probability and expected value of interactions
  • Model Shaping: alter the local sub-problem to incorporate non-local effects

SLIDE 89

Motivation

SLIDE 90

D-TREMOR

SLIDE 92

D-TREMOR: Model Parameters

  • P_Debris = 0.9 (probability that debris prevents a robot from entering the cell)
  • P_ActionFailure = 0.2 (probability of action failure)
  • P_ObsSuccessOnSuccess = 0.8 (probability that success is observed if the action succeeded)
  • P_ObsSuccessOnFailure = 0.2 (probability that success is observed if the action failed)
  • P_ReboundAfterCollision = 0.5 (probability that a robot returns to the same cell after a collision)
  • R_Victim = 10.0 (reward for saving a victim)
  • R_Cleaning = 0.25 (reward for cleaning debris)
  • R_Move = -0.5 (cost of moving)
  • R_Observe = -0.25 (cost of observing)
  • R_Collision = -5.0 (reward for a collision)
  • R_Unsafe = -1 (reward for landing in an unsafe cell)