Distributed Planning for Large Teams
Prasanna Velagapudi
Thesis Committee: Katia Sycara (co-chair), Paul Scerri (co-chair), J. Andrew Bagnell, Edmund H. Durfee
Outline:
– Motivation
– Background
– SI-Dec-POMDP
– DIMS
– DPP
– D-TREMOR
(Diagram: agents and people operating in a shared environment)
– Need to plan (with uncertainty) for each agent in the team
– Agents must consider the actions of a growing number of teammates
– The full joint problem is NEXP-complete [Bernstein 2002]
– Interactions: parts of the problem depend on more than one agent
(Example: rescue robots, cleaner robots, debris, and victims)
(Chart: existing planners plotted along two axes, scalability and generality)
Structured Dec-(PO)MDP planners:
– JESP [Nair 2003]
– TD-Dec-POMDP [Witwicki 2010]
– EDI-CR [Mostafa 2009]
– SPIDER [Marecki 2009]
Trade generality slightly to get scalability.
Heuristic Dec-(PO)MDP planners:
– TREMOR [Varakantham 2009]
– OC-Dec-MDP [Beynier 2005]
Trade solution guarantees for scalability.
Structured multiagent path planners:
– DPC [Bhattacharya 2010]
– Optimal Decoupling [Van den Berg 2009]
Restrict the problem further to get scalability.
Heuristic multiagent path planners:
– Dynamic Networks [Clark 2003]
– Prioritized Planning [Van den Berg 2005]
Give up guarantees to get scalability.
Our approach: retain both scalability and generality where possible.
Building on:
– TREMOR [Varakantham 2009]
– JESP [Nair 2003]
Decompose the joint problem into a set of smaller, independent sub-problems, solve each sub-problem with a local algorithm, and iteratively refine locally optimal solutions towards a high-quality joint solution.
Thesis statement: Agents in a large team with known sparse interactions can find computationally efficient, high-quality solutions to planning problems through an iterative process of estimating the actions of teammates, locally planning based on these estimates, and refining their estimates by exchanging coordination messages.
Approach:
– SI-Dec-POMDP: problem formulation
– DIMS: proposed algorithm
A POMDP is defined by the tuple ⟨S, A, Ω, T, R, O⟩:
– S: set of states
– A: set of actions
– Ω: set of observations
– T: transition function, T(s, a, s′) = P(s′ | s, a)
– R: reward function, R(s, a)
– O: observation function, O(s′, a, ω) = P(ω | s′, a)
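These pieces combine in the standard POMDP belief update; a minimal sketch in plain Python (the list/dict encoding of T and O is illustrative, not the thesis implementation):

```python
def belief_update(belief, action, obs, T, O):
    """Bayesian belief update for a discrete POMDP.

    belief: list, belief[s] = P(s)
    T[action][s][s2] = P(s2 | s, action)      (transition function T)
    O[action][s2][obs] = P(obs | s2, action)  (observation function O)
    """
    n = len(belief)
    # Predict: sum over prior states weighted by the transition function.
    predicted = [sum(T[action][s][s2] * belief[s] for s in range(n))
                 for s2 in range(n)]
    # Correct: weight by the observation likelihood, then normalize.
    unnorm = [predicted[s2] * O[action][s2][obs] for s2 in range(n)]
    z = sum(unnorm)
    return [u / z for u in unnorm]
```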
The (SI-)Dec-POMDP extends these to joint functions over all agents:
– Joint transition function
– Joint reward function
– Joint observation function
Distributed Iterative Model Shaping (DIMS)
A framework of four repeating stages, with one planner per agent, built from existing state-of-the-art algorithms and iterating towards a high-quality joint solution:
– Task Allocation: assign tasks to agents using any decentralized allocation mechanism (e.g. auctions)
– Local Planning: decompose the full SI-Dec-POMDP into local (independent) POMDPs and solve each with a stock off-the-shelf solver (graph, MDP, or POMDP)
– Interaction Exchange: compute the probability and value of interactions via lightweight local evaluation and share them with low-bandwidth messaging
– Model Shaping: alter the local sub-problem to incorporate non-local effects (the presence of interactions)
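The four DIMS stages can be sketched as a single iteration loop. Everything below (the `Agent` class, the hook names) is a hypothetical skeleton to show the control flow, not the thesis API:

```python
class Agent:
    """Minimal stand-in for a planning agent with a shapeable local model."""
    def __init__(self, agent_id, model):
        self.id, self.model = agent_id, model

def dims(agents, solve_local, evaluate_interactions, shape_model, n_iters):
    """One possible shape of the DIMS loop: plan locally, exchange
    interaction estimates, shape each local model, and repeat."""
    policies = {}
    for _ in range(n_iters):
        for agent in agents:                       # runs in parallel in practice
            policies[agent.id] = solve_local(agent.model)    # local planning
        messages = evaluate_interactions(agents, policies)   # interaction exchange
        for agent in agents:
            shape_model(agent, messages)           # model shaping
    return policies
```

Task allocation would run once before this loop (or inside it); any decentralized mechanism such as an auction fits that slot.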
Two instantiations of DIMS:
– Distributed Prioritized Planning (DPP)
– Distributed Team REshaping of MOdels for Rapid execution (D-TREMOR)
“Decentralized prioritized planning in large multirobot teams,” IROS 2010.
“Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents,” AAMAS 2011 (in submission).
(Example: a multirobot path planning problem with Start and Goal locations)
DPP instantiates the DIMS stages as:
– Task Allocation: given
– Local Planning: A* search
– Interaction Exchange: path messages
– Model Shaping: prioritized configuration-time obstacles
Classic prioritized planning [van den Berg, et al 2005]:
– Agents plan sequentially in priority order, each avoiding the space-time paths of higher-priority agents
– Takes n sequential steps for n agents
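A sketch of this sequential scheme in the spirit of [van den Berg 2005]: each agent plans in turn, treating the (cell, time) pairs claimed by higher-priority agents as moving obstacles. The open grid, BFS, and conflict checks are illustrative simplifications (vertex and swap conflicts only; goal occupancy after arrival is not reserved), not the DPP implementation:

```python
from collections import deque

def plan_path(start, goal, size, reserved, max_t=30):
    """BFS over (cell, time) on an open size x size grid.
    reserved: set of (cell, time) pairs claimed by higher-priority agents."""
    frontier = deque([(start, 0, [start])])
    seen = {(start, 0)}
    while frontier:
        cell, t, path = frontier.popleft()
        if cell == goal:
            return path
        x, y = cell
        for nxt in ((x, y), (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size):
                continue
            if (nxt, t + 1) in reserved:                      # vertex conflict
                continue
            if (nxt, t) in reserved and (cell, t + 1) in reserved:
                continue                                      # swap (edge) conflict
            if (nxt, t + 1) not in seen and t + 1 <= max_t:
                seen.add((nxt, t + 1))
                frontier.append((nxt, t + 1, path + [nxt]))
    return None

def prioritized_plan(tasks, size):
    """tasks: list of (start, goal), highest priority first."""
    reserved, paths = set(), []
    for start, goal in tasks:
        path = plan_path(start, goal, size, reserved)
        paths.append(path)
        if path is None:
            continue
        reserved |= {(c, t) for t, c in enumerate(path)}
    return paths
```

Because the agents plan one after another, n agents take n sequential planning steps, which is the bottleneck DPP attacks.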
(DPP loop: agents alternate Local Planning and Interaction Exchange in parallel)
Agent Priority
Model Shaping
Varying team size:
– # robots varied: {40, 60, 80, 120, 160, 240}
– Map density constant: 8 cells per robot
Varying density:
– # robots constant: 240
– Map density varied: {32, 24, 16, 12, 8} cells per robot
(Plots: path cost as a proportion of independent path cost, varying team size and varying density at 240 agents; legend: Centralized, Reduced, DPP)
Both centralized and distributed PP are near-optimal.
(Plots: number of sequential planning iterations vs. number of robots, varying team size and varying density at 240 agents. Centralized Prioritized Planning takes 50-240 iterations in the team-size sweep and 240 iterations in the density sweep.)
DPP’s sequential iterations are a fraction of team size
– Same quality as centralized PP
– Able to handle large numbers of collision interactions
– Far fewer sequential planning iterations
Distributed Team REshaping of MOdels for Rapid execution (D-TREMOR)
(Example domain: rescue agents, cleaner agents, narrow corridors, victims, unsafe cells, and clearable debris)
D-TREMOR extends TREMOR [Varakantham, et al 2009].
D-TREMOR instantiates the DIMS stages as (extending [Varakantham, et al 2009]):
– Task Allocation: decentralized auction
– Local Planning: EVA POMDP solver
– Interaction Exchange: policy sub-sampling and Coordination Locale (CL) messages
– Model Shaping: prioritized/randomized reward and transition shaping
Interaction functions (including explicit time constraints) are constructed implicitly.
Task Allocation:
– Greedy, nearest allocation
Each Coordination Locale (CL) has an associated local state and action s_i, a_i.
Estimating interaction probability by policy sub-sampling [Kearns 2002]:
– Sample executions of the local policy without interactions
– Test interactions independently
– PrCL = fraction of sampled runs in which the interaction occurred
– Example: entered corridor in 95 of 100 runs, so PrCL_i = 0.95
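This sampling step can be sketched as follows; `sample_run` and the locale keys are assumed names, and the estimator simply counts visit frequencies over sampled executions, in the spirit of the [Kearns 2002] sampling approach:

```python
import random

def estimate_cl_probabilities(sample_run, locales, n_runs=100, seed=0):
    """Estimate PrCL for each coordination locale by sampling executions
    of the local policy. sample_run(rng) returns the set of locales
    visited in one simulated execution."""
    rng = random.Random(seed)
    counts = dict.fromkeys(locales, 0)
    for _ in range(n_runs):
        visited = sample_run(rng)
        for cl in locales:
            if cl in visited:
                counts[cl] += 1
    # Visit frequency is the Monte Carlo estimate of PrCL.
    return {cl: counts[cl] / n_runs for cl in locales}
```

With the slide's example, a policy that enters the corridor in 95 of 100 sampled runs yields PrCL = 0.95.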
Estimating interaction value from the same sampled runs [Kearns 2002]:
– Compare sampled policy value between runs with and without the interaction (collision vs. no collision)
– Example: ValCL_i = -7
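One way to read the ValCL example above: the value of an interaction is the expected change it causes in the local policy's value, estimated from sampled runs. The function name and sign convention here are assumptions:

```python
def estimate_cl_value(values_with_interaction, values_without_interaction):
    """ValCL: difference in mean sampled policy value between runs where
    the interaction occurred and runs where it did not."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(values_with_interaction) - mean(values_without_interaction)
```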
(Model shaping blends independent and interaction model functions, weighted by the probability of interaction)
Prioritized shaping:
– The majority of interactions are collisions
– Assign priorities to agents; only model-shape collision interactions for higher-priority agents
– From DPP: prioritization can quickly resolve collision interactions
– Similar properties hold for any purely negative interaction (agents have a lower-valued local policy if an interaction occurs)
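A minimal sketch of the reward half of model shaping, under assumed names (D-TREMOR also shapes transitions, which this omits): a lower-priority agent folds the expected cost of each reported interaction into its local reward function.

```python
def shape_rewards(rewards, cl_messages):
    """rewards: dict mapping (state, action) -> local reward.
    cl_messages: (state, action, pr_cl, val_cl) tuples received from
    higher-priority agents. Returns a shaped copy of the reward table."""
    shaped = dict(rewards)
    for state, action, pr_cl, val_cl in cl_messages:
        # Add the expected effect of the interaction, weighted by its probability.
        shaped[(state, action)] = shaped.get((state, action), 0.0) + pr_cl * val_cl
    return shaped
```

A purely negative interaction (val_cl < 0) makes the shaped locale less attractive, so the local planner routes around it.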
Stochastic shaping:
– Cycles between agent policies are often caused by time dynamics between agents
– To break out of these cycles, each agent applies model shaping only with probability δ [Zhang 2005]
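The δ-rule is small in code; a sketch with assumed names:

```python
import random

def stochastic_shaping(agents, messages, shape_model, delta=0.5, rng=None):
    """Each agent applies model shaping only with probability delta
    [Zhang 2005], which breaks cycles where agents keep reacting to
    each other's previous policies. Returns the agents that shaped."""
    rng = rng or random.Random()
    shaped = []
    for agent in agents:
        if rng.random() < delta:
            shape_model(agent, messages)
            shaped.append(agent)
    return shaped
```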
Optimistic initialization:
– Agents cannot detect mixed interactions (e.g. debris): if PrCL and ValCL start low, agents do nothing, so PrCL and ValCL stay low
– Solution: let each agent solve an initial model that uses an optimistic estimate of interactions
Experiments:
– Scaling dataset: 10 to 100 agents, random maps
– Density dataset: 100 agents, concentric ring maps
– Policies evaluated: max-joint-value and last-iteration
– Baselines: independent, optimistic, do-nothing, random
D-TREMOR produces reasonable policies for 100-agent planning problems in under 6 hrs.
(with some caveats)
(Plot: normalized joint value vs. number of agents)
(Plot: average joint value vs. number of rings, comparing naive policies (independent, optimistic, do-nothing, random) with D-TREMOR policies (max-joint-value, last-iteration))
Do-nothing does the best?
Ignoring interactions = poor performance
(Plots: average number of collisions and average number of victims rescued vs. number of rings, for the same policies)
D-TREMOR rescues the most victims.
D-TREMOR does not resolve every collision.
(Plot: time per iteration (min) vs. number of agents)
Why is this increasing?
(Plots: time per iteration (min) and number of active CLs per agent vs. number of agents)
Increase in time related to # of CLs, not # of agents
– Partially-observable, uncertain world
– Multiple types of interactions & agents
DPP and D-TREMOR across the DIMS stages:
– Task Allocation: given (DPP) vs. auction (D-TREMOR)
– Local Planning: A* on a graph (DPP) vs. EVA POMDP solver (D-TREMOR)
– Interaction Exchange: policy evaluation with prioritized exchange (DPP) vs. policy sub-sampling with full exchange (D-TREMOR)
– Model Shaping: prioritized shaping (DPP) vs. stochastic shaping with optimistic initialization (D-TREMOR)
Proposed domains:
– Search & Rescue
– Humanitarian Convoy
(Diagram: DPP and D-TREMOR as instantiations of the DIMS framework, with open questions remaining)
– Start with identifying differences between interactions in preliminary work:
Model-shaping terms used:
– Collisions (DPP): reward only
– Collisions (D-TREMOR): negative reward + transition
– Debris clearing: transition (+ delay)
Policy effects:
– Collisions (DPP): negative
– Collisions (D-TREMOR): negative
– Debris clearing: mixed
(Diagram: which model-shaping technique, prioritized shaping, stochastic shaping, or optimistic initialization, suits which interaction type remains an open question)
– e.g. Proved that prioritization converges in n iterations for negative interactions
Planned experiments (in order of increasing uncertainty):
– Domains: Search and Rescue (simple model, USARSim), Humanitarian Convoy (simple model, VBS2)
– Models: Graph (DPP), MDP, POMDP (D-TREMOR)
Schedule:
– Nov 2010 – Feb 2011: Develop classification of interactions
– Feb 2011 – Mar 2011: Design heuristics for common interactions
– Mar 2011 – Jul 2011: Implementation of DIMS solver
– Jul 2011 – Oct 2011: Rescue experiments
– Oct 2011 – Jan 2012: Convoy experiments
– Feb 2012 – May 2012: Thesis preparation
– May 2012: Defend thesis
Expected contributions:
– An approach to planning problems in large teams with sparse interactions
– A single framework, applied to path planning, MDP, and POMDP models
– Demonstrations on teams of at least 100 agents across two domains
– A classification of interactions and features for distinguishing interaction behaviors
The SI-Dec-POMDP model:
– Extends the DPCL model
– Adds observational interactions
– Time integrated in state rather than explicit
– Adds complex transitions and observations
– Allows simultaneous interaction (within an epoch)
– Adds interactions that span epochs
Distributed Iterative Model Shaping (steps):
– Task Allocation: assign tasks to agents and create local sub-problems
– Local Planning: use a local solver to find an optimal solution to each sub-problem
– Interaction Exchange: compute and exchange the probability and expected value of interactions
– Model Shaping: alter the local sub-problem to incorporate non-local effects
Simulation parameters:
– P_Debris = 0.9
– P_ActionFailure = 0.2
– P_ObsSuccessOnSuccess = 0.8
– P_ObsSuccessOnFailure = 0.2
– P_ReboundAfterCollision = 0.5
– R_Victim = 10.0
– R_Cleaning = 0.25
– R_Move = -0.5
– R_Observe = -0.25
– R_Collision = -5.0
– R_Unsafe = -1