Planning for Cooperative Multiple Agents with Sparse Interactions


SLIDE 1

Planning for Cooperative Multiple Agents with Sparse Interactions

Guy Revach

Supervised by Professor Nahum Shimkin Technion - Israel Institute of Technology The Andrew and Erna Viterbi Faculty of Electrical Engineering

November 21, 2017

Guy Revach (Technion IIT) Cooperative Planning November 21, 2017 1 / 28

SLIDE 2

Overview

We consider the problem of planning for multiple cooperative agents in a deterministic environment under a finite time horizon. We suggest a model of interacting agents that partially decouples the agents in the group, using the notion of soft cooperation constraints. We present a two-step planning algorithm that decomposes a K-agent problem into K independent single-agent problems, such that the aggregation of the single-agent plans is optimal for the group. We suggest an efficient algorithm for computing a response function and a parametric policy under soft and hard constraints, and we utilize a well-known graphical model for efficient min-sum computation. The planning algorithm is complete, optimal, and efficient when interactions among the agents are sparse.

SLIDE 3

Cooperative Multi-Agents

A multi-agent system (MAS) is a distributed system composed of multiple interacting intelligent agents within a shared environment. Agents with a common goal work together to carry out tasks or solve problems that are difficult or impossible for a single agent. The decision making of an individual agent depends on the actions of the other agents. Agents need to interact and coordinate to ensure that individual decisions result in jointly optimal decisions for the group. Diversity among agents (spatial, temporal, functional) is a major driver for distributing the execution of the main task.

SLIDE 4

Cooperative Multi-Agent Planning (MAP)

MAP is the process of distributing tasks and coordinating the resources and activities of multiple agents. It applies to many domains, and each domain may call for a different planning strategy according to the problem's assumptions and specifications. Optimizing performance according to team objectives is NP-hard, and the problem resembles well-known, intractable problems such as network flow, the multi-depot multiple traveling salesman problem, and the vehicle routing problem. It is a challenging problem, requiring considerably more computing resources than the more restrictive single-agent planning. Since interest in this problem is growing in both theory and practice, there is a need for an efficient mechanism for coordinating the agents' actions so as to optimize their joint performance measure.

SLIDE 5

Coupling Level and Sparse Interactions

The inherent complexity of a MAP task is often described by its coupling level (Brafman and Domshlak, 2008); that is, the number of interactions that arise among agents during the resolution of the task. Sparsely coupled tasks require few interactions among agents and are perceived as easier to plan for, whereas tightly coupled tasks require a large number of interactions to obtain a solution plan. Some methods take a general approach and are equally effective regardless of the coupling level; in other approaches, the algorithm's complexity scales with the coupling level. The main question is to what extent planning for a MAS is harder than solving the individual planning problems over the domain of each agent in isolation.

SLIDE 6

Planning Approaches

In the coupled approach, planning is formalized as a global search, usually with a single centralized decision-maker that plans simultaneously for all agents. Although complete, optimal, and easier to design, it usually leads to a large optimization problem that is computationally intensive and may not scale well in practice. The decoupled approach decouples the decisions to some degree by decomposing the problem into several sub-problems; an advanced planner may leverage the distributed structure of MAP tasks to improve efficiency, though theoretical optimality, and sometimes even completeness, may be traded off for it. In the plan merging approach, each agent first plans locally, retaining some degrees of freedom as free parameters, and then a single centralized entity coordinates and merges the individual solutions into a globally optimal joint solution.

SLIDE 7

Related Work on Coordination Graphs

Guestrin, Koller, and Parr worked on MAP with factored MDPs, where the idea of cooperative action selection via coordination graphs is presented. A group of multiple cooperative agents, each with its own set of possible actions and its own observations, coordinates and globally selects an optimal joint action to achieve a common goal, maximizing their joint long-term utility in an MDP.

SLIDE 8

Motivation for MAP Model

The standard formulation is generic, but tightly coupled: the complexity of the standard planning algorithm is exponential in the number of agents. Our model is motivated by sparsely coupled real-world problems, and we aim to decouple the agents to obtain an efficient algorithm. The military, for example, needs to optimally coordinate tactical units, thereby sharing limited resources, increasing safety, and conserving energy.

SLIDE 9

Multi-Agent Standard Model

G is a group of K agents: G = {g1, ..., gk, ..., gK}
T is the time domain with finite horizon T: t ∈ T = {0, 1, ..., T}
SK is a factored state space: SK = S1 × ... × SK
σI, σ∗ ∈ SK are the source and target state vectors
AK is a factored action space: AK = A1 × ... × AK
HK is a deterministic factored state transition function for the group:

HK = H1 × ... × HK : SK × AK → SK , st+1 = HK(st, at) (1)

Ck is a coupled, time-dependent, immediate cost function for gk:

Ck : SK × AK × T → R ∪ {∞} , Jt,k = Ck(st, at, t) (2)

Agents are coupled via the transition function and via the cost function.
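As a concrete (toy) illustration of Eqs. (1)-(2), the sketch below instantiates the standard model for K = 2 agents on a discrete line. The dynamics, unit move cost, and collision penalty are invented for illustration, not taken from the thesis.

```python
# Toy instantiation of the standard model of Eqs. (1)-(2) for K = 2 agents on
# a discrete line. In the standard model both the transition and the cost may
# couple the agents through the joint state; here only the cost does.

K = 2

def H(s, a):                          # s_{t+1} = H^K(s_t, a_t), Eq. (1)
    return tuple(s[k] + a[k] for k in range(K))

def C(k, s, a, t):                    # J_{t,k} = C_k(s_t, a_t, t), Eq. (2)
    move = abs(a[k])                  # 1 per unit of movement
    collision = 10 if s[0] == s[1] else 0   # coupling via the joint state
    return move + collision

def rollout_cost(s0, actions):
    """Cumulative group cost of an open-loop joint action sequence."""
    s, total = s0, 0
    for t, a in enumerate(actions):
        total += sum(C(k, s, a, t) for k in range(K))
        s = H(s, a)
    return total, s

total, sT = rollout_cost((0, 3), [(1, -1), (1, -1)])
```

Running the rollout from (0, 3) with the two joint actions above costs 4 (two moves per agent, no collision) and ends in state (2, 1).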

SLIDE 10

Multi-Agent Policy

Let πK ∈ ΠK be a deterministic factored policy for K agents. Given the current state, πK defines the action:

πK : SK × T → AK , at = πK(st, t) (3)

Let JπK be the cumulative cost of the policy πK under the finite horizon T and the termination constraint sT = σ∗:

JπK = ∑_{t=0}^{T−1} ∑_{k=1}^{K} Jt,k = ∑_{t=0}^{T−1} ∑_{k=1}^{K} Ck(st, at, t) for sT = σ∗ , and JπK = ∞ for sT ≠ σ∗ (4)

Given a source state s0 = σI, the objective is to find the optimal policy πK∗ such that JπK is minimal:

πK∗ ∈ arg min_{πK ∈ ΠK} {JπK} (5)
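The minimization of Eqs. (4)-(5) is a finite-horizon dynamic program. Below is a hedged sketch: backward induction over a tiny one-dimensional state space, with the terminal constraint encoded as infinite cost on every state other than the target. The states, actions, and unit move cost are illustrative.

```python
import math

# Hedged sketch of Eqs. (4)-(5): finite-horizon backward induction computing
# the optimal cumulative cost subject to the terminal constraint s_T = target
# (infeasible terminal states carry infinite cost).

STATES, ACTIONS = range(5), (-1, 0, 1)

def H(s, a):                          # deterministic transition, clipped
    return min(max(s + a, 0), 4)

def C(s, a, t):                       # immediate cost: 1 per move
    return abs(a)

def solve(T, source, target):
    """Return the optimal cost from source and the optimal policy table."""
    V = {T: {s: (0.0 if s == target else math.inf) for s in STATES}}
    policy = {}
    for t in range(T - 1, -1, -1):
        V[t], policy[t] = {}, {}
        for s in STATES:
            cost, act = min(((C(s, a, t) + V[t + 1][H(s, a)], a)
                             for a in ACTIONS), key=lambda x: x[0])
            V[t][s], policy[t][s] = cost, act
    return V[0][source], policy

cost, policy = solve(T=4, source=0, target=3)
```

From state 0, reaching state 3 in exactly 4 steps costs 3.0 (three moves and one stay); with a horizon of 2 the problem is infeasible and the cost is infinite.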

SLIDE 11

Our Assumptions

We consider the set of models in which a coordinator explicitly defines the coupling among agents by a set of soft cooperation constraints Ψ = {ψ1, ..., ψℓ, ..., ψL}. A constraint ψℓ defines a single interaction between an affecting agent g+ℓ and an affected agent g−ℓ. It is associated with a context (σ+ℓ, σ−ℓ, A−ℓ), a discounted cost C−ℓ, an activation function fℓ, and an interaction variable τℓ. A constraint may be chosen to be satisfied by the designated affecting agent, in which case the discounted cost is applied to the context for the designated affected agent. An independent transition function is assumed: Hk : Sk × Ak → Sk. By minimizing the joint additive total cost for the group, agents are driven to cooperative behavior only in case of need.

SLIDE 12

Interaction Variables

A valid assignment to the interaction variable τℓ corresponds to satisfying the constraint at a time within the horizon, while the null assignment T∅ corresponds to the case where the constraint is not satisfied.

τ is an interaction vector, and D is its domain, i.e., the cross space of the time domains:

τ = (τ1, τ2, ..., τL) ∈ T1 × T2 × ... × TL ≜ D (6)

Let L+k be the subset of all cooperation constraints in which agent gk appears as an affecting agent, and τ+k the respective vector of interaction variables:

L+k = {ℓ | g+ℓ = gk} , τ+k ≜ (τℓ)ℓ∈L+k ∈ ×_{ℓ∈L+k} Tℓ ≜ D+k (7)

The same applies to agent gk as an affected agent:

L−k = {ℓ | g−ℓ = gk} , τ−k ≜ (τℓ)ℓ∈L−k ∈ ×_{ℓ∈L−k} Tℓ ≜ D−k (8)
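For small cases the domain D of Eq. (6) can be enumerated directly. The sketch below builds it for L = 2 interaction variables, representing the null assignment T∅ as Python's None; the horizon value is illustrative.

```python
from itertools import product

# Sketch of the interaction-vector domain D of Eq. (6) for L = 2 constraints:
# each interaction variable ranges over the feasible satisfaction times plus
# the null assignment T∅ (represented here as None).

T_NULL = None                         # stands for the null assignment T∅
T = 4                                 # horizon (illustrative)

def time_domain():                    # Tℓ = {1, ..., T} ∪ {T∅}
    return list(range(1, T + 1)) + [T_NULL]

D = list(product(time_domain(), time_domain()))   # D = T1 × T2
```

With T = 4 each variable has 5 possible values, so |D| = 25, including the all-null assignment in which no constraint is satisfied.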

SLIDE 13

Affected Agent - Immediate Cost Function

The cost function Ck is partially coupled: it does not depend on the other agents, except for the context, which is explicitly defined by the set of constraints for which such dependencies exist. It depends on the group only indirectly, via the interaction variables.

Ck : Sk × Ak × T × D−k → R (9)

Ck(sk, ak, t, τ−k) =
    C−ℓ(ak, t)      if sk = σ−ℓ ∈ S−k , ak ∈ A−ℓ , and fℓ(t, τℓ) = 1 for some ℓ ∈ L−k
    C0,k(sk, ak, t) otherwise (10)

C0,k is the (baseline) cost with no consideration of interactions. C−ℓ is the (modified) cost, applicable only when the constraint is satisfied:

C−ℓ : A−ℓ × T → R (11)

fℓ is the activation function of the modified cost under constraint ψℓ:

fℓ : T × Tℓ → {0, 1} (12)

SLIDE 14

Single Agent - Interaction-Dependent Policy Cost

Given σI,k, σ∗k, and Ψ, we can define the interaction-dependent cumulative cost of the policy πk as the mapping Jπk_k : Dk → R ∪ {∞}:

Jπk_k(τk) = ∑_{t=0}^{T−1} Ck(st,k, at,k, t, τ−k) for Ψ̂+k(τ+k) = 1 and sT,k = σ∗k , and ∞ otherwise (13)

Here

Ψ̂+k(τ+k) = ∏_{ℓ∈L+k} ψ̂ℓ(τℓ) , ψ̂ℓ(τℓ) = 1 if τℓ = T∅ or sτℓ,k = σ+ℓ , and 0 otherwise (14)

The policy πk is given by the mapping

πk : Sk × T → Ak , at,k = πk(st,k, t) (15)

It has an indirect interaction dependency via the policy cost function.

SLIDE 15

Decomposable Multi-Agent Policy Cost

Given σI, σ∗, and Ψ, we define the cumulative cost of the policy πK:

JπK(τ) = ∑_{t=0}^{T−1} ∑_{k=1}^{K} Ck(st,k, at,k, t, τ−k) for ∏_{gk∈G} Ψ̂+k(τ+k) = 1 and sT = σ∗ , and ∞ otherwise (16)

Each single-agent cost function depends only on the single-agent policy. Therefore, the cost is decomposable, and we may switch the order of summation to compute it independently for each agent:

JπK(τ) = ∑_{k=1}^{K} Jπk_k(τk) (17)

We showed that cooperation constraints imply a partially coupled multi-agent cost function, which may be decomposed into a set of partially coupled single-agent cost functions.

SLIDE 16

Two-Step Policy Minimization - Step #1

Minimize the multi-agent cumulative cost under the L interaction constraints:

J∗ = min_{τ∈D} min_{πK∈ΠK} {JπK(τ)} (18)

We decouple the K multi-agent problem into K independent single-agent problems, such that the aggregation of the single-agent plans is optimal for the group. For any concrete realization of the interaction variables, compute the optimal τ-conditional multi-agent cost:

∀τ ∈ D , J∗(τ) = min_{πK∈ΠK} {JπK(τ)} = ∑_{k=1}^{K} min_{πk∈Πk} {Jπk_k(τk)} (19)

Each agent plans independently and computes its response function J∗k(τk) and its parametric policy π∗k(st,k, t; τk) with respect to its associated constraints.
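Step 1 (Eq. (19)) has each agent solve one independent single-agent problem per assignment of its interaction variables. The sketch below computes a response function by brute-force search over open-loop action sequences in a toy line world; a real implementation would use the DP algorithms of the following slides. All problem data (states, costs, context) are illustrative.

```python
import math
from itertools import product

# Hedged sketch of Step 1 (Eq. (19)): an agent tabulates its response
# function J*_k(tau) by solving one single-agent problem per assignment of
# its (single) interaction variable.

STATES, ACTIONS, T = range(4), (-1, 0, 1), 4
SOURCE, TARGET = 0, 0
SIGMA_PLUS = 1                        # context state the agent must visit

def H(s, a):                          # independent deterministic transition
    return min(max(s + a, 0), 3)

def response(tau):
    """J*_k(tau): optimal cost when the constraint is satisfied at time tau
    (tau = None is the null assignment: constraint not satisfied)."""
    best = math.inf
    for seq in product(ACTIONS, repeat=T):
        s, cost, states = SOURCE, 0, []
        for a in seq:
            states.append(s)          # states[t] = s_t
            cost += abs(a)            # 1 per move
            s = H(s, a)
        feasible = s == TARGET and (tau is None or states[tau] == SIGMA_PLUS)
        if feasible:
            best = min(best, cost)
    return best

J_star = {tau: response(tau) for tau in (None, 1, 2, 3)}
```

In this toy instance, not satisfying the constraint is free (the agent simply stays put), while committing to visit the context state at any time costs an extra round trip of 2.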

SLIDE 17

Two-Step Policy Minimization - Step #2

Centralized global plan merging is applied to find the optimal assignment of the interaction variables under the min-sum objective:

J∗ = min_{τ∈D} {J∗(τ)} = min_{τ∈D} { ∑_{k=1}^{K} J∗k(τk) } (20)

The global optimal multi-agent policy is then given by

πK∗ = {π∗1, π∗2, ..., π∗k, ..., π∗K} (21)

A factor graph, which captures the dependencies among cooperative agents and utilizes the internal structure of the problem, is applied together with a variable elimination algorithm for efficient min-sum optimization.
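Step 2 (Eq. (20)) can be sketched as a brute-force min-sum over D, which is exactly what variable elimination later avoids. The response tables J1 and J2 below are invented for illustration; J2 couples the two interaction variables.

```python
import math
from itertools import product

# Hedged sketch of Step 2 (Eq. (20)): given per-agent response functions,
# the coordinator picks the interaction-variable assignment minimizing their
# sum. Brute force over D is shown; the thesis applies variable elimination
# on a factor graph to avoid this exponential enumeration.

T_NULL = None                         # null assignment
DOMAIN = (T_NULL, 1, 2)

J1 = {T_NULL: 5, 1: 3, 2: 4}          # agent 1's response function J*_1(tau1)
# Agent 2's response J*_2(tau1, tau2): cheapest when the two timings agree.
J2 = {(t1, t2): abs((t1 or 0) - (t2 or 0))
      for t1 in DOMAIN for t2 in DOMAIN}

def merge():
    best, arg = math.inf, None
    for t1, t2 in product(DOMAIN, DOMAIN):
        total = J1[t1] + J2[(t1, t2)]
        if total < best:
            best, arg = total, (t1, t2)
    return best, arg

best, assignment = merge()
```

Here the optimum is J* = 3 at the assignment (1, 1): agent 1 prefers satisfying its constraint at time 1, and agent 2 is free to match it.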

SLIDE 18

Response Function - Affecting Agent - Single Constraint

For a given assignment of τ, J∗σ(τ) is the optimal cumulative cost from σI to σ∗ in T time steps, via σ.

1: for all τ ∈ {1, 2, ..., T} do
2:     Compute J∗(σI, σ, τ) using J∗(σI, σ, τ − 1).
3: for all τ ∈ {T − 1, ..., 0} do
4:     Compute V∗(σ, σ∗, τ) using V∗(σ, σ∗, τ + 1).
5: for all τ ∈ Tτ do
6:     if τ = T∅ then ⊲ namely, no constraint
7:         J∗σ(τ) = J∗ ⊲ we trivially have the unconstrained optimum
8:     else
9:         J∗σ(τ) = J∗(σI, σ, τ) + V∗(σ, σ∗, τ) ⊲ composed of the two parts

The time complexity is 2 · T(V∗, T) + O(T).
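The decomposition on this slide, J∗σ(τ) = J∗(σI, σ, τ) + V∗(σ, σ∗, τ), can be sketched with one forward and one backward DP on a toy deterministic line world; the dynamics, unit move costs, and concrete states are all illustrative.

```python
import math

# Hedged sketch: a forward DP gives the optimal cost of reaching the context
# state sigma in exactly tau steps, a backward DP gives the optimal
# cost-to-go from sigma at time tau under the terminal constraint, and
# their sum is the optimal constrained cost via sigma.

STATES, ACTIONS = range(5), (-1, 0, 1)

def H(s, a):                          # deterministic transition, clipped
    return min(max(s + a, 0), 4)

def forward(source, T):
    """J[tau][s]: optimal cost from source to s in exactly tau steps."""
    J = [{s: (0.0 if s == source else math.inf) for s in STATES}]
    for tau in range(1, T + 1):
        J.append({s: math.inf for s in STATES})
        for s in STATES:
            for a in ACTIONS:
                nxt = H(s, a)
                J[tau][nxt] = min(J[tau][nxt], J[tau - 1][s] + abs(a))
    return J

def backward(target, T):
    """V[tau][s]: optimal cost-to-go from s at time tau with s_T = target."""
    V = [None] * (T + 1)
    V[T] = {s: (0.0 if s == target else math.inf) for s in STATES}
    for tau in range(T - 1, -1, -1):
        V[tau] = {s: min(abs(a) + V[tau + 1][H(s, a)] for a in ACTIONS)
                  for s in STATES}
    return V

T, SIGMA = 5, 3                       # horizon and context state (toy values)
J, V = forward(0, T), backward(1, T)
J_sigma = {tau: J[tau][SIGMA] + V[tau][SIGMA] for tau in range(T + 1)}
```

In this instance the only feasible visit time is τ = 3 (three steps out to σ = 3, two steps back to the target 1), for a total cost of 5; every other τ yields an infinite entry.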

SLIDE 19

Response Function - Affecting Agent - Multiple Constraints: Time Complexity

Interaction cost: T(V∗, T) + O(L) (22)

Response function: L² · T² · T(V∗, T) + O(T^L) (23)

Assuming Bellman-Ford: L · T² · T(V∗B, T) + O(T^L) (24)

Assuming a stationary model: L · T(V∗B, T) + O(T^L) (25)

Practically speaking, this algorithm may be improved further.

SLIDE 20

Response Function - Affected Agent - Single Constraint

1: for all τ ∈ {1, ..., T} do
2:     Compute J∗(σI, σ, τ) using J∗(σI, σ, τ − 1).
3: for all τ ∈ {T − 1, ..., 0} do
4:     for all a ∈ A−(σ) do
5:         Compute V∗(H(σ, a), σ∗, τ) using V∗(H(σ, a), σ∗, τ + 1).
6: for all τ ∈ {T − 1, ..., 0} do
7:     V−(σ, σ∗, τ) = min_{a∈A−(σ)} {C−(σ, a, τ) + V∗(H(σ, a), σ∗, τ + 1)}.
8: for all τ ∈ Tτ do
9:     if τ = T∅ then J∗δ(τ) = J∗
10:    else
11:        J−σ(τ) = J∗(σI, σ, τ) + V−(σ, σ∗, τ)
12:        J∗δ(τ) = min {J−σ(τ), J∗}

Time complexity: (1 + γ̃) · T(V∗, T) + (γ̃ + 2) · O(T)
Assuming Bellman-Ford: T(V∗B, T) + (γ̃ + 2) · O(T)

SLIDE 21

Graphical Models and Factor Graphs

A factor graph KF is a bipartite graph that expresses how a multi-agent global cost function of several variables, J∗(τ), factors into a sum of single-agent local cost functions J∗k(τk). This factorization conveniently captures and visualizes the internal structure of the dependencies among agents, thereby enabling efficient computation. For example:

J∗(τ) = ∑_{k=1}^{3} J∗k(τk) , τ1 = (τ1, τ2, τ3) , τ2 = (τ2, τ4) , τ3 = (τ3, τ4) (26)

SLIDE 22

Variable Elimination (VE) Algorithm

VE is an exact, message-passing, dynamic-programming algorithm. It exploits the internal structure of the problem and reduces computation, making the calculation more efficient.

1: for all i ∈ {1, 2, ..., L} do ⊲ run until all variables are eliminated
2:     Pick a variable τℓi for elimination
3:     Apply a factor-sum operation on all factors involving the variable τℓi
4:     Generate a new single intermediate factor φi
5:     Eliminate the variable τℓi from φi and generate a new factor ξi

The time complexity is exponential in the intermediate factors' scope size m:

O((K + L) · d^m) ≤ O(d^L) , d ≜ max_ℓ {|Tℓ|} (27)

The graph structure and the VE order have a major effect on the efficiency of the algorithm.
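Min-sum variable elimination on the factor graph of Eq. (26) can be sketched directly: eliminating τ4 first produces an intermediate factor over (τ2, τ3), after which the remaining variables are minimized jointly. The numeric factor tables and the two-value domain below are invented for illustration.

```python
from itertools import product

# Hedged sketch of min-sum VE on the slide's factor graph
# J*(tau) = J1(t1,t2,t3) + J2(t2,t4) + J3(t3,t4).

D = (0, 1)                                       # toy domain for every tau_l

J1 = {v: sum(v) for v in product(D, repeat=3)}            # J1(t1, t2, t3)
J2 = {v: abs(v[0] - v[1]) for v in product(D, repeat=2)}  # J2(t2, t4)
J3 = {v: 2 * v[0] + v[1] for v in product(D, repeat=2)}   # J3(t3, t4)

# Step 1: eliminate t4 -- factor-sum over J2, J3, then minimize out t4,
# yielding the intermediate factor phi(t2, t3).
phi = {(t2, t3): min(J2[(t2, t4)] + J3[(t3, t4)] for t4 in D)
       for t2, t3 in product(D, repeat=2)}

# Step 2: minimize the remaining sum over (t1, t2, t3).
J_star = min(J1[(t1, t2, t3)] + phi[(t2, t3)]
             for t1, t2, t3 in product(D, repeat=3))

# Sanity check against brute-force enumeration of the full joint domain.
brute = min(J1[(t1, t2, t3)] + J2[(t2, t4)] + J3[(t3, t4)]
            for t1, t2, t3, t4 in product(D, repeat=4))
assert J_star == brute
```

Eliminating τ4 first keeps the largest intermediate scope at m = 2, while brute force enumerates d⁴ joint assignments; this is exactly the gap Eq. (27) quantifies.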

SLIDE 23

Complexity with Respect to Elimination Ordering

In very general terms, the maximal scope size can be characterized by the maximal clique size in the induced graph, where the induced graph is an outcome of the elimination process. For a given elimination ordering ≺, the time complexity is determined by the width of the induced graph. Therefore, finding an elimination order that minimizes the width is a major concern for algorithm efficiency. We may conclude that when the factor graph is a tree, variables can be eliminated efficiently. A tree-like factor graph arises in many practical cases where there is a hierarchy among agents. More details are in the thesis and in the literature on probabilistic graphical models (e.g., Koller and Friedman).

SLIDE 24

Total Time Complexity and Coupling Measure

For a given fixed number of constraints, we get better time complexity when there are more agents in the system. Assuming a uniform spreading of constraints over agents, the inherent coupling of the system is measured by

ρ(L, K) = 2 · L / K (28)

Concrete examples:

ρ(4, 8) = 1 , ρ(4, 4) = 2 , ρ(4, 2) = 4 (29)

We characterize the total complexity as

O( K · ρ · ( T² · T(V∗B, T) + T^ρ ) ) (30)

Examples of sparse cases with tree factor graphs are in the thesis.
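The coupling measure of Eq. (28), with the concrete values of Eq. (29), in code:

```python
# The coupling measure of Eq. (28); the values of Eq. (29) follow directly.

def rho(L, K):
    """Average number of constraint endpoints per agent: 2L / K."""
    return 2 * L / K

values = [rho(4, 8), rho(4, 4), rho(4, 2)]   # -> [1.0, 2.0, 4.0]
```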

SLIDE 25

Simulation

SLIDE 26

Simulation

SLIDE 27

Summary

As a result of the independence among agents, our algorithm may be decentralized and deployed in a distributed manner on multiple computing resources. Complexity analysis shows that the overall algorithm is linear in the number of agents, polynomial in the span of the time horizon, and dependent on the number of interactions among agents. Preliminary simulations show that the algorithm is efficient for this particular multi-agent setup and robust across different graph sizes and numbers of agents when compared to a standard solution.

SLIDE 28

Future Work

  • Model: many-to-many interactions; a stochastic model; decreasing the dependency on the time span, e.g., by sub-sampling.
  • Computing the response function: more types of activation functions; applying state-of-the-art single-agent planning; estimation and anytime algorithms, e.g., feature selection.
  • Plan merging: factor graph structure analysis; comparing VE to state-of-the-art plan merging algorithms; distributed computation of VE.
  • Implementation and simulation: distributed computation of the response function; testing on classical and practical benchmarks; comparison against state-of-the-art planning algorithms.
