SLIDE 1

Communicating with Unknown Teammates

Samuel Barrett¹, Noa Agmon², Noam Hazon³, Sarit Kraus²,⁴, Peter Stone¹

¹ University of Texas at Austin, {sbarrett,pstone}@cs.utexas.edu
² Bar-Ilan University, {agmon,sarit}@macs.biu.ac.il
³ Ariel University, noamh@ariel.ac.il
⁴ University of Maryland

ECAI Aug 21, 2014

  • S. Barrett, N. Agmon, N. Hazon, S. Kraus, P. Stone

Communicating with Unknown Teammates

SLIDE 2

Introduction Problem Description Theoretical Results Empirical Results Conclusions Ad Hoc Teamwork Motivation Example

Ad Hoc Teamwork

◮ Only in control of a single agent or subset of agents
◮ Unknown teammates
◮ No pre-coordination
◮ Shared goals

Examples in humans:

◮ Pick up soccer
◮ Accident response

SLIDE 4

Motivation

◮ Agents are becoming more common and lasting longer
◮ Both robots and software agents
◮ Pre-coordination may not be possible
◮ Agents should be robust to various teammates
◮ Past work focused on cases with no communication

Research Question: How can an agent act and communicate optimally with teammates of uncertain types?

SLIDE 9

Example

[Figure: example scenario. Labels: Ad Hoc Agent, Teammates. Message shown: "How long does the first road take?"]

SLIDE 10

Outline

1. Introduction
2. Problem Description
3. Theoretical Results
4. Empirical Results
5. Conclusions

SLIDE 15

Problem Description

◮ Multi-armed bandit
◮ Two Bernoulli arms
◮ Ad hoc agent observes all payoffs
◮ Multi-agent
◮ Simultaneous actions
◮ Limited communication
◮ Fixed set of messages
◮ Has explicit cost
◮ Goal: Maximize payoffs and minimize communication costs
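
The setting above can be sketched as a small simulator. This is only an illustration, not the authors' code: the class and field names are hypothetical, and combining the two goals as "total payoff minus message cost" is an assumption, since the slides state the goal only informally.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RoundResult:
    payoffs: List[int]    # one 0/1 payoff per agent, in agent order
    message_cost: float   # total cost paid for messages this round

    @property
    def net_reward(self) -> float:
        # Assumed objective: team payoff minus communication cost.
        return sum(self.payoffs) - self.message_cost


class TwoArmedBernoulliBandit:
    """Two Bernoulli arms; all agents choose an arm simultaneously."""

    def __init__(self, arm_means: Tuple[float, float], seed: int = 0):
        self.arm_means = arm_means
        self.rng = random.Random(seed)

    def step(self, arm_choices: List[int], messages_sent: int,
             cost_per_message: float) -> RoundResult:
        # Each pull is an independent Bernoulli draw from the chosen arm.
        payoffs = [int(self.rng.random() < self.arm_means[arm])
                   for arm in arm_choices]
        return RoundResult(payoffs, messages_sent * cost_per_message)
```

Because the ad hoc agent observes all payoffs, `RoundResult.payoffs` exposes every agent's result rather than only its own.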

SLIDE 17

Communication

◮ Last observation - The last arm chosen and the resulting payoff
◮ Arm mean - The mean and number of pulls of a selected arm
◮ Suggestion - Suggest that your teammates should pull the selected arm
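
One way to picture the three message types is as small immutable records. The type and field names below are hypothetical and chosen only to mirror the descriptions above; the slides do not specify a wire format.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class LastObservation:
    arm: int      # the last arm the sender pulled
    payoff: int   # the 0/1 payoff it received


@dataclass(frozen=True)
class ArmMean:
    arm: int      # the arm being reported on
    mean: float   # observed mean payoff of that arm
    pulls: int    # number of pulls the mean is based on


@dataclass(frozen=True)
class Suggestion:
    arm: int      # the arm the sender suggests teammates pull


Message = Union[LastObservation, ArmMean, Suggestion]


def arm_mean_message(arm: int, payoffs: List[int]) -> ArmMean:
    """Build an ArmMean message from the 0/1 payoffs observed for `arm`."""
    if not payoffs:
        raise ValueError("no pulls observed for this arm")
    return ArmMean(arm=arm, mean=sum(payoffs) / len(payoffs), pulls=len(payoffs))
```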

SLIDE 19

Teammates

◮ Limited number of types
◮ Continuous parameters
◮ Tightly coordinated
◮ Team shares knowledge through communication
◮ Do not need to track each agent’s pulls

SLIDE 23

Teammate Behaviors

ε-Greedy

◮ Track arm means
◮ Usually choose greedily
◮ ε - fraction of time to explore

UCB(c)

◮ Track arm means and pulls
◮ Choose greedily with respect to bounds
◮ c - weight given to bounds

◮ Have probability of following suggestion sent by ad hoc agent
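
The two teammate types can be sketched as arm-selection rules. The slides do not give the exact bound used by UCB(c); the form `mean + c * sqrt(ln(total pulls) / pulls of arm)` below is an assumption based on the standard UCB1 rule, and all function names are hypothetical.

```python
import math
import random
from typing import List, Optional


def epsilon_greedy_choice(means: List[float], epsilon: float,
                          rng: random.Random) -> int:
    # With probability epsilon explore uniformly; otherwise pick the best mean.
    if rng.random() < epsilon:
        return rng.randrange(len(means))
    return max(range(len(means)), key=lambda a: means[a])


def ucb_choice(means: List[float], pulls: List[int], c: float) -> int:
    total = sum(pulls)

    def bound(a: int) -> float:
        if pulls[a] == 0:
            return math.inf  # untried arms are tried first
        # Assumed bound: c weights the exploration bonus.
        return means[a] + c * math.sqrt(math.log(total) / pulls[a])

    return max(range(len(means)), key=bound)


def teammate_choice(own_choice: int, suggestion: Optional[int],
                    follow_prob: float, rng: random.Random) -> int:
    # A teammate follows the ad hoc agent's suggestion with some probability.
    if suggestion is not None and rng.random() < follow_prob:
        return suggestion
    return own_choice
```

Setting `epsilon = 0` or `c = 0` reduces both rules to purely greedy play, which is a quick sanity check on the sketch.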

SLIDE 25

Research Question

Can an ad hoc agent approximately plan to communicate optimally with these teammates in polynomial time?

SLIDE 28

Model

◮ Model as a POMDP (teammates’ behaviors)
◮ State:
  ◮ Pulls and successes:
    ◮ Teammates’
    ◮ Ad hoc agent’s
    ◮ Communicated
  ◮ Types and parameters of teammates (partially observed)
◮ Actions are arms to choose and messages to send
◮ Transition function is based on arms’ distributions and teammates’ behaviors
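
The state described above splits into a fully observed part (pull and success counts) and a hidden part (teammate type and parameters). A minimal sketch of that split, with hypothetical names and assuming two arms as in the problem description:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ArmCounts:
    pulls: Tuple[int, int] = (0, 0)       # pulls of arm 0 and arm 1
    successes: Tuple[int, int] = (0, 0)   # payoffs of 1 on arm 0 and arm 1


def record_pull(counts: ArmCounts, arm: int, payoff: int) -> ArmCounts:
    """Return new counts after one pull of `arm` with a 0/1 payoff."""
    pulls = list(counts.pulls)
    successes = list(counts.successes)
    pulls[arm] += 1
    successes[arm] += payoff
    return ArmCounts((pulls[0], pulls[1]), (successes[0], successes[1]))


@dataclass(frozen=True)
class ObservedState:
    team: ArmCounts           # teammates' pulls and successes
    ad_hoc: ArmCounts         # the ad hoc agent's own pulls and successes
    communicated: ArmCounts   # pulls and successes already communicated


@dataclass(frozen=True)
class TeammateHypothesis:
    kind: str           # "epsilon_greedy" or "ucb" (hidden from the agent)
    parameter: float    # epsilon or c
    follow_prob: float  # chance of following a suggestion
```

The ad hoc agent never sees a `TeammateHypothesis` directly; it only maintains a belief over such hypotheses, which is what makes the problem a POMDP rather than an MDP.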

SLIDE 32

Simple Version

◮ What if we know the teammates’ behaviors?
◮ Problem simplifies to an MDP
◮ What is the size of the state space?
◮ Team is tightly coordinated ⇒ only track pulls and successes of team
◮ Track team’s, ad hoc agent’s, and communicated pulls
◮ Polynomial in terms of number of teammates and rounds
◮ Solvable in polynomial time
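
To see why the count stays polynomial, consider a deliberately loose bound (an illustration, not the bound from the paper). If one arm has been pulled at most n times, there are (n+1)(n+2)/2 possible (pulls, successes) pairs. Tracking three sets of counts (team, ad hoc agent, communicated) over two arms multiplies six such factors together, which is still a polynomial in n. Here n can be taken as at most the number of rounds times the number of agents; the function names are hypothetical.

```python
def count_pairs(max_pulls: int) -> int:
    """Number of (pulls, successes) pairs with successes <= pulls <= max_pulls."""
    return sum(pulls + 1 for pulls in range(max_pulls + 1))


def loose_state_bound(max_pulls: int, trackers: int = 3, arms: int = 2) -> int:
    # One independent (pulls, successes) pair per tracker and per arm.
    return count_pairs(max_pulls) ** (trackers * arms)
```

For example, `count_pairs(n)` equals `(n + 1) * (n + 2) // 2`, so the bound grows like n to the 12th power rather than exponentially.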

SLIDE 33

Full version

◮ Do not fully know teammates’ behaviors
◮ Know teammates are either ε-greedy or UCB(c)
◮ Do not know ε or c
◮ Problem is a POMDP

SLIDE 34

Background

◮ POMDPs can be approximately solved in polynomial time in terms of the number of δ-neighborhoods that can cover the belief space (aka the covering number)

  • H. Kurniawati, D. Hsu, and W. S. Lee. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Proc. Robotics: Science and Systems, 2008

SLIDE 35

δ-neighborhood

SLIDE 41

Proof Sketch

◮ Observable part of the state adds a polynomial factor
◮ Only need to worry about the partially observed teammates
◮ Belief space of ε can be represented as a beta distribution
◮ Belief space of c can be represented by the upper and lower possible values
◮ Can track probability of ε-greedy vs UCB using Bayes updates
◮ Covering number of belief space is polynomial ⇒ POMDP can be solved in polynomial time
◮ Results carry over into case of unknown arm means
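
The three belief representations in the proof sketch can be illustrated with small update rules. Everything here is a sketch under stated assumptions, with hypothetical names. The beta belief is kept over q, the probability that a teammate's pull is non-greedy; if exploration picks uniformly between the two arms, q = ε/2. The interval update assumes UCB bounds of the form mean + c · width, where the width is the arm's exploration bonus.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class BetaBelief:
    alpha: float = 1.0  # 1 + number of non-greedy pulls seen
    beta: float = 1.0   # 1 + number of greedy pulls seen

    def update(self, chose_non_greedy: bool) -> None:
        if chose_non_greedy:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def narrow_c_interval(low: float, high: float,
                      chosen_mean: float, chosen_width: float,
                      other_mean: float, other_width: float) -> Tuple[float, float]:
    """Shrink [low, high] for c given that UCB(c) preferred the chosen arm.

    The choice implies chosen_mean + c*chosen_width >= other_mean + c*other_width.
    """
    diff = chosen_width - other_width
    if diff > 0:
        low = max(low, (other_mean - chosen_mean) / diff)
    elif diff < 0:
        high = min(high, (chosen_mean - other_mean) / (-diff))
    return low, high


def update_type_posterior(prior_eps_greedy: float,
                          lik_eps_greedy: float, lik_ucb: float) -> float:
    """Bayes' rule for P(teammates are epsilon-greedy | observed pull)."""
    evidence = (prior_eps_greedy * lik_eps_greedy
                + (1 - prior_eps_greedy) * lik_ucb)
    if evidence == 0:
        return prior_eps_greedy  # impossible under both types; keep the prior
    return prior_eps_greedy * lik_eps_greedy / evidence
```

Each belief is summarized by a couple of numbers (two beta counts, two interval endpoints, one probability), which gives some intuition for why the belief space can be covered by a polynomial number of δ-neighborhoods.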

SLIDE 44

Approach

◮ POMDP problem is tractable ⇒ we can use existing POMDP solvers
◮ POMCP
◮ Particle filtering to track beliefs
◮ Monte Carlo tree search to plan
◮ Fast
◮ Handles large state-action spaces
◮ Approximate

  • D. Silver and J. Veness. Monte-Carlo planning in large POMDPs. In NIPS ’10. 2010
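
The particle-filtering half of this approach can be sketched as a rejection-sampling belief update: sample a hypothesis from the current particles, simulate what it would have produced, and keep it only if that matches what was actually observed. This is a simplified illustration with hypothetical names, not the POMCP implementation used in the talk, and it omits the tree-search half entirely.

```python
import random
from typing import Callable, List, Optional, TypeVar

P = TypeVar("P")  # a particle: one hypothesis about the hidden state
O = TypeVar("O")  # an observation


def update_particles(particles: List[P],
                     simulate_observation: Callable[[P, random.Random], O],
                     actual_observation: O,
                     rng: random.Random,
                     n_out: Optional[int] = None,
                     max_tries: int = 10_000) -> List[P]:
    """Resample particles whose simulated observation matches the real one."""
    n_out = len(particles) if n_out is None else n_out
    kept: List[P] = []
    tries = 0
    while len(kept) < n_out and tries < max_tries:
        tries += 1
        candidate = rng.choice(particles)
        if simulate_observation(candidate, rng) == actual_observation:
            kept.append(candidate)
    # If nothing matched within the budget, fall back to the old belief.
    return kept or list(particles)
```

For instance, if each particle is a candidate ε and the observation is whether a teammate explored, particles with ε values that rarely explain the observed behavior are gradually filtered out.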

SLIDE 45

Empirical Setup

◮ Vary message costs
◮ Vary number of rounds
◮ Vary number of arms
◮ Vary number of teammates

SLIDE 46

Ad Hoc Agent Behaviors

◮ POMCP - Plan using POMCP
◮ NoComm - Act greedily and do not communicate
◮ Obs - Act greedily and communicate the last observation

SLIDE 47

Problem Description

◮ Problem tackled in the theory
◮ Teammates are either ε-greedy or UCB(c)
◮ Need to figure out:
  ◮ Type
  ◮ Parameter (ε or c)
  ◮ Chance of following suggestion

SLIDE 48

ε-Greedy Teammates

[Plot: Frac of Max Reward (y-axis, 0.3 to 0.9) vs. Message Cost (x-axis, 0.08 to 2.56). Series: POMCP, NoComm, Obs]

SLIDE 49

UCB(c) Teammates

[Plot: Frac of Max Reward (y-axis, 0.3 to 0.9) vs. Message Cost (x-axis, 0.08 to 2.56). Series: POMCP, NoComm, Obs]

SLIDE 50

Unknown arms - ε-greedy or UCB(c)

[Plot: Frac of Max Reward (y-axis, 0.60 to 0.85) vs. Num Teammates (x-axis, 1 to 9). Series: POMCP, NoComm, Obs, Match]

SLIDE 52

Externally-created Teammates

◮ Teammates we did not create
◮ Created by students for project
◮ Not necessarily tightly coordinated
◮ Not considering ad hoc teamwork

SLIDE 53

Externally-created Teammates

◮ True ad hoc teamwork scenario
◮ Models are incorrect
◮ Theoretical guarantees do not hold

SLIDE 54

Externally-created Teammates – Cost

[Plot: Frac of Max Reward (y-axis, 0.0 to 0.8) vs. Message Cost (x-axis, 0.08 to 2.56). Series: POMCP, NoComm, Obs]

SLIDE 55

Externally-created Teammates – Num Teammates

[Plot: Frac of Max Reward (y-axis, 0.3 to 0.8) vs. Num Teammates (x-axis, 1 to 9). Series: POMCP, NoComm, Obs]

SLIDE 57

Related Work

  • S. Liemhetcharat and M. Veloso. Modeling mutual capabilities in heterogeneous teams for role assignment. In IROS ’11, pages 3638–3644, 2011
  • F. Wu, S. Zilberstein, and X. Chen. Online planning for ad hoc autonomous agent teams. In IJCAI, 2011
  • M. Bowling and P. McCracken. Coordination and adaptation in impromptu teams. In AAAI, pages 53–58, 2005
  • J. Han, M. Li, and L. Guo. Soft control on collective behavior of a group of autonomous agents by a shill agent. Journal of Systems Science and Complexity, 19:54–62, 2006
  • M. Knudson and K. Tumer. Robot coordination with ad-hoc team formation. In AAMAS ’10, pages 1441–1442, 2010
  • E. Jones, B. Browning, M. B. Dias, B. Argall, M. M. Veloso, and A. T. Stentz. Dynamically formed heterogeneous robot teams performing tightly-coordinated tasks. In ICRA, pages 570–575, May 2006

SLIDE 58

Conclusions

◮ Can optimally plan best way to communicate with unknown teammates
◮ Can handle an infinite set of possible teammates
◮ Can cooperate with a variety of teammates not covered in theory

SLIDE 59

Future Work

◮ More complex domains
◮ Unknown environments
◮ Teammates that learn about us

SLIDE 60

Thank You!

In some cases, ad hoc agents can optimally plan how to communicate with their teammates.
