slide-1
SLIDE 1

Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination

Peter Stone
Director, Learning Agents Research Group
Department of Computer Science
The University of Texas at Austin

Joint work with
Gal A. Kaminka, Sarit Kraus, Bar Ilan University
Jeffrey S. Rosenschein, Hebrew University

slide-2
SLIDE 2

Teamwork

© 2010 Peter Stone

slide-4
SLIDE 4

Teamwork

  • Typical scenario: pre-coordination

− People practice together
− Robots given coordination languages, protocols
− “Locker room agreement” [Stone & Veloso, ’99]

slide-9
SLIDE 9

Ad Hoc Teams

  • Ad hoc team player is an individual

− Unknown teammates (programmed by others)

  • May or may not be able to communicate
  • Teammates likely sub-optimal: no control

Challenge: Create a good team player

slide-10
SLIDE 10

Illustration

slide-11
SLIDE 11

An Individual

slide-12
SLIDE 12

With Teammates

slide-13
SLIDE 13

Made by Others

slide-14
SLIDE 14

Heterogeneous

slide-15
SLIDE 15

May not Communicate

slide-16
SLIDE 16

May Have Different Capabilities

slide-17
SLIDE 17

And/Or Maneuverability

slide-18
SLIDE 18

May be a Previously Unknown Type

slide-22
SLIDE 22

Human Ad Hoc Teams

  • Military and industrial settings

− Outsourcing

  • Agents support human ad hoc team formation

[Just et al., 2004; Kildare, 2004]

  • Autonomous agents (robots) deployed for short times

− Teams developed as cohesive groups
− Tuned to interact well together

slide-25
SLIDE 25

Challenge Statement

Create an autonomous agent that is able to efficiently and robustly collaborate with previously unknown teammates on tasks to which they are all individually capable of contributing as team members.

  • Aspects can be approached theoretically
  • Ultimately an empirical challenge

slide-26
SLIDE 26

Empirical Evaluation

a0

slide-28
SLIDE 28

Evaluation: A Metric

a0 a1

  • Most meaningful when a0 and a1 have similar individual competencies

slide-29
SLIDE 29

Evaluation: Domain Consisting of Tasks

a0 a1

D

slide-30
SLIDE 30

Evaluation: Set of Possible Teammates

a0 a1

A D

slide-31
SLIDE 31

Evaluation: Draw a Random Task

a0 a1

A D

slide-32
SLIDE 32

Evaluation: Random Team, Check Comp

a0 a1

A D

slide-33
SLIDE 33

Evaluation: Replace Random with a0

a0

a1

A D

slide-34
SLIDE 34

Evaluation: Then a1 — Evaluate Diff

a1

a0

A D

slide-35
SLIDE 35

Evaluation: Repeat

a0 a1

A D

slide-36
SLIDE 36

Evaluate(a0, a1, A, D)

  • Initialize performance (reward) counters r0 and r1 for agents a0 and a1 respectively to r0 = r1 = 0.

  • Repeat:

– Sample a task d from D.
– Randomly draw a subset of agents B, |B| ≥ 2, from A such that E[s(B, d)] ≥ smin.
– Randomly select one agent b ∈ B to remove from the team to create the team B−.
– Increment r0 by s({a0} ∪ B−, d).
– Increment r1 by s({a1} ∪ B−, d).

  • If r0 > r1 then we conclude that a0 is a better ad hoc team player than a1 in domain D over the set of possible teammates A.
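
The Evaluate procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `evaluate`, `score`, `n_rounds`, and `max_draws` are hypothetical, the `score` callable stands in for the expected team score E[s(B, d)], a fixed number of rounds replaces the open-ended "Repeat", and a0 and a1 are assumed not to appear in the teammate pool A.

```python
import random
from typing import Callable, FrozenSet, List, Optional, Tuple

Team = FrozenSet[str]
ScoreFn = Callable[[Team, str], float]  # s(team, task); stands in for E[s(B, d)]


def evaluate(a0: str, a1: str, teammates: List[str], tasks: List[str],
             score: ScoreFn, s_min: float, n_rounds: int = 100,
             rng: Optional[random.Random] = None,
             max_draws: int = 1000) -> Tuple[float, float]:
    """Return (r0, r1), the accumulated team scores with a0 and with a1."""
    rng = rng or random.Random()
    r0 = r1 = 0.0
    for _ in range(n_rounds):               # assumption: fixed number of repetitions
        d = rng.choice(tasks)               # sample a task d from D
        for _ in range(max_draws):          # rejection-sample a competent team B
            size = rng.randint(2, len(teammates))
            team = rng.sample(teammates, size)
            if score(frozenset(team), d) >= s_min:
                break
        else:
            continue                        # no competent team found; skip this round
        removed = rng.choice(team)          # agent b removed from B
        rest = frozenset(team) - {removed}  # the remaining team, B minus b
        r0 += score(rest | {a0}, d)
        r1 += score(rest | {a1}, d)
    return r0, r1


if __name__ == "__main__":
    # Toy domain with made-up skill values; the team score is the sum of skills.
    skill = {"a0": 5, "a1": 1, "t1": 2, "t2": 3, "t3": 4}

    def toy_score(team: Team, task: str) -> float:
        return float(sum(skill[agent] for agent in team))

    r0, r1 = evaluate("a0", "a1", ["t1", "t2", "t3"], ["task"], toy_score,
                      s_min=0.0, n_rounds=10, rng=random.Random(0))
    print("a0 is the better ad hoc team player" if r0 > r1
          else "no evidence that a0 is better")
```

Both candidates are scored against the same reduced team B− on the same task in every round, so differences between r0 and r1 come from the candidates rather than from the draw of teammates.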

slide-41
SLIDE 41

Technical Requirements

  • Assess capabilities of other agents (teammate modeling)
  • Assess the other agents’ knowledge states
  • Estimate effects of actions on teammates
  • Be prepared to interact with many types of teammates:

− May or may not be able to communicate
− May be more or less mobile
− May be better or worse at sensing

A good team player’s best actions will differ depending on its teammates’ characteristics.

slide-44
SLIDE 44

Preliminary Theoretical Progress

  • Aspects can be approached theoretically
  • Ultimately an empirical challenge

Be prepared to interact with many types of teammates

  • Minimal representative scenarios

− One teammate, no communication
− Fixed and known behavior

slide-45
SLIDE 45

Scenarios

  • Cooperative iterated normal form game [w/ Kaminka & Rosenschein, AMEC ’09]

M1   b0   b1   b2
a0   25    1    0
a1   10   30   10
a2    0   33   40

  • Cooperative k-armed bandit [w/ Kraus, AAMAS ’10]

slide-46
SLIDE 46

Scenarios

  • Cooperative normal form game

M1   b0   b1   b2
a0   25    1    0
a1   10   30   10
a2    0   33   40

  • Cooperative k-armed bandit

slide-47
SLIDE 47

3-armed bandit

  • Random value from a distribution
  • Expected value µ

slide-52
SLIDE 52

3-armed bandit

Arm∗ Arm1 Arm2

µ∗ > µ1 > µ2

  • Agent A: teacher

− Knows payoff distributions
− Objective: maximize expected sum of payoffs
− If alone, always Arm∗

  • Agent B: learner

− Can only pull Arm1 or Arm2
− Selects arm with highest observed sample average

slide-56
SLIDE 56

Assumptions

Arm∗ Arm1 Arm2

µ∗ > µ1 > µ2

  • Alternate actions (teacher first)
  • Results of all actions fully observable (to both)
  • Number of rounds remaining finite, known to teacher

Objective: maximize expected sum of payoffs
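
The setting above can be simulated with a short Python sketch. It is only an illustration under stated assumptions: the learner starts with at least one observed sample of Arm1 and Arm2, ties go to Arm1, the demo uses Gaussian payoffs with made-up means, and `teach_until_convinced` is a hypothetical heuristic teacher, not the optimal policy. All function and variable names are illustrative.

```python
import random
from statistics import mean
from typing import Callable, Dict, List

ARM_STAR, ARM1, ARM2 = "arm*", "arm1", "arm2"
History = Dict[str, List[float]]          # observed payoffs per arm, visible to both
Teacher = Callable[[History, int], str]   # (history, rounds remaining) -> arm


def learner_choice(history: History) -> str:
    """Learner: pull whichever of Arm1/Arm2 has the higher sample average."""
    # Assumption: ties are broken in favor of Arm1.
    return ARM1 if mean(history[ARM1]) >= mean(history[ARM2]) else ARM2


def always_best(history: History, rounds_left: int) -> str:
    """What the teacher would do if alone: always pull Arm*."""
    return ARM_STAR


def teach_until_convinced(history: History, rounds_left: int) -> str:
    """Hypothetical heuristic: pull Arm1 until the learner prefers it."""
    return ARM1 if learner_choice(history) == ARM2 else ARM_STAR


def simulate(teacher: Teacher, samplers: Dict[str, Callable[[], float]],
             initial: History, rounds: int) -> float:
    """Alternate teacher and learner pulls (teacher first); return total payoff."""
    history = {arm: list(initial.get(arm, [])) for arm in (ARM_STAR, ARM1, ARM2)}
    total = 0.0
    for t in range(rounds):
        for arm_of in (lambda: teacher(history, rounds - t),
                       lambda: learner_choice(history)):
            arm = arm_of()                 # the learner sees the teacher's result
            payoff = samplers[arm]()
            history[arm].append(payoff)
            total += payoff
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    mu = {ARM_STAR: 3.0, ARM1: 2.0, ARM2: 1.0}   # made-up means, mu* > mu1 > mu2
    samplers = {arm: (lambda m=m: rng.gauss(m, 1.0)) for arm, m in mu.items()}
    start = {ARM1: [0.5], ARM2: [1.0]}           # learner initially prefers Arm2
    for policy in (always_best, teach_until_convinced):
        print(policy.__name__, simulate(policy, samplers, start, rounds=10))
```

The teacher receives the number of rounds remaining because that quantity is known to it in this setting; the heuristic shown here ignores it, whereas a teacher that plans over the finite horizon would not.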

slide-61
SLIDE 61

Summary of Findings

Arm∗ Arm1 Arm2

µ∗ > µ1 > µ2

  • Arm1 is sometimes optimal
  • Arm2 is never optimal
  • Optimal solution when arms have discrete distribution
  • Interesting patterns in optimal action
  • Extensions to more arms
  • Exploitation vs. teaching

slide-62
SLIDE 62

Challenge Statement

Create an autonomous agent that is able to efficiently and robustly collaborate with previously unknown teammates on tasks to which they are all individually capable of contributing as team members.

slide-67
SLIDE 67

Suggested Research Plan

  • 1. Identify the full range of possible teamwork situations that a complete ad hoc team player needs to be capable of addressing (D and A).

  • 2. For each such situation, find theoretically optimal and/or empirically effective algorithms for behavior.

  • 3. Develop methods for identifying which type of teamwork situation the agent is currently in, in an online fashion.

  • 2 and 3: the core technical challenges
  • 1 and 3: a knob to incrementally increase difficulty

slide-68
SLIDE 68

Related Work

Multiagent learning [Claus & Boutilier, ’98], [Littman, ’01], [Conitzer & Sandholm, ’03], [Powers & Shoham, ’05], [Chakraborty & Stone, ’08]

Opponent Modeling

  • Intended plan recognition [Sidner, ’85], [Lochbaum, ’91], [Carberry, ’01]
  • SharedPlans [Grosz & Kraus, ’96]
  • Recursive Modeling [Vidal & Durfee, ’95]

Human-Robot-Agent Teams

  • Overlapping but different challenges, including HRI [Klein, ’04]
  • Out of scope

Much more pertaining to specific teammate characteristics

slide-69
SLIDE 69

Acknowledgements

  • Fulbright and Guggenheim Foundations
  • Israel Science Foundation

slide-70
SLIDE 70

Ad Hoc Teams

  • Ad hoc team player is an individual

− Unknown teammates (programmed by others)

  • May or may not be able to communicate
  • Teammates likely sub-optimal: no control

Challenge: Create a good team player
