CS344M Autonomous Multiagent Systems (Todd Hester)



SLIDE 1

CS344M Autonomous Multiagent Systems

Todd Hester
Department of Computer Science
The University of Texas at Austin

SLIDE 2

Good Afternoon, Colleagues

Are there any questions?

Todd Hester

SLIDE 10

Logistics

  • Readings
      – Specify which papers you read!
      – 2 case studies and 1 TDP
  • How to read a research paper
      – Some have too few details...
      – Others have too many.
  • Next week’s readings posted
  • Use the undergrad writing center!
      – Friday afternoon workshops (3 p.m.)

SLIDE 17

Overview of the Readings

  • Darwin: genetic programming approach
  • Stone and McAllester: Architecture for action selection
  • Riley et al: Coach competition, extracting models
  • Kuhlmann et al: Learning for coaching
  • Withopf and Riedmiller: Reinforcement learning
  • MacAlpine et al: UT Austin Villa 2011
  • Barrett et al: SPL Kicking strategy

SLIDE 23

Evolutionary Computation

  • Motivated by biological evolution: genetic algorithms (GA), genetic programming (GP)
  • Search through a space
      – Need a representation, fitness function
      – Probabilistically apply search operators to set of points in search space
  • Randomized, parallel hill-climbing through space
  • Learning is an optimization problem (fitness)

Some slides from Machine Learning [Mitchell, 1997]
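
The ingredients listed on this slide (a representation, a fitness function, and search operators applied probabilistically to a population of points) can be sketched as a tiny genetic algorithm. The bit-string representation, the "OneMax" fitness, tournament selection, and all parameter values are illustrative assumptions, not taken from the lecture or the readings.

```python
import random

def fitness(bits):
    """Illustrative fitness: number of 1s in the string (the OneMax toy problem)."""
    return sum(bits)

def tournament(pop, rng, k=3):
    """Selection operator: fittest of k randomly chosen individuals."""
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    """Single-point crossover of two equal-length bit lists."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rng, rate=0.02):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def evolve(n_bits=20, pop_size=30, generations=40, seed=0):
    """Run the GA and return the fittest individual of the final population."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(pop, key=fitness)  # elitism: the best individual survives unchanged
        children = [
            mutate(crossover(tournament(pop, rng), tournament(pop, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        pop = [elite] + children
    return max(pop, key=fitness)
```

Because many individuals are improved at once and changes are random, this behaves like the "randomized, parallel hill-climbing" described above; the elitism step is one common design choice that keeps the best fitness from decreasing between generations.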

SLIDE 30

Darwin United

  • More ambitious follow-up to Luke, 97 (made 2nd round)
  • Motivated in part by Peter’s detailed team construction
  • Evolves whole teams, using a lexicographic fitness function
  • Evolved on huge (at the time) hypercube
  • Lots of spinning, but figured out dribbling, offsides
  • 1-1-1 record. Tied a good team, but didn’t advance
  • Success of the method, but not pursued
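
A lexicographic fitness function ranks candidates on the most important criterion first and consults later criteria only to break ties. Python tuples compare in exactly this way, so a minimal sketch is a tuple-valued sort key. The criteria below (goals scored, then ball kicks, then cycles spent near the ball) are hypothetical placeholders; the slides do not list the criteria used by Darwin United.

```python
from typing import NamedTuple

class TeamStats(NamedTuple):
    name: str
    goals_scored: int      # hypothetical primary criterion
    ball_kicks: int        # hypothetical first tie-breaker
    near_ball_cycles: int  # hypothetical second tie-breaker

def lexicographic_key(team):
    """Tuple key: later entries matter only when earlier ones are equal."""
    return (team.goals_scored, team.ball_kicks, team.near_ball_cycles)

def rank(teams):
    """Return teams ordered from fittest to least fit."""
    return sorted(teams, key=lexicographic_key, reverse=True)
```

A design consequence worth noting: early in evolution, when no team scores, the lower-priority criteria still give the search a gradient to follow.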

SLIDE 33

Architecture for Action Selection

  • (other slides, video)
  • downsides
  • Keepaway

SLIDE 41

Coaching

  • Learn best strategy to play a fixed team
  • Give high level advice to players at low frequency
  • Focus on learning formations
  • Learn when successful teams passed/kicked
  • Learn when opponent will pass and try to block
  • What if players switch roles?
  • Why just imitate another team?
  • Other slides

SLIDE 45

Reinforcement Learning

  • RL Slides
  • Extend to grid soccer
  • Large state space, joint actions
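
To see why joint actions are a problem, note that with k actions per agent and n agents there are k^n joint actions, and a tabular learner needs a value for every (state, joint action) pair. The sketch below uses the standard Q-learning update over joint actions; it is a generic illustration, not the algorithm from the Withopf and Riedmiller reading, and the action names, learning rate, and discount factor are assumptions.

```python
from collections import defaultdict
from itertools import product

ACTIONS = ("north", "south", "east", "west", "stay")  # assumed per-agent actions

def joint_actions(n_agents):
    """All combinations of one action per agent: len(ACTIONS) ** n_agents entries."""
    return list(product(ACTIONS, repeat=n_agents))

def q_update(q, state, joint_action, reward, next_state, n_agents,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning step where the 'action' is a joint action tuple."""
    best_next = max(q[(next_state, a)] for a in joint_actions(n_agents))
    key = (state, joint_action)
    q[key] += alpha * (reward + gamma * best_next - q[key])
    return q[key]
```

Usage: `q = defaultdict(float)` gives a table whose unseen entries default to zero. Even on a small grid, the maximization inside `q_update` scans every joint action, which is what motivates factored approaches such as coordination graphs.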

SLIDE 48

UT Austin Villa 2011

  • Other slides
  • Why not use CMA-ES on role positions as well?
  • Changes for 2012?

SLIDE 56

Kicking Under Uncertainty

  • Used by our SPL team
  • Kick engine to kick at various distances/headings
  • Adjust to seen ball location
  • Select first kick that moves ball up field
  • Figure
  • Emphasis on quickness
  • Now: Better model of opponents -> Know if we have more time
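
The rule "select first kick that moves ball up field" can be sketched as a scan over a preference-ordered list of kicks that stops at the first acceptable one, which fits the emphasis on quickness. Everything concrete here is an assumption for illustration: the `Kick` fields, the convention that the field's x axis points toward the opponent goal, and the simple straight-line prediction of where the ball ends up.

```python
import math
from typing import NamedTuple, Optional, Sequence

class Kick(NamedTuple):
    name: str
    distance: float  # expected ball travel in metres (assumed)
    heading: float   # kick direction relative to the robot, in radians (assumed)

def predicted_ball_x(ball_x: float, robot_heading: float, kick: Kick) -> float:
    """Predicted x coordinate of the ball after the kick (x grows up field)."""
    return ball_x + kick.distance * math.cos(robot_heading + kick.heading)

def select_kick(kicks: Sequence[Kick], ball_x: float, robot_heading: float,
                min_gain: float = 0.0) -> Optional[Kick]:
    """Return the first kick, in list order, that gains more than min_gain up field."""
    for kick in kicks:
        if predicted_ball_x(ball_x, robot_heading, kick) - ball_x > min_gain:
            return kick
    return None  # no kick helps; the caller would need to reposition
```

Taking the first acceptable kick rather than the best one trades kick quality for speed; a better opponent model, as the last bullet suggests, would tell the robot when it can afford a slower, better choice.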

SLIDE 57

Learning Commentary

  • David Chen and Ray Mooney


SLIDE 63

Coordination Graphs

  • n agents, each chooses an action from its own action set $A_i$
  • Joint action space: $A = A_1 \times \dots \times A_n$
  • Each agent has a payoff function $R_i : A \to \mathbb{R}$
  • Coordination problem: $R_1 = \dots = R_n = R$
  • Nash equilibrium: no agent could do better given what the others are doing.
  • May be more than one (chicken)
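
The Nash equilibrium definition can be checked mechanically: a joint action is an equilibrium if no single agent gains by changing only its own action. The sketch below enumerates the pure-strategy equilibria of the game of chicken and finds two of them. The payoff numbers are hypothetical (any values with the same ordering would do), and chicken is used only as the slide's example of multiple equilibria; it is not itself a pure coordination problem, since the two players' payoffs differ.

```python
from itertools import product

ACTIONS = ("swerve", "straight")

# Hypothetical payoffs: joint action -> (payoff to player 0, payoff to player 1)
PAYOFFS = {
    ("swerve", "swerve"): (0, 0),
    ("swerve", "straight"): (-1, 1),
    ("straight", "swerve"): (1, -1),
    ("straight", "straight"): (-10, -10),
}

def is_nash(joint, payoffs, actions):
    """True if no agent can improve its own payoff by deviating unilaterally."""
    for i in range(len(joint)):
        for alt in actions:
            deviation = joint[:i] + (alt,) + joint[i + 1:]
            if payoffs[deviation][i] > payoffs[joint][i]:
                return False
    return True

def pure_nash_equilibria(payoffs, actions, n_agents=2):
    """All joint actions that are pure-strategy Nash equilibria."""
    return [j for j in product(actions, repeat=n_agents)
            if is_nash(j, payoffs, actions)]
```

With these payoffs the two equilibria are the asymmetric outcomes in which exactly one player swerves, which illustrates the coordination difficulty: the agents must somehow agree on which equilibrium to play.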

SLIDE 70

Example from the paper

  • Understand the rule syntax
  • Form the coordination graph
  • First eliminate rules based on context
  • What does it mean for G3 to collect all relevant rules?
  • What does it mean for G3 to maximize over all actions of a1 and a2?
  • How are the results propagated back?
  • Let’s try again with G1 eliminated first
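
The questions on this slide correspond to the steps of variable elimination on a coordination graph. A minimal sketch, under stated assumptions: when an agent is eliminated it collects every payoff term that mentions it, computes its best action for each combination of its neighbours' actions, and hands the resulting table to the remaining agents; the choices are then propagated back in reverse order. The three agents, binary actions, and payoff terms below are hypothetical and are not the rules from the paper.

```python
from itertools import product

ACTIONS = (0, 1)  # assumed binary actions per agent

def eliminate(factors, order):
    """Max-sum variable elimination. factors: list of (scope, payoff function).
    Returns (best total payoff, {agent: action})."""
    factors = list(factors)
    best_response = []  # (agent, neighbours, {neighbour actions: best own action})
    for agent in order:
        relevant = [f for f in factors if agent in f[0]]   # collect relevant terms
        factors = [f for f in factors if agent not in f[0]]
        neighbours = sorted({v for scope, _ in relevant for v in scope} - {agent})
        table, response = {}, {}
        for ctx in product(ACTIONS, repeat=len(neighbours)):
            assign = dict(zip(neighbours, ctx))

            def total(a):
                assign[agent] = a
                return sum(fn(*(assign[v] for v in scope)) for scope, fn in relevant)

            best = max(ACTIONS, key=total)                 # maximize over own action
            table[ctx], response[ctx] = total(best), best
        # New payoff term over the neighbours replaces the eliminated agent.
        factors.append((tuple(neighbours), lambda *ctx, t=table: t[ctx]))
        best_response.append((agent, neighbours, response))
    value = sum(fn() for _, fn in factors)  # only constant terms remain
    joint = {}
    for agent, neighbours, response in reversed(best_response):  # propagate back
        joint[agent] = response[tuple(joint[v] for v in neighbours)]
    return value, joint

# Hypothetical payoff terms for agents 1, 2, 3.
FACTORS = [
    ((1, 2), lambda a1, a2: 5 if a1 == a2 else 0),
    ((2, 3), lambda a2, a3: 3 if a2 != a3 else 0),
    ((1, 3), lambda a1, a3: 2 if a1 == 1 and a3 == 0 else 0),
    ((3,), lambda a3: 1 if a3 == 1 else 0),
]
```

Calling `eliminate(FACTORS, [3, 2, 1])` and `eliminate(FACTORS, [1, 2, 3])` yields the same optimal joint action: the elimination order changes the size of the intermediate tables, not the result.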

SLIDE 74

Application to soccer

  • Make the world discrete by assigning roles, using high-level predicates
  • Assume global state information
  • Finds pass sequences and starts players moving ahead of time.
  • Note the results: with and without coordination.

SLIDE 75

Reactive Deliberation

  • A hybrid approach
  • Executor: carry out reactive behaviors
  • Deliberator: evaluate possible high-level schemas with parameters; generate bids
  • Deliberator takes time, but something is always happening in the meantime.
  • In effect: deliberator commits to a schema for some time
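
The commitment behaviour described on this slide can be sketched as follows: each schema produces a bid for the current state, the deliberator picks the highest bidder, and that choice stays in force for a fixed number of ticks while the executor keeps acting on every tick. This is a single-threaded simplification; the class name, the bid functions, and the `commit_ticks` parameter are assumptions for illustration rather than details of the original architecture.

```python
class Deliberator:
    """Chooses the highest-bidding schema and commits to it for a while."""

    def __init__(self, schemas, commit_ticks=5):
        self.schemas = schemas            # schema name -> bid function(state) -> float
        self.commit_ticks = commit_ticks  # assumed fixed commitment length
        self.current = None
        self.ticks_left = 0

    def step(self, state):
        """Return the schema to execute this tick, re-deliberating when the commitment ends."""
        if self.current is None or self.ticks_left == 0:
            self.current = max(self.schemas, key=lambda name: self.schemas[name](state))
            self.ticks_left = self.commit_ticks
        self.ticks_left -= 1
        return self.current

def run(deliberator, states):
    """Executor loop: act on every tick using the currently committed schema."""
    return [deliberator.step(state) for state in states]
```

In this sketch the state changes at the second tick are ignored until the commitment expires, which shows both the benefit (stable behaviour) and the cost (delayed reaction) of committing.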