CS344M Autonomous Multiagent Systems

slide-1
SLIDE 1

CS344M Autonomous Multiagent Systems

Todd Hester, Department of Computer Science, The University of Texas at Austin

slide-4
SLIDE 4

Good Afternoon, Colleagues

Are there any questions?

  • Changes from 2011 to now
  • Do different formations in different situations?
  • How does UT’s walk engine work?
  • Has the formation code been released? copied?
  • Why does world model give 0s for some players? Unseen?
  • Todd: Why not run CMA-ES to optimize role positions too?

slide-7
SLIDE 7

Logistics

  • Assignment 4 due today
  • Next week’s readings posted
  • Final project proposal assigned

slide-10
SLIDE 10

Final Projects

  • Proposal (10/11): 3+ pages
  • What you’re going to do; graded on writing
  • Progress Report (11/8): 5+ pages + binaries + logs
  • What you’ve been doing; graded on writing
  • Peer Review (11/15): review 2 progress reports
  • Clear? suggestions?; graded on writing and feedback quality

slide-14
SLIDE 14

Final Projects

  • Team (12/4): source + binaries
  • The tournament entry; make sure it runs!
  • Final Report (12/6): 8+ pages
  • A term paper; the main component of your grade
  • Tournament (12/17): nothing due
  • Oral presentation

All items are due at the beginning of class

slide-20
SLIDE 20

Final Project info

  • All writing is individual!
  • Two hard copies and one electronic copy
  • Due at beginning of class
  • One idea: re-implement an idea from one of the readings
  • Be careful with machine learning
  • Example final report on website

slide-27
SLIDE 27

Overview of the Readings

  • Darwin: genetic programming approach
  • Stone and McAllester: Architecture for action selection
  • Riley et al: Coach competition, extracting models
  • Kuhlmann et al: Learning for coaching
  • Withopf and Riedmiller: Reinforcement learning
  • MacAlpine et al: UT Austin Villa 2011
  • Barrett et al: SPL Kicking strategy

slide-33
SLIDE 33

Evolutionary Computation

  • Motivated by biological evolution: GA, GP
  • Search through a space

− Need a representation, fitness function
− Probabilistically apply search operators to set of points in search space

  • Randomized, parallel hill-climbing through space
  • Learning is an optimization problem (fitness)

Some slides from Machine Learning [Mitchell, 1997]
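To make the ingredients on this slide concrete, here is a minimal genetic-algorithm sketch. The bit-string representation, the count-the-ones fitness function, the elitism step, and every parameter value are assumptions chosen for illustration; none of them come from the readings.

```python
import random

def fitness(bits):
    """Toy fitness function: the number of 1s in the bit string."""
    return sum(bits)

def select(population, rng):
    """Fitness-proportionate selection (+1 keeps every weight positive)."""
    weights = [fitness(ind) + 1 for ind in population]
    return rng.choices(population, weights=weights, k=1)[0]

def crossover(a, b, rng):
    """Single-point crossover of two parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def evolve(n_bits=20, pop_size=30, generations=60, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        best = max(population, key=fitness)  # elitism: the best always survives
        children = []
        for _ in range(pop_size - 1):
            child = crossover(select(population, rng), select(population, rng), rng)
            children.append(mutate(child, rate, rng))
        population = [best] + children
    return max(population, key=fitness)

best = evolve()
print(fitness(best), best)
```

The population is the "set of points in search space", and selection, crossover, and mutation are the probabilistically applied search operators. Because many individuals climb at once, this behaves like the randomized, parallel hill-climbing described above.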

slide-40
SLIDE 40

Darwin United

  • More ambitious follow-up to Luke, 97 (made 2nd round)
  • Motivated in part by Peter’s detailed team construction
  • Evolves whole teams — lexicographic fitness function
  • Evolved on huge (at the time) hypercube
  • Lots of spinning, but figured out dribbling, offsides
  • 1-1-1 record. Tied a good team, but didn’t advance
  • Success of the method, but not pursued
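A lexicographic fitness function compares candidates on the most important criterion first and consults later criteria only to break ties. The sketch below assumes three hypothetical criteria (goals scored, goals conceded, kicks made); the slides do not list the criteria Darwin United actually used.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TeamResult:
    name: str
    goals_for: int      # hypothetical criterion 1 (higher is better)
    goals_against: int  # hypothetical criterion 2 (lower is better)
    kicks: int          # hypothetical criterion 3 (higher is better)

def lexicographic_key(result):
    # Python compares tuples element by element, which is a lexicographic
    # ordering. Criteria where lower is better are negated.
    return (result.goals_for, -result.goals_against, result.kicks)

results = [
    TeamResult("A", goals_for=0, goals_against=2, kicks=15),
    TeamResult("B", goals_for=1, goals_against=3, kicks=4),
    TeamResult("C", goals_for=0, goals_against=2, kicks=40),
]
ranked = sorted(results, key=lexicographic_key, reverse=True)
print([r.name for r in ranked])
```

Team B ranks first because it scored, even though it kicked least. A and C tie on both goal criteria, so the third criterion separates them.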

slide-43
SLIDE 43

Architecture for Action Selection

  • (other slides, video)
  • downsides
  • Keepaway

slide-51
SLIDE 51

Coaching

  • Learn best strategy to play a fixed team
  • Give high level advice to players at low frequency
  • Focus on learning formations
  • Learn when successful teams passed/kicked
  • Learn when opponent will pass and try to block
  • What if players switch roles?
  • Why just imitate another team?
  • Other slides

slide-57
SLIDE 57

Reinforcement Learning

  • RL Slides
  • Extend to grid soccer
  • Large state space, joint actions
  • Address this with state aliasing, options
  • Successfully learn the task, use for some of team behavior
  • However, takes 12 million actions to learn
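For reference, the basic tabular Q-learning update can be sketched on a toy problem. The one-dimensional corridor, the reward of 1 at the goal, and all hyperparameters below are assumptions for illustration; this is not the grid soccer task from the reading and uses neither state aliasing nor options.

```python
import random

N_STATES = 6        # corridor cells 0..5; cell 5 is the goal (assumed task)
ACTIONS = (-1, +1)  # move left or move right

def step(state, action):
    """Deterministic toy dynamics: reward 1 on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def greedy(q_row, rng):
    """Index of the best action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([i for i, v in enumerate(q_row) if v == best])

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy exploration.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy(q[state], rng)
            nxt, reward, done = step(state, ACTIONS[a])
            # Q-learning target: no bootstrapping from terminal states.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q

q = q_learning()
policy = [ACTIONS[max((0, 1), key=lambda i: q[s][i])] for s in range(N_STATES - 1)]
print(policy)
```

This toy has six states and two actions, so it learns quickly. With several players the table needs an entry for every joint state and joint action, which is why the state space and the number of actions needed to learn grow so fast.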

slide-60
SLIDE 60

UT Austin Villa 2011

  • Other slides
  • Why not use CMA-ES on role positions as well?
  • Changes for 2012?

slide-67
SLIDE 67

Kicking Under Uncertainty

  • Previous SPL approach: always rotate to kick at goal
  • Kick engine to kick at various distances/headings
  • Adjust to seen ball location
  • Select first kick that moves ball up field
  • Figure
  • Emphasis on quickness
  • Now: Better model of opponents -> Know if we have more time
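The "select first kick that moves ball up field" rule can be sketched as a scan over an ordered list of kicks. The `Kick` fields, the kick names, the coordinate convention (+x points at the opponent goal), and the `min_progress` threshold are all hypothetical; the slides do not give the actual kick engine interface.

```python
import math
from typing import NamedTuple, Optional, Sequence

class Kick(NamedTuple):
    name: str
    distance: float  # metres the ball is expected to travel (assumed)
    heading: float   # kick direction relative to the robot, in radians

def select_kick(kicks: Sequence[Kick], robot_theta: float,
                min_progress: float = 0.1) -> Optional[Kick]:
    """Return the first kick whose predicted ball displacement along +x
    (towards the opponent goal, by assumption) exceeds min_progress."""
    for kick in kicks:
        dx = kick.distance * math.cos(robot_theta + kick.heading)
        if dx > min_progress:
            return kick
    return None  # nothing helps; the caller would have to reposition

# Kicks are listed in order of preference (assumed: quickest to execute first).
kicks = [
    Kick("left_side", 1.0, math.pi / 2),
    Kick("forward_short", 1.5, 0.0),
    Kick("forward_long", 3.0, 0.0),
]
print(select_kick(kicks, robot_theta=0.0).name)           # facing the goal
print(select_kick(kicks, robot_theta=-math.pi / 2).name)  # facing sideways
print(select_kick(kicks, robot_theta=math.pi))            # facing own goal
```

Taking the first acceptable kick rather than the best one reflects the emphasis on quickness: a robot facing sideways uses a side kick immediately instead of rotating to face the goal.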

slide-68
SLIDE 68

Learning Keepaway

KEEPAWAY SLIDES

slide-69
SLIDE 69

Learning Commentary

  • David Chen and Ray Mooney

slide-75
SLIDE 75

Coordination Graphs

  • n agents; each agent i chooses an action ai from its action set Ai
  • Joint action space: A = A1 × . . . × An
  • Reward for agent i: Ri : A → ℝ
  • Coordination problem: R1 = . . . = Rn = R
  • Nash equilibrium: no agent could do better given what others are doing.
  • May be more than one (chicken)
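The definition above can be checked by brute force: a joint action is a pure-strategy Nash equilibrium if no single agent gains by deviating alone. The payoff numbers for the game of chicken below are a common textbook-style choice and are assumptions, not values from the slides.

```python
from itertools import product

ACTIONS = ("swerve", "straight")

# Hypothetical payoffs (agent 1, agent 2) for the game of chicken.
PAYOFF = {
    ("swerve", "swerve"): (0, 0),
    ("swerve", "straight"): (-1, 1),
    ("straight", "swerve"): (1, -1),
    ("straight", "straight"): (-10, -10),
}

def is_nash(joint):
    """True if no agent can improve its own payoff by deviating unilaterally."""
    for i in range(len(joint)):
        for alt in ACTIONS:
            deviated = joint[:i] + (alt,) + joint[i + 1:]
            if PAYOFF[deviated][i] > PAYOFF[joint][i]:
                return False
    return True

equilibria = [j for j in product(ACTIONS, repeat=2) if is_nash(j)]
print(equilibria)
```

Two pure-strategy equilibria come out, one for each choice of who swerves, so the agents still need a way to agree on which one to play. The check enumerates the whole joint action space A, which grows exponentially with n. It also ignores mixed strategies.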

slide-82
SLIDE 82

Example from the paper

  • Understand the rule syntax
  • Form the coordination graph
  • First eliminate rules based on context
  • What does it mean for G3 to collect all relevant rules?
  • What does it mean for G3 to maximize over all actions of a1 and a2?
  • How are the results propagated back?
  • Let’s try again with G1 eliminated first
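The questions above can be explored with a small variable-elimination sketch. It uses plain payoff tables rather than the paper's rule syntax, and the three-agent graph, the agent names, and all payoff values are hypothetical; the paper's own example is not reproduced here.

```python
from itertools import product

ACTIONS = (0, 1)  # assumption: every agent has the same two actions

# Hypothetical local payoff functions as (scope, table) pairs.
# The global payoff is the sum of all local payoffs.
FACTORS = [
    (("G1", "G2"), {(0, 0): 3, (0, 1): 0, (1, 0): 0, (1, 1): 2}),
    (("G1", "G3"), {(0, 0): 0, (0, 1): 4, (1, 0): 1, (1, 1): 0}),
    (("G2", "G3"), {(0, 0): 2, (0, 1): 0, (1, 0): 0, (1, 1): 5}),
]

def total(factors, assignment):
    """Sum the given payoff functions under a (partial) joint action."""
    return sum(table[tuple(assignment[v] for v in scope)]
               for scope, table in factors)

def eliminate(factors, order):
    """Return (maximum global payoff, maximizing joint action)."""
    factors = list(factors)
    responses = []  # (agent, neighbours, best-response table)
    for agent in order:
        # The agent collects every payoff function that mentions it.
        relevant = [f for f in factors if agent in f[0]]
        factors = [f for f in factors if agent not in f[0]]
        neighbours = tuple(sorted(
            {v for scope, _ in relevant for v in scope} - {agent}))
        new_table, response = {}, {}
        # For each joint action of the neighbours, maximize over own actions.
        for ctx in product(ACTIONS, repeat=len(neighbours)):
            fixed = dict(zip(neighbours, ctx))
            scores = {a: total(relevant, {**fixed, agent: a}) for a in ACTIONS}
            response[ctx] = max(scores, key=scores.get)
            new_table[ctx] = scores[response[ctx]]
        factors.append((neighbours, new_table))  # handed to the neighbours
        responses.append((agent, neighbours, response))
    value = sum(table[()] for _, table in factors)  # only constants remain
    joint = {}
    for agent, neighbours, response in reversed(responses):  # propagate back
        joint[agent] = response[tuple(joint[v] for v in neighbours)]
    return value, joint

print(eliminate(FACTORS, ["G3", "G1", "G2"]))
print(eliminate(FACTORS, ["G1", "G2", "G3"]))
```

Both elimination orders give the same maximum and, since the optimum is unique here, the same joint action. Only the intermediate tables differ, which is what trying again with G1 eliminated first is meant to show.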

slide-86
SLIDE 86

Application to soccer

  • Make the world discrete by assigning roles, using high-level predicates
  • Assume global state information
  • Finds pass sequences and starts players moving ahead of time.
  • Note the results: with and without coordination.

slide-87
SLIDE 87

Reactive Deliberation

  • A hybrid approach
  • Executor: carry out reactive behaviors
  • Deliberator: evaluate possible high-level schemas with parameters; generate bids
  • Deliberator takes time, but something is always happening
  • In effect: deliberator commits to schema for some time
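The split between deliberator and executor can be sketched as a loop in which bids are re-evaluated only every few ticks while an action is produced on every tick. The schema names, the bid formulas, and the fixed `commit_ticks` interval are hypothetical, and the real architecture runs the two components concurrently rather than in one sequential loop.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Bid:
    schema: str               # name of a high-level schema (hypothetical)
    params: Dict[str, float]  # parameters the schema would run with
    value: float              # how strongly this schema is favoured

def shoot(world):
    # Hypothetical evaluator: shooting looks better the closer the ball is.
    return Bid("shoot", {"power": 1.0}, 1.0 / (1.0 + world["ball_dist"]))

def defend(world):
    # Hypothetical evaluator: a constant fallback bid.
    return Bid("defend", {"x": -2.0}, 0.4)

def deliberate(world, evaluators):
    """Slow step: each evaluator generates a bid; the highest bid wins."""
    return max((evaluate(world) for evaluate in evaluators),
               key=lambda bid: bid.value)

def run(world_states, evaluators, commit_ticks=3):
    """The executor acts on every tick using the committed schema;
    deliberation only happens once every `commit_ticks` ticks."""
    log, current = [], None
    for tick, world in enumerate(world_states):
        if tick % commit_ticks == 0:
            current = deliberate(world, evaluators)
        log.append(current.schema)  # stand-in for running a reactive behavior
    return log

worlds = [{"ball_dist": d} for d in (5.0, 0.5, 0.2, 0.5, 5.0, 5.0)]
print(run(worlds, [shoot, defend]))
```

At tick 1 the ball is already close enough for `shoot` to outbid `defend`, but the player keeps defending until the next deliberation at tick 3. That delay is the commitment effect noted in the last bullet.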
