CS344M Autonomous Multiagent Systems
Todd Hester
Department of Computer Science
The University of Texas at Austin
Good Afternoon, Colleagues
Are there any questions?
Todd Hester
Logistics
- Readings
– Specify which papers you read!
– 2 case studies and 1 TDP (team description paper)
- How to read a research paper
– Some have too few details...
– Others have too many.
- Next week’s readings posted
- Use the undergrad writing center!
– Friday afternoon workshops (3 p.m.)
Overview of the Readings
- Darwin United: genetic programming approach
- Stone and McAllester: Architecture for action selection
- Riley et al.: Coach competition, extracting models
- Kuhlmann et al.: Learning for coaching
- Withopf and Riedmiller: Reinforcement learning
- MacAlpine et al.: UT Austin Villa 2011
- Barrett et al.: SPL kicking strategy
Evolutionary Computation
- Motivated by biological evolution: genetic algorithms (GA), genetic programming (GP)
- Search through a space
– Need a representation and a fitness function
– Probabilistically apply search operators to a set of points in the search space
- Randomized, parallel hill-climbing through the space
- Learning becomes an optimization problem (maximize fitness)
Some slides from Machine Learning [Mitchell, 1997]
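The loop described above (a representation, a fitness function, and search operators applied probabilistically to a population of points) can be sketched as a minimal genetic algorithm. This is a generic illustration on a toy bit-string problem (fitness = number of 1 bits, often called OneMax), not the setup from any of the readings; the population size, mutation rate, and tournament selection are arbitrary assumptions.

```python
import random

def fitness(bits):
    # Toy fitness (assumption): count of 1 bits ("OneMax").
    return sum(bits)

def tournament(pop, rng, k=3):
    # Pick k random individuals and return the fittest one.
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    # Single-point crossover: prefix of one parent, suffix of the other.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(bits, rng, rate):
    # Flip each bit independently with probability `rate`.
    return [1 - b if rng.random() < rate else b for b in bits]

def evolve(n_bits=20, pop_size=30, generations=40, rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [max(map(fitness, pop))]  # best fitness per generation
    for _ in range(generations):
        elite = max(pop, key=fitness)  # elitism: the best individual always survives
        children = [
            mutate(crossover(tournament(pop, rng), tournament(pop, rng), rng), rng, rate)
            for _ in range(pop_size - 1)
        ]
        pop = [elite] + children
        history.append(max(map(fitness, pop)))
    return max(pop, key=fitness), history

best, history = evolve()
print(fitness(best), history[:5])
```

Keeping the best individual (elitism) guarantees that the best fitness per generation never decreases; the rest of the population keeps exploring in parallel.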
Darwin United
- More ambitious follow-up to Luke (1997), which made the 2nd round
- Motivated in part by Peter’s detailed team construction
- Evolves whole teams, using a lexicographic fitness function
- Evolved on a huge (at the time) hypercube
- Lots of spinning, but it figured out dribbling and offsides
- 1-1-1 record: tied a good team, but didn’t advance
- A success for the method, but not pursued further
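A lexicographic fitness function ranks teams by a primary criterion and consults later criteria only to break ties. A minimal sketch using Python's built-in tuple ordering, which is already lexicographic; the criteria below (goal differential, then goals scored, then a hypothetical `ball_progress` measure) are illustrative assumptions, not the exact criteria Darwin United used.

```python
from dataclasses import dataclass

@dataclass
class GameStats:
    # Hypothetical per-team statistics from one evaluation match.
    goals_for: int
    goals_against: int
    ball_progress: float  # assumption: e.g., average ball advance toward the opponent goal

def lexicographic_fitness(s: GameStats):
    # Tuples compare element by element, so later entries only break ties.
    return (s.goals_for - s.goals_against, s.goals_for, s.ball_progress)

teams = {
    "A": GameStats(goals_for=1, goals_against=1, ball_progress=0.9),
    "B": GameStats(goals_for=2, goals_against=2, ball_progress=0.1),
    "C": GameStats(goals_for=0, goals_against=1, ball_progress=5.0),
}
ranking = sorted(teams, key=lambda t: lexicographic_fitness(teams[t]), reverse=True)
print(ranking)  # ['B', 'A', 'C']
```

B and A tie on goal differential, so goals scored decides; C's large `ball_progress` never matters because C already loses on the first criterion.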
Architecture for Action Selection
- (other slides, video)
- downsides
- Keepaway
Coaching
- Learn the best strategy for playing against a fixed opponent team
- Give high-level advice to players at low frequency
- Focus on learning formations
- Learn when successful teams passed/kicked
- Learn when the opponent will pass, and try to block
- What if players switch roles?
- Why just imitate another team?
- Other slides
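One simple way a coach could extract a formation from observed play is to take each player's average position over the logged games as that player's home position, with the spread indicating how far the player roams. This is a hedged sketch under that assumption; the data layout (a mapping from player number to a list of (x, y) samples) is hypothetical, and the coaching readings may use a more elaborate model.

```python
from statistics import mean, pstdev

def learn_formation(samples):
    """samples: dict player_id -> list of (x, y) observations (hypothetical log format).
    Returns dict player_id -> ((mean_x, mean_y), (std_x, std_y))."""
    formation = {}
    for player, points in samples.items():
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        # Home position = mean location; spread = population standard deviation.
        formation[player] = ((mean(xs), mean(ys)), (pstdev(xs), pstdev(ys)))
    return formation

log = {
    2: [(-30.0, -10.0), (-28.0, -12.0), (-32.0, -8.0)],
    9: [(20.0, 0.0), (24.0, 2.0), (22.0, -2.0)],
}
formation = learn_formation(log)
print(formation[9][0])  # (22.0, 0.0)
```

A coach could then send these home positions to the players as low-frequency, high-level advice.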
Reinforcement Learning
- RL Slides
- Extend to grid soccer
- Large state space, joint actions
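The core update behind most introductory RL material is tabular Q-learning. Below is a minimal, generic sketch (not Withopf and Riedmiller's specific method), assuming hashable states; it also shows why joint actions hurt in grid soccer: if each of n agents picks from the same action set, the joint action space has |A|^n elements. The action names are hypothetical.

```python
from collections import defaultdict
from itertools import product

ACTIONS = ["N", "S", "E", "W", "stay"]  # hypothetical grid-soccer moves per agent

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

def joint_actions(n_agents, actions=ACTIONS):
    # Every combination of individual actions: len(actions) ** n_agents tuples.
    return list(product(actions, repeat=n_agents))

Q = defaultdict(float)  # unseen (state, action) pairs start at 0
print(q_update(Q, s=0, a="E", r=1.0, s_next=1, actions=ACTIONS, alpha=0.5))  # 0.5
print(len(joint_actions(3)))  # 125
```

Learning over joint actions multiplies the table size by |A| for every added agent, on top of an already large state space.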
UT Austin Villa 2011
- Other slides
- Why not use CMA-ES on role positions as well?
- Changes for 2012?
Kicking Under Uncertainty
- Used by our Standard Platform League (SPL) team
- Kick engine to kick at various distances/headings
- Adjust to the observed ball location
- Select the first kick that moves the ball upfield
- Emphasis on quickness
- Now: better model of opponents -> know whether we have more time
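The selection rule on this slide (try kicks in a fixed order and take the first one that moves the ball upfield) can be sketched as below. The kick list, the distance/heading parameters, the `min_gain` margin, and the noise-free prediction are all hypothetical stand-ins; the actual SPL kick engine and its uncertainty handling are not reproduced here.

```python
import math
from dataclasses import dataclass

@dataclass
class Kick:
    name: str
    distance: float  # hypothetical expected travel distance (m)
    heading: float   # kick direction relative to the field's +x axis (radians)

def predicted_ball(ball_xy, kick):
    # Noise-free prediction: move the ball `distance` along `heading`.
    x, y = ball_xy
    return (x + kick.distance * math.cos(kick.heading),
            y + kick.distance * math.sin(kick.heading))

def select_kick(ball_xy, kicks, min_gain=0.05):
    # Kicks are tried in priority order; the first whose predicted landing point
    # is at least `min_gain` metres further toward the opponent goal (+x) wins.
    # The small margin (an assumption) also ignores floating-point noise.
    for kick in kicks:
        if predicted_ball(ball_xy, kick)[0] - ball_xy[0] > min_gain:
            return kick
    return None  # no candidate moves the ball upfield

kicks = [
    Kick("back_pass", 2.0, math.pi),
    Kick("side_left", 1.5, math.pi / 2),
    Kick("forward_short", 1.0, 0.3),
    Kick("forward_long", 3.0, 0.0),
]
print(select_kick((0.0, 0.0), kicks).name)  # forward_short
```

Taking the first acceptable kick rather than the best one reflects the emphasis on quickness: the robot stops evaluating as soon as some kick makes progress.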
Learning Commentary
- David Chen and Ray Mooney
Coordination Graphs
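- $n$ agents, each choosing an action from its own set $A_i$
- Joint action space: $A = A_1 \times \dots \times A_n$
- Each agent's reward: $R_i : A \to \mathbb{R}$
- Coordination problem: $R_1 = \dots = R_n = R$
- Nash equilibrium: no agent could do better by changing its own action, given what the others are doing.
- There may be more than one (e.g., the game of chicken)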
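Checking the Nash condition directly means testing every unilateral deviation. The brute-force sketch below does this for pure-strategy equilibria of a small n-agent game; the chicken payoffs are illustrative numbers with the usual structure, not values from the reading. It finds two equilibria, so the agents cannot settle on one without some way to coordinate.

```python
from itertools import product

def pure_nash_equilibria(action_sets, reward):
    """action_sets: list of per-agent action lists.
    reward(joint) -> tuple of per-agent rewards for a joint action tuple."""
    equilibria = []
    for joint in product(*action_sets):
        r = reward(joint)
        stable = True
        for i, acts in enumerate(action_sets):
            for alt in acts:
                deviation = joint[:i] + (alt,) + joint[i + 1:]
                if reward(deviation)[i] > r[i]:  # agent i would gain by switching
                    stable = False
                    break
            if not stable:
                break
        if stable:
            equilibria.append(joint)
    return equilibria

CHICKEN = {  # (agent 1 reward, agent 2 reward); illustrative values
    ("swerve", "swerve"): (0, 0),
    ("swerve", "straight"): (-1, 1),
    ("straight", "swerve"): (1, -1),
    ("straight", "straight"): (-10, -10),
}
acts = ["swerve", "straight"]
print(pure_nash_equilibria([acts, acts], lambda j: CHICKEN[j]))
# [('swerve', 'straight'), ('straight', 'swerve')]
```

The enumeration is exponential in the number of agents, which is one motivation for exploiting structure with coordination graphs.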
Example from the paper
- Understand the rule syntax
- Form the coordination graph
- First eliminate rules based on context
- What does it mean for G3 to collect all relevant rules?
- What does it mean for G3 to maximize over all actions of a1 and a2?
- How are the results propagated back?
- Let’s try again with G1 eliminated first
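The elimination steps in the example (an agent collects the payoff terms that involve it, maximizes over its own action for every combination of its neighbors' actions, and passes the result on, after which best responses are propagated back in reverse order) can be sketched as variable elimination over table-based payoff functions. This is a simplification: payoffs here are dense tables over small scopes rather than the paper's context-dependent value rules, so the "eliminate rules based on context" step is omitted. The agents G1 to G3, their binary actions, and the payoff values are made up for illustration.

```python
from itertools import product

def variable_elimination(actions, factors, order):
    """actions: dict agent -> list of actions.
    factors: list of (scope, table); scope is a tuple of agents and table maps
    a tuple of their actions (in scope order) to a payoff.
    order: elimination order. Returns (optimal joint action, total payoff)."""
    factors = list(factors)
    best_responses = []  # (agent, neighbors, best action per neighbor context)
    for agent in order:
        # Collect every payoff term that mentions this agent.
        involved = [f for f in factors if agent in f[0]]
        factors = [f for f in factors if agent not in f[0]]
        neighbors = sorted({a for scope, _ in involved for a in scope if a != agent})
        new_table, best = {}, {}
        # Maximize over this agent's action for each neighbor context.
        for ctx in product(*(actions[n] for n in neighbors)):
            assign = dict(zip(neighbors, ctx))
            best_val, best_act = None, None
            for act in actions[agent]:
                assign[agent] = act
                val = sum(table[tuple(assign[a] for a in scope)] for scope, table in involved)
                if best_val is None or val > best_val:
                    best_val, best_act = val, act
            new_table[ctx], best[ctx] = best_val, best_act
        factors.append((tuple(neighbors), new_table))  # conditional payoff passed on
        best_responses.append((agent, neighbors, best))
    total = sum(table[()] for _, table in factors)  # only empty-scope terms remain
    # Propagate back: the last agent eliminated fixes its action first.
    joint = {}
    for agent, neighbors, best in reversed(best_responses):
        joint[agent] = best[tuple(joint[n] for n in neighbors)]
    return joint, total

acts = {"G1": [0, 1], "G2": [0, 1], "G3": [0, 1]}
f12 = (("G1", "G2"), {(0, 0): 2, (0, 1): 0, (1, 0): 0, (1, 1): 1})
f23 = (("G2", "G3"), {(0, 0): 0, (0, 1): 3, (1, 0): 1, (1, 1): 0})
print(variable_elimination(acts, [f12, f23], ["G3", "G2", "G1"]))
# ({'G1': 0, 'G2': 0, 'G3': 1}, 5)
```

Eliminating G1 first instead of G3 produces different intermediate tables but reaches the same optimal joint action and payoff.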
Application to soccer
- Make the world discrete by assigning roles, using high-level predicates
- Assume global state information
- Finds pass sequences and starts players moving ahead of time.
- Note the results: with and without coordination.
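Discretizing the world by assigning roles can be sketched as a greedy assignment: roles are filled in priority order, each by the closest player not yet assigned. This is a hedged illustration; the role names, their priority order, and the target positions are hypothetical, and the paper's exact assignment rule may differ.

```python
import math

def assign_roles(players, role_targets):
    """players: dict player_id -> (x, y).
    role_targets: list of (role, (x, y)) in priority order (hypothetical).
    Each role goes to the closest player who has no role yet."""
    free = dict(players)
    assignment = {}
    for role, target in role_targets:
        if not free:
            break  # more roles than players
        pid = min(free, key=lambda p: math.dist(free[p], target))
        assignment[role] = pid
        del free[pid]
    return assignment

players = {7: (10.0, 5.0), 9: (20.0, 0.0), 11: (12.0, -8.0)}
ball = (11.0, 4.0)
roles = [("interceptor", ball), ("receiver", (25.0, 0.0)), ("passive", (0.0, 0.0))]
print(assign_roles(players, roles))  # {'interceptor': 7, 'receiver': 9, 'passive': 11}
```

Once each player has a discrete role, coordination rules can be written in terms of roles and high-level predicates instead of raw coordinates.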
Reactive Deliberation
- A hybrid approach
- Executor: carries out reactive behaviors
- Deliberator: evaluates possible high-level schemas with parameters and generates bids
- Deliberation takes time, but something is always happening (the executor keeps running).
- In effect, the deliberator commits to a schema for some time
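The division of labor can be sketched as a loop in which the executor runs the committed behavior every tick while the deliberator re-evaluates the schemas only periodically, committing to the highest bid. This is a simplified, single-threaded illustration; the schema names, bid functions, and the fixed re-deliberation period are hypothetical.

```python
def run(schemas, states, deliberation_period=3):
    """schemas: dict name -> bid function(state) -> float (hypothetical).
    Returns the schema the executor ran at each tick."""
    current = None
    trace = []
    for t, state in enumerate(states):
        if t % deliberation_period == 0:
            # Deliberator: evaluate every schema and commit to the best bid.
            current = max(schemas, key=lambda name: schemas[name](state))
        # Executor: keeps acting on the committed schema every tick.
        trace.append(current)
    return trace

schemas = {
    "go_to_ball": lambda s: 10.0 - s["dist_to_ball"],
    "defend_goal": lambda s: s["opponent_threat"],
}
states = [
    {"dist_to_ball": 2.0, "opponent_threat": 3.0},
    {"dist_to_ball": 9.0, "opponent_threat": 8.0},  # ignored until the next deliberation
    {"dist_to_ball": 9.0, "opponent_threat": 8.0},
    {"dist_to_ball": 9.0, "opponent_threat": 8.0},
]
print(run(schemas, states))
# ['go_to_ball', 'go_to_ball', 'go_to_ball', 'defend_goal']
```

The lag at ticks 1 and 2 is the cost of commitment: the executor never stalls, but it only switches schemas when the deliberator next reports.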