CS344M Autonomous Multiagent Systems, Patrick MacAlpine



slide-1
SLIDE 1

CS344M Autonomous Multiagent Systems

Patrick MacAlpine
Department of Computer Science
The University of Texas at Austin

slide-3
SLIDE 3

Good Afternoon, Colleagues

Are there any questions?

  • How is SMDP different from MDP?
  • Advantages of tile coding vs other approaches?
  • What about SPAR (Strategic Position by Attraction and Repulsion)?

slide-8
SLIDE 8

Logistics

  • Progress reports due at beginning of class today
  • Progress report peer review due next Thursday – reports to review will be sent out shortly

  • Prize for winning class tournament
  • 10+ students went to Undergraduate Writing Center :)

slide-13
SLIDE 13

Reinforcement Learning

Image from Wikipedia

Markov Decision Process (MDP)

Important questions:

  • What is your state space?
  • What is your action space?
  • What is your reward function?
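
As a minimal sketch of answering these three questions, consider a hypothetical toy problem that is not from the slides: a one-dimensional corridor of five cells with the goal at the far end. The names `STATES`, `ACTIONS`, `step`, and `run_episode`, along with the reward values, are assumptions made for illustration only.

```python
# Hypothetical toy MDP: a corridor with cells 0..4; the goal is cell 4.
STATES = list(range(5))        # state space: the agent's cell index
ACTIONS = ["left", "right"]    # action space: move one cell either way
GOAL = 4

def step(state, action):
    """Return (next_state, reward, done) for one transition."""
    move = -1 if action == "left" else 1
    next_state = min(max(state + move, 0), GOAL)  # walls clamp the position
    done = next_state == GOAL
    # Reward function (assumed values): small step cost, bonus at the goal.
    reward = 1.0 if done else -0.1
    return next_state, reward, done

def run_episode(policy, start=0, max_steps=50):
    """Roll out a policy (a function state -> action) and return total reward."""
    state, total = start, 0.0
    for _ in range(max_steps):
        state, reward, done = step(state, policy(state))
        total += reward
        if done:
            break
    return total

if __name__ == "__main__":
    print(run_episode(lambda s: "right"))
```

Changing any one of the three definitions (for example, dropping the step cost) changes what an optimal policy looks like, which is why the questions are worth answering before choosing a learning algorithm.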

slide-15
SLIDE 15

SARSA $(s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1})$

Image from Wikipedia

Learn Q table (value function) for state-action pairs:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$$
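
The update rule above can be sketched as a small tabular learner. This is an illustrative sketch under stated assumptions, not the course's implementation: the corridor environment, the epsilon-greedy exploration with random tie-breaking, and the values of `alpha`, `gamma`, and `epsilon` are all hypothetical choices.

```python
import random
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one SARSA update in place; r is the reward received after (s, a)."""
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

def epsilon_greedy(Q, s, actions, epsilon, rng):
    """Explore with probability epsilon, else act greedily (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q[(s, a)] for a in actions)
    return rng.choice([a for a in actions if Q[(s, a)] == best])

def corridor_step(s, a):
    """Hypothetical toy environment: cells 0..4, reward 1 on reaching cell 4."""
    s_next = min(max(s + a, 0), 4)
    done = s_next == 4
    return s_next, (1.0 if done else 0.0), done

def train(episodes=500, epsilon=0.1, seed=0):
    """Run on-policy SARSA and return the learned Q table."""
    rng = random.Random(seed)
    actions = [-1, 1]              # move left or right
    Q = defaultdict(float)         # unseen (state, action) pairs start at 0
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s, actions, epsilon, rng)
        done = False
        while not done:
            s_next, r, done = corridor_step(s, a)
            # On-policy: the next action is chosen by the same behaviour policy.
            a_next = epsilon_greedy(Q, s_next, actions, epsilon, rng)
            sarsa_update(Q, s, a, r, s_next, a_next)
            s, a = s_next, a_next
    return Q

if __name__ == "__main__":
    Q = train()
    print({s: max([-1, 1], key=lambda a: Q[(s, a)]) for s in range(4)})
```

Because the terminal cell is never updated, its Q values stay at zero, so the target for the final transition reduces to the reward alone.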

slide-17
SLIDE 17

Keepaway

  • Keepaway videos
  • Slides

slide-21
SLIDE 21

Keepaway Discussion

  • Could you use learned policies for full soccer game?
  • Could we apply competitive co-evolution?
  • Other sub-tasks in soccer that might be learnable?


slide-22
SLIDE 22

Half Field Offense

<Slides>


slide-23
SLIDE 23

Policy Search vs Value Function Based RL

              Policy Search                       Value Function Based
Learn         Policy parameters                   Value function
Good For      Tuning parameter values             Learning discrete actions
Evaluation    Fitness function                    Reward function
Algorithms    CMA-ES, genetic algorithms, etc.    SARSA, Q-learning, etc.
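
To make the policy search column concrete, here is a minimal sketch of parameter tuning against a fitness function. It is deliberately a simple hill climber rather than CMA-ES or a genetic algorithm, and the `fitness` function, its optimum, and the settings `sigma` and `iterations` are hypothetical stand-ins for evaluating a real policy.

```python
import random

def fitness(params):
    """Hypothetical fitness: higher is better, with the peak at (1.0, -2.0)."""
    x, y = params
    return -((x - 1.0) ** 2 + (y + 2.0) ** 2)

def hill_climb(fitness_fn, start, sigma=0.3, iterations=2000, seed=0):
    """Perturb the parameters with Gaussian noise; keep a candidate if it is no worse."""
    rng = random.Random(seed)
    best = list(start)
    best_fit = fitness_fn(best)
    for _ in range(iterations):
        candidate = [p + rng.gauss(0.0, sigma) for p in best]
        cand_fit = fitness_fn(candidate)
        if cand_fit >= best_fit:
            best, best_fit = candidate, cand_fit
    return best, best_fit

if __name__ == "__main__":
    params, score = hill_climb(fitness, start=[0.0, 0.0])
    print(params, score)
```

Note that the search only ever sees one fitness number per candidate, whereas a value function method such as SARSA uses the reward at every step.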
