Agent for Ms. Pac-Man vs. Ghost Team Competition - PowerPoint PPT Presentation

SLIDE 1

Agent for Ms. Pac-Man vs. Ghost Team Competition

Πλατανιώτης Στέργιος, 2012030142

ΠΛΗ513 - Autonomous Agents

SLIDE 2

Target

  • Try to maximize your score by eating as many pills/power pills/ghosts as you can

  • Available moves are UP/DOWN/LEFT/RIGHT/NEUTRAL
  • Partial observability (PO): Ms. Pac-Man can only see along straight vertical and horizontal lines
  • This causes many problems, as she is more likely to get stuck in local maxima when she can see no food or ghosts around her
  • It is also more difficult because the ghosts communicate with each other and can trap her very easily without her even realizing it
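The line-of-sight rule above can be sketched on a simple grid. This is only an illustration: the grid representation, the `'#'` wall character and the function name `visible` are assumptions made here, not part of the competition framework, and the sketch assumes unlimited sight range along a corridor.

```python
def visible(maze, pacman, target):
    """Return True if target lies on an unobstructed straight line from pacman.

    maze is a list of equal-length strings where '#' marks a wall
    (hypothetical encoding); positions are (row, col) tuples.
    """
    (r1, c1), (r2, c2) = pacman, target
    if r1 == r2:
        # Same row: every cell between the two columns must be free.
        lo, hi = sorted((c1, c2))
        return all(maze[r1][c] != "#" for c in range(lo, hi + 1))
    if c1 == c2:
        # Same column: every cell between the two rows must be free.
        lo, hi = sorted((r1, r2))
        return all(maze[r][c1] != "#" for r in range(lo, hi + 1))
    # Not aligned horizontally or vertically: never visible.
    return False
```

Anything for which `visible` returns False (a pill around a corner, a ghost in a parallel corridor) is simply missing from the agent's observation, which is why she can end up with no food or ghosts in view.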

SLIDE 3

Q-learning

  • Implementation of the reinforcement learning Q-learning algorithm

  • A table with a value for every pair of (state, move)
  • After every round we update the entry for the previous (state, move)

  • Takes as parameters α: learning rate and γ: discount factor

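  • The values are proven to converge to the optimal Q-values, and hence to an optimal policy, if every (state, move) pair keeps being visited, α is decayed appropriately within 0<α<=1, and 0<=γ<1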
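The table update described on this slide can be sketched as follows. This is a minimal illustration of the standard tabular Q-learning rule, not the author's code: the function names and the default values of `alpha` and `gamma` are assumptions, since the slides do not state which values were used.

```python
from collections import defaultdict

MOVES = ("UP", "DOWN", "LEFT", "RIGHT", "NEUTRAL")


def make_q_table():
    # One entry per (state, move) pair; unseen pairs start at 0.0.
    return defaultdict(float)


def q_update(q, state, move, reward, next_state, alpha=0.2, gamma=0.8):
    """Apply one Q-learning update to the previous (state, move) entry.

    alpha (learning rate) and gamma (discount factor) defaults are
    placeholders chosen for this sketch only.
    """
    # Value of the best move available from the state we ended up in.
    best_next = max(q[(next_state, m)] for m in MOVES)
    target = reward + gamma * best_next
    # Move the old estimate a fraction alpha towards the target.
    q[(state, move)] += alpha * (target - q[(state, move)])
    return q[(state, move)]
```

For example, starting from an empty table, `q_update(q, "s0", "UP", 20.0, "s1", alpha=0.5, gamma=0.9)` moves the entry for `("s0", "UP")` halfway from 0 towards the reward of 20.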

SLIDE 4

Move selection

  • ε-soft implementation: in each round we choose a random move with a small probability ε; this is used only during learning, to encourage exploration
  • Otherwise, we choose our move greedily, taking the move with the highest Q value
  • If multiple moves share the best value, we can either keep our old move, if it is still optimal, or choose a random one among the optimal moves
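The selection rule above can be sketched roughly as below. The function name, the `learning` flag and the plain-dict Q-table are assumptions for illustration. The slide offers two tie-breaking options; this sketch combines them by keeping the old move when it is still optimal and choosing randomly otherwise.

```python
import random

MOVES = ("UP", "DOWN", "LEFT", "RIGHT", "NEUTRAL")


def select_move(q, state, last_move=None, epsilon=0.1, learning=True, rng=random):
    """Pick a move for state from a dict mapping (state, move) to a Q value."""
    # Explore: with probability epsilon take a random move (learning only).
    if learning and rng.random() < epsilon:
        return rng.choice(MOVES)
    # Exploit: collect every move that shares the highest Q value.
    values = {m: q.get((state, m), 0.0) for m in MOVES}
    best = max(values.values())
    best_moves = [m for m in MOVES if values[m] == best]
    # Tie-break: keep the previous move if it is still optimal.
    if last_move in best_moves:
        return last_move
    return rng.choice(best_moves)
```

Keeping the old move on ties avoids jittering between equally valued directions, while the random fallback stops the agent from always favoring the first move in the list.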

SLIDE 5

State generalization

  • There is a huge number of different states in the Ms. Pac-Man game
  • We generalize the states based on specific features
  • We check:
      • If there is a wall up, down, left and right of Ms. Pac-Man
      • If there is a threatening ghost approaching her
      • And finally, the direction of the nearest food, or of the nearest exit if she is being chased
  • This way we decrease the number of possible states dramatically and make the learning process faster
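One way to picture the feature-based state described above is as a small hashable record. The field names, the `encode_state` helper and the use of direction strings are assumptions for this sketch; the slides do not show how the features are actually encoded.

```python
from typing import NamedTuple


class State(NamedTuple):
    wall_up: bool
    wall_down: bool
    wall_left: bool
    wall_right: bool
    ghost_near: bool   # a threatening ghost is approaching
    target_dir: str    # direction of nearest food, or of the exit when chased


def encode_state(walls, ghost_near, food_dir, exit_dir):
    """Collapse a raw observation into the generalized state.

    walls is the set of blocked directions; food_dir and exit_dir are
    direction strings computed elsewhere (hypothetical inputs).
    """
    # When chased, the escape direction replaces the food direction.
    target = exit_dir if ghost_near else food_dir
    return State("UP" in walls, "DOWN" in walls, "LEFT" in walls,
                 "RIGHT" in walls, ghost_near, target)
```

Under these assumptions, with four possible target directions there are at most 2^4 · 2 · 4 = 128 distinct states, so a Q-table keyed on `State` stays very small compared with the raw game state.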

SLIDE 6

Reward Function

  • The reward function gives a positive value if Ms. Pac-Man did something good and a negative value if she did something bad
  • We encourage her to eat pills/power pills/ghosts (+20)
  • We give a penalty for being eaten by a ghost (-350), for hitting a wall (-100), for making an opposite move (-6) and for every step she takes (-2.5), to push her towards the shortest path
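The values above could be combined as in the sketch below. The numbers come from the slide, but the function signature is hypothetical, and treating the components as additive within a single step is an assumption; the slides do not say how simultaneous events are handled.

```python
def reward(ate_item, eaten_by_ghost, hit_wall, reversed_direction):
    """Return the reward for one step, given what happened during it."""
    r = -2.5              # cost of every step, favors short paths
    if ate_item:
        r += 20.0         # pill, power pill or ghost eaten
    if eaten_by_ghost:
        r -= 350.0        # losing a life
    if hit_wall:
        r -= 100.0        # move blocked by a wall
    if reversed_direction:
        r -= 6.0          # opposite of the previous move
    return r
```

With this additive reading, a step that eats a pill is worth 17.5, while an empty step costs 2.5.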

SLIDE 7

Results

Score               Average   MAX     MIN
StarterGhostsComm   3671      13200   810
StarterGhosts       3650      14860   670

  • Training for thousands of games using a decaying ε probability starting at 0.1
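A decaying ε could look like the sketch below. Only the starting value 0.1 comes from the slide; the exponential form, the `decay` factor and the `floor` parameter are assumptions, since the actual schedule is not given.

```python
def epsilon_at(game_index, start=0.1, decay=0.999, floor=0.0):
    """Exploration probability for the given game (0-based index).

    decay and floor are hypothetical values chosen for illustration.
    """
    # Shrink epsilon geometrically, never dropping below the floor.
    return max(floor, start * decay ** game_index)
```

Early games explore with probability close to 0.1, and later games act almost purely greedily on the learned table.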

SLIDE 8

Future Work

  • In the future we can make use of a genetic algorithm to find an optimal pair of parameters α and γ

  • A neural network could also be implemented to train our agent better; this approach is known as deep learning and is popular for its results

SLIDE 9

Thank you for your time!