
Agent for Ms. Pac-Man vs. Ghost Team Competition



  1. Agent for Ms. Pac-Man vs. Ghost Team Competition • Stergios Plataniotis, 2012030142 • ΠΛΗ513 - Autonomous Agents

  2. Target • Maximize the score by eating as many pills, power pills, and ghosts as possible • Available moves are UP/DOWN/LEFT/RIGHT/NEUTRAL • Partial observability (PO): Ms. Pac-Man can only see along the horizontal and vertical lines through her position • This causes problems: she is more likely to get stuck in local maxima when no food or ghosts are visible around her • It is also harder because the ghosts communicate with each other and can trap her easily before she even notices

  3. Q-learning • Implementation of the Q-learning reinforcement learning algorithm • A table holds a value for every (state, move) pair • After every round we update the entry for the previous (state, move) • Takes as parameters α (learning rate) and γ (discount factor), with 0 <= α <= 1 and 0 <= γ <= 1 • In the tabular case the values are proven to converge to the optimal ones, provided every (state, move) pair keeps being visited and α is decayed appropriately
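
The table update described above can be sketched as follows. This is a minimal illustration of the standard tabular Q-learning rule, not the author's code: the names `make_q_table` and `q_update`, the zero initial values, and the default `alpha` and `gamma` are assumptions, since the slides do not give them.

```python
from collections import defaultdict

# The five moves listed on the "Target" slide.
MOVES = ("UP", "DOWN", "LEFT", "RIGHT", "NEUTRAL")


def make_q_table():
    # Unseen (state, move) pairs start at 0.0 (an assumed initial value).
    return defaultdict(float)


def q_update(q, state, move, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update to the entry for (state, move).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, m)] for m in MOVES)
    td_target = reward + gamma * best_next
    q[(state, move)] += alpha * (td_target - q[(state, move)])
    return q[(state, move)]
```

Here `state` and `next_state` can be any hashable values. The sketch does not treat terminal states (for example, being eaten) specially; a fuller version would likely drop the `gamma * best_next` term there.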

  4. Move selection • ε-soft implementation: in each round we choose a random move with a small probability ε; this is used only during learning, to encourage exploration • Otherwise we choose greedily, taking the move with the highest Q-value • If several moves share the best value, we either keep the previous move, if it is still among the best, or pick one of the best moves at random
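
A possible sketch of this selection rule, including the tie-breaking step, is shown below. The function name `select_move`, its parameters, and the dictionary-based Q-table keyed by `(state, move)` are assumptions made for illustration.

```python
import random

MOVES = ("UP", "DOWN", "LEFT", "RIGHT", "NEUTRAL")


def select_move(q, state, last_move=None, epsilon=0.1, learning=True, rng=random):
    """Pick a move: random with probability epsilon while learning, else greedy."""
    # Exploration happens only during learning.
    if learning and rng.random() < epsilon:
        return rng.choice(MOVES)

    # Greedy step: find every move that attains the highest Q-value.
    values = {m: q.get((state, m), 0.0) for m in MOVES}
    best = max(values.values())
    best_moves = [m for m in MOVES if values[m] == best]

    # Tie-breaking: keep the previous move if it is still among the best,
    # otherwise choose uniformly among the best moves.
    if last_move in best_moves:
        return last_move
    return rng.choice(best_moves)
```

Passing `learning=False` turns exploration off, which matches the statement that the random move is used only during learning.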

  5. State generalization • The Ms. Pac-Man game has a huge number of distinct states • We generalize the states using a few specific features: (a) whether there is a wall up, down, left, and right of Ms. Pac-Man, (b) whether a threatening ghost is approaching her, and (c) the direction of the nearest food, or of the exit if she is being chased • This dramatically reduces the number of possible states and speeds up learning
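
One way such a generalized state could be encoded is as a small hashable tuple, so that it can serve directly as a Q-table key. The `State` type, the `encode_state` helper, and its inputs are hypothetical; the slides do not say how the features are actually computed or stored.

```python
from typing import NamedTuple, Optional


class State(NamedTuple):
    wall_up: bool
    wall_down: bool
    wall_left: bool
    wall_right: bool
    ghost_threat: bool          # a threatening ghost is approaching
    target_dir: Optional[str]   # direction to head for; None if nothing is visible


def encode_state(walls, ghost_threat, food_dir, escape_dir):
    """Build the generalized state from already-extracted features.

    walls: dict mapping "UP"/"DOWN"/"LEFT"/"RIGHT" to True when a wall is adjacent.
    food_dir / escape_dir: direction strings (or None), assumed to be computed elsewhere.
    """
    # When chased, the target is the exit; otherwise it is the nearest food.
    target = escape_dir if ghost_threat else food_dir
    return State(walls["UP"], walls["DOWN"], walls["LEFT"], walls["RIGHT"],
                 ghost_threat, target)
```

Under this particular encoding there are at most 2^4 × 2 × 5 = 160 distinct keys (four wall flags, one ghost flag, four directions plus `None`), which illustrates how small the table becomes compared with the raw game state.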

  6. Reward Function • The reward function returns a positive value when Ms. Pac-Man does something good and a negative value when she does something bad • We encourage her to eat pills/power pills/ghosts (+20) • We penalize being eaten by a ghost (-350), hitting a wall (-100), reversing direction (-6), and every step she takes (-2.5), so that she looks for the shortest path
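
The reward values above could be combined as in the sketch below. The numbers come from the slide; the function name `reward`, its boolean inputs, and the choice to add the terms together in a single step are assumptions, since the slide does not say how simultaneous events are handled.

```python
# Opposite of each directional move; NEUTRAL has no opposite.
OPPOSITE = {"UP": "DOWN", "DOWN": "UP", "LEFT": "RIGHT", "RIGHT": "LEFT"}


def reward(ate_something, was_eaten, hit_wall, move, last_move):
    """Return the reward for one step, summing all terms that apply (assumed)."""
    r = -2.5                      # cost of every step
    if ate_something:
        r += 20                   # pill, power pill, or ghost eaten
    if was_eaten:
        r -= 350                  # caught by a ghost
    if hit_wall:
        r -= 100                  # moved into a wall
    if last_move in OPPOSITE and OPPOSITE[last_move] == move:
        r -= 6                    # reversed direction
    return r
```

For example, a step that eats a pill without any penalty other than the step cost yields 20 - 2.5 = 17.5.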

  7. Results • Training for thousands of games using a decaying ε probability starting at 0.1 • Scores:

     Ghost team          Average    MAX    MIN
     StarterGhostsComm      3671  13200    810
     StarterGhosts          3650  14860    670
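
The slides give only the starting value of ε (0.1), not the decay schedule. Purely as an illustration, an exponential decay per game could look like this; the function name, the `decay` rate, and the `floor` are all assumed values.

```python
def epsilon_for_game(game_index, start=0.1, decay=0.999, floor=0.0):
    """Exploration probability for the given game (0-based), decayed exponentially."""
    # start is taken from the slide; decay and floor are illustrative assumptions.
    return max(floor, start * decay ** game_index)
```

With these assumed settings ε starts at 0.1 and shrinks steadily over thousands of games, so later games are played almost entirely greedily.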

  8. Future Work • A genetic algorithm could be used to search for a good pair of parameters α and γ • A neural network could also be used to train the agent; this approach, known as deep learning, is popular for its results

  9. Thank you for your time!
