Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI



  1. Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI
     Lei Han*¹, Peng Sun*¹, Yali Du*², Jiechao Xiong¹, Qing Wang¹, Xinghai Sun¹, Han Liu³, Tong Zhang⁴
     ¹ Tencent AI Lab, Shenzhen, China; ² University of Technology Sydney, Australia; ³ Northwestern University, IL, USA; ⁴ Hong Kong University of Science and Technology, Hong Kong, China
     * Equal contribution. Email: leihan.cs@gmail.com

  2. Introduction
     • Considered problem
       - Multi-agent reinforcement learning (MARL)
       - Grid-world environments (video games)
       - Challenge: flexibly control an arbitrary number of agents while achieving effective collaboration
     • Existing MARL approaches
       - Decentralized learning: IQL, IAC (Tan, 1993; Foerster et al., 2017)
       - Centralized learning: CommNet, BiCNet (Sukhbaatar et al., 2016; Peng et al., 2017)
       - Hybrid: COMA, QMIX, Mean-Field (Foerster et al., 2017; Rashid et al., 2018; Yang et al., 2018)
       - These approaches are unable to handle, or become unstable with, a varying number of agents

  3. GridNet
     • Architecture (see the sketch below)
       - Encoder: inputs are represented as an image-like structure; conv/pooling layers generate an embedding
       - Decoder: up-sampling layers construct an action map over the grid; each agent takes the action in the grid cell it occupies
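     The following is a minimal PyTorch-style sketch of such an encoder-decoder; the channel sizes, depth, and discrete per-grid action space are assumptions for illustration, not the exact network used in the paper.

```python
# Minimal sketch of a GridNet-like encoder-decoder (illustrative assumptions:
# channel sizes, depth, and a discrete per-grid action space).
import torch
import torch.nn as nn

class GridNet(nn.Module):
    def __init__(self, in_channels=8, num_actions=9):
        super().__init__()
        # Encoder: conv/pooling layers turn the image-like grid observation into an embedding.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Decoder: up-sampling layers restore the grid resolution and emit per-grid action logits.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, num_actions, kernel_size=2, stride=2),
        )

    def forward(self, obs):                      # obs: (batch, in_channels, H, W), H and W divisible by 4
        embedding = self.encoder(obs)            # (batch, 64, H/4, W/4)
        action_logits = self.decoder(embedding)  # (batch, num_actions, H, W)
        return action_logits                     # an agent reads the logits of the grid cell it occupies
```

     For an observation batch of shape (B, C, 64, 64), this returns per-grid logits of shape (B, num_actions, 64, 64); the agent occupying cell (y, x) acts according to action_logits[:, :, y, x].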

  4. GridNet
     • Algorithms
       - Can be integrated with many general RL algorithms, e.g. Q-learning and actor-critic (a sketch of an actor-critic update is shown below)
     • Properties
       - Collaboration is natural: stacked convolutional and/or pooling layers provide a large receptive field, so each agent is aware of the other agents in its neighborhood
       - Fast parallel exploration: the convolutional parameters are shared by all agents, so once one agent takes a beneficial action during its own exploration, the other agents acquire that knowledge as well
       - Transferable policy: the trained policy is easily transferred to other settings with a different number of agents
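     Below is a hedged sketch of how the per-grid action map could plug into a standard actor-critic update: log-probabilities at occupied cells are summed into a joint policy log-probability and weighted by an advantage. The agent mask, the separate value head, and the 0.5 value-loss weight are illustrative assumptions, not the paper's exact training loss.

```python
# Illustrative actor-critic loss on top of a GridNet action map; the masking
# scheme and value head are assumptions made for this sketch.
import torch.nn.functional as F

def actor_critic_loss(action_logits, values, agent_mask, actions, returns):
    """action_logits: (B, A, H, W) per-grid logits from GridNet
       values:        (B,) state values from a (hypothetical) critic head
       agent_mask:    (B, H, W) float, 1 where a grid cell is occupied by a controlled agent
       actions:       (B, H, W) long, action executed in each cell (ignored where unoccupied)
       returns:       (B,) discounted returns or bootstrapped targets"""
    log_probs = F.log_softmax(action_logits, dim=1)                  # per-grid log-probabilities
    taken = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)     # (B, H, W) log-prob of executed action
    joint_log_prob = (taken * agent_mask).flatten(1).sum(dim=1)      # sum over occupied cells only
    advantage = (returns - values).detach()                          # simple advantage estimate
    policy_loss = -(joint_log_prob * advantage).mean()
    value_loss = F.mse_loss(values, returns)
    return policy_loss + 0.5 * value_loss
```

     A Q-learning variant would instead interpret the per-grid outputs as Q-values and take the maximizing action at each occupied cell.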

  5. Experiments on Battle Games in StarCraft II
     • Scenarios
       - 5 Immortals vs. 5 Immortals (5I)
       - 3 Immortals + 2 Zealots vs. 3 Immortals + 2 Zealots (3I2Z)
       - Mixed army battle (MAB) with a random number of various Zerg units, including Baneling, Zergling, Roach, Hydralisk and Mutalisk
     • Training strategies
       - Against handcrafted policies: random (Rand), attack-nearest (AN), hit-and-run (HR)
       - Against its own historic versions: self-play (SP)
     • Compared methods
       - IQL: independent Q-learning [Tan, 1993]
       - IAC: independent actor-critic [Foerster et al., 2017]
       - Central-V: centralized value with decentralized policy [Foerster et al., 2017]
       - CommNet: communication net [Sukhbaatar et al., 2016]
     • Video link: https://youtu.be/LTcr01iTgZA

  6. Experiments on Battle Games in StarCraft II
     • On 5I and 3I2Z
       - Performance against each other
       - Performance against handcrafted policies

  7. Experiments on Battle Games in StarCraft II
     • Learned tactics
     • Transferability on 5I and 3I2Z
       - Directly apply the trained policy to maps with more agents: 10I, 20I, 5I5Z, 10I10Z
     • Performance on MAB
       - CommNet and Central-V cannot be applied here, as the number of agents varies

  8. Thanks! Poster at Pacific Ballroom #243, Jun 11th, 6:30 pm

