Autonomous Agents
Assault game - A3C agent 2016030010-Kosmas Pinitas
Technical University of Crete
February 23, 2020
Outline
◮ Background
  ◮ Environment
  ◮ MDPs
  ◮ Q-Learning
  ◮ Policy Gradients
◮ Architecture
◮ Results
◮ do nothing, shoot, move left, move right, shoot left, shoot right
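The six actions above form a discrete action space. A minimal sketch of how an agent can index them (the mapping order is illustrative; the exact index-to-action order depends on the environment wrapper used):

```python
# Hypothetical index -> action mapping for the six actions listed above;
# the exact ordering depends on the Assault environment wrapper.
ACTIONS = {
    0: "do nothing",
    1: "shoot",
    2: "move left",
    3: "move right",
    4: "shoot left",
    5: "shoot right",
}

def action_name(index):
    """Translate a discrete action index into its human-readable name."""
    return ACTIONS[index]
```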
◮ ∇θ log π(a|s) tells us a direction in which the log-probability of taking action a in state s increases
◮ A(s, a) is a scalar value and tells us what's the advantage of taking this action in this state
◮ If we combine the above terms, we will see that the likelihood of actions that are better than average is increased
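The combination of the two terms can be sketched in numpy for a linear-softmax policy (all names and the feature vector here are illustrative, not from the slides): the update moves θ along ∇θ log π(a|s) scaled by A(s, a), so actions with positive advantage become more likely.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def policy_gradient(theta, state, action, advantage):
    """Gradient of A(s, a) * log pi(a|s) w.r.t. theta for a linear-softmax policy.

    theta: (n_actions, n_features) weights, state: (n_features,) feature vector.
    """
    probs = softmax(theta @ state)     # pi(.|s)
    # d log pi(a|s) / d theta = (one_hot(a) - pi(.|s)) outer state
    grad_log_pi = (np.eye(len(probs))[action] - probs)[:, None] * state[None, :]
    return advantage * grad_log_pi

# Gradient ascent step: with positive advantage, pi(a|s) increases.
theta = np.zeros((6, 4))               # 6 actions as in Assault, 4 illustrative features
s = np.array([1.0, 0.5, -0.2, 0.3])
theta += 0.1 * policy_gradient(theta, s, action=1, advantage=2.0)
```

Starting from uniform probabilities (θ = 0), one step with advantage 2.0 raises π(1|s) above 1/6, which is exactly the "better than average actions become more likely" effect described above.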
◮ Multiple agents run in parallel, and each one has its own network
◮ These agents learn only from their respective environments
◮ As each agent gains more knowledge, it contributes to the total knowledge of the global network
◮ A(s, a) = Q(s, a) − V(s) = r + γV(s′) − V(s)
◮ Expresses how good it is to take an action a in a state s compared to the average action in that state
◮ Combines the best parts of Policy-Gradient and Value-Iteration methods
◮ Predicts both the value function V(s) and the optimal policy π(s)
◮ The agent uses the value function (Critic) to update the policy (Actor)
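The one-step advantage from the formula above can be computed directly (a minimal sketch; the function name is illustrative):

```python
def one_step_advantage(r, v_s, v_s_next, gamma=0.99, done=False):
    """A(s, a) = r + gamma * V(s') - V(s); the bootstrap term is dropped at episode end."""
    target = r if done else r + gamma * v_s_next
    return target - v_s

# Positive advantage: the action did better than the critic's estimate of state s.
adv = one_step_advantage(r=1.0, v_s=0.5, v_s_next=0.8)   # 1.0 + 0.99 * 0.8 - 0.5 = 1.292
```

The Critic's estimates V(s) and V(s′) come from the value head of the network; the Actor then scales its log-probability gradient by this advantage, as in the policy-gradient slide.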
◮ https://jaromiru.com/2017/02/16/lets-make-an-a3c-theory/
◮ https://www.geeksforgeeks.org/asynchronous-advantage-actor-critic-