Control of a Quadrotor with Reinforcement Learning




  1. Control of a Quadrotor with Reinforcement Learning. Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and Marco Hutter, Robotic Systems Lab, ETH Zurich. Presented by Nicole McNabb, University of Waterloo, June 27, 2018.

  2. Overview: (1) Introduction, (2) The Method, (3) Empirical Results, (4) Summary and Future Work.

  3. Introduction: What is a quadrotor? Figure: Quadrotor [1].

  4. Introduction: What is a quadrotor? High-level goal: train the quadrotor to perform tasks with varying initializations. This is a policy optimization problem. Figure: Quadrotor [1].

  5. Introduction: Related Approaches. Deep Deterministic Policy Gradient (DDPG): actor-critic architecture; off-policy, model-free; deterministic; insufficient exploration; very slow (if any) convergence. Trust Region Policy Optimization (TRPO): actor-critic architecture; on-policy, model-free; stochastic; computationally intensive; slow, unreliable convergence.
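The deterministic/stochastic distinction on slide 5 can be illustrated with a minimal sketch. The linear policy, the weight matrix `W`, the noise scale `sigma`, and the dimensions below are all hypothetical choices made for illustration; neither DDPG nor TRPO is restricted to this form.

```python
import numpy as np

def deterministic_policy(W, state):
    """Deterministic policy: the same state always maps to the same action."""
    return W @ state

def stochastic_policy(W, state, sigma, rng):
    """Stochastic policy: sample an action from a Gaussian centred on W @ state."""
    mean = W @ state
    return mean + sigma * rng.standard_normal(mean.shape)

rng = np.random.default_rng(0)
W = np.full((2, 3), 0.1)   # hypothetical 3-D state, 2-D action
state = np.ones(3)

a_det_1 = deterministic_policy(W, state)
a_det_2 = deterministic_policy(W, state)
a_sto_1 = stochastic_policy(W, state, sigma=0.1, rng=rng)
a_sto_2 = stochastic_policy(W, state, sigma=0.1, rng=rng)
```

A deterministic policy explores only if noise is injected from outside, which is one way to read the "insufficient exploration" entry for DDPG above.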

  6. Introduction: A New Approach. Goal: a deterministic policy with fast and stable convergence, model-free training, and extensive exploration. Solution: a method combining the actor-critic architecture with an on-policy deterministic policy gradient algorithm and a new exploration strategy.

  7. The Method: Setup. Continuous state-action space. State space: 18-D states that model the orientation (rotation), position, linear velocity, and angular velocity of the system. Action space: 4-D actions that dictate the thrust of each rotor.
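The state and action layout on slide 7 can be sketched as follows. The 9 + 3 + 3 + 3 split assumes the orientation is stored as a flattened 3x3 rotation matrix, which is one way to reach 18 dimensions; the slide itself only says "orientation (or rotation)". The function names and the `max_thrust` bound are hypothetical.

```python
import numpy as np

def make_state(rotation, position, lin_vel, ang_vel):
    """Stack rotation matrix (9), position (3), linear velocity (3),
    and angular velocity (3) into one 18-D state vector."""
    rotation = np.asarray(rotation, dtype=float)
    if rotation.shape != (3, 3):
        raise ValueError("rotation must be a 3x3 matrix")
    parts = [rotation.reshape(-1), position, lin_vel, ang_vel]
    state = np.concatenate([np.asarray(p, dtype=float).reshape(-1) for p in parts])
    if state.shape != (18,):
        raise ValueError("position and velocities must each have 3 entries")
    return state

def clip_action(raw_action, max_thrust=1.0):
    """4-D action: one thrust command per rotor, clipped to [0, max_thrust].
    The bound is an assumed normalisation, not a value from the slides."""
    action = np.asarray(raw_action, dtype=float)
    if action.shape != (4,):
        raise ValueError("action must have 4 entries, one per rotor")
    return np.clip(action, 0.0, max_thrust)

# Level attitude, 1 m above the origin, at rest.
hover_state = make_state(np.eye(3), [0.0, 0.0, 1.0], [0.0] * 3, [0.0] * 3)
thrusts = clip_action([-1.0, 0.5, 2.0, 1.0])
```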

  8. The Method: Exploration. Figure: Exploration Strategy [2].

  9. The Method: Network Training. Figure: Value Network [2]. Figure: Policy Network [2]. Value function training: approximate with Monte-Carlo samples obtained from the current trajectory. Policy optimization: same idea as TRPO, replacing the KL-divergence with a Mahalanobis metric.
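The two training ideas on slide 9 can be sketched in simplified form. The first function computes discounted Monte-Carlo targets from one trajectory, which a value network could then be regressed onto. The second shows what bounding a step by a Mahalanobis distance means: it returns the step that moves furthest along a given direction while keeping its squared Mahalanobis length equal to `epsilon`. This is an illustration on a plain vector, not the parameter update used in [2]; the discount `gamma`, the use of per-step costs, and all names are assumptions.

```python
import numpy as np

def monte_carlo_targets(costs, gamma=0.99, terminal_value=0.0):
    """Discounted sum of future costs at each step of one trajectory."""
    targets = np.empty(len(costs), dtype=float)
    running = float(terminal_value)
    for t in reversed(range(len(costs))):
        running = costs[t] + gamma * running
        targets[t] = running
    return targets

def mahalanobis_bounded_step(direction, cov, epsilon):
    """Maximise direction . step subject to step^T cov^{-1} step <= epsilon.
    Closed form: step = sqrt(epsilon / (d^T cov d)) * cov d."""
    direction = np.asarray(direction, dtype=float)
    cov = np.asarray(cov, dtype=float)
    scale = np.sqrt(epsilon / (direction @ cov @ direction))
    return scale * (cov @ direction)

targets = monte_carlo_targets([1.0, 1.0, 1.0], gamma=0.5)
cov = np.diag([1.0, 4.0])
step = mahalanobis_bounded_step([1.0, 0.0], cov, epsilon=0.25)
```

In a TRPO-style update the same constrained-step structure appears with the KL-divergence defining the trust region; here the covariance matrix plays that role.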

  10. The Method: Learning Algorithm.
      Algorithm 1: Policy optimization
      1: Input: initial value function approximation, initial policy
      2: for j = 1, 2, ... do
      3:   Perform exploration, take action
      4:   Compute MC estimates from current trajectory
      5:   Do approximate value function update
      6:   Do policy gradient update
      7: end for
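The control flow of Algorithm 1 on slide 10 can be written as a small skeleton. Every callable passed in (`explore`, `mc_estimates`, `update_value`, `update_policy`) is a hypothetical placeholder for the corresponding step; the slides do not specify their interfaces, and the toy stand-ins below only record the order of the calls.

```python
def optimize_policy(policy, value_fn, explore, mc_estimates,
                    update_value, update_policy, iterations):
    """Run the four steps of the loop body for a fixed number of iterations."""
    for _ in range(iterations):
        trajectories = explore(policy)                        # step 3
        targets = mc_estimates(trajectories)                  # step 4
        value_fn = update_value(value_fn, trajectories, targets)   # step 5
        policy = update_policy(policy, value_fn, trajectories)     # step 6
    return policy, value_fn

# Toy stand-ins: "policy" and "value_fn" are plain numbers here.
call_log = []

def toy_explore(policy):
    call_log.append("explore")
    return [policy]

def toy_mc_estimates(trajectories):
    call_log.append("mc")
    return [1.0 for _ in trajectories]

def toy_update_value(value_fn, trajectories, targets):
    call_log.append("value")
    return value_fn + sum(targets)

def toy_update_policy(policy, value_fn, trajectories):
    call_log.append("policy")
    return policy + 1

final_policy, final_value = optimize_policy(
    0, 0.0, toy_explore, toy_mc_estimates,
    toy_update_value, toy_update_policy, iterations=2)
```

Passing the steps in as functions keeps the loop independent of how the networks and the simulator are implemented.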

  11. Empirical Results. Training is done in simulation. Testing on two main tasks is done on a real quadrotor.

  12. Summary and Future Work: Summary. Primary contributions: (1) a new deterministic, model-free neural network policy for training a quadrotor; (2) stable and reliable performance on hard tasks, even under harsh initial conditions.

  13. Summary and Future Work: Future Research. Also compare the method against PPO. Introduce a more accurate model of the system into the simulation. Train an RNN to adapt to model errors automatically.

  14. Summary and Future Work: References. [1] https://www.seeedstudio.com/Crazyflie-2.0-p-2103.html [2] Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and Marco Hutter. Control of a Quadrotor with Reinforcement Learning. IEEE Robotics and Automation Letters, June 2017.

  15. Summary and Future Work: Questions?
