SLIDE 1

Control of a Quadrotor with Reinforcement Learning

Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and Marco Hutter

Robotic Systems Lab, ETH Zurich

Presented by Nicole McNabb University of Waterloo

June 27, 2018

SLIDE 2

Overview

1. Introduction
2. The Method
3. Empirical Results
4. Summary and Future Work

SLIDE 3

Introduction

What is a quadrotor?

Figure: Quadrotor [1]

SLIDE 4

Introduction

What is a quadrotor?

Figure: Quadrotor [1]

High-level goal: train the quadrotor to perform tasks under varying initializations. This is a policy optimization problem.

SLIDE 5

Introduction

Related Approaches

Deep Deterministic Policy Gradient (DDPG)
- Actor-critic architecture
- Off-policy, model-free
- Deterministic
- Insufficient exploration
- Very slow (if any) convergence

Trust Region Policy Optimization (TRPO)
- Actor-critic architecture
- On-policy, model-free
- Stochastic
- Computationally intensive
- Slow, unreliable convergence

SLIDE 6

Introduction

A New Approach

Goal: a deterministic model with
- Fast and stable convergence
- Model-free training
- Extensive exploration

Solution: a method combining the actor-critic architecture with an on-policy deterministic policy gradient algorithm and a new exploration strategy.

SLIDE 7

The Method

Setup

Continuous state-action space.

State space: 18-D state vector, modeling
- Orientation (or rotation)
- Position
- Linear velocity of the system
- Angular velocity of the system

Action space: 4-D actions, dictating the thrust of each rotor. (A sketch of this layout follows below.)
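As a concrete illustration, here is a minimal sketch of how the 18-D state could be assembled, assuming the orientation enters as a flattened 3x3 rotation matrix so the dimensions add up (9 + 3 + 3 + 3 = 18); the function and variable names are illustrative, not from the paper's code.

```python
import numpy as np

def assemble_state(R, p, v, w):
    """Stack quadrotor observations into one 18-D state vector.

    R: 3x3 rotation matrix (orientation), flattened to 9 entries
    p: 3-D position
    v: 3-D linear velocity
    w: 3-D angular velocity
    """
    return np.concatenate([R.reshape(9), p, v, w])  # shape (18,)

state = assemble_state(np.eye(3), np.zeros(3), np.zeros(3), np.zeros(3))
assert state.shape == (18,)

# The 4-D action is one thrust command per rotor.
action = np.zeros(4)
```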

SLIDE 8

The Method

Exploration

Figure: Exploration Strategy [2]

SLIDE 9

The Method

Network Training

Figure: Value Network [2]

Value function training: approximated with Monte-Carlo samples obtained from the current trajectory (see the sketch below).
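A minimal sketch of computing those Monte-Carlo targets, assuming standard discounted returns accumulated backward over one trajectory; the discount factor and the regression note are assumptions, not details from the paper.

```python
import numpy as np

def monte_carlo_returns(rewards, gamma=0.99):
    """Discounted Monte-Carlo return G_t for every step of one trajectory."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# The value network is then regressed onto these targets, e.g. by
# minimizing the squared error (V(s_t) - G_t)^2 over the trajectory.
print(monte_carlo_returns([1.0, 1.0, 1.0], gamma=0.9))  # [2.71 1.9  1.  ]
```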

Figure: Policy Network [2]

Policy optimization: same idea as TRPO, but replacing the KL divergence with a Mahalanobis metric (see the sketch below).
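To make the trust-region idea concrete: maximizing the linearized objective g . dtheta subject to a quadratic constraint (1/2) dtheta' M dtheta <= delta yields the step dtheta = sqrt(2 delta / g' M^{-1} g) M^{-1} g. In TRPO, M is the Fisher matrix induced by the KL divergence; swapping in a Mahalanobis metric keeps the same step formula. The sketch below is illustrative (M, delta, and all names are assumptions), not the paper's exact update.

```python
import numpy as np

def trust_region_step(g, M, delta=0.01):
    """Maximize g . dtheta subject to 0.5 * dtheta' M dtheta <= delta.

    M plays the role of TRPO's Fisher matrix; here it is any
    positive-definite matrix defining a Mahalanobis metric.
    """
    Minv_g = np.linalg.solve(M, g)                  # natural-gradient direction M^{-1} g
    step_size = np.sqrt(2.0 * delta / (g @ Minv_g))
    return step_size * Minv_g

print(trust_region_step(np.array([1.0, 0.5]), np.diag([2.0, 1.0])))
```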

SLIDE 10

The Method

Learning Algorithm

Algorithm 1 Policy optimization
1: Input: initial value function approximation, initial policy
2: for j = 1, 2, ... do
3:   Perform exploration, take actions
4:   Compute MC estimates from current trajectory
5:   Do approximate value function update
6:   Do policy gradient update
7: end for
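A Python sketch of this outer loop; every helper below (collect_trajectories and the two update methods) is a placeholder with an assumed signature, not the authors' code, and monte_carlo_returns is the routine sketched on the previous slide.

```python
def train(policy, value_fn, num_iterations=1000):
    """Skeleton of Algorithm 1; all helpers are assumed placeholders."""
    for j in range(num_iterations):
        # 3: perform exploration, take actions
        trajectories = collect_trajectories(policy)
        # 4: Monte-Carlo return estimates from the current trajectories
        targets = [monte_carlo_returns(traj.rewards) for traj in trajectories]
        # 5: approximate value function update (regression onto MC targets)
        value_fn.update(trajectories, targets)
        # 6: policy gradient update (trust-region step under the Mahalanobis metric)
        policy.update(trajectories, value_fn)
    return policy
```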

SLIDE 11

Empirical Results

- Training done in simulation
- Testing on two main tasks done on a real quadrotor

SLIDE 12

Summary and Future Work

Summary

Primary contributions:
- A new deterministic, model-free neural network policy for controlling a quadrotor
- Stable and reliable performance on hard tasks, even under harsh initial conditions

SLIDE 13

Summary and Future Work

Future Research

- Compare the model against PPO as well
- Introduce a more accurate model of the system into the simulation
- Train an RNN to adapt to model errors automatically

SLIDE 14

Summary and Future Work

References

[1] Crazyflie 2.0. https://www.seeedstudio.com/Crazyflie-2.0-p-2103.html
[2] Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and Marco Hutter. Control of a Quadrotor with Reinforcement Learning. IEEE Robotics and Automation Letters, June 2017.

SLIDE 15

Summary and Future Work

Questions?
