Learning Visual Servoing with Deep Features and Fitted Q-Iteration


SLIDE 1

Learning Visual Servoing with Deep Features and Fitted Q-Iteration

Alex X. Lee¹, Sergey Levine¹, Pieter Abbeel²,¹,³

¹UC Berkeley, ²OpenAI, ³International Computer Science Institute

SLIDE 2

Motivation

SLIDE 3

Deep Neural Networks in Computer Vision

■ image classification
■ semantic segmentation
■ object tracking
■ object detection

[Figure: example results, with video-tracking labels ("woman", "animal", "shaking", "singer") and object detections annotated with per-class confidence scores (person, dog, car, bird, cow, bus, ...)]

[Figure: AlexNet architecture]

SLIDE 4

Outline

■ Introduction
  ■ Reinforcement learning and deep reinforcement learning
  ■ Visual servoing
■ Learn visual servoing with reinforcement learning
  ■ Policy optimization
  ■ Combine value and model based RL
    ■ Learn visual feature dynamics
    ■ Learn servoing policy with fitted Q-iteration
■ Comparison to prior methods
■ Conclusion

SLIDE 5

What is Reinforcement Learning?

[Diagram: agent-environment loop. The environment sends state s and reward r to the agent; the agent sends action u back to the environment.]

SLIDE 6

Reinforcement Learning Approaches

[Diagram: the agent-environment loop (state s, reward r, action u) surrounded by the three approaches]

■ value based (model free): learns the Q-value Q(s_t, u_t); medium sample complexity; challenge for continuous and high-dimensional action spaces
■ model based: learns an environment model (s_t, u_t) → (s_{t+1}, r_{t+1}); low sample complexity; relies on a good model
■ policy optimization (model free): learns a policy π: s_t → u_t; high sample complexity; the policy might be simpler than the value function or the model

Acting greedily with respect to the Q-value gives the policy: π(s_t) = arg max_u Q(s_t, u)
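For a continuous, high-dimensional action space this arg max has no closed form in general. As a minimal illustrative sketch (not from the talk; Q, s_t, and candidate_actions are hypothetical stand-ins), one can approximate it by scoring a sampled set of candidate actions:

```python
import numpy as np

def greedy_action(Q, s_t, candidate_actions):
    # Score each sampled candidate with the Q-function and act greedily;
    # this approximates arg max_u Q(s_t, u) when u is continuous.
    values = [Q(s_t, u) for u in candidate_actions]
    return candidate_actions[int(np.argmax(values))]
```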

SLIDE 7

What is Deep Reinforcement Learning?

[Diagram: the same three approaches (value based: Q-value; model based: environment model; policy optimization: π), now with deep neural networks as the function approximators]

SLIDE 8

Examples of Deep Reinforcement Learning

Mnih et al., 2015 (DQN); Mnih et al., 2016 (A3C); Schulman et al., 2016 (TRPO + GAE); Silver et al., 2014 (DPG); Lillicrap et al., 2015 (DDPG); Levine*, Finn*, et al., 2016 (GPS); Gu*, Holly*, et al., 2016; Sadeghi et al., 2017 (CAD²RL); Tamar et al., 2016 (VIN)

SLIDE 9

Deep Reinforcement Learning for Robotics

Levine*, Finn*, et al., 2016 (GPS); Gu*, Holly*, et al., 2016; Sadeghi et al., 2017 (CAD²RL)

SLIDE 10

Outline

SLIDE 11

Visual Servoing

[Figure: current observation and goal observation]
SLIDE 12

Examples of Visual Servoing: Manipulation

Source: SeRViCE Lab, UT Dallas

SLIDE 13

Examples of Visual Servoing: Surgical Tasks

Source: Kehoe et al. 2016

SLIDE 14

Examples of Visual Servoing: Space Docking

Source: NASA

(video shown at 4× speed)

SLIDE 15

Outline

SLIDE 16

Learning Visual Servoing with Reinforcement Learning

[Diagram: agent-environment loop instantiated for servoing. State s: current and goal image observation. Action u: linear and angular velocities. Reward r: based on the distance to the desired pose relative to the car.]

SLIDE 17

Outline

SLIDE 18

Learning Visual Servoing with Policy Optimization

[Diagram: policy optimization (π: s_t → u_t) applied to servoing, taking the current and goal observations as input]

Example executions of the trained policy: trained with more than 20000 trajectories!

SLIDE 19

Outline

SLIDE 20

Combining Value and Model Based Reinforcement Learning

State-action value based RL: π(s_t) = arg max_u Q(s_t, u)

SLIDE 21

Combining Value and Model Based Reinforcement Learning

State-action value based RL: π(s_t) = arg min_u −Q(s_t, u)

Visual servoing, with dynamics function f: π(s_t) = arg min_u ||x* − f(x_t, u_t)||²

The servoing cost ||x* − f(x_t, u_t)||² thus plays the role of −Q(s_t, u).
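As a concrete sketch of this objective (a simplification, not the talk's method: f and candidate_actions are assumed names, and sampling a finite action set stands in for optimizing over u directly), one can pick the action whose predicted next observation best matches the goal:

```python
import numpy as np

def servoing_policy(x_t, x_goal, f, candidate_actions):
    # pi(s_t) = arg min_u ||x* - f(x_t, u)||^2, approximated over a
    # sampled set of candidate actions using the dynamics model f.
    costs = [np.sum((x_goal - f(x_t, u)) ** 2) for u in candidate_actions]
    return candidate_actions[int(np.argmin(costs))]
```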

SLIDE 22

Servoing with Visual Dynamics Model

[Figure: current observation, goal observation, and predicted observation from the visual dynamics model]
SLIDE 23

Features from Dilated VGG-16 Convolutional Neural Network

  • K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
  • F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. In ICLR, 2016.
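A hedged sketch of extracting such features, assuming PyTorch/torchvision: the talk uses a dilated VGG-16 (Yu & Koltun, 2016), but for illustration the snippet simply truncates a standard, non-dilated torchvision VGG-16 at conv3_3:

```python
import torch
import torchvision.models as models

# Truncate a pretrained VGG-16 to a mid-level convolutional feature map.
# (Approximation only: no dilated convolutions, unlike the talk's network.)
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
extractor = torch.nn.Sequential(*list(vgg.children())[:16])  # through conv3_3 + ReLU

with torch.no_grad():
    image = torch.rand(1, 3, 224, 224)  # stand-in for the current observation x_t
    y_t = extractor(image)              # feature map y_t of shape (1, 256, 56, 56)
```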
SLIDE 24

Servoing with Visual Dynamics Model

[Figure: current observation, goal observation, and predicted observation]
SLIDE 25

Servoing with Visual Dynamics Model

[Figure: current feature, goal feature, and predicted feature]

π(x_t, x*) = arg min_u ||y* − f(y_t, u_t)||²_w

The weighted feature-space cost ||y* − f(y_t, u_t)||²_w plays the role of −Q_w(s_t, u).
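To make the weighted norm concrete, here is a minimal sketch under assumed shapes (feature maps of shape (channels, H, W) and one learned weight per channel, as the Q_w notation suggests):

```python
import numpy as np

def weighted_servoing_cost(y_pred, y_goal, w):
    # ||y* - f(y_t, u)||^2_w: per-channel squared errors combined with
    # nonnegative channel weights w; this quantity acts as -Q_w(s_t, u).
    per_channel = np.sum((y_goal - y_pred) ** 2, axis=(1, 2))
    return per_channel @ w
```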

SLIDE 26

Outline

SLIDE 27

Feature Dynamics: Multiscale Bilinear Model

SLIDE 28

Feature Dynamics: Multiscale Bilinear Model
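The slides' figures for this model are not reproduced here. As a hedged, single-scale, fully connected stand-in for the multiscale (and, in the paper, locally connected) bilinear model, the sketch below makes the predicted feature change linear in the features, linear in the action, and bilinear in their interaction:

```python
import numpy as np

class BilinearDynamics:
    """Simplified bilinear feature dynamics (illustrative, not the paper's
    exact parameterization): f(y_t, u_t) = y_t + sum_j u1_j * (A_j y_t + b_j),
    where u1 = [u_t; 1] so A_0, b_0 capture the action-independent drift."""

    def __init__(self, dim_y, dim_u):
        self.A = 0.01 * np.random.randn(dim_u + 1, dim_y, dim_y)
        self.b = 0.01 * np.random.randn(dim_u + 1, dim_y)

    def predict(self, y, u):
        u1 = np.append(u, 1.0)  # homogeneous action [u; 1]
        # dy is linear in y, linear in u1, and bilinear in their product
        dy = np.einsum('j,jkl,l->k', u1, self.A, y) + u1 @ self.b
        return y + dy
```

Training would fit A and b by regressing predicted onto observed next-step features; the multiscale version applies such a model at each level of a feature pyramid.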

SLIDE 29

Outline

SLIDE 30

Learning Model Based Policy with Fitted Q-Iteration

π(s_t) = arg min_u ||y* − f(y_t, u_t)||²_w, where the weighted cost acts as −Q_w(s_t, u) and fitted Q-iteration learns the weights w.
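A simplified sketch of the fitted Q-iteration loop for a linear Q-function (assumptions: phi(s, u) is a feature vector with Q_w(s, u) = phi(s, u) · w, transitions store costs, and the min over next actions is approximated over sampled candidates; in the talk, phi corresponds to the feature-dynamics errors and the inner minimization uses the dynamics model):

```python
import numpy as np

def fitted_q_iteration(transitions, phi, candidate_actions, gamma=0.9, n_iters=10):
    # transitions: list of (s, u, cost, s_next) tuples
    # phi(s, u): feature vector such that Q_w(s, u) = phi(s, u) @ w
    Phi = np.stack([phi(s, u) for s, u, _, _ in transitions])
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iters):
        # Bellman targets: immediate cost plus discounted best (lowest) next Q
        targets = np.array([
            c + gamma * min(phi(s2, u2) @ w for u2 in candidate_actions)
            for _, _, c, s2 in transitions
        ])
        # regression step: refit the linear Q-function to the targets
        w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return w
```

Because Q_w is linear in w, each regression step is a plain least-squares solve.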

SLIDE 31

Learning Visual Servoing with Deep Feature Dynamics and FQI

[Diagram: value based RL (Q-value Q(s_t, u_t)) combined with the visual dynamics model, taking the current and goal observations as input]

Example executions of the trained policy: trained with only 20 trajectories!

SLIDE 32

Outline

SLIDE 33

Comparison to Prior Methods

[Bar chart: Average Cost (Negative Reward) per feature representation and optimization method; lower is better]

■ ORB feature points + IBVS
■ C-COT visual tracker + IBVS
■ CNN + TRPO (≥ 20000 trajectories)
■ Ours: feature dynamics + FQI (20 trajectories)

SLIDE 34

Conclusion

■ Deep reinforcement learning allows us to learn complex robot policies that can process complex visual inputs
■ Combine value based and model based RL for better sample complexity
■ Visual servoing:
  ■ Learn visual feature dynamics
  ■ Learn Q-values with fitted Q-iteration

SLIDE 35

Thank You

Resources

Paper: arxiv.org/abs/1703.11000
Code: github.com/alexlee-gk/visual_dynamics
Servoing benchmark code: github.com/alexlee-gk/citysim3d
More videos: rll.berkeley.edu/visual_servoing

Acknowledgements