Learning Visual Servoing with Deep Features and Fitted Q-Iteration
Alex X. Lee¹, Sergey Levine¹, Pieter Abbeel²,¹,³
¹UC Berkeley, ²OpenAI, ³International Computer Science Institute
Motivation: Deep Neural Networks in Computer Vision
[Figure: examples of deep network successes in computer vision: image classification (e.g., AlexNet) and semantic segmentation.]
Outline
■ Introduction
  ■ Reinforcement learning and deep reinforcement learning
  ■ Visual servoing
■ Learn visual servoing with reinforcement learning
  ■ Policy optimization
  ■ Combine value and model based RL
    ■ Learn visual feature dynamics
    ■ Learn servoing policy with fitted Q-iteration
■ Comparison to prior methods
■ Conclusion
Reinforcement Learning
■ Model based (learn an environment model): low sample complexity, but relies on a good model
■ Value based (learn a Q-value): medium sample complexity, but a challenge for continuous and high-dimensional action spaces
■ Model free (learn a policy directly): high sample complexity, but the policy might be simpler than the value or the model
[Diagram: agent-environment loop. The agent sends action u to the environment and receives state s and reward r. Value-based methods learn a Q-value, model-based methods learn an environment model, and model-free methods learn a policy directly.]
Deep RL examples: Schulman et al., 2016 (TRPO + GAE); Silver et al., 2014 (DPG); Lillicrap et al., 2015 (DDPG); Levine*, Finn*, et al., 2016 (GPS); Gu*, Holly*, et al., 2016; Sadeghi et al., 2017 (CAD²RL); Tamar et al., 2016 (VIN); Mnih et al., 2015 (DQN); Mnih et al., 2016 (A3C)
Visual Servoing
[Figure: visual servoing examples, each comparing a current image against a goal image. Sources: SeRViCE Lab, UT Dallas; Kehoe et al., 2016; NASA.]
Learn Visual Servoing with Reinforcement Learning
For the target-following task, the agent-environment loop (state s, reward r, action u) is instantiated as:
■ observation: the current and goal images
■ action u: linear and angular velocities
■ reward r: negative distance to the desired pose relative to the car
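As a concrete reading of this setup, here is a minimal Python sketch of the servoing environment interface. The class and the simulator methods are assumptions for illustration, not the released code.

```python
import numpy as np

class ServoingEnv:
    """Hypothetical sketch of the target-following MDP described above."""

    def __init__(self, sim, goal_image, desired_rel_pose):
        self.sim = sim                        # assumed simulator API below
        self.goal_image = goal_image          # fixed goal observation
        self.desired_rel_pose = desired_rel_pose

    def step(self, u):
        # action u stacks the camera's linear and angular velocities
        self.sim.apply_velocity(u)            # assumed simulator method
        obs = (self.sim.render(), self.goal_image)
        # reward: negative distance to the desired pose relative to the car
        rel_pose = self.sim.pose_relative_to_car()   # assumed simulator method
        reward = -np.linalg.norm(rel_pose - self.desired_rel_pose)
        return obs, reward
```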
Policy Optimization
The policy maps the current and goal images directly to actions (a CNN trained with TRPO).
[Video: example executions of the trained policy, showing the current and goal views.]
Trained with more than 20,000 trajectories!
Combine Value and Model Based RL: Learn Visual Feature Dynamics
[Diagram: a learned dynamics function maps the current feature and an action to a predicted feature. Current, goal, and predicted images are shown alongside the corresponding current, goal, and predicted features.]
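One way to read "learn visual feature dynamics": fit a one-step predictive model on (current feature, action, next feature) triples. Below is a minimal sketch assuming a linear-in-action model fit by ridge regression; the paper's actual model is a multiscale bilinear network, so treat this as a simplification.

```python
import numpy as np

def fit_linear_feature_dynamics(Y_t, U_t, Y_next, reg=1e-3):
    """Fit a one-step feature dynamics model y_{t+1} ~= y_t + B @ u_t
    by ridge-regularized least squares (a simplified stand-in for the
    paper's bilinear model).

    Y_t, Y_next: flattened features, shape (n, d)
    U_t:         actions, shape (n, k)
    Returns the dynamics matrix B, shape (d, k).
    """
    dY = Y_next - Y_t                             # feature change to explain
    A = U_t.T @ U_t + reg * np.eye(U_t.shape[1])  # (k, k) normal equations
    B = np.linalg.solve(A, U_t.T @ dY).T          # (d, k) dynamics matrix
    return B
```

Prediction with this model is then simply `y_t + B @ u_t`.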
The servoing controller picks the action that drives the predicted feature toward the goal feature: $u_t = \arg\min_{u} \| y_* - f(y_t, u) \|_w^2$, where $y_t$ is the current feature, $y_*$ the goal feature, $f$ the learned feature dynamics, and $w$ per-feature weights.
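With a model that is linear in the action, like the sketch above, this minimization over u is a weighted least-squares problem with a closed-form solution. A sketch under that assumption:

```python
import numpy as np

def servoing_action(y_t, y_goal, B, w, reg=1e-3):
    """Solve min_u ||y_goal - (y_t + B @ u)||^2_w in closed form.

    y_t, y_goal: flattened current and goal features, shape (d,)
    B:           linear feature dynamics matrix, shape (d, k)
    w:           nonnegative per-feature weights, shape (d,)
    reg:         small ridge term for numerical stability
    """
    Bw = B * w[:, None]                       # weight each feature's residual
    A = B.T @ Bw + reg * np.eye(B.shape[1])   # weighted normal equations
    b = Bw.T @ (y_goal - y_t)
    return np.linalg.solve(A, b)              # action u, shape (k,)
```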
Learn Servoing Policy with Fitted Q-Iteration
Combine value-based RL with the visual dynamics model: a Q-value is learned on top of the learned feature dynamics.
[Video: example executions of the trained policy, showing the current and goal views.]
Trained with only 20 trajectories!
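To make the training procedure concrete, here is a generic fitted Q-iteration loop. The `q` object and its methods are hypothetical placeholders; in the paper the Q-function is built from the servoing cost, so its minimization over actions has a closed form like the solve sketched above.

```python
import numpy as np

def fitted_q_iteration(transitions, q, gamma=0.9, n_iters=10):
    """Generic fitted Q-iteration sketch (interfaces are hypothetical).

    transitions: iterable of (obs, action, cost, next_obs) tuples
    q: parametric Q-function exposing
         q.min_over_actions(next_obs) -> min over u' of Q(next_obs, u')
         q.fit(obs, actions, targets) -> regress Q(obs, action) onto targets
    """
    obs, actions, costs, next_obs = zip(*transitions)
    costs = np.array(costs)
    for _ in range(n_iters):
        # Bellman targets: immediate cost plus discounted best future value
        targets = costs + gamma * np.array(
            [q.min_over_actions(o) for o in next_obs])
        # supervised regression of Q onto the fixed targets
        q.fit(obs, actions, targets)
    return q
```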
Comparison to Prior Methods
[Bar chart: Average Cost (Negative Reward) for each feature representation and optimization method: IBVS with ORB feature points; IBVS with the C-COT visual tracker; CNN + TRPO (≥ 20,000 trajectories); feature dynamics + FQI (20 trajectories).]
Conclusion
■ Deep reinforcement learning allows us to learn complex robot behaviors
■ Combine value based and model based RL for better sample efficiency
■ Visual servoing:
  ■ Learn visual feature dynamics
  ■ Learn Q-values with fitted Q-iteration
Paper: arxiv.org/abs/1703.11000
Code: github.com/alexlee-gk/visual_dynamics
Servoing benchmark code: github.com/alexlee-gk/citysim3d
More videos: rll.berkeley.edu/visual_servoing