Learning Visual Servoing with Deep Features and Fitted Q-Iteration
Alex X. Lee¹, Sergey Levine¹, Pieter Abbeel²,¹,³
¹UC Berkeley, ²OpenAI, ³International Computer Science Institute
Motivation: Deep Neural Networks in Computer Vision
[Figure: examples of deep network successes in computer vision: image classification (e.g., AlexNet) and semantic segmentation.]
Outline
■ Introduction
  ■ Reinforcement learning and deep reinforcement learning
  ■ Visual servoing
■ Learn visual servoing with reinforcement learning
  ■ Policy optimization
  ■ Combine value and model based RL
    ■ Learn visual feature dynamics
    ■ Learn servoing policy with fitted Q-iteration
■ Comparison to prior methods
■ Conclusion
Reinforcement Learning
■ Model based (learn an environment model): low sample complexity, but relies on a good model
■ Value based (learn a Q-value): medium sample complexity, but a challenge for continuous and high-dimensional action spaces
■ Model free (learn a policy directly): high sample complexity, but the policy might be simpler than the value or the model
[Diagram: agent-environment loop. The agent sends action u to the environment and receives state s and reward r. Value-based methods learn a Q-value, model-based methods learn an environment model, and model-free methods learn a policy directly.]
Deep RL examples: Schulman et al., 2016 (TRPO + GAE); Silver et al., 2014 (DPG); Lillicrap et al., 2015 (DDPG); Levine*, Finn*, et al., 2016 (GPS); Gu*, Holly*, et al., 2016; Sadeghi et al., 2017 (CAD²RL); Tamar et al., 2016 (VIN); Mnih et al., 2015 (DQN); Mnih et al., 2016 (A3C)
Visual Servoing
[Figure: visual servoing examples, each comparing a current image against a goal image. Sources: SeRViCE Lab, UT Dallas; Kehoe et al., 2016; NASA.]
Learn Visual Servoing with Reinforcement Learning
For the target-following task, the agent-environment loop (state s, reward r, action u) is instantiated as:
■ observation: the current and goal images
■ action u: linear and angular velocities
■ reward r: negative distance to the desired pose relative to the car
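As a concrete reading of this setup, here is a minimal Python sketch of the servoing environment interface. The class and the simulator methods are assumptions for illustration, not the released code.

```python
import numpy as np

class ServoingEnv:
    """Hypothetical sketch of the target-following MDP described above."""

    def __init__(self, sim, goal_image, desired_rel_pose):
        self.sim = sim                        # assumed simulator API below
        self.goal_image = goal_image          # fixed goal observation
        self.desired_rel_pose = desired_rel_pose

    def step(self, u):
        # action u stacks the camera's linear and angular velocities
        self.sim.apply_velocity(u)            # assumed simulator method
        obs = (self.sim.render(), self.goal_image)
        # reward: negative distance to the desired pose relative to the car
        rel_pose = self.sim.pose_relative_to_car()   # assumed simulator method
        reward = -np.linalg.norm(rel_pose - self.desired_rel_pose)
        return obs, reward
```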
Policy Optimization
The policy maps the current and goal images directly to actions (a CNN trained with TRPO).
[Video: example executions of the trained policy, showing the current and goal views.]
Trained with more than 20,000 trajectories!
Combine Value and Model Based RL: Learn Visual Feature Dynamics
[Diagram: a learned dynamics function maps the current feature and an action to a predicted feature. Current, goal, and predicted images are shown alongside the corresponding current, goal, and predicted features.]
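One way to read "learn visual feature dynamics": fit a one-step predictive model on (current feature, action, next feature) triples. Below is a minimal sketch assuming a linear-in-action model fit by ridge regression; the paper's actual model is a multiscale bilinear network, so treat this as a simplification.

```python
import numpy as np

def fit_linear_feature_dynamics(Y_t, U_t, Y_next, reg=1e-3):
    """Fit a one-step feature dynamics model y_{t+1} ~= y_t + B @ u_t
    by ridge-regularized least squares (a simplified stand-in for the
    paper's bilinear model).

    Y_t, Y_next: flattened features, shape (n, d)
    U_t:         actions, shape (n, k)
    Returns the dynamics matrix B, shape (d, k).
    """
    dY = Y_next - Y_t                             # feature change to explain
    A = U_t.T @ U_t + reg * np.eye(U_t.shape[1])  # (k, k) normal equations
    B = np.linalg.solve(A, U_t.T @ dY).T          # (d, k) dynamics matrix
    return B
```

Prediction with this model is then simply `y_t + B @ u_t`.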
The servoing controller picks the action that drives the predicted feature toward the goal feature: $u_t = \arg\min_{u} \| y_* - f(y_t, u) \|_w^2$, where $y_t$ is the current feature, $y_*$ the goal feature, $f$ the learned feature dynamics, and $w$ per-feature weights.
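With a model that is linear in the action, like the sketch above, this minimization over u is a weighted least-squares problem with a closed-form solution. A sketch under that assumption:

```python
import numpy as np

def servoing_action(y_t, y_goal, B, w, reg=1e-3):
    """Solve min_u ||y_goal - (y_t + B @ u)||^2_w in closed form.

    y_t, y_goal: flattened current and goal features, shape (d,)
    B:           linear feature dynamics matrix, shape (d, k)
    w:           nonnegative per-feature weights, shape (d,)
    reg:         small ridge term for numerical stability
    """
    Bw = B * w[:, None]                       # weight each feature's residual
    A = B.T @ Bw + reg * np.eye(B.shape[1])   # weighted normal equations
    b = Bw.T @ (y_goal - y_t)
    return np.linalg.solve(A, b)              # action u, shape (k,)
```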
Learn Servoing Policy with Fitted Q-Iteration
Combine value-based RL with the visual dynamics model: a Q-value is learned on top of the learned feature dynamics.
[Video: example executions of the trained policy, showing the current and goal views.]
Trained with only 20 trajectories!
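To make the training procedure concrete, here is a generic fitted Q-iteration loop. The `q` object and its methods are hypothetical placeholders; in the paper the Q-function is built from the servoing cost, so its minimization over actions has a closed form like the solve sketched above.

```python
import numpy as np

def fitted_q_iteration(transitions, q, gamma=0.9, n_iters=10):
    """Generic fitted Q-iteration sketch (interfaces are hypothetical).

    transitions: iterable of (obs, action, cost, next_obs) tuples
    q: parametric Q-function exposing
         q.min_over_actions(next_obs) -> min over u' of Q(next_obs, u')
         q.fit(obs, actions, targets) -> regress Q(obs, action) onto targets
    """
    obs, actions, costs, next_obs = zip(*transitions)
    costs = np.array(costs)
    for _ in range(n_iters):
        # Bellman targets: immediate cost plus discounted best future value
        targets = costs + gamma * np.array(
            [q.min_over_actions(o) for o in next_obs])
        # supervised regression of Q onto the fixed targets
        q.fit(obs, actions, targets)
    return q
```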
Comparison to Prior Methods
[Bar chart: Average Cost (Negative Reward) for each feature representation and optimization method: IBVS with ORB feature points; IBVS with the C-COT visual tracker; CNN + TRPO (≥ 20,000 trajectories); feature dynamics + FQI (20 trajectories).]
Conclusion
■ Deep reinforcement learning allows us to learn complex robot behaviors
■ Combine value based and model based RL for better sample efficiency
■ Visual servoing:
  ■ Learn visual feature dynamics
  ■ Learn Q-values with fitted Q-iteration
Paper: arxiv.org/abs/1703.11000
Code: github.com/alexlee-gk/visual_dynamics
Servoing benchmark code: github.com/alexlee-gk/citysim3d
More videos: rll.berkeley.edu/visual_servoing