Learning Dexterity
Peter Welinder
SEPTEMBER 09, 2018
Trends toward learning-based robotics: reinforcement learning has mastered Go (AlphaGo Zero) and Dota 2 (OpenAI Five). What about robotics? The common objection is that RL doesn't work there because it requires enormous amounts of experience.
The reinforcement learning loop: an agent (the policy) observes the state, takes actions, and receives rewards.
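The agent-environment loop can be sketched in a few lines. The environment and policy below are toy stand-ins (not OpenAI's code), chosen only to make the state/action/reward cycle concrete.

```python
# Minimal agent-environment loop illustrating the RL interaction diagram.
# "ToyEnv" and "random_policy" are illustrative stand-ins.
import random

class ToyEnv:
    """Trivial environment: the state is a number we try to drive to zero."""
    def __init__(self):
        self.state = 5

    def step(self, action):
        self.state += action              # action is -1, 0, or +1
        reward = -abs(self.state)         # closer to zero is better
        return self.state, reward

def random_policy(state):
    return random.choice([-1, 0, 1])

env = ToyEnv()
total_reward = 0
for t in range(10):
    action = random_policy(env.state)
    state, reward = env.step(action)
    total_reward += reward
```

A trained policy would replace `random_policy` with a function that maps states to actions so as to maximize the accumulated reward.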
The objective is the expected episode return:

    score(θ) = E_{τ ∈ episodes} [ Σ_t reward(state_t, action_t) ]

The policy parameters θ are optimized to maximize this score with Proximal Policy Optimization (Schulman et al., 2017).
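The score, the per-step rewards summed within each episode and averaged across episodes, can be computed directly. The episode data and reward function below are made up for illustration.

```python
# score = mean over episodes of the summed per-step reward.
# "episodes" is toy data; in practice each episode comes from policy rollouts.
episodes = [
    [(0.0, 1), (0.5, 1), (1.0, 0)],   # (state_t, action_t) pairs
    [(0.2, 0), (0.4, 1)],
]

def reward(state, action):
    # Illustrative reward, not the real one: prefer action 1 in high states.
    return state * action

def episode_return(episode):
    return sum(reward(s, a) for s, a in episode)

score = sum(episode_return(ep) for ep in episodes) / len(episodes)
```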
Policy architecture: noisy observations (fingertip positions, object pose, finger joint positions) are normalized, passed through a fully-connected ReLU layer and an LSTM, and combined with the goal to produce an action distribution.
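A forward pass through that pipeline can be sketched in NumPy. All layer sizes and weights below are placeholders (a trained policy would learn them), and the single LSTM step is written out by hand for clarity.

```python
# Sketch of the policy forward pass: noisy observation -> normalization ->
# fully-connected ReLU -> LSTM step -> action distribution.
# Dimensions OBS/HID/ACT are assumed, not the real ones.
import numpy as np

rng = np.random.default_rng(0)
OBS, HID, ACT = 24, 32, 20

def normalize(x, mean, std):
    return (x - mean) / (std + 1e-8)

def relu(x):
    return np.maximum(x, 0.0)

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates packed as [input, forget, cell, output]."""
    z = W @ x + U @ h + b
    i, f, g, o = np.split(z, 4)
    i, f, o = 1/(1+np.exp(-i)), 1/(1+np.exp(-f)), 1/(1+np.exp(-o))
    c = f * c + i * np.tanh(g)
    h = o * np.tanh(c)
    return h, c

# Placeholder parameters (random).
W_fc = rng.normal(scale=0.1, size=(HID, OBS))
W = rng.normal(scale=0.1, size=(4*HID, HID))
U = rng.normal(scale=0.1, size=(4*HID, HID))
b = np.zeros(4*HID)
W_out = rng.normal(scale=0.1, size=(ACT, HID))

obs = rng.normal(size=OBS)                 # fingertips, object pose, joints
x = relu(W_fc @ normalize(obs, 0.0, 1.0))
h, c = lstm_step(x, np.zeros(HID), np.zeros(HID), W, U, b)
logits = W_out @ h
action_probs = np.exp(logits) / np.exp(logits).sum()   # action distribution
```

The recurrent hidden state `(h, c)` is what lets the policy integrate information over time despite noisy observations.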
Training infrastructure: optimizers holding the policy parameters run on 8 GPUs, while rollout workers run on 6,000 CPU cores.
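The rollout-worker / optimizer split has a simple shape: many workers generate episodes in parallel, and the optimizer consumes them in batches. The sketch below uses a thread pool as a small stand-in for thousands of CPU workers; all names and the fake rollout function are illustrative.

```python
# Sketch of the rollout-worker / optimizer split. A thread pool stands in
# for the CPU fleet; "rollout" fakes an episode and returns its return.
from concurrent.futures import ThreadPoolExecutor

def rollout(worker_id):
    # Stand-in for simulating one episode under the current policy.
    return float(worker_id % 5)

with ThreadPoolExecutor(max_workers=8) as pool:
    returns = list(pool.map(rollout, range(32)))

# The optimizer would compute gradients from this batch of experience.
batch_mean = sum(returns) / len(returns)
```

In the real system the workers and optimizers run on separate machines and exchange experience and updated parameters over the network rather than through a shared list.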
Sim-to-real transfer via domain randomization: F. Sadeghi and S. Levine (2017); Tobin et al. (2017); Peng et al. (2018).
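The idea behind this line of work is to resample physical and visual simulator parameters for every episode, so a policy trained in simulation never overfits to one simulator. A minimal sketch, with entirely made-up parameter names and ranges:

```python
# Domain randomization sketch: sample simulator parameters per episode.
# Parameter names and ranges are illustrative, not the real ones.
import random

def randomize_environment(rng):
    return {
        "object_mass":     rng.uniform(0.05, 0.5),   # kg
        "friction":        rng.uniform(0.5, 1.5),
        "camera_offset":   [rng.uniform(-0.01, 0.01) for _ in range(3)],  # m
        "light_intensity": rng.uniform(0.3, 1.0),
    }

rng = random.Random(42)
envs = [randomize_environment(rng) for _ in range(1000)]
masses = [e["object_mass"] for e in envs]
```

A policy that succeeds across all 1,000 sampled environments is much more likely to treat the real world as just one more variation.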
Vision architecture: each of three cameras feeds a Conv, Pool, ResNet, SSM (spatial softmax) stack; the three feature vectors are concatenated and passed through a dense layer to predict object position and rotation.
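The spatial softmax head is the distinctive piece: it converts a convolutional feature map into expected (x, y) coordinates per channel, which is a natural readout for position-like quantities. A self-contained sketch with illustrative shapes:

```python
# Spatial softmax sketch: (channels, H, W) feature map -> per-channel
# expected (x, y) coordinates. Shapes are illustrative.
import numpy as np

def spatial_softmax(feature_map):
    c, h, w = feature_map.shape
    flat = feature_map.reshape(c, -1)
    probs = np.exp(flat - flat.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    ys, xs = np.mgrid[0:h, 0:w]
    x_coord = (probs * xs.ravel()).sum(axis=1)
    y_coord = (probs * ys.ravel()).sum(axis=1)
    return np.stack([x_coord, y_coord], axis=1)

fmap = np.zeros((3, 8, 8))
fmap[0, 2, 5] = 10.0          # sharp activation at (y=2, x=5) in channel 0
coords = spatial_softmax(fmap)
```

A sharp activation peak yields coordinates near that peak, so upstream conv layers only need to light up where the object is.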
System overview (panels A to D):

A. Distributed workers collect experience (observed robot states and actions) on randomized environments at large scale.
B. We train a control policy (an LSTM) using reinforcement learning. It chooses the next action based on fingertip positions and the object pose.
C. We train a convolutional neural network to predict the object pose from camera images.
D. We combine the pose estimation network and the control policy to transfer to the real world: the conv network supplies the object pose, which together with fingertip locations drives the policy's actions.
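The deployed control loop of panel D is simple to state in code: per step, estimate the pose from the cameras, then ask the policy for an action. Both models are stubs below; the dimensions (three cameras, 20 actuated joints, 5 fingertips x 3 coordinates) are assumptions for illustration.

```python
# Sketch of the deployed control loop (panel D). Both models are stubs;
# a real system would load the trained conv network and LSTM policy.
def vision_model(images):
    # Stub for the pose-estimation conv network (three camera images in).
    return {"position": [0.0, 0.0, 0.1], "rotation": [1.0, 0.0, 0.0, 0.0]}

def policy(fingertips, object_pose):
    # Stub for the trained LSTM controller.
    return [0.0] * 20          # one target per actuated joint (assumed 20)

def control_step(images, fingertips):
    pose = vision_model(images)
    return policy(fingertips, pose)

action = control_step(images=[None, None, None], fingertips=[0.0] * 15)
```

Because the policy was trained only on simulator observations of the same form, swapping in the vision network's pose estimate is all the "transfer" the controller needs at deployment time.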
Randomizations   Object tracking    Max successes    Median successes
All              Vision             46               11.5
All              Motion tracking    50               13
None             Motion tracking    6
[Figure: consecutive goals achieved (log scale, 1 to 100) versus years of simulated experience (10 to 50), comparing All Randomizations with No Randomizations.]
Grasp types observed: tip pinch, palmar pinch, tripod, quadpod, power grasp, and 5-finger precision grasp.