SLIDE 1

Learning Dexterity

Peter Welinder

SEPTEMBER 09, 2018

SLIDE 2

SLIDE 3

SLIDE 4

SLIDE 5

Learning

SLIDE 6

Trends towards learning-based robotics

SLIDE 7

Reinforcement Learning

Go (AlphaGo Zero) Dota 2 (OpenAI Five)

SLIDE 8

What about robotics? RL is hard to apply directly because it consumes enormous amounts of experience. AlphaGo Zero: 5 million games, ~500 years of playing Go, ~200 years per day. Dota: ~200 years per day.

SLIDE 9

SLIDE 10

Simulators

SLIDE 11

Learning dexterity

SLIDE 12

24 joints: 20 actuated, 4 underactuated

SLIDE 13

SLIDE 14

SLIDE 15

Rotating a block

Challenges:

  • RL in the real world
  • high-dimensional control
  • noisy and partial observations
  • manipulating multiple objects
SLIDE 16

Approach

SLIDE 17

Reinforcement Learning + Domain Randomization

SLIDE 18

Reinforcement Learning

The agent (policy) interacts with the environment: it observes states, takes actions, and receives rewards.

\[ \text{action}_t = \text{policy}(\text{state}_t) \]

\[ \text{score} = \sum_t \text{reward}(\text{state}_t, \text{action}_t) \]
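In code, this loop is just the following (a minimal sketch assuming a gym-style env.reset()/env.step() interface; the env and policy here are placeholders, not the actual training stack):

```python
# Minimal sketch of the agent-environment loop above, assuming a
# gym-style API; environment and policy are placeholders.
def run_episode(env, policy):
    state = env.reset()
    score, done = 0.0, False
    while not done:
        action = policy(state)                     # action_t = policy(state_t)
        state, reward, done, _ = env.step(action)
        score += reward                            # score = sum_t reward(state_t, action_t)
    return score
```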

SLIDE 19

Reinforcement Learning: Proximal Policy Optimization (PPO)

Schulman et al. (2017)

\[ \theta^* = \arg\max_\theta \sum_{\tau \in \text{episodes}} \text{reward}(\text{policy}_\theta, \tau) \]
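PPO (Schulman et al., 2017) optimizes this objective through a clipped surrogate loss. A minimal numpy sketch of that loss; the argument names are illustrative:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate loss from Schulman et al. (2017).

    ratio: pi_theta(a|s) / pi_theta_old(a|s) per timestep
    advantage: estimated advantage per timestep
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # PPO maximizes the elementwise minimum; negate to express as a loss.
    return -np.mean(np.minimum(unclipped, clipped))
```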

SLIDE 20

Policy

Noisy observations (fingertip positions, object pose, finger joint positions) and the goal pass through normalization, a fully-connected ReLU layer, and an LSTM, which outputs an action distribution.
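A rough PyTorch sketch of this architecture; the layer sizes, the categorical action head, and the assumption that normalization happens upstream are all illustrative, not the published configuration:

```python
import torch
import torch.nn as nn

class DexterityPolicy(nn.Module):
    """Sketch of the slide's policy: normalized observations + goal ->
    fully-connected ReLU -> LSTM -> action distribution.
    Sizes and the action head are assumptions."""

    def __init__(self, obs_dim, goal_dim, n_actions, hidden=1024, lstm_size=512):
        super().__init__()
        self.fc = nn.Linear(obs_dim + goal_dim, hidden)
        self.lstm = nn.LSTM(hidden, lstm_size, batch_first=True)
        self.head = nn.Linear(lstm_size, n_actions)  # logits of the action distribution

    def forward(self, obs, goal, state=None):
        # obs/goal: (batch, time, dim); observations assumed already normalized
        x = torch.relu(self.fc(torch.cat([obs, goal], dim=-1)))
        x, state = self.lstm(x, state)
        return self.head(x), state
```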

SLIDE 21

Distributed training with Rapid

Optimizers (8 GPUs) hold the policy parameters; rollout workers (6,000 CPU cores) generate experience and stream it back to the optimizers.
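Rapid itself is OpenAI's internal system; purely as a toy illustration of the pattern on this slide (many rollout workers streaming experience to central optimizers), a minimal multiprocessing sketch with placeholder data:

```python
import multiprocessing as mp
import random

def rollout_worker(queue):
    # Toy stand-in for a rollout worker: generate placeholder
    # "episodes" and push them to the optimizer process.
    while True:
        trajectory = [random.random() for _ in range(10)]
        queue.put(trajectory)

def optimizer(queue, n_batches=5):
    # Toy stand-in for the GPU optimizers: consume rollouts in batches.
    for _ in range(n_batches):
        batch = [queue.get() for _ in range(4)]
        print(f"optimizing on {len(batch)} rollouts")

if __name__ == "__main__":
    q = mp.Queue()
    workers = [mp.Process(target=rollout_worker, args=(q,), daemon=True) for _ in range(4)]
    for w in workers:
        w.start()
    optimizer(q)
```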

SLIDE 22

SLIDE 23

Domain Randomization

Sadeghi & Levine (2017), Tobin et al. (2017), Peng et al. (2018)

SLIDE 24

Physics Randomizations

  • object dimensions
  • object and robot link masses
  • surface friction coefficients
  • robot joint damping coefficients
  • actuator force gains
  • joint limits
  • gravity vector
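One common way to implement such randomizations is to resample each physics parameter multiplicatively around its nominal value at the start of every episode. A minimal sketch; the parameter names and ranges are illustrative, not the calibrated values used in the work:

```python
import numpy as np

# Illustrative perturbation scales only, not the published values.
RANDOMIZATION_SCALES = {
    "object_dimensions": 0.05,
    "link_masses": 0.5,
    "friction": 0.3,
    "joint_damping": 0.5,
    "actuator_gains": 0.5,
    "joint_limits": 0.02,
    "gravity": 0.1,
}

def randomize(nominal):
    """Sample one environment: perturb each nominal physics parameter
    multiplicatively within its range, a common domain-randomization scheme."""
    return {
        name: value * np.exp(np.random.uniform(-RANDOMIZATION_SCALES[name],
                                               RANDOMIZATION_SCALES[name]))
        for name, value in nominal.items()
    }

# Usage: draw fresh physics at every episode reset, e.g.
# env_params = randomize({"friction": 1.0, "gravity": 9.81})
```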

SLIDE 25

SLIDE 26

Vision network: each of three cameras feeds a Conv → Pool → ResNet → SSM stack; the three streams are concatenated into dense layers that predict object position and object rotation.
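A rough PyTorch sketch of this vision network, with a plain convolutional stack standing in for the ResNet blocks and all layer sizes assumed; SSM is read here as a spatial softmax (soft-argmax) over feature maps:

```python
import torch
import torch.nn as nn

def spatial_softmax(feat):
    # Soft-argmax: expected (x, y) location of each feature channel.
    n, c, h, w = feat.shape
    probs = torch.softmax(feat.reshape(n, c, h * w), dim=-1).reshape(n, c, h, w)
    xs = torch.linspace(-1, 1, w).view(1, 1, 1, w)
    ys = torch.linspace(-1, 1, h).view(1, 1, h, 1)
    ex = (probs * xs).sum(dim=(2, 3))
    ey = (probs * ys).sum(dim=(2, 3))
    return torch.cat([ex, ey], dim=1)  # (n, 2 * c)

class PoseNet(nn.Module):
    """Sketch of the slide's network: per-camera Conv -> Pool -> ResNet -> SSM,
    concatenated into dense layers. Channel counts are assumptions."""

    def __init__(self):
        super().__init__()
        self.tower = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3), nn.ReLU(),  # stand-in for the ResNet blocks
        )
        self.dense = nn.Sequential(nn.Linear(3 * 128, 128), nn.ReLU())
        self.position = nn.Linear(128, 3)  # xyz
        self.rotation = nn.Linear(128, 4)  # quaternion

    def forward(self, cam1, cam2, cam3):
        feats = [spatial_softmax(self.tower(c)) for c in (cam1, cam2, cam3)]
        h = self.dense(torch.cat(feats, dim=1))
        return self.position(h), self.rotation(h)
```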

SLIDE 27

Train in Simulation

A. Distributed workers collect experience on randomized environments at large scale.
B. We train a control policy using reinforcement learning. It chooses the next action based on fingertip positions and the object pose.
C. We train a convolutional neural network to predict the object pose given three simulated camera images.

[Diagram: observed robot states → LSTM policy → actions; three camera views → CONV pose network → object pose]

SLIDE 28

Transfer to the Real World

D. We combine the pose estimation network and the control policy to transfer to the real world.

[Diagram: three camera views → CONV pose network → object pose; object pose + fingertip locations → LSTM policy → actions]

SLIDE 29

Results

SLIDE 30


SLIDE 31

Results

RANDOMIZATIONS   OBJECT TRACKING   MAX SUCCESSES   MEDIAN SUCCESSES
All              Vision            46              11.5
All              Motion tracking   50              13
None             Motion tracking   6               —

SLIDE 32

Training time

[Plot: consecutive goals achieved (y, up to 50) vs. years of simulated experience (x, log scale, 1–100); curves for All Randomizations and No Randomizations.]

SLIDE 33

SLIDE 34

Grasp types: Tip Pinch, Palmar Pinch, Tripod, Quadpod, Power Grasp, 5-Finger Precision Grasp

SLIDE 35

Thank You

Visit openai.com for more information.

FOLLOW @OPENAI ON TWITTER