SLIDE 1

Progressive Nets for Simulation to Robot Transfer

Raia Hadsell

SLIDE 2

Complex Environments - RAIA HADSELL

Skepticism

Let’s acknowledge a few difficulties with deep learning and robotics:
1. Robot-domain data does not present itself in this form. [image]

SLIDE 3

Deep RL to the rescue?

Continuous Deep Q-Learning with Model-based Acceleration. Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine. ICML 2016.
Asynchronous Methods for Deep Reinforcement Learning. Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu.
Control of Memory, Active Perception, and Action in Minecraft. Junhyuk Oh, Valliappa Chockalingam, Satinder Singh, and Honglak Lee.

However, deep RL is very data-inefficient.

SLIDE 4

Skepticism

Let’s acknowledge a few difficulties with deep learning and robotics:
2. Robot-domain data does not present itself in this quantity. [image]

SLIDE 5

Simulation to the rescue?

https://www.youtube.com/watch?v=3WXd4vC3lbQ

SLIDE 6

Simulation to the rescue?

Deep learning and deep RL like simulators:

  • Training
  • Algorithms
  • Hyperparameters
  • Speed

However…

There is a Reality Gap! We aren’t interested in simulation unless learning can transfer to the target domain, and transfer is hard, especially for deep learning.

SLIDE 7

Transfer + continual learning

  • Continual + transfer learning can bridge the reality gap and ameliorate data inefficiency
  • Unfortunately, neural networks are not well-suited to continual learning
      ■ Catastrophic forgetting from fine-tuning
      ■ Policy interference from multi-task learning

SLIDE 8

Progressive Neural Networks

arxiv.org/abs/1606.04671

In collaboration with: Andrei Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu

SLIDE 9

Progressive Neural Networks
SLIDE 11

Progressive Neural Networks

[Diagram: columns 1 and 2 of a progressive network, with lateral adapter connections (a) from column 1 into column 2.]

SLIDE 13

Progressive Neural Networks

[Diagram: a third column added; lateral adapters (a) feed column 3 from columns 1 and 2.]


SLIDE 15

Progressive Neural Networks

Advantages

1. No catastrophic forgetting of previous tasks, by design.
2. Deep, compositional feature transfer from all previous tasks and layers.
3. Added capacity for learning task-specific features.
4. Provides a framework for analysis of transferred features.
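The column structure sketched on the slides above can be made concrete in a few lines. Below is a minimal numpy sketch (dense layers, random initialisation; the class and method names are illustrative, not from the paper): each new column owns its weights and receives lateral adapter inputs from the frozen activations of every earlier column.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class ProgressiveNet:
    """Toy progressive network: new columns never overwrite old ones."""

    def __init__(self, layer_sizes, rng=None):
        self.layer_sizes = layer_sizes          # e.g. [input, h1, h2, output]
        self.columns = []                       # per-column weight matrices
        self.laterals = []                      # per-column lateral adapters
        self.rng = rng or np.random.default_rng(0)

    def add_column(self):
        sizes = self.layer_sizes
        k = len(self.columns)                   # index of the new column
        # the new column's own weights (earlier columns stay frozen)
        W = [self.rng.standard_normal((m, n)) * 0.1
             for n, m in zip(sizes[:-1], sizes[1:])]
        # one adapter per earlier column and per hidden-layer boundary
        U = [[self.rng.standard_normal((m, n)) * 0.1
              for n, m in zip(sizes[1:-1], sizes[2:])]
             for _ in range(k)]
        self.columns.append(W)
        self.laterals.append(U)

    def forward(self, x):
        acts = []                               # acts[k][i]: layer i of column k
        for k, W in enumerate(self.columns):
            h, col_acts = x, []
            for i, Wi in enumerate(W):
                z = Wi @ h
                if i > 0:                       # lateral input from old columns
                    for j in range(k):
                        z = z + self.laterals[k][j][i - 1] @ acts[j][i - 1]
                h = relu(z)
                col_acts.append(h)
            acts.append(col_acts)
        return acts[-1][-1]                     # newest column's output
```

Training would update only the newest column's weights and its incoming adapters, leaving all earlier columns frozen; that freezing is what rules out catastrophic forgetting by design.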

SLIDE 16

Progressive Neural Networks

Disadvantages

1. Requires knowledge of task boundaries.
2. Quadratic parameter growth! However, sensitivity analysis shows that successive columns use much less capacity.
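The quadratic growth in point 2 follows from simple parameter counting: each new column adds its own weights plus one set of lateral adapters per earlier column. A rough sketch, assuming identical fully connected columns and ignoring biases (`progressive_params` is an illustrative helper, not from the paper):

```python
# Count weights in a K-column progressive net with identical dense columns.
# Column k brings its own weights plus (k - 1) sets of lateral adapters,
# so the total grows quadratically in the number of columns.
def progressive_params(n_columns, layer_sizes):
    column = sum(n * m for n, m in zip(layer_sizes[:-1], layer_sizes[1:]))
    adapter = sum(n * m for n, m in zip(layer_sizes[1:-1], layer_sizes[2:]))
    return sum(column + (k - 1) * adapter for k in range(1, n_columns + 1))
```

With four equal layers of width 10, successive columns cost 300, 500, 700, 900 parameters: linear per-column growth, quadratic total. This is why later work on compression and shared lateral connections (see the final slide) matters.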

SLIDE 17

Experimental setup

Baseline 1: column trained on task B
Baseline 2: column trained on A, top layer fine-tuned on B
Baseline 3: column trained on A, all layers fine-tuned on B
Baseline 4: column 1 random, column 2 trained on task B
Progressive Net: column 1 trained on A, column 2 trained on task B

All training is with Asynchronous Advantage Actor-Critic (A3C) [Mnih et al., 2016]

[Diagram: two-column schematics illustrating each baseline and the progressive net.]

SLIDE 18

Pong Soup

Pong → white Pong
Pong → horiz-flip Pong

SLIDE 19

Analysis: two methods

1. Average Perturbation Sensitivity

Inject Gaussian noise and measure the drop in performance.

Pong to Noisy Pong

[Chart: performance with noise injected at column 1 (blue) or column 2 (green).]
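The perturbation analysis can be sketched as follows. This is a toy version: `evaluate` stands in for running the agent with the given perturbation applied to one column's activations and returning a mean score, and the function name and defaults are assumptions, not the paper's code.

```python
import numpy as np

def perturbation_sensitivity(evaluate, sigma=1.0, n_trials=20, seed=0):
    """Compare clean vs. noise-injected performance for one column/layer.

    `evaluate(perturb)` runs the agent with `perturb` applied to the
    chosen activations and returns a mean score.
    """
    rng = np.random.default_rng(seed)
    clean = evaluate(lambda h: h)               # identity: no noise injected
    noisy = np.mean([
        evaluate(lambda h: h + rng.normal(0.0, sigma, size=h.shape))
        for _ in range(n_trials)
    ])
    return clean - noisy    # a large drop means that column/layer is used
```

Run once per column and layer: if perturbing column 1 barely hurts performance while perturbing column 2 is catastrophic, the new task is relying mostly on the new column (and vice versa).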

SLIDE 20

Analysis

2. Average Fisher Sensitivity

  • Compute a modified diagonal Fisher matrix: the sensitivity of the network policy with respect to the normalized activations of each layer
  • AFS is computed per layer i, column k, and feature m.
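Under simplifying assumptions (per-step gradients of the log-policy with respect to the normalised activations, collected offline), the AFS computation reduces to a mean squared gradient, normalised across columns so the sensitivities at a layer sum to one. The function below is an illustrative sketch, not the paper's code:

```python
import numpy as np

def average_fisher_sensitivity(grads_per_column):
    """AFS sketch for one layer i.

    grads_per_column[k]: array (steps, features) of gradients of the
    log-policy w.r.t. the normalised activations of column k.
    """
    # diagonal Fisher: expectation of the squared gradient, per feature m
    fisher = [np.mean(g ** 2, axis=0) for g in grads_per_column]
    # average over features, then normalise across columns k
    per_column = np.array([f.mean() for f in fisher])
    return per_column / per_column.sum()
```

A column whose share is near zero at a layer contributes almost nothing to the current policy there, which is how the capacity-usage claims on the earlier slides can be quantified.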
SLIDE 21

Pong Soup - Analysis

[Charts: AFS per layer (conv1, conv2, fc) for the Pong h-flip and Pong zoom variants.]

SLIDE 22

Pong Soup - Analysis

[Charts: AFS per layer (conv1, conv2, fc) for the noisy Pong variant.]

SLIDE 23

Progressive nets from simulation to robot

[Diagram: column 1, hidden width 128.]

Column 1: Reacher task with random start, fixed target, trained with a MuJoCo model of the Jaco arm.
Input: RGB only
Output: joint velocities (6 DOF)
Network: ConvNet + LSTM + softmax output
Learning: Asynchronous advantage actor-critic (A3C); 16 threads
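The "softmax output" over 6 joint velocities suggests a discretised per-joint action head. A hypothetical numpy sketch of such a head (the bin count, shapes, and function names are assumptions, not from the talk; the feature vector would come from the ConvNet + LSTM trunk described above):

```python
import numpy as np

def softmax(z):
    # numerically stable softmax along the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def action_head(features, W):
    """Per-joint softmax over discretised velocity commands.

    W: (n_joints, n_bins, feature_dim) weight tensor; returns one
    probability distribution over velocity bins for each joint.
    """
    logits = np.einsum('jbf,f->jb', W, features)
    return softmax(logits)    # shape (n_joints, n_bins), rows sum to 1
```

Sampling one bin per joint then yields the 6-DOF velocity command sent to the (simulated or real) arm.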

SLIDE 24

Progressive nets from simulation to robot

[Diagram: column 1, hidden width 128.]

SLIDE 25

Progressive nets from simulation to robot

Reacher task: random start, fixed target
Input: RGB images
Output: joint velocities (6 DOF)

[Diagram: column 1 (width 128) and column 2 (width 16).]

SLIDE 26

Progressive nets from simulation to robot

[Diagram: column 1 (width 128) and column 2 (width 16).]

Column 2: Reacher task with random start, random target, trained with the real Jaco arm.
Input: proprioception + target XYZ
Output: joint velocities (6 DOF)
Network: MLP + LSTM + softmax output
Learning: Asynchronous advantage actor-critic (A3C); 1 thread

SLIDE 27

Progressive nets from simulation to robot

[Diagram: column 1, hidden width 128.]

https://www.youtube.com/watch?v=tXISbTOesMY

SLIDE 28

Progressive nets from simulation to robot

[Diagram: column 1 (width 128) and column 2 (width 16).]

SLIDE 29

Progressive nets from simulation to robot

[Diagram: column 1 (width 128) and column 2 (width 16).]

https://www.youtube.com/watch?v=YZz5Io_ipi8

SLIDE 30

Progressive nets from simulation to robot

Column 3: ‘Catch’, trained with the real Jaco arm.

[Diagram: columns 1 (width 128), 2 (width 16), and 3 (width 16).]

https://www.youtube.com/watch?v=qzMTPzbPV0c

SLIDE 31

Progressive nets from simulation to robot

[Diagram: columns 1 (width 128), 2 (width 16), 3 (width 16), and 4 (width 16).]

Column 4: ‘Catch the bee’, trained with the real Jaco arm.

https://www.youtube.com/watch?v=JkXhlIWsUA0

SLIDE 32

Thank you

What’s next?

  • Scaling up Progressive Networks

  ○ Compression / Brain Damage / Complementary Learning
  ○ Limiting Model Growth with Sharing of Lateral Connections

  • Automating the progression

○ Eliminating the need for manual switch points while keeping model growth in check

  • Meta-controller making use of old policies in new situations

○ Fast adaptation to new tasks using the fact that old policies are NOT forgotten.