Pieter Abbeel, Berkeley Artificial Intelligence Research Laboratory - PowerPoint PPT Presentation



SLIDE 1

Pieter Abbeel, Berkeley Artificial Intelligence Research Laboratory (BAIR.berkeley.edu)

SLIDE 2

PR1

[Wyrobek, Berger, Van der Loos, Salisbury, ICRA 2008]

SLIDE 3

Personal Robotics Hardware

PR2 (Willow Garage), $400,000, 2009
Baxter (Rethink Robotics), $30,000, 2013
Fetch (Fetch Robotics), ~$80,000, 2015
?

SLIDE 4

More Generally

• Tele-op robotic surgery
• Driving
• Flight

SLIDE 5

Challenge Task: Robotic Laundry

SLIDE 6

Challenges and Current Directions

• Variability
  • Apprenticeship learning
  • Reinforcement learning
• Uncertainty
  • Belief space planning
• Long-term reasoning
  • Hierarchical planning

SLIDE 7

Challenges and Current Directions

• Variability
  • Apprenticeship learning [IJRR 2010, ICRA 2010 (2x), ICRA 2012, ISRR 2013, IROS 2014, ICRA 2015 (4x), IROS 2015 (2x)]
  • Reinforcement learning
• Uncertainty
  • Belief space planning [RSS 2010, WAFR 2010, IJRR 2011, ICRA 2014, WAFR 2014, ICRA 2015]
• Long-term reasoning
  • Hierarchical planning [PlanRob 2013, ICRA 2014, AAAI 2015, IROS 2015]

SLIDE 8

Object Detection in Computer Vision

• State-of-the-art object detection until 2012:
  Input image -> hand-engineered features (SIFT, HOG, DAISY, …) -> Support Vector Machine (SVM) -> "cat", "dog", "car", …
• Deep supervised learning (Krizhevsky, Sutskever, Hinton 2012; also LeCun, Bengio, Ng, Darrell, …):
  Input image -> 8-layer neural network with 60 million parameters to learn -> "cat", "dog", "car", …
  • ~1.2 million training images from ImageNet [Deng, Dong, Socher, Li, Li, Fei-Fei, 2009]
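The contrast above is between hand-engineered features followed by an SVM and an end-to-end learned network. As a rough illustration only (not the actual AlexNet definition from the slide), a minimal convolutional classifier in PyTorch could look like the sketch below; the layer sizes and the 1000-class output are placeholder assumptions chosen to mirror the ImageNet setting.

```python
# Minimal CNN image classifier sketch (illustrative only; not AlexNet).
# Assumes 3-channel 224x224 inputs and 1000 output classes, as in ImageNet.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2),  # learned filters instead of SIFT/HOG
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 28 * 28, 1000),  # class scores for "cat", "dog", "car", ...
)

scores = model(torch.randn(1, 3, 224, 224))  # logits over the 1000 classes
print(scores.shape)  # torch.Size([1, 1000])
```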

SLIDE 9

Performance

graph credit Matt Zeiler, Clarifai

SLIDE 10

Performance

graph credit Matt Zeiler, Clarifai

SLIDE 11

Performance

graph credit Matt Zeiler, Clarifai

AlexNet

SLIDE 12

Performance

graph credit Matt Zeiler, Clarifai

AlexNet

SLIDE 13

Performance

graph credit Matt Zeiler, Clarifai

AlexNet

SLIDE 14

Speech Recognition

graph credit Matt Zeiler, Clarifai

SLIDE 15

Object Detection in Computer Vision

• State-of-the-art object detection until 2012:
  Input image -> hand-engineered features (SIFT, HOG, DAISY, …) -> Support Vector Machine (SVM) -> "cat", "dog", "car", …
• Deep supervised learning (Krizhevsky, Sutskever, Hinton 2012; also LeCun, Bengio, Ng, Darrell, …):
  Input image -> 8-layer neural network with 60 million parameters to learn -> "cat", "dog", "car", …
  • ~1.2 million training images from ImageNet [Deng, Dong, Socher, Li, Li, Fei-Fei, 2009]

SLIDE 16

Robotics

• Current state-of-the-art robotics:
  Percepts -> hand-engineered state estimation -> hand-engineered control policy class, with ~10 hand-tuned (or learned) free parameters -> motor commands
• Deep reinforcement learning:
  Percepts -> many-layer neural network with many parameters to learn -> motor commands

SLIDE 17

Reinforcement Learning (RL)

• Robotics
• Marketing / Advertising
• Dialogue
• Optimizing operations / logistics
• Queue management

Robot + Environment

$\pi_\theta(a|s)$: probability of taking action a in state s

$\max_\theta \, \mathbb{E}\big[\sum_{t=0}^{H} R(s_t) \,\big|\, \pi_\theta\big]$
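The deck does not show code for this objective; as a hedged illustration of how $\max_\theta \, \mathbb{E}[\sum_t R(s_t) \mid \pi_\theta]$ can be optimized from samples, here is a minimal score-function (REINFORCE-style) gradient estimate for a tabular softmax policy. The toy reward table, one-step episodes, and step size are assumptions made only for this example, not part of the talk.

```python
# Minimal REINFORCE-style estimate of grad_theta E[ sum_t R(s_t) | pi_theta ].
# Toy setup (assumed for illustration): one-step episodes, discrete states/actions,
# softmax policy pi_theta(a|s) parameterized by a table of logits theta[s, a].
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3
theta = np.zeros((n_states, n_actions))          # policy logits
reward = rng.normal(size=(n_states, n_actions))  # hypothetical reward table R(s, a)

def policy(s):
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

learning_rate, n_episodes = 0.1, 64
for iteration in range(200):
    grad = np.zeros_like(theta)
    for _ in range(n_episodes):
        s = rng.integers(n_states)               # length-1 "episode"; longer horizons sum R(s_t)
        p = policy(s)
        a = rng.choice(n_actions, p=p)
        ret = reward[s, a]                       # episode return
        grad_logp = -p                           # grad of log softmax: e_a - p
        grad_logp[a] += 1.0
        grad[s] += grad_logp * ret               # score-function (REINFORCE) estimator
    theta += learning_rate * grad / n_episodes   # gradient ascent on E[return | pi_theta]
```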

SLIDE 18

Reinforcement Learning (RL)

• Goal: $\max_\theta \, \mathbb{E}\big[\sum_{t=0}^{H} R(s_t) \,\big|\, \pi_\theta\big]$

Robot + Environment

$\pi_\theta(a|s)$: probability of taking action a in state s

• Additional challenges:
  • Stability
  • Credit assignment
  • Exploration

SLIDE 19

How About Continuous Control, e.g., Locomotion?

[Policy network diagram: input layer (joint angles and kinematics) -> fully connected layer, 30 units -> mean parameters and standard deviations -> sampling -> control]

Neural network architecture:
  Input: joint angles and velocities
  Output: joint torques
Robot models in physics simulator (MuJoCo, from Emo Todorov)
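As a sketch of the kind of policy network described above (inputs are joint angles and velocities; outputs are a distribution over joint torques with a learned mean and learned standard deviations), here is a minimal Gaussian policy in PyTorch. The 30-unit hidden layer follows the diagram; the observation and action dimensions are placeholder assumptions.

```python
# Minimal Gaussian policy sketch: observation -> mean torques, plus state-independent std devs.
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=30):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(obs_dim, hidden),   # fully connected layer, 30 units (as in the diagram)
            nn.Tanh(),
            nn.Linear(hidden, act_dim),   # mean joint torques
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # learned standard deviations

    def forward(self, obs):
        mean = self.mean_net(obs)
        return torch.distributions.Normal(mean, self.log_std.exp())

# Example: 20-dim observation (joint angles + velocities, an assumed size), 7 torque outputs.
policy = GaussianPolicy(obs_dim=20, act_dim=7)
dist = policy(torch.randn(1, 20))
torques = dist.sample()                      # sampled motor command
log_prob = dist.log_prob(torques).sum(-1)    # log pi_theta(a|s), used by policy-gradient methods
```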

SLIDE 20

Learning Locomotion

[Schulman, Moritz, Levine, Jordan, Abbeel, 2015]

SLIDE 21

Technical Ideas

• Trust Region Policy Optimization [Schulman, Levine, Moritz, Jordan, Abbeel, 2015] (surrogate objective sketched below)
  • Policy optimization:
    • Often simpler to represent good policies than good value functions
    • True objective of expected cost is optimized (vs. a surrogate like Bellman error)
  • Trust region:
    • Sampled evaluation of objective and gradient
    • Gradient only locally a good approximation
    • Change in policy changes state-action visitation frequencies

$\max_\theta \, \mathbb{E}\big[\sum_{t=0}^{H} R(s_t) \,\big|\, \pi_\theta\big]$
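To make the trust-region idea concrete, here is a hedged sketch of the sampled surrogate objective together with a sample-based KL estimate used as the trust-region constraint. This is a schematic of the objective only; the actual algorithm's conjugate-gradient step and line search are omitted, and all tensors below are placeholder data.

```python
# Sketch of a TRPO-style surrogate objective and KL constraint (schematic, not the full algorithm).
import torch

def surrogate_and_kl(logp_new, logp_old, advantages):
    """logp_*: log pi(a_t|s_t) under new/old policies for sampled (s_t, a_t); advantages: estimates of A_t."""
    ratio = (logp_new - logp_old).exp()          # importance weight pi_new / pi_old
    surrogate = (ratio * advantages).mean()      # sampled evaluation of the objective
    approx_kl = (logp_old - logp_new).mean()     # sample-based estimate of KL(pi_old || pi_new)
    return surrogate, approx_kl

# Placeholder roll-out data for illustration.
logp_old = torch.randn(128)
logp_new = logp_old + 0.01 * torch.randn(128)
advantages = torch.randn(128)

surr, kl = surrogate_and_kl(logp_new, logp_old, advantages)
max_kl = 0.01                                    # trust-region size (assumed value)
step_ok = kl <= max_kl                           # accept only updates that stay inside the trust region
```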

SLIDE 22

Technical Ideas

• Generalized Advantage Estimation [Schulman, Moritz, Levine, Jordan, Abbeel, 2015] (computation sketched below)
  • Fuse value function estimates with policy evaluations from roll-outs
  • Trust region approach to (high-dimensional) value function estimation

$\max_\theta \, \mathbb{E}\big[\sum_{t=0}^{H} R(s_t) \,\big|\, \pi_\theta\big]$
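Generalized advantage estimation fuses value-function estimates with roll-out returns via an exponentially weighted sum of TD residuals. A minimal NumPy sketch of that computation follows; the discount gamma, the lambda weight, and the toy roll-out are assumed example values.

```python
# Generalized Advantage Estimation (GAE): A_t = sum_l (gamma*lam)^l * delta_{t+l},
# with TD residual delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """rewards: length-T array; values: length-(T+1) array of value estimates (last entry bootstraps)."""
    T = len(rewards)
    advantages = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

# Toy roll-out for illustration.
rewards = np.array([0.0, 0.0, 1.0])
values = np.array([0.1, 0.2, 0.5, 0.0])   # V(s_0..s_3); final value 0 for a terminal state
print(gae(rewards, values))
```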

SLIDE 23

In Contrast: DARPA Robotics Challenge

SLIDE 24

Atari Games

• Deep Q-Network (DQN) [Mnih et al., 2013/2015] (update target sketched below)
• DAgger with Monte Carlo Tree Search [Xiao-Xiao et al., 2014]
• Trust Region Policy Optimization [Schulman, Levine, Moritz, Jordan, Abbeel, 2015]

Pong | Enduro | Beamrider | Q*bert
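The DQN entry above refers to Q-learning with a deep network; purely as a hedged illustration of its core update, the sketch below computes the one-step bootstrapped regression target y = r + gamma * max_a' Q_target(s', a'). The tiny linear networks, batch contents, and sizes are placeholders, not the networks from the cited work.

```python
# Sketch of the DQN regression target: y = r + gamma * max_a' Q_target(s', a') (zero at terminal states).
import torch
import torch.nn as nn

n_actions, obs_dim, gamma = 4, 8, 0.99          # assumed sizes for illustration
q_net = nn.Linear(obs_dim, n_actions)           # stand-in for the deep Q-network
target_net = nn.Linear(obs_dim, n_actions)      # periodically copied from q_net in the real algorithm

obs = torch.randn(32, obs_dim)
next_obs = torch.randn(32, obs_dim)
actions = torch.randint(n_actions, (32,))
rewards = torch.randn(32)
done = torch.zeros(32)                          # 1.0 where the episode ended

with torch.no_grad():
    target = rewards + gamma * (1 - done) * target_net(next_obs).max(dim=1).values
q_pred = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
loss = nn.functional.mse_loss(q_pred, target)   # squared Bellman error minimized by the sketch
```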

SLIDE 25

How About Real Robotic Visuo-Motor Skills?

SLIDE 26

Guided Policy Search

[Diagram: policy search (RL) must handle complex dynamics and a complex policy at once: HARD. Trajectory optimization handles the complex dynamics: EASY. Supervised learning handles the complex policy: EASY. Combined, they train a general-purpose neural network controller.]
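The "EASY" supervised-learning leg of the diagram trains the general-purpose neural network controller to reproduce actions produced by trajectory optimization. The sketch below shows only that regression step; the full guided policy search method alternates it with re-optimizing the trajectories and constraining them to agree with the policy, which is omitted here, and the data arrays are placeholders.

```python
# Sketch of the supervised step only: fit a neural-net controller to (state, action) pairs
# produced by a trajectory optimizer. Placeholder data; not the full guided policy search loop.
import torch
import torch.nn as nn

obs_dim, act_dim = 32, 7                         # assumed sizes
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# states/actions would come from trajectory-optimization roll-outs; random here for illustration.
states = torch.randn(1024, obs_dim)
expert_actions = torch.randn(1024, act_dim)

for epoch in range(100):
    pred = policy(states)
    loss = nn.functional.mse_loss(pred, expert_actions)   # regression onto the optimizer's actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```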

SLIDE 27

Instrumented Training

training time | test time

SLIDE 28

Deep Spatial Neural Net Architecture

[Levine*, Finn*, Darrell, Abbeel, 2015, TR at: rll.berkeley.edu/deeplearningrobotics]

π_θ (92,000 parameters)
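The architecture named above is built around a spatial softmax: for each convolutional feature channel, a softmax over pixel locations is taken, and the expected (x, y) coordinate is used as a low-dimensional feature point feeding the motor layers. The sketch below implements just that operation; the channel count and feature-map size are assumed example values, not the paper's exact configuration.

```python
# Spatial softmax: per feature channel, softmax over pixel locations, then expected (x, y) coordinates.
import torch

def spatial_softmax(features):
    """features: (batch, channels, H, W) conv activations -> (batch, channels*2) feature points."""
    b, c, h, w = features.shape
    probs = torch.softmax(features.reshape(b, c, h * w), dim=-1).reshape(b, c, h, w)
    ys = torch.linspace(-1.0, 1.0, h).view(1, 1, h, 1)    # normalized pixel coordinates
    xs = torch.linspace(-1.0, 1.0, w).view(1, 1, 1, w)
    expected_y = (probs * ys).sum(dim=(2, 3))              # expected row coordinate per channel
    expected_x = (probs * xs).sum(dim=(2, 3))              # expected column coordinate per channel
    return torch.cat([expected_x, expected_y], dim=1)

points = spatial_softmax(torch.randn(1, 16, 109, 109))     # feature-map size is an assumed example
print(points.shape)                                        # torch.Size([1, 32])
```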

SLIDE 29

Experimental Tasks

[Levine*, Finn*, Darrell, Abbeel, JMLR 2016]

SLIDE 30

Learning

[Levine*, Finn*, Darrell, Abbeel, JMLR 2016]

SLIDE 31

Learned Skills

[Levine*, Finn*, Darrell, Abbeel, JMLR 2016]

SLIDE 32

Visuomotor Learning Directly in Visual Space

  • 1. Set target end-effector pose
  • 2. Train exploratory non-vision controller
  • 3. Learn visual features from collected images
  • 4. Provide image that defines goal features
  • 5. Train final controller in visual feature space (see the schematic outline below)

[Finn, Tan, Duan, Darrell, Levine, Abbeel, 2015]
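The five steps above form a pipeline. Purely as a schematic outline of its control flow, here is a runnable skeleton in which every function is a trivial placeholder of my own naming, not an API or component from the paper.

```python
# Schematic outline of the five steps above; every function is a trivial placeholder
# (not from the paper) so the end-to-end flow of the pipeline is visible.
import numpy as np

def set_target_end_effector_pose():                      # step 1
    return np.zeros(6)                                   # placeholder pose

def train_exploratory_non_vision_controller(pose):       # step 2
    return lambda state: np.zeros(7)                     # placeholder torque controller

def learn_visual_features(images):                       # step 3
    return lambda image: image.mean(axis=(0, 1))         # placeholder feature extractor

def train_final_controller(goal_features):               # step 5
    return lambda features: goal_features - features     # placeholder feedback in feature space

pose = set_target_end_effector_pose()
explore = train_exploratory_non_vision_controller(pose)
images = [np.random.rand(64, 64, 3) for _ in range(10)]  # images collected while exploring
encode = learn_visual_features(images)
goal_features = encode(np.random.rand(64, 64, 3))        # step 4: image that defines goal features
policy = train_final_controller(goal_features)
correction = policy(encode(images[0]))
```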

SLIDE 33

Visuomotor Learning Directly in Visual Space

[Finn, Tan, Duan, Darrell, Levine, Abbeel, 2015]

SLIDE 34

Autonomous Flight

Agriculture | Urban Delivery | Law Enforcement
Key challenge: Enable autonomous aerial vehicles (AAVs) to navigate complex, dynamic environments

[Khan, Zhang, Levine, Abbeel 2016]

SLIDE 35

ZED Stereo Depth Camera | NVIDIA Jetson TX1 | 3DR Solo

SLIDE 36

Experiments: Learned Neural Network Policy

[Khan, Zhang, Levine, Abbeel 2016]

SLIDE 37

Experiments: Learned Neural Network Policy

[Khan, Zhang, Levine, Abbeel 2016]

SLIDE 38

Experiments: Comparisons

Canyon | Forest

[Khan, Zhang, Levine, Abbeel, 2016]

SLIDE 39

Frontiers

• Shared and transfer learning
• Hierarchical reasoning
  • Multi-time-scale learning
• Memory
  • Estimation
• Transfer simulation -> real world [Mordatch, Mishra, Eppner, Abbeel, ICRA 2016]

SLIDE 40

Acknowledgements

• Colleagues: Trevor Darrell, Ken Goldberg, Michael Jordan, Stuart Russell
• Post-docs: Sergey Levine, Igor Mordatch, Sachin Patil, Jia Pan, Aviv Tamar, Dave Held
• Students: John Schulman, Chelsea Finn, Sandy Huang, Bradly Stadie, Alex Lee, Dylan Hadfield-Menell, Jonathan Ho, Ziang Xie, Rocky Duan, Justin Fu, Abhishek Gupta, Gregory Kahn, Nikita Kitaev, Henry Lu, George Mulcaire, Nolan Wagener, Ankush Gupta, Sibi Venkatesan, Cameron Lee

SLIDE 41

Thank you