Making Robots Learn
Pieter Abbeel -- UC Berkeley EECS
Object Detection in Computer Vision

State-of-the-art object detection until 2012:
- Input image → hand-engineered features (SIFT, HOG, DAISY, …) → Support Vector Machine (SVM) → "cat", "dog", "car", …

Deep supervised learning (Krizhevsky, Sutskever, Hinton 2012; also LeCun, Bengio, Ng, Darrell, …):
- ~1.2 million training images from ImageNet [Deng, Dong, Socher, Li, Li, Fei-Fei, 2009]
- Input image → 8-layer neural network with 60 million parameters to learn → "cat", "dog", "car", …
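To make the contrast concrete, here is a minimal sketch of the pre-2012 pipeline: hand-engineered HOG features feeding a linear SVM. The arrays are hypothetical placeholders for a labeled image dataset; scikit-image and scikit-learn are assumed available.

```python
# Pre-2012 pipeline sketch: hand-engineered features + SVM classifier.
# `train_images`, `train_labels` are hypothetical placeholder data.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def extract_features(images):
    # HOG: histograms of oriented gradients, computed per grayscale image.
    return np.array([
        hog(img, orientations=9, pixels_per_cell=(8, 8),
            cells_per_block=(2, 2))
        for img in images
    ])

train_images = np.random.rand(100, 64, 64)   # placeholder images
train_labels = np.random.randint(0, 3, 100)  # e.g. cat / dog / car

clf = LinearSVC()  # the only learned component: a linear classifier
clf.fit(extract_features(train_images), train_labels)
pred = clf.predict(extract_features(np.random.rand(5, 64, 64)))
```

In the post-2012 pipeline, the feature extractor itself is replaced by learned convolutional layers, trained end-to-end with the classifier.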
[Figure: ImageNet classification error by year, with AlexNet marking the 2012 drop. Graph credit: Matt Zeiler, Clarifai]
Is deep learning 3, 30, or 60 years old?

[Timeline: Rosenblatt's Perceptron; (Olshausen, 1996); 2000s sparse, probabilistic, and energy models (Hinton, Bengio, LeCun, Ng). Based on a history by K. Cho]
What made it work?
- Data (see the augmentation sketch after this list)
  - 1.2M training examples
  - ×2048 (different crops)
  - ×90 (PCA re-colorings)
- Compute power
  - Two NVIDIA GTX 580 GPUs
  - 5-6 days of training time
- Nonlinearity
  - Sigmoid → ReLU
- Regularization
  - Drop-out
- Exploration of model structure
- Optimization know-how
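A minimal NumPy sketch of the two data augmentations named above, random crops/reflections and PCA-based re-coloring, in the spirit of the AlexNet recipe; all array names and sizes here are hypothetical.

```python
# Sketch of AlexNet-style data augmentation (crops + PCA re-coloring).
import numpy as np

def random_crop_flip(img, crop=224):
    # Random crop plus optional horizontal reflection; with 256x256
    # inputs this yields on the order of 2048 variants per image.
    h, w, _ = img.shape
    y = np.random.randint(0, h - crop + 1)
    x = np.random.randint(0, w - crop + 1)
    out = img[y:y + crop, x:x + crop]
    return out[:, ::-1] if np.random.rand() < 0.5 else out

def pca_recolor(img, eigvals, eigvecs, sigma=0.1):
    # "Fancy PCA": shift every pixel along the principal components
    # of the dataset's RGB covariance, scaled by random draws.
    alphas = np.random.normal(0.0, sigma, 3)
    shift = eigvecs @ (alphas * eigvals)   # 3-vector RGB offset
    return np.clip(img + shift, 0.0, 1.0)

# eigvals/eigvecs come from the RGB covariance of the training pixels:
pixels = np.random.rand(10000, 3)          # placeholder pixel sample
eigvals, eigvecs = np.linalg.eigh(np.cov(pixels.T))
augmented = pca_recolor(random_crop_flip(np.random.rand(256, 256, 3)),
                        eigvals, eigvecs)
```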
Current state-of-the-art robotics:
- Percepts → hand-engineered state estimation → hand-engineered control policy class with ~10 hand-tuned (or learned) free parameters → motor commands

Deep reinforcement learning:
- Percepts → many-layer neural network with many parameters to learn → motor commands
Goal: find policy parameters θ that maximize expected total reward over horizon H:

\max_\theta \; \mathbb{E}\Big[ \textstyle\sum_{t=0}^{H} R(s_t, a_t) \;\Big|\; \pi_\theta \Big]

where \pi_\theta(a \mid s) is the probability of taking action a in state s, and the robot + environment generate the state transitions. (A rollout sketch estimating this objective follows below.)
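As a concrete reading of the objective, a minimal Monte Carlo rollout estimator; `env` (a gym-style interface, classic API) and `policy` are hypothetical stand-ins.

```python
# Minimal sketch: estimate E[sum of rewards | pi_theta] by rollouts.
import numpy as np

def estimate_return(env, policy, horizon, n_rollouts=10):
    returns = []
    for _ in range(n_rollouts):
        s = env.reset()
        total = 0.0
        for t in range(horizon):
            a = policy(s)                  # a ~ pi_theta(a|s)
            s, r, done, _ = env.step(a)    # robot + environment dynamics
            total += r
            if done:
                break
        returns.append(total)
    return np.mean(returns)  # Monte Carlo estimate of the objective
```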
[Atari 2600 games: Pong, Enduro, Beamrider, Q*bert]

DQN network architecture (see the sketch below):
- 32 8×8 filters with stride 4 + ReLU
- 64 4×4 filters with stride 2 + ReLU
- 64 3×3 filters with stride 1 + ReLU
- fully connected layer of 512 units + ReLU
- fully connected output layer, one unit per action
[Source: Mnih et al., Nature 2015 (DeepMind)]
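A minimal PyTorch sketch of this architecture, assuming the standard 84×84, 4-frame stacked input used in the paper:

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Conv net mapping a stack of 4 84x84 frames to one Q-value per action."""
    def __init__(self, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),  # 7x7 spatial map after convs
            nn.Linear(512, n_actions),              # one output per action
        )

    def forward(self, x):
        return self.net(x)

q = DQN(n_actions=6)                     # e.g. 6 actions for Pong
q_values = q(torch.zeros(1, 4, 84, 84))  # shape: (1, 6)
```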
Approach:
- Q-learning with ε-greedy exploration and a deep network as function approximator

Key idea 1: stabilizing Q-learning
- Mini-batches of size 32 (vs. single-sample updates)
- The Q-values used to compute the temporal-difference target are only updated every 10,000 updates

Key idea 2: lots of data / compute
- Trained for a total of 50 million frames (= 38 days of game experience), with a replay memory of the one million most recent frames (see the replay sketch below)
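A sketch of the two stabilizers, reusing the `DQN` module above; sizes are the ones on the slide, while the Huber loss and optimizer choice are assumptions.

```python
# Sketch of DQN's stabilizers: experience replay + frozen target network.
# Transitions (s, a, r, s2, done) are assumed stored as tensors.
import random
from collections import deque
import torch
import torch.nn.functional as F

buffer = deque(maxlen=1_000_000)  # replay memory: 1M most recent transitions
online, target = DQN(6), DQN(6)
target.load_state_dict(online.state_dict())
opt = torch.optim.RMSprop(online.parameters())
gamma, step = 0.99, 0

def td_update(batch_size=32):     # mini-batches of 32, not single samples
    global step
    idx = random.sample(range(len(buffer)), batch_size)
    s, a, r, s2, done = map(torch.stack, zip(*(buffer[i] for i in idx)))
    with torch.no_grad():
        # TD target uses the frozen target network, not the online one.
        y = r + gamma * (1 - done) * target(s2).max(dim=1).values
    q = online(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    loss = F.smooth_l1_loss(q, y)
    opt.zero_grad(); loss.backward(); opt.step()
    step += 1
    if step % 10_000 == 0:        # refresh target Q-values every 10,000 updates
        target.load_state_dict(online.state_dict())
```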
[Policy network diagram: input layer (joint angles and kinematics) → fully connected layer, 30 units → mean parameters; together with standard deviations, these define the distribution from which controls are sampled]

Neural network architecture (a sketch follows below):
- Input: joint angles and velocities
- Output: joint torques
- Robot models in physics simulator (MuJoCo, from Emo Todorov)
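A minimal PyTorch sketch of such a Gaussian policy, with a 30-unit hidden layer and state-independent standard deviations; the input/output dimensions and the tanh nonlinearity are assumptions.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Maps joint angles/velocities to a distribution over joint torques."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(obs_dim, 30), nn.Tanh())
        self.mean = nn.Linear(30, act_dim)                 # mean parameters
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # learned std devs

    def forward(self, obs):
        mu = self.mean(self.hidden(obs))
        return torch.distributions.Normal(mu, self.log_std.exp())

pi = GaussianPolicy(obs_dim=10, act_dim=7)  # hypothetical sizes
dist = pi(torch.zeros(10))
torque = dist.sample()                      # control = sampled torques
logp = dist.log_prob(torque).sum()          # needed for policy gradients
```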
- How to score every possible action?
- How to ensure monotonic progress?
- Often simpler to represent good policies than good value functions
- The true objective of expected cost is optimized (vs. a surrogate like Bellman error)
- Existing work: (natural) policy gradients
- Challenge: finding good, large step directions
\max_\theta \; \mathbb{E}\Big[ \textstyle\sum_{t=0}^{H} R(s_t, a_t) \;\Big|\; \pi_\theta \Big]
Trust region:
- Sampled evaluation of the gradient
- The gradient is only locally a good approximation
- A change in policy changes the state-action visitation frequencies

Hence, maximize the objective only within a trusted neighborhood of the current policy; in the cited paper this is a bound on the average KL divergence between old and new policies:

\max_\theta \; \mathbb{E}\Big[ \textstyle\sum_{t=0}^{H} R(s_t, a_t) \;\Big|\; \pi_\theta \Big] \quad \text{s.t.} \quad \overline{\mathrm{KL}}\big(\pi_{\theta_{\text{old}}} \,\|\, \pi_\theta\big) \le \delta

(A simplified, KL-penalized sketch of this update follows below.)
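TRPO itself solves the constrained problem with a conjugate-gradient step plus a line search; as a simplified illustration of the same idea, here is a KL-penalized surrogate step. Function and argument names, and the fixed penalty coefficient `beta`, are assumptions, not the paper's exact procedure.

```python
import torch

def penalized_surrogate_step(policy, opt, obs, act, adv, logp_old, beta=1.0):
    # Surrogate: importance-weighted advantage under the new policy.
    dist = policy(obs)
    logp = dist.log_prob(act).sum(-1)
    ratio = (logp - logp_old).exp()
    surrogate = (ratio * adv).mean()
    # Penalize divergence from the old policy instead of hard-constraining it.
    kl = (logp_old - logp).mean()        # sample-based KL estimate
    loss = -(surrogate - beta * kl)
    opt.zero_grad(); loss.backward(); opt.step()
```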
[Schulman, Levine, Moritz, Jordan, Abbeel, 2015]
[Schulman, Levine, Abbeel]
- Deep Q-Network (DQN) [Mnih et al., 2013/2015]
- DAgger with Monte Carlo Tree Search [Xiaoxiao Guo et al., 2014]
- Trust Region Policy Optimization [Schulman, Levine, Moritz, Jordan, Abbeel, 2015]
[Benchmark Atari games: Pong, Enduro, Beamrider, Q*bert]
Generalized Advantage Estimation:
- Exponential interpolation between actor-critic and Monte Carlo estimates (a sketch of the estimator follows after the citation below)
- Trust-region approach to (high-dimensional) value function estimation

Objective, as before:

\max_\theta \; \mathbb{E}\Big[ \textstyle\sum_{t=0}^{H} R(s_t, a_t) \;\Big|\; \pi_\theta \Big]
[Schulman, Moritz, Levine, Jordan, Abbeel, 2015]
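A sketch of the advantage estimator from the cited paper: one-step TD residuals, exponentially discounted by γλ. Setting λ=1 recovers the Monte Carlo estimate, λ=0 the one-step actor-critic estimate; the default hyperparameter values here are assumptions.

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    rewards: r_0..r_{T-1}; values: V(s_0)..V(s_T) (one extra bootstrap value).
    """
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        # One-step TD residual: delta_t = r_t + gamma*V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running              # = sum_l (gamma*lam)^l * delta_{t+l}
    return adv
```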
Objective:

U(\theta) = \mathbb{E}\Big[ \textstyle\sum_{t=0}^{H} r_t \;\Big|\; \pi_\theta \Big]

Gradient, with a single-sample estimate of the advantage (reward-to-go minus a baseline), sketched below:

\nabla_\theta U(\theta) \approx \sum_{t=0}^{H} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \Big( \sum_{k=t}^{H} r_k - b(s_t) \Big)

[Schulman, Moritz, Levine, Jordan, Abbeel, 2015]
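A sketch of this gradient estimate for one sampled trajectory, reusing the Gaussian policy above; the baseline `b` is a hypothetical value-function stand-in.

```python
import torch

def policy_gradient_loss(policy, states, actions, rewards, b):
    # Reward-to-go: sum_{k=t}^{H} r_k for each t, via a reversed cumsum.
    rtg = torch.flip(torch.cumsum(torch.flip(rewards, [0]), 0), [0])
    logp = policy(states).log_prob(actions).sum(-1)
    # Single-sample advantage estimate: reward-to-go minus baseline b(s_t).
    adv = rtg - b(states)
    # Negated so that minimizing this loss ascends the objective U(theta).
    return -(logp * adv.detach()).sum()
```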
Trajectory optimization provides supervision (supervised learning) for training a general-purpose neural network controller.
[Levine & Abbeel, NIPS 2014]
[Levine*, Finn*, Darrell, Abbeel, 2015; TR at rll.berkeley.edu/deeplearningrobotics]
Success rates: pose prediction vs. pose features vs. end-to-end training

Task                 Pose prediction   Pose features   End-to-end training
coat hanger          55.6%             88.9%           100%
shape sorting cube   0%                70.4%           96.3%
toy claw hammer      8.9%              62.2%           91.1%
bottle cap           n/a               55.6%           88.9%

[Meeussen et al. (Willow Garage)]
Provide an image that defines the goal; train the controller in visual feature space (a sketch of the feature-space cost follows below).
[Finn, Tan, Duan, Darrell, Levine, Abbeel, 2015]
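One way to read "train the controller in visual feature space": penalize the distance between the visual features of the current observation and those of the goal-defining image. A minimal sketch; the feature extractor `f` is a hypothetical learned network, not the paper's exact architecture.

```python
import torch

def feature_space_cost(f, image, goal_image):
    # Cost: squared distance between learned visual features of the
    # current observation and of the goal-defining image.
    with torch.no_grad():
        goal_feats = f(goal_image)   # goal features are fixed targets
    return ((f(image) - goal_feats) ** 2).sum()
```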
- Vision-based flight
- Locomotion
- Manipulation
- Natural language interaction
- Dialogue
- Program analysis
- Shared and transfer learning
- Exploration
- Tools / Experimentation
  - Stochastic computation graphs
  - Computation Graph Toolkit (CGT)
- Memory
- Estimation
- Temporal hierarchy / goal setting