Large-Scale Self-Supervised Robotic Learning
Chelsea Finn
In collaboration with Sergey Levine and Ian Goodfellow

Generalization in Reinforcement Learning
to object instances, to tasks and environments
Pinto & Gupta '16, Levine et al. '16, Oh et al. '16, Mnih et al. '15
First lesson: human supervision doesn't scale (providing rewards, resetting the environment, etc.), so data collection must scale up.
Where does the supervision come from? Most deep RL algorithms learn a single-purpose policy; with self-supervision we can instead learn a general-purpose model.
How do we evaluate unsupervised methods? We currently lack task-driven metrics for unsupervised learning.
Data publicly available for download at sites.google.com/site/brainrobotdata, including a test set with novel objects.
Model: convolutional LSTMs, action-conditioned, stochastic flow prediction.
[Model diagram: a stacked ConvLSTM takes the current frame I_t and the action, predicts transformation parameters and compositing masks, applies the transformations to I_t, and composites the transformed images with the masks to produce the predicted next frame Î_{t+1}.]
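The compositing step in the diagram can be sketched as follows. This is a minimal numpy illustration of combining transformed images with predicted masks, not the actual network from the paper; the shapes and the softmax normalization over masks are assumptions.

```python
import numpy as np

def composite_prediction(transformed_images, masks):
    """Composite K transformed candidate frames into one predicted frame.

    transformed_images: (K, H, W, C) -- candidate next frames, each made
        by applying one predicted transformation to the current frame I_t.
    masks: (K, H, W) -- per-pixel compositing weights, assumed to sum to
        1 over K at every pixel (softmax over the mask channels).
    Returns the predicted next frame, shape (H, W, C).
    """
    return np.einsum('khw,khwc->hwc', masks, transformed_images)

# Toy usage: two candidate transformations of a 4x4 RGB frame.
K, H, W, C = 2, 4, 4, 3
rng = np.random.default_rng(0)
candidates = rng.random((K, H, W, C))
logits = rng.random((K, H, W))
masks = np.exp(logits) / np.exp(logits).sum(axis=0)  # softmax over K
pred = composite_prediction(candidates, masks)
```

Because the masks form a convex combination at every pixel, the prediction always lies between the candidate frames pixelwise.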
Evaluate on held-out objects: convolutional LSTMs, action-conditioned, stochastic flow prediction (Finn et al. '16; Kalchbrenner et al. '16).
Are these predictions good? Accurate? Useful?
Action magnitude: 0x, 0.5x, 1x, 1.5x
1. Sample N potential action sequences
2. Predict the future for each action sequence
3. Pick the best future and execute the corresponding action
4. Repeat 1-3 to replan in real time
Specify goal by selecting where pixels should move.
Select future with maximal probability of pixels reaching their respective goals.
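The sample-predict-select loop above can be sketched as a random-shooting planner. This is a hedged skeleton, not the paper's implementation: `predict_fn` and `cost_fn` are hypothetical stand-ins for the learned video-prediction model and the pixel-goal cost, and the uniform action sampling is an assumption.

```python
import numpy as np

def plan_action(predict_fn, cost_fn, action_dim, horizon=5,
                n_samples=100, rng=None):
    """One step of sampling-based planning (random shooting MPC).

    predict_fn(seq) -> predicted future for a (horizon, action_dim)
        action sequence; cost_fn(future) -> scalar, lower is better.
    Returns the first action of the best-scoring sequence.
    """
    rng = rng if rng is not None else np.random.default_rng()
    best_cost, best_seq = np.inf, None
    for _ in range(n_samples):                 # 1. sample action sequences
        seq = rng.uniform(-1.0, 1.0, (horizon, action_dim))
        future = predict_fn(seq)               # 2. predict the future
        cost = cost_fn(future)
        if cost < best_cost:                   # 3. keep the best future
            best_cost, best_seq = cost, seq
    return best_seq[0]  # execute only the first action, then replan (4.)

# Toy stand-ins: the "future" is the summed action; the goal is zero.
a = plan_action(lambda s: s.sum(axis=0), lambda f: np.abs(f).sum(),
                action_dim=2, rng=np.random.default_rng(0))
```

Executing only the first action and replanning at every step is what makes the loop closed-loop and robust to prediction errors.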
Distribution over pixel motion predictions: we can predict how pixels will move based on the robot's actions.
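The planner scores an action sequence by the probability that the designated pixel ends up at its goal under the predicted motion. A much-simplified discrete sketch of propagating that probability one step: the real model predicts dense flow distributions, whereas here motion is restricted to five displacements, a hypothetical simplification for illustration.

```python
import numpy as np

def propagate_pixel(prob, flow_dist):
    """Push a designated pixel's location distribution forward one step.

    prob: (H, W) -- probability that the designated pixel is at each
        location in the current frame.
    flow_dist: (H, W, 5) -- for each source location, a distribution over
        five discrete displacements: stay, up, down, left, right.
    Returns the (H, W) location distribution after one predicted step.
    """
    shifts = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    assert flow_dist.shape[2] == len(shifts)
    nxt = np.zeros_like(prob)
    for k, (dy, dx) in enumerate(shifts):
        mass = prob * flow_dist[:, :, k]           # mass taking move k
        nxt += np.roll(np.roll(mass, dy, axis=0), dx, axis=1)
    return nxt

# Toy usage: a pixel known to be at (2, 2); all motion mass on "up".
H, W = 4, 4
prob = np.zeros((H, W)); prob[2, 2] = 1.0
flow = np.zeros((H, W, 5)); flow[:, :, 1] = 1.0
nxt = propagate_pixel(prob, flow)
```

Chaining this over the prediction horizon and reading off the mass at the goal location gives the success probability the planner maximizes.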
Pushes of novel objects.
The only human involvement during training is programming the initial motions and providing objects to play with.
Outperforms naive baselines
Benefits of this approach
Limitations
adversarial examples
unlabeled video experience → train model → indicated goal → visual foresight
better predictive models
learn visual reward functions
task-driven exploration, attention
long-term planning
Sergey Levine, Ian Goodfellow
Vincent Vanhoucke, Peter Pastor, Ethan Holly, Jon Barron

Finn, C., Goodfellow, I., & Levine, S. Unsupervised Learning for Physical Interaction through Video Prediction. NIPS 2016.
Finn, C., & Levine, S. Deep Visual Foresight for Planning Robot Motion. Under review, arXiv 2016.

All data and code linked at: people.eecs.berkeley.edu/~cbfinn
cbfinn@eecs.berkeley.edu
Takeaway: Acquiring a cost function is important! (and challenging)
Sources of failure:
This is just the beginning…
Can we design the right model?
Can we handle long-term planning?
Collecting data with a purpose.