What’s Wrong with Meta-Learning
(and how we might fix it)
Sergey Levine
UC Berkeley Google Brain
Yahya, Li, Kalakrishnan, Chebotar, Levine, ‘16
Kalashnikov, Irpan, Pastor, Ibarz, Herzog, Jang, Quillen, Holly, Kalakrishnan, Vanhoucke, Levine. QT-Opt: Scalable Deep Reinforcement Learning of Vision-Based Robotic Manipulation Skills
about four hours vs. about four weeks, nonstop
image credit: Ravi & Larochelle ‘17
(few shot) training set input (e.g., image)
training set
test input test label
Santoro et al. “Meta-Learning with Memory-Augmented Neural Networks.” Vinyals et al. “Matching Networks for One-Shot Learning.” Snell et al. “Prototypical Networks for Few-Shot Learning.”
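For instance, the classification rule from the last reference assigns each class a prototype (the mean of its support embeddings) and labels a query by its nearest prototype. A minimal sketch, with raw vectors standing in for the learned embedding network:

```python
# Prototypical-network classification rule (Snell et al.), sketched on raw
# vectors instead of learned embeddings -- purely illustrative.

def prototype(support_vectors):
    """Mean of the support embeddings for one class."""
    n = len(support_vectors)
    dim = len(support_vectors[0])
    return [sum(v[i] for v in support_vectors) / n for i in range(dim)]

def classify(query, prototypes):
    """Return the index of the nearest prototype (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(prototypes)), key=lambda c: sq_dist(query, prototypes[c]))
```

In the real method the embeddings come from a network trained end-to-end so that this nearest-prototype rule works well across sampled few-shot tasks.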
this implements the “learned learning algorithm”: test input → test label
* in general, can take more than one gradient step here
** we often use 4–10 steps
“meta-loss” for task i
Finn et al., “Model-Agnostic Meta-Learning”
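The MAML update above can be made concrete on a toy problem (a hypothetical scalar example, not from the talk): each task i asks to minimize (θ − t_i)², the inner loop takes one gradient step, and the outer loop differentiates through that step to move the initialization.

```python
# Minimal MAML sketch on scalar tasks (illustrative toy example).
# Task i: minimize L_i(theta) = (theta - t_i)^2.

def inner_step(theta, target, alpha):
    grad = 2.0 * (theta - target)   # dL_i/dtheta on the task's training data
    return theta - alpha * grad     # adapted parameters theta_i'

def meta_train(targets, alpha=0.1, beta=0.05, iters=200):
    theta = 0.0
    for _ in range(iters):
        meta_grad = 0.0
        for t in targets:
            theta_prime = inner_step(theta, t, alpha)
            # d/dtheta of the post-update loss (theta' - t)^2,
            # differentiating *through* the inner gradient step
            meta_grad += 2.0 * (theta_prime - t) * (1.0 - 2.0 * alpha)
        theta -= beta * meta_grad / len(targets)
    return theta

# theta converges toward the mean of the targets: the initialization from
# which a single gradient step adapts well to every task in the distribution.
```

With more inner steps (the 4–10 noted above) the same structure holds; only the inner loop and the backpropagation through it get longer.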
Chelsea Finn
this implements the “learned learning algorithm”: test input → test label
Finn & Levine. “Meta-Learning and Universality”
Andrychowicz et al. “Learning to learn by gradient descent by gradient descent.” Li & Malik. “Learning to optimize” Maclaurin et al. “Gradient-based hyperparameter optimization” Ravi & Larochelle. “Optimization as a model for few-shot learning”
…and the results keep getting better. MiniImagenet few-shot benchmark (5-shot, 5-way):
Finn et al. ‘17: 63.11%
Li et al. ‘17: 64.03%
Kim et al. ‘18 (AutoMeta): 76.29%
after MAML training → after 1 gradient step
environment
Gupta, Eysenbach, Finn, Levine. Unsupervised Meta-Learning for Reinforcement Learning. Chelsea Finn
Abhishek Gupta Ben Eysenbach
Unsupervised Meta-RL
meta-learned environment-specific RL algorithm
fast adaptation: reward function → reward-maximizing policy
Unsupervised Task Acquisition → Meta-RL
◼ Use randomly initialized discriminators for reward functions ◼ Important: random functions over state space, not random
D → randomly initialized network
Skill (z) → Policy (Agent) → Action → Environment → State → Discriminator (D) → Predict Skill
◼ Policy → visit states which are discriminable
◼ Discriminator → predict skill from state
Task Reward for UML:
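This discriminator-based task reward can be sketched as r = log q(z | s): the policy conditioned on skill z is rewarded when the discriminator can recover z from the visited state. A minimal sketch, assuming a randomly initialized linear discriminator over a toy state vector (dimensions and names are illustrative):

```python
import math
import random

# DIAYN-style skill reward sketch: r(s, z) = log q(z | s), where q is a
# randomly initialized linear "discriminator" over the *state* -- random
# functions of state, per the slide, rather than learned labels.

random.seed(0)
STATE_DIM, NUM_SKILLS = 4, 3  # illustrative sizes, not from the talk
W = [[random.gauss(0.0, 1.0) for _ in range(STATE_DIM)]
     for _ in range(NUM_SKILLS)]

def skill_reward(state, z):
    """Log-probability that the discriminator assigns skill z to this state."""
    logits = [sum(w * s for w, s in zip(row, state)) for row in W]
    m = max(logits)  # stabilize the softmax
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[z] - log_norm  # log q(z | s); higher when s reveals z
```

Maximizing this reward pushes each skill's policy toward states the discriminator can tell apart, while the discriminator (here frozen at random initialization; in DIAYN it is trained) defines what "apart" means.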
Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need.
Cheetah Ant
Gupta, Eysenbach, Finn, Levine. Unsupervised Meta-Learning for Reinforcement Learning.
Meta-test performance with rewards: 2D Navigation, Cheetah, Ant
Hsu, Levine, Finn. Unsupervised Learning via Meta-Learning. unsupervised learning task proposals
training images (Class 1, Class 2) → test images (Class 1, Class 2)
meta-learning
Chelsea Finn Kyle Hsu
no true labels at all! unsupervised learning task proposals meta-learning
a few choices: BiGAN (Donahue et al. ‘17), DeepCluster (Caron et al. ‘18)

miniImageNet: 5-shot, 5-way
method                    accuracy
MAML with labels          62.13%
BiGAN kNN                 31.10%
BiGAN logistic            33.91%
BiGAN MLP + dropout       29.06%
BiGAN cluster matching    29.49%
BiGAN CACTUs              51.28%
DeepCluster CACTUs        53.97%

Hsu, Levine, Finn. Unsupervised Learning via Meta-Learning.
Same story across:
Clustering to Automatically Construct Tasks for Unsupervised Meta-Learning (CACTUs)
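Roughly, CACTUs clusters unlabeled embeddings and treats the cluster assignments as pseudo-labels from which N-way K-shot tasks are sampled. A toy sketch under that reading, using plain k-means on 2-D points in place of the learned BiGAN/DeepCluster embeddings (all names and sizes here are illustrative):

```python
import random

# CACTUs-style task construction sketch: cluster unlabeled points, then
# build N-way K-shot tasks whose "labels" are cluster indices.

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; returns k clusters (lists of points)."""
    centers = [p[:] for p in random.sample(points, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        centers = [
            [sum(dim) / len(cl) for dim in zip(*cl)] if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return clusters

def make_task(clusters, n_way=2, k_shot=1):
    """Sample an n_way/k_shot task; pseudo-labels are cluster indices."""
    eligible = [c for c in clusters if len(c) >= 2 * k_shot]
    chosen = random.sample(eligible, n_way)
    support = [(x, lbl) for lbl, c in enumerate(chosen) for x in c[:k_shot]]
    query = [(x, lbl) for lbl, c in enumerate(chosen) for x in c[k_shot:2 * k_shot]]
    return support, query
```

The resulting tasks then feed a standard meta-learner (e.g., MAML) exactly as if the pseudo-labels were real ones; the CACTUs results in the table come from this kind of pipeline with learned embeddings.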
Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning. 2018.
Nagabandi, Finn, Levine. Deep Online Learning via Meta-Learning: Continual Adaptation via Model-Based RL. 2018.
Co-Reyes, Gupta, Sanjeev, Altieri, DeNero, Abbeel, Levine. Meta-Learning Language-Guided Policy Learning. 2018.
Yu*, Finn*, Xie, Dasari, Abbeel, Levine. One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning. 2018.
Correction 1: Enter the blue room. Correction 2: Enter the red room.
Instruction: Move blue triangle to green goal.
RAIL Robotic AI & Learning Lab
website: http://rail.eecs.berkeley.edu source code: http://rail.eecs.berkeley.edu/code.html