The Big Problem with Meta-Learning and How Bayesians Can Fix It



SLIDE 1

Chelsea Finn

The Big Problem with Meta-Learning and How Bayesians Can Fix It

Stanford

SLIDE 2

Training data: paintings by Cezanne and by Braque. Test datapoint: by Braque or Cezanne?

SLIDE 3

How did you accomplish this?

Through previous experience.

SLIDE 4

How might you get a machine to accomplish this task?

  • SIFT features, HOG features + SVM
  • Modeling image formation, geometry
  • Fine-tuning from ImageNet features
  • Domain adaptation from other painters
  • ???

Fewer human priors, more data-driven priors → greater success.

Can we explicitly learn priors from previous experience that lead to efficient downstream learning?

Can we learn to learn?

SLIDE 5

Outline

  • 1. Brief overview of meta-learning
  • 2. The problem: peculiar, lesser-known, yet ubiquitous
  • 3. Steps towards a solution
SLIDE 6

How does meta-learning work? An example.

Given 1 example of each of 5 classes (training data): classify new examples (test set).

SLIDE 7

How does meta-learning work? An example.

Meta-training: many tasks built from the training classes. Meta-testing: a held-out task T_test.

Given 1 example of each of 5 classes (training data): classify new examples (test set).

SLIDE 8

How does meta-learning work?

One approach: parameterize the learner by a neural network.

(Hochreiter et al. ’01, Santoro et al. ’16, many others)

y^ts = f(𝒟^tr, x^ts; θ)
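To make the black-box view concrete, here is a minimal PyTorch sketch of the idea: a recurrent learner reads the (input, label) pairs of 𝒟^tr, then the query x^ts with a blank label, and emits y^ts directly. This is an illustration only; the class name, dimensions, and architecture are hypothetical, not the model used in the talk.

```python
# A minimal PyTorch sketch of a black-box meta-learner (illustrative only):
# an LSTM reads the support pairs (x, y) from D^tr, then the query x^ts with a
# blank label, and outputs the prediction y^ts directly.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlackBoxMetaLearner(nn.Module):          # hypothetical name
    def __init__(self, x_dim, n_classes, hidden=128):
        super().__init__()
        self.n_classes = n_classes
        self.rnn = nn.LSTM(x_dim + n_classes, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x_tr, y_tr, x_ts):
        # x_tr: (B, K, x_dim); y_tr: (B, K) integer labels; x_ts: (B, x_dim)
        y_onehot = F.one_hot(y_tr, self.n_classes).float()
        support = torch.cat([x_tr, y_onehot], dim=-1)          # (B, K, x_dim + C)
        blank = torch.zeros_like(y_onehot[:, 0])               # no label for the query
        query = torch.cat([x_ts, blank], dim=-1).unsqueeze(1)  # (B, 1, x_dim + C)
        out, _ = self.rnn(torch.cat([support, query], dim=1))
        return self.head(out[:, -1])                           # logits for y^ts
```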

SLIDE 9

How does meta-learning work?

Another approach: embed optimization inside the learning process.

(Maclaurin et al. ’15, Finn et al. ’17, many others)

∇_θ ℒ

y^ts = f(𝒟^tr, x^ts; θ)
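As a concrete sketch of the optimization-based view, the snippet below takes one inner gradient step on 𝒟^tr and predicts on x^ts with the adapted parameters, keeping the step differentiable for the outer (meta) objective. It assumes a toy linear model held in a dict; `adapted_prediction` and `theta` are hypothetical names, not the authors' code.

```python
# A sketch of embedding one gradient step in the learner (MAML-style), for a
# toy linear model; theta entries must have requires_grad=True so the outer
# (meta) objective can differentiate through the inner step.
import torch
import torch.nn.functional as F

def logits(params, x):
    return x @ params["w"] + params["b"]

def adapted_prediction(theta, x_tr, y_tr, x_ts, alpha=0.01):   # hypothetical helper
    inner_loss = F.cross_entropy(logits(theta, x_tr), y_tr)    # loss on D^tr
    grads = torch.autograd.grad(inner_loss, list(theta.values()), create_graph=True)
    phi = {k: v - alpha * g for (k, v), g in zip(theta.items(), grads)}
    return logits(phi, x_ts)                                   # y^ts = f(D^tr, x^ts; theta)
```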

SLIDE 10

The Bayesian perspective

meta-learning ↔ learning priors from data

(Grant et al. ’18, Gordon et al. ’18, many others)

p(ϕ|θ)
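In symbols, the hierarchical-Bayes reading of this slide (a standard formulation, sketched here for clarity rather than quoted from the talk): θ parameterizes a prior over per-task parameters ϕ, adaptation is posterior inference, and prediction marginalizes over ϕ.

```latex
% Standard hierarchical-Bayes view (illustrative, not a verbatim slide equation).
p(\phi \mid \mathcal{D}^{tr}, \theta) \;\propto\; p(\mathcal{D}^{tr} \mid \phi)\, p(\phi \mid \theta),
\qquad
p(y^{ts} \mid x^{ts}, \mathcal{D}^{tr}, \theta)
  \;=\; \int p(y^{ts} \mid x^{ts}, \phi)\, p(\phi \mid \mathcal{D}^{tr}, \theta)\, d\phi .
```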

SLIDE 11

Outline

  • 1. Brief overview of meta-learning
  • 2. The problem: peculiar, lesser-known, yet ubiquitous
  • 3. First steps towards a solution
SLIDE 12

How we construct tasks for meta-learning.

Randomly assign class labels to image classes for each task. Algorithms must use the training data 𝒟^tr to infer the label ordering for the test input x^ts.

→ Tasks are mutually exclusive.
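A minimal Python sketch of this task-construction step (sample_episode, images_by_class, and class_to_label are hypothetical names, not the talk's code): with shuffle_labels=True each task gets a fresh random class-to-label assignment, so the learner must consult 𝒟^tr; with shuffle_labels=False the assignment is globally consistent, which is the non-mutually-exclusive setting discussed on the next slide.

```python
# Hypothetical episode sampler contrasting the two regimes.
import random

def sample_episode(images_by_class, class_to_label, n_way=5, k_shot=1, n_query=1,
                   shuffle_labels=True):
    # images_by_class: dict mapping class name -> list of images (assumed format)
    classes = random.sample(list(images_by_class), n_way)
    if shuffle_labels:
        # Mutually exclusive: a fresh random class-to-label assignment per task,
        # so the label ordering can only be inferred from D^tr.
        labels = dict(zip(classes, random.sample(range(n_way), n_way)))
    else:
        # Non-mutually exclusive: a single, globally consistent assignment,
        # so one function can classify x^ts while ignoring D^tr.
        labels = {c: class_to_label[c] for c in classes}
    support, query = [], []
    for c in classes:
        imgs = random.sample(images_by_class[c], k_shot + n_query)
        support += [(img, labels[c]) for img in imgs[:k_shot]]
        query += [(img, labels[c]) for img in imgs[k_shot:]]
    return support, query   # (D^tr, test set)
```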

SLIDE 13

What if label order is consistent?

The network can simply learn to classify inputs, irrespective of 𝒟^tr.

Tasks are non-mutually exclusive: a single function can solve all tasks.

SLIDE 14

The network can simply learn to classify inputs, irrespective of 𝒟^tr; this applies equally when optimization is embedded in the learner (∇_θ ℒ).

SLIDE 15

What if label order is consistent?

For a held-out task T_test with new image classes (new training data and test set): the model can’t make predictions without 𝒟^tr.

SLIDE 16

Is this a problem?

  • No: for image classification, we can just shuffle labels*
  • No, if we see the same image classes as at training (& don’t need to adapt at meta-test time)
  • But yes, if we want to be able to adapt to new tasks from their data.
SLIDE 17

Another example

If you tell the robot the task goal, the robot can ignore the trials.

Meta-training tasks up to T_50: “close drawer”, “hammer”, “stack”, …; meta-test task T_test: “close box”.

T Yu, D Quillen, Z He, R Julian, K Hausman, C Finn, S Levine. Meta-World. CoRL ‘19

SLIDE 18

Another example

Model can memorize the canonical orientations of the training objects.

Yin, Tucker, Yuan, Levine, Finn. Meta-Learning without Memorization. ‘19

SLIDE 19

Can we do something about it?

SLIDE 20

If tasks are mutually exclusive (i.e. due to label shuffling, hiding information): a single function cannot solve all tasks.

If tasks are non-mutually exclusive: a single function can solve all tasks → multiple solutions to the meta-learning problem.

y^ts = f_θ(𝒟^tr_i, x^ts)

One solution: memorize the canonical pose info in θ and ignore 𝒟^tr_i.
Another solution: carry no info about canonical pose in θ; acquire it from 𝒟^tr_i.

Suggests a potential approach: control information flow. An entire spectrum of solutions based on how information flows.

Yin, Tucker, Yuan, Levine, Finn. Meta-Learning without Memorization. ‘19

SLIDE 21

An entire spectrum of solutions based on how information flows.

If tasks are non-mutually exclusive: a single function can solve all tasks → multiple solutions to the meta-learning problem.

y^ts = f_θ(𝒟^tr_i, x^ts)

One solution: memorize the canonical pose info in θ and ignore 𝒟^tr_i.
Another solution: carry no info about canonical pose in θ; acquire it from 𝒟^tr_i.
One option: maximize I(ŷ^ts; 𝒟^tr | x^ts).

Meta-regularization: minimize the meta-training loss plus the information in θ:

ℒ(θ, 𝒟_meta-train) + β D_KL(q(θ; θ_μ, θ_σ) ∥ p(θ))

Places precedence on using information from 𝒟^tr over storing information in θ.
Can combine with your favorite meta-learning algorithm.

Yin, Tucker, Yuan, Levine, Finn. Meta-Learning without Memorization. ‘19
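As a sketch of how the weight-space version of this objective could look in PyTorch (my own illustration under the assumption of a diagonal Gaussian q(θ; θ_μ, θ_σ) and a standard-normal prior p(θ); meta_regularized_loss and meta_train_loss_fn are hypothetical names, not the authors' implementation):

```python
# Illustrative sketch of the weight-space meta-regularization objective above:
# theta is modeled as a diagonal Gaussian q(theta; theta_mu, theta_sigma) and
# regularized toward the prior p(theta) = N(0, I) with weight beta.
import torch

def meta_regularized_loss(meta_train_loss_fn, theta_mu, theta_log_sigma, beta=1e-3):
    sigma = theta_log_sigma.exp()
    theta = theta_mu + sigma * torch.randn_like(sigma)   # reparameterized sample
    task_loss = meta_train_loss_fn(theta)                # L(theta, D_meta-train)
    # Closed-form KL between the diagonal Gaussian q and the N(0, I) prior.
    kl = 0.5 * (sigma**2 + theta_mu**2 - 1.0 - 2.0 * theta_log_sigma).sum()
    return task_loss + beta * kl
```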

SLIDE 22

Yin, Tucker, Yuan, Levine, Finn. Meta-Learning without Memorization. ‘19

(and it’s not just as simple as standard regularization)

Results on the pose prediction task and on Omniglot without label shuffling (“non-mutually-exclusive” Omniglot).

TAML: Jamal & Qi. Task-Agnostic Meta-Learning for Few-Shot Learning. CVPR ‘19

SLIDE 23

Yin, Tucker, Yuan, Levine, Finn. Meta-Learning without Memorization. ‘19

Does meta-regularization lead to better generalization?

Let P(θ) be an arbitrary distribution over θ that doesn’t depend on the meta-training data (e.g. P(θ) = 𝒩(θ; 0, I)).

For MAML, with probability at least 1 − δ, for all θ_μ, θ_σ:

generalization error ≤ error on the meta-training set + meta-regularization term

With a Taylor expansion of the RHS and a particular value of β, we recover the MR-MAML objective.

Proof: draws heavily on Amit & Meir ’18.
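For intuition about the shape of such a statement, one common form of the single-task PAC-Bayes bound (McAllester-style background only; constants vary across presentations, and this is not the paper's exact meta-learning theorem) reads: with probability at least 1 − δ over n training samples, for every posterior Q,

```latex
\mathbb{E}_{\theta \sim Q}\!\left[ L(\theta) \right]
  \;\le\;
\mathbb{E}_{\theta \sim Q}\!\left[ \hat{L}(\theta) \right]
  \;+\;
\sqrt{ \frac{ D_{\mathrm{KL}}(Q \,\|\, P) + \ln \tfrac{n}{\delta} }{ 2(n-1) } }
```

Here L is the expected error, L̂ the empirical error, and P a fixed data-independent prior; the KL term plays the same role as the meta-regularization penalty above.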

SLIDE 24

Want to learn more?

  • Yin, Tucker, Yuan, Levine, Finn. Meta-Learning without Memorization. ‘19
  • T Yu, D Quillen, Z He, R Julian, K Hausman, C Finn, S Levine. Meta-World. CoRL ‘19
  • CS330: Deep Multi-Task & Meta-Learning. Lecture videos coming out soon!

Working on meta-RL? Try out the Meta-World benchmark.

Collaborators
