

  1. The Meta-Learning Problem & Black-Box Meta-Learning CS 330

  2. Logistics
     Homework 1 posted today, due Wednesday, September 30.
     Project guidelines will be posted by tomorrow.

  3. Plan for Today
     Transfer Learning
     - Problem formulation
     - Fine-tuning
     Meta-Learning
     - Problem formulation
     - General recipe of meta-learning algorithms (topic of Homework 1!)
     - Black-box adaptation approaches (topic of Homework 1!)
     - Case study of GPT-3 (time-permitting)
     Goals for the end of lecture:
     - Differences between multi-task learning, transfer learning, and meta-learning problems
     - Basics of transfer learning via fine-tuning
     - Training set-up for few-shot meta-learning algorithms
     - How to implement black-box meta-learning techniques

  4. Multi-Task Learning vs. Transfer Learning
     Multi-Task Learning: solve multiple tasks 𝒯_1, ⋯, 𝒯_T at once:
         min_θ ∑_{i=1}^T ℒ_i(θ, 𝒟_i)
     Transfer Learning: solve target task 𝒯_b after solving source task 𝒯_a, by transferring knowledge learned from 𝒯_a. Key assumption: cannot access data 𝒟_a during transfer.
     Transfer learning is a valid solution to multi-task learning (but not vice versa).
     Question: What are some problems/applications where transfer learning might make sense? (answer in chat or raise hand)
     - when you don't care about solving 𝒯_a & 𝒯_b simultaneously
     - when 𝒟_a is very large (don't want to retain & retrain on it)
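
For concreteness, one gradient step on the multi-task objective might look like the following minimal sketch; the tiny linear model and the random batches standing in for 𝒟_1, …, 𝒟_T are hypothetical, not from the lecture:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(10, 5)                   # shared parameters theta
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
task_batches = [(torch.randn(8, 10), torch.randint(0, 5, (8,)))
                for _ in range(3)]               # stand-ins for D_1, ..., D_T

# One step on the multi-task objective: min_theta sum_i L_i(theta, D_i)
optimizer.zero_grad()
total_loss = sum(F.cross_entropy(model(x), y) for x, y in task_batches)
total_loss.backward()
optimizer.step()
```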

  5. Transfer learning via fine-tuning
     Start from parameters θ pre-trained on 𝒟_a, then train on 𝒟^tr, the training data for new task 𝒯_b (typically for many gradient steps):
         φ ← θ − α ∇_θ ℒ(θ, 𝒟^tr)
     Where do you get the pre-trained parameters?
     - ImageNet classification
     - Models trained on large language corpora (BERT, LMs)
     - Other unsupervised learning techniques
     - Whatever large, diverse dataset you might have
     Pre-trained models often available online.
     Some common practices (see also: What makes ImageNet good for transfer learning? Huh, Agrawal, Efros. '16):
     - Fine-tune with a smaller learning rate
     - Smaller learning rate for earlier layers
     - Freeze earlier layers, gradually unfreeze
     - Reinitialize last layer
     - Search over hyperparameters via cross-validation
     - Architecture choices matter (e.g. ResNets)
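
As an illustration, here is a minimal PyTorch sketch of several of these practices; the 10-class target task is hypothetical, and the weights API assumes torchvision ≥ 0.13:

```python
import torch
import torchvision

# Pre-trained parameters (theta), here from ImageNet classification.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")

# Reinitialize the last layer for the new task T_b (hypothetically, 10 classes).
model.fc = torch.nn.Linear(model.fc.in_features, 10)

# Freeze the earliest layer (it could be gradually unfrozen later).
for param in model.conv1.parameters():
    param.requires_grad = False

# Smaller learning rate for earlier layers than for the fresh head; layers left
# out of the optimizer simply stay frozen at their pre-trained values.
optimizer = torch.optim.SGD([
    {"params": model.layer3.parameters(), "lr": 1e-4},
    {"params": model.layer4.parameters(), "lr": 1e-3},
    {"params": model.fc.parameters(),     "lr": 1e-2},
], momentum=0.9)
# ...then run the usual supervised loop: phi <- theta - alpha * grad L(theta, D_tr)
```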

  6. Universal Language Model Fine-tuning for Text Classification. Howard, Ruder. '18
     Fine-tuning doesn't work well with small target-task datasets. This is where meta-learning can help.

  7. Plan for Today
     Transfer Learning
     - Problem formulation
     - Fine-tuning
     Meta-Learning
     - Problem formulation
     - General recipe of meta-learning algorithms
     - Black-box adaptation approaches
     - Case study of GPT-3 (time-permitting)

  8. The Meta-Learning Problem Statement (that we will consider in this class)

  9. Two ways to view meta-learning algorithms
     Mechanistic view
     ➢ Deep network that can read in an entire dataset and make predictions for new datapoints
     ➢ Training this network uses a meta-dataset, which itself consists of many datasets, each for a different task
     Probabilistic view
     ➢ Extract prior knowledge from a set of tasks that allows efficient learning of new tasks
     ➢ Learning a new task uses this prior and a (small) training set to infer the most likely posterior parameters
     Today: Focus primarily on the mechanistic view. (Bayes will come back later)

  10. How does meta-learning work? An example. Given 1 example of each of 5 classes (the training data): classify new examples (the test set).

  11. How does meta-learning work? An example.
      [Figure: many meta-training tasks, each a 1-shot, 5-class training set with new examples to classify; at meta-testing, a held-out task 𝒯_test with its own training data and test set]
      Can replace image classification with any ML problem: regression, language generation, skill learning, …

  12. The Meta-Learning Problem
      Given data from tasks 𝒯_1, …, 𝒯_n, quickly solve new task 𝒯_test.
      Key assumption: meta-training tasks and the meta-test task are drawn i.i.d. from the same task distribution: 𝒯_1, …, 𝒯_n ∼ p(𝒯), 𝒯_test ∼ p(𝒯).
      Like before, tasks must share structure.
      What do the tasks correspond to?
      - recognizing handwritten digits from different languages (see Homework 1!)
      - spam filter for different users
      - classifying species in different regions of the world
      - a robot performing different tasks
      How many tasks do you need? The more the better. (analogous to more data in ML)

  13. Some terminology
      𝒟_i^tr: task training set ("support set")
      𝒟_i^test: task test dataset ("query set")
      k-shot learning: learning with k examples per class (or k examples total, for regression)
      N-way classification: choosing between N classes
      Question: What are k and N for the above example? (answer in chat; here, k = 1 and N = 5)
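
A minimal sketch of how such an N-way, k-shot episode might be constructed; `images_by_class` is a hypothetical dict mapping each class (e.g. an Omniglot character) to its list of examples:

```python
import random

def sample_episode(images_by_class, n_way=5, k_shot=1, query_size=5):
    """Build one N-way, k-shot task: a support set D_tr and a query set D_test."""
    classes = random.sample(list(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):      # relabel the classes 0..N-1 per task
        examples = random.sample(images_by_class[cls], k_shot + query_size)
        support += [(x, label) for x in examples[:k_shot]]   # k examples per class
        query   += [(x, label) for x in examples[k_shot:]]   # held-out examples
    return support, query

# e.g. support, query = sample_episode(images_by_class, n_way=5, k_shot=1)
```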

  14. Problem Settings Recap
      Multi-Task Learning: solve multiple tasks 𝒯_1, ⋯, 𝒯_T at once: min_θ ∑_{i=1}^T ℒ_i(θ, 𝒟_i)
      Transfer Learning: solve target task 𝒯_b after solving source task 𝒯_a, by transferring knowledge learned from 𝒯_a
      The Meta-Learning Problem: given data from 𝒯_1, …, 𝒯_n, quickly solve new task 𝒯_test
      In transfer learning and meta-learning: generally impractical to access prior tasks.
      In all settings: tasks must share structure.

  15. Plan for Today
      Transfer Learning
      - Problem formulation
      - Fine-tuning
      Meta-Learning
      - Problem formulation
      - General recipe of meta-learning algorithms
      - Black-box adaptation approaches
      - Case study of GPT-3 (time-permitting)

  16. General recipe: how to evaluate a meta-learning algorithm
      The Omniglot dataset (Lake et al., Science 2015): 1623 characters from 50 different alphabets, 20 instances of each character.
      Many classes, few examples: the "transpose" of MNIST, with statistics more reflective of the real world.
      Proposes both few-shot discriminative & few-shot generative problems.
      Initial few-shot learning approaches used Bayesian models and non-parametrics: Fei-Fei et al. '03, Lake et al. '11, Salakhutdinov et al. '12, Lake et al. '13
      Other datasets used for few-shot image recognition: tieredImageNet, CIFAR, CUB, CelebA, others
      Other benchmarks: molecular property prediction (Nguyen et al. '20), object pose prediction (Yin et al. ICLR '20)

  17. Another View on the Meta-Learning Problem
      Supervised learning: inputs x, outputs y, data 𝒟 = {(x, y)_k}
      Meta supervised learning: inputs 𝒟^tr and x, outputs y, data {𝒟_i}; the model is a function y = h(𝒟^tr, x)
      Why is this view useful? Reduces the meta-learning problem to the design & optimization of h.
      Finn. Learning to Learn with Gradients. PhD Thesis 2018
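
To make the mechanistic view concrete, here is a minimal black-box meta-learner sketch: a network h that conditions on the support set 𝒟^tr and predicts labels for query inputs. The architecture (an LSTM reading (x, y) pairs) and all sizes are illustrative assumptions, not the lecture's exact model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlackBoxMetaLearner(nn.Module):
    """y = h(D_tr, x): summarize the support set, then predict for query inputs."""
    def __init__(self, x_dim=16, n_way=5, hidden=64):
        super().__init__()
        self.n_way = n_way
        self.encoder = nn.LSTM(x_dim + n_way, hidden, batch_first=True)
        self.head = nn.Linear(hidden + x_dim, n_way)

    def forward(self, support_x, support_y, query_x):
        # support_x: (k*N, x_dim); support_y: (k*N,) int labels; query_x: (Q, x_dim)
        pairs = torch.cat([support_x,
                           F.one_hot(support_y, self.n_way).float()], dim=-1)
        _, (h, _) = self.encoder(pairs.unsqueeze(0))   # read in the entire dataset
        task_vec = h[-1].expand(query_x.size(0), -1)   # one task summary per query
        return self.head(torch.cat([task_vec, query_x], dim=-1))  # query logits
```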

  18. General recipe: how to design a meta-learning algorithm
      1. Choose a form of h_θ(𝒟^tr, x), where θ are the meta-parameters.
      2. Choose how to optimize θ w.r.t. the max-likelihood objective using the meta-training data.
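
The two choices come together in the meta-training loop. Below is a minimal sketch that reuses the hypothetical `BlackBoxMetaLearner` and `sample_episode` from the earlier sketches, and assumes each example in `images_by_class` is already a flat feature tensor of dimension 16:

```python
import torch
import torch.nn.functional as F

# Choice 1: a black-box form of h_theta (from the earlier sketch).
model = BlackBoxMetaLearner(x_dim=16, n_way=5)     # theta = model.parameters()
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(10000):                          # meta-training tasks T_i ~ p(T)
    support, query = sample_episode(images_by_class, n_way=5, k_shot=1)
    sx = torch.stack([x for x, _ in support])
    sy = torch.tensor([y for _, y in support])
    qx = torch.stack([x for x, _ in query])
    qy = torch.tensor([y for _, y in query])

    # Choice 2: maximize likelihood of query labels given the support set,
    # i.e. minimize cross-entropy of y = h_theta(D_tr, x) on the query set.
    loss = F.cross_entropy(model(sx, sy, qx), qy)
    meta_opt.zero_grad()
    loss.backward()
    meta_opt.step()
```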

  19. Plan for Today
      Transfer Learning
      - Problem formulation
      - Fine-tuning
      Meta-Learning
      - Problem formulation
      - General recipe of meta-learning algorithms
      - Black-box adaptation approaches
      - Case study of GPT-3 (time-permitting)
