The Meta-Learning Problem & Black-Box Meta-Learning
CS 330
Logistics
Homework 1 posted today, due Wednesday, September 30. Project guidelines will be posted by tomorrow.
Plan for Today
Transfer Learning
- Problem formulation
- Fine-tuning
Meta-Learning
- Problem formulation
- General recipe of meta-learning algorithms
- Black-box adaptation approaches
- Case study of GPT-3 (time-permitting)
Goals by the end of lecture:
- Differences between multi-task learning, transfer learning, and meta-learning problems
- Basics of transfer learning via fine-tuning
- Training set-up for few-shot meta-learning algorithms
- How to implement black-box meta-learning techniques
The last two are the topic of Homework 1!
Multi-Task Learning vs. Transfer Learning
Multi-task learning: solve multiple tasks 𝒰_1, ⋯, 𝒰_T at once:

min_θ ∑_{i=1}^T ℒ_i(θ, 𝒟_i)

Transfer learning: solve target task 𝒰_b after solving source task 𝒰_a, by transferring knowledge learned from 𝒰_a.

Key assumption: cannot access data 𝒟_a during transfer.
Transfer learning is a valid solution to multi-task learning.
(but not vice versa)
Question: What are some problems/applications where transfer learning might make sense?
(answer in chat or raise hand)
- when 𝒟_a is very large (don't want to retain & retrain on 𝒟_a)
- when you don't care about solving 𝒰_a & 𝒰_b simultaneously

Transfer learning via fine-tuning

Fine-tune parameters θ pre-trained on 𝒟_a with the training data 𝒟^tr for the new task 𝒰_b:

φ ← θ − α ∇_θ ℒ(θ, 𝒟^tr)

(typically for many gradient steps)
Where do you get the pre-trained parameters?
- ImageNet classification
- Models trained on large language corpora (BERT, LMs)
- Other unsupervised learning techniques
- Whatever large, diverse dataset you might have
Some common practices:
- Fine-tune with a smaller learning rate
- Smaller learning rate for earlier layers
- Freeze earlier layers, gradually unfreeze
- Reinitialize the last layer
- Search over hyperparameters via cross-validation
- Architecture choices matter (e.g. ResNets)
Pre-trained models often available online.
What makes ImageNet good for transfer learning? Huh, Agrawal, Efros. ‘16
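To make these practices concrete, here is a minimal PyTorch-style sketch of fine-tuning with a reinitialized last layer and a smaller learning rate for earlier layers. The model choice, class count, and `target_task_loader` are hypothetical stand-ins, not from the lecture:

```python
import torch
import torchvision

# Parameters pre-trained on a large, diverse dataset (here: ImageNet).
model = torchvision.models.resnet18(pretrained=True)

# Reinitialize the last layer for the new task's label space.
num_target_classes = 10  # hypothetical: depends on the target task
model.fc = torch.nn.Linear(model.fc.in_features, num_target_classes)

# Smaller learning rate for the earlier, pre-trained layers;
# a larger one for the freshly initialized head.
optimizer = torch.optim.Adam([
    {"params": [p for n, p in model.named_parameters()
                if not n.startswith("fc")], "lr": 1e-5},
    {"params": model.fc.parameters(), "lr": 1e-3},
])

loss_fn = torch.nn.CrossEntropyLoss()
for x, y in target_task_loader:  # hypothetical dataloader over D^tr
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
```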
Fine-tuning doesn't work well with small target task datasets (Universal Language Model Fine-Tuning for Text Classification. Howard, Ruder. '18). This is where meta-learning can help.
Plan for Today
Transfer Learning
- Problem formulation
- Fine-tuning
Meta-Learning
- Problem formulation
- General recipe of meta-learning algorithms
- Black-box adaptation approaches
- Case study of GPT-3 (time-permitting)
The Meta-Learning Problem Statement
(that we will consider in this class)
Two ways to view meta-learning algorithms
Mechanistic view
➢ Deep network that can read in an entire dataset and make predictions for new datapoints
➢ Training this network uses a meta-dataset, which itself consists of many datasets, each for a different task
Probabilistic view
➢ Extract prior knowledge from a set of tasks that allows efficient learning of new tasks
➢ Learning a new task uses this prior and a (small) training set to infer the most likely posterior parameters
Today: Focus primarily on the mechanistic view.
(Bayes will come back later)
How does meta-learning work? An example.
Given 1 example of 5 classes (the training data): classify new examples (the test set).

[Figure: meta-training on many few-shot tasks built from the training classes; meta-testing on a held-out task 𝒯_test]

Can replace image classification with: regression, language generation, skill learning, any ML problem.
The Meta-Learning Problem

Given data from tasks 𝒰_1, …, 𝒰_n, quickly solve new task 𝒰_test.

Key assumption: meta-training tasks and the meta-test task are drawn i.i.d. from the same task distribution: 𝒰_1, …, 𝒰_n ∼ p(𝒰), 𝒰_test ∼ p(𝒰)
What do the tasks correspond to?
- recognizing handwritten digits from different languages (see homework 1!)
- spam filter for different users
- classifying species in different regions of the world
- a robot performing different tasks
How many tasks do you need? The more the better (analogous to more data in ML). Like before, tasks must share structure.
Some terminology:
𝒟_i^tr: the task training set (the "support set")
𝒟_i^test: the task test dataset (the "query set")
k-shot learning: learning with k examples per class (or k examples total for regression)
N-way classification: choosing between N classes
Question: What are k and N for the above example? (answer in chat)
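As a concrete illustration of this setup, here is a minimal sketch of sampling one N-way, k-shot task; `data_by_class` (a dict from class name to a list of examples) and the split sizes are illustrative assumptions, not the homework's API:

```python
import random

def sample_episode(data_by_class, n_way=5, k_shot=1, query_size=5):
    """Sample one N-way, k-shot task: a support set and a query set."""
    classes = random.sample(list(data_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(data_by_class[cls], k_shot + query_size)
        # The support set (D_i^tr) and query set (D_i^test) must be disjoint.
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query
```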
Problem Settings Recap

Multi-task learning: solve multiple tasks 𝒰_1, ⋯, 𝒰_T at once:

min_θ ∑_{i=1}^T ℒ_i(θ, 𝒟_i)

Transfer learning: solve target task 𝒰_b after solving source task 𝒰_a, by transferring knowledge learned from 𝒰_a.

Meta-learning: given data from 𝒰_1, …, 𝒰_n, quickly solve new task 𝒰_test.

In all settings: tasks must share structure. In transfer learning and meta-learning: generally impractical to access prior tasks.
Plan for Today
Transfer Learning
- Problem formulation
- Fine-tuning
Meta-Learning
- Problem formulation
- General recipe of meta-learning algorithms
- Black-box adaptation approaches
- Case study of GPT-3 (time-permitting)
General recipe

How to evaluate a meta-learning algorithm: the Omniglot dataset (Lake et al. Science 2015). 1,623 characters from 50 different alphabets, 20 instances of each character. The "transpose" of MNIST: many classes, few examples, with statistics more reflective of the real world. Proposes both few-shot discriminative & few-shot generative problems. Initial few-shot learning approaches used Bayesian models & non-parametrics (Fei-Fei et al. '03; Lake et al. '11; Salakhutdinov et al. '12; Lake et al. '13).

Other datasets used for few-shot image recognition: tieredImageNet, CIFAR, CUB, CelebA, others. Other benchmarks: molecular property prediction (Nguyen et al. '20), object pose prediction (Yin et al. ICLR '20).
Another View on the Meta-Learning Problem

Supervised learning:
Inputs: x. Outputs: y. Data: 𝒟 = {(x, y)_j}

Meta supervised learning:
Inputs: 𝒟^tr, x^ts. Outputs: y^ts. Data: {𝒟_i}, where 𝒟_i = {(x, y)_j}

Why is this view useful? It reduces the meta-learning problem to the design & optimization of f_θ, where y^ts = f_θ(𝒟^tr, x^ts) (see the sketch below).
- Finn. Learning to Learn with Gradients. PhD Thesis 2018
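A minimal way to see this reduction in code: the only difference is that the meta-learner takes the task training set as an additional input (types are placeholders, for illustration only):

```python
from typing import Callable, List, Tuple

Input, Label = float, int  # placeholder types, for illustration only

# Supervised learning: f(x) -> y, trained on a single dataset D = {(x, y)_j}.
SupervisedModel = Callable[[Input], Label]

# Meta supervised learning: f(D_tr, x_ts) -> y_ts, trained on a
# meta-dataset {D_i}; the task training set is part of the input.
MetaModel = Callable[[List[Tuple[Input, Label]], Input], Label]
```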
General recipe
How to design a meta-learning algorithm:
1. Choose a form of p(φ_i | 𝒟_i^tr, θ), where θ are the meta-parameters
2. Choose how to optimize θ w.r.t. the max-likelihood objective using the meta-training data
Plan for Today
Transfer Learning
- Problem formulation
- Fine-tuning
Meta-Learning
- Problem formulation
- General recipe of meta-learning algorithms
- Black-box adaptation approaches
- Case study of GPT-3 (time-permitting)
Black-Box Adaptation

Key idea: train a neural network f_θ to represent φ_i = f_θ(𝒟_i^tr), and predict test points with y^ts = g_{φ_i}(x^ts). Train with standard supervised learning on the loss ℒ(φ_i, 𝒟_i^test):

max_θ ∑_{𝒯_i} ℒ(f_θ(𝒟_i^tr), 𝒟_i^test) = max_θ ∑_{𝒯_i} ∑_{(x,y)∼𝒟_i^test} log g_{φ_i}(y | x), where φ_i = f_θ(𝒟_i^tr)

f_θ is the "learner".
Black-Box Adaptation

Key idea: train a neural network to represent φ_i = f_θ(𝒟_i^tr).

1. Sample task 𝒯_i (or a mini-batch of tasks)
2. Sample disjoint datasets 𝒟_i^tr, 𝒟_i^test from 𝒟_i
3. Compute φ_i ← f_θ(𝒟_i^tr)
4. Update θ using ∇_θ ℒ(φ_i, 𝒟_i^test)

(This loop is sketched in code below.)
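A minimal PyTorch-style sketch of this meta-training loop, reusing the `sample_episode` helper sketched earlier; `f` (support set → φ_i) and `g` (φ_i, query → predictions) are assumed to be `torch.nn.Module`s, and all names are illustrative rather than the homework's reference code:

```python
import torch
import torch.nn.functional as F

# theta = parameters of f and g; both are optimized jointly.
optimizer = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)

for step in range(num_meta_train_steps):
    # 1./2. Sample a task and disjoint D_i^tr, D_i^test.
    support, query = sample_episode(data_by_class)
    x_tr = torch.stack([x for x, _ in support])
    y_tr = torch.tensor([y for _, y in support])
    x_ts = torch.stack([x for x, _ in query])
    y_ts = torch.tensor([y for _, y in query])

    phi = f(x_tr, y_tr)        # 3. phi_i <- f_theta(D_i^tr)
    logits = g(phi, x_ts)      #    predict y^ts = g_phi(x^ts)
    loss = F.cross_entropy(logits, y_ts)

    optimizer.zero_grad()      # 4. update theta using grad of L(phi_i, D_i^test)
    loss.backward()
    optimizer.step()
```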
Black-Box Adaptation

Challenge: outputting all neural net parameters does not seem scalable.

Idea: we do not need to output all parameters of the neural net, only sufficient statistics (Santoro et al. MANN, Mishra et al. SNAIL). f_θ outputs a low-dimensional vector h_i that represents contextual task information, so φ_i = {h_i, θ_g}, where θ_g are shared parameters.

General form: y^ts = f_θ(𝒟_i^tr, x^ts)
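A sketch of this sufficient-statistics idea (module names and shapes are assumptions for illustration): f_θ averages support-set embeddings into a low-dimensional context h_i, and g keeps its own shared weights θ_g, conditioning on h_i by concatenation instead of receiving a full set of per-task weights:

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """f_theta: averages support-set embeddings into a low-dim context h_i."""
    def __init__(self, in_dim, n_classes, h_dim=64):
        super().__init__()
        self.n_classes = n_classes
        self.enc = nn.Linear(in_dim + n_classes, h_dim)

    def forward(self, x_tr, y_tr):
        y = nn.functional.one_hot(y_tr, self.n_classes).float()
        z = self.enc(torch.cat([x_tr, y], dim=-1))
        return z.mean(dim=0)  # h_i: low-dim sufficient statistics, not weights

class ContextPredictor(nn.Module):
    """g: shared parameters theta_g, conditioned on h_i by concatenation."""
    def __init__(self, in_dim, h_dim, n_classes):
        super().__init__()
        self.net = nn.Linear(in_dim + h_dim, n_classes)

    def forward(self, h, x_ts):
        h_rep = h.expand(x_ts.shape[0], -1)  # broadcast h_i over query points
        return self.net(torch.cat([x_ts, h_rep], dim=-1))
```

Note the mean over support examples: the resulting context does not depend on the order of the support set.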
Black-Box Adaptation

What architecture should we use for f_θ?
- LSTMs or Neural Turing Machine (NTM): Meta-Learning with Memory-Augmented Neural Networks. Santoro, Bartunov, Botvinick, Wierstra, Lillicrap. ICML '16
- Feedforward + average: Conditional Neural Processes. Garnelo, Rosenbaum, Maddison, Ramalho, Saxton, Shanahan, Teh, Rezende, Eslami. ICML '18
- Other external memory mechanisms: Meta Networks. Munkhdalai, Yu. ICML '17
- Convolutions & attention: A Simple Neural Attentive Meta-Learner. Mishra, Rohaninejad, Chen, Abbeel. ICLR '18
Question: Why might feedforward+average be better than a recurrent model?
(raise your hand)
HW 1:
- implement data processing
- implement simple black-box meta-learner
- train few-shot Omniglot classifier
Black-Box Adaptation

Key idea: train a neural network to represent φ_i = f_θ(𝒟_i^tr).

+ expressive
+ easy to combine with a variety of learning problems (e.g. SL, RL)
- complex model w/ complex task: challenging optimization problem
- often data-inefficient

How else can we represent φ_i = f_θ(𝒟_i^tr)? Next time (Wednesday): what if we treat it as an optimization procedure?
Plan for Today
Transfer Learning
- Problem formulation
- Fine-tuning
Meta-Learning
- Problem formulation
- General recipe of meta-learning algorithms
- Black-box adaptation approaches
- Case study of GPT-3 (time-permitting)
Case Study: GPT-3
May 2020
What is GPT-3?

A language model: a black-box meta-learner trained on language generation tasks.

Architecture: giant "Transformer" network with 175 billion parameters, 96 layers, 3.2M batch size.

𝒟_i^tr: a sequence of characters
𝒟_i^ts: the following sequence of characters

[Meta-training] dataset: crawled data from the internet, English-language Wikipedia, and two books corpora.
What do different tasks correspond to? Spelling correction, simple math problems, translating between languages, and a variety of other tasks.

How can those tasks all be solved by a single architecture? Put them all in the form of text! Why is that a good idea? Very easy to get a lot of meta-training data.
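For example, a few-shot translation task can be posed entirely as text: the task training set 𝒟^tr becomes in-context examples in the prompt, and the model's continuation is the prediction (an illustrative prompt in the style of the GPT-3 paper, not one from the lecture):

```python
# D^tr as in-context examples; the final line is the query x^ts, and the
# language model's continuation of the prompt is the prediction y^ts.
prompt = """Translate English to French:
sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""
```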
Some Results
Examples shown: one-shot learning from dictionary definitions, few-shot language editing, and non-few-shot learning tasks. [Result figures omitted]
Other Cool Use-Cases
English to LaTeX
Source: https://twitter.com/sh_reya/status/1284746918959239168
General Notes & Takeaways
The results are extremely impressive. The model is far from perfect. The model fails in unintuitive ways.
Source: https://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.html Source: https://github.com/shreyashankar/gpt3-sandbox/blob/master/docs/priming.md
The choice of 𝒟_i^tr at test time is important.
Plan for Today
Transfer Learning
- Problem formulation
- Fine-tuning
Meta-Learning
- Problem formulation
- General recipe of meta-learning algorithms
- Black-box adaptation approaches
- Case study of GPT-3 (time-permitting)
Goals by the end of lecture:
- Differences between multi-task learning, transfer learning, and meta-learning problems
- Basics of transfer learning via fine-tuning
- Training set-up for few-shot meta-learning algorithms
- How to implement black-box meta-learning techniques