

SLIDE 1

Meta Learning

Shengchao Liu

SLIDE 2

Background

  • Meta Learning (AKA Learning to Learn)
  • A fast-learning algorithm: one that quickly adapts from the source tasks to the target tasks

  • Key terminologies
  • Support Set & Query Set
  • C-Way K-Shot Learning: C classes, each with K samples
  • Pre-training & Fine-tuning
SLIDE 3

Meta-Learning
  • Metric-Based: Siamese NN, Matching Network, Relation Network, Prototypical Networks, Meta GNN
  • Model-Based: MANN, Meta Networks, Hyper Networks
  • Gradient-Based: MAML (FOMAML), Reptile, ANIL


SLIDE 5
  • 1. Metric-Based
  • Similar idea to the nearest-neighbors algorithm:
    $p_\theta(y \mid x, S) = \sum_{(x_i, y_i) \in S} k_\theta(x, x_i)\, y_i$, where $k_\theta$ is the kernel function

  • Siamese Neural Networks for One-shot Image Recognition, ICML 2015
  • Learning to Compare: Relation Network for Few-Shot Learning, CVPR 2018
  • Matching Networks for One Shot Learning, NIPS 2016
  • Prototypical Networks for Few-Shot Learning, NeurIPS 2017
  • Few-Shot Learning with Graph Neural Networks, ICLR 2018

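A minimal NumPy sketch of this kernel-regression view; the Gaussian kernel, shapes, and data here are illustrative assumptions (metric-based methods learn $k_\theta$ rather than fixing it):

```python
import numpy as np

def predict(x, support_x, support_y, kernel):
    # p(y | x, S) = sum over the support set of k(x, x_i) * y_i (y_i one-hot).
    weights = np.array([kernel(x, xi) for xi in support_x])
    weights /= weights.sum()              # normalize to a distribution
    return weights @ support_y            # kernel-weighted vote over labels

# Illustrative Gaussian kernel over raw inputs; in practice k_theta is learned.
gaussian = lambda u, v: np.exp(-np.sum((u - v) ** 2))

support_x = np.random.randn(6, 4)              # toy 3-way 2-shot support set
support_y = np.eye(3)[[0, 0, 1, 1, 2, 2]]      # one-hot labels
print(predict(np.random.randn(4), support_x, support_y, gaussian))
```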

SLIDE 6

Siamese Neural Network

  • Few-Shot Learning
  • Twin network
  • L1-distance as the metric
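A minimal sketch of the twin network with an L1-distance head; the MLP encoder and shapes are illustrative assumptions (the paper uses a CNN):

```python
import torch

# Shared ("twin") embedding network: both inputs pass through the same weights.
embed = torch.nn.Sequential(torch.nn.Linear(784, 128), torch.nn.ReLU(), torch.nn.Linear(128, 64))
# Scores the component-wise L1 distance into a same/different probability.
out = torch.nn.Linear(64, 1)

def siamese_prob(x1, x2):
    d = torch.abs(embed(x1) - embed(x2))      # per-dimension L1 distance
    return torch.sigmoid(out(d))              # P(same class)

x1, x2 = torch.randn(5, 784), torch.randn(5, 784)
print(siamese_prob(x1, x2).shape)             # torch.Size([5, 1])
```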
SLIDE 7

Siamese Neural Network

SLIDE 8

Relation Network

  • Few-Shot Learning
  • Similar to Siamese Network
  • Difference: the two embeddings are concatenated and a CNN serves as the learned relation module
SLIDE 9

Matching Network

  • Given a training set ($k$ samples per class): $S = \{(x_i, y_i)\}_{i=1}^{k}$
  • Goal: $P(\hat{y} \mid \hat{x}, S) = \sum_{i=1}^{k} a(\hat{x}, x_i)\, y_i = \sum_{i=1}^{k} \frac{\exp[\mathrm{cosine}(f(\hat{x}), g(x_i))]}{\sum_{j=1}^{k} \exp[\mathrm{cosine}(f(\hat{x}), g(x_j))]}\, y_i$
  • Two embedding methods are tested for $f$ and $g$.
  • Episodic Training
  • Support Set (C-Way K-Shot)
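A minimal sketch of the matching prediction for the simple-embedding case ($f = g$); the linear encoder and episode shapes are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

encoder = torch.nn.Linear(784, 64)            # shared encoder, f = g

def matching_predict(x_query, support_x, support_y_onehot):
    # Attention a(x_hat, x_i): softmax over cosine similarities.
    f_x = F.normalize(encoder(x_query), dim=-1)     # (1, 64), unit norm
    g_s = F.normalize(encoder(support_x), dim=-1)   # (k, 64), unit norm
    attn = F.softmax(f_x @ g_s.t(), dim=-1)         # cosine via dot product
    return attn @ support_y_onehot                  # P(y_hat | x_hat, S)

support_x = torch.randn(6, 784)                     # toy 3-way 2-shot episode
support_y = torch.eye(3)[[0, 0, 1, 1, 2, 2]]
print(matching_predict(torch.randn(1, 784), support_x, support_y))
```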

SLIDE 10

Matching Network

  • Simple Embedding: $f = g$, with some CNN model
  • Full Context Embedding:
  • $g(x_i)$ applies a bidirectional LSTM
  • $f(\hat{x})$ applies an attention-LSTM:
  • 1. First encode $\hat{x}$ through a CNN to get $f'(\hat{x})$
  • 2. Then an attention-LSTM is trained with a read attention over the full support set $S$:
    $\hat{h}_k, c_k = \mathrm{LSTM}(f'(\hat{x}), [h_{k-1}, r_{k-1}], c_{k-1})$
    $h_k = \hat{h}_k + f'(\hat{x})$
    $r_k = \sum_{i=1}^{|S|} a(h_{k-1}, g(x_i)) \cdot g(x_i)$,
    where $a(h_{k-1}, g(x_i)) = \exp(h_{k-1}^T g(x_i)) / \sum_{j=1}^{|S|} \exp(h_{k-1}^T g(x_j))$
  • 3. Finally $f(\hat{x}) = h_K$, where $K$ is the number of read steps.

SLIDE 11

Prototypical Network

  • For each class $k$:
  • Sample a support set $S_k$
  • Sample a query set
  • Prototype: $c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_\phi(x_i)$
  • Prediction: $p(y = k \mid x) = \frac{\exp(-d(f_\phi(x), c_k))}{\sum_{k'} \exp(-d(f_\phi(x), c_{k'}))}$
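A minimal sketch of prototype construction and the softmax over negative squared distances; the encoder and the toy episode are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

encoder = torch.nn.Linear(784, 64)             # f_phi, illustrative

def proto_predict(x_query, support_x, support_labels, n_classes):
    z_s = encoder(support_x)                   # embed the support set
    # c_k: mean embedding (prototype) of each class's support examples.
    protos = torch.stack([z_s[support_labels == k].mean(0) for k in range(n_classes)])
    # p(y = k | x): softmax over negative squared Euclidean distances.
    d = torch.cdist(encoder(x_query), protos) ** 2
    return F.softmax(-d, dim=-1)

support_x = torch.randn(6, 784)                # toy 3-way 2-shot episode
support_labels = torch.tensor([0, 0, 1, 1, 2, 2])
print(proto_predict(torch.randn(1, 784), support_x, support_labels, 3))
```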

SLIDE 12

Prototypical Network

SLIDE 13

Prototypical Network

  • When viewed as a clustering algorithm with a Bregman divergence
    $d_\varphi(z, z') = \varphi(z) - \varphi(z') - (z - z')^T \nabla\varphi(z')$,
    the class mean achieves the minimum total distance to the support points in $S$.
  • Can be viewed as a linear model when the (squared) Euclidean distance is used.
  • Comparison between Matching Network & Prototypical Network:
  • Equal in one-shot learning, not in K-shot learning
  • Matching Network: $P(\hat{y} \mid \hat{x}) = \sum_{i=1}^{k} a(\hat{x}, x_i)\, y_i = \sum_{i=1}^{k} \frac{\exp[\mathrm{cosine}(f(\hat{x}), g(x_i))]}{\sum_{j=1}^{k} \exp[\mathrm{cosine}(f(\hat{x}), g(x_j))]}\, y_i$
  • Prototypical Network: $p(y = k \mid x) = \frac{\exp(-d(f_\phi(x), c_k))}{\sum_{k'} \exp(-d(f_\phi(x), c_{k'}))}$

SLIDE 14

Meta GNN

SLIDE 15

Meta GNN

  • For the $k$-th layer:
    $x_i^k = \mathrm{GCN}(x^{k-1})$
    $A_{i,j}^k = \varphi(x_i^k, x_j^k) = \mathrm{MLP}(|x_i^k - x_j^k|)$
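A minimal sketch of the learned adjacency: an MLP scores $|x_i - x_j|$ for every node pair; layer sizes and data are illustrative assumptions:

```python
import torch

# MLP that maps |x_i - x_j| to a scalar edge weight A_ij.
edge_mlp = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))

def learned_adjacency(x):
    # x: (n, d) node features; returns an (n, n) learned adjacency matrix.
    diff = torch.abs(x.unsqueeze(1) - x.unsqueeze(0))    # (n, n, d) |x_i - x_j|
    return edge_mlp(diff).squeeze(-1)                    # (n, n) edge scores

x = torch.randn(6, 64)                                   # one episode's nodes
print(learned_adjacency(x).shape)                        # torch.Size([6, 6])
```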

SLIDE 16

Metric-Based

  • Comments:
  • Performance highly depends on the metric function.
  • Robustness: becomes more troublesome when the new task diverges from the source tasks.

SLIDE 17

Meta-Learning
  • Metric-Based: Siamese NN, Matching Network, Relation Network, Prototypical Networks, Meta GNN
  • Model-Based: MANN, Meta Networks, Hyper Networks
  • Gradient-Based: MAML (FOMAML), Reptile, ANIL

SLIDE 18
  • 2. Model-Based
  • Goal: to learn a model $f_\theta$
  • Solution: learning another model to parameterize $f_\theta$


SLIDE 21
  • 2. Model-Based
  • Goal: to learn a base model $f_\theta$
  • Solution: learning a meta model to parameterize $f_\theta$
  • Meta-Learning with Memory-Augmented Neural Networks, ICML 2016
  • Meta Networks, ICML 2017
  • HyperNetworks, ArXiv 2016

SLIDE 22

Memory-Augmented Neural Networks (MANN)

  • Basic idea (Neural Turing Machine):
  • Store the useful information of the new task in an external memory.
  • The true label of the last time step is used (fed in as input at the current step).
SLIDE 23

Memory-Augmented Neural Networks (MANN)

  • Example (figure)
SLIDE 24

Addressing Mechanism

  • The key vector $k_t$ at step $t$ is generated from input $x_t$; the memory matrix at step $t$ is $M_t$; the vector read from memory at step $t$ is $r_t$
  • Read weights $w_t^r$, usage weights $w_t^u$, write weights $w_t^w$
  • Read:
    $w_t^r(i) = \mathrm{softmax}\!\left(\frac{k_t \cdot M_t(i)}{\|k_t\| \, \|M_t(i)\|}\right)$, $\quad r_t = \sum_{i=1}^{N} w_t^r(i) \, M_t(i)$
  • Write (Least Recently Used Access, LRUA):
    $w_t^u = \gamma w_{t-1}^u + w_t^r + w_t^w$
    $w_t^w = \sigma(\alpha) w_{t-1}^r + (1 - \sigma(\alpha)) w_{t-1}^{lu}$
    $w_t^{lu}(i) = \begin{cases} 0, & \text{if } w_t^u(i) > m(w_t^u, n) \\ 1, & \text{otherwise} \end{cases}$, where $m(w_t^u, n)$ is the $n$-th smallest element in vector $w_t^u$
    $M_t(i) = M_{t-1}(i) + w_t^w(i) \, k_t, \; \forall i$
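A minimal NumPy sketch of one LRUA read/write step following the equations above; the sizes and the previous-step weights are illustrative stand-ins for what the controller would carry over:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative sizes: N memory slots of width D.
N, D, gamma, alpha, n = 8, 4, 0.95, 0.5, 1
rng = np.random.default_rng(0)
M = rng.normal(size=(N, D))                        # memory matrix M_{t-1}
k = rng.normal(size=D)                             # key vector k_t
w_r_prev, w_u_prev = softmax(rng.normal(size=N)), softmax(rng.normal(size=N))

# Read: softmax over cosine similarities between k_t and each memory row.
cos = M @ k / (np.linalg.norm(M, axis=1) * np.linalg.norm(k) + 1e-8)
w_r = softmax(cos)
r = w_r @ M                                        # read vector r_t

# Write (LRUA): blend previous read weights with the least-used slots.
w_lu_prev = (w_u_prev <= np.sort(w_u_prev)[n - 1]).astype(float)  # n-th smallest
sig = 1 / (1 + np.exp(-alpha))                     # sigma(alpha)
w_w = sig * w_r_prev + (1 - sig) * w_lu_prev
M = M + np.outer(w_w, k)                           # M_t(i) = M_{t-1}(i) + w_w(i) k_t
w_u = gamma * w_u_prev + w_r + w_w                 # usage decays, then accumulates
print(r, w_w)
```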

SLIDE 25

Meta-Learning
  • Metric-Based: Siamese NN, Matching Network, Relation Network, Prototypical Networks, Meta GNN
  • Model-Based: MANN, Meta Networks, Hyper Networks
  • Gradient-Based: MAML (FOMAML), Reptile, ANIL


SLIDE 27
  • 3. Gradient-Based

Model-Based:

  • Goal: to learn a base model $f_\theta$
  • Solution: learning a meta model to parameterize $f_\theta$

Gradient-Based:

  • Goal: to learn a base model $f_\theta$
  • Solution: learning to parameterize $f_\theta$ without a meta model

SLIDE 28
  • 3. Gradient-Based
  • Learning to learn with Gradients
  • MAML (Model-Agnostic Meta-Learning) & FOMAML, ICML 2017

  • Reptile, ArXiv 2018
  • ANIL (Almost No Inner Loop), ICLR 2020
SLIDE 29

MAML

  • Model-Agnostic Meta-Learning (MAML)
  • Motivation
  • find model parameters that are sensitive to changes in the task
  • so that small changes in the parameters yield large improvements
SLIDE 30

MAML

  • Outer loop:
  • Sample a batch of tasks $\tau_i \sim p(\tau)$
  • Inner loop:
  • Sample $K$ samples per task
  • Meta-objective: $\min_\theta \sum_{\tau_i \sim p(\tau)} \ell_{\tau_i}(f_{\theta'_i}) = \sum_{\tau_i \sim p(\tau)} \ell_{\tau_i}\big(f_{\theta - \alpha \nabla_\theta \ell_{\tau_i}(f_\theta)}\big)$
  • SGD: $\theta \leftarrow \theta - \beta \nabla_\theta \sum_{\tau_i \sim p(\tau)} \ell_{\tau_i}(f_{\theta'_i}) = \theta - \beta \nabla_\theta \sum_{\tau_i \sim p(\tau)} \ell_{\tau_i}\big(f_{\theta - \alpha \nabla_\theta \ell_{\tau_i}(f_\theta)}\big)$
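A minimal MAML sketch on toy sine-regression tasks (the setting used in the MAML paper); the architecture, learning rates, and task sampler are illustrative assumptions:

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(1, 40), torch.nn.ReLU(), torch.nn.Linear(40, 1))
alpha, beta = 0.01, 0.001                       # inner / outer learning rates
meta_opt = torch.optim.SGD(model.parameters(), lr=beta)
loss_fn = torch.nn.MSELoss()

def sample_tasks(n_tasks=4, k=10):
    # Each task: regress a sine wave with random amplitude and phase.
    for _ in range(n_tasks):
        amp, phase = torch.rand(1) * 4.9 + 0.1, torch.rand(1) * 3.14
        x_s, x_q = torch.rand(k, 1) * 10 - 5, torch.rand(k, 1) * 10 - 5
        yield x_s, amp * torch.sin(x_s + phase), x_q, amp * torch.sin(x_q + phase)

def forward(x, params):
    # Functional forward pass so f can be evaluated at adapted parameters.
    w1, b1, w2, b2 = params
    return torch.relu(x @ w1.t() + b1) @ w2.t() + b2

for step in range(100):
    meta_opt.zero_grad()
    for x_s, y_s, x_q, y_q in sample_tasks():
        params = list(model.parameters())
        # Inner loop: one SGD step on the support set; create_graph=True keeps
        # the graph so the outer update differentiates through the inner step.
        grads = torch.autograd.grad(loss_fn(forward(x_s, params), y_s), params, create_graph=True)
        adapted = [p - alpha * g for p, g in zip(params, grads)]
        # Outer objective: query loss at the adapted parameters theta'_i.
        loss_fn(forward(x_q, adapted), y_q).backward()
    meta_opt.step()                             # theta <- theta - beta * meta-grad
```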

SLIDE 31

FOMAML

  • MAML involves a gradient through a gradient: $\theta \leftarrow \theta - \beta \nabla_\theta \sum_{\tau_i \sim p(\tau)} \ell_{\tau_i}\big(f_{\theta - \alpha \nabla_\theta \ell_{\tau_i}(f_\theta)}\big)$
  • First-order approximation, a.k.a. first-order MAML (FOMAML):
  • Omit the second-order derivatives
  • Still compute the meta-gradient at the post-update parameters $\theta'_i$: $\theta \leftarrow \theta - \beta \sum_{\tau_i \sim p(\tau)} \nabla_{\theta'_i} \ell_{\tau_i}(f_{\theta'_i})$
  • Almost the same performance, but ~33% faster
  • Notice: evaluating the meta-gradient at the pre-update parameters instead would reduce this objective to multi-task learning.


SLIDE 33

FOMAML

  • Outer loop:
  • Sample a batch of tasks $\tau_i \sim p(\tau)$
  • Inner loop:
  • Sample $K$ samples per task
  • Meta-objective: $\min_\theta \sum_{\tau_i \sim p(\tau)} \ell_{\tau_i}(f_{\theta'_i}) = \sum_{\tau_i \sim p(\tau)} \ell_{\tau_i}\big(f_{\theta - \alpha \nabla_\theta \ell_{\tau_i}(f_\theta)}\big)$
  • SGD (inner): $\theta'_i = \theta - \alpha \nabla_\theta \ell_{\tau_i}(f_\theta)$
  • SGD (outer): $\theta \leftarrow \theta - \beta \nabla_\theta \sum_{\tau_i \sim p(\tau)} \ell_{\tau_i}(f_{\theta'_i})$, treating $\theta'_i$ as constant with respect to $\theta$

SLIDE 34

Reptile

  • Same motivation:
  • pre-training: learn an initialization
  • fine-tuning: quickly adapt to new tasks
SLIDE 35

Reptile

  • For each iteration, do:
  • Sample task $\tau$
  • Get the corresponding loss $\ell_\tau$
  • Compute $\tilde{\theta} = U_\tau^k(\theta)$, with $k$ steps of SGD/Adam
  • Update $\theta \leftarrow \theta + \epsilon (\tilde{\theta} - \theta)$
  • Batched over $n$ tasks: $\theta \leftarrow \theta + \epsilon \frac{1}{n} \sum_{i=1}^{n} (\tilde{\theta}_i - \theta)$
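A minimal Reptile sketch on toy sine-regression tasks; $\epsilon$, $k$, the learning rates, and the task distribution are illustrative assumptions:

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(1, 40), torch.nn.ReLU(), torch.nn.Linear(40, 1))
epsilon, k, inner_lr = 0.1, 5, 0.01
loss_fn = torch.nn.MSELoss()

for it in range(1000):
    # Sample one task: a random sine wave.
    amp, phase = torch.rand(1) * 4.9 + 0.1, torch.rand(1) * 3.14
    x = torch.rand(10, 1) * 10 - 5
    y = amp * torch.sin(x + phase)

    # theta_tilde = U^k_tau(theta): k SGD steps, starting from a saved copy.
    theta = [p.clone() for p in model.parameters()]
    inner = torch.optim.SGD(model.parameters(), lr=inner_lr)
    for _ in range(k):
        inner.zero_grad()
        loss_fn(model(x), y).backward()
        inner.step()

    # Reptile update: theta <- theta + epsilon * (theta_tilde - theta),
    # i.e., move the initialization toward the adapted weights.
    with torch.no_grad():
        for p, p0 in zip(model.parameters(), theta):
            p.copy_(p0 + epsilon * (p - p0))
```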

SLIDE 36

Reptile

  • If $k = 1$, Reptile is similar to $\min_\theta \, \mathbb{E}_\tau[L_\tau]$:
    $g_{\mathrm{Reptile},\,k=1} = \theta - \tilde{\theta} = \theta - U_{\tau,A}(\theta) = \theta - (\theta - \nabla_\theta L_{\tau,A}(\theta)) = \nabla_\theta L_{\tau,A}(\theta)$
  • If $k > 1$, Reptile diverges from $\min_\theta \, \mathbb{E}_\tau[L_\tau]$:
    $\theta - U_{\tau,A}^k(\theta) \neq \theta - (\theta - \nabla_\theta L_{\tau,A}(\theta))$

SLIDE 37

ANIL

  • ANIL (Almost No Inner Loop)
  • The reason why MAML works: rapid learning or feature reuse?
SLIDE 39

ANIL

  • ANIL: Only update the head (last layer) in the inner loop
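A minimal ANIL sketch: only the head is adapted in the inner loop, while the body is updated only by the outer loop; the architecture and toy data are illustrative assumptions:

```python
import torch

body = torch.nn.Sequential(torch.nn.Linear(1, 40), torch.nn.ReLU())   # frozen in inner loop
head = torch.nn.Linear(40, 1)                                          # adapted per task
alpha, beta = 0.01, 0.001
meta_opt = torch.optim.SGD(list(body.parameters()) + list(head.parameters()), lr=beta)
loss_fn = torch.nn.MSELoss()

x_s, y_s = torch.randn(10, 1), torch.randn(10, 1)   # toy support set
x_q, y_q = torch.randn(10, 1), torch.randn(10, 1)   # toy query set

meta_opt.zero_grad()
head_params = [head.weight, head.bias]
# Inner loop: adapt the head only, keeping the graph for second-order terms.
inner_loss = loss_fn(body(x_s) @ head_params[0].t() + head_params[1], y_s)
grads = torch.autograd.grad(inner_loss, head_params, create_graph=True)
w, b = [p - alpha * g for p, g in zip(head_params, grads)]
# Outer loop: the query loss at the adapted head updates both body and head.
loss_fn(body(x_q) @ w.t() + b, y_q).backward()
meta_opt.step()
```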
SLIDE 40

Meta-Learning on Drug Discovery

  • Meta-Learning Initializations for Low-Resource Drug Discovery, ArXiv 2020
  • Applied MAML, FOMAML, and ANIL to drug discovery data
SLIDE 41

Thank You

  • Questions?