Meta-Learner with Linear Nulling


SLIDE 1

[Slide figure: support-set images pass through a CNN embedding network g, producing per-class average embeddings h̄_0, h̄_1, h̄_2, h̄_3; Χ = {φ_0, φ_1, φ_2, φ_3} are the reference vectors.]

φ′_l = (N_c − 1)·φ_l − Σ_{k≠l} φ_k   (modified reference vectors)

w_l = φ′_l − h̄_l   (error vectors)

N ← null(w_0, w_1, w_2, w_3)
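The nulling step can be sketched in a few lines of numpy. This is our illustration, not the authors' code: the modified-reference rule (N_c − 1)·φ_l − Σ_{k≠l} φ_k is reconstructed from the garbled slide math, and all names and shapes are assumptions.

```python
# Toy sketch of the nulling step: build error vectors between modified
# references and per-class average embeddings, then take their common
# null space via SVD.
import numpy as np

def nulling_projection(refs, class_means):
    """refs: (Nc, d) reference vectors; class_means: (Nc, d) average embeddings."""
    Nc, d = refs.shape
    # Modified references: phi'_l = (Nc - 1)*phi_l - sum_{k != l} phi_k,
    # which equals Nc*phi_l - sum_k phi_k.
    mod_refs = Nc * refs - refs.sum(axis=0)
    errors = mod_refs - class_means            # w_l = phi'_l - h_bar_l
    # N spans the directions orthogonal to every error vector.
    _, s, vt = np.linalg.svd(errors)
    rank = int((s > 1e-10).sum())
    return vt[rank:].T                          # (d, d - rank) null-space basis

rng = np.random.default_rng(0)
refs = rng.standard_normal((4, 16))
means = rng.standard_normal((4, 16))
N = nulling_projection(refs, means)
# The error vectors are (numerically) annihilated by the projection basis.
errors = (4 * refs - refs.sum(axis=0)) - means
assert np.allclose(errors @ N, 0, atol=1e-8)
```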

Meta-Learner with Linear Nulling

Sung Whan Yoon

Postdoctoral Researcher

Jun Seo

Ph.D. Student

Jaekyun Moon

Professor

An embedding network is combined with a linear transformer. The linear transformer carries out null-space projection on an alternative classification space. The projection space M is constructed to match the network output with a special set of reference vectors.

[Slide figure: at test time, images from N_c novel classes are embedded by the CNN g (embedding network), projected by N (linear transformer) into the alternative space M containing the reference vectors Χ, and classified by a softmax over distance measures e(∙,∙). ★: references.]

SLIDE 2

Oboe: Collaborative Filtering for AutoML Initialization

Chengrun Yang, Yuji Akimoto, Dae Won Kim, Madeleine Udell Cornell University

Goal: select models for a new dataset within a time budget.
Given: model performance and runtime on previous datasets.

Approach:
  • low-rank dataset-by-model collaborative filtering matrix
  • predict model runtime using polynomials
  • classical experiment design for cold start
  • missing-entry imputation for model-performance prediction

Performance:
  • cold start: high accuracy
  • model selection: fast and performs well
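The low-rank collaborative-filtering idea can be sketched as follows. This is a toy construction, not Oboe's code: the rank, the synthetic error matrix, and the choice of which models to evaluate (standing in for the experiment-design step) are all illustrative.

```python
# Toy sketch: factor a historical dataset-by-model error matrix, then estimate
# a new dataset's latent vector from a few observed model evaluations and
# impute the errors of all remaining models.
import numpy as np

rng = np.random.default_rng(1)
n_datasets, n_models, k = 30, 12, 3
U = rng.standard_normal((n_datasets, k))
V = rng.standard_normal((n_models, k))
E = U @ V.T                                   # "historical" error matrix (rank k)

# Low-rank factorization of the historical matrix via truncated SVD.
u, s, vt = np.linalg.svd(E, full_matrices=False)
Vk = vt[:k].T * s[:k]                         # model embeddings, (n_models, k)

# Cold start: evaluate only a few models on the new dataset...
new_true = rng.standard_normal(k) @ V.T
observed = [0, 4, 9]                          # indices chosen by experiment design
x, *_ = np.linalg.lstsq(Vk[observed], new_true[observed], rcond=None)
pred = Vk @ x                                 # ...and impute all model errors
assert np.allclose(pred, new_true, atol=1e-6)
```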
SLIDE 3

Backpropamine: meta-learning with neuromodulated Hebbian plasticity

  • Differentiable plasticity: meta-learning with Hebbian plastic connections
    ○ Meta-train both the baseline weight and the plasticity of each connection to support efficient learning in any episode
  • In nature, plasticity is under real-time control through neuromodulators
    ○ The brain can decide when and where to be plastic
  • Backpropamine = differentiable plasticity + neuromodulation
    ○ Make the rate of plasticity a real-time output of the network
    ○ During each episode, the network effectively learns by self-modification
  • Results:
    ○ Solves tasks that non-modulated networks cannot
    ○ Improves LSTM performance on the PTB language modeling task
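The neuromodulated plastic-weight idea above can be sketched in a few lines. This is an illustrative toy, not the paper's code: the shapes, the tanh nonlinearity, and the clipping range are assumptions.

```python
# Toy sketch of neuromodulated Hebbian plasticity: the effective weight is a
# fixed (meta-trained) part plus a plastic trace, and a network-computed
# scalar M gates whether the trace is updated at each step.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 4, 3
W = rng.standard_normal((n_in, n_out)) * 0.1       # baseline weights (meta-trained)
alpha = rng.standard_normal((n_in, n_out)) * 0.1   # plasticity coefficients
hebb0 = np.zeros((n_in, n_out))                    # plastic trace, reset each episode

def step(x, hebb, M):
    """One forward step; M is the neuromodulatory signal."""
    y = np.tanh(x @ (W + alpha * hebb))            # effective weight = W + alpha*Hebb
    hebb = np.clip(hebb + M * np.outer(x, y), -1.0, 1.0)  # gated Hebbian update
    return y, hebb

x = rng.standard_normal(n_in)
y, hebb = step(x, hebb0, M=0.5)                    # modulated: trace changes
_, hebb_frozen = step(x, hebb0, M=0.0)             # M = 0: no self-modification
assert np.allclose(hebb_frozen, 0)
```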

SLIDE 4

SLIDE 5

Toward Multimodal Model-Agnostic Meta-Learning

Risto Vuorio1, Shao-Hua Sun2, Hexiang Hu2 & Joseph J. Lim2

1University of Michigan, 2University of Southern California

[Slide figure: a task embedding network υ (model-based meta-learner) consumes K samples (x, y) and produces a task embedding; modulation networks turn the embedding into vectors τ_1, …, τ_n that modulate the parameters θ_1, …, θ_n of the gradient-based meta-learner, which predicts ŷ.]

The limitation of the MAML family:

  • One initialization can be suboptimal for multimodal task distributions.

Multi-Modal MAML:
1. The model-based meta-learner computes task embeddings.
2. Task embeddings are used to modulate the gradient-based meta-learner.
3. The gradient-based meta-learner adapts via gradient steps.
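The three steps above can be sketched end-to-end. This is our illustration of the control flow only: `task_embedding`, `modulate`, and `adapt` are hypothetical stand-ins (simple pooling, a FiLM-style scale-and-shift, and gradient steps on squared error), not the paper's networks.

```python
# Toy sketch of the MMAML flow: embed the task, modulate the shared
# initialization, then adapt with ordinary gradient steps.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.standard_normal(8) * 0.1               # shared initialization

def task_embedding(xs, ys):
    # Stand-in for the model-based meta-learner: pool the task's samples.
    return np.tanh(np.concatenate([xs.mean(axis=0), ys.mean(axis=0)]))

def modulate(theta, tau):
    # FiLM-style modulation (illustrative): tau yields a scale and a shift.
    return theta * (1.0 + 0.1 * tau.mean()) + 0.1 * tau.sum()

def adapt(theta0, xs, ys, lr=0.05, steps=5):
    # Gradient-based meta-learner: plain gradient steps on squared error.
    th = theta0.copy()
    for _ in range(steps):
        th -= lr * 2 * xs.T @ (xs @ th - ys[:, 0]) / len(xs)
    return th

xs, ys = rng.standard_normal((5, 8)), rng.standard_normal((5, 1))
tau = task_embedding(xs, ys)
theta_task = adapt(modulate(theta, tau), xs, ys)

def mse(th):
    return float(((xs @ th - ys[:, 0]) ** 2).mean())

# Adaptation reduces the task loss relative to the modulated initialization.
assert mse(theta_task) < mse(modulate(theta, tau))
```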

SLIDE 6

Fast Neural Architecture Construction using EnvelopeNets

1. Finds architectures for CNNs in ~0.25 days.
2. Based on the idea of the utility of individual nodes.
3. Closely aligns with a theory of human brain ontogenesis.

SLIDE 7

Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples

Eleni Triantafillou, Tyler Zhu, Vincent Dumoulin, Pascal Lamblin, Kelvin Xu, Ross Goroshin, Carles Gelada, Kevin Swersky, Pierre-Antoine Manzagol, Hugo Larochelle

  • New benchmark for few-shot classification
  • Two-fold approach:

1. Change the data
  ○ Large-scale
  ○ Diverse

2. Change the task creation
  ○ Introduce imbalance
  ○ Utilize the class hierarchy for ImageNet

  • Preliminary results on: baselines, Prototypical Networks, Matching Networks, and MAML.
  • Leveraging data of multiple sources remains an open and interesting research direction!
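The "change the task creation" step can be sketched as an episode sampler. This is a hypothetical illustration, not the benchmark's sampler: the function name, the way/shot ranges, and the toy data are all ours.

```python
# Toy sketch of episode creation with a variable "way" and imbalanced
# per-class "shots", as opposed to fixed N-way K-shot episodes.
import random

def sample_episode(class_to_images, max_way=5, max_shot=4, seed=0):
    rng = random.Random(seed)
    way = rng.randint(2, min(max_way, len(class_to_images)))
    classes = rng.sample(sorted(class_to_images), way)
    support = {}
    for c in classes:
        shot = rng.randint(1, max_shot)        # imbalance: shots differ per class
        support[c] = rng.sample(class_to_images[c], min(shot, len(class_to_images[c])))
    return support

data = {f"class_{i}": [f"img_{i}_{j}" for j in range(10)] for i in range(8)}
episode = sample_episode(data)
assert 2 <= len(episode) <= 5                  # variable way
assert all(1 <= len(v) <= 4 for v in episode.values())  # variable shots
```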
SLIDE 8

Macro Neural Architecture Search Revisited

Hanzhang Hu1, John Langford2, Rich Caruana2, Eric Horvitz2, Debadeepta Dey2

1Carnegie Mellon University, 2Microsoft Research

Cell search: applies a found cell template to a predefined skeleton. Macro search: learns all connections and layer types. Key takeaway: macro search can be competitive with cell search, even with simple random growing strategies, if the initial model is the same as in cell search. In cell search, the predefined skeleton ensures that even the simplest cell search achieves 4.6% error with 0.4M parameters on CIFAR-10.

SLIDE 9

MetaLearn @ NeurIPS 2018 CiML @ NeurIPS 2018

AutoDL challenge design and beta tests

Zhengying Liu∗, Olivier Bousquet, André Elisseeff, Sergio Escalera, Isabelle Guyon,
 Julio Jacques Jr., Albert Clapés, Adrien Pavao, Michèle Sebag, Danny Silver,
 Lisheng Sun-Hosoya, Sébastien Tréguer, Wei-Wei Tu, Yiqi Hu, Jingsong Wang, Quanming Yao

Help Automating Deep Learning

Join the AutoDL challenge!

https://autodl.chalearn.org

SLIDE 10

Modular meta-learning in abstract graph networks for combinatorial generalization

Ferran Alet, Maria Bauza, A. Rodriguez, T. Lozano-Perez, L. Kaelbling
Code & pdf: alet-etal.com

Combinatorial generalization: generalizing by reusing neural modules.

[Slide figure: Graph Neural Networks tie nodes to concrete entities (objects, particles, joints). Combining them with modular meta-learning, we introduce Abstract Graph Networks, whose nodes are not tied to concrete entities, and Graph Element Networks.]

OmniPush dataset

SLIDE 11

[Slide figure: both the support set and the query set pass through four blocks of Conv → BN → FiLM → ReLU → Max Pool, with a subnetwork G coupling the two streams.]

Cross-Modulation Networks For Few-Shot Learning

Hugo Prol†, Vincent Dumoulin‡, and Luis Herranz†

† Computer Vision Center, Univ. Autònoma de Barcelona ‡ Google Brain

Key idea: allow support and query examples to interact at each level of abstraction.

☆ Channel-wise affine transformations:

Extending the feature extraction pipeline of Matching Networks:

☆ Subnetwork G predicts the affine parameters.
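The channel-wise affine transformation referred to above can be sketched directly. The shapes are illustrative; the γ·x + β form is the standard FiLM operation that the slide's FiLM blocks suggest, with γ and β assumed to come from the conditioning subnetwork G.

```python
# Minimal sketch of a FiLM (channel-wise affine) layer on a feature map.
import numpy as np

def film(features, gamma, beta):
    """features: (C, H, W); gamma, beta: (C,) predicted per channel."""
    return gamma[:, None, None] * features + beta[:, None, None]

feats = np.ones((3, 2, 2))
out = film(feats, gamma=np.array([2.0, 0.5, 1.0]), beta=np.array([0.0, 1.0, -1.0]))
# Each channel is scaled and shifted independently.
assert out[0, 0, 0] == 2.0 and out[1, 0, 0] == 1.5 and out[2, 0, 0] == 0.0
```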

SLIDE 12

Large Margin Meta-Learning for Few-Shot Classification

Large Margin Principle

  • Fig. 1: Large margin meta-learning. (a) Classifier trained without the large margin constraint. (b) Classifier trained with the large margin constraint. (c) Gradient of the triplet loss.

L = L_softmax + λ · L_large-margin

One Implementation: Triplet Loss

L_large-margin = (1/N_t) Σ_{i=1}^{N_t} [ ‖f_φ(x_i^a) − f_φ(x_i^p)‖²₂ − ‖f_φ(x_i^a) − f_φ(x_i^n)‖²₂ + m ]₊

Analysis

After rearrangement:

L_large-margin = (1/N_t) ( Σ_{x_s∈S_s} ‖f_φ(x_i) − f_φ(x_s)‖²₂ − Σ_{x_d∈S_d} ‖f_φ(x_i) − f_φ(x_d)‖²₂ ) + const.

The gradient:

∂L_large-margin/∂f_φ(x_i) = (2/N_t) ( Σ_{x_s∈S_s} (f_φ(x_i) − f_φ(x_s)) − Σ_{x_d∈S_d} (f_φ(x_i) − f_φ(x_d)) )
= −(2|S_s|/N_t) (c_s − f_φ(x_i)) − (2|S_d|/N_t) (f_φ(x_i) − c_d),

where c_s = (1/|S_s|) Σ_{x_s∈S_s} f_φ(x_s) and c_d = (1/|S_d|) Σ_{x_d∈S_d} f_φ(x_d). The first term pulls f_φ(x_i) towards its own class; the second pushes it away from other classes.
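The triplet loss above can be rendered in a few lines of numpy. This is our sketch, not the authors' code; the margin value and the toy triplets are illustrative.

```python
# Toy sketch of the triplet loss: anchors x^a, positives x^p, negatives x^n,
# hinged at margin m and averaged over N_t triplets.
import numpy as np

def triplet_loss(fa, fp, fn, m=1.0):
    """fa, fp, fn: (N, d) embeddings of anchor/positive/negative triplets."""
    d_pos = ((fa - fp) ** 2).sum(axis=1)       # ||f(x^a) - f(x^p)||^2
    d_neg = ((fa - fn) ** 2).sum(axis=1)       # ||f(x^a) - f(x^n)||^2
    return float(np.maximum(d_pos - d_neg + m, 0.0).mean())

fa = np.zeros((2, 3))
fp = np.zeros((2, 3))                          # positives coincide with anchors
fn = np.array([[2.0, 0.0, 0.0],                # one negative is far (hinge inactive),
               [0.0, 0.0, 0.0]])               # one coincides (hinge = margin)
assert triplet_loss(fa, fp, fn, m=1.0) == 0.5  # (max(0-4+1,0) + max(0-0+1,0)) / 2
```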

Case study

  • We implement and compare several other large-margin methods for few-shot learning.

  • Our framework is simple, efficient, and can be applied to improve existing and new meta-learning methods with very little overhead.

Features

  • Graph Neural Network (GNN)
  • Prototypical Network (PN)

Yong Wang1, Xiao-Ming Wu2, Qimai Li2, Jiatao Gu1, Wangmeng Xiang2, Lei Zhang2, Victor O.K. Li1
1The University of Hong Kong, 2The Hong Kong Polytechnic University

SLIDE 13

Amortized Bayesian Meta-Learning

Sachin Ravi & Alex Beatson Department of Computer Science, Princeton University

  • Lots of progress in few-shot learning, but under controlled settings
  • In real world, relationship between training and testing tasks can be tenuous
  • Task-specific predictive uncertainty is crucial
  • We present gradient-based meta-learning method for computing task-specific approximate posterior
  • Show that method displays good predictive uncertainty on contextual-bandit and few-shot learning tasks
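The idea of a gradient-based, task-specific approximate posterior can be sketched on a toy task. This is a hedged illustration of the general recipe, not the authors' method: we take a few gradient steps on the variational parameters (mean, log-variance) of a Gaussian posterior over a single weight, starting from a shared initialization; the model, learning rate, and Monte-Carlo estimator are all assumptions.

```python
# Toy sketch: adapt a variational posterior q(w) = N(mu, exp(log_var)) to a
# task y = w*x by gradient steps on a Monte-Carlo negative-ELBO, prior N(0,1).
import numpy as np

rng = np.random.default_rng(0)

def elbo_grad(mu, log_var, xs, ys, n_mc=8):
    g_mu, g_lv = 0.0, 0.0
    for _ in range(n_mc):
        eps = rng.standard_normal()
        w = mu + np.exp(0.5 * log_var) * eps   # reparameterized sample
        g_w = 2 * ((xs * w - ys) * xs).sum()   # d(sum of squared residuals)/dw
        g_mu += g_w + mu                       # + KL term d/d mu
        g_lv += g_w * 0.5 * np.exp(0.5 * log_var) * eps \
                + 0.5 * (np.exp(log_var) - 1)  # + KL term d/d log_var
    return g_mu / n_mc, g_lv / n_mc

mu, log_var = 0.0, 0.0                         # shared initialization: q(w) = N(0, 1)
xs = rng.standard_normal(50)
ys = 2.0 * xs                                  # task data: true w = 2
for _ in range(100):                           # few-step, task-specific adaptation
    g_mu, g_lv = elbo_grad(mu, log_var, xs, ys)
    mu, log_var = mu - 0.005 * g_mu, log_var - 0.005 * g_lv
# Posterior mean moves toward the task solution and the posterior sharpens.
assert abs(mu - 2.0) < 0.75 and np.exp(log_var) < 1.0
```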
SLIDE 14

The effects of negative adaptation in Model-Agnostic Meta-Learning

Tristan Deleu, Yoshua Bengio

  • The advantage of meta-learning is well-founded under the assumption that the adaptation phase does improve the performance of the model on the task of interest.

  • Optimization: maximize the performance after adaptation; performance improvement is not explicitly enforced.

  • We show empirically that performance can decrease after adaptation in MAML. We call this negative adaptation.

  • How to fix this issue? Ideas from Safe Reinforcement Learning.

min_θ 𝔼_{T∼p(T)} [ L(θ′_T ; D′_T) ]
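Negative adaptation is easy to exhibit on a toy task. This sketch is our construction, not the paper's experiment: it compares the task loss before and after a single MAML-style inner gradient step, and shows that a badly sized step makes the post-update loss worse.

```python
# Toy sketch: adaptation is "negative" when the post-update loss exceeds the
# pre-update loss. A quadratic task makes the overshoot explicit.
import numpy as np

def inner_step(theta, grad_fn, lr):
    return theta - lr * grad_fn(theta)         # one MAML-style inner update

def loss(theta, target):
    return float(((theta - target) ** 2).sum())

theta0 = np.array([0.0, 0.0])
target = np.array([1.0, 1.0])
grad_fn = lambda th: 2 * (th - target)         # gradient of the quadratic loss

adapted_ok = inner_step(theta0, grad_fn, lr=0.1)   # sensible step: loss drops
adapted_bad = inner_step(theta0, grad_fn, lr=1.5)  # overshoot: negative adaptation
assert loss(adapted_ok, target) < loss(theta0, target)
assert loss(adapted_bad, target) > loss(theta0, target)
```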

SLIDE 15

Audrey G. Chung, Paul Fieguth, Alexander Wong

SLIDE 16

Evolvability ES: Scalable Evolutionary Meta-Learning

  • Evolvability ES is a meta-learning algorithm inspired by Evolution Strategies [1].

  • Surprisingly, Evolvability ES finds parameters such that, at test time, random perturbations result in diverse behaviors.

  • In a simulated Ant locomotion domain, adding Gaussian noise to the parameters results in policies which move in many different directions.

By Alexander Gajewski, Jeff Clune, Kenneth O. Stanley, and Joel Lehman

[1] Salimans et al., Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017.
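The quantity being optimized can be sketched on a toy "behavior" function. This is our construction, not the paper's code: the behavior characterization, noise scale, and diversity measure (total variance of offspring behaviors) are illustrative stand-ins.

```python
# Toy sketch of the Evolvability ES objective: behavioral diversity of a
# policy's offspring, i.e. the spread of behaviors when the parameters are
# perturbed with Gaussian noise.
import numpy as np

def behavior(params):
    # Stand-in behavior characterization: a 2-D displacement (angle, gain).
    return np.array([np.cos(params[0]), np.sin(params[0])]) * params[1]

def offspring_diversity(params, sigma=0.5, n=200, seed=0):
    rng = np.random.default_rng(seed)
    behaviors = np.array([behavior(params + sigma * rng.standard_normal(2))
                          for _ in range(n)])
    return float(behaviors.var(axis=0).sum())  # total variance of behaviors

flat = np.array([0.0, 0.0])      # zero gain: offspring barely move
spread = np.array([0.0, 3.0])    # large gain: perturbing the angle fans offspring out
# Parameters whose perturbations yield diverse behaviors score higher.
assert offspring_diversity(spread) > offspring_diversity(flat)
```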

SLIDE 17

Consolidating the Meta-Learning Zoo

A Unifying Perspective as Posterior Predictive Inference

► Novel: probabilistic, amortized, multi-task meta-learning framework.
► Meta-learning: learns how to learn a classifier or regressor for each new task.
► Unifies: MAML, Meta-LSTM, Prototypical Networks, and Conditional Neural Processes are special cases.
► State of the art: leading classification accuracy on 5 of 6 Omniglot & miniImageNet tasks.
► Efficient: test time requires only forward passes; no gradient steps are needed.
► Versatile: robust classification accuracy as shot and way are varied at test time.

Jonathan Gordon1, John Bronskill1, Matthias Bauer1,2, Sebastian Nowozin3, Richard E. Turner1,3

1University of Cambridge, 2MPI for Intelligent Systems, Tübingen, 3Microsoft Research

► High quality 1-shot view reconstruction:

SLIDE 18

Deep Online Learning via Meta-Learning: Continual Adaptation for Model-Based RL

Anusha Nagabandi, Chelsea Finn, Sergey Levine

Can we use meta-learning for effective online learning?

Our method can:

  • Reason about non-stationary latent distributions over tasks.

  • Recall past tasks