

SLIDE 1

THOUGHTS ON PROGRESS MADE AND 
 CHALLENGES AHEAD IN FEW-SHOT LEARNING

Hugo Larochelle, Google Brain

SLIDE 2
SLIDE 3

Human-level concept learning through probabilistic program induction
Brenden M. Lake, Ruslan Salakhutdinov, Joshua B. Tenenbaum

People are good at it. Machines are getting better at it.

SLIDE 4
SLIDE 5
SLIDE 6

RELATED WORK: ONE-SHOT LEARNING

  • One-shot learning has been studied before:
    • One-shot learning of object categories (2006), Fei-Fei Li, Rob Fergus and Pietro Perona
    • Knowledge transfer in learning to recognize visual objects classes (2004), Fei-Fei Li
    • Object classification from a single example utilizing class relevance pseudo-metrics (2004), Michael Fink
    • Cross-generalization: learning novel classes from a single example by feature replacement (2005), Evgeniy Bart and Shimon Ullman
  • These largely relied on hand-engineered features and algorithms.
  • With recent progress in end-to-end deep learning, we hope to jointly learn a representation and an algorithm better suited for few-shot learning.

SLIDE 7

META-LEARNING


SLIDE 8

META-LEARNING

[Diagram: an episode consists of a training set Dtrain and a test set Dtest.]

SLIDE 9

META-LEARNING


SLIDE 10

META-LEARNING

[Diagram: episode = (Dtrain, Dtest); the meta-learner (A) takes Dtrain as input.]

SLIDE 11

META-LEARNING

[Diagram: episode = (Dtrain, Dtest); the meta-learner (A) produces a learner (M) from Dtrain.]

SLIDE 12

META-LEARNING

[Diagram: episode = (Dtrain, Dtest); the meta-learner (A) produces a learner (M) from Dtrain, and M is evaluated on Dtest to produce a loss.]
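Read as pseudocode, the diagram corresponds to a training loop like the minimal sketch below. All of the callables (sample_episode, adapt, evaluate_loss, update) are hypothetical stand-ins for illustration, not an API from the talk or any particular library.

```python
def meta_train(sample_episode, adapt, evaluate_loss, update, num_episodes):
    """Generic episodic meta-training loop; every callable is a hypothetical stand-in.

    sample_episode() -> (d_train, d_test)   # one few-shot problem
    adapt(d_train)   -> learner             # the meta-learner A produces a learner M
    evaluate_loss(learner, d_test) -> loss  # loss of M on Dtest
    update(loss)                            # improve A's parameters using that loss
    """
    for _ in range(num_episodes):
        d_train, d_test = sample_episode()
        learner = adapt(d_train)
        loss = evaluate_loss(learner, d_test)
        update(loss)
```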

SLIDE 13

META-LEARNING


SLIDE 14

META-LEARNING


SLIDE 15

If you don’t evaluate on never-seen problems/datasets…

SLIDE 16

If you don’t evaluate on never-seen problems/datasets… … it’s not meta-learning!

SLIDE 17

LEARNING PROBLEM STATEMENT

  • Assuming a probabilistic model M over labels, p(y | x, Dtrain), the cost per episode can be written as

$$C(D_{\mathrm{train}}, D_{\mathrm{test}}) = \frac{1}{|D_{\mathrm{test}}|} \sum_{(x_t, y_t) \in D_{\mathrm{test}}} -\log p(y_t \mid x_t, D_{\mathrm{train}})$$

  • Here the parameters Θ jointly represent the meta-learner A (which processes Dtrain) and the learner M (which processes x).
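To make the objective concrete, the following is a minimal NumPy sketch of this per-episode cost. The predict_proba callable is a hypothetical stand-in for the model p(y | x, Dtrain); it is not code from the talk.

```python
import numpy as np

def episode_cost(predict_proba, d_train, x_test, y_test):
    """Average negative log-likelihood C(Dtrain, Dtest) over an episode's test set.

    predict_proba(x, d_train) is assumed to return a vector of class
    probabilities p(y | x, Dtrain) for a single test input x.
    """
    losses = []
    for x_t, y_t in zip(x_test, y_test):
        p = predict_proba(x_t, d_train)
        losses.append(-np.log(p[y_t] + 1e-12))  # -log p(y_t | x_t, Dtrain)
    return float(np.mean(losses))
```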

SLIDE 18

CHOOSING A META-LEARNER

  • How do we parametrize learning algorithms (meta-learners) p(y | x, Dtrain)?
  • Two approaches to defining a meta-learner:
    • Take inspiration from a known learning algorithm
      • kNN / kernel machine: Matching Networks (Vinyals et al., 2016)
      • Gaussian classifier: Prototypical Networks (Snell et al., 2017)
      • Gradient descent: Meta-Learner LSTM (Ravi & Larochelle, 2017), MAML (Finn et al., 2017)
    • Derive it from a black-box neural network
      • SNAIL (Mishra et al., 2018)

SLIDE 19

CHOOSING A META-LEARNER


SLIDE 20

MATCHING NETWORKS

  • Training a “pattern matcher” (kNN/kernel machine)

$$\hat{y} = \sum_{i=1}^{k} a(\hat{x}, x_i)\, y_i$$

where the attention a, as in attention models and kernel functions, is a softmax over a similarity c between embeddings:

$$a(\hat{x}, x_i) = \frac{e^{c(f(\hat{x}),\, g(x_i))}}{\sum_{j=1}^{k} e^{c(f(\hat{x}),\, g(x_j))}}$$

with f and g appropriate neural networks (potentially with f = g).

  • Matching networks for one shot learning (2016)


Oriol Vinyals, Charles Blundell, Timothy P. Lillicrap, Koray Kavukcuoglu, and Daan Wierstra
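A minimal NumPy sketch of this attention-weighted prediction, assuming cosine similarity for c and a shared embedding f = g; the embed callable is a stand-in for a learned network, not the authors' implementation.

```python
import numpy as np

def matching_net_predict(embed, x_query, x_support, y_support, num_classes):
    """Matching-networks-style prediction: softmax attention over cosine similarities.

    embed maps a raw input to an embedding vector (stand-in for f = g).
    x_support, y_support are the k support examples and their integer labels.
    Returns a probability vector of length num_classes for the query.
    """
    q = embed(x_query)
    sims = []
    for x_i in x_support:
        s = embed(x_i)
        sims.append(q @ s / (np.linalg.norm(q) * np.linalg.norm(s) + 1e-12))
    sims = np.asarray(sims)
    a = np.exp(sims - sims.max())
    a /= a.sum()                                # attention weights a(x_hat, x_i)
    one_hot = np.eye(num_classes)[np.asarray(y_support)]
    return a @ one_hot                          # y_hat = sum_i a(x_hat, x_i) y_i
```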

SLIDE 21

PROTOTYPICAL NETWORKS

  • Training a “prototype extractor” (Gaussian classifier)

[Diagram: a query point x and class prototypes c1, c2, c3 in the learned embedding space.]

With $S_k = \{(x_i, y_i) \mid y_i = k,\ (x_i, y_i) \in D_{\mathrm{train}}\}$, each class prototype is the mean embedding of its support examples:

$$c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_\phi(x_i)$$

Class probabilities are a softmax over negative distances to the prototypes:

$$p_\phi(y = k \mid x) = \frac{\exp(-d(f_\phi(x), c_k))}{\sum_{k'} \exp(-d(f_\phi(x), c_{k'}))}$$

Training proceeds by minimizing the negative log-probability $J(\phi) = -\log p_\phi(y = k \mid x)$ of the true class k.

  • Prototypical Networks for Few-shot Learning (2017)


Jake Snell, Kevin Swersky and Richard Zemel

Here φ plays the role of the parameters Θ from the problem statement (φ ≡ Θ).
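A NumPy sketch of the prototype computation and the distance-based softmax, using squared Euclidean distance for d; the embed callable stands in for f_phi and is an assumption for illustration.

```python
import numpy as np

def prototypical_predict(embed, x_query, x_support, y_support, num_classes):
    """Prototypical-networks-style prediction.

    Each prototype c_k is the mean embedding of the support examples of class k;
    class probabilities are a softmax over negative squared distances to the
    prototypes. Assumes every class 0..num_classes-1 appears in the support set.
    """
    z_support = np.stack([embed(x) for x in x_support])   # f_phi(x_i)
    y_support = np.asarray(y_support)
    prototypes = np.stack([z_support[y_support == k].mean(axis=0)
                           for k in range(num_classes)])  # c_k
    z = embed(x_query)
    sq_dists = ((prototypes - z) ** 2).sum(axis=1)         # d(f_phi(x), c_k)
    logits = -sq_dists
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()                                      # p_phi(y = k | x)
```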

SLIDE 22

META-LEARNER LSTM

  • Training an “initialization and gradient descent procedure” applied to some learner M

[Diagram: the learner M is adapted on Dtrain and the result is evaluated on Dtest through the episode cost C(Dtrain, Dtest).]

SLIDE 23

META-LEARNER LSTM


  • Optimization as a Model for Few-Shot Learning (2017)


Sachin Ravi and Hugo Larochelle

SLIDE 24

META-LEARNER LSTM


  • Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (2017)


Chelsea Finn, Pieter Abbeel and Sergey Levine
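To make the "initialization and gradient descent" idea concrete, here is a minimal NumPy sketch of a MAML-style meta-update for a toy linear-regression learner with a single inner gradient step. This is an illustrative reduction, not the implementation from either paper; because the learner is linear, the derivative of the adapted parameters with respect to the initialization has a simple closed form.

```python
import numpy as np

def maml_meta_step(theta, episodes, inner_lr=0.01, outer_lr=0.001):
    """One MAML-style meta-update for a linear learner y ~= X @ theta (toy example).

    episodes: list of (X_tr, y_tr, X_te, y_te) tuples, one per episode.
    Inner loop: one gradient step on the episode's Dtrain.
    Outer loop: gradient of the Dtest loss with respect to the initialization theta.
    """
    meta_grad = np.zeros_like(theta)
    for X_tr, y_tr, X_te, y_te in episodes:
        n_tr, n_te = len(y_tr), len(y_te)
        # Inner step: theta' = theta - alpha * grad of the train MSE.
        g_tr = 2.0 / n_tr * X_tr.T @ (X_tr @ theta - y_tr)
        theta_prime = theta - inner_lr * g_tr
        # Gradient of the test MSE at the adapted parameters theta'.
        g_te = 2.0 / n_te * X_te.T @ (X_te @ theta_prime - y_te)
        # Backprop through the inner step: d(theta')/d(theta) = I - alpha * (2/n) X_tr^T X_tr.
        jac = np.eye(theta.shape[0]) - inner_lr * 2.0 / n_tr * X_tr.T @ X_tr
        meta_grad += jac.T @ g_te
    return theta - outer_lr * meta_grad / len(episodes)
```

A first-order variant (FOMAML) would simply skip the Jacobian and use g_te directly as the meta-gradient.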

SLIDE 25

CHOOSING A META-LEARNER


SLIDE 26

SIMPLE NEURAL ATTENTIVE LEARNER

  • Using a convolutional/attentional network to represent p(y | x, Dtrain)
  • alternates between dilated convolutional layers and attention layers
  • when inputs are images, a convolutional embedding network is used to map them to a vector space

[Diagram (supervised learning setting): the network receives the sequence of labelled examples (x_{t-3}, y_{t-3}), (x_{t-2}, y_{t-2}), (x_{t-1}, y_{t-1}) followed by the query x_t, and outputs the predicted label for time step t.]

  • A Simple Neural Attentive Meta-Learner (2018)


Nikhil Mishra, Mostafa Rohaninejad, Xi Chen and Pieter Abbeel

(a) Dense block (dilation rate R, D filters): a causal convolution (kernel size 2, dilation rate R, D filters) is applied and its output is concatenated with the block's inputs; inputs have shape [T, C], outputs have shape [T, C + D].

(b) Attention block (key size K, value size V): affine layers produce queries of size K, keys of size K, and values of size V; attention is computed as matmul, masked softmax, matmul, and the result is concatenated with the block's inputs; inputs have shape [T, C], outputs have shape [T, C + V].
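The attention block can be sketched in NumPy as follows. The weight matrices W_q, W_k, W_v stand in for the affine layers and are assumptions for illustration; the causal mask prevents each time step from attending to the future.

```python
import numpy as np

def snail_attention_block(inputs, W_q, W_k, W_v):
    """Sketch of a causally masked attention block that concatenates its output.

    inputs: array of shape [T, C]; W_q, W_k: [C, K]; W_v: [C, V].
    Returns an array of shape [T, C + V], mirroring the block description above.
    """
    T = inputs.shape[0]
    queries = inputs @ W_q                        # affine, output size K
    keys = inputs @ W_k                           # affine, output size K
    values = inputs @ W_v                         # affine, output size V
    scores = queries @ keys.T / np.sqrt(W_k.shape[1])   # scaled dot product (a common choice)
    mask = np.tril(np.ones((T, T), dtype=bool))   # causal: no attending to future steps
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    attended = weights @ values                   # matmul, masked softmax, matmul
    return np.concatenate([inputs, attended], axis=1)   # shape [T, C + V]
```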

SLIDE 27

AND SO MUCH MORE!!!


bit.ly/2PikS82

SLIDE 28

EXPERIMENT

  • Mini-ImageNet (split used in Ravi & Larochelle, 2017)
  • random subset of 100 classes (64 training, 16 validation, 20 testing)
  • random sets Dtrain are generated by randomly picking 5 classes from the class subset (a sampling sketch follows the table below)


Model                        1-shot (5-class)   5-shot (5-class)
Baseline-finetune            28.86 ± 0.54%      49.79 ± 0.79%
Baseline-nearest-neighbor    41.08 ± 0.70%      51.04 ± 0.65%
Matching Network             43.40 ± 0.78%      51.09 ± 0.71%
Matching Network FCE         43.56 ± 0.84%      55.31 ± 0.73%
Meta-Learner LSTM (OURS)     43.44 ± 0.77%      60.60 ± 0.71%
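A minimal sketch of how such N-way, K-shot episodes might be sampled (NumPy). The class-indexed dictionary layout is an assumption for illustration, not the actual Mini-ImageNet pipeline.

```python
import numpy as np

def sample_episode(data_by_class, n_way=5, k_shot=1, n_query=15, rng=None):
    """Sample one N-way, K-shot episode from a dict {class_id: array of examples}.

    Returns (d_train, d_test), each a list of (example, episode_label) pairs,
    with episode labels re-indexed to 0..n_way-1.
    """
    rng = rng or np.random.default_rng()
    classes = rng.choice(list(data_by_class.keys()), size=n_way, replace=False)
    d_train, d_test = [], []
    for label, cls in enumerate(classes):
        examples = data_by_class[cls]
        idx = rng.choice(len(examples), size=k_shot + n_query, replace=False)
        d_train += [(examples[i], label) for i in idx[:k_shot]]
        d_test += [(examples[i], label) for i in idx[k_shot:]]
    return d_train, d_test
```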

SLIDE 29

EXPERIMENT

  • Mini-ImageNet (split used in Ravi & Larochelle, 2017)
  • random subset of 100 classes (64 training, 16 validation, 20 testing)
  • random sets Dtrain are generated by randomly picking 5 classes from the class subset


Model                              1-shot (5-class)   5-shot (5-class)
Baseline-finetune                  28.86 ± 0.54%      49.79 ± 0.79%
Baseline-nearest-neighbor          41.08 ± 0.70%      51.04 ± 0.65%
Matching Network                   43.40 ± 0.78%      51.09 ± 0.71%
Matching Network FCE               43.56 ± 0.84%      55.31 ± 0.73%
Meta-Learner LSTM (OURS)           43.44 ± 0.77%      60.60 ± 0.71%
MAML (Finn et al.)                 48.70 ± 1.84%      63.10 ± 0.92%
Prototypical Nets (Snell et al.)   49.42 ± 0.78%      68.20 ± 0.66%
SNAIL (Mishra et al.)              55.71 ± 0.99%      68.88 ± 0.98%

SLIDE 30

REMAINING CHALLENGES

  • Going beyond supervised classification
    • unsupervised learning, structured output, interactive learning
  • Going beyond Mini-ImageNet
    • coming up with a realistic definition of distributions over problems/datasets

Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples

Eleni Triantafillou, Tyler Zhu, Vincent Dumoulin, Pascal Lamblin, Kelvin Xu, Ross Goroshin, Carles Gelada, Kevin Swersky, Pierre-Antoine Manzagol, Hugo Larochelle (Google)

SLIDE 31

META-DATASET

  • Learning across many tasks requires learning over many datasets

Datasets: (a) ImageNet, (b) Omniglot, (c) Aircraft, (d) Birds, (e) DTD, (f) Quick Draw, (g) Fungi, (h) VGG Flower, (i) Traffic Signs, (j) MSCOCO

SLIDE 32

META-DATASET

  • Learning across many tasks requires learning over many datasets

Datasets: (a) ImageNet, (b) Omniglot, (c) Aircraft, (d) Birds, (e) DTD, (f) Quick Draw, (g) Fungi, (h) VGG Flower, (i) Traffic Signs, (j) MSCOCO. Traffic Signs and MSCOCO are held out for testing.

SLIDE 33

META-DATASET

  • Meta-training only on ImageNet

Table 1: Results on META-DATASET using models trained on ILSVRC-2012 only (test accuracy ± confidence, %).

Test Source     k-NN          Finetune      MatchingNet   ProtoNet      MAML
ILSVRC          34.70 ± 0.95  38.34 ± 1.12  40.89 ± 1.08  43.37 ± 1.17  38.10 ± 1.13
Omniglot        59.84 ± 0.96  59.19 ± 1.18  61.85 ± 1.00  66.18 ± 1.12  54.00 ± 1.47
Aircraft        36.47 ± 0.93  41.18 ± 1.07  41.91 ± 0.96  42.14 ± 0.97  42.52 ± 1.16
Birds           40.38 ± 1.09  45.82 ± 1.25  54.26 ± 1.16  57.85 ± 1.23  50.78 ± 1.32
Textures        56.45 ± 0.78  58.06 ± 0.88  61.70 ± 0.84  60.95 ± 0.80  61.26 ± 0.93
Quick Draw      36.09 ± 1.19  38.43 ± 1.39  38.52 ± 1.12  44.02 ± 1.35  30.71 ± 1.51
Fungi           23.70 ± 0.97  22.20 ± 0.92  27.21 ± 0.97  31.18 ± 1.15  20.35 ± 0.87
VGG Flower      66.16 ± 0.99  69.32 ± 1.13  75.05 ± 0.91  79.89 ± 0.90  65.12 ± 1.15
Traffic Signs   44.81 ± 1.47  39.36 ± 1.28  45.36 ± 1.31  44.04 ± 1.24  31.10 ± 1.20
MSCOCO          29.69 ± 1.00  30.25 ± 1.17  32.32 ± 1.08  36.44 ± 1.23  25.17 ± 1.15
Avg. rank       4             3.4           2.2           1.35          4.05

SLIDE 34

META-DATASET

  • Meta-training on all training datasets

Table 2: Results on META-DATASET using models trained on all training datasets (test accuracy ± confidence, %).

Test Source     k-NN          Finetune      MatchingNet   ProtoNet      MAML
ILSVRC          25.88 ± 0.83  25.84 ± 0.83  35.88 ± 0.98  38.51 ± 1.01  30.56 ± 1.00
Omniglot        92.45 ± 0.41  85.20 ± 0.73  90.21 ± 0.46  91.32 ± 0.50  78.05 ± 0.98
Aircraft        54.60 ± 0.97  58.22 ± 1.02  70.71 ± 0.78  71.54 ± 0.84  68.62 ± 0.90
Birds           36.74 ± 1.01  38.56 ± 1.08  59.28 ± 1.06  61.81 ± 1.13  54.59 ± 1.24
Textures        50.06 ± 0.77  48.37 ± 0.82  60.61 ± 0.82  59.31 ± 0.75  59.25 ± 0.80
Quick Draw      59.54 ± 1.08  54.05 ± 1.30  57.44 ± 1.17  60.99 ± 1.21  44.48 ± 1.41
Fungi           24.60 ± 0.95  22.90 ± 0.95  31.10 ± 1.04  35.96 ± 1.25  21.12 ± 0.88
VGG Flower      62.49 ± 0.91  59.72 ± 1.17  76.72 ± 0.83  81.06 ± 0.87  66.05 ± 1.09
Traffic Signs   41.68 ± 1.46  30.02 ± 1.13  43.20 ± 1.33  39.95 ± 1.18  30.23 ± 1.24
MSCOCO          23.55 ± 0.99  23.01 ± 0.96  26.87 ± 1.00  30.81 ± 1.13  21.13 ± 1.06
Avg. rank       3.4           4.3           2.15          1.4           3.75

SLIDE 35

META-DATASET

  • Difference in performance when meta-training on all datasets

Difference in accuracy: meta-training on all datasets minus meta-training on ILSVRC-2012 only (accuracy ± confidence, %; signs recovered by comparing Tables 1 and 2).

Test Source     k-NN           Finetune       MatchingNet    ProtoNet       MAML
ILSVRC          -8.82 ± 1.26   -12.50 ± 1.39  -5.01 ± 1.46   -4.86 ± 1.55   -7.54 ± 1.51
Omniglot        32.61 ± 1.04   26.01 ± 1.39   28.36 ± 1.10   25.14 ± 1.23   24.05 ± 1.77
Aircraft        18.13 ± 1.34   17.04 ± 1.48   28.80 ± 1.24   29.40 ± 1.28   26.10 ± 1.47
Birds           -3.64 ± 1.49   -7.26 ± 1.65   5.02 ± 1.57    3.96 ± 1.67    3.81 ± 1.81
Textures        -6.39 ± 1.10   -9.69 ± 1.20   -1.09 ± 1.17   -1.64 ± 1.10   -2.01 ± 1.23
Quick Draw      23.45 ± 1.61   15.62 ± 1.90   18.92 ± 1.62   16.97 ± 1.81   13.77 ± 2.07
Fungi           0.90 ± 1.36    0.70 ± 1.32    3.89 ± 1.42    4.78 ± 1.70    0.77 ± 1.24
VGG Flower      -3.67 ± 1.34   -9.60 ± 1.63   1.67 ± 1.23    1.17 ± 1.25    0.93 ± 1.58
Traffic Signs   -3.13 ± 2.07   -9.34 ± 1.71   -2.16 ± 1.87   -4.09 ± 1.71   -0.87 ± 1.73
MSCOCO          -6.14 ± 1.41   -7.24 ± 1.51   -5.45 ± 1.47   -5.63 ± 1.67   -4.04 ± 1.56
SLIDE 36

META-DATASET

SLIDE 37

META-DATASET

  • Varying the number of shots and ways


SLIDE 38

TAKEAWAYS (SO FAR)

  • The meta-training distribution of episodes can make a big difference (at least for current methods)

  • Using “regular training” as initialization makes a big difference
  • MAML needs to be adjusted to be more robust


SLIDE 39

DISCUSSION

  • Now is the time to move beyond our current simple benchmarks
  • What is the “right” meta-training distribution?
  • How should we be increasing the size of the benchmark (what should be V2)?
  • What are the properties of the optimization landscape of the episodic framework?
  • What fairness-related questions does meta-learning pose?

SLIDE 40

MERCI! (THANK YOU!)