Meta Learning: A Brief Introduction


  1. Meta Learning: A Brief Introduction. Xiachong Feng, TG Ph.D. Student, 2018-12-01

  2. Outline • Introduction to Meta Learning • Types of Meta-Learning Models • Papers: Optimization as a Model for Few-Shot Learning (ICLR 2017); Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (ICML 2017); Meta-Learning for Low-Resource Neural Machine Translation (EMNLP 2018) • Conclusion

  3. Meta-learning • Figure: how meta learning relates to machine learning, deep learning, and reinforcement learning • Meta Learning / Learning to Learn • https://zhuanlan.zhihu.com/p/28639662

  4. Meta-learning • Learning to learn: learning how to learn, rather than learning any single task directly • Meta learning, also called Learning to Learn, is regarded as an important direction for AI • https://zhuanlan.zhihu.com/p/27629294

  5. Example • Learner: the model itself, trained with SGD/Adam (ordinary machine or deep learning) • Meta-learner: learns how the learner should be trained, e.g. its learning rate, decay, ... (meta learning)

  6. Types of Meta-Learning Models • Humans learn following different methodologies tailored to specific circumstances. • In the same way, not all meta-learning models follow the same techniques. • Types of Meta-Learning Models: 1. Few Shots Meta-Learning 2. Optimizer Meta-Learning 3. Metric Meta-Learning 4. Recurrent Model Meta-Learning 5. Initializations Meta-Learning (source: What's New in Deep Learning Research: Understanding Meta-Learning)

  7. Few Shots Meta-Learning • Create models that can learn from minimalistic datasets (learn from tiny data) • Papers: • Optimization As A Model For Few Shot Learning (ICLR 2017) • One-Shot Generalization in Deep Generative Models (ICML 2016) • Meta-Learning with Memory-Augmented Neural Networks (ICML 2016)

  8. Optimizer Meta-Learning • Task: learning how to optimize a neural network to better accomplish a task. • There is one network (the meta-learner) which learns to update another network (the learner) so that the learner effectively learns the task (see the sketch below). • Papers: • Learning to learn by gradient descent by gradient descent (NIPS 2016) • Learning to Optimize Neural Nets
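
Below is a minimal NumPy sketch of this learner/meta-learner split, in the spirit of "Learning to learn by gradient descent by gradient descent": a tiny per-coordinate meta-learner maps the learner's gradient to a parameter update in place of a hand-designed rule. The architecture, the fixed meta-weights, and the toy objective are illustrative assumptions, not the papers' actual models.

```python
import numpy as np

def meta_learner_update(grad, meta_w):
    # Tiny per-coordinate "meta-learner": a one-layer net mapping each
    # gradient coordinate to a parameter update (replacing -lr * grad).
    h = np.tanh(meta_w[0] * grad + meta_w[1])
    return meta_w[2] * h + meta_w[3]

def apply_learned_optimizer(theta, grad_fn, meta_w, steps=5):
    # The learner's parameters are updated by the meta-learner's output
    # instead of a hand-designed rule such as SGD or Adam.
    for _ in range(steps):
        theta = theta + meta_learner_update(grad_fn(theta), meta_w)
    return theta

# Toy learner objective: L(theta) = ||theta||^2, so grad = 2 * theta.
grad_fn = lambda th: 2 * th
meta_w = np.array([1.0, 0.0, -0.1, 0.0])  # would normally be meta-trained
print(apply_learned_optimizer(np.array([1.0, -2.0]), grad_fn, meta_w))
```

In the cited papers the meta-learner itself is trained (by gradient descent) so that the learner it produces optimizes quickly; here the meta-weights are simply fixed to show the mechanics.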

  9. Metric Meta-Learning • Goal: determine a metric space in which learning is particularly efficient. This approach can be seen as a subset of few-shot meta-learning in which a learned metric space is used to evaluate the quality of learning with a few examples (a minimal sketch follows below). • Papers: • Prototypical Networks for Few-shot Learning (NIPS 2017) • Matching Networks for One Shot Learning (NIPS 2016) • Siamese Neural Networks for One-shot Image Recognition • Learning to Learn: Meta-Critic Networks for Sample Efficient Learning
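
As a concrete illustration of the metric idea, here is a small NumPy sketch in the style of Prototypical Networks (cited above): class prototypes are the mean embeddings of the support examples, and a query is assigned to the nearest prototype. The `embed` function is a stand-in for a learned embedding network, and the toy data are made up.

```python
import numpy as np

def embed(x):
    # Stand-in for a learned embedding network; here just the identity.
    return x

def prototypes(support_x, support_y, n_classes):
    # Prototype of each class = mean embedding of its support examples.
    z = embed(support_x)
    return np.stack([z[support_y == c].mean(axis=0) for c in range(n_classes)])

def classify(query_x, protos):
    # Assign each query to the class of the nearest prototype (Euclidean).
    z = embed(query_x)                                        # (n_query, d)
    d = ((z[:, None, :] - protos[None, :, :]) ** 2).sum(-1)   # (n_query, n_classes)
    return d.argmin(axis=1)

# Toy 2-way 3-shot episode in 2-D.
support_x = np.array([[0., 0.], [0.1, 0.], [0., 0.1],
                      [1., 1.], [0.9, 1.], [1., 0.9]])
support_y = np.array([0, 0, 0, 1, 1, 1])
protos = prototypes(support_x, support_y, n_classes=2)
print(classify(np.array([[0.05, 0.05], [0.95, 0.95]]), protos))  # -> [0 1]
```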

  10. Recurrent Model Meta-Learning • The meta-learning algorithm trains an RNN that processes a dataset sequentially and then processes new inputs from the task. • Papers: • Meta-Learning with Memory-Augmented Neural Networks • Learning to reinforcement learn • RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning

  11. Initializations Meta-Learning • Optimize for an initial representation that can be effectively fine-tuned from a small number of examples (see the first-order sketch below). • Papers: • Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (ICML 2017) • Meta-Learning for Low-Resource Neural Machine Translation (EMNLP 2018)
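
The sketch below illustrates the initialization idea behind MAML in first-order form on a toy 1-D regression learner: after a single inner gradient step per task, the shared initialization is nudged toward whatever adapts well. The task distribution, step sizes, and single inner step are illustrative assumptions, not the papers' setups.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(w, x, y):
    # Learner: 1-D linear model y_hat = w * x with squared error.
    err = w * x - y
    return np.mean(err ** 2), np.mean(2 * err * x)

w_init, inner_lr, meta_lr = 0.0, 0.1, 0.01
for _ in range(2000):
    # Sample a task: regress y = a * x for a random slope a.
    a = rng.uniform(0.5, 2.0)
    x = rng.normal(size=10)
    y = a * x
    # Inner loop: one gradient step from the shared initialization.
    _, g = loss_and_grad(w_init, x, y)
    w_task = w_init - inner_lr * g
    # Outer loop (first-order approximation): move the initialization
    # toward parameters that do well *after* adaptation.
    _, g_post = loss_and_grad(w_task, x, y)
    w_init -= meta_lr * g_post

print(w_init)  # an initialization that adapts quickly to new slopes
```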

  12. Papers • Optimization As a Model For Few Shot Learning (ICLR 2017): Few Shots Meta-Learning, Recurrent Model Meta-Learning, Optimizer Meta-Learning, Initializations Meta-Learning, Supervised Meta Learning • Modern Meta Learning: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (ICML 2017) • Meta Learning in NLP: Meta-Learning for Low-Resource Neural Machine Translation (EMNLP 2018)

  13. Optimization As a Model For Few Shot Learning • Sachin Ravi, Hugo Larochelle (Twitter) • ICLR 2017 • Few Shots Meta-Learning • Recurrent Model Meta-Learning • Optimizer Meta-Learning • Supervised Meta Learning • Initializations Meta-Learning

  14. Few Shots Learning • Given a tiny labelled training set D with only a handful of examples, D = {(x_1, y_1), ..., (x_k, y_k)} • In a classification problem, k-shot learning means: • N classes • k labelled examples per class (k is typically less than 20) • An episode-sampling sketch follows below.
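
As an illustration, here is a small sketch of how an N-way k-shot episode might be sampled from a larger labelled pool; the pool layout, split sizes, and the `sample_episode` helper are illustrative assumptions, not the paper's exact procedure.

```python
import random

def sample_episode(pool, n_way=5, k_shot=1, n_query=5):
    """pool: dict mapping class label -> list of examples.
    Returns a k-shot support set and a query set over n_way classes."""
    classes = random.sample(list(pool), n_way)
    support, query = [], []
    for new_label, c in enumerate(classes):
        examples = random.sample(pool[c], k_shot + n_query)
        support += [(x, new_label) for x in examples[:k_shot]]
        query += [(x, new_label) for x in examples[k_shot:]]
    return support, query

# Toy pool: 20 classes, 30 examples each (placeholder strings for images).
pool = {c: [f"img_{c}_{i}" for i in range(30)] for c in range(20)}
support, query = sample_episode(pool, n_way=5, k_shot=1)
print(len(support), len(query))  # 5 support examples, 25 query examples
```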

  15. LSTM cell-state update • c_t = f_t * c_{t-1} + i_t * c~_t • f_t * c_{t-1}: the forget gate applied to the old cell state (forgetting the things we decided to forget earlier) • i_t * c~_t: the input gate applied to the new candidate values • c_t: the new cell state • https://www.jianshu.com/p/9dc9f41f0b29
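
The same update written as a small NumPy function; the gate values are passed in as plain arrays here rather than computed from learned weights, just to show the arithmetic.

```python
import numpy as np

def lstm_cell_state_update(c_prev, f_gate, i_gate, c_tilde):
    # c_t = f_t * c_{t-1} + i_t * c~_t
    # The forget gate scales the old cell state ("forgetting the things we
    # decided to forget earlier"); the input gate scales the new candidates.
    return f_gate * c_prev + i_gate * c_tilde

c_prev  = np.array([1.0, -0.5])   # old cell state
f_gate  = np.array([0.9, 0.1])    # keep most of dim 0, forget most of dim 1
i_gate  = np.array([0.2, 0.8])
c_tilde = np.array([0.5, 1.0])    # new candidate values
print(lstm_cell_state_update(c_prev, f_gate, i_gate, c_tilde))  # [1.   0.75]
```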

  16. Supervised learning • A neural network learns f(x) → y, e.g. image → label • Training uses a hand-designed optimizer: SGD, Adam, ...

  17. Meta learning • Meta-learning suggests framing the learning problem at two levels (Thrun, 1998; Schmidhuber et al., 1997). • The first is quick acquisition of knowledge within each separate task presented (fast adaptation). • This process is guided by the second, which involves slower extraction of information learned across all the tasks (learning).

  18. Motivation • Deep learning has shown great success in a variety of tasks with large amounts of labeled data. • Gradient-based optimization (momentum, Adagrad, Adadelta, ADAM) in high-capacity classifiers requires many iterative steps over many examples to perform well. • It starts from a random initialization of its parameters. • It performs poorly on few-shot learning tasks. • Is there an optimizer that can finish the optimization task using just a few examples?

  19. Method • Observation: the LSTM cell-state update has the same form as a gradient-based parameter update. • Proposal: an LSTM-based meta-learner model that learns the exact optimization algorithm used to train another learner neural network classifier in the few-shot regime.

  20. Method • LSTM-based meta-learner: an optimizer that is trained to optimize a learner neural network classifier. • Learner: the neural network classifier. Meta-learner: learns the optimization algorithm, i.e. knows how to quickly optimize the learner's parameters. • Gradient-based optimization: θ_t = θ_{t-1} - α_t * ∇_{θ_{t-1}} L • Meta-learner optimization: θ_t = metalearner(θ_{t-1}, ∇_{θ_{t-1}} L)
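
Numerically, the correspondence between the two updates looks like this: with forget gate f_t = 1 and input gate i_t = α_t, the LSTM-style gated update θ_t = f_t * θ_{t-1} + i_t * (-∇L) reduces to the plain gradient step. In the paper the gates are produced by the trained LSTM meta-learner from the learner's parameters, gradient, and loss; here they are fixed numbers, just to show the correspondence.

```python
import numpy as np

theta_prev = np.array([0.5, -1.0])
grad       = np.array([0.2,  0.4])   # gradient of the learner's loss
alpha      = 0.1

# Standard gradient-based update: theta_t = theta_{t-1} - alpha * grad
theta_sgd = theta_prev - alpha * grad

# Meta-learner (gated) update: theta_t = f_t * theta_{t-1} + i_t * (-grad).
# With f_t = 1 and i_t = alpha it reduces to the SGD step above.
f_t = np.array([1.0, 1.0])
i_t = np.array([alpha, alpha])
theta_meta = f_t * theta_prev + i_t * (-grad)

print(np.allclose(theta_sgd, theta_meta))  # True
```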

  21. Model • Figure: the LSTM meta-learner update, whose inputs (the current parameters, loss, and gradient) are given by the learner.

  22. Task Description • Figure: each episode is split into a portion used to train the learner and a portion used to train the meta-learner.

  23. Training • Example: 5 classes, 1-shot learning • D_train, D_test ← random dataset sampled from D_meta-train • Inner loop: the learner (neural network classifier with parameters θ_{t-1}) computes the loss L and gradient ∇_{θ_{t-1}} L on D_train; the meta-learner (parameters Θ_{d-1}, encoding the optimization algorithm) takes these as input and outputs the new learner parameters θ_t, which update the learner. • Outer loop: the adapted learner is evaluated on D_test, and the meta-learner is updated with the test loss: Θ_d = Θ_{d-1} - α ∇_{Θ_{d-1}} L_test • A runnable toy version of this loop follows below.
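
Below is a runnable toy version of this inner/outer structure. To keep it dependency-free, the learner is linear regression, the "meta-learner" is reduced to a single learned inner-loop step size (the paper's meta-learner is an LSTM that also learns the initialization θ_0), and the outer update uses the sign of a finite-difference estimate of the test-loss gradient. All of these are simplifying assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(theta, x, y):
    # Learner: linear model y_hat = x @ theta with squared error.
    err = x @ theta - y
    return np.mean(err ** 2), 2 * x.T @ err / len(y)

def adapt(theta0, log_lr, x, y, inner_steps=5):
    # Inner loop: the meta-parameter (a learned step size) drives each update.
    theta = theta0.copy()
    for _ in range(inner_steps):
        _, g = loss_and_grad(theta, x, y)
        theta = theta - np.exp(log_lr) * g
    return theta

def sample_task(n=20, d=3):
    w = rng.normal(size=d)
    x_tr, x_te = rng.normal(size=(n, d)), rng.normal(size=(n, d))
    return (x_tr, x_tr @ w), (x_te, x_te @ w)   # D_train, D_test

theta0, log_lr, meta_lr, eps = np.zeros(3), np.log(0.01), 0.01, 1e-3
for episode in range(2000):
    (x_tr, y_tr), (x_te, y_te) = sample_task()
    # Outer loop: nudge the meta-parameter to reduce the *adapted* learner's
    # loss on D_test (sign of a finite-difference gradient estimate).
    lo, _ = loss_and_grad(adapt(theta0, log_lr - eps, x_tr, y_tr), x_te, y_te)
    hi, _ = loss_and_grad(adapt(theta0, log_lr + eps, x_tr, y_tr), x_te, y_te)
    log_lr -= meta_lr * np.sign(hi - lo)

print("meta-learned inner-loop step size:", np.exp(log_lr))
```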

  24. Initializations Meta-Learning • Initial value of the cell state: c_0 • Initial weights of the classifier: θ_0 • Set c_0 = θ_0 • Learning this initial value lets the meta-learner determine the optimal initial weights of the learner

  25. Testing • Example: 5 classes, 1-shot learning • D_train, D_test ← random dataset sampled from D_meta-test • The learner is initialized with c_0 and adapted on D_train: at each step it passes its current parameters θ_{t-1}, loss L, and gradient ∇_{θ_{t-1}} L to the trained meta-learner (parameters Θ), which outputs the new parameters θ_t • The adapted classifier is then evaluated on D_test with the task metric

  26. Training • Figure: the training algorithm, showing the learner update (inner loop) and the meta-learner update (outer loop)

  27. Tricks • Parameter sharing: the meta-learner produces updates for deep neural networks that consist of tens of thousands of parameters, so some form of parameter sharing is needed to prevent an explosion of meta-learner parameters (sketched below). • Batch normalization: speeds up learning of deep neural networks by reducing internal covariate shift within the learner's hidden layers.
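
Here is a NumPy sketch of the parameter-sharing idea: one small, shared update rule (a couple of scalar gate parameters standing in for the shared LSTM) is applied coordinate-wise to every learner parameter, so the meta-learner's size does not grow with the learner's. The specific gate formulas and numbers are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Shared meta-learner parameters: the SAME few numbers are used for every
# coordinate of the learner, however many parameters the learner has.
shared = {"f_w": 0.0, "f_b": 4.0, "i_w": 0.5, "i_b": -2.0}  # would be meta-trained

def coordinatewise_update(theta, grad):
    # Gates computed per coordinate from the shared parameters, so a learner
    # with 10 or 10 million weights needs no extra meta-learner parameters.
    f = sigmoid(shared["f_w"] * np.abs(grad) + shared["f_b"])  # forget gate ~ 1
    i = sigmoid(shared["i_w"] * np.abs(grad) + shared["i_b"])  # small input gate
    return f * theta + i * (-grad)

theta = np.random.randn(10_000)   # tens of thousands of learner parameters
grad = np.random.randn(10_000)
print(coordinatewise_update(theta, grad).shape)  # (10000,): meta size unchanged
```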
