Meta-Learning
Lake 2019 & McCoy et al. 2020
By Joe O'Connor, Abby Bertics, and Ferran Alet
Timeline

Introduction (5 min)
Lake 2019: Compositional generalization through meta sequence-to-sequence learning (35 min)
Discussion: breakout rooms + group (15 min)
Break (10 min)
McCoy et al. 2020: Universal linguistic inductive biases via meta-learning (25 min)
Discussion: interspersed (15 min)
Conclusion (5 min)
Leveraging related tasks, either in terms of data or computations
Two views of meta-learning:
[Diagram: an untrained network is meta-trained (on episodes with training and test examples) into an adaptable network; at meta-test time, a small adaptation on the episode's training examples yields a model that is applied to its test examples.]
This adaptability can take many forms: an LSTM, external memory, a gradient update, or other optimizations. During meta-training it is fine for the model to have access to each episode's test examples, since their loss drives the outer training signal. At meta-test time it is not fine: meta-test accuracy is the only number we care about to measure how good our model is.
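As a toy sketch of this episode structure (the Episode container and the memorize-the-support "adaptation" are our own illustration, not either paper's model):

```python
from typing import NamedTuple

class Episode(NamedTuple):
    support: list  # (input, output) pairs the model may adapt on
    query: list    # held-out (input, output) pairs within the episode

def adapt(support):
    # Stand-in "small adaptation": just memorize the support pairs.
    table = dict(support)
    return lambda x: table.get(x, "?")

def query_accuracy(predict, query):
    return sum(predict(x) == y for x, y in query) / len(query)

# Meta-training episode: fine to look at the query examples here,
# because their loss is what the outer training signal comes from.
train_ep = Episode(support=[("dax", "RED")], query=[("dax", "RED")])

# Meta-test episode: the query set is touched exactly once,
# to report the one number we care about.
test_ep = Episode(support=[("wug", "BLUE")], query=[("wug", "BLUE")])

predictor = adapt(test_ep.support)
print(query_accuracy(predictor, test_ep.query))  # 1.0
```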
Parametric meta-learning: an untrained neural net is meta-trained into adaptable weights; meta-test adaptation finetunes those weights, which are then applied and tested.
Modular meta-learning: untrained modules are meta-trained into specialized modules; meta-test adaptation searches over structure (see the sketch below).
Combination: untrained modules become specialized modules of adaptable weights; meta-test adaptation searches structure and finetunes weights.
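A minimal sketch of the modular view, assuming a handful of already-meta-trained "modules" (the toy arithmetic modules and exhaustive search are ours; the actual approach uses learned neural modules and smarter search):

```python
from itertools import product

# Pretend these functions are modules specialized during meta-training.
MODULES = {"inc": lambda x: x + 1, "dbl": lambda x: 2 * x, "neg": lambda x: -x}

def search_structure(support, depth=2):
    """Meta-test adaptation: search for the composition that fits the support set."""
    best, best_err = None, float("inf")
    for names in product(MODULES, repeat=depth):
        def compose(x, names=names):
            for n in names:
                x = MODULES[n](x)
            return x
        err = sum((compose(x) - y) ** 2 for x, y in support)
        if err < best_err:
            best, best_err = compose, err
    return best

# A new task, y = 2x + 2, is solved by the structure "inc then dbl".
f = search_structure([(1, 4), (2, 6), (3, 8)])
print(f(10))  # 22
```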
Presented by Ferran Alet and Joe O’Connor
Meta seq2seq: learning to solve sequence-to-sequence tasks from small amounts of data with memory-augmented neural networks, i.e., networks that can probe learned soft dictionaries that encode previous inputs.
[Figure: meta-training episodes and a meta-test episode, each containing training (support) and test (query) examples.]
Meta-learning version: 4! = 24 assignments of 4 words to 4 colors
We meta-train on 4! - 1 = 23 variations of SCAN by mapping ('jump', 'run', 'walk', 'look') to a permutation of the correct meanings (JUMP, RUN, WALK, LOOK), and test on the unseen identity permutation.
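Generating those episodes is a one-liner over permutations; a sketch (variable names ours):

```python
from itertools import permutations

words = ("jump", "run", "walk", "look")
meanings = ("JUMP", "RUN", "WALK", "LOOK")

all_assignments = [dict(zip(words, p)) for p in permutations(meanings)]  # 4! = 24
identity = dict(zip(words, meanings))

meta_train = [a for a in all_assignments if a != identity]  # 23 permuted variants
meta_test = [identity]                                      # held-out true mapping

print(len(meta_train), meta_test[0]["jump"])  # 23 JUMP
```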
1. Use an RNN to encode each support input into a memory key
2. Use a different RNN to encode each support output into a memory value
3. Use the input encoder to create a key from the query input and probe the memory
4. Use the decoder, conditioned on the retrieved context, to decode the output
Memory as soft dictionary
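Concretely, the soft dictionary is attention over the stored support pairs; a minimal numpy sketch with arbitrary dimensions (not the paper's exact architecture):

```python
import numpy as np

def soft_dictionary(query, keys, values):
    """Attention-based retrieval: softmax similarity over keys, weighted sum of values."""
    scores = keys @ query                      # (n_support,) dot-product similarities
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over stored support items
    return weights @ values                    # (d_value,) retrieved context

rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 8))     # one key per encoded support input
values = rng.normal(size=(5, 16))  # one value per encoded support output
query = keys[2] + 0.01 * rng.normal(size=8)  # query close to support item 2

context = soft_dictionary(query, keys, values)
print(context.shape)  # (16,), dominated by values[2] since the match is sharp
```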
Given examples... a program is induced... which can be applied to held-out examples: G.apply(`zup fep`) = [zup][zup][zup]
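A toy interpreter for this kind of induced program, reading the "fep = three times" rule off the example above (the rule format and class are our invention):

```python
class Program:
    def __init__(self, primitives, functions):
        self.primitives = primitives  # word -> output symbol
        self.functions = functions    # word -> repetition count

    def apply(self, command):
        out = []
        for word in command.split():
            if word in self.primitives:
                out.append(self.primitives[word])
            elif word in self.functions:
                out = out * self.functions[word]  # e.g. `fep` = repeat three times
        return "".join(out)

# A program induced from the study examples (only the zup/fep rules shown).
G = Program(primitives={"zup": "[zup]"}, functions={"fep": 3})
print(G.apply("zup fep"))  # [zup][zup][zup]
```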
Mutual exclusivity (ME) to resolve ambiguity in laboratory tasks on an artificial language
[Illustration of mutual exclusivity: "Hm… well, this one is definitely a cup... and I've never seen anything like this before" (photo caption: drinking wine on a Tuesday night in November)]
(out of all 24), with the mappings also randomly defined
essentially the same way, you should know how to do YX
relationship between left and right to figure out how to jump around right
compositional skills and variable manipulation, it should have no problem figuring out the meaning of "... twice and look opposite right thrice"
insist is necessary and unattainable via connectionist models, but how close is it? Would some extra symbolic machinery get it the rest of the way there, as he suggests it would?
mappings and must learn the fourth mapping. Assuming the query set was such that the mappings were still uniquely determined, what if it got two and had to learn two? One and three? Zero and four?
Presented by Abby Bertics
○ Poverty of the Stimulus / Data Sparsity Problem
○ Data + inductive biases
Flood the chat
What biases might be useful and/or necessary for a language learner?
1. Which learners are sure to discover a grammar G′ such that the language of G′ is the same as the language of G?
2. Which learners can do this for samples drawn from any language belonging to some class of languages?
3. What kind of sample does the learner need to succeed in this way?
○ Grammars which generate these patterns are ultimately constrained in some fashion
"meta-learning is a very powerful approach for endowing artificial systems with useful inductive biases"
Human learning: given biases, learn (any) language.
Meta-learning: given possible languages, learn biases.
Shifting the need for structure from the model to the data.
Standard training: train the model directly on a training language.
Meta-training: train the model so that it can learn each training language after a few steps of training.
Given: L = {L0, L1, ..., Ln}, p(L), M0
At step i:
1. Sample a language L ~ p(L)
2. Train a copy of Mi on examples from L for a few steps
3. Update Mi toward an initialization that adapts faster, giving Mi+1
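A minimal runnable sketch of such a loop, substituting a first-order (Reptile-style) outer update for full MAML and toy linear-regression "languages" for the paper's phonological ones; all names here are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
MU = np.array([1.0, -2.0, 0.5])  # structure shared across all "languages"

def sample_language():
    # Stand-in for L ~ p(L): each language is the shared structure plus variation.
    return MU + 0.3 * rng.normal(size=3)

def inner_train(theta, w, steps=5, lr=0.1):
    # A few SGD steps on data from language w (linear model, squared loss).
    for _ in range(steps):
        x = rng.normal(size=3)
        theta = theta - lr * 2 * (theta @ x - w @ x) * x
    return theta

theta = np.zeros(3)  # M0: the initialization being meta-learned
for _ in range(2000):
    w = sample_language()             # step i: sample a language
    adapted = inner_train(theta, w)   # train a copy of M_i on it briefly
    theta += 0.1 * (adapted - theta)  # outer update: move M_i toward the adapted copy

print(np.round(theta, 1))  # ends up near MU, the bias shared by the language family
```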
solely by the ranking of an a priori finite set of constraints
Note: phonotactics is an "easy" problem in the realm of language. No phonological systems extend beyond the regular boundary in the Chomsky hierarchy (i.e., they can all be described by finite-state automata).
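For example, a strict CV syllable pattern is regular and needs only a two-state automaton; a toy sketch (alphabet and pattern chosen for illustration):

```python
CONSONANTS, VOWELS = set("ptk"), set("aiu")

def accepts_cv(word):
    """DFA for (CV)+: state 0 expects a consonant, state 1 expects a vowel."""
    state = 0
    for ch in word:
        if state == 0 and ch in CONSONANTS:
            state = 1
        elif state == 1 and ch in VOWELS:
            state = 0
        else:
            return False
    return state == 0 and len(word) > 0

print(accepts_cv("pataka"))  # True: well-formed CV.CV.CV
print(accepts_cv("ptak"))    # False: consonant cluster violates the pattern
```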
1. Define the space of learning problems (L)
2. Meta-training
3. Verification that the inductive bias was acquired
98.8% accuracy with meta-learned initial parameters vs. 6.5% accuracy with a randomly-initialized model
Is this cheating?
Data generated with:
underlying data-generating process, to lend insight into the inductive biases that shaped this data
○ meta-learning can disentangle universal inductive biases from non-universal factors
What limitations does the meta-learning framework have?
Is there one learning algorithm for language? Or is it more modular?
Would an approach that works well for phonology work well for syntax?
Is this cheating? What would it mean to not cheat?
“Properties of the learning mechanism explain patterns found in natural language.” (Heinz 2007)
How about the inverse: Patterns found in natural language explain properties of the learning mechanism.
Meta-Learning of Compositional Distributions in Humans and Machines (Kumar et al. 2020)
No Free Lunch in Linguistics or Machine Learning (Rawski and Heinz 2019)
Inductive Learning of Phonotactic Patterns (Heinz 2007)