LTI Orientation: Deep Learning Research for NLP (Graham Neubig)
Language Processing
• Example: "Mary prevents Peter from scoring a goal. John passes the ball upfield to Peter, who shoots for the goal. The shot is deflected by Mary and the ball goes out of bounds."
Structured Prediction [diagram: input X mapped to output Y]
Supervised Learning [diagram: learning from pairs of inputs X and outputs Y]
Supervised Learning w/ Neural Nets [diagram: learning parameters θ from pairs of inputs X and outputs Y]
Structured Prediction w/ Neural Nets
Neural Structured Prediction [diagram: a model with parameters w maps input X to output Y, on which a loss is computed]
Neural Structured Prediction [diagram: as above, with a search step between the model and the output Y before the loss]
The Problem of Discrete Decisions [plot: the predicted word ŷ_i(w) jumps between 'dog', 'the', and 'cat' as the parameters w change, so the objective g(w) is not smooth in w]
Soft Search [Goyal+18] (Faculty: Neubig)
• Replace the hard argmax choice ŷ_i = 'the' with an α-soft argmax: the embedding passed from state h_i to the next state h_{i+1} is the expected embedding under a peaked softmax,
  ẽ_i = (1/Z) Σ_y e(y) · exp[α · s_i(y)],  with Z = Σ_y exp[α · s_i(y)],
  where s_i(y) is the score of word y (e.g. 'the', 'cat', 'dog') and e(y) is its embedding (a minimal code sketch follows).
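A minimal numpy sketch of the peaked-softmax relaxation above; the function and variable names (soft_argmax_embedding, scores, embeddings) are illustrative, not from the paper's code.

```python
import numpy as np

def soft_argmax_embedding(scores, embeddings, alpha=1.0):
    """Continuous relaxation of picking the highest-scoring word.

    Instead of feeding the embedding of argmax_y s_i(y) to the next step,
    feed the expectation of e(y) under the peaked softmax
    exp(alpha * s_i(y)) / Z; as alpha grows, this approaches the hard argmax.
    """
    # scores: shape (V,), unnormalized score s_i(y) for each vocabulary item
    # embeddings: shape (V, d), embedding e(y) for each vocabulary item
    weights = np.exp(alpha * (scores - scores.max()))  # numerically stable exp
    weights /= weights.sum()                           # peaked softmax over the vocabulary
    return weights @ embeddings                        # e~_i = sum_y softmax_y * e(y)

# Toy vocabulary {the, cat, dog}: with a large alpha the result is
# essentially the embedding of the best-scoring word ('the').
scores = np.array([2.0, 0.5, 1.0])
E = np.random.randn(3, 8)
print(soft_argmax_embedding(scores, E, alpha=10.0))
```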
Smoothed Surface [plot: the relaxed objective g(w) as a function of w for α = 1 and α = 10; larger α approaches the original discrete decisions among 'dog', 'the', 'cat']
Prediction over Word Embedding Space [Kumar+18] (Faculty: Tsvetkov)
Structured Modeling w/ Neural Nets
Why Structured Neural Nets
• In pre-neural NLP we did feature engineering to capture the salient features of text
• Now, neural nets capture features for us
• But given too much freedom they may fail to learn, or they may overfit
• So we do architecture engineering to add inductive bias
Structure in Language
• Words
• Phrases
• Sentences: e.g. "Alice gave a message to Bob" [parse tree with S, NP, VP, PP nodes]
• Documents: e.g. "This film was completely unbelievable. The characters were wooden and the plot was absurd. That being said, I liked it."
BiLSTM Conditional Random Fields [Ma+15] (Faculty: Hovy)
• Add an additional layer that ensures consistency between tags (e.g. "<s> I hate this movie <s>" tagged PRP VBP DT NN)
• Training and prediction use dynamic programming (a minimal decoding sketch follows)
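A minimal numpy sketch of the dynamic-programming (Viterbi) prediction step for a linear-chain CRF layer: tag sequences are scored with per-token emission scores plus a tag-transition matrix. The names and shapes are illustrative assumptions, not the paper's code.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Dynamic-programming (Viterbi) decoding for a linear-chain CRF layer.

    emissions:   (T, K) score of assigning tag k at position t (e.g. from a BiLSTM)
    transitions: (K, K) score of moving from tag i to tag j
    Returns the highest-scoring tag sequence as a list of tag indices.
    """
    T, K = emissions.shape
    score = emissions[0].copy()            # best score of any path ending in tag k at t = 0
    backptr = np.zeros((T, K), dtype=int)  # best previous tag for each (t, k)
    for t in range(1, T):
        # total[i, j]: best path ending in tag i at t-1, transition i -> j, emit tag j at t
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # Follow back-pointers from the best final tag to recover the sequence.
    tags = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        tags.append(int(backptr[t][tags[-1]]))
    return tags[::-1]

# Tiny example: 4 tokens ("I hate this movie"), 4 candidate tags (PRP, VBP, DT, NN)
emissions = np.random.randn(4, 4)
transitions = np.random.randn(4, 4)
print(viterbi_decode(emissions, transitions))
```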
Neural Factor Graph Models [Malaviya+18] (Faculty: Gormley, Neubig)
• Problem: Neural CRFs can only handle a single tag per word
• Idea: Expand to multiple tags per word using graphical models
Stack LSTM [Dyer+15] (Faculty: Dyer (now DeepMind))
[diagram: transition-based dependency parsing of "an overhasty decision was made" with a stack S, buffer B, and history of past actions; actions such as SHIFT, REDUCE-LEFT(amod), and REDUCE-RIGHT build compositional representations of the reduced subtrees]
Morphological Language Models [Matthews+18] (Faculty: Neubig, Dyer)
• Problem: Language modeling for morphologically rich languages is hard
• Idea: Explicitly decompose the input and output words using morphological structure
Neural-Symbolic Integration
Neural-Symbolic Hybrids
• Neural and symbolic models are better at different things
• Neural: smoothing over differences using similarity
• Symbolic: remembering individual single-shot events
• How can we combine the two?
Discrete Lexicons in Neural Seq2seq [Arthur+15] (Faculty: Neubig)
NNs + Logic Rules [Hu+16] (Faculty: Hovy, Xing)
• Problem: It is difficult to explicitly incorporate knowledge into neural-net-based models
• Idea: Use logical rules to constrain the space of predicted probabilities (see the sketch below)
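A hedged sketch of the general idea of constraining predicted probabilities with a soft rule: the network's distribution is reweighted so that rule-violating labels lose mass (in [Hu+16], a rule-projected distribution of this kind acts as a teacher for the network). The function name, the penalty encoding, and the constant C are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def rule_constrained_distribution(p, rule_penalty, C=1.0):
    """Reweight a network's predicted distribution toward a soft logic rule.

    p:            (K,) predicted probabilities over labels from the neural net
    rule_penalty: (K,) degree to which each label violates the rule
                  (0 = fully consistent, 1 = fully violated)
    Returns q with q(y) proportional to p(y) * exp(-C * rule_penalty(y)).
    """
    q = p * np.exp(-C * rule_penalty)
    return q / q.sum()

# Toy sentiment example: a rule prefers the sentiment of the clause after "but".
p = np.array([0.6, 0.4])          # network prediction: [negative, positive]
penalty = np.array([1.0, 0.0])    # the rule marks 'negative' as violating here
print(rule_constrained_distribution(p, penalty, C=2.0))
```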
Latent Variable Models
Latent Variable Models [diagram: variables X and Y]
Latent Variable Models [diagram: variables X, Y, and latent Z, with some values unobserved]
Neural Latent Variable Models [diagram: latent Z and observed X]
Generating Text from Latent Space [diagram: latent Z and text X]
Example: Discourse-Level Modeling with VAE [Zhao+17] (Faculty: Eskenazi)
• Use a latent variable as a way to represent the entire discourse in dialog (a minimal objective sketch follows)
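The training objective behind this kind of model is the VAE evidence lower bound. Below is a minimal PyTorch sketch of the negative ELBO, assuming a diagonal-Gaussian approximate posterior over a discourse-level latent z; tensor names and shapes are illustrative, not the paper's code.

```python
import torch
import torch.nn.functional as F

def dialog_vae_loss(decoder_logits, response_tokens, mu, logvar):
    """Negative ELBO for a dialog VAE: a latent z, sampled with the
    reparameterization trick from N(mu, exp(logvar)), summarizes the
    discourse, and the decoder reconstructs the response from z.

    decoder_logits:  (T, V) logits over the vocabulary at each response position
    response_tokens: (T,)   gold response token ids
    mu, logvar:      (d,)   parameters of the approximate posterior q(z | context)
    """
    # Reconstruction term: negative log-likelihood of the response given z.
    recon = F.cross_entropy(decoder_logits, response_tokens, reduction="sum")
    # KL term: KL( N(mu, sigma^2) || N(0, I) ), closed form for diagonal Gaussians.
    kl = -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl  # minimizing this maximizes the ELBO
```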
Handling Discrete Latent Variables [Zhou+17] (Faculty: Neubig)
Structured Latent Variables [Yin+18] (Faculty: Neubig)
• Problem: Paucity of training data for structured prediction problems
• Idea: Treat the structure as a latent variable in a VAE model
Better Learning Algorithms for Latent Variable Models [He+2019] (Faculty: Neubig)
• Problem: When learning latent variable models, predicting the latent variables can be difficult
• Solution: Perform aggressive updates of the part of the model that predicts these variables (a schematic training-loop sketch follows)
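A schematic, hedged sketch of the kind of "aggressive" training schedule referred to here: the inference network (which predicts the latent variables) gets many extra gradient steps per generator update, and the schedule reverts to standard training once the latent stays informative. All callables (encoder_step, decoder_step, estimate_mi) are placeholders I am assuming for illustration, not the paper's actual API.

```python
def train_aggressive(encoder_step, decoder_step, estimate_mi, batches, inner_steps=30):
    """Aggressive VAE training schedule (sketch).

    encoder_step / decoder_step: callables applying one gradient update to the
    inference network or the generator on a batch (placeholders).
    estimate_mi: callable estimating the mutual information between x and z,
    used here as the signal for ending the aggressive phase (placeholder).
    """
    aggressive = True
    prev_mi = float("-inf")
    for batch in batches:
        if aggressive:
            for _ in range(inner_steps):   # inner loop: update the inference network only
                encoder_step(batch)
        decoder_step(batch)                # single update of the generator
        mi = estimate_mi(batch)
        if mi < prev_mi:                   # mutual information stopped improving:
            aggressive = False             # fall back to standard VAE training
        prev_mi = mi
```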
Any Questions?