LTI Orientation
Deep Learning Research for NLP
Graham Neubig

Language Processing
John passes the ball upfield to Peter, who shoots for the goal. The shot is deflected by Mary and the ball goes out of bounds.
→ Mary prevents Peter from scoring a goal.
Structured Prediction
[Diagram: input X mapped to structured output Y]
Supervised Learning
[Diagram: learning a mapping from inputs X to outputs Y from paired examples]
Supervised Learning w/ Neural Nets
[Diagram: a neural network with parameters θ learned from input/output pairs (X, Y)]
Structured Prediction w/ Neural Nets
Neural Structured Prediction
[Diagram: a model with parameters w maps input X to output Y, and a loss is computed on Y]
Neural Structured Prediction
[Diagram: the same pipeline, now with an explicit search step over structured outputs Y]
The Problem of Discrete Decisions
[Plot: the loss g(w) as a function of the parameters w is piecewise constant, jumping wherever the discrete prediction ŷ_i(w) changes between 'dog', 'the', and 'cat']
Soft Search [Goyal+18]
(Faculty: Neubig)
Instead of feeding the embedding of a single hard prediction (argmax ŷ_i = 'the') into the next hidden state h_{i+1}, feed a peaked softmax over the embeddings of the whole vocabulary:

ẽ_i = Σ_y e(y) · exp[α · s_i(y)] / Z

where s_i(y) are the model's scores (e.g., s_i(dog), s_i(the), s_i(cat)), e(y) are the word embeddings, and Z is the normalizer. As α → ∞, the α-soft argmax ẽ_i approaches the embedding of the argmax prediction.
Smoothed Surface
[Plot: smoothed loss surfaces g(w) for α = 1 and α = 10; larger α more closely approximates the original piecewise-constant surface with regions ŷ_i(w) = 'dog', 'the', 'cat']
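A minimal numpy sketch of the peaked-softmax relaxation above; the vocabulary, scores, and embedding values are illustrative stand-ins, not the setup of [Goyal+18]:

```python
import numpy as np

def alpha_soft_argmax(scores, embeddings, alpha):
    """e_tilde = sum_y e(y) * exp(alpha * s(y)) / Z: a convex mix of embeddings."""
    weights = np.exp(alpha * (scores - scores.max()))  # subtract max for stability
    weights /= weights.sum()                           # normalize by Z
    return weights @ embeddings

vocab = ["dog", "the", "cat"]
scores = np.array([0.1, 2.0, 0.5])       # s_i(y): model scores at step i
embeddings = np.random.randn(3, 4)       # e(y): toy 4-dim word embeddings

for alpha in [1.0, 10.0, 100.0]:
    print(alpha, np.round(alpha_soft_argmax(scores, embeddings, alpha), 3))
# As alpha grows, the output approaches the embedding of the argmax word 'the'.
```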
Prediction over Word Embedding Space [Kumar+18]
(Faculty: Tsvetkov)
Structured Modeling w/ Neural Nets
Why Structured Neural Nets
- In pre-neural NLP we did feature engineering to capture the salient features of text
- Now, neural nets capture features for us
- But given too much freedom they will not learn, or will overfit
- So we do architecture engineering to add inductive bias
Structure in Language
- Words, phrases, and sentences: e.g., the parse of "Alice gave a message to Bob" (NP, VP, and PP phrases under S)
- Documents: "This film was completely unbelievable. The characters were wooden and the plot was absurd. That being said, I liked it."
BiLSTM Conditional Random Fields [Ma+15]
(Faculty: Hovy)
- Add an additional layer that ensures consistency between tags
- Example: "<s> I hate this movie <s>" tagged PRP VBP DT NN
- Training and prediction use dynamic programming
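To make the dynamic-programming step concrete, here is a hedged sketch of Viterbi decoding for a linear-chain CRF; the emission and transition scores are random toy values, not the outputs of a trained BiLSTM:

```python
import numpy as np

def viterbi(emissions, transitions):
    """emissions: (T, K) per-token tag scores; transitions: (K, K) tag-to-tag scores."""
    T, K = emissions.shape
    score = emissions[0].copy()            # best score of any path ending in each tag
    back = np.zeros((T, K), dtype=int)     # backpointers
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]  # (K, K)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    tags = [int(score.argmax())]           # follow backpointers from the best final tag
    for t in range(T - 1, 0, -1):
        tags.append(int(back[t][tags[-1]]))
    return tags[::-1]

tagset = ["PRP", "VBP", "DT", "NN"]
emissions = np.random.randn(4, 4)          # scores for "I hate this movie"
transitions = np.random.randn(4, 4)
print([tagset[t] for t in viterbi(emissions, transitions)])
```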
Neural Factor Graph Models [Malaviya+18]
(Faculty: Gormley, Neubig)
- Problem: Neural CRFs can only handle a single tag per word
- Idea: Expand to multiple tags using graphical models
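As a rough illustration of scoring multiple tags per word with a factor graph; all factor tables below are random stand-ins, not the neural factors of [Malaviya+18]:

```python
import numpy as np

T, pos_K, num_K = 3, 4, 2                   # 3 words; POS and Number attributes
unary_pos = np.random.randn(T, pos_K)       # per-word POS factors
unary_num = np.random.randn(T, num_K)       # per-word Number factors
pair_attr = np.random.randn(pos_K, num_K)   # POS-Number factor within a word
pair_seq = np.random.randn(pos_K, pos_K)    # POS-POS factor between adjacent words

def score(pos_tags, num_tags):
    """Log-score of one joint assignment = sum of all factor log-potentials."""
    s = sum(unary_pos[t, pos_tags[t]] + unary_num[t, num_tags[t]]
            + pair_attr[pos_tags[t], num_tags[t]] for t in range(T))
    s += sum(pair_seq[pos_tags[t], pos_tags[t + 1]] for t in range(T - 1))
    return s

print(score([0, 1, 2], [0, 1, 0]))
```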
Stack LSTM [Dyer+15]
(Faculty: Dyer, now at DeepMind)
- Example: parsing "an overhasty decision was made"
[Figure: transition-based dependency parsing with stack LSTMs; separate LSTMs encode the stack S, the buffer B, and the action history A, each read at its TOP pointer, and actions such as SHIFT, REDUCE_L(amod), and REDUCE_R build the tree.]
- Compositional representation: each REDUCE composes head and dependent into a single vector on the stack
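A bare-bones sketch of the shift-reduce transition system that the stack LSTM scores; the dependency labels and exact action names here are illustrative, and the stack/buffer/action LSTMs themselves are omitted:

```python
def parse(words, actions):
    stack, buffer, arcs = [], list(words), []
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act.startswith("REDUCE_L"):          # left arc: second <- top
            label = act[act.index("(") + 1:-1]
            dep = stack.pop(-2)
            arcs.append((stack[-1], label, dep))  # head is the remaining top
        elif act.startswith("REDUCE_R"):          # right arc: top <- second
            label = act[act.index("(") + 1:-1]
            dep = stack.pop()
            arcs.append((stack[-1], label, dep))
    return arcs

print(parse(["an", "overhasty", "decision", "was", "made"],
            ["SHIFT", "SHIFT", "SHIFT", "REDUCE_L(amod)", "REDUCE_L(det)",
             "SHIFT", "SHIFT", "REDUCE_L(auxpass)", "REDUCE_L(nsubjpass)"]))
# -> arcs such as (decision, amod, overhasty) and (made, nsubjpass, decision)
```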
Morphological Language Models [Matthews+18]
(Faculty: Neubig, Dyer)
- Problem: Language modeling for morphologically rich languages is hard
- Idea: Specifically decompose input and output using morphological structure
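As a toy illustration of composing word representations from morphemes; the segmentation and the simple additive composition are assumptions for the sketch, not the architecture of [Matthews+18]:

```python
import numpy as np

# Toy morpheme embedding table with made-up entries.
morph_emb = {m: np.random.randn(8) for m in ["un", "break", "able", "dog", "s"]}

def word_vector(morphemes):
    """Represent a word as the sum of its morpheme embeddings."""
    return np.sum([morph_emb[m] for m in morphemes], axis=0)

v1 = word_vector(["un", "break", "able"])   # "unbreakable"
v2 = word_vector(["dog", "s"])              # "dogs"
print(v1.shape, v2.shape)
```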
Neural-Symbolic Integration
Neural-Symbolic Hybrids
- Neural and symbolic models are better at different things
- Neural: smoothing over differences using similarity
- Symbolic: remembering individual single-shot events
- How can we combine the two?
Discrete Lexicons in Neural Seq2seq [Arthur+15]
(Faculty: Neubig)
NNs + Logic Rules [Hu+16]
(Faculty: Hovy, Xing)
- Problem: It is difficult to explicitly incorporate knowledge into neural-net-based models
- Idea: Use logical rules to constrain the space of predicted probabilities
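One simple way to realize such a constraint, sketched with toy values in the spirit of the posterior-regularization teacher used in [Hu+16]: downweight rule-violating labels in the predicted distribution and renormalize:

```python
import numpy as np

def constrain(p, rule_satisfied, strength=2.0):
    """q(y) proportional to p(y) * exp(-strength * (1 - rule(y)))."""
    q = p * np.exp(-strength * (1.0 - rule_satisfied))
    return q / q.sum()

p = np.array([0.6, 0.3, 0.1])      # network's predicted distribution (toy)
rule = np.array([0.0, 1.0, 1.0])   # label 0 violates the logic rule
print(constrain(p, rule))          # mass shifts toward rule-consistent labels
```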
Latent Variable Models
Latent Variable Models
[Diagram: observed variables X and Y connected through an unobserved latent variable Z]
Neural Latent Variable Models
[Diagram: a neural network parameterizes the relationship between X and the latent variable Z]
Generating Text from Latent Space
[Diagram: text X is encoded into a latent code Z and decoded back into text X]
Example: Discourse Level Modeling with VAE [Zhao+17]
(Faculty: Eskenazi)
- Use a latent variable to represent the entire discourse in a dialog
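A minimal PyTorch sketch of the VAE machinery behind such models, i.e., encoding to a latent code, reparameterizing, and decoding; the sizes and modules are toy stand-ins, not the dialog model of [Zhao+17]:

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, x_dim=16, z_dim=4):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)  # outputs mu and logvar
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon = self.dec(z)
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1).mean()
        return recon, kl

vae = TinyVAE()
x = torch.randn(8, 16)
recon, kl = vae(x)
loss = (recon - x).pow(2).sum(-1).mean() + kl   # reconstruction + KL
loss.backward()
```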
Handling Discrete Latent Variables [Zhou+17]
(Faculty: Neubig)
Structured Latent Variables [Yin+18]
(Faculty: Neubig)
- Problem: Paucity of training data for structured prediction problems
- Idea: Treat the structure as a latent variable in a VAE model
Better Learning Algorithms for Latent Variable Models [He+19]
(Faculty: Neubig)
- Problem: When learning latent variable models, predicting the latent variables can be difficult
- Solution: Perform aggressive updates of the part of the model that predicts these variables
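A schematic sketch of such an update schedule: take many steps on the inference network (which predicts the latent variables) per step on the generator. The tiny model below is a hypothetical stand-in, and [He+19] uses a convergence criterion rather than a fixed inner-step count:

```python
import torch
import torch.nn as nn

enc = nn.Linear(8, 2)   # inference network: predicts latent mean and logvar
dec = nn.Linear(1, 8)   # generator
enc_opt = torch.optim.SGD(enc.parameters(), lr=0.1)
dec_opt = torch.optim.SGD(dec.parameters(), lr=0.1)

def neg_elbo(x):
    mu, logvar = enc(x).chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    recon = (dec(z) - x).pow(2).sum(-1)
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1)
    return (recon + kl).mean()

for step in range(5):
    x = torch.randn(16, 8)
    for _ in range(30):                  # aggressive phase: inference net only
        loss = neg_elbo(x)
        enc_opt.zero_grad(); loss.backward(); enc_opt.step()
    loss = neg_elbo(x)                   # then one step on the generator
    dec_opt.zero_grad(); loss.backward(); dec_opt.step()
```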