CS11-747 Neural Networks for NLP
Using/Evaluating Sentence Representations
Graham Neubig
Site: https://phontron.com/class/nn4nlp2017/
Sentence Representations
We can create a vector, or a sequence of vectors, from a sentence
(e.g. "this is an example" as one vector, or as one vector per word).

Obligatory quote: "You can’t cram the meaning of a whole %&!$ing
sentence into a single $&!*ing vector!" — Ray Mooney
Sentence Classification (Socher et al. 2013)
Classify a sentence by some trait, e.g. 5-way sentiment:
  I hate this movie → very bad
  I love this movie → very good
(labels: very good / good / neutral / bad / very bad)

Model: lookup each word's embedding → some complicated function to
extract combination features (usually a CNN) → scores → softmax →
probs.
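A minimal sketch of that lookup → CNN → scores → softmax pipeline, in
PyTorch (the course itself used other toolkits; all sizes, names, and
layer choices here are illustrative assumptions):

```python
import torch
import torch.nn as nn

class CNNSentenceClassifier(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=64, filters=128,
                 width=3, n_classes=5):   # 5 = very bad ... very good
        super().__init__()
        self.lookup = nn.Embedding(vocab_size, emb_dim)   # lookup
        self.conv = nn.Conv1d(emb_dim, filters, width, padding=1)
        self.score = nn.Linear(filters, n_classes)        # scores

    def forward(self, word_ids):           # (batch, length) int ids
        embs = self.lookup(word_ids)       # (batch, length, emb_dim)
        feats = torch.relu(self.conv(embs.transpose(1, 2)))
        pooled = feats.max(dim=2).values   # combine over positions
        return torch.softmax(self.score(pooled), dim=-1)  # probs

# e.g. ids standing in for "I hate this movie"
probs = CNNSentenceClassifier()(torch.tensor([[1, 2, 3, 4]]))
```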
Paraphrase Identification (Dolan and Brockett 2005)
Identify whether two sentences mean the same thing, using a loose
sense of similarity:
  "Charles O. Prince, 53, was named as Mr. Weill’s successor."
  "… named as his successor."
The sentence pairs were harvested from news articles using heuristics
or a classifier (67% of the resulting pairs were paraphrases).
One modeling approach: encode each sentence into a vector (e.g. "this
is an example" / "this is another example"), combine the two vectors,
and let a classifier make the yes/no decision (Kiros et al. 2015).
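A sketch of that pair-classification setup; a stand-in
bag-of-embeddings encoder is used where Kiros et al. use skip-thought
vectors, and all names and sizes are illustrative:

```python
import torch
import torch.nn as nn

class ParaphraseClassifier(nn.Module):
    def __init__(self, vocab_size=10000, dim=64):
        super().__init__()
        self.lookup = nn.Embedding(vocab_size, dim)
        self.decide = nn.Linear(2 * dim, 2)       # yes / no

    def encode(self, ids):
        # stand-in encoder: mean of word embeddings
        return self.lookup(ids).mean(dim=1)

    def forward(self, ids1, ids2):
        h1, h2 = self.encode(ids1), self.encode(ids2)
        return self.decide(torch.cat([h1, h2], dim=-1))

logits = ParaphraseClassifier()(torch.tensor([[1, 2, 3]]),
                                torch.tensor([[1, 2, 4]]))
```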
An alternative is to extract features of the sentence pair and combine
them to make the yes/no decision: Ji and Eisenstein (2013) use a
factorization of word/context vectors, weighting the vectors based on
their discriminativeness.
Semantic Similarity/Relatedness (Marelli et al. 2014)
Rate how semantically related two sentences are. The data was created
by transforming sentences in meaning-preserving ways (active↔passive,
replacing w/ synonyms, etc.) and in meaning-changing ways (e.g.
replace words w/ antonyms). Systems are evaluated by how well they
correlate with the human score (e.g. Pearson’s correlation).
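A tiny evaluation sketch of that correlation measure; the scores below
are made up, and scipy's pearsonr does the work:

```python
from scipy.stats import pearsonr

human  = [4.8, 1.2, 3.5, 2.0]    # e.g. 1-5 human relatedness ratings
system = [0.9, 0.1, 0.6, 0.5]    # model similarity scores (made up)
r, _ = pearsonr(human, system)
print(f"Pearson's r = {r:.3f}")
```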
Similarity model (Mueller and Thyagarajan 2016): encode both sentences
("this is an example" / "this is another example") with a Siamese
network and output a similarity in [0,1] as exp(−||h1 − h2||_1).
Many training details matter, including pre-training, using
pre-trained word embeddings, etc.
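The similarity function itself is simple to state in numpy; the
vectors below are placeholders for the Siamese network's final hidden
states:

```python
import numpy as np

def manhattan_similarity(h1, h2):
    # exp(-||h1 - h2||_1): identical vectors -> 1, distant -> toward 0
    return np.exp(-np.sum(np.abs(h1 - h2)))

h1 = np.array([0.2, -0.5, 1.0])   # placeholder sentence encodings
h2 = np.array([0.1, -0.4, 0.9])
print(manhattan_similarity(h1, h2))
```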
Textual Entailment (Bowman et al. 2015)
Given a premise, is a hypothesis entailed (must be true), contradicted
(must be false), or neutral? (Paraphrase is the case of entailment
where the opposite direction is also true.)
  Premise: The woman bought a sandwich for lunch
    entailment → The woman bought lunch
    contradiction → The woman did not buy a sandwich
    neutral → The woman bought a sandwich for dinner
The SNLI data was built by eliciting an entailed, neutral, and
contradicted caption for each image caption; quality is measured as
agreement between annotator and "gold" label. Models typically encode
both sentences, capturing information in both directions, and classify
the pair.
https://nlp.stanford.edu/projects/snli/
InferSent (Conneau et al. 2017): can training on natural language
inference learn generalizable embeddings? NLI is a hard task that
requires capturing nuance → yes?
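For reference, Conneau et al. combine the two sentence encodings u and
v as [u; v; |u − v|; u ∗ v] before a 3-way softmax; a sketch with a
placeholder classifier (the encoding size is an assumption):

```python
import torch
import torch.nn as nn

def combine(u, v):
    # [u; v; |u - v|; u * v]: the pair features fed to the classifier
    return torch.cat([u, v, torch.abs(u - v), u * v], dim=-1)

dim = 64                           # assumed encoding size
decide = nn.Linear(4 * dim, 3)     # entailment/contradiction/neutral
u, v = torch.randn(1, dim), torch.randn(1, dim)
logits = decide(combine(u, v))
```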
Retrieval
Given a source input (e.g. "this is an example"), find the entry in a
database (DB) of outputs ("he ate some things", "my database entry",
"this is another example") that best matches it, by scoring the source
against each entry.

Training: the correct answer should score higher than the other
answers; e.g. an incorrect entry at 0.6 vs. the correct one at 0.4 is
bad.

Stronger: the correct answer should win by a margin (e.g. 1), so
correct 0.8 vs. incorrect 0.6 is still bad.

Scoring every entry is too expensive, so we use a portion of the
database as negative samples.
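A minimal sketch of the retrieval step under the assumption that
source and entries are encoded as vectors and scored by dot product
(the random vectors stand in for a trained encoder):

```python
import numpy as np

rng = np.random.default_rng(0)
entries = ["he ate some things", "my database entry",
           "this is another example"]
db_vecs = rng.standard_normal((len(entries), 64))  # stand-in encodings
src_vec = rng.standard_normal(64)                  # encoded source

scores = db_vecs @ src_vec                         # one score per entry
print(entries[int(np.argmax(scores))])             # best match
```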
This gives the large-margin loss

  L(x*, y*, S) = Σ_{x∈S} max(0, 1 + s(x, y*) − s(x*, y*))

where x* is the correct input, y* the correct output, and S the set of
negative samples; each term is non-zero when the incorrect score plus
one exceeds the correct score.
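A direct translation of that loss (the scores passed in are arbitrary
example numbers):

```python
def margin_loss(correct_score, negative_scores, margin=1.0):
    # sum over negatives of max(0, margin + s(x, y*) - s(x*, y*))
    return sum(max(0.0, margin + s - correct_score)
               for s in negative_scores)

# (1 + 0.6 - 0.8) + (1 + 0.1 - 0.8) = 0.8 + 0.3 = 1.1
print(margin_loss(0.8, [0.6, 0.1]))
```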
Evaluation: recall@X ("is the correct answer in the top X choices?"),
or a precision-recall curve for all queries.
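A sketch of recall@X, i.e. the fraction of queries whose correct
answer appears among the top X scored candidates (the score matrix is
made up):

```python
import numpy as np

def recall_at(X, scores, correct_idx):
    # scores: (n_queries, n_candidates); correct_idx: (n_queries,)
    top = np.argsort(-scores, axis=1)[:, :X]
    return np.mean([c in row for c, row in zip(correct_idx, top)])

scores = np.array([[0.6, 0.8, 0.1],    # made-up scores; correct is
                   [0.9, 0.2, 0.3]])   # candidate 0 for both queries
print(recall_at(1, scores, np.array([0, 0])))   # 0.5: query 0 misses
```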
Efficient training: explicitly scoring all inputs and outputs is too
expensive, so use the other targets in the same minibatch as negative
samples (in-batch training).
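A common realization of in-batch training (an assumption here, since
it swaps the margin loss above for a softmax cross-entropy over the
batch): score all pairs in the minibatch at once and treat each row's
diagonal entry as the correct answer.

```python
import torch
import torch.nn.functional as F

def in_batch_loss(src_vecs, tgt_vecs):
    scores = src_vecs @ tgt_vecs.t()         # (batch, batch) scores
    labels = torch.arange(scores.size(0))    # diagonal = correct pair
    return F.cross_entropy(scores, labels)   # softmax over the batch

src, tgt = torch.randn(8, 64), torch.randn(8, 64)   # encoder outputs
loss = in_batch_loss(src, tgt)
```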
Efficient retrieval at test time: rather than scoring every database
entry, use approximate nearest neighbor search (e.g. locality
sensitive hashing).
Image Credit: https://micvog.com/2013/09/08/storm-first-story-detection/
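A toy version of locality-sensitive hashing with random hyperplanes
(one of several ANN techniques; sizes and data here are illustrative):
vectors with the same sign pattern under the random projections land
in the same bucket, so only that bucket needs to be scored.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
dim, n_bits = 64, 8
planes = rng.standard_normal((n_bits, dim))   # random hyperplanes

def bucket(v):
    # sign pattern of the projections = hash bucket
    return tuple((planes @ v > 0).astype(int))

db = rng.standard_normal((1000, dim))         # stand-in encodings
index = defaultdict(list)
for i, v in enumerate(db):
    index[bucket(v)].append(i)

query = db[42] + 0.01 * rng.standard_normal(dim)
# score only the (small) bucket; fall back to brute force if empty
candidates = index[bucket(query)] or range(len(db))
best = max(candidates, key=lambda i: db[i] @ query)
print(best)   # very likely 42
```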
Example application: image-caption retrieval (Hodosh et al. 2013),
which frames captioning as retrieving the best caption from a
database, as opposed to using a model to generate captions.