CS11-747 Neural Networks for NLP
Models of Words
Graham Neubig
Site https://phontron.com/class/nn4nlp2019/
What do we want to know about words?
- Are they the same part of speech?
- Do they have the same conjugation?
- Do these two words mean the same thing?
- Do they have some semantic relation (is-a, part-of, went-to-school-at)?
A manual attempt: WordNet, a large human-annotated database of word senses, parts of speech, semantic relations
- Can we get this kind of information automatically, without the manual annotation effort?
(Image Credit: NLTK)
An answer: word embeddings, continuous vector representations of words in which aspects of semantics may be included (e.g. word2vec; Mikolov et al. 2013)
Why embeddings?
- They capture information about words that is not easily accessible in many traditional resources
- They let us train on one task and test on another (e.g. parsing)
How do we train word embeddings (e.g. word2vec)? (Summary of Goldberg 10.4)
Distributional vs. distributed representations
- Distributional: words are similar if they appear in similar contexts (Harris 1954); the distribution of words is indicative of usage
- Non-distributional: created from lexical resources such as WordNet, etc.
- Distributed: dense vectors of real values, each dimension representing activations
- Non-distributed: each word is a discrete symbol (one-hot vector)
Count-based methods (see Goldberg 10.4.1)
- Look at the contexts each word appears in (try it yourself w/ kwic.py)
- Build a matrix of word/context co-occurrence counts, with rows as words, columns as contexts
- Measure word similarity with cosine similarity (or generalized Jaccard similarity, others)
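A minimal Python sketch of this recipe on a toy corpus (the corpus, window size, and similarity pair are illustrative, not from the slides):

```python
# Count-based word vectors: co-occurrence counts in a window, then cosine similarity.
from collections import defaultdict
import numpy as np

corpus = [["i", "gave", "a", "talk"], ["she", "gave", "a", "lecture"]]  # toy corpus
window = 2

counts = defaultdict(lambda: defaultdict(int))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                counts[w][sent[j]] += 1  # rows = words, columns = contexts

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: k for k, w in enumerate(vocab)}
M = np.zeros((len(vocab), len(vocab)))
for w, ctxs in counts.items():
    for c, n in ctxs.items():
        M[idx[w], idx[c]] = n

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

print(cosine(M[idx["talk"]], M[idx["lecture"]]))  # similar contexts -> high similarity
```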
Prediction-based methods: word embeddings from a neural network language model
[Figure: context words (e.g. "giving a") -> embedding lookup -> hidden layer tanh( W1*h + b1 ) -> output weights + bias = scores -> softmax -> probs]
- Here the embeddings come from predicting the next word in the sentence; other methods possible!
Context window methods: CBOW and skip-gram (Mikolov et al. 2013)
[Figure: CBOW looks up and sums the embeddings of the context words "giving a ___ at the" to predict the center word "talk" (sum -> scores -> softmax -> probs -> loss); skip-gram looks up the center word "talk" and predicts each context word "giving", "a", "at", "the" (scores -> softmax -> per-word losses)]
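A rough numpy sketch of the two objectives in the figure, on the "giving a talk at the" example; the parameter names (W_emb, W_out) and toy dimensions are my own:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["giving", "a", "talk", "at", "the", "conference"]
V, d = len(vocab), 8
W_emb = rng.normal(size=(V, d)) * 0.1   # word embeddings (lookup table)
W_out = rng.normal(size=(d, V)) * 0.1   # output projection to vocabulary scores
b_out = np.zeros(V)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

ctx = [0, 1, 3, 4]   # "giving", "a", "at", "the"
tgt = 2              # "talk"

# CBOW: sum the context embeddings, predict the center word
h = W_emb[ctx].sum(axis=0)
cbow_loss = -np.log(softmax(h @ W_out + b_out)[tgt])

# Skip-gram: embed the center word, predict each context word; losses are summed
h = W_emb[tgt]
probs = softmax(h @ W_out + b_out)
skipgram_loss = -sum(np.log(probs[c]) for c in ctx)

print(cbow_loss, skipgram_loss)
```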
Count-based and prediction-based methods are closely related (Levy and Goldberg 2014)
- Skip-gram with negative sampling implicitly performs matrix factorization of a word-context matrix built with PMI and a discount for the number of samples k (sampling covered next time):
  M_{w,c} = PMI(w, c) − log(k)
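A small numpy sketch of building that shifted-PMI matrix from a word-context count matrix such as the one above (k is the number of negative samples; zeroing out cells with zero counts is a simplifying choice):

```python
import numpy as np

def shifted_pmi(counts, k=5):
    """M[w, c] = PMI(w, c) - log(k), from a word-context count matrix."""
    total = counts.sum()
    p_wc = counts / total                          # joint probability of (word, context)
    p_w = counts.sum(axis=1, keepdims=True) / total
    p_c = counts.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0                   # cells with zero counts
    return pmi - np.log(k)
```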
GloVe (Pennington et al. 2014): start from desiderata the embeddings should satisfy, and work out why and how an objective can satisfy them
- Start: meaningful in linear space (differences, dot products); word/context invariance; robust to low-freq. contexts
- End: a training objective whose learned embeddings satisfy the desiderata
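As a sketch of where such a derivation ends up, GloVe's weighted least-squares objective over co-occurrence counts X can be written as follows (weighting function and bias terms per Pennington et al. 2014; variable names are mine):

```python
import numpy as np

def glove_loss(X, W, W_tilde, b, b_tilde, x_max=100, alpha=0.75):
    """Weighted least-squares objective over non-zero co-occurrence counts X[i, j]."""
    loss = 0.0
    for i, j in zip(*np.nonzero(X)):
        f = min(1.0, (X[i, j] / x_max) ** alpha)   # down-weight rare pairs, cap frequent ones
        diff = W[i] @ W_tilde[j] + b[i] + b_tilde[j] - np.log(X[i, j])
        loss += f * diff ** 2
    return loss
```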
What counts as "context" changes what is learned
- Large context window: more topical embeddings
- Small context window: more syntactic embeddings, words with the same inflection grouped
Visualization of embeddings
- Reduce the high-dimensional embeddings down to 2/3 dimensions for visualization (e.g. Mikolov et al. 2013)
- Non-linear projections such as t-SNE group examples that are close in the original high-dimensional space: points that give each other a high probability according to a Gaussian end up near each other
- The choice of projection matters: compare PCA vs. t-SNE views of the same embeddings (Image credit: Derksen 2016)
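A quick way to produce such 2-D views with scikit-learn and matplotlib (the library choice and the random stand-in embeddings are mine, not the slides'):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

emb = np.random.randn(200, 100)   # stand-in for trained word embeddings

pca_2d = PCA(n_components=2).fit_transform(emb)
tsne_2d = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(emb)

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].scatter(pca_2d[:, 0], pca_2d[:, 1], s=5); axes[0].set_title("PCA")
axes[1].scatter(tsne_2d[:, 0], tsne_2d[:, 1], s=5); axes[1].set_title("t-SNE")
plt.show()
```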
Types of evaluation (categorization from Schnabel et al. 2015)
- Word similarity: is there a correlation between embedding cosine similarity and human eval of similarity?
- Categorization: cluster the embeddings and measure the purity of the clusters
- Selectional preference: determine whether a noun is a typical argument of a verb
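For the word-similarity evaluation, the standard quantitative measure is the Spearman correlation between embedding cosine similarities and the human ratings; a sketch, where the embedding dictionary and rating pairs are assumed inputs:

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def word_similarity_eval(emb, pairs):
    """emb: dict word -> vector; pairs: list of (word1, word2, human_score)."""
    model_scores, human_scores = [], []
    for w1, w2, h in pairs:
        if w1 in emb and w2 in emb:
            model_scores.append(cosine(emb[w1], emb[w2]))
            human_scores.append(h)
    return spearmanr(model_scores, human_scores).correlation
```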
Extrinsic evaluation: plug the embeddings into a downstream task, e.g. use them to initialize model parameters, and measure task performance
Limitations of embeddings
- Not necessarily coordinated across languages
- Can encode the biases of the text they were trained on (e.g. gender or racial biases)
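One common way to do this in PyTorch is to initialize an embedding layer from the pre-trained vectors; a minimal sketch (the vectors here are a random stand-in for loaded word2vec/fastText vectors):

```python
import torch
import torch.nn as nn

pretrained = torch.randn(10000, 300)   # stand-in for loaded pre-trained word vectors
emb = nn.Embedding.from_pretrained(pretrained, freeze=False)  # freeze=True keeps them fixed

word_ids = torch.tensor([[3, 17, 42]])  # a batch of word indices from the downstream task
vectors = emb(word_ids)                 # shape (1, 3, 300); feeds into the task model
```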
Sub-word embeddings
- Morpheme-based (Luong et al. 2013)
- Character-based (Ling et al. 2015)
- Bag of character n-grams (Wieting et al. 2016), e.g. the trigrams of "where" (with boundary markers) are <wh, whe, her, ere, re>
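A sketch of the bag-of-character-n-grams idea: extract n-grams of the word with boundary markers and sum their embeddings; the hashing trick and toy sizes here are illustrative choices, not the exact fastText setup:

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=3):
    """Character n-grams of '<word>', e.g. 'where' -> ['<wh', 'whe', 'her', 'ere', 're>']."""
    w = "<" + word + ">"
    return [w[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(w) - n + 1)]

num_buckets, dim = 10_000, 100                          # toy sizes
ngram_emb = np.random.randn(num_buckets, dim) * 0.01    # stand-in for learned n-gram embeddings

def word_vector(word):
    # hash each n-gram into a bucket and sum the bucket embeddings
    return sum(ngram_emb[hash(g) % num_buckets] for g in char_ngrams(word))

print(char_ngrams("where"))
```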
Multi-prototype embeddings: cluster a word's contexts and learn different embeddings for its different senses (Reisinger and Mooney 2010), with later follow-ups (2017)
Retrofitting: we have an existing lexicon of word relations (e.g. WordNet) and would like our vectors to match it (Faruqui et al. 2015)
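Retrofitting has a simple iterative closed-form update; a sketch following Faruqui et al. (2015), with alpha = 1 and beta = 1/(number of neighbors), where the embedding and lexicon dictionaries are assumed inputs:

```python
import numpy as np

def retrofit(emb, lexicon, iters=10, alpha=1.0):
    """emb: dict word -> vector; lexicon: dict word -> list of related words (e.g. WordNet neighbors)."""
    new = {w: v.copy() for w, v in emb.items()}
    for _ in range(iters):
        for w, neighbors in lexicon.items():
            nbrs = [n for n in neighbors if n in new]
            if w not in new or not nbrs:
                continue
            beta = 1.0 / len(nbrs)                     # weight on each lexicon neighbor
            num = alpha * emb[w] + beta * sum(new[n] for n in nbrs)
            new[w] = num / (alpha + beta * len(nbrs))  # pull w toward neighbors and its original vector
    return new
```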
Sparse embeddings: constrain many dimensions to be zero, increasing the information content of non-zero dimensions for each word (e.g. Murphy et al. 2012)
De-biasing word embeddings (Bolukbasi et al. 2016): identify a bias direction and the words to neutralize, and ensure that they are neutral in that direction
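A sketch of the neutralization step: estimate a bias direction (e.g. from a definitional pair such as he−she) and remove its component from words that should be neutral; the pair and word list here are purely illustrative:

```python
import numpy as np

def neutralize(vec, bias_dir):
    """Remove the component of vec that lies along the (unit-normalized) bias direction."""
    b = bias_dir / np.linalg.norm(bias_dir)
    return vec - (vec @ b) * b

# Example usage (assuming emb is a dict of word vectors):
# bias_dir = emb["he"] - emb["she"]                  # definitional pair
# for w in ["doctor", "nurse", "programmer"]:        # words that should be neutral
#     emb[w] = neutralize(emb[w], bias_dir)
```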
fastText toolkit: https://github.com/facebookresearch/fastText/
Pre-trained vectors: https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
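The text-format (.vec) files on the pre-trained vectors page use the word2vec text format, so they can be loaded with gensim, for example (the file name below is one of the downloads listed on that page):

```python
from gensim.models import KeyedVectors

# e.g. after downloading wiki.en.vec from the pre-trained vectors page
vecs = KeyedVectors.load_word2vec_format("wiki.en.vec", binary=False)
print(vecs.most_similar("language", topn=5))
```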