IN5550: Neural Methods in Natural Language Processing
Lecture 6: Evaluating Word Embeddings and Using them in Deep Neural Networks
Andrey Kutuzov, Vinit Ravishankar, Lilja Øvrelid, Stephan Oepen, & Erik Velldal
University of Oslo
21 February 2019
◮ Principal Component Analysis (PCA) [Tipping and Bishop, 1999]
◮ t-distributed Stochastic Neighbor Embedding (t-SNE)
◮ t-SNE is stochastic: it produces a different picture each run
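The idea behind these projections can be sketched in miniature. Below is a hedged, pure-Python illustration of PCA reducing 2-D "embeddings" to 1-D by projecting onto the direction of maximal variance; real toolkits (e.g. scikit-learn's PCA and TSNE) do this for vectors with hundreds of dimensions. The toy points are invented.

```python
# Minimal PCA sketch: project 2-D points onto their first principal
# component. Illustrative only; real embeddings are high-dimensional.
import math

def pca_1d(points):
    """Project 2-D points onto the leading eigenvector of their covariance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 covariance matrix
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    # Closed-form angle of the major axis for a symmetric 2x2 matrix
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    ux, uy = math.cos(theta), math.sin(theta)
    return [x * ux + y * uy for x, y in centered]

# Toy "embeddings" whose variance lies mostly along the x axis
coords = pca_1d([(0, 0), (1, 0.1), (2, -0.1), (3, 0)])
```

Unlike t-SNE, this projection is deterministic: running it twice on the same points gives the same picture.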
◮ TOEFL dataset (1997)
◮ ESSLLI 2008 dataset
◮ Battig dataset (2010)
◮ RG dataset [Rubenstein and Goodenough, 1965]
◮ WordSim-353 (WS353) dataset [Finkelstein et al., 2001]
◮ MEN dataset [Bruni et al., 2014]
◮ SimLex-999 dataset [Hill et al., 2015]
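Evaluation on such similarity datasets follows one recipe: score each word pair with the model's cosine similarity and correlate that ranking with the human judgements, usually via Spearman's rho. A hedged sketch with invented toy vectors and invented gold scores:

```python
# Intrinsic similarity evaluation sketch: cosine similarities vs.
# human scores, compared with Spearman's rank correlation (no ties).
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def spearman(xs, ys):
    """Spearman's rho for two samples without tied values."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

emb = {"cat": (1.0, 0.1), "dog": (0.9, 0.2), "car": (0.1, 1.0)}
pairs = [("cat", "dog"), ("cat", "car"), ("dog", "car")]
gold = [8.5, 1.0, 1.5]  # invented human judgements, SimLex-style scale
model = [cosine(emb[a], emb[b]) for a, b in pairs]
rho = spearman(model, gold)
```

In practice one would use scipy.stats.spearmanr over the full dataset; the hand-rolled version above only handles the no-ties case.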
◮ Google Analogies dataset [Le and Mikolov, 2014];
◮ Bigger Analogy Test Set (BATS) [Gladkova et al., 2016];
◮ Many domain-specific test sets inspired by Google Analogies.
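The standard scoring method for these sets is 3CosAdd: for an analogy a : b :: c : ?, pick the vocabulary word whose vector is closest (by cosine) to b − a + c, excluding the three query words. A hedged sketch with hand-made toy vectors:

```python
# 3CosAdd analogy sketch: king - man + woman should land near queen.
# All vectors are invented for illustration.
import math

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

emb = {
    "king":  (0.9, 0.8),
    "man":   (0.9, 0.1),
    "woman": (0.1, 0.1),
    "queen": (0.1, 0.8),
    "apple": (0.5, 0.0),
}

def analogy(a, b, c):
    """Return the word d maximising cos(d, b - a + c), excluding a, b, c."""
    target = tuple(emb[b][i] - emb[a][i] + emb[c][i] for i in range(2))
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cos(emb[w], target))

answer = analogy("man", "king", "woman")
```

Excluding the query words matters: without it, one of the inputs (often b or c) tends to be the nearest neighbour of the offset vector.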
◮ QVEC uses words' affiliations with WordNet synsets [Tsvetkov et al., 2015];
◮ Linguistic Diagnostics Toolkit (ldtoolkit) offers a multi-factor evaluation.
◮ useful for getting rid of the long, noisy lexical tail.
◮ a matrix of row vectors;
◮ transforms integers (word identifiers) into the corresponding vectors;
◮ ...or sequences of integers into sequences of vectors.
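An embedding layer is, at its core, exactly this lookup table. The following minimal sketch shows the mechanism in pure Python; frameworks such as PyTorch (nn.Embedding) implement the same lookup with trainable weights and batched indexing.

```python
# Embedding layer as a lookup table: word ids -> row vectors.
# Weights here are random stand-ins for trained embeddings.
import random

random.seed(0)
vocab_size, dim = 5, 3
weights = [[random.uniform(-1, 1) for _ in range(dim)]
           for _ in range(vocab_size)]

def embed(ids):
    """Turn a sequence of word ids into a sequence of vectors."""
    return [weights[i] for i in ids]

sentence = embed([0, 3, 1])  # a 3-word "sentence" as ids
```

During training, gradients flow only into the rows that were actually looked up.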
◮ for classification,
◮ for clustering,
◮ for information retrieval (including web search).
◮ wf = (1 + log10 tf ) × log10 (N / df )
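This is the standard sublinear tf-idf weighting: a logarithmically dampened term frequency times the inverse document frequency, with N the number of documents in the collection. A hedged sketch with invented toy counts:

```python
# tf-idf weighting per the formula above:
# w = (1 + log10(tf)) * log10(N / df)
import math

def tfidf(tf, df, n_docs):
    """Sublinear term frequency times inverse document frequency."""
    if tf == 0:
        return 0.0  # absent terms get zero weight
    return (1 + math.log10(tf)) * math.log10(n_docs / df)

# A term occurring 10 times, found in 100 of 10,000 documents
w = tfidf(tf=10, df=100, n_docs=10000)
```

A term appearing in every document gets idf = log10(N/N) = 0, so ubiquitous function words are weighted down to nothing.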
◮ Concatenation
◮ Multiplication
◮ Weighted sum
◮ etc...
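The listed composition operations can be written out directly. A small sketch with two invented word vectors and an invented weighting:

```python
# Composing word vectors into a single text representation.
u, v = [1.0, 2.0], [3.0, 4.0]   # toy word vectors

concat   = u + v                                      # concatenation
product  = [a * b for a, b in zip(u, v)]              # element-wise multiplication
weighted = [0.7 * a + 0.3 * b for a, b in zip(u, v)]  # weighted sum
```

Note the trade-off: concatenation grows with the number of words, while multiplication and (weighted) sums keep the dimensionality fixed regardless of text length.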
◮ learn word embeddings in the usual way (shared by all documents);
◮ randomly initialize document vectors;
◮ use document vectors together with word vectors to predict the current word;
◮ minimize the prediction error;
◮ the trained model can infer a vector for any new document (the inference stage).
◮ don't use a sliding window at all;
◮ just predict all words in the current document using its vector.
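The PV-DBOW-style variant described in the last bullets can be sketched as a toy: a randomly initialized document vector is trained, with fixed word vectors and no sliding window, to score the document's own words high and sampled other words low. This is a deliberately simplified, invented illustration; real training uses a library implementation such as gensim's Doc2Vec over full softmax/negative-sampling objectives.

```python
# Toy PV-DBOW sketch: one document vector learns to predict its words.
# Word vectors are fixed random stand-ins; only doc_vec is updated.
import math
import random

random.seed(1)
dim = 4
word_vecs = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)]
             for w in ("cats", "purr", "dogs", "bark")}
doc = ["cats", "purr"]
doc_vec = [random.uniform(-0.5, 0.5) for _ in range(dim)]

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

lr = 0.1
for _ in range(200):
    for word in doc:
        # One positive (a word in the document) and one sampled negative
        for target, label in ((word, 1.0),
                              (random.choice(["dogs", "bark"]), 0.0)):
            wv = word_vecs[target]
            score = sigmoid(sum(d * w for d, w in zip(doc_vec, wv)))
            g = lr * (label - score)        # log-loss gradient step
            for i in range(dim):
                doc_vec[i] += g * wv[i]     # update the document vector only

in_doc = sum(d * w for d, w in zip(doc_vec, word_vecs["cats"]))
out_doc = sum(d * w for d, w in zip(doc_vec, word_vecs["dogs"]))
```

After training, the document vector scores its own words higher than unrelated ones; the same gradient procedure, run with frozen word vectors, is what inference for a new document looks like.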