INF5820: Language technological applications Lecture 6 Evaluating Word Embeddings and Using them in Deep Neural Networks
Andrey Kutuzov, Lilja Øvrelid, Stephan Oepen, & Erik Velldal
University of Oslo
25 September 2018
◮ Principal Component Analysis (PCA) [Tipping and Bishop, 1999]
◮ t-distributed Stochastic Neighbor Embedding (t-SNE)
◮ produces a different picture each run
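A minimal sketch of both projections with scikit-learn, using random stand-in vectors rather than real embeddings. Note that PCA is deterministic, while t-SNE is stochastic: without a fixed `random_state` it produces a different picture on each run.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Toy "embeddings": 100 words in 50 dimensions (random stand-ins).
rng = np.random.default_rng(42)
vectors = rng.normal(size=(100, 50))

# PCA is deterministic: the same input always yields the same projection.
pca_2d = PCA(n_components=2).fit_transform(vectors)

# t-SNE is stochastic; fixing random_state makes the picture reproducible.
tsne_2d = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(vectors)

print(pca_2d.shape, tsne_2d.shape)  # both (100, 2)
```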
◮ TOEFL dataset (1997)
◮ ESSLLI 2008 dataset
◮ Battig dataset (2010)
◮ RG dataset [Rubenstein and Goodenough, 1965]
◮ WordSim-353 (WS353) dataset [Finkelstein et al., 2001]
◮ MEN dataset [Bruni et al., 2014]
◮ SimLex-999 dataset [Hill et al., 2015]
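Such similarity datasets are typically used by scoring each word pair with the model's cosine similarity and correlating those scores with the human judgements (Spearman's ρ). A toy sketch with made-up words, vectors, and gold scores:

```python
import numpy as np
from scipy.stats import spearmanr

# Hand-made toy embeddings standing in for a trained model.
emb = {
    "tiger": np.array([0.9, 0.1, 0.0]),
    "cat":   np.array([0.8, 0.2, 0.1]),
    "car":   np.array([0.0, 0.1, 0.9]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Gold standard: (word1, word2, human similarity judgement), WordSim-style.
gold = [("tiger", "cat", 7.35), ("tiger", "car", 2.0), ("cat", "car", 2.5)]

model_scores = [cosine(emb[a], emb[b]) for a, b, _ in gold]
human_scores = [s for _, _, s in gold]

# Spearman correlation compares the two rankings, not the raw values.
rho, _ = spearmanr(model_scores, human_scores)
print(round(rho, 2))  # 1.0: the toy model ranks the pairs exactly like humans
```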
◮ Google Analogies dataset [Le and Mikolov, 2014];
◮ Bigger Analogy Test Set (BATS) [Gladkova et al., 2016];
◮ Many domain-specific test sets inspired by Google Analogies.
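Analogy questions of the form "a is to b as c is to ?" are commonly solved with the vector-offset (3CosAdd) method: find the word closest to vec(b) − vec(a) + vec(c), excluding the query words. A toy sketch with hand-made vectors chosen so the analogy works:

```python
import numpy as np

# Hand-made toy embeddings (a real model would be trained on a corpus).
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def analogy(a, b, c):
    """Return the word closest (by cosine) to vec(b) - vec(a) + vec(c),
    excluding the three query words themselves (3CosAdd)."""
    target = emb[b] - emb[a] + emb[c]
    best, best_sim = None, -2.0
    for w, v in emb.items():
        if w in (a, b, c):
            continue
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

print(analogy("man", "king", "woman"))  # queen
```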
◮ QVEC uses words’ affiliations with WordNet synsets [Tsvetkov et al., 2015];
◮ Linguistic Diagnostics Toolkit (ldtoolkit) offers a multi-factor evaluation.
◮ useful to get rid of the long, noisy lexical tail.
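Trimming the lexical tail usually means dropping word types below a frequency threshold, as word2vec does with its `min_count` parameter. A minimal sketch (the corpus and threshold here are arbitrary example values):

```python
from collections import Counter

corpus = "the cat sat on the mat and the cat slept".split()
counts = Counter(corpus)

# Drop the long tail of rare (often noisy) word types,
# analogous to word2vec's min_count parameter.
min_count = 2
vocab = {w for w, c in counts.items() if c >= min_count}
print(sorted(vocab))  # ['cat', 'the']
```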
◮ a matrix of row vectors;
◮ transforms integers (word identifiers) into the corresponding vectors;
◮ ...or sequences of integers into sequences of vectors.
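In plain NumPy, the embedding layer is nothing more than row indexing into a matrix; a sketch with a tiny random matrix standing in for trained weights:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 5, 4

# The embedding layer is just a matrix of row vectors.
E = rng.normal(size=(vocab_size, dim))

word_id = 2
seq_ids = np.array([0, 2, 4, 2])

# Indexing turns an integer id into its vector...
vec = E[word_id]
# ...and a sequence of ids into a sequence of vectors.
seq_vecs = E[seq_ids]

print(vec.shape, seq_vecs.shape)  # (4,) (4, 4)
```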
◮ for classification,
◮ for clustering,
◮ for information retrieval (including web search).
◮ wf = (1 + log₁₀ tf) × log₁₀(n / df)
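This sublinear tf-idf weighting is straightforward to compute directly; a sketch with example counts:

```python
import math

def tf_idf(tf, df, n):
    """Sublinear tf-idf weight: wf = (1 + log10 tf) * log10(n / df)."""
    return (1 + math.log10(tf)) * math.log10(n / df)

# A term occurring 10 times in a document and in 100 of 10,000 documents:
print(tf_idf(10, 100, 10_000))  # (1 + 1) * 2 = 4.0
```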
◮ Concatenation
◮ Multiplication
◮ Weighted sum
◮ etc.
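These composition operations are one-liners in NumPy; a sketch on two toy word vectors (the weights in the weighted sum are arbitrary example values):

```python
import numpy as np

u = np.array([1.0, 2.0])  # vector for word 1
v = np.array([3.0, 4.0])  # vector for word 2

concat = np.concatenate([u, v])  # [1, 2, 3, 4]
product = u * v                  # element-wise: [3, 8]
weighted = 0.7 * u + 0.3 * v     # [1.6, 2.6]

print(concat, product, weighted)
```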
◮ learn word embeddings in the usual way (shared by all documents);
◮ randomly initialize document vectors;
◮ use document vectors together with word vectors to predict the current word;
◮ minimize the error;
◮ the trained model can infer a vector for any new document (the inference step).
◮ don’t use a sliding window at all;
◮ just predict all words in the current document using its vector.
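The windowless variant above can be sketched from scratch in NumPy: each document gets its own randomly initialized vector, a shared output matrix holds word vectors, and every word in a document is predicted from that document's vector alone via softmax. The documents and all hyperparameters below are toy values for illustration, not the real training recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "documents" (hypothetical data for illustration).
docs = [["cats", "purr"], ["dogs", "bark"], ["cats", "meow"]]
vocab = sorted({w for d in docs for w in d})
w2i = {w: i for i, w in enumerate(vocab)}

dim = 8
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # shared output word vectors
D = rng.normal(scale=0.1, size=(len(docs), dim))       # one vector per document

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# No sliding window: every word in a document is predicted
# from that document's vector alone.
lr = 0.3
for _ in range(500):
    for d, doc in enumerate(docs):
        for w in doc:
            probs = softmax(W_out @ D[d])
            grad = probs.copy()
            grad[w2i[w]] -= 1.0          # d(cross-entropy)/d(logits)
            d_vec = D[d].copy()
            D[d] -= lr * (W_out.T @ grad)
            W_out -= lr * np.outer(grad, d_vec)

# After training, document 0's vector should assign most of its
# probability mass to its own words ("cats" and "purr").
probs0 = softmax(W_out @ D[0])
print(vocab[int(np.argmax(probs0))])
```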