IN5550: Neural Methods in Natural Language Processing
Lecture 5: Distributional hypothesis and distributed word embeddings
Andrey Kutuzov, Vinit Ravishankar, Lilja Øvrelid, Stephan Oepen, & Erik Velldal
University of Oslo
14 February 2019
◮ Language models are trained on raw texts; no manual annotation is needed.
◮ No (principal) problems with training an LM on texts collected from the web.
[Figure by Dmitry Malkov]
◮ philosopher Ludwig Wittgenstein,
◮ linguists Zellig Harris [Harris, 1954] and John Rupert Firth [Firth, 1957].
◮ Continuous Bag-of-Words (CBOW),
◮ Continuous Skip-Gram (skip-gram),
◮ fastText [Bojanowski et al., 2017].
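A minimal sketch of training the two word2vec variants with the gensim library (gensim ≥ 4 API is assumed; the toy corpus and hyperparameter values are illustrative, not from the lecture):

    from gensim.models import Word2Vec

    # A tiny pre-tokenized corpus; in practice one streams a large raw-text corpus.
    corpus = [
        ["language", "models", "are", "trained", "on", "raw", "texts"],
        ["you", "shall", "know", "a", "word", "by", "the", "company", "it", "keeps"],
    ]

    # sg=0 selects Continuous Bag-of-Words, sg=1 selects Continuous Skip-Gram.
    cbow = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1, sg=0)
    skipgram = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1, sg=1)

    # Each word in the vocabulary now maps to a dense 100-dimensional vector:
    print(skipgram.wv["word"].shape)  # (100,)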
Each word's vector is trained to be:
◮ maximally similar to the vectors of its paradigmatic neighbors,
◮ minimally similar to the vectors of the words which do not occur in its contexts in the training corpus,
◮ ...as we have already seen in neural language models.
◮ [Mikolov et al., 2013]
◮ https://code.google.com/p/word2vec/
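For reference, the Skip-Gram negative-sampling objective of [Mikolov et al., 2013] makes this push-pull explicit. For every observed (target word w, context word c) pair, with k 'negative' words c_i drawn from a noise distribution P_n(w), the model maximizes

    \log \sigma\!\left( {v'_c}^{\top} v_w \right)
    + \sum_{i=1}^{k} \mathbb{E}_{c_i \sim P_n(w)} \left[ \log \sigma\!\left( -{v'_{c_i}}^{\top} v_w \right) \right]

where v_w and v'_c are the target and context vectors and σ is the sigmoid: similarity to real context words is pushed up, similarity to sampled non-context words is pushed down.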
◮ partially solves the out-of-vocabulary (OOV) words problem.
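A minimal sketch of this subword mechanism with gensim's FastText implementation (gensim ≥ 4 API; the toy corpus and probe word are illustrative): an unseen word still receives a vector, composed from the character n-grams it shares with seen words.

    from gensim.models import FastText

    # Tiny pre-tokenized corpus; real models are trained on much more data.
    corpus = [["distributional", "semantics"], ["distributed", "word", "embeddings"]]
    model = FastText(sentences=corpus, vector_size=50, min_count=1)

    # "distributions" never occurred in training...
    print("distributions" in model.wv.key_to_index)  # False
    # ...but a vector is still assembled from its character n-grams:
    print(model.wv["distributions"].shape)  # (50,)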
Plain-text word2vec format:
◮ words and sequences of values representing their vectors, one word per line;
◮ the first line gives the number of words in the model and their vector dimensionality.
Gensim's native format:
◮ uses NumPy arrays;
◮ stores a lot of additional information (training weights, hyperparameters, etc.).
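A minimal sketch of reading and writing both formats with gensim's KeyedVectors (the file names are hypothetical):

    from gensim.models import KeyedVectors

    # Plain-text word2vec format: header line with vocabulary size and
    # dimensionality, then one word and its vector values per line.
    wv = KeyedVectors.load_word2vec_format("model.txt", binary=False)

    # Gensim's native format: NumPy arrays plus additional model information.
    wv.save("model.kv")
    wv = KeyedVectors.load("model.kv")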
◮ should ‘coffee’ be more similar to ‘tea’ than to ‘bean’?
◮ Recent 'contextualized embeddings' (ELMo) offer a solution; more on this in a later lecture.
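A static model can only answer such questions with one global similarity score per word pair; a quick probe on a loaded model (the file name is hypothetical, and the scores depend on the training corpus):

    from gensim.models import KeyedVectors

    wv = KeyedVectors.load("model.kv")  # hypothetical pre-trained model
    # One fixed vector for "coffee" must average over its 'drink' and 'plant' senses:
    print(wv.similarity("coffee", "tea"))
    print(wv.similarity("coffee", "bean"))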
[Figure by Dmitry Malkov]