Natural Language Processing (CSEP 517): Distributional Semantics
Roy Schwartz
© 2017 University of Washington, roysch@cs.washington.edu
May 15, 2017
To-Do List
◮ Read: Jurafsky and Martin (2016a,b)
◮ No requirement for any sort of annotation in the general case
◮ v_dog = (cat: 10, leash: 15, loyal: 27, bone: 8, piano: 0, cloud: 0, …)
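Such a vector can be read directly off corpus co-occurrence counts. A minimal sketch of the idea (the toy corpus and window size below are illustrative, not from the slides):

    from collections import Counter, defaultdict

    def count_vectors(corpus, window=2):
        """Map each word to a Counter over words seen within `window` tokens."""
        vectors = defaultdict(Counter)
        for sentence in corpus:
            for i, word in enumerate(sentence):
                lo = max(0, i - window)
                hi = min(len(sentence), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        vectors[word][sentence[j]] += 1
        return vectors

    corpus = [["the", "dog", "chewed", "the", "bone"],
              ["the", "cat", "watched", "the", "dog"]]
    print(count_vectors(corpus, window=2)["dog"])
    # Counter({'the': 3, 'chewed': 1, 'watched': 1})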
◮ Pointwise mutual information (PMI), TF-IDF
◮ Singular value decomposition (SVD), principal component analysis (PCA), matrix factorization (PMI and SVD are sketched in code after this list)
◮ Bag-of-words context, document context (Latent Semantic Analysis, LSA), …
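A rough sketch of the first two steps with numpy, on a made-up count matrix: reweight with positive PMI (negative values clipped to zero), then keep the top singular dimensions via a truncated SVD:

    import numpy as np

    def ppmi(C):
        """Positive PMI reweighting of a word-by-context count matrix C."""
        total = C.sum()
        pw = C.sum(axis=1, keepdims=True) / total   # P(word)
        pc = C.sum(axis=0, keepdims=True) / total   # P(context)
        with np.errstate(divide="ignore", invalid="ignore"):
            pmi = np.log((C / total) / (pw * pc))
        pmi[~np.isfinite(pmi)] = 0.0                # zero counts -> 0
        return np.maximum(pmi, 0.0)                 # keep positive values only

    C = np.array([[10., 15., 0.],                   # rows = words, cols = contexts
                  [12.,  1., 3.],
                  [ 0.,  2., 9.]])
    M = ppmi(C)

    # Truncated SVD: keep the top-k singular dimensions as dense vectors
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    k = 2
    word_vectors = U[:, :k] * S[:k]                 # k-dimensional embeddings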
◮ Synonym detection
  ◮ TOEFL (Landauer and Dumais, 1997)
◮ Word clustering
  ◮ CLUTO (Karypis, 2002)
◮ Semantic similarity (evaluation sketched in code after this list)
  ◮ RG-65 (Rubenstein and Goodenough, 1965), wordsim353 (Finkelstein et al., 2001), …
◮ Word analogies
  ◮ Mikolov et al. (2013)
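Both of the last two evaluations come down to cosine similarity in the vector space: similarity benchmarks correlate cosine scores with human ratings (typically Spearman's rho), and analogies are answered by vector arithmetic. A sketch with made-up vectors (any trained embeddings could be substituted):

    import numpy as np

    def cosine(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    # Toy embeddings; in practice these come from a trained model
    words = ["king", "queen", "man", "woman", "high", "tall"]
    E = {w: np.random.randn(50) for w in words}

    # Semantic similarity: score word pairs by cosine, then correlate
    # the scores with human ratings (e.g., on wordsim353)
    print(cosine(E["high"], E["tall"]))

    def analogy(a, b, c, E):
        """Answer 'a is to b as c is to ?' by vector arithmetic."""
        target = E[b] - E[a] + E[c]
        candidates = [w for w in E if w not in (a, b, c)]
        return max(candidates, key=lambda w: cosine(E[w], target))

    print(analogy("man", "king", "woman", E))   # ideally: "queen"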
◮ These representations turn out to be quite effective vector space representations
◮ Word embeddings
◮ Initially language models, nowadays virtually every sequence-level NLP task
◮ Bengio et al. (2003); Collobert and Weston (2008); Collobert et al. (2011); …
¹Along with GloVe (Pennington et al., 2014)
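The footnote points at word2vec. As one hedged illustration, skip-gram vectors can be trained with the gensim toolkit (an assumption here, not something the slides prescribe):

    from gensim.models import Word2Vec

    # Tokenized sentences; a real corpus would be far larger
    sentences = [["the", "dog", "chewed", "the", "bone"],
                 ["the", "cat", "watched", "the", "dog"]]

    # sg=1 selects the skip-gram objective (Mikolov et al., 2013)
    model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)
    print(model.wv["dog"][:5])             # a learned dense vector
    print(model.wv.most_similar("dog"))    # nearest neighbors by cosine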
◮ v_good + v_day = v_{good day}
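This additive behavior is easy to exercise with plain vector arithmetic; a sketch assuming a dict of embeddings as in the earlier snippets (the bigram key good_day is hypothetical):

    import numpy as np

    def compose(words, E):
        """Additive composition: sum the normalized word vectors."""
        return sum(E[w] / np.linalg.norm(E[w]) for w in words)

    # With trained embeddings, v_good + v_day tends to land near vectors
    # for related phrases; random toy vectors only show the mechanics.
    E = {w: np.random.randn(50) for w in ["good", "day", "good_day"]}
    phrase_vec = compose(["good", "day"], E)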
◮ In particular, long short-term memory networks (LSTMs; Hochreiter and Schmidhuber, 1997)
◮ These tasks traditionally relied on syntactic information
◮ Many of these results come from the UW NLP group
◮ Machine translation (Ling et al., 2015)
◮ Syntactic parsing (Ballesteros et al., 2015)
◮ Synonymy: high — tall
◮ Co-hyponymy: dog — cat
◮ Association: coffee — cup
◮ Dissimilarity: good — bad
◮ Attributional similarity: banana — the sun (both are yellow)
◮ Morphological similarity: going — crying (same verb tense)
◮ Schwartz et al. (2015); Rubinstein et al. (2015); Cotterell et al. (2016)
◮ Which capture general word association
◮ Dependency links (Padó and Lapata, 2007)
◮ Symmetric patterns (e.g., “X and Y”; Schwartz et al., 2015, 2016); see the sketch after this list
◮ Substitute vectors (Yatbaz et al., 2012)
◮ Morphemes (Cotterell et al., 2016)
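Of these alternative contexts, symmetric patterns are the simplest to illustrate: count word pairs joined by a pattern such as “X and Y”, in both directions. A rough sketch (the regex and corpus here are illustrative):

    import re
    from collections import Counter

    PATTERN = re.compile(r"\b(\w+) and (\w+)\b")

    def symmetric_pattern_counts(texts):
        """Count (X, Y) pairs matching 'X and Y'; the pattern is symmetric."""
        pairs = Counter()
        for text in texts:
            for x, y in PATTERN.findall(text.lower()):
                pairs[(x, y)] += 1
                pairs[(y, x)] += 1
        return pairs

    texts = ["He was loyal and friendly.", "Dogs and cats rarely agree."]
    print(symmetric_pattern_counts(texts))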
◮ Part of the model (Yu and Dredze, 2014; Kiela et al., 2015)
◮ Post-processing (Faruqui et al., 2015; Mrkšić et al., 2016)
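For the post-processing route, Faruqui et al. (2015) retrofit trained vectors toward a semantic lexicon (e.g., WordNet). A sketch of that style of iterative update, with uniform neighbor weights assumed and a toy lexicon:

    import numpy as np

    def retrofit(E, lexicon, iterations=10, beta=1.0):
        """Pull each vector toward its lexicon neighbors.
        E: dict word -> vector; lexicon: dict word -> list of neighbor words."""
        Q = {w: v.copy() for w, v in E.items()}
        for _ in range(iterations):
            for w, neighbors in lexicon.items():
                nbrs = [n for n in neighbors if n in Q]
                if w not in Q or not nbrs:
                    continue
                # New vector: weighted mean of the original vector (weight beta)
                # and the current neighbor vectors (uniform weights alpha)
                alpha = 1.0 / len(nbrs)
                Q[w] = (beta * E[w] + alpha * sum(Q[n] for n in nbrs)) \
                       / (beta + alpha * len(nbrs))
        return Q

    E = {w: np.random.randn(50) for w in ["high", "tall", "low"]}
    Q = retrofit(E, {"high": ["tall"], "tall": ["high"]})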
◮ Most prominently visual
◮ They are also able to capture visual attributes such as size and color, which are often not stated explicitly in text
◮ v_dog ∼ v_perro (English dog, Spanish perro)
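One well-known way to obtain v_dog ∼ v_perro is to train the monolingual spaces independently and learn a linear map between them from a small seed dictionary (in the spirit of Mikolov et al.'s translation-matrix work; the data below is random and purely illustrative):

    import numpy as np

    def learn_mapping(X, Y):
        """Least-squares linear map W such that X @ W ~= Y.
        Rows of X: source-language vectors; rows of Y: their translations."""
        W, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return W

    # Toy seed dictionary: 100 word pairs, 50-dim embeddings on each side
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 50))      # e.g., English vectors
    Y = rng.standard_normal((100, 50))      # e.g., Spanish translations
    W = learn_mapping(X, Y)

    # Map a source vector into the target space, then search target-side
    # neighbors by cosine:  v_dog @ W ~ v_perro
    v_mapped = X[0] @ W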