Human Language Technology: Application to Information Access
Lesson 10 Deep learning for NLP: Multilingual Word Sequence Modeling
December 15, 2016
EPFL Doctoral Course EE-724 Nikolaos Pappas Idiap Research Institute, Martigny
* Figure from Lebret's thesis, EPFL, 2016
king - man + woman ≈ queen
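The analogy above can be sketched as vector arithmetic followed by a nearest-neighbour search. This is a minimal illustration only: the 2-d vectors and the helper `analogy` are invented here, and real embeddings would come from a trained model.

```python
import numpy as np

# Toy 2-d embeddings; the values are invented for illustration only.
# Axis 0 loosely encodes "royalty", axis 1 loosely encodes "gender".
vectors = {
    "king":  np.array([0.9, 0.9]),
    "queen": np.array([0.9, -0.9]),
    "man":   np.array([0.1, 0.9]),
    "woman": np.array([0.1, -0.9]),
    "apple": np.array([-0.8, 0.1]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

result = analogy("king", "man", "woman", vectors)
print(result)  # queen
```

Excluding the three query words from the candidates is the usual convention; without it the nearest neighbour is often one of the inputs.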
➡ neural ones implicitly factorize a (shifted) PMI matrix, much as SVD does explicitly
➡ similar to count-based when using the same tricks
➡ efficient and scalable objective + toolkit
➡ intuitive formulation (= predict words in context)
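For comparison, the count-based route can be sketched as positive PMI followed by a truncated SVD. The co-occurrence counts below are invented, and scaling the singular vectors by the square root of the singular values is only one common choice, assumed here.

```python
import numpy as np

# Toy word-by-context co-occurrence counts (rows: words, columns: contexts).
# The numbers are invented for illustration.
counts = np.array([
    [4.0, 0.0, 1.0],
    [3.0, 1.0, 0.0],
    [0.0, 5.0, 2.0],
])

def ppmi(counts):
    """Positive PMI: max(0, log(P(w, c) / (P(w) P(c))))."""
    total = counts.sum()
    p_wc = counts / total
    p_w = p_wc.sum(axis=1, keepdims=True)
    p_c = p_wc.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):   # log(0) = -inf is clipped to 0 below
        pmi = np.log(p_wc / (p_w * p_c))
    return np.maximum(pmi, 0.0)

def svd_embeddings(matrix, dim):
    """Keep the top-`dim` singular directions as word vectors."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])

M = ppmi(counts)
word_vectors = svd_embeddings(M, dim=2)   # one 2-d vector per word
```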
* Figure from Gouws et al., 2015.
and analogy relations between words
* Figure from Gouws et al., 2015.
Sentence by sentence and word alignments
Sentence by sentence alignments
Documents with topic or label alignments
Word by word translations
Really!
Annotation cost low high
(Gouws et al., 2016)
(Gouws et al., 2015)
(Pappas et al., 2016)
concept = adjective-noun phrase (ANP)
(Jou et al., 2015)
(Pappas et al., 2016)
concept retrieval, clustering and sentiment prediction
usually work better than it in several multilingual or cross-lingual NLP tasks without parallel data
* Figure from Colah’s blog, 2015.
simply “likelihood of a text”: P(w1, w2, …, wt)
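By the chain rule, P(w1, …, wt) is the product of P(wi | w1, …, wi-1) over all positions. A minimal sketch, assuming a bigram approximation (each word conditioned only on the previous one) and a toy corpus invented for illustration; a recurrent language model would condition on the full history instead.

```python
import math
from collections import Counter

# Tiny toy corpus, invented for illustration.
corpus = [
    ["<s>", "the", "cat", "sat", "</s>"],
    ["<s>", "the", "dog", "sat", "</s>"],
    ["<s>", "the", "cat", "ran", "</s>"],
]

# Count each word as a history and each (history, word) pair.
history_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    for prev, word in zip(sentence, sentence[1:]):
        history_counts[prev] += 1
        bigram_counts[(prev, word)] += 1

def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev); no smoothing."""
    return bigram_counts[(prev, word)] / history_counts[prev]

def sentence_log_prob(sentence):
    """log P(w1..wt) as a sum of log P(w_i | w_{i-1})."""
    return sum(math.log(bigram_prob(p, w))
               for p, w in zip(sentence, sentence[1:]))

p = math.exp(sentence_log_prob(["<s>", "the", "cat", "sat", "</s>"]))
print(round(p, 4))  # 0.3333 = 1 * 2/3 * 1/2 * 1
```

Without smoothing, any unseen bigram has probability zero and `math.log` raises an error, which is one reason count-based models need smoothing.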
term dependencies: Hochreiter and Schmidhuber 1997
Simple RNN:
* Figure from Colah’s blog, 2015.
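The equation for the simple RNN did not survive on this slide. A commonly used form is h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), sketched below; the weight names, sizes, and random initialisation are assumptions for illustration.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) over a sequence."""
    h = np.zeros(W_hh.shape[0])          # initial hidden state h_0 = 0
    states = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)              # shape: (sequence length, hidden size)

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 3, 5
xs = rng.normal(size=(seq_len, input_dim))
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

hidden_states = rnn_forward(xs, W_xh, W_hh, b_h)
```

The same W_hh is multiplied in at every step, which is what makes gradients over long sequences shrink or blow up and motivates the gated variants.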
term dependencies: Hochreiter and Schmidhuber 1997
state regulated by “gates”
* Figure from Colah’s blog, 2015.
and input gates into a single “update gate”
zt: update gate — rt: reset gate — ht: regular RNN update
* Figure from Colah’s blog, 2015.
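One GRU step can be sketched as follows, using the convention from Colah's blog (some papers swap the roles of z and 1 - z). Biases are omitted, and the parameter names and sizes are assumptions for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, p):
    """One GRU step; `p` is a dict of weight matrices (biases omitted)."""
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev)               # update gate
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev)               # reset gate
    h_tilde = np.tanh(p["W_h"] @ x + p["U_h"] @ (r * h_prev))   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                     # interpolation

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
params = {}
for gate in ("z", "r", "h"):
    params["W_" + gate] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
    params["U_" + gate] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))

h = np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):
    h = gru_step(x, h, params)
```

When z is close to 0 the previous state is copied almost unchanged, which lets information persist across many time steps.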
(Irsoy and Cardie, 2014)
(Collobert et al., 2011) (Kim, 2014)
applied every k words:
constraining to grammatical phrases
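A filter applied to every window of k words, followed by max-over-time pooling, can be sketched as below. The ReLU non-linearity, the shapes, and the random inputs are assumptions for illustration, not details taken from the cited papers.

```python
import numpy as np

def conv_max_pool(embeddings, filters, bias):
    """Apply each filter to every window of k consecutive word vectors,
    pass the result through ReLU, then max-pool over window positions."""
    n_words, dim = embeddings.shape
    n_filters, k, _ = filters.shape
    n_windows = n_words - k + 1
    features = np.empty((n_windows, n_filters))
    for i in range(n_windows):
        window = embeddings[i:i + k]                     # shape (k, dim)
        # Dot product of every filter with the window: shape (n_filters,)
        features[i] = np.tensordot(filters, window, axes=([1, 2], [0, 1])) + bias
    features = np.maximum(features, 0.0)                 # ReLU
    return features.max(axis=0)                          # one value per filter

rng = np.random.default_rng(0)
sentence = rng.normal(size=(7, 5))     # 7 words, 5-dim embeddings
filters = rng.normal(size=(4, 3, 5))   # 4 filters over windows of k = 3 words
sentence_vector = conv_max_pool(sentence, filters, np.zeros(4))
```

Max pooling makes the output size independent of sentence length, so the resulting vector can feed a fixed-size classifier.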
(Tang et al., 2015)
to each input position given encoder hidden state for that position and the previous decoder state
(Bahdanau et al., 2015)
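The scoring described above can be sketched as additive attention: each input position j gets a score from its encoder state h_j and the previous decoder state s_{t-1}, the scores are normalised with a softmax, and the context is the weighted sum of encoder states. The parameter names W_a, U_a, v_a and all sizes are assumptions for illustration.

```python
import numpy as np

def softmax(scores):
    exp = np.exp(scores - scores.max())   # subtract max for numerical stability
    return exp / exp.sum()

def additive_attention(encoder_states, prev_decoder_state, W_a, U_a, v_a):
    """Score each input position, normalise, and build the context vector."""
    # e_j = v_a . tanh(W_a s_{t-1} + U_a h_j) for every input position j
    projected = np.tanh(prev_decoder_state @ W_a.T + encoder_states @ U_a.T)
    scores = projected @ v_a
    weights = softmax(scores)             # one weight per input position
    context = weights @ encoder_states    # weighted sum of encoder states
    return weights, context

rng = np.random.default_rng(0)
src_len, enc_dim, dec_dim, att_dim = 6, 4, 3, 5
encoder_states = rng.normal(size=(src_len, enc_dim))
prev_decoder_state = rng.normal(size=dec_dim)
W_a = rng.normal(size=(att_dim, dec_dim))
U_a = rng.normal(size=(att_dim, enc_dim))
v_a = rng.normal(size=att_dim)

weights, context = additive_attention(
    encoder_states, prev_decoder_state, W_a, U_a, v_a)
```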
hidden states (Pappas and Popescu-Belis, 2016)
respect to the target labels
(Pappas and Popescu-Belis, 2014)
* Figure from Colah’s blog, 2015.
given the source sequence:
(Cho et al., 2014)
(Sutskever et al., 2014)
(Bahdanau et al., 2015)
the needed information in the last encoder state?
(Luong et al., 2015)
(Zoph and Knight, 2016)
directly on trilingual data
any (f, g) pair
model and concatenate context from multiple sources
(Zoph and Knight, 2016)
French English and German English pairs
(Dong et al., 2015)
language translation
(Dong et al., 2015)
and Moses baselines
datasets
convergence in multiple language translation
(Firat et al., 2016)
multiple encoders and decoders shared across pairs
expensive O(L^2)
all pairs!
Figure: n_th encoder and m_th decoder at timestep t / φ makes encoder & decoder states compatible with the attention mechanism / f_adp makes context vector compatible with the decoder → all these transformations to support different types of encoders/decoders for different languages!
(Firat et al., 2016)
resource languages
bigger the improvement
source or target language for all pairs → better decoder ?
(Wu et al., 2016)
multiple pairs of languages jointly and with other tasks → Image captioning, Speech recognition !
(Luong, Cho, Manning tutorial, 2016)
in document classification will be key
→ Effective Attention / Memory?
* Figure from Colah’s blog, 2015.
(Le et al., 2014)
document-level sentiment classification
(Le et al., 2014)
(Kim, 2014)
hidden units during back-propagation)
(Denil et al., 2014)
(Tang et al., 2015)
recurrent, recursive NNs
(Pappas and Popescu-Belis, 2014)
(Pappas and Popescu-Belis, 2014)
superior to alternatives
but to a different extent
features used
without using:
(Pappas and Popescu-Belis, 2016)
(Yang et al., 2016)
structure as Tang et al., 2015 except average pooling
word and document levels
target classes and examples
support multiple languages but only monolingual classification is possible
there is a lack of parallel data
establishment of neural methods
1188-1196. 2014.
summarising documents with a single convolutional neural network." arXiv preprint arXiv:1406.3830, 2014.
document classification." In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.
classification." In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1422-1432, 2015.
neural machine translation." Computer Speech & Language, 2016.
visual sentiment concept matching.” In International Conference of Multimedia Retrieval, 2016.
based sentiment analysis." In Conference on Empirical Methods in Natural Language Processing, 2014.
Under review.
"Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation." arXiv preprint arXiv:1609.08144, 2016.
In Proceedings of the 53rd Annual Meeting of the ACL and the 7th International Joint Conference on Natural Language Processing, pp. 1723-1732. 2015.
translation." arXiv preprint arXiv:1508.04025, 2015.
translate." arXiv preprint arXiv:1409.0473, 2014.
neural information processing systems, pp. 3104-3112. 2014.
Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078, 2014.
Information Processing Systems, pp. 2096-2104. 2014.
networks on sequence modeling." arXiv preprint arXiv:1412.3555 (2014).
Visual Concept Retrieval and Clustering.”, under review, 2016.
2012.
preprint arXiv:1312.6173, 2013.
Association for Computational Linguistics, 2014.
indexing for cross-lingual nlp." In The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2015), 2015.
multilingual word embeddings." arXiv preprint arXiv:1602.01925, 2016.
distributed representations." In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol. 1, pp. 1234-1244, 2015.
model." arXiv preprint arXiv:1501.02598, 2015.
arXiv:1506.01070 (2015).
➡ Online courses
https://www.coursera.org/learn/neural-networks
https://www.coursera.org/learn/machine-learning
http://cs224d.stanford.edu/
➡ Conference tutorials
tutorial. http://nlp.stanford.edu/courses/NAACL2013/
Concepts to Documents”, EMNLP 2015 tutorial. http://www.emnlp2015.org/tutorials.html#t1
Processing”, NAACL 2016 tutorial. http://naacl.org/naacl-hlt-2016/t2.html
➡ Deep learning toolkits
➡ Pre-trained word vectors and codes
https://code.google.com/p/word2vec/
http://nlp.stanford.edu/projects/glove/
https://github.com/rlebret/hpca
http://wordvectors.org/