On the Limitations of Unsupervised Bilingual Dictionary Induction
Sebastian Ruder Ivan Vulić Anders Søgaard
Background: Unsupervised MT
Recently: unsupervised neural machine translation (Artetxe et al., ICLR 2018; Lample et al., ICLR 2018)
Initialization via unsupervised cross-lingual alignment of word embedding spaces
… transfer
… space into another by learning a transformation matrix $W$ between $n$ source embeddings $x_i$ and their translations $y_i$ (Mikolov et al., 2013):
$$\sum_{i=1}^{n} \lVert W x_i - y_i \rVert^2$$
… unsupervised mapping
… isomorphic, i.e. same number of vertices, connected the same way.
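The supervised mapping objective above can be illustrated with a small NumPy sketch. It assumes the sum is minimised over $W$ with no constraint on $W$ (the slide shows only the sum itself), and it runs on synthetic vectors rather than real embeddings. The function name `fit_linear_map` and all toy data are hypothetical.

```python
import numpy as np

def fit_linear_map(X, Y):
    """Return W minimising sum_i ||W x_i - y_i||^2 (unconstrained least squares).

    X: (n, d_src) source embeddings, one row per dictionary entry.
    Y: (n, d_tgt) embeddings of the corresponding translations.
    """
    # Row-wise, W x_i = y_i reads X @ W.T = Y, an ordinary least-squares problem.
    W_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W_T.T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))      # synthetic "source" vectors (assumption)
W_true = rng.normal(size=(5, 5))   # hidden map used only to generate the targets
Y = X @ W_true.T                   # synthetic, noise-free "translations"

W = fit_linear_map(X, Y)
loss = np.sum((X @ W.T - Y) ** 2)  # value of the objective at the fitted W
```

On noise-free synthetic data the fitted `W` should match `W_true` up to floating-point error; with real embeddings the residual would be non-zero.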
… words in English and German are not isomorphic.
… translations into German
[Figure panels: English, German]
… translations
[Figure panels: English, German]
Word embeddings are not approximately isomorphic across languages.
… $G_1$ and $G_2$ of different languages are …
$A_1, A_2$: adjacency matrices of $G_1, G_2$
$D_1, D_2$: degree matrices of $G_1, G_2$
$L_1 = D_1 - A_1$, $L_2 = D_2 - A_2$: Laplacians of $G_1, G_2$
$\lambda_1, \lambda_2$: eigenvalues of $L_1, L_2$
$$\Delta = \sum_{i=1}^{k} (\lambda_{1i} - \lambda_{2i})^2, \qquad k = \min_j \left\{ \frac{\sum_{i=1}^{k} \lambda_{ji}}{\sum_{i=1}^{n} \lambda_{ji}} > 0.9 \right\}$$
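The quantities listed above can be computed in a few lines of NumPy. The sketch below is one possible reading, not the authors' implementation: it assumes symmetric 0/1 adjacency matrices, sorts Laplacian eigenvalues from largest to smallest, and takes $k$ as the smaller of the two per-graph values at which the leading eigenvalues first exceed 90% of the eigenvalue sum. All function names and the toy graphs are hypothetical.

```python
import numpy as np

def laplacian_eigenvalues(A):
    """Eigenvalues of L = D - A for a symmetric adjacency matrix, largest first."""
    D = np.diag(A.sum(axis=1))
    return np.linalg.eigvalsh(D - A)[::-1]

def smallest_k(eigs, threshold=0.9):
    """Smallest k whose k largest eigenvalues exceed `threshold` of the total."""
    ratios = np.cumsum(eigs) / eigs.sum()
    return int(np.argmax(ratios > threshold)) + 1

def eigen_similarity(A1, A2, threshold=0.9):
    """Sum of squared differences between the k leading Laplacian eigenvalues."""
    e1, e2 = laplacian_eigenvalues(A1), laplacian_eigenvalues(A2)
    k = min(smallest_k(e1, threshold), smallest_k(e2, threshold))
    return float(np.sum((e1[:k] - e2[:k]) ** 2))

# Toy graphs on 4 vertices (assumption): a path and a cycle.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
cycle = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)

# The same path with its vertices relabelled, i.e. an isomorphic graph.
perm = [2, 0, 3, 1]
path_relabelled = path[np.ix_(perm, perm)]

delta_isomorphic = eigen_similarity(path, path_relabelled)
delta_different = eigen_similarity(path, cycle)
```

Relabelling vertices leaves the Laplacian spectrum unchanged, so `delta_isomorphic` is zero up to floating-point error, while the path and the cycle give a strictly positive value.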
… they have the same spectrum (same sets of eigenvalues).
Isomorphic → isospectral, but isospectral ↛ isomorphic.
$\Delta : G_1, G_2 \to [0, \infty)$
$\Delta = 0$: $G_1, G_2$ …
$\Delta \to \infty$: $G_1, G_2$ …
                           | Conneau et al. (2018)                     | This work
Languages                  | Dependent-marking, fusional and isolating | Agglutinative, many cases
Corpora                    | Comparable (Wikipedia)                    | Different domains
Algorithms/hyperparameters | Same                                      | Different
Learn monolingual vector spaces $X$ and $Y$.
Learn a translation matrix $W$. Train discriminator to discriminate samples from $WX$ and $Y$.
Build bilingual dictionary of frequent words using $W$. Learn a new $W$ based on frequent word pairs.
Use similarity measure that increases similarity of isolated word vectors, decreases similarity of vectors in dense areas.
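The similarity measure in the last step matches the description of cross-domain similarity local scaling (CSLS) in Conneau et al. (2018). Assuming that is the measure meant, a minimal NumPy sketch follows. The function name `csls_scores`, the neighbourhood size `k`, and the random toy vectors are assumptions for illustration.

```python
import numpy as np

def csls_scores(mapped_src, tgt, k=10):
    """CSLS-style score matrix between mapped source rows and target rows.

    score(x, y) = 2 * cos(x, y) - r_src(x) - r_tgt(y), where r_src(x) is the mean
    cosine of x to its k nearest target vectors and r_tgt(y) the mean cosine of y
    to its k nearest mapped source vectors.
    """
    s = mapped_src / np.linalg.norm(mapped_src, axis=1, keepdims=True)
    t = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    cos = s @ t.T                                   # shape (n_src, n_tgt)
    k_t = min(k, cos.shape[1])                      # cannot exceed number of targets
    k_s = min(k, cos.shape[0])                      # cannot exceed number of sources
    r_src = np.sort(cos, axis=1)[:, -k_t:].mean(axis=1)
    r_tgt = np.sort(cos, axis=0)[-k_s:, :].mean(axis=0)
    # Vectors in dense areas have large r values and are penalised;
    # isolated vectors have small r values and are penalised less.
    return 2 * cos - r_src[:, None] - r_tgt[None, :]

rng = np.random.default_rng(0)
src_demo = rng.normal(size=(6, 4))   # stands in for the mapped source vectors WX
tgt_demo = rng.normal(size=(8, 4))   # stands in for the target vectors Y
scores = csls_scores(src_demo, tgt_demo, k=3)
best = scores.argmax(axis=1)         # index of the best-scoring target per source word
```

Subtracting the two neighbourhood terms is what lowers the score of "hub" vectors that are close to many words, relative to plain cosine retrieval.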
… target language word in the cross-lingual embedding space
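The fragment above appears to describe retrieving a target-language word in the shared space, and the result slides report P@1. Assuming P@1 means the fraction of test words whose nearest target-language vector is a gold translation, the metric could be sketched as follows. The function name, the cosine retrieval, and the toy vectors are assumptions.

```python
import numpy as np

def precision_at_1(mapped_src, tgt, gold):
    """Fraction of source words whose nearest target vector (cosine) is a gold translation.

    gold: dict mapping a source row index to a set of acceptable target row indices.
    """
    s = mapped_src / np.linalg.norm(mapped_src, axis=1, keepdims=True)
    t = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    nearest = (s @ t.T).argmax(axis=1)   # nearest target row for each source row
    hits = sum(int(nearest[i]) in targets for i, targets in gold.items())
    return hits / len(gold)

# Toy 2-d example (assumption): three target words, three mapped source words.
tgt = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
mapped_src = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.5]])
gold = {0: {0}, 1: {1}, 2: {1}}          # the third word's gold translation is target 1
p_at_1 = precision_at_1(mapped_src, tgt, gold)
```

In this toy case the first two words retrieve their gold translation and the third retrieves target 0 instead, so two of three are correct.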
          | Conneau et al. (2018)                                | This work
Languages | (English to) French, German, Chinese, Russian, Spanish | Estonian (ET), Finnish (FI), Greek (EL), Hungarian (HU), Polish (PL), Turkish
[Figure: P@1 by language pair (EN-ES, EN-ET, EN-FI, EN-EL, EN-HU, EN-PL, EN-TR, ET-FI); series: Unsupervised (Adversarial), Weakly supervised (Identical strings)]
… are not isolating and not dependent marking
… similar language pairs and better results for dissimilar pairs
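The "Weakly supervised (Identical strings)" series relies on word forms that are spelled identically in both languages. A minimal sketch of how such a seed dictionary could be collected, assuming it is simply the set of strings present in both vocabularies; the function name and the toy vocabularies are hypothetical.

```python
def identical_string_seeds(src_vocab, tgt_vocab):
    """Pair every word form that occurs in both vocabularies with itself."""
    shared = set(src_vocab) & set(tgt_vocab)
    # Keep source-vocabulary order (often frequency-ranked) so the result is deterministic.
    return [(w, w) for w in src_vocab if w in shared]

# Toy vocabularies (assumption), English and Spanish.
en_vocab = ["the", "hotel", "taxi", "house", "google"]
es_vocab = ["la", "hotel", "casa", "taxi", "google"]
seeds = identical_string_seeds(en_vocab, es_vocab)
```

The resulting pairs can then serve as the word pairs from which a mapping is learned, in place of a hand-built dictionary.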
[Figure: BDI performance plotted against eigenvector similarity]
… performance ($\rho \sim 0.89$)
… EuroParl (EP), Wikipedia (Wiki), Medical (EMEA)
[Figure: English-Spanish. P@1 and domain similarity for each corpus pair (EP-EP, EP-Wiki, EP-EMEA, Wiki-EP, Wiki-Wiki, Wiki-EMEA, EMEA-EP, EMEA-Wiki, EMEA-EMEA); series: Domain similarity, Unsupervised, Weakly supervised]
… dissimilar
[Figure: English-Finnish. P@1 and domain similarity for each corpus pair (EP-EP, EP-Wiki, EP-EMEA, Wiki-EP, Wiki-Wiki, Wiki-EMEA, EMEA-EP, EMEA-Wiki, EMEA-EMEA); series: Domain similarity, Unsupervised, Weakly supervised]
… generalising across dissimilar languages
[Figure: English-Hungarian. P@1 and domain similarity for each corpus pair (EP-EP, EP-Wiki, EP-EMEA, Wiki-EP, Wiki-Wiki, Wiki-EMEA, EMEA-EP, EMEA-Wiki, EMEA-EMEA); series: Domain similarity, Unsupervised, Weakly supervised]
… performance still deteriorates
[Figure: P@1 for English-Spanish (skipgram) and English-Spanish (cbow); hyperparameter settings: ==, win=10, ngrams=2-7, and win=10 with ngrams=2-7, the latter three marked ≠]
… wildly different structures.
[Figure: P@1 by language pair (EN-ES, EN-ET, EN-FI, EN-EL, EN-HU, EN-PL, EN-TR); series: 300-dimensional embeddings, 40-dimensional embeddings]
… dissimilar language pairs (Estonian, Finnish, Greek).
… peculiarities of languages.
Performance on verbs is lowest across the board.
Sensitivity to frequency for Hungarian, but less so for Spanish.
Lower precision due to loan words/proper names. High precision for free with weak supervision.
… isomorphic across languages.
… relatedness of two monolingual vector spaces.
… unsupervised bilingual dictionary induction performance.