On the Limitations of Unsupervised Bilingual Dictionary Induction

Sebastian Ruder, Ivan Vulić, Anders Søgaard

Background: Unsupervised MT

  • Recently: Unsupervised neural machine translation
    (Artetxe et al., ICLR 2018; Lample et al., ICLR 2018)
  • Key component: Initialization via unsupervised cross-lingual
    alignment of word embedding spaces

Background: Cross-lingual word embeddings

  • Cross-lingual word embeddings enable cross-lingual transfer
  • Most common approach: Project one word embedding space into another
    by learning a transformation matrix W between n source embeddings
    x_i and their translations y_i (Mikolov et al., 2013):

      min_W ∑_{i=1}^{n} ∥Wx_i − y_i∥²

  • More recently: Use an adversarial setup to learn an unsupervised
    mapping
  • Assumption: Word embedding spaces are approximately isomorphic,
    i.e. same number of vertices, connected the same way
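The objective above has a closed-form solution. A minimal numpy sketch (my illustration, not the authors' code; `learn_mapping` and the toy rotation are hypothetical): the unconstrained case is ordinary least squares, and with an orthogonality constraint on W (common in follow-up work) the minimiser is the orthogonal Procrustes solution.

```python
import numpy as np

def learn_mapping(X, Y, orthogonal=True):
    """Learn W minimising sum_i ||W x_i - y_i||^2 over paired rows of X, Y.

    With an orthogonality constraint on W, the minimiser is the
    Procrustes solution W = U V^T from the SVD of Y^T X.
    """
    if orthogonal:
        u, _, vt = np.linalg.svd(Y.T @ X)
        return u @ vt
    # Unconstrained least squares: solve X W^T ~ Y column-wise.
    return np.linalg.lstsq(X, Y, rcond=None)[0].T

# Toy check: recover a known rotation from 100 paired "embeddings".
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
R = np.eye(4)
R[:2, :2] = [[np.cos(0.3), -np.sin(0.3)], [np.sin(0.3), np.cos(0.3)]]
Y = X @ R.T            # y_i = R x_i
W = learn_mapping(X, Y)
assert np.allclose(W, R, atol=1e-6)
```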

How similar are embeddings across languages?

  • Nearest neighbour (NN) graphs of the top 10 most frequent words in
    English and German are not isomorphic.
  • NN graphs of the top 10 most frequent English words and their
    translations into German: not isomorphic.
  • NN graphs of the top 10 most frequent English nouns and their
    translations: not isomorphic.

  [Figure: English and German NN graphs, side by side]

Word embeddings are not approximately isomorphic across languages.
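A minimal sketch of the kind of check involved (my illustration, with random vectors standing in for the top-10 word embeddings): build each language's nearest-neighbour graph and compare sorted in-degree sequences, a necessary (though not sufficient) condition for isomorphism.

```python
import numpy as np

def nn_edges(vectors):
    """Edges i -> j where j is i's (cosine) nearest neighbour."""
    V = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = V @ V.T
    np.fill_diagonal(sims, -np.inf)          # exclude self-similarity
    return [(i, int(j)) for i, j in enumerate(sims.argmax(axis=1))]

def in_degree_sequence(edges, n):
    """Sorted in-degrees; isomorphic graphs must share this sequence."""
    deg = [0] * n
    for _, j in edges:
        deg[j] += 1
    return sorted(deg)

rng = np.random.default_rng(0)
emb_en = rng.normal(size=(10, 50))   # stand-ins for the top-10 word
emb_de = rng.normal(size=(10, 50))   # vectors of two languages
seq_en = in_degree_sequence(nn_edges(emb_en), 10)
seq_de = in_degree_sequence(nn_edges(emb_de), 10)
print(seq_en == seq_de)   # a mismatch already rules out isomorphism
```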

How do we quantify similarity?

  • Need a metric to measure how similar two NN graphs G1 and G2 of
    different languages are
  • Propose eigenvector similarity
  • A1, A2: adjacency matrices of G1, G2
  • D1, D2: degree matrices of G1, G2
  • L1 = D1 − A1, L2 = D2 − A2: Laplacians of G1, G2
  • λ1, λ2: eigenvalues (spectra) of L1, L2
  • Metric:

      Δ = ∑_{i=1}^{k} (λ_{1i} − λ_{2i})²

    where k is the smallest number of eigenvalues needed to cover 90%
    of either spectrum:

      k = min_j min{ k : (∑_{i=1}^{k} λ_{ji}) / (∑_{i=1}^{n} λ_{ji}) > 0.9 }

  • Quantifies how much the two NN graphs are isospectral, i.e. have
    the same spectrum (same sets of eigenvalues)
  • Isomorphic → isospectral, but isospectral ↛ isomorphic
  • Δ : G1, G2 → [0, ∞)
  • Δ = 0: G1, G2 are isospectral (very similar)
  • Δ → ∞: G1, G2 become less similar
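The metric can be sketched in numpy, assuming symmetric adjacency matrices (`eigenvector_similarity` is a hypothetical helper name, not the authors' code): form each Laplacian, take its eigenvalues in descending order, truncate both spectra at the 90% mass point, and sum squared differences.

```python
import numpy as np

def eigenvector_similarity(A1, A2):
    """Δ between two NN graphs given symmetric adjacency matrices.

    Compares Laplacian spectra over the k largest eigenvalues that
    cover >90% of each spectrum's mass (k = min over the two graphs).
    """
    def spectrum(A):
        L = np.diag(A.sum(axis=1)) - A          # Laplacian L = D - A
        return np.linalg.eigvalsh(L)[::-1]      # eigenvalues, descending

    def k_for(lam):
        ratios = np.cumsum(lam) / lam.sum()
        return int(np.searchsorted(ratios, 0.9) + 1)

    lam1, lam2 = spectrum(A1), spectrum(A2)
    k = min(k_for(lam1), k_for(lam2))
    return float(np.sum((lam1[:k] - lam2[:k]) ** 2))

# Identical graphs are isospectral: Δ = 0.
ring = np.roll(np.eye(6), 1, axis=1)
ring = ring + ring.T                            # undirected 6-cycle
print(eigenvector_similarity(ring, ring))       # 0.0
```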
Unsupervised cross-lingual learning assumptions

  • Besides isomorphism, several other implicit assumptions
  • May or may not scale to low-resource languages

                                Conneau et al. (2018)                     This work
  Languages                     Dependent-marking, fusional, isolating    Agglutinative, many cases
  Corpora                       Comparable (Wikipedia)                    Different domains
  Algorithms/hyperparameters    Same                                      Different

Conneau et al. (2018)

  • 1. Monolingual word embeddings:
    Learn monolingual vector spaces X and Y.
  • 2. Adversarial mapping:
    Learn a translation matrix W. Train a discriminator to
    discriminate samples from WX and Y.
  • 3. Refinement (Procrustes analysis):
    Build a bilingual dictionary of frequent words using W.
    Learn a new W based on the frequent word pairs.
  • 4. Cross-domain similarity local scaling (CSLS):
    Use a similarity measure that increases the similarity of isolated
    word vectors and decreases the similarity of vectors in dense areas.
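Step 4 can be sketched in numpy (my paraphrase of the CSLS measure, not the authors' code; the `k`-neighbour default is illustrative): each cosine score is discounted by the average similarity of both words to their nearest neighbours, penalising "hub" vectors in dense areas.

```python
import numpy as np

def csls(mapped_src, tgt, k=10):
    """CSLS scores between mapped source vectors (WX) and target vectors Y.

    CSLS(x, y) = 2 cos(x, y) - r_T(x) - r_S(y), where r_T(x) is the mean
    cosine of x to its k nearest target neighbours (r_S symmetrically).
    """
    S = mapped_src / np.linalg.norm(mapped_src, axis=1, keepdims=True)
    T = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    cos = S @ T.T
    r_src = np.sort(cos, axis=1)[:, -k:].mean(axis=1)   # r_T per source word
    r_tgt = np.sort(cos, axis=0)[-k:, :].mean(axis=0)   # r_S per target word
    return 2 * cos - r_src[:, None] - r_tgt[None, :]

rng = np.random.default_rng(0)
scores = csls(rng.normal(size=(20, 8)), rng.normal(size=(30, 8)), k=5)
translations = scores.argmax(axis=1)    # best target index per source word
```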

A simple weakly supervised method

  • Extract identically spelled words in both languages
  • Use these as bilingual seed words
  • Run the refinement step of Conneau et al. (2018)
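The seed extraction in the first two steps fits in a few lines; a sketch (my illustration, with a hypothetical `min_len` filter to skip single characters):

```python
def identical_seed_pairs(src_vocab, tgt_vocab, min_len=2):
    """Seed dictionary from words spelled identically in both vocabularies."""
    shared = {w for w in src_vocab if len(w) >= min_len} & set(tgt_vocab)
    return sorted((w, w) for w in shared)

en = ["the", "house", "computer", "radio", "water"]
de = ["das", "haus", "computer", "radio", "wasser"]
print(identical_seed_pairs(en, de))
# [('computer', 'computer'), ('radio', 'radio')]
```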

Experiments: Bilingual dictionary induction

  • Given a list of source language words, find the closest target
    language word in the cross-lingual embedding space
  • Compare against a gold standard dictionary
  • Metric: Precision at 1 (P@1)
  • Use fastText monolingual embeddings

  Languages — Conneau et al. (2018): (English to) French, German,
  Chinese, Russian, Spanish
  Languages — this work: (English to) Estonian (ET), Finnish (FI),
  Greek (EL), Hungarian (HU), Polish (PL), Turkish (TR)
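The P@1 evaluation can be sketched as follows (an illustration with toy data, not the evaluation script used in the paper): a source word counts as correct if its single nearest target word is among its gold translations.

```python
import numpy as np

def precision_at_1(src_vecs, tgt_vecs, gold):
    """P@1: fraction of source words whose (cosine) nearest target word
    is a gold translation. gold maps src index -> set of tgt indices."""
    S = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    T = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    preds = (S @ T.T).argmax(axis=1)
    hits = sum(int(preds[i]) in tgts for i, tgts in gold.items())
    return hits / len(gold)

# Toy example: targets are slightly noisy copies of the sources.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 16))
tgt = src + 0.01 * rng.normal(size=(50, 16))
gold = {i: {i} for i in range(50)}
print(precision_at_1(src, tgt, gold))  # 1.0 at this tiny noise level
```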

Impact of language similarity

  [Chart: P@1 for EN-ES, EN-ET, EN-FI, EN-EL, EN-HU, EN-PL, EN-TR,
   ET-FI; unsupervised (adversarial) vs weakly supervised (identical
   strings)]

  • Unsupervised approaches are challenged by languages that are not
    isolating and not dependent-marking
  • Naive supervision leads to competitive performance on similar
    language pairs and better results for dissimilar pairs

slide-60
SLIDE 60

14

Impact of language similarity

Eigenvector similarity 2 4 6 8 BDI performance 22.5 45 67.5 90

slide-61
SLIDE 61

14

Impact of language similarity

Eigenvector similarity 2 4 6 8 BDI performance 22.5 45 67.5 90

slide-62
SLIDE 62

14

Impact of language similarity

  • Eigenvector similarity strongly correlates with BDI

performance

Eigenvector similarity 2 4 6 8 BDI performance 22.5 45 67.5 90

(ρ ∼ 0.89)

Impact of domain differences

  • Source and target embeddings induced on 3 corpora:
    EuroParl (EP), Wikipedia (Wiki), Medical (EMEA)

  [Chart, English-Spanish: P@1 and domain similarity for the nine
   source-target corpus pairings (EP, Wiki, EMEA × EP, Wiki, EMEA),
   unsupervised vs weakly supervised]

  • Unsupervised approaches break down when domains are dissimilar

Impact of domain differences

  [Chart, English-Finnish: P@1 and domain similarity across the
   EP/Wiki/EMEA source-target corpus pairings]

  • Domain differences may exacerbate the difficulties of generalising
    across dissimilar languages

Impact of domain differences

  [Chart, English-Hungarian: P@1 and domain similarity across the
   EP/Wiki/EMEA source-target corpus pairings]

  • Weak supervision helps to bridge domain differences, but
    performance still deteriorates

Impact of hyper-parameters

  • Settings: English with skipgram, win=2, ngrams=3-6
  • Vary hyper-parameters of the Spanish embeddings

  [Chart: P@1 for English-Spanish with matching hyper-parameters vs
   win=10, ngrams=2-7, and both combined, for skipgram and cbow on the
   Spanish side]

  • Different algorithms induce embedding spaces with wildly different
    structures

Impact of dimensionality

  [Chart: P@1 for EN-ES, EN-ET, EN-FI, EN-EL, EN-HU, EN-PL, EN-TR with
   300-dimensional vs 40-dimensional embeddings]

  • 40 dimensions: worse performance overall, but better performance
    for dissimilar language pairs (Estonian, Finnish, Greek)
  • Monolingual word embeddings may overfit to rare peculiarities of
    languages

Impact of evaluation procedure

  • Part-of-speech:
    Performance on verbs is lowest across the board.
  • Frequency:
    Sensitivity to frequency for Hungarian, but less so for Spanish.
  • Homographs:
    Lower precision due to loan words/proper names. High precision for
    free with weak supervision.

Takeaways

  • Word embedding spaces are not approximately isomorphic across
    languages.
  • We can use eigenvector similarity to characterise the relatedness
    of two monolingual vector spaces.
  • Eigenvector similarity strongly correlates with unsupervised
    bilingual dictionary induction performance.
  • Limitations of unsupervised bilingual dictionary induction:
    • Morphologically rich languages.
    • Corpora from different domains.
    • Different word embedding algorithms.