SLIDE 1

On the Limitations of Unsupervised Bilingual Dictionary Induction

Sebastian Ruder, Ivan Vulić, Anders Søgaard

SLIDE 5

Background: Unsupervised MT

Recently: Unsupervised neural machine translation (Artetxe et al., ICLR 2018; Lample et al., ICLR 2018)

Key component: Initialization via unsupervised cross-lingual alignment of word embedding spaces

SLIDE 11

Background: Cross-lingual word embeddings

Cross-lingual word embeddings enable cross-lingual transfer

Most common approach: Project one word embedding space into another by learning a transformation matrix $W$ between source embeddings $x_i$ and their translations $y_i$ (Mikolov et al., 2013), minimising over $n$ translation pairs:

$$\sum_{i=1}^{n} \| W x_i - y_i \|^2$$

More recently: Use an adversarial setup to learn an unsupervised mapping

Assumption: Word embedding spaces are approximately isomorphic, i.e. same number of vertices, connected the same way.
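
The supervised objective above has a closed-form least-squares solution. The following is a minimal sketch, assuming the embeddings are stored as rows of NumPy arrays; the function name `learn_mapping` and the toy data are illustrative and not taken from the slides.

```python
import numpy as np

def learn_mapping(X, Y):
    """Return W minimising sum_i ||W x_i - y_i||^2.

    X: (n, d) source embeddings, one row per seed word.
    Y: (n, d) embeddings of the corresponding translations.
    """
    # With rows as vectors, W x_i = y_i for all i reads X @ W.T = Y,
    # which is an ordinary least-squares problem in W.T.
    W_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W_T.T

# Toy check: recover a known linear map from noiseless pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
W_true = rng.normal(size=(4, 4))
Y = X @ W_true.T
W = learn_mapping(X, Y)
```

With noiseless toy data the recovered `W` matches `W_true` up to floating-point error; real embedding spaces only admit an approximate fit.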

SLIDE 15

How similar are embeddings across languages?

Nearest neighbour (NN) graphs of top 10 most frequent words in English and German are not isomorphic.

[Figure: NN graphs of top 10 most frequent English words and their translations into German. Panels: English, German. Not isomorphic.]

SLIDE 19

How similar are embeddings across languages?

[Figure: NN graphs of top 10 most frequent English nouns and their translations. Panels: English, German. Not isomorphic.]

Word embeddings are not approximately isomorphic across languages.
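
A nearest neighbour graph of the kind shown on these slides can be built from a small embedding matrix. This is only a sketch under stated assumptions: cosine similarity, one neighbour per word by default, and undirected edges. The slides do not specify these choices, and `nn_graph` is a hypothetical helper name.

```python
import numpy as np

def nn_graph(embeddings, k=1):
    """Adjacency matrix of an undirected k-nearest-neighbour graph.

    Assumptions (not stated on the slides): cosine similarity,
    and an edge i-j whenever j is among the k nearest neighbours of i.
    """
    E = np.asarray(embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    cos = E @ E.T
    np.fill_diagonal(cos, -np.inf)           # a word is not its own neighbour
    neighbours = np.argsort(-cos, axis=1)[:, :k]
    n = E.shape[0]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    return np.maximum(A, A.T)                # symmetrise

# Three toy vectors: 0 and 1 are close, 2 is nearest to 1.
E = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
A = nn_graph(E)
```

Two such adjacency matrices, one per language, are the inputs a graph-comparison metric would operate on.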

SLIDE 27

How do we quantify similarity?

Need a metric to measure how similar two NN graphs $G_1$ and $G_2$ of different languages are

Propose eigenvector similarity
$A_1, A_2$: adjacency matrices of $G_1, G_2$
$D_1, D_2$: degree matrices of $G_1, G_2$
$L_1 = D_1 - A_1$, $L_2 = D_2 - A_2$: Laplacians of $G_1, G_2$
$\lambda_1, \lambda_2$: eigenvalues (spectra) of $L_1, L_2$

Metric: $\Delta = \sum_{i=1}^{k} (\lambda_{1i} - \lambda_{2i})^2$ where $k = \min_j \left\{ \frac{\sum_{i=1}^{k} \lambda_{ji}}{\sum_{i=1}^{n} \lambda_{ji}} > 0.9 \right\}$

SLIDE 33

How do we quantify similarity?

Metric: $\Delta = \sum_{i=1}^{k} (\lambda_{1i} - \lambda_{2i})^2$ where $k = \min_j \left\{ \frac{\sum_{i=1}^{k} \lambda_{ji}}{\sum_{i=1}^{n} \lambda_{ji}} > 0.9 \right\}$

Quantifies how much two NN graphs are isospectral, i.e. they have the same spectrum (same sets of eigenvalues).

Isomorphic → isospectral, but isospectral ↛ isomorphic
$\Delta : G_1, G_2 \to [0, \infty)$
$\Delta = 0$: $G_1, G_2$ are isospectral (very similar)
$\Delta \to \infty$: $G_1, G_2$ become less similar
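
The metric can be sketched in a few lines of NumPy. The code below is an illustration, not the authors' implementation. It assumes symmetric 0/1 adjacency matrices with at least one edge, and that eigenvalues are sorted in descending order before the 0.9 threshold is applied. The function names are hypothetical.

```python
import numpy as np

def laplacian_spectrum(A):
    """Eigenvalues of L = D - A, largest first (ordering is an assumption)."""
    D = np.diag(A.sum(axis=1))
    eigenvalues = np.linalg.eigvalsh(D - A)   # ascending, L is symmetric
    return eigenvalues[::-1]

def smallest_k(spectrum, threshold=0.9):
    """Smallest k whose top-k eigenvalues exceed `threshold` of the total."""
    ratios = np.cumsum(spectrum) / spectrum.sum()
    return int(np.argmax(ratios > threshold)) + 1

def eigenvector_similarity(A1, A2):
    """Delta = sum_{i<=k} (lambda_1i - lambda_2i)^2, with k the smaller of the two k_j."""
    s1, s2 = laplacian_spectrum(A1), laplacian_spectrum(A2)
    k = min(smallest_k(s1), smallest_k(s2))
    return float(np.sum((s1[:k] - s2[:k]) ** 2))

# Path graph 0-1-2-3, the same path with nodes relabelled, and a star graph.
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
perm = [2, 0, 3, 1]
relabelled = path[np.ix_(perm, perm)]
star = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]], dtype=float)

delta_same = eigenvector_similarity(path, relabelled)
delta_diff = eigenvector_similarity(path, star)
```

Relabelling the nodes leaves the spectrum unchanged, so `delta_same` is zero up to rounding, while the path and the star have different spectra and give a strictly positive `delta_diff`.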

SLIDE 40

Unsupervised cross-lingual learning assumptions

Besides isomorphism, several other implicit assumptions
May or may not scale to low-resource languages

                             Conneau et al. (2018)                       This work
Languages                    Dependent-marking, fusional and isolating   Agglutinative, many cases
Corpora                      Comparable (Wikipedia)                      Different domains
Algorithms/hyperparameters   Same                                        Different

SLIDE 43

Conneau et al. (2018)

1. Monolingual word embeddings:

Learn monolingual vector spaces $X$ and $Y$.

2. Adversarial mapping:

Learn a translation matrix $W$. Train discriminator to discriminate samples from $WX$ and $Y$.

SLIDE 45

Conneau et al. (2018)

3. Refinement (Procrustes analysis):

Build bilingual dictionary of frequent words using $W$. Learn a new $W$ based on frequent word pairs.

4. Cross-domain similarity local scaling (CSLS):

Use similarity measure that increases similarity of isolated word vectors, decreases similarity of vectors in dense areas.
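
Step 4 can be illustrated with a short NumPy sketch of CSLS as defined by Conneau et al. (2018): twice the cosine similarity, minus the mean similarity of each vector to its K nearest neighbours in the other space. The function name, the default `k`, and the toy inputs are assumptions made for illustration.

```python
import numpy as np

def csls_scores(mapped_src, tgt, k=10):
    """CSLS(Wx, y) = 2 cos(Wx, y) - r_T(Wx) - r_S(y).

    mapped_src: (n_src, d) source vectors after applying W.
    tgt:        (n_tgt, d) target vectors.
    """
    src = mapped_src / np.linalg.norm(mapped_src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    cos = src @ tgt.T                                   # (n_src, n_tgt)
    k_t = min(k, cos.shape[1])
    k_s = min(k, cos.shape[0])
    # Mean similarity of each source vector to its k nearest targets.
    r_src = np.sort(cos, axis=1)[:, -k_t:].mean(axis=1)
    # Mean similarity of each target vector to its k nearest sources.
    r_tgt = np.sort(cos, axis=0)[-k_s:, :].mean(axis=0)
    return 2 * cos - r_src[:, None] - r_tgt[None, :]

# Toy case: three orthogonal vectors mapped exactly onto their translations.
vectors = np.eye(3)
scores = csls_scores(vectors, vectors, k=1)
best = scores.argmax(axis=1)
```

Vectors in dense regions have large neighbourhood terms and are penalised, which is how the measure reduces the pull of hub words during retrieval.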

SLIDE 49

A simple weakly supervised method

Extract identically spelled words in both languages
Use these as bilingual seed words
Run refinement step of Conneau et al. (2018)
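
The three steps can be sketched as follows. This is a minimal illustration, assuming vocabularies are Python lists aligned with the rows of the embedding matrices, and using a single orthogonal Procrustes solve for the refinement step. The helper names and toy vocabularies are hypothetical.

```python
import numpy as np

def identical_string_seeds(src_vocab, tgt_vocab):
    """Pairs (source index, target index) of identically spelled words."""
    tgt_index = {word: j for j, word in enumerate(tgt_vocab)}
    return [(i, tgt_index[w]) for i, w in enumerate(src_vocab) if w in tgt_index]

def procrustes(X, Y):
    """Orthogonal W minimising sum_i ||W x_i - y_i||^2 (rows of X, Y are pairs)."""
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

src_vocab = ["taxi", "dog", "hotel", "radio"]
tgt_vocab = ["hotel", "hund", "radio", "taxi"]
seeds = identical_string_seeds(src_vocab, tgt_vocab)

# Toy embeddings: shared words are related by a known rotation R.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
src_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])
tgt_emb = np.full((4, 2), 0.5)            # "hund" keeps an unrelated vector
for i, j in seeds:
    tgt_emb[j] = R @ src_emb[i]

src_idx = [i for i, _ in seeds]
tgt_idx = [j for _, j in seeds]
W = procrustes(src_emb[src_idx], tgt_emb[tgt_idx])
```

In this toy setup `W` recovers the rotation `R`. With real embeddings, identically spelled words are a noisy seed dictionary and the fit is only approximate.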

SLIDE 55

Experiments: Bilingual dictionary induction

Given a list of source language words, find the closest target language word in the cross-lingual embedding space
Compare against a gold standard dictionary
Metric: Precision at 1 (P@1)
Use fastText monolingual embeddings

            Conneau et al. (2018)                                     This work
Languages   (English to) French, German, Chinese, Russian, Spanish   Estonian (ET), Finnish (FI), Greek (EL), Hungarian (HU), Polish (PL), Turkish
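
The evaluation protocol can be sketched as nearest-neighbour retrieval followed by a dictionary lookup. The code below assumes plain cosine similarity for retrieval and a gold dictionary given as a mapping from source row index to a set of acceptable target row indices. Both the data layout and the name `precision_at_1` are illustrative.

```python
import numpy as np

def precision_at_1(mapped_src, tgt, gold):
    """Fraction of source words whose nearest target word is a gold translation.

    mapped_src: (n_src, d) source vectors in the shared space.
    tgt:        (n_tgt, d) target vectors.
    gold:       dict {source row index: set of acceptable target row indices}.
    """
    src = mapped_src / np.linalg.norm(mapped_src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    predictions = (src @ tgt.T).argmax(axis=1)      # closest target per source
    hits = sum(int(predictions[i]) in gold[i] for i in gold)
    return hits / len(gold)

src = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 2.0]])
tgt = np.array([[0.9, 0.1], [0.1, 0.9], [-1.0, 0.0]])
gold = {0: {0}, 1: {1}, 2: {2}}
p_at_1 = precision_at_1(src, tgt, gold)
```

Here two of the three toy source words retrieve their gold translation, so `p_at_1` is 2/3. Swapping the cosine scores for CSLS scores would change only the retrieval line.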

SLIDE 59

Impact of language similarity

Unsupervised approaches are challenged by languages that are not isolating and not dependent marking

Naive supervision leads to competitive performance on similar language pairs and better results for dissimilar pairs

[Figure: P@1 per language pair (EN-ES, EN-ET, EN-FI, EN-EL, EN-HU, EN-PL, EN-TR, ET-FI). Series: Unsupervised (Adversarial), Weakly supervised (Identical strings).]

SLIDE 62

Impact of language similarity

Eigenvector similarity strongly correlates with BDI performance (ρ ∼ 0.89)

[Figure: eigenvector similarity and BDI performance.]

SLIDE 69

Impact of domain differences

Source and target embeddings induced on 3 corpora: EuroParl (EP), Wikipedia (Wiki), Medical (EMEA)

[Figure: English-Spanish. P@1 and domain similarity for each corpus pair (EP-EP, EP-Wiki, EP-EMEA, Wiki-EP, Wiki-Wiki, Wiki-EMEA, EMEA-EP, EMEA-Wiki, EMEA-EMEA). Series: Domain similarity, Unsupervised, Weakly supervised.]

Unsupervised approaches break down when domains are dissimilar

SLIDE 72

Impact of domain differences

[Figure: English-Finnish. P@1 and domain similarity for each corpus pair (EP-EP, EP-Wiki, EP-EMEA, Wiki-EP, Wiki-Wiki, Wiki-EMEA, EMEA-EP, EMEA-Wiki, EMEA-EMEA). Series: Domain similarity, Unsupervised, Weakly supervised.]

Domain differences may exacerbate difficulties of generalising across dissimilar languages

SLIDE 75

Impact of domain differences

[Figure: English-Hungarian. P@1 and domain similarity for each corpus pair (EP-EP, EP-Wiki, EP-EMEA, Wiki-EP, Wiki-Wiki, Wiki-EMEA, EMEA-EP, EMEA-Wiki, EMEA-EMEA). Series: Domain similarity, Unsupervised, Weakly supervised.]

Weak supervision helps to bridge domain differences, but performance still deteriorates

SLIDE 81

Impact of hyper-parameters

Settings: English with skipgram, win=2, ngrams=3-6
Vary hyper-parameters of Spanish embeddings

[Figure: P@1 for four Spanish settings (==, ≠ win=10, ≠ ngrams=2-7, ≠ win=10, ngrams=2-7). Series: English-Spanish (skipgram), English-Spanish (cbow).]

SLIDE 82

Impact of hyper-parameters

[Figure: P@1 for four Spanish settings (==, ≠ win=10, ≠ ngrams=2-7, ≠ win=10, ngrams=2-7). Series: English-Spanish (skipgram), English-Spanish (cbow).]

Different algorithms introduce embedding spaces with wildly different structures.

SLIDE 86

Impact of dimensionality

[Figure: P@1 per language pair (EN-ES, EN-ET, EN-FI, EN-EL, EN-HU, EN-PL, EN-TR). Series: 300-dimensional embeddings, 40-dimensional embeddings.]

Worse performance overall, but better performance for dissimilar language pairs (Estonian, Finnish, Greek).

Monolingual word embeddings may overfit to rare peculiarities of languages.

SLIDE 90

Impact of evaluation procedure

Part-of-speech: Performance on verbs is lowest across the board.

Frequency: Sensitivity to frequency for Hungarian, but less so for Spanish.

Homographs: Lower precision due to loan words/proper names. High precision for free with weak supervision.

SLIDE 98

Takeaways

Word embedding spaces are not approximately isomorphic across languages.

We can use eigenvector similarity to characterise the relatedness of two monolingual vector spaces.

Eigenvector similarity strongly correlates with unsupervised bilingual dictionary induction performance.

Limitations of unsupervised bilingual dictionary induction:
Morphologically rich languages.
Corpora from different domains.
Different word embedding algorithms.
