slide-1
SLIDE 1

Learning bilingual word embeddings with (almost) no bilingual data

Mikel Artetxe, Gorka Labaka, Eneko Agirre

IXA NLP group – University of the Basque Country (UPV/EHU)

slide-18
SLIDE 18

Who cares?

word embeddings are useful!

  • inherently crosslingual tasks
  • crosslingual transfer learning

bilingual signal for training

Previous work

  • parallel corpora
  • comparable corpora
  • (big) dictionaries

This talk

  • 25 word dictionary
  • numerals (1, 2, 3…)

slide-19
SLIDE 19

Bilingual embedding mappings

slide-30
SLIDE 30

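Bilingual embedding mappings

Basque embeddings: $X$ · English embeddings: $Z$ · mapping: $W$ · mapped Basque embeddings: $XW$

Seed dictionary

Txakur - Dog
Sagar - Apple
⋮
Egutegi - Calendar

$$
\begin{bmatrix} X_{1,*} \\ X_{2,*} \\ \vdots \\ X_{n,*} \end{bmatrix} W \approx \begin{bmatrix} Z_{1,*} \\ Z_{2,*} \\ \vdots \\ Z_{n,*} \end{bmatrix}
$$

$$
\arg\min_{W \in O(n)} \sum_i \left\lVert X_{i*} W - Z_{j*} \right\rVert^2
$$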
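
An orthogonality-constrained least-squares problem of this kind has a well-known closed-form solution (orthogonal Procrustes) via the SVD. The following is a minimal NumPy sketch, not the vecmap implementation. It assumes `X` and `Z` are row-aligned, so that row `i` of each holds the two sides of the `i`-th dictionary pair. The function name `learn_orthogonal_mapping` is made up for illustration.

```python
import numpy as np

def learn_orthogonal_mapping(X, Z):
    """Return the orthogonal W minimizing sum_i ||X[i] @ W - Z[i]||^2.

    X, Z: arrays of shape (pairs, dim); row i of X is paired with row i of Z
    (an assumption of this sketch).
    """
    # With the SVD X^T Z = U S V^T, the minimizer is W = U V^T.
    U, _, Vt = np.linalg.svd(X.T @ Z)
    return U @ Vt
```

Because `W` is orthogonal, mapping with it preserves dot products and lengths within the source space.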

slide-48
SLIDE 48

Bilingual embedding mappings

Monolingual embeddings + Dictionary
→ Mapping → Dictionary (better!)
→ Mapping → Dictionary (even better!)
→ Mapping → Dictionary (even better!)

slide-53
SLIDE 53

Bilingual embedding mappings

Monolingual embeddings + Dictionary → Mapping → Dictionary → (back to Mapping)

proposed self-learning method

  • formalization and implementation details in the paper
  • based on the mapping method of Artetxe et al. (2016)

Too good to be true?
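
The self-learning loop on this slide can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' code. It runs a fixed number of iterations, where the paper describes its own stopping criterion. It builds a dense similarity matrix over the whole vocabulary, which is only practical at small scale. It assumes the rows of `X` and `Z` are length-normalized. The name `self_learning` and its parameters are hypothetical.

```python
import numpy as np

def self_learning(X, Z, seed_pairs, n_iter=10):
    """Alternate between learning a mapping and inducing a dictionary.

    X, Z: length-normalized embedding matrices, one row per word.
    seed_pairs: list of (source_index, target_index) tuples.
    n_iter: fixed iteration count (an assumption of this sketch).
    """
    src = np.array([i for i, _ in seed_pairs])
    trg = np.array([j for _, j in seed_pairs])
    for _ in range(n_iter):
        # Mapping step: orthogonal Procrustes on the current dictionary.
        U, _, Vt = np.linalg.svd(X[src].T @ Z[trg])
        W = U @ Vt
        # Dictionary step: pair every source word with its nearest target word.
        sims = (X @ W) @ Z.T
        src = np.arange(X.shape[0])
        trg = sims.argmax(axis=1)
    return W, list(zip(src.tolist(), trg.tolist()))
```

Each pass replaces the dictionary with one covering the full source vocabulary, so the next mapping is fitted on far more pairs than the seed provided.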

slide-71
SLIDE 71

Experiments

  • Dataset by Dinu et al. (2015) extended to German and Finnish

⇒ Monolingual embeddings (CBOW + negative sampling)
⇒ Seed dictionary: 5,000 word pairs / 25 word pairs / numerals
⇒ Test dictionary: 1,500 word pairs

word translation induction

| Method | EN-IT 5,000 | EN-IT 25 | EN-IT num. | EN-DE 5,000 | EN-DE 25 | EN-DE num. | EN-FI 5,000 | EN-FI 25 | EN-FI num. |
|---|---|---|---|---|---|---|---|---|---|
| Mikolov et al. (2013a) | 34.93% | 0.00% | 0.00% | 35.00% | 0.00% | 0.07% | 25.91% | 0.00% | 0.00% |
| Xing et al. (2015) | 36.87% | 0.00% | 0.13% | 41.27% | 0.07% | 0.53% | 28.23% | 0.07% | 0.56% |
| Zhang et al. (2016) | 36.73% | 0.07% | 0.27% | 40.80% | 0.13% | 0.87% | 28.16% | 0.14% | 0.42% |
| Artetxe et al. (2016) | 39.27% | 0.07% | 0.40% | 41.87% | 0.13% | 0.73% | 30.62% | 0.21% | 0.77% |
| Our method | 39.67% | 37.27% | 39.40% | 40.87% | 39.60% | 40.27% | 28.72% | 28.16% | 26.47% |
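
As a rough illustration of how a word translation induction score might be computed, the sketch below maps each test source word, retrieves its nearest target word by dot product, and counts exact matches. The slides do not spell out the retrieval procedure, so this is an assumption: the sketch assumes nearest-neighbor retrieval over length-normalized rows and a single gold translation per source word. `translation_accuracy` is a hypothetical helper name.

```python
import numpy as np

def translation_accuracy(X, Z, W, test_pairs):
    """Fraction of test pairs whose nearest mapped neighbor is the gold target.

    test_pairs: list of (source_index, gold_target_index) tuples.
    """
    src = np.array([i for i, _ in test_pairs])
    gold = np.array([j for _, j in test_pairs])
    mapped = X[src] @ W
    # For length-normalized rows, the dot product equals cosine similarity.
    predicted = (mapped @ Z.T).argmax(axis=1)
    return float((predicted == gold).mean())
```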

slide-82
SLIDE 82

Experiments

  • Dataset by Dinu et al. (2015) extended to German and Finnish

⇒ Monolingual embeddings (CBOW + negative sampling)
⇒ Seed dictionary: 5,000 word pairs / 25 word pairs / numerals

crosslingual word similarity

| Method | Bi. data | EN-IT WS | EN-DE RG | EN-DE WS |
|---|---|---|---|---|
| Luong et al. (2015) | Europarl | 33.1% | 33.5% | 35.6% |
| Mikolov et al. (2013a) | 5k dict | 62.7% | 64.3% | 52.8% |
| Xing et al. (2015) | 5k dict | 61.4% | 70.0% | 59.5% |
| Zhang et al. (2016) | 5k dict | 61.6% | 70.4% | 59.6% |
| Artetxe et al. (2016) | 5k dict | 61.7% | 71.6% | 59.7% |
| Our method | 5k dict | 62.4% | 74.2% | 61.6% |
| Our method | 25 dict | 62.6% | 74.9% | 61.2% |
| Our method | num. | 62.8% | 73.9% | 60.4% |

slide-94
SLIDE 94

Why does it work?

Monolingual embeddings + Dictionary (small, no errors)
→ Mapping → Dictionary (large, errors)
→ Mapping → Dictionary (better? worse?)
→ Mapping → Dictionary (even better? even worse?)

slide-117
SLIDE 117

Why does it work?

Implicit objective:

$$
W^* = \arg\max_W \sum_i \max_j \left( X_{i*} W \right) \cdot Z_{j*} \quad \text{s.t.} \quad W W^T = W^T W = I
$$

Independent from seed dictionary!

So why do we need a seed dictionary? Avoid poor local optima!
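
To make the "independent from seed dictionary" point concrete, here is a small NumPy sketch that evaluates the implicit objective for a given mapping. The function takes only the two embedding matrices and `W`, with no dictionary argument. The name `implicit_objective` is illustrative, and the dense similarity matrix is an assumption that only suits small vocabularies.

```python
import numpy as np

def implicit_objective(X, Z, W):
    """Sum over source words of the best dot product with any target word."""
    # sims[i, j] is the dot product of mapped source word i and target word j.
    sims = (X @ W) @ Z.T
    return float(sims.max(axis=1).sum())
```

Two different orthogonal mappings can be compared directly by this value, which is how a poor local optimum would show up: a lower score than a better-initialized solution.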

slide-124
SLIDE 124

Conclusions

  • Simple self-learning method to train bilingual embedding mappings
  • High quality results with almost no supervision (25 words, numerals)
  • Implicit optimization objective independent from seed dictionary
  • Seed dictionary necessary to avoid poor local optima
  • Future work: fully unsupervised training
slide-134
SLIDE 134

One more thing…

> git clone https://github.com/artetxem/vecmap.git
> python3 vecmap/map_embeddings.py --self_learning --numerals SRC_INPUT.EMB TRG_INPUT.EMB SRC_OUTPUT.EMB TRG_OUTPUT.EMB
> vecmap/reproduce_acl2017.sh

ENGLISH-ITALIAN

5,000 WORD DICTIONARY

  • Mikolov et al. (2013a) | Translation: 34.93% MWS353: 62.66%
  • Xing et al. (2015) | Translation: 36.87% MWS353: 61.41%
  • Zhang et al. (2016) | Translation: 36.73% MWS353: 61.62%
  • Artetxe et al. (2016) | Translation: 39.27% MWS353: 61.74%
  • Proposed method | Translation: 39.67% MWS353: 62.35%

25 WORD DICTIONARY

  • Mikolov et al. (2013a) | Translation: 0.00% MWS353: -6.42%
  • Xing et al. (2015) | Translation: 0.00% MWS353: 19.49%
  • Zhang et al. (2016) | Translation: 0.07% MWS353: 15.52%
  • Artetxe et al. (2016) | Translation: 0.07% MWS353: 17.45%
  • Proposed method | Translation: 37.27% MWS353: 62.64%

NUMERAL DICTIONARY

  • Mikolov et al. (2013a) | Translation: 0.00% MWS353: 28.75%
  • Xing et al. (2015) | Translation: 0.13% MWS353: 27.75%
  • Zhang et al. (2016) | Translation: 0.27% MWS353: 27.38%
  • Artetxe et al. (2016) | Translation: 0.40% MWS353: 24.85%
  • Proposed method | Translation: 39.40% MWS353: 62.82%
slide-135
SLIDE 135

Thank you!

https://github.com/artetxem/vecmap