Learning bilingual word embeddings with (almost) no bilingual data
Mikel Artetxe, Gorka Labaka, Eneko Agirre
IXA NLP group, University of the Basque Country (UPV/EHU)
Who cares? Word embeddings are useful!
Previous work: bilingual signal for training
This talk: (almost) no bilingual data
Seed dictionary (Basque → English):
Txakur → Dog
Sagar → Apple
⋮
Egutegi → Calendar

X holds the Basque embeddings and Z the English embeddings, aligned so that row i of X (X_{i*}) and row i of Z (Z_{i*}) form the i-th dictionary pair. The mapping W is learned so that XW ≈ Z:

$$\begin{bmatrix} X_{1*} \\ X_{2*} \\ \vdots \\ X_{n*} \end{bmatrix} W \approx \begin{bmatrix} Z_{1*} \\ Z_{2*} \\ \vdots \\ Z_{n*} \end{bmatrix}$$

$$W^* = \arg\min_{W \in O(d)} \sum_i \lVert X_{i*} W - Z_{i*} \rVert^2$$

where O(d) is the set of d × d orthogonal matrices.
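This orthogonally constrained least-squares problem has the classic closed-form orthogonal Procrustes solution: W = UV^T, where UΣV^T is the SVD of X^T Z. A minimal NumPy sketch, assuming the rows of `X_dict` and `Z_dict` are already aligned by the seed dictionary; `orthogonal_mapping` is a hypothetical name, and the actual vecmap code may apply extra preprocessing not shown here:

```python
import numpy as np


def orthogonal_mapping(X_dict, Z_dict):
    """Solve W* = argmin_{W in O(d)} sum_i ||X_i W - Z_i||^2 (orthogonal Procrustes).

    X_dict, Z_dict: (n, d) arrays whose i-th rows form the i-th seed dictionary pair.
    """
    # SVD of the d x d cross-covariance matrix X^T Z = U S V^T.
    u, _, vt = np.linalg.svd(X_dict.T @ Z_dict)
    # The optimal orthogonal map is U V^T.
    return u @ vt
```

With at least d linearly independent pairs and noise-free data Z = XQ for some orthogonal Q, this recovers Q exactly; with real embeddings it gives the best rotation in the least-squares sense.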
Monolingual embeddings + seed dictionary
→ Mapping → Dictionary (better!)
→ Mapping → Dictionary (even better!)
→ Mapping → Dictionary (even better!)
Proposed self-learning method: iterate the mapping and dictionary steps above (formalization and implementation details in the paper). It is based on the mapping method of Artetxe et al. (2016).

Too good to be true?
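The loop in the diagram can be sketched as alternating two steps: fit an orthogonal mapping on the current dictionary, then re-induce a dictionary by pairing every source word with its nearest target neighbor. This is a simplified illustration, not the authors' implementation: it assumes length-normalized embeddings, plain cosine nearest neighbors, and a stopping rule of "the induced dictionary no longer changes"; all function names are hypothetical.

```python
import numpy as np


def normalize(E):
    # Length-normalize rows so that dot products become cosine similarities.
    return E / np.linalg.norm(E, axis=1, keepdims=True)


def procrustes(X_dict, Z_dict):
    # Orthogonal W minimizing ||X_dict W - Z_dict||_F (closed form via SVD).
    u, _, vt = np.linalg.svd(X_dict.T @ Z_dict)
    return u @ vt


def induce_dictionary(X, Z, W):
    # Pair every source word i with its nearest target word j (cosine after mapping).
    sims = (X @ W) @ Z.T
    return np.arange(X.shape[0]), sims.argmax(axis=1)


def self_learning(X, Z, seed_src, seed_trg, max_iters=50):
    """Alternate mapping and dictionary induction, starting from a (possibly tiny) seed.

    X, Z: source / target embedding matrices (rows = words).
    seed_src, seed_trg: row indices of the seed dictionary pairs.
    Returns the final mapping W and the induced target index for each source word.
    """
    X, Z = normalize(X), normalize(Z)
    src, trg = np.asarray(seed_src), np.asarray(seed_trg)
    W, previous = None, None
    for _ in range(max_iters):
        W = procrustes(X[src], Z[trg])          # mapping step
        src, trg = induce_dictionary(X, Z, W)   # dictionary step
        if previous is not None and np.array_equal(trg, previous):
            break                               # dictionary stopped changing
        previous = trg
    return W, trg
```

For example, `W, trg = self_learning(X, Z, seed_src, seed_trg)` with a 25-pair seed returns the final mapping and, for each source row, the index of its induced translation. On real data the full similarity matrix is too large to build at once, so a practical version would work in batches and restrict the vocabulary.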
Experimental setup: English-Italian, English-German, English-Finnish
• Monolingual embeddings: CBOW + negative sampling
• Seed dictionary: 5,000 word pairs / 25 word pairs / numerals
• Test dictionary: 1,500 word pairs
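The numeral seed dictionary needs no bilingual resource at all. A minimal sketch, assuming that "numerals" means tokens made only of ASCII digits that appear verbatim in both vocabularies (the exact criterion used in the paper may differ; `numeral_seed_dictionary` is a hypothetical name):

```python
import re

# Assumption: a numeral is a token consisting only of ASCII digits.
NUMERAL = re.compile(r"[0-9]+")


def numeral_seed_dictionary(src_vocab, trg_vocab):
    """Return (source index, target index) pairs for numerals shared by both vocabularies."""
    trg_index = {word: j for j, word in enumerate(trg_vocab)}
    pairs = []
    for i, word in enumerate(src_vocab):
        # Identical numerals are assumed to be translations of each other.
        if NUMERAL.fullmatch(word) and word in trg_index:
            pairs.append((i, trg_index[word]))
    return pairs
```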
Word translation induction (accuracy on the 1,500-pair test dictionary)

                         English-Italian          English-German           English-Finnish
Seed dictionary          5,000   25      num.     5,000   25      num.     5,000   25      num.
Mikolov et al. (2013a)   34.93%  0.00%   0.00%    35.00%  0.00%   0.07%    25.91%  0.00%   0.00%
Xing et al. (2015)       36.87%  0.00%   0.13%    41.27%  0.07%   0.53%    28.23%  0.07%   0.56%
Zhang et al. (2016)      36.73%  0.07%   0.27%    40.80%  0.13%   0.87%    28.16%  0.14%   0.42%
Artetxe et al. (2016)    39.27%  0.07%   0.40%    41.87%  0.13%   0.73%    30.62%  0.21%   0.77%
Our method               39.67%  37.27%  39.40%   40.87%  39.60%  40.27%   28.72%  28.16%  26.47%
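Accuracy in this setting can be read as: map each test source word, retrieve its nearest target word, and count it correct if that word is a listed translation. A sketch of that computation, assuming cosine nearest-neighbor retrieval over already-mapped embeddings and a test dictionary that may list several translations per source word (`translation_accuracy` is a hypothetical helper, not the paper's evaluation script):

```python
import numpy as np


def translation_accuracy(Xm, Zm, test_pairs):
    """Nearest-neighbor translation accuracy.

    Xm, Zm: mapped source / target embeddings (rows = words).
    test_pairs: dict from source row index to the set of acceptable target row indices.
    """
    Xn = Xm / np.linalg.norm(Xm, axis=1, keepdims=True)
    Zn = Zm / np.linalg.norm(Zm, axis=1, keepdims=True)
    src = np.array(sorted(test_pairs))
    # Cosine similarity of each test source word to every target word.
    predictions = (Xn[src] @ Zn.T).argmax(axis=1)
    correct = sum(int(p) in test_pairs[int(s)] for s, p in zip(src, predictions))
    return correct / len(src)
```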
Crosslingual word similarity (WS: WordSim-353; RG: RG-65)
Monolingual embeddings: CBOW + negative sampling. Seed dictionary: 5,000 word pairs / 25 word pairs / numerals.

                                          EN-IT    EN-DE    EN-DE
Method                  Bilingual data    WS       RG       WS
Luong et al. (2015)     Europarl          33.1%    33.5%    35.6%
Mikolov et al. (2013a)  5k dict           62.7%    64.3%    52.8%
Xing et al. (2015)      5k dict           61.4%    70.0%    59.5%
Zhang et al. (2016)     5k dict           61.6%    70.4%    59.6%
Artetxe et al. (2016)   5k dict           61.7%    71.6%    59.7%
Our method              5k dict           62.4%    74.2%    61.6%
                        25 dict           62.6%    74.9%    61.2%
                        num.              62.8%    73.9%    60.4%
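Crosslingual similarity compares model scores with human judgments for word pairs that span two languages. A sketch, assuming the usual protocol for word similarity benchmarks: cosine similarity between the mapped source vector and the target vector, pairs with out-of-vocabulary words skipped, and Spearman rank correlation against the gold scores. Pure Python; the function names are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_ranks(values):
    # 1-based ranks; tied values share the average of their positions.
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def crosslingual_similarity_score(src_emb, trg_emb, dataset):
    """Spearman correlation between cosine similarities and human scores.

    src_emb, trg_emb: dicts word -> vector, already mapped into a shared space.
    dataset: list of (source word, target word, human score) triples.
    """
    model, gold = [], []
    for s, t, score in dataset:
        if s in src_emb and t in trg_emb:  # skip out-of-vocabulary pairs
            model.append(cosine(src_emb[s], trg_emb[t]))
            gold.append(score)
    # Spearman = Pearson correlation of the rank vectors.
    return pearson(average_ranks(model), average_ranks(gold))
```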
Monolingual embeddings + seed dictionary (small, no errors)
→ Mapping → Dictionary (large, with errors)
→ Mapping → Dictionary
Does each iteration make things better or worse? Even better, or even worse?
Implicit objective:

$$W^* = \arg\max_W \sum_i \max_j \left( X_{i*} W \cdot Z_{j*} \right) \quad \text{s.t.} \quad W W^T = W^T W = I$$

Independent of the seed dictionary!
So why do we need a seed dictionary? To avoid poor local optima!
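One way to see why the seed dictionary only matters for initialization: the dictionary step picks, for every i, the j that maximizes its term, and the mapping step (orthogonal Procrustes on the induced pairs) maximizes the sum over the current pairs, so neither step can decrease the objective. The sketch below assumes length-normalized rows and a dictionary covering every source word, and records the objective after each mapping step; all names are hypothetical.

```python
import numpy as np


def normalize(E):
    # Unit-length rows: every dot product below is then a cosine similarity.
    return E / np.linalg.norm(E, axis=1, keepdims=True)


def implicit_objective(X, Z, W):
    # sum_i max_j (X_i* W) . Z_j*; note that no dictionary appears in this quantity.
    return float(((X @ W) @ Z.T).max(axis=1).sum())


def procrustes(X_dict, Z_dict):
    # Orthogonal W maximizing sum_i (X_dict_i W) . Z_dict_i: U V^T from the SVD of X^T Z.
    u, _, vt = np.linalg.svd(X_dict.T @ Z_dict)
    return u @ vt


def objective_trace(X, Z, seed_src, seed_trg, iters=10):
    """Start from a seed dictionary, then alternate dictionary and mapping steps,
    recording the implicit objective after every mapping step."""
    W = procrustes(X[seed_src], Z[seed_trg])
    trace = [implicit_objective(X, Z, W)]
    for _ in range(iters):
        trg = ((X @ W) @ Z.T).argmax(axis=1)  # dictionary step: best j for every i
        W = procrustes(X, Z[trg])             # mapping step on the induced pairs
        trace.append(implicit_objective(X, Z, W))
    return trace
```

The seed only fixes the starting point of this ascent; a poor start can still end in a poor local optimum, which is what the slide's last line points at.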
> git clone https://github.com/artetxem/vecmap.git
> python3 vecmap/map_embeddings.py --self_learning --numerals SRC_INPUT.EMB TRG_INPUT.EMB SRC_OUTPUT.EMB TRG_OUTPUT.EMB
> vecmap/reproduce_acl2017.sh
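Assuming the SRC_/TRG_ placeholders above refer to plain-text embedding files in the word2vec text format (a header line with the vocabulary size and dimension, then one word followed by its vector per line), a minimal reader and writer might look like this; `read_embeddings` and `write_embeddings` are hypothetical helpers, not part of vecmap:

```python
import io


def read_embeddings(f):
    """Read word2vec text format: header 'count dim', then 'word v1 ... vdim' per line."""
    count, dim = map(int, f.readline().split())
    words, vectors = [], []
    for _ in range(count):
        parts = f.readline().rstrip().split(" ")
        words.append(parts[0])
        vectors.append([float(x) for x in parts[1:dim + 1]])
    return words, vectors


def write_embeddings(f, words, vectors):
    """Write word2vec text format with 6 decimal places per component."""
    f.write(f"{len(words)} {len(vectors[0])}\n")
    for word, vec in zip(words, vectors):
        f.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")


if __name__ == "__main__":
    # Round-trip a tiny in-memory file.
    buf = io.StringIO()
    write_embeddings(buf, ["txakur", "sagar"], [[0.5, -1.0], [2.0, 0.25]])
    buf.seek(0)
    print(read_embeddings(buf))
```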
25 WORD DICTIONARY
NUMERAL DICTIONARY
https://github.com/artetxem/vecmap