Inferring Translation Candidates for Multilingual Dictionary Generation with Multi-Way Neural Machine Translation

SLIDE 1

Inferring Translation Candidates for Multilingual Dictionary Generation with Multi-Way Neural Machine Translation

Mihael Arcan, Daniel Torregrosa*, Sina Ahmadi* and John P. McCrae

This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289, co-funded by the European Regional Development Fund, and the European Union’s Horizon 2020 research and innovation programme under grant agreement No 731015, ELEXIS - European Lexical Infrastructure.

SLIDE 2

Introduction · Neural machine translation · Results · Dictionary data · Conclusion

SLIDE 3

Motivation

  • Knowledge bases are useful for many applications, but available in few languages
  • The creation and curation of knowledge bases is expensive
  • Hence, few or no knowledge bases in most languages
  • Can we use machine translation to translate knowledge?

SLIDE 4

Overview

  • Multi-way neural machine translation without the targeted direction
  • Continuous training with a small curated dictionary
  • Discovery of new bilingual dictionary entries

SLIDE 5

Targeted languages

ES RO CA FR EO EN IT GL PT EU

SLIDE 6

Introduction · Neural machine translation · Results · Dictionary data · Conclusion

SLIDE 7

Machine translation before 2014

  • Rule-based machine translation
    • Humans write rules
    • Highly customisable
    • High maintenance cost
  • Phrase-based statistical machine translation
    • Learns from a parallel corpus
    • Less control over the translations

SLIDE 8

Word embeddings

  • Fixed-size numerical representation for words
  • From one-hot space (one dimension per different word) to embedding space
  • The embedding vector represents the context where the word appears (see the sketch below)
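
A minimal sketch of the two spaces, with a toy vocabulary and dimensions chosen purely for illustration:

```python
import numpy as np

# Toy vocabulary: in one-hot space, each word gets its own dimension.
vocab = {"low": 0, "lower": 1, "big": 2, "bigger": 3}

def one_hot(word):
    v = np.zeros(len(vocab))   # |V|-dimensional, all zeros ...
    v[vocab[word]] = 1.0       # ... except the word's own dimension
    return v

# Embedding space: a fixed-size dense vector per word (here 3 dimensions),
# learned so that words appearing in similar contexts get similar vectors.
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 3))

print(one_hot("low"))          # [1. 0. 0. 0.]
print(E[vocab["low"]])         # dense 3-dimensional vector
```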

SLIDE 9

Long short-term memory

(Diagram: an LSTM cell; the memory c_t is updated through three sigmoid-controlled gates: a forget gate, an input gate and an output gate.)

Based on tex.stackexchange.com/questions/332747/how-to-draw-a-diagram-of-long-short-term-memory

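For reference, the standard LSTM update the diagram corresponds to, with σ the logistic sigmoid and ⊙ the element-wise product:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(memory)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(output)}
\end{aligned}
```
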
SLIDE 10

Bi-directional LSTM

(Diagram: a forward LSTM chain and a backward LSTM chain read the input vectors x_2 … x_5; their hidden states h_2 … h_5 are combined per position.)

Based on github.com/PetarV-/TikZ

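A minimal PyTorch sketch of what the bi-directional encoder computes (toy sizes, not the authors' code): the forward and backward hidden states are concatenated at every position.

```python
import torch
import torch.nn as nn

# Toy bi-directional LSTM over one 5-token sentence with 64-dimensional
# input embeddings and 128 hidden units per direction.
bilstm = nn.LSTM(input_size=64, hidden_size=128,
                 bidirectional=True, batch_first=True)

x = torch.randn(1, 5, 64)
h, _ = bilstm(x)
print(h.shape)   # torch.Size([1, 5, 256]): forward + backward states concatenated
```
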
SLIDE 11

Neural machine translation

(Slides 11-19 step through an animated diagram; no further text content.)

SLIDE 20

Subword units

  • The one-hot vocabulary space has to be limited due to performance issues
  • This generates a lot of out-of-vocabulary entries
  • To minimize the effect, we use subword units instead of words

SLIDE 21

Byte pair encoding

  • BPE is a compression technique
  • It starts with all the different characters in the corpus
  • The most frequent character combination is selected as a BPE operation
  • This is repeated until the desired number of BPE operations is reached (see the sketch after this list)
  • The final size of the vocabulary is the number of BPE operations + the alphabet
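
A minimal sketch of the merge loop on the toy corpus from the next slides; the number of operations is an illustrative choice:

```python
from collections import Counter

words = ["low", "lower", "big", "bigger"]
corpus = [list(w) for w in words]          # start from single characters

def most_frequent_pair(corpus):
    pairs = Counter()
    for symbols in corpus:
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge(corpus, pair):
    merged = []
    for symbols in corpus:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])   # apply the merge
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged.append(out)
    return merged

for _ in range(3):                          # desired number of BPE operations
    pair = most_frequent_pair(corpus)
    corpus = merge(corpus, pair)
    print(pair, corpus)
```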

SLIDE 22

Byte pair encoding example

low lower big bigger

SLIDE 23

Byte pair encoding example

l o w _ l o w e r _ b i g _ b i g g e r

SLIDE 25

Byte pair encoding example

lo w _ lo w e r _ b i g _ b i g g e r

SLIDE 26

Byte pair encoding example

lo w _ lo w e r _ bi g _ bi g g e r

SLIDE 27

Byte pair encoding II

Conjugation of Spanish beber (to drink):

Present:     bebo, bebes, bebe, bebemos, bebéis, beben
Preterit:    bebí, bebiste, bebió, bebimos, bebisteis, bebieron
Imperfect:   bebía, bebías, bebía, bebíamos, bebíais, bebían
Future:      beberé, beberás, beberá, beberemos, beberéis, beberán
Conditional: bebería, beberías, bebería, beberíamos, beberíais, beberían

All forms share the stem beb-, which BPE can capture as a single subword unit.

SLIDE 29

Multi-way model

  • The model receives corpora in several different languages, both for source and target sentences
  • Each input sentence is annotated with the source language and the requested target language (see the sketch below)
  • In our case, Spanish-English, French-Romanian and Italian-Portuguese
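
A sketch of how the annotation might look; the token format below follows the common target-language-tag convention (e.g. Johnson et al., 2017) and is an assumption, not necessarily the paper's exact format:

```python
# Toy training pairs: (source language, target language, source, target).
pairs = [
    ("es", "en", "la casa es grande",    "the house is big"),
    ("fr", "ro", "la maison est grande", "casa este mare"),
    ("it", "pt", "la casa è grande",     "a casa é grande"),
]

for src_lang, tgt_lang, src, tgt in pairs:
    # Source language and requested target language are prepended to the
    # source sentence, so a single model can serve every direction.
    print(f"<{src_lang}> <2{tgt_lang}> {src}\t{tgt}")
```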

SLIDE 30

Continuous training

  • After training, the network is seldom able to produce text in the requested language for directions other than the trained ones
  • For example, if requested to translate Spanish to French, it will generate English
  • We continue the training with a small corpus of sentences (see the sketch below)
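
A self-contained toy sketch of the idea, with a stand-in linear model rather than a real NMT network: ordinary training on a large dataset, then continued training on the small curated one.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)                       # stand-in for the NMT network
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

big_corpus = [(torch.randn(8), torch.randn(8)) for _ in range(100)]
small_dictionary = [(torch.randn(8), torch.randn(8)) for _ in range(10)]

# First pass: ordinary training; second pass: continued training on the
# small curated corpus, starting from the already-trained weights.
for dataset in (big_corpus, small_dictionary):
    for x, y in dataset:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
```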

SLIDE 31

Dictionary data

We used three different dictionaries to continue training the system:

  • The Spanish to French Apertium dictionary (paper)
  • Spanish-French, Spanish-Portuguese and French-Portuguese dictionaries generated from Apertium data (task)
    • by following a cycle-based approach
    • by following a path-based approach

SLIDE 32

Part of speech

  • The NMT models were trained without part-of-speech (POS) data
  • To assign POS, we use monolingual dictionaries automatically extracted from Wiktionary
  • If
    • the source word is in the source-language dictionary, and
    • the target word is in the target-language dictionary, and
    • they have one or more POS tags in common,
    we generate one entry per shared POS (see the sketch below)
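
A sketch of this filter; the dictionary contents below are toy stand-ins for the Wiktionary extractions:

```python
# Monolingual word -> POS-tag-set dictionaries (toy stand-ins).
source_dict = {"muelle": {"noun"}, "antiguo": {"adjective"}}
target_dict = {"spring": {"noun", "verb"}, "ancient": {"adjective"}}

def pos_entries(src_word, tgt_word):
    if src_word in source_dict and tgt_word in target_dict:
        shared = source_dict[src_word] & target_dict[tgt_word]
        # One dictionary entry per POS tag the two words share.
        return [(src_word, tgt_word, pos) for pos in sorted(shared)]
    return []

print(pos_entries("muelle", "spring"))    # [('muelle', 'spring', 'noun')]
print(pos_entries("antiguo", "ancient"))  # [('antiguo', 'ancient', 'adjective')]
```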

SLIDE 33

Introduction · Neural machine translation · Results · Dictionary data · Conclusion

SLIDE 34

Evaluation

  • We used a dictionary automatically extracted from Wiktionary as the gold standard
  • For those systems that have confidence intervals, we calculate the precision and recall for all possible thresholds (see the sketch below)
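
A sketch of the threshold sweep; the candidate scores and gold entries are toy assumptions:

```python
# Candidate entries with confidence scores, and a toy gold standard.
candidates = [("casa", "house", 0.9), ("casa", "mouse", 0.4),
              ("perro", "dog", 0.8), ("gato", "dog", 0.3)]
gold = {("casa", "house"), ("perro", "dog"), ("gato", "cat")}

for threshold in sorted({score for *_, score in candidates}):
    # Keep every entry scoring at least the threshold, then compare.
    kept = {(s, t) for s, t, score in candidates if score >= threshold}
    correct = kept & gold
    precision = len(correct) / len(kept)
    recall = len(correct) / len(gold)
    print(f"t={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```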

SLIDE 35

Results (paper)

(Plots: precision against number of correct entries for Spanish→French and French→Spanish, comparing Apertium, NMT+Apertium1 and NMT+Apertium10.)

SLIDE 36

Introduction · Neural machine translation · Results · Dictionary data · Conclusion

SLIDE 37

Graph-based approaches

Basic idea: retrieve translations based on the graph of languages. Two definitions:

  • Language graph refers to the Apertium dictionary graph
  • Translation graph refers to a graph where vertices represent words and edges represent translations into other languages

SLIDE 38

Cycle-based approach

(Graph over EN:antique, EN:ancient, FR:antique, ES:antiguo, EU:zahar and EO:antikva.)

Apertium translations (black lines) in English (EN), French (FR), Spanish (ES), Basque (EU) and Esperanto (EO), and discovered possible translations (gray lines) and synonyms (red lines).
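
A simplified sketch of the intuition (not the paper's exact algorithm): starting from the Apertium edges in the figure, words linked through a short pivot chain but sharing no direct edge become candidates:

```python
import networkx as nx

# Existing Apertium translations from the figure (black lines).
g = nx.Graph()
g.add_edges_from([
    ("ES:antiguo", "FR:antique"), ("FR:antique", "EO:antikva"),
    ("EO:antikva", "EN:antique"), ("ES:antiguo", "EU:zahar"),
    ("EU:zahar", "EN:ancient"),
])

# Propose pairs that are two hops apart but not directly linked.
for node in g:
    near = nx.single_source_shortest_path_length(g, node, cutoff=2)
    for other, dist in near.items():
        if dist == 2 and node < other and not g.has_edge(node, other):
            print(node, "<->", other)
```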

SLIDE 39

Path-based approach

Traverse all simple paths using pivot-oriented inference

(Graph: translations of English spring across Basque (eu), Spanish (es), French (fr), Esperanto (eo), Catalan (ca) and Portuguese (pt), including malguki, udaberri, iturri, muelle, primavera, fuente, origen, printemps, source, printempo, primavero, fonto, brollador, font, fonte and origem.)

(Task) Weight translations w.r.t. frequency and path length
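
A sketch of path-based inference on a fragment of the figure's graph; the 1/length weighting below is a toy stand-in for the paper's frequency- and path-length-based weighting:

```python
import networkx as nx

# A fragment of the translation graph from the figure.
g = nx.Graph()
g.add_edges_from([
    ("en:spring", "es:muelle"), ("en:spring", "es:primavera"),
    ("en:spring", "es:fuente"), ("es:primavera", "fr:printemps"),
    ("es:primavera", "pt:primavera"), ("fr:printemps", "eo:printempo"),
    ("es:fuente", "pt:fonte"),
])

scores = {}
for target in (n for n in g if n.startswith("pt:")):
    for path in nx.all_simple_paths(g, "en:spring", target, cutoff=4):
        # Shorter pivot chains contribute more weight to the candidate.
        scores[target] = scores.get(target, 0.0) + 1.0 / (len(path) - 1)

print(scores)   # {'pt:primavera': 0.5, 'pt:fonte': 0.5}
```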

SLIDE 40

Results (task, Wiktionary reference)

(Plots: precision against number of correct entries for English→French, French→English, English→Portuguese, Portuguese→English, Portuguese→French and French→Portuguese, comparing Cycle, Path, NMT-Cycle and NMT-Path.)

SLIDE 41

Introduction · Neural machine translation · Results · Dictionary data · Conclusion

SLIDE 42

Conclusion

  • Using neural machine translation with
    • existing bilingual knowledge (paper)
    • discovered bilingual knowledge (task)
    to generate new dictionaries.
